zea.io_lib

Input / output functions for reading and writing files.

Use this module to quickly read and write files or to interact with the file system.

Functions

load_image(filename[, grayscale, color_order])

Load an image file and return a numpy array.

load_video(filename)

Load a video file and return a numpy array of frames.

matplotlib_figure_to_numpy(fig)

Convert matplotlib figure to numpy array.

retry_on_io_error([max_retries, ...])

Decorator to retry functions on I/O errors with exponential backoff.

search_file_tree(directory[, filetypes, ...])

Lists all files in directory and sub-directories.

zea.io_lib.load_image(filename, grayscale=True, color_order='RGB')[source]

Load an image file and return a numpy array.

Supported file types: jpg, png.

Parameters:
  • filename (str) – The path to the image file.

  • grayscale (bool, optional) – Whether to convert the image to grayscale. Defaults to True.

  • color_order (str, optional) – The desired color channel ordering. Defaults to ‘RGB’.

Returns:

A numpy array of the image.

Return type:

numpy.ndarray

Raises:

ValueError – If the file extension is not supported.

zea.io_lib.load_video(filename)[source]

Load a video file and return a numpy array of frames.

Supported file types: avi, mp4, gif.

Parameters:

filename (str) – The path to the video file.

Returns:

A numpy array of frames.

Return type:

numpy.ndarray

Raises:

ValueError – If the file extension is not supported.
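Both loaders raise ValueError for unsupported extensions, so a caller may want to check a path before choosing which one to call. The following is a minimal sketch of such a check. The helper name `classify_media` and the case-insensitive comparison are assumptions made for illustration and are not part of zea.io_lib; only the extension lists are taken from the documentation above.

```python
from pathlib import Path

# Extensions as listed in the load_image and load_video documentation.
IMAGE_EXTENSIONS = {".jpg", ".png"}
VIDEO_EXTENSIONS = {".avi", ".mp4", ".gif"}


def classify_media(filename):
    """Hypothetical helper: return "image" or "video" based on the extension.

    Assumption: the comparison is case-insensitive. The library's own check
    may behave differently.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported file extension: {suffix!r}")
```

A caller could then dispatch to load_image or load_video depending on the returned label.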

zea.io_lib.matplotlib_figure_to_numpy(fig)[source]

Convert matplotlib figure to numpy array.

Parameters:

fig (matplotlib.figure.Figure) – The figure to convert.

Returns:

A numpy array of the figure.

Return type:

numpy.ndarray

zea.io_lib.retry_on_io_error(max_retries=3, initial_delay=0.5, retry_action=None)[source]

Decorator to retry functions on I/O errors with exponential backoff.

Parameters:
  • max_retries (int) – Maximum number of retry attempts. Defaults to 3.

  • initial_delay (float) – Initial delay between retries in seconds. Defaults to 0.5.

  • retry_action (callable, optional) – Function to call before each retry attempt. Defaults to None. When decorating a method, it is called as retry_action(self, exception, attempt, *args, **kwargs); when decorating a function, it is called as retry_action(exception, attempt, *args, **kwargs).

Returns:

Decorated function with retry logic.

Return type:

callable
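To illustrate the general technique of retrying with exponential backoff, here is a minimal standalone sketch. It is not the zea implementation. The name `retry_with_backoff`, the doubling factor, the choice to catch OSError, the zero-based attempt counter, and the reading of max_retries as "retries after the first attempt" are all assumptions made for illustration.

```python
import functools
import time


def retry_with_backoff(max_retries=3, initial_delay=0.5, retry_action=None):
    """Hypothetical stand-in for a retry decorator (not zea's code)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            # One initial attempt plus up to max_retries retries (assumption).
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except OSError as exc:
                    if attempt == max_retries:
                        raise  # Retries exhausted: propagate the last error.
                    if retry_action is not None:
                        # Function-style signature from the docs above.
                        retry_action(exc, attempt, *args, **kwargs)
                    time.sleep(delay)
                    delay *= 2  # Assumed backoff factor of 2.

        return wrapper

    return decorator


calls = []
seen = []


def log_retry(exception, attempt, *args, **kwargs):
    # Record which exception triggered the retry and on which attempt.
    seen.append((type(exception).__name__, attempt))


@retry_with_backoff(max_retries=3, initial_delay=0.0, retry_action=log_retry)
def flaky_read(path):
    # Fails on the first two calls, then succeeds.
    calls.append(path)
    if len(calls) < 3:
        raise OSError("temporarily unavailable")
    return "contents of " + path


result = flaky_read("data.bin")
```

In this sketch the delay is set to 0.0 so the example runs instantly; with the default of 0.5 the waits would be 0.5 s, 1 s, and 2 s.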

zea.io_lib.search_file_tree(directory, filetypes=None, write=True, dataset_info_filename='dataset_info.yaml', hdf5_key_for_length=None, redo=False, parallel=False, verbose=True)[source]

Lists all files in directory and sub-directories.

If dataset_info.yaml is detected in the directory, that file is read and used to deduce the file paths. Otherwise, the directory is searched and the resulting file paths are written to a dataset_info.yaml file.

Parameters:
  • directory (str) – Path to base directory to start file search.

  • filetypes (str or list, optional) – Filetypes to look for in directory. Defaults to image types (.png etc.). Make sure to include the dot.

  • write (bool, optional) – Whether to write to dataset_info.yaml file. Defaults to True. If False, the file paths are not written to file and simply returned.

  • dataset_info_filename (str, optional) – Name of dataset info file. Defaults to “dataset_info.yaml”, but can be changed to any name.

  • hdf5_key_for_length (str, optional) – Key to use for getting length of hdf5 files. Defaults to None. If set, the number of frames in each hdf5 file is calculated and stored in the dataset_info.yaml file. This is extra functionality of search_file_tree and only works with hdf5 files.

  • redo (bool, optional) – Whether to redo the search and overwrite the dataset_info.yaml file. Defaults to False.

  • parallel (bool, optional) – Whether to use multiprocessing for hdf5 shape reading. Defaults to False.

  • verbose (bool, optional) – Whether to print progress and info. Defaults to True.

Returns:

Dictionary containing file paths and total number of files.

Has the following structure:

{
    "file_paths": list of file paths,
    "total_num_files": total number of files,
    "file_lengths": list of number of frames in each hdf5 file,
    "file_shapes": list of shapes of each image file,
    "total_num_frames": total number of frames in all hdf5 files
}

Return type:

dict
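The recursive search part of this behaviour can be sketched with the standard library alone. This is a simplified illustration, not the zea implementation: the helper name `list_files` is hypothetical, the case-insensitive matching and sorted output are assumptions, and the sketch omits the dataset_info.yaml caching and the hdf5 length handling described above.

```python
import os
import tempfile


def list_files(directory, filetypes=(".png",)):
    """Hypothetical helper: recursively collect files by extension."""
    if isinstance(filetypes, str):
        filetypes = [filetypes]
    # Assumption: extensions are matched case-insensitively.
    suffixes = tuple(ft.lower() for ft in filetypes)
    file_paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith(suffixes):
                file_paths.append(os.path.join(root, name))
    file_paths.sort()  # Deterministic order for reproducibility.
    return {"file_paths": file_paths, "total_num_files": len(file_paths)}


# Build a small throwaway tree and search it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.png", "notes.txt", os.path.join("sub", "b.PNG")):
        open(os.path.join(tmp, rel), "w").close()
    info = list_files(tmp, filetypes=".png")
    relative = [os.path.relpath(p, tmp) for p in info["file_paths"]]
```

The returned dictionary mirrors only the "file_paths" and "total_num_files" keys of the structure shown above.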