zea.data.dataloader

H5 dataloader for loading images from zea datasets.

Functions

generate_h5_indices(file_paths, file_shapes, ...)

Generate indices for h5 files.

Classes

H5Generator(file_paths[, key, n_frames, ...])

H5Generator class for iterating over hdf5 files in an advanced way.

class zea.data.dataloader.H5Generator(file_paths, key='data/image', n_frames=1, shuffle=True, return_filename=False, limit_n_samples=None, limit_n_frames=None, seed=None, cache=False, additional_axes_iter=None, sort_files=True, overlapping_blocks=False, initial_frame_axis=0, insert_frame_axis=True, frame_index_stride=1, frame_axis=-1, validate=True, **kwargs)

Bases: Dataset

H5Generator class for iterating over hdf5 files in an advanced way. It is mostly used internally; you might want to use the Dataloader class instead. Loads one item at a time. Always outputs numpy arrays.

iterator()

Generator that yields images from the hdf5 files.

load(file, key, indices)

Extract data from hdf5 file.

Parameters:
  • file_name (str) – Name of the file to extract image from.

  • key (str) – Key of the hdf5 dataset to grab data from.

  • indices (tuple | str) – Indices to extract image from (tuple of slices).

Returns:

image extracted from hdf5 file and indexed by indices.

Return type:

np.ndarray
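To illustrate what "indexed by indices" means, here is a minimal sketch that applies one indices entry to an in-memory numpy array standing in for an hdf5 dataset. The helper name apply_indices is hypothetical and is not part of zea; the actual load method may handle files and indices differently.

```python
import numpy as np


def apply_indices(dataset, indices):
    """Hypothetical helper: index an array-like with a list of ranges/slices."""
    # Convert range objects to explicit lists so they act as index arrays.
    index = tuple(list(i) if isinstance(i, range) else i for i in indices)
    return np.asarray(dataset[index])


# Stand-in for an hdf5 dataset with 3 frames of 4 x 4 pixels (assumed layout).
data = np.arange(3 * 4 * 4).reshape(3, 4, 4)

# Same structure as one entry's indices: one frame range plus one slice per axis.
image = apply_indices(data, [range(1, 2), slice(None, 4), slice(None, 4)])
```

Here image keeps the frame axis, so it holds the second frame with a leading axis of length 1.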

summary()

Return a string with dataset statistics and per-directory breakdown.

zea.data.dataloader.generate_h5_indices(file_paths, file_shapes, n_frames, frame_index_stride, key='data/image', initial_frame_axis=0, additional_axes_iter=None, sort_files=True, overlapping_blocks=False, limit_n_frames=None)

Generate indices for h5 files.

Generates a list of indices to extract images from hdf5 files. Length of this list is the length of the extracted dataset.

Parameters:
  • file_paths (list) – List of file paths.

  • file_shapes (list) – List of file shapes.

  • n_frames (int) – Number of frames to load from each hdf5 file.

  • frame_index_stride (int) – Interval between frames to load.

  • key (str, optional) – Key of hdf5 dataset to grab data from. Defaults to “data/image”.

  • initial_frame_axis (int, optional) – Axis to iterate over. Defaults to 0.

  • additional_axes_iter (list, optional) – Additional axes to iterate over in the dataset. Defaults to None.

  • sort_files (bool, optional) – Sort files by number. Defaults to True.

  • overlapping_blocks (bool, optional) – If True, take n_frames from the sequence, then advance by 1 frame, so that consecutive blocks overlap. Defaults to False.

  • limit_n_frames (int, optional) – Limit the number of frames to load from each file. Only the first limit_n_frames frames of each file are used. Defaults to None.

Returns:

List of tuples with indices to extract images from hdf5 files. Each tuple has the form (file_name, key, indices), with indices being a tuple of slices.

Return type:

list

Example

[
    (
        "/folder/path_to_file.hdf5",
        "data/image",
        [range(0, 1), slice(None, 256, None), slice(None, 256, None)],
    ),
    (
        "/folder/path_to_file.hdf5",
        "data/image",
        [range(1, 2), slice(None, 256, None), slice(None, 256, None)],
    ),
    ...,
]
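The following is a minimal sketch of how entries of this shape could be built, not the library's implementation. It assumes the frame axis is axis 0, that a block spans n_frames * frame_index_stride frames, and that non-overlapping blocks advance by one full block. The function name sketch_h5_indices is hypothetical.

```python
def sketch_h5_indices(file_paths, file_shapes, n_frames, frame_index_stride=1,
                      key="data/image", overlapping_blocks=False,
                      limit_n_frames=None):
    """Illustrative only: build (file_name, key, indices) entries, frame axis 0."""
    entries = []
    # Frames spanned by one block (assumed convention, not confirmed by the docs).
    block_size = n_frames * frame_index_stride
    # Overlapping blocks slide by one frame; otherwise jump a whole block.
    step = 1 if overlapping_blocks else block_size
    for path, shape in zip(file_paths, file_shapes):
        n_available = shape[0]
        if limit_n_frames is not None:
            # Only the first limit_n_frames frames of the file are considered.
            n_available = min(n_available, limit_n_frames)
        for start in range(0, n_available - block_size + 1, step):
            frames = range(start, start + block_size, frame_index_stride)
            # Take every remaining axis in full.
            rest = [slice(None, dim) for dim in shape[1:]]
            entries.append((path, key, [frames, *rest]))
    return entries


entries = sketch_h5_indices(
    ["/folder/path_to_file.hdf5"], [(2, 256, 256)], n_frames=1
)
```

For a single file of shape (2, 256, 256) with n_frames=1, this yields the two entries shown in the example above, and len(entries) is the length of the extracted dataset.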