Dataset

class classy_vision.dataset.CIFARDataset(split: Optional[str], batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int], root: str, download: bool = None)
__init__(split: Optional[str], batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int], root: str, download: bool = None)

Constructor for a CIFARDataset.

Parameters
  • batchsize_per_replica – Positive integer indicating batch size for each replica

  • shuffle – Whether to shuffle between epochs

  • transform – When set, transform to be applied to each sample

  • num_samples – When set, this restricts the number of samples provided by the dataset

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_cifar.CIFARDataset

Instantiates a CIFARDataset from a configuration.

Parameters

config – A configuration for a CIFARDataset. See __init__() for parameters expected in the config.

Returns

A CIFARDataset instance.

class classy_vision.dataset.ClassyDataset(dataset: Sequence, batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int])

Class representing a dataset abstraction.

This class wraps a torch.utils.data.Dataset via the dataset attribute and configures the dataloaders needed to access the datasets. By default, this class will use DEFAULT_NUM_WORKERS processes to load the data (num_workers in torch.utils.data.DataLoader). Transforms which need to be applied to the data should be specified in this class. ClassyDataset can be instantiated from a configuration file as well.

__init__(dataset: Sequence, batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int]) → None

Constructor for a ClassyDataset.

Parameters
  • batchsize_per_replica – Positive integer indicating batch size for each replica

  • shuffle – Whether to shuffle between epochs

  • transform – When set, transform to be applied to each sample

  • num_samples – When set, this restricts the number of samples provided by the dataset

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_dataset.ClassyDataset

Instantiates a ClassyDataset from a configuration.

Parameters

config – A configuration for the ClassyDataset.

Returns

A ClassyDataset instance.

get_batchsize_per_replica()

Get the batch size per replica.

Returns

The batch size for each replica.

get_global_batchsize()

Get the global batch size, combined over all the replicas.

Returns

The overall batch size of the dataset.
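The relationship between the two getters can be sketched as follows. This assumes the global batch size is the per-replica batch size multiplied by the number of replicas (the distributed world size). The helper global_batchsize is hypothetical and not part of Classy Vision.

```python
def global_batchsize(batchsize_per_replica: int, world_size: int) -> int:
    """Hypothetical helper: combine per-replica batch sizes across all replicas."""
    if batchsize_per_replica <= 0:
        raise ValueError("batchsize_per_replica must be a positive integer")
    if world_size <= 0:
        raise ValueError("world_size must be a positive integer")
    # Every replica loads its own batch, so one global step covers all of them.
    return batchsize_per_replica * world_size


# Example: 8 replicas (e.g. 8 GPUs), each loading 32 samples per step.
print(global_batchsize(32, 8))  # 256
```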

iterator(*args, **kwargs)

Returns an iterable which can be used to iterate over the data.

Parameters
  • shuffle_seed (int, optional) – Seed for the shuffle

  • current_phase_id (int, optional) – The epoch being fetched. Needed so that each epoch has a different shuffle order

Returns

An iterable over the data
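Why current_phase_id matters can be shown with a small sketch: mixing the phase id into the seed yields a different, but reproducible, order per epoch. This follows the same seed + epoch convention that torch.utils.data.DistributedSampler uses; the helper epoch_order is hypothetical, and ClassyDataset's actual sampler may differ in detail.

```python
import random
from typing import List


def epoch_order(num_samples: int, shuffle_seed: int, current_phase_id: int) -> List[int]:
    """Hypothetical helper: deterministic per-epoch shuffle of sample indices."""
    indices = list(range(num_samples))
    # The same (seed, phase) pair always gives the same order, so every replica
    # agrees on the permutation, while a new phase id gives a new order.
    rng = random.Random(shuffle_seed + current_phase_id)
    rng.shuffle(indices)
    return indices


order = epoch_order(10, shuffle_seed=0, current_phase_id=1)
print(order == epoch_order(10, 0, 1))      # True: reproducible
print(sorted(order) == list(range(10)))    # True: still a permutation
```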

classmethod parse_config(config: Dict[str, Any])

This function parses out common config options.

Parameters

config

A dict with the following string keys:

  • batchsize_per_replica (int) – Must be a positive int; the batch size for each replica

  • use_shuffle (bool) – Whether to enable shuffling for the dataset

  • num_samples (int, optional) – When set, restricts the number of samples in the dataset

  • transforms – List of transform configurations to be applied in order

Returns

A tuple containing the following variables:

  • transform_config – Config for the dataset transform. Can be passed to

  • batchsize_per_replica – Batch size per replica

  • shuffle – Whether we should shuffle between epochs

  • num_samples – When set, restricts the number of samples in the dataset
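A minimal sketch of what parsing these common keys could look like, assuming missing optional keys default to False (use_shuffle) and None (num_samples, transforms). The function parse_common_config and the transform name "hypothetical_transform" are illustrative only; they are not Classy Vision APIs.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_common_config(
    config: Dict[str, Any],
) -> Tuple[Optional[List[Dict[str, Any]]], int, bool, Optional[int]]:
    """Hypothetical stand-in for ClassyDataset.parse_config."""
    batchsize_per_replica = config.get("batchsize_per_replica")
    if not isinstance(batchsize_per_replica, int) or batchsize_per_replica <= 0:
        raise ValueError("batchsize_per_replica must be a positive int")
    shuffle = bool(config.get("use_shuffle", False))  # assumed default
    num_samples = config.get("num_samples")  # None means "use every sample"
    transform_config = config.get("transforms")  # applied in list order
    # Same order as the tuple described above.
    return transform_config, batchsize_per_replica, shuffle, num_samples


config = {
    "batchsize_per_replica": 32,
    "use_shuffle": True,
    "num_samples": 1000,
    "transforms": [{"name": "hypothetical_transform"}],
}
print(parse_common_config(config))
# ([{'name': 'hypothetical_transform'}], 32, True, 1000)
```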

class classy_vision.dataset.ClassyVideoDataset(dataset: Any, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], clips_per_video: int)

Interface specifying what a ClassyVision video dataset is expected to provide.

This dataset considers every video as a collection of fixed-size video clips, where the clip length is given by frames_per_clip and the step in frames between consecutive clips is given by step_between_clips. It uses a clip sampler to sample a specified number of clips (clips_per_video) from each video. For the training set, a random clip sampler samples a small number of clips (e.g. 1) from each video. For the test set, a uniform clip sampler evenly samples a large number of clips (e.g. 10) from each video.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video may be present.
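The clip arithmetic above can be reproduced in a few lines. clip_starts counts full-length clips and drops incomplete trailing ones; sample_clips is a simplified, hypothetical illustration of random (train) versus evenly spaced (test) selection. Classy Vision's real samplers build on TorchVision's VideoClips and may pick slightly different clips.

```python
import random
from typing import List


def clip_starts(num_frames: int, frames_per_clip: int, step_between_clips: int) -> List[int]:
    """Start frame of every full-length clip; incomplete trailing clips are dropped."""
    return list(range(0, num_frames - frames_per_clip + 1, step_between_clips))


def sample_clips(starts: List[int], clips_per_video: int, split: str, seed: int = 0) -> List[int]:
    """Hypothetical sampler: random clips for "train", evenly spaced clips otherwise."""
    if len(starts) <= clips_per_video:
        return list(starts)
    if split == "train":
        return sorted(random.Random(seed).sample(starts, clips_per_video))
    step = len(starts) / clips_per_video
    return [starts[int(i * step)] for i in range(clips_per_video)]


# The example from the text: two videos with 10 and 15 frames.
per_video = [clip_starts(n, frames_per_clip=5, step_between_clips=5) for n in (10, 15)]
print(per_video)                        # [[0, 5], [0, 5, 10]]
print(sum(len(s) for s in per_video))   # 5
```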

__init__(dataset: Any, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], clips_per_video: int)

The constructor method of ClassyVideoDataset.

Parameters
  • dataset – the underlying video dataset from either TorchVision or other source. It should have an attribute video_clips of type torchvision.datasets.video_utils.VideoClips

  • split – dataset split. Must be either “train” or “test”

  • batchsize_per_replica – batch size per model replica

  • shuffle – If true, shuffle video clips.

  • transform – callable function to transform video clip sample from ClassyVideoDataset

  • num_samples – If provided, return at most num_samples video clips

  • clips_per_video – The number of clips sampled from each video

iterator(*args, **kwargs)

Returns an iterable which can be used to iterate over the data.

Parameters
  • shuffle_seed (int, optional) – Seed for the shuffle

  • current_phase_id (int, optional) – The epoch being fetched. Needed so that each epoch has a different shuffle order

Returns

An iterable over the data

classmethod load_metadata(filepath: str, video_dir: Optional[str] = None, update_file_path: bool = False) → Dict[str, Any]

Load pre-computed video dataset meta data.

Computing video dataset meta data is time-consuming: it can take minutes on a small dataset and hours on a large one. However, it only needs to be computed once, and can be saved into a file via save_metadata().

The format of meta data is defined in TorchVision.

For each video, meta data contains the video file path, presentation timestamps of all video frames, and video fps.

Parameters
  • filepath – file path of pre-computed meta data

  • video_dir – If provided, the folder where video files are stored.

  • update_file_path – If true, replace the directory part of each video file path in the meta data with the actual video directory provided in video_dir. This is necessary for successfully reusing pre-computed meta data when the video directory has been moved and is no longer consistent with the full video file paths saved in the meta data.
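The effect of update_file_path=True can be sketched as re-rooting each stored path under video_dir. This assumes the paths live under a "video_paths" key, as in TorchVision's VideoClips.metadata; relocate_video_paths is a hypothetical helper, not the library's implementation.

```python
import os
from typing import Any, Dict


def relocate_video_paths(metadata: Dict[str, Any], video_dir: str) -> Dict[str, Any]:
    """Hypothetical helper mirroring update_file_path=True."""
    updated = dict(metadata)  # leave the caller's dict untouched
    # Keep only each file name and place it under the new video directory.
    updated["video_paths"] = [
        os.path.join(video_dir, os.path.basename(path)) for path in metadata["video_paths"]
    ]
    return updated


metadata = {"video_paths": ["/old/videos/a.avi", "/old/videos/b.avi"], "video_fps": [30.0, 25.0]}
print(relocate_video_paths(metadata, "/new/videos")["video_paths"])
# ['/new/videos/a.avi', '/new/videos/b.avi'] on POSIX systems
```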

classmethod parse_config(config: Dict[str, Any])

Parse config to prepare arguments needed by the class constructor.

classmethod save_metadata(metadata: Dict[str, Any], filepath: str)

Save dataset meta data into a file.

Parameters
  • metadata – dataset meta data, which contains video meta information, such as video file paths, video fps, and video frame timestamps in each video. For the format of dataset meta data, check the TorchVision documentation.

  • filepath – file path where the meta data will be saved

property video_clips

Attribute video_clips.

It is used in the _get_sampler method. Its data type should be torchvision.datasets.video_utils.VideoClips.

class classy_vision.dataset.DataloaderLimitWrapper(dataloader: Iterable, limit: int, wrap_around: bool = True)

Dataloader which wraps another dataloader and only returns a limited number of items.

This is useful for Iterable datasets where the length of the datasets isn’t known. Such datasets can wrap their returned iterators with this class. See SyntheticImageStreamingDataset.iterator() for an example.

Attribute accesses are passed to the wrapped dataloader.

__init__(dataloader: Iterable, limit: int, wrap_around: bool = True) → None

Constructor for DataloaderLimitWrapper.

Parameters
  • dataloader – The dataloader to wrap around

  • limit – Specify the number of calls to the underlying dataloader. The wrapper will raise a StopIteration after limit calls.

  • wrap_around – Whether to wrap around the original dataloader if the dataloader is exhausted before limit calls.

Raises

RuntimeError – If wrap_around is set to False and the underlying dataloader is exhausted before limit calls.
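The behaviour described above (a fixed number of items, optional wrap-around, and attribute forwarding) can be illustrated with a small pure-Python sketch. LimitWrapperSketch is a hypothetical re-implementation for explanation only, not the Classy Vision source.

```python
from typing import Any, Iterable, Iterator


class LimitWrapperSketch:
    """Hypothetical re-implementation of the DataloaderLimitWrapper behaviour."""

    def __init__(self, dataloader: Iterable, limit: int, wrap_around: bool = True) -> None:
        self.dataloader = dataloader
        self.limit = limit
        self.wrap_around = wrap_around

    def __iter__(self) -> Iterator[Any]:
        iterator = iter(self.dataloader)
        for _ in range(self.limit):
            try:
                yield next(iterator)
            except StopIteration:
                if not self.wrap_around:
                    raise RuntimeError("dataloader exhausted before reaching the limit")
                # Restart the underlying dataloader and continue from its beginning.
                iterator = iter(self.dataloader)
                yield next(iterator)

    def __len__(self) -> int:
        return self.limit

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes the wrapper lacks; forward them.
        if name == "dataloader":
            raise AttributeError(name)  # avoid infinite recursion before __init__
        return getattr(self.dataloader, name)


wrapped = LimitWrapperSketch([1, 2, 3], limit=5)
print(list(wrapped))     # [1, 2, 3, 1, 2]
print(wrapped.count(2))  # 1, forwarded to list.count
```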

class classy_vision.dataset.DataloaderSkipNoneWrapper(dataloader: Iterable)

Dataloader which wraps another dataloader and skips batches that are None.

Attribute accesses are passed to the wrapped dataloader.

__init__(dataloader: Iterable) → None

Initialize self. See help(type(self)) for accurate signature.
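A minimal sketch of the skip-None behaviour, assuming the wrapped loader yields None for batches that could not be built (for example, when every sample in a batch failed to load). SkipNoneWrapperSketch is hypothetical and only illustrates the idea.

```python
from typing import Any, Iterable, Iterator


class SkipNoneWrapperSketch:
    """Hypothetical illustration of DataloaderSkipNoneWrapper."""

    def __init__(self, dataloader: Iterable) -> None:
        self.dataloader = dataloader

    def __iter__(self) -> Iterator[Any]:
        for batch in self.dataloader:
            # Drop batches the upstream loader could not produce.
            if batch is not None:
                yield batch

    def __getattr__(self, name: str) -> Any:
        # Forward unknown attributes to the wrapped dataloader.
        if name == "dataloader":
            raise AttributeError(name)
        return getattr(self.dataloader, name)


print(list(SkipNoneWrapperSketch([{"x": 1}, None, {"x": 2}])))  # [{'x': 1}, {'x': 2}]
```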

class classy_vision.dataset.DataloaderWrapper(dataloader: Iterable)

Abstract class representing dataloader which wraps another dataloader.

Attribute accesses are passed to the wrapped dataloader.

__init__(dataloader: Iterable) → None

Initialize self. See help(type(self)) for accurate signature.

class classy_vision.dataset.HMDB51Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)

HMDB51 is an action recognition video dataset, and it has 51 classes.

It is built on top of the HMDB51 dataset class in TorchVision.

__init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)

The constructor of HMDB51Dataset.

Parameters
  • split – dataset split which can be either “train” or “test”

  • batchsize_per_replica – batch size per model replica

  • shuffle – If true, shuffle the dataset

  • transform – a dict of transforms applied to the video and audio data

  • num_samples – if not None, it will subsample dataset

  • frames_per_clip – the number of frames in a video clip

  • video_width – rescaled video width. If 0, keep original width

  • video_height – rescaled video height. If 0, keep original height

  • video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note that only one of (video_width, video_height) and (video_min_dimension) can be set.

  • audio_samples – desired audio sample rate. If 0, keep original audio sample rate.

  • step_between_clips – Number of frames between each clip.

  • frame_rate – desired video frame rate. If None, keep original video frame rate.

  • clips_per_video – Number of clips to sample from each video

  • video_dir – path to video folder

  • splits_dir – path to dataset splitting file folder

  • fold – HMDB51 dataset has 3 folds. Valid values are 1, 2 and 3.

  • metadata_filepath – path to the dataset meta data
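How the three resize parameters interact can be sketched by computing the output frame size. This follows the parameter descriptions above literally and additionally assumes that video_min_dimension preserves the aspect ratio and rounds to the nearest pixel; resized_hw is a hypothetical helper, and the actual video decoder may round differently.

```python
from typing import Tuple


def resized_hw(height: int, width: int, video_width: int = 0, video_height: int = 0,
               video_min_dimension: int = 0) -> Tuple[int, int]:
    """Hypothetical helper: output (height, width) for the resize options."""
    if video_min_dimension and (video_width or video_height):
        raise ValueError("set either video_width/video_height or video_min_dimension")
    if video_min_dimension:
        # Scale the shorter side to video_min_dimension, keeping the aspect ratio.
        scale = video_min_dimension / min(height, width)
        return round(height * scale), round(width * scale)
    # 0 means "keep the original size" for each side.
    return (video_height or height), (video_width or width)


print(resized_hw(240, 320, video_min_dimension=128))            # (128, 171)
print(resized_hw(240, 320, video_width=112, video_height=112))  # (112, 112)
print(resized_hw(240, 320))                                     # (240, 320)
```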

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_hmdb51.HMDB51Dataset

Instantiates a HMDB51Dataset from a configuration.

Parameters

config – A configuration for a HMDB51Dataset. See __init__() for parameters expected in the config.

Returns

A HMDB51Dataset instance.

class classy_vision.dataset.ImagePathDataset(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable] = None, num_samples: Optional[int] = None, image_folder: Optional[str] = None, image_files: Optional[List[str]] = None)

Dataset which reads images from a local filesystem. Implements ClassyDataset.

__init__(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable] = None, num_samples: Optional[int] = None, image_folder: Optional[str] = None, image_files: Optional[List[str]] = None)

Constructor for ImagePathDataset.

Only one of image_folder or image_files should be passed to specify the images.

Parameters
  • batchsize_per_replica – Positive integer indicating batch size for each replica

  • shuffle – Whether we should shuffle between epochs

  • transform – Transform to be applied to each sample

  • num_samples – When set, this restricts the number of samples provided by the dataset

  • image_folder – A directory with one of the following structures:

    • A directory containing sub-directories with images for each target, which is the format expected by torchvision.datasets.ImageFolder:

        dog/xxx.png
        dog/xxy.png
        cat/123.png
        cat/nsdf3.png

      In this case, the targets are inferred from the sub-directories.

    • A directory containing images:

        123.png
        xyz.png

      In this case, the targets are not returned (useful for inference).

  • image_files – A list of image files:

      ["123.png", "dog/xyz.png", "/home/cat/aaa.png"]

    In this case, the targets are not returned (useful for inference).
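The two image_folder layouts can be told apart with a short sketch. Like torchvision.datasets.ImageFolder, it assigns class indices in sorted order of the sub-directory names; unlike ImageFolder, it does no file-extension filtering. list_samples is a hypothetical helper, not ImagePathDataset's code.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def list_samples(image_folder: str) -> List[Tuple[str, Optional[int]]]:
    """Hypothetical helper: (path, target) pairs for the two folder layouts."""
    entries = sorted(os.listdir(image_folder))
    class_dirs = [e for e in entries if os.path.isdir(os.path.join(image_folder, e))]
    if not class_dirs:
        # Flat folder of images: no targets (useful for inference).
        return [(os.path.join(image_folder, e), None) for e in entries]
    samples = []
    # ImageFolder-style layout: sorted sub-directory names become class indices.
    for target, class_name in enumerate(class_dirs):
        class_path = os.path.join(image_folder, class_name)
        for file_name in sorted(os.listdir(class_path)):
            samples.append((os.path.join(class_path, file_name), target))
    return samples


with tempfile.TemporaryDirectory() as root:
    for rel in ["dog/xxx.png", "dog/xxy.png", "cat/123.png", "cat/nsdf3.png"]:
        os.makedirs(os.path.join(root, os.path.dirname(rel)), exist_ok=True)
        open(os.path.join(root, rel), "wb").close()
    for path, target in list_samples(root):
        print(os.path.relpath(path, root), target)  # cat/* -> 0, dog/* -> 1
```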

classmethod from_config(config: Dict[str, Any])

Instantiates ImagePathDataset from a config.

Parameters

config – A configuration for ImagePathDataset. See __init__() for parameters expected in the config.

Returns

An ImagePathDataset instance.

class classy_vision.dataset.Kinetics400Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, audio_channels: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, extensions: List[str], metadata_filepath: str)

Kinetics-400 is an action recognition video dataset with 400 classes (see the original publication).

We assume videos are already trimmed to 10-second clips and stored in a folder.

It is built on top of the Kinetics400 dataset class in TorchVision.

__init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, audio_channels: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, extensions: List[str], metadata_filepath: str)

The constructor of Kinetics400Dataset.

Parameters
  • split – dataset split which can be either “train” or “test”

  • batchsize_per_replica – batch size per model replica

  • shuffle – If true, shuffle the dataset

  • transform – a dict of transforms applied to the video and audio data

  • num_samples – if provided, it will subsample dataset

  • frames_per_clip – the number of frames in a video clip

  • video_width – rescaled video width. If 0, keep original width

  • video_height – rescaled video height. If 0, keep original height

  • video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note that only one of (video_width, video_height) and (video_min_dimension) can be set.

  • audio_samples – desired audio sample rate. If 0, keep original audio sample rate

  • audio_channels – desired number of audio channels. If 0, keep original audio channels

  • step_between_clips – Number of frames between each clip.

  • frame_rate – desired video frame rate. If None, keep original video frame rate.

  • clips_per_video – Number of clips to sample from each video

  • video_dir – path to video folder

  • extensions – A list of file extensions, such as “avi” and “mp4”. Only videos matching those file extensions are added to the dataset

  • metadata_filepath – path to the dataset meta data
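Extension-based filtering of video_dir can be sketched as a recursive walk. The recursion into sub-directories and the case-insensitive matching are assumptions made for illustration; find_videos is hypothetical and does not reproduce TorchVision's Kinetics400 file discovery exactly.

```python
import os
import tempfile
from typing import List


def find_videos(video_dir: str, extensions: List[str]) -> List[str]:
    """Hypothetical helper: collect files under video_dir with a listed extension."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    matches = []
    for dirpath, _dirnames, filenames in os.walk(video_dir):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if ext in wanted:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


with tempfile.TemporaryDirectory() as root:
    for rel in ["a/x.mp4", "a/y.AVI", "b/z.txt"]:
        os.makedirs(os.path.join(root, os.path.dirname(rel)), exist_ok=True)
        open(os.path.join(root, rel), "wb").close()
    found = find_videos(root, ["mp4", "avi"])
    print([os.path.relpath(p, root) for p in found])  # ['a/x.mp4', 'a/y.AVI'] on POSIX
```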

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_kinetics400.Kinetics400Dataset

Instantiates a Kinetics400Dataset from a configuration.

Parameters

config – A configuration for a Kinetics400Dataset. See __init__() for parameters expected in the config.

Returns

A Kinetics400Dataset instance.

class classy_vision.dataset.SyntheticImageDataset(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: int, crop_size: int, class_ratio: float, seed: int)

Classy Dataset which produces random synthetic images with binary targets.

The underlying dataset sets targets based on the channels in the image, so users can validate their setup by checking if they can get 100% accuracy on this dataset. Useful for testing since the dataset is much faster to initialize and fetch samples from, compared to real world datasets.
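The sanity-check idea can be illustrated with a toy generator whose target is recoverable from one channel, so a trivial "model" reaches 100% accuracy. The encoding rule here (shifting channel 0 up or down) and the reading of class_ratio as the fraction of target-1 samples are hypothetical; the text above does not specify how SyntheticImageDataset derives targets from channels.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # channels x flattened (crop_size * crop_size) pixels


def make_samples(num_samples: int, crop_size: int, class_ratio: float,
                 seed: int) -> List[Tuple[Image, int]]:
    """Hypothetical generator: the binary target is encoded in channel 0."""
    rng = random.Random(seed)  # same seed -> same samples
    samples = []
    for _ in range(num_samples):
        target = 1 if rng.random() < class_ratio else 0
        image = [[rng.random() for _ in range(crop_size * crop_size)] for _ in range(3)]
        # Shift channel 0 into [1, 2) for target 1 and into [-1, 0) for target 0.
        shift = 1.0 if target == 1 else -1.0
        image[0] = [value + shift for value in image[0]]
        samples.append((image, target))
    return samples


def predict(image: Image) -> int:
    """A trivial 'model' that only looks at the mean of channel 0."""
    return int(sum(image[0]) / len(image[0]) > 0.5)


samples = make_samples(num_samples=100, crop_size=4, class_ratio=0.5, seed=0)
accuracy = sum(predict(img) == t for img, t in samples) / len(samples)
print(accuracy)  # 1.0
```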

__init__(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: int, crop_size: int, class_ratio: float, seed: int) → None

Parameters
  • batchsize_per_replica – Positive integer indicating batch size for each replica

  • shuffle – Whether we should shuffle between epochs

  • transform – When specified, transform to be applied to each sample

  • num_samples – Number of samples to return

  • crop_size – Image size, used for both height and width

  • class_ratio – Ratio of the distribution of target classes

  • seed – Seed used for image generation. Use the same seed to generate the same set of samples.

  • split – When specified, split of dataset to use

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_synthetic_image.SyntheticImageDataset

Instantiates a SyntheticImageDataset from a configuration.

Parameters

config – A configuration for a SyntheticImageDataset. See __init__() for parameters expected in the config.

Returns

A SyntheticImageDataset instance.

class classy_vision.dataset.SyntheticImageStreamingDataset(batchsize_per_replica, shuffle, transform, num_samples, crop_size, class_ratio, seed, length=None)

Synthetic image dataset that behaves like a streaming dataset.

Requires a “num_samples” argument which decides the number of samples in the phase. Also takes an optional “length” input which sets the length of the dataset.

__init__(batchsize_per_replica, shuffle, transform, num_samples, crop_size, class_ratio, seed, length=None)

Constructor for a SyntheticImageStreamingDataset.

Parameters
  • batchsize_per_replica – Positive integer indicating batch size for each replica

  • shuffle – Whether to shuffle between epochs

  • transform – When set, transform to be applied to each sample

  • num_samples – When set, this restricts the number of samples provided by the dataset

classmethod from_config(config)

Instantiates a ClassyDataset from a configuration.

Parameters

config – A configuration for the ClassyDataset.

Returns

A ClassyDataset instance.

iterator(*args, **kwargs)

Returns an iterable which can be used to iterate over the data.

Parameters
  • shuffle_seed (int, optional) – Seed for the shuffle

  • current_phase_id (int, optional) – The epoch being fetched. Needed so that each epoch has a different shuffle order

Returns

An iterable over the data

class classy_vision.dataset.SyntheticVideoDataset(num_classes: int, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: int, frames_per_clip: int, video_width: int, video_height: int, audio_samples: int, clips_per_video: int)

Classy Dataset which produces random synthetic video clips.

Useful for testing since the dataset is much faster to initialize and fetch samples from, compared to real world datasets.

Note: Unlike SyntheticImageDataset, this dataset generates targets randomly, independent of the video clips.

__init__(num_classes: int, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: int, frames_per_clip: int, video_width: int, video_height: int, audio_samples: int, clips_per_video: int)

The constructor of SyntheticVideoDataset.

Parameters
  • num_classes – Number of classes in the generated targets.

  • split – Split of dataset to use

  • batchsize_per_replica – batch size per model replica

  • shuffle – Whether we should shuffle between epochs

  • transform – Transform to be applied to each sample

  • num_samples – Number of samples to return

  • frames_per_clip – Number of frames in a video clip

  • video_width – Width of the video clip

  • video_height – Height of the video clip

  • audio_samples – Audio sample rate

  • clips_per_video – Number of clips per video

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_synthetic_video.SyntheticVideoDataset

Instantiates a SyntheticVideoDataset from a configuration.

Parameters

config – A configuration for a SyntheticVideoDataset. See __init__() for parameters expected in the config.

Returns

A SyntheticVideoDataset instance.

property video_clips

Attribute video_clips.

It is used in the _get_sampler method. Its data type should be torchvision.datasets.video_utils.VideoClips.

class classy_vision.dataset.UCF101Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)

UCF101 is an action recognition video dataset, and it has 101 classes.

It is built on top of the UCF101 dataset class in TorchVision.

__init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)

The constructor of UCF101Dataset.

Parameters
  • split – dataset split which can be either “train” or “test”

  • batchsize_per_replica – batch size per model replica

  • shuffle – If true, shuffle the dataset

  • transform – a dict of transforms applied to the video and audio data

  • num_samples – if not None, it will subsample dataset

  • frames_per_clip – the number of frames in a video clip

  • video_width – rescaled video width. If 0, keep original width

  • video_height – rescaled video height. If 0, keep original height

  • video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note that only one of (video_width, video_height) and (video_min_dimension) can be set.

  • audio_samples – desired audio sample rate. If 0, keep original audio sample rate.

  • step_between_clips – Number of frames between each clip.

  • frame_rate – desired video frame rate. If None, keep original video frame rate.

  • clips_per_video – Number of clips to sample from each video

  • video_dir – path to video folder

  • splits_dir – path to dataset splitting file folder

  • fold – UCF101 dataset has 3 folds. Valid values are 1, 2 and 3.

  • metadata_filepath – path to the dataset meta data

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_ucf101.UCF101Dataset

Instantiates a UCF101Dataset from a configuration.

Parameters

config – A configuration for a UCF101Dataset. See __init__() for parameters expected in the config.

Returns

A UCF101Dataset instance.

classy_vision.dataset.build_dataset(config, *args, **kwargs)

Builds a ClassyDataset from a config.

This assumes a ‘name’ key in the config which is used to determine what dataset class to instantiate. For instance, a config {“name”: “my_dataset”, “folder”: “/data”} will find a class that was registered as “my_dataset” (see register_dataset()) and call .from_config on it.

classy_vision.dataset.register_dataset(name)

Registers a ClassyDataset subclass.

This decorator allows Classy Vision to instantiate a subclass of ClassyDataset from a configuration file, even if the class itself is not part of the Classy Vision framework. To use it, apply this decorator to a ClassyDataset subclass like this:

from classy_vision.dataset import ClassyDataset, register_dataset

@register_dataset("my_dataset")
class MyDataset(ClassyDataset):
    ...

To instantiate a dataset from a configuration file, see build_dataset().
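The register/build pair follows a common name-to-class registry pattern. The sketch below is a simplified, hypothetical stand-in (register_dataset_sketch, build_dataset_sketch and DATASET_REGISTRY are not Classy Vision names). It shows the lookup on config["name"] and the call to from_config described for build_dataset().

```python
from typing import Any, Callable, Dict, Type

DATASET_REGISTRY: Dict[str, Type[Any]] = {}


def register_dataset_sketch(name: str) -> Callable[[Type[Any]], Type[Any]]:
    """Simplified stand-in for register_dataset: map a name to a class."""
    def decorator(cls: Type[Any]) -> Type[Any]:
        if name in DATASET_REGISTRY:
            raise ValueError(f"Cannot register duplicate dataset ({name})")
        DATASET_REGISTRY[name] = cls
        return cls
    return decorator


def build_dataset_sketch(config: Dict[str, Any]) -> Any:
    """Simplified stand-in for build_dataset: look up config["name"], call from_config."""
    return DATASET_REGISTRY[config["name"]].from_config(config)


@register_dataset_sketch("my_dataset")
class MyDataset:
    def __init__(self, folder: str) -> None:
        self.folder = folder

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "MyDataset":
        return cls(folder=config["folder"])


dataset = build_dataset_sketch({"name": "my_dataset", "folder": "/data"})
print(type(dataset).__name__, dataset.folder)  # MyDataset /data
```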