Dataset¶

class classy_vision.dataset.CIFARDataset(split: Optional[str], batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int], root: str, download: Optional[bool] = None)¶

__init__(split: Optional[str], batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int], root: str, download: Optional[bool] = None)¶

Constructor for a ClassyDataset.

Parameters

batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether to shuffle between epochs
transform – When set, transform to be applied to each sample
num_samples – When set, this restricts the number of samples provided by the dataset

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_cifar.CIFARDataset¶

Instantiates a CIFARDataset from a configuration.

Parameters: config – A configuration for a CIFARDataset. See __init__() for parameters expected in the config.
Returns: A CIFARDataset instance.

class classy_vision.dataset.ClassyDataset(dataset: Sequence, batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int])¶

Class representing a dataset abstraction.

This class wraps a torch.utils.data.Dataset via the dataset attribute and configures the dataloaders needed to access the datasets. By default, this class will use DEFAULT_NUM_WORKERS processes to load the data (num_workers in torch.utils.data.DataLoader). Transforms which need to be applied to the data should be specified in this class. ClassyDataset can be instantiated from a configuration file as well.

__init__(dataset: Sequence, batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int]) → None¶

Constructor for a ClassyDataset.

Parameters

batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether to shuffle between epochs
transform – When set, transform to be applied to each sample
num_samples – When set, this restricts the number of samples provided by the dataset

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_dataset.ClassyDataset¶

Instantiates a ClassyDataset from a configuration.

Parameters: config – A configuration for the ClassyDataset.
Returns: A ClassyDataset instance.

get_batchsize_per_replica()¶

Get the batch size per replica.

Returns: The batch size for each replica.

get_global_batchsize()¶

Get the global batch size, combined over all the replicas.

Returns: The overall batch size of the dataset.
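
As a rough illustration of how the two batch sizes relate, the sketch below assumes the global batch size is the per-replica batch size multiplied by the number of replicas. The function name and the world_size parameter are hypothetical stand-ins and are not part of the Classy Vision API.

```python
def global_batchsize(batchsize_per_replica: int, world_size: int) -> int:
    """Combine the per-replica batch size over all replicas (illustrative only)."""
    if batchsize_per_replica <= 0:
        raise ValueError("batchsize_per_replica must be a positive integer")
    if world_size <= 0:
        raise ValueError("world_size must be a positive integer")
    return batchsize_per_replica * world_size


# Hypothetical setup: 8 replicas, each consuming 32 samples per step.
print(global_batchsize(32, 8))
```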

iterator(*args, **kwargs)¶

Returns an iterable which can be used to iterate over the data.

Parameters

shuffle_seed (int, optional) – Seed for the shuffle
current_phase_id (int, optional) – The epoch being fetched. Needed so that each epoch has a different shuffle order

Returns

An iterable over the data
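
The role of the two parameters can be pictured with a small standard-library sketch. It assumes the shuffle order is derived from the seed combined with the phase id, so the order is reproducible for a given epoch but differs across epochs. The helper epoch_order is hypothetical, and the library's actual sampler may combine the values differently.

```python
import random
from typing import List


def epoch_order(num_samples: int, shuffle_seed: int, current_phase_id: int) -> List[int]:
    """Return a reproducible sample order for one epoch (illustrative only)."""
    # Assumption: seed and phase id are simply added to seed the generator.
    rng = random.Random(shuffle_seed + current_phase_id)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


# Same seed and phase id give the same order; a new phase id gives a new one.
first_epoch = epoch_order(100, shuffle_seed=0, current_phase_id=0)
second_epoch = epoch_order(100, shuffle_seed=0, current_phase_id=1)
print(first_epoch[:5], second_epoch[:5])
```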

classmethod parse_config(config: Dict[str, Any])¶

This function parses out common config options.

Parameters

config – A dict with the following string keys:

batchsize_per_replica (int): Must be a positive int; the batch size for each replica
use_shuffle (bool): Whether to enable shuffling for the dataset
num_samples (int, optional): When set, restricts the number of samples in a dataset
transforms: List of transform configurations to be applied in order

Returns

A tuple containing the following variables:

transform_config: Config for the dataset transform. Can be passed to transforms.build_transform()
batchsize_per_replica: Batch size per replica
shuffle: Whether we should shuffle between epochs
num_samples: When set, restricts the number of samples in a dataset
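
To make the expected config shape concrete, here is a minimal stand-in that reads the documented keys and returns the documented tuple. It is a sketch, not the library's implementation: the function name parse_common_options, the validation messages, and the "ToTensor" transform entry are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

ParsedConfig = Tuple[Optional[List[Dict[str, Any]]], int, bool, Optional[int]]


def parse_common_options(config: Dict[str, Any]) -> ParsedConfig:
    """Extract the common dataset options from a config dict (illustrative only)."""
    batchsize_per_replica = config.get("batchsize_per_replica")
    if not isinstance(batchsize_per_replica, int) or batchsize_per_replica <= 0:
        raise ValueError("batchsize_per_replica must be a positive int")

    shuffle = config.get("use_shuffle")
    if not isinstance(shuffle, bool):
        raise ValueError("use_shuffle must be a bool")

    # num_samples is optional; None means "do not restrict the dataset".
    num_samples = config.get("num_samples")
    if num_samples is not None and (not isinstance(num_samples, int) or num_samples <= 0):
        raise ValueError("num_samples must be a positive int when set")

    transform_config = config.get("transforms")
    return transform_config, batchsize_per_replica, shuffle, num_samples


# Hypothetical config; the transform entry is a placeholder.
example_config = {
    "batchsize_per_replica": 32,
    "use_shuffle": True,
    "num_samples": 1000,
    "transforms": [{"name": "ToTensor"}],
}
print(parse_common_options(example_config))
```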

class classy_vision.dataset.ClassyVideoDataset(dataset: Any, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], clips_per_video: int)¶

Interface specifying what a ClassyVision video dataset is expected to provide.

This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips. It uses a clip sampler to sample a specified number of clips (clips_per_video) from each video. For the training set, a random clip sampler is used to sample a small number of clips (e.g. 1) from each video. For the testing set, a uniform clip sampler is used to evenly sample a large number of clips (e.g. 10) from each video.

To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video may be present.
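
The arithmetic in this example can be checked with a short sketch. The helper names num_clips and dataset_size are hypothetical; the formula assumes clips start every step_between_clips frames and that any clip with fewer than frames_per_clip frames is dropped, as described above.

```python
from typing import Sequence


def num_clips(num_frames: int, frames_per_clip: int, step_between_clips: int) -> int:
    """Count the full clips in one video; incomplete trailing clips are dropped."""
    if num_frames < frames_per_clip:
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1


def dataset_size(video_lengths: Sequence[int], frames_per_clip: int, step_between_clips: int) -> int:
    """Sum the clip counts over all videos."""
    return sum(num_clips(n, frames_per_clip, step_between_clips) for n in video_lengths)


# Two videos with 10 and 15 frames, as in the example above.
print(dataset_size([10, 15], frames_per_clip=5, step_between_clips=5))
```

A video with 12 frames under the same settings would still contribute only two clips, since frames 11 and 12 cannot fill a third clip.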

__init__(dataset: Any, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], clips_per_video: int)¶

The constructor method of ClassyVideoDataset.

Parameters

dataset – the underlying video dataset from either TorchVision or other source. It should have an attribute video_clips of type torchvision.datasets.video_utils.VideoClips
split – dataset split. Must be either “train” or “test”
batchsize_per_replica – batch size per model replica
shuffle – If true, shuffle video clips.
transform – callable function to transform video clip sample from ClassyVideoDataset
num_samples – If provided, return at most num_samples video clips
clips_per_video – The number of clips sampled from each video

iterator(*args, **kwargs)¶

Returns an iterable which can be used to iterate over the data.

Parameters

shuffle_seed (int, optional) – Seed for the shuffle
current_phase_id (int, optional) – The epoch being fetched. Needed so that each epoch has a different shuffle order

Returns

An iterable over the data

classmethod load_metadata(filepath: str, video_dir: Optional[str] = None, update_file_path: bool = False) → Dict[str, Any]¶

Load pre-computed video dataset meta data.

Computing video dataset meta data is time-consuming: it takes minutes on a small dataset and hours on a large one. However, it only needs to be computed once, and can be saved into a file via save_metadata().

The format of meta data is defined in TorchVision.

For each video, meta data contains the video file path, presentation timestamps of all video frames, and video fps.

Parameters

filepath – file path of pre-computed meta data
video_dir – If provided, the folder where video files are stored.
update_file_path – If true, replace the directory part of each video file path in the meta data with the actual video directory provided in video_dir. This is necessary for successfully reusing pre-computed meta data when the video directory has been moved and is no longer consistent with the full video file paths saved in the meta data.
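
The path rewrite that update_file_path describes might look roughly like the sketch below. It rests on two assumptions that the text above does not confirm: that the meta data stores paths under a "video_paths" key, and that videos are laid out as video_dir/class_name/file. The helper update_video_paths is hypothetical and uses posixpath so the result is the same on every platform.

```python
import posixpath
from typing import Any, Dict


def update_video_paths(metadata: Dict[str, Any], video_dir: str) -> Dict[str, Any]:
    """Point stale video paths at a new root directory (illustrative only)."""
    updated = dict(metadata)  # shallow copy so the input dict is left untouched
    new_paths = []
    for old_path in metadata["video_paths"]:
        # Assumed layout: <root>/<class_name>/<file name>
        class_name = posixpath.basename(posixpath.dirname(old_path))
        file_name = posixpath.basename(old_path)
        new_paths.append(posixpath.join(video_dir, class_name, file_name))
    updated["video_paths"] = new_paths
    return updated


# Hypothetical meta data whose videos were moved from /old/root to /new/videos.
stale = {"video_paths": ["/old/root/dance/a.mp4", "/old/root/run/b.mp4"]}
print(update_video_paths(stale, "/new/videos")["video_paths"])
```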

classmethod parse_config(config: Dict[str, Any])¶

Parse config to prepare arguments needed by the class constructor.

classmethod save_metadata(metadata: Dict[str, Any], filepath: str)¶

Save dataset meta data into a file.

Parameters

metadata – dataset meta data, which contains video meta information such as the video file path, the video fps, and the timestamps of the frames in each video. For the format of dataset meta data, check the TorchVision documentation.
filepath – file path where the meta data will be saved

property video_clips¶

Attribute video_clips.

It is used in _get_sampler method. Its data type should be: torchvision.datasets.video_utils.VideoClips.

class classy_vision.dataset.DataloaderAsyncGPUWrapper(dataloader: Iterable)¶

Dataloader which wraps another dataloader, and moves the data to GPU asynchronously. At most one batch is pre-emptively copied (per worker).

Credits: @vini, NVIDIA Apex

__init__(dataloader: Iterable) → None¶

class classy_vision.dataset.DataloaderLimitWrapper(dataloader: Iterable, limit: int, wrap_around: bool = True)¶

Dataloader which wraps another dataloader and only returns a limited number of items.

This is useful for Iterable datasets where the length of the datasets isn’t known. Such datasets can wrap their returned iterators with this class. See SyntheticImageStreamingDataset.iterator() for an example.

Attribute accesses are passed to the wrapped dataloader.

__init__(dataloader: Iterable, limit: int, wrap_around: bool = True) → None¶

Constructor for DataloaderLimitWrapper.

Parameters

dataloader – The dataloader to wrap around
limit – Specify the number of calls to the underlying dataloader. The wrapper will raise a StopIteration after limit calls.
wrap_around – Whether to wrap around the original dataloader if the dataloader is exhausted before limit calls.

Raises

RuntimeError – If wrap_around is set to False and the underlying dataloader is exhausted before limit calls.
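
The documented behavior of limit and wrap_around can be mimicked with a plain generator. This is a standard-library sketch of the semantics described above, not the wrapper class itself; the function name limited is hypothetical, and it assumes a non-empty dataloader that can be iterated more than once.

```python
from typing import Any, Iterable, Iterator


def limited(dataloader: Iterable[Any], limit: int, wrap_around: bool = True) -> Iterator[Any]:
    """Yield exactly `limit` items, restarting the dataloader if allowed."""
    iterator = iter(dataloader)
    for _ in range(limit):
        try:
            item = next(iterator)
        except StopIteration:
            if not wrap_around:
                raise RuntimeError("dataloader exhausted before reaching the limit") from None
            # Start over from the beginning (assumes the dataloader is non-empty).
            iterator = iter(dataloader)
            item = next(iterator)
        yield item


# A three-item "dataloader" limited to five items wraps around once.
print(list(limited([1, 2, 3], limit=5)))
```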

class classy_vision.dataset.DataloaderSkipNoneWrapper(dataloader: Iterable)¶

Dataloader which wraps another dataloader and skips batches whose data is None.

Attribute accesses are passed to the wrapped dataloader.

__init__(dataloader: Iterable) → None¶
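
The filtering idea can be shown with a few lines of plain Python. The generator skip_none below is a hypothetical stand-in for the wrapper and only demonstrates that None batches are dropped while all other batches pass through in order.

```python
from typing import Any, Iterable, Iterator


def skip_none(dataloader: Iterable[Any]) -> Iterator[Any]:
    """Yield every batch from the dataloader except those that are None."""
    for batch in dataloader:
        if batch is not None:
            yield batch


# Hypothetical batches; the second one failed to load and came back as None.
batches = [{"input": 1}, None, {"input": 2}]
print(list(skip_none(batches)))
```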

class classy_vision.dataset.DataloaderWrapper(dataloader: Iterable)¶

Abstract class representing a dataloader which wraps another dataloader.

Attribute accesses are passed to the wrapped dataloader.

__init__(dataloader: Iterable) → None¶

class classy_vision.dataset.HMDB51Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)¶

HMDB51 is an action recognition video dataset, and it has 51 classes.

It is built on top of HMDB51 dataset class in TorchVision.

__init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)¶

The constructor of HMDB51Dataset.

Parameters

split – dataset split which can be either “train” or “test”
batchsize_per_replica – batch size per model replica
shuffle – If true, shuffle the dataset
transform – transform applied to the video and audio data
num_samples – if not None, subsample the dataset
frames_per_clip – the number of frames in a video clip
video_width – rescaled video width. If 0, keep original width
video_height – rescaled video height. If 0, keep original height
video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note only one of (video_width, video_height) and (video_min_dimension) can be set
audio_samples – desired audio sample rate. If 0, keep original audio sample rate.
step_between_clips – Number of frames between each clip.
frame_rate – desired video frame rate. If None, keep original video frame rate.
clips_per_video – Number of clips to sample from each video
video_dir – path to video folder
splits_dir – path to dataset splitting file folder
fold – HMDB51 dataset has 3 folds. Valid values are 1, 2 and 3.
metadata_filepath – path to the dataset meta data
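
The interaction between video_width, video_height and video_min_dimension can be sketched as follows. The helper target_resolution is hypothetical, and the rounding and error handling are assumptions; the library and TorchVision may resolve the output size differently. The sketch only encodes what the parameter descriptions state: 0 keeps the original value, and the two rescaling modes are mutually exclusive.

```python
from typing import Tuple


def target_resolution(
    orig_width: int,
    orig_height: int,
    video_width: int = 0,
    video_height: int = 0,
    video_min_dimension: int = 0,
) -> Tuple[int, int]:
    """Return the (width, height) a video would be rescaled to (illustrative only)."""
    if video_min_dimension and (video_width or video_height):
        raise ValueError("set either (video_width, video_height) or video_min_dimension, not both")
    if video_min_dimension:
        # Scale so that the shorter side equals video_min_dimension.
        scale = video_min_dimension / min(orig_width, orig_height)
        return round(orig_width * scale), round(orig_height * scale)
    # A value of 0 keeps the original width or height.
    return video_width or orig_width, video_height or orig_height


# A 640x360 video rescaled so that its shorter side is 180 pixels.
print(target_resolution(640, 360, video_min_dimension=180))
```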

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_hmdb51.HMDB51Dataset¶

Instantiates a HMDB51Dataset from a configuration.

Parameters: config – A configuration for a HMDB51Dataset. See __init__() for parameters expected in the config.
Returns: A HMDB51Dataset instance.

class classy_vision.dataset.ImagePathDataset(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable] = None, num_samples: Optional[int] = None, image_folder: Optional[str] = None, image_files: Optional[List[str]] = None)¶

Dataset which reads images from a local filesystem. Implements ClassyDataset.

__init__(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable] = None, num_samples: Optional[int] = None, image_folder: Optional[str] = None, image_files: Optional[List[str]] = None)¶

Constructor for ImagePathDataset.

Only one of image_folder or image_files should be passed to specify the images.

Parameters

batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether we should shuffle between epochs
transform – Transform to be applied to each sample
num_samples – When set, this restricts the number of samples provided by the dataset
image_folder – A directory with one of the following structures:
- A directory containing sub-directories with images for each target, which is the format expected by torchvision.datasets.ImageFolder:

  dog/xxx.png
  dog/xxy.png
  cat/123.png
  cat/nsdf3.png

  In this case, the targets are inferred from the sub-directories.
- A directory containing images:

  123.png
  xyz.png

  In this case, the targets are not returned (useful for inference).
image_files – A list of image files:

  ["123.png", "dog/xyz.png", "/home/cat/aaa.png"]

  In this case, the targets are not returned (useful for inference).
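
To show what "targets are inferred from the sub-directories" means for the first image_folder layout, the sketch below rebuilds that layout in a temporary directory and assigns one integer target per sub-directory in sorted name order, which is how torchvision.datasets.ImageFolder indexes classes. The helper infer_targets is hypothetical and uses only the standard library; it does not load any images.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def infer_targets(image_folder: str) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Map each sub-directory to an integer target and list (file, target) pairs."""
    root = Path(image_folder)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = [
        (f"{name}/{image.name}", class_to_idx[name])
        for name in classes
        for image in sorted((root / name).iterdir())
        if image.is_file()
    ]
    return class_to_idx, samples


# Recreate the example layout with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["dog/xxx.png", "dog/xxy.png", "cat/123.png", "cat/nsdf3.png"]:
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    class_to_idx, samples = infer_targets(tmp)

print(class_to_idx)
print(samples)
```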

classmethod from_config(config: Dict[str, Any])¶

Instantiates ImagePathDataset from a config.

Parameters: config – A configuration for ImagePathDataset. See __init__() for parameters expected in the config.
Returns: An ImagePathDataset instance.

class classy_vision.dataset.Kinetics400Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, audio_channels: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, extensions: List[str], metadata_filepath: str)¶

Kinetics-400 is an action recognition video dataset, and it has 400 classes.

We assume videos are already trimmed to 10-second clips, and are stored in a folder.

It is built on top of Kinetics dataset class in TorchVision.

__init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, audio_channels: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, extensions: List[str], metadata_filepath: str)¶

The constructor of Kinetics400Dataset.

Parameters

split – dataset split which can be either “train” or “test”
batchsize_per_replica – batch size per model replica
shuffle – If true, shuffle the dataset
transform – transform applied to the video and audio data
num_samples – if provided, subsample the dataset
frames_per_clip – the number of frames in a video clip
video_width – rescaled video width. If 0, keep original width
video_height – rescaled video height. If 0, keep original height
video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note only one of (video_width, video_height) and (video_min_dimension) can be set
audio_samples – desired audio sample rate. If 0, keep original audio sample rate
audio_channels – desired number of audio channels. If 0, keep original audio channels
step_between_clips – Number of frames between each clip.
frame_rate – desired video frame rate. If None, keep original video frame rate.
clips_per_video – Number of clips to sample from each video
video_dir – path to video folder
extensions – A list of file extensions, such as “avi” and “mp4”. Only videos matching those file extensions are added to the dataset
metadata_filepath – path to the dataset meta data

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_kinetics400.Kinetics400Dataset¶

Instantiates a Kinetics400Dataset from a configuration.

Parameters: config – A configuration for a Kinetics400Dataset. See __init__() for parameters expected in the config.
Returns: A Kinetics400Dataset instance.

class classy_vision.dataset.SyntheticImageDataset(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: int, crop_size: int, class_ratio: float, seed: int)¶

Classy Dataset which produces random synthetic images with binary targets.

The underlying dataset sets targets based on the channels in the image, so users can validate their setup by checking if they can get 100% accuracy on this dataset. Useful for testing since the dataset is much faster to initialize and fetch samples from, compared to real world datasets.

__init__(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: int, crop_size: int, class_ratio: float, seed: int) → None¶

Parameters

batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether we should shuffle between epochs
transform – When specified, transform to be applied to each sample
num_samples – Number of samples to return
crop_size – Image size, used for both height and width
class_ratio – Ratio of the distribution of target classes
seed – Seed used for image generation. Use the same seed to generate the same set of samples.
split – When specified, split of dataset to use

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_synthetic_image.SyntheticImageDataset¶

Instantiates a SyntheticImageDataset from a configuration.

Parameters: config – A configuration for a SyntheticImageDataset. See __init__() for parameters expected in the config.
Returns: A SyntheticImageDataset instance.

class classy_vision.dataset.SyntheticImageStreamingDataset(batchsize_per_replica, shuffle, transform, num_samples, crop_size, class_ratio, seed, length=None, async_gpu_copy: bool = False)¶

Synthetic image dataset that behaves like a streaming dataset.

Requires a “num_samples” argument which decides the number of samples in the phase. Also takes an optional “length” input which sets the length of the dataset.

__init__(batchsize_per_replica, shuffle, transform, num_samples, crop_size, class_ratio, seed, length=None, async_gpu_copy: bool = False)¶

Constructor for a ClassyDataset.

Parameters

batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether to shuffle between epochs
transform – When set, transform to be applied to each sample
num_samples – When set, this restricts the number of samples provided by the dataset

classmethod from_config(config)¶

Instantiates a ClassyDataset from a configuration.

Parameters: config – A configuration for the ClassyDataset.
Returns: A ClassyDataset instance.

iterator(*args, **kwargs)¶

Returns an iterable which can be used to iterate over the data.

Parameters

shuffle_seed (int, optional) – Seed for the shuffle
current_phase_id (int, optional) – The epoch being fetched. Needed so that each epoch has a different shuffle order

Returns

An iterable over the data

class classy_vision.dataset.SyntheticVideoDataset(num_classes: int, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: int, frames_per_clip: int, video_width: int, video_height: int, audio_samples: int, clips_per_video: int)¶

Classy Dataset which produces random synthetic video clips.

Useful for testing since the dataset is much faster to initialize and fetch samples from, compared to real world datasets.

Note: Unlike SyntheticImageDataset, this dataset generates targets randomly, independent of the video clips.

__init__(num_classes: int, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: int, frames_per_clip: int, video_width: int, video_height: int, audio_samples: int, clips_per_video: int)¶

The constructor of SyntheticVideoDataset.

Parameters

num_classes – Number of classes in the generated targets.
split – Split of dataset to use
batchsize_per_replica – batch size per model replica
shuffle – Whether we should shuffle between epochs
transform – Transform to be applied to each sample
num_samples – Number of samples to return
frames_per_clip – Number of frames in a video clip
video_width – Width of the video clip
video_height – Height of the video clip
audio_samples – Audio sample rate
clips_per_video – Number of clips per video

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_synthetic_video.SyntheticVideoDataset¶

Instantiates a SyntheticVideoDataset from a configuration.

Parameters: config – A configuration for a SyntheticVideoDataset. See __init__() for parameters expected in the config.
Returns: A SyntheticVideoDataset instance.

property video_clips¶

Attribute video_clips.

It is used in _get_sampler method. Its data type should be: torchvision.datasets.video_utils.VideoClips.

class classy_vision.dataset.UCF101Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)¶

UCF101 is an action recognition video dataset, and it has 101 classes.

It is built on top of UCF101 dataset class in TorchVision.

__init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)¶

The constructor of UCF101Dataset.

Parameters

split – dataset split which can be either “train” or “test”
batchsize_per_replica – batch size per model replica
shuffle – If true, shuffle the dataset
transform – transform applied to the video and audio data
num_samples – if not None, subsample the dataset
frames_per_clip – the number of frames in a video clip
video_width – rescaled video width. If 0, keep original width
video_height – rescaled video height. If 0, keep original height
video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note only one of (video_width, video_height) and (video_min_dimension) can be set
audio_samples – desired audio sample rate. If 0, keep original audio sample rate.
step_between_clips – Number of frames between each clip.
frame_rate – desired video frame rate. If None, keep original video frame rate.
clips_per_video – Number of clips to sample from each video
video_dir – path to video folder
splits_dir – path to dataset splitting file folder
fold – UCF101 dataset has 3 folds. Valid values are 1, 2 and 3.
metadata_filepath – path to the dataset meta data

classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_ucf101.UCF101Dataset¶

Instantiates a UCF101Dataset from a configuration.

Parameters: config – A configuration for a UCF101Dataset. See __init__() for parameters expected in the config.
Returns: A UCF101Dataset instance.

classy_vision.dataset.build_dataset(config, *args, **kwargs)¶

Builds a ClassyDataset from a config.

This assumes a ‘name’ key in the config which is used to determine what dataset class to instantiate. For instance, a config {“name”: “my_dataset”, “folder”: “/data”} will find a class that was registered as “my_dataset” (see register_dataset()) and call .from_config on it.

classy_vision.dataset.register_dataset(name, bypass_checks=False)¶

Registers a ClassyDataset subclass.

This decorator allows Classy Vision to instantiate a subclass of ClassyDataset from a configuration file, even if the class itself is not part of the Classy Vision framework. To use it, apply this decorator to a ClassyDataset subclass like this:

@register_dataset("my_dataset")
class MyDataset(ClassyDataset):
    ...

To instantiate a dataset from a configuration file, see build_dataset().
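
The registration and lookup flow described for register_dataset() and build_dataset() follows a common registry pattern, which the self-contained sketch below imitates. The names TOY_REGISTRY, register_toy_dataset, build_toy_dataset and ToyDataset are hypothetical and deliberately differ from the library's, since this is not the Classy Vision implementation and omits details such as bypass_checks.

```python
from typing import Any, Callable, Dict

# Maps a registered name to the dataset class (illustrative registry).
TOY_REGISTRY: Dict[str, type] = {}


def register_toy_dataset(name: str) -> Callable[[type], type]:
    """Decorator that records a class under `name` so it can be built from a config."""
    def wrapper(cls: type) -> type:
        if name in TOY_REGISTRY:
            raise ValueError(f"dataset already registered: {name}")
        TOY_REGISTRY[name] = cls
        return cls
    return wrapper


def build_toy_dataset(config: Dict[str, Any]) -> Any:
    """Look up the class named by config['name'] and call its from_config."""
    return TOY_REGISTRY[config["name"]].from_config(config)


@register_toy_dataset("my_dataset")
class ToyDataset:
    def __init__(self, folder: str) -> None:
        self.folder = folder

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "ToyDataset":
        return cls(folder=config["folder"])


dataset = build_toy_dataset({"name": "my_dataset", "folder": "/data"})
print(type(dataset).__name__, dataset.folder)
```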
