Dataset¶
- class classy_vision.dataset.CIFARDataset(split: Optional[str], batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int], root: str, download: Optional[bool] = None)¶
- __init__(split: Optional[str], batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int], root: str, download: Optional[bool] = None)¶
Constructor for a ClassyDataset.
- Parameters
batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether to shuffle between epochs
transform – When set, transform to be applied to each sample
num_samples – When set, this restricts the number of samples provided by the dataset
- classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_cifar.CIFARDataset¶
Instantiates a CIFARDataset from a configuration.
- Parameters
config – A configuration for a CIFARDataset. See __init__() for parameters expected in the config.
- Returns
A CIFARDataset instance.
- class classy_vision.dataset.ClassyDataset(dataset: Sequence, batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int])¶
Class representing a dataset abstraction.
This class wraps a torch.utils.data.Dataset via the dataset attribute and configures the dataloaders needed to access the datasets. By default, this class will use DEFAULT_NUM_WORKERS processes to load the data (num_workers in torch.utils.data.DataLoader). Transforms which need to be applied to the data should be specified in this class. ClassyDataset can be instantiated from a configuration file as well.
- __init__(dataset: Sequence, batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: Optional[int]) → None¶
Constructor for a ClassyDataset.
- Parameters
batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether to shuffle between epochs
transform – When set, transform to be applied to each sample
num_samples – When set, this restricts the number of samples provided by the dataset
- classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_dataset.ClassyDataset¶
Instantiates a ClassyDataset from a configuration.
- Parameters
config – A configuration for the ClassyDataset.
- Returns
A ClassyDataset instance.
- get_batchsize_per_replica()¶
Get the batch size per replica.
- Returns
The batch size for each replica.
- get_global_batchsize()¶
Get the global batch size, combined over all the replicas.
- Returns
The overall batch size of the dataset.
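The relationship between the two batch sizes can be sketched in plain Python (a minimal illustration; global_batchsize here is a hypothetical helper, not part of the Classy Vision API):

```python
# Sketch of the per-replica vs. global batch size relationship.
# In distributed data-parallel training, each replica (process/GPU)
# loads batchsize_per_replica samples per step, so the effective
# global batch size is the per-replica size times the replica count.

def global_batchsize(batchsize_per_replica: int, num_replicas: int) -> int:
    """Combined batch size over all replicas."""
    return batchsize_per_replica * num_replicas

# Example: 32 samples per replica across 8 replicas.
print(global_batchsize(32, 8))  # -> 256
```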
- iterator(*args, **kwargs)¶
Returns an iterable which can be used to iterate over the data.
- classmethod parse_config(config: Dict[str, Any])¶
This function parses out common config options.
- Parameters
config –
A dict with the following string keys:
batchsize_per_replica (int): Must be a positive int, batch size for each replica
use_shuffle (bool): Whether to enable shuffling for the dataset
num_samples (int, optional): When set, restricts the number of samples in a dataset
transforms: List of transform configurations to be applied in order
- Returns
- A tuple containing the following variables:
transform_config: Config for the dataset transform. Can be passed to the transform builder
batchsize_per_replica: Batch size per replica
shuffle: Whether we should shuffle between epochs
num_samples: When set, restricts the number of samples in a dataset
- class classy_vision.dataset.ClassyVideoDataset(dataset: Any, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], clips_per_video: int)¶
Interface specifying what a ClassyVision video dataset is expected to provide.
This dataset considers every video as a collection of video clips of fixed size, specified by frames_per_clip, where the step in frames between each clip is given by step_between_clips. It uses a clip sampler to sample a specified number of clips (clips_per_video) from each video. For the training set, a random clip sampler is used to sample a small number of clips (e.g. 1) from each video. For the testing set, a uniform clip sampler is used to evenly sample a large number of clips (e.g. 10) from each video.
To give an example, for 2 videos with 10 and 15 frames respectively, if frames_per_clip=5 and step_between_clips=5, the dataset size will be (2 + 3) = 5, where the first two elements will come from video 1, and the next three elements from video 2. Note that we drop clips which do not have exactly frames_per_clip elements, so not all frames in a video may be present.
- __init__(dataset: Any, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], clips_per_video: int)¶
The constructor method of ClassyVideoDataset.
- Parameters
dataset – the underlying video dataset from either TorchVision or other source. It should have an attribute video_clips of type torchvision.datasets.video_utils.VideoClips
split – dataset split. Must be either “train” or “test”
batchsize_per_replica – batch size per model replica
shuffle – If true, shuffle video clips.
transform – callable function to transform video clip sample from ClassyVideoDataset
num_samples – If provided, return at most num_samples video clips
clips_per_video – The number of clips sampled from each video
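The clip-count arithmetic described above (dropping partial clips with fewer than frames_per_clip frames) can be sketched in plain Python; num_clips is a hypothetical helper for illustration, not part of the Classy Vision API:

```python
# Sketch: how many fixed-size clips a video yields when clips with
# fewer than frames_per_clip frames are dropped.

def num_clips(num_frames: int, frames_per_clip: int, step_between_clips: int) -> int:
    """Number of complete clips extractable from a single video."""
    if num_frames < frames_per_clip:
        return 0
    return (num_frames - frames_per_clip) // step_between_clips + 1

# The documentation's example: videos of 10 and 15 frames,
# frames_per_clip=5, step_between_clips=5 -> 2 + 3 = 5 clips total.
sizes = [num_clips(n, 5, 5) for n in (10, 15)]
print(sizes, sum(sizes))  # -> [2, 3] 5
```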
- iterator(*args, **kwargs)¶
Returns an iterable which can be used to iterate over the data.
- classmethod load_metadata(filepath: str, video_dir: Optional[str] = None, update_file_path: bool = False) → Dict[str, Any]¶
Load pre-computed video dataset meta data.
Video dataset meta data computation takes minutes on a small dataset and hours on a large dataset, and thus is time-consuming. However, it only needs to be computed once, and can be saved into a file via save_metadata().
The format of the meta data is defined in TorchVision.
For each video, meta data contains the video file path, presentation timestamps of all video frames, and video fps.
- Parameters
filepath – file path of pre-computed meta data
video_dir – If provided, the folder where video files are stored.
update_file_path – If true, replace the directory part of the video file path in the meta data with the actual video directory provided in video_dir. This is necessary for successfully reusing pre-computed meta data when the video directory has been moved and is no longer consistent with the full video file path saved in the meta data.
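The path-rewriting behavior behind update_file_path can be sketched as follows (a minimal illustration; rewrite_video_paths is a hypothetical helper, not the actual implementation):

```python
import os

# Sketch: replace the directory part of each stored video path with a
# new video directory, keeping only the original file name. This mirrors
# what update_file_path needs to do when videos have been moved since
# the meta data was computed.

def rewrite_video_paths(video_paths, video_dir):
    return [os.path.join(video_dir, os.path.basename(p)) for p in video_paths]

old = ["/old/location/clip_001.mp4", "/old/location/clip_002.mp4"]
print(rewrite_video_paths(old, "/new/videos"))
# -> ['/new/videos/clip_001.mp4', '/new/videos/clip_002.mp4']
```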
- classmethod parse_config(config: Dict[str, Any])¶
Parse config to prepare arguments needed by the class constructor.
- classmethod save_metadata(metadata: Dict[str, Any], filepath: str)¶
Save dataset meta data into a file.
- Parameters
metadata – dataset meta data, which contains video meta information, such as video file path, video fps, and video frame timestamps in each video. For the format of dataset meta data, check the TorchVision documentation.
filepath – file path where the meta data will be saved
- property video_clips¶
Attribute video_clips. It is used in the _get_sampler method. Its data type should be torchvision.datasets.video_utils.VideoClips.
- class classy_vision.dataset.DataloaderAsyncGPUWrapper(dataloader: Iterable)¶
Dataloader which wraps another dataloader, and moves the data to GPU asynchronously. At most one batch is pre-emptively copied (per worker).
Credits: @vini, NVIDIA Apex
- class classy_vision.dataset.DataloaderLimitWrapper(dataloader: Iterable, limit: int, wrap_around: bool = True)¶
Dataloader which wraps another dataloader and only returns a limited number of items.
This is useful for Iterable datasets where the length of the datasets isn’t known. Such datasets can wrap their returned iterators with this class. See
SyntheticImageStreamingDataset.iterator()
for an example.Attribute accesses are passed to the wrapped dataloader.
- __init__(dataloader: Iterable, limit: int, wrap_around: bool = True) → None¶
Constructor for DataloaderLimitWrapper.
- Parameters
dataloader – The dataloader to wrap around
limit – Specify the number of calls to the underlying dataloader. The wrapper will raise a StopIteration after limit calls.
wrap_around – Whether to wrap around the original dataloader if the dataloader is exhausted before limit calls.
- Raises
RuntimeError – If wrap_around is set to False and the underlying dataloader is exhausted before limit calls.
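The limit and wrap-around behavior can be sketched with a minimal pure-Python wrapper (an illustration of the idea, not the actual Classy Vision implementation):

```python
# Minimal sketch of a limit wrapper: yields exactly `limit` items,
# restarting the underlying iterable when wrap_around is True, and
# raising RuntimeError when it is exhausted early with wrap_around=False.

class LimitWrapper:
    def __init__(self, iterable, limit, wrap_around=True):
        self.iterable = iterable
        self.limit = limit
        self.wrap_around = wrap_around

    def __iter__(self):
        it = iter(self.iterable)
        for count in range(self.limit):
            try:
                yield next(it)
            except StopIteration:
                if not self.wrap_around:
                    raise RuntimeError(
                        f"Underlying iterable exhausted after {count} of "
                        f"{self.limit} requested items"
                    )
                it = iter(self.iterable)  # restart from the beginning
                yield next(it)

print(list(LimitWrapper([1, 2, 3], 5)))  # -> [1, 2, 3, 1, 2]
```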
- class classy_vision.dataset.DataloaderSkipNoneWrapper(dataloader: Iterable)¶
Dataloader which wraps another dataloader and skips None batches.
Attribute accesses are passed to the wrapped dataloader.
- class classy_vision.dataset.DataloaderWrapper(dataloader: Iterable)¶
Abstract class representing dataloader which wraps another dataloader.
Attribute accesses are passed to the wrapped dataloader.
- class classy_vision.dataset.HMDB51Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)¶
HMDB51 is an action recognition video dataset, and it has 51 classes.
It is built on top of HMDB51 dataset class in TorchVision.
- __init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)¶
The constructor of HMDB51Dataset.
- Parameters
split – dataset split which can be either “train” or “test”
batchsize_per_replica – batch size per model replica
shuffle – If true, shuffle the dataset
transform – a dict of transforms applied to the video and audio data
num_samples – if not None, it will subsample dataset
frames_per_clip – the number of frames in a video clip
video_width – rescaled video width. If 0, keep original width
video_height – rescaled video height. If 0, keep original height
video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note only one of (video_width, video_height) and (video_min_dimension) can be set
audio_samples – desired audio sample rate. If 0, keep original audio sample rate.
step_between_clips – Number of frames between each clip.
frame_rate – desired video frame rate. If None, keep original video frame rate.
clips_per_video – Number of clips to sample from each video
video_dir – path to video folder
splits_dir – path to dataset splitting file folder
fold – HMDB51 dataset has 3 folds. Valid values are 1, 2 and 3.
metadata_filepath – path to the dataset meta data
- classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_hmdb51.HMDB51Dataset¶
Instantiates a HMDB51Dataset from a configuration.
- Parameters
config – A configuration for a HMDB51Dataset. See __init__() for parameters expected in the config.
- Returns
A HMDB51Dataset instance.
- class classy_vision.dataset.ImagePathDataset(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable] = None, num_samples: Optional[int] = None, image_folder: Optional[str] = None, image_files: Optional[List[str]] = None)¶
Dataset which reads images from a local filesystem. Implements ClassyDataset.
- __init__(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable] = None, num_samples: Optional[int] = None, image_folder: Optional[str] = None, image_files: Optional[List[str]] = None)¶
Constructor for ImagePathDataset.
Only one of image_folder or image_files should be passed to specify the images.
- Parameters
batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether we should shuffle between epochs
transform – Transform to be applied to each sample
num_samples – When set, this restricts the number of samples provided by the dataset
image_folder – A directory with one of the following structures:
- A directory containing sub-directories with images for each target, which is the format expected by torchvision.datasets.ImageFolder:
dog/xxx.png dog/xxy.png cat/123.png cat/nsdf3.png
In this case, the targets are inferred from the sub-directories.
- A directory containing images:
123.png xyz.png
In this case, the targets are not returned (useful for inference).
image_files – A list of image files:
["123.png", "dog/xyz.png", "/home/cat/aaa.png"]
In this case, the targets are not returned (useful for inference).
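Target inference from an ImageFolder-style layout can be sketched in plain Python (a simplified illustration; infer_samples is a hypothetical helper, not the Classy Vision implementation):

```python
import os

# Sketch: walk an ImageFolder-style directory, treating each
# sub-directory name as a class and assigning integer targets in
# sorted order, the convention torchvision.datasets.ImageFolder uses.

def infer_samples(image_folder):
    classes = sorted(
        d for d in os.listdir(image_folder)
        if os.path.isdir(os.path.join(image_folder, d))
    )
    class_to_idx = {c: i for i, c in enumerate(classes)}
    samples = []
    for c in classes:
        for fname in sorted(os.listdir(os.path.join(image_folder, c))):
            samples.append((os.path.join(image_folder, c, fname), class_to_idx[c]))
    return samples, class_to_idx
```

For the layout above, this would map cat to target 0 and dog to target 1, pairing each image path with its class index.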
- classmethod from_config(config: Dict[str, Any])¶
Instantiates ImagePathDataset from a config.
- Parameters
config – A configuration for ImagePathDataset. See __init__() for parameters expected in the config.
- Returns
An ImagePathDataset instance.
- class classy_vision.dataset.Kinetics400Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, audio_channels: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, extensions: List[str], metadata_filepath: str)¶
Kinetics-400 is an action recognition video dataset, and it has 400 classes. Original publication
We assume videos are already trimmed to 10-second clips, and are stored in a folder.
It is built on top of Kinetics dataset class in TorchVision.
- __init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, audio_channels: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, extensions: List[str], metadata_filepath: str)¶
The constructor of Kinetics400Dataset.
- Parameters
split – dataset split which can be either “train” or “test”
batchsize_per_replica – batch size per model replica
shuffle – If true, shuffle the dataset
transform – a dict of transforms applied to the video and audio data
num_samples – if provided, it will subsample the dataset
frames_per_clip – the number of frames in a video clip
video_width – rescaled video width. If 0, keep original width
video_height – rescaled video height. If 0, keep original height
video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note only one of (video_width, video_height) and (video_min_dimension) can be set
audio_samples – desired audio sample rate. If 0, keep original audio sample rate
audio_channels – desired number of audio channels. If 0, keep original audio channels
step_between_clips – Number of frames between each clip.
frame_rate – desired video frame rate. If None, keep original video frame rate.
clips_per_video – Number of clips to sample from each video
video_dir – path to video folder
extensions – A list of file extensions, such as “avi” and “mp4”. Only videos matching these file extensions are added to the dataset
metadata_filepath – path to the dataset meta data
- classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_kinetics400.Kinetics400Dataset¶
Instantiates a Kinetics400Dataset from a configuration.
- Parameters
config – A configuration for a Kinetics400Dataset. See __init__() for parameters expected in the config.
- Returns
A Kinetics400Dataset instance.
- class classy_vision.dataset.SyntheticImageDataset(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: int, crop_size: int, class_ratio: float, seed: int)¶
Classy Dataset which produces random synthetic images with binary targets.
The underlying dataset sets targets based on the channels in the image, so users can validate their setup by checking if they can get 100% accuracy on this dataset. Useful for testing since the dataset is much faster to initialize and fetch samples from, compared to real world datasets.
- __init__(batchsize_per_replica: int, shuffle: bool, transform: Optional[Callable], num_samples: int, crop_size: int, class_ratio: float, seed: int) → None¶
- Parameters
batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether we should shuffle between epochs
transform – When specified, transform to be applied to each sample
num_samples – Number of samples to return
crop_size – Image size, used for both height and width
class_ratio – Ratio of the distribution of target classes
seed – Seed used for image generation. Use the same seed to generate the same set of samples.
split – When specified, split of dataset to use
- classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_synthetic_image.SyntheticImageDataset¶
Instantiates a SyntheticImageDataset from a configuration.
- Parameters
config – A configuration for a SyntheticImageDataset. See __init__() for parameters expected in the config.
- Returns
A SyntheticImageDataset instance.
- class classy_vision.dataset.SyntheticImageStreamingDataset(batchsize_per_replica, shuffle, transform, num_samples, crop_size, class_ratio, seed, length=None, async_gpu_copy: bool = False)¶
Synthetic image dataset that behaves like a streaming dataset.
Requires a “num_samples” argument which decides the number of samples in the phase. Also takes an optional “length” input which sets the length of the dataset.
- __init__(batchsize_per_replica, shuffle, transform, num_samples, crop_size, class_ratio, seed, length=None, async_gpu_copy: bool = False)¶
Constructor for a ClassyDataset.
- Parameters
batchsize_per_replica – Positive integer indicating batch size for each replica
shuffle – Whether to shuffle between epochs
transform – When set, transform to be applied to each sample
num_samples – When set, this restricts the number of samples provided by the dataset
- classmethod from_config(config)¶
Instantiates a ClassyDataset from a configuration.
- Parameters
config – A configuration for the ClassyDataset.
- Returns
A ClassyDataset instance.
- iterator(*args, **kwargs)¶
Returns an iterable which can be used to iterate over the data.
- class classy_vision.dataset.SyntheticVideoDataset(num_classes: int, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: int, frames_per_clip: int, video_width: int, video_height: int, audio_samples: int, clips_per_video: int)¶
Classy Dataset which produces random synthetic video clips.
Useful for testing since the dataset is much faster to initialize and fetch samples from, compared to real world datasets.
- Note: Unlike SyntheticImageDataset, this dataset generates targets randomly, independent of the video clips.
- __init__(num_classes: int, split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: int, frames_per_clip: int, video_width: int, video_height: int, audio_samples: int, clips_per_video: int)¶
The constructor of SyntheticVideoDataset.
- Parameters
num_classes – Number of classes in the generated targets.
split – Split of dataset to use
batchsize_per_replica – batch size per model replica
shuffle – Whether we should shuffle between epochs
transform – Transform to be applied to each sample
num_samples – Number of samples to return
frames_per_clip – Number of frames in a video clip
video_width – Width of the video clip
video_height – Height of the video clip
audio_samples – Audio sample rate
clips_per_video – Number of clips per video
- classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_synthetic_video.SyntheticVideoDataset¶
Instantiates a SyntheticVideoDataset from a configuration.
- Parameters
config – A configuration for a SyntheticVideoDataset. See __init__() for parameters expected in the config.
- Returns
A SyntheticVideoDataset instance.
- property video_clips¶
Attribute video_clips. It is used in the _get_sampler method. Its data type should be torchvision.datasets.video_utils.VideoClips.
- class classy_vision.dataset.UCF101Dataset(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)¶
UCF101 is an action recognition video dataset, and it has 101 classes.
It is built on top of UCF101 dataset class in TorchVision.
- __init__(split: str, batchsize_per_replica: int, shuffle: bool, transform: Callable, num_samples: Optional[int], frames_per_clip: int, video_width: int, video_height: int, video_min_dimension: int, audio_samples: int, step_between_clips: int, frame_rate: Optional[int], clips_per_video: int, video_dir: str, splits_dir: str, fold: int, metadata_filepath: str)¶
The constructor of UCF101Dataset.
- Parameters
split – dataset split which can be either “train” or “test”
batchsize_per_replica – batch size per model replica
shuffle – If true, shuffle the dataset
transform – a dict of transforms applied to the video and audio data
num_samples – if not None, it will subsample the dataset
frames_per_clip – the number of frames in a video clip
video_width – rescaled video width. If 0, keep original width
video_height – rescaled video height. If 0, keep original height
video_min_dimension – rescale video so that min(height, width) = video_min_dimension. If 0, keep original video resolution. Note only one of (video_width, video_height) and (video_min_dimension) can be set
audio_samples – desired audio sample rate. If 0, keep original audio sample rate.
step_between_clips – Number of frames between each clip.
frame_rate – desired video frame rate. If None, keep original video frame rate.
clips_per_video – Number of clips to sample from each video
video_dir – path to video folder
splits_dir – path to dataset splitting file folder
fold – UCF101 dataset has 3 folds. Valid values are 1, 2 and 3.
metadata_filepath – path to the dataset meta data
- classmethod from_config(config: Dict[str, Any]) → classy_vision.dataset.classy_ucf101.UCF101Dataset¶
Instantiates a UCF101Dataset from a configuration.
- Parameters
config – A configuration for a UCF101Dataset. See __init__() for parameters expected in the config.
- Returns
A UCF101Dataset instance.
- classy_vision.dataset.build_dataset(config, *args, **kwargs)¶
Builds a ClassyDataset from a config.
This assumes a ‘name’ key in the config which is used to determine what dataset class to instantiate. For instance, a config {“name”: “my_dataset”, “folder”: “/data”} will find a class that was registered as “my_dataset” (see register_dataset()) and call .from_config on it.
- classy_vision.dataset.register_dataset(name, bypass_checks=False)¶
Registers a ClassyDataset subclass.
This decorator allows Classy Vision to instantiate a subclass of ClassyDataset from a configuration file, even if the class itself is not part of the Classy Vision framework. To use it, apply this decorator to a ClassyDataset subclass like this:

@register_dataset("my_dataset")
class MyDataset(ClassyDataset):
    ...

To instantiate a dataset from a configuration file, see build_dataset().
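The registry pattern behind register_dataset and build_dataset can be sketched in plain Python (a simplified stand-in for illustration, not the Classy Vision implementation):

```python
# Minimal sketch of a name-based registry: a decorator records classes
# under a string key, and a builder looks the key up from a config's
# "name" entry and delegates construction to the class's from_config.

DATASET_REGISTRY = {}

def register_dataset(name):
    def decorator(cls):
        if name in DATASET_REGISTRY:
            raise ValueError(f"Dataset {name!r} already registered")
        DATASET_REGISTRY[name] = cls
        return cls
    return decorator

def build_dataset(config):
    cls = DATASET_REGISTRY[config["name"]]
    return cls.from_config(config)

@register_dataset("my_dataset")
class MyDataset:
    def __init__(self, folder):
        self.folder = folder

    @classmethod
    def from_config(cls, config):
        return cls(folder=config["folder"])

ds = build_dataset({"name": "my_dataset", "folder": "/data"})
print(type(ds).__name__, ds.folder)  # -> MyDataset /data
```

Registering under a string key rather than importing the class directly is what lets user-defined datasets outside the framework be instantiated purely from a configuration file.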