Transforms

Classy Vision is able to work directly with torchvision transforms, so it ships with very few built-in transforms. However, during research it’s common to experiment with new transforms. The ClassyTransform class allows users to express their transforms in a common format and define them in a configuration file.

Like other Classy Vision abstractions, ClassyTransform is accompanied by a register_transform() decorator and build_transform() function for integration with the config system.

class classy_vision.dataset.transforms.ApplyTransformToKey(transform: Callable, key: Union[int, str] = 'input')

Serializable class that applies a transform to the field at a specified key in each sample.

__call__(sample: Union[Tuple[Any], Dict[str, Any]]) → Union[Tuple[Any], Dict[str, Any]]

Updates sample by applying a transform to the value at the specified key.

Parameters

sample – input sample which will be transformed

__init__(transform: Callable, key: Union[int, str] = 'input') → None

The constructor method of ApplyTransformToKey class.

Parameters
  • transform – a callable applied to the value stored at key in the sample

  • key – the key in sample whose corresponding value will undergo the transform
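To make the key semantics concrete, here is a minimal pure-Python sketch of the behavior described above. ApplyToKeySketch is a hypothetical name, not part of Classy Vision; it assumes dict samples are updated in place, while tuple samples are rebuilt because tuples cannot be modified.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Sample = Union[Tuple[Any, ...], List[Any], Dict[str, Any]]


class ApplyToKeySketch:
    """Hypothetical stand-in for ApplyTransformToKey (illustration only)."""

    def __init__(self, transform: Callable, key: Union[int, str] = "input") -> None:
        self.transform = transform
        self.key = key

    def __call__(self, sample: Sample) -> Sample:
        if isinstance(sample, dict):
            # Dicts are updated in place, mirroring "updates sample" above.
            sample[self.key] = self.transform(sample[self.key])
            return sample
        # Sequences are copied into a list (tuples are immutable), one entry is
        # replaced, and tuples are converted back to tuples.
        items = list(sample)
        items[self.key] = self.transform(items[self.key])
        return tuple(items) if isinstance(sample, tuple) else items


double_input = ApplyToKeySketch(lambda x: x * 2)          # default key "input"
print(double_input({"input": 3, "target": 1}))            # {'input': 6, 'target': 1}

double_first = ApplyToKeySketch(lambda x: x * 2, key=0)   # integer key for tuples
print(double_first((3, 1)))                               # (6, 1)
```

The integer key is what lets the same wrapper work on torchvision-style tuples before they are converted to dicts.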

class classy_vision.dataset.transforms.ClassyTransform

Class representing a data transform abstraction.

Data transforms are most often used to pre-process input data (e.g. images, videos) before feeding it to a model, but they can also be used for other purposes.

abstract __call__(image)

Subclasses implement __call__ to transform the input data; it should contain the actual implementation of the data transform.

Parameters

image – input image data
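A sketch of what a subclass looks like, assuming a hypothetical stand-in base class (ClassyTransformSketch) built with Python's abc module so the example runs without Classy Vision installed. In real code you would subclass classy_vision.dataset.transforms.ClassyTransform instead; ScaleToUnitRange is an illustrative transform, not one shipped with the library.

```python
from abc import ABC, abstractmethod
from typing import List


class ClassyTransformSketch(ABC):
    """Hypothetical stand-in mirroring ClassyTransform's abstract __call__."""

    @abstractmethod
    def __call__(self, image):
        ...


class ScaleToUnitRange(ClassyTransformSketch):
    """Example transform: maps 8-bit pixel values (0-255) into [0.0, 1.0]."""

    def __call__(self, image: List[List[int]]) -> List[List[float]]:
        return [[pixel / 255.0 for pixel in row] for row in image]


transform = ScaleToUnitRange()
print(transform([[0, 255], [51, 102]]))  # [[0.0, 1.0], [0.2, 0.4]]
```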

class classy_vision.dataset.transforms.GenericImageTransform(transform: Optional[Callable] = None, split: Optional[str] = None)

Default transform for images used in the classification task.

This transform does several things. First, it expects a tuple or list input (torchvision datasets supply tuples / lists). Second, it applies a user-provided image transform to the first entry in the tuple (again, matching the torchvision tuple format). Third, it converts the tuple to a dict sample with entries “input” and “target”.

The defaults, used when only split is given, are the standard ImageNet augmentations.

This is just a convenience wrapper to cover the common use-case. You can get the same behavior by composing torchvision transforms + ApplyTransformToKey + TupleToMapTransform.

__call__(sample: Tuple[Any])

Applies the transform to sample.

Parameters

sample – A tuple with length >= 2. The first entry should be the image data, the second entry should be the target data.

__init__(transform: Optional[Callable] = None, split: Optional[str] = None)

Constructor for GenericImageTransform. Only one of the two arguments (transform, split) should be specified.

Parameters
  • transform – A callable or ClassyTransform to be applied to the image only

  • split – ‘train’ or ‘test’
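The three steps and the constructor rule above can be sketched in plain Python. GenericImageTransformSketch and DEFAULT_TRANSFORMS are hypothetical names; the identity functions stand in for the ImageNet pipelines selected by split, the sketch rejects calls that pass both or neither argument, and it ignores any tuple entries after the target.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def _identity(image: Any) -> Any:
    return image


# Hypothetical stand-ins for the split-specific ImageNet pipelines.
DEFAULT_TRANSFORMS: Dict[str, Callable] = {"train": _identity, "test": _identity}


class GenericImageTransformSketch:
    """Hypothetical re-implementation of the behavior described above."""

    def __init__(self, transform: Optional[Callable] = None, split: Optional[str] = None):
        # Assumption of this sketch: exactly one of `transform` / `split` is given.
        if (transform is None) == (split is None):
            raise ValueError("specify exactly one of transform or split")
        self._transform = transform if transform is not None else DEFAULT_TRANSFORMS[split]

    def __call__(self, sample: Sequence[Any]) -> Dict[str, Any]:
        # torchvision-style sample: (image, target, ...); only the image is transformed.
        if len(sample) < 2:
            raise ValueError("sample must contain at least (image, target)")
        return {"input": self._transform(sample[0]), "target": sample[1]}


t = GenericImageTransformSketch(transform=lambda img: [p * 2 for p in img])
print(t(([1, 2], "cat")))  # {'input': [2, 4], 'target': 'cat'}
```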

class classy_vision.dataset.transforms.ImagenetAugmentTransform(crop_size: int = 224, mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])

The default image transform with data augmentation.

It is often useful for training models on ImageNet. It sequentially resizes the image to a random scale, takes a random spatial crop, randomly flips the image horizontally, converts the PIL image data into a torch.Tensor and normalizes the pixel values by subtracting the mean and dividing by the standard deviation.

__call__(img)

Callable function which applies the transform to the input image.

Parameters

img – input image that will undergo the transform

__init__(crop_size: int = 224, mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])

The constructor method of ImagenetAugmentTransform class.

Parameters
  • crop_size – expected output size per dimension after random cropping

  • mean – a 3-tuple denoting the pixel RGB mean

  • std – a 3-tuple denoting the pixel RGB standard deviation
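The last step, normalization, is a per-channel affine map: each value x in channel c becomes (x - mean[c]) / std[c]. The sketch below shows only that step on a nested-list C x H x W image already scaled to [0, 1] (as torchvision's ToTensor produces); the random resize, crop and flip are not modeled, and normalize is a hypothetical helper name.

```python
from typing import List, Sequence

IMAGENET_MEAN = (0.485, 0.456, 0.406)  # defaults from the signature above
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize(
    image: List[List[List[float]]],
    mean: Sequence[float] = IMAGENET_MEAN,
    std: Sequence[float] = IMAGENET_STD,
) -> List[List[List[float]]]:
    """Per-channel (x - mean[c]) / std[c] on a C x H x W nested list."""
    return [
        [[(x - m) / s for x in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


# A 3 x 1 x 1 image whose pixel equals the mean maps to zero in every channel.
print(normalize([[[0.485]], [[0.456]], [[0.406]]]))  # [[[0.0]], [[0.0]], [[0.0]]]
```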

class classy_vision.dataset.transforms.ImagenetNoAugmentTransform(resize: int = 256, crop_size: int = 224, mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])

The default image transform without data augmentation.

It is often useful for testing models on ImageNet. It sequentially resizes the image, takes a central crop, converts the PIL image data into a torch.Tensor and normalizes the pixel values by subtracting the mean and dividing by the standard deviation.

__call__(img)

Callable function which applies the transform to the input image.

Parameters

img – input image that will undergo the transform

__init__(resize: int = 256, crop_size: int = 224, mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])

The constructor method of ImagenetNoAugmentTransform class.

Parameters
  • resize – expected image size per dimension after resizing

  • crop_size – size of each dimension of the central crop

  • mean – a 3-tuple denoting the pixel RGB mean

  • std – a 3-tuple denoting the pixel RGB standard deviation
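With the defaults, and assuming the resized image is 256 x 256, the central crop keeps a 224 x 224 window whose top-left corner sits (256 - 224) / 2 = 16 pixels in from the top and left edges. A minimal sketch of that arithmetic on nested lists; center_crop_box and center_crop are hypothetical helpers, and the rounding convention is an assumption.

```python
from typing import List, Tuple


def center_crop_box(height: int, width: int, crop_size: int) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a centered crop_size x crop_size window."""
    top = int(round((height - crop_size) / 2.0))
    left = int(round((width - crop_size) / 2.0))
    return top, left, top + crop_size, left + crop_size


def center_crop(image: List[List[int]], crop_size: int) -> List[List[int]]:
    """Crop an H x W nested-list image to its central crop_size x crop_size window."""
    top, left, bottom, right = center_crop_box(len(image), len(image[0]), crop_size)
    return [row[left:right] for row in image[top:bottom]]


print(center_crop_box(256, 256, 224))  # (16, 16, 240, 240)

grid = [[r * 4 + c for c in range(4)] for r in range(4)]
print(center_crop(grid, 2))  # [[5, 6], [9, 10]]
```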

class classy_vision.dataset.transforms.LightingTransform(alphastd=0.1, eigval=[0.2175, 0.0188, 0.0045], eigvec=[[-144.7125, 183.396, 102.2295], [-148.104, -1.1475, -207.57], [-148.818, -177.174, 107.1765]])

Lighting noise (AlexNet-style PCA-based noise). This trick was originally used in the AlexNet paper.

The eigenvalues and eigenvectors are taken from Caffe2's ImageInputOp.h.

__call__(img)

Parameters

img – (C x H x W) Tensor with values in the range [0.0, 1.0]
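AlexNet-style lighting noise adds the same offset to every pixel of a channel: with one coefficient alpha_j ~ N(0, alphastd) drawn per principal component, channel c is shifted by sum_j eigvec[c][j] * alpha_j * eigval[j]. The sketch below implements that formula on nested lists. apply_lighting and lighting_offsets are hypothetical names, the optional alpha argument exists only to make the result reproducible, and eigval and eigvec are passed in explicitly rather than taken from the defaults above, so any rescaling of those defaults is left to the caller.

```python
import random
from typing import List, Optional, Sequence

Image = List[List[List[float]]]  # 3 x H x W nested lists


def lighting_offsets(
    eigval: Sequence[float], eigvec: Sequence[Sequence[float]], alpha: Sequence[float]
) -> List[float]:
    """rgb[c] = sum_j eigvec[c][j] * alpha[j] * eigval[j]  (AlexNet PCA noise)."""
    return [sum(eigvec[c][j] * alpha[j] * eigval[j] for j in range(3)) for c in range(3)]


def apply_lighting(
    img: Image,
    eigval: Sequence[float],
    eigvec: Sequence[Sequence[float]],
    alphastd: float = 0.1,
    alpha: Optional[Sequence[float]] = None,
) -> Image:
    """Add the same per-channel offset to every pixel of a 3 x H x W image."""
    if alpha is None:
        # One Gaussian coefficient per principal component, drawn on every call.
        alpha = [random.gauss(0.0, alphastd) for _ in range(3)]
    rgb = lighting_offsets(eigval, eigvec, alpha)
    return [[[x + rgb[c] for x in row] for row in channel] for c, channel in enumerate(img)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(apply_lighting([[[0.0]], [[0.0]], [[0.0]]], [0.5, 0.25, 0.125], IDENTITY,
                     alpha=[1.0, 1.0, 1.0]))  # [[[0.5]], [[0.25]], [[0.125]]]
```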

__init__(alphastd=0.1, eigval=[0.2175, 0.0188, 0.0045], eigvec=[[-144.7125, 183.396, 102.2295], [-148.104, -1.1475, -207.57], [-148.818, -177.174, 107.1765]])

class classy_vision.dataset.transforms.TupleToMapTransform(list_of_map_keys: List[str])

A transform which maps a sample from a tuple to a dict.

This transform has a list of keys (key1, key2, …), takes a sample of the form (data1, data2, …) and returns a sample of the form {key1: data1, key2: data2, …}. If duplicate keys are used, the corresponding values are merged into a list.

It is useful for mapping the output of datasets such as the torchvision ImageFolder dataset (a tuple) to a dict with named data fields.

If the sample is already a dict with the required keys, it is passed through unchanged.

__call__(sample)

Transform sample from type tuple to type dict.

Parameters

sample – input sample which will be transformed

__init__(list_of_map_keys: List[str])

The constructor method of TupleToMapTransform class.

Parameters

list_of_map_keys – a list of dict keys that will be mapped, in order, to the items of the input sample
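The mapping, the duplicate-key merge, and the dict pass-through described above fit in a few lines. tuple_to_map is a hypothetical function name; the length check and the error types are assumptions of this sketch, and a key that appears only once keeps its plain value rather than a one-element list.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Union


def tuple_to_map(
    sample: Union[Sequence[Any], Dict[str, Any]], keys: List[str]
) -> Dict[str, Any]:
    # Already a dict with every required key: pass it through unchanged.
    if isinstance(sample, dict):
        missing = [k for k in keys if k not in sample]
        if missing:
            raise KeyError(f"sample is missing keys {missing}")
        return sample
    if len(sample) != len(keys):
        raise ValueError("sample and keys must have the same length")
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for key, value in zip(keys, sample):
        grouped[key].append(value)
    # Single values are unwrapped; duplicate keys keep a list of all their values.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


print(tuple_to_map(("img", 3), ["input", "target"]))
# {'input': 'img', 'target': 3}
print(tuple_to_map(("a", "b", 0), ["input", "input", "target"]))
# {'input': ['a', 'b'], 'target': 0}
```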

classy_vision.dataset.transforms.build_transform(transform_config: Dict[str, Any]) → Callable

Builds a ClassyTransform from a config.

This assumes a ‘name’ key in the config which is used to determine what transform class to instantiate. For instance, a config {"name": "my_transform", "foo": "bar"} will find a class that was registered as "my_transform" (see register_transform()) and call .from_config on it.

In addition to transforms registered with register_transform(), we also support instantiating transforms available in the torchvision.transforms module. Any keys in the config will get expanded to parameters of the transform constructor. For instance, the following call will instantiate a torchvision.transforms.CenterCrop:

build_transform({"name": "CenterCrop", "size": 224})

classy_vision.dataset.transforms.build_transforms(transforms_config: List[Dict[str, Any]]) → Callable

Builds a transform from the list of transform configurations.
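This page does not spell out how the list is combined. A natural reading, assumed here, is that each entry is built individually and the resulting callables are applied in list order, as torchvision's Compose does. build_one, build_many and the two toy transforms ("scale", "shift") are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical lookup standing in for build_transform's registry.
_BUILDERS: Dict[str, Callable[[Dict[str, Any]], Callable]] = {
    "scale": lambda cfg: (lambda x: x * cfg["factor"]),
    "shift": lambda cfg: (lambda x: x + cfg["offset"]),
}


def build_one(config: Dict[str, Any]) -> Callable:
    return _BUILDERS[config["name"]](config)


def build_many(configs: List[Dict[str, Any]]) -> Callable:
    transforms = [build_one(cfg) for cfg in configs]

    def composed(x: Any) -> Any:
        # Apply the transforms in the order they appear in the config list.
        for t in transforms:
            x = t(x)
        return x

    return composed


pipeline = build_many([{"name": "scale", "factor": 2}, {"name": "shift", "offset": 1}])
print(pipeline(3))  # 7
```

Order matters: swapping the two configs would give (3 + 1) * 2 = 8 instead of 7.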

classy_vision.dataset.transforms.register_transform(name: str, bypass_checks=False)

Registers a ClassyTransform subclass.

This decorator allows Classy Vision to instantiate a subclass of ClassyTransform from a configuration file, even if the class itself is not part of the Classy Vision framework. To use it, apply this decorator to a ClassyTransform subclass like this:

@register_transform("my_transform")
class MyTransform(ClassyTransform):
    ...

To instantiate a transform from a configuration file, see build_transform().