Transforms
Classy Vision works directly with torchvision transforms, so it ships with very few built-in transforms. During research, however, it is common to experiment with new transforms. The ClassyTransform class lets users express their transforms in a common format and define them in a configuration file.
Like other Classy Vision abstractions, ClassyTransform is accompanied by a register_transform() decorator and a build_transform() function for integration with the config system.
- class classy_vision.dataset.transforms.ApplyTransformToKey(transform: Callable, key: Union[int, str] = 'input')
Serializable class that applies a transform to the field of a sample selected by a key.
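As a rough illustration of the idea, the sketch below applies a callable to a single field of a dict sample and leaves the other fields alone. The helper apply_transform_to_key is hypothetical and written in plain Python; it is not the library's implementation, and copying the sample instead of modifying it in place is a choice made for this sketch only.

```python
from typing import Any, Callable, Dict


def apply_transform_to_key(
    sample: Dict[str, Any], transform: Callable, key: str = "input"
) -> Dict[str, Any]:
    """Hypothetical stand-in: transform only sample[key]."""
    out = dict(sample)  # shallow copy so the caller's sample is untouched
    out[key] = transform(sample[key])
    return out


sample = {"input": [1, 2, 3], "target": 7}
doubled = apply_transform_to_key(sample, lambda xs: [2 * x for x in xs])
```

Only the "input" field changes here; "target" is carried over as-is.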
- class classy_vision.dataset.transforms.ClassyTransform
Class representing a data transform abstraction.
A data transform is most often needed to pre-process input data (e.g. images, video) before it is sent to a model, but it can also be used for other purposes.
- abstract __call__(image)
The __call__ interface transforms the input data. It should contain the actual implementation of the data transform.
- Parameters
image – input image data
- class classy_vision.dataset.transforms.GenericImageTransform(transform: Optional[Callable] = None, split: Optional[str] = None)
Default transform for images used in the classification task.
This transform does several things. First, it expects a tuple or list input (torchvision datasets supply tuples / lists). Second, it applies a user-provided image transform to the first entry in the tuple (again, matching the torchvision tuple format). Third, it converts the tuple to a dict sample with entries “input” and “target”.
The defaults are the standard ImageNet augmentations.
This is just a convenience wrapper to cover the common use case. You can get the same behavior by composing torchvision transforms + ApplyTransformToKey + TupleToMapTransform.
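The three steps described for GenericImageTransform can be sketched in plain Python as follows. The function generic_image_transform_sketch is hypothetical and only mirrors the documented data flow; in particular, treating transform=None as "leave the image unchanged" is a simplification of this sketch, whereas the real class falls back to ImageNet defaults.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def generic_image_transform_sketch(
    sample: Sequence[Any], transform: Optional[Callable] = None
) -> Dict[str, Any]:
    """Hypothetical stand-in for the tuple -> transformed dict flow."""
    # Step 1: expect a (image, target) tuple or list.
    image, target = sample[0], sample[1]
    # Step 2: apply the user-provided transform to the first entry only.
    if transform is not None:
        image = transform(image)
    # Step 3: return a dict sample with "input" and "target" entries.
    return {"input": image, "target": target}


# A string stands in for an image so the example needs no imaging library.
result = generic_image_transform_sketch(("pixels", 3), transform=str.upper)
```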
- class classy_vision.dataset.transforms.ImagenetAugmentTransform(crop_size: int = 224, mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])
The default image transform with data augmentation.
It is often useful for training models on ImageNet. It sequentially resizes the image to a random scale, takes a random spatial crop, randomly flips the image horizontally, converts the PIL image data into a torch.Tensor, and normalizes the pixel values by subtracting the mean and dividing by the standard deviation.
- __call__(img)
Callable function which applies the transform to the input image.
- Parameters
img – input image that will undergo the transform
- __init__(crop_size: int = 224, mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])
The constructor method of ImagenetAugmentTransform class.
- Parameters
crop_size – expected output size per dimension after random cropping
mean – a 3-tuple denoting the pixel RGB mean
std – a 3-tuple denoting the pixel RGB standard deviation
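To make the final normalization step concrete, here is a small pure-Python sketch of the per-channel arithmetic, (value - mean) / std, using the default mean and std from the signature. The helper normalize_pixel is hypothetical and operates on one RGB pixel whose values are assumed to already be scaled to [0, 1]; the library itself works on whole tensors.

```python
from typing import List, Sequence

MEAN = [0.485, 0.456, 0.406]  # default per-channel RGB mean
STD = [0.229, 0.224, 0.225]   # default per-channel RGB standard deviation


def normalize_pixel(
    rgb: Sequence[float],
    mean: Sequence[float] = MEAN,
    std: Sequence[float] = STD,
) -> List[float]:
    """Hypothetical helper: subtract the mean, then divide by the std."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# Red and blue sit exactly at the mean; green is one std above its mean.
normalized = normalize_pixel([0.485, 0.680, 0.406])
```

Up to floating-point rounding, the red and blue channels map to 0 and the green channel maps to 1.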
- class classy_vision.dataset.transforms.ImagenetNoAugmentTransform(resize: int = 256, crop_size: int = 224, mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])
The default image transform without data augmentation.
It is often useful for testing models on ImageNet. It sequentially resizes the image, takes a center crop, converts the PIL image data into a torch.Tensor, and normalizes the pixel values by subtracting the mean and dividing by the standard deviation.
- __call__(img)
Callable function which applies the transform to the input image.
- Parameters
img – input image that will undergo the transform
- __init__(resize: int = 256, crop_size: int = 224, mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])
The constructor method of ImagenetNoAugmentTransform class.
- Parameters
resize – expected image size per dimension after resizing
crop_size – expected output size per dimension after center cropping
mean – a 3-tuple denoting the pixel RGB mean
std – a 3-tuple denoting the pixel RGB standard deviation
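The geometry of "resize, then center crop" can be worked through with a little arithmetic. The sketch below assumes the resize step scales the shorter side to `resize` while preserving the aspect ratio (the behavior of torchvision's Resize when given a single integer); whether this class does exactly that, and how it rounds, is an assumption. Both helper functions are hypothetical.

```python
from typing import Tuple


def resized_size(width: int, height: int, resize: int = 256) -> Tuple[int, int]:
    """Assumed behavior: shorter side becomes `resize`, aspect ratio kept."""
    if width <= height:
        return resize, int(resize * height / width)
    return int(resize * width / height), resize


def center_crop_box(
    width: int, height: int, crop_size: int = 224
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a centered square crop."""
    left = (width - crop_size) // 2
    top = (height - crop_size) // 2
    return left, top, left + crop_size, top + crop_size


# A 500 x 375 image: the shorter side (height) is scaled to 256.
new_w, new_h = resized_size(500, 375)
box = center_crop_box(new_w, new_h)
```

With the defaults, a square 256 x 256 input would lose a 16-pixel border on each side.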
- class classy_vision.dataset.transforms.LightingTransform(alphastd=0.1, eigval=[0.2175, 0.0188, 0.0045], eigvec=[[-144.7125, 183.396, 102.2295], [-148.104, -1.1475, -207.57], [-148.818, -177.174, 107.1765]])
Lighting noise (AlexNet-style PCA-based noise). This trick was originally used in the AlexNet paper.
The eigenvalues and eigenvectors are taken from caffe2 ImageInputOp.h.
- __call__(img)
img: (C x H x W) Tensor with values in range [0.0, 1.0]
- __init__(alphastd=0.1, eigval=[0.2175, 0.0188, 0.0045], eigvec=[[-144.7125, 183.396, 102.2295], [-148.104, -1.1475, -207.57], [-148.818, -177.174, 107.1765]])
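For readers unfamiliar with PCA-based lighting noise, the sketch below shows the usual AlexNet-style formulation in plain Python: draw one Gaussian coefficient per principal component, weight each eigenvector component by its coefficient and eigenvalue, and add the resulting per-channel offset to every pixel. This is an assumption about the general technique, not a transcription of the library's code, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence

EIGVAL = [0.2175, 0.0188, 0.0045]
EIGVEC = [
    [-144.7125, 183.396, 102.2295],
    [-148.104, -1.1475, -207.57],
    [-148.818, -177.174, 107.1765],
]


def sample_alpha(alphastd: float = 0.1) -> List[float]:
    """One N(0, alphastd) coefficient per principal component."""
    return [random.gauss(0.0, alphastd) for _ in range(3)]


def lighting_offset(alpha: Sequence[float]) -> List[float]:
    """Per-channel offset: sum_j eigvec[c][j] * alpha[j] * eigval[j]."""
    return [
        sum(EIGVEC[c][j] * alpha[j] * EIGVAL[j] for j in range(3))
        for c in range(3)
    ]


def apply_lighting(img, alpha: Sequence[float]):
    """Add the channel offset to every pixel of a C x H x W nested list."""
    offset = lighting_offset(alpha)
    return [[[v + offset[c] for v in row] for row in img[c]] for c in range(3)]


img = [[[0.5, 0.5]], [[0.2, 0.2]], [[0.1, 0.1]]]  # tiny 3 x 1 x 2 "image"
noisy = apply_lighting(img, sample_alpha())
```

With all coefficients equal to zero the image is returned unchanged, which matches the intuition that alphastd controls the strength of the perturbation.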
- class classy_vision.dataset.transforms.TupleToMapTransform(list_of_map_keys: List[str])
A transform which maps image data from tuple to dict.
This transform has a list of keys (key1, key2, …), takes a sample of the form (data1, data2, …) and returns a sample of the form {key1: data1, key2: data2, …}. If duplicate keys are used, the corresponding values are merged into a list.
It is useful for mapping output from datasets like the PyTorch ImageFolder dataset (tuple) to a dict with named data fields.
If the sample is already a dict with the required keys, it is passed through unchanged.
- __call__(sample)
Transform sample from type tuple to type dict.
- Parameters
sample – input sample which will be transformed
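The mapping rules above (positional pairing, merging of duplicate keys, dict pass-through) can be sketched in plain Python. The function tuple_to_map is a hypothetical stand-in, and raising ValueError on a length mismatch or missing keys is a choice of this sketch, not documented library behavior.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence, Union


def tuple_to_map(
    sample: Union[Sequence[Any], Dict[str, Any]], list_of_map_keys: List[str]
) -> Dict[str, Any]:
    """Hypothetical stand-in for the tuple -> dict mapping."""
    if isinstance(sample, dict):
        # Already a dict: pass it through if it has the required keys.
        if not set(list_of_map_keys) <= set(sample):
            raise ValueError("dict sample is missing required keys")
        return sample
    if len(sample) != len(list_of_map_keys):
        raise ValueError("sample and key list differ in length")
    counts = Counter(list_of_map_keys)
    out: Dict[str, Any] = {}
    for key, value in zip(list_of_map_keys, sample):
        if counts[key] > 1:
            out.setdefault(key, []).append(value)  # merge duplicates
        else:
            out[key] = value
    return out


mapped = tuple_to_map(("image-bytes", 5), ["input", "target"])
merged = tuple_to_map((1, 2, 3), ["view", "view", "target"])
```

Here `mapped` pairs the two entries with "input" and "target", while `merged` collects the two "view" entries into one list.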
- classy_vision.dataset.transforms.build_transform(transform_config: Dict[str, Any]) → Callable
Builds a ClassyTransform from a config.
This assumes a 'name' key in the config, which is used to determine which transform class to instantiate. For instance, a config {"name": "my_transform", "foo": "bar"} will find a class that was registered as "my_transform" (see register_transform()) and call .from_config on it.
In addition to transforms registered with register_transform(), we also support instantiating transforms available in the torchvision.transforms module. Any keys in the config will get expanded to parameters of the transform constructor. For instance, the following call will instantiate a torchvision.transforms.CenterCrop:
    build_transform({"name": "CenterCrop", "size": 224})
- classy_vision.dataset.transforms.build_transforms(transforms_config: List[Dict[str, Any]]) → Callable
Builds a transform from the list of transform configurations.
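A minimal sketch of the idea, assuming the returned callable applies the configured transforms in list order (as a torchvision Compose would). Everything here is a hypothetical stand-in written in plain Python: the toy classes AddConstant and Scale, the lookup table TOY_TRANSFORMS, and the two *_sketch functions are not part of Classy Vision.

```python
from typing import Any, Callable, Dict, List


class AddConstant:
    """Toy transform: adds a fixed value."""

    def __init__(self, value: float):
        self.value = value

    def __call__(self, x: float) -> float:
        return x + self.value


class Scale:
    """Toy transform: multiplies by a fixed factor."""

    def __init__(self, factor: float):
        self.factor = factor

    def __call__(self, x: float) -> float:
        return x * self.factor


TOY_TRANSFORMS = {"AddConstant": AddConstant, "Scale": Scale}


def build_transform_sketch(config: Dict[str, Any]) -> Callable:
    # Every key except "name" is expanded into a constructor argument.
    kwargs = {k: v for k, v in config.items() if k != "name"}
    return TOY_TRANSFORMS[config["name"]](**kwargs)


def build_transforms_sketch(configs: List[Dict[str, Any]]) -> Callable:
    built = [build_transform_sketch(c) for c in configs]

    def composed(x):
        for transform in built:  # apply in the order given
            x = transform(x)
        return x

    return composed


pipeline = build_transforms_sketch(
    [{"name": "AddConstant", "value": 1}, {"name": "Scale", "factor": 10}]
)
result = pipeline(2)  # (2 + 1) * 10
```

Swapping the two config entries changes the result, which is why the order of the list matters.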
- classy_vision.dataset.transforms.register_transform(name: str, bypass_checks=False)
Registers a ClassyTransform subclass.
This decorator allows Classy Vision to instantiate a subclass of ClassyTransform from a configuration file, even if the class itself is not part of the Classy Vision framework. To use it, apply this decorator to a ClassyTransform subclass like this:
    @register_transform("my_transform")
    class MyTransform(ClassyTransform):
        ...
To instantiate a transform from a configuration file, see build_transform().
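The register-then-build pattern can be illustrated with a toy registry in plain Python. This is only a sketch of the general mechanism: TOY_REGISTRY, register_transform_sketch, and build_from_registry are hypothetical names, and details such as rejecting duplicate names or stripping the "name" key before calling from_config are assumptions of this sketch, not confirmed library behavior.

```python
from typing import Any, Callable, Dict

TOY_REGISTRY: Dict[str, type] = {}


def register_transform_sketch(name: str) -> Callable[[type], type]:
    """Decorator that records a class under `name` in the toy registry."""

    def decorator(cls: type) -> type:
        if name in TOY_REGISTRY:
            raise ValueError(f"transform {name!r} is already registered")
        TOY_REGISTRY[name] = cls
        return cls

    return decorator


@register_transform_sketch("my_transform")
class MyTransform:
    def __init__(self, foo: str):
        self.foo = foo

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "MyTransform":
        return cls(foo=config["foo"])

    def __call__(self, image):
        return image  # identity, just to keep the example small


def build_from_registry(config: Dict[str, Any]):
    # Look the class up by name, then hand the remaining keys to from_config.
    args = {k: v for k, v in config.items() if k != "name"}
    return TOY_REGISTRY[config["name"]].from_config(args)


transform = build_from_registry({"name": "my_transform", "foo": "bar"})
```

The decorator returns the class unchanged, so MyTransform can still be constructed directly in code that does not use configs.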