Models

class classy_vision.models.AnyNet(*args, **kwargs)

Implementation of an AnyNet.

See https://arxiv.org/abs/2003.13678 for details.

__init__(params: classy_vision.models.anynet.AnyNetParams)

Constructor for ClassyModel.

forward(x, *args, **kwargs)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) classy_vision.models.anynet.AnyNet

Instantiates an AnyNet from a configuration.

Parameters

config – A configuration for an AnyNet. See AnyNetParams for parameters expected in the config.

Returns

An AnyNet instance.

class classy_vision.models.ClassyBlock(name, module)

A thin wrapper for head execution. It records the output of the wrapped module so that the heads forked from this module can be executed on that output.

__init__(name, module)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class classy_vision.models.ClassyModel(*args, **kwargs)

Base class for models in classy vision.

A model refers either to a specific architecture (e.g. ResNet50) or a family of architectures (e.g. ResNet). Models can take arguments in the constructor in order to configure different behavior (e.g. hyperparameters). Classy Models must implement from_config() in order to allow instantiation from a configuration file. Like regular PyTorch models, Classy Models must also implement forward(), where the bulk of the inference logic lives.

Classy Models also have some advanced functionality for production fine-tuning systems. For example, we allow users to train a trunk model and then attach heads to the model via the attachable blocks. Making your model support the trunk-heads paradigm is completely optional.

NOTE: Advanced users can modify the behavior of their implemented models by specifying the wrapper_cls class attribute, which should be a class derived from ClassyModelWrapper (see the documentation for that class for more information). Users can set it to None to skip wrapping their model and to make their model torchscriptable. This is set to ClassyModelHeadExecutorWrapper by default.

__init__()

Constructor for ClassyModel.

property attachable_block_names

Return names of all attachable blocks.

extract_features(x)

Extract features from the model.

Derived classes can implement this method to extract the features before applying the final fully connected layer.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) classy_vision.models.classy_model.ClassyModel

Instantiates a ClassyModel from a configuration.

Parameters

config – A configuration for the ClassyModel.

Returns

A ClassyModel instance.

classmethod from_model(model: torch.nn.modules.module.Module, input_shape: Optional[Tuple] = None, model_depth: Optional[int] = None)

Converts an nn.Module to a ClassyModel.

Parameters
  • model – The model to convert

For the remaining args, look at the corresponding properties of ClassyModel.

Returns

A ClassyModel instance.

get_classy_state(deep_copy=False)

Get the state of the ClassyModel.

The returned state is used for checkpointing.

NOTE: For advanced users, the structure of the returned dict is {"model": {"trunk": trunk_state, "heads": heads_state}}. The trunk state is the state of the model when no heads are attached.

Parameters

deep_copy – If True, creates a deep copy of the state Dict. Otherwise, the returned Dict’s state will be tied to the object’s.

Returns

A state dictionary containing the state of the model.
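The nested layout and the effect of deep_copy can be illustrated with plain dictionaries. This is a minimal sketch rather than Classy Vision's implementation: the get_state helper, the key names inside the trunk and heads entries, and the placeholder weight lists are all hypothetical stand-ins for real model state.

```python
import copy

# Hypothetical stand-ins for real parameter tensors.
_trunk_state = {"conv1.weight": [0.1, 0.2]}
_heads_state = {"block3": {"default_head": {"fc.weight": [0.5]}}}


def get_state(deep_copy=False):
    """Return a dict shaped like the documented classy state."""
    state = {"model": {"trunk": _trunk_state, "heads": _heads_state}}
    # With deep_copy=True the caller gets independent copies; otherwise the
    # returned dict still references the live objects.
    return copy.deepcopy(state) if deep_copy else state


shared = get_state()
detached = get_state(deep_copy=True)

# Mutating the live trunk state is visible through the shared view only.
_trunk_state["conv1.weight"][0] = 9.9
print(shared["model"]["trunk"]["conv1.weight"][0])    # 9.9
print(detached["model"]["trunk"]["conv1.weight"][0])  # 0.1
```

When a checkpoint must not change as training continues, the deep copy is the safer choice; the shared view avoids the cost of copying.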

get_heads()

Returns the heads on the model.

The heads are returned as a dictionary mapping each block name to the nn.Modules attached to that block.

property head_outputs

Return outputs of all heads in the format of Dict[head_id, output]

Head outputs are cached during a forward pass.

property input_shape

Returns the input shape that the model can accept, excluding the batch dimension.

By default it returns (3, 224, 224).

load_head_states(state, strict=True)

Load only the state (weights) of the heads.

For a trunk-heads model, this function allows the user to only update the head state of the model. Useful for attaching fine-tuned heads to a pre-trained trunk.

Parameters

state (Dict) – Contains the classy model state under key “model”
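The idea of updating only the heads can be sketched on plain dictionaries shaped like the classy state. The load_heads_only helper and the placeholder values below are hypothetical; the real method operates on module weights and also honors the strict flag, which this sketch does not model.

```python
def load_heads_only(model_state, checkpoint):
    """Overwrite the heads entry of model_state, leaving the trunk alone."""
    model_state["model"]["heads"] = dict(checkpoint["model"]["heads"])
    return model_state


# Placeholder values stand in for real weight tensors.
pretrained = {"model": {"trunk": {"w": 1.0}, "heads": {}}}
fine_tuned = {
    "model": {"trunk": {"w": 7.0}, "heads": {"block3": {"head_a": {"fc": 0.5}}}}
}

updated = load_heads_only(pretrained, fine_tuned)
print(updated["model"]["trunk"])  # {'w': 1.0}
print(updated["model"]["heads"])  # {'block3': {'head_a': {'fc': 0.5}}}
```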

set_classy_state(state, strict=True)

Set the state of the ClassyModel.

Parameters

state – The state dictionary. Must be the output of a call to get_classy_state().

This is used to load the state of the model from a checkpoint.

set_heads(heads: Dict[str, List[classy_vision.heads.classy_head.ClassyHead]])

Attach all the heads to corresponding blocks.

A head is expected to be a ClassyHead object. For more details, see classy_vision.heads.ClassyHead.

Parameters

heads (Dict) – A mapping between attachable block names and the list of heads attached to each block. For example, if two different teams want to attach two different heads for downstream classifiers to the 15th block, they would use:

heads = {"block15":
    [classifier_head1, classifier_head2]
}
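How such a mapping drives execution can be sketched without PyTorch. In the toy version below, each block records its latest output and every head is then run on the output of the block it is attached to. RecordingBlock, run_with_heads, and the (head_id, callable) pairs are hypothetical simplifications; real heads are ClassyHead modules and real blocks are nn.Modules.

```python
class RecordingBlock:
    """Hypothetical stand-in for an attachable block: wraps a callable and
    remembers its most recent output."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.output = None

    def __call__(self, x):
        self.output = self.fn(x)
        return self.output


def run_with_heads(blocks, heads, x):
    """Run blocks in order, then run every head on its block's output."""
    for block in blocks:
        x = block(x)
    outputs_by_name = {block.name: block.output for block in blocks}
    head_outputs = {}
    for block_name, block_heads in heads.items():
        for head_id, head_fn in block_heads:
            head_outputs[head_id] = head_fn(outputs_by_name[block_name])
    return x, head_outputs


blocks = [
    RecordingBlock("block0", lambda v: v + 1),
    RecordingBlock("block1", lambda v: v * 10),
]
# Two heads forked from the same block, mirroring the mapping above.
heads = {"block1": [("head_a", lambda v: v - 1), ("head_b", lambda v: -v)]}

trunk_out, head_outputs = run_with_heads(blocks, heads, 2)
print(trunk_out)     # 30
print(head_outputs)  # {'head_a': 29, 'head_b': -30}
```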

wrapper_cls

alias of classy_vision.models.classy_model.ClassyModelHeadExecutorWrapper

class classy_vision.models.ClassyModelHeadExecutorWrapper(classy_model)

Wrapper which changes the forward to also execute and return head output.

class classy_vision.models.ClassyModelWrapper(classy_model)

Base ClassyModel wrapper class.

This class acts as a thin pass-through wrapper which lets users modify the behavior of ClassyModels, such as changing the return value of the forward() call. The wrapper acts as a ClassyModel by itself, and the underlying model can be accessed through the classy_model attribute.
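The pass-through pattern can be sketched in plain Python using attribute delegation. This is an illustration of the idea only: PassThroughWrapper, DictOutputWrapper, and ToyModel are hypothetical names, and the real wrapper deals with nn.Module details that are omitted here.

```python
class PassThroughWrapper:
    """Forwards calls and unknown attributes to the wrapped model."""

    def __init__(self, classy_model):
        self.classy_model = classy_model

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, so wrapper attributes win.
        if name == "classy_model":
            raise AttributeError(name)  # avoid recursion before __init__ runs
        return getattr(self.classy_model, name)

    def __call__(self, *args, **kwargs):
        return self.classy_model(*args, **kwargs)


class DictOutputWrapper(PassThroughWrapper):
    """Changes what a call returns while keeping everything else intact."""

    def __call__(self, *args, **kwargs):
        return {"output": self.classy_model(*args, **kwargs)}


class ToyModel:
    input_shape = (3, 224, 224)

    def __call__(self, x):
        return x * 2


wrapped = DictOutputWrapper(ToyModel())
print(wrapped(3))           # {'output': 6}
print(wrapped.input_shape)  # (3, 224, 224)
```

Delegating through __getattr__ keeps the wrapper small, since only the behavior being changed needs to be overridden.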

__call__(*args, **kwargs)

Call self as a function.

__init__(classy_model)

class classy_vision.models.DenseNet(*args, **kwargs)

__init__(num_blocks, num_classes, init_planes, growth_rate, expansion, small_input, final_bn_relu, use_se=False, se_reduction_ratio=16)

Implementation of a standard densely connected network (DenseNet).

Contains the following attachable blocks:

  • block{block_idx}-{idx}: The output of each dense block, indexed by the block index and the index of the dense layer.

  • transition-{idx}: The output of the transition layers.

  • trunk_output: The final output of the DenseNet. This is where a fully_connected head is normally attached.

Parameters
  • small_input – set to True for 32x32 sized image inputs.

  • final_bn_relu – set to False to exclude the final batchnorm and ReLU layers. These settings are useful when training Siamese networks.

  • use_se – Enable squeeze and excitation

  • se_reduction_ratio – The reduction ratio to apply in the excitation stage. Only used if use_se is True.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) classy_vision.models.densenet.DenseNet

Instantiates a DenseNet from a configuration.

Parameters

config – A configuration for a DenseNet. See __init__() for parameters expected in the config.

Returns

A DenseNet instance.

class classy_vision.models.EfficientNet(*args, **kwargs)

Implementation of EfficientNet, https://arxiv.org/pdf/1905.11946.pdf

References

https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

https://github.com/lukemelas/EfficientNet-PyTorch

NOTE: The original implementation uses the names depth_divisor and min_depth to refer to the number of channels, which is confusing, since the paper refers to the channel dimension as width. We use the width_divisor and min_width names instead.
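To make the role of these two parameters concrete, here is a sketch of the channel-rounding rule from the reference TensorFlow implementation linked above (where the function is called round_filters), rewritten with the width_divisor and min_width names. The function name scale_width is hypothetical, and it is an assumption that Classy Vision applies exactly this rule.

```python
def scale_width(num_channels, width_coefficient, width_divisor, min_width=None):
    """Scale a channel count and round it to a multiple of width_divisor."""
    scaled = num_channels * width_coefficient
    # Fall back to the divisor when no explicit minimum is given.
    min_width = min_width or width_divisor
    rounded = max(
        min_width,
        int(scaled + width_divisor / 2) // width_divisor * width_divisor,
    )
    # Do not let rounding shrink the layer by more than 10%.
    if rounded < 0.9 * scaled:
        rounded += width_divisor
    return int(rounded)


print(scale_width(32, 1.0, 8))  # 32
print(scale_width(16, 1.4, 8))  # 24
print(scale_width(11, 1.0, 8))  # 16
```

The last call shows the 10% guard: 11 would round down to 8, which is below 90% of 11, so the result is bumped up to the next multiple.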

__init__(num_classes: int, model_params: classy_vision.models.efficientnet.EfficientNetParams, bn_momentum: float, bn_epsilon: float, width_divisor: int, min_width: Optional[int], drop_connect_rate: float, use_se: bool)

Constructor for ClassyModel.

forward(inputs)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config)

Instantiates an EfficientNet from a configuration.

Parameters

config – A configuration for an EfficientNet. See __init__() for parameters expected in the config.

Returns

An EfficientNet instance.

property input_shape

Returns the input shape that the model can accept, excluding the batch dimension.

By default it returns (3, 224, 224).

class classy_vision.models.MLP(*args, **kwargs)

MLP model using ReLU. Useful for testing on CPUs.

__init__(input_dim, output_dim, hidden_dims, dropout, first_dropout, use_batchnorm, first_batchnorm)

Constructor for ClassyModel.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) classy_vision.models.mlp.MLP

Instantiates an MLP from a configuration.

Parameters

config – A configuration for an MLP. See __init__() for parameters expected in the config.

Returns

An MLP instance.
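As an illustration of what such a configuration might look like, the sketch below builds a dictionary whose keys follow the __init__ signature above and checks it for unknown or missing keys. The values, their types, and the "name" entry are assumptions made for the example; from_config may supply defaults for some keys, which this check does not account for.

```python
# Constructor parameter names copied from the MLP.__init__ signature above.
MLP_INIT_PARAMS = {
    "input_dim",
    "output_dim",
    "hidden_dims",
    "dropout",
    "first_dropout",
    "use_batchnorm",
    "first_batchnorm",
}

config = {
    "name": "mlp",  # assumed registry name
    "input_dim": 784,
    "output_dim": 10,
    "hidden_dims": [128, 64],
    "dropout": False,
    "first_dropout": False,
    "use_batchnorm": True,
    "first_batchnorm": False,
}

# Keys that are neither constructor parameters nor the registry name.
unknown = set(config) - MLP_INIT_PARAMS - {"name"}
# Constructor parameters the config does not provide.
missing = MLP_INIT_PARAMS - set(config)
print(unknown, missing)  # set() set()
```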

class classy_vision.models.RegNet(*args, **kwargs)

Implementation of RegNet, a particular form of AnyNets.

See https://arxiv.org/abs/2003.13678 for introduction to RegNets, and details about RegNetX and RegNetY models.

See https://arxiv.org/abs/2103.06877 for details about RegNetZ models.

__init__(params: classy_vision.models.regnet.RegNetParams)

Constructor for ClassyModel.

forward(x, *args, **kwargs)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) classy_vision.models.regnet.RegNet

Instantiates a RegNet from a configuration.

Parameters

config – A configuration for a RegNet. See RegNetParams for parameters expected in the config.

Returns

A RegNet instance.

class classy_vision.models.ResNeXt(*args, **kwargs)

__init__(num_blocks, init_planes: int = 64, reduction: int = 4, small_input: bool = False, zero_init_bn_residuals: bool = False, base_width_and_cardinality: Optional[Sequence] = None, basic_layer: bool = False, final_bn_relu: bool = True, use_se: bool = False, se_reduction_ratio: int = 16)

Implementation of ResNeXt.

Parameters
  • small_input – set to True for 32x32 sized image inputs.

  • final_bn_relu – set to False to exclude the final batchnorm and ReLU layers. These settings are useful when training Siamese networks.

  • use_se – Enable squeeze and excitation

  • se_reduction_ratio – The reduction ratio to apply in the excitation stage. Only used if use_se is True.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) classy_vision.models.resnext.ResNeXt

Instantiates a ResNeXt from a configuration.

Parameters

config – A configuration for a ResNeXt. See __init__() for parameters expected in the config.

Returns

A ResNeXt instance.

get_classy_state(deep_copy=False)

Get the state of the ClassyModel.

The returned state is used for checkpointing.

NOTE: For advanced users, the structure of the returned dict is {"model": {"trunk": trunk_state, "heads": heads_state}}. The trunk state is the state of the model when no heads are attached.

Parameters

deep_copy – If True, creates a deep copy of the state Dict. Otherwise, the returned Dict’s state will be tied to the object’s.

Returns

A state dictionary containing the state of the model.

set_classy_state(state, strict=True)

Set the state of the ClassyModel.

Parameters

state – The state dictionary. Must be the output of a call to get_classy_state().

This is used to load the state of the model from a checkpoint.

class classy_vision.models.ResNeXt3D(*args, **kwargs)

Implementation of:

1. Conventional post-activated 3D ResNe(X)t.

2. Pre-activated 3D ResNe(X)t.

The model consists of one stem, a number of stages, and one or multiple heads that are attached to different blocks in the stage.

__init__(input_key, input_planes, clip_crop_size, skip_transformation_type, residual_transformation_type, frames_per_clip, num_blocks, stem_name, stem_planes, stem_temporal_kernel, stem_spatial_kernel, stem_maxpool, stage_planes, stage_temporal_kernel_basis, temporal_conv_1x1, stage_temporal_stride, stage_spatial_stride, num_groups, width_per_group, zero_init_residual_transform)

Parameters
  • input_key (str) – a key that can index into model input that is of dict type.

  • input_planes (int) – the channel dimension of the input. Normally 3 is used for rgb input.

  • clip_crop_size (int) – spatial cropping size of video clip at train time.

  • skip_transformation_type (str) – the type of skip transformation.

  • residual_transformation_type (str) – the type of residual transformation.

  • frames_per_clip (int) – Number of frames in a video clip.

  • num_blocks (list) – list of the number of blocks in stages.

  • stem_name (str) – name of model stem.

  • stem_planes (int) – the output dimension of the convolution in the model stem.

  • stem_temporal_kernel (int) – the temporal kernel size of the convolution in the model stem.

  • stem_spatial_kernel (int) – the spatial kernel size of the convolution in the model stem.

  • stem_maxpool (bool) – If true, perform max pooling.

  • stage_planes (int) – the output channel dimension of the 1st residual stage

  • stage_temporal_kernel_basis (list) – Basis of temporal kernel sizes for each of the stage.

  • temporal_conv_1x1 (bool) – Only useful for BottleneckTransformation. In a pathway, if True, do temporal convolution in the first 1x1 Conv3d. Otherwise, do it in the second 3x3 Conv3d.

  • stage_temporal_stride (int) – the temporal stride of the residual transformation.

  • stage_spatial_stride (int) – the spatial stride of the residual transformation.

  • num_groups (int) – number of groups for the convolution. num_groups = 1 is for standard ResNet like networks, and num_groups > 1 is for ResNeXt like networks.

  • width_per_group (int) – Number of channels per group in 2nd (group) conv in the residual transformation in the first stage

  • zero_init_residual_transform (bool) – if true, the weight of last operation, which could be either BatchNorm3D in post-activated transformation or Conv3D in pre-activated transformation, in the residual transformation is initialized to zero

classmethod from_config(config: Dict[str, Any]) classy_vision.models.resnext3d.ResNeXt3D

Instantiates a ResNeXt3D from a configuration.

Parameters

config – A configuration for a ResNeXt3D. See __init__() for parameters expected in the config.

Returns

A ResNeXt3D instance.

class classy_vision.models.ResNet(*args, **kwargs)

ResNet is a special case of ResNeXt.

__init__(**kwargs)

See ResNeXt.__init__()

class classy_vision.models.SqueezeAndExcitationLayer(in_planes, reduction_ratio: Optional[int] = 16, reduced_planes: Optional[int] = None, activation: Optional[torch.nn.modules.module.Module] = None)

Squeeze and excitation layer, as per https://arxiv.org/pdf/1709.01507.pdf
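The computation described in the paper can be sketched with plain Python lists: global average pooling per channel (squeeze), two small fully connected layers with ReLU and sigmoid (excitation), then per-channel rescaling. This is a simplified illustration, not the layer's actual code: squeeze_excite is a hypothetical name, biases are omitted, and deriving the hidden size as in_planes // reduction_ratio is an assumption about how reduction_ratio is used.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """feature_map: list of channels, each a flat list of spatial values.
    w1: reduced_planes x in_planes weights, w2: in_planes x reduced_planes."""
    # Squeeze: one average per channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation, step 1: reduce and apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand and apply sigmoid to get per-channel gates.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale every value in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feature_map)]


in_planes, reduction_ratio = 4, 2
reduced_planes = in_planes // reduction_ratio  # assumed derivation
features = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [5.0, 5.0]]
# All-zero weights give a gate of sigmoid(0) = 0.5 for every channel.
w1 = [[0.0] * in_planes for _ in range(reduced_planes)]
w2 = [[0.0] * reduced_planes for _ in range(in_planes)]

print(squeeze_excite(features, w1, w2))
# [[0.5, 1.5], [1.0, 1.0], [0.0, 2.0], [2.5, 2.5]]
```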

__init__(in_planes, reduction_ratio: Optional[int] = 16, reduced_planes: Optional[int] = None, activation: Optional[torch.nn.modules.module.Module] = None)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class classy_vision.models.VisionTransformer(*args, **kwargs)

Vision Transformer as per https://arxiv.org/abs/2010.11929.

__init__(image_size, patch_size, num_layers, num_heads, hidden_dim, mlp_dim, dropout_rate=0, attention_dropout_rate=0, classifier='token', conv_stem_layers: Optional[Union[List[classy_vision.models.vision_transformer.ConvStemLayer], List[Dict]]] = None)

Constructor for ClassyModel.

forward(x: torch.Tensor)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config)

Instantiates a ClassyModel from a configuration.

Parameters

config – A configuration for the ClassyModel.

Returns

A ClassyModel instance.

property input_shape

Returns the input shape that the model can accept, excluding the batch dimension.

By default it returns (3, 224, 224).

set_classy_state(state, strict=True)

Set the state of the ClassyModel.

Parameters

state – The state dictionary. Must be the output of a call to get_classy_state().

This is used to load the state of the model from a checkpoint.

classy_vision.models.build_model(config)

Builds a ClassyModel from a config.

This assumes a "name" key in the config which is used to determine what model class to instantiate. For instance, a config {"name": "my_model", "foo": "bar"} will find a class that was registered as "my_model" (see register_model()) and call .from_config on it.

classy_vision.models.register_model(name, bypass_checks=False)

Registers a ClassyModel subclass.

This decorator allows Classy Vision to instantiate a subclass of ClassyModel from a configuration file, even if the class itself is not part of the Classy Vision framework. To use it, apply this decorator to a ClassyModel subclass, like this:

@register_model('resnet')
class ResidualNet(ClassyModel):
    ...

To instantiate a model from a configuration file, see build_model().
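The interplay between registration and building can be sketched with a plain dictionary as the registry. This is a simplified stand-in rather than Classy Vision's code: TOY_REGISTRY, register_toy_model, build_toy_model, and MyModel are hypothetical names, and the real decorator performs additional checks (see bypass_checks) that are not modeled here.

```python
TOY_REGISTRY = {}


def register_toy_model(name):
    """Decorator that records a class under the given name."""

    def decorator(cls):
        if name in TOY_REGISTRY:
            raise ValueError(f"Model {name!r} is already registered")
        TOY_REGISTRY[name] = cls
        return cls

    return decorator


def build_toy_model(config):
    """Look up the class by config["name"] and delegate to from_config."""
    return TOY_REGISTRY[config["name"]].from_config(config)


@register_toy_model("my_model")
class MyModel:
    def __init__(self, foo):
        self.foo = foo

    @classmethod
    def from_config(cls, config):
        return cls(foo=config["foo"])


model = build_toy_model({"name": "my_model", "foo": "bar"})
print(type(model).__name__, model.foo)  # MyModel bar
```

Because the lookup happens by name at build time, a model defined outside the framework only needs to be imported (so that its decorator runs) before the config is processed.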