Models

classy_vision.models.build_model(config)

Builds a ClassyModel from a config.

This assumes a "name" key in the config, which determines which model class to instantiate. For instance, a config {"name": "my_model", "foo": "bar"} will find the class that was registered as "my_model" (see register_model()) and call .from_config on it.

classy_vision.models.register_model(name)

Registers a ClassyModel subclass.

This decorator allows Classy Vision to instantiate a subclass of ClassyModel from a configuration file, even if the class itself is not part of the Classy Vision framework. To use it, apply this decorator to a ClassyModel subclass, like this:

@register_model('resnet')
class ResidualNet(ClassyModel):
   ...

To instantiate a model from a configuration file, see build_model().
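The interplay between register_model() and build_model() can be illustrated with a minimal, framework-free sketch. It is not Classy Vision's actual implementation: MODEL_REGISTRY, BaseModel and MyModel are hypothetical names, and BaseModel merely stands in for ClassyModel so the example runs without any dependencies.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry; Classy Vision's internal bookkeeping may differ.
MODEL_REGISTRY: Dict[str, Type["BaseModel"]] = {}


class BaseModel:
    """Stand-in for ClassyModel, used only in this sketch."""

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "BaseModel":
        raise NotImplementedError


def register_model(name: str) -> Callable[[Type[BaseModel]], Type[BaseModel]]:
    """Decorator that records a BaseModel subclass under `name`."""

    def decorator(cls: Type[BaseModel]) -> Type[BaseModel]:
        if name in MODEL_REGISTRY:
            raise ValueError(f"Cannot register duplicate model ({name})")
        if not issubclass(cls, BaseModel):
            raise ValueError(f"Model ({name}) must extend BaseModel")
        MODEL_REGISTRY[name] = cls
        return cls

    return decorator


def build_model(config: Dict[str, Any]) -> BaseModel:
    """Look up the class registered under config["name"] and call from_config."""
    name = config["name"]
    if name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model name: {name}")
    return MODEL_REGISTRY[name].from_config(config)


@register_model("my_model")
class MyModel(BaseModel):
    def __init__(self, foo: str) -> None:
        self.foo = foo

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "MyModel":
        return cls(foo=config["foo"])


# The "name" key selects the class; the rest of the config is passed along.
model = build_model({"name": "my_model", "foo": "bar"})
```

Keeping the lookup keyed on a string is what lets a configuration file refer to classes defined outside the framework.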

class classy_vision.models.ClassyBlock(name, module)

This is a thin wrapper for head execution. It records the output of the wrapped module so that the heads forked from this module can be executed.

__init__(name, module)

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance rather than calling forward() directly, since calling the instance takes care of running the registered hooks while calling forward() silently ignores them.

set_cache_output(should_cache_output: bool = True)

Sets whether to cache the output of the wrapped module for head execution.
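The caching behavior described for ClassyBlock can be sketched without PyTorch. CachingBlock below is a hypothetical stand-in that wraps a plain callable instead of an nn.Module; the attribute names are assumptions, not the library's internals.

```python
from typing import Any, Callable, Optional


class CachingBlock:
    """Hypothetical stand-in for ClassyBlock wrapping a plain callable."""

    def __init__(self, name: str, module: Callable[[Any], Any]) -> None:
        self.name = name
        self._module = module
        self._should_cache_output = False
        self.output: Optional[Any] = None

    def set_cache_output(self, should_cache_output: bool = True) -> None:
        # Toggle whether the wrapped module's output is kept for the heads.
        self._should_cache_output = should_cache_output

    def __call__(self, x: Any) -> Any:
        output = self._module(x)
        if self._should_cache_output:
            self.output = output
        return output


block = CachingBlock("block0", lambda x: x * 2)
block(3)                    # caching is off, so nothing is stored
uncached = block.output
block.set_cache_output()    # turn caching on
result = block(3)           # the output is now recorded on the block
```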

class classy_vision.models.ClassyModel

Base class for models in Classy Vision.

A model refers either to a specific architecture (e.g. ResNet50) or a family of architectures (e.g. ResNet). Models can take arguments in the constructor in order to configure different behavior (e.g. hyperparameters). Classy Models must implement from_config() in order to allow instantiation from a configuration file. Like regular PyTorch models, Classy Models must also implement forward(), where the bulk of the inference logic lives.

Classy Models also have some advanced functionality for production fine-tuning systems. For example, we allow users to train a trunk model and then attach heads to the model via the attachable blocks. Making your model support the trunk-heads paradigm is completely optional.

__init__()

Constructor for ClassyModel.

property attachable_block_names

Return names of all attachable blocks.

build_attachable_block(name, module)

Wraps the module so that heads can be attached to it.

property evaluation_mode

Used by video models for averaging over contiguous clips.

execute_heads() → Dict[str, torch.Tensor]

extract_features(x)

Extract features from the model.

Derived classes can implement this method to extract the features before applying the final fully connected layer.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_checkpoint(checkpoint)

classmethod from_config(config: Dict[str, Any]) → classy_vision.models.classy_model.ClassyModel

Instantiates a ClassyModel from a configuration.

Parameters

config – A configuration for the ClassyModel.

Returns

A ClassyModel instance.

get_block_outputs() → Dict[str, torch.Tensor]

get_classy_state(deep_copy=False)

Get the state of the ClassyModel.

The returned state is used for checkpointing.

Parameters

deep_copy – If True, creates a deep copy of the state Dict. Otherwise, the returned Dict’s state will be tied to the object’s.

Returns

A state dictionary containing the state of the model.
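The difference between a tied state and a deep copy can be shown with a small sketch. TinyModel and its get_state() method are hypothetical and only mimic the deep_copy flag; the real state holds tensors rather than plain lists.

```python
import copy
from typing import Any, Dict


class TinyModel:
    """Hypothetical model whose state is a plain dict of lists."""

    def __init__(self) -> None:
        self._state: Dict[str, Any] = {"weights": [1.0, 2.0]}

    def get_state(self, deep_copy: bool = False) -> Dict[str, Any]:
        # Without deep_copy the caller receives the live state object.
        return copy.deepcopy(self._state) if deep_copy else self._state


model = TinyModel()
tied = model.get_state()                     # shares storage with the model
snapshot = model.get_state(deep_copy=True)   # independent copy

# Simulate a training step that changes the weights in place.
model._state["weights"][0] = 99.0
```

After the in-place update, tied reflects the new value while snapshot still holds the old one, which is why a deep copy is the safer choice when a checkpoint must not change as training continues.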

get_heads()

Returns the heads on the model.

Returns the heads as a dictionary mapping each block name to the nn.Modules attached to that block.

get_optimizer_params(bn_weight_decay=False)

Returns param groups for optimizer.

Returns a dict whose keys are "regularized_params" and "unregularized_params" and whose values are lists of PyTorch parameters.

The "weight_decay" provided as part of the optimizer is only used for "regularized_params". For "unregularized_params", weight_decay is set to 0.0.

If bn_weight_decay is False, this implementation places all trainable BatchNorm params in unregularized_params.

Override this function for any custom behavior.

Parameters

bn_weight_decay (bool) – Apply weight decay to bn params if true
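One way to picture how the two lists are consumed is to map them onto per-group optimizer options, in the list-of-dicts form that PyTorch optimizers accept. This is a sketch under assumptions: to_param_groups is a hypothetical helper, and strings stand in for parameter tensors so the example runs without PyTorch.

```python
from typing import Any, Dict, List


def to_param_groups(
    params: Dict[str, List[Any]], weight_decay: float
) -> List[Dict[str, Any]]:
    """Turn the two parameter lists into optimizer-style param groups."""
    return [
        # Regularized params use the optimizer's configured weight decay.
        {"params": params["regularized_params"], "weight_decay": weight_decay},
        # Unregularized params (e.g. BatchNorm) always use zero weight decay.
        {"params": params["unregularized_params"], "weight_decay": 0.0},
    ]


# Strings are placeholders for real parameter tensors.
params = {
    "regularized_params": ["conv.weight"],
    "unregularized_params": ["bn.weight", "bn.bias"],
}
groups = to_param_groups(params, weight_decay=1e-4)
```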

property head_outputs

Return outputs of all heads in the format of Dict[head_id, output]

Head outputs are cached during a forward pass.

property input_shape

If implemented, returns expected input tensor shape

load_head_states(state)

Load only the state (weights) of the heads.

For a trunk-heads model, this function allows the user to only update the head state of the model. Useful for attaching fine-tuned heads to a pre-trained trunk.

Parameters

state (Dict) – Contains the classy model state under key “model”

property model_depth

If implemented, returns number of layers in model

property output_shape

If implemented, returns expected output tensor shape

set_classy_state(state)

Set the state of the ClassyModel.

Parameters

state – The state dictionary. Must be the output of a call to get_classy_state().

This is used to load the state of the model from a checkpoint.

set_heads(heads: Dict[str, Dict[str, classy_vision.heads.classy_head.ClassyHead]])

Attach all the heads to corresponding blocks.

A head is expected to be a ClassyHead object. For more details, see classy_vision.heads.ClassyHead.

Parameters

heads (Dict) – A mapping between attachable block names and a dictionary of the heads attached to each block. For example, if two different teams want to attach two different heads for downstream classifiers to the 15th block, they would use:

heads = {"block15":
    {"team1": classifier_head1, "team2": classifier_head2}
}
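To make the nested mapping concrete, the sketch below runs every head on the cached output of its block and collects the results by head id. It is an illustration only: this execute_heads is a hypothetical free function, plain floats stand in for tensors, and lambdas stand in for ClassyHead objects.

```python
from typing import Callable, Dict

Head = Callable[[float], float]


def execute_heads(
    block_outputs: Dict[str, float], heads: Dict[str, Dict[str, Head]]
) -> Dict[str, float]:
    """Apply each head to the output of the block it is attached to."""
    head_outputs: Dict[str, float] = {}
    for block_name, block_heads in heads.items():
        for head_id, head in block_heads.items():
            head_outputs[head_id] = head(block_outputs[block_name])
    return head_outputs


# Two heads forked from the same block, as in the "block15" example above.
heads = {
    "block15": {
        "team1": lambda features: features + 1.0,
        "team2": lambda features: features * 10.0,
    }
}
block_outputs = {"block15": 2.0}  # cached trunk output for that block
outputs = execute_heads(block_outputs, heads)
```

Because both heads read the same cached block output, the trunk only needs to run once per forward pass regardless of how many heads are attached.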

class classy_vision.models.ClassyModelEvaluationMode

An enumeration.

DEFAULT = 0
VIDEO_CLIP_AVERAGING = 1

class classy_vision.models.ClassyModelWrapper(model: torch.nn.modules.module.Module, input_shape: Optional[Tuple] = None, output_shape: Optional[Tuple] = None, model_depth: Optional[int] = None)

Class which wraps an nn.Module within a ClassyModel.

The only required argument is the model. The other arguments are optional and enable some additional Classy Vision capabilities.

__init__(model: torch.nn.modules.module.Module, input_shape: Optional[Tuple] = None, output_shape: Optional[Tuple] = None, model_depth: Optional[int] = None)

Constructor for ClassyModelWrapper.

extract_features(x)

Extract features from the model.

Derived classes can implement this method to extract the features before applying the final fully connected layer.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

property input_shape

If implemented, returns expected input tensor shape

property model_depth

If implemented, returns number of layers in model

property output_shape

If implemented, returns expected output tensor shape

class classy_vision.models.DenseNet(num_blocks, num_classes, init_planes, growth_rate, expansion, small_input, final_bn_relu)

__init__(num_blocks, num_classes, init_planes, growth_rate, expansion, small_input, final_bn_relu)

Implementation of a standard densely connected network (DenseNet).

Set small_input to True for 32x32 sized image inputs.

Set final_bn_relu to False to exclude the final batchnorm and ReLU layers. These settings are useful when training Siamese networks.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) → classy_vision.models.densenet.DenseNet

Instantiates a DenseNet from a configuration.

Parameters

config – A configuration for a DenseNet. See __init__() for parameters expected in the config.

Returns

A DenseNet instance.
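A configuration for from_config() might look like the sketch below. It assumes that the config keys mirror the __init__() parameter names and that the model is registered under the name "densenet"; neither is stated here, so check both against the library. The values follow the standard DenseNet-121 layout and are only illustrative.

```python
# Hypothetical config; the keys are assumed to mirror the __init__() parameters.
densenet_config = {
    "name": "densenet",             # assumed registered model name
    "num_blocks": [6, 12, 24, 16],  # layers per dense block (DenseNet-121)
    "num_classes": 1000,
    "init_planes": 64,
    "growth_rate": 32,
    "expansion": 4,
    "small_input": False,           # True for 32x32 image inputs
    "final_bn_relu": True,          # False drops the final batchnorm and ReLU
}

# Sanity check: every documented __init__() parameter is present in the config.
init_params = [
    "num_blocks", "num_classes", "init_planes", "growth_rate",
    "expansion", "small_input", "final_bn_relu",
]
missing = [param for param in init_params if param not in densenet_config]
```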

get_optimizer_params()

Returns param groups for optimizer.

Returns a dict whose keys are "regularized_params" and "unregularized_params" and whose values are lists of PyTorch parameters.

The "weight_decay" provided as part of the optimizer is only used for "regularized_params". For "unregularized_params", weight_decay is set to 0.0.

If bn_weight_decay is False, this implementation places all trainable BatchNorm params in unregularized_params.

Override this function for any custom behavior.

Parameters

bn_weight_decay (bool) – Apply weight decay to bn params if true

property input_shape

If implemented, returns expected input tensor shape

property model_depth

If implemented, returns number of layers in model

property output_shape

If implemented, returns expected output tensor shape

class classy_vision.models.MLP(input_dim, output_dim, hidden_dims, dropout, first_dropout, use_batchnorm, first_batchnorm)

MLP model using ReLU. Useful for testing on CPUs.

__init__(input_dim, output_dim, hidden_dims, dropout, first_dropout, use_batchnorm, first_batchnorm)

Constructor for MLP.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) → classy_vision.models.mlp.MLP

Instantiates an MLP from a configuration.

Parameters

config – A configuration for an MLP. See __init__() for parameters expected in the config.

Returns

An MLP instance.

property input_shape

If implemented, returns expected input tensor shape

property model_depth

If implemented, returns number of layers in model

property output_shape

If implemented, returns expected output tensor shape

class classy_vision.models.ResNet(**kwargs)

ResNet is a special case of ResNeXt.

__init__(**kwargs)

See ResNeXt.__init__()

class classy_vision.models.ResNeXt(num_blocks, init_planes, reduction, small_input, zero_init_bn_residuals, base_width_and_cardinality, basic_layer, final_bn_relu)

__init__(num_blocks, init_planes, reduction, small_input, zero_init_bn_residuals, base_width_and_cardinality, basic_layer, final_bn_relu)

Implementation of ResNeXt.

Set small_input to True for 32x32 sized image inputs.

Set final_bn_relu to False to exclude the final batchnorm and ReLU layers. These settings are useful when training Siamese networks.

forward(x)

Perform computation of blocks in the order defined in get_blocks.

classmethod from_config(config: Dict[str, Any]) → classy_vision.models.resnext.ResNeXt

Instantiates a ResNeXt from a configuration.

Parameters

config – A configuration for a ResNeXt. See __init__() for parameters expected in the config.

Returns

A ResNeXt instance.

property input_shape

If implemented, returns expected input tensor shape

property model_depth

If implemented, returns number of layers in model

property output_shape

If implemented, returns expected output tensor shape

class classy_vision.models.ResNeXt3D(input_key, input_planes, clip_crop_size, skip_transformation_type, residual_transformation_type, frames_per_clip, num_blocks, stem_name, stem_planes, stem_temporal_kernel, stem_spatial_kernel, stem_maxpool, stage_planes, stage_temporal_kernel_basis, temporal_conv_1x1, stage_temporal_stride, stage_spatial_stride, num_groups, width_per_group, zero_init_residual_transform)

Implementation of:

1. Conventional post-activated 3D ResNe(X)t.

2. Pre-activated 3D ResNe(X)t.

The model consists of one stem, a number of stages, and one or more heads that are attached to different blocks in the stages.

__init__(input_key, input_planes, clip_crop_size, skip_transformation_type, residual_transformation_type, frames_per_clip, num_blocks, stem_name, stem_planes, stem_temporal_kernel, stem_spatial_kernel, stem_maxpool, stage_planes, stage_temporal_kernel_basis, temporal_conv_1x1, stage_temporal_stride, stage_spatial_stride, num_groups, width_per_group, zero_init_residual_transform)

Parameters
  • input_key (str) – a key that can index into model input that is of dict type.

  • input_planes (int) – the channel dimension of the input. Normally 3 is used for rgb input.

  • clip_crop_size (int) – spatial cropping size of video clip at train time.

  • skip_transformation_type (str) – the type of skip transformation.

  • residual_transformation_type (str) – the type of residual transformation.

  • frames_per_clip (int) – Number of frames in a video clip.

  • num_blocks (list) – list of the number of blocks in stages.

  • stem_name (str) – name of model stem.

  • stem_planes (int) – the output dimension of the convolution in the model stem.

  • stem_temporal_kernel (int) – the temporal kernel size of the convolution in the model stem.

  • stem_spatial_kernel (int) – the spatial kernel size of the convolution in the model stem.

  • stem_maxpool (bool) – If true, perform max pooling.

  • stage_planes (int) – the output channel dimension of the 1st residual stage

  • stage_temporal_kernel_basis (list) – Basis of temporal kernel sizes for each stage.

  • temporal_conv_1x1 (bool) – Only useful for BottleneckTransformation. In a pathway, if True, do temporal convolution in the first 1x1 Conv3d. Otherwise, do it in the second 3x3 Conv3d.

  • stage_temporal_stride (int) – the temporal stride of the residual transformation.

  • stage_spatial_stride (int) – the spatial stride of the residual transformation.

  • num_groups (int) – number of groups for the convolution. num_groups = 1 is for standard ResNet like networks, and num_groups > 1 is for ResNeXt like networks.

  • width_per_group (int) – Number of channels per group in 2nd (group) conv in the residual transformation in the first stage

  • zero_init_residual_transform (bool) – if true, the weight of last operation, which could be either BatchNorm3D in post-activated transformation or Conv3D in pre-activated transformation, in the residual transformation is initialized to zero
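The relationship between num_groups and width_per_group can be checked with a short calculation. The sketch assumes the usual ResNeXt convention that the grouped convolution in the first stage has num_groups * width_per_group channels in total; first_stage_group_width is a hypothetical helper, not part of the library.

```python
def first_stage_group_width(num_groups: int, width_per_group: int) -> int:
    """Total channels of the grouped conv in the first residual stage."""
    if num_groups < 1 or width_per_group < 1:
        raise ValueError("num_groups and width_per_group must be positive")
    return num_groups * width_per_group


# num_groups = 1 gives a standard ResNet-like bottleneck.
resnet_width = first_stage_group_width(num_groups=1, width_per_group=64)

# num_groups > 1 gives a ResNeXt-like network, e.g. the 32x4d configuration.
resnext_width = first_stage_group_width(num_groups=32, width_per_group=4)
```

With these settings the ResNet-like case yields 64 channels and the 32x4d case yields 128.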

classmethod from_config(config: Dict[str, Any]) → classy_vision.models.resnext3d.ResNeXt3D

Instantiates a ResNeXt3D from a configuration.

Parameters

config – A configuration for a ResNeXt3D. See __init__() for parameters expected in the config.

Returns

A ResNeXt3D instance.