Before reading this, please go over the Getting Started tutorial.
Working with Classy Vision requires models to be instances of ClassyModel
. A ClassyModel
is an instance of torch.nn.Module
, but packed with a lot of extra features!
If your model isn't implemented as a ClassyModel
, don't fret - you can easily convert it to one in one line.
In this tutorial, we will cover:
As ClassyModel
s are also instances of nn.Module
, they can be treated as any normal PyTorch model.
import torch
from classy_vision.models import build_model
model = build_model({"name": "resnet50"})
input = torch.ones(10, 3, 224, 224) # a batch of 10 images with 3 channels with dimensions of 224 x 224
output = model(input)
Classy Vision provides the functions get_classy_state()
and set_classy_state()
to fetch and save the state of the models. These are considered drop-in replacements for the torch.nn.Module.state_dict
and torch.nn.Module.load_state_dict()
functions and work similarly. For more information, refer to the docs.
state = model.get_classy_state()
model.set_classy_state(state)
A lot of work in Computer Vision utilizes the concept of re-using a trunk model, like a ResNet 50, and using it for various tasks. This is accomplished by attaching different "heads" to the end of the trunk.
Some use cases involve re-training a model trained with a certain head by removing the old head and attaching a new one. This is a special case of fine tuning. If you are interested in fine tuning your models, there's a tutorial for that as well! But first, let's understand the basics.
Normally, attaching heads or changing them requires users to write code and update their model implementations. Classy Vision does all of this work for you - all of this happens under the hood, with no work required by users!
All you need to do is decide which ClassyHead
you want to attach to your model and where. We will use a simple fully connected head in our example, and attach it to the output of the block3-2
module of our model. Note that a head can be attached to any module, as long as the name of the module is unique.
from classy_vision.heads import FullyConnectedHead
# a resnet 50 model's trunk outputs a tensor of 2048 dimension, which will be the
# in_plane of out head
#
# let's say we want a 100 dimensional output
#
# Tip: you can use build_head() as well to create a head instead of initializing the
# class directly
head = FullyConnectedHead(unique_id="default", num_classes=100, in_plane=2048)
# let's attach this head to our model
model.set_heads({"block3-2": [head]})
output = model(input)
assert output.shape == (10, 100)
# let's change the head one more time
head = FullyConnectedHead(unique_id="default", num_classes=10, in_plane=2048)
model.set_heads({"block3-2": [head]})
output = model(input)
assert output.shape == (10, 10)
Classy Vision supports attaching multiple heads to one or more blocks as well, but that is an advanced concept which this tutorial does not cover. For inquisitive users, here is an example -
head_1_1 = FullyConnectedHead(unique_id="1_1", num_classes=10, in_plane=1024)
head_1_2 = FullyConnectedHead(unique_id="1_2", num_classes=20, in_plane=1024)
head_2 = FullyConnectedHead(unique_id="2", num_classes=100, in_plane=2048)
# we can attach these heads to our model at different blocks
model.set_heads({"block2-2": [head_1_1, head_1_2], "block3-2": [head_2]})
output = model(input)
This section will demonstrate: (1) how to create a custom model within Classy Vision; (2) how to integrate your model with Classy Vision's configuration system; (3) how to use the model for training and inference;
Creating a new model in Classy Vision is as simple as creating one within PyTorch. The model needs to derive from ClassyModel
and implement a forward
method to perform inference. ClassyModel
inherits from torch.nn.Module
, so it works exactly as you would expect.
import torch.nn as nn
from classy_vision.models import ClassyModel
class MyModel(ClassyModel):
def __init__(self, num_classes):
super().__init__()
# Average all the pixels, generate one output per class
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
num_channels = 3
self.fc = nn.Linear(num_channels, num_classes)
def forward(self, x):
# perform average pooling
out = self.avgpool(x)
# reshape the output and apply the fc layer
out = out.reshape(out.size(0), -1)
out = self.fc(out)
return out
Now we can start using this model for training. Take a look at our Getting started tutorial for more details on how to train a model from a Jupyter notebook.
from classy_vision.tasks import ClassificationTask
my_model = MyModel(num_classes=1000)
my_task = ClassificationTask().set_model(my_model)
Classy Vision is also able to read a configuration file and instantiate the model. This is useful to keep your experiments organized and reproducible. For that, you have to:
from_config
methodregister_model
decorator to MyModel
import torch.nn as nn
from classy_vision.models import ClassyModel, register_model
@register_model("my_model")
class MyModel(ClassyModel):
def __init__(self, num_classes):
super().__init__()
# Average all the pixels, generate one output per class
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
num_channels = 3
self.fc = nn.Linear(num_channels, num_classes)
@classmethod
def from_config(cls, config):
# This method takes a configuration dictionary
# and returns an instance of the class. In this case,
# we'll let the number of classes be configurable.
return cls(num_classes=config["num_classes"])
def forward(self, x):
# perform average pooling
out = self.avgpool(x)
# reshape the output and apply the fc layer
out = out.reshape(out.size(0), -1)
out = self.fc(out)
return out
Now we can start using this model in our configurations. The argument passed to register_model
is used to identify the model class in the configuration:
from classy_vision.models import build_model
import torch
model_config = {
"name": "my_model",
"num_classes": 3
}
my_model = build_model(model_config)
assert isinstance(my_model, MyModel)
# my_model inherits from torch.nn.Module, so inference works as usual:
x = torch.rand((1, 3, 200, 200))
with torch.no_grad():
print(my_model(x))
tensor([[-0.1739, -0.7974, -0.0818]])
Now that your model is integrated with the configuration system, you can train it using classy_train.py
as described in the Getting started tutorial, no further changes are needed! Just make sure the code defining your model is in the models
folder of your classy project.
Any model can be converted to a Classy Model with a simple function call - ClassyModel.from_model()
from torchvision.models import resnet18
from classy_vision.models import ClassyModel
model = resnet18()
classy_model = ClassyModel.from_model(model)
output = classy_model(input)
assert output.shape == (10, 1000)
In fact, as soon as a model becomes a Classy Model, it gains all its abilities as well, including the ability to attach heads! Let us inspect the original model to see the modules it comprises.
model
ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): Sequential( (0): BasicBlock( (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) (1): BasicBlock( (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (layer2): Sequential( (0): BasicBlock( (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (downsample): Sequential( (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (layer3): Sequential( (0): BasicBlock( (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (downsample): Sequential( (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (layer4): Sequential( (0): BasicBlock( (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): BasicBlock( (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (avgpool): AdaptiveAvgPool2d(output_size=(1, 1)) (fc): Linear(in_features=512, out_features=1000, bias=True) )
It seems that the final trunk layer of this model is called layer4
. Let's try to attach heads here.
# the output of layer4 is 512 dimensional
head = FullyConnectedHead(unique_id="default", num_classes=10, in_plane=512)
classy_model.set_heads({"layer4": [head]})
output = classy_model(input)
assert output.shape == (10, 10) # it works!
You might be wondering how to figure out the in_plane
for any module. A simple trick is to try attaching any head and noticing the Exception
if there is a size mismatch!
try:
head = FullyConnectedHead(unique_id="default", num_classes=10, in_plane=1234)
classy_model.set_heads({"layer4": [head]})
output = classy_model(input)
except Exception as e:
print(e)
size mismatch, m1: [10 x 512], m2: [1234 x 10] at ../aten/src/TH/generic/THTensorMath.cpp:136
The error tells us that the size should be 512.
In this tutorial, we covered how to use Classy Models, how to get and set their state, and how to create our own models & integrating them with the configuration system. We also got familiarized with the concept of heads and how they work with Classy Vision. Lastly, we learned how we can easily convert any PyTorch models to Classy Models and unlock all the features they provide.
For more information, refer to our API docs.