Skip to content

Latest commit

 

History

History
138 lines (102 loc) · 4.49 KB

Train-PyTorch-Model.md

File metadata and controls

138 lines (102 loc) · 4.49 KB

How to Train PyTorch Models

In this article, we demonstrate how to train a PyTorch model in RAF.

Before start, please install the following dependencies at first:

  • PyTorch
  • Torchvision (optional)
  • Hugging Face Transformers (optinal)

You can install them via pip, or refer to: https://pytorch.org/get-started/locally/ and https://huggingface.co/docs/transformers/installation

pip install torch torchvision transformers

The process to train a PyTorch model in RAF consists of importing model and writing training script, the following up sections will introduce them.

Import PyTorch Model

Users can use RAF frontend API to obtain RAF model from PyTorch model. In addition, RAF provides short-cut APIs to help users import model from torchvision or transformers.

An example of import torch model directly:

import torch
import torch.nn as nn
import torch.nn.functional as F

import raf
from raf.frontend import from_pytorch
from raf.testing import randn_torch, one_hot_torch, get_vm_executor, run_vm_executor


class TorchLeNet(nn.Module):
    def __init__(self, input_shape=28, num_classes=10):
        super(TorchLeNet, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, padding=2, bias=False)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, bias=False)
        self.linear1 = nn.Linear(((input_shape // 2 - 4) // 2) ** 2 * 16, 120)
        self.linear2 = nn.Linear(120, 84)
        self.linear3 = nn.Linear(84, num_classes)

    def forward(self, x):
        out = self.conv1(x)
        out = torch.sigmoid(out)
        out = F.avg_pool2d(out, (2, 2), (2, 2))
        out = self.conv2(out)
        out = torch.sigmoid(out)
        out = F.avg_pool2d(out, (2, 2), (2, 2))
        out = torch.flatten(out, 1)
        out = self.linear1(out)
        out = self.linear2(out)
        out = self.linear3(out)
        return out

shape_dict = {"input0": ((32, 3, 28, 28), "float32")}
device = "cuda"
input_shape, dtype = list(shape_dict.values())[0]
batch_size = input_shape[0]

t_model = TorchLeNet()
r_model = from_pytorch(t_model, shape_dict)
r_model.to(device=device)
r_model.train_mode()

Now we get the imported model. Note that there is no loss function in this model, you can append loss function to the converted model or write it in the torch model definition directly.

from raf._op import sym

# prepare random data, they just provides shape and dtype info
r_x, _ = randn_torch(input_shape, device=device, dtype=dtype)
r_ytrue, _ = one_hot_torch(size=batch_size, num_classes=10, device=device)

out = r_model.record(r_x)
y_pred = sym.log_softmax(out)
loss = sym.nll_loss(r_ytrue, y_pred)
r_model_w_loss = r_model + loss

If you want to use models in transformers or torchvision:

from raf.testing import get_transformer_model

model, out_shape = get_transformer_model("bert-base-uncased", batch_size=32, seq_length=128, dtype="float32")
model.to(device=device)
model.train_mode()
from raf.testing import get_torchvision_model

model, out_shape = get_torchvision_model("resnet18", batch_size=32, image_size=(224, 224), dtype="float32")
model.to(device=device)
model.train_mode()

Write Training Script

In order to train a model, in addition to loss function, we also need an optimizer. For a RAF model already with loss function, just use RAF optimizer API to wrap the model:

r_trainer = raf.optim.sgd.with_sgd(learning_rate=0.1, momentum=0.01)(r_model_w_loss)

Actually, we can append loss function and optimizer in one line:

from raf.testing import append_loss_n_optimizer

r_trainer = append_loss_n_optimizer(r_model, args=[r_x], out_shape=(32, 10), y_true=r_ytrue, optimizer="sgd")

Finally we can write the training script and train the model:

dy, _ = randn_torch((), std=0.0, mean=1.0, requires_grad=False, device=device)  # dy = tensor(1.0)

# get vm exec
record = r_trainer._internal(dy, r_x, r_ytrue)
executor = get_vm_executor(record.mod, device)

# training loop
for _ in range(num_epoch):
    # prepare input data, use random data as example here
    r_x, _ = randn_torch(input_shape, device=device, dtype=dtype)
    r_ytrue, _ = one_hot_torch(size=batch_size, num_classes=10, device=device)
    args = [dy, r_x, r_ytrue]
    ret = run_vm_executor(executor, record, args, device)
    loss = ret[0]  # ret[0][0] for some models
    print("LOSS:", loss)