- Create a Conda environment.
- Install pip via Conda.
- Run the installation script: `./install_deps.sh`
- Run the tests: `pytest -rx`
In `config/Config.py`, there is a variable `Config` which should be used to set all hyperparameters and settings. Do not instantiate another instance of the `__Config` class; `Config` is an EasyDict, so you can simply use `Config.update(newConfig)` to update it.

- `Config.load_yaml_config(path)` loads settings from a YAML file.
- `Config.save_yaml_config(path, keys_to_save)` saves settings to a YAML file; `keys_to_save` is a list of the keys which should be saved, alongside their values, in the YAML file.
- `Config.load_cmd_args(**defaults)` loads settings from command-line arguments. Only the keys in `defaults` are read from the command line, and their corresponding values define the default to use when the argument is not present. If nothing is passed, `Config` itself is used as the defaults (see the sketch after this list).
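For example, a typical loading sequence might look like the following sketch. The YAML paths and hyperparameter names are illustrative placeholders rather than files or keys defined in this repo, and the import assumes the module is importable as `config.Config`:

```python
from config.Config import Config

# Load base settings from a YAML file (path is a placeholder).
Config.load_yaml_config("configs/base.yaml")

# Allow `learning_rate` and `batch_size` to be overridden from the command line;
# the values given here are the defaults used when the flags are absent.
Config.load_cmd_args(learning_rate=1e-4, batch_size=256)

# Persist the keys we care about for this run.
Config.save_yaml_config("configs/run.yaml", ["learning_rate", "batch_size"])
```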
Certain properties of `Config` automatically parse string values assigned to them and apply the resulting settings. Currently, these are `model_args`, `train_args`, and `peft_args`. For example, if you set `Config.model_args = "hidden_size=768,patch_size=16,alpha=0.5"`, then `Config.hidden_size == 768`, `Config.patch_size == 16`, and `Config.alpha == 0.5`. Note that the original string property is still present in `Config`, so `Config.model_args` would still contain `"hidden_size=768,patch_size=16,alpha=0.5"`.
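The parsing behaves roughly like the sketch below. The helper `parse_arg_string` is hypothetical and shown only to illustrate the semantics; the actual logic lives in `config/Config.py`:

```python
import ast

def parse_arg_string(arg_string):
    """Illustrative only: split a "key=value,key=value" string into typed entries."""
    parsed = {}
    for pair in arg_string.split(","):
        key, value = pair.split("=", 1)
        try:
            # Interpret numbers, booleans, etc. where possible; otherwise keep the string.
            parsed[key.strip()] = ast.literal_eval(value.strip())
        except (ValueError, SyntaxError):
            parsed[key.strip()] = value.strip()
    return parsed

# parse_arg_string("hidden_size=768,patch_size=16,alpha=0.5")
# -> {"hidden_size": 768, "patch_size": 16, "alpha": 0.5}
```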
In `model/base.py`, we define `class MoEBase(torch.nn.Module)` and `class MergedMoEwithPEFTBase(MoEBase)`.

A model is constructed with `model = MoEBase(root_dir, model_path)`, where `root_dir` is the root directory that stores the model weights, the experts' weights, and the YAML config, and `model_path` is the HuggingFace path for the model, e.g. `"huggyllama/llama-7b"`. By default, this simply holds the original model, unchanged. To MoE-fy it, call `model.install_moe_layers(**kwargs)`, which sets up the model with DeepSpeed MoE layers according to the passed `kwargs`. Then call `model.load_moe_weights()`, which loads from a filepath relative to `root_dir` determined by specific keys of `Config`. Currently, these keys are `gate_type`, `num_experts`, `k`, and `tasks`. For example, if `root_dir` is `"moe-base"`, one possible path to load from is `moe-base/gate_type=expert_choice/num_experts=32_k=2_task=c4/moe_weights.pt`. After installing our layers and loading their weights, our MoE-fied model should be good to go.
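Putting this together, a typical setup might look like the following sketch. The `install_moe_layers` kwargs shown are placeholders (the accepted kwargs are not documented here), and the import paths assume `model.base` and `config.Config`:

```python
from config.Config import Config
from model.base import MoEBase

# Config keys that determine where MoE weights are loaded from.
Config.update({"gate_type": "expert_choice", "num_experts": 32, "k": 2, "tasks": "c4"})

# Wrap the original HuggingFace model; "moe-base" is the root directory.
model = MoEBase("moe-base", "huggyllama/llama-7b")

# MoE-fy the model with DeepSpeed MoE layers (kwarg names are placeholders).
model.install_moe_layers(num_experts=Config.num_experts, k=Config.k)

# Loads from a path under "moe-base" derived from the Config keys above, e.g.
# moe-base/gate_type=expert_choice/num_experts=32_k=2_task=c4/moe_weights.pt
model.load_moe_weights()
```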
`MergedMoEwithPEFTBase` implements PEFT analogs of the weight loading/saving functionality and directory structure described above. The main difference is that the MoE filepaths are hashed, so that, given a `root_dir` of `finetune`, the weight path might look like `finetune/37569875693847562984/lora_dim=32_peft_task=c4/peft_weights.pt`. We do not currently have `model.install_peft_layers` implemented, as we have not yet decided whether we are definitively going with LoRA or whether we might go with prefix/prompt tuning, adapters, or other PEFT methods.
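If the constructor mirrors `MoEBase`'s (an assumption; the signature is not documented above), construction would look like:

```python
from model.base import MergedMoEwithPEFTBase

# Assumed constructor signature, mirroring MoEBase(root_dir, model_path).
peft_model = MergedMoEwithPEFTBase("finetune", "huggyllama/llama-7b")

# PEFT weights are then saved/loaded under a hashed subdirectory of "finetune",
# e.g. finetune/<hash>/lora_dim=32_peft_task=c4/peft_weights.pt
```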
`class MoEModule(pl.LightningModule)` uses the above functionality (specifically `model.install_moe_layers`) to set up upcycling in the PyTorch Lightning trainer. We also have `MoEDataModule(pl.LightningDataModule)` for loading data, which we use in `class ImageNet(MoEDataModule)`. If we set `data_module = ImageNet(module.model)`, we can simply use `data_module.train_dataloader()` and `data_module.val_dataloader()`.
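Hooking these into a Lightning `Trainer` might look like the sketch below. The `MoEModule` constructor arguments are not documented above, so `module` is assumed to be an already-constructed instance (e.g. the `Vit` class below), and the trainer settings are placeholders:

```python
import pytorch_lightning as pl

data_module = ImageNet(module.model)

trainer = pl.Trainer(max_epochs=1)  # placeholder settings
trainer.fit(
    module,
    train_dataloaders=data_module.train_dataloader(),
    val_dataloaders=data_module.val_dataloader(),
)
```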
There is a class for the model (`class MoELLMBase(MergedMoEwithPEFTBase)`), and a training class will follow soon.
There are classes both for the model (`class MoEVitBase(MergedMoEwithPEFTBase)`) and for training (`class Vit(MoEModule)`); the latter implements the training and validation steps.
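For orientation, Lightning training/validation steps typically follow the pattern below. This is a generic sketch under the assumption that the module keeps the MoE-fied model in `self.model`; it is not the actual implementation of `Vit`:

```python
import torch.nn.functional as F
import pytorch_lightning as pl

class ClassificationStepsSketch(pl.LightningModule):
    """Generic Lightning step pattern; NOT the repository's Vit implementation."""

    def training_step(self, batch, batch_idx):
        images, labels = batch
        logits = self.model(images)            # assumes the MoE-fied model is on self.model
        loss = F.cross_entropy(logits, labels)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        images, labels = batch
        logits = self.model(images)
        loss = F.cross_entropy(logits, labels)
        acc = (logits.argmax(dim=-1) == labels).float().mean()
        self.log_dict({"val_loss": loss, "val_acc": acc})
```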