MoeMixing

Setup

  1. Create a Conda environment.
  2. Install pip via Conda.
  3. Run the following command:
./install_deps.sh

Running tests

pytest -rx

Config

In config/Config.py, there is a variable Config which should be used to set all hyperparameters and settings. Do not instantiate another instance of the __Config class; Config is an EasyDict, so you can simply call Config.update(newConfig) to update it.

Config Methods

Config.load_yaml_config(path) loads settings from a YAML file.

Config.save_yaml_config(path, keys_to_save) saves settings to a YAML file; keys_to_save is a list of the keys that should be written, along with their values, to the YAML file.

Config.load_cmd_args(**defaults) loads settings from command-line arguments. Only the keys in defaults are read from the command line, and their corresponding values are used as defaults when an argument is not present. If no defaults are passed, Config itself supplies the defaults.
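
Below is a minimal sketch of a typical Config workflow, assuming the import path config.Config; the key names (lr, batch_size) and YAML paths are placeholders for illustration, not settings used in this repo.

from config.Config import Config

# Update settings in place; Config is an EasyDict, so never construct a new __Config.
Config.update({"lr": 1e-4, "batch_size": 32})

# Load settings from a YAML file, then save a subset of keys back out.
Config.load_yaml_config("configs/example.yaml")
Config.save_yaml_config("configs/example_out.yaml", ["lr", "batch_size"])

# Read overrides from the command line; only lr and batch_size are parsed,
# and the values given here are used when the corresponding flag is absent.
Config.load_cmd_args(lr=1e-4, batch_size=32)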

Arguments passed as a single string

Certain properties of Config automatically parse string values assigned to them and apply the resulting key/value settings. Currently, these are model_args, train_args, and peft_args.
For example, if you set
Config.model_args = "hidden_size=768,patch_size=16,alpha=0.5"
then
Config.hidden_size == 768
Config.patch_size == 16
Config.alpha == 0.5

Note that the original string property is still present in Config, so Config.model_args still contains "hidden_size=768,patch_size=16,alpha=0.5".

Models

Base Classes

In model/base.py, we define class MoEBase(torch.nn.Module) and class MergedMoEwithPEFTBase(MoEBase).

MoEBase

model = MoEBase(root_dir, model_path)
root_dir is the root directory that stores the model weights, the experts' weights, and the YAML config.
model_path is the HuggingFace path for the model, e.g. "huggyllama/llama-7b".

By default, this simply holds the original model, unchanged. To MoE-fy it, call
model.install_moe_layers(**kwargs)
This sets up the model with DeepSpeed MoE layers according to the passed kwargs. Then call
model.load_moe_weights()
This loads weights from a filepath relative to root_dir, determined by specific Config settings. Currently, these settings are "gate_type", "num_experts", "k", and "tasks". For example, if root_dir is "moe-base", one possible path to load from is moe-base/gate_type=expert_choice/num_experts=32_k=2_task=c4/moe_weights.pt.
After installing the MoE layers and loading their weights, the MoE-fied model is ready to use.
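
Putting these steps together, here is a minimal sketch of the MoE-fication workflow; the import path follows model/base.py, and the keyword arguments shown for install_moe_layers are illustrative assumptions, not the definitive signature.

from model.base import MoEBase

# Wrap the original HuggingFace model; weights and configs live under root_dir.
model = MoEBase(root_dir="moe-base", model_path="huggyllama/llama-7b")

# Replace the relevant layers with DeepSpeed MoE layers (kwargs assumed for illustration).
model.install_moe_layers(num_experts=32, k=2)

# Load expert weights from a path under root_dir derived from Config's
# gate_type, num_experts, k, and tasks settings, e.g.
# moe-base/gate_type=expert_choice/num_experts=32_k=2_task=c4/moe_weights.pt
model.load_moe_weights()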

MergedMoEwithPEFTBase

This class implements PEFT analogs of the weight loading/saving functionality and directory structure described above. The main difference is that the MoE filepaths are hashed, so, given a root_dir of finetune, the weight path might look like finetune/37569875693847562984/lora_dim=32_peft_task=c4/peft_weights.pt. model.install_peft_layers is not implemented yet, as we have not decided whether to commit to LoRA or to go with prefix/prompt tuning, adapters, or another PEFT method.

Training Classes

class MoEModule(pl.LightningModule) uses the above functionality (specifically model.install_moe_layers) to set up upcycling in the PyTorch Lightning trainer. We also have MoEDataModule(pl.LightningDataModule) for loading data, which we subclass in class ImageNet(MoEDataModule). If we set data_module = ImageNet(module.model), we can simply use data_module.train_dataloader() and data_module.val_dataloader(), as sketched below.
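
The following minimal sketch shows how these classes could be wired into a PyTorch Lightning run; the MoEModule constructor arguments are elided because they are not documented here, and the Trainer settings are placeholders.

import pytorch_lightning as pl

# MoEModule and ImageNet are defined in this repo; import paths are omitted here.
module = MoEModule(...)               # constructor arguments not documented above
data_module = ImageNet(module.model)  # the data module takes a reference to the model

# Train and validate with the data module's dataloaders.
trainer = pl.Trainer(max_epochs=1)
trainer.fit(
    module,
    train_dataloaders=data_module.train_dataloader(),
    val_dataloaders=data_module.val_dataloader(),
)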

LLM Classes

There is a class for the model (class MoELLMBase(MergedMoEwithPEFTBase)); a training class is coming soon.

ViT Class

There is a class for the model (class MoEVitBase(MergedMoEwithPEFTBase)) and one for training (class Vit(MoEModule)); the latter implements the training and validation steps.
