-
Notifications
You must be signed in to change notification settings - Fork 130
models mmeft
Multimodal Early Fusion Transformer (MMEFT) is a transformer-based model tailored for processing both structured and unstructured data.
It can be used for multi-class and multi-label multimodal classification tasks, and is capable of handling datasets with features from diverse modes, including categorical, numerical, image, and text. MMEFT architecture is composed of embedding, fusion, aggregation, and output layers. The embedding layer produces independent non-contextual embeddings for features of varying modes. Then, the fusion Layer integrates the non-contextual embeddings to yield contextual multimodal embeddings. The aggregation layer consolidates these contextual multimodal embeddings into a single multimodal embedding vector. Lastly, the output Layer, processes the final multimodal embedding to generate the model's prediction based on task for which it is used. MMEFT uses bert-base-uncased from HuggingFace for text data embeddings, resnet-18 from HuggingFace for image data embeddings, Feature Tokenizer + Transformer (FT-Transofrmer) from Revisiting Deep Learning Models for Tabular Data for tabular data embeddings. This model is designed to offer a comprehensive approach to multimodal data, ensuring accurate and efficient classification across varied datasets.
NOTE: We highly recommend to finetune the model on your dataset before deploying.
Inference type | Python sample (Notebook) | CLI with YAML |
---|---|---|
Real time | multimodal-classification-online-endpoint.ipynb | multimodal-classification-online-endpoint.sh |
Batch | multimodal-classification-batch-endpoint.ipynb | multimodal-classification-batch-endpoint.sh |
Task | Dataset | Python sample (Notebook) | CLI with YAML |
---|---|---|---|
Multimodal multi-class classification | Airbnb listings dataset | multimodal-multiclass-classification.ipynb | multimodal-multiclass-classification.sh |
Multimodal multi-label classification | Chest X-Rays dataset | multimodal-multilabel-classification.ipynb | multimodal-multilabel-classification.sh |
{
"input_data": {
"columns": ["column1","column2","column3","column4","column5","column6"],
"data": [[22,11.2,"It was a great experience!",image1,"Categorical value",True],
[111,8.2,"I may not consider this option again.",image2,"Categorical value",False]
]
}
}
Note:
- "image1", "image2" are strings in base64 format.
[
{
"label1": 0.1,
"label2": 0.7,
"label3": 0.2
},
{
"label1": 0.3,
"label2": 0.3,
"label3": 0.4
},
]
Version: 5
license : mit
task : multimodal-classification
hiddenlayerscanned
openmmlab_model_id : mmeft
SharedComputeCapacityEnabled
finetune_compute_allow_list : ['Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC4as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
inference_compute_allow_list : ['Standard_DS2_v2', 'Standard_D2a_v4', 'Standard_D2as_v4', 'Standard_DS3_v2', 'Standard_D4a_v4', 'Standard_D4as_v4', 'Standard_DS4_v2', 'Standard_D8a_v4', 'Standard_D8as_v4', 'Standard_DS5_v2', 'Standard_D16a_v4', 'Standard_D16as_v4', 'Standard_D32a_v4', 'Standard_D32as_v4', 'Standard_D48a_v4', 'Standard_D48as_v4', 'Standard_D64a_v4', 'Standard_D64as_v4', 'Standard_D96a_v4', 'Standard_D96as_v4', 'Standard_F4s_v2', 'Standard_FX4mds', 'Standard_F8s_v2', 'Standard_FX12mds', 'Standard_F16s_v2', 'Standard_F32s_v2', 'Standard_F48s_v2', 'Standard_F64s_v2', 'Standard_F72s_v2', 'Standard_FX24mds', 'Standard_FX36mds', 'Standard_FX48mds', 'Standard_E2s_v3', 'Standard_E4s_v3', 'Standard_E8s_v3', 'Standard_E16s_v3', 'Standard_E32s_v3', 'Standard_E48s_v3', 'Standard_E64s_v3', 'Standard_NC4as_T4_v3', 'Standard_NC6s_v3', 'Standard_NC8as_T4_v3', 'Standard_NC12s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC24s_v3', 'Standard_NC64as_T4_v3', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
View in Studio: https://ml.azure.com/registries/azureml/models/mmeft/version/5
License: mit
SharedComputeCapacityEnabled: True
finetune-min-sku-spec: 4|1|28|176
finetune-recommended-sku: Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC4as_T4_v3, Standard_NC64as_T4_v3, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
finetuning-tasks: multimodal-classification
inference-min-sku-spec: 2|0|7|14
inference-recommended-sku: Standard_DS2_v2, Standard_D2a_v4, Standard_D2as_v4, Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F4s_v2, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E2s_v3, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2