draft migration path doc from ML Model to MLM Extension #19
base: deprecate
Conversation
This is not a strongly held opinion, but would we make lives easier on ourselves if we migrated the mlm-extension repo to the stac-extensions organization first, then updated the text here? To me it feels funny to merge this PR knowing that we'll have to go through and update a bunch of URLs shortly thereafter.

Another nice-to-have would be a migration script that we could point folks to ... maybe as part of the stac-model repo?

Other than those two points, LGTM.
Thanks for the comments @gadomski. Unfortunately I don't have the bandwidth to provide a migration script right now, and there might be some nontrivial decisions to handle in the migration, but I'm happy to review and work with others on PRs. stac-model is maintained alongside the spec here: https://github.com/crim-ca/mlm-extension/blob/main/README_STAC_MODEL.md

I'm happy to merge this PR after the migration. If STAC maintainers initiate that, @fmigneault can you accept the invite to do so for the https://github.com/crim-ca org? I'm not sure how full repo migrations work and who initiates.
I've done it a few times (for https://github.com/stactools-packages/) but it's been a minute. I believe @fmigneault can go to https://github.com/crim-ca/mlm-extension/transfer and then choose
## Shared Goals

Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim to provide a standard way to catalog machine learning (ML) models that work with Earth observation (EO) data. Their main goals are:
I suggest adding something like "but not limited to" regarding the mention of working with EO data. This is one of the main goals of the generalization provided by MLM, although using STAC often implies EO data is used somewhere.
Both the ML Model Extension and the Machine Learning Model (MLM) Extension aim to provide a standard way to catalog machine learning (ML) models that work with Earth observation (EO) data. Their main goals are:

1. **Search and Discovery**: Helping users find and use ML models.
2. **Describing Inference Requirements**: Making it easier to run these models by describing input requirements and outputs.
Consider replacing "inference" with "execution" or "usage", or adding "training" explicitly? Technically, training requirements can be defined as well as inference, using `mlm:training-runtime`.
- The MLM Extension covers more details at both the Item and Asset levels, making it easier to describe and use model metadata.
- The MLM Extension covers Runtime requirements within the [Container Asset](https://github.com/crim-ca/mlm-extension?tab=readme-ov-file#container-asset), while the ML Model Extension records [similar information](./README.md#inferencetraining-runtimes) in the `ml-model:inference-runtime` or `ml-model:training-runtime` asset roles.
- The MLM extension has a corresponding Python library, [`stac-model`](https://pypi.org/project/stac-model/), which can be used to create and validate MLM metadata. An example of the library in action is [here](https://github.com/crim-ca/mlm-extension/blob/main/stac_model/examples.py#L14). The ML Model extension does not support this and requires the JSON to be written manually by interpreting the JSON Schema or existing examples.
Adding a mention of its direct interoperability with common utilities (i.e. `pydantic` and `pystac`) would be a great justification to further motivate its use.
| `ml-model:type` | N/A | No direct equivalent; it is implied by the `mlm` prefix in MLM fields and directly specified by the schema identifier. |
| `ml-model:learning_approach` | `mlm:tasks` | Removed in favor of specifying specific `mlm:tasks`. |
| `ml-model:prediction_type` | `mlm:tasks` | `mlm:tasks` provides a more comprehensive enum of prediction types. |
| `ml-model:architecture` | `mlm:architecture` | The MLM provides specific guidance on using Papers With Code - Computer Vision identifiers for model architectures. No guidance is provided in ML Model. |
Add `*` around "Papers With Code - Computer Vision" (to italicize it).
| `ml-model:learning_approach` | `mlm:tasks` | Removed in favor of specifying specific `mlm:tasks`. |
| `ml-model:prediction_type` | `mlm:tasks` | `mlm:tasks` provides a more comprehensive enum of prediction types. |
| `ml-model:architecture` | `mlm:architecture` | The MLM provides specific guidance on using Papers With Code - Computer Vision identifiers for model architectures. No guidance is provided in ML Model. |
| `ml-model:training-processor-type` | `mlm:accelerator` | MLM defines more choices for accelerators in an enum and specifies that this is the accelerator for inference (the focus of the MLM extension is inference). ML Model only accepts `cpu` or `gpu`, but this isn't sufficient today, where we have models optimized for different CPU architectures, CUDA GPUs, Intel GPUs, AMD GPUs, Mac Silicon, and TPUs. |
I suggest removing the focus-on-inference part. Many examples use pretrained models for inference for simplicity, but there shouldn't be anything limiting the definition for training. If anything is actually missing, I would rather motivate users to submit issues so that future MLM revisions better handle training.
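The field mapping tables above could seed the migration script suggested earlier in this thread. The sketch below is a minimal, unofficial illustration based only on the rows quoted in this doc; the function name and the collision-handling choice are hypothetical, and the note about `mlm:tasks` hints at exactly the kind of nontrivial decision mentioned above:

```python
# Hypothetical migration sketch: rename ml-model:* properties to their
# mlm:* equivalents per the mapping tables in this doc. Not an official script.
FIELD_MAP = {
    "ml-model:learning_approach": "mlm:tasks",
    "ml-model:prediction_type": "mlm:tasks",  # both map to mlm:tasks: a real
                                              # script must merge, not overwrite
    "ml-model:architecture": "mlm:architecture",
    "ml-model:training-processor-type": "mlm:accelerator",
}
DROPPED = {"ml-model:type"}  # implied by the mlm prefix and schema identifier

def migrate_properties(props: dict) -> dict:
    """Return a copy with ml-model:* keys renamed and retired keys dropped."""
    out = {}
    for key, value in props.items():
        if key in DROPPED:
            continue
        out[FIELD_MAP.get(key, key)] = value
    return out

migrated = migrate_properties({
    "ml-model:type": "ml-model",
    "ml-model:architecture": "ResNet",
    "ml-model:training-processor-type": "gpu",
})
```

Value translation (e.g. mapping the old `gpu` enum onto MLM's richer accelerator enum) is deliberately left out; that is where the judgment calls live.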
- **`mlm:name`**: A required name for the model.
- **`mlm:framework`**: The framework used to train the model.
- **`mlm:framework_version`**: The version of the framework. Useful in case a container runtime asset is not specified or if the consumer of the MLM wants to run the model outside of a container.
- **`mlm:memory_size`**: The in-memory size of the model.
- **`mlm:total_parameters`**: Total number of model parameters.
- **`mlm:pretrained`**: Indicates if the model is derived from a pretrained model.
- **`mlm:pretrained_source`**: Source of the pretrained model, by name or URL if it is less well known.
- **`mlm:batch_size_suggestion`**: Suggested batch size for the given accelerator.
- **`mlm:accelerator_constrained`**: Indicates if the model requires a specific accelerator.
- **`mlm:accelerator_summary`**: Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration.
- **`mlm:accelerator_count`**: Minimum number of accelerator instances required.
- **`mlm:input`**: Describes the model's input shape, dtype, and normalization and resize transformations.
- **`mlm:output`**: Describes the model's output shape and dtype.
- **`mlm:hyperparameters`**: Additional hyperparameters relevant to the model.
I suggest using a table just to keep sections consistent and more readable.
Suggested change:

| Field Name | Description |
|------------|-------------|
| **`mlm:name`** | A required name for the model. |
| **`mlm:framework`** | The framework used to train the model. |
| **`mlm:framework_version`** | The version of the framework. Useful in case a container runtime asset is not specified or if the consumer of the MLM wants to run the model outside of a container. |
| **`mlm:memory_size`** | The in-memory size of the model. |
| **`mlm:total_parameters`** | Total number of model parameters. |
| **`mlm:pretrained`** | Indicates if the model is derived from a pretrained model. |
| **`mlm:pretrained_source`** | Source of the pretrained model, by name or URL if it is less well known. |
| **`mlm:batch_size_suggestion`** | Suggested batch size for the given accelerator. |
| **`mlm:accelerator_constrained`** | Indicates if the model requires a specific accelerator. |
| **`mlm:accelerator_summary`** | Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration. |
| **`mlm:accelerator_count`** | Minimum number of accelerator instances required. |
| **`mlm:input`** | Describes the model's input shape, dtype, and normalization and resize transformations. |
| **`mlm:output`** | Describes the model's output shape and dtype. |
| **`mlm:hyperparameters`** | Additional hyperparameters relevant to the model. |
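As a quick illustration of the field set above, here is a hypothetical Item `properties` fragment with a toy sanity check; the values and the `check_mlm` helper are made up for the example, and only `mlm:name` is described as required in this doc:

```python
# Illustrative sketch: a hypothetical MLM properties fragment using fields
# from the table above, plus toy sanity checks. Values are made up.
item_properties = {
    "mlm:name": "landcover-segmenter",
    "mlm:framework": "pytorch",
    "mlm:framework_version": "2.3.0",
    "mlm:memory_size": 512 * 1024 * 1024,  # in-memory size, bytes
    "mlm:total_parameters": 42_000_000,
    "mlm:pretrained": True,
    "mlm:pretrained_source": "ImageNet",
    "mlm:batch_size_suggestion": 16,
    "mlm:accelerator_constrained": False,
    "mlm:accelerator_count": 1,
}

def check_mlm(props: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    if "mlm:name" not in props:
        problems.append("missing required mlm:name")
    if props.get("mlm:accelerator_count", 1) < 1:
        problems.append("mlm:accelerator_count must be >= 1")
    return problems

issues = check_mlm(item_properties)
```

In practice the MLM JSON Schema (or `stac-model`) would do this validation; the sketch just shows how the fields hang together on an Item.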
- **`mlm:accelerator_constrained`**: Indicates if the model requires a specific accelerator.
- **`mlm:accelerator_summary`**: Description of the accelerator. This might contain details on the exact accelerator version (TPUv4 vs TPUv5) and their configuration.
- **`mlm:accelerator_count`**: Minimum number of accelerator instances required.
Missing the `mlm:accelerator`?
| `ml-model:inference-runtime` | `mlm:inference-runtime` | Direct conversion; same role and function. |
| `ml-model:training-runtime` | `mlm:training-runtime` | Direct conversion; same role and function. |
| `ml-model:checkpoint` | `mlm:checkpoint` | Direct conversion; same role and function. |
| N/A | `mlm:model` | New required role for model assets in MLM. This represents the asset that is loaded for inference. |
I think it would be better to describe this as the "source" of the model weights and definition (typically an ONNX file, a checkpoint, or similar). This would allow the flexibility for re-training/fine-tuning rather than only inference. However, the note about loading this asset can be left as well.
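To make the role mapping concrete, here is a hypothetical sketch of a STAC Item's `assets` using the MLM roles from the table above; the asset keys, hrefs, and media types are invented for the example:

```python
# Hypothetical STAC Item assets illustrating MLM asset roles.
# Asset keys, hrefs, and media types are made up for this sketch.
assets = {
    "weights": {
        "href": "https://example.com/model.onnx",
        "type": "application/octet-stream",
        "roles": ["mlm:model"],  # required role: the model weights/definition
    },
    "container": {
        "href": "docker://example.com/inference:latest",
        "roles": ["mlm:inference-runtime"],  # where to run the model
    },
}

# A consumer finds the model asset by role, not by asset key.
model_assets = [k for k, a in assets.items() if "mlm:model" in a.get("roles", [])]
```

Selecting by role rather than by asset key is what lets tooling stay agnostic to how a catalog names its assets.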
| N/A | `mlm:inference` | Recommended for inference pipelines. |

The MLM is focused on search, discovery, descriptions, and reproducibility of inference. Nevertheless, the MLM provides a recommended asset role for `mlm:training-runtime` and asset `mlm:training`, which can point to a container URL that has the training runtime requirements. The ML Model extension specifies a field for `ml-model:training-runtime`, but like `mlm:training` it only contains the default STAC Asset fields and additional fields specified by the Container Asset. Training requirements typically differ from inference requirements, so we recommend that fields and assets for reproducing model training or fine-tuning models be contained in a separate STAC extension.
Same as previous. Adjust "inference" to "runtime" or similar. The intent would be better described as reproducibility of experiments, whether those are for training, fine-tuning, or inference.

I agree with the start of the last sentence about differing requirements (hence the separate runtimes), but I strongly disagree about using another extension. That would only cause a lot of metadata duplication (we are already seeing a similar trend with OGC's TrainingDML-AI sub "model-extension"...). There are already examples of using MLM for both training and inference simultaneously. I would rather improve upon MLM to fill any gaps in describing the harder "training" scenario (if anything is actually missing).
Good feedback! With regard to training, Francis and I talked and we agree `mlm:hyperparameters` should be specified as a field for including training hyperparameters. Inference scalars or other "variables" that are not tensors/arrays (like a temperature setting or text input) can be represented in the InputStructure. In the future we might want to consider inference parameters or inference variables as a separate field.

I'll tweak the language throughout to remove the focus on inference at the expense of training. I'll note that required fields are marked required either for inference or for search and discovery.
@rbavery @gadomski However, once this is done (and once I get the approvals from CRIM higher-ups), I will need an admin of stac-extensions to provide the necessary permissions to adjust the settings. The migration page displays a warning that all configured protection rules will be removed during the transfer. I would like to re-enable those and make sure @rbavery, @languith, and @fmigneault (at least) have the necessary access to manage the repository.
No problem at all ... when migrated, I'll be able to add you as a maintainer of the repo so you can PR (or just YOLO-change) anything as needed. You will also be able to add others as needed.
This adds a migration doc that describes similarities and differences in capabilities between the ML Model Extension and the Machine Learning Model Extension. It also highlights how some field names have changed, enums have been expanded, and some fields have been removed. It notes that the ML Model Extension is not maintained and its primary public-facing implementation is deprecated.
If this is approved and merged, I think we should then approve and merge #16. Once the repo ownership of https://github.com/crim-ca/mlm-extension/tree/main is changed to STAC Extensions, I can update the links in this migration document to point to the new repo URL.

I think I will need assistance from the STAC Extension maintainers to make a request to CRIM to initiate a move of the entire repo, since we want to preserve stars, GitHub history, issues, etc. cc @fmigneault