Documentation: https://karma.eka.care
Source Code: https://github.com/eka-care/KARMA-OpenMedEvalKit
KARMA provides a unified package for evaluating medical AI systems, supporting text, image, and audio-based models. The framework includes support for 12 medical datasets and offers standardized evaluation metrics commonly used in healthcare AI research.
The key features are:
- Fast: High-throughput evaluation capable of processing thousands of medical examples efficiently
- Easy: Designed to be easy to use and learn. Less time reading docs, more time evaluating models
- Comprehensive: Support for 12+ medical datasets across multiple modalities (text, images, VQA)
- Model Agnostic: Works with any model - Qwen, MedGemma, API providers (OpenAI, AWS Bedrock) or your custom architecture
- Smart Caching: Intelligent result caching with DuckDB/DynamoDB backends for faster re-evaluations
- Standards-based: Extensible architecture with registry-based auto-discovery of models and datasets
Quick install:

pip install karma-medeval

Contents:
- Requirements
- Installation
- Example
- Supported Models
- Custom Model and Dataset Registration
- Usage
- Configuration
- Contributing
- License
Install KARMA from PyPI:
pip install karma-medeval

Or install from source:
# Clone the repository
git clone https://github.com/eka-care/KARMA-OpenMedEvalKit.git
cd KARMA-OpenMedEvalKit
# Install with uv (recommended)
uv sync
# Or install with pip
pip install -e .
# source the environment
source .venv/bin/activate

Evaluate your first medical AI model, using Qwen3 as the example:
$ karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa

KARMA depends on PyTorch and HuggingFace Transformers.
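As a quick, optional sanity check (not a KARMA command), you can confirm that both dependencies import cleanly in the environment you just set up:

# Optional sanity check: verify PyTorch and Transformers are importable.
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)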
Check supported models with:

$ karma list models

KARMA supports custom model integration through its registry system. See the Contributing section for details on adding new models.
KARMA uses a decorator-based registry system that makes it easy to add your own models and datasets for evaluation.
Create a new model by inheriting from BaseHFModel, then call register_model_meta from the model registry with a ModelMeta describing the model. See the sample implementation in qwen.py; multiple models from the same family can be registered this way. Pass any model-specific inputs through loader_kwargs in ModelMeta; they must be accepted as __init__ parameters, because the registry forwards them as keyword arguments when instantiating the model.
import logging
from typing import Optional

from karma.models.base_model_abs import BaseHFModel
from karma.data_models.model_meta import ModelMeta, ModelType, ModalityType
from karma.registries.model_registry import register_model_meta

logger = logging.getLogger(__name__)
class MyCustomModel(BaseHFModel):
    """Custom model implementation."""

    def __init__(
        self,
        model_name_or_path: str,
        device: str = "mps",
        max_tokens: int = 32768,
        temperature: float = 0.7,
        top_p: float = 0.9,
        top_k: Optional[int] = None,
        enable_thinking: bool = True,
        **kwargs,
    ):
        super().__init__(
            model_name_or_path=model_name_or_path,
            device=device,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            enable_thinking=enable_thinking,
            **kwargs,
        )

    ...
my_custom_model = ModelMeta(
    name="Qwen/Qwen3-1.7B",
    description="QWEN model",
    loader_class="karma.models.custom.MyCustomModel",
    loader_kwargs={
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.9,
        "enable_thinking": True,
        "max_tokens": 256,
    },
    revision=None,
    reference=None,
    model_type=ModelType.TEXT_GENERATION,
    modalities=[ModalityType.TEXT],
    n_parameters=None,
    memory_usage_mb=None,
    max_tokens=None,
    embed_dim=None,
    framework=["PyTorch", "Transformers"],
)
register_model_meta(my_custom_model)

Create a new dataset by inheriting from BaseMultimodalDataset and using the @register_dataset decorator:
from typing import Optional

from karma.eval_datasets.base_dataset import BaseMultimodalDataset
from karma.registries.dataset_registry import register_dataset


@register_dataset(
    "my_custom_dataset",
    metrics=["exact_match", "accuracy"],
    task_type="mcqa",
    required_args=["domain"],
    optional_args=["split", "subset"],
    default_args={"split": "test"},
)
class MyCustomDataset(BaseMultimodalDataset):
    """Custom dataset implementation."""

    def __init__(self, domain: str, split: str = "test", subset: Optional[str] = None, **kwargs):
        self.domain = domain
        self.split = split
        self.subset = subset
        super().__init__(**kwargs)
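Before wiring the dataset into the CLI, you can sanity-check the class directly in Python. The snippet below is only an illustrative sketch: it assumes BaseMultimodalDataset needs no additional required constructor arguments, and it is not how KARMA's registry resolves dataset arguments internally.

# Illustrative only: construct the dataset roughly the way the registry would
# after parsing --dataset-args "my_custom_dataset:domain=medical".
# Assumes BaseMultimodalDataset requires no further constructor arguments.
ds = MyCustomDataset(domain="medical")
print(ds.domain, ds.split, ds.subset)  # expected: medical test None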
After defining your custom model and dataset, use them with the CLI:
# Use your custom model and dataset
karma eval --model my_custom_model --model-path "path/to/model" \
  --datasets "my_custom_dataset" \
  --dataset-args "my_custom_dataset:domain=medical" \
  --model-kwargs '{"temperature":0.5}'

Model Registration:
- name: Unique identifier for your model
Dataset Registration:
- name: Unique identifier for your dataset
- metrics: List of applicable metrics (e.g., ["exact_match", "bleu", "accuracy"])
- task_type: Type of task ("mcqa", "vqa", "translation", "qa")
- required_args: Arguments that must be provided when creating the dataset
- optional_args: Arguments that can be provided but have defaults
- default_args: Default values for arguments
List available resources:
karma list models
karma list datasets

Basic evaluation:
karma eval --model qwen --model-path "Qwen/Qwen3-0.6B"

Evaluate specific datasets:

karma eval --model qwen --model-path "Qwen/Qwen3-0.6B" --datasets "pubmedqa,medmcqa"

With dataset-specific arguments:
karma eval --model qwen --model-path "Qwen/Qwen3-0.6B" --datasets "in22conv" \
  --dataset-args "in22conv:source_language=en,target_language=hi"

Advanced options:
karma eval --model qwen --model-path "Qwen/Qwen3-0.6B" \
--datasets "pubmedqa" --batch-size 16 --output results.json --no-cacheKARMA supports environment-based configuration. Create a .env file:
# Cache configuration
KARMA_CACHE_TYPE=duckdb
KARMA_CACHE_PATH=./cache.db
# Model configuration
HUGGINGFACE_TOKEN=your_token
LOG_LEVEL=INFO

Supported cache backends:
- DuckDB (default) - for local development
- DynamoDB - for production environments
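For DynamoDB, a .env might look like the sketch below; apart from KARMA_CACHE_TYPE, the exact keys and accepted values are assumptions here, so confirm them against the KARMA documentation before relying on them:

# Illustrative only - confirm the exact variable names and values in the KARMA docs
KARMA_CACHE_TYPE=dynamodb
# AWS credentials and region are typically taken from the standard AWS environment
AWS_REGION=us-east-1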
Enable or disable caching:
karma eval --cache # Enable (default)
karma eval --no-cache # Disable

We welcome contributions to KARMA!
KARMA uses a registry-based architecture that makes it easy to add:
- New datasets - Extend BaseMultimodalDataset and register with @register_dataset
- New models - Extend BaseHFModel and register with register_model_meta
- New metrics - Implement custom evaluation metrics
- New processors - Add data preprocessing capabilities
See the existing implementations in karma/eval_datasets/ and karma/models/ for examples.
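The metric registration API itself isn't shown in this README, so as a plain-Python illustration (independent of how KARMA actually registers metrics), here is the kind of computation a custom exact-match metric would perform:

# Plain-Python sketch of an exact-match computation; not KARMA's metric API.
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference after stripping whitespace."""
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)

# Example: two of three predictions match exactly -> 0.666...
print(exact_match(["yes", "no", "maybe"], ["yes", "no", "unsure"]))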
This project is licensed under the terms of the MIT license.