Documentation: https://karma.eka.care
Source Code: https://github.com/eka-care/KARMA-OpenMedEvalKit
KARMA provides a unified package for evaluating medical AI systems, supporting text, image, and audio-based models. The framework includes support for 12 medical datasets and offers standardized evaluation metrics commonly used in healthcare AI research.
The key features are:
- Fast: High-throughput evaluation capable of processing thousands of medical examples efficiently
- Easy: Designed to be easy to use and learn. Less time reading docs, more time evaluating models
- Comprehensive: Support for 12+ medical datasets across multiple modalities (text, images, VQA)
- Model Agnostic: Works with any model - Qwen, MedGemma, API providers (OpenAI, AWS Bedrock) or your custom architecture
- Smart Caching: Intelligent result caching with DuckDB/DynamoDB backends for faster re-evaluations
- Standards-based: Extensible architecture with registry-based auto-discovery of models and datasets
Quick install:

pip install karma-medeval

Contents:
- Requirements
- Installation
- Example
- Supported Models
- Custom Model and Dataset Registration
- Usage
- Configuration
- Contributing
- License
Install KARMA from PyPI:
pip install karma-medeval

Or install from source:
# Clone the repository
git clone https://github.com/eka-care/KARMA-OpenMedEvalKit.git
cd KARMA-OpenMedEvalKit
# Install with uv (recommended)
uv sync
# Or install with pip
pip install -e .
# source the environment
source .venv/bin/activate

Evaluate your first medical AI model, using Qwen3 as the example:
$ karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa

KARMA depends on PyTorch and HuggingFace Transformers.
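As a quick, optional sanity check (not a KARMA command), you can confirm that both dependencies import cleanly in the environment you just set up:

# Optional sanity check: verify PyTorch and Transformers are importable.
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)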
Check supported models with:

$ karma list models

KARMA supports custom model integration through its registry system. See the Contributing section for details on adding new models.
KARMA uses a decorator-based registry system that makes it easy to add your own models and datasets for evaluation.
Create a new model by inheriting from BaseHFModel, then call register_model_meta from the model registry with a ModelMeta describing the model. See the sample implementation in qwen.py; multiple models from the same family can be registered this way. Pass any model-specific inputs through loader_kwargs in ModelMeta; they must be accepted as __init__ parameters, because the registry forwards them as keyword arguments when instantiating the model.
import logging
from typing import Optional

from karma.models.base_model_abs import BaseHFModel
from karma.data_models.model_meta import ModelMeta, ModelType, ModalityType
from karma.registries.model_registry import register_model_meta

logger = logging.getLogger(__name__)
class MyCustomModel(BaseHFModel):
    """Custom model implementation."""

    def __init__(
        self,
        model_name_or_path: str,
        device: str = "mps",
        max_tokens: int = 32768,
        temperature: float = 0.7,
        top_p: float = 0.9,
        top_k: Optional[int] = None,
        enable_thinking: bool = True,
        **kwargs,
    ):
        super().__init__(
            model_name_or_path=model_name_or_path,
            device=device,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            enable_thinking=enable_thinking,
            **kwargs,
        )

    ...
my_custom_model = ModelMeta(
    name="Qwen/Qwen3-1.7B",
    description="QWEN model",
    loader_class="karma.models.custom.MyCustomModel",
    loader_kwargs={
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.9,
        "enable_thinking": True,
        "max_tokens": 256,
    },
    revision=None,
    reference=None,
    model_type=ModelType.TEXT_GENERATION,
    modalities=[ModalityType.TEXT],
    n_parameters=None,
    memory_usage_mb=None,
    max_tokens=None,
    embed_dim=None,
    framework=["PyTorch", "Transformers"],
)
register_model_meta(my_custom_model)

Create a new dataset by inheriting from BaseMultimodalDataset and using the @register_dataset decorator:
from typing import Optional

from karma.eval_datasets.base_dataset import BaseMultimodalDataset
from karma.registries.dataset_registry import register_dataset


@register_dataset(
    "my_custom_dataset",
    metrics=["exact_match", "accuracy"],
    task_type="mcqa",
    required_args=["domain"],
    optional_args=["split", "subset"],
    default_args={"split": "test"},
)
class MyCustomDataset(BaseMultimodalDataset):
    """Custom dataset implementation."""

    def __init__(self, domain: str, split: str = "test", subset: Optional[str] = None, **kwargs):
        self.domain = domain
        self.split = split
        self.subset = subset
        super().__init__(**kwargs)
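Before wiring the dataset into the CLI, you can sanity-check the class directly in Python. The snippet below is only an illustrative sketch: it assumes BaseMultimodalDataset needs no additional required constructor arguments, and it is not how KARMA's registry resolves dataset arguments internally.

# Illustrative only: construct the dataset roughly the way the registry would
# after parsing --dataset-args "my_custom_dataset:domain=medical".
# Assumes BaseMultimodalDataset requires no further constructor arguments.
ds = MyCustomDataset(domain="medical")
print(ds.domain, ds.split, ds.subset)  # expected: medical test None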
After defining your custom model and dataset, use them with the CLI:
# Use your custom model and dataset
karma eval --model my_custom_model --model-path "path/to/model" \
  --datasets "my_custom_dataset" \
  --dataset-args "my_custom_dataset:domain=medical" \
  --model-kwargs '{"temperature":0.5}'

Model Registration:
- name: Unique identifier for your model
Dataset Registration:
- name: Unique identifier for your dataset
- metrics: List of applicable metrics (e.g., ["exact_match", "bleu", "accuracy"])
- task_type: Type of task ("mcqa", "vqa", "translation", "qa")
- required_args: Arguments that must be provided when creating the dataset
- optional_args: Arguments that can be provided but have defaults
- default_args: Default values for arguments
List available resources:
karma list models
karma list datasets

Basic evaluation:
karma eval --model qwen --model-path "Qwen/Qwen3-0.6B"

Evaluate specific datasets:

karma eval --model qwen --model-path "Qwen/Qwen3-0.6B" --datasets "pubmedqa,medmcqa"

With dataset-specific arguments:
karma eval --model qwen --model-path "Qwen/Qwen3-0.6B" --datasets "in22conv" \
  --dataset-args "in22conv:source_language=en,target_language=hi"

Advanced options:
karma eval --model qwen --model-path "Qwen/Qwen3-0.6B" \
--datasets "pubmedqa" --batch-size 16 --output results.json --no-cacheKARMA supports environment-based configuration. Create a .env file:
# Cache configuration
KARMA_CACHE_TYPE=duckdb
KARMA_CACHE_PATH=./cache.db
# Model configuration
HUGGINGFACE_TOKEN=your_token
LOG_LEVEL=INFO

Supported cache backends:
- DuckDB (default) - for local development
- DynamoDB - for production environments
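For DynamoDB, a .env might look like the sketch below; apart from KARMA_CACHE_TYPE, the exact keys and accepted values are assumptions here, so confirm them against the KARMA documentation before relying on them:

# Illustrative only - confirm the exact variable names and values in the KARMA docs
KARMA_CACHE_TYPE=dynamodb
# AWS credentials and region are typically taken from the standard AWS environment
AWS_REGION=us-east-1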
Enable or disable caching:
karma eval --cache # Enable (default)
karma eval --no-cache # Disable

We welcome contributions to KARMA!
KARMA uses a registry-based architecture that makes it easy to add:
- New datasets - Extend BaseMultimodalDataset and register with @register_dataset
- New models - Extend BaseHFModel and register with register_model_meta
- New metrics - Implement custom evaluation metrics
- New processors - Add data preprocessing capabilities
See the existing implementations in karma/eval_datasets/ and karma/models/ for examples.
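The metric registration API itself isn't shown in this README, so as a plain-Python illustration (independent of how KARMA actually registers metrics), here is the kind of computation a custom exact-match metric would perform:

# Plain-Python sketch of an exact-match computation; not KARMA's metric API.
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference after stripping whitespace."""
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)

# Example: two of three predictions match exactly -> 0.666...
print(exact_match(["yes", "no", "maybe"], ["yes", "no", "unsure"]))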
This project is licensed under the terms of the MIT license.