aiDIVA - augmented intelligence-based DIsease Variant Analysis

aiDIVA is an analysis pipeline that combines a pathogenicity-based approach and an optional evidence-based approach with state-of-the-art large language models (LLMs) to identify potential disease causing variants in a given rare disease sample.

aiDIVA comprises the following steps:

A pathogenicity-based approach that utilizes the predictions of a random forest model trained on ClinVar data, supplemented with phenotype information given as HPO terms.
An optional evidence-based approach includes the ranks and scores that can be obtained from the VariantRanking tool included in ngs-bits.
State-of-the-art LLMs are utilized to refine the ranking results from the previous two approaches.
A final meta model is used to combine all preliminary results and create a final ranking of the variants. For this model we also use a random forest.

Citing

If you use aiDIVA in your work please cite our preprint:

aiDIVA - Diagnostics of Rare Genetic Diseases Using Large Language Models (link)

Support

Please report any issues or questions to the aiDIVA Issue Tracker.

System Requirements

The program is written in Python 3

Latest used version: 3.12.3

The following additional libraries need to be installed in order to use the program (latest used version):

networkx (v3.4.2)
numpy (v1.26.4)
openai (v1.60.2)
pandas (v2.2.3)
pysam (v0.22.1)
pyyaml (v6.0.2)
scipy (v1.15.1)
scikit-learn (v1.3.2)

For easy package installation in your Python virtual environment we included a requirements.txt just run pip install -r requirements.txt to install all necessary packages (the versions in the requirements.txt match our own setup at that time).

If a newer scikit-learn version is used it is advised to create a new model with the newer scikit-learn version.

Run the aiDIVA software

To run the aiDIVA software you need a TAB separated file containing the annotation information for every variant present in your sample file.

Detailed instructions on how to run the software and what columns need to be present in the input table can be found here.

Annotation Resources and Tools

If you don't have an annotated table with the necessary columns mentioned before. You can use the run_annotation script provided in the annotation folder to create a table with the necessary information. Before you use this annotation script make sure that the necessary database resources and tools are present on your system and the paths in the configuration file are set correctly (IMPORTANT: use the correct configuration file, it differs between the annotation and aiDIVA!).

Instructions on how to use the annotation script and prepare the annotation resources and tools can be found here.

HPO Resources

The HPO resources required for the prioritization step need to be downloaded before using aiDIVA. See the instructions (found in the doc/aidiva folder) for the relevant download links. You can place the generated files in the data/hpo_resources folder. The path to the files is specified in the configuration file make sure that it leads to the correct location.

Pathogenicity Prediction

There is one random forest model that is used in aiDIVA to predict the pathogenicity of a given variant. It is a combined model for SNV and inframe indel variants. The training data of the model consists of variants from Clinvar.

The scripts used to train the model can be found in the following GitHub repository: aiDIVA-Training

Frameshift variants will get a default score of 0.9, whereas synonymous variants always get the lowest score 0.0

A pretrained random forest model (aidiva-rf) using our current feature set can be found here. The latest model was trained using scikit-learn v1.3.2. The trained models of scikit-learn are version dependent.

LLM Usage

aiDIVA supports the use of the official OpenAI API to send the requests to GPT-4o or GPT-4.1 for example. To use the OpenAI API you need an account and an API-Key that needs to be specified in the configuration file. Alternatively it is possible to set up your own local LLM (eg., LLama-8b, Mistral-12b, ...) and provide it locally as a Webservice. For an easy deployment you could use the NVIDIA NIM Containers see here for more details on how to do that. These local LLMs use the same python package for inference you just have to specify the port and URL where to find the local model in the configuration file.

Meta Model

You can download the pretrained meta models (aidiva-meta & aidiva-meta-rf) here. For these two models we used a random forest model that takes as features the ranking position and scores from the initial rankings (pathogenicity-based and evidence-based) plus the ranking result from the LLMs and the inheritance mode used in the evidence-based model.

License and Disclaimer

Medical Use Disclaimer

This software is provided for research and informational purposes only.
It is not intended to provide medical, clinical, diagnostic, or therapeutic advice, and it must not be used as a substitute for professional judgment.

The software has not been validated, certified, or approved by any regulatory authority (including but not limited to the FDA, EMA, or other healthcare agencies).
It is not designed or intended for use in real-world medical or clinical decision-making.

Always consult qualified healthcare professionals for medical advice, diagnosis, or treatment.

Data Usage Disclaimer

This software is released under the MIT License; however, it may reference or interoperate with external datasets or databases that are subject to their own license restrictions or terms of use.

Users are solely responsible for ensuring they have the legal right to access and use any external databases required by this project, and must comply with all applicable terms set by the data providers.

Name		Name	Last commit message	Last commit date
Latest commit History 356 Commits
aidiva		aidiva
annotation		annotation
data		data
doc		doc
scripts		scripts
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

aiDIVA - augmented intelligence-based DIsease Variant Analysis

Citing

Support

System Requirements

Run the aiDIVA software

Annotation Resources and Tools

HPO Resources

Pathogenicity Prediction

LLM Usage

Meta Model

License and Disclaimer

Medical Use Disclaimer

Data Usage Disclaimer

About

Uh oh!

Releases 33

Packages

Languages

License

imgag/aiDIVA

Folders and files

Latest commit

History

Repository files navigation

aiDIVA - augmented intelligence-based DIsease Variant Analysis

Citing

Support

System Requirements

Run the aiDIVA software

Annotation Resources and Tools

HPO Resources

Pathogenicity Prediction

LLM Usage

Meta Model

License and Disclaimer

Medical Use Disclaimer

Data Usage Disclaimer

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 33

Packages 0

Languages

Packages