HTAN Orator is a tool for generating natural language descriptions of Human Tumor Atlas Network (HTAN) data. Given the Synapse ID of an HTAN data file, it returns a natural language description of that dataset.
- Natural language generation: Generates a human-readable description of an HTAN dataset from its Synapse ID.
- BigQuery integration: Retrieves additional metadata about the dataset from Google BigQuery tables.
- Assay support: Currently supports the ImagingLevel2 component type; more types will be added in the future.
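Orator's descriptions follow a fixed sentence pattern, as the example outputs in this README show. As a rough illustration of template-based generation, and not Orator's actual implementation, the sketch below fills that pattern from a metadata record. The `FileRecord` class, its field names, and `describe` are hypothetical; in Orator the values would come from Synapse and BigQuery.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical metadata for one HTAN imaging file (field names are illustrative)."""
    file_id: str
    assay: str
    center: str
    specimen_type: str
    biospecimen_id: str
    age_years: int
    gender: str
    participant_id: str
    diagnosis: str


def describe(rec: FileRecord) -> str:
    """Fill a fixed sentence template with values from the record."""
    return (
        f"{rec.file_id} is a {rec.assay} image submitted by the HTAN {rec.center} center "
        f"of a {rec.specimen_type} (Biospecimen {rec.biospecimen_id}) "
        f"from a {rec.age_years} year old {rec.gender} (Participant {rec.participant_id}) "
        f"diagnosed with {rec.diagnosis}."
    )


if __name__ == "__main__":
    # Values copied from the first example output in this README.
    example = FileRecord(
        "HTA9_1_19362", "mIHC", "OHSU", "biopsy", "HTA9_1_17",
        70, "female", "HTA9_1", "infiltrating duct carcinoma NOS",
    )
    print(describe(example))
```

A template keeps the wording predictable; the harder part in practice is looking up and joining the metadata that feeds it.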
HTAN Orator requires Python 3.11.
Other requirements include:
- Google Cloud BigQuery Python client (`google-cloud-bigquery`): for querying data stored in BigQuery.
- synapseclient: the Synapse Python client, for programmatic access to Synapse, a data-sharing platform.
- pandas: for data manipulation and analysis.

These can be installed by creating a Conda environment from the supplied `environment.yml` file.
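To confirm that an environment meets these requirements, a quick check like the following can help. It is a sketch: `python_ok`, `missing_modules`, and `REQUIRED_MODULES` are hypothetical names, and the module names are the usual import names for the packages listed above.

```python
import importlib.util
import sys

# Import names of the packages listed above.
REQUIRED_MODULES = ("google.cloud.bigquery", "synapseclient", "pandas")


def python_ok(minimum=(3, 11), current=None):
    """True if the given (or running) interpreter version is at least `minimum`."""
    version = tuple(current) if current is not None else tuple(sys.version_info[:2])
    return version[: len(minimum)] >= tuple(minimum)


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec imports parent packages first, so a missing parent
            # (e.g. no `google` package) raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python >= 3.11:", python_ok())
    print("Missing modules:", missing_modules() or "none")
```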
- Clone this repository.
- Set up a new Conda environment using `environment.yml`:

  ```bash
  conda env create -f environment.yml
  conda activate htan-orator
  ```

- Run the tool with your input Synapse ID:

  ```bash
  python orator.py <synapse_id>
  ```
Note: Credentials for Google Cloud and Synapse must be set up before running the tool.
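Both clients can pick up credentials from the environment. The sketch below only reports which common sources appear to be configured: `GOOGLE_APPLICATION_CREDENTIALS` (the standard variable Google Cloud client libraries use to locate a service account key), `SYNAPSE_AUTH_TOKEN` (read by recent versions of `synapseclient`), and a `~/.synapseConfig` file. Orator may rely on other mechanisms, such as `gcloud auth application-default login`, so treat `False` as "not found by this check" rather than "missing". `credential_sources` is a hypothetical helper.

```python
import os
from pathlib import Path


def credential_sources(env=None, home=None):
    """Report which common credential sources appear to be configured.

    Presence only: nothing is validated, read, or sent anywhere.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    return {
        # Path to a service account key, read by Google Cloud client libraries.
        "GOOGLE_APPLICATION_CREDENTIALS": bool(env.get("GOOGLE_APPLICATION_CREDENTIALS")),
        # Personal access token that recent synapseclient versions read at login.
        "SYNAPSE_AUTH_TOKEN": bool(env.get("SYNAPSE_AUTH_TOKEN")),
        # Default synapseclient configuration file.
        "~/.synapseConfig": (home / ".synapseConfig").is_file(),
    }


if __name__ == "__main__":
    for source, present in credential_sources().items():
        print(f"{source}: {'found' if present else 'not found'}")
```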
You can use HTAN Orator in two ways:
- Run the stand-alone `orator.py` script, which takes a Synapse ID as input and prints a natural language description to the console.
- Import it as a module in your own Python scripts. It provides an `orate` function that takes a Synapse ID and returns a string.

Both methods require a valid Google Cloud service account when querying Google BigQuery tables, and Synapse credentials when accessing Synapse.
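The command-line usage in this README (a positional Synapse ID plus an optional `--miti` flag) can be modelled with `argparse`. This is an assumption about how such a CLI could be structured, not a copy of `orator.py`; `build_parser` and `synapse_id` are hypothetical, and the check only enforces the `syn` + digits shape of Synapse IDs.

```python
import argparse
import re

# Synapse IDs look like "syn" followed by digits, e.g. syn24829433.
_SYNAPSE_ID = re.compile(r"syn\d+")


def synapse_id(value: str) -> str:
    """argparse `type=` callable that rejects values not shaped like a Synapse ID."""
    if not _SYNAPSE_ID.fullmatch(value):
        raise argparse.ArgumentTypeError(f"not a Synapse ID: {value!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `python orator.py <synapse_id> [--miti]`."""
    parser = argparse.ArgumentParser(
        prog="orator.py",
        description="Describe an HTAN data file given its Synapse ID.",
    )
    parser.add_argument("synapse_id", type=synapse_id, help="for example syn24829433")
    parser.add_argument(
        "--miti",
        action="store_true",
        help="print the MITI-style summary instead of the one-paragraph description",
    )
    return parser
```

A script would then call `orator.orate(args.synapse_id)` or, when `args.miti` is set, `orator.orate_miti(args.synapse_id)`. Validating the ID shape up front gives a clear error before any network call is made.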
Python:

```python
import orator
orator.orate('syn24829433')
```

CLI:

```bash
python orator.py syn24829433
```

returns the following:

```
'HTA9_1_19362 is a mIHC file submitted by the HTAN OHSU center of a biopsy (Biospecimen HTA9_1_17) from a 70 year old female (Participant HTA9_1) diagnosed with infiltrating duct carcinoma NOS. The image contains 12 channels, approximately 8.96M pixels, and measures 1939 µm wide by 1157 µm high. It was acquired on a Leica, Aperio AT2 at 20x magnification'
```
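The description reports image size in abbreviated form (for example `8.96M pixels`) and physical dimensions in µm. A minimal sketch of how such figures can be derived, assuming the pixel count is width × height of a single plane and that a physical pixel size (such as the OME-TIFF `PhysicalSizeX` attribute) is available; the helper names and the 0.5 µm pixel size in the usage note are illustrative, not values taken from Orator.

```python
def human_count(n: float) -> str:
    """Abbreviate a count with a G/M/K suffix and two decimals, e.g. 8_960_000 -> '8.96M'."""
    for factor, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if n >= factor:
            return f"{n / factor:.2f}{suffix}"
    return str(n)


def physical_size_um(size_px: int, pixel_size_um: float) -> float:
    """Physical extent of one image axis in micrometres."""
    return size_px * pixel_size_um


def summarize_image(size_x: int, size_y: int, pixel_size_um: float, channels: int) -> str:
    """Build the size sentence from pixel dimensions, pixel size, and channel count."""
    width_um = physical_size_um(size_x, pixel_size_um)
    height_um = physical_size_um(size_y, pixel_size_um)
    return (
        f"The image contains {channels} channels, approximately "
        f"{human_count(size_x * size_y)} pixels, and measures "
        f"{width_um:.0f} µm wide by {height_um:.0f} µm high."
    )
```

For example, `summarize_image(4000, 2000, 0.5, 12)` returns `'The image contains 12 channels, approximately 8.00M pixels, and measures 2000 µm wide by 1000 µm high.'`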
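For a MITI-style summary, use `orate_miti` or the `--miti` flag.

Python:

```python
import orator
orator.orate_miti('syn24829433')
```

CLI:

```bash
python orator.py syn24829433 --miti
```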
returns the following:

```
Age at Diagnosis: 63
Primary Diagnosis: Infiltrating duct carcinoma NOS
Site of Resection or Biopsy: Unknown
Tumor Grade: G3
Stage at Diagnosis: None

Species: Human
Vital Status: Dead
Cause of death: Coming soon!
Gender: female
Race: white
Ethnicity: not hispanic or latino

Type: Hormone Therapy
Therapeutic agents: Exemestane
Treatment Regimen: Exemestane
Initial Disease Status: None

Progression: Yes - Progression or Recurrence
Last Known Disease Status: Distant met recurrence/progression
Age at Follow Up: 75
Days to Progression: Coming soon!

Acquisition Method Type: Biopsy
Imaging Assay Type: mIHC
Fixative Type: Formalin
Microscope: Leica, Aperio AT2
Objective: 20X

Visit the HTAN Data Portal to learn more.

Please cite the underlying data as:
Coming soon!

Please cite this Minerva Story as:
Coming soon!

ID Type                          ID
HTAN Data File ID                HTA9_1_19362
HTAN Participant ID              HTA9_1
HTAN Assayed Biospecimen ID      HTA9_1_17
HTAN Originating Biospecimen ID  HTA9_1_6
```
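The MITI summary is a series of `Label: value` blocks in which unknown values print as `None` and not-yet-available fields print as `Coming soon!`. A minimal sketch of that rendering, assuming the fields arrive as ordered dictionaries, one per block; `format_section`, `format_report`, and the `NOT_IMPLEMENTED` marker are hypothetical and may not match how `orate_miti` is built.

```python
COMING_SOON = "Coming soon!"
NOT_IMPLEMENTED = object()  # hypothetical marker for fields not wired up yet


def format_section(fields: dict) -> str:
    """Render one block of 'Label: value' lines.

    Unknown values (None) print as 'None', matching the sample output;
    fields marked NOT_IMPLEMENTED print the 'Coming soon!' placeholder.
    """
    lines = []
    for label, value in fields.items():
        text = COMING_SOON if value is NOT_IMPLEMENTED else str(value)
        lines.append(f"{label}: {text}")
    return "\n".join(lines)


def format_report(sections: list) -> str:
    """Join rendered sections with a blank line between them."""
    return "\n\n".join(format_section(s) for s in sections)
```

Using a dedicated marker object, rather than a string, keeps "not implemented" distinct from a genuine `None` value in the metadata.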
More example inputs and outputs:

Synapse ID (input) | Description (output) |
---|---|
syn24829433 | HTA9_1_19362 is a mIHC image submitted by the HTAN OHSU center of a biopsy (Biospecimen HTA9_1_17) from a 70 year old female (Participant HTA9_1) diagnosed with infiltrating duct carcinoma NOS. The image contains 12 channels, approximately 8.96M pixels, and measures 1939 µm wide by 1157 µm high. It was acquired on a Leica, Aperio AT2 at 20x magnification |
syn25074523 | HTA13_1_7000 is a H&E image submitted by the HTAN TNP SARDANA center of a surgical Resection (Biospecimen HTA13_1_5) from a 69 year old male (Participant HTA13_1) diagnosed with mucous adenocarcinoma. The image contains 3 channels, approximately 3.12G pixels, and measures 18638 µm wide by 17656 µm high. It was acquired on a Rarecyte;HT;3 at 20x magnification |
syn26642484 | HTA7_927_1002 is a t-CyCIF image submitted by the HTAN HMS center of a surgical Resection (Biospecimen HTA7_927_4) from a 40 year old year old female (Participant HTA7_927) diagnosed with adenocarcinoma NOS. The image contains 52 channels, approximately 485.63M pixels, and measures 17791 µm wide by 11533 µm high. It was acquired on a RareCyte;HT;3 at 20x magnification |
syn24191311 | HTA10_01_10193094173699420948081950544055 is a ScRNA-seqLevel1 file submitted by the HTAN Stanford center of a surgical Resection (Biospecimen HTA10_01_023) from a 45 year old male (Participant HTA10_01) diagnosed with familial adenomatous polyposis. |
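A table like the one above can be regenerated from a list of Synapse IDs. The sketch takes the description function as a parameter (for instance `orator.orate`, which needs credentials), so it can be tried with a stub; `examples_table` is a hypothetical helper, not part of Orator.

```python
from typing import Callable, Iterable


def examples_table(ids: Iterable[str], describe: Callable[[str], str]) -> str:
    """Build a two-column Markdown table of Synapse IDs and their descriptions."""
    rows = ["Input | Output |", "---|---|"]
    for syn_id in ids:
        # Keep each description on one row: flatten newlines, escape pipes.
        text = describe(syn_id).replace("\n", " ").replace("|", r"\|")
        rows.append(f"{syn_id} | {text} |")
    return "\n".join(rows)
```

With credentials configured, `print(examples_table(["syn24829433", "syn25074523"], orator.orate))` would print a table in the same layout.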
We welcome contributions! Please submit your changes via pull request.
This project is licensed under [Insert License Name Here].
Please raise an issue in the HTAN Orator repository if you have any questions or feedback.