Commit

Merge pull request #123 from monarch-initiative/develop
Develop
pnrobinson authored Jul 13, 2024
2 parents dca7a3c + 30077bb commit 4a11236
Showing 20 changed files with 1,843 additions and 2,386 deletions.
25 changes: 25 additions & 0 deletions docs/developers/internal.md
@@ -33,3 +33,28 @@ Unit tests were written for pytest, which can be installed with pip and run from
```bash
pytest
```


## Documentation

These pages are generated with MkDocs.

To set up the documentation toolchain, run the following commands (substitute the name of your virtual environment if needed).

```bash
python3 -m venv venvhpo
source venvhpo/bin/activate
pip install --upgrade pip
pip install mkdocs
pip install mkdocs-material
pip install "mkdocs-material[imaging]"
pip install pillow cairosvg
pip install mkdocs-material-extensions
pip install "mkdocstrings[python]"
```

To start a local server, enter:
```
mkdocs serve
```

42 changes: 42 additions & 0 deletions docs/user-guide/discombobulator.md
@@ -0,0 +1,42 @@
# Discombobulator


Sometimes a single table cell contains many related items. For instance, the following box shows the contents of one cell in a table from [PMID:30057029](https://pubmed.ncbi.nlm.nih.gov/30057029/){:target="_blank"} in a column entitled "dysmorphology".


!!! dysmorphology

    frontal bossing, curled hair, highly arched and sparse eyebrows, long eyelashes, downslanting palpebral fissures, depressed nasal ridge, tented upper lip

While we could create annotations by hand, with one column for each entry in this cell (and for all of the other entries in the column), doing so is error-prone and time-consuming. Therefore, pyphetools has a (currently experimental) feature called "discombobulation" that takes the entries from every cell in a column and creates the corresponding columns and rows for the standard Excel template file. To do this, we first set up a Discombobulator with the following Python code.


```python
from pyphetools.creation import Discombobulator, HpoParser
import pandas as pd
parser = HpoParser()
hp_cr = parser.get_hpo_concept_recognizer()
disco = Discombobulator(hpo_cr=hp_cr)
```

This creates a Discombobulator object that can be used for all of the relevant columns of the original supplemental file. Assuming for this example that the original file is called "temp.xlsx" and the column of interest is called "face", we would use the following Python code.


```python
df = pd.read_excel("temp.xlsx")
df_face = disco.decode(df=df, column="face", assumeExcluded=True)
df_face.head(3)
```

The `assumeExcluded` argument determines whether a feature is recorded as excluded (absent) for an individual when it is not mentioned in that individual's cell but is mentioned in another cell of the same column. This assumption seems justified for dysmorphology features if the authors state that a full examination was conducted.
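
To make the effect of `assumeExcluded` concrete, here is a minimal sketch in plain pandas. It is not the pyphetools implementation: it does no HPO concept recognition, and the helper names (`split_cell`, `spread_features`), the toy data, and the status strings are assumptions chosen only for illustration.

```python
import pandas as pd

# Made-up stand-in for one column of a supplemental table.
df = pd.DataFrame({
    "individual": ["P1", "P2"],
    "face": ["frontal bossing, long eyelashes", "long eyelashes, tented upper lip"],
})


def split_cell(cell):
    """Split a comma-separated cell into a set of trimmed feature labels."""
    if pd.isna(cell):
        return set()
    return {item.strip() for item in str(cell).split(",") if item.strip()}


def spread_features(df, column, assume_excluded):
    """Hypothetical helper: build one column per feature found anywhere in `column`."""
    per_row = [split_cell(cell) for cell in df[column]]
    all_features = sorted(set().union(*per_row))
    # A feature mentioned elsewhere in the column but not in this row's cell
    # is marked "excluded" only if we assume a full examination was done.
    missing = "excluded" if assume_excluded else "na"
    data = {
        feature: ["observed" if feature in feats else missing for feats in per_row]
        for feature in all_features
    }
    return pd.DataFrame(data, index=df.index)


wide = spread_features(df, "face", assume_excluded=True)
```

With `assume_excluded=True`, "frontal bossing" is marked as excluded for P2 because it appears only in P1's cell; with `assume_excluded=False` the same entry would be left as "na".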


For now, this function operates on one column at a time. We can save the results in an Excel file and manually add them to the template file.

```python
df_face.to_excel("temp_face.xlsx")
```
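
If several columns are processed this way, one option is to join the per-column tables side by side before copying them into the template. The following is only a sketch with plain pandas: the two frames are made-up stand-ins for per-column results, and it assumes they share the same row index (one row per individual).

```python
import pandas as pd

# Made-up stand-ins for the results of two separate columns.
df_face = pd.DataFrame({"Frontal bossing": ["observed", "excluded"]})
df_hands = pd.DataFrame({"Clinodactyly": ["excluded", "observed"]})

# Join side by side; rows are aligned on the shared index.
combined = pd.concat([df_face, df_hands], axis=1)
```

The combined table can then be written out with `to_excel` in the same way as a single-column result.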

This functionality is currently experimental, and we are exploring ways to make it easier to use. We do not recommend using the Discombobulator unless you are very comfortable with Python and Excel.

There is no need to keep the temporary Excel files or Python code after creating the main template file.
6 changes: 3 additions & 3 deletions docs/user-guide/python_notebook.md
@@ -8,7 +8,7 @@ The following sections explain how to use Python code to create phenopackets fro
We first import the TemplateImporter to import the data and create phenopackets, and several classes to visualize the data.

```python
from pyphetools.creation import TemplateImporter
from pyphetools.creation import TemplateImporter, Moi
from pyphetools.visualization import IndividualTable, QcVisualizer
from IPython.display import display, HTML
import pyphetools
@@ -78,11 +78,11 @@ the mode of inheritance (MOI) and then indicate the MOI. If multiple distinct di
Check results of variant encoding.
```python
pmid = "PMID:36333996"
timporter.create_hpoa_from_phenopackets(pmid=pmid, moi="Autosomal recessive")
df = timporter.create_hpoa_from_phenopackets(pmid=pmid, mode_of_inheritance=Moi.AR)
```
or

```python
pmid = "PMID:36333996"
timporter.create_hpoa_from_phenopackets(pmid=pmid, moi="Autosomal recessive", target="OMIM:620427")
df = timporter.create_hpoa_from_phenopackets(pmid=pmid, mode_of_inheritance=Moi.AD, target="OMIM:620427")
```
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -37,6 +37,7 @@ nav:
- 'user-guide/python_notebook.md'
- 'user-guide/tips_for_curation.md'
- 'user-guide/variant_notation.md'
- 'user-guide/discombobulator.md'
- Coding tabular data with Python scripts:
- Overview: 'tabular/overview.md'
- Jupyter notebooks: 'tabular/jupyter.md'