Commit

docs
pnrobinson committed Jun 21, 2024
1 parent 67167aa commit 568b4f5
Showing 4 changed files with 69 additions and 1 deletion.
25 changes: 25 additions & 0 deletions docs/developers/internal.md
@@ -33,3 +33,28 @@ Unit tests were written for pytest, which can be installed with pip and run from
```bash
pytest
```


## Documentation

These pages are generated with mkdocs.

To set things up, perform the following steps (substitute your own venv name if needed).

```bash
python3 -m venv venvhpo
source venvhpo/bin/activate
pip install --upgrade pip
pip install mkdocs
pip install mkdocs-material
pip install "mkdocs-material[imaging]"
pip install pillow cairosvg
pip install mkdocs-material-extensions
pip install "mkdocstrings[python]"
```

To start a local server, enter:
```bash
mkdocs serve
```

42 changes: 42 additions & 0 deletions docs/user-guide/discombobulator.md
@@ -0,0 +1,42 @@
# Discombobulator


Sometimes a single table cell contains many related items. For instance, the following box shows the contents of one cell in a table from [PMID:30057029](https://pubmed.ncbi.nlm.nih.gov/30057029/){:target="_blank"} in a column entitled "dysmorphology".


!!! dysmorphology

    frontal bossing, curled hair, highly arched and sparse eyebrows, long eyelashes, downslanting palpebral fissures, depressed nasal ridge, tented upper lip

We could create annotations by hand, adding one column for each entry in this cell (and for all of the other entries in the column), but doing so is error-prone and time-consuming. Therefore, pyphetools has a (currently experimental) feature called "discombobulation", which takes all of the entries from each cell in a column and creates the corresponding columns and rows for the standard Excel template file. To do this, we use the following Python code.


```python
from pyphetools.creation import Discombobulator, HpoParser
import pandas as pd
parser = HpoParser()
hp_cr = parser.get_hpo_concept_recognizer()
disco = Discombobulator(hpo_cr=hp_cr)
```

This creates a Discombobulator object that can be used for all of the relevant columns of the original supplemental file. Assuming for this example that the original file is called "temp.xlsx" and the column of interest is called "face", we would use the following Python code.


```python
df = pd.read_excel("temp.xlsx")
df_face = disco.decode(df=df, column="face", assumeExcluded=True)
df_face.head(3)
```

The `assumeExcluded` argument determines whether a feature is recorded as absent (excluded) for an individual when it is not mentioned in that individual's cell but is mentioned in another cell of the same column. This assumption seems justified for dysmorphology features if the authors state that a full examination was conducted.
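
To make the effect of this flag concrete, here is a minimal sketch in plain Python. It is not the pyphetools implementation: the function names `split_cell` and `expand_column` and the status labels `observed`, `excluded`, and `na` are hypothetical, and the sketch splits on commas only, whereas the Discombobulator maps text to HPO terms with a concept recognizer.

```python
def split_cell(cell):
    """Split one free-text cell into trimmed, lower-cased feature labels."""
    return [item.strip().lower() for item in cell.split(",") if item.strip()]


def expand_column(cells, assume_excluded=True):
    """Return one dict per row, mapping each feature label to a status string."""
    parsed = [split_cell(cell) for cell in cells]
    # Every feature mentioned anywhere in the column becomes an output column;
    # dict.fromkeys keeps first-seen order while removing duplicates.
    all_features = list(dict.fromkeys(f for row in parsed for f in row))
    # Hypothetical status labels, chosen for this sketch only.
    missing = "excluded" if assume_excluded else "na"
    return [
        {f: ("observed" if f in row else missing) for f in all_features}
        for row in parsed
    ]


cells = [
    "frontal bossing, long eyelashes",
    "long eyelashes, tented upper lip",
]
rows = expand_column(cells, assume_excluded=True)
print(rows[0])
```

With `assume_excluded=True`, the first row marks "tented upper lip" as excluded because it is mentioned only in the second cell; with `assume_excluded=False`, the same feature would be left unspecified instead.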


For now, this function operates on one column at a time. We can save the results in an Excel file and manually add them to the template file.

```python
df_face.to_excel("temp_face.xlsx")
```

This functionality is currently experimental, and we are exploring ways to make it easier to use. We do not recommend using the Discombobulator unless you are very comfortable with Python and Excel.

There is no need to keep the temporary Excel files or Python code after creating the main template file.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -37,6 +37,7 @@ nav:
- 'user-guide/python_notebook.md'
- 'user-guide/tips_for_curation.md'
- 'user-guide/variant_notation.md'
- 'user-guide/discombobulator.md'
- Coding tabular data with Python scripts:
- Overview: 'tabular/overview.md'
- Jupyter notebooks: 'tabular/jupyter.md'
2 changes: 1 addition & 1 deletion src/pyphetools/creation/create_template.py
@@ -51,7 +51,7 @@ def arrange_terms(self) -> List[hpotk.model.TermId]:
hp_term_list = list()
## Arrange hp_terms so that all terms that belong to a given top level term go together
PHENO_ROOT_TERM_ID = "HP:0000118"
- top_level_term_ids = self._hpo_ontology.graph.get_children(PHENO_ROOT_TERM_ID, False)
+ top_level_term_ids = self._hpo_ontology.graph.get_children(PHENO_ROOT_TERM_ID, True)
top_level_term_ids = list(top_level_term_ids)
top_level_d = defaultdict(list)
for hpt in self._all_added_hp_term_set: