Running into pandas DataFrame issue when running phenotype similarity search #3

deepakunni3 · 2018-08-24T23:54:40Z

@stuppie
I refactored a bit of the phenotype_similarity code, but the overall functionality is unchanged.
I ran into the following issue when trying to run a similarity search:

Traceback (most recent call last):
  File "pheno_test.py", line 24, in <module>
  File "/Users/deepak.unni3/GIT/mvp-module-library/Modules/phenotype_similarity.py", line 50, in similarity_search
    self.phenogene_score = reduce(lambda x, y: pd.merge(x, y, on='id').set_index('id').sum(axis=1), self.results)
  File "/Users/deepak.unni3/GIT/mvp-module-library/Modules/phenotype_similarity.py", line 50, in <lambda>
    self.phenogene_score = reduce(lambda x, y: pd.merge(x, y, on='id').set_index('id').sum(axis=1), self.results)
  File "/Users/deepak.unni3/GIT/mvp-module-library/env/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/Users/deepak.unni3/GIT/mvp-module-library/env/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 524, in __init__
    'type {left}'.format(left=type(left)))
ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>

I am not entirely sure what that line is trying to do.

Test script used:

from BioLink.biolink_client import BioLinkWrapper
from SimSearch.simsearch_client import SimSearchWrapper
from Modules.phenotype_similarity import PhenotypeSimilarity

input_disease = 'MONDO:0019391'  # Fanconi Anemia

blw = BioLinkWrapper()

# get FA-related genes from BioLink
fa_gene_associations = blw.disease2genes(input_disease)
fa_gene_curies = fa_gene_associations['objects']
print(fa_gene_curies)

p = PhenotypeSimilarity()
p.load_gene_set(fa_gene_curies, taxon='9606')  # NCBI taxon 9606 = human
p.load_associations()
p.similarity_search()


stuppie · 2018-08-25T16:56:46Z

I can't run it because it says it cannot find "ontobio.analysis". (I installed ontobio from pip and also cloned the git repo, but it still can't find it.)

But I can see it's a bug anyway! I'm summing the DataFrame before the reduce has finished.
reduce(lambda x, y: pd.merge(x, y, on='id').set_index('id').sum(axis=1), self.results)
should be
reduce(lambda x, y: pd.merge(x, y, on='id'), self.results).set_index('id').sum(axis=1)
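The line merges all of the DataFrames in that list together and then sums the counts. With .sum(axis=1) inside the lambda, the first merge already returns a Series, so the next pd.merge call receives that Series as its left argument, which is the ValueError in your traceback.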
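A minimal sketch of the two placements, using three small, made-up DataFrames as stand-ins for self.results (the real contents of self.results are not shown in this thread, so the ids and the column names n1, n2, n3 are hypothetical). functools.reduce evaluates f(f(a, b), c), so only the second call to the buggy lambda fails.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for self.results: one DataFrame per input,
# each with an 'id' key column and a uniquely named count column.
results = [
    pd.DataFrame({"id": ["G1", "G2"], "n1": [1, 2]}),
    pd.DataFrame({"id": ["G1", "G2"], "n2": [3, 4]}),
    pd.DataFrame({"id": ["G1", "G2"], "n3": [5, 6]}),
]

# Buggy placement: the first call returns a Series (from .sum(axis=1)),
# and the second pd.merge call rejects that Series as its left operand.
try:
    reduce(lambda x, y: pd.merge(x, y, on="id").set_index("id").sum(axis=1), results)
    buggy_failed = False
except (ValueError, TypeError):
    buggy_failed = True

# Fixed placement: finish every merge first, then index by 'id' and sum each row.
score = reduce(lambda x, y: pd.merge(x, y, on="id"), results).set_index("id").sum(axis=1)
# score holds per-id totals: G1 -> 1 + 3 + 5 = 9, G2 -> 2 + 4 + 6 = 12
```

The exact error text varies across pandas versions, which is why the sketch only checks that the buggy form raises.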

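Also, there is another bug: it should be doing an outer join, not an inner join (the default). An inner join keeps only ids that appear in every DataFrame, so any id missing from even one of the results gets dropped.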
reduce(lambda x, y: pd.merge(x, y, on='id', how='outer'), self.results).set_index('id').sum(axis=1)
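A sketch of why the join type matters, again with hypothetical frames whose ids only partly overlap (total_counts is an illustrative helper, not part of the module). The inner join keeps only ids present in all three frames, here none, while the outer join keeps every id; the missing counts become NaN, which sum(axis=1) skips by default.

```python
from functools import reduce

import pandas as pd

# Hypothetical results whose ids only partly overlap.
results = [
    pd.DataFrame({"id": ["G1", "G2"], "n1": [1, 2]}),
    pd.DataFrame({"id": ["G2", "G3"], "n2": [3, 4]}),
    pd.DataFrame({"id": ["G1", "G3"], "n3": [5, 6]}),
]


def total_counts(frames, how):
    """Merge all frames on 'id' with the given join type, then sum each row."""
    merged = reduce(lambda x, y: pd.merge(x, y, on="id", how=how), frames)
    return merged.set_index("id").sum(axis=1)


inner = total_counts(results, "inner")  # no id appears in all three frames, so this is empty
outer = total_counts(results, "outer")  # NaNs from missing ids are skipped by the sum
# outer totals: G1 -> 1 + 5 = 6, G2 -> 2 + 3 = 5, G3 -> 4 + 6 = 10
```

Under the same assumptions, a reduce-free alternative is pd.concat([df.set_index('id') for df in results], axis=1).sum(axis=1). pd.concat along axis=1 aligns on the index with an outer join by default, provided each frame has unique ids.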

deepakunni3 · 2018-08-27T15:08:35Z

Whoops. A recently merged PR relies on a new release of ontobio.
You can comment out the breaking line in GenericSimilarity and then run the phenotype similarity code. That should still work.

I'll create another issue for the ontobio.analysis error.
