Running into Panda DataFrame issue when running phenotype similarity search #3

Open
deepakunni3 opened this issue Aug 24, 2018 · 2 comments

Comments

@deepakunni3
Member

@stuppie
I refactored a bit of the phenotype_similarity code but the overall functionality still stays the same.
I ran into the following issue when trying to do a similarity search:

Traceback (most recent call last):
  File "pheno_test.py", line 24, in <module>
  File "/Users/deepak.unni3/GIT/mvp-module-library/Modules/phenotype_similarity.py", line 50, in similarity_search
    self.phenogene_score = reduce(lambda x, y: pd.merge(x, y, on='id').set_index('id').sum(axis=1), self.results)
  File "/Users/deepak.unni3/GIT/mvp-module-library/Modules/phenotype_similarity.py", line 50, in <lambda>
    self.phenogene_score = reduce(lambda x, y: pd.merge(x, y, on='id').set_index('id').sum(axis=1), self.results)
  File "/Users/deepak.unni3/GIT/mvp-module-library/env/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/Users/deepak.unni3/GIT/mvp-module-library/env/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 524, in __init__
    'type {left}'.format(left=type(left)))
ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>

I am not entirely sure what that line is trying to do.

Test script used:

from BioLink.biolink_client import BioLinkWrapper
from SimSearch.simsearch_client import SimSearchWrapper

input_disease = 'MONDO:0019391'  # Fanconi Anemia

blw = BioLinkWrapper()

# get fa related genes from BioLink
fa_gene_associations = blw.disease2genes(input_disease)
fa_gene_curies = fa_gene_associations['objects']
fa_gene_curies


from Modules.phenotype_similarity import PhenotypeSimilarity
p = PhenotypeSimilarity()

p.load_gene_set(fa_gene_curies,taxon='9606')
p.load_associations()
p.similarity_search()
@stuppie
Contributor

stuppie commented Aug 25, 2018

I cannot run it because it fails to find "ontobio.analysis". (I installed ontobio from pip and also cloned the git repo, but it still isn't found.)

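But I see it's a bug anyway! I'm summing the DataFrame before the reduce has finished, so after the first merge the accumulator is a Series, and the next pd.merge call rejects it.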
reduce(lambda x, y: pd.merge(x, y, on='id').set_index('id').sum(axis=1), self.results)
should be
reduce(lambda x, y: pd.merge(x, y, on='id'), self.results).set_index('id').sum(axis=1)
The line is meant to merge all of the DataFrames in that list together on 'id', and then sum the counts across columns for each id.
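
A minimal sketch of the difference between the two lines, using made-up frames in place of self.results. The frame contents and the column names score_a, score_b, score_c are assumptions for illustration only; the real result columns may look different.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for self.results: one frame per query,
# each with an 'id' column and one score column (names are assumed).
results = [
    pd.DataFrame({"id": ["g1", "g2"], "score_a": [1, 2]}),
    pd.DataFrame({"id": ["g1", "g2"], "score_b": [3, 4]}),
    pd.DataFrame({"id": ["g1", "g2"], "score_c": [5, 6]}),
]


def buggy(frames):
    # Sums inside the lambda, so the accumulator becomes an unnamed
    # Series after the first step and the second merge fails.
    return reduce(
        lambda x, y: pd.merge(x, y, on="id").set_index("id").sum(axis=1),
        frames,
    )


def fixed(frames):
    # Merge everything first, then index by 'id' and sum across columns once.
    merged = reduce(lambda x, y: pd.merge(x, y, on="id"), frames)
    return merged.set_index("id").sum(axis=1)


try:
    buggy(results)
    buggy_failed = False
except ValueError:
    buggy_failed = True

totals = fixed(results)
```

With only two frames the original line happens to work, because the lambda runs once; the error only appears from the third frame onward.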

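Also, there is another bug: it should be doing an outer join, not an inner join (the default). With an inner join, any id missing from one of the DataFrames is dropped from the result.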
reduce(lambda x, y: pd.merge(x, y, on='id', how='outer'), self.results).set_index('id').sum(axis=1)
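
A small sketch of why the join type matters here, again with made-up frames (the ids and the column names score_a and score_b are assumptions). With how='outer', an id that is missing from one frame gets NaN in that frame's column, and sum(axis=1) skips NaN by default, so the id keeps its partial total instead of disappearing.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames whose 'id' values only partly overlap.
results = [
    pd.DataFrame({"id": ["g1", "g2"], "score_a": [1, 2]}),
    pd.DataFrame({"id": ["g2", "g3"], "score_b": [3, 4]}),
]

# Inner join (the default): only ids present in every frame survive.
inner = (
    reduce(lambda x, y: pd.merge(x, y, on="id"), results)
    .set_index("id")
    .sum(axis=1)
)

# Outer join: every id is kept; missing scores become NaN,
# which sum(axis=1) ignores by default (skipna=True).
outer = (
    reduce(lambda x, y: pd.merge(x, y, on="id", how="outer"), results)
    .set_index("id")
    .sum(axis=1)
)
```

Here inner contains only g2, while outer contains g1, g2 and g3.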

@deepakunni3
Member Author

Whoops. A recently merged PR relies on a new release of Ontobio.
You can comment out the breaking line from GenericSimilarity and then run your code for Phenotype Similarity. That should still work.

I'll create another issue for the ontobio.analysis error.
