Illumina smasher jobs fail when trying to convert string to float #3511

Open
3 tasks
davidsmejia opened this issue Aug 30, 2024 · 0 comments


@davidsmejia
Contributor

davidsmejia commented Aug 30, 2024

Context

Issue #3510 was recently opened by someone who is trying to download a dataset, but the resulting file contains no expression matrix.

The logs show two errors that are not handled correctly.

  1. When parsing the original file, we attempt to convert the string values in the nuID column to floats.

ValueError: could not convert string to float: 'ritxUH.kuHlYqjozpE'
TypeError: Cannot cast array data from dtype('O') to dtype('float32') according to the rule 'safe'

  2. Because all samples in this experiment have similarly structured data, no expression data was produced, yet the smasher still marked the job as successful.

ERROR [key: HOMO_SAPIENS] [job_id: 29798209]: Was told to smash a key with no frames!
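The two messages under the first item come from NumPy's casting rules and can be reproduced in isolation. This is a minimal sketch that assumes the nuID values reach the cast as an object-dtype NumPy array; it does not call the actual smasher code, and the `capture` helper is made up for illustration.

```python
import numpy as np

# Assumption: the nuID column arrives as Python strings in an object-dtype
# array. The value is the one quoted in the logs.
nuids = np.array(["ritxUH.kuHlYqjozpE"], dtype=object)


def capture(fn):
    """Hypothetical helper: run fn and return (exception name, message)."""
    try:
        fn()
    except (ValueError, TypeError) as exc:
        return type(exc).__name__, str(exc)
    return None


# Default ("unsafe") casting tries to parse each element as a float,
# so the nuID string raises a ValueError.
value_error = capture(lambda: nuids.astype(np.float32))

# "safe" casting rejects object -> float32 before looking at the values,
# which raises a TypeError instead.
type_error = capture(lambda: nuids.astype(np.float32, casting="safe"))

print(value_error)
print(type_error)
```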

The original file, which was used to identify the errant column:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1182499

Problem or idea

For the failure to smash these files, this is probably something we would have wanted the no_op processor to handle. However, because these files are already downloaded, we may need to account for this in the smasher_tools module.
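One way smasher_tools could account for files that are already downloaded is to drop columns that do not parse as numbers before casting to float32. This is a hypothetical sketch: `drop_non_numeric_columns` is not an existing function in this codebase, and the sample table is made up apart from the nuID value taken from the logs. It relies only on pandas' `to_numeric(..., errors="coerce")`.

```python
import numpy as np
import pandas as pd


def drop_non_numeric_columns(frame: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Return (frame without non-numeric columns, names of dropped columns).

    Hypothetical helper: a column is kept only if every non-missing value
    parses as a number, so identifier columns such as nuID are removed
    before the frame is cast to float32.
    """
    dropped = []
    for column in frame.columns:
        values = frame[column]
        parsed = pd.to_numeric(values, errors="coerce")
        # A value that was present but failed to parse marks the column
        # as non-numeric.
        if (values.notna() & parsed.isna()).any():
            dropped.append(column)
    return frame.drop(columns=dropped), dropped


# Hypothetical sample table: one expression column plus a nuID column.
# Only "ritxUH.kuHlYqjozpE" comes from the logs; everything else is made up.
sample = pd.DataFrame(
    {
        "value": ["7.31", "5.02"],
        "nuID": ["ritxUH.kuHlYqjozpE", "placeholder_nuID"],
    },
    index=["probe_1", "probe_2"],
)

numeric, dropped = drop_non_numeric_columns(sample)
matrix = numeric.apply(pd.to_numeric).astype(np.float32)
print(dropped)
print(matrix.dtypes)
```

Returning the dropped column names lets the caller log them. Whether nuID should be dropped or kept as an alternative probe identifier is still the open question in the last next step; this only shows where such a filter could sit.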

For the other error, we should add a check that at least one key is present in the expression matrix. If none is, we should either raise an error or add a flag indicating that data is missing from the zip.
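A minimal sketch of that check, assuming the smasher holds a mapping from key (such as HOMO_SAPIENS) to a list of pandas DataFrames. `check_expression_data` and `NoExpressionDataError` are hypothetical names, not existing smasher APIs. The function covers both options: it raises when no key has data and otherwise returns the keys that are missing data, which could back a flag on the download.

```python
import pandas as pd


class NoExpressionDataError(Exception):
    """Hypothetical error for a smash that would yield no expression data."""


def check_expression_data(frames_by_key: dict[str, list[pd.DataFrame]]) -> list[str]:
    """Raise if no key has data; otherwise return the keys missing data.

    Hypothetical helper. A key counts as missing data when it has no frames
    or all of its frames are empty.
    """
    missing = [
        key
        for key, frames in frames_by_key.items()
        if all(frame.empty for frame in frames)
    ]
    # Also true for an empty mapping, so "no keys at all" fails as well.
    if len(missing) == len(frames_by_key):
        raise NoExpressionDataError(
            f"no expression data for any key: {sorted(frames_by_key)}"
        )
    return missing


# The situation from the logs: HOMO_SAPIENS ended up with no frames.
try:
    check_expression_data({"HOMO_SAPIENS": []})
    failed = False
except NoExpressionDataError as exc:
    failed = True
    print(exc)

# A partially missing smash (MUS_MUSCULUS is a made-up second key)
# returns the affected keys instead of raising.
partial = check_expression_data(
    {
        "HOMO_SAPIENS": [pd.DataFrame({"value": [1.0]})],
        "MUS_MUSCULUS": [],
    }
)
print(partial)
```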

Solution or next step

  • Confirm assessment with someone from the science team
  • Update the smasher to fail instead of creating a metadata-only download
  • Determine what to do about nuID column in original file