sites to extract for input wgs vcf #17

Open
aymanm opened this issue May 26, 2024 · 3 comments

Comments

@aymanm

aymanm commented May 26, 2024

Thanks for your great work.
I have been testing this tool for the last couple of days and am wondering whether there is an optimal set of sites to select/subset for the input VCF.
A large VCF requires a lot of memory, so a minimal input VCF that contains only the optimal markers would be great.
I would appreciate your advice on this. I believe another user asked a similar question in the issues.

Thanks again.

@andreirajkovic
Collaborator

Hi @aymanm! That is an interesting idea, and one I don't believe we've explored thoroughly (@audrey-bollas, correct me if I'm wrong). I think you could technically pull this off, but you'd need to compute feature importance on the training data, take the top n features (e.g. 100 variants) for each population, and filter your VCFs down to those sites. The model would likely still perform well, especially with WGS data. With WES, you might lose some accuracy, as you'd be at the mercy of the WES kit having probes that cover those variants.
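
The idea above can be sketched roughly as follows. This is only an illustration and not part of the tool: it assumes you already have per-population importance scores keyed by variant, and the names `top_variants_per_population`, `filter_vcf_lines`, and the layout of the `importance` dictionary are hypothetical. How the scores are obtained from the training data is left open.

```python
import heapq
from typing import Dict, Iterable, Iterator, Set, Tuple

# A variant is identified by (CHROM, POS, REF, ALT). This keying is an assumption.
VariantKey = Tuple[str, int, str, str]


def top_variants_per_population(
    importance: Dict[str, Dict[VariantKey, float]], n: int
) -> Set[VariantKey]:
    """Return the union of the n highest-scoring variants for each population."""
    keep: Set[VariantKey] = set()
    for scores in importance.values():
        keep.update(heapq.nlargest(n, scores, key=scores.get))
    return keep


def filter_vcf_lines(lines: Iterable[str], keep: Set[VariantKey]) -> Iterator[str]:
    """Stream VCF text lines, yielding header lines and records whose site is in keep."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        chrom, pos, _id, ref, alt = line.split("\t", 5)[:5]
        # A multi-allelic record is kept if any of its ALT alleles was selected.
        if any((chrom, int(pos), ref, a) in keep for a in alt.split(",")):
            yield line


if __name__ == "__main__":
    # Toy scores for two made-up populations; real scores would come from the model.
    importance = {
        "POP_A": {("1", 100, "A", "G"): 0.9, ("1", 200, "C", "T"): 0.1},
        "POP_B": {("1", 100, "A", "G"): 0.2, ("1", 200, "C", "T"): 0.8},
    }
    keep = top_variants_per_population(importance, n=1)
    vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t.\tPASS\t.\n",
        "1\t150\t.\tT\tC\t.\tPASS\t.\n",
    ]
    for out_line in filter_vcf_lines(vcf, keep):
        print(out_line, end="")
```

Because the filter reads one line at a time, it never holds the whole VCF in memory. For a large, indexed VCF, applying the same site list with a dedicated tool such as bcftools and a regions file would probably be more practical than pure Python.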

@aymanm
Author

aymanm commented May 31, 2024

Thanks for the clarification. I might give this a try, and I'll submit a pull request if I am successful.

@RaviBot

RaviBot commented Aug 21, 2024

Hello @andreirajkovic! I have a similar question to @aymanm's. I have a very large VCF (914 GB) and was wondering whether there is a suggested course of action for this. My system has 126 GB of memory, and I was still not able to get it to run.

Thanks in advance!
