Personalized RIV and region_to_gene tables #9

Open
r-trimbour opened this issue Jun 11, 2024 · 2 comments

Comments

@r-trimbour

Hi dear developers!

Thanks a lot for this very interesting tool :)
It is very fast and intuitive to use!

I would like to use scMultiSim data in the context of a method that infers GRNs from multi-omics data. In that context, I have some questions about how ATAC influences gene expression and how I could personalize the region-to-gene table.

I'm interested in a setup where the CIF-gene links are based only on the combination of CIF-ATAC and ATAC-gene links, to explore how methods can use TF-region and region-gene information to recover the corresponding GRN.

  1. Is it correct that the RIV and the region_to_gene tables are built independently of the GRN input by default?
  2. In your opinion, would it make sense for this specific use case to build the RIV, region_to_gene and GRN myself and give them as input?
  3. If I do so, can I use discrete or continuous values for region_to_gene? Would I need to dig into the code, or is there a function I can use to simulate the counts when supplying these tables myself?

Thanks a lot for your feedback!
Rémi

@lhc70000
Collaborator

Hi Rémi, Thanks for these good questions!

Is it correct that the RIV and the region_to_gene tables are built independently of the GRN input by default?

Yes, they are both generated randomly and independently. RIV is sampled from a distribution controlled by riv.mean, riv.prob and riv.sd, while region_to_gene is controlled by region.distrib.
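
To make the role of the three RIV parameters concrete, here is a minimal Python sketch (not the package's R code). It assumes a zero-inflated Gaussian: each entry is non-zero with probability riv.prob and, when non-zero, is drawn from a normal distribution with mean riv.mean and standard deviation riv.sd. The reply does not spell out the distribution, so this form is an assumption, and the function name `sample_riv` and the parameter values are hypothetical.

```python
import random


def sample_riv(n_regions, riv_mean, riv_prob, riv_sd, rng):
    """Draw one region identity vector under an ASSUMED zero-inflated Gaussian.

    Each entry is non-zero with probability riv_prob; non-zero entries are
    drawn from Normal(riv_mean, riv_sd). This mirrors the parameter names in
    the reply, not necessarily the package's actual sampling code.
    """
    return [
        rng.gauss(riv_mean, riv_sd) if rng.random() < riv_prob else 0.0
        for _ in range(n_regions)
    ]


# Hypothetical values chosen for illustration only (not package defaults).
rng = random.Random(0)
riv = sample_riv(1000, riv_mean=0.0, riv_prob=0.3, riv_sd=1.0, rng=rng)

# Roughly riv_prob of the entries should be non-zero.
nonzero_fraction = sum(v != 0.0 for v in riv) / len(riv)
print(f"non-zero fraction: {nonzero_fraction:.2f}")
```

Under this assumption, riv.prob sets how sparse the vector is, while riv.mean and riv.sd shape the non-zero effects.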

In your opinion, would it make sense for this specific use case to build the RIV, region_to_gene and GRN myself and give them as input?

If you need to encode information about the regions, these matrices are definitely the right place to put it. We simplified the ATAC part because there was too much other work in this project, and we plan to improve this in the future. However, although this works theoretically, you may need to do some tests to see how well this information is encoded in the simulated data.

If I do so, can I use discrete or continuous values for region_to_gene?

Yes, I think continuous values will also work, because basically atac_data * region_to_gene = k_on, and region_to_gene acts like a weight matrix here.
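
The weight-matrix view can be illustrated with a small Python sketch (not the package's R code). It assumes atac_data is a cells × regions matrix and region_to_gene is a regions × genes matrix, so that their matrix product gives a cells × genes k_on. The orientation, the toy numbers, and the helper `matmul` are assumptions for illustration; the package may apply further scaling that is not shown here.

```python
def matmul(a, b):
    """Plain-Python matrix product for matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    n_cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(n_cols)]
        for row in a
    ]


# Hypothetical toy data: 2 cells x 3 regions of ATAC accessibility.
atac_data = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
]

# Discrete (0/1) links: regions 1 and 2 regulate gene 1, region 3 regulates gene 2.
binary_links = [
    [1, 0],
    [1, 0],
    [0, 1],
]

# Continuous links: same structure, but each link carries its own weight.
weighted_links = [
    [0.5, 0.0],
    [1.0, 0.0],
    [0.0, 0.25],
]

k_on_binary = matmul(atac_data, binary_links)      # [[1.0, 2.0], [3.0, 1.0]]
k_on_weighted = matmul(atac_data, weighted_links)  # [[0.5, 0.5], [3.0, 0.25]]
print(k_on_binary)
print(k_on_weighted)
```

With 0/1 entries each linked region contributes its full accessibility to k_on; with continuous entries the contribution is scaled by the link weight, which is why both kinds of values fit the same product.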

Would I need to dig into the code, or is there a function I can use to simulate the counts when supplying these tables myself?

I may need some time to update the package and make this a general option. In the meantime, you can modify the code directly. You can find the calls to these functions in 1_main.R:

  • region_to_gene is generated by .regionToGeneMatrix()
  • RIV is generated by .regionIdentityVectors()

@r-trimbour
Author

Thanks a lot for this detailed answer!

I will try to modify these two functions, and if this specific case interests you too, I'll let you know what the simulation results look like :)
