Personalized RIV and region_to_gene tables #9

r-trimbour · 2024-06-11T16:07:18Z

Hi dear developers!

Thanks a lot for this very interesting tool :)
It runs very fast and is intuitive to use!

I would like to use scMultiSim data with a GRN inference method for multi-omics data. In this context, I have some questions about how ATAC influences gene expression and how I could personalize the region_to_gene table.

I'm interested in a setup where the CIF-gene links come only from the combination of CIF-ATAC and ATAC-gene links. The goal is to explore how methods can use TF-region/region-gene information to retrieve the corresponding GRN.
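To make this setup concrete: if TFs (CIFs) act on genes only through regions, the TF-gene GRN they imply is the boolean product of a TF-region matrix and a region-gene matrix. The toy sketch below assumes both are 0/1 nested lists. `implied_grn`, `tf_to_region` and the data are hypothetical illustrations and are not part of scMultiSim.

```python
def implied_grn(tf_to_region, region_to_gene):
    """Boolean product: TF t regulates gene g if t binds at least one region
    linked to g. Inputs are 0/1 nested lists (TFs x regions, regions x genes)."""
    n_regions = len(region_to_gene)
    n_genes = len(region_to_gene[0])
    grn = []
    for row in tf_to_region:
        if len(row) != n_regions:
            raise ValueError("tf_to_region columns must match region_to_gene rows")
        grn.append([
            int(any(row[r] and region_to_gene[r][g] for r in range(n_regions)))
            for g in range(n_genes)
        ])
    return grn


# Toy data (hypothetical): 2 TFs, 3 regions, 4 genes.
tf_to_region = [
    [1, 0, 0],  # TF0 binds region 0
    [0, 1, 1],  # TF1 binds regions 1 and 2
]
region_to_gene = [
    [1, 0, 0, 0],  # region 0 -> gene 0
    [0, 1, 0, 0],  # region 1 -> gene 1
    [0, 1, 1, 0],  # region 2 -> genes 1 and 2
]
print(implied_grn(tf_to_region, region_to_gene))  # [[1, 0, 0, 0], [0, 1, 1, 0]]
```

Gene 3 has no linked region, so in this setup it ends up with no regulator.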

Is it correct that the RIV and the region_to_gene tables are built independently of the GRN input by default?
In your opinion, would it make sense for this specific use to build the RIV, region_to_gene and GRN myself and give them as input?
If I do so, can I use discrete or continuous values for the region_to_gene? Would this require digging into the code, or is there a function I can use to simulate the counts when I provide these tables myself?

Thanks a lot for your feedback!!
Rémi

lhc70000 · 2024-06-12T23:50:25Z

Hi Rémi, Thanks for these good questions!

> Is it correct that the RIV and the region_to_gene tables are built independently of the GRN input by default?

Yes, they are both generated randomly and independently. RIV is sampled from a distribution controlled by `riv.mean`, `riv.prob` and `riv.sd`, while region_to_gene is controlled by `region.distrib`.
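For intuition, the random generation described above could look roughly like the sketch below. This is a hypothetical reimplementation and not scMultiSim's code. It makes three assumptions: `riv.prob` is the per-entry probability of a non-zero RIV value, non-zero values are drawn from a normal distribution with mean `riv.mean` and sd `riv.sd`, and `region.distrib` gives the probabilities of a gene being linked to 0, 1 or 2 consecutive regions. The main point is that neither sampler receives the GRN.

```python
import random


def sample_riv(n_regions, n_cif, riv_mean, riv_sd, riv_prob, rng):
    """Hypothetical RIV sampler (regions x CIFs): each entry is non-zero with
    probability riv_prob; non-zero values ~ Normal(riv_mean, riv_sd)."""
    return [
        [rng.gauss(riv_mean, riv_sd) if rng.random() < riv_prob else 0.0
         for _ in range(n_cif)]
        for _ in range(n_regions)
    ]


def sample_region_to_gene(n_regions, n_genes, region_distrib, rng):
    """Hypothetical region_to_gene sampler (regions x genes): each gene is
    linked to 0, 1 or 2 consecutive regions with probabilities region_distrib."""
    mat = [[0] * n_genes for _ in range(n_regions)]
    for g in range(n_genes):
        k = rng.choices([0, 1, 2], weights=region_distrib)[0]
        if k == 0:
            continue
        start = rng.randrange(n_regions - k + 1)  # first of k consecutive regions
        for r in range(start, start + k):
            mat[r][g] = 1
    return mat


# Neither sampler takes the GRN as input: both tables are random and
# independent of it. Parameter values below are illustrative only.
rng = random.Random(0)
riv = sample_riv(n_regions=6, n_cif=4, riv_mean=0.0, riv_sd=1.0, riv_prob=0.3, rng=rng)
r2g = sample_region_to_gene(n_regions=6, n_genes=5, region_distrib=[0.1, 0.5, 0.4], rng=rng)
```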

> In your opinion, would it make sense for this specific use to build the RIV, region_to_gene and GRN myself and give them as input?

If you need to encode information about the regions, these matrices are definitely the right place to put it. We simplified the ATAC part because the project already involved a lot of other work, and we plan to improve it in the future. Although this should work in principle, you may need to run some tests to check how well this information is preserved in the simulated data.

> If I do so, can I use discrete or continuous values for the region_to_gene?

Yes, I think continuous values will also work: essentially, `atac_data * region_to_gene = k_on`, so region_to_gene acts as a weight matrix here.
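Reading `atac_data * region_to_gene = k_on` as a matrix product, a continuous region_to_gene simply reweights each region's contribution to a gene. The sketch below assumes `atac` is cells x regions and `region_to_gene` is regions x genes. Both that orientation and the absence of any further scaling are assumptions, not a description of scMultiSim's internals.

```python
def k_on_from_atac(atac, region_to_gene):
    """Weighted sum of region accessibility per gene:
    k_on[c][g] = sum_r atac[c][r] * region_to_gene[r][g]
    atac: cells x regions; region_to_gene: regions x genes (assumed layout)."""
    n_regions = len(region_to_gene)
    n_genes = len(region_to_gene[0])
    k_on = []
    for cell in atac:
        if len(cell) != n_regions:
            raise ValueError("atac columns must match region_to_gene rows")
        k_on.append([
            sum(cell[r] * region_to_gene[r][g] for r in range(n_regions))
            for g in range(n_genes)
        ])
    return k_on


atac = [
    [1.0, 0.0, 2.0],  # cell 0
    [0.5, 1.5, 0.0],  # cell 1
]
binary = [[1, 0], [1, 0], [0, 1]]                  # discrete 0/1 links
weighted = [[0.75, 0.0], [0.25, 0.0], [0.0, 0.5]]  # continuous weights

print(k_on_from_atac(atac, binary))    # [[1.0, 2.0], [2.0, 0.0]]
print(k_on_from_atac(atac, weighted))  # [[0.75, 1.0], [0.75, 0.0]]
```

With 0/1 links, each linked region contributes its full accessibility. Continuous weights let stronger or weaker region-gene links contribute proportionally, which is why both kinds of value fit the weight-matrix view.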

> Would this require digging into the code, or is there a function I can use to simulate the counts when I provide these tables myself?

I may need some time to update the package and make this a general option. In the meantime, you can modify the code directly. The calls to these functions are in 1_main.R:

- region_to_gene is generated by `.regionToGeneMatrix()`
- RIV is generated by `.regionIdentityVectors()`
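In practice, the change in 1_main.R amounts to using a user-supplied table when one is given and falling back to random generation otherwise. The Python sketch below shows only that control flow. `get_region_to_gene`, `default_region_to_gene` and `user_table` are hypothetical names. The real edit would be made in the R code around the `.regionToGeneMatrix()` call, and the same pattern would apply to `.regionIdentityVectors()`.

```python
import random


def default_region_to_gene(n_regions, n_genes, rng):
    """Illustrative random fallback: link each gene to one random region."""
    mat = [[0] * n_genes for _ in range(n_regions)]
    for g in range(n_genes):
        mat[rng.randrange(n_regions)][g] = 1
    return mat


def get_region_to_gene(n_regions, n_genes, rng, user_table=None):
    """Return the user's table (after a shape check) if provided,
    otherwise generate one at random."""
    if user_table is None:
        return default_region_to_gene(n_regions, n_genes, rng)
    if len(user_table) != n_regions or any(len(row) != n_genes for row in user_table):
        raise ValueError(f"region_to_gene must be {n_regions} x {n_genes}")
    return [list(row) for row in user_table]  # copy so the caller's table is untouched


rng = random.Random(1)
random_table = get_region_to_gene(3, 4, rng)  # default: random links
my_table = [[1, 0, 0, 0], [0, 0.5, 0, 0], [0, 0.5, 1, 0]]
fixed_table = get_region_to_gene(3, 4, rng, user_table=my_table)  # user-supplied
```

The shape check matters because a hand-built table is easy to get out of sync with the number of regions and genes used elsewhere in the simulation.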

r-trimbour · 2024-06-13T09:27:02Z

Thanks a lot for this detailed answer!

I will try to modify these two functions, and if this specific case interests you too, I'll let you know what the simulation results look like :)
