Design question - different protein panels in one TileDB Soma #4327

@mattinbits

I am working on a use case for TileDB-SOMA with proteomics data. The goal is to store the readings from many plates in a single TileDB-SOMA store, so we can easily query and slice across the full dataset.

Our current thinking:

  • The plate barcodes, well positions, and other sample metadata go in the obs dataframe
  • The protein IDs (e.g. UniProt accessions) go in the var dataframe
  • The different data points per well and protein go in separate layers

The challenge is that different plates have different panels. We can make the var dataframe the union of all proteins across all panels, but then we have to decide how to handle the "gaps", where a well on a given plate has no value for a protein. With a sparse matrix, it seems we cannot tell whether an empty cell means a measured zero or the absence of a measurement.
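To make the gap problem concrete, here is a minimal, library-agnostic sketch in plain Python. It does not call the TileDB-SOMA API; the plate names, well positions, protein IDs, and readings are all made up for illustration, and the `densify` helper is hypothetical. It only shows how a measured zero and an unmeasured cell become indistinguishable once unstored cells are filled with zero.

```python
# Library-agnostic sketch: plain Python, no TileDB-SOMA calls.
# Plate, well, and protein names are hypothetical.
panels = {
    "plate_A": ["P01308", "P01375"],
    "plate_B": ["P01375", "P05231"],
}

# var: the union of all proteins across panels, in a stable order.
var_ids = sorted({p for panel in panels.values() for p in panel})
var_index = {p: j for j, p in enumerate(var_ids)}

# obs: one row per (plate, well).
obs = [("plate_A", "A1"), ("plate_B", "A1")]

# Readings keyed by (obs row, protein).
# plate_A/A1 genuinely measured 0.0 for P01375.
readings = {
    (0, "P01308"): 3.2,
    (0, "P01375"): 0.0,
    (1, "P01375"): 1.7,
    (1, "P05231"): 4.4,
}

# COO triples (row, col, value), the shape a sparse layer would hold.
coo = [(i, var_index[p], v) for (i, p), v in readings.items()]


def densify(triples, n_rows, n_cols):
    """Expand COO triples to a dense grid, filling unstored cells with 0.0."""
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in triples:
        grid[i][j] = v
    return grid


dense = densify(coo, len(obs), len(var_ids))

# After densifying, the measured zero and the unmeasured cell look the same.
measured_zero = dense[0][var_index["P01375"]]  # measured on plate_A
not_measured = dense[0][var_index["P05231"]]   # not in plate_A's panel
```

In this sketch the COO triples still carry the distinction (the measured zero is present as a triple, the unmeasured cell is not), so the ambiguity arises only when a reader treats unstored cells as zero. Whether a given storage and read path preserves explicitly stored zeros is something that would need to be checked against the library.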

Some ideas we considered:

  • Initially, I thought that multiple measurements could solve our protein panel problem. On closer study, though, this multi-modal support seems designed for the case where the features vary across the same set of observations, whereas here the features vary across different sets of observations.
  • Use a dense matrix rather than a sparse matrix, so that "NaN" can represent the lack of a value and zero can mean zero.
  • Add a layer of type "string" which, where present, gives contextual information about that position in the matrix. As well as recording whether the observation has a value for the protein, this would let us represent error information, since we see several types of "bad reading" conditions that we might want to distinguish. If this layer is a sparse matrix, it should hopefully be space efficient, because positions with no extra context store nothing.
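The second and third ideas can be sketched together in plain Python. This is an illustration of the data model only, not TileDB-SOMA code: the status codes (`not_in_panel`, `below_lod`), the helper names, and the values are hypothetical.

```python
import math

# Hypothetical status codes; real categories would come from the assay.
NOT_IN_PANEL = "not_in_panel"
BELOW_LOD = "below_lod"

# Dense values layer: NaN marks "no value", 0.0 remains a real zero.
# Rows are observations (wells), columns are proteins.
values = [
    [3.2, 0.0, math.nan],
    [math.nan, 1.7, 4.4],
]

# Sparse string layer: only cells that need extra context are stored.
status = {
    (0, 2): NOT_IN_PANEL,
    (1, 0): NOT_IN_PANEL,
    (0, 1): BELOW_LOD,
}


def is_missing(i, j):
    """True when the cell holds no measurement at all."""
    return math.isnan(values[i][j])


def describe(i, j):
    """Return (value, status); a status of None means a plain reading."""
    return values[i][j], status.get((i, j))
```

Here `describe(0, 1)` returns a real zero together with a reason, while `is_missing(0, 2)` flags a protein outside the plate's panel. The trade-off is that the dense layer pays storage for every gap, whereas the string layer grows only with the number of annotated cells.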

I am interested in the experience of others in handling this kind of scenario, and advice on the suitability of the ideas we've considered for handling the issue.
