Enable display and search of channel names #636

Open
adamjtaylor opened this issue May 8, 2024 · 6 comments

@adamjtaylor
Contributor

Objective:

Implement a feature on the HTAN portal to display harmonized target names for multiplexed tissue imaging data. This aims to assist researchers in easily locating and identifying datasets with specific antibody markers.

User stories:

As a cancer researcher interested in HTAN multiplexed tissue imaging data, I want to view a list of antibody targets and channels for images on the HTAN portal and use filters to search for datasets based on these attributes, so that I can easily locate datasets with specific markers relevant to my research.

As a cancer researcher, I want to identify HTAN imaging datasets where antibodies CD45, CD8, and CD4 were targeted, so that I can specifically identify cytotoxic and helper T cell populations for my studies.

Background:

Currently, channel metadata is not easily exposed to or searchable by users. It was also not validated at ingestion, so it is poorly structured. @adamjtaylor is exploring an LLM approach with Llama 3 for harmonizing target names that seems promising.
To support this work and provide an MVP solution for users, this issue focuses on creating a method to display these names effectively on the portal.

For the MVP:

  • Mapping File Creation:
    • We'll need a file that links entity IDs to channel names.
  • Portal Integration:
    • Add a new column in the file tab called Targets.
    • This column will list targets as a single string, like ['DNA', 'CD45', 'CD8', 'CD4'].
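As a rough sketch of the MVP behaviour described above, the snippet below renders a list of targets as the single display string for the Targets column and filters files that contain every marker in a requested panel. The function names, the record layout, and the `syn0000001`-style IDs are hypothetical placeholders for illustration, not the portal's actual data model.

```python
def format_targets(targets):
    """Render channel targets as one display string for a Targets column."""
    return "[" + ", ".join(f"'{t}'" for t in targets) + "]"


def has_all_targets(targets, required):
    """Return True when every required marker is present (case-insensitive)."""
    present = {t.strip().lower() for t in targets}
    return all(r.strip().lower() in present for r in required)


# Hypothetical file records; the IDs and field names are placeholders.
files = [
    {"entityId": "syn0000001", "targets": ["DNA", "CD45", "CD8", "CD4"]},
    {"entityId": "syn0000002", "targets": ["DNA", "Ki-67"]},
]

# Panel from the T cell user story above.
t_cell_panel = ["CD45", "CD8", "CD4"]
matches = [f["entityId"] for f in files if has_all_targets(f["targets"], t_cell_panel)]

print(format_targets(files[0]["targets"]))
print(matches)
```

Matching here is exact after lowercasing, so it only works well once target names are harmonized; free-text channel names would need fuzzier matching.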

Looking Ahead:

Eventually, we want to incorporate these target names directly into the dataset metadata. Starting with this simpler display feature will help us lay the groundwork for future enhancements.

@adamjtaylor
Contributor Author

@inodb let's think through which mapping file setup would work best and what backend changes are needed to enable this. I am hoping this is simply a join operation between the mapping file and the master JSON.
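If it does reduce to a join, it might look something like the sketch below: a left join that attaches a `Targets` list to each file record by entity ID. The structure of the master JSON is an assumption here (a list of records keyed by a hypothetical `synapseId` field), as are the filenames and the second ID.

```python
def join_targets(files, mapping, id_key="synapseId"):
    """Left-join channel targets onto file records by entity ID."""
    joined = []
    for record in files:
        enriched = dict(record)  # copy so the input records stay untouched
        # Files with no entry in the mapping get an empty target list.
        enriched["Targets"] = mapping.get(record.get(id_key), [])
        joined.append(enriched)
    return joined


# Mapping of entity ID to channel names.
mapping = {"syn53284675": ["DNA", "CD8", "CD45", "CD4", "Ki-67"]}

# Hypothetical stand-in for the master JSON; field names are assumptions.
master_files = [
    {"synapseId": "syn53284675", "filename": "image_a.ome.tiff"},
    {"synapseId": "syn0000002", "filename": "image_b.ome.tiff"},
]

joined = join_targets(master_files, mapping)
```

A left join keeps every file in the table, so non-imaging files simply show an empty Targets cell rather than dropping out.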

adamjtaylor added the enhancement label May 8, 2024
@adamjtaylor
Contributor Author

adamjtaylor commented May 8, 2024

One option would be a mapping file like this

{
  "syn1234": ["Target1", "Target2"],
  "syn53284675": ["DNA", "CD8", "CD45", "CD4", "Ki-67"]
}

I think this is extensible enough to start with the original, as-provided target names and switch to harmonized ones in due course.
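To illustrate that extensibility, here is a minimal sketch that loads a mapping file of this shape, checks that each value is a list of strings, and swaps in harmonized names through a lookup table while leaving unknown names unchanged. The helper names and the `aliases` table are hypothetical; the real harmonization output may take a different form.

```python
import json


def load_mapping(text):
    """Parse the mapping JSON and check each value is a list of strings."""
    mapping = json.loads(text)
    for entity_id, targets in mapping.items():
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            raise ValueError(f"{entity_id}: expected a list of target names")
    return mapping


def harmonize(mapping, aliases):
    """Replace as-provided names with harmonized ones; keep unknown names."""
    return {
        entity_id: [aliases.get(t, t) for t in targets]
        for entity_id, targets in mapping.items()
    }


raw = '{"syn1234": ["Target1", "Target2"], "syn53284675": ["DNA", "CD8", "CD45", "CD4", "Ki-67"]}'
mapping = load_mapping(raw)

# Hypothetical alias table standing in for the harmonization output.
aliases = {"Target1": "HarmonizedTarget1"}
harmonized = harmonize(mapping, aliases)
```

Because the file format stays the same before and after harmonization, the portal side would not need to change when the names are swapped.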

@adamjtaylor
Contributor Author

adamjtaylor commented May 8, 2024

The following BigQuery query gets us a table close to what we need:

SELECT 
    e.entityId,
    cm.Channel_Metadata_ID, 
    STRING_AGG(attribute.attributeValue, ", ") AS channel_names
FROM 
    `htan-dcc.ISB_CGC_r5.channel_metadata` cm,
    UNNEST(cm.channel_attributes) AS attribute
INNER JOIN 
    `htan-dcc.released.entities_v5_1` e ON cm.Channel_Metadata_ID = e.channel_metadata_synapseId
WHERE 
    attribute.attributeName = 'Channel Name'
AND attribute.attributeValue NOT IN  ('Red','Green','Blue')
GROUP BY 
    cm.Channel_Metadata_ID, e.entityId
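For reference, rows from a query like this could be turned into the entity-to-targets mapping discussed earlier with something like the sketch below. It assumes each row arrives as a dict with `entityId` and `channel_names` keys matching the query's column aliases; the example row and its IDs are made up.

```python
def rows_to_mapping(rows):
    """Build {entityId: [channel names]} from STRING_AGG-style result rows."""
    mapping = {}
    for row in rows:
        names = mapping.setdefault(row["entityId"], [])
        # Split the aggregated string back into individual channel names.
        for name in row["channel_names"].split(","):
            name = name.strip()
            # Skip empty pieces and names already seen for this entity.
            if name and name not in names:
                names.append(name)
    return mapping


# Hypothetical result row; the IDs are placeholders.
rows = [
    {
        "entityId": "syn0000001",
        "Channel_Metadata_ID": "syn0000009",
        "channel_names": "DNA, CD45, CD8, CD4",
    },
]

mapping = rows_to_mapping(rows)
```

Splitting on commas would break any channel name that itself contains a comma. Returning an array from the query (for example with `ARRAY_AGG` instead of `STRING_AGG`) might avoid that round trip.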

@adamjtaylor
Contributor Author

@inodb I'd like to move forward with discussing how to implement this portal side so I can ensure outputs are prepared correctly.

@inodb
Collaborator

inodb commented Jun 4, 2024

@adamjtaylor the BigQuery table looks good to me! We already have a way to pull from BigQuery directly and store it, so I don't think you need to provide anything else.

@adamjtaylor
Contributor Author

OK. I will push a new table to BigQuery with entityId, Channel_Metadata_ID, and a new column, harmonized_channel_names.

I'll point you to it once it is complete.
