Suggested CICD for October 2025 #170

djsmith17 · 2025-10-22T04:17:42Z

This pull request includes a range of improvements and refactoring across several modules, focusing on code clarity, maintainability, and efficiency. The main changes involve cleaning up and optimizing phenotyping logic, enhancing logging and diagnostics in the main web app, improving subprocess handling for package installation, and updating documentation and comments for better readability.

Phenotyping logic improvements:

Refactored the init_pheno_cols, init_pheno_assign, and init_pheno_summ functions in basic_phenotyper_lib.py for more efficient and vectorized operations, replacing manual loops and deprecated code with pandas-native methods. This includes a new, faster implementation for generating species_name_short and calculating counts and percentages. [1] [2] [3]

Logging and diagnostics enhancements:

Added and configured a logging system in Multiplex_Analysis_Web_Apps.py to provide informative runtime messages for key operations such as platform initialization, directory creation, session state initialization, and benchmarking actions. [1] [2] [3] [4] [5]

Subprocess and package installation robustness:

Updated all subprocess calls in install_missing_packages.py to explicitly set check=False for better error handling and clarity, and improved code formatting for readability. [1] [2] [3] [4]

Documentation and code clarity:

Improved docstrings and comments in nidap_io.py for functions interacting with NIDAP, clarifying argument types, return values, and usage examples. [1] [2] [3]
Minor typo and comment fixes in nidap_dashboard_lib.py to improve code readability. [1] [2]

Module import and structure updates:

Changed imports in Multiplex_Analysis_Web_Apps.py from pages2 to pages, removed the live package installation call, and updated sidebar UI elements for clarity. [1] [2]

These changes collectively improve performance, maintainability, and user/developer experience across the codebase.This pull request refactors the codebase to improve maintainability, logging, and usability. The main changes include moving modules from the pages2 directory to pages, enhancing logging and user feedback in the main app, and improving subprocess handling in package installation scripts. Several modules also received docstrings and code cleanup for clarity.

Module refactoring and imports:

All imports previously referencing pages2 are updated to use the new pages directory, and associated files are renamed accordingly. This standardizes module organization and simplifies future maintenance. [1] [2] [3] [4]

Logging and user feedback improvements:

Introduced Python's logging module in Multiplex_Analysis_Web_Apps.py for better tracking of application events, such as directory creation, session state initialization, and sidebar actions. Logging statements now provide real-time feedback for key operations. [1] [2] [3]
Minor UI improvement: updated sidebar documentation link to use a book emoji for better readability.

Subprocess and installation script enhancements:

Updated all subprocess.run calls in install_missing_packages.py to explicitly set check=False for more robust error handling and readability. [1] [2] [3] [4]

Code documentation and cleanup:

Added module-level docstrings and function docstrings to several files (e.g., datafile_format_unifier.py), clarifying their purpose and usage for future developers. [1] [2] [3]
Minor code cleanup, such as removing unnecessary blank lines and reordering imports for clarity. [1] [2]

Main function usability:

Added a docstring to the main() function in Multiplex_Analysis_Web_Apps.py to clarify its role as the entry point for the application.

And renaming skeleton2.py to skeleton.py

…ot-in-the-subplots-section' into suggested-cicd

For legibility and for CI/CD

Previously a bunch of UI elements were removed from the Data Import and Export page that were not necessary for local use/development. I am adding them back in for a few reasons. - It makes development of these widgets easier when developing locally - It will force us to consider how this design will need to change when we move from the NIDAP foundry platform to the next version. I am also populating the list of files in the ./input folder to better test a nother festure. We can undo line 281 later if its confusing

Save this for later. I need to stop working on this for now, but I will be back

This reverts commit 3231eb3.

The column starting width can sometimes be way too large. This is an attempt to regulate that size so that its easier to read the tables in the Data Import and Export Page

Removing .apply for .agg

This is mostly the same just with some added f strings and indentation. Following co-pilot and this indentation looked good.

Much more straight forward. Wish I had thought of it

How far we have come with making MAWA a more effecient tool

Made this more efficient and I cut this down by 40s for a 21M cell detaframe

Replaces the custom species_name_long_to_short function with a pandas str.extractall and groupby approach to generate the species_name_short column. The new method simplifies the code and leverages pandas string operations for improved clarity and maintainability.

Copilot

Pull Request Overview

This pull request refactors and modernizes the codebase to improve maintainability, performance, and developer experience. The primary change is reorganizing modules from pages2 to pages, with additional enhancements to logging infrastructure, code documentation, subprocess handling, and phenotyping performance.

Key Changes:

Module reorganization from pages2 to pages directory across all imports
Introduction of structured logging for application events and diagnostics
Performance optimizations in phenotyping logic using vectorized pandas operations
Enhanced documentation with comprehensive docstrings and parameter descriptions

Reviewed Changes

Copilot reviewed 15 out of 40 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
Multiplex_Analysis_Web_Apps.py	Updated imports from pages2 to pages, added logging infrastructure, improved main function documentation
streamlit_session_state_management.py	Updated import path from pages2 to pages
streamlit_dataframe_editor.py	Enhanced docstrings with detailed parameter descriptions, improved code formatting
platform_io.py	Added comprehensive docstrings, improved code organization, added column configuration for dataframes
pages2/skeleton.py	File removed (relocated to pages directory)
pages/skeleton.py	Cleaned up comments for clarity
pages/thresholded_phenotyping.py	Changed expander to form for marker selection, added benchmarking timers
pages/spatial_umap_prediction_app.py	Updated import from pages2 to pages
pages/datafile_format_unifier.py	Added module and function docstrings
pages/child_process_killer.py	Added module docstring, reorganized imports
pages/adaptive_phenotyping.py	Updated import from pages2 to pages, removed extra blank line
nidap_io.py	Improved docstrings with detailed parameter and return value documentation
nidap_dashboard_lib.py	Fixed typo in comment ("BUttons" to "Buttons"), removed unused import
install_missing_packages.py	Added explicit check=False to all subprocess calls, improved code formatting
basic_phenotyper_lib.py	Refactored phenotyping functions for performance using vectorized pandas operations

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-10-23T07:34:58Z

basic_phenotyper_lib.py

+    #     #     return 'Other'
+    #     # return ' + '.join(marker_names[i] for i in marker_indices) + '+'
+    # df['species_name_short'] = df['species_name_long'].apply(species_name_long_to_short)
+    df['species_name_short'] = df['species_name_long'].str.extractall(r'(\w+)\+').groupby(level=0).agg(' + '.join).fillna('Other') + '+'


The new vectorized implementation for 'species_name_short' changes the output format compared to the commented-out function. The original function returns 'Other' for cases with no positive markers, but the new implementation returns 'Other+'. Additionally, cells with positive markers will get an extra '+' at the end. For example, 'ECAD+ COX2+' in the old version will become 'ECAD + COX2++' in the new version. Consider using: df['species_name_short'] = df['species_name_long'].str.extractall(r'(\w+)\+').groupby(level=0).agg(' + '.join).fillna('Other').apply(lambda x: x if x == 'Other' else x + '+')

Suggested change

df['species_name_short'] = df['species_name_long'].str.extractall(r'(\w+)\+').groupby(level=0).agg(' + '.join).fillna('Other') + '+'

df['species_name_short'] = df['species_name_long'].str.extractall(r'(\w+)\+').groupby(level=0).agg(' + '.join).fillna('Other').apply(lambda x: x if x == 'Other' else x + '+')

Copilot · 2025-10-23T07:34:59Z

pages/thresholded_phenotyping.py

+        with st.form('Choose Markers to include'):
            st.multiselect('Markers', options = st.session_state.loaded_marker_names,
-                                      key = 'marker_multi_sel',
-                                      on_change=marker_multiselect_callback)
+                                      key = 'marker_multi_sel')
+
+            # Every form must have a submit button.
+            submitted = st.form_submit_button('Apply Changes')
+            if submitted:
+                marker_multiselect_callback()


The form ID 'Choose Markers to include' contains spaces which could cause issues with Streamlit's internal form handling. Form identifiers should follow a consistent naming convention without spaces. Consider using a key like 'marker_selection_form' instead.

nidap_dashboard_lib.py

djsmith17 added 8 commits August 13, 2025 13:45

Deleting live_package_installation

9439700

Adding logging

078ea38

Bit of linting in the install_missing_packages

264984f

renaming pages2->pages

02d6bb9

Linting

85d80da

Removing old skeleton.py

cd81944

And renaming skeleton2.py to skeleton.py

Merge branch '166-export-all-button-that-exports-the-data-for-each-pl…

fc1a0fc

…ot-in-the-subplots-section' into suggested-cicd

Merge branch 'develop' into suggested-cicd

4eaa594

djsmith17 self-assigned this Oct 22, 2025

djsmith17 added 13 commits October 22, 2025 21:01

Linting

82120a9

For legibility and for CI/CD

output of data_editor for tracking selections

3231eb3

Save this for later. I need to stop working on this for now, but I will be back

Revert "output of data_editor for tracking selections"

8f7deec

This reverts commit 3231eb3.

An attempt to fix the width of the table columns

ffb7298

The column starting width can sometimes be way too large. This is an attempt to regulate that size so that its easier to read the tables in the Data Import and Export Page

Updating the mark_bits to be more effecient

7390592

Removing .apply for .agg

Updating the species_name_long

c0359b3

This is mostly the same just with some added f strings and indentation. Following co-pilot and this indentation looked good.

Improving the column has_pos_mark

0de413d

Much more straight forward. Wish I had thought of it

More effecient init_pheno_assign

5dd31ee

How far we have come with making MAWA a more effecient tool

Improved timing for init_pheno_summ

f5ca96f

Made this more efficient and I cut this down by 40s for a 21M cell detaframe

Making the marker selection box a form to avoid long waits

84483e7

Some logging for tracking

5b683bc

djsmith17 requested review from andrew-weisman and Copilot October 23, 2025 07:34

djsmith17 added bug Something isn't working dependencies Pull requests that update a dependency file labels Oct 23, 2025

Copilot AI reviewed Oct 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Suggested CICD for October 2025 #170

Suggested CICD for October 2025 #170

Uh oh!

djsmith17 commented Oct 22, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Oct 23, 2025

Uh oh!

Copilot AI Oct 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	df['species_name_short'] = df['species_name_long'].str.extractall(r'(\w+)\+').groupby(level=0).agg(' + '.join).fillna('Other') + '+'
	df['species_name_short'] = df['species_name_long'].str.extractall(r'(\w+)\+').groupby(level=0).agg(' + '.join).fillna('Other').apply(lambda x: x if x == 'Other' else x + '+')

Suggested CICD for October 2025 #170

Are you sure you want to change the base?

Suggested CICD for October 2025 #170

Uh oh!

Conversation

djsmith17 commented Oct 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

djsmith17 commented Oct 22, 2025 •

edited

Loading