Add zero-aware abundance-based grouping refinement and re-integration-first workflow, plus enhanced Experiment-results abundance and MS/MS analysis/export tools #5

Draft
Copilot wants to merge 27 commits into main from copilot/implement-feature-similarity-check

Conversation

Contributor

Copilot AI commented May 11, 2026

This change implements the requested additional “belong together” check inside metabolite grouping by evaluating similarity of feature abundances within each preliminary group, updates the processing order so grouping runs on re-integrated values before annotation, and adds extended MS/MS comparison and export capabilities in Experiment results.

  • What changed

    • Added/updated second-pass grouping refinement in calculateMetaboliteGroups that:
      • builds per-feature abundance vectors from *_Abundance_N columns
      • applies graph-based subgroup splitting within each initial group using configurable abundance similarity thresholds and connection-rate criteria
    • Updated processing flow to:
      1. bracketing
      2. re-integration
      3. grouping
      4. annotation
  • New grouping helper module

    • Added src/metaboliteGrouping.py with focused utilities for abundance-based splitting:
      • custom pairwise similarity that:
        • computes Pearson correlation on samples where both features are quantified
        • includes samples where both are not quantified with lower weight
        • omits samples where only one feature is quantified
      • graph construction and dense-subcluster based splitting (replacing dendrogram/HCA strategy for abundance refinement)
      • safeguards to avoid over-separating groups when no dense core is found
      • robust handling for sparse/degenerate and constant proportional profiles
  • Processing tab UI controls (added per feedback)

    • Added Use abundance similarity checkbox in Convolute results to enable/disable abundance-based subgroup splitting.
    • Added Abundance similarity threshold (% spinbox) in Convolute results to configure the abundance-profile similarity threshold independently.
    • Wired both controls through:
      • settings persistence/load (MExtract.py)
      • calculateMetaboliteGroups(...) call parameters
      • grouping parameter export into the results Parameters table.
  • Experiment results visualization tab (added per feedback)

    • Added a new Abundance profiles tab in Experiment results for quick inspection of selected features.
    • Added visualization mode switch:
      • Boxplot (one tick per experimental group, with one offset box per selected feature within each group)
      • Line plot (one line per selected feature across samples)
    • Added abundance scale switch:
      • Linear
      • Logarithmic
    • Added normalization mode switch:
      • None
      • Scale to max sample (per feature)
      • Scale to max experimental group (per feature)
    • Samples are sorted by:
      1. experimental group name
      2. sample name
    • Plot updates on feature selection changes and visualization control changes.
  • MS/MS similarity, overview, and export enhancements (added per feedback)

    • Added matchms-based pairwise MS/MS similarity comparison for selected features, separately for native and labeled isotopolog spectra.
    • Similarity inputs are restricted to MS/MS scans matching feature precursor windows and per-file feature RT boundaries (start/end RT where available).
    • Added MS/MS tab actions:
      • Native similarity button
      • Labeled similarity button
    • Added similarity popup with:
      • pairwise similarity matrix
      • red→green cell coloring by score
      • user-configurable similarity threshold
      • mirror-plot display for selected spectrum pairs
    • Added MS/MS overview dialog with per-feature summaries and graphical overview.
    • Added per-spectrum context menu copy actions:
      • m/z-intensity list
      • TSV
      • MassBank-style format
    • Added flexible MGF export in Experiment results:
      • export modes:
        • raw spectra
        • one average spectrum per feature
        • most abundant spectrum per feature
        • cleaned spectrum per feature
      • separate native and labeled exports
      • collision setup/energy selection and optional split into separate files per setup
      • FragExtract-like cleaning option with configurable allowance of 0 labeling atoms
      • exported metadata includes feature identifiers (Num, OGroup) and spectrum context fields
  • Behavior and safety guards

    • Keeps original group unchanged when abundance profiles are incomplete/misaligned.
    • Uses threshold-driven graph split criteria with configurable abundance threshold and connection-rate logic.
    • Added focused test coverage for zero/missing-value handling, threshold-driven splitting, and constant-profile behavior in tests/test_metabolite_grouping.py.
similarities[feature_a][feature_b] = _presence_aware_profile_similarity(
    profiles[feature_a],
    profiles[feature_b],
)
if similarities[feature_a][feature_b] >= min_peak_correlation:
    adjacency[feature_a].add(feature_b)
    adjacency[feature_b].add(feature_a)
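
The snippet above calls `_presence_aware_profile_similarity` without showing its body. The following is a minimal sketch of what such a similarity could look like, based only on the three rules listed in this description (both quantified, both missing with lower weight, mismatches omitted). The function name `presence_aware_similarity`, the parameters `both_missing_weight` and `min_shared`, and the choice to return 0.0 for degenerate profiles are assumptions for illustration, not the code in `src/metaboliteGrouping.py`.

```python
import numpy as np


def presence_aware_similarity(a, b, both_missing_weight=0.25, min_shared=3):
    """Weighted Pearson correlation of two abundance profiles (hypothetical sketch).

    Zeros, negatives, and NaNs are treated as "not quantified".
    """
    a = np.clip(np.nan_to_num(np.asarray(a, dtype=float), nan=0.0), 0.0, None)
    b = np.clip(np.nan_to_num(np.asarray(b, dtype=float), nan=0.0), 0.0, None)
    if a.shape != b.shape:
        raise ValueError("profiles must cover the same samples")

    both_present = (a > 0) & (b > 0)
    both_missing = (a == 0) & (b == 0)
    if both_present.sum() < min_shared:
        return 0.0  # assumption: too few shared samples means "no evidence"

    # Samples quantified in only one feature keep weight 0 and are thus omitted.
    weights = both_present * 1.0 + both_missing * both_missing_weight
    w = weights / weights.sum()

    mean_a, mean_b = np.sum(w * a), np.sum(w * b)
    da, db = a - mean_a, b - mean_b
    var_a, var_b = np.sum(w * da**2), np.sum(w * db**2)
    if var_a <= 1e-12 * mean_a**2 or var_b <= 1e-12 * mean_b**2:
        return 0.0  # assumption: a constant profile has no defined correlation

    r = np.sum(w * da * db) / np.sqrt(var_a * var_b)
    return float(np.clip(r, -1.0, 1.0))
```

With this weighting, a sample where only one feature is quantified no longer drags the coefficient down, while samples where both are absent still contribute a small amount of agreement.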
Original prompt

Start implementation

[Chronological Review: The conversation began with the user outlining a requirement to enhance the software's feature grouping based on chromatographic peak correlation and SIL ratios. The user requested an implementation that would involve checking the similarity of features within groups using a dendrogram based on relative abundances. The user then prompted to start the implementation, leading to the current state where a summary is being generated.]

[Intent Mapping: The user explicitly requested to implement a method to check feature similarity within groups using a dendrogram and correlation thresholds. The user stated, "please implement this as a further check which features belong together," indicating a clear goal to refine the grouping process.]

[Technical Inventory: The conversation involves concepts such as chromatographic peak correlation, SIL ratios, dendrograms, and correlation thresholds. The user is working within the MetExtract-II repository, focusing on metabolite feature grouping.]

[Code Archaeology: No specific files or code segments were discussed or modified yet, as the conversation is still in the planning and initial request phase.]

[Progress Assessment: The user has outlined the requirements for the implementation but has not yet completed any coding tasks. The implementation is pending based on the user's request.]

[Context Validation: All critical information for continuation is captured, including the user's request for a specific implementation method and the context of the MetExtract-II repository.]

[Recent Commands Analysis: The last agent command was the user requesting to "Start implementation." There were no tool results from this command as it was a directive rather than a tool execution. The immediate pre-summarization state was focused on preparing to implement the requested feature similarity check.]

1. Conversation Overview:
  • Primary Objectives: The user requested to implement a method to check feature similarity within metabolite groups using a dendrogram based on relative abundances, stating, "please implement this as a further check which features belong together."
  • Session Context: The conversation has focused on enhancing the software's feature grouping process, with the user outlining specific requirements for the implementation.
  • User Intent Evolution: The user's needs have remained consistent, focusing on refining the grouping of features based on similarity metrics.
2. Technical Foundation:
  • [Core Technology 1]: MetExtract-II repository, focusing on metabolite feature analysis.
  • [Framework/Library 2]: Not specified yet, but likely involves statistical or data analysis libraries for dendrogram creation and correlation calculations.
  • [Architectural Pattern 3]: The implementation will involve hierarchical clustering techniques.
  • [Environment Detail 4]: The current branch is 'main' in the MetExtract-II repository.
3. Codebase Status:
  • [File Name 1]: Not specified yet as the implementation has not started.
  • Purpose: To implement the feature similarity check.
  • Current State: Awaiting implementation based on user request.
  • Key Code Segments: Not applicable yet.
  • Dependencies: Not specified yet.
4. Problem Resolution:
  • Issues Encountered: No technical problems have been reported yet.
  • Solutions Implemented: None yet, as the implementation phase has not begun.
  • Debugging Context: Not applicable at this stage.
  • Lessons Learned: Not applicable yet.
5. Progress Tracking:
  • Completed Tasks: None yet; the user has outlined the requirements.
  • Partially Complete Work: The implementation is pending.
  • Validated Outcomes: None yet.
6. Active Work State:
  • Current Focus: Preparing to implement the requested feature similarity check using a dendrogram.
  • Recent Context: The user requested to "Start implementation," indicating readiness to proceed with coding.
  • Working Code: No code has been modified or discussed yet.
  • Immediate Context: The user is focused on implementing a method to check feature similarity based on the outlined requirements.
7. Recent Operations:
  • Last Agent Commands: The last command was the user requesting to "Start implementation."
  • Tool Results Summary: No tool results were generated as this was a directive.
  • Pre-Summary State: The agent was preparing to implement the requested feature similarity check.
  • Operation Context: The command was executed to initiate the coding process for the user's outlined requirements.
8. Continuation Plan:
  • [Pending Task 1]: Implement the feature similarity check using a dendrogram based on relative abundances.
  • [Pending Task 2]: Establish correlation thresholds for determining feature similarity.
  • [Priority Information]: The implementation of the feature similarity check is the most urgent task.
  • [Next Action]: Begin coding the requested implementation based on the user's specifications.

Created from VS Code.

Copilot AI changed the title from "[WIP] Implement feature similarity check using dendrograms" to "Add abundance-profile dendrogram refinement to metabolite feature grouping" May 11, 2026
Copilot AI requested a review from chrboku May 11, 2026 09:11
@chrboku chrboku marked this pull request as ready for review May 11, 2026 09:22
@chrboku chrboku marked this pull request as draft May 11, 2026 09:27
Owner

@chrboku chrboku left a comment

Thank you. Please also add the user interface controls in the processing tab to a) use this new abundance similarity method, and b) the threshold for the similarity

@chrboku
Owner

chrboku commented May 11, 2026

@copilot Thank you. Please also add the user interface controls in the processing tab to a) use this new abundance similarity method, and b) the threshold for the similarity

Contributor Author

Copilot AI commented May 11, 2026

Thank you. Please also add the user interface controls in the processing tab to a) use this new abundance similarity method, and b) the threshold for the similarity

Implemented in commit e26d2f7 (follow-up cleanup 875b256). The Processing tab now has:

  • Use abundance similarity checkbox to enable/disable the method
  • Abundance similarity threshold control (percent)
    These are persisted in settings and wired into calculateMetaboliteGroups.

Screenshot:
Processing tab abundance similarity controls

@copilot Thank you. Please also add the user interface controls in the processing tab to a) use this new abundance similarity method, and b) the threshold for the similarity

Implemented in commit e26d2f7 (follow-up cleanup 875b256). The Processing tab now includes:

  • Use abundance similarity toggle for the new method
  • Abundance similarity threshold input (percent)
    The values are saved/loaded from settings and passed into grouping.

Screenshot:
Processing tab abundance similarity controls

Copilot AI changed the title from "Add abundance-profile dendrogram refinement to metabolite feature grouping" to "Add abundance-profile dendrogram refinement with Processing-tab controls for metabolite grouping" May 11, 2026
Copilot AI requested a review from chrboku May 11, 2026 09:38
@chrboku
Owner

chrboku commented May 11, 2026

@copilot Thank you.
Now, for the user to quickly inspect such results, please add a new tab in the "Experiment results" section. Make this tab show boxplots (i.e., abundances of features grouped by experimental group and feature_id) or line plots (one line per selected feature, with the samples on the x-axis and the abundances on the y-axis; sort samples by experimental group name, then sample name) of the selected features. Allow the user to switch between the visualization methods and also between linear and logarithmic abundance scale.

Copilot AI and others added 2 commits May 11, 2026 10:02
Contributor Author

Copilot AI commented May 11, 2026

@copilot Thank you.
Now, for the user to quickly inspect such results, please add a new tab in the "Experiment results" section. Make this tab show boxplots (i.e., abundances of features grouped b...

Implemented in commit b800c82 (follow-up fix ffe00cf). A new Abundance profiles tab is now available in Experiment results with:

  • visualization switch: Boxplot / Line plot
  • abundance scale switch: Linear / Logarithmic
  • plotting for currently selected features using sample abundances, with samples ordered by experimental group name and then sample name.

Screenshot:
Abundance profiles tab

Copilot AI changed the title from "Add abundance-profile dendrogram refinement with Processing-tab controls for metabolite grouping" to "Add abundance-profile dendrogram refinement with Processing-tab controls and Experiment-results abundance visualization" May 11, 2026
@chrboku
Owner

chrboku commented May 11, 2026

@copilot Great, thank you. That looks good. Please further adapt:

  • in the boxplot, put the boxes of the same features next to each other and avoid duplicating the experimental groups on the x-axis. Have one tick for each experimental group and have different boxes for each feature

  • allow the user to scale the data to the most abundant sample, or to the most abundant experimental group for each feature

  • the calculation of the groups seems to work; however, there are problems with 0 values, as these drastically reduce the correlation coefficient. Thus, please try the following: implement a custom distance metric for a pair of features. Calculate the Pearson correlation of the two features over all samples where both features have quantification values or both features do not have quantification values, but give the latter a lower weight. Omit from the pairwise comparison all samples where one feature has an abundance but the other does not. As a result, the dendrogram strategy is no longer feasible, so convert it to a graph-based approach similar to the SIL and correlation testing. Keep those features in a group that have a connection to at least XXX % of the features in the group; otherwise split it (see other code for an example)

Contributor Author

Copilot AI commented May 11, 2026

@copilot Great, thank you. That looks good. Please further adapt:

  • in the boxplot, put the boxes of the same features next to each other and avoid duplicating the experimental groups on the x-axis. ...

Implemented in commit 670e617.

Updates included:

  • Boxplot now has one x-axis tick per experimental group, with per-feature boxes plotted side-by-side within each group.
  • Added abundance normalization modes: None, Scale to max sample, and Scale to max experimental group.
  • Replaced abundance-profile HCA refinement with a graph-based split using a custom pairwise similarity that:
    • compares samples where both features are quantified,
    • includes both-missing samples with lower weight,
    • omits mismatched presence/absence samples,
    • and keeps/splits nodes by minimum connection rate.
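
The normalization modes named in this update can be sketched as follows. This is a minimal illustration for a single feature, not the plotting code of the Abundance profiles tab. It assumes that "Scale to max experimental group" divides by the highest per-group mean abundance; the function names, the mode strings, and the dictionary-based inputs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalize_profile(abundances, groups, mode="none"):
    """Scale one feature's abundances (hypothetical sketch).

    abundances: dict mapping sample name -> abundance
    groups:     dict mapping sample name -> experimental group name
    mode:       "none", "max_sample", or "max_group"
    """
    if mode == "none":
        return dict(abundances)
    if mode == "max_sample":
        reference = max(abundances.values())
    elif mode == "max_group":
        per_group = defaultdict(list)
        for sample, value in abundances.items():
            per_group[groups[sample]].append(value)
        # Assumption: the "most abundant group" is the one with the highest mean.
        reference = max(mean(values) for values in per_group.values())
    else:
        raise ValueError(f"unknown normalization mode: {mode}")
    if reference <= 0:
        return dict(abundances)  # nothing quantified; leave values untouched
    return {sample: value / reference for sample, value in abundances.items()}


def ordered_samples(groups):
    """Sort samples by experimental group name, then by sample name."""
    return sorted(groups, key=lambda sample: (groups[sample], sample))
```

Scaling per feature keeps profiles of very different absolute abundance comparable on one axis, which is the situation the boxplot and line plot are meant to help with.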

Screenshot:
Updated abundance profiles tab

Copilot AI changed the title from "Add abundance-profile dendrogram refinement with Processing-tab controls and Experiment-results abundance visualization" to "Add zero-aware graph-based abundance refinement with Processing-tab controls and enhanced Experiment-results abundance visualization" May 11, 2026
@chrboku
Owner

chrboku commented May 11, 2026

@copilot Thank you

  • the boxplots are incorrect; show a box for each feature for each experimental group. So if there are 4 experimental groups and 2 features selected, show for experimental group A two boxes at A with a slight offset so one can see the difference

  • The separation does not seem to work well: almost all features of one group are separated. Thus please:

    • implement the grouping step after the re-integration step, so first the bracketing, then the re-integration, then the grouping, then the annotation.

    • please rework the graph separation step. It should find tightly connected subclusters from the abundance correlation data

Contributor Author

Copilot AI commented May 11, 2026

@copilot Thank you

  • the boxplots are incorrect; show a box for each feature for each experimental group. So if there are 4 experimental groups and 2 features selected, show for experimental group A ...

Implemented in commit 3454eb5 (with the main functional updates in 6a8aba3).

Changes now include:

  • Boxplot rendering adjusted to show one tick per experimental group with one offset box per selected feature at each group.
  • Processing order updated to: bracketing → re-integration → grouping → annotation.
  • Graph separation reworked to use dense subcluster detection from abundance-correlation connectivity, reducing over-separation into singletons.
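
One way to picture dense subcluster detection with a connection-rate criterion is a greedy peeling of weakly connected nodes. The sketch below is an illustration under stated assumptions, not the algorithm committed in this PR: the function name `split_by_dense_core`, the parameter `min_connection_rate`, and the peeling strategy itself are hypothetical. It expects a symmetric adjacency mapping in which every feature of the preliminary group appears as a key.

```python
def split_by_dense_core(adjacency, min_connection_rate=0.5):
    """Split one preliminary group into densely connected subgroups (sketch).

    adjacency: dict mapping feature id -> set of connected feature ids.
    Returns a list of sorted feature-id lists.
    """
    remaining = set(adjacency)
    subgroups = []
    while remaining:
        core = set(remaining)
        # Peel off the weakest node until every node left is connected to at
        # least min_connection_rate of the other nodes in the core.
        while len(core) > 1:
            rates = {
                node: len(adjacency[node] & core) / (len(core) - 1)
                for node in core
            }
            weakest = min(sorted(core), key=lambda node: rates[node])
            if rates[weakest] >= min_connection_rate:
                break
            core.remove(weakest)
        if not subgroups and len(core) < 2:
            # Safeguard: no dense core found, so keep the group unchanged
            # instead of separating it into singletons.
            return [sorted(adjacency)]
        subgroups.append(sorted(core))
        remaining -= core
    return subgroups
```

For two triangles joined by a single edge and `min_connection_rate=0.6`, this yields the two triangles as separate subgroups, whereas a group without any edges is returned as is.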

Screenshot:
Updated abundance profiles grouped boxplot

Copilot AI changed the title from "Add zero-aware graph-based abundance refinement with Processing-tab controls and enhanced Experiment-results abundance visualization" to "Add zero-aware graph-based abundance refinement with re-integration-first grouping and enhanced Experiment-results abundance visualization" May 11, 2026
chrboku added 7 commits May 11, 2026 15:37
- Removed unused UI files: groupEditor.ui and heteroAtomEditor.ui.
- Added a new group box for integrated missed peaks in mainWindow.py with controls for max time difference and intensity cutoff.
- Updated convolute results section in mainWindow.py to improve layout and organization of controls.
- Implemented a custom delegate in statisticsTab.py to render selected rows in bold without altering their background color in the SelectedFeaturesTable.
- added splitter in "Experimental results"
- MSMS spectrum viewer — fragment m/z labels are no longer rotated; they now appear horizontally above each peak tip at twice the previous font size. RT filtering now uses the actual detected peak start/end time per sample instead of a fixed window around the apex.

- File stats dialog — added intensity distribution statistics (min, 10–99th percentiles, max) computed from the actual measured signal intensities of all MS1 spectra, split by polarity (positive/negative).

- Caching — file stats results are cached per file using an MD5/mtime key; the cache is automatically invalidated when the computation logic changes (versioned key).

- Sortable, color-coded stats table — all columns sort numerically; cells are highlighted in proportion to their deviation from the column mean; the first column shows the experimental group with its assigned color.

- minor bugfixes
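
The caching commit above describes a versioned MD5/mtime key. A minimal sketch of such a key is shown below. It assumes the key is an MD5 digest over a logic version string, the absolute file path, the modification time, and the file size; the constant `CACHE_VERSION` and the function name are hypothetical, and the actual key layout in the repository may differ.

```python
import hashlib
import os

CACHE_VERSION = "v1"  # hypothetical; bump whenever the stats computation changes


def file_stats_cache_key(path, version=CACHE_VERSION):
    """Build a cache key that changes when the file or the logic changes."""
    info = os.stat(path)
    payload = "|".join(
        [version, os.path.abspath(path), str(info.st_mtime_ns), str(info.st_size)]
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

Including the version in the hashed payload means that old cache entries are simply never looked up again after the computation logic changes, so no explicit invalidation pass is needed.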
@chrboku
Owner

chrboku commented May 17, 2026

@copilot

Please further implement this pull request by implementing the following functionality

  • use the matchms package to compare the spectra obtained for the detected features. Restrict the comparison to the MS/MS spectra within the feature (i.e., within the peak's start and end retention times for each file). Compare the obtained MS/MS spectra with matchms. Do this separately for the spectra of the native and labeled isotopologs
  • in the MS/MS tab, allow the user to click a button (one for the native and one for the labeled isotopolog forms) to access the similarity function outlined above. Show a popup window with a table for the pairwise comparisons. Color each cell by the similarity score (green for a high score, red for a low one) and let the user set a similarity score threshold. When they select a cell, show the two MS/MS spectra as a mirror plot.
  • implement functionality for an MS/MS overview. This means that summaries for all features with MS/MS spectra are generated and graphically illustrated.
  • implement functionality to export all MS/MS spectra to an MGF file
  • for the MS/MS MGF export, allow the user to a) simply export all raw MS/MS spectra, b) export one average MS/MS spectrum per feature, c) export the most abundant MS/MS spectrum for each feature, or d) export a cleaned spectrum file. Generate one MGF file for the native and a separate file for the labeled isotopologs.
  • for the cleaning, implement the FragExtract approach. For this, select matching fragments so that the score is maximized. Derive the labeling information from the settings used for processing the spectra, and have an option to allow for 0 labeling atoms or not (check box)
  • implement a context menu for each MS/MS spectrum allowing the user to copy the spectrum to the clipboard in various formats (e.g., as a list of m/z and intensity values, or as a TSV table for MassBank searches)
  • make sure that for the exported MGF spectra the meta-information and IDs are also exported (columns Num and OGroup).
  • also keep in mind that the spectra can be obtained with different collision setups and energies. Consider these during the export and ask the user which they want to export and include in an MGF file. Allow them to generate separate MGF files or to include all in one

Contributor Author

Copilot AI commented May 17, 2026

@copilot

Please further implement this pull request by implementing the following functionality...

Implemented in commit 8c77bf4 (main feature implementation in 81bcc9e).

Added:

  • matchms-based pairwise similarity for native and labeled spectra (feature-RT constrained), launched via separate Native similarity / Labeled similarity buttons in the MS/MS tab
  • popup matrix with score-based cell coloring (red→green), configurable threshold, and mirror-plot display for selected cell pairs
  • MS/MS overview dialog (table + bar-chart summary for all features with MS/MS spectra)
  • MGF export with modes: raw, average per feature, most abundant per feature, cleaned per feature (FragExtract-like matching with optional 0-label-atoms)
  • export controls for collision setups/energies (selection + optional separate output files)
  • separate native/labeled export outputs
  • context-menu copy options per spectrum (mz-int list, TSV, MassBank-style)
  • MGF metadata export including Num and OGroup
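
For orientation, an MGF entry that carries the feature identifiers could be written roughly as shown below. This is a sketch under assumptions: the field names `NUM` and `OGROUP`, the function names, and the use of summed intensity to pick the "most abundant" spectrum are hypothetical, and the exporter in this PR may use different keys and selection rules. Only the `BEGIN IONS`/`END IONS` framing with `TITLE` and `PEPMASS` follows the common MGF layout.

```python
def spectrum_to_mgf(peaks, precursor_mz, num, ogroup, title, extra=None):
    """Format one MS/MS spectrum as an MGF entry (hypothetical field names).

    peaks: iterable of (mz, intensity) tuples
    extra: optional dict of further metadata, e.g. collision energy
    """
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mz:.5f}",
        f"NUM={num}",        # feature identifier, assumed key name
        f"OGROUP={ogroup}",  # metabolite group identifier, assumed key name
    ]
    for key, value in (extra or {}).items():
        lines.append(f"{key.upper()}={value}")
    for mz, intensity in sorted(peaks):
        lines.append(f"{mz:.5f} {intensity:.1f}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


def most_abundant_spectrum(spectra):
    """Pick the spectrum with the highest summed intensity (assumed criterion)."""
    return max(spectra, key=lambda peaks: sum(i for _, i in peaks))
```

Writing native and labeled spectra to separate files then only requires calling the formatter on two separate spectrum lists and joining the returned entries per file.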

Screenshot:
MSMS tab controls and dialogs

Copilot AI changed the title from "Add zero-aware graph-based abundance refinement with re-integration-first grouping and enhanced Experiment-results abundance visualization" to "Add zero-aware abundance-based grouping refinement and re-integration-first workflow, plus enhanced Experiment-results abundance and MS/MS analysis/export tools" May 17, 2026
chrboku added 2 commits May 19, 2026 13:13
- improved chromatographic peak picking dialog
- improved and renamed in findIsoPairs_matchPartners.py
- several other bugfixes and improvements

2 participants