
Cell eval 0.7.0 #225

Open
noamteyssier wants to merge 19 commits into main from cell-eval-0.7.0

Conversation

@noamteyssier
Collaborator

No description provided.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



@gemini-code-assist
Contributor

Summary of Changes

Hello @noamteyssier, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request upgrades the cell-eval package to version 0.7.0. The core of the change is adapting the codebase to a new major version of the pdex library, which required updating function signatures and parameter handling for differential expression computations. Additionally, the project's minimum Python version has been bumped, and a new documentation file for AI code assistants has been introduced. Together, these updates aim to improve the project's maintainability and compatibility and to take advantage of the latest features of its dependencies.

Highlights

  • Version Upgrade: The project version has been updated to 0.7.0, reflecting significant changes and improvements.
  • Python Version Requirement: The minimum required Python version has been raised from 3.10 to 3.11.
  • pdex Library Update & Refactoring: The pdex dependency has been updated to 0.2.0, necessitating extensive refactoring across the codebase to adapt to its new API, including changes to parameter names and function calls for differential expression analysis.
  • New CLAUDE.md File: A new CLAUDE.md file has been added to provide specific guidance for Claude Code when interacting with this repository.
  • Type Hinting and Compatibility Improvements: Enhanced type hinting with explicit cast calls for pandas Series and DataFrames, and introduced a utility to convert float16 AnnData matrices to float32 to ensure compatibility with pdex's Numba backend.
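
The float16-to-float32 conversion mentioned in the last highlight can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `cast_float16_to_float32` and the `SimpleNamespace` stand-in for an AnnData object are assumptions, and only the dense case is shown.

```python
from types import SimpleNamespace

import numpy as np


def cast_float16_to_float32(adata) -> bool:
    """Upcast adata.X to float32 in place if it is float16.

    Hypothetical helper for illustration; returns True when a cast happened.
    """
    x = adata.X
    if x is None or x.dtype != np.float16:
        return False  # other dtypes are left untouched
    adata.X = x.astype(np.float32)
    return True


# Stand-in for an AnnData object: only the .X attribute is used here.
half = SimpleNamespace(X=np.array([[0.5, 1.0], [2.0, 4.0]], dtype=np.float16))
single = SimpleNamespace(X=np.ones((2, 2), dtype=np.float32))

changed_half = cast_float16_to_float32(half)      # float16 input is upcast
changed_single = cast_float16_to_float32(single)  # float32 input is a no-op
```

Checking the dtype first keeps the helper cheap for inputs that are already float32 or float64, since no copy is made in that case.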


Changelog
  • CLAUDE.md
    • Added a new file providing guidance for Claude Code.
  • pyproject.toml
    • Updated project version to "0.7.0".
    • Updated minimum Python requirement to >=3.11,<3.13.
    • Updated pdex dependency to >=0.2.0.
    • Added ty>=0.0.19 to development dependencies.
    • Removed [tool.pyright] configuration.
  • ruff.toml
    • Removed the ruff configuration file.
  • src/cell_eval/_baseline.py
    • Imported cast and pandas.
    • Removed batch_size and de_method parameters from build_base_mean_adata.
    • Updated adata.var casting to pd.DataFrame.
    • Updated write_h5ad call with type: ignore for compatibility.
    • Refactored _build_pdex_kwargs parameters from groupby_key, num_workers, metric, batch_size to groupby, threads.
    • Replaced parallel_differential_expression with pdex and added mode="ref".
    • Added cast for pd.Series when accessing unique perturbation values.
  • src/cell_eval/_cli/_prep.py
    • Imported cast.
    • Added cast for pd.Series when checking unique perturbation values.
    • Added cast for pd.Series when assigning adata.obs values to new_obs.
    • Updated write_h5ad call with type: ignore for compatibility.
  • src/cell_eval/_cli/_run.py
    • Removed --batch-size and --de-method arguments from CLI parsing.
    • Removed de_method and batch_size parameters from MetricsEvaluator initialization.
  • src/cell_eval/_evaluator.py
    • Imported pdex instead of parallel_differential_expression.
    • Imported _cast_float16_to_float32 from utils.
    • Removed de_method and batch_size parameters from MetricsEvaluator class and __init__ method.
    • Applied _cast_float16_to_float32 to real and predicted AnnData objects.
    • Refactored _build_pdex_kwargs parameters from groupby_key, num_workers, metric, batch_size to groupby, threads.
    • Removed as_polars = True from _build_pdex_kwargs as pdex now defaults to Polars DataFrames.
    • Replaced parallel_differential_expression with pdex and added mode="ref".
  • src/cell_eval/_types/_anndata.py
    • Imported cast and pandas.
    • Added cast for pd.Series when accessing perturbation columns in AnnData objects.
  • src/cell_eval/metrics/_anndata.py
    • Imported cast.
    • Removed type: ignore from pearsonr call.
    • Removed type: ignore from feats.dtype and feats.astype calls.
    • Added cast for pd.Series when accessing category keys in AnnData obs.
    • Removed type: ignore from np.unique and centroids dtype.
    • Removed type: ignore from feats[mask].mean(axis=0).
    • Added cast for pd.DataFrame when setting ad_real_cent.obs and ad_pred_cent.obs.
  • src/cell_eval/metrics/base.py
    • Updated return type of MetricResult.to_dict to include None for perturbation.
    • Removed type: ignore from self.perturbation in to_dict.
  • src/cell_eval/utils.py
    • Imported cast, pandas, and scipy.sparse.
    • Added type: ignore to adata.X.toarray() call.
    • Added type: ignore to np.max and np.min calls.
    • Added cast for pd.Series when splitting AnnData on celltype.
    • Added _cast_float16_to_float32 function to convert float16 matrices to float32 for pdex compatibility.
  • tests/test_eval.py
    • Imported cast and pandas.
    • Added cast for pd.DataFrame when dropping columns from adata.obs.
    • Added type: ignore to de_method="unknown" in test_unknown_alternative_de_metric.
    • Updated pdex_kwargs in tests to reflect pdex API changes (geometric_mean instead of exp_post_agg, threads instead of num_workers).
    • Removed test_eval_alt_metric function.
  • tutorials/vcc/vcc.ipynb
    • Updated execution_count to null for code cells.
    • Imported pandas.
    • Changed adata.obs["target_gene"].unique() to pd.Series(adata.obs["target_gene"]).unique() for robustness.
    • Changed np.all(adata.var_names.values == ntc_adata.var_names.values) to np.array_equal(adata.var_names, ntc_adata.var_names) for gene name comparison.
    • Added type: ignore to adata.write_h5ad call for compatibility.
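
Many changelog entries above add `cast` calls around pandas objects. As a rough sketch of that pattern (the DataFrame below is made-up illustration data, not taken from the repository), `typing.cast` only tells the type checker what type to assume; it does nothing at runtime:

```python
from typing import cast

import pandas as pd

# Illustrative stand-in for adata.obs; the column name follows the notebook.
obs = pd.DataFrame({"target_gene": ["A", "B", "A", "non-targeting"]})

raw = obs["target_gene"]
# cast() returns its argument unchanged; it only narrows the static type
# so that a type checker accepts Series methods such as .unique().
column = cast(pd.Series, raw)

unique_targets = list(column.unique())
```

Because `cast` performs no validation, it is only safe where the value is already known to be a `pd.Series`.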
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/CI.yml
Activity
  • The project's version was incremented to 0.7.0.
  • The minimum Python version requirement was updated to 3.11.
  • The pdex dependency was upgraded to 0.2.0, leading to significant refactoring of pdex-related calls across _baseline.py, _cli/_run.py, and _evaluator.py to align with its new API.
  • A new CLAUDE.md file was added to provide AI code assistant guidance.
  • The ruff.toml configuration file was removed.
  • Type hinting was improved across several files (_baseline.py, _cli/_prep.py, _evaluator.py, _types/_anndata.py, metrics/_anndata.py, metrics/base.py, utils.py, tests/test_eval.py) by adding cast imports and explicit casting for pandas objects.
  • A new utility function _cast_float16_to_float32 was introduced in src/cell_eval/utils.py and integrated into _evaluator.py to handle float16 data type compatibility issues with pdex.
  • Command-line arguments --batch-size and --de-method were removed from src/cell_eval/_cli/_run.py due to pdex API changes.
  • Test cases in tests/test_eval.py were updated to reflect the new pdex API and a test for an alternative DE metric was removed.
  • The tutorials/vcc/vcc.ipynb notebook was updated for compatibility with the new changes, including pandas Series casting and write_h5ad type ignoring.
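
For callers who still hold keyword arguments in the old style, the renames listed in the changelog could be applied with a small translation step. This is a hypothetical sketch: `migrate_pdex_kwargs` is not part of cell-eval or pdex, the mapping covers only the names this summary mentions, and the example values are invented.

```python
# Renames and removals as described in the summary above; anything else is
# passed through unchanged. This table is an assumption, not pdex's API.
RENAMED = {"groupby_key": "groupby", "num_workers": "threads"}
DROPPED = {"metric", "batch_size", "as_polars"}


def migrate_pdex_kwargs(legacy: dict) -> dict:
    """Translate old-style keyword arguments to the new names (hypothetical)."""
    migrated = {}
    for key, value in legacy.items():
        if key in DROPPED:
            continue  # these parameters no longer exist per the changelog
        migrated[RENAMED.get(key, key)] = value
    return migrated


old_kwargs = {
    "groupby_key": "target_gene",
    "num_workers": 4,
    "metric": "wilcoxon",   # illustrative value
    "batch_size": 100,      # illustrative value
}
new_kwargs = migrate_pdex_kwargs(old_kwargs)
```

The sketch deliberately leaves out the exp_post_agg to geometric_mean change noted for the tests, since the summary does not say whether the two options are semantically equivalent.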

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request brings cell-eval to version 0.7.0, introducing compatibility with pdex>=0.2.0 and updating the required Python version to >=3.11. The changes primarily involve adapting to the new pdex API, which affects function calls and parameters for differential expression analysis. A notable improvement is the new utility to cast float16 data to float32, ensuring compatibility with pdex's Numba backend. The PR also includes extensive typing fixes and adds a CLAUDE.md file for AI-assisted development. The changes are well-integrated. I have one suggestion to simplify a newly added utility function.

Comment on lines +128 to +131
        if sp.issparse(x):
            adata.X = x.astype(np.float32)
        else:
            adata.X = x.astype(np.float32)

medium

This if/else block contains redundant code. The astype(np.float32) method works for both sparse (scipy.sparse) and dense (numpy) arrays, so you can simplify this by removing the conditional check.

        adata.X = x.astype(np.float32)
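
A quick way to check the claim behind this suggestion is to call `astype` on one dense and one sparse input and compare the results. This is a standalone sketch with made-up values; the sparse input starts as float64 because SciPy's handling of float16 sparse matrices may differ between versions.

```python
import numpy as np
import scipy.sparse as sp

values = [[0.0, 1.5], [2.25, 0.0]]

dense = np.array(values, dtype=np.float16)
sparse = sp.csr_matrix(np.array(values, dtype=np.float64))

# The same method call works on both container types.
dense32 = dense.astype(np.float32)
sparse32 = sparse.astype(np.float32)

still_sparse = sp.issparse(sparse32)  # astype does not densify the matrix
same_values = np.array_equal(sparse32.toarray(), dense32)
```

Since both branches of the original conditional make the identical call, collapsing them changes no behavior.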
