feat: Implement Statvar Calculation aggregation by SandeepTuniki · Pull Request #589 · datacommonsorg/import

SandeepTuniki · 2026-06-25T13:03:12Z

No description provided.


codacy-production · 2026-06-25T13:04:18Z

Not up to standards ⛔

🔴 Issues 9 medium · 17 minor

Alerts:
⚠ 26 issues (≤ 0 issues of at least minor severity)

Results:
26 new issues

| Category | Results |
|---|---|
| Documentation | 11 minor |
| Security | 4 medium |
| CodeStyle | 6 minor |
| Complexity | 5 medium |

🟢 Metrics 39 complexity

| Metric | Results |
|---|---|
| Complexity | 39 |


gemini-code-assist

Code Review

This pull request introduces the StatVarCalculationGenerator class along with its corresponding end-to-end integration tests. This generator builds and executes multi-statement SQL scripts using BigQuery Federation to perform mathematical operations (such as DIVIDE, MULTIPLY, ADD, and SUBTRACT) on statistical variables, constructing output TimeSeries and Observation rows to write back to Spanner. The review feedback highlights two critical SQL injection vulnerabilities: the multiplier value is embedded into the SQL query without validation or casting, and the regular expressions (sv_regex and mm_regex) are not escaped, which could allow malicious inputs to alter the query structure. Both issues should be addressed by casting the multiplier to a float and escaping single quotes in the regex filters.
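The two fixes named in this summary (casting the multiplier to a float and escaping single quotes) can be sketched as below. This is a minimal illustration, not the PR's code: `sanitize_multiplier` and `escape_string_literal` are hypothetical names, and the escaping assumes the value ends up inside a regular (non-raw) single-quoted SQL string literal where backslash is the escape character.

```python
import math


def sanitize_multiplier(value) -> float:
    """Cast a config-supplied multiplier to float (hypothetical helper).

    float() raises ValueError for anything that is not a plain number, so
    text such as "1) OR (1=1" never reaches the generated SQL.
    """
    number = float(value)
    if not math.isfinite(number):
        # Reject nan/inf, which are valid floats but not usable multipliers.
        raise ValueError(f"multiplier must be finite, got {value!r}")
    return number


def escape_string_literal(text: str) -> str:
    """Escape text for a regular single-quoted literal (hypothetical helper).

    Backslashes are doubled first so that a regex such as \\d survives the
    literal's own unescaping, then single quotes are escaped.
    """
    return text.replace("\\", "\\\\").replace("'", "\\'")


if __name__ == "__main__":
    print(sanitize_multiplier("100"))
    print(escape_string_literal("^Count_Person_.*'$"))
```

Where the client library supports bound query parameters, passing values that way is generally preferable to string escaping, since the values never become part of the SQL text.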

SandeepTuniki · 2026-06-25T13:25:47Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the StatVarCalculationGenerator class to perform statistical variable calculations (such as DIVIDE, MULTIPLY, ADD, and SUBTRACT) using BigQuery Federation and write the results back to Spanner, along with comprehensive integration tests. The review feedback highlights several critical issues: a major performance bottleneck caused by multiple full table scans on the Spanner Observation table, a correctness bug where the provenance column is missing from the TimeSeries export, an incorrect escaping helper used for BigQuery-only literals, and a regex escaping bug inside BigQuery raw string literals.
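The full-table-scan concern above is commonly addressed by reading the federated source once into a temporary table and pointing every later statement at that table. The sketch below builds such a multi-statement script as a string. It is an assumption-laden illustration, not the PR's implementation: `build_cached_script`, the temp table name `cached_inputs`, and the column names `variable_measured`, `observation_about`, `observation_date`, and `observation_value` are all hypothetical. Only `EXTERNAL_QUERY` and `CREATE TEMP TABLE ... AS` are existing BigQuery constructs.

```python
import re

# Conservative pattern for a connection ID; anything with quotes or spaces
# is rejected rather than escaped.
_CONNECTION_ID = re.compile(r"[A-Za-z0-9_.\-]+")


def build_cached_script(connection_id: str, downstream: list[str]) -> str:
    """Return a script that queries Spanner once and reuses the result."""
    if not _CONNECTION_ID.fullmatch(connection_id):
        raise ValueError(f"unexpected connection id: {connection_id!r}")

    # Single federated read; the inner query contains no string literals,
    # so no nested quoting is needed. Column names are hypothetical.
    cache_statement = (
        "CREATE TEMP TABLE cached_inputs AS\n"
        "SELECT * FROM EXTERNAL_QUERY(\n"
        f"  '{connection_id}',\n"
        "  'SELECT variable_measured, observation_about, "
        "observation_date, observation_value FROM Observation'\n"
        ");"
    )

    # Later statements read cached_inputs instead of calling
    # EXTERNAL_QUERY again.
    statements = [cache_statement]
    statements += [s.strip().rstrip(";") + ";" for s in downstream]
    return "\n\n".join(statements)


if __name__ == "__main__":
    print(build_cached_script(
        "my-project.us.my-connection",
        ["SELECT COUNT(*) FROM cached_inputs"],
    ))
```

The design choice is that the Spanner read happens in exactly one statement, so adding more calculations grows the number of BigQuery-side reads of the temp table but not the number of federated scans.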


n-h-diaz

Thanks! very cool sql :)

just want to double check: I see import_name_regex as an option to filter facets in the config - is anything needed to handle this here? or will the other facet matching take care of it? (and for my own understanding, does the calculation do a cross product of all input imports that match the facets or only within an import?)

also curious if you've run this on the existing configs and what the performance is like? (the climate calculation is huge(!), so want to make sure this is feasible. for example, if it gets very large, maybe we can do more aggressive filtering early on vs fetching the entire input import from spanner)

also since we're doing this for a subset of imports, we should ensure that all required input imports are included within the same aggregation run (not for this pr, but something to keep in mind)
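To make the cross-product question concrete, here is a small sketch of the two possible pairing semantics for a DIVIDE calculation. It does not say which one the PR implements. The data, the import names, and the function `divide_pairs` are made up for illustration.

```python
from itertools import product

# Made-up observations: (import_name, variable, place, date, value).
OBSERVATIONS = [
    ("ImportA", "Numerator", "geoId/06", "2020", 10.0),
    ("ImportB", "Numerator", "geoId/06", "2020", 12.0),
    ("ImportA", "Denominator", "geoId/06", "2020", 5.0),
    ("ImportB", "Denominator", "geoId/06", "2020", 4.0),
]


def divide_pairs(rows, within_import_only: bool):
    """Pair numerators with denominators on (place, date) and divide.

    With within_import_only=False every matching numerator/denominator
    pair is used (a cross product across imports); with True, only pairs
    that come from the same import are kept.
    """
    numerators = [r for r in rows if r[1] == "Numerator"]
    denominators = [r for r in rows if r[1] == "Denominator"]
    results = []
    for num, den in product(numerators, denominators):
        if (num[2], num[3]) != (den[2], den[3]):
            continue  # different place or date: never paired
        if within_import_only and num[0] != den[0]:
            continue  # skip pairs that mix imports
        results.append((num[0], den[0], num[2], num[3], num[4] / den[4]))
    return results


if __name__ == "__main__":
    print(len(divide_pairs(OBSERVATIONS, within_import_only=False)))
    print(len(divide_pairs(OBSERVATIONS, within_import_only=True)))
```

With two imports each supplying both inputs, the cross-product variant yields four output rows per place and date while the within-import variant yields two, which is why the choice matters for output size on large calculations.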

Implement StatVar calculation aggregations using BigQuery Federation and Spanner

8383b01

gemini-code-assist Bot reviewed Jun 25, 2026


Comment thread pipeline/workflow/ingestion-helper/aggregation/stat_var_calculation_generator.py Outdated

Comment thread pipeline/workflow/ingestion-helper/aggregation/stat_var_calculation_generator.py Outdated

Security: Validate multiplier and escape single quotes in regex filters

b7c1803

gemini-code-assist Bot reviewed Jun 25, 2026


SandeepTuniki added 2 commits June 25, 2026 19:09

Optimize: Cache Spanner inputs in BigQuery temp tables and fix regex raw string escaping

e2aaeac

Fix: Resolve Codacy warnings and expand metadata assertions in E2E tests

30c1ff3

SandeepTuniki marked this pull request as ready for review June 26, 2026 03:47

SandeepTuniki requested review from n-h-diaz and vish-cs June 26, 2026 03:47

n-h-diaz approved these changes Jun 26, 2026


SandeepTuniki wants to merge 4 commits into master from statvar-calculation-aggregation


2 participants
