feat: Implement Statvar Calculation aggregation#589
Conversation
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| Documentation | 11 minor |
| Security | 4 medium |
| CodeStyle | 6 minor |
| Complexity | 5 medium |
🟢 Metrics 39 complexity
Metric Results Complexity 39
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Code Review
This pull request introduces the StatVarCalculationGenerator class along with its corresponding end-to-end integration tests. This generator builds and executes multi-statement SQL scripts using BigQuery Federation to perform mathematical operations (such as DIVIDE, MULTIPLY, ADD, and SUBTRACT) on statistical variables, constructing output TimeSeries and Observation rows to write back to Spanner. The review feedback highlights two critical SQL injection vulnerabilities: the multiplier value is embedded into the SQL query without validation or casting, and the regular expressions (sv_regex and mm_regex) are not escaped, which could allow malicious inputs to alter the query structure. Both issues should be addressed by casting the multiplier to a float and escaping single quotes in the regex filters.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces the StatVarCalculationGenerator class to perform statistical variable calculations (such as DIVIDE, MULTIPLY, ADD, and SUBTRACT) using BigQuery Federation and write the results back to Spanner, along with comprehensive integration tests. The review feedback highlights several critical issues: a major performance bottleneck caused by multiple full table scans on the Spanner Observation table, a correctness bug where the provenance column is missing from the TimeSeries export, an incorrect escaping helper used for BigQuery-only literals, and a regex escaping bug inside BigQuery raw string literals.
n-h-diaz
left a comment
There was a problem hiding this comment.
Thanks! very cool sql :)
just want to double check: I see import_name_regex as an option to filter facets in the config - is anything needed to handle this here? or will the other facet matching take care of it? (and for my own understanding, does the calculation do a cross product of all input imports that match the facets or only within an import?)
also curious if you've run this on the existing configs and what the performance is like? (the climate calculation is huge(!), so want to make sure this is feasible. for example, if it gets very large, maybe we can do more aggressive filtering early on vs fetching the entire input import from spanner)
also since we're doing this for a subset of imports, we should ensure that all required input imports are included within the same aggregation run (not for this pr, but something to keep in mind)
No description provided.