Skip to content

Conversation

@olavloite
Copy link
Collaborator

Support partitioned queries using only SQL statements.

Support partitioned queries using only SQL statements.
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @olavloite, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the database connector by integrating support for partitioned queries directly into the SQL statement parsing and execution flow. It introduces a new RUN PARTITIONED QUERY syntax and a suite of connection properties, allowing users to configure and execute partitioned queries with greater ease and flexibility, moving beyond reliance solely on programmatic ExecOptions. This change streamlines the use of advanced query features and improves the overall developer experience.

Highlights

  • SQL Statement Support for Partitioned Queries: Introduced the ability to execute partitioned queries directly through a new RUN PARTITIONED QUERY SQL statement, providing a more SQL-centric interface for this functionality.
  • New Connection Properties: Added several new connection properties: auto_partition_mode to automatically enable partitioned queries, data_boost_enabled for partitioned queries, max_partitions to set partition hints, and max_partitioned_parallelism to control worker count.
  • Flexible Partitioned Query Configuration: Enhanced the AutoPartitionQuery mechanism to allow configuration not only via ExecOptions but also through the newly introduced connection properties, offering greater flexibility.
  • Parser and Execution Logic Updates: Modified the SQL parser to recognize the RUN PARTITIONED QUERY statement and updated the connection and transaction execution logic to correctly handle these new statements and apply the relevant connection properties.
  • Expanded Test Coverage: Added new test cases for parsing the RUN PARTITIONED QUERY statement and extended existing AutoPartitionQuery tests to cover scenarios where partitioned query settings are applied via connection properties.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for partitioned queries through SQL statements, including a new RUN PARTITIONED QUERY statement and several connection properties (auto_partition_mode, data_boost_enabled, etc.). The changes look good overall, but I've found several issues, including some critical bugs in the tests that would cause compilation failures or incorrect test behavior. I've also identified a functional issue where RUN PARTITIONED QUERY is not self-contained and opportunities for refactoring and improving robustness. Please see my detailed comments.

"max_partitions",
"The max partitions hint value to use for partitioned queries. "+
"Set to 0 if you do not want to specify a hint.",
0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The default value 0 is of type int, but the converter connectionstate.ConvertInt64 returns an int64. This type mismatch will cause a compilation error. The default value should be explicitly cast to int64.

Suggested change
0,
int64(0),

{
input: "run partitioned query select * from my_table",
want: ParsedRunPartitionedQueryStatement{
statement: " select * from my_table",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The field statement is not a member of ParsedRunPartitionedQueryStatement. The correct field name is Statement (with a capital 'S'). This will cause a compilation error. This error is repeated for all test cases in this test function.

Suggested change
statement: " select * from my_table",
Statement: " select * from my_table",

tests := make([]autoPartitionTest, 0)
for _, useExecOption := range []bool{true, false} {
for _, withTx := range []bool{false} {
for maxResultsPerPartition := range []int{0, 1, 5, 50, 200} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This loop is iterating over the indices of the slice, not its values. maxResultsPerPartition will be 0, 1, 2, 3, 4 instead of 0, 1, 5, 50, 200. This is a common Go pitfall. To iterate over the values, you should use for _, maxResultsPerPartition := range ....

Suggested change
for maxResultsPerPartition := range []int{0, 1, 5, 50, 200} {
for _, maxResultsPerPartition := range []int{0, 1, 5, 50, 200} {

Comment on lines +455 to +461
rows, err := db.QueryContext(ctx, "run partitioned query "+testutil.SelectFooFromBar)
if err != nil {
t.Fatal(err)
}
for rows.Next() {

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This test is incomplete and leaks resources.

  1. It does not verify any results. It should configure the mock server with partitions and results, and then assert that the query returns the expected data.
  2. The rows object is not closed, which will lead to a resource leak. A defer rows.Close() should be added after the query execution.
    _, allResults, err := setupRandomPartitionResults(server, testutil.SelectFooFromBar, 10)
    if err != nil {
        t.Fatalf("failed to set up partition results: %v", err)
    }

    rows, err := db.QueryContext(ctx, "run partitioned query "+testutil.SelectFooFromBar)
    if err != nil {
        t.Fatal(err)
    }
    defer rows.Close()

    var results []int64
    for rows.Next() {
        var val int64
        if err := rows.Scan(&val); err != nil {
            t.Fatal(err)
        }
        results = append(results, val)
    }
    if err := rows.Err(); err != nil {
        t.Fatal(err)
    }
    if len(results) != len(allResults) {
        t.Fatalf("row count mismatch: got %d, want %d", len(results), len(allResults))
    }

Comment on lines +287 to +290
func (s *executableRunPartitionedQueryStatement) queryContext(ctx context.Context, c *conn, opts *ExecOptions) (driver.Rows, error) {
args := []driver.NamedValue{{Value: opts}}
return c.QueryContext(ctx, s.stmt.Statement, args)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The RUN PARTITIONED QUERY statement should implicitly trigger a partitioned query. Currently, it relies on the ExecOptions passed to it. To make it self-contained, you should ensure that AutoPartitionQuery is set to true on the options before passing them to c.QueryContext.

It's also good practice to work with a copy of the options to avoid side effects.

func (s *executableRunPartitionedQueryStatement) queryContext(ctx context.Context, c *conn, opts *ExecOptions) (driver.Rows, error) {
    newOpts := *opts
    newOpts.PartitionedQueryOptions.AutoPartitionQuery = true
    args := []driver.NamedValue{{Value: &newOpts}}
    return c.QueryContext(ctx, s.stmt.Statement, args)
}

"data_boost_enabled",
"Enable data boost for all partitioned queries that are executed by this connection. "+
"This setting is only used for partitioned queries and is ignored by all other statements. "+
"Either set `auto_partition_query=true` or execute a query with `RUN PARTITIONED QUERY SELECT ... FROM ...` "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a typo in the description. The connection property is auto_partition_mode, but the description refers to auto_partition_query.

Suggested change
"Either set `auto_partition_query=true` or execute a query with `RUN PARTITIONED QUERY SELECT ... FROM ...` "+
"Either set `auto_partition_mode=true` or execute a query with `RUN PARTITIONED QUERY SELECT ... FROM ...` "+

Comment on lines +108 to +136
func isRunBatch(parser *StatementParser, query string) bool {
sp := &simpleParser{sql: []byte(query), statementParser: parser}
if !sp.eatKeyword("run") {
return false
}
if !sp.hasMoreTokens() {
// START is a synonym for START TRANSACTION
return false
}
if sp.eatKeyword("batch") {
return true
}
return false
}

func isRunPartitionedQuery(parser *StatementParser, query string) bool {
sp := &simpleParser{sql: []byte(query), statementParser: parser}
if !sp.eatKeyword("run") {
return false
}
if !sp.hasMoreTokens() {
// START is a synonym for START TRANSACTION
return false
}
if sp.eatKeyword("partitioned") {
return true
}
return false
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The functions isRunBatch and isRunPartitionedQuery are very similar and contain duplicated code. They can be refactored into a single helper function to improve maintainability.

Additionally, the comment // START is a synonym for START TRANSACTION is misleading in this context as it was likely copied from isStartTransaction. It should be removed.

func isRunCommand(parser *StatementParser, query string, command string) bool {
	sp := &simpleParser{sql: []byte(query), statementParser: parser}
	if !sp.eatKeyword("run") {
		return false
	}
	if !sp.hasMoreTokens() {
		return false
	}
	return sp.eatKeyword(command)
}

func isRunBatch(parser *StatementParser, query string) bool {
	return isRunCommand(parser, query, "batch")
}

func isRunPartitionedQuery(parser *StatementParser, query string) bool {
	return isRunCommand(parser, query, "partitioned")
}

Comment on lines +324 to +325
execOptions.QueryOptions.DataBoostEnabled = execOptions.QueryOptions.DataBoostEnabled || propertyDataBoostEnabled.GetValueOrDefault(tx.state)
partitions, err := tx.boTx.PartitionQueryWithOptions(ctx, stmt, partitionOptions, execOptions.QueryOptions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Modifying execOptions.QueryOptions.DataBoostEnabled directly can lead to unexpected side effects if the execOptions are reused elsewhere. It's safer to work with a copy of QueryOptions.

    queryOptions := execOptions.QueryOptions
    queryOptions.DataBoostEnabled = queryOptions.DataBoostEnabled || propertyDataBoostEnabled.GetValueOrDefault(tx.state)
    partitions, err := tx.boTx.PartitionQueryWithOptions(ctx, stmt, partitionOptions, queryOptions)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant