Skip to content

SPOC-484: TRANSDISCARD/SUB_DISABLE: switch from subtransaction replay to read-only dry-run logging#400

Open
danolivo wants to merge 3 commits intomainfrom
spoc-484
Open

SPOC-484: TRANSDISCARD/SUB_DISABLE: switch from subtransaction replay to read-only dry-run logging#400
danolivo wants to merge 3 commits intomainfrom
spoc-484

Conversation

@danolivo
Copy link
Contributor

  • In TRANSDISCARD and SUB_DISABLE modes, replace per-DML subtransaction replay with a single read-only transaction. Each DML handler logs a row to spock.exception_log instead of attempting the operation. Only the record that originally caused the error carries the real error message; other records get an empty string.
  • Identify the failed record by xact_action_counter (saved as failed_action in SpockExceptionLog), which is unique per DML within a transaction.
  • In DISCARD mode, each DML still runs inside a subtransaction; only the failed ones are logged.
  • Add dry_run_logging regression test covering all three exception modes with different breakage methods (absent table, truncated table, deleted row) and an INSERT_EXISTS conflict to verify that dry-run modes leave spock.resolutions empty while DISCARD populates it.

…-run replay

In TRANSDISCARD and SUB_DISABLE modes, the second (retry) pass replays
the transaction as read-only.  Each DML handler logs a row to
spock.exception_log instead of attempting the operation.  The record
that originally caused the error carries the real error message; other
records get an empty string.

The failed record is identified by xact_action_counter (saved as
failed_action in SpockExceptionLog), which is unique per DML within a
transaction.

In DISCARD mode, each DML still runs inside a subtransaction; only the
failed ones are logged.

Add dry_run_logging regression test covering TRANSDISCARD (absent
table), DISCARD (truncated table), and SUB_DISABLE (deleted row), with
INSERT_EXISTS conflict on drl_t1 to verify that dry-run modes leave
spock.resolutions empty while DISCARD populates it.
@danolivo danolivo self-assigned this Mar 24, 2026
@danolivo danolivo added the enhancement New feature or request label Mar 24, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Adds a new failed_action field to exception logs, changes exception-log message handling, enforces read-only behavior for retry replays in TRANSDISCARD/SUB_DISABLE modes while logging skipped operations, simplifies commit/disable control flow, and adds/updates regression tests plus dry_run_logging in the REGRESS list.

Changes

Cohort / File(s) Summary
Exception handler declarations
include/spock_exception_handler.h
Added uint32 failed_action to SpockExceptionLog to record the xact action counter at the time of an error.
Exception logging behavior
src/spock_exception_handler.c
When error_message is NULL, store an empty string ("") in spock.exception_log.error_message instead of "unknown"; removed a related comment.
Apply worker / replay logic
src/spock_apply.c
Refactor: set replay transactions read-only for TRANSDISCARD/SUB_DISABLE retries (allowing catalog writes for logging), prevent LSN advancement for SUB_DISABLE retries, remove prior transaction-restart flow in favor of normal commit with log, treat retry-with-skipped-DML as disable-trigger for SUB_DISABLE, skip executing DML/SQL in dry-run modes and instead log entries (use failed_action to attribute real error messages to the originating action).
New and updated regression tests
tests/regress/sql/dry_run_logging.sql, tests/regress/sql/replication_set.sql
Added dry_run_logging.sql covering transdiscard/discard/sub_disable scenarios with multi-table transactions, exception_log/resolutions/assertions, and subscription-disable checks. replication_set.sql now truncates spock.exception_log before runs and adds extra exception-log validation selects.
Build/test config
Makefile
Added dry_run_logging to the REGRESS test list.

Poem

🐇 I counted each hop where an action failed,
Logged the step where replay turned pale.
In read-only silence I skipped the fall,
Wrote the tale, then let the transaction call.
Thump — the logs remember, one and all.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: switching TRANSDISCARD/SUB_DISABLE modes from per-DML subtransaction replay to read-only dry-run logging with exception tracking.
Description check ✅ Passed The description comprehensively explains the implementation details, including the read-only transaction approach, failed_action tracking, differentiation between exception modes, and test coverage additions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spoc-484

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]

This comment was marked as outdated.

…producer

Replace wait_slot_confirm_lsn with sync_event/wait_for_sync_event to
eliminate replication race conditions when querying exception_log on the
subscriber.

Add a TRANSDISCARD sub-test that mixes DML with a replicate_ddl call
inside a single failing transaction.  This reproduces a duplicate DDL
logging bug in handle_queued_ddl detected by CodeRabbit during review:
the TRANSDISCARD/SUB_DISABLE branch logs via add_entry_to_exception_log,
then falls through to the generic logging block which logs again.

Remove test section numbering per project conventions.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/regress/sql/dry_run_logging.sql`:
- Around line 107-110: Update the test comment that currently reads "Expect
entries with duplicated DDL record" to reflect the intended behavior of no
duplicate DDL logging; modify the comment above the SELECT querying
spock.exception_log (the query using table_name, operation, (error_message <>
'') AS has_error, ddl_statement ordered by command_counter) to state that the
test expects NO duplicate DDL entries (i.e., a single DDL record per command).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fc49989d-5676-4a35-890d-8275efe21d33

📥 Commits

Reviewing files that changed from the base of the PR and between f359f4d and 936e414.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/dry_run_logging.out is excluded by !**/*.out
📒 Files selected for processing (2)
  • src/spock_apply.c
  • tests/regress/sql/dry_run_logging.sql

…ISABLE

The TRANSDISCARD/SUB_DISABLE branch logged via add_entry_to_exception_log
then fell through to the generic logging block which called
should_log_exception and logged again, producing duplicate DDL entries.

Merge both call sites into a single add_entry_to_exception_log after the
if/else branches.  The TRANSDISCARD/SUB_DISABLE branch now only extracts
the SQL text; actual logging is done in the shared block.

Add Assert(my_exception_log_index >= 0) before accessing exception_log_ptr
since use_try_block guarantees the index was set in handle_begin.

Bug reported by CodeRabbit during review.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/spock_apply.c (3)

1287-1308: Consider adding defensive assertion before accessing exception_log_ptr.

The code at line 2302 includes Assert(my_exception_log_index >= 0) before accessing exception_log_ptr, but the DML handlers (INSERT, UPDATE, DELETE) don't have this check. While my_exception_log_index should be valid when use_try_block is true (set during handle_begin), adding the assertion would provide consistent defensive validation.

🛡️ Suggested defensive assertion
 		if (exception_behaviour == TRANSDISCARD ||
 			exception_behaviour == SUB_DISABLE)
 		{
+			Assert(my_exception_log_index >= 0);
+
 			/*
 			 * TRANSDISCARD and SUB_DISABLE: skip the DML and log the
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1287 - 1308, Add a defensive Assert that
my_exception_log_index is valid before dereferencing exception_log_ptr in the
DML handlers (the INSERT/UPDATE/DELETE branches where exception_log_ptr[...] is
accessed and log_insert_exception(...) is called). Specifically, in the block
using exception_log_ptr[my_exception_log_index] (and before setting
exception_log_ptr[my_exception_log_index].local_tuple or reading
.initial_error_message), insert Assert(my_exception_log_index >= 0) to mirror
the check used in handle_begin/other code paths so the handlers (where
use_try_block is expected) validate the index before use.

1299-1304: Optional: Extract repeated error_msg pattern to helper function.

The same logic for computing error_msg is repeated in four handlers (INSERT, UPDATE, DELETE, SQL):

char *error_msg =
    (xact_action_counter ==
     exception_log_ptr[my_exception_log_index].failed_action &&
     exception_log_ptr[my_exception_log_index].initial_error_message[0]) ?
    exception_log_ptr[my_exception_log_index].initial_error_message :
    NULL;

This could be extracted to a helper function to improve maintainability and ensure consistent behavior if the logic ever needs to change.

♻️ Suggested helper function
/*
 * Get the error message to log for the current action during dry-run replay.
 * Returns the initial error message only for the action that caused the
 * original failure; returns NULL for all other actions.
 */
static inline char *
get_dry_run_error_msg(void)
{
    Assert(my_exception_log_index >= 0);
    
    if (xact_action_counter == exception_log_ptr[my_exception_log_index].failed_action &&
        exception_log_ptr[my_exception_log_index].initial_error_message[0] != '\0')
        return exception_log_ptr[my_exception_log_index].initial_error_message;
    
    return NULL;
}

Also applies to: 1477-1482, 1594-1599, 2307-2312

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 1299 - 1304, Extract the repeated error_msg
computation into a small static helper (e.g. static inline char
*get_dry_run_error_msg(void)) that reads my_exception_log_index,
xact_action_counter and exception_log_ptr and returns
exception_log_ptr[my_exception_log_index].initial_error_message when
xact_action_counter == exception_log_ptr[my_exception_log_index].failed_action
and the initial_error_message is non-empty, otherwise returns NULL; replace the
duplicated ternary in the four handlers (the blocks around the places that set
error_msg in INSERT/UPDATE/DELETE/SQL handlers) with a call to
get_dry_run_error_msg() and keep the Assert(my_exception_log_index >= 0) in the
helper.

871-878: Consider clearing initial_error_message for consistency.

Lines 840-843 (TRANSDISCARD path) clear initial_error_message[0] = '\0', but lines 871-876 (SUB_DISABLE path) do not. While this is harmless since the worker exits after spock_disable_subscription(), adding the clear would maintain consistency between the two paths.

♻️ Suggested change for consistency
 			exception_log = &exception_log_ptr[my_exception_log_index];
 			exception_log->commit_lsn = InvalidXLogRecPtr;
+			exception_log->initial_error_message[0] = '\0';
 			MySpockWorker->restart_delay = 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 871 - 878, In the SUB_DISABLE path add the
same clearing of the saved initial error string as done in the TRANSDISCARD
path: after assigning exception_log = &exception_log_ptr[my_exception_log_index]
and before calling spock_disable_subscription()/elog(ERROR...), set
exception_log->initial_error_message[0] = '\0' so the SpockExceptionLog
(exception_log) is consistent with the TRANSDISCARD handling; keep the rest
(commit_lsn, MySpockWorker->restart_delay, spock_disable_subscription(), elog)
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/spock_apply.c`:
- Around line 1287-1308: Add a defensive Assert that my_exception_log_index is
valid before dereferencing exception_log_ptr in the DML handlers (the
INSERT/UPDATE/DELETE branches where exception_log_ptr[...] is accessed and
log_insert_exception(...) is called). Specifically, in the block using
exception_log_ptr[my_exception_log_index] (and before setting
exception_log_ptr[my_exception_log_index].local_tuple or reading
.initial_error_message), insert Assert(my_exception_log_index >= 0) to mirror
the check used in handle_begin/other code paths so the handlers (where
use_try_block is expected) validate the index before use.
- Around line 1299-1304: Extract the repeated error_msg computation into a small
static helper (e.g. static inline char *get_dry_run_error_msg(void)) that reads
my_exception_log_index, xact_action_counter and exception_log_ptr and returns
exception_log_ptr[my_exception_log_index].initial_error_message when
xact_action_counter == exception_log_ptr[my_exception_log_index].failed_action
and the initial_error_message is non-empty, otherwise returns NULL; replace the
duplicated ternary in the four handlers (the blocks around the places that set
error_msg in INSERT/UPDATE/DELETE/SQL handlers) with a call to
get_dry_run_error_msg() and keep the Assert(my_exception_log_index >= 0) in the
helper.
- Around line 871-878: In the SUB_DISABLE path add the same clearing of the
saved initial error string as done in the TRANSDISCARD path: after assigning
exception_log = &exception_log_ptr[my_exception_log_index] and before calling
spock_disable_subscription()/elog(ERROR...), set
exception_log->initial_error_message[0] = '\0' so the SpockExceptionLog
(exception_log) is consistent with the TRANSDISCARD handling; keep the rest
(commit_lsn, MySpockWorker->restart_delay, spock_disable_subscription(), elog)
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 031139ab-7009-4644-ab84-545c7b0c6bb2

📥 Commits

Reviewing files that changed from the base of the PR and between 936e414 and 7f4beea.

⛔ Files ignored due to path filters (1)
  • tests/regress/expected/dry_run_logging.out is excluded by !**/*.out
📒 Files selected for processing (2)
  • src/spock_apply.c
  • tests/regress/sql/dry_run_logging.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/regress/sql/dry_run_logging.sql

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant