Replace wal_sender_timeout-based liveness with TCP keepalive. #373
ibrarahmad wants to merge 1 commit into main from
Conversation
The apply worker previously relied on wal_sender_timeout as both a server-side disconnect trigger and an indirect keepalive pressure on the subscriber. This caused spurious disconnects in two scenarios: a flood of 'w' messages keeping the subscriber too busy to send 'r' feedback in time, and large transactions whose apply time exceeded wal_sender_timeout.

The workaround was maybe_send_feedback(), which force-sent 'r' after every 10 'w' messages or wal_sender_timeout/2, whichever came first. This was a fragile band-aid that coupled subscriber behavior to a server GUC it cannot control.

Replace the entire mechanism with a clean two-layer model:

- TCP keepalive (keepalives_idle=10, keepalives_interval=5, keepalives_count=3) is the primary liveness detector on both sides. A dead network or crashed host is detected in ~25 seconds.
- wal_sender_timeout=0 is set on replication connections so the walsender never disconnects due to missing 'r' feedback. Liveness on the server side is now handled entirely by TCP keepalive.
- spock.apply_idle_timeout (default 300s) is a subscriber-side safety net for a hung-but-connected walsender whose TCP keepalive probes are answered by the kernel but which sends no data. Set to 0 to disable.

Fix a bug in last_receive_timestamp handling: it was updated unconditionally after every PQgetCopyData call, including when r == 0 (no data available). Each 1-second WL_TIMEOUT spin silently reset the timer, making apply_idle_timeout never fire. Move the update to after the r == 0 guard so it reflects actual data receipt only.

Remove maybe_send_feedback() as it is no longer needed.
📝 Walkthrough
A new idle timeout control mechanism was introduced for apply workers with a configurable GUC parameter. Replication connection handling was extended to disable server-side wal_sender_timeout. Timeout logic in the apply worker was refactored from keepalive-based to idle-timeout-based, with supporting infrastructure adjustments throughout the codebase.
🚥 Pre-merge checks: ✅ 3 passed
🧹 Nitpick comments (1)
src/spock.c (1)
345-356: Consider making TCP keepalive parameters configurable. The hardcoded values (idle=10s, interval=5s, count=3) result in ~25s dead connection detection. While reasonable for most deployments, high-latency or unreliable network environments may experience false-positive disconnects.
Consider exposing these as GUCs (e.g., spock.keepalives_idle, spock.keepalives_interval, spock.keepalives_count) to allow tuning without code changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock.c` around lines 345 - 356, Replace the hardcoded TCP keepalive literals in the keys/vals block with configurable GUC-backed values: define GUCs (e.g., spock.keepalives_idle, spock.keepalives_interval, spock.keepalives_count) as int variables (suggest names spock_keepalives_idle, spock_keepalives_interval, spock_keepalives_count) during module initialization (e.g., in _PG_init or the existing GUC registration area), register them with DefineCustomIntVariable, and then use those variables' stringified values when populating vals[] for the keys "keepalives_idle", "keepalives_interval", and "keepalives_count" in the code that sets keys[i]/vals[i]; keep the default values 10/5/3 and ensure bounds checking on the GUCs (positive integers) when registering.
ℹ️ Review info
Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 69aec04d-1d81-4de7-913f-14b9f06f8761
📒 Files selected for processing (3)
include/spock.h, src/spock.c, src/spock_apply.c
     * kernel ACKs them, but no data is being sent.
     */
-   if (rc & WL_TIMEOUT)
+   if (rc & WL_TIMEOUT && spock_apply_idle_timeout > 0)
It seems like if the walsender just doesn't have data to send for a long time, the subscriber will restart. Am I wrong?
It would be better to modify the walsender a little: skip sending keepalive messages while busy and rely on TCP status, but send keepalive messages when no data arrives from the WAL. In that case we wouldn't need any subscriber-side GUC at all.
@ibrarahmad Needs rebase
mason-sharp left a comment
Added a comment about having a non-zero wal_sender_timeout.
Also, needs a rebase.
Also, could use that test file.
    if (replication)
    {
        keys[i] = "options";
        vals[i] = "-c wal_sender_timeout=0";
Maybe this should be non-zero, like 300000. Less?
What if there is a proxy that blocks the sender when it is trying to send WAL? The keepalives sent from the idle receiver won't make a difference. Meanwhile, WAL is accumulating.
SPOC-419