Skip to content

htlcswitch: add FSM fuzz harness for channelLink commit protocol#1

Open
MPins wants to merge 10 commits intomasterfrom
link_fsm_fuzz
Open

htlcswitch: add FSM fuzz harness for channelLink commit protocol#1
MPins wants to merge 10 commits intomasterfrom
link_fsm_fuzz

Conversation

@MPins
Copy link
Copy Markdown
Owner

@MPins MPins commented Mar 12, 2026

The channelLink commit protocol — the sequence of CommitSig / RevokeAndAck exchanges that advance commitment heights on both sides of a channel — is one of the most critical and subtle state machines in lnd. Despite extensive unit tests, the ordering of these messages is highly concurrent and easy to get wrong. A single missed revocation or out-of-order commit can corrupt channel state irreparably.

This PR adds a coverage-guided fuzz harness that exercises the full commit protocol FSM by randomly interleaving HTLC additions, commits, revocations, settlements, and failures from both Alice and Bob. The fuzzer checks structural invariants (monotonic commit heights, mirror symmetry between peers) after every event, catching protocol violations that deterministic tests cannot anticipate.

Testing

go test ./htlcswitch/ -run TestChannelLinkFSMScenarios -v
go test ./htlcswitch -run=^$ -fuzz=FuzzChannelLinkFSM -fuzztime=1m

Comment thread htlcswitch/fuzz_link_test.go Outdated
Comment thread htlcswitch/link_isolated_test.go Outdated
@coveralls
Copy link
Copy Markdown

coveralls commented Mar 12, 2026

Coverage Report for CI Build 25184596720

Coverage increased (+0.04%) to 62.232%

Details

  • Coverage increased (+0.04%) from the base build.
  • Patch coverage: 53 uncovered changes across 3 files (142 of 195 lines covered, 72.82%).
  • 97 coverage regressions across 15 files.

Uncovered Changes

File Changed Covered %
htlcswitch/mock.go 55 19 34.55%
htlcswitch/test_utils.go 96 85 88.54%
lnwallet/channel.go 29 23 79.31%

Coverage Regressions

97 previously-covered lines in 15 files lost coverage.

Top 10 Files by Coverage Loss Lines Losing Coverage Coverage
lnrpc/websocket_proxy.go 21 77.58%
contractcourt/chain_watcher.go 18 76.12%
discovery/syncer.go 10 85.05%
discovery/gossiper.go 8 78.21%
funding/manager.go 6 74.58%
graph/db/notifications.go 6 83.61%
htlcswitch/link.go 6 78.23%
watchtower/wtclient/queue.go 6 85.92%
chainntnfs/bitcoindnotify/bitcoind.go 3 80.4%
lnwire/ping.go 3 77.27%

Coverage Stats

Coverage Status
Relevant Lines: 230926
Covered Lines: 143710
Line Coverage: 62.23%
Coverage Strength: 19066.91 hits per line

💛 - Coveralls

@MPins MPins force-pushed the link_fsm_fuzz branch 3 times, most recently from 001781c to 1400d9a Compare March 13, 2026 21:50
@MPins MPins force-pushed the link_fsm_fuzz branch 4 times, most recently from 1b219d1 to ca4be0a Compare March 28, 2026 00:16
@MPins MPins force-pushed the link_fsm_fuzz branch 10 times, most recently from fcb4bb0 to 6e4610c Compare April 1, 2026 12:59
@MPins MPins force-pushed the link_fsm_fuzz branch 2 times, most recently from 78af141 to 7a5aa99 Compare April 8, 2026 19:40
@MPins
Copy link
Copy Markdown
Owner Author

MPins commented Apr 9, 2026

@Crypt-iQ when you have time, could you take a look?

@Crypt-iQ
Copy link
Copy Markdown

Crypt-iQ commented Apr 9, 2026

@Crypt-iQ when you have time, could you take a look?

Sure I will take a look

@MPins MPins force-pushed the link_fsm_fuzz branch 4 times, most recently from b28e3d8 to 5f570d5 Compare April 11, 2026 01:34
Copy link
Copy Markdown

@Crypt-iQ Crypt-iQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I took a (quick) look at the fuzz test because I've lost context and am short on time so I can't give a detailed line-by-line code-level review that I'd like to give. I skipped over the commits that stub out the signing stuff, seems good because signing could be a bottleneck. Using a RAM disk is a good idea since you want to avoid disk i/o. My main comment is that the fuzz harness could use more of the fuzz input. For example, when a new fee is being sent, the fee is calculated by newFee := len(f.aliceLink.channel.ActiveHtlcs())*100 + 1000. I think most of the messages should instead be constructed from the fuzz input (besides things like signatures that would cause obvious invalidity but even those you could sometimes make invalid). So instead of the fuzz input being a sequence of events, it would be a sequence of events + some byte slice that can be parsed as message fields. You could also just accept one byte blob as you do here, but parse it into events + message fields. Also, if possible, I think it'd be good to see if the link can start up after being stopped. There have been several bugs over the years where the link can't start up properly due to reestablishment (and I think the other work-in-progress link harness found an issue just like this). Finally, it would be a good idea to measure the coverage and see if there are any obvious blind spots for this fuzzer and then improve on those by adding extra events.

Comment thread htlcswitch/fuzz_link_test.go
}

// Check total balances.
var aliceHtlcAmt, bobHtlcAmt lnwire.MilliSatoshi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to assert that alice or bob's balance is a certain expected value? this works, but I'm wondering if there's any way to detect funds loss

Comment thread htlcswitch/link_test.go Outdated
Comment thread htlcswitch/fuzz_link_test.go Outdated
Comment thread htlcswitch/fuzz_link_test.go Outdated
Comment thread htlcswitch/fuzz_link_test.go
Comment thread htlcswitch/fuzz_link_test.go
@MPins
Copy link
Copy Markdown
Owner Author

MPins commented Apr 13, 2026

Thank you @Crypt-iQ for your time. I’ll be addressing the comments above.

@MPins MPins force-pushed the link_fsm_fuzz branch 6 times, most recently from e12af25 to d8ef428 Compare April 22, 2026 20:23
MPins added 4 commits April 30, 2026 11:35
Expose the `invoiceRegistry` field in `singleLinkTestHarness` so
tests can register and look up invoices directly.

Add `generateSingleHopHtlc`, a test helper that builds a single-hop
`UpdateAddHTLC` with a random preimage, intended for use in unit and
fuzz tests.
Add a no-op MailBox implementation and a no-op ticker for use in
the channelLink FSM fuzz harness.
Replace createChannelLinkWithPeer (which required a Switch and spawned the
htlcManager goroutine) with newFuzzLink, a minimal link factory that:

- accepts dependencies directly (registry, preimage cache, circuit map,
  bestHeight) instead of a mockServer, so no Switch or background goroutines
  are created at all
- sets link.upstream directly to a buffered channel controlled by the
  caller, bypassing the mailbox entirely
- attaches a mockMailBox so mailBox.ResetPackets() in resumeLink succeeds
Add a failReason string field to channelLink that is populated by
failf alongside the existing failed flag. This gives fuzz and unit
tests direct access to the human-readable failure reason without
requiring a dedicated OnChannelFailure callback or log scraping.
@MPins MPins force-pushed the link_fsm_fuzz branch 4 times, most recently from e6dcfd1 to 22b07c0 Compare April 30, 2026 16:31
MPins added 5 commits April 30, 2026 16:15
Introduce `fuzz_link_test.go` with a model-based fuzzer that drives
the Alice-Bob channel link through arbitrary sequences of protocol
events and checks key invariants after each step.
Introduce fuzzSigner and fuzzSigVerifier in the fuzz harness, along
with the SigVerifier hook in LightningChannel (WithSigVerifier,
verifySig) and a matching SigPool extension (VerifyFunc field) so the
harness can bypass secp256k1 verification end-to-end. Also refactors
createTestChannel to accept functional options (testChannelOpt) so
the signer and channel options can be injected from tests.
Introduce CommitKeyDeriverFunc and WithCommitKeyDeriver to allow
LightningChannel to bypass the secp256k1-based DeriveCommitmentKeys
on every commit round. All internal call sites are migrated to
lc.deriveCommitmentKeys. The fuzz harness injects fuzzCommitKeyDeriver,
a trivial identity deriver that avoids scalar-multiplication overhead.
createTestChannel started alicePool and bobPool but never stopped
them. During fuzzing this caused goroutines to leak per. Register
t.Cleanup handlers to call Stop() on both pools so all workers are
torn down when the test ends.
newMockRegistry started an InvoiceRegistry but never stopped it.
InvoiceRegistry internally starts two background goroutines —
invoiceEventLoop and the InvoiceExpiryWatcher mainLoop — that
run for the lifetime of the registry. Without a matching Stop()
call both goroutines leaked for every test that called
newMockRegistry, accumulating thousands of goroutines during
fuzzing.

Register a t.Cleanup to call registry.Stop() so both loops are
torn down when the test ends.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants