Race condition on fullsync start/stop [JIRA: RIAK-2535] #741

Description

@macintux

The repl_cancel_fullsync test can fail when the sync_worker field of the state record in riak_repl2_fssource is undefined and riak_repl_keylist_server:cancel_fullsync is invoked. This happens if the worker has not started yet.

This is extraordinarily unlikely to happen in production, but the code should handle the situation better. Some thoughts from @bsparrow435:

so the worker hadn't started yet
and it tried to do a gen_fsm:send_event to an undefined pid

2016-04-29 00:25:13.232 [info]  ---riak_test--- Starting fullsync.
2016-04-29 00:25:13.588 [info]  ---riak_test--- Stopping fullsync.
maybe we should wait for worker start here
instead of checking if the fscoordinator is running

honestly cancel_fullsync should be able to catch this in a guard and return no workers started
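
The guard idea above could look roughly like the following. This is a minimal sketch, not the actual riak_repl code: the module name `cancel_guard_sketch` and the `{error, no_workers_started}` return value are hypothetical, and it assumes the cancel function receives the worker pid (or `undefined`) taken from the `sync_worker` field.

```erlang
-module(cancel_guard_sketch).
%% gen_fsm is deprecated in newer OTP releases; silence the warning for this sketch.
-compile(nowarn_deprecated_function).
-export([cancel_fullsync/1]).

%% Hypothetical stand-in for riak_repl_keylist_server:cancel_fullsync.
%% First clause: the worker has not started, so there is nothing to cancel.
cancel_fullsync(undefined) ->
    {error, no_workers_started};
%% Second clause: a real worker pid, so forward the cancel event to it.
cancel_fullsync(Pid) when is_pid(Pid) ->
    gen_fsm:send_event(Pid, cancel_fullsync).
```

With this shape, the caller in riak_repl2_fssource would get a plain return value to match on when `sync_worker` is still `undefined`, rather than attempting to send an event to the atom `undefined`.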
