blk-cgroup: fix races and deadlocks #540
Open · blktests-ci wants to merge 7 commits into for-next_base from series/1050146=>for-next
+90 −161
Conversation
… blkcg_mutex

blkg_destroy_all() iterates q->blkg_list without holding blkcg_mutex,
which can race with blkg_free_workfn() that removes blkgs from the
list while holding blkcg_mutex. Add blkcg_mutex protection around the
q->blkg_list iteration to prevent potential list corruption or
use-after-free issues.

Signed-off-by: Yu Kuai <[email protected]>
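A minimal sketch of the resulting locking, simplified from the
blkg_destroy_all() shape in block/blk-cgroup.c (the real function's
restart batching and error handling are omitted):

    static void blkg_destroy_all(struct gendisk *disk)
    {
            struct request_queue *q = disk->queue;
            struct blkcg_gq *blkg, *n;

            /* serialize against blkg_free_workfn() deleting entries */
            mutex_lock(&q->blkcg_mutex);
            spin_lock_irq(&q->queue_lock);
            list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) {
                    struct blkcg *blkcg = blkg->blkcg;

                    spin_lock(&blkcg->lock);
                    blkg_destroy(blkg);
                    spin_unlock(&blkcg->lock);
            }
            q->root_blkg = NULL;
            spin_unlock_irq(&q->queue_lock);
            mutex_unlock(&q->blkcg_mutex);
    }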
…mutex

bfq_end_wr_async() iterates q->blkg_list while holding only
bfqd->lock, not blkcg_mutex. This can race with blkg_free_workfn()
that removes blkgs from the list while holding blkcg_mutex. Take
blkcg_mutex in bfq_end_wr() before taking bfqd->lock to ensure proper
synchronization when iterating q->blkg_list.

Signed-off-by: Yu Kuai <[email protected]>
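A sketch of the ordering fix described above, assuming the bfq_end_wr()
shape in block/bfq-iosched.c (the weight-raising loops over the sync
bfq_queues are elided):

    void bfq_end_wr(struct bfq_data *bfqd)
    {
            struct request_queue *q = bfqd->queue;

            /* blkcg_mutex protects q->blkg_list from blkg_free_workfn() */
            mutex_lock(&q->blkcg_mutex);
            spin_lock_irq(&bfqd->lock);
            /* ... end weight-raising for active/idle bfq_queues ... */
            bfq_end_wr_async(bfqd);         /* iterates q->blkg_list */
            spin_unlock_irq(&bfqd->lock);
            mutex_unlock(&q->blkcg_mutex);
    }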
When switching an IO scheduler on a block device,
blkcg_activate_policy() allocates blkg_policy_data (pd) for all blkgs
attached to the queue. However, blkcg_activate_policy() may race with
concurrent blkcg deletion, leading to use-after-free and memory leak
issues.

The use-after-free occurs in the following race:

T1 (blkcg_activate_policy):
- Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
- Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
- Enters the enomem rollback path to release blkg1 resources

T2 (blkcg deletion):
- blkcgA is deleted concurrently
- blkg1 is freed via blkg_free_workfn()
- blkg1->pd is freed

T1 (continued):
- Rollback path accesses blkg1->pd->online after pd is freed
- Triggers use-after-free

In addition, blkg_free_workfn() frees pd before removing the blkg from
q->blkg_list. This allows blkcg_activate_policy() to allocate a new pd
for a blkg that is being destroyed, leaving the newly allocated pd
unreachable when the blkg is finally freed.

Fix these races by extending blkcg_mutex coverage to serialize
blkcg_activate_policy() rollback and blkg destruction, ensuring pd
lifecycle is synchronized with blkg list visibility.

Link: https://lore.kernel.org/all/[email protected]/
Fixes: f1c006f ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <[email protected]>
Signed-off-by: Yu Kuai <[email protected]>
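A sketch of the serialized rollback described above; the helper name
blkcg_policy_rollback() is hypothetical, but pd->online, pd_offline_fn
and pd_free_fn are the upstream blkcg policy hooks:

    static void blkcg_policy_rollback(struct request_queue *q,
                                      const struct blkcg_policy *pol)
    {
            struct blkcg_gq *blkg;

            /* under blkcg_mutex, blkg_free_workfn() cannot free pd */
            mutex_lock(&q->blkcg_mutex);
            spin_lock_irq(&q->queue_lock);
            list_for_each_entry(blkg, &q->blkg_list, q_node) {
                    struct blkg_policy_data *pd = blkg->pd[pol->plid];

                    if (!pd)
                            continue;
                    if (pd->online && pol->pd_offline_fn)
                            pol->pd_offline_fn(pd);
                    pd->online = false;
                    pol->pd_free_fn(pd);
                    blkg->pd[pol->plid] = NULL;
            }
            spin_unlock_irq(&q->queue_lock);
            mutex_unlock(&q->blkcg_mutex);
    }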
When switching IO schedulers on a block device, blkcg_activate_policy()
can race with concurrent blkcg deletion, leading to a use-after-free in
rcu_accelerate_cbs:

T1:                             T2:
                                blkg_destroy
                                  kill(&blkg->refcnt)   // blkg->refcnt=1->0
                                  blkg_release          // call_rcu(__blkg_release)
                                ...
                                blkg_free_workfn
                                  ->pd_free_fn(pd)
elv_iosched_store
  elevator_switch
    ...
    iterate blkg list
      blkg_get(blkg)            // blkg->refcnt=0->1
                                  list_del_init(&blkg->q_node)
      blkg_put(pinned_blkg)     // blkg->refcnt=1->0
        blkg_release            // call_rcu again
          rcu_accelerate_cbs    // uaf
Fix this by checking hlist_unhashed(&blkg->blkcg_node) before getting
a reference to the blkg. This is the same check used in blkg_destroy()
to detect if a blkg has already been destroyed. If the blkg is already
unhashed, skip processing it since it's being destroyed.
Link: https://lore.kernel.org/all/[email protected]/
Fixes: f1c006f ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <[email protected]>
Signed-off-by: Yu Kuai <[email protected]>
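A minimal sketch of the check described above, assuming the blkg
iteration in blkcg_activate_policy(); the surrounding pd allocation
logic is elided:

    list_for_each_entry(blkg, &q->blkg_list, q_node) {
            /*
             * A blkg whose blkcg_node is already unhashed has gone
             * through blkg_destroy(); taking a reference here would
             * resurrect a zero refcount and queue __blkg_release()
             * a second time.
             */
            if (hlist_unhashed(&blkg->blkcg_node))
                    continue;

            blkg_get(blkg);
            /* ... allocate and attach pd for this blkg ... */
            blkg_put(blkg);
    }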
Move the teardown sequence which offlines and frees per-policy
blkg_policy_data (pd) into a helper for readability. No functional
change intended.

Signed-off-by: Zheng Qixing <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Yu Kuai <[email protected]>
Signed-off-by: Yu Kuai <[email protected]>
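The helper's name below is hypothetical (the patch may use a different
one); the body follows the known per-policy teardown pattern from
blk-cgroup.c:

    static void blkg_offline_and_free_pds(struct blkcg_gq *blkg)
    {
            int i;

            for (i = 0; i < BLKCG_MAX_POLS; i++) {
                    struct blkcg_policy *pol = blkcg_policy[i];
                    struct blkg_policy_data *pd = blkg->pd[i];

                    if (!pd)
                            continue;
                    if (pd->online && pol->pd_offline_fn)
                            pol->pd_offline_fn(pd); /* offline first */
                    pd->online = false;
                    pol->pd_free_fn(pd);
                    blkg->pd[i] = NULL;
            }
    }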
…cy()

Some policies like iocost and iolatency perform percpu allocation in
pd_alloc_fn(). Percpu allocation with the queue frozen can cause
deadlock because percpu memory reclaim may issue IO.

Now that q->blkg_list is protected by blkcg_mutex, restructure
blkcg_activate_policy() to allocate all pds before freezing the queue:

1. Allocate all pds with GFP_KERNEL before freezing the queue
2. Freeze the queue
3. Initialize and online all pds

Note: future work is to remove all queue freezing before
blkcg_activate_policy() to fix the deadlocks thoroughly.

Signed-off-by: Yu Kuai <[email protected]>
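A sketch of the restructured flow following the three steps above,
assuming the pre-memflags blk_mq_freeze_queue()/blk_mq_unfreeze_queue()
signatures; the enomem rollback is elided:

    int blkcg_activate_policy(struct gendisk *disk,
                              const struct blkcg_policy *pol)
    {
            struct request_queue *q = disk->queue;
            struct blkcg_gq *blkg;

            mutex_lock(&q->blkcg_mutex);

            /* 1. allocate every pd with GFP_KERNEL, queue not frozen,
             *    so percpu reclaim is still free to issue IO
             */
            list_for_each_entry(blkg, &q->blkg_list, q_node) {
                    blkg->pd[pol->plid] = pol->pd_alloc_fn(disk,
                                            blkg->blkcg, GFP_KERNEL);
                    if (!blkg->pd[pol->plid])
                            goto enomem;    /* rollback elided */
            }

            /* 2. freeze only after all allocations succeeded */
            blk_mq_freeze_queue(q);

            /* 3. init and online the pds with no IO in flight */
            list_for_each_entry(blkg, &q->blkg_list, q_node) {
                    if (pol->pd_init_fn)
                            pol->pd_init_fn(blkg->pd[pol->plid]);
                    if (pol->pd_online_fn)
                            pol->pd_online_fn(blkg->pd[pol->plid]);
                    blkg->pd[pol->plid]->online = true;
            }

            blk_mq_unfreeze_queue(q);
            mutex_unlock(&q->blkcg_mutex);
            return 0;
    enomem:
            mutex_unlock(&q->blkcg_mutex);
            return -ENOMEM;
    }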
The current rq_qos_mutex handling has an awkward pattern where callers
must acquire the mutex before calling rq_qos_add()/rq_qos_del(), and
blkg_conf_open_bdev_frozen() had to release and re-acquire the mutex
around queue freezing to maintain proper locking order (freeze queue
before mutex). On the other hand, with rq_qos_mutex held after
blkg_conf_prep(), there are many possible deadlocks:

- allocating memory with GFP_KERNEL, like blk_throtl_init();
- allocating percpu memory, like pd_alloc_fn() for iocost/iolatency.

This patch refactors the locking by:

1. Moving queue freeze and rq_qos_mutex acquisition inside
   rq_qos_add()/rq_qos_del(), with the correct order: freeze first,
   then acquire mutex.
2. Removing external mutex handling from wbt_init() since rq_qos_add()
   now handles it internally.
3. Removing rq_qos_mutex handling from blkg_conf_open_bdev() entirely,
   making it only responsible for parsing MAJ:MIN and opening the bdev.
4. Removing the blkg_conf_open_bdev_frozen() and
   blkg_conf_exit_frozen() functions, which are no longer needed.
5. Updating ioc_qos_write() to use the simpler blkg_conf_open_bdev()
   and blkg_conf_exit() functions.

This eliminates the release-and-reacquire pattern and makes
rq_qos_add()/rq_qos_del() self-contained, which is cleaner and reduces
complexity. Each function now properly manages its own locking with
the correct order: queue freeze → mutex acquire → modify → mutex
release → queue unfreeze.

Signed-off-by: Yu Kuai <[email protected]>
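A sketch of the self-contained ordering in rq_qos_add(), simplified
from block/blk-rq-qos.c (duplicate-id checks, debugfs registration and
queue-freeze memflags plumbing are elided):

    int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk,
                   enum rq_qos_id id, const struct rq_qos_ops *ops)
    {
            struct request_queue *q = disk->queue;

            blk_mq_freeze_queue(q);         /* 1. freeze first ...   */
            mutex_lock(&q->rq_qos_mutex);   /* 2. ... then the mutex */

            rqos->disk = disk;
            rqos->id = id;
            rqos->ops = ops;
            rqos->next = q->rq_qos;
            q->rq_qos = rqos;               /* push onto the qos list */

            mutex_unlock(&q->rq_qos_mutex);
            blk_mq_unfreeze_queue(q);
            return 0;
    }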
Upstream branch: 130b8e3
Pull request for series with
subject: blk-cgroup: fix races and deadlocks
version: 2
url: https://patchwork.kernel.org/project/linux-block/list/?series=1050146