1 change: 0 additions & 1 deletion Documentation/block/biovecs.rst
@@ -135,7 +135,6 @@ Usage of helpers:
bio_first_bvec_all()
bio_first_page_all()
bio_first_folio_all()
bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
bio_vec' will contain a single-page IO vector during the iteration::
6 changes: 6 additions & 0 deletions Documentation/block/inline-encryption.rst
@@ -206,6 +206,12 @@ it to a bio, given the blk_crypto_key and the data unit number that will be used
for en/decryption. Users don't need to worry about freeing the bio_crypt_ctx
later, as that happens automatically when the bio is freed or reset.

To submit a bio that uses inline encryption, users must call
``blk_crypto_submit_bio()`` instead of the usual ``submit_bio()``. This will
submit the bio to the underlying driver if it supports inline crypto, or else
call the blk-crypto fallback routines before submitting normal bios to the
underlying drivers.

Finally, when done using inline encryption with a blk_crypto_key on a
block_device, users must call ``blk_crypto_evict_key()``. This ensures that
the key is evicted from all keyslots it may be programmed into and unlinked from
64 changes: 60 additions & 4 deletions Documentation/block/ublk.rst
@@ -260,9 +260,12 @@ The following IO commands are communicated via io_uring passthrough command,
and each command is only for forwarding the IO and committing the result
with specified IO tag in the command data:

- ``UBLK_IO_FETCH_REQ``
Traditional Per-I/O Commands
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sent from the server IO pthread for fetching future incoming IO requests
- ``UBLK_U_IO_FETCH_REQ``

Sent from the server I/O pthread to fetch future incoming I/O requests
destined to ``/dev/ublkb*``. This command is sent only once from the server
I/O pthread, letting the ublk driver set up the I/O forwarding environment.

@@ -278,7 +281,7 @@ with specified IO tag in the command data:
supported by the driver, daemons must be per-queue instead - i.e. all I/Os
associated to a single qid must be handled by the same task.

- ``UBLK_IO_COMMIT_AND_FETCH_REQ``
- ``UBLK_U_IO_COMMIT_AND_FETCH_REQ``

When an IO request is destined to ``/dev/ublkb*``, the driver stores
the IO's ``ublksrv_io_desc`` to the specified mapped area; then the
@@ -293,7 +296,7 @@ with specified IO tag in the command data:
requests with the same IO tag. That is, ``UBLK_IO_COMMIT_AND_FETCH_REQ``
is reused both for fetching a request and for committing back the IO result.

- ``UBLK_IO_NEED_GET_DATA``
- ``UBLK_U_IO_NEED_GET_DATA``

With ``UBLK_F_NEED_GET_DATA`` enabled, the WRITE request will be firstly
issued to ublk server without data copy. Then, IO backend of ublk server
@@ -322,6 +325,59 @@ with specified IO tag in the command data:
``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
the server buffer (pages) read to the IO request pages.

Batch I/O Commands (UBLK_F_BATCH_IO)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``UBLK_F_BATCH_IO`` feature provides an alternative high-performance
I/O handling model that replaces the traditional per-I/O commands with
per-queue batch commands. This significantly reduces communication overhead
and enables better load balancing across multiple server tasks.

Key differences from traditional mode:

- **Per-queue vs Per-I/O**: Commands operate on queues rather than individual I/Os
- **Batch processing**: Multiple I/Os are handled in single operations
- **Multishot commands**: Use io_uring multishot for reduced submission overhead
- **Flexible task assignment**: Any task can handle any I/O (no per-I/O daemons)
- **Better load balancing**: Tasks can adjust their workload dynamically

Batch I/O Commands:

- ``UBLK_U_IO_PREP_IO_CMDS``

Prepares multiple I/O commands in batch. The server provides a buffer
containing multiple I/O descriptors that will be processed together.
This reduces the number of individual command submissions required.

- ``UBLK_U_IO_COMMIT_IO_CMDS``

Commits results for multiple I/O operations in batch, and prepares the
I/O descriptors to accept new requests. The server provides a buffer
containing the results of multiple completed I/Os, allowing efficient
bulk completion of requests.

- ``UBLK_U_IO_FETCH_IO_CMDS``

**Multishot command** for fetching I/O commands in batch. This is the key
command that enables high-performance batch processing:

* Uses io_uring multishot capability for reduced submission overhead
* Single command can fetch multiple I/O requests over time
* Buffer size determines maximum batch size per operation
* Multiple fetch commands can be submitted for load balancing
* Only one fetch command is active at any time per queue
* Supports dynamic load balancing across multiple server tasks

It is a typical multishot io_uring request with a provided buffer; it stays
armed across completions and is only completed when a failure occurs.

Each task can submit ``UBLK_U_IO_FETCH_IO_CMDS`` with different buffer
sizes to control how much work it handles. This enables sophisticated
load balancing strategies in multi-threaded servers.

Migration: applications using the traditional commands (``UBLK_U_IO_FETCH_REQ``,
``UBLK_U_IO_COMMIT_AND_FETCH_REQ``) cannot use batch mode at the same time;
the two models are mutually exclusive.

Zero copy
---------

20 changes: 20 additions & 0 deletions Documentation/networking/iou-zcrx.rst
@@ -196,6 +196,26 @@ Return buffers back to the kernel to be used again::
rqe->len = cqe->res;
IO_URING_WRITE_ONCE(*refill_ring.ktail, ++refill_ring.rq_tail);

Area chunking
-------------

zcrx splits the memory area into fixed-length physically contiguous chunks.
This limits the maximum buffer size returned in a single io_uring CQE. Users
can provide a hint to the kernel to use larger chunks by setting the
``rx_buf_len`` field of ``struct io_uring_zcrx_ifq_reg`` to the desired length
during registration. If this field is set to zero, the kernel defaults to
the system page size.

To use larger sizes, the memory area must be backed by physically contiguous
ranges whose sizes are multiples of ``rx_buf_len``. It also requires kernel
and hardware support. If registration fails, users are generally expected to
fall back to defaults by setting ``rx_buf_len`` to zero.

Larger chunks don't give any additional guarantees about the buffer sizes
returned in CQEs, which can still vary with traffic patterns, hardware
offloads, and other factors. Using larger chunks requires no application
changes beyond zcrx registration.

Testing
=======

20 changes: 10 additions & 10 deletions block/bfq-iosched.c
@@ -231,7 +231,7 @@ static struct kmem_cache *bfq_pool;
#define BFQ_RQ_SEEKY(bfqd, last_pos, rq) \
(get_sdist(last_pos, rq) > \
BFQQ_SEEK_THR && \
(!blk_queue_nonrot(bfqd->queue) || \
(blk_queue_rot(bfqd->queue) || \
blk_rq_sectors(rq) < BFQQ_SECT_THR_NONROT))
#define BFQQ_CLOSE_THR (sector_t)(8 * 1024)
#define BFQQ_SEEKY(bfqq) (hweight32(bfqq->seek_history) > 19)
@@ -4165,7 +4165,7 @@ static bool bfq_bfqq_is_slow(struct bfq_data *bfqd, struct bfq_queue *bfqq,

/* don't use too short time intervals */
if (delta_usecs < 1000) {
if (blk_queue_nonrot(bfqd->queue))
if (!blk_queue_rot(bfqd->queue))
/*
* give same worst-case guarantees as idling
* for seeky
@@ -4487,7 +4487,7 @@ static bool idling_boosts_thr_without_issues(struct bfq_data *bfqd,
struct bfq_queue *bfqq)
{
bool rot_without_queueing =
!blk_queue_nonrot(bfqd->queue) && !bfqd->hw_tag,
blk_queue_rot(bfqd->queue) && !bfqd->hw_tag,
bfqq_sequential_and_IO_bound,
idling_boosts_thr;

@@ -4521,7 +4521,7 @@ static bool idling_boosts_thr_without_issues(struct bfq_data *bfqd,
* flash-based device.
*/
idling_boosts_thr = rot_without_queueing ||
((!blk_queue_nonrot(bfqd->queue) || !bfqd->hw_tag) &&
((blk_queue_rot(bfqd->queue) || !bfqd->hw_tag) &&
bfqq_sequential_and_IO_bound);

/*
@@ -4722,7 +4722,7 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
* there is only one in-flight large request
* at a time.
*/
if (blk_queue_nonrot(bfqd->queue) &&
if (!blk_queue_rot(bfqd->queue) &&
blk_rq_sectors(bfqq->next_rq) >=
BFQQ_SECT_THR_NONROT &&
bfqd->tot_rq_in_driver >= 1)
@@ -6340,7 +6340,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
bfqd->hw_tag_samples = 0;

bfqd->nonrot_with_queueing =
blk_queue_nonrot(bfqd->queue) && bfqd->hw_tag;
!blk_queue_rot(bfqd->queue) && bfqd->hw_tag;
}

static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
@@ -7293,7 +7293,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_queue *eq)
INIT_HLIST_HEAD(&bfqd->burst_list);

bfqd->hw_tag = -1;
bfqd->nonrot_with_queueing = blk_queue_nonrot(bfqd->queue);
bfqd->nonrot_with_queueing = !blk_queue_rot(bfqd->queue);

bfqd->bfq_max_budget = bfq_default_max_budget;

@@ -7328,9 +7328,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_queue *eq)
* Begin by assuming, optimistically, that the device peak
* rate is equal to 2/3 of the highest reference rate.
*/
bfqd->rate_dur_prod = ref_rate[blk_queue_nonrot(bfqd->queue)] *
ref_wr_duration[blk_queue_nonrot(bfqd->queue)];
bfqd->peak_rate = ref_rate[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
bfqd->rate_dur_prod = ref_rate[!blk_queue_rot(bfqd->queue)] *
ref_wr_duration[!blk_queue_rot(bfqd->queue)];
bfqd->peak_rate = ref_rate[!blk_queue_rot(bfqd->queue)] * 2 / 3;

/* see comments on the definition of next field inside bfq_data */
bfqd->actuator_load_threshold = 4;
14 changes: 1 addition & 13 deletions block/bio-integrity-auto.c
@@ -52,19 +52,7 @@ static bool bip_should_check(struct bio_integrity_payload *bip)

static bool bi_offload_capable(struct blk_integrity *bi)
{
switch (bi->csum_type) {
case BLK_INTEGRITY_CSUM_CRC64:
return bi->metadata_size == sizeof(struct crc64_pi_tuple);
case BLK_INTEGRITY_CSUM_CRC:
case BLK_INTEGRITY_CSUM_IP:
return bi->metadata_size == sizeof(struct t10_pi_tuple);
default:
pr_warn_once("%s: unknown integrity checksum type:%d\n",
__func__, bi->csum_type);
fallthrough;
case BLK_INTEGRITY_CSUM_NONE:
return false;
}
return bi->metadata_size == bi->pi_tuple_size;
}

/**