PyNVMe3 IO Models
Last Modified: September 5, 2025
Copyright © 2020-2025 GENG YUN Technology Pte. Ltd.
All Rights Reserved.
1. IOWorker
With ioworker, we can apply greater IO pressure than fio, while also achieving a variety of test targets and features, such as SGL, ZNS, etc. PyNVMe3 can also perform other actions in scripts while an ioworker is running, such as power cycles, resets, and other commands. This allows the script to enable more complex test scenarios. Each ioworker creates a separate child process to send and receive IO, with its own qpair, which is deleted after the ioworker completes.
In order to make better use of IOWorker, we will introduce the parameters of IOWorker one by one with demonstrated examples. The test script does not need to define every parameter, because the default values are reasonable most of the time. IOWorker has many parameters, so scripts have to use keyword arguments to define each parameter.
1.1 Parameters
io_size
io_size defines the size of each IO in LBA units; the default is 8 LBAs. It can be a fixed size or a list of multiple sizes. If the proportion of the sizes is not evenly distributed, we can specify the percentage of each size in dictionary form.
The script below demonstrates 3 cases: 4K random read; 4K/8K/128K uniformly mixed random read; and 50% 4K, 25% 8K, 25% 128K random read.
def test_ioworker_io_size(nvme0n1):
nvme0n1.ioworker(io_size=8,
time=5).start().close()
nvme0n1.ioworker(io_size=[8, 16, 256],
time=5).start().close()
nvme0n1.ioworker(io_size={8:50, 16:25, 256:25},
time=5).start().close()
lba_align
lba_align sets the alignment (in LBA units) of the starting LBA for every I/O that ioworker issues. The starting LBA (slba) of each command will be a multiple of lba_align. Default: 1 (no additional alignment constraint).
def test_ioworker_lba_align(nvme0n1):
# Issue 4KiB random reads; each IO starts on a 4KiB boundary.
nvme0n1.ioworker(io_size=8, # 8 LBAs = 4KiB
lba_align=8, # align starts to 4KiB boundaries
lba_random=True,
time=5).start().close()
time
time controls the running time of the ioworker in seconds. Below is an example of an ioworker running for 5 seconds.
def test_ioworker_time(nvme0n1):
nvme0n1.ioworker(io_size=8,
time=5).start().close()
io_count
io_count specifies the number of IOs to send in the ioworker. The default value is 0, which means unlimited. Either io_count or time has to be specified. When both are specified, the ioworker ends when either limit is met. The following example demonstrates sending 10,000 IOs.
def test_ioworker_io_count(nvme0n1):
nvme0n1.ioworker(io_size=8,
io_count=10000).start().close()
lba_count
lba_count caps the total number of LBAs the ioworker processes before it stops. It’s more precise than io_count (which counts commands) and is especially handy when io_size mixes multiple sizes.
def test_ioworker_lba_count_mixed(nvme0n1):
gib = 1024 * 1024 * 1024
lbas_1gib = gib // nvme0n1.sector_size # convert 1 GiB to LBAs
nvme0n1.ioworker(
io_size={8: 60, 128: 40}, # 4 KiB & 64 KiB mix (if 1 LBA = 512 B)
lba_align=8, # align to 4 KiB boundaries
lba_random=True,
read_percentage=70, # 70% reads, 30% writes
lba_count=lbas_1gib, # stop after totaling 1 GiB of LBAs
time=600 # safety timeout; first condition wins
).start().close()
lba_random
lba_random specifies the percentage of random IO. The default is True, which means 100% random LBAs. The following example demonstrates sequential IO and 50%-random IO.
def test_ioworker_lba_random(nvme0n1):
nvme0n1.ioworker(lba_random=False,
time=5).start().close()
nvme0n1.ioworker(lba_random=50,
time=5).start().close()
lba_start
lba_start sets the starting LBA for the first command. Default: if region_start is provided, the first command starts at region_start; otherwise it starts at 0.
lba_step
lba_step can only be used with sequential IO, where the starting LBA of each IO is advanced by lba_step. Like io_size, lba_step is in LBA units. The following example demonstrates 4K sequential reads with a 4K gap between consecutive IOs.
def test_ioworker_lba_step(nvme0n1):
nvme0n1.ioworker(io_size=8,
lba_random=False,
lba_step=16,
time=5).start().close()
With lba_step, ioworker can also decrease the LBA address of IOs by setting it to a negative number. The following example demonstrates reading in reverse order, where ioworker sends read commands on LBA 10, 9, 8, 7, 6, 5, 4, 3, 2, 1.
def test_ioworker_lba_step_negative(nvme0n1):
nvme0n1.ioworker(lba_random=False,
io_size=1,
lba_start=10,
lba_step=-1,
io_count=10).start().close()
When the lba_step is set to 0, ioworker can repeatedly read and write to the specified LBA.
def test_ioworker_lba_step_zero(nvme0n1):
nvme0n1.ioworker(lba_random=False,
lba_start=100,
lba_step=0,
time=5).start().close()
start_time
start_time specifies the earliest epoch timestamp at which the ioworker can begin issuing and completing I/O operations. Use a value generated by time.time() (optionally with an added offset) to introduce a pre-run idle period or synchronize the start of multiple ioworkers. By default, the ioworker starts sending I/O immediately.
def test_ioworker_start_time(nvme0n1):
import time
target = time.time() + 10
start_time = time.time()
wlist = []
for i in range(5):
w = nvme0n1.ioworker(io_size=8,
read_percentage=100,
time=10,
cpu_id=i,
start_time=target).start()
wlist.append(w)
time.sleep(1)
for w in wlist:
r = w.close()
read_percentage
read_percentage specifies the ratio of reads to writes: 0 means all writes, and 100 means all reads. The default is 100. The following is an example of 50% each.
def test_ioworker_read_percentage(nvme0n1):
nvme0n1.ioworker(read_percentage=50,
time=5).start().close()
op_percentage
op_percentage lets you specify an explicit mix of NVMe opcodes (not limited to read/write). Provide a dict of {opcode: percentage}. If both op_percentage and read_percentage are set, op_percentage takes precedence. Percentages may be decimals; the total must sum to exactly 100.00% (to two decimal places).
def test_ioworker_op_percentage_int(nvme0n1):
nvme0n1.ioworker(
op_percentage={2: 40, 1: 30, 9: 30}, # Read/Write/Deallocate
time=5
).start().close()
def test_ioworker_op_percentage_decimal(nvme0n1):
nvme0n1.ioworker(
op_percentage={2: 33.34, 1: 33.33, 9: 33.33}, # must sum to 100.00
lba_random=True,
io_size=8,
time=10
).start().close()
sgl_percentage
sgl_percentage specifies the percentage of IO using SGL. 0 means only PRP and 100 means only SGL. The default value is 0. The following example demonstrates setting the commands issued by ioworker to use 50% PRP and 50% SGL.
def test_ioworker_sgl_percentage(nvme0n1):
nvme0n1.ioworker(sgl_percentage=50,
time=5).start().close()
qdepth
qdepth specifies the queue depth of the Qpair object created by the ioworker. The default value is 63. Below is an example of an IO queue depth of 127 (Q’s size is 128) used in ioworker.
def test_ioworker_qdepth(nvme0n1):
nvme0n1.ioworker(qdepth=127,
time=5).start().close()
qprio
qprio specifies the priority of the SQ created by the ioworker. The default value is 0. This parameter is only valid when the arbitration mechanism is selected as weighted round robin with urgent priority (WRR).
def test_ioworker_qprio(nvme0n1):
nvme0n1.ioworker(qprio=0,
time=5).start().close()
region_start
IOWorker sends IO in the specified LBA region, from region_start to region_end. Below is an example of sending IO starting from LBA 0x10.
def test_ioworker_region_start(nvme0n1):
nvme0n1.ioworker(region_start=0x10,
lba_random=True,
time=5).start().close()
region_end
IOWorker sends IO in the specified LBA region, from region_start to region_end. region_end is not included in the region. Its default value is the max_lba of the drive. When sending IO with sequential LBAs, and neither time nor io_count is specified, ioworker sends IO from region_start to region_end in one pass. Below is an example of sending IO from LBA 0x10 to 0xff.
def test_ioworker_region_end(nvme0n1):
nvme0n1.ioworker(region_start=0x10,
region_end=0x100,
time=5).start().close()
nvme0n1.ioworker(region_start=[0x10, 0x1010],
region_end=[0x100, 0x1100], # two regions: 0x10-0x100, 0x1010-0x1100
time=5).start().close()
iops
In order to construct test scenarios under different pressures, the iops parameter in ioworker can specify the maximum IOPS. Then ioworker limits the speed at which IOs are sent. The default value is 0, which means unlimited. Below is an example that specifies an IOPS pressure of 12345 IO per second.
def test_ioworker_iops(nvme0n1):
nvme0n1.ioworker(iops=12345,
time=5).start().close()
io_flags
io_flags specifies the upper 16 bits of dword 12 of the IO commands issued in the ioworker. The default value is 0. The following is an example of sending write commands with the FUA bit enabled.
def test_ioworker_io_flags(nvme0n1):
nvme0n1.ioworker(io_flags=0x4000,
read_percentage=0,
time=5).start().close()
distribution
The distribution parameter divides the whole LBA space into 100 equal regions and distributes 10,000 shares of IO among them. The list indicates how many of the 10,000 IOs are allocated to each of the 100 regions. This parameter can be used to implement the JEDEC endurance workload as below: 1,000 IOs are allocated to each of the first five 1% regions, so the first 5% of the LBA space receives half of all IO; each of the fifteen 1% regions from 5% to 20% receives 200 IOs, so this 15% of the LBA space receives 30% of the IO; the remaining 80% of the space receives the remaining 20%.
def test_ioworker_jedec_workload(nvme0n1):
distribution = [1000]*5 + [200]*15 + [25]*80
iosz_distribution = {1: 4,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 67,
16: 10,
32: 7,
64: 3,
128: 3}
nvme0n1.ioworker(io_size=iosz_distribution,
lba_random=True,
qdepth=128,
distribution=distribution,
read_percentage=0,
ptype=0xbeef, pvalue=100,
time=1).start().close()
ptype and pvalue
The same as the ptype and pvalue of Buffer objects, ioworker can also specify the data pattern with these two parameters. The default pattern in ioworker is a fully random buffer that cannot be compressed.
def test_ioworker_pvalue(nvme0n1):
nvme0n1.ioworker(ptype=32,
pvalue=0x5a5a5a5a,
read_percentage=0,
time=5).start().close()
io_sequence
io_sequence specifies, for each IO sent by ioworker, the starting LBA, the number of LBAs, the opcode, and the send timestamp (in us). io_sequence is a list of command tuples, where each tuple is (slba, nlb, opcode, time_sent_us). The following is an example of sending read and write commands via ioworker. With this parameter, the script can send specified IOs at specified times through ioworker.
def test_ioworker_io_sequence(nvme0n1):
nvme0n1.ioworker(io_sequence=[(0, 1, 2, 0),
(0, 1, 1, 1000000)],
ptype=0, pvalue=0).start().close()
slow_latency
slow_latency is in units of microseconds (us). When an IO's latency is greater than this parameter, ioworker prints a debug message and throws a warning. The default is 1 second.
def test_ioworker_slow_latency(nvme0n1):
nvme0n1.ioworker(io_size=128,
slow_latency=2000_000,
time=5).start().close()
exit_on_error
When any IO command fails, the ioworker exits immediately by default. To keep the ioworker running after an IO command fails, set exit_on_error to False.
def test_ioworker_exit_on_error(nvme0n1):
nvme0n1.ioworker(exit_on_error=False,
time=5).start().close()
verify_disable
verify_disable turns off data-integrity checking for the affected operation(s), bypassing CRC/pattern validation to remove host-side overhead during pure performance runs. Default: False (verification is enabled if globally turned on).
def test_perf_unverified_ioworker(nvme0n1):
a = nvme0n1.ioworker(
io_size=8, # 4 KiB if 1 LBA = 512 B
lba_align=8,
lba_random=True,
qdepth=127,
verify_disable=True, # <-- disable verify for this worker
time=10
).start()
r = a.close()
logging.info(r)
cpu_id
The cpu_id parameter in ioworker is designed to distribute the workload across different CPU cores. To achieve optimal performance and latency, it is assumed that each ioworker utilizes 100% of a single CPU core’s resources. However, in practice, multiple ioworkers may sometimes be allocated to the same CPU core. When this happens, the combined performance of these ioworkers is nearly the same as that of a single ioworker. This outcome is not desirable when using multiple ioworkers, so the cpu_id parameter is used to enforce the allocation of different ioworkers to separate CPU cores.
For example, in a 4K random read performance test, one ioworker can achieve 1M IOPS, and two ioworkers can achieve 2M IOPS. However, if they are accidentally allocated to the same CPU core, the two ioworkers still only achieve 1M IOPS. By using the cpu_id parameter, we can avoid this situation and ensure that each ioworker is assigned to a different CPU core, thus achieving the expected performance increase.
def test_performance(nvme0, nvme0n1):
qcount = 1
iok = 4
qdepth = 128
random = True
readp = 100
iosize = iok*1024//nvme0n1.sector_size
l = []
for i in range(qcount):
a = nvme0n1.ioworker(io_size=iosize,
lba_align=iosize,
lba_random=random,
cpu_id=i+1,
qdepth=qdepth-1,
read_percentage=readp,
time=10).start()
l.append(a)
io_total = 0
for a in l:
r = a.close()
logging.debug(r)
io_total += (r.io_count_read+r.io_count_nonread)
logging.info("Q %d IOPS: %.3fK, %dMB/s" % (qcount, io_total/10000, io_total/10000*iok))
1.2 Output Parameters
output_io_per_second
Save the number of IOs completed in each second in the form of a list. Default value: None, no data is collected. Below is an example of collecting IOPS per second into an io_per_second list.
def test_ioworker_output_io_per_second(nvme0n1):
io_per_second = []
nvme0n1.ioworker(output_io_per_second=io_per_second,
time=5).start().close()
logging.info(io_per_second)
output_percentile_latency
IO latency is important on both Client and Enterprise SSDs. IOWorker uses the parameter output_percentile_latency to collect latency information for all IOs. ioworker can collect IO latency at different percentiles in the form of a dictionary, where each key is a percentile and the value is the latency in microseconds (us). Default value: None, no data is collected. The following example demonstrates collecting the latency of 99%, 99.9%, and 99.999% of IOs.
def test_ioworker_output_percentile_latency(nvme0n1):
percentile_latency = dict.fromkeys([99, 99.9, 99.999])
nvme0n1.ioworker(output_percentile_latency=percentile_latency,
time=5).start().close()
logging.info(percentile_latency)
After specifying this parameter, we can obtain the number of IOs at each latency time point from latency_distribution in the returned object. With these data, scripts can draw a distribution graph as below.
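As a sketch of such post-processing, the helper below (hypothetical, not part of PyNVMe3) derives a percentile from a latency_distribution list, assuming index i holds the number of IOs that completed with a latency of i microseconds; the sample data here is synthetic:

```python
def latency_percentile(latency_distribution, percentile):
    """Return the latency (us) below which `percentile`% of IOs fall."""
    total = sum(latency_distribution)
    target = total * percentile / 100.0
    accumulated = 0
    for us, count in enumerate(latency_distribution):
        accumulated += count
        if accumulated >= target:
            return us
    return len(latency_distribution) - 1

# synthetic distribution: 100 IOs at 50us, 900 IOs at 60us
dist = [0] * 1000
dist[50] = 100
dist[60] = 900
print(latency_percentile(dist, 99))  # 60: 99% of IOs complete within 60us
```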

output_percentile_latency_opcode
output_percentile_latency_opcode makes IOWorker track the latency of the specified opcode only. Default value: None, which tracks the latency of all opcodes. The following example demonstrates tracking only the latency of DSM commands in the returned output_percentile_latency.
def test_ioworker_output_percentile_latency_opcode(nvme0n1):
percentile_latency = dict.fromkeys([99, 99.9, 99.999])
nvme0n1.ioworker(op_percentage={2: 40, 9: 30, 1: 30},
output_percentile_latency=percentile_latency,
output_percentile_latency_opcode=9,
time=5).start().close()
logging.info(percentile_latency)
output_cmdlog_list
This parameter collects information about the latest commands that ioworker sent and reaped. The information for each command includes the starting LBA, the number of LBAs, the opcode, the send timestamp, the reap timestamp, and the return status. The commands are recorded as tuples (slba, nlb, opcode, time_sent_us, time_cplt_us, status). The default value is None, which does not collect data. The following script captures the latest 1,000 commands' information before the power-off happens.
def test_power_cycle_dirty(nvme0, nvme0n1, subsystem):
cmdlog_list = [None]*1000
# 128K random write
with nvme0n1.ioworker(io_size=256,
lba_align=256,
lba_random=True,
read_percentage=30,
slow_latency=2_000_000,
time=15,
qdepth=63,
output_cmdlog_list=cmdlog_list):
# sudden power loss before the ioworker end
time.sleep(5)
subsystem.poweroff()
# power on and reset controller
time.sleep(5)
start = time.time()
subsystem.poweron()
nvme0.reset()
logging.info(cmdlog_list)
cmdlog_error_only
When cmdlog_error_only is True, ioworker only collects the information of error commands into output_cmdlog_list.
1.3 Return Values
IOWorker.close() returns after all IOs are sent and reaped by the ioworker, and it gives a structure that includes these fields:
- io_count_read: Counts the read commands executed by the ioworker.
- io_count_nonread: Tally of non-read commands executed by the ioworker.
- mseconds: Duration of the ioworker operation in milliseconds.
- latency_max_us: Maximum command latency, measured in microseconds.
- error: Error code recorded in case of an IO error.
- error_cmd: Submission Queue Entry of the command causing an IO error.
- error_cpl: Completion Queue Entry of the command causing an IO error.
- cpu_usage: CPU usage during the test to assess host CPU load.
- latency_average_us: Average latency of all IO commands, in microseconds.
- latency_distribution: Latency count at each time point up to 1,000,000 microseconds.
- io_count_write: Number of write commands executed by the ioworker.
- lba_count_read: Number of LBAs read during the operation.
- lba_count_nonread: Count of LBAs processed in non-read operations.
- lba_count_write: Total LBAs written by the ioworker.
Here’s a normal ioworker returned object:
'io_count_read': 10266880,
'io_count_nonread': 0,
'mseconds': 10001,
'latency_max_us': 296,
'error': 0,
'error_cmd': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'error_cpl': [0, 0, 0, 0],
'cpu_usage': 0.8465153484651535,
'latency_average_us': 60,
'test_start_sec': 1669363788,
'test_start_nsec': 314956299,
'latency_distribution': None,
'io_count_write': 0
And this is the return object with error:
'io_count_read': 163,
'io_count_nonread': 0,
'mseconds': 2,
'latency_max_us': 791,
'error': 641,
'error_cmd': [3735554, 1, 0, 0, 0, 0, 400643072, 4, 0, 0, 5, 0, 0, 0, 0, 0],
'error_cpl': [0, 0, 131121, 3305242681],
'cpu_usage': 0.0,
'latency_average_us': 304,
'test_start_sec': 1669364412,
'test_start_nsec': 822867651,
'latency_distribution': None,
'io_count_write': 0
Here are the ioworker error codes and descriptions:
| Error Code | Error Name | Description |
|---|---|---|
| 0 | Success | Indicates that no error occurred during the operation. |
| -1 | Init Fail in Pyx | Initialization failed in the underlying Pyx library or hardware abstraction. |
| -2 | IO Size is Larger than MDTS | The I/O request size exceeds the Maximum Data Transfer Size (MDTS) limit. |
| -3 | IO Timeout | The I/O operation did not complete within the expected time limit. |
| -4 | IOWorker Timeout | The ioworker process or thread exceeded its execution time limit. |
| -5 | Buffer Pool Alloc Fail | Failed to allocate memory from the buffer pool for the I/O operation. |
| -6 | IO Cmd Error | An error occurred during the execution of an NVMe I/O command. |
| -7 | Sudden Terminated | The ioworker was unexpectedly terminated before completing its operation. |
| -8 | Slow IO | The I/O operation completed but took significantly longer than expected. |
| -9 | Create Qpair Fail | Failed to create a queue pair (Submission Queue or Completion Queue). |
| -10 | Illegal Sector Size | The sector size specified for the operation is invalid or unsupported. |
1.4 Examples
- 4K Full Disk Sequential Reading:
def test_ioworker_full_disk(nvme0n1):
    ns_size = nvme0n1.id_data(7, 0)
    nvme0n1.ioworker(lba_random=False,
                     io_size=8,
                     read_percentage=100,
                     region_end=ns_size).start().close()
- 4K full disk random write:
def test_ioworker_qpair(nvme0n1):
    nvme0n1.ioworker(lba_random=True,
                     io_size=8,
                     read_percentage=0,
                     time=3600).start().close()
- Inject reset event during the IO:
def test_reset_controller_reset_ioworker(nvme0, nvme0n1):
    # issue controller reset while ioworker is running
    with nvme0n1.ioworker(io_size=8, time=10):
        time.sleep(5)
        nvme0.reset()
1.5 Compare with FIO
Both FIO and IOWorker provide many parameters. This table can help us to port FIO tests to PyNVMe3/IOWorker.
| fio parameter | ioworker parameter | Description |
|---|---|---|
| bs | io_size | Sets the block size for I/O operations. ioworker can provide a single block size or specify multiple sizes through a list or dict. |
| ba | lba_align | The LBA alignment for I/O. By default, fio aligns with the bs value, whereas ioworker defaults to 1. |
| rwmixread | read_percentage | Specifies the percentage allocation of the read-write mix. |
| percentage_random | lba_random | Defines the percentage of random versus sequential operations. |
| runtime | time | Defines the duration of the test run. |
| size | region_start, region_end | Defines the size or range of the test area. PyNVMe3 can define the start and end LBA addresses of a single continuous region or specify multiple discrete regions through a list parameter. |
| iodepth | qdepth | Sets the queue depth. |
| buffer_pattern | ptype, pvalue | Sets the data pattern for the I/O buffer. ioworker fills the data buffer with the specified pattern upon initialization. |
| rate_iops | iops | Limits the number of I/O operations per second. |
| verify | | fio uses verify to check data integrity. PyNVMe3, by default, verifies the data consistency of each LBA with CRC after each read operation. |
| ioengine | | fio typically selects libaio, while ioworker directly utilizes the higher-performance SPDK driver. |
| norandommap | | fio uses norandommap for entirely random reads and writes. PyNVMe3 behaves completely randomly and uses LBA locks to ensure mutual exclusion of asynchronous I/Os on LBAs, allowing data consistency checks to be performed in most cases. |
| | lba_step | Specifies the step increment for sequential read/write LBAs. Normally, sequential reads/writes cover all LBAs continuously, without gaps. However, lba_step can produce sequences with LBA gaps or overlaps. Specifying a negative lba_step can generate sequences with decreasing starting LBA addresses. |
| | op_percentage | While fio supports only three operations (read, write, trim), ioworker can specify any type of I/O command and its percentage by opcode. |
| | sgl_percentage | ioworker can use PRP or SGL to represent the address range of data buffers. This parameter specifies the percentage of I/Os using SGL. |
| | io_flags | Specifies the high 16 bits of command dword 12 for all I/Os, including flags like FUA. |
| | qprio | Specifies the queue priority to implement scenarios with Weighted Round Robin arbitration. |
1.6 Performance
When the cpu_usage in an ioworker's return value is close to or exceeds 0.9, it indicates that the performance bottleneck is on the host side. To achieve higher performance in this scenario, it's recommended to employ additional ioworkers. By specifying different cpu_id parameters, different ioworkers can be distributed across various CPUs. Here's an example illustrating this approach:
l = []
for i in range(qcount):
a = nvme0n1.ioworker(io_size=iosize,
lba_align=iosize,
region_end=region_end,
lba_random=random,
cpu_id=i+1,
qdepth=qdepth-1,
read_percentage=readp,
time=10).start()
l.append(a)
io_total = 0
for a in l:
r = a.close()
logging.info(r)
io_total += (r.io_count_read + r.io_count_nonread)
logging.info("Q %d IOPS: %.3fK, %dMB/s" % (qcount, io_total/10000, io_total/10000 * iok))
In this example, multiple ioworkers are created within a loop, each with a unique cpu_id. This ensures that each ioworker operates on a different CPU, thereby distributing the workload and potentially enhancing overall performance. After starting all ioworkers, their results are aggregated to calculate total I/O operations and performance metrics, such as IOPS and throughput, are logged.
It is often sufficient to use a single ioworker to achieve adequate performance. However, when testing the random read performance of PCIe Gen5 NVMe drives, it is advisable to utilize 2-4 ioworkers. The need for additional ioworkers increases further if the verify feature is enabled during read operations.
2. metamode IO
PyNVMe3 provides high-performance NVMe drivers for SSD product testing. But high performance can also lead to a lack of flexibility to test every detail defined in the NVMe specification. In order to cover more details, PyNVMe3 provides metamode to send and receive IO.
Through metamode, the script can directly create IOSQ/IOCQ on system buffer, write SQE into IOSQ, and read CQE from IOCQ.
Metamode requires script development engineers to have a certain understanding of the NVMe specification. But on the other hand, metamode also helps engineers better understand the command processing flow of NVMe.
The following simple example shows how to write test scripts using metamode following the command processing flow defined by the NVMe specification.

Step1: Host writes command to SQ Entry
cmd_read = SQE(2, 1)
cmd_read.cid = 0
buf = PRP(4096)
cmd_read.prp1 = buf
sq[0] = cmd_read
Step2: The host updates the SQ tail doorbell register. Notify SSD that a new command is pending.
sq.tail = 1
Step3: DUT get SQE from IOSQ
Step4: DUT processing SQE
Step5: DUT writes CQE to IOCQ
Step6: DUT sends interrupt (optional)
step3-6 are all handled by the DUT.
Step7: Host processes Completion entry
cq.wait_pbit(cq.head, 1)
Step8: The host writes the CQ head doorbell to release the completion entry
cq.head = 1
The complete test script is as follows:
def test_metamode_read_command(nvme0):
# Create an IOCQ and IOSQ
cq = IOCQ(nvme0, 1, 10, PRP(10*16))
sq = IOSQ(nvme0, 1, 10, PRP(10*64), cq=cq)
# Step1: Host writes command to SQ Entry
cmd_read = SQE(2, 1)
cmd_read.cid = 0
buf = PRP(4096)
cmd_read.prp1 = buf
sq[0] = cmd_read
# Step2: The host updates the SQ tail doorbell register. Notify SSD that a new command is pending.
sq.tail = 1
# Step7: Host processes Completion entry
cq.wait_pbit(cq.head, 1)
# Step8: The host writes the CQ head doorbell to release the completion entry
cq.head = 1
# print first CQE's status field
logging.info(cq[0].status)
The above script is a little bit more complex, but you can control every detail of the test. metamode also encapsulates a read/write interface to facilitate scripts to send common IOs in metamode. The following test script re-implements the same IO process as the example above.
def test_metamode_example(nvme0):
# Create an IOCQ and IOSQ
cq = IOCQ(nvme0, 1, 10, PRP(10*16))
sq = IOSQ(nvme0, 1, 10, PRP(10*64), cq=cq)
# Step1: Host writes command to SQ Entry
sq.read(cid=0, nsid=1, lba=0, lba_count=1, prp1=PRP(4096))
# Step2: The host updates the SQ tail doorbell register. Notify SSD that a new command is pending.
sq.tail = 1
# Step7: Host processes Completion entry
cq.waitdone(1)
# Step8: The host writes the CQ head doorbell to release the completion entry
cq.head = 1
# print first CQE's status field
logging.info(cq[0].status)
In addition to defining the host command processing flow, metamode can also configure any parameter in the IO command. Here are a few examples of scenarios to help you better understand metamode's capability.
2.1 Customized PRP List
def test_prp_valid_offset_in_prplist(nvme0):
# Create an IOCQ and IOSQ
cq = IOCQ(nvme0, 1, 10, PRP(10*16))
sq = IOSQ(nvme0, 1, 10, PRP(10*64), cq=cq)
# Construct PRP1, set offset to 0x10
buf = PRP(ptype=32, pvalue=0xffffffff)
buf.offset = 0x10
buf.size -= 0x10
# Construct PRP list, and set offset to 0x20
prp_list = PRPList()
prp_list.offset = 0x20
prp_list.size -= 0x20
# Fill 8 PRP entries into the PRP list
for i in range(8):
prp_list[i] = PRP(ptype=32, pvalue=0xffffffff)
# Construct a read command with the above PRP and PRP list
cmd = SQE(2, 1)
cmd.prp1 = buf
cmd.prp2 = prp_list
# Set the cdw12 of the command to 1
cmd[12] = 1
# Write command to SQ Entry, update SQ tail doorbell
sq[0] = cmd
sq.tail = 1
# Wait for the CQ pbit to flip
cq.wait_pbit(0, 1)
# Updated CQ head doorbell
cq.head = 1
2.2 Asymmetric SQ and CQ
def test_multi_sq_and_single_cq(nvme0):
# Create 3 IOSQ, and mapping to a single IOCQ
cq = IOCQ(nvme0, 1, 10, PRP(10*16))
sq1 = IOSQ(nvme0, 1, 10, PRP(10*64), cq=cq)
sq2 = IOSQ(nvme0, 2, 10, PRP(10*64), cq=cq)
sq3 = IOSQ(nvme0, 3, 10, PRP(10*64), cq=cq)
# Construct a write command
cmd_write = SQE(1, 1)
cmd_write.cid = 1
cmd_write[12] = 1<<30
# Set buffer of the write command
buf2 = PRP(4096)
buf2[10:21] = b'hello world'
cmd_write.prp1 = buf2
# Construct a read command
cmd_read1 = SQE(2, 1)
buf1 = PRP(4096)
cmd_read1.prp1 = buf1
cmd_read1.cid = 2
# Construct another read command
cmd_read2 = SQE(2, 1)
buf3 = PRP(4096)
cmd_read2.prp1 = buf3
cmd_read2.cid = 3
# place the commands into the SQs
sq1[0] = cmd_write
sq2[0] = cmd_read1
sq3[0] = cmd_read2
# Update SQ1 Tail doorbell to 1
sq1.tail = 1
# Wait for the Phase Tag of the head of CQE to be 1
cq.wait_pbit(cq.head, 1)
# Update CQ Head doorbell to 1
cq.head = 1
# Update SQ2 Tail doorbell to 1, and wait for the command completion
sq2.tail = 1
cq.wait_pbit(cq.head, 1)
# Update CQ Head doorbell to 2
cq.head = 2
# Update SQ3 Tail doorbell to 1
sq3.tail = 1
cq.wait_pbit(cq.head, 1)
# update CQ Head doorbell to 3
cq.head = 3
# Get the command represented by the first entry in CQ to complete the whole and command id
logging.info(cq[0].status)
logging.info(cq[0].cid)
2.3 Inject conflict cid error
Normally, the NVMe device driver assigns a CID to each command, so the CID is always correct, and test scripts have no way to inject errors. With the metamode provided by PyNVMe3, scripts can specify the CID of each command. So, we can deliberately send multiple commands with the same CID to examine the DUT's error handling.
def test_same_cid(nvme0):
# Create IOCQ/IOSQ with the depth of 10
cq = IOCQ(nvme0, 1, 10, PRP(10*16))
sq = IOSQ(nvme0, 1, 10, PRP(10*64), cq=cq)
# Write two commands, and both cid are 1
cmd_read1 = SQE(2, 1)
buf1 = PRP(4096)
cmd_read1.prp1 = buf1
cmd_read1.cid = 1
cmd_read2 = SQE(2, 1)
buf2 = PRP(4096)
cmd_read2.prp1 = buf2
cmd_read2.cid = 1
# fill two SQE to SQ
sq[0] = cmd_read1
sq[1] = cmd_read2
# Updated SQ tail doorbell
sq.tail = 2
# Wait for the Phase Tag in entry 1 in CQ to be 1
cq.wait_pbit(1, 1)
# Update CQ Head doorbell to 2
cq.head = 2
# Get the command completion status and command id indicated by the second entry in CQ
logging.info(cq[1].status)
logging.info(cq[1].cid)
2.4 Inject invalid doorbell errors
def test_aer_doorbell_out_of_range(nvme0, buf):
# Send an AER command
nvme0.aer()
# Create a pair of CQ and SQ with a queue depth of 16
cq = IOCQ(nvme0, 4, 16, PRP(16*16))
sq = IOSQ(nvme0, 4, 16, PRP(16*64), cq.id)
# Update the SQ tail to 20, which exceeds the SQ depth
with pytest.warns(UserWarning, match="AER notification is triggered: 0x10100"):
sq.tail = 20
time.sleep(0.1)
nvme0.getfeatures(7).waitdone()
# Send get logpage command to clear asynchronous events
nvme0.getlogpage(1, buf, 512).waitdone()
#Delete SQ and CQ
sq.delete()
cq.delete()
By sending IO through metamode, scripts can directly read and write various metadata structures defined by the NVMe protocol, including IOSQ, IOCQ, SQE, CQE, doorbells, PRP, PRPList, and various SGLs. The script creates and accesses the shared memory with the NVMe DUT directly, without any restrictions from the OS or driver.

