PyNVMe3 API

Last Modified: September 5, 2025

Copyright © 2020-2025 GENG YUN Technology Pte. Ltd.
All Rights Reserved.


0. Overview

This chapter documents the core classes and APIs of PyNVMe3, the Python-based NVMe test framework. Users will find step-by-step explanations and code examples for building reliable test cases. AI agents can use the structured descriptions and idiomatic Python snippets to generate correct and efficient scripts automatically.

PyNVMe3 closely follows the NVMe specification, while exposing the hardware through intuitive Python objects. Each object corresponds to a key part of the NVMe/PCIe architecture:

  • Buffer – contiguous DMA memory for PRPs, SGLs, queues, and user data.
  • Pcie – low-level access to PCIe configuration, BAR registers, resets, and link states.
  • Controller – the NVMe controller abstraction, supporting initialization, admin commands, and advanced features.
  • Namespace – encapsulates namespaces, I/O commands, and data verification.
  • Qpair – combined I/O submission and completion queue, essential for data path testing.
  • Subsystem – optional wrapper for power and reset control, allowing integration with external power controllers such as Quarch PAM or custom hardware.

Throughout this chapter, you will find:

  • Detailed explanations of each class and method, linked back to the NVMe specification.
  • Executable code snippets showing how to use the APIs in realistic tests.
  • Best practices for test robustness, such as using fixtures for cleanup, enabling data verification, and handling resets.

By mastering these APIs, you can write scripts that are concise, reproducible, and specification-compliant, whether your goal is functional validation, stress testing, or automated compliance checks.

1. Buffer

The Buffer class is one of the most fundamental building blocks of PyNVMe3. It provides a safe and convenient interface for allocating and managing physically contiguous memory that can be directly used in DMA (Direct Memory Access) transactions between the host and the NVMe device.

Because the NVMe and PCIe protocols are based on a shared-memory architecture, nearly every test script needs to allocate buffers for user data, I/O queues, PRP/SGL tables, PRP lists, log pages, or feature structures. The Buffer class encapsulates these shared memory regions and exposes a Pythonic API that makes it easy to manipulate memory contents while remaining fully consistent with NVMe specification requirements.

1.1 Construction

Buffer(
    length: int,
    name: str = 'buffer',
    pvalue: int = 0,
    ptype: int = 0,
    offset: int = 0,
    nvme: Controller = None,
)

Parameters:

  • length (int)
    The total size of the buffer in bytes. The memory is allocated from a physically contiguous and page-aligned region, making it DMA-safe.
  • name (str)
    Logical name of the buffer, useful for identifying it in logs and debug dumps. Default: "buffer".
  • pvalue (int)
    Initial value for the data pattern. Its meaning depends on ptype. Default: 0.
  • ptype (int)
    Data pattern type used for initialization. See 1.2 Data Patterns for supported values.
  • offset (int)
    Byte offset into the buffer at which the effective region begins, relative to the buffer’s base physical address. A nonzero offset allows test scripts to simulate unaligned or partial buffer usage. Default: 0.
  • nvme (Controller)
    If specified, the buffer is allocated in the Controller Memory Buffer (CMB) instead of host DRAM. This is useful for testing controller-managed memory regions. Default: None.

Buffer objects allocate memory from the reserved huge-page pool. You must run make setup beforehand to prepare these huge pages. If memory allocation fails (for example, due to insufficient huge pages), the Buffer constructor raises an exception immediately.

This mechanism ensures buffers always reside in contiguous physical memory and are not subject to OS paging or swapping.

1.2 Data Patterns

By default, a buffer is initialized with zeros. Using data patterns, you can easily prefill buffers with deterministic or pseudo-random contents for reproducible test results.

ptype    Interpretation of pvalue
0        Constant fill: 0 → all zeros, 1 → all ones
32       Repeating 32-bit value
0xbeef   Random data with compression ratio (0 = all zeros, 100 = fully random)
0xf17e   File path (string); the buffer is filled from the specified file
0x1234   Incrementing 16-bit sequence starting from pvalue
0x4321   Decrementing 16-bit sequence starting from pvalue

Example: Using data patterns

def test_write_buffer(nvme0n1, qpair):
    write_buf = Buffer(512, ptype=0xbeef, pvalue=100)
    nvme0n1.write(qpair, write_buf, 0, 1).waitdone()

Here, a 512-byte buffer is allocated and filled with fully random data (compression ratio 100). The buffer is then written to namespace nvme0n1 at LBA 0.

⚠️ Note: The buffer must be at least as large as the data to transfer. Otherwise, PyNVMe3 will reject the request and raise an error to prevent invalid DMA access.

In addition to initialization, a buffer’s contents can be overwritten at runtime with a new pattern:

# Refill with a repeating 32-bit pattern
write_buf.fill_pattern(pvalue=0x55, ptype=32)
nvme0n1.write(qpair, write_buf, 0, 1).waitdone()

1.3 Memory Layout

The Buffer class exposes three closely related properties: length, offset, and size. Together, these define how the buffer is presented to the device.

 0             offset                             length
 |===============|===================================|
                 |<========= size =========>|
  • length
    Total allocated physical memory size in bytes. Fixed at creation time.
  • offset
    Starting point within the buffer (relative to its base physical address) that is exposed to the device. Adjusting the offset allows simulation of non-aligned PRPs or SGLs.
  • size
    Effective data region length, starting at offset. This is the portion of the buffer that the device is expected to access.

PyNVMe3 always generates PRP or SGL entries based on the (offset, size) pair, not the entire length. By default, offset = 0 and size = length.

⚠️ Important: offset + size must not exceed length. Otherwise, the device would attempt to access memory beyond the allocated region, which may cause test failures or memory corruption.

Example: Forcing a non-aligned PRP mapping

def test_prp_page_offset(nvme0n1, qpair):
    # Allocate 515 bytes (512 + 3)
    buf = Buffer(512 + 3)

    # Expose only the last 512 bytes, starting at offset 3
    buf.offset = 3
    buf.size = 512

    # Submit a read command into this adjusted buffer
    nvme0n1.read(qpair, buf, 0, 1).waitdone()

In this case, the PRP entry points to an address offset by 3 bytes from a page boundary. This technique is often used to stress-test device PRP handling and ensure full compliance with the NVMe specification.

1.4 Helper Functions

The Buffer class provides helper functions to construct data structures required by NVMe commands:

  • Dataset Management (DSM)
    dsm_buf = Buffer(4096)
    dsm_buf.set_dsm_range(0, 0, 1)  # LBA 0, 1 block
    dsm_buf.set_dsm_range(1, 2, 1)  # LBA 2, 1 block
    nvme0n1.dsm(qpair, dsm_buf, 2).waitdone()
    
  • Copy Command
    copy_buf = Buffer(4096)
    copy_buf.set_copy_range(0, lba=0, lba_count=8)
    nvme0n1.copy(qpair, copy_buf, 1).waitdone()
    
  • Controller List
    ctrl_buf = Buffer(4096)
    ctrl_buf.set_controller_list(1, 2, 3)
    

1.5 Host Memory Buffer (HMB)

A buffer can be registered as HMB for use by the device. Once registered, the controller may access this host memory directly for internal caching or metadata storage.

When using HMB in PyNVMe3, the script must hold references to all HMB buffers for as long as the device relies on them. If the Python garbage collector reclaims an HMB buffer, the physical memory may be reused for other purposes, corrupting the HMB region from the device’s perspective.

PyNVMe3 provides dedicated utilities (scripts/hmb.py) and conformance tests (scripts/conformance/03_features/hmb) to validate HMB functionality, including allocation, registration, data integrity checks, and robustness testing under stress.

1.6 Controller Memory Buffer (CMB)

When the nvme parameter is specified during construction, the buffer is allocated from the device’s Controller Memory Buffer (CMB). This memory is exposed through the PCIe BAR and can be used for queues, PRP/SGL entries, or user data.

def test_cmb_io_queue(nvme0):
    cmb_buf = Buffer(64 * 10, nvme=nvme0)
    cq = IOCQ(nvme0, 1, 10, PRP(16 * 10))
    sq = IOSQ(nvme0, 1, 10, cmb_buf, cq=cq)

1.7 Data Inspection and Manipulation

  • Indexing and slicing
    buf = Buffer(16)
    buf[0] = 0x5a
    buf[1:4] = [1, 2, 3]
    
  • Dumping contents
    logging.info(buf.dump(16))  # Hex dump of first 16 bytes
    
  • Extracting structured fields
    Parse values directly from buffer fields, following NVMe specification offsets.

    id_buf = Buffer(4096)
    nvme0.identify(id_buf).waitdone()
    vid = id_buf.data(1, 0)                   # PCI Vendor ID
    model = id_buf.data(63, 24, type=str)    # Model number string
    
  • crc8()
    Returns an 8-bit checksum of the buffer contents.

    checksum = buf.crc8()
    
  • Direct comparison
    Buffers can be compared with equality operators. Equality means byte-for-byte identical contents.

    buf1 = Buffer(512, ptype=0x1234, pvalue=1)
    buf2 = Buffer(512, ptype=0x1234, pvalue=1)
    assert buf1 == buf2
    

1.8 Physical Addressing

The phys_addr property exposes the physical DMA address of the buffer starting at its current offset. By adjusting offset, a single buffer can represent different effective DMA regions, allowing flexible PRP/SGL construction.

buf = Buffer(520)
buf.offset = 8
buf.size = 512
logging.info(hex(buf.phys_addr))

1.9 Summary

The Buffer class is central to PyNVMe3 because it bridges Python test scripts with the shared-memory architecture of NVMe. By providing guaranteed contiguous memory backed by huge pages, flexible control over effective regions, built-in helpers for command data structures, and support for advanced features like HMB and CMB, it allows test developers to model nearly every memory interaction defined in the NVMe specification. In practice, most PyNVMe3 tests—from simple read/write operations to complex SGL chains and conformance checks—are built on buffers. Mastering Buffer therefore equips users to manage shared memory reliably and to interact directly with NVMe SSDs, including in advanced modes such as metamode.

2. Pcie

The NVMe protocol is layered on top of PCI Express (PCIe), so every NVMe device under test (DUT) is also a PCIe endpoint. The Pcie class in PyNVMe3 provides direct access to this PCIe layer, enabling test scripts to configure the device at a low level, perform resets, control link parameters, and inspect registers.

Whereas the Controller class focuses on NVMe-specific features, the Pcie class exposes the raw PCIe interface. This separation allows scripts to perform initialization, fault injection, and compliance tests that would not be possible if relying only on operating system drivers.

2.1 Construction

pcie = Pcie(
    addr: str,               # BDF address of PCIe device
    vf: int = 0,             # SR-IOV virtual function (default: 0)
    msi_only: bool = False   # Use MSI instead of MSI-X
)

This constructor binds PyNVMe3 directly to a physical PCIe device, a virtual function, or a virtualized PCIe endpoint. The msi_only flag is useful when testing environments that do not support MSI-X.

2.2 Configuration Space Access

The Pcie object supports reading and writing PCIe configuration space using either subscript-style access or dedicated APIs at different granularities:

  • cfg_byte_read/write(addr)
  • cfg_word_read/write(addr)
  • cfg_dword_read/write(addr)

# Read Vendor ID from config space (offset 0)
vid = pcie.cfg_word_read(0x0)

# Enable Bus Mastering by updating the Command register
cmd = pcie.cfg_word_read(0x4)
pcie.cfg_word_write(0x4, cmd | 0x4)

This enables full control of PCIe configuration registers, which is essential for tests involving enumeration, error injection, or capability discovery.

2.3 Capability Discovery

The cap_offset(cap_id, extend=False) method locates the offset of a specific PCIe capability within configuration space:

exp_cap = pcie.cap_offset(0x10)  # PCI Express Capability

This is commonly used to find advanced capabilities such as MSI-X, power management, or link features defined by the PCIe specification.

2.4 BAR Space Access

The Pcie class provides APIs to access the DUT’s BAR0 memory-mapped registers directly:

  • mem_byte_read/write(addr)
  • mem_word_read/write(addr)
  • mem_dword_read/write(addr)
  • mem_qword_read/write(addr)

def test_bar0_access(pcie, nvme0):
    cap = pcie.mem_qword_read(0)   # CAP register at BAR0 offset 0
    assert cap == nvme0.cap

This allows scripts to validate BAR mappings and check register contents against the NVMe controller abstraction.

2.5 Reset Operations

PyNVMe3 supports several types of PCIe resets:

  • Hot Reset: Pcie.reset()
  • Function Level Reset (FLR): Pcie.flr()
  • Link Disable/Enable: Pcie.link_disable_enable()

After any of these resets, the controller must be reinitialized with Controller.reset() before further NVMe operations can proceed.

Example: FLR reset during workload

def test_pcie_flr_with_io(pcie, nvme0, nvme0n1, qpair):
    # Launch background I/O workload
    nvme0n1.ioworker(io_size=8, lba_random=True,
                     read_percentage=50, time=10).start()
    
    # Perform FLR reset while I/O is active
    pcie.flr()
    nvme0.reset()

    # Resume workload after reset
    nvme0n1.ioworker(io_size=8, lba_random=True,
                     read_percentage=100, time=5).start().close()

This test stresses the DUT’s ability to recover from a reset while handling active traffic, verifying compliance with reset recovery requirements.

2.6 Attributes

The Pcie object exposes attributes that can be read or modified directly:

  • speed – The current PCIe link speed (Gen1 through Gen5).
  • aspm – The Active State Power Management policy in effect.
  • power_state – The PCIe device power state (e.g., D0, D3hot).

These attributes allow validation of link training, low-power entry/exit behavior, and device compliance with power management requirements.

2.7 Resource Management

  • close() – Releases the PCIe object and frees resources.

It is critical to close the Pcie object at the end of every test session. If the object is not released, memory mappings and PCIe resources may remain allocated, potentially interfering with subsequent tests or leaving the DUT in an undefined state.

Because tests may terminate unexpectedly (for example, when an assert fails), relying on manual cleanup is risky. A more robust approach is to manage the Pcie lifecycle through pytest fixtures, which guarantee both setup and teardown.

Example: Using pytest fixture for safe setup/teardown

import pytest
from nvme import Pcie

@pytest.fixture(scope="function")
def pcie():
    dev = Pcie("0000:01:00.0")
    yield dev
    dev.close()

def test_flr_reset(pcie, nvme0):
    pcie.reset()
    nvme0.reset()

Here, the fixture ensures the Pcie object is always released, regardless of whether the test passes, fails, or raises an exception. This prevents resource leaks and keeps the test environment clean across the entire test suite.

2.8 Summary

Beyond standard NVMe testing, the Pcie class offers unique advantages that are especially valuable in controller development and early bring-up. Because it operates directly at the PCIe level, tests can be executed even when the NVMe controller firmware or initialization sequence is incomplete. This makes it possible to validate link training, configuration space access, resets, and BAR mappings at the very earliest stages of silicon or firmware development.

Moreover, the Pcie class is not limited to NVMe SSDs. Any PCIe-compliant device can be exercised through this interface. Internally, PyNVMe3 leverages the same abstraction to support non-NVMe drivers such as our VDM (Vendor Defined Message) driver and DoE (Data Object Exchange) driver. By building these features on top of the Pcie object, PyNVMe3 avoids the complexity of kernel-level implementations while providing a flexible and scriptable testing environment.

In short, the Pcie class extends PyNVMe3 beyond NVMe-specific testing, enabling both early-stage controller validation and support for custom PCIe-based protocols.

3. Controller

The Controller class represents the NVMe controller as defined in the NVMe specification. In PyNVMe3, a Controller object is always created on top of a Pcie object. Since the NVMe specification distinguishes clearly between Controller and Namespace, PyNVMe3 provides corresponding Controller and Namespace classes to mirror this architecture.

Multiple Controller objects can coexist in the same test environment. This flexibility makes it possible to validate advanced scenarios such as multi-device systems, multi-port configurations, or SR-IOV virtualization. This section introduces the Controller class in detail; Namespace is covered separately.

3.1 Initialization

The simplest way to create a controller is from a Pcie object:

def test_default_nvme_init(pcie):
    nvme0 = Controller(pcie)
    nvme0.getfeatures(7).waitdone()

In most environments, PyNVMe3 also provides convenient fixtures:

def test_fixture_nvme_init(nvme0):
    nvme0.getfeatures(7).waitdone()

By default, PyNVMe3 performs the standard NVMe initialization sequence as defined in the specification. This includes:

  1. Disabling the controller (CC.EN = 0).
  2. Waiting until CSTS.RDY = 0.
  3. Setting up admin queue registers.
  4. Writing to the CC register.
  5. Enabling the controller (CC.EN = 1).
  6. Waiting until CSTS.RDY = 1.
  7. Issuing Identify commands for the controller and namespaces.
  8. Initializing I/O queues.
  9. Submitting AER commands up to the controller’s advertised limit.

For specialized tests, this process can be overridden by providing a custom initialization function:

def nvme_init_user_defined(nvme0):
    # 1. Disable controller
    nvme0[0x14] = 0
    nvme0.wait_csts(rdy=False)

    # 2. Initialize admin queue
    if nvme0.init_adminq() < 0:
        raise NvmeEnumerateError("failed to initialize admin queue")

    # 3. Enable controller
    nvme0[0x14] = 0x00460001
    nvme0.wait_csts(rdy=True)

    # 4. Identify controller and namespaces
    id_buf = Buffer(4096)
    nvme0.identify(id_buf).waitdone()
    if nvme0.init_ns() < 0:
        raise NvmeEnumerateError("namespace initialization failed")

    # 5. Configure queues
    nvme0.setfeatures(7, cdw11=0xfffefffe).waitdone()
    cdw0 = nvme0.getfeatures(7).waitdone()
    nvme0.init_queues(cdw0)

    # 6. Register AERs
    aerl = nvme0.id_data(259) + 1
    for _ in range(aerl):
        nvme0.aer()

def test_custom_init(pcie):
    nvme0 = Controller(pcie, nvme_init_func=nvme_init_user_defined)

⚠️ Note:

  • Use the provided template when writing custom initialization.
  • Always maintain the correct order of operations.
  • Avoid changing timeout values inside initialization, as this can destabilize the test flow.

3.2 Admin Commands

The Controller class provides high-level wrappers for most admin commands. These operations are asynchronous: a command is enqueued immediately, and completion must be checked with waitdone().

def test_admin_commands(nvme0, buf):
    # Retrieve the SMART log page
    nvme0.getlogpage(2, buf, 512).waitdone()

    # Send a generic admin command (opcode = 0x0A)
    nvme0.send_cmd(0x0A, nsid=1, cdw10=7).waitdone()

Callbacks

For advanced workflows, each command can register a callback function that processes the completion entry (CQE):

def test_callback(nvme0):
    cdw0 = 0
    def _cb(cqe):
        nonlocal cdw0
        cdw0 = cqe[0]
        logging.info("getfeatures completed")

    nvme0.getfeatures(7, cb=_cb).waitdone()
    logging.info("Number of queues = %d", (cdw0 & 0xffff) + 1)

⚠️ Warning: Do not call waitdone() inside a callback to prevent deadlocks.

For simple use cases, waitdone() returns CQE dword 0 directly:

def test_num_queues(nvme0):
    cdw0 = nvme0.getfeatures(7).waitdone()
    num_of_queue = (cdw0 & 0xffff) + 1

The controller also records the cid and latency for the most recent command:

nvme0.format().waitdone()
logging.info("last cid: %d", nvme0.latest_cid)
logging.info("last latency (us): %d", nvme0.latest_latency)

3.3 Asynchronous Event Requests (AER)

Asynchronous Event Requests are special admin commands that only complete when specific events occur (e.g., errors, namespace changes).

PyNVMe3 handles AERs automatically:

  • Do not call waitdone() on an AER directly.
  • If an AER CQE is encountered, PyNVMe3 logs a warning, re-submits another AER, and continues processing.
  • To flush pending AERs, issue a lightweight admin command such as getfeatures().

def test_aer_handling(nvme0):
    cq = IOCQ(nvme0, 1, 10, PRP())
    sq = IOSQ(nvme0, 1, 10, PRP(), cq=cq)

    with pytest.warns(UserWarning, match="AER notification is triggered"):
        sq.tail = 10
        time.sleep(0.1)
        nvme0.getfeatures(7).waitdone()

    sq.delete()
    cq.delete()

3.4 Other Methods

Besides raw admin commands, the controller exposes high-level utilities:

  • Firmware download and reset
    nvme0.downfw("firmware.bin")
    nvme0.reset()
    
  • Reset during workload
    Ensures controller re-initializes correctly while I/O is active.

    with nvme0n1.ioworker(io_size=1, iops=1000, time=10):
        time.sleep(5)
        nvme0[0x14] = 0   # deassert CC.EN
        nvme0.wait_csts(rdy=False)
        nvme0.reset()
    
  • Opcode support check
    if nvme0.supports(0x80):
        nvme0.format().waitdone()
    
  • Query LBA format ID
    fid = nvme0.get_lba_format(data_size=512, meta_size=0, nsid=1)
    
  • Identify data via fixtures
    def test_identify(id_ctrl, id_ns):
        logging.info(id_ctrl.VID)
        logging.info(id_ctrl.SN)
        logging.info(id_ns.NSZE)
    
  • Command Timeout

    By default, each command times out after 10 seconds. Timeouts can be adjusted globally or per-opcode:

    nvme0.timeout = 10_000  # Global: 10s
    nvme0.set_timeout_ms(opcode=0x80, msec=30_000)  # Format: 30s
    

    After a controller reset, all timeouts revert to defaults.

  • Command Log

    The command log records submitted SQEs and completed CQEs with precise timestamps. This is invaluable for diagnosing failures and reproducing timing issues.

    def test_cmdlog(nvme0):
        for c in nvme0.cmdlog_merged(50):
            logging.info(c)
    

    It collects SQEs and CQEs from all queues and orders them by SQE timestamp.

  • MDTS

    The controller enforces its declared MDTS. PyNVMe3 supports up to 2 MB by default; transfers beyond this limit require metamode.

    # Attempt one block more than MDTS allows, to exercise the limit
    mdts = nvme0.mdts // nvme0n1.sector_size
    buf = Buffer((mdts + 1) * nvme0n1.sector_size)
    nvme0n1.read(qpair, buf, 0, mdts + 1).waitdone()
    

3.5 Lazy Doorbell Policy

Normally, the Admin SQ doorbell is updated after every submission. If lazy_doorbell=True is passed to init_adminq(), the driver postpones doorbell updates until waitdone() is called, batching multiple submissions into a single MMIO write.

nvme0.getfeatures(7)
nvme0.getfeatures(7)
nvme0.getfeatures(7)
nvme0.waitdone(3)  # one doorbell update for three commands

3.6 Summary

The Controller class is the centerpiece of PyNVMe3, providing the essential bridge between raw PCIe access and high-level NVMe operations. It encapsulates initialization, command submission, event handling, and advanced features in a way that is both faithful to the NVMe specification and flexible for custom test development. With built-in support for asynchronous commands, AER re-registration, firmware updates, timeout management, and debugging tools such as command logging, the class enables test developers to validate controllers under normal workloads, stress conditions, and failure scenarios. Mastering the Controller API is therefore fundamental for building robust, specification-compliant NVMe validation scripts.

4. Namespace

The Namespace class represents an NVMe namespace as defined by the NVMe specification. In PyNVMe3, a namespace is always created on top of a Controller object. Each namespace object encapsulates its own identity, size, and command set.

When creating a Namespace object manually, you must explicitly close it after use to free resources:

nvme0n1 = Namespace(nvme0)
# Perform operations...
nvme0n1.close()

In practice, it is strongly recommended to use the nvme0n1 fixture provided by PyNVMe3. Fixtures ensure that resources are automatically cleaned up by pytest, even if a test fails early due to an assert.

4.1 I/O Commands

Namespaces are primarily used to send I/O commands, such as reads, writes, and compares. The API mirrors the asynchronous style of admin commands, but requires an explicit Qpair to submit requests. Each command produces a CQE that must be reaped using Qpair.waitdone().

For example, a simple write to LBA 0:

def test_write_to_lba0(nvme0n1, qpair, buf):
    # Write a buffer into namespace LBA 0
    nvme0n1.write(qpair, buf, 0).waitdone()

Additional I/O commands supported:

  • Read
    nvme0n1.read(qpair, buf, lba=0, lba_count=8).waitdone()
    
  • Compare (data verification without transfer)
    nvme0n1.compare(qpair, buf, lba=0).waitdone()
    
  • Flush (ensure all cached writes are persisted)
    nvme0n1.flush(qpair).waitdone()
    
  • Write Uncorrectable (mark LBA range as invalid)
    nvme0n1.write_uncor(qpair, lba=100, lba_count=4).waitdone()
    
  • Verify (check data integrity against existing LBAs)
    nvme0n1.verify(qpair, lba=200, lba_count=16).waitdone()
    

All of these commands support the same features as admin commands: callbacks, timeout adjustments, and CQE inspection.

4.2 Helper Methods

In addition to I/O commands, the Namespace class provides helper methods that abstract common operations. Unlike I/O commands, these methods are synchronous utilities and do not require Qpair.waitdone().

  1. Format
    Controller.format() is asynchronous, but Namespace.format() is a simplified helper that automatically sets an appropriate timeout and waits for completion. This is especially useful when formatting is just one step in a larger test.

    def test_namespace_format(nvme0n1):
        # Format with default parameters
        nvme0n1.format()
    
  2. Opcode Support
    Check if the device supports a given I/O opcode:

    if nvme0n1.supports(0x02):  # Compare
        nvme0n1.compare(qpair, buf, 0).waitdone()
    
  3. Timeout Control
    Adjust the timeout for specific I/O command opcodes at the namespace level:

    nvme0n1.set_timeout_ms(opcode=0x02, msec=5000)  # Compare must finish within 5s
    
  4. Get LBA Format ID
    Query the correct format ID for given data and metadata sizes:

    fid = nvme0n1.get_lba_format(data_size=4096, meta_size=0)
    logging.info(f"LBA Format ID = {fid}")
    

4.3 Identify and Properties

Namespaces have their own Identify data, exposed through the id_ns fixture:

def test_identify_namespace(id_ns):
    logging.info(f"Namespace Size = {id_ns.NSZE}")
    logging.info(f"Capacity       = {id_ns.NCAP}")
    logging.info(f"LBA Format     = {id_ns.LBAF}")

The Namespace object also provides commonly used read-only properties:

  • nsid – the namespace ID.
  • capacity – usable capacity in LBAs.
  • sector_size – logical block size.

These shortcuts reduce the need for repeated parsing of identify data.


4.4 Data Verification

PyNVMe3 emphasizes not only performance but also data consistency validation, which is the most fundamental requirement of any storage device. To ensure this, PyNVMe3 integrates a transparent data verification mechanism that checks consistency automatically in most I/O operations. We strongly recommend enabling verification in all test cases, except for pure performance benchmarks.

Basic Example

def test_write_read(nvme0n1, qpair, buf, verify):
    # Write buffer to LBA 0
    nvme0n1.write(qpair, buf, 0).waitdone()
    # Read back LBA 0
    nvme0n1.read(qpair, buf, 0).waitdone()

In this script, the verify fixture activates automatic data checking. After the read completes, PyNVMe3 compares the retrieved data against the data that was just written. This process is transparent to the user—only the verify fixture needs to be added.

⚠️ Note: The verify fixture is unrelated to the NVMe Verify command. PyNVMe3 performs validation internally.

CRC-Based Verification

PyNVMe3 validates data by computing CRC checksums per LBA:

  • When a write completes, the CRC of the LBA is computed and stored in host DRAM.
  • When a read completes, the CRC is recalculated and compared with the stored value.

This provides a lightweight but effective data integrity check. Each LBA consumes one byte of CRC metadata in huge-page memory. For example, verifying a 4 TB namespace with 512-byte LBAs requires ~8 GB of reserved huge-page memory. Users must run make setup with enough huge pages allocated to enable verification.

If insufficient memory is available, PyNVMe3 issues a warning and disables verification. Alternatively, you can restrict the verification scope using the nlba_verify parameter when creating a Namespace object. However, for formal tests, it is preferable to provision sufficient DRAM and huge-page memory.
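
The memory requirement quoted above follows directly from the one-byte-per-LBA bookkeeping:

```python
def crc_memory_bytes(namespace_bytes, sector_size):
    """Huge-page memory needed for verification: one CRC byte per LBA."""
    return namespace_bytes // sector_size

# A 4 TB namespace with 512-byte LBAs needs ~8 GB of CRC metadata
print(crc_memory_bytes(4 * 2**40, 512))   # 8589934592 bytes (8 GiB)
```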

Injected Metadata for Uniqueness

CRC alone may not detect certain issues (e.g., when all LBAs are filled with identical data). To increase robustness, PyNVMe3 injects metadata into each LBA by default:

  • First 8 bytes – Encoded with (nsid, LBA), ensuring each LBA contains unique content.
  • Last 8 bytes – A global token that increments across all writes, ensuring versioning and detecting stale data.

This mechanism is conceptually similar to NVMe’s Protection Information, but PyNVMe3 embeds the metadata directly into the data area rather than metadata space.

If necessary, injected metadata can be disabled via:

nvme0n1.inject_write_buffer_disable()

CRC verification continues to function even with injection disabled.

Persisting CRC Data

By default, CRC data exists only in memory and is lost when the test ends. To preserve verification state across runs, scripts can export and import CRC data:

nvme0n1.save_crc("crc_snapshot.bin")
# Later test session
nvme0n1.load_crc("crc_snapshot.bin")

This makes data verification persistent across multiple test executions.

LBA Locking

Because NVMe commands are asynchronous, the execution order of overlapping I/Os is undefined. For example, multiple writes to the same LBA may complete out of order, producing indeterminate results during verification.

To prevent this, PyNVMe3 enforces per-LBA locks:

  • Before issuing an I/O, PyNVMe3 checks if the target LBAs are locked.
  • If free, the LBAs are locked, the command is issued, and they are unlocked after completion.
  • This ensures that only one I/O operates on a given LBA at any moment.

🔒 LBA locks apply only within a single process. They do not extend across multiple processes because inter-process locking would be prohibitively expensive. If multiple ioworkers access overlapping LBAs, nondeterministic results may occur. In such cases, divide LBAs among workers or disable verification for stress testing.

Scope of Verification

CRC checks and LBA locks apply broadly:

  • Standard I/O commands (read, write, compare, flush, verify).
  • Media-modifying commands (format, trim/DSM, sanitize, append in ZNS).

⚠️ For large-scale commands (e.g., DSM over large ranges), LBA lock checks may block execution for a long time. In such scenarios, consider reducing the CRC coverage via nlba_verify during namespace creation.

End-to-End Example

def test_verify_integrity(nvme0n1, qpair, buf, verify):
    # Step 1: Perform sequential writes
    nvme0n1.ioworker(io_size=8, lba_random=False,
                     read_percentage=0, qdepth=2,
                     io_count=1000).start().close()

    # Step 2: Overwrite specific LBA
    nvme0n1.write(qpair, buf, 0).waitdone()

    # Step 3: Read back data for verification
    nvme0n1.ioworker(io_size=8, lba_random=False,
                     read_percentage=100,
                     io_count=1000).start().close()

In this example, all written data is validated transparently using CRCs, injected metadata, and LBA locks.

PyNVMe3’s data verification framework ensures end-to-end integrity checking across all supported commands. By combining CRC checks, injected metadata, LBA locks, and persistence options, it provides a comprehensive mechanism to catch data corruption, stale data, or ordering errors. While performance may be impacted, especially for large tests, enabling verification is essential for meaningful functional and reliability validation.

4.5 Summary

The Namespace class provides the essential abstraction for exercising I/O operations in PyNVMe3. Built directly on top of the Controller, it exposes both low-level NVMe commands (read, write, compare, flush, verify, write-uncorrectable) and high-level helpers such as format, get_lba_format, and supports. Through fixtures like nvme0n1 and id_ns, scripts gain safe lifecycle management and convenient access to identify data. Advanced capabilities—including data verification via CRCs and injected metadata, persistent verification state, LBA locking, and Zoned Namespace (ZNS) command support—enable developers to validate data integrity and functional correctness under realistic workloads. By combining usability with rigorous checking, the Namespace class ensures that test scripts can probe both the performance and reliability dimensions of NVMe devices with confidence.

5. Qpair

In PyNVMe3, I/O commands must be submitted through a queue pair (Qpair). A Qpair combines a Submission Queue (SQ) and a Completion Queue (CQ) into a single object, mirroring the NVMe specification.

  • Admin Qpair: Embedded inside the Controller object, created automatically during controller initialization, and used for admin commands.
  • I/O Qpair: Created explicitly via the Qpair class and required for submitting I/O commands through a Namespace object.

After testing, each Qpair must be properly destroyed to free queue memory and resources.

def test_qpair_basic(nvme0):
    # Create a Qpair with depth 16
    qpair = Qpair(nvme0, depth=16)
    # Use it with I/O commands...
    qpair.delete()  # Cleanup

When using the qpair fixture, PyNVMe3 automatically deletes the Qpair at the end of the test, even if the test fails. This is the recommended usage pattern:

def test_fixture_qpair(nvme0n1, qpair, buf):
    nvme0n1.write(qpair, buf, 0).waitdone()

5.1 Qpair Creation

The Qpair constructor allows fine-grained control over queue parameters:

qpair = Qpair(
    nvme0,             # Controller handle
    depth=1024,        # Queue depth
    ien=True,          # Enable interrupts (default True)
    vector=0,          # Interrupt vector to use
    prio=0,            # Queue priority (0=urgent, 3=low)
    sqid=None,         # Explicit SQID if required
    lazy_doorbell=False # Lazy doorbell mode
)

  • Interrupt Enable (ien) – enables or disables MSI/MSI-X signaling.
  • Vector – select an interrupt vector when MSI-X is active.
  • Priority (prio) – queue arbitration priority level.
  • Lazy Doorbell – if True, doorbells are written only when waitdone() is called, reducing MMIO writes.

Example of lazy doorbell batching:

def test_lazy_doorbell(nvme0, nvme0n1, buf):
    qpair = Qpair(nvme0, depth=16, lazy_doorbell=True)
    nvme0n1.read(qpair, buf, 0)
    nvme0n1.read(qpair, buf, 8)
    nvme0n1.read(qpair, buf, 16)
    qpair.waitdone(3)   # Only one doorbell update
    qpair.delete()

By default, queue memory is allocated in system DRAM, but Qpairs can also be backed by Controller Memory Buffer (CMB) when supported.

5.2 Common Properties

The Qpair object exposes useful runtime properties:

  • latest_cid – Command ID of the most recently submitted command.
  • latest_latency – Latency (µs) of the most recently completed command.
  • prio – Queue priority level.
  • sqid – Submission Queue ID of this Qpair.
  • depth – Configured queue depth.

def test_qpair_properties(qpair, nvme0n1, buf):
    nvme0n1.write(qpair, buf, 0).waitdone()
    logging.info(f"CID: {qpair.latest_cid}, Latency: {qpair.latest_latency} us")
    logging.info(f"Queue ID: {qpair.sqid}, Depth: {qpair.depth}, Priority: {qpair.prio}")

5.3 Command Completion and Logging

Like the admin queue, I/O Qpairs are asynchronous: commands are submitted to the SQ and completed later in the CQ.

  • waitdone(n=1) – Waits for n command completions. This is the primary way to synchronize I/O.
  • cmdlog(count=1000, offset=0) – Retrieve recent SQEs and CQEs from the queue for debugging.

def test_qpair_cmdlog(nvme0n1, qpair, buf):
    nvme0n1.write(qpair, buf, 0).waitdone()
    for entry in qpair.cmdlog(5):
        logging.info(entry)

You can also retrieve the original SQE from a CQE using:

  • get_cmd(cpl) – Fetches the SQE that produced a given CQE. Typically used in callbacks.

def test_qpair_get_cmd(nvme0n1, qpair, buf):
    def cb(cqe):
        sqe = qpair.get_cmd(cqe)
        logging.info(f"Completion: {cqe}, Original Command: {sqe}")
    nvme0n1.write(qpair, buf, 0, cb=cb).waitdone()

5.4 Interrupt Handling

PyNVMe3 supports both generic MSI/MSI-X interrupt handling and MSI-X–specific APIs. Although interrupts can be tested, PyNVMe3 itself does not rely on interrupts—it polls completions in waitdone().

  • msix_clear() – Clear MSI-X interrupts.
  • msix_isset() – Check MSI-X interrupt state.
  • wait_msix() – Wait for MSI-X interrupt.
  • msix_mask() / msix_unmask() – Mask/unmask MSI-X interrupts.

Example: Testing MSI-X Masking

def test_interrupt_qpair_msix(nvme0n1, qpair, buf):
    qpair.msix_clear()
    assert not qpair.msix_isset()

    # Submit read command
    nvme0n1.read(qpair, buf, 0, 8)
    time.sleep(0.1)
    assert qpair.msix_isset()

    # Mask interrupts
    qpair.msix_clear()
    qpair.msix_mask()
    nvme0n1.read(qpair, buf, 16, 8)
    time.sleep(0.2)
    assert not qpair.msix_isset()

    # Unmask interrupts: the pending completion now raises MSI-X
    qpair.msix_unmask()
    qpair.waitdone(2)   # Reap both outstanding reads
    assert qpair.msix_isset()

This enables precise validation of interrupt behavior without kernel dependencies.

5.5 Resource Management

Always delete Qpairs after use:

qpair.delete()

If a test ends prematurely (e.g., due to an assert failure), cleanup code may not run. Therefore, it is recommended to use pytest fixtures for Qpair lifecycle management. Fixtures ensure delete() is called reliably.
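A minimal fixture along these lines might look like the following sketch; the bundled qpair fixture behaves similarly.

```python
import pytest

@pytest.fixture(scope="function")
def qpair(nvme0):
    """Create an I/O Qpair for one test and guarantee its deletion."""
    ret = Qpair(nvme0, depth=16)
    yield ret
    # Teardown runs even if the test body failed an assert
    ret.delete()
```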

5.6 Summary

The Qpair class is the fundamental bridge between Namespaces and actual I/O execution in PyNVMe3. By combining submission and completion queues into a single object, it provides fine-grained control over queue depth, priority, doorbell updates, and interrupt behavior. Qpairs not only support high-performance polling mode but also allow detailed interrupt validation via MSI and MSI-X APIs, making them indispensable for both compliance and stress testing. With additional features such as command logging, latency tracking, and SQE recovery, they give testers deep visibility into I/O flows. Proper resource management—ideally through pytest fixtures—ensures clean teardown even under test failures. In practice, mastering Qpair usage is essential for building robust, high-fidelity NVMe test scripts that can scale from functional verification to advanced performance analysis.


6. Subsystem

The Subsystem class models an NVMe subsystem and provides APIs for reset and power control. It acts as the bridge between PyNVMe3 and the physical power state of the device under test (DUT). Typical use cases include verifying device behavior across power cycles, testing reset recovery, and validating data persistence.

6.1 Core Methods

The following methods are provided:

  • Subsystem.reset() – Performs a subsystem reset, if supported by the DUT.
  • Subsystem.poweroff() – Powers off the DUT.
  • Subsystem.poweron() – Powers the DUT back on.

By default, these functions are mapped to system-level mechanisms:

  • poweroff() uses the host’s S3 sleep state.
  • poweron() uses the system’s RTC wake-up feature.

However, not all motherboards provide complete S3/RTC support. For robust and repeatable testing, PyNVMe3 strongly recommends using an external power control device, such as the Quarch PAM (Power Analysis Module). Support for Quarch PAM is included in scripts/pam.py.

6.2 Using Fixtures

PyNVMe3 provides the subsystem fixture, which transparently uses a Quarch PAM if available. Otherwise, it falls back to the default S3/RTC mechanism:

@pytest.fixture(scope="function")
def subsystem(nvme0, pam):
    if pam.exists():
        # Use PAM to control power
        ret = Subsystem(nvme0, pam.on, pam.off)
    else:
        # Fall back to S3/RTC
        ret = Subsystem(nvme0)
    return ret

With this fixture, test scripts can simply call:

def test_power_cycle(subsystem, nvme0):
    subsystem.poweroff()
    subsystem.poweron()
    nvme0.reset()  # Reinitialize after power-up

The PyNVMe3 driver automatically handles device rescanning and driver rebinding during power transitions. After a poweron(), users only need to reinitialize the controller using Controller.reset().

6.3 Customized Power Controller

Developers may integrate their own power-control hardware (e.g., a custom relay board, USB-controlled power switch, or other programmable power module) by supplying callback functions to the Subsystem constructor.

Example:

def my_poweron():
    # Custom logic to turn DUT power ON
    relay.send_command("on")

def my_poweroff():
    # Custom logic to turn DUT power OFF
    relay.send_command("off")

subsystem = Subsystem(nvme0, my_poweron, my_poweroff)

Once provided, Subsystem.poweron() and Subsystem.poweroff() will automatically call your custom functions. This design makes it easy to adapt PyNVMe3 to any power controller, without modifying the core driver or test scripts.

💡 In practice, this means you can reuse all existing test cases unchanged. Only the Subsystem fixture needs to be updated to use your custom power functions.

6.4 PERST Support

In addition to full power control, PyNVMe3 supports testing via the PCIe PERST# signal. PERST (PCIe Reset) allows the host to reset the DUT without removing power, providing a faster and more precise way to evaluate reset recovery behavior.

The Subsystem class can be constructed with custom PERST control functions, similar to power-on/off integration. For example, with a Quarch PAM or other hardware capable of toggling PERST#, you can map Subsystem.poweroff() to assert PERST (drive it low) and Subsystem.poweron() to release PERST (drive it high).

def test_perst_basic(pcie, nvme0, nvme0n1, pam):
    # Define helper functions for PERST control
    def perst_low():
        pam.perst = 0  # Drive PERST low (assert reset)
        time.sleep(0.1)

    def perst_high():
        pam.perst = 1  # Drive PERST high (release reset)
        time.sleep(0.1)

    # Create subsystem object using PERST callbacks
    subsystem = Subsystem(nvme0, perst_high, perst_low)

    # Utility: read "Power Cycles" counter from SMART log page
    def get_power_cycles(nvme0):
        buf = Buffer(512)
        nvme0.getlogpage(2, buf, 512).waitdone()
        cycles = buf.data(115, 112)
        logging.info(f"Power Cycles: {cycles}")
        return cycles

    # Record current power cycle count
    powercycle = get_power_cycles(nvme0)

    # Toggle PERST
    subsystem.poweroff()   # Assert PERST low
    time.sleep(1)
    subsystem.poweron()    # Release PERST high
    time.sleep(1)

    # Reinitialize controller
    nvme0.reset()

    # Verify that power cycle counter did not increment
    assert powercycle == get_power_cycles(nvme0)

Explanation:

  • perst_low() and perst_high() are custom functions that toggle the PERST line.
  • These functions are passed to Subsystem so that poweroff() = PERST low and poweron() = PERST high.
  • Unlike a full power cycle, PERST does not increment the “Power Cycles” counter in the SMART log. The test validates this by comparing the counter before and after the reset.
  • After releasing PERST, nvme0.reset() is required to reinitialize the NVMe controller.

This approach allows developers to:

  • Validate controller compliance with PERST reset requirements.
  • Confirm that internal state is reset correctly without affecting the power-cycle log.
  • Run repeatable reset-recovery tests with minimal downtime compared to full power cycles.

6.5 Summary

The Subsystem class gives PyNVMe3 scripts a simple way to control power and reset at the hardware level. It supports standard methods like S3/RTC, external devices such as Quarch PAM, and even custom user-defined callbacks. This flexibility lets developers adapt the same test scripts to different lab setups without changing the driver.

With Subsystem, tests can cover both full power cycles and PERST resets, while PyNVMe3 automatically handles PCIe rescans and driver rebinding. Using fixtures makes resource management safe and consistent, even across failures.

In short, the Subsystem class connects software tests with hardware control, making it easy to validate device behavior during resets, power loss, and recovery.

7. Checklist (for AI)

7.1 General

  • ✅ Use pytest fixtures (pcie, nvme0, nvme0n1, qpair, buf, verify) instead of manual construction whenever possible.
  • ✅ Always call .waitdone() after submitting async commands (both admin and I/O).
  • ✅ Never call .waitdone() inside callbacks to avoid deadlocks.
  • ✅ For debugging, use .cmdlog() or .cmdlog_merged() to inspect SQE/CQE history.
  • ✅ Avoid hardcoding addresses/offsets unless intentionally testing register-level behavior.
  • ✅ Prefer explicit teardown (.close(), .delete()) if not using fixtures.

7.2 Buffer

  • ✅ Allocate buffers from huge pages; run make setup before tests.
  • ✅ For >1 MB buffer, ensure 1 GB huge pages are configured.
  • ✅ Use offset and size to create unaligned PRP/SGL mappings when testing.
  • ✅ Prefer ptype and pvalue for reproducible patterns; avoid raw random fills.
  • ✅ Use buf.dump() and buf.data() for structured inspection.
  • ✅ Keep references to buffers registered as HMB—do not let Python GC reclaim them.

7.3 Pcie

  • ✅ Use cfg_*_read/write() for configuration space, mem_*_read/write() for BAR.
  • ✅ After reset (.reset() or .flr()), always call Controller.reset() to reinit.
  • ✅ Use cap_offset() to locate PCIe capabilities instead of hardcoding offsets.
  • ✅ For workload + reset tests, combine pcie.flr() with ioworker.
  • ✅ Release PCIe resources with .close() (use fixtures to guarantee cleanup).

7.4 Controller

  • ✅ Stick to the default init flow unless you need custom behavior (e.g., WRR arbitration).
  • ✅ When overriding init, do not modify timeouts.
  • ✅ Use supports(opcode) before sending optional admin commands.
  • ✅ For AERs, remember: do not call .waitdone() directly—reap via helper command.
  • ✅ For long operations (e.g., format), increase timeout using .set_timeout_ms().
  • ✅ Enable lazy_doorbell if reducing MMIO traffic is part of the test target.
  • ✅ Use id_ctrl and id_ns fixtures instead of parsing raw identify data manually.

7.5 Namespace

  • ✅ Always issue I/O commands with an explicit Qpair.
  • ✅ Use Namespace.format() instead of Controller.format() unless testing the command itself.
  • ✅ Enable the verify fixture to activate CRC + injected metadata checks.
  • ✅ If huge-page memory is insufficient, use nlba_verify to limit CRC scope.
  • ✅ Use inject_write_buffer_disable() only when you explicitly want raw data patterns.
  • ✅ Export/import CRC state with save_crc() / load_crc() for multi-run consistency.
  • ✅ When running multiple workers, ensure LBA spaces don’t overlap unless testing conflicts.

7.6 Qpair

  • ✅ Always call .delete() after use (fixtures handle this automatically).
  • ✅ Use lazy_doorbell=True to reduce doorbell writes in batch submissions.
  • ✅ Inspect latest_cid and latest_latency for command profiling.
  • ✅ Use MSI-X APIs (msix_clear, msix_mask, msix_unmask) when testing interrupt behavior.
  • ✅ Remember: PyNVMe3’s normal mode is polling, interrupts are for testing only.

7.7 Subsystem

  • ✅ Wrap power control in a Subsystem object instead of direct platform calls.
  • ✅ For custom power cards, pass your own poweron / poweroff functions to Subsystem().
  • ✅ Always call Controller.reset() after Subsystem.poweron() to reinitialize DUT.
  • ✅ Use Quarch PAM integration (pam.py) if available; otherwise, adapt your own control logic.
  • ✅ Example: Implement PERST# control by providing perst_low / perst_high callbacks.