Important Design Considerations from PG302

When tackling design issues, navigating a lengthy product guide can be overwhelming. To streamline your troubleshooting process, this article lists essential design considerations directly from the product guide, highlighting key points to keep in mind. This article condenses over 200 pages into a focused, 10-12 page summary, making it easier to access the most relevant information quickly. For full details, be sure to click the provided link to explore each consideration further in the original product guide.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Descriptor-Engine?tocId=IapAFAz_pMIftrJvFO5TaQ
- If a queue is associated with interrupt aggregation, AMD recommends that the status descriptor be turned off, and instead the DMA status be received from the interrupt aggregation ring.

Note

https://docs.amd.com/r/en-US/pg302-qdma/H2C-Stream-Engine?tocId=5bP1owSpu0KFRSL~uEv2lQ
- The total length of all descriptors put together must be less than 64 KB.
- For internal mode queues, each descriptor defines a single AXI4-Stream packet to be transferred to the H2C AXI-ST interface. A packet with multiple descriptors straddling is not allowed due to the lack of per queue storage. However, packets with multiple descriptors straddling can be implemented using the descriptor bypass mode.

Note

https://docs.amd.com/r/en-US/pg302-qdma/C2H-Stream-Engine?tocId=w7xytGq781SYeea3XB7G3A
- In Simple Bypass Mode, the engine does not track anything for the queue, and the user logic can define its own method to receive descriptors. The user logic is then responsible for delivering the packet and associated descriptor through the simple bypass interface. The ordering of the descriptors fetched by a queue in the bypass interface and the C2H stream interface must be maintained across all queues in bypass mode.

Note

https://docs.amd.com/r/en-US/pg302-qdma/AXI-Memory-Mapped-Bridge-Master-Interface
- One or more PCIe BAR of any physical function (PF) or virtual function (VF) can be mapped to the AXI-MM bridge master interface. This selection must be made prior to design compilation.
- Virtual function group (VFG) refers to the VF group number. It is equivalent to the PF number associated with the corresponding VF. VFG_OFFSET refers to the VF number with respect to a particular PF. Note that this is not the FIRST_VF_OFFSET of each PF.
- Note that all VFs belonging to the same PF share the same PCIe to AXI translation vector. Therefore, the AXI address space of each VF is concatenated together. Use VFG_OFFSET to calculate the actual starting address of AXI for a particular VF.
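
As a rough illustration of the VFG_OFFSET arithmetic, the sketch below assumes that every VF under a PF has an equally sized BAR window and that the windows are packed back to back starting at the translated base address. The function name, parameter names, and example values are hypothetical; confirm the actual layout against the product guide.

```python
def vf_axi_start_address(pf_axi_base: int, vf_bar_size: int, vfg_offset: int) -> int:
    """Return the assumed AXI start address for one VF.

    pf_axi_base : AXI base address from the PF's PCIe-to-AXI translation vector
    vf_bar_size : size in bytes of each VF's BAR window (assumed equal for all VFs)
    vfg_offset  : VF number relative to its PF (not FIRST_VF_OFFSET)
    """
    if vfg_offset < 0:
        raise ValueError("vfg_offset must be non-negative")
    # The VF windows are assumed to be concatenated, so each VF starts one
    # window size after the previous one.
    return pf_axi_base + vfg_offset * vf_bar_size


# Hypothetical example: 4 KB windows starting at AXI address 0x8000_0000.
print(hex(vf_axi_start_address(0x8000_0000, 0x1000, 3)))
```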

Note

https://docs.amd.com/r/en-US/pg302-qdma/PCIe-RQ/RC
- With a 512-bit interface, straddling is enabled. While straddling is supported, all combinations of RQ straddled transactions might not be implemented.

Note

https://docs.amd.com/r/en-US/pg302-qdma/General-Design-of-Queues
- If queue size is 8, which contains the entry index 0 to 7, the last entry (index 7) is reserved for status. This index should never be used for PIDX update, and PIDX update should never be equal to CIDX. For this case, if CIDX is 0, the maximum PIDX update would be 6.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Limitations
- Use AXI SmartConnect to support Narrow Burst.
- ECC and Slave Narrow Burst support is mutually exclusive.
- If you want an ECC feature, the recommendation is to up-size your AXI Master externally.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Performance-and-Resource-Utilization
- Global register for timer should have a value of 30 for 3 μs.
- The driver should update TX/RX PIDX in batches of 64.
- The driver should update the H2C PIDX in batches of 64, and also update for the last descriptor of the scatter gather list.
- For optimal QDMA streaming performance, packet buffers of the descriptor ring should be aligned to at least 256 bytes.
- AMD recommends that you limit the total outstanding descriptor fetch to be less than 8 KB on the PCIe. For example, limit the outstanding credits across all queues to 512 for a 16B descriptor.
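
The credit example in the last bullet is simple division. The sketch below reproduces it, treating 8 KB as an inclusive budget so that the result matches the 512-credit figure quoted for a 16B descriptor. The function and constant names are illustrative only.

```python
MAX_OUTSTANDING_FETCH_BYTES = 8 * 1024  # 8 KB budget from the recommendation above


def max_outstanding_credits(descriptor_size_bytes: int) -> int:
    """Return the credit cap across all queues for a given descriptor size."""
    if descriptor_size_bytes <= 0:
        raise ValueError("descriptor size must be positive")
    # One credit fetches one descriptor, so the cap is budget / descriptor size.
    return MAX_OUTSTANDING_FETCH_BYTES // descriptor_size_bytes


for size in (8, 16, 32):
    print(size, max_outstanding_credits(size))
```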

Note

https://docs.amd.com/r/en-US/pg302-qdma/Performance-and-Resource-Utilization
- AMD recommends that the h2c_byp_in_st_sdi port be asserted once every 32 or 64 packets. If no more descriptors are left, assert h2c_byp_in_st_sdi at the last descriptor. This requirement applies per queue, to AXI4 (H2C and C2H) bypass transfers and AXI4-Stream H2C transfers.
- For AXI4-Stream C2H Simple bypass mode, the dsc_crdt_in_fence port should be set to 1 for performance reasons. This recommendation assumes the user design already coalesced credits for each queue and sent them to the IP. In internal mode, set the fence bit in the QDMA_C2H_PFCH_CFG_2 (0xA84) register.
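
One way to picture the h2c_byp_in_st_sdi cadence from the first bullet is a per-queue list of flags, one per descriptor. This is only a software model under the assumption that each packet uses a single descriptor; the function name and the default interval of 32 are choices made for the example.

```python
from typing import List


def sdi_flags(num_descriptors: int, interval: int = 32) -> List[bool]:
    """Return one sdi flag per descriptor submitted on a single queue."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    flags = []
    for position in range(1, num_descriptors + 1):
        at_interval = position % interval == 0   # every 32nd (or 64th) descriptor
        is_last = position == num_descriptors    # nothing left after this one
        flags.append(at_interval or is_last)
    return flags


# For 70 descriptors, sdi is set on the 32nd, 64th, and final descriptor.
print([i + 1 for i, flag in enumerate(sdi_flags(70)) if flag])
```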

Note

https://docs.amd.com/r/en-US/pg302-qdma/Descriptor-Context
- Prior to enabling the queue, the hardware and credit context must first be cleared. After this is done, the software context can be programmed and the qen bit can be set to enable the queue. After the queue is enabled, the software context should only be updated through the direct mapped address space to update the Producer Index and Interrupt Arm® bit, unless the queue is being disabled.
- Reading the context when the queue is enabled is not recommended as it can result in reduced performance.
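
The bring-up order in the first bullet can be sketched as follows. `ContextRecorder` is a hypothetical stand-in that only logs calls; it is not the driver API, and the context names are labels chosen for this example.

```python
class ContextRecorder:
    """Hypothetical stand-in for context programming; it only logs calls."""

    def __init__(self):
        self.log = []

    def clear(self, qid, ctxt):
        self.log.append(("clear", qid, ctxt))

    def write(self, qid, ctxt, fields):
        self.log.append(("write", qid, ctxt, dict(fields)))


def enable_queue(bus, qid, sw_fields):
    """Bring up one queue in the order described above."""
    # 1. The hardware and credit contexts must be cleared first.
    bus.clear(qid, "hardware")
    bus.clear(qid, "credit")
    # 2. Only then program the software context, with qen set to enable the queue.
    fields = dict(sw_fields)
    fields["qen"] = 1
    bus.write(qid, "software", fields)


bus = ContextRecorder()
enable_queue(bus, qid=5, sw_fields={"fcrd_en": 0})
for entry in bus.log:
    print(entry)
```

After this point, the model would touch the software context only for Producer Index and interrupt arm updates, in line with the bullet above.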

Note

https://docs.amd.com/r/en-US/pg302-qdma/Software-Descriptor-Context-Structure-0x0-C2H-and-0x1-H2C
- irq_req: Interrupt due to error waiting to be sent (waiting for irq_arm). This bit should be cleared when the queue context is initialized.
- err_wb_sent: A writeback/interrupt was sent for an error. Once this bit is set no more writebacks or interrupts will be sent for the queue. This bit should be cleared when the queue context is initialized.
- irq_no_last: This bit should be initialized to 0 when the queue context is initialized.
- dsc_sz: If bypass mode is not enabled, 32B is required for Memory Mapped DMA, 16B is required for H2C Stream DMA, and 8B is required for C2H Stream DMA.
- fetch_max: The maximum outstanding is fetch_max + 1. A higher value can increase single-queue performance.
- fcrd_en: Set to 1 for C2H ST.
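
The dsc_sz and fetch_max bullets reduce to a lookup and an increment. The sketch below records them as a small helper; the mode keys and function names are invented for this example and apply only when bypass mode is not enabled.

```python
# Required descriptor size in bytes when bypass mode is not enabled.
REQUIRED_DESCRIPTOR_BYTES = {
    "mm": 32,      # Memory Mapped DMA
    "h2c_st": 16,  # H2C Stream DMA
    "c2h_st": 8,   # C2H Stream DMA
}


def required_descriptor_bytes(mode: str) -> int:
    """Return the descriptor size required for an internal mode queue."""
    if mode not in REQUIRED_DESCRIPTOR_BYTES:
        raise ValueError(f"unknown mode: {mode!r}")
    return REQUIRED_DESCRIPTOR_BYTES[mode]


def max_outstanding(fetch_max: int) -> int:
    """The maximum outstanding is fetch_max + 1."""
    return fetch_max + 1


print(required_descriptor_bytes("h2c_st"), max_outstanding(7))
```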

Note

https://docs.amd.com/r/en-US/pg302-qdma/Descriptor-Fetch

If fetch crediting is enabled, the user logic is required to provide a credit for each descriptor that should be fetched.
In each direction, C2H and H2C are allocated 256 entries for descriptor fetch completions. Each entry is the width of the datapath. If sufficient space is available, the fetch is allowed to proceed. A given queue can only have one descriptor fetch pending on PCIe at any time.
Available descriptors are always <ring size> - 2. At any time, the software should not update the PIDX to more than <ring size> - 2.
If queue size is 8, which contains the entry index 0 to 7, the last entry (index 7) is reserved for status. This index should never be used for the PIDX update, and the PIDX update should never be equal to CIDX. For this case, if CIDX is 0, the maximum PIDX update would be 6.
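
The queue-size-8 example can be generalized as below. The sketch assumes that the index wraps over the entries that precede the reserved status entry, and that the largest legal PIDX is always one position behind CIDX. The function names are illustrative.

```python
def usable_descriptors(ring_size: int) -> int:
    """One entry holds status and PIDX may never equal CIDX, leaving size - 2."""
    return ring_size - 2


def max_pidx(ring_size: int, cidx: int) -> int:
    """Return the largest PIDX that software may post for the given CIDX."""
    wrap = ring_size - 1  # the last entry (index ring_size - 1) is reserved for status
    if not 0 <= cidx < wrap:
        raise ValueError("cidx must be a valid descriptor index")
    # Stop one short of CIDX so that PIDX never equals CIDX.
    return (cidx - 1) % wrap


print(usable_descriptors(8), max_pidx(8, 0))
```

With a ring size of 8 and CIDX at 0, this yields 6 usable descriptors and a maximum PIDX of 6, matching the example above.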

Note

https://docs.amd.com/r/en-US/pg302-qdma/Internal-Mode-Writeback-and-Interrupts-AXI-MM-and-H2C-ST

It is recommended that the wbi_chk bit be set for all internal mode operations, including when interval mode is enabled.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Descriptor-Bypass-Mode-Writeback/Interrupts

If interrupts are enabled, the user logic must monitor the traffic manager output for the irq_arm. After the irq_arm bit is observed for the queue, a descriptor with the sdi bit is sent to the DMA. Once a descriptor with the sdi bit is sent, another irq_arm assertion must be observed before another descriptor with the sdi bit can be sent.
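
The irq_arm and sdi handshake amounts to a one-shot permission per queue. The class below is a software model of that rule, not RTL; its name and methods are hypothetical.

```python
class SdiGate:
    """Tracks, per queue, whether an irq_arm has been seen since the last sdi."""

    def __init__(self):
        self._armed = set()

    def on_irq_arm(self, qid: int) -> None:
        # Called when irq_arm is observed on the traffic manager output.
        self._armed.add(qid)

    def take_sdi(self, qid: int) -> bool:
        """Return True if this descriptor may carry sdi, consuming the arm."""
        if qid in self._armed:
            self._armed.discard(qid)
            return True
        return False


gate = SdiGate()
print(gate.take_sdi(3))   # no irq_arm observed yet
gate.on_irq_arm(3)
print(gate.take_sdi(3))   # allowed once
print(gate.take_sdi(3))   # needs another irq_arm first
```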

Note

https://docs.amd.com/r/en-US/pg302-qdma/Traffic-Manager-Output-Interface

While the tm_dsc_sts interface is a valid/ready interface, it should not be back-pressured for optimal performance.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Errors?tocId=RVdoy7Fzh1DBbxgMq3ytwg

After the queue is invalidated, if there is an error you can determine the cause by reading the error registers and context for that queue. You must clear and remove that queue, and then add the queue back later when needed.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Operation

Any descriptors that have already started the source buffer fetch will continue to be processed. Reassertion of the run bit will result in resetting internal engine state and should only be done when the engine is quiesced.
Descriptors are received from either the descriptor engine directly or the Descriptor Bypass Input interface. Any queue that is in internal mode should not be given descriptors through the Descriptor Bypass Input interface.

Note

https://docs.amd.com/r/en-US/pg302-qdma/AXI-Memory-Mapped-Descriptor-for-H2C-and-C2H-32B

Internal mode memory mapped DMA must configure the descriptor queue to be 32B and follow the above descriptor format.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Internal-and-Bypass-Modes

If the packet is present in host memory in non-contiguous space, then it has to be defined by more than one descriptor, and this requires that the queue be programmed in bypass mode.
When fcrd_en is enabled in the software context, DMA will wait for the user application to provide credits. When fcrd_en is not set, the DMA uses a pointer update, fetches descriptors and sends the descriptor out. The user application should not send in credits.
There are some requirements imposed on the user logic when using the bypass mode. Because the bypass mode allows a packet to span multiple descriptors, the user logic needs to indicate to QDMA which descriptor marks the Start-Of-Packet (SOP) and which marks the End-Of-Packet (EOP).
At the QDMA H2C Stream bypass-in interface, among other pieces of information, the user logic needs to provide: Address, Length, SOP, and EOP. It is required that once the user logic feeds SOP descriptor information into QDMA, it must eventually feed EOP descriptor information also. Descriptors for these multi-descriptor packets must be fed in sequentially.
Other descriptors not belonging to the packet must not be interleaved within the multi-descriptor packet. The user logic must accumulate the descriptors up to the EOP descriptor, before feeding them back to QDMA. Not doing so can result in a hang.
The QDMA will generate a TLAST at the QDMA H2C AXI4-Stream data output once it issues the last beat for the EOP descriptor. This is guaranteed because the user is required to submit the descriptors for a given packet sequentially.
Quality of service can be severely affected if the packet sizes are large. The Stream engine is designed to saturate PCIe for packet sizes as low as 128B, so AMD recommends that you restrict the packet size to be host page size or maximum transfer unit as required by the user application.
A performance control provided in the H2C Stream Engine is the ability to stall requests from being issued to the PCIe RQ/RC if a certain amount of data is outstanding on the PCIe side as seen by the H2C Stream Engine. To use this feature, the SW must program a threshold value in the H2C_REQ_THROT (0xE24) register.
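
The SOP/EOP rules for multi-descriptor packets described above can be checked in software before descriptors are fed to the bypass-in interface. This is a minimal sketch: the tuple layout `(qid, sop, eop)` and the function name are assumptions made for the example, and Address and Length are omitted.

```python
def check_packet_framing(descriptors):
    """Return None if the sequence is well framed, else a description of the problem.

    descriptors: iterable of (qid, sop, eop) tuples in submission order.
    """
    open_qid = None  # queue whose packet has seen SOP but not yet EOP
    for index, (qid, sop, eop) in enumerate(descriptors):
        if open_qid is None:
            if not sop:
                return f"descriptor {index}: packet does not start with SOP"
        else:
            if qid != open_qid:
                return f"descriptor {index}: interleaved with open packet on queue {open_qid}"
            if sop:
                return f"descriptor {index}: SOP before EOP of the previous packet"
        open_qid = None if eop else qid
    if open_qid is not None:
        return f"queue {open_qid}: SOP was never followed by EOP"
    return None


good = [(1, True, False), (1, False, False), (1, False, True), (2, True, True)]
bad = [(1, True, False), (2, True, True), (1, False, True)]
print(check_packet_framing(good))
print(check_packet_framing(bad))
```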

Note

https://docs.amd.com/r/en-US/pg302-qdma/Handling-Descriptors-With-Errors?tocId=C1GC1G3Yi2rqbFYc6COtCg

For a queue in bypass mode, it is the responsibility of the user logic to not issue a batch of descriptors with an error descriptor. Instead, it must send just one descriptor with error input asserted on the H2C Stream bypass-in interface and set the SOP, EOP, no_dma signal, and sdi or mrkr-req signal to make the H2C Stream Engine send a writeback to Host.

Note

https://docs.amd.com/r/en-US/pg302-qdma/C2H-Stream-Engine?tocId=iIzB4_5EQe28ijZNG1QubA

The QDMA requires software to post full ring size so the C2H stream engine can fetch the needed number of descriptors for all received packets. If there are not enough descriptors in the descriptor ring, the QDMA will stall the packet transfer. For performance reasons, the software is required to post the PIDX as soon as possible to ensure there are always enough descriptors in the ring.

Note

https://docs.amd.com/r/en-US/pg302-qdma/C2H-Stream-Modes

If you already have the descriptor cached on the device, there is no need to fetch one from the host and you should follow the simple bypass mode for the C2H Stream application. In simple bypass mode, do not provide credits to fetch the descriptor, and instead, you need to send in the descriptor on the descriptor bypass interface.
For simple bypass transfer to work, a prefetch tag is needed and it can be fetched from the QDMA IP.
The user application must request a prefetch tag before sending any traffic for a simple bypass queue through the C2H ST engine. Invalid queues or non-bypass queues should not request any tags using this method, as it might reduce performance by freezing tags that never get used.
For the queues that share the same prefetch tag, the data and descriptors need to come in the same order. For Simple Bypass, the data and descriptors are both controlled by the user, so they need to guarantee the order is maintained.
If a current qid is invalidated, a new prefetch tag must be requested with a valid qid.
Prefetched tag must be assigned to input port c2h_byp_in_st_csh_pfch_tag[6:0] for all transfers.

Note

https://docs.amd.com/r/en-US/pg302-qdma/C2H-Stream-Packet-Type

dma<n>_s_axis_c2h_mty = empty byte should be set in last beat.
dma<n>_s_axis_c2h_cmpt_ctrl_wait_pld_pkt_id = This completion packet has to wait for the data packet with this ID to be sent before the CMPT packet can be sent.
When the user application sends the data packet, it must count the packet ID for each packet. The first data packet has a packet ID of 1, and it increments for each data packet.
For the regular C2H packet, the data packet and the completion packet are a one-to-one match. Therefore, the number of data packets with dma<n>_s_axis_c2h_ctrl_has_cmpt as 1’b1 should be equal to the number of CMPT packets with dma<n>_s_axis_c2h_cmpt_ctrl_cmpt_type as HAS_PLD.
Depth and width of the FIFO depends on the use case. Width is dependent on the largest CMPT size for the application, and depth is dependent on performance needs. For best performance for 64 Byte CMPT, a depth of 512 is recommended.
The immediate data packet and the marker packet do not consume the descriptor; instead, they write to the C2H Completion Ring. The software needs to size the C2H Completion Ring large enough to accommodate the outstanding immediate packets and the marker packets.
Zero-byte packets are not supported in Internal mode and Cache bypass mode. The QDMA might hang if zero-byte packets are dropped because no descriptors are available. Zero-byte packets are supported in Simple bypass mode.

Note

https://docs.amd.com/r/en-US/pg302-qdma/C2H-Stream-Modes

When prefetch mode is enabled, the user application cannot send credits as input in QDMA Descriptor Credit input ports.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Completion-Engine?tocId=N~lHogTrZWEFBwBMSKiHgw

The user-defined portion of the CMPT packet typically needs to specify the length of the data packet transferred and whether or not descriptors were consumed as a result of the data packet transfer. Immediate and marker type packets do not consume any descriptors. The exact contents of the user-defined data are up to the user to determine.
Maximum buffer size register 0xB50 bits[31:26] is programmed to 0 (default value). This value might result in an overflow depending on the simulator or the synthesis tool used. To avoid overflow, set 0xB50 bits[31:26] to maximum value of 63.
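
Setting bits[31:26] of a 32-bit register without disturbing the other bits is a read-modify-write. The helper below shows the masking arithmetic only; how register 0xB50 is actually read and written depends on the driver or test bench, so no access function is assumed here.

```python
def set_bits(value: int, msb: int, lsb: int, field: int) -> int:
    """Return a 32-bit value with bits [msb:lsb] replaced by field."""
    width = msb - lsb + 1
    mask = (1 << width) - 1
    if not 0 <= field <= mask:
        raise ValueError("field does not fit in the bit range")
    cleared = value & ~(mask << lsb) & 0xFFFFFFFF  # zero the target bits
    return cleared | (field << lsb)


# Default value 0 in bits[31:26] replaced with the maximum value 63.
print(hex(set_bits(0x00000000, 31, 26, 63)))
```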

Note

https://docs.amd.com/r/en-US/pg302-qdma/Completion-Status-Structure

In order to make the QDMA Subsystem for PCIe write Completion Status to the Completion ring, Completion Status must be enabled in the Completion context.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Completion-Context-Structure

baddr4_low: Since the minimum alignment supported is 64B in this case, this field must be 0.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Slave-Bridge

If slave reads and writes are both valid, the IP prioritizes reads over writes. It is recommended that you implement proper arbitration, for example by leaving gaps between reads so that writes can pass through.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Slave-Address-Translation-Examples

The slave bridge does not support narrow burst AXI transfers. To avoid narrow burst transfers, connect an AXI SmartConnect module, which converts narrow burst transfers to full burst AXI transfers.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Function-Map-Table

Along with FMAP table programming in the IP, you must program the FMAP table in the Mailbox IP. This is needed for function level reset (FLR) procedure.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Queue-Setup

Set-up Completion Context. If interrupts/status writes are desired (enabled in the Completion Context), an initial Completion CIDX update is required to send the hardware into a state where it is sensitive to trigger conditions. This initial CIDX update is required, because when out of reset, the hardware initializes into an unarmed state.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Function-Level-Reset

When a VF is reset, only the resources associated with this VF are reset. When a PF is reset, all resources of the PF, including that of its associated VFs, are reset. Because FLR is a privileged operation, it must be performed by the PF driver running in the management system.
Quiesce: The software must ensure that all pending transactions are completed. This can be done by polling the Transaction Pending bit in the Device Status register (in PCIe Configuration Space) until it is cleared or a timeout expires.
Initiate Function Level Reset bit (bit 15 of PCIe Device Control Register) of the target function should be set to 1 to trigger FLR process in PCIe.
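
The quiesce-then-reset order can be sketched as a polling loop. `FakeConfigSpace` is a made-up stand-in for configuration space access, and the timeout values are arbitrary. The Transactions Pending position (bit 5 of Device Status) is taken from the PCIe specification; bit 15 of Device Control is as stated above.

```python
import time

TRANSACTIONS_PENDING = 1 << 5  # Device Status register, per the PCIe specification
INITIATE_FLR = 1 << 15         # Device Control register, bit 15 as stated above


class FakeConfigSpace:
    """Hypothetical config-space model: reports pending for a few reads."""

    def __init__(self, pending_reads: int):
        self.pending_reads = pending_reads
        self.device_control = 0

    def read_device_status(self) -> int:
        if self.pending_reads > 0:
            self.pending_reads -= 1
            return TRANSACTIONS_PENDING
        return 0

    def read_device_control(self) -> int:
        return self.device_control

    def write_device_control(self, value: int) -> None:
        self.device_control = value


def quiesce_and_reset(cfg, timeout_s: float = 1.0, poll_s: float = 0.01) -> None:
    """Poll Transactions Pending until clear or timeout, then trigger FLR."""
    deadline = time.monotonic() + timeout_s
    while cfg.read_device_status() & TRANSACTIONS_PENDING:
        if time.monotonic() >= deadline:
            break  # timed out; the procedure above proceeds in this case too
        time.sleep(poll_s)
    cfg.write_device_control(cfg.read_device_control() | INITIATE_FLR)


cfg = FakeConfigSpace(pending_reads=3)
quiesce_and_reset(cfg, poll_s=0.0)
print(hex(cfg.device_control))
```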

Note

https://docs.amd.com/r/en-US/pg302-qdma/Host-Profile

Host profile must be programmed to represent root port host. Host profile can be programmed through context programming.
The H2C AXI4-MM steering bit and the C2H AXI4-MM steering bit should both be set to 0. If not, DMA AXI4-MM transfers do not work. In most cases the host profile context structure is all 0s, but the host profile must still be programmed to represent a host.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Resets

After soft_reset, you must reinitialize the queues and program all queue context.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Expansion-ROM

The maximum size for the Expansion ROM BAR should be no larger than 16 MB. Selecting an address space larger than 16 MB can result in a non-compliant core.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Data-Path-Errors

Parity errors are not recoverable and can result in unexpected behavior. Any DMA during and after the parity error should be considered invalid.
If there is a parity error and the transfer hangs or stops, the DMA logs the error. You must investigate and fix the parity issues. Once the issues are fixed, clear that queue and reopen it to start a new transfer.

Note

https://docs.amd.com/r/en-US/pg302-qdma/QDMA-Global-Ports

sys_clk should be driven by the ODIV2 port of reference clock IBUFDS_GTE4.
PCIe reference clock should be driven from the O port of reference clock IBUFDS_GTE4.

Note

https://docs.amd.com/r/en-US/pg302-qdma/AXI-Bridge-Slave-Ports

Only the INCR burst type is supported.
s_axib_wstrb can be equal to 0 in the beginning of a valid data cycle and will appropriately calculate an offset to the given address. However, the valid data identified by s_axib_wstrb must be continuous from the first byte enable to the last byte enable.
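
The continuity requirement on s_axib_wstrb can be tested with a short bit trick, shown here as a software check rather than RTL. The function name is hypothetical. A strobe with no bytes enabled is reported as contiguous because it contains no gap; whether the bridge accepts such a beat is outside this sketch.

```python
def wstrb_is_contiguous(wstrb: int) -> bool:
    """Return True if the enabled bytes form one unbroken run."""
    if wstrb < 0:
        raise ValueError("wstrb must be non-negative")
    if wstrb == 0:
        return True
    lowest = wstrb & -wstrb       # isolate the first enabled byte
    shifted = wstrb // lowest     # drop the leading disabled bytes
    # A single run of ones plus one is a power of two, sharing no bits with the run.
    return (shifted & (shifted + 1)) == 0


print(wstrb_is_contiguous(0b00111100))  # leading bytes disabled, one run
print(wstrb_is_contiguous(0b00101100))  # gap inside the enabled bytes
```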

Note

https://docs.amd.com/r/en-US/pg302-qdma/QDMA-Descriptor-Bypass-Input-Ports

QDMA hangs if the last descriptor without h2c_byp_in_st_sdi has an error. This results in a missing writeback and hw_ctxt.dsc_pend bit that are asserted indefinitely.
For performance reasons, AMD recommends that h2c_byp_in_st_sdi be asserted once every 32 or 64 descriptors, and at the last descriptor if there are no more descriptors left.
In Cache Bypass mode, you must loop back c2h_byp_out_pfch_tag[6:0] to c2h_byp_in_st_csh_pfch_tag[6:0]. In Simple Bypass mode, you need to pass in the Prefetch tag value from MDMA_C2H_PFCH_BYP_TAG (0x140C) register.
AXI4-Stream C2H Simple Bypass mode and Cache Bypass mode both use the same bypass ports, c2h_byp_in_st_csh_*.

Note

https://docs.amd.com/r/en-US/pg302-qdma/QDMA-Descriptor-Bypass-Output-Ports

h2c_byp_out_rdy: When this interface is not used, Ready must be tied-off to 1.
h2c_byp_out_cidx [15:0]: The ring index of the descriptor fetched. The User must echo this field back to QDMA when submitting the descriptor on the bypass-in interface.
c2h_byp_out_cidx [15:0]: The ring index of the descriptor fetched. The User must echo this field back to QDMA when submitting the descriptor on the bypass-in interface.
c2h_byp_out_rdy: When this interface is not used, Ready must be tied-off to 1.
When Descriptor bypass option is selected in the AMD Vivado™ IDE but the descriptor bypass bit is not set in context programming, you will see valid signals getting asserted with CIDX updates.

Note

https://docs.amd.com/r/en-US/pg302-qdma/QDMA-Descriptor-Credit-Input-Ports

dsc_crdt_in_fence: The fence bit should only be set for a queue that is enabled and has both descriptors and credits available; otherwise, a hang condition might occur.

Note

https://docs.amd.com/r/en-US/pg302-qdma/QDMA-Traffic-Manager-Credit-Output-Ports

tm_dsc_sts_rdy: When this interface is not used, Ready must be tied-off to 1.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Queue-Status-Ports

qsts_out_rdy: Ready must be tied to 1 so status output will not be blocked. Even if this interface is not used, the ready port must be tied to 1.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Registering-Signals

To simplify timing and increase system performance in a programmable device design, keep all inputs and outputs registered between the user application and the subsystem. This means that all inputs and outputs from the user application should come from, or connect to, a flip-flop. While registering signals might not be possible for all paths, it simplifies timing analysis and makes it easier for the AMD tools to place and route the design.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Recognize-Timing-Critical-Signals

The constraints provided with the example design identify the critical signals and timing constraints that should be applied.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Make-Only-Allowed-Modifications

You should not modify the subsystem. Any modifications can have adverse effects on system timing and protocol compliance. Supported user configurations of the subsystem can only be made by selecting the options in the customization IP dialog box when the subsystem is generated.

Note

https://docs.amd.com/r/en-US/pg302-qdma/AXI-BARs-Tab

No Address Translation: When this option is selected, the DMA will not do any address translation. One full 64-bit BAR space is provided, and you are responsible for any address translation if required. When address translation is required by DMA, do not select this option.

Note

https://docs.amd.com/r/en-US/pg302-qdma/PCIe-DMA-Tab

CMPT Coalesce Max buffer:
- Completion (CMPT) Coalesce Max buffer supports up to 64 buffers. Select one of 16 or 32 (default 16). Each entry of the CMPT Coalesce Buffer coalesces multiple Completions (up to 64B) to form a single queue before writing to the host to improve bandwidth utilization. A deeper CMPT Coalesce Buffer allows coalescing within more queues but will increase the area as a downside.
Data Protection:
- When Data Protection is not enabled:
  
  You must always give the parity on CMPT.
- When Data Protection is enabled:
  
  You must send CRC/ECC values on C2H data and the control interface.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Example-Design-with-Descriptor-Bypass-In/Out-Loopback

After the initial C2H stream data transfer setup, the prefetch tag remains valid as long as the qid is valid. When the current qid becomes invalid, you must generate a new tag.

Note

https://docs.amd.com/r/en-US/pg302-qdma/Using-the-Drivers

Note: Starting from the 2022.1 release of the Linux driver for QDMA, if a design uses streaming queues, they must be explicitly enabled through the API because they are not configured at module load.