5.2. Hardware Architecture of the Platform

This chapter describes the targeted reference design (TRD) hardware architecture. The following figure shows a block diagram of the design components inside the Versal ACAP on the VCK190 board. See the VCK190 Evaluation Board User Guide (UG1366) for more information.

Hardware Block Diagram

At a high level, the design comprises three pipelines:

Capture/input pipeline:

  • USB capture pipeline (PS)

  • MIPI CSI-2 Rx capture pipeline (FMC + PL)

  • Video Input (file read) from x86 host via PCIe

Processing Pipeline:

  • Video processing accelerator functions

Display/Output Pipeline:

  • HDMI TX display pipeline

  • Video Output to x86 host via PCIe

The block diagram comprises two parts: platform and accelerators.

Platform:

The platform consists of I/O interfaces and their data motion network. This is the fixed part of the design. It comprises single-sensor MIPI CSI-2 Rx (capture), USB-UVC (capture), and HDMI Tx (display).

Accelerators:

This block can perform different video processing functions from computer vision or machine learning. This is the variable part of the design. The accelerator and its corresponding data/control interfaces (AXI-MM, AXI-Lite, interrupts) are generated by the Vitis tool and integrated into the platform.

5.2.1. Capture Pipeline

5.2.1.1. Single Sensor MIPI Capture

A capture pipeline receives frames from an external source and writes them into memory. The single sensor MIPI CSI-2 receiver capture pipeline is shown in the following figure.

Single Sensor MIPI capture

This pipeline consists of five components, of which four are controlled by the APU via an AXI-Lite based register interface; one is controlled by the APU via an I2C register interface.

  • The Sony IMX274 is a 1/2.5 inch CMOS digital image sensor with an active imaging pixel array of 3864H x 2196V. The image sensor is controlled via an I2C interface using an AXI I2C controller in the PL. It is mounted on an FMC daughter card and has a MIPI output interface that is connected to the MIPI CSI-2 RX subsystem inside the PL. For more information refer to the LI-IMX274MIPI-FMC datasheet.

  • The MIPI CSI-2 receiver subsystem (CSI Rx) includes a MIPI D-PHY core that connects four data lanes and one clock lane to the sensor on the FMC card. It implements a CSI-2 receive interface according to the MIPI CSI-2 standard v2.0 with underlying MIPI D-PHY standard v1.2. The subsystem captures images from the IMX274 sensor in RAW10 format and outputs AXI4-Stream video data. For more information see the MIPI CSI-2 Receiver Subsystem Product Guide (PG232).

  • The HDR extract block takes a single digital overlapped frame from the sensor as input and returns two exposure frames: a short exposure frame and a long exposure frame. For more information on this function refer to Vitis Vision Libraries HDR Extract exposure frames.

  • Creating high dynamic range (HDR) images requires at least two frames captured with different exposure times. The HDR Merge module generates the HDR frame from these differently exposed frames. HDR merge in the RGB domain is complex and expensive in terms of latency because of the camera response function, so the current module works in the Bayer domain. For information on this function refer to Vitis Vision Libraries HDR Merge.

  • The Image Sensor Processing (ISP) IP is available in the Vitis Vision libraries (https://github.com/Xilinx/Vitis_Libraries/tree/master/vision/L1). The IP receives the RAW10 AXI4-Stream input data and interpolates the missing color components for every pixel to generate a 24-bit, 8 bits per component (8 bpc) RGB output image transported via AXI4-Stream. At 4 ppc, the AXIS width is 96 bits. A GPIO from the PS is used to reset the IP between resolution changes. For information on the functions it implements refer to Vitis Vision Libraries Image Sensor Processing pipeline. The ISP IP consists of the following functions:

    • The Badpixelcorrection module removes defective pixels from the image. An image sensor may have a certain number of defective/bad pixels that result from manufacturing faults or from variations in pixel voltage levels with temperature or exposure.

    • The Gain control module improves the overall brightness of the input image by applying a multiplicative gain (weight) for the red and blue channels to the input bayerized image.

    • The Demosaicing module converts the single-plane Bayer pattern output of a digital camera sensor to a color image.

    • The histogram module computes the histogram of the given input image. The normalization module changes the range of pixel intensity values. Both modules are used to improve the contrast in the image.

  • The video processing subsystem (VPSS), see Video Processing Subsystem Product Guide (PG231), is a collection of video processing IP subcores. This instance uses the scaler-only configuration, which provides scaling, color space conversion, and chroma resampling functionality. The VPSS takes AXI4-Stream input data in 24-bit RGB format and converts it to a 16-bit, 8 bpc YUV 4:2:2 output format. The following figure shows the AXIS data interface at 4 ppc. A GPIO pin from the PS is used to reset the subsystem between resolution changes.

AXI-Stream Data Bus Encoding

  • The video frame buffer, see Video Frame Buffer Read and Video Frame Buffer Write LogiCORE IP Product Guide (PG278), takes YUV 4:2:2 sub-sampled AXI4-Stream input data and converts it to AXI4-MM format, which is written to memory as 16-bit packed YUYV. The AXI-MM interface is connected to the system DDR via the NOC. For each video frame transfer, an interrupt is generated. A GPIO is used to reset the IP between resolution changes.
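To make the 16-bit packed YUYV memory layout concrete, the following Python sketch packs a short line of pixels and computes the size of one frame. The byte order Y0 U0 Y1 V0 is the conventional YUYV ordering. The helper names and the choice to keep the chroma of the first pixel in each pair are assumptions made for illustration; they are not taken from the IP, which may resample chroma differently.

```python
def pack_yuyv(pixels):
    """Pack an even-length list of 8-bit (Y, U, V) tuples into YUYV bytes.

    In 4:2:2, each horizontal pixel pair shares one U and one V sample.
    Assumption: the chroma of the first pixel in each pair is kept as-is.
    """
    if len(pixels) % 2:
        raise ValueError("YUYV needs an even number of pixels per line")
    out = bytearray()
    for i in range(0, len(pixels), 2):
        y0, u0, v0 = pixels[i]
        y1, _, _ = pixels[i + 1]
        out += bytes((y0, u0, y1, v0))  # 4 bytes carry 2 pixels
    return bytes(out)


def frame_bytes(width, height, bytes_per_pixel=2):
    """Size of one packed frame in memory (2 bytes per pixel for YUYV)."""
    return width * height * bytes_per_pixel


line = [(16, 128, 128), (235, 128, 128), (81, 90, 240), (81, 90, 240)]
packed = pack_yuyv(line)
print(packed.hex(" "))          # 10 80 eb 80 51 5a 51 f0
print(frame_bytes(3840, 2160))  # 16588800
```

Four pixels occupy eight bytes, so a 3840x2160 frame takes roughly 16.6 MB of DDR per buffer.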

All the IPs in this pipeline are configured to transport 4 ppc @ 150 MHz, enabling up to 3840x2160 resolution at 30 HDR frames per second (fps). Each HDR frame is a composite of a long exposure frame and a short exposure frame, so the effective frame rate through the pipeline is 60 fps.

  • Time to transfer one frame: (3840 + 560) x (2160 + 90) / (150 MHz x 4 ppc) = 0.0165 s (16.5 ms)

  • Number of frames transferred per second = 1 / 0.0165 s ≈ 60 frames

Note: In this calculation the horizontal blanking accounts for 560 pixels per line and the vertical blanking for 90 lines per video frame.
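The throughput estimate can be checked with explicit units. The sketch below evaluates the same expression in Python, with 560 blanking pixels added to each line and 90 blanking lines added to each frame; the function and variable names are illustrative only.

```python
def frame_time_s(h_active, h_blank, v_active, v_blank, clk_hz, ppc):
    """Seconds needed to stream one frame, blanking included."""
    total_pixels = (h_active + h_blank) * (v_active + v_blank)
    return total_pixels / (clk_hz * ppc)


t = frame_time_s(3840, 560, 2160, 90, clk_hz=150e6, ppc=4)
fps = 1 / t          # exposure frames per second through the pipeline
hdr_fps = fps / 2    # each HDR frame merges a long and a short exposure

print(f"{t * 1e3:.1f} ms per frame, {fps:.1f} frames/s, {hdr_fps:.1f} HDR frames/s")
# 16.5 ms per frame, 60.6 frames/s, 30.3 HDR frames/s
```

The result of about 60.6 frames/s leaves a small margin over the 60 exposure frames per second needed for 30 HDR frames per second.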

The video resolution, frame format and frame rate are set via register writes through the AXI-Lite interface of the IPs at run-time. The drivers for the above blocks provide APIs to set these values in a user application.

  • For the pass-through design (no accelerator), the user can choose between 720p60, 1080p60, and 2160p30.

5.2.2. Display Pipeline

An output pipeline reads video frames from memory and sends the frames to a sink. In this case the sink is a display and therefore this pipeline is also referred to as a display pipeline. The HDMI display pipeline is shown in the following figure.

HDMI transmit

This pipeline consists of three main components, all of them controlled by the APU via an AXI-Lite based register interface:

  • The video mixer IP core is configured to support blending of up to eight overlay layers, whose AXI4 interfaces are connected to the NOC via two interconnects. Two interconnects are required to reduce arbitration across ports. The main AXI-MM layer has its resolution set to match the display. The other layers, whatever their resolution, are blended with this layer. Four video layers are configured for YUYV and the other four are configured for RGB. The AXI4-Stream output interface is a 96-bit bus that transports 4 ppc for up to 2160p60 performance. It is connected to the HDMI Tx subsystem input interface. A GPIO is used to reset the subsystem between resolution changes. For more information refer to the Video Mixer LogiCORE IP Product Guide (PG243).

Note: The mixer configuration remains the same for different capture sources. To enable/disable various layers, software programs the layer enable register in the IP.

  • The HDMI transmitter subsystem (HDMI Tx) interfaces with PHY layers and provides HDMI encoding functionality. The subsystem is a hierarchical IP that bundles a collection of HDMI TX-related IP sub-cores and outputs them as a single IP. The subsystem generates an HDMI stream from the incoming AXI4-Stream video data and sends the generated link data to the video PHY layer. For more information refer to the HDMI 1.4/2.0 Transmitter Subsystem Product Guide (PG235).

  • The HDMI GT controller and PHY (GT) enables plug-and-play connectivity with the video transmit or receive subsystems. The interface between the media access control (MAC) and physical (PHY) layers is standardized to enable ease of use in accessing shared gigabit-transceiver (GT) resources. The data recovery unit (DRU) is used to support lower line rates for the HDMI protocol. An AXI4-Lite register interface is provided to enable dynamic access to transceiver controls/status. For more information refer to the HDMI GT Controller LogiCORE IP Product Guide (PG334).

  • The HDMI re-timer converts serial HDMI output signals to transition minimized differential signals (TMDS) compliant with HDMI signaling. For more information refer to the SNx5DP159 datasheet.
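As an illustration of how software might select mixer layers through the layer enable register mentioned in the note above, the sketch below builds an enable bitmask for the main layer and up to eight overlays. The bit layout (bit 0 for the main layer, bits 1 to 8 for overlay layers 1 to 8) and the function name are assumptions made for this example, not values taken from PG243; consult the product guide for the actual register map.

```python
NUM_OVERLAYS = 8  # from the text: up to eight overlay layers


def layer_enable_mask(main=True, overlays=()):
    """Build a layer-enable bitmask.

    Hypothetical layout: bit 0 = main layer, bit n = overlay layer n.
    """
    mask = 1 if main else 0
    for n in overlays:
        if not 1 <= n <= NUM_OVERLAYS:
            raise ValueError(f"overlay index {n} out of range 1..{NUM_OVERLAYS}")
        mask |= 1 << n
    return mask


# Enable the main layer plus overlays 1 and 5.
mask = layer_enable_mask(main=True, overlays=[1, 5])
print(f"0x{mask:03x}")  # 0x023
```

Because the mixer hardware configuration is fixed, switching capture sources only changes which bits are set, not the design itself.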

5.2.3. CPM-PCIe Capture & Display

The integrated block for PCIe Rev. 4.0 with DMA and CCIX Rev. 1.0 (CPM), which includes the DMA (QDMA) and two PCIe controllers (0 and 1), is hardened in Versal ACAP devices. PCIe controller 0, configured in Gen4 x8 mode, transfers data from the host (x86) to the endpoint (VCK190) and vice versa. On the host-to-endpoint channel, video frames received from the host are written to DDR by the QDMA via the network on chip (NOC). On the endpoint-to-host channel, processed video frames from DDR are transferred to the host and displayed on a monitor.
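For a rough sense of the headroom a Gen4 x8 link offers for video, the following Python sketch compares the link's raw data rate with an uncompressed video stream. PCIe Gen4 signals at 16 GT/s per lane with 128b/130b encoding. The stream parameters (3840x2160, 2 bytes per pixel, 60 frames per second) are assumptions chosen for illustration, and packet and protocol overhead are ignored.

```python
def pcie_raw_bytes_per_s(gt_per_s, lanes, enc_payload=128, enc_total=130):
    """Raw link rate per direction after line encoding, in bytes/s."""
    return gt_per_s * lanes * enc_payload / enc_total / 8


def video_bytes_per_s(width, height, bytes_per_pixel, fps):
    """Bandwidth of an uncompressed video stream, in bytes/s."""
    return width * height * bytes_per_pixel * fps


link = pcie_raw_bytes_per_s(16e9, lanes=8)
video = video_bytes_per_s(3840, 2160, bytes_per_pixel=2, fps=60)

print(f"link ~{link / 1e9:.2f} GB/s per direction, stream ~{video / 1e9:.2f} GB/s")
# link ~15.75 GB/s per direction, stream ~1.00 GB/s
```

Under these assumptions a single stream uses well under a tenth of the raw link rate in each direction.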

The QDMA buffer descriptors that move data in the host-to-endpoint (H2C) and endpoint-to-host (C2H) directions are configured by the host via the PCIe interface. The driver for the block provides APIs to set these values in a user application.

For more information on CPM-PCIe & QDMA please refer to Versal ACAP CPM DMA and Bridge Mode for PCI Express v2.1 Product Guide (PG347).

5.2.4. PCIe User Space Register

For handshaking between the host and endpoint applications, the user space register IP provides a set of registers.

There are a total of 32 registers, each 4 bytes wide, starting from offset 0x00 that have read/write access from the PS. The registers are byte addressed, so the address of the second register is the address of the first one plus four.

Similarly, there are 32 registers, each 4 bytes wide, that have read/write access from the host. The detailed mapping of these registers is documented in the PCIe EP driver section of the software architecture.
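The addressing rule for the PS-accessible bank can be written as a small helper. This is a minimal sketch: the function and constant names are illustrative, and the base offset of the host-accessible bank is not stated here, so it is left as a parameter rather than assumed.

```python
NUM_REGS = 32    # registers per bank
REG_BYTES = 4    # each register is 4 bytes wide


def reg_offset(index, base=0x00):
    """Byte offset of register `index` within a bank starting at `base`."""
    if not 0 <= index < NUM_REGS:
        raise IndexError(f"register index {index} out of range 0..{NUM_REGS - 1}")
    return base + index * REG_BYTES


print([hex(reg_offset(i)) for i in (0, 1, 31)])  # ['0x0', '0x4', '0x7c']
```

The 32 registers of one bank therefore span 128 bytes, from offset 0x00 through 0x7F.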

5.2.5. Clocks, Resets and Interrupts

The following table lists the clock frequencies of key ACAP components and memory. For more information refer to the Versal ACAP Technical Reference Manual (AM011).

Table 1: Key Component Clock Frequencies

Component    Clock Frequency
ACPU         1,000 MHz
NOC          950 MHz
NPI          300 MHz
LPDDR        1,600 MHz
AIE          1,000 MHz

The following table identifies the main clocks of the PL design, their source, their clock frequency, and their function.

Table 2: System Clocks

Clock                        Clock Source               Clock Frequency   Function
pl0_ref_clk                  CIPS                       100 MHz           Clock source for clocking wizard.
clk_out1                     Clocking wizard            150 MHz           AXI MM clock and AXI Stream clock used in the capture pipeline of platform2, the display pipeline, and the processing pipeline.
clk_out2                     Clocking wizard            105 MHz           AXI-Lite clock to configure the different IPs in the design.
clk_out3                     Clocking wizard            200 MHz           MIPI D-PHY core clock. Also the AXI MM clock and AXI Stream clock used in the capture pipeline of platform2.
sys_clk0                     Si570 (External)           200 MHz           Differential clock source used internally by the memory controller to generate various clocks to access DDR memory.
HDMI DRU clock               Si570 (External)           200 MHz           Clock for data recovery unit for low line rates.
HDMI GT TX reference clock   IDT 8T49N241 (External)    Variable          GT transmit clock source to support various HDMI resolutions.
HDMI GT RX reference clock   Si570 (External)           Variable          GT receive clock to support various HDMI resolutions.
Audio clock                  Si570 (External)           Variable          Master reference clock to generate audio stream at the required sampling rate.

The PL0 clock is provided by the PLL inside the PMC domain and is used as the reference input clock for the clocking wizard instance. This clock does not drive any loads directly. A clocking wizard instance is used to de-skew the clock and to provide three phase-aligned output clocks, clk_out1, clk_out2 and clk_out3.

The clk_out2 clock is used to drive most of the AXI-Lite control interfaces of the IPs in the PL. AXI-Lite interfaces are typically used to configure registers and therefore can operate at a lower frequency than data path interfaces. The exception is the AXI-Lite interfaces of HLS-based IP cores, where the control and data planes use either clk_out1 or clk_out3.

The clk_out1 clock drives the AXI MM interfaces and AXI Stream interfaces of the display pipeline and processing pipeline. It also drives AXI MM interfaces and AXI Stream interfaces of the capture pipeline of platform2. The clk_out3 clock drives the AXI MM interfaces and AXI Stream interfaces of the capture pipeline in platform1.

For details on HDMI Tx and HDMI GT clocking structure and requirements refer to HDMI 1.4/2.0 Transmitter Subsystem Product Guide (PG235) and HDMI GT Controller LogiCORE IP Product Guide (PG334). For HDMI Tx, an external clock chip is used to generate the GT reference clock depending on the display resolution. Various other HDMI related clocks are derived from the GT reference clock and generated internally by the HDMI GT controller; only for the DRU a fixed reference clock is provided externally by a Si570 clock chip.

For details on the various clock chips used refer to the VCK190 Evaluation Board User Guide (UG1366).

The master reset (pl0_resetn) is generated by the PS during boot and is used as input to the four processor system reset (proc_sys_reset) modules in the PL. Each module generates synchronous, active-Low and active-High interconnect and peripheral resets that drive all IP cores synchronous to the respective clk_out1, clk_out2, and clk_out3 clock domains.

Apart from these system resets, there are asynchronous resets driven by PS GPIO pins. The respective device drivers control these resets, which can be toggled at run-time to reset HLS-based cores. The following table summarizes the PL resets used in this design.

Table 3: PL Resets

Reset Source           Purpose
pl0_resetn             PL reset for proc_sys_reset modules
rst_processor_150MHz   Synchronous resets for clk_out1 clock domain
rst_processor_105MHz   Synchronous resets for clk_out2 clock domain
rst_processor_200MHz   Synchronous resets for clk_out3 clock domain
lpd_gpio_o 0           Asynchronous reset for the video mixer IP
lpd_gpio_o 1           Asynchronous reset for the HDR extract IP
lpd_gpio_o 2           Asynchronous reset for the HDR merge IP
lpd_gpio_o 3           Asynchronous reset for the demosaic IP
lpd_gpio_o 4           Asynchronous reset for the VPSS CSC IP
lpd_gpio_o 5           Asynchronous reset for the frame buffer write IP
lpd_gpio_o 6           Asynchronous reset for the sensor GPIO
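The reset module names encode a frequency, which can be cross-checked against the clocking wizard outputs listed in Table 2 (clk_out1 at 150 MHz, clk_out2 at 105 MHz, clk_out3 at 200 MHz). The Python sketch below performs that lookup. It assumes the rst_processor_<freq>MHz naming reliably reflects the clock domain each module is tied to; the function name is illustrative.

```python
import re

# Clocking wizard outputs and their frequencies in MHz, from Table 2.
PL_CLOCKS_MHZ = {"clk_out1": 150, "clk_out2": 105, "clk_out3": 200}


def clock_for_reset(reset_name):
    """Return the PL clock whose frequency matches the reset module name."""
    m = re.search(r"_(\d+)MHz$", reset_name)
    if m is None:
        raise ValueError(f"no frequency suffix in {reset_name!r}")
    freq = int(m.group(1))
    matches = [clk for clk, mhz in PL_CLOCKS_MHZ.items() if mhz == freq]
    if len(matches) != 1:
        raise LookupError(f"no unique {freq} MHz clock for {reset_name!r}")
    return matches[0]


for rst in ("rst_processor_150MHz", "rst_processor_105MHz", "rst_processor_200MHz"):
    print(rst, "->", clock_for_reset(rst))
# rst_processor_150MHz -> clk_out1
# rst_processor_105MHz -> clk_out2
# rst_processor_200MHz -> clk_out3
```

A lookup by frequency works here only because the three clocking wizard outputs have distinct frequencies.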

The following table lists the PL-to-PS interrupts used in this design.

Table 4: Interrupts

Interrupt ID   Instance
pl_ps_irq0     HDMI GT Controller
pl_ps_irq1     HDMI Tx subsystem
pl_ps_irq2     Video Mixer
pl_ps_irq3     HDMI I2C
pl_ps_irq4     AXI Performance Monitor
pl_ps_irq5     Audio formatter memory-mapped to stream
pl_ps_irq6     MIPI RX subsystem
pl_ps_irq7     MIPI I2C
pl_ps_irq8     Frame buffer write interrupt


Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.