Power test case description

The goal of this test case is to control the card power consumption. This is achieved by adjusting the toggle_rate of the clock that drives all flip-flops, DSPs, block RAMs, UltraRAMs and AIEs present in the Power xbtest hardware IP (xbtIP) of the Alveo Versal Example Design (AVED). Sensor values (power, temperature, power rail current/voltage and fan speed) are read every second using the AVED Management Interface (AMI) APIs.

Test parameters

The mandatory test configuration parameters are duration and toggle_rate, both set for each test in test_sequence. For more information, see Power test JSON members.

Main test steps

The measurement of the card rest power, which lasts a few seconds, is always performed when the Power test case starts. For each test configuration, the following steps are repeated:

  1. The toggle_rate is applied to the Power xbtIP.

  2. For the defined duration, sensor values are read and reported every second.

  3. When the test completes, it always passes because no checks are made on the consumed power. The user is responsible for monitoring the consumed power, which is displayed by xbtest application software (xbtSW).

Warning

The Power xbtIP has been intentionally designed to exceed the power capacity of the card. You might damage your card and cause your server/workstation to reboot if, for example, you:

  • Use a high toggle_rate.

  • Use a particularly demanding test sequence (for example, alternating between a low and a high toggle_rate for short periods of time).

Important

xbtest reports the total power consumed by the card (for actively cooled cards, fan speed is also reported).

Power and temperature limits

To limit potential damage to the Alveo™ card in cases of accidental misuse or demanding test environmental conditions, the following basic safety mechanisms are in place.

  • Temperature Limit: A critical warning is generated when the temperature limit is reached.

  • Power Limit: A critical warning is generated when the power limit is reached.

Temperature and power limits are defined in the Card definition JSON file.

The list of sensors monitored is available in the Device management task description.

Power budget and calibration

To establish the relationship between toggle_rate and power, a simple calibration method can be used. For example, start from 0 %, increase the toggle_rate in steps of 5 %, and at each step let the power and temperature stabilize for two minutes. Several factors, described below, affect this relationship.

Important

  • xbtest always reports the total power of the card obtained via the AVED Management Interface (AMI) APIs.

  • Ensure that the environmental conditions (for example, temperature) used during calibration are similar to the conditions used in testing.
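The calibration sweep described above can be written as a test_sequence. The following Python sketch generates one, assuming a dwell time of 120 seconds per step and a user-chosen upper bound; the function name and the max_rate parameter are hypothetical and not part of xbtest. Choose max_rate conservatively, since a high toggle_rate can exceed the power capacity of the card.

```python
import json


def calibration_sequence(max_rate, step=5, dwell_s=120):
    """Build a test_sequence sweeping toggle_rate from 0 % up to max_rate.

    max_rate is a hypothetical, user-chosen ceiling (not an xbtest member).
    If max_rate is not a multiple of step, the sweep stops below max_rate.
    """
    if not 0 <= max_rate <= 100:
        raise ValueError("max_rate must be within [0, 100]")
    if step <= 0:
        raise ValueError("step must be positive")
    return [
        {"duration": dwell_s, "toggle_rate": rate}
        for rate in range(0, max_rate + 1, step)
    ]


# Fragment to place in the test JSON file (sweep limited to 30 % here).
config = {"power": {"global_config": {"test_sequence": calibration_sequence(30)}}}
print(json.dumps(config, indent=2))
```

The resulting sequence can then be run while watching the power reported by xbtSW, stopping well before the limits defined for the card.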

Toggle rate step requirement

By default, xbtest limits toggle rate steps to 10 % per second because most power regulators (and the FPGA) have a step load requirement.

For example, if the toggle_rate is initially set to 25 %, it will take 4 seconds to set a new target toggle_rate of 65 % as the actual toggle rate will gradually increase every second: 35 %, 45 %, 55 % and finally 65 %.
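The ramp in this example can be reproduced with a few lines of Python. This is only a model of the documented behavior, not xbtest code: the function name is hypothetical, and the handling of a final step smaller than 10 % is an assumption.

```python
def ramp_steps(current, target, max_step=10):
    """Return the toggle_rate applied at each second while ramping to target.

    Assumption: when the remaining distance is smaller than max_step,
    the last step lands exactly on the target.
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    steps = []
    rate = current
    while rate != target:
        if abs(target - rate) <= max_step:
            rate = target
        else:
            rate += max_step if target > rate else -max_step
        steps.append(rate)
    return steps


print(ramp_steps(25, 65))  # [35, 45, 55, 65]
```

The length of the returned list is the number of seconds the ramp takes, which is useful when choosing a duration long enough for the target toggle_rate to be held.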

This ramp can be disabled using the parameter disable_toggle_ramp. When disabled, ensure that toggle_rate never steps (down or up) by more than around 20 % per second.
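Before disabling the ramp, a test_sequence can be checked for oversized steps. The sketch below is a hypothetical helper, not part of xbtest; it assumes the sequence starts from a toggle_rate of 0 % (the rest power measurement) and treats the "around 20 %" guideline as a hard limit of 20.

```python
def oversized_steps(test_sequence, limit=20, start_rate=0):
    """Return (test index, previous rate, new rate) for each step above limit.

    Test indices start at 1, matching the Test column of power.csv.
    """
    issues = []
    previous = start_rate
    for index, test in enumerate(test_sequence, start=1):
        rate = test["toggle_rate"]
        if abs(rate - previous) > limit:
            issues.append((index, previous, rate))
        previous = rate
    return issues


sequence = [
    {"duration": 10, "toggle_rate": 15},
    {"duration": 10, "toggle_rate": 50},
]
print(oversized_steps(sequence))  # [(2, 15, 50)]
```

An empty result means that no transition in the sequence exceeds the chosen limit.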

Actual power available

The distribution of the power across the various regulators also limits how much power is available for xbtest to control. For example, on an Alveo U50 card, although the power budget of the card is 75 W, up to 10 W are reserved for the HBM. This means that the Power xbtest hardware IP (xbtIP) can only control up to 65 W. It also means that the Memory xbtIP must be in use to reach a card power consumption higher than 65 W.

Moreover, the total power budget of the card is not entirely available. The actual power available is affected by the efficiency and current limits of the various regulators. For information about the sensor and power rail limits, refer to the specific documentation for your card. In the same example (U50 card), the actual power thresholds are lower than 65 W and 10 W.
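The arithmetic behind the U50 example is shown below as a short Python sketch. The function name is hypothetical, and the result is only an upper bound: it ignores the regulator efficiency and current limits mentioned above.

```python
def controllable_power_w(card_budget_w, reserved_w):
    """Upper bound (in watts) on the power the Power xbtIP can control."""
    if reserved_w > card_budget_w:
        raise ValueError("reserved power exceeds the card budget")
    return card_budget_w - reserved_w


# Alveo U50 figures from the text: 75 W budget, up to 10 W reserved for HBM.
print(controllable_power_w(75, 10))  # 65
```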

Components present on the card

The card might be fitted with other ICs (such as co-processors, memories, and so forth) over which xbtest has no control. During calibration, ensure these components behave similarly to normal test operations.

Note

xbtest can only control the power of memory directly connected to the FPGA.

Tests running

Other test cases, like memory (DDR or HBM) and GT MAC, have a significant impact on the power consumed. Make sure that the calibration is done while these other features are running at their nominal load.

  • DDR: When running four DDRs simultaneously, the memory test consumes approximately 20 W (write mode) or 15 W (read mode).

  • HBM: For example, when eight HBM ports are used, the memory test consumes between 7 W and 8 W.

  • GT: When two GT MAC xbtest hardware IPs (xbtIP) are present, the 25 GbE mode uses approximately 6 W more than the 10 GbE mode.

  • Logic: Memory and GT xbtIP, and the memory subsystem also consume several watts.

Note

These values are indicative and might vary from card to card. They also depend on test environmental conditions, such as cooling.

Care must be taken when mixing test case types or when changing the mode of other tests while the power test is running. For example, power varies when the memory test changes from only_rd to only_wr mode. Power will decrease when a Memory test case ends. simultaneous_wr_rd mode is usually the memory test mode consuming the most power.

Power test JSON members

Example

The following is an example of a Power test case running for 60 seconds at a toggle_rate of 15 %. If needed, change the values to run at higher or lower toggle rates. The JSON files are available at /opt/amd/aved/amd_v80_gen5x8_24.1_xbtest_stress/xbtest/test/.

"power": {
  "global_config": {
    "test_sequence": [ { "duration": 60, "toggle_rate": 15 } ]
  }
}

Definition

The following table shows all members available for this test case. More details are provided for each member in the subsequent sections.

Power test case members

| Member | Mandatory / Optional | Description |
|---|---|---|
| test_sequence | Mandatory | Describes the sequence of tests to perform. A test is defined by the following values: duration and toggle_rate. |
| disable_reg | Optional | Disable usage of all flip-flops present in the Power xbtIP of Alveo Versal Example Design (AVED). |
| disable_dsp | Optional | Disable usage of all DSPs present in the Power xbtIP of Alveo Versal Example Design (AVED). |
| disable_bram | Optional | Disable usage of all block RAMs present in the Power xbtIP of Alveo Versal Example Design (AVED). |
| disable_uram | Optional | Disable usage of all UltraRAMs present in the Power xbtIP of Alveo Versal Example Design (AVED). |
| disable_aie | Optional | Disable usage of all AIEs present in the Power xbtIP of Alveo Versal Example Design (AVED). |
| disable_toggle_ramp | Optional | Disable ramp to reach target toggle rate (see Toggle rate step requirement). |

test_sequence

Mandatory. Describes the sequence of tests to perform. Tests are performed serially, and a failure in one test does not stop the sequence (the next test will be launched). There is no limitation to the length of the test sequence.

This field contains a list of tests, each test being defined by an object of key–value parameter pairs: [ {}, {}, {} ].

The following table defines the parameters supported in the Power test sequence:

Power test sequence parameters

| Member | Mandatory / Optional | Description |
|---|---|---|
| duration | Mandatory | The duration of the test in seconds; Range [1, 2^32 - 1]. |
| toggle_rate | Mandatory | Toggle rate (in %) driving the sites present in the Power xbtIP of Alveo Versal Example Design (AVED); Range [0, 100]. |

For example:

  • Single test:

    • "test_sequence": [ { "duration": 40, "toggle_rate": 75 } ]
      
    • "test_sequence": [ { "duration": 40, "toggle_rate": 85 } ]
      
  • Multiple tests:

    • "test_sequence": [
        { "duration":  40, "toggle_rate": 15 },
        { "duration": 240, "toggle_rate": 30 },
        { "duration": 120, "toggle_rate": 40 },
        { "duration":  20, "toggle_rate": 50 }
      ]
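A test_sequence can be checked against the documented ranges before it is passed to xbtest. The Python sketch below is a hypothetical pre-check, not the validation performed by xbtSW. It assumes both members are plain JSON numbers, since the source states their ranges but not their exact types.

```python
DURATION_RANGE = (1, 2**32 - 1)   # seconds
TOGGLE_RATE_RANGE = (0, 100)      # percent


def validate_test_sequence(test_sequence):
    """Return a list of error messages; an empty list means no issue found."""
    errors = []
    checks = (("duration", DURATION_RANGE), ("toggle_rate", TOGGLE_RATE_RANGE))
    for index, test in enumerate(test_sequence, start=1):
        for key, (low, high) in checks:
            if key not in test:
                errors.append(f"test {index}: missing mandatory member '{key}'")
                continue
            value = test[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"test {index}: '{key}' must be a number")
            elif not low <= value <= high:
                errors.append(
                    f"test {index}: '{key}'={value} outside [{low}, {high}]"
                )
    return errors


print(validate_test_sequence([{"duration": 40, "toggle_rate": 75}]))  # []
print(validate_test_sequence([{"duration": 0, "toggle_rate": 120}]))
```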
      

disable_reg, disable_dsp, disable_bram, disable_uram, disable_aie

Optional; Type : boolean; Possible values: true or false; Default : false

By default, all flip-flops, DSPs, block RAMs, UltraRAMs and AIEs present in the Power xbtest hardware IP (xbtIP) of Alveo Versal Example Design (AVED) are enabled.

  • When disable_reg is set to true, all flip-flops present are disabled.

  • When disable_dsp is set to true, all DSPs present are disabled.

  • When disable_bram is set to true, all block RAMs present are disabled.

  • When disable_uram is set to true, all UltraRAMs present are disabled.

  • When disable_aie is set to true, all AIEs present are disabled.
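As an illustration, the following Python sketch builds a Power test configuration with some resource types disabled. The helper is hypothetical, and placing the disable_* members in global_config next to test_sequence is an assumption based on the example earlier in this section.

```python
import json

DISABLE_FLAGS = {
    "disable_reg", "disable_dsp", "disable_bram", "disable_uram", "disable_aie",
}


def power_config(test_sequence, **flags):
    """Build the "power" object with optional disable_* members.

    Assumption: the flags sit in global_config next to test_sequence.
    """
    unknown = set(flags) - DISABLE_FLAGS
    if unknown:
        raise ValueError(f"unknown members: {sorted(unknown)}")
    for name, value in flags.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be true or false")
    global_config = {"test_sequence": test_sequence}
    global_config.update(flags)
    return {"power": {"global_config": global_config}}


# Exercise only flip-flops, block RAMs and UltraRAMs at a 15 % toggle_rate.
config = power_config(
    [{"duration": 60, "toggle_rate": 15}],
    disable_dsp=True,
    disable_aie=True,
)
print(json.dumps(config, indent=2))
```

Members that are omitted keep their default value of false, so the corresponding resources stay enabled.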

disable_toggle_ramp

Optional; Type : boolean; Possible values: true or false; Default : false

By default, the target toggle_rate is applied to the Power xbtest hardware IP (xbtIP) gradually, in steps (see Toggle rate step requirement).

  • When set to true, the target toggle_rate is applied directly to the Power xbtIP.

Output files

All power measurements are stored in an output CSV file named power.csv, which is generated in the xbtest logging directory. The values are stored in CSV format, with one column for each type of information.

Important

If the command line option -L is used while calling xbtest application software (xbtSW), no output file is generated.

All measurements from all tests in the test_sequence are combined into a single file.

A new line is written in this file every time power measurements are available. At a minimum, the following values are recorded:

  • Global time (s): Global elapsed time since xbtest application software (xbtSW) execution started.

  • Test: Index of the current test within the test_sequence. The index of the first test is 1. The first rows of the file, with test and toggle_rate set to 0, correspond to the measurement of the card rest power.

  • Test time (s): Timestamp of the measurement. Timestamp of first measurement is 0 for a given test within the test_sequence.

  • Toggle rate (%): toggle_rate in % currently applied to the Power xbtIP.

  • Measurement ID: Measurement identifier. ID of first measurement is 1.

  • Measurement valid: Set to OK if xbtSW successfully obtained power and temperature measurements via the AVED Management Interface (AMI) APIs, otherwise set to KO.

  • Thermal measurements: Group of one or more columns recording measurements and status for each thermal sensor source monitored by xbtest.

    • FPGA temperature.

  • Electrical measurements: Group of one or more columns recording detailed measurements and status for each electrical sensor source monitored by xbtest.

    • Card power.

    • Current, voltage and power of 3v3_pex, 12v_pex, vccint and 12v_aux (if an auxiliary cable is used).

See Device management task description for more information on the sensor sources monitored by xbtest.
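The power.csv file can be post-processed to relate toggle_rate to measured power, for example after a calibration run. The Python sketch below averages the card power per toggle rate and skips rows whose measurement is not valid. The exact header strings are assumptions taken from the column descriptions above, and the name of the card power column ("Card power (W)") is hypothetical; check the header row of your own file.

```python
import csv
from collections import defaultdict


def mean_power_by_toggle_rate(csv_file, power_column="Card power (W)"):
    """Return {toggle_rate: mean card power} using valid measurements only.

    csv_file is an open text file (or any iterable of CSV lines).
    Column names are assumptions; adjust them to the actual header row.
    """
    sums = defaultdict(lambda: [0.0, 0])  # rate -> [total power, sample count]
    for row in csv.DictReader(csv_file):
        if row["Measurement valid"].strip() != "OK":
            continue
        accumulator = sums[float(row["Toggle rate (%)"])]
        accumulator[0] += float(row[power_column])
        accumulator[1] += 1
    return {
        rate: total / count for rate, (total, count) in sorted(sums.items())
    }


# Usage, assuming power.csv is in the current directory:
# with open("power.csv", newline="") as f:
#     print(mean_power_by_toggle_rate(f))
```

Rows recorded while the power and temperature are still stabilizing are included in the average, so a stricter analysis might discard the first part of each test.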