Power test case description¶
The goal of this test case is to allow the control of the card power consumption. This is achieved by adjusting the toggle_rate of the clock, which drives all flip-flops, DSPs, block RAMs, UltraRAMs and AIEs present in the power xbtest hardware IP (xbtIP) of Alveo Versal Example Design (AVED). Sensor values (power, temperature, power rails current/voltage and fan speed) are read every second using AVED Management Interface (AMI) APIs.
Test parameters¶
The mandatory test configuration parameters are listed below. For more information, see Power test JSON members.
duration: The duration of the test, measured in seconds.
toggle_rate: The toggle rate, specified in %, driving the sites present in the Power xbtIP of Alveo Versal Example Design (AVED).
Main test steps¶
The measurement of the card rest power, which lasts a few seconds, is always performed when the Power test case starts. For each test configuration, the following steps are repeated:
The toggle_rate is set to the power xbtIP.
For the defined duration, sensor values are read and reported every second.
When the test completes, it always passes because no checks are made on the consumed power. The user is responsible for monitoring the consumed power, which is displayed by xbtest application software (xbtSW).
Warning
The Power xbtIP has been intentionally designed to exceed the power capacity of the card. You might damage your card and cause your server/workstation to reboot if, for example (but not limited to), you:
Use a high toggle_rate.
Use a particularly demanding test sequence (for example, alternating between a low and a high toggle_rate for short periods of time).
Important
xbtest reports the entire power consumed by the card (for actively cooled cards, fan speed is also reported).
Power and temperature limits¶
To limit potential damage to the Alveo™ card in cases of accidental misuse or demanding test environmental conditions, the following basic safety mechanisms are in place.
Temperature Limit: A critical warning is generated when the temperature limit is reached.
Power Limit: A critical warning is generated when the power limit is reached.
Temperature and power limits are defined in the Card definition JSON file.
The list of sensors monitored is available in the Device management task description.
Power budget and calibration¶
To establish the relationship between toggle_rate and power, a simple calibration method can be used. For example, start from 0 %, increase the toggle_rate in steps of 5 %, and at each toggle_rate step let the power and temperature stabilize for two minutes. Several factors must be considered when establishing this relationship.
Important
xbtest always reports the total power of the card obtained via the AVED Management Interface (AMI) APIs.
Ensure that the environmental conditions (for example, temperature) used during calibration are similar to the conditions used in testing.
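The calibration sweep described above can be written as a test_sequence. The following is a minimal sketch that generates one, assuming a 5 % step and a 120 second dwell per step. The function name calibration_sequence and the 30 % cap are hypothetical; choose a maximum toggle_rate that is safe for your card.

```python
import json


def calibration_sequence(max_toggle_rate, step=5, dwell_s=120):
    """Build a test_sequence sweeping toggle_rate from 0 % up to max_toggle_rate."""
    if not 0 <= max_toggle_rate <= 100:
        raise ValueError("max_toggle_rate must be within [0, 100]")
    if step <= 0:
        raise ValueError("step must be positive")
    rates = range(0, max_toggle_rate + 1, step)
    return [{"duration": dwell_s, "toggle_rate": rate} for rate in rates]


# Hypothetical cap of 30 %: this value is an assumption, not a recommendation.
sequence = calibration_sequence(max_toggle_rate=30)
print(json.dumps({"test_sequence": sequence}, indent=2))
```

The printed list can be pasted into the global_config of the Power test case. Each step stays below the default ramp limit, so the ramp does not lengthen the sweep noticeably.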
Toggle rate step requirement¶
By default, xbtest limits the toggle rate steps to 10 % per second because most of the power regulators (and the FPGA) have a step load requirement.
For example, if the toggle_rate is initially set to 25 %, it will take 4 seconds to set a new target toggle_rate of 65 % as the actual toggle rate will gradually increase every second: 35 %, 45 %, 55 % and finally 65 %.
This ramp can be disabled using the parameter disable_toggle_ramp. When disabled, ensure that toggle_rate never steps (down or up) by more than around 20 % per second.
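The default ramp can be illustrated with a short simulation. This is a sketch of the behavior described above, not the xbtest implementation: the function name ramp_steps is hypothetical, and the handling of a downward ramp and of a final partial step is an assumption.

```python
def ramp_steps(current, target, max_step=10):
    """Return the toggle rate applied each second while moving from current to target."""
    steps = []
    while current != target:
        delta = target - current
        # Move toward the target by at most max_step percentage points.
        current += max(-max_step, min(max_step, delta))
        steps.append(current)
    return steps


print(ramp_steps(25, 65))  # [35, 45, 55, 65]
print(ramp_steps(50, 35))  # [40, 35]
```

The first call reproduces the example above: four one-second steps are needed to go from 25 % to 65 %.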
Actual power available¶
The distribution of power across the various regulators also limits how much power xbtest can control. For example, on an Alveo U50 card, although the power budget of the card is 75 W, up to 10 W are reserved for the HBM. This means that the Power xbtest hardware IP (xbtIP) can only control up to 65 W. It also means that the Memory xbtIP must be in use for the card power consumption to exceed 65 W.
Moreover, the total power budget of the card is not entirely available. The actual power available is affected by the efficiency and current limits of the various regulators. For information about sensor and power rail limits, refer to the specific documentation for your card. In the same U50 example, the actual power thresholds are lower than 65 W and 10 W.
Components present on the card¶
The card might be fitted with other ICs (such as co-processors, memories, and so forth) over which xbtest has no control. During calibration, ensure that these components behave as they do during normal test operation.
Note
xbtest can only control power of memory directly connected to the FPGA.
Tests running¶
Other test cases, like memory (DDR or HBM) and GT MAC, have a significant impact on the power consumed. Make sure that the calibration is done while these other features run at their nominal load.
DDR: When running four DDRs simultaneously, the memory test consumes approximately 20 W (write mode) or 15 W (read mode).
HBM: For example, when eight HBM ports are used, the memory test consumes between 7 W and 8 W.
GT: When two GT MAC xbtest hardware IP (xbtIP) are present, the 25 GbE mode uses approximately 6 W more than the 10 GbE mode.
Logic: The Memory and GT xbtIP and the memory subsystem also consume several watts.
Note
These values are indicative and might vary from card to card. They also depend on test environmental conditions, such as cooling.
Care must be taken when mixing test case types or when changing the mode of other tests while the power test is running.
For example, power varies when the memory test changes from only_rd to only_wr mode.
Power will decrease when a Memory test case ends.
simultaneous_wr_rd mode is usually the memory test mode consuming the most power.
Power test JSON members¶
Example¶
The following is an example of a Power test case running for 60 seconds at a toggle_rate of 15 %. If needed, change the values to run at higher or lower toggle rates. This JSON file is available in /opt/amd/aved/amd_v80_gen5x8_24.1_xbtest_stress/xbtest/test/.
"power": {
"global_config": {
"test_sequence": [ { "duration": 60, "toggle_rate": 15 } ]
}
}
Definition¶
The following table shows all members available for this test case. More details are provided for each member in the subsequent sections.
Member | Mandatory / Optional | Description |
---|---|---|
test_sequence | Mandatory | Describes the sequence of tests to perform. A test is defined by the following values: duration, toggle_rate. |
disable_reg | Optional | Disable usage of all flip-flops present in the Power xbtIP of Alveo Versal Example Design (AVED). |
disable_dsp | Optional | Disable usage of all DSPs present in the Power xbtIP of Alveo Versal Example Design (AVED). |
disable_bram | Optional | Disable usage of all block RAMs present in the Power xbtIP of Alveo Versal Example Design (AVED). |
disable_uram | Optional | Disable usage of all UltraRAMs present in the Power xbtIP of Alveo Versal Example Design (AVED). |
disable_aie | Optional | Disable usage of all AIEs present in the Power xbtIP of Alveo Versal Example Design (AVED). |
disable_toggle_ramp | Optional | Disable ramp to reach target toggle rate (see Toggle rate step requirement). |
test_sequence¶
Mandatory. Describes the sequence of tests to perform. Tests are performed serially, and a failure in one test does not stop the sequence (the next test will be launched). There is no limitation to the length of the test sequence.
This field contains a list of tests, each test being defined by an object of key-value parameter pairs: [ {}, {}, {} ].
The following table defines the parameters supported in the Power test sequence:
Member | Mandatory / optional | Description |
---|---|---|
duration | Mandatory | The duration of the test in seconds; Range [1, 2^32-1]. |
toggle_rate | Mandatory | Toggle rate (in %) driving the sites present in the Power xbtIP of Alveo Versal Example Design (AVED); Range [0, 100]. |
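The ranges in the table above can be checked before launching xbtest. The following is a minimal sketch of such a check. The function name validate_test_sequence is hypothetical, and treating duration as an integer and toggle_rate as any number is an assumption, since only the ranges are stated here.

```python
MAX_DURATION = 2**32 - 1  # Upper bound of the duration range.


def _is_number(value, integer_only=False):
    """True for int (and float unless integer_only); booleans are rejected."""
    if isinstance(value, bool):
        return False
    allowed = (int,) if integer_only else (int, float)
    return isinstance(value, allowed)


def validate_test_sequence(test_sequence):
    """Return a list of error messages; an empty list means no problem was found."""
    errors = []
    for index, test in enumerate(test_sequence, start=1):
        if "duration" not in test:
            errors.append(f"test {index}: missing mandatory member 'duration'")
        elif not (_is_number(test["duration"], integer_only=True)
                  and 1 <= test["duration"] <= MAX_DURATION):
            errors.append(f"test {index}: duration out of range [1, 2^32-1]")
        if "toggle_rate" not in test:
            errors.append(f"test {index}: missing mandatory member 'toggle_rate'")
        elif not (_is_number(test["toggle_rate"])
                  and 0 <= test["toggle_rate"] <= 100):
            errors.append(f"test {index}: toggle_rate out of range [0, 100]")
    return errors


print(validate_test_sequence([{"duration": 40, "toggle_rate": 75}]))  # []
print(validate_test_sequence([{"duration": 0, "toggle_rate": 120}]))
```

The second call reports two errors, one for each out-of-range member.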
For example:
Single test:
"test_sequence": [ { "duration": 40, "toggle_rate": 75 } ]
"test_sequence": [ { "duration": 40, "toggle_rate": 85 } ]
Multiple tests:
"test_sequence": [ { "duration": 40, "toggle_rate": 15 }, { "duration": 240, "toggle_rate": 30 }, { "duration": 120, "toggle_rate": 40 }, { "duration": 20, "toggle_rate": 50 } ]
disable_reg, disable_dsp, disable_bram, disable_uram, disable_aie¶
Optional; Type: boolean; Possible values: true or false; Default: false
By default, all flip-flops, DSPs, block RAMs, UltraRAMs and AIEs present in the power xbtest hardware IP (xbtIP) of Alveo Versal Example Design (AVED) are enabled.
When disable_reg is set to true, all flip-flops present are disabled.
When disable_dsp is set to true, all DSPs present are disabled.
When disable_bram is set to true, all block RAMs present are disabled.
When disable_uram is set to true, all UltraRAMs present are disabled.
When disable_aie is set to true, all AIEs present are disabled.
disable_toggle_ramp¶
Optional; Type: boolean; Possible values: true or false; Default: false
By default, the target toggle_rate is applied to the Power xbtest hardware IP (xbtIP) gradually, in steps (see Toggle rate step requirement).
When set to true, the target toggle_rate will be set directly to the Power xbtIP.
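When the ramp is disabled, a test_sequence can be screened for steps larger than the roughly 20 % recommended in Toggle rate step requirement. The following is a minimal sketch of such a check. The function name large_steps is hypothetical, and the starting toggle rate of 0 % is an assumption based on the rest power measurement made before the first test.

```python
def large_steps(test_sequence, max_step=20, initial_rate=0):
    """Return (test index, previous rate, new rate) for each step above max_step."""
    violations = []
    previous = initial_rate
    for index, test in enumerate(test_sequence, start=1):
        rate = test["toggle_rate"]
        if abs(rate - previous) > max_step:
            violations.append((index, previous, rate))
        previous = rate
    return violations


sequence = [
    {"duration": 40, "toggle_rate": 15},
    {"duration": 240, "toggle_rate": 30},
    {"duration": 120, "toggle_rate": 40},
    {"duration": 20, "toggle_rate": 50},
]
print(large_steps(sequence))                                  # []
print(large_steps([{"duration": 40, "toggle_rate": 75}]))     # [(1, 0, 75)]
```

An empty result means that no transition between consecutive tests exceeds the chosen limit. It does not guarantee that the sequence is safe for a given card.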
Output files¶
All power measurements are stored in an output CSV file named power.csv, which is generated in the xbtest logging directory.
The values are stored in CSV format, with one column for each type of information.
Important
If the command line option -L is used while calling xbtest application software (xbtSW), no output file is generated.
All measurements from all tests of the test_sequence are combined into a single file.
A new line is written in this file every time power measurements are available. At a minimum, the following values are recorded:
Global time (s): Global elapsed time since xbtest application software (xbtSW) execution started.
Test: Index of current test within the test_sequence. Index of first test is 1. The first rows of the file, with test and toggle_rate set to 0, correspond to the measurement of the card rest power.
Test time (s): Timestamp of the measurement. Timestamp of first measurement is 0 for a given test within the test_sequence.
Toggle rate (%): toggle_rate in % currently set to the power xbtIP.
Measurement ID: Measurement identifier. ID of first measurement is 1.
Measurement valid: Set to OK if xbtSW successfully got power and temperature measurements via the AVED Management Interface (AMI) APIs, otherwise set to KO.
Thermal measurements: Group of one or more columns recording measurements and status for each thermal sensor source monitored by xbtest.
FPGA temperature.
Electrical measurements: Group of one or more columns recording detailed measurements and status for each electrical sensor source monitored by xbtest.
Card power.
Current, voltage and power of 3v3_pex, 12v_pex, vccint and 12v_aux (if an auxiliary cable is used).
See Device management task description for more information on the sensor sources monitored by xbtest.
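Because the test always passes, power.csv is the natural place to review the consumed power afterwards. The following is a minimal sketch that averages the card power per test, skipping rows whose measurement is not valid. The column headers used here ("Test", "Measurement valid", "Card power (W)") are assumptions; check the header row of your own power.csv and adjust them. The function name mean_power_per_test and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_power_per_test(csv_file, power_column="Card power (W)"):
    """Average the power column per test index, ignoring rows not marked OK."""
    samples = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["Measurement valid"].strip() != "OK":
            continue
        samples[int(row["Test"])].append(float(row[power_column]))
    return {test: mean(values) for test, values in sorted(samples.items())}


# Made-up sample data; test 0 stands for the rest power measurement.
sample = io.StringIO(
    "Test,Toggle rate (%),Measurement valid,Card power (W)\n"
    "0,0,OK,20.0\n"
    "0,0,OK,22.0\n"
    "1,15,OK,30.0\n"
    "1,15,KO,0.0\n"
    "1,15,OK,32.0\n"
)
print(mean_power_per_test(sample))  # {0: 21.0, 1: 31.0}
```

With a real file, pass an open file object instead of the in-memory sample, for example open("power.csv", newline="").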