Performance
MLCommons
MLPerf is a standard benchmark suite for machine learning inference from MLCommons.
The MLCommons application uses the loadgen component of MLPerf to send requests to the inference server for a variety of scenarios and models.
SingleStream
This scenario sends one inference request after the previous one completes, and so measures end-to-end latency. Note that these tests are not run with the same duration or query count as specified in the official MLPerf test rules.
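As a rough illustration of the SingleStream pattern (not the actual loadgen implementation), the sketch below issues queries strictly one at a time and records each query's end-to-end latency; `fake_infer` is a hypothetical stand-in for a real inference call.

```python
import time

def run_single_stream(issue_query, num_queries=10):
    """Issue queries sequentially: each query starts only after the
    previous one completes, so each measurement is end-to-end latency."""
    latencies = []
    for i in range(num_queries):
        start = time.perf_counter()
        issue_query(i)  # blocks until the response returns
        latencies.append(time.perf_counter() - start)
    return latencies

# Hypothetical stand-in for a real inference request.
def fake_infer(i):
    time.sleep(0.001)

lats = run_single_stream(fake_infer, num_queries=10)
# MLPerf reports a high percentile (e.g. 90th) of these latencies.
p90 = sorted(lats)[int(0.9 * len(lats))]
```

The key property is that there is never more than one request in flight, so latency is isolated from queuing effects.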
Server
This scenario sends inference requests with Poisson-distributed arrival times and a configurable number of outstanding requests. Note that these tests are not run with the same duration or query count as specified in the official MLPerf test rules.
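A Poisson arrival process means the gaps between consecutive requests are exponentially distributed around a target rate. The sketch below (a simplified illustration, not loadgen code; `poisson_schedule` and `target_qps` are names chosen here) generates such an arrival schedule:

```python
import random

def poisson_schedule(target_qps, num_queries, seed=0):
    """Generate request arrival times whose inter-arrival gaps are
    exponentially distributed, i.e. a Poisson process at target_qps
    queries per second."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(num_queries):
        t += rng.expovariate(target_qps)  # mean gap = 1 / target_qps
        times.append(t)
    return times

arrivals = poisson_schedule(target_qps=100.0, num_queries=1000)
mean_gap = arrivals[-1] / len(arrivals)  # should be close to 1/100 s
```

In the real scenario the driver dispatches a request at each scheduled time regardless of whether earlier ones have completed, up to the configured limit on outstanding requests, which is what exercises the server's queuing behavior.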