Performance

MLCommons

MLPerf is MLCommons' standard benchmark suite for machine learning inference. The MLCommons application uses MLPerf's loadgen component to send requests to the inference server across a variety of scenarios and models.
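
To give a concrete picture of that flow, the sketch below registers a system-under-test (SUT) and a query sample library (QSL) with the Python loadgen bindings and lets loadgen drive the request schedule. This is a minimal illustration, not the MLCommons application itself: the send_request helper and the sample counts are hypothetical stand-ins for the real client and dataset, and callback signatures have varied slightly across loadgen releases.

    import mlperf_loadgen as lg

    def send_request(sample_index):
        # Hypothetical helper: serialize the sample and send it to the
        # inference server, blocking until the response arrives.
        return b""

    def issue_queries(query_samples):
        # loadgen calls this with a batch of QuerySample objects; each one
        # must eventually be acknowledged via QuerySamplesComplete.
        responses = []
        for qs in query_samples:
            send_request(qs.index)
            responses.append(lg.QuerySampleResponse(qs.id, 0, 0))
        lg.QuerySamplesComplete(responses)

    def flush_queries():
        pass

    def load_samples(sample_indices):
        pass  # preload the referenced samples into memory

    def unload_samples(sample_indices):
        pass  # release the referenced samples

    sut = lg.ConstructSUT(issue_queries, flush_queries)
    qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)

    settings = lg.TestSettings()
    settings.scenario = lg.TestScenario.SingleStream
    settings.mode = lg.TestMode.PerformanceOnly

    lg.StartTest(sut, qsl, settings)
    lg.DestroyQSL(qsl)
    lg.DestroySUT(sut)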

SingleStream

This scenario sends each inference request only after the previous one completes, and so measures end-to-end latency. Note that these tests are not run with the same duration/query count as specified in the official MLPerf test rules.
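
A minimal sketch of where those duration/query-count knobs live in loadgen's TestSettings, assuming the Python bindings; the values are hypothetical and deliberately below the official minimums:

    settings = lg.TestSettings()
    settings.scenario = lg.TestScenario.SingleStream
    settings.mode = lg.TestMode.PerformanceOnly
    # Hypothetical overrides: shorter than the official MLPerf minimums.
    settings.min_duration_ms = 10_000
    settings.min_query_count = 1024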

Listing 1: The Fake model does no work; it shows the latency of the system excluding serialization and inference delays.

Listing 3: The Fake model does no work; it shows the latency of the system excluding serialization and inference delays.

Server

This scenario sends inference requests with Poisson-distributed arrival times and a configurable number of outstanding requests. Note that these tests are not run with the same duration/query count as specified in the official MLPerf test rules.
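
For comparison with the SingleStream sketch above, a Server run could be configured as follows, again assuming the Python loadgen bindings; the target QPS sets the mean of the Poisson arrival process, and all values here are hypothetical. Whether the MLCommons application exposes the outstanding-request cap through max_async_queries is an assumption.

    settings = lg.TestSettings()
    settings.scenario = lg.TestScenario.Server
    settings.mode = lg.TestMode.PerformanceOnly
    # Mean arrival rate of the Poisson schedule (hypothetical value).
    settings.server_target_qps = 100.0
    # Per-query latency bound for the scenario (hypothetical value).
    settings.server_target_latency_ns = 10_000_000  # 10 ms
    # Cap on outstanding (in-flight) queries (hypothetical value).
    settings.max_async_queries = 4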

Listing 9: The Fake model does no work; it shows the latency of the system excluding serialization and inference delays.

Listing 11: The Fake model does no work; it shows the latency of the system excluding serialization and inference delays.