AMD Inference Server

The AMD Inference Server is an easy-to-use inferencing solution designed for AMD CPUs, GPUs, and FPGAs. It can be deployed as a standalone server or embedded in custom applications through its C++ API. The AMD Inference Server can also be extended to support other hardware accelerators and machine learning frameworks.

Features

  • Inference Server: The AMD Inference Server accepts client requests over HTTP REST and WebSocket protocols through an API based on KServe’s v2 specification (a minimal request sketch follows this list)

  • C++ API: custom applications can call the backend directly through the C++ API, bypassing the REST interface

  • Python REST library: clients can submit requests to the inference server from Python through a simplified API

  • Efficient hardware usage: The AMD Inference Server automatically makes use of all available FPGAs on a machine as needed, using XRM

  • User-defined model parallelism: users can define how many models, and how many instances of each, to run simultaneously

  • Batching: incoming requests are transparently batched according to the model’s specifications

  • Integrated with Vitis AI: The AMD Inference Server can serve most xmodels generated by Vitis AI

  • End-to-end inference: A computation graph, including pre- and post-processing, can be written and deployed with the AMD Inference Server using AKS

  • Integrated with ZenDNN: The AMD Inference Server can serve ZenDNN-optimized TensorFlow and PyTorch models on AMD EPYC CPUs.
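
To make the REST interface concrete, the sketch below sends an inference request following the KServe v2 protocol using Python's requests library. The server address and port, model name, and the input tensor's name, shape, datatype, and data are placeholder assumptions that depend on how the server is deployed and which model is loaded; the Python REST library listed above hides these details behind a simpler API.

    # Minimal sketch of a KServe v2-style REST inference request.
    # The address, model name, and tensor metadata below are assumptions;
    # substitute the values for your own deployment and model.
    import requests

    SERVER = "http://localhost:8998"  # assumed server address and port
    MODEL = "mymodel"                 # hypothetical name of a loaded model

    # KServe v2 health endpoint: confirm the server is live before sending work.
    assert requests.get(f"{SERVER}/v2/health/live").ok

    # KServe v2 request body: a list of named input tensors, each with a
    # shape, a datatype, and its data flattened into a single list.
    payload = {
        "inputs": [
            {
                "name": "input0",
                "shape": [1, 4],
                "datatype": "FP32",
                "data": [0.1, 0.2, 0.3, 0.4],
            }
        ]
    }

    response = requests.post(f"{SERVER}/v2/models/{MODEL}/infer", json=payload)
    response.raise_for_status()

    # The response mirrors the request: a list of named output tensors.
    for output in response.json()["outputs"]:
        print(output["name"], output["shape"], output["data"])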

Learn more

The documentation for the AMD Inference Server is available online.

Check out the Quickstart to get started.

Support

Raise issues if you find a bug or need help. Refer to Contributing for more information.

License

The AMD Inference Server is licensed under the terms of Apache 2.0 (see LICENSE). The LICENSE file contains additional license information for third-party files distributed with this work. Further license information is listed alongside the dependencies.