AMD Inference Server
The AMD Inference Server is an easy-to-use inferencing solution designed for AMD CPUs, GPUs, and FPGAs. It can be deployed as a server or integrated into custom applications through its C++ API. The AMD Inference Server can also be extended to support other hardware accelerators and machine learning frameworks.
Features
Inference Server: The AMD Inference Server accepts client requests over HTTP REST and WebSocket protocols, with an API based on KServe's v2 specification (see the example sketch after this list)
C++ API: custom applications can call the backend directly through the C++ API, bypassing the REST interface
Python REST library: clients can submit requests to the inference server from Python through a simplified API
Efficient hardware usage: The AMD Inference Server automatically uses all available FPGAs on a machine as needed through XRM
User-defined model parallelism: users can define how many models, and how many instances of each, to run simultaneously
Batching: incoming requests are transparently batched based on the model's specifications
Integrated with Vitis AI: The AMD Inference Server can serve most xmodels generated from Vitis AI
End-to-end inference: a graph of computation, such as pre- and post-processing, can be written and deployed with the AMD Inference Server using AKS
Integrated with ZenDNN: The AMD Inference Server can serve ZenDNN-optimized TensorFlow and PyTorch models on AMD EPYC CPUs.
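As a rough sketch of what a client request looks like, the snippet below sends a KServe v2-style inference request to a running server over REST using Python's requests package. The server address, port, model name, and tensor metadata are illustrative assumptions rather than values from this project; the Python library mentioned above wraps this same protocol behind a simpler API.

    import requests

    SERVER_URL = "http://127.0.0.1:8998"  # assumed address/port of a running server
    MODEL = "mnist"                       # assumed name of a model already loaded on the server

    # KServe v2 inference request: a list of named input tensors,
    # each with a datatype, shape, and flattened data.
    request_body = {
        "inputs": [
            {
                "name": "input0",            # placeholder input tensor name
                "datatype": "FP32",
                "shape": [1, 28, 28, 1],
                "data": [0.0] * (28 * 28),   # dummy flattened image data
            }
        ]
    }

    response = requests.post(f"{SERVER_URL}/v2/models/{MODEL}/infer", json=request_body)
    response.raise_for_status()
    print(response.json()["outputs"])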
Learn more
The documentation for the AMD Inference Server is available online.
Check out the Quickstart to get started.
Support
Raise issues if you find a bug or need help. Refer to Contributing for more information.
License
The AMD Inference Server is licensed under the terms of Apache 2.0 (see LICENSE). The LICENSE file contains additional license information for third-party files distributed with this work. More license information for third-party dependencies can be found in the dependencies documentation.