Xilinx Inference Server

Xilinx Inference Server is an easy-to-use inferencing solution designed for Xilinx FPGAs and Vitis AI. It can be deployed as a server or through custom applications using its C++ API. Xilinx Inference Server can also be extended to support other hardware accelerators and machine learning frameworks.

Features

  • Inference Server: Xilinx Inference Server accepts client requests over HTTP REST and WebSocket protocols with an API based on KServe’s v2 specification (a minimal request sketch follows this list)

  • C++ API: custom applications can call the backend directly through the C++ API, bypassing the REST interface

  • Python REST library: a simplified Python API that lets clients submit requests to the inference server

  • Efficient hardware usage: Xilinx Inference Server automatically uses all FPGAs available on a machine as needed, with resources managed through XRM

  • User-defined model parallelism: users can define how many models, and how many instances of each, to run simultaneously

  • Batching: incoming requests are transparently batched according to the model’s specifications

  • Integrated with Vitis AI: Xilinx Inference Server can serve most xmodels generated by Vitis AI

  • End-to-end inference: a computation graph, including pre- and post-processing, can be written and deployed with Xilinx Inference Server using AKS
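
Because the REST API is based on KServe’s v2 specification, a plain HTTP client is enough to talk to the server. The Python sketch below is illustrative only: the server address and port (8998 here), the model name (resnet50), and the input tensor’s name, shape, and placeholder data are assumptions for a hypothetical deployment, and since the API is only based on the v2 spec, the exact endpoint paths may differ from those shown; consult the documentation for specifics.

    import requests

    # Assumptions for illustration: the server is reachable at this address
    # (adjust host/port to your deployment) and a model named "resnet50"
    # has already been loaded. Endpoint paths follow KServe's v2 spec.
    SERVER = "http://localhost:8998"
    MODEL = "resnet50"

    # Check that the server is up before sending work (v2 health endpoint).
    ready = requests.get(f"{SERVER}/v2/health/ready")
    ready.raise_for_status()

    # Build a v2-style inference request: each input carries a name, shape,
    # datatype, and a flat list of data values.
    request_body = {
        "inputs": [
            {
                "name": "input0",  # hypothetical input tensor name
                "shape": [1, 224, 224, 3],
                "datatype": "FP32",
                "data": [0.0] * (224 * 224 * 3),  # placeholder image data
            }
        ]
    }

    response = requests.post(f"{SERVER}/v2/models/{MODEL}/infer", json=request_body)
    response.raise_for_status()

    # The response mirrors the request: a list of named output tensors.
    for output in response.json()["outputs"]:
        print(output["name"], output["shape"], output["datatype"])

The Python REST library mentioned above wraps these HTTP details behind a simpler client interface.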

Learn more

The documentation for Xilinx Inference Server is available online.

Check out the Quickstart to get started.

Support

Raise issues if you find a bug or need help. Refer to Contributing for more information.

License

Xilinx Inference Server is licensed under the terms of Apache 2.0 (see LICENSE). The LICENSE file contains additional license information for third-party files distributed with this work. More license information can be found in the dependencies documentation.