Xilinx Inference Server

Xilinx Inference Server is an easy-to-use inferencing solution designed for Xilinx FPGAs and Vitis AI. It can be deployed as a server or through custom applications using its C++ API. Xilinx Inference Server can also be extended to support other hardware accelerators and machine learning frameworks.

Features

  • Inference Server: Xilinx Inference Server accepts client requests over HTTP REST and WebSocket protocols with an API based on KServe’s v2 specification (a minimal request sketch follows this list)

  • C++ API: custom applications can call the backend directly through the C++ API, bypassing the REST interface

  • Python REST library: a simplified Python API that lets clients submit requests to the inference server

  • Efficient hardware usage: Xilinx Inference Server automatically uses all FPGAs available on a machine as needed, with resources managed through XRM

  • User-defined model parallelism: users can define how many models, and how many instances of each, to run simultaneously

  • Batching: incoming requests are transparently batched according to the model’s specifications

  • Integrated with Vitis AI: Xilinx Inference Server can serve most xmodels generated by Vitis AI

  • End-to-end inference: a computation graph, including pre- and post-processing, can be written and deployed with Xilinx Inference Server using AKS
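
Because the REST API is based on KServe’s v2 specification, a plain HTTP client is enough to talk to the server. The Python sketch below is illustrative only: the server address and port (8998 here), the model name (resnet50), and the input tensor’s name, shape, and placeholder data are assumptions for a hypothetical deployment, and since the API is only based on the v2 spec, the exact endpoint paths may differ from those shown; consult the documentation for specifics.

    import requests

    # Assumptions for illustration: the server is reachable at this address
    # (adjust host/port to your deployment) and a model named "resnet50"
    # has already been loaded. Endpoint paths follow KServe's v2 spec.
    SERVER = "http://localhost:8998"
    MODEL = "resnet50"

    # Check that the server is up before sending work (v2 health endpoint).
    ready = requests.get(f"{SERVER}/v2/health/ready")
    ready.raise_for_status()

    # Build a v2-style inference request: each input carries a name, shape,
    # datatype, and a flat list of data values.
    request_body = {
        "inputs": [
            {
                "name": "input0",  # hypothetical input tensor name
                "shape": [1, 224, 224, 3],
                "datatype": "FP32",
                "data": [0.0] * (224 * 224 * 3),  # placeholder image data
            }
        ]
    }

    response = requests.post(f"{SERVER}/v2/models/{MODEL}/infer", json=request_body)
    response.raise_for_status()

    # The response mirrors the request: a list of named output tensors.
    for output in response.json()["outputs"]:
        print(output["name"], output["shape"], output["datatype"])

The Python REST library mentioned above wraps these HTTP details behind a simpler client interface.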

Learn more

The documentation for Xilinx Inference Server is available online.

Check out the Quickstart to get started.

Support

Raise issues if you find a bug or need help. Refer to Contributing for more information.

License

Xilinx Inference Server is licensed under the terms of Apache 2.0 (see LICENSE). The LICENSE file contains additional license information for third-party files distributed with this work. More license information can be found in the dependencies documentation.