Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Added

  • An example MLPerf app using the inference server API (#129)

  • Google Benchmark for writing performance-tracking tests (#147)

  • Custom memory storage classes in the memory pool (#166)

  • C++ worker for executing C++ “models” (#172)

  • Model chaining (#176) and loading chains from modelLoad (#178)

Changed

  • Refactor how global state is managed (#125)

  • Require the server as an argument for creating the NativeClient (#125)

  • Use a global memory pool to allocate memory for incoming requests (#149)

  • Resolve the request at the incoming server rather than the batcher (#164)

  • Add flags to run container-based tests in parallel (#168)

  • Bump up to Vitis AI 3.0 (#169)

  • Refactor inference request objects and tensors (#172)

  • Use const references throughout for ParameterMap (#172)

  • Update workers’ doRun method signature to produce and return a batch (#176)

  • Use TOML-based configuration files in the repository by default (#178)

  • Move test model lists to the tests directory (#180)

Deprecated

  • N/A

Removed

  • N/A

Fixed

  • Use the right unit for batcher timeout (#129)

  • Don’t call next and prev on end iterators (#166)

  • Use the right package name for g++ in CentOS (#168)

  • Fix building with different CMake options (#170)

Security

  • N/A

0.3.0 - 2023-02-01

Added

  • Allow building Debian package (@930fab2)

  • Add modelInferAsync to the API (@2f4a6c2)

  • Add inferAsyncOrdered as a client operator for making inferences in parallel (#66)

  • Support building Python wheels with cibuildwheel (#71)

  • Support XModels with multiple output tensors (#74)

  • Add FP16 support (#76)

  • Add more documentation (#85, #90)

  • Add Python bindings for gRPC and Native clients (#88)

  • Add tests with KServe (#90)

  • Add batch size flag to examples (#94)

  • Add Kubernetes test for KServe (#95)

  • Use exhale to generate Python API documentation (#95)

  • OpenAPI spec for REST protocol (#100)

  • Use a timer for simpler time measurement (#104)

  • Allow building containers with custom backend versions (#107)

Changed

  • Refactor pre- and post-processing functions in C++ (@42cf748)

  • Templatize Dockerfile for different base images (#71)

  • Use multiple HTTP clients internally for parallel HTTP requests (#66)

  • Update test asset downloading (#81)

  • Reimplement and align examples across platforms (#85)

  • Reorganize Python library (#88)

  • Rename ‘proteus’ to ‘amdinfer’ (#91)

  • Use Ubuntu 20.04 by default for Docker (#97)

  • Bump up to ROCm 5.4.1 (#99)

  • Some function names changed for style (#102)

  • Bump up to ZenDNN 4.0 (#113)

Deprecated

  • ALL_CAPS style enums for the DataType (#102)

Removed

  • Mappings between XIR data types and inference server data types from the public API (#102)

  • Web GUI (#110)

Fixed

  • Use input tensors in requests correctly (#61)

  • Fix bug with multiple input tensors (#74)

  • Align gRPC responses using non-gRPC-native data types with other input protocols (#81)

  • Fix the Manager’s destructor (#88)

  • Fix using --no-user-config with proteus run (#89)

  • Handle assigning user permissions if the host UID is the same as the UID in the container (#101)

  • Fix test discovery if some test assets are missing (#105)

  • Fix gRPC queue shutdown race condition (#111)

0.2.0 - 2022-08-05

Added

  • HTTP/REST C++ client (@cbf33b8)

  • gRPC API based on KServe v2 API (@37a6aad and others)

  • TensorFlow/PyTorch + ZenDNN backend (#17 and #21)

  • ‘ServerMetadata’ endpoint to the API (@7747911)

  • ‘modelList’ endpoint to the API (@7477b7d)

  • Parse JSON data as string in HTTP body (@694800e)

  • Directory monitoring for model loading (@6459797)

  • ‘ModelMetadata’ endpoint to the API (@22b9d1a)

  • MIGraphX backend (#34)

  • Pre-commit for style verification (@048bdd7)

Changed

  • Use Pybind11 to create Python API (#20)

  • Two logs are created now: server and client

  • Logging macro is now PROTEUS_LOG_*

  • Loading workers is now case-insensitive (@14ed4ef and @90a51ae)

  • Build AKS from source (@e04890f)

  • Use consistent custom exceptions (#30)

  • Update Docker build commands to opt-in to all backends (#43)

  • Rename ‘modelLoad’ to ‘workerLoad’ and change the behavior of ‘modelLoad’ (#27)

Fixed

  • Get the right request size in the batcher when enqueuing with the C++ API (@d1ad81d)

  • Construct responses correctly in the XModel worker if there are multiple input buffers (@d1ad81d)

  • Populate the right number of offsets in the hard batcher (@6666142)

  • Calculate offset values correctly during batching (@8c7534b)

  • Get correct library dependencies for production container (@14ed4ef)

  • Correctly throw an exception if a worker gets an error during initialization (#29)

  • Detect errors in HTTP client during loading (@99ffc33)

  • Construct batches with the right sizes (#57)

0.1.0 - 2022-02-08

Added

  • Core inference server functionality

  • Batching support

  • Support for running multiple workers simultaneously

  • Support for different batcher and buffer implementations

  • XModel support

  • Logging, metrics and tracing support

  • REST API based on KServe v2 API

  • C++ API

  • Python library for REST

  • Documentation, examples, and some tests

  • Experimental GUI