Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

Added

  • N/A

Changed

  • N/A

Deprecated

  • N/A

Removed

  • N/A

Fixed

  • N/A

Security

  • N/A

0.4.0 - 2023-09-07

Added

  • An example MLPerf app using the inference server API (#129)

  • Google Benchmark for writing performance-tracking tests (#147)

  • Custom memory storage classes in the memory pool (#166)

  • C++ worker for executing C++ “models” (#172)

  • Model chaining (#176) and loading chains from modelLoad (#178)

  • vcpkg to resolve C++ dependencies (#188)

  • Tests with FP16 (#189 and #203)

  • Versioned models (#190)

  • Expand benchmarking with the MLPerf app (#197) and add data to the docs (#198)

  • Custom environment configuration per test (#214)

  • VCK5000 test (#214)

Changed

  • Refactor how global state is managed (#125)

  • Require the server as an argument for creating the NativeClient (#125)

  • Rename “RequestParameters” to “ParameterMap” (#125)

  • Use a global memory pool to allocate memory for incoming requests (#149)

  • Resolve the request at the incoming server rather than the batcher (#164)

  • Add flags to run container-based tests in parallel (#168)

  • Bump up to Vitis AI 3.0 (#169)

  • Refactor inference request objects and tensors (#172)

  • Use const references throughout for ParameterMap (#172)

  • Update workers’ doRun method signature to produce and return a batch (#176)

  • Use TOML-based configuration files in the repository by default (#178)

  • Move test model lists to the tests directory (#180)

  • Close dynamically opened libraries (#186)

  • Replace Jaeger exporter with OTLP (#187)

  • Change STRING type to BYTES and shape type from uint64 to int64 (#190)

  • Include the correct tensor name in ModelMetadata in the XModel backend (#207)

Deprecated

  • N/A

Removed

  • Python 3.6 support (#215)

Fixed

  • Use the right unit for batcher timeout (#129)

  • Don’t call next and prev on end iterators (#166)

  • Use the right package name for g++ in CentOS (#168)

  • Fix building with different CMake options (#170)

  • Fix wheel generation with vcpkg (#191)

  • Load models at startup correctly (#195)

  • Fix handling MIGraphX models with dots in the names (#202)

Security

  • N/A

0.3.0 - 2023-02-01

Added

  • Allow building Debian package (@930fab2)

  • Add modelInferAsync to the API (@2f4a6c2)

  • Add inferAsyncOrdered as a client operator for making inferences in parallel (#66)

  • Support building Python wheels with cibuildwheel (#71)

  • Support XModels with multiple output tensors (#74)

  • Add FP16 support (#76)

  • Add more documentation (#85, #90)

  • Add Python bindings for gRPC and Native clients (#88)

  • Add tests with KServe (#90)

  • Add batch size flag to examples (#94)

  • Add Kubernetes test for KServe (#95)

  • Use exhale to generate Python API documentation (#95)

  • OpenAPI spec for REST protocol (#100)

  • Use a timer for simpler time measurement (#104)

  • Allow building containers with custom backend versions (#107)

Changed

  • Refactor pre- and post-processing functions in C++ (@42cf748)

  • Templatize Dockerfile for different base images (#71)

  • Use multiple HTTP clients internally for parallel HTTP requests (#66)

  • Update test asset downloading (#81)

  • Reimplement and align examples across platforms (#85)

  • Reorganize Python library (#88)

  • Rename ‘proteus’ to ‘amdinfer’ (#91)

  • Use Ubuntu 20.04 by default for Docker (#97)

  • Bump up to ROCm 5.4.1 (#99)

  • Rename some functions for style (#102)

  • Bump up to ZenDNN 4.0 (#113)

Deprecated

  • ALL_CAPS style enums for the DataType (#102)

Removed

  • Mappings between XIR data types and inference server data types from the public API (#102)

  • Web GUI (#110)

Fixed

  • Use input tensors in requests correctly (#61)

  • Fix bug with multiple input tensors (#74)

  • Align gRPC responses using non-gRPC-native data types with other input protocols (#81)

  • Fix the Manager’s destructor (#88)

  • Fix using --no-user-config with proteus run (#89)

  • Handle assigning user permissions if the host UID is the same as the UID in the container (#101)

  • Fix test discovery if some test assets are missing (#105)

  • Fix gRPC queue shutdown race condition (#111)

0.2.0 - 2022-08-05

Added

  • HTTP/REST C++ client (@cbf33b8)

  • gRPC API based on KServe v2 API (@37a6aad and others)

  • TensorFlow/PyTorch + ZenDNN backend (#17 and #21)

  • ‘ServerMetadata’ endpoint to the API (@7747911)

  • ‘modelList’ endpoint to the API (@7477b7d)

  • Parse JSON data as string in HTTP body (@694800e)

  • Directory monitoring for model loading (@6459797)

  • ‘ModelMetadata’ endpoint to the API (@22b9d1a)

  • MIGraphX backend (#34)

  • Pre-commit for style verification (@048bdd7)

Changed

  • Use Pybind11 to create Python API (#20)

  • Two logs are now created: server and client

  • Logging macro is now PROTEUS_LOG_*

  • Loading workers is now case-insensitive (@14ed4ef and @90a51ae)

  • Build AKS from source (@e04890f)

  • Use consistent custom exceptions (#30)

  • Update Docker build commands to opt-in to all backends (#43)

  • Renamed ‘modelLoad’ to ‘workerLoad’ and changed the behavior for ‘modelLoad’ (#27)

Fixed

  • Get the right request size in the batcher when enqueuing with the C++ API (@d1ad81d)

  • Construct responses correctly in the XModel worker if there are multiple input buffers (@d1ad81d)

  • Populate the right number of offsets in the hard batcher (@6666142)

  • Calculate offset values correctly during batching (@8c7534b)

  • Get correct library dependencies for production container (@14ed4ef)

  • Correctly throw an exception if a worker gets an error during initialization (#29)

  • Detect errors in HTTP client during loading (@99ffc33)

  • Construct batches with the right sizes (#57)

0.1.0 - 2022-02-08

Added

  • Core inference server functionality

  • Batching support

  • Support for running multiple workers simultaneously

  • Support for different batcher and buffer implementations

  • XModel support

  • Logging, metrics and tracing support

  • REST API based on KServe v2 API

  • C++ API

  • Python library for REST

  • Documentation, examples, and some tests

  • Experimental GUI