C++

Clients

bool amdinfer::serverHasExtension(const Client *client, const std::string &extension)

Checks if the server has a certain extension.

Parameters:
  • client – a pointer to a client object

  • extension – name of the extension to check on the server

Returns:

bool - true if the server has the requested extension, false otherwise

void amdinfer::waitUntilServerReady(const Client *client)

Blocks until the server is ready.

Parameters:

client – a pointer to a client object

void amdinfer::waitUntilModelReady(const Client *client, const std::string &model)

Blocks until the named model/worker is ready.

Parameters:
  • client – a pointer to a client object

  • model – the model/worker to wait for

void amdinfer::waitUntilModelNotReady(const Client *client, const std::string &model)

Blocks until the named model/worker is not ready.

Parameters:
  • client – a pointer to a client object

  • model – the model/worker to wait for
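The three wait helpers above block until a readiness condition holds. A minimal sketch of such a helper follows. It assumes the helper simply polls a check at a fixed interval, which the actual amdinfer implementation may or may not do. `waitUntil` and `poll_interval` are hypothetical names, and the predicate stands in for calls such as `client->serverReady()` or `!client->modelReady(model)`:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: evaluates `condition` until it returns true,
// sleeping `poll_interval` between attempts. Returns how many times the
// condition was checked. There is no timeout, so it blocks indefinitely
// if the condition never becomes true.
int waitUntil(const std::function<bool()>& condition,
              std::chrono::milliseconds poll_interval =
                  std::chrono::milliseconds(10)) {
  int attempts = 1;
  while (!condition()) {
    std::this_thread::sleep_for(poll_interval);
    ++attempts;
  }
  return attempts;
}
```

With a real client, `waitUntil([&] { return client->serverReady(); })` would play the role of waitUntilServerReady. Negating the modelReady check gives the "not ready" variant.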

std::vector<std::string> amdinfer::loadEnsemble(const Client *client, std::vector<std::string> workers, std::vector<ParameterMap> parameters)

Loads an ensemble: a chain of connected workers. This implementation handles the simplest case, where the ensemble is a single linear graph.

Parameters:
  • client – a pointer to a client object

  • workers – the list of workers to connect

  • parameters – the list of parameters corresponding to each worker

Returns:

std::vector<std::string> the endpoints for each loaded worker

void amdinfer::unloadModels(const Client *client, const std::vector<std::string> &models)

Unloads a list of models. The list may come from an ensemble or contain individually loaded workers or models.

Parameters:
  • client – a pointer to a client object

  • models – a list of models to unload

std::vector<InferenceResponse> amdinfer::inferAsyncOrdered(Client *client, const std::string &model, const std::vector<InferenceRequest> &requests)

Makes inference requests to the specified model in parallel. All requests are sent at once, and the responses are gathered and returned in the same order as the requests.

Parameters:
  • client – a pointer to a client object

  • model – the model/worker to make inference requests to

  • requests – a vector of requests

Returns:

std::vector<InferenceResponse> - the responses, in the same order as the requests
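The ordering guarantee follows from collecting one future per request and reading the futures back in request order. A minimal sketch using `std::async` is shown below. It assumes a hypothetical synchronous `infer` function in place of the client's `modelInferAsync`, and it simplifies requests and responses to `int` and `std::string`:

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-ins for InferenceRequest / InferenceResponse.
using Request = int;
using Response = std::string;

// Launches every request at once, then waits on the futures in the order
// the requests were given. responses[i] always answers requests[i], even
// if the requests finish out of order.
std::vector<Response> inferOrdered(
    const std::function<Response(Request)>& infer,
    const std::vector<Request>& requests) {
  std::vector<std::future<Response>> futures;
  futures.reserve(requests.size());
  for (const auto& request : requests) {
    futures.push_back(std::async(std::launch::async, infer, request));
  }
  std::vector<Response> responses;
  responses.reserve(futures.size());
  for (auto& future : futures) {
    responses.push_back(future.get());
  }
  return responses;
}
```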

std::vector<InferenceResponse> amdinfer::inferAsyncOrderedBatched(Client *client, const std::string &model, const std::vector<InferenceRequest> &requests, size_t batch_size)

Makes inference requests to the specified model in batches. The requests in each batch are sent in parallel, and the batch's responses are gathered and appended to a vector. Once all the responses have been received, the vector is returned.

Parameters:
  • client – a pointer to a client object

  • model – the model/worker to make inference requests to

  • requests – a vector of requests

  • batch_size – the number of requests that should be sent in parallel at once

Returns:

std::vector<InferenceResponse> - the responses, in the same order as the requests
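A minimal sketch of the batched variant follows, under the same assumptions as a plain `std::async` approach. At most `batch_size` requests are in flight at once, and the next batch starts only after every response in the current batch has arrived. `inferOrderedBatched`, `infer`, and the rejection of a zero batch size are hypothetical choices, not documented amdinfer behavior:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for InferenceRequest / InferenceResponse.
using Request = int;
using Response = std::string;

// Sends requests in consecutive batches of at most batch_size, waiting for
// every response in a batch before starting the next one. Responses are
// appended in request order.
std::vector<Response> inferOrderedBatched(
    const std::function<Response(Request)>& infer,
    const std::vector<Request>& requests, std::size_t batch_size) {
  if (batch_size == 0) {
    // Assumption: the documentation does not say how 0 is handled.
    throw std::invalid_argument("batch_size must be positive");
  }
  std::vector<Response> responses;
  responses.reserve(requests.size());
  for (std::size_t start = 0; start < requests.size(); start += batch_size) {
    const std::size_t end = std::min(start + batch_size, requests.size());
    std::vector<std::future<Response>> batch;
    for (std::size_t i = start; i < end; ++i) {
      batch.push_back(std::async(std::launch::async, infer, requests[i]));
    }
    for (auto& future : batch) {
      responses.push_back(future.get());
    }
  }
  return responses;
}
```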

gRPC

class GrpcClient : public amdinfer::Client

The GrpcClient class implements the Client using gRPC.

Usage:

GrpcClient client{"127.0.0.1:50051"}; if (client.serverLive()){ … }

Public Functions

explicit GrpcClient(const std::string &address)

Constructs a new GrpcClient object.

Parameters:

address – Address of the server to connect to

explicit GrpcClient(const std::shared_ptr<::grpc::Channel> &channel)

Constructs a new GrpcClient object.

Parameters:

channel – an existing gRPC channel to reuse to connect to the server

GrpcClient(GrpcClient const&) = delete

Copy constructor (deleted).

GrpcClient &operator=(const GrpcClient&) = delete

Copy assignment operator (deleted).

GrpcClient(GrpcClient &&other) = default

Move constructor.

GrpcClient &operator=(GrpcClient &&other) = default

Move assignment operator.

~GrpcClient() override

Destructor. This declaration is needed because GrpcClientImpl is an incomplete type here; the destructor is defaulted in the implementation file. Because the destructor is user-declared rather than defaulted in the header, the Rule of 5 calls for the other special member functions to be specified explicitly.
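The destructor note describes the pointer-to-implementation (pimpl) idiom. The class holds a `std::unique_ptr` to a type that is incomplete in the header, so the destructor must be defined where the implementation type is complete. The move assignment may also destroy the old implementation, so the same applies to it. A minimal single-file sketch of one common way to do this uses the hypothetical names `Client` and `ClientImpl`. In a real library, everything after the `ClientImpl` definition would live in the .cpp file:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// --- "header" part: ClientImpl is only forward-declared here ---
class ClientImpl;

class Client {
 public:
  explicit Client(std::string address);
  Client(const Client&) = delete;             // copying is disabled
  Client& operator=(const Client&) = delete;
  Client(Client&& other) noexcept;            // declared here, defaulted below
  Client& operator=(Client&& other) noexcept;
  ~Client();                                  // cannot be defaulted inline here

  std::string address() const;

 private:
  std::unique_ptr<ClientImpl> impl_;
};

// --- "source" part: ClientImpl is complete from here on ---
class ClientImpl {
 public:
  explicit ClientImpl(std::string address) : address_(std::move(address)) {}
  std::string address_;
};

Client::Client(std::string address)
    : impl_(std::make_unique<ClientImpl>(std::move(address))) {}
Client::Client(Client&& other) noexcept = default;
Client& Client::operator=(Client&& other) noexcept = default;
Client::~Client() = default;

std::string Client::address() const { return impl_->address_; }

static_assert(!std::is_copy_constructible_v<Client>);
static_assert(std::is_nothrow_move_constructible_v<Client>);
```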

virtual ServerMetadata serverMetadata() const override

Returns the server metadata as a ServerMetadata object.

Returns:

ServerMetadata

virtual bool serverLive() const override

Checks if the server is live.

Returns:

bool - true if server is live, false otherwise

virtual bool serverReady() const override

Checks if the server is ready.

Returns:

bool - true if server is ready, false otherwise

virtual bool modelReady(const std::string &model) const override

Checks if a model/worker is ready.

Parameters:

model – name of the model to check

Returns:

bool - true if model is ready, false otherwise

virtual ModelMetadata modelMetadata(const std::string &model) const override

Returns the metadata associated with a ready model/worker.

Parameters:

model – name of the model/worker to get metadata for

Returns:

ModelMetadata

virtual void modelLoad(const std::string &model, const ParameterMap &parameters) const override

Loads a model with the given name and load-time parameters. This method assumes that the server's model repository directory already contains a directory with this model's name, holding the model and its metadata in the expected format.

Parameters:
  • model – name of the model to load from the model repository directory

  • parameters – load-time parameters for the worker supporting the model

virtual void modelUnload(const std::string &model) const override

Unloads a previously loaded model and shuts it down. This is functionally identical to workerUnload and is provided for symmetry.

Parameters:

model – name of the model to unload

virtual InferenceResponse modelInfer(const std::string &model, const InferenceRequest &request) const override

Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

Returns:

InferenceResponse

virtual InferenceResponseFuture modelInferAsync(const std::string &model, const InferenceRequest &request) const override

Makes an asynchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. The caller must keep the returned InferenceResponseFuture and use it later to get the result of the inference.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

Returns:

InferenceResponseFuture

virtual std::vector<std::string> modelList() const override

Gets a list of active models on the server, returning their names.

Returns:

std::vector<std::string>

virtual std::string workerLoad(const std::string &worker, const ParameterMap &parameters) const override

Loads a worker with the given name and load-time parameters.

Parameters:
  • worker – name of the worker to load

  • parameters – load-time parameters for the worker

Returns:

std::string - the endpoint of the loaded worker

virtual void workerUnload(const std::string &worker) const override

Unloads a previously loaded worker and shuts it down. This is functionally identical to modelUnload and is provided for symmetry.

Parameters:

worker – name of the worker to unload

virtual bool hasHardware(const std::string &name, int num) const override

Checks if the server has at least the requested number of a specific hardware device.

Parameters:
  • name – name of the hardware device to check

  • num – minimum number of devices that should exist

Returns:

bool - true if server has at least the requested number of the hardware device, false otherwise

class GrpcClientImpl

HTTP

class HttpClient : public amdinfer::Client

The HttpClient class implements the Client using HTTP REST.

Usage:

HttpClient client{"http://127.0.0.1:8998"}; if (client.serverLive()){ … }

Public Functions

explicit HttpClient(const std::string &address)

Constructs a new HttpClient object.

Parameters:

address – Address of the server to connect to

HttpClient(const std::string &address, const StringMap &headers, int parallelism)

Constructs a new HttpClient object.

Parameters:
  • address – Address of the server to connect to

  • headers – Key-value pairs that should be added to the HTTP headers for all requests

  • parallelism – Max number of requests that can be sent in parallel

HttpClient(HttpClient const&) = delete

Copy constructor (deleted).

HttpClient &operator=(const HttpClient&) = delete

Copy assignment operator (deleted).

HttpClient(HttpClient &&other) = default

Move constructor.

HttpClient &operator=(HttpClient &&other) = default

Move assignment operator.

~HttpClient() override

Destructor. This declaration is needed because HttpClientImpl is an incomplete type here; the destructor is defaulted in the implementation file. Because the destructor is user-declared rather than defaulted in the header, the Rule of 5 calls for the other special member functions to be specified explicitly.

virtual ServerMetadata serverMetadata() const override

Returns the server metadata as a ServerMetadata object.

Returns:

ServerMetadata

virtual bool serverLive() const override

Checks if the server is live.

Returns:

bool - true if server is live, false otherwise

virtual bool serverReady() const override

Checks if the server is ready.

Returns:

bool - true if server is ready, false otherwise

virtual bool modelReady(const std::string &model) const override

Checks if a model/worker is ready.

Parameters:

model – name of the model to check

Returns:

bool - true if model is ready, false otherwise

virtual ModelMetadata modelMetadata(const std::string &model) const override

Returns the metadata associated with a ready model/worker.

Parameters:

model – name of the model/worker to get metadata for

Returns:

ModelMetadata

virtual void modelLoad(const std::string &model, const ParameterMap &parameters) const override

Loads a model with the given name and load-time parameters. This method assumes that the server's model repository directory already contains a directory with this model's name, holding the model and its metadata in the expected format.

Parameters:
  • model – name of the model to load from the model repository directory

  • parameters – load-time parameters for the worker supporting the model

virtual void modelUnload(const std::string &model) const override

Unloads a previously loaded model and shuts it down. This is functionally identical to workerUnload and is provided for symmetry.

Parameters:

model – name of the model to unload

virtual InferenceResponse modelInfer(const std::string &model, const InferenceRequest &request) const override

Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

Returns:

InferenceResponse

virtual InferenceResponseFuture modelInferAsync(const std::string &model, const InferenceRequest &request) const override

Makes an asynchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. The caller must keep the returned InferenceResponseFuture and use it later to get the result of the inference.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

Returns:

InferenceResponseFuture

virtual std::vector<std::string> modelList() const override

Gets a list of active models on the server, returning their names.

Returns:

std::vector<std::string>

virtual std::string workerLoad(const std::string &worker, const ParameterMap &parameters) const override

Loads a worker with the given name and load-time parameters.

Parameters:
  • worker – name of the worker to load

  • parameters – load-time parameters for the worker

Returns:

std::string - the endpoint of the loaded worker

virtual void workerUnload(const std::string &worker) const override

Unloads a previously loaded worker and shuts it down. This is functionally identical to modelUnload and is provided for symmetry.

Parameters:

worker – name of the worker to unload

virtual bool hasHardware(const std::string &name, int num) const override

Checks if the server has at least the requested number of a specific hardware device.

Parameters:
  • name – name of the hardware device to check

  • num – minimum number of devices that should exist

Returns:

bool - true if server has at least the requested number of the hardware device, false otherwise

class HttpClientImpl

Native

class NativeClient : public amdinfer::Client

The NativeClient class implements the Client using the native C++ API. This client can be used if the client and backend are in the same C++ executable.

Usage:

Server server; NativeClient client{&server}; if (client.serverLive()){ … }

Public Functions

explicit NativeClient(Server *server)

Constructs a new NativeClient object.

Parameters:

server – server to connect to

NativeClient(NativeClient const&) = delete

Copy constructor (deleted).

NativeClient &operator=(const NativeClient&) = delete

Copy assignment operator (deleted).

NativeClient(NativeClient &&other) = default

Move constructor.

NativeClient &operator=(NativeClient &&other) = default

Move assignment operator.

~NativeClient() override

Destructor. This declaration is needed because NativeClientImpl is an incomplete type here; the destructor is defaulted in the implementation file. Because the destructor is user-declared rather than defaulted in the header, the Rule of 5 calls for the other special member functions to be specified explicitly.

virtual ServerMetadata serverMetadata() const override

Returns the server metadata as a ServerMetadata object.

Returns:

ServerMetadata

virtual bool serverLive() const override

Checks if the server is live.

Returns:

bool - true if server is live, false otherwise

virtual bool serverReady() const override

Checks if the server is ready.

Returns:

bool - true if server is ready, false otherwise

virtual bool modelReady(const std::string &model) const override

Checks if a model/worker is ready.

Parameters:

model – name of the model to check

Returns:

bool - true if model is ready, false otherwise

virtual ModelMetadata modelMetadata(const std::string &model) const override

Returns the metadata associated with a ready model/worker.

Parameters:

model – name of the model/worker to get metadata for

Returns:

ModelMetadata

virtual void modelLoad(const std::string &model, const ParameterMap &parameters) const override

Loads a model with the given name and load-time parameters. This method assumes that the server's model repository directory already contains a directory with this model's name, holding the model and its metadata in the expected format.

Parameters:
  • model – name of the model to load from the model repository directory

  • parameters – load-time parameters for the worker supporting the model

virtual void modelUnload(const std::string &model) const override

Unloads a previously loaded model and shuts it down. This is functionally identical to workerUnload and is provided for symmetry.

Parameters:

model – name of the model to unload

virtual InferenceResponse modelInfer(const std::string &model, const InferenceRequest &request) const override

Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

Returns:

InferenceResponse

virtual InferenceResponseFuture modelInferAsync(const std::string &model, const InferenceRequest &request) const override

Makes an asynchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. The caller must keep the returned InferenceResponseFuture and use it later to get the result of the inference.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

Returns:

InferenceResponseFuture

virtual std::vector<std::string> modelList() const override

Gets a list of active models on the server, returning their names.

Returns:

std::vector<std::string>

virtual std::string workerLoad(const std::string &worker, const ParameterMap &parameters) const override

Loads a worker with the given name and load-time parameters.

Parameters:
  • worker – name of the worker to load

  • parameters – load-time parameters for the worker

Returns:

std::string - the endpoint of the loaded worker

virtual void workerUnload(const std::string &worker) const override

Unloads a previously loaded worker and shuts it down. This is functionally identical to modelUnload and is provided for symmetry.

Parameters:

worker – name of the worker to unload

virtual bool hasHardware(const std::string &name, int num) const override

Checks if the server has at least the requested number of a specific hardware device.

Parameters:
  • name – name of the hardware device to check

  • num – minimum number of devices that should exist

Returns:

bool - true if server has at least the requested number of the hardware device, false otherwise

struct NativeClientImpl

WebSocket

class WebSocketClient : public amdinfer::Client

The WebSocketClient class implements the Client using WebSocket. It reuses the HttpClient for most transactions, except for a few operations that use the WebSocket connection directly.

Usage:

WebSocketClient client{"ws://127.0.0.1:8998", "http://127.0.0.1:8998"}; if (client.serverLive()){ … }

Public Functions

WebSocketClient(const std::string &ws_address, const std::string &http_address)

Constructs a new WebSocketClient object.

Parameters:
  • ws_address – address of the websocket server to connect to

  • http_address – address of the HTTP server to connect to

WebSocketClient(WebSocketClient const&) = delete

Copy constructor (deleted).

WebSocketClient &operator=(const WebSocketClient&) = delete

Copy assignment operator (deleted).

WebSocketClient(WebSocketClient &&other) = default

Move constructor.

WebSocketClient &operator=(WebSocketClient &&other) = default

Move assignment operator.

~WebSocketClient() override

Destructor. This declaration is needed because WebSocketClientImpl is an incomplete type here; the destructor is defaulted in the implementation file. Because the destructor is user-declared rather than defaulted in the header, the Rule of 5 calls for the other special member functions to be specified explicitly.

virtual ServerMetadata serverMetadata() const override

Returns the server metadata as a ServerMetadata object.

Returns:

ServerMetadata

virtual bool serverLive() const override

Checks if the server is live.

Returns:

bool - true if server is live, false otherwise

virtual bool serverReady() const override

Checks if the server is ready.

Returns:

bool - true if server is ready, false otherwise

virtual bool modelReady(const std::string &model) const override

Checks if a model/worker is ready.

Parameters:

model – name of the model to check

Returns:

bool - true if model is ready, false otherwise

virtual ModelMetadata modelMetadata(const std::string &model) const override

Returns the metadata associated with a ready model/worker.

Parameters:

model – name of the model/worker to get metadata for

Returns:

ModelMetadata

virtual void modelLoad(const std::string &model, const ParameterMap &parameters) const override

Loads a model with the given name and load-time parameters. This method assumes that the server's model repository directory already contains a directory with this model's name, holding the model and its metadata in the expected format.

Parameters:
  • model – name of the model to load from the model repository directory

  • parameters – load-time parameters for the worker supporting the model

virtual void modelUnload(const std::string &model) const override

Unloads a previously loaded model and shuts it down. This is functionally identical to workerUnload and is provided for symmetry.

Parameters:

model – name of the model to unload

virtual InferenceResponse modelInfer(const std::string &model, const InferenceRequest &request) const override

Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

Returns:

InferenceResponse

virtual InferenceResponseFuture modelInferAsync(const std::string &model, const InferenceRequest &request) const override

Makes an asynchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. The caller must keep the returned InferenceResponseFuture and use it later to get the result of the inference.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

Returns:

InferenceResponseFuture

virtual std::vector<std::string> modelList() const override

Gets a list of active models on the server, returning their names.

Returns:

std::vector<std::string>

virtual std::string workerLoad(const std::string &worker, const ParameterMap &parameters) const override

Loads a worker with the given name and load-time parameters.

Parameters:
  • worker – name of the worker to load

  • parameters – load-time parameters for the worker

Returns:

std::string - the endpoint of the loaded worker

virtual void workerUnload(const std::string &worker) const override

Unloads a previously loaded worker and shuts it down. This is functionally identical to modelUnload and is provided for symmetry.

Parameters:

worker – name of the worker to unload

virtual bool hasHardware(const std::string &name, int num) const override

Checks if the server has at least the requested number of a specific hardware device.

Parameters:
  • name – name of the hardware device to check

  • num – minimum number of devices that should exist

Returns:

bool - true if server has at least the requested number of the hardware device, false otherwise

void modelInferWs(const std::string &model, const InferenceRequest &request) const

Makes a WebSocket inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. Unlike the standard inference methods, this method sends an actual WebSocket message. Use modelRecv to get the results. Responses must be disambiguated on the client side using their IDs.

Parameters:
  • model – name of the model/worker to send the inference request to

  • request – the request

std::string modelRecv() const

Gets one message sent by the WebSocket server in response to a modelInferWs request. The caller should know in advance how many messages to expect and call this method that many times.

Returns:

std::string a JSON object encoded as a string
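Because responses obtained through modelRecv must be matched to their requests by ID, the client needs some bookkeeping. A minimal sketch follows. It assumes each received message has already been parsed into a hypothetical `WsResponse` with `id` and `body` fields (JSON parsing is omitted). A hypothetical `recv` callable stands in for `modelRecv`:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical parsed form of one WebSocket response message.
struct WsResponse {
  std::string id;
  std::string body;
};

// Calls recv exactly `expected` times (one call per request sent) and files
// each response under its ID, so results can be looked up per request
// regardless of arrival order.
std::map<std::string, std::string> collectById(
    const std::function<WsResponse()>& recv, std::size_t expected) {
  std::map<std::string, std::string> by_id;
  for (std::size_t i = 0; i < expected; ++i) {
    WsResponse response = recv();
    if (!by_id.emplace(response.id, response.body).second) {
      throw std::runtime_error("duplicate response ID: " + response.id);
    }
  }
  return by_id;
}
```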

void close() const

Closes the websocket connection.

class WebSocketClientImpl

Public Functions

WebSocketClientImpl(WebSocketClientImpl const&) = delete

Copy constructor (deleted).

WebSocketClientImpl &operator=(const WebSocketClientImpl&) = delete

Copy assignment operator (deleted).

WebSocketClientImpl(WebSocketClientImpl &&other) = delete

Move constructor (deleted).

WebSocketClientImpl &operator=(WebSocketClientImpl &&other) = delete

Move assignment operator (deleted).

Core

DataType

class DataType

Supported data types. The ALL_CAPS aliases are deprecated and will be removed.

Public Functions

constexpr DataType() = default

Constructs a new DataType object.

inline explicit constexpr DataType(const char *value)

Constructs a new DataType object.

Parameters:

value – string to identify the initial value of the new datatype

inline constexpr DataType(DataType::Value value)

Constructs a new DataType object.

Parameters:

value – datatype to identify the initial value of the new datatype

inline constexpr operator Value() const

Implicitly converts a DataType object to its internal Value.

inline constexpr size_t size() const

Get the size in bytes associated with a data type.

Returns:

constexpr size_t

inline constexpr const char *str() const

Returns the string corresponding to the type. KServe requires specific, all-caps string values, and this method returns those values. If they are ever changed, each server will need to map them back to the values KServe expects.

Returns:

const char*

Friends

friend std::ostream &operator<<(std::ostream &os, const DataType &value)

Print support for the DataType class.

Parameters:
  • os – stream to print to

  • value – DataType instance to print

Returns:

std::ostream&
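size() and str() can be constexpr because each is a simple mapping from the enumerated value. The sketch below shows that design with a hypothetical `MiniDataType` covering only four types. The string names follow the KServe V2 datatype names (BOOL, INT32, FP32, FP64). The enumerator set and class layout are assumptions, not amdinfer's actual definition:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>

// Hypothetical subset of a DataType-like class.
class MiniDataType {
 public:
  enum Value { Bool, Int32, Fp32, Fp64 };

  constexpr MiniDataType(Value value) : value_(value) {}  // implicit on purpose
  constexpr operator Value() const { return value_; }     // enables switch/compare

  // Size in bytes of one element of this type.
  constexpr std::size_t size() const {
    switch (value_) {
      case Bool: return 1;
      case Int32: return 4;
      case Fp32: return 4;
      case Fp64: return 8;
    }
    return 0;
  }

  // KServe V2 spells datatypes in upper case, e.g. "INT32" and "FP32".
  constexpr const char* str() const {
    switch (value_) {
      case Bool: return "BOOL";
      case Int32: return "INT32";
      case Fp32: return "FP32";
      case Fp64: return "FP64";
    }
    return "UNKNOWN";
  }

  friend std::ostream& operator<<(std::ostream& os, const MiniDataType& type) {
    return os << type.str();
  }

 private:
  Value value_;
};

static_assert(MiniDataType(MiniDataType::Fp64).size() == 8);
```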

Exceptions

Defines the exception classes. Following the standard exceptions in std, the class names use lower-case snake_case.

namespace amdinfer
class bad_status : public amdinfer::runtime_error
#include <exceptions.hpp>

This exception gets thrown by the clients if a method fails or the server raises an error.

Subclassed by amdinfer::connection_error

class connection_error : public amdinfer::bad_status
#include <exceptions.hpp>

This exception gets thrown by the clients if the connection to the server fails.

class environment_not_set_error : public amdinfer::runtime_error
#include <exceptions.hpp>

This exception gets thrown if an expected environment variable is not set.

class external_error : public amdinfer::runtime_error
#include <exceptions.hpp>

This exception gets thrown if a third-party library raises an exception.

class file_not_found_error : public amdinfer::runtime_error
#include <exceptions.hpp>

This exception gets thrown if a requested file cannot be found.

class file_read_error : public amdinfer::runtime_error
#include <exceptions.hpp>

This exception gets thrown if a requested file cannot be read.

class invalid_argument : public amdinfer::runtime_error
#include <exceptions.hpp>

This exception gets thrown if an invalid argument is passed to a function.

class runtime_error : public std::runtime_error
#include <exceptions.hpp>

The base class for all exceptions thrown by the inference server.

Subclassed by amdinfer::bad_status, amdinfer::environment_not_set_error, amdinfer::external_error, amdinfer::file_not_found_error, amdinfer::file_read_error, amdinfer::invalid_argument
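In this hierarchy, connection_error derives from bad_status, which derives from runtime_error. Handlers should therefore be ordered from most to least specific. The sketch below mirrors the documented hierarchy in a hypothetical `sketch` namespace; the real classes live in amdinfer and are declared in exceptions.hpp. It assumes the root class derives from std::runtime_error:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace sketch {
// Mirrors the documented inheritance chain:
// std::runtime_error <- runtime_error <- bad_status <- connection_error
class runtime_error : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};
class bad_status : public runtime_error {
 public:
  using runtime_error::runtime_error;
};
class connection_error : public bad_status {
 public:
  using bad_status::bad_status;
};
}  // namespace sketch

// Classifies an exception by checking the most derived type first. A
// bad_status handler placed first would also catch connection_error.
std::string classify(void (*action)()) {
  try {
    action();
  } catch (const sketch::connection_error&) {
    return "connection";
  } catch (const sketch::bad_status&) {
    return "status";
  } catch (const sketch::runtime_error&) {
    return "server";
  } catch (const std::exception&) {
    return "other";
  }
  return "ok";
}
```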

Prediction

class ParameterMap : public amdinfer::Serializable

Holds parameters from JSON, which the KServe spec defines as one of bool, number, or string. Numbers are further restricted to double or int32.

Public Functions

ParameterMap(const std::vector<std::string> &keys, const std::vector<Parameter> &values)

Construct a new ParameterMap object with initial values. The sizes of the keys and values vectors must match.

Until C++20, passing const char* to this constructor will convert it to a bool instead of a string. Explicitly convert any string literals to a string before passing them to this constructor.

Parameters:
  • keys – names of the parameters

  • values – values of the parameters, in the same order as keys

void put(const std::string &key, Parameter value)

Put in a key-value pair.

Parameters:
  • key – key used to store and retrieve the value

  • value – value to store

void put(const std::string &key, const char *value)

Put in a key-value pair.

This overload is needed because C++ converts const char* to bool instead of string when both types are present in the variant. This behavior has been fixed in C++20.

Parameters:
  • key – key used to store and retrieve the value

  • value – string value to store
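Before C++20, the std::variant converting constructor chose its alternative as if by overload resolution over one function per alternative. The problem described above can therefore be reproduced with plain overloads. Converting const char* to bool is a standard conversion, which outranks the user-defined conversion to std::string. The sketch below uses hypothetical free functions that are not part of amdinfer. Plain overloads behave this way in every C++ version, because C++20 changed only the variant constructor:

```cpp
#include <cassert>
#include <string>

// Each overload reports which parameter type was selected.
std::string putNoFix(bool) { return "bool"; }
std::string putNoFix(const std::string&) { return "string"; }

// Adding an exact-match const char* overload (the approach the extra put
// overload takes) routes string literals to the string path.
std::string putFixed(bool) { return "bool"; }
std::string putFixed(const std::string&) { return "string"; }
std::string putFixed(const char* value) { return putFixed(std::string(value)); }
```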

template<typename T>
inline T get(const std::string &key) const

Get the named parameter.

Template Parameters:

T – type of the parameter; must be one of bool, double, int32_t, or std::string

Parameters:

key – parameter to get

Returns:

T

bool has(const std::string &key) const

Checks if a particular parameter exists.

Parameters:

key – name of the parameter to check

Returns:

bool - true if the parameter exists, false otherwise

void rename(const std::string &key, const std::string &new_key)

Renames the key associated with a parameter. If the new key already exists, its value is not overwritten and the old key is simply erased.

Parameters:
  • key – current name of the parameter

  • new_key – new name for the parameter
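The rename rule maps directly onto C++17 node handles: if the new key exists, it keeps its value and the old key is simply dropped. The sketch below works over a plain std::map with int values in place of Parameter. `renameKey` is a hypothetical name, and this is not necessarily how ParameterMap implements rename. Treating a missing key as a no-op is also an assumption, since the documentation does not say:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Moves the entry stored under `key` to `new_key`. If `new_key` already
// exists, insert() rejects the node and it is destroyed, so the existing
// value survives and the old key is erased either way.
void renameKey(std::map<std::string, int, std::less<>>& params,
               const std::string& key, const std::string& new_key) {
  auto node = params.extract(key);
  if (node.empty()) {
    return;  // assumption: renaming a missing key does nothing
  }
  node.key() = new_key;
  params.insert(std::move(node));
}
```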

void erase(const std::string &key)

Removes a parameter, if it exists. No error is raised if it doesn’t exist.

Parameters:

key – name of the parameter to remove

size_t size() const

Gets the number of parameters.

bool empty() const

Checks if the parameters are empty.

std::map<std::string, Parameter, std::less<>> data() const

Gets the underlying data structure holding the parameters.

Iterator begin()

Returns a read/write iterator to the first parameter in the object.

ConstIterator begin() const

Returns a read iterator to the first parameter in the object.

ConstIterator cbegin() const

Returns a read iterator to the first parameter in the object.

Iterator end()

Returns a read/write iterator to one past the last parameter in the object.

ConstIterator end() const

Returns a read iterator to one past the last parameter in the object.

ConstIterator cend() const

Returns a read iterator to one past the last parameter in the object.

virtual size_t serializeSize() const override

Returns the size of the serialized data.

Returns:

size_t

virtual std::byte *serialize(std::byte *data_out) const override

Serializes the object to the provided memory address. The memory must have enough space to hold the serialized object (see serializeSize()).

Parameters:

data_out – a pointer to the memory to write the serialized object to

virtual const std::byte *deserialize(const std::byte *data_in) override

Deserializes the data at the provided memory address to initialize this object. If the memory cannot be deserialized, an exception is thrown.

Parameters:

data_in – a pointer to the serialized data for this object type

Friends

inline friend std::ostream &operator<<(std::ostream &os, const ParameterMap &self)

Prints the object to an ostream such as std::cout.

struct ServerMetadata

Public Members

std::string name

Name of the server.

std::string version

Version of the server.

std::unordered_set<std::string> extensions

The extensions supported by the server. The KServe specification allows servers to support custom extensions and return them with a metadata request.

class InferenceRequestInput : public amdinfer::InferenceTensor

Holds an inference request’s input data.

Public Functions

InferenceRequestInput()

Constructs a new InferenceRequestInput object.

explicit InferenceRequestInput(const Tensor &tensor)

Constructs a new InferenceRequestInput object.

InferenceRequestInput(void *data, std::vector<uint64_t> shape, DataType data_type, std::string name = "")

Constructs a new InferenceRequestInput object.

Parameters:
  • data – pointer to data

  • shape – shape of the data

  • data_type – type of the data

  • name – name to assign

void setData(void *buffer)

Set the request’s data.

void *getData() const

Get a pointer to the request’s data.

virtual size_t serializeSize() const override

Returns the size of the serialized data.

Returns:

size_t

virtual std::byte *serialize(std::byte *data_out) const override

Serializes the object to the provided memory address. The memory must have enough space to hold the serialized object (see serializeSize()).

Parameters:

data_out – a pointer to the memory to write the serialized object to

Returns:

std::byte* updated address

virtual const std::byte *deserialize(const std::byte *data_in) override

Deserializes the data at the provided memory address to initialize this object. If the memory cannot be deserialized, an exception is thrown.

Parameters:

data_in – a pointer to the serialized data for this object type

Returns:

std::byte* updated address
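The serializeSize/serialize/deserialize trio follows a cursor-style contract. The caller allocates serializeSize() bytes, and serialize writes into that buffer and returns the address just past what it wrote. deserialize reads from the buffer and likewise returns the next unread address, so several objects can be packed back to back. The sketch below applies that contract to a hypothetical `NamedValue` type. Its byte layout (a length-prefixed name followed by an int32) is an assumption and not amdinfer's format. Bounds checks are omitted because, like the documented signatures, the functions receive no buffer length:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical serializable type: a name and a 32-bit value.
struct NamedValue {
  std::string name;
  std::int32_t value = 0;

  // Bytes needed: 8-byte length prefix + name characters + value.
  std::size_t serializeSize() const {
    return sizeof(std::uint64_t) + name.size() + sizeof(value);
  }

  // Writes this object at data_out and returns the first byte after it.
  std::byte* serialize(std::byte* data_out) const {
    const std::uint64_t length = name.size();
    std::memcpy(data_out, &length, sizeof(length));
    data_out += sizeof(length);
    std::memcpy(data_out, name.data(), name.size());
    data_out += name.size();
    std::memcpy(data_out, &value, sizeof(value));
    return data_out + sizeof(value);
  }

  // Reads this object from data_in and returns the first unread byte.
  const std::byte* deserialize(const std::byte* data_in) {
    std::uint64_t length = 0;
    std::memcpy(&length, data_in, sizeof(length));
    data_in += sizeof(length);
    name.assign(reinterpret_cast<const char*>(data_in), length);
    data_in += length;
    std::memcpy(&value, data_in, sizeof(value));
    return data_in + sizeof(value);
  }
};
```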

Friends

friend std::ostream &operator<<(std::ostream &os, InferenceRequestInput const &my_class)

Prints the object to an ostream such as std::cout.

class InferenceRequestOutput

Holds an inference request’s output data.

Public Functions

InferenceRequestOutput()

Constructs a new InferenceRequestOutput object.

inline void setData(void *buffer)

Sets the request’s data.

inline void *getData()

Gets a pointer to the request’s data.

inline std::string getName() const

Gets the output tensor’s name.

void setName(const std::string &name)

Set the output tensor’s name.

inline void setParameters(ParameterMap parameters)

Sets the output tensor’s parameters.

Parameters:

parameters – the parameters to assign

inline const ParameterMap &getParameters() const &

Gets the output tensor’s parameters.

inline ParameterMap getParameters() &&

Gets the output tensor’s parameters. This overload is selected for rvalue objects and returns the parameters by value.
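The two getParameters() overloads differ only in their ref-qualifiers: `const &` is chosen when the object is an lvalue, `&&` when it is an rvalue. A minimal sketch with a hypothetical `Output` class, using `std::map` as a stand-in for ParameterMap; whether amdinfer actually moves the map out in the `&&` overload is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for ParameterMap.
using Params = std::map<std::string, std::string>;

// Hypothetical class mirroring the getParameters() overload pair.
class Output {
 public:
  void setParameters(Params parameters) { parameters_ = std::move(parameters); }

  // Selected for lvalues: returns a reference, so no copy is made.
  const Params& getParameters() const& { return parameters_; }

  // Selected for rvalues (temporaries or std::move(obj)): the map is moved
  // out, leaving this object's parameters valid but unspecified.
  Params getParameters() && { return std::move(parameters_); }

 private:
  Params parameters_;
};
```

Calling `std::move(out).getParameters()` picks the second overload, which avoids copying the map when the object is about to be discarded anyway.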

class InferenceResponse

Creates an inference response object based on KServe’s V2 spec that is used to respond back to clients.

Public Functions

InferenceResponse()

Constructs a new InferenceResponse object.

explicit InferenceResponse(const std::string &error)

Constructs a new InferenceResponse error object.

std::vector<InferenceResponseOutput> getOutputs() const

Gets a vector of the output tensors in this response.

void addOutput(const InferenceResponseOutput &output)

Adds an output tensor to the response.

Parameters:

output – an output tensor

inline std::string getID() const

Gets the ID of the response.

void setID(const std::string &id)

Sets the ID of the response.

void setModel(const std::string &model)

Sets the model name of the response.

std::string getModel()

Gets the model name of the response.

bool isError() const

Checks if this is an error response.

std::string getError() const

Gets the error message, or an empty string if there is no error.

inline ParameterMap *getParameters()

Gets a pointer to the parameters associated with this response.

Friends

friend std::ostream &operator<<(std::ostream &os, InferenceResponse const &my_class)

Prints the object to an ostream, such as std::cout.

class InferenceRequest

Creates an inference request object based on KServe’s V2 spec that is used to communicate between workers.

Public Functions

void setCallback(Callback &&callback)

Sets the request’s callback function used by the last worker to respond back to the client.

Parameters:

callback – the callback function, which accepts an InferenceResponse object

Callback getCallback()

Gets the request’s callback function used by the last worker to respond back to the client.

void runCallback(const InferenceResponse &response)

Runs the request’s callback function.

Parameters:

response – the response data

void runCallbackOnce(const InferenceResponse &response)

Runs the request’s callback function and clears it afterward. This prevents the callback from being called multiple times: if this function is called again, it is a no-op.

Parameters:

response – the response data

void runCallbackError(std::string_view error_msg)

Runs the request’s callback function with an error response. The callback function is not cleared.

Parameters:

error_msg – error message to send back to the client
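The difference between runCallback(), runCallbackOnce(), and runCallbackError() can be sketched with `std::function`. The `SketchRequest` and `Response` types below are hypothetical stand-ins for InferenceRequest and InferenceResponse, and modeling Callback as a `std::function` is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-ins for InferenceResponse and Callback.
struct Response {
  std::string error;  // empty for a successful response
};
using Callback = std::function<void(const Response&)>;

class SketchRequest {
 public:
  void setCallback(Callback&& callback) { callback_ = std::move(callback); }

  // Invokes the callback if one is set; the callback is kept.
  void runCallback(const Response& response) {
    if (callback_) {
      callback_(response);
    }
  }

  // Invokes the callback at most once: it is moved out and cleared before
  // the call, so any later call finds no callback and does nothing.
  void runCallbackOnce(const Response& response) {
    if (callback_) {
      Callback callback = std::move(callback_);
      callback_ = nullptr;
      callback(response);
    }
  }

  // Sends an error response; like runCallback(), the callback is kept.
  void runCallbackError(std::string_view error_msg) {
    runCallback(Response{std::string(error_msg)});
  }

 private:
  Callback callback_;
};
```

Clearing the stored callback before invoking it also means a callback that re-enters runCallbackOnce() cannot trigger itself a second time.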

void addInputTensor(void *data, const std::vector<uint64_t> &shape, DataType data_type, const std::string &name = "")

Constructs and adds a new input tensor to this request.

Parameters:
  • data – pointer to data to add

  • shape – shape of the data

  • data_type – the datatype of the data

  • name – the name of the input tensor

void addInputTensor(InferenceRequestInput input)

Adds a new input tensor to this request.

Parameters:

input – an existing InferenceRequestInput object

void setInputTensorData(size_t index, void *data)

Sets the data pointer for an input tensor, if it exists.

Parameters:
  • index – index for the input tensor

  • data – pointer to assign to its data member
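One reading of "if it exists" is that an out-of-range index is simply ignored. The sketch below assumes that behavior (the real method might instead throw or log) and uses a hypothetical `SketchInputs` container in place of InferenceRequest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for InferenceRequestInput: only the data pointer.
struct SketchInput {
  void* data = nullptr;
};

class SketchInputs {
 public:
  void addInputTensor(SketchInput input) { inputs_.push_back(input); }

  // Assumed semantics: assign the pointer only if `index` refers to an
  // existing input; an out-of-range index leaves the inputs unchanged.
  void setInputTensorData(std::size_t index, void* data) {
    if (index < inputs_.size()) {
      inputs_[index].data = data;
    }
  }

  const std::vector<SketchInput>& getInputs() const { return inputs_; }

 private:
  std::vector<SketchInput> inputs_;
};
```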

void addOutputTensor(const InferenceRequestOutput &output)

Adds a new output tensor to this request.

Parameters:

output – an existing InferenceRequestOutput object

const std::vector<InferenceRequestInput> &getInputs() const

Gets a vector of all the input request objects.

size_t getInputSize() const

Gets the number of input request objects.

const std::vector<InferenceRequestOutput> &getOutputs() const

Gets a vector of the requested output information.

inline const std::string &getID() const

Gets the ID associated with this request.

Returns:

const std::string&

inline void setID(std::string_view id)

Sets the ID associated with this request.

Parameters:

id – ID to set

inline const ParameterMap &getParameters() const &

Gets the request’s parameters.

inline ParameterMap getParameters() &&

Gets the request’s parameters. This overload is selected for rvalue objects and returns the parameters by value.

inline void setParameters(ParameterMap parameters)

Sets the parameters for the request.

Parameters:

parameters – the parameters to assign

class ModelMetadata

This class holds the metadata associated with a model (per the KServe spec). This allows clients to query this information from the server.

Public Functions

ModelMetadata(const std::string &name, const std::string &platform)

Constructs a new Model Metadata object.

Parameters:
  • name – name of the model

  • platform – the platform this model runs on

void addInputTensor(const std::string &name, std::initializer_list<uint64_t> shape, DataType datatype)

Adds an input tensor to this model.

Parameters:
  • name – name of the tensor

  • shape – shape of the tensor

  • datatype – datatype of the tensor

void addInputTensor(const std::string &name, std::vector<int> shape, DataType datatype)

Adds an input tensor to this model.

Parameters:
  • name – name of the tensor

  • shape – shape of the tensor

  • datatype – datatype of the tensor

void addInputTensor(const Tensor &tensor)

Adds an input tensor to this model.

Parameters:

tensor – the tensor to add as an input

const std::vector<ModelMetadataTensor> &getInputs() const

Gets the input tensors’ metadata for this model.

Returns:

const std::vector<ModelMetadataTensor>&

void addOutputTensor(const std::string &name, std::initializer_list<uint64_t> shape, DataType datatype)

Adds an output tensor to this model.

Parameters:
  • name – name of the tensor

  • shape – shape of the tensor

  • datatype – datatype of the tensor

void addOutputTensor(const std::string &name, std::vector<int> shape, DataType datatype)

Adds an output tensor to this model.

Parameters:
  • name – name of the tensor

  • shape – shape of the tensor

  • datatype – datatype of the tensor

void addOutputTensor(const Tensor &tensor)

Adds an output tensor to this model.

Parameters:

tensor – the tensor to add as an output

const std::vector<ModelMetadataTensor> &getOutputs() const

Gets the output tensors’ metadata for this model.

void setName(const std::string &name)

Sets the model’s name.

const std::string &getName() const

Gets the model’s name.

void setReady(bool ready)

Marks this model as ready/not ready.

bool isReady() const

Checks if this model is ready.

Servers

class Server

Public Functions

Server()

Constructs a new Server object.

Server(Server const&) = delete

Copy constructor (deleted: Server objects cannot be copied).

Server &operator=(const Server&) = delete

Copy assignment operator (deleted).

Server(Server &&other) = default

Move constructor.

Server &operator=(Server &&other) = default

Move assignment operator.

~Server()

Destructor.
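Server is move-only: copying is deleted and moving is defaulted. The `struct ServerImpl` entry suggests a pointer-to-implementation (pimpl) layout; assuming that, a `std::unique_ptr` member yields exactly these semantics. `SketchServer`, `SketchServerImpl`, its contents, and the `valid()` helper below are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical implementation struct; the real ServerImpl is not documented.
struct SketchServerImpl {
  int http_port = 0;
};

class SketchServer {
 public:
  SketchServer() : impl_(std::make_unique<SketchServerImpl>()) {}
  SketchServer(const SketchServer&) = delete;
  SketchServer& operator=(const SketchServer&) = delete;
  SketchServer(SketchServer&& other) = default;
  SketchServer& operator=(SketchServer&& other) = default;
  ~SketchServer() = default;

  // Hypothetical helper: false for a moved-from object.
  bool valid() const { return impl_ != nullptr; }

 private:
  std::unique_ptr<SketchServerImpl> impl_;
};

static_assert(!std::is_copy_constructible_v<SketchServer>);
static_assert(!std::is_copy_assignable_v<SketchServer>);
static_assert(std::is_nothrow_move_constructible_v<SketchServer>);
```

In a real pimpl class the destructor is usually defined in the source file, where the implementation struct is a complete type; here it is complete in the same file, so `= default` suffices.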

void startHttp(uint16_t port) const

Start the HTTP server.

Parameters:

port – port to use for the HTTP server

void stopHttp() const

Stop the HTTP server.

void startGrpc(uint16_t port) const

Start the gRPC server.

Parameters:

port – port to use for the gRPC server

void stopGrpc() const

Stop the gRPC server.

void setModelRepository(const std::filesystem::path &repository_path, bool load_existing)

Set the path to the model repository associated with this server.

Parameters:
  • repository_path – path to the model repository

  • load_existing – load all existing models found at the path

void enableRepositoryMonitoring(bool use_polling)

Turns on active monitoring of the model repository path for new files. A model repository must be set with setModelRepository() before calling this method.

Parameters:

use_polling – set to true to use polling to check the directory for new files, false to use events. Note that events may not work well on all platforms.
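With use_polling set to true, the directory is checked for new files by polling rather than by waiting for file-system events. A minimal sketch of one polling pass using `std::filesystem`; the `DirectoryPoller` class is hypothetical, and a real monitor would presumably call poll() periodically on a background thread and then load the newly found models.

```cpp
#include <cassert>
#include <filesystem>
#include <set>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: remembers the entries already seen in a directory
// and reports only the ones that appeared since the previous poll.
class DirectoryPoller {
 public:
  explicit DirectoryPoller(fs::path dir) : dir_(std::move(dir)) {}

  std::vector<fs::path> poll() {
    std::vector<fs::path> added;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir_)) {
      // insert() reports whether the path was new to the set.
      if (seen_.insert(entry.path()).second) {
        added.push_back(entry.path());
      }
    }
    return added;
  }

 private:
  fs::path dir_;
  std::set<fs::path> seen_;
};
```

Polling trades latency (a new model is noticed only at the next pass) for portability, since it relies only on directory listing rather than platform-specific event APIs.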

struct ServerImpl