C++
Clients
-
bool amdinfer::serverHasExtension(const Client *client, const std::string &extension)¶
Checks if the server has a certain extension.
- Parameters
client – a pointer to a client object
extension – name of the extension to check on the server
- Returns
bool - true if the server has the requested extension
-
void amdinfer::waitUntilServerReady(const Client *client)¶
Blocks until the server is ready.
- Parameters
client – a pointer to a client object
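The reference does not say how the blocking is implemented. One common approach is to poll a readiness check (for example, a call such as serverReady()) until it succeeds. The following is a minimal sketch of that pattern under that assumption; waitUntil and its parameters are hypothetical names and not part of the library.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: blocks until `is_ready` returns true, sleeping for
// `interval` between checks. Returns how many times the check was evaluated.
int waitUntil(const std::function<bool()>& is_ready,
              std::chrono::milliseconds interval) {
  int checks = 1;
  while (!is_ready()) {
    std::this_thread::sleep_for(interval);
    ++checks;
  }
  return checks;
}
```

With a real client, the predicate could wrap a readiness query, e.g. a lambda that returns client->serverReady().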
-
void amdinfer::waitUntilModelReady(const Client *client, const std::string &model, const std::string &version = "")¶
Blocks until the named model/worker is ready.
- Parameters
client – a pointer to a client object
model – the model/worker to wait for
-
void amdinfer::waitUntilModelNotReady(const Client *client, const std::string &model, const std::string &version = "")¶
Blocks until the named model/worker is not ready.
- Parameters
client – a pointer to a client object
model – the model/worker to wait for
-
std::vector<std::string> amdinfer::loadEnsemble(const Client *client, std::vector<std::string> workers, std::vector<ParameterMap> parameters)¶
Load an ensemble - a chain of connected workers. This implementation uses the simplest case where the ensemble is a single linear graph.
- Parameters
client – a pointer to a client object
workers – the list of workers to connect
parameters – the list of parameters corresponding to each worker
- Returns
std::vector<std::string> the endpoints for each loaded worker
-
void amdinfer::unloadModels(const Client *client, const std::vector<std::string> &models, const std::string &version = "")¶
Unload a list of models. This list may be from an ensemble or individually loaded workers or models.
- Parameters
client – a pointer to a client object
models – a list of models to unload
-
std::vector<InferenceResponse> amdinfer::inferAsyncOrdered(const Client *client, const std::string &model, const std::vector<InferenceRequest> &requests, const std::string &version = "")¶
Makes inference requests in parallel to the specified model. All requests are sent in parallel and the responses are gathered and returned in the same order.
- Parameters
client – a pointer to a client object
model – the model/worker to make inference requests to
requests – a vector of requests
- Returns
std::vector<InferenceResponse>
-
std::vector<InferenceResponse> amdinfer::inferAsyncOrderedBatched(const Client *client, const std::string &model, const std::vector<InferenceRequest> &requests, size_t batch_size, const std::string &version = "")¶
Makes inference requests in parallel to the specified model in batches. The requests in each batch are sent in parallel, and their responses are gathered and appended to a vector. Once all the responses are received, the response vector is returned.
- Parameters
client – a pointer to a client object
model – the model/worker to make inference requests to
requests – a vector of requests
batch_size – the number of requests that should be sent in parallel at once
- Returns
std::vector<InferenceResponse>
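The batching behavior described above can be sketched with std::async. This is only an illustration of the pattern and not the library's implementation: Request, Response and inferBatchedOrdered are hypothetical stand-ins, and the inference call is passed in as a plain function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

using Request = std::string;   // hypothetical stand-in for InferenceRequest
using Response = std::string;  // hypothetical stand-in for InferenceResponse

// Runs `infer` on every request, at most `batch_size` at a time, and returns
// the responses in the same order as the requests.
std::vector<Response> inferBatchedOrdered(
    const std::function<Response(const Request&)>& infer,
    const std::vector<Request>& requests, std::size_t batch_size) {
  std::vector<Response> responses;
  responses.reserve(requests.size());
  if (batch_size == 0) {
    batch_size = 1;  // avoid an infinite loop on a zero batch size
  }
  for (std::size_t start = 0; start < requests.size(); start += batch_size) {
    const std::size_t end = std::min(start + batch_size, requests.size());
    std::vector<std::future<Response>> batch;
    for (std::size_t i = start; i < end; ++i) {
      const Request& request = requests[i];
      // Each request in the batch runs on its own thread.
      batch.push_back(std::async(
          std::launch::async, [&infer, &request] { return infer(request); }));
    }
    // Waiting on the futures in submission order preserves request order.
    for (auto& pending : batch) {
      responses.push_back(pending.get());
    }
  }
  return responses;
}
```

Because each future is collected in the order it was submitted, the output order matches the input order even if later requests finish first.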
gRPC
-
class GrpcClient : public amdinfer::Client¶
The GrpcClient class implements the Client using gRPC.
Usage:
GrpcClient client{"127.0.0.1:50051"}; if (client.serverLive()){ … }
Public Functions
-
explicit GrpcClient(const std::string &address)¶
Constructs a new GrpcClient object.
- Parameters
address – Address of the server to connect to
Constructs a new GrpcClient object.
- Parameters
channel – an existing gRPC channel to reuse to connect to the server
-
GrpcClient(GrpcClient const&) = delete¶
Copy constructor.
-
GrpcClient &operator=(const GrpcClient&) = delete¶
Copy assignment operator.
-
GrpcClient(GrpcClient &&other) = default¶
Move constructor.
-
GrpcClient &operator=(GrpcClient &&other) = default¶
Move assignment operator.
-
~GrpcClient() override¶
Destructor. It must be declared here because GrpcClientImpl is an incomplete type in the header; it is defaulted in the implementation. Declaring a non-default destructor here in turn requires explicitly specifying the other special member functions, per the Rule of Five.
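The pattern behind this note (a class that owns a pointer to an incomplete implementation type) can be sketched as follows. Widget and WidgetImpl are hypothetical names used only for illustration, and the header and implementation file are shown in one listing for brevity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// --- what would normally live in the header ---
class Widget {
 public:
  explicit Widget(std::string address);
  Widget(const Widget&) = delete;             // copy constructor
  Widget& operator=(const Widget&) = delete;  // copy assignment operator
  Widget(Widget&& other) noexcept;            // move constructor
  Widget& operator=(Widget&& other) noexcept; // move assignment operator
  ~Widget();  // only declared: WidgetImpl is incomplete at this point

  const std::string& address() const;

 private:
  class WidgetImpl;  // forward declaration, an incomplete type
  std::unique_ptr<WidgetImpl> impl_;
};

// --- what would normally live in the implementation file ---
class Widget::WidgetImpl {
 public:
  std::string address;
};

Widget::Widget(std::string address) : impl_(std::make_unique<WidgetImpl>()) {
  impl_->address = std::move(address);
}
Widget::Widget(Widget&& other) noexcept = default;
Widget& Widget::operator=(Widget&& other) noexcept = default;
Widget::~Widget() = default;  // WidgetImpl is complete here, so this compiles

const std::string& Widget::address() const { return impl_->address; }
```

Defaulting the destructor where the implementation type is complete lets std::unique_ptr generate its deleter, while the header stays free of implementation details.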
-
virtual ServerMetadata serverMetadata() const override¶
Returns the server metadata as a ServerMetadata object.
- Returns
-
virtual bool serverLive() const override¶
Checks if the server is live.
- Returns
bool - true if server is live, false otherwise
-
virtual bool serverReady() const override¶
Checks if the server is ready.
- Returns
bool - true if server is ready, false otherwise
-
virtual std::vector<std::string> modelList() const override¶
Gets a list of active models on the server, returning their names.
- Returns
std::vector<std::string>
-
virtual std::string workerLoad(const std::string &worker, const ParameterMap &parameters) const override
Loads a worker with the given name and load-time parameters.
- Parameters
worker – name of the worker to load
parameters – load-time parameters for the worker
- Returns
std::string
-
virtual void workerUnload(const std::string &worker) const override¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameters
worker – name of the worker to unload
-
virtual bool hasHardware(const std::string &name, int num) const override¶
Checks if the server has the requested number of a specific hardware device.
- Parameters
name – name of the hardware device to check
num – number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
-
class GrpcClientImpl¶
HTTP
-
class HttpClient : public amdinfer::Client¶
The HttpClient class implements the Client using HTTP REST.
Usage:
HttpClient client{"http://127.0.0.1:8998"}; if (client.serverLive()){ … }
Public Functions
-
explicit HttpClient(const std::string &address)¶
Construct a new HttpClient object.
- Parameters
address – Address of the server to connect to
-
HttpClient(const std::string &address, const StringMap &headers, int parallelism)¶
Construct a new HttpClient object.
- Parameters
address – Address of the server to connect to
headers – Key-value pairs that should be added to the HTTP headers for all requests
parallelism – Max number of requests that can be sent in parallel
-
HttpClient(HttpClient const&) = delete¶
Copy constructor.
-
HttpClient &operator=(const HttpClient&) = delete¶
Copy assignment operator.
-
HttpClient(HttpClient &&other) = default¶
Move constructor.
-
HttpClient &operator=(HttpClient &&other) = default¶
Move assignment operator.
-
~HttpClient() override¶
Destructor. It must be declared here because HttpClientImpl is an incomplete type in the header; it is defaulted in the implementation. Declaring a non-default destructor here in turn requires explicitly specifying the other special member functions, per the Rule of Five.
-
virtual ServerMetadata serverMetadata() const override¶
Returns the server metadata as a ServerMetadata object.
- Returns
-
virtual bool serverLive() const override¶
Checks if the server is live.
- Returns
bool - true if server is live, false otherwise
-
virtual bool serverReady() const override¶
Checks if the server is ready.
- Returns
bool - true if server is ready, false otherwise
-
virtual std::vector<std::string> modelList() const override¶
Gets a list of active models on the server, returning their names.
- Returns
std::vector<std::string>
-
virtual std::string workerLoad(const std::string &worker, const ParameterMap &parameters) const override
Loads a worker with the given name and load-time parameters.
- Parameters
worker – name of the worker to load
parameters – load-time parameters for the worker
- Returns
std::string
-
virtual void workerUnload(const std::string &worker) const override¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameters
worker – name of the worker to unload
-
virtual bool hasHardware(const std::string &name, int num) const override¶
Checks if the server has the requested number of a specific hardware device.
- Parameters
name – name of the hardware device to check
num – number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
-
class HttpClientImpl¶
Native
-
class NativeClient : public amdinfer::Client¶
The NativeClient class implements the Client using the native C++ API. This client can be used if the client and backend are in the same C++ executable.
Usage:
Server server; NativeClient client{&server}; if (client.serverLive()){ … }
Public Functions
-
explicit NativeClient(const Server *server)¶
Construct a new NativeClient object.
- Parameters
server – server to connect to
-
NativeClient(NativeClient const&) = delete¶
Copy constructor.
-
NativeClient &operator=(const NativeClient&) = delete¶
Copy assignment operator.
-
NativeClient(NativeClient &&other) = default¶
Move constructor.
-
NativeClient &operator=(NativeClient &&other) = default¶
Move assignment operator.
-
~NativeClient() override¶
Destructor. It must be declared here because NativeClientImpl is an incomplete type in the header; it is defaulted in the implementation. Declaring a non-default destructor here in turn requires explicitly specifying the other special member functions, per the Rule of Five.
-
virtual ServerMetadata serverMetadata() const override¶
Returns the server metadata as a ServerMetadata object.
- Returns
-
virtual bool serverLive() const override¶
Checks if the server is live.
- Returns
bool - true if server is live, false otherwise
-
virtual bool serverReady() const override¶
Checks if the server is ready.
- Returns
bool - true if server is ready, false otherwise
-
virtual std::vector<std::string> modelList() const override¶
Gets a list of active models on the server, returning their names.
- Returns
std::vector<std::string>
-
virtual std::string workerLoad(const std::string &worker, const ParameterMap &parameters) const override
Loads a worker with the given name and load-time parameters.
- Parameters
worker – name of the worker to load
parameters – load-time parameters for the worker
- Returns
std::string
-
virtual void workerUnload(const std::string &worker) const override¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameters
worker – name of the worker to unload
-
virtual bool hasHardware(const std::string &name, int num) const override¶
Checks if the server has the requested number of a specific hardware device.
- Parameters
name – name of the hardware device to check
num – number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
-
struct NativeClientImpl¶
WebSocket
-
class WebSocketClient : public amdinfer::Client¶
The WebSocketClient class implements the Client using websocket. It reuses the HttpClient for most transactions with the exception of some operations that actually use websocket.
Usage:
WebSocketClient client{"ws://127.0.0.1:8998", "http://127.0.0.1:8998"}; if (client.serverLive()){ … }
Public Functions
-
WebSocketClient(const std::string &ws_address, const std::string &http_address)¶
Constructs a new WebSocketClient object.
- Parameters
ws_address – address of the websocket server to connect to
http_address – address of the HTTP server to connect to
-
WebSocketClient(WebSocketClient const&) = delete¶
Copy constructor.
-
WebSocketClient &operator=(const WebSocketClient&) = delete¶
Copy assignment operator.
-
WebSocketClient(WebSocketClient &&other) = default¶
Move constructor.
-
WebSocketClient &operator=(WebSocketClient &&other) = default¶
Move assignment operator.
-
~WebSocketClient() override¶
Destructor. It must be declared here because WebSocketClientImpl is an incomplete type in the header; it is defaulted in the implementation. Declaring a non-default destructor here in turn requires explicitly specifying the other special member functions, per the Rule of Five.
-
virtual ServerMetadata serverMetadata() const override¶
Returns the server metadata as a ServerMetadata object.
- Returns
-
virtual bool serverLive() const override¶
Checks if the server is live.
- Returns
bool - true if server is live, false otherwise
-
virtual bool serverReady() const override¶
Checks if the server is ready.
- Returns
bool - true if server is ready, false otherwise
-
virtual std::vector<std::string> modelList() const override¶
Gets a list of active models on the server, returning their names.
- Returns
std::vector<std::string>
-
virtual std::string workerLoad(const std::string &worker, const ParameterMap &parameters) const override
Loads a worker with the given name and load-time parameters.
- Parameters
worker – name of the worker to load
parameters – load-time parameters for the worker
- Returns
std::string
-
virtual void workerUnload(const std::string &worker) const override¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameters
worker – name of the worker to unload
-
virtual bool hasHardware(const std::string &name, int num) const override¶
Checks if the server has the requested number of a specific hardware device.
- Parameters
name – name of the hardware device to check
num – number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
-
void modelInferWs(const std::string &model, const InferenceRequest &request) const¶
Makes a WebSocket inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. This method differs from standard inference in that it submits an actual WebSocket message. The user should use modelRecv to get results and must disambiguate different responses on the client side using the IDs of the responses.
- Parameters
model – the model/worker to make the inference request to
request – the inference request to send
-
std::string modelRecv() const¶
Gets one message from the websocket server sent in response to a modelInferWs request. The user should know beforehand how many messages are expected and should call this method the same number of times.
- Returns
std::string a JSON object encoded as a string
-
void close() const¶
Closes the websocket connection.
-
class WebSocketClientImpl¶
Public Functions
-
WebSocketClientImpl(WebSocketClientImpl const&) = delete¶
Copy constructor.
-
WebSocketClientImpl &operator=(const WebSocketClientImpl&) = delete¶
Copy assignment operator.
-
WebSocketClientImpl(WebSocketClientImpl &&other) = delete¶
Move constructor.
-
WebSocketClientImpl &operator=(WebSocketClientImpl &&other) = delete¶
Move assignment operator.
Core
DataType
-
class DataType¶
Supported data types. The ALL_CAPS aliases are deprecated and will be removed.
Public Functions
-
inline explicit constexpr DataType(const char *value)¶
Constructs a new DataType object.
- Parameters
value – string to identify the initial value of the new datatype
-
inline constexpr DataType(DataType::Value value)¶
Constructs a new DataType object.
- Parameters
value – datatype to identify the initial value of the new datatype
-
inline constexpr operator Value() const¶
Implicit conversion between the DataType class and its internal value.
-
inline constexpr size_t size() const¶
Get the size in bytes associated with a data type.
- Returns
constexpr size_t
-
inline constexpr const char *str() const¶
Returns the string corresponding to this type. KServe requires the string form to be one of a set of specific, all-caps values, and this method returns those values. If these string values are changed, then each server will need to map the values to the ones KServe expects.
- Returns
const char*
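As an illustration of the design described above (a small constexpr class wrapping an enum, with size() and str() members), here is a reduced sketch. MiniDataType is a hypothetical name and only three types are modelled; the strings follow the all-caps KServe naming.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, reduced version of a DataType-like class.
class MiniDataType {
 public:
  enum Value : std::uint8_t { Bool, Int32, Fp64 };

  // Implicit construction from the enum, mirroring DataType(DataType::Value).
  constexpr MiniDataType(Value value) : value_(value) {}

  // Implicit conversion back to the enum so the object works in comparisons
  // and switch statements.
  constexpr operator Value() const { return value_; }

  // Size in bytes of one element of this type.
  constexpr std::size_t size() const {
    switch (value_) {
      case Bool: return sizeof(bool);
      case Int32: return sizeof(std::int32_t);
      case Fp64: return sizeof(double);
    }
    return 0;
  }

  // All-caps name of this type.
  constexpr const char* str() const {
    switch (value_) {
      case Bool: return "BOOL";
      case Int32: return "INT32";
      case Fp64: return "FP64";
    }
    return "UNKNOWN";
  }

 private:
  Value value_;
};
```

Because every member is constexpr, sizes and names can be evaluated at compile time, for example inside a static_assert.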
Exceptions
Defines the exception classes. Their names follow the lower-case snake_case convention of the standard exceptions in std.
-
namespace amdinfer¶
-
class bad_status : public amdinfer::runtime_error¶
- #include <exceptions.hpp>
This exception gets thrown by the clients if a method fails or the server raises an error.
Subclassed by amdinfer::connection_error
-
class connection_error : public amdinfer::bad_status¶
- #include <exceptions.hpp>
This exception gets thrown by the clients if the connection to the server fails.
-
class environment_not_set_error : public amdinfer::runtime_error¶
- #include <exceptions.hpp>
This exception gets thrown if an expected environment variable is not set.
-
class external_error : public amdinfer::runtime_error¶
- #include <exceptions.hpp>
This exception gets thrown if a third-party library raises an exception.
-
class file_not_found_error : public amdinfer::runtime_error¶
- #include <exceptions.hpp>
This exception gets thrown if a requested file cannot be found.
-
class file_read_error : public amdinfer::runtime_error¶
- #include <exceptions.hpp>
This exception gets thrown if a requested file cannot be read.
-
class invalid_argument : public amdinfer::runtime_error¶
- #include <exceptions.hpp>
This exception gets thrown if an invalid argument is passed to a function.
-
class runtime_error : public std::runtime_error
- #include <exceptions.hpp>
The base class for all exceptions thrown by the inference server.
Subclassed by amdinfer::bad_status, amdinfer::environment_not_set_error, amdinfer::external_error, amdinfer::file_not_found_error, amdinfer::file_read_error, amdinfer::invalid_argument
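Because the exceptions form a hierarchy, handlers should be ordered from the most derived class to the base class. The sketch below reproduces three of the classes in a hypothetical namespace named sketch so that it compiles on its own; it assumes the base class derives from std::runtime_error, and classify is an illustrative helper, not a library function.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

namespace sketch {  // hypothetical stand-ins for the amdinfer classes

class runtime_error : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

class bad_status : public runtime_error {
 public:
  using sketch::runtime_error::runtime_error;
};

class connection_error : public bad_status {
 public:
  using bad_status::bad_status;
};

}  // namespace sketch

// Runs `action` and reports which handler caught its exception, if any.
std::string classify(const std::function<void()>& action) {
  try {
    action();
  } catch (const sketch::connection_error&) {
    return "connection";  // most derived class first
  } catch (const sketch::bad_status&) {
    return "status";
  } catch (const sketch::runtime_error&) {
    return "library";  // any other exception from the library
  } catch (const std::exception&) {
    return "other";
  }
  return "ok";
}
```

If the bad_status handler came first, it would also catch connection_error, and the more specific handler would never run.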
Prediction
-
class ParameterMap : public amdinfer::Serializable¶
Holds any parameters from JSON (defined by KServe spec as one of bool, number or string). We further restrict numbers to be doubles or int32.
Public Functions
-
ParameterMap(const std::vector<std::string> &keys, const std::vector<Parameter> &values)¶
Construct a new ParameterMap object with initial values. The sizes of the keys and values vectors must match.
Until C++20, passing const char* to this constructor will convert it to a bool instead of a string. Explicitly convert any string literals to a string before passing them to this constructor.
- Parameters
keys – names of the parameters
values – values of the parameters, in the same order as keys
-
void put(const std::string &key, const Parameter &value)¶
Put in a key-value pair.
- Parameters
key – key used to store and retrieve the value
value – value to store
-
void put(const std::string &key, const char *value)¶
Put in a key-value pair.
This overload is needed because C++ converts const char* to bool instead of string when both types are present in the variant. This behavior has been fixed in C++20.
- Parameters
key – key used to store and retrieve the value
value – string value to store
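The conversion pitfall mentioned above comes from how std::variant selects an alternative for a string literal. The sketch below assumes that Parameter behaves like a std::variant over the four types listed for get (this alias is an assumption, not taken from the library headers), and shows the safe approach of converting to std::string explicitly. makeParameter is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Assumption: a variant over the four parameter types named in this section.
using Parameter = std::variant<bool, double, std::int32_t, std::string>;

// Hypothetical helper in the spirit of the const char* overload of put():
// the explicit std::string conversion guarantees that the string alternative
// is chosen, regardless of the language standard or compiler in use.
Parameter makeParameter(const char* value) {
  return Parameter(std::string(value));
}
```

Writing Parameter p = "text"; directly may select the bool alternative on toolchains that predate the C++20 fix, which is why an explicit conversion or a dedicated overload is the safer choice.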
-
template<typename T>
inline T get(const std::string &key) const
Get the named parameter.
- Template Parameters
T – type of parameter. Must be (bool|double|int32_t|std::string)
- Parameters
key – parameter to get
- Returns
T
-
bool has(const std::string &key) const¶
Checks if a particular parameter exists.
- Parameters
key – name of the parameter to check
- Returns
bool
-
void rename(const std::string &key, const std::string &new_key)¶
Rename the key associated with a parameter. If the new key already exists, its value is not overwritten and the old key is just erased.
- Parameters
key – current name of the parameter
new_key – new name for the parameter
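The rename semantics described above (an existing new key keeps its value and the old key is erased) can be sketched on a plain std::map. Params and renameKey are hypothetical names, int is used as the value type for brevity, and the handling of a missing key or of identical keys is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using Params = std::map<std::string, int, std::less<>>;

// Hypothetical free-function version of the documented rename behavior.
void renameKey(Params& params, const std::string& key,
               const std::string& new_key) {
  auto it = params.find(key);
  if (it == params.end() || key == new_key) {
    return;  // assumption: nothing to do in these cases
  }
  // emplace() leaves the map unchanged if new_key already exists, so an
  // existing value under new_key is not overwritten.
  params.emplace(new_key, it->second);
  params.erase(it);  // the old key is removed either way
}
```

Note that when new_key already exists, the value stored under the old key is discarded.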
-
void erase(const std::string &key)¶
Removes a parameter, if it exists. No error is raised if it doesn’t exist.
- Parameters
key – name of the parameter to remove
-
size_t size() const¶
Gets the number of parameters.
-
bool empty() const¶
Checks if the parameters are empty.
-
std::map<std::string, Parameter, std::less<>> data() const¶
Gets the underlying data structure holding the parameters.
-
Iterator begin()¶
Returns a read/write iterator to the first parameter in the object.
-
ConstIterator begin() const¶
Returns a read iterator to the first parameter in the object.
-
ConstIterator cbegin() const¶
Returns a read iterator to the first parameter in the object.
-
Iterator end()¶
Returns a read/write iterator to one past the last parameter in the object.
-
ConstIterator end() const¶
Returns a read iterator to one past the last parameter in the object.
-
ConstIterator cend() const¶
Returns a read iterator to one past the last parameter in the object.
-
virtual size_t serializeSize() const override¶
Returns the size of the serialized data.
- Returns
size_t
-
virtual std::byte *serialize(std::byte *data_out) const override¶
Serializes the object to the provided memory address. There should be sufficient space to store the serialized object.
- Parameters
data_out – a pointer to the memory where the serialized data is written
-
virtual const std::byte *deserialize(const std::byte *data_in) override¶
Deserializes the data at the provided memory address to initialize this object. If the memory cannot be deserialized, an exception is thrown.
- Parameters
data_in – a pointer to the serialized data for this object type
Friends
-
inline friend std::ostream &operator<<(std::ostream &os, const ParameterMap &self)¶
Provides an implementation to print the class with std::cout to an ostream.
-
struct ServerMetadata¶
-
class InferenceRequestInput : public amdinfer::InferenceTensor¶
Holds an inference request’s input data.
Public Functions
-
InferenceRequestInput()¶
Constructs a new InferenceRequestInput object.
-
explicit InferenceRequestInput(const Tensor &tensor)¶
Constructs a new InferenceRequestInput object.
-
InferenceRequestInput(void *data, std::vector<int64_t> shape, DataType data_type, std::string name = "")¶
Construct a new InferenceRequestInput object.
- Parameters
data – pointer to data
shape – shape of the data
data_type – type of the data
name – name to assign
-
void setData(void *buffer)¶
Set the request’s data.
-
void *getData() const¶
Get a pointer to the request’s data.
-
virtual size_t serializeSize() const override¶
Returns the size of the serialized data.
- Returns
size_t
-
virtual std::byte *serialize(std::byte *data_out) const override¶
Serializes the object to the provided memory address. There should be sufficient space to store the serialized object.
- Parameters
data_out – a pointer to the memory where the serialized data is written
- Returns
std::byte* updated address
-
virtual const std::byte *deserialize(const std::byte *data_in) override¶
Deserializes the data at the provided memory address to initialize this object. If the memory cannot be deserialized, an exception is thrown.
- Parameters
data_in – a pointer to the serialized data for this object type
- Returns
std::byte* updated address
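Returning the updated address lets several objects be serialized back to back into one buffer. The sketch below shows that convention on a hypothetical Point struct; it illustrates the interface shape only and does not reflect the library's wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical type following the serializeSize/serialize/deserialize shape.
struct Point {
  std::int32_t x = 0;
  std::int32_t y = 0;

  // Number of bytes that serialize() will write.
  std::size_t serializeSize() const { return sizeof(x) + sizeof(y); }

  // Writes the fields at data_out and returns the address just past them.
  std::byte* serialize(std::byte* data_out) const {
    std::memcpy(data_out, &x, sizeof(x));
    data_out += sizeof(x);
    std::memcpy(data_out, &y, sizeof(y));
    data_out += sizeof(y);
    return data_out;
  }

  // Reads the fields from data_in and returns the address just past them.
  const std::byte* deserialize(const std::byte* data_in) {
    std::memcpy(&x, data_in, sizeof(x));
    data_in += sizeof(x);
    std::memcpy(&y, data_in, sizeof(y));
    data_in += sizeof(y);
    return data_in;
  }
};
```

A caller would size the buffer with serializeSize() and then chain the calls, passing each returned address to the next object.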
Friends
-
friend std::ostream &operator<<(std::ostream &os, InferenceRequestInput const &self)¶
Provides an implementation to print the class with std::cout to an ostream.
-
class InferenceRequestOutput¶
Holds an inference request’s output data.
Public Functions
-
InferenceRequestOutput()¶
Constructs a new InferenceRequestOutput object.
-
inline void setData(void *buffer)¶
Sets the request’s data.
-
inline void *getData()¶
Takes the request’s data.
-
inline std::string getName() const¶
Gets the output tensor’s name.
-
void setName(const std::string &name)¶
Set the output tensor’s name.
-
inline void setParameters(ParameterMap parameters)¶
Sets the output tensor’s parameters.
- Parameters
parameters – parameters to assign
-
inline const ParameterMap &getParameters() const &¶
Gets the output tensor’s parameters.
-
inline ParameterMap getParameters() &&¶
Gets the output tensor’s parameters.
-
class InferenceResponse¶
Creates an inference response object based on KServe’s V2 spec that is used to respond back to clients.
Public Functions
-
InferenceResponse()¶
Constructs a new InferenceResponse object.
-
explicit InferenceResponse(const std::string &error)¶
Constructs a new InferenceResponse error object.
-
std::vector<InferenceResponseOutput> getOutputs() const¶
Gets a vector of the requested output information.
-
void addOutput(const InferenceResponseOutput &output)¶
Adds an output tensor to the response.
- Parameters
output – an output tensor
-
inline std::string getID() const¶
Gets the ID of the response.
-
void setID(const std::string &id)¶
Sets the ID of the response.
-
void setModel(const std::string &model)¶
Sets the model name of the response.
-
std::string getModel()¶
Gets the model name of the response.
-
bool isError() const¶
Checks if this is an error response.
-
std::string getError() const¶
Gets the error message if it exists. Defaults to an empty string.
-
inline ParameterMap *getParameters()¶
Gets a pointer to the parameters associated with this response.
Friends
-
friend std::ostream &operator<<(std::ostream &os, InferenceResponse const &self)¶
Provides an implementation to print the class with std::cout to an ostream.
-
class InferenceRequest¶
Creates an inference request object based on KServe’s V2 spec that is used to communicate between workers.
Public Functions
-
void setCallback(Callback &&callback)¶
Sets the request’s callback function used by the last worker to respond back to the client.
- Parameters
callback – a function pointer that accepts an InferenceResponse object
-
Callback getCallback()¶
Get the request’s callback function used by the last worker to respond back to the client.
-
void runCallback(const InferenceResponse &response)¶
Runs the request’s callback function.
- Parameters
response – the response data
-
void runCallbackOnce(const InferenceResponse &response)¶
Runs the request’s callback function and clears it afterwards. This prevents the callback from being called multiple times. If this function is called again, it is a no-op.
- Parameters
response – the response data
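The run-once behavior can be sketched with std::function, which can be tested for emptiness and reset. CallbackHolder is a hypothetical class, and a std::string stands in for the response type; this is not the library's implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical holder showing the difference between running a callback
// repeatedly and running it at most once.
class CallbackHolder {
 public:
  using Callback = std::function<void(const std::string&)>;

  void setCallback(Callback&& callback) { callback_ = std::move(callback); }

  // Runs the callback and keeps it for later calls.
  void runCallback(const std::string& response) {
    if (callback_) {
      callback_(response);
    }
  }

  // Runs the callback and clears it, so further calls are no-ops.
  void runCallbackOnce(const std::string& response) {
    if (!callback_) {
      return;
    }
    Callback callback = std::move(callback_);
    callback_ = nullptr;  // clear before invoking, in case of re-entry
    callback(response);
  }

 private:
  Callback callback_;
};
```

Clearing the stored callback before invoking it means that a re-entrant or repeated call cannot trigger a second response.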
-
void runCallbackError(std::string_view error_msg)¶
Runs the request’s callback function with an error response. The callback function is not cleared.
- Parameters
error_msg – error message to send back to the client
-
void addInputTensor(void *data, const std::vector<int64_t> &shape, DataType data_type, const std::string &name = "")¶
Constructs and adds a new input tensor to this request.
- Parameters
data – pointer to data to add
shape – shape of the data
data_type – the datatype of the data
name – the name of the input tensor
-
void addInputTensor(InferenceRequestInput input)¶
Adds a new input tensor to this request.
- Parameters
input – an existing InferenceRequestInput object
-
void setInputTensorData(size_t index, void *data)¶
Set the data pointer for an input tensor, if it exists.
- Parameters
index – index for the input tensor
data – pointer to assign to its data member
-
void addOutputTensor(const InferenceRequestOutput &output)¶
Adds a new output tensor to this request.
- Parameters
output – an existing InferenceRequestOutput object
-
const std::vector<InferenceRequestInput> &getInputs() const¶
Gets a vector of all the input request objects.
-
size_t getInputSize() const¶
Get the number of input request objects.
-
const std::vector<InferenceRequestOutput> &getOutputs() const¶
Gets a vector of the requested output information.
-
inline const std::string &getID() const¶
Gets the ID associated with this request.
- Returns
std::string
-
inline void setID(std::string_view id)¶
Sets the ID associated with this request.
- Parameters
id – ID to set
-
inline const ParameterMap &getParameters() const &¶
Get the request’s parameters.
-
inline ParameterMap getParameters() &&¶
Get the request’s parameters.
-
inline void setParameters(ParameterMap parameters)¶
Sets the parameters for the request.
- Parameters
parameters – the parameters to set
-
class ModelMetadata¶
This class holds the metadata associated with a model (per the KServe spec). This allows clients to query this information from the server.
Public Functions
-
ModelMetadata(const std::string &name, const std::string &platform)¶
Constructs a new Model Metadata object.
- Parameters
name – Name of the model
platform – the platform this model runs on
-
void addInputTensor(const std::string &name, std::initializer_list<int64_t> shape, DataType datatype)¶
Adds an input tensor to this model.
- Parameters
name – name of the tensor
shape – shape of the tensor
datatype – datatype of the tensor
-
void addInputTensor(const std::string &name, std::vector<int> shape, DataType datatype)¶
Adds an input tensor to this model.
- Parameters
name – name of the tensor
shape – shape of the tensor
datatype – datatype of the tensor
-
void addInputTensor(const Tensor &tensor)¶
Adds an input tensor to this model.
- Parameters
tensor –
-
const std::vector<ModelMetadataTensor> &getInputs() const¶
Gets the input tensors’ metadata for this model.
- Returns
const std::vector<ModelMetadataTensor>&
-
void addOutputTensor(const std::string &name, std::initializer_list<int64_t> shape, DataType datatype)¶
Adds an output tensor to this model.
- Parameters
name – name of the tensor
shape – shape of the tensor
datatype – datatype of the tensor
-
void addOutputTensor(const std::string &name, std::vector<int> shape, DataType datatype)¶
Adds an output tensor to this model.
- Parameters
name – name of the tensor
shape – shape of the tensor
datatype – datatype of the tensor
-
void addOutputTensor(const Tensor &tensor)¶
Adds an output tensor to this model.
- Parameters
tensor –
-
const std::vector<ModelMetadataTensor> &getOutputs() const¶
Gets the output tensors’ metadata for this model.
-
void setName(const std::string &name)¶
Sets the model’s name.
-
const std::string &getName() const¶
Gets the model’s name.
-
void setReady(bool ready)¶
Marks this model as ready/not ready.
-
bool isReady() const¶
Checks if this model is ready.
Servers
-
class Server¶
Public Functions
-
~Server()¶
Destructor.
-
void startHttp(uint16_t port) const¶
Start the HTTP server.
- Parameters
port – port to use for the HTTP server
-
void stopHttp() const¶
Stop the HTTP server.
-
void startGrpc(uint16_t port) const¶
Start the gRPC server.
- Parameters
port – port to use for the gRPC server
-
void stopGrpc() const¶
Stop the gRPC server.
-
void setModelRepository(const std::filesystem::path &repository_path, bool load_existing)¶
Set the path to the model repository associated with this server.
- Parameters
repository_path – path to the model repository
load_existing – load all existing models found at the path
-
void enableRepositoryMonitoring(bool use_polling)¶
Turn on active monitoring of the model repository path for new files. A model repository must be set with setModelRepository() before calling this method.
- Parameters
use_polling – set to true to use polling to check the directory for new files, false to use events. Note that events may not work well on all platforms.
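To illustrate what polling a directory for new files involves, here is a minimal sketch based on std::filesystem. DirectoryPoller is a hypothetical class; how the server actually monitors the repository is not described in this reference.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <set>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical poller: remembers which entries it has seen and reports the
// ones that appeared since the previous call to poll().
class DirectoryPoller {
 public:
  explicit DirectoryPoller(fs::path root) : root_(std::move(root)) {
    poll();  // record the entries that already exist
  }

  std::vector<fs::path> poll() {
    std::vector<fs::path> added;
    for (const auto& entry : fs::directory_iterator(root_)) {
      if (seen_.insert(entry.path()).second) {
        added.push_back(entry.path());
      }
    }
    return added;
  }

 private:
  fs::path root_;
  std::set<fs::path> seen_;
};
```

A monitoring loop would call poll() at a fixed interval and load whatever it returns. Event-based monitoring avoids the repeated scans but relies on platform-specific notification support.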
-
struct ServerImpl¶