Python¶
The Python library for the AMD Inference Server allows you to communicate with the server using Python.
Install the Python library¶
The Python library is built and installed in the development container as part of the regular CMake build.
To install it outside Docker or in different containers, you can use pip:
$ pip install amdinfer
Tip
Make sure the client library version is compatible with the server by using matching versions.
If you are using the latest server from main, you may need to install the Python library with pip install --pre amdinfer to install a pre-release package, if it exists.
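Once installed, you can do a quick sanity check against a running server. The following is a minimal sketch, assuming a server is already running and reachable at the placeholder address and port used below:

import amdinfer

# placeholder address and port: adjust to wherever your server is running
client = amdinfer.HttpClient("http://127.0.0.1:8998")

# block until the server is live and ready to accept requests
amdinfer.waitUntilServerReady(client)

metadata = client.serverMetadata()
print(metadata.name, metadata.version)
print(client.modelList())  # names of models currently active on the server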
Build wheels¶
You can build wheels for the Python library to make a precompiled package that can be installed in any Linux host, container or environment. It is recommended to perform the following steps on a fresh clone of the inference server repository. These instructions assume you’re only building wheels for x86_64 Linux with CPython.
# generate a Dockerfile that defines an image for building wheels
./docker/generate.py --cibuildwheel --base-image=quay.io/pypa/manylinux2014_x86_64 --base-image-type yum
# build the image. You should add some suffix to differentiate this image
# from the regular image
./amdinfer dockerize --suffix="-ci"
# this will build an image with the name $(whoami)/amdinfer-dev-ci:latest
# if you're not building wheels on the same host, you will need to upload
# this image to a Docker registry
# on a host where your image exists or can be pulled
export CIBW_MANYLINUX_X86_64_IMAGE=$(whoami)/amdinfer-dev-ci:latest
pip install cibuildwheel
# you can edit pyproject.toml to control which wheels to build or just use the defaults
cibuildwheel --platform linux
# your built wheels will be in ./wheelhouse
After following these instructions, your built wheels will be in ./wheelhouse/.
The names on the wheels indicate the Python version they are compatible with. For example, cp37 in the name indicates that it's compatible with CPython 3.7.
You can install these wheels in a virtual environment, Conda environment, a container or on a bare host.
pip install <path/to/wheel>
API¶
- exception amdinfer.BadStatus¶
- class amdinfer.Client¶
- __init__(*args, **kwargs)¶
- modelInfer(self: amdinfer.Client, model: str, request: amdinfer.InferenceRequest, version: str = '') amdinfer.InferenceResponse ¶
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter
model
: name of the model/worker to request inference to
- Parameter
request
: the request
- Parameter
version
: version of the model to request inference to
- Returns
InferenceResponse
- modelLoad(self: amdinfer.Client, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0), version: str = '') -> None ¶
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the server's model repository directory, containing the model and its metadata in the right format.
- Parameter
model
: name of the model to load from the model repository directory
- Parameter
parameters
: load-time parameters for the worker supporting the model
- Parameter
version
: model version to load from the model repository directory
- modelMetadata(self: amdinfer.Client, model: str, version: str = '') amdinfer.ModelMetadata ¶
Returns the metadata associated with a ready model/worker
- Parameter
model
: name of the model/worker to get metadata
- Parameter
version
: version of the model to get metadata
- Returns
ModelMetadata
- modelReady(self: amdinfer.Client, model: str, version: str = '') bool ¶
Checks if a model/worker is ready
- Parameter
model
: name of the model to check
- Parameter
version
: version of the model to check
- Returns
bool - true if model is ready, false otherwise
- modelUnload(self: amdinfer.Client, model: str, version: str = '') None ¶
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter
model
: name of the model to unload
- Parameter
version
: version of the model to unload
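The Client methods above are shared by all concrete clients (GrpcClient, HttpClient, NativeClient and WebSocketClient). A sketch of a typical load-infer-unload cycle, assuming a model named "mnist" already exists in the server's model repository; the address, model name and input shape here are only illustrative:

import numpy as np
import amdinfer

client = amdinfer.HttpClient("http://127.0.0.1:8998")  # placeholder address

# load the model from the server's model repository
client.modelLoad("mnist")                               # placeholder model name
amdinfer.waitUntilModelReady(client, "mnist")

# build a request from an in-memory image (the shape is illustrative)
image = np.zeros((28, 28, 1), dtype=np.float32)
request = amdinfer.ImageInferenceRequest(image)

response = client.modelInfer("mnist", request)
assert not response.isError(), response.getError()

client.modelUnload("mnist")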
- exception amdinfer.ConnectionError¶
- class amdinfer.DataType¶
- BOOL = DataType(BOOL) ¶
- BYTES = DataType(BYTES) ¶
- FLOAT16 = DataType(FP16) ¶
- FLOAT32 = DataType(FP32) ¶
- FLOAT64 = DataType(FP64) ¶
- FP16 = DataType(FP16) ¶
- FP32 = DataType(FP32) ¶
- FP64 = DataType(FP64) ¶
- INT16 = DataType(INT16) ¶
- INT32 = DataType(INT32) ¶
- INT64 = DataType(INT64) ¶
- INT8 = DataType(INT8) ¶
- UINT16 = DataType(UINT16) ¶
- UINT32 = DataType(UINT32) ¶
- UINT64 = DataType(UINT64) ¶
- UINT8 = DataType(UINT8) ¶
- class Value¶
Members:
BOOL
UINT8
UINT16
UINT32
UINT64
INT8
INT16
INT32
INT64
FP16
FP32
FLOAT32
FP64
FLOAT64
BYTES
- BOOL = <Value.BOOL: 0>¶
- BYTES = <Value.BYTES: 12>¶
- FLOAT32 = <Value.FP32: 10>¶
- FLOAT64 = <Value.FP64: 11>¶
- FP16 = <Value.FP16: 9>¶
- FP32 = <Value.FP32: 10>¶
- FP64 = <Value.FP64: 11>¶
- INT16 = <Value.INT16: 6>¶
- INT32 = <Value.INT32: 7>¶
- INT64 = <Value.INT64: 8>¶
- INT8 = <Value.INT8: 5>¶
- UINT16 = <Value.UINT16: 2>¶
- UINT32 = <Value.UINT32: 3>¶
- UINT64 = <Value.UINT64: 4>¶
- UINT8 = <Value.UINT8: 1>¶
- __init__(self: amdinfer.DataType.Value, value: int) None ¶
- property name¶
- property value¶
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(self: amdinfer._amdinfer.DataType) -> None
__init__(self: amdinfer._amdinfer.DataType) -> None
__init__(self: amdinfer._amdinfer.DataType, arg0: str) -> None
__init__(self: amdinfer._amdinfer.DataType, arg0: amdinfer._amdinfer.DataType.Value) -> None
- size(self: amdinfer.DataType) int ¶
- str(self: amdinfer.DataType) str ¶
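A DataType can be taken from the predefined class attributes above or constructed explicitly, and reports its name and element size. A small sketch; the string form of the constructor is assumed to accept the same names as the attributes:

import amdinfer

dt = amdinfer.DataType.FP32    # predefined datatype attribute
print(dt.str(), dt.size())     # datatype name and element size

dt2 = amdinfer.DataType("INT8")  # assumed: construct from a datatype name string
print(dt2.size())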
- exception amdinfer.EnvironmentNotSetError¶
- exception amdinfer.ExternalError¶
- exception amdinfer.FileNotFoundError¶
- exception amdinfer.FileReadError¶
- class amdinfer.GrpcClient¶
- __init__(self: amdinfer.GrpcClient, address: str) None ¶
Constructs a new GrpcClient object
- Parameter
address
: Address of the server to connect to
- hasHardware(self: amdinfer.GrpcClient, name: str, num: int) bool ¶
Checks if the server has the requested number of a specific hardware device
- Parameter
name
: name of the hardware device to check
- Parameter
num
: number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
- modelList(self: amdinfer.GrpcClient) List[str] ¶
Gets a list of active models on the server, returning their names
- Returns
std::vector<std::string>
- serverLive(self: amdinfer.GrpcClient) bool ¶
Checks if the server is live
- Returns
bool - true if server is live, false otherwise
- serverMetadata(self: amdinfer.GrpcClient) amdinfer.ServerMetadata ¶
Returns the server metadata as a ServerMetadata object
- Returns
ServerMetadata
- serverReady(self: amdinfer.GrpcClient) bool ¶
Checks if the server is ready
- Returns
bool - true if server is ready, false otherwise
- workerLoad(self: amdinfer.GrpcClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str ¶
Loads a worker with the given name and load-time parameters.
- Parameter
worker
: name of the worker to load
- Parameter
parameters
: load-time parameters for the worker
- Returns
std::string
- workerUnload(self: amdinfer.GrpcClient, model: str) None ¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter
worker
: name of the worker to unload
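A minimal sketch of using the gRPC client, assuming a server with the gRPC endpoint enabled; the address, hardware name and worker name are placeholders:

import amdinfer

client = amdinfer.GrpcClient("127.0.0.1:50051")  # placeholder host:port

if client.serverLive() and client.serverReady():
    # check for a hardware device before loading a worker that needs it;
    # the device name here is purely illustrative
    if client.hasHardware("cpu", 1):
        endpoint = client.workerLoad("echo", amdinfer.ParameterMap())
        print("loaded worker endpoint:", endpoint)
        client.workerUnload(endpoint)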
- class amdinfer.HttpClient¶
- __init__(self: amdinfer.HttpClient, address: str, headers: Dict[str, str] = {}, parallelism: int = 32) None ¶
Construct a new HttpClient object
- Parameter
address
: Address of the server to connect to
- hasHardware(self: amdinfer.HttpClient, name: str, num: int) bool ¶
Checks if the server has the requested number of a specific hardware device
- Parameter
name
: name of the hardware device to check
- Parameter
num
: number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
- modelList(self: amdinfer.HttpClient) List[str] ¶
Gets a list of active models on the server, returning their names
- Returns
std::vector<std::string>
- serverLive(self: amdinfer.HttpClient) bool ¶
Checks if the server is live
- Returns
bool - true if server is live, false otherwise
- serverMetadata(self: amdinfer.HttpClient) amdinfer.ServerMetadata ¶
Returns the server metadata as a ServerMetadata object
- Returns
ServerMetadata
- serverReady(self: amdinfer.HttpClient) bool ¶
Checks if the server is ready
- Returns
bool - true if server is ready, false otherwise
- workerLoad(self: amdinfer.HttpClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str ¶
Loads a worker with the given name and load-time parameters.
- Parameter
worker
: name of the worker to load
- Parameter
parameters
: load-time parameters for the worker
- Returns
std::string
- workerUnload(self: amdinfer.HttpClient, model: str) None ¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter
worker
: name of the worker to unload
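The HTTP client also accepts optional custom headers and a parallelism setting alongside the address. A sketch, with placeholder values:

import amdinfer

# headers are sent with the HTTP requests; the values here are placeholders
headers = {"Authorization": "Bearer <token>"}
client = amdinfer.HttpClient("http://127.0.0.1:8998", headers, 8)

print(client.serverLive())
print(client.serverMetadata().extensions)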
- amdinfer.ImageInferenceRequest(images, modelMetaData=None, asTensor=True, shape=None)¶
Construct a request from an image or list of images
- Parameters
images (image) – Images may be numpy arrays or filepaths or a list of these
modelMetaData (ModelMetadata) – Pass the model metadata. Defaults to None.
asTensor (bool, optional) – Send data as a tensor or as base64-encoded string. Defaults to True.
shape (list, optional) – Specify the shape explicitly if needed. Defaults to None.
- Raises
TypeError – Raised if an unknown image format is passed
- Returns
Request object
- Return type
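A sketch of building requests from an in-memory image and from image files; the array shape and file paths are placeholders:

import numpy as np
import amdinfer

# from an in-memory image: the data is sent as a tensor by default
image = np.random.rand(224, 224, 3).astype(np.float32)
request = amdinfer.ImageInferenceRequest(image)

# from files: pass a list of paths and send base64-encoded strings instead
request_b64 = amdinfer.ImageInferenceRequest(["./cat.jpg", "./dog.jpg"], asTensor=False)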
- class amdinfer.InferenceRequest¶
- __init__(self: amdinfer.InferenceRequest) None ¶
- addInputTensor(self: amdinfer.InferenceRequest, input: amdinfer.InferenceRequestInput) None ¶
Adds a new input tensor to this request
- Parameter
input
: an existing InferenceRequestInput object
- addOutputTensor(self: amdinfer.InferenceRequest, output: amdinfer.InferenceRequestOutput) None ¶
Adds a new output tensor to this request
- Parameter
output
: an existing InferenceRequestOutput object
- getInputSize(self: amdinfer.InferenceRequest) int ¶
Get the number of input request objects
- getInputs(self: amdinfer.InferenceRequest) List[amdinfer.InferenceRequestInput] ¶
Gets a vector of all the input request objects
- getOutputs(self: amdinfer.InferenceRequest) List[amdinfer.InferenceRequestOutput] ¶
Gets a vector of the requested output information
- property id¶
- property parameters¶
- propagate(self: amdinfer.InferenceRequest) amdinfer.InferenceRequest ¶
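A sketch of assembling a request by hand from an InferenceRequestInput. It assumes the name, shape and datatype properties inherited from Tensor are writable and that the model expects a single FP32 tensor; the tensor name and shape are illustrative:

import numpy as np
import amdinfer

data = np.arange(6, dtype=np.float32)

input_tensor = amdinfer.InferenceRequestInput()
input_tensor.name = "input0"                   # illustrative tensor name
input_tensor.datatype = amdinfer.DataType.FP32
input_tensor.shape = [2, 3]
input_tensor.setFp32Data(data)

request = amdinfer.InferenceRequest()
request.addInputTensor(input_tensor)
print(request.getInputSize())  # 1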
- class amdinfer.InferenceRequestInput¶
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(self: amdinfer._amdinfer.InferenceRequestInput) -> None
Constructs a new InferenceRequestInput object
__init__(self: amdinfer._amdinfer.InferenceRequestInput, tensor: amdinfer._amdinfer.Tensor) -> None
Constructs a new InferenceRequestInput object
__init__(self: amdinfer._amdinfer.InferenceRequestInput, data: capsule, shape: List[int], data_type: amdinfer._amdinfer.DataType, name: str = '') -> None
Construct a new InferenceRequestInput object
- Parameter
data
: pointer to data
- Parameter
shape
: shape of the data
- Parameter
data_type
: type of the data
- Parameter
name
: name to assign
- getFp16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[float16] ¶
- getFp32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.float32] ¶
- getFp64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.float64] ¶
- getInt16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int16] ¶
- getInt32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int32] ¶
- getInt64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int64] ¶
- getInt8Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int8] ¶
- getStringData(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int8] ¶
- getUint16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint16] ¶
- getUint32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint32] ¶
- getUint64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint64] ¶
- getUint8Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint8] ¶
- setFp16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[float16]) None ¶
- setFp32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.float32]) None ¶
- setFp64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.float64]) None ¶
- setInt16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int16]) None ¶
- setInt32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int32]) None ¶
- setInt64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int64]) None ¶
- setInt8Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int8]) None ¶
- setStringData(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint8]) None ¶
- setUint16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint16]) None ¶
- setUint32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint32]) None ¶
- setUint64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint64]) None ¶
- setUint8Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint8]) None ¶
- class amdinfer.InferenceRequestOutput¶
- __init__(self: amdinfer.InferenceRequestOutput) None ¶
Holds an inference request’s output data
- property data¶
- property name¶
- property parameters¶
- class amdinfer.InferenceResponse¶
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(self: amdinfer._amdinfer.InferenceResponse) -> None
Constructs a new InferenceResponse object
__init__(self: amdinfer._amdinfer.InferenceResponse, arg0: str) -> None
Constructs a new InferenceResponse error object
- addOutput(self: amdinfer.InferenceResponse, output: amdinfer.InferenceResponseOutput) None ¶
Adds an output tensor to the response
- Parameter
output
: an output tensor
- getError(self: amdinfer.InferenceResponse) str ¶
Gets the error message if it exists. Defaults to an empty string
- getOutputs(self: amdinfer.InferenceResponse) List[amdinfer.InferenceResponseOutput] ¶
Gets a vector of the requested output information
- getParameters(self: amdinfer.InferenceResponse) amdinfer.ParameterMap ¶
Gets a pointer to the parameters associated with this response
- property id¶
- isError(self: amdinfer.InferenceResponse) bool ¶
Checks if this is an error response
- property model¶
- class amdinfer.InferenceResponseOutput¶
- __init__(self: amdinfer.InferenceResponseOutput) None ¶
Holds an inference response’s output data
- property datatype¶
- getFp16Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[float16] ¶
- getFp32Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.float32] ¶
- getFp64Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.float64] ¶
- getInt16Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int16] ¶
- getInt32Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int32] ¶
- getInt64Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int64] ¶
- getInt8Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int8] ¶
- getSize(self: amdinfer.InferenceResponseOutput) int ¶
Get the tensor’s size (number of elements)
- getStringData(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int8] ¶
- getUint16Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.uint16] ¶
- getUint32Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.uint32] ¶
- getUint64Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.uint64] ¶
- getUint8Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.uint8] ¶
- property name¶
- property parameters¶
- setFp16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[float16]) None ¶
- setFp32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.float32]) None ¶
- setFp64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.float64]) None ¶
- setInt16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int16]) None ¶
- setInt32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int32]) None ¶
- setInt64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int64]) None ¶
- setInt8Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int8]) None ¶
- setStringData(self: amdinfer.InferenceResponseOutput, arg0: str) None ¶
- setUint16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint16]) None ¶
- setUint32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint32]) None ¶
- setUint64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint64]) None ¶
- setUint8Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint8]) None ¶
- property shape¶
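A sketch of inspecting a response returned by modelInfer. The address, image path and model name are placeholders, and the FP32 data getter assumes the model produces FP32 outputs:

import amdinfer

client = amdinfer.HttpClient("http://127.0.0.1:8998")  # placeholder address
request = amdinfer.ImageInferenceRequest("./cat.jpg")   # placeholder image path
response = client.modelInfer("resnet50", request)       # placeholder model name

if response.isError():
    print("inference failed:", response.getError())
else:
    for output in response.getOutputs():
        # the data getter must match the output's datatype; FP32 is assumed here
        print(output.name, output.shape, output.getSize())
        print(output.getFp32Data()[:5])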
- class amdinfer.InferenceTensor¶
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(self: amdinfer._amdinfer.InferenceTensor, name: str, shape: List[int], dataType: amdinfer._amdinfer.DataType) -> None
Construct a new InferenceTensor object
__init__(self: amdinfer._amdinfer.InferenceTensor, tensor: amdinfer._amdinfer.Tensor) -> None
Construct a new InferenceTensor object
- property parameters¶
- exception amdinfer.InvalidArgumentError¶
- class amdinfer.ModelMetadata¶
- __init__(self: amdinfer.ModelMetadata, arg0: str, arg1: str) None ¶
Constructs a new Model Metadata object
- Parameter
name
: Name of the model
- Parameter
platform
: the platform this model runs on
- addInputTensor(self: amdinfer.ModelMetadata, arg0: str, arg1: List[int], arg2: amdinfer.DataType) None ¶
Adds an input tensor to this model
- Parameter
name
: name of the tensor
- Parameter
shape
: shape of the tensor
- Parameter
datatype
: datatype of the tensor
- addOutputTensor(self: amdinfer.ModelMetadata, name: str, shape: List[int], datatype: amdinfer.DataType) None ¶
Adds an output tensor to this model
- Parameter
name
: name of the tensor
- Parameter
shape
: shape of the tensor
- Parameter
datatype
: datatype of the tensor
- getInputs(self: amdinfer.ModelMetadata) List[amdinfer.Tensor] ¶
Gets the input tensors’ metadata for this model
- Returns
const std::vector<ModelMetadataTensor>&
- getPlatform(self: amdinfer.ModelMetadata) str ¶
- isReady(self: amdinfer.ModelMetadata) bool ¶
Checks if this model is ready
- property name¶
- setReady(self: amdinfer.ModelMetadata, arg0: bool) None ¶
Marks this model as ready/not ready
- class amdinfer.NativeClient¶
- __init__(self: amdinfer.NativeClient, server: amdinfer.Server) None ¶
- hasHardware(self: amdinfer.NativeClient, name: str, num: int) bool ¶
Checks if the server has the requested number of a specific hardware device
- Parameter
name
: name of the hardware device to check
- Parameter
num
: number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
- modelList(self: amdinfer.NativeClient) List[str] ¶
Gets a list of active models on the server, returning their names
- Returns
std::vector<std::string>
- serverLive(self: amdinfer.NativeClient) bool ¶
Checks if the server is live
- Returns
bool - true if server is live, false otherwise
- serverMetadata(self: amdinfer.NativeClient) amdinfer.ServerMetadata ¶
Returns the server metadata as a ServerMetadata object
- Returns
ServerMetadata
- serverReady(self: amdinfer.NativeClient) bool ¶
Checks if the server is ready
- Returns
bool - true if server is ready, false otherwise
- workerLoad(self: amdinfer.NativeClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str ¶
Loads a worker with the given name and load-time parameters.
- Parameter
worker
: name of the worker to load
- Parameter
parameters
: load-time parameters for the worker
- Returns
std::string
- workerUnload(self: amdinfer.NativeClient, model: str) None ¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter
worker
: name of the worker to unload
- class amdinfer.ParameterMap¶
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(self: amdinfer._amdinfer.ParameterMap) -> None
__init__(self: amdinfer._amdinfer.ParameterMap, keys: List[str], values: List[Union[bool, int, float, str]]) -> None
Construct a new ParameterMap object with initial values. The sizes of the keys and values vectors must match.
Until C++20, passing const char* to this constructor will convert it to a bool instead of a string. Explicitly convert any string literals to a string before passing them to this constructor.
- Parameter
keys
: keys of the initial parameters
- Parameter
values
: values of the initial parameters
- empty(self: amdinfer.ParameterMap) bool ¶
Checks if the parameters are empty
- erase(self: amdinfer.ParameterMap, arg0: str) None ¶
Removes a parameter, if it exists. No error is raised if it doesn’t exist
- Parameter
key
: name of the parameter to remove
- getBool(self: amdinfer.ParameterMap, arg0: str) bool ¶
Get the named parameter
- Template parameter
T
: type of parameter. Must be (bool|double|int32_t|std::string)
- Parameter
key
: parameter to get
- Returns
T
- getFloat(self: amdinfer.ParameterMap, arg0: str) float ¶
Get the named parameter
- Template parameter
T
: type of parameter. Must be (bool|double|int32_t|std::string)
- Parameter
key
: parameter to get
- Returns
T
- getInt(self: amdinfer.ParameterMap, arg0: str) int ¶
Get the named parameter
- Template parameter
T
: type of parameter. Must be (bool|double|int32_t|std::string)
- Parameter
key
: parameter to get
- Returns
T
- getString(self: amdinfer.ParameterMap, arg0: str) str ¶
Get the named parameter
- Template parameter
T
: type of parameter. Must be (bool|double|int32_t|std::string)
- Parameter
key
: parameter to get
- Returns
T
- has(self: amdinfer.ParameterMap, key: str) bool ¶
Checks if a particular parameter exists
- Parameter
key
: name of the parameter to check
- Returns
bool
- put(*args, **kwargs)¶
Overloaded function.
put(self: amdinfer._amdinfer.ParameterMap, arg0: str, arg1: str) -> None
Put in a key-value pair
- Parameter
key
: key used to store and retrieve the value
- Parameter
value
: value to store
put(self: amdinfer._amdinfer.ParameterMap, arg0: str, arg1: Union[bool, int, float, str]) -> None
Put in a key-value pair
- Parameter
key
: key used to store and retrieve the value
- Parameter
value
: value to store
- size(self: amdinfer.ParameterMap) int ¶
Gets the number of parameters
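A sketch of typical ParameterMap usage; the key names and values are illustrative and depend on the worker being loaded:

import amdinfer

parameters = amdinfer.ParameterMap()
parameters.put("batch_size", 4)              # stored as an integer
parameters.put("share", False)               # stored as a bool
parameters.put("model", "/tmp/model.onnx")   # stored as a string; the path is a placeholder

print(parameters.size())                     # 3
if parameters.has("batch_size"):
    print(parameters.getInt("batch_size"))
parameters.erase("share")
print(parameters.empty())                    # False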
- exception amdinfer.RuntimeError¶
- class amdinfer.Server¶
- __init__(self: amdinfer.Server) None ¶
Constructs a new Server object
- enableRepositoryMonitoring(self: amdinfer.Server, use_polling: bool) None ¶
Turn on active monitoring of the model repository path for new files. A model repository must be set with setModelRepository() before calling this method.
- Parameter
use_polling
: set to true to use polling to check the directory for new files, false to use events. Note that events may not work well on all platforms.
- setModelRepository(self: amdinfer.Server, repository_path: os.PathLike, load_existing: bool) None ¶
Set the path to the model repository associated with this server
- Parameter
path
: path to the model repository
- Parameter
load_existing
: load all existing models found at the path
- startGrpc(self: amdinfer.Server, port: int) None ¶
Start the gRPC server
- Parameter
port
: port to use for the gRPC server
- startHttp(self: amdinfer.Server, port: int) None ¶
Start the HTTP server
- Parameter
port
: port to use for the HTTP server
- stopGrpc(self: amdinfer.Server) None ¶
Stop the gRPC server
- stopHttp(self: amdinfer.Server) None ¶
Stop the HTTP server
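A sketch of running a server in-process and talking to it with a NativeClient; the repository path and port are placeholders:

import amdinfer

server = amdinfer.Server()
server.setModelRepository("/workspace/model_repository", True)  # load existing models
server.enableRepositoryMonitoring(True)  # poll the repository for new models

# also expose the server over HTTP while using it natively in this process
server.startHttp(8998)

client = amdinfer.NativeClient(server)
amdinfer.waitUntilServerReady(client)
print(client.modelList())

server.stopHttp()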
- class amdinfer.ServerMetadata¶
- __init__(self: amdinfer.ServerMetadata) None ¶
- property extensions¶
The extensions supported by the server. The KServe specification allows servers to support custom extensions and return them with a metadata request.
- property name¶
Name of the server
- property version¶
Version of the server
- class amdinfer.Tensor¶
- __init__(self: amdinfer.Tensor, name: str, shape: List[int], dataType: amdinfer.DataType) None ¶
Describe a tensor with a name, shape and datatype
- property datatype¶
- getSize(self: amdinfer.Tensor) int ¶
Get the tensor’s size (number of elements)
- property name¶
- property shape¶
- class amdinfer.WebSocketClient¶
- __init__(self: amdinfer.WebSocketClient, ws_address: str, http_address: str) None ¶
Constructs a new WebSocketClient object
- Parameter
ws_address
: address of the websocket server to connect to
- Parameter
http_address
: address of the HTTP server to connect to
- close(self: amdinfer.WebSocketClient) None ¶
Closes the websocket connection
- hasHardware(self: amdinfer.WebSocketClient, name: str, num: int) bool ¶
Checks if the server has the requested number of a specific hardware device
- Parameter
name
: name of the hardware device to check
- Parameter
num
: number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
- modelInferWs(self: amdinfer.WebSocketClient, model: str, request: amdinfer.InferenceRequest) None ¶
Makes a websocket inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. This method differs from the standard inference in that it submits an actual websocket message. The user should use modelRecv to get results and must disambiguate different responses on the client side using the IDs of the responses.
- Parameter
model
: name of the model/worker to request inference to
- Parameter
request
: the request
- modelList(self: amdinfer.WebSocketClient) List[str] ¶
Gets a list of active models on the server, returning their names
- Returns
std::vector<std::string>
- modelRecv(self: amdinfer.WebSocketClient) str ¶
Gets one message from the websocket server sent in response to a modelInferWs request. The user should know beforehand how many messages are expected and should call this method the same number of times.
- Returns
std::string a JSON object encoded as a string
- serverLive(self: amdinfer.WebSocketClient) bool ¶
Checks if the server is live
- Returns
bool - true if server is live, false otherwise
- serverMetadata(self: amdinfer.WebSocketClient) amdinfer.ServerMetadata ¶
Returns the server metadata as a ServerMetadata object
- Returns
ServerMetadata
- serverReady(self: amdinfer.WebSocketClient) bool ¶
Checks if the server is ready
- Returns
bool - true if server is ready, false otherwise
- workerLoad(self: amdinfer.WebSocketClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str ¶
Loads a worker with the given name and load-time parameters.
- Parameter
worker
: name of the worker to load
- Parameter
parameters
: load-time parameters for the worker
- Returns
std::string
- workerUnload(self: amdinfer.WebSocketClient, model: str) None ¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter
worker
: name of the worker to unload
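A sketch of the websocket flow, where responses are collected with modelRecv and matched up by ID on the client side; the addresses, image path and model name are placeholders:

import json
import amdinfer

client = amdinfer.WebSocketClient("ws://127.0.0.1:8998", "http://127.0.0.1:8998")

request = amdinfer.ImageInferenceRequest("./cat.jpg")  # placeholder image path
client.modelInferWs("resnet50", request)               # placeholder model name

# call modelRecv once per message sent; the reply is a JSON string
reply = json.loads(client.modelRecv())
print(reply)

client.close()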
- amdinfer.inferAsyncOrdered(client: amdinfer.Client, model: str, requests: List[amdinfer.InferenceRequest], version: str = '') List[amdinfer.InferenceResponse] ¶
- amdinfer.inferAsyncOrderedBatched(client: amdinfer.Client, model: str, requests: List[amdinfer.InferenceRequest], batch_sizes: int, version: str = '') List[amdinfer.InferenceResponse] ¶
- amdinfer.inference_request_to_dict(request: amdinfer.InferenceRequest)¶
- amdinfer.loadEnsemble(client: amdinfer.Client, models: List[str], parameters: List[amdinfer.ParameterMap]) List[str] ¶
- amdinfer.parallel_infer(client, model, data, processes)¶
Make an inference to the server in parallel with n processes
- Parameters
client (amdinfer.client) – Client to make the inference with
model (str) – Name of the model/worker to make the inference
data (list[np.ndarray]) – List of data to send
processes (int) – number of processes to use
- Returns
Responses for each request
- Return type
- amdinfer.serverHasExtension(client: amdinfer.Client, extension: str) bool ¶
- amdinfer.start_http_client_server(address: str, extension=None)¶
- amdinfer.stringToArray(data)¶
- amdinfer.unloadModels(client: amdinfer.Client, models: List[str], version: str = '') None ¶
- amdinfer.waitUntilModelNotReady(client: amdinfer.Client, model: str, version: str = '') None ¶
- amdinfer.waitUntilModelReady(client: amdinfer.Client, model: str, version: str = '') None ¶
- amdinfer.waitUntilServerReady(client: amdinfer.Client) None ¶
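A sketch combining several of these helpers: wait for the server, send a batch of requests whose responses come back in order, then unload the model. The address, model name and data are placeholders:

import numpy as np
import amdinfer

client = amdinfer.HttpClient("http://127.0.0.1:8998")  # placeholder address
amdinfer.waitUntilServerReady(client)

requests = [amdinfer.ImageInferenceRequest(np.zeros((28, 28, 1), dtype=np.float32))
            for _ in range(4)]

# send the requests asynchronously but collect the responses in order
responses = amdinfer.inferAsyncOrdered(client, "mnist", requests)  # placeholder model name
for response in responses:
    assert not response.isError()

amdinfer.unloadModels(client, ["mnist"])
amdinfer.waitUntilModelNotReady(client, "mnist")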