Python

The Python library for the AMD Inference Server lets you communicate with the server from Python.

Install the Python library

The Python library is built and installed in the development container as part of the regular CMake build. To install it outside Docker or in different containers, you can use pip:

$ pip install amdinfer

Tip

Make sure the client library version matches the version of the server you are talking to. If you are using the latest server built from main, you may need to install a pre-release package with pip install --pre amdinfer, if one exists.
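
Once installed, you can verify that the package imports and can reach a running server. This is a minimal sketch: the address and port below are assumptions and should be replaced with your server's actual HTTP address.

import amdinfer

# assumed address of a running server; adjust to your deployment
client = amdinfer.HttpClient("http://127.0.0.1:8998")
print(client.serverLive())  # True if the server is reachable and live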

Build wheels

You can build wheels for the Python library to produce a precompiled package that can be installed on any Linux host, container, or environment. It is recommended to perform the following steps on a fresh clone of the inference server repository. These instructions assume you are only building wheels for x86_64 Linux with CPython.

# generate a Dockerfile that defines an image for building wheels
./docker/generate.py --cibuildwheel --base-image=quay.io/pypa/manylinux2014_x86_64 --base-image-type yum

# build the image. You should add some suffix to differentiate this image
# from the regular image
./amdinfer dockerize --suffix="-ci"

# this will build an image with the name $(whoami)/amdinfer-dev-ci:latest
# if you're not building wheels on the same host, you will need to upload
# this image to a Docker registry

# on a host where your image exists or can be pulled
export CIBW_MANYLINUX_X86_64_IMAGE=$(whoami)/amdinfer-dev-ci:latest

pip install cibuildwheel

# you can edit pyproject.toml to control which wheels to build or just use the defaults

cibuildwheel --platform linux

# your built wheels will be in ./wheelhouse

After following these instructions, your built wheels will be in ./wheelhouse/. The names on the wheels indicate the Python version they are compatible with. For example, cp37 in the name indicates that it’s compatible with CPython 3.7. You can install these wheels in a virtual environment, Conda environment, a container or on a bare host.

pip install <path/to/wheel>

API

exception amdinfer.BadStatus
class amdinfer.Client
__init__(*args, **kwargs)
exception amdinfer.ConnectionError
class amdinfer.DataType
BOOL = DataType(BOOL)
FLOAT32 = DataType(FP32)
FLOAT64 = DataType(FP64)
FP16 = DataType(FP16)
FP32 = DataType(FP32)
FP64 = DataType(FP64)
INT16 = DataType(INT16)
INT32 = DataType(INT32)
INT64 = DataType(INT64)
INT8 = DataType(INT8)
STRING = DataType(STRING)
UINT16 = DataType(UINT16)
UINT32 = DataType(UINT32)
UINT64 = DataType(UINT64)
UINT8 = DataType(UINT8)
class Value

Members:

BOOL

UINT8

UINT16

UINT32

UINT64

INT8

INT16

INT32

INT64

FP16

FP32

FLOAT32

FP64

FLOAT64

STRING

BOOL = <Value.BOOL: 0>
FLOAT32 = <Value.FP32: 10>
FLOAT64 = <Value.FP64: 11>
FP16 = <Value.FP16: 9>
FP32 = <Value.FP32: 10>
FP64 = <Value.FP64: 11>
INT16 = <Value.INT16: 6>
INT32 = <Value.INT32: 7>
INT64 = <Value.INT64: 8>
INT8 = <Value.INT8: 5>
STRING = <Value.STRING: 12>
UINT16 = <Value.UINT16: 2>
UINT32 = <Value.UINT32: 3>
UINT64 = <Value.UINT64: 4>
UINT8 = <Value.UINT8: 1>
__init__(self: amdinfer.DataType.Value, value: int) None
property name
property value
__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: amdinfer._amdinfer.DataType) -> None

  2. __init__(self: amdinfer._amdinfer.DataType) -> None

  3. __init__(self: amdinfer._amdinfer.DataType, arg0: str) -> None

  4. __init__(self: amdinfer._amdinfer.DataType, arg0: amdinfer._amdinfer.DataType.Value) -> None

size(self: amdinfer.DataType) int
str(self: amdinfer.DataType) str
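
As an illustrative sketch, the predefined DataType constants can be inspected with str() and size(); the element size noted in the comment below is an assumption based on the usual meaning of FP32.

import amdinfer

dtype = amdinfer.DataType.FP32
print(dtype.str())   # string form of the datatype, e.g. "FP32"
print(dtype.size())  # size of one element; 4 bytes is assumed for FP32
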
exception amdinfer.EnvironmentNotSetError
exception amdinfer.ExternalError
exception amdinfer.FileNotFoundError
exception amdinfer.FileReadError
class amdinfer.GrpcClient
__init__(self: amdinfer.GrpcClient, address: str) None

Constructs a new GrpcClient object

Parameter address:

Address of the server to connect to
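
A minimal connection sketch, assuming a server with gRPC enabled is listening on the address below (the host and port are assumptions):

import amdinfer

client = amdinfer.GrpcClient("127.0.0.1:50051")  # assumed gRPC address
print(client.serverLive())                        # check the connection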

hasHardware(self: amdinfer.GrpcClient, name: str, num: int) bool

Checks if the server has the requested number of a specific hardware device

Parameter name:

name of the hardware device to check

Parameter num:

number of the device that should exist at minimum

Returns:

bool - true if server has at least the requested number of the hardware device, false otherwise

modelInfer(self: amdinfer.GrpcClient, model: str, request: amdinfer.InferenceRequest) amdinfer.InferenceResponse

Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.

Parameter model:

name of the model/worker to send the inference request to

Parameter request:

the request

Returns:

InferenceResponse
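
For example, a sketch of a single synchronous request, assuming a model named "mnist" has already been loaded on the server and accepts a 28x28 grayscale image:

import numpy as np
import amdinfer

client = amdinfer.GrpcClient("127.0.0.1:50051")   # assumed address
image = np.zeros((28, 28, 1), dtype=np.uint8)     # placeholder input data
request = amdinfer.ImageInferenceRequest(image)
response = client.modelInfer("mnist", request)    # "mnist" is a hypothetical endpoint
if not response.isError():
    for output in response.getOutputs():
        print(output.name, output.shape)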

modelList(self: amdinfer.GrpcClient) List[str]

Gets a list of active models on the server, returning their names

Returns:

std::vector<std::string>

modelLoad(self: amdinfer.GrpcClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> None

Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the server's model repository directory and contains the model and its metadata in the right format.

Parameter model:

name of the model to load from the model repository directory

Parameter parameters:

load-time parameters for the worker supporting the model
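
A sketch of loading a model from the repository, assuming a "resnet50" directory exists there and that "batch_size" is a valid load-time parameter for its worker:

import amdinfer

client = amdinfer.GrpcClient("127.0.0.1:50051")   # assumed address
parameters = amdinfer.ParameterMap()
parameters.put("batch_size", 4)                   # hypothetical load-time parameter
client.modelLoad("resnet50", parameters)          # "resnet50" must exist in the model repository
amdinfer.waitUntilModelReady(client, "resnet50")  # block until the model is ready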

modelMetadata(self: amdinfer.GrpcClient, model: str) amdinfer.ModelMetadata

Returns the metadata associated with a ready model/worker

Parameter model:

name of the model/worker to get metadata

Returns:

ModelMetadata

modelReady(self: amdinfer.GrpcClient, model: str) bool

Checks if a model/worker is ready

Parameter model:

name of the model to check

Returns:

bool - true if model is ready, false otherwise

modelUnload(self: amdinfer.GrpcClient, model: str) None

Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.

Parameter model:

name of the model to unload

serverLive(self: amdinfer.GrpcClient) bool

Checks if the server is live

Returns:

bool - true if server is live, false otherwise

serverMetadata(self: amdinfer.GrpcClient) amdinfer.ServerMetadata

Returns the server metadata as a ServerMetadata object

Returns:

ServerMetadata

serverReady(self: amdinfer.GrpcClient) bool

Checks if the server is ready

Returns:

bool - true if server is ready, false otherwise

workerLoad(self: amdinfer.GrpcClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str

Loads a worker with the given name and load-time parameters.

Parameter worker:

name of the worker to load

Parameter parameters:

load-time parameters for the worker

Returns:

std::string
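
A sketch of loading a worker directly. The returned string is assumed here to be the endpoint name to use for later calls, and "echo" is a hypothetical worker name:

import amdinfer

client = amdinfer.GrpcClient("127.0.0.1:50051")                # assumed address
endpoint = client.workerLoad("echo", amdinfer.ParameterMap())  # "echo" is a hypothetical worker
print(client.modelReady(endpoint))
client.workerUnload(endpoint)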

workerUnload(self: amdinfer.GrpcClient, model: str) None

Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.

Parameter worker:

name of the worker to unload

class amdinfer.HttpClient
__init__(self: amdinfer.HttpClient, address: str, headers: Dict[str, str] = {}, parallelism: int = 32) None

Construct a new HttpClient object

Parameter address:

Address of the server to connect to
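
A minimal sketch of constructing an HTTP client; the address, header, and parallelism value are assumptions for illustration:

import amdinfer

headers = {"Authorization": "Bearer <token>"}  # hypothetical custom header
client = amdinfer.HttpClient("http://127.0.0.1:8998", headers=headers, parallelism=8)
print(client.serverReady())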

hasHardware(self: amdinfer.HttpClient, name: str, num: int) bool

Checks if the server has the requested number of a specific hardware device

Parameter name:

name of the hardware device to check

Parameter num:

number of the device that should exist at minimum

Returns:

bool - true if server has at least the requested number of the hardware device, false otherwise

modelInfer(self: amdinfer.HttpClient, model: str, request: amdinfer.InferenceRequest) amdinfer.InferenceResponse

Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.

Parameter model:

name of the model/worker to send the inference request to

Parameter request:

the request

Returns:

InferenceResponse

modelList(self: amdinfer.HttpClient) List[str]

Gets a list of active models on the server, returning their names

Returns:

std::vector<std::string>

modelLoad(self: amdinfer.HttpClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> None

Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the server's model repository directory and contains the model and its metadata in the right format.

Parameter model:

name of the model to load from the model repository directory

Parameter parameters:

load-time parameters for the worker supporting the model

modelMetadata(self: amdinfer.HttpClient, model: str) amdinfer.ModelMetadata

Returns the metadata associated with a ready model/worker

Parameter model:

name of the model/worker to get metadata

Returns:

ModelMetadata

modelReady(self: amdinfer.HttpClient, model: str) bool

Checks if a model/worker is ready

Parameter model:

name of the model to check

Returns:

bool - true if model is ready, false otherwise

modelUnload(self: amdinfer.HttpClient, model: str) None

Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.

Parameter model:

name of the model to unload

serverLive(self: amdinfer.HttpClient) bool

Checks if the server is live

Returns:

bool - true if server is live, false otherwise

serverMetadata(self: amdinfer.HttpClient) amdinfer.ServerMetadata

Returns the server metadata as a ServerMetadata object

Returns:

ServerMetadata

serverReady(self: amdinfer.HttpClient) bool

Checks if the server is ready

Returns:

bool - true if server is ready, false otherwise

workerLoad(self: amdinfer.HttpClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str

Loads a worker with the given name and load-time parameters.

Parameter worker:

name of the worker to load

Parameter parameters:

load-time parameters for the worker

Returns:

std::string

workerUnload(self: amdinfer.HttpClient, model: str) None

Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.

Parameter worker:

name of the worker to unload

amdinfer.ImageInferenceRequest(images, asTensor=True)

Construct a request from an image or list of images

Parameters:
  • images (image) – Images may be numpy arrays or filepaths or a list of these

  • asTensor (bool, optional) – Send data as a tensor or as base64-encoded string. Defaults to True.

Raises:

TypeError – Raised if an unknown image format is passed

Returns:

Request object

Return type:

InferenceRequest
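
A short sketch of both modes; the commented line uses placeholder file paths:

import numpy as np
import amdinfer

# single image as a numpy array, sent as a raw tensor
request = amdinfer.ImageInferenceRequest(np.zeros((224, 224, 3), dtype=np.uint8))

# several image files in one request, base64-encoded instead of raw tensors
# request = amdinfer.ImageInferenceRequest(["cat.jpg", "dog.jpg"], asTensor=False)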

class amdinfer.InferenceRequest
__init__(self: amdinfer.InferenceRequest) None
addInputTensor(self: amdinfer.InferenceRequest, input: amdinfer.InferenceRequestInput) None

Constructs and adds a new input tensor to this request

Parameter data:

pointer to data to add

Parameter shape:

shape of the data

Parameter data_type:

the datatype of the data

Parameter name:

the name of the input tensor

addOutputTensor(self: amdinfer.InferenceRequest, output: amdinfer.InferenceRequestOutput) None

Adds a new output tensor to this request

Parameter output:

an existing InferenceRequestOutput object

getInputSize(self: amdinfer.InferenceRequest) int

Get the number of input request objects

getInputs(self: amdinfer.InferenceRequest) List[amdinfer.InferenceRequestInput]

Gets a vector of all the input request objects

getOutputs(self: amdinfer.InferenceRequest) List[amdinfer.InferenceRequestOutput]

Gets a vector of the requested output information

property id
property parameters
propagate(self: amdinfer.InferenceRequest) amdinfer.InferenceRequest
class amdinfer.InferenceRequestInput
__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: amdinfer._amdinfer.InferenceRequestInput) -> None

Constructs a new InferenceRequestInput object

  2. __init__(self: amdinfer._amdinfer.InferenceRequestInput, tensor: amdinfer._amdinfer.Tensor) -> None

Constructs a new InferenceRequestInput object

  3. __init__(self: amdinfer._amdinfer.InferenceRequestInput, data: capsule, shape: List[int], data_type: amdinfer._amdinfer.DataType, name: str = "") -> None

Construct a new InferenceRequestInput object

Parameter data:

pointer to data

Parameter shape:

shape of the data

Parameter data_type:

type of the data

Parameter name:

name to assign

getFp16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[half_float::half]
getFp32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.float32]
getFp64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.float64]
getInt16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int16]
getInt32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int32]
getInt64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int64]
getInt8Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int8]
getStringData(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int8]
getUint16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint16]
getUint32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint32]
getUint64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint64]
getUint8Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint8]
setFp16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[half_float::half]) None
setFp32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.float32]) None
setFp64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.float64]) None
setInt16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int16]) None
setInt32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int32]) None
setInt64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int64]) None
setInt8Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int8]) None
setStringData(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint8]) None
setUint16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint16]) None
setUint32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint32]) None
setUint64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint64]) None
setUint8Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint8]) None
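
A sketch of building a request by hand from the pieces above. The tensor name and shape are arbitrary examples, and setFp32Data is assumed to take a flat float32 array matching the tensor's element count:

import numpy as np
import amdinfer

# describe the input tensor, wrap it, and attach the data
tensor = amdinfer.Tensor("input0", [1, 4], amdinfer.DataType.FP32)
inp = amdinfer.InferenceRequestInput(tensor)
inp.setFp32Data(np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32))

request = amdinfer.InferenceRequest()
request.addInputTensor(inp)
print(request.getInputSize())  # 1
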
class amdinfer.InferenceRequestOutput
__init__(self: amdinfer.InferenceRequestOutput) None

Holds an inference request’s output data

property data
property name
property parameters
class amdinfer.InferenceResponse
__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: amdinfer._amdinfer.InferenceResponse) -> None

Constructs a new InferenceResponse object

  2. __init__(self: amdinfer._amdinfer.InferenceResponse, arg0: str) -> None

Constructs a new InferenceResponse error object

addOutput(self: amdinfer.InferenceResponse, output: amdinfer.InferenceResponseOutput) None

Adds an output tensor to the response

Parameter output:

an output tensor

getContext(self: amdinfer.InferenceResponse) Dict[str, str]
getError(self: amdinfer.InferenceResponse) str

Gets the error message if it exists. Defaults to an empty string

getOutputs(self: amdinfer.InferenceResponse) List[amdinfer.InferenceResponseOutput]

Gets a vector of the requested output information

getParameters(self: amdinfer.InferenceResponse) amdinfer.ParameterMap

Gets a pointer to the parameters associated with this response

property id
isError(self: amdinfer.InferenceResponse) bool

Checks if this is an error response

property model
setContext(self: amdinfer.InferenceResponse, arg0: Dict[str, str]) None
class amdinfer.InferenceResponseOutput
__init__(self: amdinfer.InferenceResponseOutput) None

Holds an inference response’s output data

property datatype
getFp16Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[half_float::half]
getFp32Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.float32]
getFp64Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.float64]
getInt16Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int16]
getInt32Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int32]
getInt64Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int64]
getInt8Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int8]
getSize(self: amdinfer.InferenceResponseOutput) int

Get the tensor’s size (number of elements)

getStringData(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.int8]
getUint16Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.uint16]
getUint32Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.uint32]
getUint64Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.uint64]
getUint8Data(self: amdinfer.InferenceResponseOutput) numpy.ndarray[numpy.uint8]
property name
property parameters
setFp16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[half_float::half]) None
setFp32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.float32]) None
setFp64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.float64]) None
setInt16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int16]) None
setInt32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int32]) None
setInt64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int64]) None
setInt8Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int8]) None
setStringData(self: amdinfer.InferenceResponseOutput, arg0: str) None
setUint16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint16]) None
setUint32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint32]) None
setUint64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint64]) None
setUint8Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint8]) None
property shape
class amdinfer.InferenceTensor
__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: amdinfer._amdinfer.InferenceTensor, name: str, shape: List[int], dataType: amdinfer._amdinfer.DataType) -> None

Construct a new InferenceTensor object

  2. __init__(self: amdinfer._amdinfer.InferenceTensor, tensor: amdinfer._amdinfer.Tensor) -> None

Construct a new InferenceTensor object

property parameters
exception amdinfer.InvalidArgumentError
class amdinfer.ModelMetadata
__init__(self: amdinfer.ModelMetadata, arg0: str, arg1: str) None

Constructs a new Model Metadata object

Parameter name:

Name of the model

Parameter platform:

the platform this model runs on

addInputTensor(self: amdinfer.ModelMetadata, arg0: str, arg1: List[int], arg2: amdinfer.DataType) None

Adds an input tensor to this model

Parameter name:

name of the tensor

Parameter shape:

shape of the tensor

Parameter datatype:

datatype of the tensor

addOutputTensor(self: amdinfer.ModelMetadata, name: str, shape: List[int], datatype: amdinfer.DataType) None

Adds an output tensor to this model

Parameter name:

name of the tensor

Parameter shape:

shape of the tensor

Parameter datatype:

datatype of the tensor

getPlatform(self: amdinfer.ModelMetadata) str
isReady(self: amdinfer.ModelMetadata) bool

Checks if this model is ready

property name
setReady(self: amdinfer.ModelMetadata, arg0: bool) None

Marks this model as ready/not ready

class amdinfer.NativeClient
__init__(self: amdinfer.NativeClient, server: amdinfer.Server) None
hasHardware(self: amdinfer.NativeClient, name: str, num: int) bool

Checks if the server has the requested number of a specific hardware device

Parameter name:

name of the hardware device to check

Parameter num:

number of the device that should exist at minimum

Returns:

bool - true if server has at least the requested number of the hardware device, false otherwise

modelInfer(self: amdinfer.NativeClient, model: str, request: amdinfer.InferenceRequest) amdinfer.InferenceResponse

Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.

Parameter model:

name of the model/worker to send the inference request to

Parameter request:

the request

Returns:

InferenceResponse

modelList(self: amdinfer.NativeClient) List[str]

Gets a list of active models on the server, returning their names

Returns:

std::vector<std::string>

modelLoad(self: amdinfer.NativeClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> None

Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the server's model repository directory and contains the model and its metadata in the right format.

Parameter model:

name of the model to load from the model repository directory

Parameter parameters:

load-time parameters for the worker supporting the model

modelMetadata(self: amdinfer.NativeClient, model: str) amdinfer.ModelMetadata

Returns the metadata associated with a ready model/worker

Parameter model:

name of the model/worker to get metadata

Returns:

ModelMetadata

modelReady(self: amdinfer.NativeClient, model: str) bool

Checks if a model/worker is ready

Parameter model:

name of the model to check

Returns:

bool - true if model is ready, false otherwise

modelUnload(self: amdinfer.NativeClient, model: str) None

Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.

Parameter model:

name of the model to unload

serverLive(self: amdinfer.NativeClient) bool

Checks if the server is live

Returns:

bool - true if server is live, false otherwise

serverMetadata(self: amdinfer.NativeClient) amdinfer.ServerMetadata

Returns the server metadata as a ServerMetadata object

Returns:

ServerMetadata

serverReady(self: amdinfer.NativeClient) bool

Checks if the server is ready

Returns:

bool - true if server is ready, false otherwise

workerLoad(self: amdinfer.NativeClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str

Loads a worker with the given name and load-time parameters.

Parameter worker:

name of the worker to load

Parameter parameters:

load-time parameters for the worker

Returns:

std::string

workerUnload(self: amdinfer.NativeClient, model: str) None

Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.

Parameter worker:

name of the worker to unload

class amdinfer.ParameterMap
__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: amdinfer._amdinfer.ParameterMap) -> None

  2. __init__(self: amdinfer._amdinfer.ParameterMap, keys: List[str], values: List[Union[bool, int, float, str]]) -> None

Construct a new ParameterMap object with initial values. The sizes of the keys and values vectors must match.

Until C++20, passing const char* to this constructor will convert it to a bool instead of a string. Explicitly convert any string literals to a string before passing them to this constructor.

Parameter keys:

Parameter values:

empty(self: amdinfer.ParameterMap) bool

Checks if the parameters are empty

erase(self: amdinfer.ParameterMap, arg0: str) None

Removes a parameter, if it exists. No error is raised if it doesn’t exist

Parameter key:

name of the parameter to remove

getBool(self: amdinfer.ParameterMap, arg0: str) bool

Get the named parameter

Template parameter T:

type of parameter. Must be (bool|double|int32_t|std::string)

Parameter key:

parameter to get

Returns:

T

getFloat(self: amdinfer.ParameterMap, arg0: str) float

Get the named parameter

Template parameter T:

type of parameter. Must be (bool|double|int32_t|std::string)

Parameter key:

parameter to get

Returns:

T

getInt(self: amdinfer.ParameterMap, arg0: str) int

Get the named parameter

Template parameter T:

type of parameter. Must be (bool|double|int32_t|std::string)

Parameter key:

parameter to get

Returns:

T

getString(self: amdinfer.ParameterMap, arg0: str) str

Get the named parameter

Template parameter T:

type of parameter. Must be (bool|double|int32_t|std::string)

Parameter key:

parameter to get

Returns:

T

has(self: amdinfer.ParameterMap, key: str) bool

Checks if a particular parameter exists

Parameter key:

name of the parameter to check

Returns:

bool

put(*args, **kwargs)

Overloaded function.

  1. put(self: amdinfer._amdinfer.ParameterMap, arg0: str, arg1: str) -> None

Put in a key-value pair

Parameter key:

key used to store and retrieve the value

Parameter value:

value to store

  2. put(self: amdinfer._amdinfer.ParameterMap, arg0: str, arg1: Union[bool, int, float, str]) -> None

Put in a key-value pair

Parameter key:

key used to store and retrieve the value

Parameter value:

value to store

size(self: amdinfer.ParameterMap) int

Gets the number of parameters
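
Putting these methods together, a sketch of typical ParameterMap usage; the keys and values are illustrative only:

import amdinfer

parameters = amdinfer.ParameterMap(["timeout", "verbose"], [500, True])  # keys and values lists must match in length
parameters.put("model", "resnet50")   # hypothetical key/value pair
print(parameters.has("model"))        # True
print(parameters.getInt("timeout"))   # 500
print(parameters.getString("model"))  # "resnet50"
parameters.erase("verbose")
print(parameters.size())              # 2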

exception amdinfer.RuntimeError
class amdinfer.Server
__init__(self: amdinfer.Server) None

Constructs a new Server object

enableRepositoryMonitoring(self: amdinfer.Server, use_polling: bool) None

Turn on active monitoring of the model repository path for new files. A model repository must be set with setModelRepository() before calling this method.

Parameter use_polling:

set to true to use polling to check the directory for new files, false to use events. Note that events may not work well on all platforms.

setModelRepository(self: amdinfer.Server, repository_path: os.PathLike, load_existing: bool) None

Set the path to the model repository associated with this server

Parameter path:

path to the model repository

Parameter load_existing:

load all existing models found at the path

startGrpc(self: amdinfer.Server, port: int) None

Start the gRPC server

Parameter port:

port to use for the gRPC server

startHttp(self: amdinfer.Server, port: int) None

Start the HTTP server

Parameter port:

port to use for the HTTP server

stopGrpc(self: amdinfer.Server) None

Stop the gRPC server

stopHttp(self: amdinfer.Server) None

Stop the HTTP server
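
A sketch of embedding the server in a Python process; the repository path and port are assumptions:

import amdinfer

server = amdinfer.Server()
# hypothetical repository path; True loads the models already present there
server.setModelRepository("/workspace/model_repository", True)
server.startHttp(8998)  # assumed port

# a NativeClient talks to this in-process server without going over HTTP or gRPC
client = amdinfer.NativeClient(server)
print(client.serverLive())

server.stopHttp()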

class amdinfer.ServerMetadata
__init__(self: amdinfer.ServerMetadata) None
property extensions

The extensions supported by the server. The KServe specification allows servers to support custom extensions and return them with a metadata request.

property name

Name of the server

property version

Version of the server

class amdinfer.Tensor
__init__(self: amdinfer.Tensor, name: str, shape: List[int], dataType: amdinfer.DataType) None

Describe a tensor with a name, shape and datatype

property datatype
getSize(self: amdinfer.Tensor) int

Get the tensor’s size (number of elements)

property name
property shape
class amdinfer.WebSocketClient
__init__(self: amdinfer.WebSocketClient, ws_address: str, http_address: str) None

Constructs a new WebSocketClient object

Parameter ws_address:

address of the websocket server to connect to

Parameter http_address:

address of the HTTP server to connect to

close(self: amdinfer.WebSocketClient) None

Closes the websocket connection

hasHardware(self: amdinfer.WebSocketClient, name: str, num: int) bool

Checks if the server has the requested number of a specific hardware device

Parameter name:

name of the hardware device to check

Parameter num:

number of the device that should exist at minimum

Returns:

bool - true if server has at least the requested number of the hardware device, false otherwise

modelInfer(self: amdinfer.WebSocketClient, model: str, request: amdinfer.InferenceRequest) amdinfer.InferenceResponse

Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.

Parameter model:

name of the model/worker to send the inference request to

Parameter request:

the request

Returns:

InferenceResponse

modelInferWs(self: amdinfer.WebSocketClient, model: str, request: amdinfer.InferenceRequest) None

Makes a websocket inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. This method differs from the standard inference in that it submits an actual websocket message. The user should use modelRecv to get results and must disambiguate different responses on the client side using the IDs of the responses.

Parameter model:

Parameter request:

modelList(self: amdinfer.WebSocketClient) List[str]

Gets a list of active models on the server, returning their names

Returns:

std::vector<std::string>

modelLoad(self: amdinfer.WebSocketClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> None

Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the server's model repository directory and contains the model and its metadata in the right format.

Parameter model:

name of the model to load from the model repository directory

Parameter parameters:

load-time parameters for the worker supporting the model

modelMetadata(self: amdinfer.WebSocketClient, model: str) amdinfer.ModelMetadata

Returns the metadata associated with a ready model/worker

Parameter model:

name of the model/worker to get metadata

Returns:

ModelMetadata

modelReady(self: amdinfer.WebSocketClient, model: str) bool

Checks if a model/worker is ready

Parameter model:

name of the model to check

Returns:

bool - true if model is ready, false otherwise

modelRecv(self: amdinfer.WebSocketClient) str

Gets one message from the websocket server sent in response to a modelInferWs request. The user should know beforehand how many messages are expected and should call this method the same number of times.

Returns:

std::string a JSON object encoded as a string
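
A sketch combining modelInferWs and modelRecv; the addresses and the "mnist" endpoint are assumptions:

import numpy as np
import amdinfer

# assumed websocket and HTTP addresses for the same server
client = amdinfer.WebSocketClient("ws://127.0.0.1:8998", "http://127.0.0.1:8998")

request = amdinfer.ImageInferenceRequest(np.zeros((28, 28, 1), dtype=np.uint8))
client.modelInferWs("mnist", request)  # hypothetical endpoint; does not block for the result
message = client.modelRecv()           # one JSON-encoded response as a string
print(message)
client.close()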

modelUnload(self: amdinfer.WebSocketClient, model: str) None

Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.

Parameter model:

name of the model to unload

serverLive(self: amdinfer.WebSocketClient) bool

Checks if the server is live

Returns:

bool - true if server is live, false otherwise

serverMetadata(self: amdinfer.WebSocketClient) amdinfer.ServerMetadata

Returns the server metadata as a ServerMetadata object

Returns:

ServerMetadata

serverReady(self: amdinfer.WebSocketClient) bool

Checks if the server is ready

Returns:

bool - true if server is ready, false otherwise

workerLoad(self: amdinfer.WebSocketClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str

Loads a worker with the given name and load-time parameters.

Parameter worker:

name of the worker to load

Parameter parameters:

load-time parameters for the worker

Returns:

std::string

workerUnload(self: amdinfer.WebSocketClient, model: str) None

Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.

Parameter worker:

name of the worker to unload

amdinfer.inferAsyncOrdered(client: amdinfer.Client, model: str, requests: List[amdinfer.InferenceRequest]) List[amdinfer.InferenceResponse]
amdinfer.inferAsyncOrderedBatched(client: amdinfer.Client, model: str, requests: List[amdinfer.InferenceRequest], batch_sizes: int) List[amdinfer.InferenceResponse]
amdinfer.inference_request_to_dict(request: InferenceRequest)
amdinfer.loadEnsemble(client: amdinfer.Client, models: List[str], parameters: List[amdinfer.ParameterMap]) List[str]
amdinfer.parallel_infer(client, model, data, processes)

Make an inference to the server in parallel with n processes

Parameters:
  • client (amdinfer.client) – Client to make the inference with

  • model (str) – Name of the model/worker to make the inference

  • data (list[np.ndarray]) – List of data to send

  • processes (int) – number of processes to use

Returns:

Responses for each request

Return type:

list[amdinfer.InferenceResponse]

amdinfer.serverHasExtension(client: amdinfer.Client, extension: str) bool
amdinfer.start_http_client_server(address: str, extension=None)
amdinfer.stringToArray(data)
amdinfer.unloadModels(client: amdinfer.Client, models: List[str]) None
amdinfer.waitUntilModelNotReady(client: amdinfer.Client, model: str) None
amdinfer.waitUntilModelReady(client: amdinfer.Client, model: str) None
amdinfer.waitUntilServerReady(client: amdinfer.Client) None
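
For example, these helper functions can be combined to send a small batch of requests; the address, the "resnet50" endpoint, and the dummy inputs are assumptions:

import numpy as np
import amdinfer

client = amdinfer.HttpClient("http://127.0.0.1:8998")  # assumed address
amdinfer.waitUntilServerReady(client)

# build a few placeholder requests and send them, collecting responses in order
images = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(4)]
requests = [amdinfer.ImageInferenceRequest(image) for image in images]
responses = amdinfer.inferAsyncOrdered(client, "resnet50", requests)  # hypothetical endpoint
print(len(responses))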