Python
The Python library for the AMD Inference Server lets you communicate with the server from Python.
Install the Python library
The Python library is built and installed in the development container as part of the regular CMake build.
To install it outside Docker or in different containers, you can use pip:
$ pip install amdinfer
Tip
Make sure the client library version is compatible with the server by using matching versions.
If you are using the latest server from main, you may need to install the Python library with pip install --pre amdinfer to install a pre-release package, if it exists.
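As a quick check after installing, you can import the library and query a running server; the address and port below are placeholders for wherever your server's HTTP endpoint is listening:
import amdinfer

# placeholder address: point this at your running server's HTTP endpoint
client = amdinfer.HttpClient("http://127.0.0.1:8998")
print("server live:", client.serverLive())
print("server ready:", client.serverReady())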
Build wheels
You can build wheels for the Python library to make a precompiled package that can be installed in any Linux host, container or environment. It is recommended to perform the following steps on a fresh clone of the inference server repository. These instructions assume you’re only building wheels for x86_64 Linux with CPython.
# generate a Dockerfile that defines an image for building wheels
./docker/generate.py --cibuildwheel --base-image=quay.io/pypa/manylinux2014_x86_64 --base-image-type yum
# build the image. You should add some suffix to differentiate this image
# from the regular image
./amdinfer dockerize --suffix="-ci"
# this will build an image with the name $(whoami)/amdinfer-dev-ci:latest
# if you're not building wheels on the same host, you will need to upload
# this image to a Docker registry
# on a host where your image exists or can be pulled
export CIBW_MANYLINUX_X86_64_IMAGE=$(whoami)/amdinfer-dev-ci:latest
pip install cibuildwheel
# you can edit pyproject.toml to control which wheels to build or just use the defaults
cibuildwheel --platform linux
# your built wheels will be in ./wheelhouse
After following these instructions, your built wheels will be in ./wheelhouse/.
The names of the wheels indicate the Python version they are compatible with.
For example, cp37 in the name indicates that the wheel is compatible with CPython 3.7.
You can install these wheels in a virtual environment, a Conda environment, a container or on a bare host.
pip install <path/to/wheel>
API
- exception amdinfer.BadStatus
- exception amdinfer.ConnectionError
- class amdinfer.DataType
- BOOL = DataType(BOOL)
- FLOAT32 = DataType(FP32)
- FLOAT64 = DataType(FP64)
- FP16 = DataType(FP16)
- FP32 = DataType(FP32)
- FP64 = DataType(FP64)
- INT16 = DataType(INT16)
- INT32 = DataType(INT32)
- INT64 = DataType(INT64)
- INT8 = DataType(INT8)
- STRING = DataType(STRING)
- UINT16 = DataType(UINT16)
- UINT32 = DataType(UINT32)
- UINT64 = DataType(UINT64)
- UINT8 = DataType(UINT8)
- class Value
Members: BOOL, UINT8, UINT16, UINT32, UINT64, INT8, INT16, INT32, INT64, FP16, FP32, FLOAT32, FP64, FLOAT64, STRING
- BOOL = <Value.BOOL: 0>
- FLOAT32 = <Value.FP32: 10>
- FLOAT64 = <Value.FP64: 11>
- FP16 = <Value.FP16: 9>
- FP32 = <Value.FP32: 10>
- FP64 = <Value.FP64: 11>
- INT16 = <Value.INT16: 6>
- INT32 = <Value.INT32: 7>
- INT64 = <Value.INT64: 8>
- INT8 = <Value.INT8: 5>
- STRING = <Value.STRING: 12>
- UINT16 = <Value.UINT16: 2>
- UINT32 = <Value.UINT32: 3>
- UINT64 = <Value.UINT64: 4>
- UINT8 = <Value.UINT8: 1>
- __init__(self: amdinfer.DataType.Value, value: int) -> None
- property name
- property value
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: amdinfer._amdinfer.DataType) -> None
__init__(self: amdinfer._amdinfer.DataType, arg0: str) -> None
__init__(self: amdinfer._amdinfer.DataType, arg0: amdinfer._amdinfer.DataType.Value) -> None
- size(self: amdinfer.DataType) -> int
- str(self: amdinfer.DataType) -> str
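For illustration, a small sketch of working with DataType values; whether size() reports bytes per element and which string names the constructor accepts are assumptions to verify against your installed version:
import amdinfer

dt = amdinfer.DataType.FP32
print(dt.str())    # the type's string name
print(dt.size())   # assumed to be the size of one element in bytes

# a DataType can also be constructed from a string name or an enum value
dt_from_str = amdinfer.DataType("FP32")
dt_from_enum = amdinfer.DataType(amdinfer.DataType.Value.FP32)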
- exception amdinfer.EnvironmentNotSetError
- exception amdinfer.ExternalError
- exception amdinfer.FileNotFoundError
- exception amdinfer.FileReadError
- class amdinfer.GrpcClient
- __init__(self: amdinfer.GrpcClient, address: str) -> None
Constructs a new GrpcClient object
- Parameter address: address of the server to connect to
- hasHardware(self: amdinfer.GrpcClient, name: str, num: int) -> bool
Checks if the server has the requested number of a specific hardware device
- Parameter name: name of the hardware device to check
- Parameter num: minimum number of the device that should exist
- Returns: bool - true if the server has at least the requested number of the hardware device, false otherwise
- modelInfer(self: amdinfer.GrpcClient, model: str, request: amdinfer.InferenceRequest) -> amdinfer.InferenceResponse
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter model: name of the model/worker to request inference from
- Parameter request: the request
- Returns: InferenceResponse
- modelList(self: amdinfer.GrpcClient) -> List[str]
Gets a list of active models on the server, returning their names
- Returns: List[str]
- modelLoad(self: amdinfer.GrpcClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> None
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the model repository directory for the server, containing the model and its metadata in the right format.
- Parameter model: name of the model to load from the model repository directory
- Parameter parameters: load-time parameters for the worker supporting the model
- modelMetadata(self: amdinfer.GrpcClient, model: str) -> amdinfer.ModelMetadata
Returns the metadata associated with a ready model/worker
- Parameter model: name of the model/worker to get metadata for
- Returns: ModelMetadata
- modelReady(self: amdinfer.GrpcClient, model: str) -> bool
Checks if a model/worker is ready
- Parameter model: name of the model to check
- Returns: bool - true if the model is ready, false otherwise
- modelUnload(self: amdinfer.GrpcClient, model: str) -> None
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter model: name of the model to unload
- serverLive(self: amdinfer.GrpcClient) -> bool
Checks if the server is live
- Returns: bool - true if the server is live, false otherwise
- serverMetadata(self: amdinfer.GrpcClient) -> amdinfer.ServerMetadata
Returns the server metadata as a ServerMetadata object
- Returns: ServerMetadata
- serverReady(self: amdinfer.GrpcClient) -> bool
Checks if the server is ready
- Returns: bool - true if the server is ready, false otherwise
- workerLoad(self: amdinfer.GrpcClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str
Loads a worker with the given name and load-time parameters.
- Parameter worker: name of the worker to load
- Parameter parameters: load-time parameters for the worker
- Returns: str
- workerUnload(self: amdinfer.GrpcClient, model: str) -> None
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter worker: name of the worker to unload
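As an end-to-end sketch of using this client, assuming a server is already running with gRPC enabled; the address, port, model name, and image path below are placeholders rather than documented defaults:
import amdinfer

client = amdinfer.GrpcClient("127.0.0.1:50051")   # placeholder gRPC address

if client.serverLive() and client.serverReady():
    # "mymodel" is a hypothetical model assumed to exist in the server's model repository
    client.modelLoad("mymodel")
    amdinfer.waitUntilModelReady(client, "mymodel")

    request = amdinfer.ImageInferenceRequest("path/to/image.jpg")  # placeholder image path
    response = client.modelInfer("mymodel", request)
    if not response.isError():
        for output in response.getOutputs():
            print(output.name, output.shape)

    client.modelUnload("mymodel")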
- class amdinfer.HttpClient
- __init__(self: amdinfer.HttpClient, address: str, headers: Dict[str, str] = {}, parallelism: int = 32) -> None
Constructs a new HttpClient object
- Parameter address: address of the server to connect to
- hasHardware(self: amdinfer.HttpClient, name: str, num: int) -> bool
Checks if the server has the requested number of a specific hardware device
- Parameter name: name of the hardware device to check
- Parameter num: minimum number of the device that should exist
- Returns: bool - true if the server has at least the requested number of the hardware device, false otherwise
- modelInfer(self: amdinfer.HttpClient, model: str, request: amdinfer.InferenceRequest) -> amdinfer.InferenceResponse
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter model: name of the model/worker to request inference from
- Parameter request: the request
- Returns: InferenceResponse
- modelList(self: amdinfer.HttpClient) -> List[str]
Gets a list of active models on the server, returning their names
- Returns: List[str]
- modelLoad(self: amdinfer.HttpClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> None
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the model repository directory for the server, containing the model and its metadata in the right format.
- Parameter model: name of the model to load from the model repository directory
- Parameter parameters: load-time parameters for the worker supporting the model
- modelMetadata(self: amdinfer.HttpClient, model: str) -> amdinfer.ModelMetadata
Returns the metadata associated with a ready model/worker
- Parameter model: name of the model/worker to get metadata for
- Returns: ModelMetadata
- modelReady(self: amdinfer.HttpClient, model: str) -> bool
Checks if a model/worker is ready
- Parameter model: name of the model to check
- Returns: bool - true if the model is ready, false otherwise
- modelUnload(self: amdinfer.HttpClient, model: str) -> None
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter model: name of the model to unload
- serverLive(self: amdinfer.HttpClient) -> bool
Checks if the server is live
- Returns: bool - true if the server is live, false otherwise
- serverMetadata(self: amdinfer.HttpClient) -> amdinfer.ServerMetadata
Returns the server metadata as a ServerMetadata object
- Returns: ServerMetadata
- serverReady(self: amdinfer.HttpClient) -> bool
Checks if the server is ready
- Returns: bool - true if the server is ready, false otherwise
- workerLoad(self: amdinfer.HttpClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str
Loads a worker with the given name and load-time parameters.
- Parameter worker: name of the worker to load
- Parameter parameters: load-time parameters for the worker
- Returns: str
- workerUnload(self: amdinfer.HttpClient, model: str) -> None
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter worker: name of the worker to unload
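The HTTP client exposes the same operations over REST; a short sketch focusing on the extra constructor arguments and the metadata and hardware queries (the address, header, and device name are illustrative assumptions):
import amdinfer

# placeholder address; the custom header is optional and shown only as an example
client = amdinfer.HttpClient("http://127.0.0.1:8998", {"x-request-source": "docs-example"})

amdinfer.waitUntilServerReady(client)
metadata = client.serverMetadata()
print(metadata.name, metadata.version, metadata.extensions)

# "cpu" is an illustrative device label; use whatever your deployment reports
if client.hasHardware("cpu", 1):
    print("at least one matching device is available")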
- amdinfer.ImageInferenceRequest(images, asTensor=True)
Constructs a request from an image or a list of images
- Parameters:
images (image) – images may be numpy arrays or filepaths, or a list of these
asTensor (bool, optional) – send the data as a tensor or as a base64-encoded string. Defaults to True.
- Raises:
TypeError – raised if an unknown image format is passed
- Returns:
Request object
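A minimal sketch of building requests from in-memory images; the shape and dtype of the arrays are arbitrary placeholders:
import numpy as np
import amdinfer

# a fake 224x224 RGB image (placeholder data)
image = np.zeros((224, 224, 3), dtype=np.uint8)

# one request from a single image
request = amdinfer.ImageInferenceRequest(image)

# one request from several images, presumably one input tensor per image
batch_request = amdinfer.ImageInferenceRequest([image, image])

# send as base64-encoded strings instead of raw tensors
encoded_request = amdinfer.ImageInferenceRequest(image, asTensor=False)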
- class amdinfer.InferenceRequest
- __init__(self: amdinfer.InferenceRequest) -> None
- addInputTensor(self: amdinfer.InferenceRequest, input: amdinfer.InferenceRequestInput) -> None
Adds a new input tensor to this request
- Parameter input: an existing InferenceRequestInput object
- addOutputTensor(self: amdinfer.InferenceRequest, output: amdinfer.InferenceRequestOutput) -> None
Adds a new output tensor to this request
- Parameter output: an existing InferenceRequestOutput object
- getInputSize(self: amdinfer.InferenceRequest) -> int
Gets the number of input request objects
- getInputs(self: amdinfer.InferenceRequest) -> List[amdinfer.InferenceRequestInput]
Gets a list of all the input request objects
- getOutputs(self: amdinfer.InferenceRequest) -> List[amdinfer.InferenceRequestOutput]
Gets a list of the requested output information
- property id
- property parameters
- propagate(self: amdinfer.InferenceRequest) -> amdinfer.InferenceRequest
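When the ImageInferenceRequest helper does not fit, a request can be assembled by hand; in this sketch the tensor name, shape, and flattened-data layout are assumptions that depend on the target model:
import numpy as np
import amdinfer

# describe the input tensor: hypothetical name and shape
tensor = amdinfer.Tensor("input0", [1, 3, 224, 224], amdinfer.DataType.FP32)
inp = amdinfer.InferenceRequestInput(tensor)
inp.setFp32Data(np.zeros(1 * 3 * 224 * 224, dtype=np.float32))  # assumed flattened layout

request = amdinfer.InferenceRequest()
request.addInputTensor(inp)
print(request.getInputSize())  # 1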
- class amdinfer.InferenceRequestInput
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: amdinfer._amdinfer.InferenceRequestInput) -> None
Constructs a new InferenceRequestInput object
__init__(self: amdinfer._amdinfer.InferenceRequestInput, tensor: amdinfer._amdinfer.Tensor) -> None
Constructs a new InferenceRequestInput object from an existing tensor
__init__(self: amdinfer._amdinfer.InferenceRequestInput, data: capsule, shape: List[int], data_type: amdinfer._amdinfer.DataType, name: str = '') -> None
Constructs a new InferenceRequestInput object
- Parameter data: pointer to the data
- Parameter shape: shape of the data
- Parameter data_type: type of the data
- Parameter name: name to assign
- getFp16Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[half_float::half]
- getFp32Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.float32]
- getFp64Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.float64]
- getInt16Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.int16]
- getInt32Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.int32]
- getInt64Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.int64]
- getInt8Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.int8]
- getStringData(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.int8]
- getUint16Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.uint16]
- getUint32Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.uint32]
- getUint64Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.uint64]
- getUint8Data(self: amdinfer.InferenceRequestInput) -> numpy.ndarray[numpy.uint8]
- setFp16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[half_float::half]) -> None
- setFp32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.float32]) -> None
- setFp64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.float64]) -> None
- setInt16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int16]) -> None
- setInt32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int32]) -> None
- setInt64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int64]) -> None
- setInt8Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int8]) -> None
- setStringData(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint8]) -> None
- setUint16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint16]) -> None
- setUint32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint32]) -> None
- setUint64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint64]) -> None
- setUint8Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint8]) -> None
- class amdinfer.InferenceRequestOutput
- __init__(self: amdinfer.InferenceRequestOutput) -> None
Holds an inference request’s output data
- property data
- property name
- property parameters
- class amdinfer.InferenceResponse
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: amdinfer._amdinfer.InferenceResponse) -> None
Constructs a new InferenceResponse object
__init__(self: amdinfer._amdinfer.InferenceResponse, arg0: str) -> None
Constructs a new InferenceResponse error object
- addOutput(self: amdinfer.InferenceResponse, output: amdinfer.InferenceResponseOutput) -> None
Adds an output tensor to the response
- Parameter output: an output tensor
- getContext(self: amdinfer.InferenceResponse) -> Dict[str, str]
- getError(self: amdinfer.InferenceResponse) -> str
Gets the error message if it exists. Defaults to an empty string
- getOutputs(self: amdinfer.InferenceResponse) -> List[amdinfer.InferenceResponseOutput]
Gets a list of the requested output information
- getParameters(self: amdinfer.InferenceResponse) -> amdinfer.ParameterMap
Gets the parameters associated with this response
- property id
- isError(self: amdinfer.InferenceResponse) -> bool
Checks if this is an error response
- property model
- setContext(self: amdinfer.InferenceResponse, arg0: Dict[str, str]) -> None
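A sketch of inspecting a response; the address, model name, and image data are placeholders, and the FP32 getter is only an assumption about this particular output's datatype:
import numpy as np
import amdinfer

client = amdinfer.HttpClient("http://127.0.0.1:8998")                     # placeholder address
request = amdinfer.ImageInferenceRequest(np.zeros((8, 8, 3), np.uint8))   # placeholder image
response = client.modelInfer("mymodel", request)                          # hypothetical loaded model

if response.isError():
    print("inference failed:", response.getError())
else:
    for output in response.getOutputs():
        print(output.name, output.datatype, output.shape, output.getSize())
        data = output.getFp32Data()  # pick the getter that matches output.datatype
        print(data[:10])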
- class amdinfer.InferenceResponseOutput
- __init__(self: amdinfer.InferenceResponseOutput) -> None
Holds an inference response’s output data
- property datatype
- getFp16Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[half_float::half]
- getFp32Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.float32]
- getFp64Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.float64]
- getInt16Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.int16]
- getInt32Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.int32]
- getInt64Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.int64]
- getInt8Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.int8]
- getSize(self: amdinfer.InferenceResponseOutput) -> int
Gets the tensor’s size (number of elements)
- getStringData(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.int8]
- getUint16Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.uint16]
- getUint32Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.uint32]
- getUint64Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.uint64]
- getUint8Data(self: amdinfer.InferenceResponseOutput) -> numpy.ndarray[numpy.uint8]
- property name
- property parameters
- setFp16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[half_float::half]) -> None
- setFp32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.float32]) -> None
- setFp64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.float64]) -> None
- setInt16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int16]) -> None
- setInt32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int32]) -> None
- setInt64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int64]) -> None
- setInt8Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.int8]) -> None
- setStringData(self: amdinfer.InferenceResponseOutput, arg0: str) -> None
- setUint16Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint16]) -> None
- setUint32Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint32]) -> None
- setUint64Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint64]) -> None
- setUint8Data(self: amdinfer.InferenceResponseOutput, arg0: numpy.ndarray[numpy.uint8]) -> None
- property shape
- class amdinfer.InferenceTensor
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: amdinfer._amdinfer.InferenceTensor, name: str, shape: List[int], dataType: amdinfer._amdinfer.DataType) -> None
Construct a new InferenceTensor object
__init__(self: amdinfer._amdinfer.InferenceTensor, tensor: amdinfer._amdinfer.Tensor) -> None
Construct a new InferenceTensor object
- property parameters
- exception amdinfer.InvalidArgumentError
- class amdinfer.ModelMetadata
- __init__(self: amdinfer.ModelMetadata, arg0: str, arg1: str) -> None
Constructs a new ModelMetadata object
- Parameter name: name of the model
- Parameter platform: the platform this model runs on
- addInputTensor(self: amdinfer.ModelMetadata, arg0: str, arg1: List[int], arg2: amdinfer.DataType) -> None
Adds an input tensor to this model
- Parameter name: name of the tensor
- Parameter shape: shape of the tensor
- Parameter datatype: datatype of the tensor
- addOutputTensor(self: amdinfer.ModelMetadata, name: str, datatype: List[int], shape: amdinfer.DataType) -> None
Adds an output tensor to this model
- Parameter name: name of the tensor
- Parameter shape: shape of the tensor
- Parameter datatype: datatype of the tensor
- getPlatform(self: amdinfer.ModelMetadata) -> str
- isReady(self: amdinfer.ModelMetadata) -> bool
Checks if this model is ready
- property name
- setReady(self: amdinfer.ModelMetadata, arg0: bool) -> None
Marks this model as ready/not ready
- class amdinfer.NativeClient
- __init__(self: amdinfer.NativeClient, server: amdinfer.Server) -> None
- hasHardware(self: amdinfer.NativeClient, name: str, num: int) -> bool
Checks if the server has the requested number of a specific hardware device
- Parameter name: name of the hardware device to check
- Parameter num: minimum number of the device that should exist
- Returns: bool - true if the server has at least the requested number of the hardware device, false otherwise
- modelInfer(self: amdinfer.NativeClient, model: str, request: amdinfer.InferenceRequest) -> amdinfer.InferenceResponse
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter model: name of the model/worker to request inference from
- Parameter request: the request
- Returns: InferenceResponse
- modelList(self: amdinfer.NativeClient) -> List[str]
Gets a list of active models on the server, returning their names
- Returns: List[str]
- modelLoad(self: amdinfer.NativeClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> None
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the model repository directory for the server, containing the model and its metadata in the right format.
- Parameter model: name of the model to load from the model repository directory
- Parameter parameters: load-time parameters for the worker supporting the model
- modelMetadata(self: amdinfer.NativeClient, model: str) -> amdinfer.ModelMetadata
Returns the metadata associated with a ready model/worker
- Parameter model: name of the model/worker to get metadata for
- Returns: ModelMetadata
- modelReady(self: amdinfer.NativeClient, model: str) -> bool
Checks if a model/worker is ready
- Parameter model: name of the model to check
- Returns: bool - true if the model is ready, false otherwise
- modelUnload(self: amdinfer.NativeClient, model: str) -> None
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter model: name of the model to unload
- serverLive(self: amdinfer.NativeClient) -> bool
Checks if the server is live
- Returns: bool - true if the server is live, false otherwise
- serverMetadata(self: amdinfer.NativeClient) -> amdinfer.ServerMetadata
Returns the server metadata as a ServerMetadata object
- Returns: ServerMetadata
- serverReady(self: amdinfer.NativeClient) -> bool
Checks if the server is ready
- Returns: bool - true if the server is ready, false otherwise
- workerLoad(self: amdinfer.NativeClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str
Loads a worker with the given name and load-time parameters.
- Parameter worker: name of the worker to load
- Parameter parameters: load-time parameters for the worker
- Returns: str
- workerUnload(self: amdinfer.NativeClient, model: str) -> None
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter worker: name of the worker to unload
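NativeClient talks to a Server object living in the same process, which avoids opening any ports and is convenient for testing; a sketch, where the worker name is a hypothetical example:
import amdinfer

server = amdinfer.Server()
client = amdinfer.NativeClient(server)

print(client.serverLive())   # expected to be true for an in-process server
print(client.modelList())    # no models loaded yet

# workerLoad returns a string naming the loaded worker; it is assumed to be
# usable as the model name in later calls such as modelInfer or modelReady
endpoint = client.workerLoad("echo")   # "echo" is a hypothetical worker name
amdinfer.waitUntilModelReady(client, endpoint)
client.workerUnload(endpoint)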
- class amdinfer.ParameterMap
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: amdinfer._amdinfer.ParameterMap) -> None
__init__(self: amdinfer._amdinfer.ParameterMap, keys: List[str], values: List[Union[bool, int, float, str]]) -> None
Constructs a new ParameterMap object with initial values. The sizes of the keys and values lists must match.
Until C++20, passing const char* to this constructor will convert it to a bool instead of a string. Explicitly convert any string literals to a string before passing them to this constructor.
- Parameter keys: keys for the parameters
- Parameter values: values for the parameters
- empty(self: amdinfer.ParameterMap) -> bool
Checks if the parameters are empty
- erase(self: amdinfer.ParameterMap, arg0: str) -> None
Removes a parameter, if it exists. No error is raised if it doesn’t exist
- Parameter key: name of the parameter to remove
- getBool(self: amdinfer.ParameterMap, arg0: str) -> bool
Gets the named parameter as a bool
- Parameter key: parameter to get
- getFloat(self: amdinfer.ParameterMap, arg0: str) -> float
Gets the named parameter as a float
- Parameter key: parameter to get
- getInt(self: amdinfer.ParameterMap, arg0: str) -> int
Gets the named parameter as an int
- Parameter key: parameter to get
- getString(self: amdinfer.ParameterMap, arg0: str) -> str
Gets the named parameter as a string
- Parameter key: parameter to get
- has(self: amdinfer.ParameterMap, key: str) -> bool
Checks if a particular parameter exists
- Parameter key: name of the parameter to check
- Returns: bool
- put(*args, **kwargs)
Overloaded function.
put(self: amdinfer._amdinfer.ParameterMap, arg0: str, arg1: str) -> None
put(self: amdinfer._amdinfer.ParameterMap, arg0: str, arg1: Union[bool, int, float, str]) -> None
Puts in a key-value pair
- Parameter key: key used to store and retrieve the value
- Parameter value: value to store
- size(self: amdinfer.ParameterMap) -> int
Gets the number of parameters
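A sketch of passing load-time parameters; the key names are purely illustrative since each worker defines which keys it understands:
import amdinfer

parameters = amdinfer.ParameterMap()
parameters.put("batch_size", 4)                    # illustrative key
parameters.put("model_file", "/tmp/model.onnx")    # hypothetical path

print(parameters.size())             # 2
print(parameters.has("batch_size"))  # True
print(parameters.getInt("batch_size"))
print(parameters.getString("model_file"))

# the two-list constructor is equivalent
parameters2 = amdinfer.ParameterMap(["batch_size", "model_file"], [4, "/tmp/model.onnx"])

# parameters are typically passed when loading a model or worker, e.g.
# client.modelLoad("mymodel", parameters)   # hypothetical client and model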
- exception amdinfer.RuntimeError
- class amdinfer.Server
- __init__(self: amdinfer.Server) -> None
Constructs a new Server object
- enableRepositoryMonitoring(self: amdinfer.Server, use_polling: bool) -> None
Turns on active monitoring of the model repository path for new files. A model repository must be set with setModelRepository() before calling this method.
- Parameter use_polling: set to true to use polling to check the directory for new files, false to use events. Note that events may not work well on all platforms.
- setModelRepository(self: amdinfer.Server, repository_path: os.PathLike, load_existing: bool) -> None
Sets the path to the model repository associated with this server
- Parameter repository_path: path to the model repository
- Parameter load_existing: load all existing models found at the path
- startGrpc(self: amdinfer.Server, port: int) -> None
Starts the gRPC server
- Parameter port: port to use for the gRPC server
- startHttp(self: amdinfer.Server, port: int) -> None
Starts the HTTP server
- Parameter port: port to use for the HTTP server
- stopGrpc(self: amdinfer.Server) -> None
Stops the gRPC server
- stopHttp(self: amdinfer.Server) -> None
Stops the HTTP server
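A sketch of embedding the server in a Python process and serving it over HTTP; the repository path and port are placeholder choices:
import amdinfer

server = amdinfer.Server()
# hypothetical repository path; True also loads any models already found there
server.setModelRepository("/tmp/model_repository", True)
server.startHttp(8998)   # placeholder port

client = amdinfer.HttpClient("http://127.0.0.1:8998")
amdinfer.waitUntilServerReady(client)
print(client.modelList())

server.stopHttp()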
- class amdinfer.ServerMetadata
- __init__(self: amdinfer.ServerMetadata) -> None
- property extensions
The extensions supported by the server. The KServe specification allows servers to support custom extensions and return them with a metadata request.
- property name
Name of the server
- property version
Version of the server
- class amdinfer.Tensor
- __init__(self: amdinfer.Tensor, name: str, shape: List[int], dataType: amdinfer.DataType) -> None
Describes a tensor with a name, shape and datatype
- property datatype
- getSize(self: amdinfer.Tensor) -> int
Gets the tensor’s size (number of elements)
- property name
- property shape
- class amdinfer.WebSocketClient
- __init__(self: amdinfer.WebSocketClient, ws_address: str, http_address: str) -> None
Constructs a new WebSocketClient object
- Parameter ws_address: address of the websocket server to connect to
- Parameter http_address: address of the HTTP server to connect to
- close(self: amdinfer.WebSocketClient) -> None
Closes the websocket connection
- hasHardware(self: amdinfer.WebSocketClient, name: str, num: int) -> bool
Checks if the server has the requested number of a specific hardware device
- Parameter name: name of the hardware device to check
- Parameter num: minimum number of the device that should exist
- Returns: bool - true if the server has at least the requested number of the hardware device, false otherwise
- modelInfer(self: amdinfer.WebSocketClient, model: str, request: amdinfer.InferenceRequest) -> amdinfer.InferenceResponse
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter model: name of the model/worker to request inference from
- Parameter request: the request
- Returns: InferenceResponse
- modelInferWs(self: amdinfer.WebSocketClient, model: str, request: amdinfer.InferenceRequest) -> None
Makes a websocket inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. This method differs from the standard inference in that it submits an actual websocket message. The user should use modelRecv to get results and must disambiguate different responses on the client side using the IDs of the responses.
- Parameter model: name of the model/worker to request inference from
- Parameter request: the request
- modelList(self: amdinfer.WebSocketClient) -> List[str]
Gets a list of active models on the server, returning their names
- Returns: List[str]
- modelLoad(self: amdinfer.WebSocketClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> None
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name already exists in the model repository directory for the server, containing the model and its metadata in the right format.
- Parameter model: name of the model to load from the model repository directory
- Parameter parameters: load-time parameters for the worker supporting the model
- modelMetadata(self: amdinfer.WebSocketClient, model: str) -> amdinfer.ModelMetadata
Returns the metadata associated with a ready model/worker
- Parameter model: name of the model/worker to get metadata for
- Returns: ModelMetadata
- modelReady(self: amdinfer.WebSocketClient, model: str) -> bool
Checks if a model/worker is ready
- Parameter model: name of the model to check
- Returns: bool - true if the model is ready, false otherwise
- modelRecv(self: amdinfer.WebSocketClient) -> str
Gets one message from the websocket server sent in response to a modelInferWs request. The user should know beforehand how many messages are expected and should call this method the same number of times.
- Returns: str - a JSON object encoded as a string
- modelUnload(self: amdinfer.WebSocketClient, model: str) -> None
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter model: name of the model to unload
- serverLive(self: amdinfer.WebSocketClient) -> bool
Checks if the server is live
- Returns: bool - true if the server is live, false otherwise
- serverMetadata(self: amdinfer.WebSocketClient) -> amdinfer.ServerMetadata
Returns the server metadata as a ServerMetadata object
- Returns: ServerMetadata
- serverReady(self: amdinfer.WebSocketClient) -> bool
Checks if the server is ready
- Returns: bool - true if the server is ready, false otherwise
- workerLoad(self: amdinfer.WebSocketClient, model: str, parameters: amdinfer.ParameterMap = ParameterMap(0)) -> str
Loads a worker with the given name and load-time parameters.
- Parameter worker: name of the worker to load
- Parameter parameters: load-time parameters for the worker
- Returns: str
- workerUnload(self: amdinfer.WebSocketClient, model: str) -> None
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter worker: name of the worker to unload
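A sketch of the websocket flow: submit requests with modelInferWs, then call modelRecv once per submitted request to drain the responses. The addresses, model name, and image data are placeholder assumptions:
import numpy as np
import amdinfer

# placeholder websocket and HTTP addresses for the same server
client = amdinfer.WebSocketClient("ws://127.0.0.1:8998", "http://127.0.0.1:8998")

request = amdinfer.ImageInferenceRequest(np.zeros((8, 8, 3), np.uint8))  # placeholder image
num_requests = 4
for _ in range(num_requests):
    client.modelInferWs("mymodel", request)   # hypothetical loaded model

# one modelRecv() per submitted request; each reply is a JSON string
replies = [client.modelRecv() for _ in range(num_requests)]
client.close()
print(len(replies))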
- amdinfer.inferAsyncOrdered(client: amdinfer.Client, model: str, requests: List[amdinfer.InferenceRequest]) -> List[amdinfer.InferenceResponse]
- amdinfer.inferAsyncOrderedBatched(client: amdinfer.Client, model: str, requests: List[amdinfer.InferenceRequest], batch_sizes: int) -> List[amdinfer.InferenceResponse]
- amdinfer.inference_request_to_dict(request: InferenceRequest)
- amdinfer.loadEnsemble(client: amdinfer.Client, models: List[str], parameters: List[amdinfer.ParameterMap]) -> List[str]
- amdinfer.parallel_infer(client, model, data, processes)
Makes an inference to the server in parallel with n processes
- Parameters:
client (amdinfer.client) – client to make the inference with
model (str) – name of the model/worker to make the inference to
data (list[np.ndarray]) – list of data to send
processes (int) – number of processes to use
- Returns:
Responses for each request
- amdinfer.serverHasExtension(client: amdinfer.Client, extension: str) -> bool
- amdinfer.start_http_client_server(address: str, extension=None)
- amdinfer.stringToArray(data)
- amdinfer.unloadModels(client: amdinfer.Client, models: List[str]) -> None
- amdinfer.waitUntilModelNotReady(client: amdinfer.Client, model: str) -> None
- amdinfer.waitUntilModelReady(client: amdinfer.Client, model: str) -> None
- amdinfer.waitUntilServerReady(client: amdinfer.Client) -> None
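A sketch tying several of these helpers together; the address and model name are placeholders, and the model is assumed to be loaded already:
import numpy as np
import amdinfer

client = amdinfer.HttpClient("http://127.0.0.1:8998")   # placeholder address
amdinfer.waitUntilServerReady(client)

requests = [
    amdinfer.ImageInferenceRequest(np.zeros((8, 8, 3), np.uint8))  # placeholder images
    for _ in range(4)
]
# submit all requests and receive the responses back in the same order
responses = amdinfer.inferAsyncOrdered(client, "mymodel", requests)  # hypothetical model

amdinfer.unloadModels(client, ["mymodel"])
amdinfer.waitUntilModelNotReady(client, "mymodel")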