Python¶
The Python library for the AMD Inference Server lets you communicate with the server from Python.
Install the Python library¶
The Python library is built and installed in the development container as part of the regular CMake build.
To install it outside Docker or in other containers, you can use pip:
$ pip install amdinfer
Tip
Make sure the client library is compatible with the server by using matching versions. If you are using the latest server built from main, you may need to install a pre-release package of the Python library with pip install --pre amdinfer, if one exists.
Build wheels¶
You can build wheels for the Python library to make a precompiled package that can be installed on any Linux host, container, or environment. It is recommended to perform the following steps on a fresh clone of the inference server repository. These instructions assume you're only building wheels for x86_64 Linux with CPython.
# generate a Dockerfile that defines an image for building wheels
./docker/generate.py --cibuildwheel --base-image=quay.io/pypa/manylinux2014_x86_64 --base-image-type yum
# build the image. You should add some suffix to differentiate this image
# from the regular image
./amdinfer dockerize --suffix="-ci"
# this will build an image with the name $(whoami)/amdinfer-dev-ci:latest
# if you're not building wheels on the same host, you will need to upload
# this image to a Docker registry
# on a host where your image exists or can be pulled
export CIBW_MANYLINUX_X86_64_IMAGE=$(whoami)/amdinfer-dev-ci:latest
pip install cibuildwheel
# you can edit pyproject.toml to control which wheels to build or just use the defaults
cibuildwheel --platform linux
# your built wheels will be in ./wheelhouse
After following these instructions, your built wheels will be in ./wheelhouse/. The names of the wheels indicate the Python version they are compatible with. For example, cp37 in the name indicates that the wheel is compatible with CPython 3.7.
You can install these wheels in a virtual environment, a Conda environment, a container, or on a bare host:
pip install <path/to/wheel>
API¶
- exception amdinfer.BadStatus¶
- exception amdinfer.ConnectionError¶
- class amdinfer.DataType¶
- BOOL = DataType(BOOL) ¶
- FLOAT32 = DataType(FP32) ¶
- FLOAT64 = DataType(FP64) ¶
- FP16 = DataType(FP16) ¶
- FP32 = DataType(FP32) ¶
- FP64 = DataType(FP64) ¶
- INT16 = DataType(INT16) ¶
- INT32 = DataType(INT32) ¶
- INT64 = DataType(INT64) ¶
- INT8 = DataType(INT8) ¶
- STRING = DataType(STRING) ¶
- UINT16 = DataType(UINT16) ¶
- UINT32 = DataType(UINT32) ¶
- UINT64 = DataType(UINT64) ¶
- UINT8 = DataType(UINT8) ¶
- class Value¶
Members:
BOOL
UINT8
UINT16
UINT32
UINT64
INT8
INT16
INT32
INT64
FP16
FP32
FLOAT32
FP64
FLOAT64
STRING
- BOOL = <Value.BOOL: 0>¶
- FLOAT32 = <Value.FP32: 10>¶
- FLOAT64 = <Value.FP64: 11>¶
- FP16 = <Value.FP16: 9>¶
- FP32 = <Value.FP32: 10>¶
- FP64 = <Value.FP64: 11>¶
- INT16 = <Value.INT16: 6>¶
- INT32 = <Value.INT32: 7>¶
- INT64 = <Value.INT64: 8>¶
- INT8 = <Value.INT8: 5>¶
- STRING = <Value.STRING: 12>¶
- UINT16 = <Value.UINT16: 2>¶
- UINT32 = <Value.UINT32: 3>¶
- UINT64 = <Value.UINT64: 4>¶
- UINT8 = <Value.UINT8: 1>¶
- __init__(self: amdinfer.DataType.Value, value: int) None ¶
- property name¶
- property value¶
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(self: amdinfer._amdinfer.DataType) -> None
__init__(self: amdinfer._amdinfer.DataType, arg0: str) -> None
__init__(self: amdinfer._amdinfer.DataType, arg0: amdinfer._amdinfer.DataType.Value) -> None
- size(self: amdinfer.DataType) int ¶
- str(self: amdinfer.DataType) str ¶
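A minimal sketch of working with datatypes, using only the attributes and methods listed above (FLOAT32 and FLOAT64 are aliases for FP32 and FP64):
import amdinfer

dt = amdinfer.DataType.FP32
print(dt.str())    # printable name of the datatype
print(dt.size())   # size of one element of this datatype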
- exception amdinfer.EnvironmentNotSetError¶
- exception amdinfer.ExternalError¶
- exception amdinfer.FileNotFoundError¶
- exception amdinfer.FileReadError¶
- class amdinfer.GrpcClient¶
- __init__(self: amdinfer.GrpcClient, address: str) None ¶
Constructs a new GrpcClient object
- Parameter
address
: Address of the server to connect to
- hasHardware(self: amdinfer.GrpcClient, name: str, num: int) bool ¶
Checks if the server has the requested number of a specific hardware device
- Parameter
name
: name of the hardware device to check
- Parameter
num
: number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
- modelInfer(self: amdinfer.GrpcClient, model: str, request: amdinfer.InferenceRequest) amdinfer.InferenceResponse ¶
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter
model
: name of the model/worker to request inference to
- Parameter
request
: the request
- Returns
InferenceResponse
- modelList(self: amdinfer.GrpcClient) List[str] ¶
Gets a list of active models on the server, returning their names
- Returns
std::vector<std::string>
- modelLoad(self: amdinfer.GrpcClient, model: str, parameters: amdinfer.RequestParameters = None) None ¶
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name, containing the model and its metadata in the right format, already exists in the server's model repository directory.
- Parameter
model
: name of the model to load from the model repository directory
- Parameter
parameters
: load-time parameters for the worker supporting the model
- modelMetadata(self: amdinfer.GrpcClient, model: str) amdinfer.ModelMetadata ¶
Returns the metadata associated with a ready model/worker
- Parameter
model
: name of the model/worker to get metadata
- Returns
ModelMetadata
- modelReady(self: amdinfer.GrpcClient, model: str) bool ¶
Checks if a model/worker is ready
- Parameter
model
: name of the model to check
- Returns
bool - true if model is ready, false otherwise
- modelUnload(self: amdinfer.GrpcClient, model: str) None ¶
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter
model
: name of the model to unload
- serverLive(self: amdinfer.GrpcClient) bool ¶
Checks if the server is live
- Returns
bool - true if server is live, false otherwise
- serverMetadata(self: amdinfer.GrpcClient) amdinfer.ServerMetadata ¶
Returns the server metadata as a ServerMetadata object
- Returns
ServerMetadata
- serverReady(self: amdinfer.GrpcClient) bool ¶
Checks if the server is ready
- Returns
bool - true if server is ready, false otherwise
- workerLoad(self: amdinfer.GrpcClient, model: str, parameters: amdinfer.RequestParameters = None) str ¶
Loads a worker with the given name and load-time parameters.
- Parameter
worker
: name of the worker to load
- Parameter
parameters
: load-time parameters for the worker
- Returns
std::string
- workerUnload(self: amdinfer.GrpcClient, model: str) None ¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter
worker
: name of the worker to unload
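A minimal sketch of the typical client lifecycle using the methods above. The server address and the model name resnet50 are illustrative assumptions, and the model must already exist in the server's model repository:
import amdinfer

client = amdinfer.GrpcClient("127.0.0.1:50051")        # assumed gRPC address
if client.serverLive() and client.serverReady():
    client.modelLoad("resnet50")                       # load from the model repository
    amdinfer.waitUntilModelReady(client, "resnet50")
    print(client.modelList())                          # the loaded model appears here
    # response = client.modelInfer("resnet50", request)  # request built as shown below
    client.modelUnload("resnet50")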
- class amdinfer.HttpClient¶
- __init__(self: amdinfer.HttpClient, address: str, headers: Dict[str, str] = {}, parallelism: int = 32) None ¶
Construct a new HttpClient object
- Parameter
address
: Address of the server to connect to
- hasHardware(self: amdinfer.HttpClient, name: str, num: int) bool ¶
Checks if the server has the requested number of a specific hardware device
- Parameter
name
: name of the hardware device to check
- Parameter
num
: number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
- modelInfer(self: amdinfer.HttpClient, model: str, request: amdinfer.InferenceRequest) amdinfer.InferenceResponse ¶
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter
model
: name of the model/worker to request inference to
- Parameter
request
: the request
- Returns
InferenceResponse
- modelList(self: amdinfer.HttpClient) List[str] ¶
Gets a list of active models on the server, returning their names
- Returns
std::vector<std::string>
- modelLoad(self: amdinfer.HttpClient, model: str, parameters: amdinfer.RequestParameters = None) None ¶
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name, containing the model and its metadata in the right format, already exists in the server's model repository directory.
- Parameter
model
: name of the model to load from the model repository directory
- Parameter
parameters
: load-time parameters for the worker supporting the model
- modelMetadata(self: amdinfer.HttpClient, model: str) amdinfer.ModelMetadata ¶
Returns the metadata associated with a ready model/worker
- Parameter
model
: name of the model/worker to get metadata
- Returns
ModelMetadata
- modelReady(self: amdinfer.HttpClient, model: str) bool ¶
Checks if a model/worker is ready
- Parameter
model
: name of the model to check
- Returns
bool - true if model is ready, false otherwise
- modelUnload(self: amdinfer.HttpClient, model: str) None ¶
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter
model
: name of the model to unload
- serverLive(self: amdinfer.HttpClient) bool ¶
Checks if the server is live
- Returns
bool - true if server is live, false otherwise
- serverMetadata(self: amdinfer.HttpClient) amdinfer.ServerMetadata ¶
Returns the server metadata as a ServerMetadata object
- Returns
ServerMetadata
- serverReady(self: amdinfer.HttpClient) bool ¶
Checks if the server is ready
- Returns
bool - true if server is ready, false otherwise
- workerLoad(self: amdinfer.HttpClient, model: str, parameters: amdinfer.RequestParameters = None) str ¶
Loads a worker with the given name and load-time parameters.
- Parameter
worker
: name of the worker to load
- Parameter
parameters
: load-time parameters for the worker
- Returns
std::string
- workerUnload(self: amdinfer.HttpClient, model: str) None ¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter
worker
: name of the worker to unload
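HttpClient exposes the same interface as GrpcClient; only the constructor differs, accepting optional HTTP headers and a parallelism setting. A minimal sketch, where the address, header and parallelism values are illustrative assumptions:
import amdinfer

headers = {"Authorization": "Bearer <token>"}   # hypothetical header
client = amdinfer.HttpClient("http://127.0.0.1:8998", headers=headers, parallelism=8)
print(client.serverMetadata().name)
print(client.modelList())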
- amdinfer.ImageInferenceRequest(images, asTensor=True)¶
Construct a request from an image or list of images
- Parameters
images (image) – Images may be numpy arrays or filepaths or a list of these
asTensor (bool, optional) – Send data as a tensor or as base64-encoded string. Defaults to True.
- Raises
TypeError – Raised if an unknown image format is passed
- Returns
Request object
- Return type
InferenceRequest
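A minimal sketch of building requests from images; the array shape and the file path are illustrative assumptions:
import numpy as np
import amdinfer

image = np.zeros((224, 224, 3), dtype=np.uint8)                    # hypothetical image data
request = amdinfer.ImageInferenceRequest(image)                    # sent as a tensor
request_b64 = amdinfer.ImageInferenceRequest(["dog.jpg"], asTensor=False)  # sent base64-encoded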
- class amdinfer.InferenceRequest¶
- __init__(self: amdinfer.InferenceRequest) None ¶
- addInputTensor(self: amdinfer.InferenceRequest, input: amdinfer.InferenceRequestInput) None ¶
Constructs and adds a new input tensor to this request
- Parameter
data
: pointer to data to add
- Parameter
shape
: shape of the data
- Parameter
data_type
: the datatype of the data
- Parameter
name
: the name of the input tensor
- addOutputTensor(self: amdinfer.InferenceRequest, output: amdinfer.InferenceRequestOutput) None ¶
Adds a new output tensor to this request
- Parameter
output
: an existing InferenceRequestOutput object
- getInputSize(self: amdinfer.InferenceRequest) int ¶
Get the number of input request objects
- getInputs(self: amdinfer.InferenceRequest) List[amdinfer.InferenceRequestInput] ¶
Gets a vector of all the input request objects
- getOutputs(self: amdinfer.InferenceRequest) List[amdinfer.InferenceRequestOutput] ¶
Gets a vector of the requested output information
- property id¶
- property parameters¶
- class amdinfer.InferenceRequestInput¶
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(self: amdinfer._amdinfer.InferenceRequestInput) -> None
Holds an inference request’s input data
__init__(self: amdinfer._amdinfer.InferenceRequestInput, data: capsule, shape: List[int], dataType: amdinfer._amdinfer.DataType, name: str = '') -> None
- property datatype¶
- getFp16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[half_float::half] ¶
- getFp32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.float32] ¶
- getFp64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.float64] ¶
- getInt16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int16] ¶
- getInt32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int32] ¶
- getInt64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int64] ¶
- getInt8Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int8] ¶
- getSize(self: amdinfer.InferenceRequestInput) int ¶
Get the tensor’s size (number of elements)
- getStringData(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.int8] ¶
- getUint16Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint16] ¶
- getUint32Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint32] ¶
- getUint64Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint64] ¶
- getUint8Data(self: amdinfer.InferenceRequestInput) numpy.ndarray[numpy.uint8] ¶
- property name¶
- property parameters¶
- setFp16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[half_float::half]) None ¶
- setFp32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.float32]) None ¶
- setFp64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.float64]) None ¶
- setInt16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int16]) None ¶
- setInt32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int32]) None ¶
- setInt64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int64]) None ¶
- setInt8Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.int8]) None ¶
- setStringData(self: amdinfer.InferenceRequestInput, arg0: str) None ¶
- setUint16Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint16]) None ¶
- setUint32Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint32]) None ¶
- setUint64Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint64]) None ¶
- setUint8Data(self: amdinfer.InferenceRequestInput, arg0: numpy.ndarray[numpy.uint8]) None ¶
- property shape¶
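A minimal sketch of assembling a request by hand instead of using ImageInferenceRequest. It assumes the name, datatype and shape properties are writable and that the tensor name input0 is acceptable to the target model:
import numpy as np
import amdinfer

data = np.arange(6, dtype=np.float32)     # hypothetical input data

tensor = amdinfer.InferenceRequestInput()
tensor.name = "input0"                    # assumed tensor name
tensor.datatype = amdinfer.DataType.FP32
tensor.shape = [2, 3]
tensor.setFp32Data(data)

request = amdinfer.InferenceRequest()
request.addInputTensor(tensor)
print(request.getInputSize())             # 1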
- class amdinfer.InferenceRequestOutput¶
- __init__(self: amdinfer.InferenceRequestOutput) None ¶
Holds an inference request’s output data
- property data¶
- property name¶
- property parameters¶
- class amdinfer.InferenceResponse¶
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(self: amdinfer._amdinfer.InferenceResponse) -> None
Constructs a new InferenceResponse object
__init__(self: amdinfer._amdinfer.InferenceResponse, arg0: str) -> None
Constructs a new InferenceResponse error object
- addOutput(self: amdinfer.InferenceResponse, output: amdinfer.InferenceRequestInput) None ¶
Adds an output tensor to the response
- Parameter
output
: an output tensor
- getContext(self: amdinfer.InferenceResponse) Dict[str, str] ¶
- getError(self: amdinfer.InferenceResponse) str ¶
Gets the error message if it exists. Defaults to an empty string
- getOutputs(self: amdinfer.InferenceResponse) List[amdinfer.InferenceRequestInput] ¶
Gets a vector of the requested output information
- getParameters(self: amdinfer.InferenceResponse) amdinfer.RequestParameters ¶
Gets a pointer to the parameters associated with this response
- property id¶
- isError(self: amdinfer.InferenceResponse) bool ¶
Checks if this is an error response
- property model¶
- setContext(self: amdinfer.InferenceResponse, arg0: Dict[str, str]) None ¶
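A minimal sketch of consuming a response, assuming a client, a loaded model named resnet50 and a request built as shown above:
response = client.modelInfer("resnet50", request)
if response.isError():
    print("inference failed:", response.getError())
else:
    for output in response.getOutputs():   # outputs are InferenceRequestInput objects
        print(output.name, output.shape, output.getFp32Data())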
- exception amdinfer.InvalidArgumentError¶
- class amdinfer.ModelMetadata¶
- __init__(self: amdinfer.ModelMetadata, arg0: str, arg1: str) None ¶
Constructs a new Model Metadata object
- Parameter
name
: Name of the model
- Parameter
platform
: the platform this model runs on
- addInputTensor(self: amdinfer.ModelMetadata, arg0: str, arg1: amdinfer.DataType, arg2: List[int]) None ¶
Adds an input tensor to this model
- Parameter
name
: name of the tensor
- Parameter
datatype
: datatype of the tensor
- Parameter
shape
: shape of the tensor
- addOutputTensor(self: amdinfer.ModelMetadata, name: str, datatype: amdinfer.DataType, shape: List[int]) None ¶
Adds an output tensor to this model
- Parameter
name
: name of the tensor
- Parameter
datatype
: datatype of the tensor
- Parameter
shape
: shape of the tensor
- getPlatform(self: amdinfer.ModelMetadata) str ¶
- isReady(self: amdinfer.ModelMetadata) bool ¶
Checks if this model is ready
- property name¶
- setReady(self: amdinfer.ModelMetadata, arg0: bool) None ¶
Marks this model as ready/not ready
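A minimal sketch of inspecting metadata from the client side, assuming a client and a loaded model named resnet50:
metadata = client.modelMetadata("resnet50")
print(metadata.name, metadata.getPlatform(), metadata.isReady())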
- class amdinfer.ModelMetadataTensor¶
- __init__(self: amdinfer.ModelMetadataTensor, arg0: str, arg1: amdinfer.DataType, arg2: List[int]) None ¶
Construct a new Model Metadata Tensor object
- Parameter
name
: name of the tensor
- Parameter
datatype
: the datatype this tensor accepts
- Parameter
shape
: the expected shape of the data
- getDataType(self: amdinfer.ModelMetadataTensor) amdinfer.DataType ¶
Gets the datatype that this tensor accepts
- getName(self: amdinfer.ModelMetadataTensor) str ¶
Gets the name of the tensor
- getShape(self: amdinfer.ModelMetadataTensor) List[int] ¶
Gets the expected shape of the data
- class amdinfer.NativeClient¶
- __init__(self: amdinfer.NativeClient) None ¶
- hasHardware(self: amdinfer.NativeClient, name: str, num: int) bool ¶
Checks if the server has the requested number of a specific hardware device
- Parameter
name
: name of the hardware device to check
- Parameter
num
: number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
- modelInfer(self: amdinfer.NativeClient, model: str, request: amdinfer.InferenceRequest) amdinfer.InferenceResponse ¶
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter
model
: name of the model/worker to request inference to
- Parameter
request
: the request
- Returns
InferenceResponse
- modelList(self: amdinfer.NativeClient) List[str] ¶
Gets a list of active models on the server, returning their names
- Returns
std::vector<std::string>
- modelLoad(self: amdinfer.NativeClient, model: str, parameters: amdinfer.RequestParameters = None) None ¶
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name, containing the model and its metadata in the right format, already exists in the server's model repository directory.
- Parameter
model
: name of the model to load from the model repository directory
- Parameter
parameters
: load-time parameters for the worker supporting the model
- modelMetadata(self: amdinfer.NativeClient, model: str) amdinfer.ModelMetadata ¶
Returns the metadata associated with a ready model/worker
- Parameter
model
: name of the model/worker to get metadata
- Returns
ModelMetadata
- modelReady(self: amdinfer.NativeClient, model: str) bool ¶
Checks if a model/worker is ready
- Parameter
model
: name of the model to check
- Returns
bool - true if model is ready, false otherwise
- modelUnload(self: amdinfer.NativeClient, model: str) None ¶
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter
model
: name of the model to unload
- serverLive(self: amdinfer.NativeClient) bool ¶
Checks if the server is live
- Returns
bool - true if server is live, false otherwise
- serverMetadata(self: amdinfer.NativeClient) amdinfer.ServerMetadata ¶
Returns the server metadata as a ServerMetadata object
- Returns
ServerMetadata
- serverReady(self: amdinfer.NativeClient) bool ¶
Checks if the server is ready
- Returns
bool - true if server is ready, false otherwise
- workerLoad(self: amdinfer.NativeClient, model: str, parameters: amdinfer.RequestParameters = None) str ¶
Loads a worker with the given name and load-time parameters.
- Parameter
worker
: name of the worker to load
- Parameter
parameters
: load-time parameters for the worker
- Returns
std::string
- workerUnload(self: amdinfer.NativeClient, model: str) None ¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter
worker
: name of the worker to unload
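NativeClient talks to a Server object running in the same process, so no HTTP or gRPC endpoint is needed. A minimal sketch, assuming that constructing the Server object is sufficient for in-process requests and that an echo worker is available:
import amdinfer

server = amdinfer.Server()               # in-process server
client = amdinfer.NativeClient()
amdinfer.waitUntilServerReady(client)
endpoint = client.workerLoad("echo")     # returned string is assumed usable as the model name
print(client.modelReady(endpoint))
client.workerUnload(endpoint)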
- class amdinfer.RequestParameters¶
- __init__(self: amdinfer.RequestParameters) None ¶
Holds any parameters from JSON (defined by KServe spec as one of bool, number or string). We further restrict numbers to be doubles or int32.
- empty(self: amdinfer.RequestParameters) bool ¶
Checks if the parameters are empty
- erase(self: amdinfer.RequestParameters, arg0: str) None ¶
Removes a parameter
- Parameter
key
: name of the parameter to remove
- getBool(self: amdinfer.RequestParameters, arg0: str) bool ¶
Gets a pointer to the named parameter. Returns nullptr if not found or if a bad type is used.
- Template parameter
T
: type of parameter. Must be (bool|double|int32_t|std::string)
- Parameter
key
: parameter to get
- Returns
T*
- getFloat(self: amdinfer.RequestParameters, arg0: str) float ¶
Gets a pointer to the named parameter. Returns nullptr if not found or if a bad type is used.
- Template parameter
T
: type of parameter. Must be (bool|double|int32_t|std::string)
- Parameter
key
: parameter to get
- Returns
T*
- getInt(self: amdinfer.RequestParameters, arg0: str) int ¶
Gets a pointer to the named parameter. Returns nullptr if not found or if a bad type is used.
- Template parameter
T
: type of parameter. Must be (bool|double|int32_t|std::string)
- Parameter
key
: parameter to get
- Returns
T*
- getString(self: amdinfer.RequestParameters, arg0: str) str ¶
Gets a pointer to the named parameter. Returns nullptr if not found or if a bad type is used.
- Template parameter
T
: type of parameter. Must be (bool|double|int32_t|std::string)
- Parameter
key
: parameter to get
- Returns
T*
- has(self: amdinfer.RequestParameters, key: str) bool ¶
Checks if a particular parameter exists
- Parameter
key
: name of the parameter to check
- Returns
bool
- put(*args, **kwargs)¶
Overloaded function.
put(self: amdinfer._amdinfer.RequestParameters, arg0: str, arg1: bool) -> None
Puts in a key-value pair
- Parameter
key
: key used to store and retrieve the value
- Parameter
value
: value to store
put(self: amdinfer._amdinfer.RequestParameters, arg0: str, arg1: float) -> None
Puts in a key-value pair
- Parameter
key
: key used to store and retrieve the value
- Parameter
value
: value to store
put(self: amdinfer._amdinfer.RequestParameters, arg0: str, arg1: int) -> None
Puts in a key-value pair
- Parameter
key
: key used to store and retrieve the value
- Parameter
value
: value to store
put(self: amdinfer._amdinfer.RequestParameters, arg0: str, arg1: str) -> None
Puts in a key-value pair
- Parameter
key
: key used to store and retrieve the value
- Parameter
value
: value to store
- size(self: amdinfer.RequestParameters) int ¶
Gets the number of parameters
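A minimal sketch of building and querying parameters; the key names are illustrative assumptions, not parameters that any particular worker requires:
import amdinfer

parameters = amdinfer.RequestParameters()
parameters.put("timeout", 1000)              # int
parameters.put("scale", 0.5)                 # float
parameters.put("share", False)               # bool
parameters.put("model", "/tmp/model.onnx")   # string (hypothetical path)

print(parameters.size(), parameters.empty())
if parameters.has("timeout"):
    print(parameters.getInt("timeout"))
print(parameters.getFloat("scale"), parameters.getBool("share"), parameters.getString("model"))
parameters.erase("scale")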
- exception amdinfer.RuntimeError¶
- class amdinfer.Server¶
- __init__(self: amdinfer.Server) None ¶
Constructs a new Server object
- static enableRepositoryMonitoring(use_polling: bool) None ¶
Turn on active monitoring of the model repository path for new files
- Parameter
use_polling
: set to true to use polling to check the directory for new files, false to use events. Note that events may not work well on all platforms.
- static setModelRepository(path: os.PathLike) None ¶
Set the path to the model repository associated with this server
- Parameter
path
: path to the model repository
- startGrpc(self: amdinfer.Server, port: int) None ¶
Start the gRPC server
- Parameter
port
: port to use for the gRPC server
- startHttp(self: amdinfer.Server, port: int) None ¶
Start the HTTP server
- Parameter
port
: port to use for the HTTP server
- stopGrpc(self: amdinfer.Server) None ¶
Stop the gRPC server
- stopHttp(self: amdinfer.Server) None ¶
Stop the HTTP server
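A minimal sketch of standing up a server; the repository path and port numbers are illustrative assumptions:
import amdinfer

server = amdinfer.Server()
amdinfer.Server.setModelRepository("/workspace/model_repository")  # hypothetical path
amdinfer.Server.enableRepositoryMonitoring(True)   # use polling to watch for new models
server.startHttp(8998)    # assumed HTTP port
server.startGrpc(50051)   # assumed gRPC port
# ... serve clients ...
server.stopGrpc()
server.stopHttp()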
- class amdinfer.ServerMetadata¶
- __init__(self: amdinfer.ServerMetadata) None ¶
- property extensions¶
The extensions supported by the server. The KServe specification allows servers to support custom extensions and return them with a metadata request.
- property name¶
Name of the server
- property version¶
Version of the server
- class amdinfer.WebSocketClient¶
- __init__(self: amdinfer.WebSocketClient, ws_address: str, http_address: str) None ¶
Constructs a new WebSocketClient object
- Parameter
ws_address
: address of the websocket server to connect to
- Parameter
http_address
: address of the HTTP server to connect to
- close(self: amdinfer.WebSocketClient) None ¶
Closes the websocket connection
- hasHardware(self: amdinfer.WebSocketClient, name: str, num: int) bool ¶
Checks if the server has the requested number of a specific hardware device
- Parameter
name
: name of the hardware device to check
- Parameter
num
: number of the device that should exist at minimum
- Returns
bool - true if server has at least the requested number of the hardware device, false otherwise
- modelInfer(self: amdinfer.WebSocketClient, model: str, request: amdinfer.InferenceRequest) amdinfer.InferenceResponse ¶
Makes a synchronous inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for.
- Parameter
model
: name of the model/worker to request inference to
- Parameter
request
: the request
- Returns
InferenceResponse
- modelInferWs(self: amdinfer.WebSocketClient, model: str, request: amdinfer.InferenceRequest) None ¶
Makes a websocket inference request to the given model/worker. The contents of the request depend on the model/worker that the request is for. This method differs from the standard inference in that it submits an actual websocket message. The user should use modelRecv to get results and must disambiguate different responses on the client side using the IDs of the responses.
- Parameter
model
: name of the model/worker to request inference to
- Parameter
request
: the request
- modelList(self: amdinfer.WebSocketClient) List[str] ¶
Gets a list of active models on the server, returning their names
- Returns
std::vector<std::string>
- modelLoad(self: amdinfer.WebSocketClient, model: str, parameters: amdinfer.RequestParameters = None) None ¶
Loads a model with the given name and load-time parameters. This method assumes that a directory with this model name, containing the model and its metadata in the right format, already exists in the server's model repository directory.
- Parameter
model
: name of the model to load from the model repository directory
- Parameter
parameters
: load-time parameters for the worker supporting the model
- modelMetadata(self: amdinfer.WebSocketClient, model: str) amdinfer.ModelMetadata ¶
Returns the metadata associated with a ready model/worker
- Parameter
model
: name of the model/worker to get metadata
- Returns
ModelMetadata
- modelReady(self: amdinfer.WebSocketClient, model: str) bool ¶
Checks if a model/worker is ready
- Parameter
model
: name of the model to check
- Returns
bool - true if model is ready, false otherwise
- modelRecv(self: amdinfer.WebSocketClient) str ¶
Gets one message from the websocket server sent in response to a modelInferWs request. The user should know beforehand how many messages are expected and should call this method the same number of times.
- Returns
std::string a JSON object encoded as a string
- modelUnload(self: amdinfer.WebSocketClient, model: str) None ¶
Unloads a previously loaded model and shuts it down. This is identical in functionality to workerUnload and is provided for symmetry.
- Parameter
model
: name of the model to unload
- serverLive(self: amdinfer.WebSocketClient) bool ¶
Checks if the server is live
- Returns
bool - true if server is live, false otherwise
- serverMetadata(self: amdinfer.WebSocketClient) amdinfer.ServerMetadata ¶
Returns the server metadata as a ServerMetadata object
- Returns
ServerMetadata
- serverReady(self: amdinfer.WebSocketClient) bool ¶
Checks if the server is ready
- Returns
bool - true if server is ready, false otherwise
- workerLoad(self: amdinfer.WebSocketClient, model: str, parameters: amdinfer.RequestParameters = None) str ¶
Loads a worker with the given name and load-time parameters.
- Parameter
worker
: name of the worker to load
- Parameter
parameters
: load-time parameters for the worker
- Returns
std::string
- workerUnload(self: amdinfer.WebSocketClient, model: str) None ¶
Unloads a previously loaded worker and shuts it down. This is identical in functionality to modelUnload and is provided for symmetry.
- Parameter
worker
: name of the worker to unload
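A minimal sketch of the websocket flow; the addresses and model name are illustrative assumptions, and request is built as shown earlier:
import amdinfer

client = amdinfer.WebSocketClient("ws://127.0.0.1:8998", "http://127.0.0.1:8998")
client.modelInferWs("resnet50", request)   # submit over the websocket
reply = client.modelRecv()                 # one JSON string per modelInferWs call
print(reply)
client.close()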
- amdinfer.inferAsyncOrdered(client: amdinfer.Client, model: str, requests: List[amdinfer.InferenceRequest]) List[amdinfer.InferenceResponse] ¶
- amdinfer.inferAsyncOrderedBatched(client: amdinfer.Client, model: str, requests: List[amdinfer.InferenceRequest], batch_sizes: int) List[amdinfer.InferenceResponse] ¶
- amdinfer.inference_request_to_dict(request: amdinfer.InferenceRequest)¶
- amdinfer.parallel_infer(client, model, data, processes)¶
Make an inference to the server in parallel with n processes
- Parameters
client (amdinfer.client) – Client to make the inference with
model (str) – Name of the model/worker to make the inference
data (list[np.ndarray]) – List of data to send
processes (int) – number of processes to use
- Returns
Responses for each request
- Return type
- amdinfer.serverHasExtension(client: amdinfer.Client, extension: str) bool ¶
- amdinfer.start_http_client_server(address: str, extension=None)¶
- amdinfer.waitUntilModelReady(client: amdinfer.Client, model: str) None ¶
- amdinfer.waitUntilServerReady(client: amdinfer.Client) None ¶
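A minimal sketch tying the helper functions together; client, the model name resnet50, the extension name and the list of images are all illustrative assumptions:
import amdinfer

amdinfer.waitUntilServerReady(client)
amdinfer.waitUntilModelReady(client, "resnet50")

requests = [amdinfer.ImageInferenceRequest(image) for image in images]   # hypothetical images
responses = amdinfer.inferAsyncOrdered(client, "resnet50", requests)
print(amdinfer.serverHasExtension(client, "vitis"))   # hypothetical extension name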