Model Repository¶
A model repository is a directory on the host machine where the server container runs. It holds the models you want to serve, along with their associated metadata, in a standard structure.
Single models¶
The directory structure for the model repository for a single model is:
/
├─ model_a/
│ ├─ 1/
│ │ ├─ <model>
│ ├─ <config>
├─ model_b/
│ ...
The model name, model_a in this template, must be unique among the models loaded on a particular server.
This name also defines the endpoint used to make inference requests.
Under this directory, there must be a directory named 1/ containing the model file itself, with a TOML file describing the configuration alongside it.
The model file can have an arbitrary name, and its file extension depends on the type of the model.
The configuration file, <config> in this template, can have any name, though config.toml is suggested and is used throughout this documentation.
For single models, you can also use files in .pbtxt format.
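For example, assuming the server exposes a KServe-style v2 REST API (the exact path shown here is illustrative; check your server's API documentation), a model in a directory named model_a/ would receive inference requests at an endpoint like:
POST /v2/models/model_a/infer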
The configuration file contains metadata for the model.
Consider this example of an MNIST TensorFlow model:
name = "mnist"
platform = "tensorflow_graphdef"

[[inputs]]
name = "images_in"
datatype = "FP32"
shape = [28, 28, 1]

[[outputs]]
name = "flatten/Reshape"
datatype = "FP32"
shape = [10]
The name must match the name of the model directory (model_a in the template above), so this model's directory must be named mnist/.
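Putting this together, the repository for this model would look like the following, where the model filename saved_model.pb is illustrative (the name is arbitrary, but the extension is determined by the platform):
/
├─ mnist/
│ ├─ 1/
│ │ ├─ saved_model.pb
│ ├─ config.toml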
The platform identifies the type of the model and determines the file extension of the model file.
The supported platforms are:

Platform | Model file extension
---|---
tensorflow_graphdef | .pb
pytorch_torchscript | .pt
onnx_onnxv1 | .onnx
migraphx_onnx | .onnx
vitis_xmodel | .xmodel
amdinfer_cpp | .so
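As a sketch, a configuration for a model on a different platform changes only the name, platform, and tensor metadata. This hypothetical Vitis AI model configuration (the tensor names, datatypes, and shapes are placeholders for illustration) would pair with a model file ending in .xmodel:
name = "resnet50"
platform = "vitis_xmodel"

[[inputs]]
name = "input_0"
datatype = "INT8"
shape = [224, 224, 3]

[[outputs]]
name = "output_0"
datatype = "INT8"
shape = [1000]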
The inputs and outputs define the list of input and output tensors for the model. The names of the tensors may be significant if the platform needs them to perform inference.
The equivalent configuration file in .pbtxt format would be:
name: "mnist"
platform: "tensorflow_graphdef"
inputs [
  {
    name: "images_in"
    datatype: "FP32"
    shape: [28, 28, 1]
  }
]
outputs [
  {
    name: "flatten/Reshape"
    datatype: "FP32"
    shape: [10]
  }
]
The inference server accepts configuration files in this format, but note that TOML files take priority if both are present, and the .pbtxt format does not support defining ensembles.
Ensembles¶
With ensembles, the “model” actually consists of a set of models. The directory structure for the model repository for ensembles is:
/
├─ model_a/
│ ├─ 1/
│ │ ├─ <model_0>
│ │ ├─ <model_1>
│ │ ├─ ...
│ ├─ <config>
├─ model_b/
│ ...
As a concrete example, consider a three-stage ensemble:
- cplusplus backend executing a base64_decode model: receive a base64-encoded JPEG image and decode it to an RGB array
- cplusplus backend executing an invert_image model: invert every pixel in the input RGB image
- cplusplus backend executing a base64_encode model: convert the input RGB image to JPEG, base64-encode it, and send it back to the client
This ensemble uses the cplusplus backend for each stage, with a different model at each one.
The configuration file for this ensemble could be:
[[models]]
name = "invert_image"
platform = "amdinfer_cpp"
id = "base64_decode.so"

[[models.inputs]]
name = "image_in"
datatype = "STRING"
shape = [1048576]
id = ""

[[models.outputs]]
name = "image_out"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "preprocessed_image"

[[models]]
name = "execute"
platform = "amdinfer_cpp"
id = "invert_image.so"

[[models.inputs]]
name = "image_in"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "preprocessed_image"

[[models.outputs]]
name = "image_out"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "inverted_image"

[[models]]
name = "invert_image_postprocess"
platform = "amdinfer_cpp"
id = "base64_encode.so"

[[models.inputs]]
name = "image_in"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "inverted_image"

[[models.outputs]]
name = "image_out"
datatype = "STRING"
shape = [1048576]
id = ""
This single configuration file lists multiple models, reflecting the multiple model files in the repository.
Each model in the ensemble is marked with [[models]] and has a name and platform, just like single models.
As before, the name defines the endpoint.
For chains, the first model's name should match the name of the parent directory because this is the endpoint used to send requests to the whole ensemble.
Each model also has an id field that should be the name of the model file corresponding to this stage of the ensemble, since the 1/ directory contains multiple model files.
You can define one or more input or output tensors for each model using multiple [[models.inputs]] or [[models.outputs]] tables, respectively.
As in the single-model case, each input and output tensor has a name, datatype, and shape with the same meanings.
In the ensemble case, each tensor also has an ID.
For output tensors, the ID is a unique string labeling the tensor.
The ID of an input tensor should match the ID of the output tensor that feeds it.
An input tensor with an empty ID receives its data from the external client; similarly, an output tensor with an empty ID sends its data to the external client.
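Tracing the IDs through the example above, data flows through the ensemble as follows:
client (base64-encoded JPEG)
─▶ invert_image (base64_decode.so), output ID "preprocessed_image"
─▶ execute (invert_image.so), output ID "inverted_image"
─▶ invert_image_postprocess (base64_encode.so), output ID "" (returned to the client)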
The model repository for this example using the above configuration file would be:
/
├─ invert_image/
│ ├─ 1/
│ │ ├─ base64_decode.so
│ │ ├─ base64_encode.so
│ │ ├─ invert_image.so
│ ├─ config.toml