Model Repository¶
A model repository is a directory on the host machine where the server container is running. It holds the models you want to serve, along with their associated metadata, in a standard structure.
Single models¶
The directory structure for the model repository for a single model is:
/
├─ model_a/
│ ├─ 1/
│ │ ├─ <model>
│ ├─ <config>
├─ model_b/
│ ...
The model name, model_a in this template, must be unique among the models loaded on a particular server. This name also defines the endpoint used to make inference requests. Under this directory, there must be a directory named 1/ containing the model file itself, as well as a TOML configuration file next to it. The model file can have an arbitrary name, and its file extension depends on the type of the model. The configuration file, <config> in this template, can have any name, though config.toml is suggested and is used throughout this documentation. You can also use a .pbtxt file for a single model's configuration.
The configuration file contains metadata for the model.
Consider this example of an MNIST TensorFlow model:
name = "mnist"
platform = "tensorflow_graphdef"
[[inputs]]
name = "images_in"
datatype = "FP32"
shape = [28, 28, 1]
[[outputs]]
name = "flatten/Reshape"
datatype = "FP32"
shape = [10]
The name must match the name of the model directory, i.e. model_a in the template above.
The platform identifies the type of the model and determines the file extension of the model file.
The supported platforms are:
Platform | Model file extension
---|---
tensorflow_graphdef | .pb
amdinfer_cpp | .so
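For example, a repository containing only the MNIST model above might look like the tree below. The file name mnist.pb is illustrative; the model file can have any name, but its .pb extension follows from the tensorflow_graphdef platform:
/
├─ mnist/
│ ├─ 1/
│ │ ├─ mnist.pb
│ ├─ config.toml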
The inputs and outputs define the list of input and output tensors for the model. The names of the tensors may be significant if the platform needs them to perform inference.
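Because the configuration is plain TOML, external tooling can read it directly. As a minimal sketch, assuming Python 3.11+ (for the standard-library tomllib module) and the mnist/config.toml path from the hypothetical layout above:
import tomllib  # standard library in Python 3.11+

with open("mnist/config.toml", "rb") as f:
    config = tomllib.load(f)

print(config["name"], config["platform"])  # mnist tensorflow_graphdef

# [[inputs]] and [[outputs]] parse as lists of tables (dicts)
for tensor in config["inputs"]:
    print("input:", tensor["name"], tensor["datatype"], tensor["shape"])
for tensor in config["outputs"]:
    print("output:", tensor["name"], tensor["datatype"], tensor["shape"])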
The equivalent configuration file in .pbtxt format would be:
name: "mnist"
platform: "tensorflow_graphdef"
inputs [
{
name: "images_in"
datatype: "FP32"
shape: [28, 28, 1]
}
]
outputs [
{
name: "flatten/Reshape"
datatype: "FP32"
shape: [10]
}
]
The inference server will accept a configuration file in this format, but note two caveats: TOML files take priority if both are present, and this format does not support defining ensembles.
Ensembles¶
With ensembles, the “model” actually consists of a set of models. The directory structure for the model repository for ensembles is:
/
├─ model_a/
│ ├─ 1/
│ │ ├─ <model_0>
│ │ ├─ <model_1>
│ │ ├─ ...
│ ├─ <config>
├─ model_b/
│ ...
As a concrete example, consider a three-stage ensemble:
- the cplusplus backend executing a base64_decode model: receive a base64-encoded JPEG image and decode it to an RGB array
- the cplusplus backend executing an invert_image model: invert every pixel in the input RGB image
- the cplusplus backend executing a base64_encode model: convert the input RGB image to JPEG, base64-encode it, and send it back to the client
This ensemble uses the cplusplus backend for each stage, with a different model at each one.
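Conceptually, the three stages perform the following transformations. This is a minimal Python sketch using Pillow; the library and function signatures here are illustrative assumptions, since the real stages are C++ models run by the cplusplus backend:
import base64
import io

from PIL import Image, ImageOps  # illustrative only

def base64_decode(b64_jpeg: bytes) -> Image.Image:
    """Stage 1: base64-encoded JPEG -> RGB image."""
    return Image.open(io.BytesIO(base64.b64decode(b64_jpeg))).convert("RGB")

def invert_image(img: Image.Image) -> Image.Image:
    """Stage 2: invert every pixel in the RGB image."""
    return ImageOps.invert(img)

def base64_encode(img: Image.Image) -> bytes:
    """Stage 3: RGB image -> JPEG, base64-encoded."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue())

# The ensemble chains the stages: each output feeds the next input.
# response = base64_encode(invert_image(base64_decode(request)))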
The configuration file for this ensemble could be:
[[models]]
name = "invert_image"
platform = "amdinfer_cpp"
id = "base64_decode.so"

[[models.inputs]]
name = "image_in"
datatype = "BYTES"
shape = [1048576]
id = ""

[[models.outputs]]
name = "image_out"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "preprocessed_image"

[[models]]
name = "execute"
platform = "amdinfer_cpp"
id = "invert_image.so"

[[models.inputs]]
name = "image_in"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "preprocessed_image"

[[models.outputs]]
name = "image_out"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "inverted_image"

[[models]]
name = "invert_image_postprocess"
platform = "amdinfer_cpp"
id = "base64_encode.so"

[[models.inputs]]
name = "image_in"
datatype = "INT8"
shape = [1080, 1920, 3]
id = "inverted_image"

[[models.outputs]]
name = "image_out"
datatype = "BYTES"
shape = [1048576]
id = ""
This single configuration file lists multiple models, reflecting the multiple model files in the repository. Each model in the ensemble is marked with [[models]] and has a name and platform, just like single models. As before, the name defines the endpoint. For chains, the first model's name should match the name of the parent directory because this is the endpoint used to send requests to the whole ensemble. Each model also has an id field that should be the name of the model file corresponding to this stage of the ensemble, since the 1/ directory contains multiple model files. You can define one or more input or output tensors for each model using multiple [[models.inputs]] or [[models.outputs]] tables, respectively.
As in the single-model case, each input/output tensor has a name, datatype, and shape with the same meanings. In the ensemble case, each tensor also has an ID. For output tensors, the ID is a unique string labeling that tensor. The ID of an input tensor should match the ID of the output tensor that feeds it. Input tensors with an empty ID receive their data from the external client. Similarly, output tensors with an empty ID send their data back to the external client.
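These chaining rules can be checked mechanically before deploying a configuration. A minimal sketch, assuming Python 3.11+ and the configuration file above; this helper is not part of the server itself:
import tomllib  # Python 3.11+

def check_ensemble_chain(path: str) -> None:
    """Verify every non-empty input tensor ID is produced by an
    output tensor of an earlier stage in the ensemble."""
    with open(path, "rb") as f:
        config = tomllib.load(f)

    produced: set[str] = set()
    for model in config["models"]:
        for tensor in model.get("inputs", []):
            tensor_id = tensor["id"]
            # An empty ID means the data comes from the external client.
            if tensor_id and tensor_id not in produced:
                raise ValueError(
                    f"{model['name']}: input '{tensor['name']}' expects "
                    f"ID '{tensor_id}' but no earlier stage produces it"
                )
        for tensor in model.get("outputs", []):
            if tensor["id"]:
                produced.add(tensor["id"])

check_ensemble_chain("invert_image/config.toml")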
The model repository for this example using the above configuration file would be:
/
├─ invert_image/
│ ├─ 1/
│ │ ├─ base64_decode.so
│ │ ├─ base64_encode.so
│ │ ├─ invert_image.so
│ ├─ config.toml