Quickstart - Deployment

This quickstart is intended for a user who is deploying the inference server and configuring the available models for inference. There are multiple ways to deploy the server, but this quickstart covers only one Docker-based deployment. Once the server is up, you and any clients can make requests to it.

Prerequisites

  • Docker

  • Network access to enable clients to access the server

  • Sufficient disk space to host your models

Prepare the model repository

As the server administrator, you need to identify which models you want to make available for inference and organize them into a model repository. The model repository is a directory on the host machine where the server is running; it holds your models and their associated metadata. The format of the directory is as follows:

/
├─ model_a/
│  ├─ 1/
│  │  ├─ saved_model.x
│  ├─ config.pbtxt
├─ model_b/
│  ...

The model name, model_a in this template, must be unique among the models loaded on a particular server. This name defines the endpoint that clients use to make inference requests to the model. Under the model's directory, there must be a directory named 1/ containing the model file itself, alongside a text file named config.pbtxt that holds the model's metadata. The model file must be named saved_model, and its file extension depends on the type of the model.

As an example, consider a deployment of a single ResNet50 model.

$ wget -O tensorflow.zip https://www.xilinx.com/bin/public/openDownload?filename=tf_resnetv1_50_imagenet_224_224_6.97G_2.5.zip
$ unzip -j "tensorflow.zip" "tf_resnetv1_50_imagenet_224_224_6.97G_2.5/float/resnet_v1_50_baseline_6.96B_922.pb" -d .
$ mkdir -p /tmp/model_repository/resnet50/1
$ mv ./resnet_v1_50_baseline_6.96B_922.pb /tmp/model_repository/resnet50/1/saved_model.pb
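At this point, the repository contains only the model file; the config.pbtxt is added in the next step. Assuming /tmp/model_repository started out empty, you can verify the layout with a standard directory listing, for example:

$ find /tmp/model_repository -type f
/tmp/model_repository/resnet50/1/saved_model.pb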

For the model used here, the corresponding config.pbtxt should be placed in the model's directory in the chosen model repository (/tmp/model_repository/resnet50/):

name: "resnet50"
platform: "tensorflow_graphdef"
inputs [
    {
        name: "input"
        datatype: "FP32"
        shape: [224,224,3]
    }
]
outputs [
    {
        name: "resnet_v1_50/predictions/Reshape_1"
        datatype: "FP32"
        shape: [1000]
    }
]

The name must match the name of the model directory. The platform identifies the type of the model and determines the file extension of the model file. The supported platforms are:

Platform              Model file extension
-------------------   --------------------
tensorflow_graphdef   .pb
pytorch_torchscript   .pt
vitis_xmodel          .xmodel
onnx_onnxv1           .onnx

The inputs and outputs sections define the model's input and output tensors. The tensor names may be significant: some platforms need them to identify the corresponding tensors in the model when performing inference.
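To illustrate how the platform, file extension, and tensor lists fit together, a hypothetical TorchScript model would store its model file as 1/saved_model.pt and could use a config.pbtxt along these lines (the model name, tensor names, and shapes below are placeholders, not values from a real model):

name: "my_torchscript_model"
platform: "pytorch_torchscript"
inputs [
    {
        name: "input"
        datatype: "FP32"
        shape: [3,224,224]
    }
]
outputs [
    {
        name: "output"
        datatype: "FP32"
        shape: [1000]
    }
]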

Get the deployment image

The deployment image is optimized for size and only contains the runtime dependencies of the server to allow for quicker deployments. It has limited debugging capabilities, and it contains a precompiled server executable that starts automatically when the container starts. You can pull the deployment image with Docker if one has been published, or build it yourself.

$ docker pull amdih/serve:uif1.1_zendnn_amdinfer_0.3.0

Start the image

You can start a container from the deployment image with Docker as you would any other image:

$ docker run -d --volume /path/to/model/repository:/mnt/models:rw --publish 127.0.0.1::8998 --publish 127.0.0.1::50051 <image> amdinfer-server --enable-repository-watcher

Note

This command is provided as an example. Depending on your particular device(s) or desired container configuration, you may need to add or remove flags.
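For example, if you would rather bind the container's ports to fixed host ports instead of letting Docker pick free ones, and use the repository created earlier, a variant of the command above might look like this (the fixed host ports 8998 and 50051 are arbitrary choices):

$ docker run -d --volume /tmp/model_repository:/mnt/models:rw \
    --publish 127.0.0.1:8998:8998 --publish 127.0.0.1:50051:50051 \
    <image> amdinfer-server --enable-repository-watcher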

As the container starts, it starts the server and loads the models from your model repository, which is mounted at /mnt/models in the container. The explicit command provided to the image, amdinfer-server --enable-repository-watcher, starts the server executable in the container and tells it to use the repository at its default location of /mnt/models.

The --publish flags map ports 8998 and 50051 in the container to arbitrary free ports on the host machine for HTTP and gRPC requests, respectively. You can use docker ps to show the running containers and which host ports they are using. Your clients will need these port numbers to make requests to the server.

The endpoint for each model is the model's name from its config.pbtxt, which should match the name of its parent directory in the model repository. In this example, it would be “resnet50”. Once the container is started, you and any clients can make requests to it.
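For example, to find the host port Docker assigned for HTTP and check that the server is responding, you could run something like the following. The container ID and the assigned port 32768 shown here are placeholders, and the /v2/health/ready path assumes the server exposes a KServe-style v2 REST API; consult the server documentation for the exact endpoints of your version.

$ docker ps --format '{{.ID}}  {{.Image}}  {{.Ports}}'
$ docker port <container_id> 8998
127.0.0.1:32768
$ curl http://127.0.0.1:32768/v2/health/ready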