Running ResNet50 - Python

This page walks you through the Python versions of the ResNet50 examples. These examples and scripts are intended to run in the development container. You can see the full source files in the repository for more details.

The inference server binds its C++ API to Python, so the Python usage and functions look similar to their C++ counterparts, though there are some differences due to the features available in each language. You can read the C++ version of this example to compare the two.

Note

These examples are intended to demonstrate the API and how to communicate with the server. They are not intended to show the best possible performance for each backend.

Include the module

In the development container, the Python library is automatically built and installed as part of the CMake build process. You can use the Python library by importing the amdinfer module. The submodules are imported by default, but you can also import them explicitly, which can help IDEs resolve Python imports for autocompletion.

import amdinfer
import amdinfer.pre_post as pre_post

Start the server

This example assumes that the server is not running elsewhere and starts it locally from Python instead. Creating the amdinfer.Server object starts the inference server backend, which stays alive as long as the object remains in scope.

server = amdinfer.Server()

Depending on what protocol you want to use to communicate with the server, you may need to start it explicitly using one of the object’s methods.

For example, if you were using HTTP/REST:

server.startHttp(args.http_port)
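
Other protocols have analogous start methods on the same object. For example, a server built with gRPC support could be started like this (a sketch; the grpc_port argument name is illustrative):

# assumes the server was built with gRPC enabled
server.startGrpc(args.grpc_port)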

If the server is already running somewhere, you don’t need to do this.

Create the client object

You can create a client in Python corresponding to the protocol you want to use to talk to the server.

server_addr = f"http://{args.ip}:{args.http_port}"
client = amdinfer.HttpClient(server_addr)
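
Other protocols have corresponding client classes. For example, if the server had gRPC enabled, you could create a gRPC client instead (a sketch; args.grpc_port is illustrative and assumes the gRPC server was started on that port):

client = amdinfer.GrpcClient(f"{args.ip}:{args.grpc_port}")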

Load a worker

Once the server is ready, you have to load a worker to handle your request. This worker, and any load-time parameters it accepts, are backend-specific. After loading the worker, you get back an endpoint string that you use to make requests to this worker. If a worker is already ready on the server or you already have an endpoint, then you don’t need to do this.

parameters = amdinfer.RequestParameters()
parameters.put("model", args.model)
parameters.put("batch_size", args.batch_size)
endpoint = client.workerLoad("xmodel", parameters)

After loading a worker, make sure it’s ready before attempting to make an inference.

amdinfer.waitUntilModelReady(client, endpoint)

Prepare images

Depending on the model, you may need to perform some preprocessing of the data before making an inference request. For ResNet50, this preprocessing generally consists of resizing the image, normalizing its values, and possibly converting types, but the exact implementation depends on the model and on what the worker expects. The implementations of the preprocessing functions can be seen in the examples' sources, and they differ for each backend.
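
As an illustration, a ResNet50-style preprocess function might look like the following sketch. The resize dimensions, normalization constants, channel ordering, and output datatype here are assumptions for illustration only and must match what your worker actually expects:

import cv2
import numpy as np

def preprocess(paths, input_size=224):
    """Open each image and apply illustrative ResNet50-style preprocessing."""
    images = []
    for path in paths:
        image = cv2.imread(str(path))  # HWC layout, BGR channels, uint8
        image = cv2.resize(image, (input_size, input_size))
        image = image.astype(np.float32) / 255.0
        # assumed ImageNet-style mean/std normalization; some workers instead
        # expect RGB channel order, a CHW layout, or different constants
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
        images.append((image - mean) / std)
    return images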

In these examples, you can pass a path to an image or to a directory to the executable. This single path is converted to a list of paths, containing either just the path you passed in or the paths to all the files in the directory you passed in, and this list is passed to the preprocess function. The file at each path is opened as a numpy array and stored in a list that the preprocess function returns.

paths = resolve_image_paths(pathlib.Path(args.image))
images = preprocess(paths)
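
The resolve_image_paths helper used above expands the single path argument into a list. A minimal sketch of such a helper (the version in the examples' sources may additionally filter by file extension):

import pathlib

def resolve_image_paths(path: pathlib.Path):
    """Return a list of file paths: the path itself, or a directory's files."""
    if path.is_dir():
        return sorted(child for child in path.iterdir() if child.is_file())
    return [path]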

Construct requests

Using the images after preprocessing, you can construct requests to the inference server. Each image is converted into an InferenceRequest using the ImageInferenceRequest helper function. This function accepts a single numpy array or a list of numpy arrays, where each array represents an input tensor. The ResNet50 model only accepts a single input tensor, so a single image is enough. This function infers the shape and datatype of the image from the properties stored in the numpy array.

def construct_requests(images):
    """
    Construct requests for the inference server from the input images. For ResNet50,
    a valid request includes a single input tensor containing a square image.

    Args:
        images (list[numpy.ndarray]): the input images

    Returns:
        list[amdinfer.InferenceRequest]: the requests
    """
    requests = []
    for image in images:
        requests.append(amdinfer.ImageInferenceRequest(image))
    return requests


Make an inference

There are multiple ways of making a request to the inference server, some of which are used in the different implementations of these examples. Before processing the response, you should verify it’s not an error.

Then, you can examine the outputs and, depending on the model, postprocess the results. For ResNet50, the raw results from the inference server list the probabilities for each output class. The postprocessing identifies the classes with the highest probabilities, and the top few of these are printed using the labels file to map the indices to human-readable names.
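
As an illustration, once an output tensor's class scores have been read into a numpy array, the top-k selection could look like the sketch below. The actual postprocess implementations in the examples also extract the data from the response output object, and they differ by backend:

import numpy as np

def top_k_indices(scores, k):
    """Return the indices of the k highest class scores, highest first."""
    scores = np.asarray(scores).flatten()
    return np.argsort(scores)[-k:][::-1].tolist()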

Here are some of the approaches to making an inference used in these examples:

This is the simplest way to make an inference: a blocking single inference, where you loop through all the requests and send them to the server one at a time.

# in vitis.py
for image_path, request in zip(paths, requests):
    response = client.modelInfer(endpoint, request)
    assert not response.isError()

    outputs = response.getOutputs()
    assert len(outputs) == 1
    top_indices = postprocess(outputs[0], args.top)
    print_label(top_indices, args.labels, image_path)

There are also some helper methods that wrap the basic inference APIs provided by the client. The inferAsyncOrdered method accepts a list of requests, makes all the requests asynchronously using the C++ modelInferAsync API, waits until each request completes, and then returns a list of responses. If there are multiple requests sent in this way, they may be batched together by the server.

# in migraphx.py
responses = amdinfer.inferAsyncOrdered(client, endpoint, requests)
print("Making inferences...")
for image_path, response in zip(paths, responses):
    assert not response.isError()

    outputs = response.getOutputs()
    assert len(outputs) == 1
    top_indices = postprocess(outputs[0], args.top)
    print_label(top_indices, args.labels, image_path)