Ensembles

An ensemble is a logical pipeline of workers that executes a graph of computations in which the output tensors of one model are passed as inputs to others. You can use ensembles to define a pipeline, such as pre-processing, inference, and post-processing, that runs entirely on the server.

Note

Ensembles are currently limited to chains. They must also be defined at load-time rather than at run-time.
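To make the chain idea concrete, here is a minimal conceptual sketch in plain Python. It does not use the server or its API: the stage functions (preprocess, infer, postprocess) and the run_chain helper are hypothetical names invented for illustration, and tensors are represented as simple lists of floats. It only shows how each stage's output becomes the next stage's input.

```python
from typing import Callable, List, Sequence

# Illustration only: a "tensor" here is just a list of floats.
Tensor = List[float]
Stage = Callable[[Tensor], Tensor]


def preprocess(x: Tensor) -> Tensor:
    # Hypothetical pre-processing stage: scale 8-bit values to [0, 1].
    return [v / 255.0 for v in x]


def infer(x: Tensor) -> Tensor:
    # Hypothetical stand-in for a model: reduce the input to a single sum.
    return [sum(x)]


def postprocess(x: Tensor) -> Tensor:
    # Hypothetical post-processing stage: round for presentation.
    return [round(v, 3) for v in x]


def run_chain(stages: Sequence[Stage], request: Tensor) -> Tensor:
    # Pass the request through each stage in order; the output of one
    # stage is the input of the next, which is what makes it a chain.
    data = request
    for stage in stages:
        data = stage(data)
    return data


# The chain is fixed up front, mirroring the load-time definition.
chain = [preprocess, infer, postprocess]
result = run_chain(chain, [0.0, 127.5, 255.0])
```

In the server, each stage would be a worker rather than a Python function, but the data flow follows the same linear order.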

Defining ensembles

Ensembles can be defined in two ways: in the model repository and with the API.

Model repository

Defining an ensemble in the model repository lets you use it with Docker and KServe deployments. You can also use this approach to load a model from a local repository with modelLoad. At a high level, you define the ensemble in the model’s configuration file and place all the model files in the directory. The ensemble is then loaded like any other single model. For more details, see how to define ensembles in the model repository.

API

You can also load a chain as a set of workers directly with the client API method loadEnsemble(), passing it an array of workers and their corresponding parameters. The method returns an array of endpoints, and the first one is the endpoint to which you send requests for the ensemble. To unload an ensemble, pass that array of endpoints to the client API method unloadModels().
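The load and unload flow described above can be sketched as follows. This is not the real client: FakeClient is a hypothetical in-memory stand-in written only for this example. Its method names mirror loadEnsemble() and unloadModels() from the text, but the exact signatures, the worker names, the parameter keys, and the endpoint naming scheme are all assumptions. Consult the client API reference for the actual interface.

```python
from typing import Dict, List


class FakeClient:
    """Hypothetical in-memory stand-in for a client; NOT the real API."""

    def __init__(self) -> None:
        self.loaded: List[str] = []

    def loadEnsemble(
        self, workers: List[str], parameters: List[Dict[str, object]]
    ) -> List[str]:
        # Assumption: one parameter set is supplied per worker.
        if len(workers) != len(parameters):
            raise ValueError("expected one parameter set per worker")
        # Assumption: endpoints are returned in chain order, so the
        # first endpoint is the entry point of the ensemble.
        endpoints = [f"{name}-{i}" for i, name in enumerate(workers)]
        self.loaded.extend(endpoints)
        return endpoints

    def unloadModels(self, endpoints: List[str]) -> None:
        # Unload every endpoint that belongs to the ensemble.
        for endpoint in endpoints:
            self.loaded.remove(endpoint)


client = FakeClient()

# Hypothetical worker names and parameters, listed in chain order.
workers = ["preprocess", "model", "postprocess"]
parameters: List[Dict[str, object]] = [{}, {"batch_size": 1}, {}]

endpoints = client.loadEnsemble(workers, parameters)

# Requests for the whole ensemble go to the first endpoint.
entry_endpoint = endpoints[0]

# Keep the full array: it is what unloads the ensemble later.
client.unloadModels(endpoints)
```

The point to take from the sketch is the bookkeeping: keep the whole array returned at load time, send requests only to its first element, and hand the full array back when unloading.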