REST Endpoints

The REST endpoints are based on KServe’s v2 specification. Additional endpoints are driven by community adoption. The full OpenAPI 3.0 spec is available in the repository.

GET /v2/

Server Metadata

The server metadata endpoint provides information about the server. A server metadata request is made with an HTTP GET to a server metadata endpoint. In the corresponding response the HTTP body contains the Server Metadata Response JSON Object or the Server Metadata Response JSON Error Object.
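
The exchange described above can be sketched with Python's standard library. This is a minimal illustration rather than the server's own client API: the base URL is an assumption, as is the `{"error": ...}` shape of the error object (taken from the KServe v2 convention), and the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8998"  # assumption: adjust to your server's address


def parse_server_metadata(body: bytes) -> dict:
    """Decode a response body; raise if it is an error object."""
    payload = json.loads(body)
    # Assumption: errors arrive as {"error": "<message>"}, as in KServe v2.
    if isinstance(payload, dict) and "error" in payload:
        raise RuntimeError(payload["error"])
    return payload


def get_server_metadata(base_url: str = BASE_URL) -> dict:
    """Issue GET /v2/ and return the decoded metadata object."""
    try:
        with urllib.request.urlopen(f"{base_url}/v2/") as response:
            return parse_server_metadata(response.read())
    except urllib.error.HTTPError as err:
        # Non-2xx responses may still carry a JSON error object in the body.
        return parse_server_metadata(err.read())
```

Keeping the parsing separate from the network call makes it possible to check the error handling without a running server.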

POST /v2/hardware

Has Hardware

The hardware endpoint reports what hardware exists and is accessible to the server. A request is made with an HTTP POST to the hardware endpoint, with the Hardware Request JSON Object in the body.

POST /v2/hardware HTTP/1.1
Content-Type: application/json

Status Codes:
  • 200 OK – The server has the requested resource

  • 404 Not Found – The server does not have the requested resource, or does not have enough of it
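
A short Python sketch of how a client might turn the two status codes above into a yes/no answer. The fields of the Hardware Request JSON Object are not shown here, so the caller passes the object in as a plain dict; the helper names `build_hardware_request` and `has_hardware` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_hardware_request(base_url: str, hardware: dict) -> urllib.request.Request:
    """Build the POST /v2/hardware request for a caller-supplied JSON object."""
    return urllib.request.Request(
        f"{base_url}/v2/hardware",
        data=json.dumps(hardware).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def has_hardware(base_url: str, hardware: dict) -> bool:
    """Return True on 200 OK and False on 404 Not Found."""
    request = build_hardware_request(base_url, hardware)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other status is unexpected, so surface it to the caller
```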

GET /v2/health/live

Server Live

The “server live” API indicates if the inference server is able to receive and respond to metadata and inference requests. The “server live” API can be used directly to implement the Kubernetes livenessProbe.

GET /v2/health/ready

Server Ready

The “server ready” health API indicates if all the models are ready for inferencing. The “server ready” health API can be used directly to implement the Kubernetes readinessProbe.
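
Outside Kubernetes, the two health endpoints can be polled in the same way before sending traffic. The sketch below is one possible approach, not a prescribed one: it assumes a 200 response means "healthy" and treats any other status or a connection failure as "not yet", and the names `probe` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Assumption: non-2xx statuses, refused connections, and timeouts
        # all mean "not healthy yet".
        return False


def wait_until_ready(base_url, attempts=30, delay=1.0, check=probe) -> bool:
    """Poll the live and ready endpoints until both succeed or attempts run out."""
    for _ in range(attempts):
        if check(f"{base_url}/v2/health/live") and check(f"{base_url}/v2/health/ready"):
            return True
        time.sleep(delay)
    return False
```

The `check` parameter exists only so that the polling loop can be exercised with a stub instead of a live server.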

GET /v2/models

Models

The “models” endpoint returns a list of the model endpoints that are currently available for metadata and inference requests.

Status Codes:
  • 200 OK

    OK

    HTTP/1.1 200 OK
    Content-Type: application/json
    

GET /v2/models/${MODEL_NAME}/ready

Model Ready

The “model ready” health API indicates whether a specific model is ready for inferencing. The model name and, optionally, a version must be given in the URL. If no version is provided, the server may choose one based on its own policies. Versions are currently not supported.

Parameters:
  • MODEL_NAME (string, required)

GET /v2/models/${MODEL_NAME}

Model Metadata

The per-model metadata endpoint provides information about a model. A model metadata request is made with an HTTP GET to a model metadata endpoint. In the corresponding response the HTTP body contains the Model Metadata Response JSON Object or the Model Metadata Response JSON Error Object. The model name and, optionally, a version must be given in the URL. If no version is provided, the server may choose one based on its own policies or return an error. Versions are currently not supported.
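
As an illustration, a client could fetch the metadata object and summarize a model's tensors. The sketch assumes the response follows the KServe v2 model metadata shape, with `inputs` and `outputs` lists whose entries have `name`, `datatype`, and `shape` fields; that shape is not confirmed by this page, and the function names are hypothetical.

```python
import json
import urllib.request


def describe_tensors(metadata: dict, key: str = "inputs") -> list:
    """Return 'name: datatype[shape]' strings for a model's tensors."""
    # Assumption: each tensor entry has "name", "datatype" and "shape" fields,
    # as in the KServe v2 model metadata object.
    return [
        f"{t['name']}: {t['datatype']}{list(t['shape'])}"
        for t in metadata.get(key, [])
    ]


def get_model_metadata(base_url: str, model_name: str) -> dict:
    """Issue GET /v2/models/${MODEL_NAME} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v2/models/{model_name}") as response:
        return json.loads(response.read())
```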

Parameters:
  • MODEL_NAME (string, required)

Status Codes:
  • 200 OK

    OK

    HTTP/1.1 200 OK
    Content-Type: application/json
    

POST /v2/models/${MODEL_NAME}/load

Model Load

A model must be loaded before it can serve inference requests. A model can be loaded with an HTTP POST request to the model load endpoint. The request consists of an optional set of parameters for the model. Model load expects the model files to already be present, in the expected format, in the “model-repository” directory of the running server.

Parameters:
  • MODEL_NAME (string, required)

POST /v2/models/${MODEL_NAME}/load HTTP/1.1
Content-Type: application/json

POST /v2/models/${MODEL_NAME}/unload

Model Unload

A model can be unloaded with an HTTP POST request to the model unload endpoint. This is identical to ‘worker unload’.
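
Because every load should eventually be paired with an unload, one way to structure client code is a context manager that loads on entry and unloads on exit. This is a sketch under assumptions: the optional load parameters are assumed to travel as a JSON object in the request body, no body is sent when there are none, and the names `model_request` and `loaded_model` are hypothetical.

```python
import json
import urllib.request
from contextlib import contextmanager


def model_request(base_url, model_name, action, parameters=None):
    """Build the POST for /v2/models/${MODEL_NAME}/load or /unload."""
    # Assumption: optional parameters are sent as a JSON object in the body.
    body = None if parameters is None else json.dumps(parameters).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/{action}",
        data=body,
        headers=headers,
        method="POST",
    )


@contextmanager
def loaded_model(base_url, model_name, parameters=None, send=urllib.request.urlopen):
    """Load a model on entry and unload it on exit, even after an error."""
    send(model_request(base_url, model_name, "load", parameters)).close()
    try:
        yield model_name
    finally:
        send(model_request(base_url, model_name, "unload")).close()
```

The `send` argument defaults to `urllib.request.urlopen` and is there so the load/unload ordering can be checked with a stub.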

Parameters:
  • MODEL_NAME (string, required)

POST /v2/workers/${WORKER_NAME}/load

Worker Load

Prior to inference, a model must be loaded to serve it. A model can be loaded with an HTTP POST request to the worker load endpoint. The request consists of an optional set of parameters for the worker. Depending on the worker, some of these parameters may be required.

Parameters:
  • WORKER_NAME (string, required)

POST /v2/workers/${WORKER_NAME}/load HTTP/1.1
Content-Type: application/json

Status Codes:
  • 200 OK

    OK

    HTTP/1.1 200 OK
    Content-Type: text/html
    
    <html>endpoint</html>
    

  • 400 Bad Request – Bad Request
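
The worker load call and its two documented status codes might be handled along these lines. The sketch assumes the parameters are sent as a JSON object and that the 200 body is text worth returning to the caller (the example response above suggests it names an endpoint, but that reading is an assumption); `worker_load_request` and `load_worker` are hypothetical names.

```python
import json
import urllib.error
import urllib.request


def worker_load_request(base_url, worker_name, parameters=None):
    """Build the POST /v2/workers/${WORKER_NAME}/load request."""
    return urllib.request.Request(
        f"{base_url}/v2/workers/{worker_name}/load",
        # Assumption: parameters are a JSON object; an empty one if none given.
        data=json.dumps(parameters or {}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_worker(base_url, worker_name, parameters=None) -> str:
    """Load a worker and return the 200 response body as text."""
    request = worker_load_request(base_url, worker_name, parameters)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # 400 Bad Request: for example, a required parameter was missing.
            detail = err.read().decode("utf-8", errors="replace")
            raise ValueError(f"worker load rejected: {detail}") from err
        raise
```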

POST /v2/workers/${WORKER_NAME}/unload

Worker Unload

A worker can be unloaded with an HTTP POST request to the worker unload endpoint. This is identical to ‘model unload’.

Parameters:
  • WORKER_NAME (string, required)

POST /v2/models/${MODEL_NAME}/infer

Inference

An inference request is made with an HTTP POST to an inference endpoint. In the request the HTTP body contains the Inference Request JSON Object. In the corresponding response the HTTP body contains the Inference Response JSON Object or Inference Response JSON Error Object. See Inference Request Examples for some example HTTP/REST requests and responses.
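
For orientation, here is a minimal Python sketch of building and sending an inference request. It assumes the Inference Request JSON Object follows the KServe v2 layout, with an `inputs` list of tensors that each carry `name`, `shape`, `datatype`, and flattened `data`; the helper names are hypothetical, and the exact fields this server accepts should be checked against the OpenAPI spec.

```python
import json
import math
import urllib.request


def build_infer_body(name, shape, datatype, data, request_id=None):
    """Build a single-input request body in the KServe v2 layout (assumed)."""
    if math.prod(shape) != len(data):
        raise ValueError("data length does not match shape")
    body = {
        "inputs": [
            {"name": name, "shape": list(shape), "datatype": datatype, "data": list(data)}
        ]
    }
    if request_id is not None:
        body["id"] = request_id  # optional identifier echoed back in KServe v2
    return body


def infer(base_url, model_name, body):
    """POST the body to /v2/models/${MODEL_NAME}/infer and decode the response."""
    request = urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Checking the element count against the shape on the client side catches a common mistake before the request leaves the machine.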

Parameters:
  • MODEL_NAME (string, required)

POST /v2/models/${MODEL_NAME}/infer HTTP/1.1
Content-Type: application/json

GET /metrics

Metrics

Returns Prometheus-style metrics from the server.

Status Codes:
  • 200 OK

    OK

    HTTP/1.1 200 OK
    Content-Type: text/html
    
    <html>metrics...</html>
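
If the metrics are needed outside of Prometheus itself, the response can be read with a few lines of Python. The sketch below assumes the body is in the Prometheus text exposition format (one sample per line, with `#` comment lines) and ignores any optional timestamps; `parse_metrics` and `get_metrics` are hypothetical helper names.

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Map each sample (metric name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            key, *rest = line.split()
        samples[key] = float(rest[0])  # rest[1], if present, is a timestamp
    return samples


def get_metrics(base_url: str) -> dict:
    """Issue GET /metrics and parse the text body."""
    with urllib.request.urlopen(f"{base_url}/metrics") as response:
        return parse_metrics(response.read().decode("utf-8"))
```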