REST Endpoints
The REST endpoints are based on KServe’s v2 specification. Additional endpoints are driven by community adoption. The full OpenAPI 3.0 spec is available in the repository.
- GET /v2/
Server Metadata
The server metadata endpoint provides information about the server. A server metadata request is made with an HTTP GET to a server metadata endpoint. In the corresponding response the HTTP body contains the Server Metadata Response JSON Object or the Server Metadata Response JSON Error Object.
- Status Codes
200 OK – OK

Example response:

HTTP/1.1 200 OK
Content-Type: application/json
400 Bad Request – Bad Request
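For illustration, a minimal Python sketch (using the requests library) that fetches the server metadata; the base URL is an assumption for a local deployment, not part of the specification.

import requests

# Hypothetical base URL for a locally running server.
BASE_URL = "http://localhost:8000"

# Fetch the Server Metadata Response JSON Object.
response = requests.get(f"{BASE_URL}/v2/")
response.raise_for_status()
print(response.json())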
- POST /v2/hardware
Has Hardware
The hardware endpoint provides information about what hardware exists and is accessible to the server. A request is made with an HTTP POST to the hardware endpoint; the HTTP body contains the Hardware Request JSON Object.
Example request:

POST /v2/hardware HTTP/1.1
Content-Type: application/json
- Status Codes
200 OK – The server has the requested resource
404 Not Found – The server does not have the requested resource, or does not have enough of it
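As a sketch, the same request from Python; since the fields of the Hardware Request JSON Object are defined in the OpenAPI spec, the empty body below is only a placeholder.

import requests

BASE_URL = "http://localhost:8000"  # hypothetical local server

# The real payload fields come from the Hardware Request JSON Object;
# an empty JSON object is used here as a placeholder.
response = requests.post(f"{BASE_URL}/v2/hardware", json={})
if response.status_code == 200:
    print("server has the requested resource")
elif response.status_code == 404:
    print("requested resource not available, or not enough of it")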
- GET /v2/health/live
Server Live
The “server live” API indicates if the inference server is able to receive and respond to metadata and inference requests. The “server live” API can be used directly to implement the Kubernetes livenessProbe.
- Status Codes
200 OK – OK
- GET /v2/health/ready
Server Ready
The “server ready” health API indicates if all the models are ready for inferencing. The “server ready” health API can be used directly to implement the Kubernetes readinessProbe.
- Status Codes
200 OK – OK
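A minimal Python sketch that exercises both health endpoints, mirroring what Kubernetes liveness and readiness probes would check; the base URL is an assumption.

import requests

BASE_URL = "http://localhost:8000"  # hypothetical local server

# A 200 status means the probe passes; anything else means it fails.
live = requests.get(f"{BASE_URL}/v2/health/live")
ready = requests.get(f"{BASE_URL}/v2/health/ready")
print(f"live: {live.status_code == 200}, ready: {ready.status_code == 200}")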
- GET /v2/models
Models
The “models” endpoint provides a list of the models that are currently available for metadata and inference.
- Status Codes
200 OK – OK

Example response:

HTTP/1.1 200 OK
Content-Type: application/json
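For example, a short Python sketch that lists the available models; the base URL is an assumption.

import requests

BASE_URL = "http://localhost:8000"  # hypothetical local server

# Returns the list of models available for metadata and inference.
response = requests.get(f"{BASE_URL}/v2/models")
response.raise_for_status()
print(response.json())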
- GET /v2/models/${MODEL_NAME}/ready
Model Ready
The “model ready” health API indicates if a specific model is ready for inferencing. The model name and (optionally) version must be available in the URL. If a version is not provided, the server may choose a version based on its own policies. Versioning is currently not supported.
- Parameters
MODEL_NAME (string, required) – the name of the model
- Status Codes
200 OK – OK
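A minimal Python sketch of the readiness check; the base URL and model name are placeholders.

import requests

BASE_URL = "http://localhost:8000"  # hypothetical local server
MODEL_NAME = "my_model"             # placeholder model name

response = requests.get(f"{BASE_URL}/v2/models/{MODEL_NAME}/ready")
print("ready" if response.status_code == 200 else "not ready")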
- GET /v2/models/${MODEL_NAME}
Model Metadata
The per-model metadata endpoint provides information about a model. A model metadata request is made with an HTTP GET to a model metadata endpoint. In the corresponding response the HTTP body contains the Model Metadata Response JSON Object or the Model Metadata Response JSON Error Object. The model name and (optionally) version must be available in the URL. If a version is not provided, the server may choose a version based on its own policies or return an error. Versioning is currently not supported.
- Parameters
MODEL_NAME (string, required) – the name of the model
- Status Codes
200 OK – OK

Example response:

HTTP/1.1 200 OK
Content-Type: application/json
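As a sketch, fetching a model's metadata from Python; the base URL and model name are placeholders.

import requests

BASE_URL = "http://localhost:8000"  # hypothetical local server
MODEL_NAME = "my_model"             # placeholder model name

# Fetch the Model Metadata Response JSON Object for one model.
response = requests.get(f"{BASE_URL}/v2/models/{MODEL_NAME}")
response.raise_for_status()
print(response.json())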
- POST /v2/models/${MODEL_NAME}/load
Model Load
Before a model can serve inference requests, it must be loaded. A model can be loaded with an HTTP POST request to the model load endpoint. The request consists of an optional set of parameters for the model. Model load expects the model files to already be available, in the expected format, in the “model-repository” directory of the running server.
- Parameters
MODEL_NAME (string, required) – the name of the model
Example request:

POST /v2/models/${MODEL_NAME}/load HTTP/1.1
Content-Type: application/json
- Status Codes
200 OK – OK
400 Bad Request – Bad Request
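A hedged Python sketch of a load request; the base URL and model name are placeholders, and since the accepted parameters are model-specific, an empty body stands in for the optional parameter set.

import requests

BASE_URL = "http://localhost:8000"  # hypothetical local server
MODEL_NAME = "my_model"             # placeholder; the model files must already
                                    # be in the server's model-repository directory

# The body carries the optional model parameters; which keys are accepted
# is model-specific, so an empty object is used here as a placeholder.
response = requests.post(f"{BASE_URL}/v2/models/{MODEL_NAME}/load", json={})
response.raise_for_status()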
- POST /v2/models/${MODEL_NAME}/unload
Model Unload
A model can be unloaded with an HTTP POST request to the model unload endpoint. This is identical to ‘worker unload’.
- Parameters
MODEL_NAME (string, required) – the name of the model
- Status Codes
200 OK – OK
400 Bad Request – Bad Request
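And the matching unload call from Python, with the same placeholder names.

import requests

BASE_URL = "http://localhost:8000"  # hypothetical local server
MODEL_NAME = "my_model"             # placeholder model name

response = requests.post(f"{BASE_URL}/v2/models/{MODEL_NAME}/unload")
response.raise_for_status()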
- POST /v2/workers/${WORKER_NAME}/load
Worker Load
Before a model can serve inference requests, it must be loaded. A model can be loaded with an HTTP POST request to the worker load endpoint. The request consists of an optional set of parameters for the worker. Depending on the worker, some of these parameters may be required.
- Parameters
WORKER_NAME (string, required) – the name of the worker
Example request:

POST /v2/workers/${WORKER_NAME}/load HTTP/1.1
Content-Type: application/json
- Status Codes
200 OK – OK

Example response:

HTTP/1.1 200 OK
Content-Type: text/html

<html>endpoint</html>
400 Bad Request – Bad Request
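A hedged Python sketch of a worker load request; the base URL and worker name are placeholders, and the parameter key below is invented for illustration, since required parameters vary by worker.

import requests

BASE_URL = "http://localhost:8000"   # hypothetical local server
WORKER_NAME = "my_worker"            # placeholder worker name

# Which parameters are required depends on the worker; this key/value pair
# is a made-up example, not part of the specification.
params = {"example_param": "example_value"}
response = requests.post(f"{BASE_URL}/v2/workers/{WORKER_NAME}/load", json=params)
response.raise_for_status()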
- POST /v2/workers/${WORKER_NAME}/unload
Worker Unload
A worker can be unloaded with an HTTP POST request to the worker unload endpoint. This is identical to ‘model unload’.
- Parameters
WORKER_NAME (string, required) – the name of the worker
- Status Codes
200 OK – OK
400 Bad Request – Bad Request
- POST /v2/models/${MODEL_NAME}/infer
Inference
An inference request is made with an HTTP POST to an inference endpoint. In the request the HTTP body contains the Inference Request JSON Object. In the corresponding response the HTTP body contains the Inference Response JSON Object or Inference Response JSON Error Object. See Inference Request Examples for some example HTTP/REST requests and responses.
- Parameters
MODEL_NAME (string, required) – the name of the model
Example request:

POST /v2/models/${MODEL_NAME}/infer HTTP/1.1
Content-Type: application/json
- Status Codes
200 OK – OK

Example response:

HTTP/1.1 200 OK
Content-Type: application/json
400 Bad Request – Bad Request
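To close, a hedged Python sketch of an inference call. The body below follows the general shape of a KServe v2 inference request (named, typed input tensors), which this API is based on, but the exact schema of the Inference Request JSON Object is defined in the OpenAPI spec; the tensor name, shape, and data are placeholders.

import requests

BASE_URL = "http://localhost:8000"  # hypothetical local server
MODEL_NAME = "my_model"             # placeholder model name

# Placeholder payload in the general shape of a KServe v2 inference request;
# see the Inference Request JSON Object for the exact schema.
payload = {
    "inputs": [
        {
            "name": "input0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ]
}

response = requests.post(f"{BASE_URL}/v2/models/{MODEL_NAME}/infer", json=payload)
response.raise_for_status()
print(response.json())  # Inference Response JSON Object (on success)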