REST Endpoints
The REST endpoints are based on KServe’s v2 specification. Additional endpoints beyond that specification are added as community adoption drives them. The full OpenAPI 3.0 spec is available in the repository.
- GET /v2/
Server Metadata
The server metadata endpoint provides information about the server. A server metadata request is made with an HTTP GET to the server metadata endpoint. In the corresponding response, the HTTP body contains the Server Metadata Response JSON Object or the Server Metadata Response JSON Error Object.
- Status Codes:
200 OK – OK

```http
HTTP/1.1 200 OK
Content-Type: application/json
```
400 Bad Request – Bad Request
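For illustration, a minimal sketch of querying this endpoint with Python’s requests library follows; the base URL is an assumption and should be adjusted to the actual deployment.

```python
import requests

BASE = "http://localhost:8000"  # assumed address; adjust to your deployment

resp = requests.get(f"{BASE}/v2/")
resp.raise_for_status()  # raises on 400 Bad Request
print(resp.json())       # Server Metadata Response JSON Object
```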
- POST /v2/hardware
Has Hardware
The hardware endpoint provides information about what hardware exists and is accessible to the server. A request is made with an HTTP POST to the hardware endpoint; the HTTP body contains the Hardware Request JSON Object.
```http
POST /v2/hardware HTTP/1.1
Content-Type: application/json
```
- Status Codes:
200 OK – The server has the requested resource
404 Not Found – The server does not have the requested resource, or does not have enough of it
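A hedged sketch of calling this endpoint from Python follows; the empty request body is only a placeholder, since the actual Hardware Request JSON Object schema is defined in the OpenAPI spec.

```python
import requests

BASE = "http://localhost:8000"  # assumed address

# Placeholder body; consult the OpenAPI spec for the real
# Hardware Request JSON Object fields.
resp = requests.post(f"{BASE}/v2/hardware", json={})
if resp.status_code == 200:
    print("server has the requested hardware")
elif resp.status_code == 404:
    print("requested hardware unavailable or insufficient")
```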
- GET /v2/health/live
Server Live
The “server live” API indicates if the inference server is able to receive and respond to metadata and inference requests. The “server live” API can be used directly to implement the Kubernetes livenessProbe.
- Status Codes:
200 OK – OK
- GET /v2/health/ready
Server Ready
The “server ready” health API indicates if all the models are ready for inferencing. The “server ready” health API can be used directly to implement the Kubernetes readinessProbe.
- Status Codes:
200 OK – OK
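The two health endpoints can be polled together, for example in a startup script, as in this sketch (base URL assumed, and the server assumed to be accepting connections):

```python
import time
import requests

BASE = "http://localhost:8000"  # assumed address

# Wait until the server is live, then until all models are ready,
# mirroring what Kubernetes liveness/readiness probes would check.
for path in ("/v2/health/live", "/v2/health/ready"):
    while requests.get(f"{BASE}{path}").status_code != 200:
        time.sleep(1)
print("server is live and ready")
```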
- GET /v2/models
Models
The “models” endpoint provides a list of the model endpoints that are currently available for metadata and inference.
- Status Codes:
200 OK – OK

```http
HTTP/1.1 200 OK
Content-Type: application/json
```
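A minimal sketch of listing models from Python (base URL assumed):

```python
import requests

BASE = "http://localhost:8000"  # assumed address

resp = requests.get(f"{BASE}/v2/models")
resp.raise_for_status()
print(resp.json())  # model endpoints available for metadata and inference
```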
- GET /v2/models/${MODEL_NAME}/ready
Model Ready
The “model ready” health API indicates if a specific model is ready for inferencing. The model name and (optionally) version must be available in the URL. If a version is not provided, the server may choose a version based on its own policies. Versioning is currently not supported.
- Parameters:
MODEL_NAME (string, required) – the name of the model
- Status Codes:
200 OK – OK
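For example (base URL and model name are assumptions):

```python
import requests

BASE = "http://localhost:8000"   # assumed address
MODEL_NAME = "resnet50"          # hypothetical model name

resp = requests.get(f"{BASE}/v2/models/{MODEL_NAME}/ready")
print(f"{MODEL_NAME} ready:", resp.status_code == 200)
```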
- GET /v2/models/${MODEL_NAME}
Model Metadata
The per-model metadata endpoint provides information about a model. A model metadata request is made with an HTTP GET to a model metadata endpoint. In the corresponding response, the HTTP body contains the Model Metadata Response JSON Object or the Model Metadata Response JSON Error Object. The model name and (optionally) version must be available in the URL. If a version is not provided, the server may choose a version based on its own policies or return an error. Versioning is currently not supported.
- Parameters:
MODEL_NAME (string, required) – the name of the model
- Status Codes:
200 OK – OK

```http
HTTP/1.1 200 OK
Content-Type: application/json
```
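A sketch of fetching and pretty-printing the metadata (base URL and model name assumed):

```python
import json
import requests

BASE = "http://localhost:8000"   # assumed address
MODEL_NAME = "resnet50"          # hypothetical model name

resp = requests.get(f"{BASE}/v2/models/{MODEL_NAME}")
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))  # Model Metadata Response JSON Object
```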
- POST /v2/models/${MODEL_NAME}/load
Model Load
Prior to inference, a model must be loaded to serve it. A model can be loaded with an HTTP POST request to the model load endpoint. The request consists of an optional set of parameters for the model. Model load expects that the model files are already available, in the expected format, in the “model-repository” directory of the running server.
- Parameters:
MODEL_NAME (string, required) – the name of the model
```http
POST /v2/models/${MODEL_NAME}/load HTTP/1.1
Content-Type: application/json
```
- Status Codes:
200 OK – OK
400 Bad Request – Bad Request
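A sketch of a load request from Python; the base URL and model name are assumptions, and the empty parameter object is a placeholder since load parameters are optional:

```python
import requests

BASE = "http://localhost:8000"   # assumed address
MODEL_NAME = "resnet50"          # hypothetical; its files must already exist
                                 # in the server's model-repository directory

# Placeholder body; pass model-specific parameters here if needed.
resp = requests.post(f"{BASE}/v2/models/{MODEL_NAME}/load", json={})
resp.raise_for_status()  # raises on 400 Bad Request
```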
- POST /v2/models/${MODEL_NAME}/unload
Model Unload
A model can be unloaded with an HTTP POST request to the model unload endpoint. This is identical to ‘worker unload’.
- Parameters:
MODEL_NAME (string, required) – the name of the model
- Status Codes:
200 OK – OK
400 Bad Request – Bad Request
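For example (base URL and model name assumed):

```python
import requests

BASE = "http://localhost:8000"   # assumed address
MODEL_NAME = "resnet50"          # hypothetical model name

resp = requests.post(f"{BASE}/v2/models/{MODEL_NAME}/unload")
resp.raise_for_status()  # raises on 400 Bad Request
```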
- POST /v2/workers/${WORKER_NAME}/load
Worker Load
Prior to inference, a model must be loaded to serve it. A model can be loaded with an HTTP POST request to the worker load endpoint. The request consists of an optional set of parameters for the worker. Depending on the worker, some of these parameters may be required.
- Parameters:
WORKER_NAME (string, required) – the name of the worker
```http
POST /v2/workers/${WORKER_NAME}/load HTTP/1.1
Content-Type: application/json
```
- Status Codes:
200 OK – OK

```http
HTTP/1.1 200 OK
Content-Type: text/html

<html>endpoint</html>
```
400 Bad Request – Bad Request
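A sketch of loading a worker from Python; the base URL, worker name, and empty parameter body are assumptions:

```python
import requests

BASE = "http://localhost:8000"    # assumed address
WORKER_NAME = "echo"              # hypothetical worker name

# Worker parameters vary by worker and some may be required; the empty
# body here is a placeholder, not a real schema.
resp = requests.post(f"{BASE}/v2/workers/{WORKER_NAME}/load", json={})
resp.raise_for_status()
# Per the example response above, the 200 body is text/html naming the endpoint.
print(resp.text)
```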
- POST /v2/workers/${WORKER_NAME}/unload
Worker Unload
A worker can be unloaded with an HTTP POST request to the worker unload endpoint. This is identical to ‘model unload’.
- Parameters:
WORKER_NAME (string, required) – the name of the worker
- Status Codes:
200 OK – OK
400 Bad Request – Bad Request
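For example (base URL and worker name assumed):

```python
import requests

BASE = "http://localhost:8000"    # assumed address
WORKER_NAME = "echo"              # hypothetical worker name

resp = requests.post(f"{BASE}/v2/workers/{WORKER_NAME}/unload")
resp.raise_for_status()  # raises on 400 Bad Request
```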
- POST /v2/models/${MODEL_NAME}/infer
Inference
An inference request is made with an HTTP POST to an inference endpoint. In the request the HTTP body contains the Inference Request JSON Object. In the corresponding response the HTTP body contains the Inference Response JSON Object or Inference Response JSON Error Object. See Inference Request Examples for some example HTTP/REST requests and responses.
- Parameters:
MODEL_NAME (string, required) – the name of the model
```http
POST /v2/models/${MODEL_NAME}/infer HTTP/1.1
Content-Type: application/json
```
- Status Codes:
200 OK – OK

```http
HTTP/1.1 200 OK
Content-Type: application/json
```
400 Bad Request – Bad Request
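A sketch of an inference call from Python. Since the endpoints are based on KServe’s v2 specification, the request body below uses the v2 inputs layout; the base URL, model name, tensor name, shape, and data are made-up placeholders:

```python
import requests

BASE = "http://localhost:8000"   # assumed address
MODEL_NAME = "resnet50"          # hypothetical model name

# Inference Request JSON Object following the KServe v2 layout; all
# values below are illustrative placeholders.
request_body = {
    "inputs": [
        {
            "name": "input0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ]
}

resp = requests.post(f"{BASE}/v2/models/{MODEL_NAME}/infer", json=request_body)
resp.raise_for_status()  # raises on 400 Bad Request
print(resp.json())       # Inference Response JSON Object
```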