Call a model, not a vendor.

OpenAI-compatible Chat & Embeddings. Cohere-style Rerank. Whisper for audio. A generic Infer for vision, OCR, NER, and translation. 23 models across 5 capabilities — and you can register your own.



# OpenAI-compatible Chat — point at /v1/chat/completions.
$ curl https://api.dodil.io/v1/chat/completions \
    -H "Authorization: Bearer $DODIL_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "kimi-k2.5",
      "messages": [{"role": "user", "content": "Summarise this PDF."}],
      "stream": false
    }'

# Or stream via /v1/chat/completions/stream — SSE chunks
# carry usage stats + finish_reason on the last chunk.

OpenAI-compatible: drop in the SDK you already use and swap the base URL.

ModelService@v1
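The streaming note above says the last SSE chunk carries usage stats and finish_reason. A minimal sketch of how a client might consume such a stream follows. It assumes the chunks follow the OpenAI streaming shape (`choices[].delta.content`, a final `data: [DONE]` marker), which this page does not spell out; the helper names `iter_sse_json` and `collect_stream` are hypothetical, and the sample lines are hand-written rather than captured from the API.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        yield json.loads(payload)


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str], Optional[dict]]:
    """Join streamed text and return (text, finish_reason, usage)."""
    parts = []
    finish_reason = None
    usage = None
    for chunk in iter_sse_json(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), finish_reason, usage


# Hand-written sample stream (not real API output).
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": "stop"}],'
    ' "usage": {"total_tokens": 7}}',
    "",
    "data: [DONE]",
]

text, finish_reason, usage = collect_stream(sample)
print(text, finish_reason, usage)
```

In a real client the `lines` argument would be the decoded lines of the HTTP response body; the parsing logic stays the same.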

The problem

Don’t marry a model — the best one keeps changing.

No single model wins every task, and inference gets cheaper by the month. Hard-wiring one model is a liability.

Every task — chat, embeddings, rerank, vision, audio — has a different best model, and it changes monthly.

~7 models per enterprise

Frontier APIs are costly when an open or specialized model would do the job just as well.

open / self-hosted: 40–60% lower cost

Today’s model is overpriced tomorrow — but rewiring code to swap it is painful.

inference cost: ~10× lower per year

One OpenAI-compatible API across every model — swap any of them without touching your code, frontier or self-hosted.
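One way to read "swap any of them without touching your code" is that the model name and base URL live in configuration, not at the call site. The sketch below builds the same request as the curl example using only the Python standard library. The `DODIL_BASE_URL` environment variable, the `MODEL_FOR_TASK` mapping, and the `build_chat_request` helper are assumptions made for illustration, not part of the documented API.

```python
import json
import os
import urllib.request

# Hypothetical override; the default is the base URL from the curl example.
BASE_URL = os.environ.get("DODIL_BASE_URL", "https://api.dodil.io/v1")

# Hypothetical task -> model mapping. Swapping a model means editing this
# dict (or loading it from a config file), not the calling code.
MODEL_FOR_TASK = {"chat": "kimi-k2.5"}


def build_chat_request(task: str, prompt: str, api_key: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to {base_url}/chat/completions."""
    body = {
        "model": MODEL_FOR_TASK[task],
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Nothing is sent here; pass the request to urllib.request.urlopen to call the API.
req = build_chat_request("chat", "Summarise this PDF.",
                         api_key=os.environ.get("DODIL_API_KEY", "test-key"))
print(req.get_method(), req.full_url)
```

Changing `MODEL_FOR_TASK["chat"]` to another catalog entry, or to a model you registered yourself, changes which model answers without any edit to `build_chat_request` or its callers.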

Surface

Six endpoints.
Every model behind them.

Chat, embeddings, rerank, transcribe — drop-in for the SDKs you already know. Anything else lands behind a single generic Infer route with a JSON schema you can fetch.

Chat completions · live
POST /v1/chat/completions · 47 ms p50

Embeddings · live
POST /v1/embeddings · 18 ms p50

Rerank · live
POST /v1/rerank · 62 ms p50

Transcribe · live
POST /v1/audio/transcriptions · 3.2 s avg

Infer · live
POST /v1/infer · 92 ms p50

List models · cached
GET /v1/models · 4 ms cached · 19.8k cached lookups (hot)
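The six endpoints above can be captured in a small route table, so that client code refers to a capability name instead of a hard-coded path. This is only a sketch: the methods and paths come from the list above, while the `Route` class, the `ROUTES` keys, and the `endpoint` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

BASE_URL = "https://api.dodil.io"


@dataclass(frozen=True)
class Route:
    method: str
    path: str


# Methods and paths as listed on this page; the dict keys are illustrative.
ROUTES: Dict[str, Route] = {
    "chat": Route("POST", "/v1/chat/completions"),
    "embeddings": Route("POST", "/v1/embeddings"),
    "rerank": Route("POST", "/v1/rerank"),
    "transcribe": Route("POST", "/v1/audio/transcriptions"),
    "infer": Route("POST", "/v1/infer"),
    "models": Route("GET", "/v1/models"),
}


def endpoint(capability: str) -> Tuple[str, str]:
    """Return (HTTP method, full URL) for a capability name."""
    route = ROUTES[capability]
    return route.method, BASE_URL + route.path


for name in ROUTES:
    method, url = endpoint(name)
    print(f"{name:<11} {method:<4} {url}")
```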

Catalog

23 models.
Five capabilities.

Curated mix of API-proxied flagships (Kimi) and open-weight models we host (Whisper, Jina, YOLO, GLiNER, …). Each is tagged with the endpoint it serves and whether it runs on GPU or CPU.

Reasoning, long-context, multimodal chat. Served behind /v1/chat/completions.

Kimi K2.5