Chat, embeddings, rerank, transcribe — drop-in for the SDKs you already know. Anything else lands behind a single generic Infer route with a JSON schema you can fetch.
Curated mix of API-proxied flagships (Kimi) and open-weight models we host (Whisper, Jina, YOLO, GLiNER, …). Each is tagged with the endpoint it serves and whether it runs on GPU or CPU.
Reasoning, long-context, multimodal chat. Served behind /v1/chat/completions.
Flagship reasoning + vision + chat. 1M-token context. Extended thinking.
/v1/chat/completions· 1M tokensStep-by-step reasoning. 128K context. Lighter than K2.5.
/v1/chat/completions· 128K tokensAuto-routes between 8K / 32K / 128K context based on input length.
/v1/chat/completions· Auto