K3 · Knowledge Storage

Augmented S3 that understands your data.

One S3-compatible bucket. Three things you can do with every object that lands in it: search it by meaning (RAG), transform it with an AI pipeline, or analyze it across millions of siblings — all without standing up a single extra service.

S3

Store and retrieve objects by key.

K3

Store objects and retrieve them by meaning, transform them with AI pipelines, and analyze them at scale — same auth, same ACLs, same bucket.

Three jobs, one bucket

RAG. Transform. Analyze.
All on the same object.

K3 isn’t a vector database with extras. It’s a bucket where every object is searchable by meaning, processable by any AI pipeline, and analyzable across millions of siblings — using the same rules engine, the same auth, and the same storage you already point your S3 SDK at.

RAG

Find what's relevant, instantly.

Hybrid semantic search across your bucket — dense vectors, BM25, multimodal queries (text, image, audio, video), and reranking. The full RAG stack, just sitting there waiting for a query the moment your first object lands.

Transform

Run an AI pipeline on every upload.

Trigger a Scriptum pipeline on every object that matches your rules — summarize, redact, caption, transcribe, classify, extract structured fields. Outputs land back in the same bucket under a derived prefix like `summaries/`, ready to query, link, or feed back into RAG.
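
As a concrete illustration of the derived-prefix convention (the `summaries/` prefix comes from the copy above; the `.json` suffix and the helper itself are illustrative assumptions, not the K3 API):

```python
def derived_key(source_key: str, prefix: str = "summaries/", suffix: str = ".json") -> str:
    """Build the output key for a transform result: same bucket, derived prefix.

    The prefix/suffix convention here is a sketch, not K3's documented scheme.
    """
    return f"{prefix}{source_key}{suffix}"

# A summary of papers/transformer.pdf lands back in the same bucket:
summary_key = derived_key("papers/transformer.pdf")
# summaries/papers/transformer.pdf.json
```

Because the output is an ordinary object in the same bucket, it can itself match ingest rules and feed back into RAG.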

Analyze

Make sense of the whole bucket.

Run Scriptum pipelines across many objects at once — anomaly detection, batch scoring, aggregate extraction, dataset construction. Same rules engine, same change detection. Outputs become first-class objects you can search, transform again, or export.

Outcomes

From days to minutes,
in every industry.

Six industries, one bucket. Every card shows the time saved and the launch-day pipelines that power it — all shipping today, all in the same Augmented S3.

Legal
M&A diligence
6 months → a weekend

Hybrid search catches semantic clause variants keyword review misses entirely.

Contract extraction · OCR + Layout · Entities + PII · Summarization

Finance
Compliance + AP
Weeks per audit query → seconds

Every answer carries a citation chain: policy → regulation → guidance.

Invoice/Receipt parsing · Table extraction · Translation · Summarization

Insurance
Claims + fraud
30-day claim cycle → hours

Auto-PII redaction and fraud signal scoring run inline during ingest, not weeks later.

OCR + Layout · Entities + PII · Image understanding · Sentiment + Intent

Engineering
Defect intelligence
30% of engineer time spent searching → minutes per query

Defect patterns auto-cluster across years of FMEA reports — root causes that no single engineer would have spotted.

OCR + Layout · Code intelligence · Image understanding · Classification

Tech / AI
RAG-as-a-service
6-month RAG build → 1 day

Multi-tenant + sovereign by default — your enterprise customers get data residency for free.

Modality-routed embedding · Summarization · Resume/CV parsing · Classification

Defense
Source material triage
Weeks of manual triage → hours

Foreign-language source material is auto-translated and indexed alongside English; multiple classification levels are handled out of one bucket.

ASR + Diarization · Translation · OCR + Layout · Image understanding · Classification

What you get
on day one.

The platform underneath the trio. Every capability below is shared across RAG, Transform, and Analyze — wired into the same S3-compatible bucket you’d point your existing tooling at. No staging, no migration, no glue code.

Hybrid search

Vector + BM25, fused with RRF

Dense-vector and sparse BM25 retrieval run in parallel and merge with Reciprocal Rank Fusion. Optional Jina Reranker v2 on top via Ignite. Pick a collection, ship a query.
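
The fusion step can be sketched in a few lines. This is a minimal, illustrative RRF implementation, not K3 internals; the document IDs and the `k=60` constant (a common default in the literature) are assumptions:

```python
def rrf_fuse(result_lists, k=60):
    """Fuse several ranked lists: each doc scores sum(1 / (k + rank)) over
    every list it appears in; higher total wins. Rank is 1-based."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc-a", "doc-b", "doc-c"]   # vector ranking
sparse = ["doc-b", "doc-d", "doc-a"]  # BM25 ranking
fused = rrf_fuse([dense, sparse])
# doc-b ranks first: 1/61 + 1/62 edges out doc-a's 1/61 + 1/63
```

RRF needs only ranks, never raw scores, which is why dense and sparse lists with incomparable scoring can be merged directly.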

Multimodal

Text, image, audio, video — one query

Multimodal collections accept any modality the embedding pipeline supports. Search a fleet of PDFs and a folder of screenshots in the same call, with the same scoring.

Ingest rules

One object, many pipelines

Per-bucket rules match on glob, MIME, and size. Each rule targets a Scriptum pipeline — embedding, summarization, classification, extraction, anything you can write — and routes results to a collection or a derived prefix. Same engine for RAG, Transform, and Analyze.
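
In miniature, rule matching looks something like the sketch below. The rule fields (`glob`, `mime`, `max_bytes`, `pipeline`) and pipeline names are assumptions for illustration, not K3's actual schema:

```python
import fnmatch

# Hypothetical per-bucket ingest rules; field names are illustrative.
RULES = [
    {"glob": "papers/*.pdf", "mime": "application/pdf",
     "max_bytes": 50_000_000, "pipeline": "embed-and-index"},
    {"glob": "calls/*", "mime": "audio/*",
     "max_bytes": 500_000_000, "pipeline": "transcribe-summarize"},
]

def matching_pipelines(key, mime, size):
    """Return every pipeline whose rule matches the object.

    One object can match several rules and fan out to several pipelines."""
    return [r["pipeline"] for r in RULES
            if fnmatch.fnmatch(key, r["glob"])
            and fnmatch.fnmatch(mime, r["mime"])
            and size <= r["max_bytes"]]

jobs = matching_pipelines("papers/transformer.pdf", "application/pdf", 2_000_000)
# ['embed-and-index']
```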

Auth & ACL

S3-style policies, presigned URLs

Private/public/custom bucket ACLs evaluated with S3-style policy semantics. Three flavors of presigned URL: SigV4, SigV2, and K3 token-based. Drop-in for everything your S3 SDK already does.
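
To make the token-based flavor concrete, here is a sketch of how an HMAC-signed, expiring URL can work in principle. This is NOT K3's actual token format; the secret, query-parameter names, and signing payload are all illustrative assumptions:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"org-service-secret"  # stand-in for a real per-org credential

def presign(bucket, key, expires_in=3600, now=None):
    """Return a URL that grants GET access to one object until expiry.

    The server recomputes the HMAC over the same payload and rejects the
    request if the token differs or the expiry has passed."""
    expiry = int(now if now is not None else time.time()) + expires_in
    payload = f"GET\n{bucket}\n{key}\n{expiry}".encode()
    token = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    qs = urlencode({"expires": expiry, "token": token})
    return f"https://k3.dodil.io/{bucket}/{key}?{qs}"

url = presign("my-knowledge", "papers/transformer.pdf", now=0)
```

The SigV4 and SigV2 flavors follow the standard AWS signing flows, so existing S3 SDK helpers keep working unchanged.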

Multi-tenant

Dodil IAM, service ID + secret

Plugs into Dodil IAM out of the box — configure each service with a service ID and secret, and every request is scoped to an org via IAM JWT or SigV4 lookup. Tenant isolation runs end-to-end across storage, search, processing, and credentials — two organizations on the same cluster never see each other's data.

Sync engine

Delta detection, on a schedule

Periodic and on-demand scans with etag-based change detection — only new or changed objects get re-processed, so a bucket of a million files isn't a bucket of a million pipeline runs. Self-healing under load, with full job visibility.
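
Etag-based delta detection in miniature: only new keys, or keys whose etag changed, get re-queued. The dict-based state store is a stand-in for whatever K3 actually persists:

```python
def delta(previous, current):
    """previous/current map object key -> etag; return keys to (re)process.

    A key is queued when it is new or its etag differs from last scan."""
    return [key for key, etag in current.items()
            if previous.get(key) != etag]

before = {"a.pdf": "etag-1", "b.pdf": "etag-2"}
after = {"a.pdf": "etag-1", "b.pdf": "etag-9", "c.pdf": "etag-3"}
to_process = delta(before, after)
# ['b.pdf', 'c.pdf']: unchanged a.pdf is skipped entirely
```

This is why a million-object bucket does not mean a million pipeline runs per scan: the steady-state cost tracks the change rate, not the bucket size.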

The flow

Upload once. Indexed everywhere.

Step 01

Upload

Drop the file in via the S3 API you already use. K3 re-signs the request with your org credentials, persists the object, and queues a discovery event the moment it lands.

Step 02

Discover

K3 matches the new object against your rules — glob patterns, MIME types, size limits — and queues an ingest job for each match.

Step 03

Ingest

Scriptum chunks, normalizes, and embeds the content. Vectors land in your bucket's vector index; job status is tracked end-to-end so you can see exactly where every object is.

Step 04

Search

Queries embed via Scriptum, fan out across collections, fuse dense and sparse hits with RRF, and optionally rerank for the highest-quality top-K.

Four calls. Whole platform.

Wire-compatible with the S3 API for upload, REST for everything else. Your existing tooling and SDKs stay the same.

# Wire-compatible with the S3 API. Re-point and you're done.
aws s3api put-object \
  --bucket my-knowledge \
  --key papers/transformer.pdf \
  --body transformer.pdf \
  --endpoint-url https://k3.dodil.io

# K3 re-signs the request with your org credentials, persists the object,
# and queues a discovery event the moment it lands.

Try it

Search a live demo bucket.

A small bucket, a tiny ranker, and the same modes the real VectorSearch RPC exposes. Toggle Auto, Vector, or Hybrid to see how K3 routes the same query through dense embeddings, BM25 sparse, or both fused with RRF.

bucket: demo-knowledge
8 indexed objects · 1 collection · jina-v3
rag-eval-bench.csv · HYBRID · score 2.70

Hybrid (vector+BM25) beat pure vector by 11% on factoid Qs and 4% on synthesis Qs across 3k evals…

evaluation · rag · hybrid · vector

vector-tuning-notes.ipynb · HYBRID · score 1.85

Switching from cosine to dot-product cut tail latency by 22% on the Jina v3 collection. Recall held at 0.94…

vector · embedding · jina · latency

customer-call-acme.txt · HYBRID · score 1.35

ACME wants to ingest 200k contracts/quarter and run clause similarity search across the entire data room…

customer · contracts · clause · search

Demo bucket runs entirely in your browser — no requests leave the page.

Built to run, not just to demo

Everything you need to put it in production.

K3 isn’t a research artifact. The API, the console, the deployment story, and the reliable processing under load are already wired up — drop it in your cluster or run it on Dodil Cloud.

Programmable everything

Every operation in K3 — buckets, sources, ingest rules, search, ACL, presigned URLs — has a first-class API. Wire it into your platform with the SDK of your choice; the same surface that powers the console is what you build against.

Hosted console

A working web console for managing buckets, ingest rules, vector collections, search, and access policies. Same auth, same multi-tenancy. No need to build your own UI before you can ship.

Cluster-ready

Deploys cleanly into your Kubernetes cluster — autoscaling, health checks, hardened images, secrets handled. Or skip the setup entirely and run it on Dodil Cloud.

Reliable processing

Async ingestion with durable job tracking and self-healing under load. Every upload is queued, traced, and visible end-to-end — jobs don't get lost when a worker restarts or a downstream service blips.

Ready to give your data meaning?

Join Early Access and get a knowledge bucket spun up on the London control plane — with hybrid search, multimodal retrieval, and the full ops stack ready on day one.

DODIL · From data to intelligence.
© 2026 Circle Technologies Pte Ltd. All rights reserved. Built for the AI era.