ZeroOne / Custom SLM
Founder tier — 25 seats only · locked-for-life pricing

The AI that runs on your own hardware.

Train a custom small language model on your operations data. Ship it as a desktop app, a CLI, or an edge service. No per-token bills. No data leaving your machine. Built for Indian SMBs and operations-heavy businesses by ZeroOne D.O.T.S AI.

Your data stays on your machine · Export to GGUF / llama.cpp / Ollama · Zero per-inference cost
zeroone-slm · train
$ zeroone init my-ops-model
~ created project my-ops-model
~ workspace: ./models/my-ops-model
$ zeroone data add ./tickets.csv ./qa-pairs.jsonl
~ ingested 8,412 rows · 2.1 MB · 0 rows left the device
$ zeroone train --base qwen2.5-3b --epochs 3 --output gguf
~ fine-tuning on local GPU · loss 2.31 → 0.42
~ exported my-ops-model.gguf (1.8 GB)
$ zeroone serve my-ops-model.gguf --port 11434
~ ready · http://localhost:11434/v1/chat/completions
$
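Once `zeroone serve` is up, any OpenAI-compatible client can talk to the model. A minimal Python sketch using only the standard library; the endpoint and model name come from the demo above, and the request shape follows the standard chat-completions format (the prompt is illustrative):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Build an OpenAI-compatible chat-completions request for a local SLM."""
    url = f"{base_url}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # ops answers should stay close to the training data
    }
    return url, json.dumps(payload).encode("utf-8")

def ask(base_url: str, model: str, prompt: str) -> str:
    """POST to the local server and return the first reply."""
    url, body = build_chat_request(base_url, model, prompt)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Example (requires the server from the demo to be running):
# print(ask("http://localhost:11434", "my-ops-model", "What is the RMA SOP for vendor X?"))
```

No SDK, no API key: the same code that talks to a hosted API talks to the model on your laptop.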
The D.O.T.S thesis

Generic AI is a tax on every conversation.

A custom SLM is the only AI strategy that gets cheaper, more accurate, and more defensible as you use it. Four reasons why this matters for operations-heavy businesses.

D · Data

Your data trains it. Your data stays.

Tickets, call transcripts, PO data, SOPs, vendor emails — whatever you have. Encrypted in transit, processed on your hardware. We never see it.

O · Operations

Trained for the job you actually do.

Generic LLMs hallucinate on your part numbers, your supplier names, your warranty SOPs. A fine-tuned SLM doesn't — because it learned from yours.

T · Tech

Runs where you run.

Export GGUF, ONNX, or a Docker image. Deploy to a laptop, a Jetson, a 4-core VM in your colo. No internet required. No vendor lock-in.

S · Strategy

Zero per-token bills. Forever.

Train once, run forever. Compare against ₹40,000–₹2,00,000/month in GPT-4 API costs for a busy ops team. Your CFO will thank you.
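The payback math is blunt. A sketch with illustrative numbers: the API range is the one quoted above, while the one-time training cost (rented GPU hours plus eval setup) is a hypothetical assumption, not a quote.

```python
# Illustrative break-even for "train once, run forever" vs per-token API spend.
# All figures in INR. The API range is from the copy above; the one-time
# training cost is an assumed estimate, not a ZeroOne quote.
API_MONTHLY_LOW, API_MONTHLY_HIGH = 40_000, 200_000
ONE_TIME_TRAINING = 25_000   # assumed: rented GPU hours + eval setup
SERVING_MONTHLY = 0          # local inference: zero per-token cost

def months_to_break_even(one_time: int, api_monthly: int) -> float:
    """Months of avoided API spend needed to cover the one-off training cost."""
    return one_time / (api_monthly - SERVING_MONTHLY)

low_usage = months_to_break_even(ONE_TIME_TRAINING, API_MONTHLY_LOW)
heavy_usage = months_to_break_even(ONE_TIME_TRAINING, API_MONTHLY_HIGH)
print(f"break-even: {heavy_usage:.2f} to {low_usage:.2f} months")
```

Under these assumptions the model pays for itself in under a month; after that, every inference is free.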

How it works

Four steps. No ML PhD required.

Custom SLM is the studio + runtime we wished existed when we built our first client model at ZeroOne. It's now the same pipeline we use internally.

01

Bring your data

CSV, JSONL, Markdown, PDFs, call recordings, support tickets, SOPs. The studio normalizes it into instruction-pair training format. Bad rows get flagged, not dropped silently.

~ 200–10,000 rows is the workable range. We'll tell you if you need more.
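"Instruction-pair training format" concretely means each row becomes one prompt/response JSON line. A hypothetical sketch of the normalization step; the column names (`question`, `resolution`) are assumptions about your CSV, not a fixed schema:

```python
import csv
import io
import json

def rows_to_jsonl(csv_text: str, prompt_col: str, response_col: str) -> tuple[list[str], int]:
    """Convert ticket rows into instruction-pair JSONL; flag (not drop) bad rows."""
    lines, flagged = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        prompt = (row.get(prompt_col) or "").strip()
        response = (row.get(response_col) or "").strip()
        if not prompt or not response:
            flagged += 1  # flagged for review, never silently discarded
            continue
        lines.append(json.dumps({"instruction": prompt, "response": response}))
    return lines, flagged

# Two sample rows: one complete, one with a missing question.
sample = (
    "question,resolution\n"
    "How do I reset the PO status?,Use the ops console > PO > Reset.\n"
    ",missing question\n"
)
pairs, flagged = rows_to_jsonl(sample, "question", "resolution")
```

Here the first row becomes a training pair and the second is counted in `flagged`, matching the "flagged, not dropped silently" behavior described above.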

02

Pick a base model

Qwen 2.5 (3B / 7B), Llama 3.2 (1B / 3B), Phi 3.5, Gemma 2, or our DOTS-tuned starter. Chosen to match your hardware target — desktop, edge, server — and your latency budget.

Hindi · Marathi · Gujarati · Tamil support varies by base. We benchmark for you.

03

Fine-tune on your hardware

LoRA or QLoRA fine-tuning on your GPU, in your colo, or on rented H100 hours. Loss curves, eval-set accuracy, hallucination rate against your hold-out — all visible.

Train time: 20 minutes (3B, LoRA, 1k rows) → 6 hours (7B, full FT, 10k rows).
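Why LoRA makes this step cheap: instead of updating a full d_out × d_in weight matrix, you train two small low-rank factors, B (d_out × r) and A (r × d_in). A back-of-the-envelope sketch for a single projection layer; the 3072-wide dimension is illustrative of a 3B-class model, not a spec:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> tuple[int, int, float]:
    """Compare trainable parameter counts: full fine-tune vs LoRA, one matrix."""
    full = d_in * d_out            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # only A (rank x d_in) and B (d_out x rank)
    return full, lora, full / lora

# One 3072x3072 projection at LoRA rank 16 (both numbers illustrative):
full, lora, ratio = lora_trainable_params(3072, 3072, 16)
print(f"full: {full:,}  lora: {lora:,}  ~{ratio:.0f}x fewer trainable params")
```

Two orders of magnitude fewer trainable parameters per layer is what turns "6 hours on rented hardware" into "20 minutes on your own GPU".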

04

Ship anywhere

Export GGUF for llama.cpp / Ollama, ONNX for edge devices, or a Docker image with a built-in OpenAI-compatible server. Update without re-training. Roll back without losing data.

Already wired for: Mac laptops · Jetson Orin · 4-core VMs · Raspberry Pi 5.
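For the Ollama path specifically, shipping the exported GGUF is a short Modelfile away. A minimal sketch; the system prompt and temperature here are illustrative choices, not defaults:

```
FROM ./my-ops-model.gguf
PARAMETER temperature 0.2
SYSTEM "Answer using our internal SOPs. Say 'not in SOP' when unsure."
```

Then `ollama create my-ops-model -f Modelfile` registers it locally and `ollama run my-ops-model` serves it, with no internet connection required.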

Roadmap

Built in public. Ship dates, not promises.

Four phases. Studio is the day-one product. Hub, Cloud, and Vault unlock as the waitlist converts. Founder seats get a vote on prioritization and a forever-locked rate as we add capability.

Live

Studio

Available in v0.1

Web-based fine-tuning studio. Dataset upload, training runs, eval dashboards, GGUF export. The piece you need to ship your first model.

Next

Hub

Q3 2026

Public + private model registry. Pin a base model, share fine-tunes with your team, fork from community models. Built on Hugging Face conventions with a privacy layer.

Later

Cloud

Q4 2026

Optional hosted inference for teams that don't want to manage GPUs. Per-second billing, OpenAI-compatible API, EU + India data residency.

Later

Vault

Q1 2027

Encrypted dataset vault with row-level audit. SOC 2 + ISO 27001 path. The piece your CISO will want signed off before scaling.

Pricing

One tier. Twenty-five seats. Then it closes.

We don't know what the right price is yet. The first 25 founders pay ZeroOne to figure that out with us, and lock in their rate forever in exchange. After 25 seats, the next tier is announced, and it won't be this cheap.

Founder

25 seats only

For founders, CTOs, and ops leads at SMBs that want to own their AI stack instead of renting it.

₹4,999 / month

₹59,988 / year · billed monthly or annually · locked for life

  • Up to 10 fine-tuning runs per month
  • Up to 3 production models exported
  • Train on Qwen, Llama 3.2, Phi 3.5, Gemma 2 — and ZeroOne starter bases
  • GGUF · ONNX · Docker export · all formats included
  • Direct Slack channel with the founding ZeroOne team
  • Vote on Hub, Cloud, Vault roadmap priorities
  • Forever-locked pricing — no renewal hikes, ever
  • Migration support if you're moving off GPT-4 / Claude / Gemini APIs
Why one tier?

Tiered pricing on a pre-launch product is theater. We'd rather tell you what we don't know. We don't know what the right per-seat price is at scale. We don't know if you need 10 runs/month or 50.

What we do know: 25 founders paying ₹4,999/month each gives us ₹15L/year — enough runway to ship Studio and Hub without pretending we're an enterprise platform on day one.

Not a founder? You can still join the free waitlist.

We'll reach out when general access opens. No card, no commitment.

FAQ

The questions worth asking.

Why a custom SLM instead of GPT-4?

Cost, control, and accuracy on your domain. A 3B fine-tuned model on your operations data routinely outperforms GPT-4 on your specific tasks at 100× lower per-inference cost, with zero data leaving your machine. You give up generality; you gain ownership.

Can I train on consumer hardware?

Yes. Small bases (1–3B) train fine on a Mac M3 / M4 in a few hours, or on a rented L4 GPU hour (~₹100). For larger models we can rent H100 time for you transparently. The Studio shows you the cheapest option for each run.

How much data do I need?

200 high-quality instruction-response pairs is the floor for a useful fine-tune on a domain task. 1,000–5,000 is the sweet spot. We help you bootstrap from raw data (tickets, transcripts, SOPs) into the right format.

Will it work in Hindi and other Indian languages?

Depends on the base model. Llama 3.2 and Qwen 2.5 both have credible Hindi performance. Tamil and Bengali are weaker. We benchmark each base on your eval set before training so you know what you're getting.

Where does my data go?

Nowhere we don't tell you. Local training stays on your machine, full stop. If you rent GPU through us, data is uploaded encrypted, processed in an ephemeral container, and deleted on run completion. Vault (Q1 2027) adds row-level audit + SOC 2 path.

What if the fine-tuned model isn't accurate enough?

The Studio shows you eval-set accuracy, hallucination rate against your hold-out, and per-example failures before you export. We help you diagnose: more data, different base, different hyperparameters. Bad runs don't burn your credits.

How is this different from other fine-tuning platforms?

Adjacent. Those are excellent for indie devs in the US/EU. We're focused on Indian SMBs and operations-heavy businesses — manufacturing, logistics, BPO, fintech ops — where the use cases are different (regional language, on-prem deploy, INR economics). We're betting that market is underserved.

Can ZeroOne do the work for me?

Yes. The consulting layer at zeroonedotsai.consulting offers fine-tuning, eval-set construction, and deployment as a service — for teams that want results, not tools. The platform powers both self-serve and consulting work.

Stop renting AI. Own it.

Twenty-five founder seats. After they're gone, the next tier is announced and won't be this cheap. The waitlist is always free — but founders ship first and get a vote on what we build next.