Research
Notes from the training lab.
Benchmark methodology, training postmortems, and hardware notes. No hype — working notes from the team.
2026-04-21 · 2 min read
Training a 14B LLM on a Tesla V100 in 2026
How we fine-tune Qwen3-14B on a single V100 16GB without renting H100s. Unsloth, LoRA, and the unglamorous bits nobody writes about.
2026-04-19 · 2 min read
The A-rate benchmark: how we grade crypto LLMs
Our 45-question crypto benchmark is graded on an A–F scale by two reviewers plus a rubric matcher. Here's exactly how, and why it's harder to game than multiple-choice.
2026-04-17 · 2 min read
DPO regressed our model. Here's what happened.
We ran DPO on top of a strong SFT model (87% A-rate) and the score dropped to 78%. Full postmortem with logs, not marketing.
2026-04-15 · 2 min read
Open vs closed models on crypto Q&A: what the numbers actually say
Sovereign v2 (14B) beats GPT-4o and Claude on our 45-question crypto benchmark — but the interesting stories are in *where* and *why*.