Regtrace

A regression-first evaluation framework for LLM outputs.

About Regtrace

LLM quality degrades silently. A prompt change, a model update, a new feature: any of them can quietly break outputs that used to work.

Regtrace gives you golden sets, multi-dimensional scoring, and baseline comparison so you catch drift before your users do.

It is built on a single premise: the CLI is the product. Evaluation should be a version-controlled, reproducible pipeline step, not a dashboard you log into or a library you import. If a dashboard is added later, it will be a viewer for data the CLI already produces, not a dependency you have to ship.
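The baseline-comparison idea can be illustrated with a small sketch. This is not Regtrace's code or CLI. The score format (case ID to per-dimension scores in [0, 1]), the function names `find_regressions` and `gate`, and the 0.05 tolerance are all hypothetical. The sketch flags any golden-set case whose score on a dimension fell more than the tolerance below the baseline, prints a JSON report, and returns a CI-style exit code.

```python
import json

# Hypothetical score format (not Regtrace's): {case_id: {dimension: score in [0, 1]}}
Scores = dict[str, dict[str, float]]


def find_regressions(baseline: Scores, current: Scores, tolerance: float = 0.05) -> list[dict]:
    """Return one record per (case, dimension) whose score dropped by more than `tolerance`."""
    regressions = []
    for case_id, base_dims in baseline.items():
        cur_dims = current.get(case_id, {})
        for dim, base_score in base_dims.items():
            # A missing score counts as 0.0, so a case that vanished is flagged, not skipped.
            cur_score = cur_dims.get(dim, 0.0)
            if base_score - cur_score > tolerance:
                regressions.append({
                    "case": case_id,
                    "dimension": dim,
                    "baseline": base_score,
                    "current": cur_score,
                })
    return regressions


def gate(baseline: Scores, current: Scores, tolerance: float = 0.05) -> int:
    """Print a JSON report and return an exit code: 0 = no regressions, 1 = regression found."""
    regressions = find_regressions(baseline, current, tolerance)
    print(json.dumps({"regressions": regressions}, indent=2))
    return 1 if regressions else 0
```

A CI wrapper would then call `sys.exit(gate(baseline, current))` so any regression fails the pipeline step. For example, a baseline of `{"refund-policy": {"correctness": 0.9, "tone": 0.8}}` compared against `{"refund-policy": {"correctness": 0.7, "tone": 0.82}}` reports one regression, on `correctness`, and returns 1. The tone score improved, so it does not count.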

Comments

Marlon Martin, Indie Hacker

Hey App Builders PH! I built Regtrace because I was tired of discovering LLM quality regressions the hard way: users telling me something felt "off."

The existing tools fell into two buckets: Python-heavy libraries that don't fit non-Python stacks (DeepEval, RAGAS), or cloud/SaaS platforms that require sending data to a third-party dashboard (LangSmith, Braintrust). I wanted something that runs in CI as a step in the pipeline (exit codes, JSON output), ships as a standalone binary (no pip install, no npm install, no Docker), and automatically compares every run against a baseline.

So I built Regtrace. It's MIT licensed, has zero telemetry, and your data stays on your machine. I'd love your feedback and questions!
