A regression-first evaluation framework for LLM outputs.
LLM quality degrades silently. A prompt change, a model update, a new feature: any of these can quietly break outputs that used to work.
Regtrace gives you golden sets, multi-dimensional scoring, and baseline comparison so you catch drift before your users do.
It is built on a single premise: the CLI is the product. Evaluation should be a version-controlled, reproducible pipeline step, not a dashboard you log into or a library you import. When a dashboard comes, it will be a viewer for data the CLI already produces, not a dependency you need to ship.
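To make the baseline-comparison idea concrete, here is a minimal sketch in Python. It is not Regtrace's implementation or file format: the case IDs, dimension names, scores, tolerance, and the `find_regressions` and `run` helpers are all hypothetical, chosen only to show how per-dimension scores on a golden set could be compared against a stored baseline and reduced to a JSON report plus an exit code.

```python
import json

# Hypothetical per-dimension scores (0.0 to 1.0) for each golden-set case.
# Names and numbers are illustrative only, not Regtrace's actual schema.
BASELINE = {
    "refund-policy": {"accuracy": 0.92, "tone": 0.88},
    "order-status": {"accuracy": 0.95, "tone": 0.90},
}
CURRENT = {
    "refund-policy": {"accuracy": 0.91, "tone": 0.70},
    "order-status": {"accuracy": 0.96, "tone": 0.91},
}


def find_regressions(baseline, current, tolerance=0.05):
    """Return each (case, dimension) whose score dropped by more than tolerance."""
    regressions = []
    for case_id, base_scores in baseline.items():
        for dimension, base_value in base_scores.items():
            new_value = current.get(case_id, {}).get(dimension)
            # A score missing from the current run is treated as a regression.
            if new_value is None or base_value - new_value > tolerance:
                regressions.append({
                    "case": case_id,
                    "dimension": dimension,
                    "baseline": base_value,
                    "current": new_value,
                })
    return regressions


def run(baseline, current, tolerance=0.05):
    """Build a JSON-serializable report and the exit code a CI step would use."""
    regressions = find_regressions(baseline, current, tolerance)
    report = {"passed": not regressions, "regressions": regressions}
    exit_code = 1 if regressions else 0
    return report, exit_code


report, exit_code = run(BASELINE, CURRENT)
print(json.dumps(report, indent=2))
# A CI wrapper would hand exit_code to sys.exit() so the pipeline step fails on drift.
```

In this sketch only the "tone" score for "refund-policy" falls by more than the tolerance, so the report lists that single regression and the exit code is non-zero. Small fluctuations, such as accuracy moving from 0.92 to 0.91, stay within tolerance and do not fail the run.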
Hey App Builders PH! I built Regtrace because I was tired of discovering LLM quality regressions the hard way: users telling me something felt "off." The existing tools fell into two buckets: Python-heavy libraries that don't fit in non-Python stacks (DeepEval, RAGAS), or cloud/SaaS platforms that require sending data to a third-party dashboard (LangSmith, Braintrust). I wanted something that runs in CI as a pipeline step (exit codes, JSON output), ships as a standalone binary (no pip install, no npm install, no Docker), and compares every run against a baseline automatically. So I built Regtrace. It's MIT licensed, has zero telemetry, and your data stays on your machine. Would love your feedback and questions!