Build in public · 4 min read

Notes from the Build

What itstru does, what's in the pipeline, and what we're starting to measure.

By René Sauvé · Co-founder, itstru.ai
AI-drafted, human-edited: how this was written

An AI assistant produced the first draft from a structured outline. A human reviewed every claim, replaced or removed anything that couldn't be backed by a published source, and rewrote sections for clarity. Every external claim has a citation in the references list at the end of this post.

What itstru does

itstru takes a business idea and turns it into a validated product — market research with real citations, a business summary that stress-tests viability, a clickable prototype, and eventually a full-fledged product.

The AI drafts. A human reviews before anything ships. No subscription, no retainer.

What's in the pipeline

Most of the development time goes into the agents and the checks that run on what they produce. Here's what exists today.

Source tier classifier

Every URL cited in a research report gets classified A through D based on the publisher. Government data, SEC filings, named analyst firms — Tier A. Trade publications, peer-reviewed journals — Tier B. Company blogs, Q&A sites — Tier C. Forums, content farms — Tier D.

If the report doesn't hit a minimum share of A/B sources, the pipeline regenerates it. A citation to a D-tier source fails the report outright.

It's a domain list and some thresholds. But it stops the AI from leaning on Reddit threads as primary evidence for market size — which it absolutely will do if you let it.
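
In sketch form, assuming full URLs and an exact-host lookup (the domains, tiers, and threshold below are illustrative placeholders, not the production lists):

```python
from urllib.parse import urlparse

# Illustrative tier assignments; the real list is longer and curated.
TIER_BY_DOMAIN = {
    "sec.gov": "A", "bls.gov": "A", "gartner.com": "A",   # gov data, filings, analysts
    "nature.com": "B", "techcrunch.com": "B",             # journals, trade press
    "medium.com": "C", "stackoverflow.com": "C",          # company blogs, Q&A sites
    "reddit.com": "D",                                    # forums
}

MIN_AB_SHARE = 0.6  # placeholder threshold, not the production value

def tier_of(url: str) -> str:
    host = urlparse(url).netloc.lower().removeprefix("www.")
    return TIER_BY_DOMAIN.get(host, "C")  # unknown publishers default to C

def report_passes(cited_urls: list[str]) -> bool:
    if not cited_urls:
        return False                       # no sources, no report
    tiers = [tier_of(u) for u in cited_urls]
    if "D" in tiers:                       # a D-tier source fails the report outright
        return False
    ab_share = sum(t in ("A", "B") for t in tiers) / len(tiers)
    return ab_share >= MIN_AB_SHARE        # otherwise enforce the A/B minimum
```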

Claim verifier

For every numeric claim with a citation marker, the pipeline fetches the cited URL and checks whether the number actually appears on that page (within ±5%, to absorb rounding and formatting differences like "$42 billion" vs "42,300,000,000").

It catches mismatches — the AI says $42B, the source page says $4.2B. It's a sanity check on whether citations point to something real.
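
The fetch-and-compare shape, in sketch form. The parser below is a rough illustration, not the pipeline's real extraction, and it assumes the cited page is fetchable as text and the claimed value is nonzero:

```python
import re
import urllib.request

TOLERANCE = 0.05  # ±5% absorbs rounding like "$42 billion" vs "42,300,000,000"
MULTIPLIERS = {"billion": 1e9, "million": 1e6, "b": 1e9, "m": 1e6, "": 1.0}

def parse_dollar_amounts(text: str) -> list[float]:
    """Rough pass at pulling figures like '$42 billion' or '42,300,000,000'."""
    pattern = r"\$?(\d[\d,]*(?:\.\d+)?)\s*(billion|million|[bm])?\b"
    return [float(num.replace(",", "")) * MULTIPLIERS[unit.lower()]
            for num, unit in re.findall(pattern, text, flags=re.IGNORECASE)]

def claim_supported(claimed: float, cited_url: str) -> bool:
    """True if some number on the cited page lands within ±5% of the claim."""
    page = urllib.request.urlopen(cited_url, timeout=10).read().decode("utf-8", "ignore")
    return any(abs(found - claimed) / claimed <= TOLERANCE
               for found in parse_dollar_amounts(page))
```

A real version needs HTML stripping, units beyond dollars, and retries, but the fetch, parse, and ±5% comparison are the whole idea.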

Prototype flow validator

The clickable prototype gets checked for three things: every link target exists, every screen is reachable from the entry point, every button is wired to something. If anything fails, the prototype goes through a targeted repair pass before it reaches review.

This is structural. It doesn't say the prototype is good. It says it isn't broken.
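
All three checks are cheap once the prototype is flattened to screens and button targets. A minimal sketch, assuming that dict-of-lists shape (my simplification, not the actual prototype format):

```python
from collections import deque

def validate_prototype(screens: dict[str, list], entry: str) -> list[str]:
    """Return structural problems; an empty list means the prototype passes."""
    problems = []
    # Checks 1 and 3: every link target exists, every button is wired.
    for name, targets in screens.items():
        for t in targets:
            if t is None:
                problems.append(f"unwired button on screen '{name}'")
            elif t not in screens:
                problems.append(f"'{name}' links to missing screen '{t}'")
    # Check 2: every screen is reachable from the entry point (plain BFS).
    seen, queue = {entry}, deque([entry])
    while queue:
        for t in screens.get(queue.popleft(), []):
            if t in screens and t not in seen:
                seen.add(t)
                queue.append(t)
    problems += [f"screen '{s}' unreachable from '{entry}'"
                 for s in screens if s not in seen]
    return problems

# An orphaned screen is the classic failure:
# validate_prototype({"home": ["pricing"], "pricing": [], "admin": ["home"]}, "home")
# -> ["screen 'admin' unreachable from 'home'"]
```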

Human review

After automated checks pass, the pipeline suspends. Nothing ships without someone reading the research, clicking through the prototype, and approving it.
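
Mechanically it's a gate, not a judgment call in code. Something like this state machine, where the states and transitions are illustrative rather than the pipeline's actual code:

```python
from enum import Enum, auto

class RunState(Enum):
    CHECKS_PASSED = auto()
    AWAITING_REVIEW = auto()
    APPROVED = auto()

def advance(state: RunState, human_approved: bool = False) -> RunState:
    if state is RunState.CHECKS_PASSED:
        return RunState.AWAITING_REVIEW      # the run parks here indefinitely
    if state is RunState.AWAITING_REVIEW and human_approved:
        return RunState.APPROVED             # only an explicit approval moves it
    return state                             # no other path ships anything
```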

Starting to benchmark

I've been running projects through the pipeline and I'm starting to track aggregate numbers — things like compute time to human review, citation density, source tier distribution, and deliverable size.
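
Per run, the record looks roughly like this (field names are mine, not the actual schema):

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    compute_seconds_to_review: float  # compute time before human review starts
    citations_per_1k_words: float     # citation density
    tier_counts: dict[str, int]       # source tier distribution, e.g. {"A": 12, "B": 7}
    deliverable_words: int            # deliverable size
```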

I'm not publishing those yet. The sample size is small and I want to be confident the numbers are representative before putting them on a page. What I can say: the pipeline is fast, the reports are substantial, and the automated checks catch real problems before they reach me.

When I have enough data to feel good about it, I'll publish a benchmarks page with full methodology — how many runs, what counts as valid, what each number actually measures. No cherry-picking.

What I've learned so far

A few things that surprised me building this:

  • AI drafts are way better with constraints. The source-tier classifier was the single biggest quality jump. Once the AI knows low-quality sources will get rejected, it stops reaching for them in the first place.
  • Automated checks compound. Each validator I add makes the next pipeline run better without touching any prompts. The quality floor just rises.
  • The human review step matters more than I expected. Not because the AI gets things wrong often, but because "technically correct" and "useful for this specific founder" aren't the same thing.

Why this is exciting

AI is genuinely good at structured drafting work now. Not "replace your designer" good — more like "produce a first draft that a skilled person can refine in hours instead of days" good.

That changes the economics of validation. If you want to know whether a business idea has legs today, the options are: do it yourself with ChatGPT and hope, or hire an agency for $10K–$15K and wait six weeks. A pipeline with real checks and human review in the middle can deliver something better than the first option, and faster than the second at a fraction of the price.

Token costs keep dropping. Model capabilities keep improving. The same pipeline produces better output every few months without changing a line of code. And then changing the code makes it better still.

That's the bet. More to come.

Last updated May 5, 2026. Spot a number that's out of date? Email support@itstru.ai and we'll fix it.