
The right technique, identified before you spend on the wrong one.

$5–8k  ·  fixed scope  ·  2 weeks

Most AI adaptation failures aren’t execution failures. They’re selection failures — the wrong technique applied to a problem that needed something different.

In two weeks, we’ll assess your pipeline, data, and failure modes across the full adaptation stack: retrieval architecture, fine-tuning approaches, preference optimization, model compression. The output is a written technique recommendation you own outright — rationale, evaluation plan, and compute cost projections included — whether you build with us, another team, or not at all.

Start with an Intro Call →

Thirty minutes. No cost, no commitment. We work under NDA.

By Philip Stevens — production ML specialist, previously at Agoda and Quantcast.

The diagnostic pays for itself in the first decision it prevents.

A RAG refactor that was actually a fine-tune problem. A LoRA adapter for a task that needed better retrieval. Teams routinely commit $25–60k and two months to the wrong approach — not because they’re careless, but because the right technique isn’t obvious without a systematic diagnosis.

Every AI services company has a specialty. Fine-tuning shops recommend fine-tuning. RAG consultancies recommend RAG. Prompt engineers recommend more prompting. None of them run the diagnostic first. The recommendation follows from their default, not from your data.

At $5–8k, the diagnostic costs less than one engineering month. It defines what to build, what it will cost, and what success looks like — before any build commitment is made. Execution scope and pricing come from the report, so you go into a build decision with full information, not a guess made before anyone’s seen your data.

Every failed attempt makes the next one harder to justify internally. The diagnostic is how you avoid the first failure.

A written report you own outright — regardless of what you do next.

After two weeks, you’ll have everything you need to make a fully informed build decision: what to build, why, what success looks like, and what it will cost.

01

Technique selection with supporting rationale

Not just a recommendation — the analysis that backs it. Why this technique for your specific data characteristics, quality requirements, and latency constraints. Why the alternatives were ruled out. A document your team can read, challenge, and act on without us in the room.

02

Root cause analysis of current failure modes

A systematic breakdown of where and why your current approach is failing — whether the bottleneck is in retrieval, model weights, pipeline design, or data quality. The diagnosis distinguishes between a retrieval problem solvable with chunking changes and a model-layer problem that requires training signal. These are not the same fix.
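As a rough illustration of that distinction, the sketch below sorts failure cases by whether the passage containing the correct answer was retrieved at all. It assumes each logged failure records that passage and the passages the pipeline returned. The `FailureCase` record, its field names, and the sample data are hypothetical, and a real diagnosis would look at far more than this one signal.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailureCase:
    # Hypothetical record of one production failure; field names are illustrative.
    query: str
    gold_passage_id: str       # passage that contains the correct answer
    retrieved_ids: list[str]   # passages the pipeline actually retrieved


def triage(case: FailureCase) -> str:
    """Label a failure by where in the stack it most likely originates."""
    # If the right passage never reached the model, the model had no chance:
    # this points at retrieval (chunking, indexing, ranking).
    if case.gold_passage_id not in case.retrieved_ids:
        return "retrieval"
    # The right passage was in context and the answer was still wrong:
    # this points at the model layer (prompting or training signal).
    return "model"


def failure_profile(cases: list[FailureCase]) -> dict[str, float]:
    """Share of failures attributed to each layer."""
    if not cases:
        return {"retrieval": 0.0, "model": 0.0}
    counts = Counter(triage(c) for c in cases)
    return {label: counts[label] / len(cases) for label in ("retrieval", "model")}


# Made-up sample data, for illustration only.
cases = [
    FailureCase("q1", "p1", ["p7", "p9"]),
    FailureCase("q2", "p2", ["p2", "p4"]),
    FailureCase("q3", "p3", ["p8"]),
    FailureCase("q4", "p4", ["p5", "p6"]),
]
profile = failure_profile(cases)
print(profile)
```

A profile dominated by "retrieval" suggests fixing what reaches the model before changing its weights. A profile dominated by "model" suggests the opposite.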

03

Evaluation plan with acceptance criteria

A concrete definition of “working” for your problem: the eval set structure, benchmark approach, and acceptance thresholds. Built before execution starts, so you know what you’re measuring toward — and can verify the build delivered it. No eval plan means no way to know if you’re done.
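One way to picture acceptance criteria is as a small, explicit check that a build either passes or fails. The sketch below is a minimal example under assumed names: the metrics (`exact_match`, `p95_latency_ms`), the thresholds, and the measured results are all hypothetical placeholders, not figures from any real engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    # One acceptance threshold; metric names and values here are hypothetical.
    metric: str
    threshold: float
    higher_is_better: bool = True

    def passes(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def evaluate(criteria: list[Criterion], results: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per metric for a set of measured results."""
    return {c.metric: c.passes(results[c.metric]) for c in criteria}


# Agreed before execution starts (illustrative numbers only).
criteria = [
    Criterion("exact_match", 0.85),
    Criterion("p95_latency_ms", 800, higher_is_better=False),
]

# Measured on the held-out eval set after the build (also illustrative).
results = {"exact_match": 0.88, "p95_latency_ms": 950}

report = evaluate(criteria, results)
done = all(report.values())
print(report, done)
```

Here the build clears the quality bar but misses the latency bar, so it is not done. Writing the thresholds down before execution is what makes that call unambiguous.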

04

Compute cost and timeline projections

Specific estimates for your problem type and data volume: training compute, infrastructure requirements, and calendar time for execution. What the build will cost, how long it will take, and what resources it requires — so you can budget and plan before you commit, not after.

Start with an Intro Call →

30 minutes to assess whether the diagnostic is the right fit for your problem.

Two weeks. Three stages. One written deliverable.

01

Data and pipeline review

We review your training data, current pipeline architecture, and production failure patterns. We identify what’s failing, how often, and under what conditions. This establishes the failure profile before any technique is considered.

02

Full-stack diagnosis

We assess the failure pattern against the complete adaptation stack: retrieval hardening, LoRA/QLoRA, full fine-tuning, DPO alignment, and model compression. We determine which technique class the problem fits, where in the stack the fix lives, and which alternatives can be ruled out and why.

03

Written report

A structured document covering the technique recommendation, decision rationale, alternatives considered, root cause analysis, evaluation plan with acceptance criteria, compute cost estimates, and build timeline. You own it outright. Take it to your internal team, take it to another vendor, or use it to scope a Baseweight execution engagement. There is no obligation either way.

Common questions

The intro call (30 minutes, free) is a scoping conversation — we assess whether the diagnostic is the right next step and what it would cover for your specific situation. The diagnostic is two weeks of technical work: systematic failure analysis, stack-wide assessment, and a written deliverable. The call comes first; the diagnostic follows if it’s the right fit.

If the diagnostic finds that you don’t need a new technique, we’ll say so. If better prompting or a minor RAG adjustment closes the gap, that’s the recommendation, even if it means no further engagement. We’ve ended engagements at the diagnostic stage; those clients refer others precisely because we didn’t oversell. Recommending unnecessary work would undermine everything the diagnostic is for.

We work under NDA and can operate within your infrastructure. For the diagnostic, we need access to representative failure cases and pipeline architecture — not necessarily your full training set. Data handling requirements are agreed before any access begins.

Yes, you can go straight to execution without the diagnostic. But execution without a prior diagnosis is how teams end up retrofitting after a failed attempt. The diagnostic is the faster path to a working build, not a gatekeeping step. If you’ve already run a thorough internal diagnosis and know exactly what you need, we can scope execution directly on the intro call.

The diagnostic report becomes the scope document for execution. Execution pricing is defined in the report, so there are no surprises when you decide to proceed. The diagnostic fee is not credited toward execution — it’s a standalone deliverable with standalone value.

If you’re not sure which technique your domain data needs, that’s what the diagnostic is for.

Book an intro call to discuss your current setup, where performance is falling short, and whether the diagnostic is the right next step. Thirty minutes. No cost, no commitment.

After you book, we’ll send a brief pre-call questionnaire so we can make the most of the 30 minutes.

Prefer email? phil@baseweight.co  ·  We work under NDA.
