HumanMachine
FAQs · 9/2/2025 · Last updated: 5/23/2026

How to Choose an AI Consultancy: A Buyer's Framework

FM Team

Most AI consultancy engagements that disappoint share a single root cause: the buyer didn't ask the right questions before signing. Every consultancy can show a polished deck and name-drop a few clients. The hard part is separating the firms that ship working systems from the firms that ship strategy memos and walk away.

This is a working buyer's framework: a scoring rubric, the red flags to watch for, and a small set of questions that reveal what kind of partner you're really hiring.

A scoring rubric for AI consultancies

Score each candidate from 1 (worst case) to 5 (best case) on each of the nine dimensions below, for a maximum of 45. A total under 30 is a real concern. A total over 38 is a strong fit.

| Criterion | 1 / 5 looks like | 5 / 5 looks like |
| --- | --- | --- |
| Discovery vs. delivery balance | 6–8 weeks of discovery before any code | 1–2 weeks of focused discovery, then a working prototype |
| What gets delivered | A strategy deck and recommendations | Working software, deployed in your environment |
| Team seniority | A PM fronting offshore juniors you never meet | Senior engineers doing the work directly, named on the contract |
| AI evaluation & quality | "We'll test it before launch" | Custom eval suites and structured logging built in from day one |
| Model choice & vendor neutrality | Locked into one provider regardless of fit | Claude, ChatGPT, Gemini evaluated per use case with clear rationale |
| Integration approach | One-off custom integrations for every tool | MCP servers, reusable patterns, agents that reach into your systems cleanly |
| Code & IP ownership | Licensed platform you must keep paying to access | You own every line of code, on your accounts, from day one |
| Ongoing maintenance | Hand-off then unavailable | Optional retainer OR a clean handoff with real documentation |
| Risk transparency | "Nothing should go wrong if you follow our process" | Names specific risks upfront with mitigation plans |
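
If you are comparing several firms, it can help to tally the rubric in a few lines of code rather than by hand. The sketch below is one way to do that in Python. The function name `score_candidate` and the "borderline" label for totals from 30 to 38 are assumptions made for illustration; the rubric itself only names the bands under 30 and over 38.

```python
# The nine rubric dimensions, in the order they appear in the table.
CRITERIA = [
    "Discovery vs. delivery balance",
    "What gets delivered",
    "Team seniority",
    "AI evaluation & quality",
    "Model choice & vendor neutrality",
    "Integration approach",
    "Code & IP ownership",
    "Ongoing maintenance",
    "Risk transparency",
]


def score_candidate(scores: dict[str, int]) -> tuple[int, str]:
    """Total a candidate's 1-5 scores and map the total to a verdict."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"Missing scores for: {missing}")
    for criterion in CRITERIA:
        if not 1 <= scores[criterion] <= 5:
            raise ValueError(f"{criterion}: score must be between 1 and 5")

    total = sum(scores[c] for c in CRITERIA)
    if total < 30:
        verdict = "real concern"
    elif total > 38:
        verdict = "strong fit"
    else:
        # Assumption: the rubric does not name the 30-38 band.
        verdict = "borderline"
    return total, verdict


if __name__ == "__main__":
    # Hypothetical candidate scored 4 on every dimension.
    example = {criterion: 4 for criterion in CRITERIA}
    print(score_candidate(example))
```

Scoring every dimension explicitly, and failing loudly when one is missing, keeps the comparison honest: a firm cannot look strong simply because nobody asked about evaluation or code ownership.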

Red flags to watch for

Any one of these alone isn't disqualifying. Three or more is.

  • "We're excited about AI." Excitement isn't capability. Ask for specifics.
  • Massive teams with unclear roles. Usually means you're paying for layered management.
  • AI as a buzzword. No specific tools, frameworks, or model names mentioned.
  • No mention of evaluation. If they can't tell you how they know the AI is working in production, they don't know either.
  • Vague code ownership. "We'll work that out in the SOW" is a no.
  • Hourly billing with no upper bound. Outcomes-based pricing aligns incentives. Hourly does the opposite.
  • No honest disqualifiers. A consultancy that says it's right for every problem is right for none.
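
The three-or-more rule is easy to apply consistently across candidates if you record which flags you actually observed. A minimal sketch follows, assuming short labels for each flag and a helper named `is_disqualified`; both are illustrative names, not part of the framework.

```python
# Short labels for the seven red flags listed above (labels are illustrative).
RED_FLAGS = (
    "excitement without specifics",
    "massive team with unclear roles",
    "AI as a buzzword",
    "no mention of evaluation",
    "vague code ownership",
    "hourly billing with no upper bound",
    "no honest disqualifiers",
)

# One flag alone isn't disqualifying; three or more is.
DISQUALIFYING_COUNT = 3


def is_disqualified(observed: set[str]) -> bool:
    """Return True when three or more known red flags were observed."""
    unknown = observed - set(RED_FLAGS)
    if unknown:
        raise ValueError(f"Unknown red flags: {sorted(unknown)}")
    return len(observed) >= DISQUALIFYING_COUNT


if __name__ == "__main__":
    seen = {"AI as a buzzword", "vague code ownership"}
    print(is_disqualified(seen))
```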

Six questions that reveal posture

The scoring rubric covers what to look for. These six questions tell you who you're actually dealing with. Ask all of them in a single conversation and pay attention to whether the answers are specific, honest, and grounded in real work.

What did you ship last quarter that's running in production today?

This separates the firms that build from the firms that talk. A good answer names a specific system, what it does, and how the client uses it. A bad answer is generic ("we recently helped a Fortune 500 client streamline their operations") or pivots into a deck. If they can't show you something running with users on it, the rest of the conversation doesn't matter much.

Show me an evaluation suite you built for an AI system.

Production AI is non-deterministic. The difference between a demo and a production system is whether you know when it breaks. A consultancy that can show you actual eval code, test cases, and logging dashboards is doing the work. A consultancy that can't is shipping demos that haven't been pressure-tested in front of real users yet.
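
If you have never seen an eval suite, the sketch below shows roughly the shape to expect, in Python. It is a toy under stated assumptions, not how any particular firm builds theirs: `EvalCase`, `run_evals`, and `stub_model` are hypothetical names, and the stub stands in for a real model call. The ideas to look for are named test cases, repeated runs because outputs can vary, and a structured log line for every attempt.

```python
import json
import logging
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("evals")


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(
    model: Callable[[str], str], cases: list[EvalCase], runs: int = 3
) -> dict[str, float]:
    """Run each case several times and return the pass rate per case."""
    pass_rates: dict[str, float] = {}
    for case in cases:
        passed = 0
        for attempt in range(runs):
            output = model(case.prompt)
            ok = case.check(output)
            passed += int(ok)
            # One structured log line per attempt, so failures can be traced later.
            log.info(json.dumps({
                "case": case.name,
                "attempt": attempt,
                "passed": ok,
                "output": output,
            }))
        pass_rates[case.name] = passed / runs
    return pass_rates


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    if "refund" in prompt.lower():
        return "Refunds are accepted within 30 days."
    return "I don't know."


CASES = [
    EvalCase("refund_policy", "What is the refund window?",
             lambda out: "30 days" in out),
    EvalCase("admits_unknown", "What is the CEO's shoe size?",
             lambda out: "don't know" in out.lower()),
]

if __name__ == "__main__":
    print(run_evals(stub_model, CASES))
```

A real suite will have far more cases and richer checks, but a firm that does this work should be able to show you something with these same parts.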

Walk me through a project that went sideways. What did you do?

The honesty test. Every consultancy has projects that struggled. The ones that pretend otherwise are the dangerous ones. Listen for specifics, root-cause analysis, and what they changed in their process afterward. A partner that's open about past failures will be open about risks on your project too.

What kind of work do you refuse to take on, and why?

A consultancy with no disqualifiers is a consultancy that says yes to everything for revenue. Listen for actual scope refusals — types of work, types of clients, types of engagements where they know they're not the right fit. The clearer the no, the more credible the yes.

Who specifically will write the code, and can I talk to them today?

This catches the bait-and-switch where senior people pitch and junior people build. The right answer is "yes, here they are, let's set up a call this week." If the answer is "we'll introduce you after you sign the SOW," the people on the call aren't the people on the project.

If we wanted to take this fully in-house after launch, what would that take?

This reveals whether they're building for handoff or for lock-in. A good partner answers concretely: documentation your engineers can actually use, training sessions during the engagement, decision documents explaining why the system was built the way it was, and clean handover of accounts and credentials. A partner that hedges, or that immediately steers the answer toward "most clients keep us on a retainer," may be building something you'll struggle to operate independently.

FAQ

How long should I spend evaluating AI consultancies?

For a 4–16 week engagement, two to four weeks of evaluation is reasonable. Talk to three partners minimum, score them against the rubric above, and ask each one a question you already know the answer to (to check whether they bluff or admit they don't know).

What's a fair price range to expect?

It depends on scope. As a rough frame: a focused 4–6 week AI adoption assessment usually runs in the low five figures. A 4–12 week agents and automations build usually runs in the mid five to low six figures. An 8–16 week custom software replatform usually runs in the mid-to-high six figures. Outcomes-based pricing should be the norm. Firms that quote hourly with no upper bound are a red flag.

Should I run an RFP?

RFPs are useful when you need to compare apples-to-apples on a well-scoped problem. They're counterproductive when you're still figuring out what to build — they reward partners who write good documents, not partners who build good software. For AI work, a paid two-week discovery engagement with one or two finalists usually tells you more than an RFP ever will.

Should I ask for references?

Yes. Ask each reference three specific questions: "What surprised you about working with this firm?", "What would you do differently?", and "Would you hire them again for a different project?" The third question is the most honest signal you'll get.

What if my team isn't technical enough to evaluate AI-specific answers?

Bring in an independent advisor for the evaluation conversations. A one-to-two-hour consult with someone senior who has actually shipped AI systems will cost far less than picking the wrong consultancy.


If you're looking for an AI consultancy that delivers working systems, where senior people do the work, and you own everything that gets shipped — FM might be the right fit. Most engagements start with a 30-minute scoping call. No decks, no hard sell.
