---
title: "How to Choose an AI Consultancy: A Buyer's Framework"
publishedAt: "2025-09-02"
dateModified: "2026-05-23"
author: "FM Team"
role: "FM"
headshot: "/images/fm_profile_logo.png"
category: "FAQs"
slug: "how-to-choose-an-ai-consultancy"
excerpt: "A practical framework for evaluating AI consultancies: the dimensions that actually matter, the red flags to watch for, and the questions that reveal what kind of partner you're really hiring."
image: "/images/posts/how-to-choose-an-ai-consultancy.png"
---

Most AI consultancy engagements that disappoint share a single root cause: the buyer didn't ask the right questions before signing. Every consultancy can show a polished deck and name-drop a few clients. The hard part is separating the firms that ship working systems from the firms that ship strategy memos and walk away.

This is a working buyer's framework. A scoring rubric, the red flags to watch for, and a small set of questions that reveal what kind of partner you're really hiring.

## A scoring rubric for AI consultancies

Score each candidate from 1 (worst case) to 5 (best case) across nine dimensions. Anything under 30/45 total is a real concern. Anything over 38 is a strong fit.

| Criterion | 1 / 5 looks like | 5 / 5 looks like |
| --- | --- | --- |
| **Discovery vs. delivery balance** | 6&ndash;8 weeks of discovery before any code | 1&ndash;2 weeks of focused discovery, then a working prototype |
| **What gets delivered** | A strategy deck and recommendations | Working software, deployed in your environment |
| **Team seniority** | A PM fronting offshore juniors you never meet | Senior engineers doing the work directly, named on the contract |
| **AI evaluation & quality** | "We'll test it before launch" | Custom eval suites and structured logging built in from day one |
| **Model choice & vendor neutrality** | Locked into one provider regardless of fit | Claude, ChatGPT, Gemini evaluated per use case with clear rationale |
| **Integration approach** | One-off custom integrations for every tool | MCP servers, reusable patterns, agents that reach into your systems cleanly |
| **Code & IP ownership** | Licensed platform you must keep paying to access | You own every line of code, on your accounts, from day one |
| **Ongoing maintenance** | Hand-off then unavailable | Optional retainer OR a clean handoff with real documentation |
| **Risk transparency** | "Nothing should go wrong if you follow our process" | Names specific risks upfront with mitigation plans |

## Red flags to watch for

Any one of these alone isn't disqualifying. Three or more is.

- **"We're excited about AI."** Excitement isn't capability. Ask for specifics.
- **Massive teams with unclear roles.** Usually means you're paying for layered management.
- **AI as a buzzword.** No specific tools, frameworks, or model names mentioned.
- **No mention of evaluation.** If they can't tell you how they know the AI is working in production, they don't know either.
- **Vague code ownership.** "We'll work that out in the SOW" is a no.
- **Hourly billing with no upper bound.** Outcomes-based pricing aligns incentives. Hourly does the opposite.
- **No honest disqualifiers.** A consultancy that says it's right for every problem is right for none.

## Six questions that reveal posture

The scoring rubric covers what to look for. These six questions tell you who you're actually dealing with. Ask all of them in a single conversation and pay attention to whether the answers are specific, honest, and grounded in real work.

<QuestionGrid>
  <QuestionCard question="What did you ship last quarter that's running in production today?">
    This separates the firms that build from the firms that talk. A good answer names a specific system, what it does, and how the client uses it. A bad answer is generic ("we recently helped a Fortune 500 client streamline their operations") or pivots into a deck. If they can't show you something running with users on it, the rest of the conversation doesn't matter much.
  </QuestionCard>

  <QuestionCard question="Show me an evaluation suite you built for an AI system.">
    Production AI is non-deterministic. The difference between a demo and a production system is whether you know when it breaks. A consultancy that can show you actual eval code, test cases, and logging dashboards is doing the work. A consultancy that can't is shipping demos that haven't been pressure-tested in front of real users yet.
  </QuestionCard>

  <QuestionCard question="Walk me through a project that went sideways. What did you do?">
    The honesty test. Every consultancy has projects that struggled. The ones that pretend otherwise are the dangerous ones. Listen for specifics, root-cause analysis, and what they changed in their process afterward. A partner that's open about past failures will be open about risks on your project too.
  </QuestionCard>

  <QuestionCard question="What kind of work do you refuse to take on, and why?">
    A consultancy with no disqualifiers is a consultancy that says yes to everything for revenue. Listen for actual scope refusals &mdash; types of work, types of clients, types of engagements where they know they're not the right fit. The clearer the no, the more credible the yes.
  </QuestionCard>

  <QuestionCard question="Who specifically will write the code, and can I talk to them today?">
    This catches the bait-and-switch where senior people pitch and junior people build. The right answer is "yes, here they are, let's set up a call this week." If the answer is "we'll introduce you after you sign the SOW," the people on the call aren't the people on the project.
  </QuestionCard>

  <QuestionCard question="If we wanted to take this fully in-house after launch, what would that take?">
    Reveals whether they're building for handoff or for lock-in. A good partner answers concretely &mdash; documentation your engineers can actually use, training sessions during the engagement, decision documents explaining why the system was built the way it was, and clean handover of accounts and credentials. A partner that hedges, or who immediately steers the answer toward "most clients keep us on a retainer," may be building something you'll struggle to operate independently.
  </QuestionCard>
</QuestionGrid>

## FAQ

### How long should I spend evaluating AI consultancies?

For a 4&ndash;16 week engagement, two to four weeks of evaluation is reasonable. Talk to three partners minimum, score them against the rubric above, and ask each one a question you already know the answer to (to check whether they bluff or admit they don't know).

### What's a fair price range to expect?

It depends on scope. As a rough frame: a focused 4&ndash;6 week AI adoption assessment usually runs in the low five figures. A 4&ndash;12 week agents and automations build usually runs in the mid five to low six figures. An 8&ndash;16 week custom software replatform usually runs in the mid-to-high six figures. Outcomes-based pricing should be the norm. Firms that quote hourly with no upper bound are a red flag.

### Should I run an RFP?

RFPs are useful when you need to compare apples-to-apples on a well-scoped problem. They're counterproductive when you're still figuring out what to build &mdash; they reward partners who write good documents, not partners who build good software. For AI work, a paid two-week discovery engagement with one or two finalists usually tells you more than an RFP ever will.

### Should I ask for references?

Yes. Ask each reference three specific questions: "What surprised you about working with this firm?", "What would you do differently?", and "Would you hire them again for a different project?" The third question is the most honest signal you'll get.

### What if my team isn't technical enough to evaluate AI-specific answers?

Bring in an independent advisor for the evaluation conversations. A one-to-two hour consult with someone senior who has actually shipped AI systems will cost far less than picking the wrong consultancy.

---

If you're looking for an AI consultancy that delivers working systems, where senior people do the work, and you own everything that gets shipped &mdash; [FM](/solutions) might be the right fit. Most engagements start with a 30-minute scoping call. No decks, no hard sell.