If you work in banking or financial services, you have probably heard some version of this claim already: “AI can tell us what customers want.” It sounds efficient. It sounds modern. It also sounds suspiciously like the kind of sentence people say right before making a very expensive mistake.
Large language models are useful. They can summarise interviews, spot themes in open text, draft questionnaires, and help research teams move faster. Financial firms are adopting AI at pace too. In the Bank of England and FCA’s 2024 survey, 75% of firms said they were already using AI, and the top perceived benefits were in data and analytical insights, anti-money laundering and fraud detection, and cybersecurity. At the same time, firms ranked data privacy and protection, data quality, data security, and data bias among the top current risks, with data protection and privacy also cited as the biggest regulatory constraint.
That is exactly why this matters: in regulated industries, speed is useful, but false confidence is expensive.
LLMs can support customer insight. They cannot replace the voice of the customer. They cannot replace the judgment that comes from speaking to customers, spotting discomfort in a process, or realising that a “small” complaint is actually a sign of a much larger trust issue. When banks and financial institutions rely too heavily on model-generated assumptions instead of real customer conversations, they risk designing for averages, not people.
LLMs are pattern engines, not lived experience
An LLM does not know your customer. It predicts the next likely word based on patterns in data. That can make it seem fluent, informed, even persuasive. But fluency is not understanding. NIST’s Generative AI Profile is blunt about this in the careful, bureaucratic way only a standards body can manage: organisations should document a system’s knowledge limits, define how outputs are to be overseen by humans, compare outputs to ground-truth data, deploy fact-checking, and involve end users, practitioners, and domain experts in prototyping and testing. NIST also explicitly recommends obtaining input from stakeholder communities and using direct feedback from affected communities to monitor and improve outputs.
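For anyone wondering what “compare outputs to ground-truth data” looks like below slide level, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the reference table, the keys, and the audit function are invented, not a NIST-specified interface.

```python
# Illustrative only: diff the figures a model asserts against verified
# records before anyone acts on them. Keys and values are invented.
GROUND_TRUTH = {
    "standard_overdraft_apr": 39.9,   # hypothetical verified product figure
    "intl_transfer_fee_gbp": 15.0,
}

def audit_claims(model_claims: dict) -> list:
    """Return every model claim that disagrees with, or lacks, ground truth."""
    findings = []
    for key, claimed in model_claims.items():
        known = GROUND_TRUTH.get(key)
        if known is None:
            findings.append(f"{key}: no ground truth on file, route to a human")
        elif claimed != known:
            findings.append(f"{key}: model said {claimed}, records say {known}")
    return findings

print(audit_claims({"standard_overdraft_apr": 35.0, "mystery_fee_gbp": 2.0}))
```

The design choice is the point: the model never gets the last word on a number. Anything unverifiable is routed to a person, not published.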
Ground-truth checks catch the factual misses. The harder part is that customer insight is not just a text problem. It is a context problem.
A bank customer saying, “the app is fine,” may actually mean, “I only use it when I have to.” A distributor saying, “the turnaround time is manageable,” may actually mean, “we have normalised poor service because the alternative is worse.” Those signals emerge in specific questions, follow-up, comparison, and contradiction. They rarely emerge from a model confidently generating a plausible answer from training data.
The blunders are not edge cases. They are warnings.
If anyone still thinks LLMs can safely stand in for real customer understanding, the last year has provided enough cautionary material to retire that fantasy. The problem is not that these systems occasionally make mistakes. The problem is that they make mistakes confidently, in ways that sound polished enough to be trusted by busy teams, customers, and even professionals. That is a bad combination in any industry, and a particularly dangerous one in regulated environments like banking and financial services.
In April 2025, Cursor’s own AI support bot invented a company policy that did not exist. Users who were unexpectedly logged out across devices were told by the bot that this was expected under a new usage rule. It was not. Cursor’s co-founder later apologised publicly and clarified that there was no such policy. That is a useful reminder that an LLM can sound like authoritative customer support while simply making things up. Replace “login policy” with “complaints handling,” “eligibility,” or “distributor onboarding” and the risk becomes very real, very quickly.
The same pattern showed up in consumer finance. In late 2025, Which? tested six major AI tools on common consumer questions spanning finance, legal matters, health, travel, and rights. The results were not exactly reassuring. ChatGPT and Copilot both failed to spot that a user asking about a “£25k annual ISA allowance” was already over the legal limit, because the actual allowance is £20,000. Which? also found poor guidance on tax refunds, flight compensation, travel insurance, and contract rights, and concluded that some of the answers were inaccurate, unclear, or risky if followed. For professionals in banks and financial institutions, this is the key point: even when the model sounds fluent, it may still miss the rule, the exception, or the local context that matters most.
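The frustrating part is how cheaply that particular failure could have been caught. Rules belong in rules, not in model memory. A minimal sketch, assuming the £20,000 UK ISA allowance from the Which? test and invented function names:

```python
ISA_ANNUAL_ALLOWANCE_GBP = 20_000  # the rule the models missed

def check_isa_contribution(proposed_gbp: int, already_paid_gbp: int = 0) -> str:
    """Ground the answer in the rule, not in model fluency."""
    total = proposed_gbp + already_paid_gbp
    if total > ISA_ANNUAL_ALLOWANCE_GBP:
        excess = total - ISA_ANNUAL_ALLOWANCE_GBP
        return f"Over the annual allowance by £{excess:,}; flag for review."
    return "Within the annual allowance."

print(check_isa_contribution(25_000))  # the Which? scenario: over by £5,000
```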
Legal and professional settings have not been spared either. In May 2025, lawyers for Anthropic told a California court that an incorrect footnote in an expert report had been caused by an AI hallucination, after Claude produced a fake article title and wrong authors. A few weeks later, London’s High Court warned that lawyers using AI to cite non-existent cases could face contempt proceedings or even criminal consequences in the most serious situations. These are not toy examples. They show what happens when people confuse a plausible answer with a verified one.
The problem is broader than a few embarrassing anecdotes. In October 2025, Reuters reported on new research from the European Broadcasting Union and the BBC covering 3,000 responses from leading AI assistants across 14 languages. The study found that 45% of responses contained at least one significant issue, a third had serious sourcing problems, and 20% contained accuracy issues such as outdated or wrong information. In other words, even when the answer looks neat and complete, there is still a meaningful chance that the foundations are shaky.
That is why customer insight still begins with people.
Modern AI remains heavily dependent on human oversight, human feedback, and human judgment. Use AI to summarise interviews, cluster open-text responses, or accelerate survey drafting. But do not mistake that support role for genuine customer understanding. In sectors like banking, trust is built by speaking to customers directly, hearing what they actually mean, and then using technology to sharpen the signal, not invent it.
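As one example of that support role, here is a minimal Python sketch that clusters open-text survey responses into rough themes for a researcher to read. It assumes scikit-learn; the sample responses and cluster count are placeholders a human should revisit, because the model only groups text, while people assign meaning.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = [
    "the app is fine, I only use it when I have to",
    "turnaround time is manageable but slow at month end",
    "fees were not explained clearly when I signed up",
    "another fee surprise on my statement this month",
]  # real open-text survey responses go here

# Vectorise the text and group it; the themes are hypotheses, not findings.
vectors = TfidfVectorizer(stop_words="english").fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Route every cluster to a human reader before it becomes an "insight".
for cluster in sorted(set(labels)):
    print(f"Theme {cluster}:")
    for text, label in zip(responses, labels):
        if label == cluster:
            print(" -", text)
```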
The blunders above are not evidence that AI is useless. They are evidence that LLMs are not a substitute for customer truth, domain judgment, and accountability. They can be wrong in ways that are polished, immediate, and convincing. Which, professionally speaking, is a terrible combination.
Why direct customer input still outperforms synthetic assumptions
The voice of the customer is not just about collecting responses. It is about understanding meaning. That is why direct research remains one of the highest-return activities for growth-oriented organisations.
When you talk to customers directly, you learn things that no model can safely infer on its own:
- You learn what customers are trying to achieve, not just what they clicked.
- You learn which words they use to describe trust, delay, friction, fairness, or value.
- You learn what they would never bother typing into a generic feedback form.
- You learn where internal assumptions are wrong.
- You also learn what not to build. That alone can save more money than most software budgets.
For banks and financial institutions, this is especially important: a product can be technically sound and commercially attractive and still fail because it creates anxiety, confusion, or perceived unfairness. Direct conversations uncover these emotional and practical barriers earlier. Enterprise surveys help you measure them at scale, but the initial insight often comes from conversation first.
This is why the best customer insight functions do not choose between interviews and surveys. They use both. They talk to customers to understand the problem, then use enterprise surveys to size the issue, quantify trade-offs, and prioritise action. For surveys for banks and financial institutions, that combination is still the gold standard.
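The “size the issue” step is old, boring arithmetic, which is exactly why it is trustworthy. A back-of-envelope sketch using the standard sample-size formula for estimating a proportion at 95% confidence, with illustrative margins:

```python
import math

def sample_size(margin_of_error: float, p: float = 0.5, z: float = 1.96) -> int:
    """Minimum completes to estimate a proportion within +/- margin_of_error."""
    return math.ceil((z ** 2) * p * (1 - p) / margin_of_error ** 2)

print(sample_size(0.05))  # ~385 completes for a +/-5% read
print(sample_size(0.03))  # ~1,068 completes for a tighter +/-3% read
```

Using p = 0.5 is the conservative default when you do not yet know how common the behaviour is, which, before the interviews, you do not.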
Even modern AI depends on humans talking to it
There is another irony here. The modern AI stack itself still depends heavily on human input.
Many of the risks NIST catalogues in its Generative AI Profile originate from human behaviour and from human-AI interactions, not just from the model alone. The same document recommends structured public feedback, user surveys, human oversight, stakeholder feedback, and testing in real-world scenarios. In other words, even the systems built to automate language still require people to define acceptable use, identify risk, verify truth, and explain context.
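To make “human oversight” concrete rather than decorative, here is a hedged sketch in Python: every model-drafted summary sits in a queue until a reviewer signs off, and reviewer notes are kept so they can become regression tests later. The class, function names, and queue are assumptions for illustration, not a reference design.

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    text: str
    approved: bool = False
    reviewer_notes: list = field(default_factory=list)

review_queue = []  # nothing model-written bypasses this

def submit(model_summary: str) -> Draft:
    """Every model output enters the queue before anyone downstream sees it."""
    draft = Draft(text=model_summary)
    review_queue.append(draft)
    return draft

def review(draft: Draft, ok: bool, note: str = "") -> None:
    """A human verdict; notes are saved as future test cases."""
    draft.approved = ok
    if note:
        draft.reviewer_notes.append(note)

d = submit("Customers are satisfied with onboarding.")
review(d, ok=False, note="Contradicts Q3 interview notes; re-check sources.")
print(d)
```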
That is not a weakness. It is reality.
AI can help you process the voice of the customer faster. It cannot create a trustworthy voice of the customer out of thin air. If nobody is speaking to customers, the system is left predicting from old language, incomplete records, public web patterns, and whatever bias made it into the data in the first place. That is not customer understanding. That is autocomplete wearing a suit.
Some of the best products came from listening hard enough to change direction
Slack is one of the clearest examples. Stewart Butterfield’s company Tiny Speck started as a gaming company building Glitch. When the game failed, the company pivoted to the internal communication tool it had built along the way, which became Slack. That pivot turned a failed game effort into one of the defining enterprise software products of its era.
Instagram’s founders also began with something else. Burbn was originally a location-based app with multiple features. The team saw that the photo-sharing behaviour was what people really cared about and stripped the product back to that core. Instagram was the result.
Airbnb did not start scaling by sitting in a room inventing synthetic personas. Brian Chesky famously went door-to-door, met hosts in person, photographed their homes, and learned directly what they liked and disliked about the product. That kind of close contact is painstaking, yes. It is also how category-shaping products become useful instead of merely clever.
Intuit has institutionalised this discipline. Its “Design for Delight” approach is explicitly customer-centred, built on understanding customer problems before jumping to solutions, with methods such as deep customer empathy and “follow-me-home” observation.
The common thread is not “AI first.” It is “customer first.”
What this means for you
For teams responsible for customer insight in financial institutions, the implication is clear.
- Use AI, but do not outsource customer understanding to it.
- Use interviews to uncover motivations, fears, unmet needs, and language.
- Use enterprise surveys to validate those insights at scale.
- Use open source survey tools and enterprise survey platforms that give you control over design, data, deployment, and brand experience, especially if you are operating in a regulated environment.
For surveys for banks and financial institutions, trust matters twice: once in the survey itself, and once in how the data is handled. A generic, poorly branded survey on a third-party-looking link can reduce response quality before the first question is even answered. A weak privacy model can make the research team’s life harder than the fieldwork itself.
There’s always a way
And that is where OpenSurveyCraft fits in.
OpenSurveyCraft is designed to help organisations understand customers without compromising on brand, privacy, or security. It supports a more thoughtful approach to enterprise surveys by making surveys feel like a true extension of the brand rather than a disconnected vendor form. It also supports self-hosted deployment and stronger control over customer data, which is especially relevant for banks, regulated firms, and any organisation that takes customer data privacy seriously.
In other words, it is built for teams that want better customer insight without surrendering control of the customer experience.
LLMs will continue to improve. They will become more useful across analysis, operations, and research workflows. But the organisations that grow most consistently will still be the ones that speak to customers directly, listen carefully, and use technology to sharpen that understanding rather than replace it.
That part is not old-fashioned. It is just good business.
Stay tuned for more from us, and if you want to know how we can help you understand your customers better, reach out at contact@opensurveycraft.com.