Lisa Avvocato
By Lisa Avvocato, Sama
Chatbots aren’t new (the first one is celebrating its 58th birthday this year), but they are getting better and better. Uberall has found 80% of customers have had positive interactions with them. The problem is when a chatbot doesn’t help, it really doesn’t help, and the annoyance and frustration caused by an unsuccessful interaction may lead to unsatisfied customers and lost revenue, or, worst case, regulatory fines or legal costs in addition to reputation damage.
It can seem tempting to leverage Generative AI (GenAI) to shore up these issues – the technology is advancing rapidly — but GenAI comes with its own pitfalls. Financial institutions need to be mindful of these problems and the positive potential GenAI can bring to their chatbots to maximize the chances of success. (It may even be a matter of federal law.)
Issue 1: The chatbot doesn’t understand the customer
Ask anyone you know if they’ve ever had an experience with a chatbot that didn’t understand what they were asking for, and they will almost certainly have a story for you. In part, this is driven by older, rule-based chatbots that only “know” to look for a few answers worded in specific ways, like “help with account.” Say a customer asks, “How much money do I have in my checking account right now?” They could be unspecific, or they may use different words, or they may have a unique problem.
The chatbot doesn’t know what rule it needs to follow, so it defaults to running through the options again and frustrating someone so much that they may abandon the exercise entirely. These negative experiences can seriously color customer perceptions: it’s no surprise that 90% of people still prefer getting customer service from a human, and human service agents have an NPS that’s 72 points higher than customer service chatbots’.
The problem is 24/7 digital banking means a human isn’t always available, and customers have been trained to like its convenience. 75% of customers already prefer digital banking over going to a branch; 94% of digital banking users access services at least once a month. And 60% of consumers are less likely to remain customers if there’s no option to transfer to a live agent.
Using GenAI and large language models (LLMs) can help . As a whole, GenAI is capable of ingesting far more information in a single prompt, moving beyond the need to use three-word phrases for a customer to get what they want. LLMs further solve the specificity issue, because they’re designed to handle context and understand the myriad ways humans can express the same thing. (38% of customers find it annoying when a chatbot is unable to understand context.) Meanwhile, retrieval-augmented generation (RAG) can pull information from databases to help a model give more accurate responses — such as pulling a customer’s credit history, then determining what auto loan options may be available.
Read More : How to Embed Identity Verification Solutions Across Sectors
Issue 2: The chatbot makes up a (sometimes wildly) inaccurate response
Let’s take that auto loan example a bit further. Say your chatbot gives the customer information that sounds accurate, but doesn’t exist, like for a loan with too-good-to-be-true terms. This is called hallucination, and it’s still a big problem. Even top models still hallucinate around 2.5% of the time, and It’s such an issue that Anthropic’s major selling point for a recent update was that its models were now twice as likely to answer questions correctly. 2.5% seems like a relatively small risk, but Bank of America’s chatbot Erica has had about 1.8 billion interactions from 2018 to early 2024. If Erica hallucinates at the same rate, that would be 25,000,000 cases of wrong information.
Such a case can even lead to legal trouble. Air Canada’s chatbot gave someone the wrong policy, the customer believed the chatbot (they do sound quite authoritative), and then Air Canada refused to honor it until the courts ruled in the customer’s favor. Not all of these examples even make it to court, and the consequences can be far more expensive if a chatbot pre-approves the wrong person for a loan or gives incorrect terms for a mortgage. Fintech CEOs are already aware of this issue, but hallucination affects all GenAI models.
It is impractical to have a data set that’s so big that it handles every permutation of the English language. But a representative dataset, carefully annotated and labeled, can help mitigate the risk of hallucination, especially when paired with reinforcement, guardrails, and ongoing model validation to identify weak areas and improve training data. The idea is to start from a good foundation (so including information about how a model should avoid phrases to seem biased, for example) and keep the model constantly learning. Companies should be checking in on their chatbot regularly to make sure it’s not learning the wrong things, creating a feedback loop that helps the model be the best it can be.
Issue 3: The chatbot’s safeguards get breached
Those guardrails mentioned above lead to our third problem: chatbots revealing information they have access to but shouldn’t give out, causing a data breach. Malicious users have a number of different techniques to break a model. For example, at a hackathon last year, a participant obtained what looked like real Visa credit card numbers with just a few simple questions.
As of last year, a single data breach costs a bank $5.9 million on average, higher than the average across all surveyed industries. That’s before assessing the damage against consequences outlined in data privacy laws. The EU AI Act is designed to work in conjunction with the GDPR, for just one example.
Your IT team should be practicing or working with partners experienced in what’s called red teaming: intentionally trying to “break” a model using the same methods malicious users do, exposing vulnerabilities that can then be fixed. Performing red teaming may represent additional upfront costs, either to run it with your own R&D or by hiring a partner, but when compared to the cost of a singular data breach both in money and trust, it’s worth it. Aim to have a wide range of specialties on the team undertaking this exercise, especially those who have experience with red teaming. Good testing also means keeping up with the trends – AI keeps evolving, and the methods to break it do too.
Your chatbot may seem like a small piece of a larger whole, but its performance has the opportunity to make or break your relationship with your customer. Old problems like a chatbot not understanding a customer persist, and while GenAI solves that problem, it brings issues of its own like hallucination and exposure of sensitive data.
It’s possible to turn your chatbot from a liability to an asset — provided you have the right team and the right partners on your team as the model is developed. Making the most of GenAI means addressing its issues before they ever have a chance to color your customers’ perception of your brand.
Read More : GlobalFintechSeries Interview with John Sun, CEO at Spring Labs
[To share your insights with us, please write to psen@itechseries.com ]
