I was not in the office in Cambridge the day two deadly explosions ripped through the finish line of the Boston Marathon. I was working from home, fielding calls from concerned employees.
For many Boston natives, one of the worst things about the bombing was the uncertainty:
They were watching the news as it came in, but it was incomplete and inaccurate. They were not afraid for their lives; they simply did not know what was going on. They could not believe that this terrible thing had happened, whatever its exact scale and scope, or that it could have happened in their city.
Uncertainty, as it turns out, was at the very core of the tragedy. Tamerlan Tsarnaev, the architect of the attack, should have been apprehended by authorities long before the bombing in April 2013. In fact, he should have been stopped at John F. Kennedy International Airport in July 2012. Because of his known connections to violent Islamic extremists, Tsarnaev was on a watchlist, and airport security screened him against that list as he entered the country. Unfortunately, a discrepancy between the spelling of his name on his travel documents and the spelling in the database confused the screening application in use at the time. Unable to cope with this kind of uncertainty, the application incorrectly returned no match.
Immediately after this was discovered, U.S. Customs and Border Protection (CBP) launched a project to improve its screening technology. The project produced a name screening application that used artificial intelligence (AI) to handle the uncertainty inherent in name matching. While the approach solved the problems CBP set out to address, its relevance extends beyond national security.
In the anti-money laundering (AML) space, the technology is particularly applicable to sanctions screening. In these applications, the ability to accurately compare short strings of text, such as the names of people, organizations and countries, is critical.
This article provides an in-depth analysis of this innovative approach and a real-life blueprint for using AI to update sanctions screening technology.
What is in a name?
Comparing and matching names seems straightforward, but few things could be further from the truth. People often go by nicknames, languages are hard, translators make mistakes and the spelling of a name can be ambiguous.
There are over a dozen phenomena that can throw a wrench into any name-matching process. The trouble is that the matching technology under the hood of many sanctions screening applications is not equipped to handle these unwieldy linguistic structures. Many of these systems still use relatively basic, rule-based approaches, such as the list method, to perform this critical task.[1]
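Note 1 at the end of this article describes the list method: compile every known spelling of each name component and match names against that list. A minimal sketch, assuming a tiny hypothetical variant table (`VARIANTS`, `canonicalize` and `list_match` are illustrative names, not any product's API), shows both the idea and two of the failure modes the note mentions:

```python
# Hypothetical variant table: every known spelling of each name component,
# keyed by a canonical form. Real lists are far larger and curated by hand.
VARIANTS = {
    "mohammed": {"mohammed", "muhammad", "mohamed", "mohammad", "muhammed"},
    "abdul": {"abdul", "abdel", "abdoul"},
}

# Invert the table so every listed spelling points to its canonical form.
CANONICAL = {
    spelling: canon
    for canon, spellings in VARIANTS.items()
    for spelling in spellings
}


def canonicalize(name):
    """Replace each listed token with its canonical form; keep unlisted tokens as-is."""
    return tuple(CANONICAL.get(token, token) for token in name.lower().split())


def list_match(query, listed):
    """Two names match only if their canonical token sequences are identical."""
    return canonicalize(query) == canonicalize(listed)


print(list_match("Muhammad Abdel", "Mohammed Abdul"))  # True: both spellings are listed
print(list_match("Muhamad Abdul", "Mohammed Abdul"))   # False: "muhamad" is not listed
print(list_match("Abdulrahman", "Abdul Rahman"))       # False: missing space
```

The method is only as good as its list: any spelling nobody thought to add, and any change in tokenization, silently becomes a miss.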
Solving the problems that name matching poses requires a smart algorithm, one that can “understand” how similar one name is to another. In other words, it requires an AI-driven approach.
Solving name problems
Teaching a machine to recognize what makes one name “like” another works much like teaching a person to do the same task—through example.
When people recognize that one thing is similar to another, they do so because they have developed a mental model of how one pattern relates to another. For instance, a person can tell that two animals are both dogs only because both share traits the person associates with dogs. People build that mental model of “dog traits” over time, through exposure to different animals and from hearing parents, friends and teachers identify them.
Machine learning models are developed in a very similar manner. For a task like name matching, a system is exposed to hundreds or thousands of matching name pairs so it can infer the essential characteristics each pair shares.
Like a human, models trained in this manner can overcome the various challenges that wreak havoc on rule-based approaches. They can handle typos, names written in their original language and script, and names they have never seen before.
However, nothing is perfect. While machine learning is key to coping with a wide range of name-comparison problems, its accuracy comes at a cost: speed. A system that relies only on machine learning must run a comparatively expensive model against every entry on a list for every name it screens, so it can struggle to meet the demands of high-volume transaction environments.
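To make “learning from examples” concrete, here is a minimal sketch, assuming a logistic-regression model over three illustrative features (character-bigram overlap, length ratio and a shared first letter). The training pairs, features and function names are hypothetical, and production name-matching models are far richer. The sketch also trains on non-matching pairs, since this kind of classifier needs negative examples to learn where similarity stops.

```python
import math


def bigrams(name):
    """Character bigrams of a lowercased name, padded so edge letters count."""
    padded = f"^{name.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def features(a, b):
    """Illustrative similarity features for one name pair."""
    ba, bb = bigrams(a), bigrams(b)
    overlap = len(ba & bb) / len(ba | bb)  # Jaccard overlap of bigram sets
    length_ratio = min(len(a), len(b)) / max(len(a), len(b))
    same_initial = 1.0 if a[:1].lower() == b[:1].lower() else 0.0
    return [1.0, overlap, length_ratio, same_initial]  # leading 1.0 is the bias term


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, a, b):
    """Match probability for a name pair under the logistic-regression model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features(a, b))))


def train(pairs, epochs=3000, lr=0.5):
    """Fit logistic-regression weights with stochastic gradient descent."""
    examples = [(features(a, b), label) for a, b, label in pairs]
    weights = [0.0] * 4
    for _ in range(epochs):
        for x, label in examples:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Toy labeled pairs: 1 = same name, 0 = different names.
TRAINING_PAIRS = [
    ("Catherine", "Katherine", 1), ("Mohammed", "Muhammad", 1),
    ("Jonathan", "Jonathon", 1), ("Alexandr", "Alexander", 1),
    ("Smith", "Smyth", 1), ("Catherine", "Robert", 0),
    ("Mohammed", "Michelle", 0), ("Jonathan", "Peterson", 0),
    ("Alexander", "Wu", 0), ("Smith", "Garcia", 0),
]

weights = train(TRAINING_PAIRS)
print(f"{score(weights, 'Jonathen', 'Jonathan'):.2f}")  # a misspelling never seen in training
```

Because the model scores features rather than looking names up, it can assign a score to a pair it has never seen, such as the misspelling “Jonathen.”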
Something old, something new
The ideal solution is not just AI; it is a hybrid approach. This approach uses an older matching method, known as the common key approach, to make a first pass that cuts down the total number of possible matches. The common key approach is fast. It also has high recall, meaning that it preserves a high proportion of the likely matches.
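The article does not specify which common key CBP's system uses. As a minimal sketch, assuming classic American Soundex as the key and a hypothetical three-name watchlist, the blocking pass below indexes every list name under the key of each of its tokens and returns only the names that share at least one key with the query. The function names are illustrative.

```python
from collections import defaultdict

# American Soundex digit classes; vowels, y, h and w carry no digit.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(token):
    """Return the four-character American Soundex key for one name token."""
    letters = [c for c in token.lower() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0].upper()]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        if ch not in "hw":  # h and w do not separate repeated codes; vowels do
            prev = code
    return ("".join(key) + "000")[:4]


def build_index(watchlist):
    """Map each Soundex key to the watchlist names containing a token with that key."""
    index = defaultdict(set)
    for name in watchlist:
        for token in name.split():
            index[soundex(token)].add(name)
    return index


def candidates(query, index):
    """Blocking pass: names sharing at least one token key with the query."""
    found = set()
    for token in query.split():
        found |= index.get(soundex(token), set())
    return found


index = build_index(["Muhammad Ali", "John Smith", "Maria Garcia"])
print(candidates("Mohamed Aly", index))  # {'Muhammad Ali'}
```

Soundex is only one possible key, and it shows the recall trade-off: because the first letter is kept verbatim, “Kristina” (K623) and “Christina” (C623) land in different blocks. One common mitigation is to generate several keys per name token.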
Once the candidate pool has been culled, the machine learning model is deployed, scoring name similarity based on its training. It was the discovery of this method, developed for the CBP with funding from In-Q-Tel, that put the finishing touches on the solution that is now used to screen the two million people entering and leaving the country every day.
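Putting the two stages together, here is a minimal sketch of the pipeline's shape. Two stand-ins are assumptions rather than CBP's components: a deliberately crude common key (the first letter of each token) for the blocking pass, and the Python standard library's `difflib.SequenceMatcher` ratio in place of a trained model. The 0.6 threshold, the watchlist and the function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(token):
    """Crude stand-in for a common key: the token's first letter."""
    return token[:1].lower()


def build_index(watchlist):
    """Map each blocking key to the watchlist names that have a token with that key."""
    index = defaultdict(set)
    for name in watchlist:
        for token in name.split():
            index[block_key(token)].add(name)
    return index


def model_score(a, b):
    """Stand-in for the trained model: difflib similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(query, index, threshold=0.6):
    """Return (hits sorted by score, number of candidates that reached the scorer)."""
    # Stage 1: the cheap blocking pass culls the candidate pool.
    pool = set()
    for token in query.split():
        pool |= index.get(block_key(token), set())
    # Stage 2: only the survivors are scored by the (more expensive) model.
    hits = []
    for name in pool:
        s = model_score(query, name)
        if s >= threshold:
            hits.append((name, s))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits, len(pool)


watchlist = ["Muhammad Ali", "John Smith", "Maria Garcia", "Pedro Alvarez", "Wei Zhang"]
index = build_index(watchlist)
hits, pool_size = screen("Mohamed Aly", index)
print(pool_size, hits)  # 3 of the 5 names reach the scorer
```

Only three of the five watchlist names reach the scorer here. On a watchlist with many thousands of entries, that pruning is what keeps the expensive scoring step affordable in a high-volume environment.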
The fact that this approach is the current best practice highlights the reality of today's AI technology.[2] The most effective machine learning applications are rarely (maybe never) whole-cloth AI. Effective AI is the product of pragmatism, a reality that glossy marketing and elegant user interfaces often obscure.
The best of this generation of AI tools is a blend of classic and emerging technology because, quite often, the best way to fix an issue has already been discovered.
While no technology can erase tragedy, innovation can help prevent future disasters. For national security agencies, such prevention is a prime directive, but they are not alone in the fight.
Financial institutions (FIs) also play a vital role. Just as the CBP must safeguard national borders, FIs serve on the front lines of the financial system, protecting it from criminal abuse. To that end, it is incumbent on them to use every effective technology available. AI is one of these tools, and government organizations like the CBP have paved the way.
This article outlined one place where AI can be integrated into the sanctions screening process, but it is not the only place where the technologies under the AI umbrella can be deployed. Active learning, semantic similarity and other recent developments have the potential to deeply change the way FIs manage sanctions risk. The only real question is whether FIs will act quickly enough.
Steve Cohen, COO, Basis Technology, Cambridge, MA, USA
1. This approach works by attempting to compile every version of a given name component and matching names against this list. While maintenance is not difficult, the list method is computationally intense, cannot recognize novel names and struggles with basic typos such as missing or added spaces.
2. “An Overview of Fuzzy Name Matching Techniques,” Rosette Text Analytics, https://www.rosette.com/blog/overview-fuzzy-name-matching-techniques/