Don’t Be Scared of the Data


Beyond traditional sources of data lie valuable alternative sources. Bringing these various data sources together to develop a hypothesis for an investigation may seem intimidating at first, but in her session at the 30th Annual ACFE Global Fraud Conference titled “Leveraging Traditional and Alternative Data in Investigations,” Lacey Keller, data science managing director at Gryphon Strategies, demonstrated a common-sense approach to gathering and shaping data. The goal is to create a narrative that those outside of an investigation can easily understand.

First, Keller assured attendees that their skill level didn’t matter because everyone has to start somewhere. She shared a story about her own beginning: “I wasn’t doing a lot of quantitative analytics. How I got my start was on the job.” She first wrote code as a summer intern, developing a small data-scraping script out of “laziness,” she told the crowd. “If you find yourself doing the same task over and over again, maybe you should automate it.” To demonstrate this concept, she shared the story of an Airbnb investigation.

When Keller worked for New York’s Office of the Attorney General, the office received many complaints and was tasked with investigating claims that many of Airbnb’s listings were illegal. The story ended up in The New York Times, but Keller took attendees behind the scenes, explaining how her team gathered the data for the AG’s case against Airbnb.

“We’re getting tons of complaints at the AG’s office about illegal conversions, hotels, full apartment buildings being taken down as hotels … so we set out to analyze that information to see what’s going on under the hood.” After a tussle with Airbnb, the courts ruled that Airbnb had to hand the data over to the AG’s office. The information in the data was hashed — in other words, anonymized — but they got host IDs, listing addresses, reservation data and other pertinent information.
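As a rough illustration of why hashed identifiers are still useful to an analyst, the sketch below replaces host IDs with stable hashes and then totals revenue per host. It is not the method Airbnb or the AG’s office used; the records, the `pseudonymize` helper and the salt value are all hypothetical.

```python
import hashlib


def pseudonymize(value: str, salt: str = "example-salt") -> str:
    """Return a stable SHA-256 hex digest for an identifier (hypothetical helper)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


# Hypothetical reservation records, invented for illustration only.
reservations = [
    {"host_id": "host-001", "revenue": 1200},
    {"host_id": "host-002", "revenue": 300},
    {"host_id": "host-001", "revenue": 900},
]

# Replace the real host ID with its hash before the data is shared.
hashed = [
    {"host": pseudonymize(r["host_id"]), "revenue": r["revenue"]}
    for r in reservations
]

# The same host always maps to the same hash, so records can still be
# grouped and summed without exposing the underlying ID.
totals = {}
for row in hashed:
    totals[row["host"]] = totals.get(row["host"], 0) + row["revenue"]
```

Because identical inputs produce identical hashes, an analyst can still see that one host accounts for several reservations, which is what makes outlier analysis possible on data of this kind.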

Most of the hosts were exactly what Airbnb had advertised — normal people leasing out their homes for a bit of extra income. But when Keller and her team started digging into the outliers, they could pinpoint a small number of hosts with massive amounts of revenue. “I see this … a few people making hundreds of thousands of dollars, if not millions, in a few short years.”

She then began plugging the addresses of these high-revenue listings into a map of New York. She combined that with New York City building information so she could also determine how many units were in each building. This raised red flags. “You see 3,000 reservations going on in a six-unit building? That’s some serious volume.” Once the team had mapped the data and could tell the story of the few hosts who were abusing the system, the findings led Airbnb to change its practices.
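The idea of combining reservation data with building information can be sketched in a few lines of Python. This is a minimal illustration, not Keller’s actual analysis (which she said was done in Excel): the addresses, counts, the `flag_high_volume` function and the `max_per_unit` threshold are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical reservation records, one per booking (invented data).
reservations = (
    [{"host": "h1", "address": "10 Example St"}] * 3000
    + [{"host": "h2", "address": "22 Sample Ave"}] * 12
    + [{"host": "h3", "address": "5 Demo Rd"}] * 40
)

# Hypothetical public building data: address -> number of units.
building_units = {"10 Example St": 6, "22 Sample Ave": 4, "5 Demo Rd": 80}


def flag_high_volume(reservations, building_units, max_per_unit=50):
    """Return buildings whose reservations per unit exceed an arbitrary threshold."""
    counts = Counter(r["address"] for r in reservations)
    flagged = []
    for address, n in counts.items():
        units = building_units.get(address)
        if not units:
            continue  # no match in the building data; leave for manual review
        per_unit = n / units
        if per_unit > max_per_unit:
            flagged.append((address, n, units, per_unit))
    # Highest reservations-per-unit first.
    return sorted(flagged, key=lambda item: item[3], reverse=True)


flagged = flag_high_volume(reservations, building_units)
```

With this sample data, only the six-unit building with 3,000 reservations is flagged. In a real investigation the threshold would need to be justified by the data, and address matching between the two sources would likely require cleanup first.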

“But it was fun. We were just combining public data with subpoenaed data … and I did all this in Microsoft Word and Excel.” She told attendees they could do the exact same thing today with the laptops and Excel documents they had with them in the session.

Keller emphasized that the resources for learning to work with data this way are available to everyone. Fraud examiners will encounter an ever-changing data landscape, but it’s exciting that datasets and tools that did not exist even a week ago might exist the next time they pick up an investigation. “All of my training has come from Googling problems,” she said. “The internet always has answers.” Get out there, Keller urged. Ask your questions and don’t be scared of the data.