How to detect both credit card fraud and money laundering using graph and machine learning.
Are you using graph yet? If not, you’ve obviously not read my other blog posts. Go do that, then install yourself some graph goodness, then come back here. See you soon.
Okay, now that you’re a graph expert, we can go on to talk about the title of this article. As Andrew Ng points out in his lecture on applying a triplet loss function, it’s common in the deep learning literature for titles to be <domain of interest> inserted into either of the sequences “______ Net” or “Deep ______”. In that spirit, I was going to name this paper Fraud Net or Deep Fraud, but then I realized publishing a company blog post on Deep Fraud probably isn’t the best PR.
I digress. Let’s talk about fraud detection.
Figure 1: A graph including regular corporate financial transactions and fraudulent financial transactions.
What we really want is fraud prediction (and thence fraud prevention), right? Yes, but that’s in an upcoming blog post. Today we’ll talk about attributing certain a priori behavior to a binary class target, namely, a fraud/notfraud target. We’ll look at two ways of determining whether or not a certain entity has transacted a fraudulent activity or not, first using embeddings of a graph, and second using several topology metrics of a graph.
I know what you’re thinking: if the fraud has already been committed, who cares? According to this article, everyone should care. In 2015, the author claims, the cost of false positive fraud labelling was 118 billion dollars. That’s billion. With a “b.” The cost of the actual fraud cases was only $9 billion. Don’t get me wrong, nine billion smackeroos is quite a few, but it’s only seven percent of the total money lost. Incorrectly labeling transactions as fraudulent is worth as much as building a new international space station. Every year. So without further digression, let me show you how to save $118,000,000,000. (you’re welcome)
Scenario number one: you want to increase the accuracy of your credit card fraud analysis tool. First you’d organize your data into a graph, instantiating nodes as individual customers and merchants with node properties about their financial histories. You’d build edges representing financial transactions between these entities with node properties like timestamp of transaction and amount paid.
Now you have to embed the graph in a lower dimensional space so you can use a simple model to analyze it. Why not directly insert your graph into your model? Because the geometries aren’t compatible. If you’re into graph theory or differential geometry, read this to understand that last sentence about geometries. For those of us who aren’t into the theories of graph thingys or differential whosits, let’s just take it as an axiom that we need to embed our graph.
Here’s a graph in some serious need of embedding. As described above, the nodes represent people with credit cards and the merchants at whom they sling their plastic. Note the complex three dimensional structure and the large number of edges, representing financial transactions.
Figure 2: a graph before embedding. Nodes are credit card holders and merchants. Edges are financial transactions.
Embedding strategies abound; some are more popular than others for reasons outside the scope of this article. I’ll show two common ones in the image below, dimensionality reduction by principal component analysis and spectral embedding by eigenvalue decomposition.
Figure 3: the two dimensional embedding of the graph in Figure 2. The orange algorithm was PCA, the blue was spectral embedding.
Finally, we’re ready to build a model. Encoding the embedded graph for modeling is as simple as building feature vectors from the now flattened nodes. We include the node (entity) and edge (transaction) properties, but we also concatenate the embedded coordinate information from the image above. We then build a target vector (or matrix for a multi-class target regime) of our known labels, fire up our GPU, and watch it eat.
Scenario number two: you want to ferret out the money laundering organizations from your database of transactional records. This problem is an order of magnitude more interesting than analyzing individual transactional records; rather than looking at discrete samples, we’re interested in analyzing rings of financial interaction. This is the paradigm in which graph really shines.
Check out the image below. It’s a set of companies who interact financially. The colors are representative of their “community,” determined by an unsupervised learning algorithm. This discussion is getting dangerously close to secret-sauce territory, so I’ll leave it at that. The question is, are the yellow companies doing biz as usual, or is this yellow community really a money laundering ring?
Figure 4: a graph of companies colored by community. Are the yellow companies on the lower right really fronts for a money laundering ring?
Step one: Munge your data into the same graph structure defined in the section above.
Step two: Build a clever algorithm which extract subgraphs of interest (the colored communities in the image above), and calculates topology metrics for each community. “Topology metric” is a fancy name for descriptions of the geometry of the subgraph in question. For example, one popular topology metric is number-of-edges; in the yellow subgraph we have 23 edges. There are many such topology metrics, and we calculate several dozen of them for each subgraph.
Step three: Build a feature vector of these topology metrics for each subgraph. Concatenate the node properties in another secret-saucy type of way. An example implementation of this would be to calculate the average node properties of all nodes in the subgraph.
Step four: Build a target vector (or matrix for a multi-class target regime) of our known labels, fire up our GPU, and let that cookie bake.
These techniques depend heavily on what types of data are available and the structure of the entities these data describe. Implementation must be custom (or at least fit-to-order) for each bank/agency/investigator interested in performing this work. Using an off-the-shelf solution is likely to make the problem worse, but when correctly designed and deployed these techniques stand to save many billions of dollars per year.
P.S. Tune in next week for a more sophisticated version of this analysis using a graph-kernelized methodology called a graph convolutional network.
P.P.S. Fraud detection is a tractable problem without graph. Want to find fraudsters without re-munging your 100 PB database into yet another format? No problem. We do that, too.
Tell us what you need and one of our experts will get back to you.