
Fraud Detection Class Slides

Date post: 11-Apr-2017
Upload: max-de-marzi
Graphs in Fraud Detection Max De Marzi Field Engineer, Neo4j @maxdemarzi
Transcript

Graphs in Fraud Detection

Max De Marzi, Field Engineer, Neo4j, @maxdemarzi

About Me

• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: [email protected]
• GitHub: http://github.com/maxdemarzi

Overview

Types of Fraud
• Credit Card Fraud
• First-Party Fraud
• Synthetic Identities and Fraud Rings
• Insurance Fraud

Types of Analysis
• Traditional Analysis
• Graph-Based Analysis

Fraud Detection and Prevention Common Questions

…but before we get into that…

• What isn’t Fraud?

I don’t know, but I know who does

• Alex Beutel, CMU
• Leman Akoglu, Stony Brook
• Christos Faloutsos, CMU

• Graph-Based User Behavior Modeling: From Prediction to Fraud Detection

• http://www.cs.cmu.edu/~abeutel/kdd2015_tutorial/

User Behavior Challenges

• How can we understand normal user behavior?
• How can we find suspicious behavior?
• How can we distinguish the two?

Users

Does your little girl like Rambo?

Personalization

Understanding our Users

• What do we know about them?

Demographics: Age

Demographics: Gender

Understanding our Users

MATCH (u:User)-[r:RATED]->(m:Movie)
RETURN u.gender, u.age,
       COUNT(DISTINCT u) AS user_cnt,
       COUNT(DISTINCT m) AS mov_cnt,
       COUNT(r) AS rtg_cnt

Understanding our Users

MATCH (me:User {id: 1})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(similar_users:User)
WHERE ABS(r1.stars - r2.stars) <= 1
RETURN similar_users.gender, similar_users.age,
       COUNT(DISTINCT similar_users) AS user_cnt,
       COUNT(r2) AS rtg_cnt

Little Girls like Movies other Little Girls Like

What do Little Girls Like?

MATCH (u:User)-[r:RATED]->(m:Movie)
WHERE u.age = 1 AND u.gender = "F" AND r.stars > 3
RETURN m.title, COUNT(r) AS cnt
ORDER BY cnt DESC
LIMIT 10

What do Men 25-34 Like?

MATCH (u:User)-[r:RATED]->(m:Movie)
WHERE u.age = 25 AND u.gender = "M" AND r.stars > 3
RETURN m.title, COUNT(r) AS cnt
ORDER BY cnt DESC
LIMIT 10

Modeling “Normal” Behavior

• Predict Edges (Similar Users)
• Predict Edges (Movies I should Watch)

Recommendation Engine with Neo4j

Recommendation

Content Based Recommendations

• Step 1: Collect Item Characteristics
• Step 2: Find similar Items
• Step 3: Recommend Similar Items

• Example: Similar Movie Genres

There is more to life than Romantic Zombie-coms

Collaborative Filtering Recommendations

• Step 1: Collect User Behavior
• Step 2: Find similar Users
• Step 3: Recommend Behavior taken by similar users

• Example: People with similar musical tastes
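
The three collaborative filtering steps can be sketched in a few lines of Python. This is a minimal illustration, not the deck's implementation: the `likes` data, the user names, and the choice of "number of shared likes" as the similarity measure are all assumptions made for the example.

```python
# Hypothetical toy data (assumption): user -> set of genres the user liked.
likes = {
    "ann": {"jazz", "blues", "soul"},
    "bob": {"jazz", "blues", "funk"},
    "cat": {"metal", "punk"},
}

def recommend(user, likes):
    """Recommend items liked by users who share at least one like with `user`."""
    mine = likes[user]                    # Step 1: the collected behavior
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)      # Step 2: similarity = number of shared likes
        if overlap == 0:
            continue
        for item in theirs - mine:        # Step 3: recommend what similar users liked
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))

print(recommend("ann", likes))
```

A user with no overlapping likes gets an empty list, which is the cold-start problem that content-based recommendations help with.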

You are so original!

Using Relationships for Recommendations

Content-based filtering: Recommend items based on what users have liked in the past

Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others

[Figure: graph of a Movie node and two Person nodes, with a RATED relationship (rating: 7) and a SIMILARITY relationship (value: .92)]

Hybrid Recommendations

• Combine the two for better results
• Like Peanut Butter and Jelly

Hello World Recommendation

Movie Data Model

Cypher Query: Movie Recommendation

MATCH (watched:Movie {title: "Toy Story"})<-[r1:RATED]-()-[r2:RATED]->(unseen:Movie)
WHERE r1.rating > 7 AND r2.rating > 7
  AND watched.genres = unseen.genres
  AND NOT ((:Person {username: "maxdemarzi"})-[:RATED]->(unseen))
RETURN unseen.title, COUNT(*)
ORDER BY COUNT(*) DESC
LIMIT 25

What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story

Let’s try k-nearest neighbors (k-NN)

Cosine Similarity

Cypher Query: Ratings of Two Users

MATCH (p1:Person {name: 'Michael Sherman'})-[r1:RATED]->(m:Movie),
      (p2:Person {name: 'Michael Hunger'})-[r2:RATED]->(m:Movie)
RETURN m.name AS Movie,
       r1.rating AS `M. Sherman's Rating`,
       r2.rating AS `M. Hunger's Rating`

What are the Movies these 2 users have both rated

Calculating Cosine Similarity

Cypher Query: Cosine Similarity

MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)

Calculate it for all Person nodes with at least one Movie between them
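
The arithmetic in the Cosine Similarity query may be easier to follow outside Cypher. The Python sketch below mirrors it under stated assumptions: each user's ratings are a plain dict of title to rating, and the function name and sample ratings are made up for illustration. As in the query, only movies rated by both users contribute.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both users rated (dicts: title -> rating)."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return None  # no movie in common, so no SIMILARITY relationship
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    len_a = math.sqrt(sum(ratings_a[m] ** 2 for m in shared))
    len_b = math.sqrt(sum(ratings_b[m] ** 2 for m in shared))
    if len_a == 0 or len_b == 0:
        return None  # all-zero ratings would divide by zero
    return dot / (len_a * len_b)

# Hypothetical ratings (assumption), not data from the slides.
sherman = {"Movie A": 1, "Movie B": 2}
hunger = {"Movie A": 2, "Movie B": 1, "Movie C": 5}
print(cosine_similarity(sherman, hunger))
```

Here the dot product is 1*2 + 2*1 = 4 and both vectors have length sqrt(5), so the similarity is 4/5. "Movie C" is ignored because only one user rated it.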

Movie Data Model (v2)

Cypher Query: Your nearest neighbors

MATCH (p1:Person {name: 'Grace Andrews'})-[s:SIMILARITY]-(p2:Person)
WITH p2, s.similarity AS sim
RETURN p2.name AS Neighbor, sim AS Similarity
ORDER BY sim DESC
LIMIT 5

Who are the
• top 5 Persons and their similarity score
• ordered by similarity in descending order
• for Grace Andrews

Your nearest neighbors

Cypher Query: k-NN Recommendation

MATCH (m:Movie)<-[r:RATED]-(b:Person)-[s:SIMILARITY]-(p:Person {name: 'Zoltan Varju'})
WHERE NOT ((p)-[:RATED]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i) * 1.0 / SIZE(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25

What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by his top 3 neighbors
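
The same k-NN idea can be sketched in Python for readers who want to trace it by hand. This is an illustration under assumptions, not the deck's code: `ratings` maps each person to a dict of movie to rating, `similarity` maps a person to a dict of neighbor to similarity score, and all names and numbers below are invented.

```python
def knn_recommend(target, ratings, similarity, k=3, limit=25):
    """Average the ratings of the k most similar neighbors for each unseen movie."""
    seen = ratings[target]
    per_movie = {}
    for neighbor, sim in similarity[target].items():
        for movie, rating in ratings[neighbor].items():
            if movie not in seen:
                per_movie.setdefault(movie, []).append((sim, rating))
    recs = []
    for movie, pairs in per_movie.items():
        # Keep the ratings from the k most similar neighbors who rated this movie.
        top = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
        recs.append((movie, sum(r for _, r in top) / len(top)))
    # Best average first; ties broken by title for a stable order.
    recs.sort(key=lambda x: (-x[1], x[0]))
    return recs[:limit]

# Hypothetical data (assumption).
ratings = {
    "zoltan": {"A": 5},
    "n1": {"A": 5, "B": 4, "C": 2},
    "n2": {"B": 2},
    "n3": {"B": 3},
    "n4": {"B": 1, "C": 5},
}
similarity = {"zoltan": {"n1": 0.9, "n2": 0.8, "n3": 0.7, "n4": 0.1}}
print(knn_recommend("zoltan", ratings, similarity))
```

For movie "B" only the three most similar raters (n1, n2, n3) count, so the low rating from the weakly similar n4 is dropped.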

Modeling “Normal” Behavior

• Predict Edges
• Predict Node Attributes (Age, Gender, etc)

Age: 35

Age: ?

Modeling “Normal” Behavior

• Predict Edges
• Predict Node Attributes
• Predict Edge Attributes (Rating)

What Rating should I give 101 Dalmatians?

MATCH (me:User {id: 1})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(:User)-[r3:RATED]->(m2:Movie {title: "101 Dalmatians"})
WHERE ABS(r1.stars - r2.stars) <= 1
RETURN AVG(r3.stars)

Modeling “Normal” Behavior

• Predict Edges
• Predict Node Attributes
• Predict Edge Attributes
• Clustering and Community Detection

Predict a Star Rating purely on Demographics

MATCH (u:User)-[r:RATED]->(m:Movie {title: "Toy Story"})
WHERE u.age = 1 AND u.gender = "F"
RETURN AVG(r.stars)

Modeling “Normal” Behavior

• Predict Edges
• Predict Node Attributes
• Predict Edge Attributes
• Clustering and Community Detection
• Fraud Detection

Two Sides of the Same Coin

Recommendations
• Add the relationship that does not exist

Fraud Detection
• Find the relationships that should not exist

Modeling User Behavior

• Modeling normal users and detecting anomalies are two sides of understanding user behavior
• Rough model of normal vs outlier
• Fine-grained models can find more subtle outliers
• Complex models can capture normal and abnormal patterns
• Known fraudulent patterns can be searched for directly

Cross Reference

Find the Nodes

ArrayList<Node> nodes = new ArrayList<Node>();
nodes.add(db.findNode(Labels.CC, "number", card));
nodes.add(db.findNode(Labels.Phone, "number", phone));
nodes.add(db.findNode(Labels.Email, "address", address));
nodes.add(db.findNode(Labels.IP, "address", ip));

Add the Crosses

for (Node node : nodes) {
    HashMap<String, AtomicInteger> crosses = new HashMap<String, AtomicInteger>();
    crosses.put("CCs", new AtomicInteger(0));
    crosses.put("Phones", new AtomicInteger(0));
    crosses.put("Emails", new AtomicInteger(0));
    crosses.put("IPs", new AtomicInteger(0));
    for (Relationship relationship : node.getRelationships(RELATED, Direction.BOTH)) {
        Node thing = relationship.getOtherNode(node);
        String type = thing.getLabels().iterator().next().name() + "s";
        crosses.get(type).getAndIncrement();
    }
    results.add(crosses);
}

Examine Results

[{"ips":4,"emails":7,"ccs":0,"phones":4},  -- cc returned 4 ips, 7 emails, and 3 phones.
 {"ips":1,"emails":1,"ccs":1,"phones":0},  -- phone returned just 1 item for each cross reference check.
 {"ips":2,"emails":0,"ccs":4,"phones":3},  -- email returned 2 ips, 4 credit cards and 3 phones.
 {"ips":0,"emails":1,"ccs":3,"phones":2}]  -- ip returned 3 credit cards and 2 phones.
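
The cross-reference count does not depend on the Neo4j Java API, so it can be tried on a toy graph. The Python sketch below is a rough equivalent under assumptions: nodes are `(label, value)` tuples, `edges` is a hypothetical list of undirected RELATED links, and the sample values are invented.

```python
# Hypothetical toy graph (assumption): undirected RELATED edges between (label, value) nodes.
edges = [
    (("CC", "4111"), ("Email", "a@example.com")),
    (("CC", "4111"), ("Email", "b@example.com")),
    (("CC", "4111"), ("IP", "10.0.0.1")),
    (("Phone", "555-0100"), ("Email", "a@example.com")),
]

def cross_reference(node, edges):
    """Count the neighbors of `node` by type, as the 'Add the Crosses' loop does."""
    crosses = {"CCs": 0, "Phones": 0, "Emails": 0, "IPs": 0}
    for a, b in edges:
        if a == node:
            other = b
        elif b == node:
            other = a
        else:
            continue
        # Label name plus "s" gives the counter key, e.g. "Email" -> "Emails".
        crosses[other[0] + "s"] += 1
    return crosses

print(cross_reference(("CC", "4111"), edges))
```

A credit card tied to many emails, or an email tied to many cards, is the kind of count worth a closer look.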

Subgraphs

What is a subgraph?

• A subset of nodes and the edges between them

What are some useful subgraphs?

Largest dense subgraph (Greatest average degree)

What are some useful subgraphs?

Ego-network: the subgraph among a node and its neighbors

What are some useful subgraphs?

Graph queries: find subgraphs of particular pattern

MATCH (a)--(b)--(c)--(a) RETURN *
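
The triangle pattern above can also be found with plain adjacency sets. The following Python sketch is illustrative only: the `find_triangles` name and the sample edge list are assumptions. Unlike the Cypher query, which returns every ordering of the same three nodes, it reports each triangle once.

```python
from itertools import combinations

def find_triangles(edges):
    """Return each triangle in an undirected graph once, as a sorted tuple."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    triangles = set()
    for node, nbrs in adj.items():
        # Two neighbors of `node` that are also connected to each other close a triangle.
        for x, y in combinations(sorted(nbrs), 2):
            if y in adj[x]:
                triangles.add(tuple(sorted((node, x, y))))
    return sorted(triangles)

# Hypothetical edge list (assumption).
print(find_triangles([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]))
```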

What are some useful subgraphs?

Graph queries: find subgraphs of particular pattern

MATCH (a)--(b)--(c)--(d)--(a)--(c), (d)--(b) RETURN *

Graphs as Matrices

Clustering gives Clarity

Link

Ego-net Patterns

Ni: number of neighbors of ego i

Ei: number of edges in egonet i

Wi: total weight of egonet i

λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i
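
The first three ego-net features can be computed directly from an edge list. This Python sketch is a minimal illustration under assumptions: edges are undirected `(u, v, weight)` tuples, the function name and sample graph are invented, and the principal eigenvalue is left out because it needs a linear algebra routine.

```python
def egonet_features(ego, weighted_edges):
    """Return (N_i, E_i, W_i) for the egonet of `ego`.

    weighted_edges: list of undirected (u, v, weight) tuples; self-loops are ignored.
    """
    edges = [(u, v, w) for u, v, w in weighted_edges if u != v]
    neighbors = set()
    for u, v, _ in edges:
        if u == ego:
            neighbors.add(v)
        elif v == ego:
            neighbors.add(u)
    members = neighbors | {ego}
    n_edges = 0
    total_weight = 0.0
    for u, v, w in edges:
        # An egonet edge has both endpoints among the ego and its neighbors.
        if u in members and v in members:
            n_edges += 1
            total_weight += w
    return len(neighbors), n_edges, total_weight

# Hypothetical weighted graph (assumption).
graph = [("a", "b", 1.0), ("a", "c", 2.0), ("b", "c", 3.0), ("c", "d", 4.0)]
print(egonet_features("a", graph))
```

For ego "a" the neighbors are "b" and "c", and the edge between them also belongs to the egonet, which is why E_i can exceed N_i.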

Power Law Density

[Figure: plot with lines labeled slope = 2, slope = 1, and slope = 1.35]

Power Law Weight

Power Law Eigenvalue

Find Groups within Ego-Nets

First-Party Fraud

• Fraudster’s aim: apply for lines of credit, act normally, extend credit, then… run off with it
• Fabricate a network of synthetic IDs, aggregate smaller lines of credit into substantial value
• Often a hidden problem since only banks are hit
• Whereas third-party fraud involves customers whose identities are stolen
• More on that later…

So what?

• Tens of billions of dollars lost by banks every year
• 25% of the total consumer credit write-offs in the USA
• Around 20% of unsecured bad debt in E.U. and N.A. is misclassified
• In reality it is first-party fraud

Fraud Ring

Then the fraud happens…

• Revolving doors strategy
• Money moves from account to account to provide legitimate transaction history
• Banks duly increase credit lines
• Observed responsible credit behavior
• Fraudsters max out all lines of credit and then bust out

…and the Bank loses

• Collections process ensues
• Real addresses are visited
• Fraudsters deny all knowledge of synthetic IDs
• Bank writes off debt
• Two fraudsters can easily rack up $80k
• Well organized crime rings can rack up many times that

Discrete Analysis Fails to predict…

…and Makes it Hard to React

• When the bust-out starts to happen, how do you know what to cancel?
• And how do you do it faster than the fraudster to limit your losses?
• A graph, that’s how!

Probably Non-Fraudulent Cohabiters

Probable Cohabiters Query

MATCH (p1:Person)-[:HOLDS|LIVES_AT*]->()<-[:HOLDS|LIVES_AT*]-(p2:Person)
WHERE p1 <> p2
RETURN DISTINCT p1

Dodgy-Looking Chain

Risky People

MATCH (p1:Person)-[:HOLDS|LIVES_AT]->()<-[:HOLDS|LIVES_AT]-(p2:Person)
      -[:HOLDS|LIVES_AT]->()<-[:HOLDS|LIVES_AT]-(p3:Person)
WHERE p1 <> p2 AND p2 <> p3 AND p3 <> p1
WITH collect(p1.name) + collect(p2.name) + collect(p3.name) AS names
UNWIND names AS fraudster
RETURN DISTINCT fraudster
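
To see why a chain differs from a pair of cohabiters, the Risky People pattern can be replayed on a small dictionary. The Python sketch below is an assumption-laden illustration: `holdings` maps each person to the accounts and addresses they hold or live at, and every name and identifier is invented. A chain needs a middle person who shares one identifier with p1 and a different one with p3.

```python
def risky_people(holdings):
    """Return people who sit on a chain p1 - x1 - p2 - x2 - p3 of shared identifiers."""
    holders = {}
    for person, items in holdings.items():
        for item in items:
            holders.setdefault(item, set()).add(person)
    risky = set()
    for p2, items in holdings.items():
        for x1 in items:
            for x2 in items:
                if x1 == x2:
                    continue  # the middle person must link through two different things
                for p1 in holders[x1] - {p2}:
                    for p3 in holders[x2] - {p2, p1}:
                        risky.update((p1, p2, p3))
    return sorted(risky)

# Hypothetical data (assumption): Cy and Di only share an address, so they are not flagged.
holdings = {
    "Sol": {"addr1", "acct1"},
    "Ann": {"addr1"},
    "Bob": {"acct1"},
    "Cy": {"addr2"},
    "Di": {"addr2"},
}
print(risky_people(holdings))
```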

Pretty quick…

Number of people: [5163] Number of fraudsters: [40] Time taken: [100] ms

Localize the focus

MATCH (p1:Person {name:'Sol'})-[:HOLDS|LIVES_AT]-()…

Number of fraudsters: [5] Time taken: [13] ms

Stop a bust-out in ms.

Quickly Revoke Cards in Bust-Out

MATCH (p1:Person)-[:HOLDS|LIVES_AT]->()<-[:HOLDS|LIVES_AT]-(p2:Person)
      -[:HOLDS|LIVES_AT]->()<-[:HOLDS|LIVES_AT]-(p3:Person)
WHERE p1 <> p2 AND p2 <> p3 AND p3 <> p1
WITH collect(p1) + collect(p2) + collect(p3) AS names
UNWIND names AS fraudster
MATCH (fraudster)-[o:OWNS]->(card:CreditCard)
DELETE o, card

Auto Fraud

Whiplash

http://georgia-clinic.com/blog/wp-content/uploads/2013/10/whiplash.jpg

Whiplash for Cash

http://georgia-clinic.com/blog/wp-content/uploads/2013/10/whiplash.jpg http://cdn2.holytaco.com/wp-content/uploads/2012/06/lottery-winner.jpg

Whiplash for Cash Example

[Figure: example graph of Accidents, Cars, People (Drivers, Passengers, Witnesses), a Doctor, and an Attorney, with Drives and Is Passenger relationships]

Risk

• $80,000,000,000 annually on auto insurance fraud and growing
• Even small % reductions are worthwhile!
• British policyholders pay ~£100 per year to cover fraud
• US drivers pay $200-$300 per year according to US National Insurance Crime Bureau

Regular Drivers

Regular Drivers Query

MATCH (p:Person)-[:DRIVES]->(c:Car)
WHERE NOT (p)<-[:BRIEFED]-(:Lawyer)
  AND NOT (p)<-[:EXAMINED]-(:Doctor)
  AND NOT (p)-[:WITNESSED]->(:Car)
  AND NOT (p)-[:PASSENGER_IN]->(:Car)
RETURN p, c
LIMIT 100

Genuine Claimants

Genuine Claimants Query

MATCH (p:Person)-[:DRIVES]->(:Car),
      (p)<-[:BRIEFED]-(:Lawyer),
      (p)<-[:EXAMINED]-(:Doctor)
OPTIONAL MATCH (p)-[w:WITNESSED]->(:Car)
OPTIONAL MATCH (p)-[pi:PASSENGER_IN]->(:Car)
RETURN p, count(DISTINCT w) AS noWitnessed, count(DISTINCT pi) AS noPassengerIn

Fraudsters

MATCH (p:Person)-[:DRIVES]->(:Car),
      (p)<-[:BRIEFED]-(:Lawyer),
      (p)<-[:EXAMINED]-(:Doctor),
      (p)-[w:WITNESSED]->(:Car),
      (p)-[pi:PASSENGER_IN]->(:Car)
WITH p, count(DISTINCT w) AS noWitnessed, count(DISTINCT pi) AS noPassengerIn
WHERE noWitnessed > 1 OR noPassengerIn > 1
RETURN p
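
The Fraudsters pattern boils down to a rule about how many roles one person plays across accidents. The Python sketch below restates that rule on in-memory records; the record layout, the names, and the `find_fraudsters` function are assumptions for illustration, not part of the deck.

```python
def find_fraudsters(people):
    """Flag claimants who also appear as witness and passenger, one of them repeatedly.

    people: name -> dict with set-valued keys
            'drives', 'lawyers', 'doctors', 'witnessed', 'passenger_in'.
    """
    flagged = []
    for name, p in people.items():
        # A claimant drives a car and has been briefed by a lawyer and examined by a doctor.
        is_claimant = all(p[k] for k in ("drives", "lawyers", "doctors"))
        # The pattern requires at least one witnessed car and one ride as passenger...
        has_both_roles = bool(p["witnessed"]) and bool(p["passenger_in"])
        # ...and more than one of either.
        repeated = len(p["witnessed"]) > 1 or len(p["passenger_in"]) > 1
        if is_claimant and has_both_roles and repeated:
            flagged.append(name)
    return sorted(flagged)

# Hypothetical records (assumption).
people = {
    "Ann": {"drives": {"car1"}, "lawyers": {"L1"}, "doctors": {"D1"},
            "witnessed": {"car2", "car3"}, "passenger_in": {"car4"}},
    "Bob": {"drives": {"car5"}, "lawyers": {"L1"}, "doctors": {"D1"},
            "witnessed": {"car1"}, "passenger_in": {"car2"}},
    "Cy": {"drives": {"car6"}, "lawyers": set(), "doctors": set(),
           "witnessed": set(), "passenger_in": set()},
}
print(find_fraudsters(people))
```

Using sets for the witnessed and passenger cars counts each car once, which plays the same role as counting distinct relationships in Cypher.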

Auto-fraud Graph

• Once you have the fraudsters, finding their support team is easy.
• (fraudster)<-[:EXAMINED]-(d:Doctor)
• (fraudster)<-[:BRIEFED]-(l:Lawyer)
• And it’s also easy to find their passengers
• (fraudster)-[:DRIVES]->(:Car)<-[:PASSENGER_IN]-(p)
• And easy to find other places where they may have committed fraud
• (fraudster)-[:WITNESSED]->(:Car)
• (fraudster)-[:PASSENGER_IN]->(:Car)
• And you can see this in milliseconds!

It’s all about the patterns

Phony Persona

Online Payments Fraud (First-Party)

• Stealing credentials is commonplace
• Phishing, malware, simply naïve users
• Buying stolen credit card numbers is easy
• How should one protect against seemingly fine credentials?
• And valid credit card numbers?

We are all little stars

• Username and passwords
• Two-factor auth
• IP addresses, cookies
• Credit card, PayPal account
• Some gaming sites already do some of this
• Arts and crafts platform Etsy already embraced the idea of a graph of identity

An Individual Identity Subgraph

128.240.229.18

[email protected] 1234LOL

We are all made of stars…

Other Specific Considerations

Specific Weighted Identity Query

MATCH (u:User {username:'Jim', password: 'secret'})

OPTIONAL MATCH (u) -[cookie:PROVIDED]->(:Cookie {id:'1234'}) OPTIONAL MATCH (u)-[address:FROM]->(:IP {network:'128.240.0.0'})

RETURN SUM(cookie.weighting) + SUM(address.weighting) AS score
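
The weighted identity score is a sum over the pieces of evidence that match what is already linked to the user. A small Python sketch of that idea follows. Everything in it is an assumption for illustration: the weights, the tuple encoding of identifiers, and the threshold used for the final decision do not come from the deck.

```python
def identity_score(presented, known_weights):
    """Sum the weights of presented identifiers that are already linked to the user."""
    return sum(w for ident, w in known_weights.items() if ident in presented)

def decide(score, threshold=0.6):
    """Hypothetical final decision: allow above the threshold, otherwise ask for more proof."""
    return "allow" if score >= threshold else "challenge"

# Hypothetical weights (assumption) on the user's PROVIDED and FROM relationships.
known_weights = {
    ("cookie", "1234"): 0.5,
    ("ip_network", "128.240.0.0"): 0.3,
}

login = {("cookie", "1234"), ("ip_network", "128.240.0.0")}
print(identity_score(login, known_weights), decide(identity_score(login, known_weights)))
```

A login that presents only the known cookie scores 0.5 and is challenged under this threshold, while the cookie plus the known network scores 0.8 and is allowed.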

Bare Minimum

Other Specific Considerations

Final Decision

General Weighted Identity Query

MATCH (u:User {username:'Jim', password: 'secret'})

OPTIONAL MATCH (u)-[rel]->() WHERE rel.weighting IS NOT NULL

RETURN SUM(rel.weighting) AS score

Bare Minimum

All Available Weightings

Final Decision


From 1st to 3rd Party

• The 1st party identity graph can easily be extended to 3rd party fraud
• Like in the bank fraud ring, fraudsters can mix-n-match claims
• Start with a few phished accounts and expand from there!

Shared Connections

128.240.229.18

[email protected] 1234LOL [email protected]

Ca$hMon£y

Graphing Shared Connections

Hmm….

Scan for Potential Fraudsters

MATCH (u1:User)--(x)--(u2:User)
WHERE u1 <> u2 AND NOT (x:IP)
RETURN x

Network in common is OK
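
The scan above looks for anything other than an IP address that two different users have in common. A Python sketch of the same check on a flat list of links is shown below; the `(user, kind, value)` layout, the kind names, and the sample rows are assumptions made for the example.

```python
def shared_identifiers(links):
    """Return non-IP identifiers used by two or more users.

    links: iterable of (user, kind, value) tuples.
    """
    users_by_ident = {}
    for user, kind, value in links:
        if kind == "IP":
            continue  # a network in common is OK
        users_by_ident.setdefault((kind, value), set()).add(user)
    return {ident: sorted(users)
            for ident, users in users_by_ident.items()
            if len(users) > 1}

# Hypothetical links (assumption).
links = [
    ("jim", "Cookie", "1234"),
    ("jim", "IP", "128.240.229.18"),
    ("eve", "IP", "128.240.229.18"),
    ("eve", "Cookie", "1234"),
    ("eve", "Card", "4111"),
    ("bob", "Card", "9999"),
]
print(shared_identifiers(links))
```

The shared IP address is ignored, but the cookie used by both "jim" and "eve" is reported.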

Stop specific fraudster network, quickly

MATCH path = (u1:User {username: 'Jim'})-[*]-(x)-[*]-(u2:User)
WHERE u1 <> u2 AND NOT (x:IP) AND NOT (x:User)
RETURN path

How do these fit with traditional fraud prevention?

http://www.gartner.com/newsroom/id/1695014

Gartner’s Layered Fraud Prevention Approach

Demo Time

Bank Fraud

http://gist.neo4j.org/?dfdfbddfdc63f4858f80

Credit Card Fraud Detection

http://gist.neo4j.org/?3ad4cb2e3187ab21416b

Whiplash for Cash

http://gist.neo4j.org/?6bae1e799484267e3c60

Ask for help if you get stuck

• Online training - http://neo4j.com/graphacademy/
• Videos - http://vimeo.com/neo4j/videos
• Use cases - http://www.neotechnology.com/industries-and-use-cases/
• Meetups
• Books to get you started
• http://www.graphdatabases.com
• http://neo4j.com/book-learning-neo4j/

Deep Neural Networks for Bank Fraud

https://www.youtube.com/watch?v=TAer-PeIypI

Fraud Detection starts about half-way (after intro)

Thanks for listening

@maxdemarzi

