Date post: | 01-Jul-2015 |
Upload: | shane-dempsey |
[Figure: Collaborative Processing System. Labels: Organization 1 ... Organization M, Internet, contract (one per organization), agreed information, warnings.]
CoMiFin Essential
CoMiFin Essentials: Business Vision
• The CoMiFin platform is potentially useful for addressing the following business use cases
– Monitoring and reaction to cyber threats (Man-in-the-Browser, Man-in-the-Middle, botnet detection, stealthy inter-domain port scan)
– Location intelligence for fraud correlation
– ID theft
– Anti-money-laundering monitoring
– Black/white list distribution (for credit reputation, trust level, ...)
– Anti-terrorism lists
• These use cases imply value-added services that can be offered by SPs to FIs over CoMiFin
• The CoMiFin project has been submitted to four FAB meeting evaluation sessions, which highlighted its possible business value in real financial use cases
CoMiFin Essentials: The notion of semantic room
■ Contract
– The set of processing and data sharing services provided by the SR, along with the data protection, privacy, isolation, trust, security, dependability, and performance requirements.
– The contract also contains the hardware and software requirements a member has to provision in order to be admitted into the SR.
■ Objective
– Each SR has one strategic objective to meet (e.g., large-scale stealthy scan detection, detecting Man-in-the-Middle attacks).
■ Deployment
– Highly flexible, to accommodate the use of different technologies for the implementation of the processing and sharing within the SR (i.e., the implementation of the SR logic or functionality).
CoMiFin Essentials: Deploying a Semantic Room
■ Private cloud
– Deployment of the semantic room through the federation of computing and storage capabilities at each member
– Each member brings a private cloud to federate
■ Public cloud
– Deployment of the semantic room on a third-party cloud provider
– The third party owns all computing and storage capabilities
■ Hybrid approach
Interconnections of SRs
• Horizontal/Vertical composition
– Communicating SRs have different goals
– Complex/combined attacks can be detected by information fusion
– E.g., alerts generated in the Portscan SR contribute to the blacklist managed in the MitM SR
Today’s CoMiFin Achievements: user perspectives
• Timely identification of identity theft
• Community-based advanced information sharing
• Identification of the Command and Control of Trojans for MitB
Outline
• Three Semantic Rooms
– Esper-based and Agilis-based SRs for inter-domain stealthy port scan detection
– Agilis-based SR for botnet-driven HTTP session hijacking (not covered in this presentation)
• High-level description of the port scan detection algorithms
– R-SYN port scan detection
– Line Fitting
• High-level description of botnet-driven HTTP session hijacking (not covered in this presentation)
• Performance evaluation
– R-SYN vs Line Fitting using the Esper-based SR
– Agilis vs Esper-based SR for inter-domain stealthy port scan detection (R-SYN only)
– Agilis for botnet-driven HTTP session hijacking (not covered in this presentation)
Do you remember some important concepts?
• SR creation phase
– A so-called SR schema is created, in which the three characterizing SR elements are specified
  • The objective
  • The contract, filled in with general SR contractual clauses
  • The variety of software deployments that can be used for that SR
• SR instantiation phase
– The same SR schema can be instantiated in different ways according to different aspects, e.g.,
  • geographical position of the members
  • processing and sharing software (one SR IDPS instance uses Agilis for the processing, another uses Esper)
  • types of SR deployments: third-party based, SR owned, hybrid
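The relationship between one SR schema and its several instances can be pictured as a small data structure. The sketch below is only an illustration: the class, field, and function names (`SRSchema`, `SRInstance`, `instantiate`) and all example values are assumptions made here, not part of the CoMiFin design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SRSchema:
    """Created once, in the SR creation phase (hypothetical structure)."""
    objective: str               # the single strategic objective of the SR
    contract_clauses: tuple      # general SR contractual clauses
    allowed_deployments: tuple   # software deployments usable for this SR


@dataclass(frozen=True)
class SRInstance:
    """One concrete instantiation of a schema (hypothetical structure)."""
    schema: SRSchema
    member_region: str           # geographical position of the members
    processing_engine: str       # processing and sharing software
    deployment: str              # chosen type of SR deployment


def instantiate(schema, member_region, processing_engine, deployment):
    """Create an instance, rejecting deployments the schema does not allow."""
    if deployment not in schema.allowed_deployments:
        raise ValueError(f"deployment {deployment!r} not allowed by schema")
    return SRInstance(schema, member_region, processing_engine, deployment)


# The same schema instantiated twice with different processing software.
idps_schema = SRSchema(
    objective="inter-domain stealthy port scan detection",
    contract_clauses=("data protection", "privacy", "performance"),
    allowed_deployments=("third-party based", "SR owned", "hybrid"),
)
agilis_sr = instantiate(idps_schema, "Italy", "Agilis", "SR owned")
esper_sr = instantiate(idps_schema, "Norway", "Esper", "third-party based")
```

Both instances share the same objective and contract; only the instantiation aspects differ.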
Inter-domain stealthy port scan
• TCP SYN (half-open) port scan
– Each scanner targets multiple sites
– Hosts at each site receive a series of probes to multiple ports
– Each probe is a TCP connection initiation (3-way handshake) that never completes
• A scanner S sends a SYN packet to a target T on a specific port P and waits for a response
– If a SYN-ACK packet is received, S can conclude that P is open and optionally reply with a RST packet to reset the connection (incomplete connection)
– If a RST-ACK packet is received, S can consider P closed
– If no packet is received at all and S has some knowledge that T is reachable, S can conclude that P is filtered
– If S has no clue about the reachability status of T, it cannot assume anything about the state of P
• We have implemented two algorithms for inter-domain stealthy port scan detection
– Rank-based SYN (R-SYN) port scan detection
– Line Fitting port scan detection
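The scanner's reasoning about a probed port, as listed above, amounts to a small decision table. The following sketch encodes it; the function name `infer_port_state` and the string labels are illustrative assumptions, and treating a target known to be unreachable the same as one of unknown reachability is a simplification made here.

```python
from typing import Optional


def infer_port_state(response: Optional[str],
                     target_reachable: Optional[bool]) -> str:
    """Return what scanner S can conclude about port P on target T.

    response: "SYN-ACK", "RST-ACK", or None if no packet came back.
    target_reachable: True if S knows T is reachable, None if S has no clue.
    """
    if response == "SYN-ACK":
        return "open"        # S may now send a RST, leaving the connection incomplete
    if response == "RST-ACK":
        return "closed"
    if response is None:
        # Silence is only informative when T is known to be reachable.
        return "filtered" if target_reachable else "unknown"
    raise ValueError(f"unexpected response {response!r}")


print(infer_port_state("SYN-ACK", None))   # open
print(infer_port_state(None, True))        # filtered
print(infer_port_state(None, None))        # unknown
```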
R-SYN algorithm*
• It recognizes half-open connections (HOC)
– Sequence of SYN, ACK, RST packets in the 3-way TCP handshake
  • Normal: (i) SYN, (ii) SYN-ACK, (iii) ACK
  • SYN port scan: (i) SYN, (ii) SYN-ACK, (iii) RST (or nothing)
• It recognizes failed connections (FC)
– Unreachable hosts and closed ports
  • Unreachable hosts: after sending a SYN packet, the sender receives neither a SYN-ACK nor a RST-ACK packet before a timeout expires
  • Closed ports: it looks for RST-ACK reply packets
• For each source IP address x, it maintains the set V(x) of pairs (IP address, TCP port) probed by x
• Using a proper ranking mechanism, it assigns a mark r to each source IP address x
– r(x) = f(HOC(x), FC(x), V(x))
– If r(x) >= predefined threshold, x is a scanner
* L. Aniello, G. Lodi, R. Baldoni, “Inter-Domain Stealthy Port Scan Detection through Complex Event Processing”, to appear in the Proceedings of the 13th European Workshop on Dependable Computing (EWDC 2011), May 11-12, Pisa, Italy
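The bookkeeping behind r(x) can be sketched as follows. The slide does not define the ranking function f, so the weighted sum used here, its weights, the threshold value, and the class name `RSynRanker` are all placeholders chosen for illustration; they should not be read as the published R-SYN formula.

```python
from collections import defaultdict


class RSynRanker:
    """Tracks HOC(x), FC(x) and V(x) per source IP (illustrative sketch)."""

    def __init__(self, threshold=10.0, w_hoc=1.0, w_fc=1.0, w_visited=0.5):
        self.threshold = threshold
        self.w_hoc, self.w_fc, self.w_visited = w_hoc, w_fc, w_visited
        self.hoc = defaultdict(int)       # half-open connections per source
        self.fc = defaultdict(int)        # failed connections per source
        self.visited = defaultdict(set)   # V(x): (dst_ip, dst_port) pairs

    def record(self, src, dst_ip, dst_port, outcome):
        """outcome is 'complete', 'half_open' or 'failed'."""
        if outcome not in ("complete", "half_open", "failed"):
            raise ValueError(f"unknown outcome {outcome!r}")
        self.visited[src].add((dst_ip, dst_port))
        if outcome == "half_open":
            self.hoc[src] += 1
        elif outcome == "failed":
            self.fc[src] += 1

    def rank(self, src):
        # Placeholder for f: a weighted sum, NOT the authors' function.
        return (self.w_hoc * self.hoc.get(src, 0)
                + self.w_fc * self.fc.get(src, 0)
                + self.w_visited * len(self.visited.get(src, ())))

    def is_scanner(self, src):
        return self.rank(src) >= self.threshold


ranker = RSynRanker()
for port in range(20, 28):                 # 8 half-open probes, 8 distinct ports
    ranker.record("198.51.100.7", "10.0.0.5", port, "half_open")
for _ in range(3):                         # a client completing normal handshakes
    ranker.record("192.0.2.44", "10.0.0.5", 443, "complete")
```

With these placeholder weights the probing source reaches rank 12.0 and crosses the threshold, while the normal client stays at 0.5.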
Line Fitting algorithm*
• Underlying principle
– A scanner does not repeatedly perform the same operation towards specific hosts or ports
  • If an attempt fails on a T:P, a scanner is likely to continue its malicious port scan towards different targets
• Line Fitting takes into account the set F{h}, the multiset of failures generated by the source host h
– A normal failure: the set contains few elements with high multiplicity
– A port scan: the set includes many elements with low multiplicity
* L. Aniello, G. A. Di Luna, G. Lodi, R. Baldoni, “A Collaborative Event Processing System for Protection of Critical Infrastructures From Cyber Attacks”, submitted for publication to International Conference, 2011.
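The distinction between the two shapes of F{h} can be made concrete with a multiset. The sketch below only illustrates that distinction: the slides do not describe the actual line-fitting step, so the simple threshold test, its parameter values, and the function names are assumptions made here and stand in for the real algorithm.

```python
from collections import Counter


def failure_profile(failures):
    """Summarize F{h}: number of distinct targets and their mean multiplicity.

    failures: iterable of (dst_ip, dst_port) pairs for one source host h.
    """
    multiset = Counter(failures)
    distinct = len(multiset)
    mean_multiplicity = sum(multiset.values()) / distinct if distinct else 0.0
    return distinct, mean_multiplicity


def looks_like_scan(failures, min_distinct=5, max_mean_multiplicity=2.0):
    """Hypothetical test: many elements, each with low multiplicity."""
    distinct, mean_multiplicity = failure_profile(failures)
    return distinct >= min_distinct and mean_multiplicity <= max_mean_multiplicity


# A client retrying one broken service: one element, multiplicity 12.
retries = [("10.0.0.5", 80)] * 12
# A scanner walking ports 20..29: ten elements, multiplicity 1 each.
scan = [("10.0.0.5", port) for port in range(20, 30)]

print(failure_profile(retries), looks_like_scan(retries))   # (1, 12.0) False
print(failure_profile(scan), looks_like_scan(scan))         # (10, 1.0) True
```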
Implementation in Esper
• Esper uses the so-called Event Processing Language (EPL) for defining continuous queries
– EPL is an SQL extension language
• Example: EPL query for detecting incomplete connections
– We exploit the “pattern” construct of EPL
  • a is the stream of SYN packets, b is the stream of SYN+ACK packets, <c> is a filter for RST packets and <d> is the filter for ACK packets that would correctly complete the 3-way handshake
  • The pattern matches if the packets involved fall within a time window of 61 sec
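The EPL query itself is not reproduced in this text. As a rough stand-in, the logic the pattern expresses (a SYN, then a SYN+ACK, then a RST rather than the completing ACK, all within 61 seconds) can be sketched in plain Python. This is not EPL and not Esper's API; the event layout (dicts with `ts`, `src`, `dst`, `dport`, `kind`, where `src` is always the connection initiator) is an assumption made for this example, and the "no third packet at all" variant, which would need a timer, is not handled.

```python
WINDOW_SECONDS = 61


def find_incomplete_connections(events, window=WINDOW_SECONDS):
    """Return (src, dst, dport) keys whose handshake ended SYN, SYN-ACK, RST."""
    pending = {}       # key -> (timestamp of SYN, whether SYN-ACK was seen)
    incomplete = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        key = (ev["src"], ev["dst"], ev["dport"])
        kind = ev["kind"]
        if kind == "SYN":                     # stream a: start tracking
            pending[key] = (ev["ts"], False)
            continue
        state = pending.get(key)
        if state is None:
            continue
        syn_ts, saw_syn_ack = state
        if ev["ts"] - syn_ts > window:        # outside the time window
            del pending[key]
            continue
        if kind == "SYN-ACK":                 # stream b
            pending[key] = (syn_ts, True)
        elif kind == "ACK" and saw_syn_ack:   # filter d: handshake completed
            del pending[key]
        elif kind == "RST" and saw_syn_ack:   # filter c: incomplete connection
            incomplete.append(key)
            del pending[key]
    return incomplete


def handshake(src, dport, last, t0=0.0, gap=0.1):
    """Build a three-packet exchange ending in `last` (test helper)."""
    base = {"src": src, "dst": "10.0.0.5", "dport": dport}
    kinds = ["SYN", "SYN-ACK", last]
    return [dict(base, ts=t0 + i * gap, kind=k) for i, k in enumerate(kinds)]


trace = handshake("198.51.100.7", 22, "RST") + handshake("192.0.2.44", 443, "ACK")
print(find_incomplete_connections(trace))   # [('198.51.100.7', '10.0.0.5', 22)]
```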
Concluding Remarks
• Collaboration can be beneficial
– In both algorithms, increasing the number of SR members (i.e., increasing the volume of data to be correlated) leads to an increase in the detection rate
– Line Fitting converges more quickly to the highest detection rate than R-SYN
• Esper employs a centralized approach
– It can be useful for small SRs whose members are not geographically dispersed
– It can become a bottleneck with a high number of distributed SR members
• A distributed version may be necessary in order to improve scalability
Design Goals
• Enable processing for detecting cyber attacks
• Ensure privacy/confidentiality for locally generated data
– Sensitive information must not be exposed for global correlation
• Support for diverse types of input data
– E.g., real-time and long-lived historical data
• Easy to use, built using off-the-shelf components
• Scalable performance
Scaling Challenges
• Large numbers of participating sites
• Possibly wide distribution
• Massive volumes of events
• High rates of incoming event traffic
Agilis Architecture
[Figure: Agilis architecture. Labels: Jaql Front-End; Jaql Query; Query Scheduler; Map-Reduce (Hadoop) with Job Tracker and Task Trackers; re-defined InputFormat and OutputFormat; HDFS Adapter and WXS Adapter; Distributed File System (HDFS); Distributed In-Memory Store (WXS) with Cat 1 and Cat 2; Agilis Site 1, Agilis Site 2, ..., Agilis Site N, each with a Gateway (Raw Data, Anonymizing, Data Pre-Processing) and a Storage container.]
Locality-Aware Collaboration
• Map tasks are mutually independent and can run in parallel at each site
• Input data is partitioned among the sites
– Each partition is mapped into an input split
– Map tasks are collocated with their input splits
• Simple queries are delegated to the SQL engine embedded in data containers
– Select, project, and aggregate
Improves scalability by reducing the amount of data requiring global correlation
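The effect of running select, project, and aggregate next to the data can be shown with a toy example. The sketch below uses plain Python in place of Hadoop map tasks and the embedded SQL engine; the row layout and function names are assumptions made for illustration, and the map tasks run sequentially here although they are independent and could run in parallel, one per site.

```python
def local_select_project_aggregate(partition):
    """Stand-in for the simple query run inside one site's data container."""
    counts = {}
    for row in partition:
        if row["outcome"] != "complete":            # select suspicious rows
            src = row["src"]                        # project onto source IP
            counts[src] = counts.get(src, 0) + 1    # aggregate per source
    return counts


def run_map_phase(partitions):
    """One map task per input split, collocated with its partition."""
    return [local_select_project_aggregate(p) for p in partitions]


site_a = [
    {"src": "1.1.1.1", "outcome": "failed"},
    {"src": "1.1.1.1", "outcome": "incomplete"},
    {"src": "2.2.2.2", "outcome": "complete"},
]
site_b = [
    {"src": "1.1.1.1", "outcome": "failed"},
    {"src": "3.3.3.3", "outcome": "complete"},
    {"src": "3.3.3.3", "outcome": "failed"},
]

summaries = run_map_phase([site_a, site_b])
raw_rows = len(site_a) + len(site_b)
shipped_records = sum(len(s) for s in summaries)
print(raw_rows, shipped_records)   # 6 3
```

Only the per-source summaries leave each site, so the global correlation step sees three records instead of six raw rows, and no raw event is exposed.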
Processing Flow
[Figure: processing flow. Labels: TCPDump inputs (one per site); Preprocessing (one per site); XS Part 1, XS Part 2, ..., XS Part N; Normalized Data: [LogEvent]*; Parallelized Map/Reduce Jobs; Summarization; Summarized Data: [SourceIP, rNum, pNum]*; Ranking; Summarized Data: [SourceIP, rank]*; Calibration; Historical Safety Ranking; Blacklisting; Black List.]
Detection Algorithm (R-SYN)
• Recognizes two probe patterns
– Incomplete connection
– Failed connection
• Logs unique tuples
– (sourceIP, destIP, destPort)
• Ranking
– Rank(sourceIP) = F(#incomplete, #failed)
R-SYN Implementation in Agilis
• Gateway identifies incomplete and failed connections, and maintains tuples (IP, #incomplete, #failed)
• Jaql query for global correlation:
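The Jaql query is not reproduced in this text. As a hedged stand-in, the global correlation over the gateway tuples can be sketched in Python as a group-by on the source IP followed by a ranking and a threshold. This is not Jaql; the ranking function (a plain sum of the two counters), the threshold value, and the function name `correlate` are assumptions, since F is not specified in the slides.

```python
from collections import defaultdict


def correlate(gateway_tuples, threshold=5):
    """Merge (ip, n_incomplete, n_failed) tuples from all sites.

    Returns the per-IP rank and the sorted list of blacklisted IPs.
    """
    totals = defaultdict(lambda: [0, 0])
    for ip, n_incomplete, n_failed in gateway_tuples:
        totals[ip][0] += n_incomplete
        totals[ip][1] += n_failed
    # Placeholder for F(#incomplete, #failed): a plain sum.
    ranks = {ip: inc + failed for ip, (inc, failed) in totals.items()}
    blacklist = sorted(ip for ip, rank in ranks.items() if rank >= threshold)
    return ranks, blacklist


# Tuples as three gateways might report them for the same time interval.
reported = [
    ("198.51.100.7", 2, 1),   # site 1
    ("198.51.100.7", 1, 2),   # site 2
    ("203.0.113.9", 0, 1),    # site 3
]
ranks, blacklist = correlate(reported)
print(ranks)       # {'198.51.100.7': 6, '203.0.113.9': 1}
print(blacklist)   # ['198.51.100.7']
```

Neither site alone sees enough activity from 198.51.100.7 to reach the threshold; only the combined view does, which is the benefit of correlating across sites.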
Latency Evaluation*
• 287MB intrusion trace from http://www.itoc.usma.edu/research/dataset/index.html
• Varied WAN link bandwidth
– WANem simulator
• Comparison against centralized event processing (Esper)
• Partitioned across 6 Agilis sites
– Each site simulated by a Linux VM
* L. Aniello, R. Baldoni, G. Chockler, G. Laventman, G. Lodi, Y. Vigfusson, “Agilis: An Internet-Scale Distributed Event Processing System for Collaborative Detection of Cyber Attacks”, submitted for publication to International conference, 2011
Conclusions
• Agilis: Distributed platform for collaborative detection of cyber attacks
• Performance studies show scalability and highlight benefits of collaboration
– Port scan detection and botnet identification
• Ongoing work: improvements to the infrastructure
– Replacing WXS with open source RAM-based store, persistent & data-driven queries
• Evaluation on realistic data
UML modeling
• UML
– Standardized, general-purpose modeling language
• Modeling the overall CoMiFin middleware
– Common understanding
– Coherent models
• Create
– High-level
– Technology-independent models
– Following the MDA approach
• Main packages of the model
– Actors
– Components
– Common Data Types
– Diagrams
Diagrams
• Use Case diagrams
• High-level Component diagram
• Detailed component diagrams
• Class diagrams
• Data models
• Sequence diagrams
• Deployment diagrams
• Covering all parts of the CoMiFin prototype
– 123 diagrams, 1518 modeling elements, 223 classes, 49 components, 89 sequence diagrams, ...
• Example diagrams: SR Gateway component
Nov. 2010 attack – selected actions
Event #  Day #  Time   Event
1        1      14:48  Bank 1 is notified about infections
3        1      16:05  Logon attempt from UK IP
4        1      16:35  Bank 2 sends Bank 1 link to drop site
5        2      09:00  Bank 1 analyzes the information received from Bank 2
6        2      09:10  Bank 1 comes across login information of customers of Bank 3, and duly warns Bank 3.
16       3      13:04  Bank 1 analyzes config file of the infection that Bank 1 has received from Bank 2.
17       3      18:45  Customer records are collected from drop site
20       3      20:56  Analysis of config file reveals how the customer may recognize if the PC is infected.
26       4      09:10  The certificates of compromised customers are revoked.
29       4      09:16  The recent transaction history of compromised customers is analyzed.
37       4      12:38  The Financial Supervisory Authority of Norway is notified of the attack.
45       4      13:04  All certificates of compromised customers are revoked.
47       4      13:10  There is a successful logon from a PC in UK.
48       4      13:43  The infected PCs of compromised customers are collected.
53       4      14:10  There are telephone calls with the cyber police.
78       7      10:55  Bank 1 receives samples of the Zeus virus from the cyber police.
81       7      12:02  Discussions with the cyber police about how the Zeus virus works.
104      8      09:21  New “stolen” login credentials are posted to drop site.
…        …      …      …
Lessons learnt
• Banks today exchange information about incidents in an informal and accidental way
• The way banks today exchange information about incidents does not scale
• The cooperation between the banks, and between the banks and the cyber police, seems informal and based more or less on good will. One cannot help thinking that a more formal cooperation and exchange of information, made possible by CoMiFin, might further benefit the parties involved. Countermeasures might then be taken at an earlier stage, reducing the detrimental consequences of attacks and protecting society as a whole against attacks.
Which are the events to look for? During log-in, the Trojan presents a false login-page. When
this happens, one bank had certain log-entries posted to the
bank’s web-server log. The bank identified the signature of
the log entry and developed an automated process that
recognizes the signature, stops the login and also
automatically and instantly revokes the login-credentials of
the customer. Sharing this information allowed other banks
to implement similar countermeasures.
Which are the events to look for?
Analysis revealed that at the time of writing this, the Trojan
is very context sensitive, i.e. a small change to the web pages of
the internet bank would fool the Trojan. This suggests that
using random names for JavaScript functions and CSS
class names would make the Trojan not recognize where to
inject the malicious code and probably render the Trojan
useless or less effective.
Which are the events to look for?
One important quality of a Trojan is its ability to hide itself.
Encryption, hashing and other techniques are being used
for this. If the obfuscation techniques are elaborate it takes
resources and expertise to break them. Together the banks
were able to “undress” the Trojan effectively. Once
“undressed”, the banks were able to devise effective
countermeasures.
Which are the events to look for?
The national CERT (NSM) did a comprehensive technical
analysis of SpyEye to be shared by all FIs. During this they
discovered the IP of the Command and Control Center
(CC). By contacting the Norwegian ISPs, they were able to
monitor the amount of traffic going from Norwegian
customers to the CC. This way they got a picture of the
severity of the attack. Also steps could be taken to close
down the attack sites.
Supporting documentation
CIFAS, the UK's Fraud Prevention Service, has released Fraudscape, a report detailing the frauds recorded by the 265 CIFAS Members during 2009. According to CIFAS,
"the findings presented in Fraudscape, however, clearly demonstrates the benefits that mutual collaboration brings. By sharing knowledge and pooling resources, CIFAS Members have prevented millions of pounds of fraud year after year and also increased the knowledge of the methods used to defraud businesses, consumers and society equally. This approach can only bring further benefits if further cooperation and responsible data-sharing takes place across all sections of society".