
[email protected]

DepartmentofInformaticsEngineeringUniversityofCoimbra- Portugal

BENCHMARKINGTHE SECURITY OF SOFTWARE SYSTEMS OR

TO BENCHMARK OR NOT TO BENCHMARK

QRS 2018Lisbon, PortugalJuly 19th, 2018


BENCHMARKING

Assessing and comparing computer systems and/or components according to specific quality attributes

§ Performance benchmarking
– Well established both in terms of research and application
– Supported by organizations like TPC and SPEC
– Mostly for marketing

§ Dependability benchmarking
– Well established from a research perspective
– No endorsement from the industry


BENCHMARKING

Assessing and comparing computer systems and/or components according to specific quality attributes

§ Security benchmarking
– Several works can be found
– No common approach available yet

[Timeline figure, 1972-2017: release of commercial performance benchmarks (Whetstone, Wisconsin Bench, TP1, DebitCredit, TPC & SPEC, EMBC), followed by research projects on dependability & security benchmarks (SIGDeB, Orange Book, Common Criteria, CIS); marked years include 1972, 1983, 1985, 1987, 1988, 1999, 2000, and 2017]


OUTLINE

§ The past: Performance & Dependability Benchmarking

§ The present: Security Benchmarking

§ Benchmarking the Security of Systems
– Approach: Qualification + Trustworthiness Assessment
– Example: Benchmarking Web Service Frameworks

§ Benchmarking Security Tools
– Approach: Vulnerability and Attack Injection
– Example: Benchmarking Intrusion Detection Systems

§ Challenges and Conclusions


PERFORMANCE BENCHMARKING

Assessing and comparing computer systems and/or components in terms of performance


PERFORMANCE BENCHMARKING

[Diagram: a workload is applied to the SUB (system under benchmarking), producing metrics]

§ Workload:
– Set of representative operations

§ Metrics (see the sketch below):
– Throughput
– Response time
– Latency
– …
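A minimal sketch of what such a benchmark harness might look like; the SUB operation and workload here are placeholders, not anything from the deck. The same skeleton is reused in the dependability sketch further below.

```python
import time
import statistics

def run_benchmark(sub_operation, workload, warmup=100):
    """Apply a workload to the SUB and compute basic performance metrics."""
    # Warm-up phase: results are discarded so the SUB reaches a steady state.
    for op in workload[:warmup]:
        sub_operation(op)

    latencies = []
    start = time.perf_counter()
    for op in workload[warmup:]:
        t0 = time.perf_counter()
        sub_operation(op)  # one representative operation against the SUB
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    return {
        "throughput_ops_per_s": len(latencies) / elapsed,
        "mean_response_time_s": statistics.mean(latencies),
        "p95_response_time_s": statistics.quantiles(latencies, n=20)[18],
    }
```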


TPC-C (1992)

§ Workload:
– Database transactions

§ Metrics:
– Transaction rate (tpmC)
– Price per transaction ($/tpmC)

Although some integrity tests are performed, it assumes that nothing fails

[Diagram: a workload is applied to the DBMS, producing metrics]


DEPENDABILITY BENCHMARKING

Assessing and comparing computer systems and/or components considering dependability attributes


DEPENDABILITY BENCHMARKING

[Diagram: a workload and a faultload are applied to the SUB, producing experimental metrics; these feed models, parameterized with fault rates, MTBF, etc., to produce unconditional metrics]

§ Faultload:
– Set of representative faults, injected into the system

§ Metrics (see the sketch below):
– Performance and/or dependability
• Both baseline and in the presence of faults
– Unconditional and/or direct
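Reusing run_benchmark from the performance sketch above, a dependability benchmark might compare the same experimental metrics with and without a faultload. The injection hook, fault set, and injection probability below are illustrative assumptions, not the deck's method:

```python
import random

def run_with_faultload(sub_operation, workload, faults, inject_fault, p_inject=0.01):
    """Run the workload twice: once as a baseline, once under fault injection."""
    baseline = run_benchmark(sub_operation, workload)  # metrics without faults

    def faulty_operation(op):
        # With probability p_inject, inject one representative fault
        # (e.g., an operator mistake or a HW component failure) before the call.
        if faults and random.random() < p_inject:
            inject_fault(random.choice(faults))
        sub_operation(op)

    with_faults = run_benchmark(faulty_operation, workload)
    return {"baseline": baseline, "with_faults": with_faults}
```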


DBENCH-OLTP (2005)

§ Workload:
– TPC-C transactions

§ Faultload:
– Operator faults + software faults + HW component failures

§ Metrics:
– Performance: tpmC, $/tpmC, Tf, $/Tf
– Dependability: Ne, AvtS, AvtC

[Diagram: a workload and a faultload are applied to the SUB, producing experimental metrics]


DBENCH-OLTP (2005)

Faultload: operator faults


DBENCH-OLTP (2005)

[Charts for systems A through K: baseline performance (tpmC and $/tpmC), performance with faults (Tf and $/Tf), and availability in % (AvtS for the server, AvtC for the clients)]

Does not take into account malicious behaviors (faults = vulnerability + attack)


SECURITY BENCHMARKING

Assessing and comparing computer systems and/or components considering security aspects

§ Benchmarking the Security of Systems/Components
– Systems that should implement security requirements
– OS, middleware, server software, etc.

§ Benchmarking Security Tools
– Tools used to improve the security of systems
– Penetration testers, static analyzers, IDS, etc.


BENCHMARKING SECURITY OF SYSTEMS

§ Attackload:
– Representative attacks

§ Metrics:
– Performance + dependability
– Security (e.g., number of vulnerabilities, attack detection)

[Diagram: a workload and an attackload are applied to the SUB, producing experimental metrics; these feed models, parameterized with vulnerability exposure, mean time between attacks, etc., to produce unconditional metrics]

Attacking what? Do we know the vulnerabilities? What are representative attacks?

This does not work if one wants to benchmark how secure different systems are!

E.g., does the number of vulnerabilities of a system represent anything?


A DIFFERENT APPROACH…

[Diagram: SUBs go through security qualification; unacceptable SUBs get security = 0]

§ Security Qualification:
– Apply state-of-the-art techniques and tools to detect vulnerabilities
– SUBs with vulnerabilities are:
• Disqualified!
• Or the vulnerabilities are fixed…


A DIFFERENT APPROACH…

[Diagram: SUBs go through security qualification; unacceptable SUBs get security = 0, while acceptable SUBs go through trustworthiness assessment, producing metrics]

§ Trustworthiness Assessment:
– Gather evidence on how much one can trust
– E.g., best coding practices, development process, bad smells


A DIFFERENT APPROACH…

§ Metrics:
– Portray trust from a user perspective
– Dynamic: may change over time
– Depend on the type of evidence gathered
– Different metrics for different attack vectors

[Diagram: security qualification followed by trustworthiness assessment, as in the previous slides]


EXAMPLE: WEB SERVICE FRAMEWORKS

[Diagram: WSFs go through qualification (testing); unacceptable WSFs get security = 0, while acceptable WSFs go through assessment (CPU + memory), producing a trustworthiness score]

§ Qualification:
– DoS attacks
– Coercive parsing, malformed XML, malicious attachment, etc.

§ Trustworthiness Assessment:
– Quality model to compute a score (see the sketch below)
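The deck does not spell the quality model out. As a hedged illustration, one simple form is a weighted average of normalized evidence; every evidence name, value, and weight below is hypothetical:

```python
def trustworthiness_score(evidence, weights):
    """Aggregate normalized evidence values (0..1) into a single score
    using a weighted average: one very simple form of quality model."""
    total = sum(weights.values())
    return sum(evidence[name] * w for name, w in weights.items()) / total

# Hypothetical evidence for one web service framework (higher is better):
evidence = {
    "coding_best_practices": 0.8,  # e.g., share of best-practice rules followed
    "development_process": 0.6,    # e.g., process maturity assessment
    "code_smells": 0.9,            # e.g., 1 - smell density
    "resource_usage": 0.7,         # e.g., CPU + memory behavior under load
}
weights = {
    "coding_best_practices": 3,
    "development_process": 2,
    "code_smells": 2,
    "resource_usage": 1,
}

print(f"Trustworthiness score: {trustworthiness_score(evidence, weights):.2f}")
```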


QUALITY MODEL


SYSTEMS UNDER BENCHMARKING


TRUSTWORTHINESS RESULTS


BENCHMARKING SECURITY TOOLS

§ Faultload:
– Vulnerabilities are injected
– Attacks target the injected vulnerabilities

§ Data can be collected for benchmarking security tools
– Penetration testers, static analyzers, IDS, etc.

[Diagram: a workload and a faultload (vulnerabilities + attacks) are applied to the SUB; a security tool observes the SUB, and the resulting data yields experimental metrics]


VULNERABILITY AND ATTACK INJECTION


EXAMPLE: BENCHMARKING IDS

§ Security requires a defense-in-depth approach:
– Coding best practices
– Testing
– Static analysis
– …

§ Vulnerability-free code is hard (or even impossible) to achieve...

§ Intrusion detection tools support a post-deployment approach
– For protecting against known and unknown attacks


EVALUATION APPROACH


EXAMPLES OF VULNERABILITIES INJECTED

| Original PHP code | Code with injected vulnerability | Operation performed |
|---|---|---|
| $id = intval($_GET['id']); | $id = $_GET['id']; | Removed the "intval" function, allowing non-numeric values (i.e., SQL commands) in the "$id" variable |
| $page = urlencode($page); | $page = $page; | Removed the "urlencode" function, allowing non-encoded values (i.e., SQL commands) in the "$page" variable |
| … | … | … |
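A minimal sketch of how an injector might apply the first mutation from the table to PHP source. The regex-based approach is an illustrative assumption, not necessarily how the actual injection tool works:

```python
import re

# Mutation operator: drop the intval() sanitization around a GET parameter,
# turning $id = intval($_GET['id']); into $id = $_GET['id'];
INTVAL_PATTERN = re.compile(r"intval\((\$_GET\[[^\]]+\])\)")

def inject_vulnerability(php_source):
    """Return (mutated_source, n_injected) with intval() sanitization removed."""
    return INTVAL_PATTERN.subn(r"\1", php_source)

mutated, count = inject_vulnerability("$id = intval($_GET['id']);")
print(mutated)  # $id = $_GET['id'];  (now accepts arbitrary strings)
```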


EXAMPLES OF ATTACKS

| Attack payload | Expected result |
|---|---|
| ' | Modifies the structure of the query; usually results in an error |
| or 1=1 | Modifies the structure of the query; overrides the query restrictions by adding a statement that is always true |
| ' or 'a'='a | Modifies the structure of the query; overrides the query restrictions by adding a statement that is always true |
| +connection_id()-connection_id() | Modifies the query result to 0 |
| +1-1 | Modifies the query result to 0 |
| +67-ASCII('A') | Modifies the query result to 0 |
| +51-ASCII(1) | Modifies the query result to 0 |
| … | … |
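For illustration only, a small driver could replay such payloads against one parameter of a web application. This sketch assumes the third-party requests library; the target URL, parameter name, and the crude "suspicious response" oracle are all hypothetical (a real benchmark would use the injected vulnerabilities as ground truth instead):

```python
import requests  # third-party HTTP client, assumed to be installed

PAYLOADS = ["'", " or 1=1", "' or 'a'='a", "+connection_id()-connection_id()", "+1-1"]

def probe(url, param):
    """Send each payload as part of one parameter value and flag responses
    that hint at a successful injection (server error or leaked SQL error)."""
    for payload in PAYLOADS:
        resp = requests.get(url, params={param: "1" + payload}, timeout=10)
        suspicious = resp.status_code >= 500 or "SQL" in resp.text
        print(f"{payload!r:40} -> {resp.status_code} suspicious={suspicious}")

# probe("http://target.example/page.php", "id")  # hypothetical target
```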


SYSTEMS UNDER BENCHMARKING

| Tool | Architectural level monitored | Detection approach | Data source | Known technology limitations |
|---|---|---|---|---|
| ACD | Application | Anomaly based | Apache log | Only GET method |
| Apache Scalp | Application | Signature based | Apache log | Only GET method |
| ModSecurity | Application | Signature based | HTTP traffic | - |
| Snort (v2.8 and v2.9) | Network | Signature based | Network traffic | - |
| GreenSQL | Database | Signature based | SQL proxy traffic | MySQL data |
| DB IDS | Database | Anomaly based | SQL sniffer traffic | MySQL and Oracle data |


EXPERIMENTAL SETUP


MAIN RESULTS

| Lvl | Tool | P | N | Pop | TP | TN | FN | FP | Prec. | Recall | Mark. | Infor. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| App | ACD | 1051 | 224 | 1275 | 376 | 174 | 675 | 50 | 0.883 | 0.358 | 0.088 | 0.135 |
| App | Apache Scalp | 1051 | 224 | 1275 | 206 | 224 | 845 | 0 | 1.000 | 0.196 | 0.210 | 0.196 |
| App | ModSecurity | 826 | 225 | 1051 | 236 | 225 | 590 | 0 | 1.000 | 0.286 | 0.276 | 0.286 |
| Net | Snort 2.8 | 458 | 817 | 1275 | 0 | 817 | 458 | 0 | - | 0.000 | - | 0.000 |
| Net | Snort 2.9 | 173 | 878 | 1051 | 0 | 878 | 173 | 0 | - | 0.000 | - | 0.000 |
| DB | GreenSQL | 458 | 817 | 1275 | 244 | 813 | 214 | 4 | 0.984 | 0.533 | 0.775 | 0.528 |
| DB | DB IDS | 458 | 817 | 1275 | 451 | 384 | 7 | 433 | 0.510 | 0.985 | 0.492 | 0.455 |

(P = attacks, N = legitimate operations, Pop = P + N)
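The metric columns follow the standard confusion-matrix definitions: precision = TP/(TP+FP), recall (TPR) = TP/(TP+FN), markedness = PPV + NPV - 1, and informedness = TPR + TNR - 1. A small sketch, checked against the ACD row:

```python
def ids_metrics(tp, tn, fn, fp):
    """Precision, recall, markedness, and informedness from a confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else None  # undefined if nothing is reported
    recall = tp / (tp + fn)                          # true positive rate (TPR)
    tnr = tn / (tn + fp)                             # true negative rate
    npv = tn / (tn + fn) if tn + fn else None        # negative predictive value
    informedness = recall + tnr - 1
    markedness = precision + npv - 1 if precision is not None else None
    return precision, recall, markedness, informedness

# ACD row: TP=376, TN=174, FN=675, FP=50
print(ids_metrics(376, 174, 675, 50))
# approximately (0.883, 0.358, 0.088, 0.135), matching the table
```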


WHAT IS WRONG?

§ Established benchmarks are mostly for marketing!

§ Strict benchmarking conditions:
– Fixed workload & faultload + small set of metrics

§ Workload & faultload:
– May not be representative of the user scenario

§ Metrics:
– Fixed! May not satisfy the user needs
– Decision based on several metrics is difficult!

No security benchmark is endorsed by any organization or industry


FIXED!

§ Example:
– Benchmarking vulnerability detection tools
– Typical metric: F-measure
– Is this good in all scenarios? (see the sketch below)
• Business critical: recall
• Best effort: F-measure
• Minimum effort: markedness

[Diagram: an activation is applied to the SUB, producing a fixed set of metrics]
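A sketch of scenario-dependent metric selection, using the mapping from the bullets above; the scenario names come from the slide, while the function itself and its inputs are illustrative:

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure: weighted harmonic combination of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def scenario_score(scenario, precision, recall, npv):
    """Pick the metric that matches the usage scenario."""
    if scenario == "business_critical":
        return recall                    # missing a vulnerability is what hurts
    if scenario == "best_effort":
        return f_measure(precision, recall)
    if scenario == "minimum_effort":
        return precision + npv - 1       # markedness: trust in what is reported
    raise ValueError(f"unknown scenario: {scenario}")

print(scenario_score("business_critical", precision=0.9, recall=0.4, npv=0.8))  # 0.4
```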


A POTENTIAL APPROACH…

§ Benchmarking conditions adaptable to the user needs

§ Include multiple usage scenarios:
– Metrics depend on the scenario
– Adaptable workload and faultload

§ Use quality models instead of independent metrics
– Quality models should also adapt to the scenario


SCENARIOS AND QUALITY MODELS

How to define scenarios? How to define quality models? How to adapt workloads and faultloads to the scenarios?


CHALLENGES

§ Satisfy industry requirements:
– Representativeness, portability, scalability, non-intrusiveness, low cost, …
– Prevent "gaming"

§ Satisfy user requirements:
– Representativeness, usefulness, simplicity of use, …
– Adaptable: allow "gaming"

§ Endorsement by TPC, SPEC, …
– How to?


IS THERE A FUTURE?

§ Resilience Benchmarking
– Assess and compare the behavior of components and computer systems when subjected to changes
– Which resilience metrics?
• Comparable, consistent, understandable, meaningful, …
– Changeloads:
• Representative, practical, portable, …

§ Trustworthiness Benchmarking
– What evidence to collect?
– What metrics?
– Dynamicity of perception… social trust…


CONCLUSIONS

§ The benchmarking concept is well established!

§ Acceptance by "big" industry depends on perceived utility for marketing

§ Acceptance by users requires "adaptability"

§ From a research perspective, performance and dependability benchmarking are well known

§ Security benchmarking approaches are weak

§ New types of benchmarks will bring additional challenges!


QUESTIONS?

Marco Vieira
Department of Informatics Engineering
University of Coimbra
[email protected]

http://eden.dei.uc.pt/~mvieira

