
Shouling Ji, Shukun Yang, and Raheem Beyah
Georgia Institute of Technology

Ting Wang
Lehigh University

Changchang Liu and Wei-Han Lee
Princeton University

PARS: A Uniform and Open-source Password Analysis and Research System

INTRODUCTION

• People choose simple passwords.
  • password
  • 123456
  • 111111
  • iloveyou

• People reuse passwords.
  • On average a user has 6-7 passwords and maintains 25 distinct online accounts. (Florencio et al. [WWW’07][1])

INTRODUCTION

Over the past decade, hundreds of millions of passwords have been leaked.

INTRODUCTION

Why do we worry about leaked passwords?

They can be used to crack other password datasets!

INTRODUCTION

• Text-based passwords still dominate computer system authentication

  • Usability: Personal enough for users to remember
  • Security: Difficult enough for outsiders to guess/access

• Password datasets have been leaked
  • English: LinkedIn, Rockyou, eHarmony, …
  • Chinese: Tianya, CSDN, 7k7k, Duduniu, …
  • German: Gamigo

• Password research has made considerable progress
  • Password Cracking: Markov-based, Structure-based, Dictionary, Rainbow Table, …
  • Password Strength Meter: NIST, Ideal, Markov-based, Structure-based, …
  • Password Management, Measurement, Alternatives, …

RELATED WORK

• Password Cracking

• Algorithms aim to reduce the size of the password search space and to enumerate passwords in decreasing order of likelihood.

• Use expired and reused passwords as training information to create guesses.

• Password Measurement

• Correlations between demographic and behavioral factors have been found. Regional differences result in various password patterns.

• NIST entropy and other traditional password metrics have been found ineffective. Using cracking models to build sophisticated meters has been shown to be more effective.

• Different strength metrics and meters give inconsistent feedback on password strength.

RELATED WORK

Questions:

Many password cracking algorithms have performed reasonably well.

But which one is the most effective?

Many websites have password policies and strength meters.

But are they helpful?

Many password datasets have been leaked and published.

But do they affect the security of other datasets?

OUR CONTRIBUTIONS

• A uniform and comprehensive Password Analysis and Research System - PARS (open-source project)

• Large-scale Password Security Measurement and Analysis

• Future Research Insights: Correlation, Hybrid Cracking, Relative Improvement Ratio, Diversity, …

OUTLINE

• PARS Overview
• Datasets Analysis
• Password Cracking Models
• Hybrid Cracking Feasibility Analysis
• Password Measurement Models – Academic
• Password Measurement Models – Commercial
• Future Research Insights

PARS OVERVIEW

Available in PARS:

• Cracking Module - Attack

• Measurement Module - Defend

• Utility Module - Analyze

[Figure: PARS module diagram. Labels: Dataset Analysis (145M); Strength Metrics (15); IR, …; Academic Meters (8); Commercial Meters (15)]

PARS OVERVIEW

Cracking Module:

• 12 state-of-the-art cracking algorithms

Measurement Module:

• 15 intra-site and cross-site password strength metrics

• 8 academic password meters

• 15/24 commercial password meters from top-150 websites ranked by Alexa.com


PARS OVERVIEW

Utility Module:

• Data Analysis
  • 8 Datasets of Leaked Passwords
  • Data Processing Unit

• Tools
  • Hashing (MD5)
  • Preprocessing of data
  • Statistical Analysis

• Insights and Future Research
  • Hybrid Password Cracking
  • RIR Metrics
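The hashing tool listed above can be illustrated with a minimal sketch. It assumes unsalted MD5 over UTF-8 bytes; the helper names `md5_hex` and `build_lookup` are hypothetical and are not taken from PARS.

```python
import hashlib

def md5_hex(password: str) -> str:
    """Return the unsalted MD5 digest of a password as lowercase hex."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def build_lookup(candidates):
    """Map digest -> plaintext so hashed entries can be matched to guesses."""
    return {md5_hex(p): p for p in candidates}

# Hypothetical candidate list, taken from the simple passwords named earlier.
lookup = build_lookup(["password", "123456", "iloveyou"])
```

A hashed dataset entry is then resolved with a plain dictionary lookup such as `lookup.get(digest)`, which returns `None` for digests no candidate produces.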


PARS OVERVIEW

• Command Line Mode
  • Easy and Fast Scripting
  • No need to learn about using individual algorithms
  • Carefully aligned outputs

• GUI Mode
  • Delicate design that is user-friendly
  • Support for Commercial Meters Evaluation
  • Visualization of Outputs


DATASETS ANALYSIS

Ethics: All datasets were once publicly available and are used for research purposes only.

Contains username, email corresponding to each password

DATASETS ANALYSIS

• Password classifications
  • Lengths: <=6, 7, …, 14, >=15
  • Compositions: Univariate, Bivariate, Trivariate, Qualvariate
    e.g., password123 -> bivariate; password123!@# -> trivariate
  • Structure: LD, L, D, DL, LDL, UD, U, ULD, DLD, LDLD, other
    e.g., password123 -> LD; PAssword123 -> ULD
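The three classifications can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PARS code: the character classes are assumed to be lowercase (L), uppercase (U), digit (D), and everything else (S, a label introduced here), and composition is assumed to count how many of those classes appear.

```python
from itertools import groupby

def char_class(ch: str) -> str:
    """Map a character to L (lowercase), U (uppercase), D (digit) or S (other)."""
    if ch.islower():
        return "L"
    if ch.isupper():
        return "U"
    if ch.isdigit():
        return "D"
    return "S"  # assumed label for symbols; the slide files such cases under "other"

def structure(password: str) -> str:
    """Collapse runs of the same class into one letter each."""
    return "".join(cls for cls, _ in groupby(map(char_class, password)))

# Names follow the slide's spelling.
COMPOSITION_NAMES = {1: "univariate", 2: "bivariate", 3: "trivariate", 4: "qualvariate"}

def composition(password: str) -> str:
    """Name the number of distinct character classes used (assumed definition)."""
    classes = {char_class(ch) for ch in password}
    return COMPOSITION_NAMES.get(len(classes), "empty")

def length_bucket(password: str) -> str:
    """Bucket lengths as on the slide: <=6, 7, ..., 14, >=15."""
    n = len(password)
    if n <= 6:
        return "<=6"
    if n >= 15:
        return ">=15"
    return str(n)
```

With these definitions the slide's own examples are reproduced: `structure("password123")` gives `LD`, `structure("PAssword123")` gives `ULD`, and `composition("password123!@#")` gives `trivariate`.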

DATASETS ANALYSIS

• Standard Datasets

• 2 million randomly sampled passwords from each of the 8 datasets to ensure fair evaluation

• 8 standard datasets for all evaluations and tests beyond this point
  • 7k7k
  • CSDN
  • Duduniu
  • Renren
  • Tianya
  • LinkedIn
  • Rockyou
  • Gamigo
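Drawing a fixed-size uniform sample from each leaked dataset can be done in one pass without loading the whole file. The sketch below uses reservoir sampling (Algorithm R). It is one possible way to build such standard datasets, not necessarily how PARS does it, and the function name and seed are hypothetical.

```python
import random

def reservoir_sample(passwords, k, seed=0):
    """Return k items drawn uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for i, pw in enumerate(passwords):
        if i < k:
            sample.append(pw)          # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                sample[j] = pw         # replace with probability k / (i + 1)
    return sample
```

For the setting on this slide, `k` would be 2,000,000 and `passwords` would be a generator over the lines of one dataset file.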

PASSWORD CRACKING MODELS

• John the Ripper [2]

• A popular community cracking software
  • Bleeding-Jumbo is an open-source community version

• Contains 4 popular modes
  • Single (social profile information)
  • Wordlist (input dictionary/wordlist)
  • Incremental (smart brute-force)
  • Markov (Markov chain/training data)

• HashCat (v0.50)

• A popular community cracking software

• Contains 4 popular modes
  • Brute-force Mode
  • Dictionary Attack
  • Mask Attack
  • Permutation Attack, etc.

• Probabilistic Context Free Grammars

• Pcfg Manager (PCFG)

Weir et al. “Password Cracking Using Probabilistic Context-Free Grammars”, S&P 2009.

• Semantic Guesser (VCT)

Veras et al. “On the Semantic Patterns of Passwords and their Security Impact”, NDSS 2014.
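The structure-based idea behind PCFG cracking can be sketched as follows. This is a deliberately simplified illustration, not the Pcfg Manager implementation: it learns base structures such as L8 D3 and the strings that fill each segment from training data alone, whereas the original approach takes letter strings from an input dictionary. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import groupby

def segments(password):
    """Split into (tag, text) runs, e.g. 'abc12!' -> [('L3','abc'), ('D2','12'), ('S1','!')]."""
    def cls(ch):
        return "L" if ch.isalpha() else "D" if ch.isdigit() else "S"
    out = []
    for c, run in groupby(password, key=cls):
        text = "".join(run)
        out.append((f"{c}{len(text)}", text))
    return out

def train(passwords):
    """Count base structures and the terminal strings seen for each segment tag."""
    structures = Counter()
    terminals = defaultdict(Counter)
    for pw in passwords:
        segs = segments(pw)
        structures[tuple(tag for tag, _ in segs)] += 1
        for tag, text in segs:
            terminals[tag][text] += 1
    return structures, terminals

def probability(pw, structures, terminals):
    """P(structure) times the product of P(terminal | tag); 0.0 for unseen structures."""
    segs = segments(pw)
    key = tuple(tag for tag, _ in segs)
    total = sum(structures.values())
    if total == 0 or structures[key] == 0:
        return 0.0
    p = structures[key] / total
    for tag, text in segs:
        p *= terminals[tag][text] / sum(terminals[tag].values())
    return p

# Toy training set (hypothetical data).
structures, terminals = train(["password123", "iloveyou12", "password12", "abc123"])
```

Note that `iloveyou123` never appears in the toy training set, yet it receives a non-zero probability because its structure and both of its segments were observed separately. A cracker would enumerate such combinations in decreasing order of this probability.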

• Markov Models

• Fast Dictionary Attack

• Narayanan et al. “Fast dictionary attacks on passwords using time-space tradeoff”, CCS 2005

• N-gram

• Ur et al. “How does your password measure up? The effect of strength meters on password creation”, USENIX 2012

• OMEN and OMEN+

• Dürmuth et al. “Leveraging personal information for password cracking”, CoRR 2013
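What the Markov-based models above share is a character-level n-gram estimate of how likely a password is. The sketch below is a generic n-gram scorer under stated assumptions (add-one smoothing, an assumed alphabet of 95 printable ASCII characters plus an end marker). It is not the code of any of the cited tools, and the names are hypothetical.

```python
import math
from collections import Counter, defaultdict

START, END = "\x02", "\x03"  # padding markers assumed not to occur in passwords

def train_ngram(passwords, n=3):
    """Count next-character frequencies for each (n-1)-character context."""
    counts = defaultdict(Counter)
    for pw in passwords:
        padded = START * (n - 1) + pw + END
        for i in range(n - 1, len(padded)):
            counts[padded[i - n + 1:i]][padded[i]] += 1
    return counts

def log_prob(pw, counts, n=3, alphabet_size=96):
    """Log-probability of pw with add-one smoothing over the assumed alphabet."""
    padded = START * (n - 1) + pw + END
    total = 0.0
    for i in range(n - 1, len(padded)):
        seen = counts.get(padded[i - n + 1:i], Counter())
        total += math.log((seen[padded[i]] + 1) / (sum(seen.values()) + alphabet_size))
    return total

# Toy training set (hypothetical data).
model = train_ngram(["password", "password1", "passw0rd", "123456"])
```

A password that resembles the training data, such as `password`, scores higher than a random-looking string of the same length. A cracker ranks candidate guesses by this score, and an attack-based meter can reuse it as a strength estimate.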

• Others

• Cross-site guessing (DBCBW)

• Das et al. “The tangled web of password reuse”, NDSS 2014

• Transform-based guessing (ZMR)

• Zhang et al. “The security of modern password expiration: An algorithmic framework and empirical analysis”, CCS 2010

• Summary

Highest percentage of cracked passwords highlighted

Training-free

BETTER CRACKING PERFORMANCE?

Since no single algorithm excels all the time…

Why not devise a hybrid algorithm to combine the cracking performance?

BETTER CRACKING PERFORMANCE?

• Possible?

• Take the advantages of both PCFG-based and Markov-based algorithms and combine into a single hybrid version

Hybrid Password Cracking (HPC)

HYBRID PASSWORD CRACKING (HPC)

• Question 1

Is it reasonable/necessary to design an HPC algorithm?

• Question 2

If reasonable, how much improvement can be achieved?

• Relative Improvement Ratio (RIR)

Under the same settings, for example:

A1 -> PCFG, A2 -> OMEN
X: set of 7k7k passwords cracked by Renren-trained A1
Y: set of 7k7k passwords cracked by Renren-trained A2

The RIR of A1 given by A2 is defined from X, Y, and their overlap.

[Figure: set of passwords cracked by A1, set of passwords cracked by A2, and their overlap. Algorithm legend: PCFG (P), VCT (V), 3gram (3), OMEN (O)]

• RIR indicates the potential improvement of an algorithm when incorporating the advantages of another algorithm

• Answers to previous 2 questions
  • Every algorithm has room for improvement given the advantage of another algorithm
  • Different choice of algorithms will result in different improvement spaces
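The RIR formula itself is not preserved in this transcript. A reading consistent with the description above is RIR(A1 | A2) = |Y \ X| / |X|, that is, the passwords only A2 cracks, relative to what A1 already cracks. The sketch below implements that assumed definition on toy sets; treat both the formula and the function name as assumptions rather than the authors' exact definition.

```python
def rir(cracked_by_a1, cracked_by_a2):
    """Assumed definition: |Y - X| / |X|, the gain A1 could get from A2's extra cracks."""
    x, y = set(cracked_by_a1), set(cracked_by_a2)
    if not x:
        raise ValueError("A1 cracked no passwords; the ratio is undefined")
    return len(y - x) / len(x)

# Toy cracked sets (hypothetical data).
X = {"123456", "password", "iloveyou", "abc123"}
Y = {"password", "qwerty1", "abc123", "dragon88", "111111"}
```

On these toy sets `rir(X, Y)` is 0.75 and `rir(Y, X)` is 0.4, which shows that the ratio is not symmetric: the choice of which algorithm is improved by which matters.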

PASSWORD MEASUREMENT - ACADEMIC

• Statistics-based
  • Ideal [9]: Assigns entropy score based on probability distribution

• Rule-based
  • NIST [9]: entropy of 1st char is 4 bits; next 7 chars are 2 bits/char; bonus with dictionary check

• Intra-site metrics [10]
  • Min-Entropy
  • Guesswork G
  • Beta-success-rate
  • Alpha-work-factor

• Cross-site metrics [10]

• Attack-based
  • Estimate difficulty of attack using a cracking model
  • PCFG [11]
    • Based on the structure-based cracking algorithm
  • Adaptive [9]
    • Based on the Markov-based cracking algorithm
  • Brute-force Markov (BFM)
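The rule-based NIST estimate can be written out as a short function. The first two rules are the ones on the slide. The remaining rules (1.5 bits for characters 9 to 20, 1 bit beyond that, a 6-bit composition bonus) follow the guidance in NIST SP 800-63 Appendix A. The dictionary bonus is simplified here to a flat 6 bits, and the function name is hypothetical, so this is a sketch rather than the meter used in PARS.

```python
def nist_entropy(password, dictionary=None):
    """Estimate entropy in bits using the NIST SP 800-63 rule of thumb."""
    n = len(password)
    if n == 0:
        return 0.0
    bits = 4.0                                 # first character
    bits += 2.0 * max(0, min(n, 8) - 1)        # characters 2 to 8
    bits += 1.5 * max(0, min(n, 20) - 8)       # characters 9 to 20
    bits += 1.0 * max(0, n - 20)               # characters 21 and beyond
    has_upper = any(c.isupper() for c in password)
    has_non_alpha = any(not c.isalpha() for c in password)
    if has_upper and has_non_alpha:
        bits += 6.0                            # composition bonus
    # Simplified dictionary check: flat bonus when the password is not a listed word.
    if dictionary is not None and password.lower() not in dictionary:
        bits += 6.0
    return bits
```

Under these rules `password` scores 18 bits and `password123` scores 22.5 bits, regardless of how common either string is in leaked data.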

Traditional entropy does not evaluate attacker’s efforts.

Good metrics try to estimate the efforts for attackers to crack a specific password/dataset.
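The intra-site metrics listed earlier estimate attacker effort directly from the password distribution of one dataset. The sketch below uses their standard definitions from the guessing-metrics literature: min-entropy from the most common password, guesswork as the expected number of guesses, beta-success-rate as the share covered by the top beta guesses, and alpha-work-factor as the number of guesses needed to cover a share alpha. The PARS implementations may differ in detail, and the function names are hypothetical.

```python
import math
from collections import Counter

def sorted_probs(passwords):
    """Empirical password probabilities, most common first."""
    counts = Counter(passwords)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)

def min_entropy(probs):
    """-log2 of the probability of the most common password."""
    return -math.log2(probs[0])

def guesswork(probs):
    """Expected number of guesses when guessing in decreasing order of probability."""
    return sum(i * p for i, p in enumerate(probs, start=1))

def beta_success_rate(probs, beta):
    """Fraction of accounts cracked with the beta most common guesses."""
    return sum(probs[:beta])

def alpha_work_factor(probs, alpha):
    """Smallest number of guesses that cracks at least a fraction alpha of accounts."""
    covered = 0.0
    for i, p in enumerate(probs, start=1):
        covered += p
        if covered >= alpha - 1e-12:  # small tolerance for float rounding
            return i
    return len(probs)

# Toy dataset (hypothetical data): 8 accounts, 4 distinct passwords.
probs = sorted_probs(["123456"] * 4 + ["password"] * 2 + ["iloveyou", "abc123"])
```

On the toy dataset the most common password covers half the accounts, so the min-entropy is 1 bit, two guesses crack 75% of accounts, and the expected number of guesses is 1.875.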

Training -> Testing

Tianya->CSDN (PCFG)

= Using Tianya to train PCFG meter and evaluate CSDN dataset

[Figure: CDF of Entropy Dist. vs. Entropy Score]

PASSWORD MEASUREMENT – COMMERCIAL

• From alexa.com

• Top-150 commercial password strength evaluators in 10 specific categories

LL: with minimum length constraint

UL: with maximum length constraint

C1-C6: # of composition policies

N/A: unclear policies

List of 150 websites

• Commercial Meters:

• Target

• Bloomberg

• Yahoo!

• Google

• Home Depot

• …

• We extracted source code from specific online meters and leveraged them directly or rewrote the meters using the algorithms gleaned from the code

• For those checkers that operate at the server end, we can initiate requests to check specific passwords

• In our experiments, we evaluate using Google, Yahoo!, Bloomberg and Target’s meters

PASSWORD MEASUREMENT – COMMERCIAL

Strength Dist. VS. Strength Levels (each meter has different “levels”)

CONCLUSIONS

• We proposed and implemented PARS, the first uniform, open-source, and comprehensive password research platform, which offers researchers a convenient toolset and a benchmark platform for new techniques.

• We conducted a large-scale, comparable evaluation using numerous algorithms implemented in PARS, which helps us better understand current password security research.

• We evaluated future research insights such as the feasibility of hybrid password cracking and provided a measure of the effectiveness of HPC design.

Thank you!

CAP Group
http://www.ece.gatech.edu/cap/PARS
