Generating Password Challenge Questions
Chuong Ngo
Online Services and the Problem of Account Security
E-commerce, banking, e-mail, etc.
Average user: 26 different online accounts, but only ~5 unique passwords
Ages 25 to 30: 40+ accounts
2012 online fraud cases: 3x the 2010 case count
90% of accounts require a user id and password
Passwords need to be strong and unique
Passwords: So secure you can't remember it?
Memorability vs security - negative correlation
Password recovery systems are a must: SMS, e-mail, snail mail, challenge questions
Why Challenge Questions?
User must answer agreed upon questions to validate identity
Most commonly used system
Advantages:
- Resilient to opportunistic attacks
- Can be automated/little training required
- On-demand/quick turn-around time
Disadvantages:
- Requires a quality question pool
- Susceptible to targeted attacks
- Invasion of privacy
- False rejection
Just How Safe and Secure?
The system is weak and exploitable:
- Answers are easy to obtain or in the public domain
- 12% of questions answerable with social media info
- Applicability and repeatability constraints
Can It be Salvaged?
Treat challenge questions like passwords:
- Must value memorability
- Avoid too many "easy" answers
- Large pool of challenge questions
What if the questions were targeted and personal?
Targeted Challenge Questions
- Applicability and repeatability issues are negligible
- More personal: more secure and memorable
- Greater answer variety from long-form answers
Goal: make a system that uses or generates challenge questions targeting the user's strong, personal memories.
System Concept
Data Ingest

Current system (simplified):
1. Prompt user to select questions
2. Capture user responses
3. Store user responses

Concept system:
1. Prompt user with a general question
2. Capture user responses (long-form)
3. Run responses through the NLP engine
4. Store responses and extracted entities

Data Retrieval

Current system (simplified):
1. Query for challenge question
2. Capture user response
3. Compare response against stored responses

Concept system:
1. Query for stored response or web of data
2. Modify stored response/generate challenge question
3. Capture user response
4. Run response through the NLP engine
5. Compare response to stored response
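The concept-system ingest steps could be sketched as follows. This is a minimal sketch, not the deck's implementation: `extract_entities` is a hypothetical, naive capitalized-word heuristic standing in for the CoreNLP NER step, and a plain dict stands in for persistent storage.

```python
import re

def extract_entities(text):
    # Naive stand-in for the CoreNLP NER step: treat capitalized words
    # (excluding sentence-initial words) as candidate named-noun entities.
    entities = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        for i, word in enumerate(words):
            token = word.strip(".,!?")
            if i > 0 and token[:1].isupper():
                entities.append(token)
    return entities

def ingest(store, user_id, question, response):
    # Concept-system ingest: store the long-form response plus extracted entities.
    store[user_id] = {
        "question": question,
        "response": response,
        "entities": extract_entities(response),
    }

store = {}
ingest(store, "alice", "Tell me about a relative you admire.",
       "Bob is a great uncle. He lives in Minnesota.")
print(store["alice"]["entities"])  # -> ['Minnesota']
```

Note that the heuristic misses sentence-initial entities like "Bob"; a real NER annotator would catch them.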
The Natural Language Processing Engine at the Heart of It All
The NLP Engine
Uses Stanford CoreNLP. Pipeline includes:
- Tokenizer
- Sentence splitter
- PoS tagger
- Morpha annotator
- NER
- Parsing
- Coreferencer
Notable Pipeline Absences
- No sentiment analyzer: requires training for individuals; no real advantage
- No relationship analyzer: beyond scope
- Limited use of the coreferencer and dependency tree; focused on named noun entities (NN) to simplify implementation
Fill-in-the-Blanks (FitB) Approach: A First Step
FitB Approach Overview
- Challenge question is open-ended and general; user provides a long-form response.
- System presents the user with a modified version of their answer; user must "fill in the blanks"/correct the mistakes.
- Authentication done by comparing the user's responses to the missing entities; the match must meet or exceed a threshold.
An Example
Bob is a great uncle. He loved to fish and would do so as often as he could near his home in Minnesota. He taught me to fish over the summer that I stayed with him. Every day, we would go to a nearby stream. The stream would later feed into the Mississippi River.
[Blank] is a great uncle. He loved to fish and would do so as often as he could near his home in [Blank]. He taught me to fish over the summer that I stayed with him. Every day, we would go to a nearby stream. The stream would later feed into the [Blank] River.
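The blank-and-verify cycle could be sketched as below, assuming the NNs have already been extracted. `make_challenge`, `verify`, and the 0.6 threshold are illustrative names and parameters, not the deck's actual implementation.

```python
def make_challenge(response, entities):
    # Replace the first occurrence of each entity with [Blank];
    # return the blanked text and the ordered list of expected answers.
    blanked, expected = response, []
    for ent in entities:
        if ent in blanked:
            blanked = blanked.replace(ent, "[Blank]", 1)
            expected.append(ent)
    return blanked, expected

def verify(answers, expected, threshold=0.6):
    # Authenticate if the fraction of exact (case-insensitive) token
    # matches meets or exceeds the threshold.
    hits = sum(1 for a, e in zip(answers, expected)
               if a.strip().lower() == e.lower())
    return len(expected) > 0 and hits / len(expected) >= threshold

story = ("Bob is a great uncle. He loved to fish near his home in "
         "Minnesota. The stream would later feed into the Mississippi River.")
blanked, expected = make_challenge(story, ["Bob", "Minnesota", "Mississippi"])
print(blanked)
print(verify(["Bob", "Minnesota", "Missouri"], expected))  # 2/3 matches -> True
```

The threshold is what makes the scheme tolerate a partly forgotten story: two of three blanks filled correctly still authenticates here.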
Why does it work?
- It is a single story: multiple NNs related to the same idea.
- It is memorable: the prompt helps to kick-start memory.
- Simple and fast: does not overly burden the user.
- Avoids the problem of question generation; easily extensible.
- No web of knowledge: preserves privacy.
Where does it fall short?
- Potentially low entropy in the question pool; the question is not generated.
- No web of knowledge, so no context; unable to correlate multiple stored user responses.
- Dependent on a large number of NNs; needs clean, non-noisy input.
- Token matching does not tolerate much deviation.
- Some private information may be leaked.
- Unable to be integrated with other sources of information.
- Significant setup time.
Future Work
- Different user interfaces (example: pictures)
- Incorporate additional processors (example: relationship analyzer)
- Increase the number of data points to match
Document Retrieval Approach: A Slight Twist
Document Retrieval Approach
- Similar to the FitB approach: user is prompted to answer the same challenge question they originally wrote an answer for.
- User's answer is run through the NLP engine; NNs extracted.
- NNs used to search through all registered answer documents, matching via bag-of-words count.
- Authenticated if the match is above a specified threshold.
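The bag-of-words matching step might look like this minimal sketch. `bow_overlap`, `authenticate`, and the threshold value are hypothetical names/parameters, and a plain dict stands in for the registered answer documents.

```python
from collections import Counter

def bow_overlap(query_entities, document):
    # Score a stored answer document by counting occurrences of the
    # query's extracted entities in it (bag-of-words, order ignored).
    doc_counts = Counter(w.strip(".,").lower() for w in document.split())
    return sum(doc_counts[e.lower()] for e in query_entities)

def authenticate(query_entities, registered_docs, user_id, threshold=2):
    # Authenticate if the best-matching document belongs to the
    # claimed user and its score meets the threshold.
    best_user = max(registered_docs,
                    key=lambda u: bow_overlap(query_entities, registered_docs[u]))
    score = bow_overlap(query_entities, registered_docs[best_user])
    return best_user == user_id and score >= threshold

docs = {
    "alice": "Bob is a great uncle who fished in Minnesota.",
    "carol": "My first car was a red Toyota in Ohio.",
}
print(authenticate(["Bob", "Minnesota"], docs, "alice"))  # -> True
```

A real system would likely need TF-IDF weighting or fuzzier matching, per the shortcomings noted on the next slide.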
Not Quite Right...
- Cannot use a regular bag-of-words approach: the source document and the user-provided answer document may differ too much.
- Not backed by a web of knowledge; does not reveal private information.
Future Work
- May benefit from existing search engine technologies (e.g., Lucene).
- May benefit from more data points to match.
Generating Questions from a Web of Knowledge (WOK) Approach
Now I Understand Why This is Still Unsolved
WOK Approach Overview
- NLP engine extracts the NNs from the user's initial response.
- User is prompted to provide more information for the NNs; the information is stored in the WOK.
- Challenge questions are generated from the WOK; answers are compared to the information in the WOK.
Making the WOK
- Utilized Protégé, a popular Java library for OWL and RDF.
- Information stored as OWL data models.
Generating the Questions
1. A random class is chosen from the WOK.
2. A question is generated using a property's label id and a template question.
3. The user's response is matched against the property's value.
An Example
Person instance:

Property      Value
#type         Person
#fName        Bob
#lName        Ngo
#livesIn      Minnesota
#relation     Uncle
#name         Bob Ngo

Template: What is the [Blank] of your [Blank]?

Generated questions:
- What is the livesIn of your Bob Ngo?
- What is the name of your Uncle?
- What is the relation of your Minnesota?

Location instance:

Property      Value
#type         Location
#name         Minnesota
#relation     Bob's home
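The naive template instantiation could be sketched as follows. The in-memory `wok` dict is a stand-in for the OWL/Protégé data model, and `generate_question` is a hypothetical helper that reproduces the kind of impersonal, template-filled questions shown in the example.

```python
import random

# Hypothetical in-memory stand-in for the OWL web of knowledge:
# each individual is a dict of property -> value.
wok = {
    "uncle_bob": {"type": "Person", "fName": "Bob", "lName": "Ngo",
                  "livesIn": "Minnesota", "relation": "Uncle", "name": "Bob Ngo"},
    "minnesota": {"type": "Location", "name": "Minnesota",
                  "relation": "Bob's home"},
}

TEMPLATE = "What is the {prop} of your {anchor}?"

def generate_question(individual, rng=random):
    # Naively fill the template: pick one property as the asked-for slot
    # and another property's value as the anchor. Assumes the individual
    # has at least two non-#type properties.
    props = {k: v for k, v in individual.items() if k != "type"}
    prop = rng.choice(sorted(props))
    anchor_key = rng.choice(sorted(k for k in props if k != prop))
    return TEMPLATE.format(prop=prop, anchor=props[anchor_key]), props[prop]

question, answer = generate_question(wok["uncle_bob"], random.Random(0))
print(question)
```

Because the property label id is pasted in raw, this yields exactly the impersonal "What is the livesIn of your Bob Ngo?" style questions the next slide criticizes.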
Why doesn't it work?
- Question generation algorithm needs to be less naive; generated questions are very impersonal.
- Not really an improvement over the current method.
- Creation of the WOK is not automatic or semi-automatic.
- Expected answer must be an exact match.
- Greater invasion of privacy: the system holds a WOK.
- Significant setup time.
Future Work
- Question generation algorithm must be improved.
- Incorporate additional NLP technologies for a smarter WOK.
- Is an ontology the wrong technology?
Conclusion
FitB approach is the most ready for deployment.
Document retrieval approach evaluation incomplete.
WOK approach needs a lot more work.
Questions?