Generating Password Challenge Questions
Chuong Ngo
Online Services and the Problem of Account Security
E-commerce, banking, e-mail, etc.
Average user: 26 different online accounts, but only ~5 unique passwords
Ages 25 to 30: 40+ accounts
2012 online fraud cases: 3x the 2010 case count
90% of accounts require a user id and password
Passwords need to be strong and unique
Passwords: So secure you can't remember it?
Memorability vs security - negative correlation
Password recovery systems are a must: SMS, e-mail, snail mail, challenge questions
Why Challenge Questions?
User must answer agreed upon questions to validate identity
Most commonly used system
Advantages:
- Resilient to opportunistic attacks
- Can be automated/little training required
- On-demand/quick turn-around time
Disadvantages:
- Requires a quality question pool
- Susceptible to targeted attacks
- Invasion of privacy
- False rejection
Just How Safe and Secure?
The system is weak and exploitable:
- Answers are easy to obtain or in the public domain
- 12% of questions answerable with social media info
- Applicability and repeatability constraints
Can It be Salvaged?
Treat challenge questions like passwords:
- Must value memorability
- Avoid too many "easy" answers
- Large pool of challenge questions
What if the questions were targeted and personal?
Targeted Challenge Questions
- Applicability and repeatability issues are negligible
- More personal: more secure and memorable
- Greater answer variety from long-form answers
Goal: make a system that uses or generates challenge questions targeting the user's strong, personal memories.
System Concept
Data Ingest

Current system (simplified):
1. Prompt user to select questions
2. Capture user responses
3. Store user responses

Concept system:
1. Prompt user with a general question
2. Capture user responses (long-form)
3. Run responses through the NLP engine
4. Store responses and extracted entities

Data Retrieval

Current system (simplified):
1. Query for challenge question
2. Capture user response
3. Compare response against stored responses

Concept system:
1. Query for stored response or web of data
2. Modify stored response/generate challenge question
3. Capture user response
4. Run response through the NLP engine
5. Compare response to stored response
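The concept-system ingest steps could be sketched as follows. This is a minimal sketch, not the deck's implementation: `extract_entities` is a hypothetical, naive capitalized-word heuristic standing in for the CoreNLP NER step, and a plain dict stands in for persistent storage.

```python
import re

def extract_entities(text):
    # Naive stand-in for the CoreNLP NER step: treat capitalized words
    # (excluding sentence-initial words) as candidate named-noun entities.
    entities = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        for i, word in enumerate(words):
            token = word.strip(".,!?")
            if i > 0 and token[:1].isupper():
                entities.append(token)
    return entities

def ingest(store, user_id, question, response):
    # Concept-system ingest: store the long-form response plus extracted entities.
    store[user_id] = {
        "question": question,
        "response": response,
        "entities": extract_entities(response),
    }

store = {}
ingest(store, "alice", "Tell me about a relative you admire.",
       "Bob is a great uncle. He lives in Minnesota.")
print(store["alice"]["entities"])  # -> ['Minnesota']
```

Note that the heuristic misses sentence-initial entities like "Bob"; a real NER annotator would catch them.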
The Natural Language Processing Engine at the Heart of It All
The NLP Engine
Uses Stanford CoreNLP. Pipeline includes:
- Tokenizer
- Sentence splitter
- PoS tagger
- Morpha annotator
- NER
- Parsing
- Coreferencer
Notable Pipeline Absences
- No sentiment analyzer: requires training for individuals; no real advantage
- No relationship analyzer: beyond scope
- Limited use of the coreferencer and dependency tree; focused on named noun entities (NN) to simplify implementation
Fill-in-the-Blanks (FitB) Approach: A First Step
FitB Approach Overview
- Challenge question is open-ended and general; user provides a long-form response.
- System presents the user with a modified version of their answer; user must "fill in the blanks"/correct the mistakes.
- Authentication done by comparing the user's responses to the missing entities; the match must meet or exceed a threshold.
An Example
Bob is a great uncle. He loved to fish and would do so as often as he could near his home in Minnesota. He taught me to fish over the summer that I stayed with him. Every day, we would go to a nearby stream. The stream would later feed into the Mississippi River.
[Blank] is a great uncle. He loved to fish and would do so as often as he could near his home in [Blank]. He taught me to fish over the summer that I stayed with him. Every day, we would go to a nearby stream. The stream would later feed into the [Blank] River.
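The blank-and-verify cycle could be sketched as below, assuming the NNs have already been extracted. `make_challenge`, `verify`, and the 0.6 threshold are illustrative names and parameters, not the deck's actual implementation.

```python
def make_challenge(response, entities):
    # Replace the first occurrence of each entity with [Blank];
    # return the blanked text and the ordered list of expected answers.
    blanked, expected = response, []
    for ent in entities:
        if ent in blanked:
            blanked = blanked.replace(ent, "[Blank]", 1)
            expected.append(ent)
    return blanked, expected

def verify(answers, expected, threshold=0.6):
    # Authenticate if the fraction of exact (case-insensitive) token
    # matches meets or exceeds the threshold.
    hits = sum(1 for a, e in zip(answers, expected)
               if a.strip().lower() == e.lower())
    return len(expected) > 0 and hits / len(expected) >= threshold

story = ("Bob is a great uncle. He loved to fish near his home in "
         "Minnesota. The stream would later feed into the Mississippi River.")
blanked, expected = make_challenge(story, ["Bob", "Minnesota", "Mississippi"])
print(blanked)
print(verify(["Bob", "Minnesota", "Missouri"], expected))  # 2/3 matches -> True
```

The threshold is what makes the scheme tolerate a partly forgotten story: two of three blanks filled correctly still authenticates here.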
Why does it work?
- It is a single story: multiple NNs related to the same idea.
- It is memorable: the prompt helps to kick-start memory.
- Simple and fast: does not overly burden the user.
- Avoids the problem of question generation; easily extensible.
- No web of knowledge: preserves privacy.
Where does it fall short?
- Potentially low entropy in the question pool; the question is not generated.
- No web of knowledge, so no context; unable to correlate multiple stored user responses.
- Dependent on a large number of NNs; needs clean, non-noisy input.
- Token matching does not tolerate much deviation.
- Some private information may be leaked.
- Unable to be integrated with other sources of information.
- Significant setup time.
Future Work
- Different user interfaces (example: pictures)
- Incorporate additional processors (example: relationship analyzer)
- Increase the number of data points to match
Document Retrieval Approach: A Slight Twist
Document Retrieval Approach
- Similar to the FitB approach: user is prompted to answer the same challenge question they originally wrote an answer for.
- User's answer is run through the NLP engine; NNs extracted.
- NNs used to search through all registered answer documents, matching via bag-of-words count.
- Authenticated if the match is above a specified threshold.
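The bag-of-words matching step might look like this minimal sketch. `bow_overlap`, `authenticate`, and the threshold value are hypothetical names/parameters, and a plain dict stands in for the registered answer documents.

```python
from collections import Counter

def bow_overlap(query_entities, document):
    # Score a stored answer document by counting occurrences of the
    # query's extracted entities in it (bag-of-words, order ignored).
    doc_counts = Counter(w.strip(".,").lower() for w in document.split())
    return sum(doc_counts[e.lower()] for e in query_entities)

def authenticate(query_entities, registered_docs, user_id, threshold=2):
    # Authenticate if the best-matching document belongs to the
    # claimed user and its score meets the threshold.
    best_user = max(registered_docs,
                    key=lambda u: bow_overlap(query_entities, registered_docs[u]))
    score = bow_overlap(query_entities, registered_docs[best_user])
    return best_user == user_id and score >= threshold

docs = {
    "alice": "Bob is a great uncle who fished in Minnesota.",
    "carol": "My first car was a red Toyota in Ohio.",
}
print(authenticate(["Bob", "Minnesota"], docs, "alice"))  # -> True
```

A real system would likely need TF-IDF weighting or fuzzier matching, per the shortcomings noted on the next slide.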
Not Quite Right...
- Cannot use a regular bag-of-words approach: the source document and the user-provided answer document may differ too much.
- Not backed by a web of knowledge; does not reveal private information.
Future Work
- May benefit from existing search engine technologies (e.g., Lucene).
- May benefit from more data points to match.
Generating Questions from a Web of Knowledge (WOK) Approach
Now I Understand Why This is Still Unsolved
WOK Approach Overview
- NLP engine extracts the NNs from the user's initial response.
- User is prompted to provide more information for the NNs; the information is stored in the WOK.
- Challenge questions are generated from the WOK; answers are compared to the information in the WOK.
Making the WOK
- Utilized Protégé, a popular Java library for OWL and RDF.
- Information stored as OWL data models.
Generating the Questions
1. A random class is chosen from the WOK.
2. A question is generated using a property's label id and a template question.
3. The user's response is matched against the property's value.
An Example
Person instance:

Property      Value
#type         Person
#fName        Bob
#lName        Ngo
#livesIn      Minnesota
#relation     Uncle
#name         Bob Ngo

Template: What is the [Blank] of your [Blank]?

Generated questions:
- What is the livesIn of your Bob Ngo?
- What is the name of your Uncle?
- What is the relation of your Minnesota?

Location instance:

Property      Value
#type         Location
#name         Minnesota
#relation     Bob's home
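The naive template instantiation could be sketched as follows. The in-memory `wok` dict is a stand-in for the OWL/Protégé data model, and `generate_question` is a hypothetical helper that reproduces the kind of impersonal, template-filled questions shown in the example.

```python
import random

# Hypothetical in-memory stand-in for the OWL web of knowledge:
# each individual is a dict of property -> value.
wok = {
    "uncle_bob": {"type": "Person", "fName": "Bob", "lName": "Ngo",
                  "livesIn": "Minnesota", "relation": "Uncle", "name": "Bob Ngo"},
    "minnesota": {"type": "Location", "name": "Minnesota",
                  "relation": "Bob's home"},
}

TEMPLATE = "What is the {prop} of your {anchor}?"

def generate_question(individual, rng=random):
    # Naively fill the template: pick one property as the asked-for slot
    # and another property's value as the anchor. Assumes the individual
    # has at least two non-#type properties.
    props = {k: v for k, v in individual.items() if k != "type"}
    prop = rng.choice(sorted(props))
    anchor_key = rng.choice(sorted(k for k in props if k != prop))
    return TEMPLATE.format(prop=prop, anchor=props[anchor_key]), props[prop]

question, answer = generate_question(wok["uncle_bob"], random.Random(0))
print(question)
```

Because the property label id is pasted in raw, this yields exactly the impersonal "What is the livesIn of your Bob Ngo?" style questions the next slide criticizes.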
Why doesn't it work?
- Question generation algorithm needs to be less naive; generated questions are very impersonal.
- Not really an improvement over the current method.
- Creation of the WOK is not automatic or semi-automatic.
- Expected answer must be an exact match.
- Greater invasion of privacy: the system holds a WOK.
- Significant setup time.
Future Work
- Question generation algorithm must be improved.
- Incorporate additional NLP technologies for a smarter WOK.
- Is an ontology the wrong technology?
Conclusion
FitB approach is the most ready for deployment.
Document retrieval approach evaluation incomplete.
WOK approach needs a lot more work.
Questions?