Post on 02-Jan-2016
Moving Ahead: Creative Feature Extraction and Error Analysis Techniques
Carolyn Penstein Rosé
Carnegie Mellon University
Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division
Outline
New Feature Creation
Error Analysis
New Feature Creation
Why create new features?
You may want to generalize across sets of related words
  Color = {red,yellow,orange,green,blue}
  Food = {cake,pizza,hamburger,steak,bread}
You may want to detect contingencies
  The text must mention both cake and presents in order to count as a birthday party
You may want to combine these
  The text must include a color and a food
Why create new features by hand?
More likely to capture meaningful generalizations
Build in knowledge so you can get by with less training data
Rule Language
ANY() is used to create lists
  COLOR = ANY(red,yellow,green,blue,purple)
  FOOD = ANY(cake,pizza,hamburger,steak,bread)
ALL() is used to capture contingencies
  ALL(cake,presents)
More complex rules
  ALL(COLOR,FOOD)
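The ANY()/ALL() semantics described above can be sketched in a few lines of Python. This is only an illustration of the logic, not TagHelper's implementation: the helper names `tokens` and `matches`, the tuple representation of rules, and the lowercase word tokenization are all assumptions made for this example.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of word tokens (assumed tokenization)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def ANY(*parts):
    """Build a rule that matches when at least one part matches."""
    return ("ANY", parts)

def ALL(*parts):
    """Build a rule that matches when every part matches."""
    return ("ALL", parts)

def matches(rule, text):
    """Evaluate a rule (a plain word or a nested ANY/ALL tuple) against a text."""
    words = tokens(text)

    def evaluate(r):
        if isinstance(r, str):          # a plain word: is it in the text?
            return r in words
        op, parts = r                   # a nested rule: combine its parts
        results = [evaluate(p) for p in parts]
        return any(results) if op == "ANY" else all(results)

    return evaluate(rule)

# The rules from the slide, written with the helpers above.
COLOR = ANY("red", "yellow", "green", "blue", "purple")
FOOD = ANY("cake", "pizza", "hamburger", "steak", "bread")
BIRTHDAY = ALL("cake", "presents")
COLOR_AND_FOOD = ALL(COLOR, FOOD)
```

For instance, `matches(COLOR_AND_FOOD, "I ate a green pizza")` is true because the text contains both a color and a food, while `matches(COLOR_AND_FOOD, "I ate pizza")` is false because no color word is present.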
Group Project: Make a rule that will match against questions but not statements
Question Tell me what your favorite color is.
Statement I tell you my favorite color is blue.
Question Where do you live?
Statement I live where my family lives.
Question Which kinds of baked goods do you prefer?
Statement I prefer to eat wheat bread.
Question Which courses should I take?
Statement You should take my applied machine learning course.
Question Tell me when you get up in the morning.
Statement I get up early.
Possible Rule
ANY(ALL(tell,me),BOL_WDT,BOL_WRB)
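A rough way to check this rule against the ten example sentences is sketched below. It assumes that BOL_WDT and BOL_WRB are POS bigrams meaning "beginning of line followed by a wh-determiner (WDT)" and "beginning of line followed by a wh-adverb (WRB)". Instead of a real part-of-speech tagger, the sketch uses two small hand-written word lists (`WDT_WORDS`, `WRB_WORDS`), which are assumptions for illustration only; the function name `is_question` is also hypothetical.

```python
import re

# Stand-ins for a POS tagger (assumption): words treated as WDT and WRB.
WDT_WORDS = {"which", "what", "whichever", "whatever"}
WRB_WORDS = {"where", "when", "why", "how"}

def is_question(text):
    """Approximate ANY(ALL(tell,me), BOL_WDT, BOL_WRB) on one line of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return False
    tell_me = "tell" in words and "me" in words   # ALL(tell,me)
    bol_wdt = words[0] in WDT_WORDS               # BOL_WDT: line starts with a wh-determiner
    bol_wrb = words[0] in WRB_WORDS               # BOL_WRB: line starts with a wh-adverb
    return tell_me or bol_wdt or bol_wrb

# The examples from the group project, paired with whether each is a question.
EXAMPLES = [
    ("Tell me what your favorite color is.", True),
    ("I tell you my favorite color is blue.", False),
    ("Where do you live?", True),
    ("I live where my family lives.", False),
    ("Which kinds of baked goods do you prefer?", True),
    ("I prefer to eat wheat bread.", False),
    ("Which courses should I take?", True),
    ("You should take my applied machine learning course.", False),
    ("Tell me when you get up in the morning.", True),
    ("I get up early.", False),
]

all_correct = all(is_question(text) == label for text, label in EXAMPLES)
```

Note why the statements are rejected: "I tell you my favorite color is blue." contains "tell" but not "me", and "I live where my family lives." contains "where" but not at the beginning of the line.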
Advanced Feature Editing
* Click here
Types of Basic Features
Primitive features include unigrams, bigrams, and POS bigrams
Types of Basic Features
The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists
  You can choose to remove stopwords or not
  You can choose whether or not to strip endings off words with stemming
  You can choose how frequently a feature must appear in your data in order for it to show up in your lists
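The effect of these three options on the unigram list can be sketched as follows. This is not TagHelper's code: the stopword list, the `crude_stem` suffix stripper (a stand-in, not a real stemming algorithm), and the choice to measure frequency as the number of documents containing a word are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "to", "my", "i", "you", "in", "of", "and"}

def crude_stem(word):
    """Strip a few common endings; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def unigram_features(docs, remove_stopwords=True, stem=True, min_count=2):
    """Return the sorted unigrams that survive the chosen options."""
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z']+", doc.lower())
        if remove_stopwords:
            words = [w for w in words if w not in STOPWORDS]
        if stem:
            words = [crude_stem(w) for w in words]
        counts.update(set(words))       # count each word once per document
    return sorted(w for w, c in counts.items() if c >= min_count)

docs = [
    "I live where my family lives.",
    "Where do you live?",
    "I prefer to eat wheat bread.",
]
features = unigram_features(docs)
```

With these three sentences and the default settings, only "live" and "where" appear in at least two documents, so they are the only unigrams kept. Lowering `min_count` to 1 and turning stemming off brings back "lives" as a separate feature.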
Types of Basic Features
* Now let’s look at how to create new features.
Creating New Features
* The feature editor allows you to create new feature definitions
* Click on + to add your new feature
Examining a New Feature
• Right click on a feature to examine where it matches in your data
Examining a New Feature
Error Analysis
Create an Error Analysis File
Use TagHelper to Code Uncoded File
• The output file contains the codes TagHelper assigned.
• What you want to do now is to remove the prediction column and insert the correct answers next to the TagHelper assigned answers.
Load Error Analysis File
Error Analysis Strategies
Look for large error cells in the confusion matrix
Locate the examples that correspond to that cell
What features do those examples share?
How are they different from the examples that were classified correctly?
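These strategies can be sketched in code as follows. The sentences come from the earlier question/statement exercise, but the predicted labels are invented purely for illustration and are not real TagHelper output. The function names and the (text, predicted, actual) row layout are also assumptions.

```python
import re
from collections import Counter

# (text, predicted, actual): predictions are hypothetical, for illustration only.
rows = [
    ("Tell me what your favorite color is.", "Statement", "Question"),
    ("Tell me when you get up in the morning.", "Statement", "Question"),
    ("Where do you live?", "Question", "Question"),
    ("I live where my family lives.", "Question", "Statement"),
    ("I get up early.", "Statement", "Statement"),
]

def confusion_matrix(rows):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter((actual, predicted) for _, predicted, actual in rows)

def largest_error_cell(rows):
    """Return the (actual, predicted) cell with the most misclassified examples."""
    errors = Counter((a, p) for _, p, a in rows if a != p)
    return errors.most_common(1)[0][0] if errors else None

def examples_in_cell(rows, actual, predicted):
    """Locate the examples that fall into one cell of the confusion matrix."""
    return [t for t, p, a in rows if a == actual and p == predicted]

def shared_words(texts):
    """Words that occur in every one of the given texts."""
    word_sets = [set(re.findall(r"[a-z']+", t.lower())) for t in texts]
    return set.intersection(*word_sets) if word_sets else set()

cell = largest_error_cell(rows)          # the biggest off-diagonal cell
mistakes = examples_in_cell(rows, *cell) # the examples behind that cell
common = shared_words(mistakes)          # what those examples have in common
```

In this made-up case the largest error cell is questions labeled as statements, and the misclassified examples share the words "tell" and "me", which suggests that a feature such as ALL(tell,me) might help.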
Group Project
Load in the NewsGroupTrain.xls data set
  What is the best performance you can get by playing with the standard TagHelper tools feature options?
Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls
Copy in the Answer column from NewsGroupAnswers.xls
Now do an error analysis to determine why frequent mistakes are being made
  How could you do better?