Moving Ahead: Creative Feature Extraction and Error Analysis Techniques Carolyn Penstein Rosé...

Post on 02-Jan-2016


Moving Ahead: Creative Feature Extraction and Error Analysis Techniques

Carolyn Penstein Rosé
Carnegie Mellon University

Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division

Outline

* New Feature Creation
* Error Analysis

New Feature Creation

Why create new features?

* You may want to generalize across sets of related words
  * Color = {red,yellow,orange,green,blue}
  * Food = {cake,pizza,hamburger,steak,bread}
* You may want to detect contingencies
  * The text must mention both cake and presents in order to count as a birthday party
* You may want to combine these
  * The text must include a color and a food

Why create new features by hand?

* More likely to capture meaningful generalizations
* Build in knowledge so you can get by with less training data

Rule Language

* ANY() is used to create lists
  * COLOR = ANY(red,yellow,green,blue,purple)
  * FOOD = ANY(cake,pizza,hamburger,steak,bread)
* ALL() is used to capture contingencies
  * ALL(cake,presents)
* More complex rules
  * ALL(COLOR,FOOD)
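The rule language above can be sketched in a few lines of Python. This is only an illustration of the ANY/ALL semantics described on the slide, not TagHelper's implementation: the helper names (`tokenize`, `word`, `matches`) and the simple lowercase tokenizer are assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def word(w):
    """A primitive rule: true when the word w occurs in the text."""
    return lambda tokens: w in tokens

def _as_rule(part):
    # Bare strings are treated as single-word rules; anything else is already a rule.
    return word(part) if isinstance(part, str) else part

def ANY(*parts):
    """True when at least one part matches (a list of related words)."""
    rules = [_as_rule(p) for p in parts]
    return lambda tokens: any(r(tokens) for r in rules)

def ALL(*parts):
    """True only when every part matches (a contingency)."""
    rules = [_as_rule(p) for p in parts]
    return lambda tokens: all(r(tokens) for r in rules)

def matches(rule, text):
    """Hypothetical helper: apply a rule to raw text."""
    return rule(tokenize(text))

# The rules from the slide.
COLOR = ANY("red", "yellow", "green", "blue", "purple")
FOOD = ANY("cake", "pizza", "hamburger", "steak", "bread")
BIRTHDAY = ALL("cake", "presents")
COLOR_AND_FOOD = ALL(COLOR, FOOD)

if __name__ == "__main__":
    print(matches(BIRTHDAY, "We had cake and presents."))
    print(matches(COLOR_AND_FOOD, "I ate a green cake."))
    print(matches(COLOR_AND_FOOD, "I ate cake."))
```

Because named rules such as COLOR are ordinary values, they can be nested inside other rules, which is what makes ALL(COLOR,FOOD) possible.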

Group Project: Make a rule that will match against questions but not statements

Question Tell me what your favorite color is.

Statement I tell you my favorite color is blue.

Question Where do you live?

Statement I live where my family lives.

Question Which kinds of baked goods do you prefer?

Statement I prefer to eat wheat bread.

Question Which courses should I take?

Statement You should take my applied machine learning course.

Question Tell me when you get up in the morning.

Statement I get up early.

Possible Rule

ANY(ALL(tell,me),BOL_WDT,BOL_WRB)
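One way to check a candidate rule like this is to run it over the ten example lines from the group project. The sketch below does that under two assumptions that are not stated on the slides: BOL_WDT and BOL_WRB are read as "the line begins with a word tagged WDT (wh-determiner) or WRB (wh-adverb)", and a tiny hand-written lexicon stands in for a real part-of-speech tagger. The names `POS`, `features`, and `evaluate` are made up for the example.

```python
import re

# Hypothetical mini-lexicon standing in for a POS tagger (Penn Treebank tags).
POS = {"which": "WDT", "where": "WRB", "when": "WRB", "why": "WRB", "how": "WRB"}

def features(text):
    """Return unigram features plus a BOL_<tag> feature for the first word, if tagged."""
    tokens = re.findall(r"[a-z']+", text.lower())
    feats = set(tokens)
    if tokens and tokens[0] in POS:
        feats.add("BOL_" + POS[tokens[0]])
    return feats

def evaluate(rule, feats):
    """Evaluate a nested ("ANY" | "ALL", part, ...) rule against a feature set."""
    if isinstance(rule, str):
        return rule in feats
    op, *parts = rule
    results = [evaluate(p, feats) for p in parts]
    return any(results) if op == "ANY" else all(results)

# ANY(ALL(tell,me),BOL_WDT,BOL_WRB) written as nested tuples.
QUESTION = ("ANY", ("ALL", "tell", "me"), "BOL_WDT", "BOL_WRB")

EXAMPLES = [
    ("Question", "Tell me what your favorite color is."),
    ("Statement", "I tell you my favorite color is blue."),
    ("Question", "Where do you live?"),
    ("Statement", "I live where my family lives."),
    ("Question", "Which kinds of baked goods do you prefer"),
    ("Statement", "I prefer to eat wheat bread."),
    ("Question", "Which courses should I take?"),
    ("Statement", "You should take my applied machine learning course."),
    ("Question", "Tell me when you get up in the morning."),
    ("Statement", "I get up early."),
]

if __name__ == "__main__":
    for label, text in EXAMPLES:
        predicted = "Question" if evaluate(QUESTION, features(text)) else "Statement"
        print(f"{label:<10} {predicted:<10} {text}")
```

Note that "I tell you my favorite color is blue." contains "tell" but not "me", and "I live where my family lives." contains "where" but not at the beginning of the line, so neither statement triggers the rule.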

Advanced Feature Editing

* Click here

Types of Basic Features

* Primitive features include unigrams, bigrams, and POS bigrams

Types of Basic Features

* The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists
  * You can choose to remove stopwords or not
  * You can choose whether or not to strip endings off words with stemming
  * You can choose how frequently a feature must appear in your data in order for it to show up in your lists
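To make the effect of these options concrete, here is a minimal sketch of unigram and bigram extraction with the three switches described above. It is not TagHelper's code: the function name `extract_features`, the short stopword list, and the crude suffix stripper (a stand-in for a real stemmer) are all assumptions, and POS bigrams are left out because they would need a tagger.

```python
from collections import Counter
import re

# Tiny illustrative stopword list (an assumption, not TagHelper's list).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}

def crude_stem(word):
    """Strip a few common endings; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_features(texts, remove_stopwords=False, stem=False, min_count=1):
    """Return the sorted unigram and bigram features that appear in at
    least min_count of the given texts."""
    per_text = []
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        if remove_stopwords:
            tokens = [t for t in tokens if t not in STOPWORDS]
        if stem:
            tokens = [crude_stem(t) for t in tokens]
        unigrams = set(tokens)
        bigrams = {f"{a}_{b}" for a, b in zip(tokens, tokens[1:])}
        per_text.append(unigrams | bigrams)
    # Count in how many texts each feature occurs, then apply the threshold.
    counts = Counter(f for feats in per_text for f in feats)
    return sorted(f for f, c in counts.items() if c >= min_count)

if __name__ == "__main__":
    texts = ["The cakes are baking", "She baked a cake"]
    print(extract_features(texts))
    print(extract_features(texts, remove_stopwords=True, stem=True, min_count=2))
```

With stemming on, "cakes" and "cake" collapse into one feature, so that feature can pass a frequency threshold that neither surface form would pass alone.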

Types of Basic Features

* Now let’s look at how to create new features.

Creating New Features

* The feature editor allows you to create new feature definitions

* Click on + to add your new feature

Examining a New Feature

* Right click on a feature to examine where it matches in your data

Examining a New Feature

Error Analysis

Create an Error Analysis File

Use TagHelper to Code Uncoded File

* The output file contains the codes TagHelper assigned.

* What you want to do now is to remove the prediction column and insert the correct answers next to the TagHelper assigned answers.

Load Error Analysis File

Load Error Analysis File

Error Analysis Strategies

* Look for large error cells in the confusion matrix
* Locate the examples that correspond to that cell
* What features do those examples share?
* How are they different from the examples that were classified correctly?
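The strategy above can be tried outside the tool once the correct answers sit next to the assigned codes. The sketch below assumes each row is a dictionary with `text`, `answer`, and `predicted` keys; those key names, the function names, and the five sample rows are invented for illustration and are not taken from the NewsGroup data.

```python
from collections import Counter

def confusion_matrix(rows):
    """Count rows per (correct answer, predicted code) cell."""
    return Counter((r["answer"], r["predicted"]) for r in rows)

def largest_error_cell(matrix):
    """Return the off-diagonal cell with the most examples, or None if no errors."""
    errors = {cell: n for cell, n in matrix.items() if cell[0] != cell[1]}
    return max(errors, key=errors.get) if errors else None

def examples_in_cell(rows, cell):
    """Locate the texts that fall into the given (answer, predicted) cell."""
    return [r["text"] for r in rows if (r["answer"], r["predicted"]) == cell]

def shared_words(texts):
    """Words that occur in every one of the given texts."""
    counts = Counter(w for t in texts for w in set(t.lower().split()))
    return [w for w, c in counts.most_common() if c == len(texts)]

# Made-up rows, only to exercise the functions.
ROWS = [
    {"text": "the team won the game", "answer": "sports", "predicted": "sports"},
    {"text": "the senate vote on the game plan", "answer": "politics", "predicted": "sports"},
    {"text": "a game of votes", "answer": "politics", "predicted": "sports"},
    {"text": "the election result", "answer": "politics", "predicted": "politics"},
    {"text": "the match report", "answer": "sports", "predicted": "politics"},
]

if __name__ == "__main__":
    matrix = confusion_matrix(ROWS)
    cell = largest_error_cell(matrix)
    errors = examples_in_cell(ROWS, cell)
    print(cell, matrix[cell])
    print(errors)
    print(shared_words(errors))
```

Comparing the shared words of the error cell with the words in correctly classified examples of the same class is one way to spot a feature that is misleading the model.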

Group Project

* Load in the NewsGroupTrain.xls data set
* What is the best performance you can get by playing with the standard TagHelper tools feature options?
* Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls
* Copy in Answer column from NewsGroupAnswers.xls
* Now do an error analysis to determine why frequent mistakes are being made
* How could you do better?