
Classifying fake news using supervised learning with NLP · 2017-08-14

Date post: 25-May-2020
DataCamp: Natural Language Processing Fundamentals in Python. Classifying fake news using supervised learning with NLP. Katharine Jarmul, Founder, kjamistan
Transcript
Page 1: Classifying fake news using supervised learning with NLP · 2017-08-14 · Classifying fake news using supervised learning with NLP NATURAL LANGUAGE PROCESSING FUNDAMENTALS IN PYTHON

DataCamp Natural Language Processing Fundamentals in Python

Classifying fake news using supervised learning with NLP

Katharine Jarmul, Founder, kjamistan

Page 2: What is supervised learning?

Form of machine learning
  Problem has predefined training data
  This data has a label (or outcome) you want the model to learn
Classification problem
  Goal: Make good hypotheses about the species based on geometric features

Sepal Length  Sepal Width  Petal Length  Petal Width  Species
5.1           3.5          1.4           0.2          I. setosa
7.0           3.2          4.7           1.4          I. versicolor
6.3           3.3          6.0           2.5          I. virginica
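To make "learning a label from geometric features" concrete, here is a minimal sketch that classifies a new flower by finding the closest labelled row. The nearest-neighbour rule and the unseen measurements are assumptions chosen for illustration; the slides do not name a specific classifier at this point. The versicolor petal length is entered as 4.7, the value in the standard Iris data.

```python
import math

# The three labelled rows from the slide: (sepal length, sepal width,
# petal length, petal width) -> species.
training_data = [
    ((5.1, 3.5, 1.4, 0.2), "I. setosa"),
    ((7.0, 3.2, 4.7, 1.4), "I. versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "I. virginica"),
]


def predict_species(features):
    """Return the label of the closest training row (1-nearest-neighbour)."""
    nearest = min(training_data, key=lambda row: math.dist(row[0], features))
    return nearest[1]


# Hypothetical unseen flower with a short, narrow petal.
prediction = predict_species((4.9, 3.0, 1.5, 0.2))
print(prediction)
```

The features play the role of the inputs, and the species column is the label the model is asked to reproduce for data it has not seen.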

Page 3: Supervised learning with NLP

Need to use language instead of geometric features
scikit-learn: Powerful open-source library
How to create supervised learning data from text?
  Use bag-of-words models or tf-idf as features
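A bag-of-words model can be sketched in plain Python as follows. This is an illustration of the idea only, not scikit-learn's implementation; the two documents and the `tokenize` helper are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Hypothetical two-document corpus.
docs = [
    "The alien ship landed. The alien waved.",
    "A swordsman drew his sword.",
]

# Vocabulary: every distinct token, sorted so the column order is stable.
vocabulary = sorted({token for doc in docs for token in tokenize(doc)})


def bag_of_words(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


vectors = [bag_of_words(doc) for doc in docs]
print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length row of counts, which is the numeric form a classifier needs. Word order is discarded, which is where the "bag" in the name comes from.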

Page 4: IMDB Movie Dataset

Plot                                                Sci-Fi  Action
In a post-apocalyptic world in human decay, a...    1       0
Mohei is a wandering swordsman. He arrives in...    0       1
#137 is a SCI/FI thriller about a girl, Marla,...   1       0

Goal: Predict movie genre based on plot summary
Categorical features generated using preprocessing
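One plausible reading of "categorical features generated using preprocessing" is that a genre field was expanded into one 0/1 column per genre, as in the table. The sketch below shows that step under that assumption; the records and the `add_indicator_columns` helper are hypothetical and do not reproduce the course's actual preprocessing.

```python
# Hypothetical raw records: a plot summary plus a single genre string.
movies = [
    {"plot": "Robots explore a distant planet.", "genre": "Sci-Fi"},
    {"plot": "A lone fighter takes on a gang.", "genre": "Action"},
    {"plot": "A crew wakes up on a drifting starship.", "genre": "Sci-Fi"},
]

# One indicator column per distinct genre.
genres = sorted({movie["genre"] for movie in movies})


def add_indicator_columns(movie):
    """Return a row with the plot and a 0/1 flag for every genre."""
    row = {"plot": movie["plot"]}
    for genre in genres:
        row[genre] = 1 if movie["genre"] == genre else 0
    return row


table = [add_indicator_columns(movie) for movie in movies]

# The label used for a Sci-Fi classifier is just one of those columns.
labels = [row["Sci-Fi"] for row in table]
print(labels)
```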

Page 5: Supervised learning steps

Collect and preprocess our data
Determine a label (Example: Movie genre)
Split data into training and test sets
Extract features from the text to help predict the label
  Bag-of-words vector built into scikit-learn
Evaluate trained model using the test set
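The "split data into training and test sets" step can be sketched without any library. This is a simplified illustration, not scikit-learn's `train_test_split`; the helper name, the rounding rule, and the toy data are assumptions.

```python
import random


def split_train_test(samples, labels, test_size=0.33, seed=53):
    """Shuffle the indices reproducibly, then hold out a fraction for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(values, idx):
        return [values[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Hypothetical data: nine placeholder plots with alternating 0/1 labels.
plots = [f"plot {i}" for i in range(9)]
genres = [i % 2 for i in range(9)]

X_train, X_test, y_train, y_test = split_train_test(plots, genres)
print(len(X_train), len(X_test))
```

Samples and labels are picked with the same shuffled indices, so each plot stays paired with its label. Fixing the seed makes the split reproducible between runs.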

Page 6: Let's practice!

Page 7: Building word count vectors with scikit-learn

Katharine Jarmul, Founder, kjamistan

Page 8: Predicting movie genre

Dataset consisting of movie plots and corresponding genre
Goal: Create bag-of-words vectors for the movie plots
  Can we predict genre based on the words used in the plot summary?

Page 9: CountVectorizer with Python

In [1]: import pandas as pd

In [2]: from sklearn.model_selection import train_test_split

In [3]: from sklearn.feature_extraction.text import CountVectorizer

In [4]: df = ...  # Load data into DataFrame

In [5]: y = df['Sci-Fi']  # Label: 1 if the movie is Sci-Fi, else 0

In [6]: X_train, X_test, y_train, y_test = train_test_split(df['plot'], y, test_size=0.33, random_state=53)

In [7]: count_vectorizer = CountVectorizer(stop_words='english')

In [8]: count_train = count_vectorizer.fit_transform(X_train.values)  # Learn the vocabulary from the training plots and count words

In [9]: count_test = count_vectorizer.transform(X_test.values)  # Reuse the training vocabulary on the test plots
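The difference between `fit_transform` on the training data and `transform` on the test data can be illustrated with a tiny stand-in class. `TinyCountVectorizer`, its stop-word set, and the sample plots are hypothetical; the class only mimics the fit/transform split and is not how scikit-learn implements `CountVectorizer`.

```python
import re
from collections import Counter


class TinyCountVectorizer:
    """Hypothetical, simplified stand-in for a count vectorizer."""

    def __init__(self, stop_words=()):
        self.stop_words = set(stop_words)
        self.vocabulary_ = {}

    def _tokens(self, text):
        words = re.findall(r"[a-z]+", text.lower())
        return [w for w in words if w not in self.stop_words]

    def fit(self, texts):
        # Learn the word -> column mapping from these texts only.
        words = sorted({w for text in texts for w in self._tokens(text)})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, texts):
        # Count known words; words missing from the vocabulary are dropped.
        rows = []
        for text in texts:
            counts = Counter(self._tokens(text))
            rows.append([counts[w] for w in self.vocabulary_])
        return rows

    def fit_transform(self, texts):
        return self.fit(texts).transform(texts)


train_plots = ["The alien ship landed", "The swordsman drew a sword"]
test_plots = ["An alien fought the swordsman with a laser"]

vectorizer = TinyCountVectorizer(stop_words={"the", "a", "an", "with"})
count_train = vectorizer.fit_transform(train_plots)
count_test = vectorizer.transform(test_plots)
print(list(vectorizer.vocabulary_))
print(count_test)
```

Because the vocabulary is fixed during fitting, the test rows have the same columns as the training rows, and test-only words such as "laser" contribute nothing. Fitting on the test set as well would leak information about it into the features.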

Page 10: Let's practice!

Page 11: Training and testing a classification model with scikit-learn

Katharine Jarmul, Founder, kjamistan

Page 12: Naive Bayes classifier

Naive Bayes Model
  Commonly used for testing NLP classification problems
  Basis in probability
Given a particular piece of data, how likely is a particular outcome?
Examples:
  If the plot has a spaceship, how likely is it to be sci-fi?
  Given a spaceship and an alien, how likely now is it sci-fi?
Each word from CountVectorizer acts as a feature
Naive Bayes: Simple and effective
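The spaceship and alien questions can be worked through with a small multinomial Naive Bayes sketch. The four training plots are hypothetical, and add-one (Laplace) smoothing is an assumption made so that unseen word/class pairs do not get zero probability; this is an illustration of the idea rather than scikit-learn's `MultinomialNB` code.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tokenised training plots with their labels.
training = [
    (["spaceship", "alien", "planet"], "sci-fi"),
    (["alien", "invasion", "spaceship"], "sci-fi"),
    (["sword", "fight", "revenge"], "action"),
    (["car", "chase", "fight"], "action"),
]

class_counts = Counter(label for _, label in training)
word_counts = defaultdict(Counter)
for words, label in training:
    word_counts[label].update(words)
vocabulary = {w for words, _ in training for w in words}


def log_score(words, label):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(training))
    for word in words:
        if word not in vocabulary:
            continue  # ignore words never seen in training
        likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
        score += math.log(likelihood)
    return score


def posterior(words):
    """Normalise the per-class scores into probabilities that sum to 1."""
    scores = {label: math.exp(log_score(words, label)) for label in class_counts}
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


one_word = posterior(["spaceship"])
two_words = posterior(["spaceship", "alien"])
print(round(one_word["sci-fi"], 2))   # 0.75
print(round(two_words["sci-fi"], 2))  # 0.9
```

With this toy data, seeing "spaceship" alone gives a sci-fi probability of 0.75, and adding "alien" raises it to 0.9. The "naive" part is the assumption that words contribute independently once the class is known, which is why the per-word likelihoods are simply multiplied.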

Page 13: Naive Bayes with scikit-learn

In [10]: from sklearn.naive_bayes import MultinomialNB

In [11]: from sklearn import metrics

In [12]: nb_classifier = MultinomialNB()

In [13]: nb_classifier.fit(count_train, y_train)  # Train on the training word counts

In [14]: pred = nb_classifier.predict(count_test)  # Predict labels for the held-out plots

In [15]: metrics.accuracy_score(y_test, pred)
Out[15]: 0.85841849389820424

Page 14: Confusion Matrix

         Action  Sci-Fi
Action   6410    563
Sci-Fi   864     2242

In [16]: metrics.confusion_matrix(y_test, pred, labels=[0, 1])
Out[16]:
array([[6410,  563],
       [ 864, 2242]])
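The accuracy reported earlier can be recovered from these four counts. The sketch below relies on scikit-learn's convention that rows are true labels and columns are predicted labels, and on label 1 meaning Sci-Fi (as in `y = df['Sci-Fi']`). Precision and recall for the Sci-Fi class are added here for illustration; the slides report only accuracy.

```python
# Counts copied from the confusion matrix above.
# Rows are true labels, columns are predicted labels, ordered [0, 1].
confusion = [[6410, 563],
             [864, 2242]]

true_neg, false_pos = confusion[0]   # true label 0 (not Sci-Fi)
false_neg, true_pos = confusion[1]   # true label 1 (Sci-Fi)
total = true_neg + false_pos + false_neg + true_pos

# Fraction of all plots that were labelled correctly.
accuracy = (true_neg + true_pos) / total
# Of the plots predicted Sci-Fi, how many really were Sci-Fi.
precision = true_pos / (true_pos + false_pos)
# Of the real Sci-Fi plots, how many the model found.
recall = true_pos / (true_pos + false_neg)

print(total, accuracy, precision, recall)
```

The diagonal holds the correct predictions, so accuracy is 8652 / 10079, which matches the `accuracy_score` output. The off-diagonal counts show that the model misses more Sci-Fi plots (864) than it wrongly flags (563).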

Page 15: Let's practice!

Page 16: Simple NLP, Complex Problems

Katharine Jarmul, Founder, kjamistan

Page 17: Translation

(source: https://twitter.com/Lupintweets/status/865533182455685121)

Page 18: Sentiment Analysis

(source: https://nlp.stanford.edu/projects/socialsent/)

Page 19: Language Biases

(related talk: https://www.youtube.com/watch?v=j7FwpZB1hWc)

Page 20: Let's practice!

