
Welcome (back) to IST 380 !

Date post: 05-Jan-2016
Upload: jacqui
Transcript
Page 1: Welcome (back) to IST 380 !

Welcome (back) to IST 380 !

Today: the old and the new

the most traditional approach to modeling data

modeling trends from Twitter data

This picture may soon become part of the OLD, if trends continue…

Page 2: Welcome (back) to IST 380 !

Assignments…

Homework #1 is complete! (2/5)

Getting started with R (tutorial + "quiz" + text)

Homework #2 is due tomorrow (2/12)

Pr #1: text, Chapters 6-9

Pr #2: Monty Hall challenge

Pr #3: writing a predictive model by hand…

Homework #3 is due next Tuesday (2/20)

Pr #1: text, Chapter 10

Pr #2: the envelope, please!

Pr #3: linear models for prediction

Things are heating up here!

Make sure you can submit to our submission site!

Zac & Suleng

Page 3: Welcome (back) to IST 380 !

The age of data?

I prefer my data well-aged!

Page 4: Welcome (back) to IST 380 !

R path!

Programming Skills

Subject Expertise

… R's toolset and its capabilities…

data collection

descriptive vs. generative vs. predictive statistics

predictions using linear regression

I predict we'll get here, but not necessarily in a straight line!…

Page 5: Welcome (back) to IST 380 !

Descriptive statistics: Twitter data

packages, library, lapply, order, diff

Chapter 10 introduces access to Twitter data and statistical descriptions using these data

Tweet "diffs" for a certain hashtag…

Page 6: Welcome (back) to IST 380 !

Some R: library

Once you have installed these packages

packages: bitops, RCurl, RJSONIO, twitteR

later: UsingR

You can ensure they're present with

library(bitops)

and so on…

Chapter 10 will have you write a function to automate this process…

Caution! Some of these may have to be installed by hand…

What if I don't have hands?!
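
One way the "automate this process" idea might look is sketched below. The helper name ensure_package is hypothetical (it is not taken from the slides or the text), and the demo uses packages that ship with R so nothing is downloaded.

```r
# Hypothetical helper (the name ensure_package is an assumption):
# install a package only if it is missing, then attach it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs a CRAN mirror and network access
  }
  library(pkg, character.only = TRUE)
}

# Demonstrated on packages that ship with R, so nothing is installed here.
# For the course you would pass the names from the slide, e.g. "bitops".
for (p in c("stats", "utils")) ensure_package(p)
```

The character.only = TRUE argument is what lets library accept the package name as a string held in a variable.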

Page 7: Welcome (back) to IST 380 !

Some R: style…

I have NO COMMENT about this function!

Page 8: Welcome (back) to IST 380 !

Some R: style…

better, but not ideal

Page 9: Welcome (back) to IST 380 !

Some R: style…

use variables to hold intermediate values!
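
The slide's own code did not survive in this transcript, so here is a small stand-in for the style point. The function names and the computation (standard error of the mean) are illustrative assumptions.

```r
# Terse version: correct, but hard to read or debug.
sem_terse <- function(x) sd(x[!is.na(x)]) / sqrt(length(x[!is.na(x)]))

# Same computation with named intermediate values and comments.
sem_clear <- function(x) {
  observed <- x[!is.na(x)]   # drop missing values once
  n <- length(observed)      # number of usable observations
  spread <- sd(observed)     # sample standard deviation
  spread / sqrt(n)           # standard error of the mean
}

scores <- c(8, 9, NA, 10)    # made-up data
sem_terse(scores)
sem_clear(scores)
```

Both functions return the same value; the second is simply easier to check one line at a time.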

Page 10: Welcome (back) to IST 380 !

Some R: lapply and vapply

These allow you to apply a function to every element of a list or a vector:

lapply(X, FUN, ...)

vapply(X, FUN, FUN.VALUE, ...)

> L <- list(8,9,10)
> lapply( L, add1 )
[[1]]
[1] 9

[[2]]
[1] 10

[[3]]
[1] 11

> V <- 8:10
> vapply( V, add1, FUN.VALUE=42 )
[1] 9 10 11
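
The session above can be reproduced with the sketch below. The transcript never defines add1, so the one-line definition here is an assumption.

```r
# add1 is not defined in the transcript; this definition is an assumption.
add1 <- function(x) x + 1

L <- list(8, 9, 10)
as_list <- lapply(L, add1)                     # always returns a list

V <- 8:10
as_vector <- vapply(V, add1, FUN.VALUE = 42)   # returns a numeric vector

# FUN.VALUE is only a template: 42 means "each result is a single number".
# If add1 returned text instead, vapply would stop with an error.
print(as_list)
print(as_vector)
```

That type check is the main reason to reach for vapply over lapply when you expect one value of a known type per element.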

Page 11: Welcome (back) to IST 380 !

UTC?

coordinated universal time

Clock in Bristol, UK

since before the railroads…

red minute hand: Bristol

black minute hand: London (Greenwich)

Page 12: Welcome (back) to IST 380 !

Looking at the data…

Page 13: Welcome (back) to IST 380 !

UTC?

can be plotted as-is

take differences via as.numeric (seconds since 1970-01-01 00:00:00 UTC)

- so that "2013-02-11 20:55:03 UTC" becomes 1360616103
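
A quick check of that conversion, as a sketch. The second timestamp is made up to show a difference in seconds.

```r
# Parse the slide's timestamp explicitly as UTC.
stamp <- as.POSIXct("2013-02-11 20:55:03", tz = "UTC")
as.numeric(stamp)   # 1360616103, seconds since 1970-01-01 00:00:00 UTC

# A made-up timestamp 90 seconds later, to show a difference.
later <- as.POSIXct("2013-02-11 20:56:33", tz = "UTC")
gap_seconds <- as.numeric(later) - as.numeric(stamp)
gap_seconds         # 90
```

Passing tz = "UTC" matters: without it, the string is read in the machine's local time zone and the number shifts.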

Page 15: Welcome (back) to IST 380 !

Some R: order and diff

order returns the permutation of positions that sorts its input…

order(..., na.last = TRUE, decreasing = FALSE)

> V <- c(3,4,2,1)
> V
[1] 3 4 2 1
> order(V)
[1] 4 3 1 2
> V[order(V)]
[1] 1 2 3 4

What do these numbers mean?

Why not just use sort?

You can, but this lets you order anything in the same way!

diff ?
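
A sketch answering both questions on this slide. The labels and arrival times are made up for illustration.

```r
V <- c(3, 4, 2, 1)
idx <- order(V)   # 4 3 1 2: the smallest value sits at position 4, the next at 3, ...
V[idx]            # 1 2 3 4

# Because order gives positions, the same idx can rearrange a parallel vector.
labels <- c("third", "fourth", "second", "first")
labels[idx]       # "first" "second" "third" "fourth"

# diff returns the gaps between consecutive elements.
# Made-up arrival times in seconds, deliberately out of order:
arrivals <- c(50, 10, 35, 20)
gaps <- diff(arrivals[order(arrivals)])
gaps              # 10 15 15
```

This is the pattern behind tweet "diffs": order the timestamps, then take diff to get the time between consecutive tweets.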

Page 18: Welcome (back) to IST 380 !

Comparing tags...

#losangeles, #sanfrancisco

Which is which?

Next week: we will quantify these differences more carefully…

Page 20: Welcome (back) to IST 380 !

Generative statistics

rgeom, runif, rnorm, …, sample, replicate

Chapter 7 reviews repeated sampling and the resulting distribution of means

distribution of samples of state populations

Monte Carlo method: run a process many times to gain insights into it…
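
A minimal sketch of repeated sampling with sample and replicate. It assumes the state.x77 matrix that ships with R as a stand-in for the state populations; the sample size of 16 and the 1000 repetitions are arbitrary choices, not values from the slides.

```r
set.seed(380)  # arbitrary seed so the run is repeatable

# state.x77 ships with R; its Population column is in thousands.
pops <- state.x77[, "Population"]

# One trial: draw 16 states with replacement and average them.
one_mean <- function() mean(sample(pops, size = 16, replace = TRUE))

# Monte Carlo: repeat the trial many times and study the results.
sample_means <- replicate(1000, one_mean())

mean(pops)           # the mean of all 50 states
mean(sample_means)   # should land close to it
# hist(sample_means) shows the distribution of the sample means
```

The individual sample means vary a lot, but their distribution is centered near the mean of all 50 states.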

Page 22: Welcome (back) to IST 380 !

Hw3 pr2: A second Monte Carlo example:

Both envelopes hold some positive amount of money (in a check or IOU), but one of these two envelopes holds twice as much money as the other.

Should you switch or stay?

Switch! But then, should you switch back?

Page 23: Welcome (back) to IST 380 !

Hw3 pr2: A second Monte Carlo example:

Both envelopes hold some positive amount of money (in a check or IOU), but one of these two envelopes holds twice as much money as the other.

Should you switch or stay?

This week ~ write a function to model this process…

Page 24: Welcome (back) to IST 380 !

Hw3 pr2

Write a Mystery Envelope function:

ME_once <- function( amount_found=1.0, sors="switch", verbose=TRUE )

… that runs one envelope trial … and returns the amount of $ "earned"

Another to run it N times:

ME_ntimes <- function( n=100 )

And another to run it N times:

sample_ME <- function( run_me=100 )
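
One possible shape for the first two functions is sketched below. It rests on a loud assumption: whatever amount you found, the other envelope is equally likely to hold half or double. Other ways of filling the envelopes give different answers, and choosing among them is part of the problem. The sors argument on ME_ntimes is an addition to the slide's signature so that both strategies can be compared.

```r
# Assumption: whatever you found, the other envelope is equally likely
# to hold half or double that amount.
ME_once <- function(amount_found = 1.0, sors = "switch", verbose = TRUE) {
  other <- amount_found * sample(c(0.5, 2), size = 1)
  earned <- if (sors == "switch") other else amount_found
  if (verbose) cat("Strategy:", sors, "-> earned", earned, "\n")
  earned
}

# Average earnings over n trials. The sors argument is an addition to the
# slide's signature, used here to compare strategies.
ME_ntimes <- function(n = 100, sors = "switch") {
  mean(replicate(n, ME_once(sors = sors, verbose = FALSE)))
}

set.seed(380)
switch_avg <- ME_ntimes(10000, sors = "switch")
stay_avg <- ME_ntimes(10000, sors = "stay")
switch_avg
stay_avg
```

Under this particular model, switching averages about 1.25 times the amount found, which is exactly what makes "should you switch back?" a paradox.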

Page 26: Welcome (back) to IST 380 !

Big Ideas:

Predictive modeling

Linear regression

The human role… !

Page 28: Welcome (back) to IST 380 !

So, what is Machine Learning?

The goal of machine learning, also known as predictive statistics/analytics, is to find a function that yields outputs for previously-unseen inputs…

passenger details → function → prediction: did the passenger survive?

For Hw2, you are building this function by hand.

Page 29: Welcome (back) to IST 380 !

R is for Regression!

The oldest and (still) most popular technique for automatically generating a model from data.

problem 3 this week…

Page 30: Welcome (back) to IST 380 !

Regression

What is it?

Page 31: Welcome (back) to IST 380 !

Regression ~ predictive modeling

this week: making an assumption of linear dependence on the inputs

Page 32: Welcome (back) to IST 380 !

But why is it called regression?

1877: "reversion" (peas)

1885: "regression" (people)

Page 45: Welcome (back) to IST 380 !

make this sum of squared errors (residuals) as small as possible
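
To make "as small as possible" concrete, here is a sketch on synthetic data (a line plus noise, invented for illustration). It computes the least-squares line in closed form, checks that lm finds the same line, and shows that nudging the line increases the sum of squared errors.

```r
set.seed(380)
# Synthetic data for illustration only: a line plus noise.
x <- 1:20
y <- 3 + 2 * x + rnorm(20)

# Sum of squared errors for a candidate line y = a + b * x.
sse <- function(a, b) sum((y - (a + b * x))^2)

# Closed-form least-squares estimates.
b_hat <- cov(x, y) / var(x)
a_hat <- mean(y) - b_hat * mean(x)

# lm finds the same line.
fit <- lm(y ~ x)
coef(fit)

# Any nudge away from the fitted line increases the SSE.
sse(a_hat, b_hat)
sse(a_hat + 0.5, b_hat)
sse(a_hat, b_hat + 0.05)
```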

Page 47: Welcome (back) to IST 380 !

Let's look at lm1
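
The slide's lm1 and its data are not in this transcript. As a stand-in, the sketch below fits a model to R's built-in cars data (an assumption) and shows a few common ways to look inside a fitted lm object.

```r
# Assumption: lm1 here is fit to R's built-in cars data (stopping distance
# vs. speed), standing in for the model shown on the slide.
lm1 <- lm(dist ~ speed, data = cars)

coef(lm1)                  # intercept and slope of the fitted line
head(residuals(lm1))       # observed minus fitted values
summary(lm1)$r.squared     # share of variance the line explains
sum(residuals(lm1)^2)      # the sum of squared errors that lm minimized
```

Calling summary(lm1) on its own prints all of this in one report, along with standard errors for each coefficient.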

Page 58: Welcome (back) to IST 380 !

pr3 this week: temperatures…

Page 59: Welcome (back) to IST 380 !

Temperature anomalies

Page 60: Welcome (back) to IST 380 !

The data…

deviations from the 1950-1980 global average of 14°C ~ 57.2°F

averaged (worldwide) and presented in units of 0.01°C

Page 61: Welcome (back) to IST 380 !

Your task…

• follow an analysis plan similar to the Galton data in the previous slides

• fit a linear model to the yearly average data and to each month's average data

• use your model to predict what the average temperature will be for 2012 and 2013

• is the linear model a reasonable one?

• we'll check (or you can…) the prediction for 2012 (but not 2013, yet)
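
The fit-and-predict steps above might be sketched as follows. The data here are simulated placeholders (the trend and noise level are invented), and the column names year and anomaly are assumptions; substitute the real temperature file for pr3.

```r
set.seed(380)
# Simulated stand-in for the yearly data, in units of 0.01 degrees C.
# The trend and noise level are invented; use the real file for pr3.
temps <- data.frame(year = 1950:2011)
temps$anomaly <- 0.9 * (temps$year - 1950) + rnorm(nrow(temps), sd = 10)

fit <- lm(anomaly ~ year, data = temps)

# Predict the anomaly for the two target years.
new_years <- data.frame(year = c(2012, 2013))
predicted_anomaly <- predict(fit, newdata = new_years)

# Convert 0.01-degree deviations back to degrees C using the 14 C baseline.
predicted_celsius <- 14 + predicted_anomaly / 100
predicted_celsius

# A residual plot helps judge whether a straight line is reasonable:
# plot(temps$year, residuals(fit))
```

The newdata data frame must use the same column name as the model formula (year here), or predict will not find the input.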

Page 62: Welcome (back) to IST 380 !

Try it!

Help is available either with hw#2 (Monty Hall and Titanic using R's functions)

or hw#3 (Twitter, envelopes, and temperatures)

this evening during lab time…

Good luck with everything this week!

Page 63: Welcome (back) to IST 380 !

Lab !

Page 64: Welcome (back) to IST 380 !

The Titanic

April 15, 1912

1502 out of the 2224 passengers died in the sinking

What characteristics did the survivors share?

Page 65: Welcome (back) to IST 380 !

The Data

There are 742 rows and 11 columns in the training data.

here are the 11 columns

Page 66: Welcome (back) to IST 380 !

Our goal

… is to write a function that takes in a row of new data and outputs whether that passenger would survive (1) or not (0).

Page 67: Welcome (back) to IST 380 !

A first predictor

Page 68: Welcome (back) to IST 380 !

A second predictor

Does the data match the famous emergency cry?
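
A hand-written predictor in the spirit of "women and children first" could look like the sketch below. The column names sex and age and the age cutoff of 16 are assumptions, because the 11 columns are not listed in this transcript; the three passengers are made up.

```r
# Hypothetical column names (sex, age): the slide's 11 columns are not in the
# transcript, so adjust these to match the real training data.
predict_survival <- function(row) {
  is_woman <- row$sex == "female"
  is_child <- !is.na(row$age) && row$age < 16   # age cutoff is an assumption
  if (is_woman || is_child) 1 else 0
}

# Three made-up passengers to exercise the function.
passengers <- data.frame(
  sex = c("female", "male", "male"),
  age = c(30, 8, NA),
  stringsAsFactors = FALSE
)
predictions <- vapply(
  seq_len(nrow(passengers)),
  function(i) predict_survival(passengers[i, ]),
  FUN.VALUE = numeric(1)
)
predictions   # 1 1 0
```

Comparing such predictions against the known outcomes in the training data gives the fraction the rule gets right, which is one way to test the function.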

Page 69: Welcome (back) to IST 380 !

Testing our functions…

Page 74: Welcome (back) to IST 380 !

CS vs. IS and IT ?

www.acm.org/education/curric_vols/CC2005_Final_Report2.pdf

greater integration system-wide issues

smaller details machine specifics

Page 75: Welcome (back) to IST 380 !

CS vs. IS and IT ?

Where will IS go?

Page 76: Welcome (back) to IST 380 !

CS vs. IS and IT ?

Page 77: Welcome (back) to IST 380 !

IT ?

Where will IT go?

Page 78: Welcome (back) to IST 380 !

IT ?

Page 79: Welcome (back) to IST 380 !

The bigger picture

Weeks 10-12: Objects

Week 10: classes vs. objects

Week 11: methods and data

Week 12: inheritance

Weeks 13-15: Final Projects

Week 13: final projects

Week 14: final projects

Week 15: final exam

Page 80: Welcome (back) to IST 380 !

Data?!

• Neighbor's name

• A place they consider home

• Are they working at a company now? Where?

• How many U.S. states have they visited?

• Their favorite unhealthy food… ?

• Do they have any "Data Science" (statistics, machine learning, CS) background?

Page 81: Welcome (back) to IST 380 !

state reminders…

Page 83: Welcome (back) to IST 380 !

Data!

• Neighbor's name: Zachary Dodds

• A place they consider home: Pittsburgh, PA

• Are they working at a company now? Where? Harvey Mudd

• How many U.S. states have they visited? 44

• Their favorite unhealthy food… ? M&Ms

• Do they have any "Data Science" (statistics, machine learning, CS) background? mostly CS for me…

be sure to set up your login + profile for the submission site…

This class is truly seminar-style:

we're developing expertise in this field together.

