Date post: | 19-Jan-2015 |
Category: |
Economy & Finance |
Author: | cs-ncstate |
Data Mining, Truth, Justice, the American Way, and the Flying Spaghetti Monster
[email protected] Ph.D. LCSEE, WVU, 20 Sept 2007
2
Expose, and hose
• "Part of education is to expose people to different schools of thought."
- President George Bush, August 1, 2005
• "Part of science is to expose people to the critical and continual (re)evaluation of ideas."
- Some guy called Timm, September 20, 2007
3
"Look up in the sky! It's a bird! It's a plane! It's Superman!"
"Yes, it's Superman, strange visitor from another planet who came to Earth with powers and abilities far beyond those of mortal men."
"Superman, who can change the course of mighty rivers, bend steel in his bare hands; and who, disguised as Clark Kent, mild-mannered reporter for a great metropolitan newspaper, fights a never-ending battle for truth, justice, and the American way."
Why a never-ending battle?
How to ensure justice?
How to find truth?
How to make lottsa $$?
4
So, tonight
• Notions of certainty
• Standards for debate
• Surprises
• Nothing is "truth" but many more things are false
• And some things are useful
• Implications for humility
• And for justice
5
God gave me a brain. I take it (s)he wants me to use it.
Mark of the rational:
while not dead; do review and revise assumptions; done
• Entertain a wide range of ideas, but don't necessarily accept them
• Demand evidence that lets you repeat / refute / improve prior conclusions
But what of faith?
• That is another talk
• There is room for the divine in my universe
• But in my test tubes? Not too much
6
Data miners: agents that automate the creation and review of new ideas
@relation [email protected] outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no

outlook = sunny
|  humidity = high: no
|  humidity = normal: yes
outlook = overcast: yes
outlook = rainy
|  windy = TRUE: no
|  windy = FALSE: yes

Mountains of data
Tablespoons of knowledge
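As a quick check on the "tablespoon", the tree above can be hand-coded and replayed over the 14 rows it was learned from. This is only a sketch: the function names are made up for illustration, and the score is accuracy on the training data, which a later slide warns is not enough.

```python
# Weather rows copied from the slide: outlook, temperature, humidity, windy, play.
ROWS = """sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()


def predict(outlook, humidity, windy):
    """Hand-coded copy of the decision tree shown on the slide."""
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    # Remaining case: outlook == "rainy"
    return "no" if windy == "TRUE" else "yes"


def training_accuracy(rows):
    """Fraction of rows whose 'play' value the tree reproduces."""
    hits = 0
    for line in rows:
        outlook, _temperature, humidity, windy, play = line.split(",")
        hits += predict(outlook, humidity, windy) == play
    return hits / len(rows)


print(training_accuracy(ROWS))
```

Note that the tree never consults temperature: fourteen rows and four attributes shrink to five short rules over three attributes.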
7
Data doubling every 20 months
• Internet, Radio Frequency Identification (RFID) tracking, on-line shopping (patterns of sales tracked at Amazon)
So now we can automatically learn answers to many questions; e.g.
• What eggs to select for IVF?
• What will software cost to develop?
• What diseases does a patient have?
• Which loan applications to fund?
• What houses will have the best resale value?
• Which parts of the program need more inspection?
• What products are best to sell to what markets?
• What cows to keep and which to send to the abattoir?
• How to teach a satellite to distinguish between cloud shadows and oil spills?
• How much electricity will be needed in two hours, i.e. what coal-powered generators to fire up?
8
More fundamentally, what can we say about the world, with any certainty?
Same data, different data miners, different conclusions
Every miner is biased by:
• Evaluation bias
• Language: what is the "shape" of the models we can learn? Decision trees, equations, etc.
• Search: pruning the possibly infinite space of candidate models; what not to explore
• Over-fitting avoidance: how to stop the learner fixating on noise, e.g. pruning back decision trees
9
• Bias lets us ignore "stuff".
• Without it, we don't know what is important or dull, we can't summarize, generalize.
• Without bias, we can't learn from the past
• Bias blinds us but lets us see the future
• But changing biases changes what we best believe
• No wonder truth is a never-ending battle
Any learning scheme has many biases
10
Generalizing from the past works
• Sometimes, very clearly: heavy smokers have a 2000% to 3000% higher chance of lung cancer
• Learned theories perform very well on new data
But ... the "best" learned theory can be a moveable feast.
11
So, a relativistic soup?
No certainty? No way to plan effective actions? No way to rule out absurd notions?
12
I don't want to offend anyone, but…
… I think that once …
• there were no cell phones or iPods, or clothes, or countries, or language, or human society, or 4-valved hearts, or homeostasis, or organs, or brains, or planets, or stars, or matter
Where the net energy in-flow is positive…
• the universe selects for self-perpetuating systems, an exponentially decreasing number of which are of exponentially increasing complexity
Should I even say this in a public place?
• "Part of education is to expose people to different schools of thought." President George Bush, August 1, 2005
Shouldn't I have to give credence to all theories?
• Evolution, Intelligent design
• Pirates cause global warming?
13
The Church of the Flying Spaghetti Monster (FSM)
• Founded in 2005 by OSU physics graduate Bobby Henderson
• A protest against the decision by the Kansas State Board of Education to require the teaching of intelligent design as an alternative to biological evolution.
• Henderson wrote to the board professing belief in a supernatural Creator called the Flying Spaghetti Monster
• He demanded that his "Pastafarian" theory of creation be taught in science classrooms.
14
FSM is not about religion
• It is a mistake to view FSM as anti-religion
• Rather, FSM is anti-anti-scientific rigor
No one in their right mind would ever believe this nonsense
• And that's the point
Truth is a never-ending battle
• We must have standards to assess scientific theories, to reject absurdities
• Or any nonsense can be released on this world
• E.g. "Global warming is caused by pirates."
15
Wikipedia on FSM
• FSM: an invisible, undetectable Flying Spaghetti Monster
• Evidence for evolution was planted by FSM to test Pastafarians' faith
• FSM changes the results of measurements, like radiocarbon dating, via His Noodly Appendage.
• Heaven contains beer volcanoes and a stripper factory.
• Hell is similar, but with stale beer and diseased strippers.
• Pirates are "absolute divine beings" and the original Pastafarians.
• Their image as "thieves and outcasts" is misinformation spread by Christian theologians in the Middle Ages and Hare Krishnas.
• Pirates are "peace-loving explorers and spreaders of good will" who distributed candy to small children.
• Global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s.
16
FSM "proof" of the divinity of pirates
• X-axis deliberately misleading.
• A case study on how not to present data
Crazy? Yes!
• But would you recognize such craziness if you saw it again?
17
What is the "best" weight-loss diet?
How lucky for those in power that people don't think.
- Adolf Hitler
i.e. people trying to sell you their diet book
19
What is the "best" programming language?
20
To our peril, we trust old ideas too much
Columbia ice strike:
• Size: 1200 in³
• Speed: 477 mph (relative to vehicle)
Certified as "safe" by the CRATER micro-meteorite model
A typical experiment in CRATER's test database:
• Size: 3 in³ piece of debris
• Speed: under 150 mph
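One way to make the gap concrete is to ask how far the query sits outside the range the model was tested on. The sketch below uses only the four figures quoted on the slide; the dictionary keys and function names are illustrative, and it says nothing about how CRATER itself works.

```python
# Figures quoted on the slide; treated here as the upper edge of the tested range.
TESTED = {"size_in3": 3.0, "speed_mph": 150.0}
STRIKE = {"size_in3": 1200.0, "speed_mph": 477.0}


def extrapolation_factors(tested, query):
    """How many times larger the query is than the tested value, per input."""
    return {name: query[name] / tested[name] for name in tested}


def outside_tested_range(tested, query):
    """Names of the inputs on which the model is being asked to extrapolate."""
    return [name for name in tested if query[name] > tested[name]]


print(extrapolation_factors(TESTED, STRIKE))
print(outside_tested_range(TESTED, STRIKE))
```

A check like this cannot say whether the model's answer is wrong, only that nothing in its test history supports the answer.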
21
Value of estrogen (NYT magazine, Sept 16, 2007)
• 1990s: American Heart Association recommends hormone replacement therapy for older women to ward off heart disease and osteoporosis.
• 2001: 15 million Americans filling H.R.T. prescriptions annually
• 2002: estrogen therapy exposed as a hazard, not a benefit, for health
Failure of scientific method
• Benefits of estrogen reported from large observational studies, not randomized trials
• Repeated epidemiological finding: randomized trials rarely support conclusions from observational studies.
So forget what you've read about
• Anti-oxidants like vitamins E & C & beta carotene preventing heart disease
• Fiber preventing colon cancer
22
So, why is FSM silly?
• And please, rest assured, it is very very silly stuff indeed.
Theories need an entrance exam
• Many possible theories, one for each bias
• Demand that a theory has passed at least some operational test before we condone it, act on it.
• If no reason to accept the new, don't
• Trust the most what has been challenged the most (Karl Popper)
23
No things are "right", but some things are "useful"
• Sure, one data set supports many theories.
• But there are many many more theories that are unsupported.
• No model is right, but some things are useful (perform well on test data) (George Box)
• And many many many more ideas are useless: can't make predictions; not defined enough to support (possible) refutation
24
Wolfgang Pauli
• The "conscience of physics", the critic to whom his colleagues were accountable.
• Scathing in his dismissal of poor theories, often labeling them ganz falsch, utterly false.
• But "ganz falsch" was not his most severe criticism.
• He hated theories so unclearly presented as to be untestable, unevaluatable.
• Worse than wrong because they could not be proven wrong.
• Not properly belonging within the realm of science, even though posing as such.
• Famously, he wrote of one such unclear paper: "This paper is not right. It is not even wrong."
Believe those who seek the truth; doubt those who find it
- Andre Gide.
26
Don't test once on just the training data
• Study more than the average performance
• Also look at the variance
• E.g. here, no significant change on new data after X=8
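The advice above can be sketched as a k-fold cross-validation that reports the spread of the scores as well as their mean. Everything named here is an assumption made for illustration: the majority-class learner is a deliberately dumb stand-in for a real data miner, the fold count and seed are arbitrary, and the rows are the weather data from the earlier slide.

```python
import random
import statistics

# Weather rows from the earlier slide: outlook, temperature, humidity, windy, play.
ROWS = [tuple(line.split(",")) for line in """sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]


def majority_learner(train):
    """Toy learner (hypothetical): always predict the most common class."""
    labels = [row[-1] for row in train]
    best = max(sorted(set(labels)), key=labels.count)
    return lambda row: best


def cross_val_scores(rows, k, learn, seed=1):
    """Accuracy on each of k held-out folds; every row is tested exactly once."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the one being held out.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = learn(train)
        scores.append(sum(model(r) == r[-1] for r in test) / len(test))
    return scores


scores = cross_val_scores(ROWS, k=7, learn=majority_learner)
print("mean accuracy:", statistics.mean(scores))
print("variance     :", statistics.pvariance(scores))
```

Two learners with the same mean can differ widely in variance; reporting both is what lets a reader judge whether a difference between them is worth believing.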
27
If something works, poke it till it breaks
i) Sort attributes on "infogain"
ii) Learn using first N attributes
[Figure: results for the diabetes, labor, soybean, and anneal data sets]
A few variables are (often) enough
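Steps (i) and (ii) can be sketched on the small weather data set from the earlier slide (not on the four data sets named above, which are not reproduced here). The sketch assumes the standard entropy-based definition of information gain for discrete attributes; the function names are illustrative.

```python
from collections import Counter
from math import log2

NAMES = ["outlook", "temperature", "humidity", "windy"]
ROWS = [line.split(",") for line in """sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    """Drop in class entropy after splitting the rows on column `col`."""
    before = entropy([r[-1] for r in rows])
    after = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def rank_attributes(rows, names):
    """Step (i): attribute names sorted from most to least informative."""
    return sorted(names, key=lambda n: info_gain(rows, names.index(n)), reverse=True)


def first_n(rows, names, n):
    """Step (ii): keep only the n best-ranked columns, plus the class column."""
    keep = [names.index(a) for a in rank_attributes(rows, names)[:n]]
    return [[r[i] for i in keep] + [r[-1]] for r in rows]


print(rank_attributes(ROWS, NAMES))
print(first_n(ROWS, NAMES, 2)[0])
```

Feeding `first_n(ROWS, NAMES, n)` to a learner for n = 1, 2, 3, ... and watching where performance stops improving is the "poke it till it breaks" experiment.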
28
Living with Uncertainty
Check how training set size affects theory
29
Living with Uncertainty Launch learners with anomaly
detection and repair tools
30
Living with uncertainty: count, alert, fix
• Count: stuff seen in past
• Alert: if new counts different
• Fix: find delta new to old
Very, very fast
• An incremental discretizer + a Bayes classifier where all inputs are all mono-classified
• Track average max likelihood for data processing in "era"s of X instances
• Contrast set learning
• Linear time inference, tiny memory footprint
And, it works [Orrego, 2004]
• F15 simulator data [courtesy B. Cukic]
• Five flights: a,b,c,d,e each with different off-nominal condition imposed at "time" 15
• Off-nominal condition not present in prior data
• In all cases, massive change detected
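The "count" and "alert" steps can be sketched as follows. This is not the implementation from [Orrego, 2004]; it is a simplified stand-in that assumes a fixed equal-width discretizer over a known range (rather than an incremental one), scores each era by its average log-likelihood under the counts gathered from earlier eras, and raises an alert when that score drops sharply. The synthetic stream, era size, and threshold are all invented for the demonstration, and the "fix" step (contrast set learning) is not shown.

```python
import random
from collections import Counter
from math import log


def bin_of(x, lo=0.0, hi=1.0, bins=10):
    """Equal-width discretizer over an assumed, fixed range [lo, hi)."""
    i = int((x - lo) / (hi - lo) * bins)
    return min(max(i, 0), bins - 1)


def era_scores(rows, era=50, bins=10):
    """Yield (era index, average log-likelihood) for each era, scored against
    counts from all earlier eras. The first era only seeds the counts."""
    counts, seen = None, 0
    for start in range(0, len(rows), era):
        chunk = [[bin_of(x, bins=bins) for x in row] for row in rows[start:start + era]]
        if seen:
            total = 0.0
            for row in chunk:
                # Single-class naive Bayes with add-one smoothing.
                total += sum(log((counts[i][b] + 1) / (seen + bins))
                             for i, b in enumerate(row))
            yield start // era, total / len(chunk)
        if counts is None:
            counts = [Counter() for _ in chunk[0]]
        for row in chunk:  # Count: fold this era into what has been seen.
            for i, b in enumerate(row):
                counts[i][b] += 1
        seen += len(chunk)


def alerts(scores, drop=1.0):
    """Alert: eras whose score falls more than `drop` below the previous era's."""
    flagged, prev = [], None
    for era, score in scores:
        if prev is not None and prev - score > drop:
            flagged.append(era)
        prev = score
    return flagged


# Synthetic stream: 200 nominal rows, then 50 rows from a region never seen before.
rng = random.Random(0)
stream = [[rng.uniform(0.2, 0.4), rng.uniform(0.2, 0.4)] for _ in range(200)]
stream += [[rng.uniform(0.7, 0.9), rng.uniform(0.7, 0.9)] for _ in range(50)]
print(alerts(era_scores(stream)))
```

Each row touches one counter per attribute, so inference is linear in the data and memory is bounded by attributes times bins, in line with the footprint claimed on the slide.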
31
Living with uncertainty
Life is a balance between
Policy #1: exploration
• Tolerate the sub-optimal, a little
• Doing crazy things to learn new things
Policy #2: exploitation
• Fix your theories and base your work on those fixed ideas.
Human young:
• Do crazy things (take long trips)
• Less craziness as we grow older
Kuhn:
• most "science" is puzzle solving…
• … within existing paradigms.
• Sometimes the paradigm breaks down….
• …prompting revolutionary research
32
Tolerance of "exploration"
• Critical to the American way
• America: history of tolerance and acceptance
1945:
• 400 German rocket scientists choose to surrender to the Yankees, not the Russians
• They chose their post-war life based on their perceptions of American ideology
Hence,
33
Tolerance = hi-tech = $$$
R. Florida: The Economic Geography of Talent, Annals of the Association of American Geographers 92(4), 2002, pp. 743-755
Best predictor for hi-tech industry
• R² 0.42 to "coolness"
• R² 0.49 to cultural amenities
• R² 0.50 to median house value
• R² 0.77 to "diversity" index
34
Data Mining, Truth, Justice, the American Way & Flying Spaghetti Monsters
"Superman, fights a never-ending battle for truth, justice, and the American way."
• Old conclusions must be constantly re-assessed
• No "truth", all is biased.
• A healthy hi-tech needs tolerance to support exploration
• And the FSM is silly, but I would consider revising that view if new evidence emerges
• To make $$, institutionalize exploration and tolerance
35
Expose, and hose
• "Part of education is to expose people to different schools of thought."
- President George Bush, August 1, 2005
• "Part of science is to expose people to the critical and continual (re)evaluation of ideas."
- Some guy called Timm, September 20, 2007