Date post: | 28-Dec-2015 |
Upload: | ami-stokes |
What is data mining?
• Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.
• In my lab, we tend to look at data and problems that no one else looks at.
Data Mining People
• Eamonn Keogh
• Vagelis Hristidis
• Vassilis Tsotras
• Chinya Ravishankar
• Michael Pazzani
• Christian Shelton (AI)
• Stefano Lonardi (Bioinformatics)
My PhD Students
• Jessica Lin (Ph.D. 2005, George Mason University)
• Chotirat (Ann) Ratanamahatana (Ph.D. 2005, Chulalongkorn University)
• Li Wei (Ph.D. 2006, Google)
• Xiaopeng Xi (Ph.D. 2007, Yahoo)
• Dragomir Yankov (Ph.D. 2008, Yahoo)
• Lexiang Ye (Ph.D. 2010, Google)
• Xiaoyue (Elaine) Wang (Ph.D. 2010, Nokia)
• Jin-Wien Shieh (Ph.D. 2010, Microsoft)
• Qiang Zhu (Ph.D. 2011, stumbleupon.com)
• Abdullah Mueen (Ph.D. 2012, Microsoft)
• Bilson Campana (Ph.D., going to Google at Xmas)
• Thanawin (Art) Rakthanmanon (Ph.D. ongoing)
• Bing Hu (Ph.D. ongoing)
• Yuan Hao (Ph.D. ongoing)
• Jesin Zakaria (Ph.D. ongoing)
• Yipeng Chen (Ph.D. ongoing)
[Figure: Leaf Decision Tree. Leaf outlines labeled "stinging nettles" and "false nettles", with the shapelet highlighted. The Shapelet Dictionary lists one shapelet (I, value 5.1); the tree has a single node (I) with yes/no branches to classes 0 and 1.]
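[Figure: Decision Tree for Arrowheads. The Shapelet Dictionary lists two shapelets, I (Clovis) and II (Avonlea), with values 11.24 and 85.47, plotted on axes running from 0 to 400 and 0 to 1.5. The Arrowhead Decision Tree has nodes I and II and leaves 0, 1 and 2. Training data (subset): Clovis, Avonlea and Mix.]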
Of course, this is a decision tree, and we eventually want to do clustering. However, in general, features that are good for classification are also good for clustering.
To do: On a small labeled subset of data, learn a dictionary of shapelets. Code the large unlabeled dataset with reference to that dictionary.
The shapelet decision tree classifier achieves an accuracy of 80.0%, whereas the rotation-invariant one-nearest-neighbor classifier achieves 68.0%.
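The coding step in the to-do item above can be sketched as follows. This is a minimal illustration, not the method used in the lab: it assumes the dictionary has already been learned, represents each series by its smallest plain Euclidean distance to each shapelet, and skips any normalization of the subsequences. The function names `min_subsequence_distance` and `encode` are hypothetical.

```python
import numpy as np


def min_subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equal-length contiguous subsequence of the series."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    # Every contiguous window of length m, one per row.
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    distances = np.sqrt(((windows - shapelet) ** 2).sum(axis=1))
    return float(distances.min())


def encode(dataset, dictionary):
    """Code each series as a vector of distances to the dictionary shapelets.

    Returns an array of shape (number of series, number of shapelets).
    """
    return np.array(
        [[min_subsequence_distance(s, sh) for sh in dictionary] for s in dataset]
    )


if __name__ == "__main__":
    # Toy data, purely illustrative (assumption: not from the slides).
    dictionary = [[1, 2, 1], [3, 3]]
    dataset = [[0, 0, 1, 2, 1, 0], [0, 0, 0, 0]]
    print(encode(dataset, dictionary))
```

Each row of the result is a fixed-length feature vector, so the coded dataset can be handed to any standard clustering routine regardless of the original series lengths.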
There now exist perhaps tens of millions of digitized pages of historical manuscripts, dating back to the 12th century, that feature one or more heraldic shields.
The images are often stained, faded or torn.
Wouldn’t it be great if we could automatically hyperlink all similar shields to each other?
For example, here we could link two occurrences of the Von Sax family shield.
To do this, we need to consider shape, color and texture. Let's just consider shape for now…
Manesse Codex: an illuminated manuscript in codex form, copied and illustrated between 1304 and 1340 in Zurich.
Indexing and Mining Rock Art
Rock art is found on every continent except Antarctica.
To date, computer science has had little impact on analysis of rock art.
A decade ago, Walt et al. summed up the state of petroglyph research by noting, “Complete-site and cross-site research thus remains impossible, incomplete, or impressionistic”
Australia may have 100 million examples
[Figure: example petroglyph motifs: atlatls, anthropomorphs and bighorn sheep.]
One challenge is designing distance measures.
For example, we would like to find [image] and [image] similar, even though one is solid and one is hollow.*
* Zhu, Wang, Keogh, Lee (2009). Augmenting the Generalized Hough Transform to Enable the Mining of Petroglyphs. SIGKDD 2009.
If we assume that we have high-quality binary images of rock art, then we can do clustering, classification, indexing and motif discovery.
Apple maggots cause two types of injury: dimpling and tunneling. Dimpling occurs around the site where eggs are
laid, causing the flesh to stop growing, resulting in a sunken, misshapen, dimpled area. Tunneling, done by the
larvae (maggots) eating in the fruit, causes the pulp to break down, discolor, and start to rot. The tunnels are often enlarged by bacterial decay. Damaged fruit eventually
becomes soft and rotten and cannot be used.
Apple Maggot
Rhagoletis pomonella
Carbaryl is an insecticide that is widely used agriculturally. Effective, but likely a human carcinogen, and it kills honey bees and other pollinators [1].
[1] http://npic.orst.edu/factsheets/carbgen.pdf
[2] http://www.maine.gov/agriculture/pesticides/gotpests/bugs/factsheets/apple-maggot-cornell.pdf
One Example Crop/Insect
Why Insects Matter I
Because they eat/destroy $40 billion+ worth of food each year.
Surround WP: a crop protectant against insects. Derived from kaolin clay, a natural mineral, it forms a barrier that acts to control insect pests.
Effective & safe, but very expensive
[Figure: one second of audio from our sensor. The x-axis runs from 0 to 4.5 × 10^4 and the y-axis from -0.2 to 0.2. Annotations: background noise; bee begins to cross laser; bee has passed through the laser.]
One second of audio from our sensor. The Common Eastern Bumble Bee (Bombus impatiens) takes about one tenth of a second to pass the laser.
Our Sensor
[Figure: frequency spectra, Frequency (Hz) on the x-axis, for Bombus impatiens, Culex quinquefasciatus and Aedes aegypti.]
[Figure: spectrum of a recording, Frequency (Hz) from 0 to 1000.]
Peak at 705 Hz: almost certainly an Aedes aegypti.
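Reading a peak such as the 705 Hz one off a recording can be sketched as below. This is only an illustration under stated assumptions, not the sensor's actual processing: the function name `peak_frequency`, the 44,100 Hz sample rate and the 100 to 1000 Hz search band are all hypothetical choices, and the demo signal is a synthetic tone rather than sensor data.

```python
import numpy as np


def peak_frequency(signal, sample_rate, f_min=100.0, f_max=1000.0):
    """Return the frequency (Hz) with the largest spectral magnitude
    inside the band [f_min, f_max]."""
    signal = np.asarray(signal, dtype=float)
    # Remove the DC offset so it does not dominate the spectrum.
    signal = signal - signal.mean()
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    band = (freqs >= f_min) & (freqs <= f_max)
    if not band.any():
        raise ValueError("no frequency bins fall inside the requested band")
    return float(freqs[band][np.argmax(magnitude[band])])


if __name__ == "__main__":
    # Synthetic stand-in for one second of audio (assumed sample rate).
    sample_rate = 44100
    t = np.arange(sample_rate) / sample_rate
    rng = np.random.default_rng(0)
    tone = 0.1 * np.sin(2 * np.pi * 705.0 * t)
    noisy = tone + 0.01 * rng.standard_normal(t.size)
    print(peak_frequency(noisy, sample_rate))
```

The frequency resolution is the sample rate divided by the number of samples, so a short crossing of about a tenth of a second resolves the peak only to roughly 10 Hz.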
Eamonn Keogh Computer Science &
Engineering Department
University of California – Riverside