Wine Informatics

Post on 11-Jan-2016

Wine Informatics

Dr. Bernard Chen, Ph.D., University of Central Arkansas

Data science

Data science is the field of study that combines techniques and theories from distinct fields, such as data mining, scientific methods, math and statistics, visualization, natural language processing, and domain knowledge, to discover useful information from domain-related data.

Domain Knowledge in Wine

The quality of a wine is usually assured by wine certification, which is generally assessed through physicochemical and sensory tests.

Existing data mining research focuses on the physicochemical laboratory tests much more than on the sensory tests.

Domain Knowledge in Wine

It is very interesting to mine useful information from those sensory testing notes to answer questions such as:

“What makes a wine a 90+ wine?”
“What common characteristics are shared by 90+ Napa Cabernet Sauvignons?”
“Which groups of wines share similarities?”
“What characteristics distinguish wines from France from wines from Italy?”

Domain Knowledge in Wine

The success of data science research on wine sensory data relies on consistent reviews from prestigious experts.

Several popular wine magazines provide widely accepted sensory reviews of the wines produced every year, such as Wine Spectator [13], Wine Advocate [14], and Decanter [15].

Wine Spectator Review Example

Kosta Browne Pinot Noir Sonoma Coast 2009

Ripe and deeply flavored, concentrated and well-structured, this full-bodied red offers a complex mix of black cherry, wild berry and raspberry fruit that's pure and persistent, ending with a pebbly note and firm tannins. Drink now through 2018. 5,818 cases made.

Wine Spectator

Our first dataset is compiled from the list of “Top 100 Wines of 2011” [16] by Wine Spectator, a lifestyle magazine that focuses on wine and wine culture.

Their reviews are straight and to the point.

Ann C. Noble’s Wine Aroma Wheel

Our own wine wheel

Based on the “Top 100 Wines of 2011”, we analyzed all one hundred wine reviews and added all necessary categories and subcategories, arriving at a total of 547 distinct attributes.

When looking at our finished list, we noticed many cases where groups of attributes were really just variations of the same thing.

An example would be the following three attributes: FRESHLY-CUT APPLE, RIPE APPLE, and APPLE.
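
One way to collapse such variants is a lookup table that maps each raw attribute to a canonical one. The sketch below is only an illustration: the three APPLE attributes come from the slide, while the `CANONICAL` table, the `normalize_attributes` name, and the sample input are assumptions, not the authors' implementation.

```python
# Hypothetical lookup table: raw review attribute -> canonical attribute.
# Only the three APPLE variants are taken from the slide.
CANONICAL = {
    "FRESHLY-CUT APPLE": "APPLE",
    "RIPE APPLE": "APPLE",
    "APPLE": "APPLE",
}


def normalize_attributes(raw_attributes):
    """Map each raw attribute to its canonical form, dropping duplicates.

    Attributes missing from the table are kept as-is (upper-cased).
    The order of first appearance is preserved.
    """
    normalized = []
    for attr in raw_attributes:
        canonical = CANONICAL.get(attr.upper(), attr.upper())
        if canonical not in normalized:
            normalized.append(canonical)
    return normalized


# Made-up input: two apple variants collapse into a single APPLE attribute.
print(normalize_attributes(["Ripe Apple", "Freshly-Cut Apple", "Black Cherry"]))
```

With this table, the two apple variants in the sample input reduce to one attribute, so a wine is not counted as differing from another merely because the reviewer chose a different phrasing.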

Hierarchical Clustering

[Figure: Dendrogram and Venn Diagram of Clustered Data]

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
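
As a rough illustration of bottom-up hierarchical clustering, the sketch below repeatedly merges the two closest clusters until a target number remains. The slides do not state which linkage or distance was used, so average linkage and a count of differing attributes are assumptions here, and the four wine vectors are made up.

```python
def differing_attributes(a, b):
    """Number of positions where two binary attribute vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def average_linkage(c1, c2, vectors, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(vectors, k, dist=differing_attributes):
    """Merge the closest pair of clusters until only k clusters remain.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors, dist)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical binary attribute vectors for four wines.
wines = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
print(agglomerate(wines, 2))
```

Recording the order of the merges, rather than stopping at `k` clusters, is what a dendrogram visualizes.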

Distance Measure

Distance Measure Example

WINE   CHERRY  CHEWY TANNINS  BEAUTY
WINE1  1       1              1
WINE2  0       0              1
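
The transcript does not preserve the distance formula itself, so the sketch below applies two common choices for binary attribute vectors to the two example wines: the Jaccard distance and the Hamming distance. Both are assumptions for illustration and not necessarily the measure used in the study.

```python
def jaccard_distance(a, b):
    """1 - (attributes present in both) / (attributes present in either)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two wines with no attributes at all are treated as identical.
    return 0.0 if either == 0 else 1 - both / either


def hamming_distance(a, b):
    """Number of attributes on which the two wines disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Columns: CHERRY, CHEWY TANNINS, BEAUTY (from the example table).
wine1 = [1, 1, 1]
wine2 = [0, 0, 1]

print(jaccard_distance(wine1, wine2))  # 1 - 1/3: one shared attribute out of three
print(hamming_distance(wine1, wine2))  # 2: the wines disagree on two attributes
```

Jaccard ignores attributes that are absent from both wines, which matters when each wine carries only a handful of several hundred possible attributes.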

Clustering Results

[Figure: clustering output with clusters numbered 1 to 11]

Ref#  Vintage  Type  Varietal
1     2008     RED   MERLOT (.53) - CABERNET FRANC (.29) - CABERNET SAUVIGNON (.13) - MALBEC (.04) - PETIT VERDOT (.01)
2     2008     RED   CABERNET SAUVIGNON
3     2009     RED   PINOT NOIR
4     2007     RED   CABERNET SAUVIGNON
5     2007     RED   SANGIOVESE (.90) - CANAIOLO/COLORINO (.10)
6     2004     RED   TEMPRANILLO

Ref#  World  Country        Region           Alcohol  Price  Drink Begin  Drink End
1     NEW    United States  Washington       -        $35    NOW          2020
2     NEW    United States  Washington       14.5%    $37    NOW          2018
3     NEW    United States  California       13.9%    $45    NOW          2019
4     NEW    United States  Washington       14.6%    $32    NOW          2019
5     OLD    Italy          Tuscany          14%      $22    NOW          2022
6     OLD    Spain          Castilla y Leon  -        $15    NOW          2015

Clustering Results

CLUSTER #3 – 6 Instances – Attribute Information

Attribute         Number of Wines  Attribute Weight
BLACKBERRY        6                3
LONG FINISH       5                2
SPICE             4                3
FRUIT             3                1
BLACK CHERRY      3                3
RED               3                2
FOCUSED           3                1
EXCELLENT FINISH  3                2
RIPE              3                1
TANNINS_LOW       3                2
TANNINS_HIGH      3                2

Suggestions

This cluster represents the fruity aspect of new-world wines, focusing on powerful notes of blackberry and black cherry, as well as a commanding finish.
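
The "Number of Wines" column of a cluster summary can be produced by counting, for each attribute, how many wines in the cluster carry it. The sketch below shows that count on made-up wines; the `attribute_counts` helper is hypothetical, and the attribute weight column is left out because the slides do not say how it is computed.

```python
from collections import Counter


def attribute_counts(cluster_wines):
    """Count how many wines in a cluster carry each attribute.

    `cluster_wines` is a list of attribute lists, one per wine. The result
    is sorted by descending count, then alphabetically for ties.
    """
    counts = Counter()
    for attributes in cluster_wines:
        counts.update(set(attributes))  # count each attribute once per wine
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical cluster of three wines (not the data behind CLUSTER #3).
cluster = [
    ["BLACKBERRY", "SPICE", "LONG FINISH"],
    ["BLACKBERRY", "SPICE"],
    ["BLACKBERRY", "LONG FINISH", "RIPE"],
]
for attribute, n_wines in attribute_counts(cluster):
    print(attribute, n_wines)
```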

Conclusion

In this paper, we discuss wine reviews and how their attributes can play an integral role in grouping different wines together.

We show that, using only the attributes of a wine review, we can group together wines that are similar in world region, monetary value, vintage, type, and varietal.

Thanks

Questions?