A Weapon in Your Competitive Arsenal: The Data Warehouse
John Rome, Arizona State University
2005 Fall Conference
Agenda
• Quiz
• Background
• Define Data Warehousing
• Discuss Latest Buzzwords
• Demo of Actual Data Warehouse
• Lessons Learned and Some Advice
• Demo/Questions/Discussion
• Later Today…
  – Data Mining, Dashboard and Data Quality
Quiz: Truth or Urban Legend?
1. Pizza Hut knows your favorite toppings, what you ordered last, and whether you like salad with your Meat Lover's pie?
2. Ekco sells more turkey basters during Christmas than Thanksgiving?
3. Green four-wheel-drive Subarus outsell blue ones by a wide margin, except in Wisconsin?
4. Walmart increases sales by placing diapers and beer next to each other?
About Arizona State University
• Located in the Phoenix metropolitan area
• 61,033 Students
• 5,393 Full-Time Administrative Staff
• 2,165 Full-Time Faculty
• Awarded Research I Status in 1994
• “New American University”
• http://www.asu.edu
“One University, Many Places”
About ASU’s Data Administration
• Reports to the President’s Office
• 5 Professional Staff, 4 Support Staff
• Mission: Data Access, Data Quality, and Data Education
• Supports Centralized/Decentralized Initiatives
• Data Warehouse is “full-employment”
• In Preliminary ERP discussions
• Close ties with IR office
• http://www.asu.edu/data_admin
Warehousing Was and Still Is Hot...
• $8B Industry
• 90% of CIOs claimed to be developing one (Meta Group, 1998), with 99.9% today
• Higher Education Institutions are building them
• Keynotes at IR conferences!!
• Chapters in college textbooks
• Amazon.com barometer (Over 100 books on warehousing)
So Hot... Even Dilbert Is Talking About Them!!
(apologies to Scott Adams!!)
What is a Data Warehouse?
A
• SUBJECT-ORIENTED
• INTEGRATED
• TIME-VARIANT
• NON-VOLATILE
collection of data in support of management’s decision-making process.
-Bill Inmon
Some More Definitions
“A copy of transaction data specifically structured for query and analysis.”
-Ralph Kimball
“A single, integrated store of corporate data which provides the infrastructural basis for informational applications in the enterprise.”
-S.G. Kelly
My Definition…
Age:         2       8        42          66
Weight:      35      85       205         190
Net Worth:   $0.00   $52.00   $X90,000    30 Million
“A Database with Snapshots of Data Dedicated for Reporting Purposes”
Why All the Fuss About Warehousing?
• Powerful Data Source for Reporting
• Fills in Gaps Left by Operational Systems
• Integrates Data from Silo Systems
• Both Strategic and Tactical
• Keeps Historical Data
• Assists Longitudinal Studies
• Helps Assessment and Retention
• Becoming Mission Critical to Organizations!
How Is a Warehouse Different?
Warehouse               | OLTP
data is read-only       | data is updated
managed redundancy      | minimal redundancy
serves management       | serves operational users
“time fixed” data       | “current value” data
“what if” processing    | repetitive processing
historical trends       | limited history
response… minutes       | response… seconds
[Figure: Sample Warehouse Architecture. Components: mainframe legacy system (MVS/ESA, DB2/IDMS) and other sources; the Data Warehouse; NT web server (ASP, Cold Fusion); UNIX web server (Java); UNIX. Connections labeled SQL/ODBC, SQL/JDBC, and SQL/“Native”.]
Some “BI” Buzzwords
OLAP • MOLAP • ROLAP • Metadata • Replication • Aggregation • Star Schema • Multi-dimensional • Facts/Dimensions • Bit-Mapped Indexing • Drill-Down • Transformation Tools (ETL) • De-Normalized • Snowflake Schema • Operational Data Store (ODS) • XML • Data Mining • Data Quality • Business Intelligence • Dashboards • SQL • Data Mart
What is a Data Mart?
A data mart is often a very focused slice of a larger data warehouse.

Data Warehouse vs. Data Mart
                     | Data Warehouse                                      | Data Mart
Scope                | Enterprise                                          | Specific business process
Data Perspective     | Historical data, some summary, lightly denormalized | Current (some history), highly denormalized
Data Subjects        | Multiple subjects, 20-30 tables (each subject area) | Single subject area, 5-10 tables
Ability to Integrate | Highly integrated                                   | Some/little integration
Time to Build        | 12-18 months                                        | 2-8 months
Characteristics      | Flexible, strategic, durable                        | Restrictive, tactical, focused
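To make the “focused slice” idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in database (ASU's warehouse ran on Sybase). All table and column names (student, college, employee, student_mart) and all rows are hypothetical. The mart covers a single subject area and is stored pre-joined and denormalized, as in the table above.

```python
import sqlite3

# Hypothetical "warehouse" holding two subject areas: Student and HR.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE college  (college_id INTEGER PRIMARY KEY, college_name TEXT);
CREATE TABLE student  (student_id INTEGER PRIMARY KEY, name TEXT, college_id INTEGER);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT, salary REAL);

INSERT INTO college  VALUES (1, 'Engineering'), (2, 'Business');
INSERT INTO student  VALUES (10, 'Ann', 1), (11, 'Raj', 2), (12, 'Li', 1);
INSERT INTO employee VALUES (100, 'Pat', 50000);

-- The "mart": a single subject area (students), stored pre-joined and
-- denormalized so users query one flat table instead of writing joins.
CREATE TABLE student_mart AS
SELECT s.student_id, s.name, c.college_name
FROM student AS s
JOIN college AS c ON c.college_id = s.college_id;
""")

rows = conn.execute(
    "SELECT college_name, COUNT(*) FROM student_mart "
    "GROUP BY college_name ORDER BY college_name"
).fetchall()
print(rows)  # [('Business', 1), ('Engineering', 2)]
```

A `CREATE VIEW` would give the same slice without copying data. A physical table, as here, trades storage and refresh work for simpler and usually faster queries.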
What is SQL?
    SELECT *
    FROM CONFERENCE_ATTENDEES
    WHERE LAST_NAME = 'HALE'
      AND FIRST_NAME = 'LESLIE';
SQL: Structured Query Language (pronounced “sequel”). The lingua franca of data access in relational databases, it is used to build the queries run against data warehouses.
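The slide's query can be tried from Python with the standard-library sqlite3 module. The table contents below are hypothetical. Two details matter when typing SQL by hand: string literals take straight single quotes ('HALE', not curly quotes), and in application code the values are better passed as bound parameters (`?`) than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table using the slide's column names.
conn.execute("CREATE TABLE CONFERENCE_ATTENDEES (FIRST_NAME TEXT, LAST_NAME TEXT)")
conn.executemany(
    "INSERT INTO CONFERENCE_ATTENDEES VALUES (?, ?)",
    [("LESLIE", "HALE"), ("ANN", "HALE"), ("PAT", "LEE")],
)

# The slide's query, with the literal values supplied as bound parameters.
rows = conn.execute(
    "SELECT * FROM CONFERENCE_ATTENDEES WHERE LAST_NAME = ? AND FIRST_NAME = ?",
    ("HALE", "LESLIE"),
).fetchall()
print(rows)  # [('LESLIE', 'HALE')]
```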
Tools Are Doing the Dirty Work
End User Access Tools
-Gartner Group
-Keith Gile, Forrester
What is ETL (Extract, Transform, Load)?
• Tool or process used to move data from one system/DB to another system/DB
• Over 100 ETL tools on the market, about 10 serious contenders
• Prices range from free to $750K
• Better ones may be cost-prohibitive
• Databases often have bulk-load utilities
• Sometimes it’s E.L.T. (load the data right after extraction, then transform it with programs or stored procedures)
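The three ETL steps can be sketched in a few lines of Python with sqlite3, using two in-memory databases as a hypothetical operational source and warehouse target. The table names, the cleanup rules, the choice to treat missing hours as 0, and the 12-hour full-time threshold are all illustrative assumptions, not ASU's actual rules.

```python
import sqlite3

# Hypothetical operational (source) system with messy data.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE enroll (student_id INTEGER, term_code TEXT, credit_hours INTEGER);
INSERT INTO enroll VALUES (1, '2005FA', 12), (2, '2005fa ', 9), (3, '2005FA', NULL);
""")

# Hypothetical warehouse (target) system.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("""
CREATE TABLE enroll_fact (
    student_id INTEGER, term_code TEXT, credit_hours INTEGER, full_time INTEGER)
""")

# Extract: pull the raw rows from the source.
raw = source.execute("SELECT student_id, term_code, credit_hours FROM enroll").fetchall()

# Transform: standardize codes, default missing hours to 0 (an assumed policy),
# and derive a full-time flag (12+ hours is an assumed threshold).
def transform(row):
    student_id, term_code, hours = row
    hours = hours if hours is not None else 0
    return (student_id, term_code.strip().upper(), hours, int(hours >= 12))

clean = [transform(r) for r in raw]

# Load: bulk insert into the warehouse in one transaction.
with warehouse:
    warehouse.executemany("INSERT INTO enroll_fact VALUES (?, ?, ?, ?)", clean)

loaded = warehouse.execute("SELECT * FROM enroll_fact ORDER BY student_id").fetchall()
print(loaded)  # [(1, '2005FA', 12, 1), (2, '2005FA', 9, 0), (3, '2005FA', 0, 0)]
```

In the E.L.T. variant, the raw rows would be loaded first and the same cleanup expressed as SQL inside the warehouse, for example `UPPER(TRIM(term_code))`.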
ETL Example
What is a Data Model?
• A graphical representation that identifies the information needs of the business
• A data-driven, rather than function- (or process-) based, view of an organization
[Figure: sample E/R data model. Entities: STUDENT, CLASS, CLASS MEETING TIME, COURSE, COURSE CATALOG, COLLEGE, CAMPUS. Relationships: takes, is offered by, offers, is identified by.]
Warehouse Modeling Techniques?
#1 Dimensional Modeling (Star Join Schema)
#2 Tabular Modeling (E/R Denormalized)
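A star join schema puts the numeric measurements in a central fact table whose foreign keys point at small dimension tables. The minimal sketch below uses Python's sqlite3 module; the table names (fact_enrollment, dim_term, dim_college) and the headcounts are hypothetical. It shows a star join summarized by term, then a drill-down that adds the college dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to slice the facts.
CREATE TABLE dim_term    (term_key INTEGER PRIMARY KEY, term_name TEXT);
CREATE TABLE dim_college (college_key INTEGER PRIMARY KEY, college_name TEXT);

-- Fact table: one row per measurement, with a foreign key to each dimension.
CREATE TABLE fact_enrollment (
    term_key    INTEGER REFERENCES dim_term(term_key),
    college_key INTEGER REFERENCES dim_college(college_key),
    headcount   INTEGER
);

INSERT INTO dim_term    VALUES (1, 'Fall 2004'), (2, 'Fall 2005');
INSERT INTO dim_college VALUES (1, 'Engineering'), (2, 'Business');
INSERT INTO fact_enrollment VALUES (1, 1, 100), (1, 2, 80), (2, 1, 110), (2, 2, 85);
""")

# Star join: the fact table joins directly to a dimension, then aggregates.
rows = conn.execute("""
SELECT t.term_name, SUM(f.headcount)
FROM fact_enrollment AS f
JOIN dim_term AS t ON t.term_key = f.term_key
GROUP BY t.term_name
ORDER BY t.term_name
""").fetchall()
print(rows)  # [('Fall 2004', 180), ('Fall 2005', 195)]

# Drill-down: add another dimension to the SELECT and GROUP BY.
drill = conn.execute("""
SELECT t.term_name, c.college_name, SUM(f.headcount)
FROM fact_enrollment AS f
JOIN dim_term    AS t ON t.term_key = f.term_key
JOIN dim_college AS c ON c.college_key = f.college_key
GROUP BY t.term_name, c.college_name
ORDER BY t.term_name, c.college_name
""").fetchall()
print(drill)
```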
What Makes a Good Data Model?
• Completeness
• Simplicity
• No redundancy (OLTP)
• Enforcement of Business Rules
• Data Reusability
• Stability and Flexibility
• Communication Effectiveness
Some Design Guidelines
• Add element of time to the tables
• Appropriately name tables, attributes, views
• Add derived fields when necessary
• Make sure data integrates
• Consider security and privacy in design
• Consider performance (indexes, etc.)
• Make sure data model can answer the critical business questions
Display Your Model Proudly...
“Mona Lisa” “Wall Ware” “American Gothic”
Demo Time
• Ad Hoc Quer(ies) using BI Tool
• Retention Application using Web
About ASU’s Data Warehouse
• 10 years in the making
• Major subject areas (Student, HR, Financial)
• Supports over 1500 users
• “Poor Man’s Repository” for definitions
• Source of data: multiple operational systems
• Mission Critical to University
ASU’s Warehouse Vital Statistics
Users: 1900+ logins
Volume: 50+ gigabytes
Approach: Enterprise
Database: Sybase Adaptive Server
Server: Sun/UNIX
Desktop: Brio (now Hyperion), MS-Access
Web: ASP, Java, Cold Fusion
ASU’s Warehouse Subject Areas (Primary, Support, Special; as of March 1, 2002)
CENSUS, FINANCIAL, STUDENTFEES, HUMAN_RESOURCE, STUDENT, RESEARCH, COURSE, FINANCIALAID, TRAINING(*), DICTIONARY, LOOKUP, PERSON, DIRECTORY_SERVICES, USER_TABLE, SRCDARS, FACILITY, STUDENT_RETENTION, WAREHOUSE_ADMIN, PROXY DBs, PROPERTY, ETC.
Lessons Learned: From the Home Office in Tempe, Arizona
Have a Historical Data Plan
• Need Ability to Compare Data Over Time
• Decide How Far Back, or How Many Years of Data, to Keep
• “Census” Snapshots are a Must
• Fiscal Year, Calendar Year, Semester or Term, Pay-Period Data
#10
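A “census” snapshot freezes a copy of operational data under a time key so later updates do not erase history. The minimal sketch below uses Python's sqlite3 module; the tables (student_current, student_census), the `take_census` helper, and the data are hypothetical. It takes two snapshots around an update and then compares them over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Operational table: always holds only the *current* value.
CREATE TABLE student_current (student_id INTEGER PRIMARY KEY, major TEXT);
-- Warehouse snapshot table: same columns plus the census term it was frozen at.
CREATE TABLE student_census (census_term TEXT, student_id INTEGER, major TEXT,
                             PRIMARY KEY (census_term, student_id));
""")

def take_census(term):
    """Copy every current operational row into the snapshot table under `term`."""
    with conn:
        conn.execute(
            "INSERT INTO student_census SELECT ?, student_id, major FROM student_current",
            (term,),
        )

with conn:
    conn.executemany("INSERT INTO student_current VALUES (?, ?)",
                     [(1, "Biology"), (2, "History")])
take_census("Fall 2004")

# Later the operational system overwrites a value; without the snapshot it is lost.
with conn:
    conn.execute("UPDATE student_current SET major = 'Chemistry' WHERE student_id = 1")
    conn.execute("INSERT INTO student_current VALUES (3, 'Biology')")
take_census("Fall 2005")

# Compare the two census snapshots over time.
rows = conn.execute("""
SELECT census_term, major, COUNT(*) FROM student_census
GROUP BY census_term, major ORDER BY census_term, major
""").fetchall()
print(rows)
```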
DQ Isn’t as Good as You Think It Is
#9
It’s good, it’s bad, and it’s ugly!
It’s good, it’s still bad, and it’s still ugly!
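One way to find out how good the data really is: write each data-quality rule as a query that counts offending rows. The minimal sketch below uses Python's sqlite3 module. The person and lookup_state tables, the rules, and the deliberately flawed rows are hypothetical. It relies on SQLite's `date()` function returning NULL for input it cannot parse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data with typical defects: a missing value, an unparseable
# date, a code missing from its lookup table, and a missing code.
conn.executescript("""
CREATE TABLE lookup_state (state_code TEXT PRIMARY KEY);
INSERT INTO lookup_state VALUES ('AZ'), ('CA'), ('NM');

CREATE TABLE person (person_id INTEGER, birth_date TEXT, state_code TEXT);
INSERT INTO person VALUES
    (1, '1985-04-02', 'AZ'),
    (2, NULL,         'AZ'),
    (3, 'unknown',    'XX'),
    (4, '1979-11-30', NULL);
""")

# Each check counts offending rows; zero means the rule passes.
checks = {
    "missing birth_date": "SELECT COUNT(*) FROM person WHERE birth_date IS NULL",
    "unparseable birth_date": "SELECT COUNT(*) FROM person "
                              "WHERE birth_date IS NOT NULL AND date(birth_date) IS NULL",
    "missing state_code": "SELECT COUNT(*) FROM person WHERE state_code IS NULL",
    "state_code not in lookup": "SELECT COUNT(*) FROM person AS p "
                                "WHERE p.state_code IS NOT NULL AND NOT EXISTS "
                                "(SELECT 1 FROM lookup_state AS s WHERE s.state_code = p.state_code)",
}

report = {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}
for name, bad_rows in report.items():
    print(f"{name}: {bad_rows}")
```

Re-running the same checks after each load shows whether the data is getting better or is “still bad.”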
Costs Shifting to the Customer
• Faster PCs
• Printers
• Ethernet/Web Connection
• Middleware?
• Data Access Software
  – Client/Server Application or Plug-in
#8
User Involvement Critical
Factors Making It Tough
Strike 1: Finding users with free time
Strike 2: Different business users may have conflicting ideas of what they want
Strike 3: Users often don’t know what they really want
#7
Data Definitions Are Important
#6
Security & Privacy Still a Big Issue
(Even if the data is read-only)
• Careful design of the Data Warehouse helps security
• Variety of ways to implement security
• All users must take responsibility for security/privacy (train them!!!)
• Security costs money to implement
#5
Web Solution Is a Must
• Internet has become more reliable
• Offers quick delivery of vital information
• Reduces access and communication cost (IT overhead)
• Ability to reach an expanded audience
• No software, just a browser in many cases
“Because it requires minimal training and reduces IT overhead, the Web is becoming the de facto warehouse access platform.” -Wayne Eckerson
#4
Web Will Win Out
Training Investment Pays Dividends
• Recognized, but Often not Funded
• Rely on “Data Trustee”/expert for support
• Standardize on One Tool
• Tool Training is Easy, Data Training is Tough!
#3
“Be suspicious of your results!”
Need Support Structure in Place
• Need to move from pilot phase to production
• Treat system as “mission critical”
• ASU Solutions
  – “ware-q” e-mail
  – Warehouse User’s Group (WUG)
  – 1-800-what-now (just kidding!)
#2
Users Do Amazing Things...
• Persistence Studies
• Faculty Workload
• Web-erize Reports
• Create Pseudo-operational systems to fill data gaps
• Create Personal Letters (encouraging Students to register)
• Get Data for Legislative support
• Find tutors
#1
The Data Warehouse... Helps us do our business better
Inconvenience Store Problem Solved
Here’s the Data
Another Quiz
1. Phoenix isn't a good place for selling golf clubs, despite the number of golf courses.
2. Motorcycle owners (picture Hells Angels riders) usually rank within the highest income bracket.
3. Best-Selling Women’s Shoe Size in 1986 and 2003?
4. What are Walmart’s Top 5 “Affinity” Sales with a Miller Lite 6 PK?
Shoe Size
Affinity Sales
• Mars M&Ms Peanuts
• Mars M&Ms Plain
• Beefy Cigars 2 PK
• Marlboro Regular PK
• Certs Wintergreen Roll Candy
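Affinity (market-basket) lists like this come from counting which items appear in the same transaction as an anchor item. The sketch below is a simple co-occurrence count in plain Python. It is not Walmart's actual method, which the deck does not describe. The baskets are invented for illustration and reuse product names from the slide. For each item, it reports the share of Miller Lite baskets that also contain that item, which is the “confidence” of the rule Miller Lite → item.

```python
from collections import Counter

# Hypothetical register transactions; each basket lists the items in one sale.
baskets = [
    ["Miller Lite 6 PK", "Beefy Cigars 2 PK", "Marlboro Regular PK"],
    ["Miller Lite 6 PK", "Mars M&Ms Peanuts"],
    ["Miller Lite 6 PK", "Mars M&Ms Peanuts", "Certs Wintergreen Roll Candy"],
    ["Mars M&Ms Plain", "Certs Wintergreen Roll Candy"],
]

def affinity(baskets, anchor, top_n=5):
    """Rank items bought together with `anchor`.

    Returns (item, confidence) pairs, where confidence is the share of
    baskets containing `anchor` that also contain `item`.
    """
    # Deduplicate items within each basket while keeping their order.
    with_anchor = [list(dict.fromkeys(b)) for b in baskets if anchor in b]
    if not with_anchor:
        return []
    counts = Counter(item for b in with_anchor for item in b if item != anchor)
    total = len(with_anchor)
    return [(item, n / total) for item, n in counts.most_common(top_n)]

ranked = affinity(baskets, "Miller Lite 6 PK")
for item, confidence in ranked:
    print(f"{item}: {confidence:.0%}")
```

At real register volumes, the same counting is usually pushed into the warehouse as a self-join on transaction ID, with a minimum-support cutoff so that rare pairs are ignored.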
Questions