MIS2502: Data Analytics - Temple MIS · Mar 06, 2018 · Extract, Transform, Load (ETL) Extract data...

Date post: 27-Sep-2020
MIS2502: Data Analytics Extract, Transform, Load JaeHwuen Jung [email protected] http://community.mis.temple.edu/jaejung
Transcript
Page 1: MIS2502: Data Analytics - Temple MISMar 06, 2018  · Extract, Transform, Load (ETL) Extract data from the transactional database Transform data into an analysis-ready format Load

MIS2502: Data Analytics
Extract, Transform, Load

JaeHwuen Jung
[email protected]

http://community.mis.temple.edu/jaejung

Where we are…

Data entry → Transactional Database → Data transformation → Analytical Data Store → Data analysis

• Transactional Database: stores real-time transactional data

• Analytical Data Store: stores historical transactional and summary data

Now we’re here…

Extract, Transform, Load (ETL)

Extract data from the transactional database

Transform data into an analysis-ready format

Load it into the analytical data store
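
The three steps can be sketched in a few lines of Python with the standard-library sqlite3 module. This is only an illustration under stated assumptions: the table names (sales, daily_sales), the columns, and the sample rows are hypothetical, and two in-memory databases stand in for the transactional database and the analytical data store.

```python
import sqlite3

def extract(source):
    # Extract: pull raw rows out of the transactional database.
    return source.execute("SELECT sale_date, amount FROM sales").fetchall()

def transform(rows):
    # Transform: summarize individual transactions into daily totals.
    totals = {}
    for sale_date, amount in rows:
        totals[sale_date] = totals.get(sale_date, 0) + amount
    return sorted(totals.items())

def load(target, summary):
    # Load: write the summary rows into the analytical data store.
    target.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales (sale_date TEXT, total REAL)"
    )
    target.executemany("INSERT INTO daily_sales VALUES (?, ?)", summary)
    target.commit()

# Hypothetical transactional database with a few sample rows.
transactional = sqlite3.connect(":memory:")
transactional.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
transactional.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2018-03-05", 10.0), ("2018-03-05", 5.0), ("2018-03-06", 7.5)],
)

# Hypothetical analytical data store.
analytical = sqlite3.connect(":memory:")
load(analytical, transform(extract(transactional)))

result = analytical.execute(
    "SELECT sale_date, total FROM daily_sales ORDER BY sale_date"
).fetchall()
print(result)
```

Keeping each step in its own function mirrors the slide: the transactional side is only read, and only the summarized, analysis-ready rows reach the analytical store.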

The Actual Process

• Extract: Transactional Database 1 and Transactional Database 2 (relational databases), plus other sources

• Transform: data conversion for each source

• Load: Analytical Data Store (dimensional database)

ETL’s Not That Easy!

Data Consistency

• What if the data is in different formats?

Data Quality

• How do we know it’s correct?

• What if there is missing data?

• What if the data we need isn’t there?

Data Consistency: The Problem with Legacy Systems

• An IT infrastructure evolves over time

• Systems are created and acquired by different people using different specifications

This can happen through:

• Changes in management

• Mergers & Acquisitions

• Externally mandated standards

• Generally poor planning

Why Not Replace Legacy Systems?

• Too much risk

• Prohibitive cost

• User reluctance

• Limited business agility

• Speed of delivery

https://www.onbase.com/~/media/Files/hyland/whitepaper/wp_trouble-with-legacy-systems.pdf

https://thenextweb.com/finance/2017/04/10/ancient-programming-language-cobol-can-make-you-bank-literally/

Problems with Data Consistency

The same data element stored in different formats

• Social Security number (123-45-6789 versus 123456789)

• Date (10/9/2015 versus 9/10/2015)

Redundant data across the organization

• Customer record maintained by accounts receivable and marketing

Different naming conventions

• “Management Information Systems” versus “MIS” versus “Man. Info. Sys.”

Different unique identifiers used

• AccessNet account versus Temple ID
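
A minimal Python sketch of how the format and naming problems above might be handled during transformation. The function names and the alias table are hypothetical, and the date example assumes each source system's format is known in advance, because a value like 10/9/2015 cannot be interpreted on its own.

```python
import re
from datetime import datetime

def normalize_ssn(raw):
    # Keep digits only, then re-insert dashes in one agreed-upon format.
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9:
        raise ValueError(f"not a 9-digit SSN: {raw!r}")
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

def normalize_date(raw, source_format):
    # The caller states the source system's format; the output is ISO 8601.
    return datetime.strptime(raw, source_format).date().isoformat()

# Hypothetical table of known name variants for one department.
DEPARTMENT_ALIASES = {
    "mis": "Management Information Systems",
    "man. info. sys.": "Management Information Systems",
    "management information systems": "Management Information Systems",
}

def normalize_department(raw):
    # Fall back to the trimmed input when the variant is not in the table.
    cleaned = raw.strip()
    return DEPARTMENT_ALIASES.get(cleaned.lower(), cleaned)

print(normalize_ssn("123456789"))
print(normalize_date("10/9/2015", "%m/%d/%Y"))  # month first
print(normalize_date("10/9/2015", "%d/%m/%Y"))  # day first
print(normalize_department("Man. Info. Sys."))
```

The same input string yields two different dates depending on the declared source format, which is why the format has to come from knowledge of the source system rather than from the data itself.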

What’s the big deal?

This is a fundamental problem for creating the analytical data store

We often need to combine information from several transactional databases

How do we know if we’re talking about the same customer or product?

Now think about this scenario

What are the differences between a “guest” and a “customer”?

Is there any way to know if a customer of the café is staying at the hotel?

Café Database

• Customer: Customer_number, Customer_name, Customer_address, Customer_city, Customer_zipcode

• Order: Order_number, Customer_number, Hotel_id, Food_item_id, Order_date, Order_time, Table_number

• Food item: Order number, Food_item_id, Order_date, Order_time

• Hotels: Hotel_id, Country_code, Hotel_name, Hotel_address, Hotel_city, Hotel_zipcode

Hotel Reservation Database

• Hotels: Hotel_id, Country_code, Hotel_name, Hotel_address, Hotel_city, Hotel_zipcode

• Countries: Country_code, Country_currency, Country_name

• Hotel rooms: Room_number, Hotel_id, Room_type, Room_floor

• Room types: Room_type_code, Room_standard_rate, Room_description, Smoking_YN

• Room Bookings: Booking_id, Room_type_code, Hotel_id, Checkin_date, Number_of_days, Room_count

• Guest Bookings: Booking_id, Guest_number

• Guests: Guest_number, Guest_firstname, Guest_lastname, Guest_address, Guest_city, Guest_zipcode, Guest_email

• Hotel Amenities Lookup: Characteristic_id, Characteristic_description

• Hotel Amenities: Characteristic_id, Hotel_id

Solution: “Single view” of data

• The entire organization understands a unit of data in the same way

• It’s both a business goal and a technology goal

but it’s really more this…

...than this

Closer look at the Guest/Customer

Guests: Guest_number, Guest_firstname, Guest_lastname, Guest_address, Guest_city, Guest_zipcode, Guest_email

vs.

Customer: Customer_number, Customer_name, Customer_address, Customer_city, Customer_zipcode

Getting to a “single view” of data:

• How would you represent “name”?

• What would you use to uniquely identify a guest/customer?

• Would you include email address?

• How do you figure out if you’re talking about the same person?
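
One possible answer to these questions, sketched in Python. The Person class and the two converter functions are hypothetical, the sample values are made up, and the code assumes Customer_name holds exactly "First Last", which real data will often violate.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Person:
    # A single shared representation for both guests and customers.
    first_name: str
    last_name: str
    address: str
    city: str
    zipcode: str
    email: Optional[str] = None  # only the Guests table has an email column

def from_guest(row):
    # Guests already store first and last name separately.
    return Person(
        row["Guest_firstname"], row["Guest_lastname"], row["Guest_address"],
        row["Guest_city"], row["Guest_zipcode"], row.get("Guest_email"),
    )

def from_customer(row):
    # Assumption: Customer_name is "First Last"; split on the first space.
    first, _, last = row["Customer_name"].partition(" ")
    return Person(
        first, last, row["Customer_address"],
        row["Customer_city"], row["Customer_zipcode"],
    )

# Made-up sample rows for illustration only.
guest = from_guest({
    "Guest_firstname": "Joe", "Guest_lastname": "Cool",
    "Guest_address": "1 Main Street", "Guest_city": "Philadelphia",
    "Guest_zipcode": "19122", "Guest_email": "joe@example.com",
})
customer = from_customer({
    "Customer_name": "Joe Cool", "Customer_address": "1 Main Street",
    "Customer_city": "Philadelphia", "Customer_zipcode": "19122",
})
print(guest)
print(customer)
```

Splitting the name and making email optional are design choices, not the only ones: the point is that both source tables map into one structure the whole organization agrees on.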

Data Transformation Steps

Parsing

• Decomposes data elements

• Example: [name: Joe Cool] → [FirstName: Joe, LastName: Cool]

Correcting

• Corrects parsed data elements

• Example: street name does not exist and is replaced with the "closest" one

Standardizing

• Transforms data into its preferred format

• Example: Broad ST → Broad Street

Matching

• Matches records within and across data sources
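
The four steps can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the reference list of streets, the abbreviation table, the similarity cutoff, and the matching rule (same name and zipcode) are hypothetical. The "closest" street is found with the standard-library difflib.get_close_matches, which is one of many ways to do it.

```python
import difflib

# Hypothetical reference data for the example.
KNOWN_STREETS = ["Broad Street", "Market Street", "Walnut Street"]
SUFFIXES = {"st": "Street", "ave": "Avenue", "rd": "Road"}

def parse_name(name):
    # Parsing: split "Joe Cool" into parts (assumes a two-part name).
    first, _, last = name.strip().partition(" ")
    return {"FirstName": first, "LastName": last}

def correct_street(street):
    # Correcting: replace an unknown street with the closest known one.
    if street in KNOWN_STREETS:
        return street
    close = difflib.get_close_matches(street, KNOWN_STREETS, n=1, cutoff=0.8)
    return close[0] if close else street

def standardize_street(street):
    # Standardizing: expand a trailing abbreviation such as "ST".
    words = street.split()
    if not words:
        return street
    key = words[-1].lower().rstrip(".")
    words[-1] = SUFFIXES.get(key, words[-1])
    return " ".join(words)

def is_match(a, b):
    # Matching: same person if name (case-insensitive) and zipcode agree.
    def key(record):
        return (record["FirstName"].lower(), record["LastName"].lower(),
                record["zipcode"])
    return key(a) == key(b)

print(parse_name("Joe Cool"))
print(correct_street("Brod Street"))
print(standardize_street("Broad ST"))
print(is_match({**parse_name("Joe Cool"), "zipcode": "19122"},
               {"FirstName": "JOE", "LastName": "cool", "zipcode": "19122"}))
```

A rule as simple as name plus zipcode will produce both false matches and missed matches on real data, so production matching usually combines several fields and some manual review.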

Data Quality

The degree to which the data reflects the actual environment

Do we have the right data?

Is the collection process reliable?

Is the data accurate?

• Choose data consistent with the goals of analysis

• Verify that the data really measures what it claims to measure

• Manual verification through sampling

• Use the knowledge expert

• Build fault tolerance into the process

• Periodically run reports, check logs, and verify results
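
Two of these checks, counting missing values and drawing a sample for manual verification, can be sketched in Python as follows. The function names, field names, and sample rows are hypothetical; the fixed seed is there only so the sample is reproducible.

```python
import random

def missing_value_report(rows, required_fields):
    # Count how many rows lack a value for each required field.
    report = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            if row.get(field) in (None, ""):
                report[field] += 1
    return report

def sample_for_review(rows, k, seed=0):
    # Draw a reproducible random sample for manual verification.
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))

# Made-up rows for illustration only.
rows = [
    {"Guest_number": 1, "Guest_email": "a@example.com"},
    {"Guest_number": 2, "Guest_email": ""},
    {"Guest_number": 3},
]
report = missing_value_report(rows, ["Guest_number", "Guest_email"])
sample = sample_for_review(rows, 2)
print(report)
print(len(sample))
```

Running a report like this on a schedule is one simple way to act on the "periodically run reports" advice above.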

Summary

• What is ETL? Why is it important?

– Data consistency

– Data quality

• Explain the purpose of each component (Extract, Transform, Load)

