8/6/2019 Teradata Application Dev4
1/31
Chapter 4
Chapter 4 The Extended Logical Data Model
"Too much of a good thing is just right."
Too much of a good thing is just right, but even a little of a bad thing will mess up your data warehouse. We have arrived at building the Extended Logical Data Model. The Extended Logical Data Model will serve as the input to the Physical Data Model. We are at a critical stage. We must produce excellence in this section because we don't want bad input going into our Physical Data Model. This comes from Computer Science 101: Garbage In, Garbage Out (GIGO). Building a poor Extended Logical Data Model makes about as much sense as Mae West running east looking for a sunset! Things will be headed in the wrong direction. If, however, you can produce quality in your ELDM, your warehouse is well on its way to being just right!

This chapter will begin with the Application Development Life Cycle, and then talk about the Logical Model and Normalization. From there, we get to the meat and discuss the metrics, which are a critical part of the Extended Logical Data Model. The ELDM will become our final input into the Physical Data Model.
The Application Development Life Cycle
"Failure accepts no alibis.
Success requires no explanation."
The design of the physical database is key to the success of implementing a data warehouse, along with the applications that reside on the system. Whenever something great is built, whether a building, an automobile, or a business system, there is a process that facilitates its development. When designing the physical database, the process used to conduct this is known as the Application Development Life Cycle. If you follow the process, you will need no alibis because success will be yours.
The six major phases of this process are as follows:
- Design - Developing the Logical Data Model (LDM), Extended Logical Data Model (ELDM), and the Physical Data Model (PDM).
- Development - Generating the queries according to the requirements of the business users.
- Test - Measuring the impact that the queries have on system resources.
- Production - Following the plan and moving the procedures into the production environment.
- Deployment - Training the users on the system application and query tools (i.e., SQL, MicroStrategy, Business Objects, etc.).
- Maintenance - Re-examining strategies such as loading data and index selection.
It is important that you understand the fundamentals and the order in which to perform them in the development life cycle of an application. First there is the Business Discovery, second is the Logical Data Model, then third
you get an outstanding Physical Model, fourth you design the application, and lastly you perform your Development and Assurance Testing.
During the testing phase of an application it is important to check that Teradata is using parallelism, that there are no large spool space peaks, and that AMP utilization is equal. A HOT AMP is a bad sign, and so is running out of spool.
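One way to check for unequal AMP utilization is to compare per-AMP space for each table. As a hedged sketch (the view DBC.TableSizeV exists on current Teradata releases; the database name Sales_DB is a placeholder), something like the following shows whether one AMP holds far more of a table than the average:

```sql
-- Compare each table's largest per-AMP footprint to its average
-- per-AMP footprint; a high ratio suggests a HOT AMP and a
-- skewed Primary Index. (Sales_DB is a placeholder name.)
SELECT  TableName
     ,  MAX(CurrentPerm) AS Max_AMP_Perm
     ,  AVG(CurrentPerm) AS Avg_AMP_Perm
     ,  MAX(CurrentPerm) / NULLIF(AVG(CurrentPerm), 0) AS Skew_Ratio
FROM    DBC.TableSizeV
WHERE   DatabaseName = 'Sales_DB'
GROUP BY TableName
ORDER BY Skew_Ratio DESC;
```

A ratio near 1 means the rows are spread evenly across the AMPs; a large ratio means one AMP is doing far more than its share.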
Asking the Right Questions
"He who asks a question may be a fool for five minutes, but he who never
asks a question remains a fool forever."
The biggest key to this section is knowledge and not being afraid to ask the right questions. Knowledge about the user environment is vitally important. If you can ask the right questions, you will build a model that will map to the users' needs. In addition, you will be able to deliver a world-class data warehouse that remains cool forever.
Here is how this works. The logical modelers will create a logical data model. Then it is up to you to ask the right questions and find out about the demographics of the data. Only then can you build the proper Physical Data Model.
Remember:
The Logical Data Model will be the input to the Extended Logical Data Model.
The Extended Logical Data Model will be input to the Physical Data Model.
The Physical Data Model is where Denormalization and the advantage of parallelism are determined.
Logical Data Model
"When you are courting a nice girl an hour seems like a second. When
you sit on a red-hot cinder a second seems like an hour. Thats relativity."
The first step of the design phase is called the Logical Data Model (LDM). The LDM is a logical representation of the tables that reside in the data warehouse database. Tables, rows, and columns are the equivalent of files, records, and fields in most programming languages. A properly normalized data model allows users to ask a wide variety of questions today, tomorrow, and forever.
The following illustration displays the Employee Table. The columns are emp, dept, lname, fname and sal:
Employee Table

EMP (Primary Key)   DEPT (Foreign Key)   LNAME    FNAME    SAL
1                   40                   BROWN    CHRIS    95000.00
2                   20                   JONES    JEFF     70000.00
3                   20                   NGUYEN   SHERRY   55000.00
4                   10                   JACOBS            34000.00
Notice that each of the four rows in the Employee Table is listed across all of the columns, and each row has a value for each column. A row is the smallest unit that can be inserted into, or deleted from, a table in a data warehouse database.
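Because the row is the smallest unit of change, inserts and deletes always work on whole rows. As a sketch (the table name Employee_Table and the sample values are assumptions for illustration):

```sql
-- Inserting one complete row (the smallest unit of change):
INSERT INTO Employee_Table (EMP, DEPT, LNAME, FNAME, SAL)
VALUES (5, 40, 'SMITH', 'PAT', 48000.00);

-- Deleting also operates on whole rows:
DELETE FROM Employee_Table
WHERE  EMP = 5;
```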
Primary Keys
"Instead of giving politicians the keys to the city, it might be better to
change the locks."
The Primary Key of a table is the column or group of columns whose values will identify each row of that table.

Every table has to have a primary key: Tables are very flexible when it comes to defining how a table's data can be laid out. However, every table must have a primary key. Each row within that table must always be uniquely identifiable.
Every table can only have one primary key: If the table happens to have several possible combinations that could work as a primary key, only one can be chosen. You cannot have more than one primary key on a table. The smallest group of columns, often just one, is usually the best.

PK means primary key: Primary keys will be marked with the letters PK.
"Life is a foreign language; all men mispronounce it."
A foreign key is a column or group of columns that happens to be a primary key in another table. Foreign keys help to relate one group of rows to another group of rows. Both groups of rows are required to have a common column containing like data so that they can match up with each other, and these groups of rows can be found in the same table or in multiple tables.
FK means foreign key: When drawing out the design of your tables, FK will stand for your foreign keys. They can also be numbered so that you can properly mark multi-column foreign keys (FK1, FK2, etc.).
Primary Key Foreign Keys Establish a Relationship

"One of the keys to happiness is a bad memory."

Primary Key Foreign Key relationships establish a relation; therefore, the tables can be joined. The picture below shows the joining of the Employee Table and the Department Table.
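That Employee-to-Department join can be written as ordinary SQL. The Department Table's column names (DEPT, DEPT_NAME) are assumptions here; the PK/FK equality in the ON clause is the point:

```sql
-- Join the Employee Table to the Department Table over the
-- Primary Key / Foreign Key relationship (DEPT).
SELECT  E.EMP
     ,  E.LNAME
     ,  D.DEPT
     ,  D.DEPT_NAME        -- assumed column on the Department Table
FROM    Employee_Table  E
INNER JOIN Department_Table D
        ON E.DEPT = D.DEPT;
```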
"The noblest search is the search for excellence."
The term normalizing a database came from IBM's Dr. Codd. He was intrigued by Nixon's talk of trying to normalize relations with China, so he began his search for excellence to normalize relations between tables. He called the process normalization. Most modelers believe the noblest search is the search for perfection, so they often take years to perfect the data model. This is a mistake. When creating a logical data model, don't strive for perfection; aim for excellence. And do it quickly!
Normalizing a database is a process of placing columns that are not key-related (PK/FK) columns into tables in such a way that flexibility is utilized, redundancy is minimized, and update anomalies are vaporized! Industry experts consider Third Normal Form (3NF) mandatory; however, it is not an absolute necessity when utilizing a Teradata data warehouse. Teradata has also been proven to be extremely successful with Second Normal Form implementations. The first three forms are described as follows:
1. First Normal Form (1NF) eliminates repeating groups; for each Primary Key there is only one occurrence of the column data.
2. Second Normal Form (2NF) is when the columns/attributes relate to the entire Primary Key, not just a portion of it. All tables with a single-column Primary Key are considered second normal form.
3. Third Normal Form (3NF) states that all columns/attributes relate to the entire primary key and to no other key or column.
The interesting thing about a normalized model is that at each level, consistency is maintained. For example, in first normal form there are no repeating groups. When a table is normalized from first to second normal form, it will have no repeating groups and attributes must relate to the entire primary key. In third normal form it will have no repeating groups (like first normal form), attributes must relate to the entire primary key (like second normal form), and attributes must relate to the primary key and not to each other.
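As a sketch of what third normal form means in practice, suppose the Employee Table also carried a DEPT_NAME column. DEPT_NAME describes the department, not the employee, so it relates to DEPT rather than to the Primary Key EMP. Normalizing moves it into its own table (the table name, column names, and types here are assumptions):

```sql
-- Before 3NF: DEPT_NAME would repeat for every employee in a
-- department, and it relates to DEPT, not to the Primary Key EMP.
-- After 3NF: each non-key column relates only to its table's PK.
CREATE TABLE Department_Table
( DEPT       INTEGER NOT NULL   -- Primary Key
, DEPT_NAME  CHAR(30)           -- relates only to DEPT
) UNIQUE PRIMARY INDEX (DEPT);

-- The Employee Table keeps just the DEPT Foreign Key; the
-- department's name lives in one place and is joined when needed.
```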
Once the tables are defined, related, and normalized, the Logical Data Model is complete. The next phase involves the business users, who establish the foundation for the creation of the Extended Logical Data Model. Read on as we discuss this next step of the Teradata design process.
"Examine what is said, not him who speaks."
An entity represents a person, place, or anything, for that matter, that can be uniquely identified. We mentioned earlier in the book that, in simple terms, a noun is the name of a type of person, place, or thing. Entities are nouns that we want to track. We will want to track our favorite nouns such as employees, customers, locations, departments, products, and stores.

A great building starts with a great foundation. A great table starts by tracking a great entity. If a business can name a noun that it wants to keep track of, then this is the start of a great table.
Examples of Major and Minor Entities
"Fate chooses your relations, you choose your friends."
A relation can be a state of being, an association, an action, or an event that will tie two or more entities together. Relations come in three different forms: one-to-one, one-to-many, and many-to-many. No matter what the form of the relation is, the tables being related will have a Primary Key Foreign Key relationship.
The term Normalization in a relational database actually comes from Dr. Codd of IBM. He was inspired by President Richard Nixon, who at the time was trying to build a positive relationship with China. President Nixon termed what he was doing as normalizing relations with China. Because there is only one China and only one United States of America, their normalizing of relations could be considered a one-to-one relationship.
One-to-One Relations

"When they discover the center of the universe, a lot of people will be
disappointed to discover they are not in it."
A one-to-one relation is found when each occurrence of one entity (entity A) can be related to only one occurrence of another entity (entity B), or none at all. The same applies when you relate entity B to entity A. This is the rarest form of relation, and it may not be found in most data models you create.
One-to-Many Relations

"One's dignity may be assaulted, vandalized, and cruelly mocked, but it
cannot be taken away unless it is surrendered."
A one-to-many relation is found when each occurrence of one entity (entity A) can be related to only one occurrence of another entity (entity B), but each occurrence of entity B can relate to multiple occurrences of entity A. This is a very common relation found in tables, and it will appear in almost all models. For example, you can have only one department assigned to an employee, but you can have multiple employees assigned to a department.
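The employee/department example can be seen directly in SQL. Using the Employee Table from earlier in the chapter, counting employees per department shows the "many" side of the relation:

```sql
-- Each employee has exactly one DEPT (the "one" side), but a DEPT
-- value can appear on many employee rows (the "many" side).
SELECT  DEPT
     ,  COUNT(*) AS Employees_In_Dept
FROM    Employee_Table
GROUP BY DEPT
ORDER BY DEPT;
```

In the four-row illustration, department 20 would come back with a count of 2, while departments 10 and 40 each return 1.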
"Cats are smarter than dogs. You cant get eight cats to pull a sled
through snow."
A many-to-many relation is found when each occurrence of one entity (entity A) can be related to many different occurrences of another entity (entity B). The same will apply when you relate entity B to entity A. This is also a very common form of relation.
"A lot of people approach risk as if its the enemy when its really
fortunes accomplice."
Because a Many-to-Many relationship does not have a direct Primary Key Foreign Key relationship, an associative table is utilized as the middleman. The associative table has a multi-column Primary Key. One of the associative table's Primary Key columns is the Primary Key of table A, and the other Primary Key column of the associative table is the Primary Key of table B. The example below shows the syntax for joining the Student Table to the Course Table via the associative table, the Student Course Table.
"The surprising thing about young fools is how many survive to become
old fools."
As you can see from our example below, we were able to join our many-to-many related tables via the associative table.
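A hedged reconstruction of that join (the table and column names Student_Table, Course_Table, Student_ID, and Course_ID are assumptions; the three-table pattern is the point). The associative table carries one Foreign Key to each side, so the query hops through it:

```sql
-- Students relate to Courses many-to-many, so the join passes
-- through the associative Student_Course_Table, whose multi-column
-- Primary Key is (Student_ID, Course_ID).
SELECT  S.Student_ID
     ,  C.Course_ID
FROM    Student_Table  S
INNER JOIN Student_Course_Table SC
        ON S.Student_ID = SC.Student_ID
INNER JOIN Course_Table C
        ON SC.Course_ID = C.Course_ID;
```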
"If computers get too powerful, we can organize them into a committee
that will do them in."
An attribute is a characteristic of an entity or a relation describing its character, amount or extent.
"This is the sixth book Ive written, which isnt bad for a guy whos only
read two."
"To me, old age is always 15 years older than I am."
In this chapter were going to learn about special cases of the relation section, which are called recursiverelations. A recursive relation is a relation between variable occurrences of an entity. This also extends to
relations to subsets of entities and dependents of entities.
"My idea of an agreeable person is a person who agrees with me."
The second type of special cases is called complex relations. A complex relation is a relation thats shared
between more than two entities. This also includes subsets and dependents of entities, which is why it can becomplex.
"They always say time changes things, but you actually have to change them
yourself."-
The third scenario of special relations is the time relation. This type of scenario is a relation between an entity, asubset, or a dependent, and a time value.
A Normalized Data Warehouse
"The reputation of a thousand years may be determined by the conduct of one
hour."
A normalized data warehouse will have many different tables that are related to one another. Tables that have a relation can be joined. Most normalized databases will have many tables with fewer columns. This provides flexibility and is a natural way for business users to view the business. Each table created will have a Primary Key. All of the columns in the table should relate to the Primary Key. A Foreign Key is another column in the table that is also a Primary Key in another table. The Primary Key Foreign Key relation is how joins are performed on two tables.
Relational Databases use the Primary Key Foreign Key relationships to join tables together.
"Never insult an alligator until after you have crossed the river."Never insult a modeler until after you have the ERWin diagram in hand. Many believe that a normalized model
is best while others argue that dimensional models are better. A combination of both is an excellent strategy.
Dimensional Modeling was originally designed for retail supermarket applications because their systems didnot have the performance to perform joins, full table scans, aggregations, and sorting. Dimensional Modelingoften implements fewer tables and can be adapted to enhance performance. This is because dimensional
modeling was designed around answering specific questions.
"Facts are stupid things."
The Dimension Table

The dimension table helps to describe the measurements on the fact table. The dimension table can and will contain many columns and/or attributes. Dimension tables tend to be relatively shallow when it comes to the average number of rows. Each dimension is defined by its primary key, which serves as the basis for referential integrity with any fact table to which it is joined. Dimension attributes serve as the source of query constraints, groupings, and report labels. When a query or report is requested, attributes can be identified as the words following "by". For example, a user may ask to see last week's total sales by Product_Id and by Customer_Number. Product_Id and Customer_Number will have to be available as dimension attributes. Dimension Table attributes are key to making a data warehouse usable and understandable. Dimensions also implement the user interface to the data warehouse.
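That "last week's total sales by Product_Id and by Customer_Number" request translates naturally into SQL against a dimensional model. The fact table name Sales_Fact and its columns are assumptions for illustration:

```sql
-- The "by" words become GROUP BY columns; the measurement
-- (Sale_Amount) is summed from the fact table.
SELECT  Product_Id
     ,  Customer_Number
     ,  SUM(Sale_Amount) AS Total_Sales
FROM    Sales_Fact
WHERE   Sale_Date BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
GROUP BY Product_Id, Customer_Number;
```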
"The weak can never forgive. Forgiveness is the attribute of the strong."
Each dimension on a dimensional model is going to have attributes that make that table unique from the rest. Each table will vary, depending on what information it holds. Attributes will look something like the following:
"People can have the Model T in any color so long as its black."
The following two pictures represent an Entity-Relational (ER) Model and a Dimensional Model (DM):
"The man who is swimming against the stream knows the strength of it."
Dimensional Modeling has a number of warehousing advantages that the ER model lacks. First off, the dimensional model is very predictable. Query tools and report writers are able to make strong assumptions about the dimensional model to make user interfaces more understandable and processing more efficient. Metadata is able to use the cardinality of values within a dimension to help guide user-interface behavior. Because the framework is predictable, it strengthens processing. The database engine is able to make strong assumptions about constraining the dimension tables first and then linking to the fact table all at once with the Cartesian product of the dimension table keys that satisfy the user's constraints. This approach enables the engine to evaluate arbitrary n-way joins to a fact table within a single pass through the fact table's index.
There are several standard approaches for handling certain modeling situations. Each situation has a well-understood set of alternative decisions that can be programmed into report writers, query tools, and user interfaces. These situations include:

- Slowly changing dimensions, involving dimension tables that change slowly over time. Dimensional modeling provides techniques for handling these slowly changing dimensions, depending on the company.
- Miscellaneous products, where a business needs to track a number of different lines of business within a single common set of attributes and facts.
- Pay-in-advance databases, where the transactions of the company are more than small revenue accounts, but the business may want to look at single transactions as well as a regular report of revenue.
The last strength of the dimensional model is the growing pool of DBA utilities and software processes that regulate and use aggregates. Remember, an aggregate is summary data that is logically redundant within the data warehouse and is used to enhance query performance. A well-formed strategy for comprehensive aggregates is needed for any medium to large data warehouse implementation. Another way to look at it: if you don't have any aggregates, lots of money could end up being wasted on minuscule hardware upgrades.

Aggregate management software packages and navigation utilities depend on a specific single structure of the fact and dimension tables, which in turn is dependent on the dimensional model. If you stick to ER modeling, you will not benefit from these tools.
"The problem with political jokes is that they get elected."
The dimensional model can fit a data warehouse very nicely. However, this doesn't mean that the dimensional model is perfect. There are several situations where an ER model is better than the dimensional model, and vice versa.
A star-join schema tends to fit the needs of the users who have been interviewed; it's possible that it doesn't fit all users. There will always be users who don't contribute to the dimensional modeling process. The star-join design tends to optimize the access of data for solely one group of users at the expense of everyone else. A star-join schema with just one shape and a set number of constraints can be extremely optimal for one group of users and horrendous for another. Star-join schemas are shaped around user requirements, and because a) not every user can be interviewed during the dimensional model design phase, and b) user requirements vary from group to group, different star schemas will be optimal for different groups of users.
A single star-join schema will never fit everyone's requirements in a data warehouse. The data warehousing industry discovered long ago that a single database will not work for all purposes. There are several reasons why departments need their own star-join schemas and can't share a star-join schema with one another:
- Sequencing of data. Finance users love to see data sequenced one way, while marketing users love to see it sequenced another way.
- Data definitions. The sales department considers a sale closed business, while the accounting department sees it as booked business.
- Granularity. Finance users look at things in terms of monthly and yearly revenue. Accounting users look at things in terms of quarterly revenue.
- Geography. The sales department looks at things in terms of ZIP code, while the marketing department might look at things in terms of states.
- Products. Sales tends to look at things in terms of future products, while the finance department tends to look at things in terms of existing products.
- Time. Finance looks at the world through calendar dates, while accounting looks at the world through closing dates.
- Sources of data. One source system will feed one star join while another source system feeds another.

The differences between certain business operations and others extend far beyond the short list above. Because there are many types of users on a database, many situations can arise at any time, and on any subject.
"A problem well stated is a problem half-solved."
Each department of a company tends to conduct its business differently than other departments. Each department sees an area of the data warehouse differently than the others because of its unique user requirements. Each department tends to care about different aspects of the company, which is why each department sees things differently from the others. Because of the need for each department to view things in its own unique way, each department requires a different star-join schema for its data warehouse. A star-join schema that is optimal for the finance department is practically useless for marketing. The reality of the business world is that one star-join schema will not fit all aspects of a company's data warehouse.

It is possible to design a star-join schema specifically for each department, but even then certain problems will arise. Most of the problems don't even become apparent until multiple star joins have been designed for the data warehouse. When you have multiple independent star-join schema environments, the same detailed data will appear on each star join. Data will no longer be reconcilable, and any new star-join creation will require the same amount of work as the old star joins. Because of this:
Follow Socrates' advice and assume you know nothing when starting this portion of the design process. However, if you make a column your Primary Index that is never accessed in the WHERE clause or used to JOIN to other tables, then Socrates had you in mind with the last three words of his quote.

It will be your job to interview users, look at applications, and find out what columns are being used in the WHERE clause of the SELECTs. The key here is to investigate what tables are being joined. It is also important to know what columns are joining the tables together. You will be able to find some join information from the users, but common sense plays a big part in join decisions. Understand how columns are accessed, and your warehouse is on its way to providing true wisdom!
COLUMN ACCESS in the WHERE CLAUSE:
Value Access Frequency - How frequently the table will be accessed via this column.

Value Access Rows * Frequency - The number of rows that will be accessed multiplied by the frequency at which the column will be used.

Join Access Frequency - How frequently the table will be joined to another table by this column being used in the WHERE clause.

Join Access Rows - The number of rows joined.
Quite often, designers new to Teradata believe that selecting the Primary Index will be easy. They just pick the column that will provide the best data distribution. They assume that if they keep the Primary Index the same as the Primary Key column (which is unique by definition), then the data will distribute evenly (which is true). The Primary Index is about distribution, but even more important is Join Access. Remember this golden rule!

The Primary Index is the fastest way to access data, and Secondary Indexes are next.
"Don't count the days, make the days count."

Muhammad Ali has given you some advice that is reflective of your next objective. Count the DATA so you can make the DATA count. So how do you accomplish this task? The following will be your guide:

- Write SQL and utilize a wide variety of tools to get the data demographics.
- Combine these demographics with the column and join access information to complete the Extended Logical Data Model.
- Then use this information to create the Primary Indexes, Secondary Indexes, and other options in the Physical Database Design.
DATA DEMOGRAPHICS:
Distinct Values - The total number of unique values that will be stored in this column.

Maximum Rows per Value - The number of rows that will have the most popular value in this column.

Typical Rows per Value - The typical number of rows for each column value.

Maximum Rows NULL - The number of rows with NULL values for the column.

Change Rating - A relative rating for how often the column value will change. The value range is from 0-9, with 0 describing columns that do not change, and 9 describing columns that change with every write operation.
Data Demographics answer these questions:

- How evenly will the data be spread across the AMPs?
- Will my data have spikes causing AMP space problems?
- Will the column change too often to be an index?
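You can collect most of these demographics with ordinary SQL. A sketch for the DEPT column of the Employee Table (the table name is an assumption carried over from the earlier illustration):

```sql
-- Distinct Values and Maximum Rows NULL for the DEPT column.
SELECT  COUNT(DISTINCT DEPT)   AS Distinct_Values
     ,  COUNT(*) - COUNT(DEPT) AS Max_Rows_Null
FROM    Employee_Table;

-- Maximum and Typical Rows per Value for DEPT.
SELECT  MAX(Rows_Per_Value) AS Max_Rows_Per_Value
     ,  AVG(Rows_Per_Value) AS Typical_Rows_Per_Value
FROM    ( SELECT DEPT, COUNT(*) AS Rows_Per_Value
          FROM   Employee_Table
          WHERE  DEPT IS NOT NULL
          GROUP BY DEPT ) AS T;
```

The Change Rating cannot be queried; it comes from interviewing the users and the load processes about how often each column's value is rewritten.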
Extended Logical Data Model Template
Below is an example of an Extended Logical Data Model. The Value Access and Data Demographics have been collected. We can now use this to pick our Primary Indexes and Secondary Indexes. During the first pass at the table you should pick your potential Primary Indexes. Label them UPI or NUPI based on whether or not the column is unique. At this point in time, don't look at the change rating.
The reason for the Extended Logical Data Model is to provide input for the Physical Model so that the Parsing Engine (PE) Optimizer can choose the least costly access method or join path for user queries. The Optimizer will look at factors such as row selection criteria and index and column demographics when making the best index choices.
A great Primary Index will have:

- A Value Access Frequency that is high
- A Join Access Frequency that is high
- Reasonable distribution
- A Change Rating below 2
The example below illustrates an ELDM template for the Employee Table (assuming 20,000 employees):
                         EMP     DEPT    LNAME   FNAME   SAL
PK & FK                  PK      FK      SA      -       -

ACCESS
Value Acc Freq           6K      5K      100     0       0
Join Acc Freq            7K      6K      0       0       0
Value Acc Rows           70K     50K     0       0       0

DATA DEMOGRAPHICS
Distinct Rows            20K     5K      12K     N/A     N/A
Max Rows Per Value       1       50      1K      N/A     N/A
Max Rows Null            0       12      0       N/A     N/A
Typical Rows Per Value   1       15      3       N/A     N/A
Change Rating            0       2       1       N/A     N/A
Once we have our table templates, we are ready for the Physical Database Design. Read on, this is becoming interesting!
The Physical Data Model
"Nothing can stand against persistence;
even a mountain will be worn down over time."
We have arrived at the moment of truth. We are arriving at the top of the mountain. It is now time to create the Physical Database Design model. The biggest keys to a good physical model are choosing the correct:
- Primary Indexes
- Secondary Indexes
- Denormalization Tables
- Derived Data Options
The physical model is important because it is the piece that makes Teradata perform at the optimum level on a daily basis. If you have done a great job with the physical model, Teradata should perform like lightning. If you have done the job on the physical model and Teradata is still not performing at your anticipated speed, then you might want to get an upgrade. Remember, Teradata needs to be designed to perform best on a daily basis. You don't justify an UPGRADE because of a slow Year-End or Quarter-End report! You justify an upgrade if you have done due diligence on the physical model and have reached the point in time when your system is not performing well on a daily basis.
No two minds are alike, but two tables are usually joined in exactly the same manner by everybody. This is why the most important factor when picking a Primary Index is Join Access Frequency.
You still need to look at whether the index will distribute well. With poor distribution you are at risk of running out of Perm or Spool.
Then you must look at Value Access Frequency to see if the column is accessed frequently. If it is, then it is a great candidate for a primary or secondary index.
If the Distribution for a column is BAD, it has already been eliminated as a Primary Index candidate.

If you have a column with a high Join Access Frequency, then this is your Primary Index of choice.

If you have a column with a high Value Access Frequency, you can always create a secondary index for it.

If no join columns survived the Distribution Analysis, then pick the column with the best Value Access as your Primary Index.

If all of the above fail, or two columns are equally important, then pick the column with the best distribution as your Primary Index.
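Applying these rules to the ELDM template for the Employee Table: EMP has the highest Join Access and Value Access Frequencies, perfect distribution (20K distinct values in 20K rows), and a Change Rating of 0, so it is the Primary Index of choice; LNAME still has some Value Access, so it can be covered with a secondary index. A sketch in Teradata-style DDL (the column types are assumptions):

```sql
-- EMP wins on Join Access, Value Access, distribution, and
-- Change Rating, so it becomes the Unique Primary Index.
CREATE TABLE Employee_Table
( EMP    INTEGER NOT NULL
, DEPT   INTEGER
, LNAME  CHAR(20)
, FNAME  CHAR(20)
, SAL    DECIMAL(10,2)
) UNIQUE PRIMARY INDEX (EMP);

-- LNAME has a Value Access Frequency of 100, so a Non-Unique
-- Secondary Index (NUSI) can serve those queries.
CREATE INDEX (LNAME) ON Employee_Table;
```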
Denormalization
"Most databases denormalize because they have to, but Teradata
denormalizes because it wants to."
Denormalization is the process of implementing methods to improve SQL query performance. The biggest keys to consider when deciding to denormalize a table are PERFORMANCE and VOLATILITY. Will performance improve significantly, and does the volatility factor make denormalization worthwhile? It is in the physical model that you can also determine the places to denormalize for extra speed.
Before you go crazy denormalizing, remember these valid considerations for optimal system performance first, or you are wasting your time. Make sure statistics will be collected properly. Then make sure you have chosen your Primary Indexes based on user ACCESS and UNIQUENESS for distribution purposes, and that your Primary Indexes are stable values (Change Rating low). Also, know your environment and your business priorities. For example, most often the performance benefits of secondary indexes in OLTP environments outweigh the performance costs of Batch Maintenance.
Improved performance is an admirable goal; however, one must be aware of the hazards of denormalization. Denormalization will always reduce the amount of flexibility that you have with your data and can also complicate the development effort. In addition, it will increase the risk of data anomalies. Lastly, it could also incur extra I/O and space overhead in some cases.
Others believe that denormalization has a positive effect on application coding because they feel it reduces
the potential for data problems.
Either way, you should consider denormalization if users run certain queries over and over again and speed is a necessity. The key word here is performance. Performance for known queries is the most complete answer.
It is a great idea, whenever you denormalize from your logical model, to include the denormalization in "The
Denormalization Exception Report". This report keeps track of all deviations from 3rd normal form in your data warehouse.
Derived Data
Derived data is data that is calculated from other data. For instance, taking all of the employees' salaries and
averaging them would calculate the Average Employee Salary. It is important to be able to determine whether it is better to calculate derived data on demand or to place this information into a summary table.
The 4 key factors for deciding whether to calculate or store stand-alone derived data are:
- Response Time Requirements
- Access Frequency of the request
- Volatility of the column
- Complexity of the calculation
Response Time Requirements - Derived data can take a period of time to calculate while a query is running. If
user requirements demand speed and requests are taking too long, then you might consider denormalizing to speed up the request. If there is no need for speed, then be formal and stay with normal.
Access Frequency of the request - If one user needs the data occasionally, then calculate on demand; but if
there are many users requesting the information daily, then consider denormalizing so many users can be satisfied.
Volatility of the column - If the data changes often, then there is no reason to store the data in another table or
temporary table. If the data never changes, and you can run the query one time and store the answer for a long period of time, then you may want to consider denormalizing. If the game stays the same, there is no need to be
formal: make it denormal.
Complexity of the calculation - The more complex the calculation, the longer the request may take to process.
If the calculation takes a long time and you don't have the time to wait, then you might consider placing the result in a table.
When you look at the above considerations, you begin to see a clear picture. If there are several requests for
derived data, and the data is relatively stable, then denormalize and make sure that when any additional requests are made, the answer is ready to go.
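To make the trade-off concrete, here is one possible sketch of storing derived data rather than recalculating it (the Salary_Summary table name is an assumption for illustration; the Employee table and its SAL column appear later in this chapter). The average is computed once and stored, so repeated requests simply read the stored answer:

CREATE TABLE Salary_Summary
( Avg_Salary DECIMAL(9,2) )
PRIMARY INDEX (Avg_Salary);

INSERT INTO Salary_Summary
SELECT AVG(SAL)
FROM Employee;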
Temporary Tables
Setting up Derived, Volatile, or Global Temporary tables allows users to use a temporary table during their entire
session. This is a technique where everyone wins. A great example might be this: let's say you have a table that has 120,000,000 rows. Yes, the number is 120 million rows. It is a table that tracks detail data for an entire year.
You have been asked to run calculations on a per-month basis. You can create a temporary table, insert only the
month you need to calculate, and run queries until you log off the session. Your queries, in theory, will run twelve times faster. After you log off, the data in the temporary table goes away.
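The month-at-a-time scenario above might be sketched like this, assuming a hypothetical Yearly_Detail table with a Sale_Date column (both names are assumptions). The volatile table holds only January's rows, and every subsequent query in the session reads the small table instead of all 120 million rows:

CREATE VOLATILE TABLE January_Detail_vt AS
( SELECT *
FROM Yearly_Detail
WHERE Sale_Date BETWEEN DATE '2004-01-01' AND DATE '2004-01-31' )
WITH DATA
ON COMMIT PRESERVE ROWS;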
TABLE 1 - Employee Table

EMP   DEPT  LNAME     FNAME    SAL
(PK)  (FK)
1     40    BROWN     CHRIS    65000.00
2     20    JONES     JEFF     70000.00
3     20    NGUYEN    SHERRY   55000.00
4     40    JACOBS    MORGAN   30000.00
5     10    SIMPSON   LUCY     40000.00
6     30    HAMILTON           20000.00
TABLE 2 - Department Table

DEPT  DEPT_NAME
(PK)
10    Human Resources
20    Sales
30    Finance
40    Information Technology
TABLE 3 - Dept_Salary Table (Temporary Table)
DEPT  Sum_Sal    Count_Sal  Avg_Sal
10    40000.00   1          40000.00
20    125000.00  2          62500.00
30    20000.00   1          20000.00
40    95000.00   2          47500.00
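Table 3 above could be built from the Employee table (Table 1) with an INSERT/SELECT such as the following sketch, assuming the Dept_Salary temporary table has already been created with four matching columns:

INSERT INTO Dept_Salary
SELECT DEPT
,SUM(SAL)
,COUNT(SAL)
,AVG(SAL)
FROM Employee
GROUP BY DEPT;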
Volatile Temporary Tables
Volatile tables have multiple characteristics in common with derived tables. They are materialized in spool and are unknown to the Data Dictionary. They require NO Data Dictionary access or transaction logging. The
table definition is designed for optimal performance because the definition is kept in memory. However, unlike a derived table, which is restricted to a single query statement, a volatile table may be utilized multiple
times, and in more than one SQL statement, throughout the life of a session. This feature allows additional queries to utilize the same rows in the temporary table without requiring the rows to be rebuilt. The ability to
use the rows multiple times is the biggest advantage over derived tables. An example of how to create a volatile table would be as follows:
CREATE VOLATILE TABLE Sales_Report_vt, LOG
( Sale_Date DATE
,Sum_Sale DECIMAL(9,2)
,Avg_Sale DECIMAL(7,2)
,Max_Sale DECIMAL(7,2)
,Min_Sale DECIMAL(7,2)
)
ON COMMIT PRESERVE ROWS ;
Now that the Volatile Table has been created, the table must be populated with an INSERT/SELECT statement like the following:
INSERT INTO Sales_Report_vt
SELECT Sale_Date
,SUM(Daily_Sales)
,AVG(Daily_Sales)
,MAX(Daily_Sales)
,MIN(Daily_Sales)
FROM Sales_Table
GROUP BY Sale_Date;
The CREATE statement of a volatile table has a few options that need further explanation.
The LOG option indicates there will be transaction logging of before images. ON COMMIT PRESERVE
ROWS means that at the end of a transaction, the rows in the volatile table will not be deleted. The information in the table remains for the entire session. Users can query the volatile table until they log off. Then
the table and data go away.
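Because the rows persist for the session, additional queries can reuse the same volatile table without rebuilding it. For example (the filter value here is purely illustrative):

SELECT Sale_Date, Sum_Sale
FROM Sales_Report_vt
WHERE Sum_Sale > 10000.00;

SELECT MAX(Max_Sale)
FROM Sales_Report_vt;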
Global Temporary Tables
Global Temporary Tables are similar to volatile tables in that they are local to a user's session. However, when
the table is created, the definition is stored in the Data Dictionary. In addition, these tables are materialized in a permanent area known as Temporary Space. For these reasons, global tables can survive a system
restart, and the table definition will not be discarded at the end of the session. However, when a session terminates normally, the rows inside the Global Temporary Table will be removed. Lastly, global tables require no spool
space; they use Temp Space.
Users from other sessions cannot access another user's materialized global table. However, unlike volatile
tables, once the table is de-materialized, the definition still resides in the Data Dictionary. This allows for future
materialization of the same table. If the global table definition needs to be dropped, then an explicit DROP command must be executed. How real does Teradata consider global temporary tables? They can even be
referenced from a view or macro.
An example of how to create a global temporary table would be as follows:
CREATE GLOBAL TEMPORARY TABLE Sales_Report_gt, LOG
( Sale_Date DATE
,Sum_Sale DECIMAL(9,2)
,Avg_Sale DECIMAL(7,2)
,Max_Sale DECIMAL(7,2)
,Min_Sale DECIMAL(7,2) )
PRIMARY INDEX (Sale_Date)
ON COMMIT PRESERVE ROWS;
Now that the Global Temporary Table has been created, the table must be populated with an INSERT/SELECT statement like the following:

INSERT INTO Sales_Report_gt
SELECT Sale_Date
,SUM(Daily_Sales)
,AVG(Daily_Sales)
,MAX(Daily_Sales)
,MIN(Daily_Sales)
FROM Sales_Table
GROUP BY Sale_Date;
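When the session's materialized rows, or the definition itself, are no longer needed, the cleanup might look like the following sketch. DROP TEMPORARY TABLE removes only the materialized instance for the current session, while DROP TABLE removes the definition from the Data Dictionary:

DROP TEMPORARY TABLE Sales_Report_gt;

DROP TABLE Sales_Report_gt;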