
    Chapter 4 The Extended Logical Data Model

    "Too much of a good thing is just right."

Too much of a good thing is just right, but even a little of a bad thing will mess up your data warehouse. We have arrived at building the Extended Logical Data Model. The Extended Logical Data Model will serve as the input to the Physical Data Model. We are in a critical stage. We must produce excellence in this section because we don't want bad input going into our Physical Data Model. This comes from Computer Science 101 - Garbage In, Garbage Out (GIGO). Building a poor Extended Logical Data Model makes about as much sense as Mae West running east looking for a sunset! Things will be headed in the wrong direction. If, however, you can produce quality in your ELDM, your warehouse is well on its way to being just right!

This chapter will begin with the Application Development Life Cycle, and then talk about the Logical Model and Normalization. From there, we get to the meat and discuss the metrics, which are a critical part of the Extended Logical Data Model. The ELDM will become our final input into the Physical Data Model.

    The Application Development Life Cycle

    "Failure accepts no alibis.

    Success requires no explanation."

The design of the physical database is key to the success of implementing a data warehouse, along with the applications that reside on the system. Whenever something great is built, whether a building, an automobile, or a business system, there is a process that facilitates its development. When designing the physical database, the process to conduct this is known as the Application Development Life Cycle. If you follow the process you will need no alibis because success will be yours.

    The six major phases of this process are as follows:

- Design - Developing the Logical Data Model (LDM), Extended Logical Data Model (ELDM), and the Physical Data Model (PDM).

- Development - Generating the queries according to the requirements of the business users.

- Test - Measuring the impact that the queries have on system resources.

- Production - Following the plan and moving the procedures into the production environment.

- Deployment - Training the users on the system application and query tools (i.e., SQL, MicroStrategy, Business Objects, etc.).

- Maintenance - Re-examining strategies such as loading data and index selection.

It is important that you understand the fundamentals and the order in which to perform them in the development life cycle of an application. First there is the Business Discovery, second is the Logical Data Model, third you get an outstanding Physical Model, fourth you design the Application, and lastly you perform your Development and Assurance Testing.

During the testing phase of an application it is important to check that Teradata is using parallelism, that there are no large spool space peaks, and that AMP utilization is equal. A HOT AMP is a bad sign, and so is running out of spool.

    Asking the Right Questions

    "He who asks a question may be a fool for five minutes, but he who never

    asks a question remains a fool forever."

The biggest key to this section is knowledge and not being afraid to ask the right questions. Knowledge about the user environment is vitally important. If you can ask the right questions, you will build a model that will map to the users' needs. In addition, you will be able to deliver a world-class data warehouse that remains cool forever.

Here is how this works. The logical modelers will create a logical data model. Then it is up to you to ask the right questions and find out about the demographics of the data. Only then can you build the proper Physical Data Model.

    Remember:

    The Logical Data Model will be the input to the Extended Logical Data Model.

    The Extended Logical Data Model will be input to the Physical Data Model.

    The Physical Data Model is where Denormalization and the advantage of parallelism are determined.

    Logical Data Model

    "When you are courting a nice girl an hour seems like a second. When

    you sit on a red-hot cinder a second seems like an hour. Thats relativity."

The first step of the design phase is called the Logical Data Model (LDM). The LDM is a logical representation of the tables that reside in the data warehouse database. Tables, rows and columns are the equivalent of files, records and fields in most programming languages. A properly normalized data model allows users to ask a wide variety of questions today, tomorrow, and forever.

    The following illustration displays the Employee Table. The columns are emp, dept, lname, fname and sal:


Employee Table

EMP (Primary Key)   DEPT (Foreign Key)   LNAME    FNAME    SAL
1                   40                   BROWN    CHRIS    95000.00
2                   20                   JONES    JEFF     70000.00
3                   20                   NGUYEN   XING     55000.00
4                   10                   JACOBS   SHERRY   34000.00

Notice that each of the four rows in the Employee Table is listed across all of the columns, and each row has a value for each column. A row is the smallest unit that can be inserted into, or deleted from, a table in a data warehouse database.

    Primary Keys

    "Instead of giving politicians the keys to the city, it might be better to

    change the locks."

The Primary Key of a table is the column or group of columns whose values will identify each row of that table.

Every table has to have a primary key: Tables are very flexible when it comes to defining how a table's data can be laid out. However, every table must have a primary key. Each row within that table must always be uniquely identifiable.

Every table can only have one primary key: If the table happens to have several possible combinations that could work as a primary key, only one can be chosen. You cannot have more than one primary key on a table. The smallest group of columns, often just one, is usually the best.

PK means primary key: Primary keys will be marked with the letters PK.

Foreign Keys

    "Life is a foreign language; all men mispronounce it."

A foreign key is a column or group of columns that happens to be a primary key in another table. Foreign keys help to relate a group of rows to another group of rows. Both groups of rows are required to have a common column containing like data so that they can match up with each other, and these groups of rows can be found on the same table or on multiple tables.

FK means foreign key: When drawing out the design of your tables, FK will stand for your foreign keys. They can also be numbered so that you can properly mark multi-column foreign keys (FK1, FK2, etc.).

Primary Key / Foreign Keys Establish a Relationship

"One of the keys to happiness is bad memory."

Primary Key / Foreign Key relationships establish a relation; therefore, the tables can be joined. The picture below shows the joining of the Employee Table and the Department Table.
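The join pictured can also be expressed in SQL. Here is a minimal sketch, assuming the tables are named Employee_Table and Department_Table and using the columns from this chapter's examples:

SELECT  E.EMP
       ,E.LNAME
       ,E.FNAME
       ,D.DEPT_NAME
FROM    Employee_Table E
INNER JOIN Department_Table D
        ON E.DEPT = D.DEPT ;

The ON clause matches the foreign key DEPT in the Employee Table to the primary key DEPT in the Department Table, which is exactly the relationship that makes the join possible.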

Normalization

    "The noblest search is the search for excellence."

The term normalizing a database came from IBM's Dr. Codd. He was intrigued by Nixon's term of trying to normalize relations with China, so he began his search for excellence to normalize relations between tables. He called the process normalization. Most modelers believe the noblest search is the search for perfection, so they often can take years to perfect the data model. This is a mistake. When creating a logical data model, don't strive for perfection; aim for excellence. And do it quickly!

Normalizing a database is a process of placing columns that are not key-related (PK/FK) columns into tables in such a way that flexibility is utilized, redundancy is minimized, and update anomalies are vaporized! Industry experts consider Third Normal Form (3NF) mandatory; however, it is not an absolute necessity when utilizing a Teradata Data Warehouse. Teradata has also been proven to be extremely successful with Second Normal Form implementations. The first three normal forms are described as follows:

1. First Normal Form (1NF) eliminates repeating groups; for each Primary Key there is only one occurrence of the column data.

2. Second Normal Form (2NF) is when the columns/attributes relate to the entire Primary Key, not just a portion of it. All tables with a single-column Primary Key are considered second normal form.

3. Third Normal Form (3NF) states that all columns/attributes relate to the entire primary key and no other key or column.

The interesting thing about a normalized model is that at each level, consistency is maintained. For example, in first normal form there are no repeating groups. When a table is normalized from first to second normal form, it will have no repeating groups and attributes must relate to the entire primary key. In third normal form it will have no repeating groups (like first normal form), attributes must relate to the entire primary key (like second normal form), and attributes must relate to the primary key and not to each other.
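As a small illustration using this chapter's tables: if the Employee Table also carried DEPT_NAME, that column would relate to DEPT rather than to the Primary Key EMP, violating third normal form. Moving it to its own table restores 3NF. A minimal DDL sketch, with data types assumed purely for illustration:

CREATE TABLE Employee_Table
( EMP    INTEGER        -- PK: every remaining column relates to EMP alone
 ,DEPT   INTEGER        -- FK to the Department Table
 ,LNAME  CHAR(20)
 ,FNAME  CHAR(20)
 ,SAL    DECIMAL(10,2) )
UNIQUE PRIMARY INDEX (EMP) ;

CREATE TABLE Department_Table
( DEPT      INTEGER     -- PK: DEPT_NAME relates to DEPT, not to EMP
 ,DEPT_NAME CHAR(30) )
UNIQUE PRIMARY INDEX (DEPT) ;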


Once the tables are defined, related, and normalized, the Logical Data Model is complete. The next phase involves the business users, who establish the foundation for creation of the Extended Logical Data Model. Read on as we discuss this next step of the design process of Teradata.

    "Examine what is said, not him who speaks."

An entity represents a person, place, or anything else that can be uniquely identified. We mentioned earlier in the book that, in simple terms, a noun is the name of a type of person, place or thing. Entities are nouns that we want to track. We will want to track our favorite nouns such as employees, customers, locations, departments, products and stores.

A great building starts with a great foundation. A great table starts by tracking a great entity. If a business can name a noun that it wants to keep track of, then this is the start of a great table.


Examples of Major and Minor Entities

Relations

    "Fate chooses your relations, you choose your friends."

A relation can be a state of being, an association, an action, or an event that will tie two or more entities together. Relations come in three different forms: one-to-one, one-to-many and many-to-many. No matter what the form of the relation is, the tables being related will have a Primary Key / Foreign Key relationship.

The term Normalization in a relational database actually comes from Dr. Codd of IBM. He was inspired by President Richard Nixon, who at the time was trying to build a positive relationship with China. President Nixon termed what he was doing as normalizing relations with China. Because there is only one China and only one United States of America, their normalizing of relations could be considered a one-to-one relationship.

"When they discover the center of the universe, a lot of people will be disappointed to discover they are not in it."

A one-to-one relation is found when each occurrence of one entity (entity A) can be related to only one occurrence of another entity (entity B), or none at all. The same applies when you relate entity B to entity A. This is the rarest form of relations, and may not be found in most data models you create.

"One's dignity may be assaulted, vandalized, and cruelly mocked, but it cannot be taken away unless it is surrendered."

A one-to-many relation is found when each occurrence of one entity (entity A) can be related to only one occurrence of another entity (entity B). But when it comes to each occurrence of entity B, you can find multiple occurrences relating to entity A. This is a very common relation found in tables, and will appear in almost all models. For example, you can only have one department assigned to an employee, but you can have multiple employees assigned to a department.


    "Cats are smarter than dogs. You cant get eight cats to pull a sled

    through snow."

A many-to-many relation is found when each occurrence of one entity (entity A) can be related to many different occurrences of another entity (entity B). The same will apply when you relate entity B to entity A. This is also a very common form of a relation.


    "A lot of people approach risk as if its the enemy when its really

    fortunes accomplice."

Because a Many-to-Many relationship does not have a direct Primary Key / Foreign Key relationship, an associative table is utilized as the middle man. The associative table has a multi-column Primary Key. One of the associative table's Primary Key columns is the Primary Key of table A, and the other Primary Key column of the associative table is the Primary Key of table B. The example below shows the syntax for joining the Student Table to the Course Table via the associative table, the Student Course Table.

    "The surprising thing about young fools is how many survive to become

    old fools."

As you can see from our example below, we are able to join our Many-to-Many relationship tables via the associative table.
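A minimal sketch of that three-table join, with the table and column names (Student_Table, Course_Table, Student_Course_Table, Student_Id, Course_Id) assumed for illustration:

SELECT  S.Student_Id
       ,C.Course_Id
FROM    Student_Table S
INNER JOIN Student_Course_Table SC
        ON S.Student_Id = SC.Student_Id
INNER JOIN Course_Table C
        ON SC.Course_Id = C.Course_Id ;

Each half of the associative table's multi-column Primary Key joins back to the Primary Key of one of the two base tables.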


    "If computers get too powerful, we can organize them into a committee

    that will do them in."

    An attribute is a characteristic of an entity or a relation describing its character, amount or extent.

    "This is the sixth book Ive written, which isnt bad for a guy whos only

    read two."


    "To me, old age is always 15 years older than I am."

In this section we're going to learn about special cases of relations, which are called recursive relations. A recursive relation is a relation between various occurrences of the same entity. This also extends to relations to subsets of entities and dependents of entities.

    "My idea of an agreeable person is a person who agrees with me."

The second type of special case is called complex relations. A complex relation is a relation that's shared between more than two entities. This also includes subsets and dependents of entities, which is why it can be complex.


    "They always say time changes things, but you actually have to change them

    yourself."-

    The third scenario of special relations is the time relation. This type of scenario is a relation between an entity, asubset, or a dependent, and a time value.

    A Normalized Data Warehouse

    "The reputation of a thousand years may be determined by the conduct of one

    hour."

A normalized data warehouse will have many different tables that are related to one another. Tables that have a relation can be joined. Most normalized databases will have many tables with fewer columns. This provides flexibility and is a natural way for business users to view the business. Each table created will have a Primary Key. All of the columns in the table should relate to the Primary Key. A Foreign Key is another column in the table that is also a Primary Key in another table. The Primary Key / Foreign Key relation is how joins are performed on two tables.


Relational Databases use the Primary Key / Foreign Key relationships to join tables together.

"Never insult an alligator until after you have crossed the river."

Never insult a modeler until after you have the ERwin diagram in hand. Many believe that a normalized model is best, while others argue that dimensional models are better. A combination of both is an excellent strategy.

Dimensional Modeling was originally designed for retail supermarket applications because their systems did not have the performance to perform joins, full table scans, aggregations, and sorting. Dimensional Modeling often implements fewer tables and can be adapted to enhance performance. This is because dimensional modeling was designed around answering specific questions.


    "Facts are stupid things."

The Dimension Table

The dimension table helps to describe the measurements on the fact table. The dimension table can and will contain many columns and/or attributes. Dimension tables tend to be relatively shallow when it comes to the average number of rows within a dimension table. Each dimension is defined by its primary key, which serves as the basis for referential integrity with any fact table to which it is joined.

Dimension attributes serve as the source of query constraints, groupings, and report labels. When a query or report is requested, attributes can be identified as the words following "by". For example, a user may ask to see last week's total sales by Product_Id and by Customer_Number. Product_Id and Customer_Number will have to be available as dimension attributes. Dimension Table attributes are key to making a data warehouse usable and understandable. Dimensions also implement the user interface to the data warehouse.


    "The weak can never forgive. Forgiveness is the attribute of the strong."

Each dimension on a dimensional model is going to have attributes that make that table unique from the rest. Each table will vary, depending on what information is on which table. Attributes will look something like the following:

"People can have the Model T in any color so long as it's black."

The following two pictures represent an Entity-Relational (ER) Model and a Dimensional Model (DM):


    "The man who is swimming against the stream knows the strength of it."

Dimensional Modeling contains a number of warehousing advantages that the ER model lacks. First off, the dimensional model is very predictable. Query tools and report writers are able to make strong assumptions about the dimensional model to make user interfaces more understandable and processing more efficient. Metadata is able to use the cardinality of values within a dimension to help guide user-interface behavior. Because the framework is predictable, it strengthens processing. The database engine is able to make strong assumptions about constraining dimension tables and then linking to the fact table all at once, along with the Cartesian product of the dimension table keys that satisfy the user's constraints. This approach enables the user to evaluate arbitrary n-way joins to a fact table within a single pass through the fact table's index.

There are several standard approaches for handling certain modeling situations. Each situation has a well-understood set of alternative decisions that can be programmed in report writers, query tools, and user interfaces. These situations can include:

- Slow-changing dimensions, involving dimension tables that change slowly over time. Dimensional modeling helps to provide techniques for handling these slow-changing dimensions, depending on the company.

- Miscellaneous products, where a business needs to track a number of different lines of business within a single common set of attributes and facts.

- Pay-in-advance databases, where transactions of the company are more than small revenue accounts; however, the business may want to look at single transactions as well as a report of revenue regularly.

The last strength of the dimensional model is the growing pool of DBA utilities and software processes that regulate and use aggregates. Remember, an aggregate is considered summary data that is logically redundant within the data warehouse and is used to enhance query performance. A well-formed strategy for comprehensive aggregates is needed for any medium to large data warehouse implementation. Another way to look at it is that if you don't have any aggregates, lots of money could end up being wasted on minuscule hardware upgrades.

Aggregate management software packages and navigation utilities depend on a specific single structure of the fact and dimension tables, which in turn is dependent on the dimensional model. If you stick to ER modeling, you will not benefit from these tools.

    "The problem with political jokes is that they get elected."

The dimensional model can fit a data warehouse very nicely. However, this doesn't mean that the dimensional model is perfect. There are several situations where an ER model is better than the dimensional model, and vice versa.

A star-join schema tends to fit the needs of users who have been interviewed; it's possible that it doesn't fit all users. There will always be users who don't contribute to the dimensional modeling process. The star-join design tends to optimize the access of data for solely one group of users at the expense of everyone else. A star-join schema with just one shape and a set number of constraints can be extremely optimal for one group of users, and horrendous for another. Star-join schemas are shaped around user requirements, and because a) not every user can be interviewed during the dimensional model design phase, and b) user requirements vary from group to group, different star-join schemas will be optimal for different groups of users.

A single star-join schema will never fit everyone's requirements in a data warehouse. The data warehousing industry has long since discovered that a single database will not work for all purposes. There are several reasons why companies need their own star-join schemas and can't share a star-join schema with another company:

- Sequencing of data. Finance users love to see data sequenced one way, while marketing users love to see it sequenced another way.

- Data definitions. The sales department considers a sale as closed business, while the accounting department sees it as booked business.

- Granularity. Finance users look at things in terms of monthly and yearly revenue. Accounting users look at things in terms of quarterly revenue.

- Geography. The sales department looks at things in terms of ZIP codes, while the marketing department might look at things in terms of states.

- Products. Sales tends to look at things in terms of future products, while the finance department tends to look at things in terms of existing products.

- Time. Finance looks at the world through calendar dates, while accounting looks at the world through closing dates.

- Sources of data. A source system will feed one star join while another source system feeds another.

The differences between certain business operations and others are much more vast than the short list above. Because there are many types of users on a database, many situations can arise at any time, and on any subject.

    "A problem well stated is a problem half-solved."

Each department of a company tends to conduct its business differently than other departments. Each department sees an area of the data warehouse differently than the others because of its unique and different user requirements. Each department tends to care for different aspects of the company, which is why each department sees things differently from others. Because of the need for each department to view things in its own unique way, each department requires a different star-join schema for its data warehouse. A star-join schema that is optimal for the finance department is practically useless for marketing. The reality of the business world is that one star-join schema will not fit all aspects of a company's data warehouse.

It is possible to design a star-join schema specifically for each department, but even then certain problems will arise. Most of the problems don't even become apparent until multiple star joins have been designed for the data warehouse. When you have multiple independent star-join schema environments, the same detailed data will appear on each star join. Data will no longer be reconcilable, and any new star-join creation will require the same amount of work as the old star joins.


Follow Socrates' advice and assume you know nothing when starting this portion of the design process. However, if you make a column your Primary Index that is never accessed in the WHERE clause or used to JOIN to other tables, then Socrates had you in mind on the last three words of his quote.

It will be your job to interview users, look at applications, and find out what columns are being used in the WHERE clause of the SELECTs. The key here is to investigate what tables are being joined. It is also important to know what columns are joining the tables together. You will be able to find some join information from the users, but common sense plays a big part in join decisions. Understand how columns are accessed, and your warehouse is on its way to providing true wisdom!

COLUMN ACCESS in the WHERE CLAUSE:

Value Access Frequency          How frequently the table will be accessed via this column.

Value Access Rows * Frequency   The number of rows that will be accessed multiplied by the frequency at which the column will be used.

Join Access Frequency           How frequently the table will be joined to another table by this column being used in the WHERE clause.

Join Access Rows                The number of rows joined.

Quite often, designers new to Teradata believe that selecting the Primary Index will be easy. They just pick the column that will provide the best data distribution. They assume that if they keep the Primary Index the same as the Primary Key column (which is unique by definition), then the data will distribute evenly (which is true). The Primary Index is about distribution, but even more important is Join Access. Remember this golden rule!

    The Primary Index is the fastest way to access data and Secondary Indexes are next.

"Don't count the days, make the days count."

Muhammad Ali has given you some advice that is reflective of your next objective. Count the DATA so you can make the DATA count. So how do you accomplish this task? The following will be your guide:

- Write SQL and utilize a wide variety of tools to get the data demographics.

- Combine these demographics with the column and join access information to complete the Extended Logical Data Model.

- Then use this information to create the Primary Indexes, Secondary Indexes, and other options in the Physical Database Design.

DATA DEMOGRAPHICS:

Distinct Values           The total number of unique values that will be stored in this column.

Maximum Rows per Value    The number of rows that will have the most popular value in this column.

Typical Rows per Value    The typical number of rows for each column value.

Maximum Rows NULL         The number of rows with NULL values for the column.

Change Rating             A relative rating for how often the column value will change. The value range is from 0 to 9, with 0 describing columns that do not change and 9 describing columns that change with every write operation.

Data Demographics answer these questions:

- How evenly will the data be spread across the AMPs?

- Will my data have spikes causing AMP space problems?

- Will the column change too often to be an index?
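The SQL to gather these demographics can be quite simple. A sketch, using the DEPT column and assuming the Employee Table is named Employee_Table:

SELECT COUNT(DISTINCT DEPT)             /* Distinct Values */
FROM   Employee_Table ;

SELECT DEPT
      ,COUNT(*) AS Rows_Per_Value       /* Maximum and Typical Rows per Value */
FROM   Employee_Table
GROUP BY DEPT
ORDER BY Rows_Per_Value DESC ;

SELECT COUNT(*)                         /* Maximum Rows NULL */
FROM   Employee_Table
WHERE  DEPT IS NULL ;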

    Extended Logical Data Model Template

Below is an example of an Extended Logical Data Model. The Value Access and Data Demographics have been collected. We can now use this to pick our Primary Indexes and Secondary Indexes. During the first pass at the table you should pick your potential Primary Indexes. Label them UPI or NUPI based on whether or not the column is unique. At this point in time, don't look at the change rating.

The reason for the Extended Logical Data Model is to provide input for the Physical Model so that the Parsing Engine (PE) Optimizer can best choose the least costly access method or join path for user queries. The optimizer will look at factors such as row selection criteria and index and column demographics to make the best index choices.

A great Primary Index will have:

- A Value Access frequency that is high

- A Join Access frequency that is high

- Reasonable distribution

- A change rating below 2

The example below illustrates an ELDM template of the Employee Table (assuming 20,000 employees):


                        EMP      DEPT     LNAME    FNAME    SAL
PK & FK                 PK SA    FK

ACCESS
Value Acc Freq          6K       5K       100      0        0
Join Acc Freq           7K       6K       0        0        0
Value Acc Rows          70K      50K      0        0        0

DATA DEMOGRAPHICS
Distinct Rows           20K      5K       12K      N/A      N/A
Max Rows Per Value      1        50       1K       N/A      N/A
Max Rows Null           0        12       0        N/A      N/A
Typical Rows Per Value  1        15       3        N/A      N/A
Change Rating           0        2        1        N/A      N/A

Once we have our table templates, we are ready for the Physical Database Design. Read on, this is becoming interesting!

    The Physical Data Model

    "Nothing can stand against persistence;

    even a mountain will be worn down over time."

We have arrived at the moment of truth. We are arriving at the top of the mountain. It is now time to create the Physical Database Design model. The biggest keys to a good physical model are choosing the correct:

- Primary Indexes

- Secondary Indexes

- Denormalization Tables

- Derived Data Options

The physical model is important because that is the piece that makes Teradata perform at the optimum level on a daily basis. If you have done a great job with the physical model, Teradata should perform like lightning. If you have done the job on the physical model and Teradata is not performing to your anticipated speed, then you might want to get an upgrade. Remember, Teradata needs to be designed to perform best on a daily basis. You don't justify an UPGRADE because of a slow Year-End or Quarter-End Report! You justify an upgrade if you have done due diligence on the physical model and have reached the point in time when your system is not performing well on a daily basis.


No two minds are alike, but two tables are usually joined in exactly the same manner by everybody. This is why the most important factor when picking a Primary Index is Join Access Frequency.

You still need to look at whether the index will distribute well. With poor distribution you are at risk of running out of Perm or Spool.

Then you must look at Value Access Frequency to see if the column is accessed frequently. If it is, then it is a great candidate for a primary or secondary index.

If the Distribution for a column is BAD, it has already been eliminated as a Primary Index candidate.

If you have a column with high Join Access Frequency, then this is your Primary Index of choice.

If you have a column that has a high Value Access Frequency, you can always create a secondary index for it.

If there are no join columns that survived the Distribution Analysis, then pick the column with the best Value Access as your Primary Index.

If all of the above fail, or two columns are equally important, then pick the column with the best distribution as your Primary Index.
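Once the choices are made, they are declared in the table DDL. A minimal sketch, with the table, columns, and index picks assumed purely for illustration:

CREATE TABLE Order_Table
( Order_Id    INTEGER
 ,Customer_Id INTEGER
 ,Order_Date  DATE
 ,Order_Total DECIMAL(10,2) )
PRIMARY INDEX (Customer_Id) ;       /* NUPI picked for its high Join Access Frequency */

CREATE INDEX (Order_Date) ON Order_Table ;   /* secondary index for a column with high Value Access */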

    Denormalization

    "Most databases denormalize because they have to, but Teradata

    denormalizes because it wants to."

Denormalization is the process of implementing methods to improve SQL query performance. The biggest keys to consider when deciding to denormalize a table are PERFORMANCE and VOLATILITY. Will performance improve significantly, and does the volatility factor make denormalization worthwhile? It is in the physical model that you can also determine the places to denormalize for extra speed.

Before you go crazy denormalizing, remember these valid considerations for optimal system performance first or you are wasting your time. Make sure statistics will be collected properly. Then make sure you have chosen your Primary Indexes based on user ACCESS and UNIQUENESS for distribution purposes, and that your Primary Indexes are stable values (low Change Rating). Also, know your environment and your business priorities. For example, most often the performance benefits of secondary indexes in OLTP environments outweigh the performance costs of Batch Maintenance.

Improved performance is an admirable goal; however, one must be aware of the hazards of denormalization. Denormalization will always reduce the amount of flexibility that you have with your data and can also complicate the development effort. In addition, it will also increase the risk of data anomalies. Lastly, it could also take on extra I/O and space overhead in some cases.

Others believe that denormalization has a positive effect on application coding because some feel it will reduce the potential for data problems.


Either way, you should consider denormalization if users run certain queries over and over again and speed is a necessity. The key word here is performance. Performance for known queries is the most complete answer.

It is a great idea whenever you denormalize from your logical model to include the denormalization in "The Denormalization Exception Report". This report keeps track of all deviations from 3rd normal form in your data warehouse.

    Derived Data

Derived data is data that is calculated from other data. For instance, taking all of the employees' salaries and averaging them would calculate the Average Employee Salary. It is important to be able to determine whether it is better to calculate derived data on demand or to place this information into a summary table.
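The on-demand version of that calculation is a single aggregate query. A sketch, assuming the table and column names from this chapter's examples:

SELECT AVG(SAL) AS Avg_Employee_Salary
FROM   Employee_Table ;

The alternative is to run a query like this once and store the answer in a summary table; the four factors below help you decide between the two.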

The 4 key factors for deciding whether to calculate or store stand-alone derived data are:

- Response Time Requirements

- Access Frequency of the request

- Volatility of the column

- Complexity of the calculation

Response Time Requirements - Derived data can take a period of time to calculate while a query is running. If user requirements need speed and their requests are taking too long, then you might consider denormalizing to speed up the request. If there is no need for speed, then be formal and stay with normal.

Access Frequency of the request - If one user needs the data occasionally, then calculate on demand; but if there are many users requesting the information daily, then consider denormalizing so many users can be satisfied.

Volatility of the column - If the data changes often, then there is no reason to store the data in another table or temporary table. If the data never changes and you can run the query one time and store the answer for a long period of time, then you may want to consider denormalizing. If the game stays the same, there is no need to be formal; make it denormal.

Complexity of the calculation - The more complex the calculation, the longer the request may take to process. If the calculation takes a long time to calculate and you don't have the time to wait, then you might consider placing it in a table.

When you look at the above considerations you begin to see a clear picture. If there are several requests for derived data, and the data is relatively stable, then denormalize and make sure that when any additional requests are made the answer is ready to go.


    Temporary Tables

Setting up Derived, Volatile or Global Temporary tables allows users to use a temporary table during their entire session. This is a technique where everyone wins. A great example might be this: let's say you have a table that has 120,000,000 rows. Yes, the number is 120 million rows. It is a table that tracks detail data for an entire year. You have been asked to run calculations on a per-month basis. You can create a temporary table, insert only the month you need to calculate, and run queries until you log off the session. Your queries in theory will run twelve times faster. After you log off, the data in the temporary table goes away.
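A sketch of that approach, with the table name Sales_Table, the column Sale_Date, and the date range assumed purely for illustration:

CREATE VOLATILE TABLE January_Sales_vt AS Sales_Table
WITH NO DATA
ON COMMIT PRESERVE ROWS ;

INSERT INTO January_Sales_vt
SELECT *
FROM   Sales_Table
WHERE  Sale_Date BETWEEN DATE '2004-01-01' AND DATE '2004-01-31' ;

Every query for the rest of the session can now hit the one-month volatile copy instead of the 120-million-row base table.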

TABLE 1 - Employee Table

EMP (PK)   DEPT (FK)   LNAME      FNAME    SAL
1          40          BROWN      CHRIS    65000.00
2          20          JONES      JEFF     70000.00
3          20          NGUYEN     XING     55000.00
4          40          JACOBS     SHERRY   30000.00
5          10          SIMPSON    MORGAN   40000.00
6          30          HAMILTON   LUCY     20000.00

TABLE 2 - Department Table

DEPT (PK)   DEPT_NAME
10          Human Resources
20          Sales
30          Finance
40          Information Technology

TABLE 3 - Dept_Salary Table (Temporary Table)

DEPT   Sum_Sal     Count_Sal   Avg_Sal
10     40000.00    1           40000
20     125000.00   2           62500
30     20000.00    1           20000
40     95000.00    2           47500
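The Dept_Salary temporary table above could be populated from the Employee Table with an INSERT/SELECT along these lines (a sketch; the names Dept_Salary_vt and Employee_Table are assumed for illustration):

INSERT INTO Dept_Salary_vt
SELECT  DEPT
       ,SUM(SAL)
       ,COUNT(SAL)
       ,AVG(SAL)
FROM    Employee_Table
GROUP BY DEPT ;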

    Volatile Temporary Tables

Volatile tables have multiple characteristics in common with derived tables. They are materialized in spool and are unknown to the Data Dictionary. They require NO data dictionary access or transaction logging. The table definition is designed for optimal performance because the definition is kept in memory. However, unlike a derived table, which is restricted to a single query statement, a volatile table may be utilized multiple times, and in more than one SQL statement, throughout the life of a session. This feature allows additional queries to utilize the same rows in the temporary table without requiring the rows to be rebuilt. The ability to use the rows multiple times is the biggest advantage over derived tables. An example of how to create a volatile table would be as follows:

CREATE VOLATILE TABLE Sales_Report_vt, LOG
( Sale_Date DATE
 ,Sum_Sale  DECIMAL(9,2)
 ,Avg_Sale  DECIMAL(7,2)
 ,Max_Sale  DECIMAL(7,2)
 ,Min_Sale  DECIMAL(7,2)
)
ON COMMIT PRESERVE ROWS ;

Now that the Volatile Table has been created, the table must be populated with an INSERT/SELECT statement like the following:


    INSERT INTO Sales_Report_vt

    SELECT Sale_Date

    ,SUM(Daily_Sales)

    ,AVG(Daily_Sales)

    ,MAX(Daily_Sales)

    ,MIN(Daily_Sales)

    FROM Sales_Table

    GROUP BY Sale_Date;

The CREATE statement of a volatile table has a few options that need further explanation. The LOG option indicates there will be transaction logging of before images. ON COMMIT PRESERVE ROWS means that at the end of a transaction, the rows in the volatile table will not be deleted. The information in the table remains for the entire session. Users can query the volatile table until they log off. Then the table and data go away.

    Global Temporary Tables

Global Temporary Tables are similar to volatile tables in that they are local to a user's session. However, when the table is created, the definition is stored in the Data Dictionary. In addition, these tables are materialized in a permanent area known as Temporary Space. For these reasons, global tables can survive a system restart, and the table definition will not be discarded at the end of the session. However, when a session normally terminates, the rows inside the Global Temporary Table will be removed. Lastly, global tables require no spool space. They use Temp Space.

Users from other sessions cannot access another user's materialized global table. However, unlike volatile tables, once the table is de-materialized, the definition still resides in the Data Dictionary. This allows for future materialization of the same table. If the global table definition needs to be dropped, then an explicit DROP command must be executed. How real does Teradata consider global temporary tables? They can even be referenced from a view or macro.

    An example of how to create a global temporary table would be as follows:

CREATE GLOBAL TEMPORARY TABLE Sales_Report_gt, LOG
( Sale_Date DATE
 ,Sum_Sale  DECIMAL(9,2)
 ,Avg_Sale  DECIMAL(7,2)
 ,Max_Sale  DECIMAL(7,2)
 ,Min_Sale  DECIMAL(7,2)
)
PRIMARY INDEX (Sale_Date)
ON COMMIT PRESERVE ROWS ;

Now that the Global Temporary Table has been created, the table must be populated with an INSERT/SELECT statement like the following:

INSERT INTO Sales_Report_gt
SELECT  Sale_Date
       ,SUM(Daily_Sales)
       ,AVG(Daily_Sales)
       ,MAX(Daily_Sales)
       ,MIN(Daily_Sales)
FROM    Sales_Table
GROUP BY Sale_Date ;

