6234 Course Notes

6234 SQL Server Analysis Services

Module 1 Introduction to SQL Server Analysis Services
Lecture: 60 minutes
Lab: 40 minutes (install SSAS – start installation then take break)

Companies want information. Data is easy to come by; information is not. Companies want to:

Track key performance indicators (# defects, sales, # returning students)
Identify trends – number of incidents reported is going up, sales are down
Make predictions – men between 18-25 are more likely to be in a car accident

Tools available
Relational Reporting – provides summarized data to users from OLTP databases, easy to use, but some reports are slow to run (matrix reports, history reports)
OLAP – stores aggregates (totals) and is faster than relational reporting for large volumes of data, especially matrix-type reports
Data Mining – searching OLTP & OLAP for trends and patterns
o Which is a bigger factor in who buys a bike? Age or income?
o What combinations of courses are often taken by a student?

OLAP Concepts – distribute Handout on OLTP vs OLAP
Data Warehouse – contains large amounts of historical data, usually combined from multiple sources and denormalized in a snowflake or star schema
Data Mart – subset of a data warehouse on a particular subject
Facts – numerical measurements, or measures that are summarized, e.g. $ sales, quantity sold
Dimension – the different ways to categorize a fact, e.g. by product, by customer, by region
Cubes – multidimensional structure that stores summarized fact & dimension data. When a user wants to query a data warehouse they use a cube
Slicing and Dicing – isolating individual results in a cube, e.g. comparing each region’s sales of bicycles (slicing), comparing each region’s sales of bicycles by month (dicing)
Pivot Tables – user interface for browsing cubes, slicing and dicing
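
To make the fact/dimension/cube vocabulary above concrete, here is a minimal Python sketch, not how SSAS stores cubes internally. The fact rows, the dimension names and the helper functions build_cube and total are all made up for illustration. It pre-computes a subtotal for every combination of dimensions, which is the basic idea behind a cube answering slice and dice questions quickly.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, month, sales_amount).
FACTS = [
    ("East", "Bike", "Jan", 100.0),
    ("East", "Bike", "Feb", 150.0),
    ("West", "Bike", "Jan", 200.0),
    ("West", "Helmet", "Jan", 30.0),
]
DIMENSIONS = ("region", "product", "month")


def build_cube(facts):
    """Pre-compute the sales total for every combination of dimensions."""
    cube = defaultdict(float)
    for *dim_values, amount in facts:
        row = dict(zip(DIMENSIONS, dim_values))
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = tuple((d, row[d]) for d in dims)
                cube[key] += amount
    return dict(cube)


def total(cube, **members):
    """Look up a stored subtotal, e.g. total(cube, region="East")."""
    key = tuple((d, members[d]) for d in DIMENSIONS if d in members)
    return cube.get(key, 0.0)


cube = build_cube(FACTS)
# Slice: bicycle sales for one region.
print(total(cube, region="East", product="Bike"))
# Dice: the same slice broken down further by month.
print(total(cube, region="East", product="Bike", month="Jan"))
# Grand total (the "All" level of every dimension).
print(total(cube))
```

Every lookup is a single dictionary read because the totals were computed up front. The cost is a longer build step and more storage, the same trade-off the notes describe for OLAP versus relational reporting.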

SQL Server Analysis Services Features
OLAP – design, build, deploy and query cubes
Data Mining – identify patterns and trends to try and make predictions. SSAS supports a number of data mining algorithms for data analysis
Multiple Data Sources – Data Source Views allow you to access data from multiple data stores, OLAP and OLTP
KPIs – allow you to monitor a metric or a combination of metrics, based on a formula, that you have identified as a key performance indicator

Server
Stores the Analysis Services database
The Analysis Services service runs on the server and accesses the Analysis Services database
Analysis Services handles aggregations, transactions, calculations, metadata management, security, XML for Analysis

Client
Clients connect to Analysis Services with ADO MD, ADO MD.NET, XML/A, OLE DB for OLAP. Users access cubes through tools like Microsoft Excel or PerformancePoint

Tools
Business Intelligence Development Studio – Visual Studio for developing BI applications: SSAS, Reporting Services and SQL Server Integration Services
SQL Server Management Studio – for manipulating databases and managing deployed Analysis Services solutions
SQL Server Configuration Manager – to manage SQL Server client-server configurations (network protocols supported, service settings for SSAS)

Analysis Services Objects
Data Sources – connection information to your data source
Data Source Views – a view that determines what objects in your data sources are available to a cube (could come from multiple data sources)
Measures – numeric facts that users analyze, e.g. sales, units sold
Measure Groups – logical groupings of measures, e.g. sales is the measure – internet sales, retail sales are measure groups (often one measure group = one fact table)
Dimensions – represent what you are aggregating the data ‘by’. Dimensions have attributes and hierarchies, e.g. Time dimension has year/quarter/month, Location has country/province/city, Product dimension has name, price, size, category
Cubes – create sub-totals (aggregations) of measures by different combinations of dimensions to create a multidimensional structure that users can query quickly and easily

Installing SQL Server Analysis Services
Resources – the more dimensions, the bigger your cubes will be; the more cubes, the more memory and processing power you need
Instances – you can have one or multiple instances. Each instance has its own security, service packs, and listens on a different TCP port number
Client connectivity – use a pre-defined port number or the default 2383
Availability – clusters improve availability
Installation – use SQL Server 2008 setup program

Upgrading SQL Server Analysis Services from SSAS 2000

Side-by-side – can run on the same machine as SQL Server 2000 Analysis Services. SQL Server 2000 Analysis Services must be the default instance, since it does not support multiple instances. Good if you don’t want to lose functionality from 2000.

Upgrade – use the wizard to upgrade from 2000 to 2005. Good if you want to move completely to 2005, but if it goes wrong it takes a long time to fix.

Migrate – set up 2005 and migrate data from 2000 to 2005. Good if you have multiple 2000 databases and you want to upgrade some but not all of them. Requires more hardware.

Lab Notes – detailed instructions in Lab Answer Key on CD (30-40 install)

Go to E:\Labfiles\Evaluation and run Setup.exe to start the SQL Server Installation Wizard
You will be prompted to install a Windows Hotfix and .NET Framework as well; just choose ‘Yes’. It will install the components, and then you have to restart the computer and re-execute Setup.exe
Choose Installation – New SQL Server Stand-alone installation
Setup Support Rules – OK
Product Key – Specify Free Edition
License Terms – Accept
Setup Support Files – Install
Setup Support Rules – Next (warnings are okay)
Feature Selection – Database Engine Services, Analysis Services, Business Intelligence Development Studio, Management Tools – Complete
Instance Configuration – Default Instance
Disk Space Requirements – Next
Server Configuration – Use the same account for all services: NY-SQL-01\sqlserver Pa$$w0rd
Database Engine Configuration – Specify SQL Server Administrators – Add NY-SQL-02\Administrator
Analysis Services Configuration – Specify which users have administrator privileges for Analysis Services: NY-SQL-02\Administrator
Next, Next until you reach Ready to Install, then choose Install

Exercise 2 Verify Installation
View log file by following the link or opening it at C:\program files\microsoft\sql server\100\setup bootstrap\log\20081014_142821\SummaryNYSQL02.txt
Start – Microsoft SQL Server – SQL Server Management Studio
Connect – Server Type = Analysis Services, Server Name = NY-SQL-02
Databases folder is empty for now

Module 2 Creating Multidimensional Analysis Solutions
Lecture: 90 minutes
Lab: 45 mins (p. 2-16 Ex 1 Create Data Source, Ex 2 Create Data Source View, Ex 3 Create, Deploy & Process Cube)

Online Mode – Once your cubes have been deployed to your database, you can connect directly to the SQL Server Analysis Services database and make changes. The only thing in the project file is the database & server name, so you can’t do version control with this: there are no files to save!
Project Mode – To initially create cubes, you create an SSAS project and deploy it to the Analysis Services database. Changes are developed in the project, saved in a file, then deployed to a database. The project stores the cube definitions, not the data; you cannot browse a cube without deploying it.
Reverse-engineering a project – If you want to use Project Mode but do not have an existing SSAS project for your cubes, you can reverse-engineer one using File – New Project – Import Analysis Services 9.0 Database.

BI Studio has Solution Explorer, Designers, Wizards, built-in help

Source Control – using a tool like SourceSafe helps prevent overwriting each other’s work, by using Check In, Check Out when in Project Mode.

Demo Business Intelligence Studio
1. Open Project E:\mod07\labfiles\solution\AdventureWorks OLAP\AdventureWorks OLAP.sln
2. D:\LabFiles\Solution\AdventureWorksOLAP
3. Double click Data Source to show properties of data sources
4. Double click Data Source View to show Data Source View Designer
5. Double click a dimension to open Dimension Designer pane
6. Double click Cube to open Cube Designer
7. (this cube has errors and cannot be deployed)
8. Go to Tools – Options to show where you can customize BI Studio settings
9. Show Tools – Options – Source Control plug-in selection, which you can set to integrate BIDS with a source control tool

Data Sources
Contain connection strings to underlying databases that contain fact & dimension tables
Impersonation options – when you are in BIDS, your current user’s credentials are used to connect to the database to retrieve data; after deployment the impersonation credentials are used

Specific Windows User Name & Pwd – when service account does not have permissions to access database

Service account – usually selected, requires service account to have access to database

Credentials of current user – used for data mining, used for mining models and DMX OPENQUERY statements

Demo Creating a Data Source
1. Create a SQL Server Analysis Services project
2. In Solution Explorer, rt click Data Source – New Data Source
3. Create data source based on New or Existing Connection
4. New connection – select server NY-SQL-01, Windows Authentication, database name AdventureWorksDW2008
5. Test Connection & click ‘OK’
6. Click ‘Next’ – impersonation – specifies how SSAS connects to the data source when processing the cube or executing data mining queries. Usually choose the default (uses the service account for the SSAS service)
7. Give the data source a meaningful name, click ‘Finish’
8. Double click on the data source in Solution Explorer to show Data Source Designer
9. Point out Query Timeout setting and Maximum number of connections
10. The ‘Maintain a reference to another object in the solution’ option allows you to specify the connection string by reading it from another data source

Data Source Views
Creating a Data Source View
1. In Solution Explorer, rt click Data Source View – New Data Source View
2. Select the data source
3. Select table dbo.FactInternetSales
4. Click “Add related tables” and Save

Creating a Data Source View based on multiple Data Sources
1. Create a second data source: Server = NY-SQL-01, Database = AdventureWorks2008, Use Service Account (you cannot add multiple data sources in the wizard, you add them afterwards in the designer)
2. Go to Data Source View designer, rt click in the table diagram, choose Add/Remove Tables, choose a different data source (point out the first one is listed as primary data source), add Production.ProductReview table from AdventureWorks2008

Browsing Data
1. Rt click on FactInternetSales table and choose Explore Data to show data in the table
2. Click Pivot Table, click on Fields button in toolbar to display list of fields
3. Drag PromotionKey field to rows
4. Drag SalesTerritoryKey field to columns
5. Drag SalesAmount to totals
6. Show pivot table created
7. Click on Chart tab, show chart created
8. Click on Pivot Chart, drag SalesTerritoryKey to Series
9. Drag PromotionKey to bottom of chart
10. Drag SalesAmount to chart
11. Click on PromotionKey, select promotions 1,2
12. Click on SalesTerritoryKey, select territories 1,2
13. Show pivot chart created

Data Source View Table and Column properties
1. Double click on Data Source View in Solution Explorer to bring up Data Source View designer
2. Rt click a table or column and choose Properties – set Friendly Name property of a table and column to make names more user-friendly

Add a Named Query
From toolbar in Data Source View Designer, choose New Named Query (if you don’t have permissions to create a view in the database, you can create a named query in your data source view)
1. Name – LargeSales
2. SELECT * FROM FactInternetSales WHERE SalesAmount > 1000
3. Click OK/Finish – show it in the Data Source View, rt-click Explore Data

Add a Named Calculation
1. Select DimCustomer table in the Data Source View designer
2. Choose New Named Calculation from the toolbar
3. Name – Full Name
4. Concatenate name columns: FirstName + ' ' + LastName
5. Save the changes, show the Full Name column, rt-click DimCustomer, Explore Data and show the concatenated column in the data
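
For students who know views better than data source views, the two demos above can be approximated outside SSAS. The sketch below uses Python's built-in sqlite3 module with made-up sample rows; the view name DimCustomerWithFullName is hypothetical. A named query behaves much like a view, and a named calculation behaves like an extra computed column, but in SSAS both live only in the data source view and the source database is not changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FactInternetSales (SalesOrderNumber TEXT, SalesAmount REAL);
    CREATE TABLE DimCustomer (CustomerKey INTEGER, FirstName TEXT, LastName TEXT);
    INSERT INTO FactInternetSales VALUES ('SO1', 3578.27), ('SO2', 24.99);
    INSERT INTO DimCustomer VALUES (1, 'Jon', 'Yang');

    -- Stand-in for the LargeSales named query.
    CREATE VIEW LargeSales AS
        SELECT * FROM FactInternetSales WHERE SalesAmount > 1000;

    -- Stand-in for the Full Name named calculation.
    -- SQLite concatenates with ||; the T-SQL expression in the notes uses +.
    CREATE VIEW DimCustomerWithFullName AS
        SELECT *, FirstName || ' ' || LastName AS FullName FROM DimCustomer;
""")

large_sales = conn.execute("SELECT SalesOrderNumber FROM LargeSales").fetchall()
full_names = conn.execute("SELECT FullName FROM DimCustomerWithFullName").fetchall()
print(large_sales)
print(full_names)
```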

Add a logical Primary key (if an underlying table does not have a primary key, or a view or named query has no primary key, you can add a logical primary key, you cannot add a logical key if a primary key already exists)

1. Go to named query LargeSales, select SalesOrderNumber & SalesOrderLineNumber, rt-click set logical primary key

Creating Relationships
Foreign keys in the underlying database appear as relationships in the Data Source View. You can add or delete relationships in the Data Source View; this is especially useful for named queries and tables from multiple data sources

1. Go to named query Large Sales, select productKey column, rt-click add relationship, make relationship to DimProduct - productKey

Creating Diagrams
When you have a lot of tables, a diagram allows you to focus on a particular subset of the data source view
1. Go to Diagram Organizer pane – rt click Add New Diagram
2. Drag FactInternetSales & DimTime tables to the diagram

Cubes
We now have our Data Source View, which contains fact & dimension tables. Now we want to pre-calculate aggregates/subtotals in a cube to speed up reporting by different dimensions. A cube is made up of one or more measures from a fact table (sales, quantity), which will be aggregated by one or more dimensions (product, time) from a dimension table. Remember that measure & dimension attribute names here will be viewed by the users, so try to give them meaningful names

Dimensions - You can have the Cube Wizard create your dimensions, or design the dimensions separately and re-use them in different cubes

Attributes & Hierarchies – can be created through cube wizard or added later in cube designer

Date & Time Dimensions
One of the most common dimensions users want is time: they want to see data by month, calendar quarter, fiscal quarter, year, day of week, hour, etc. SSAS has special support for this dimension

You probably want to create your own DimTime for a Time dimension; you can include the attributes of interest to you and include extra attributes like ‘stat holiday’, ‘manufacturing shift’
The Dimension Wizard can be used to create a Time dimension table for you
If you just want standard attributes for the time dimension, you can choose Server Time dimension, which contains standard hierarchies and attributes that are stored on the server instead of within a dimension table. You specify the range of dates for the server time dimension.

Creating a Cube with the Cube Wizard
1. Rt click Cubes in Solution Explorer – New Cube
2. In Cube Wizard – choose Build the cube using a data source (if you choose autobuild, the wizard will suggest dimensions, measures, attributes and hierarchies; for more control deselect this checkbox)
3. Select the AdventureWorks DW 2008 Data Source View to use for the cube
4. Specify FactInternetSales as your measure group table
5. Select the Order Quantity and Sales Amount measures (they can be renamed here by clicking in their column name if you want, or later in Cube Designer)
6. Select DimProduct, DimDate, DimPromotion as dimensions
7. Name the cube InternetSales
8. Rt-click Cube in Solution Explorer – choose Process to build and deploy cube
9. Go to Dimension Designer, drag Color to Product dimension
10. Drag CalendarYear to DimDate
11. You will get an error… remember how there were two places in the data source to specify login information? Well, BIDS can log in, but the service account we selected for impersonation can’t! So we need to set up a user for the service account in SQL Server
a. Go to SSMS, connect to AdventureWorksDW2008
b. Security – Users – New User
c. Username: sqlserver, Login name: NY-SQL-01\sqlserver, default schema: dbo
d. Schemas owned by this user: db_owner, db_securityadmin
e. Database role membership: db_owner, db_securityadmin
12. Now process the cube again
13. Go to Browser, drag SalesAmount measure, Product Color and Order Date-Calendar Year to cube
14. Go to Cube Designer pane, rt-click Sales Amount, show how you can change the name of the measure & set the FormatString=Currency
15. Right click anywhere in measures pane, show how you can switch between Show Measures in Grid and Show Measures in Hierarchy views
16. Re-process and reconnect to cube to show currency formatting
17. Walk through the different panes in the Cube Designer
18. Open Microsoft Excel, Data Tab – From Other Sources – Analysis Services – NY-SQL-01. Add the data source and cube to a pivot table, then select a measure group and add a dimension to the rows and columns of the pivot table

Cube Designer Tabs – each is covered later in the course
Cube Structure Tab – to add and modify properties of measures and dimensions, attributes and hierarchies
Dimension Usage Tab – to show which facts/measures are related to which dimensions
Calculations Tab – to create calculated fields
KPIs Tab – KPIs measure the progress a business is making toward meeting its goals. To create a KPI, you define the metric value to examine and a goal to achieve. Then you define MDX expressions to calculate the current status of the KPI and the trend. Then you choose a visual indicator to show the trend
Actions Tab – actions are initiated by users, for example a URL action to navigate to a website, a reporting action to link a report in Reporting Services to a cube, a drillthrough action to provide access to detailed data
Partitions Tab – you can partition the data to make searching more efficient
Aggregations Tab – for pre-calculating aggregates (slows processing, speeds queries)
Perspectives Tab – just as you can create views of relational data, you can create perspectives of cubes to simplify or focus on a particular part of a cube for users
Translations Tab – to add captions in multiple languages
Browser Tab – to browse cube data from SSAS

Creating a Cube without a Data Source (optional demo) – use this when you are trying to figure out what fact and dimension tables you will need based on the cube you are designing, rather than designing a cube based on an existing set of fact and dimension tables

1. Rt click Cubes in Solution Explorer – New Cube
2. In Cube Wizard – choose Build without a data source and no template
3. Add a measure – Sales – Sales Group – Single – Sum
4. Add new dimensions Time, Course Category, Course (SCD is a slowly changing dimension, which requires extra attributes to handle)
5. Select time periods for time dimension
6. Give the cube a name and click Finish. You can also generate the schema for the cube at the same time (required if you want the demo to work); this launches the Generate Schema wizard to create the database tables in the specified data source.
7. Go to Cube Designer – Browser – can drag columns and rows, no data because you haven’t created any data yet
8. Go to SQL Server Management Studio, show tables created in Adventure Works DW database: Course, Course Category, etc.

Lab Notes
Lab says use AdventureWorksDW; should be AdventureWorks DW 2008

Module 3 Working with Dimensions
Lecture: 70 minutes
Lab: p 3-20 45 mins (Ex 1 Configuring Dimensions, Ex 2 Hierarchies and Relationships, Ex 3 Sorting and Grouping)

Dimensions give us a way of aggregating/totaling our fact data, e.g. by product, by time, by color, by size.

Dimensions are made up of one or more attributes from a dimension table
Each column from the dimension table can be an attribute of a dimension (e.g. ProductId, Name, Color, Size, Category)
Each dimension needs a key attribute to link it to the fact table; this is usually a PK-FK relationship in the underlying database
Dimension attributes often form hierarchies, e.g. day, week, month, year or subcategory, category
After you define a dimension with the necessary attributes & hierarchies, you can re-use it in as many cubes as you like (the date dimension is frequently re-used)

Dimension Designer – used to edit dimensions, specify which attributes to include from the dimension table, hierarchies, and translations for attribute headings

1. Create a cube showing FactInternetSales, Customer, Time & Product
2. Deploy and process the cube
3. In Solution Explorer, dbl click on Product dimension to go to Dimension Designer
4. Show each of the Dimension Designer tabs

Dimension Structure Tab – edit attributes, hierarchies, hierarchy levels
Attribute Relationships Tab – create, modify, delete attribute relationships
Translations Tab – to enter multilingual translations for the dimension
Browser Tab – to browse members of the hierarchy (after deploying)

Dimension Storage
MOLAP – Multidimensional OLAP (the default) – dimension data is stored in the cube, which is faster when you execute a query, but if dimension records are added or modified, changes are not picked up until you process the dimension.
ROLAP – Relational OLAP – leaves dimension data stored in the source relational database. Queries are slower, but it provides real-time data, since dimension data is read straight from the database table and not from the cube.

Can be set in the properties of the dimension – Storage Mode

Editing Attributes & display folders
1. Rt-click an attribute, show the delete and rename options. Deleting only removes the attribute from the dimension; it will not affect the data source view
2. Delete French Description
3. Then drag French Description back from Data Source View to add it back
4. Go to attribute properties, point out Name property
5. Set AttributeHierarchyDisplayFolder to Description for Arabic Description, Chinese Description and other description attributes
6. Process & deploy the cube
7. Go to Cube Browser, reconnect and show how those attributes are now contained in a Descriptions folder

Attribute Column Bindings
KeyColumn – Each dimension has a KeyColumn, usually the primary key of the dimension table; this is how the dimension attributes are linked to the fact table. If the attribute is not tied to the logical primary key of the table, change the KeyColumn value. This happens because data warehouses may not be normalized (e.g. StateName is tied to StateCode, not to GeographyKey)
NameColumn – value displayed to the user for this attribute, e.g. Product is the key attribute but displays id numbers. You might choose to display ProductName to users when they request Product. If not specified, KeyColumn is displayed
ValueColumn – value to be used when doing MDX calculations, e.g. you might want to display a formatted date, but use an actual date column for calculations. If not specified, KeyColumn is used

1. Rt-click EnglishDescription attribute, show the KeyColumn, NameColumn and ValueColumn properties
2. Change NameColumn to FrenchDescription
3. Process dimension
4. Go to dimension browser, display English Description, point out how French descriptions appear
5. Change the property back to its original NameColumn (need to Reconnect in browser to see changes)

Attribute Hierarchies
All attributes are part of at least one hierarchy: All/One - primarykey - attribute
You can define additional hierarchies to allow users to drill down and drill up. Just drilling from the lowest level to All is not always useful (e.g. sales of the 2791 course to sales of all courses); you probably want more levels (e.g. sales of SQL Server, sales of Microsoft, sales of all courses)

Hierarchy types
SSAS creates an All hierarchy for each attribute
Natural Hierarchies – based on 1-many relationships in the database tables, e.g. Category – SubCategory, Year – Month. Aggregations are pre-calculated for natural hierarchies
Non-Natural Hierarchies – created in Dimension Designer for reporting purposes, e.g. Size – Color, Gender – City (many-many)
Unbalanced Hierarchy – different number of levels under different parents (e.g. manager – staff, where the number of reporting levels from CEO to lowest level varies)
Parent-Child Hierarchies – defined by self-referencing relationships in the dimension table
Ragged Hierarchy – number of levels is different because sometimes levels are skipped (e.g. Country – State – City is sometimes Country – City). In the data this is represented with NULL values, or sometimes we store the parent name in the missing level

Displaying Ragged Hierarchies
(You can set this property when you create a hierarchy: go to the attribute in the hierarchy and go to its properties)

HideMemberIf = Never – for a regular hierarchy (there are no gaps)

HideMemberIf = NoName – for a ragged hierarchy, so that the NULL values are not displayed in the hierarchy
Country   State     City
Canada    Ontario   Ottawa
Italy     NULL      Rome
Displayed as:
Canada-Ontario-Ottawa
Italy-Rome

HideMemberIf = ParentName – for a ragged hierarchy when storing the parent name twice instead of NULL values
Country   State     City
Canada    Ontario   Ottawa
Italy     Italy     Rome
Displayed as:
Canada-Ontario-Ottawa
Italy-Rome

HideMemberIf = OnlyChildWithNoName – to not show the lowest level if nothing is there
Country   State     City
Canada    Ontario   Ottawa
Canada    PEI       NULL
Displayed as:
Canada-Ontario-Ottawa
Canada-PEI

HideMemberIf = OnlyChildWithParentName – to not show the lowest level if nothing is there
Country   State     City
Canada    Ontario   Ottawa
Canada    PEI       PEI
Displayed as:
Canada-Ontario-Ottawa
Canada-PEI
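
The HideMemberIf examples above can be checked with a short Python sketch. This is a simplified model for classroom discussion, not the SSAS implementation: the function name display_paths and the row layout are assumptions, and None stands in for NULL.

```python
def display_paths(rows, hide_member_if="Never"):
    """Return the path shown to the user for each dimension row.

    rows: list of tuples ordered from top level to bottom level,
    with None standing in for NULL.
    """
    # Collect the distinct children under each parent path so the
    # OnlyChild... settings can check whether a member has siblings.
    children = {}
    for row in rows:
        for depth in range(1, len(row)):
            children.setdefault(row[:depth], set()).add(row[depth])

    def hidden(row, depth):
        name, parent = row[depth], row[depth - 1]
        only_child = len(children[row[:depth]]) == 1
        no_name = name is None or name == ""
        if hide_member_if == "NoName":
            return no_name
        if hide_member_if == "ParentName":
            return name == parent
        if hide_member_if == "OnlyChildWithNoName":
            return only_child and no_name
        if hide_member_if == "OnlyChildWithParentName":
            return only_child and name == parent
        return False  # "Never"

    paths = []
    for row in rows:
        shown = [row[0]] + [row[d] for d in range(1, len(row)) if not hidden(row, d)]
        paths.append("-".join(str(m) for m in shown))
    return paths


null_rows = [("Canada", "Ontario", "Ottawa"), ("Italy", None, "Rome")]
print(display_paths(null_rows, "NoName"))
```

With "Never", the Italy row keeps its empty State level, which is the gap users would otherwise see when browsing the hierarchy.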

SkippedLevels column – store a column in your dimension that specifies how many levels are skipped for each member
EmpId   Title        Name   MgrID   Skip
14      Supervisor   Mike   12      0
13      Pion         Joe    12      1

Dimension Attribute Properties
IsAggregatable – determines if there is an ALL level (might set to false on time because it would be very slow to show values for ALL dates)
AttributeHierarchyOrdered – determines if the hierarchy is ordered; the attribute used to order is specified in the OrderBy property. Setting AttributeHierarchyOrdered=False can speed up processing, because the hierarchy does not need to be sorted unless queried
AttributeHierarchyOptimizedState – if set to NotOptimized, no indexes are created for the hierarchy, which speeds up processing and slows querying. Default is FullyOptimized. If an attribute is only occasionally used in queries, you could set it to NotOptimized
AttributeHierarchyEnabled – allows you to use this attribute in a cube and aggregate by this attribute. Setting it to false is good for attributes we want to display but never plan to pivot by (e.g. Product Description/Photo)
AttributeHierarchyVisible – if set to false, you can only access this attribute through a hierarchy

Demo Hierarchy Attributes
Disable All & Grand totals with IsAggregatable
1. Go to Dimension Designer for Dim Product – go to properties of Model Name attribute, set IsAggregatable = False
2. Deploy & browse dimension, show how ‘All’ is no longer displayed for Model
3. Browse cube; when you drag Model you do not see a Grand Total because it is not aggregatable
Make attribute available only within Hierarchy
4. Go to Dimension Designer, Product, Size, set AttributeHierarchyVisible=False
5. Deploy, browse & re-connect dimension, show how you cannot select Size attribute anymore (it can only be viewed through a hierarchy now)
Create a user hierarchy
6. Go to Dimension Designer
7. Drag Color to Hierarchy tab, drag Size under Color, rename hierarchy Color – Size
8. Deploy and browse, re-connect dimension, show hierarchy created
9. Deploy and browse the cube, show how you can drag the hierarchy to the query and expand levels

Demo Parent Child Hierarchy Manager – Employee
1. Create New Data Source View, add DimEmployee and related tables
2. Create cube using FactResellerSales and DimEmployee, DimSalesTerritory; select measures: SalesAmount and Quantity
3. Go to DimEmployee in Dimension Designer, process dimension
4. Browse DimEmployee, show ParentEmployeeKey hierarchy
5. Select Parent Employee Key attribute, go to properties – Naming Template, specify CEO; VP; Director (do not expand, type straight into property)
6. Deploy and browse
7. Select a node on level 2 or 3, point out title bar displays values from naming template
8. Point out MembersWithData property on Employees hierarchy attribute. This controls what happens when the CEO has data associated with him/her: do we show his/her data or only the data of his/her reports? E.g. when showing a manager, do we show the manager’s sales + staff sales, or just the staff sales
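
The manager-versus-staff question in step 8 is easier to discuss with numbers. The Python sketch below uses made-up employee rows and hypothetical helper names (direct_reports, staff_sales, rolled_up_sales) to compute both quantities for a self-referencing parent-child table. It only illustrates the rollup; it does not model how the MembersWithData setting itself changes what SSAS displays.

```python
# Hypothetical employee rows: key -> (parent key, own sales amount).
EMPLOYEES = {
    1: (None, 0.0),    # CEO
    2: (1, 500.0),     # manager who also has sales of their own
    3: (2, 200.0),     # staff
    4: (2, 300.0),     # staff
}


def direct_reports(key):
    """Employees whose parent key points at this employee."""
    return [k for k, (parent, _) in EMPLOYEES.items() if parent == key]


def staff_sales(key):
    """Sales of everyone below this member, excluding the member's own."""
    return sum(EMPLOYEES[r][1] + staff_sales(r) for r in direct_reports(key))


def rolled_up_sales(key):
    """The member's own sales plus the sales of everyone below."""
    return EMPLOYEES[key][1] + staff_sales(key)


print(staff_sales(2))      # staff only
print(rolled_up_sales(2))  # manager + staff
```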

Demo Calendar Hierarchy
1. Go to Date dimension
2. Right click Date dimension in Solution Explorer, Add Business Intelligence
3. Dimension intelligence, Time
4. Map Calendar Year = Year
5. Map Calendar Quarter = Quarter
6. Map Month = Month
7. Add hierarchy Calendar Year – Calendar Quarter – Month
8. Point out blue squiggle indicating you should add a relationship to improve performance
9. Go to Attribute Relationships, add relationship Source Month – Calendar Quarter
10. Source Calendar Quarter – Calendar Year
11. Build and process, show how the hierarchy appears in the dimension and cube

Sorting using OrderBy
Name – sorts by name attribute in alphabetical order
Key – by one or more key columns (e.g. Quarter & Year)
Secondary attribute – use a different column for sorting, e.g. Month you might want to appear in order of occurrence, not by alphabetical sorting of month name

Sort by Attribute Name
1. Go to Time dimension in Dimension Designer
2. Go to English Month Name
3. Deploy and browse dimension, show how months are sorted by month name: April, August, etc.
4. Show how MonthNumberOfYear is sorted as characters 1,10,11,12,2

Sort by Attribute
5. Go back to Dimension Designer
6. For MonthNumberOfYear set OrderBy=Key (order by Name treats value as alphabetical order, e.g. month 1, month 10, month 11, month 12, month 2)
7. Deploy and browse dimension, show how MonthNumberOfYear is now sorted numerically
8. Go to properties of English Month Name, set OrderBy=AttributeKey
9. Set OrderByAttribute = MonthNumberOfYear. YOU CAN’T, it is not listed; you need a relationship between the two attributes
10. Go to the Attribute Relationships pane, add new relationship EnglishMonthName to MonthNumberOfYear
11. Go to attributes of English Month Name, set OrderByAttribute = MonthNumberOfYear
12. Deploy and browse, show how EnglishMonthName is sorted correctly

Sort by composite Key (do not demo unless you practice it first)
13. Go to properties of Month – Key, add CalendarYear column to the key, move it to be the first column of the key, make MonthNumberOfYear second column
14. Go to OrderBy = Key
15. Set Name property of English Month Name to English Month Name (otherwise by default key column is displayed)
16. Deploy and browse, show how months are now sorted by year; 2000-January is not the same as 2007-January
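
The three sort behaviours in the demo can be reproduced in a few lines of Python. This is only an analogy for the OrderBy settings, using Python's own sorting rather than SSAS, and the variable names are made up.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Month numbers treated as text sort like OrderBy=Name: 1, 10, 11, 12, 2, ...
month_numbers = [str(n) for n in range(1, 13)]
by_name = sorted(month_numbers)
# Treated as numbers they sort like OrderBy=Key: 1, 2, 3, ...
by_key = sorted(month_numbers, key=int)

# Month names ordered alphabetically versus by a related attribute
# (the month number), similar in spirit to OrderBy=AttributeKey.
month_number_of = {name: i + 1 for i, name in enumerate(MONTH_NAMES)}
alphabetical = sorted(MONTH_NAMES)
by_month_number = sorted(MONTH_NAMES, key=month_number_of.get)

# Composite key (year, month): January 2000 and January 2007 stay distinct
# and sort by year first, then by month.
periods = [(2007, 1), (2000, 2), (2000, 1)]
by_composite_key = sorted(periods)

print(by_name[:5])
print(alphabetical[:2])
print(by_month_number[:3])
print(by_composite_key)
```

The lookup table month_number_of plays the role of the attribute relationship in step 10: without a way to get from a month name to its number, there is nothing to sort by.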

Grouping
For attributes that have no natural hierarchies, you can use grouping to make them into smaller groups (you don’t decide the grouping, SSAS does)
Cannot do grouping on top level of hierarchy, or on consecutive levels of hierarchy, or on ROLAP

DiscretizationMethod
EqualAreas – divide into groups with equal numbers of members
Clusters – use the k-means algorithm to divide into clusters (more meaningful but slower to process; attribute must be numeric)
DiscretizationBucketCount – how many groups to create
NamingTemplate – default is first & last value of group (e.g. January-March)

Demo Grouping
1. Go to property page of Product, List Price in Dimension Designer
2. Set DiscretizationMethod = EqualAreas
3. Set DiscretizationBucketCount=4
4. Deploy & browse, show how values are broken into buckets
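
As a rough picture of what the demo produces, the Python sketch below splits a made-up list of prices into four groups of equal size and labels each group by its first and last value. It approximates the idea behind EqualAreas and the default naming only; the exact algorithm SSAS uses is not reproduced here, and the function names are hypothetical.

```python
def equal_area_buckets(values, bucket_count):
    """Split values into bucket_count groups of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for i in range(bucket_count):
        start = i * n // bucket_count
        end = (i + 1) * n // bucket_count
        if start < end:  # skip empty groups when there are few values
            buckets.append(ordered[start:end])
    return buckets


def bucket_label(bucket):
    """Name a bucket by its first and last value."""
    return f"{bucket[0]} - {bucket[-1]}"


# Hypothetical list prices.
list_prices = [5, 9, 20, 25, 35, 50, 120, 540, 780, 1200, 2300, 3400]
buckets = equal_area_buckets(list_prices, 4)
labels = [bucket_label(b) for b in buckets]
print(labels)
```

Note how the bucket ranges are uneven in width (5 to 20 versus 1200 to 3400) because the split is by member count, not by value range.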

EXTRA INFO
Attribute Relationships
Attribute Relationships – define dependencies between attributes. By default you have a relationship between each non-key attribute and the key attribute. Show key attribute, expand it and show how all other attributes are listed

Adding a hierarchy tells SSAS to build a cube containing that hierarchy and allows a user to drill down and drill up when querying the cube. Adding a relationship improves processing. Whenever you have a 1-1 or 1-many relationship between columns in a dimension, you should add an attribute relationship between them, e.g. add CalendarYear as an attribute of CalendarQuarter. Set the relationship’s cardinality to One (Month Name – Month Number of Year) or Many (Year – Quarter), and its type to Rigid (Year – Quarter) or Flexible (Category – Subcategory)

1. Go to Time dimension
Set up attribute relationships
2. Drag Calendar Year attribute from TimeKey to Calendar Quarter
3. Drag Calendar Quarter attribute from TimeKey to CalendarMonth
4. Drag MonthNumberOfYear to MonthName (RelationshipType=One)

Add Business Intelligence

Define Time Intelligence – allows you to specify which fields map to calendar quarter, year, and so on to get default hierarchies

Define Account Intelligence – allows you to define things like which columns specify if an account is income or expense

Specify a unary operator – to change the default aggregation for parent-child hierarchies in a cube

Create custom member formula – to replace default aggregation with a different operator

Specify Attribute ordering -

LAB NOTES
Ex 3 Task 1 – THE STEPS HAVE MANY MISTAKES AND ARE MISSING STEPS!!! There are FEWER MISTAKES in the LAB ANSWER KEY, but still keep an eye on students

1. Go to the Calendar Date Hierarchy in the Date Dimension not Calendar Time in Time dimension

2. To do a New Attribute from Column, right click the Month column in Data source view in Dimension designer, this creates a duplicate attribute for the column…but if you already have MonthNumberOfYear in the Date Dimension you can skip that step!

3. Instead of expanding the TimeKey column, go to the attribute relationship pane and show how all the attributes are related to the DateKey

4. Create a new relationship from Source = Month to MonthNumberOfYear
5. Change the OrderByAttribute property of Month to MonthNumberOfYear
6. Set the OrderBy property of Month to AttributeKey

Task 2
Set DiscretizationMethod = Automatic

Module 4 Working with Measures and Measure Groups
Lecture: 70 minutes
Lab: 60 mins (p 4-17 Ex 1 Configure Measures, Ex 2 Define Dimension Usage and Relationships, Ex 3 Configure Measure Group Storage) ***TYPOS IN EX 2***

Measure Display Properties
Name – the name displayed to the user
Format String – how data will be displayed: Currency, Percent, True/False, or user-defined, e.g. dd/mm/yyyy, $#,#0.00 (regional settings in Control Panel determine date and currency formatting)
DisplayFolder – to organize measures into folders for the users
Visible – to hide measures used for calculations that are not meant to be displayed directly to users
MeasureExpression – can be A*B or A/B (cannot be more complicated, e.g. A*B*C)

Demo Measure display properties
1. Go to Cube Designer and build a cube for Internet Sales showing Customer, Product and Date
2. Change the Name property of the measure Order Quantity to Quantity Ordered
3. Change the Format String of SalesAmount to Currency
4. Go to the properties of the two Cost measures and set Display Folder Name = Cost
5. Deploy & browse; show how the measures are contained in a folder

Measure Values
A column in a fact table (sales amount, quantity ordered)
Row based, e.g. count the number of rows in a table (nbr orders, nbr students)
Based on an MDX expression (net profit)

Aggregating Measures
Additive – across all dimensions (sales amount can be totaled for all products, all customers, all years, etc.)
Semi-additive – across some dimensions but not others (e.g. inventory should be aggregated across warehouses but not across months)
Non-additive – not aggregated across any dimensions (e.g. nbr distinct records)

Aggregate Functions
Sum (additive) – adds up values across dimensions
Count (additive) – counts number of values
Min (semi-additive) – returns lowest value
Max (semi-additive) – returns highest value
DistinctCount (non-additive)
None – supplies values directly from fact table without aggregations
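The inventory example above can be made concrete with a small Python sketch. The snapshot rows, month order and function names are hypothetical; the point is only that summing a stock level across warehouses is meaningful, while summing it across months is not, so a semi-additive measure takes one point on the time axis instead (here, the last month).

```python
# Hypothetical inventory snapshot rows: (month, warehouse, units_on_hand)
inventory = [
    ("Jan", "East", 100), ("Jan", "West", 50),
    ("Feb", "East", 80), ("Feb", "West", 60),
]
MONTH_ORDER = ["Jan", "Feb"]


def total_for_month(rows, month):
    """Summing across warehouses is meaningful for a stock level."""
    return sum(units for m, _, units in rows if m == month)


def naive_total(rows):
    """Summing across months as well counts stock that is still on the shelf twice."""
    return sum(units for _, _, units in rows)


def closing_stock(rows):
    """Semi-additive alternative: sum across warehouses, take the last month on the time axis."""
    last_month = max((m for m, _, _ in rows), key=MONTH_ORDER.index)
    return total_for_month(rows, last_month)
```

The naive total (290) is larger than the stock that ever existed at one time, while the closing stock (140) is the figure a user would actually want for the period.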

1. Go to Cube Designer and browse the cube showing Sales Amount by color and marital status
2. Select the Sales Amount measure and change Aggregate Function = Min
3. Deploy and browse (use Customer & Color to show different values)

Measure Groups
Measure Group Properties
AggregationPrefix – common prefix used for the names of any aggregations and partitions created for a measure group
DataAggregation – whether SSAS can create aggregations for persisted and/or cached data for the measure group. Default is to create aggregations for both persisted and cached data
ErrorConfiguration – Default: error messages come from the msmdsrv.ini file. Custom: you can define error messages for duplicate keys, null keys, etc. and define the action to occur when an error occurs in processing, e.g. convert to a specific value, stop processing
EstimatedRows – estimated number of rows in the fact table (good for the aggregation wizard)
EstimatedSize – estimated size in bytes of the measure group (good for the aggregation wizard)
IgnoreUnrelatedDimensions – determines whether unrelated dimensions are forced to their top level when members of dimensions that are unrelated to the measure group are included in a query. Default setting is True. For example, if you try to show internet sales by geography region (there is no link between them): if True, you see the grand total repeated for each region (so Australia shows the grand total, and so does Canada); if False, the individual regions display NULL instead of the grand total, which I personally prefer.
ProcessingMode
Regular – data is not available until processing is complete
LazyAggregations – data is accessible as soon as available, but total processing time is increased
ProcessingPriority – processing priority of the cube during background operations such as lazy aggregations and indexing
StorageLocation – file system storage location for the measure group; if not specified, the location is inherited from the cube that contains the measure group
Type – type of the measure group

Demo Error Configuration
1. Go to Cube Structure tab of Cube Designer
2. Select a measure group and display its properties
3. Change ErrorConfiguration = Custom
4. Expand and show the different options for different errors, error limit, error log file
5. Point out the ProcessingMode property
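The IgnoreUnrelatedDimensions behaviour described in the property list above is easy to misread, so here is a tiny Python model of what the user sees. It is only an illustration: the total, the region names and the function are hypothetical, and nothing here calls SSAS.

```python
# Hypothetical grand total of a measure group that has no relationship to Geography.
INTERNET_SALES_TOTAL = 1000.0
regions = ["Australia", "Canada"]


def sales_by_unrelated_region(regions, ignore_unrelated_dimensions=True):
    """Value shown per region when the dimension is unrelated to the measure group."""
    if ignore_unrelated_dimensions:
        # The unrelated dimension is forced to its top (All) level,
        # so every region repeats the grand total.
        return {region: INTERNET_SALES_TOTAL for region in regions}
    # Otherwise the individual members show no value (NULL).
    return {region: None for region in regions}
```

With the default of True, a report that looks like "sales by region" is really the same total printed on every row, which is why showing NULL is often the less misleading choice.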

StorageMode
MOLAP – Multidimensional – stores aggregations and a copy of the data in multidimensional format; best for query performance, but requires the cube to be processed to see the most recent data. Proactive caching helps with that.

ROLAP – Relational - stores aggregated data as indexed views in the relational data source, and reads source data from relational tables, query time and processing time is slow but consumes less memory and allows real-time updates of data

HOLAP – Hybrid – stores aggregations as Multidimensional, but leaves source data in relational data source, fast for queries of aggregated data, slow if underlying data is required because must go to relational data source

Proactive Caching
Since MOLAP & HOLAP aggregations can become out of date when source data changes, you can use proactive caching to update aggregations on a schedule or when source data changes.

Update Cache when data changes – updates MOLAP when notified of data changes (requires setting up notifications on partitions)

Silence Interval - how long Cube must be inactive before beginning to process new MOLAP image

SilenceOverrideInterval - how long to wait before beginning to process new MOLAP image even if cube is active

Drop outdated cache – how long to wait before dropping an outdated cache when a new cache is created

Update Cache periodically – interval of time after which to refresh cache
Notifications – set at the partition level; when to be notified to update the cache
BringOnlineImmediately – if checked, allows users to query (will use ROLAP for data) while the MOLAP image is being processed; if unchecked, MOLAP processing must be completed before the cube can be accessed
Enable ROLAP aggregations – create indexed views for aggregations
Apply settings to dimensions – applies storage mode and proactive caching settings to dimensions

Demo Measure Group Storage Properties
2. Go to Cube Structure tab
3. Go to properties of a measure group
4. Show StorageMode property
5. Show StorageLocation property
StorageLocation – specify folder where cube is stored (overridden by partitions)
6. Select Proactive Caching property – select Custom – show options
7. Enable Proactive Caching

Relationships between Measure Groups & Dimensions
Somehow SSAS has to figure out which measures go with which dimensions (e.g. which sales are for which product color). This is done by creating relationships between measure groups and dimensions.

Regular - Relationships between measure groups and dimensions are created based on the PK-FK relationships of underlying tables (eg FactInternetSales – DimProduct)

Reference - If you have a snowflake schema there may be no direct PK-FK relationship, so you can create dimension based on multiple tables, one of which has a PK-FK relationship with the fact table or create the relationships manually with columns from multiple tables (e.g. Category-DimSubcategory-DimProduct-FactInternetSales)

Fact – dimension is stored in a fact table and has PK-FK with the fact table (e.g. parent-child hierarchies employee-manager)

Many-Many – uses an intermediate dimension table to break up a many-many relationship into two 1-many relationships

Demo relationship type
1. Go to Dimension Usage tab of Cube Designer
2. Click on a few existing relationships to show relationship type

Partitions
Partitions allow you to store a measure group's data in separate pieces. For example, data for each quarter could be in its own partition, so if you only want current data you only need to search one partition, and because some partitions do not need to be reprocessed as often you shorten processing time. If you do need to search multiple partitions and you have multiple processors, the searching can be done in parallel. Make sure you define the partitions so that no data is stored in two partitions: overlapping rows waste space and are counted twice!

You can partition horizontally: you have multiple fact tables (Orders1998, Orders1999) and each fact table is a partition of a single measure group.
You can partition vertically: you have a single fact table and you define a query that filters which data goes in which partition, e.g.
SELECT * FROM dbo.FactResellerSales WHERE orderdatekey >= '20040601' AND orderdatekey <= '20041231'

You can define the partition slice to tell SSAS what is in each partition so that queries know which partition to query, e.g. to get all products in category 1 (Bikes) from 2001 and 2002:
{[Date].[Calendar Year].&[2001],[Date].[Calendar Year].&[2002]}*{[Product].[Product Categories].[Category].&[1]}

Usually you partition by time, or by a dimension member (country, product category).
You can set storage options for each partition, e.g. ROLAP for the current month so it is always up to date, HOLAP for previous quarters of the current year, and MOLAP for past years so they are quick to query but don’t need to be reprocessed, since the data is fixed.
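A quick way to convince yourself that a set of partition filters is safe is to check that every row lands in exactly one partition. The Python sketch below does that for two complementary date filters. It is illustrative only: the rows, partition names and helper functions are hypothetical, and it assumes the date key is an integer in yyyymmdd form.

```python
# Hypothetical fact rows, identified only by an integer date key (yyyymmdd).
fact_rows = [20040315, 20040601, 20040930, 20041231]

# One filter per partition, mirroring a pair of complementary WHERE clauses.
partition_filters = {
    "before_june_2004": lambda key: key < 20040601,
    "june_2004_onward": lambda key: key >= 20040601,
}


def assign_partitions(rows, filters):
    """Return {partition name: rows matched by that partition's filter}."""
    return {name: [r for r in rows if keep(r)] for name, keep in filters.items()}


def check_partitioning(rows, filters):
    """Return (rows matched by more than one partition, rows matched by none)."""
    hits = {r: sum(1 for keep in filters.values() if keep(r)) for r in rows}
    overlapping = [r for r, n in hits.items() if n > 1]
    missing = [r for r, n in hits.items() if n == 0]
    return overlapping, missing
```

If one filter used `<=` and the other `>=` on the same boundary date, `check_partitioning` would report that date as overlapping, which is exactly the kind of mistake that doubles the numbers in the cube.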

Define notifications to be used by proactive caching per partition
1. Go to the Partitions tab of Cube Designer
2. Change the property Table source to Query source and add a WHERE clause: WHERE orderdatekey >= '20040601'
3. Add a second partition with a complementary WHERE clause: WHERE orderdatekey < '20040601'
4. Select a partition and go to Storage Settings
5. Choose Custom setting
6. Enable proactive caching
7. Go to the Notifications tab

SQL Server, specify tracking tables – the list of tables in the database (separated by ;) for which you want to be notified when they receive an update (e.g. FactInternetSales)

Client initiated – the client sends the XMLA NotifyTableChange command to notify of changes, instead of SQL Server initiating the notification

Scheduled polling – queries are run on a scheduled basis to detect changes

Designing Aggregations
You can precalculate aggregations to speed up query time. It is not practical to pre-store all aggregations, so you have to balance pre-calculated aggregations with memory usage.
The Aggregation Design Wizard helps you determine what aggregations to pre-calculate.

1. Go to the Partitions tab of Cube Designer
2. Rt-click on a partition and choose Design Aggregations to bring up the wizard
3. Choose standard settings – click Next
4. When the tables are listed, click Count to count the nbr of records in each table (or enter an estimated number of rows if the tables are not fully loaded yet) – click Next
5. Choose a Design Aggregations Until option and click Start

Estimated Storage Reaches – ask wizard to create pre-calculated aggregations to a specified memory limit, use if memory is limited

Performance Gain reaches (start with 30%), to generate pre-calculated aggregations to improve query performance by a specified amount

I click Stop – watch the graph and click stop when you have reached the desired improvement vs memory

Do not design aggregations – to remove existing aggregations
6. Try a couple of different options; Reset between options
7. Click Next after aggregations are calculated
8. Either deploy & process, or save but do not process
9. Then you can go to the partition, choose Process and choose Process Index (do not deploy & process the cube first) to see how long it takes to process the additional aggregations and whether the processing time is too long

To measure query improvements, check query times before and after aggregations are added. To measure processing cost, process index after designing aggregations, or compare processing times before and after aggregations are added.

Once aggregations are designed for one partition, they can be copied to another partition using Object Explorer in SQL Server Management Studio.

Lab Notes
Unfortunately this is not the greatest lab; it works, but doesn’t teach much.
Exercise 1, Task 3: Open the .dwproj file, not the .sln file
Exercise 2, Task 2: the regular relationship was created automatically by the dimension wizard
Exercise 3, Task 2: Right-click Internet Sales and launch the Aggregation Design Wizard
You will not get the Select Partitions to Modify screen
They tell you NOT to design aggregations…kind of pointless, why not try a performance gain of 30%!

Module 5 Querying Multidimensional Solutions
Lecture: 70 minutes
Lab: 45 mins p 5-12 (Ex 1 MDX Queries, Ex 2 Calculated Member, Ex 3 Named Set)

MDX (Multidimensional Expressions)
MDX was created to query OLAP databases; it was designed by Microsoft but has been generally adopted by OLAP providers. It is used as a query language for querying and as an expression language for calculations.
SQL queries an OLTP database and returns a 2-D set of results, like a table.
MDX queries an OLAP database and returns a cube.

Cells – intersection between the measure and the dimensions
Tuple – an expression that identifies a cell or section of a cube (e.g. Bikes in January, or Bikes in January bought by Smith, or just Bikes). When a tuple represents a section of the cube you are slicing the cube
Set – an ordered collection of tuples from the same hierarchy. Sets are enclosed in {} and each member of the set is separated by a comma, e.g. {[Sales].[Bikes].[January],[Sales].[Bikes].[February]}
You must have at least one axis, ON COLUMNS. Then you can add more axes: ON ROWS, ON PAGES, and so on.

MDX for Queries
SELECT query_axis_clause
FROM subcube

// is used for comments within MDX queries

MDX Queries
1. Open BIDS MOD05\labfiles\project and deploy the cube
2. Open SQL Server Management Studio and connect to Analysis Services on server NY-SQL-01
3. Expand the MOD05 database
4. Click on Adventure Works and choose New Query
5. Walk through the MDX queries in the handout

ON COLUMNS uses one or more tuples to define an axis; returns a row of values
ON ROWS uses one or more tuples to define an axis; returns a pivot table
ON PAGES adds a 3rd dimension and returns a cube; cannot be viewed in SQL Server Management Studio

Calculations
Analysis Services stores the syntax for calculations. Calculations do not add to the size of the cube; they are calculated at runtime.

1. Show MDX Cube example in SSMS then do this example in BIDS
2. Create a new cube for FactInternetSales; make sure to include the Sales Amount and Unit Price measures with DimProduct and DimCustomer
3. Go to the Product dimension and add the Color attribute
1. Deploy and process cube
2. Go to Calculations tab
3. Click New Calculated Member on toolbar
4. Name – [Price without Tax]
5. Hierarchy – Members
6. Calculation [Measures].[Sales Amount]-[Measures].[Tax Amt]
7. Set format string = Currency in additional properties
8. Process & deploy cube
9. Browse cube; add Sales Amount, Unit Price and Price Without Tax by Product
10. Show Script View on Calculations tab

Creating a calculation using the MDX command instead of the form view (you can use the example on the MDX queries handout): just go to script view and type this command after the last command.
To calculate the pct of total sales for each product color:
CREATE MEMBER CURRENTCUBE.[MEASURES].[Percent of Color Sales] AS ([Measures].[Sales Amount])/([Measures].[Sales Amount],[Dim Product].[Color].[All])
If you set the format of the calculation to Percentage it will multiply by 100.
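The percent-of-total calculation divides the current color's sales by the sales at the All member of Color. A minimal Python sketch of the same arithmetic follows; the sales figures and function names are hypothetical and only mirror the idea, not the SSAS engine.

```python
# Hypothetical Sales Amount by product color.
sales_by_color = {"Black": 400.0, "Blue": 100.0, "Red": 500.0}


def percent_of_all(sales, color):
    """Sales for one color divided by sales for the All member (the grand total)."""
    return sales[color] / sum(sales.values())


def as_percentage(ratio):
    """Mimic a Percent format: the stored ratio is multiplied by 100 for display."""
    return f"{ratio * 100:.2f}%"
```

The calculated member itself holds the ratio (0.4 for Black here); the multiplication by 100 happens only when the Percent format is applied for display.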

Named Sets
A set is like a database view: it contains a subset of the cube. You can create a permanent set to use for calculations, or temporary sets for use within queries or a single session.

1. Go to Calculations tab
2. Add Named Set
3. Name [Dark Colors]
4. Expression {[Dim Product].[Color].&[Black],[Dim Product].[Color].&[Blue]}
5. Browse cube: Sales Amount by color
6. Drag the named set to the dimension filter above the cube area (subcube area); show how the cube now only shows colors in the named set, a subset of the original cube
7. Show Script View on Calculations tab

To create a set using an MDX command
Create a set of all measures for the combination of Black & Blue products:
CREATE SET [Adventure Works].[Black and Blue]
AS {[Product].[Color].[Black],[Product].[Color].[Blue]} ;

When you use CREATE SESSION SET the set only exists for the session

CREATE SESSION SET [Adventure Works UDM].[Dark Colors]
AS {[Product].[Color].[Black],[Product].[Color].[Blue]} ;

SCOPE
Allows you to define a subcube that can then be used as the target for MDX calculations. It is like an UPDATE statement: first you define which records to update, then you define what the new value should be when the cube is browsed.

This allows you to change the values displayed in a cube after the cube is processed, for example if you didn’t want anyone to see the values for Black products.

1. Go to Calculations tab
2. Click Script View
3. After CALCULATE;
4. Add SCOPE([Measures].members);
5. ([Dim Product].[Color].&[Black])=NULL;
6. END SCOPE;
7. Process cube, browse cube, show sales for all color products; black does not appear

We want to set 2002 sales for Q4 bikes to be 50% more than Q1 sales
SCOPE (measures.[sales amount], [product].[category].[bikes],[Date].[Calendar].[Q4 CY 2002]);
THIS = ([product].[category].[bikes], [Date].[Calendar].[Q1 CY 2002]) *1.50;
END SCOPE;

We want to increase sales quotas for 2002 one hundred fold
SCOPE([Date].[Fiscal Year].&[2002],[Date].[Fiscal Quarter].Members,[Measures].[Sales Amount Quota]) ;
This = [Measures].[Sales Amount Quota]* 100 ;
END SCOPE;
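One way to picture SCOPE is as an overlay applied when a cell is read, leaving the processed data untouched. The Python sketch below models the "hide Black products" example that way. It is only an analogy: the cell dictionary and function are hypothetical, and a real MDX script is evaluated by the SSAS engine, not like this.

```python
# Hypothetical processed cell values keyed by (measure, color).
processed_cells = {
    ("Sales Amount", "Black"): 400.0,
    ("Sales Amount", "Blue"): 100.0,
    ("Order Quantity", "Black"): 8,
    ("Order Quantity", "Blue"): 3,
}


def scoped_value(cells, measure, color):
    """Value shown when browsing: cells inside the scope get the assigned value."""
    in_scope = color == "Black"  # scope: every measure, color Black
    if in_scope:
        return None  # assignment inside the scope: NULL
    return cells[(measure, color)]
```

The stored value for Black is still in `processed_cells`; only what the browser displays changes, which matches the notes' point that SCOPE alters values after the cube is processed.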

Module 6 Customizing Cube Functionality
Lecture: 75 minutes
Lab: 60 mins (Ex 1 KPIs, Ex 2 Actions, Ex 3 Perspective, Ex 4 Translation)
**There is a .txt file in D:\Labfiles with MDX expressions for KPI Exercise

Key Performance Indicators – KPIs
KPIs are a measure of business metrics against targets, e.g. sales, registrations, completed census surveys.
Trend indicators show the trend of the KPI over time.
You can assign visual indicators in the cube for the KPI & trend.

Creating a KPI
Name – name of KPI
Associated Measure Group – which measure groups are associated with the KPI
Value Expression – MDX expression to calculate the KPI (e.g. Sales Amount)
Goal Expression – the value or MDX expression to calculate the KPI target (e.g. increase of 15% over last year, last year's Sales * 1.15)
Status Indicator – graphic that shows the status of the KPI (happy face, traffic light)
Status Expression – MDX expression to calculate the value for the status indicator; must return a value between -1 and 1 (-1 bad, 0 acceptable, 1 good)
Trend Indicator – graphic that shows the trend of the KPI (usually an arrow)
Trend Expression – MDX expression to calculate the value for the trend indicator; must return a value between -1 and 1 (-1 bad, 0 acceptable, 1 good)

Create a KPI
1. Build a cube based on FactInternetSales and related tables
2. Go to Date dimension and add Calendar Year and Calendar Quarter
3. Create a hierarchy called Calendar with Calendar Year and Calendar Quarter
4. Build & process cube
5. Go to KPI tab, click New KPI
6. Name = Sales KPI
7. Measure Group = Fact Internet Sales
8. Drag the Sales Amount measure from the Metadata tab to the Value expression
[Measures].[Sales Amount]
9. Make the goal 1.2 * the sales from the previous year. Usually you would compare to CurrentMember, but since the data is old, we hard-code 2004 as the year to go one before.

1.2* ([Measures].[Sales Amount],ParallelPeriod([Order Date].[Calendar].[Calendar Year],1,[Order Date].[Calendar].[Calendar Year].[2004]))

10. Status Indicator = faces
11. Status Expression = 1 (-1 bad, 1 good) (eventually this will be an MDX expression)
12. Trend Indicator = arrows
13. Trend Expression = 0.8 (-1 bad, 1 good)
14. Deploy the project; on the KPI tab, reconnect and switch to Browser View to show the indicators.
15. Change the status (0) and trend (.3) values; redeploy & browse to show the changes to the symbols

Use MDX Expressions for Status & Trend
16. Change Status to use a CASE statement that divides Value by Goal and, based on the percentage, returns -1, 0 or 1
CASE
WHEN KpiValue("Sales KPI") / KpiGoal("Sales KPI") >= 1 THEN 1
WHEN KpiValue("Sales KPI") / KpiGoal("Sales KPI") >= .5 THEN 0
ELSE -1
END
17. Now process and deploy the cube; you should see a happy face for status. If you change the goal to be 4* instead of 1.2* you will see a neutral face; if you change the goal to be 10* instead of 1.2* you will see a sad face
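The thresholds in the status CASE expression can be checked by hand with a few numbers. The Python function below is a hypothetical stand-in for that expression (same cut-offs of 1 and 0.5), useful for predicting which face will appear before redeploying; it assumes the goal is not zero.

```python
def kpi_status(value, goal):
    """Return 1 (good), 0 (acceptable) or -1 (bad) from the value-to-goal ratio."""
    ratio = value / goal  # assumes goal is non-zero
    if ratio >= 1:
        return 1
    if ratio >= 0.5:
        return 0
    return -1
```

For example, a value of 60 against a goal of 100 gives a ratio of 0.6 and therefore status 0 (neutral face), while 40 against 100 gives -1 (sad face).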

18. Change Trend to an MDX expression that compares the current value of the KPI to the value from the previous time period
CASE
WHEN ([Measures].[Sales Amount]) > ([Measures].[Sales Amount],[Order Date].[Calendar Year].[2003]) THEN 1
WHEN ([Measures].[Sales Amount]) < ([Measures].[Sales Amount],[Order Date].[Calendar Year].[2003]) THEN -1
END

19. Deploy and browse
20. If you want, open Microsoft Excel 2007, Data – Other Sources – Analysis Services, and connect to the cube. In the PivotTable field list you will see KPIs listed after the measures. You could show Sales Amount by Calendar Year and add the Sales KPI Value, Goal, Status, and Trend (the numbers look a little wacky though; you may need to use previous member / current member to show KPIs per year)

In the lab you will use the ParallelPeriod function. You pass it a parent level in a hierarchy (e.g. year), the number of periods you want to go back (usually 1), then the member you want to move back from (e.g. current month). This will move from the current month up to the year level, move back one year, and go to the equivalent month in that previous year.

ParallelPeriod([Date].[Fiscal Time].[Fiscal Year],1,[Date].[Fiscal Time].CurrentMember)

PARALLELPERIOD accepts an expression for the level in the hierarchy, nbr of periods to lag, member to use as start point to compare to.
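The "up to the level, back N periods, down to the same position" movement can be sketched with plain calendar arithmetic. The Python functions below are hypothetical helpers that model this for a simple calendar of years, quarters and months; they are not the MDX function and ignore fiscal calendars.

```python
def parallel_period_year(year, month, lag=1):
    """Same month, `lag` years earlier: up to the year level, back, then down."""
    return (year - lag, month)


def parallel_period_quarter(year, month, lag=1):
    """Same position within the quarter, `lag` quarters earlier."""
    quarter_index = year * 4 + (month - 1) // 3  # quarters counted from year 0
    position = (month - 1) % 3                   # 0, 1 or 2 within the quarter
    target = quarter_index - lag
    return (target // 4, (target % 4) * 3 + position + 1)
```

So February 2004 (second month of Q1) maps to November 2003 (second month of Q4) when the level is quarter, and to February 2003 when the level is year.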

Browsing KPIs
Excel 2007 and PerformancePoint allow you to browse KPIs, or you can use MDX queries

Go to SQL Server Management Studio and execute an MDX query to retrieve KPIs
SELECT {KPIValue("Sales KPI"), KPIGoal("Sales KPI"), KPIStatus("Sales KPI"), KPITrend("Sales KPI")} ON COLUMNS
FROM [MyCube]

Actions
Actions are MDX expressions that allow users to browse data, launch an application, go to a URL or perform another defined action.
Actions are server based so they can be managed centrally and attached to the cube.
Action types
Drillthrough – allow user to drill to more data
Report – submit a URL request to SQL Server Reporting Services to launch a report
Dataset – return a dataset based on an MDX query to the client application
Proprietary – custom actions you define
Rowset – return a rowset based on an OLE DB command; good for returning relational data
Statement – run OLE DB commands that return success or failure
URL – display dynamic webpages
CommandLine – execute a command at the command prompt; must be created with an MDX statement, can’t be done in Business Intelligence Development Studio
HTML – execute HTML scripts in browsers; cannot be created in Business Intelligence Development Studio, you must use MDX expressions

Action Properties
Name – name for the action
Action Target – the target type & target object
Condition – optional MDX expression to limit scope of the action
Action Content – the action to take; syntax depends on the type of action selected
Invocation – how the action should run in the application
Application – the application associated with this action; allows client applications to control which actions to show
Description – optional description of the action
Caption – name the user will see for the action
Caption is MDX – allows you to use an MDX expression for the caption
Report Server, Parameters – for Report actions, defines report server and report parameters
Drillthrough columns, maximum rows – columns and rows to return for a drillthrough action

Demonstrate a Drillthrough Action
1. Go to Product dimension, add color, size, English product name
2. Go to Actions tab of Cube Designer
3. Add New Drillthrough Action
4. Leave Condition blank
5. Add Measure Group Members – Fact Internet Sales (this defines what you rt-click on to get the drillthrough)
6. Add Drillthrough Columns: Dimension Product – Attributes color, size, product name (what details you will see when you drill through)
7. Go to additional properties
8. Set Default = True
9. Maximum Rows = 1000
10. Caption = Drillthrough…
11. Deploy & browse cube
12. Create cube with SalesAmount and Product – Color
13. Select a SalesAmount – rt-click Drillthrough…
14. Show the data displayed by the drillthrough action

Demonstrate a URL Action
15. Go to Actions tab of Cube Designer
16. Add new action
17. Name = Go to website
18. Target Type = Attribute Members
19. Target Object = Product.color
20. Action Type = URL
21. Action Expression = "http://www.bernardcallebaut.com"
22. Deploy & browse cube
23. Drag Product.color to cube, rt-click a product and choose Go to Website
24. Show how it launches the specified website

You can reference the selected attribute in the URL, e.g.
"http://www.bernardcallebaut.com?Product="+[Dim Product].[Color].CurrentMember.Name
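The URL action expression is just string concatenation with the selected member's name. The Python sketch below shows the same construction; the `action_url` helper is hypothetical, and the percent-encoding step is an addition of this sketch (the MDX expression in the notes concatenates the raw name), included because member names with spaces or slashes would otherwise produce an awkward query string.

```python
from urllib.parse import quote

BASE_URL = "http://www.bernardcallebaut.com"


def action_url(member_name):
    """Append the selected member's name as a Product query parameter."""
    # safe="" makes quote() encode "/" as well as spaces and other reserved characters
    return BASE_URL + "?Product=" + quote(member_name, safe="")
```

For a simple member such as Black the result is identical to what the MDX concatenation would return.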

Perspectives
A perspective is a logical subset of a cube to help focus data for users. Perspectives do not store data and cannot be used for security (since you can’t use permissions to say you can only see this perspective); they are meant to simplify browsing for the user.

1. Go to Perspectives tab of Cube Designer
2. Create new perspective
3. Give perspective a name “Internet”
4. Deselect all measures, KPIs, actions & dimensions that do not apply
5. Deploy & browse cube
6. Reconnect to cube & select Internet from Perspectives listbox
7. Show how only selected measures and dimensions are available

Translations
Allows you to display the caption in a different language, or display a different column for different languages

1. Go to Translations tab of Cube Designer
2. Create new translation
3. French-Canada
4. Enter French captions for dimensions and measures
5. Deploy & browse cube
6. Reconnect and select French Canada from Languages listbox
7. Point out the names of measures and dimensions have switched to the French translation
8. Go to Dimension Designer for Time
9. Go to Translation tab
10. Add French translations for captions
11. Click on the ellipsis button in English Month Name where you enter the French translation
12. Select French Month Name to use when language is French
13. Deploy and browse cube to show French data displayed

LAB NOTES
There are .txt files with the Status and Trend expressions in E:\Mod06\Democode\UsingKPI_*.txt
The Trend expects a hierarchy level of Calendar Year, which does not exist in the Date Calendar hierarchy. Add Calendar Year above Calendar Quarter and then reprocess the dimension and cube; then the Trend will work

Exercise 3 only clear checkboxes for all measures, leave the dimensions!

Module 7 Deploying and Securing an Analysis Database
Lecture: 60 minutes
Lab: 60 mins (p7-18 Ex 1 Deploy Solution, Ex 2 Secure Solution)

Deploying an Analysis Services Database
In Analysis Services 2000 you had to back up a database and restore it to production. This still works but now we have new options.
3 steps completed when you deploy a project
1. Build project in BIDS; any definition errors are caught here
2. Database with the name of the project is created, and objects defined in the project are created within this database
3. Database is processed

How to deploy Project?
Deployment Wizard – can’t do incremental updates, scripts will recreate the entire database; you can save the XMLA script to re-run later
XMLA Script – created by the wizard or SQL Server Management Studio; run XMLA scripts to recreate the database – cannot do incremental updates
Synchronize Database Wizard – in SQL Server Management Studio, to synchronize SSAS databases on separate instances. If the target database exists the data is synchronized; if the target does not exist a new copy is created.
Backup and Restore – back up the database and restore it on another server, like you did in SSAS 2000

Setting Deployment Options
Go to project properties; show configuration settings under Build and Deployment
Deployment Server Edition – edition of the server on which the solution will be deployed (Enterprise, Standard, Developer)
Output Path – where output files are placed after a build
Remove Passwords – whether to remove known passwords from connection strings; if removed, passwords will need to be supplied when the deployed project is processed
Deployment Mode – whether only changed objects or all objects are deployed (e.g. if you have two cubes in the project does it send out both?)
Processing Option – whether to do processing, and if so whether to do full processing of the cube on deployment
Transactional Deployment – deploy as a transaction?
Deployment Server and Database – where to deploy

Use the Deployment Wizard to generate an XMLA script to create the database. Can be run graphically or from the command prompt. The .asdatabase file contains the definitions for all the SSAS objects and is created when you do a build.

1. Start – All Programs – Microsoft SQL Server 2008 – Analysis Services – Deployment Wizard
2. Select E:\Labfiles\Mod06\Labfiles\bin\Mod06Lab.asdatabase (this file is created when you do a build of a project in BIDS)
3. Specify target server and database
4. Specify whether or not to deploy roles & partitions
5. Expand Data Source Connection Strings to show how you can change them for deployment
6. Select whether to deploy & process or just deploy
7. Specify location E:\Mod06\Labfiles\bin\Mod06Lab Script.xmla
8. Open E:\Mod06\Labfiles\bin\Mod06Lab Script.xmla
9. If you run that script in SQL Server Management Studio it will create and process the cube

Create XMLA scripts using SQL Server Management Studio, then run the scripts on the production server.

1. Open SQL Server Management Studio, connect to Analysis Services – NY-SQL-01
2. Rt-click database – Script Database As – CREATE TO – Query Editor Window
3. Show generated XMLA

Use the Synchronize Database Wizard in SSMS to synchronize two separate Analysis Services databases.
If the 2nd database does not exist it is created; if it already exists, the data is synchronized with the target
Target database remains online during synchronization so users can still query
First synchronization must synchronize all files
Second synchronization can be changes only
Done with the Synchronize XMLA command

1. Go to SQL Server Management Studio
2. Rt-click Databases folder – choose Synchronize to launch wizard
3. Show how you select source & destination servers

Security
SSAS relies on Windows Authentication to authenticate users. After the user is authenticated, SSAS controls permissions based on the user’s role membership.
Fixed server role – for administrators
Can create database roles for users with specific rights
Grant permissions for database and cube dimensions, dimension members, cells within a cube, mining structures, mining models, data sources and stored procedures
Role permissions are additive: if you are a member of two roles and one role is denied access while the other has access, the access is granted. This is different from most security models.
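Additive permissions amount to taking the union of what each role allows. The short Python sketch below illustrates this; the role names, cube names and `effective_access` function are hypothetical and only model the rule, they are not how SSAS stores roles.

```python
# Hypothetical roles: each maps to the set of cubes it is allowed to read.
role_grants = {
    "SalesReaders": {"Internet Sales", "Reseller Sales"},
    "FinanceReaders": {"Finance"},  # grants nothing on the sales cubes
}


def effective_access(user_roles, grants):
    """Union of what every role allows; no role can subtract from another."""
    allowed = set()
    for role in user_roles:
        allowed |= grants.get(role, set())
    return allowed
```

A user in both roles can read all three cubes, even though FinanceReaders on its own gives no access to the sales cubes.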

Granting access to Fixed Server Role (Administrator role)
Members of Administrators local group are members of fixed server role automatically

1. Open SQL Server Management Studio
2. Highlight Server name (NY-SQL-01) – rt-click Properties
3. Go to Security tab, show ‘Add’ for adding users to this role

Creating User Roles
1. Open SQL Server Management Studio
2. Go to a SSAS Database, expand Roles folder
3. Rt-click Roles folder – choose Add Role
4. Show tabs where you can define permission levels for different objects
5. Open BIDS, create a project, go to Menu-Project Choose New role or in Solution Explorer Rt Click Roles New role, show tabs to define role in BIDS

Permission considerations

You do not need access to a data source to access a cube.
You need access to a data source if you use a mining model that connects to a data source to access user-defined data, so create a role for this purpose if required.

Granting permission on a cube by default grants permissions on dimensions in the cube

Cell Permissions
For cell level security you need an MDX expression: if the expression returns True (1) the value is displayed, if it returns False (0) the value is not displayed
Read access – can read cells and includes calculated cells based on these cells
Read Contingent – can read cells but not calculated cells based on these cells
e.g. NOT Measures.CurrentMember IS [Measures].[Sales Amount Quota] will hide Sales Amount Quota

Test Permissions
On Cube Browser tab in BIDS click Change User button
Test Administrator permissions using Run As

LAB NOTES:
If you get a connection error running the .sql script, copy the script to the clipboard, create a new query window, then paste the script into the query window and run it.

Module 8 Maintaining a Multidimensional Solution
Lecture: 75 mins
Lab: 40 mins p8-25 (Ex 1 Processing, Ex 2 Logging and Monitoring, Ex 3 Backup and Restore)

Processing
One of the important jobs of an Analysis Services DBA is to process the objects (cubes, dimensions, mining models)
OLAP databases and Cubes must be deployed and processed.
Deploying creates the schema.
Processing is when the multidimensional objects are populated with data and aggregations are calculated. When processing occurs, the data source is queried to fetch the source data.
Processing is done within a transaction, so if all the dimensions process correctly but the cube processing fails, everything is rolled back.

Processing Cubes
Process Default – detects the state of the cube and executes the appropriate processing option (useful after a Process Structure)
Process Full – Does 2 steps: Process Data (reading data from the data source) and Process Index (processing aggregations and indexes). This processes the object and all objects it contains. Any old data is cleared out. It is done as a transaction. Temporary files are created during processing, so users can continue to access the cube while it runs. When processing is complete, the Commit moves the temporary files to production, and users cannot access the cube while the Commit runs. This takes a lot of memory! You basically have your entire database stored twice, plus temporary files created to do calculations as well! So although we love Process Full, it is not always an option. You should do this (or Data then Index) if the structure of the cube changes
Process Incremental – adds new fact data; it creates new files and then merges them with the existing files. Warning: only use this to add new values. If you use Process Incremental on a fact table that includes values already processed, they will be double counted! So add new fact data to a partition and process that partition, or use Process Incremental and specify a query for the new fact data to be processed
Process Data – populates data but does not build indexes or aggregations. It is usually used for cubes. It is similar to Process Full, except that processing data for a cube never reprocesses the dimensions; it always uses the existing dimension files. If Process Full is failing for a large cube, break it into a Process Data followed by Process Index
Process Index – creates or rebuilds indexes and aggregations for all processed partitions
Unprocess – clears the data
Process Structure – creates only the cube definitions for previously processed cubes. You can browse the structure and see the names of measures & dimensions, but you cannot query the cube data. When you are satisfied and want the data, run Process Default so users can query the data.
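The double counting warning for Process Incremental can be shown with a toy aggregate. This is only a sketch in plain Python with made-up order rows; it mimics the merge behaviour described above and is not SSAS code.

```python
def process_full(fact_rows):
    """Rebuild the aggregate from scratch: old totals are discarded."""
    return sum(amount for _, amount in fact_rows)


def process_incremental(current_total, incoming_rows):
    """Merge incoming rows into the existing total without checking for repeats."""
    return current_total + sum(amount for _, amount in incoming_rows)


# Hypothetical fact rows: (order_id, sales_amount)
fact_table = [(1, 100), (2, 250)]
total = process_full(fact_table)

new_rows = [(3, 50)]
fact_table = fact_table + new_rows

# Wrong: feeding the whole fact table again counts orders 1 and 2 twice.
double_counted = process_incremental(total, fact_table)

# Right: restrict the incremental load to rows not yet processed,
# which is what a separate partition or a filtering query achieves.
correct = process_incremental(total, new_rows)

print(double_counted, correct, process_full(fact_table))
```

Only the filtered incremental load agrees with a full rebuild of the same data.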

Processing Dimensions
When a new row is added to a dimension (e.g. new product), or an attribute of a dimension changes (e.g. employee changes location) you need to reprocess the dimension
Process Default – dimension data or indexes are processed if they have not been processed or are out of date
Process Full – the entire dimension is re-processed, dimension data and indexes are dropped and re-created (unavailable to users at that time).
Process Update/Incremental – is like an incremental update of a dimension, picks up new records and updates to attributes
Process Data – processes the dimension data (not indexes)
Process Index – creates indexes for attributes in the dimensions

Demo
Go to SQL Server Management Studio
1. Open Adventure Works UDM – Cubes – Adventure Works UDM
2. Rt-click Process – show how you can change the process options in the drop down box
3. Go to Change Settings
a. Parallel tasks (how many processors & how much memory have you got?) + all objects as a single transaction (if dimensions succeed and cube fails it all rolls back)
b. Sequential – dimensions and cubes can be done as separate transactions
c. Writeback table option – whether or not writeback is enabled so you can create or modify your cube without going to the data source
d. Process Affected Objects – e.g. dimensions for the cube?
e. Dimension Key Errors – by default if there is an error it rolls back the transaction, you might want to handle duplicate key values or Key Not Found (fact record has product id that does not exist in Product dimension)
4. Show how there is a script button at the top to generate an XMLA script for you

Batch processing
You can process several objects at once in parallel or in sequence and you can control the order
Use Ctrl-click to select multiple objects in SQL Server Management Studio and rt-click and choose Process
Use Ctrl-click to select multiple objects in the Solution Explorer in Visual Studio and rt-click then choose Process
Use XMLA scripts or SQL Server Agent to automate batch processing
To create an XMLA script use SQL Server Management Studio rt-click Script
Use SSIS tasks to do batch processing
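For reference, a rough sketch of what a batch processing script looks like, built here with Python's standard XML library. The Batch, Parallel, Process, Object and Type elements follow the XMLA processing schema as generated by the SSMS Script button; the database, dimension and cube IDs are hypothetical, so script your own objects from SSMS rather than relying on these names.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"
ET.register_namespace("", XMLA_NS)  # emit the namespace as the default xmlns


def tag(name):
    """Qualify an element name with the XMLA namespace."""
    return f"{{{XMLA_NS}}}{name}"


def build_batch(database_id, objects, parallel=True):
    """Return an XMLA Batch that processes each (kind, object_id, process_type)."""
    batch = ET.Element(tag("Batch"))
    # Commands inside Parallel may run side by side; commands placed
    # directly under Batch run one after the other.
    parent = ET.SubElement(batch, tag("Parallel")) if parallel else batch
    for kind, object_id, process_type in objects:
        process = ET.SubElement(parent, tag("Process"))
        obj = ET.SubElement(process, tag("Object"))
        ET.SubElement(obj, tag("DatabaseID")).text = database_id
        ET.SubElement(obj, tag(kind + "ID")).text = object_id
        ET.SubElement(process, tag("Type")).text = process_type
    return ET.tostring(batch, encoding="unicode")


# Hypothetical object IDs, for illustration only.
xmla = build_batch(
    "Adventure Works DW",
    [("Dimension", "Dim Product", "ProcessUpdate"),
     ("Cube", "Adventure Works", "ProcessFull")],
)
print(xmla)
```

The resulting text could be pasted into an XMLA query window or used as the command of a SQL Server Agent job step.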

Demo SSIS Project
1. Open BIDS, Create new SSIS project
2. Point out the two SSAS objects
a. Analysis Services Processing Task – specify SSAS connection and one or more objects to process, same processing options as in SSMS
b. Analysis Services Execute DDL Task – can execute XMLA scripts

Logging
Logging can be enabled on each instance of SSAS

Five Error Logs
Error – maintains errors configured and raised during processing and other operations
Flight Recorder – short-term log that tracks activity on the Analysis Services instance, used for troubleshooting, only enable in production when troubleshooting is required. Has high overhead
Query – records statistical information about running queries on instance – good for Usage-Based aggregation design
Exception – should only be used with guidance from Microsoft Support
Trace – should only be used with guidance from Microsoft Support

Demo Logging Properties & Query Log
1. Go to SQL Server Management Studio, connect to Analysis Services
2. Select NY-SQL-01 – Rt-click Properties
3. Show the different logs listed
4. Go to Log\QueryLog\QueryLogConnectionString – set value, create connection to relational database AdventureWorksDW
5. Go to Log\QueryLog\QueryLogTableName = OLAPQueryLog
6. Go to Log\QueryLog\CreateQueryLogTable = True
7. Server will log 1 in 10 queries by default, change Log\QueryLog\QueryLogSampling = 1 to log every query
8. Click OK to save settings
9. Restart Analysis Services to pick up log setting changes
10. Connect to DatabaseEngine – NY-SQL-01 – AdventureWorksDW – show OLAPQueryLog table created
11. Go to cube, and execute New Query
SELECT [Measures].[Total Product Cost] ON COLUMNS
FROM [Adventure Works UDM]
12. Go to AdventureWorksDW and execute the query
SELECT * FROM olapquerylog

Now you can go to BIDS and the Aggregations tab and choose Usage Based Optimization from the toolbar to design aggregations based on the queries that have run!

Monitoring with SQL Server Profiler
SQL Server Profiler has events, event classes and event categories to test functionality and performance of MDX queries. You can capture traces in production and replay them in a test environment to test and optimize.

Demo SQL Server Profiler
1. Start – SQL Server – Performance Tools – SQL Server Profiler
2. File – New Trace – NY-SQL-01 Analysis Services
3. Go to Events – show the different events you can trace

Monitoring with System Monitor
System Monitor includes counters for SQL Server 2008
Object names start with MSAS 2008
Before optimizing you may want to restart or clear the cache so you only have the statistics in which you are interested. The XMLA script on the slide will clear the cache

Demo System Monitor
1. Start – All Programs – Administrative Tools – Performance
2. Rt-click - Add Counter – show the MSAS counters

Optimization Suggestions
Usage Based Optimization Wizard reads information in query log
Make sure query log is representative of usage patterns
You can filter the query log based on a date range, users or frequency of queries
Make sure you have correct counts of records for SSAS to design the aggregations and have the proper attribute relationships

Demo Usage Based Optimization Wizard
1. In SQL Server Management Studio
2. Expand an OLAP Database
3. Expand Measures
4. Expand a Measure
5. Select a partition rt-click choose Usage Based Optimization Wizard
6. OR you can launch it from the Aggregations tab in BIDS – Usage Based Aggregation

Sample Perfmon counters

MSAS 2005: Memory | Memory Limit High KB | N/A | Shows (as a percentage) the high memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini
MSAS 2005: Memory | Memory Limit Low KB | N/A | Shows (as a percentage) the low memory limit configured for SSAS in C:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Config\msmdsrv.ini
MSAS 2005: Memory | Memory Usage KB | N/A | Displays the memory usage of the server process.
MSAS 2005: Memory | File Store KB | N/A | Displays the amount of memory that is reserved for the cache. Note: if the total memory limit in msmdsrv.ini is set to 0, no memory is reserved for the cache
MSAS 2005: Storage Engine Query | Queries from Cache Direct / sec | N/A | Displays the rate of queries answered from the cache directly
MSAS 2005: Storage Engine Query | Queries from Cache Filtered / sec | N/A | Displays the rate of queries answered by filtering an existing cache entry.
MSAS 2005: Storage Engine Query | Queries from File / sec | N/A | Displays the rate of queries answered from files.
MSAS 2005: Storage Engine Query | Average time / query | N/A | Displays the average time of a query
MSAS 2005: Connection | Current connections | N/A | Displays the number of connections against the SSAS instance
MSAS 2005: Connection | Requests / sec | N/A | Displays the rate of query requests per second
MSAS 2005: Locks | Current Lock Waits | N/A | Displays the number of connections waiting on a lock
MSAS 2005: Threads | Query Pool job queue Length | N/A | The number of queries in the job queue
MSAS 2005: Proc Aggregations | Temp file bytes written/sec | N/A | Shows the number of bytes of data processed in a temporary file
MSAS 2005: Proc Aggregations | Temp file rows written/sec | N/A | Shows the number of rows of data processed in a temporary file

Backup and Restore
You need to back up Analysis Services and your relational database
Analysis Services Backup will back up
MOLAP – multidimensional structure, aggregates and data
HOLAP – multidimensional structure and aggregates (crucial to back up the relational database where data is stored)
ROLAP – multidimensional structure (crucial to back up the relational database where data is stored)

Security considerations
User performing backup or restore needs appropriate file system permissions to write to the backup location, and must be a member of the Analysis Services server role or have Full Control permissions on the database they are backing up
You can encrypt a backup with a password

How to Backup Analysis Services Database
In SQL Server Management Studio, select database in Object Explorer, rt-click Backup
Use XMLA Script (example in course book)
Create SQL Server Agent job – job step type SQL Server Analysis Services Command and specify XMLA backup script

Backup options
Backup file name – name of backup file
Database name – name of database to back up
Allow overwrite – overwrite existing backup files?
Apply compression – compressed backup files take less space but are slower to create
Encrypt Backup file – allows you to encrypt backup so it cannot be restored without password
Password – password for encrypted backup file
Backup remote partition – whether to back up data from remote partitions
Remote partition backup location
Security – how to back up roles and permissions (only available in XMLA backups)

Restoring SSAS Database
SQL Server Management Studio – select database in Object Explorer, rt-click Restore
Use XMLA script (example in course book)

Restore options
Restore database – name of database to restore
From Backup file – location of backup file to use when restoring
Allow Database overwrite – replace existing database?
Include Security Information – whether to copy security information
Password – to decrypt encrypted backups

Chapter 9 Data Mining
Lecture: 70 mins
Lab: 45 mins (p9-18 Ex 1 Create Data Mining Structure, Ex 2 Add Data Mining Model, Ex 3 Explore Data Mining Model, Ex 4 Validate Data Mining Model)

What is Data Mining?
Data Mining is the process of searching through data to extract patterns and trends using various algorithms
Use data mining to predict unknown values based on statistics and patterns in previous data
Use Data mining to:
Explore data – find out profile of users who bought a product
Find Patterns – find out what types of products a particular customer purchases
Predict – predict sales for the next quarter

Cubes vs Data Mining
Cubes show us aggregations (total sales). Data Mining shows us the patterns of data (products frequently purchased together)
Cubes show us what has happened. Data Mining can forecast the future

Data Mining Structure – made up of case tables and one or more data mining models. When you create a mining model SSAS retrieves the data from the data source and stores it in a proprietary format. To avoid storing the same data multiple times, a mining structure is created so the models can share the same data.

Case Table – stores the source data for training data mining models. This data is used to train the models so it should be accurate, relevant and plentiful

Data Mining Model – defines which data mining algorithm to use, which columns the algorithm uses and whether each column is an input column, a key column or a predictable column

Key Columns – identify the row and are usually the primary key of the table (e.g. Customer Key)

Input Columns – are factors that might affect the output (e.g. Age, Marital Status, Gender)

Predictable Columns – are what we want the model to try and predict (e.g. Sales Total, Bike Buyer y/n) A predictable column is often an input column as well

Ignore Columns – Some columns should be ignored by the model (e.g. Name)

Sources
Cubes (must be in same database as data mining model)
Relational Databases (must define a Data Source View)

Training the model

To train the model you process it in SSAS, training the model involves loading it with data and executing the mathematical algorithm associated with it to derive useful patterns and rules from the input data. These patterns and rules are then stored on the server.

When a model is trained it must read input data from a single table called a case table. If the data you want to analyze is in two tables, you can nest the tables e.g. the Product table could be a nested table of the Product Purchased column in the CustomerOrders table

SSAS only supports one level of nesting! So you must design your data tables accordingly

Validating Models
You have different models you can use to analyze data, so you can try each one and see which is most accurate. You define one Data Mining structure which you use for multiple data mining models, much like we define one data source view to use for one or more cubes. Then you use Lift charts to compare results and determine which model is the most accurate for the given data.
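A small sketch of the comparison a lift chart makes, assuming made-up test cases and made-up model scores (nothing here comes from the course database). Each model ranks the cases by predicted probability, and we measure what share of the real buyers falls in the top 25%, 50%, 75% and 100% of that ranking.

```python
def cumulative_gains(scores, actuals, steps=4):
    """Share of all positives captured in the top k/steps of cases ranked by score."""
    ranked = [a for _, a in sorted(zip(scores, actuals), key=lambda p: -p[0])]
    total_positives = sum(ranked)
    gains = []
    for k in range(1, steps + 1):
        cutoff = len(ranked) * k // steps
        gains.append(sum(ranked[:cutoff]) / total_positives)
    return gains


# Hypothetical test cases: 1 = bought a bike, 0 = did not.
actuals = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted probabilities from two mining models.
model_a = [0.9, 0.8, 0.7, 0.3, 0.1, 0.4, 0.6, 0.2]
model_b = [0.2, 0.9, 0.3, 0.8, 0.7, 0.1, 0.6, 0.5]

ideal = cumulative_gains(actuals, actuals)       # ranks every buyer first
random_guess = [k / 4 for k in range(1, 5)]      # no model: buyers spread evenly
gains_a = cumulative_gains(model_a, actuals)
gains_b = cumulative_gains(model_b, actuals)

for name, line in [("ideal", ideal), ("model A", gains_a),
                   ("random", random_guess), ("model B", gains_b)]:
    print(name, [round(g, 2) for g in line])
```

The model whose line stays closest to the ideal line is the better one for this data; a line that falls below the random guess line means the model is ranking the wrong cases first.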

Steps in developing and deploying Data Mining Solution
1. Define mining domain – gather requirements, what problem are you trying to solve: customer profiling? Sales forecasting? Determine what data is needed
2. Prepare data – put data into a source data format appropriate for Data Mining
3. Construct Data Schema – data source view & data mining structure
4. Build model – Identify and build models using Data Mining Wizard
5. Explore Model – Mining Viewers
6. Validate Model – Mining Accuracy Chart
7. Deploy Model – BI Studio or SQL Server Management Studio

Data mining algorithms (SQL Server 2000 only supported Decision Trees & Clustering)

Decision trees – predicts probability of each state of input based on each state of predictable column e.g. try to see if a marketing approach will be successful with customers

Time Series – to predict future continuous values based on historical continuous values e.g. future sales of bikes based on sales in past 3 years

Clustering Algorithm – groups cases into clusters with similar characteristics. Use to assess probability of a point entering a cluster or to assign a data point to a cluster e.g. will this person earn $20-30K a year or $30-60K a year

Association Rules – looks for items that occur together in a transaction (helpful for finding cross-selling opportunities, e.g. most people who took SQL Server also took Windows Server)

Sequence Clustering – clusters together cases with similar sequences. Useful for predicting patterns like the paths users take on a web site

Neural Network Algorithm – similar to decision trees, calculates probabilities for each possible state of the input attribute with each state of the predictable attribute. (e.g predict stock movements)

Naïve Bayes algorithm – used for predictive modeling. Assumes all columns are independent, runs faster, but might not detect all correlations. Use for questions like which customer is most likely to buy a product

Linear Regression Algorithm – relates two continuous columns. Can predict values outside the existing range

Logistic Regression Algorithm – similar to linear regression but constrained to the values the output column can contain. Linear regression might suggest something reaching 110%, Logistic would never exceed 100%
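The 110% versus 100% point can be shown with a few lines of arithmetic. This is a sketch with hypothetical coefficients, not a fitted model and not the Microsoft algorithms themselves: the straight line keeps climbing, while the logistic curve flattens out below 1.

```python
import math


def linear(x, intercept=0.1, slope=0.1):
    """Straight line: nothing stops the prediction from passing 100%."""
    return intercept + slope * x


def logistic(x, intercept=-3.0, slope=0.6):
    """S-shaped curve: output always stays between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# x is a hypothetical input such as years as a customer.
for x in (0, 5, 10, 15):
    print(x, f"linear={linear(x):.0%}", f"logistic={logistic(x):.0%}")
```

At x = 10 the linear line already predicts more than 100%, which makes no sense for a probability, while the logistic value is still under 100%.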

Data Mining Tools
Data Mining Wizard – create Data Mining structures
Data Mining Designer – to configure the structure, add new data mining models, train models and create predictions

Create a Data Mining Structure

1. Open Visual Studio. Open the E:\MOD09\Democode\Adventure Works DM Folder\Adventure Works DM.sln

2. You will need to upgrade it to 2008 and you will need to edit the data source because it currently points to MIAMI AdventureWorksDW instead of NY-SQL-01 AdventureWorksDW2008

3. Labfiles\Starter\AdventureWorksDataMining.sln
4. Show Data Source view containing V-TargetMail
5. In Object Explorer, click Data Mining Structure – Rt-click New Data Mining Structure
6. Launches Data Mining Wizard
7. Select create structure for relational Database
8. Select your Data Mining Algorithm (Decision Tree)
9. Select the Data Source View to use for the model AdventureWorksDM_DSV
10. Select the Case table to use (v_targetMail)
11. Select Key Column (the key value that identifies the different rows) Customer Key
12. Select Predictable column (the column whose value you want to try and predict) (BikeBuyer)
13. Try Suggest for Input Column to see which columns have patterns for the predictable column (Bike Buyer, Age, CommuteDistance, MaritalStatus)
14. Accept default Columns Content and Data Type (discrete have fixed set of values e.g. gender, province, continuous have any value e.g. salary)
15. Enter name for mining structure and model – allow drill-through
16. Process Model

Demo Data Mining Designer
1. Mining Structure Tab – to change properties or columns or Data Mining Model (e.g. Change column property to discrete or continuous)
2. Mining Models – to add or modify algorithms used. Add a Clustering Algorithm – reprocess/train new model
3. Mining Model Viewer – to explore the model data, each algorithm has its own viewer format. (do not bring up more tabs for cluster view it will probably hang!!!!)
a. Show decision tree model, darker boxes have more buyers
b. Change level to show how only more important dependencies are listed
c. Rt-click on a box and choose drill-through to see the records for this box
d. Click on Dependency Network tab of Mining Model Viewer, move slider down, it drops off the less significant dependencies so you can see which factors had the largest influence on the predictor column
e. Show Cluster Diagram
4. Mining Accuracy Chart – To test accuracy of a mining model or compare accuracy of several mining models with Lift charts
a. Select v_target Mail as case table
b. Click on Lift chart, can’t chart the models, but scores for models are listed at the bottom of the chart
c. Lift charts show different lines showing prediction for each model – Ideal Model line is actual data, Random Guess line is without using a model to predict, and other lines are for each mining model in the data mining structure, so you can see which is closest to the Ideal Model line.
d. Classification Matrix tells you numerically how far off model was - Where the row & column match is the actual number, the other value is the error of the model.
5. Mining Model Prediction – To specify input tables and map columns in these tables to the inputs of mining models and display the results of the prediction on your data in different formats. Also use Prediction Query Builder to write DMX queries (Data Mining Extensions)

a. Select Singleton Query. Enter Age 35, Commute Distance 1-2 Miles, Marital Status S
b. In prediction query choose Prediction Function, Field PredictProbability, Criteria [Bike Buyer],1
c. Choose Switch to Query result view button to see calculated probability
d. Click Switch to SQL View to show DMX query that was created

SELECT PredictProbability([Bike Buyer],1)
FROM [mm_DecisionTree_bikeBuyers]
NATURAL PREDICTION JOIN
(SELECT 85 AS [Age], '10+ Miles' AS [Commute Distance], 'M' AS [Marital Status]) AS t

For reference only, do not show to students.
DMX
SELECT <expression_list> FROM <mining_model>
[NATURAL] PREDICTION JOIN
<source_data> AS <alias>
ON <column_mappings>

<expression_list> the predictable columns from the model that will be retrieved (e.g. Customer Age, Education Level, Occupation)

PREDICTION JOIN ON joins mining model to the source data
If names in mining model match names in source data you can use NATURAL PREDICTION JOIN

<source_data> is the input dataset, could be
OPENQUERY or OPENROWSET that fetches data from a relational model
SELECT statement
Another DMX query
An application rowset such as an ADO.Net DataReader

e.g. find out probability that a particular customer falls into a certain group and is therefore likely to buy the new bike
SELECT ClusterProbability('Cluster B')
FROM [CustomerProfilingMC]
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age], 'Professional' AS [Occupation], 80000 AS [Yearly Income]) AS t

Getting Started Data Mining
1. What is the business trying to do? What is their product? Their goal? Understand the business first
2. Understand the data, where is the data? Bring it together. Clean it up. If half the records say ON, and the other half say Ontario, the data mining algorithm won’t know that is the same thing. Create calculated columns, like BikeBuyer to help predictive analysis. Break the data up into two sets, about 2/3 to train the model, and 1/3 to test the model.
3. Identify attributes of the data, e.g. for a customer age, income, nbr of children. Which attributes are discrete (fixed number of values like Gender) vs continuous (can be anywhere in a range, e.g. age, salary)
4. Select the right algorithm (if you aren’t sure you can validate after the fact)
5. Choose a data set to data mine (usually 2/3 of the data, the remaining 1/3 saved to validate the model). The algorithm will analyze the data and create a data mining model.
6. Use your model to make predictions, or show patterns

Examples of real world data mining
Fraud detection for credit card transactions
Amazon & Chapters – people who bought this book also bought
Professional sports to identify which players were successful in what situations (which players playing together improved the team)
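The 2/3 training and 1/3 testing split from the getting started steps can be sketched as below. The customer keys are hypothetical and the fixed seed is only there so the demo gives the same split each time; in practice you would split the real case table.

```python
import random


def split_cases(cases, train_fraction=2 / 3, seed=42):
    """Shuffle a copy of the cases, then cut it into training and test sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cutoff = round(len(shuffled) * train_fraction)
    return shuffled[:cutoff], shuffled[cutoff:]


# Hypothetical customer keys standing in for rows of the case table.
cases = list(range(1, 31))
train, test = split_cases(cases)
print(len(train), "cases to train,", len(test), "cases held back to test")
```

Shuffling before the cut matters: if the table is sorted (by date or region, say), taking the first 2/3 would train the model on a different kind of customer than it is tested on.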

Data Mining Algorithms in SQL Server 2008

Classification – predict one or more discrete variables based on attributes of input data. For example predict if the child will be a boy or girl, how many cars a family will purchase, will someone default on a loan
Microsoft Decision Trees
Naïve Bayes
Neural Network

Regression – similar to classification but they predict continuous instead of discrete variables. For example predict salary, age, sale price of a house. At least one attribute of the input data should be continuous as well (e.g. purchase price is a continuous input attribute for sale price)
Microsoft Regression Trees
Time Series
Linear Regression
Logistic Regression

Segmentation – very popular – segmenting by input attributes – who is buying bikes, people under 35? Women? People who live < 5 km from work? Helps target customers
Clustering

Sequence Analysis – grouping input data by a sequence of operations. What web pages are people going to in what order based on input attributes (target internet advertising, people under 30 go this way, over 30 go this way)
Sequence Clustering

Association – which data values are associated. Which products are purchased together. Helps supermarkets figure out what products to put on the same shelf.
Association Rules
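To make "which products are purchased together" concrete, here is a sketch of the two usual measures behind an association rule, support and confidence, on a handful of made-up course enrolments. It illustrates the idea only and is not the Microsoft Association Rules implementation.

```python
# Hypothetical transactions: the set of courses each student took.
transactions = [
    {"SQL Server", "Windows Server"},
    {"SQL Server", "Windows Server", "Exchange"},
    {"SQL Server"},
    {"Windows Server", "Exchange"},
    {"SQL Server", "Windows Server"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


rule_support = support({"SQL Server", "Windows Server"})
rule_confidence = confidence({"SQL Server"}, {"Windows Server"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Support says how common the combination is overall; confidence says how often students who took SQL Server also took Windows Server, which is the number that points to a cross-selling opportunity.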
