
Lections №4 Using Microsoft Excel for the statistical calculations.

Date post: 02-Jan-2016
Upload: hugo-barton
Transcript
Page 1: Lections №4 Using Microsoft Excel for the statistical calculations.

Lections №4

Using Microsoft Excel for the statistical calculations

Page 2: Lections №4 Using Microsoft Excel for the statistical calculations.

Main Questions

Using Microsoft Excel for mathematical calculations.

Statistical calculations in Microsoft Excel.

Curve fitting using Excel.

Page 3: Lections №4 Using Microsoft Excel for the statistical calculations.

1. Mathematical calculations in Microsoft Excel

Structure of an Excel formula. Arguments of functions in Excel. The Function Wizard.

Page 4: Lections №4 Using Microsoft Excel for the statistical calculations.

1.1. Structure of an Excel formula

Reference to a cell (relative)

Function with a list of arguments

Mathematical operator

Formula start symbol (=)

Simple formula example: =(A4+B8)*C6; Composite formula example:

Page 5: Lections №4 Using Microsoft Excel for the statistical calculations.

1.2. Arguments of functions

Constants – text or numeric data;

Reference to a cell – the address of a cell (or cells) that contains the data to process. There are two types of reference:

relative – changes when the formula is copied to another cell, for example: F7;

absolute – does not change when the formula is copied:

fixed to the cell, for example: $F$7; fixed to the column only, for example: $F7; fixed to the row only, for example: F$7 (the last two are often called mixed references).
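The difference between the reference types can be sketched in Python. This is only an illustration of the rule, not Excel's own code: the function name copy_reference and its arguments are hypothetical, and it assumes a plain single-cell address such as F7 or $F$7.

```python
import re

def column_number(letters):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def column_letters(number):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def copy_reference(ref, rows_down, cols_right):
    """Return the reference as it would read after copying the formula.

    Parts marked with $ stay fixed; unmarked parts shift with the copy.
    """
    match = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_fixed, col, row_fixed, row = match.groups()
    if not col_fixed:
        col = column_letters(column_number(col) + cols_right)
    if not row_fixed:
        row = str(int(row) + rows_down)
    return f"{col_fixed}{col}{row_fixed}{row}"

# Copy a formula two rows down and one column to the right.
for ref in ("F7", "$F$7", "$F7", "F$7"):
    print(ref, "->", copy_reference(ref, rows_down=2, cols_right=1))
# F7 -> G9, $F$7 -> $F$7, $F7 -> $F9, F$7 -> G$7
```

Only the parts of the address without a $ sign move with the formula, which is why $F$7 always points at the same cell.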

Page 6: Lections №4 Using Microsoft Excel for the statistical calculations.

1.2.1. Arrays as arguments

Array (range) – cell addresses separated by : (colon); you specify the addresses of the top-left and bottom-right cells of the array. For example, C4:C7 denotes the array with elements C4, C5, C6, C7;

Set – cells or ranges separated by ; (semicolon); you list every cell or range that belongs to the set. For example, D2:D4;D6:D8 denotes the set with elements D2, D3, D4, D6, D7, D8. The list separator depends on the regional settings: with English (US) settings Excel uses a comma instead of a semicolon.
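To make the two notations concrete, here is a small Python sketch that expands such a definition into the individual cell addresses. The helper names (expand_range, expand_set) are made up for this illustration, and the semicolon is assumed as the list separator, as on the slide.

```python
import re

def split_address(address):
    """Split an address such as 'C4' into (column number, row number)."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", address)
    if match is None:
        raise ValueError(f"not a cell address: {address!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, int(digits)

def column_letters(number):
    """Convert a column number to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def expand_range(definition):
    """Expand 'C4:C7' (or a single cell 'C4') into a list of addresses."""
    first, _, last = definition.partition(":")
    c1, r1 = split_address(first)
    c2, r2 = split_address(last or first)
    return [
        f"{column_letters(c)}{r}"
        for r in range(min(r1, r2), max(r1, r2) + 1)   # row by row
        for c in range(min(c1, c2), max(c1, c2) + 1)   # left to right
    ]

def expand_set(definition, separator=";"):
    """Expand a set such as 'D2:D4;D6:D8' into a flat list of addresses."""
    cells = []
    for part in definition.split(separator):
        cells.extend(expand_range(part.strip()))
    return cells

print(expand_range("C4:C7"))      # ['C4', 'C5', 'C6', 'C7']
print(expand_set("D2:D4;D6:D8"))  # ['D2', 'D3', 'D4', 'D6', 'D7', 'D8']
```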

Page 7: Lections №4 Using Microsoft Excel for the statistical calculations.

1.3. Using the Function Wizard

Run the wizard – use the Insert-Function command of the main menu or click the Function icon on the toolbar.

Step 1 – in the dialog box select the category of functions (Category list) and choose the function name in the sub-list. Click OK to confirm;

Step 2 – enter the arguments of the function (constants or cell addresses). Different functions take different numbers of arguments;

You can enter the data manually or click the Choose button and select the input area on the Excel worksheet.

Page 8: Lections №4 Using Microsoft Excel for the statistical calculations.

Step 1: select the category and the function name

Page 9: Lections №4 Using Microsoft Excel for the statistical calculations.

Using the Function Wizard

Step 2: enter the arguments of the function

Page 10: Lections №4 Using Microsoft Excel for the statistical calculations.

2. Statistical calculations in Microsoft Excel

Descriptive statistics. Statistical hypothesis testing. Data Analysis add-in.

Page 11: Lections №4 Using Microsoft Excel for the statistical calculations.

2.1. Descriptive statistics

Statistic - Measure of a sample characteristic.

Population - Contains all members of a group.

Sample - A subset of a population.

Interval Data - Objects classified by type or characteristic, with logical order and equal differences between levels of data.

Ordinal Data - Objects classified by type or characteristic with some logical order.

Variable - A characteristic that can take different values from one observation to another.

Independent Variable - A measure that can take on different values which are subject to manipulation by the researcher.

Response Variable - The measure not controlled in an experiment. Commonly known as the dependent variable.

Page 12: Lections №4 Using Microsoft Excel for the statistical calculations.

2.1.1. Descriptive statistics

For interval level data, measures of central tendency and variation are common descriptive statistics.

Measures of central tendency describe a series of data with a single attribute.

Measures of variation describe how widely the data elements vary.

Standardized scores combine both central tendency and variation into a single descriptor that is comparable across different samples with the same or different units of measurement.

For nominal/ordinal data, proportions are a common method used to describe frequencies as they compare to a total.

Page 13: Lections №4 Using Microsoft Excel for the statistical calculations.

2.1.2. Descriptive statistics

Page 14: Lections №4 Using Microsoft Excel for the statistical calculations.

2.1.3. Descriptive statistics

Mean - the arithmetic average of the scores in a sample distribution.

Median - the point on a scale of measurement below which fifty percent of the scores fall.

Mode - the most frequently occurring score in a distribution.

Range - the difference between the highest and lowest score (high-low).

Variance - the average of the squared deviations between the individual scores and the mean. The larger the variance, the more variability there is among the scores.

Standard deviation - the square root of the variance. It provides a representation of the variation among scores that is directly comparable to the raw scores.
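These definitions can be checked on a small sample with Python's standard statistics module. The scores below are invented for the illustration. The sketch reports both the population variance (the plain average of squared deviations, as defined above) and the sample variance, which divides by n - 1 and is what Excel's VAR and STDEV functions compute.

```python
import statistics

def describe(scores):
    """Return the descriptive statistics defined above for a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "range": max(scores) - min(scores),
        # Average of squared deviations (divides by n).
        "population_variance": statistics.pvariance(scores),
        "population_std": statistics.pstdev(scores),
        # Sample estimates (divide by n - 1), like Excel's VAR and STDEV.
        "sample_variance": statistics.variance(scores),
        "sample_std": statistics.stdev(scores),
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example data
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the mean is 5, the median 4.5, the mode 4, the range 7, the population variance 4 and the population standard deviation 2.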

Page 15: Lections №4 Using Microsoft Excel for the statistical calculations.

2.1.4. Descriptive statistics

Page 16: Lections №4 Using Microsoft Excel for the statistical calculations.

2.1.5. Descriptive statistics

Statistical parameter name    Excel function name
                              English ver.    Russian ver.

Mean                          AVERAGE         СРЗНАЧ

Max                           MAX             МАКС

Min                           MIN             МИН

Variance                      VAR             ДИСП

Standard deviation            STDEV           СТАНДОТКЛОН

Coef. of skewness             SKEW            СКОС

Coef. of kurtosis             KURT            ЭКСЦЕСС
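The last two rows of the table are less familiar than the others, so a Python sketch may help. It assumes the bias-corrected sample formulas that Microsoft documents for SKEW and KURT; the function names are chosen for this illustration only.

```python
import math

def _mean_and_sample_std(values):
    """Return the mean and the sample standard deviation (n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, std

def sample_skewness(values):
    """Skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean, std = _mean_and_sample_std(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / std) ** 3 for x in values)

def sample_excess_kurtosis(values):
    """Excess kurtosis with the small-sample correction terms."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean, std = _mean_and_sample_std(values)
    fourth = sum(((x - mean) / std) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

data = [1, 2, 3, 4, 5]  # symmetric, flat example data
print(sample_skewness(data))         # close to 0 for symmetric data
print(sample_excess_kurtosis(data))  # about -1.2: flatter than a normal curve
```

A skewness near zero indicates a symmetric sample; a negative excess kurtosis indicates a flatter distribution than the normal one.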

Page 17: Lections №4 Using Microsoft Excel for the statistical calculations.

2.2. Statistical Hypothesis Testing

The Normal Distribution. Although there are numerous sampling distributions used in hypothesis testing, the normal distribution is the most common example of how data would appear if we created a frequency histogram where the x axis represents the values of scores in a distribution and the y axis represents the frequency of scores for each value.

Most scores will be similar and therefore will group near the center of the distribution.

Some scores will have unusual values and will be located far from the center or apex of the distribution.

Page 18: Lections №4 Using Microsoft Excel for the statistical calculations.

2.2.1. The Normal Distribution

Properties of a normal distribution:

Forms a symmetric bell-shaped curve

50% of the scores lie above and 50% below the midpoint of the distribution

Curve is asymptotic to the x axis

Mean, median, and mode are located at the midpoint of the x axis
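The listed properties can be verified numerically with NormalDist from Python's standard library. The mean of 100 and standard deviation of 15 are arbitrary values picked for this sketch.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # arbitrary example parameters

# 50% of the scores lie below the midpoint.
share_below_midpoint = dist.cdf(dist.mean)

# Symmetry: equal density at equal distances on both sides of the mean.
left_density = dist.pdf(dist.mean - 15)
right_density = dist.pdf(dist.mean + 15)

# Asymptotic to the x axis: far from the mean the density is tiny but not zero.
far_density = dist.pdf(dist.mean + 10 * dist.stdev)

print(share_below_midpoint)                # 0.5
print(left_density, right_density)         # two equal values
print(far_density)                         # very small positive number
print(dist.mean, dist.median, dist.mode)   # all three coincide
```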

Page 19: Lections №4 Using Microsoft Excel for the statistical calculations.

2.2. Statistical Hypothesis Testing

Hypothesis testing is used to establish whether the differences exhibited by random samples can be inferred to the populations from which the samples originated.

Chain of reasoning for inferential statistics:

Sample(s) must be randomly selected

The sample estimate is compared to the underlying sampling distribution for samples of the same size

Determine the probability that a sample estimate reflects the population parameter

Page 20: Lections №4 Using Microsoft Excel for the statistical calculations.

2.2.1. Statistical Hypothesis Testing

The four possible outcomes in hypothesis testing:

DECISION                    Actual Population Comparison
                            Null Hyp. True              Null Hyp. False
                            (there is no difference)    (there is a difference)

Rejected Null Hypothesis    Type I error (alpha)        Correct Decision

Did not Reject Null         Correct Decision            Type II Error

Page 21: Lections №4 Using Microsoft Excel for the statistical calculations.

2.2.2. Statistical Hypothesis Testing

When conducting statistical tests with computer software, the exact probability of a Type I error is calculated. It is presented in several formats but is most commonly reported as "p <" or "Sig." or "Signif." or "Significance." The following table links p values with a benchmark alpha of 0.05:

P <     Alpha   Probability of Type I Error                  Final Decision

0.05    0.05    5% chance difference is not significant      Statistically signif.

0.10    0.05    10% chance difference is not significant     Not statistically signif.

0.01    0.05    1% chance difference is not significant      Statistically signif.

0.96    0.05    96% chance difference is not significant     Not statistically signif.
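The decision rule behind the table is a single comparison of the reported p value with alpha. A minimal Python sketch, with a hypothetical function name, reproduces the four rows:

```python
def final_decision(p_value, alpha=0.05):
    """Compare a reported p value with the benchmark alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p_value <= alpha:
        return "Statistically significant"
    return "Not statistically significant"

# The four p values from the table, all judged against alpha = 0.05.
for p in (0.05, 0.10, 0.01, 0.96):
    print(f"p = {p:.2f}: {final_decision(p)}")
```

Note that a p value exactly equal to alpha is treated as significant here, which matches the first row of the table.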

Page 22: Lections №4 Using Microsoft Excel for the statistical calculations.

2.2.3. Statistical Hypothesis Testing

General assumptions:

Population is normally distributed

Random sampling

Mutually exclusive comparison samples

Data characteristics match statistical technique.

For interval / ratio data use: t-tests, Pearson correlation, ANOVA, regression

For nominal / ordinal data use: difference of proportions, chi square and related measures of association

Page 23: Lections №4 Using Microsoft Excel for the statistical calculations.

2.2.4. Hypothesis Testing

State the Hypothesis

Null Hypothesis (Ho): There is no difference between ___ and ___.

Alternative Hypothesis (Ha): There is a difference between ___ and ___.

Rejection Criteria

This determines how different the parameters and/or statistics must be before the null hypothesis can be rejected. This "region of rejection" is based on alpha (α), the error associated with the confidence level. The point of rejection is known as the critical value.

For medical research, use α = 0.05 (5%).

Page 24: Lections №4 Using Microsoft Excel for the statistical calculations.

Practical point of view | Statistical point of view | Additional conditions | Appropriate method

Comparing the control and experimental samples | Comparing two independent sample means

    Normal distribution, variances are equal | t-test with homogeneity of variance

    Normal distribution, variances are not equal | t-test without homogeneity of variance

    Normal distribution, without variance test | t-test without variance test

    Not normal distribution, variances are equal | U-test (Wilcoxon-Mann-Whitney)

    Not normal distribution, without variance test | Median test

Comparing the sample data before and after the experiment | Comparing two dependent sample means

    Normal distribution | t-test for dependent samples

    Not normal distribution | One-sample signed test (Wilcoxon)

Comparing a sample mean to a constant | Comparing a population mean to a sample mean

    Normal distribution | Comparing a constant to a sample mean (t-test)

    Not normal distribution | Gupt signed test

Comparing the spread of a parameter in two samples | Comparing two independent sample variances

    Normal distribution | Computing the F-ratio

    Not normal distribution | Siegel-Tukey, Moses tests
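The selection table above works like a lookup: the task, the normality of the data and, where relevant, the result of the variance test together determine the method. The Python sketch below encodes it that way. The key names ("independent_means", "equal" and so on) are labels invented for this illustration; the method names are taken from the table.

```python
# (task, data are normal, variance condition) -> method from the table
METHODS = {
    ("independent_means", True, "equal"): "t-test with homogeneity of variance",
    ("independent_means", True, "unequal"): "t-test without homogeneity of variance",
    ("independent_means", True, "untested"): "t-test without variance test",
    ("independent_means", False, "equal"): "U-test (Wilcoxon-Mann-Whitney)",
    ("independent_means", False, "untested"): "Median test",
    ("dependent_means", True, None): "t-test for dependent samples",
    ("dependent_means", False, None): "One-sample signed test (Wilcoxon)",
    ("mean_vs_constant", True, None): "t-test comparing a constant to a sample mean",
    ("mean_vs_constant", False, None): "Gupt signed test",
    ("variances", True, None): "F-ratio",
    ("variances", False, None): "Siegel-Tukey or Moses test",
}

def choose_method(task, normal, variances=None):
    """Look up the method the table recommends for the given situation."""
    try:
        return METHODS[(task, normal, variances)]
    except KeyError:
        raise ValueError(
            f"the table has no entry for task={task!r}, "
            f"normal={normal!r}, variances={variances!r}"
        ) from None

# Control vs. experimental group, normal data, variance test passed.
print(choose_method("independent_means", normal=True, variances="equal"))
# Before/after measurements on the same subjects, data not normal.
print(choose_method("dependent_means", normal=False))
```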

Page 25: Lections №4 Using Microsoft Excel for the statistical calculations.

2.3. The Analysis ToolPak

Performing statistical analyses on sample data is very convenient to do in Excel. It has dozens of built-in spreadsheet functions that allow us to perform all sorts of statistical calculations. The Analysis ToolPak add-in also contains several other statistical tools.

To make sure you have the Analysis ToolPak add-in available in your version of Excel, select Tools from the main menu bar and see if the Data Analysis menu option appears toward the bottom of the Tools menu. If not, select Tools - Add-Ins from the main menu bar and select the Analysis ToolPak option from the list.

Page 26: Lections №4 Using Microsoft Excel for the statistical calculations.

2.3.1. The Analysis ToolPak

The Analysis ToolPak provides several tools for conducting statistical tests. These tools include:

F-Test Two-Sample for Variances

t-Test Paired Two-Sample for Means

t-Test Two-Sample Assuming Equal Variances

t-Test Two-Sample Assuming Unequal Variances

z-Test Two-Sample for Means

To access these tools, select Tools - Data Analysis from the main menu bar to open the Data Analysis dialog box. You'll find each of the statistical test tools listed in this dialog box.
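To show what stands behind these tools, the following Python sketch computes the test statistics and degrees of freedom using the standard textbook formulas. It is not the ToolPak's own code, the function names are hypothetical, and it stops short of p values because Python's standard library has no t or F distribution.

```python
import math
import statistics

def t_equal_variances(a, b):
    """Two-sample t statistic with a pooled variance; returns (t, df)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a)
              + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def t_unequal_variances(a, b):
    """Welch's t statistic; df from the Welch-Satterthwaite formula."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1
    v2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def t_paired(before, after):
    """Paired t statistic on the per-subject differences; returns (t, df)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def f_ratio(a, b):
    """Ratio of the two sample variances, as used by the F-test."""
    return statistics.variance(a) / statistics.variance(b)

control = [1, 2, 3, 4, 5]       # made-up data
experiment = [2, 3, 4, 5, 6]    # made-up data
print(t_equal_variances(control, experiment))
print(t_unequal_variances(control, experiment))
print(f_ratio(control, experiment))
print(t_paired([10, 12, 9, 11], [12, 15, 10, 14]))
```

The statistic is then compared with the critical value for the chosen alpha and the computed degrees of freedom, which is the step Excel's output tables carry out for you.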

Page 27: Lections №4 Using Microsoft Excel for the statistical calculations.

MS EXCEL Add-ins dialog box

Page 28: Lections №4 Using Microsoft Excel for the statistical calculations.

The Data Analysis ToolPak

Data Analysis dialog box

Page 29: Lections №4 Using Microsoft Excel for the statistical calculations.

3. Curve Fitting Using Excel

Understanding Curve Fitting. MS Excel trendline feature.

Page 30: Lections №4 Using Microsoft Excel for the statistical calculations.

3.1. Understanding Curve Fitting

Curve fitting is the process of trying to find the curve (which is represented by some model equation) that best represents the sample data, or more specifically the relationship between the independent and dependent variables in the dataset.

When the results of the curve fit are to be used for making new predictions of the dependent variable, this process is known as regression analysis.

Page 31: Lections №4 Using Microsoft Excel for the statistical calculations.

3.1. Understanding Curve Fitting

The Linear trendline uses the equation:

$y = k \cdot x + b$,

– where k and b are parameters to be determined during the curve-fitting process.

The Logarithmic trendline uses the equation:

$y = c \cdot \ln(x) + b$,

– where c and b are parameters to be determined during the curve-fitting process.
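As an illustration of how these parameters can be determined, the Python sketch below fits both models by ordinary least squares. This is the standard textbook method; the slides do not state how Excel computes its trendlines internally, and the function names are hypothetical. The logarithmic model is fitted as a straight line in ln(x), so it needs x > 0.

```python
import math

def fit_linear(xs, ys):
    """Least-squares fit of y = k*x + b; returns (k, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b

def fit_logarithmic(xs, ys):
    """Least-squares fit of y = c*ln(x) + b; returns (c, b). Needs x > 0."""
    return fit_linear([math.log(x) for x in xs], ys)

xs = [1, 2, 3, 4, 5]
line_ys = [2 * x + 1 for x in xs]           # exact line: k = 2, b = 1
log_ys = [3 * math.log(x) + 2 for x in xs]  # exact log curve: c = 3, b = 2

print(fit_linear(xs, line_ys))
print(fit_logarithmic(xs, log_ys))
```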

Page 32: Lections №4 Using Microsoft Excel for the statistical calculations.

3.1. Understanding Curve Fitting

The Power trendline uses the equation:

$y = c \cdot x^{b}$,

– where c and b are parameters to be determined during the curve-fitting process.

The Exponential trendline uses the equation:

$y = c \cdot e^{b \cdot x}$,

– where c and b are parameters to be determined during the curve-fitting process.
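Both of these models become straight lines after taking logarithms: ln(y) = ln(c) + b·ln(x) for the power model and ln(y) = ln(c) + b·x for the exponential model. The Python sketch below uses that transformation with ordinary least squares. It is a common approach, assumed here for illustration and not necessarily identical to Excel's internal computation; the function names are hypothetical, and all y values (and x values for the power model) must be positive.

```python
import math

def fit_linear(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_power(xs, ys):
    """Fit y = c * x**b via ln(y) = ln(c) + b*ln(x); returns (c, b)."""
    b, ln_c = fit_linear([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_c), b

def fit_exponential(xs, ys):
    """Fit y = c * exp(b*x) via ln(y) = ln(c) + b*x; returns (c, b)."""
    b, ln_c = fit_linear(xs, [math.log(y) for y in ys])
    return math.exp(ln_c), b

xs = [1, 2, 3, 4, 5]
power_ys = [2 * x ** 1.5 for x in xs]        # exact power curve: c = 2, b = 1.5
exp_ys = [3 * math.exp(0.5 * x) for x in xs] # exact exponential: c = 3, b = 0.5

print(fit_power(xs, power_ys))
print(fit_exponential(xs, exp_ys))
```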

Page 33: Lections №4 Using Microsoft Excel for the statistical calculations.

3.1. Understanding Curve Fitting

The Polynomial trendlines use the equation:

$y = b + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5 + c_6 x^6$

– where the c-coefficients and b are parameters of the curve fit. Excel supports polynomial fits up to sixth order.
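A polynomial fit can be sketched in Python by building the least-squares normal equations and solving them with Gaussian elimination. This is a textbook approach chosen to keep the example free of external libraries; it is not a description of Excel's internals, and the function names are hypothetical. The result is returned as the list [b, c1, c2, ...].

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("system is singular; use more distinct x values")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n + 1):
                a[row][j] -= factor * a[col][j]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(a[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (a[row][n] - tail) / a[row][row]
    return solution

def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit; returns [b, c1, ..., c_order]."""
    if not 1 <= order <= 6:
        raise ValueError("order must be between 1 and 6, as in Excel")
    if len(xs) <= order:
        raise ValueError("need more data points than the polynomial order")
    size = order + 1
    # Normal equations: sum(x^(i+j)) * coef_j = sum(y * x^i)
    matrix = [[sum(x ** (i + j) for x in xs) for j in range(size)]
              for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(matrix, rhs)

xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # exact quadratic: b=1, c1=2, c2=3
print(fit_polynomial(xs, ys, order=2))
```

For high orders and widely spread x values the normal equations become ill-conditioned, so a numerical library is the safer choice in real work.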

Page 34: Lections №4 Using Microsoft Excel for the statistical calculations.

3.2. MS Excel trendline feature

The five curve fits listed above are easily generated using the trendline feature built into Excel's XY scatter chart.

Once you've plotted your data using an XY scatter chart, you can generate a trendline that will be displayed on your chart, superimposed over your data.

You can also include the resulting equation for the best-fit line on your chart.

Page 35: Lections №4 Using Microsoft Excel for the statistical calculations.

3.2. MS Excel trendline feature

To use the trendline feature in an Excel chart:

Create a chart based on your data samples (an XY scatter or line chart type is recommended).

Right-click on the data series and select Add Trendline from the pop-up menu. The Add Trendline dialog box will be shown.

Select the Trend/Regression type that you need.

On the Options tab select "Display equation on chart" and "Display R-squared value on chart."

– The former will display the resulting best-fit equation on your chart

– The latter will also include the R-squared value, allowing you to assess the goodness of the fit.

Press OK to go back to your chart and see the resulting trendline.
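The R-squared value mentioned in the steps above can be reproduced by hand. The Python sketch below uses the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares, for a linear least-squares fit. The data and the function names are made up for the illustration.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = k*x + b; returns (k, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, mean_y - k * mean_x

def r_squared(ys, predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # made-up, noisy data

k, b = fit_linear(xs, ys)
predicted = [k * x + b for x in xs]
print(f"y = {k:.2f} * x + {b:.2f}, R^2 = {r_squared(ys, predicted):.2f}")
```

For this data the fitted line is y = 0.6x + 2.2 with R-squared 0.6; values close to 1 mean the trendline follows the data closely, values close to 0 mean it explains little of the variation.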

Page 36: Lections №4 Using Microsoft Excel for the statistical calculations.

3.2. MS Excel trendline feature

The Add Trendline dialog box

Page 37: Lections №4 Using Microsoft Excel for the statistical calculations.

3.2. MS Excel trendline feature

The Add Trendline Options tab

Page 38: Lections №4 Using Microsoft Excel for the statistical calculations.

Various trendlines

Page 39: Lections №4 Using Microsoft Excel for the statistical calculations.

Conclusion

This lecture covered the following questions:

Using Microsoft Excel for mathematical calculations.

Statistical calculations in Microsoft Excel.

Curve fitting using Excel.

Page 40: Lections №4 Using Microsoft Excel for the statistical calculations.

Literature

Electronic documentation on the intranet server:

http://miserver

http://10.21.0.49

