
Various topics Petter Mostad 2005.11.14. Overview Epidemiology Study types / data types Econometrics...

Date post: 02-Jan-2016
Page 1: Various topics Petter Mostad 2005.11.14. Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.

Various topics 

Petter Mostad

2005.11.14


Overview

• Epidemiology

• Study types / data types

• Econometrics

• Time series data

• More about sampling

– Estimation of required sample size


Epidemiology

• Epidemiology is the study of diseases in a population

– prevalence

– incidence, mortality

– survival

• Goals

– describe occurrence and distribution

– search for causes

– determine effects in experiments


Some study types

• Observational studies

– Cross-sectional studies

– Cohort studies

– Longitudinal studies

– Case / control studies

• Experimental studies

– Randomized, controlled experiments

– Interventions


Cross-sectional studies

• Examines a sample of persons, at a single timepoint

• Time effects rely on memory of respondents

• Good for estimating prevalence

• Difficult for rare diseases

• Response rate bias


Cohort studies and longitudinal studies

• A sample (cohort) is followed over some time period.

• If queried at specific timepoints: Longitudinal study

• Gives better information about causal effects, as report of events is not based on memory

• Requires that a substantial group develops the disease, and that substantial groups differ with respect to risk factors

• Problem: Long time perspective


Case – control studies

• Starts with a set of sick individuals (cases), and adds a set of controls, for comparison.

• Cases and controls should be from the same population

• Matching controls

• Good method for rare diseases

• Problem: Bias from selection


Measures of risk

• Relative risk

• Odds ratio

• Incidence rate ratio

• Attributable risk
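The measures listed above can be illustrated with a small sketch. It assumes a 2x2 table with hypothetical cell names `a`, `b`, `c`, `d` (exposed/unexposed by diseased/healthy) and takes attributable risk to mean the risk difference between exposed and unexposed; some texts use related but different definitions, so treat this as one common convention rather than the lecture's own formulas.

```python
def risk_measures(a, b, c, d):
    """Risk measures from a 2x2 table (cell names are illustrative).

    a: exposed with disease,    b: exposed without disease
    c: unexposed with disease,  d: unexposed without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        # Assumption: attributable risk taken as the risk difference.
        "attributable_risk": risk_exposed - risk_unexposed,
    }


def incidence_rate_ratio(events_exposed, time_exposed,
                         events_unexposed, time_unexposed):
    """Ratio of incidence rates, each measured as events per unit of person-time."""
    return (events_exposed / time_exposed) / (events_unexposed / time_unexposed)


# Made-up counts: 30 of 100 exposed and 10 of 100 unexposed develop the disease.
measures = risk_measures(30, 70, 10, 90)
print(measures)
```

Note that the odds ratio is the natural measure in a case / control study, where the risks themselves cannot be estimated because the number of cases is fixed by design.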


Econometrics

• ”Econometrics is the field of economics that concerns itself with the application of mathematical statistics and the tools of statistical inference to the empirical measurement of relationships postulated by economic theory”

• Is the unification of

– economic statistics

– quantitative economic theory

– mathematical economics


About econometrics

• Variations and extensions of the regression model

– heteroscedasticity

– autocorrelation models

– panel data

– logistic regression

– non-linear regression models

– multivariate regression

• Matrix computation (linear algebra) is an almost indispensable tool

• Time series data

• Simultaneous equations models


Heteroscedasticity

• Recall: When the variances of independent errors in the model vary, the model is heteroscedastic.

• Example: In a regression model of house size against income, the variance of house sizes might increase with income

• In case of heteroscedasticity, ordinary regression models are not optimal.

• Previously, we mentioned variable transformation as a possible solution

• Much more advanced solutions exist, when the heteroscedasticity is known or can be estimated: Generalized least squares,…
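When the error variances are known up to a constant, the simplest member of the generalized least squares family is weighted least squares, where each observation is weighted by the inverse of its error variance. The sketch below shows this for a single predictor. The function name, the data, and the assumption that the variance grows with the square of income are all illustrative, not taken from the lecture.

```python
def weighted_least_squares(x, y, variances):
    """Fit y = a + b*x, weighting each observation by 1 / (its error variance)."""
    w = [1.0 / v for v in variances]
    total = sum(w)
    # Weighted means of predictor and response.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted cross-product and sum of squares around the weighted means.
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up data: house size against income, with error variance
# assumed proportional to income squared.
income = [20, 40, 60, 80, 100]
size = [10 + 1.5 * v for v in income]
a, b = weighted_least_squares(income, size, [v ** 2 for v in income])
print(a, b)
```

With equal variances the weights cancel and the result is the ordinary least squares fit.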


Autocorrelations

• Recall: When for example the data is from a time series, the random errors for adjacent time steps might be correlated!

• Improvements in model might reduce problem

• Standard regression methods are not optimal

• Modelling and estimating the autoregression gives improved results


Panel data

• Data collected for the same sample, at repeated time points

• Corresponds to longitudinal epidemiological studies

• A combination of cross-sectional data and time series data

• Increasingly popular study type


Analyzing panel data

• Fixed effects: Standard regression, but using a constant term differing for each individual

– We get a parameter for each person!

• Random effects: A stochastic variable models variation connected to individual

– The individual variation is assumed drawn from a distribution with fixed variance

– A generalization of least squares is needed for computations
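As a rough illustration of the fixed effects approach with a single predictor: instead of fitting one constant per individual explicitly, the sketch below subtracts each individual's mean from their observations (the "within" transformation), which gives the same slope estimate, and then recovers the per-individual constants. Function and variable names are hypothetical, and the data are made up.

```python
from collections import defaultdict


def fixed_effects_fit(ids, x, y):
    """Fit y = alpha_i + beta * x with one constant alpha_i per individual."""
    groups = defaultdict(list)
    for i, xi, yi in zip(ids, x, y):
        groups[i].append((xi, yi))

    sxy = sxx = 0.0
    means = {}
    for i, obs in groups.items():
        x_mean = sum(xi for xi, _ in obs) / len(obs)
        y_mean = sum(yi for _, yi in obs) / len(obs)
        means[i] = (x_mean, y_mean)
        # Only the variation within each individual contributes to the slope.
        sxy += sum((xi - x_mean) * (yi - y_mean) for xi, yi in obs)
        sxx += sum((xi - x_mean) ** 2 for xi, _ in obs)

    beta = sxy / sxx
    alphas = {i: y_mean - beta * x_mean for i, (x_mean, y_mean) in means.items()}
    return beta, alphas


# Two individuals observed three times each, same slope, different constants.
ids = ["a", "a", "a", "b", "b", "b"]
x = [1, 2, 3, 2, 4, 6]
y = [7, 9, 11, 1, 5, 9]
beta, alphas = fixed_effects_fit(ids, x, y)
print(beta, alphas)
```

The returned dictionary makes the slide's point concrete: there is one estimated parameter per person, so the number of parameters grows with the sample.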


Analyzing panel data

• Heteroscedasticity might also be a problem here

• Autocorrelations

• Dynamic models: Lagged variables


Logistic regression

• What if the dependent variable is an indicator variable?

• The model then has two stages: First, we predict a value $z_i$ from predictors as before, then the probability of indicator value 1 is given by

$$\frac{e^{z_i}}{1 + e^{z_i}}$$

• Given data, we can estimate coefficients in a similar way as before
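The two stages of the logistic model can be sketched as follows: a linear predictor z is computed from the predictors, and the probability of indicator value 1 is e^z / (1 + e^z). The function names and coefficient values are illustrative assumptions; estimating the coefficients from data is not shown.

```python
import math


def logistic(z):
    """Return e^z / (1 + e^z), evaluated in a way that avoids overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(intercept, coefficients, predictors):
    """Stage 1: linear predictor z. Stage 2: map z to a probability."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return logistic(z)


# Hypothetical coefficients and predictor values.
print(predict_probability(-1.0, [0.5, 0.25], [2.0, 4.0]))
```

Whatever the value of z, the result lies between 0 and 1, which is why this transformation is suited to indicator variables.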


Non-linear regression models

• Ordinary regression is very useful, but it is limited by the linear form of the equations

• Sometimes, variable transformations can bring the connection between variables to a linear form

• Other times, this is not possible: The relationship describes the dependent variable as some function of independent variables and some random error.

• The model may still be estimated by minimizing the errors, for example the sum of squared errors. This is non-linear regression.


Multivariate regression

• Instead of one dependent variable, one can have a vector of dependent variables

• A theory of multivariate multiple regression can be developed (with the help of matrix algebra): many results are similar to those for ordinary multiple regression

• Captures the dependencies between dependent variables


Simultaneous equations models

• Often, you want to describe interdependencies between variables, rather than explaining one variable in terms of others

• Example:

– Demand is a function of various variables, including price

– The same is the case with supply

– Setting demand = supply creates simultaneous equations

• Identifiability?

• Estimation: Least squares is not optimal; other methods exist
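To see why least squares runs into trouble in the demand and supply example, consider a stripped-down linear sketch in which price is the only explanatory variable. The symbols $q$ (quantity), $p$ (price), the coefficients, and the error terms $u$, $v$ are illustrative assumptions, not notation from the lecture.

```latex
\begin{aligned}
\text{Demand:}\quad & q = \alpha_0 + \alpha_1 p + u \\
\text{Supply:}\quad & q = \beta_0 + \beta_1 p + v \\
\text{Demand} = \text{supply:}\quad
  & \alpha_0 + \alpha_1 p + u = \beta_0 + \beta_1 p + v \\
\Rightarrow\quad
  & p = \frac{\beta_0 - \alpha_0 + v - u}{\alpha_1 - \beta_1}
\end{aligned}
```

The equilibrium price depends on both error terms, so in either equation the predictor $p$ is correlated with that equation's error, which violates a basic assumption of ordinary regression. With only price and quantity observed, the data also cannot distinguish the demand line from the supply line, which is the identifiability question raised above.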


Time series models

• Time series issues:

– Identifying trends, cycles, etc.

– Predicting future values

• Autoregressive models:

– Explicit models for time dependencies:

AR(1): $X_t = \phi_1 X_{t-1} + \varepsilon_t$ (with $\mathrm{Corr}(X_t, X_{t-j}) = \phi_1^j$)

AR(2): $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t$

• (Box-Jenkins, ARIMA models)
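A small simulation can make the AR(1) model concrete. The sketch below assumes the form X_t = phi * X_{t-1} + e_t with independent standard normal errors e_t (the names `phi` and `e_t` are chosen here for illustration), and compares the sample autocorrelation at lag j with the theoretical value phi^j.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate n values of X_t = phi * X_{t-1} + e_t, starting from X_0 = 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def sample_autocorrelation(x, lag):
    """Sample correlation between the series and itself shifted by `lag` steps."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return numer / denom


series = simulate_ar1(0.6, 20000)
for j in (1, 2, 3):
    print(j, round(sample_autocorrelation(series, j), 3), round(0.6 ** j, 3))
```

For a long series the two columns should be close, with the correlation dying off geometrically as the lag increases.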


The runs test (for random samples)

• In a random sample, the probability that an observation is above or below the median is independent of whether the previous observation is.

• A run is a (maximal) sequence of observations such that all are above the median, or all are below.

• For n observations, the number of runs has a null distribution under the assumption of no autocorrelation. With too few runs, the null hypothesis of no autocorrelation can be rejected. (Table in Newbold).

• For large samples, a formula based on a normal approximation can be used.
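The runs test described above can be sketched as follows. This version uses the general normal approximation for the number of runs with n1 observations above and n2 below the median; dropping observations exactly equal to the median is a convention assumed here, and the function names are illustrative. For small samples the exact table should be used instead.

```python
import math
from statistics import NormalDist, median


def count_runs(values):
    """Return (number of runs, count above median, count below median)."""
    med = median(values)
    # Assumption: observations equal to the median are dropped.
    above = [v > med for v in values if v != med]
    if not above:
        raise ValueError("no observations differ from the median")
    runs = 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)
    n1 = sum(above)
    return runs, n1, len(above) - n1


def runs_test(values):
    """Return (z, p) where p is the one-sided p-value for too few runs."""
    runs, n1, n2 = count_runs(values)
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("too few observations on one side of the median")
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(variance)
    return z, NormalDist().cdf(z)


# A steadily increasing series has only two runs: all below, then all above.
z, p = runs_test([1, 2, 3, 4, 5, 6, 7, 8])
print(z, p)
```

A strongly negative z (few runs) points towards positive autocorrelation, as in the trending example.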


Sampling in practice

• Newbold mentions:

1. Information required?

2. Relevant population?

3. Sample selection?

4. Obtaining information?

5. Inferences from sample?

6. Conclusions?

• Sampling / nonsampling errors


Types of sampling

• Simple random sampling

• Stratified sampling

• Cluster sampling

• Two-phase sampling (using pilot studies)

• Each requires somewhat adjusted formulas for estimation


Correcting for finite population in estimations

• Our estimates of, for example, population variances, population proportions, etc. assumed an ”infinite” population

• When the population size N is comparable to the sample size n, a correction factor is necessary. (Why?)

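• Examples:

– Variance of population mean estimate: $\hat{\sigma}_{\bar{X}}^2 = \frac{s^2}{n} \cdot \frac{N - n}{N}$

– Variance of population proportion estimate: $\hat{\sigma}_{\hat{p}}^2 = \frac{\hat{p}(1 - \hat{p})}{n - 1} \cdot \frac{N - n}{N}$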
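The finite population correction multiplies the usual variance estimate by (N - n) / N. A minimal sketch, with illustrative function names and made-up numbers, for the mean (sample variance s^2 divided by n) and for a proportion (p(1 - p) divided by n - 1):

```python
def variance_of_mean_estimate(s2, n, N):
    """Estimated variance of the sample mean, corrected for a population of size N."""
    return (s2 / n) * (N - n) / N


def variance_of_proportion_estimate(p_hat, n, N):
    """Estimated variance of the sample proportion, corrected for a population of size N."""
    return (p_hat * (1 - p_hat) / (n - 1)) * (N - n) / N


# Sampling 100 out of 1000 shrinks the variance of the mean by 10 percent.
print(variance_of_mean_estimate(25.0, 100, 1000))
# Sampling the whole population leaves no uncertainty at all.
print(variance_of_mean_estimate(25.0, 1000, 1000))
```

The second call hints at the answer to the "Why?" above: when n approaches N, almost the whole population has been observed, so the estimate can hardly differ from the true value.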


Estimation of required sample size

• An important part of experimental planning

• The answer will generally depend on the parameters you want to estimate in the first place, so only a rough estimate is possible

• However, making a rough estimate may sometimes be very important

• A pilot study may be very helpful


Example: Estimating the mean of a normally distributed population

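• We want to estimate mean $\mu$

• We want a confidence interval to extend a distance $a$ from the estimate

• We guess at the population variance $\sigma^2$

• A sample size estimate:

$$n = \frac{Z_{\alpha/2}^2 \sigma^2}{a^2} \approx \frac{4\sigma^2}{a^2} \quad \text{at 95\% confidence}$$

• If we have a population of size $N$, and want a specified $\sigma_{\bar{X}}^2$, we get

$$n = \frac{N \sigma^2}{(N - 1)\sigma_{\bar{X}}^2 + \sigma^2}$$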
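The two sample size estimates can be computed as in the sketch below. The first uses n = Z^2 * sigma^2 / a^2 for a confidence interval extending a distance a from the estimate; the second uses n = N * sigma^2 / ((N - 1) * v + sigma^2) for a population of size N and a specified variance v of the mean estimate. Function names and the example numbers are illustrative, and results are rounded up to whole observations.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, a, confidence=0.95):
    """Sample size so the confidence interval extends a distance a from the estimate."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * sigma ** 2 / a ** 2)


def sample_size_finite_population(sigma, target_variance_of_mean, N):
    """Sample size giving the specified variance of the mean in a population of size N."""
    return math.ceil(
        N * sigma ** 2 / ((N - 1) * target_variance_of_mean + sigma ** 2)
    )


# Guessed standard deviation 10, desired half-width 2, 95% confidence.
print(sample_size_for_mean(10, 2))
# Same guess, population of 1000, target variance 1 for the mean estimate.
print(sample_size_finite_population(10, 1.0, 1000))
```

With Z rounded to 2, the first formula gives the rule of thumb 4 * sigma^2 / a^2, here 100, slightly above the exact value.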

