
CCPR Computing Services More Efficient Programming Courtney Engel October 12, 2007.

Page 1

CCPR Computing Services
More Efficient Programming

Courtney Engel
October 12, 2007

Page 2

Outline
- Overview of programming
- Thinking through a programming task
- Ways of efficiently documenting and organizing your project
  - Naming variables, programs, files
  - Commenting code
  - Including a file header
  - Implementing a directory structure
- Programming constructs
- Examples
- Raw data -> finished product: Replicable?

Page 3

Overview
- A “recipe” to complete a given task
- Commands that tell your computer what to do
- Language standards determine the correct commands
- Basic programming allows you to:
  - Read, write, and reformat data files
  - Perform data calculations
  - Have the computer complete mundane tasks and minimize human error

Page 4

Before you start coding…
- Conceptualize
  - Clearly define the problem in writing
  - Write down the solution/algorithm in English
  - Modularity: break the solution into separate sections
  - Create a test (if reasonable)
- Then work one section at a time:
  - Translate one section to code
  - Test the section thoroughly
  - Translate/test the next section, etc.
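A minimal sketch of the translate-then-test cycle in Stata. The task (computing age at a survey year), the program name `makeAge`, the variable `birthyr`, and the year 2002 are all hypothetical; the point is that one section is written as its own small piece and checked with `assert` on a tiny dataset whose answers are already known, before the next section is built on it.

```stata
* Section 1 (hypothetical): compute age at a given survey year
capture program drop makeAge
program define makeAge
    * Usage: makeAge surveyyear
    args surveyyear
    generate age = `surveyyear' - birthyr
end

* Test section 1 on a tiny made-up dataset with known answers
clear
input birthyr
1950
1972
.
end

makeAge 2002
assert age == 52 in 1
assert age == 30 in 2
assert missing(age) in 3
display as result "Section 1 passed; move on to section 2."
```

If any `assert` is false, Stata stops the do-file with an error, so a broken section is caught before anything else depends on it.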

Page 5

Documentation - File Header

    * Josie Bruin ([email protected])
    * HRS project
    * /u/socio/jbruin/HRS/
    * October 5, 2007
    * Stata version 8
    * Purpose: Create and merge two datasets in Stata,
    *          then convert data to SAS
    * Input programs:
    *     HRS/staprog/H2002.do,
    *     HRS/staprog/x2002.do,
    *     HRS/staprog/mergeFiles.do
    * Output:
    *     HRS/stalog/H2002.log,
    *     HRS/stalog/x2002.log,
    *     HRS/stalog/mergeFiles.log
    *     HRS/stadata/Hx2002.dta
    *     HRS/sasdata/Hx2002.sas
    * Special instructions: Check log files for errors;
    *     check for duplicates upon new data release

File header includes: name (email), project, project location, date, software version, purpose of program, inputs, outputs, and special instructions.
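Because the header is only a comment, Stata never checks it. One option, sketched here with placeholder field values to fill in, is to follow the header with a `version` statement, so the commands below are interpreted under the Stata version the header names, and to stamp the run date into the log.

```stata
*----------------------------------------------------------------
* Name (email):         <your name and email>
* Project:              <project name and location>
* Date:                 <date written>
* Software version:     Stata 8
* Purpose:              <what this program does>
* Inputs:               <programs and data read>
* Outputs:              <logs and data written>
* Special instructions: <anything the next user must know>
*----------------------------------------------------------------
version 8    // interpret the commands below using Stata 8 syntax rules
display "Run on `c(current_date)' at `c(current_time)' (Stata `c(stata_version)')"
```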

Page 6

Naming Files, Variables, and Functions
- Use the language standard (if it exists)
- Be aware of language-specific rules: maximum length, underscores, case, reserved words
- Meaningful variable names:
  - LogWt vs. var1
  - AgeLt30 vs. x
- Procedure that cleans missing values of Age: fixMissingAge
- Matrix multiplication X transpose times X: matXX
- Differentiating log files, so the SAS and Stata versions of the same program do not write logs with the same name:
  - Programs: MergeHH.sas, MergeHH.do
  - Log files: MergeHHsas.log, MergeHHsta.log
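To make the naming advice concrete, here is one possible body for the `fixMissingAge` procedure named above. It assumes, hypothetically, that the raw data code an unknown age as -9 or 999; check your own codebook for the real codes.

```stata
capture program drop fixMissingAge
program define fixMissingAge
    * Recode the (assumed) sentinel values -9 and 999 in age to Stata missing
    replace age = . if inlist(age, -9, 999)
end

* Try it on a tiny made-up dataset
clear
input age
34
-9
999
61
end

fixMissingAge
list age
```

A reader who meets `fixMissingAge` in a do-file can guess what it does without opening it; a name like `function1` gives no such hint.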

Page 7

Commenting Code

- Good code is self-commenting: naming conventions, structure/formatting, and the file header should explain 95% of it
- Comments should explain:
  - The purpose of the code, not every detail
  - Tricks used
  - Reasons for unusual coding
- Comments do not fix sloppy code, and they should not just translate the syntax
- If it takes longer to read the comment than to read the code, don’t add a comment!

Page 8

Commenting Code - Stata example

SAMPLE 2

    * Convert names in dataset to lowercase.
    program def lowerVarNames
        foreach v of varlist _all {
            local LowName = lower("`v'")
            if `"`v'"' != `"`LowName'"' {
                rename `v' `=lower("`v'")'
            }
        }
    end

SAMPLE 1

    program def function1
    foreach v of varlist _all {
    local x = lower("`v'")
    if `"`v'"' != `"`x'"' {
    rename `v' `=lower("`v'")'
    }
    }
    end

Compare the formatting, comments, variable names, and function names of the two samples.
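Both samples were written for the Stata of 2007. In Stata 12 and later, the group form of `rename` lowercases every variable name in a single line, so a modern replacement for `lowerVarNames` can be this short (the example variables are made up):

```stata
clear
set obs 1
generate AgeLt30 = 1
generate LogWt = 2.5

* Stata 12+: rename all variables to lowercase
rename *, lower
describe
```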

Page 9

Directory Structure

A project consists of many different types of files

Use folders to separate files in a logical way

Be consistent across projects if possible

ATTIC folder for older versions

    HOME
      PROJECT NAME
        DATA
        RESULTS
        LOG
        PROGRAMS
        ATTIC
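One way to set up the tree above from Stata itself, as a sketch: it assumes the current working directory is your HOME folder, uses a placeholder project name `myproject`, and uses lowercase folder names (a choice, not a requirement). `capture` keeps `mkdir` from stopping the do-file when a folder already exists, so the script can be rerun safely.

```stata
local project "myproject"    // placeholder project name
capture mkdir "`project'"
foreach sub in data results log programs attic {
    capture mkdir "`project'/`sub'"
}
dir "`project'"
```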

Page 10

Stata example: using directory structure

    ** Paths:
    global parentpath "C:\Documents and Settings\jbruin\Fall07\prog\progtips"
    global pgmsloc "$parentpath\pgms"
    global logsloc "$parentpath\logs"
    global cleandataloc "$parentpath\data\clean"
    global rawdataloc "$parentpath\data\raw"

    log using "$logsloc\test200710", text replace
    *******************************************************************
    * INSERT FILE HEADER HERE... then it's included in the log file.
    *******************************************************************
    macro list

    webuse union, clear
    save "$rawdataloc\union.dta", replace

    keep idcode year age grade
    save "$cleandataloc\unionLJP.dta", replace

    log close

Page 11

Programming Constructs

- Tools to simplify and clarify your coding
- Available in virtually all languages
- Constructs:
  - Loops: for, foreach, do, while
  - If/elseif/else: if, then, else, case
  - continue
  - exit

Page 12

Loop Construct

The syntax for foreach is

    foreach lname {in|of listtype} list {
        Stata commands referring to `lname'
    }

where lname is the name of the new local macro and listtype is the type of list on which you want to operate (local, global, varlist, newlist, or numlist).
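A short sketch of the most common forms, one loop per list type. The word lists and the `fruits` macro are made up; `price` and `mpg` come from the `auto` dataset that ships with Stata.

```stata
* "in": loop over an arbitrary list of words, used as-is
foreach w in alpha beta gamma {
    display "`w'"
}

* "of numlist": the list is expanded as a numlist (1/3 10 gives 1 2 3 10)
foreach n of numlist 1/3 10 {
    display `n'
}

* "of local": the list is the contents of a local macro (note: no quotes around the name)
local fruits "apple pear"
foreach f of local fruits {
    display "`f'"
}

* "of varlist": the list must be existing variables (abbreviations and wildcards allowed)
sysuse auto, clear
foreach v of varlist price mpg {
    summarize `v', meanonly
    display "`v' mean = " r(mean)
}
```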

Page 13

Loop Example 1: pulling from 2 lists (from the Stata FAQ website)

Code:

    local animalgrp "cat dog cow pig"
    local noisegrp "meow woof moo oinkoink"
    local n : word count `animalgrp'
    forvalues i = 1/`n' {
        local animal : word `i' of `animalgrp'
        local noise : word `i' of `noisegrp'
        display "`animal' says `noise'"
    }

Resulting output:

    cat says meow
    dog says woof
    cow says moo
    pig says oinkoink

Page 14

Loop Example 2

Given indicator variables white, black, and other, and the continuous variable EducYrs, create interaction variables.

Solution using a loop:

    local allraces "white black other"
    foreach race of varlist `allraces' {
        generate `race'_educ = `race'*EducYrs
    }

    Obs  white  black  other  EducYrs  white_educ  black_educ  other_educ
    1    1      0      0      10       10          0           0
    2    0      1      0      15       0           15          0
    3    0      0      1      20       0           0           20
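Following the "create a test" advice, this sketch rebuilds the three example observations from the table and runs the same loop, so the generated variables can be checked against the table rather than eyeballed. The data are the table's; nothing else is assumed.

```stata
clear
input white black other EducYrs
1 0 0 10
0 1 0 15
0 0 1 20
end

local allraces "white black other"
foreach race of varlist `allraces' {
    generate `race'_educ = `race'*EducYrs
}

* Check every observation against the definition, then look at the records
assert white_educ == white*EducYrs
list
```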

Page 15

Loop Example 3

Problem:
- Dataset contains variables over multiple years (1970-1990)
- Need to perform a number of commands separately for 1970, 1975, 1980, and 1985

Solution without loop:

    bysort year: command1 if year==70 | year==75 | year==80 | year==85
    bysort year: command2 if year==70 | year==75 | year==80 | year==85

Solution with loop:

    foreach year in 70 75 80 85 {
        display as result "***Regression for year = `year':"
        regress ln_wage grade tenure ttl_exp if year==`year'
        display as result "***Summarize for year = `year':"
        summarize ln_wage if year==`year'
    }
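The variables `ln_wage`, `grade`, `tenure`, and `ttl_exp` appear to come from the NLS work panel that Stata distributes for its examples (an assumption; `webuse` needs an internet connection). This sketch runs on that dataset and shows two shortcuts: `inlist()` replaces the chain of `|` tests, and the numlist `70(5)85` generates the four years.

```stata
webuse nlswork, clear

* Without a loop: inlist() replaces year==70 | year==75 | year==80 | year==85
bysort year: summarize ln_wage if inlist(year, 70, 75, 80, 85)

* With a loop: 70(5)85 expands to 70 75 80 85
forvalues y = 70(5)85 {
    display as result "***Regression for year = `y':"
    regress ln_wage grade tenure ttl_exp if year == `y'
}
```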

Page 16

Constructs - If/then/else

Execute a section of code if a condition is true:

    if condition then
        {execute this code if condition true}
    end

Execute one of two sections of code:

    if condition then
        {execute this code if condition true}
    else
        {execute this code if condition false}
    end

Page 17

If/Else Example

Problem: need to execute commands on an operating system, but only if the OS is Unix; the commands will fail if the OS is anything else.

Solution:

    if "`c(os)'" ~= "Unix" {
        display as err "Sorry; this section requires Unix OS."
    }
    else {
        ** continue with Unix commands...
    }

Page 18

Constructs - Elseif/case

Elseif: execute one of many sections of code:

    if condition1 then
        {execute this code if condition1 true}
    elseif condition2 then
        {execute this code if condition2 true}
    else
        {execute this code if condition1 and condition2 are both false}
    end

Case: same idea, different name:

    case condition1 then
        {execute this code if condition1 true}
    case condition2 then
        {execute this code if condition2 true}
    etc.
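Stata has no `case` statement. When the goal is to create a variable, one way to get the same one-of-many selection is a chain of `replace ... if` lines, or nested `cond()` calls, where `cond(test, a, b)` returns `a` if the test is true and `b` otherwise. A sketch with a made-up region code (the codes and labels are assumptions):

```stata
clear
input region
1
2
3
9
end

* Chain of replace ... if: each line handles one "case"
generate str10 regname = ""
replace regname = "Northeast" if region == 1
replace regname = "Midwest"   if region == 2
replace regname = "South"     if region == 3
replace regname = "Unknown"   if regname == ""

* Same selection with nested cond() (the /// continuation works in do-files only)
generate str10 regname2 = cond(region == 1, "Northeast", ///
    cond(region == 2, "Midwest", ///
    cond(region == 3, "South", "Unknown")))
assert regname == regname2
```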

Page 19

Elseif Example

Problem: continue the example from if/else, but execute a different section of code for Unix, Windows, and Mac.

Solution:

    if "`c(os)'" == "Unix" {
        display "This is a Unix environment"
    }
    else if "`c(os)'" == "Windows" {
        display "This is a Windows environment"
    }
    else if "`c(os)'" == "MacOSX" {
        display "This is a MacOSX environment"
    }
    else {
        display as err "`c(os)' not recognized."
    }

Page 20

Example

Problem: given 4 indicator variables (south, union, black, not_smsa) and 2 discrete variables (age, grade), generate 8 new indicator variables:
- south_age21 = south and age > 21
- south_gr12 = south and grade > 12
- similarly for union, black, and not_smsa

Solution without loop: 8 lines of code similar to:

    generate south_age21 = (south==1 & age>21 & age<.)
    generate south_gr12 = (south==1 & grade>12 & grade<.)

Solution with loop:

    foreach j in south union black not_smsa {
        generate `j'_age21 = (age>21 & age<. & `j'==1)
        generate `j'_gr12 = (grade>12 & grade<. & `j'==1)
    }

Page 21

Example, cont.

    * CHECK GENERATED VARIABLES AGAINST ORIGINAL VARIABLES
    foreach j in south union black not_smsa {
        quietly count if `j'==1 & age>21 & age<.
        local origCount = r(N)
        quietly count if `j'_age21==1
        if `origCount' ~= `r(N)' {
            display "Counts do not match for `j'_age21!"
        }
        else display "Counts match for `j'_age21."

        quietly count if `j'==1 & grade>12 & grade<.
        local origCount = r(N)
        quietly count if `j'_gr12==1
        if `origCount' ~= `r(N)' {
            display "Counts do not match for `j'_gr12!"
        }
        else display "Counts match for `j'_gr12."
    }

    Obs    south  age  grade  south_age21  south_gr12
    1      1      10   5      0            0
    2      1      35   16     1            1
    3      0      14   9      0            0
    4      0      39   20     0            0
    5      1      56   n/a    1            0
    6      1      20   13     0            1
    7      0      38   11     0            0
    Total  4                  2            2
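This sketch rebuilds the seven observations from the table (only `south` appears there, so the loop lists just that variable) and checks the totals. Observation 5 shows why the `grade<.` guard matters: Stata treats a missing value as larger than any number, so without the guard a missing grade would count as greater than 12.

```stata
clear
input south age grade
1 10  5
1 35 16
0 14  9
0 39 20
1 56  .
1 20 13
0 38 11
end

* Same loop as the slide, restricted to the one indicator in the table
foreach j in south {
    generate `j'_age21 = (age>21 & age<. & `j'==1)
    generate `j'_gr12  = (grade>12 & grade<. & `j'==1)
}

* Without the guard, the missing grade in observation 5 is counted as > 12
count if south==1 & grade>12
count if south_gr12==1
```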

Page 22

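Stata: if qualifier vs. if command

The if command (help ifcmd) was designed to be used with a single expression, not to test each observation: when the expression names a variable, Stata uses that variable's value in the first observation.

Example: given variable x with 5 observations (1, 1, 2, 1, 3), compare the following three pieces of Stata code:

    if x==2 {
        replace x=99
    }

    if x==1 {
        replace x=99
    }

    replace x=99 if x==1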

Page 23

Stata: if qualifier vs. if command

    list x
        x: 1, 1, 2, 1, 3

    if x==2 {
        replace x=99
    }
    list x
        x: 1, 1, 2, 1, 3   (no change: x[1] is 1, so x==2 is false)

    if x==1 {
        replace x=99
    }
    (5 real changes made)
    list x
        x: 99, 99, 99, 99, 99   (x[1] is 1, so the replace runs on every observation)

    replace x=99 if x==1   (starting again from the original x)
    (3 real changes made)
    list x
        x: 99, 99, 2, 99, 3   (the if qualifier is checked observation by observation)
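A self-contained version of the comparison above, with `assert` checks in place of eyeballing the listings. `preserve` and `restore` supply the "start again from the original x" step before the qualifier example.

```stata
clear
input x
1
1
2
1
3
end
preserve

* if command: x means x[1] here, which is 1, so this block is skipped
if x == 2 {
    replace x = 99
}
assert x[3] == 2

* if command: x[1] == 1 is true, so EVERY observation is replaced
if x == 1 {
    replace x = 99
}
assert x == 99

* if qualifier: evaluated observation by observation, on the original data
restore
replace x = 99 if x == 1
list x
```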

Page 24

Constructs: continue (example from Stata online help)

continue is used to exit the current iteration of a loop and continue with the next iteration. The following two loops produce the same result:

    forvalues x = 1/10 {
        if mod(`x',2)==1 {
            display "`x' is odd"
            continue
        }
        display "`x' is even"
    }

    forvalues x = 1/10 {
        if mod(`x',2)==1 {
            display "`x' is odd"
        }
        else {
            display "`x' is even"
        }
    }

Note: mod(a,b) is the remainder of a divided by b. For example, 10 / 3 = 3 remainder 1 (10 - 9 = 1), so mod(10,3) = 1.
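Stata's `continue` also takes a `break` option, which leaves the loop entirely instead of skipping to the next iteration. A small sketch (the search for the first multiple of 7 is just an illustration):

```stata
local found = .
forvalues x = 1/100 {
    if mod(`x', 7) == 0 {
        local found = `x'
        continue, break    // stop looping once the first match is found
    }
}
display "First multiple of 7 between 1 and 100: `found'"
```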

Page 25

Constructs: Exit

Stop execution of the program. In this example, only "hello" is displayed:

    display "hello"
    exit
    display "goodbye"

Examples:
- A do-file contains a number of data checks followed by analysis commands. If the data checks reveal something unacceptable, you can exit the do-file before running the analysis.
- A program requires user input. If the user enters “bad” information, the program needs to quit.
- Debugging: if a particular error occurs, break. For example, check the denominator before dividing; if it equals zero, exit.
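A sketch of the first and third examples. The age range 0-110, the program name `safeRatio`, and the return code 498 are illustrative choices. `exit 9` reuses the code a failed `assert` produces. Inside a program, `exit` with a nonzero code ends the program with that error, which `capture` can then catch.

```stata
* 1) Data check before analysis: stop if any nonmissing age is outside 0-110
clear
input age
25
40
.
end
capture assert inrange(age, 0, 110) if !missing(age)
if _rc {
    display as error "Data check failed: age outside 0-110; stopping before analysis."
    exit 9
}
display "Data checks passed; continuing to the analysis."

* 2) Check the denominator before dividing (hypothetical helper program)
capture program drop safeRatio
program define safeRatio
    args num den
    if `den' == 0 {
        display as error "Denominator is zero; exiting."
        exit 498    // example error code for a user-defined failure
    }
    display `num'/`den'
end

safeRatio 10 4
capture noisily safeRatio 10 0
display "Return code from the zero-denominator call: " _rc
```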

Page 26

Raw data to finished product

Raw data -> Analysis data -> Runs/results -> Finished product

Page 27

Raw Data -> Analysis Data

- Always keep two distinct data files: the raw data and the analysis data
- A program should completely re-create the analysis data from the raw data
- NO interactive changes!! All final changes must go in a program!!
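A skeleton of such a program, under stated assumptions: Stata's bundled `auto` data stand in for the raw file, and the do-file name `makeAnalysisData.do` and output file `auto_analysis.dta` are hypothetical. The raw file is only read, never overwritten, and every change is made in code.

```stata
* makeAnalysisData.do (hypothetical name): rebuild the analysis file from scratch
sysuse auto, clear    // stand-in for reading your raw data file

* All changes are made here, never by hand in the Data Editor
keep make price mpg weight foreign
generate priceK = price/1000
label variable priceK "Price (thousands of dollars)"

save "auto_analysis.dta", replace
```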

Page 28

Raw Data -> Analysis Data

Document all of the following:
- Outliers?
- Errors?
- Missing data?
- Changes to the data?

Remember to check:
- Consistency across variables
- Duplicates
- Individual records, not just summary stats
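A sketch of these checks in Stata on made-up data; the variables `id`, `age`, and `yrsmarried` and the rule "years married cannot exceed age" are hypothetical. `duplicates`, `isid`, `assert`, and `misstable` are standard Stata commands (`misstable` needs Stata 11 or later).

```stata
clear
input id age yrsmarried
1 30  5
2 45 20
2 45 20
3 25  .
end

* Duplicates: report them, drop exact copies, then confirm id is a unique key
duplicates report
duplicates drop
isid id

* Consistency across variables: years married cannot exceed age
assert yrsmarried <= age if !missing(yrsmarried)

* Missing data: which variables have missing values, and how many
misstable summarize

* Individual records, not just summary stats
list
```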

Page 29

Analysis Data -> Results

- All results should be produced by a program
- The program should use the analysis data (not the raw data)
- Have a “translation” of raw variable names -> analysis variable names -> publication variable names
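One way to keep that translation in code, sketched with hypothetical raw names in the style of the HRS file names above: `rename` maps raw names to short analysis names, and `label variable` holds the wording used in published tables.

```stata
clear
input H2002_R1AGE H2002_R1EDUC
34 12
51 16
end

* Raw names -> analysis names
rename H2002_R1AGE  age
rename H2002_R1EDUC educyrs

* Analysis names -> publication names (variable labels appear in output)
label variable age     "Age (years)"
label variable educyrs "Years of education"
describe
```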

Page 30

Analysis Data -> Results

Document:
- How were variances estimated? Why?
- What algorithms were used, and why? Were the results robust?
- What starting values were used? Was convergence sensitive?
- Did you perform diagnostics?

Include this in your programs/documentation.
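One way to keep the variance choice documented is to run and store both versions in the program itself, so the log shows whether the conclusions depend on it. A sketch using the bundled `auto` data; the model is only an illustration, and `vce(robust)` is the Stata 10+ spelling.

```stata
sysuse auto, clear

* Default (homoskedastic) standard errors
regress price mpg weight
estimates store ols_default

* Heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)
estimates store ols_robust

* Side-by-side coefficients and standard errors for the log
estimates table ols_default ols_robust, se
```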

Page 31

Log files

Your log file should tell a story to the reader.
- As you print results to the log file, include words explaining the results
- Include not only what your code is doing, but your reasoning and thought process
- Don’t output everything to the log file; use quietly and noisily in a meaningful way
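A sketch of a log that tells a story, using the bundled `auto` data; the log file name `storylog.log` is hypothetical. The question is stated in words, the intermediate `summarize` tables are suppressed with `quietly`, and `noisily` lets only the one summary line through. A named log (`name()`, Stata 10+) avoids closing any other log you have open.

```stata
sysuse auto, clear
log using "storylog.log", text replace name(story)

display as text "Question: are foreign cars more expensive on average? Compare group means."
quietly {
    summarize price if foreign == 0
    local meanDom = r(mean)
    summarize price if foreign == 1
    local meanFor = r(mean)
    * Only the line below is shown; the summarize tables stay out of the log
    noisily display as result "Mean price: domestic " %9.2f `meanDom' ", foreign " %9.2f `meanFor'
}

log close story
```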

Page 32

Project Clean-up

- Create a zip file that contains everything necessary for complete replication
- Use a readme.txt file to describe the zip contents
- Delete/archive unused or old files
- Include any referenced files in the zip
- When you have a final zip archive containing everything:
  - Open it in its own directory and run the script
  - Check that all the results match
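For the final "check that all the results match" step, one option for data files is Stata's `datasignature`, which computes a signature of the data in memory and stores it in `r(datasignature)`. A sketch, with the bundled `auto` data standing in for the analysis file before and after the replication run:

```stata
* Before: record the signature of the analysis data your results came from
sysuse auto, clear
datasignature
local sigBefore "`r(datasignature)'"

* ... unzip into a fresh directory and rerun the whole project here ...

* After: reload the rebuilt analysis data and compare signatures
sysuse auto, clear
datasignature
assert "`r(datasignature)'" == "`sigBefore'"
display "Data signatures match."
```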

Page 33

CCPR’s Cluster and Helping Your Research

Software and data:
- Stata, SAS, R, compilers, text editors, etc.
- HRS, CPS (Unicon version), AddHealth, IFLS, etc.

Efficiency:
- Your PC is available for other work when you submit a job to the cluster
- Faster processors
- More RAM
- Easy to share data, programs, etc. with colleagues via the cluster

Obtain access by requesting an account:

http://lexis.ccpr.ucla.edu/account/request/

Page 34

Questions/Feedback

Please email me if you need help in the future:

[email protected]

