Unix Talk #2
AWK overview
Patterns and actions
Records and fields
Print vs. printf

Introduction
Students' grades in a text file
    John 22 56 38 70 85 80
    Alex 90 89 79 98 35
How can I calculate John's current average within this file?
GREP?
– Search for John with grep? Gives me the line.
– Now I can use my calculator to figure it out.
SED?
– sed will allow me to print, change, delete, etc.
I really want to automatically manipulate the values within this line.
This is where awk comes in. (awk me amadeus)
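A minimal sketch of where this is heading, assuming the two sample lines are saved in a file called grades.txt (a made-up name). It finds John's line, adds up every field after the name, and divides by the number of grades:

```shell
# grades.txt is a hypothetical file name; its two lines are the sample data from the slide.
printf 'John 22 56 38 70 85 80\nAlex 90 89 79 98 35\n' > grades.txt

# On the line that matches John, add up fields 2..NF and divide by the number of grades.
john_avg=$(awk '/John/ {
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i
    printf "%.2f", sum / (NF - 1)
}' grades.txt)
echo "John's average: $john_avg"
```

The fields, NF, and printf used here are all covered later in the talk.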

awk
Named for the first initials of the last names of its authors: Aho, Weinberger, and Kernighan
Which awk are we tawking about?
– awk
– nawk – new awk (on CS machines)
– gawk – GNU awk (bart)

AWK syntax
    awk '/pattern/' file
    awk '{action}' file
    awk '/pattern/ {action;}' file
    cat file | awk '{action}'
Awk automatically reads in the file for you line by line.
– No need to open/close the file (as you would in C or Java).
– The pattern section FINDS LINES that match the pattern.
– The action section performs the actions you defined on the lines it found.
– The original file does not change.

Simple example
    awk '{ print }' fruit_prices
Note: The pattern is missing here, so the action applies to every line: the awk command print prints each line it reads.

Simple example
    awk '
    /\$[0-9]*\.[0-9][0-9]*/ { print }
    ' fruit_prices

Action
Actions are specified by the programmer and are not limited to print, delete, etc. (p/d/s from sed). That is why it is so awesome!
Actions consist of
– variable assignments,
– arithmetic and logic operators,
– decision structures,
– looping structures.
For example: print, if, while and for
    awk '{print}' filename

Execution types
format 1: awk 'script'
– where INPUT must come from a pipe or STDIN
– command | awk 'script'
format 2: awk 'script' input1 input2 ... inputn
– where we supply input FILES as input1, input2, etc.
format 3: awk -f script_file input1 ...
– (a # in the script starts a comment)
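A small sketch showing that the three formats are interchangeable. The file names names.txt and first.awk are made up for this example; the same one-line script runs from a pipe, on a named input file, and from a script file:

```shell
# names.txt and first.awk are hypothetical file names used only for this sketch.
printf 'John Robinson\nYin Pan\n' > names.txt
printf '# print the first field of every line\n{ print $1 }\n' > first.awk

from_pipe=$(cat names.txt | awk '{ print $1 }')   # format 1: input from a pipe
from_file=$(awk '{ print $1 }' names.txt)         # format 2: input file as an argument
from_script=$(awk -f first.awk names.txt)         # format 3: script read from a file
echo "$from_script"
```

All three variables should end up holding the same two lines, John and Yin.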

Pattern
Types
– Regular expressions
– BEGIN
    Does all its stuff BEFORE reading any input
– END
    Does all its stuff AFTER reading ALL input
The pattern is optional
If no pattern is specified, the action will occur for EVERY LINE, one at a time.
    awk '{action}' filename
    awk '{print;}' names          # prints all lines
    awk 'BEGIN {print "The average grades"}'

Awk Regular Expression Metacharacters
Supported
– ^, $, ., *, +, ?, [ABC], [^ABC],
– [A-Z], A|B, (AB)+, \, &
Not supported
– Backreferencing, \( \)
– Repetition, \{ \} (in traditional awk; POSIX awk and current gawk accept the extended form {n,m})
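A short sketch of a few of these metacharacters in use, with made-up input words. The first pattern uses anchors and a repeated group, the second an optional character:

```shell
# Five hypothetical input lines; a pattern with no action prints the matching line.
matches=$(printf 'ab\nabab\nabc\ncolor\ncolour\n' |
    awk '/^(ab)+$/ || /^colou?r$/')
echo "$matches"
```

Only abc should be filtered out: it is not made up entirely of repetitions of ab.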

    awk '
    BEGIN { actions ; }
    /pattern/ { actions ; }
    /pattern/ { actions ; }
    END { actions ; }
    ' files
Execution steps:
1) If a BEGIN pattern is present, executes its actions.
2) Reads an input line and parses it into fields.
3) Compares each of the specified patterns against the input line; on a match, executes that pattern's actions. This step is repeated for all patterns.
4) Repeats steps 2 and 3 while input lines are present.
5) After the script has read all the input lines, if the END pattern is present, executes its actions.
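The execution steps can be traced with a small sketch. The three input lines are invented for the example; the comments tie each rule back to the numbered steps:

```shell
# Hypothetical three-line input; the middle line does not match /an/.
result=$(printf 'Yin Pan\nJohn Robinson\nDiane Lang\n' | awk '
BEGIN   { print "start" }              # step 1: runs once, before any input
/an/    { hits++; print NR ": " $0 }   # step 3: runs for each matching line
/^John/ { print "found John" }         # every pattern is tried on every line
END     { print "matches = " hits }    # step 5: runs once, after all input
')
echo "$result"
```

The output should be start, then 1: Yin Pan, found John, 3: Diane Lang, and finally matches = 2.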

Try This!
Place the following in the file tryawk1.awk
    BEGIN { print "Starting to read input";
            nLines = 0; }
    /^.*$/ { nLines++; }
    END { print "DONE: Total lines = " nLines; }
– Run the command: cat tryawk1.awk | awk -f tryawk1.awk
– Counts the # of lines in the input
nLines is a variable … note NO declaration, just use
The print command prints a line of text and adds a newline to the end of the line

Records and fields
awk has RECORDS (lines) and FIELDS
$0 represents the entire line of input
$1 represents the first field
print works just like echo
– print $1 $2     # $1 concatenated with $2
– print $1, $2    # $1 OFS $2
    cat fruit_prices
    awk '{print;}' fruit_prices       # prints all lines
    awk '{print $0;}' fruit_prices    # prints each entire line
    awk '{print $1;}' fruit_prices    # prints first field in each line
    awk '{print $2;}' fruit_prices    # prints second field in each line
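A quick sketch of the difference between the two print forms, using one made-up input line. With a space between the fields print concatenates them; with a comma it puts OFS between them:

```shell
line='John Robinson 234-3456'    # hypothetical input record
concat=$(echo "$line" | awk '{ print $1 $2 }')                       # no separator
spaced=$(echo "$line" | awk '{ print $1, $2 }')                      # default OFS is a space
dashed=$(echo "$line" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')  # custom OFS
echo "$concat / $spaced / $dashed"
```

This should print JohnRobinson / John Robinson / John-Robinson.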

Examples
    $ cat phones.data
    John Robinson 234-3456
    Yin Pan 123-4567
    $ awk '{ print $1, $2, $3 }' phones.data
    John Robinson 234-3456
    Yin Pan 123-4567
    $ awk '{ print $2 ",", $1, $3 }' phones.data
    Robinson, John 234-3456
    Pan, Yin 123-4567
    $ awk '/^$/ { print x += 1 }' phones.data    # running count of blank lines
    $ awk '/Mary/ { print $0 }' phones.data      # lines that contain Mary

Examples (cont'd)
    ls -l | awk '
    $6 == "Oct" { sum += $5 ; }
    END { print sum ; }
    '
    ls -l | awk -f block_use.awk
    cat block_use.awk
    $6 == "Oct" { sum += $5 ; }
    END { print sum ; }

Taking Pattern-specific Actions
    #!/bin/sh
    awk '
    /\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0,"*";}
    /\$0\.[0-9][0-9]*/ { print ;}
    ' fruit_prices
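To see the two rules at work, here is a sketch with invented contents for fruit_prices (the real file is not shown in the slides). Prices of a dollar or more get a trailing star; prices under a dollar print unchanged:

```shell
# Hypothetical contents for fruit_prices: one name and one price per line.
printf '%s\n' 'Banana $0.89' 'Kiwi $1.50' 'Peach $0.79' 'Pineapple $12.29' > fruit_prices

marked=$(awk '
/\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0, "*" }   # one dollar or more: append a star
/\$0\.[0-9][0-9]*/           { print }           # under a dollar: print unchanged
' fruit_prices)
echo "$marked"
```

With this data the Kiwi and Pineapple lines should come out starred.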

Intrinsic variables
awk defines RECORDS (lines) and FIELDS
– FS, input field separator (default=space/tab)
– OFS, output field separator (default=space)
– ORS, output record separator (default=newline)
– RS, input record separator (default=newline)
– NR, number of the current record being processed
– NF, number of fields within the current record
– FILENAME, awk sets this variable to the name of the file that it's currently reading. (If you have more than one input file, awk resets this variable as it reads each file in turn.)
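A small sketch of NR, NF and FILENAME with two made-up input files. Note that NR keeps counting across files while FILENAME changes:

```shell
# a.txt and b.txt are hypothetical input files.
printf 'one two three\nfour five\n' > a.txt
printf 'six\n' > b.txt

# For every record, show the file it came from, its record number, and its field count.
report=$(awk '{ print FILENAME, NR, NF }' a.txt b.txt)
echo "$report"
```

The three output lines should be a.txt 1 3, a.txt 2 2 and b.txt 3 1.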

How does awk work
    awk '{print $1, $3}' names
– Puts a line of input into $0, based on RS
– Breaks the line into fields based on FS and stores them in numbered variables, starting with $1
– Prints the fields with print (or another output statement), using OFS to separate the fields
– After awk displays its output, it goes to the next line and repeats. The output lines are separated by ORS.
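A sketch that makes OFS and ORS visible by changing both. The two input lines are borrowed from the phones.data example; the separator values are arbitrary choices for illustration:

```shell
# OFS goes between the fields of one print; ORS goes after each printed record.
joined=$(printf 'John Robinson 234-3456\nYin Pan 123-4567\n' |
    awk 'BEGIN { OFS = ":"; ORS = ";" } { print $1, $3 }')
echo "$joined"
```

Because ORS is no longer a newline, both records should land on one line: John:234-3456;Yin:123-4567;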

Changing the Input Field Separator
Manually resetting FS in a BEGIN pattern
– Forces you to hard code the value of the field separator
– BEGIN { FS=":" ; }
– Example:
    $ awk 'BEGIN { FS=":" ; } { print $1, $6 ; }' /etc/passwd
Specifying the -F option to awk
– awk -F: '{ … }'
– Enables using a shell variable to specify the field separator dynamically
– Example:
    $ sep=':'
    $ awk -F"$sep" '{ print $1, $6 ; }' /etc/passwd
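A sketch comparing the ways of setting the separator on one invented passwd-style line (real /etc/passwd contents differ from machine to machine). The third form, -v FS=..., is not on the slide; it is the standard awk option for assigning a variable before the script runs:

```shell
# Hypothetical passwd-style entry: login is field 1, home directory is field 6.
entry='alice:x:1001:1001:Alice Example:/home/alice:/bin/sh'
sep=':'

via_begin=$(echo "$entry" | awk 'BEGIN { FS = ":" } { print $1, $6 }')   # hard coded
via_flag=$(echo "$entry" | awk -F"$sep" '{ print $1, $6 }')              # from a shell variable
via_assign=$(echo "$entry" | awk -v FS="$sep" '{ print $1, $6 }')        # same idea with -v
echo "$via_flag"
```

All three should print alice /home/alice.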

Example
Each record in the file spans three lines, each with its own separator:
    FirstName;LastName;Address;City;State;Zip;Phone
    SSN:DOB:NumberOfDependents
    HospitalizationCode,DentalCode,LifeCode
Convert this file format to:
    SSN,LastName,FirstName,Address,….

    awk 'BEGIN { OFS=","; FS=";" }                # line 1 of each group is split on ;
    NR%3==1 { F=$1; L=$2; A=$3;…..; FS=":" }      # prepare: the next line is split on :
    NR%3==2 { SSN=$1; DOB=$2;…; FS="," }          # the next line is split on ,
    NR%3==0 { …; print SSN, L, F, A…; FS=";" }    # the next group starts with ;
    ' filename
Note: a new value of FS takes effect on the NEXT line read, so each rule sets the separator for the line that follows it.
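A complete, runnable sketch of this conversion on two invented three-line records. The file name people.txt, the data values, and the choice of output columns are all assumptions for illustration; the slide leaves the full column list elided. Each rule saves what it needs from the current line and then sets FS for the line that follows:

```shell
# people.txt is a hypothetical file holding two made-up records of three lines each.
printf '%s\n' \
    'John;Robinson;12 Elm St;Rochester;NY;14623;234-3456' \
    '000-00-0001:1970-01-01:2' \
    'H1,D2,L3' \
    'Yin;Pan;34 Oak Ave;Rochester;NY;14623;123-4567' \
    '000-00-0002:1980-02-02:0' \
    'H4,D5,L6' > people.txt

converted=$(awk '
BEGIN   { OFS = ","; FS = ";" }                                  # first line uses ;
NR%3==1 { first = $1; last = $2; addr = $3; FS = ":" }           # next line uses :
NR%3==2 { ssn = $1; dob = $2; FS = "," }                         # next line uses ,
NR%3==0 { print ssn, last, first, addr, dob, $1, $2, $3; FS = ";" }
' people.txt)
echo "$converted"
```

Each group of three input lines should collapse into one comma-separated output line that starts with the SSN.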

Print vs. printf
printf
– 1st argument is a string … the 'format'
– Prints each character of the format
    Upon reaching a %, the next few characters are a format specifier
    The next argument is printed according to the specifier
– Does not append a newline
– More control over appearance of output
– Consider
    awk 'BEGIN { printf "%5.2f\n", 2/3; }'
    Prints ␣0.67 (here, the ␣ represents a space)
    %5.2f means print a fractional number (the 'f') in a field 5 characters wide, with 2 digits to the right of the decimal point.

Why printf
printf - for formatting the output of your "print"
We have the print function, so why printf?
– printf allows us to FORMAT stuff.
– can FORCE printing as a string
– decimals
– whole numbers
– how many digits fall on either side of the decimal pt
– scientific notation
– make things line up nicely

printf
printf (format, what to print)
printf ( "%s", x )
– %s is a PLACEHOLDER for some OUTPUT.
– s is a specific type of output (string)
– ONE item (%s) must have ONE thing to print in the "what to print"
– The format goes inside quotes, followed by a comma, followed by the variables to print, outside the quotes.
printf ( " s = %s ", x )
– " s = " is a LITERAL string

Printf format
s = A character string
f = A floating point number
d or i = the integer part of a decimal number
e = scientific notation of a floating point number
g = the shorter of the e and f forms
c = An ASCII character
    if x=65 and I use this print statement
    printf ( " s = %c ", x )
    output is " s = A "
    awk 'BEGIN{x=65; printf("char: %c\n", x)}'
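A sketch that runs one value through each specifier. The values are arbitrary; the exact exponent form shown by %e and %g comes from the underlying C library, so treat the expected text as what common Linux awks print rather than a guarantee:

```shell
# One printf with a | between fields so every conversion is easy to pick out.
formatted=$(awk 'BEGIN {
    printf "%s|%d|%.2f|%e|%g|%g|%c\n", "abc", 3.99, 3.14159, 1234.5, 0.5, 1000000, 65
}')
echo "$formatted"
```

On a typical system this prints abc|3|3.14|1.234500e+03|0.5|1e+06|A: %d drops the fraction, %g picks whichever form is shorter, and %c turns 65 into the letter A.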

Printf
More control:
– %wd
    Print an integer out in a field of width w
    If the number is shorter than w characters, print leading spaces
    Try awk 'BEGIN { printf "%10d\n", 10; }' /dev/null
– Try adding a '-' immediately after the %
    Left justifies the value in the field

Printf
%ws
– Print a string out in a field of width w
– Supply leading spaces as necessary
Place a '-' immediately after the % to get left justification

Printf
%w.df
– Prints the value out in a field of width w
– Places the decimal point d places from the right end
– Place a '-' immediately after the % to get left justification
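A sketch of the width and justification rules for %wd, %ws and %w.df. The | characters are only there to make the padding visible; the values are arbitrary:

```shell
boxes=$(awk 'BEGIN {
    printf "|%5d|%-5d|\n", 42, 42          # right- and left-justified integer
    printf "|%8s|%-8s|\n", "awk", "awk"    # right- and left-justified string
    printf "|%8.3f|%-8.3f|\n", 2/3, 2/3    # width 8, 3 digits after the point
}')
echo "$boxes"
```

In each line the first box is padded on the left and the second, with the '-', is padded on the right.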

Printf examples
    Apple 10 20 25
    <---10----><-5-><-5-><-5->
    awk '{printf (" %10s %5d %5d %d ", $1, $2, $3, $4 )}' file
    awk '{printf (" %-10s %5d %5d %d ", $1, $2, $3, $4 )}' file
The minus sign designates that this field will be LEFT JUSTIFIED
    awk '{printf (" %-10s %-5d %-5d %d ", $1, $2, $3, $4 )}' file
    awk '{printf ("|%-15s|\n", $1)}'

Printf examples
Let's put an average in there...
    printf (" %-10s %-5d %-5d %-5d %f ", $1, $2, $3, $4, average )
Will provide the RAW number (%f prints 6 digits to the RIGHT of the decimal by default)
    printf (" %-10s %-5d %-5d %-5d %.2f ", $1, $2, $3, $4, average )
%.2f says use TWO digits to the RIGHT of the decimal
printf doesn't provide the newline automatically....
    printf (" %-10s %-5d %-5d %-5d %.2f \n ", $1, $2, $3, $4, average )
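A runnable sketch of the average column, using the Apple 10 20 25 line from the earlier example. The slides do not show how average is computed; here it is assumed to be the mean of fields 2 to 4:

```shell
row=$(echo 'Apple 10 20 25' | awk '{
    average = ($2 + $3 + $4) / 3    # assumption: average of the three numeric fields
    printf "%-10s %-5d %-5d %-5d %.2f\n", $1, $2, $3, $4, average
}')
echo "$row"
```

The name and the three numbers come out left justified in their columns, followed by 18.33.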

The OFMT variable
(stands for Output Formatting for numbers)
A special awk variable
Controls the printing of numbers when using the print function
    awk 'BEGIN{print 1.243434534;}'
    awk 'BEGIN{OFMT="%.2f"; print 1.23344455;}'
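A sketch of what these two commands print, plus one extra case that is not on the slide: a whole number, which print shows as an integer regardless of OFMT:

```shell
default_out=$(awk 'BEGIN { print 1.243434534 }')                # default OFMT is "%.6g"
two_places=$(awk 'BEGIN { OFMT = "%.2f"; print 1.23344455 }')   # OFMT applied by print
whole=$(awk 'BEGIN { OFMT = "%.2f"; print 42 }')                # integers are unaffected
echo "$default_out $two_places $whole"
```

This should print 1.24343 1.23 42.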