10 The Awk Programming Language
Mauro Jaskelioff (originally by Gail Hopkins)
Introduction
• What is awk?
• Command line syntax
• Patterns and procedures
• Commands
• Variables
  – Built-in variables
  – Variable assignment
• Arrays
• Defining functions
What is awk?
• A pattern matching program for processing files
• There are different versions of awk:
  – awk - the original version, sometimes called old awk, or oawk
  – New awk - additional features added in 1984. Often called nawk
  – GNU awk (gawk) - has even more features
• The version installed in unnc-cslinux is GNU awk 3.1.3
What does awk do?
• A text file is thought of as being made up of records and fields
• On this file you can:
  – Do arithmetic and string operations
  – Use loops and conditionals (if-then-else)
  – Produce formatted reports
What does awk do? (2)
• awk (new awk) also allows you to:
  – Execute UNIX commands from within a script
  – Process the output from UNIX commands
  – Work with multiple input streams
  – Define functions
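As a minimal sketch of the first two points, assuming the standard system() function and the "command | getline" form are the mechanisms meant here (the commands echo and true are only stand-ins chosen for illustration):

```shell
# Read one line of a command's output with getline, then run a second
# command with system() and keep its exit status.
cmd_demo=$(awk 'BEGIN {
    "echo hello" | getline line   # process the output of a UNIX command
    close("echo hello")           # close the pipe when finished with it
    status = system("true")       # execute a UNIX command from the script
    print line, status
}')
echo "$cmd_demo"
```

A program made up only of a BEGIN block reads no input, so no file needs to be given.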
What does awk do? (3)
• awk can also be combined with sed and shell scripting!
  – Shell is very easy and quick to write, but it lacks functionality.
  – sed, awk and shell are designed to be integrated
    • Simply invoke the sed or awk interpreter from within the shell script, rather than from the command line!
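A small sketch of what such a script might look like. The data and the variable names are made up for illustration; sed removes comment lines and awk then picks out a field:

```shell
#!/bin/sh
# Sample data invented for this sketch.
data='# user list
alice:1001
bob:1002'

# sed deletes the comment line; awk prints the first colon-separated field.
users=$(printf '%s\n' "$data" | sed '/^#/d' | awk -F: '{ print $1 }')
echo "$users"
```

The shell handles the plumbing (variables, pipes), while sed and awk do the text processing.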
awk Command Line Syntax
• From the command line, you can invoke awk in two ways:
  – awk [options] 'script' var=value file(s)
    • Here, a script is specified directly from the command line
  – awk [options] -f scriptfile var=value file(s)
    • Here, a script is stored in a scriptfile and specified with the -f flag
    • nawk allows you to specify more than one scriptfile at a time (-f scriptfile1 -f scriptfile2, etc.)
awk Command Line Syntax - assigning values to variables
• You can assign a value to a variable on the command line (nawk only):
  – This value can be one of three things:
    • A literal, e.g. count=5
      – awk -f scriptFile count=5
    • A shell variable, e.g. $count
      – awk -f scriptFile count=$count
    • A command substitution, e.g. `cmd`
      – awk -f scriptFile count=`who | wc -l`
• The value is ONLY available after the BEGIN statement within the script is executed
  – To make the value available to the BEGIN statement:
    • awk -v count=5 -f scriptFile
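The difference between the two forms can be seen in a short sketch. The variable name count follows the slide; the input line x is arbitrary:

```shell
# Plain assignment: processed when awk reaches it in the argument list,
# so BEGIN sees an empty value, while the main rule sees 5.
late=$(echo x | awk 'BEGIN { print "BEGIN:[" count "]" } { print "main:[" count "]" }' count=5 -)
echo "$late"

# -v assignment: made before BEGIN runs.
early=$(awk -v count=5 'BEGIN { print "BEGIN:[" count "]" }')
echo "$early"
```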
awk Command Line Syntax - giving awk a file to operate on
• awk operates on one or more files
• You do not have to give awk any files to operate on
  – Either don’t specify one
  – Or specify none using '-'
    • awk -f scriptFile -
• If you don’t give awk a file to operate on it takes input from STDIN
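For instance, with no file operand awk simply reads whatever is piped to it. The two input lines here are invented for the sketch:

```shell
# No file operand: awk reads standard input.
second=$(printf 'one two\nthree four\n' | awk '{ print $2 }')
echo "$second"
```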
awk Command Line Syntax - Field separators
• You can set a field separator
  – In other words, a symbol (or even a regular expression in nawk) that should appear between fields of a record
• Do this using -F
  – E.g. awk -F';' -f scriptFile count=5 myFile
    • Would look for fields in a record (or line) in myFile separated by a semi-colon
  – Also awk -f scriptFile FS=';' count=5 myFile
• Fields are referred to by the variables $1, $2, etc.
  – $0 means the whole record
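A brief sketch of the three ways of setting the separator, using a made-up record whose fields are separated by either ';' or ',':

```shell
# Sample record invented for this sketch.
line='alpha;beta,gamma'

# A single-character separator given with -F.
first=$(echo "$line" | awk -F';' '{ print $1 }')

# A regular expression as the separator: split on ';' or ','.
third=$(echo "$line" | awk -F'[;,]' '{ print $3 }')

# The same separator set through FS, placed before the input operand.
nfields=$(echo "$line" | awk '{ print NF }' FS='[;,]' -)

echo "$first $third $nfields"
```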
Field Separators - example
• Suppose you want to extract and print the first three (colon-separated) fields of each record in /etc/passwd, on separate lines
$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
rootnir:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
Field Separators - example (2)
$ awk -F: '{print $1; print $2; print $3}' /etc/passwd
root
x
0
rootnir
x
0
bin
x
1
daemon
x
2
adm
x
3
…

• -F: - Look for fields separated by a colon
• {print $1; print $2; print $3} - Print the first ($1), second ($2) and third ($3) field
• /etc/passwd - Look in the file /etc/passwd
Patterns and Procedures
• awk scripts consist of patterns and procedures:
    awk -F: '/^...:/ {print $1}' /etc/passwd
  – Pattern: /^...:/
  – Procedure: {print $1}
• Patterns and procedures are optional
  – If a pattern is missing, the procedure applies to all lines
  – If the procedure is missing, the matched line (matched by pattern) is printed
Patterns
A pattern can be:
• /regular expression/
  – Use the metacharacters we have already seen
  – ^ and $ mean the beginning and end of the string being matched (the whole record by default, or a single field when used with ~), NOT necessarily the beginning/end of a line
    awk -F: '/^...:/ {print $1}' /etc/passwd
• Relational expression
  – Use relational operators, e.g. $1 > $2
  – Can do numeric or string comparisons
    awk -F: '$1=="gdm" {print $0}' /etc/passwd
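To illustrate the two kinds of comparison, here is a sketch on three invented passwd-style records (name:password:uid):

```shell
# Made-up records for this sketch.
data='root:x:0
gdm:x:42
alice:x:1001'

# Numeric comparison on the third field.
high=$(printf '%s\n' "$data" | awk -F: '$3 >= 42 { print $1 }')

# String comparison on the first field.
exact=$(printf '%s\n' "$data" | awk -F: '$1 == "gdm" { print $0 }')

echo "$high"
echo "$exact"
```

Fields that look like numbers are compared numerically; comparing against a quoted string forces a string comparison.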
Patterns (2)
• Pattern-matching expression
  – E.g. quoted strings, numbers, operators, defined variables…
  – ~ means match, !~ means don’t match
    awk -F: '$1 ~ /.dm.*/ {print $0}' /etc/passwd
• BEGIN
  – Specifies procedures that take place before the first input line is processed
    awk 'BEGIN {print "Version 1.0"}' dataFile
• END
  – Specifies procedures that take place after the last input record is read
    awk 'END {print "end of data"}' dataFile
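BEGIN and END are often used together around a main rule. A minimal sketch that totals a column of invented numbers:

```shell
total=$(printf '3\n4\n5\n' | awk '
BEGIN { sum = 0 }             # runs once, before any input is read
      { sum += $1 }           # runs for every record
END   { print "total", sum }  # runs once, after the last record
')
echo "$total"
```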
Procedures
• Consist of one or more:
  – Commands
  – Functions
  – Variable assignments
• These are separated by newlines or semi-colons and are contained within curly brackets { }
Commands used with Procedures
• There are 5 types of commands:
  – Assignments of variables or arrays
  – Commands that print
  – Built-in functions
  – Control-flow commands
  – User-defined functions (in nawk only)
Some Examples using Patterns and Procedures
• awk -F: '{print $1}' /etc/passwd
  – print first field of each line in /etc/passwd
• awk '/root/' /etc/passwd
  – print all lines in /etc/passwd that contain the pattern "root"
• awk -F: '/root/ {print $1}' /etc/passwd
  – print first field of lines that contain "root" in /etc/passwd
• awk '{print NR}'
  – print the number of the current record
awk Built-in Variables
• awk has a number of built-in variables:
  – FILENAME - current filename
  – FS - Field separator
  – NF - Number of fields in current record
  – NR - Number of current record
  – RS - Record separator
  – $0 - Entire input record
  – $n - nth field in current record
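A quick sketch showing NR and NF at work on two invented input lines. Here $NF is simply $n with n equal to NF, i.e. the last field:

```shell
report=$(printf 'a b c\nd e\n' | awk '{ print NR ": " NF " fields, last is " $NF }')
echo "$report"
```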
awk Operators
Symbol              Meaning
$                   Field reference
++ --               Increment, decrement
+ - !               Addition, subtraction, negation
* / %               Multiplication, division, modulus
< <= > >= != ==     Relational operators
~ !~                Match regular expression and negation
in                  Array membership
&& ||               Logical and, logical or
?:                  If-then-else for expressions, e.g. x == y ? "Equal" : "Not equal"
= += -= *= /= %=    Assignment
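As an illustration of a few of these operators together, a sketch combining modulus, a relational test and the ?: expression on invented input:

```shell
parity=$(printf '1\n2\n3\n' | awk '{ print $1, ($1 % 2 == 0 ? "even" : "odd") }')
echo "$parity"
```

The parentheses around the ?: expression keep it from being confused with print's own syntax.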
Variable Assignments
• Assign variables with an =, e.g.:
  – FS = ":"
  – var1 = count+2
  – var2 = max-min
  – var3 = 2 < 3 ? 4 : 5
• Access variables using just the name
  – {print var3}
• What’s the result?
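One way to check is to run the assignments in a BEGIN block. In this sketch count is given the arbitrary value 3, which is an assumption not taken from the slide:

```shell
result=$(awk 'BEGIN {
    count = 3              # arbitrary starting value for this sketch
    var1 = count + 2
    var3 = 2 < 3 ? 4 : 5   # 2 < 3 is true, so the first branch is chosen
    print var1, var3
}')
echo "$result"
```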
Arrays in awk
• awk has arrays with elements subscripted with strings (associative arrays)
• Assign arrays in one of two ways:
  – Name them in an assignment statement
    • myArray[i]=n++
  – Use the split() function
    • n=split(input, words, " ")
Reading elements in an array
• Using a for loop:
    for (item in array)
        print array[item]
• Using the operator in:
    if (idx in array)
        ...
  – …use this to see if an index exists. (nawk)
  – The subscript variable is called idx here because index is the name of a built-in function
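Both forms appear in this sketch, which counts words on one invented input line. The array name seen is chosen for illustration:

```shell
counts=$(printf 'b a b\n' | awk '{
    for (i = 1; i <= NF; i++) seen[$i]++       # subscript by the word itself
    if ("a" in seen) print "a:", seen["a"]    # membership test
    if (!("z" in seen)) print "z: absent"     # testing does not create "z"
    n = 0
    for (w in seen) n++                        # visit every subscript
    print "distinct:", n
}')
echo "$counts"
```

The order in which for (w in seen) visits the subscripts is not defined, so the sketch only counts them.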
Defining Functions in awk
• You can define your own functions in awk, in much the same way as you define a function in C or Java
  – Thus code that is to be repeated can be grouped together inside a function
  – Allows code reuse!
  – NOTE: when calling a function you have defined yourself, no space is allowed between the function name and the left bracket.
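A minimal sketch of a function definition and call. The function name twice is hypothetical and the input numbers are invented:

```shell
doubled=$(printf '2\n5\n' | awk '
function twice(x) { return 2 * x }   # hypothetical helper
{ print twice($1) }                  # no space before the left bracket
')
echo "$doubled"
```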
An Example using a Function and an Array
# capitalise the first letter of each word in a string
function capitalise(input){
    result = ""
    n = split(input, words, " ")
    for (i=1; i <= n; i++){
        w = words[i]
        w = toupper(substr(w, 1, 1)) substr(w, 2)
        if (i > 1)
            result = result " "
        result = result w
    }
    return result
}
# this is the main program
{ print capitalise($0) }
Break-down of Example
# capitalise the first letter of each word in a string
function capitalise(input){
…
• input - Variable to be used in function
  – input contains whatever the caller called the function with
Break-down of Example (2)
…
result = ""
n = split(input, words, " ")
…
• Set result to be an empty string
• Take the input and split it up into the array "words" - divide the input wherever there is a space
• n is the result returned by the split function and contains the number of elements in the array "words"
Break-down of Example (3)
…
for (i=1; i <= n; i++){
    w = words[i]
    w = toupper(substr(w, 1, 1)) substr(w, 2)
    if (i > 1)
        result = result " "
    result = result w
}
return result
}
…
• For each element of array from 1 to the number of elements…
• Assign element to w
• Take the substring which starts at the first character and has a length of 1 and capitalise using toupper()
• Take remainder of string starting at 2nd character and append it to capitalised character
• Tag a space on to the end of the result string (only when i > 1, so there is no leading space)
• Tag the next word on to the end of the result string
Break-down of Example (4)
…
# this is the main program
{ print capitalise($0) }
• The line starting with # is a comment in awk
• Call the capitalise function with the entire input record. Print the result.
Output from Example
• Given the input file:
    In theory there is no difference between theory and practice, but in practice there is
• …our capitalise function will output:
    In Theory There Is No Difference Between Theory And Practice, But In Practice There Is
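To try this out, the script can be saved to a file and run with -f. The sketch below writes it to a temporary file (the use of mktemp and the shortened input line are choices made for illustration):

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
# capitalise the first letter of each word in a string
function capitalise(input) {
    result = ""
    n = split(input, words, " ")
    for (i = 1; i <= n; i++) {
        w = words[i]
        w = toupper(substr(w, 1, 1)) substr(w, 2)
        if (i > 1)
            result = result " "
        result = result w
    }
    return result
}
# this is the main program
{ print capitalise($0) }
EOF

# Run the script file on one line of standard input.
out=$(echo 'in theory there is no difference' | awk -f "$script")
echo "$out"
rm -f "$script"
```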
Summary
• An introduction to awk
• Using awk patterns and procedures on the command line
• Writing awk scripts