Learning AWK

Posted on 31-Dec-2015


UNIX

IBM India Private Limited

© 2012 IBM Corporation

Learning AWK


04/19/23 © 2012 IBM Corporation

WHAT IS AWK?

A scripting language used for manipulating data and generating reports, created by Aho, Weinberger, and Kernighan.

Unlike other filters, it operates at the field level and can easily access, transform, and format individual fields in a line.

awk programs are based on the idea of pattern and action; the program scans a document looking for a pattern and when found it performs the action.

awk never modifies the input file.


WHAT CAN YOU DO WITH AWK?

How awk operates: it scans a file line by line, splits each input line into fields, compares the input line or its fields to a pattern, and performs action(s) on matching lines. (Matching on individual fields, rather than only on whole lines, sets awk apart from line-oriented filters such as grep; Perl offers it as well.)

Useful for: transforming data files and producing formatted reports.

Programming constructs: formatted output lines, arithmetic and string operations, conditionals and loops.


THE COMMAND: AWK


BASIC AWK SYNTAX

awk [options] 'script' file(s)

E.g. awk -F: '/search/ {print $0}' file1

awk [options] -f scriptfile file(s)

E.g. awk -F: -f ip.awk file1

Options:

-F to change the input field separator

-f to name the script file
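A minimal sketch of the -f form, assuming a POSIX shell with mktemp. The script name ip.awk comes from the slide, but its contents and the sample data below are illustrative assumptions.

```shell
#!/bin/sh
# Illustrative only: a temporary directory holds the script and the data.
dir=$(mktemp -d)

# The awk program lives in its own file; quoting 'EOF' stops the shell
# from expanding $1 and $2 inside the here-document.
cat > "$dir/ip.awk" <<'EOF'
# print the name and ID of every record that mentions Jones
/Jones/ { print $1, $2 }
EOF

# Illustrative colon-separated data.
printf 'Tom Jones:4424\nMary Adams:5346\n' > "$dir/file1"

# -F: sets the input field separator; -f names the script file.
result=$(awk -F: -f "$dir/ip.awk" "$dir/file1")
echo "$result"

rm -r "$dir"
```

Keeping the program in a file avoids shell quoting problems once it grows beyond a line or two.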


BASIC AWK PROGRAM

An awk program consists of patterns and actions:

awk 'pattern {action}' file(s)

If the pattern is missing, the action is applied to all lines: awk '{print}' datafile prints all lines in datafile.

If the action is missing, the matched line is printed: awk '/for/' testfile prints all lines containing the string "for" in testfile.

Each rule must have a pattern, an action, or both.
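A small sketch of both defaults, assuming a POSIX shell and awk; the function names pattern_only and action_only and the sample lines are hypothetical.

```shell
#!/bin/sh
# Illustrative input lines.
input='for loop
while loop
the format string'

# Pattern only (hypothetical helper): the default action prints each matching line.
pattern_only() { printf '%s\n' "$input" | awk '/for/'; }

# Action only (hypothetical helper): with no pattern, the action runs for every line.
action_only() { printf '%s\n' "$input" | awk '{ print NR ": " $1 }'; }

pattern_only
action_only
```

Note that /for/ also matches "format": a regular expression pattern matches anywhere in the line unless it is anchored with ^ or $.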


BASIC TERMINOLOGY: INPUT FILE

A field is a unit of data in a line. Each field is separated from the other fields by the field separator; the default field separator is whitespace.

A record is the collection of fields in a line.

A data file is made up of records.


BUFFERS

awk supports two types of buffers: record and field.

Field buffers: one for each field in the current record, named $1, $2, …

Record buffer: $0 holds the entire record (print and print $0 are the same).
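A sketch, assuming POSIX awk, of how the two buffers interact: assigning to a field buffer makes awk rebuild the record buffer $0. The function name replace_second is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: overwrite field 2, then print the rebuilt $0.
# The rebuilt record joins the fields with OFS (a single space by default).
replace_second() {
    awk '{ $2 = "XX"; print $0 }'
}

printf 'a   b   c\n' | replace_second
```

The original run of spaces is lost because the rebuilt record is joined with OFS.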


SOME SYSTEM VARIABLES

FS        Field separator (default = whitespace)
RS        Record separator (default = \n)
NF        Number of fields in the current record
NR        Number of the current record
OFS       Output field separator (default = space)
ORS       Output record separator (default = \n)
FILENAME  Current filename
$0        Entire input record
$n        nth field of the current record
ARGC      Number of arguments on the command line
ARGV      An array containing the command-line arguments
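A sketch showing NR, NF, and $NF (the last field, since NF holds the field count) on illustrative input; describe is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: for each record print its number (NR), its field
# count (NF) and its last field ($NF). END runs after the last record,
# so NR then holds the total number of records.
describe() {
    awk '{ print NR, NF, $NF } END { print "records:", NR }'
}

printf 'Tom Jones 4424\nMary Adams 5346 11/4/63\n' | describe
```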


EXAMPLE: RECORD NUMBER - NR

% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500

% awk '{print NR, $0}' emps
1 Tom Jones 4424 5/12/66 543354
2 Mary Adams 5346 11/4/63 28765
3 Sally Chang 1654 7/22/54 650000
4 Billy Black 1683 9/23/44 336500


EXAMPLE: SPACE AS DEFAULT FIELD SEPARATOR - FS

% cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500

% awk '{print NR, $1, $2, $5}' emps
1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500


EXAMPLE: COLON AS FIELD SEPARATOR - FS

% cat em2

Tom Jones:4424:5/12/66:543354

Mary Adams:5346:11/4/63:28765

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500

% awk -F: '/Jones/{print $1, $2}' em2

Tom Jones 4424


EXAMPLE: MULTIPLE FIELD SEPARATORS

% cat d1.txt

1|NE|20-JAN-2012

2|DE|02-FEB-2012

3|PE|12-MAR-2012

% awk -F"[|-]" '{print $1,$2,$4}' d1.txt

1 NE JAN

2 DE FEB

3 PE MAR

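The separator can also be set inside the program, but it belongs in a BEGIN block. In POSIX awk, an FS assignment inside the main action takes effect only from the next record, so awk '{FS="[|-]"; print $1,$2,$4}' d1.txt would still split the first line on whitespace. Set FS in BEGIN instead:

awk 'BEGIN {FS="[|-]"} {print $1,$2,$4}' d1.txt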


EXAMPLE: OFS

File: grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84

% awk '{OFS="-";print $1 , $2}' grades

john-85

andrea-89

jasper-84
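A sketch, assuming POSIX awk, of setting OFS once in BEGIN and applying it to whole lines; to_csv and the sample lines are illustrative. An unmodified record is printed exactly as read, so the assignment $1 = $1 is used to force awk to rebuild $0 with the new OFS.

```shell
#!/bin/sh
# Hypothetical helper: convert space-separated lines to comma-separated ones.
to_csv() {
    awk 'BEGIN { OFS = "," } { $1 = $1; print }'
}

printf 'john 85 92\nandrea 89 90\n' | to_csv
```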


AWK SCRIPTS

awk scripts are divided into three major parts: BEGIN, BODY, and END.

Comment lines start with #.


AWK SCRIPTS

BEGIN: pre-processing (optional)

Performs processing that must be completed before file processing starts (i.e., before awk starts reading records from the input file). Useful for initialization tasks such as initializing variables and creating report headings.

BODY: processing

Contains the main processing logic to be applied to input records. It works like a loop that processes the input one record at a time: if a file contains 100 records, the body is executed 100 times, once for each record.

END: post-processing (optional)

Contains logic to be executed after all input data have been processed. Logic such as printing a report's grand total should be performed in this part of the script.
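A sketch of a program with all three parts, using illustrative grades data; report is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: BEGIN prints a heading, the body keeps a running
# total of field 2, and END prints the grand total after the last record.
report() {
    awk '
        BEGIN { print "Name Score" }          # pre-processing
              { total += $2; print $1, $2 }   # runs once per record
        END   { print "Total", total }        # post-processing
    '
}

printf 'john 85\nandrea 89\njasper 84\n' | report
```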


PATTERN / ACTION SYNTAX


CATEGORIES OF PATTERNS


EXAMPLE: SIMPLE PATTERN

% cat employees2

Tom Jones:4424:5/12/66:543354

Mary Adams:5346:11/4/63:28765

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500

(find the records that end in 00)

% awk -F: '/00$/' employees2

Sally Chang:1654:7/22/54:650000

Billy Black:1683:9/23/44:336500


EXAMPLE: SIMPLE PATTERN

% cat datafile

northwest NW Charles Main 3.0 .98 3 34

western WE Sharon Gray 5.3 .97 5 23

southwest SW Lewis Dalsass 2.7 .8 2 18

southern SO Suan Chin 5.1 .95 4 15

southeast SE Patricia Hemenway 4.0 .7 4 17

eastern EA TB Savage 4.4 .84 5 20

northeast NE AM Main 5.1 .94 3 13

north NO Margot Weber 4.5 .89 5 9

central CT Ann Stephens 5.7 .94 5 13

(find the records whose 5th field contains a decimal point followed by 7, 8, or 9)

% awk '$5 ~ /\.[7-9]+/' datafile

southwest SW Lewis Dalsass 2.7 .8 2 18

central CT Ann Stephens 5.7 .94 5 13


EXAMPLES: SIMPLE PATTERN

(records whose 2nd field does not contain E)
% awk '$2 !~ /E/{print $1, $2}' datafile
northwest NW
southwest SW
southern SO
north NO
central CT

(records that start with n or s)
% awk '/^[ns]/{print $1}' datafile
northwest
southwest
southern
southeast
northeast
north


RANGE PATTERNS

Match ranges of consecutive input lines.

Syntax: pattern1, pattern2 {action}

Each pattern can be any simple pattern: a line matching pattern1 turns the action on, and the next line matching pattern2 turns it off. Both boundary lines are included in the range.


RANGE PATTERN EXAMPLE
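A minimal range pattern sketch on illustrative input; the marker lines START and END and the function name between_markers are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: printing (the default action) starts at the first
# line matching /START/ and stops after the next line matching /END/.
between_markers() {
    awk '/START/, /END/'
}

printf 'a\nSTART\nb\nEND\nc\n' | between_markers
```

A range can also be written with line numbers, e.g. awk 'NR == 2, NR == 4' prints lines 2 through 4.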


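POSITIONAL PARAMETERS

Field references in awk are written as $1, $2, $3 and so forth.

The shell uses identical names for its command-line arguments, so the awk program must be placed in 'single quotes' to stop the shell from expanding awk's own $1, $2, and so on. To use a shell argument inside the program, close the quotes around it so the shell substitutes its value.

Example:

% script1.sh 400

Inside the shell script, awk can access the parameter like this:

awk '$3 > '$1' {print}' file   (the shell substitutes 400, so awk sees $3 > 400)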
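As an alternative to splicing a shell parameter into the quoted program, awk's -v option assigns a value to an awk variable before the program starts. A sketch, where the function above_limit, the variable name limit, and the data are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print name and field 3 for records whose field 3
# exceeds the limit given as the function's first argument.
above_limit() {
    # "$1" here is the shell function's argument; inside the single-quoted
    # program, $1 and $3 are awk fields.
    awk -v limit="$1" '$3 > limit { print $1, $3 }'
}

printf 'Tom 4424 543354\nMary 5346 28765\n' | above_limit 400000
```

This avoids quoting mistakes and keeps the shell's $1 and awk's $1 visibly separate.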


getline: MAKING awk INTERACTIVE

Usage: getline var1 < "/dev/tty"

Example:

=> cat test1.awk
BEGIN { printf "Enter the salary :"
        getline sal < "/dev/tty" }
$8 > sal { printf "Employee %s has salary above %d\n", $3, sal }

=> cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15

=> awk -f test1.awk datafile
Enter the salary :20        (interactive: the user types 20)
Employee Charles has salary above 20
Employee Sharon has salary above 20


ARITHMETIC OPERATORS

Operator Meaning Example

+ Add x + y

- Subtract x - y

* Multiply x * y

/ Divide x / y

% Modulus x % y

^ Exponential x ^ y

Examples:

% awk '$3 * $4 > 500 {print $0}' file

Calculate the total file size in a directory:

% ls -ltr | awk 'BEGIN {print "Calculating total file size"} {x=x+$5} END { print "total bytes: " x }'

Calculating total file size

total bytes: 11915
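A sketch of the operators on fixed values, plus the column-sum idiom from the ls example run on simulated ls -l lines (illustrative data). The function names arith and sum_sizes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: every arithmetic operator applied to 7 and 2.
arith() {
    awk 'BEGIN {
        x = 7; y = 2
        print x + y, x - y, x * y, x / y, x % y, x ^ y
    }'
}

# Hypothetical helper: add up field 5. A line with no field 5 (such as the
# "total" header of ls -l) adds 0, because an empty field is 0 in arithmetic.
sum_sizes() {
    awk '{ x += $5 } END { print x + 0 }'
}

arith
printf '%s\n' 'total 8' '-rw-r--r-- 1 u g 100 Jan 1 a' '-rw-r--r-- 1 u g 250 Jan 1 b' | sum_sizes
```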


RELATIONAL OPERATORS

Operator Meaning Example

< Less than x < y

<= Less than or equal x <= y

== Equal to x == y

!= Not equal to x != y

> Greater than x > y

>= Greater than or equal to x >= y

~ Matched by reg exp x ~ /y/

!~ Not matched by reg exp x !~ /y/
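A sketch of a numeric comparison and a regular-expression match on a field, using illustrative grades data; the function names are hypothetical.

```shell
#!/bin/sh
# Illustrative data: name and two grades per line.
grades='john 85 92
andrea 89 90
jasper 84 88'

# Hypothetical helper: numeric comparison on field 2.
high_first() { printf '%s\n' "$grades" | awk '$2 >= 85 { print $1 }'; }

# Hypothetical helpers: regular-expression match (~) and non-match (!~) on field 1.
j_names()     { printf '%s\n' "$grades" | awk '$1 ~ /^j/ { print $1 }'; }
not_j_names() { printf '%s\n' "$grades" | awk '$1 !~ /^j/ { print $1 }'; }

high_first
j_names
not_j_names
```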


LOGICAL OPERATORS

Operator Meaning Example

&& Logical AND a && b

|| Logical OR a || b

! NOT ! a

Examples:

% awk '($2 > 5) && ($2 <= 15) {print $0}' file

% awk '$3 == 100 || $4 > 50' file
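A sketch combining conditions with &&, || and ! on illustrative grades data; the function names and_demo, or_demo and not_demo are hypothetical.

```shell
#!/bin/sh
# Illustrative data: name and two grades per line.
grades='john 85 92
andrea 89 90
jasper 84 88'

# && : both conditions must hold.
and_demo() { printf '%s\n' "$grades" | awk '$2 > 84 && $3 > 90 { print $1 }'; }
# || : either condition may hold.
or_demo()  { printf '%s\n' "$grades" | awk '$2 > 88 || $3 > 91 { print $1 }'; }
# !  : negates a condition.
not_demo() { printf '%s\n' "$grades" | awk '!($2 > 84) { print $1 }'; }

and_demo
or_demo
not_demo
```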


AWK ACTIONS


AWK VARIABLES

Format: variable = expression

Examples:

% awk '$1 ~ /Tom/ {wage = $3 * $4; print wage}' filename

% awk '$4 == "CA" {$4 = "California"; print $0}' filename
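A sketch, assuming POSIX awk, of user variables: wage is created simply by assigning to it, and a variable that is never assigned (here called unset, a hypothetical name) reads as 0 in numeric context and as the empty string in string context. The function name wages and the data are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: compute hours ($3) times rate ($4) for Tom's record,
# then show the value of a never-assigned variable.
wages() {
    awk '$1 ~ /Tom/ { wage = $3 * $4; print $1, wage }
         END { print "unset:", unset + 0, "[" unset "]" }'
}

printf 'Tom Jones 40 15.5\nMary Adams 38 20\n' | wages
```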


AWK ASSIGNMENT OPERATORS

= Assign result of right-hand-side expression to left-hand-side variable
++ Add 1 to variable
-- Subtract 1 from variable
+= Assign result of addition
-= Assign result of subtraction
*= Assign result of multiplication
/= Assign result of division
%= Assign result of modulo
^= Assign result of exponentiation
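A sketch stepping one variable through each assignment operator in a BEGIN block; ops is a hypothetical function name, and the value after each step is shown in a comment.

```shell
#!/bin/sh
# Hypothetical helper: apply each assignment operator in turn to x.
ops() {
    awk 'BEGIN {
        x = 10
        x += 5;  printf "%d ", x    # 15
        x -= 3;  printf "%d ", x    # 12
        x *= 2;  printf "%d ", x    # 24
        x /= 4;  printf "%d ", x    # 6
        x %= 4;  printf "%d ", x    # 2
        x ^= 3;  printf "%d ", x    # 8
        x++;     printf "%d ", x    # 9
        x--;     printf "%d\n", x   # 8
    }'
}
ops
```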


AWK EXAMPLE

File: grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84

awk script: average.awk
# average five grades
{
    total = $2 + $3 + $4 + $5 + $6
    avg = total / 5
    print $1, avg
}

Run as: awk -f average.awk grades
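A variation on average.awk, assuming every line holds a name followed by at least one grade (a line with no grades would divide by zero): it loops over however many fields the line has instead of hard-coding five. average_all is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: average all numeric fields after the name.
average_all() {
    awk '{
        total = 0                      # reset for every record
        for (i = 2; i <= NF; i++)
            total += $i
        print $1, total / (NF - 1)
    }'
}

printf 'john 85 92 78 94 88\nandrea 89 90 75 90 86\njasper 84 88 80 92 84\n' | average_all
```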


OUTPUT STATEMENTS

print: easy and simple output

printf: formatted output (similar to C printf)

sprintf: formatted string (similar to C sprintf)


FUNCTION: PRINT

Writes to standard output.

Output is terminated by ORS; the default ORS is newline.

If called with no parameter, it prints $0.

Printed parameters are separated by OFS; the default OFS is a blank.

Print control characters are allowed: \n \f \a \t \\ …
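A sketch of ORS and an escape sequence on illustrative input; joined and tabbed are hypothetical function names.

```shell
#!/bin/sh
# Hypothetical helper: ORS (default "\n") terminates each print, so setting
# it to ";" joins the output into one line with no trailing newline.
joined() {
    awk 'BEGIN { ORS = ";" } { print $1 }'
}

# Hypothetical helper: "\t" inside a string produces a tab character.
tabbed() {
    awk '{ print $1 "\t" $2 }'
}

printf 'john 85\nandrea 89\n' | joined
echo
printf 'john 85\nandrea 89\n' | tabbed
```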


PRINT EXAMPLE

% awk '{print}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86

% awk '{print $0}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86

% awk '{print($0)}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86


PRINT EXAMPLE

% awk '{print $1, $2}' grades

john 85

andrea 89

% awk '{print $1 "," $2}' grades

john,85

andrea,89

% awk '{OFS="-";print $1 , $2}' grades

john-85

andrea-89


REDIRECTING PRINT OUTPUT

Print output goes to standard output unless redirected via:

> "file"
>> "file"
| "command"

Example:

% awk '{print $1 , $2 > "file"}' grades

% cat file
john 85
andrea 89
jasper 84
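A sketch, assuming a POSIX shell with mktemp, contrasting > and >>: within a single awk run, > truncates the file only when it is first opened, so later prints from the same run append to it; >> never truncates. The file and variable names are illustrative.

```shell
#!/bin/sh
dir=$(mktemp -d)   # illustrative scratch directory

# First run: ">" truncates the file once, then both lines are written.
printf 'john 85\nandrea 89\n' | awk -v out="$dir/file" '{ print $1 > out }'

# Second run: ">>" appends to what the first run wrote.
printf 'jasper 84\n' | awk -v out="$dir/file" '{ print $1 >> out }'

cat "$dir/file"
saved=$(cat "$dir/file")
rm -r "$dir"
```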


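REDIRECTING OUTPUT EXAMPLE

o Remove only regular files (in ls -l output their permission string starts with -, while directories start with d):

=> ls -l | awk '$1 ~ /^-/ {print $9}' | xargs rm

(This simple form breaks on file names that contain spaces.)

o Kill a process

=> find / -name abc.txt -print 2>/dev/null

While it runs, ps -ef lists the find process, with its PID in field 2:

u807735 21626990 20119632 7 00:41:04 pts/3 0:07 find / -name abc.txt -print

=> kill -9 `ps -ef | awk '$0 ~ /-name abc[.]txt/ {print $2}'`

(Writing the dot as [.] keeps the pattern from matching the awk command line itself, which ps -ef also lists.)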
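A sketch of the PID-extraction step run against simulated ps -ef lines (illustrative data, including a line for the awk process itself) to show the effect of the [.] pattern; find_pids is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print field 2 (the PID) of lines mentioning
# "-name abc.txt". The awk command line contains "abc[.]txt", which the
# regular expression does not match, so awk never reports itself.
find_pids() {
    awk '$0 ~ /-name abc[.]txt/ { print $2 }'
}

# Simulated ps -ef output (illustrative).
sample='u807735 21626990 20119632 7 00:41:04 pts/3 0:07 find / -name abc.txt -print
u807735 21700000 20119632 0 00:41:10 pts/3 0:00 awk $0 ~ /-name abc[.]txt/ { print $2 }'

printf '%s\n' "$sample" | find_pids
```

On a live system the helper would be used as kill $(ps -ef | find_pids).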


PRINT EXAMPLE

% awk '{print $1,$2 | "sort"}' grades

andrea 89

jasper 84

john 85

% awk '{print $1,$2 | "sort -k 2"}' grades

jasper 84

john 85

andrea 89
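A sketch, assuming POSIX awk, of closing an output pipe: close() with the same command string ends the pipe and waits for sort to finish, so anything printed afterwards follows the sorted lines. sorted_report and the sort options are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: sort by the numeric second field, then print a footer.
sorted_report() {
    awk '{ print $1, $2 | "sort -k2,2n" }
         END { close("sort -k2,2n"); print "done" }'
}

printf 'john 85\nandrea 89\njasper 84\n' | sorted_report
```

Without the close() call, sort only sees end of input when awk exits, so the footer could appear before the sorted lines.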


PRINT EXAMPLE

% date

Wed Nov 19 14:40:07 CST 2008

% date | awk '{print "Month: " $2 "\nYear: " $6}'

Month: Nov

Year: 2008


PRINTF: FORMATTING OUTPUT

Syntax:

printf(format-string, var1, var2, …)

Works like C printf: each format specifier in format-string requires an argument of matching type. Unlike print, printf does not append ORS, so a newline must be written explicitly with \n.


FORMAT SPECIFIERS

%d decimal integer

%c single character

%s string of characters

%f floating point number

%o octal number

%x hexadecimal number

%e scientific floating point notation

%% a literal % character
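A sketch exercising each specifier once in a BEGIN block; formats is a hypothetical function name. Widths and precisions work as in C, and %c given a string prints its first character.

```shell
#!/bin/sh
# Hypothetical helper: one argument per specifier, in order.
formats() {
    awk 'BEGIN {
        printf "%-6s|%5.1f|%d|%c|%o|%x|%e|%%\n", "ab", 3.14159, 42, "hello", 8, 255, 12345.678
    }'
}
formats
```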


SPRINTF: FORMATTING TEXT

Syntax: sprintf(format-string, var1, var2, …)

Works like printf, but does not produce output; instead, it returns the formatted string.

Example:

{
    text = sprintf("1: %d - 2: %d", $1, $2)
    print text
}
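A sketch of sprintf building a zero-padded ID that is stored in a variable before printing; make_ids and the sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: upper-case code, a dash, and a 4-digit zero-padded number.
make_ids() {
    awk '{ id = sprintf("%s-%04d", toupper($1), $2); print id }'
}

printf 'ne 7\nde 42\n' | make_ids
```

Because sprintf returns a string, the result can also be concatenated or used as an array index.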


AWK built-in functions for STRING manipulation

tolower(string) returns a copy of string with each upper-case character converted to lower-case. Example: tolower("MiXeD cAsE 123") returns "mixed case 123".

toupper(string) returns a copy of string with each lower-case character converted to upper-case.

index(input_string, find_string) searches input_string for the first occurrence of find_string and returns the character position where that occurrence begins. For example:

awk 'BEGIN { print index("peanut", "an") }'

prints 3. If find_string is not found, index returns 0.
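A sketch of the three functions in a BEGIN block, reusing the slide's examples; strings_demo is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: each line prints one result, shown in the comment.
strings_demo() {
    awk 'BEGIN {
        print tolower("MiXeD cAsE 123")     # mixed case 123
        print toupper("MiXeD cAsE 123")     # MIXED CASE 123
        print index("peanut", "an")         # 3
        print index("peanut", "xyz")        # 0 (not found)
    }'
}
strings_demo
```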


AWK built-in functions for STRING manipulation

length(string) returns the number of characters in string.

substr(string, start, length) returns the substring of string that begins at position start and is at most length characters long (the first character is position 1). If length is omitted, the rest of the string is returned.

split(string, array, fieldsep) divides string into pieces separated by fieldsep, stores the pieces in array, and returns the number of pieces. If fieldsep is omitted, the value of FS is used.

Example: split("auto-da-fe", a, "-")

sets the contents of the array a as follows:

a[1] = "auto"
a[2] = "da"
a[3] = "fe"
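A sketch of length, substr and split, reusing the slide's sample strings; more_strings is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: each line prints one result, shown in the comment.
more_strings() {
    awk 'BEGIN {
        print length("peanut")              # 6
        print substr("peanut", 2, 3)        # ean (3 characters from position 2)
        print substr("peanut", 4)           # nut (rest of the string)
        n = split("auto-da-fe", a, "-")     # returns the number of pieces
        print n, a[1], a[2], a[3]           # 3 auto da fe
    }'
}
more_strings
```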


AWK built-in functions for STRING manipulation

sub(regexp, replacement_string, source) searches source for the first match of regexp and replaces it with replacement_string. If source is omitted, $0 is used.

Example:

str = "water, water, everywhere"
sub(/at/, "ith", str)

sets str to "wither, water, everywhere".

Example: awk '{ sub(/candidate/, "& and his wife"); print }' filename

changes the first occurrence of 'candidate' to 'candidate and his wife' on each input line in the file (& stands for the matched text).

gsub is similar to sub but has a 'global' effect: it replaces every match in the target string (in $0 on each line when no target is given), not just the first.
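A sketch contrasting sub and gsub on the slide's sample string, including their return value (the number of replacements made) and & in a replacement; subs_demo is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: run sub and gsub on separate copies of the same string.
subs_demo() {
    awk 'BEGIN {
        s = "water, water, everywhere"
        t = s
        n1 = sub(/at/, "ith", s)       # first match only
        n2 = gsub(/at/, "ith", t)      # every match
        print n1 ": " s
        print n2 ": " t
        u = "candidate"
        gsub(/cand/, "[&]", u)         # & is replaced by the matched text
        print u
    }'
}
subs_demo
```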


AWK EXAMPLE: LIST OF PRODUCTS

103:sway bar:49.99
101:propeller:104.99
104:fishing line:0.99
113:premium fish bait:1.00
106:cup holder:2.49
107:cooler:14.89
112:boat cover:120.00
109:transom:199.00
110:pulley:9.88
105:mirror:4.99
108:wheel:49.99
111:lock:31.00
102:trailer hitch:97.95


AWK EXAMPLE: OUTPUT

Marine Parts R Us
Main catalog
Part-id name price
======================================
101 propeller 104.99
102 trailer hitch 97.95
103 sway bar 49.99
104 fishing line 0.99
105 mirror 4.99
106 cup holder 2.49
107 cooler 14.89
108 wheel 49.99
109 transom 199.00
110 pulley 9.88
111 lock 31.00
112 boat cover 120.00
113 premium fish bait 1.00
======================================
Catalog has 13 parts


AWK EXAMPLE: COMPLETE

BEGIN {
    FS = ":"
    print "Marine Parts R Us"
    print "Main catalog"
    print "Part-id\tname\t\t\t price"
    print "======================================"
}
{
    printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3) | "sort"
    count++
}
END {
    close("sort")   # wait for sort to finish so the footer follows the sorted lines
    print "======================================"
    print "Catalog has " count " parts"
}


AWK ARRAY

awk allows one-dimensional arrays to store strings or numbers.

The index can be a number or a string.

An array need not be declared; it is created when first used.

Array elements are also created when first used, and are initialized to 0 or "".
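A sketch, assuming POSIX awk, of element creation: merely referencing a missing element creates it, while the in operator tests membership without creating anything. auto_create, a and count are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper demonstrating automatic element creation.
auto_create() {
    awk 'BEGIN {
        if (!("x" in a)) print "x absent"
        y = a["x"]                  # creates a["x"] as ""
        if ("x" in a) print "x now present, value [" a["x"] "]"
        count["apple"]++            # starts at 0, becomes 1
        print count["apple"]
    }'
}
auto_create
```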


ARRAYS IN AWK

Syntax:

arrayName[index] = value

Examples:

list[1] = "one"

list[2] = "three"

list["other"] = "oh my !"


ILLUSTRATION: ASSOCIATIVE ARRAYS

awk arrays can use a string as the index.
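A sketch of a typical use of string indexes: totalling values per key. The input lines are illustrative, and deptSales reuses the array name from the delete example. The order of for (k in array) is unspecified, so the output is piped to sort.

```shell
#!/bin/sh
# Hypothetical helper: sum field 2 per department name in field 1.
dept_totals() {
    awk '{ deptSales[$1] += $2 }
         END { for (d in deptSales) print d, deptSales[d] | "sort" }'
}

printf 'supplies 100\ntravel 250\nsupplies 40\n' | dept_totals
```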


DELETE ARRAY ENTRY

The delete statement can be used to delete an element from an array.

Format: delete array_name[index]

Example: delete deptSales["supplies"]
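A sketch of delete, using the in operator to confirm that the element is gone; the values are illustrative and delete_demo is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: remove one of two elements, then count what is left.
delete_demo() {
    awk 'BEGIN {
        deptSales["supplies"] = 140
        deptSales["travel"] = 250
        delete deptSales["supplies"]
        n = 0
        for (d in deptSales) n++
        present = ("supplies" in deptSales) ? "yes" : "no"
        print n, present
    }'
}
delete_demo
```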


AWK CONTROL STRUCTURES

Conditional: if-else

Repetition: for, while, do-while

Also: break, continue, next
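A sketch touching next, for, while with break, and do-while on illustrative input; loops_demo is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: reverse the fields of each non-comment line, then
# show while/break and do-while in END.
loops_demo() {
    awk '
    /^#/ { next }                      # next: skip comment lines
    {
        # for: rebuild the fields in reverse order
        out = ""
        for (i = NF; i >= 1; i--) {
            sep = (i > 1) ? " " : ""
            out = out $i sep
        }
        print out
    }
    END {
        # while with break: first power of 2 greater than 100
        p = 1
        while (1) {
            p *= 2
            if (p > 100) break
        }
        # do-while: the body runs once even though the test is false
        n = 0
        do { n++ } while (n < 0)
        print p, n
    }'
}

printf '# header\na b c\n1 2\n' | loops_demo
```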


IF STATEMENT

Syntax:

if (conditional expression)
    statement-1
else
    statement-2

Example: to print $2 when $6 is greater than 1200 and $3 otherwise, the test can be written with if-else:

awk '{ if ($6 > 1200)
           print $2
       else
           print $3 }' filename
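A sketch of an else-if chain; the grade cutoffs, the data and the function name grade_letters are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: assign a letter from the average of two grades.
grade_letters() {
    awk '{
        avg = ($2 + $3) / 2
        if (avg >= 90)
            letter = "A"
        else if (avg >= 80)
            letter = "B"
        else
            letter = "C"
        print $1, letter
    }'
}

printf 'john 85 97\nandrea 89 75\njasper 70 72\n' | grade_letters
```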



Awk limitations

http://docstore.mik.ua/orelly/unix/sedawk/ch10_08.htm

http://balazsdeak.blogspot.in/2010/02/solaris-awk-max-record-size-is-2559-2-8.html