Programming for Evolutionary Biology
March 17th - April 1st 2012
Leipzig, Germany
Introduction to Unix systems
Extra: awk and gawk
Giovanni Marco Dall'Olio
Universitat Pompeu Fabra
Barcelona (Spain)
awk
“awk” is a “Swiss army knife” command line tool to manipulate tabular files
Things you can do with awk:
Extract all the lines of a file that match a pattern, and print only some of the columns (instead of “grep | cut”)
Add a prefix/suffix to all the elements of a column (instead of “cut | paste”)
Sum the values of different columns
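As a rough illustration of the first item, the sketch below compares a “grep | cut” pipeline with a single awk call. The three-column, tab-separated data and the variable names are made up for this example and are not part of the course material.

```shell
# Hypothetical tab-separated data: chromosome, position, sequence.
data=$(printf 'chr1\t100\tAAC\nchr1\t200\tGGT\nchr2\t300\tAAC\n')

# grep selects the lines, cut keeps columns 1 and 2.
with_grep_cut=$(printf '%s\n' "$data" | grep 'AAC' | cut -f 1,2)

# The same selection and column extraction in one awk call.
# -F sets the input separator, OFS the separator used by print.
with_awk=$(printf '%s\n' "$data" | awk -F '\t' -v OFS='\t' '/AAC/ {print $1, $2}')

printf '%s\n' "$with_awk"
```

Both variables should end up holding the same two lines, the first two columns of the rows that contain AAC.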
awk and gawk
In these slides we will be talking about awk.
In practice, the original awk is rarely what is installed on modern Linux systems.
We will use gawk, a free implementation of awk developed by the GNU project.
Basic awk usage
“awk '<pattern to select lines> {instructions to be executed on each selected line}' <file>”
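The pattern and the instructions are each optional, which the following sketch tries to show on a few made-up lines (the fruit data and variable names are only illustrative).

```shell
# Hypothetical two-column input.
lines=$(printf 'apple 3\nbanana 7\ncherry 12\n')

# Pattern and action: for lines containing "an", print the first column.
both=$(printf '%s\n' "$lines" | awk '/an/ {print $1}')

# Pattern only: the default action is to print the whole line.
pattern_only=$(printf '%s\n' "$lines" | awk '/an/')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$lines" | awk '{print $1}')

printf '%s\n' "$both" "$pattern_only" "$action_only"
```

Leaving out the action makes awk behave much like grep, while leaving out the pattern makes it behave much like cut.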
Example awk usage
“awk '$0 ~ /AAC/ {print}' sample_vcf.vcf”
$0 ~ /AAC/ → select all the lines that contain AAC (the slashes mark AAC as a regular expression)
{print} → for each line that matches the previous expression, print it
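A detail that is easy to miss is how the text to match is written. The sketch below, run on two made-up lines rather than the course's sample_vcf.vcf, suggests why the slashes (or double quotes) around AAC matter.

```shell
# Hypothetical VCF-like lines, tab-separated.
vcf_like=$(printf 'chr1\t100\tAAC\nchr1\t200\tGGT\n')

# /AAC/ is a regular expression literal: only lines containing AAC match.
with_slashes=$(printf '%s\n' "$vcf_like" | awk '$0 ~ /AAC/ {print}')

# "AAC" in double quotes is a string used as a regular expression: same result.
with_quotes=$(printf '%s\n' "$vcf_like" | awk '$0 ~ "AAC" {print}')

# A bare AAC is an uninitialized variable, i.e. the empty string.
# The empty pattern matches every line, so nothing is filtered out.
bare_word=$(printf '%s\n' "$vcf_like" | awk '$0 ~ AAC {print}')

printf '%s\n' "$with_slashes"
```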
Column names in awk
awk assumes that you are working on tabular files, with columns separated by spaces or tabs
Each column of the file can be accessed by $<column number>. For example, $2 is the second column of the file
$0 refers to the whole line, that is, all the columns of the file
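A minimal sketch of these column references on a single made-up line. NF, the built-in variable holding the number of columns on the current line, is not covered in the slides and is included here only as an extra.

```shell
# Hypothetical space-separated line with three columns.
line='chr1 100 AAC'

second=$(printf '%s\n' "$line" | awk '{print $2}')   # second column
whole=$(printf '%s\n' "$line" | awk '{print $0}')    # the entire line
count=$(printf '%s\n' "$line" | awk '{print NF}')    # number of columns
last=$(printf '%s\n' "$line" | awk '{print $NF}')    # last column

printf '%s\n' "$second" "$whole" "$count" "$last"
```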
Accessing columns in awk
“awk '{print $1, $2, $3}' sample_vcf.vcf” → prints the first three columns
“awk '{print $0}' sample_vcf.vcf” → prints all the columns
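VCF files are tab-separated, and a column may itself contain spaces. By default awk splits on any run of spaces or tabs, so it can help to set the separator explicitly with the -F option. The row below is invented for illustration.

```shell
# Hypothetical tab-separated row whose third column contains a space.
row=$(printf 'chr1\t100\tsample one\n')

# Default splitting: "sample one" is counted as two columns.
default_split=$(printf '%s\n' "$row" | awk '{print NF}')      # 4

# With -F '\t' only tabs separate columns.
tab_split=$(printf '%s\n' "$row" | awk -F '\t' '{print NF}')  # 3
third=$(printf '%s\n' "$row" | awk -F '\t' '{print $3}')      # sample one

printf '%s\n' "$default_split" "$tab_split" "$third"
```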
Adding a prefix to a column with awk
A common awk usage is to add a prefix or suffix to all the entries of a column
Example: awk '{print "my_prefix" $2}' myfile.txt
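The sketch below shows a prefix, a suffix, and a variant that keeps the other columns, using made-up identifiers. Note that awk string literals need straight double quotes, and that writing a string next to a field joins them without any separator.

```shell
# Hypothetical two-column input.
ids=$(printf 'chr1 rs123\nchr2 rs456\n')

# Prefix: the string and the field are concatenated.
prefixed=$(printf '%s\n' "$ids" | awk '{print "my_prefix_" $2}')

# Suffix: same idea, with the string after the field.
suffixed=$(printf '%s\n' "$ids" | awk '{print $2 "_my_suffix"}')

# Modify the second column and print the whole (rebuilt) line.
in_place=$(printf '%s\n' "$ids" | awk '{$2 = "my_prefix_" $2; print}')

printf '%s\n' "$prefixed" "$suffixed" "$in_place"
```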
Summing columns in awk
If two columns contain numeric values, we can use awk to sum them
Usage: “awk '{print $1 + $2}' myfile.txt”
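A small sketch of the line-by-line sum on invented numbers. The second command goes slightly beyond the slide: it uses an END block, which awk runs once after the last line, to total a single column.

```shell
# Hypothetical numeric input with two columns.
numbers=$(printf '1 2\n10 20\n100 200\n')

# Sum of the two columns, line by line.
row_sums=$(printf '%s\n' "$numbers" | awk '{print $1 + $2}')

# Total of the first column, printed once at the end.
column_total=$(printf '%s\n' "$numbers" | awk '{total += $1} END {print total}')

printf '%s\n' "$row_sums" "$column_total"
```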
Selecting columns with awk
awk can be used to select lines according to the content of a given column
It is like grep, but more powerful, because it lets you specify in which column the match must occur
This example will print all the lines that have AAC in their first column:
“awk '$1 ~ /AAC/ {print}' myfile.txt”
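To make the difference from a whole-line match concrete, here is a sketch on three made-up lines. The last command uses == for an exact comparison, an addition not shown in the slides.

```shell
# Hypothetical two-column input.
table=$(printf 'AAC GGT\nGGT AAC\nAACT CCG\n')

# Whole-line match: AAC anywhere on the line, as grep would do.
any_column=$(printf '%s\n' "$table" | awk '$0 ~ /AAC/ {print}')

# Column match: AAC must appear somewhere in the first column.
first_column=$(printf '%s\n' "$table" | awk '$1 ~ /AAC/ {print}')

# Exact comparison: the first column must be exactly AAC.
exact=$(printf '%s\n' "$table" | awk '$1 == "AAC" {print}')

printf '%s\n' "$first_column"
```

The whole-line match keeps all three lines, the column match drops “GGT AAC”, and the exact comparison keeps only “AAC GGT”.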
More on awk
awk is a complete programming language
It is the equivalent of a spreadsheet for the command line
If you want to know more, check the book “GAWK: Effective AWK Programming” at http://www.gnu.org/software/gawk/manual