
Perl scripting

Date post: 22-Feb-2016
Upload: kato
Page 1: Perl scripting

PERL SCRIPTING

Page 2: Perl scripting

COMPUTER BASICS

• CPU, RAM, hard drive
• The CPU can only use data in its registers directly.

[Diagram: CPU, RAM, hard drive]

Page 3: Perl scripting

COMPUTER LANGUAGES

• Machine language: binary code executed directly by the CPU. Usually specific to a CPU model. Fast.

• Assembly language: maps binary code to short mnemonic instructions (often three letters). Platform-dependent. Fast.

• High-level language: “human-like” syntax, often CPU-independent. Compiled into machine code before use. Fast. E.g. C, C++, Fortran, Pascal, Basic.

• Scripting language: usually not compiled into binary code. Interpreted and executed on request. Slow. E.g. Perl, PHP, Python, JavaScript, Bash script, Ruby.

• Byte-code language: source code is converted to platform-independent intermediate code for rapid compilation. E.g. Java, Microsoft .NET. Intermediate speed.

Page 4: Perl scripting

TWO ELEMENTS OF A PROGRAM

• Data structure & algorithm

• Different data structures may have corresponding, well-optimized algorithms for information processing and extraction (computer science).

• For example: inserting (algorithm) a node (data structure) into a linked list (data structure).
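
The linked-list example can be sketched in Perl as follows. This is only an illustration, assuming each node is a hash reference with hypothetical keys value and next; the function names and the "ref" prefix are made up for this sketch.

```perl
use strict;
use warnings;

# Each node is a hash reference: { value => ..., next => ... }.
sub fnMakeNode {
    my ($sValue) = @_;
    return { value => $sValue, next => undef };
}

# Insert a new node directly after $refNode; only two links change.
sub fnInsertAfter {
    my ($refNode, $sValue) = @_;
    my $refNew = fnMakeNode($sValue);
    $refNew->{next}  = $refNode->{next};
    $refNode->{next} = $refNew;
    return $refNew;
}

# Walk the list from the head and collect the values in order.
sub fnListValues {
    my ($refHead) = @_;
    my @arrValues;
    for (my $refCur = $refHead; defined $refCur; $refCur = $refCur->{next}) {
        push @arrValues, $refCur->{value};
    }
    return @arrValues;
}

my $refHead = fnMakeNode('A');
fnInsertAfter($refHead, 'C');
fnInsertAfter($refHead, 'B');   # lands between A and C
print join(' -> ', fnListValues($refHead)), "\n";   # A -> B -> C
```

The insertion touches two links no matter how long the list is, which is the kind of algorithm that this data structure makes cheap.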

Page 5: Perl scripting

BASIC TYPES

• Bit: 1 bit has 2 states, 1 or 0.

• 1 byte = 8 bits, i.e. max(1 byte) = (binary) 11111111 = 255.

• Characters in the ASCII encoding can be encoded in 1 byte. In C, the byte data type is in fact written as “char”.

• A byte is the smallest addressable unit of storage.

• A Boolean (true/false) theoretically takes only 1 bit, but in reality it takes 1 byte.

• How many Boolean states can you store using 1 byte?
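
One way to explore the question above is to treat each of the 8 bits as its own true/false flag. The sketch below uses Perl's bitwise operators; the function names are hypothetical and the flag numbering (0 to 7) is an assumption of this example.

```perl
use strict;
use warnings;

# Turn on flag number $nBit (0..7) in $nByte and return the new byte.
sub fnSetFlag {
    my ($nByte, $nBit) = @_;
    return $nByte | (1 << $nBit);
}

# Return 1 if flag number $nBit is on, 0 otherwise.
sub fnGetFlag {
    my ($nByte, $nBit) = @_;
    return ($nByte >> $nBit) & 1;
}

my $nFlags = 0;                          # one byte's worth of flags, all off
$nFlags = fnSetFlag($nFlags, 0);
$nFlags = fnSetFlag($nFlags, 7);
printf "%08b = %d\n", $nFlags, $nFlags;  # 10000001 = 129
```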

Page 6: Perl scripting

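BASIC TYPES

• Integer: 32 bit; signed -2^31 ~ +2^31 - 1; unsigned 0 ~ +2^32 - 1.

• Long integer: 64 bit.

• Float: 32 bit. 24 bits of precision for the significand (23 stored plus 1 implicit), 8 bits for the exponent, and 1 sign bit.

• Floating-point numbers can lose precision. Try this in Perl:

    print 0.6/0.2-3;

• One way to correct it is to round first:

    sub round {
        my($n) = @_;
        return 0 if $n == 0;              # abs($n*2) would be 0 below
        return int($n + $n/abs($n*2));    # add 0.5 with the sign of $n, then truncate
    }
    print round(0.6/0.2)-3;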

Page 7: Perl scripting

POINTERS / REFERENCE

• Pointers (or references in other languages) are essentially integers.

• This integer stores a memory address.

• This memory address refers to another variable.

• http://perldoc.perl.org/perlref.html
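
A short Perl sketch of the idea: a reference points at an existing variable instead of copying it. The gene names are placeholder data, and the "ref" prefix is an assumption of this example.

```perl
use strict;
use warnings;

my @arrLoci = ('BRCA1', 'TP53');
my $refLoci = \@arrLoci;            # reference to the array, not a copy

push @{$refLoci}, 'MYC';            # modifies @arrLoci through the reference
print scalar(@arrLoci), "\n";       # 3
print $refLoci->[2], "\n";          # MYC

# Printing a reference shows its type and the address it holds,
# in the form ARRAY(0x...); the address differs from run to run.
print "$refLoci\n";
```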

Page 8: Perl scripting

COMPLEX TYPES

• Set: unordered values.

• Array (vector): a set of ordered values of the same basic type.

• Index starts from 0 in most languages; last index = length - 1.

• Hash: key => value pairs. Keys must be unique. An array can be thought of as a special hash whose keys are ordered, consecutive integers.

• String *: in C, a string is simply an array of characters. In many other languages, strings are treated as a “basic type”. Most algorithms for arrays also work for strings.
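
In Perl these types look roughly like this. It is a minimal sketch with made-up data; the three codon-to-amino-acid pairs are only a fragment for illustration, not a full codon table.

```perl
use strict;
use warnings;

# Array: ordered values, indexed from 0.
my @arrCodons = ('ATG', 'GAA', 'TGA');
my $nLast = $#arrCodons;                    # last index = length - 1
print "first: $arrCodons[0], last: $arrCodons[$nLast]\n";

# Hash: unique keys mapped to values.
my %arrAminoAcid = (ATG => 'M', GAA => 'E');
$arrAminoAcid{TGA} = '*';                   # assigning to an existing key would overwrite it
print join(',', sort keys %arrAminoAcid), "\n";

# Perl strings are scalars, not arrays; split turns one into an array of characters.
my @arrLetters = split //, 'ATG';
print scalar(@arrLetters), " letters\n";
```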

Page 9: Perl scripting

COMPLEX TYPES

• Classes: object-oriented programming.

• A class packages related data of different data types, as well as the algorithms associated with them, into a nice black box for you to use.
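
A minimal sketch of a class in Perl, assuming a hypothetical Sequence class that bundles a name and a DNA string with one method. The class, its fields and the "o" prefix are invented for illustration.

```perl
use strict;
use warnings;

package Sequence;

# Constructor: bundle the data into a hash and bless it into the class.
sub new {
    my ($sClass, $sName, $sDna) = @_;
    my $refSelf = { name => $sName, dna => uc $sDna };
    return bless $refSelf, $sClass;
}

# Method: an algorithm that travels with the data.
sub GetLength {
    my ($refSelf) = @_;
    return length($refSelf->{dna});
}

package main;

my $oSeq = Sequence->new('demo', 'atggaa');
print $oSeq->GetLength(), "\n";     # 6
```

The caller only sees new() and GetLength(); how the data is stored stays inside the black box.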

Page 10: Perl scripting

PERL

• PERL lumps all “basic types” together as “Scalar”, written with “$”.

• The PERL interpreter decides the type based on what the value “looks like”.

• Convenient, but sometimes problematic, especially when you parse a user-provided data file.

• Arrays: defined with @, elements referenced with $.

• Hashes: defined with %, elements referenced with $.

• RegExp

• Handlers.

• use strict;

• PERL has an ugly grammar.

• PERL has many short-cuts, such as $_

• DO NOT USE THEM!
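
The sketch below illustrates the points about scalars and sigils. It relies on looks_like_number from the core Scalar::Util module as one possible way to check user-provided fields before doing arithmetic; the sample values are made up.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# The same scalar acts as a number or a string depending on the operator.
my $sField = '42';
print $sField + 1, "\n";      # numeric context: 43
print $sField . '1', "\n";    # string context: 421

# Check user-provided fields before treating them as numbers.
foreach my $sValue ('3.5', '1e3', 'N/A') {
    my $sKind = looks_like_number($sValue) ? 'number' : 'not a number';
    print "$sValue: $sKind\n";
}

my @arrLoci   = ('a', 'b');             # @ defines the array ...
my %arrGeneID = (a => 101);             # % defines the hash ...
print "$arrLoci[1] $arrGeneID{a}\n";    # ... but single elements use $
```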

Page 11: Perl scripting

FLOW CONTROL

• for, foreach, while, unless, until, if elsif else

• http://perldoc.perl.org/perlsyn.html#Compound-Statements
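
A small sketch that exercises several of these statements on made-up sequence lengths:

```perl
use strict;
use warnings;

my @arrLen = (3, 10, 12);
my $nDivisible = 0;

foreach my $nLen (@arrLen) {
    if ($nLen % 3 == 0) {
        $nDivisible++;
    } elsif ($nLen % 3 == 1) {
        print "$nLen has one extra base\n";
    } else {
        print "$nLen has two extra bases\n";
    }
}

# unless is "if not"; until is "while not".
print "none divisible\n" unless $nDivisible;

my $nCount = 0;
until ($nCount >= 3) {
    $nCount++;
}
print "$nDivisible divisible, counted to $nCount\n";
```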

Page 12: Perl scripting

FUNCTIONS (SUBROUTINES)

• Traditionally, “subroutines” do not accept parameters.

• “Function” is a better term, but because Perl is ugly it continues to use sub.

    sub functionname {
        my($param1, $param2) = @_; # get the parameters
        return xxxx;
    }

• Call: functionname($param1, $param2);

• I prefix all private functions with “fn”, but you don’t need to do that.

• However, capitalize the first letter of each word!

• Use verb + noun phrases as function names:

• fnGetFileName(), fnDownloadPicture().
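
As a concrete illustration, here is one possible body for fnGetFileName. The slide only gives the name, so the behavior (return the last part of a slash-separated path) and the sample path are assumptions.

```perl
use strict;
use warnings;

# Verb + noun name with the "fn" prefix; parameters arrive in @_.
sub fnGetFileName {
    my ($sPath) = @_;
    my @arrParts = split m{/}, $sPath;
    return $arrParts[-1];               # last path component
}

print fnGetFileName('/data/genomes/chr1.fa'), "\n";   # chr1.fa
```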

Page 13: Perl scripting

HOW TO NAME VARIABLES

• Variable names should reflect their basic types.

• Descriptive names should be given, with each word capitalized.

• I use the C-style prefix on them:

Type           Prefix    Example
bool           b         $bGenomeLoaded
integer        n         $nLen
float          n/f       $fAlleleFreq
string         s         $sInFile
File Handler   h         $hInFile
array          arr       $arrLoci
hash           arr       $arrGeneID
constant       ALLCAPS   MAX_LINE
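
Put together, declarations following this convention might look like the sketch below. All values are placeholders, and declaring the constant with the core constant pragma is an assumption of this example.

```perl
use strict;
use warnings;
use constant MAX_LINE => 1000;              # constant: ALLCAPS

my $bGenomeLoaded = 0;                      # bool
my $nLen          = 60;                     # integer
my $fAlleleFreq   = 0.25;                   # float
my $sInFile       = 'input.fa';             # string (hypothetical file name)
my @arrLoci       = ('locus1', 'locus2');   # array
my %arrGeneID     = (locus1 => 101);        # hash

print "too long\n" if $nLen > MAX_LINE;
print "$sInFile: ", scalar(@arrLoci), " loci, freq $fAlleleFreq\n";
```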

Page 14: Perl scripting

1. Start with the DNA sequence ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT, report its length, and check whether the length is divisible by 3. Also check whether it is a valid DNA sequence. If a check fails, do not continue.

2. Translate it into a peptide sequence using the universal codon table.

3. Display it on screen in the following format, where the DNA is on the first line and the translated amino acids on the second line align with the middle letter of each codon:

4. This DNA sequence goes through generation after generation of replication.

5. At each replication, it has a user-specified probability (0-1) of single-nucleotide mutation. This mutational probability is specified through the command line.

Page 15: Perl scripting

6. If mutation happens, 1 random letter in the DNA will be changed to A,T,C or G with equal probability. It's okay if the letter "changes" to the same letter.

7. Display at each generation the DNA and protein sequence as described in step 3, also display the generation.

8. Check whether a stop codon has occurred at each generation. If so, the protein has lost its function: stop the evolution and output the generation at which the stop codon occurs.

9. This program should be able to deal with DNA sequences in upper- or lowercase letters.
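
Parts of the exercise can be sketched as below. This is a partial sketch under stated assumptions, not a complete solution: it covers the checks in step 1, the mutation in step 6, the stop-codon test in step 8 and the case handling in step 9, and it leaves out translation and the aligned display. The function names are hypothetical, and the mutation probability is hard-coded here, whereas the exercise asks for it on the command line.

```perl
use strict;
use warnings;

# Step 1: valid if non-empty, only A/C/G/T, and length divisible by 3.
sub fnCheckDna {
    my ($sDna) = @_;
    return 0 unless $sDna =~ /^[ACGT]+$/;
    return length($sDna) % 3 == 0 ? 1 : 0;
}

# Step 6: replace one random position with a random base (may be the same base).
sub fnMutateDna {
    my ($sDna) = @_;
    my @arrBases = ('A', 'T', 'C', 'G');
    my $nPos = int(rand(length($sDna)));
    substr($sDna, $nPos, 1) = $arrBases[ int(rand(@arrBases)) ];
    return $sDna;
}

# Step 8: return 1 if any in-frame codon is a stop codon (TAA, TAG, TGA).
sub fnHasStopCodon {
    my ($sDna) = @_;
    for (my $nPos = 0; $nPos + 3 <= length($sDna); $nPos += 3) {
        return 1 if substr($sDna, $nPos, 3) =~ /^(?:TAA|TAG|TGA)$/;
    }
    return 0;
}

# Step 9: uppercase first so lowercase input is accepted.
my $sDna = uc 'ATGGAAATGGAGAGGCCTCTGCAAATGATGCCGGATTGTTTCAGACATATAGAAATGTCT';
my $fMutProb = 0.1;     # assumed value; the exercise reads it from the command line
die "Invalid DNA sequence\n" unless fnCheckDna($sDna);
print "Length: ", length($sDna), "\n";

my $nGeneration = 0;
until (fnHasStopCodon($sDna)) {
    $nGeneration++;
    $sDna = fnMutateDna($sDna) if rand() < $fMutProb;
}
print "Stop codon at generation $nGeneration\n";
```

Because the mutations are random, the reported generation differs from run to run.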

Page 16: Perl scripting

• Create a shell script called getdistr.sh:

1. Run the simulation mutation.pl 1000 times with mutational probabilities of 0.01, 0.1 and 0.5 respectively.

2. Collect all DNA and protein sequence outputs in dist_$mutationprob.log.

3. Collect the generation at which the stop codon first occurs in dist_$mutationprob.txt.

4. Use R to plot dist_0.01.txt, dist_0.1.txt and dist_0.5.txt on a histogram (each parameter in a different color). The X axis should be log10(Generation).

