1 Introduction to Matlab & Data Analysis Tutorials 8 and 9: Cell Arrays Advanced Text Processing And...

Post on 28-Dec-2015


Introduction to Matlab & Data Analysis

Tutorials 8 and 9: Cell Arrays

Advanced Text Processing And File Handling

Please change directory to E:\Matlab (cd E:\Matlab;)

From the course website (http://www.carine.co.il/htmls/page_1176.aspx?c0=13889&bsp=14333&bssearch=4,0,5,3,41,0), download t89.zip and unzip it.

Weizmann 2010 ©

2

Outline

Cell arrays
  Creating and indexing
  Useful functions for lists of strings
Structures
Advanced string manipulation
  Regular expressions
File handling
  Reading files
  Writing to files
  High-level file handling functions
Final example – P53

3

Cell Arrays – Lecture Reminder

Cell arrays are used for keeping different types of data in the same array.

For example:
A{1} = 2;
A{2} = 4:2:44;
A{3} = 'hello';

Notice the curly brackets.

Cell arrays are extremely useful for handling lists of strings.

[Diagram: a cell array with three cells holding 2, 4:2:44 and 'hello']

4

Creating Cell Arrays – Lecture Reminder

A(1) = {3};
A{2} = 3;
A{3} = 'radio blabla';
A{4} = 2:2:66;
B(1:3) = {3, [1, 2], 'abc'};
C = {'george clooney'; ...
     'richard gere'};
% Initializing an empty cell array:
D = cell(4,2);

>> A'
ans =
    [            3]
    [            3]
    'radio blabla'
    [1x33 double]

C =
    'george clooney'
    'richard gere'

D =
    []    []
    []    []
    []    []
    []    []

5

Indexing Cell Arrays

Define a cell array:
>> A(1) = {3};
>> A{2} = 3;
>> A{3} = 'radio blabla';
>> A{4} = 2:2:66;
(or load A.mat;)

What is the difference between A(1) and A{1}? Try:

>> x = A(1)
x = [3]
>> class(x)
cell

>> x = A{1}
x = 3
>> class(x)
double

>> x = A(3)
x = 'radio blabla'
>> class(x)
cell

>> x = A{3}
x = radio blabla
>> class(x)
char

[Diagram: a cell array with three cells holding 3, [1,2,7] and 'Str']

6

Manipulating Cell Arrays

Just like numerical arrays. Examples:
x([1,3,5]) = {'aaa','bbb','ccc'}
x = repmat(x,2,3)
x(:,4)
x(1:2,3:5)

% Notice: curly brackets with several indices return the contents of several cells
[a, b] = x{1:2}

The default value of an unassigned element is zero in a numerical array and [] in a cell array.
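The difference between the two kinds of brackets, and the [] default, can be checked with a small sketch. The variable names and data below are made up for illustration and are not part of the tutorial files.

```matlab
% Hypothetical example data (not from the tutorial's .mat files).
c = {3, [1 2 7], 'Str'};

p = c(2);              % parentheses: a 1x1 cell that wraps the vector
b = c{2};              % curly brackets: the vector itself
class_p = class(p);    % 'cell'
class_b = class(b);    % 'double'

% Assigning past the end grows the array; the skipped cells are filled with [].
c{6} = 'last';
gap_is_empty = isempty(c{5});

% Curly brackets with several indices produce a comma-separated list,
% which can be captured in separate output variables.
[first, second] = c{1:2};
```

Parentheses always give back a (smaller) cell array, so they are the right choice for slicing; curly brackets are the right choice when the content itself is needed.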

7

Cell Arrays Are Very Useful For Keeping Lists of Strings

Cell arrays of strings can be treated similarly to numerical arrays.
Many functions work on both numerical and cell arrays.
Many functions that work on strings can handle cell arrays.

load fruit.mat;
% fruit = {'mango','banana','melon','apple','qiwi','orange'};
% fruit_prices = [30 15 10 5 35 8];

What is the price of melon?
ind = find(strcmp(fruit,'melon'));
fruit_prices(ind)
ans = 10

Sort the fruits from cheapest to most expensive:
[sorted_p, y] = sort(fruit_prices);
fruit(y)
ans = {'apple','orange','melon','banana','mango','qiwi'}

8

Manipulating Cell Arrays That Hold Lists Of Strings

unique

intersect

setdiff

union

9

Manipulating Cell Arrays That Hold Lists Of Strings - Example

% fruit = {'mango','banana','melon','apple','qiwi','orange'};
% fruit_sales = {'mango','banana','melon', ...
%                'mango','mango','qiwi','banana','mango'};

Which fruits were not sold today?
setdiff(fruit, unique(fruit_sales))   % unique is applied first, for efficiency
ans = {'apple','orange'}
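The four set functions named above can be tried side by side on the fruit lists. This is a minimal sketch; the lists are typed in directly rather than loaded from fruit.mat, and 'grape' is a made-up extra item.

```matlab
fruit       = {'mango','banana','melon','apple','qiwi','orange'};
fruit_sales = {'mango','banana','melon','mango','mango','qiwi','banana','mango'};

sold      = unique(fruit_sales);            % each sold fruit once, sorted alphabetically
not_sold  = setdiff(fruit, sold);           % in fruit but not in sold
both      = intersect(fruit, fruit_sales);  % appears in both lists
all_names = union(fruit, {'grape'});        % everything from either list, no repeats
```

All four functions return sorted results without duplicates, so the original order of the lists is lost; when the order matters, ismember is the better tool.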

10

ismember Function Is Useful For Mapping One List To Another

Finds if an element exists in a list:
>> b = {'z','y','x','w'};
>> a = ismember('x',b)
a = 1

If it does, ismember can tell you where it is:
>> [a,map] = ismember('x',b)
a = 1, map = 3

ismember is good for mapping one list to another, when order is important!
>> [a,map] = ismember({'x','y','c'},b);
a = [1 1 0], map = [3 2 0]

11

Comparing Two Lists of Strings: ismember, find and intersect

Which function to use?

I want to find the order of elements of one list in another list: ismember
I want to find which elements of a list are also in another list: intersect
I want to find all the occurrences of an element in a list: find

When the element appears in the list more than once, ismember returns only one position: the last one in older MATLAB releases (such as the one used for this tutorial), and the first one in R2013a and later.
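A short sketch of the first and third cases, using the fruit lists typed in directly ('grape' is a made-up name that is not in the list). It deliberately avoids looking up a repeated element with ismember, since which position is returned depends on the MATLAB release.

```matlab
fruit       = {'mango','banana','melon','apple','qiwi','orange'};
fruit_sales = {'mango','banana','melon','mango','mango','qiwi','banana','mango'};

% ismember: where does each queried name sit in fruit? (0 means "not there")
[found, pos] = ismember({'melon','grape'}, fruit);

% find + strcmp: every position at which one name occurs
mango_idx = find(strcmp(fruit_sales, 'mango'));
```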

12

Using ismember - Example

>> a = ismember('banana', fruit_sales)
a = 1
>> a = ismember('orange', fruit_sales)
a = 0
>> a = ismember(fruit, fruit_sales);
a = [1 1 1 0 1 0]
% Reminder: fruit_prices = [30 15 10 5 35 8]

Example: calculate the amount of money made by each fruit sale

>> [a,b] = ismember(fruit_sales, fruit);
a = [1, 1, 1, 1, 1, 1, 1, 1]
b = [1, 2, 3, 1, 1, 5, 2, 1]

>> sales_money = fruit_kilos .* fruit_prices(b)
sales_money = [90, 30, 10, 60, 240, 17.5, 45, 150]
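The mapping step can be run end to end as a sketch. The transcript does not show the contents of fruit_kilos, so the values below are an assumption, chosen only because they reproduce the sales_money output shown on the slide. The per-fruit total with accumarray is an addition for illustration, not part of the original example.

```matlab
fruit        = {'mango','banana','melon','apple','qiwi','orange'};
fruit_prices = [30 15 10 5 35 8];
fruit_sales  = {'mango','banana','melon','mango','mango','qiwi','banana','mango'};
% ASSUMED kilos per sale (not given in the transcript).
fruit_kilos  = [3 2 1 2 8 0.5 3 5];

[found, idx] = ismember(fruit_sales, fruit);   % idx(k): position of sale k in fruit
sales_money  = fruit_kilos .* fruit_prices(idx);

% Total income per fruit; fruits with no sales get 0.
income_per_fruit = accumarray(idx(:), sales_money(:), [numel(fruit) 1]);
```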

13

Structures

14

Lecture Reminder - Structures Creation

>> dogs.name = 'rufus';
>> dogs.breed = 'Bulldog';
>> dogs.age = 1.5; % in years
>> dogs.special_food = 'none';
>> dogs
dogs =
            name: 'rufus'
           breed: 'Bulldog'
             age: 1.5000
    special_food: 'none'


15

Lecture Reminder - Structures creation

Adding more dogs…
>> dogs(2).name = 'king-kong';
>> dogs(2).breed = 'Chihuahua';
>> dogs(2).age = 5;
>> dogs(2).special_food = 'filet mignon';

>> dogs(3).name = 'wong';
>> dogs(3).breed = 'pekingese';
>> dogs(3).age = 20;
>> dogs(3).special_food = 'sushi';

>> dogs
dogs =
1x3 struct array with fields:
    name
    breed
    age
    special_food


16

Structures – Short Example

Define a "fruits" structure array that has the fields name, price and color, and contains two fruits of your choice.

Get:
  A cell array of the names
  An array of the prices
  The first fruit

>> fruits(1).name = 'Lemon';
>> fruits(1).color = 'Yellow';
>> fruits(1).price = 20;
>> fruits(2).name = 'Apple';
>> fruits(2).color = 'Green';
>> fruits(2).price = 10;

>> {fruits.name}
ans = 'Lemon' 'Apple'
>> [fruits.price]
ans = 20 10
>> a = fruits(1)
a =
     name: 'Lemon'
    color: 'Yellow'
    price: 20
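Because {fruits.name} and [fruits.price] turn a field into an ordinary cell or numeric array, the usual array tools apply to struct arrays as well. A minimal sketch, building the same two fruits with the struct constructor (one possible alternative to field-by-field assignment):

```matlab
% Cell-array arguments to struct() create one element per cell.
fruits = struct('name',  {'Lemon','Apple'}, ...
                'color', {'Yellow','Green'}, ...
                'price', {20, 10});

% Sort the whole struct array by price.
[sorted_prices, order] = sort([fruits.price]);
by_price       = fruits(order);
names_by_price = {by_price.name};

% Select elements by a string field.
green = fruits(strcmp({fruits.color}, 'Green'));
```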

17

Structure Advertisement

Although this tutorial focuses on cells:

Using Structures to aggregate variables that belong to the same entity makes the program easier to design, more readable and easier to debug.

18

Advanced Text processing (String Manipulation)

1. Review of useful functions:
   1. findstr, strfind, strtok, strtrim
   2. sprintf
2. Regular expressions

19

Review of Useful Functions For String Manipulation

So far we learned simple string manipulations:
  str2num, num2str
  strcmp, strncmp, strcmpi, strncmpi

More advanced string manipulation functions (used in text processing):
  findstr, strfind
  strtok
  strtrim
  sprintf (related functions: fprintf, sscanf)

20

Finding One String Inside Another - findstr and strfind

findstr(str1,str2) searches the longer of the two input strings for any occurrences of the shorter string (input order does not matter!):

>> k = findstr('beauty is in the eyes of the beholder','be')
k = [1, 30]

strfind(str1,str2): the order matters, it finds str2 inside str1. str1 can be a cell array of strings!

Note: current MATLAB documentation marks findstr as not recommended and points to strfind instead.

23

Example – Parsing a Line Using strtok

Consider the line 'this is an example'. How do we write a program that breaks it into a cell array of single words?

rem = 'this is an example';   % note: this variable shadows the built-in rem function
words = cell(0);
while 1
    [tok, rem] = strtok(rem);   % tok: first word, rem: the rest of the line
    if isempty(tok)
        break;
    end
    words{end+1} = tok;
end

words'
ans =
    'this'
    'is'
    'an'
    'example'
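strtok also accepts a set of delimiter characters as a second argument, which makes the same loop work for lines that are not separated by spaces. A small sketch with made-up input; the variable is named rest here to avoid shadowing the built-in rem function.

```matlab
rest  = 'mango,banana;melon';   % hypothetical input line
parts = {};
while ~isempty(rest)
    % Leading delimiters are skipped; tok ends at the next ',' or ';'.
    [tok, rest] = strtok(rest, ',;');
    if ~isempty(tok)
        parts{end+1} = tok;
    end
end
```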

25

sprintf – Write Formatted Data Into Strings

sprintf(format,…) writes formatted data into a string.
Good for creating messages for disp.
Related functions: fprintf, sscanf

Format special characters:
  %s – a string
  %d – an integer
  %f – a floating-point number

load fruit.mat;
for i=1:length(fruit)
    s = sprintf('Fruit number %d: %s', i, fruit{i});   % %d takes the number, %s the string
    disp(s);
end

Fruit number 1: mango
Fruit number 2: banana
Fruit number 3: melon
Fruit number 4: apple
Fruit number 5: qiwi
Fruit number 6: orange

26

sprintf - Example

Consider the cell array
names = {'Danny', 'Noa', 'Moti'};

Write a script that prints:
Number:1, Name:Danny.
Number:2, Name:Noa.
Number:3, Name:Moti.

Answer:
for i=1:length(names)
    s = sprintf('Number:%d, Name:%s.', ...
                i, names{i});
    disp(s);
end

See also: sscanf & fprintf
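The two related functions can be sketched next to sprintf: sscanf converts text back into numbers, and fprintf without a file identifier prints straight to the command window. The numeric inputs below are made up for illustration.

```matlab
names = {'Danny', 'Noa', 'Moti'};
lines = cell(size(names));
for i = 1:length(names)
    lines{i} = sprintf('Number:%d, Name:%s.', i, names{i});
end

% %f with a precision: two digits after the decimal point.
price_str = sprintf('%.2f', 17.5);

% sscanf goes the other way: text to numbers (returned as a column vector).
vals = sscanf('3 4.5 10', '%f');

% fprintf reuses the format for every remaining argument, one line per name.
fprintf('%s\n', lines{:});
```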

27

More Useful String Manipulation Functions

strtrim(str) – removes all leading and trailing white-space
>> strtrim('  do not blink  ')
ans = 'do not blink'

strtok(str,delim) – breaks a string into "tokens"
>> [tok,rem] = strtok('this is an example', ' ')
tok = 'this'
rem = ' is an example'

strfind(str1,str2) – searches for str2 in str1. str1 can be a cell array of strings!
>> k = strfind('beauty is in the eyes of the beholder','be')
k = [1, 30]

findstr(str1,str2) – searches the longer of the two input strings for any occurrences of the shorter string

More useful functions at:
Help -> Matlab -> Functions by category -> Strings functions

28

Regular expressions

29

Regular Expression - Definition

Wikipedia – Regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters.

ind = regexp(long_str,'\w+ain')

We need to learn the syntax of the regular expressions "language".

30

Regular Expressions Syntax

Defining a pattern: [] is like OR
  Any character out of a, b, c or d: [abcd]
  Anything other than a, b, c or d: [^abcd]
  Character range (all characters a to z): [a-z]

Special characters used in defining a pattern:
  Any character: .
  Whitespace: \s
  Newline: \n
  Tab: \t
  Any alphanumeric character or underscore: \w (same as [a-zA-Z_0-9])
  Any digit: \d (same as [0-9])

31

Regular Expressions Syntax

Pattern definition - expression quantifiers:
  One or more: exp+ (Example: '[\w]+')
  Zero or more: exp*
  Between n and m times: exp{n,m}

Examples:
  '\w\s+\w' – two alphanumeric characters with one or more whitespace characters between them
  '[SRM]amy' – Ramy, Samy or Mamy

Function: loc = regexp(str, pattern)

Read more about "regular expressions" in the MATLAB help! (search "regular expressions")
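The character classes and quantifiers above can be tried on a short test string. The sentence below is made up for illustration; the 'match' option returns the matched text instead of the start positions.

```matlab
str = 'Ramy met Samy and Tamy at 10:30 on 2010-05-17';   % hypothetical text

names  = regexp(str, '[SRM]amy', 'match');       % 'Tamy' is skipped: T is not in the set
digits = regexp(str, '\d+', 'match');            % every run of one or more digits
year   = regexp(str, '\d{4}', 'match');          % exactly four digits in a row
not_sp = regexp('a b  c', '[^\s]+', 'match');    % runs of anything except whitespace
```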

32

Using Regular Expressions to Search For Pattern Occurrences In a Long String

Example:

prof_higgins = 'The rain in Spain falls mainly on the plain.';

We would like to find all the words that rhyme with 'ain'.

1. Defining the pattern:
   one or more alphanumeric characters, then 'ain', then the end of the word (whitespace or a period)
   pattern = '\w+ain[\s\.]' OR pattern = '[a-zA-Z]+ain[\s\.]'

33

Using Regular Expressions to Search For Pattern Occurrences In a Long String

>> prof_higgins = ...
   'The rain in Spain falls mainly on the plain.';

Find occurrences indices:
>> loc = regexp(prof_higgins,'\w+ain')
loc = [5 13 25 39]

Get pattern occurrences:
>> words = regexp(prof_higgins,'\w+ain','match')
words = {'rain','Spain','main','plain'}

34

Using Regular Expressions to Replace Pattern Occurrences In a Long String

Replace all pattern occurrences:
>> eliza_doolittle = regexprep(prof_higgins,'ain','yne')
eliza_doolittle = 'The ryne in Spyne falls mynely on the plyne.'

Split a line into words (good for parsing lines of an input file):
>> words = regexp(prof_higgins, '\s', 'split');
words = {'The','rain','in','Spain','falls','mainly','on','the','plain.'}

35

Using Regular Expression to Parse a line (see strtok for another option)

no_rhymes = regexp(prof_higgins, 'ain\w*\s', 'split')
no_rhymes =
    {'The r' 'in Sp' 'falls m' 'on the plain.'}

Error: 'plain.' was not split, because the last word is followed by a period rather than whitespace.

Fixing it:

no_rhymes = regexp(prof_higgins, '\w+ain[\s\.]', 'split')
no_rhymes =
    {'The ' 'in ' 'falls mainly on the ' ''}

36

Running Example – Finding Bomb Threats

You are a CIA agent in charge of identifying potential bombing threats to cities by going over emails of terrorists.

37

Using Regular Expression to Identify Significant Lines

Assume an email is stored as a cell array of strings (each line in a cell), called "email".

Using a regular expression, identify lines that contain the expression "bomb". When you find such a line, print "HELP!!!".

load email.mat;
for i=1:length(email)
    line = email{i};
    if (~isempty(regexp(line,'bomb')))
        disp('HELP!!!');
    end
end

38

Using Regular Expression to Identify Significant Lines

Notice there is a "bug" in the code:

load email.mat;
for i=1:length(email)
    line = email{i};
    if (~isempty(regexp(line,'bomb')))
        disp(['HELP!!!:' line]);
    end
end

HELP!!!:thinking of bombing rehovot
HELP!!!:thinking of bombing sderot
HELP!!!:thinking of going to the bombamella festival next week

How do we fix the bug?
Hint: | is or, inside parentheses: 'smil(e|ed|ing)'

39

Using Regular Expression to Identify Significant Lines

Here is a fix for the bug:

load email.mat;
for i=1:length(email)
    line = email{i};
    % | means "or" inside parentheses; inside [] it is just another character
    if (~isempty(regexp(line,'[Bb]omb(ed|ing|s)?\s')))
        disp(['HELP!!!:' line]);
    end
end

HELP!!!:thinking of bombing rehovot
HELP!!!:thinking of bombing sderot

| is or
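Square brackets and parentheses treat | differently: '[ed|ing|s]' is a set of single characters (e, d, |, i, n, g, s), while '(ed|ing|s)' is a choice between whole suffixes. The sketch below compares the two on a few made-up test words; 'bombdig' is not a real word and is there only to expose the difference.

```matlab
words = {'bomb', 'bombs', 'bombed', 'bombing', 'bombamella', 'bombdig'};

class_pat = '^[Bb]omb[ed|ing|s]*$';   % any mix of the characters e d | i n g s
group_pat = '^[Bb]omb(ed|ing|s)?$';   % one of ed / ing / s, or nothing

% With a cell array input, regexp returns one result per string;
% an empty result means "no match".
class_hit = ~cellfun('isempty', regexp(words, class_pat, 'once'));
group_hit = ~cellfun('isempty', regexp(words, group_pat, 'once'));
```

Both patterns reject 'bombamella', but only the grouped version also rejects 'bombdig', because d, i and g all belong to the character set.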

40

Regular Expression Tokens Are Used to Retrieve Specific Part of the Pattern Occurrences

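tokens = regexp( ...
    'bla bla ami@weizmann.ac.il bli bli tami@tau.ac.il ya', ...
    '(\w+)@(\w+)\.ac\.il', 'tokens')

In the pattern, the first pair of parentheses is token 1 and the second pair is token 2.

tokens =
    { {'ami', 'weizmann'} {'tami', 'tau'} }

Each inner cell is one occurrence: tokens{1} is occurrence 1 and tokens{2} is occurrence 2, and each holds token 1 and token 2.

tokens{1}{1} = 'ami'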
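The nested result can be unpacked with a short loop. A minimal sketch on the same input string; the variable names users and sites are chosen here for illustration.

```matlab
str = 'bla bla ami@weizmann.ac.il bli bli tami@tau.ac.il ya';
pat = '(\w+)@(\w+)\.ac\.il';

tokens = regexp(str, pat, 'tokens');   % one inner cell per occurrence

users = cell(1, numel(tokens));
sites = cell(1, numel(tokens));
for k = 1:numel(tokens)
    users{k} = tokens{k}{1};   % token 1 of occurrence k
    sites{k} = tokens{k}{2};   % token 2 of occurrence k
end

% With 'once', only the first occurrence is returned, without the outer cell.
first = regexp(str, pat, 'tokens', 'once');
```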

41

Using Tokens to Retrieve Specific Parts of the Pattern Occurrences

Now that you identified the suspicious email, take out the threatened city.
Hint: use regexp(line, <some expression>, 'tokens').

for i=1:length(email)
    line = email{i};
    % (?:...) groups the alternatives without creating a token,
    % so the city remains token 1
    city = regexp(line, ...
        '[Bb]omb(?:ed|ing|s)?\s+(\w+)', ...
        'tokens');
    if (~isempty(city))
        disp(['HELP!!! Bomb threat on:' city{1}{1}]);
    end
end

HELP!!! Bomb threat on:rehovot
HELP!!! Bomb threat on:sderot

42

Using Tokens to Retrieve Specific Parts of the Pattern Occurrences

Here is a loop-less version (regexp can handle a cell array):

load email.mat;
cities = regexp(email, '[Bb]omb(?:ed|ing|s)?\s+(\w+)', 'tokens');
is_threat = ~cellfun('isempty', cities);
cities = cities(is_threat);
cities = [cities{:}];   % one cell per occurrence
cities = [cities{:}];   % one string per city
warnings = strcat('HELP!!! Bomb threat on:', cities)
disp(strvcat(warnings))

HELP!!! Bomb threat on:rehovot
HELP!!! Bomb threat on:sderot

43

Handling Files

44

Lecture Reminder –Opening and Closing Files

Opening a file for reading:
fid = fopen('filename','r');

Opening a file for writing:
fid = fopen('filename','w');

fid is a scalar MATLAB integer, called a file identifier. You use the fid as the first argument to other file input/output routines.

Always close your file!!!
fclose(fid);

Permissions:
  'a' – append
  'r+' – read and write
  More in the HELP…

45

Lecture Reminder –Reading a File Line by Line

Reading line by line:
line = fgetl(fid);

How can we read the entire file?

fid = fopen('names.txt');       % open
while feof(fid)==0              % feof – has the file reached its end?
    tline = fgetl(fid);         % fgetl – "file get line"
    if ~ischar(tline)           % break if not char
        break;
    end
    tline = strtrim(tline);
    % <do whatever you want>
end
fclose(fid);                    % close

46

Lecture Reminder – Writing to a File

Open the file with writing permission.
Write, line by line, using:
fprintf(fid,format,…); % similar to sprintf!!!
Format is a string with special characters:
  %s – a string, %d – an integer, %f – a floating-point number
Close the file.

Example:
fid = fopen('tmp.txt', 'w');
for i=1:length(lines)
    fprintf(fid,'this is a line: %s\n',lines{i});
end
fclose(fid);

47

File handling - Example

Open the file names.txt for reading.
Display it with line numbers:
Line number 1: <line1>
Line number 2: <line2>
…
Close the file.

fid = fopen('names.txt', 'r');
l_cnt = 0;
while feof(fid)==0
    line = fgetl(fid);
    if ~ischar(line)
        break;
    end
    l_cnt = l_cnt + 1;
    disp(['Line number ' num2str(l_cnt) ':' line]);
end
fclose(fid);
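Writing and reading can be combined into one round trip that does not depend on names.txt being present. This is a sketch under assumptions: the file is created under a temporary name from tempname, and the three names are made-up content.

```matlab
fname     = [tempname '.txt'];          % temporary file, not part of the tutorial
lines_out = {'Danny', 'Noa', 'Moti'};   % hypothetical content

% Write one name per line.
fid = fopen(fname, 'w');
if fid < 0
    error(['Could not open ' fname ' for writing']);
end
for i = 1:length(lines_out)
    fprintf(fid, '%s\n', lines_out{i});
end
fclose(fid);

% Read the file back, line by line.
fid = fopen(fname, 'r');
if fid < 0
    error(['Could not open ' fname ' for reading']);
end
lines_in = {};
while true
    tline = fgetl(fid);
    if ~ischar(tline)       % fgetl returns -1 at the end of the file
        break;
    end
    lines_in{end+1} = strtrim(tline);
end
fclose(fid);
delete(fname);              % clean up the temporary file
```

Checking the value returned by fopen before using it gives a clear error message instead of a failure inside fprintf or fgetl.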

48

File Handling - Example

Congratulations! You were just promoted to a senior spy.
You have a directory full of email text files.
Now you need to read all the email files, identify the bomb threats, and write them into a summary file, threat_report.txt.

49

File Handling - Example

Solution strategy:
1. Open the output threats file
2. Go over all the emails in a given directory:
   1. Open an input email file
   2. Read it, line by line
   3. Identify threats. When a threat is identified, print the line
   4. Close the input email file
3. Close the output threats file

50

File Handling – Example: Programs Design

searchEmailsDirForThreats
  Open the report output file
  Open a directory and get all the file names
  For each file, run searchEmailForThreats

searchEmailForThreats (inputs: 1. email file name, 2. report output fid)
  Open the email input file
  Search line by line for a threat
  If a threat is found, write it to the output file

51

File Handling – Example:Main Function Design

function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all files names>
% <opening report output file>
% <going over the files>
% <closing report output file>

52

File Handling – Example:Main Function Design

function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all files names>
if (~isdir(in_emails_dir))
    error([in_emails_dir ' is not a directory']);
end
% getting file names
fs = dir(in_emails_dir);
file_names = {fs.name};

Directory management:

dir, pwd, cd, copyfile, delete, movefile, mkdir, rmdir, …
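The listing returned by dir also contains '.', '..' and any sub-directories, and each entry carries an isdir flag that can be used to drop them. A self-contained sketch, assuming a scratch directory created under a temporary name; the file and folder names are made up.

```matlab
d = tempname;                 % scratch directory, not part of the tutorial
mkdir(d);
fclose(fopen(fullfile(d, 'a.txt'), 'w'));   % create two empty files
fclose(fopen(fullfile(d, 'b.txt'), 'w'));
mkdir(fullfile(d, 'sub'));                  % and one sub-directory

fs = dir(d);                  % includes '.', '..' and 'sub'
fs = fs(~[fs.isdir]);         % keep regular files only
file_names = sort({fs.name});

% fullfile builds a path with the separator of the current platform.
first_path = fullfile(d, file_names{1});

rmdir(d, 's');                % remove the scratch directory and its contents
```

Filtering on the isdir field avoids calling isdir on a bare file name, which would be resolved relative to the current directory rather than the listed one.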

53

File Handling – Example:Main Function Design

function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all files names>
% <opening report output file>
out_report_fid = fopen(out_report_fname, 'w');
if out_report_fid < 0
    error(['File ', out_report_fname, ' could not be opened']);
end
threats_found = 0;

54

File Handling – Example:Main Function Design

function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <going over the files>
for i=1:length(file_names)
    email_fname = fullfile(in_emails_dir, file_names{i});
    % test the full path, so that '.', '..' and sub-directories are skipped
    % regardless of the current directory
    if (~isdir(email_fname))
        threats_found = threats_found + ...
            searchEmailForThreats(out_report_fid, email_fname);
    end
end
% <closing report output file>
fclose(out_report_fid);

55

File Handling – Example:Looking for Threats In an Email

function threats_found = searchEmailForThreats(out_report_fid, email_fname)
% <opening email input file>
% <going over the file line by line>
while feof(in_fid) == 0
    % <read line>
    if % <is found threat>
        % <get the threatened city>
        % <adding to the report>
    end
end
% <closing input file>

56

File Handling – Example:Opening File For Read

function threats_found = searchEmailForThreats(out_report_fid, email_fname)
% <opening email input file>
in_fid = fopen(email_fname, 'rt');
if in_fid < 0
    error(['File ', email_fname, ' was not found.']);
end
threats_found = 0;
l_cnt = 0;

57

File Handling – Example:Reading a File Line by Line

function threats_found = searchEmailForThreats(out_report_fid, email_fname)
% <opening email input file>
% <going over the file line by line>
while feof(in_fid) == 0
    % <read line>
    line = fgetl(in_fid);
    if ~ischar(line)
        break;
    end
    l_cnt = l_cnt + 1;
    line = strtrim(line);
    if % <is found threat>
        …
    end
end

58

File Handling – Example: Using Regular Expressions to Find and Retrieve Pattern Occurrences

while feof(in_fid) == 0
    % <read line>
    % <is found threat>
    if (~isempty(regexp(line,'.*bomb.*')))
        city = regexp(line, '.*bomb\w*\s([\w-]+).*', 'tokens');
        % <adding to the report>
        fprintf(out_report_fid,'File: %s, Line number:%d, Threat on %s - %s\n', ...
            email_fname, l_cnt, city{1}{1}, line);
        threats_found = threats_found + 1;
    end
end

59

File Handling – Example:Looking For Threats in an Email

function threats_found = searchEmailForThreats(out_report_fid, email_fname)
% <opening email input file>
% <going over the file line by line>
while feof(in_fid) == 0
    % <read line>
    if % <is found threat>
        % <get the threatened city>
        % <adding to the report>
    end
end
% <closing input file>
fclose(in_fid);

60

High-Level File Handling Functions

61

Matlab Has a Collection of High-Level Read / Write Functions

These functions can save the need to read or write the file line by line.

Examples:
  dlmread, dlmwrite
  textread, textscan
  xlsread
  importdata
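As a sketch of what a high-level reader saves, textscan can load a whole two-column text file in a single call. The file below is created on the fly under a temporary name, and its contents are made up for illustration.

```matlab
fname = [tempname '.txt'];    % temporary file, not part of the tutorial
fid = fopen(fname, 'w');
fprintf(fid, 'mango 30\nbanana 15\nmelon 10\n');
fclose(fid);

fid  = fopen(fname, 'r');
cols = textscan(fid, '%s %f');   % one output cell per format specifier
fclose(fid);
delete(fname);

names  = cols{1};   % 3x1 cell array of strings
prices = cols{2};   % 3x1 numeric column
```

The same file read with fgetl would need a loop plus a strtok or sscanf call per line.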

62

High-level File Reading Function Example- textread

Reading an entire text file in one line:
lines = textread(filename,format,parameters)

Example: when reading a file containing a single word in every line:
names = textread('names.txt','%s');

If there are more words in a line, each word will be read separately.

Example 1:
email = textread('email.txt','%s');
What happens?

email = {'thinking' 'of' 'bombing' 'rehovot' 'thinking' …}

63

High-level File Reading Function Example- textread

Example 2: Reading a text file, line by line. Try:

email = textread('email.txt', '%s', 'delimiter', '\n');

What happens?

email = {'thinking of bombing rehovot'
         'thinking of bombing sderot'
         'thinking of going to the bombamella festival next week'}

64

MATLAB functions for High-level file reading

Reading an entire Excel file in one line:

[nums,t] = xlsread(filename,options…)
Will create a numerical array nums and a cell array t.

Try:
[n,t] = xlsread('rt_example3.xls')
What happens?
  Textual cells are set to NaN in n
  Numerical cells are set to '' (empty strings) in t

Note: each sheet can be read separately (read the HELP)

65

MATLAB functions for High-level file reading

Reading an entire Excel / tab-delimited text file / other preformatted file:

A = importdata(filename,options…)
Will create a structure A, which contains:
  A.data - a numerical array
  A.textdata - a cell array

Try:
A = importdata('rt_example3.xls')
What happens?

66

Summary – File Handling

Matlab has diverse and powerful functions for text processing.

Before you start coding using low-level I/O functions, check if one of the high-level functions solves the problem.

67

Final example: Looking for p53 TFBS (Transcription Factor Binding Sites) in human promoters

68

Looking for p53 TFBS in human promoters

A TF can recognize a variable site:
  Some positions are fixed
  Some are optional, e.g. A/T are acceptable, but not G/C
Consensus sequence: the pattern representing all possible recognized sites.

69

Looking for p53 TFBS in human promoters

Let's define a consensus for a p53 half-site:
1. Pos #1: G/A/T
2. Pos #2: G/A
3. Pos #3: A/G/C
4. Pos #4: C
5. Pos #5: A/T
6. Pos #6: A/T
7. Pos #7: G
8. Pos #8: N
9. Pos #9: T/C/G
10. Pos #10: T/C

A full site: half-site, variable spacer of 0-13 bases, half-site
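The consensus table translates position by position into a regular expression: each set of allowed bases becomes a character class, N becomes '.', and the spacer becomes '.{0,13}'. The sketch below checks the pattern on a short synthetic sequence that was made up for this purpose and is not a real promoter.

```matlab
half          = '[GAT][GA][AGC]C[AT][AT]G.[TCG][TC]';   % one half-site
p53_consensus = [half '.{0,13}' half];                  % half-site, spacer, half-site

% Synthetic test sequence: two half-sites separated by a 3-base spacer,
% padded with T's on both sides.
site = ['GGACATGCCC' 'AAA' 'AGGCTTGTCT'];
seq  = ['TTTTT' site 'TTTTT'];

[m, s, e] = regexp(seq, p53_consensus, 'match', 'start', 'end');
```

Building the full pattern from the half-site string keeps the two halves identical by construction, which is easier to maintain than typing the ten classes twice.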

70

Looking for p53 TFBS in human promoters

How do we even start???
1. Read the promoter file into a cell array.
2. Go through the promoters:
   Look for the p53 consensus (need to define it as a regular expression)
   When we find it, store the data on the hit
3. Open a result file
4. Go through all the hits you found:
   Print them into the results file

71

Looking for p53 TFBS in human promoters

1. Reading the promoter file:

The file name: masked_promoters.some.txt
The file format: FASTA
>gene1 header line
Sequence…
Sequence…
>gene2 header line
Sequence…
Sequence…

>GENE=ENSG00000001036 Transcript=1 LLid=2519 orgDBsym=FUCA2 other details…
CCATGTTCTAAACGACTTCATAGATTTATTTCTTTCAGTCAT…

72

Looking for p53 TFBS in human promoters

1. Reading the promoter file:
promoters = {};
ensID = {};
symb = {};

fid = fopen('masked_promoters.all.txt');
while feof(fid)==0
    tline = fgetl(fid);
    % <process the data>
end
fclose(fid);

73

Looking for p53 TFBS in human promoters

1. Reading the promoter file:
while feof(fid)==0
    % <from previous slide…>
    if (tline(1)=='>')   % it is a header
        tmp = regexp(tline, ...
            '.*GENE=(\w+)\s.*orgDBsym=(\w+)', ...
            'tokens');
        ensID{end+1} = tmp{1}{1};
        symb{end+1} = tmp{1}{2};
    else                 % it is a promoter
        promoters{end+1} = tline;
    end
end

74

Looking for p53 TFBS in human promoters

2. Go through the promoters:

hit_seq = {};
hit_gene = [];
hit_pos = [];
% a quoted string cannot continue across lines, so the two parts are concatenated
p53_consensus = ['[GAT][GA][AGC]C[AT][AT]G.[TCG][TC].{0,13}' ...
                 '[GAT][GA][AGC]C[AT][AT]G.[TCG][TC]'];

for i=1:length(promoters)
    [m, s, e] = regexp(promoters{i}, p53_consensus, 'match', ...
        'start', 'end');
    % let's ignore that DNA is double stranded…
    if (~isempty(m))
        hit_seq(end+1:end+length(m)) = m;
        hit_gene(end+1:end+length(m)) = repmat(i,1,length(m));
        hit_pos(end+1:end+length(m)) = s;
    end
end

75

Looking for p53 TFBS in human promoters

3&4. Open a result file, print all the hits

fid = fopen('p53_TFBS.txt','w');
% printing a header line
fprintf(fid,'gene ID\tgene name\tsite\tpos\n');
for i=1:length(hit_gene)
    fprintf(fid,'%s\t%s\t%s\t%d\n', ...
        ensID{hit_gene(i)}, ...
        symb{hit_gene(i)}, ...
        hit_seq{i}, ...
        hit_pos(i));
end
fclose(fid);