Insight Through Computing
15. Strings
Operations
Subscripting
Concatenation
Search
Numeric-String Conversions
Built-Ins: int2str, num2str, str2double
Previous Dealings
N = input('Enter Degree: ')
title('The Sine Function')
disp( sprintf('N = %2d',N) )
A String is an Array of Characters
'Aa7*>@ x!'
'A' 'a' '7' '*' '>' '@' ' ' 'x' '!'
This string has length 9.
Why are Strings Important?
1. Numerical data is often encoded as strings
2. Genomic calculation/search
Numerical Data is Often Encoded in Strings
For example, a file containing Ithaca weather data begins with the string
W07629N4226
Longitude: 76° 29' West
Latitude: 42° 26' North
What We Would Like to Do
W07629N4226
Get hold of the substring '07629'
Convert it to floating point format so that it can be involved in numerical
calculations.
Format Issues
9 as an IEEE floating point number and 9 as a character are stored as different bit patterns.
Different Representation
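A minimal sketch of the distinction, using only the built-ins double and str2double that appear elsewhere in these slides. The variable names c, x, code, and y are illustrative, not from the original.

```matlab
c = '9';               % the character 9
x = 9;                 % the number 9 (a double)
code = double(c);      % character code of '9', which is 57, not 9
y = str2double(c);     % converts the string '9' to the number 9
disp(code == x)        % displays 0 (false)
disp(y == x)           % displays 1 (true)
```

The character and the number only agree after an explicit conversion.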
Genomic Computations
Looking for patterns in a DNA sequence:
'ATTCTGACCTCGATC'
ACCT
Genomic Computations
Quantifying Differences:
ATTCTGACCTCGATC
ATTGCTGACCTCGAT
Remove?
Working With Strings
Strings Can Be Assigned to Variables
S = 'N = 2'
N = 2;
S = sprintf('N = %1d',N) % S = 'N = 2'
sprintf produces a formatted string using fprintf rules
Strings Have a Length
s = 'abc';
n = length(s); % n = 3
s = ''; % the empty string
n = length(s) % n = 0
s = ' '; % single blank
n = length(s) % n = 1
Concatenation
This:
S = 'abc';
T = 'xy';
R = [S T]
is the same as this:
R = 'abcxy'
Repeated Concatenation
This:
s = '';
for k=1:5
    s = [s 'z'];
end
is the same as this:
s = 'zzzzz'
Replacing and Appending Characters
s = 'abc';
s(2) = 'x' % s = 'axc'
t = 'abc';
t(4) = 'd' % t = 'abcd'
v = '';
v(5) = 'x' % v displays as '    x' (length 5)
Extracting Substrings
s = 'abcdef';
x = s(3) % x = 'c'
x = s(2:4) % x = 'bcd'
x = s(length(s)) % x = 'f'
Colon Notation
s( Starting Location : Ending Location )
Replacing Substrings
s = 'abcde';
s(2:4) = 'xyz' % s = 'axyze'
s = 'abcde';
s(2:4) = 'wxyz' % Error: 4 characters cannot be assigned to 3 positions
Question Time
s = 'abcde';
for k=1:3
    s = [ s(4:5) s(1:3)];
end
What is the final value of s?
A. abcde B. bcdea C. eabcd D. deabc
Problem: DNA Strand
x is a string made up of the characters 'A', 'C', 'T', and 'G'.
Construct a string y obtained from x by replacing each A by T, each T by A, each C by G, and each G by C.
x: ACGTTGCAGTTCCATATG
y: TGCAACGTCAAGGTATAC
function y = Strand(x)
% x is a string consisting of
% the characters A, C, T, and G.
% y is a string obtained by
% replacing A by T, T by A,
% C by G and G by C.
Comparing Strings
Built-in function strcmp
strcmp(s1,s2) is true if the strings s1 and s2 are identical.
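A short sketch of how strcmp behaves, contrasted with the elementwise == operator. The strings and variable names here are illustrative assumptions, not part of the original slides.

```matlab
s1 = 'ACCT';
s2 = 'ACGT';
tf1 = strcmp(s1,s1);    % true: identical strings
tf2 = strcmp(s1,s2);    % false: third character differs
tf3 = strcmp(s1,'AC');  % false: strings of different length are not identical
same = (s1 == s2);      % elementwise comparison: [1 1 0 1]
```

strcmp always returns a single true/false value, which is what an if or while condition needs; == on two equal-length strings returns one result per character.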
How y is Built Up
x: ACGTTGCAGTTCCATATG
y: TGCAACGTCAAGGTATAC
Start: y: ''
After 1 pass: y: T
After 2 passes: y: TG
After 3 passes: y: TGC
y = ''; % start with the empty string
for k=1:length(x)
    if strcmp(x(k),'A')
        y = [y 'T'];
    elseif strcmp(x(k),'T')
        y = [y 'A'];
    elseif strcmp(x(k),'C')
        y = [y 'G'];
    else
        y = [y 'C'];
    end
end
A DNA Search Problem
Suppose S and T are strings, e.g.,
S: 'ACCT'
T: 'ATGACCTGA'
We'd like to know if S is a substring of T and, if so, where the first occurrence is.
function k = FindCopy(S,T)
% S and T are strings.
% If S is not a substring of T,
% then k=0.
% Otherwise, k is the smallest
% integer so that S is identical
% to T(k:k+length(S)-1).
A DNA Search Problem
S: 'ACCT'
T: 'ATGACCTGA'
strcmp(S,T(1:4)) False
strcmp(S,T(2:5)) False
strcmp(S,T(3:6)) False
strcmp(S,T(4:7)) True
Pseudocode
First = 1; Last = length(S);
while S is not identical to T(First:Last)
    First = First + 1;
    Last = Last + 1;
end
Subscript Error
S: 'ACCT'
T: 'ATGACTGA'
strcmp(S,T(6:9))
There's a problem if S is not a substring of T: here T has length 8, so the subscript 9 is out of range.
Pseudocode
First = 1; Last = length(S);
while Last<=length(T) && ...
        ~strcmp(S,T(First:Last))
    First = First + 1;
    Last = Last + 1;
end
Post-Loop Processing
Loop ends when this is false:
Last<=length(T) && ...
    ~strcmp(S,T(First:Last))
The loop ends for one of two reasons.
if Last>length(T)
    % No match found
    k = 0;
else
    % There was a match
    k = First;
end
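Putting the specification, the guarded loop, and the post-loop processing together, FindCopy might be assembled as follows. This is a sketch built from the fragments above; only the comments are new.

```matlab
function k = FindCopy(S,T)
% S and T are strings.
% If S is not a substring of T, then k = 0.
% Otherwise, k is the smallest integer so that
% S is identical to T(k:k+length(S)-1).
First = 1;
Last = length(S);
% Slide the window T(First:Last) to the right until it
% matches S or runs off the end of T.
while Last<=length(T) && ~strcmp(S,T(First:Last))
    First = First + 1;
    Last = Last + 1;
end
if Last>length(T)
    k = 0;        % no match found
else
    k = First;    % first match starts at position First
end
```

With the examples from the slides, FindCopy('ACCT','ATGACCTGA') returns 4 and FindCopy('ACCT','ATGACTGA') returns 0. Because && short-circuits, T(First:Last) is never evaluated once Last exceeds length(T), which is what avoids the subscript error.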
Numeric/String Conversion
String-to-Numeric Conversion
An example…
Convention: W07629N4226
Longitude: 76° 29' West
Latitude: 42° 26' North
String-to-Numeric Conversion
s = 'W07629N4226';
s1 = s(2:4);
x1 = str2double(s1);
s2 = s(5:6);
x2 = str2double(s2);
Longitude = x1 + x2/60
There are 60 minutes in a degree.
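The same idea can be extended to the latitude field. The sketch below assumes, based only on the single example string shown, that the latitude degrees occupy positions 8:9 and the minutes positions 10:11; the names lonDeg, lonMin, latDeg, and latMin are illustrative.

```matlab
s = 'W07629N4226';
lonDeg = str2double(s(2:4));      % 76
lonMin = str2double(s(5:6));      % 29
Longitude = lonDeg + lonMin/60;   % 76 + 29/60 degrees West
latDeg = str2double(s(8:9));      % 42 (assumed field position)
latMin = str2double(s(10:11));    % 26 (assumed field position)
Latitude = latDeg + latMin/60;    % 42 + 26/60 degrees North
```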
Numeric-to-String Conversion
x = 1234;
s = int2str(x); % s = '1234'
x = pi;
s = num2str(x,'%5.3f'); % s = '3.142'
Problem
Given a date in the format 'mm/dd',
specify the next day in the same format.
y = Tomorrow(x)
x y
02/28 03/01
07/13 07/14
12/31 01/01
Get the Day and Month
month = str2double(x(1:2));
day = str2double(x(4:5));
Thus, if x = '02/28' then month is assigned the numerical value of 2 and day is assigned the numerical value of 28.
L = [31 28 31 30 31 30 31 31 30 31 30 31];
if day+1<=L(month)
    % Tomorrow is in the same month
    newDay = day+1;
    newMonth = month;
else
    % Tomorrow is in the next month
    newDay = 1;
    if month<12
        newMonth = month+1;
    else
        newMonth = 1;
    end
end
The New Day String
Compute newDay (numerical) and convert…
d = int2str(newDay);
if length(d)==1
    d = ['0' d];
end
The New Month String
Compute newMonth (numerical) and convert…
m = int2str(newMonth);
if length(m)==1
    m = ['0' m];
end
The Final Concatenation
y = [m '/' d];
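Assembled from the steps above, the whole function might look like this. It is a sketch, and it assumes a non-leap year, since the month-length table L gives February 28 days.

```matlab
function y = Tomorrow(x)
% x is a date string in the format 'mm/dd'.
% y is the next day in the same format.
% Assumption: non-leap year (February has 28 days in L).
month = str2double(x(1:2));
day = str2double(x(4:5));
L = [31 28 31 30 31 30 31 31 30 31 30 31];
if day+1<=L(month)
    newDay = day+1;          % tomorrow is in the same month
    newMonth = month;
else
    newDay = 1;              % tomorrow is in the next month
    if month<12
        newMonth = month+1;
    else
        newMonth = 1;        % December wraps around to January
    end
end
d = int2str(newDay);
if length(d)==1
    d = ['0' d];             % pad to two digits
end
m = int2str(newMonth);
if length(m)==1
    m = ['0' m];             % pad to two digits
end
y = [m '/' d];
```

This reproduces the table of examples: Tomorrow('02/28') gives '03/01', Tomorrow('07/13') gives '07/14', and Tomorrow('12/31') gives '01/01'.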
Some other useful string functions
str = 'Cs 1112';
length(str) % 7
isletter(str) % [1 1 0 0 0 0 0]
isspace(str) % [0 0 1 0 0 0 0]
lower(str) % 'cs 1112'
upper(str) % 'CS 1112'
ischar(str) % Is str a char array? True (1)
strcmp(str(1:2),'cs') % Compare strings str(1:2) & 'cs'. False (0)
strcmp(str(1:3),'CS') % False (0)
ASCII characters (American Standard Code for Information Interchange)
ascii code   Character
:            :
65           'A'
66           'B'
67           'C'
:            :
90           'Z'
:            :
ascii code   Character
:            :
48           '0'
49           '1'
50           '2'
:            :
57           '9'
:            :
Character vs ASCII code
str = 'Age 19'
% a 1-d array of characters
code = double(str)
% convert chars to ascii values
str1 = char(code)
% convert ascii values to chars
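For the string 'Age 19', the round trip between characters and codes can be checked as in this sketch. The variable same is an illustrative addition.

```matlab
str = 'Age 19';
code = double(str);        % [65 103 101 32 49 57]
str1 = char(code);         % 'Age 19'
same = strcmp(str,str1);   % true: the conversion round-trips
```

Note that the blank has code 32 and that the digit characters '1' and '9' have codes 49 and 57, not the numeric values 1 and 9.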
Arithmetic and relational ops on characters
• 'c'-'a' gives 2
• '6'-'5' gives 1
• letter1='e'; letter2='f';
• letter1-letter2 gives -1
• 'c'>'a' gives true
• letter1==letter2 gives false
• 'A' + 2 gives 67
• char('A'+2) gives 'C'
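One use of character arithmetic, sketched here as an illustration rather than taken from the slides, is converting a single digit character to its numeric value by measuring its distance from '0'. The names c, d, back, and isDigit are hypothetical.

```matlab
c = '7';
d = c - '0';                        % 7: distance of '7' from '0'
back = char('0' + d);               % '7': go the same distance from '0'
isDigit = (c >= '0' && c <= '9');   % true: c lies in the digit range
```

This relies only on the digit characters having consecutive codes 48 through 57.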
Example: toUpper
Write a function toUpper(cha) to convert character cha to upper case if cha is a lower case letter. Return the converted letter. If cha is not a lower case letter, simply return the character cha.
Hint: Think about the distance between a letter and the base letter 'a' (or 'A'). E.g.,
a b c d e f g h …
A B C D E F G H …
distance = 'g'-'a' = 6 = 'G'-'A'
Of course, do not use Matlab function upper!
function up = toUpper(cha)
% up is the upper case of character cha.
% If cha is not a lower case letter then up is just cha.
up = cha;
% cha is lower case if it is between 'a' and 'z'
if ( cha >= 'a' && cha <= 'z' )
    % Find distance of cha from 'a'
    offset = cha - 'a';
    % Go same distance from 'A'
    up = char('A' + offset);
end