Date post: 27-Dec-2015

Chapter 8—Strings and Characters

The Art and Science of Java
An Introduction to Computer Science
ERIC S. ROBERTS

Strings and Characters

CHAPTER 8

Surely you don’t think that numbers are as important as words.

King Azaz to the Mathemagician (Norton Juster, The Phantom Toolbooth, 1961)

Characters

• Computers use integers (a code) to represent character data inside the memory of the machine.

• Character codes, however, are not particularly useful unless they are standardized. If different computer manufacturers use different coding sequences (as was indeed the case in the early years), it is harder to share such data across machines.

• The first widely adopted character encoding was ASCII (American Standard Code for Information Interchange).

• With only 128 characters (256 in the later 8-bit extensions), the ASCII system proved inadequate to represent the many alphabets in use throughout the world. It has therefore been superseded by Unicode, which allows for a much larger number of characters.
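Because a char is stored as an integer code, casting it to int reveals that code, and casting an int back to char goes the other way. The sketch below is a minimal standalone illustration (plain Java with System.out instead of the ACM println used in these slides); the class name CharCodes and the method codeOf are hypothetical. The character é is written as the escape \u00E9 so the example does not depend on the source-file encoding.

```java
class CharCodes {
    // Returns the numeric (Unicode) code of a character.
    static int codeOf(char ch) {
        return ch; // widening conversion from char to int
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));      // 65, inside the ASCII range 0-127
        System.out.println(codeOf('a'));      // 97
        System.out.println(codeOf('\u00E9')); // 233 (é), beyond ASCII but a valid Unicode char
        System.out.println((char) 66);        // B: an integer code cast back to a char
    }
}
```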

The ASCII Subset of Unicode

The following table shows the first 128 characters in the Unicode character set, which are the same as in the older ASCII scheme. The Unicode value for any character in the table is the sum of the octal numbers at the beginning of that row and column. The letter A, for example, has the Unicode value 101 in octal (65 in decimal), which is the sum of the row label 10x (100 in octal) and the column label 1.

      0     1     2     3     4     5     6     7
00x   \000  \001  \002  \003  \004  \005  \006  \007
01x   \b    \t    \n    \013  \f    \r    \016  \017
02x   \020  \021  \022  \023  \024  \025  \026  \027
03x   \030  \031  \032  \033  \034  \035  \036  \037
04x   space !     "     #     $     %     &     '
05x   (     )     *     +     ,     -     .     /
06x   0     1     2     3     4     5     6     7
07x   8     9     :     ;     <     =     >     ?
10x   @     A     B     C     D     E     F     G
11x   H     I     J     K     L     M     N     O
12x   P     Q     R     S     T     U     V     W
13x   X     Y     Z     [     \     ]     ^     _
14x   `     a     b     c     d     e     f     g
15x   h     i     j     k     l     m     n     o
16x   p     q     r     s     t     u     v     w
17x   x     y     z     {     |     }     ~     \177
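One way to check the row-plus-column rule is to compare Java's octal notations against the character codes. In this sketch (the class name OctalTable and method octalCode are hypothetical), Integer.toOctalString returns the octal digits of an int, an integer literal with a leading 0 is octal, and the escape '\101' uses the \ddd form from the table.

```java
class OctalTable {
    // Octal label of a character's code, e.g. "101" for 'A' (row 10x + column 1).
    static String octalCode(char ch) {
        return Integer.toOctalString(ch);
    }

    public static void main(String[] args) {
        System.out.println(octalCode('A'));  // 101
        System.out.println(0101 == 'A');     // true: 0101 is an octal literal equal to 65
        System.out.println('\101' == 'A');   // true: octal escape for the same code
        System.out.println(octalCode('z'));  // 172: row 17x, column 2
    }
}
```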

Notes on Character Representation

• The important observation is that a character has a numeric representation, not what that representation happens to be.

• To specify a character in a Java program, you need to use a character constant, which consists of the desired character enclosed in single quotation marks. Thus, the constant 'A' in a program indicates the Unicode representation for an uppercase A. That it has the value 65 is an irrelevant detail.

Special Characters

• Most of the characters in the Unicode table are the familiar ones that appear on the keyboard. These characters are called printing characters. The table also includes several special characters that are typically used to control formatting.

• Special characters are indicated in the Unicode table by an escape sequence, which consists of a backslash followed by a character or a sequence of digits. The most common ones are:

\b Backspace

\f Form feed (starts a new page)

\n Newline (moves to the next line)

\r Return (moves to the beginning of the current line without advancing)

\t Tab (moves horizontally to the next tab stop)

\\ The backslash character itself

\' The character ' (required only in character constants)

\" The character " (required only in string constants)

\ddd The character whose Unicode value is the octal number ddd

Special Characters

println("Hello\nworld");
println("Hey\tworld");
println("Helloo\tworld");
println("Helloooo \"world\" ");
println("Helloooo \\ world");
println("Helloooo \101\102\103 world");

Output:

Hello
world
Hey     world
Helloo  world
Helloooo "world"
Helloooo \ world
Helloooo ABC world

Useful Methods in the Character Class

static boolean isDigit(char ch)
    Determines if the specified character is a digit.

static boolean isLetter(char ch)
    Determines if the specified character is a letter.

static boolean isLetterOrDigit(char ch)
    Determines if the specified character is a letter or a digit.

static boolean isLowerCase(char ch)
    Determines if the specified character is a lowercase letter.

static boolean isUpperCase(char ch)
    Determines if the specified character is an uppercase letter.

static boolean isWhitespace(char ch)
    Determines if the specified character is whitespace (such as a space, tab, or newline).

static char toLowerCase(char ch)
    Converts ch to its lowercase equivalent, if any. If not, ch is returned unchanged.

static char toUpperCase(char ch)
    Converts ch to its uppercase equivalent, if any. If not, ch is returned unchanged.
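These methods are usually called inside a loop over a string's characters. Here is a minimal sketch, assuming a hypothetical helper classify that tallies letters, digits, whitespace, and everything else. It uses only the Character methods listed above.

```java
class CharStats {
    // Returns {letters, digits, whitespace, other} for the given string.
    static int[] classify(String s) {
        int letters = 0, digits = 0, spaces = 0, other = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isLetter(ch)) {
                letters++;
            } else if (Character.isDigit(ch)) {
                digits++;
            } else if (Character.isWhitespace(ch)) {
                spaces++;
            } else {
                other++;
            }
        }
        return new int[] {letters, digits, spaces, other};
    }

    public static void main(String[] args) {
        int[] counts = classify("Hello, World 42\t!");
        System.out.println(java.util.Arrays.toString(counts)); // [10, 2, 3, 2]
        System.out.println(Character.toUpperCase('q'));        // Q
        System.out.println(Character.toUpperCase('7'));        // 7 (no uppercase form, returned unchanged)
    }
}
```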

Character Arithmetic

• The fact that characters have underlying representations as integers allows you to use them in arithmetic expressions. For example, if you evaluate the expression 'A' + 1, Java will convert the character 'A' into the integer 65 and then add 1 to get 66, which is the character code for 'B'. The result is an int; to get the character 'B' itself, cast it back with (char) ('A' + 1).

• As an example, the following method returns a randomly chosen uppercase letter:

public char randomLetter() {
    // rgen is an ACM RandomGenerator; nextInt(low, high) includes both endpoints
    return (char) rgen.nextInt('A', 'Z');
}

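• The following code implements a simplified version of the isDigit method from the Character class. (Character.isDigit also accepts digits from other scripts, while this version recognizes only '0' through '9'.)

public boolean isDigit(char ch) {
    return (ch >= '0' && ch <= '9');
}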
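The slides use rgen, an ACM RandomGenerator. To keep the next sketch self-contained, it substitutes java.util.Random, which is an assumption and not the slides' setup. Because Random.nextInt(bound) excludes its bound, the code picks an offset from 0 to 25 and adds it to 'A'. The class name CharArithmetic and its methods are hypothetical.

```java
import java.util.Random;

class CharArithmetic {
    private static final Random rand = new Random();

    // 'A' + 1 is the int 66; casting back to char gives 'B'.
    static char next(char ch) {
        return (char) (ch + 1);
    }

    // Random uppercase letter: 'A' plus an offset in 0..25 (java.util.Random stands in for ACM rgen).
    static char randomLetter() {
        return (char) ('A' + rand.nextInt(26));
    }

    // ASCII-only digit test, as on the slide.
    static boolean isDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    public static void main(String[] args) {
        System.out.println('A' + 1);                           // 66 (int arithmetic)
        System.out.println(next('A'));                         // B
        System.out.println(randomLetter());                    // some letter from A to Z
        System.out.println(isDigit('7') + " " + isDigit('x')); // true false
    }
}
```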

Exercise: Character Arithmetic

• Implement a method toHexDigit that takes an integer and returns the corresponding hexadecimal digit as a character. Thus, if the argument is between 0 and 9, the method should return the corresponding character between '0' and '9'. If the argument is between 10 and 15, the method should return the appropriate letter in the range 'A' through 'F'. If the argument is outside this range, the method should return '?'.

public char toHexDigit(int n) {
    if (n >= 0 && n <= 9) {
        return (char) ('0' + n);
    } else if (n >= 10 && n <= 15) {
        return (char) ('A' + n - 10);
    } else {
        return '?';
    }
}

Strings as an Abstract Idea

• Ever since the very first program in the text, which displayed the message "hello, world" on the screen, you have been using strings to communicate with the user.

• Strings are actually complicated objects whose details are better left hidden.

• Java supports a high-level view of strings by making String a class whose methods hide the underlying complexity.

What is a string?

public void run() {
    println("Enter a number");   // "Enter a number" is a string
    …
}

What is a character?

public void run() {
    println("Enter a number");
    println("a");   // not a character: a string of length 1
    println('a');   // a character
    …
}

Using Methods in the String Class

• A char is a primitive type (like int, boolean, and double).

• A String is an object type; a string is a sequence of characters.

• You may send messages to String objects, just as you send messages to GOval, GLine, and other objects.

• You cannot send messages to primitive values. For char values, you have to use the methods of the Character class.

Strings vs. Characters

• In the Character class, you call toUpperCase as a static method, like this:

ch = Character.toUpperCase(ch);

• In the String class, you apply toUpperCase to an existing string, as follows:

str = str.toUpperCase();

• Note that both classes require you to assign the result back to the original variable if you want to change its value.

Using Methods in the String Class

• Java defines many useful methods that operate on String objects.

• These methods do not change the value of the String object; Java strings are immutable.

• Instead, they return a new string on which the desired changes have been performed.
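Here is a minimal standalone sketch of what "do not change the value" means in practice. The class name Immutable is hypothetical, and the sketch uses System.out rather than ACM println. Calling toUpperCase without using its result leaves the original string as it was.

```java
class Immutable {
    public static void main(String[] args) {
        String str = "hello";
        str.toUpperCase();               // result discarded; str is unchanged
        System.out.println(str);         // hello
        String upper = str.toUpperCase();
        System.out.println(upper);       // HELLO
        System.out.println(str);         // hello: the original object is never modified
    }
}
```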

Selecting Characters from a String

• Conceptually, a string is an ordered collection (array) of characters.

• For example, the characters in the string "hello, world" are arranged like this:

h   e   l   l   o   ,       w   o   r   l   d
0   1   2   3   4   5   6   7   8   9   10  11

• You can obtain the number of characters by calling length().

• You can select an individual character by calling charAt(k), where k is the index of the desired character. The expression

str.charAt(0)

returns the first character in str, which is at index position 0.
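Here is a small standalone sketch using the string from the diagram. The class name Indexing is hypothetical. Valid indices run from 0 to length() - 1, and charAt with any other index throws StringIndexOutOfBoundsException.

```java
class Indexing {
    public static void main(String[] args) {
        String str = "hello, world";
        System.out.println(str.length());                 // 12
        System.out.println(str.charAt(0));                // h
        System.out.println(str.charAt(7));                // w
        System.out.println(str.charAt(str.length() - 1)); // d, the last character
        try {
            str.charAt(str.length());                     // one past the end
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("index 12 is out of range");
        }
    }
}
```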

Concatenation

• One of the most useful operations available for strings is concatenation, which consists of combining two strings end to end with no intervening characters.

• Concatenation is built into Java in the form of the + operator.

• If you use + with numeric operands, it signifies addition. If at least one of its operands is a string, Java interprets + as concatenation.

Concatenation

public void run() {
    String s1 = "hello";
    String s2 = "world";
    int x = 5;
    double y = 4.5;
    String s3 = s1 + s2;
    println(s3);            // helloworld
    println(s1 + s2);       // helloworld
    String s4 = x + s1;
    println(s4);            // 5hello
    String s5 = s2 + y;
    println(s5);            // world4.5
    String s6 = s1 + x + y;
    println(s6);            // hello54.5
    String s7 = s1 + (x + y);
    println(s7);            // hello9.5
    String s8 = x + y + s1;
    println(s8);            // 9.5hello
}
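The rule that + concatenates only when at least one operand is a string has a consequence worth seeing: a char is not a string, so adding two chars performs integer addition. Here is a minimal sketch (the class name CharConcat is hypothetical):

```java
class CharConcat {
    public static void main(String[] args) {
        char a = 'a';
        char b = 'b';
        System.out.println(a + b);        // 195: 97 + 98, both promoted to int
        System.out.println("" + a + b);   // ab: the leading "" makes + concatenate
        System.out.println(a + b + "!");  // 195!: evaluated left to right
        System.out.println("!" + a + b);  // !ab
    }
}
```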

Extracting Substrings

• The substring method makes it possible to extract a piece of a larger string by providing index numbers that determine the extent of the substring.

• The general form of the substring call is

str.substring(p1, p2);

where p1 is the first index position in the desired substring and p2 is the index position immediately following the last position in the substring.

• As an example, if you wanted to select the substring "ell" from a string variable str containing "hello, world", you would make the following call:

str.substring(1, 4);
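Here is a short check of the two-index convention (the class name Substrings is hypothetical). Because p2 is one past the last character, the result always has p2 - p1 characters. String also provides a one-argument substring(p1) that runs to the end of the string.

```java
class Substrings {
    public static void main(String[] args) {
        String str = "hello, world";
        System.out.println(str.substring(1, 4));           // ell (4 - 1 = 3 characters)
        System.out.println(str.substring(7));              // world (index 7 to the end)
        System.out.println(str.substring(0, 5));           // hello
        System.out.println(str.substring(4, 4).isEmpty()); // true: p1 == p2 gives ""
    }
}
```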

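Checking Strings for Equality

• Many applications will require you to test whether two strings are equal, in the sense that they contain the same characters.

• Although it seems natural to do so, you cannot use the == operator for this purpose. While it is legal to write

if (s1 == s2) . . .

the if test will not have the desired effect. When you use == on two objects, it checks whether the objects are identical, which means that the two references point to the same object in memory. Two strings with the same characters, such as two strings read from the user, are usually different objects. (The test may happen to succeed for identical string literals, which Java stores only once, but you should not rely on that.)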

• What you need to do instead is call the equals method:

if (s1.equals(s2)) . . .

String equality

Output:

Gimme a string bro: abcd
Yo, gimme another: abcd
s1 == s2 is false
s1.equals(s2) is true
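The transcript shows only the output of the equality demonstration, not its source. Here is a minimal sketch that produces the same two comparisons without user input. It assumes new String(...) as a stand-in for the two readLine calls the original presumably used, since both produce distinct objects with the same characters. The class name is hypothetical.

```java
class StringEquality {
    public static void main(String[] args) {
        String s1 = new String("abcd");
        String s2 = new String("abcd");   // a different object with the same characters
        System.out.println("s1 == s2 is " + (s1 == s2));         // s1 == s2 is false
        System.out.println("s1.equals(s2) is " + s1.equals(s2)); // s1.equals(s2) is true
    }
}
```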

Comparing Characters and Strings

• The fact that characters are primitive types with a numeric internal form allows you to compare them using the relational operators. If c1 and c2 are characters, the expression

c1 < c2

is true if the Unicode value of c1 is less than that of c2.

• The String class allows you to compare two strings using the internal values of the characters, although you must use the compareTo method instead of the relational operators:

s1.compareTo(s2)

This call returns an integer that is less than 0 if s1 is less than s2, greater than 0 if s1 is greater than s2, and 0 if the two strings are equal. The strings are compared character by character (lexicographically); if one string is a prefix of the other, the shorter one is less.
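Only the sign of compareTo matters, not its magnitude. Here is a sketch (the class name Ordering is hypothetical). Because uppercase letters have smaller codes than lowercase ones, "Zebra" sorts before "apple".

```java
class Ordering {
    public static void main(String[] args) {
        System.out.println('a' < 'b');                        // true
        System.out.println("apple".compareTo("banana") < 0);  // true
        System.out.println("pear".compareTo("pea") > 0);      // true: a prefix comes first
        System.out.println("Zebra".compareTo("apple") < 0);   // true: 'Z' (90) < 'a' (97)
        System.out.println("same".compareTo("same") == 0);    // true
    }
}
```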

Searching in a String

• Java’s String class includes several methods for searching within a string for a particular character or substring.

• The method indexOf takes either a string or a character and returns the index within the receiving string at which the first instance of that value begins. If the string or character does not exist at all, indexOf returns -1. For example, if the variable str contains the string "hello, world":

str.indexOf('h') returns 0

str.indexOf("o") returns 4

str.indexOf("ell") returns 1

str.indexOf('x') returns -1

• The indexOf method takes an optional second argument that indicates the starting position for the search. Thus:

str.indexOf("o", 5) returns 8
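The optional start position makes it easy to find every occurrence, not just the first. Here is a minimal sketch with a hypothetical method allIndexesOf that restarts each search one position past the previous match. It assumes a non-empty target.

```java
import java.util.ArrayList;
import java.util.List;

class Search {
    // Returns all positions at which target occurs in str (target must be non-empty).
    static List<Integer> allIndexesOf(String str, String target) {
        List<Integer> positions = new ArrayList<>();
        int pos = str.indexOf(target);
        while (pos != -1) {
            positions.add(pos);
            pos = str.indexOf(target, pos + 1); // resume just after this match
        }
        return positions;
    }

    public static void main(String[] args) {
        String str = "hello, world";
        System.out.println(allIndexesOf(str, "o")); // [4, 8]
        System.out.println(allIndexesOf(str, "l")); // [2, 3, 10]
        System.out.println(allIndexesOf(str, "x")); // []
    }
}
```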

Other Methods in the String Class

int lastIndexOf(char ch) or lastIndexOf(String str)
    Returns the index of the last match of the argument, or -1 if none exists.

boolean equalsIgnoreCase(String str)
    Returns true if this string and str are the same, ignoring differences in case.

boolean startsWith(String str)
    Returns true if this string starts with str.

boolean endsWith(String str)
    Returns true if this string ends with str.

String replace(char c1, char c2)
    Returns a copy of this string with all instances of c1 replaced by c2.

String trim()
    Returns a copy of this string with leading and trailing whitespace removed.

String toLowerCase()
    Returns a copy of this string with all uppercase characters changed to lowercase.

String toUpperCase()
    Returns a copy of this string with all lowercase characters changed to uppercase.
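Here is a quick standalone tour of these methods on a single string (the class name MoreStringMethods is hypothetical). Each call returns a new value and leaves t unchanged.

```java
class MoreStringMethods {
    public static void main(String[] args) {
        String t = "  Hello, World  ".trim();                  // "Hello, World"
        System.out.println("[" + t + "]");                      // [Hello, World]
        System.out.println(t.startsWith("Hello"));              // true
        System.out.println(t.endsWith("World"));                // true
        System.out.println(t.equalsIgnoreCase("hello, world")); // true
        System.out.println(t.replace('l', 'L'));                // HeLLo, WorLd
        System.out.println(t.lastIndexOf('o'));                 // 8
        System.out.println(t.toLowerCase());                    // hello, world
    }
}
```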

String Creation and Usage

public void run() {
    String s1 = "Java is ";
    String s2 = "easy";                     // compare: int x = 5;
    String t2 = new String("difficult");    // compare: GOval circle = new GOval(..)
    String s3 = s1 + t2;
    println(s3);                            // Java is difficult
    String up = s3.toUpperCase();
    println(up);                            // JAVA IS DIFFICULT
}

Reading a String from the User

public void run() {
    String s1 = readLine("Gimme a string bro: ");
    String s2 = s1.toUpperCase();
    println("s1 is " + s1);
    println("s2 is " + s2);
}

Exercise

• Write a Java program that takes two lines of String input from the user, concatenates them with a space character in between, and prints the resulting string.

• Sample output should be as follows:

Enter a string: Hello
Enter a string: World
Concatenated string: Hello World
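Here is one possible solution, sketched in plain Java with java.util.Scanner standing in for the ACM readLine method. This substitution is an assumption, made so the example runs without the ACM library. The hypothetical helper joinWithSpace holds the actual logic. In an ACM ConsoleProgram, each read would simply be a readLine("Enter a string: ") call.

```java
import java.util.Scanner;

class ConcatExercise {
    // Joins two strings with a single space between them.
    static String joinWithSpace(String first, String second) {
        return first + " " + second;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter a string: ");
        String s1 = in.nextLine();
        System.out.print("Enter a string: ");
        String s2 = in.nextLine();
        System.out.println("Concatenated string: " + joinWithSpace(s1, s2));
    }
}
```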

Simple String Idioms

When you work with strings, there are two idiomatic patterns that are particularly important:

1. Iterating through the characters in a string.

for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    . . . code to process each character in turn . . .
}

2. Growing a new string character by character.

String result = "";
for (whatever limits are appropriate to the application) {
    . . . code to determine the next character to be added . . .
    result += ch;
}
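Here are both idioms filled in for one concrete task: removing spaces, through a hypothetical method removeSpaces. Because strings are immutable, each result += ch creates a new String. That is fine for short strings. A common alternative, swapped in here and not part of these slides, is java.lang.StringBuilder, which appends into one growable buffer.

```java
class Idioms {
    // Idiom 1 + idiom 2: walk the characters and keep those that are not spaces.
    static String removeSpaces(String str) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (ch != ' ') {
                result += ch;
            }
        }
        return result;
    }

    // Same logic, accumulating into a StringBuilder instead of repeated concatenation.
    static String removeSpacesFast(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (ch != ' ') {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeSpaces("a b  c"));     // abc
        System.out.println(removeSpacesFast("a b  c")); // abc
    }
}
```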

Exercises: String Processing

• As a client of the String class, how would you implement toUpperCase(str) so it returns an uppercase copy of str?

public String toUpperCase(String str) {
    String result = "";
    for (int i = 0; i < str.length(); i++) {
        char ch = str.charAt(i);
        result += Character.toUpperCase(ch);
    }
    return result;
}

• How would you code the method indexOf(ch, str) that returns the index of the first occurrence of ch in str?

public int indexOf(char ch, String str) {
    for (int i = 0; i < str.length(); i++) {
        if (ch == str.charAt(i)) return i;
    }
    return -1;
}

The reverseString Method

public void run() {
    println("This program reverses a string.");
    String str = readLine("Enter a string: ");
    String rev = reverseString(str);
    println(str + " spelled backwards is " + rev);
}

private String reverseString(String str) {
    String result = "";
    for (int i = 0; i < str.length(); i++) {
        result = str.charAt(i) + result;
    }
    return result;
}

Sample run of ReverseString:

This program reverses a string.
Enter a string: STRESSED
STRESSED spelled backwards is DESSERTS

Value of result after each pass through the loop in reverseString (str is "STRESSED"):

i = 0: S
i = 1: TS
i = 2: RTS
i = 3: ERTS
i = 4: SERTS
i = 5: SSERTS
i = 6: ESSERTS
i = 7: DESSERTS

When i reaches 8, the loop ends and DESSERTS is returned.

ReverseString v2

private String reverseString(String str) {
    String result = "";
    for (int i = str.length() - 1; i >= 0; i--) {
        result += str.charAt(i);
    }
    return result;
}
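Both versions of reverseString can be checked side by side in a standalone sketch. The names Reverse, reverseV1, and reverseV2 are hypothetical, and the bodies follow the two slides. Version 1 prepends each character, and version 2 walks backward and appends.

```java
class Reverse {
    // Version 1: prepend each character to the result (char + String is concatenation).
    static String reverseV1(String str) {
        String result = "";
        for (int i = 0; i < str.length(); i++) {
            result = str.charAt(i) + result;
        }
        return result;
    }

    // Version 2: walk from the last index down to 0 and append.
    static String reverseV2(String str) {
        String result = "";
        for (int i = str.length() - 1; i >= 0; i--) {
            result += str.charAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reverseV1("STRESSED")); // DESSERTS
        System.out.println(reverseV2("STRESSED")); // DESSERTS
    }
}
```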

The StringTokenizer Class

• Can be used for dividing a String into words.

• Imported from the java.util package (import java.util.StringTokenizer;).

• Divides a string into independent units called tokens. These tokens can be read one at a time.

• StringTokenizer splits a string into a set of tokens that are separated by delimiter characters.

• By default, the whitespace characters (space, tab, newline, carriage return, and form feed) are the delimiters.

The StringTokenizer Class

• The constructor for the StringTokenizer class takes three arguments, where the last two are optional:

– A string indicating the source of the tokens.
– A string which specifies the delimiter characters to use. By default, the delimiter characters are set to the whitespace characters.
– A boolean indicating whether the delimiter characters should themselves be returned as tokens. By default, this is false.

• Once you have created a StringTokenizer, you use it by setting up a loop with the following general form:

while (tokenizer.hasMoreTokens()) {
    String token = tokenizer.nextToken();
    . . . code to process the token . . .
}
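The delimiter argument is a set of characters, not a single separator string, and runs of delimiters are skipped. Here is a minimal sketch with a hypothetical helper tokens that collects the tokens into a list. The last line shows the three-argument constructor with returnDelims set to true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class Tokens {
    // Collects the tokens of source separated by any character in delims.
    static List<String> tokens(String source, String delims) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(source, delims);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("red, green,,blue", ", "));  // [red, green, blue]
        System.out.println(tokens("  to be   or not ", " "));  // [to, be, or, not]
        StringTokenizer t = new StringTokenizer("1+2", "+", true);
        System.out.println(t.countTokens());                   // 3: "1", "+", "2"
    }
}
```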

Exercise

• Write a Java program that takes a String input from the user, retrieves the tokens separated by spaces (" "), and prints each of them in uppercase letters.

Tokenizer

import acm.program.*;
import java.util.StringTokenizer;

public class StringLecture extends ConsoleProgram {

    public void run() {
        String inputStr = readLine("Enter a string: ");
        StringTokenizer tokenizer = new StringTokenizer(inputStr, " ");
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            token = token.toUpperCase();
            println(token);
        }
    }
}

Exercise

• Write a method called acronym that takes a String as argument and returns another String standing for the acronym of this argument.

• Write the run method to take a String input from the user and print out the acronym for it (obtained using the acronym method).

• Sample output should be as follows:

Enter a string: Turkish Republic
Acronym: TR

Acronym

public String acronym(String s) {
    // Uses StringTokenizer. Imported from java.util package
    String output = "";
    StringTokenizer tokenizer = new StringTokenizer(s);
    while (tokenizer.hasMoreTokens()) {
        String token = tokenizer.nextToken();
        output += token.charAt(0);
    }
    return output;
}
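The StringTokenizer documentation describes it as a legacy class and recommends String.split for new code. Here is a sketch of the same acronym logic using split, with the hypothetical name acronymSplit. The regular expression \\s+ treats any run of whitespace as one separator. Calling trim() first avoids an empty leading token, and the early return handles input with no words.

```java
class AcronymSplit {
    static String acronymSplit(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return "";                    // no words, so no letters
        }
        String result = "";
        for (String word : trimmed.split("\\s+")) {
            result += word.charAt(0);     // first letter of each word
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(acronymSplit("Turkish Republic"));                // TR
        System.out.println(acronymSplit("  portable   network graphics ")); // png
    }
}
```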

Exercise

• Write a method called beginsWith that takes two Strings s1 and s2, and determines whether s1 starts with s2. Do NOT use the method startsWith available in the String class.

public boolean beginsWith(String s1, String s2) {
    if (s1.length() < s2.length()) return false;

    for (int i = 0; i < s2.length(); i++) {
        if (s1.charAt(i) != s2.charAt(i)) return false;
    }
    return true;
}

Exercise

• Write a method called compare that takes two Strings s1 and s2 and returns 1 if s1 is (lexicographically) less than s2, returns -1 if s1 is (lexicographically) bigger than s2, and returns 0 otherwise. Note that these signs are the reverse of the convention used by compareTo. Do NOT use the method compareTo available in the String class.

compare

public int compare(String s1, String s2) {
    int length = Math.min(s1.length(), s2.length());

    for (int i = 0; i < length; i++) {
        if (s1.charAt(i) > s2.charAt(i)) return -1;
        if (s1.charAt(i) < s2.charAt(i)) return 1;
    }
    // If we reach this point, we've run out of characters
    // of one of the strings and every char is equal to the
    // corresponding char in the other string.
    // The last thing to check is the lengths of the strings.
    if (s1.length() > s2.length()) return -1;
    else if (s2.length() > s1.length()) return 1;
    else return 0;
}
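Because this exercise reverses the usual sign convention, its result should always equal the negated sign of compareTo. The following standalone sketch copies the slide's compare method and prints that relationship for a few pairs. The class name and the test pairs are hypothetical.

```java
class CompareCheck {
    // The exercise's compare: 1 if s1 < s2, -1 if s1 > s2, 0 if equal.
    static int compare(String s1, String s2) {
        int length = Math.min(s1.length(), s2.length());
        for (int i = 0; i < length; i++) {
            if (s1.charAt(i) > s2.charAt(i)) return -1;
            if (s1.charAt(i) < s2.charAt(i)) return 1;
        }
        if (s1.length() > s2.length()) return -1;
        else if (s2.length() > s1.length()) return 1;
        else return 0;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"apple", "banana"}, {"pea", "pear"}, {"same", "same"}, {"b", "a"}};
        for (String[] p : pairs) {
            int mine = compare(p[0], p[1]);
            int expected = -Integer.signum(p[0].compareTo(p[1]));
            System.out.println(p[0] + " vs " + p[1] + ": " + mine + " (expected " + expected + ")");
        }
    }
}
```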

Exercise

• Write a method called toPosition that takes a String and returns a copy in which each character is replaced by its position in the word where it is located. Spaces are kept as they are.

Enter your line: The force is strong with him
Result: 123 12345 12 123456 1234 123

toPosition

public String toPosition(String s) {
    String result = "";
    int position = 1;
    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) == ' ') {
            result += s.charAt(i);
            position = 1;   // reset the counter
        } else {
            result += position;
            position++;
        }
    }
    return result;
}
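Here is a standalone version of toPosition for trying the sample line (the class name ToPosition is hypothetical). One property of this design is that words of ten or more letters produce multi-digit positions, so a 12-letter word becomes 123456789101112.

```java
class ToPosition {
    static String toPosition(String s) {
        String result = "";
        int position = 1;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                result += ' ';
                position = 1;        // a new word starts after each space
            } else {
                result += position;  // int appended as its decimal digits
                position++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toPosition("The force is strong with him"));
        // 123 12345 12 123456 1234 123
    }
}
```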


Recommended