Final Application of Regular Expressions

APPLICATIONS OF REGULAR EXPRESSIONS

Ankit G – 014

Gagan – 034

Nikhil R.K. – 060

Parashuram – 065

INTRODUCTION

What is a Regular Expression?

• A regular expression (regex) describes a pattern that can match many different input strings.

• Regular expressions descend from a fundamental concept in computer science called finite automata theory.

• Regular expressions are pervasive in Unix.

• Some utilities/programs that use them:

– vi, ed, sed, and emacs

– awk, Tcl, Perl, and Python

– grep, egrep, fgrep

– compilers

• The simplest regular expression is a string of literal characters to match.

• In search tools such as grep, a string matches such a regular expression if it contains that literal text as a substring.
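The literal-match rule can be seen in a short sketch using Python's standard `re` module (an assumption: the slides list Python among regex users but show no code). `re.search()` reports a match when the pattern occurs anywhere in the string, which is the substring behaviour described for grep-style tools.

```python
import re

lines = [
    "the cat sat on the mat",
    "concatenate the strings",
    "a dog barked",
]

# A regular expression made only of literal characters.
pattern = re.compile("cat")

# re.search() looks for the pattern anywhere in the string, so a line
# matches as soon as it contains "cat" as a substring.
matches = [line for line in lines if pattern.search(line)]
print(matches)
```

Note that "concatenate" also matches, because it contains "cat" as a substring; a literal pattern knows nothing about word boundaries.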

Application in Linux

The “egrep” Tool

Copyright © 2007 by Adam Webber

Text File Search

• Unix tool: egrep

• Searches a text file for lines that contain a substring matching a specified pattern

• Echoes all such lines to standard output
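The egrep behaviour above (scan a file, echo each line containing a match) can be sketched in a few lines of Python. This is an illustration, not egrep itself: the `egrep` function and the script name `mini_egrep.py` are hypothetical, and Python's `re` syntax is close to, but not identical to, the POSIX extended regular expressions that egrep uses.

```python
import re
import sys

def egrep(pattern, lines):
    """Yield every line that contains a substring matching `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        # search() (not match()) lets the match start anywhere in the line.
        if regex.search(line):
            yield line

# Hypothetical command-line use: python mini_egrep.py PATTERN FILE
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2], encoding="utf-8") as f:
        for line in egrep(sys.argv[1], f):
            sys.stdout.write(line)  # echo matching lines to standard output
```

Because `egrep` accepts any iterable of lines, it can also be tried directly on a list, e.g. `list(egrep(r"colou?r", ["color\n", "hue\n"]))`.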

In the Linux operating system:

Regular expressions are used by several different Unix commands, including ed, sed, awk, grep, and, to a more limited extent, vi.

Sed, which stands for stream editor, is a stream-oriented editor designed to execute editing scripts non-interactively. All the input you feed into it passes through and goes to STDOUT; by default it does not change the input file.

Sed also understands something called addresses. Addresses are either particular locations in a file or a range of lines where a particular editing command should be applied. When a command has no address, sed applies it to every line in the file.
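How an address limits where a command applies can be sketched in Python by roughly emulating `sed '2,3s/cat/dog/'`. This is an emulation, not sed: the helper `sed_substitute` and its parameters are hypothetical, and Python's replacement syntax (`\g<0>` rather than sed's `&`) differs from sed's.

```python
import re

def sed_substitute(lines, pattern, replacement, first=None, last=None):
    """Roughly emulate `sed 'FIRST,LASTs/PATTERN/REPLACEMENT/'` (hypothetical helper).

    Line numbers are 1-based, as in sed. With no address (first is None)
    the substitution applies to every line; with only `first`, it applies
    to that single line. Assumes last >= first. Like sed's s command
    without the g flag, only the first match on each line is replaced.
    The input list is never modified; a new list is returned.
    """
    regex = re.compile(pattern)
    if last is None:
        last = first
    result = []
    for number, line in enumerate(lines, start=1):
        addressed = first is None or first <= number <= last
        result.append(regex.sub(replacement, line, count=1) if addressed else line)
    return result

text = ["cat one", "cat two", "cat three", "cat cat four"]
print(sed_substitute(text, "cat", "dog", 2, 3))  # like: sed '2,3s/cat/dog/'
print(sed_substitute(text, "cat", "dog"))        # like: sed 's/cat/dog/'
```

Returning a new list mirrors sed's default of writing results to STDOUT while leaving the input untouched.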

Oracle's implementation of regular expressions is an extension of the POSIX (Portable Operating System Interface) regular expression standard.

Editing Commands (vi)

COMMANDS    ACTION

Insert

i, a        Insert text before, after cursor
I, A        Insert text before beginning, after end of line
o, O        Open new line for text below, above cursor

Editing Commands (vi)

COMMANDS    ACTION

Change

r           Replace character
cw          Change word
cc          Change current line
cmotion     Change text between the cursor and the target of motion
C           Change to end of line
R           Type over (overwrite) characters
s           Substitute: delete character and insert new text
S           Substitute: delete current line and insert new text

Application in Search Engines

One use of regular expressions that used to be very common was in web search engines.

Archie, one of the first search engines, let users search a database of filenames on public FTP servers using regular expressions.

Regular expressions were chosen for these early search engines because they are both powerful and easy to implement.

In the case of a search engine, the strings matched against the regular expression would be either whole web pages or a pre-computed index of a web page that holds only the most important information from that page.

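A query such as "regular expression" could be translated into the following regular expression, which matches any text containing both words in either order:

$$(\Sigma^{*}\,\mathrm{regular}\,\Sigma^{*}\,\mathrm{expression}\,\Sigma^{*}) \cup (\Sigma^{*}\,\mathrm{expression}\,\Sigma^{*}\,\mathrm{regular}\,\Sigma^{*})$$

Here Σ is the set of all characters in the character encoding used by the search engine.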
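The two-word query translation can be sketched in Python. In this sketch `.*` plays the role of Σ*, `re.DOTALL` lets `.` match newlines so that Σ really covers every character, and `|` is the union. The helper `query_to_regex` is hypothetical, and matching is case-sensitive here, which a real search tool would likely relax.

```python
import re

def query_to_regex(word1, word2):
    """Build a regex matching text that contains both words, in either order."""
    w1, w2 = re.escape(word1), re.escape(word2)
    # First alternative: word1 ... word2; second alternative: word2 ... word1.
    return re.compile(rf".*{w1}.*{w2}.*|.*{w2}.*{w1}.*", re.DOTALL)

query = query_to_regex("regular", "expression")
pages = [
    "an expression that is regular",
    "regular expressions in Unix",
    "regular languages",
]
for page in pages:
    # fullmatch() checks the whole page against the pattern, like the
    # language definition: the page must belong to the union.
    print(bool(query.fullmatch(page)), page)
```

Every page is scanned in full for every query, which hints at why this approach does not scale to a whole-web index.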

Large web search engines no longer use regular expressions this way: as the web grew, matching regular expressions against it became impractically slow. Regular expressions are, however, still used in many smaller search tools, such as the find/replace tool in a text editor, or utilities such as grep.

Web applications also use regular expressions for string matching.

Regular Expressions in Lexical Analysis

To perform lexical analysis, two components are required: a scanner and a tokenizer.

The purpose of tokenization is to categorize the lexemes found in an input string according to their meaning.

The process can be considered a sub-task of parsing input (http://en.wikipedia.org/wiki/Parsing).

For example, the C programming language could contain tokens such as numbers, string constants, characters, identifiers (variable names), keywords, or operators.

We can simply define a set of regular expressions, each matching the set of valid lexemes that belong to one token type. This is the process of scanning.

This process can be quite complex and may require more than one pass to complete.
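A minimal Python sketch of scanning with one regular expression per token type, combined into a single alternation with named groups. The token names, the keyword set, and `tokenize` are illustrative assumptions covering only a small subset of C (no comments, preprocessor lines, or hexadecimal constants). Python tries alternatives left to right rather than taking the longest match as DFA-based lexer generators do, so longer operators such as `==` are listed before `=`.

```python
import re

# Hypothetical token specification for a small subset of C.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),           # integer or simple decimal constant
    ("STRING", r'"(?:\\.|[^"\\])*"'),        # string constant, allowing escapes
    ("CHAR",   r"'(?:\\.|[^'\\])'"),         # character constant
    ("IDENT",  r"[a-zA-Z_][a-zA-Z_0-9]*"),   # identifiers (keywords split out below)
    ("OP",     r"==|!=|<=|>=|&&|\|\||[-+*/=<>!;,(){}]"),
    ("SKIP",   r"\s+"),                      # whitespace is discarded
    ("ERROR",  r"."),                        # anything else is an error
]
KEYWORDS = {"int", "char", "return", "if", "else", "while", "for"}

# One master pattern: each token type becomes a named group.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (token_type, lexeme) pairs for the given source string."""
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r} at {m.start()}")
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        yield kind, lexeme

print(list(tokenize("int x = 42;")))
```

Keywords are matched by the identifier pattern first and then reclassified, a common way to keep the list of regular expressions short.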

Another option is to use a process known as backtracking.

For example, to determine whether a lexeme is a valid identifier in C, we could use the regular expression [a-zA-Z_][a-zA-Z_0-9]*, which says that identifiers must begin with a Roman letter or an underscore and may be followed by any number of letters, underscores, or digits.
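The identifier rule can be checked in Python with `re.fullmatch`, which requires the whole lexeme to match rather than just a prefix. This sketch tests only the character pattern: it does not reject keywords such as `int`, and it ignores extensions such as universal character names. The helper name `is_c_identifier` is hypothetical.

```python
import re

# Identifier rule: a letter or underscore, then any number of
# letters, underscores, or digits.
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")

def is_c_identifier(lexeme):
    """Return True if the whole lexeme matches the identifier pattern."""
    return C_IDENTIFIER.fullmatch(lexeme) is not None

for word in ["count", "_tmp", "x1", "1x", "my-var"]:
    print(word, is_c_identifier(word))
```

Using `search()` here instead would wrongly accept "1x", because "x" alone is a valid identifier substring.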

CONCLUSION

Both regular expressions and finite-state automata represent regular languages.

The basic regular expression operations are: concatenation, union/disjunction, and Kleene closure.

The regular expression language is a powerful pattern-matching tool.

Any regular expression can be automatically compiled into an NFA, then into a DFA, and finally into a unique minimum-state DFA.

An FSA can use any set of symbols for its alphabet, including letters and words.
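To make the correspondence between regular expressions and finite-state automata concrete, here is a hand-built DFA for the C identifier pattern [a-zA-Z_][a-zA-Z_0-9]*, checked against Python's `re`. It was written by hand for illustration, not generated by an automatic regex-to-NFA-to-DFA compiler; the state numbering and the `char_class` helper are assumptions. It has three states (start, accepting, dead), which is as few as a complete DFA for this language can have.

```python
import re

# Hand-built DFA for [a-zA-Z_][a-zA-Z_0-9]*.
# States: 0 = start, 1 = accepting (inside an identifier), 2 = dead (reject).
def char_class(c):
    """Map a character to the input class the DFA distinguishes."""
    if c.isascii() and (c.isalpha() or c == "_"):
        return "letter"
    if c.isascii() and c.isdigit():
        return "digit"
    return "other"

TRANSITIONS = {
    (0, "letter"): 1, (0, "digit"): 2, (0, "other"): 2,
    (1, "letter"): 1, (1, "digit"): 1, (1, "other"): 2,
    (2, "letter"): 2, (2, "digit"): 2, (2, "other"): 2,
}
ACCEPTING = {1}

def dfa_accepts(s):
    """Run the DFA over s and report whether it ends in an accepting state."""
    state = 0
    for c in s:
        state = TRANSITIONS[(state, char_class(c))]
    return state in ACCEPTING

IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")
for s in ["x", "_a1", "9lives", "", "a-b"]:
    # Both columns should agree: the DFA and the regex define the same language.
    print(repr(s), dfa_accepts(s), IDENTIFIER.fullmatch(s) is not None)
```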

THANK YOU

Date post:	16-Jul-2015
Category:	Education
Upload:	gagan019