Chapter - 2

Data Structure for Language Processing

Classification of Data structures

• A language processor makes frequent use of search operations over its data structures.

• The data structures used in language processing can be classified on the basis of the following criteria:
1. Nature of a DS – whether a linear or nonlinear DS.
2. Purpose of a DS – whether a search DS or an allocation DS.
3. Lifetime of a DS – whether used during language processing or during target program execution.

Linear data structure

• A linear DS consists of a linear arrangement of elements in memory.

• A linear DS requires a contiguous area of memory for its elements, as shown in fig(a).

• Problem: In situations where the size of a data structure is difficult to predict, a designer is forced to overestimate the memory requirements of a linear DS. This leads to wastage of memory.

Nonlinear data structure

• The elements of a nonlinear DS need not occupy contiguous areas of memory, which avoids the memory allocation problem seen in the context of linear DS.

• fig(b) shows allocation to four nonlinear data structures, i.e. E, F, G and H, where F is stored in three different memory areas.

• A nonlinear arrangement of elements leads to lower search efficiency.

Search Data Structure

• A search DS (or search structure) is a set of entries, each entry accommodating the information concerning one entity in the source program.

• Each entry is assumed to contain a key field which forms the basis for a search operation.

• Search DSs are used to construct various tables of information.

• Search DSs are used during language processing to maintain attribute information of the different entities in the source program.


Entry formats:
• An entry consists of two parts, a fixed part and a variant part. Each part consists of a set of fields.
• The fields of the fixed part exist in every entry of the search structure.
• The value in the tag field of the fixed part determines the information to be stored in the variant part of the entry.

• For example, entries in the symbol table of a compiler have the following fields:
1. Fixed part: the fields symbol and class (class is the tag field).
2. Variant part: its fields depend on the value of class, i.e. on whether the symbol is a variable, an operator, a procedure name, a function name, etc.
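To make the fixed/variant split concrete, here is a minimal Python sketch of one symbol table entry. The variant classes `VariableInfo` and `ProcedureInfo` and their fields are hypothetical illustrations, not fields prescribed by the text; the tag field is named `cls` only because `class` is a Python keyword.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical variant parts; a real compiler chooses its own fields.
@dataclass
class VariableInfo:
    data_type: str
    length: int
    address: int

@dataclass
class ProcedureInfo:
    address: int
    num_params: int

@dataclass
class SymbolEntry:
    symbol: str                                   # fixed part: key field
    cls: str                                      # fixed part: tag field
    variant: Union[VariableInfo, ProcedureInfo]   # variant part

    def __post_init__(self):
        # The tag decides which variant part is legal for this entry.
        expected = {"variable": VariableInfo, "procedure": ProcedureInfo}[self.cls]
        if not isinstance(self.variant, expected):
            raise TypeError(f"class {self.cls!r} requires {expected.__name__}")

entry = SymbolEntry("alpha", "variable", VariableInfo("int", 4, 2000))
```

Every entry carries `symbol` and `cls`; only the variant part changes shape, which is what lets one table hold entities of different kinds.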


Algorithm (Generic search procedure):
1. Make a prediction concerning the entry of the search DS which symbol s may be occupying. Let this be entry e.
2. Let se be the symbol occupying the eth entry. Compare s with se. Exit with success if the two match.
3. Repeat steps 1 and 2 till it can be concluded that the symbol does not exist in the search DS.

• Each comparison in step 2 is called a probe. The following notation is used:
Ps : number of probes in a successful search
Pu : number of probes in an unsuccessful search
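The generic procedure can be sketched in Python with the prediction strategy passed in as a parameter. Modelling "prediction" as a sequence of entry numbers is an assumption made for illustration; each table organization supplies its own sequence. The function counts probes so that Ps and Pu can be observed directly.

```python
def generic_search(table, s, predictions):
    """Generic search sketch.
    table: list where table[e] is the symbol in entry e.
    predictions: entry numbers to try, in order (hypothetical parameter).
    Returns (e, probes) on success, (None, probes) on failure."""
    probes = 0
    for e in predictions:        # step 1: predict entry e
        se = table[e]
        probes += 1              # step 2: each comparison of s with se is a probe
        if se == s:
            return e, probes
    return None, probes          # step 3: no predictions left, s is absent

symbols = ["x", "y", "z"]
found = generic_search(symbols, "z", range(len(symbols)))
```

With `range(len(table))` as the prediction sequence this degenerates into sequential search.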


Operations on search structure:
• The following operations are performed on a search structure:
1. Operation add: Add the entry of a symbol.
2. Operation search: Search for and locate the entry of a symbol.
3. Operation delete: Delete the entry of a symbol.

• The entry for a symbol is created only once, but may be searched for a large number of times during the processing of a program.


Table organization:
• A table is a linear data structure. Two points can be made concerning a table as a search structure:
– Given the location of an entry, it is easy to move to the next or previous entry of the table, which is convenient for a search technique.
– Tables use the fixed-length entry organization, so the address of an entry in a table can be determined from its entry number.
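The fixed-length entry point can be illustrated with Python's `struct` module: every entry is packed into the same number of bytes, so entry e starts at offset (e - 1) × entry length. The 8-byte name plus 4-byte integer layout is a hypothetical choice for the sketch.

```python
import struct

# Hypothetical fixed-length entry: 8-byte symbol name + 4-byte attribute.
ENTRY_FORMAT = "<8si"                      # "<" = standard sizes, no padding
ENTRY_LEN = struct.calcsize(ENTRY_FORMAT)  # 12 bytes

def entry_address(base, e):
    """Address of entry number e (1-based) in a table starting at base."""
    return base + (e - 1) * ENTRY_LEN

def read_entry(table, e):
    """Decode entry e straight from its computed offset, with no scanning."""
    name, attr = struct.unpack_from(ENTRY_FORMAT, table, entry_address(0, e))
    return name.rstrip(b"\0").decode(), attr

table = bytearray()
for name, attr in [("alpha", 10), ("beta", 20), ("gamma", 30)]:
    table += struct.pack(ENTRY_FORMAT, name.encode(), attr)

second = read_entry(table, 2)
```

Because the offset is pure arithmetic, moving to the next or previous entry is just adding or subtracting `ENTRY_LEN`.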

• The three main types of table organization are:
1. Sequential search organization
2. Binary search organization
3. Hash table organization


Sequential search organization:
• It uses the generic search procedure to search for a symbol in the table.
• Fig. shows a typical state of a table using the sequential search organization.

Sequential search organization (operations):
• Search for a symbol:
Ps = f/2 for a successful search (on average)
Pu = f for an unsuccessful search
where f is the number of occupied entries in the table.

• Add a symbol:

The symbol is added to the first free entry in the table, i.e. entry f + 1. The value of f is updated accordingly.

• Delete a symbol:
1. Physical deletion: The entry is deleted by erasing or overwriting it. Thus, if the dth entry is to be deleted, entries d+1 to f are shifted 'up' by one entry each. This requires (f - d) shift operations in the symbol table.

2. Logical deletion: It is performed by adding some information to the entry to indicate its deletion. This can be implemented by introducing a new field that indicates whether an entry is active or deleted.
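A minimal Python sketch of the sequential organization, assuming entries are kept in a Python list (positions 1..f stored 0-based) and that an `active` flag per entry is the "new field" used for logical deletion. Physical deletion performs the shifts explicitly so their count can be compared with (f - d).

```python
class SequentialTable:
    """Sketch of a sequentially organized table.
    Each entry is [symbol, attributes, active]."""

    def __init__(self):
        self.entries = []

    @property
    def f(self):
        return len(self.entries)          # number of occupied entries

    def add(self, symbol, attrs=None):
        self.entries.append([symbol, attrs, True])   # first free entry; f grows by 1

    def search(self, symbol):
        """Return (index, probes); index is None when the search fails."""
        probes = 0
        for i, (sym, _, active) in enumerate(self.entries):
            probes += 1                   # one comparison of s with se
            if active and sym == symbol:
                return i, probes
        return None, probes               # every occupied entry was examined

    def delete_physical(self, symbol):
        """Remove the entry, shift later entries up, return the shift count."""
        i, _ = self.search(symbol)
        if i is None:
            return 0
        shifts = 0
        for k in range(i, self.f - 1):    # entries d+1..f move up by one
            self.entries[k] = self.entries[k + 1]
            shifts += 1
        self.entries.pop()                # f decreases by 1
        return shifts                     # equals f - d for 1-based d = i + 1

    def delete_logical(self, symbol):
        """Mark the entry as deleted instead of moving anything."""
        i, _ = self.search(symbol)
        if i is not None:
            self.entries[i][2] = False
```

Logical deletion avoids the (f - d) shifts, at the cost that deleted entries are still probed by later searches.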


Binary search organization:
• All entries in the table are assumed to satisfy an ordering relation (e.g. the entries are sorted in ascending order of their symbols).

Algorithm (Binary search):
1. start := 1; end := f;
2. While start <= end

a) e := (start + end)/2, taking the integer part. Exit with success if s = se.

b) If s < se then end := e - 1; else start := e + 1;

3. Exit with failure.
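The binary search steps translate almost line for line into Python. The sketch keeps the algorithm's 1-based entry numbers and converts to a 0-based list index only when reading se; counting one probe per loop iteration is a simplifying assumption.

```python
def binary_search(table, s):
    """table: symbols sorted in ascending order.
    Returns (e, probes) with e a 1-based entry number, or (None, probes)."""
    start, end = 1, len(table)            # step 1: start := 1; end := f
    probes = 0
    while start <= end:                   # step 2
        e = (start + end) // 2            # 2a: integer part of the midpoint
        se = table[e - 1]                 # entry e lives at 0-based index e - 1
        probes += 1
        if s == se:
            return e, probes
        if s < se:                        # 2b: continue in the lower half
            end = e - 1
        else:                             #     or in the upper half
            start = e + 1
    return None, probes                   # step 3: exit with failure

symbols = ["a", "c", "e", "g", "i", "k", "m"]
hit = binary_search(symbols, "k")
```

Each probe halves the remaining range, which is why the ordering relation on entries is required.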


Hash table organization:
• The prediction of the entry that s may occupy depends on the value of s itself; it is computed by a hashing function h.
• Three possibilities exist for the predicted entry:

1. The entry may be occupied by s,
2. The entry may be occupied by some other symbol, or
3. The entry may be empty.

• Algorithm (Hash table organization):
1. e := h(s)
2. Exit with success if s = se, and with failure if entry e is unoccupied.
3. Otherwise (entry e is occupied by some other symbol), repeat steps 1 and 2 with a different hashing function (e.g. a multiplication or division function).
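A Python sketch of hash table search with rehashing. The family of hashing functions is an assumption: the i-th function is h_i(s) = (key(s) + i) mod n, i.e. the division method plus a linear offset (linear rehashing), chosen because it eventually visits all n entries. The key is derived from the symbol's bytes because Python's built-in `hash()` for strings varies between runs.

```python
class HashTable:
    """Sketch of hash table organization with rehashing.
    Assumed hashing functions: h_i(s) = (key(s) + i) mod n."""

    def __init__(self, n):
        self.n = n
        self.slots = [None] * n           # None marks an unoccupied entry

    @staticmethod
    def key(s):
        return int.from_bytes(s.encode(), "big")   # deterministic integer key

    def h(self, s, i):
        return (self.key(s) + i) % self.n

    def search(self, s):
        """Return (e, entries_examined); e is None on failure."""
        for i in range(self.n):
            e = self.h(s, i)              # step 1: e := h(s)
            se = self.slots[e]
            if se == s:                   # step 2: success
                return e, i + 1
            if se is None:                # step 2: failure, entry unoccupied
                return None, i + 1
            # step 3: occupied by another symbol, try the next function
        return None, self.n               # table full and s absent

    def add(self, s):
        for i in range(self.n):
            e = self.h(s, i)
            if self.slots[e] is None or self.slots[e] == s:
                self.slots[e] = s
                return e
        raise OverflowError("hash table is full")

t = HashTable(7)
for sym in ["a", "b", "h"]:
    t.add(sym)
```

Here "a" and "h" both hash to entry 6 under h_0, so "h" ends up in entry 1 after two rehashes.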

Allocation Data Structure

We will discuss two allocation data structures: the stack (linear) and the heap (nonlinear).

Stack:
A stack is a linear data structure which has the following properties:

1. Allocations and deallocations are performed in a last-in-first-out (LIFO) manner.

2. Only the last entry is accessible at any time.

The following fig. illustrates the stack allocation and deallocation process.

Extended stack:
• The simple stack model sometimes needs to be extended because all entities may not be of the same size. The size of an entity is assumed to be an integral multiple of the size of a stack entry.

• The following figure shows the extended stack model. In addition to SB (stack base) and TOS (top of stack), two new pointers exist in the model:
– A record base pointer (RB) pointing to the first word of the last record in the stack.
– The first word of each record is a reserved pointer. This pointer is used for housekeeping purposes, as explained below.

Fig. Extended stack model: (b) allocation, (c) deallocation

• Allocation time actions:

No.  Statement
1.   TOS := TOS + 1;
2.   TOS* := RB;
3.   RB := TOS;
4.   TOS := TOS + n;

where n is the number of words needed by the entity being allocated, and X* denotes the word pointed to by X.

• Deallocation time actions:

No.  Statement
1.   TOS := RB - 1;
2.   RB := RB*;
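A small Python simulation of the extended stack shows how the reserved pointer chains records together. Assumptions for the sketch: memory is a Python list of words, addresses are list indices, SB = 0, and an empty stack has TOS = RB = -1. On deallocation, the old RB is read from the reserved pointer at the current record base, i.e. the word that allocation step 2 wrote.

```python
class ExtendedStack:
    """Sketch of the extended stack model on simulated memory."""

    def __init__(self, size):
        self.mem = [None] * size
        self.SB = 0
        self.TOS = self.SB - 1            # empty stack
        self.RB = self.SB - 1             # no record yet

    def allocate(self, n):
        """Push a record for an entity needing n words; return its RB."""
        if self.TOS + 1 + n >= len(self.mem):
            raise MemoryError("stack overflow")
        self.TOS += 1                     # 1. TOS := TOS + 1
        self.mem[self.TOS] = self.RB      # 2. TOS* := RB (save old RB)
        self.RB = self.TOS                # 3. RB := TOS
        self.TOS += n                     # 4. TOS := TOS + n
        return self.RB

    def deallocate(self):
        """Pop the last record."""
        if self.RB < self.SB:
            raise IndexError("stack is empty")
        self.TOS = self.RB - 1            # 1. TOS := RB - 1
        self.RB = self.mem[self.RB]       # 2. restore RB from the reserved pointer

s = ExtendedStack(20)
first = s.allocate(3)                     # record at 0..3
second = s.allocate(2)                    # record at 4..6
```

Popping the second record restores RB to 0 and TOS to 3, exactly the state after the first allocation.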

Heap
• A heap is a nonlinear DS which permits allocation and deallocation of entities in a random order.
• An allocation request returns a pointer to the allocated area in the heap.
• A deallocation request must present a pointer to the area to be deallocated.
• Memory management for a heap thus consists of:
1. Identifying the free memory areas (or holes).
2. Reusing free memory areas.

Identifying the free memory areas:
• Two popular techniques used to identify free memory areas are:

1. Reference counts
2. Garbage collection

Reference Counts
• In the reference count technique, the system associates a reference count with each memory area to indicate the number of its active users.

• The count is incremented when a new user gains access to the area and decremented when a user finishes using it. The area is known to be free when its reference count drops to zero.

• Advantage: the reference count technique is simple to implement.
• Disadvantage: incremental overheads, i.e. overheads are incurred at every allocation and deallocation.
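A minimal Python sketch of reference counting, assuming memory areas are identified by hypothetical integer handles and that freed areas are simply appended to a free list. Every gain or release of access touches a counter, which is the incremental overhead noted above.

```python
class RefCountedHeap:
    """Sketch of reference counting over integer area handles."""

    def __init__(self):
        self.counts = {}
        self.free_list = []
        self.next_id = 0

    def allocate(self):
        area = self.next_id
        self.next_id += 1
        self.counts[area] = 1             # the requester is the first user
        return area

    def acquire(self, area):
        self.counts[area] += 1            # a new user gains access

    def release(self, area):
        self.counts[area] -= 1            # a user finishes using the area
        if self.counts[area] == 0:        # no active users: the area is free
            del self.counts[area]
            self.free_list.append(area)

heap = RefCountedHeap()
a = heap.allocate()
heap.acquire(a)                           # two users now
heap.release(a)                           # one user left, still in use
```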

Garbage Collection:
• Garbage collection makes two passes over the memory to identify unused areas.

• In the first pass, it traverses all pointers pointing to allocated areas and marks the memory areas which are in use.

• The second pass finds all unmarked areas and declares them to be free.

• Garbage collection overheads are not incremental. They are incurred every time the system runs out of free memory to allocate to fresh requests.
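The two passes can be sketched as a mark-and-sweep function. The heap model is hypothetical: a dict mapping each area id to the ids it points to, plus a list of root areas that the program reaches directly.

```python
def mark_and_sweep(areas, roots):
    """areas: dict area id -> list of area ids it points to.
    roots: area ids directly reachable by the program.
    Returns the sorted list of areas declared free."""
    # Pass 1 (mark): follow pointers from the roots, marking areas in use.
    marked = set()
    stack = list(roots)
    while stack:
        a = stack.pop()
        if a in marked:
            continue
        marked.add(a)
        stack.extend(areas[a])
    # Pass 2 (sweep): every unmarked area is declared free.
    return sorted(a for a in areas if a not in marked)

heap = {1: [2], 2: [3], 3: [], 4: [5], 5: [4], 6: []}
free = mark_and_sweep(heap, roots=[1])
```

Areas 4 and 5 point to each other, so their reference counts would never drop to zero, yet the mark pass finds them unreachable and the sweep frees them.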

Reuse of memory:
• When a free list is used, two techniques can be used to perform a fresh allocation:
1. First fit technique: Select the first free area whose size is >= n words, where n is the number of words to be allocated.

2. Best fit technique: Select the smallest free area whose size is >= n words.
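Both techniques can be sketched over a free list of `(start, size)` pairs. The representation, and the choice to leave any leftover part of a chosen area on the free list, are assumptions made for the illustration.

```python
def first_fit(free_list, n):
    """Index of the first free area with size >= n, or None."""
    for i, (_, size) in enumerate(free_list):
        if size >= n:
            return i
    return None

def best_fit(free_list, n):
    """Index of the smallest free area with size >= n, or None."""
    best = None
    for i, (_, size) in enumerate(free_list):
        if size >= n and (best is None or size < free_list[best][1]):
            best = i
    return best

def allocate(free_list, n, strategy):
    """Carve n words out of the chosen area; return its start address or None."""
    i = strategy(free_list, n)
    if i is None:
        return None
    start, size = free_list[i]
    if size == n:
        free_list.pop(i)                  # exact fit: area leaves the free list
    else:
        free_list[i] = (start + n, size - n)   # keep the leftover hole
    return start

holes = [(0, 10), (20, 4), (40, 6)]
```

For n = 5, first fit picks the 10-word area at address 0, while best fit picks the 6-word area at address 40 and leaves a smaller 1-word hole.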

Date post:	22-Feb-2016
Upload:	jodie