Fast Class Rendering Using Multiresolution Classification in Discrete Cosine Transform Domain


1

Fast Class Rendering Using Multiresolution Classification in

Discrete Cosine Transform Domain

Presented by Li-Jen Kao

July, 2005

2

Outline

Introduction
Feature Extraction
Classification Scheme
Experimental Results
Conclusion

3

1 Introduction

Classification of objects (or patterns) into a number of predefined classes has been extensively studied in a wide variety of applications, such as:
optical character recognition (OCR)
speech recognition
face recognition

We may consider the design of classification systems in terms of two subproblems:
feature extraction
classification

4

Feature extraction:
Features are functions of the measurements performed on a class of objects.
Feature extraction has not found a general solution in most applications.
Our purpose is to design a general classification scheme that is less dependent on domain-specific knowledge.
Reliable and general features are therefore required.

5

Discrete Cosine Transform (DCT)

It helps separate an image into parts of differing importance with respect to the image's visual quality.

Due to the energy compacting property of DCT, much of the signal energy has a tendency to lie at low frequencies.

6

Four advantages in applying DCT

The features extracted by DCT are general and reliable, and can be applied to most vision-oriented applications.

The amount of data to be stored can be reduced tremendously.

Multiresolution classification and progressive matching can be achieved naturally.

The DCT is scale-invariant and less sensitive to noise and distortion.

7

Two philosophies of classification

Statistical:
the measurements that describe an object are treated only formally as statistical variables, neglecting their "meaning".

Structural:
regards objects as compositions of structural units, usually called primitives.

8

2 Feature Extraction via DCT

The DCT coefficients C(u, v) of an N×N image represented by x(i, j) can be defined as

C(u,v) = \frac{2}{N}\,\alpha(u)\,\alpha(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x(i,j)\, \cos\!\left(\frac{(2i+1)u\pi}{2N}\right) \cos\!\left(\frac{(2j+1)v\pi}{2N}\right)

where

\alpha(w) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } w = 0, \\ 1 & \text{otherwise.} \end{cases}
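As a sanity check, the definition above can be implemented directly. A minimal NumPy sketch (the function name dct2 is ours):

```python
import numpy as np

def dct2(x):
    # Direct implementation of the N×N DCT definition above.
    N = x.shape[0]
    alpha = lambda w: 1.0 / np.sqrt(2.0) if w == 0 else 1.0
    C = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            s = 0.0
            for i in range(N):
                for j in range(N):
                    s += (x[i, j]
                          * np.cos((2 * i + 1) * u * np.pi / (2 * N))
                          * np.cos((2 * j + 1) * v * np.pi / (2 * N)))
            C[u, v] = (2.0 / N) * alpha(u) * alpha(v) * s
    return C
```

With this normalization the transform is orthonormal, so total signal energy is preserved, which is what makes the energy-compaction argument meaningful.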

9

Figure 1. The DCT coefficients of the character image "為".

10

Figure 2. Illustration of the multiresolution ability of DCT:
(a) the original image of size 48×48; (b) the reconstructed image of size 8×8; (c) the reconstructed image of size 16×16; (d) the reconstructed image of size 32×32.
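The effect shown in Figure 2 can be reproduced by keeping only the top-left k×k block of DCT coefficients and inverting the transform. A hedged sketch using an orthonormal DCT basis matrix (helper names are ours):

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal 1-D DCT-II basis matrix T, so the 2-D DCT is T @ x @ T.T.
    T = np.zeros((N, N))
    for u in range(N):
        a = np.sqrt(1.0 / N) if u == 0 else np.sqrt(2.0 / N)
        for i in range(N):
            T[u, i] = a * np.cos((2 * i + 1) * u * np.pi / (2 * N))
    return T

def reconstruct_lowres(x, k):
    # Keep only the k×k low-frequency block and invert the transform,
    # giving a lower-resolution approximation of the original image.
    N = x.shape[0]
    T = dct_matrix(N)
    C = T @ x @ T.T
    Ck = np.zeros_like(C)
    Ck[:k, :k] = C[:k, :k]
    return T.T @ Ck @ T
```

Keeping all N×N coefficients (k = N) recovers the image exactly; smaller k gives the progressively coarser reconstructions of Figure 2.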

11

3. The Proposed Classification Scheme

The ultimate goal of classification is to classify an unknown pattern x to one of M possible classes (c1, c2,…, cM).

Each pattern is represented by a set of D features, viewed as a D-dimensional feature vector.

12

3.1. Our classification model

In the training mode:
the feature extraction module finds the appropriate features for representing the input patterns, and the classifier is trained.

In the classification mode:
the trained classifier assigns the input pattern to one of the pattern classes based on the measured features.

13

To alleviate the burden of the classification process, the process is usually divided into two stages:
Coarse Classification
Fine Classification

14

Figure 3. Model for multiresolution classification

15

3.2. Coarse classification module

In the training mode:
The features of each training sample are first extracted by DCT and quantized.
Then the D most significant quantized DCT features of each training sample are transformed to a code, called a grid code (GC), which corresponds to a grid of the feature space partitioned by the quantization method.

The training samples with the same GC are similar and can be classified into a coarse class.

Therefore, the information about all possible GCs is gathered in the training mode.

16

In the classification mode:
The classes with the same GC as that of the test sample are chosen as the candidates for the test sample.

17

3.2.1. Quantization

The 2-D DCT coefficient F(u,v) is quantized to F'(u,v) with quantization step Q according to the following equation:

F'(u,v) = \left\lfloor \frac{F(u,v)}{Q} \right\rfloor

Most of the high-frequency coefficients will be quantized to zero and only the most significant coefficients will be retained.

18

3.2.2. Grid Code Transformation

After the quantization process, the D most significant quantized DCT features of sample Oi are obtained, say [qi1, qi2, ..., qiD].

The significance of each DCT coefficient is decided according to the following zigzag order: F(0,0), F(0,1), F(1,0), F(2,0), F(1,1), F(0,2), F(0,3), F(1,2), F(2,1), F(3,0), F(3,1), …, and so on.

Because the value of qij may be negative, for ease of operation we transform qij to a positive integer dij by adding a number, say kj, to qij.

In this way, object Oi can be transformed to a D-digit GC.

This process is called the grid code transformation (GCT).
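A minimal sketch of the GCT as described above. D, Q, and the offset k are assumed parameters (a single shared offset rather than per-digit constants kj), and the zigzag order follows the list above:

```python
import numpy as np

# First entries of the zigzag order given above.
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1),
          (0, 2), (0, 3), (1, 2), (2, 1), (3, 0), (3, 1)]

def grid_code(C, D=4, Q=16, k=8):
    # Quantize the D most significant DCT coefficients and shift each
    # quantized value q by the offset k so every digit is non-negative.
    digits = []
    for (u, v) in ZIGZAG[:D]:
        q = int(round(C[u, v] / Q))
        digits.append(q + k)
    return tuple(digits)
```

Two samples fall into the same coarse class exactly when their grid codes are equal.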

19

3.2.3. Grid Code Sorting and Elimination

After the GCT, we obtain a list of triplets (Ti, Ci, GCi), where
Ti is the ID of a training sample,
Ci is the ID of the class the training sample belongs to, and
GCi is the grid code of the training sample.

Then the list is sorted in ascending order of GC.

Given the GC of a test sample, we can get a list of candidate classes of the same GC for the test sample.

20

Elimination of Redundancy

Redundancy occurs when training samples belonging to the same class have the same GC.

This redundancy can be eliminated by establishing an abstract lookup table that only contains the information about the GCs and their corresponding classes.

Then, given a GC, this table can return the relevant classes very quickly via binary search.
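The abstract lookup table can be sketched as a sorted list of unique (GC, class) pairs queried by binary search (function names are ours):

```python
from bisect import bisect_left

def build_table(triplets):
    # triplets: (sample_id, class_id, gc). Duplicate (gc, class) pairs
    # are collapsed, then the table is sorted by GC for binary search.
    return sorted({(gc, c) for (_, c, gc) in triplets})

def candidates(table, gc):
    # Binary-search the first entry with this GC, then scan forward
    # to collect every class associated with it.
    keys = [entry[0] for entry in table]
    i = bisect_left(keys, gc)
    out = []
    while i < len(table) and table[i][0] == gc:
        out.append(table[i][1])
        i += 1
    return out
```

The collapse into a set is exactly the redundancy elimination described above: many samples of one class map to a single table entry.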

21

3.3. The fine classification module

Progressive matching method
Adding more DCT coefficients usually implies increasing the resolution level of an image.
If the current resolution is not high enough to distinguish one character from the others, we raise the resolution level so that the discrimination power is also improved.

The establishment of the templates for each class
Templates are established in the DCT domain. The average DCT coefficients of size N×N are obtained from the set of training samples of each class.
In this way, M sets of average DCT coefficients are obtained and serve as the templates for the classes.
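Building the per-class templates amounts to averaging the training samples' DCT coefficients class by class. A small sketch (names are ours):

```python
import numpy as np

def build_templates(samples):
    # samples: iterable of (class_id, dct_coeffs). The template of a
    # class is the mean of its training samples' DCT coefficients.
    sums, counts = {}, {}
    for class_id, coeffs in samples:
        sums[class_id] = sums.get(class_id, 0) + np.asarray(coeffs, dtype=float)
        counts[class_id] = counts.get(class_id, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}
```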

22

The sum of squared differences (SSD) is used as the matching criterion.

The matching of x and Ti is decomposed into K iterations, each of which corresponds to the matching under the block of size nk×nk.

After the kth iteration, the block size is enlarged from nk×nk to nk+1×nk+1 (nk+1 = nk+d).

The process is repeated until one of the stopping criteria is satisfied. The criteria are designed:

1) to preserve enough signal energy in the block, and 2) to reject unqualified classes as soon as possible.
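The iteration above can be sketched as follows. The block-size schedule and the rejection rule (keep the better half of the candidates at each level) are our assumptions, not the paper's exact criteria:

```python
import numpy as np

def progressive_match(x, templates, sizes=(4, 8, 16), keep=0.5):
    # x: DCT coefficients of the unknown pattern; templates: class_id ->
    # template DCT coefficients. At each level, match under the n×n
    # low-frequency block using SSD, then reject the worst candidates
    # before raising the resolution.
    cand = list(templates)
    for n in sizes:
        ssd = {c: float(np.sum((x[:n, :n] - templates[c][:n, :n]) ** 2))
               for c in cand}
        cand.sort(key=lambda c: ssd[c])
        cand = cand[:max(1, int(len(cand) * keep))]
        if len(cand) == 1:
            break
    return cand[0]
```

Most improbable classes are thus eliminated at low resolution, where each SSD involves only a few coefficients.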

23

4 Experimental Results

18,600 samples (about 640 categories) were extracted from the Kin-Guan (金剛) Bible.
Each character image was transformed into a 48×48 bitmap.
1,000 of the 18,600 samples were used for testing and the others were used for training.
The D most significant DCT coefficients were quantized and transformed to a GC for each sample.

24

Figure 4. Reduction and accuracy rate using our coarse classification scheme

25

Figure 5. Accuracy rate using both coarse and fine classification

26

5 Conclusions

This paper presents a multiresolution classification scheme based on DCT for vision-based applications.

The DCT features of a pattern can be extracted progressively according to their significance.

On classifying an unknown object, most of the improbable candidate classes for the object can be eliminated at lower resolution levels.

Experiments were conducted for recognizing handwritten characters in Chinese palaeography and showed that our approach performs well in this application domain.

27

Future Work

Since only preliminary experiments have been made to test our approach, much work remains to improve this system.
For example, since features of different types complement one another in classification performance, using different types of vision-oriented features simultaneously could improve classification accuracy.