Date posted: 31-Oct-2014
Hindi Scene Text Recognition
Guide: Dr. Gaurav Harit
Surya Yadav, Vikas Yadav, Vikas Goyal
Objective: Create a system that detects and recognizes characters from natural scene images containing Devanagari text.
Motivation: Hindi is the most spoken language in India and the third most spoken language in the world. Most websites in Devanagari use images to represent text; such images need to be indexed based on the text in them so that they can be easily searched.
Tourists often face language problems in India, so there is demand for an automated system that understands natural scene images and provides translated information.
Scene text such as shop names, company names, traffic information, road signs and other natural scene boards is important to recognize and process.
Steps:
1. Natural scene image
2. Text block detection
3. Word and character segmentation
4. Error correction
5. Feature detection and classification
6. Output
Text Block Detection
Steps:
1. Convert the image to a grayscale image
2. Compute the Canny edge map
3. Apply morphological closing
4. Extract connected component regions
5. Verify uniform stroke thickness
6. Apply script-specific rules
7. Use similarity measures to find text regions missed in the previous steps
(Figures: input image and its gray image)
Canny Edge Map: We compute the Canny edge map of the gray image so as to obtain the connected components.
Distance Transform of a binary image
Each pixel in the image is set to a value equal to its distance from the nearest background pixel.
Computation of Stroke Thickness
For each pixel with a non-zero value in the distance-transformed image, if the pixel is a local maximum in the 3x3 window centered at it, we store its value in a list.
We compute the mean and standard deviation of the values in the list. If the mean is greater than twice the standard deviation, we decide that the thickness of the underlying stroke is nearly uniform, select the sub-image as a candidate text region, and draw its bounding box.
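The uniformity test can be sketched in a few lines of NumPy. This is a minimal version under the assumption that the distance map is already computed (e.g. with OpenCV's `cv2.distanceTransform`); the function name and the `ratio` parameter are our own:

```python
import numpy as np

def stroke_is_uniform(dist, ratio=2.0):
    """Decide whether the stroke underlying a distance-transformed
    sub-image has nearly uniform thickness.

    dist -- 2-D array: each pixel's distance to the nearest
            background pixel (the distance transform).

    A non-zero pixel is kept when no neighbour in its 3x3 window is
    larger (a local maximum); the region is accepted when the mean of
    the kept values exceeds `ratio` times their standard deviation.
    """
    h, w = dist.shape
    ridge = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = dist[y, x]
            if v > 0 and v >= dist[y - 1:y + 2, x - 1:x + 2].max():
                ridge.append(v)
    if not ridge:
        return False
    vals = np.array(ridge, dtype=float)
    return vals.mean() > ratio * vals.std()
```

For a stroke of constant width the ridge values are all equal, the standard deviation is zero, and the test passes; widely varying ridge values fail it.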
Condition based on geometry
For each region selected in the previous step, we first test it against this set of rules:
1. The aspect ratio of the text region should lie between 0.1 and 10.
2. Neither the height nor the width of the candidate text region may be larger than half of the corresponding dimension of the input image.
3. The height of the candidate text region should be greater than 10 pixels.
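The three rules translate directly into a filter function. A sketch, assuming boxes are given as (x, y, w, h) tuples (the representation is our choice, not the slide's):

```python
def passes_geometry_rules(box, img_w, img_h):
    """Apply the three geometric rules to a candidate text region.
    box   -- (x, y, w, h) bounding box of the candidate region
    img_w, img_h -- size of the input image
    Thresholds are the ones stated on the slide."""
    x, y, w, h = box
    aspect = w / float(h)
    if not (0.1 <= aspect <= 10.0):      # rule 1: aspect ratio in [0.1, 10]
        return False
    if w > img_w / 2 or h > img_h / 2:   # rule 2: at most half the image size
        return False
    return h > 10                        # rule 3: taller than 10 pixels
```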
Overlapping problem
Many of the bounding boxes overlapped with each other. The overlap between the bounding boxes of two adjacent text regions should not be greater than 30% of either box.
To solve this, we merge each pair of bounding boxes whose intersection area is greater than the threshold.
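The merge step can be sketched as repeated pairwise merging. Measuring the overlap as a fraction of the smaller box is our reading of "30% of either"; the helper names are our own:

```python
def overlap_ratio(a, b):
    """Intersection area divided by the smaller box's area.
    Boxes are (x, y, w, h) tuples."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return ix * iy / float(min(aw * ah, bw * bh))

def union_box(a, b):
    """Smallest box containing both a and b."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x0, y0 = min(ax, bx), min(ay, by)
    return (x0, y0, max(ax + aw, bx + bw) - x0, max(ay + ah, by + bh) - y0)

def merge_boxes(boxes, thresh=0.3):
    """Repeatedly merge any pair of boxes whose overlap exceeds
    `thresh` (30% per the slide), until no such pair remains."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlap_ratio(boxes[i], boxes[j]) > thresh:
                    boxes[i] = union_box(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```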
(Result after applying the geometry conditions and resolving the overlapping problem)
Sobel Filtering: We now use the Sobel edge detector to find possible horizontal and possible vertical lines.
Detection of headlines: For each region above, we compute the probabilistic Hough transform of the horizontally Sobel-filtered image to obtain the characteristic horizontal headline of Devanagari text.
A necessary condition for a candidate headline is that it lies in the upper half of the bounding box.
Detection of vertical lines
The final decision on which of the possible horizontal lines is the headline is based on the vertical Hough lines.
We compute the vertical lines by applying the Hough transform again with a lower threshold, since they are not as prominent as the horizontal ones.
If the majority of vertical lines lie below a horizontal line, that horizontal line is treated as the headline.
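The decision rule can be sketched as follows. Representing each horizontal line by its y-coordinate and each vertical stroke by the y-coordinate of its top end, and preferring the topmost qualifying line, are simplifying assumptions of ours:

```python
def pick_headline(horizontals, verticals, box_h):
    """Choose the headline among candidate horizontal lines.

    horizontals -- y-coordinates of detected horizontal lines
    verticals   -- y-coordinates of the top end of each vertical line
    box_h       -- height of the bounding box

    A candidate must lie in the upper half of the box, and the
    majority of vertical strokes must start below it; the topmost
    such line is returned (None if there is no qualifying line).
    """
    best = None
    for y in horizontals:
        if y > box_h / 2:                 # necessary condition: upper half
            continue
        below = sum(1 for v in verticals if v > y)
        if below > len(verticals) / 2 and (best is None or y < best):
            best = y
    return best
```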
Detected horizontal and vertical lines
Output image
Character Segmentation (next proposed step): Applying the Sobel filter in only one direction, the vertical, removes the headline from the candidate region. After removing the headline in each bounding box, we segment the word based on vertical histogram analysis.
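Vertical histogram analysis amounts to splitting the word at columns with no text pixels. A minimal sketch, assuming a clean binarized image with text pixels set to 1:

```python
import numpy as np

def segment_columns(binary):
    """Split a binarized word image (text = 1, background = 0) into
    character spans using its vertical projection histogram: columns
    whose sum is zero separate consecutive characters.
    Returns a list of (start, end) column ranges, end exclusive."""
    hist = binary.sum(axis=0)
    spans, start = [], None
    for x, v in enumerate(hist):
        if v > 0 and start is None:
            start = x                    # a new character begins
        elif v == 0 and start is not None:
            spans.append((start, x))     # gap column ends the character
            start = None
    if start is not None:
        spans.append((start, len(hist)))
    return spans
```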
Next Step: Phase II
After headline removal we perform character segmentation on the selected image.
After character segmentation we obtain the individual characters of the Devanagari script.
For each character we then perform character recognition.
Segmentation
Guide: Dr. Gaurav Harit
Vikas Yadav, Vikas Goyal, Surya Yadav
Previous Work
So far we have been able to obtain bounding boxes around words.
Segmentation
Pipeline:
1. Input image
2. RGB normalization where needed
3. Otsu's thresholding on the non-normalized pixels; k-means clustering on the normalized pixels
4. Combine the clusters from both methods
5. Text and background separation
6. Convert text to black and background to white
7. Obtain the thinned image
8. Obtain the skew angle by detecting a near-horizontal line in the upper half of the image
9. Obtain the skew-corrected image
10. Headline detection
11. Character segmentation from the upper and middle-lower zones
12. Baseline detection
13. Character segmentation from the middle and lower zones
Text and Background Detection
Converting the image into a binary image by applying a popular global or local thresholding method cannot segment the text from the background properly. Therefore, we apply a combination of Otsu's thresholding and unsupervised k-means clustering to cluster the different colour regions in the image.
Scene text images are often affected by varying lightness. To handle this, we normalize the RGB values of the image before k-means clustering, but we do not normalize pixels whose RGB values are near gray.
For each pixel we check whether (max(R, G, B) - min(R, G, B)) / max(R, G, B) > 0.2; the threshold 0.2 is selected to filter out pixels with near-gray RGB values. For the pixels not satisfying this criterion, we convert the RGB values to gray and perform Otsu's thresholding. For the pixels satisfying it, RGB normalization is carried out to remove the lightness effect while keeping the colour information intact, and k-means clustering is performed on the normalized values to obtain text and background separately.
Finally, we combine the clusters from Otsu's thresholding and k-means clustering to obtain the text and background clusters.
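The per-pixel split and the Otsu step can be sketched as below; the k-means step is omitted (any standard 2-cluster k-means on the normalized RGB values would follow). Function names are our own, and the histogram-based Otsu implementation is a textbook version, not the slide's code:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold on an array of integer gray values in 0..255:
    choose t maximizing the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mean_all = (hist * np.arange(256)).sum() / total
    best_t, best_var = 0, -1.0
    cum, cum_mean = 0.0, 0.0
    for t in range(256):
        cum += hist[t]
        cum_mean += t * hist[t]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total                       # weight of the lower class
        m0 = cum_mean / cum                    # mean of the lower class
        m1 = (mean_all * total - cum_mean) / (total - cum)
        var = w0 * (1 - w0) * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def split_pixels(rgb):
    """Per-pixel criterion from the slide: (max-min)/max > 0.2 marks a
    colourful pixel (to be RGB-normalized and k-means clustered); the
    rest are near-gray and go to Otsu's thresholding.
    rgb -- H x W x 3 array. Returns a boolean H x W mask."""
    rgb = rgb.astype(float)
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    return np.where(mx > 0, (mx - mn) / np.maximum(mx, 1), 0.0) > 0.2
```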
Skew Correction
Apply a thinning algorithm on the text region to obtain the skeleton image. Use the Hough transform to obtain all line segments in the upper half of the image with slopes less than 65°. If the length of the longest such segment is greater than an empirically selected threshold, it is taken as the headline. If this headline is not parallel to the x-axis, the skew is corrected by rotating the word image.
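The correction angle follows directly from the detected headline segment; a minimal sketch (the segment format is our assumption, and the actual rotation would be done with any image-rotation routine, e.g. OpenCV's `cv2.warpAffine`):

```python
import math

def skew_angle(segment):
    """Angle (in degrees) of a detected headline segment given as
    ((x0, y0), (x1, y1)). Rotating the word image by the negative of
    this angle makes the headline parallel to the x-axis."""
    (x0, y0), (x1, y1) = segment
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```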
(i) Skeleton image obtained for detecting headline for skew correction
Headline Detection
In order to segment the characters we need to detect the thick headline. Compute the projection profile as the row-wise sum of gray values for each row in the upper half of the word image.
Scan the normalized projection profiles of successive rows in the upward direction, starting from the spine (the peak row of the profile), and stop when the value drops below a pre-defined threshold; that row of the word image is taken as the upper boundary of the headline.
Similarly, scan the projection profile values downward from the spine; the row at which the value drops below the same threshold is taken as the lower boundary of the headline.
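A sketch of the boundary scan on a binarized word image. Taking the spine as the maximum-profile row in the upper half, and the 0.5 threshold, are our assumptions (the slide leaves both unspecified):

```python
import numpy as np

def headline_bounds(binary, thresh=0.5):
    """Find the upper and lower rows of the thick headline from the
    row-wise projection profile of a non-empty binarized word image
    (text = 1). Profiles are normalized by the spine's value; the scan
    in each direction stops when the profile drops below `thresh`."""
    profile = binary.sum(axis=1).astype(float)
    upper_half = profile[: binary.shape[0] // 2 + 1]
    spine = int(np.argmax(upper_half))        # assumed spine: peak row
    norm = profile / profile[spine]
    top = spine
    while top > 0 and norm[top - 1] >= thresh:
        top -= 1                              # scan upward
    bottom = spine
    while bottom < len(norm) - 1 and norm[bottom + 1] >= thresh:
        bottom += 1                           # scan downward
    return top, bottom
```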
Character Segmentation
Use the region growing method to extract the individual characters or their parts from the binarized and skew corrected word image.
Locate the lowest and leftmost black pixel in B, and consider it as the seed point for region growing module.
The current segment is extracted using the standard region growing approach based on 8-neighborhood. The stopping criteria for the implementation of region growing is either
(i) reach the upper or lower boundary of the thick headline or (ii) reach at a white pixel. The extraction of the current segment is continued until no pixel is left
to visit satisfying the above.
Appending the local headline
The part of the headline above the extracted segment is appended to it as follows: the top-left and top-right pixels of the segment lie on the lower boundary of the headline, and the portion of the thick headline just above these two pixels is appended to the segment before its extraction.
Repeat until no black pixel is left.
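The seed selection and the 8-neighbourhood growth with the two stopping criteria can be sketched as a BFS flood fill (black pixels encoded as 1; the headline-band representation and function names are our own):

```python
import numpy as np
from collections import deque

def lowest_leftmost_black(B):
    """Seed selection: the lowest, then leftmost, black pixel of B."""
    ys, xs = np.nonzero(B)
    ymax = ys.max()
    return (ymax, xs[ys == ymax].min())

def grow_segment(B, seed, head_top, head_bottom):
    """Extract one character segment from binarized word image B
    (black = 1) by 8-neighbourhood region growing from the black
    `seed` pixel (y, x). Growth stops at white pixels and at the
    headline band of rows [head_top, head_bottom]."""
    h, w = B.shape
    seen = np.zeros_like(B, dtype=bool)
    q = deque([seed])
    seen[seed] = True
    segment = []
    while q:
        y, x = q.popleft()
        segment.append((y, x))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx]:
                    # stopping criteria: white pixel, or headline band
                    if B[ny, nx] == 1 and not (head_top <= ny <= head_bottom):
                        seen[ny, nx] = True
                        q.append((ny, nx))
    return segment
```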
Baseline Detection
The baseline detection module is fed all segments of the middle-lower zone that hang either from the headline or from immediately below it (at most 0.2 times the height of the middle-lower region).
Find the height h_i of each segment and normalize it to h_i', where 0 < h_i' < 10. Now find
    h_min = min{ h_i' | h_i' > 6.0 }
Next, find
    h* = max_i { h_i' | h_i' > h_min and h_i' < floor(h_min) + 1 }
The horizontal line through the bottom-most pixel of the segment with normalized height h* is the baseline.
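The h* selection can be sketched as follows. The exact normalization scheme is not given on the slide, so scaling by the largest height is an assumption, as is the fallback when no height lies strictly inside the (h_min, floor(h_min)+1) band:

```python
import math

def baseline_segment(heights):
    """Pick the segment whose normalized height is h*.

    heights -- raw heights h_i of the middle-lower zone segments.
    They are normalized to h_i' in (0, 10) (assumed: scale by the
    largest height), then
        h_min = min{ h_i' | h_i' > 6.0 }
        h*    = max{ h_i' | h_min < h_i' < floor(h_min) + 1 }
    Returns the index of the h* segment, or None if no segment is
    tall enough."""
    hmax = max(heights)
    norm = [10.0 * h / (hmax + 1) for h in heights]  # keeps h_i' < 10
    tall = [h for h in norm if h > 6.0]
    if not tall:
        return None
    hmin = min(tall)
    band = [(h, i) for i, h in enumerate(norm)
            if hmin < h < math.floor(hmin) + 1]
    if not band:
        return norm.index(hmin)   # fallback: the h_min segment itself
    return max(band)[1]
```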
(i) Input image (ii) Image obtained after applying k-means clustering, Otsu's thresholding and skew correction
(iii) Segments obtained after character segmentation
References
Prakriti Banik, Ujjwal Bhattacharya, Swapan K. Parui. Segmentation of Bangla Words in Scene Images.
U. Bhattacharya, S. K. Parui, and S. Mondal. Devanagari and Bangla text extraction from natural scene images. Proc. of Int. Conf. on Document Analysis and Recognition, pages 171-175, 2009.