Date posted: 31-Oct-2014
Hindi Scene Text Recognition
Guide: Dr. Gaurav Harit
Surya Yadav, Vikas Yadav, Vikas Goyal
Objective: Create a system that detects and recognizes characters from natural scene images containing Devanagari text.
Motivation: Hindi is the most spoken language in India and the third most spoken language in the world. Most websites in Devanagari use images to represent text; such images need to be indexed based on the text in them so that they can be easily searched.
Tourists often face language problems in India, so there is demand for an automated system that understands natural scene images and provides translated information.
Scene text such as shop names, company names, traffic information, road signs and other natural scene boards is important to recognize and process.
Steps:
1. Natural scene image
2. Text block detection
3. Word and character segmentation
4. Error correction
5. Feature detection and classification
6. Output
Text Block Detection
Steps:
1. Convert the image to a grayscale image
2. Compute the Canny edge map
3. Apply morphological closing
4. Extract connected component regions
5. Verify uniform stroke thickness
6. Apply script-specific rules
7. Use similarity measures to find text regions missed in the previous steps
(Figures: input image and its gray image)
Canny Edge Map: We compute the Canny edge map of the gray image so as to obtain the connected components.
Distance Transform of a binary image
Each pixel in the image is set to a value equal to its distance from the nearest background pixel.
Computation of Stroke Thickness
For each pixel with a non-zero value in the distance-transformed image, if the pixel is a local maximum in the 3x3 window centered at it, we store its value in a list.
We compute the mean and standard deviation of the values in the list. If the mean is greater than twice the standard deviation, we decide that the thickness of the underlying stroke is nearly uniform, select the sub-image as a candidate text region, and draw its bounding box.
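The uniformity test can be sketched in a few lines of NumPy. This is a minimal version under the assumption that the distance map is already computed (e.g. with OpenCV's `cv2.distanceTransform`); the function name and the `ratio` parameter are our own:

```python
import numpy as np

def stroke_is_uniform(dist, ratio=2.0):
    """Decide whether the stroke underlying a distance-transformed
    sub-image has nearly uniform thickness.

    dist -- 2-D array: each pixel's distance to the nearest
            background pixel (the distance transform).

    A non-zero pixel is kept when no neighbour in its 3x3 window is
    larger (a local maximum); the region is accepted when the mean of
    the kept values exceeds `ratio` times their standard deviation.
    """
    h, w = dist.shape
    ridge = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = dist[y, x]
            if v > 0 and v >= dist[y - 1:y + 2, x - 1:x + 2].max():
                ridge.append(v)
    if not ridge:
        return False
    vals = np.array(ridge, dtype=float)
    return vals.mean() > ratio * vals.std()
```

For a stroke of constant width the ridge values are all equal, the standard deviation is zero, and the test passes; widely varying ridge values fail it.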
Condition based on geometry
For each region selected in the previous step, we first test it against this set of rules:
1. The aspect ratio of the text region should lie between 0.1 and 10.
2. Neither the height nor the width of the candidate text region may be larger than half of the corresponding dimension of the input image.
3. The height of the candidate text region should be greater than 10 pixels.
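The three rules translate directly into a filter function. A sketch, assuming boxes are given as (x, y, w, h) tuples (the representation is our choice, not the slide's):

```python
def passes_geometry_rules(box, img_w, img_h):
    """Apply the three geometric rules to a candidate text region.
    box   -- (x, y, w, h) bounding box of the candidate region
    img_w, img_h -- size of the input image
    Thresholds are the ones stated on the slide."""
    x, y, w, h = box
    aspect = w / float(h)
    if not (0.1 <= aspect <= 10.0):      # rule 1: aspect ratio in [0.1, 10]
        return False
    if w > img_w / 2 or h > img_h / 2:   # rule 2: at most half the image size
        return False
    return h > 10                        # rule 3: taller than 10 pixels
```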
Overlapping problem
Many of the bounding boxes overlapped with each other. The overlap between the bounding boxes of two adjacent text regions should not be greater than 30% of either box.
To solve this, we merge each pair of bounding boxes whose intersection area is greater than the threshold.
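The merge step can be sketched as repeated pairwise merging. Measuring the overlap as a fraction of the smaller box is our reading of "30% of either"; the helper names are our own:

```python
def overlap_ratio(a, b):
    """Intersection area divided by the smaller box's area.
    Boxes are (x, y, w, h) tuples."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return ix * iy / float(min(aw * ah, bw * bh))

def union_box(a, b):
    """Smallest box containing both a and b."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x0, y0 = min(ax, bx), min(ay, by)
    return (x0, y0, max(ax + aw, bx + bw) - x0, max(ay + ah, by + bh) - y0)

def merge_boxes(boxes, thresh=0.3):
    """Repeatedly merge any pair of boxes whose overlap exceeds
    `thresh` (30% per the slide), until no such pair remains."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlap_ratio(boxes[i], boxes[j]) > thresh:
                    boxes[i] = union_box(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```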
(Result after applying the geometry conditions and resolving the overlapping problem)
Sobel Filtering: We now use the Sobel edge detector to find possible horizontal and possible vertical lines.
Detection of headlines: For each region above, we compute the probabilistic Hough transform of the horizontally Sobel-filtered image to obtain the characteristic horizontal headline of Devanagari text.
A necessary condition for a candidate headline is that it lies in the upper half of the bounding box.
Detection of vertical lines
The final decision on which of the possible horizontal lines is the headline is based on the vertical Hough lines.
We compute the vertical lines by applying the Hough transform again with a lower threshold, since they are not as prominent as the horizontal ones.
If the majority of vertical lines lie below a horizontal line, that horizontal line is treated as the headline.
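The decision rule can be sketched as follows. Representing each horizontal line by its y-coordinate and each vertical stroke by the y-coordinate of its top end, and preferring the topmost qualifying line, are simplifying assumptions of ours:

```python
def pick_headline(horizontals, verticals, box_h):
    """Choose the headline among candidate horizontal lines.

    horizontals -- y-coordinates of detected horizontal lines
    verticals   -- y-coordinates of the top end of each vertical line
    box_h       -- height of the bounding box

    A candidate must lie in the upper half of the box, and the
    majority of vertical strokes must start below it; the topmost
    such line is returned (None if there is no qualifying line).
    """
    best = None
    for y in horizontals:
        if y > box_h / 2:                 # necessary condition: upper half
            continue
        below = sum(1 for v in verticals if v > y)
        if below > len(verticals) / 2 and (best is None or y < best):
            best = y
    return best
```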
Detected horizontal and vertical lines
Output image
Character Segmentation (next proposed step): Applying the Sobel filter in only one direction, the vertical, removes the headline from the candidate region. After removing the headline in each bounding box, we segment the word based on vertical histogram analysis.
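Vertical histogram analysis amounts to splitting the word at columns with no text pixels. A minimal sketch, assuming a clean binarized image with text pixels set to 1:

```python
import numpy as np

def segment_columns(binary):
    """Split a binarized word image (text = 1, background = 0) into
    character spans using its vertical projection histogram: columns
    whose sum is zero separate consecutive characters.
    Returns a list of (start, end) column ranges, end exclusive."""
    hist = binary.sum(axis=0)
    spans, start = [], None
    for x, v in enumerate(hist):
        if v > 0 and start is None:
            start = x                    # a new character begins
        elif v == 0 and start is not None:
            spans.append((start, x))     # gap column ends the character
            start = None
    if start is not None:
        spans.append((start, len(hist)))
    return spans
```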
Next Step: Phase II
After headline removal we perform character segmentation on the selected image.
After character segmentation we obtain the individual characters of the Devanagari script.
For each character we then perform character recognition.
Segmentation
Guide: Dr. Gaurav Harit
Vikas Yadav, Vikas Goyal, Surya Yadav
Previous Work
So far we have been able to obtain bounding boxes around words.
Segmentation
Pipeline:
1. Input image
2. RGB normalization where needed
3. Otsu's thresholding on the non-normalized pixels; k-means clustering on the normalized pixels
4. Combine the clusters from both methods
5. Text and background separation
6. Convert text to black and background to white
7. Obtain the thinned image
8. Obtain the skew angle by detecting a near-horizontal line in the upper half of the image
9. Obtain the skew-corrected image
10. Headline detection
11. Character segmentation from the upper and middle-lower zones
12. Baseline detection
13. Character segmentation from the middle and lower zones
Text and Background Detection
Converting the image into a binary image by applying a popular global or local thresholding method cannot segment the text from the background properly. Therefore, we apply a combination of Otsu's thresholding and unsupervised k-means clustering to cluster the different colour regions in the image.
Scene text images are often affected by varying lightness. To handle this, we normalize the RGB values of the image before k-means clustering, but we do not normalize pixels whose RGB values are near gray.
For each pixel we check whether (max(R, G, B) - min(R, G, B)) / max(R, G, B) > 0.2; the threshold 0.2 is selected to filter out pixels with near-gray RGB values. For the pixels not satisfying this criterion, we convert the RGB values to gray and perform Otsu's thresholding. For the pixels satisfying it, RGB normalization is carried out to remove the lightness effect while keeping the colour information intact, and k-means clustering is performed on the normalized values to obtain text and background separately.
Finally, we combine the clusters from Otsu's thresholding and k-means clustering to obtain the text and background clusters.
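The per-pixel split and the Otsu step can be sketched as below; the k-means step is omitted (any standard 2-cluster k-means on the normalized RGB values would follow). Function names are our own, and the histogram-based Otsu implementation is a textbook version, not the slide's code:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's threshold on an array of integer gray values in 0..255:
    choose t maximizing the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mean_all = (hist * np.arange(256)).sum() / total
    best_t, best_var = 0, -1.0
    cum, cum_mean = 0.0, 0.0
    for t in range(256):
        cum += hist[t]
        cum_mean += t * hist[t]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total                       # weight of the lower class
        m0 = cum_mean / cum                    # mean of the lower class
        m1 = (mean_all * total - cum_mean) / (total - cum)
        var = w0 * (1 - w0) * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def split_pixels(rgb):
    """Per-pixel criterion from the slide: (max-min)/max > 0.2 marks a
    colourful pixel (to be RGB-normalized and k-means clustered); the
    rest are near-gray and go to Otsu's thresholding.
    rgb -- H x W x 3 array. Returns a boolean H x W mask."""
    rgb = rgb.astype(float)
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    return np.where(mx > 0, (mx - mn) / np.maximum(mx, 1), 0.0) > 0.2
```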
Skew Correction
Apply a thinning algorithm on the text region to obtain the skeleton image. Use the Hough transform to obtain all line segments in the upper half of the image with slopes less than 65°. If the length of the longest such segment is greater than an empirically selected threshold, it is taken as the headline. If this headline is not parallel to the x-axis, the skew is corrected by rotating the word image.
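The correction angle follows directly from the detected headline segment; a minimal sketch (the segment format is our assumption, and the actual rotation would be done with any image-rotation routine, e.g. OpenCV's `cv2.warpAffine`):

```python
import math

def skew_angle(segment):
    """Angle (in degrees) of a detected headline segment given as
    ((x0, y0), (x1, y1)). Rotating the word image by the negative of
    this angle makes the headline parallel to the x-axis."""
    (x0, y0), (x1, y1) = segment
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```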
(i) Skeleton image obtained for detecting headline for skew correction
Headline Detection
In order to segment the characters we need to detect the thick headline. Compute the projection profile as the row-wise sum of gray values for each row in the upper half of the word image.
Scan the normalized projection profiles of successive rows in the upward direction, starting from the spine (the peak row of the profile), and stop when the value drops below a pre-defined threshold; that row of the word image is taken as the upper boundary of the headline.
Similarly, scan the projection profile values downward from the spine; the row at which the value drops below the same threshold is taken as the lower boundary of the headline.
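A sketch of the boundary scan on a binarized word image. Taking the spine as the maximum-profile row in the upper half, and the 0.5 threshold, are our assumptions (the slide leaves both unspecified):

```python
import numpy as np

def headline_bounds(binary, thresh=0.5):
    """Find the upper and lower rows of the thick headline from the
    row-wise projection profile of a non-empty binarized word image
    (text = 1). Profiles are normalized by the spine's value; the scan
    in each direction stops when the profile drops below `thresh`."""
    profile = binary.sum(axis=1).astype(float)
    upper_half = profile[: binary.shape[0] // 2 + 1]
    spine = int(np.argmax(upper_half))        # assumed spine: peak row
    norm = profile / profile[spine]
    top = spine
    while top > 0 and norm[top - 1] >= thresh:
        top -= 1                              # scan upward
    bottom = spine
    while bottom < len(norm) - 1 and norm[bottom + 1] >= thresh:
        bottom += 1                           # scan downward
    return top, bottom
```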
Character Segmentation
Use the region growing method to extract the individual characters or their parts from the binarized and skew corrected word image.
Locate the lowest and leftmost black pixel in B, and consider it as the seed point for region growing module.
The current segment is extracted using the standard region growing approach based on 8-neighborhood. The stopping criteria for the implementation of region growing is either
(i) reach the upper or lower boundary of the thick headline or (ii) reach at a white pixel. The extraction of the current segment is continued until no pixel is left
to visit satisfying the above.
Appending the local headline
The part of the headline above the extracted segment is appended to it as follows: the top-left and top-right pixels of the segment lie on the lower boundary of the headline, and the portion of the thick headline just above these two pixels is appended to the segment before its extraction.
Repeat until no black pixel is left.
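The seed selection and the 8-neighbourhood growth with the two stopping criteria can be sketched as a BFS flood fill (black pixels encoded as 1; the headline-band representation and function names are our own):

```python
import numpy as np
from collections import deque

def lowest_leftmost_black(B):
    """Seed selection: the lowest, then leftmost, black pixel of B."""
    ys, xs = np.nonzero(B)
    ymax = ys.max()
    return (ymax, xs[ys == ymax].min())

def grow_segment(B, seed, head_top, head_bottom):
    """Extract one character segment from binarized word image B
    (black = 1) by 8-neighbourhood region growing from the black
    `seed` pixel (y, x). Growth stops at white pixels and at the
    headline band of rows [head_top, head_bottom]."""
    h, w = B.shape
    seen = np.zeros_like(B, dtype=bool)
    q = deque([seed])
    seen[seed] = True
    segment = []
    while q:
        y, x = q.popleft()
        segment.append((y, x))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx]:
                    # stopping criteria: white pixel, or headline band
                    if B[ny, nx] == 1 and not (head_top <= ny <= head_bottom):
                        seen[ny, nx] = True
                        q.append((ny, nx))
    return segment
```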
Baseline Detection
The baseline detection module is fed all segments of the middle-lower zone that hang either from the headline or from immediately below it (at most 0.2 times the height of the middle-lower region).
Find the height h_i of each segment and normalize it to h_i', where 0 < h_i' < 10. Now find
    h_min = min{ h_i' | h_i' > 6.0 }
Next, find
    h* = max_i { h_i' | h_i' > h_min and h_i' < floor(h_min) + 1 }
The horizontal line through the bottom-most pixel of the segment with normalized height h* is the baseline.
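The h* selection can be sketched as follows. The exact normalization scheme is not given on the slide, so scaling by the largest height is an assumption, as is the fallback when no height lies strictly inside the (h_min, floor(h_min)+1) band:

```python
import math

def baseline_segment(heights):
    """Pick the segment whose normalized height is h*.

    heights -- raw heights h_i of the middle-lower zone segments.
    They are normalized to h_i' in (0, 10) (assumed: scale by the
    largest height), then
        h_min = min{ h_i' | h_i' > 6.0 }
        h*    = max{ h_i' | h_min < h_i' < floor(h_min) + 1 }
    Returns the index of the h* segment, or None if no segment is
    tall enough."""
    hmax = max(heights)
    norm = [10.0 * h / (hmax + 1) for h in heights]  # keeps h_i' < 10
    tall = [h for h in norm if h > 6.0]
    if not tall:
        return None
    hmin = min(tall)
    band = [(h, i) for i, h in enumerate(norm)
            if hmin < h < math.floor(hmin) + 1]
    if not band:
        return norm.index(hmin)   # fallback: the h_min segment itself
    return max(band)[1]
```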
(i) Input image (ii) Image obtained after applying k-means clustering, Otsu's thresholding and skew correction
(iii) Segments obtained after character segmentation
References
Prakriti Banik, Ujjwal Bhattacharya, Swapan K. Parui. Segmentation of Bangla Words in Scene Images.
U. Bhattacharya, S. K. Parui, and S. Mondal. Devanagari and Bangla text extraction from natural scene images. Proc. of Int. Conf. on Document Analysis and Recognition, pages 171-175, 2009.