Date post: | 24-Jan-2017 |
Category: |
Education |
Upload: | sathmica-k |
PORTABLE CAMERA-BASED ASSISTIVE TEXT AND PRODUCT LABEL READING FROM HAND-HELD OBJECTS FOR BLIND PERSONS
By: Sathmica k
Abstract
A camera-based assistive text reading framework helps blind persons read text labels and product packaging from hand-held objects in their daily lives. To isolate the object in the camera view, an efficient and effective motion-based method is proposed to define a region of interest (ROI). Within the extracted ROI, text localization and recognition are performed to acquire the text information. The recognized text codes are output to blind users as speech.
What is Assistive Technology?
“Any product, instrument, equipment or technical system used by a disabled or elderly person, made specially or existing on the market, aimed to prevent, compensate, relieve or neutralise the deficiency, the inability or the handicap.”
Introduction
• Of the 314 million visually impaired people worldwide, 45 million are blind.
• Developments in computer vision, digital cameras, and portable computers make it feasible to assist these individuals.
• Camera-based products can be developed that combine computer vision technology with OCR systems.
• A few portable systems already exist, such as portable bar code readers, pen scanners, and the K mobile reader.
Figure: existing portable systems (K mobile reader, pen scanner, bar code reader).
Drawbacks
• Cannot handle scene images with complex backgrounds.
• Hard to find the position of barcode.
• Objects must be placed on a clear dark surface and must contain text.
Proposed method
• The camera-based label reader helps blind persons read the names on product labels.
• The camera acts as the main vision sensor: it captures the label image of the product, and the image is then processed internally.
• The system separates the label from the image, identifies the product, and pronounces the identified product name by voice.
• The received label image is converted to text.
• The converted text is shown on a display unit connected to the controller.
• The converted text is then converted to speech, so the user hears the label name through earphones connected to the audio output.
3 FUNCTIONAL COMPONENTS
SCENE CAPTURE
DATA PROCESSOR
AUDIO OUTPUT
• The scene capture component collects scenes containing objects of interest in the form of images or video.
• In this prototype, it corresponds to a camera attached to a pair of sunglasses.
• The data processing component deploys the proposed algorithms:
object-of-interest detection, which selectively extracts the image of the object held by the blind user from the cluttered background or other neutral objects in the camera view;
text localization, which obtains image regions containing text; and text recognition, which transforms image-based text information into readable codes.
• The audio output component informs the blind user of the recognized text codes.
• A Bluetooth earpiece with mini microphone is employed for speech output.
Flowchart of the proposed framework to read text from hand-held objects for blind users.
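The three functional components can be sketched as a simple pipeline. This is only an illustrative outline, not the authors' implementation: the function name read_label and the four callables passed to it are hypothetical placeholders for the detection, localization, OCR, and speech stages described above.

```python
from typing import Callable, List, Sequence

Frame = List[List[int]]  # grayscale image stored as rows of pixel values


def read_label(
    frames: Sequence[Frame],
    detect_object: Callable[[Sequence[Frame]], Frame],
    localize_text: Callable[[Frame], List[Frame]],
    recognize_text: Callable[[Frame], str],
    speak: Callable[[str], None],
) -> List[str]:
    """Pass captured frames through the data processor and on to audio output."""
    object_image = detect_object(frames)        # object-of-interest detection
    text_regions = localize_text(object_image)  # text localization
    words = [recognize_text(region) for region in text_regions]  # OCR
    words = [w for w in words if w.strip()]     # drop empty recognitions
    for word in words:
        speak(word)                             # audio output
    return words
```

Keeping each stage behind a plain callable makes it possible to swap, for example, one OCR engine for another without touching the rest of the pipeline.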
Object of interest
• A frame sequence V is captured by a camera worn by the blind user.
• The user indicates the object of interest S by shaking the object while recording:
$S = \frac{1}{|V|} \sum_{i} R(V_i, B)$
$V_i$ is the ith frame in the captured sequence
$|V|$ is the number of frames
B is the estimated background from motion-based object detection
R is the calculated foreground object at each frame
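A minimal sketch of this step, assuming S is the per-pixel average of the foreground masks R over all |V| frames. The per-pixel median background and the fixed difference threshold are simplifying assumptions made for illustration; they stand in for the motion-based background estimate B described in the slides.

```python
from statistics import median


def estimate_background(frames):
    """Per-pixel median over the sequence (a simple stand-in for B)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


def foreground_mask(frame, background, threshold=30):
    """R(V_i, B): 1 where the frame differs clearly from the background, else 0."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def object_of_interest(frames, threshold=30):
    """Average the foreground masks over all frames, giving a score in [0, 1]."""
    background = estimate_background(frames)
    h, w = len(frames[0]), len(frames[0][0])
    counts = [[0] * w for _ in range(h)]
    for frame in frames:
        mask = foreground_mask(frame, background, threshold)
        for y in range(h):
            for x in range(w):
                counts[y][x] += mask[y][x]
    return [[c / len(frames) for c in row] for row in counts]
```

Pixels with a high score moved in many frames, so they are likely to belong to the shaken object rather than to the static background.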
Text localization
• To extract the text regions: $X_c = \arg\max_{s} L(s)$
L is the suitability response of the text layout
$X_c$ is the set of candidate text regions from the object of interest S
Object region detection
• To ensure that the hand-held object appears in the camera view, a camera with a reasonably wide angle is proposed (since the blind user may not aim accurately).
• Users are asked to shake the hand-held object containing the text they wish to identify.
• A motion-based method is employed to localize the object from the cluttered background.
• The background subtraction (BGS) approach is commonly used to detect moving objects in video surveillance systems with stationary cameras.
• This method is based on frame variations.
• Since background imagery is nearly constant in all frames, a Gaussian method is applied.
• The Gaussian mixture model method is robust to slow lighting changes.
• Texture information is employed to remove false positive foreground areas.
• Texture similarity is measured.
• Its subsequent frame pixel distribution is more likely to be the background model.
• To detect moving objects in a dynamic scene, many adaptive BGS techniques have been developed.
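The idea of adaptive background subtraction can be illustrated with a deliberately simplified model. The sketch below keeps one running Gaussian per pixel instead of a full Gaussian mixture, and it omits the texture check; the class name GaussianBackground and the parameters alpha, k, initial_var, and min_var are assumptions chosen for illustration.

```python
class GaussianBackground:
    """Single running Gaussian per pixel (a simplification of a Gaussian mixture)."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, initial_var=225.0, min_var=4.0):
        self.alpha = alpha      # learning rate for the running statistics
        self.k = k              # foreground if deviation exceeds k standard deviations
        self.min_var = min_var  # floor that keeps the model from becoming too strict
        self.mean = [[float(p) for p in row] for row in first_frame]
        self.var = [[initial_var for _ in row] for row in first_frame]

    def apply(self, frame):
        """Return a 0/1 foreground mask and update the background model."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, p in enumerate(row):
                d = p - self.mean[y][x]
                is_foreground = d * d > (self.k ** 2) * self.var[y][x]
                if not is_foreground:
                    # Only background-like pixels update the model, so slow
                    # lighting drift is absorbed while moving objects are not.
                    self.mean[y][x] += self.alpha * d
                    new_var = self.var[y][x] + self.alpha * (d * d - self.var[y][x])
                    self.var[y][x] = max(self.min_var, new_var)
                mask_row.append(1 if is_foreground else 0)
            mask.append(mask_row)
        return mask
```

Because the mean follows small changes from frame to frame, a gradual brightening of the scene stays classified as background, while a sudden large change at a pixel is flagged as foreground.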
Localizing the image region of the hand-held object of interest. (a) Capturing images with a camera mounted on a pair of sunglasses; (b) an example of a captured image; (c) detected moving areas in the image while the user shakes the object; (d) detected region of the hand-held object for further processing by text recognition.
Automatic text extraction
• Text extraction is based on two features: stroke orientation and edge distribution.
A sample of text strokes showing the relationship between stroke orientations and gradient orientations at pixels of stroke boundaries. Blue arrows denote the stroke orientations at the sections and red arrows denote the gradient orientations at stroke boundaries.
Text stroke orientation
• Stroke orientation describes the local structure of text characters.
• Stroke orientation will be perpendicular to the gradient orientation.
A text patch and its 16-bin histogram of quantized stroke orientations.
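A rough sketch of how such a histogram could be computed. It assumes central-difference gradients, a stroke orientation obtained by rotating the gradient orientation by 90 degrees, and uniform quantization of the full circle into 16 bins; the exact gradient operator and quantization used in the base paper are not given in these slides.

```python
import math


def stroke_orientation_histogram(patch, bins=16, min_magnitude=1e-6):
    """Histogram of quantized stroke orientations for a grayscale patch."""
    h, w = len(patch), len(patch[0])
    hist = [0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the intensity gradient.
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            if math.hypot(gx, gy) <= min_magnitude:
                continue  # flat area: no stroke boundary here
            gradient = math.atan2(gy, gx)
            # The stroke runs perpendicular to the gradient at its boundary.
            stroke = (gradient + math.pi / 2) % (2 * math.pi)
            hist[int(stroke / (2 * math.pi) * bins) % bins] += 1
    return hist
```

For a patch containing a single vertical boundary, the gradient points horizontally, so all boundary pixels fall into the bin that corresponds to a vertical stroke.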
Distribution of edge pixels
• Text characters appear in the form of stroke boundaries.
• The distribution of edge pixels describes the density of a text region.
• It is used to distinguish text regions from background regions.
• Edge detection is performed to obtain an edge map.
• The number of edge pixels in each row Y and each column X is calculated as $N_R(Y)$ and $N_C(X)$.
• Each pixel is labelled with the product of the numbers of edge pixels in its row and in its column.
• Then a 3×3 smoothing operator $W_n$ is applied to obtain the edge distribution feature map:
$D(X, Y) = \sum_{n} W_n \cdot N_R(Y_n) \cdot N_C(X_n)$
$(X_n, Y_n)$ is a neighbouring pixel of $(X, Y)$
$W_n = 1/9$ is the weight value
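The edge distribution feature map can be sketched directly from the definitions above. The input is assumed to be a binary edge map (1 for an edge pixel, 0 otherwise), and neighbours that fall outside the image are assumed to contribute zero; the slides do not say how borders are handled.

```python
def edge_distribution_map(edge_map):
    """D(X, Y): 3x3 average of N_R(Y) * N_C(X) over the neighbourhood of each pixel."""
    h, w = len(edge_map), len(edge_map[0])
    n_r = [sum(row) for row in edge_map]                             # edge pixels per row
    n_c = [sum(edge_map[y][x] for y in range(h)) for x in range(w)]  # edge pixels per column
    product = [[n_r[y] * n_c[x] for x in range(w)] for y in range(h)]

    feature = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yn, xn = y + dy, x + dx
                    if 0 <= yn < h and 0 <= xn < w:  # outside pixels count as zero
                        total += product[yn][xn]
            feature[y][x] = total / 9.0              # W_n = 1/9 for every neighbour
    return feature
```

Regions where many rows and columns carry edge pixels, as is typical for text, receive high values, while sparse background regions stay close to zero.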
Text recognition and audio output
• Text recognition is performed by off-the-shelf OCR prior to output of informative words from the localized text regions.
• A text region labels the minimum rectangular area for the accommodation of characters inside it.
• So the border of the text region contacts the edge boundary of the text character.
• OCR generates better performance if text regions are first assigned proper margin areas and binarized to segment text characters from background.
• Thus, each localized text region is enlarged by extending its height and width by a margin of pixels.
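The preprocessing before OCR can be sketched as follows. The margin value is left as a parameter because the slides do not state it, the box layout (top, left, bottom, right) with exclusive bottom and right is a hypothetical convention, and the mean-intensity threshold is only a simple stand-in for whatever binarization method the OCR front end actually uses.

```python
def enlarge_region(box, margin, image_height, image_width):
    """Grow a (top, left, bottom, right) box by margin pixels, clamped to the image."""
    top, left, bottom, right = box
    return (
        max(0, top - margin),
        max(0, left - margin),
        min(image_height, bottom + margin),
        min(image_width, right + margin),
    )


def crop(image, box):
    """Cut the box out of a grayscale image stored as rows of pixel values."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def binarize(region, threshold=None):
    """Label pixels brighter than the threshold as 1 and the rest as 0."""
    if threshold is None:
        pixels = [p for row in region for p in row]
        threshold = sum(pixels) / len(pixels)  # mean intensity as a simple default
    return [[1 if p > threshold else 0 for p in row] for row in region]
```

A typical use would be to enlarge each localized box, crop it from the object image, binarize the crop, and pass the result to the OCR engine.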
Conclusion
• The system reads printed text on hand-held objects to assist blind persons.
• To solve the common aiming problem for blind users, a motion-based method is proposed; it can effectively distinguish the object of interest from the background or other objects in the camera view.
• To extract text regions from complex backgrounds, a text localization algorithm based on models of stroke orientation and edge distribution is proposed.
• OCR is used to perform word recognition on the localized text regions, and the recognized words are transformed into audio output for blind users.
References
• Base paper by Chucai Yi, Student Member, IEEE, YingLi Tian, Senior Member, IEEE, and Aries Arditi.
• T. Phan, P. Shivakumara and C. L. Tan, “A Laplacian Method for Video Text Detection”.
• C. Stauffer and W. E. L. Grimson, “Adaptive Background Mixture Model for Real-Time Tracking”.
• Vision Pattern Recognit., Fort Collins, CO, USA, 2013.
THANK YOU..