SEARCH IN CLASSROOM VIDEOS WITH OPTICAL CHARACTER RECOGNITION

FOR VIRTUAL LEARNING

A Thesis

Presented to

the Faculty of the Department of Computer Science

University of Houston

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

By

Tayfun Tuna

December 2010


SEARCH IN CLASSROOM VIDEOS WITH OPTICAL CHARACTER RECOGNITION

FOR VIRTUAL LEARNING

Tayfun Tuna

APPROVED:

Dr. Jaspal Subhlok, Advisor

Department of Computer Science, University of Houston

Dr. Shishir Shah

Department of Computer Science, University of Houston

Dr. Lecia Barker

School of Information, University of Texas at Austin

Dean, College of Natural Sciences and Mathematics


Acknowledgements

I am very grateful to my advisor, Dr. Jaspal Subhlok, for his guidance, encouragement, and support during this work. He kept me motivated with his insightful suggestions for solving many problems that would otherwise have seemed impossible to solve. I would not have been able to complete my work in time without his guidance and encouragement.

I would like to express my deepest gratitude towards Dr. Shishir Shah, who gave me innumerable suggestions in weekly meetings and in his image processing class; both helped me many times to solve difficult problems in this research.

I am heartily thankful to Dr. Lecia Barker for her support and for agreeing to be a part of my thesis committee.

Without the love and support of my wife, it would have been hard to get my thesis done on time. I am forever indebted to my wife, Naile Tuna.


SEARCH IN CLASSROOM VIDEOS WITH OPTICAL CHARACTER RECOGNITION

FOR VIRTUAL LEARNING

An Abstract of a Thesis

Presented to

the Faculty of the Department of Computer Science

University of Houston

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

By

Tayfun Tuna

December 2010


Abstract

Digital videos have been used extensively for educational purposes and distance learning. Tablet PC based lecture videos have been in common use at the University of Houston (UH) for many years. To enhance the user experience and improve the usability of classroom lecture videos, we designed an indexed, captioned and searchable (ICS) video player. The focus of this thesis is search.

Searching inside a lecture is especially useful for long videos; instead of spending an hour watching the entire video, search allows users to find the relevant scenes instantly. This feature requires extracting the text from video screenshots by using Optical Character Recognition (OCR). Since ICS video frames include complex images, graphs, and shapes in different colors with non-uniform backgrounds, our text detection requires a more specialized approach than is provided by off-the-shelf OCR engines, which are designed primarily for recognizing text within scanned black-and-white documents.

In this thesis, we describe how we used these OCR engines for the ICS video player and improved their detection. We surveyed the current OCR engines on ICS video frames and found that the accuracy of recognition could be increased by preprocessing the images. By applying image processing techniques such as resizing, segmentation, and inversion to the images, we increased the accuracy rate of search in the ICS video player.


Table of Contents

CHAPTER 1. INTRODUCTION
  1.1 MOTIVATION
  1.2 BACKGROUND
    1.2.1 VIDEO INDEXER
    1.2.2 OVERVIEW OF OPTICAL CHARACTER RECOGNITION (OCR) TOOL
    1.2.3 ICS VIDEO PLAYER
  1.3 RELATED WORK
    1.3.1 VIDEO PLAYERS
    1.3.2 OCR IMPLEMENTATION IN VIDEOS
  1.4 THESIS OUTLINE

CHAPTER 2. SURVEY OF OCR TOOLS
  2.1 POPULAR OCR TOOLS
  2.2 THE CRITERIA FOR A “GOOD” OCR TOOL FOR ICS VIDEO IMAGES
  2.3 SIMPLE OCR
  2.4 ABBYY FINEREADER
  2.5 TESSERACT OCR
  2.6 GOCR
  2.7 MICROSOFT OFFICE DOCUMENT IMAGING (MODI)
  2.8 CONCLUSION

CHAPTER 3. OCR CHALLENGES AND ENHANCEMENTS
  3.1 WHAT IS OCR?
  3.2 HOW DOES OCR WORK?
  3.3 CAUSES OF FALSE DETECTION

CHAPTER 4. ENHANCEMENTS FOR OCR
  4.1 SEGMENTATION
    4.1.1 THRESHOLDING
    4.1.2 EROSION AND DILATION
    4.1.3 EDGE DETECTION
    4.1.4 BLOB EXTRACTION
  4.2 RESIZING FOR TEXT FONT SIZE
  4.3 INVERSION
  4.4 RESIZING IMAGE
  4.5 INVERSION

CHAPTER 5. OCR ACCURACY CRITERIA AND TEST RESULTS
  5.1 TEST DATA
    5.1.1 THE IMAGES FOR OCR DETECTION TEST
    5.1.2 THE TEXT FOR OCR DETECTION TEST
    5.1.3 SEARCH ACCURACY
  5.2 WORD ACCURACY AND SEARCH ACCURACY
  5.3 PREPARING AND TESTING TOOLS
    5.3.1 TEXTPICTURE EDITOR
    5.3.2 OCR TOOL MANAGER AND ACCURACY TESTER
    5.3.3 SEARCH ACCURACY
  5.4 EXPERIMENTS AND TEST RESULTS

CHAPTER 6. CONCLUSION

REFERENCES


List of Figures

Figure 1.1: Block diagram of the video indexer
Figure 1.2: A snapshot from the video indexer
Figure 1.3: A snapshot of an output from the video indexer
Figure 1.4: The OCR tool function in the ICS video player
Figure 1.5: A snapshot of an output folder of the OCR tool
Figure 1.6: A snapshot of the running OCR tool
Figure 1.7: A snapshot of the ICS video player XML output of the OCR tool
Figure 1.8: Flow of the ICS video player
Figure 1.9: A snapshot of the video player screen
Figure 1.10: List view of the search feature
Figure 1.11: ICS video player progress bar
Figure 2.1: SimpleOCR detection example 1
Figure 2.2: SimpleOCR detection example 2
Figure 2.3: SimpleOCR detection example 3
Figure 2.4: ABBYY FineReader detection example
Figure 2.5: User interface of ABBYY FineReader
Figure 2.6: Tesseract OCR detection example 1
Figure 2.7: Tesseract OCR detection example 2
Figure 2.8: Tesseract OCR detection example 3
Figure 2.9: GOCR tool detection example 1
Figure 2.10: GOCR tool detection example 2
Figure 2.11: GOCR tool detection example 3
Figure 2.12: Using the MODI OCR engine in the C# programming language
Figure 2.13: MODI detection example 1
Figure 2.14: MODI detection example 2
Figure 2.15: MODI detection example 3
Figure 3.1: Pattern recognition steps for classification
Figure 3.2: Character representation for feature extraction
Figure 3.3: Distorted image analysis
Figure 3.4: Contrast and color differences in characters in an image
Figure 3.5: Size difference in characters in an image
Figure 4.1: Black font text on a white background
Figure 4.2: Complex background with different color fonts
Figure 4.3: OCR results for a whole image
Figure 4.4: OCR results for a segmented image
Figure 4.5: SIS threshold example 1
Figure 4.6: SIS threshold example 2
Figure 4.7: Structural element movement for morphological operations
Figure 4.8: Structured element for erosion and dilation
Figure 4.9: Dilation effect on an image
Figure 4.10: Edge detection effect on a dilated image
Figure 4.11: Blob extraction example on an image
Figure 4.12: Resize process in an example
Figure 4.13: Resize process in interpolation
Figure 4.14: Resize process in bilinear interpolation
Figure 4.15: RGB color model
Figure 4.16: The inversion operation on the left input image
Figure 4.17: Inversion equations and their effect on the images
Figure 4.18: OCR engines' detections for the original image
Figure 5.1: Example ICS video image
Figure 5.2: Examples of some images that are not included in the test
Figure 5.3: An example of some text that is not included in the test
Figure 5.4: Screenshot of the TextPicture Editor tool
Figure 5.5: Input folder for the OCR test created by the TextPicture Editor tool
Figure 5.6: Screenshot of the OCR Tool Manager and Accuracy Tester
Figure 5.7: Screenshot of the OCR Tool Manager and Accuracy Tester
Figure 5.8: Excel file created by the OCR Manager tool for a folder
Figure 5.9: Excel file created by the OCR Manager tool for an image
Figure 5.10: Example screens from the videos with the highest false positives
Figure 5.11: Example screens from the videos with the highest word detections
Figure 5.12: Example screens from the videos with the lowest detection

List of Graphs

Graph 5.1: OCR accuracy test graph for ‘word accuracy’
Graph 5.2: Graph for OCR test results of ‘search accuracy’
Graph 5.3: Graph for OCR test results of execution times
Graph 5.4: OCR test results for false positives
Graph 5.5: Graph for OCR test results of search accuracy rate for all videos

List of Tables

Table 2.1: Popular OCR tools
Table 2.2: Selected OCR tools to test
Table 5.1: Formulation of ‘word accuracy’
Table 5.2: Formulation of ‘search accuracy’
Table 5.3: OCR accuracy test results for ‘word accuracy’
Table 5.4: Number of undetected words with methods
Table 5.5: OCR accuracy test results for ‘search accuracy’
Table 5.6: Test results for ‘execution times’
Table 5.7: Number of ‘false positives’
Table 5.8: Videos which have the highest ‘false positives’


Chapter 1: Introduction

1.1 Motivation

There is a huge database of digital videos in any school that employs lecture video recording. Traditionally, students would download a video and watch it using a basic video player. This method is not suitable for users such as students who want to quickly refer to a specific topic in a lecture video, as it is hard to tell exactly when that topic was taught. It is also not suitable for deaf students. To make these videos more accessible and engaging, we needed to make the content inside videos easily navigable and searchable, and to associate closed captions with videos, through a visually attractive and easy-to-use video player interface.

To provide easy access to video content and enhance the user experience, we designed a video player in the ICS video project, focused on making video content more accessible and navigable to users. This video player allows users to search for a topic they want in a lecture video, which saves time because users do not need to view the whole lecture stream to find what they are looking for.

To provide search capability in our video player, we need to get the text of each video frame. This can be done by using optical character recognition (OCR). Since ICS video frames include complex images, graphs, and shapes in different colors with non-uniform backgrounds, our text detection requires a more specialized approach than is provided by off-the-shelf OCR software, which is designed primarily for recognizing text within scanned black-and-white documents. Apart from choosing the right OCR tool for the ICS video player, basic image preprocessing techniques are required to improve accuracy.

1.2 Background

Digital videos in education have been a successful medium for students to study or revise the subject matter taught in a classroom [1]. Although it is a practical method of education, it was never meant to replace live classroom interaction, since a live classroom lecture and student-instructor interaction cannot be retained in a video; but we still provide anytime-anywhere accessibility by allowing web-based access to lecture videos [2]. We wanted to enhance the user experience and make the content of video lectures easily accessible to students by designing a player which supports indexing (of visual transition points), search, and captioning.

At the University of Houston, video recordings have been used for many years for distance learning, and in all those years lecture videos have only grown in popularity [2,3,4]. A problem that students face while viewing these videos is that it is difficult to access specific content. To solve this problem we started a project known as Indexed, Captioned and Searchable (ICS) Videos. Indexing (a process of locating visual transitions in a video), searching, and captioning have been incorporated in the project to attain the goal of making lecture videos accessible to a wide variety of users in an easy-to-use manner.

We look at the project from the perspective of an end user (most likely a student). To increase the usefulness of the ICS Video project, all videos have meta-information associated with them. This meta-information includes a description of the lecture, a series of points on the video timeline where a visual transition exists (also known as index points), the keywords needed for search, and closed caption text. The indexer, explained in the following sections, creates the index and transition points of the video as image files for the OCR tool. The OCR tool detects the text in these images and stores it so that the ICS Video Player, also explained in the following sections, can organize this meta-information in a manner that is practical for the end user while preserving the emphasis on the video.

As stated earlier, this work is part of the larger ICS Video project. In this section we present a summary of the contributions made by others to this project.

1.2.1 Video Indexer

The job of the indexer is to divide the video into segments, where each division occurs at a visual transition, as shown in figure 1.1. By dividing a video in this manner we get a division of the topics taught in a lecture, because the visual transitions in a lecture video are essentially slide transitions. The indexer is also supposed to eliminate duplicate transition points and place index points at approximately regular time intervals. A minimal sketch of the transition-detection idea follows.
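The indexer's actual algorithms are those of [3, 4]; the sketch below only illustrates the underlying idea of frame differencing, flagging a transition when enough pixels change between consecutive sampled frames. The method, its thresholds, and the grayscale frame representation are illustrative assumptions, not the project's implementation.

// Sketch of transition detection by frame differencing. The real indexer
// uses the filtering algorithms of [3, 4]; the thresholds here are toy values.
using System;
using System.Collections.Generic;

static class TransitionDetector
{
    // Fraction of pixels that must change for a frame to count as a transition.
    const double Threshold = 0.20;

    // frames: one grayscale byte[] per sampled frame (all the same length);
    // returns the indices of frames where a slide change likely happened.
    public static List<int> FindTransitions(IReadOnlyList<byte[]> frames)
    {
        var transitions = new List<int>();
        for (int i = 1; i < frames.Count; i++)
        {
            int changed = 0;
            for (int p = 0; p < frames[i].Length; p++)
                if (Math.Abs(frames[i][p] - frames[i - 1][p]) > 30) // per-pixel noise tolerance
                    changed++;

            if ((double)changed / frames[i].Length > Threshold)
                transitions.Add(i);
        }
        return transitions;
    }
}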

Figure 1.1: Block diagram of the video indexer. The output from the indexer is image files and a textual document which essentially contains a list of index points, i.e., time stamps where a visual transition exists.


Joanna Li [3] outlined a method to identify visual transitions and eliminate duplicates by filtering. Later, this approach was enhanced with new algorithms [4].

Figure 1.2: A snapshot from the video indexer, running to find index points and transition points.

Figure 1.3: Output from the video indexer. It has created all transition points and a data file that indicates which of them are index points.

A snapshot of the running video indexer is shown in figure 1.2. After it finishes processing, it writes its outputs to a folder for the OCR tool, as shown in figure 1.3.


1.2.2 Overview of Optical Character Recognition (OCR) Tool

We discuss OCR in depth in the following chapters. Figure 1.4 shows the workflow with a short description.

Figure 1.4: The OCR tool takes each frame where an index (or visual transition) exists and extracts a list of keywords written on it. This list is then organized in such a way that it can be cross-referenced by the index points.

After the video indexer creates the index points and transition points, which are image files, the OCR module runs to get the keywords from the text written on these video frames (which are essentially PowerPoint slides). As a result we get all the keywords for each video segment from this tool. These keywords, among other data, are then used to implement the search function in the video player.

Figure 1.5: The OCR tool renames files according to their index point numbers. L1-082310_i_1_1 refers to the first index point and first transition point; L1-082310_t_1_2 refers to the first index point and second transition point.
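The naming convention above can be decoded mechanically. A minimal sketch, assuming .jpg frame files and exactly the pattern shown in figure 1.5 (the class and method names are illustrative, not part of the actual OCR tool):

// Parse <lecture>_<i|t>_<index>_<transition>.jpg, e.g. "L1-082310_i_1_1.jpg".
using System;
using System.IO;

class FrameInfo
{
    public string Lecture;
    public bool IsIndexPoint;   // "i" = index point, "t" = transition point
    public int Index;
    public int Transition;
}

static class FrameNames
{
    public static FrameInfo Parse(string path)
    {
        // "L1-082310_i_1_1.jpg" -> ["L1-082310", "i", "1", "1"]
        string[] parts = Path.GetFileNameWithoutExtension(path).Split('_');
        return new FrameInfo
        {
            Lecture = parts[0],
            IsIndexPoint = parts[1] == "i",
            Index = int.Parse(parts[2]),
            Transition = int.Parse(parts[3])
        };
    }
}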


Figure 1.6: The OCR tool running, extracting text from the images.

The OCR tool extracts text from all the images one by one and then creates an XML output file that includes the keywords for each transition point, as shown in figure 1.7.

Figure 1.7: XML file output of the OCR tool.

Once the XML file is ready, the ICS Video Player can use it in its interface. We discuss in the next section how the information supplied by the indexer and the OCR tool is used in the ICS Video Player. A minimal sketch of how such an XML file could be read and searched is shown below.
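The sketch loads an OCR output file and finds the index points whose keyword lists contain a search term. The element and attribute names (index, time, keyword) and the file name are assumptions; the thesis does not show the actual schema.

// Searching a hypothetical OCR output XML for a keyword.
using System;
using System.Linq;
using System.Xml.Linq;

class KeywordSearch
{
    static void Main()
    {
        XDocument doc = XDocument.Load("L1-082310.xml"); // hypothetical output file

        string query = "program";
        var hits =
            from index in doc.Descendants("index")
            let keywords = index.Elements("keyword").Select(k => (string)k)
            where keywords.Contains(query, StringComparer.OrdinalIgnoreCase)
            select (string)index.Attribute("time");

        foreach (var time in hits)
            Console.WriteLine("'" + query + "' found in index starting at " + time);
    }
}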


1.2.3 ICS Video Player

Figure 1.8: The video player is fed the meta-information consisting of index points and keywords, along with information about the course to which the lecture belongs and about the lecture itself. A caption file is an optional resource; if present, it is displayed below the video as shown in figure 1.9.

In essence, the ICS Videos project aims at providing three features (indexing, captioning, and search) to distance digital video education. Here is an overview of how those three features were integrated in the video player:

1. Indexing

The recorded video lectures were divided into segments, where each division occurs at a visual transition (which is assumed to be a change of topic in the lecture). These segments (or index points) are organized in a list in the player interface (see figure 1.9 (d)).

2. Captioning

The video player contains a panel, which can be minimized, that displays closed captions if a closed caption file is associated with the video (see figure 1.9 (f)). At the time of writing, the caption file needs to be generated manually by the author.

3. Search

Users can search for a topic of interest in a video by searching for a keyword. The result shows all occurrences of that keyword among the video segments. This is implemented by incorporating the indexer and the OCR tool, discussed earlier, in the video processing pipeline. The search result allows users to easily navigate to the video segment where a match for the search keyword was found, as shown in figures 1.9 (b) and 1.10.

Figure 1.9: A snapshot of the video player screen. Highlighted components: (a) video display, (b) search box, (c) lecture title, (d) index list, (e) playhead slider, (f) closed captions, (g) video controls.

Figure 1.9 shows a running example of the video player. The lecture in the figure belongs to the COSC 1410 (Introduction to Computer Science) course taught by Dr. Nouhad Rizk at the University of Houston. The figure gives a view of the player as a whole along with every component. The player interface is mostly self-explanatory, but some of the functionality deserves clarification. The video display (figure 1.9 (a)) shows the current status of the video; if the video is paused, it shows a gray overlay with a play button. The index list (figure 1.9 (d)) contains a descriptive entry for each index point (also known as a visual transition) in the video. Each entry in the index list is made up of a snapshot image of the video at the index point, the name of the index, and its description, as shown in figure 1.10.

Figure 1.10: The list of results when the user searches for the keyword "program" in the lecture.

One component that is not shown in figure 1.9 is the search result component. When the user searches for a keyword, all indices that contain that keyword in their keyword lists are displayed in the list of search results. The user can then click on a result to go to that index point in the video. As shown in figure 1.10, every result also contains the snapshot of the video at the index point along with the name and description of the index point. It also shows where the keyword was found: in the keyword list (along with the number of matches), in the title, or in the description of the index point. All of this information comes from the XML file created by the OCR tool, as explained in the previous section.

Figure 1.11: The progress bar of the ICS Video Player. In this case the video is playing at index point 1.

One thing we need to point out about the search feature of this player is that when a user searches for a keyword and finds it in a keyword list, the progress bar pointer goes to the beginning of that index region; it does not go to the exact position in the video where the keyword appears. A small sketch of this seek behavior follows.
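The types below are hypothetical stand-ins for the player's internal data; the sketch only captures the behavior just described, returning the start of the matching index region rather than the exact moment the keyword appears.

// Seek target selection: first index region whose keyword list matches.
using System;
using System.Collections.Generic;

class IndexPoint
{
    public string Title;
    public TimeSpan Start;
    public HashSet<string> Keywords;   // built from the OCR tool's XML output
}

static class PlayerSearch
{
    // Returns the start of the first matching index region, or null if none.
    public static TimeSpan? FindSeekTarget(IEnumerable<IndexPoint> indices, string query)
    {
        foreach (var ix in indices)
            if (ix.Keywords.Contains(query))
                return ix.Start;       // beginning of the region, not the exact frame
        return null;                   // no match: do not move the playhead
    }
}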

We have briefly described the workflow of the ICS Video player and how the OCR tool is used in the project. The purpose of the work done in this thesis is to create an OCR tool for the video player which provides the text of the video frames, so that users can search inside a video using the ICS video player. There are several ways to design an OCR tool that creates text for the ICS video player; our main goal is to make it accurate enough for the end user to find a keyword at the right place in the video content. To this end, we tested the current OCR tools and used some preprocessing techniques to improve their accuracy. From the experiments and results, we concluded that the OCR tools presented here can be used for the ICS Video Player, and that modifying images with image processing techniques prior to sending them to the OCR tool increases the accuracy of these tools.

1.3 Related Work

There have been efforts in industry and academia along the lines of video indexing and the use of OCR for extracting text from videos. We will take a look at each of them separately.


1.3.1 Video Players

Google Video is one of the most famous video platforms; it is a free video sharing website developed by Google Inc. [5]. Google Video has incorporated an indexing feature that allows users to search for a particular video among all videos available publicly on the internet. However, videos are indexed based on the context in which they were found, not on the content of the video itself. This means that if a video is located on a website about animals and one searches for a video about animals, there is a chance that this video will appear in the search results. The main difference here is that the search result does not guarantee that the videos appearing in it do indeed have the required content (attributable to the fact that indexing was not done on the video content). This method does not suit our needs, because one of the main requirements of the project was to allow students to locate the topic they were looking for inside a video.

Another implementation, known as Project Tuva, developed by Microsoft Research, features searchable videos along with closed captions and annotations [6]. It also features division of the video timeline into segments, where a segment represents a topic taught in the lecture. However, the division of video into segments is done manually in Project Tuva. Tuva also offers an Enhanced Video Player to play videos.

There also exists a related technology known as hypervideo, which can synchronize content inside a video with annotations and hyperlinks [7]. Hypervideo allows a user to navigate between video chunks using these annotations and hyperlinks. Detail-on-demand video is a type of hypervideo which allows users to locate information in an interrelated video [8]. For editing and authoring detail-on-demand hypervideo there exists a video editor known as Hyper-Hitchcock [8, 9, 10]. The Hyper-Hitchcock video player can support indexing because it plays hypervideos, but one still has to put annotations and hyperlinks into the hypervideo manually to index it.

There has also been research on implementing search over the topics inside a video. The authors of [11] developed a system known as iView that features intelligent searching of English and Chinese content inside a video. It uses image processing to extract keywords; in addition, iView features speech processing techniques.

Searchinsidevideo is another implementation of indexing, searching, and captioning videos. Searchinsidevideo is able to automatically transcribe video content and let search engines accurately index it, so that they can include it within their search results. Users can also find all of the relevant results for their searches across all of the content (text, audio, and video) in a single, integrated search [12].

1.3.2 OCR Implementations in Videos

OCR is used on videos in many applications, such as license plate recognition in surveillance cameras or text recognition in news and sports videos. There are also many university projects that aim to achieve better OCR detection in videos.

SRI International (SRI) has developed ConTEXTract™, a text recognition technology that can find and read text (such as street signs, name tags, and billboards) in real scenes. This optical character recognition for text within imagery and video requires a more specialized approach than is provided by off-the-shelf OCR software, which is designed primarily for recognizing text within documents. ConTEXTract distinguishes lines of text from other content in the imagery, processes the lines, and then sends them to an OCR submodule, which recognizes the text. Any OCR engine can be integrated into ConTEXTract with minor modifications [14]. The idea of segmentation in our work is inspired by ConTEXTract.

The authors of [13] proposed a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. They use changes of text in the slides as a means to segment the video into semantic shots. Unlike previous approaches, their method does not depend on the availability of the electronic source of the slides, but rather extracts and recognizes the text directly from the video. Once text regions are detected within key frames, a novel binarization algorithm, Local Adaptive Otsu (LOA), is employed to deal with the low quality of video scene text. We are inspired by this work in its application of thresholding to images and its use of the Tesseract OCR tool.

The authors of [15] worked on automatic video text localization and recognition for content-based video indexing in sports applications using a multi-modal approach. They used segmentation with dilation methods for localizing text. The method for segmentation in our work is inspired by this work.

The authors of [16], from Augsburg University, worked on a project named MoCA, which includes a paper on automatic text segmentation (also known as text localization) and text recognition for video indexing. They used OCR engines to detect text in TV programs. To increase OCR engine accuracy, they presented a new approach to text segmentation and text recognition in digital video and demonstrated its suitability for indexing and retrieval. Their idea of using different snapshots of the same scene is not applicable to our work, since our videos are already indexed and only the indexed screenshots are available to the OCR tool.

1.4 Thesis Outline

This thesis is organized as follows: Chapter 2 gives an introduction to commonly used OCR engines and explains the reasons for choosing the three OCR engines MODI, Tesseract OCR, and GOCR. In Chapter 3 we discuss OCR challenges for ICS video images and explain our approaches to dealing with them. The methods we used to enhance text recognition are discussed in Chapter 4. The criteria for evaluating enhancements to text detection are explained in Chapter 5, along with the results of our experiments. The work is concluded in Chapter 6.


Chapter 2: Survey of OCR Tools

Developing a proprietary OCR system is a complicated task that requires a lot of effort. Instead of creating a new OCR tool, it is better to use existing ones.

In the previous chapter we mentioned that there are many OCR tools that allow us to extract text from an image. In this chapter, we discuss the criteria for a good OCR tool suitable for our goals, then examine some of the tools we tested and justify our choices.

2.1 Popular OCR Tools

Table 2.1 lists some popular OCR tools.

ABBYY FineReader                    Puma.NET
AnyDoc Software                     Readiris
Brainware                           ReadSoft
CuneiForm/OpenOCR                   RelayFax
ExperVision TypeReader & RTK        Scantron Cognition
GOCR                                SimpleOCR
LEADTOOLS                           SmartScore
Microsoft Office Document Imaging   Tesseract
Ocrad                               Transym OCR
OCRopus                             Zonal OCR
OmniPage

Table 2.1: Popular OCR tools

These tools can be classified into two types:

a) Tools that can be integrated into our project

Open source tools, and some commercial tools whose OCR module can be integrated into a project, such as Office 2007 MODI.

b) Tools that cannot be integrated into our project

Commercial tools such as ABBYY FineReader that encapsulate OCR and mainly aim at scanning, printing, and editing. They are successful at extracting text, but the OCR component cannot be imported as a module into a project, as these tools have their own custom user interfaces and everything must be done through those interfaces.

2.2 The Criteria for a “Good” OCR Tool for ICS Video Images

There are many criteria for a good OCR tool in general, such as design, user interface, performance, and accessibility. The priorities for our project are accessibility, usability, and accuracy. The criteria for being a good tool for our project are therefore:

1. Accessibility and usability: In the ICS video project we will process many image files, and we need to process them automatically. We cannot handle them one by one through a manual sequence of opening a program, browsing for files, running the tool, getting the text, and copying the text to a place where we can use it. Accessibility is our first concern. How can we access the tool? Can we call it from a command prompt, so that we can invoke it from C++, C#, or Java, with parameters, as many times and whenever we want? Can we include the tool as a package, a DLL, or a header in our project, so that we can import it and use it as part of our project? (A minimal sketch of command-line invocation appears after this list.)

2. Accuracy: The tool should have a reasonable rate of accuracy in converting images to text. It is also important that the recognition accuracy we care about is accuracy on our project's inputs. Most OCR tools are designed for scanned images, which are mostly black and white; they may claim accuracy of up to 95%, but what about their accuracy on colored images? Accuracy on our inputs is therefore another important criterion for deciding whether a tool is good. (One plausible word accuracy measure is also sketched after this list.)

3. Complexity: A program that does one task can be considered simple, while a program that does many tasks can be considered complex. In this sense, we only need the tool to extract text from images; anything else it does increases its complexity.

4. Updatability: No algorithm can be considered final. Can we change the tool so that it works better for our project, to increase its accuracy or performance? A tool may be good overall but not support our input type (JPEG files); can we update it so that it is able to process our inputs?

5. Performance and resource usage: Most tools we examined have reasonable performance and use reasonable amounts of memory and disk space. For the ICS video project, the OCR module will run on the server side; this means the speed of converting images to text, and the space and memory the OCR tool requires, are not as important.
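To make criterion 1 concrete, here is a minimal sketch of command-line access from C#, using the Tesseract executable as the example. The standard Tesseract usage, "tesseract <image> <outputbase>", writes <outputbase>.txt; the file names, paths, and absence of error handling are illustrative assumptions, not part of the project's actual tooling.

// Calling a command-line OCR engine (Tesseract here) from C#.
// Assumes tesseract is on the PATH; "tesseract input.jpg result"
// writes the recognized text to result.txt.
using System;
using System.Diagnostics;
using System.IO;

class CommandLineOcr
{
    static string RunTesseract(string imagePath)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "tesseract",
            Arguments = "\"" + imagePath + "\" result",  // output base name "result"
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var p = Process.Start(psi))
            p.WaitForExit();                             // block until OCR finishes

        return File.ReadAllText("result.txt");           // Tesseract appends ".txt"
    }

    static void Main()
    {
        Console.WriteLine(RunTesseract("frame001.jpg"));
    }
}

For criterion 2, one plausible formulation of word-level accuracy is the fraction of ground-truth words that appear in the OCR output. The thesis's exact formulation is given in table 5.1; the version below is an assumption for illustration only.

// Word accuracy as the fraction of ground-truth words found in the OCR output.
using System;
using System.Collections.Generic;
using System.Linq;

static class Accuracy
{
    public static double WordAccuracy(string groundTruth, string ocrOutput)
    {
        char[] separators = null;  // null means: split on whitespace
        var truth = groundTruth.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var found = new HashSet<string>(
            ocrOutput.Split(separators, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

        int detected = truth.Count(w => found.Contains(w));
        return (double)detected / truth.Length;  // 1.0 means every word was detected
    }
}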

Now we can take a look at the tools. Since testing all the tools, which mostly run in different environments and operating systems, would be time consuming, we test the tools below as representatives of their groups. We filtered the popular tools shown in table 2.1 down to table 2.2 according to the accessibility and complexity classification above, presenting one example for each group: non-importable tools (both large commercial applications and small applications) and importable tools such as MODI. Among the open source tools we tested two OCR engines that work in the Windows environment: GOCR and Tesseract OCR.

NAME                CATEGORY
SimpleOCR           Not importable – small application
ABBYY FineReader    Not importable – big application
Tesseract OCR       Importable – open source
GOCR                Importable – open source
Office 2007 MODI    Importable – big application

Table 2.2: Selected OCR tools to test

2.3 Simple OCR

SimpleOCR is a free tool that can read bi-level and grayscale images and create TIFF files containing bi-level (i.e., black and white) images. It works with all fully compliant TWAIN scanners and also accepts input from TIFF files. With this tool, one is expected to be able to easily and accurately convert a paper document into editable electronic text for use in any application, including Word and WordPerfect, with 99% accuracy [17].


Figure 2.1: SimpleOCR detection example 1: (a) input; (b) output

SimpleOCR has a user interface in which files are opened by clicking and browsing, and results are copied and pasted manually. In other words, it cannot be used from the command line and it cannot be imported into our tool, so it could not be adopted for our project. It also failed to produce text for colored images like those in figures 2.1 and 2.2, giving the error message “Could not convert the page to text.”



Figure 2.2: SimpleOCR detection example 2: (a) input; (b) output

SimpleOCR was able to detect some of the text in colored images like figure 2.3, but with low accuracy; only the word “Agents” was detected correctly.


Figure 2.3: SimpleOCR detection example 3: (a) input; (b) output

SimpleOCR thus fails to be a good OCR tool for our project on the first and second criteria: accessibility and usability, and accuracy.


2.4 ABBYY FineReader


ABBYY is a leading provider of document conversion, data capture, and linguistic software and services. The key areas of ABBYY's research and development include document recognition and linguistic technologies [18].

Figure 2.4: ABBYY FineReader detection example: (a) input; (b) output

ABBYY showed good accuracy on our test images (figure 2.4). But it is not applicable to our project: to use the OCR part of ABBYY FineReader, one must use its own interface to get the text, opening files from the menu, running the OCR engine, and checking whether the text is accurate or has to be corrected manually, as shown in figure 2.5.


Figure 2.5: User interface of ABBYY FineReader

Even though the accuracy of ABBYY FineReader is high, it is not a “good” tool for our ICS Video project, because it does not satisfy our first criterion, accessibility and usability.

2.5 Tesseract OCR

The Tesseract OCR engine is one of the most accurate open source OCR engines available. The source code will read a binary, grey, or color image and output text. A TIFF reader is built in that will read uncompressed TIFF images, and libtiff can be added to read compressed images. Most of the work on Tesseract is sponsored by Google [19].


Figure 2.6: Tesseract OCR detection example 1: (a) input; (b) output

The Tesseract OCR engine is updated frequently, and its accuracy on colored images is good. Figures 2.6 and 2.7 are good examples of the detection capabilities of Tesseract OCR.



Figure 2.7: Tesseract OCR detection example 2: (a) input; (b) output

Tesseract OCR may be the most accurate open source tool, but its accuracy is not perfect: in figure 2.6 the last line is not recognized at all, and while the image in figure 2.7 is recognized precisely, in figure 2.8 the word “Summary” is missed. Tesseract is, however, accessible and easy to use, and can be called from the command prompt from any programming language, as sketched at the end of section 2.2.


Figure 2.8: Tesseract OCR detection example 3: (a) input; (b) output


2.6 GOCR

GOCR is an OCR program developed under the GNU Public License, initially written by Jörg Schulenburg; it is also called JOCR. It converts scanned images to text files [20].

The GOCR engine assumes no colors (black on white only), no rotation, and a single font; it assumes all characters are separated, and every character is recognized empirically based on its pixel pattern [21].

Figure 2.9: GOCR tool detection example 1: (a) input; (b) output


Figure 2.10: GOCR tool detection example 2: (a) input; (b) output


Figure 2.11: GOCR tool detection example 3: (a) input; (b) output

Like Tesseract, GOCR is accessible and easy to use and can be called from the command prompt from any programming language, and its detection accuracy for the images in figures 2.9-2.11 is similar to Tesseract's. Unlike Tesseract, however, GOCR is not regularly updated.

2.7 Microsoft Office Document Imaging (MODI)

Microsoft Office Document Imaging (MODI) is a Microsoft Office application that supports editing documents scanned by Microsoft Office Document Scanning. It was first introduced in Microsoft Office XP and is included in later Office versions, including Office 2007.

Via COM, MODI provides an object model based on 'document' and 'image' (page) objects. One feature that has elicited particular interest on the web is MODI's ability to convert scanned images to text under program control, using its built-in OCR engine. The MODI object model is accessible from development tools that support the Component Object Model (COM) by using a reference to the Microsoft Office Document Imaging 11.0 Type Library. The MODI Viewer control is accessible from any development tool that supports ActiveX controls by adding Microsoft Office Document Imaging Viewer Control 11.0 or 12.0 (MDIVWCTL.DLL) to the application project.

When optical character recognition is performed on a scanned document, text is recognized using sophisticated pattern-recognition software that compares scanned text characters with a built-in dictionary of character shapes and sequences. The dictionary supplies all uppercase and lowercase letters, punctuation, and accent marks used in the selected language [22].

On the images we tested, the accuracy of MODI was very good, and it was easy to access via code: after importing the Microsoft Office Document Imaging 12.0 Type Library, it is accessible from any development tool that supports ActiveX. Figure 2.12 shows the essential calls.

MODI.Document md = new MODI.Document();
md.Create(fileName);                                  // load the image file
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);  // OCR with auto-rotation and deskew
MODI.Image image = (MODI.Image)md.Images[0];          // first (and only) page
writeFile.Write(image.Layout.Text);                   // writeFile: a StreamWriter opened by the caller

Figure 2.12: Using the MODI OCR engine in the C# programming language


Figure 2.13: MODI detection example 1: (a) input; (b) output



Figure 2.14: MODI detection example 2: (a) input; (b) output

Figure 2.15: MODI detection example 3: (a) input; (b) output

MODI's accessibility and usability made it easy to import into a C# project, and it has a very good accuracy rate on ICS video images. In figure 2.15 it was even able to detect the text in the video thumbnail on the left.

We found MODI to be a friendly engine for the type of images in ICS videos, which generally contain fully colored text and images. A sketch of how the snippet in figure 2.12 could be wrapped to process a whole folder of video frames follows.
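This is a minimal sketch only: the folder path and one-text-file-per-frame output are assumptions made for illustration, whereas the real OCR tool emits a single XML file for the whole video, as described in section 1.2.2.

// Wrapping the MODI calls from figure 2.12 to OCR every frame in a folder.
using System;
using System.IO;

class BatchOcr
{
    static void Main()
    {
        foreach (string imagePath in Directory.GetFiles(@"C:\ics\frames", "*.jpg"))
        {
            var md = new MODI.Document();
            md.Create(imagePath);
            md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            var image = (MODI.Image)md.Images[0];
            File.WriteAllText(Path.ChangeExtension(imagePath, ".txt"),
                              image.Layout.Text);      // save the recognized text
            md.Close(false);                           // release the COM document
        }
    }
}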


2.8 Conclusion

We presented some popular OCR tools and checked whether they can be integrated into our ICS video project, and we justified our choices by giving examples: three input images and the corresponding results for each OCR tool. We conclude that any of the three tools GOCR, Tesseract OCR, and MODI can be integrated. It is hard to say which one is best by looking at the outputs of three examples, so we decided to include all of them in our experiments. A large-scale test on ICS video images and the results of these tools will give us a better perspective; this is done in the experiments and results section of chapter 5. Before that, in the following chapter we look at the challenges of OCR and our proposed methods to enhance detection.


Chapter 3: OCR and Challenges

In the previous chapter we surveyed OCR tools and decided which ones to use: MODI, GOCR, and Tesseract OCR. In the examples provided there, we saw that colored images confuse OCR tools and lower their accuracy. That means there are issues we need to deal with between OCR engines and ICS video images. In this chapter we introduce what OCR is, how OCR works, and what the challenges are in OCR detection for ICS video frames, so that we can discuss how to deal with them in the next chapter.

3.1 What is OCR?

Optical character recognition, more commonly known as OCR, is the interpretation of scanned images of handwritten, typed, or printed text into text that can be edited on a computer. Various components work together to perform optical character recognition, including pattern identification, artificial intelligence, and machine vision. Research in this area continues, developing more effective read rates and greater precision [23].

In 1929 Gustav Tauschek obtained a patent on OCR in Germany, followed by

Handel who obtained a US patent on OCR in USA in 1933(U.S. Patent 1,915,993). In

1935 Tauschek was also granted a US patent on his method (U.S. Patent 2,026,329).

Tauschek's machine was a mechanical device that used templates and a photo detector.

RCA engineers in 1949 worked on the first primitive computer-type OCR to help blind people for the US Veterans Administration; rather than converting the printed characters into machine-readable text, their device spoke the letters aloud. It proved far too expensive and was not pursued after testing [24].

Since that time, OCR has been used for reading credit card imprints for billing purposes, digitizing the serial numbers on coupons returned from advertisements, sorting mail in the United States Postal Service, reading text aloud to blind people, and digitizing and storing scanned documents in archives such as hospitals and libraries.

3.2 How Does OCR Work?

OCR engines are good pattern recognition engines and robust classifiers, with the ability to generalize when making decisions based on imprecise input data. They offer ideal solutions to a variety of character classification problems.

There are two basic methods used for OCR: matrix matching and feature extraction. Of the two, matrix matching is the simpler and more common. Matrix matching compares what the OCR scanner sees as a character with a library of character matrices or templates. When an image matches one of these prescribed matrices of dots within a given level of similarity, the computer labels that image as the corresponding ASCII character. Matrix matching works best when the OCR encounters a limited repertoire of type styles, with little or no variation within each style. Where the characters are less predictable, feature (or topographical) analysis is superior. Feature extraction is OCR without strict matching to prescribed templates. Also known as Intelligent Character Recognition (ICR) or Topological Feature Analysis, this method varies by how much "computer intelligence" is applied by the manufacturer. The computer looks for general features such as open areas, closed shapes, diagonal lines, line intersections, etc. This method is much more versatile than matrix matching, but it needs a pattern recognition process as shown in Figure 3.1 [25].

Figure 3.1 Pattern Recognition steps for Classification
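As a concrete illustration of matrix matching, the fragment below compares a binarized character cell against a library of templates and returns the best match above a similarity cutoff. This is a generic sketch, not code from any of the three engines; the method name and cutoff parameter are ours.

    using System.Collections.Generic;

    public static class MatrixMatcher
    {
        // Compares a binarized character cell against stored templates of the
        // same dimensions; returns the best match above minSimilarity, or null.
        public static char? Match(bool[,] cell,
                                  Dictionary<char, bool[,]> templates,
                                  double minSimilarity)
        {
            char? best = null;
            double bestScore = 0.0;
            int h = cell.GetLength(0), w = cell.GetLength(1);

            foreach (KeyValuePair<char, bool[,]> t in templates)
            {
                int agree = 0;                      // count of matching pixels
                for (int i = 0; i < h; i++)
                    for (int j = 0; j < w; j++)
                        if (cell[i, j] == t.Value[i, j]) agree++;

                double score = (double)agree / (h * w);
                if (score > bestScore) { bestScore = score; best = t.Key; }
            }
            return bestScore >= minSimilarity ? best : null;
        }
    }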

In OCR engines, for feature extraction, the computer needs to define which pixels belong to a character stroke and which do not. In other words, it needs to classify every pixel as stroke (1) or background (0). In Figure 3.2, the stroke pixels form the character E.

a) b)

Figure 3.2: Character representation for feature extraction: a) black and white image b) binary representation of the image [26]


3.3 Causes of False Detection

OCR engines perform pattern recognition on images; before recognizing patterns, they need to classify each image pixel as stroke (1) or background (0). Like most images, ICS video images are colored in different shades and are sometimes distorted or noisy, which makes pattern recognition fail at some level. Even though the strength of these engines lies in their resilience against distortions in the input data and their capability to learn, they have a limit: after a certain amount of distortion, they start to make mistakes.

a b c d

Figure 3.3: Distorted image pattern analysis. a is distorted but could still be detected and read as b; c is distorted more and could not be detected.

This pattern recognition and machine learning problem is a computer vision problem, which is in turn related to the human visual system. In that sense, we can say that if a picture is hard to read for humans, it is also hard to read for computers. (The reverse does not hold: an irony of computer science is that tasks humans struggle with can be performed easily by computer programs, while tasks humans perform effortlessly remain difficult for computers. We can write a computer program to beat the very best human chess players, but we cannot write a program to identify objects in a photo or understand a sentence with anywhere near the precision of even a child.)


The human visual system and visibility are affected by the following factors:

1) Contrast – the relationship between the luminance of an object and the luminance of the background. The luminance (the proportion of incident light reflected into the eye) can be affected by the location of light sources and room reflectance (glare problems).

2) Size – the larger the object, the easier it is to see. However, it is the size of the image on the retina, not the size of the object per se, that is important. Therefore we bring smaller objects closer to the eye to see details.

3) Color – not really a factor in itself, but closely related to both contrast and luminance.

For humans, a certain amount of contrast in an image is essential to define what it is, and the same holds for computers: they need a certain amount of contrast between shapes to be able to detect differences. This is also important for character recognition; characters in the image should have enough contrast to be identifiable.

Figure 3.4: Contrast and color difference of characters in an image. White text has high contrast and is easy to read; blue text has low contrast and is hard to read.

Figure 3.5: Size difference of characters in an image. Larger text is easier to read.

Best OCR results depend on various factors, the most important being the font and size of the text. Other noted factors are color, contrast, brightness, and density of content. OCR engines fail at pattern recognition on low-contrast, small, and complexly colored text. We can apply image processing techniques to modify the images before OCR to reduce the number of OCR failures.

OCR detection is also affected by the font style of the text: to detect a character in a certain font style, that style must be previously defined and stored. Since the fonts in our ICS videos are of types that most OCR engines support, such as Tahoma, Arial, Sans Serif, and Times New Roman, font style will barely affect the detection of our OCR engines. Our enhancements therefore concern segmentation, text size, color, contrast, brightness, and density of content. We discuss the approach we used in the next chapter.


Chapter 4: Enhancements for OCR detection

In the previous chapter, we looked at the challenges in OCR detection for ICS video frames. Here, we discuss the approach and the methods we used to get better recognition from the OCR engines.

OCR engines possess complex algorithms, predefined libraries, and training datasets, and modifying an OCR algorithm requires understanding it from beginning to end. Moreover, ICS video images are sometimes too complex for the engines' built-in processing; therefore, to get better OCR results, we enhance the image before sending it to the OCR engine.

4.1 Segmentation

In the previous chapter, we noted that OCR engines use segmentation mostly designed for scanned images with a black font on a white background. Segmentation of text has two phases: detection of words and detection of characters. Detecting a word means locating the word's place in the image, and detecting a character means locating the character within the word. While using the OCR engines, we saw that this built-in segmentation is not enough for some ICS video images. Due to this lack of segmentation, OCR engines make mistakes on images with complex colored objects; the mistakes are introduced when the threshold used to binarize the image is set incorrectly.

A successful segmentation of a black and white image and a failed segmentation of a colored image on a complex background are represented by Figure 4.1 and Figure 4.2, respectively. In these figures, Segmentation 1 can be considered the segmentation into words and Segmentation 2 the segmentation into characters.

Figure 4.1: Segmentation of black text on a white background for OCR character recognition.

In Figure 4.1, Segmentation 1 and Segmentation 2 succeed, so all text can be separated into words and then into characters. In Figure 4.2, due to the lack of difference between Text1 and the background, Text1 is not segmented as a word, and consequently is not segmented into characters either. Text2 in Figure 4.2 sits on a distinguishable background, so it can be segmented into a word and then into characters. The backgrounds of Text3 and Text4 in Figure 4.2 are very close to each other; because of that they are treated as a single word, and since their font and background colors are close to one another, character segmentation cannot be done.

Figure 4.2: Segmentation of text with different font colors on a complex background for OCR.


These figures are only illustrations of OCR segmentation; we should look at actual ICS video images and OCR outputs to see the importance of segmentation. Figure 4.3 shows that without segmentation the OCR engines fail, whereas in Figure 4.4 the segmented input allows for much better performance.

__c____c_0___0_____ c0___0___ ___________ ____0 0_00__ ___0 _0 0 _\l_l_;_l'__ll l'__\_)l)l)\, __\_'___-li_'i__ '__ l il\.\ i t -i__ il t i__)i- ''-" _

-4 I. I’-’ -4 C I 41 — — *

\'Cl'\iCil|n;_i';" -. `hOl`iZ0llILl|$(llll\l`€L| l`€5[)0ll5€5smoothed mean

a) b) c) d)

Figure 4.3: OCR results for a whole image with complex objects in different colors: a) input image b) GOCR result c) MODI result d) Tesseract OCR result

squared responses squared responses squared responses

ven caI yen I ci I vertical

Classification classification classification

Honizontal horizontal horizontal

smoothed mean smoothed mean smoothed mean

a) b) c) d)

Figure 4.4: OCR Results for a segmented image which has complex objects with

different colors: a) segmented part of input b) GOCR result c) MODI OCR result d)

Tesseract OCR result


Image segmentation is probably the most widely studied topic in computer vision and image processing, and many studies address segmentation for particular applications; we mentioned some of them in Chapter 1. In our approach, we simply group the objects on the screen by thresholding, dilating, and blob extraction, which are explained in the following sections.

4.1.1 Thresholding

Images in ICS videos are colored; therefore we need to convert them to black and white images for segmentation and the morphological operations. We do so by performing image binarization, known as thresholding. We used the SIS filter in the AForge Image Library, a free, open-source image processing library for C#, which performs image thresholding and calculates the threshold automatically using the simple image statistics method. For each pixel:

- two gradients are calculated, ex = |I(x+1, y) - I(x-1, y)| and ey = |I(x, y+1) - I(x, y-1)|;
- the weight is calculated as the maximum of the two gradients;
- the sum of weights is updated (weightTotal += weight);
- the sum of weighted pixel values is updated (total += weight * I(x, y)).

The resulting threshold is calculated as the sum of weighted pixel values divided by the sum of weights, i.e., threshold = total / weightTotal [27].
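A minimal sketch of this binarization step, assuming the AForge.NET filter classes (Grayscale and SISThreshold) behave as documented:

    using System.Drawing;
    using AForge.Imaging.Filters;

    public static class Binarizer
    {
        // Converts a colored frame to grayscale, then applies SIS thresholding.
        public static Bitmap Binarize(Bitmap input)
        {
            // SISThreshold operates on 8 bpp grayscale images.
            Bitmap gray = Grayscale.CommonAlgorithms.BT709.Apply(input);

            SISThreshold threshold = new SISThreshold();
            return threshold.Apply(gray);   // threshold chosen automatically
        }
    }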


a) Input Image b) Thresholded Image

Figure 4.5: SIS threshold example 1: the output has a white foreground on a black background.

SIS thresholding results can differ, as shown in Figure 4.5 and Figure 4.6: in Figure 4.5 the output is a white foreground on a black background; in Figure 4.6 it is reversed.

a) Input Image
b) Thresholded Image

Figure 4.6: SIS threshold example 2: the output has a black foreground on a white background.

For the morphological operation, we will use erosion or dilation, and we need to decide which one to use according to the image. If the image has a white foreground and a black background, dilation will tend to remove the foreground. This is not desirable, so we make the decision by calculating the Average Optical Density (AOD):

AOD(I) = (1/N²) ΣΣ I(i, j), summing over i, j = 0 .. N-1

We calculate the AOD of a binary image which has 0 (white) and 1 (black) values, which puts the AOD value between 0 and 1. We found that for ICS video frames, AOD > 0.15 indicates a black foreground on a white background, in which case we use erosion. In the other case, AOD <= 0.15 indicates an image with a white foreground on a black background, and we choose to use dilation.
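A sketch of this decision, assuming a binarized System.Drawing.Bitmap in which black pixels carry the value 1 of the formula; the helper name is ours:

    using System.Drawing;

    public static class AodDecision
    {
        // Returns true if erosion should be used (AOD > 0.15, i.e. black
        // foreground on white background); false means dilation is used.
        public static bool UseErosion(Bitmap binary)
        {
            long black = 0;
            for (int y = 0; y < binary.Height; y++)
                for (int x = 0; x < binary.Width; x++)
                    if (binary.GetPixel(x, y).R == 0)   // black pixel counts as 1
                        black++;

            double aod = (double)black / ((long)binary.Width * binary.Height);
            return aod > 0.15;
        }
    }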

4.1.2 Erosion and Dilation

In the previous sections we binarized the image, calculated the AOD, and decided which morphological operation to use for segmentation. Here, we discuss what erosion and dilation mean and how they affect images.

Erosion and dilation are morphological operations that affect the shapes of objects and regions in binary images. All processing is done on a local basis, and region or blob shapes are affected in a local manner. A structuring element (a geometric relationship between pixels) is moved over the image in such a way that it is centered over every image pixel at some point, row by row and column by column. An illustration of the movements of a structuring element is shown in Figure 4.7.


Figure 4.7: Structuring element movements for morphological operations

Given a window B and a binary image I:

J1 = DILATE(I, B) if J1(i, j) = OR{B • I(i, j)} = OR{I(i-m, j-n); (m, n) ∈ B}
J2 = ERODE(I, B) if J2(i, j) = AND{B • I(i, j)} = AND{I(i-m, j-n); (m, n) ∈ B} [20]

Dilation removes too-small holes in objects in an image with a black foreground and white background. Erosion is the reverse of dilation, but when we apply it to an image with a white foreground and black background, it has the same effect of removing too-small holes, so we refer to both as the dilation effect.

For the erosion and dilation operations we used the structuring element shown in Figure 4.8:

0 0 0
1 1 1
0 0 0

Figure 4.8: Structuring element for erosion and dilation

The dilation effect thus allows separate objects to grow or join. We use it to join characters and create groups for segmentation. We chose a horizontal window so that characters tend to merge to the left and right, as shown in Figure 4.9. Through trial and error, we found that 8 iterations of the dilation/erosion process are reasonable for joining characters in most ICS video images.
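A sketch of this step with AForge.NET's Dilatation filter (the library's own spelling of the class name) and the structuring element of Figure 4.8; the wrapper method is illustrative:

    using System.Drawing;
    using AForge.Imaging.Filters;

    public static class CharacterJoiner
    {
        // Applies the horizontal 3x3 structuring element of Figure 4.8
        // repeatedly so that neighboring characters merge into one block.
        public static Bitmap Join(Bitmap binary, int iterations)
        {
            short[,] se = {
                { 0, 0, 0 },
                { 1, 1, 1 },
                { 0, 0, 0 }
            };

            Dilatation dilate = new Dilatation(se);
            Bitmap result = binary;
            for (int i = 0; i < iterations; i++)   // 8 iterations in our setup
                result = dilate.Apply(result);
            return result;
        }
    }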

Input Image    Dilation #1    Dilation #2
Dilation #3    Dilation #4    Dilation #5
Dilation #6    Dilation #7    Dilation #8

Figure 4.9: Dilation effect on an image: dilation joins the small objects (characters) and fills the small holes, converting the text into solid rectangular blocks.

4.1.3 Edge detection

The previous process grouped all small objects, such as characters and small items, into single objects. Next, we used the Sobel operator from the AForge Image Library to detect edges. Detecting the edges unifies the objects and provides more accurate detection of groups.


The filter searches for objects' edges by applying the Sobel operator. Each pixel in the resulting image is calculated as an approximate absolute gradient magnitude for the corresponding pixel of the source image:

|G| = |Gx| + |Gy|,

where Gx and Gy are calculated using the Sobel convolution kernels:

    Gx:             Gy:
    -1  0  +1       +1  +2  +1
    -2  0  +2        0   0   0
    -1  0  +1       -1  -2  -1

Labeling the 3x3 neighborhood of pixel x as

    P1 P2 P3
    P8  x P4
    P7 P6 P5

the approximated magnitude for pixel x is calculated using the following equation [28]:

|G| = |P1 + 2P2 + P3 - P7 - 2P6 - P5| + |P3 + 2P4 + P5 - P1 - 2P8 - P7|

Edge detection effect is shown in Figure 4.10.

a) Dilated image b) Edge detected image

Figure 4.10: Edge Detection effect on a dilated image
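In code, the whole computation above reduces to a single filter call; a minimal sketch, assuming AForge.NET's SobelEdgeDetector and an 8 bpp grayscale input as the filter requires:

    using System.Drawing;
    using AForge.Imaging.Filters;

    public static class EdgeDetection
    {
        // Computes the approximate Sobel gradient magnitude at every pixel.
        public static Bitmap Detect(Bitmap dilatedGray)
        {
            SobelEdgeDetector sobel = new SobelEdgeDetector();
            return sobel.Apply(dilatedGray);
        }
    }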


4.1.4 Blob Extraction

After detecting the edges, which provide continuity and completeness of the groups, we count and extract stand-alone objects in the image using a connected components labeling algorithm. This is an algorithmic application of graph theory in which subsets of connected components are uniquely labeled based on a given heuristic; we use the implementation in the AForge Image Library. The filter labels the objects in the source image and colors each separate object with a different color, treating all non-black pixels as object pixels and all black pixels as background.

The AForge blob extractor extracts blobs from its input images (which in our case are the thresholded and dilated images). However, we need the original image as input for OCR, so we use blob extraction only to locate the objects in the original image.

A blob extraction example is shown in Figure 4.11. One might expect more blobs in the extracted output; however, they were filtered using several criteria. If a blob contains other blobs, we do not extract it. We also discard a blob when blob width / blob height < 1.5: the text we want to detect is at least two characters long, and since we dilated the text to the left and right, its width will always exceed its height. In Figure 4.11, the man's body is not extracted because of this width-to-height ratio. Very small blobs are excluded as well. After filtering by these criteria, we pass the remaining parts to the Tesseract OCR engine, and if no text is detected in a blob, we remove it. This process can be considered a back-forward operation.
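A sketch of these filtering criteria using AForge.NET's BlobCounter; the 10-pixel minimum size is an illustrative value, and the back-forward Tesseract check is only indicated in a comment:

    using System.Collections.Generic;
    using System.Drawing;
    using AForge.Imaging;

    public static class TextBlobFilter
    {
        // Locates candidate text regions in the edge-detected image; the
        // rectangles are later used to crop the original image for OCR.
        public static List<Rectangle> Extract(Bitmap edgeImage)
        {
            BlobCounter counter = new BlobCounter();
            counter.ProcessImage(edgeImage);

            List<Rectangle> kept = new List<Rectangle>();
            foreach (Rectangle r in counter.GetObjectsRectangles())
            {
                // Dilated text grows sideways, so text blobs are wide and short.
                if ((double)r.Width / r.Height < 1.5) continue;

                // Very small blobs are unlikely to hold readable text
                // (10 px is an illustrative cutoff).
                if (r.Width < 10 || r.Height < 10) continue;

                kept.Add(r);
            }
            // Blobs containing other blobs, and blobs in which Tesseract finds
            // no text, are removed in a further back-forward pass (omitted).
            return kept;
        }
    }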


a b

c

Figure 4.11: Blob extraction example using the edge-detected image: a) original image b) edge-detected image c) extracted blobs

Increasing the number of dilation iterations used in Figure 4.9, or increasing the size of the structuring element, would reduce the number of detected blobs. However, it might also merge the text with other objects; thus, we keep the structuring element very small (3x3) and the number of iterations at 8.

4.2 Resizing Text Font Size

In the previous section, we separated the text from complex backgrounds by segmentation. But what if the segmented text's font size is too small to be detected correctly?

The best font size for OCR is known to be 10 to 12 points; smaller font sizes lead to poor-quality OCR. Font sizes greater than 72 are treated as images and should thus be avoided. Usually dark, bold letters on a light background, and vice versa, yield good results. The textual content should ideally be well placed, with good spacing between words and sentences [29].


We need fonts large enough for the text to be detected. A large image alone is not enough; we need larger text, in analogy to the human visual system, where a large object is not enough because a large image on the retina is needed. Increasing the size of the text can be achieved easily by resizing the image: for instance, resizing the image by a factor of 1.5 makes everything in it, including the text, 1.5 times bigger.

a) Original image b) Resized image (x1.5)

Figure 4.12: Resizing example

Font sizes under 10 px become hard for OCR engines to recognize. We have plenty of small characters (mostly occurring in explanations for images or graphs), so before the image processing we increase the size of the images by a factor of 1.5 by default. Resizing is done by the bilinear image interpolation method; it works in two directions and tries to achieve the best approximation of a pixel's color and intensity based on the values of the surrounding pixels. The following example illustrates how resizing/enlargement works:

a) b) c)

Figure 4.13: Resizing by interpolation: a) original image b) divided pixels to be filled by interpolation c) resized image


Bilinear interpolation considers the closest 2x2 neighborhood of known pixel values surrounding the unknown pixel, then takes the weighted average of these 4 pixels to arrive at the final interpolated value. This yields much smoother-looking images than nearest-neighbor interpolation.

Figure 4.14: Bilinear interpolation at the pixel level.

In the diagram in Figure 4.14, the left side shows the case where all known pixel distances are equal, so the interpolated value is simply their sum divided by four [30].
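A sketch of the enlargement step, assuming AForge.NET's ResizeBilinear filter, which takes the new width and height in pixels; the helper is ours:

    using System.Drawing;
    using AForge.Imaging.Filters;

    public static class Enlarger
    {
        // Enlarges the image by the given factor using bilinear interpolation;
        // the pipeline uses factor = 1.5 by default.
        public static Bitmap Enlarge(Bitmap input, double factor)
        {
            int newWidth = (int)(input.Width * factor);
            int newHeight = (int)(input.Height * factor);

            ResizeBilinear resize = new ResizeBilinear(newWidth, newHeight);
            return resize.Apply(input);
        }
    }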

4.3 Inversion

In the previous sections we segmented and resized the images and saw that applying both improved detection. Detection can be increased further by manipulating the intensity relationship between the text and the background using the inversion method. Before we explain inversion, we need to look at the RGB color model.

The RGB color model is an additive color model in which red, green, and blue light

are added together in various ways to reproduce a broad array of colors. The name of the

model comes from the initials of the three additive primary colors, red, green, and blue

shown in figure 4.15.


Figure 4.15 RGB Color Model

Image file formats such as BMP, JPEG, TGA, and TIFF commonly use 24-bit RGB representations, where the color value of each pixel is encoded as three 8-bit unsigned integers (0 through 255) representing the intensities of red, green, and blue:

(0, 0, 0) is black

(255, 255, 255) is white

(255, 0, 0) is red

(0, 255, 0) is green

(0, 0, 255) is blue

(255, 255, 0) is yellow

(0, 255, 255) is cyan

(255, 0, 255) is magenta

Inverting colors is basically manipulating RGB values. When we invert an image in the classical way, we take the inverse of each RGB value; for example, the inverse of the color (1, 0, 100) is (255-1, 255-0, 255-100) = (254, 255, 155). This changes the look of the image, but it does not change the difference between the text and the background, since all channels are subtracted from the same number.

Figure 4.16: The inversion operation: input image on the left, inverted image on the right.

In our approach, we expand this technique from 1 to 7 inversions using the equations in Figure 4.17. The OCR engines give different results for different inverted images; sometimes the 5th inversion works better than the 1st. Since we do not know in advance which inversion will be best for an OCR engine, we run all of them and take the union of the results, which improved the OCR output compared to using only the original image.


Original Image   R      G      B
Inversion 1      255-R  G      B
Inversion 2      R      255-G  B
Inversion 3      R      G      255-B
Inversion 4      255-R  255-G  B
Inversion 5      R      255-G  255-B
Inversion 6      255-R  G      255-B
Inversion 7      255-R  255-G  255-B

Figure 4.17: Inversion equations and their effect on the images.
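A sketch of generating the variants, where a 3-bit mask selects which channels to invert; the seven non-zero masks cover the seven rows of Figure 4.17, though not in the same order, and the helper is ours:

    using System.Drawing;

    public static class ChannelInverter
    {
        // mask bit 1 inverts R, bit 2 inverts G, bit 4 inverts B;
        // masks 1..7 produce the seven inversions of Figure 4.17.
        public static Bitmap Invert(Bitmap input, int mask)
        {
            Bitmap output = new Bitmap(input.Width, input.Height);
            for (int y = 0; y < input.Height; y++)
                for (int x = 0; x < input.Width; x++)
                {
                    Color c = input.GetPixel(x, y);
                    int r = (mask & 1) != 0 ? 255 - c.R : c.R;
                    int g = (mask & 2) != 0 ? 255 - c.G : c.G;
                    int b = (mask & 4) != 0 ? 255 - c.B : c.B;
                    output.SetPixel(x, y, Color.FromArgb(r, g, b));
                }
            return output;
        }
    }

All seven variants are then sent to the OCR engines and the recognized words are unioned.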


Image: Original Image
  MODI result: Question 3 Where did the story say that there was a statue raised in Mrs. Bethune’s h o n Washington, D.C. Miami, Florida Mayesville, South Carolina
  Tesseract result: ` i i wiiiiiipi` " ii iiiiiQuestion 3Where did the story say that therewas a statue raised in Mrs. Bethune’shonor?_B-Nik is

Image: Inversion 1
  MODI result: Question 3 Where did the story say that there was a statue raised in Mrs. Bethune’s h o n I _ L • • Mayesville Washington, D.C. Miami, Florida South Carolina
  Tesseract result: Question 3Where did the story say that therewas a statue raised in Mrs. Bethune'shonor? V M i ‘_ *ii lliivi

Image: Inversion 3
  MODI result: Question 3 Where did the story say that there was a statue raised in Mrs. Bethune’s honor? Washington, D.C. Miami, Florida Mayesville, South Carolina
  Tesseract result: ` i‘ ilil i|iViQuestion 3Where did the story say that therewas a statue raised in Mrs. Bethune’shonor?Q2= 'iii VE

Image: Inversion 5
  MODI result: Where did the story say that there was a statue raised in Mrs. Bethune’s h o no hiii’iin’ I, IIIk:11fl1I I; 1iiIK ij[ihi uestion I,’.
  Tesseract result: Question 3Where did the story say that therewas a statue raised in Mrs. Bethune’shonor?

Image: Inversion 7
  MODI result: H Question 3 _ ___ Where did the story say that there was a statue raised in Mrs. Bethune’s hono:? ItM1’1dIIkc, r V4hüifl4h. ID1e. r 7 r 7 IIiif!ihñ1i0 IF1kiI[iKkii S5iiV1f[Ihl iiiøüiiii
  Tesseract result: V ` , Y fv e_‘~iiiiii";iQuestion 3Where did the story say that therewas a statue raised in Mrs. Bethune’shonor?

Figure 4.18: OCR engines' detections for the original image and the inverted images.


Chapter 5: OCR Accuracy Criteria and Test Results

In this chapter, we look at the OCR accuracy criteria and the tools we used for testing, followed by the results of the experiments. We start the discussion by defining the test data.

5.1 Test Data

The test data consists of the test images and the text contained in those images. We look at each separately.

5.1.1 The Images for OCR Accuracy Test

The test data consists of 1387 different images created by an indexer from 20 selected ICS videos. Most of them were also tested by the ICS video indexer [4], which made our inputs more reliable. The selected videos are diverse in templates and color styles, since they were prepared by 15 different instructors; 14 of the videos are from the Computer Science Department and 6 are from other departments at the University of Houston.

Figure 5.1 shows examples illustrating the variety of the images in the ICS video test data.


Figure 5.1: Example ICS video images included in test data.

Figure 5.2: Examples of images that are not included in the test.

Images that do not include any text were removed from the list, as shown in Figure 5.2: empty video screens, or screens without any relevant text information such as the beginning or the end of a video.

5.1.2 The Text for the OCR Test


For each image, all text in the image (main body, tables, figures and their captions, footnotes, page numbers, and headers) is counted, with some exceptions.

Figure 5.3: An example of some text that is not included in the test.

If some text in an image is too small to read, as shown in Figure 5.3, that text is not included in the image's text data. Deciding whether text is small enough to omit depends on our own ability to read it: if we cannot read it accurately, we cannot write it down to compare against the results of the tools.

For search, case information is not useful; people will not specifically look for uppercase letters, so our recognition is not case sensitive. All data is converted to lowercase.


5.2 Word Accuracy and Search Accuracy

If there are n different words and each word has a repetition frequency as shown in Table 5.1, then the word accuracy is calculated according to the WA formula below. In other words, "word accuracy" measures what fraction of the words is detected correctly.

Word   Ground truth frequency   Detected frequency   Missed
w1     F1                       f1                   M1 = F1 - f1
w2     F2                       f2                   M2 = F2 - f2
w3     F3                       f3                   M3 = F3 - f3
...    ...                      ...                  ...
wn-1   Fn-1                     fn-1                 Mn-1 = Fn-1 - fn-1
wn     Fn                       fn                   Mn = Fn - fn

Table 5.1: Formulation of "word accuracy"

Mi --> missed count for word wi
NTW --> number of total words: F1 + F2 + F3 + ... + Fn
MW --> missed words total: MW = M1 + M2 + ... + Mn
WA --> word accuracy; since WA is the fraction of words detected correctly,

WA = 1 - MW/NTW = 1 - [(F1-f1) + (F2-f2) + (F3-f3) + ... + (Fn-fn)] / (F1 + F2 + F3 + ... + Fn)

Search accuracy is related to the probability of a successful search. If there are n different words, each word has a frequency, but we treat all frequencies as capped at 1: if a word is detected in an image at all, we do not need to know how many times it is detected, because for search our purpose is only to decide whether the word exists. The formulation of search accuracy is given in Table 5.2, and the calculation is done according to the SA formula below.

Word   Ground truth frequency   Detected frequency   Search missed
w1     F1                       f1                   M1 = 1 if f1 < 1, else 0
w2     F2                       f2                   M2 = 1 if f2 < 1, else 0
w3     F3                       f3                   M3 = 1 if f3 < 1, else 0
...    ...                      ...                  ...
wn-1   Fn-1                     fn-1                 Mn-1 = 1 if fn-1 < 1, else 0
wn     Fn                       fn                   Mn = 1 if fn < 1, else 0

Table 5.2: Formulation of "search accuracy"

SMW --> search missed words total: SMW = M1 + M2 + M3 + ... + Mn
SA --> search accuracy; since SA is the fraction of unique words that a search can find,

SA = 1 - SMW/n = 1 - (M1 + M2 + M3 + ... + Mn)/n
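A sketch of both measures, computed from word-frequency dictionaries for the ground truth and the detected text; the names and the guard against over-detection are ours:

    using System;
    using System.Collections.Generic;

    public static class AccuracyMeasures
    {
        // truth and detected each map a word to its frequency in one image set.
        public static void Compute(Dictionary<string, int> truth,
                                   Dictionary<string, int> detected)
        {
            long totalWords = 0, missedWords = 0;      // word accuracy terms
            int uniqueWords = 0, missedSearches = 0;   // search accuracy terms

            foreach (KeyValuePair<string, int> e in truth)
            {
                int F = e.Value;
                int f = detected.ContainsKey(e.Key) ? detected[e.Key] : 0;

                totalWords += F;
                missedWords += Math.Max(F - f, 0);     // Mi = Fi - fi, floored at 0

                uniqueWords++;
                if (f < 1) missedSearches++;           // word never detected at all
            }

            double wa = 1.0 - (double)missedWords / totalWords;
            double sa = 1.0 - (double)missedSearches / uniqueWords;
            Console.WriteLine("WA = {0:P2}, SA = {1:P2}", wa, sa);
        }
    }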

5.3 Preparing and Testing Tools

We created two tools for the experiments: one for preparing the ground truth text data and one for running the experiments and testing accuracy. We discuss each of them separately.


5.3.1 TextPictureEditor

Before we can test accuracy, we have to prepare the ground truth for the test: each image we want to test needs a corresponding text file. Writing these texts manually while looking at the pictures would take a lot of time, so we designed a small tool to help. The tool lets us move back and forth through a folder in its user interface; we can see each image and type the text we see in it into the text area under the picture.

Figure 5.4: Screenshot of TextPictureEditor tool.

The steps for running the tool are:

- Open a folder; the tool automatically loads a picture from that folder.
- If the text has not been created for the picture, send the picture to OCR and load the result into the text area.
- If the text has already been created, check whether it is correct; if not, update it and save it.


Figure 5.5: Input folder for OCR test created by TextPictureEditor tool.

After going back and forth through the folder, we end up with text files holding the ground truth text for all images. All comparisons are based on these txt files.

5.3.2 OCR Tool Manager and Accuracy Tester

The rest of the job is done by the accuracy tester: applying the image processing techniques to the images, managing the OCR tools, and testing their accuracy. This is illustrated in Figure 5.6.

The regions of the tool, as numbered in Figure 5.6, are:

1) Selecting the folders for an accuracy test, or choosing a single image to experiment on.
2) Selecting the image processing techniques used to modify the images.
3) The input image region.
4) The modified image region.
5) Selecting the OCR engines.
6) The output of MODI.
7) The outputs of GOCR and Tesseract.


Figure 5.6: Screenshot of OCR tool manager and accuracy tester

After running the accuracy test on a folder of ICS video images, the tool modifies the images and obtains the text from each OCR tool. It then compares each tool's result to the ground truth data and creates an Excel file showing the statistics, as shown in Figures 5.7 and 5.8.

Figure 5.7: Screenshot of OCR tool manager and accuracy tester


Figure 5.8: Excel file created by the OCR Manager Tool for a folder.

As shown in Figure 5.8, the Excel file lists the missed words, missed searches, and missed characters for each image separately and for all images in total. The tool also creates a separate Excel file for each image with a detailed view, shown in Figure 5.9, listing the words that were detected and those that were not; a tool's accuracy can be read from this file.

Figure 5.9: Excel file created by the OCR Manager Tool for an image.


5.4 Experiments and Test Results

As mentioned before, we tested three different OCR tools: MODI, GOCR, and Tesseract OCR; our purpose is to find the best OCR engine to use in the ICS video player. The test has two phases: the first compares the accuracy rates of the three tools without any image modifications, and the second compares the results of each tool after image modifications.

We employed 20 different video folders containing 1387 pictures in total, created by the video indexer and reduced from about 2000 images after eliminating some under the criteria defined in the previous section. We then created the ground truth text using TextPictureEditor, the tool that makes it easier to write the text for each image.

We kept the image files and text files together in each of the 20 folders. For each image we ran the three OCR tools and compared their results with the ground truth data, creating Excel files with statistical information for each image. The tools' accuracy rates according to our criteria give us an idea of which one is better.

For the second phase we modified the inputs by applying the image processing techniques; in other words, we preprocessed the images in hopes of getting better results from the OCR tools. All results were written to the same Excel file for each video. Finally, we merged these 20 Excel files manually for an overall picture.


Method                   # Word Miss   # Expected Words   Word Accuracy
Modi                     1823          27201              93.30%
Gocr                     7117          27201              73.84%
Tesseract                4406          27201              83.80%
Modi-Gocr-Tesseract      1068          27201              96.07%
IE+Modi                  766           27201              97.18%
IE+Gocr                  4829          27201              82.25%
IE+Tesseract             2148          27201              92.10%
IE+Modi-Gocr-Tesseract   589           27201              97.83%

Table 5.3: OCR accuracy test results for "Word Accuracy"

Graph 5.1: OCR accuracy test graph for “Word Accuracy”


Word accuracy results are shown in Table 5.3 and Graph 5.1. We can conclude that among the three OCR tools, Modi has the highest word accuracy with 93.30%, followed by Tesseract OCR with 83.80% and Gocr with 73.84%. When we use the tools together, the word accuracy increases to 96.07%. That means some words can be detected by one OCR tool but not by the others; combining the tools increases accuracy because we take the union of the three results, so they complement each other.

Table 5.3 and Graph 5.1 also show that our proposed image enhancement method (IE) works well: it increases the word accuracy of all methods. IE raises Modi's word accuracy from 93.30% to 97.18% and Gocr's from 73.84% to 82.25%; Tesseract OCR improves from 83.80% to 92.10% with IE. IE also increases the word accuracy of the combined method, from 96.07% to 97.83%. The increase in the combined method is not as large as in the individual methods, as can also be seen from the table below.

Method                # of detections gained with IE
Modi                  1026
Gocr                  2140
Tesseract             2155
Modi-Gocr-Tesseract   460

Table 5.4: Number of words undetected by each method that are detected with IE.

Similar results are obtained in Table 5.5 and Graph 5.2 for the search accuracy rate: Modi has the highest search accuracy, and IE increases the search accuracy of all of the OCR tools.


Method                   # Search Miss   # Expected Unique Words   Search Accuracy
Modi                     1784            20006                     91.08%
Gocr                     6736            20006                     66.33%
Tesseract                4113            20006                     79.44%
Modi-Gocr-Tesseract      1044            20006                     94.78%
IE+Modi                  758             20006                     96.21%
IE+Gocr                  4596            20006                     77.03%
IE+Tesseract             1958            20006                     90.21%
IE+Modi-Gocr-Tesseract   584             20006                     97.08%

Table 5.5: OCR accuracy test results for "Search Accuracy"

Graph 5.2: OCR accuracy test graph for “Search Accuracy”


Method                   Execution Time of IE (ms)   Execution Time of Method (ms)   Total Execution Time (ms)
Modi                     0                           987940574                       987940574
Gocr                     0                           988302280                       988302280
Tesseract                0                           989242186                       989242186
Modi-Gocr-Tesseract      0                           4000630424                      4000630424
IE+Modi                  999164776                   1018913396                      2018078172
IE+Gocr                  999164776                   1023043090                      2022207866
IE+Tesseract             999164776                   1035144464                      2034309240
IE+Modi-Gocr-Tesseract   999164776                   4112253332                      5111418108

Table 5.6: OCR test results for "Execution Time (ms)"

Graph 5.3: OCR graph for “Execution Times”

IE increases both word accuracy and search accuracy. But as we can see in Table 5.6 and Graph 5.3, the IE operation also increases the execution times, which almost double for the individual methods.


Method                   # of False Positives
Modi                     19271
Gocr                     10363
Tesseract                13613
Modi-Gocr-Tesseract      45473
IE+Modi                  81499
IE+Gocr                  52764
IE+Tesseract             93928
IE+Modi-Gocr-Tesseract   150913

Table 5.7: Number of false positives


Graph 5.4: OCR test graph for number of false positives


The tools detected some words that do not exist in the images; Table 5.7 and Graph 5.4 show the number of such false positives. Modi has the fewest false positives. Using more than one method at a time increases the number of false detections considerably, but the largest increase occurs with the IE method, which can be explained: IE creates 7 different images in the inversion step, and some of these inversions produce additional false positives. Combining the tools together with IE gives more than 150 thousand false positives, which is more than the total number of words.

Method                   Computer Science 4 False Positive Total   Computer Science 10 False Positive Total
Modi                     3911                                      3112
Gocr                     976                                       865
Tesseract                2773                                      1961
Modi-Gocr-Tesseract      18965                                     15497
IE+Modi                  6337                                      5316
IE+Gocr                  17597                                     12258
IE+Tesseract             8484                                      6533
IE+Modi-Gocr-Tesseract   29008                                     23587

Table 5.8: The 2 videos with the highest numbers of false positives

When we look at the videos individually for false positives, we see that the Computer Science 4 and 10 videos have the highest counts. Looking for the reason, we found that both videos were prepared in Classroom Presenter, which shows thumbnails of the slides on the left. They also use a black font on a white background, which leads the tools to detect text even in those small thumbnail regions where they should not. Figure 5.10 shows some example screens from these videos. Similarly, we examined the videos with the highest and lowest detection rates; example slides from them are shown in Figure 5.11 and Figure 5.12, respectively.


a) Computer Science 4 b) Computer Science 10
Figure 5.10: Example screens from the videos with the highest false positives

a) Computer Science 14 b) Computer Science 20
Figure 5.11: Example screens from the videos with the highest word detection

a) Computer Science 2 b) Computer Science 17
Figure 5.12: Example screens from the videos with the lowest word detection


Graph 5.5: OCR test results for the search accuracy rate of all videos


Chapter 6: Conclusion

In this work, we have demonstrated that searching for keywords in a video is possible using current OCR techniques. We surveyed popular OCR tools and chose three OCR engines, MODI, GOCR, and Tesseract OCR, that can be integrated into the ICS video project, and we ran experiments on the accuracy of these tools.

For the experiments we needed tools to create the ground truth text data for the OCR accuracy check. We designed TextPictureEditor in C# and prepared the text of 1387 images extracted from 20 different ICS videos. This test data contains 20006 unique words and 27201 total words (each more than 1 character long), for a total of 144613 characters. We used this data for testing with an OCR engine manager and accuracy checker, also designed in C#. The results of the accuracy and performance tests showed that MODI is the best OCR engine for the ICS video player in terms of both accuracy and performance.

We have also demonstrated that these OCR engines struggle with images that have complex coloring and poor contrast between the text font and the text background, such as ICS video images. We proposed a method to deal with this problem using image processing techniques; in other words, we enhance the accuracy of these tools by preprocessing the images. After SIS thresholding of the image, several iterations of the dilation operation to connect the text, and Sobel edge detection, we were able to segment the text for OCR input, which helped the tools recognize the text more accurately. Graphs 5.1 and 5.2 show that image enhancement increased the accuracy for all tools. However, the number of false positives increased with IE, as did the execution time. Given the gain in accuracy, our approach of modifying the inputs is applicable whenever accuracy, rather than performance, is the first priority.

In our experiments we also tested whether running all of the OCR tools one after another and combining their results increases accuracy. The idea of combining the tools was inspired by ensemble learning, a machine learning approach to classification that applies several different methods to a single problem and combines the results. The experiments showed that it does increase accuracy, but at a very low rate and with a high performance cost.

With the detailed results for each video, we could classify the videos as hard or easy to detect, as shown in Graph 5.5.

Creating the ground truth test data, writing or correcting the text of 1387 images, was very challenging and time consuming. Deciding on the criteria was also challenging: should we include the captions of images in the test? Should we include mathematical formulas and operators? What about parentheses? Finding a good segmentation algorithm was another challenge.

For future work, machine learning algorithms for training on the data or for classifying images, together with other image processing techniques for better segmentation, could transform complex-background images into a form that current OCR engines can read without confusion. The Tesseract OCR engine has the ability to learn, so training it with ICS video images should increase its accuracy.


An evaluation of the ICS video player's search feature will guide us down the right path: how useful is the search feature, is the accuracy of search sufficient, and do false positives affect the users? A broad evaluation will answer these questions. Until then, current OCR engines combined with the image enhancement we provided can be used for search in ICS videos.


References

[1] Todd Smith, Anthony Ruocco, and Bernard Jansen, Digital video in education, SIGCSE Bull. 31 (1999), no. 1, 122-126.

[2] Jaspal Subhlok, Olin Johnson, Venkat Subramaniam, Ricardo Vilalta, and Chang Yun, Tablet PC video based hybrid coursework in computer science: Report from a pilot project., SIGCSE '07 Proceedings of the 38th SIGCSE technical symposium on Computer science education, 2007

[3] Joanna Li, Automatic indexing of classroom lecture videos, Master's thesis, University of Houston, 2008.

[4] Gautam Bhatt, Efficient automatic indexing for lecture videos, Master's thesis, University of Houston, April 2010.

[5] Google Inc., Google video, http://en.wikipedia.org/wiki/Google_Video, January 2005.

[6] Microsoft, Project tuva, http://research.microsoft.com/apps/tools/tuva/, 2009.

[7] Wei-hsiu Ma, Yen-Jen Lee, David H. C. Du, and Mark P. McCahill, Video-based hypermedia for education-on-demand, MULTIMEDIA '96: Proceedings of the fourth ACM international conference on Multimedia (New York, NY, USA), ACM, 1996, pp. 449-450.

[8] Andreas Girgensohn, Lynn Wilcox, Frank Shipman, and Sara Bly, Designing affordances for the navigation of detail-on-demand hypervideo, AVI '04: Proceedings of the working conference on Advanced visual interfaces (New York, NY, USA), ACM, 2004, pp. 290-297.

[9] Andreas Girgensohn, Frank Shipman, and Lynn Wilcox, Hyper-hitchcock: authoring interactive videos and generating interactive summaries, MULTIMEDIA '03: Proceedings of the eleventh ACM international conference on Multimedia (New York, NY, USA), ACM, 2003, pp. 92-93.

[10] Frank Shipman, Andreas Girgensohn, and Lynn Wilcox, Hypervideo expression: experiences with hyper-hitchcock, HYPERTEXT '05: Proceedings of the sixteenth ACM conference on Hypertext and hypermedia (New York, NY, USA),ACM, 2005, pp. 217-226.

[11] Michael R. Lyu, Edward Yau, and Sam Sze, A multilingual, multimodal digital video library system, JCDL '02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries (New York, NY, USA), ACM, 2002, pp. 145-153.

[12] Search in Videos, http://searchinsidevideo.com/#home.

[13] Michele Merler and John R. Kender, Semantic keyword extraction via adaptive text binarization of unstructured unsourced video, Nov. 2009, ISSN: 1522-4880.

[14] Video Text Recognition, http://www.sri.com/esd/automation/video_recog.html

[15] Anshul Verma, Design, development and evaluation of a player for indexed, captioned and searchable videos, Master's thesis, University of Houston, August 2010.

[16] Rainer Lienhart and Wolfgang Effelsberg, Automatic text segmentation and text recognition for video indexing, ACM/Springer Multimedia Systems, Vol. 8, pp. 69-81, January 2000.

[17] Simple OCR, http://www.simpleocr.com/Info.asp.

[18] ABBYY FineReader, http://www.abbyy.com/company/.

[19] Tesseract OCR, http://code.google.com/p/tesseract-ocr/.

[20] GOCR, "Information", http://jocr.sourceforge.net/.

[21] GOCR, "Linux Tag", http://www-e.uni-agdeburg.de/jschulen/ocr/linuxtag05/w_lxtg05.pdf.

[22] MODI, http://office.microsoft.com/en-us/help/about-ocr-international-issues-HP003081238.aspx.

[23] About OCR, http://www.ehow.com/how-does_4963233_ocr-work.html

[24] OCR , http://en.wikipedia.org/wiki/Optical_character_recognition

[25] OCR, http://www.dataid.com/aboutocr.htm

[26] Pattern Recognition, http://www.dontveter.com/basisofai/char.html

[27] SIS Thresholding, http://www.aforgenet.com/framework/docs/html/39e861e0-e4bb-7e09-c067-6cbda5d646f3.htm.

[28] Sobel Edge Detector , http://www.aforgenet.com/framework/docs/

[29] Best OCR Font Size, Computer vision , http://www.cvisiontech.com/pdf/pdf-ocr/best-font-and-size-for-ocr.html?lang=eng.


[30] Image Interpolation, http://www.cambridgeincolour.com/tutorials/image-interpolation.htm.

[31] Computer vision, http://en.wikipedia.org/wiki/Computer_vision.
