Mobile Eye - Matthew Gaunt

Mobile Eye is a thesis documenting research into mobile projection techniques and looking into the feasibility of the Sixth Sense project.

Summary

Over the past year or so, there has been a large growth in the sales of smartphones [16], enabling users to be connected to the internet wherever they are. The major selling point of these devices is the power of the processors and the third-party application stores, available on a number of platforms.

But it's widely believed that the bottleneck of mobile devices is the screen size, limiting the amount of data that can be presented to the user [27, 41], and this is leading some manufacturers to begin embedding projectors into their devices to give users a new method of displaying and sharing information [13, 12, 5].

This project is aimed at researching the possibilities of using such a device with a steerable projector to give the user an automatic way to enhance their shopping experience. What this would involve is the user wearing their mobile device around their neck, such that the projector and camera would be pointing outward in front of them. Then, as the user picks up items of interest, an image processing algorithm would register the product and the system would search for a suitable projection area to project relevant information (i.e. ratings, reviews, related products).

The potential areas where a product like this could be used are any situations where a user may wish to be given information but doesn't want or need to give input to the system. Apart from the retail use already mentioned, there is engineering use, where someone doesn't have a free hand to use a separate device but would benefit from additional information.

If a company were to invest in this product it would be best to create a smartphone application for some of the leading smartphone platforms, which would use a traditional touchscreen interface. This would be followed by research and development into the use of projector phones, integrating it into the existing application. By doing this the company can sell the product on a software-as-a-service model to retailers as early as possible. Once projector phones become more widely adopted, the company could introduce the system.

I would expect such a company to require an investment of £1.3m and to be sold after 5 years for approximately £15m. This is based on working in the UK and selling the service to UK retailers only.

There is currently no product that offers the features of an online e-commerce store in a retail store setting. After discussing the product with a top e-commerce consultant, he said that retailers are very keen to enter into the smartphone market, showing there is a customer base for such a product. For retailers, projecting information to the user is a new method that can be used to show targeted advertising and show ratings of products to the user. This should improve sales, since research has shown that customers are willing to pay 20-99% more for a 5-star rated product than a 4-star product [9].

Current research into such systems has been focused on highly set-up and modelled environments using much larger projector systems, or the focus has been on user interaction with the projecting device.

This research will be into the development of a passive ubiquitous system with a focus on presenting information to the user, expecting no user input.

• I wrote an Android application (in Java) which communicates with a laptop (via Bluetooth) to control a GUI (in Python)

• Learnt to program in Python and handle threading with a Python GUI

• I created an algorithm which aims to find the largest empty area in a camera view (see page 36)

• Implemented a client-server protocol to use an image recognition technique (see page

• Created custom hardware to create a steerable projection

Contents

1 Motivation
  1.1 Inspiration
  1.2 Application Areas
  1.3 Specialist Knowledge
  1.4 Benefits
  1.5 Aims and Objectives
    1.5.1 Structure

2 Related Work - Projection
  2.1 Projection Technology
    2.1.1 LCD Projection
    2.1.2 Digital Light Processing (DLP)
    2.1.3 Scan Beam Laser Projection
    2.1.4 Holographic Projection
  2.2 Projector Form Factor and Design
    2.2.1 Projector Phones
    2.2.2 Steerable Projection
  2.3 Keystoning
    2.3.1 Embedded Light Sensors
    2.3.2 Smarter Presentations
  2.4 Projection Uses and Interactions
    2.4.1 View & Share
    2.4.2 Search Light
    2.4.3 Projection Technology Summary
  2.5 MobileEye's Projection System

3 Related Work - Object/Image Recognition
  3.1 Image Recognition
    3.1.1 SIFT
    3.1.2 Indexing Scale Invariance
    3.1.3 SURF - Detector
    3.1.4 SURF - Descriptor

4 Smartphones
  4.1 Camera Technology
    4.1.1 Well Adjusting Capacitors
    4.1.2 Multiple Capture
    4.1.3 Spatially Varying Pixel Exposures
    4.1.4 Time to Saturation
  4.2 Platform
    4.2.1 Dalvik Virtual Machine
    4.2.2 Application Development
    4.2.3 Security, Intents and Receivers
    4.2.4 Bluetooth and Camera API

5 Project Execution
  5.1 Tool and Language Choices
    5.1.1 Mobile Application
    5.1.2 Hardware for Projection Rotation
    5.1.3 Projection UI
    5.1.4 Object Recognition
  5.2 Space Finding Algorithm
    5.2.1 Histogram Data
    5.2.2 Hill Climbing
    5.2.3 Area Extraction
    5.2.4 Application Structure

6 Project Status
  6.1 Current Status
    6.1.1 Projector UI
    6.1.2 Android Application
    6.1.3 Image Recognition
    6.1.4 Hardware
    6.1.5 Aims Achieved
  6.2 Future Work
    6.2.1 Depth
    6.2.2 Automatic Object Recognition
    6.2.3 Space Finding Algorithm

A Space Finding Appendix
  A.1 First Averaging and Thresholding Tests
  A.2 Averaging and Thresholding Improvements
  A.3 Dalvik

Chapter 1

Motivation

There are a number of factors behind the motivation for this project. Smartphones are becoming more powerful and the platforms which run on them are becoming more stable and open, offering third party developers access to the device's hardware through standard and reliable APIs.

These mobile devices are able to store and play high resolution media, yet the screen size acts as a bottleneck [27, 41]. New projection technologies have made it possible to put projectors inside mobile devices.

Manufacturers intend these projectors to be used for viewing media and overhead projection [48], but research in the HCI field is ongoing into new interfaces that could be created.

The aim of this project is to research developing a ubiquitous computing system that will take advantage of this new hardware, to recognise relevant products in a user's scene (from the mobile device's camera) and find a suitable area within which to project relevant data to the user.

From this point onward, when I refer to the 'MobileEye' system, I will be referring to this system: a projector phone capable of recognising items and projecting relevant information into the user's environment.

1.1 Inspiration

One of the biggest inspirations for this project is the gestural interface commonly known as 'SixthSense' [39] (previously called Wear Ur World [40]). It comprises a camera and projector used to project information onto the user's surroundings, and the camera is used to interact with the display through gestures.

This is an impressive and thought-provoking implementation of such a technology. However it's hard to gauge the stage of the implementation from the video demonstration. There are many functionalities which appear staged.

The main method of interaction with the system is through gestures which are registered by coloured finger caps. The caps enable the user to perform tasks such as taking pictures by making a frame gesture, drawing on a projected screen, watching videos on a newspaper and much more [37, 38]. But each of these requires some form of visual marker.

The use of visual markers means the implementation can be done effectively, giving reliable projection areas (where appropriate) and gesture recognition. Using the markers eases the image processing required to determine a hand gesture from fingers alone or to determine suitable projection areas. The implementation is done on a laptop computer carried around on the user's back; the reason for this may have been to enable the use of existing software (e-mail client, drawing applications etc.) which was shown in the demonstration, but this does hide any restraints that might be applicable to a mobile device.

Figure 1.1: Examples of the Sixth Sense features. Notice the finger caps and markers used for each interaction.

The area of interest I was most intrigued about was the possibility of the system projecting onto newspapers, books and products, similar to that shown in [38]. Sixth Sense implements this, but projection is done directly onto the book cover, regardless of what is covering the projection area, and the newspaper is calibrated for the projector by the use of markers (as seen in D of 1.1).

MobileEye is an attempt to address these problems: to consider where (or even when) is best to project information, and whether there is a way of achieving this without altering the environment the system is used in.

1.2 Application Areas

Because personal projection is such a new technology and the research into methods suitable for its use is still developing, there are no products of this kind on the market. Some manufacturers have begun adding projection to their devices to try and overcome the screen size bottleneck, such as the Samsung Halo mobile phone and Nikon S1000pj digital camera [13, 12, 5].

The MobileEye system is a product in itself, suitable for use in a retail setting. When a user picks up a product, a book for instance, it would be desirable (for both the customer and the retail store) to project information about the book, such as ratings and related products. This will encourage the user to buy products with a high rating, and if the product has a low rating, they can be recommended other, more popular choices.

But the whole MobileEye product could be used in any environment or situation which requires or benefits from additional information based on the user's current scene. One such example is engineers who don't have hands free but require reference material; the projector can project this information onto the user's surroundings without the user's input, enabling them to keep their focus on the task at hand.

The keystoning and space finding techniques are suitable for use in the existing consumer products previously mentioned, since at present the projection is static and it is the user's responsibility to move and calibrate the projector accordingly.

1.3 Specialist Knowledge

To complete this project, a number of areas need to be researched to cover each component.

Projectors and HCI Mobile projection is a new research area, with a number of papers targeted at how people interact with these devices. I need to look into what research has been done and whether it supports or goes against the method of interaction I'm proposing, where the projection gives information to the user expecting no input from the user.

This would also include any research into steerable projection which may influence the design of the hardware or highlight any key problems that may occur.

Object Recognition I expect this to be an extremely broad topic that will have a number of alternatives, each with varying levels of success. The aim will be to look into some of these choices and make an informed decision as to the most suitable method for MobileEye.

It is worth pointing out that I expect this image processing to be outsourced to a computer with stronger computational capabilities than what is available to the mobile device.

Space Finding Algorithm This is the opposite of the object recognition algorithm, in that the aim will be to find a part of the image view with no noise.

I would expect this to be developed from a number of other methods and techniques that will be used to inspire the algorithm used for MobileEye, which will be targeted at running on a mobile device.

Each of these topics will be discussed in the relevant chapters of related work.

1.4 Benefits

There seems to be little research into Sixth Sense like devices; the main focus is on using projector phones in a stationary way, i.e. the user has to stand in front of a suitable surface and project some information onto the wall or overlay some information. So I'm aiming to create a basic implementation that could then be built upon and changed by others to test their own take on MobileEye as well as add to it.

The space finding algorithm obviously isn't limited to a mobile device and may have applications in other research areas. But I do think there are some interesting research possibilities in interacting with multiple projectors on a single screen, where each projector only takes up a section of the available projection space. At the moment a number of projects aim to stitch together projection views to create a single projection area, but what I'm suggesting is that you let a projector be responsible for a small section of the available space.

I wouldn't expect this research to benefit consumers directly, but as I have said, I hope researchers may find specific aspects of this useful or wish to develop the system as a whole further.

1.5 Aims and Objectives

• Research into the possibility of having a space finding algorithm to aid projection

• Implement and improve such an algorithm to run on a mobile device

• Perform a study into the feasibility of a ubiquitous system requiring no user input

• Research into image processing algorithms for object recognition

• Implement object recognition suitable to run or be used by a mobile device

1.5.1 Structure

Discuss structure of the paper - related work, development of the implementation, future work.

Chapter 2

Related Work - Projection

2.1 Projection Technology

There are a number of technologies that are used for producing and projecting images onto a surface. Since MobileEye is intended to be used in a consumer device, there are a number of key factors required to make it suitable. These include the size, power consumption and brightness.

2.1.1 LCD Projection

LCD projection is one of the most widely used technologies in larger projector systems, although it isn't used in smaller form factor projectors.

LCD systems function by taking a bright light source and passing it through a number of dichroic mirrors. Each dichroic mirror acts as a light filter, allowing only certain wavelengths (colours) through the mirror, while reflecting others [29].

By passing the light through the dichroic mirrors, the light source is divided into red, green and blue beams. These beams are then passed through 3 individual LCD chips, which create 3 different coloured versions of the image. These images are passed into a prism and combined to form the projected image [29].

See figure 2.1 for an illustration.

2.1.2 Digital Light Processing (DLP)

Digital light processing (DLP) projection techniques are the most commonly implemented system in consumer devices [48].

DLP works by shining a light source onto a set of micro-mirrors. These mirrors are formed together into a digital micro-mirror device (DMD) (see 2.2).

When a digital signal, such as an image or video, is passed into the DLP system, the mirrors change angle with respect to the amount of light needed to create a greyscale version of the image.

Each mirror moves thousands of times a second; the more an individual mirror is switched on, the lighter the pixel will become [31]. This method so far only produces a greyscale image. To introduce colour, a colour wheel is inserted between the light source and the DMD. The mirrors then move synchronously with the colour wheel, projecting a number of different coloured versions of the image [29]. The human eye will then merge these colours into the intended colour. See figure 2.3 for a full diagram.

One problem DLP projection suffers from is that if the speed at which the colour wheel spins is too slow, it can cause adverse effects for some viewers. The problem is known as the "rainbow effect"; this occurs when the colours appear to be oscillating, i.e. the red, green and blue images don't merge but appear one after the other [46]. It is also possible to have more than RGB colours on the colour wheel; however, the speed at which the colour wheel rotates will probably need to be increased to ensure the rainbow effect doesn't occur.

Figure 2.1: LCD Projection

Figure 2.2: Digital Micro-Mirror Device (DMD)

Figure 2.3: Full DLP System

2.1.3 Scan Beam Laser Projection

PicoP® is a proprietary engine that displays images using a scanned-beam laser technique and works as follows:

• Three lasers (red, green and blue) each have a lens close to their output. The lens provides a low numerical aperture (in lasers this means the laser's point is defined and not blurred/faded over a large area).

• Each laser (red, green and blue) is joined with dichroic elements into a single white beam. The dichroic elements act as a colour filter, allowing certain wavelengths to pass through while reflecting others.

• This white beam is relayed onto a MEMS (Micro-Electro-Mechanical System) scanning mirror, which is used to scan over the surface [50].

Each set of lasers essentially creates a single pixel of the correct colour and establishes the focus. The 2D MEMS scanner then paints the pixels of the image by moving to position the laser beam correctly for each pixel [25]. See figure 2.4 for an image illustrating the process.

Figure 2.4: Scanned Laser Beam Projection

The advantage of this technique is that it has unlimited focal length, a limitation that traditional projectors suffer from [30]; it also has low power consumption and is suitable to be implemented in small form factors.

A problem laser projection can suffer from is speckle contrast. Speckle occurs when a number of waves of different phases add together to give a wave with high intensity [54]. This is prominent in laser projection and gives the viewer a perceived lower image quality, although there are methods to prevent this from occurring.

2.1.4 Holographic Projection

Light Blue Optics has come up with a method of holographic projection, something which has previously been infeasible due to its computational complexity. However, LBO (Light Blue Optics) has found a way to use the holographic projection method in real time.

Figure 2.5: Holographic Projection

Traditional holographic projection works by taking a hologram h(u,v), which is often a fixed structure referred to as a diffraction pattern, illuminating it with light of wavelength λ, which is then passed through a lens. An image F(x,y) is produced at the back focal length of the lens due to the relation between the hologram and its discrete Fourier transform shown in 2.5 [21].

\[
F(x,y) = \mathcal{F}[h(u,v)] \qquad (2.1)
\]

The main problem with this approach is that to calculate the hologram h(u,v) you would need to use the inverse Fourier transform of the image, which would give a fully complex result, and there is no microdisplay that can handle this information (where the microdisplay would need to store h(u,v)).

LBO's approach was to quantise this result (make h(u,v) a set of phase-only results), making it feasible to use on a microdisplay; the cost of this is reduced image quality. The loss in quality is overcome by displaying a number of versions of a single video frame at a fast enough rate that the human eye blends the images together to create a high quality image. This is applied to each frame to give high quality video projection.

Like scan beam laser projection, this technique adds colour to the display through the use of red, green and blue lasers and has unlimited focal length. While LBO's holographic technique does still suffer from speckle contrast, the method of displaying multiple versions of a single video frame gives effective speckle reduction in itself. From this, additional speckle reduction methods can be applied.

LBO's system is energy efficient as it turns lasers off if they aren't required, is eye safe and has a small form factor.

2.2 Projector Form Factor and Design

2.2.1 Projector Phones

An interesting aspect of projector phones is where a projector and camera should be placed on such a device. Since these devices are so new, this is something that should be considered to offer the most flexible and suitable option. Enrico Rukzio came up with an interesting mapping of possible positions and uses for each possible component position. Overlaid onto this are the possible uses of each component, which of these uses are currently being researched, and what the manufacturers expect the system to be used for [48], see 2.6.

The most interesting thing with this is that most research with these devices is happening with layouts of components not used by manufacturers. This is clearly in part due to the simple fact that the intended uses of these products are extremely different to the uses that are being researched, but it does pose a barrier to getting some of these research techniques into consumer products.

Figure 2.6: Camera Projector Placement

There is one form factor ignored in this diagram, which is steerable projection. Most steerable projection methods that have been implemented are on traditional, larger projectors (discussed below), but these designs could meet 2 or more of these projector positions while still only requiring a single projection component. It may also be possible to switch the position of cameras through the use of steerable mirrors as well; this would enable the manufacturers to satisfy a large number of uses from a single device.

Alternatively, a form factor which moves the entire projector could be implemented, similar to the WowWee Cinemin Swivel pico projector, shown in 2.7.

2.2.2 Steerable Projection

One of the biggest restrictions on projector systems, regardless of the technology used, is that the image is projected directly forward. There are some obvious problems and advantages of using mirrors to reflect and move the target projection.

The Everywhere Display is a research project at IBM [44], where a steerable projector with a camera was used to change any surface into a touchscreen interface.

A common problem with projection is that the projected image becomes distorted if the projector is not orthogonal to the projection surface. This is an evident problem with steerable projection as the surface being projected onto is changing. In the Everywhere Display this problem was initially overcome by pre-computation and 3D modelling of the environment [44]; from this 3D model, they treated the projector as a camera in the 3D world, then placed a texture map of the image they wished to see and used this information to create the pre-warped image that, when projected in the real world, would appear undistorted [43].

Figure 2.7: WowWee's Cinemin Swivel Projector [56].

This method was time consuming, and future research in the project led to a different approach being implemented. The method proposed was using a paper pattern and placing it in the scene so a transformation P' can be made between the camera and the surface by finding the four corners of the paper pattern [45]. The relationship between the paper pattern corners and the corners observed by the camera is defined by:

\[
P'C = B \qquad (2.2)
\]

Where C is the matrix of the four corners from the point of view of the camera and B is the matrix of points on the paper pattern. Intuitively this makes sense: given any point in the camera frame, if we apply some transformation we should be able to calculate the relative point on the projected surface (where the paper pattern lies).

To calculate P', the pseudo-inverse is computed (the pseudo-inverse generalises the matrix inverse to non-square m x n matrices).

\[
P' = BC^T(CC^T)^{-1} \qquad (2.3)
\]

This only gives us the relation between the camera and the projected surface, but to calibrate the projection, a relationship H between the camera and projector is needed. This can be obtained by projecting a pattern consisting of four points D. When this pattern is projected it will give four points on the projected surface, the set of points E. From this, E can be observed by the camera, giving the points F in the camera frame.

We know the points E are related directly to both the projector's frame and the points viewed by the camera. So if you view this as a transformation needed to change the original points D to get the points E, we can define the relationship as PD = E. The relationship to the camera frame is then given similarly as P'F = E, where the relationship between the camera and surface (P') has already been calculated. From this we know PD = P'F; the pseudo-inverse is then calculated, which can then be used to calculate H by taking the inverse of P.

\[
P = P'FD^T(DD^T)^{-1} \qquad (2.4)
\]

\[
H = DD^T(P'FD^T)^{-1} \qquad (2.5)
\]

From this the transformation can be applied to warp the mesh. While this is similar to the keystoning correction needed for this project, it would be preferable not to be restricted by the requirement of having a paper pattern on the projected surface.
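
A minimal numpy sketch of the pseudo-inverse calibration above, using the formula P' = BC^T(CC^T)^-1 as given; the corner coordinates below are made-up values for illustration.

    import numpy as np

    # Four paper-pattern corners on the surface (B) and the matching corners
    # observed in the camera frame (C), both as 3x4 homogeneous coordinates.
    B = np.array([[0.0, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [1.0, 1.0, 1.0, 1.0]])
    C = np.array([[102.0, 421.0, 438.0, 95.0],
                  [ 88.0,  97.0, 402.0, 410.0],
                  [  1.0,   1.0,   1.0,   1.0]])

    # P'C = B, so P' = B C^T (C C^T)^-1 (equivalently, B times the pseudo-inverse of C).
    P_prime = B @ C.T @ np.linalg.inv(C @ C.T)

    # Map a new camera point onto the surface and normalise the homogeneous scale.
    camera_point = np.array([250.0, 240.0, 1.0])
    surface_point = P_prime @ camera_point
    surface_point /= surface_point[2]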

Figure 2.8: Everywhere Display Prototype

A third alternative suggested in the paper was to find four corresponding points in two 2D projective spaces, which were then manually adjusted to get a good version of the pre-warped image. This is again unsuitable for MobileEye as we don't want the user to interact with the system.

In terms of hardware for the Everywhere Display, a movable circular mirror was placed over the projection lens. The hardware was constructed using the system from a disco light, which was controlled through a host computer. For an image of this, see 2.8.

The aim of the Everywhere Display was to use its features in retail and workspace settings.

A similar research project [22] used a steerable projector to project onto elements of the room it was placed in. This projector was fixed to a moving base and the surfaces suitable for projection were predefined. The interesting thing with this work was that the space was divided into small square blocks. From this, the items that were to be projected were assigned to the most suitable block, giving good results for small areas. An example of this is shown in 2.9.

An alternative to pre-computing a 3D model of the environment was proposed in [19], which didn't create a 3D model, but rather pre-computed spaces in its environment during initialisation through image processing techniques, again using projected markers to calibrate.

2.3 Keystoning

2.3.1 Embedded Light Sensors

In [33] a keystoning calibration technique was successfully achieved by placing light sensors at each corner of interest on the object which is being projected onto. A pattern is then projected onto the surface; these patterns are black and white and will hence give the light sensors bright or dark values, which are relayed back to a host PC. This then gives a highly accurate calibration which can be applied to planar surfaces or 3D objects. The reason for using multiple patterns is that it narrows down the position of the points of interest by using horizontal and vertical bar patterns which decrease in size.

This calibration can be done in just under a second (although it is hoped that this can be reduced with high speed projection technology). The method scales up well with the resolution of the projection.
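
The sketch below illustrates how successively halving black-and-white bar patterns can pin down a sensor's column; the projector width, the use of a plain binary code and the pattern count are assumptions for illustration rather than details taken from [33].

    import numpy as np

    width = 1024                                   # assumed projector resolution
    n_patterns = int(np.log2(width))               # 10 vertical bar patterns

    def is_bright(k, column):
        # Column is bright in pattern k if bit k of its index is set.
        return (column >> (n_patterns - 1 - k)) & 1 == 1

    # Simulate the bright/dark readings a sensor at an unknown column reports.
    true_column = 618
    readings = [is_bright(k, true_column) for k in range(n_patterns)]

    # Each reading recovers one bit of the column index.
    recovered = 0
    for bright in readings:
        recovered = (recovered << 1) | int(bright)

    assert recovered == true_column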

After the patterns have been projected, a homography matrix can be created which is used to correctly keystone the image.

Figure 2.9: Steerable projection - annotation using a grid layout to decide where elements belong

This method again suffers from the same issues as the Everywhere Display in that it requires the manipulation of the environment to function, as well as needing a way to relay the data from the sensors to the mobile device. It does however offer some interesting characteristics for 3D projection.

2.3.2 Smarter Presentations

The most relevant paper on keystoning for MobileEye is [52, 53], where it is assumed that the intrinsic parameters of the camera and projector are unknown and that the projection surface is flat. These are the same assumptions that I would expect to make for MobileEye.

The technique of keystoning is achieved by projecting a calibration pattern (similar to the method proposed in [47]), then finding common points of the projection surface on the camera frame. From this a homography can be made, providing there are four or more common points which are known in both the projection and camera frames.

Given a projected image, we take some point C_p = (x_p, y_p) in a projector image which is projected onto some unknown point on a planar surface. All we know is that there exists some transform that can be applied to C_p to get the point on the projected surface, C_s. If we then observe this point from the camera we are given a point in the image frame C_c = (x_c, y_c) which has a transformation from the surface to the camera. This is the same as suggested in the Everywhere Display [45].

Now, because the projective views of the projector and camera are both viewing the same points on the surface, there is a homography between the two frames.

So treating the projector as a camera, C_p = HC_c, and we obtain the homogeneous coordinates as:

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix}
H_{11} & H_{12} & H_{13} \\
H_{21} & H_{22} & H_{23} \\
H_{31} & H_{32} & H_{33}
\end{pmatrix}
\begin{pmatrix} x_c \\ y_c \\ 1 \end{pmatrix} \qquad (2.6)
\]

Where x_p = x_1/x_3 and y_p = x_2/x_3. The above equation can be re-written as:

\[
x_p = \frac{H_{11}x_c + H_{12}y_c + H_{13}}{H_{31}x_c + H_{32}y_c + H_{33}} \qquad (2.7)
\]

\[
y_p = \frac{H_{21}x_c + H_{22}y_c + H_{23}}{H_{31}x_c + H_{32}y_c + H_{33}} \qquad (2.8)
\]

This can be re-arranged further to:

\[
a_x^T h = 0 \qquad (2.9)
\]

\[
a_y^T h = 0 \qquad (2.10)
\]

where:

\[
h = (H_{11}, H_{12}, H_{13}, H_{21}, H_{22}, H_{23}, H_{31}, H_{32}, H_{33})^T \qquad (2.11)
\]

\[
a_x = (-x_c, -y_c, -1, 0, 0, 0, x_p x_c, x_p y_c, x_p)^T \qquad (2.12)
\]

\[
a_y = (0, 0, 0, -x_c, -y_c, -1, y_p x_c, y_p y_c, y_p)^T \qquad (2.13)
\]

Now given a set of four or more points, we can create a linear system of equations Ah = 0 and solve the problem in a least squares manner.

\[
A = \begin{pmatrix}
a_{x_1}^T \\
a_{y_1}^T \\
\vdots \\
a_{x_N}^T \\
a_{y_N}^T
\end{pmatrix} \qquad (2.14)
\]

Now by writing the sum of squares error of Ah = 0 (the sum of squares is a mathematical method to calculate the deviation of a set of points from the mean), we get:

\[
f(h) = \frac{1}{2}(Ah)^T(Ah) \qquad (2.15)
\]

When this is multiplied out and the derivative of f with respect to h is taken, the result is A^T Ah = 0. From this we can obtain h as the eigenvector corresponding to the smallest eigenvalue [32].
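
The following numpy sketch builds A from equations (2.12) and (2.13) for a handful of made-up correspondences and recovers h as the eigenvector of A^T A with the smallest eigenvalue, as described above; the point values and image sizes are illustrative assumptions.

    import numpy as np

    def estimate_homography(camera_pts, projector_pts):
        # Build A row-by-row from the a_x and a_y vectors of each correspondence.
        rows = []
        for (xc, yc), (xp, yp) in zip(camera_pts, projector_pts):
            rows.append([-xc, -yc, -1, 0, 0, 0, xp * xc, xp * yc, xp])
            rows.append([0, 0, 0, -xc, -yc, -1, yp * xc, yp * yc, yp])
        A = np.array(rows, dtype=float)
        # Eigenvector of A^T A with the smallest eigenvalue (eigh sorts ascending).
        _, vecs = np.linalg.eigh(A.T @ A)
        return vecs[:, 0].reshape(3, 3)

    camera_pts = [(10, 12), (310, 15), (305, 230), (8, 225)]    # observed corners
    projector_pts = [(0, 0), (640, 0), (640, 480), (0, 480)]    # projected corners
    H = estimate_homography(camera_pts, projector_pts)

    # Map a camera point into the projector frame and normalise the result.
    x = H @ np.array([150.0, 120.0, 1.0])
    print(x[:2] / x[2])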

This homography is then used as a pre-warp calibration for the projected content. Further refinements of this are then achieved by projecting a more complex pattern.

This paper goes further to then keystone the image not from the point of view of the camera, but from the point of view of the audience (viewing a projector screen), which is done by image processing techniques. However, MobileEye assumes that the user's and the camera's perspectives are similar enough not to edit the warping beyond this homography.

One final point worth mentioning: this paper also includes a method for interacting with the projected screen using a laser pointer. This isn't going to be a concern for MobileEye, but could lead to a richer interaction technique for users of the system.

2.4 Projection Uses and Interactions

2.4.1 View & Share

There are obvious concerns with the use of projection technology in that there is a shift between private and public interactions. A good illustration of using public and private data is in the implementation of View & Share, a mobile application which enables users to view photos and media on a projected surface, while other users can download photos from the 'Presenter' freely and easily [27].

In the View & Share application, there is a public and a private option. By changing to a private viewing session, all participants are forced to view and share images on their mobile device, turning off the projection.

This is an obvious and required feature, but to build a truly pervasive computing experience, there needs to be a strict rule set on what is acceptable to project.

Besides illustrating this barrier between private and public interaction, it also illustrated the use of sharing data as well as hardware, through projection, downloading and connectivity, to essentially lend out the projector.

View & Share gave control to a participant in the group [27], but it could be possible to give a portion of the projectable area to each user with their own projector phone. If laser projectors are truly able to turn off pixels, as suggested in [25], it could be possible to use a space finding algorithm to divide the available space of a projector between multiple parties, leading to some interesting functionalities and interactions.

2.4.2 Search Light

One particularly common interface technique for projector phones which is emerging is the analogy of using a projector phone as a torch light in the dark to reveal different information. A good example of this is [34], where a paper map is annotated with Google Latitude markers of the user's friends. To see all of the friends, the user moves the projector phone around the map to reveal more information.

Now this technique has been applied before, but on the device's screen, meaning the user has to switch attention between the background (the paper map) and the display. In a study aimed specifically at the use of the torchlight interface for annotating points of interest on a map, it was found that the projector method was faster than a magic lens approach [51]. It was proposed that this was because users no longer had to change their attention between the background and the screen display.

It's interesting to consider how this interaction might be applicable to this research project. MobileEye intends to remove user participation in the positioning of the projector; there are a few scenarios where this may play a factor:

• The user changes their orientation to move the projection across the projected surface, wanting this feature.

• The user takes hold of the device wishing to switch and make use of this mode.

With the first idea, the compass or features of the scene (from the camera) could be used to determine movement. I would expect that this would be quite unnatural to the user and would only be required for a few limited applications.

However, the second method would be a lot more natural to the user; at this point I would expect the user to hold the device such that the screen was facing up and the projection would be straight ahead. Then from this state the user could interact with the device's screen and the projection screen independently.

One such suggested method of interaction is the idea of using the projector phone as a search light with a cross hair in the centre of the screen, and then using a button click to select an item the crosshair is hovering over [48].

2.4.3 Projection Technology Summary

DLP is the technique implemented in most consumer products (pico projectors, mobile devices etc.). This is most likely because of its maturity. However, it makes sense to use a laser based projection technique as it offers a projection with unlimited focal length, something that DLP needs done manually.

While it is difficult to gauge the pros and cons of a scan-beam projection system compared to holographic projection, it is clear that both of these systems offer the right features to be used in a consumer device (low power consumption, unlimited focal length, small form factor). The only difference that can be noted is the extra speckle reduction in LBO's system.

2.5 MobileEye’s Projection System

There has been a large amount of research into steerable projection and interfaces for mobile projector phones, but there seems to be little, if any, overlap between the two areas.

One of the biggest, and unresearched, problems that the MobileEye system will suffer from is that the intention will be to project onto unknown surfaces in an unknown environment. Most steerable applications overcome this by having statically placed projection systems (i.e. the projector is steerable but only ever kept in one environment) and then pre-computing suitable areas or using a 3D model to determine suitable areas.

With the release of new projector phones and the first ever workshop on personal projection having commenced on May 17th 2010, an algorithm that could offer better selection of projection areas could offer a number of new uses and applications in this research field.

After finding a suitable space within which to project, a calibration step would be ideal. This would need to be done before each projection attempt, because each surface would be new. The method for the calibration will be the same technique used in [53], which requires no intrinsic parameters of the camera or projector.

The reason for using this previous work over other alternatives is that it requires no alteration of the environment, unlike [39, 45], and no pre-computation. I would expect that only an initial estimate of the calibration would be needed, as this will speed up the process, but I will also be making the assumption that the keystoning will calibrate the projection with respect to the camera, which the user will be wearing. This means a satisfactory result should be given for the user.

The best projection technology for this project is obviously either a scan-beam laser or a holographic projection method, the reason being that they offer a projection that is always in focus (a problem that DLP and LCD methods suffer from) but also because they are suitable for use in consumer products.

The most suitable form factor would be to have the projector and camera on the back of the device, since a user will be expected to wear the device as a pendant. Ideally, flexibility would be given by having the projector projecting upwards, with a steerable mirror over this and some mechanism for completely retracting the mirror, enabling the projection to project straight ahead (i.e. up). This then allows the user the option of naturally holding the device and using the device in a torch light method.

The existing interaction techniques may lead to using the device as if it had separate 'states', where the device can be left passive to give the user information, or be in a state to be used for interaction. This would be useful, as if the projection system does give the user some interesting information which the user then wishes to act upon, the state can change to still use the projector (if suitable).

Chapter 3

Related Work - Object/Image Recognition

MobileEye will need some method for recognising products and objects, which will be used to obtain information and project it onto the user's environment. The main method would be to attempt to match an image with previously seen or stored images.

In this chapter a number of possible algorithms to perform this task will be discussed.

3.1 Image Recognition

The task of identifying correspondences between two images can be divided into three parts:

• Interest points are selected from the image; these will be things like corners, blobs (areas of an image which are brighter or darker than their surroundings [17]) and T-junctions.

• Each interest point's surrounding pixels are used to create a neighbourhood vector, known as a 'descriptor'.

• The final stage is matching: given two similar images, the descriptors should be able to indicate a reliable match.

3.1.1 SIFT

Scale Invariant Feature Transform (SIFT) is based on observations of neurons in the temporal cortex of a mammal [35].

SIFT works as follows:

• First a Difference of Gaussian is applied to the image; this is then used to find local maxima and minima as points of interest. By blurring the image, the keys are partially invariant to local variations like 3D projection. Difference of Gaussian is the method of subtracting one blurred image from another blurred image (where each image is blurred by a different amount). This is an approximation of the Laplacian of Gaussian (which is a blob detector).

• These keys are then indexed using a nearest neighbour approach, which uses a best-bin-first search method. Best-bin-first search is an algorithm based on the kd-tree data structure. The kd-tree data structure is a binary search tree where each node is an n-dimensional hyperplane. This gives some useful properties for handling nearest neighbour data structures when trying to find the closest neighbour. Best-bin-first search performs a similar search to kd-trees, performing a depth-first search and comparing each node to determine which branch to traverse along. However, instead of backtracking as done by kd-trees, it just selects the closest, and in SIFT it was found that good results are obtained by weighting certain nodes and then only traversing the tree to the 200th depth.

• From these reliable keys, a Hough transform is used to cluster keys with similar pose. A Hough transform has been used in the past to identify arbitrary shapes, by looking at the set of pixels identified as part of a line or shape and creating the appropriate shape [18]. In the SIFT algorithm the Hough transform is used to group similar keys into bins.

• The final bins are sorted into size order and least-squares is used to match an image point to an affine transformation. By matching the model with the image feature, outliers can be found and removed. Once all outliers are removed, if 3 models remain, a match has been identified.

The key advantage of SIFT compared to its predecessors is that it is scale and illumination invariant. It can also handle partial occlusion and was developed with speed as a main concern.
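
As a rough illustration of the Difference of Gaussian step described above, the sketch below blurs an image at two levels and thresholds the response; the blur levels, threshold and single-scale treatment are simplifying assumptions rather than SIFT's actual parameters.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def difference_of_gaussian(image, sigma, k=1.6):
        # Subtract one blurred image from another (an approximation of the Laplacian of Gaussian).
        return gaussian_filter(image, k * sigma) - gaussian_filter(image, sigma)

    image = np.random.rand(128, 128).astype(np.float32)   # stand-in for a camera frame
    dog = difference_of_gaussian(image, sigma=1.0)

    # Treat strong local extrema of the response as candidate interest points.
    threshold = 3 * dog.std()
    candidates = np.argwhere(np.abs(dog) > threshold)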

3.1.2 Indexing Scale Invariance

This method was aimed at creating a suitable detector that would be scale, rotation and affine invariant, while maintaining a suitable method to index the images, making it easier to look up matching images.

The detector proposed in [36] handles scale invariance by creating several resolutions of an image, where each resolution is created by applying a Gaussian kernel which blurs the image. Then, given a point in an image, a characteristic scale is found by calculating the result of some function F, which creates the scale space (i.e. applies the Gaussian kernel), and these results are used to find a local maximum. This function F can be any one of a number of functions applied to each scaled image, and their work showed that the Laplacian gave the most promising results. To help understand this, look at 3.1.

Figure 3.1: The top row is the same image at different resolutions

So far the method described used all the points that satisfied the property that the characteristic scale between two images was equal to the scale factor between the two images. This however revealed a number of inaccuracies and unstable results, so interest points were used instead of all the possible points.

The interest points were selected using a Harris detector (corner and edge detection) [28], since it gave reliable results and works with rotation and scale invariance. From these interest points the Laplacian function was applied to test the matches over the scale space. From this, matching points could be found with good reliability.

Indexing of these points is then achieved by turning each interest point of an image into a descriptor, maintaining rotation invariance by using the gradient direction of the point. Matches between points are determined using the Mahalanobis distance (which is a statistical method used to measure the distance between unknown and known sets, by considering the correlations between them). Image retrieval from a database is achieved by a voting system, where each point of a query image that matches a point in the database gives a vote for that image. The image with the most votes is then determined to be the most similar.
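
A minimal sketch of the Mahalanobis-distance matching and voting scheme described above; the descriptors and image labels are random stand-ins, and estimating the inverse covariance directly from the database is an assumption made for illustration.

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(0)
    db_descriptors = rng.normal(size=(200, 16))      # descriptors from all stored images
    db_image_ids = rng.integers(0, 10, size=200)     # which image each descriptor came from
    query_descriptors = rng.normal(size=(30, 16))    # descriptors from the query image

    # Inverse covariance of the database descriptors defines the Mahalanobis metric.
    inv_cov = np.linalg.inv(np.cov(db_descriptors, rowvar=False))

    votes = Counter()
    for q in query_descriptors:
        diff = db_descriptors - q
        dists = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)   # squared Mahalanobis distances
        votes[db_image_ids[np.argmin(dists)]] += 1              # vote for the closest image

    best_image, _ = votes.most_common(1)[0]          # the image with the most votes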

3.1.3 SURF - Detector

SURF uses a method referred to as integral images, originally proposed by [55]. The idea behind integral images is that the value of some pixel x = (x, y) is calculated as the sum of all pixels to the left of and above the pixel. This can be done in one sweep of the original image. See 3.2.

Figure 3.2: An example of integral images. The value of A is simply A, B = A + B, C = C + A and D = A + B + C + D.
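
A short sketch of an integral image and the constant-cost box sum it enables, assuming a random stand-in image; the zero padding is just a convenience to avoid edge cases.

    import numpy as np

    image = np.random.rand(240, 320)

    # One sweep over the image: cumulative sums down the rows and across the columns.
    integral = np.cumsum(np.cumsum(image, axis=0), axis=1)
    integral = np.pad(integral, ((1, 0), (1, 0)))    # leading row/column of zeros

    def box_sum(ii, top, left, bottom, right):
        # Sum of image[top:bottom, left:right] from just four lookups.
        return (ii[bottom, right] - ii[top, right]
                - ii[bottom, left] + ii[top, left])

    # Any box filter costs the same four lookups, regardless of its size.
    assert np.isclose(box_sum(integral, 10, 20, 50, 90), image[10:50, 20:90].sum())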

SURF’s detector was based on the Hessian-Laplacian detector used in [36] (referredto as harris-laplacian in [36]). The proposed detector for finding points of interest iscalled Fast-Hessian.

Given a point X = (x, y) in some image I, the hessian matrix is defined as:

H(X, σ) =

[Lxx(X, σ) Lxy(X, σ)Lxx(X, σ) Lyy(X, σ)

](3.1)

for the point X and scale σ, where Lxx(X, σ) is the convolution of the Gaussiansecond order derivative ∂

∂x2 g(σ) with the original image I at point X (Similarly for Lyy

and Lxy). This is the Laplacian Gaussian, SIFT proposed the approximation of this(Difference of Gaussian) could be used and this gave successful results. So Fast-Hessianuses an approximation of the second order derivaties to give box filters, which can beefficiently calculated using integral images. These approximations of the second orderGaussian Derivatives for σ = 1.2 can be seen in figure 3.3. Each box filter is namedDxx, Dxy and Dyy respectively.

Now that these filters have been created, they can be applied to the Image I. Becausea filter of any size can be applied efficiently to an integral image (i.e. any filter can beapplied with the same number of operations), there is no dependency on waiting forthe first application of the filter before applying a second filter, which is traditionallythe case. This is important as it means the scale space can be created in parallel givingan speed increase.

Figure 3.3: SURF box filters; the left side shows the discretised and cropped Gaussian second order partial derivatives and the right-hand two are the box filter approximations.

The filter sizes in SURF increase in size (9x9, 15x15, 21x21, 27x27 etc.) due to the form of the box filters and the discrete nature of the integral images. As the filter sizes begin to get larger, the increase between each layer increases as well; this doubles for each octave (6, 12, 24). So for the first octave you may have 9x9, 15x15, 21x21, 27x27 and then the next octave would have 39x39, 51x51 and so on. This has the property that the Gaussian derivatives scale with the filter size. For example, the 9x9 filter was the approximation of the σ = 1.2 Gaussian second order derivative, which is the scale factor s (i.e. s = 1.2). So for the 15x15 filter, σ = (15/9) × 1.2 = 2 = s.

Interest points are then selected by applying a non-maximum suppression to a 3 x 3 x 3 neighbourhood (3 x 3 pixels in 3 images of the scale space). This is simply a method used to detect edges by seeing if a pixel is a local maximum along a gradient direction [24]. The maxima of the Hessian determinant are then interpolated in the image and scale space, which is a final technique to give us the points of interest.
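
The sketch below applies a 3 x 3 x 3 non-maximum suppression over a stack of responses at neighbouring scales; the response values and threshold are random stand-ins rather than real Fast-Hessian outputs.

    import numpy as np
    from scipy.ndimage import maximum_filter

    responses = np.random.rand(4, 120, 160)     # determinant responses at 4 scales

    # Keep a point only if it equals the maximum of its 3x3x3 neighbourhood
    # (3x3 pixels across 3 adjacent scales) and exceeds a detection threshold.
    local_max = maximum_filter(responses, size=(3, 3, 3))
    threshold = 0.9
    keypoints = np.argwhere((responses == local_max) & (responses > threshold))
    # Each row of `keypoints` is (scale_index, row, column).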

3.1.4 SURF - Descriptor

To store a description of each interest point, a Haar wavelet response is taken in the x and y directions in a neighbourhood of 6s around the interest point (where s is the scale at which the interest point was detected). The size of the wavelets is 4s, which means that the wavelet size can become large with high scale factors; because of this, integral images are used again.

The responses of the wavelets are weighted against a Gaussian centred at the interest point with σ = 2.5s and represented as a vector. A sliding window of 60 degrees is then used to sum up the horizontal and vertical vectors to give a new vector. The longest new vector is the chosen orientation of the interest point.

The window size of 60 degrees was chosen experimentally. There is a proposed version of SURF called U-SURF, which skips this step, the idea being that everything would be roughly upright. It's difficult to say whether this is applicable to MobileEye or not; while the user should be wearing the camera in such a way that we could programmatically rotate the image so everything was approximately upright, there is no data shown on the tolerance to any rotation (i.e. a product slightly rotated left or right to the camera might cause big problems).

Now that we have an interest point and an orientation, a box is created around the interest point with size 20s (s = scale factor). This is then divided into a 4x4 grid of sub-regions and each sub-region is given 5x5 evenly spaced sample points. Again, Haar wavelet responses are taken along the horizontal and vertical directions (in relation to the region orientated around the interest point). These responses are weighted by a Gaussian positioned at the centre of the interest point with σ = 3.3s. These responses are summed up for each sub-region. Then, to include information about the polarity of the intensity changes (light to dark vs. dark to light), the sums of the absolute values of each sub-region's sample points are also extracted. This is then used to create a feature vector:

\[
v = \left( \sum d_x, \ \sum d_y, \ \sum |d_x|, \ \sum |d_y| \right) \qquad (3.2)
\]

where d_x is the sum of the wavelet responses of each sub-region along the horizontal axis and d_y along the vertical axis. Over the 4x4 sub-regions, with these four sums for each, v has a length of 64. The reason for setting the sub-regions to a size of 4x4 is, again, that experimentation showed it gave the best results.

The final detector element used is the trace of the Hessian matrix for the interest point, which gives the sign of the Laplacian; this is just a small addition that gives a good speed increase during matching.
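
As an illustration of assembling the 64-value vector from the per-sample Haar responses described above, the sketch below uses random stand-in responses and omits the Gaussian weighting and orientation steps for brevity.

    import numpy as np

    # 20 x 20 grids of horizontal (dx) and vertical (dy) Haar responses around an
    # interest point, i.e. a 4x4 grid of sub-regions with 5x5 samples each.
    dx = np.random.randn(20, 20)
    dy = np.random.randn(20, 20)

    descriptor = []
    for i in range(4):
        for j in range(4):
            sub_dx = dx[5 * i:5 * (i + 1), 5 * j:5 * (j + 1)]
            sub_dy = dy[5 * i:5 * (i + 1), 5 * j:5 * (j + 1)]
            # Four values per sub-region: the sums and the sums of absolute values.
            descriptor += [sub_dx.sum(), sub_dy.sum(),
                           np.abs(sub_dx).sum(), np.abs(sub_dy).sum()]

    descriptor = np.array(descriptor)        # length 4 x 4 x 4 = 64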


Chapter 4

Smartphones

4.1 Camera Technology

This may seem like an insignificant point, but there is one very apparent feature of camera technologies that has proven to be helpful for the space finding algorithm developed for this project, and it needs acknowledging for any future implementation.

CCD and CMOS are the two main competing camera technologies, both with a number of advantages and disadvantages. Here I will be focusing on the techniques used to improve the dynamic range of these sensors.

Dynamic range is the ability of a chip to handle both bright and dark areas in an image. It's hard to produce a good example of this in picture form, but in figure 4.1 you can see an illustration hidden by a shadow. In the left hand image the illustration is barely noticeable, however in the right hand image the illustration is clearly visible. The lighting was not changed, only the angle of the camera. What is happening is that the scene is exceeding the dynamic range of the image sensor, causing the image to be 'clipped' in the dark and/or bright region.

There are a number of methods that can be used to improve the dynamic range of these chips, some of which are outlined below.

4.1.1 Well Adjusting Capacitors

This is a method created originally for CCD sensors but has been applied to CMOS sensors[23].

Both CCD and CMOS active pixel sensors work in direct integration[26]. Direct integration is when a circuit is reset to a voltage VReset before exposure to light.

Figure 4.1: Dynamic range of the Nexus One Smartphone.


Figure 4.2: Direct integration circuit on the left and saturation levels of the charge on the right.

Figure 4.3: Example of the saturation levels of a single well capacity adjustment.

During the exposure to light, the photocurrent (current produced by photons hitting the sensor) drains a capacitor CD. After a set exposure time the negative charge of this capacitor is read.

In figure 4.2 you can see the diagram of the circuit on the left. On the right is a diagram of the charge read after the integration time (or exposure time) tint.

Well adjusting capacitors operate by changing the saturation value of the circuit over the exposure time. This is achieved through the use of a control gate which alters the clipping value. The clipping value is the amount of charge that can be stored; after the charge exceeds this, the extra charge is passed through the control gate to a sink region (ground). The clipping value acts as the saturation value Qsat and is determined by the magnitude of the voltage given to the control gate. This voltage is altered according to some function which can be defined for the best results[49].

Figure 4.3 has an example of the well capacitance being adjusted once during the exposure time.

4.1.2 Multiple Capture

Multiple capture simply takes several photos of the same scene with varying exposures; bright parts of the image are captured by short exposure times and dark areas are captured with long exposures. These images are then merged together, and some research has shown that averaging over the images can give good results[26].

Figure 4.4: Spatially varying pixel exposure. e0, e1, e2 and e3 each represent a different exposure time for a pixel, where e0 < e1 < e2 < e3, giving a set of images varying in space and exposure.

4.1.3 Spatially Varying Pixel Exposures

This technique sacrifices spatial resolution for high dynamic range images by assigning multiple sensors to each pixel. A filter is then applied to each sensor, giving 4 values for a single pixel, each with a different exposure, so a bright pixel will likely have a maximum value and a lower value, and a dark pixel will likely have both a zero and a non-zero value [42]. A further explanation of this is given in figure 4.4.

4.1.4 Time to Saturation

This method of increasing the dynamic range works by calculating the time it takes for each individual pixel to reach the saturation point. This is achieved by giving each pixel a local processor which is triggered by the magnitude of the light intensity [20]. Once light hits a sensor it calculates the saturation time, and the shortest saturation time determines the exposure, which is applied to all the sensors. However, having a processor per pixel can increase the pixel size to an unacceptable amount [26].

4.2 Platform

Here I am going to discuss the Android platform and some unique features it has relevant to the MobileEye implementation. I won't be covering why the Android platform has been selected; that is discussed on page 33.

4.2.1 Dalvik Virtual Machine

Google originally started with extremely tight constraints on its expectation of what hardware Android could run on, setting the target of having 64MB of RAM for its low end devices. This has led to a number of design choices to enable applications to run using as little memory as possible, since the platform is intended to enable multi-tasking. It's expected that after all the high level processes (libraries and background services) have been started there is only approximately 20MB of memory left from the original 64MB.

The first major change the Dalvik Virtual Machine makes is to transform the compiled jar class files into a .dex file format. Traditionally n Java classes will generate n class files when compiled. A dex file will merge these together and generate a shared pool of constants which is 'shared more' than a set of class files. A shared pool is where constants are stored: things like strings, methods, types etc. A simple example of this being done on a set of class files is shown on page 52. This gives impressive results in terms of the file size of a dex file, often coming out at least 2% smaller than a compressed jar file (dex files are later compressed into Android package .apk files).

Figure 4.5: Small example of Zygote memory sharing.

Like other OSs, memory is divided into different kinds, and Dalvik identifies the following four: clean shared, clean private, dirty shared and dirty private.

Clean memory is simply data that the OS knows it can drop or replace without fear of disrupting the system (i.e. data backed up by a file that can be re-read in).

In the clean shared and clean private memory are the libraries and application specific dex files. In the private dirty memory you have the memory address space assigned to that specific process, which serves as the application heap.

For the shared dirty memory Dalvik has a process called Zygote, which loads the classes and libraries it believes will be used by a number of applications, preventing each one loading its own version of the library. Zygote is responsible for creating a fork whenever a new application is to be started, and the forked child process then becomes the main application. This is described as a standard unix fork, which would suggest it is at this point the application is given its own address space. Copy-on-write semantics are used to share the Zygote shared dirty memory, as shown in figure 4.5. This later plays an important part in the garbage collection in the Dalvik VM, which is why it is mentioned here.

Normally on a standard unix fork the child process is given its own copy of its parent's memory; copy-on-write however allows processes to share the memory until a child process attempts to write to it. If this occurs it is given its own copy of the memory.

Each process has its own garbage collection, and Zygote has been the deciding factor in how the garbage collection is performed on data. There are generally 2 methods of storing the mark bits that indicate to the garbage collector (GC) what it should do to the data. One method stores the mark bits with each object; the second option stores the mark bits separately (in parallel). But because Zygote shares its memory, if one process marks that memory to be cleaned, then GC will attempt to clean up Zygote's objects, which will then affect the other processes using the same memory; this is why the mark bits are kept separate from the objects. (Note: the copy-on-write memory is not the same memory as the shared memory; copy-on-write applies to the Zygote heap.)

Finally, the Android platform does a number of things to improve the efficiency of the code (beyond the optimisations of the Java compiler). On installation, the Android platform performs verification to ensure the dex file is well formed and correct. Optimisations are then done to the code, such as static linking, inlining of native methods etc. These optimisations are done at install time to save the amount of work needed later on, and I imagine also so that a standard dex file can be converted into an optimised dex file suitable for that version of the Android platform.

4.2.2 Application Development

The recommended method for development on the Android platform is through the Eclipse IDE with the Android plug-in. There are a number of versions of Android; the main versions on devices in the public's hands are 1.5, 1.6 and 2.1.

The provided emulator is fairly robust and offers a wide range of tools and hardware simulation, however the camera on the emulator is extremely poor. It offers a moving pattern with no way to feed in data. For this reason the development needs to be done directly on the device, but there is little extra effort in doing this as the emulator and devices integrate with the IDE through the same method - the android debug bridge (adb).

The general structure of an application is that each individual screen view is an Activity. An activity is designed to give an application the ability to handle its reaction as it goes through the activity lifecycle.

The activity lifecycle is Android's technique for handling your application as it becomes the top of the application stack, is moved down the stack by other applications and eventually gets brought back to the top or removed from the stack.

To handle this, each activity is given access to override the following methods[1]:

public class Activity extends ApplicationContext {

protected void onCreate(Bundle savedInstanceState);

protected void onStart();

protected void onRestart();

protected void onResume();

protected void onPause();

protected void onStop();

protected void onDestroy();

}

Because MobileEye is intending to maintain Bluetooth and socket connections with other machines, this needs to be handled correctly through these methods.
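As an illustration, a minimal sketch of tearing down and re-establishing such connections in the lifecycle callbacks is shown below; CameraActivity and ConnectionManager are hypothetical names, not the MobileEye source.

import android.app.Activity;

public class CameraActivity extends Activity {

    // Hypothetical wrapper around the Bluetooth and recognition sockets.
    private ConnectionManager connections;

    @Override
    protected void onResume() {
        super.onResume();
        connections = new ConnectionManager();
        connections.open();   // re-open connections when back in the foreground
    }

    @Override
    protected void onPause() {
        super.onPause();
        if (connections != null) {
            connections.close();  // don't leave sockets dangling while paused
            connections = null;
        }
    }

    // Minimal stub so the sketch is self-contained.
    private static class ConnectionManager {
        void open() { /* connect Bluetooth and the recognition server here */ }
        void close() { /* tear the connections down here */ }
    }
}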

The activity of each screen can be thought of as the activity responsible for the UI thread and its elements. As an example, if you try and edit a TextView (just a widget to display a string) from a different thread, it will cause an error in the application.

30

Page 31: Mobile Eye - Matthew Gaunt

So to overcome this, a Handler is used which acts as a message queue between other threads and the main UI thread. This is extremely flexible and can be used for transmitting a Message or a Runnable, where a Message is a data structure which can store certain information and a Runnable is an easy way to implement a Thread where you only need to use the run() method of the Thread class and don't wish to create a separate sub-class of Thread.
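For example, a worker thread can hand a result back to the UI thread by posting a Runnable to a Handler that was created on the UI thread. This is a minimal sketch rather than the MobileEye code; StatusActivity and showResult() are made-up names.

import android.app.Activity;
import android.os.Bundle;
import android.os.Handler;
import android.widget.TextView;

public class StatusActivity extends Activity {

    private final Handler uiHandler = new Handler(); // created on the UI thread
    private TextView statusText;                     // the widget to update

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        statusText = new TextView(this);
        setContentView(statusText);
    }

    // Can be called from any worker thread; the update is posted to the UI thread.
    private void showResult(final String result) {
        uiHandler.post(new Runnable() {
            public void run() {
                statusText.setText(result); // safe: runs on the UI thread
            }
        });
    }
}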

Android considers an application to be unresponsive if the UI locks up for longer than 5 seconds [6], at which point the system will offer to force close the application. This means the image processing needs to be done on a worker thread, but the communications need their own threads to be able to read and respond, as quickly as possible, to communications.

4.2.3 Security, Intents and Receivers

The Android platform has a security feature where an application must state what hardware and information it wishes to have access to in the AndroidManifest file. This file stores the list of Activities and can be used to identify any broadcast receivers. The reason for this is that a user must give the application permission to use these on installation.

Intents are used to launch an 'intention' to do something. An example of this would be to turn on Bluetooth from your application. This can't be done directly for security reasons, so a system application is called to handle it for us. Such an intent would be performed as [4]:

if (!mBluetoothAdapter.isEnabled()) {

Intent enableBtIntent = new Intent(BluetoothAdapter.ACTION_REQUEST_ENABLE);

startActivityForResult(enableBtIntent, REQUEST_ENABLE_BT);

}

This intent is acknowledged by the system and any applications registered to handle this intent will be launched. The idea behind this is that an application can supplement its features through other applications[10].

BroadcastReceivers are used to receive information from the system; a simple example of this is the buttons on the handsfree set of the device. The system will register each button press and pass it to any broadcast receiver that wants to be aware of that event.
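As a sketch of the pattern (not the MobileEye receiver), a broadcast receiver for the handsfree kit's media button could look like the following; it would also need to be registered for the ACTION_MEDIA_BUTTON intent.

import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.view.KeyEvent;

// Receives the handsfree kit's media button presses.
public class MediaButtonReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context context, Intent intent) {
        if (Intent.ACTION_MEDIA_BUTTON.equals(intent.getAction())) {
            KeyEvent event = (KeyEvent) intent.getParcelableExtra(Intent.EXTRA_KEY_EVENT);
            if (event != null && event.getAction() == KeyEvent.ACTION_DOWN) {
                // React to the button press, e.g. trigger an object recognition photo.
            }
        }
    }
}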

4.2.4 Bluetooth and Camera APIs

The Bluetooth API for Android is fairly simple and includes some example code showing how to turn on the Bluetooth device from the application and how to perform a scan of nearby devices.

The documentation for the Camera API is fairly well explained, but there are aspects of it that have little to no documentation. The main problem with the API is the preview callback it provides; while this is exactly what is needed, the data is given as just a byte[], with little explanation of how the data is structured.

The documentation claims the default format is YCbCr 420 SP, but I couldn't find any information about this format, suggesting it is specific to the Android platform and not widely used. The closest version that is widely explained is the YCbCr format. To try and avoid having to read in and convert this to a friendlier format I tried to change the preferences of the camera's encoding to RGB 888, RGB 565 or RGB 332. These formats store the values of red, green and blue using R8 G8 B8 bits, R5 G6 B5 bits or R3 G3 B2 bits, where R8 is 8 bits to represent red, G8 is 8 bits to represent green etc. However this doesn't affect the encoding used for the preview (i.e. this setting appears to be ignored).

Figure 4.6: YUV 420 format and byte stream[14].

Further research pointed me to a forum post where someone had discussed the format of the image data [2]. It was claimed that the G1 had a different format altogether and was using the YUV 420 semi-planar encoding. This encoding starts with a set of luminance values for the image, Y (one luminance value for each pixel), and the U and V components are applied one value to every 4 Y values, with all the U values appearing after the Y values and the V values after the U values. An illustration of this is shown in figure 4.6, however the Android implementation is meant to be slightly different: the U and V values are interleaved one after the other, not grouped together in the data as shown in figure 4.6.

Both the YUV and YCbCr formats have the value Y representing the luminance of the image, which is suitable for getting a greyscale version of the image. I found some code from an open source project that extracts this and I have adapted it accordingly to obtain a scaled version of the image at read-in time, to give some minor speed up[15].
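For reference, a minimal sketch of the idea is shown below, assuming the preview data is YUV 420 semi-planar so that the first width x height bytes are the Y plane; it is not the adapted ZXing code itself.

// Extracts a greyscale image, downscaled by 'step', from a YUV 420 SP preview
// frame. The first width * height bytes of 'data' are assumed to be the Y plane.
public static int[] extractLuminance(byte[] data, int width, int height, int step) {
    int outWidth = width / step;
    int outHeight = height / step;
    int[] grey = new int[outWidth * outHeight];
    for (int y = 0; y < outHeight; y++) {
        int rowStart = (y * step) * width;
        for (int x = 0; x < outWidth; x++) {
            // Y values are unsigned bytes, so mask off the sign extension.
            grey[y * outWidth + x] = data[rowStart + (x * step)] & 0xFF;
        }
    }
    return grey;
}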


Chapter 5

Project Execution

The implementation consisted of a number of components which have been separated as much as possible to give the most flexibility possible. The key components of the system were:

• Mobile application handling the image processing to search for free space

• Hardware to perform projection rotation

• UI to be projected onto a surface (done in Python)

• A client-server application to handle object recognition (done in Java)

Each of these sections will be discussed in depth, outlining my choices and the reasoning behind them.

5.1 Tool and Language Choices

5.1.1 Mobile Application

The main requirements of the mobile platform used to implement this work were that it had to give access to the camera and have a means to communicate with Bluetooth devices, as well as internet connectivity.

The decision was heavily based on hardware since I only have access to an Android device, but also because it was free and has one of the most active development communities.

The language used in Android is Java, although there is a native SDK (NDK) counterpart which allows development in C or C++. From the Android NDK site, the NDK provides increased performance for '..self-contained, CPU-intensive operations that don't allocate much memory'[3]. I would expect MobileEye to be both CPU-intensive and to allocate a lot of memory by handling a number of images, so the NDK would seem to offer possible improvements. However, the documentation at [3] also explains that most applications gain increased complexity from using the NDK. Because of this I decided the Java SDK would be the most suitable for development, since Java is a language I am comfortable developing in, which meant I could spend more time focusing on the problem and learning Python.

The mobile device I have used for implementation is the Nexus One [8], which has a 5 megapixel camera, a 1GHz processor and 512MB of RAM. It does need to be kept in mind that this processor is fairly fast compared to some other smartphones.

The techniques implemented have not been developed to favour the high dynamic range techniques used in the Nexus One's camera (it's unclear whether it is a CMOS or CCD camera chip, both of which can offer high dynamic range), although it does help the algorithm to perform well.


Figure 5.1: A diagram of the University of Bristol's current steerable projector.

5.1.2 Hardware for Projection Rotation

Ideally a laser based projection unit would be used for this project. Again this choice was dictated by available hardware. I have used a Samsung DLP projector which is approximately 5x2x3.75 inches (WxHxD). The main disadvantage of this is that its size is larger than a mobile device and it requires manual focusing. The advantage of this projector is that it will be brighter than a smaller, handheld projection unit. This is good for implementation but hides the capabilities of an actual implementation of this system.

I was lucky enough to be given some guidance for implementing the system for rotating the projection. Internal research in the University of Bristol had been done to create a custom piece of hardware which used a Bluetooth controlled motor to move a mirror over a handheld projector. This took several months to build, there was no guarantee that it could be used for a projector the size of the one available to me, and it wasn't able to rotate horizontally. For these reasons, I created a custom piece of hardware which would be controlled manually to position a projection both vertically and horizontally.

The custom hardware was designed to strap around the projector, consisting of 2 parts: a fixed base attached to the projector unit with a circle cut into it, larger than the projection lens diameter, and a second piece with the mirror attached to it on hinges, which had small hooks that fitted onto the base part of the hardware. The hinges on the mirror gave the vertical positioning and the second part was able to rotate around the base, giving horizontal positioning. See figure 5.2 for photos.

An ideal steerable projection system would be a similar system to the one used in the Everywhere Display (section 2.8). This would obviously need to be made to a much smaller scale and would most likely need to be developed by someone with a stronger background in hardware development. Ideally a USB connection would be the best solution for communication with the device to control its movement, but this would require further research into the APIs available to use a USB connected device.

Figure 5.2: Images of the Steerable Projection Hardware. Section A shows the fixed base on the left and the rotating platform on the right. Section B shows these two pieces fitted together on the projector. C shows how the mobile device would be attached and section D shows the mirror fitted to the hardware.

5.1.3 Projection UI

UI design is something I have never spent much time doing in previous development; I have only been exposed to the Java and Python languages for use in UI implementations. From my experience Java was far more complex than Python. In terms of development of the backend of the program, it had to be capable of communicating via Bluetooth with the mobile device and updating the UI accordingly. Both are capable of doing this, so I decided to use Python since my previous experience with Python resulted in a much quicker development process compared to Java.

I used a GUI builder called Glade to develop the layout of the UI, then used some example code from the PyBluez documentation [11] (PyBluez is a wrapper for the Linux Bluetooth protocol). The example code helped create the Java side of the connection as well (using the correct UUIDs etc.).

The resulting application works extremely well; I implemented some code to rotate the image after the projection had been rotated (when you rotate the mirror to project left or right, the resulting projected image is rotated). The Bluetooth has occasional connection problems if the mobile device connects at about the same time as a connection timeout. The timeouts can be stopped, but this results in the connection never getting closed when the program wishes to end (although I expect there is some way to overcome this problem, this method had the extra benefit of regularly ensuring a live connection was available and otherwise resetting).

5.1.4 Object Recognition

I used a binary of the FabMap application to see what results were possible but also to see if any interesting mapping data could be obtained from its use. Because it was a binary it only required some configuration changes to work with a custom set of initialising images. But to integrate this with the application I used a Java client-server socket implementation. The reason for this is that it seems infeasible that the image processing required for this would be done on a mobile device. Instead it would be accomplished on a server (most likely implemented using a MapReduce technique since it is a highly parallelisable task).

Because I don't have access to a server that could run this application online safely, I implemented the client-server application on the same machine which runs the projector UI. During the initialisation of the mobile application, an IP address is required to use this method, but it may be skipped if not needed.

The reason for choosing Java was simply because I had used it before and it was able to run shell scripts from within the program, which meant I could easily run the FabMap algorithm after downloading the image from the mobile device.
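A minimal sketch of that pattern is shown below; the script name and the surrounding server code are hypothetical, not the implementation used in the project.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Invokes an external script that runs the FabMap binary over a downloaded
// image and returns whatever the script prints.
public final class FabMapRunner {

    public static String run(String imagePath) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder("./run_fabmap.sh", imagePath); // hypothetical script
        builder.redirectErrorStream(true);      // merge stderr into stdout
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            output.append(line).append('\n');
        }
        process.waitFor();
        return output.toString();
    }
}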

Obviously in a real product situation this may be replaced by a different language or platform to ensure safe and secure methods for external API use. It would also be done using a custom implementation of the FabMap algorithm rather than a binary.

5.2 Space Finding Algorithm

There is no clear research into methods of finding space in a scene, so for this project I developed a custom space finding algorithm designed to work on a mobile device at an acceptable speed, so that it could be used to find space and project in real time.

The original idea was to use an existing edge detection method to detect where item borders are and from this detect where there might be larger areas which might be appropriate. However, convolving an edge detection kernel with an image took too long while still keeping a reasonably sized image to work on.

To overcome this I decided to try and achieve a similar result through the use of thresholding. Here I will outline the process performed to create the final algorithm implemented.


Figure 5.3: Initial histogram data from a phone image.

5.2.1 Histogram Data

The first step was to consider the output of a small image from the mobile device and its histogram representation, as shown in figure 5.3.

It's clear to see that the data is highly noisy and suffers from extremes (the number of pixels with a value of approximately 254). But from a human perspective you could make an estimate of about 3 or 4 Gaussian distributions that could be used to segment the image.

While it would be possible to run through the array and pick out the biggest values from this data, it would be difficult to calculate the approximate mean and variance with any reliability. Because of this I implemented a simple smoothing technique, which averaged each histogram value with the 2 values either side of it. This had two advantages: it made it clearer where the peak of each distribution was and how wide it was. A number of these sample images with their approximate thresholded images are shown in the appendix on page 45.

Smoothing the histogram in this way has a similar effect to blurring the image: if you apply a Gaussian blur to an image, the effect on the histogram is similar to the results achieved through this method.

The great advantage of using this method is that the image size doesn't become the largest concern in terms of efficiency, since the pixel values are only read once to create the histogram (implemented with a bucket sort) and then only the histogram data, limited to at most 256 values, is used in the further algorithms.

To speed up the algorithm I reduced the data further to 64 buckets, giving groups of 4 (values 0-3, 4-7, 8-11 etc.). Putting the data into these buckets performed a good level of averaging in itself (since extreme values were added to their smaller neighbouring values); to then eliminate any further noise, the original averaging method proposed above was applied (as shown in the appendix on page 51).
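A minimal sketch of this bucketing and smoothing step is shown below; it is not the thesis code, and the three-value smoothing window is my reading of the description above.

// Builds a 64-bucket histogram from 8-bit greyscale pixels (4 grey levels per
// bucket) and smooths each bucket with its immediate neighbours.
public static int[] smoothedHistogram(int[] greyPixels) {
    int[] buckets = new int[64];
    for (int value : greyPixels) {
        buckets[value / 4]++;               // bucket sort: 0-3, 4-7, 8-11, ...
    }
    int[] smoothed = new int[64];
    for (int i = 0; i < 64; i++) {
        int sum = 0, count = 0;
        for (int j = i - 1; j <= i + 1; j++) {   // the value and its two neighbours
            if (j >= 0 && j < 64) {
                sum += buckets[j];
                count++;
            }
        }
        smoothed[i] = sum / count;
    }
    return smoothed;
}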

The next problem was to perform the hill climbing that would find these peaks and estimate the width of each peak (the variance). Before implementing a hill climbing technique it is worth noting that in the majority of images there tend to be approximately three dominant distributions that represent the data. This is an assumption based on the results of the images used during development. This value is used to heavily model the expected data from these images, but it has proven to be extremely successful in the results and always accounted for the majority of pixels in the image.

5.2.2 Hill Climbing

The original method of hill climbing searched for the peaks in the data, then selected the top three and traversed along each peak until a trough was found on either side. The peaks were searched for using preset points along the graph; the motivation behind this was to avoid any noise that may still exist in the data.

This was later improved to be a single pass, maintaining pointers to the minimum, peak top and maximum bucket for each distribution. The results of hill climbing can be seen in figure 5.4.

As you can see, the results are highly satisfactory if you take the highest value peak, which represents the piece of paper. This, however, is an idealistic image that happened to work for the method outlined so far.

The problem that can occur if you only consider the highest value peak out of the top three is that the top 2 peaks may actually be part of the same region in the image, and treating them as separate distributions can give the results shown in figure 5.5. The simplest method to solve this is to identify that these 2 peaks meet at a trough, and this gives a suitable merging rule. There are situations where this isn't suitable: if the image has peaks at either end of the histogram that happen to meet in the middle, it is highly unlikely that they represent the same region, so there is a limit on how far apart the peaks can be and still be merged into the same distribution.
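The sketch below is my reading of the peak and trough search described above, not the thesis code; it finds local peaks, walks out to the trough on either side and keeps the three tallest, leaving out the merging rule just described.

// Returns up to three {leftTrough, peak, rightTrough} triples from the
// smoothed histogram, ordered by peak height.
public static int[][] findDistributions(int[] hist) {
    int[][] candidates = new int[hist.length][];
    int found = 0;
    for (int i = 1; i < hist.length - 1; i++) {
        if (hist[i] >= hist[i - 1] && hist[i] > hist[i + 1]) {   // local peak
            int left = i;
            while (left > 0 && hist[left - 1] < hist[left]) left--;                    // left trough
            int right = i;
            while (right < hist.length - 1 && hist[right + 1] < hist[right]) right++;  // right trough
            candidates[found++] = new int[] { left, i, right };
        }
    }
    // Keep the three tallest peaks (simple selection, no sorting needed).
    int keep = Math.min(3, found);
    int[][] top = new int[keep][];
    boolean[] used = new boolean[found];
    for (int k = 0; k < keep; k++) {
        int best = -1;
        for (int c = 0; c < found; c++) {
            if (!used[c] && (best < 0 || hist[candidates[c][1]] > hist[candidates[best][1]])) {
                best = c;
            }
        }
        used[best] = true;
        top[k] = candidates[best];
    }
    return top;
}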

5.2.3 Area Extraction

So far I've explained how the segmentation is achieved; from this the largest free region needs to be identified.

The main point of segmenting was to extract the largest and brightest area, but the problem is identifying where this largest area is. Given a group of pixels assigned to the same threshold, the challenge is to find the largest region of these pixels while ignoring any noise that may exist in the group.

My initial thought was to implement a full region growing algorithm which would identify all the groups in the image matched to the threshold, then select the largest region. This would require a lot of computation to be spent on the smaller regions. Instead I used the much simpler method of computing a centre point a = (x, y) as the average position of all the pixels assigned to the group. From this point a square is grown vertically and horizontally until one direction can no longer grow; then the other directions grow until the box is consuming as much space as possible.
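A simplified sketch of this extraction is shown below, assuming mask marks the pixels assigned to the chosen group; it grows the box in all four directions while it can, which is only an approximation of the behaviour described above.

// Averages the group's pixel coordinates to get a centre point, then grows a
// box outwards until every side would include a pixel outside the group.
// Returns {left, top, right, bottom}, or null if no usable centre is found.
public static int[] growBox(boolean[][] mask) {
    int height = mask.length, width = mask[0].length;
    long sumX = 0, sumY = 0, count = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (mask[y][x]) { sumX += x; sumY += y; count++; }
        }
    }
    if (count == 0) return null;
    int cx = (int) (sumX / count), cy = (int) (sumY / count);
    if (!mask[cy][cx]) return null;          // centre fell outside the group: give up

    int left = cx, right = cx, top = cy, bottom = cy;
    boolean grown = true;
    while (grown) {
        grown = false;
        if (left > 0 && columnInGroup(mask, left - 1, top, bottom)) { left--; grown = true; }
        if (right < width - 1 && columnInGroup(mask, right + 1, top, bottom)) { right++; grown = true; }
        if (top > 0 && rowInGroup(mask, top - 1, left, right)) { top--; grown = true; }
        if (bottom < height - 1 && rowInGroup(mask, bottom + 1, left, right)) { bottom++; grown = true; }
    }
    return new int[] { left, top, right, bottom };
}

private static boolean columnInGroup(boolean[][] mask, int x, int top, int bottom) {
    for (int y = top; y <= bottom; y++) if (!mask[y][x]) return false;
    return true;
}

private static boolean rowInGroup(boolean[][] mask, int y, int left, int right) {
    for (int x = left; x <= right; x++) if (!mask[y][x]) return false;
    return true;
}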

The reason for taking this approach over the alternatives (which will be discussed in future research 6.2) is that its simplicity gives good time complexity and the end results are good. There are, however, two major drawbacks to this method. The averaging technique works well providing there is a large enough section within the scene to contribute most to the average of a. This means that smaller areas which may be suitable for projection in a scene won't be found, since only a small amount of noise in other parts of the image will move the centre point a far off (possibly into a section of the image not included in the threshold), meaning the region growing will simply fail. The other problem with this technique is the region growing of the box: it doesn't try any alternative methods for growing, growing simply in all directions evenly until each direction reaches a boundary. If the centre point a happens to land in between two small columns of a large space, as illustrated in figure 5.6, then it will fill this column and grow directly down, despite a far more optimal solution existing elsewhere in the image.

Figure 5.4: Hill climbing results for a white piece of paper. On the left: the cumulative values of each pixel group (0-4, 5-9 etc). The highlighted values (orange) represent the peak group of each distribution and the colours along the side represent each distribution. On the right: the corresponding thresholded images for each distribution, where white pixels indicate they are a part of that distribution / group / threshold.

Figure 5.5: This image has been split into three regions (illustrated by white, grey and black pixels) and from the histogram the peaks of the white and grey regions are 34 and 42.

Figure 5.6: An example of where the space finding algorithm will choose a less than optimal solution. If the centre point (shown as the red circle in this image) is placed between two regions not classified as the same region by the thresholding (shown as the solid black bars), the region growing will grow to the width of this column and grow vertically (Region 1). Ideally region 2 or 3 would be selected as the projection area is far larger.

5.2.4 Application Structure

The Android application is governed by a set of states. These states are used to give the system as much stability as possible.

The states are as follows (a short sketch of the resulting state machine is given at the end of this section):

Initialising When the application sets up any Bluetooth or image processing connections on the first 2 screens, these connections aren't actually opened until the camera activity is opened. The reason for this is that it makes it easier to handle a connection that fails (in terms of what activity needs to finish).

When the camera activity starts up, the Bluetooth and object recognition servers are connected. If either of these fails then the application is reset. It is possible to skip these connections as they aren't required; it just gives the application limited use. Once the connection is established the state is changed.

Find Area This state is simply the first stage of finding a projectable area; if it finds a suitably sized area the state changes, otherwise the state remains unchanged and the search is repeated.

Test Projection Area This state waits for approximately 3 seconds, with each image frame being tested to check that the projection area's mean pixel value stays the same. If this changes beyond a threshold it sets the state back to finding an area. After 3 seconds the state is changed to projecting markers. When the state changes, the Bluetooth connection is informed to project the markers.

Setting Up Markers This state has the sole purpose of acting as a time out. The user is required to indicate that the markers have been set up (moved into position) by a button press. It is assumed that the area will remain constant during this period. This would be different for a mechanical motor as it would be far faster than the manual configuration. If the button isn't pressed the state is set back to finding an area, otherwise it scans to obtain the points of the marker.

Projecting Markers This state expects to see the marker in the centre of the screen and will find the corners of the marker. If this is successful the state is moved on to projecting data; the coordinates are then sent to the UI (over the Bluetooth connection), which would be used to apply keystoning and change the marker for the appropriate image. If this doesn't succeed the state is changed back to finding an area.

Projecting Data This is the final state and sets a new image average pixel value. If this goes above another threshold the system resets its state again.

Throughout these states it is possible to take a photo to send for object recognition. This information is sent when changing from the projecting markers state to projecting data.
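A short sketch of the resulting state machine is given below; the names and the exact success-path ordering are my reading of the descriptions above, not the application source.

// Sketch of the states the camera activity cycles through.
public enum MobileEyeState {
    INITIALISING,
    FIND_AREA,
    TEST_PROJECTION_AREA,
    SETTING_UP_MARKERS,
    PROJECTING_MARKERS,
    PROJECTING_DATA;

    // The next state when the current state's check succeeds.
    public MobileEyeState onSuccess() {
        switch (this) {
            case INITIALISING:         return FIND_AREA;
            case FIND_AREA:            return TEST_PROJECTION_AREA;
            case TEST_PROJECTION_AREA: return SETTING_UP_MARKERS;
            case SETTING_UP_MARKERS:   return PROJECTING_MARKERS;
            case PROJECTING_MARKERS:   return PROJECTING_DATA;
            default:                   return this;   // PROJECTING_DATA keeps projecting
        }
    }

    // Every other state falls back to searching for an area on failure;
    // a failure while INITIALISING actually resets the whole application.
    public MobileEyeState onFailure() {
        return FIND_AREA;
    }
}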


Chapter 6

Project Status

6.1 Current Status

The current status of the implementation is a reliable and flexible system. However there are a number of changes that could advance the system's performance and also a number of changes that would bring the system closer to a final implementation.

Below is an outline of each section and my thoughts on what would be most suitable to change.

6.1.1 Projector UI

The Python UI has worked extremely well and performs reliably. By using the Bluetooth connection it should be suitable for most Bluetooth enabled mobile devices; the only thing that is required is the use of the same protocols used in the MobileEye system. What this means is simply using the appropriate XML calls between the mobile device and the laptop connected to the projector.

The only thing that wasn't implemented was the keystoning of the projected image. My implementation finds and passes four corresponding points from the camera frame to the Python UI application, but nothing is done with the data.

The methods outlined in [52, 53] should be reasonably easy to calculate, although they may require some external image processing libraries to apply the transformation (e.g. OpenCV's Python wrapper).

The code for handling the rotation for the steerable projection (when the projection is steered left or right the projected image rotates as well) was disabled as the hardware wasn't being used for the final implementation, so it may require enabling for future use.

Further work would need to be done on this if a laser based projection was used, such that the background colour of the application was changed from grey to black. The advantage of this, with laser projection, is that these pixels would be completely switched off; at the moment the UI will project a white / grey default colour, creating an undesired border around any projection.

• Discuss the no colour consideration

• Discuss the algorithms for maximising the space used

6.1.2 Android Application

I am fairly happy with the Android application as it stands; the Activity lifecycle handles the client-server protocol and Bluetooth connections appropriately, and the memory usage of the application seems fairly stable. I believe the application could be fine-tuned to give a slight speed up but nothing too extensive.


Future work may be needed to make this application work on older versions of Android, but the only real API differences I would expect would be in the camera and text to speech APIs.

I implemented the code to handle button presses of the handsfree kit, the main idea being that a limited amount of interaction could be given to the device through this rather than the touch screen (to trigger object recognition and also to identify when manual set-up of the steerable projection was done). This would be ideal for any future implementations with full

6.1.3 Image Recognition

I didn't get the time to see what level of results could be achieved by using the FabMap algorithm and an interior mapping system. I had hoped the binary of the FabMap implementation would be more flexible than it was, but it led to extremely slow performance because of the nature of its implementation. I think this part of the implementation is the one that requires the most attention. The following things are what I would like to change:

• Implement a custom version of standard SURF image recognition and then implement and compare the FabMap algorithm

• Maximise the speed of the system by using the parallelisable possibilities provided by the SURF system, through the use of a MapReduce system for matching and scale space creation.

While there are a number of techniques, Google are improving their image search features and they are showing signs of releasing APIs to use their service[7]. While there is no time scale on the release of this service, Google's vast computing power and access to images would give them a number of advantages that would be difficult to match without investing a lot of time and effort.

6.1.4 Hardware

The hardware wasn't particularly useful by the end of the project. Because of the size of the projector there was only one way to position the phone so you could see the device's screen, but this meant it wasn't possible to move the mirror vertically enough to position the projection everywhere in the camera's view.

I think a smaller projection system would be able to overcome this problem, but making the same version of the hardware work on such a small device may prove difficult, so I would expect a different design might be needed.

6.1.5 Aims Achieved

Discuss the aims, what was achieved and what wasn't

6.2 Future Work

6.2.1 Depth

At the moment there is no calibration of the projector to handle depth. This is overcome by manually moving the mirror, but in a real life implementation the projection can't be aligned properly with the camera's view without knowing the depth. A clear example of this is given in figure 6.1. Notice in the example that the projector and camera are next to each other at the same height. This means that the vertical angle between the camera and the centre of the projectable area will be the same as the angle needed to centre the projection. You could obviously swap this orientation so the camera and projector were vertically aligned (i.e. one below the other) but now at different heights. This would swap everything around (the horizontal angles stay the same and the vertical angle now changes with depth).

Figure 6.1: Illustration of how the depth of a projection area is required. The camera angles (both vertical and horizontal) θcv and θch remain the same while the angles to centre the projection are varied (θph1 and θph2). Note: the vertical angle doesn't change and would be kept the same as the camera angle.

This could be overcome simply by projecting different coloured markers across the vertical section of the screen, which could then be recognised by the camera and used to work out the difference between the projectable surface centre and the coloured marker (the camera should be able to calculate the distance from the centre projected (colour) marker). Having multiple markers means that if a centre marker was not projected onto the surface, the width between the other markers could be used to infer some level of depth, to estimate the amount of rotation needed to get the centre marker onto the surface, which can then be refined.

6.2.2 Automatic Object Recognition

Discuss how it might be possible to achieve this on a Mobile Device.

6.2.3 Space Finding Algorithm

The algorithm defining the free space within which to project is fairly reliable and stable.

My biggest concern with it is that the area extraction suffers from the problems outlined on page 38. What might be possible (to achieve in a fast time) is to divide the image into a grid (whose size can be determined by performance), where during the segmentation (as pixels are assigned to groups) the grid is filled with a score. Each grid cell is given a point for each pixel in its region which is a member of that group. Then a much simpler region growing method can be performed from the cell with the maximum score.

The key to this is that it uses a big enough grid to simplify the process while still giving a good overview of the regions. From that region an approximate centre point is defined for the area extraction to continue as normal, instead of using a weighted point to define the centre.
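A sketch of how that grid scoring might look (an assumption, since it has not been implemented) is:

// Scores a coarse grid by counting, per cell, how many of its pixels belong to
// the chosen threshold group, then returns the centre of the best cell as the
// starting point for the existing box growing step.
public static int[] bestCellCentre(boolean[][] mask, int cellSize) {
    int height = mask.length, width = mask[0].length;
    int cellsY = height / cellSize, cellsX = width / cellSize;
    int bestScore = -1, bestX = 0, bestY = 0;
    for (int cy = 0; cy < cellsY; cy++) {
        for (int cx = 0; cx < cellsX; cx++) {
            int score = 0;
            for (int y = cy * cellSize; y < (cy + 1) * cellSize; y++) {
                for (int x = cx * cellSize; x < (cx + 1) * cellSize; x++) {
                    if (mask[y][x]) score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                bestX = cx * cellSize + cellSize / 2;   // centre of the best cell
                bestY = cy * cellSize + cellSize / 2;
            }
        }
    }
    return new int[] { bestX, bestY };
}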


Appendix A

Space Finding Appendix

A.1 First Averaging and Thresholding Tests


A.2 Averaging and Thresholding Improvements


A.3 Dalvik


Bibliography

[1] Activity.

[2] [android-developers] Re: Android camera preview filter using camera.previewcallback.onpreviewframe.

[3] Android NDK.

[4] Bluetooth.

[5] Coolpix 1000pj.

[6] Designing for responsiveness.

[7] Google will make goggles a platform.

[8] Nexus one.

[9] Online consumer-generated reviews have significant impact on offline purchase behavior.

[10] Openintents.

[11] PyBluez documentation.

[12] Samsung i8520 'Halo' Android 2.1 phone with 3.7-inch Super AMOLED and pico projector.

[13] World's first video projector mobile phone: Epoq EGP-PP01.

[14] YUV.

[15] ZXing ("zebra crossing").

[16] Smartphone sales to overtake mobile phone sales by 2012, Nov 2009.

[17] Blob detection, May 2010.

[18] Hough transform, May 2010.

[19] Stanislaw Borkowski, Olivier Riff, and James L. Crowley. Projecting rectified images in an augmented environment. IEEE International Workshop on Projector-Camera Systems, Oct 2003.

[20] Vladimir Brajovic and Takeo Kanade. A sorting image sensor: An example of massively parallel intensity-to-time processing for low-latency computational sensors. IEEE International Conference on Robotics and Automation, Apr 1996.

[21] Edward Buckley. Invited paper: Holographic laser projection technology. SID Symposium Digest of Technical Papers, 39(1), May 2008.


[22] Andreas Butz and Christian Schmitz. Annotating real world objects using a steerable projector-camera unit. IEEE International Workshop on Projector-Camera Systems, June 2005.

[23] Steven Decker, R. Daniel McGrath, Kevin Brehmer, and Charles G. Sodini. A 256 x 256 CMOS imaging array with wide dynamic range pixels and column-parallel digital output. IEEE Journal of Solid-State Circuits, 33(12), Dec 1998.

[24] Cornelia Fermuller. Non-maximum suppression.

[25] Mark Freeman, Mark Champion, and Sid Madhavan. Scanned laser pico projectors: Seeing the big picture (with a small device).

[26] Abbas El Gamal. High dynamic range image sensors. Stanford University Website, 2002.

[27] Andrew Greaves and Enrico Rukzio. View & share: Supporting co-present viewing and sharing of media using personal projection. International Journal of Mobile Human Computer Interaction, 2010.

[28] Chris Harris and Mike Stephens. A combined corner and edge detector. Proceedings of the Alvey Vision Conference, 1988.

[29] Larry J. Hornbeck. Digital Light Processing™ for High-Brightness, High-Resolution Applications. Texas Instruments Incorporated, P.O. Box 655012, MS41, Dallas, TX 75265, February 1997.

[30] Texas Instruments. DLP® projectors glossary.

[31] Texas Instruments. How DLP technology works, 2009.

[32] David Kriegman. Homography estimation. CSE 252A, 2007.

[33] Johnny C. Lee, Paul H. Dietz, Dan Maynes-Aminzade, Ramesh Raskar, and Scott E. Hudson. Automatic projector calibration with embedded light sensors. ACM Symposium on User Interface Software and Technology (UIST), Oct 2004.

[34] Markus Lochtefeld, Michael Rohs, Johannes Schoning, and Antonio Kruger. Marauder's light: Replacing the wand with a mobile camera projector unit. Mobile and Ubiquitous Multimedia, 2009.

[35] David G. Lowe. Object recognition from local scale-invariant features. ICCV, 1999.

[36] Krystian Mikolajczyk and Cordelia Schmid. Indexing based on scale invariant interest points. ICCV, 2001.

[37] Pranav Mistry. Pranav Mistry: The thrilling potential of SixthSense technology, November 2009.

[38] Pranav Mistry and Pattie Maes. Pattie Maes and Pranav Mistry demo SixthSense, March 2009.

[39] Pranav Mistry and Pattie Maes. SixthSense: A wearable gestural interface. SIGGRAPH Asia 2009, Sketch, Yokohama, Japan, December 2009.

[40] Pranav Mistry, Pattie Maes, and Liyan Chang. WUW - Wear Ur World - a wearable gestural interface. CHI EA '09, Apr 2009.


[41] Andrew Molineux, Enrico Rukzio, and Andrew Greaves. Search light interactions with personal projector. Ubiprojection 2010, 1st Workshop on Personal Projection at Pervasive 2010, May 2010.

[42] Shree K. Nayar and Tomoo Mitsunaga. High dynamic range imaging: Spatially varying pixel exposures. IEEE CVPR, 2000.

[43] Claudio Pinhanez. The everywhere displays projector: A device to create ubiquitous graphical interfaces. Proc. of Ubiquitous Computing 2001 (Ubicomp'01), Sep 2001.

[44] Claudio Pinhanez. Using a steerable projector and a camera to transform surfacesinto interactive displays. Conference on Human Factors in Computing Systems,2001.

[45] Claudio S. Pinhanez, Frederik C. Kjeldsen, Anthony Levas, Gopal S. Pingali, Mark E. Podlaseck, and Paul B. Chou. IBM research report: Ubiquitous interactive graphics. IBM Research Report RC22495 (W0205-143), May 2002.

[46] Projector.com. Projector display types: CRT or DLP or LCD?

[47] Ramesh Raskar and Paul Beardsley. A self-correcting projector. IEEE Computer Vision and Pattern Recognition (CVPR), Dec 2001.

[48] Enrico Rukzio and Paul Holleis. Projector phone interactions: Design space and survey. Workshop on Coupled Display Visual Interfaces at AVI 2010, May 2010.

[49] Michel Sayag. Non-linear photosite response in CCD imagers (patent).

[50] Michael Schmitt and Ulrich Steegmuller. Green laser meets mobile projection requirements. Optics and Laser Europe, pages 17-19, 2008.

[51] Johannes Schoning, Markus Lochtefeld, Michael Rohs, Antonio Kruger, and Sven Kratz. Map torchlight: A mobile augmented reality camera projector unit. Conference on Human Factors in Computing Systems, 2009.

[52] Rahul Sukthankar, Robert G. Stockton, and Matthew D. Mullin. Automatic keystone correction for camera-assisted presentation interfaces. Advances in Multimodal Interfaces - Proceedings of ICMI, 2000.

[53] Rahul Sukthankar, Robert G. Stockton, and Matthew D. Mullin. Smarter presentations: Exploiting homography in camera-projector systems. Proceedings of International Conference on Computer Vision, 2001.

[54] Jahja I. Trisnadi. Speckle contrast reduction in laser projection displays. Proc. SPIE Projection Displays VIII, Ming H. Wu, Ed., 4657:131-137, April 2002.

[55] Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. CVPR, 2001.

[56] WowWee. Cinemin swivel.
