Computing & Information Sciences Kansas State University Wednesday, 03 Dec 2008CIS 530 / 730:...

Computing & Information Sciences, Kansas State University

Wednesday, 03 Dec 2008

CIS 530 / 730: Artificial Intelligence

Lecture 38 of 42

Wednesday, 03 December 2008

William H. Hsu

Department of Computing and Information Sciences, KSU

KSOL course page: http://snipurl.com/v9v3

Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS730

Instructor home page: http://www.cis.ksu.edu/~bhsu

Reading for Next Class:

Sections 22.1, 22.6-7, Russell & Norvig 2nd edition

Vision, Part 2 of 2

Discussion: Machine Problem 7

Vision Outline

Physiology of Vision (1 lecture)

Overview of Human Visual Perception (1 lecture)
• Need presenter for Monday!

Part I: Low-level vision (images as texture)
• Texture segmentation, image retrieval, scene models, “Bag of words” representations

Part II: Mid-level vision (segmentation)
• Principles of grouping, Normalized Cuts, Mean-shift, DD-MCMC, Graph-cut, super-pixels

Part III: 2D Recognition
• Window scanning (Schneiderman+Kanade, Viola+Jones)
• Correspondence Matching (chamfer matching, Hausdorff distance, shape contexts, invariant features, active appearance models)
• Recognition with Segmentation (top-down + bottom-up)
• Words and Pictures

•http://www.cs.cmu.edu/~efros/courses/AP06/

So what do humans care about?

slide by Fei-Fei, Fergus & Torralba

Verification: is that a bus?

Detection: are there cars?

Identification: is that a picture of Mao?

Object categorization

building

wall

banner

street lamp

Scene and context categorization

• outdoor

• city

• traffic

• …

Rough 3D layout, depth ordering

Challenges 1: view point variation

Michelangelo 1475-1564

Challenges 2: illumination

slide credit: S. Ullman

Challenges 3: occlusion

Magritte, 1957

Challenges 4: scale

Challenges 5: deformation

Xu, Beihong 1943

Challenges 6: background clutter

Klimt, 1913

Challenges 7: object intra-class variation

slide by Fei-Fei, Fergus & Torralba

Challenges 8: local ambiguity

slide by Fei-Fei, Fergus & Torralba

Challenges 9: the world behind the image

In this course, we will:

Take a few baby steps…

Physiology of Vision: a swift overview

16-721: Learning-Based Methods in Vision

A. Efros, CMU, Spring 2007

Some figures from Steve Palmer

Class Introductions

• Name:
• Research area / project / advisor
• What do you want to learn in this class?
• When I am not working, I ______________
• Favorite fruit:

Image Formation

Digital Camera

The Eye

Monocular Visual Field: 160 deg (w) × 135 deg (h)

Binocular Visual Field: 200 deg (w) × 135 deg (h)

Point of observation

Figures © Stephen E. Palmer, 2002

What do we see?

3D world 2D image

Point of observation

What do we see?

3D world 2D image

Painted backdrop
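The two "What do we see?" slides suggest that a 3D world and a painted backdrop can give the same 2D image at the point of observation. A minimal sketch of why, assuming an ideal pinhole projection (the pinhole model, the `project` helper, and the sample coordinates are illustrative assumptions, not taken from the slides):

```python
def project(point, focal_length=1.0):
    """Perspective-project a 3D point (X, Y, Z) onto an image plane.

    Assumes a pinhole at the origin looking down +Z (hypothetical setup).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the point of observation")
    return (focal_length * x / z, focal_length * y / z)


# A point in the 3D scene, and a point on a flat backdrop that lies on
# the same ray through the point of observation (three times closer).
scene_point = (2.0, 1.0, 6.0)
backdrop_point = tuple(c / 3.0 for c in scene_point)

image_of_scene = project(scene_point)
image_of_backdrop = project(backdrop_point)

print(image_of_scene, image_of_backdrop)
```

Both points land on the same image coordinates (up to floating-point rounding), so a single image cannot tell the scene from the backdrop.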

The Plenoptic Function

Q: What is the set of all things that we can ever see?

A: The Plenoptic Function (Adelson & Bergen)

Let’s start with a stationary person and try to parameterize everything that he can see…

Figure by Leonard McMillan

Grayscale snapshot

P(θ, φ)

• is intensity of light
• Seen from a single view point
• At a single time
• Averaged over the wavelengths of the visible spectrum

(can also do P(x,y), but spherical coordinates are nicer)

Color snapshot

P(θ, φ, λ)

• is intensity of light
• Seen from a single view point
• At a single time
• As a function of wavelength

Spherical Panorama

All light rays through a point form a panorama

Totally captured in a 2D array: P(θ, φ)

Where is the geometry???

See also: 2003 New Years Eve

http://www.panoramas.dk/fullscreen3/f1.html

A movie

P(θ, φ, λ, t)

• is intensity of light
• Seen from a single view point
• Over time
• As a function of wavelength

Space-time images

Holographic movie

P(θ, φ, λ, t, V_X, V_Y, V_Z)

• is intensity of light
• Seen from ANY viewpoint
• Over time
• As a function of wavelength

The Plenoptic Function

P(θ, φ, λ, t, V_X, V_Y, V_Z)

• Can reconstruct every possible view, at every moment, from every position, at every wavelength
• Contains every photograph, every movie, everything that anyone has ever seen!
• It completely captures our visual reality!
• Not bad for a function…
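One way to read the preceding slides is that each kind of image fixes or averages some arguments of the plenoptic function. A minimal sketch, assuming the seven arguments are viewing direction (θ, φ), wavelength λ, time t, and viewpoint (V_X, V_Y, V_Z); `toy_plenoptic` is a made-up stand-in for illustration, not a physical model:

```python
def grayscale_snapshot(P, t, viewpoint, wavelengths):
    """Fix time and viewpoint, average over wavelength: a function of (theta, phi)."""
    vx, vy, vz = viewpoint

    def image(theta, phi):
        samples = [P(theta, phi, lam, t, vx, vy, vz) for lam in wavelengths]
        return sum(samples) / len(samples)

    return image


def color_snapshot(P, t, viewpoint):
    """Fix time and viewpoint: a function of (theta, phi, lam)."""
    vx, vy, vz = viewpoint
    return lambda theta, phi, lam: P(theta, phi, lam, t, vx, vy, vz)


def movie(P, viewpoint):
    """Fix only the viewpoint: a function of (theta, phi, lam, t)."""
    vx, vy, vz = viewpoint
    return lambda theta, phi, lam, t: P(theta, phi, lam, t, vx, vy, vz)


def toy_plenoptic(theta, phi, lam, t, vx, vy, vz):
    """Hypothetical stand-in for P; the numbers carry no physical meaning."""
    return lam / 1000.0 + 0.1 * t + 0.01 * (vx + vy + vz)


eye = (0.0, 0.0, 0.0)
gray = grayscale_snapshot(toy_plenoptic, t=0.0, viewpoint=eye,
                          wavelengths=[400, 500, 600, 700])
color = color_snapshot(toy_plenoptic, t=0.0, viewpoint=eye)
film = movie(toy_plenoptic, viewpoint=eye)

print(gray(0.0, 0.0), color(0.0, 0.0, 500), film(0.0, 0.0, 500, 2.0))
```

The holographic movie is the full function itself: no argument is fixed.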

The Eye is a camera

The human eye is a camera!

• Iris - colored annulus with radial muscles
• Pupil - the hole (aperture) whose size is controlled by the iris
• What’s the “film”? Photoreceptor cells (rods and cones) in the retina

The Retina

Cross-section of eye

Ganglion cell layer

Bipolar cell layer

Receptor layer

Pigmented epithelium

Ganglion axons

Cross section of retina

Retina up-close

© Stephen E. Palmer, 2002

Two types of light-sensitive receptors

Cones
• cone-shaped
• less sensitive
• operate in high light
• color vision

Rods
• rod-shaped
• highly sensitive
• operate at night
• gray-scale vision

Rod / Cone sensitivity

The famous sock-matching problem…

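Distribution of Rods and Cones

[Figure: receptor distribution plotted against visual angle (degrees from fovea), from 80 on one side through 0 to 80 on the other; vertical axis from 0 to 150,000. The cone curves, the fovea, and the blind spot are labeled.]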

Night Sky: why are there more stars off-center?

Electromagnetic Spectrum

http://www.yorku.ca/eye/photopik.htm

Human Luminance Sensitivity Function

Why do we see light of these wavelengths?

[Figure: radiation curves for sources at 2000 C, 5000 C, and 10000 C plotted against wavelength (nm) from 0 to 3000; the visible region spans about 400 to 700 nm.]

…because that’s where the Sun radiates EM energy
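A rough check of this claim, assuming each source is an ideal blackbody so that Wien's displacement law gives its peak wavelength. The Sun's temperature below is an approximate assumed value, and the three Celsius temperatures are the ones labeled in the figure:

```python
WIEN_B = 2.897771955e-3  # Wien displacement constant, in metre-kelvins


def peak_wavelength_nm(temperature_kelvin):
    """Peak emission wavelength of an ideal blackbody, in nanometres."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return WIEN_B / temperature_kelvin * 1e9


SUN_SURFACE_K = 5772.0  # approximate effective temperature (assumed value)
sun_peak = peak_wavelength_nm(SUN_SURFACE_K)

# Temperatures labeled in the figure, converted from Celsius to kelvin.
figure_peaks = {c: peak_wavelength_nm(c + 273.15) for c in (2000, 5000, 10000)}

print(round(sun_peak), {c: round(nm) for c, nm in figure_peaks.items()})
```

Under these assumptions the Sun's peak falls inside the 400 to 700 nm visible region, the 2000 C source peaks in the infrared, and the 10000 C source peaks in the ultraviolet.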

Visible Light

Retinal Processing

Single Cell Recording

Microelectrode

Amplifier

Electrical response (action potentials)

Single Cell Recording

Retinal Receptive Fields

Receptive field structure in ganglion cells: On-center Off-surround

Stimulus condition Electrical response

Response

RF of On-center Off-surround cells

[Figure: receptive field (center on, surround off) with its neural response, and a response profile of firing rate versus horizontal position showing the on-center and off-surround regions.]

RF of Off-center On-surround cells

[Figure: receptive field (center off, surround on) with its neural response, and a response profile of firing rate versus horizontal position showing the off-center and on-surround regions.]

Receptive field structure in bipolar cells

[Figure, panel A (wiring diagram): receptors reach the bipolar cell by a direct path and by an indirect path through horizontal cells. Panel B (receptive field profiles): direct excitatory component (D) and indirect inhibitory component (I).]
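The center-surround profile can be sketched as a narrow excitatory component minus a broad inhibitory one. This is a minimal illustration, assuming both components are Gaussians of equal area (a common modeling choice, not something the slides state); the widths and stimuli are hypothetical:

```python
import math


def gaussian(x, sigma):
    """Unit-area Gaussian evaluated at horizontal position x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def receptive_field(x, sigma_direct=1.0, sigma_indirect=2.0):
    """Direct excitatory component (D) minus indirect inhibitory component (I)."""
    return gaussian(x, sigma_direct) - gaussian(x, sigma_indirect)


def response(stimulus, step=0.1, half_width=20.0):
    """Sum stimulus(x) * receptive_field(x) over horizontal position."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * step
        total += stimulus(x) * receptive_field(x) * step
    return total


spot_on_center = response(lambda x: 1.0 if abs(x) < 1.0 else 0.0)
ring_on_surround = response(lambda x: 1.0 if 2.0 < abs(x) < 5.0 else 0.0)
uniform_light = response(lambda x: 1.0)

print(spot_on_center, ring_on_surround, uniform_light)
```

In this sketch a spot on the center excites the cell, light on the surround inhibits it, and uniform illumination nearly cancels out, which matches the on-center off-surround behavior described in the slides.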

Visual Cortex

aka: Primary visual cortex, Striate cortex, Brodmann’s area 17

Cortical Area V1

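[Figure: visual pathway from the eye along the optic nerve to the LGN (thalamus) and on to striate cortex (V1) and extrastriate cortex, with the dorsal stream leading to parietal visual cortex and the ventral stream leading to temporal visual cortex.]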

Cortical Receptive Fields

Single-cell recording from visual cortex

[Figure: recorded responses plotted against time]

Three classes of cells in V1

Simple cells

Complex cells

Hypercomplex cells

Simple Cells: “Line Detectors”

A. Light Line Detector

B. Dark Line Detector

[Figure: for each panel, firing rate versus horizontal position]

Simple Cells: “Edge Detectors”

C. Dark-to-light Edge Detector

D. Light-to-dark Edge Detector

[Figure: for each panel, firing rate versus horizontal position]
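The four detector types can be sketched as small weight patterns applied across horizontal position. This is a toy 1D illustration, assuming three-sample receptive fields with hand-picked weights and a rectified firing rate; none of these numbers come from the slides:

```python
def simple_cell(intensity, weights):
    """Firing rate at each interior position: rectified weighted sum over 3 samples."""
    rates = []
    for i in range(1, len(intensity) - 1):
        drive = sum(w * v for w, v in zip(weights, intensity[i - 1:i + 2]))
        rates.append(max(0.0, drive))  # a firing rate cannot go negative
    return rates


# Hypothetical weight patterns, listed left to right.
LIGHT_LINE = (-1.0, 2.0, -1.0)
DARK_LINE = (1.0, -2.0, 1.0)
DARK_TO_LIGHT_EDGE = (-1.0, 0.0, 1.0)
LIGHT_TO_DARK_EDGE = (1.0, 0.0, -1.0)

step_edge = [0, 0, 0, 0, 1, 1, 1, 1]    # dark on the left, light on the right
bright_line = [0, 0, 0, 1, 0, 0, 0]     # one light sample on a dark background

print(simple_cell(step_edge, DARK_TO_LIGHT_EDGE))
print(simple_cell(step_edge, LIGHT_TO_DARK_EDGE))
print(simple_cell(bright_line, LIGHT_LINE))
```

The dark-to-light detector fires only around the step, the light-to-dark detector stays silent on the same input, and the light-line detector fires only where the line sits.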

Constructing a line detector

[Figure: receptive fields of center-surround cells (retina, LGN) feeding a simple cell in cortical area V1]

Complex Cells

STIMULUS NEURAL RESPONSE

Complex Cells

Constructing a Complex Cell

[Figure: receptive fields on the retina feeding simple cells, which feed a complex cell in cortical area V1]
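A complex cell built from simple cells can be sketched as pooling over simple cells whose receptive fields sit at neighboring positions. The max-pooling rule, the light-line weights, and the positions below are all assumptions chosen for illustration:

```python
def simple_cell_rate(intensity, center):
    """Light-line simple cell with a 3-sample receptive field at `center`."""
    drive = -intensity[center - 1] + 2 * intensity[center] - intensity[center + 1]
    return max(0, drive)  # rectify: no negative firing rates


def complex_cell_rate(intensity, centers):
    """Pool simple cells at several positions (max pooling is an assumed rule)."""
    return max(simple_cell_rate(intensity, c) for c in centers)


pooled_positions = [2, 3, 4]

line_at_2 = [0, 0, 1, 0, 0, 0, 0]
line_at_4 = [0, 0, 0, 0, 1, 0, 0]
blank = [0, 0, 0, 0, 0, 0, 0]

for stimulus in (line_at_2, line_at_4, blank):
    print(simple_cell_rate(stimulus, 3), complex_cell_rate(stimulus, pooled_positions))
```

The single simple cell at position 3 misses both lines, while the pooled cell responds to a line at either position and stays silent on the blank input.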

Hypercomplex Cells

“End-stopped” Simple Cells

Constructing a Hypercomplex Cell

Receptive Fields

RETINA CORTICAL AREA V1

Complex Cell End-stopped Cell

Mapping from Retina to V1

Why edges?

So, why “edge-like” structures in the Plenoptic Function?

Because our world is structured!

Problem: Dynamic Range

1,500

25,000

400,000

2,000,000,000

The real world is high dynamic range

pixel (312, 284) = 42

Image

42 photos?

Is a camera a photometer?

Long Exposure

[Figure: real world (high dynamic range, 10^-6 to 10^6) mapped to picture values 0 to 255]

Short Exposure

[Figure: real world (high dynamic range, 10^-6 to 10^6) mapped to picture values 0 to 255]
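The exposure trade-off can be sketched with a toy camera that scales scene radiance, rounds, and clips to 0 to 255. The linear response and the two exposure factors are assumptions for illustration (real cameras respond nonlinearly); the four radiance values are the relative numbers from the dynamic range slide:

```python
def expose(radiance, exposure):
    """Map scene radiance to an 8-bit pixel: scale, round, clip to 0..255."""
    return max(0, min(255, round(radiance * exposure)))


# Relative scene values from the dynamic range slide.
scene = [1500, 25000, 400000, 2000000000]

# Long exposure: the darkest region is well exposed, the rest saturates.
long_exposure = [expose(r, 0.1) for r in scene]

# Short exposure: the brightest region just fits, the rest goes to black.
short_exposure = [expose(r, 255 / 2000000000) for r in scene]

print(long_exposure)
print(short_exposure)
```

Neither exposure records all four regions, which is why a single 0 to 255 picture cannot act as a photometer for a high dynamic range scene.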

Varying Exposure

What does the eye see?

The eye has a huge dynamic range

Do we see a true radiance map?

Eye is not a photometer!

"Every light is a shade, compared to the higher lights, till you come to the sun; and every shade is a light, compared to the deeper shades, till you come to the night."

— John Ruskin, 1879
