
Joint development and implementation of an advanced training program in neurosciences

Two countries, one goal, joint success!


Ioan Buciu

Imaging and computer science applications to neurosciences

Department of Electronics and Telecommunications, Faculty of Electrical Engineering and Information Technology,

University of Oradea, 410087, Romania, e-mail: [email protected]@uoradea.ro



Principles of nuclear magnetic resonance (NMR). Relaxation processes. Spin echo
◦ Nuclear magnetism. Magnetization
◦ Nuclear magnetic resonance. "FID" signal
◦ Relaxation processes and constants
◦ Spin echo

Principles of MR imaging; magnetic field gradients
◦ Gradients of magnetic field
◦ Selective excitation of a section
◦ Reading gradient
◦ Phase coding gradient
◦ Contrast in MRI images

The architecture of an MRI system
◦ The magnet
◦ Emission circuit
◦ Receiver circuit
◦ Reconstruction circuit
◦ Central computer

Basics of image processing and analysis applied in medical imaging
◦ Image enhancement and filtering
◦ Image segmentation
◦ Mathematical morphology applied to medical imaging

Biologically plausible neural models for image representation in the Human Visual System
◦ Redundancy, coding and compression principles in the neural system
◦ Dense, sparse and local data representation
◦ Simple and complex neural cells model
◦ Gabor approach

Biologically plausible neural models for image representation in the Human Visual System
◦ PCA approach
◦ ICA approach
◦ NMF and variants
◦ Compressed sensing

Automatic medical classification and recognition techniques
◦ Introduction to supervised and unsupervised classification approaches
◦ Artificial Neural Networks (ANNs)
◦ Support Vector Machines (SVMs)



The brain: the adult human brain contains roughly 86 billion neural cells (neurons), not trillions; during development, neurons and connections are formed in excess and many are later pruned. Neural cells are not isolated: connections are formed linking cells to perform neural processes (learning how to move, memorizing and recognizing objects, sounds, familiar faces, activities, etc., and other cognitive processes).

Some neurons are responsible for visual processes: image compression, image encoding, image representation, image understanding.



Human Visual System – information flow



LGN (Lateral Geniculate Nucleus): 6 layers; perception of luminance information (specialized ganglion cells).

Visual cortex: V1 … AIT. V1 (primary visual cortex), with 6 layers, conducts pulses toward higher neural layers when stimulated by:
◦ oriented edges;
◦ various spatial frequencies;
◦ various temporal information;
◦ some particular spatial locations.



Retina, LGN and V1 (primary visual cortex)



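Monocular visual field: 160 degrees wide. Binocular visual field: 200 degrees wide. Processing is not uniform: 25 % of the cortex is devoted to the central 5 degrees of the field of view. Processes in the retina can be modeled via a difference of Gaussians.

A Gaussian:

$$G_{\sigma}(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)$$

Difference of Gaussians (ON center, OFF surround), with $\sigma_1 < \sigma_2$:

$$DoG(x, y) = G_{\sigma_1}(x, y) - G_{\sigma_2}(x, y)$$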
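The ON-center / OFF-surround behaviour of a difference of Gaussians can be sketched numerically. This is a minimal illustration, assuming unit-volume isotropic Gaussians; the function names and the values sigma_center = 1 and sigma_surround = 2 are hypothetical and not taken from the slides.

```python
import math

def gaussian2d(x, y, sigma):
    """Isotropic 2D Gaussian normalized to unit volume."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def dog(x, y, sigma_center=1.0, sigma_surround=2.0):
    """Narrow (center) Gaussian minus wide (surround) Gaussian."""
    return gaussian2d(x, y, sigma_center) - gaussian2d(x, y, sigma_surround)

center = dog(0.0, 0.0)     # excitatory response at the middle of the receptive field
surround = dog(3.0, 0.0)   # inhibitory response in the surround
print(center, surround)
```

With these assumed widths the kernel is positive at the center and negative in the surround, which is the ON-center / OFF-surround profile.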



The eye does not keep the absolute luminance value; it keeps only relative light values. Compression occurs in the periphery, not in the fovea.

Receptive field: the region in space (visual scene) in which the presence of a stimulus triggers the neuron to respond (fire).

The receptive field is defined by type, size, and shape. It is modeled via a convolution kernel $R$:

$$y = R * x$$

where $x$ is the original image and $y$ is the neuron response.



Cones and rods convert light into signals that are transmitted to the brain (visual cortex).

◦ 2 eyes, 2 retinas
◦ number of cones in each retina: 6 million
◦ number of rods in each retina: 125 million
◦ optic nerve: 1.5 million fibers for each eye

Rods and cones must therefore be connected to nerve fibers on a many-to-one basis. Consequence: images (visual scenes) are not represented internally as we see them; image representation and image understanding follow image encoding and information compression. Compression rate: about 100:1.



Rods generate achromatic (gray-level) responses (luminance decomposition).

Rods are sensitive only at low light levels (good for night vision).

Cones are the photoreceptors responsible for color vision. Color is the perceptual response to different wavelengths of visible light.



Different wavelength ranges give rise to different color responses. There are 3 types of cones (S, M, L), sensitive only at high light levels (can we see colors in poorly illuminated scenes?). Dogs have only 2 types, bees have 4, and mantis shrimps have 10!



◦ Cones S: blue response (peak at 445 nm)
◦ Cones M: green or yellow response (peak at 535 nm)
◦ Cones L: red response (peak at 575 nm)



Opponent Processing: - Retina performs “matrix operations” to represent color (decomposition) in the opponent color system (Y, Y – B, R – G).



The outputs of the three cone types are transformed into an achromatic channel (such as luminance, Y) and two chromatic (opponent) channels: U and V, the chroma components.

The HVS (Human Visual System) is much more sensitive to variations in the achromatic channel than to variations in the chromatic channels.

Same principles are exploited in standard JPEG compression.



RGB → YUV: Y = 0.3 R + 0.6 G + 0.1 B; U = B – Y; V = R – Y

Figure panels: R, G, B channels; Y, U (Cb), V (Cr) channels.
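The RGB to YUV conversion given above can be written out directly. This is a minimal sketch that uses the rounded weights 0.3, 0.6 and 0.1 from the slide; the function name and the assumption that channel values lie in [0, 1] are illustrative only.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) with the slide's rounded weights."""
    y = 0.3 * r + 0.6 * g + 0.1 * b   # achromatic (luminance) channel
    u = b - y                          # blue-difference chroma
    v = r - y                          # red-difference chroma
    return y, u, v

y_gray, u_gray, v_gray = rgb_to_yuv(0.5, 0.5, 0.5)   # a gray pixel
y_red, u_red, v_red = rgb_to_yuv(1.0, 0.0, 0.0)      # a pure red pixel
print(y_gray, u_gray, v_gray)
print(y_red, u_red, v_red)
```

A gray pixel carries all of its information in Y (both chroma components vanish), which is why the chroma channels can be stored more coarsely than the luminance channel.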



Human perception – the organization, identification and interpretation of sensory information necessary to represent (internally) and understand the environment.

Human perception works through the 5 traditional (or more?) human senses:

1. Sight (human vision) – Human Visual System - ability of the eyes to detect images of visible light.

2. Hearing (human audition) – Auditory System – the sense of sound perception - vibration.

3. Taste. 4. Smell. 5. Touch.



Can we sense everything? No! Our senses are limited. Human Auditory System: sounds (acoustic stimuli).

Frequency: the number of vibrations produced per second, measured in Hertz (Hz); sounds in nature are mixtures of components with different frequencies.

1 Hz = 1 vibration per second. Frequency range of human hearing: 20 Hz – 20,000 Hz. The bounds degrade with age (middle-aged adult: up to 14 kHz). Low frequency: a human heartbeat. High frequency: a scream.



Sounds outside this range: infrasound and ultrasound. Bats: up to 100,000 Hz.



fs (sampling frequency): is its value important? Definitely yes!

Figure (temporal representation): a sinusoid with f = 7 Hz sampled at fs = 10 Hz (red) and at fs = 8000 Hz (blue).
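Why the sampling frequency matters can be checked with the two rates quoted above. The sketch below assumes an ideal sampled sinusoid and computes the frequency that the samples appear to have (aliasing); the function names are hypothetical.

```python
def apparent_frequency(f, fs):
    """Frequency (Hz) that a sinusoid of frequency f appears to have after sampling at fs."""
    return abs(f - fs * round(f / fs))

def satisfies_nyquist(f, fs):
    """True when fs is more than twice the signal frequency."""
    return fs > 2.0 * f

alias_low = apparent_frequency(7.0, 10.0)      # undersampled case
alias_high = apparent_frequency(7.0, 8000.0)   # well-sampled case
print(alias_low, satisfies_nyquist(7.0, 10.0))
print(alias_high, satisfies_nyquist(7.0, 8000.0))
```

At fs = 10 Hz the 7 Hz tone is undersampled and shows up as a 3 Hz tone, while at fs = 8000 Hz it is represented faithfully.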



Frequency representation: analyzing a mixture of sounds, temporal analysis vs. frequency analysis (1D Fourier Transform).

Figure: temporal representation and frequency representation of a signal, linked by the Fourier Transform.
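The move from the temporal to the frequency representation can be illustrated with a direct (slow) discrete Fourier transform. This is a sketch under assumed values: a one-second mixture of a 5 Hz and a 12 Hz sinusoid sampled at 64 Hz, none of which come from the slides.

```python
import cmath
import math

def dft(signal):
    """Direct O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

fs = 64   # assumed sampling frequency in Hz
n = 64    # one second of samples, so bin k corresponds to k Hz
signal = [math.sin(2 * math.pi * 5 * t / fs) + 0.5 * math.sin(2 * math.pi * 12 * t / fs)
          for t in range(n)]

magnitude = [abs(c) for c in dft(signal)]
half = magnitude[: n // 2]   # the second half mirrors the first for real signals
peaks = sorted(range(len(half)), key=lambda k: half[k], reverse=True)[:2]
print(sorted(peaks))
```

The two largest magnitude bins sit at the two frequencies of the mixture, which is hard to read off the temporal waveform directly.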



What’s human auditory system (human sound perception) got to do with it (frequency decomposition) ?

Answer: basic compression principle ! – Psychoacoustic Models – Perceptual Codecs - Spectral Masking – mp3 !



Human Visual System (HVS) – images – formed as visual stimuli

Like sounds, images can be also decomposed into frequency components.

Image decomposition in frequency same as for sounds: 2D - Fourier Transform.

Like sounds, visual spectrum (frequency components) is bounded.

HVS is less sensitive to low and high frequencies – Consequence: We can remove high frequencies from an image without degrading (perceptually speaking) the image quality (degradation is not visually noticeable).




Low frequency components correspond to smooth regions in the image, while high frequency components are associated to details in the image (edges, corners).

Low frequency image High frequency image Low - High frequency image



2D Fourier Transform. First column: low- and high-frequency image, respectively. Second column: corresponding 2D FT (magnitude only).





Figure: a) image; b) FT magnitude; c) FT shifted magnitude, with the DC component, the low-frequency AC components and the high-frequency AC components marked; d) FT phase.





Image reconstruction from the magnitude and phase of the FT:

Figure: a) original image; b) reconstruction from the magnitude only; c) reconstruction from the phase only.

Full image reconstruction from magnitude + phase of the FT:

Figure: panels a) and b).



Figure: original image and its reconstructions from 50, 200 and 1000 components.



File sizes: original BMP: 813 KB; JPEG at 100 % quality: 284 KB; JPEG at 75 %: 72 KB; JPEG at 10 %: 15 KB.

Figure: crops of the original BMP and of the JPEG at 100 %, 75 % and 10 % quality.

Figure: FT spectra of the original BMP and of the JPEG at 100 %, 75 % and 10 % quality.



Limitations of Visual Perception

1. Space perception: 3D (length, width or depth, and height). A 4th dimension? Adding time perception gives 4D (3 spatial dimensions + 1 temporal dimension, which has a direction).

Figure: 3D space vs. 4D space.



Flatland: a 2D imaginary world (Edwin Abbott, 1884), a flat-world concept.

Tesseract: the tesseract, also called an 8-cell or regular octachoron or cubic prism, is the four-dimensional analog of the cube; the tesseract is to the cube as the cube is to the square (Wikipedia).

Figure: a cube.

Figure: a 3D projection of an 8-cell performing a simple rotation about a plane which bisects the figure from front-left to back-right and top to bottom.

Figure: a 3D projection of an 8-cell performing a double rotation about two orthogonal planes.



2. Human visual perception can be fooled:

A) Unreal (impossible) objects, e.g. Oscar Reutersvärd's triangle: the brain perceives the object locally (as local components) and then combines the components into a whole (3D mental reconstruction). A paradox!

B) Ambiguity (perceptual ambiguity).

C) Distortion illusions: neural cells are direction sensitive.

D) Camouflage: stimuli have similar properties.

E) Simultaneous contrast.

F) Motion illusion.



3. The HVS is sensitive only to visible light: the wavelengths of electromagnetic radiation between roughly 370 nm and 730 nm account for the light visible to the HVS.

4. The bandwidth of vision is defined by the Contrast Sensitivity Function: spatial frequency components are visible up to 60 cycles per degree (cpd).

4. Perceptual masking (contrast masking):



5. Validation of perception: a specific configuration of the environment is validated as a perception if the expected sensory information (the projection from the tested configuration into the senses) is similar to the observed one, up to the precision of the senses [Gro09].

The validation step should guarantee that there is enough evidence to support a configuration.

A good validation criterion should accept a configuration only when the measurements are compatible and the evidence is enough to support the hypothesis.



5. Validation of perception: example



We have talked so far about the limitations of the HVS; what about something positive?

Can you recognize the persons? Low vs. high frequency.

Spatial resolution: 31 x 29 pixels. Original spatial resolution: 468 x 448 pixels, i.e. a reduction factor of about 15 in each dimension.



The surrounding information contains redundant parts. Let us start from the beginning. Histogram: the occurrence frequency of each value within a series. Example: a 30-dimensional vector x:

x = [1 6 4 3 6 4 5 6 7 10 2 3 2 6 7 4 3 3 2 6 7 1 4 1 8 1 8 1 10 3]
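The histogram of the 30-dimensional example vector can be computed by counting occurrences. A minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

x = [1, 6, 4, 3, 6, 4, 5, 6, 7, 10, 2, 3, 2, 6, 7,
     4, 3, 3, 2, 6, 7, 1, 4, 1, 8, 1, 8, 1, 10, 3]

histogram = Counter(x)   # maps each value to its number of occurrences

for value in range(1, 11):
    # A missing value (here 9) is reported with a count of 0.
    print(value, histogram[value])
```

Dividing each count by the total number of samples (30) turns the histogram into an empirical probability estimate.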



Modeling a histogram through functions: probability density functions (pdf) such as the Gaussian, Poisson, or uniform distribution.



Kurtosis: a higher-order statistic. Gaussian pdf: kurtosis K = 0; super-Gaussian: K > 0; sub-Gaussian: K < 0.

Figure: example distributions with kurtosis -1.3, 12.4 and 0.0.
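The sign convention above (K = 0 for a Gaussian) corresponds to the excess kurtosis, i.e. the normalized fourth central moment minus 3. The sketch below assumes that definition; the two sample sets are hypothetical and chosen only to show one sub-Gaussian and one super-Gaussian case.

```python
def excess_kurtosis(values):
    """Fourth central moment divided by the squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

flat = [-1.0, 1.0, -1.0, 1.0]            # two-point distribution: sub-Gaussian
peaked = [0.0] * 18 + [-3.0, 3.0]        # sharp peak with rare outliers: super-Gaussian

k_flat = excess_kurtosis(flat)
k_peaked = excess_kurtosis(peaked)
print(k_flat, k_peaked)
```

The flat sample gives a negative value and the peaked, heavy-tailed sample a positive one, matching the sub-Gaussian and super-Gaussian cases.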



Mean, standard deviation (std), variance, covariance. A 12-dimensional vector:

X = [1 2 4 6 12 15 25 45 68 67 65 98]

Mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \bar{x} = 34$$

How useful is the mean on its own? Not much:

Xa = [0 8 12 20] and Xb = [8 9 11 12] have the same mean, $\bar{X}_a = \bar{X}_b = 10$.



Mean, standard deviation (std), variance, covariance. Standard deviation (std):

$$\sigma_x = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}{n - 1}}$$

Figure: Xa and Xb.
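The point that the mean alone says little can be checked on the vectors Xa = [0 8 12 20] and Xb = [8 9 11 12]. A minimal sketch using Python's `statistics` module, whose `stdev` uses the n − 1 normalization of the sample standard deviation.

```python
import statistics

xa = [0, 8, 12, 20]
xb = [8, 9, 11, 12]

mean_a = statistics.mean(xa)
mean_b = statistics.mean(xb)
std_a = statistics.stdev(xa)   # sample standard deviation, divides by n - 1
std_b = statistics.stdev(xb)

print(mean_a, mean_b)
print(round(std_a, 4), round(std_b, 4))
```

Both vectors have mean 10, but the standard deviation of Xa is several times larger than that of Xb, so the spread distinguishes them where the mean cannot.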



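Mean, standard deviation (std), variance, covariance. Variance:

$$\sigma_x^{2} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}{n - 1}$$

Multivariate data (e.g. first row: students' marks; second row: course attendance frequency):

$$X = \begin{bmatrix} 8 & 4 & 10 & 7 & 5 & 8 & 9 & 9 \\ 0.7 & 0.2 & 0.9 & 0.8 & 0.2 & 0.8 & 0.8 & 0.9 \end{bmatrix}$$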



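Mean, standard deviation (std), variance, covariance. Variance:

$$\mathrm{var}(x) = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(X_i - \bar{X}\right)}{n - 1}$$

Covariance:

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{n - 1}$$

Covariance matrix for three variables x, y, z:

$$\mathrm{Cov} = \begin{bmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{bmatrix} = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\ \sigma_{yx} & \sigma_{yy} & \sigma_{yz} \\ \sigma_{zx} & \sigma_{zy} & \sigma_{zz} \end{bmatrix}$$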





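Correlation coefficient:

$$\rho_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{E\left[(x - \mu_x)(y - \mu_y)\right]}{\sigma_x \sigma_y}, \qquad \rho_{xy} \in [-1, 1]$$

◦ ρ = -1: strongly negatively correlated variables.
◦ ρ = +1: strongly positively correlated variables.
◦ ρ = 0: decorrelated variables.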
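Covariance and the correlation coefficient can be computed in a few lines. The sketch below uses the marks / attendance example; the numeric values are as read from the slide's data matrix, which was damaged by extraction, so they should be treated as an assumption. Function names are illustrative.

```python
import math

def covariance(u, v):
    """Sample covariance with the n - 1 normalization."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)

def correlation(u, v):
    """Correlation coefficient: covariance divided by the two standard deviations."""
    return covariance(u, v) / math.sqrt(covariance(u, u) * covariance(v, v))

marks = [8, 4, 10, 7, 5, 8, 9, 9]                        # assumed first row
attendance = [0.7, 0.2, 0.9, 0.8, 0.2, 0.8, 0.8, 0.9]    # assumed second row

variables = (marks, attendance)
cov_matrix = [[covariance(p, q) for q in variables] for p in variables]
rho = correlation(marks, attendance)
print(cov_matrix)
print(round(rho, 3))
```

The covariance matrix is symmetric with the variances on its diagonal, and for these values the marks and the attendance come out strongly positively correlated.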





Natural images contain redundant information: a pixel is strongly correlated with its neighbors, and the correlation coefficient decreases as we move away from this pixel. This is an example for a still image (intra-frame redundancy).

Figure: example of the correlation coefficient.



What about video frames? One second of video is composed of 24 (standard) still images played in sequence; other frame rates: 23.976, 25, 30, even 60 fps. Video frames contain not only intra-frame redundancy but also inter-frame redundancy: a pixel (or small patch) in one frame is correlated with the corresponding pixel (or patch) in the subsequent frame.

Correlation coefficients between four frames: ρ12 = 0.97, ρ13 = 0.94, ρ14 = 0.90, ρ23 = 0.97, ρ24 = 0.90, ρ34 = 0.92



Can we get rid of redundant information? Yes, there are several methods or models compliant with the biological (HVS) system. The original data (still images, video, audio) are first transformed to suppress the redundant (non-significant or less significant) parts of the information.

A simple way is data rotation.



Figure: correlation before rotation ρ = 0.99; after rotation ρ = -0.09.



Another method: whitening (for T video frames). Steps:

1) Each image of size i x j is lexicographically scanned and reshaped into a k-dimensional vector (k = i x j). The vectors are stored as the columns of a k x T matrix X.

2) Form the covariance matrix $C_x$ corresponding to X and compute its eigenvectors (Q) and eigenvalues (Λ) by solving the relation:

$$C_x = Q \Lambda Q^{T}$$

Transform the data according to:

$$Z = \Lambda^{-1/2} Q^{T} X$$

The new covariance matrix is:

$$C_z = \left[\Lambda^{-1/2} Q^{T}\right] C_x \left[\Lambda^{-1/2} Q^{T}\right]^{T} = I$$
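The whitening steps can be sketched for the smallest possible case, two correlated variables, where the eigen-decomposition of the 2 x 2 covariance matrix has a closed form. The data values and function names below are hypothetical; real video frames would need a general eigen-solver.

```python
import math

def covariance(u, v):
    """Sample covariance with the n - 1 normalization."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)

def whiten_2d(x1, x2):
    """Return two decorrelated, unit-variance variables: Z = Lambda^(-1/2) Q^T X."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    a, b, c = covariance(c1, c1), covariance(c1, c2), covariance(c2, c2)
    # Closed-form eigen-decomposition of the symmetric matrix [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2.0 * b * cs * sn + c * sn * sn
    lam2 = a * sn * sn - 2.0 * b * cs * sn + c * cs * cs
    z1 = [(cs * p + sn * q) / math.sqrt(lam1) for p, q in zip(c1, c2)]
    z2 = [(-sn * p + cs * q) / math.sqrt(lam2) for p, q in zip(c1, c2)]
    return z1, z2

# Hypothetical, strongly correlated measurements (e.g. one pixel in two frames).
frame_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
frame_b = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]

z1, z2 = whiten_2d(frame_a, frame_b)
print(covariance(z1, z1), covariance(z2, z2), covariance(z1, z2))
```

After the transform the covariance matrix of (z1, z2) is the identity up to rounding error: unit variances and zero cross-covariance.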



Another method: whitening (for T video frames)

Before whitening: ρ12 = 0.97, ρ13 = 0.94, ρ14 = 0.90, ρ23 = 0.97, ρ24 = 0.90, ρ34 = 0.92

After whitening: ρ12 = -0.14, ρ13 = -0.05, ρ14 = -0.44, ρ23 = 0.01, ρ24 = -0.42, ρ34 = -0.56



Methods to measure correlation:

1) Correlation coefficient.

2) Mutual information MI (a large value indicates a strong dependence):

$$I(X; Y) = \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}$$

MI 12 = 0.97, MI 13 = 0.94, MI 14 = 0.90, MI 23 = 0.97, MI 24 = 0.90, MI 34 = 0.92

MI = 0.0323
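Mutual information can be evaluated directly from a joint probability table. The sketch below assumes a discrete joint distribution given as a list of rows and uses the natural logarithm; the two example tables are hypothetical.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                         # terms with p(x, y) = 0 contribute nothing
                total += p * math.log(p / (px[i] * py[j]))
    return total

independent = [[0.25, 0.25], [0.25, 0.25]]   # p(x, y) = p(x) p(y)
dependent = [[0.5, 0.0], [0.0, 0.5]]         # Y is fully determined by X

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

Independent variables give zero mutual information, while the fully dependent binary pair gives log 2, the entropy of either variable.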



Let us suppose we have a set of p images.



An image (of dimension r x q = m pixels): $x = [x_1, x_2, \ldots, x_m]^{T}$

One image x can be decomposed as: x = Z*h

where Z (m x p) contains the basis functions and h is the p x 1 vector of coefficients, $h = [h_1, h_2, \ldots, h_p]^{T}$.

h = W*x, with $W = Z^{-1}$

For a set of p images, each image can be thought of as a combination of the full image data set.



Biologically: Z is the receptive field, h is the neuron's firing rate. Neuroscience: the type of image encoding is related to the number of neurons that are active (respond) to a certain piece of information represented by a specific sensory stimulus caused by the image.

Local image code: a single, specific cell is activated.

Advantage: it is "computed" very fast and occupies little memory. Disadvantage: it cannot generalize (i.e., when trained with a sufficient number of samples, it achieves satisfactory results when tested on samples from the training set, but performs poorly on new test samples not belonging to the training set).



Reason: the input-output unit association (as in single-layer neural networks) is very weak and a new sample cannot be linked with the old association learned during the training process.

Dense image code - a large cell population with overlapping sensory input is activated and contributes to the image representation.

Advantage: a large capacity of making new associations, robust to occlusion.

Disadvantage: it suffers from slow training, requires heavy training and is likely to produce redundant image representations.



Sparse image codes: only a fraction of a large neuronal population is active.

◦ Dense representation, a.k.a. holistic representation.
◦ Local representation, a.k.a. part-based representation.

There is still a debate on which representation is biologically compliant. A hierarchical structure was then proposed: local followed by dense (full) representation.



Example:



Atick and Redlich [1] support the idea of a dense image code within the HVS and argue for compact, densely decorrelated codes for image representation.

Accordingly, the receptive fields of retinal ganglion cells = local “whitening” filters that remove second-order correlations between image pixels. Bandpass, multiscale and oriented receptive fields of V1 neurons may also be considered as filters that remove second-order correlation, the way Principal Component Analysis (PCA) does.

Efficient image coding: the image is described by a small number of descriptors, minimizing the mutual information in such a way that the higher-order correlation between images is removed.



Spatial receptive fields of simple cells (including V1 neurons) have been reasonably well described physiologically as being localized, oriented and bandpass.

Biederman came up with the theory of recognition-by-components (RBC) [7].

Empirical tests support his idea that complex objects are segmented into components called ‘geons’, which are further used by humans for image understanding.

It is still unclear whether holistic/sparse image representations are unique and global, or whether face image processing is task-dependent.




Are different processes involved for different tasks, for instance object recognition vs. face recognition vs. facial expression recognition?

Face identification and facial expression recognition are two independent tasks based on different representations and processing mechanisms.

The IT area of the temporal lobe contains specialized neurons (face cells) that are selectively tuned to faces.

There is evidence that AIT areas contain neurons with responses related to facial identity recognition, while other neurons (located in the superior temporal sulcus) are specialized to respond only to facial expressions.



Linear and nonlinear mechanisms trigger the responsiveness of neurons to stimuli: simple and complex neural cells in the HVS.

Single “standard model” for each stage of the visual pathway or several different models ?

Linear receptive field and simple & complex neural cells. Basic models of neurons involved in early visual processing.

In all models, the response of a neuron is described by passing an image through one or more linear filters (by taking the dot product or projection of an image and a filter). The outputs of the linear filters are passed through an instantaneous nonlinear function, plotted here as firing rate on the ordinate and filter output on the abscissa.



LEFT: Simple model of a retinal ganglion cell or of an LGN relay neuron. The model includes a linear filter (receptive field) with a center–surround organization and a half-wave rectifying nonlinearity. Images that resemble the filter produce large firing rate responses, whereas images that resemble the inverse of the filter or have no similarity with the filter produce no response. RIGHT: Model of a V1 simple cell as a filter elongated along one axis and a half-wave squaring nonlinearity. As in LEFT, only images that resemble the filter produce high firing rate responses.



The energy model of a V1 complex cell. The model includes two phase shifted linear filters whose outputs are squared before they are summed. In this model, both images that resemble the filters and their inverses produce high firing rates.

Polarity contrast insensitivity.



Properties of neural cells:

1) Direction selectivity: the subunit receptive fields of many complex cells consist of ON and OFF subregions shifting smoothly over time (spatiotemporally inseparable receptive field), suggesting direction selectivity.

2) Orientation selectivity: the subunits (basis functions) clearly show orientation preferences; particular cells respond only to stimuli with a particular orientation. Local spatial information preserves the geometric topology.

3) Spatial frequency tuning: bandpass filtering. Redundancy reduction is obtained by suppressing the low spatial frequencies in order to whiten the power spectrum of images, which is done by highpass filtering.

On the other hand, the high-frequency components contain only little power from the image source and are therefore not robust to noise. To avoid this, the high frequencies must be attenuated (lowpass filtering).

The combination of noise reduction and redundancy reduction optimizes the information transfer, resulting in bandpass filtering.



Gabor functions (also known as Gabor wavelets or filters) have been widely proposed as a model for biological neural cells due to their similar properties (mainly frequency tuning and orientation selectivity).

A Gabor function is a sinusoid windowed with a Gaussian function.

Its size, frequency and orientation can be manipulated to produce a wide range of different receptive field models.

By convolving the image with the Gabor functions, a new image representation can be achieved with features that are sparse, oriented and localized.



A Gabor function is a sinusoid windowed with a Gaussian function:

- filter parameters



A Gabor function - 2 out of phase filters in the real and imaginary part of a complex function:

The real part

The imaginary part

The peak filter response is at f0. The expressions above hold for the temporal (1D) Gabor filter.
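A temporal (1D) Gabor function can be sketched as a complex sinusoid windowed by a Gaussian. The slide's own formula is not available here, so the form g(t) = exp(-π a² t²) · exp(i 2π f0 t) is an assumption, as are the parameter names `a` and `f0` and their default values.

```python
import cmath
import math

def gabor_1d(t, f0=4.0, a=2.0):
    """Complex 1D Gabor: Gaussian envelope times a complex sinusoid carrier (assumed form)."""
    envelope = math.exp(-math.pi * (a * t) ** 2)   # Gaussian window, scale set by a
    carrier = cmath.exp(2j * math.pi * f0 * t)     # sinusoid with peak frequency f0
    return envelope * carrier

g0 = gabor_1d(0.0)
g_plus = gabor_1d(0.1)
g_minus = gabor_1d(-0.1)

# The real part is the cosine-phase filter, the imaginary part the sine-phase filter.
print(g0.real, g0.imag)
print(g_plus.real, g_minus.real, g_plus.imag, g_minus.imag)
```

The real part is even and the imaginary part is odd, so the two filters are 90 degrees out of phase (in quadrature).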



The spatial (2D) Gabor function :

The carrier: The envelope: wr(x,y)



Sinusoid carrier:

The envelope:



The envelope:



Gabor function parameters:



2D Gabor



2D Gabor



Fourier transform of a 2D Gabor function

or, in polar coordinates:



The thinner the lines are – the higher the density (or the frequency) is.

How many line pairs (or cycles) can an image hold, or how many cycles per pixel can it hold? The thinnest line in an image is one pixel wide, so the maximum line frequency an image can hold is one pair of lines per two pixels, or 0.5 cycles per pixel.



Half-magnitude profile: the set of points, in the frequency domain, with magnitude equal to one-half of the peak magnitude.



Half-magnitude frequency and orientation bandwidths for neurons.

How to find them? Probe a neuron with sinusoid images of orientation $\omega_0$ and different spatial frequency magnitudes F. F is increased with respect to $F_0$ until the magnitude of the neuron's response is half the magnitude at $(F_0, \omega_0)$; this frequency is $F_{max}$. Next, F is decreased until the magnitude of the neuron's response is half the magnitude at $(F_0, \omega_0)$; this frequency is $F_{min}$.

The half-magnitude frequency bandwidth is defined as:

$$\Delta F = \log_2\frac{F_{max}}{F_{min}} \quad \text{(in octaves)}$$



Octave: a unit expressing a frequency ratio as a power of 2. k octaves correspond to a ratio of $2^{k}$, i.e. $2^{k} \times 100\,\%$.
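The octave bandwidth is the base-2 logarithm of the ratio between the upper and lower half-magnitude frequencies. A minimal sketch; the function names and the example frequencies are hypothetical.

```python
import math

def bandwidth_octaves(f_min, f_max):
    """Half-magnitude bandwidth in octaves: log2 of the frequency ratio."""
    return math.log2(f_max / f_min)

def ratio_from_octaves(k):
    """Frequency ratio that corresponds to k octaves."""
    return 2.0 ** k

one_octave = bandwidth_octaves(2.0, 4.0)   # doubling the frequency is one octave
ratio_1_4 = ratio_from_octaves(1.4)        # ratio for a 1.4-octave bandwidth
print(one_octave, round(ratio_1_4, 3))
```

A bandwidth of 1.4 octaves therefore means that the upper half-magnitude frequency is roughly 2.6 times the lower one.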

Half-magnitude frequency bandwidth is defined as:

When



Modeling simple neural cells with Gabor functions:

1) The orientation of the Gaussian envelope can be modeled as being equal to the orientation of the carrier.

2) In macaque V1, most cells have a half-magnitude spatial frequency bandwidth between 1 and 1.5 octaves. The median is about 1.4 octaves.

3) In macaque V1, the range of half-magnitude orientation bandwidths among cells is very large (mean = 65 degrees, median = 42 degrees).

4) In macaque V1, the peak frequencies range from as low as 0.5 cycles per degree of visual angle to as high as 15 cycles per degree of visual angle.



Modeling simple neural cells with Gabor functions:

5) The spatial frequency bandwidth (in octaves) tends to be a bit larger for cells with a low peak frequency than for cells with a high peak frequency. For example, the median half-magnitude bandwidth of cells tuned to frequencies higher than 5 cycles/degree is 1.2 octaves, whereas the median for cells tuned to frequencies lower than 2 cycles/degree is 1.7 octaves.

6) Orientation-selective simple cells in V1 show their minimum response at about 30 to 40 degrees away from the optimal orientation, not at 90 degrees away from it.



Modeling simple neural cells with Gabor functions:

7) The spiking rate of simple cells in macaque V1 ranges from close to 0 Hz at rest to about 120 Hz when maximally excited.

8) In the area mapping the fovea, there are more kernels oriented vertically and horizontally than oriented diagonally (about 3 to 2).

9) Pairs of adjacent simple cells in the visual cortex of the cat are in quadrature, similar to the real and imaginary parts of a Gabor function (complex Gabor receptive field).



Gabor functions for spatial frequency filtering and orientation selectivity:

1) we consider a set of Gabor functions distributed uniformly upon the foveal field.

2) each point in this field contains at least 2 neurons in quadrature.

3) An image enters the foveal field and the result is a convolution between the image and the set (bank) of Gabor filters with several parameters, acting as bandpass filters.

The peak frequency is controlled by the spatial frequency of the sinusoid carrier (u0, v0).

The half-magnitude region (orientation) is controlled by the rotation and scale parameters a, b of the Gaussian envelope.



Convolution yields a Gabor response (G) which is orientation selective and frequency tuned:



Gabor wavelets: 5 frequencies 4 orientations







Recall: X = Z*H. PCA decomposes the data set (X) into a set of orthogonal (orthonormal) basis images (the columns of Z) and coefficients H.

Orthogonality: with orthogonal columns, the redundant information between columns is minimal.

Orthonormal = orthogonal + unit length. Ex: a matrix:



Length of a vector:

$v = [4; 11; 8; 10]^{T}$, $\|v\| = \sqrt{4^{2} + 11^{2} + 8^{2} + 10^{2}} = \sqrt{301} \approx 17.35$

Inner product:

x = [1, 6, 7, 4], y = [3, 2, 8, 3]

(x, y) = 1*3 + 6*2 + 7*8 + 4*3 = 83



Two vectors are orthogonal to each other if their inner product equals zero.

[2; 1; –2; 4] and [3; –6; 4; 2]

([2; 1; –2; 4], [3; –6; 4; 2]) = 2*3 + 1*(–6) – 2*(4) + 4*2 = 0;
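The vector operations used here (length, inner product, orthogonality test) can be sketched with plain Python lists. The helper names are illustrative; the vectors are the ones from the examples above.

```python
import math

def inner(u, v):
    """Inner (dot) product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def length(v):
    """Euclidean length: square root of the inner product of v with itself."""
    return math.sqrt(inner(v, v))

def normalize(v):
    """Scale v to unit length."""
    n = length(v)
    return [a / n for a in v]

len_v = length([4, 11, 8, 10])
inner_xy = inner([1, 6, 7, 4], [3, 2, 8, 3])

p, q = [2, 1, -2, 4], [3, -6, 4, 2]
inner_pq = inner(p, q)                 # zero means p and q are orthogonal
u, w = normalize(p), normalize(q)      # unit-length versions of the same directions
print(len_v, inner_xy, inner_pq, length(u), length(w))
```

Normalizing two orthogonal vectors keeps their inner product at zero and gives each unit length, which is what makes a pair orthonormal.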

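The two vectors above are orthogonal. What about orthonormal vectors? Take v = [2; 4; 1; 2]. Its length is $\|v\| = \sqrt{4 + 16 + 1 + 4} = 5$, and the normalized vector u = v/5 = [2/5; 4/5; 1/5; 2/5] has unit length because $\|u\| = \sqrt{(4 + 16 + 1 + 4)/25} = 1$.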



Two vectors are orthonormal if their inner product equals zero and each has length 1.

u = [2/5; 1/5; –2/5; 4/5] and v = [3; –6; 4; 2]/√65

are orthonormal. Orthogonal matrix: $AA^{T} = A^{T}A = I$. Ex:



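If A is orthogonal, $A^{-1} = A^{T}$. Recall: X = Z*H. PCA decomposes the data set (X) into a set of orthogonal (orthonormal) basis images (the columns of Z) and coefficients H. X is m x n, Z is m x p, H is p x n.

Z contains the eigenvectors ($Z^{-1} = Z^{T}$). Ex:

$$B_{original} = \begin{bmatrix} 2.5 & 0.5 & 2.2 & 1.9 & 3.1 & 2.3 & 2 & 1 & 1.5 & 1.1 \\ 2.4 & 0.7 & 2.9 & 2.2 & 3.0 & 2.7 & 1.6 & 1.1 & 1.6 & 0.9 \end{bmatrix}$$

Transform B into zero-mean data by subtracting the mean of each row:

$$\bar{x} = \frac{1}{10}\sum_{i=1}^{10} x_i = 1.81, \qquad \bar{y} = \frac{1}{10}\sum_{i=1}^{10} y_i = 1.91$$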



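Covariance matrix:

$$A = E\left[(x - \mu_x)(x - \mu_x)^{T}\right]$$

$$B_{zero\_mean} = \begin{bmatrix} 0.69 & -1.31 & 0.39 & 0.09 & 1.29 & 0.49 & 0.19 & -0.81 & -0.31 & -0.71 \\ 0.49 & -1.21 & 0.99 & 0.29 & 1.09 & 0.79 & -0.31 & -0.81 & -0.31 & -1.01 \end{bmatrix}$$

$$A = \begin{bmatrix} 0.6166 & 0.6154 \\ 0.6154 & 0.7166 \end{bmatrix}$$

PCA decomposition:

$$C_x z_i = \lambda_i z_i, \qquad i = 1, \ldots, m$$

$$Z = \begin{bmatrix} 0.6779 & -0.7352 \\ 0.7352 & 0.6779 \end{bmatrix}, \qquad \lambda = \begin{bmatrix} 1.1556 \\ 0.0442 \end{bmatrix}$$

Each eigenvector is defined only up to its sign. The eigenvalues listed correspond to a covariance normalized by 1/n; with the 1/(n − 1) normalization that yields the matrix A shown, they are 1.2840 and 0.0491.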



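PCA coefficients H (remember $Z^{-1} = Z^{T}$). Projection onto the first component: $H_1 = Z_1^{T} B_{zero\_mean}$ (we took only the first column of Z), which approximates the 2D data by 1D data.

Full projection: $H = Z^{T} B_{zero\_mean}$

$$H = \begin{bmatrix} 0.82 & -1.77 & 0.99 & 0.27 & 1.67 & 0.91 & -0.09 & -1.14 & -0.43 & -1.22 \\ -0.17 & 0.14 & 0.38 & 0.13 & -0.20 & 0.17 & -0.34 & 0.04 & 0.01 & -0.16 \end{bmatrix}$$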



Components contribution:

λ1 / (λ1 + λ2) = 1.1556 / (1.1556 + 0.0442) = 1.1556 / 1.1998 = 0.96

λ2 / (λ1 + λ2) = 0.0442 / (1.1556 + 0.0442) = 0.0442 / 1.1998 = 0.037



Data approximation from just a few coefficients, p < min(m, n). Data reconstruction from just a few coefficients:

$B_{rec} = Z_1 Z_1^{T} B_{zero\_mean} + Mean \approx B_{original}$

Full data reconstruction:

$B_{rec} = Z Z^{T} B_{zero\_mean} + Mean = B_{original}$
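The whole 2D PCA example (zero-mean data, covariance, eigenvectors, projection, reconstruction) can be retraced in a short script. This is a sketch, assuming the data values as read from the slide's matrix B and the 1/(n − 1) covariance normalization; under that normalization the eigenvalues come out as about 1.2840 and 0.0491, and the sign of each eigenvector is arbitrary. The closed-form eigen-decomposition only works for a 2 x 2 symmetric matrix.

```python
import math

# Data as read from the slide (row 1: x, row 2: y); treated as an assumption.
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
n = len(x)

# Step 1: zero-mean data.
mean_x, mean_y = sum(x) / n, sum(y) / n
xc = [v - mean_x for v in x]
yc = [v - mean_y for v in y]

# Step 2: covariance matrix [[a, b], [b, c]] with the 1/(n - 1) normalization.
a = sum(u * u for u in xc) / (n - 1)
b = sum(u * v for u, v in zip(xc, yc)) / (n - 1)
c = sum(v * v for v in yc) / (n - 1)

# Step 3: closed-form eigen-decomposition of the symmetric 2 x 2 matrix.
theta = 0.5 * math.atan2(2.0 * b, a - c)
z1 = (math.cos(theta), math.sin(theta))     # eigenvector of the largest eigenvalue
z2 = (-math.sin(theta), math.cos(theta))    # orthogonal to z1
lam1 = a * z1[0] ** 2 + 2.0 * b * z1[0] * z1[1] + c * z1[1] ** 2
lam2 = a * z2[0] ** 2 + 2.0 * b * z2[0] * z2[1] + c * z2[1] ** 2

# Step 4: projection H = Z^T * B_zero_mean.
h1 = [z1[0] * u + z1[1] * v for u, v in zip(xc, yc)]
h2 = [z2[0] * u + z2[1] * v for u, v in zip(xc, yc)]

# Step 5: reconstruction from one component (approximate) and from both (exact).
rec1_x = [z1[0] * p + mean_x for p in h1]
rec1_y = [z1[1] * p + mean_y for p in h1]
full_x = [z1[0] * p + z2[0] * q + mean_x for p, q in zip(h1, h2)]
full_y = [z1[1] * p + z2[1] * q + mean_y for p, q in zip(h1, h2)]

contribution = lam1 / (lam1 + lam2)
print(round(lam1, 4), round(lam2, 4), round(contribution, 3))
```

The first component alone carries about 96 % of the total variance, which is why the 1D approximation stays close to the original 2D data, while using both components reproduces the data exactly.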

Example for an image data set. 1) Set 1: 200 face images (m = 200), each image of 56 x 46 pixels (n = 2576).





Eigenimages (p = 10):







(p = 200):



Reconstruction from p = 10 components

Reconstruction from p = 30 and 100 components



Reconstruction from p = 140 components

Reconstruction from p = 200 components (full reconstruction)



How many components are enough? A plot of the cumulative sum of the eigenvalues gives a clue.



To remember:
◦ PCA consists of a transformation from a high-dimensional space to another space of reduced dimension.
◦ If the data are highly correlated (and usually they are), there is redundant information.
◦ PCA decreases the amount of redundant information by decorrelating the input vectors.
◦ The input vectors, high-dimensional and correlated, can be represented in a lower-dimensional space and decorrelated.
◦ PCA is a powerful tool to compress data.



In principle, PCA yields uncorrelated components. When the data have a Gaussian distribution, the uncorrelated components are independent as well. However, if the data are mixtures of non-Gaussian components, PCA fails to extract components having a non-Gaussian distribution.

On the contrary, ICA takes into account the higher-order statistics of the data in an attempt to recover non-Gaussian components.

Seeking non-Gaussian components is related to looking for statistical independence.

We need moments and cumulants of order higher than 2 to capture the non-Gaussian structure of data.



Blind source separation





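A measure of the non-Gaussianity of a (zero-mean) random variable y is the kurtosis:

$$\mathrm{kurt}(y) = E[y^{4}] - 3\left(E[y^{2}]\right)^{2}$$

ICA statistical model (A is p x n):

$$x = A s$$

ICA estimates a demixing matrix W of dimensions n×p that will recover the original components of s as

$$\hat{s} = W x$$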



Typically, in ICA algorithms, W is sought such that the recovered components (the projections of the data onto the rows of W) have maximally non-Gaussian distributions and are mutually uncorrelated.

A simple way to do this is to first whiten the data and then seek orthogonal non-normal projections.

ICA in Computational Neuroscience. Source separation in EEG and MEG signals:

◦ Brain activity is measured through electroencephalograms (EEG). These signals are a mixture of different activities in the brain and of external noise.

◦ ICA can be used to extract the original activity signals.



ICA in Computational Neuroscience. Modeling the behavior of neurons in area V1 of the mammalian cortex:
◦ Spikes.
◦ Receptive fields.
◦ Natural images.

Some studies propose that the behavior of one kind of neuron can be computationally described through the ICA analysis of these natural inputs.



ICA in Computational Neuroscience.



The fiducial features are considered independent. What we see is the human face X, composed of those supposedly independent features (nose, mouth, eyebrows, etc.) S, mixed through A to form the whole face. By applying ICA, the goal is to identify the unmixing matrix W in order to retrieve the independent features from X.



For m observations data :

S are (supposed to be) independent sources !

ICA applied to images: two architectures proposed.

Architecture I: The observation matrix X (m x p) is formed by treating the facial images as row vectors. ICA recovers m independent images.

A) First, PCA is applied.



form whose columns are the eigenvectors. Projection (coefficients) similar to H:

B) ICA is applied to



The rows of contain the ICA coefficients of the linear combination of rows (basis vectors) in U.

Architecture II: We consider $X^T$. The pixels are assumed to be independent! In Architecture II, ICA is performed on the projected data

The basis images obtained by performing PCA and ICA can be represented as



The ICA coefficients for reconstruction:

Results on data set 2 – 164 images of 60 x 45 pixel resolution faces with different (6) facial expressions.





The ICA basis images for architecture I:

The ICA basis images for architecture II



Other applications: fetal electrocardiogram (magnetocardiogram) extraction, i.e. removing/filtering maternal electrocardiogram signals and noise from fetal electrocardiogram (magnetocardiogram) signals.





NMF: in the NMF approach, both factors are constrained to be non-negative. No other constraints!

At least two reasons for performing NMF:
◦ biological: firing rates in visual perception neurons are non-negative
◦ image processing: pixels in a grayscale image have non-negative values

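One possible cost function for measuring the discrepancy between ZH and X is the Kullback-Leibler (KL) divergence:

$$f_{\mathrm{NMF}}(X \,\|\, ZH) = \sum_{i,j} \left( x_{ij} \ln \frac{x_{ij}}{\sum_k z_{ik} h_{kj}} + \sum_k z_{ik} h_{kj} - x_{ij} \right)$$

Another cost function would be the Euclidean distance based (least-squares sense) cost function:

$$f_{\mathrm{NMF}}(X \,\|\, ZH) = \frac{1}{2} \sum_{i,j} \left( x_{ij} - \sum_k z_{ik} h_{kj} \right)^2$$

By employing an Expectation-Maximization (EM) strategy, the following updating rules for the factors Z and H are obtained (at each iteration t) for the KL divergence:

$$h_{kj}^{(t)} = h_{kj}^{(t-1)} \, \frac{\sum_i z_{ik} \, \dfrac{x_{ij}}{\sum_{k'} z_{ik'} h_{k'j}^{(t-1)}}}{\sum_i z_{ik}}$$

$$z_{ik}^{(t)} = z_{ik}^{(t-1)} \, \frac{\sum_j \dfrac{x_{ij}}{\sum_{k'} z_{ik'}^{(t-1)} h_{k'j}} \, h_{kj}}{\sum_j h_{kj}}$$

$$z_{ik}^{(t)} = \frac{z_{ik}^{(t)}}{\sum_i z_{ik}^{(t)}}$$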
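A minimal NumPy sketch of multiplicative KL updates of this kind is given below. It assumes X is an m x n non-negative matrix and that the rank is chosen by the user; the names `nmf_kl` and `kl_divergence`, the random initialisation and the small constant `eps` (added to avoid division by zero) are illustrative assumptions, not part of the slides.

```python
import numpy as np

def kl_divergence(X, Z, H, eps=1e-12):
    """Generalised KL divergence between X and the product ZH."""
    R = Z @ H + eps
    return np.sum(X * np.log((X + eps) / R) - X + R)

def nmf_kl(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor X ~ ZH with non-negative Z (m x rank) and H (rank x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    Z = rng.uniform(0.1, 1.0, (m, rank))
    H = rng.uniform(0.1, 1.0, (rank, n))
    history = []
    for _ in range(n_iter):
        # Update coefficients H, then basis Z, each by a multiplicative factor
        H *= (Z.T @ (X / (Z @ H + eps))) / (Z.sum(axis=0)[:, None] + eps)
        Z *= ((X / (Z @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        # Normalise every basis column to sum to one
        Z /= Z.sum(axis=0, keepdims=True) + eps
        history.append(kl_divergence(X, Z, H))
    return Z, H, history

# Synthetic non-negative data of rank 3 (an assumption for the demo).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, (20, 3)) @ rng.uniform(0.0, 1.0, (3, 30))
Z, H, history = nmf_kl(X, rank=3)
```

Because the updates only multiply by non-negative ratios, Z and H stay non-negative as long as they are initialised that way; no explicit projection step is needed.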



LNMF key issues:
◦ more localized features: robustness to occlusion?
◦ eliminates irrelevant information.

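The KL-based LNMF cost function is given by:

$$f_{\mathrm{LNMF}}(X \,\|\, BH) = f_{\mathrm{NMF}}(X \,\|\, BH) + \alpha \sum_{i,j} u_{ij} - \beta \sum_i v_{ii}$$

where $[u_{ij}] = U = B^T B$ and $[v_{ij}] = V = H H^T$.

Three additional conditions:
(1) $\min \sum_j u_{jj}$
(2) $\min \sum_{j \neq k} u_{jk}$
(3) $\max \sum_j v_{jj}$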



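New updating rules:

$$h_{kj}^{(t)} = \sqrt{ h_{kj}^{(t-1)} \sum_i b_{ik} \, \frac{x_{ij}}{\sum_{k'} b_{ik'} h_{k'j}^{(t-1)}} }$$

$$b_{ik}^{(t)} = b_{ik}^{(t-1)} \, \frac{\sum_j \dfrac{x_{ij}}{\sum_{k'} b_{ik'}^{(t-1)} h_{k'j}} \, h_{kj}}{\sum_j h_{kj}}$$

$$b_{ik}^{(t)} = \frac{b_{ik}^{(t)}}{\sum_i b_{ik}^{(t)}}$$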



The NMF and LNMF algorithms consider the database as a whole and treat all images in the same way: no information about classes is encoded in the basis images or coefficients.

By integrating class information into the factors B or H we might expect better class separability, and therefore we could enhance the classification performance.

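Suppose there are c distinct classes {Q1, …, Qc}, with $n_l$ samples in class $Q_l$:

$$\mu_l = \frac{1}{n_l} \sum_{r=1}^{n_l} h_{rl}, \qquad \mu = \frac{1}{n} \sum_{j=1}^{n} h_j$$

$$S_w = \sum_{l=1}^{c} \sum_{h_{rl} \in Q_l} (h_{rl} - \mu_l)(h_{rl} - \mu_l)^T, \qquad S_b = \sum_{l=1}^{c} (\mu_l - \mu)(\mu_l - \mu)^T$$

Two new constraints: (1) min $S_w$ (2) max $S_b$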
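As a small illustration, the two scatter matrices can be computed as sketched below. The sketch assumes the usual definitions: Sw sums the outer products of deviations from each class mean, and Sb sums the (unweighted) outer products of class-mean deviations from the global mean. The function name `scatter_matrices` and the toy coefficient matrix are hypothetical.

```python
import numpy as np

def scatter_matrices(H, labels):
    """H: (features x samples) coefficient matrix; labels: class of each column."""
    mu = H.mean(axis=1, keepdims=True)            # global mean vector
    d = H.shape[0]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Hc = H[:, labels == c]                    # columns belonging to class c
        mu_c = Hc.mean(axis=1, keepdims=True)     # class mean vector
        D = Hc - mu_c
        Sw += D @ D.T                             # within-class scatter
        Sb += (mu_c - mu) @ (mu_c - mu).T         # between-class scatter
    return Sw, Sb

# Toy example: four 2-dimensional coefficient vectors in two classes.
H_toy = np.array([[0.0, 2.0, 10.0, 12.0],
                  [1.0, 1.0, 3.0, 3.0]])
labels_toy = np.array([0, 0, 1, 1])
Sw, Sb = scatter_matrices(H_toy, labels_toy)
```

A small trace of `Sw` together with a large trace of `Sb` indicates compact, well-separated classes, which is what the two constraints aim for.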

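The new objective function now has the following form:

$$f_{\mathrm{DNMF}}(X \,\|\, BH) = f_{\mathrm{LNMF}}(X \,\|\, BH) + \gamma S_w - \delta S_b$$

with $\gamma$, $\delta$ constants. The new update equation for the coefficients is:

$$h_{kj(l)}^{(t)} = \frac{2\mu_l - 1 + \sqrt{(1 - 2\mu_l)^2 + 8 \xi \, h_{kj(l)}^{(t-1)} \sum_i b_{ik} \, \dfrac{x_{ij}}{\sum_{k'} b_{ik'} h_{k'j(l)}^{(t-1)}}}}{4 \xi}$$

$$h_{kj}^{(t)} = \left[ \, h_{kj(1)}^{(t)} \mid h_{kj(2)}^{(t)} \mid \dots \mid h_{kj(c)}^{(t)} \, \right]$$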











Data set 1: 9-dimensional Pima Indians Diabetes (PID). The ninth variable is the class label (0 or 1). Label 1 is interpreted as “tested positive for diabetes”. The database comprises 768 instances, 500 of them corresponding to label 0 and the other 268 to label 1.

Data set 2: Wisconsin Breast Cancer database. The dimension of the feature vectors is 10, where the 10th value corresponds to the class label (2 or 4). Label 2 is “benign”, label 4 is “malignant”. 699 instances: 458 of label 2 and 241 of label 4.



Automatic classification procedure:

1) Split the data into 2 parts.
2) Use the first part (usually named the training set) to extract some features from the data.
3) Employ a method to “learn” these features from the training set.
◦ Supervised learning: we make use of data labels for learning.
◦ Unsupervised learning: we don’t need data labels for learning.
4) Use the method on the second part of the data set (named the test set) to classify the data automatically (without user intervention!): the method will tell us which data pertains to which label.

Important: The method has no access to the test labels; the test data is classified solely on the learned parameters derived from the training data.
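The four steps can be sketched in a few lines of NumPy. This is a toy illustration only: the data are synthetic (not the PID or Wisconsin sets), and the "method" is a nearest-class-mean classifier chosen for brevity, not one of the classifiers discussed in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: 100 points per class in 2 dimensions (assumed).
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(4.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# 1) Split the data into a training part and a test part.
perm = rng.permutation(len(y))
train, test = perm[:140], perm[140:]

# 2-3) Supervised learning on the training set only: one mean vector per class.
centroids = {c: X[train][y[train] == c].mean(axis=0) for c in (0, 1)}

# 4) Classify the test set; the test labels are never passed to predict().
def predict(points):
    dists = np.stack([np.linalg.norm(points - centroids[c], axis=1)
                      for c in (0, 1)], axis=1)
    return dists.argmin(axis=1)

y_pred = predict(X[test])

# The test labels are used only afterwards, to score the predictions.
accuracy = np.mean(y_pred == y[test])
```

Keeping the test labels out of the learning step is what makes `accuracy` an estimate of how well the method generalizes to unseen data.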



The classification problem can be restricted to consideration of the two-class problem without loss of generality.

The goal is to separate the two classes by a function which is induced from available examples.

The goal is to produce a classifier that will work well on unseen examples, i.e. it generalizes well.



A set of training vectors belonging to two separate classes:

$$D = \{(x_1, y_1), \dots, (x_l, y_l)\}, \quad x \in \mathbb{R}^n, \; y \in \{-1, 1\}$$

Separate the set with a hyperplane:

$$\langle w, x \rangle + b = 0$$

The set of vectors is said to be optimally separated by the hyperplane if it is separated without error and the distance between the closest vector and the hyperplane is maximal.



In canonical form, the norm of the weight vector should be equal to the inverse of the distance of the nearest point in the data set to the hyperplane.




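A separating hyperplane in canonical form must satisfy the following constraints:

$$y_i \left[ \langle w, x_i \rangle + b \right] \ge 1, \quad i = 1, \dots, l$$

The distance d(w, b; x) of a point x from the hyperplane (w, b) is

$$d(w, b; x) = \frac{\left| \langle w, x \rangle + b \right|}{\| w \|}$$

The optimal hyperplane is found by maximizing the margin, subject to the constraints above. The margin is given by:

$$\rho(w, b) = \frac{2}{\| w \|}$$

Hence the hyperplane that optimally separates the data is the one that minimises

$$\Phi(w) = \frac{1}{2} \| w \|^2$$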
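The quantities involved (canonical form, point-to-hyperplane distance, margin, objective) can be checked numerically on a toy problem. The sketch below assumes the standard canonical-form definitions; the four training points and the hyperplane (w, b) are hand-picked for illustration and are not obtained by training an SVM.

```python
import numpy as np

# Toy linearly separable training set with labels in {-1, +1} (assumed).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# A hand-picked separating hyperplane <w, x> + b = 0.
w, b = np.array([1.0, 1.0]), -2.0

# Rescale (w, b) to canonical form: min_i |<w, x_i> + b| = 1.
scale = np.min(np.abs(X @ w + b))
w, b = w / scale, b / scale

# Canonical-form constraints y_i (<w, x_i> + b) >= 1 for every training point.
constraints_ok = bool(np.all(y * (X @ w + b) >= 1 - 1e-12))

# Distance of each point to the hyperplane, the margin, and the objective.
distances = np.abs(X @ w + b) / np.linalg.norm(w)
margin = 2.0 / np.linalg.norm(w)
objective = 0.5 * np.linalg.norm(w) ** 2
```

In canonical form the closest points lie at distance 1/||w|| from the hyperplane, so a smaller `objective` corresponds directly to a wider `margin`.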

Date post:	13-Jan-2016
Upload:	elijah-park