Robust Computer Vision
Homework 5:
Mean Shift for Segmentation and Tracking
Eric Wengrowski
April 5, 2016
1 Mean Shift Segmentation
In this section, we compare image segmentation using Mean Shift and Synergistic Segmentation, which incorporates both Mean Shift and a Weight Map. The 3 Mean Shift parameters are the spatial bandwidth 2hs + 1, the color bandwidth 2hr, and the minimum region size in pixels. The 3 Weight Map parameters used in Synergistic Segmentation are the gradient window size 2n + 1, the mixture parameter, and the threshold for the mixture. The Windows program EDISON.EXE is used to perform the segmentation. The code and binaries can be found here: http://coewww.rutgers.edu/riul/research/code/EDISON/.
In each of the following examples, the relevant area of the image was isolated prior to segmentation.
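Mean Shift segmentation clusters pixels in the joint spatial-range domain: each pixel's (position, gray value) feature vector is moved iteratively to the mean of its neighbors until it converges to a mode, and pixels sharing a mode form a region. A minimal Python sketch of this filtering step (a grayscale toy with a flat kernel, purely illustrative and not the EDISON implementation):

```python
import numpy as np

def mean_shift_point(points, x, hs, hr):
    """One mean-shift iteration in the joint spatial-range domain.
    points: (N, 3) array of (row, col, gray) feature vectors.
    A flat kernel keeps neighbors within spatial radius hs and range radius hr."""
    spatial = np.linalg.norm(points[:, :2] - x[:2], axis=1) <= hs
    rng = np.abs(points[:, 2] - x[2]) <= hr
    neighbors = points[spatial & rng]
    return neighbors.mean(axis=0)

def mean_shift_filter(image, hs=4, hr=20, n_iter=5):
    """Replace each pixel's gray value with its mean-shift mode estimate."""
    rows, cols = np.indices(image.shape)
    points = np.stack([rows.ravel(), cols.ravel(),
                       image.ravel().astype(float)], axis=1)
    out = points.copy()
    for i in range(len(out)):
        x = out[i]
        for _ in range(n_iter):
            x = mean_shift_point(points, x, hs, hr)
        out[i] = x
    return out[:, 2].reshape(image.shape)

# Two flat regions separated by a step edge: filtering smooths each side
img = np.zeros((8, 8), dtype=np.uint8)
img[:, 4:] = 100
img = img + np.random.default_rng(0).integers(0, 5, img.shape).astype(np.uint8)
smoothed = mean_shift_filter(img)
```

Pixels on either side of a strong edge never fall inside each other's range window, so the two regions are smoothed independently. This is why the choice of 2hs + 1 and 2hr controls how finely the image is carved into segments.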
church.jpg
The task was to segment the area around the bell in church.jpg.
Mean Shift Segmentation
Figure 1: The Mean Shift Segmented church bell. The spatial bandwidth 2hs + 1 = 10, the color bandwidth 2hr = 6.5, and the minimum region size = 20 pixels. Notice that the underside of the bell is segmented separately, as is the shadow on the upper right side, and the ring underneath near the opening of the bell.
Synergistic Segmentation
Figure 2: The Synergistic Segmented church bell. The spatial bandwidth 2hs + 1 = 9, the color bandwidth 2hr = 8.5, and the minimum region size = 15 pixels. The gradient window size 2n + 1 = 7, the mixture parameter = 0.1, and the threshold for the mixture = 0.5. Notice that the underside of the bell is still segmented separately, but the ring underneath near the opening of the bell is no longer jointly segmented with the rest of the underside. The shadow on the right face of the bell is still separately segmented, but more of the bright, specular areas on the bell's front face have been removed from the main bell body segment.
fishcoral.jpg
The task was to segment a few coral elements from the reef in fishcoral.jpg.
Mean Shift Segmentation
Figure 3: The Mean Shift Segmented coral elements. The spatial bandwidth 2hs + 1 = 7, the color bandwidth 2hr = 4, and the minimum region size = 90 pixels. Notice that the tips of each coral segment are disjointly segmented from the rest of the coral stems. By and large, the coral stems are jointly segmented with neighboring stems, especially near the base of each stem. For most coral stems, there are several segments that change with distance to the tip.
Synergistic Segmentation
Figure 4: The Synergistic Segmented coral elements. The spatial bandwidth 2hs + 1 = 9, the color bandwidth 2hr = 8.5, and the minimum region size = 50 pixels. The gradient window size 2n + 1 = 4, the mixture parameter = 0.7, and the threshold for the mixture = 0.3. Similar to the Mean Shift segmentation above, the tips of the coral are often separately segmented from the rest of the coral stem. However, two of the coral tips are jointly segmented with the coral stems in this image, but not in the Mean Shift image above. The tips of the coral are usually contained in only 1 or 2 blue segments. The two close coral tips in the mid left of the above image are better separated than in the Mean Shift image.
halfdress.jpg
The task was to segment the shore shack in halfdress.jpg.
Mean Shift Segmentation
Figure 5: The Mean Shift Segmented shore shack. The spatial bandwidth 2hs + 1 = 7, the color bandwidth 2hr = 4.5, and the minimum region size = 90 pixels. The shore shack is uniformly segmented from the sky and forest. The line segment at the top of the building is well preserved on the right, and somewhat preserved on the left.
Synergistic Segmentation
Figure 6: The Synergistic Segmented shore shack. The spatial bandwidth 2hs + 1 = 15, the color bandwidth 2hr = 1.8, and the minimum region size = 90 pixels. The gradient window size 2n + 1 = 4, the mixture parameter = 0.9, and the threshold for the mixture = 0.9. Similar to the Mean Shift Segmented image above, the shore shack is uniformly segmented from the sky and forest. Again, the line segment at the top of the building is well preserved on the right, and somewhat preserved on the left. However, the differences lie towards the base of the structure, where the Synergistic Segmentation includes pixels to the left.
violinist.jpg
The task was to segment the two small cars in violinist.jpg.
Mean Shift Segmentation
Figure 7: The Mean Shift Segmented 2 small cars. The spatial bandwidth 2hs + 1 = 8, the color bandwidth 2hr = 5.5, and the minimum region size = 40 pixels. Sections of both cars are jointly segmented. Additionally, each car body is broken into several segments. The left car's windows are broken into several segments.
Synergistic Segmentation
Figure 8: The Synergistic Segmented 2 small cars. The spatial bandwidth 2hs + 1 = 8, the color bandwidth 2hr = 3.5, and the minimum region size = 90 pixels. The gradient window size 2n + 1 = 5, the mixture parameter = 0.5, and the threshold for the mixture = 0.5. Here, both cars are clearly segmented separately, and for the most part, each car body is segmented into only 1 section. There are separate segmentations of the pixels near and underneath the driver-side mirrors. The windows are still segmented separately from the rest of the car body, and the left car's windows are still broken into several segments.
In general, there is not an enormous difference between the ordinary Mean Shift Segmentation and the Synergistic Segmentation. The biggest difference appears when there is an important local edge, such as the separation of the 2 cars in the last example. Typically, it is difficult to separate those cars with Mean Shift Segmentation without over-segmenting the car bodies.
The mixture parameter and the average threshold are closely related because the threshold is just the sum along the edge.
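As a rough illustration of that relationship (the helper names and the linear blending rule below are assumptions for illustration, not EDISON's actual code), one can think of each boundary pixel receiving a weight that mixes gradient magnitude with edge confidence, and a region boundary surviving merging only if its average blended weight clears the threshold:

```python
import numpy as np

def boundary_weight(grad_mag, confidence, mixture):
    """Blend normalized gradient magnitude and edge confidence into a
    per-pixel edge weight (hypothetical helper; EDISON differs in detail)."""
    return (1.0 - mixture) * grad_mag + mixture * confidence

def keep_boundary(weights_on_edge, threshold):
    """A boundary survives only if the average blended weight along it
    exceeds the threshold."""
    return weights_on_edge.mean() > threshold
```

Under this reading, raising the mixture parameter shifts emphasis from raw gradient strength to edge confidence, and the threshold then decides which of the resulting blended boundaries are strong enough to keep.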
2 Mean Shift Tracking
The mean shift tracking implementation in Matlab was Mean-Shift Video Tracking by Sylvain Bernhardt. The code can be found on the Matlab File Exchange here: https://www.mathworks.com/matlabcentral/fileexchange/35520-mean-shift-video-tracking
The video sequence walking44.mpeg was used as input. We were expected to track 2 people:
1. A man entering the scene from the right about 2.7 seconds into the video, and
2. Another man entering the scene from the left about 6.7 seconds into the video.
The video frames not containing either of these two men were removed.
Contents
• Prune Excess Frames
• Call Mean-Shift Tracking with Kalman Filtering
• Mean-Shift Video Tracking
• Description
• Import movie
• Play the movie
• Variables
• Target Selection in Reference Frame
• Run the Mean-Shift algorithm
• Kalman Filtering
• Export/Show the processed movie
Prune Excess Frames
v = VideoReader('walking44.mpeg');
% 1st Tracking Sequence
v.CurrentTime = 2.7;
k = 1;
clip1_file = VideoWriter('walking44_1.avi','Uncompressed AVI');
open(clip1_file)
while v.CurrentTime <= 6.0
clip1(k).cdata = readFrame(v);
writeVideo(clip1_file,clip1(k).cdata);
k = k+1;
end
close(clip1_file)
% 2nd Tracking Sequence
v.CurrentTime = 6.7;
k = 1;
clip2_file = VideoWriter('walking44_2.avi','Uncompressed AVI');
open(clip2_file)
while v.CurrentTime <= 14.0
clip2(k).cdata = readFrame(v);
writeVideo(clip2_file,clip2(k).cdata);
k = k+1;
end
close(clip2_file)
Call Mean-Shift Tracking with Kalman Filtering
MS_Tracking('walking44_1.avi');
MS_Tracking('walking44_2.avi');
Elapsed time is 0.186746 seconds.
Elapsed time is 0.390448 seconds.
Mean-Shift Video Tracking
by Sylvain Bernhardt, July 2008. Modified by Eric Wengrowski in April 2016.
Description
This is a simple example of how to use the Mean-Shift video tracking algorithm implemented in 'MeanShift Algorithm.m'. It imports the video 'Ball.avi' from the 'Videos' folder and tracks a selected feature in it. The resulting video sequence is played after tracking, but is also exported as an AVI file.
Import movie
tic
[Length,height,width,Movie]=Import_mov(fname);
toc
Play the movie
% Put the figure in the center of the screen,
% without axes and menu bar.
scrsz = get(0,'ScreenSize');
figure('Position',[scrsz(3)/2-width/2 scrsz(4)/2-height/2 width height],...
'MenuBar','none');
axis off
% Image position inside the figure
set(gca,'Units','pixels','Position',[1 1 width height])
% Play the movie
movie(Movie);
Variables
index_start = 1;
% Similarity Threshold
f_thresh = 0.16;
% Number max of iterations to converge
max_it = 5;
% Parzen window parameters
kernel_type = 'Gaussian';
radius = 1;
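In this family of trackers, the similarity measure compared against f_thresh is typically the Bhattacharyya coefficient between the target and candidate color PDFs. A small Python sketch of that measure (illustrative only; the Matlab code computes the equivalent quantity internally):

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two discrete color PDFs:
    1.0 for identical histograms, 0.0 for disjoint support."""
    return float(np.sum(np.sqrt(p * q)))

# Two normalized 4-bin color histograms
p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.4, 0.4, 0.1, 0.1])
rho = bhattacharyya(p, q)
```

A candidate window whose coefficient against the target falls below f_thresh = 0.16 is treated as a poor match, which is how the tracker detects target loss.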
Target Selection in Reference Frame
if(strcmp(fname,'walking44_1.avi'))
if(exist('clip1patch.mat','file'))
load 'clip1patch.mat';
else
[T,x0,y0,H,W] = Select_patch(Movie(index_start).cdata,0);
save('clip1patch.mat','T','x0','y0','H','W');
end
elseif(strcmp(fname,'walking44_2.avi'))
if(exist('clip2patch.mat','file'))
load 'clip2patch.mat';
else
[T,x0,y0,H,W] = Select_patch(Movie(index_start).cdata,0);
save('clip2patch.mat','T','x0','y0','H','W');
end
else
[T,x0,y0,H,W] = Select_patch(Movie(index_start).cdata,0);
end
%pause(0.2);
Run the Mean-Shift algorithm
Calculation of the Parzen Kernel window
[k,gx,gy] = Parzen_window(H,W,radius,kernel_type,0);
% Conversion from RGB to Indexed colours
% to compute the colour probability functions (PDFs)
[~,map] = rgb2ind(Movie(index_start).cdata,65536);
Lmap = length(map)+1;
T = rgb2ind(T,map);
% Estimation of the target PDF
q = Density_estim(T,Lmap,k,H,W,0);
% Flag for target loss
loss = 0;
% Similarity evolution along tracking
f = zeros(1,(Length-1)*max_it);
% Sum of iterations along tracking and index of f
f_indx = 1;
% Draw the selected target in the first frame
Movie(index_start).cdata = Draw_target(x0,y0,W,H,...
Movie(index_start).cdata,2);
%%%% TRACKING
WaitBar = waitbar(0,'Tracking in progress, be patient...');
% From 1st frame to last one
for t=1:Length-1
% Next frame
I2 = rgb2ind(Movie(t+1).cdata,map);
% Apply the Mean-Shift algorithm to move (x,y)
% to the target location in the next frame.
[x,y,loss,f,f_indx] = MeanShift_Tracking(q,I2,Lmap,...
height,width,f_thresh,max_it,x0,y0,H,W,k,gx,...
gy,f,f_indx,loss);
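The location update inside MeanShift_Tracking follows the standard mean-shift tracking rule: each pixel in the candidate window is weighted by sqrt(q/p) of its color bin, and the window center moves to the weighted mean of the pixel coordinates. A Python sketch of one such step (illustrative; the variable names are assumptions, not the Matlab code's):

```python
import numpy as np

def mean_shift_step(coords, bins, p, q, eps=1e-10):
    """One mean-shift location update for tracking.
    coords: (N, 2) pixel coordinates inside the candidate window.
    bins:   (N,) histogram bin index of each pixel.
    p, q:   candidate and target color PDFs over those bins.
    Pixels whose colors are over-represented in the target relative to
    the candidate pull the window center towards themselves."""
    w = np.sqrt(q[bins] / np.maximum(p[bins], eps))
    return (coords * w[:, None]).sum(axis=0) / w.sum()

# Four pixels in a row; the target color only appears on the right half,
# so the center shifts right.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 3.0]])
bins = np.array([0, 0, 1, 1])
q = np.array([0.0, 1.0])   # target is entirely the second color
p = np.array([0.5, 0.5])   # candidate window is half and half
new_center = mean_shift_step(coords, bins, p, q)
```

Iterating this step until the shift is small (or max_it is reached) converges the window onto the nearby image region that best matches the target histogram.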
Kalman Filtering
My implementation of a Kalman Filter is based on the functions in Matlab's Computer Vision System Toolbox. See: https://www.mathworks.com/help/vision/ref/configurekalmanfilter.html and https://www.mathworks.com/help/control/ug/kalman-filtering.html.
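For reference, the constant-velocity predict/correct cycle that configureKalmanFilter sets up can be sketched in a few lines of Python. This is an illustrative stand-in, with dt and the noise covariances chosen arbitrarily, not the Toolbox implementation:

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal 2-D constant-velocity Kalman filter.
    State: [x, y, vx, vy]; measurement: [x, y]."""
    def __init__(self, initial_xy, init_err=25.0, q=25.0, r=5.0):
        self.x = np.array([initial_xy[0], initial_xy[1], 0.0, 0.0])
        self.P = init_err * np.eye(4)        # initial estimate error
        dt = 1.0                             # one frame per step
        self.F = np.array([[1, 0, dt, 0],    # state transition
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],     # measure position only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)               # process noise covariance
        self.R = r * np.eye(2)               # measurement noise covariance

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def correct(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

# Track a point moving right at 2 px/frame
kf = ConstantVelocityKalman([10.0, 20.0])
for t in range(1, 6):
    kf.predict()
    est = kf.correct([10.0 + 2 * t, 20.0])
```

The mean-shift location plays the role of the measurement z here, so the filter smooths the raw detections and can coast through a frame or two of poor matches.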
% Kalman Filtering
detectedLocation = [x,y];
if(t==1) %first iteration
motionModel = 'ConstantVelocity';
initialLocation = detectedLocation;
sigma = 5;
initialEstimateError = sigma.^2*ones(1,2);
%The process noise covariance, Q:
%Q = diag(repmat(MotionNoise, [1, M]))
motionNoise = sigma.^2*ones(1,2);
%The measurement noise covariance, R:
%R = diag(repmat(MeasurementNoise, [1, M])).
measurementNoise = sigma;
kalmanFilter = configureKalmanFilter(motionModel,initialLocation,...
initialEstimateError,motionNoise,measurementNoise);
trackedLocation = correct(kalmanFilter, detectedLocation);
else %not first iteration
predict(kalmanFilter);
trackedLocation = correct(kalmanFilter, detectedLocation);
end
% New Estimates of [x,y] after Kalman Filtering
x = round(trackedLocation(1));
y = round(trackedLocation(2));
% Check for target loss. If true, end the tracking
if loss == 1
break;
else
% Drawing the target location in the next frame
Movie(t+1).cdata = Draw_target(x,y,W,H,Movie(t+1).cdata,2);
% Next frame becomes current frame
y0 = y;
x0 = x;
% Updating the waitbar
waitbar(t/(Length-1));
end
end
close(WaitBar);
%%%% End of TRACKING
Export/Show the processed movie
Export the video sequence as an AVI file in the Videos folder
WaitBar = waitbar(0,'Exporting the output AVI file, be patient...');
tracked_move_name = [fname '.TrackedOutput.avi'];
trackedv = VideoWriter(tracked_move_name,'Uncompressed AVI');
open(trackedv)
%movie2avi(Movie,'Videos\Movie_out','Quality',50);
writeVideo(trackedv,Movie);
close(trackedv)
waitbar(1);
close(WaitBar);
% Put a figure in the center of the screen,
% without menu bar and axes.
scrsz = get(0,'ScreenSize');
figure(1)
set(1,'Name','Movie Player','Position',...
[scrsz(3)/2-width/2 scrsz(4)/2-height/2 width height],...
'MenuBar','none');
axis off
% Image position inside the figure
set(gca,'Units','pixels','Position',[1 1 width height])
% Play the movie
movie(Movie);
Figure 9: Person 1 at the beginning of his tracking sequence.
Figure 10: Person 1 at the end of his tracking sequence.
Figure 11: Person 2 at the beginning of his tracking sequence.
Figure 12: Person 2 at the end of his tracking sequence.
The first person was difficult to track because his background changed greatly throughout his trajectory. The background is similar to the foreground when he steps in front of the lady in black. Because both people have similar colors, the tracking is most vulnerable to error here. For simple cases, background subtraction might help, especially when initially selecting a target window, because it is easy to track the background rather than the intended subject. But background subtraction will probably not help much in the case where paths are crossed with a similarly colored object.
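The simplest form of background subtraction mentioned above can be sketched as follows (Python, illustrative only; a real system would maintain a running background model rather than a single static frame):

```python
import numpy as np

def foreground_mask(frame, background, thresh=25):
    """Simple background subtraction: pixels differing from a static
    background model by more than `thresh` gray levels are foreground."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

# A flat background with a bright "person" entering the scene
bg = np.full((6, 6), 50, dtype=np.uint8)
frame = bg.copy()
frame[2:4, 2:4] = 200
mask = foreground_mask(frame, bg)
```

Restricting the initial target window to the foreground mask would make it harder to accidentally lock onto background pixels, though, as noted, this does little when two similarly colored subjects cross paths.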
The second person was difficult to track because he moves towards the camera, so the number of pixels he occupies grows larger. In this implementation, the target window does not change with the relative distance of the intended target. The effects of this can be seen towards the end of the tracking sequence, where the subject's arm and the background are the focus of the target window.
Generally speaking, the first person was more difficult to track for the majority of his path, but the second person was harder to track towards the end of his path.