
Detecting Nonexistent Pedestrians

Jui-Ting Chien, Chia-Jung Chou, Hwann-Tzong Chen
National Tsing Hua University, Taiwan

{ydnaandy123,jessie33321}@gmail.com, [email protected]

Abstract

We explore beyond object detection and semantic segmentation, and propose to address the problem of estimating the presence probabilities of nonexistent pedestrians in a street scene. Our method builds upon a combination of generative and discriminative procedures to achieve the perceptual capability of figuring out missing visual information. We adopt state-of-the-art inpainting techniques to generate the training data for nonexistent-pedestrian detection. The learned detector can predict the probability of observing a pedestrian at some location in the current image, even if that location exhibits only the background. We evaluate our method by inserting pedestrians into the image according to the presence probabilities and conducting a user study to distinguish real and synthetic images. The empirical results show that our method can capture the idea of where the reasonable places are for pedestrians to walk or stand in a street scene.

1. Introduction

Humans are good at inferring implicit information from images based on the context. For instance, with sufficient familiarity with urban street scenes, humans know where to look to find pedestrians. This implies that peripheral cues in the scene, other than the pedestrians themselves, are useful for pedestrian detection. The ability to explore and exploit those cues would be important for algorithms to achieve human-level scene understanding and behavior analysis.

Our goal is to address the problem of predicting where pedestrians are likely to appear in a scene. Such a task is new and different from the conventional setting of pedestrian detection in that the target locations might not contain any pedestrians. Real and synthetic datasets of urban scenes [2, 3, 8, 9] are popular and useful for training deep models for object detection and semantic segmentation, but they cannot be directly used for our task. We therefore generate new training data that force our method to learn how to infer possible presence locations of pedestrians from implicit contextual cues rather than from features on the pedestrians themselves.

Figure 1. Top: The input image and the predicted heatmap. The likelihoods of head and foot positions are depicted in red and blue. Bottom: The synthesized image with phantom pedestrians.

2. Method

The proposed pipeline has three parts: i) generating training data for nonexistent-pedestrian detection, ii) learning to predict possible locations of pedestrians, and iii) synthesizing new images with phantom pedestrians for qualitative evaluation. See Fig. 1 for an illustration of the pipeline.

Training data. From [2] we select a wide range of urban-scene images that contain at least one pedestrian. Each image has a ground-truth segmentation of pedestrians. We use state-of-the-art inpainting methods [6, 11] to remove all pedestrians and create a set of ‘background’ images. Based on the synthetic backgrounds, the original images, and the ground-truth segmentations, we generate various training data exhibiting different combinations of removed/non-removed pedestrians and the corresponding heatmaps of possible pedestrian presence, as in Fig. 2.
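As a concrete illustration, a minimal Python sketch of this data-generation step is given below. It is our reconstruction, not the authors' code: classical OpenCV inpainting stands in for the neural inpainting methods [6, 11], head and foot keypoints are approximated from the top and bottom of each pedestrian mask, and only two of the three heatmaps used by the method are rendered.

# Minimal sketch of generating one training pair (inpainted background plus
# heatmap targets) from an image and its pedestrian masks. All parameter
# choices (sigma, dilation size, inpainting radius) are illustrative.
import cv2
import numpy as np

def gaussian_heatmap(shape, center, sigma):
    # Render a 2D Gaussian bump centered at (x, y) into a map of the given shape.
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def make_training_pair(image, pedestrian_masks, sigma=8.0):
    # image: HxWx3 uint8; pedestrian_masks: list of HxW boolean masks.
    # Returns (inpainted background, head heatmap, foot heatmap).
    h, w = image.shape[:2]
    union = np.zeros((h, w), np.uint8)
    head_map = np.zeros((h, w), np.float32)
    foot_map = np.zeros((h, w), np.float32)
    for m in pedestrian_masks:
        union |= m.astype(np.uint8)
        ys, xs = np.nonzero(m)
        # Crude keypoints: top-center of the mask as head, bottom-center as foot.
        head = (int(xs[ys == ys.min()].mean()), int(ys.min()))
        foot = (int(xs[ys == ys.max()].mean()), int(ys.max()))
        head_map = np.maximum(head_map, gaussian_heatmap((h, w), head, sigma))
        foot_map = np.maximum(foot_map, gaussian_heatmap((h, w), foot, sigma))
    # Dilate the mask slightly so the inpainting also covers shadows and borders.
    union = cv2.dilate(union, np.ones((7, 7), np.uint8))
    background = cv2.inpaint(image, union * 255, 5, cv2.INPAINT_TELEA)
    return background, head_map, foot_map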

Learning. We adopt the methods of [5, 10] for learning to detect nonexistent pedestrians. We also propose a new model that combines an FCN [10] with a discriminator. The idea of including a discriminator is inspired by generative adversarial networks [1, 4, 7]. Building on [10], we seek to generate a distribution of pedestrian locations that is not only coherent with the input context but also realistic enough to deceive the discriminator.
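The following PyTorch sketch shows, schematically, how a pixel-wise heatmap loss can be combined with an adversarial term from a discriminator that judges image/heatmap pairs. The layer configuration, loss functions, and weighting are our assumptions for illustration; the paper does not specify these details.

# Schematic sketch of the FCN-plus-discriminator idea (our assumption).
# The heatmap predictor plays the generator role; the discriminator judges
# (image, heatmap) pairs, as in conditional adversarial training [1, 4, 7].
import torch
import torch.nn as nn

class HeatmapFCN(nn.Module):          # small stand-in for the FCN of [10]
    def __init__(self, n_maps=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_maps, 1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):       # judges image + heatmap pairs
    def __init__(self, n_maps=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + n_maps, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=2, padding=1))
    def forward(self, img, heat):
        return self.net(torch.cat([img, heat], dim=1))

def generator_loss(fcn, disc, img, target_heat, lam=0.01):
    # Pixel-wise heatmap loss plus an adversarial term that rewards predicted
    # heatmaps the discriminator cannot tell apart from ground-truth ones.
    pred = fcn(img)
    recon = nn.functional.binary_cross_entropy(pred, target_heat)
    logits = disc(img, pred)
    adv = nn.functional.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))
    return recon + lam * adv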


Figure 2. Generating training data. Left: The original image. Middle two: Some pedestrians are removed from the original image and inpainted with background. Right: The corresponding heatmap over the inpainted image as the ground truth for training.

Figure 3. Experimental results: The predicted locations of nonexistent pedestrians, visualized as heatmaps.

Figure 4. More examples of the synthesis pipeline. We adjust the image brightness for better visualization. From top to bottom: input images, predicted heatmaps, and synthesized images with phantom pedestrians according to the predicted heatmaps.

Synthesis. Our method automatically synthesizes a new test image according to the predicted nonexistent-pedestrian heatmaps. More specifically, for each test image, we obtain three heatmaps modeling the head and feet locations of pedestrians. We perform non-maximum suppression on the heatmaps and propose several candidate pedestrian poses. We search for the most likely pedestrian poses in the training dataset, and render the output image by copy-and-paste. More examples of the synthesis pipeline are shown in Fig. 4.
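A minimal sketch of the non-maximum-suppression step is shown below, assuming a standard peak-picking scheme on a predicted heatmap; the window size and threshold are illustrative values, not taken from the paper.

# Sketch of heatmap non-maximum suppression: keep local maxima above a
# threshold and return them sorted by score.
import numpy as np
from scipy.ndimage import maximum_filter

def heatmap_peaks(heatmap, window=15, threshold=0.3):
    # Return (x, y, score) tuples for local maxima above `threshold`.
    local_max = maximum_filter(heatmap, size=window) == heatmap
    ys, xs = np.nonzero(local_max & (heatmap > threshold))
    peaks = [(int(x), int(y), float(heatmap[y, x])) for x, y in zip(xs, ys)]
    return sorted(peaks, key=lambda p: -p[2])

# Candidate poses could then pair each foot peak with a head peak above it,
# giving an approximate position and scale for a pedestrian to paste in.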

3. Experimental Results and Conclusion

Given unseen images, the trained model performs well at mimicking human perceptual ability. In Fig. 3, we highlight some logical patterns that can be observed in the generated images: i) Sidewalks, safety islands, and bus stops are often assigned high probabilities of pedestrian presence, even if the scene is void of pedestrians. ii) The timing is right: the ‘phantom’ pedestrians are inclined to cross the street when there is no car. iii) People tend to form groups. iv) Depth and perspective are correct: the ‘sizes’ of high-response areas in the heatmap are in accordance with the depth and vanishing point.

To verify the quality of the synthesized images, we conduct a user study and ask viewers to distinguish real and synthesized images. Our method significantly outperforms the baseline, which randomly selects a location and a scale at which to add a pedestrian into the scene.

This work shows the possibility of using deep networks to infer nonexistent objects from the implicit cues in a scene, a relatively unexplored area that has no standard mechanisms for quantitative evaluation. The proposed idea suggests an alternative direction toward scene understanding and may be further applied to other tasks.


References

[1] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. CoRR, abs/1701.07875, 2017.

[2] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes dataset for semantic urban scene understanding. In CVPR, pages 3213–3223, 2016.

[3] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun. Vision meets robotics: The KITTI dataset. I. J. Robotics Res., 32(11):1231–1237, 2013.

[4] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of Wasserstein GANs. CoRR, abs/1704.00028, 2017.

[5] A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. In ECCV, pages 483–499, 2016.

[6] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In CVPR, pages 2536–2544, 2016.

[7] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. ArXiv e-prints, Nov. 2015.

[8] S. R. Richter, V. Vineet, S. Roth, and V. Koltun. Playing for data: Ground truth from computer games. In ECCV, pages 102–118, 2016.

[9] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In CVPR, pages 3234–3243, 2016.

[10] E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 39(4):640–651, 2017.

[11] C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, and H. Li. High-resolution image inpainting using multi-scale neural patch synthesis. ArXiv e-prints, Nov. 2016.

