Research

Topics
Long-Term Prediction
Partially Occluded Object Detection
Object Recognition with 3D Models
Expression Recognition
Face Recognition
Face Detection
Face Morphing
Object Recognition in Range Data
Detection of Pedestrians
Segmentation of Range and Intensity Image Sequences
Tracking Non-rigid, Moving Objects

Data and Downloads
Object Recognition with 3D Models
Range Images from Laser Range Camera (19 MB)
MPEG Pedestrian Detection
MPEG Pedestrian Tracking


Intent-Aware Long-Term Prediction of Pedestrian Motion

Vasiliy Karasev, Alper Ayvaci, Bernd Heisele, Stefano Soatto
University of California, Los Angeles; Honda Research Institute, USA.

Abstract
We present a method to predict the long-term motion of pedestrians, modeling their behavior as a jump-Markov process in which the goal is a hidden variable. Assuming approximately rational behavior, and incorporating environmental constraints and biases, including time-varying ones imposed by traffic lights, we model intent as a policy in a Markov decision process framework. We infer the pedestrian's state using a Rao-Blackwellized filter, and the intent by planning according to a stochastic policy, which reflects individual preferences in reaching the same goal.
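The goal-inference idea can be illustrated on a toy problem. The sketch below is a minimal, hypothetical stand-in for the paper's model: a walker on a 1-D grid, two candidate goals, a softmax ("approximately rational") policy, and a Bayesian filter over the hidden goal. The grid size, goal set, and temperature are all invented for illustration.

```python
import math

# Toy walkway: positions 0..9, two hypothetical goal positions.
GRID = list(range(10))
GOALS = [0, 9]
ACTIONS = [-1, 0, +1]        # step left, stay, step right
BETA = 2.0                   # softmax "rationality" temperature (illustrative)

def value(pos, goal):
    # Negative distance to the goal: an approximately rational walker
    # prefers actions that reduce this distance.
    return -abs(pos - goal)

def policy(pos, goal):
    # Stochastic (softmax) policy over actions, given a goal.
    scores = [math.exp(BETA * value(min(max(pos + a, 0), 9), goal))
              for a in ACTIONS]
    z = sum(scores)
    return [s / z for s in scores]

def infer_goal(trajectory):
    # Bayesian filtering of the hidden goal from observed steps.
    belief = [1.0 / len(GOALS)] * len(GOALS)
    for pos, nxt in zip(trajectory, trajectory[1:]):
        a = ACTIONS.index(nxt - pos)
        lik = [policy(pos, g)[a] for g in GOALS]
        belief = [b * l for b, l in zip(belief, lik)]
        z = sum(belief)
        belief = [b / z for b in belief]
    return belief

belief = infer_goal([4, 5, 6, 7])   # a walker drifting right
```

After observing a few rightward steps, the posterior mass shifts to the right-hand goal; the paper's full model replaces this grid with a map of the environment, time-varying constraints, and a Rao-Blackwellized filter over the continuous state.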

Video.


V. Karasev, A. Ayvaci, B. Heisele, S. Soatto. Intent-Aware Long-Term Prediction of Pedestrian Motion. International Conference on Robotics and Automation (ICRA), 2016.


Sample long-term predictions of traffic participants’ motion generated by our model. Warmer colors indicate more probable paths. Notice that the predictions are multi-modal and obey constraints of the environment.





Partially Occluded Object Detection by Finding the Visible Features and Parts

Kai Chi Chan, Alper Ayvaci and Bernd Heisele
Purdue Univ., Honda Research Institute, USA.

Abstract
We address the partially occluded object detection problem with a model that adds latent visibility flags to the cells and parts of a Deformable Part Model. A visibility flag indicates whether an image portion belongs to a pedestrian or to an occluder. To compute the visibility flags and the detector score simultaneously, we maximize a concave objective function composed of four parts: (1) the detection scores of visible cells and parts, (2) a cell-to-cell consistency term which encourages neighboring cells to take the same visibility flags, (3) a cell-to-part consistency term which encourages compatible labeling of overlapping cells and parts, and (4) a penalty term for cells and parts labeled as occluded. The maximization is carried out with the Alternating Direction Method of Multipliers (ADMM). By removing the scores of occluded cells and parts from the final detection score, we significantly improve detection performance on partially occluded pedestrians. In experiments, our system outperforms the standard DPM and other state-of-the-art methods.
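The core effect of the visibility flags can be sketched in a few lines. The toy below is not the paper's ADMM optimization: the cell scores and occlusion penalty are invented, the flags are set per cell by simple thresholding, and the consistency terms are omitted. It only shows why dropping occluded cells helps the final score.

```python
# Hypothetical filter responses for the cells of one detection window;
# the two strongly negative cells play the role of an occluder.
CELL_SCORES = [1.2, 0.8, -2.5, -3.1, 0.5]
OCCLUSION_PENALTY = 0.3   # cost of labeling a cell occluded (illustrative)

def visibility_flags(scores, penalty):
    # Per-cell decision: keep a cell visible when its score beats the
    # penalty for declaring it occluded (no consistency terms here).
    return [1 if s > -penalty else 0 for s in scores]

def detection_score(scores, penalty):
    # Sum visible-cell scores; pay the penalty for each occluded cell.
    flags = visibility_flags(scores, penalty)
    return sum(s if v else -penalty for s, v in zip(scores, flags))

full_score = sum(CELL_SCORES)                         # plain DPM-style sum
occl_score = detection_score(CELL_SCORES, OCCLUSION_PENALTY)
```

The occlusion-aware score stays positive while the plain sum is dragged below zero by the occluder, which is the intuition behind the improved performance on partially occluded pedestrians.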


K. C. Chan, A. Ayvaci and B. Heisele. Partially Occluded Object Detection by Finding the Visible Features and Parts. International Conference on Image Processing (ICIP), 2015. Best Paper Award.



Consistency graph: root cells are represented by the squares on the image, and parts are drawn above. Orange edges indicate cell-to-cell consistency; yellow edges indicate cell-to-part consistency.


Visibility map estimates: the input image; the initialization passed to ADMM, obtained by thresholding the cell-level and part-level detector responses at 0 (red and green mark variables with values 0 and 1, respectively); the binarized visibility estimate after the first iteration; and the solution at convergence (third iteration).
Object Recognition with 3D Models

B. Heisele, G. Kim, and A. Meyer
Honda Research Institute, USA.

Abstract

We propose techniques for designing and training pose-invariant object recognition systems using realistic 3D computer graphics models. We examine the relation between the size of the training set and the classification accuracy for a basic recognition task and provide a method for estimating how difficult an object is to detect. We show how to sample, align, and cluster images of objects on the view sphere. We address the problem of training on large, highly redundant data and propose a novel active learning method that generates compact training sets and compact classifiers.
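Sampling rendering poses on the view sphere can be done in several ways; one common choice is a Fibonacci spiral, which places points roughly uniformly. The sketch below uses that scheme for illustration only; the paper's actual sampling, alignment, and clustering procedure may differ.

```python
import math

def view_sphere(n):
    # Roughly uniform viewpoints on the unit sphere via the golden-angle
    # (Fibonacci) spiral: uniform steps in z, golden-angle steps in azimuth.
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # uniform in z on (-1, 1)
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the z-slice
        theta = golden * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

views = view_sphere(100)   # 100 camera directions for rendering
```

Each tuple is a unit direction from which a synthetic image of the 3D model could be rendered; clustering nearby views is then a matter of grouping these directions.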
 
Top row: 3D computer graphics models used for training and photographs of the real objects used for testing. Middle row: synthetic images with uniform background. Bottom row: synthetic images with natural background.

B. Heisele, G. Kim, A. Meyer. Object Recognition with 3D Models.
British Machine Vision Conference, 2009.




Recognition performance on real objects. The system has been exclusively trained on synthetic images.




Expression Recognition

J. Skelley, R. Fischer, A. Sarma, and B. Heisele
Center for Biological and Computational Learning, M.I.T., Honda Research Institute, USA.



Example images from the new database. 
Abstract

We describe a new expression database containing video sequences of both played and natural expressions, and an expression classification system based on warped optical flow fields and texture features. We analyze the system's generalization performance when confronted with subjects not present in the training set, and its recognition performance when tested on natural expressions. We evaluate several techniques for combining classifier outputs computed on single images into a classification of a temporal sequence of expression images.
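Two of the simplest ways to combine per-frame outputs into a sequence-level decision are score averaging and majority voting. The sketch below uses made-up class names and scores; it illustrates the kind of combination rules evaluated, not the paper's exact classifiers.

```python
from collections import Counter

# Hypothetical per-frame class scores from a single-image classifier.
FRAME_SCORES = [
    {"smile": 0.7, "neutral": 0.3},
    {"smile": 0.4, "neutral": 0.6},
    {"smile": 0.8, "neutral": 0.2},
]

def combine_by_mean(frames):
    # Sum (equivalently, average) the scores over the sequence,
    # then pick the best-scoring class.
    totals = Counter()
    for f in frames:
        totals.update(f)
    return max(totals, key=totals.get)

def combine_by_vote(frames):
    # Take a per-frame decision first, then a majority vote.
    votes = Counter(max(f, key=f.get) for f in frames)
    return votes.most_common(1)[0][0]

mean_label = combine_by_mean(FRAME_SCORES)
vote_label = combine_by_vote(FRAME_SCORES)
```

The two rules can disagree when a few frames are confidently wrong, which is one reason to compare several combination techniques on the same sequences.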

J. Skelley, R. Fischer, A. Sarma, and B. Heisele. Recognizing Expressions in a New Database Containing Played and Natural Expressions.
International Conference on Pattern Recognition, Hong Kong, 2006.



System architecture.


Face Recognition

B. Heisele, J. Huang, V. Blanz
Honda R&D Americas Inc.,
Center for Biological and Computational Learning, M.I.T., 
Computer Graphics Research Group, University of Freiburg


Left: Original image used for computing the 3D model. Right: Synthetic image.
Abstract

We present a novel approach to pose- and illumination-invariant face recognition that combines two recent advances in the computer vision field: component-based recognition and 3D morphable models. In the first step, a 3D morphable model is used to generate 3D face models from only two input images of each person in the training database. By rendering the 3D models under varying pose and illumination conditions, we then create a vast number of synthetic face images, which are used to train a component-based face recognition system. In preliminary experiments we show the potential of our approach with regard to pose and illumination invariance.

Huang, J., V. Blanz and B. Heisele. Face Recognition Using Component-Based SVM Classification and Morphable Models. In: Proceedings of Pattern Recognition with Support Vector Machines, First International Workshop, SVM 2002, Niagara Falls, Canada, Lecture Notes in Computer Science, Springer 2388, 334-341, 2002.



Components used for recognition for a frontal and half-profile view of a face.


Face Detection

B. Heisele, T. Poggio, M. Pontil
Center for Biological and Computational Learning, M.I.T.
 


Abstract

We present a trainable system for detecting frontal and near-frontal views of faces in still gray images using Support Vector Machines (SVMs). We first consider the problem of detecting the whole face pattern with a single SVM classifier. In this context we compare different types of image features, present and evaluate a new method for reducing the number of features, and discuss practical issues concerning the parameterization of SVMs and the selection of training data. The second part of the paper describes a component-based method for face detection consisting of a two-level hierarchy of SVM classifiers. On the first level, component classifiers independently detect components of a face, such as the eyes, the nose, and the mouth. On the second level, a single classifier checks whether the geometrical configuration of the detected components in the image matches a geometrical model of a face.
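The two-level structure can be sketched with the second level reduced to its essence: take the positions reported by the component classifiers and test them against a geometric face model. The component positions, reference model, and tolerance below are all invented; the paper's second level is a trained SVM, not this distance rule.

```python
# (x, y) positions reported by hypothetical first-level component classifiers.
components = {"left_eye": (30, 40), "right_eye": (70, 41), "mouth": (50, 90)}

# A hypothetical geometrical face model and a pixel tolerance.
REFERENCE = {"left_eye": (30, 40), "right_eye": (70, 40), "mouth": (50, 88)}
TOLERANCE = 8.0   # max average deviation from the model, in pixels

def configuration_ok(detected, reference, tol):
    # Second-level stand-in: average Euclidean deviation of the detected
    # component positions from the geometrical face model.
    devs = []
    for name, (x, y) in detected.items():
        rx, ry = reference[name]
        devs.append(((x - rx) ** 2 + (y - ry) ** 2) ** 0.5)
    return sum(devs) / len(devs) <= tol

is_face = configuration_ok(components, REFERENCE, TOLERANCE)
```

A detection with one component far out of place (say, an eye found in the background) fails the configuration check even when each component classifier fired confidently, which is exactly what the second level is for.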

Heisele, B., T. Serre, M. Pontil, T. Vetter and T. Poggio. Categorization by Learning and Combining Object Parts. In: Advances in Neural Information Processing Systems 14, Vancouver, Canada, Vol. 2, 1239-1245, 2002.
 

Component-based face detection with four component classifiers and a single geometrical configuration classifier.


 
Face Morphing

B. Heisele, R. Su
Center for Biological and Computational Learning, M.I.T.
 

Abstract

As part of the CBCL face detection project, we used face morphing algorithms to build a face database for training and testing our detection system. One way to perform face morphing is to manually select pairs of corresponding features that define the mapping between the two images. The Beier-Neely (BN) algorithm requires the user to select pairs of corresponding straight line segments. In addition to the BN algorithm, we used a morphing technique based on optical flow. This technique requires no manual interaction; instead, the mapping between the images is determined automatically by estimating the optical flow.
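Once a flow field maps one image onto the other, morphing reduces to warping both images part-way along the flow and cross-dissolving. The sketch below shows this on a 1-D "image" with a flow field that is given rather than estimated; real morphing works on 2-D images with flow computed from the image pair.

```python
def morph(img_a, img_b, flow, t):
    # flow[i] maps pixel i of img_a to pixel i + flow[i] of img_b.
    # At blend parameter t in [0, 1]: sample A warped forward by t*flow,
    # sample B warped back by (1-t)*flow, then cross-dissolve.
    n = len(img_a)
    out = []
    for i in range(n):
        ia = min(max(int(round(i - t * flow[i])), 0), n - 1)
        ib = min(max(int(round(i + (1 - t) * flow[i])), 0), n - 1)
        out.append((1 - t) * img_a[ia] + t * img_b[ib])
    return out

a = [0, 0, 10, 0, 0]
b = [0, 0, 0, 10, 0]
flow = [0, 0, 1, 0, 0]        # the bright pixel moves one step right
half = morph(a, b, flow, 0.5) # intermediate image
```

At t = 0 the morph reproduces the first image and at t = 1 the second; intermediate t values move the bright pixel along the flow instead of fading it out and in, which is the advantage of warping over a plain cross-dissolve.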


Morphing: left and middle: original images; right: morphed image.

 
 
Object Recognition in Range Data

B. Heisele
DaimlerChrysler Research, Germany

Abstract

I developed a model-based algorithm for real-time recognition of objects in dense range data. In an off-line modeling process, templates are generated from a 3D model with a virtual range sensor. Two types of templates are generated: edge templates representing the silhouette of the object, and range templates describing the 3D structure of the object's surface. The recognition process consists of two steps: first, object hypotheses are generated by fast, hierarchical edge-based matching; then, range-based matching verifies the hypotheses.
The method works for arbitrarily shaped 3D objects. Both the template generation and the recognition process run automatically. Since no high-level feature extraction is involved, the algorithm is insensitive to measurement noise and partial occlusions of the object. The recognition system was successfully tested on range images taken by a laser range camera.
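The hypothesize-then-verify structure can be sketched on toy 1-D "scans": a cheap silhouette comparison proposes candidate positions, and a more expensive depth comparison verifies them. The templates, scan values, and tolerance below are invented for illustration and stand in for the 2-D edge and range templates of the actual system.

```python
EDGE_TEMPLATE = [0, 1, 1, 0]            # object silhouette mask
RANGE_TEMPLATE = [0.0, 2.0, 2.1, 0.0]   # object surface depths

def edge_match(scan_mask, template):
    # Step 1: generate hypotheses wherever the silhouette fits.
    w = len(template)
    return [i for i in range(len(scan_mask) - w + 1)
            if scan_mask[i:i + w] == template]

def range_verify(scan_depth, template, pos, tol=0.2):
    # Step 2: verify a hypothesis by mean absolute depth error.
    w = len(template)
    err = sum(abs(a - b) for a, b in zip(scan_depth[pos:pos + w], template))
    return err / w <= tol

scan_mask  = [0, 0, 1, 1, 0, 1, 1, 0]
scan_depth = [0.0, 0.0, 2.0, 2.1, 0.0, 5.0, 5.0, 0.0]

hypotheses = edge_match(scan_mask, EDGE_TEMPLATE)
detections = [p for p in hypotheses if range_verify(scan_depth, RANGE_TEMPLATE, p)]
```

The second silhouette match has the right outline but the wrong depth profile, so it is generated as a hypothesis and then rejected by the range verification, mirroring the two-step recognition process.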
 


Object recognition: left: original range image; middle: recognition result for large cup; right: recognition result for box.
     


 
Detection of Pedestrians

B. Heisele, C. Woehler
DaimlerChrysler Research, Germany
 
Abstract

This algorithm recognizes walking pedestrians in sequences of color images taken from a moving camera. Recognition is based on the characteristic motion of the legs of a pedestrian walking parallel to the image plane. Each image is segmented into region-like parts by clustering pixels in a combined color/position feature space. The proposed clustering technique inherently matches corresponding clusters in consecutive frames and therefore allows clusters to be tracked over a sequence of images. Based on the observation of clusters over time, a two-stage classifier extracts the clusters that most likely represent the legs of pedestrians: a fast polynomial classifier performs a rough preselection by evaluating temporal changes of a shape-dependent cluster feature, and the final classification is done by a time-delay neural network with spatio-temporal receptive fields.
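Clustering in a combined color/position feature space can be sketched with plain k-means on a tiny grayscale "image". The position weight, initialization, and image below are illustrative choices, not the divisive clustering of the actual system; the point is only that each pixel becomes a joint (color, weighted position) feature vector.

```python
POS_WEIGHT = 0.5   # balances spatial coherence against color similarity

def features(image):
    # Feature vector per pixel: (gray value, weighted x, weighted y).
    return [(image[y][x], POS_WEIGHT * x, POS_WEIGHT * y)
            for y in range(len(image)) for x in range(len(image[0]))]

def kmeans(points, init, iters=10):
    # Plain k-means with a fixed, deterministic initialization.
    centroids = list(init)
    k = len(centroids)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            groups[j].append(p)
        centroids = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g
                     else centroids[j] for j, g in enumerate(groups)]
    return centroids

image = [[0, 0, 9, 9],
         [0, 0, 9, 9]]
pts = features(image)
centroids = kmeans(pts, [pts[0], pts[-1]])
```

The two centroids settle on the dark and bright halves of the image; with the position terms included, clusters stay spatially compact, which is what makes them usable as region-like image parts.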

B. Heisele and C. Woehler. Motion-Based Recognition of Pedestrians. International Conference on Pattern Recognition (ICPR), Brisbane, 1998.

Detection Results

Gait Analysis


 
Segmentation of Range and Intensity Image Sequences
B. Heisele and W. Ritter
DaimlerChrysler Research, Germany

Abstract

Using a technique similar to the one described below, we applied divisive and k-means clustering to the segmentation and tracking of objects in pairs of range and intensity images. The image sequences were taken by a laser range camera.

The original range and intensity image sequences used in our experiments can be downloaded. If you use the data, please acknowledge DaimlerChrysler Aerospace in your publications: Download Range and Intensity Image Sequences (19 MB)

B. Heisele, W. Ritter. Segmentation of Range and Intensity Image Sequences. Proc. IEEE Conf. on Information Intelligence and Systems, Washington, 1999, 223-225.


Demo: intensity images, range images, tracking


 
Tracking Non-rigid, Moving Objects
B. Heisele, U. Kressel, and W. Ritter
DaimlerChrysler Research, Germany

Demo Sequence

Abstract

We developed a method for tracking non-rigid, moving objects in a sequence of color images recorded by a non-stationary camera. In an initial step, object parts are determined by a divisive clustering algorithm applied to all pixels in the first image of the sequence. The feature space is defined by the color and position of each pixel. For each new image, the clusters of the previous image are adapted iteratively by a parallel k-means clustering algorithm. Instead of tracking single points, edges, or areas over a sequence of images, only the centroids of the clusters are tracked. The proposed method considerably simplifies the correspondence problem and ensures robust tracking behavior.
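The key trick — seeding each frame's clustering with the previous frame's centroids so that cluster j in frame t corresponds to cluster j in frame t+1 — can be sketched in one dimension. The 1-D "pixels" (position only) and two drifting blobs below are invented for illustration; the real method clusters color plus 2-D position.

```python
def kmeans_step(points, centroids, iters=10):
    # k-means on one frame, warm-started from the previous centroids.
    k = len(centroids)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            groups[j].append(p)
        centroids = [sum(g) / len(g) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids

# Two moving blobs: one drifting right, one drifting left.
frames = [
    [1.0, 2.0, 9.0, 10.0],
    [2.0, 3.0, 8.0, 9.0],
    [3.0, 4.0, 7.0, 8.0],
]

trajectory = []
centroids = [1.5, 9.5]          # initialization from the first frame
for frame in frames:
    centroids = kmeans_step(frame, centroids)
    trajectory.append(centroids)
```

Because cluster indices carry over between frames, the centroid trajectories come out already matched — no separate correspondence step is needed, which is the simplification the abstract refers to.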

B. Heisele, U. Kressel, and W. Ritter. Tracking Non-rigid, Moving Objects Based on Color Cluster Flow. CVPR, San Juan, 1997, 253-257.


 
 

Original Image

Clustered Image

Trajectories of Pedestrians