Lecturer | Giorgio Panin, Ph.D. |
Module | IN3150 |
Type | Lecture |
Language | English |
Semester | WS 2009/2010 |
ECTS | 3.0 |
SWS | 2V |
Audience | Elective for Informatics students (Master, Diploma) |
Time & Place | Tue 10:00 - 12:00, MI 03.07.023 |
Credit | Awarded upon passing the oral examination |
News
Exam: Thursday, 04.03.10, starting from 14:00, in Seminarraum 03.07.023.
Registration: through TUM-Online.
Open and currently running Theses
Thesis proposals can be found at the Vision section of our student projects webpage.
For information about our research group, see also the ITrackU webpage, and the OpenTL library.
Course description
The course aims to provide a structured overview of model-based object tracking: estimating and following in real time the spatial pose (rotation, translation, etc.) of one or more objects, using digital cameras and fast computer vision techniques.
The first part of the course will introduce the general tools for object tracking:
1. Pose and deformation models, and camera projection
2. Methods for pose estimation from geometric feature correspondences
3. Bayesian tracking concepts (state dynamics, measurement likelihood)
4. Bayesian filters for linear and nonlinear models, with single or multi-hypothesis state distributions
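To make the filtering idea in item 4 concrete, here is a minimal sketch (in Python; the variable names and noise values are illustrative assumptions, not taken from the course material) of a one-dimensional Kalman filter with random-walk state dynamics:

```python
# Minimal 1-D Kalman filter sketch: random-walk dynamics x_k = x_{k-1} + w,
# noisy observation z_k = x_k + v. Illustrative only; all names are assumptions.

def kalman_1d(measurements, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Track a scalar state; q = process noise, r = measurement noise."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state unchanged under random walk, uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Noisy readings of a constant true value (5.0) pull the estimate toward it.
est = kalman_1d([5.2, 4.9, 5.1, 5.05, 4.95])
```

The same predict/update structure carries over to the multivariate, nonlinear (EKF), and multi-hypothesis (particle filter) cases treated in the lectures.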
Afterwards, we will concentrate on the visual side. Among the many available modalities, we will focus in particular on the following:
1. Color-based: Matching color statistics, from the visible object surface to the underlying image area.
2. Keypoint- and Motion-based: Detection and tracking of single point features, possibly making use of image motion information (optical flow).
3. Contour-based: Matching the object boundary line, as it deforms with the object roto-translation (also called Active Contours).
4. Template-based: Registration of a fully textured surface (Template) to the image gray-level intensities.
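As a small illustration of the color-based modality, the following sketch (Python; illustrative and not the course implementation) compares two normalized color histograms with the Bhattacharyya coefficient, a similarity measure commonly used in color-based tracking:

```python
import math

def normalize(hist):
    """Scale a histogram so its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]

def bhattacharyya(hist_a, hist_b):
    """Bhattacharyya coefficient: 1.0 for identical normalized histograms,
    0.0 for non-overlapping ones."""
    a, b = normalize(hist_a), normalize(hist_b)
    return sum(math.sqrt(pa * pb) for pa, pb in zip(a, b))

# Reference color histogram of the target vs. a candidate image region
# (bin counts are made-up example values).
target    = [10, 40, 30, 20]
candidate = [12, 38, 28, 22]
similarity = bhattacharyya(target, candidate)            # close to 1.0
disjoint   = bhattacharyya([1, 0, 0, 0], [0, 0, 0, 1])   # 0.0
```

A tracker evaluates such a score over many candidate regions (or particles) and prefers the ones whose color statistics best match the target model.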
Finally, the last lecture will introduce advanced topics: multiple cameras, multiple simultaneous objects, and data fusion across multiple modalities (colors, edges, ...).
Pre-requisites
The course will cover the following prerequisites in a self-contained fashion, although basic prior knowledge of them is recommended:
- Basic math and algebra (nonlinear functions and derivatives, matrix computation)
- Basic geometry: 3D transformations, projective geometry, camera imaging
- Probability theory and statistics
- Basic image processing (representation, filtering etc.)
- System theory: state-space representation, dynamics, observation
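As a quick refresher on the geometry and imaging prerequisites, this sketch (Python; all names and values are illustrative assumptions) projects a 3D point into the image with a simple pinhole camera model, combining a rigid-body transform with perspective division:

```python
import math

def project(point, angle_z, translation, focal):
    """Pinhole projection of a 3D point after a rotation about the camera
    z-axis and a translation (a minimal rigid-body transform)."""
    x, y, z = point
    c, s = math.cos(angle_z), math.sin(angle_z)
    # Rigid-body transform into camera coordinates.
    xc = c * x - s * y + translation[0]
    yc = s * x + c * y + translation[1]
    zc = z + translation[2]
    # Perspective division onto the image plane (focal length in pixels).
    return (focal * xc / zc, focal * yc / zc)

# A point offset 0.5 m to the right, seen by a camera 2 m away:
# u = 800 * 0.5 / 2 = 200.0 pixels, v = 0.0.
u, v = project((0.5, 0.0, 0.0), angle_z=0.0,
               translation=(0.0, 0.0, 2.0), focal=800.0)
```

Pose estimation and tracking invert this mapping: given image measurements (u, v), recover the rotation and translation parameters.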
Textbook
The reference text for this course is
Giorgio Panin, Model-based Visual Tracking, Wiley-Blackwell (to appear in late 2010).
Please check also the OpenTL webpage for more information.
Slides
Lecture slides for WS09/10 are currently in preparation.
IMPORTANT: the slides are now password-protected and restricted to course participants.
Please contact me by email to obtain the password.
Part I - General tools for object tracking
- Lecture1.pdf: Introduction to model-based visual tracking
- Lecture2.pdf: Object and camera geometry
- Lecture3.pdf: Pose estimation from feature correspondences (with Matlab exercises)
- Lecture4.pdf: Bayesian tracking: dynamics and observation models (with Matlab exercises)
- Lecture5.pdf: Bayesian tracking: Kalman and Particle filters (with Matlab exercises)
Part II - Visual modalities
- Lecture6.pdf: Color-based object tracking
- Lecture7.pdf: Keypoint tracking and image motion
- Lecture8.pdf: Invariant keypoints: detection, description and matching
- Lecture9.pdf: Contour-based tracking: re-projection of contour points and lines
- Lecture10.pdf: Contour-based tracking: Snakes, Condensation and the CCD algorithm
- Lecture11.pdf: Template-based tracking: Active Appearance Models
- Lecture12.pdf: Introduction to multi-camera/-modal/-target tracking
Bibliographical references
Lecture 1:
- Survey [1] (Introduction)
- Survey paper [12]
Lecture 2:
- General transformations: [3], Chapter 2
- Rigid body motion, exponential representation: [1], Sec. 2.2; [7]
- Camera model: [1], Sec 2.1 (and references), [3], Chapter 6
- Camera calibration: [3], Chapter 7
Lecture 3:
- Pose estimation from corresponding features: [3], Chapter 4
- P3P problem: [1], Sec. 2.3.3
- Similarity estimation (in N-dimensions): [11]
- Linear and Nonlinear LSE: [1], Sec. 2.4 (and references)
- Robust LSE: [10], and [1], Sec.2.5
Lecture 4:
- General tracking concepts (not only vision): [6], Introduction
- Dynamical models: [5], Chapter 9, [6], Chapters 4 and 6 (until 6.3)
- The three levels of visual measurements: taken from the data fusion literature [8] (data-, feature-, decision-level)
- General Bayesian tracking equations: [1], Sec. 2.6
Lecture 5:
Kalman Filter:
- Kalman Filter (and EKF): Tutorial by Greg Welch (in particular, the Introductory Paper [13]); [1], Sec.2.6.1 and references; [6], Chapter 5 (KF) and Chapter 10.3 (EKF)
- Particle Filters: [1], Sec. 2.6.2; paper [16]; the Condensation web page.
- Unscented Kalman Filter: Paper [14]; Tutorial [15]
Lecture 6:
Color spaces: (Wikipedia links)
- Color space definitions (a useful list is also available)
- CIE XYZ color space (also includes the RGB space)
- Imaginary colors (an interesting definition related to the tristimulus values)
- LUV color space - just one attempt to linearize perceptual color differences
- HSV color space
- YUV color space (and chroma subsampling)
Color distributions:
- histograms, Gaussian mixtures (see also the paper), kernel densities
Mean-shift: (for more recent applications and videos, look at the homepage of D. Comaniciu)
- Paper about image segmentation
- Paper about object tracking
Blob matching: [2], Chapter 5 (morphology) and 8 (matching contours)
Color-based particle filter: [17] (feature-level) and [18] (pixel-level)
Lecture 7:
A general list of keypoint detectors (for this and the next lecture) can be found here.
KLT algorithm: material can be found at the KLT homepage by Stan Birchfield. In particular, [19] is the reference paper for this method.
Back-projection: for more information about depth maps, see the Z-buffering webpage.
Feature detection vs. tracking: see also [1]
Optical flow: see the Wikipedia page and the Lucas-Kanade original paper [21].
Harris detector: see the paper [20]
Lecture 8:
Harris corners: (see Lecture 7)
Scale-space theory: the book of T. Lindeberg [9], plus some references (for a quicker introduction), at this webpage.
SIFT webpage (by D. Lowe).
Lecture 9:
- Edge-based tracking methods: [1], Sec. 4.1
- Sampling model contour points with the GPU: paper [29] (general concepts) and paper [26] (our implementation)
- Canny edge detector: Wikipedia page, and the original paper [23]
- Marr-Hildreth edge detector: Wikipedia page, and paper [24]
- Harris (RAPiD) tracker: original paper [22] and robust improvement with RANSAC [25]
- Segment detection: paper [27], segment-based object tracking: paper [28]
Lecture 10:
- B-splines: [5] Chapter 3, and wikipedia page
- Snakes: the original paper [30], another webpage and slides
- Multi-hypothesis likelihood for particle filters: see the original CONDENSATION paper [16]
- The CCD algorithm: original paper [31], a faster implementation [32] and our real-time version [33]
Lecture 11:
- AAM/ASM (including learning): the homepage of Tim Cootes, his paper [34] and the paper [35]
- Computational improvements (forwards- and inverse-compositional approaches): paper [35], and the face tracking project webpage at CMU
- From 2D to 3D face tracking: paper [36]
- See also the Wikipedia webpages about PCA and SVD
Lecture 12:
- Data fusion books: [37] and [38] (only the first Chapters; in particular, have a look at the JDL data fusion scheme)
References:
- [1] V. Lepetit, P. Fua, Monocular Model-Based 3D Tracking of Rigid Objects: A Survey, Foundations and Trends in Computer Graphics and Vision, 2005
- [2] G. Bradski, A. Kaehler, Learning OpenCV - Computer Vision with the OpenCV Library, First Edition, O'Reilly, 2008
- [3] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Edition, Cambridge University Press, 2004
- [4] O. Faugeras, Three-Dimensional Computer Vision, MIT Press, 1993
- [5] A. Blake, M. Isard, Active Contours, Springer-Verlag, 1998.
- [6] Y. Bar-Shalom, X.-R. Li, T. Kirubarajan, Estimation with Applications to Tracking and Navigation, J. Wiley & Sons, 2001
- [7] R. Murray, Z. Li, S. Sastry, A Mathematical Introduction to Robotic Manipulation, CRC Press, 2002
- [8] D. Hall, J. Llinas, Handbook of multisensor data fusion, CRC Press, 2nd Edition, 2008
- [9] T. Lindeberg, Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, Dordrecht, Netherlands, 1994.
- [10] M. A. Fischler, R. C. Bolles, Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography, Communications of the ACM, Vol. 24, pp. 381-395, 1981
- [11] S. Umeyama, Least-Squares Estimation of Transformation Parameters Between Two Point Patterns, IEEE Trans. Pattern Anal. Mach. Intell., 13(4): 376-380, 1991
- [12] A. Yilmaz, O. Javed, M. Shah, Object tracking: A survey, ACM Computing Surveys, 38(4), Article 13, Dec. 2006
- [13] G. Welch, G. Bishop, An Introduction to the Kalman Filter, TR 95-041, University of North Carolina at Chapel Hill, July 2006
- [14] S. Julier, J. Uhlmann, H. F. Durrant-Whyte, A new method for the nonlinear transformation of means and covariances in filters and estimators, IEEE Transactions on Automatic Control, Vol. 45, No. 3, pp. 477-482, 2000
- [15] E. A. Wan, R. van der Merwe, The Unscented Kalman Filter for Nonlinear Estimation
- [16] M. Isard, A. Blake, CONDENSATION -- conditional density propagation for visual tracking, Int. J. Computer Vision, 29(1): 5-28, 1998
- [17] P. Perez, C. Hue, J. Vermaak, M. Gangnet, Color-based probabilistic tracking, European Conference on Computer Vision (ECCV), 2002
- [18] C. Lenz, G. Panin, A. Knoll. A GPU-accelerated particle filter with pixel-level likelihood. In International Workshop on Vision, Modeling and Visualization (VMV), Konstanz, Germany, October 2008.
- [19] J. Shi, C. Tomasi. Good Features to Track, IEEE Conference on Computer Vision and Pattern Recognition, pages 593-600, 1994.
- [20] C. Harris, M. Stephens (1988). A combined corner and edge detector, Proceedings of the 4th Alvey Vision Conference. pp. 147--151.
- [21] B. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, Proceedings of the Imaging Understanding Workshop, pp. 121-130, 1981
- [22] C. J. Harris, Tracking with rigid models. In A. Blake and A. Yuille, editors, Active Vision. MIT Press, Cambridge, MA, 1992.
- [23] J. Canny, A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 8, No. 6, Nov 1986.
- [24] D. Marr, E. Hildreth, Theory of Edge Detection, Proceedings of the Royal Society of London B, Vol. 207, pp. 187-217, 1980
- [25] M. Armstrong, A. Zisserman, Robust object tracking. In Proc. Asian Conference on Computer Vision, volume I (1995).
- [26] E. Roth, G. Panin, and A. Knoll. Sampling feature points for contour tracking with graphics hardware. In International Workshop on Vision, Modeling and Visualization (VMV), Konstanz, Germany, October 2008.
- [27] D. Lowe, Three-Dimensional Object Recognition from Single Two-Dimensional Images. Artif. Intell. 31(3): 355-395 (1987)
- [28] D. Koller, K. Daniilidis, H.-H. Nagel., Model-Based Object Tracking in Monocular Image Sequences of Road Traffic Scenes International Journal of Computer Vision 10:3 (1993) 257--281.
- [29] B. Gooch, M. Hartner, N. Beddes, Silhouette Extraction (book Chapter of SIGGRAPH course)
- [30] M. Kass, A. Witkin, D. Terzopoulos, Snakes: Active contour models, International Journal of Computer Vision, 1(4): 321-331, 1987 (Marr Prize Special Issue)
- [31] R. Hanek, M. Beetz, The Contracting Curve Density Algorithm: Fitting Parametric Curve Models to Images Using Local Self-Adapting Separation Criteria, Int. J. Comput. Vision, 59(3): 233-258, Sep. 2004
- [32] R. Hanek, T. Schmitt, S. Buck, M. Beetz, Toward RoboCup without color labeling, AI Magazine, 24(2): 47-50, Jun. 2003
- [33] G. Panin, A. Knoll. Fully automatic real-time 3D object tracking using active contour and appearance models. Journal of Multimedia 2006, 1(7):62-70, 2006
- [34] T. Cootes, G. Edwards, C. Taylor, Active Appearance Models, in Proc. European Conference on Computer Vision 1998 (H. Burkhardt & B. Neumann, Eds.), Vol. 2, pp. 484-498, Springer, 1998
- [35] I. Matthews, S. Baker, Active Appearance Models Revisited, International Journal of Computer Vision, 60(2): 135-164, November 2004
- [36] J. Xiao, S. Baker, I. Matthews, and T. Kanade, Real-Time Combined 2D+3D Active Appearance Models, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June, 2004
- [37] J. Clark, A. Yuille, Data Fusion for Sensory Information Processing Systems, Springer, 1990
- [38] M. Liggins, D. Hall, J. Llinas, Handbook of Multisensor Data Fusion: Theory and Practice, CRC Press (2009)