Face Tracking

Face tracking extends face detection to video sequences. Any individual who appears in a video, for any length of time, generates a face track: a sequence of face instances across time. The goal of face tracking, then, is to aggregate single-frame detection results into a collection of face tracks. To do so, we exploit spatio-temporal continuity to associate face instances across frames and iteratively update a motion model for each face track, respecting shot boundaries in the process.
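To make the idea of aggregating per-frame detections into tracks concrete, here is a minimal tracking-by-detection sketch. It is an illustration under stated assumptions, not the tracker described on this page: greedy IoU matching and a constant-velocity motion model are stand-ins chosen for brevity, shot boundaries are assumed to be supplied by some separate detector, and all names (`FaceTrack`, `track_faces`, `shot_starts`, `min_iou`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class FaceTrack:
    """Hypothetical track record: face instances plus a simple motion model."""
    track_id: int
    boxes: List[Tuple[int, Box]] = field(default_factory=list)  # (frame, box)
    velocity: Tuple[float, float] = (0.0, 0.0)  # center motion, pixels/frame

    def predict(self) -> Box:
        """Shift the last box by the current velocity (constant-velocity guess)."""
        x1, y1, x2, y2 = self.boxes[-1][1]
        vx, vy = self.velocity
        return (x1 + vx, y1 + vy, x2 + vx, y2 + vy)

    def update(self, frame: int, box: Box) -> None:
        """Append a detection and re-estimate velocity from the last two centers."""
        if self.boxes:
            px1, py1, px2, py2 = self.boxes[-1][1]
            self.velocity = ((box[0] + box[2] - px1 - px2) / 2,
                             (box[1] + box[3] - py1 - py2) / 2)
        self.boxes.append((frame, box))


def track_faces(frames: List[List[Box]],
                shot_starts: AbstractSet[int] = frozenset(),
                min_iou: float = 0.3) -> List[FaceTrack]:
    """Group per-frame face boxes into tracks (assumed greedy IoU association)."""
    finished: List[FaceTrack] = []
    active: List[FaceTrack] = []
    next_id = 0
    for t, detections in enumerate(frames):
        if t in shot_starts:  # never let a track continue across a cut
            finished.extend(active)
            active = []
        # Score every (track, detection) pair against the predicted position.
        pairs = sorted(((iou(trk.predict(), det), ti, di)
                        for ti, trk in enumerate(active)
                        for di, det in enumerate(detections)), reverse=True)
        used_tracks, used_dets = set(), set()
        for score, ti, di in pairs:  # best overlaps are assigned first
            if score < min_iou:
                break
            if ti in used_tracks or di in used_dets:
                continue
            active[ti].update(t, detections[di])
            used_tracks.add(ti)
            used_dets.add(di)
        # Simplification: a track ends as soon as it misses a single frame.
        finished.extend(trk for ti, trk in enumerate(active) if ti not in used_tracks)
        active = [trk for ti, trk in enumerate(active) if ti in used_tracks]
        for di, det in enumerate(detections):  # unmatched detections start tracks
            if di not in used_dets:
                trk = FaceTrack(next_id)
                next_id += 1
                trk.update(t, det)
                active.append(trk)
    finished.extend(active)
    return sorted(finished, key=lambda trk: trk.track_id)


# Two faces drifting toward each other, then a cut at frame 3.
frames = [
    [(10, 10, 50, 50), (200, 10, 240, 50)],
    [(14, 10, 54, 50), (196, 10, 236, 50)],
    [(18, 10, 58, 50), (192, 10, 232, 50)],
    [(18, 10, 58, 50)],
]
for trk in track_faces(frames, shot_starts={3}):
    print(trk.track_id, [frame for frame, _ in trk.boxes])
```

In this toy input, the two faces in frames 0 to 2 become two separate tracks, and the face in frame 3 starts a third track even though it sits exactly where the first track ended, because the cut closes all open tracks. Note that new tracks start from any unmatched detection, so no initialization step is needed. A production tracker would typically tolerate missed detections and use a more robust assignment step and motion model than this sketch does.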

The additional information present in videos (vs. images) allows us to improve both accuracy and computational efficiency.


Our face tracking technology makes no a priori assumptions about the background, camera motion, or the number of individuals in a scene. We can track an arbitrary number of faces simultaneously, for any length of time, and require no “lock-on” phase or other type of initialization. Furthermore, since face tracking is built on face detection, it inherits all of the detector's features, including the capability to track low-resolution faces in uncontrolled real-world video.

Real-Time Performance

The computational performance of the face tracker is more than sufficient for real-time applications, such as controlling a pan-tilt-zoom camera. The video below demonstrates face-based speaker-following in a presentation setting: