
Real-Time Extraction of Biometric Data from Video

Satya Venneti

The Department of Defense is increasingly relying on biometric data, such as iris scans, gait recognition, and heart-rate monitoring, to protect against both cyber and physical attacks. "Military planners, like their civilian infrastructure and homeland security counterparts, use video-linked 'behavioral recognition analytics,' leveraging base protection and counter-IED operations," according to a recent article in Defense Systems. Current state-of-the-art approaches cannot gather biometric data in real-world settings, such as border and airport security checkpoints, where people are in motion. This blog post presents exploratory research undertaken by the SEI's Emerging Technology Center to design algorithms that extract heart rate in real time from video of non-stationary subjects.

Foundations of Our Work

Biometrics, the science of analyzing human physical and behavioral characteristics, is an established field of study that has seen renewed interest as researchers in government and industry have realized its potential in a wide range of scenarios including security, surveillance, counter-terrorism, and identification. Incorporating biometrics collection into these scenarios could help defenders detect fake faces (photographs or masks) used by intruders to gain access to secure areas, identify symptoms of PTSD in returning soldiers, and detect unusual physical changes at security checkpoints--and these are just a few examples.

Our work is inspired by research developed at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) that uses signal processing to magnify color and movement frequencies in video that are invisible to the human eye (this research is highlighted in a New York Times video). The method, Eulerian Video Magnification, takes a standard video sequence as input and applies spatial decomposition, followed by temporal filtering to the frames. The resulting signal is then amplified to reveal hidden information--for example, heart rate becomes visible to the naked eye in a processed video as the subject's face flushes noticeably with each heartbeat.
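The core operations of that method can be illustrated in a few lines of code. The sketch below is a rough approximation of the idea, not CSAIL's released implementation: the `frames` array, the frame rate `fps`, the pulse band, and the amplification factor `alpha` are all assumptions, and the full method builds a Gaussian or Laplacian pyramid where this sketch simply downsamples.

```python
# Illustrative sketch of Eulerian-style color magnification (assumptions noted above).
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_color(frames, fps, low_hz=0.8, high_hz=3.0, alpha=50.0):
    """Amplify subtle color variation in the ~0.8-3.0 Hz band (roughly 48-180 bpm).

    frames: array of shape (num_frames, height, width), one grayscale channel.
    """
    # Crude spatial decomposition: blur by downsampling each frame 4x.
    # (The published method uses a Gaussian/Laplacian pyramid instead.)
    coarse = frames[:, ::4, ::4].astype(np.float64)

    # Temporal bandpass filter each pixel's time series around the pulse band.
    b, a = butter(2, [low_hz, high_hz], btype="band", fs=fps)
    filtered = filtfilt(b, a, coarse, axis=0)

    # Amplify the filtered signal, upsample it, and add it back to the input.
    amplified = np.repeat(np.repeat(filtered * alpha, 4, axis=1), 4, axis=2)
    amplified = amplified[:, :frames.shape[1], :frames.shape[2]]
    return np.clip(frames + amplified, 0, 255)
```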

One of the initial applications of the CSAIL algorithm, released via open source, was for use in neonatal care units to monitor subtle movements in infants. Eulerian Video Magnification has been used on pre-recorded video under lab conditions on subjects who remain mostly still in front of a camera. The current state of the art for color and motion magnification requires laboratory-controlled conditions that restrict the subject's movement and facial expressions and limit variations in illumination. These restrictions make the approach unsuitable for real-time settings such as airports.

Our research combines facial landmarking with spatial decomposition and temporal filtering to enable heart-rate extraction from video captured in real-time settings on subjects in motion. For example,

  • We are applying video biometrics using "facial landmarking," which localizes and tracks 68 facial landmarks from frame to frame. This approach includes cropping to regions of interest, such as the cheeks and forehead, areas where blood vessels are located and the pulse is most visible after magnification (a minimal sketch of this landmark-driven cropping follows this list).
  • Confidence values are output for each landmark point, depending on occlusions, face angle, and illumination. Using these values, we can focus on the "best" regions of the face in each frame, which shrinks the problem space and makes the results more accurate.
  • Facial landmarking also allows us to accommodate typical movements (+/-90 degrees along pitch, yaw, and roll directions) by tracing points along a facial feature as it is moving (e.g., eyebrow, lips, nose, cheeks, forehead).
  • To accelerate the facial landmarking, we are using a two-pronged approach. We use information from previous frames as a historical reference to speed up processing, and we use graphics processing units (GPUs) to accelerate parallel tasks (i.e., performing the same computation on thousands of video frames).
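
The following sketch shows what the landmark-driven cropping step might look like. It uses dlib's publicly available 68-point shape predictor as a stand-in; the landmarker developed at CyLab differs, and the forehead-region geometry below is an illustrative assumption.

```python
# Hedged example: extract a forehead region of interest from 68 facial landmarks.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Standard dlib model file, downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def forehead_roi(frame_bgr):
    """Return a forehead crop from the first detected face, or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if len(faces) == 0:
        return None
    landmarks = predictor(gray, faces[0])
    pts = np.array([(p.x, p.y) for p in landmarks.parts()])  # 68 (x, y) points

    # Points 17-26 trace the eyebrows; take a band just above them as the forehead.
    brows = pts[17:27]
    x_min, x_max = brows[:, 0].min(), brows[:, 0].max()
    y_brow = brows[:, 1].min()
    height = max(1, (x_max - x_min) // 3)  # rough forehead height
    y_top = max(0, y_brow - height)
    return frame_bgr[y_top:y_brow, x_min:x_max]
```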

Our Collaborators

Our research approach relies on face landmarking research advanced by Dr. Marios Savvides, the director of the CyLab Biometrics Center at Carnegie Mellon University and our collaborator on this project. Dr. Savvides' research on face landmarking allows us to address the issues of facial motion and expressions, occlusions, and variation in illumination.

For the computational photography component, we are collaborating with Dr. Kris Kitani, assistant research professor in the Robotics Institute's Computer Vision Group at Carnegie Mellon University. Dr. Kitani's interests lie in the area of human activity forecasting, integrating optimal control and computer vision techniques with an aim to overcome the limitations of the traditional camera.

Testing Our Approach

As part of our approach, we are testing a proof of concept that integrates the landmarking technique developed by Dr. Savvides with the extraction of heart rate. We created a Python application that extracts heart rate from facial video, either from a live webcam stream or from a previously recorded video. The application uses face landmarking to find the optimum location on the user's face, e.g., the forehead region. It then measures the average optical intensity of the pixels in the region of interest over time and uses this signal to estimate heart rate. We account for differences in skin tone by detecting skin tone automatically and scaling the weight of the red color channel proportionally.
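
A simplified sketch of that estimation step appears below. It averages the region of interest in each frame, bandpass filters the resulting time series to the plausible pulse band, and reads the dominant frequency from an FFT. Using the green channel here is a common remote-photoplethysmography simplification and an assumption on our part; as noted above, the actual application weights the red channel by detected skin tone, and the filter parameters are illustrative.

```python
# Hedged sketch: estimate beats per minute from a sequence of ROI crops.
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_bpm(roi_frames, fps):
    """roi_frames: list of BGR region-of-interest crops, one per video frame."""
    # One sample per frame: mean intensity of the green channel.
    signal = np.array([roi[:, :, 1].mean() for roi in roi_frames])
    signal = signal - signal.mean()

    # Keep only frequencies corresponding to roughly 42-180 beats per minute.
    b, a = butter(2, [0.7, 3.0], btype="band", fs=fps)
    filtered = filtfilt(b, a, signal)

    # The dominant frequency of the filtered signal, converted to bpm.
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    power = np.abs(np.fft.rfft(filtered)) ** 2
    return freqs[np.argmax(power)] * 60.0
```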

We applied our application to an openly available database of videos from a human-computer interaction study conducted by researchers at Imperial College London, in which 30 participants were shown fragments of movies and pictures. During the study, the participants were monitored with six video cameras, a microphone, an eye-gaze tracker, and physiological sensors measuring ECG, EEG (32 channels), respiration amplitude, and skin temperature. Study participants also had their heart rates measured via sensors attached to their bodies.

We ran our application on 25 videos (with both still and moving faces) from the Imperial College study to see whether we could successfully identify heart rate. We then compared our results with the corresponding ground-truth readings from the heart-rate monitors and found that our program is accurate to within plus or minus five beats per minute.

On a separate front, we also tested our approach on other researchers in the ETC. We applied it to a video feed from a standard webcam and compared the result against Instant Heart Rate, an app that tracks a still subject's heart rate. Over 30 seconds of live webcam video, our estimate was within two beats per minute of the app's reading.

Our goal for this research is to be able to process 10 seconds of naturalistic facial video within 30 seconds. We are currently testing our algorithm on more pre-recorded videos as well as live webcam streams. We are experimenting with different window lengths so that we can capture and track real-time changes in heart rate (see the sketch below). We are also exploring implementing our algorithm on a mobile platform, such as a GPU-equipped smartphone like the iPhone 6s or the HTC Desire Eye E1 Android phone.
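
As a rough illustration of the windowing idea, the sketch below reuses the `estimate_bpm` helper from the earlier example to produce a rolling sequence of heart-rate estimates. The 10-second window and 1-second step are assumptions drawn from the processing goal stated above, not settings from our application.

```python
# Hedged sketch: sliding-window heart-rate tracking over a stream of ROI crops.
def track_bpm(roi_frames, fps, window_s=10.0, step_s=1.0):
    """Return one heart-rate estimate per window position."""
    window = int(window_s * fps)
    step = int(step_s * fps)
    estimates = []
    for start in range(0, len(roi_frames) - window + 1, step):
        estimates.append(estimate_bpm(roi_frames[start:start + window], fps))
    return estimates
```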

Wrapping Up and Looking Ahead

Real-time, or close-to-real-time, tracking of heart rate is but one piece of a puzzle that will allow us to use biometrics to assess the state of an individual. A complete non-invasive, multi-modal biometric monitoring system would involve some combination of the following:

  • eye tracking (e.g., iris scans, pupil size, and dilation)
  • monitoring facial expressions and head motions (like nodding or turning away)
  • voice frequency monitoring
  • walking gait monitoring
  • odor-sniffing devices

Applying GPUs to capture the heart rate of subjects in a naturalistic setting in near real-time (within 30 seconds) using video magnification and facial landmarking techniques is but the first step in this research.

We welcome your feedback on this research in the comments section below.

Additional Resources

To watch a video on the CSAIL research, please click here.
