Real-Time Extraction of Biometric Data from Video
The Department of Defense is increasingly relying on biometric data, such as iris scans, gait recognition, and heart-rate monitoring, to protect against both cyber and physical attacks. "Military planners, like their civilian infrastructure and homeland security counterparts, use video-linked 'behavioral recognition analytics,' leveraging base protection and counter-IED operations," according to an article in Defense Systems. Current state-of-the-art approaches do not make it possible to gather biometric data in real-world settings, such as border and airport security checkpoints, where people are in motion. This blog post presents the results of exploratory research conducted by the SEI's Emerging Technology Center to design algorithms that extract heart rate from video of non-stationary subjects in real time.
Foundations of Our Work
Biometrics, the science of analyzing human physical and behavioral characteristics, is an established field of study that has seen renewed interest as researchers in government and industry have realized its potential in a wide range of scenarios, including security, surveillance, counter-terrorism, and identification. Incorporating biometrics collection into these scenarios could help defenders detect fake faces (photographs or masks) used by intruders to gain access to secure areas, identify symptoms of PTSD in returning soldiers, and detect unusual physical changes at security checkpoints--and these are just a few examples.
Our work was inspired by research developed at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) that uses signal processing to magnify color and movement frequencies in video that are invisible to the human eye (this research is highlighted in a New York Times video). The method, called Eulerian Video Magnification, takes a standard video sequence as input and applies spatial decomposition, followed by temporal filtering, to the frames. The resulting signal is then amplified to reveal hidden information. For example, heart rate becomes visible to the naked eye in a processed video as the subject's face flushes noticeably with each heartbeat.
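The three stages named above (spatial decomposition, temporal filtering, amplification) can be illustrated with a short sketch. This is not the CSAIL implementation: block averaging stands in for the spatial pyramid, an ideal FFT band-pass stands in for the temporal filter, and the function name `magnify_color` and all of its parameters (`low_hz`, `high_hz`, `alpha`, `block`) are assumptions chosen for illustration.

```python
import numpy as np

def magnify_color(frames, fps, low_hz=0.8, high_hz=3.0, alpha=50.0, block=8):
    """Amplify subtle color changes in a video (illustrative sketch only).

    frames: float array of shape (T, H, W, C) with values in [0, 1].
    """
    frames = np.asarray(frames, dtype=np.float64)
    t, h, w, c = frames.shape
    hb, wb = h // block, w // block
    if hb == 0 or wb == 0:
        raise ValueError("frame is smaller than one block")

    # 1. Spatial decomposition (simplified): average over blocks so that only
    #    low spatial frequencies remain. This stands in for a pyramid level.
    cropped = frames[:, :hb * block, :wb * block]
    coarse = cropped.reshape(t, hb, block, wb, block, c).mean(axis=(2, 4))

    # 2. Temporal filtering: keep only frequencies in the plausible pulse band.
    spectrum = np.fft.rfft(coarse, axis=0)
    freqs = np.fft.rfftfreq(t, d=1.0 / fps)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~keep] = 0
    band = np.fft.irfft(spectrum, n=t, axis=0)

    # 3. Amplification: scale the filtered signal, upsample it back to the
    #    frame size (nearest neighbour), and add it to the original frames.
    upsampled = np.repeat(np.repeat(band, block, axis=1), block, axis=2)
    out = frames.copy()
    out[:, :hb * block, :wb * block] += alpha * upsampled
    return np.clip(out, 0.0, 1.0)
```

With `alpha=50`, a periodic color change of 0.1 percent of full scale in the input becomes a change of roughly 5 percent in the output, which is large enough to see.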
One of the initial applications of the CSAIL algorithm, released via open source, was to monitor subtle movements in infants in neonatal care units. Eulerian Video Magnification has been used on pre-recorded video under lab conditions on subjects who remain primarily still in front of a camera. The current state of the art for color and motion magnification requires laboratory-controlled conditions, which restricts the subject's movement and facial expressions, as well as illumination variations. These restrictions make this approach unsuitable for use in real-time settings, such as airports or stadiums.
Our research combines facial landmarking with spatial decomposition and temporal filtering to enable heart-rate extraction from video captured in real-time settings on subjects in motion. Specifically:
- We apply video biometrics using "facial landmarking," which localizes and tracks 68 facial landmarks from frame to frame. This approach includes cropping to regions of interest, such as the cheeks and forehead--areas where blood vessels are located and the pulse is magnified.
- Confidence values are output for each point, depending on occlusions, face angle, and illumination on the face. Using these values, we can focus on the "best" regions of the face in each frame when capturing biometric video, which makes our problem space much smaller and our results more accurate.
- Facial landmarking also allows us to accommodate typical movements (+/-90 degrees along pitch, yaw, and roll directions) by tracing points along a facial feature as it is moving (e.g., eyebrow, lips, nose, cheeks, forehead).
- To accelerate the facial landmarking, we use a two-pronged approach. We use information from previous frames as historical reference to speed up the processing. We also use graphics processing units (GPUs) to accelerate the processing of parallel tasks (i.e., performing the same computation on thousands of video frames).
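The idea of using per-landmark confidence values to pick the "best" region in each frame can be sketched as follows. This is a minimal illustration, not our production code: the function `best_region`, the `min_conf` threshold, and the grouping of landmark indices into named regions are all hypothetical, and the landmarks and confidences are assumed to arrive as plain arrays from whichever landmarking tool is in use.

```python
import numpy as np

def best_region(landmarks, confidences, regions, min_conf=0.5):
    """Pick the face region whose landmarks have the highest mean confidence.

    landmarks:   array of shape (68, 2) holding (x, y) pixel coordinates.
    confidences: array of shape (68,) with one confidence value per landmark.
    regions:     dict mapping a region name to a list of landmark indices
                 (the grouping is an assumption made by the caller).
    Returns (name, (x0, y0, x1, y1)) or None if no region exceeds min_conf.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    confidences = np.asarray(confidences, dtype=float)

    best_name, best_score = None, min_conf
    for name, idx in regions.items():
        score = confidences[idx].mean()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None  # e.g., face occluded or badly lit in this frame

    # Bounding box around the chosen region's landmarks, used for cropping.
    pts = landmarks[regions[best_name]]
    x0, y0 = np.floor(pts.min(axis=0)).astype(int)
    x1, y1 = np.ceil(pts.max(axis=0)).astype(int)
    return best_name, (int(x0), int(y0), int(x1), int(y1))
```

Returning `None` for a frame lets the caller skip it, or fall back to the previous frame's region, rather than sample pixels from an unreliable location.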
Our research approach uses a hybrid of face landmarking techniques to achieve both speed and accuracy in natural settings. We use DLib, an open source face landmarking tool, in conjunction with state-of-the-art face landmarking from CyLab, advanced by Dr. Marios Savvides, director of the CyLab Biometrics Center at Carnegie Mellon University and a collaborator on this project. This hybrid approach allows us to address the issues of facial motion and expressions, occlusions, and variation in illumination in near-real time.
For the computational photography component, we collaborated with Dr. Kris Kitani, assistant research professor in the Robotics Institute, Computer Vision Group, at Carnegie Mellon University. Dr. Kitani's interests lie in the area of human activity forecasting by integrating optimal control and computer vision techniques, with an aim to overcome the limitations of the traditional camera.
Testing Our Approach
To test our approach, we integrated the landmarking techniques described above with the extraction of heart rate. We created an application written in Python to extract heart rate from facial video, taken either from a webcam livestream or from a previously recorded video. This application uses face landmarking to find the optimal location on the user's face (e.g., the forehead). The application then measures the average optical intensity of the pixels in the region of interest, collects this measurement over time, and uses the resulting signal to estimate the heart rate. We account for differences in skin tone by detecting it automatically and scaling the weight of the red color channel proportionally.
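The step from average pixel intensity to heart rate can be sketched as follows. This is a minimal illustration under stated assumptions rather than the application itself: it picks the strongest frequency in a plausible pulse band using an FFT, which is one common way to do the estimate, and it omits the skin-tone weighting. The function names and the 45 to 240 bpm band limits are hypothetical.

```python
import numpy as np

def roi_mean_intensity(frame, box):
    """Average pixel intensity inside box = (x0, y0, x1, y1) of one frame."""
    x0, y0, x1, y1 = box
    return float(np.asarray(frame, dtype=float)[y0:y1, x0:x1].mean())

def estimate_bpm(signal, fps, low_bpm=45.0, high_bpm=240.0):
    """Estimate heart rate from a series of per-frame ROI intensities.

    signal: 1-D sequence with one mean-intensity value per frame.
    fps:    frames per second of the video.
    """
    signal = np.asarray(signal, dtype=float)
    # Remove the constant brightness level and taper the ends of the window.
    tapered = (signal - signal.mean()) * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(tapered)) ** 2
    freqs_bpm = np.fft.rfftfreq(len(signal), d=1.0 / fps) * 60.0

    # Only consider frequencies that could plausibly be a human pulse.
    band = (freqs_bpm >= low_bpm) & (freqs_bpm <= high_bpm)
    if not band.any():
        raise ValueError("signal too short to resolve the heart-rate band")
    return float(freqs_bpm[band][np.argmax(power[band])])
```

In use, `roi_mean_intensity` would be called once per frame on the region chosen by landmarking, and the collected values passed to `estimate_bpm` together with the camera's frame rate.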
We tested our application on openly available videos from a human-computer interaction study, undertaken by researchers at Imperial College in London, in which 30 participants were shown fragments of movies and pictures. During the study, the participants were monitored with six video cameras, a microphone, an eye gaze tracker, and physiological sensors measuring ECG, EEG (32 channels), respiration amplitude, and skin temperature. Study participants also had their heart rates measured via sensors attached to their bodies.
We ran our application on 25 videos (with both still and mobile faces) from the Imperial College study to see if we could successfully identify heart rate. We then compared our results with corresponding ground-truth readings output by the heart-rate monitors and found that our program is accurate to within plus or minus five beats per minute.
On a separate front, we also tested our approach on other researchers in the ETC. We applied our approach using a video feed from a standard webcam along with a heart rate app, Instant Heart Rate, that tracks a still subject's heart rate. We found that in 30 seconds of live webstream video, our approach was two beats off.
Our goal for this research was to process 10 seconds of naturalistic facial video within 30 seconds. We met this goal, with results within plus or minus 5 bpm of ground truth. We are currently testing our algorithm on more pre-recorded videos, as well as live webcam streams. We are experimenting with differing window lengths to capture and track real-time changes in heart rate. We are also exploring implementing our algorithm on a mobile platform, such as a GPU-equipped smartphone like the iPhone 7 or the HTC Desire Eye E1 Android phone. We are also porting our application to augmented reality headsets.
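Window length involves a trade-off that a short sketch makes concrete. With a plain FFT estimate, a window of T seconds resolves frequencies no finer than 1/T Hz, or 60/T beats per minute, so a 10-second window gives steps of 6 bpm, while longer windows are finer but slower to react to changes. The code below is an illustration under that assumption, not our algorithm; the function names, the one-second step, and the band limits are hypothetical.

```python
import numpy as np

def bpm_resolution(window_seconds):
    """Spacing between FFT frequency bins, expressed in beats per minute."""
    return 60.0 / window_seconds

def sliding_bpm(signal, fps, window_seconds=10.0, step_seconds=1.0,
                low_bpm=45.0, high_bpm=240.0):
    """Track heart rate over time with a sliding analysis window.

    Returns a list of (window_center_in_seconds, bpm) pairs.
    """
    signal = np.asarray(signal, dtype=float)
    win = int(round(window_seconds * fps))
    step = max(1, int(round(step_seconds * fps)))
    if win < 2:
        raise ValueError("window is too short")

    freqs_bpm = np.fft.rfftfreq(win, d=1.0 / fps) * 60.0
    band = (freqs_bpm >= low_bpm) & (freqs_bpm <= high_bpm)
    if not band.any():
        raise ValueError("window too short to resolve the heart-rate band")
    taper = np.hanning(win)

    estimates = []
    for start in range(0, len(signal) - win + 1, step):
        chunk = signal[start:start + win]
        power = np.abs(np.fft.rfft((chunk - chunk.mean()) * taper)) ** 2
        peak_bpm = freqs_bpm[band][np.argmax(power[band])]
        estimates.append(((start + win / 2) / fps, float(peak_bpm)))
    return estimates
```

Running `sliding_bpm` with several values of `window_seconds` on the same recording shows how quickly each setting picks up a change in heart rate and how coarse its estimates are.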
Extracting heart rate from video has many potential applications, with perhaps the most obvious opportunities in security scenarios and healthcare. Heart rate is an important indicator that can reveal information about a person's overall mental and physical state, and obtaining that information quickly and in a non-invasive manner could be useful to human agents conducting security screenings and polygraph examinations. Another important security application is detecting abnormalities that reveal a face has been spoofed. Tools for spoofing faces are becoming widespread and can generate believable reenactments. We tested our tool on spoofed facial videos, and we found that our tool was able to detect abnormalities that indicated spoofing--for example, in sampling different sections of a subject's face in a spoofed video, we found widely varying heart rates. On an original video, those regions were much more consistent.
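The spoofing check described above, comparing heart rates sampled from different sections of the face, can be sketched as a simple consistency test. This is an illustration only: the FFT peak estimate, the function names, and especially the `max_spread_bpm` threshold of 10 are assumptions, not values from our experiments.

```python
import numpy as np

def _peak_bpm(signal, fps, low_bpm, high_bpm):
    """Strongest frequency (in bpm) of one region's intensity series."""
    signal = np.asarray(signal, dtype=float)
    tapered = (signal - signal.mean()) * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(tapered)) ** 2
    freqs_bpm = np.fft.rfftfreq(len(signal), d=1.0 / fps) * 60.0
    band = (freqs_bpm >= low_bpm) & (freqs_bpm <= high_bpm)
    if not band.any():
        raise ValueError("signal too short to resolve the heart-rate band")
    return float(freqs_bpm[band][np.argmax(power[band])])

def regional_bpm_spread(region_signals, fps, low_bpm=45.0, high_bpm=240.0):
    """Estimate heart rate per face region and the spread between regions.

    region_signals: dict mapping region name -> per-frame mean intensities.
    Returns (dict of region -> bpm, max bpm minus min bpm).
    """
    rates = {name: _peak_bpm(sig, fps, low_bpm, high_bpm)
             for name, sig in region_signals.items()}
    values = list(rates.values())
    return rates, max(values) - min(values)

def looks_spoofed(region_signals, fps, max_spread_bpm=10.0):
    """Flag a video when regions disagree by more than a hypothetical threshold."""
    _, spread = regional_bpm_spread(region_signals, fps)
    return spread > max_spread_bpm
```

On a genuine face, every region is driven by the same heartbeat, so the per-region estimates should agree closely; widely varying estimates suggest that the video has been manipulated.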
Our tool also has potential applications in the area of healthcare. A great opportunity exists in developing countries, where people have greater access to smartphones than to healthcare. As telemedicine advances medical care in these areas, a doctor could obtain a patient's heart rate accurately via video call.
Wrapping Up and Looking Ahead
Real-time--or close-to-real-time--tracking of heart rate is but one piece of a puzzle that will allow us to use biometrics to assess the state of an individual. A complete model for a non-invasive, multi-modal biometric monitoring system involves some combination of the following:
- eye monitoring (e.g., iris scan, pupil size, dilation, gaze)
- monitoring facial expressions and head motions (like nodding or turning away)
- voice frequency monitoring
- walking gait monitoring
- odor sniffing devices
Our next biometrics project involves micro-expressions: subtle, fleeting expressions that can reveal "leaked" emotions. We are conducting research to detect and classify those expressions as they happen via webcam or pre-recorded video. We are using many of the same tools for this project as we did for the heart rate project. For example, facial landmarking helps us identify areas of the face where micro-expressions most commonly occur, and GPUs enable us to speed up parallel tasks.
We welcome your feedback on this research in the comments section below.