
Video Summarization: Using Machine Learning to Process Video from Unmanned Aircraft Systems

Kevin Pitstick

As the use of unmanned aircraft systems (UASs) increases, the volume of potentially useful video data that UASs capture on their missions is straining the resources of the U.S. military that are needed to process and use this data. This publicly released video is an example of footage captured by a UAS in Iraq. The video shows ISIS fighters herding civilians into a building. U.S. forces did not fire on the building because of the presence of civilians. Note that this video footage was likely processed by U.S. Central Command (CENTCOM) prior to release to the public to highlight important activities within the video, such as ISIS fighters carrying weapons, civilians being herded into the building to serve as human shields, and muzzle flashes emanating from the building.

The United States Air Force currently assigns a ground station crew (including a pilot, sensor operator, and mission intelligence coordinator) to manage a single UAS platform. Additionally, a PED (process, exploit, and disseminate) crew, including several intelligence, surveillance, and reconnaissance (ISR) analysts, views incoming video in real time to identify important activities and events, such as those depicted in the video. The number of people needed to manage and exploit UAS platforms represents a significant and escalating workforce challenge for the DoD. This blog post describes research by the SEI's Tactical Technologies Group (TTG) into how machine learning could be used to reduce workforce needs by automating tasks that are currently performed by humans. For a simple definition and explanation of machine learning, see

Machine learning has been used successfully in many domains to reduce the need for human monitoring and intervention. One example is the use of artificial intelligence to diagnose skin cancers. An open question that we at the SEI seek to answer is:

Can we build on machine-learning strategies that detect and recognize objects and summarize video to address specific DoD problems, such as the need to process UAS-captured video?

If machines can be trained to recognize activities and entities of interest to military analysts, they will have an advantage over human operators in the volume of data they can handle. The SEI is investigating how machine learning could be applied to analyzing UAS video. This task requires the capability for machines to process video data and notify human operators of significant entities or events. For example, machines could be trained to identify people carrying weapons, guarding hostages, or digging next to a road. Even recognizing when there is nothing interesting to report in a video is a significant step forward in leveraging machine intelligence capabilities.

Many machine learning strategies rely on data that is labeled or tagged with the correct answers to train a system capable of recognizing a specific target (e.g., images containing cats that are correctly labeled as such can be used to train a cat-recognizer for new data). Unfortunately, the DoD does not currently have access to large volumes of video data that is labeled or tagged to support machine learning. Project Maven, initiated by Deputy Secretary of Defense Robert Work in April of 2017, is taking on the task of labeling DoD-relevant objects in images as an initial task. Researchers at TTG are working on a variety of supervised machine learning techniques that can take advantage of Project Maven's data-labeling task, as well as unsupervised techniques that do not require data tagging. Our goal is to improve the performance of machine learning for UAS video data using the best of supervised and unsupervised techniques. Key challenges include addressing the unique characteristics of military video and leveraging unsupervised machine learning strategies in order to address a lack of labeled data.

The SEI collaborates on this work with Prof. Eric Xing of the Machine Learning Department, School of Computer Science, Carnegie Mellon University, an expert on machine learning. This work entails finding the algorithms that perform best on video similar to that produced by UASs and then improving them. The SEI's objective is to provide a searchable summary of video data to PED teams so that team members need only observe (or be alerted to) critical parts instead of having to look at footage in which nothing significant is happening. Determining what should be included in a good summary depends on the specific interests of the analyst, which poses a challenge for this approach. The team has already developed an initial, multi-stage video summarization pipeline that we are improving.
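To make the idea of a searchable summary concrete, here is a minimal sketch in Python. It is not the SEI pipeline, whose stages are not described in this post. It assumes that some upstream model has already produced a per-frame "activity score" between 0 and 1. The function name `summarize_segments`, the scores, the threshold, and the `min_len` parameter are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def summarize_segments(
    scores: List[float],
    threshold: float,
    fps: float = 1.0,
    min_len: int = 1,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) intervals whose score stays >= threshold.

    scores    -- hypothetical per-frame activity scores from an upstream model
    threshold -- minimum score for a frame to count as "interesting"
    fps       -- frames per second, used to convert frame indices to seconds
    min_len   -- minimum run length (in frames) for a segment to be kept
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame in the current run, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run of interesting frames begins
        elif start is not None:
            if i - start >= min_len:
                segments.append((start / fps, i / fps))
            start = None  # the run has ended
    # Close a run that extends to the final frame.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start / fps, len(scores) / fps))
    return segments


# Illustrative, made-up scores: one frame per second.
example_scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7, 0.1, 0.9, 0.95]
print(summarize_segments(example_scores, threshold=0.5, min_len=2))
```

An analyst would then review only the returned intervals. In practice the threshold would likely need to vary with the analyst's interests, which reflects the challenge noted above.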

The SEI is also collaborating with researchers from the 711 Human Performance Wing at the Air Force Research Laboratory (AFRL), which focuses on human/machine interaction for analysts and how analysts exploit video. AFRL is providing insight into the practical needs of analysts, as well as access to video data to use in research. The SEI is also using publicly available infrared (IR) and electro-optical (EO) spectrum data from aerial platforms.

Beyond the Air Force and other military organizations that field tactical PED teams, federal and DoD agencies can use the algorithms developed by this SEI research for forensic analysis of video. In addition to monitoring a live video stream for a specific activity, such as the planting of an improvised explosive device in a road, analysts could use the video to track back and determine where the persons planting the device came from.

Forensic investigators can also look at video on a certain day and compare it to historical video. For example, if there were three vehicles at a compound yesterday and now there are none, a forensic investigator may want to determine where they went. Identifying what has changed over a period of time and producing a video that highlights that change has great potential for forensic analysis. Summarization can pick out changes that are evident between two videos and highlight the change for an analyst.
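The vehicle example can be sketched as a simple comparison of object counts between two observations. This is only an illustration, not the method used in this research. It assumes that an object detector has already produced a list of labels for each day, and the function name `count_changes` and the labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_changes(before: Iterable[str], after: Iterable[str]) -> Dict[str, int]:
    """Return the change in count (after minus before) for each object label.

    Labels whose count did not change are omitted, so the result contains
    only what an analyst would need to look at.
    """
    before_counts = Counter(before)
    after_counts = Counter(after)
    labels = set(before_counts) | set(after_counts)
    return {
        label: after_counts[label] - before_counts[label]
        for label in sorted(labels)
        if after_counts[label] != before_counts[label]
    }


# Hypothetical detector output for the same compound on two days.
yesterday = ["vehicle", "vehicle", "vehicle", "building"]
today = ["building"]
print(count_changes(yesterday, today))  # {'vehicle': -3}
```

A count of -3 for vehicles flags the change worth investigating. Determining where the vehicles went would still require tracking them through the intervening footage.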

Critical to success is making sure that the summarizations do in fact reduce the workload of the analysts and provide them with useful information that can be acted upon. A challenge for any automated system is to ensure that its users perceive the system to be helpful and to reduce, rather than increase, the user's cognitive load. Our colleagues at AFRL are helping us to understand clearly the video analyst's needs. We hope to visit sites where analysts work and determine the usefulness of the information that the summarizations provide. A long-term goal of the project is to provide augmented displays for analysts in real time, feeding them relevant information, such as what happened before at the same location.

Future Outlook

Although the SEI is only in the early stages of this research, the work has significant potential to mitigate the ever-increasing workload of video analysis. In the longer term, if the SEI and our collaborators can make progress in having machines successfully identify significant activities and events in DoD videos, systems can be built that alert personnel to individual instances or even combinations of activities and events. Even further down the road, it may be possible to recognize and search for patterns of life across multiple videos, with the ultimate goal of predicting future activities and events.

Additional Resources

Read the SEI Blog Post Experiences Using Watson in Software Assurance.

SEI blog posts on artificial intelligence
