
Why Did the Robot Do That?

Stephanie Rosenthal

The growth and change in the field of robotics over the last 15 years have been tremendous, due in large part to improvements in sensors and computational power. These sensors give robots awareness of their environment, including conditions such as light, touch, location, distance, proximity, sound, temperature, and humidity. The increasing ability of robots to sense their environments makes them an invaluable resource in a growing number of situations, from underwater exploration to hospital and airport assistance to spacewalks. One challenge, however, is that uncertainty persists among users about what the robot senses, what it predicts about its own state and the states of other objects and people in the environment, and what it believes the outcomes of its actions will be. In this blog post, I describe research that aims to help robots explain their behaviors in plain English and offer greater insight into their decision making.

What is a Robot?

It is important to begin with a common definition of what constitutes an autonomous robot. I think of an autonomous robot as having three components:

  • Sensing: Many technologies today, including smartphones and watches, have sensors in them. So what makes robots different?
  • Decision making: They take the sensor information and decide what to do about it. I do not count remote-controlled machines as robots because they are not making decisions themselves.
  • Action on the external world (e.g., moving through an environment, manipulating objects): Although a clothes dryer has sensors and even makes decisions about how long to dry clothes, I do not consider it a robot because it does not act outside of its container.

A robot performs actions based on what its sensors and decision-making code tell it to do, and while it may execute exactly as instructed, the result may not be what you expect. For example, a robot that navigates office spaces may be programmed to avoid hitting chairs yet hit one anyway. There are many possible reasons for hitting the chair that a robotics expert might investigate. Although the robot is programmed to avoid chair legs, it may not be able to detect the specific chair legs in its environment because they are narrow and hard to sense.

Alternatively, the robot could sense that some things are in its path but not know that they are chair legs that should be avoided. Finally, there could be problems with the actions. The robot may correctly sense the chair legs as chair legs and attempt to avoid hitting them, but its wheels may slip or it may be traveling too fast and hit them anyway.

Given so many possibilities for what can happen, there are many times when robotics experts and robot users alike ask, "Why did the robot do that?" Did the robot mean for that action to happen (e.g., hitting a chair as it navigates)? Was it a failure? Could the robot sometimes sense the chair but not today? Or does the robot even know that it is a chair leg? Is there some other condition that was not handled in the code?

The Importance of Trust

If a robotic system takes actions that users do not understand, how can the users know whether the robot will succeed in its task? How can they trust the robot to perform its job without constant supervision? And, if they choose not to watch it, how do they know what happened and what the robot experienced as it moved by itself (a problem detailed in our recently published papers, Verbalization: Narration of Autonomous Robot Experience and Dynamic Generation and Refinement of Robot Verbalization)? Understanding what happened on the robot is paramount to being able to trust it enough to let it run autonomously in the future.

In extreme cases, such as search-and-rescue operations, users who often do not have extensive knowledge of robots depend on them to execute a potentially life-saving but risky task on their behalf. If users do not understand the reasoning behind the actions the robot took, they may not trust it. Moreover, if they cannot trust the robots to perform those important tasks, they are more likely to forgo the automation and perform the task themselves, risking their own lives. Even in the best-case scenario, in which the search-and-rescue personnel continue using the robot, they may watch it perform tasks rather than applying their effort to more urgent tasks.

Robot experts build trust in their robots by looking through log files, which is incredibly tedious. If a perceived or actual error occurs, they have to debug their robots using the same logs, which is time-consuming and prone to misunderstanding. Robot users who have no access to the logs are rarely given insight into their robot's actions. As a result, users have no real way to gain trust in their robots other than by watching the behavior and inferring causality.

Our aim with this research is to have robots respond in natural language to queries about their autonomous choices, including their routes taken and possible error conditions. Specifically, we want to explore ways for robots to verbalize (an analogy to visualization) their experience via natural language to reduce the uncertainty that robot experts and users alike have about their robot's behaviors and hopefully increase trust.

Explaining Robot Actions in Natural Language

I chose a natural language approach because it would enable robots to express more detail than other interfaces, such as light arrays. Natural language, however, introduces more complexity into how we represent and talk about robot actions. For example, a robot expert may want more or different information about the robot behavior than the typical user, which may require a different vocabulary.

A car GPS system is a good example. GPS system scientists may be interested in actual GPS coordinates as well as how the system decided to route them through their city. Users with knowledge of the city care more about road names than GPS coordinates, but they may also want to understand the reasoning behind why the GPS system produces one route versus another if they do not typically take the route suggested. A user who is new to town may just want road names and will not care about the route decisions. Vocabularies of road names and GPS coordinates, along with many possible route preferences (no toll roads, few left turns, etc.) are given to the system ahead of time for it to use when giving driving directions.

Looking at the GPS system another way, users familiar with a city should know the routes to particular places that they visit often. While it is not done today, a GPS system could tell these users to head towards their house and then give more instructions from a user's home to the destination. This approach reduces the number of unnecessary instructions. Someone traveling to a new part of town or traveling within a new town, however, would need full-route instructions. Route instruction length is a user preference that could be learned in addition to preferences about toll roads and turns.
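
Although today's GPS systems do not do this, the idea is simple to sketch. The toy Python below is my own illustration, not an actual GPS algorithm; the road names and the familiarity model are hypothetical. It drops the prefix of a route the user already knows and summarizes it as a single landmark instruction:

```python
# Minimal sketch (hypothetical roads): shorten directions by skipping the
# prefix of a route that the user already knows how to drive.

def abbreviate_route(route, known_roads):
    """Return instructions, summarizing the longest known prefix of the route."""
    skip = 0
    for road in route:
        if road in known_roads:
            skip += 1
        else:
            break
    if skip == 0:
        return route                        # unfamiliar territory: full route
    return [f"Head toward {route[skip - 1]}"] + route[skip:]

route = ["Home St", "Main St", "5th Ave", "Oak Blvd"]
known = {"Home St", "Main St"}              # roads the user drives every day
print(abbreviate_route(route, known))
# ['Head toward Main St', '5th Ave', 'Oak Blvd']
```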

Robot route explanations can be thought of in the same way as GPS directions. However, the locations robots travel to, the reasons for choosing their routes, and the preferences for the types of information to give are less clear than for GPS. All these factors may influence a user's understanding of the explanation and their trust in the robot. To summarize, the goals of our two-year project are to

  • translate robot actions, which are written in code, into English (a minimal sketch follows this list)
  • create algorithms for robots to generate explanations from those translations, tailored to user preferences for the types of information they receive
  • understand how natural language explanations improve trust in robots
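
To make the first goal concrete, below is a minimal Python sketch of what translating a logged robot path into English could look like. It is my own simplified illustration, not our actual system: the landmark map, coordinates, and sentence template are all hypothetical.

```python
# Minimal sketch (hypothetical map and template): translate a logged robot
# path, recorded as (x, y) waypoints, into a plain-English summary.

# Hand-labeled landmarks: regions of the map and the names people use for them.
LANDMARKS = {
    (0, 0): "the charging dock",
    (5, 0): "the main hallway",
    (5, 8): "office 211",
}

def nearest_landmark(point):
    """Return the landmark location closest to an (x, y) waypoint."""
    return min(LANDMARKS,
               key=lambda lm: (lm[0] - point[0])**2 + (lm[1] - point[1])**2)

def verbalize_path(waypoints):
    """Summarize a waypoint sequence as one English sentence."""
    names = []
    for wp in waypoints:
        name = LANDMARKS[nearest_landmark(wp)]
        if not names or names[-1] != name:   # drop consecutive duplicates
            names.append(name)
    middle = ", ".join(names[1:-1]) or "nowhere else"
    return f"I started at {names[0]}, went through {middle}, and stopped at {names[-1]}."

print(verbalize_path([(0, 1), (4, 0), (5, 7)]))
# I started at the charging dock, went through the main hallway, and stopped at office 211.
```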

We take a three-phase approach to this problem. First, as described in the section on crowdsourcing natural language below, we collect translations of actions into English from "crowds" of people online. These crowds are willing to do short studies for a small amount of money, which allows us to gather a wide variety of ways to explain robot actions in English. We can then mine this data for language patterns and synonyms to form the vocabulary that the robot uses for its own explanations. We also study the ways that people may change their preferences for the types of information in the explanations. Finally, we create algorithms that use these patterns and preferences to generate the explanations, and we study how the explanations affect users' trust and their ability to understand and generalize robot behavior.

I am collaborating with Siddhartha "Sidd" Srinivasa, Finmeccanica Associate Professor in Computer Science at The Robotics Institute at Carnegie Mellon University, and with Manuela Veloso, who leads the Machine Learning Department in Carnegie Mellon University's School of Computer Science.

Crowdsourcing Natural Language

Just as the GPS device can talk about GPS coordinates or road names, we are interested in understanding the vocabulary that people want to use with robots. There are many different kinds of robots, so it is impossible to come up with all of the vocabulary ourselves. Instead, we create small tasks for people online--the crowd--to perform and collect each of their explanations of the task to extract the words and patterns of language that they use. For example, we collected 1,400 explanations for what block a robot arm should pick up. Our crowd told us to talk about synonyms of blocks--such as cubes and boxes; colors of blocks; groups of blocks; and patterns in blocks, such as lines and circles--to help them disambiguate which block will be picked up. Similarly, an office robot may talk about offices and hallways and/or corridor numbers.
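
As a toy illustration of how such a corpus can be mined, the Python below tallies the object words people used for the target. The sentences and the synonym set are made up for the example; this is not our 1,400-explanation dataset or our actual mining pipeline.

```python
# Minimal sketch (made-up sentences): tally the words crowd workers use to
# refer to the object being manipulated, to build the robot's vocabulary.
from collections import Counter

explanations = [
    "pick up the blue cube on your left",
    "grab the leftmost blue box",
    "take the second blue block from the left",
    "lift the blue block at the end of the line",
]

object_words = {"cube", "box", "block"}    # candidate synonyms to count

counts = Counter(
    word
    for sentence in explanations
    for word in sentence.lower().split()
    if word in object_words
)
print(counts.most_common())   # [('block', 2), ('cube', 1), ('box', 1)]
```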

Some natural language explanations may be easier to understand than others, so we need a way to filter for the good ones. For example, our dataset contains sentences such as "Pickup the blue block that is second farthest to the left" and "Of the three blue blocks that are towards your left, pick the one in the middle." It is not clear which is easier to understand.

We ask a new crowd to try to understand each sentence and tell us what the robot will do. Each sentence may be tested on many new people. The more accurate the new crowd is in understanding the sentence, the better it is. For example, when talking about blocks on a table, the explanations that include references to left and right without indicating whose left and right, the robot's or the user's, are more difficult to understand.
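
One straightforward way to turn those answers into a filter is to score each sentence by the fraction of readers who picked the block it actually referred to. Here is a sketch of that scoring, with hypothetical block IDs and worker answers rather than our real study data:

```python
# Minimal sketch (hypothetical answers): score each candidate explanation by
# how often a fresh crowd of readers identified the block it actually meant.

def comprehension_score(intended_block, crowd_answers):
    """Fraction of workers whose chosen block matches the intended one."""
    correct = sum(1 for answer in crowd_answers if answer == intended_block)
    return correct / len(crowd_answers)

# Each sentence was shown to several new workers, who each picked a block.
results = {
    "Pick up the blue block second farthest to your left": ["b2", "b2", "b2", "b1"],
    "Pick up the block on the left": ["b2", "b3", "b1", "b3"],
}
scores = {sentence: comprehension_score("b2", answers)
          for sentence, answers in results.items()}
print(max(scores, key=scores.get))   # the less ambiguous sentence, at 0.75
```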

We create algorithms to generate new explanations using the good vocabulary and the patterns of language (e.g., using perspective taking words such as my left and your right instead of just left and right). Our block explanation algorithm searches through all combinations of blocks to find a sentence that is similar to the ones that the crowd was most accurate at understanding.
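
A toy version of that search might enumerate candidate descriptions of the target block and score each against the patterns that correlated with high crowd accuracy. The candidates and weights below are hypothetical, chosen only to show the shape of the idea:

```python
# Minimal sketch (hypothetical weights): pick the candidate description that
# best matches the language patterns the crowd understood most accurately.

CANDIDATES = [
    "pick up the block on the left",
    "pick up the blue block on your left",
    "of the three blue blocks on your left, pick up the middle one",
]

def pattern_score(sentence):
    """Reward patterns correlated with high comprehension accuracy."""
    score = 0.0
    if "your left" in sentence or "my left" in sentence:
        score += 1.0                        # perspective words disambiguate
    if "blue" in sentence:
        score += 0.5                        # color references narrow the target
    score -= 0.01 * len(sentence.split())   # mild penalty for verbosity
    return score

print(max(CANDIDATES, key=pattern_score))
# pick up the blue block on your left
```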

Understanding User Preferences

While the first phase of our work focused on figuring out how people talk about what the robot is doing, the next question is what users prefer to hear. We started by modeling preferences along three parameters:

  • Abstraction: the vocabulary a robot uses to talk about its path or actions--for example, in the hallway versus an x, y GPS location.
  • Locality: the region the user is interested in.
  • Specificity: how much detail to provide.

For each preference setting, the robot can generate a new and different sentence, similar to how a GPS device might tell a person knowledgeable about a city to head toward home rather than giving turn-by-turn directions to a place they already know. We can then study how these preferences improve the usability of the explanations (e.g., because they are tailored to the user and possibly shorter for frequent users).
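
To show how those three parameters might drive generation, here is a toy Python sketch that renders the same logged route differently for different preferences. The route, place names, and parameter encodings are hypothetical, not our project's actual representation:

```python
# Minimal sketch (hypothetical route): render one logged route at different
# abstraction, locality, and specificity settings.

ROUTE = [
    {"xy": (12.1, 4.0), "place": "the kitchen", "region": "wing A"},
    {"xy": (20.3, 4.2), "place": "the hallway", "region": "wing A"},
    {"xy": (31.7, 9.8), "place": "office 211",  "region": "wing B"},
]

def verbalize(route, abstraction="landmark", locality=None, specificity="full"):
    """Describe a route using the preferred vocabulary, region, and detail."""
    steps = [s for s in route if locality is None or s["region"] == locality]
    if abstraction == "coordinates":
        names = ["({:.1f}, {:.1f})".format(*s["xy"]) for s in steps]
    else:
        names = [s["place"] for s in steps]
    if specificity == "summary" and len(names) > 2:
        names = [names[0], names[-1]]       # endpoints only
    return "I went from " + " to ".join(names) + "."

print(verbalize(ROUTE))                              # full landmark narration
print(verbalize(ROUTE, abstraction="coordinates"))   # expert-style output
print(verbalize(ROUTE, specificity="summary"))       # just the endpoints
print(verbalize(ROUTE, locality="wing A"))           # only the region of interest
```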

Piloting Our Approach with Search-and-Rescue Teams

As part of our research, we are collaborating with Dr. Joshua Peschel, assistant professor in Agricultural and Biosystems Engineering at Iowa State University (effective January 1, 2017). Dr. Peschel has developed computer vision, field robotics, and human factors-based methods to improve sensing and sense-making in agricultural, natural, and urban systems. Through a collaboration with Dr. Peschel's company, Senformatics, we will be deploying our explanations on his robot boats to understand how the explanations impact the trust of a variety of types of users including water rescue personnel and civil and environmental engineers monitoring the waterways.

Wrapping Up and Looking Ahead

As autonomous mobile robots continue to play an increasing role in our lives through smartphones, autonomous vehicles, and search-and-rescue operations, it is important to seek out ways to improve trust between users and robots. One method we are exploring is to enable robots to convert sensor data into natural language descriptions of their experiences and reasoning for human users. Moving forward, we are focusing on creating explanations that help people generalize the robot's experiences to what will happen in the future (if it turned left this time, it will turn left next time).

We welcome your feedback on this research in the comments section below.

Additional Resources

Verbalization: Narration of Autonomous Robot Experience, which I coauthored with Sai Selvaraj and Manuela Veloso.

Dynamic Generation and Refinement of Robot Verbalization, which I coauthored with Vittorio Perera, Sai P. Selvaraj, and Manuela Veloso.

Enhancing Human Understanding of a Mobile Robot's State and Actions Using Expressive Lights, which I coauthored with Kim Baraka and Manuela Veloso.

Spatial References and Perspective in Natural Language Instructions for Collaborative Manipulation, which I coauthored with Shen Li, Rosario Scalise, Henny Admoni, and Siddhartha S. Srinivasa.
