Operating at the Edge

Unlike traditional computing, where processing is mainly performed on local servers and in the cloud, edge computing pushes applications, data, and computing power to the edge of the Internet: to mobile devices, sensors, and end users. One of the main drivers for edge computing is that the number of devices connected to the Internet, and the volume of data being produced by those devices and used by governments and businesses, are growing far too quickly for traditional computing approaches, networks, and data-center infrastructures to accommodate.

The edge is therefore wherever a person is trying to use a software service. In an urban environment, the edge could be a first-tier cellular provider, a public Wi-Fi network, or a private-enterprise network. In these settings, access to resources in the cloud is simply assumed because connectivity is almost always available. However, there are other edge environments in which connectivity to the cloud cannot be assumed, such as environments in which first responders and military personnel operate.

As this post describes, the SEI is conducting applied research to develop innovative solutions, principles, and best practices for architecting, developing, and deploying systems that support teams operating in remote, resource-constrained locations away from central computing resources. This research includes using artificial intelligence and machine learning (AI/ML) at the edge for improved capabilities and mission support.

We are starting this blog series called “Operating at the Edge” to share concepts related to edge computing and our ongoing work and experiences architecting and developing edge systems. This blog post introduces concepts and challenges for Operating at the Edge.

The Humanitarian Edge

We refer to a disadvantaged edge environment in which first responders and other emergency personnel operate as the humanitarian edge. It is characterized by limited computational resources operating in unsafe conditions caused by weather events, seismic events, or even infrastructure failures, in which things can change at a moment’s notice. A more specific characterization is shown in the callouts in Figure 1 below.

Figure 1: The Humanitarian Edge: First Responders and Humanitarian Aid

First, the local network infrastructure will often be down or have limited connectivity, making cloud resources unavailable or unreliable. The computing resources that are available, typically handheld or vehicle-mounted, will be limited by available power, which could be a critical issue if fuel or other power resources are also down. Then, there is the context of the crisis; especially in natural disasters, first responders go in without knowledge of the actual situation on the ground. In other words, there is a discovery-and-triage process that requires constant shifting of resources to prioritize issues as they occur, especially when the need is medical.

As a result, first responders must rely on the resources that they carry into the situation, with only a loose plan of prioritization as issues are identified. Moreover, these first responders are typically underequipped and forced instead to rely on resources supplied over time or as circumstances allow. Therefore, the humanitarian edge is a challenging environment in which teams must maximize available resources (computational, network, and other) on an unknown timeline, for unknown issues, and continue operations regardless of resource exhaustion or the particular challenges posed by the cause of the disaster. Given the high number of unpredictable conditions, the humanitarian edge is characterized by its extreme uncertainty.

The Tactical Edge

The tactical edge is the field environment where military personnel execute missions (i.e., “boots on the ground”) and can range from air to ground to undersea, and from urban to rural settings. Figure 2 below shows a canonical example of military operations in a desert rural environment in which different types of military groups are present and performing actions.

Figure 2: The Tactical Edge: Warfighters and Military Assets

In these environments, warfighters are trained to bring whatever they need with them, to ensure that they have the resources that they need to perform their missions. As a result, while there may be some local infrastructure and resources, they are typically ignored in favor of military assets. However, the tradeoff is that military resources at the tactical edge are often limited, from power to computing resources to connectivity, and systems must operate in disconnected, intermittent, and low-bandwidth (DIL) network environments. While resources and limitations are known, the risk is always that the nature of tactical missions will result in resource needs that are not already provisioned, resulting in a disadvantage. Compared to other edge environments, such as the humanitarian edge, the tactical edge is much more planned, but in many cases there will be an adversary trying to thwart or sabotage the mission, which increases the need for systems to continue operating even when adversaries succeed in their efforts.

Challenges Operating at the Edge

Systems that are designed to operate at the humanitarian or tactical edge need to account for the following challenges intrinsic to these environments:

limited computing resources—Due to size, weight, and power (SWAP) limitations, devices deployed to the edge are limited in their computing resources, such as CPU/GPU power and memory. Accessing cloud resources may not be possible. To instead bring these capabilities to the edge, software engineering efforts must design for distributability, which is the ability to split system components across multiple computing nodes.
intermittent/denied network connectivity—Many computing capabilities are designed for “always-connected” networks and do not handle intermittent connections gracefully. Software written for tactical edge nodes must be robust enough to handle missions that operate in areas of limited network infrastructure, or where communications are actively denied because of adversary efforts or natural disasters.
limited attention—Both in humanitarian and tactical edge systems, operators must perform their missions under extremely stressful conditions with full attention and rapid decision-making, leaving little to no time to focus on additional devices and interfaces. Visual interfaces must be simple and uncluttered and should display only the most relevant information. Input devices should be simple, configurable, intuitive, and tactile. Information served to the operators must be focused on the most critical, timely, and relevant aspects of the mission.
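
To make the intermittent-connectivity challenge concrete, the following is a minimal Python sketch of a store-and-forward outbox, one common way to tolerate a link that comes and goes. It is an illustration under stated assumptions, not a description of any SEI system: the names Outbox and FlakyLink, the max_pending limit, and the choice to drop the oldest message when the buffer is full are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class Outbox:
    """Buffers messages while the link is down and delivers them in order later."""

    def __init__(self, send: Callable[[bytes], None], max_pending: int = 1000) -> None:
        self._send = send
        # Bounded buffer (assumption): when full, the oldest pending message is dropped.
        self._pending: Deque[bytes] = deque(maxlen=max_pending)

    def submit(self, message: bytes) -> None:
        """Queue a message, then try to deliver everything that is pending."""
        self._pending.append(message)
        self.flush()

    def flush(self) -> int:
        """Send pending messages oldest first and stop at the first failure.

        Returns the number of messages delivered on this attempt.
        """
        delivered = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # link is down; keep the message for the next attempt
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self) -> int:
        return len(self._pending)


class FlakyLink:
    """Hypothetical stand-in for a radio or network link that can drop out."""

    def __init__(self) -> None:
        self.up = True
        self.received: List[bytes] = []

    def send(self, message: bytes) -> None:
        if not self.up:
            raise ConnectionError("link unavailable")
        self.received.append(message)


if __name__ == "__main__":
    link = FlakyLink()
    outbox = Outbox(link.send)
    link.up = False
    outbox.submit(b"casualty report 1")  # buffered, not lost
    link.up = True
    outbox.submit(b"casualty report 2")  # triggers delivery of both, in order
    print(link.received)
```

The caller never blocks or fails when the link is unavailable, which also keeps the operator's attention on the mission rather than on connection state. A fielded system would additionally need persistence across restarts and a policy for which messages may be dropped.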

Quality Attributes for Systems at the Edge

A quality attribute (QA) is a measurable and testable property of a system that is used to indicate how well the system satisfies the needs of its stakeholders beyond the basic function of the system. A system’s ability to meet its desired quality attributes is substantially determined by its architecture. What follows is a list of quality attributes that are common to edge systems and that must be realized to address the challenges of edge environments. These are listed in no particular order; their priority is dictated by context and what are considered acceptable tradeoffs.

reliability—Reliability is the ability of a system to continue operation under fault conditions. It is a key quality attribute to consider in edge environments for continued operations. All system components need to be resilient to failure and have recovery strategies. For example, systems should be designed to reconnect automatically without locking up when the network disconnects. Temporary recovery strategies could be used, such as spinning up local services to replace unreachable remote services. System architects should also consider redundancy and alternative options for mission-critical capabilities.
privacy—Data collected at the edge could have personally identifiable information (PII), which is an especially critical concern in humanitarian situations, where the goals of saving lives and protecting an individual’s personal information should not be mutually exclusive. Privacy should therefore be built into components that clean data before sending it to the cloud, if available. Systems may need to do all of the processing on the edge node if PII data cannot be sent to the cloud, because of either policy restrictions or lack of user consent.
security—In tactical edge environments, computing nodes are operating near adversaries, who are often looking for opportunities to thwart or sabotage the mission, which makes security a paramount concern to successful operations. If edge nodes are operating on local networks, the assumption must be that adversaries are monitoring the network, and all data must be encrypted. If edge nodes are operating with sensitive or classified information, special zeroization functionality should be included so that data can be quickly destroyed if the node is compromised.
adaptability—Operations at the edge, whether they are disaster relief missions or active combat, often come with some level of uncertainty. Edge systems must be able to adapt to meet constantly shifting priorities. For edge systems to deal with uncertainty, engineers must design them with adaptability in mind. Edge software should be able to adapt during operations to both the data it is processing and the type of computation it is executing. An operator may not have the attention to adjust the software on-the-fly, so setting up specific modes for different phases of the mission can provide flexibility with simplicity for the operator.
scalability—Devices at the edge are often small and specialized. Hundreds or thousands of nodes may need to discover each other, connect, and share data, leading to scalability challenges. These nodes are often heterogeneous, including different hardware and software elements. In addition, tactical and humanitarian environments often have multiple entities, such as local populations or non-governmental organizations, with distinct devices and computing nodes. Resulting scalability challenges include how to get all the nodes to interoperate, how to process all of the data, and how to minimize load on the network.
interoperability—Given the breadth of organizations and devices performing tasks at the edge, it is also critical to build for interoperability, which is the degree to which two or more systems can usefully exchange meaningful information through interfaces. Taking advantage of messaging middleware can ease the process of systems integration. Using cross-platform network-serialization libraries, such as Protocol Buffers, together with interfaces that clearly define formats, can help prevent data-formatting errors.
survivability—Given the edge challenges, a system must be able to survive them. Survivability means that the system should continue to operate as well as possible despite damage to hardware (computing resources, power, or networking) or harm to the user. Survivability is closely related to reliability. However, the key distinction is that reliability is about a system adapting to a challenging environment to maintain capability. In contrast, survivability is about acceptance of physical resource loss and maintaining capability as well as possible with the resources available, which is where the notion of scaling down comes into play. Given a physical loss of computing resources, power, or networking, the system should adapt and maintain as much capability as possible, ideally prioritizing capabilities based on the current needs of the user.
distributability—Given the limited capability of edge devices, a system at the edge will be most capable if it takes advantage of nodes working together. Distributability means that a system can operate across multiple edge nodes in concert. The idea is that, given the assumption that devices in the edge environment will have limited resources, increased capability might be realized by distributing computation among the devices that are available. Implementing capabilities as microservices is an ideal architectural tactic to support this quality attribute. This approach involves developing a set of standardized microservices and deploying them across multiple devices on a locally connected network, depending on the needs of the moment. In this way, the resources that are currently available can collaborate on tasks, freeing the user from being restricted by the resources that they have physically on hand, especially if their personal devices are damaged or otherwise unavailable due to the challenges of the environment.
openness—Given the challenges at the edge and the historical mix of software and devices that are provided to soldiers and first responders, we cannot expect that one vendor or organization will solve all of these challenges. Openness means that the system must use well-defined, open, and well-supported interfaces and platforms so that any organization with an effective software or hardware solution can easily integrate with the system. It is our experience that when it comes to edge environments, different parties provide pieces of the solution and not a full solution. As such, enabling easy integration of components and using well-defined interfaces becomes a critical feature of any architecture. In these efforts, the capability to integrate in days or weeks vs. requiring weeks or months is key.
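
As one illustration of the scaling down described under survivability, the Python sketch below keeps the highest-priority capabilities that still fit a reduced power budget. Everything in it is an assumption made for the example: the Capability type, the capability names, the wattage figures, and the simple greedy policy, which is easy to reason about but does not always make the best use of the budget.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Capability:
    name: str
    priority: int       # lower number = more important to the current mission
    power_watts: float  # estimated power draw while running (illustrative)


def select_capabilities(capabilities: Iterable[Capability], budget_watts: float) -> List[str]:
    """Greedily keep the highest-priority capabilities that fit the power budget."""
    kept: List[str] = []
    remaining = budget_watts
    for cap in sorted(capabilities, key=lambda c: c.priority):
        if cap.power_watts <= remaining:
            kept.append(cap.name)
            remaining -= cap.power_watts
    return kept


# Hypothetical capability set for a vehicle-mounted edge node.
CAPABILITIES = [
    Capability("object_detection", priority=3, power_watts=15.0),
    Capability("messaging", priority=1, power_watts=2.0),
    Capability("video_relay", priority=4, power_watts=8.0),
    Capability("mapping", priority=2, power_watts=5.0),
]

if __name__ == "__main__":
    # Full power: everything runs. After losing a battery, only the essentials remain.
    print(select_capabilities(CAPABILITIES, budget_watts=30.0))
    print(select_capabilities(CAPABILITIES, budget_watts=10.0))
```

Because priorities are data rather than code, they could be changed per mission phase, which fits the point under adaptability about predefined modes that spare the operator from adjusting software on the fly.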

Context and Other Aspects of Edge Computing

The challenges of operating in edge environments become part of the context for architecting edge systems and also provide input for prioritization of quality attributes. For example, in a tactical edge system, security and distributability might be the top quality attributes if sensitive data is being collected and must be processed at the edge. In a humanitarian edge system, the top quality attributes might be interoperability and openness if multiple organizations are coming together to provide aid. In addition to context, the system might be constrained by other factors, such as existing legacy systems, heterogeneity of new and old hardware, and other critical system attributes that we haven’t covered here, including latency, maintainability, and evolvability.

Future posts in our series will focus on other aspects of edge computing, such as using AI/ML components as part of edge systems, service management and deployment, efficient data processing, and security when Internet of Things (IoT) devices are involved.

To learn more about our work in this area, please see the additional resources listed below.
