Challenges in Network Monitoring above the Enterprise
Recently George Jones and I attended USENIX Security '11. We hosted an evening Birds of a Feather (BoF) session where we asked a question of some significance to our CERT® Network Situational Awareness (NetSA) group:
Is Large-Scale Network Security Monitoring Still Worth the Effort?
One of the foundational principles behind most organizations' network security practices is still "defense in depth," which is implemented using a variety of security controls and monitoring at different locations in an organization's networks and systems. As part of a defense-in-depth strategy, it has become commonplace for organizations to build enterprise security operations centers (SOCs) that rely in part on monitoring the extremely large volumes of network traffic at the perimeter of their networks. There has been a recent trend toward increased investment in (and reliance on) network monitoring "above the enterprise" in order to simplify sensor deployments, decrease cost, and more easily centralize operations. At the same time, the idea of a well-defined defensible perimeter is being challenged by cloud computing, the insider threat, the so-called advanced persistent threat problem, and the prevalence of socially-engineered application-level attacks over network-based attacks. For an opinion piece about how things have changed, read Rik Farrow's article in the USENIX magazine ;login:.
The purpose of the BoF was to revisit some of the assumptions behind approaches to large-scale network monitoring at this level. We also wanted to lead a discussion about the challenges we face in monitoring, especially in light of these changes. We considered the following questions.
What problems do we confront when monitoring at the supra-enterprise level?
We discussed a number of challenges, many of which are the result of networks not being architected with "monitorability" as a priority. We also discussed the following factors:
- Everything in HTTP[S]
- NAT, proxies, tunneling
- Carrier-grade NAT/IPv4 islands
- Lack of knowledge of policy and assets
- Legal restrictions
What data can we expect to remain unencrypted?
We can expect that as more and more traffic is encrypted, we'll still be able to see the following data that must remain unencrypted in order for an IP network to function properly:
- IP headers (traffic summaries) - Packets have to be routed by the public infrastructure, which means that IP headers will remain unencrypted for the foreseeable future. This will enable various traffic analysis techniques. However, it's worth noting that tunnels (including IPv6) and anonymizing networks like Tor will affect what we see.
- DNS queries and responses - While DNSSEC deployment will mean that DNS responses will be digitally signed, we can expect that the content will remain unencrypted. This will enable analysis that will support the identification of new malicious domains and the detection of the use of DNS by malware.
- BGP and related routing protocols - Just as we can expect IP headers to remain unencrypted, we can expect BGP to remain in the clear.
In addition, there is other "global metadata" that can be combined with monitoring data and used for analysis. This metadata includes registration data (i.e., "whois" data), gTLD zone files, public certificates for certificate authorities, website reputation data, and RBLs.
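To make the point about IP headers concrete, here is a minimal sketch of what a sensor can still read from a packet whose payload is fully encrypted. It assumes the sensor hands us raw IPv4 packets as bytes; the function name `summarize_ipv4_header` and the choice of fields are illustrative, not part of any tool mentioned here.

```python
import socket
import struct

def summarize_ipv4_header(packet: bytes) -> dict:
    """Return the routing-visible fields from the fixed 20-byte IPv4 header.

    Hypothetical helper for illustration: it never touches the payload,
    so it works the same whether or not the payload is encrypted.
    """
    if len(packet) < 20:
        raise ValueError("packet shorter than a minimal IPv4 header")
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return {
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "protocol": protocol,              # e.g. 6 = TCP, 17 = UDP
        "ttl": ttl,
        "total_length": total_length,      # header + payload, in bytes
        "header_length": (ver_ihl & 0x0F) * 4,
    }
```

Summaries like this (who talked to whom, over which protocol, and how many bytes) are the raw material for the traffic analysis techniques discussed in this post.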
What can you still analyze at the supra-enterprise level?
Using traffic analysis techniques, we can see phenomena that appear as changes in traffic patterns. We identify these variations by developing indicators for the following:
- Worms, DDoS, floods, large-scale scans
- The scale and scope of global attacks (e.g., attacks against all banks)
- Detection based on locality (e.g., identifying traffic from a particular country)
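As one example of an indicator built from traffic summaries alone, the sketch below flags sources that contact an unusually large number of distinct destinations, a common signature of large-scale scanning. The record layout `(src, dst, dport)`, the function name `find_scanners`, and the threshold are assumptions chosen for illustration; a production detector would also account for time windows and known-benign scanners.

```python
from collections import defaultdict

def find_scanners(flows, min_targets=100):
    """Return {source: distinct_target_count} for sources at or above the threshold.

    `flows` is an iterable of (src_ip, dst_ip, dst_port) tuples, a simplified
    stand-in for real flow records. `min_targets` is an arbitrary cutoff.
    """
    targets = defaultdict(set)
    for src, dst, dport in flows:
        targets[src].add((dst, dport))
    return {src: len(seen) for src, seen in targets.items()
            if len(seen) >= min_targets}
```

Note that a busy client talking repeatedly to one server generates many flows but only one distinct target, so it is not flagged.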
A literature search on intrusion detection using traffic analysis will identify a variety of papers. For example, there are a number of papers in RAID proceedings. Some examples can also be found in the FloCon® proceedings, available at the CERT FloCon site.
Using a combination of traffic analysis, DNS, and (selective) content capture, we can develop heuristics that can function as indicators for the following:
- Spear phishing, spammers and botnets
- Malicious domains with DNS analysis (We have published a blog entry about this topic, and the USENIX Security proceedings also include a related paper.)
In general, analysis based on a broad view of network traffic remains invaluable as part of incident analysis. It provides a way to understand the traffic associated with a particular incident and to identify activity occurring elsewhere in the network that matches a particular pattern.
A broad view of DNS and our network's traffic also enables a whole class of analysis we might call "indicator expansion": various ways in which we can take a single indicator of malicious activity, like a single IP on a watch list, and find additional IPs also associated with the malicious activity of interest. This expansion can be based on a behavioral detection algorithm; for example, heuristics for enumerating the IPs of all the bots in a botnet. We can also often expand our watch list by leveraging DNS or other global metadata to associate an IP with a DNS name or a real-world entity, and to then map that entity back to additional IP addresses that we can add to our watch list.
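The DNS-based form of indicator expansion can be sketched in a few lines. This example assumes we have a collection of observed `(name, ip)` DNS answers (for instance, from passive DNS collection); the function name `expand_watch_list` and the data layout are hypothetical.

```python
from collections import defaultdict

def expand_watch_list(seed_ips, dns_records):
    """Expand a set of watch-listed IPs through shared DNS names.

    `dns_records` is an iterable of (name, ip) pairs observed in DNS
    responses. For each seed IP we look up the names that resolved to it,
    then add every other IP those names resolved to (one hop only).
    """
    ip_to_names = defaultdict(set)
    name_to_ips = defaultdict(set)
    for name, ip in dns_records:
        ip_to_names[ip].add(name)
        name_to_ips[name].add(ip)

    expanded = set(seed_ips)
    for ip in seed_ips:
        for name in ip_to_names.get(ip, ()):
            expanded |= name_to_ips[name]
    return expanded
```

A single hop is deliberate in this sketch: shared hosting and content delivery networks put unrelated names on the same IP, so expanding repeatedly without review can pull in large numbers of benign addresses.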
How are attacks changing?
One thing we can say for sure is that attacks are moving up the application stack. In addition to targeting ports, servers, and hosts, they now target applications like web browsers and PDF viewers, as well as users themselves. The attacker's goal is to be able to monitor the users and the assets they control. It's not entirely clear what we can rely on being visible at this level in the future.
There are several big questions that need to be answered in order to formulate a strategy for supra-enterprise monitoring:
- What kind of selective content capture should we be doing?
- At what point do we need a different monitoring approach (on hosts, systems, etc.)?
- How does the picture change at lower levels (e.g., enterprise and below)?
What are some monitoring techniques that can still work?
During the BoF session, we discussed the following techniques:
- Re-routing suspicious traffic to a place it can be monitored. This could include selective full-packet capture.
- Leveraging routers and switches to generate traffic summaries (NetFlow/cflowd, sFlow, etc.)
- Intelligent sampling
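"Intelligent sampling" can mean many things. One simple interpretation, offered here only as a sketch, is hash-based flow sampling: instead of keeping a random subset of packets, we hash each flow's 5-tuple and keep the flow only if the hash falls below a cutoff. Every packet of a kept flow is retained, so sampled flows stay complete. The function name `keep_flow`, the key format, and the use of SHA-256 are assumptions for illustration, not a description of any particular sensor.

```python
import hashlib

def keep_flow(src, dst, sport, dport, proto, rate=0.1):
    """Decide deterministically whether to keep a flow.

    The same 5-tuple always yields the same decision, so all packets of a
    flow are kept or dropped together. `rate` is the approximate fraction
    of flows retained (0.0 keeps nothing, 1.0 keeps everything).
    """
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    value = int.from_bytes(digest[:8], "big")   # uniform in [0, 2**64)
    return value < int(rate * 2**64)
```

A more selective scheme could raise `rate` for traffic involving watch-listed addresses and lower it for bulk traffic to well-known services.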
What about "the cloud"?
- We discussed how "the cloud" is a problem because we can no longer rely on being able to distinguish individual virtual host endpoints within a cloud infrastructure. This could be solved by ensuring that NAT does not happen before the monitoring point. One thought: assign IPv6 addresses to everything, no more NAT.
- Will Google, Amazon, and other vendors invest in the infrastructure required to do monitoring? Should this come standard with hosting services?
- Will cloud providers provide flow or monitoring data? Should this be standard practice? What about other monitoring options for your servers?
- Monitoring requirements could be incorporated into providers' terms-of-service agreements.
- What about cloud-to-cloud attacks? Could attackers provision EC2 machines to attack users on that platform?
What about mobile?
We finished the session with a brief discussion of mobile. In a world of 3G devices, we have the same endpoint issue as we do with the cloud. In the case of 4G, we can expect that it will be common to assign IPv6 addresses to the mobile device endpoints.
At the end of the session, one of the participants suggested ironically that as data moves to "the cloud" and users move to mobile devices using third-party networks, a larger percentage of the traffic that remains on corporate networks might actually be illegitimate, malicious, and otherwise unrelated to business purposes.
Continuing the discussion...
We hope to continue this discussion about exploring the ways that supra-enterprise network monitoring is changing, what techniques can be effective, and where new approaches are needed.