
Practical Math for Your Security Operations - Part 2 of 3

Hi, this is Vijay Sarvepalli, Security Solutions Engineer in the CERT Division, again. In my earlier blog post, I offered some ideas for applying set theory in your SOC (Security Operations Center). This time I introduce statistics, specifically standard deviation. Mathematical terms such as standard deviation can seem mysterious in daily security operations, so I've provided some simple examples to help you analyze network security data using this measurement.

In simple terms, standard deviation is a measure of how far the values in a data set spread from their mean. This measure can be used to evaluate the distribution or diversity of data. One practical use of standard deviation is to detect DNS clients that are vulnerable to DNS cache poisoning (CERT VU#800113). In this case, the objective is to look for predictability in the UDP client port numbers used when querying a DNS server.

We can use network flow data or, even better, passive DNS data to calculate the standard deviation and use the derived values to find unpatched systems that are vulnerable to a DNS cache poisoning attack.

The formula we use for calculating the standard deviation is shown below, where a standard deviation of a series (x) with a mean value of mu (μ) is σ:

Formula used for calculating the standard deviation.
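
Written out, and assuming the population form of the measure (the form that MySQL's STDDEV() function computes), the standard deviation of N sampled port numbers x_1, ..., x_N can be expressed as follows. The sample count N and the index i are notation introduced here for illustration.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \mu \right)^2}
```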

We will perform this calculation on network flow data in a SQL (Structured Query Language) database to detect machines with weak randomness in selecting source ports for DNS queries. A minimum of 100 samples of DNS requests from each machine is required to calculate the standard deviation. The UDP port numbers that can be used by a machine for a DNS query range from 1023 to 65535. A standard deviation of 15,000 or more indicates well-distributed (less predictable) port numbers used by the operating system. A standard deviation below 4,000 indicates poorly distributed port number selection by the operating system.
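
To get a feel for these thresholds, it may help to compare them with the standard deviation of a perfectly uniform choice over the whole port range, which works out to roughly 18,600. The sketch below is only an illustration: the function names are hypothetical, and the "inconclusive" label for values between the two thresholds is an assumption, since the text does not classify that middle band.

```python
import math
import statistics

PORT_MIN, PORT_MAX = 1023, 65535   # port range quoted in the text
MIN_SAMPLES = 100                  # minimum sample count quoted in the text
WELL_DISTRIBUTED = 15000           # threshold quoted in the text
POORLY_DISTRIBUTED = 4000          # threshold quoted in the text


def uniform_port_stddev(low=PORT_MIN, high=PORT_MAX):
    """Standard deviation of a discrete uniform distribution on [low, high]."""
    n = high - low + 1
    return math.sqrt((n * n - 1) / 12)


def classify_port_spread(ports):
    """Label a list of observed source ports using the thresholds above."""
    if len(ports) < MIN_SAMPLES:
        return "insufficient samples"
    sigma = statistics.pstdev(ports)   # population standard deviation
    if sigma >= WELL_DISTRIBUTED:
        return "well distributed"
    if sigma < POORLY_DISTRIBUTED:
        return "poorly distributed"
    return "inconclusive"              # assumption: middle band is not defined in the text


if __name__ == "__main__":
    print(round(uniform_port_stddev()))
    # A client that only ever uses a narrow block of consecutive ports
    print(classify_port_spread(list(range(39636, 40626))))
```

A client whose standard deviation approaches the uniform value is using most of the available range; one far below it is drawing from a small pool.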

Often, database software provides standard functions that can be used to calculate statistical measures such as standard deviation. The screenshot below is an example of a passive DNS portal using a MySQL query to search for the IP addresses of internal systems that query an external DNS server (8.8.8.8) and show a poor distribution of port numbers.

The SQL query used in this example is:

    SELECT sip AS 'DNS Client IP',
           COUNT(sip) AS 'Occurrence',
           AVG(sport) AS 'Ave',
           MAX(sport) AS 'Max',
           MIN(sport) AS 'Min',
           CEIL(STDDEV(sport)) AS 'Std. Dev'
    FROM dns
    WHERE dport = 53 AND dip = '8.8.8.8'
    GROUP BY sip
    HAVING COUNT(sip) > 100
    ORDER BY STDDEV(sport) ASC
    LIMIT 10;
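
If the flow records are available outside a database, the same aggregation can be approximated in a few lines of Python. This is a minimal sketch, not the portal's implementation: the function name and the (sip, dip, dport, sport) tuple layout are assumptions chosen to mirror the column names in the query.

```python
import math
import statistics
from collections import defaultdict


def rank_predictable_clients(flows, resolver="8.8.8.8", min_samples=100, limit=10):
    """Mirror the SQL query: group DNS flows by client and rank by port spread.

    `flows` is an iterable of (sip, dip, dport, sport) tuples (assumed layout).
    """
    ports_by_client = defaultdict(list)
    for sip, dip, dport, sport in flows:
        if dport == 53 and dip == resolver:          # WHERE clause
            ports_by_client[sip].append(sport)       # GROUP BY sip

    rows = []
    for sip, ports in ports_by_client.items():
        if len(ports) > min_samples:                 # HAVING count(sip) > 100
            sigma = statistics.pstdev(ports)         # population form, like STDDEV()
            rows.append({
                "client": sip,
                "occurrence": len(ports),
                "avg": statistics.fmean(ports),
                "max": max(ports),
                "min": min(ports),
                "stddev": math.ceil(sigma),
                "_sort": sigma,
            })

    rows.sort(key=lambda row: row["_sort"])          # ORDER BY stddev ASC
    for row in rows:
        del row["_sort"]
    return rows[:limit]                              # LIMIT 10
```

Clients at the top of the returned list have the narrowest spread of source ports and are the first candidates for a closer look.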

[Screenshot: passive DNS portal showing the query results, one row per DNS client IP address]

The above example lists the 10 IP addresses that show the most predictable port numbers when querying the external DNS server. Sometimes standard deviation can yield results that do not necessarily reflect the lack of randomness. In the example, the IP address 10.233.24.11 has a very predictable source port number (53) that is used many times. The reason for its appearance on the list is that if a probability distribution for a series strongly peaks at two points far apart, it can yield high standard deviation values.
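
The two-peak effect is easy to reproduce. The sketch below uses made-up data, not the values from the screenshot: a hypothetical client that sends half of its queries from port 53 and the other half from port 60000. The distinct-port count is included only as a simple cross-check and is not part of the original query.

```python
import statistics


def port_stddev(ports):
    """Population standard deviation of the observed source ports."""
    return statistics.pstdev(ports)


def distinct_ports(ports):
    """Number of different source ports actually used."""
    return len(set(ports))


# Hypothetical client: only two source ports, far apart from each other
two_peaks = [53] * 100 + [60000] * 100

if __name__ == "__main__":
    print(port_stddev(two_peaks))      # far above the 15,000 threshold
    print(distinct_ports(two_peaks))   # yet only two ports are in use
```

Such a client would look well distributed by standard deviation alone even though its port selection is trivially predictable, which is why the minimum and maximum columns in the query are worth reading alongside the standard deviation.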

As an additional math exercise, you can build a "box and whiskers" representation of this data to visualize the differences between two DNS client systems. In the chart below, the system shown on the left represents the server with the IP address 192.168.30.1 in the table above. This system uses a small fraction of its available pool of port numbers (1023-65535), effectively using only port numbers ranging from 39636 to 40625. The system shown on the right represents a system that uses a wide port range and is not susceptible to the DNS cache poisoning vulnerability.

[Figure: box-and-whiskers chart comparing the source port numbers used by the two DNS client systems]
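
A box-and-whiskers chart is drawn from five numbers: the minimum, the first quartile, the median, the third quartile, and the maximum. The sketch below computes them for two synthetic samples, which are assumptions made for illustration and not the data behind the chart: one client confined to ports 39636 through 40625, and one drawing ports at random from the full range.

```python
import random
import statistics


def five_number_summary(ports):
    """Return (min, Q1, median, Q3, max) for a list of source ports."""
    q1, median, q3 = statistics.quantiles(ports, n=4)
    return min(ports), q1, median, q3, max(ports)


# Synthetic client confined to the narrow range mentioned in the text
narrow = list(range(39636, 40626))

# Synthetic client choosing ports at random across the full range
rng = random.Random(42)
wide = [rng.randint(1023, 65535) for _ in range(1000)]

if __name__ == "__main__":
    for label, sample in (("narrow", narrow), ("wide", wide)):
        low, q1, median, q3, high = five_number_summary(sample)
        print(f"{label}: min={low} Q1={q1} median={median} Q3={q3} max={high}")
```

The box spans Q1 to Q3, so the narrow client produces a thin box squeezed into a small part of the port range, while the wide client's box covers a large share of it.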

My next blog post will explore entropy, a measure of uncertainty that can similarly be used to detect anomalous behavior in network data. Finally, I will highlight other useful topics, such as algebra, number theory, and probability theory, that can help equip your security analysts to make better use of the data in your SOC.

If you have any questions or comments, contact us at netsa-contact@cert.org.
