Improving Security in the Latest C Programming Language Standard
Buffer overflows--an all too common problem that occurs when a program tries to store more data in a buffer, or temporary storage area, than it was intended to hold--can cause security vulnerabilities. In fact, buffer overflows led to the creation of the CERT program, starting with the infamous 1988 "Morris Worm" incident in which a buffer overflow allowed a worm entry into a large number of UNIX systems. For the past several years, the CERT Secure Coding team has contributed to a major revision of the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) standard for the C programming language. Our efforts have focused on introducing much-needed enhancements to C and its standard library to address security issues, such as buffer overflows.
These security enhancements include (conditional) support for bounds-checking interfaces, (conditional) support for analyzability, static assertions, "no-return" functions, support for opening files for exclusive access, and the removal of the insecure gets() function. This blog posting explores two of the changes--bounds-checking interfaces and analyzability--from the December 2011 revision of the C programming language standard, which is known informally as C11 (each revision of the standard cancels and replaces the previous one, so there is only one C standard at a time).
I work on the CERT Secure Coding team, where I've made technical contributions to the definition of the new C features for addressing security. I've also chaired Task Group PL22.11 (programming language C) of the International Committee for Information Technology Standards (INCITS), representing the United States. Working with SEI colleagues Robert C. Seacord and David Svoboda, I helped develop, refine, and introduce many of the security enhancements to this major ISO standard revision.
Bounds-Checking Interfaces
Until the latest update of the C standard, its security features had been limited to the snprintf() function, which was introduced in 1999 and whose implementations have some quirks. Previous iterations of the C library contained functions that did not perform automatic bounds checking. Instead, the library functions assume that programmers provide output character arrays large enough to hold the result, and they leave it to the calling code to detect and report a failure when an array is too small.
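As a rough illustration of the bounds checking that snprintf() already offered, the sketch below detects truncation by inspecting the return value. The helper name format_user() and the "user=%s" format are assumptions made up for this example; only the snprintf() behavior comes from the standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of any library): formats "user=<name>"
 * into dst, which has room for dst_size bytes. Returns true only when the
 * complete result, including the terminating null character, fit. */
bool format_user(char *dst, size_t dst_size, const char *name)
{
    /* snprintf() never writes more than dst_size bytes. It returns the
     * length the full result would have had (not counting the null), or a
     * negative value on an encoding error, so truncation shows up as
     * needed >= dst_size. */
    int needed = snprintf(dst, dst_size, "user=%s", name);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With an 8-byte buffer, format_user(buf, sizeof buf, "al") fits exactly, while a longer name such as "alice" is truncated to "user=al" and reported as a failure. The caller still has to remember to test the result, which is one reason the function alone was not a complete answer.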
The C standard now includes library extensions that can help mitigate security vulnerabilities, including bounds-checking interfaces. For example, the strcpy() function does not check the bounds of the array into which it copies. A buffer overflow will therefore occur if a programmer uses strcpy() to copy a string into an array that is too small to hold it and does not explicitly check the bounds of the array before making the call to strcpy().
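The explicit check that strcpy() leaves to the programmer might look like the following sketch. The helper name checked_copy() is hypothetical; it simply refuses the copy when the source string, plus its null character, would not fit in the destination.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function): copies src into dst only
 * when the whole string and its terminating null character fit in
 * dst_size bytes. On failure dst is left untouched. */
bool checked_copy(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);   /* first pass over src: measure it */

    if (len >= dst_size) {
        return false;           /* strcpy() here would overflow dst */
    }
    strcpy(dst, src);           /* second pass over src: copy it */
    return true;
}
```

Note that the string is scanned twice, once by strlen() and once by strcpy(). Every call site must also repeat this pattern correctly, and a single omission is enough to reintroduce the overflow.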
One remedy to the strcpy() problem is to use the strncpy() function, which takes a bound but does not terminate the result with a null character (whose value is 0) if there is insufficient space. An unterminated array creates a vulnerability of its own, because later code that treats the array as a string can read or write past its end, overwriting other data and program structures. This buffer overflow vulnerability can be (and has been) exploited to run arbitrary code with the permissions of the defective program. If the programmer writes runtime checks to verify lengths before calling library functions, then those runtime checks frequently duplicate work done inside the library functions, which discover string lengths as a side effect of doing their job. The new bounds-checking interface provides strcpy_s(), a more secure string copy function that not only checks the bounds of the array that it is copying into, but also ensures that the string is terminated by a null character.
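The difference between strncpy() and strcpy_s() can be sketched as follows. Because the bounds-checking interfaces are conditionally supported, the code tests the __STDC_LIB_EXT1__ macro and otherwise falls back to a small emulation. The helper names strncpy_terminated() and copy_string() are hypothetical, and the fallback is only an approximation of the outcome strcpy_s() is specified to produce, not a replacement for it. On an implementation that does provide strcpy_s(), a violation first invokes the runtime-constraint handler, whose default behavior is implementation-defined.

```c
#define __STDC_WANT_LIB_EXT1__ 1  /* request the optional bounds-checking interfaces */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: runs strncpy() and reports whether dst ended up
 * holding a null-terminated string within its dst_size bytes. */
bool strncpy_terminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size);
    return memchr(dst, '\0', dst_size) != NULL;
}

/* Hypothetical wrapper: returns 0 on success and nonzero when src does not
 * fit in dst. On failure dst is set to the empty string. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
#ifdef __STDC_LIB_EXT1__
    /* The real interface, available only where the library provides it. */
    return (int)strcpy_s(dst, dst_size, src);
#else
    /* Fallback sketch (an assumption for this example, not library code). */
    if (dst == NULL || dst_size == 0) {
        return 1;
    }
    if (src != NULL) {
        for (size_t i = 0; i < dst_size; i++) {
            dst[i] = src[i];
            if (src[i] == '\0') {
                return 0;       /* copied the terminator: success */
            }
        }
    }
    dst[0] = '\0';              /* never leave an unterminated array */
    return 1;
#endif
}
```

Copying "abcdef" into a 4-byte array with strncpy() fills all four bytes and leaves no terminator, whereas copy_string() reports an error and leaves an empty, terminated string behind.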
Analyzability
Another aspect of the C programming language we focused on in C11 is analyzability, which deals with so-called "undefined" behavior. Undefined behavior arises when a programmer uses a nonportable or erroneous program construct or erroneous data for which the C standard does not impose a requirement. The C standard leaves several areas of the C language undefined because the behavior in those areas depends on compiler implementation details. An example is signed integer overflow. Different hardware behaves differently on signed integer overflow, so trying to make the language mandate one method of dealing with it would negatively affect performance on some systems because the standard behavior would not match what the hardware does.
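Because the result of signed integer overflow is undefined, portable code has to rule it out before performing the arithmetic. The sketch below shows one common way to do that for addition; the helper name add_checked() is an assumption made for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *result and returns true, or
 * returns false (leaving *result unchanged) when the sum would not fit
 * in an int. The test is done before the addition, so the overflowing
 * operation is never executed. */
bool add_checked(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b)) {   /* would go below INT_MIN */
        return false;
    }
    *result = a + b;
    return true;
}
```

For instance, add_checked(INT_MAX, 1, &r) returns false instead of relying on whatever the hardware happens to do.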
There are many areas in which the standard makes accommodations for various kinds of hardware, and they are all lumped together into the undefined behavior category. Since the C standard doesn't constrain how a compiler implements undefined behavior, it could conceivably do anything, such as cause the machine to halt and catch fire, though compiler writers who do this might not find many professional programming customers!
We examined this issue and realized that in practice, there are two categories of undefined behavior:
- behavior for which we really cannot say what will happen, such as storing data outside the bounds of an object, and
- behavior where the implementation really should do something reasonable, such as signed integer overflow.
We created the Analyzability Annex in C11, in which we labeled the former behavior "critical undefined behavior," indicating that the consequences could be serious. The latter category we called "bounded undefined behavior," because we can say with certainty that nothing unpredictable should be allowed to happen as a result.
The category of critical undefined behavior is a small subset of undefined behavior, which means that most undefined behavior becomes bounded. We didn't have to change the spirit of the C language to do this, because all we did was specify that bounded undefined behavior is not allowed to store data outside the bounds of an object. In the example of signed integer overflow, this means the compiler runtime implementation could choose to return some reasonable result, cause a trap that terminates the program, or simply print a message and move on. As long as it does not perform an out-of-bounds store, anything is permissible.
The bounding of undefined behavior allows analysis tools to know that a C program will not have unpredictable behavior except in a very small set of circumstances, which is why we called it the Analyzability Annex.
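The distinction can be made concrete with a short sketch. An implementation that conforms to the Analyzability Annex defines the macro __STDC_ANALYZABLE__, which a program can test. Whether or not it is defined, an out-of-bounds store remains critical undefined behavior, so the store below is guarded. The table, its size, and the function names are all hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOT_COUNT 8            /* hypothetical table size */
static int slots[SLOT_COUNT];   /* hypothetical storage for the example */

/* Reports whether the implementation claims conformance to the
 * Analyzability Annex by defining __STDC_ANALYZABLE__. */
bool implementation_is_analyzable(void)
{
#ifdef __STDC_ANALYZABLE__
    return true;
#else
    return false;
#endif
}

/* Stores value at index only when index is inside the table. Writing
 * outside the table would be an out-of-bounds store, which is critical
 * undefined behavior even on an analyzable implementation. */
bool store_slot(size_t index, int value)
{
    if (index >= SLOT_COUNT) {
        return false;
    }
    slots[index] = value;
    return true;
}

/* Reads a slot back, returning 0 for an index outside the table. */
int load_slot(size_t index)
{
    return index < SLOT_COUNT ? slots[index] : 0;
}
```

Many widely used compilers do not define __STDC_ANALYZABLE__, so code should not assume the bounded guarantees unless the macro is present.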
Other Areas of Research
While our work to date has focused on the ISO C standard and helping programmers prevent critical undefined behaviors, the CERT Secure Coding team has also been working on the CERT C Secure Coding Standard, which contains a set of rules and guidelines to help programmers code securely. Those guidelines, which will be the subject of an upcoming blog post, leverage our work on the ISO standard to help programmers avoid undefined behavior, as well as behavior that programmers might not have expected when writing their code.
The CERT C Secure Coding Standard also serves as a foundation for the Source Code Analysis Lab (SCALe), which is our software auditing service that can be used to find vulnerabilities and weaknesses in any codebase. SCALe uses a suite of static analysis and dynamic analysis tools to find vulnerabilities in a codebase, based on the patterns and guidelines defined in the CERT C Secure Coding Standard.
For more information about the new ISO standard for the C programming language, please visit
The C standard is available for purchase in the ANSI Web Store.
For more information about the work of the CERT Secure Coding Team, please visit
For more information on the CERT Source Code Analysis Lab (SCALe), please visit