Obsidian: A New, More Secure Programming Language for Blockchain
Billions of dollars in venture capital, industry investments, and government investments are going into the technology known as blockchain. It is being investigated in domains as diverse as finance, healthcare, defense, and communications. As blockchain technology has become more popular, programming-language security issues have emerged that pose a risk to the adoption of cryptocurrencies and other blockchain applications. In this post, I describe a new programming language, Obsidian, which we at the SEI are developing in partnership with Carnegie Mellon University (CMU) for writing secure smart contracts on blockchain platforms.
Quick Recap: Bitcoin and Blockchain
In an earlier blog post, What Is Bitcoin? What Is Blockchain?, I introduced blockchain, which is simply a distributed ledger that tracks transactions among parties. Its appeal lies in its fundamental properties, which apply to every transaction:
- All parties agree that the transaction occurred.
- All parties agree on the identities of the individuals participating in the transaction.
- All parties agree on the time of the transaction.
- The details of the transaction are easy to review and not subject to dispute.
- Evidence of the transaction persists, unchangeable, over time.
This combination of properties results in a system that, by design, timestamps and records all transactions in a secure and permanent manner and is easily auditable in the future. In addition, due to its distributed nature, the system is highly resilient to downtime and is unlikely to crash. All these properties combined make blockchain an appealing system for a wide variety of applications, and explain much of the interest in the technology.
The best-known application for blockchain to date is its use in a cryptocurrency called Bitcoin. Bitcoin relies on blockchain technology, but it is important to appreciate that not all blockchains are Bitcoin; blockchain is the underlying technology, not a specific end-user application. Moreover, many companies have created altcoins and are using initial coin offerings (ICOs) as a method of raising capital, but this is also not a requirement for blockchain technologies. Fundamentally, you can think of a blockchain like a computer: a piece of infrastructure on which software can run. This infrastructure can be public-facing, like computers hosting websites, or private, such as servers hosting proprietary software.
Following the Bitcoin example, the Bitcoin software is fairly limited in its capability, being able to do only two tasks: send transactions and create blocks. Other blockchains such as Ethereum are more sophisticated in that they can run virtually any program that a developer can create. Instead of tracking only account balances, Ethereum makes it possible to install fully-functional programs on the blockchain, which enables much broader applicability.
The Increasing Importance of Security
There is a lot of interest in blockchain within the defense community, in the Defense Advanced Research Projects Agency (DARPA) and other DoD organizations. The applications include financial management of the National Income and Product Accounts (NIPA) system, which transfers money among government agencies; a Small Business Innovation Research (SBIR) call went out seeking help with this system.
While the blockchain properties of transparency, resiliency, and forgery resistance are highly desirable in finance and currency applications, many other areas are attracting government attention as well. One specific area of intense focus has been supply chain management. The auditability of blockchain lends itself to tracking elements in a supply chain, even multiple layers deep in the chain. Moreover, the identity aspect of blockchain makes it possible to tell, for example, that one supplier bought from a second supplier, who in turn bought from a third, and so on. Such "chain of custody" information would usually be hard to find and track, but blockchain technology makes this tracking much more transparent.
We are also seeing applications in simple automation where people are using blockchain for standard processes. One of the well-touted adopters is the General Services Administration (GSA), which took its standard process for onboarding new government contractors, put a component of it on a blockchain, and released it. That is one of the first real government applications of a blockchain technology.
As experimentation proceeds, many government agencies are approaching blockchain with caution after having learned hard lessons from previous attempts to bring new technology into government. Blockchain failures have been public and visible, and these failures underscore the need for caution in mission-critical applications.
Limits of Current Practice
Despite all the above, blockchain is still a technology in its infancy. The tools available to develop blockchain applications are widely regarded as being hard to use and easy to use incorrectly. Probably the most notorious example was the blockchain-based venture capital fund known as the DAO, or decentralized autonomous organization. Individuals joined this group simply by adding cryptocurrency to the DAO's "pot," so to speak. Distribution of money was then accomplished through voting on a blockchain. Famously, because of a programming error, someone was able to siphon the money out, and the DAO lost tens of millions of dollars. After that incident, it became evident that the programming language the programmer had used was hard for developers to use correctly and made it too easy to make such a critical mistake.
In addition to the DAO theft, there have been others. People have a hard time writing correct programs in the languages that are currently in common use. One such language is Solidity, which has been shown to be insecure by the DAO example and others. While other blockchain platforms use more standard languages, such as Go or Java, the development of blockchain-based applications requires programmers to use different design patterns than are typically used in programming. It is easy to make a critical mistake simply through lack of knowledge.
The Development of Obsidian
In our work, we looked at the set of application domains that people are interested in using for blockchain, and we observed some commonalities. We seek to design a language to provide stronger guarantees than would be available when writing these same applications in Solidity, Go, or Java. Our work was motivated by two general principles:
- The language should explicitly force the developer to consider blockchain-specific design at every stage of development.
- Bugs should be caught earlier, at compile time, rather than after the system is deployed, by which time it might be too late to fix them.
The current standard practice for programming in Solidity is to provide an escape hatch for an error condition. A third party that all participants trust is given access to all of the resources in the contract, and, in the event of an error that was unanticipated by the author of the code, all the money is transferred to that third party. The rationale is that transferring to the third party is better than losing the money altogether if that trusted third party acts in a trustworthy way, and if the author of the code remembers to insert this trap door in every appropriate place. However, that is a lot of ifs for developers to remember and add to their applications.
We observed that many of these applications are typically stateful at a high level. To take one example, think of financial insurance for a bond. A company issues a bond available for sale, and then eventually somebody can buy the bond. The bond changes state from being available for sale to having been purchased. As soon as the bond has been purchased, it makes no sense to purchase it again. If you were to implement this in a standard object-oriented language, such as Java, you would have a method that allowed any caller to buy the bond, and it would do a runtime check to see whether the state was valid for the transaction that was being invoked. While this approach works, it relies on a dynamic check that happens only when the program runs.
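To make the dynamic check concrete, here is a minimal sketch of the bond example, written in Python rather than Java for brevity. The names `Bond`, `BondState`, and `InvalidStateError` are hypothetical and chosen only for illustration; they do not come from Obsidian or from any blockchain platform.

```python
from enum import Enum, auto
from typing import Optional


class BondState(Enum):
    AVAILABLE = auto()
    PURCHASED = auto()


class InvalidStateError(Exception):
    """Raised when an operation is invoked in a state that does not allow it."""


class Bond:
    def __init__(self, price: int) -> None:
        self.price = price
        self.state = BondState.AVAILABLE
        self.owner: Optional[str] = None

    def buy(self, buyer: str, payment: int) -> None:
        # Dynamic check: any caller may invoke buy() at any time, so the
        # method itself must verify the state every time it runs.
        if self.state is not BondState.AVAILABLE:
            raise InvalidStateError("bond has already been purchased")
        if payment != self.price:
            raise ValueError("payment must equal the bond price")
        self.owner = buyer
        self.state = BondState.PURCHASED


bond = Bond(price=100)
bond.buy("alice", 100)

try:
    bond.buy("bob", 100)  # the mistake surfaces only when this line executes
    second_purchase_rejected = False
except InvalidStateError:
    second_purchase_rejected = True
```

The second call is rejected, but only at the moment it runs; nothing in the program text prevents a caller from writing it.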
Instead, we would like to provide that kind of guarantee at compile-time rather than at run-time. By moving the guarantee earlier, we could catch the potential bug earlier. Research shows that it is cheaper to fix bugs if they are discovered earlier rather than later. In the context of blockchain, this is especially important because blockchain programs are immutable. After a defective program is published on the blockchain, it can't be fixed directly: a new program must be coded and added to the blockchain and then all users must be convinced to migrate to the new version. This sort of permanence is one of the attributes of blockchain, i.e., after something has been added to a blockchain, it is in that spot forever.
One reason we are seeing so many bugs in blockchain applications is that programmers are coming from web-development, game-development, and finance-development backgrounds. They are not used to thinking in terms of exchanging money and the need to treat it differently. Moreover, in traditional programming, bugs are easily fixed through patches.
For example, web developers never have to worry about users updating their applications, because each time users visit the website they receive the most up-to-date version of the code. The correctness of web programs is thus a function of their developers' diligence and ability to recognize atypical cases. We are trying to make a language that explicitly embeds knowledge of atypical cases, so that programmers need not anticipate every contingency themselves: even if they miss a potential problem, the language can catch it. We are embedding this kind of domain knowledge into the constructs of the Obsidian language.
Obsidian includes two features largely absent from current languages, including those used for blockchain development:
- State-oriented programming allows the user to declare and transition among states explicitly. Our research suggests that a large fraction of blockchain programs are organized around a high-level state machine.
- Linear types ensure that important resources managed by programs are managed correctly. This prevents the possibility of accidentally having money vanish into thin air; the compiler keeps the user safe.
Obsidian forces programmers to consider matters that they would not necessarily consider in existing languages, because they have to write those matters out explicitly. More is made explicit and less is implicit.
There is a type of computer-program analysis called typestate that allows a computer to determine what parts of the program can be run based on what else has already happened. Continuing the finance analogy from earlier, once a bond has been sold, we can't sell it again. Typestate analysis helps us determine this kind of information on a computer. In Obsidian, we use typestate to represent high-level states of software interfaces and to reason statically--that is, at compile-time--about the states that are available and what operations are available in each state. We can therefore provide compile-time guarantees about available states.
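The idea of tying operations to states can be approximated in a mainstream language by giving each state its own type. The sketch below is an illustration in Python, not Obsidian syntax, and the class names `AvailableBond` and `PurchasedBond` are hypothetical. Python itself does not check types before running a program, so the guarantee here depends on running a separate static type checker such as mypy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchasedBond:
    price: int
    owner: str
    # Deliberately no buy() method: a purchased bond cannot be bought again.


@dataclass(frozen=True)
class AvailableBond:
    price: int

    def buy(self, buyer: str, payment: int) -> PurchasedBond:
        if payment != self.price:
            raise ValueError("payment must equal the bond price")
        # The state transition is expressed as a change of type.
        return PurchasedBond(price=self.price, owner=buyer)


available = AvailableBond(price=100)
purchased = available.buy("alice", 100)

# purchased.buy("bob", 100)
# If the line above were uncommented, a static checker such as mypy would be
# expected to reject it before the program runs, because PurchasedBond
# defines no "buy" attribute.
```

This approximation has a gap: the old `available` value is still usable after the purchase, so a caller could buy the same bond twice through it. Tracking the state of the object itself, rather than returning a new value, is what a typestate-aware language adds.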
The other observation we made is that many blockchain programs are based on some sort of resource. Think of money for example, a virtual currency or some other kind of resource. These quantities are different from the kinds of objects that you might typically pass around in object-oriented programming languages. For example, if I construct a color object to represent the color of a pixel on the screen, and I pass it to some interface, we will both have a reference to that color object. If I modify it, maybe we should have established some kind of agreement--a service-level agreement or some kind of application programming interface specification--regarding what the modifications were going to be in case I am violating anybody's invariants.
This approach is different from money. If I give you money, I don't have that money anymore; it is no longer my money, it is yours. I can talk about how much money it was, but I cannot give it to you and then also give it to someone else, because I already gave it away once and cannot give it away again.
In the context of programming, you could easily imagine a bug in which I accidentally give away money twice. You could also imagine a bug in which I take some money and then forget to do anything with it. I assign it to a variable, which is in some scope, then there is a bug in which that variable goes out of scope, and that reference to the money is lost and the money is gone and unable to be retrieved. There are many different quantities in blockchain applications that have this kind of property; for example, votes in a voting system work the same way.
Anytime you have a static resource pool where the change of the state of the resource changes where the resource is located, you want to be able to guarantee that that transaction can actually occur--that it is appropriate, that it can occur, and that it actually has occurred--before you get to run-time. After you get to run-time, if you took my money and the variable goes away, the money may be lost forever. A bug similar to this caused the loss of $280 million in Ethereum in November 2017.
The term we are using in Obsidian to think about this concept is "ownership." If I own some money, you don't own any money yet. If I give you the money, you own it and I don't. Obsidian makes it much easier to track what is going on with resources such as money. In programs written in one of the existing languages, if I as a user am trying to track resources, I am doing it manually; I have to keep counts to make sure that I have spent all the money, and it is up to me to make sure I didn't forget to spend it. It is also up to me to make sure I don't try to spend it twice. With Obsidian, if I try to spend it twice, as soon as I hit the save button, it warns me that I am doing something I can't do. If I try to spend the money before I have it, the program will also warn me. If I have money and forget to use it, I'll get a warning, "You are in an area where you have money to spend." Obsidian makes it much easier to find the kinds of bugs that have plagued users previously.
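The two ownership mistakes described above, spending money twice and forgetting to spend it, can be illustrated with a small runtime emulation. This is a sketch under stated assumptions: Obsidian reports these problems at compile time, whereas Python can only detect them while the program runs, and the names `Money`, `Wallet`, `OwnershipError`, and `check_all_spent` are hypothetical.

```python
class OwnershipError(Exception):
    """Raised when money is given away twice or is never used."""


class Money:
    """A single-use value: its holder may hand it over exactly once."""

    def __init__(self, amount: int) -> None:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self._amount = amount
        self._spent = False

    @property
    def spent(self) -> bool:
        return self._spent

    def consume(self) -> int:
        # Giving the money away invalidates this handle for good.
        if self._spent:
            raise OwnershipError("this money has already been given away")
        self._spent = True
        return self._amount


class Wallet:
    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, money: Money) -> None:
        # Ownership moves from the caller to the wallet.
        self.balance += money.consume()


def check_all_spent(*funds: Money) -> None:
    """Emulates the 'you still have money to spend' warning at end of scope."""
    unspent = [m for m in funds if not m.spent]
    if unspent:
        raise OwnershipError(f"{len(unspent)} money value(s) were never spent")


alice, bob = Wallet(), Wallet()
payment = Money(50)
alice.deposit(payment)

try:
    bob.deposit(payment)  # second use of the same money
    double_spend_rejected = False
except OwnershipError:
    double_spend_rejected = True

forgotten = Money(10)  # created but never deposited anywhere
try:
    check_all_spent(payment, forgotten)
    lost_money_detected = False
except OwnershipError:
    lost_money_detected = True
```

In this emulation the programmer must remember to call `check_all_spent`, which is exactly the kind of manual bookkeeping that compiler-tracked ownership is meant to remove.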
In addition to reasoning about these things in a type-oriented way, in the future, we would like to think about enabling static analysis. We could actually write static-analysis code that runs in the compiler to check whether there is any way that money can get stuck permanently in the contract.
A general-purpose language would typically be unaware of the possibility of money getting stuck in a contract inside of a block; for this, we would need a domain-specific language: not a language you would use to program a video game, but one designed specifically for this purpose, which takes advantage of the characteristics of the uses that blockchains already have and are likely to have.
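One way to picture the stuck-money check mentioned above is as a reachability question over a contract's state machine: from every state in which the contract holds funds, can some state that pays the funds out still be reached? The sketch below is a toy model of that question, not Obsidian's analysis. The escrow states, the `stuck_states` function, and the hand-written transition table are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(transitions: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: every state reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def stuck_states(
    transitions: Dict[str, List[str]],
    start: str,
    holds_funds: Set[str],
    releases_funds: Set[str],
) -> Set[str]:
    """Reachable fund-holding states from which no releasing state is reachable."""
    stuck = set()
    for state in reachable(transitions, start):
        if state in holds_funds and not (reachable(transitions, state) & releases_funds):
            stuck.add(state)
    return stuck


# Hypothetical escrow contract: a dispute has no way out.
escrow = {
    "Created": ["Funded"],
    "Funded": ["Released", "Disputed"],
    "Disputed": [],
    "Released": [],
}
HOLDS_FUNDS = {"Funded", "Disputed"}
RELEASES_FUNDS = {"Released", "Refunded"}

buggy_result = stuck_states(escrow, "Created", HOLDS_FUNDS, RELEASES_FUNDS)

# Adding a refund transition out of the dispute removes the dead end.
fixed_escrow = dict(escrow, Disputed=["Refunded"], Refunded=[])
fixed_result = stuck_states(fixed_escrow, "Created", HOLDS_FUNDS, RELEASES_FUNDS)
```

A check like this needs to know which states hold money and which release it, which is domain knowledge that a language without built-in states and resources does not have.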
Status of Obsidian
Obsidian is currently an SEI research project and is not yet ready for use in production blockchain systems. We are assessing the language as we develop it to ensure that
- It does what we want it to do and keeps safe the things that we want it to keep safe, and we can prove its safety.
- Programmers understand what the language is and how it can be used, and we have implemented the language so that it is usable by programmers. A key risk in developing any new programming language is that doing so requires changes on the part of developers, which poses a problem in terms of adoption. It will be important to ensure that we show people how to use Obsidian properly.
- All implementation bugs have been identified and fixed.
We will report ongoing progress on our project website.
Looking Ahead: Next Steps
Our next step, currently in progress, is to find users and potential partners so that we can develop case studies of the language in use in real settings. We have a number of partners currently and are always interested in new opportunities to test Obsidian out in a real-world environment.
On a broader scale, we are also reevaluating the approach to designing programming languages in general: not just producing a language and informing people about how to use it, but actually trying to justify the benefits of the language and justify the design process that we are using to design the language in a scientific way. This additional objective makes language development slower than in a traditional context because of the methodical way in which we are designing the language.
We are also interested in understanding the breadth of application for Obsidian for people who may have tried unsuccessfully to program applications of blockchain with existing languages. We are interested in learning from the community about such boundary or edge cases for Obsidian that we may not have considered. We are especially interested in particular use cases that people are working on, situations where they are writing some contract and are interested in whether Obsidian might provide a better way of writing their software. What we learn from the community will continue to inform some typing constructs and other things that we haven't dealt with yet as development continues.
To catch as many coding bugs as possible, we are reaching out to the developer community to learn what kinds of bugs they are encountering in existing blockchain applications or prototype blockchain code. The more code we can see that has bugs in it--which basically means any code around, because all code has some bugs--the better we can develop Obsidian to prevent those bugs.
Our hope is that a language that helps people to avoid some of the problems that we have seen with blockchain, that ameliorates blockchain's weaknesses and takes advantage of its strengths, will help blockchain to survive and thrive. We welcome the opportunity to talk to any government agency or any sponsor interested in helping us to apply Obsidian in a pilot project. Please contact us if you'd like to chat!
View the Obsidian project website.
Read my earlier blog post, What Is Bitcoin? What Is Blockchain?
Read our IEEE article on Obsidian, Obsidian: A Safer Blockchain Programming Language.
Listen to the SEI Podcast, Obsidian: A Safer Blockchain Programming Language.