Vagrant Box Wrangling

So, you're using Vagrant, and maybe you've even read my earlier post on it, but your Vagrant box doesn't have everything you need. Or maybe it has too much, and you need something simpler. For instance, do you find yourself installing or removing packages, or pinning packages to specific versions, to get parity with your production platform? Or maybe you need more extensive auditing of your environment, such as when you (or your customer) can't trust a third-party box vendor. Or you need a way to clone a virtual machine for parity with the production environment. What are your options? In this blog post, I will explain what a box file is and how you can have more control over your Vagrant workflow by creating your own box. I will also introduce Packer as a tool to create a Vagrant box, and I will finish with an example for managing Vagrant box versions and distributing updates in a team setting.

Why Custom?

Why would a development team want more control over its Vagrant boxes or want to create a custom box? This scenario may arise when a specific OS distribution or configuration is not available through the normal channels. There are various reasons for needing a custom box: the virtual machine might need a special application "run" user, specific yum mirrors, or a particular firewall configuration. Many customizations are simple to apply with a shell provisioner called from a Vagrantfile or with a Chef recipe, but many are not. The approach here is all about managing your environment and planning ahead. Distributing and maintaining a "company Vagrant box" that supports every team and project is cleaner than policing each project's Vagrantfile to be sure that customization scripts are updated regularly and applied consistently. A network proxy illustrates this decision: Does your team have a Vagrant box with the company's proxy preconfigured, or does every Vagrantfile configure the proxy individually for each project?

What Exactly Is a Box? What Is Box Metadata?

Before we go any further, let's understand what, exactly, a Vagrant box is and how your system uses it. A Vagrant box is literally just an archive containing a virtual machine configuration, a virtual disk, and some other metadata files. You can see this for yourself by viewing the contents of any .box file using the tar command in a Linux or Mac OS X terminal:

tar -tf any-box-file.box
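The same listing can be produced from a script. The sketch below uses Python's standard tarfile module and assumes the .box file is a tar archive, optionally compressed, which is also what tar -tf reads. The function name list_box_contents is just an illustrative choice.

```python
import tarfile
from typing import List


def list_box_contents(box_path: str) -> List[str]:
    """Return the member names of a .box archive, like `tar -tf` does."""
    # "r:*" lets tarfile detect plain, gzip, bzip2, or xz compression itself.
    with tarfile.open(box_path, mode="r:*") as archive:
        return archive.getnames()
```

Calling list_box_contents("any-box-file.box") on a VirtualBox box should show the virtual machine configuration (an .ovf file), the virtual disk image, and metadata.json among the results.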

When you add a box to your system with vagrant box add, Vagrant not only copies a .box file to a special location on your hard drive but also looks for additional metadata to extract and use. In fact, the canonical method for adding a box to your inventory does not target the .box file directly at all, although the command is capable of doing so. The natural target for vagrant box add is a JSON definition for the box, which records the location of the .box file that Vagrant then copies. The JSON definition also states the box name and a description, and it lists the available versions of that box, each of which can support multiple providers. Each provider section states the location of the .box file and a checksum for that version and provider. This is why, when adding a .box file directly (which does NOT contain this metadata), Vagrant requires the additional command-line argument --name. (The other fields can fall back to defaults, and box versions are not supported in this case.) As a fun exercise, you can verify that adding Vagrant boxes from Atlas using the common box naming convention (e.g., hashicorp/precise64) actually downloads the metadata JSON, not a .box file. First, run vagrant box add <your favorite Atlas box name> and look at the first few lines of output:

==> box: Loading metadata for box 'hashicorp/precise64'
box: URL: https://atlas.hashicorp.com/hashicorp/precise64

Now press Ctrl-C to stop the command, because we don't want Vagrant to actually add the box; we just want to see what it downloads. Take the target URL from the output of the box add command and fetch it directly using wget:

wget https://atlas.hashicorp.com/hashicorp/precise64

Look at the contents of the file that is saved; it is metadata JSON. Of course, the very first thing Vagrant does when it encounters this file is to search the JSON for the latest available version, find the URL of the .box file, and download it, but we are spared these details and only see the resulting Vagrant box being added to our environment.
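To inspect the same JSON from a script instead of wget, the sketch below downloads a box definition with Python's standard urllib and summarizes its versions and providers. It is an illustration only. The function names are hypothetical, and the Atlas URL shown above may no longer serve this document. Any box definition URL, including a file:// URL to a saved copy, can be substituted.

```python
import json
import urllib.request
from typing import List, Tuple


def fetch_box_metadata(url: str) -> dict:
    """Download a box definition (the JSON that wget saved above) and parse it."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def available_versions(metadata: dict) -> List[Tuple[str, List[str]]]:
    """List (version, provider names) pairs in the order the definition gives them."""
    return [
        (entry["version"], [provider["name"] for provider in entry.get("providers", [])])
        for entry in metadata.get("versions", [])
    ]
```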

Note that the metadata.json file contained in the .box file archive and the box definition metadata file, often itself named metadata.json, are entirely different, unrelated files. The file contained in the .box archive only states the provider for which the box was built and is inconsequential in our discussion of creating and handling custom boxes.
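The two files are easy to tell apart in practice. This sketch, again using Python's tarfile module, reads the metadata.json packed inside a .box archive and returns the provider it names. It assumes that file is a JSON object with a "provider" key, such as {"provider": "virtualbox"}. The function name is illustrative.

```python
import json
import posixpath
import tarfile


def box_provider(box_path: str) -> str:
    """Return the provider recorded in the metadata.json packed inside a .box file.

    This reads the small file inside the archive, not the box definition
    JSON that `vagrant box add` consumes.
    """
    with tarfile.open(box_path, mode="r:*") as archive:
        for member in archive.getmembers():
            # Archives may store the name as "metadata.json" or "./metadata.json".
            if member.isfile() and posixpath.normpath(member.name) == "metadata.json":
                handle = archive.extractfile(member)
                return json.load(handle)["provider"]
    raise ValueError(f"{box_path} contains no metadata.json")
```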

Below is an example of a box metadata file (named metadata.json) listing three different versions of the box. When you add such a metadata file with vagrant box add, Vagrant looks only at the latest available version, in this case 0.3.0; it does not add all versions. A Vagrantfile behaves in the same way: for this example, setting the box value to "cert/centos7_x86_64" in a Vagrantfile will use version 0.3.0 unless box_version is specified and set to an older version.

{
  "name": "cert/centos7_x86_64",
  "description": "CentOS 7 x86_64",
  "versions": [{
    "version": "0.1.0",
    "providers": [{
      "name": "virtualbox",
      "url": "http://example.org/CentOS-7.1.1503.el7.centos.2.8_GuestAdditions_4.3.30.box",
      "checksum_type": "sha1",
      "checksum": ""
    }]
  },{
    "version": "0.2.0",
    "providers": [{
      "name": "virtualbox",
      "url": "http://example.org/CentOS-7.2.1511-3.10.0-GA_4.3.30.box",
      "checksum_type": "sha1",
      "checksum": ""
    }]
  },{
    "version": "0.3.0",
    "providers": [{
      "name": "virtualbox",
      "url": "http://example.org/CentOS-7.2.1511-3.10.0-GA_5.1.2.box",
      "checksum_type": "sha1",
      "checksum": ""
    }]
  }]
}
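To make the selection rule concrete, here is a small sketch of how a client could pick a version from a box definition like the one above. It chooses the highest version unless one is pinned, in the spirit of box_version. This is an illustration, not Vagrant's implementation. It assumes plain dotted numeric versions such as 0.3.0 and supports only an exact pin, whereas Vagrant's box_version also accepts constraint expressions. The function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "0.3.0" into a tuple that sorts numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve_box(metadata: dict, provider: str, pinned: Optional[str] = None) -> dict:
    """Pick a version from a box definition and return its provider entry.

    Without a pin, the highest version that offers `provider` wins (the
    default behavior described above). With a pin, only that exact version
    is accepted.
    """
    candidates = []
    for entry in metadata["versions"]:
        for prov in entry["providers"]:
            if prov["name"] == provider:
                candidates.append((parse_version(entry["version"]), entry["version"], prov))
    if pinned is not None:
        wanted = parse_version(pinned)
        candidates = [c for c in candidates if c[0] == wanted]
    if not candidates:
        raise LookupError(f"no {provider} box for version {pinned or 'any'}")
    _, version, prov = max(candidates, key=lambda c: c[0])
    return {"version": version, **prov}
```

For the metadata.json above, resolve_box(metadata, "virtualbox") returns the 0.3.0 entry, and passing pinned="0.1.0" returns the CentOS 7.1 box instead. Comparing integer tuples rather than strings keeps a future 0.10.0 ahead of 0.9.0.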

Creating Your Own Base Box

Now that we are a bit more familiar with what a Vagrant box is, let's talk about our options for customization. A Vagrant box can be obtained in one of at least four ways, and if we're talking about customization, the last two are the only real options:

adding it from Atlas (canonical, and likely what you're doing today)
downloading and adding the bare .box file from another provider/site
creating a .box file by hand (as described in Vagrant's documentation)
using Packer to create a .box file (as recommended in Vagrant's base box documentation)

My recommendation is to use Packer. Creating the .box file by hand is really just an exercise in installing an operating system and making rote modifications to it so that it can function as a Vagrant box. While it is enlightening to read Vagrant's documentation to understand just what a Vagrant box requires, the advantages of using Packer to actually create the box are numerous. The most obvious advantage is that the box itself will have been generated by a repeatable script that can be shared with the team by way of a version control system.

Should you take the manual route and generate the .box file by hand, you will at some point be left with a virtual machine from which Vagrant packaged the .box file. It is important that you keep this virtual machine around and name it well: should you want to modify the box, all you need to do is boot up that machine, make any adjustments, and re-package it.

Should you go with Packer, take note that assembling a Packer build is not trivial; however, there are many resources to get you started, and some even give you a finished product. Try searching for "Packer minimal vagrant box" on github.com.

A Little Bit about Packer

Many, if not all, Vagrant boxes available on Atlas are built using Packer. Think of Packer as a "Vagrant for Vagrant boxes." Vagrant and Packer perform similar tasks: both facilitate the operation of provisioning tools to customize a virtual machine. It is each tool's position in the development workflow that sets them apart. Vagrant uses a Vagrant box as the starting point for creating a virtual machine as a development platform, whereas Packer creates that Vagrant box. And while Vagrant needs a Vagrant box in a specific format to do its job, Packer starts from nothing and builds just about anything. Packer works with provisioning tools such as Ansible, Chef, and Puppet, and it can produce an Open Virtualization Format (OVF) file, a Docker image, an Amazon Machine Image (AMI), or a host of other artifacts. Generating a Vagrant box is just one option when running a Packer build; in fact, a Vagrant box is not even a primary artifact but the result of an otherwise optional post-processing step.

Like Vagrant with a Vagrantfile, Packer uses a human-readable configuration file that provides instructions (e.g., where to find the required ISO files, what size to create the virtual disk, and so on). This use of a versionable script is in line with infrastructure-as-code principles and brings us closer to a fully scripted infrastructure. As mentioned earlier, Packer also interoperates with provisioners such as Ansible, Puppet, and Chef. That means a great deal of complexity can be baked into the final product. The resulting artifacts could, in theory, be used to support each environment ("test," "production," etc.), providing near-perfect environment parity.

Box Distribution and Version Management

Let's assume that you have found a solution that works for you and that your customized box is ready to go. You have tested it by adding it to your local Vagrant box list and spinning up your project with vagrant up. Everything checks out, except that Roger still hasn't fixed that bug in the registration form. How is this box distributed to the team?

I will present two options for box distribution in a team setting that don't involve passing around a USB stick. Both options provide a clean path for future box updates. One uses Atlas as the central box repository, and one is more homespun, using a simple file share.

1. Box Distribution with Atlas

To use Atlas, you will need to sign up for a free account. The Atlas web interface provides controls to create a new box version with a provider and also to upload your .box file and release it. After a version of a box is released, everyone on the Internet will be able to use your box file with vagrant box add <your Atlas username>/<box name> or in Vagrantfiles. Updating is straightforward: Simply add a new version through the Atlas UI. There is an option to create private boxes; however, this is an enterprise feature and not part of the free plan. If privacy is a requirement, use the second method for box distribution instead.

2. Box Distribution using a Network Share

This option has all the elements of Atlas, but arranged differently. Basically, box distribution and versioning require the following:

a .box file for each version
a team-accessible place on your organization's network (or the Internet) to store the .box files
a metadata declaration of your box and its available versions

With Atlas, all three of these requirements are covered by the Atlas platform. The only bit of "magic" is that Vagrant is configured internally to search Atlas when it is handed the name of a box. For instance, "hashicorp/ubuntu1404" is the "ubuntu1404" box maintained by the "hashicorp" Atlas account. With this homegrown solution, the .box files are located on a network share, and the metadata JSON box definition lives in source control for each project. That's right: a metadata.json file, similar to the three-version example above, lives directly in the root of each project. As the maintainer of the box, when you have a new version to push, you would do the following:

Create the new .box file, either by packaging it with Vagrant (vagrant package) or by running a Packer build.
Place the .box file on the network share.
Calculate the checksum.
Add a new version to metadata.json for each project that uses the box, including the network share path and checksum.
Commit and push the updated metadata.json file.
Notify the team.
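Steps 2 through 4 are mechanical and easy to script. The following Python sketch is one way to do it, not a prescribed tool. It assumes the new .box file has already been copied to a share that every team member mounts at the same absolute path, and it records that location as a file:// URL. It then computes a SHA-1 checksum to match the checksum_type used in the example above and appends the new version to metadata.json. The function names and the paths in the usage line are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-1 hex digest of a (possibly large) .box file in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_box_version(metadata_path: Path, box_path: Path, version: str,
                    provider: str = "virtualbox") -> dict:
    """Append a version entry pointing at box_path to a metadata.json file."""
    metadata = json.loads(metadata_path.read_text())
    if any(v["version"] == version for v in metadata["versions"]):
        raise ValueError(f"version {version} already exists")
    entry = {
        "version": version,
        "providers": [{
            "name": provider,
            # Assumes the share is mounted at this same absolute path for everyone.
            "url": box_path.resolve().as_uri(),
            "checksum_type": "sha1",
            "checksum": sha1_of(box_path),
        }],
    }
    metadata["versions"].append(entry)
    metadata_path.write_text(json.dumps(metadata, indent=2) + "\n")
    return entry
```

For example, add_box_version(Path("metadata.json"), Path("/mnt/boxes/centos7-0.4.0.box"), "0.4.0") would append a 0.4.0 entry to that project's metadata.json. Because each project keeps its own copy, the script would be run once per project. Committing and notifying the team remain manual steps.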

Everyone else on the team only needs to add metadata.json once:

vagrant box add metadata.json

And then update when a new version is available:

vagrant box update

Essentially, the difference between this solution and Atlas is that instead of Vagrant pulling down the latest JSON box definition from Atlas and acting on it to download a .box file, Vagrant uses a local JSON box definition (the metadata.json file). In theory, you could set up a web server to serve the JSON box definition files and the workflow would be identical.
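For the web-server variant, Python's built-in http.server is enough for a quick experiment. The sketch below serves a directory containing metadata.json and the .box files. The directory, host, and port are assumptions. The url fields in metadata.json would then need to be http:// addresses on that server rather than share paths. The Python documentation notes that http.server is not recommended for production, so treat this as a prototype for a trusted network.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_box_server(directory: str, host: str = "0.0.0.0",
                    port: int = 8080) -> ThreadingHTTPServer:
    """Create an HTTP server that serves metadata.json and .box files from `directory`.

    Plain static file serving: no authentication and no TLS.
    """
    # Bind the handler to the box directory instead of the current working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)
```

Running make_box_server("/srv/boxes").serve_forever() would serve the hypothetical /srv/boxes directory on port 8080. Team members should then be able to run vagrant box add http://<server>:8080/metadata.json, where <server> stands for your host's name.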

Conclusion

I hope this guide to understanding, creating, and managing Vagrant boxes was informative and demystifying. Creating and managing your own suite of boxes can be an educational and challenging endeavor, and I think for many developers it is in the spirit of respecting technology that we customize, tweak, and learn whenever possible. In the case of Vagrant boxes, we also have the opportunity to better align with the DevOps principles of infrastructure-as-code and making infrastructure and operations a first-class citizen of our work.

Additional Resources

Webinar: DevOps Panel Discussion, featuring Kevin Fall, Hasan Yasar, and Joseph D. Yankel

Webinar: Culture Shock: Unlocking DevOps with Collaboration and Communication, with Aaron Volkmann and Todd Waits

Webinar: What DevOps is Not!, with Hasan Yasar and C. Aaron Cois

Podcast: DevOps--Transform Development and Operations for Fast, Secure Deployments, featuring Gene Kim and Julia Allen

Blog: all of the posts in the SEI Blog DevOps series

Software Engineering Institute

SEI Blog

Tim Palko

October 14, 2016
