Advancing Algorithms for File Deduplication Across Containers
Presentation
Publisher: Software Engineering Institute
Abstract
To address limitations in existing approaches, we developed an automated container image minimization technology. This technology combines and improves on two minimization approaches: pruning (removing unnecessary files from individual images) and deduplication (moving files shared across images into common layers). We focused on advancing the state of the art in deduplication across container images.
To create this new technology, we developed an algorithm for file deduplication across a collection of container images that can reduce container image storage usage and update bandwidth by up to 5–15% for multi-container deployments and by up to 10–30% for pruned container deployments. In our tests with real multi-container image systems, our algorithm deduplicates 100% of shared files and processes 10 images with 225,000 files in approximately 81 minutes.
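The core idea of cross-image deduplication, moving files that several images share into common layers, can be illustrated with a small sketch. This is not the SEI algorithm; it is a minimal illustration under stated assumptions. Each image is modeled as a hypothetical mapping from file path to a (content digest, size in bytes) pair, and a file counts as shared only when the same path holds the same digest in two or more images. Files are grouped by the exact set of images that contain them, so each group is a candidate common layer used by exactly those images. The names `group_shared_files` and `bytes_saved` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical model (assumption): an image maps file path -> (content digest, size in bytes).
Image = Dict[str, Tuple[str, int]]
FileKey = Tuple[str, str]  # (path, digest)


def group_shared_files(images: Dict[str, Image]) -> Dict[FrozenSet[str], List[FileKey]]:
    """Group identical files by the exact set of images that contain them.

    Each key of the result is a set of image names; its value lists the files
    that could move into one common layer shared by exactly those images.
    """
    owners: Dict[FileKey, Set[str]] = defaultdict(set)
    for name, files in images.items():
        for path, (digest, _size) in files.items():
            owners[(path, digest)].add(name)

    layers: Dict[FrozenSet[str], List[FileKey]] = defaultdict(list)
    for key, names in owners.items():
        if len(names) > 1:  # only files present in two or more images are shared
            layers[frozenset(names)].append(key)
    return dict(layers)


def bytes_saved(images: Dict[str, Image], layers: Dict[FrozenSet[str], List[FileKey]]) -> int:
    """Bytes saved if each shared file is stored once instead of once per image."""
    sizes: Dict[FileKey, int] = {}
    for files in images.values():
        for path, (digest, size) in files.items():
            sizes[(path, digest)] = size
    return sum(sizes[f] * (len(names) - 1) for names, group in layers.items() for f in group)


# Toy example with made-up image names, paths, and digests.
images: Dict[str, Image] = {
    "app-a": {"/usr/lib/libc.so": ("sha256:aaa", 2000), "/app/a.py": ("sha256:a1", 10)},
    "app-b": {"/usr/lib/libc.so": ("sha256:aaa", 2000), "/app/b.py": ("sha256:b1", 20),
              "/etc/app.conf": ("sha256:c1", 5)},
    "app-c": {"/usr/lib/libc.so": ("sha256:aaa", 2000), "/etc/app.conf": ("sha256:c1", 5)},
}
layers = group_shared_files(images)
for owner_set, group in layers.items():
    print(sorted(owner_set), group)
print("bytes saved:", bytes_saved(images, layers))
```

In this toy example, `/usr/lib/libc.so` lands in a layer shared by all three images and `/etc/app.conf` in a layer shared by `app-b` and `app-c`, saving 2 × 2000 + 1 × 5 = 4005 bytes. The sketch deliberately ignores details a real implementation must handle, such as file permissions and ownership, OCI layer ordering, and whiteout files.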
This project focused on technology that supports the Open Container Initiative (OCI) standard because the DoD aims to avoid vendor lock-in and use OCI-compliant containers. Additionally, by open-sourcing the minimization algorithms, this project could accelerate the SEI's impact and gain wider interest and adoption from industry and the DoD community.