Learning from, Understanding, and Supporting DevOps Artifacts for Docker
Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, and Thomas Reps
With the growing use of DevOps tools and frameworks, there is an increased need
for tools and techniques that support more than code.
The current state of the art in static developer assistance for tools like Docker is limited
to shallow syntactic validation. We identify three core challenges in the realm
of learning from, understanding, and supporting developers writing DevOps
artifacts: (i) nested languages in DevOps artifacts, (ii) rule mining, and (iii)
the lack of semantic rule-based analysis.
To address these challenges, we
introduce a toolset, binnacle, that enabled us to ingest 900,000 GitHub
repositories.
Focusing on Docker, we extracted approximately 178,000 unique Dockerfiles,
and also identified a Gold Set of Dockerfiles written by Docker experts. We addressed
challenge (i) by reducing the number of effectively uninterpretable nodes in our
ASTs by over 80% via a technique we call phased parsing. To address
challenge (ii), we introduced a novel rule-mining technique capable of
recovering two-thirds of the rules in a benchmark we curated. Through this
automated mining, we were able to recover 16 new rules that were not found
during manual rule collection. To address challenge (iii), we manually
collected a set of rules for Dockerfiles from commits to the files in the Gold
Set. These rules encapsulate best practices, avoid docker build failures, and
improve image size and build latency. We created an analyzer that used these
rules, and found that, on average, Dockerfiles on GitHub violated the rules
five times more frequently than the Dockerfiles in our Gold Set. We also
found that industrial Dockerfiles fared no better than those sourced from
GitHub.
The learned rules and analyzer in binnacle can be used to aid
developers in the IDE when creating Dockerfiles, and in a post-hoc fashion to
identify issues in, and to improve, existing Dockerfiles.
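
To make the combination of nested-language parsing and rule checking concrete, the following is a minimal sketch in Python; it is not binnacle's implementation. It assumes a two-phase approach in which the Dockerfile is first split into directives and the shell text embedded in each RUN directive is then parsed separately, so that a rule can inspect individual commands. All function names are hypothetical, and the rule shown (apt-get install should pass a "yes" flag so the build does not wait for input) is chosen purely for illustration.

```python
import shlex

# Flags that make `apt-get install` non-interactive (illustrative rule only).
YES_FLAGS = {"-y", "--yes", "--assume-yes"}


def parse_dockerfile(text):
    """Phase 1: split a Dockerfile into (directive, raw_argument) pairs.

    Simplification: only backslash line continuations are handled.
    """
    instructions = []
    pending = ""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if stripped.endswith("\\"):
            pending += stripped[:-1] + " "  # join continued lines
            continue
        full = pending + stripped
        pending = ""
        directive, _, argument = full.partition(" ")
        instructions.append((directive.upper(), argument.strip()))
    return instructions


def parse_shell(argument):
    """Phase 2: parse the shell text nested inside a RUN directive.

    Returns one token list per command. Simplification: commands are
    split only on standalone `&&`, `||`, and `;` tokens.
    """
    commands, current = [], []
    for token in shlex.split(argument):
        if token in ("&&", "||", ";"):
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


def check_apt_install_has_yes(dockerfile_text):
    """Return every `apt-get install` command that lacks a yes flag."""
    violations = []
    for directive, argument in parse_dockerfile(dockerfile_text):
        if directive != "RUN":
            continue
        for command in parse_shell(argument):
            is_install = command[:2] == ["apt-get", "install"]
            if is_install and not YES_FLAGS.intersection(command):
                violations.append(" ".join(command))
    return violations
```

Calling check_apt_install_has_yes on the text of a Dockerfile returns the offending commands as strings, which an IDE plugin or a post-hoc scan could report to the developer. Without the second phase, the argument of RUN would remain an opaque string and the rule could not distinguish one command from another.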
(Click here to access the paper:
PDF;
University of Wisconsin