Characterizing Malcode Evolution
Archit Gupta, Pavan Kuppili, Aditya Akella and Paul Barford
The diversity, sophistication and availability of malicious software (malcode) pose enormous challenges for securing networks and end hosts from attacks. In this paper, we analyze a large corpus of malcode metadata compiled over a period of 19 years. Our aim is to understand how malcode has evolved over the years and, in particular, how different instances of malcode relate to one another. We develop a novel graph pruning technique to establish the underlying relationships between different instances of malcode based on temporal information and key common phrases identified in the malcode descriptions. Our algorithm enables a range of possible inheritance structures, which we investigate through extensive manual validation. The resulting "most likely" malcode family trees show unique structure and diverse characteristics. We present an evaluation of gross characteristics of malcode evolution and also examine in detail the most interesting and potentially dangerous malcode families.
Our approach is not definitive and could be improved given better metadata. Nevertheless, we hope that this new perspective on malcode evolution will help in the development of more effective defenses in the future.
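To make the idea of pruning a relationship graph by time and shared phrases concrete, the following is a minimal illustrative sketch, not the algorithm from the report. It assumes each malcode instance is described by a name, a year of appearance, and a set of key phrases, and it assumes (purely for illustration) that the most likely parent of an instance is the earlier instance with the largest phrase overlap. The names `Malcode`, `candidate_edges`, `prune`, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Malcode:
    name: str
    year: int            # temporal information (assumed granularity: year)
    phrases: frozenset   # key phrases taken from the description (assumed)


def candidate_edges(items):
    """Link each instance to every strictly earlier instance sharing a phrase."""
    edges = {}
    for child in items:
        for parent in items:
            shared = child.phrases & parent.phrases
            if parent.year < child.year and shared:
                edges[(parent.name, child.name)] = len(shared)
    return edges


def prune(edges):
    """Keep, for each child, only the candidate parent with the largest overlap.

    Scoring by overlap size is an assumption of this sketch; ties keep the
    first candidate encountered.
    """
    best = {}
    for (parent, child), score in edges.items():
        if child not in best or score > best[child][1]:
            best[child] = (parent, score)
    return {child: parent for child, (parent, _) in best.items()}


# Hypothetical records, invented for illustration only.
items = [
    Malcode("A", 1999, frozenset({"mass mailer", "address book"})),
    Malcode("B", 2000, frozenset({"mass mailer", "address book", "vbs"})),
    Malcode("C", 2001, frozenset({"vbs", "mass mailer"})),
]

tree = prune(candidate_edges(items))
print(tree)
```

In this toy input, every later instance is first connected to all earlier instances it shares a phrase with, and pruning then reduces each instance to a single parent, which yields a tree-like structure. A real analysis would need a more careful scoring rule and manual validation of the resulting families.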