Metadata-Based Parallelization of Program Instrumentation
Matthew D. Allen and Gurindar S. Sohi
2007
Program instrumentation has a wide variety of useful applications, but tool writers must overcome the challenge of substantial overheads caused by introducing additional code and data into a program. This paper observes that instrumentation usually operates on many discrete, independent data structures, a property we call metadata parallelism. We propose to exploit this property to reduce the overhead of instrumented programs by executing, simultaneously and in different threads, instrumentation function invocations that manipulate different pieces of metadata. The key challenge in spreading instrumentation function execution across many threads is ensuring that metadata updates occur in the correct order and do not suffer from data races. Metadata-based parallelization solves this problem by using a user-specified mapping of instrumentation function invocations to serialization sets. The runtime ensures that metadata updates are handled correctly by executing all function invocations in a given serialization set in the same thread. It achieves concurrency by spreading different serialization sets across multiple threads. Metadata-based parallelization improves on previous techniques for reducing the overhead of program instrumentation for a broad class of dynamic monitoring tools, including those that measure common-case behavior, such as profilers, and those that check for anomalous behavior, such as debugging and testing tools. Our technique allows tool developers to leverage parallelism with a natural, intuitive programming interface, leaving the burden of correct synchronization of the parallelized execution to the instrumentation system. We have modified the EEL instrumentation system to support metadata-based parallelization, and we evaluate our prototype by comparing the performance of parallelized instrumentation on both multicore and SMP systems.
We show that the fast communication provided by the multicore system is a key enabler for fine-grained parallelization, achieving speedups averaging 4.3X for value profiling and 2.9X for data dependence profiling using 8 additional thread contexts.
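The serialization-set idea described in the abstract can be illustrated with a small sketch. This is not the EEL-based prototype or its interface; the names `SerializationRuntime`, `delegate`, `finish`, and `profile_trace` are hypothetical, and the toy value profiler assumes one metadata record per instruction address, with the address used as the serialization set identifier. Python threads are used only to show the ordering and ownership guarantees, not to reproduce the reported speedups.

```python
import queue
import threading
from collections import Counter


class SerializationRuntime:
    """Hypothetical runtime: one FIFO queue and one worker per thread context."""

    def __init__(self, num_threads):
        self._queues = [queue.Queue() for _ in range(num_threads)]
        self._workers = [
            threading.Thread(target=self._run, args=(q,)) for q in self._queues
        ]
        for worker in self._workers:
            worker.start()

    @staticmethod
    def _run(q):
        # Drain the queue in FIFO order until the shutdown sentinel arrives.
        while True:
            item = q.get()
            if item is None:
                break
            fn, args = item
            fn(*args)

    def delegate(self, set_id, fn, *args):
        # Every invocation with the same set_id goes to the same queue, so
        # one thread executes the whole serialization set in program order.
        q = self._queues[hash(set_id) % len(self._queues)]
        q.put((fn, args))

    def finish(self):
        # Send a sentinel to each worker, then wait for all of them to exit.
        for q in self._queues:
            q.put(None)
        for worker in self._workers:
            worker.join()


def profile_trace(trace, num_threads=4):
    """Toy value profiler: count the values produced at each instruction."""
    # Allocate metadata up front so workers never insert into the shared dict;
    # each Counter is only ever touched by the thread that owns its set.
    profiles = {pc: Counter() for pc, _ in trace}

    def record_value(pc, value):
        profiles[pc][value] += 1

    runtime = SerializationRuntime(num_threads)
    for pc, value in trace:
        # User-specified mapping (an assumption here): set id = instruction address.
        runtime.delegate(pc, record_value, pc, value)
    runtime.finish()
    return profiles


if __name__ == "__main__":
    trace = [(0x400, 7), (0x404, 1), (0x400, 7), (0x408, 3), (0x400, 9)]
    for pc, counts in sorted(profile_trace(trace).items()):
        print(hex(pc), dict(counts))
```

Because each metadata record belongs to exactly one serialization set, the instrumentation function itself needs no locks; the only synchronization in this sketch is inside the per-thread queues, which mirrors the abstract's point that the instrumentation system, not the tool writer, carries that burden.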