Serializing Instructions in System-Intensive Workloads: Amdahl’s Law Strikes Again
Philip Wells, Gurindar Sohi
To keep implementation complexity reasonable, processors contain Serializing Instructions (SIs): instructions, such as those that write control registers, that cannot be executed out of order (OoO). To maintain sequential semantics, an SI may have to serialize the pipeline and execute as the only instruction in the window. We examine the frequency of SIs in three ISAs (SPARC V9, x86-64, and PowerPC) for several system-intensive workloads. Across ISAs, we observe 2–8 SIs per thousand instructions for most workloads. As Amdahl’s Law predicts, such frequent SIs, which create serial regions within the instruction-level parallel execution of a single thread, can have a significant impact on performance. For the SPARC ISA (after removing TLB and register window effects), we observe a 4–17% performance difference between a modest out-of-order processor and a hypothetical processor that idealizes serializing instructions. We examine the consumption of values produced by several SIs and observe that most values are consumed but are Effectively Useless (EU); that is, they do not actually change the execution of the consuming instructions. To improve the performance of such SIs, we propose EU prediction, which allows younger instructions to proceed, possibly reading a stale value, and still execute correctly. This simple technique improves the performance of five of our seven workloads by 8–12%.
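The Amdahl's Law argument in the abstract can be illustrated with a rough back-of-the-envelope sketch. The Python below is not the authors' evaluation methodology. The function names, the fixed stall cost per SI, and the example numbers (5 SIs per thousand instructions, 30 stall cycles per SI, a base IPC of 2.0) are all hypothetical assumptions chosen only to show how a few serializing instructions per thousand could plausibly account for a noticeable share of execution time.

```python
def amdahl_speedup(enhanced_fraction: float, enhancement: float) -> float:
    """Amdahl's Law: overall speedup when `enhanced_fraction` of the
    original execution time is accelerated by a factor `enhancement`."""
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced_fraction must be in [0, 1]")
    if enhancement <= 0.0:
        raise ValueError("enhancement must be positive")
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhancement)


def serialized_time_fraction(si_per_kilo_instr: float,
                             stall_cycles_per_si: float,
                             base_ipc: float) -> float:
    """Fraction of cycles spent in SI-induced stalls under a toy model
    (an assumption, not the paper's model): every SI adds a fixed number
    of stall cycles on top of otherwise steady out-of-order execution."""
    base_cycles = 1000.0 / base_ipc  # cycles for 1000 instructions without stalls
    stall_cycles = si_per_kilo_instr * stall_cycles_per_si
    return stall_cycles / (base_cycles + stall_cycles)


if __name__ == "__main__":
    # Hypothetical inputs; none of these values come from the report.
    frac = serialized_time_fraction(si_per_kilo_instr=5,
                                    stall_cycles_per_si=30,
                                    base_ipc=2.0)
    # Idealizing SIs corresponds to an unbounded speedup of the stalled portion.
    ideal = amdahl_speedup(frac, float("inf"))
    print(f"time in SI stalls: {frac:.1%}, speedup if SIs were free: {ideal:.2f}x")
```

Under this toy model the stall fraction grows with both SI frequency and per-SI cost, which is the sense in which serial regions bound the benefit of out-of-order execution. Real per-SI costs vary by instruction and by pipeline state, so the figures here should not be compared with the measured 4–17% difference.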