Using Speculative Push to Reduce Communication Latencies in Critical Sections
Ravi Rajwar, Alain Kagi, James Goodman
Communication latencies within critical sections constitute a major bottleneck in some classes of emerging parallel workloads. In this paper we propose a mechanism, Speculative Push, aimed at reducing this communication latency. Speculative push allows the cache controller, responding to a request for a cache line inferred to hold a lock variable, to predict the data sets the requestor will access within the critical section. The controller then pushes the data at these addresses from its own cache to the target cache in an exclusive state. It also writes the data back to memory. By overlapping the transfer of the protected data with the transfer of the lock, the communication latencies within critical sections can be substantially reduced. By pushing data in exclusive state, the mechanism can collapse the read-modify-write sequences within a critical section into local cache accesses. The write-back to memory gives the receiving cache the option to ignore the push. We make a case for the use of Inferentially Queued Locks (IQLs), not just for efficient synchronization but also for reducing communication latencies. With IQLs, the processor infers the existence, and limits, of a critical section from the use of synchronization instructions and joins a queue of lock requestors. The speculative push mechanism extracts information about program structure by observing IQLs. Neither mechanism requires programmer or compiler support or any instruction set changes. Our results demonstrate that for a set of benchmarks with high communication characteristics, IQLs are able to provide speedups when there is frequent synchronization. In each of the benchmarks we studied, the combination of IQLs and speculative push removed more than half of the processor's observed latency during critical sections.
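The effect described in the abstract can be illustrated with a toy model. The sketch below is not the simulator or protocol used in the report: the names Cache, hand_over_lock, and critical_section_latency, the cycle counts, and the addresses are all hypothetical, the prediction is assumed to be perfect, and the write-back to memory is not modeled. It only shows how pushing the protected lines together with the lock turns remote misses inside the critical section into local hits.

```python
from dataclasses import dataclass, field

# Hypothetical latencies in cycles; chosen only for illustration.
REMOTE_MISS = 100  # cache-to-cache transfer
LOCAL_HIT = 1      # hit in the local cache


@dataclass
class Cache:
    # Addresses of lines this cache currently holds in exclusive state.
    lines: set = field(default_factory=set)

    def access(self, addr, owner):
        """Return the latency of one access; on a miss, migrate the line from owner."""
        if addr in self.lines:
            return LOCAL_HIT
        owner.lines.discard(addr)
        self.lines.add(addr)
        return REMOTE_MISS


def hand_over_lock(releaser, acquirer, lock_addr, predicted, push):
    """Transfer the lock line; optionally push the predicted data lines with it."""
    releaser.lines.discard(lock_addr)
    acquirer.lines.add(lock_addr)
    if push:
        for addr in predicted:
            if addr in releaser.lines:
                # Pushed in exclusive state: the releaser gives up its copy.
                releaser.lines.discard(addr)
                acquirer.lines.add(addr)


def critical_section_latency(push):
    """Latency the acquirer sees for its data accesses inside the critical section."""
    lock, data = 0x100, [0x200, 0x240]
    releaser = Cache({lock, *data})  # previous lock holder owns lock and data
    acquirer = Cache()
    hand_over_lock(releaser, acquirer, lock, predicted=data, push=push)
    return sum(acquirer.access(addr, releaser) for addr in data)


if __name__ == "__main__":
    print("without push:", critical_section_latency(push=False))
    print("with push:   ", critical_section_latency(push=True))
```

In this model the lock hand-off itself is treated as free in both cases, so the difference between the two runs comes entirely from the data accesses inside the critical section.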