Efficient Synchronization Primitives for Large-scale Cache-Coherent Multiprocessors
James R Goodman, Mary K Vernon and Philip J Woest
This paper proposes a set of efficient primitives for process synchronization in multiprocessors. The only assumptions made in developing the set of primitives are that hardware combining is not implemented in the interconnect, and (in one case) that the interconnect supports broadcast. The primitives make use of synchronization bits (syncbits) to provide a simple mechanism for mutual exclusion. The proposed implementation of the primitives includes efficient (i.e. local) busy-waiting for syncbits. In addition, a hardware-supported mechanism for maintaining a first-come first-serve queue of requests for a syncbit is proposed. This queuing mechanism allows for a very efficient implementation of, as well as fair access to, binary semaphores. We also propose to implement Fetch-and-Add with combining in software rather than hardware. This allows an architecture to scale to a large number of processors while avoiding the cost of hardware combining. Scenarios for common synchronization events such as work queues and barriers are presented to demonstrate the generality and ease of use of the proposed primitives. The efficient implementation of the primitives is simpler if the multiprocessor has a hardware cache-consistency protocol. To illustrate this point, we outline how the primitives would be implemented in the Multicube multiprocessor.
Download this report (PDF)
Return to tech report index