Multifacet's Transactional Memory implementation, introduced in "LogTM: Log-Based Transactional Memory" http://www.cs.wisc.edu/multifacet/papers/hpca06_logtm.pdf, was released as part of GEMS 1.2.

The TM implementation in the current release, GEMS 2.0, is modeled after the LogTM-SE system described in "LogTM-SE: Decoupling Hardware Transactional Memory from Caches" http://www.cs.wisc.edu/multifacet/papers/hpca07_logtmse.pdf

Note: This implementation of TM is designed to operate on SPARC-based target machines only.

1. TM Workload Setup

See TM Workload Setup

2. Simulation Parameters

Note: For more in-depth information on the parameters we used for our ISCA '07 paper, "Performance Pathologies in Hardware Transactional Memory", see the function start_ruby() in $GEMS/gen-scripts/microbench.py.

In $GEMS/ruby/config/rubyconfig.defaults:

REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH: Must be set to true in order to use the MESI_CMP_filter_directory protocol.

NUMBER_OF_VIRTUAL_NETWORKS: Set to 5 in order to get the proper number of virtual channels for the on-chip interconnect used for MESI_CMP_filter_directory.

PROFILE_EXCEPTIONS: (true or false) Indicates whether debugging output related to system calls and exceptions is printed.

PROFILE_XACT: (true or false) Indicates whether additional LogTM debug output is printed. See Profiler.C for examples.

PROFILE_NONXACT: (true or false) Indicates whether debug output for non-transactional memory requests is printed. See Profiler.C for examples.

XACT_DEBUG: (true or false) Indicates whether detailed LogTM debug output will be displayed, depending on the level set by XACT_DEBUG_LEVEL.

XACT_DEBUG_LEVEL: (1..3) Used in conjunction with XACT_DEBUG; defines how much debugging output is displayed. Higher levels yield more debugging output.

XACT_MEMORY: (true or false) Set to true for TM protocols. Must be set to true during protocol compilation when simulating a lazy system (e.g. any system with Lazy version management).

XACT_ENABLE_TOURMALINE: (true or false) Set to true to enable a perfect memory system, with no caching enabled. Useful for debugging a new transactional workload. Runs slightly faster than our directory protocol.

XACT_ISOLATION_CHECK: (true or false) Set to true to enable runtime checking of read/write set isolation. Also displays warnings if strong isolation has been breached. Enabling this check incurs a performance penalty.

PERFECT_FILTER: (true or false) Set to true to simulate "Perfect" physical signatures, having no false positives. Also set READ_WRITE_FILTER to "Perfect_" to simulate "Perfect" signatures.

READ_WRITE_FILTER: (Perfect_ or MultiBitSel_ or H3_ ) See gen-scripts/config.py for examples of how to set this string for non-perfect signatures. Use "Perfect_" in conjunction with PERFECT_FILTER=true for "Perfect" signatures.

PERFECT_VIRTUAL_FILTER: (true or false) Set to true to simulate "Perfect" saved physical signatures, for use when virtualizing transactions. Also set VIRTUAL_READ_WRITE_FILTER to "Perfect_" to simulate "Perfect" saved physical signatures.

VIRTUAL_READ_WRITE_FILTER: (Perfect_ or MultiBitSel_ or H3_). Analogous to READ_WRITE_FILTER for valid configuration strings. See table below for valid combinations of READ_WRITE_FILTER, VIRTUAL_READ_WRITE_FILTER, and SUMMARY_READ_WRITE_FILTER.

PERFECT_SUMMARY_FILTER: (true or false) Set to true to simulate "Perfect" summary signatures, having no false positives. Also set SUMMARY_READ_WRITE_FILTER to "Perfect_" to simulate "Perfect" summary signatures.

SUMMARY_READ_WRITE_FILTER: (Perfect_ or MultiBitSel_ or H3_) Analogous to READ_WRITE_FILTER in valid configuration strings. See table below for valid combinations of READ_WRITE_FILTER, VIRTUAL_READ_WRITE_FILTER, and SUMMARY_READ_WRITE_FILTER.

Table: valid combinations of signature types. Columns: Physical (Perfect? / Non-Perfect?), Saved Physical (Perfect? / Non-Perfect?), Summary (Perfect? / Non-Perfect?), Comments.

Comments attached to table rows:

Saved Physical & Summary must have the same configuration strings.

Physical & Saved Physical can use different configuration strings. However, Saved Physical & Summary must use the same configuration strings.

XACT_EAGER_CD: (true or false) Setting to true enables eager conflict detection. Otherwise uses lazy conflict detection.

XACT_LAZY_VM: (true or false) Setting to true enables lazy version management via an infinite write buffer. Otherwise uses eager version management. Must also set XACT_MEMORY to true during protocol compilation, or else appropriate Simics callbacks will not get registered.

XACT_CONFLICT_RES: (BASE, TIMESTAMP, HYBRID, or CYCLE) Different policies of conflict resolution. BASE aborts the requestor whenever there's a conflict. TIMESTAMP aborts a requestor only if the requestor is younger than the conflicting processor. HYBRID aborts a younger transaction in favor of an older requesting transaction if the younger has the conflicting block in its read set only. CYCLE always indicates a conflict with other transactions, and is used to immediately trap to a software contention manager.

XACT_COMMIT_TOKEN_LATENCY: (>=0) Integer used to determine the latency of commit arbitration for the commit token for lazy systems.

XACT_VISUALIZER: (true or false) True dumps a transactional visualizer text file displaying a transactional workload's execution using character symbols for different phases of each thread's execution.

XACT_NO_BACKOFF: (true or false) True turns off exponential backoff after an abort in the software handler.

XACT_LOG_BUFFER_SIZE: (>=0) The size of a magical hardware log buffer. Used to perform 0-cycle restoration of memory values for any transaction having logs not exceeding this buffer size.

XACT_STORE_PREDICTOR_ENTRIES: (>=0) Part of the write-set predictor, and indicates how many entries should be used.

XACT_STORE_PREDICTOR_HISTORY: (>=0) Part of the write-set predictor, and indicates the history length of the predictor, for each entry.

XACT_STORE_PREDICTOR_THRESHOLD: (>=0) How many consecutive matching predictions before the write-set predictor is used, for each entry.

ENABLE_MAGIC_WAITING: (true or false) True indicates that an aborting transaction will retry its transaction at exactly the moment when the winning transaction commits its transaction. Useful when XACT_NO_BACKOFF is set to true.

XACT_ENABLE_VIRTUALIZATION_LOGTM_SE: (true or false) True indicates simulator-only LogTM-SE style virtualization will be used, by utilizing the saved physical and summary signatures for conflict detection.
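Several of the parameters above constrain one another (for example, each PERFECT_* flag is paired with a "Perfect_" filter string, and XACT_LAZY_VM requires XACT_MEMORY). The following is a minimal sketch of a sanity check over those stated constraints. The function check_tm_config and the dictionary representation are hypothetical and are not part of GEMS; the sketch does not parse rubyconfig.defaults, and "H3_" is used only as a stand-in for a non-perfect string (see gen-scripts/config.py for real ones).

```python
# Hypothetical helper, not part of GEMS. The rules encoded here are only the
# ones stated in the parameter descriptions above.

def check_tm_config(cfg):
    """Return a list of problems found in a dict of parameter settings."""
    problems = []

    # Each PERFECT_* flag is documented as paired with a "Perfect_" string.
    pairs = [
        ("PERFECT_FILTER", "READ_WRITE_FILTER"),
        ("PERFECT_VIRTUAL_FILTER", "VIRTUAL_READ_WRITE_FILTER"),
        ("PERFECT_SUMMARY_FILTER", "SUMMARY_READ_WRITE_FILTER"),
    ]
    for flag, filter_name in pairs:
        if cfg.get(flag) and cfg.get(filter_name) != "Perfect_":
            problems.append(f"{flag} is true but {filter_name} is not 'Perfect_'")

    # Lazy version management needs XACT_MEMORY at protocol compile time.
    if cfg.get("XACT_LAZY_VM") and not cfg.get("XACT_MEMORY"):
        problems.append("XACT_LAZY_VM requires XACT_MEMORY to be true")

    # Settings required by the MESI_CMP_filter_directory protocol.
    if cfg.get("NUMBER_OF_VIRTUAL_NETWORKS") != 5:
        problems.append("NUMBER_OF_VIRTUAL_NETWORKS should be 5")
    if not cfg.get("REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH"):
        problems.append("REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH should be true")

    # XACT_DEBUG_LEVEL is documented as 1..3 and only matters with XACT_DEBUG.
    if cfg.get("XACT_DEBUG") and cfg.get("XACT_DEBUG_LEVEL") not in (1, 2, 3):
        problems.append("XACT_DEBUG_LEVEL must be 1, 2, or 3")

    return problems


if __name__ == "__main__":
    example = {
        "REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH": True,
        "NUMBER_OF_VIRTUAL_NETWORKS": 5,
        "XACT_MEMORY": False,
        "XACT_LAZY_VM": True,
        "PERFECT_FILTER": True,
        "READ_WRITE_FILTER": "H3_",  # stand-in for a non-perfect string
    }
    for problem in check_tm_config(example):
        print(problem)
```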

3. Output Statistics

Histogram statistics are reported as follows. Binsize gives the range of each bin: the first bin covers values 0 through binsize - 1, the second covers binsize through (2 * binsize) - 1, and so forth. Max gives the largest value observed, count gives the total number of transactions executed (this will be the same across all parameters), average gives the mean of all values observed, and standard deviation gives the standard deviation of the observed values about the mean.

xact_size_dist: Histogram of total transactional read+write cache lines in transaction

xact_instr_count: Histogram of number of instructions in transaction

xact_time_dist: Histogram of cycles elapsed between transaction begin and commit

xact_log_size_dist: Histogram of transaction's log size, in bytes

xact_read_set_size_dist: Histogram of transaction read set size, in number of cache lines

xact_write_set_size_dist: Histogram of transaction write set size, in number of cache lines

xact_overflow_read_lines_dist: Histogram of number of transactional read cache lines that get evicted from the L2 cache

xact_overflow_write_lines_dist: Histogram of number of transactional write cache lines that get evicted from the L2 cache

xact_overflow_read_set_size_dist: Histogram of total transaction read set size (in cache lines) in which at least one transactional read cache line is evicted from the L2

xact_overflow_write_set_size_dist: Histogram of total transaction write set size (in cache lines) in which at least one transactional write cache line is evicted from the L2

xact_miss_load_dist: Histogram of number of transactional read cache misses

xact_miss_store_dist: Histogram of number of transactional write cache misses

xact_nacked: Total number of transactions that were nacked

xact_retries: Histogram of number of transaction aborts

xact_abort_delays: Histogram of number of cycles processing an abort

xact_aborts: Total number of transaction aborts

xact Nacks by XID: For each static XID, total number of nacks that were encountered

xact Nacks by XID Pairs: For all nacks, which XID pairs were involved in each nack instance

xact Nacks by PC: The program counters of the memory references involved in nacks

xact exceptions: The hex number of exceptions encountered inside transactions and their frequency

xact abort by XID: Breakdown of number of aborts by static XID

xact Aborts by PC: Breakdown of number of aborts by program counter

xact Aborts by Address: Breakdown of number of aborts by memory address

xact Commit Stats by XID: For each static XID, the total number of instances of that static XID, avg. cycles for each transaction, avg. instructions, avg. read set, avg. write set, avg. load misses, avg. store misses, avg. retries

xact read (write) set false positives: Total number of Read (Write) signature false positives

xact read (write) set matches: Total number of Read (Write) signature true positives

Read (Write) set false positives rate: Read (Write) signature false positive rate

Xact Signature Stats: For each static XID, number of read (write) set bits set in read (write) signatures on commit & abort

Hash values distribution: Histogram for read & write hash values

xact cycle breakdown: For each processor, total number of cycles spent executing transactions, aborts, barriers, backoff, transactional stalls, executing non-transactions, stalling inside non-transactions, cycles spent timing events

4. LogTM Code Overview

See LogTM Code Overview

5. Protocol - MESI_CMP_filter_directory

The current release contains only this single-chip CMP directory protocol, which uses signatures for conflict detection. Because the prior LogTM SMP protocol has been deprecated in GEMS 2.0, LogTM-like RW bits can be modeled by using "Perfect" signatures.

Details

This is a 2-level directory protocol, utilizing private separate L1 instruction and data caches, and a unified shared L2 cache. The L1 and L2 are inclusive, and the directory holds a full list of sharers or exclusive owner of each cached block. The protocol supports silent S-replacements (L1 replacements of clean data), but replacements of any other valid L1 block must inform the L2 through the PUTX command.

The Sticky-M (or Sticky-S) directory state is set whenever a transactionally modified (or read) block is replaced from the L1. This can occur in two cases: (1) the cache block was brought into the L1 in Modified (M) state, or (2) the transactionally modified (or read) block was previously evicted and then re-read in the same transaction in Exclusive (E) state. Replacements of clean transactional blocks in Shared (S) state are correct because the directory will still forward invalidates to the correct transactional processor(s).

The L2 directory will selectively send out write or read+write signature checks depending on the state of the cache block in the system. Any non-cached block, for correctness, always has to check both read+write signatures. An interesting corner case for this protocol and signatures arises when an uncached block checks the signatures and detects more than one NACKer. This case can arise for both transactional load and store requests due to false positives in signatures. Since it is impossible to detect the true owner of a transactionally modified block for a NACKed request, the protocol can cache the NACKed block on-chip in the shared L2 cache, but subsequent requests need to check both read and write signatures to maintain correct isolation.
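To illustrate how signature false positives can produce more than one NACKer for a single block, here is a minimal sketch comparing a "Perfect" signature with a simple bit-select signature. The class and function names are hypothetical, and the modulo hash is an illustrative choice; this is not the MultiBitSel_ or H3_ implementation in GEMS.

```python
class PerfectSignature:
    """Exact set of block addresses: never reports a false positive."""

    def __init__(self):
        self.blocks = set()

    def insert(self, block_addr):
        self.blocks.add(block_addr)

    def may_contain(self, block_addr):
        return block_addr in self.blocks


class BitSelectSignature:
    """Bit vector indexed by low-order block address bits (illustrative)."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.bits = [False] * num_bits

    def _index(self, block_addr):
        return block_addr % self.num_bits

    def insert(self, block_addr):
        self.bits[self._index(block_addr)] = True

    def may_contain(self, block_addr):
        # True for inserted blocks, and also for any block that aliases to
        # the same bit (a false positive).
        return self.bits[self._index(block_addr)]


def nackers(write_signatures, block_addr):
    """Return ids of processors whose write signature reports a conflict."""
    return [pid for pid, sig in sorted(write_signatures.items())
            if sig.may_contain(block_addr)]


if __name__ == "__main__":
    # Processor 0 wrote block 0x10; processor 1 wrote block 0x28.
    # Both map to bit 0 of an 8-bit signature.
    imperfect = {0: BitSelectSignature(8), 1: BitSelectSignature(8)}
    perfect = {0: PerfectSignature(), 1: PerfectSignature()}
    for sigs in (imperfect, perfect):
        sigs[0].insert(0x10)
        sigs[1].insert(0x28)
    print(nackers(imperfect, 0x10))  # [0, 1]: processor 1 is a false positive
    print(nackers(perfect, 0x10))    # [0]
```

With the imperfect signatures, a request for block 0x10 sees two NACKers even though only processor 0 actually wrote the block, so the true owner cannot be identified from the responses alone.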

6. Xact Visualizer

7. Context switching & Paging

Note: The following is applicable only to Solaris running on top of a SPARC ISA. We do not have any details on how this is done for other OSes (Linux, etc.) or ISAs (x86, etc.). All of this functionality is implemented in ruby/simics/SimicsHypervisor.C/h.

Context switching

Solaris stores a pointer to the current thread's software context structure in the %g7 register on user<->supervisor mode changes. To detect thread switches within a process, it is only necessary to detect changes to this pointer value on mode changes. To detect process switches, we detect changes to the Address Space Identifier (ASID) register. In Simics, this is stored as part of the MMU register state.
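The detection logic above amounts to remembering the last observed %g7 pointer and ASID and comparing them against new values. A minimal sketch follows; the class and method names are hypothetical and do not correspond to the SimicsHypervisor code or to any Simics API.

```python
class ContextSwitchDetector:
    """Tracks the last seen thread pointer (%g7) and ASID for one processor."""

    def __init__(self):
        self.last_thread_ptr = None
        self.last_asid = None

    def on_mode_change(self, g7_value):
        """Call on each user<->supervisor mode change.

        Returns True if the thread pointer differs from the previous one,
        i.e. a thread switch occurred. The first observation is not a switch.
        """
        switched = (self.last_thread_ptr is not None
                    and g7_value != self.last_thread_ptr)
        self.last_thread_ptr = g7_value
        return switched

    def on_asid_write(self, asid):
        """Call when the ASID register is written.

        Returns True if the ASID changed, i.e. a process switch occurred.
        """
        switched = self.last_asid is not None and asid != self.last_asid
        self.last_asid = asid
        return switched


if __name__ == "__main__":
    detector = ContextSwitchDetector()
    for g7 in (0x1000, 0x1000, 0x2000):
        print(hex(g7), detector.on_mode_change(g7))
```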

Paging

Paging is detected by keeping a duplicate address translation table in the simulator, and then detecting when existing mappings are invalidated or new entries are added. In Simics, this is done by catching callbacks to DTLB map/demap, overwrite, and replacement MMU events. A paging event occurs when the Physical Page Number (PPN) field differs from a previously stored PPN in our duplicate address translation table.
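The duplicate translation table can be sketched as a map from virtual page number to the last PPN seen. The names below are hypothetical and do not reflect the SimicsHypervisor code or Simics callback signatures. Keeping the old PPN after a demap, so that a later remap to a different frame can be recognized, is an assumption of this sketch, as is keying the table by virtual page number alone.

```python
class ShadowTranslationTable:
    """Duplicate of the address translations observed through MMU events."""

    def __init__(self):
        self.vpn_to_ppn = {}   # last PPN seen for each virtual page number
        self.valid = set()     # virtual pages with a currently valid mapping

    def on_map(self, vpn, ppn):
        """Record a map/overwrite/replacement event.

        Returns True if this is a paging event: the page was seen before
        with a different physical page number.
        """
        old_ppn = self.vpn_to_ppn.get(vpn)
        self.vpn_to_ppn[vpn] = ppn
        self.valid.add(vpn)
        return old_ppn is not None and old_ppn != ppn

    def on_demap(self, vpn):
        """Record a demap event; the old PPN is kept for later comparison."""
        self.valid.discard(vpn)


if __name__ == "__main__":
    table = ShadowTranslationTable()
    print(table.on_map(0x10, 0xA))  # first mapping of the page
    table.on_demap(0x10)
    print(table.on_map(0x10, 0xB))  # same page, different physical frame
```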

Transactional_Memory (last edited 2008-03-21 03:14:05 by JayaramBobba)