# **Crossing Guard**: Mediating Host-Accelerator Coherence Interactions



Lena E. Olson\*, Mark D. Hill, David A. Wood

University of Wisconsin-Madison

\* Now at Google

ASPLOS 2017 April 10<sup>th</sup>, 2017



#### Accelerators are here!

- Complex, programmable accelerators increasingly prevalent
- Many applications: graphics, scientific computing, video encoding, machine learning, etc.
- Accelerators may benefit from cache coherent shared memory
- May be designed by third parties

#### However...



- Host coherence protocols may be proprietary and complex
- Bugs in accelerator implementations might crash host system!
- Crossing Guard: coherence interface to safely translate accelerator ↔ host protocol





# **Crossing Guard Goals**

When adding accelerators to host coherence protocol:

1. Allow accelerators customized caches
2. Simple, standardized accelerator coherence interface
3. Guarantee **safety** for the host system

### 1. Why Customize Caches?

- CPU caches have to work with most types of workloads
- Accelerators may only run some workloads!
  - Optimize caches for likely data access patterns
  - Number of levels, writeback vs. writethrough, MSI vs. VI, etc.



### 2. Why Simple, Standardized Interface?

Host systems speak different protocols...

- Expensive to redesign for each one!
  - Intel, AMD, ARM, IBM, Oracle...
  - CCIX shows industry cares!



L1 controller from gem5's MOESI\_hammer

[Transition table; cell contents not recoverable from extraction. Rows are the 25 L1 controller states: I, S, O, M, MM, IR, SR, OR, MR, MMR, IM, SM, OM, ISM, M<sup>W</sup>, MM<sup>W</sup>, IS, SS, OI, MI, II, ST, OT, MT, MMT. Columns are events, including Ifetch, Load, Store, Invalidate, Other GETS, Merged GETS, Other GETX, Ack, Shared Ack, Data, Shared Data, Exclusive Data, Writeback Ack, Writeback Nack, All acks, All acks no sharers, L2 Replacement, L1 to L2, and the Trigger L2 to L1 events. Each non-empty cell lists action codes and, after a slash, the next state.]

(Transition table in style of Sorin et al.)













#### **Crossing Guard**

Hardware translating between host and accelerator protocols



- Set of accelerator ↔ host coherence messages (like an API)

#### **Crossing Guard Interface**

#### **Accelerator** → Host Requests

- GetS, GetM
- PutS, PutE, PutM

#### Host → Accelerator Requests

- Invalidate

#### Host → Accelerator Responses

- DataS, DataE, DataM
- Writeback Ack

#### **Accelerator** → Host Responses

- InvAck, Clean Writeback, Dirty Writeback
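The message set above is small enough to write down as a type definition. The following is a minimal Python sketch; only the message names come from the slide, while the class names, the `Message` record, and its `addr` field are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class AccelRequest(Enum):
    """Accelerator -> host requests."""
    GET_S = auto()
    GET_M = auto()
    PUT_S = auto()
    PUT_E = auto()
    PUT_M = auto()


class HostRequest(Enum):
    """Host -> accelerator requests."""
    INVALIDATE = auto()


class HostResponse(Enum):
    """Host -> accelerator responses."""
    DATA_S = auto()
    DATA_E = auto()
    DATA_M = auto()
    WRITEBACK_ACK = auto()


class AccelResponse(Enum):
    """Accelerator -> host responses."""
    INV_ACK = auto()
    CLEAN_WRITEBACK = auto()
    DIRTY_WRITEBACK = auto()


@dataclass(frozen=True)
class Message:
    """Hypothetical message record: a message kind plus a block address."""
    kind: Enum
    addr: int


def interface_size() -> int:
    """Count the distinct message kinds across all four groups."""
    groups = (AccelRequest, HostRequest, HostResponse, AccelResponse)
    return sum(len(group) for group in groups)
```

Under these assumptions the whole interface has 13 message kinds, which is what an accelerator designer would need to handle or emit.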



- **Crossing Guard**
- Hides implementation details of host protocol
  - No counting acks, sending unblocks, handling races, etc.
- Moves protocol complexity into Crossing Guard hardware
  - Only implemented once per host system
  - By experts!
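As an illustration of what "no counting acks" could mean in practice, the sketch below shows guard-side bookkeeping that absorbs a host's data and ack messages and forwards a single `DataM` to the accelerator. The class, the event strings, and the idea that the expected ack count is known up front are all assumptions made for this toy example, not details of the actual hardware.

```python
class AckCollector:
    """Hypothetical bookkeeping for one outstanding GetM inside the guard."""

    def __init__(self, acks_expected: int):
        self.acks_pending = acks_expected
        self.have_data = False

    def on_host_ack(self) -> None:
        self.acks_pending -= 1

    def on_host_data(self) -> None:
        self.have_data = True

    def ready(self) -> bool:
        # The request completes once data and every expected ack have arrived.
        return self.have_data and self.acks_pending == 0


def run_get_m(host_events, acks_expected):
    """Feed host-side events to the collector; return what the accelerator sees."""
    collector = AckCollector(acks_expected)
    forwarded = []
    for event in host_events:
        if event == "ack":
            collector.on_host_ack()
        elif event == "data":
            collector.on_host_data()
        else:
            raise ValueError(f"unknown host event: {event}")
        if collector.ready() and not forwarded:
            forwarded.append("DataM")  # one simple response, no acks exposed
    return forwarded
```

For example, `run_get_m(["ack", "data", "ack"], acks_expected=2)` forwards one `DataM`, while the accelerator never observes the individual acks.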



# **Experimental Implementation**

- Coherence controllers / protocols implemented in SLICC
- Simulations using gem5
- Code and transition tables available online
  - http://research.cs.wisc.edu/multifacet/xguard/



### 1. Customize Caches 🗸

Designed + implemented two sample systems:

- Private Per-Core L1 at Accelerator
- Private L1s + Shared L2 at Accelerator





| Controller                   | States | Transitions |
|------------------------------|--------|-------------|
| AMD Hammer-like Private \$\$ | 24     | 148         |

### 2. Simple, Standardized Interface 🗸



- Implemented Crossing Guard controller for two host protocols
  - AMD Hammer-like Exclusive MOESI
  - Two-Level MESI Inclusive
- Modularity: Host and Accelerator protocol choice is completely independent











# Evaluation

- **I**. Does it provide coherence to a correct accelerator?
- **II**. Does it provide safety to the host?
- **III**. Does it allow high performance?



#### I. Correctness Testing

- Are coherence invariants maintained when the accelerator is acting correctly?
- How? Random tester
  - Store-Load pairs to random addresses
  - Check integrity of data
- Ran for 160 billion load/store pairs
- Local coverage: 100% states, 100% events, > 99% transitions
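The random-tester idea above can be sketched in a few lines of Python. This is not the gem5 tester; `ToyCache` is a hypothetical stand-in for the system under test (a tiny write-back cache over a dictionary), and the address range, capacity, and seed are arbitrary assumptions.

```python
import random


class ToyCache:
    """Hypothetical system under test: small write-back cache over a dict memory."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = {}   # addr -> cached value
        self.memory = {}  # addr -> value written back

    def store(self, addr, value):
        if addr not in self.lines and len(self.lines) >= self.capacity:
            victim = next(iter(self.lines))  # evict the oldest inserted line
            self.memory[victim] = self.lines.pop(victim)
        self.lines[addr] = value

    def load(self, addr):
        if addr in self.lines:
            return self.lines[addr]
        return self.memory.get(addr, 0)


def random_test(num_pairs, num_addrs=16, seed=0):
    """Issue store-load pairs to random addresses; return the mismatch count."""
    rng = random.Random(seed)
    cache = ToyCache()
    expected = {}  # reference model of what each address should hold
    mismatches = 0
    for _ in range(num_pairs):
        addr = rng.randrange(num_addrs)
        value = rng.randrange(1 << 32)
        cache.store(addr, value)
        expected[addr] = value
        if cache.load(addr) != value:  # check integrity of the data just stored
            mismatches += 1
    # Final sweep: every address must still hold its last stored value.
    for addr, value in expected.items():
        if cache.load(addr) != value:
            mismatches += 1
    return mismatches
```

A run such as `random_test(10_000)` should report zero mismatches for a correct system under test; any nonzero count points to lost or stale data.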

### II. Fuzz Testing

- Is host safety maintained when accelerator misbehaves?
- How? Replace accelerator cache with evil controller
  - Generates random coherence messages to random addresses
  - Desired outcome: No deadlocks / crashes
- Ran for 7 billion load/store pairs
- Local coverage: 100% states, 100% events, > 99% transitions
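The fuzzing setup can be illustrated with a toy model: an "evil" sender emits random messages to random addresses, and a guard forwards only those that are legal for the state it has tracked for that address. Everything here is an assumption for illustration, including the message subset, the `LEGAL` table, and the choice to drop illegal messages silently; it is a simplification and does not model the actual Crossing Guard hardware.

```python
import random

# Hypothetical legality table: (tracked state, message) -> next state.
LEGAL = {
    ("I", "GetS"): "S",
    ("I", "GetM"): "M",
    ("S", "GetM"): "M",
    ("S", "PutS"): "I",
    ("M", "PutM"): "I",
}
MESSAGES = ["GetS", "GetM", "PutS", "PutM"]


class HostCrash(Exception):
    """Raised when the toy host receives a message it cannot handle."""


class ToyHost:
    def __init__(self):
        self.state = {}  # addr -> state the host believes the accelerator holds

    def handle(self, msg, addr):
        key = (self.state.get(addr, "I"), msg)
        if key not in LEGAL:
            raise HostCrash(f"unexpected {msg} for address {addr}")
        self.state[addr] = LEGAL[key]


class ToyGuard:
    def __init__(self, host):
        self.host = host
        self.state = {}
        self.dropped = 0

    def handle(self, msg, addr):
        key = (self.state.get(addr, "I"), msg)
        if key not in LEGAL:
            self.dropped += 1  # illegal message never reaches the host
            return
        self.state[addr] = LEGAL[key]
        self.host.handle(msg, addr)


def fuzz(num_messages, guarded, seed=0):
    """Return (host_crashed, messages_dropped_by_guard)."""
    rng = random.Random(seed)
    host = ToyHost()
    guard = ToyGuard(host)
    target = guard if guarded else host
    try:
        for _ in range(num_messages):
            target.handle(rng.choice(MESSAGES), rng.randrange(8))
    except HostCrash:
        return True, guard.dropped
    return False, guard.dropped
```

With the guard in place the toy host never sees an illegal message, whereas sending the same random stream directly to the host crashes it almost immediately.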



**MESI** Inclusive host protocol

gem5-gpu



#### **Crossing Guard Summary**

- Provides simple, standardized interface to ease accelerator development
- Correctness when accelerator is correct
- **Host safety** when accelerator is incorrect
- Low performance overhead



# **Backup Follows**



#### Two-Level Accelerator Protocol (1)

#### **Private L1s + Shared L2 at Accelerator**



#### Two-Level Accelerator Protocol (2)



**L1 Controller** (M state contains dirty/clean bit)

|      | Load         | Store        | Replacement  | Invalidate | DataM          | DataS              | Writeback Ack |
|------|--------------|--------------|--------------|------------|----------------|--------------------|---------------|
| M    | h q          | hh q         | j c v l / MI | d l m / I  |                |                    |               |
| S    | h q          | j b q / IM   | l / I        | f l m / I  |                |                    |               |
| I    | i j a q / IS | i j b q / IM |              | f m        |                |                    |               |
| IS   | Z            | Z            | Z            | f m / IS I |                | w u k xxlh n / S   |               |
| IM   | Z            | Z            | Z            | f m        | u k xxsh n / M |                    |               |
| MI   | Z            | Z            | Z            | f m        |                |                    | k n / I       |
| IS I |              | Z            | Z            | f m        |                | w u xxlh k l n / I |               |
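The next-state information in the L1 controller table above can be expressed as a table-driven state machine. The sketch below keeps only the state changes and ignores the single-letter action codes, whose meanings are not given on the slide. It assumes that a cell without a "/ X" suffix leaves the state unchanged and that "Z" means the event is stalled; the name `IS_I` for the "IS I" row is likewise a naming choice made here.

```python
# (state, event) -> next state, read off the L1 controller table.
TRANSITIONS = {
    ("M", "Load"): "M",
    ("M", "Store"): "M",
    ("M", "Replacement"): "MI",
    ("M", "Invalidate"): "I",
    ("S", "Load"): "S",
    ("S", "Store"): "IM",
    ("S", "Replacement"): "I",
    ("S", "Invalidate"): "I",
    ("I", "Load"): "IS",
    ("I", "Store"): "IM",
    ("I", "Invalidate"): "I",
    ("IS", "Invalidate"): "IS_I",
    ("IS", "DataS"): "S",
    ("IM", "Invalidate"): "IM",
    ("IM", "DataM"): "M",
    ("MI", "Invalidate"): "MI",
    ("MI", "Writeback Ack"): "I",
    ("IS_I", "Invalidate"): "IS_I",
    ("IS_I", "DataS"): "I",
}

# Cells marked "Z" in the table: assumed to stall until the transient state resolves.
STALLS = {
    (state, event)
    for state in ("IS", "IM", "MI")
    for event in ("Load", "Store", "Replacement")
} | {("IS_I", "Store"), ("IS_I", "Replacement")}


def step(state, event):
    """Return the next state; a stalled event leaves the state unchanged."""
    if (state, event) in STALLS:
        return state
    if (state, event) not in TRANSITIONS:
        raise KeyError(f"no transition for {event} in state {state}")
    return TRANSITIONS[(state, event)]


def run(events, start="I"):
    """Apply a sequence of events starting from the given state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

For instance, a load miss followed by an invalidation and then the data response walks I, IS, IS_I, and back to I, which matches the bottom row of the table.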

#### Two-Level Accelerator Protocol (3)

**L2 Controller** (Coordinates Sharing among Accelerator L1s)

|            | get <u>M</u>                                       | getS                                       | putM                       | InvAck                          | <u>Writeback</u>                       | Inv                                    | <u>DataM</u>                               | DataS                                        | <u>WBAck</u>        | All Acks                                   | L2 Replacemen                 | t L2 Replacement Clean  |
|------------|----------------------------------------------------|--------------------------------------------|----------------------------|---------------------------------|----------------------------------------|----------------------------------------|--------------------------------------------|----------------------------------------------|---------------------|--------------------------------------------|-------------------------------|-------------------------|
| Ī          | <u>a r k greq1 / IM</u>                            | <u>b r m qreq1 / IS</u>                    |                            |                                 |                                        | <u>d qreqBC</u>                        |                                            |                                              |                     |                                            |                               |                         |
| <u>S</u>   | <u>t k i p <b>r</b>rr qreq1</u> / <u>SM</u>        | <u>m fr <b>r</b>rr uu h qreq1</u>          |                            |                                 |                                        | <u>t w i s qreqBC</u> / <u>SI</u>      |                                            |                                              |                     |                                            |                               | <u>twis</u> / <u>SR</u> |
| <u>MO</u>  | <u>e r g <b>r</b>rr uu h qreq1</u> / <u>M</u>      | <u>m fr <b>r</b>rr uu h qreq1</u> /<br>MOS |                            |                                 |                                        | <u>t w c s u qreqBC</u> /I             |                                            |                                              |                     |                                            | <u>t w c wm s</u> / <u>MR</u> | Ĺ                       |
| M          | <u>h k qreq1 / MM</u>                              | <u>h m qreq1 / MMOS</u>                    | n1p yo j qput1 / MO        |                                 |                                        | <u>t w h s qreqBC / MI</u>             |                                            |                                              |                     |                                            | <u>t w h s</u> / <u>MR</u>    |                         |
| MOS        | <u>t w k i p <b>r</b>rr qreq1</u> /<br><u>MOSM</u> | <u>m fr uu h <b>r</b>rr qreq1</u>          |                            |                                 |                                        | <u>t w i s qreqBC</u> /<br><u>MOSI</u> |                                            |                                              |                     |                                            | <u>twis</u> / <u>MOSR</u>     | <u>twis</u> / <u>ER</u> |
| IM         | <u>zz 1r</u>                                       | <u>zz 1r</u>                               |                            |                                 |                                        | <u>d qreqBC</u>                        | <u>n g <b>r</b>rr uu m qrspBC / M</u>      |                                              |                     |                                            | Z                             | <u>Z</u>                |
| IS         | <u>zz 1r</u>                                       | <u>m qreq1</u>                             |                            |                                 |                                        | <u>d qreqBC</u>                        | <u>n f uu m <b>r</b>rr qrspBC</u> /<br>MOS | <u>n f uu m <b>r</b>rr qrspBC</u> / <u>S</u> |                     |                                            | Z                             | Z                       |
| <u>SM</u>  | <u>zz 1r</u>                                       | <u>zz 1r</u>                               |                            | <u>o p qrsp1</u>                |                                        | <u>zz BCr</u>                          |                                            |                                              |                     | <u>y u a qt</u> / <u>IM</u>                | <u>z</u>                      | <u>Z</u>                |
| <u>SI</u>  | <u>Z</u>                                           | Z                                          |                            | <u>o p qrsp1</u>                |                                        |                                        |                                            |                                              |                     | <u>d u qt</u> / <u>I</u>                   |                               |                         |
| <u>SR</u>  | <u>Z</u>                                           | <u>Z</u>                                   |                            | <u>o p qrsp1</u>                |                                        | <u>qreqBC / SI</u>                     |                                            |                                              |                     | <u>c ws qt</u> / <u>SRI</u>                |                               |                         |
| <u>SRI</u> |                                                    | <u>Z</u>                                   |                            |                                 |                                        | <u>d qreqBC</u>                        |                                            |                                              | <u>u qrspBC / I</u> |                                            |                               |                         |
| MR         |                                                    | <u>Z</u>                                   | <u>n1 tp j qput1 / MRi</u> | <u>hr qrsp1</u>                 |                                        | <u>qreqBC / MI</u>                     |                                            |                                              |                     |                                            |                               |                         |
| MRI        |                                                    | <u>Z</u>                                   |                            |                                 |                                        | <u>d qreqBC</u>                        |                                            |                                              | <u>u qrspBC / I</u> |                                            |                               |                         |
| MI         |                                                    | <u>Z</u>                                   | <u>n1 tp j qput1 / MIi</u> | <u>hr qrsp1</u>                 | <u>n t yo c u qrsp1 / I</u>            |                                        |                                            |                                              |                     |                                            |                               |                         |
| MOSI       |                                                    | <u>Z</u>                                   |                            | <u>o p qrsp1</u>                |                                        |                                        |                                            |                                              |                     | <u>c u qt / I</u>                          |                               |                         |
| MOSR       |                                                    | Z                                          |                            | <u>o p qrsp1</u>                |                                        | <u>qreqBC / MOSI</u>                   |                                            |                                              |                     | <u>c wm qt / MRI</u>                       |                               |                         |
| ER         |                                                    | <u>Z</u>                                   |                            | <u>o p qrsp1</u>                |                                        | <u>qreqBC / MOSI</u>                   |                                            |                                              |                     | <u>c we qt</u> / <u>MRI</u>                |                               |                         |
| <u>MM</u>  | <u>zz 1r</u>                                       | <u>zz 1r</u>                               | <u>n1p j qput1 / MMi</u>   | <u>hr qrsp1</u>                 | <u>nw g uu m qrsp1 / M</u>             | zz BCr                                 |                                            |                                              |                     |                                            | <u>Z</u>                      | <u>Z</u>                |
| MMOS       | <u>zz 1r</u>                                       | <u>zz 1r</u>                               | <u>n1p j qput1 / MMOSi</u> | <u>hr qrsp1</u>                 | <u>nw f uu m yo qrsp1</u> / <u>MOS</u> | <u>zz BCr</u>                          |                                            |                                              |                     |                                            | <u>Z</u>                      | Z                       |
| MOSM       | <u>zz 1r</u>                                       | <u>zz 1r</u>                               |                            | <u>o p qrsp1</u>                |                                        | <u>zz BCr</u>                          |                                            |                                              |                     | <u>g <b>r</b>rr y u uu m qt</u> / <u>1</u> | <u>4</u> <u>z</u>             | Z                       |
| MIi        | <u>Z</u>                                           | Z                                          |                            | <u>yo c u qrsp1 / I</u>         |                                        |                                        |                                            |                                              |                     |                                            |                               |                         |
| MRi        | <u>Z</u>                                           | Z                                          |                            | <u>yo c wm qrsp1 / MRI</u>      |                                        | <u>qreqBC / MIi</u>                    |                                            |                                              |                     |                                            |                               |                         |
| MMi        | <u>zz 1r</u>                                       | <u>zz 1r</u>                               |                            | <u>g uu m qrsp1 / M</u>         |                                        | <u>zz BCr</u>                          |                                            |                                              |                     |                                            | Z                             | Z                       |
| MMOSi      | <u>zz 1r</u>                                       | <u>zz 1r</u>                               |                            | <u>f uu m yo qrsp1</u> /<br>MOS |                                        | <u>zz BCr</u>                          |                                            |                                              |                     |                                            | <u>Z</u>                      | <u>Z</u>                |

# **Crossing Guard Invariants**

#### **Crossing Guard Guarantees to Host:**

- 1. Accelerator **requests** must be correct
  - a) Consistent with block stable state at accelerator
  - b) Consistent with block transient state at accelerator
- 2. Accelerator **responses** must be correct
  - a) Consistent with block stable state at accelerator
  - b) Consistent with block transient state at accelerator
  - c) Received within a reasonable time

( + Border Control Protections!)
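As a rough illustration of guarantee 1, the sketch below checks a single accelerator request against the state a checker believes the accelerator holds. The message names come from the single-level cache table later in these slides; the function name, the `in_flight` flag, and the choice to simply return `False` on a violation are assumptions made for illustration, not the Crossing Guard implementation.

```python
# Requests the accelerator may legally send for a block, keyed by the stable
# state the checker believes the accelerator holds (assumed encoding).
# E also permits PutM because a store hit moves E to M without any message.
ALLOWED_REQUESTS = {
    "I": {"GetS", "GetM"},
    "S": {"GetM", "PutS"},
    "E": {"PutE", "PutM"},
    "M": {"PutM"},
}


def check_request(tracked_state, in_flight, request):
    """Return True if the request is consistent with the tracked block state.

    in_flight models the transient case (1b): while a transaction on the
    block is still open, the accelerator should be stalled and send nothing.
    """
    if in_flight:
        return False
    return request in ALLOWED_REQUESTS.get(tracked_state, set())


print(check_request("I", False, "GetS"))  # True
print(check_request("I", False, "PutM"))  # False: nothing to write back
print(check_request("S", True, "GetM"))   # False: a transaction is already open
```

What happens after a failed check (dropping the message, reporting the accelerator, and so on) is outside this sketch.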

# **Crossing Guard Variants**

- Full State Crossing Guard
  - Inclusive directory of accelerator state
  - + Places few restrictions on host protocol
  - + Can hide all errors
  - − Requires tag + metadata storage for all blocks
- Transactional Crossing Guard
  - Stores only data for in-flight transactions
  - + Small storage
  - + Provides most safety properties
  - − Requires some host tolerance
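The storage trade-off between the two variants can be sketched with two toy bookkeeping classes. Both class names, their methods, and the 64-byte block spacing are hypothetical; the only point carried over from the slide is that one variant keeps an entry per held block while the other keeps entries only for open transactions.

```python
class FullStateTracker:
    """Hypothetical Full State variant: one entry per block the accelerator holds."""

    def __init__(self):
        self.block_state = {}  # block address -> stable state ("S", "E", "M")

    def grant(self, addr, state):
        self.block_state[addr] = state

    def release(self, addr):
        self.block_state.pop(addr, None)

    def entries(self):
        return len(self.block_state)


class TransactionalTracker:
    """Hypothetical Transactional variant: entries exist only while a request is open."""

    def __init__(self):
        self.open_requests = {}  # block address -> request type

    def begin(self, addr, request):
        self.open_requests[addr] = request

    def complete(self, addr):
        self.open_requests.pop(addr, None)

    def entries(self):
        return len(self.open_requests)


full, trans = FullStateTracker(), TransactionalTracker()
for addr in range(0, 64 * 1000, 64):  # 1000 blocks; the spacing is illustrative
    trans.begin(addr, "GetS")
    full.grant(addr, "S")
    trans.complete(addr)

print(full.entries(), trans.entries())  # 1000 0
```

Once every request has completed, the full-state tracker still holds an entry for each block while the transactional one is empty, which is why the second needs far less storage but can check less.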



# Single-Level Cache

|        | Accelerator Events |                |                | XG Requests       | XG Responses |       |       |        |
|--------|--------------------|----------------|----------------|-------------------|--------------|-------|-------|--------|
| States | Load               | Store          | Replacement    | Invalidate        | DataM        | DataE | DataS | WB Ack |
| M      | hit                | hit            | issue PutM / B | send Dirty WB / I | -            | -     | -     | -      |
| E      | hit                | hit / M        | issue PutE / B | send Clean WB / I | -            | -     | -     | -      |
| S      | hit                | issue GetM / B | issue PutS / B | send InvAck / I   | -            | -     | -     | -      |
| I      | issue GetS / B     | issue GetM / B | -              | send InvAck       | -            | -     | -     | -      |
| B      | stall              | stall          | stall          | send InvAck       | / M          | / E   | / S   | / I    |
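The table above can be written down directly as a lookup from (state, event) to (action, next state). This is a minimal sketch that reads each cell as "action / next state"; treating the "-" cells as unexpected events that raise an error is an assumption, as are the names `TRANSITIONS` and `step`.

```python
# (state, event) -> (action, next_state); next_state None means "stay put".
TRANSITIONS = {
    ("M", "Load"): ("hit", None),
    ("M", "Store"): ("hit", None),
    ("M", "Replacement"): ("issue PutM", "B"),
    ("M", "Invalidate"): ("send Dirty WB", "I"),
    ("E", "Load"): ("hit", None),
    ("E", "Store"): ("hit", "M"),
    ("E", "Replacement"): ("issue PutE", "B"),
    ("E", "Invalidate"): ("send Clean WB", "I"),
    ("S", "Load"): ("hit", None),
    ("S", "Store"): ("issue GetM", "B"),
    ("S", "Replacement"): ("issue PutS", "B"),
    ("S", "Invalidate"): ("send InvAck", "I"),
    ("I", "Load"): ("issue GetS", "B"),
    ("I", "Store"): ("issue GetM", "B"),
    ("I", "Invalidate"): ("send InvAck", None),
    ("B", "Load"): ("stall", None),
    ("B", "Store"): ("stall", None),
    ("B", "Replacement"): ("stall", None),
    ("B", "Invalidate"): ("send InvAck", None),
    ("B", "DataM"): (None, "M"),
    ("B", "DataE"): (None, "E"),
    ("B", "DataS"): (None, "S"),
    ("B", "WB Ack"): (None, "I"),
}


def step(state, event):
    """Apply one event and return (action, new_state)."""
    if (state, event) not in TRANSITIONS:
        # A "-" cell in the table: assumed here to be an event that should not occur.
        raise ValueError(f"unexpected event {event!r} in state {state!r}")
    action, next_state = TRANSITIONS[(state, event)]
    return action, (state if next_state is None else next_state)


state = "I"
for event in ["Load", "DataE", "Store", "Replacement", "WB Ack"]:
    action, state = step(state, event)
    print(f"{event:12s} -> action={action}, state={state}")
```

The trace walks one block from I through B, E, and M and back to I, passing through the single transient state B on both the fill and the writeback.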



#### **Simulation Parameters**

**CPU**

| Parameter     | Value |
|---------------|-------|
| CPU Cores     | 1     |
| CPU Frequency | 3 GHz |

**Host Caches**

|           | Hammer-like   | MESI Inclusive        |
|-----------|---------------|-----------------------|
| L1I & L1D | 32kB each     | 32kB each             |
| L2        | 128kB private | 512kB shared w/ GPGPU |

Table 4: CPU simulation configuration details.

**GPGPU**

| Parameter     | Value   |
|---------------|---------|
| Cores         | 4       |
| GPU Frequency | 700 MHz |

**GPGPU Caches (Hammer-like)**

|    | Accel-side    | Host-side / 1-level | 2-level      |
|----|---------------|---------------------|--------------|
| L1 | 16kB I and D  | 160kB               | 32kB         |
| L2 | 128kB private | -                   | 512kB shared |

**GPGPU Caches (MESI Inclusive)**

|    | Accel-side   | Host-side / 1-level | 2-level      |
|----|--------------|---------------------|--------------|
| L1 | 32kB I and D | 64kB                | 16kB         |
| L2 | -            | -                   | 192kB shared |

**Cache-to-Cache Latency**

| Path                           | Latency    |
|--------------------------------|------------|
| Accelerator L1 to L2           | 10 cycles  |
| Accelerator L2 to XG           | 200 cycles |
| XG to Directory/Shared L2      | 10 cycles  |
| Accelerator to Host-side Cache | 210 cycles |

Table 5: GPGPU simulation configuration details.



## Time Spent Simulating (Random)

| Configuration                | Time              |
|------------------------------|-------------------|
| XG Full + Hammer + 1 Level   | 5.28 years        |
| XG Full + Hammer + 2 Level   | 2.51 years        |
| XG Full + MESI Inc + 1 Level | 133 days          |
| XG Full + MESI Inc + 2 Level | 223 days          |
| XG Trans. + Hammer + 1 Level | 3.17 years        |
| XG Trans. + Hammer + 2 Level | 1.38 years        |
| XG Trans. + MESI Inc + 1 Level | 90 days         |
| XG Trans. + MESI Inc + 2 Level | 103 days        |
| TOTAL                        | <b>13.9 years</b> |



## Full Coverage %s (Random)

| Full State XG    | Single-level | Two-level |
|------------------|--------------|-----------|
| Hammer-like      | 99           | 99.8      |
| MESI Inclusive   | 100          | 99.4      |
| Transactional XG | Single-level | Two-level |
| Hammer-like      | 99.3         | 99.5      |
| MESI Inclusive   | 100          | 99.7      |



# Time Spent Simulating (Fuzz)

| Configuration                     | Time       |
|-----------------------------------|------------|
| XG Full + Hammer-like             | 1.62 years |
| XG Full + MESI Inclusive          | 287 days   |
| XG Transactional + Hammer-like    | 5.3 years  |
| XG Transactional + MESI Inclusive | 41 days    |
| Total                             | 7.82 years |
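The total in the fuzz table can be reproduced from its rows. The sketch below assumes a 365-day year for converting the entries given in days, which the slides do not state.

```python
DAYS_PER_YEAR = 365  # assumption; the slides do not give the conversion

# Per-configuration simulation time, copied from the fuzz table above.
fuzz_times = {
    "XG Full + Hammer-like": (1.62, "years"),
    "XG Full + MESI Inclusive": (287, "days"),
    "XG Transactional + Hammer-like": (5.3, "years"),
    "XG Transactional + MESI Inclusive": (41, "days"),
}


def to_years(value, unit):
    """Convert a table entry to years."""
    return value if unit == "years" else value / DAYS_PER_YEAR


total_years = sum(to_years(value, unit) for value, unit in fuzz_times.values())
print(f"{total_years:.2f} years")  # 7.82 years
```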



### Full Coverage %s (Fuzz)

| Full State Crossing Guard           | Fuzz Tester |
|-------------------------------------|-------------|
| Hammer-like                         | 99.3        |
| MESI Inclusive                      | 99.7        |
| <b>Transactional Crossing Guard</b> | Fuzz Tester |
| Hammer-like                         | 99.7        |

# **PutS Accelerator Messages**

- Why?
  - Some host protocols use them
  - Simplify management of Full State Crossing Guard
  - A Transactional Crossing Guard cannot be implemented on a host protocol that uses PutS without them
- Bandwidth Impact
  - Carry no data
  - Only between accelerator cache → Crossing Guard, not host system
  - ~1-4% of that bandwidth in experiments.
  - Could be reduced by setting a flag at Crossing Guard.



# Why not Model Checking?

- Model checking is useful! An industrial implementation of Crossing Guard would use it.
- Academic tools have limitations
  - Benefit from symmetry, but a Crossing Guard system is asymmetric
  - May only work with one block in system
  - Substantial implementation overhead
- This work was a proof of concept
  - Random / Fuzz testing not perfect, but results suggestive.
  - Even models can have mistakes!



#### Performance: Hammer-like





#### **Performance: MESI Inclusive**





### Performance (Hammer-like)















