Cache performance of SPEC CPU2000

Cache Performance for SPEC CPU2000 Benchmarks

January 2002

Jason F. Cantin
Department of Electrical and Computer Engineering
1415 Engineering Drive
University of Wisconsin-Madison
Madison, WI 53706-1691
jcantin@ece.wisc.edu
http://www.jfred.org

Mark D. Hill
Department of Computer Science
1210 West Dayton Street
University of Wisconsin-Madison
Madison, WI 53706-1685
markhill@cs.wisc.edu
http://www.cs.wisc.edu/~markhill

http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data

Abstract

The SPEC CPU2000 benchmark suite (http://www.spec.org/osg/cpu2000) is a collection of 26 compute-intensive, non-trivial programs used to evaluate the performance of a computer's CPU, memory system, and compilers. The benchmarks in this suite were chosen to represent real-world applications, and thus exhibit a wide range of runtime behaviors. On this webpage, we present functional cache miss ratios and related statistics for the SPEC CPU2000 suite. In particular, we report miss ratios for split L1 caches with sizes ranging from 1KB to 1MB, 64B blocks, and associativities of 1, 2, 4, 8, and full. Most of this data was collected at the University of Wisconsin-Madison with the aid of the Simplescalar toolset (http://www.simplescalar.org).

Methodology

All functional data was collected with simulators from the Alpha version of the Simplescalar toolset, version 3.0. These include sim-cache, sim-cheetah, and sim-outorder. Some of these simulators were modified for the task (for example, sim-cheetah was modified to handle programs longer than 2 billion instructions). For interval cache data, the simulators were modified to print statistics every 100 million executed instructions. A combination of Perl and Tcsh scripts was used to launch, manage, and process the results of these simulations.
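The interval reporting described above can be illustrated with a minimal Python sketch. This is not the modification made to the Simplescalar simulators; the function name `interval_miss_ratios` and the per-instruction `(refs, misses)` input format are assumptions chosen for illustration only.

```python
def interval_miss_ratios(per_inst, interval):
    """Yield (instructions_so_far, miss_ratio) once per `interval` instructions.

    `per_inst` is an iterable of (refs, misses) pairs, one pair per executed
    instruction. This input format is hypothetical, chosen for this sketch.
    """
    refs = misses = 0
    count = 0
    for r, m in per_inst:
        refs += r
        misses += m
        count += 1
        if count % interval == 0:
            yield count, (misses / refs if refs else 0.0)
            refs = misses = 0  # restart the counters for the next interval
    if count % interval:
        # Report the final, partial interval as well.
        yield count, (misses / refs if refs else 0.0)


if __name__ == "__main__":
    # Tiny made-up stream; the study used intervals of 100 million instructions.
    stream = [(1, 1), (1, 0), (0, 0), (2, 1), (1, 0)]
    for n, ratio in interval_miss_ratios(stream, 2):
        print(n, ratio)
```

Resetting the counters at each boundary gives a per-interval ratio rather than a running ratio, which is what makes phase behavior visible over a long run.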

All benchmarks were compiled statically with heavy optimization for the Alpha AXP instruction set. Optimizations were targeted at the Alpha 21264 processor implementation, and include prefetches, square-root instructions, byte/word memory operations, and no-ops for alignment. All benchmarks were run to completion with all reference inputs, with three exceptions. Two of the data sets for Perl (253.perlbmk) required new processes to be spawned, which is not supported by the Simplescalar tools at this time. One of the data sets for 175.vpr does not produce correct results due to an undocumented bug in the Simplescalar emulation of the Alpha ISA. The benchmarks simulated execute over 7 trillion instructions for the reference input sets. Generating the functional L1 miss-ratio tables resulted in over 400 trillion simulated instructions. The total load for all functional and timing-based simulations to be reported here is 30 CPU-years.

All simulations were carried out on a combination of x86/Linux machines and Alpha/Tru64 servers in the University of Wisconsin-Madison's Computer Science Department and ECE Department. The majority of this load was managed by the Condor system, which distributed these jobs to vacant machines throughout the CS building.

All cache configurations simulated had 64B blocks for both L1 and L2 caches. All data reported here is for the LRU replacement policy, though data for other replacement policies was collected and may be placed on this site soon. This data does not include operating system effects, and caches were flushed neither periodically nor on system calls.
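As a rough illustration of what a functional cache simulation measures, the following Python sketch models a set-associative cache with 64B blocks and LRU replacement. It is not the Simplescalar code; the class name `LRUCache`, the helper `demo`, and the five-address trace are assumptions made for this example.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes, as in all configurations reported here


class LRUCache:
    """Set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, assoc=None):
        num_blocks = size_bytes // BLOCK_SIZE
        # assoc=None models a fully associative cache: one set holds every block.
        self.assoc = num_blocks if assoc is None else assoc
        self.num_sets = num_blocks // self.assoc
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.refs = 0
        self.misses = 0

    def access(self, addr):
        """Record one reference; return True on a hit."""
        block = addr // BLOCK_SIZE
        lru = self.sets[block % self.num_sets]
        self.refs += 1
        if block in lru:
            lru.move_to_end(block)  # mark as most recently used
            return True
        self.misses += 1
        if len(lru) >= self.assoc:
            lru.popitem(last=False)  # evict the least recently used block
        lru[block] = True
        return False

    def miss_ratio(self):
        return self.misses / self.refs if self.refs else 0.0


def demo():
    # Hypothetical trace of byte addresses. Addresses 0 and 1024 map to the
    # same set in a 1KB direct-mapped cache (16 sets of one block each).
    trace = [0, 1024, 0, 1024, 8]
    direct_mapped = LRUCache(1024, assoc=1)
    fully_assoc = LRUCache(1024)
    for addr in trace:
        direct_mapped.access(addr)
        fully_assoc.access(addr)
    return direct_mapped.miss_ratio(), fully_assoc.miss_ratio()


if __name__ == "__main__":
    print(demo())  # (1.0, 0.4)
```

In this trace every reference misses in the direct-mapped cache because the two blocks keep evicting each other, while the fully associative cache misses only on the first touch of each block.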

Benchmarks Simulated

We have collected data for all 26 benchmarks with reference inputs: 12 integer, 14 floating-point.

Benchmark	Language	Type	Category	In SPEC95?
164.gzip	C	Integer	Compression	No
175.vpr	C	Integer	FPGA Circuit Placement and Routing	No
176.gcc	C	Integer	C Compiler	Yes
181.mcf	C	Integer	Combinatorial Optimization	No
186.crafty	C	Integer	Game Playing: Chess	No
197.parser	C	Integer	Word processing	No
252.eon	C++	Integer	Computer Visualization	No
253.perlbmk	C	Integer	PERL Programming Language	Yes
254.gap	C	Integer	Group Theory, Interpreter	No
255.vortex	C	Integer	Object-oriented Database	Yes
256.bzip2	C	Integer	Compression	No
300.twolf	C	Integer	Place and Route Simulator (CAE)	No
168.wupwise	Fortran77	Floating-Point	Physics, Quantum Chromodynamics	No
171.swim	Fortran77	Floating-Point	Shallow Water Modeling	Yes
172.mgrid	Fortran77	Floating-Point	Multi-grid Solver: 3D Potential Field	Yes
173.applu	Fortran77	Floating-Point	Parabolic/Elliptic Partial Diff. Eqns	Yes
177.mesa	C	Floating-Point	3-D Graphics Library	No
178.galgel	Fortran90	Floating-Point	Computational Fluid Dynamics	No
179.art	C	Floating-Point	Image Recognition / Neural Nets	No
183.equake	C	Floating-Point	Seismic Wave Propagation	No
187.facerec	Fortran90	Floating-Point	Image Processing: Face Recognition	No
188.ammp	C	Floating-Point	Computational Chemistry	No
189.lucas	Fortran90	Floating-Point	Number Theory / Primality Testing	No
191.fma3d	Fortran90	Floating-Point	Finite-Element Crash Simulation	No
200.sixtrack	Fortran77	Floating-Point	High Energy Nuclear Physics Accelerator Design	No
301.apsi	Fortran77	Floating-Point	Meteorology: Pollutant Distribution	Yes

Summary Data

The following summary information is independent of the cache size and associativity simulated. The third and fifth columns give the fraction of instruction fetches and data references, respectively, that access a different 64B block than the immediately preceding access (i.e., the data was not obtained in the last cache access). For example, one instruction cache access returns a block of 16 instructions, many of which may be executed before a different block must be accessed (typically 10 for these benchmarks). This data was obtained by simulating caches with a single 64B block.
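The single-block measurement can be sketched in Python as follows. The function name `accesses_per_reference` and the straight-line address sequence are assumptions for illustration, not part of the original tooling.

```python
BLOCK_SIZE = 64  # bytes


def accesses_per_reference(addresses):
    """Fraction of references that touch a different 64B block than the
    immediately preceding reference (a cache holding a single block)."""
    accesses = 0
    refs = 0
    last_block = None
    for addr in addresses:
        refs += 1
        block = addr // BLOCK_SIZE
        if block != last_block:
            accesses += 1  # the single cached block must be replaced
            last_block = block
    return accesses / refs if refs else 0.0


if __name__ == "__main__":
    # 32 sequential 4-byte Alpha instructions span exactly two 64B blocks.
    straight_line = range(0, 128, 4)
    print(accesses_per_reference(straight_line))  # 0.0625
```

Straight-line code gives the minimum of 1/16 accesses per instruction; branches push the ratio up toward the roughly 0.1 seen in the I-Access/Inst column.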

Benchmark	Instructions	I-Access/Inst	Data Refs	D-Access/Ref	Refs/Inst	% User Inst
164.gzip	478,636,174,329	0.1138	142,700,878,428	0.6451	0.2981	99.9
175.vpr	84,125,622,844	0.1164	37,067,564,576	0.7811	0.4406	99.8
176.gcc	243,597,914,726	0.1295	116,093,336,744	0.6229	0.4766	98.9
181.mcf	61,870,158,860	0.1579	23,056,352,854	0.6465	0.3727	99.9
186.crafty	191,882,992,412	0.0747	70,222,383,696	0.1291	0.3660	99.9
197.parser	546,769,649,600	0.1243	190,517,359,797	0.6119	0.3484	99.9
252.eon	239,768,148,508	0.1070	118,246,210,844	0.5680	0.4932	99.9
253.perlbmk	143,122,956,639	0.1163	61,829,661,201	0.5805	0.4320	99.9
254.gap	213,813,801,949	0.1198	80,924,423,445	0.6642	0.3785	99.9
255.vortex	390,700,613,872	0.1310	161,133,019,186	0.6672	0.4124	99.9
256.bzip2	377,370,326,800	0.1179	145,002,261,443	0.6787	0.3842	99.9
300.twolf	346,489,363,383	0.1025	111,857,479,345	0.7525	0.3228	99.9
168.wupwise	349,623,875,977	0.0938	107,613,170,820	0.5629	0.3078	99.9
171.swim	225,830,970,951	0.0685	74,341,437,755	0.9314	0.3292	99.9
172.mgrid	419,156,008,460	0.0638	153,909,315,484	0.9151	0.3672	99.9
173.applu	223,883,653,813	0.0641	85,459,068,028	0.7993	0.3817	99.9
177.mesa	281,775,068,600	0.1086	108,712,910,562	0.5886	0.3858	99.8
178.galgel	409,366,700,368	0.1008	178,742,879,478	0.8742	0.4366	99.9
179.art	86,834,976,688	0.1514	30,279,186,530	0.9089	0.3487	99.9
183.equake	131,518,705,120	0.0886	58,248,603,550	0.7544	0.4429	99.9
187.facerec	211,027,395,856	0.0857	66,872,909,521	0.6323	0.3169	99.9
188.ammp	326,549,217,724	0.0833	125,189,421,217	0.6624	0.3834	99.8
189.lucas	142,398,814,292	0.0707	31,507,111,538	0.7807	0.2213	99.9
191.fma3d	268,361,331,300	0.0797	118,043,791,674	0.7015	0.4399	99.9
200.sixtrack	470,950,788,817	0.0683	116,965,122,302	0.7646	0.2484	98.2
301.apsi	347,923,962,507	0.0798	129,508,475,036	0.7299	0.3722	99.8
Int Total	3,318,147,723,922	0.1189	1,258,650,931,559	0.6376	0.3793
Int Mean	276,512,310,327	0.1176	104,887,577,630	0.6123	0.3938
FP Total	3,895,201,470,473	0.0828	1,385,393,403,495	0.7560	0.3557
FP Mean	278,228,676,462	0.0862	98,956,671,679	0.7576	0.3559
Ovrl Total	7,213,349,194,415	0.1001	2,644,044,335,048	0.6968	0.3665
Ovrl Mean	277,436,507,478	0.1007	101,694,012,887	0.6905	0.3734

Note: For columns that already contain ratios, the "Total" represents the sum of all the numerators divided by the sum of all the denominators, and the "Mean" represents the arithmetic mean of the computed ratios.
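The difference between the two summary rows can be seen in a short Python sketch using the Refs/Inst column for just two benchmarks, 164.gzip and 181.mcf, with instruction and data reference counts copied from the table above. The function names are assumptions for illustration.

```python
def total_ratio(pairs):
    """Sum of numerators divided by sum of denominators ("Total" row)."""
    return sum(n for n, _ in pairs) / sum(d for _, d in pairs)


def mean_ratio(pairs):
    """Arithmetic mean of the per-benchmark ratios ("Mean" row)."""
    return sum(n / d for n, d in pairs) / len(pairs)


# (data references, instructions) for 164.gzip and 181.mcf
PAIRS = [
    (142_700_878_428, 478_636_174_329),  # 164.gzip, Refs/Inst = 0.2981
    (23_056_352_854, 61_870_158_860),    # 181.mcf,  Refs/Inst = 0.3727
]

if __name__ == "__main__":
    print(round(total_ratio(PAIRS), 4))
    print(round(mean_ratio(PAIRS), 4))
```

The "Total" weights each benchmark by its instruction count, so the much longer gzip run dominates it, whereas the "Mean" weights both benchmarks equally.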

Table Format

All miss-ratio tables are in ASCII text format, generated with Perl scripts. They include the name of the file, the name of the benchmark, the command line for the benchmark, the number of instructions, the number of data references, miss ratios (misses/reference) for a set of cache sizes and associativities, and compulsory miss ratios. For each benchmark and data set, miss ratios are rounded to 8 decimal places. The computed arithmetic means for each benchmark are rounded to 7 digits, and the overall means are rounded to 6 digits. Miss ratios are reported for sizes of 1KB to 1MB, with associativities of 1-way, 2-way, 4-way, 8-way, and full. In all cases the block size was 64B and the replacement policy was LRU. Compulsory miss ratios were measured as the miss ratio of a fully-associative 256MB cache with no flushing, and rounded to 12 places. Note that there is sufficient data to calculate the 3C's for the various configurations. See the example below (overall arithmetic mean for selected benchmarks).

--------------------------------------------------------------------------
|                    Block size: 64 bytes, Repl: LRU                     |
|------------------------------------------------------------------------|
|               Arithmetic Mean for Instruction References               |
|------------------------------------------------------------------------|
|       |                          Associativity                         |
| Size  |----------------------------------------------------------------|
|       |      1     |      2     |      4     |      8     |    full    |
|-------+------------+------------+------------+------------+------------|
|    1K | 0.040115-- | 0.038059-- | 0.038609-- | 0.038631-- | 0.038770-- |
|    2K | 0.028248-- | 0.026708-- | 0.026033-- | 0.026023-- | 0.026006-- |
|    4K | 0.019655-- | 0.017775-- | 0.017586-- | 0.017514-- | 0.017421-- |
|    8K | 0.013024-- | 0.011229-- | 0.010171-- | 0.010013-- | 0.009931-- |
|   16K | 0.007394-- | 0.004766-- | 0.003666-- | 0.003405-- | 0.004296-- |
|   32K | 0.003237-- | 0.001233-- | 0.000651-- | 0.000388-- | 0.000239-- |
|   64K | 0.001060-- | 0.000360-- | 0.000127-- | 0.000049-- | 0.000016-- |
|  128K | 0.000454-- | 0.000148-- | 0.000014-- | 0.000004-- | 0.000002-- |
|  256K | 0.000090-- | 0.000011-- | 0.000002-- | 0.000001-- | 0.000001-- |
|  512K | 0.000009-- | 0.000003-- | 0.000001-- | 0.000000-- | 0.000001-- |
| 1024K | 0.000000-- | 0.000000-- | 0.000000-- | 0.000000-- | 0.000000-- |
--------------------------------------------------------------------------
Compulsory: 0.0000000416--


--------------------------------------------------------------------------
|                    Block size: 64 bytes, Repl: LRU                     |
|------------------------------------------------------------------------|
|                  Arithmetic Mean for Data References                   |
|------------------------------------------------------------------------|
|       |                          Associativity                         |
| Size  |----------------------------------------------------------------|
|       |      1     |      2     |      4     |      8     |    full    |
|-------+------------+------------+------------+------------+------------|
|    1K | 0.275311-- | 0.232072-- | 0.207868-- | 0.191097-- | 0.185660-- |
|    2K | 0.191787-- | 0.155995-- | 0.137516-- | 0.123602-- | 0.115772-- |
|    4K | 0.145548-- | 0.114026-- | 0.105337-- | 0.094777-- | 0.089413-- |
|    8K | 0.106719-- | 0.085133-- | 0.078486-- | 0.074013-- | 0.069963-- |
|   16K | 0.082798-- | 0.067679-- | 0.064007-- | 0.061553-- | 0.059314-- |
|   32K | 0.069504-- | 0.056942-- | 0.055286-- | 0.053659-- | 0.052217-- |
|   64K | 0.060102-- | 0.052060-- | 0.050989-- | 0.049836-- | 0.048541-- |
|  128K | 0.051134-- | 0.048766-- | 0.048341-- | 0.046895-- | 0.045834-- |
|  256K | 0.046695-- | 0.044774-- | 0.044566-- | 0.044497-- | 0.043546-- |
|  512K | 0.041238-- | 0.040808-- | 0.041690-- | 0.041878-- | 0.040885-- |
| 1024K | 0.033697-- | 0.032618-- | 0.033644-- | 0.034391-- | 0.034436-- |
--------------------------------------------------------------------------
Compulsory: 0.0000293378--

For example, for a 4KB direct-mapped L1 data cache with 64-Byte blocks, approximately 146 out of every 1,000 data references miss. When neglecting the operating system, 29 out of every 1,000,000 data references cause a compulsory miss.
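One conventional way to derive the 3C's (compulsory, capacity, and conflict misses) from these tables is sketched below in Python. The decomposition shown is the commonly used one and is an assumption here, since this page does not spell out the formulas; the function name `three_cs` is likewise illustrative. The input values are taken from the 4K row of the data-reference table above.

```python
def three_cs(miss_ratio, full_assoc_ratio, compulsory_ratio):
    """Split a miss ratio into (compulsory, capacity, conflict) components.

    compulsory: misses that occur even in an effectively infinite cache
    capacity:   extra misses of a fully associative cache of the same size
    conflict:   extra misses caused by limited associativity
    """
    compulsory = compulsory_ratio
    capacity = full_assoc_ratio - compulsory_ratio
    conflict = miss_ratio - full_assoc_ratio
    return compulsory, capacity, conflict


# Values copied from the example data table (4KB, 64B blocks, LRU).
DIRECT_MAPPED_4K = 0.145548
FULLY_ASSOC_4K = 0.089413
COMPULSORY = 0.0000293378

if __name__ == "__main__":
    comp, cap, conf = three_cs(DIRECT_MAPPED_4K, FULLY_ASSOC_4K, COMPULSORY)
    print(f"compulsory={comp:.10f} capacity={cap:.10f} conflict={conf:.6f}")
```

Under this decomposition the conflict component can come out slightly negative where a set-associative LRU cache happens to outperform the fully associative one of the same size, as in a few cells of the tables above.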

Miss Ratio Tables

First-level Cache Miss Ratio Tables:
- Complete archive (http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/tables/miss-tables.tar.gz)
- Arithmetic mean of all selected benchmarks
- Arithmetic mean of integer benchmarks (selected SPECint)
- Arithmetic mean of floating-point benchmarks (selected SPECfp)
- 164.gzip
  - 164.gzip1.tab (Source code tar-file)
  - 164.gzip2.tab (Webserver log)
  - 164.gzip3.tab (Large TIFF image)
  - 164.gzip4.tab (Random data)
  - 164.gzip5.tab (Program binary)
  - 164.gzip-ave.tab
- 175.vpr
  - 175.vpr1.tab (placement of "clma" from MCNC benchmark suite --not reported, error in Simplescalar)
  - 175.vpr2.tab (routing "clma" from MCNC benchmark suite)
- 176.gcc
  - 176.gcc1.tab (Preprocessed source from a SPECint2000 candidate)
  - 176.gcc2.tab (Preprocessed source from SPECfp2000 200.sixtrack)
  - 176.gcc3.tab (Preprocessed expr.i from gcc source)
  - 176.gcc4.tab (Preprocessed integrate.i from gcc source)
  - 176.gcc5.tab (Preprocessed version of Scilab program)
  - 176.gcc-ave.tab
- 181.mcf
  - 181.mcf1.tab (Single-depot vehicle scheduling in public mass transportation)
- 186.crafty
  - 186.crafty1.tab (5 different chess board layouts, varying search depth)
- 197.parser
  - 197.parser1.tab (Syntactic analyses of a series of English sentences)
- 252.eon
  - 252.eon1.tab (150x150 pixel image of a chair in the corner of a room)
  - 252.eon2.tab (150x150 pixel image of a chair --different algorithm)
  - 252.eon3.tab (150x150 pixel image of a chair --another different algorithm)
  - 252.eon-ave.tab
- 253.perlbmk
  - 253.perlbmk1.tab (Specdiff applied to email)
  - 253.perlbmk2.tab (Finding perfect numbers --not reported, crashes Simplescalar)
  - 253.perlbmk3.tab (Testing pseudo random numbers --not reported, crashes Simplescalar)
  - 253.perlbmk4.tab (Converting Email to HTML)
  - 253.perlbmk5.tab (Converting Email to HTML)
  - 253.perlbmk6.tab (Converting Email to HTML)
  - 253.perlbmk7.tab (Converting Email to HTML)
  - 253.perlbmk-ave.tab
- 254.gap
  - 254.gap1.tab (Comb. funcs, big #'s, finite fields, lattice computations, normalizers, ag-groups)
- 255.vortex
  - 255.vortex1.tab (Building and manipulating 3 different databases)
  - 255.vortex2.tab (Building and manipulating 3 different databases)
  - 255.vortex3.tab (Building and manipulating 3 different databases)
  - 255.vortex-ave.tab
- 256.bzip2
  - 256.bzip21.tab (Source tar file)
  - 256.bzip22.tab (A large TIFF image)
  - 256.bzip23.tab (A program binary)
  - 256.bzip2-ave.tab
- 300.twolf
  - 300.twolf1.tab (Structured circuit from MCNC benchmark suite)
- 168.wupwise
  - 168.wupwise1.tab (a problem in lattice gauge theory)
- 171.swim
  - 171.swim1.tab (Large 1335x1335 array over 512 timesteps)
- 172.mgrid
  - 172.mgrid1.tab (Single, constant coefficient equation on uniform cubical grid)
- 173.applu
  - 173.applu1.tab (Large mesh over many timesteps)
- 177.mesa
  - 177.mesa1.tab (Creating a 3D object from a 2D scalar field)
- 178.galgel
  - 178.galgel1.tab (Convective flow in a rectangular box filled with liquid)
- 179.art
  - 179.art1.tab (Finding helicopter and airplane in a thermal image)
  - 179.art2.tab (Finding helicopter and airplane in a thermal image)
  - 179.art-ave.tab
- 183.equake
  - 183.equake1.tab (1994 Northridge Earthquake aftershock in California)
- 187.facerec
  - 187.facerec1.tab (Album of 42 faces, 84 images in probe gallery)
- 188.ammp
  - 188.ammp1.tab (Tracking movement of atoms)
- 189.lucas
  - 189.lucas1.tab (Lucas-Lehmer test for primality of Mersenne numbers 2^p-1)
- 191.fma3d
  - 191.fma3d1.tab (Impulsive load applied to cylindrical panel)
- 200.sixtrack
  - 200.sixtrack1.tab (60 particles in a Large Hadron Collider)
- 301.apsi
  - 301.apsi1.tab (112x112x112 area array of data over 70 timesteps)

Second-level Cache Miss Ratio Tables: Coming soon.

Miss ratios for intervals of 100 million instructions (under construction):
- Tabular data for split 64K L1 caches (http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/tables/splitl1cache64K_tables.tar.gz)
- Tabular data for a unified 1MB L2 cache (http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/tables/ul2cache1MB_tables.tar.gz)
- Graphs for 64K L1 data cache (http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/tables/dcache64K_graphs.tar.gz)
- Graphs for 64K L1 instruction cache (http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/tables/icache64K_graphs.tar.gz)
- Graphs for 1MB L2 unified cache (http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/tables/ul2cache1MB_graphs.tar.gz)

Experimental Error

The miss ratios were calculated from data collected by functional, user-mode simulations of optimized benchmarks. As a result, the cache miss ratios reported above may not be representative of a real platform. A few sources of error are discussed below.

First, only primary misses were counted by the simulator. Once a reference missed in the cache, the data was loaded and all subsequent accesses to the line hit. A modern processor may also experience secondary misses, or references to data that has yet to be loaded from a prior cache miss. There is a nonzero miss latency, and a real processor may execute other instructions while waiting for the data. The sequential model used in functional simulations is optimistic in this respect.

Second, a modern processor will have optimizations that affect cache performance. Hardware prefetching of instructions and data can have the positive effect of reducing the number of cache misses. However, prefetching can also cause cache pollution. Further, speculative execution can result in increased memory traffic for speculatively issued loads, and I-cache pollution from incorrect branch predictions. This also makes the results optimistic.

Third, the operating system was ignored. System calls cause additional cache misses to bring in OS code and data, and in doing so they replace cache lines from the user program. This increases the number of conflict and capacity misses for the user program in a real system. Since the additional misses from OS intervention were not modeled, our results are optimistic. One possibility is to flush the caches on system calls. However, this is the other extreme, and would have made it impossible to measure the compulsory miss rates.

Fourth, all prefetch instructions (loads to R31) were treated as normal references. All were executed, and references from prefetch instructions were included in the overall statistics. Although prefetch instructions may prevent (or reduce the impact of) cache misses from instructions in the original code, the misses still occur (just sooner). However, prefetch instructions increase the overall hit ratio because the subsequent loads and stores that hit in the cache add to the overall hit count. One possibility is to ignore prefetch instructions altogether (the Alpha ISA allows this). Another possibility is to count the misses from the prefetches, but not count them as instructions.

Fifth, the benchmarks were optimized for an Alpha 21264 processor. The binaries may have been tuned to perform well with the 21264 cache hierarchy (64K 2-way L1 caches). Ideally, the binary should not favor a particular cache configuration. Further, the binary contains no-ops for alignment and steering of dependent operations in the clustered microarchitecture of the 21264. These no-ops increase the overall instruction count for the functional simulation.

Future Work

- Comparison to 21164 DCPI data
- Unified I & D Caches
- Address traces
- Impact of L2 cache sizes on IPC

Related Work

SPEC CPU2000: Measuring CPU Performance in the New Millennium, John L. Henning. http://www.spec.org/osg/cpu2000/papers/COMPUTER_200007-abstract.JLH.html
Simulating SPEC CPU2000: Reduced input sets for SPEC CPU2000, research by the University of Minnesota. http://www.spec.org/osg/cpu2000/research/umn/
Measuring Cache and TLB Performance and Their Effect on Benchmark Run Times, R.H. Saavedra and A.J. Smith, IEEE Trans. on Computers, Vol. 44, No. 10, October 1995, pp. 1223-1235 (gzipped postscript)
Prefetching and memory system behavior of the SPEC95 benchmark suite, M.J. Charney and T.R. Puzak. http://www.research.ibm.com/journal/rd/413/charney.html
Cache Performance of the SPEC92 Benchmark Suite, Jeffrey D. Gee, Mark D. Hill, Dionisios N. Pnevmatikatos, and Alan Jay Smith. http://www.cs.wisc.edu/~markhill/spec92miss.html

Acknowledgements

Cache numbers were generated with computing resources provided by Wisconsin Condor, Midship (NSF 144-GB67), and Multifacet (NSF EIA-9971256) projects.
Special thanks to David A. Patterson for advice in generating this data.
Jason F. Cantin is supported by a Peter Schneider Wisconsin Distinguished Graduate Fellowship.

Publications

Version 1.0 of this data appears in "Computer Architecture News", Vol. 29, No. 4, September 2001 (ACM SIGARCH).
A subset of this data will appear in the third edition of John Hennessy and David Patterson's "Computer Architecture, A Quantitative Approach".

Disclaimer

Data in this directory is correct to the best of our knowledge. However, we provide it *AS IS*, without an expressed or implied warranty, and we accept no responsibility for the consequences of the use or misuse of this data.

Revision History

July 2001: Version 1.0 --First release
August 19, 2001: Version 1.1
- 4 benchmarks added (mesa, ammp, vortex, parser)
- Brief discussion of experimental error added
August 26, 2001: Version 1.2
- 3 benchmarks added (crafty, eon, bzip2)
September 3, 2001: Version 1.3
- galgel added
October 21, 2001: Version 1.4
- wupwise and apsi added
November 3, 2001: Version 1.5
- vpr added
- data for 1K and 2K L1 caches added
- 2 more digits of precision added
December 3, 2001: Version 1.6
- mgrid and sixtrack added
January 20, 2002: Version 2.0
- Last benchmarks added
- I & D references per access added

Last updated January 2002, jfc. Report any dead links or errors to jfc.