Cache performance of SPEC CPU2000

Cache Performance for SPEC CPU2000 Benchmarks

May 2003

Jason F. Cantin
Department of Electrical and Computer Engineering
1415 Engineering Drive
University of Wisconsin-Madison
Madison, WI 53706-1691
jcantin@ece.wisc.edu
http://www.jfred.org

Mark D. Hill
Department of Computer Science
1210 West Dayton Street
University of Wisconsin-Madison
Madison, WI 53706-1685
markhill@cs.wisc.edu
http://www.cs.wisc.edu/~markhill

http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data

Abstract

The SPEC CPU2000 benchmark suite (http://www.spec.org/osg/cpu2000) is a collection of 26 compute-intensive, non-trivial programs used to evaluate the performance of a computer's CPU, memory system, and compilers. The benchmarks in this suite were chosen to represent real-world applications, and thus exhibit a wide range of runtime behaviors. On this webpage, we present functional cache miss ratios and related statistics for the SPEC CPU2000 suite. In particular, we report results for L1 instruction, L1 data, and L1 unified caches ranging from 1KB to 1MB, with 64B blocks and associativities of 1, 2, 4, 8, and full. Prefetch operations were always executed, but results are posted both with and without them counted in the miss ratios. Most of this data was collected at the University of Wisconsin-Madison with the aid of the Simplescalar toolset (http://www.simplescalar.org).

Methodology

All functional data was collected with modified simulators from the Alpha version of the Simplescalar toolset, version 3.0. The simulators were modified to simulate multiple caches at once and to report statistics every 1 billion instructions. Further, we modified the Simplescalar code to correctly distinguish between binding loads, prefetch instructions (implemented as loads to R31), and the universal NOP (encoded as an unaligned quadword load to R31). A combination of Perl and Tcsh scripts was used to launch, manage, and process the results of these simulations.

All benchmarks were compiled statically with heavy optimization for the Alpha AXP instruction set. Optimizations were targeted at the Alpha 21264 implementation, and include prefetches, square-root instructions, byte/word memory operations, and no-ops for alignment. The binaries do not contain profile-directed optimizations. All benchmarks were run to completion with all reference inputs, with three exceptions. Two of the data sets for Perl (253.perlbmk) required new processes to be spawned, which is not supported by the Simplescalar tools. One of the data sets for 175.vpr does not produce correct results due to an undocumented bug in the Simplescalar emulation of the Alpha ISA. The benchmarks simulated comprise over 7 trillion dynamic instructions for the reference input sets. Generating the functional L1 miss-ratio tables required over 400 trillion simulated instructions. In total, the functional simulations reported here consumed 7.6 CPU-years.

All caches simulated here used the LRU replacement policy. This data does not include operating system effects, and caches were not flushed on system calls. Thus, actual miss rates will be higher than those reported here. See the Experimental Error section below.
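The kind of functional simulation described above can be sketched in a few lines of Python. This is only an illustration, not the modified Simplescalar code: the class name `LRUCache`, the function `simulate`, and the `(size_bytes, assoc)` configuration tuples are all hypothetical. It models a set-associative cache with 64-byte blocks and LRU replacement, and feeds one address stream to several cache configurations at once.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes; the block size used for every cache on this page


class LRUCache:
    """Functional model of a set-associative cache with LRU replacement."""

    def __init__(self, size_bytes, assoc):
        # assoc=None means fully associative (one set holding every block).
        num_blocks = size_bytes // BLOCK_SIZE
        self.assoc = num_blocks if assoc is None else assoc
        self.num_sets = num_blocks // self.assoc
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, address):
        """Reference one byte address; return True on a hit."""
        block = address // BLOCK_SIZE
        lru_set = self.sets[block % self.num_sets]
        self.accesses += 1
        if block in lru_set:
            lru_set.move_to_end(block)  # mark as most recently used
            return True
        self.misses += 1
        if len(lru_set) >= self.assoc:
            lru_set.popitem(last=False)  # evict the least recently used block
        lru_set[block] = True
        return False


def simulate(addresses, configs):
    """Feed one address stream to several caches at once; return miss counts."""
    caches = {cfg: LRUCache(*cfg) for cfg in configs}
    for addr in addresses:
        for cache in caches.values():
            cache.access(addr)
    return {cfg: cache.misses for cfg, cache in caches.items()}
```

For example, `simulate([0, 1024, 0, 1024], [(1024, 1), (1024, 2)])` shows two blocks that collide in a 1KB direct-mapped cache (four misses) but coexist in a 1KB 2-way cache (two misses). Dividing a miss count by the number of instructions executed gives a misses/instruction ratio of the kind tabulated on this page.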

Benchmarks Simulated

We have collected data for all 26 benchmarks with reference inputs: 12 integer, 14 floating-point. See the table below (Table 1).

Table 1

Benchmark	Language	Type	Category	Inputs	% User Time
164.gzip	C	Integer	Compression	5	99.9
175.vpr	C	Integer	FPGA Circuit Placement and Routing	2	99.8
176.gcc	C	Integer	C Compiler	5	98.9
181.mcf	C	Integer	Combinatorial Optimization	1	99.9
186.crafty	C	Integer	Game Playing: Chess	1	99.9
197.parser	C	Integer	Word processing	1	99.9
252.eon	C++	Integer	Computer Visualization	3	99.9
253.perlbmk	C	Integer	PERL Programming Language	7	99.9
254.gap	C	Integer	Group Theory, Interpreter	1	99.9
255.vortex	C	Integer	Object-oriented Database	3	99.9
256.bzip2	C	Integer	Compression	3	99.9
300.twolf	C	Integer	Place and Route Simulator (CAE)	1	99.9
168.wupwise	Fortran77	Float	Physics, Quantum Chromodynamics	1	99.9
171.swim	Fortran77	Float	Shallow Water Modeling	1	99.9
172.mgrid	Fortran77	Float	Multi-grid Solver: 3D Potential Field	1	99.9
173.applu	Fortran77	Float	Parabolic/Elliptic Partial Diff. Eqns	1	99.9
177.mesa	C	Float	3-D Graphics Library	1	99.8
178.galgel	Fortran90	Float	Computational Fluid Dynamics	1	99.9
179.art	C	Float	Image Recognition / Neural Nets	2	99.9
183.equake	C	Float	Seismic Wave Propagation	1	99.9
187.facerec	Fortran90	Float	Image Processing: Face Recognition	1	99.9
188.ammp	C	Float	Computational Chemistry	1	99.8
189.lucas	Fortran90	Float	Number Theory / Primality Testing	1	99.9
191.fma3d	Fortran90	Float	Finite-Element Crash Simulation	1	99.9
200.sixtrack	Fortran77	Float	High Energy Nuclear Physics Accelerator Design	1	98.2
301.apsi	Fortran77	Float	Meteorology: Pollutant Distribution	1	99.8

Summary Data

The summary information below (Table 2) is independent of cache size and associativity. The prefetches are included in the counts/rates of loads and stores, since they were made from loads to R31. Means for each of the counts are arithmetic, while means for the per-instruction rates are harmonic. The integer and floating-point benchmarks are averaged separately, and then those means are averaged for the overall mean. This gives equal weight to each benchmark in a set, and equal weighting of integer and floating-point data.

Table 2

Benchmark	Instructions	Loads/Stores	Load/Store Rate	Prefetches	Prefetch Rate
164.gzip	478,636,153,300	106,339,774,183	0.222172	210,749,035	0.000440
175.vpr	84,068,682,462	34,290,404,520	0.407886	28,489,513	0.000339
176.gcc	242,873,996,717	76,347,070,522	0.314348	3,223,614,266	0.013273
181.mcf	61,867,398,195	21,224,890,959	0.343071	660,286,828	0.010673
186.crafty	191,882,992,463	63,216,831,368	0.329455	502,573	0.000003
197.parser	546,749,971,166	157,370,546,967	0.287829	7,148,027	0.000013
252.eon	239,766,530,848	107,716,995,277	0.449258	3,956,650,303	0.016502
253.perlbmk	407,741,332,530	178,536,860,186	0.437868	70,854,377	0.000174
254.gap	213,814,233,713	68,971,552,540	0.322577	24,459,785	0.000114
255.vortex	390,698,783,316	153,384,434,949	0.392590	1,975,965,046	0.005058
256.bzip2	377,370,320,254	133,004,113,614	0.352450	133,248,276	0.000353
300.twolf	346,484,742,706	97,313,583,371	0.280860	23,200,957	0.000067
168.wupwise	349,623,881,589	94,914,535,141	0.271476	8,264,448,086	0.023638
171.swim	225,830,975,667	74,301,504,811	0.329014	10,310,829,246	0.045657
172.mgrid	419,156,007,205	153,695,716,464	0.366679	9,254,489,893	0.022079
173.applu	223,883,656,387	85,299,162,334	0.380998	1,961,452,391	0.008761
177.mesa	281,694,536,771	100,831,728,471	0.357947	350,031,487	0.001243
178.galgel	409,366,708,755	177,448,399,787	0.433471	58,409,867,463	0.142683
179.art	86,831,419,686	28,608,040,144	0.329466	6,045,927,514	0.069628
183.equake	131,518,590,672	55,938,011,470	0.425324	126,096,577	0.000959
187.facerec	211,027,958,400	63,321,687,774	0.300063	1,832,311,610	0.008683
188.ammp	326,548,871,460	109,074,260,280	0.334021	1,455,221,518	0.004456
189.lucas	142,398,816,802	30,986,842,904	0.217606	2,134,334,895	0.014988
191.fma3d	268,370,708,343	115,831,207,504	0.431609	1,349,475,397	0.005028
200.sixtrack	470,949,157,978	108,391,446,800	0.230155	10,556,998,171	0.022416
301.apsi	347,924,060,406	124,663,297,530	0.358306	5,381,960,075	0.015469
Int Total	3,581,955,137,670	1,197,717,058,456		10,315,168,986
Int Mean	298,496,261,472	99,809,754,871	0.332290	859,597,415	0.000024
FP Total	3,895,125,350,121	1,323,305,841,414		117,433,444,323
FP Mean	278,223,239,294	94,521,845,815	0.326063	8,388,103,165	0.004987
Ovrl Total	7,477,080,487,791	2,521,022,899,870		127,748,613,309
Ovrl Mean	288,359,750,383	97,165,800,343	0.329147	4,623,850,290	0.000048
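The averaging scheme described for Table 2 can be checked with a short Python sketch. The function names below are hypothetical; the rates are the integer-benchmark Load/Store Rate values copied from Table 2, and the two group means passed to the final calls are the published Int Mean and FP Mean rows.

```python
def arithmetic_mean(values):
    """Mean used for the raw counts (instructions, loads/stores, prefetches)."""
    return sum(values) / len(values)


def harmonic_mean(values):
    """Mean used for the per-instruction rates."""
    return len(values) / sum(1.0 / v for v in values)


def overall_mean(int_values, fp_values, mean):
    """Average each group separately, then average the two group means."""
    return mean([mean(int_values), mean(fp_values)])


# Load/Store Rate column for the 12 integer benchmarks (Table 2).
INT_LOAD_STORE_RATES = [
    0.222172, 0.407886, 0.314348, 0.343071, 0.329455, 0.287829,
    0.449258, 0.437868, 0.322577, 0.392590, 0.352450, 0.280860,
]

int_rate = harmonic_mean(INT_LOAD_STORE_RATES)          # about 0.33229 (Int Mean)
overall_rate = harmonic_mean([0.332290, 0.326063])      # about 0.32915 (Ovrl Mean)
overall_instructions = arithmetic_mean(
    [298_496_261_472, 278_223_239_294]                  # Int Mean, FP Mean
)
```

Because the integer and floating-point means are combined as two equal-weight values, the 12 integer benchmarks together carry the same weight in the overall mean as the 14 floating-point benchmarks.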

The table below (Table 3) shows how often a new block must be obtained from the cache to satisfy a request. For example, one instruction cache access returns a block of 16 instructions, many of which may be executed before a different block must be obtained (typically 10 for these benchmarks).

This data was obtained by simulating caches with a single 64-Byte block, and may be more meaningful for implementations that can buffer blocks or merge requests. The means are computed in the same way as in Table 2 above.

Table 3

Benchmark	I$-Access	I$-Access Rate	D$-Access	D$-Access Rate	U$-Access	U$-Access Rate
164.gzip	54,470,768,520	0.113804	76,846,749,520	0.160554	264,464,152,198	0.552537
175.vpr	9,783,417,461	0.116374	26,450,333,116	0.314628	75,711,030,995	0.900585
176.gcc	31,450,635,056	0.129494	43,842,754,917	0.180516	181,733,952,330	0.748264
181.mcf	9,766,961,907	0.157869	13,989,421,988	0.226119	52,061,791,386	0.841506
186.crafty	21,974,406,397	0.114520	47,422,033,028	0.247140	145,207,265,223	0.756749
197.parser	67,980,418,633	0.124335	99,595,662,954	0.182159	373,494,790,264	0.683118
252.eon	25,652,922,924	0.106991	62,785,515,113	0.261861	235,089,422,313	0.980493
253.perlbmk	47,401,063,428	0.116253	105,176,984,033	0.257950	395,433,450,861	0.969814
254.gap	25,606,586,268	0.119761	49,105,988,033	0.229667	161,143,617,345	0.753662
255.vortex	51,181,968,365	0.131001	103,480,838,369	0.264861	348,800,683,706	0.892761
256.bzip2	45,086,528,330	0.119476	79,924,432,801	0.211793	302,917,293,906	0.802706
300.twolf	35,524,277,102	0.102528	73,552,792,605	0.212283	222,877,780,050	0.643254
168.wupwise	32,803,956,134	0.093826	54,126,522,525	0.154814	216,266,923,190	0.618570
171.swim	15,463,978,992	0.068476	69,220,294,011	0.306514	159,885,506,394	0.707987
172.mgrid	26,737,536,917	0.063789	140,740,025,878	0.335770	324,975,820,808	0.775310
173.applu	14,351,426,465	0.064102	68,246,688,564	0.304831	180,277,378,414	0.805228
177.mesa	30,602,545,093	0.108637	60,630,249,848	0.215234	225,901,437,732	0.801938
178.galgel	41,243,445,388	0.100749	155,587,844,182	0.380070	389,597,173,029	0.951707
179.art	13,144,486,961	0.151379	26,652,374,302	0.306944	69,084,709,620	0.795619
183.equake	11,658,423,445	0.088645	43,425,718,299	0.330187	119,186,908,018	0.906236
187.facerec	18,076,470,614	0.085659	40,091,054,768	0.189980	142,125,671,080	0.673492
188.ammp	27,201,946,160	0.083301	76,187,755,090	0.233312	241,195,708,949	0.738621
189.lucas	10,063,843,673	0.070674	24,380,575,286	0.171213	70,125,610,574	0.492459
191.fma3d	21,390,682,566	0.079706	81,415,531,600	0.303370	246,070,861,667	0.916907
200.sixtrack	32,179,807,823	0.068330	85,712,543,482	0.182000	240,948,906,996	0.511624
301.apsi	27,762,425,232	0.079794	92,471,252,932	0.265780	270,809,995,354	0.778359
Int Total	425,879,954,391		782,173,506,477		2,758,935,230,577
Int Mean	35,489,996,199	0.119680	65,181,125,539	0.221556	229,911,269,214	0.772763
FP Total	322,680,975,463		1,018,888,430,767		2,896,452,611,825
FP Mean	23,048,641,104	0.081825	72,777,745,054	0.243528	206,889,472,273	0.720977
Ovrl Total	748,560,929,854		1,801,061,937,244		5,655,387,842,402
Ovrl Mean	224,464,297,747	0.097197	427,475,625,765	0.232023	1,482,912,351,425	0.745973
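The access counts in Table 3 come from a cache holding a single 64-Byte block, which amounts to counting how often consecutive references fall in different blocks. A minimal Python sketch of that idea follows; the function name `block_accesses` is hypothetical and the code is an illustration, not the simulator used for the table.

```python
BLOCK_SIZE = 64  # bytes


def block_accesses(addresses):
    """Count references that need a block other than the one last used.

    Equivalent to counting misses in a cache that holds exactly one block.
    """
    count = 0
    current = None
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if block != current:
            count += 1
            current = block
    return count


# Straight-line code: 64 sequential 4-byte instructions span four blocks.
fetches = list(range(0, 256, 4))
accesses = block_accesses(fetches)
rate = accesses / len(fetches)  # block accesses per instruction
```

In this straight-line case every block supplies all 16 of its instructions, so the rate is 1/16. Branches make real programs leave a block early, which is consistent with the higher instruction-cache access rates (roughly 0.06 to 0.16) reported in Table 3.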

Table Format

All miss-ratio tables (.tab files) are ASCII text. They include the name and command line for each benchmark; the number of instructions, data references, and data prefetches; and the miss ratios (misses/instruction, rounded to 9 places) for cache sizes ranging from 1KB to 1MB and associativities of 1, 2, 4, 8, and full. The tables also contain compulsory miss rates and access rates. Both arithmetic means and harmonic means were computed for the data sets of each benchmark (rounded to 8 places). The means for each benchmark were then averaged together and rounded to 7 places. In all cases the block size was 64B and the replacement policy was LRU (least recently used). Compulsory miss rates were measured as the miss rate of a 2-way set-associative 256MB cache with no flushing on system calls (rounded to 12 places). Access rates were measured as the miss rate of a direct-mapped 64B cache, that is, a cache with just one block. Note that there is sufficient data to calculate the 3C's (compulsory, capacity, and conflict misses) for the various configurations. See the example below.

-----------------------------------------------------------------------------
| U-cache misses/inst: 584,975,927,483 unified refs (1.231289-/inst);         |
|-----------------------------------------------------------------------------|
| 264,464,152,198 U-cache 64-Byte block accesses (0.560264-/inst)             |
|-----------------------------------------------------------------------------|
|  Size |   Direct    |  2-way LRU  |  4-way LRU  |  8-way LRU  |  Full LRU   |
|-------+-------------+-------------+-------------+-------------+-------------|
|   1KB | 0.17096965- | 0.11586859- | 0.10006949- | 0.09539379- | 0.09356626- |
|   2KB | 0.09933510- | 0.07301168- | 0.06116419- | 0.05846171- | 0.06111943- |
|   4KB | 0.06756154- | 0.04373756- | 0.03517036- | 0.02650259- | 0.02410843- |
|   8KB | 0.05398704- | 0.02824148- | 0.02123935- | 0.02024346- | 0.01982071- |
|  16KB | 0.03316842- | 0.02309782- | 0.01727542- | 0.01709368- | 0.01694758- |
|  32KB | 0.02622252- | 0.01814185- | 0.01381846- | 0.01369148- | 0.01354134- |
|  64KB | 0.01397891- | 0.01160836- | 0.00835915- | 0.00821407- | 0.00807335- |
| 128KB | 0.00583968- | 0.00267375- | 0.00189210- | 0.00172267- | 0.00151421- |
| 256KB | 0.00343062- | 0.00054402- | 0.00040227- | 0.00038742- | 0.00038681- |
| 512KB | 0.00198606- | 0.00033332- | 0.00027255- | 0.00026623- | 0.00026589- |
|   1MB | 0.00193081- | 0.00026416- | 0.00026161- | 0.00026133- | 0.00026133- |
 -----------------------------------------------------------------------------
 Compulsory: 0.00001698143-

In this example (164.gzip), a 32KB 2-way set-associative L1 unified cache with 64-Byte blocks has approximately 18 cache misses per 1,000 instructions.
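As a worked illustration of the 3C's calculation mentioned above, the Python sketch below decomposes one entry of the example table. It assumes the conventional definitions: compulsory misses are those of an effectively infinite cache, capacity misses are the fully associative LRU misses beyond the compulsory ones, and conflict misses are whatever the set-associative cache adds on top of the fully associative one. The function names are hypothetical.

```python
def three_cs(miss_ratio, full_lru_miss_ratio, compulsory):
    """Split a misses/instruction value into its 3C components."""
    capacity = full_lru_miss_ratio - compulsory
    conflict = miss_ratio - full_lru_miss_ratio
    return {"compulsory": compulsory, "capacity": capacity, "conflict": conflict}


def misses_per_kilo_instruction(misses_per_instruction):
    """Convert misses/instruction to misses per 1,000 instructions."""
    return misses_per_instruction * 1000.0


# 164.gzip example: 32KB, 2-way LRU unified cache.
parts = three_cs(
    miss_ratio=0.01814185,           # 32KB row, 2-way LRU column
    full_lru_miss_ratio=0.01354134,  # 32KB row, Full LRU column
    compulsory=0.00001698143,        # "Compulsory" line under the table
)
mpki = misses_per_kilo_instruction(0.01814185)  # about 18.1
```

With these numbers, roughly three quarters of the misses are capacity misses and one quarter are conflict misses, while compulsory misses are negligible. Note that the conflict term can come out slightly negative, as in the 2KB row of the example, where the 8-way cache misses less often than the fully associative one.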

The tables of miss ratios are organized into a set of files. For a given number of simulated instructions, there is one file for each benchmark-dataset combination, two files for the arithmetic and harmonic means of all the datasets for each benchmark, and six files for the arithmetic and harmonic means of all the benchmarks (results for just the integer benchmarks and just the floating-point benchmarks are also provided). Each of the files contains seven tables. The first is for instruction caches; the second, third, and fourth are for data caches; and the fifth, sixth, and seventh are for unified caches. Within each group of three tables for data caches and unified caches, the first table contains the miss ratios for all of the associated memory references, the second does not count references or misses caused by prefetch operations (they do, however, affect cache state), and the third has statistics for only the prefetch operations.
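For readers who want to process the tables programmatically, the following Python sketch parses one data row laid out like the example table above. It is based only on that example: the function name `parse_row`, the column labels, and the handling of the trailing "-" after each value are assumptions, and the actual .tab files may contain variations this sketch does not cover.

```python
import re

# Matches rows such as "|  32KB | 0.02622252- | ... | 0.01354134- |".
ROW = re.compile(r"^\|\s*(\d+)(KB|MB)\s*\|(.*)\|\s*$")
COLUMNS = ("direct", "2-way", "4-way", "8-way", "full")


def parse_row(line):
    """Return (size_in_kb, {column: misses_per_instruction}) or None."""
    match = ROW.match(line.strip())
    if match is None:
        return None  # header, separator, or summary line
    size_kb = int(match.group(1)) * (1024 if match.group(2) == "MB" else 1)
    # Drop the trailing "-" that follows each value in the example table.
    cells = [cell.strip().rstrip("-") for cell in match.group(3).split("|")]
    if len(cells) != len(COLUMNS):
        return None
    return size_kb, dict(zip(COLUMNS, (float(cell) for cell in cells)))
```

Applied to the 32KB row of the example, `parse_row` returns the size 32 together with the five miss ratios keyed by associativity; header and separator lines return `None`.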

Miss Ratio Tables

First-level Cache Miss Ratio Tables:
- Complete archive (http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/new_tables/miss-tables.tar.gz)
- All benchmarks
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full
- Integer benchmarks
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full
- Floating-Point benchmarks
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full
- 164.gzip
  - 164.gzip1 (Source code tar-file): 1B, 2B, 5B, 10B, Full.
  - 164.gzip2 (Webserver log): 1B, 2B, 5B, 10B, Full.
  - 164.gzip3 (Large TIFF image): 1B, 2B, 5B, 10B, Full.
  - 164.gzip4 (Random data): 1B, 2B, 5B, 10B, Full.
  - 164.gzip5 (Program binary): 1B, 2B, 5B, 10B, Full.
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full.
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full.
- 175.vpr
  - 175.vpr1 (placement of "clma" from MCNC benchmark suite --not reported, problem with Simplescalar).
  - 175.vpr2 (routing "clma" from MCNC benchmark suite): 1B, 2B, 5B, 10B, Full.
- 176.gcc
  - 176.gcc1 (Preprocessed source from a SPECint2000 candidate): 1B, 2B, 5B, 10B, Full.
  - 176.gcc2 (Preprocessed source from SPECfp2000 200.sixtrack): 1B, 2B, 5B, 10B, Full.
  - 176.gcc3 (Preprocessed expr.i from gcc source): 1B, 2B, 5B, 10B, Full.
  - 176.gcc4 (Preprocessed integrate.i from gcc source): 1B, 2B, 5B, 10B, Full.
  - 176.gcc5 (Preprocessed version of Scilab program): 1B, 2B, 5B, 10B, Full.
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full.
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full.
- 181.mcf
  - 181.mcf1 (Single-depot vehicle scheduling in public mass transportation): 1B, 2B, 5B, 10B, Full.
- 186.crafty
  - 186.crafty1 (5 different chess board layouts, varying search depth): 1B, 2B, 5B, 10B, Full.
- 197.parser
  - 197.parser1 (Syntactic analyses of a series of English sentences): 1B, 2B, 5B, 10B, Full.
- 252.eon
  - 252.eon1 (150x150 pixel image of a chair in the corner of a room): 1B, 2B, 5B, 10B, Full.
  - 252.eon2 (same image, different algorithm): 1B, 2B, 5B, 10B, Full.
  - 252.eon3 (same image, another different algorithm): 1B, 2B, 5B, 10B, Full.
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full.
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full.
- 253.perlbmk
  - 253.perlbmk1 (Specdiff applied to email): 1B, 2B, 5B, 10B, Full.
  - 253.perlbmk2 (Finding perfect numbers --not reported, crashes Simplescalar).
  - 253.perlbmk3 (Testing pseudo random numbers --not reported, crashes Simplescalar).
  - 253.perlbmk4 (Converting Email to HTML): 1B, 2B, 5B, 10B, Full.
  - 253.perlbmk5 (Converting Email to HTML): 1B, 2B, 5B, 10B, Full.
  - 253.perlbmk6 (Converting Email to HTML): 1B, 2B, 5B, 10B, Full.
  - 253.perlbmk7 (Converting Email to HTML): 1B, 2B, 5B, 10B, Full.
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full.
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full.
- 254.gap
  - 254.gap1 (Comb. funcs, big #'s, finite fields, lattice computations, normalizers, ag-groups): 1B, 2B, 5B, 10B, Full.
- 255.vortex
  - 255.vortex1 (Building and manipulating 3 different databases): 1B, 2B, 5B, 10B, Full.
  - 255.vortex2 (Building and manipulating 3 different databases): 1B, 2B, 5B, 10B, Full.
  - 255.vortex3 (Building and manipulating 3 different databases): 1B, 2B, 5B, 10B, Full.
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full.
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full.
- 256.bzip2
  - 256.bzip21 (Source tar file): 1B, 2B, 5B, 10B, Full.
  - 256.bzip22 (A large TIFF image): 1B, 2B, 5B, 10B, Full.
  - 256.bzip23 (A program binary): 1B, 2B, 5B, 10B, Full.
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full.
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full.
- 300.twolf
  - 300.twolf (Structured circuit from MCNC benchmark suite): 1B, 2B, 5B, 10B, Full.
- 168.wupwise
  - 168.wupwise (A problem in lattice gauge theory): 1B, 2B, 5B, 10B, Full.
- 171.swim
  - 171.swim (Large 1335x1335 array over 512 timesteps): 1B, 2B, 5B, 10B, Full.
- 172.mgrid
  - 172.mgrid (Single, constant coefficient equation on uniform cubical grid): 1B, 2B, 5B, 10B, Full.
- 173.applu
  - 173.applu (Large mesh over many timesteps): 1B, 2B, 5B, 10B, Full.
- 177.mesa
  - 177.mesa (Creating a 3D object from a 2D scalar field): 1B, 2B, 5B, 10B, Full.
- 178.galgel
  - 178.galgel (Convective flow in a rectangular box filled with liquid): 1B, 2B, 5B, 10B, Full.
- 179.art
  - 179.art1 (Finding a helicopter & airplane in a thermal image): 1B, 2B, 5B, 10B, Full.
  - 179.art2 (Finding a helicopter & airplane in a thermal image): 1B, 2B, 5B, 10B, Full.
  - Arithmetic Mean: 1B, 2B, 5B, 10B, Full.
  - Harmonic Mean: 1B, 2B, 5B, 10B, Full.
- 183.equake
  - 183.equake (1994 Northridge Earthquake aftershock in California): 1B, 2B, 5B, 10B, Full.
- 187.facerec
  - 187.facerec (Album of 42 faces, 84 images in probe gallery): 1B, 2B, 5B, 10B, Full.
- 188.ammp
  - 188.ammp1 (Tracking movement of atoms): 1B, 2B, 5B, 10B, Full.
- 189.lucas
  - 189.lucas1 (Lucas-Lehmer test for primality of Mersenne numbers 2^p-1): 1B, 2B, 5B, 10B, Full.
- 191.fma3d
  - 191.fma3d1 (Impulsive load applied to cylindrical panel): 1B, 2B, 5B, 10B, Full.
- 200.sixtrack
  - 200.sixtrack1 (60 particles in a Large Hadron Collider): 1B, 2B, 5B, 10B, Full.
- 301.apsi
  - 301.apsi1 (112x112x112 area array of data over 70 timesteps): 1B, 2B, 5B, 10B, Full.

Experimental Error

The miss ratios were calculated from data collected by functional, user-mode simulations of optimized benchmarks. As a result, the cache miss ratios reported above may not be representative of a real platform. A few sources of error are discussed below.

First, only primary misses were counted by the simulator. Once a reference missed in the cache, the data was loaded and all subsequent accesses to the line hit. A modern processor may also experience secondary misses, or references to data that has yet to be loaded from a prior cache miss. There is a nonzero miss latency, and a real processor may execute other instructions while waiting for the data. The sequential model used in functional simulations is optimistic in this respect.

Second, a modern processor will have optimizations that affect cache performance. Hardware prefetching of instructions and data can have the positive effect of reducing the number of cache misses. However, prefetching can also cause cache pollution. Further, speculative execution can result in increased memory traffic for speculatively issued loads, and I-cache pollution from incorrect branch predictions. This also makes the results optimistic.

Third, the operating system was ignored. System calls cause additional cache misses to bring in OS code and data, and in doing so they replace cache lines from the user program. This increases the number of conflict and capacity misses for the user program in a real system. Since the additional misses from OS intervention were not modeled, our results are optimistic (though experiments showed that these benchmarks typically spend less than 0.1% of their time in the OS). One possibility is to flush the caches on system calls. However, this is the other extreme, and would have made it impossible to measure the compulsory miss rates.

Fourth, the benchmarks were optimized for an Alpha 21264 processor. The binaries may have been tuned to perform well with the 21264 cache hierarchy (split 64KB 2-way set-associative L1 caches). Ideally, the binary should not favor a particular cache configuration. Further, the binary contains no-ops for alignment and steering of dependent operations in the clustered microarchitecture of the 21264. These no-ops increase the overall instruction count for the functional simulation.

Fifth, since this is a functional simulation, the timeliness of prefetch operations is not considered. Prefetch operations can only prevent a cache miss on a demand reference if they are initiated early enough. Here, all subsequent accesses to a prefetched block hit in the cache. However, experiments with several benchmarks indicate that the compiler inserted prefetches sufficiently far in advance of the first use of data to cover an L1 miss, with 10 to 100 committed instructions between a prefetch and the first use.

Related Work

SPEC CPU2000: Measuring CPU Performance in the New Millennium, John L. Henning. http://www.spec.org/osg/cpu2000/papers/COMPUTER_200007-abstract.JLH.html
Simulating SPEC CPU2000: Reduced input sets for SPEC CPU2000, research by the University of Minnesota. http://www.spec.org/osg/cpu2000/research/umn/
Measuring Cache and TLB Performance and Their Effect on Benchmark Run Times, R.H. Saavedra and A.J. Smith, IEEE Trans. on Computers, Vol. 44, No. 10, October 1995, pp. 1223-1235.
Prefetching and memory system behavior of the SPEC95 benchmark suite, M.J. Charney and T.R. Puzak. http://www.research.ibm.com/journal/rd/413/charney.html
Cache Performance of the SPEC92 Benchmark Suite, Jeffrey D. Gee, Mark D. Hill, Dionisios N. Pnevmatikatos, and Alan Jay Smith. http://www.cs.wisc.edu/~markhill/spec92miss.html

Acknowledgements

Cache numbers for version 3.0 were generated with computing resources provided by the UW ECE department and Intel.
Previous versions of this website contain cache numbers generated with computing resources provided by Wisconsin Condor, Midship (NSF 144-GB67), and Multifacet (NSF EIA-9971256) projects.

Publications

Version 1.0 of this data appears in "Computer Architecture News", Vol. 29, No. 4, September 2001 (ACM SIGARCH).
A subset of Version 2.0 of this data appears in the third edition of John Hennessy and David Patterson's "Computer Architecture, A Quantitative Approach".

Disclaimer

Data in this directory is correct to the best of our knowledge. However, we provide it *AS IS*, without any express or implied warranty, and we accept no responsibility for the consequences of the use or misuse of this data.

Revision History

July 2001: Version 1.0 --First release
August 19, 2001: Version 1.1
- 4 benchmarks added (mesa, ammp, vortex, parser)
- Brief discussion of experimental error added
August 26, 2001: Version 1.2
- 3 benchmarks added (crafty, eon, bzip2)
September 3, 2001: Version 1.3
- galgel added
October 21, 2001: Version 1.4
- wupwise and apsi added
November 3, 2001: Version 1.5
- vpr added
- data for 1K and 2K L1 caches added
- 2 more digits of precision added
December 3, 2001: Version 1.6
- mgrid and sixtrack added
January 20, 2002: Version 2.0
- Last benchmarks added
- I & D references per access added
May 2003: Version 3.0
- All data corrected to remove NOPs and distinguish prefetches (old data may be found on the previous version of the website, Version 2.0)
- Data for selected intervals added
- Unified caches added
- More general statistics added
- Harmonic means added
- More digits of precision

Last updated May 2003, jfc.