Journal Article

Accurate and Efficient Cache Warmup for Sampled Processor Simulation Through NSL–BLRL

Luk Van Ertvelde, Filip Hellebaut and Lieven Eeckhout

in The Computer Journal

Published on behalf of British Computer Society

Volume 51, issue 2, pages 192-206
Published in print March 2008 | ISSN: 0010-4620
Published online September 2007 | e-ISSN: 1460-2067 | DOI: http://dx.doi.org/10.1093/comjnl/bxm061

Preview

Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. Sampled simulation, which selects a number of samples from the complete benchmark execution, yields substantial speedups. However, there is one major issue that needs to be dealt with in order to minimize non-sampling bias, namely the hardware state at the beginning of each sample. This is well known in the literature as the cold-start problem. The hardware structures that suffer the most from the cold-start problem are cache hierarchies. In this paper, we propose NSL–BLRL, which combines two previously proposed cache hierarchy warmup approaches, namely no-state-loss (NSL) and boundary line reuse latency (BLRL). The idea of NSL–BLRL is to warm up the cache hierarchy using a hardware state checkpoint that stores a truncated NSL stream. The NSL stream is a least-recently-used stream of (unique) memory references in the pre-sample. This NSL stream is then truncated to form the NSL–BLRL warmup checkpoint; this is done by inspecting the sample to determine how far back in the pre-sample one needs to go to accurately warm up the hardware state for the given sample. We show using SPEC CPU2000 benchmarks that NSL–BLRL (i) is nearly as accurate as BLRL and NSL for sampled processor simulation, (ii) yields simulation time speedups of several orders of magnitude compared to BLRL and (iii) is more space-efficient than NSL. As such, we conclude that NSL–BLRL is a highly efficient and accurate cache warmup strategy for sampled processor simulation.
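
The two steps described in the abstract, building the NSL stream and then truncating it by inspecting the sample, can be illustrated with a small sketch. This is not the authors' implementation, which is only available in the full text. The function names, the 64-byte block size (`block_bits=6`) and the `coverage` parameter are assumptions made for illustration, and the truncation rule (keep the shortest tail of the stream that covers a chosen fraction of the sample's blocks last touched in the pre-sample) is a simplified stand-in for the BLRL criterion.

```python
from collections import OrderedDict
import math


def nsl_stream(pre_sample, block_bits=6):
    """Unique block addresses of the pre-sample, least recently used first.

    Hypothetical helper: block_bits=6 assumes 64-byte cache blocks.
    """
    lru = OrderedDict()
    for addr in pre_sample:
        block = addr >> block_bits
        lru.pop(block, None)  # drop the older occurrence, if any
        lru[block] = True     # re-append as the most recently used block
    return list(lru)


def truncate_stream(stream, sample, coverage=1.0, block_bits=6):
    """Keep the shortest tail of the stream needed by the sample.

    Assumed rule (not taken from the paper): look at each block the sample
    touches for the first time, find how deep it sits in the LRU stream,
    and keep enough of the tail to cover `coverage` of those blocks.
    """
    # Depth 1 is the most recently used block of the pre-sample.
    depth = {block: len(stream) - i for i, block in enumerate(stream)}
    seen = set()
    depths = []
    for addr in sample:
        block = addr >> block_bits
        if block in seen:
            continue  # only the first touch in the sample crosses the boundary
        seen.add(block)
        if block in depth:
            depths.append(depth[block])
    if not depths:
        return []  # the sample reuses nothing from the pre-sample
    depths.sort()
    k = max(1, math.ceil(coverage * len(depths)))
    keep = depths[k - 1]
    return stream[len(stream) - keep:]


if __name__ == "__main__":
    pre = [0x000, 0x040, 0x080, 0x000, 0x0C0]
    smp = [0x0C0, 0x000, 0x100]
    full = nsl_stream(pre)
    print(full, truncate_stream(full, smp))
```

Replaying the truncated stream, oldest entry first, into an empty cache model would then approximate the cache state the sample actually depends on. The checkpoint stays smaller than the full NSL stream because entries older than the deepest block the sample needs are dropped.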

Keywords: computer architecture; sampled simulation; cold-start problem; warmup

Journal Article.  8308 words.  Illustrated.

Subjects: Computer Science
