… or why our software calls itself sampler
Each piece of instrumentation we add to the program is associated with a specific location in the original source code. So we might have, for example, an extra check added on line 88 of file camel.c, another on line 90, and so on. Ideally, we would like to observe the behavior of the program each time it reaches one of these instrumentation sites. In practice, that would be far too slow.
Instead, we sample the instrumentation only occasionally. Imagine that we have a biased coin that comes up heads 99 times out of 100. Each time we reach an instrumentation site, we toss that coin. If it comes up heads (99% of the time), we ignore the instrumentation. If it comes up tails (1% of the time), we run the instrumentation code and monitor the site for just this single instance. The next time we reach the same line, we toss the coin again. Code in a loop, then, might be monitored on one cycle through the loop but not monitored the next. The choice is made independently and randomly each time we reach an instrumentation site, forming a Bernoulli process.
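To make this concrete, here is a minimal sketch in C of the naive per-site coin toss just described. The names (record_observation, maybe_sample_site) are illustrative stand-ins rather than our instrumentor's actual API, and rand() stands in for whatever generator a real system would use.

```c
#include <stdlib.h>

#define SAMPLING_RATE 100   /* tails comes up 1 time in 100 */

/* Stand-in for the real instrumentation payload. */
static void record_observation(void) { /* count, log, check, ... */ }

/* The naive scheme: toss the biased coin independently at every site. */
static void maybe_sample_site(void) {
    if (rand() % SAMPLING_RATE == 0)
        record_observation();   /* tails (1%): monitor this single instance */
    /* heads (99%): skip the instrumentation entirely */
}
```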
Each piece of instrumentation is quite small and fast. Even if we skip 99 sites out of every 100, spending a lot of time making each coin-tossing decision would still slow things down. We need to make the coin toss itself cheap, so that the common case, no sampling, is as fast as possible.
Remember that tails (samples taken) are rare. Instead of generating lots of heads while waiting for the next tail, we can just write down how many heads there will be before the next tail comes along. On average, there will be 99 heads between each pair of tails, but we could certainly get two tails in a row, or we could have to wait much longer for a tail to come along. These numbers record the inter-arrival times of tails in a sequence of coin tosses that usually come up heads.
There are two great things about these inter-arrival times. First, you can generate them directly by selecting random numbers from a geometric distribution with mean 100. This is fast, and each random number accounts for, on average, the next 100 decisions we will have to make. Second, these inter-arrival times are predictive. If we know that the next tail isn’t coming along for another 78 tosses, then we know in advance that the next 77 instrumentation sites should be skipped. Each geometric random number is a countdown to the next sample.
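A countdown with the right distribution can be drawn using the standard inverse-transform method, sketched below. The sampling rate is expressed as a probability p (here 1/100, giving a mean of 100); this is an illustration of the idea, not our actual generator.

```c
#include <math.h>
#include <stdlib.h>

/* Draw a geometrically distributed countdown: the number of coin
   tosses up to and including the next tails, with mean 1/p.
   Standard inverse-transform method. */
static long next_countdown(double p) {
    /* Uniform on (0, 1]; shifting by 1 keeps log() away from log(0). */
    double u = (rand() + 1.0) / ((double)RAND_MAX + 1.0);
    return (long)floor(log(u) / log(1.0 - p)) + 1;
}

/* next_countdown(0.01) averages 100 over many draws: one random
   number covers, on average, the next 100 sampling decisions. */
```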
This near-term predictive power lets us amortize the cost of deciding to skip instrumentation sites. Each application actually contains two variations of the program code: a fast version with (nearly) no instrumentation and a slow version with full instrumentation. If the next-sample countdown is 78, then we can stay on the fast path across at least the next 77 instrumentation sites. Only when the countdown gets close to 0 do we need to hop over into the instrumented code. And once that sample is taken, we reset the countdown to a new random value and jump back into the fast code again.
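At an individual site, the resulting check is just a decrement and a compare. The sketch below assumes the record_observation and next_countdown helpers from the earlier sketches; in reality the instrumentor generates this structure automatically rather than asking anyone to write it by hand.

```c
/* Assumes the record_observation() and next_countdown() sketches above. */
void record_observation(void);
long next_countdown(double p);

/* One global tosses-remaining counter, shared by every site;
   initialized to next_countdown(0.01) at program startup. */
static long countdown;

static void instrumentation_site(void) {
    if (--countdown > 0) {
        /* common case: the next tails is still in the future; do nothing */
    } else {
        record_observation();               /* take the sample ... */
        countdown = next_countdown(0.01);   /* ... and reset the clock */
    }
}
```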
Of course, programs don’t always run in a simple straight line. Knowing how many instrumentation sites are coming up in the near future is non-trivial, especially in the presence of branches, loops, and function calls. This is where compile-time program analysis comes in. As the program is being instrumented, we identify acyclic regions of the program’s control flow graph (think flowchart). There are only a finite number of paths through each acyclic region, each passing through only a finite number of instrumentation sites. Thus each acyclic region has a finite maximum instrumentation weight: the largest number of instrumentation sites reached on any one pass through that region. If execution reaches the top of a region of weight 10, and the next-sample countdown reads 78, then we know in advance that none of the instrumentation in that region will be used this time through. We jump into the fast code for that whole region and only check the countdown again once we reach the bottom.
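In code, the region-level decision reduces to a single comparison at the top of each acyclic region. The sketch below is schematic: fast_region and slow_region stand for the two compiled variants of the same code, and countdown is the global counter from the previous sketch.

```c
/* Schematic region dispatch, reusing the global countdown above.
   fast_region() is the uninstrumented clone; in this sketch it still
   decrements the countdown once per site it passes, so the counter
   stays accurate.  slow_region() checks the countdown at every site. */
extern long countdown;
void fast_region(void);
void slow_region(void);

static void enter_region(long max_weight) {
    if (countdown > max_weight)
        fast_region();   /* no site inside can possibly fire this pass */
    else
        slow_region();   /* the countdown may reach 0 somewhere inside */
}
```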
Thus, when sampling is sparse, the program spends most of its time in fast, instrumentation-free code. This is how we measure a randomly sampled subset of program behavior without noticeably slowing you down.
This concludes our whirlwind tour of lightweight random sampling. If you still want more, the publications listed under background reading should be enough to keep your mind busy for a while.