… or why our software calls itself *sampler*


Each piece of instrumentation we add to the program is
associated with a specific location in the original source
code. So we might have, for example, an extra check added on line
88 of file `camel.c`, another on line 90, and so on.
Ideally, we would like to observe the behavior of the program each
time it reaches one of these instrumentation sites. In practice,
that would be far too slow.

Instead, we sample the instrumentation only occasionally.
Imagine that we have a biased coin that comes up heads 99 times
out of 100. Each time we reach an instrumentation site, we toss
that coin. If it comes up heads (99% of the time), we ignore the
instrumentation. If it comes up tails (1% of the time), we run
the instrumentation code and monitor the site for just this single
instance. The next time we reach the same line, we toss the coin
again. Code in a loop, then, might be monitored on one cycle
through the loop but not monitored the next. The choice is made
independently and randomly each time we reach an instrumentation
site, forming a *Bernoulli
process*.
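As a concrete illustration, here is a minimal sketch of that per-site coin toss in C. All of the names (`SAMPLING_RATE`, `coin_says_sample`, `instrumentation_site`) are hypothetical, and a real sampler would use a better random number generator than `rand`:

```c
#include <stdlib.h>

#define SAMPLING_RATE 100  /* heads 99 times out of 100, tails once */

/* Returns 1 ("tails": take the sample) with probability 1/SAMPLING_RATE,
 * and 0 ("heads": skip the instrumentation) the rest of the time. */
static int coin_says_sample(void)
{
    return rand() % SAMPLING_RATE == 0;
}

/* What a naive per-site check might look like: */
static void instrumentation_site(void)
{
    if (coin_says_sample()) {
        /* run the instrumentation and record data for this one visit */
    }
    /* ...the original program code continues either way... */
}
```

Tossing the coin anew at every site is exactly the Bernoulli process described above; the problem, as we will see next, is the cost of all those tosses.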

Each piece of instrumentation is quite small and fast. Even if we skip 99 sites out of 100, spending a lot of time on each coin-tossing decision would still slow things down. We need to make the coin toss itself cheap, so that the common case, no sampling, is as fast as possible.

Remember that tails (samples taken) are rare. Instead of
generating lots of heads waiting for the next tail, we can just
write down how many heads there will be before the next tail comes
along. On average, there will be 99 heads between each pair of
tails. But we certainly could get two tails in a row, or we could
have to wait much longer for a tail to come along. These numbers
record the *inter-arrival time* of tails in a sequence of
coin tosses that usually come up heads.

There are two great things about these inter-arrival times. First, you can generate them directly by selecting random numbers from a geometric distribution with mean 100. This is fast and each random number accounts for, on average, the next 100 decisions we will have to make. Second, these inter-arrival times are predictive. If we know that the next tail isn’t coming along for another 78 tosses, then we know in advance that the next 77 instrumentation sites should be skipped. Each geometric random number is a countdown to the next sample.

This near-term predictive power lets us amortize the cost of
deciding to skip instrumentation sites. Each application actually
contains two variations of the program code: a fast version
with (nearly) no instrumentation and a slow version with
full instrumentation. If the next-sample countdown is 78, then we
can stay on the fast path across at least the next 77
instrumentation sites. Only when the countdown gets close to 0 do
we need to hop over into the instrumented code. And once that
sample is taken, we reset the countdown to a new random value and
jump back into the fast code again.
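The countdown scheme at a single site can be sketched like this. The names (`visit_site`, `fresh_countdown`) are invented, and a deterministic stand-in replaces the true geometric draw just to keep the sketch self-contained:

```c
#include <stdlib.h>

static long countdown;      /* tosses remaining until the next tail */
static long samples_taken;  /* how many times the slow path ran */

/* Stand-in with mean ~100; a real sampler would draw this from a
 * geometric distribution instead. */
static long fresh_countdown(void)
{
    return 1 + rand() % 199;  /* uniform on [1, 199], mean 100 */
}

/* One instrumentation site: decrement, and only sample at zero. */
static void visit_site(void)
{
    if (--countdown > 0) {
        /* fast path: skip the instrumentation entirely */
    } else {
        samples_taken++;                /* slow path: sample this visit */
        countdown = fresh_countdown();  /* reset, back to the fast code */
    }
}
```

In the common case the entire decision costs one decrement and one comparison, which is what makes the fast path fast.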

Of course, programs don’t always run in a simple straight
line. Knowing how many instrumentation sites are coming up in the
near future is non-trivial, especially in the presence of
branches, loops, and function calls. This is where compile-time
program analysis comes in. As the program is being instrumented,
we identify acyclic regions of the program’s control flow
graph (think flowchart). There are only a finite number of paths
through each acyclic region, each passing through only a finite
number of instrumentation sites. Thus each acyclic region has a
finite maximum *instrumentation weight*: the largest
number of instrumentation sites reached on any one pass through
that region. If execution reaches the top of a region of weight
10, and the next-sample countdown reads 78, then we know in
advance that none of the instrumentation in that region will be
used this time through. We jump into the fast code for that whole
region and only check the countdown again once we reach the
bottom.
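A sketch of that region-level check, with hypothetical names. One simplification to note: this version subtracts the region's full maximum weight on the fast path, even though the executed path may cross fewer sites; a real implementation would decrement by the number of sites actually reached:

```c
typedef enum { FAST_VERSION, SLOW_VERSION } region_choice;

static long countdown = 500;  /* next-sample countdown */

/* One comparison at the top of an acyclic region of known maximum
 * instrumentation weight decides which clone of the region to run. */
static region_choice choose_region_version(int weight)
{
    if (countdown > weight) {
        /* Conservative simplification: charge the maximum weight even
         * if the executed path crosses fewer sites. */
        countdown -= weight;
        return FAST_VERSION;  /* no per-site checks inside the region */
    }
    return SLOW_VERSION;      /* per-site countdown checks inside */
}
```

So with a countdown of 78 and a region weight of 10, a single comparison replaces ten per-site checks.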

Thus, when sampling is sparse, the program spends the majority of its time in fast, instrumentation-free code. This is how we measure a randomly sampled subset of program behavior without slowing you down.

This concludes our whirlwind tour of lightweight random sampling. If you still want more, the background reading publications should be enough to keep your mind busy for a while.