Static information files are generated by sampler-cc during program compilation. They describe instrumented code but contain no information about any individual run. Think of them as an extension of the symbolic debugging information generated during normal compilation.
“File” is a bit of a misnomer. These pieces of information are not initially placed in their own files. Rather, static information is embedded directly within each instrumented object file, shared library, or executable. Custom ELF sections hold each piece of static data.
Embedding ensures that the static information remains
tightly associated with the code it describes even across
renaming, linking, archive extraction, etc. However, it is
often useful to extract this information into a standalone
form for additional analysis. The
/usr/local/lib/sampler/tools/extract-section
tool may be used for this purpose. It is run as
follows:
extract-section {section-name} {executable | shared-library | object ...}
section-name names the ELF section containing the desired data, including a leading “.”. Remaining arguments are ELF executables, shared libraries, or object files. The named section is read from each of these and written to standard output in sequence.
Note | |
---|---|
Although extract-section will happily copy data from any ELF section you name, it should only be used for extracting static sampler information. Normalization applied during extraction means that you are not seeing a byte-for-byte copy of the named section. Use objcopy for more general manipulation tasks. |
When shipping precompiled binaries to large numbers of users, you should extract, save, and then remove the sampler's static data sections from your binaries. This data can be large, and the typical end user does not need to download, store, or use it. Removal is made easier by the fact that all of these extra ELF sections are marked as debugging sections: the standard strip will remove them along with all other debug information. Our RPM building tools do exactly this, and save the extracted static data files in auxiliary *-samplerinfo packages analogous to Red Hat's *-debuginfo packages. The *-samplerinfo packages are available for all to see, but typically only a developer would download and install them.
The site information file lists the instrumentation sites added to each compilation unit. This is the main key used to decode dynamic feedback reports. When embedded within an instrumented binary, it is always found in the “.debug_site_info” ELF section. When extracted into a standalone file, it is conventionally stored with the extension “.sites”.
The format of this file is a hybrid of XML and tab-delimited columnar data. This is intended as a compromise between structure, efficiency of storage, and ease of processing.
At the top level, a static site information file consists of a sequence of sections marked by XML-style sites tags:
<sites unit="unit signature" scheme="scheme name"> … </sites>
<sites unit="unit signature" scheme="scheme name"> … </sites>
⋮
<sites unit="unit signature" scheme="scheme name"> … </sites>
Note | |
---|---|
Unlike true XML, there is no prologue and no single
root tag. The first line is the |
A single <sites> … </sites> section describes the instrumentation sites for one compilation unit with one instrumentation scheme. Each sites start tag carries two attributes:
unit
a 128-bit identifying signature for this compilation unit, expressed as 32 lowercase hexadecimal digits
scheme
the name of an instrumentation scheme as given to sampler-cc's -fsampler-scheme flag
When data is collected during a run, it is also marked with these same two attributes. Thus this (unit signature, scheme name) pair serves to connect dynamic data with the static sites that collected it.
A complex application that links together several
object files will contain several sites
sections. If multiple
instrumentation schemes were used within a single
compilation unit, then multiple sites
sections will appear with
the same compilation unit signature but differing scheme
names. It is even possible, though rare, for a single
object file to be linked into an executable multiple times,
in which case all of its sites
sections will be duplicated
as well. In all cases where
multiple sites
sections
are present, their order is arbitrary. In particular, do
not assume that the dynamic data in feedback reports appears
in the same order as these static sites
sections.
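The section layout described above can be read with a few lines of Python. This is only a minimal sketch, not the tooling shipped with the sampler: the function name `parse_sites` is hypothetical, and the code assumes that each start tag and end tag sits on its own line and that the attributes appear in the order unit, then scheme. Because the same (unit signature, scheme name) pair may legitimately occur more than once, each key maps to a list of sections rather than to a single one.

```python
import re
from collections import defaultdict

# Matches a start tag such as <sites unit="0123...cdef" scheme="branches">.
# Assumption: attributes appear in the order unit, then scheme.
SITES_START = re.compile(r'<sites unit="([0-9a-f]{32})" scheme="([^"]+)">')


def parse_sites(text):
    """Group site lines by (unit signature, scheme name).

    Returns a dict mapping each key to a list of sections. Each section is a
    list of rows, and each row is the list of tab-delimited fields of one site.
    """
    sections = defaultdict(list)
    current = None
    for line in text.splitlines():
        match = SITES_START.match(line)
        if match:
            # Start a new section; duplicates of a key are kept side by side.
            current = []
            sections[match.groups()].append(current)
        elif line.startswith("</sites>"):
            current = None
        elif current is not None and line:
            current.append(line.split("\t"))
    return dict(sections)
```

Row order inside each section is preserved, which matters because it is the only link to the counters in a feedback report.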
Within a single sites
section, each line describes
one instrumentation site for the given compilation unit and
scheme. The order here is fixed and matches the order of
counters appearing in the corresponding section of the
dynamic feedback report from each run.
Details for each site are given as a sequence of tab-delimited fields. The initial fields are common to all instrumentation schemes:
source file name
source line number
name of function containing site
control flow graph number of site (unique within function)
Additional fields are specific to the instrumentation scheme that induced this site:
atoms
the lvalue which may access shared, mutable memory
one of:
read
access reads from the given location
write
access writes to the given location
bounds
the left hand side of the instrumented assignment
one of:
local
assignment is to a named local variable
global
assignment is to a named global variable
mem
assignment is to an indirectly addressed memory location
one of:
direct
assignment is to a direct base location with no offset
field
assignment is to a named field within a structure
index
assignment is to an indexed element within an array
branches
the predicate of the instrumented branch
float-kinds
the left hand side of the instrumented assignment
one of:
local
assignment is to a named local variable
global
assignment is to a named global variable
mem
assignment is to an indirectly addressed memory location
one of:
direct
assignment is to a direct base location with no offset
field
assignment is to a named field within a structure
index
assignment is to an indexed element within an array
function-entries
: no additional fields
g-object-unref
the object argument in the instrumented call
to g_object_unref
returns
the callee in the instrumented function call
scalar-pairs
the left hand side of the instrumented assignment
one of:
local
assignment is to a named local variable
global
assignment is to a named global variable
mem
assignment is to an indirectly addressed memory location
one of:
direct
assignment is to a direct base location with no offset
field
assignment is to a named field within a structure
index
assignment is to an indexed element within an array
the right hand side of the instrumented assignment
one of:
local
site is comparing the assigned value with a named local variable
global
site is comparing the assigned value with a named global variable
const
site is comparing the assigned value with a compile-time constant
local-init
site is comparing the assigned value with a named local variable whose value is definitely initialized at that site
local-uninit
site is comparing the assigned value with a named local variable whose value is possibly uninitialized at that site
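The four common leading fields can be separated from the scheme-specific ones without knowing which scheme produced the line. The sketch below is an illustration only: the `Site` type and `parse_site_line` function are hypothetical names, and the code assumes that the source line number and the control flow graph number are both written as decimal integers. Scheme-specific fields are kept as uninterpreted strings.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    file: str         # source file name
    line: int         # source line number
    function: str     # name of function containing site
    cfg_node: int     # control flow graph number, unique within function
    extra: List[str]  # scheme-specific fields, left uninterpreted


def parse_site_line(text):
    """Split one tab-delimited site line into common and extra fields."""
    fields = text.rstrip("\n").split("\t")
    # Raises ValueError if fewer than four fields are present.
    file, lineno, function, cfg_node = fields[:4]
    return Site(file, int(lineno), function, int(cfg_node), fields[4:])
```

For the function-entries scheme, which has no additional fields, `extra` is simply an empty list.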
A dynamic feedback report consists of instrumentation data and possibly other debugging information collected from a single run of an instrumented program. When using high-level program launchers in conjunction with a report collection server, a dynamic report arrives at the server each time an instrumented application exits. When using low-level environment variables, a dynamic report is written into the selected file descriptor or file name as the program exits.
At the top level, a dynamic feedback report consists of a sequence of sections marked by XML-style report tags:
<report id="subreport name"> … </report>
<report id="subreport name"> … </report>
⋮
<report id="subreport name"> … </report>
Note | |
---|---|
Unlike true XML, there is no prologue and no single root
tag. The first line is the |
The following subsections describe the subreports currently in use.
The first subreport always has id="samples".
The samples
subreport
contains the final recorded values for all instrumentation
sites. It is designed to be small and therefore easy to send
to a central collection server. For this reason, it cannot be
understood by itself. A
samples
subreport must be
decoded using the static site information
files generated when the application was built. Taken
together, the samples
subreport and the static site information files connect
observed dynamic behaviors with static source features such as
functions, files, and line numbers.
A samples subreport consists of a sequence of sections marked by XML-style samples tags:
<samples unit="unit signature" scheme="scheme name"> … </samples>
<samples unit="unit signature" scheme="scheme name"> … </samples>
⋮
<samples unit="unit signature" scheme="scheme name"> … </samples>
Each samples section gives the final instrumentation data for one instrumentation scheme in one compilation unit. As noted earlier, sites sections in the static site information file and samples sections in a samples subreport are not guaranteed to appear in the same order. However, the mandatory unit and scheme attributes have the same meaning in both cases. For any given (unit signature, scheme name) pair, the corresponding section of the samples subreport gives the measured values for a run and the corresponding section of the static site information file relates that information to the application source code.
Within one samples
section, each instrumentation site reports its measurements on
one line, with multiple values delimited by tabs. See Instrumentation schemes for a description of each scheme's
recorded data values. The order of lines
within a samples
section is fixed and
corresponds, line by line, with the corresponding sites
section of some static site
information file. Usually the static site information is
drawn from the main instrumented application, but it may also
come from shared libraries or dynamically loaded plugins, each
of which has its own static lists of instrumentation
sites.
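The line-by-line correspondence can be sketched as follows. This is a simplified illustration, not a replacement for the provided tools: `resolve_section` and the shape of `sites_by_key` (a dict from (unit signature, scheme name) to a list of site rows) are assumptions made for the example. Dynamic values are kept as strings so that no assumption is made about how a particular scheme encodes them.

```python
def resolve_section(sites_by_key, unit, scheme, sample_lines):
    """Pair static site rows with the dynamic values reported on the same line.

    sites_by_key: dict mapping (unit, scheme) to a list of site rows,
                  where each row is a list of static fields (hypothetical shape).
    sample_lines: the lines found inside one <samples> section.
    """
    site_rows = sites_by_key[(unit, scheme)]
    if len(site_rows) != len(sample_lines):
        # The orders are fixed and must match, so a length mismatch means the
        # static information does not belong to this build.
        raise ValueError("sites and samples sections differ in length")
    resolved = []
    for site, line in zip(site_rows, sample_lines):
        resolved.append((site, line.split("\t")))
    return resolved
```

Checking the lengths first is a cheap guard against decoding a report with static information from a different build.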
Dynamically loaded plugins are a special case, in that
they may appear multiple times in a single
samples
subreport. If a
plugin is loaded and unloaded multiple times while the
application is running, each unload reports on all of that
plugin's sites just before unloading. Each reload of the
plugin resets all of the plugin's instrumentation site data to
its initial values (e.g. 0 for counters), with no memory of
the earlier load. When examining feedback reports from
applications with instrumented plugins, it is up to you to
merge these repeated samples
sections appropriately. For
counter-based schemes, the right thing to do is simply sum
corresponding counters from multiple sections. For the
bounds
scheme, which is not counter-based,
take the minimum of all corresponding minima and the maximum
of all corresponding maxima.
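The two merging rules can be sketched in Python as shown below. The function names are hypothetical, each section is assumed to have already been parsed into a list of rows of integers, and for the bounds scheme each row is assumed to hold the minimum first and the maximum second. That column order is an assumption of this example and should be checked against the scheme's description.

```python
def merge_counters(sections):
    """Sum corresponding counters across repeated samples sections.

    sections: list of sections; each section is a list of rows of ints.
    """
    return [
        [sum(values) for values in zip(*rows)]
        for rows in zip(*sections)
    ]


def merge_bounds(sections):
    """Merge bounds rows, assumed here to be laid out as [minimum, maximum]."""
    return [
        [min(row[0] for row in rows), max(row[1] for row in rows)]
        for rows in zip(*sections)
    ]
```

Both functions rely on every repeated section for a plugin listing its sites in the same fixed order, so corresponding rows can be matched purely by position.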
Aligning and merging multiple
samples
and sites
sections can be tedious. The
/usr/local/lib/sampler/tools/resolveSamples
tool provides simple merging to support basic data analysis.
It is run as follows:
resolveSamples {executable | shared-library | object | standalone site information file ...}
Standard input to resolveSamples
should be a samples
subreport, starting with the first <samples>
start tag and ending after
the last </samples>
end tag.
Arguments on the command line may be any mixture of extracted
site information files or instrumented binary files with
static site information still embedded within them. Output is
a sequence of lines containing only tab-delimited columnar
data, with no XML-like tags. Each instrumentation site
appears on a single line with the following initial
fields:
name of the file, as given on the resolveSamples command line, in which this site was found
signature of the compilation unit in which this site was found
name of the instrumentation scheme that induced this site
These initial fields are followed by:
all static information fields for this site
all dynamic values reported for this site
The flat, uniform structure of a fully resolved samples report can be convenient for basic data analysis on small numbers of runs. However, the size and redundancy of the static information fields make this approach undesirable when processing hundreds or thousands of feedback reports.
Add documentation about timestamps
sections, which will also
appear in the samples
subreport when site time stamping is enabled at
instrumentation time. Also document the
/usr/local/lib/sampler/tools/resolveTimestamps
tool.
In the event of a crash, the dynamic feedback report
contains an additional subreport describing the execution
stack of the running thread at the point of failure. Output
is as generated by the backtrace_symbols
function from the GNU C library. For example:
ccrypt[0x804ff1c]
/lib/tls/libc.so.6[0xaa78c8]
ccrypt[0x804c3df]
ccrypt[0x804c80e]
ccrypt[0x804a34b]
/lib/tls/libc.so.6(__libc_start_main+0xd3)[0xa94e23]
ccrypt[0x8049171]
Using debug information recorded when the application was built, these raw addresses can be further resolved to function names and line numbers, as one would expect to see in a debugger:
return | module | function | file | line |
---|---|---|---|---|
0x804ff1c | ccrypt | handleSignal | report.c | 87 |
0xaa78c8 | /lib/tls/libc.so.6 | ?? | ?? | 0 |
0x804c3df | ccrypt | traverse_file | traverse.c | 451 |
0x804c80e | ccrypt | traverse_files | traverse.c | 485 |
0x804a34b | ccrypt | main | main.c | 516 |
0xa94e23 | /lib/tls/libc.so.6 | ?? | ?? | 0 |
0x8049171 | ccrypt | _start | ?? | 0 |
Note that this only reveals code locations. Values of local variables or other program data are not reported.
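Before raw frames can be resolved against debug information, each line has to be split into a module and a return address. The following is a minimal sketch of that first step only. The pattern is inferred from the example frames shown above rather than from any formal grammar, and the name `parse_frame` is hypothetical.

```python
import re

# module, then an optional (symbol+offset) part, then [address].
FRAME = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^)]*)\))?"
    r"\s*\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Split one backtrace line into (module, symbol or None, address)."""
    match = FRAME.match(line.strip())
    if match is None:
        raise ValueError("unrecognized backtrace line: %r" % line)
    module = match.group("module").strip()
    symbol = match.group("symbol") or None
    return module, symbol, int(match.group("address"), 16)
```

Mapping the resulting address to a function, file, and line still requires the debug information recorded at build time; this sketch does not attempt that.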
It is often useful to extract a single subreport from a dynamic feedback report. For example, the resolveSamples tool expects to see just a samples subreport, not an entire feedback report. The /usr/local/lib/sampler/tools/extract-report tool may be used for this purpose. It is run as follows:
extract-report {report id}
Standard input to extract-report should be a raw dynamic feedback report. It prints the report with the requested ID on standard output and discards the rest. For example, one might use this in conjunction with resolveSamples as follows:
extract-report samples < raw-report.log | resolveSamples myapp myplugin.so libmylib.so
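When the extraction has to happen inside a Python analysis script instead of a shell pipeline, the same idea can be approximated in a few lines. This is a rough sketch under stated assumptions, not a description of how extract-report itself is implemented: the function name `extract_report` is hypothetical, the report start and end tags are assumed to sit on their own lines, and only the body between the tags is returned.

```python
def extract_report(raw, report_id):
    """Return the body of the first <report id="..."> section with this id.

    raw: the full text of a dynamic feedback report.
    Returns an empty string if no such subreport is present.
    """
    start = '<report id="%s">' % report_id
    body = []
    inside = False
    for line in raw.splitlines():
        if not inside and line.startswith(start):
            inside = True
        elif inside and line.startswith("</report>"):
            break
        elif inside:
            body.append(line)
    return "\n".join(body)
```

Returning only the body means the result starts at the first <samples> start tag and ends at the last </samples> end tag, which is the form resolveSamples expects on standard input.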