We will now run a transient fault test to get you started with PERSim. In this test, we model particle strikes using the charge accumulation model and analyze whether a strike on one of 72 gates causes a timing fault. We then identify the cycles in which such a fault causes architectural errors and catch them. Let's use Parser from the SPEC2000 benchmark suite as the application.

We will use the project files in PERSim/XilinxProject.

In this step, we will be running the benchmark on the FPGA and extracting the inputs to select paths during 5 million cycles of hot code.

Input Sequence Extraction

Inputs

  • Launch the SDK.
  • Uncomment the #defines below //PARSER. The first line includes the Parser binary. The next lines specify the start and end PCs. Make sure only one benchmark is selected (the rest must be commented out).
  • The #define START_OFFSET specifies the number of cycles after main() starts at which we begin recording the input sequences.
  • Make sure the code is configured in input sequence extraction mode:
    #define INP_LOG 1
    //#define FAULT_COMPARE 1
This sets up the firmware code.

The test

Make sure the ZedBoard is connected to the computer and powered on.

  • Download the bitstream file to configure the FPGA. Xilinx Tools -> Program FPGA.
  • Open a serial terminal with sudo minicom -D /dev/ttyACM0 (assuming ACM0 is your UART port). Save the output to a file with Ctrl-A followed by L.
  • In the SDK, right-click hello world in the "Project Explorer" and choose Run -> Launch in Hardware.

Outputs

The minicom output should record input sequences for 10 paths for 2 million cycles. This should take a few hours.
In this file, search for SECTION_START. The part of the file that follows records the input sequences. Next, we will extract these values and use them to drive the delay-aware simulation.
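To inspect the recorded sequences by hand before running the later scripts, the lookup can be sketched as follows. This is only an illustration: the helper name extract_section is hypothetical, and it assumes that everything after the first SECTION_START line belongs to the input-sequence record. The real log may use an end marker that this sketch ignores.

```python
def extract_section(log_lines, marker="SECTION_START"):
    """Return the lines that follow the first line containing `marker`.

    Assumption: the input-sequence record runs from the marker to the end
    of the saved minicom log. An empty list means the marker was not found.
    """
    for index, line in enumerate(log_lines):
        if marker in line:
            # Strip line endings so the result is easy to compare or print.
            return [entry.rstrip("\r\n") for entry in log_lines[index + 1:]]
    return []


def extract_section_from_file(path, marker="SECTION_START"):
    """Convenience wrapper that reads a saved minicom capture from disk."""
    with open(path, "r", errors="replace") as handle:
        return extract_section(handle.readlines(), marker)
```

Calling extract_section_from_file on the saved minicom capture returns the recorded lines, which can then be checked for truncation before the multi-hour run is trusted.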

We will use the charge accumulation model to capture the effect of a particle strike. An alpha particle leaves a charge on the transistor it strikes, which causes a temporary glitch. Both the time at which the particle strikes and how long the behavior is affected are random variables.

Inputs

We will use random variables to model the particle strike's effects. The distributions of strike intensity and strike time are our inputs.

The test

Navigate to DelayAwareSim/path_7/. This path is near critical and will help us demonstrate the effect of transient faults.

A Python script produces two arrays: one marking when a particle strikes within a given cycle, and a second giving the delay increase it causes. Let's simulate 10000 particle strikes, as only a few of these will cause a timing fault.
    python ../common/getTransientVariables.py 200 10000 > particles.txt
Here 200 is the clock period in ps and 10000 is the number of particles we will generate.

Outputs

The fault model creates two output variables. The generated file particles.txt has two arrays.
  • When, within a given cycle, the particle strikes. [strikeTime]
  • How long it affects the gate behavior. [delayChange]
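The shape of these two arrays can be illustrated with a small sketch. The distributions here are assumptions chosen for illustration only: strike times are drawn uniformly within the clock period and delay changes from an exponential distribution with a hypothetical mean. The actual distributions used by getTransientVariables.py may differ.

```python
import random


def make_particles(clock_period_ps, count, mean_delay_ps=20.0, seed=None):
    """Generate illustrative strikeTime and delayChange arrays.

    Assumptions (not taken from PERSim): strike times are uniform over the
    clock period, and delay changes are exponential with mean `mean_delay_ps`.
    """
    rng = random.Random(seed)
    # When, within the cycle, each particle strikes (ps from cycle start).
    strike_time = [rng.uniform(0.0, clock_period_ps) for _ in range(count)]
    # How much extra delay each strike adds to the struck gate (ps).
    delay_change = [rng.expovariate(1.0 / mean_delay_ps) for _ in range(count)]
    return strike_time, delay_change


if __name__ == "__main__":
    strikes, delays = make_particles(200, 5, seed=1)
    print("strikeTime =", strikes)
    print("delayChange =", delays)
```

Fixing the seed makes a run repeatable, which helps when comparing fault counts across experiments.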

Delay Aware Simulation

We will do the delay-aware simulation in three steps. First, we run the simulation without any fault injection. This records, for each cycle, whether the output toggles and, if it does, when the toggle happens. If there is sufficient slack, we conclude that the fault is delay masked. For faults that might not be delay masked, we run the simulation again with the delay fault inserted to see whether it is logically masked. If a fault is still seen, we mark that cycle.
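The slack test behind delay masking can be sketched as a single comparison. This is a simplified illustration, not the PERSim implementation: it assumes the extra delay simply adds to the fault-free toggle time, and it ignores the strike time within the cycle. The function name is hypothetical.

```python
def is_delay_masked(toggle_time_ps, delay_change_ps, clock_period_ps):
    """Return True if the output still settles within the clock period.

    Simplifying assumption: the particle-induced delay adds directly to the
    fault-free toggle time. A fault is delay masked when the remaining slack
    (clock period minus toggle time) absorbs the added delay.
    """
    return toggle_time_ps + delay_change_ps <= clock_period_ps


if __name__ == "__main__":
    # Output toggles at 135 ps in a 200 ps cycle, so the slack is 65 ps.
    print(is_delay_masked(135, 40, 200))  # True: 175 ps fits in the period
    print(is_delay_masked(135, 80, 200))  # False: 215 ps exceeds the period
```

Only cycles for which this check fails need the second, fault-injected simulation.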

Inputs

  • The states that will drive our circuit will be from the input sequence extraction step
  • The delays due to particle strikes come from the fault modeling

The test

We will do the test in 3 steps.
  • Delay-aware simulation with no error injection.
    rm -rf path_inputs.txt simv* csrc DVE* cleans the folder.
    python ../common/extractPathInputs.py [output from input sequence extraction] 7 > path_inputs.txt creates the inputs in a form VCS can read.
    ../common/allvcs compiles for VCS.
    ./simv > parser.simv.log runs the delay-aware simulation.
    python ../common/getTransissions.py parser.simv.log > transissions.log extracts the output transitions. transissions.log has the format Cycle : 1357494 Timing : 135.
  • We concentrate only on cycles in which the output transitions; when it does not, the signal has the full clock period to stabilize.
    Using the timing information and the particle strike model, we then determine which cycles may have a delay fault.
    python ../common/getTransientDelayFaults.py 200 transissions.log particles.txt 100 > faultinject.txt marks the cycles when a delay fault could possibly occur.
  • Next, we check whether the possible delay faults are logically masked.
    ../common/transient_allvcs1 compiles for VCS with faultinject.txt.
    ./simv > parser.simvfault.log runs the simulation with faults injected in gates when delay faults are possible.
    python ../common/getTransissions.py parser.simvfault.log > transissionsfault.log records transitions with fault injection.
    In any cycle, if the output differs between transissionsfault.log and transissions.log, we have found a fault that is neither logically nor delay masked.
    python ../common/genTransientFaultVector.py transissions.log transissionsfault.log faultinject.txt 10000 > faultvector.txt collects these cycles into the fault vector.
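The comparison between the two transition logs can be sketched as follows. This is an illustration under stated assumptions: each log line follows the format Cycle : 1357494 Timing : 135 with integer values, and a "difference" means either a changed toggle time or a toggle present in only one run. The function names are hypothetical, and genTransientFaultVector.py may apply a different criterion.

```python
import re

# Matches lines such as "Cycle : 1357494 Timing : 135".
LINE_RE = re.compile(r"Cycle\s*:\s*(\d+)\s+Timing\s*:\s*(\d+)")


def parse_transitions(lines):
    """Map cycle number -> toggle time for every line that matches."""
    result = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            result[int(match.group(1))] = int(match.group(2))
    return result


def differing_cycles(clean, faulty):
    """Return sorted cycles whose toggle differs between the two runs.

    A cycle counts as differing if its toggle time changed, or if the
    output toggled in only one of the two runs.
    """
    cycles = set(clean) | set(faulty)
    return sorted(c for c in cycles if clean.get(c) != faulty.get(c))


if __name__ == "__main__":
    with open("transissions.log") as clean_file:
        clean_run = parse_transitions(clean_file)
    with open("transissionsfault.log") as faulty_file:
        faulty_run = parse_transitions(faulty_file)
    for cycle in differing_cycles(clean_run, faulty_run):
        print(cycle)
```

Keying both runs by cycle number keeps the comparison independent of line order in the logs.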

Outputs

  • The cycles in which faults manifested, which feed into the next step as the fault vector.
  • Faults that are logically masked at the microarchitecture level
  • Faults that are delay masked at the microarchitecture level

Fault Injection and Deterministic re-execution

In fault injection and deterministic re-execution, we first run once without faults injected and save the architectural state. Next, we run with the fault injected and compare the architectural state offline.

Inputs

The exact cycles where faults would manifest in the microarchitecture.
Include the fault cycles in the firmware file as int faultInjectLocArray[] = { cyclenumber0, cyclenumber1... }; and int faultInjectGateArray[] = { gatenumber0, gatenumber1... }; These are found in voil.h.
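Writing the two arrays by hand is error-prone when there are many fault cycles, so the declarations can be generated. The sketch below assumes the faults are already available as (cycle, gate) pairs. How to read them from faultvector.txt is not shown, since that file's layout is not described here, and the helper names are hypothetical.

```python
def format_c_array(name, values):
    """Render a C int array declaration such as: int name[] = { 1, 2 };"""
    body = ", ".join(str(value) for value in values)
    return "int {}[] = {{ {} }};".format(name, body)


def fault_arrays(faults):
    """Build both firmware declarations from (cycle, gate) pairs.

    The two arrays are kept index-aligned: entry i of faultInjectGateArray
    is the gate struck in the cycle given by entry i of faultInjectLocArray.
    """
    cycles = [cycle for cycle, _ in faults]
    gates = [gate for _, gate in faults]
    return (
        format_c_array("faultInjectLocArray", cycles),
        format_c_array("faultInjectGateArray", gates),
    )


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    for declaration in fault_arrays([(1357500, 12), (1357510, 3)]):
        print(declaration)
```

The printed lines can be pasted in place of the existing declarations before rebuilding the firmware.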

The test

Change the mode to fault compare.
//#define INP_LOG 1
#define FAULT_COMPARE 1

Program the FPGA and run the firmware on the ZedBoard. Use minicom to save the output.

Outputs

At the end of each test, the hypervisor automatically compares the saved architectural state with the version with faults injected and reports if the fault "CAUSED ARCH ERROR".