Overview
We will now run a transient fault test to get you started with PERSim. In this test, we model particle strikes using the charge accumulation model and analyze whether a strike on one of 72 gates causes a timing fault. We then identify the cycles in which such a fault causes an architectural error and catch them. Let's use Parser from the SPEC2000 benchmark suite as the application.
We will use the project files in PERSim/XilinxProject.
Input Sequence Extraction
In this step, we run the benchmark on the FPGA and extract the inputs to selected paths during 5 million cycles of hot code.
Inputs
- Launch the SDK.
- Uncomment the #defines below //PARSER. The first line includes the parser binary. The next lines specify where the start and end PCs are. Make sure only one of the benchmarks is selected (the rest are commented out).
- The #define START_OFFSET specifies the number of cycles after the main() function after which we will start recording the input sequences.
- Make sure the code is configured in the input sequence extraction mode:
#define INP_LOG 1
//#define FAULT_COMPARE 1
The test
- Make sure the ZedBoard is connected to the computer and powered on.
- Download the bitstream file to configure the FPGA. Xilinx Tools -> Program FPGA.
- Open a serial terminal with
sudo minicom -D /dev/ttyACM0
(assuming ACM0 is your UART port). Save the output to a file with CTRL-A, then L.
- In the SDK, right click on hello world in the "Project Explorer", Run -> Launch in Hardware.
Outputs
The minicom output should record input sequences for 10 paths for 2 million cycles. This should take a few hours.
In this file, search for SECTION_START. This part of the file records the input sequences. Next we will extract these values and use them to drive the delay aware simulation.
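As a rough illustration of locating that part of the capture, the sketch below returns every line that follows the first SECTION_START marker. The function name and the assumption that the section simply runs to the end of the file are hypothetical. The actual conversion for VCS is done later by extractPathInputs.py.

```python
def lines_after_marker(log_lines, marker="SECTION_START"):
    """Return the lines that follow the first line containing `marker`."""
    for index, line in enumerate(log_lines):
        if marker in line:
            # Everything after the marker line is treated as the section
            # (an assumption made for this sketch).
            return [entry.rstrip("\n") for entry in log_lines[index + 1:]]
    return []  # marker not found


# Made-up capture contents, only to show the call.
sample = ["boot message\n", "SECTION_START\n", "0x1a2b\n", "0x3c4d\n"]
print(lines_after_marker(sample))
```

With a real capture, the list of lines would come from reading the file saved by minicom.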
Fault Modeling
Inputs
We will use random variables to model the effects of a particle strike. The distributions of strike intensity and strike time are our inputs.
The test
Navigate to DelayAwareSim/path_7/. This path is near critical and will help us demonstrate the effect of transient faults.
A Python script produces two arrays: one marking when within a given cycle a particle strikes, and a second giving the delay increase it causes. Let's simulate 10000 particle strikes, as only a few of these will cause a timing fault.
python ../common/getTransientVariables.py 200 10000 > particles.txt
Here 200 is the clock period in ps and 10000 is the number of particles we will generate.
Outputs
The fault model creates two output variables. The generated file particles.txt has two arrays.
- When within a given cycle the particle strikes. [strikeTime]
- How long it affects the gate behavior. [delayChange]
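To make the two arrays concrete, here is a hypothetical sketch of how such variables could be drawn. The uniform strike time, the exponential delay change, the 20 ps mean, and all function and parameter names are assumptions for illustration only. getTransientVariables.py may use different distributions and a different output format.

```python
import random


def sample_strikes(clock_period_ps, num_particles, mean_delay_ps=20.0, seed=0):
    """Draw one strike time and one delay change per particle (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Assumed: a strike is equally likely at any point within the cycle.
    strike_time = [rng.uniform(0.0, clock_period_ps) for _ in range(num_particles)]
    # Assumed: most strikes add little delay and a few add a lot.
    delay_change = [rng.expovariate(1.0 / mean_delay_ps) for _ in range(num_particles)]
    return strike_time, delay_change


strike_time, delay_change = sample_strikes(200, 10000)
print(len(strike_time), len(delay_change))
```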
Delay Aware Simulation
We will do the delay aware simulation in three steps. First, we run the simulation without any fault injection. This records, for each cycle, whether the output toggles and, if it does, when the toggle happens. If there is sufficient slack, we conclude that the fault is delay masked. For faults that might not be delay masked, we run the simulation again with the delay fault inserted to see whether it is logically masked. If a fault is seen, we mark that cycle.
Inputs
- The states that will drive our circuit will be from the input sequence extraction step
- The delays due to particle strikes come from the fault modeling
The test
We will do the test in 3 steps.
- Delay aware simulation with no error injection.
rm -rf path_inputs.txt simv* csrc DVE*
cleans the folder.
python ../common/extractPathInputs.py [output from input sequence extraction] 7 > path_inputs.txt
creates the inputs in a form VCS can read.
../common/allvcs
compiles for VCS.
./simv > parser.simv.log
runs the delay aware simulation.
python ../common/getTransissions.py parser.simv.log > transissions.log
records the output transitions. transissions.log has the format Cycle : 1357494 Timing : 135.
- We will only concentrate on cycles in which the output transitions. When it does not, the signal has the full clock period to stabilize.
Using the timing information and the particle strike model, we will then determine which cycles may have a delay fault.
python ../common/getTransientDelayFaults.py 200 transissions.log particles.txt 100 > faultinject.txt
marks the cycles when a delay fault could possibly occur.
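The slack check behind this step can be sketched as follows, assuming a fault is possible whenever the recorded toggle time plus the added delay exceeds the clock period. This is a simplification: it ignores strikeTime and the last command-line argument, and the function names are hypothetical rather than taken from getTransientDelayFaults.py.

```python
import re

# Matches log lines of the form "Cycle : 1357494 Timing : 135".
LINE_RE = re.compile(r"Cycle\s*:\s*(\d+)\s+Timing\s*:\s*(\d+)")


def parse_transitions(lines):
    """Map cycle number -> output toggle time in ps."""
    transitions = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            transitions[int(match.group(1))] = int(match.group(2))
    return transitions


def not_delay_masked(toggle_time_ps, delay_change_ps, clock_period_ps):
    """True when the delayed toggle would miss the clock edge (assumed rule)."""
    return toggle_time_ps + delay_change_ps > clock_period_ps


transitions = parse_transitions(["Cycle : 1357494 Timing : 135"])
# 135 ps toggle + 80 ps extra delay = 215 ps, which exceeds a 200 ps period.
print(not_delay_masked(transitions[1357494], 80, 200))
```

A toggle at 135 ps leaves 65 ps of slack in a 200 ps cycle, so under this rule only strikes that add more than 65 ps would need the second simulation.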
- Next, we check if the possible delay faults are logically masked.
../common/transient_allvcs1
compiles for VCS with faultinject.txt.
./simv > parser.simvfault.log
runs the simulation with faults injected in gates when delay faults are possible.
python ../common/getTransissions.py parser.simvfault.log > transissionsfault.log
records transitions with fault injection. In any cycle, if the output differs between transissionsfault.log and transissions.log, we have found a fault that is neither logically nor delay masked.
python ../common/genTransientFaultVector.py transissions.log transissionsfault.log faultinject.txt 10000 > faultvector.txt
writes the fault vector to faultvector.txt.
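The comparison of the two transition logs can be pictured with the sketch below. It assumes each log has already been reduced to a mapping from cycle number to toggle time, and it treats any mismatch, including a cycle that appears in only one log, as a difference. Both the function name and that rule are assumptions; genTransientFaultVector.py may compare the logs differently.

```python
def differing_cycles(clean, faulty):
    """Return the sorted cycles whose recorded toggle differs between runs."""
    all_cycles = set(clean) | set(faulty)
    # dict.get returns None for a missing cycle, so a toggle that exists in
    # only one run also counts as a difference.
    return sorted(c for c in all_cycles if clean.get(c) != faulty.get(c))


# Made-up data: cycle -> toggle time in ps.
clean = {100: 135, 101: 90}
faulty = {100: 135, 101: 120, 102: 60}
print(differing_cycles(clean, faulty))  # [101, 102]
```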
Outputs
- The cycles where faults manifested that will feed in as the fault vector.
- Faults that are logically masked at the microarchitecture level
- Faults that are delay masked at the microarchitecture level
Fault Injection and Re-execution
In fault injection and deterministic re-execution, we first run once without faults injected and save the architectural state. Next, we run with the fault injected and compare the architectural state offline.
Inputs
The exact cycles where faults would manifest in the microarchitecture.
Include the fault cycles in the firmware file as
int faultInjectLocArray[] = { cyclenumber0, cyclenumber1... };
and
int faultInjectGateArray[] = { gatenumber0, gatenumber1... };
These are found in voil.h.
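If the fault cycles and gates are available as a list of pairs, the two array declarations could be generated rather than typed by hand. The helper below is a hypothetical convenience and not part of PERSim. The (cycle, gate) values are made up, and it does not read faultvector.txt, whose format is not described here.

```python
def format_c_array(name, values):
    """Render a C int array declaration such as 'int x[] = { 1, 2 };'."""
    body = ", ".join(str(value) for value in values)
    return "int {}[] = {{ {} }};".format(name, body)


# Made-up (cycle, gate) pairs, only to show the output shape.
faults = [(1357494, 12), (2000100, 45)]
print(format_c_array("faultInjectLocArray", [cycle for cycle, _ in faults]))
print(format_c_array("faultInjectGateArray", [gate for _, gate in faults]))
```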
The test
Change the mode to fault compare.
//#define INP_LOG 1
#define FAULT_COMPARE 1
Program the FPGA and run the firmware on the ZedBoard. Use minicom to save the output.
Outputs
At the end of each test, the hypervisor automatically compares the saved architectural state with the version with faults injected and reports if the fault "CAUSED ARCH ERROR".
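To summarize a saved minicom capture, one could count the reports that contain the "CAUSED ARCH ERROR" string. This is only a sketch: the function name is hypothetical, the sample lines are invented, and the exact wording around the marker in the real output is not shown here.

```python
def count_arch_errors(log_lines, marker="CAUSED ARCH ERROR"):
    """Count the log lines that contain the architectural error marker."""
    return sum(1 for line in log_lines if marker in line)


# Invented log lines, only to show the call.
sample_log = [
    "test 0 finished",
    "fault 1 CAUSED ARCH ERROR",
    "test 2 finished",
]
print(count_arch_errors(sample_log))  # 1
```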