====== Internal Tutorial for Vertical Group ======

The DySER open-source project has changed substantially. The changes include a reorganized file structure and scripts, a new compiler (DyCC), and several new tools.

===== Building the Framework =====

The OpenSPLySER evaluation framework consists of two pieces:
  * SW toolchain files
    * The current version we use for micro2013 is in /p/vertical/projects/dyser/install/dyser-tools-full 
  * HW toolchain files
    * Check out the HW framework from git. The checkout occupies around 2.0 GB. (VCS waveform files created during simulation may exceed 100 GB.) <code> git clone /p/vertical/projects/dyser/svn/splyser.git <checkout_dir></code>

The following instructions describe how to build the toolchain.
  * Building SW toolchain:
    * [TODO] No updated build instructions are available yet.
  * Building HW toolchain:
    * You will find four directories in your <checkout_dir>: dyser-1.0, dyser-2.0, hardDySER, opensparc. 
      * dyser-1.0: The DySER RTL developed for the OpenSPLySER paper.
      * dyser-2.0: The DySER RTL currently under development; it does not work with opensparc yet. 
      * hardDySER: A toolchain that creates a non-configurable DySER from a DySER schedule file.
      * opensparc: The OpenSPARC toolchain that includes the dyser integration RTL. 
    * To set up the toolchain, navigate to: <code>cd <checkout_dir>/opensparc</code> and look for <code>OpenSPARCT1.bash</code>
    * <checkout_dir>/opensparc/OpenSPARCT1.bash: this is the file you need to modify for your environment
      * Modify lines 6 (HW) and 7 (SW) to point to your install directories: <code># ***Modification required for new install***
# Top of opensparc portion
export HW_ROOT="<path to your checkout dir>"
export SW_ROOT="<path to your SW toolchain install dir>"
</code> ''HW_ROOT'' should be your <checkout_dir>, and ''SW_ROOT'' is currently /p/vertical/projects/dyser/install/dyser-tools-full. 
      * Modify line 23 to set the scratch space for VCS object files. Make sure you have access to the scratch space you assign. <code># ***Modification required for new install***
#Regression run-time scratchspace
export DRMJOBSCRATCHSPACE=/scratch/vcsjobscratch
</code>
      * This file is sourced by ''vcs-config.sh'' when running a VCS simulation.
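
If you prefer not to edit ''OpenSPARCT1.bash'' by hand, the three ''export'' lines above can be rewritten with a short script. The following is a minimal sketch, not part of the toolchain: the helper name ''set_exports'' and the example paths under ''/home/me'' and ''/scratch/me'' are hypothetical, and it assumes each variable is set on a single ''export NAME=...'' line as shown above.

```python
import re

def set_exports(script_text, values):
    """Return script_text with each `export NAME=...` line rewritten from values."""
    out = []
    for line in script_text.splitlines():
        m = re.match(r'^(\s*export\s+)([A-Za-z_][A-Za-z0-9_]*)=', line)
        if m and m.group(2) in values:
            # Replace the whole assignment, quoting the new value.
            line = f'{m.group(1)}{m.group(2)}="{values[m.group(2)]}"'
        out.append(line)
    return "\n".join(out) + "\n"

# Sample text mirroring the lines quoted above (not the full file).
sample = (
    "# Top of opensparc portion\n"
    'export HW_ROOT="<path to your checkout dir>"\n'
    'export SW_ROOT="<path to your SW toolchain install dir>"\n'
    "export DRMJOBSCRATCHSPACE=/scratch/vcsjobscratch\n"
)
patched = set_exports(sample, {
    "HW_ROOT": "/home/me/splyser",  # hypothetical checkout dir
    "SW_ROOT": "/p/vertical/projects/dyser/install/dyser-tools-full",
    "DRMJOBSCRATCHSPACE": "/scratch/me/vcsjobscratch",  # hypothetical scratch dir
})
print(patched)
```

To apply it to the real file, read ''OpenSPARCT1.bash'' into a string, pass it through ''set_exports'', and write the result back after reviewing the diff.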

==== Structure ====

In the DySER open-source project, the benchmarks are maintained in the HW toolchain. The main working directories are:
  * ''opensparc/verif/diag/c/<benchmark_sets>'' - This directory holds all the benchmarks:
    * ''hardDYSER'' - includes ''scalar/'' (scalar code) and ''splyser/'' (hand DySERized code)
    * ''micro13'' - includes the ''vec/'' directory, which holds the annotated C code for DyCC to compile (compiler DySERized code)
  * ''opensparc/regr_runs/<benchmark_sets>'' - This is the directory that has all the VCS simulation scripts.
    * ''hardDYSER'' - VCS simulation scripts with hand DySERized code
    * ''micro13'' - VCS simulation scripts with compiler DySERized code

We provide a GEM5 simulation script and have embedded it into the makefiles under the ''diag'' directory, so that benchmarks can be compiled and simulated with GEM5 in place. The VCS framework, on the other hand, searches the ''diag'' directory, compiles the benchmarks and the Verilog RTL, and invokes VCS for RTL-level simulation.

===== Simulation =====

The DySER evaluation framework can evaluate the design on 3 different platforms:
  * [[dyser-internal-tutorial#GEM5 simulation|GEM5]]
  * [[dyser-internal-tutorial#VCS simulation|Synopsys VCS]]
  * [[dyser-internal-tutorial#FPGA|Xilinx FPGA]]

==== GEM5 Simulation ====

The GEM5+DySER simulator is part of the SW toolchain. Installing the SW toolchain creates a wrapper script, ''run-gem5''. This wrapper script is then used in ''config.mk'' under ''opensparc/verif/diag/c/<benchmark_sets>''. Here is an example that simulates fft in ''hardDYSER'':

Navigate to: 
<code>opensparc/verif/diag/c/hardDYSER/fft/splyser</code>
Compile and run simulation:
<code>make run_perf GEM5=1 1W=1</code>
A GEM5 simulation should run, and its output will be in the ''m5out'' directory. ''m5out/stats.txt'' is the simulation report, in which ''system.switch_cpus.numCycles'' is the total number of cycles in the region of interest. 
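
Pulling the cycle count out of ''m5out/stats.txt'' can be scripted. The sketch below assumes the usual gem5 statistics layout of one ''name value # description'' entry per line; the helper name ''read_stat'' is hypothetical and the numbers in the sample text are made up for illustration.

```python
def read_stat(stats_text, name):
    """Return the first value recorded for `name` in gem5 stats text, or None."""
    for line in stats_text.splitlines():
        # Drop the trailing "# description" comment, then split name and value.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == name:
            return float(fields[1])
    return None

# Illustrative sample only; real files contain many more entries.
sample = """
---------- Begin Simulation Statistics ----------
sim_seconds                                  0.000123   # Number of seconds simulated
system.switch_cpus.numCycles                   245678   # number of cpu cycles simulated
---------- End Simulation Statistics   ----------
"""
cycles = read_stat(sample, "system.switch_cpus.numCycles")
print(int(cycles))
```

For a real run, pass the contents of ''m5out/stats.txt'' instead of ''sample''. If the file contains more than one statistics dump, this sketch returns the value from the first one only.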

In the hardDySER benchmark set, the Makefile command-line options are ''GEM5'', ''FPGA'', ''1W'' and ''8W''. ''GEM5'' adds the ''-DFF'' flag, which translates the marker inserted in the benchmark (it marks the region of interest) for GEM5 simulation. ''FPGA'' translates the marker for FPGA simulation. ''1W'' and ''8W'' compile the 1W and 8W (vectorized) versions of the DySERized code.

In the micro13 benchmark set, the Makefile command-line options are ''GEM5'', ''FPGA'', ''AUTO_DYSER'' and ''AUTO_VEC''. ''GEM5'' and ''FPGA'' behave as described above. ''AUTO_DYSER'' tells the compiler to DySERize the annotated region, and ''AUTO_VEC'' tells the compiler to DySERize and vectorize the annotated region. 
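
When scripting many runs, it helps to reject an option that does not belong to the chosen benchmark set before calling ''make''. The following is a minimal sketch: the helper ''make_command'' is hypothetical, and it assumes that micro13 uses the same ''run_perf'' target and the same ''NAME=1'' form as the hardDYSER example above, which this page does not confirm.

```python
# Options listed in this tutorial for each benchmark set.
OPTIONS = {
    "hardDYSER": {"GEM5", "FPGA", "1W", "8W"},
    "micro13": {"GEM5", "FPGA", "AUTO_DYSER", "AUTO_VEC"},
}

def make_command(benchmark_set, enabled, target="run_perf"):
    """Build the make argument list, rejecting options the set does not list."""
    unknown = [opt for opt in enabled if opt not in OPTIONS[benchmark_set]]
    if unknown:
        raise ValueError(f"{benchmark_set} does not support: {', '.join(unknown)}")
    return ["make", target] + [f"{opt}=1" for opt in enabled]

print(" ".join(make_command("hardDYSER", ["GEM5", "1W"])))
```

The returned list can be passed to ''subprocess.run'' from inside the benchmark directory.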


==== VCS Simulation ====

The VCS simulation framework is based on the OpenSPARC ''sims'' script. See the OpenSPARC manual in ''opensparc/doc/OpenSPARCT1_DVGuide.pdf'' for more information about ''sims''. In brief, the ''sims'' script compiles the Verilog files, compiles and links the benchmark code, creates the image, and runs the VCS simulation. 

First, navigate to:
<code>
/p/vertical/dyser-huge/dyser-micro13/opensparc/regr_runs/hardDYSER/fft.splyser.8w</code>

You will see two files in this directory:
  * ''run.sh'': This script sources ''OpenSPARCT1.bash'' to set up the environment and executes the ''sims'' script. It contains variables that control VCS output, such as the waveform dump, switching activity for power estimation, and DySER trace files.
  * ''fft.splyser.8w.s'': This file contains the compilation flags for the compiler. The ''sims'' script calls another tool in the OpenSPARC toolchain, ''MIDAS'', to read this file and compile the benchmark.

Now, execute:
<code>bash run.sh</code>

Several files will be created after simulation:
  * ''build/'': This directory contains all intermediate assembly files and executables.
  * log files: The ''vlog.log'' is the VCS OpenSPARC execution trace, the ''d0.dycore.log'' is the DySER core trace, the ''d0.dyser.log'' is the DySER interface trace, and the ''sims.log'' is the script execution log. 
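
A quick way to confirm that a run produced everything listed above is to check for the files by name. This is a minimal sketch; the helper ''missing_outputs'' is hypothetical, and it assumes the logs and ''build/'' sit directly in the directory you pass in, which may not match where ''sims'' places them in your setup.

```python
import os
import tempfile

# Log files named in this tutorial.
EXPECTED_LOGS = ["vlog.log", "d0.dycore.log", "d0.dyser.log", "sims.log"]

def missing_outputs(run_dir):
    """Return the expected outputs that are absent from run_dir."""
    missing = [name for name in EXPECTED_LOGS
               if not os.path.isfile(os.path.join(run_dir, name))]
    if not os.path.isdir(os.path.join(run_dir, "build")):
        missing.append("build/")
    return missing

# Demonstration on a throwaway directory that only contains sims.log.
with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "sims.log"), "w").close()
    demo_missing = missing_outputs(demo)
print(demo_missing)
```

An empty list means all four logs and the ''build/'' directory were found.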

==== FPGA ====

The FPGA flow is the same as in the last revision.\\
[TODO: new FPGA tutorial]

===== Creating a New Benchmark =====

Currently, there are two ways to create a new benchmark for simulation:
  * New hand DySERized benchmark: Start from a new program, identify the DySER schedule, and create it with the ''dysched'' tool. Then use ''genCore.py'' and ''genRom.py'' to generate the DySER core and the DySER broadcast ROM, and use gcc inline-assembly DySER instructions to DySERize the C code.
  * New compiler DySERized code: [TODO: c code annotation]

==== Hand DySERized Benchmark ====

The first step is to read the code, identify the computation in the program, and then create a DySER schedule.
Run the software tool ''dytools/dysched''; you will see:\\
{{:internal:dysched.png?250|}}

[TODO: detailed tutorial of dysched]

Right-click on an FU to assign a function to the functional unit:\\
{{:internal:fu-menu.png|}}

After you assign the function, the FU will be colored and an edge will be created.\\
{{:internal:fu-created.png?200|}}

Double-click on the FU, and the cursor becomes a triangle with the same color as the FU. 
Now you can click on the edges to create the datapath. Next, you can create primary inputs (PIs) and primary outputs (POs) on the switches.\\
{{:internal:sw-menu.png|}}

Inputs and outputs can be placed at any switch, and each switch can have two inputs and two outputs. (This logical view of DySER is different from the hardware implementation.) Click on ''Show Port Numbers'' to see the port numbers of the PIs/POs in the configuration. **The port numbers are important and we will use them later.**

Use ''file->save config'' to save the config file to disk. We will use this file to generate the Verilog RTL of a DySER core. 
There are several limitations:
  * The ''genCore'' script can only generate a DySER core with 2-fanout switches. A switch with a fanout of 3 or more will produce colliding signals in the RTL.
  * The OpenSPARC-DySER interface has only 32 inputs and 32 outputs.
  * Only a subset of functional units is supported.
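
The first two limitations can be checked before running ''genCore''. The sketch below does not parse the saved config file, whose format is not described here. It assumes you tally the number of PIs, the number of POs, and the fanout of each switch by hand from the ''dysched'' view; the helper ''check_limits'' and the switch names are hypothetical.

```python
MAX_INPUTS = 32   # OpenSPARC-DySER interface limit
MAX_OUTPUTS = 32  # OpenSPARC-DySER interface limit
MAX_FANOUT = 2    # genCore only supports 2-fanout switches

def check_limits(num_inputs, num_outputs, switch_fanouts):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if num_inputs > MAX_INPUTS:
        problems.append(f"{num_inputs} inputs exceed the limit of {MAX_INPUTS}")
    if num_outputs > MAX_OUTPUTS:
        problems.append(f"{num_outputs} outputs exceed the limit of {MAX_OUTPUTS}")
    for name, fanout in sorted(switch_fanouts.items()):
        if fanout > MAX_FANOUT:
            problems.append(f"switch {name} has fanout {fanout}, limit is {MAX_FANOUT}")
    return problems

# Hand-tallied example: 4 PIs, 2 POs, one switch driving three destinations.
problems = check_limits(4, 2, {"sw_0_0": 2, "sw_0_1": 3})
print(problems)
```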

The saved configuration file contains the dimensions and a list of switch and FU declarations. 
With the port numbers in mind, we can modify the C code with the macros in ''opensparc/verif/diag/c/include/dyser-dlp-sparc.h'':
  * ''DySEND(var, port)'' sends data from a register to DySER.
  * ''DyRECV(port, var)'' receives data from DySER into a register.
  * ''DyLOAD(mem, port)'' loads data from memory to DySER.
  * ''DySTORE(port, mem)'' stores data from DySER to memory.
  * ''DyLOADPD(mem, port)'' is a broadcast load, which loads to a special vector port.

Before modifying the code, use ''genCore.py'' to generate the Verilog RTL. Because the logical view differs from the actual RTL, we need a port mapping from the logical ports to the physical ports in the RTL.

Use ''hardDySER/tools/genCore.py'' to read the created config file.

<code>
python genCore.py -f <config_file>
</code>

A ''<config_file>.hardDYSER.conf'' file will be created in the same directory. This file contains the splyser port mapping. Follow this port mapping when adding DySER sends and receives to the code. You can then compile and simulate in GEM5 and VCS. Refer to the benchmarks in ''hardDYSER/'' for example config files (''*.hardDySER'') and for how to use the macros in C.
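
Translating logical port numbers to physical ones by hand is error-prone, so it can help to generate the macro calls from the mapping. The sketch below does not read the ''.hardDYSER.conf'' file, whose layout is not described here. It assumes you copy the mapping into two dictionaries yourself; the helper ''dyser_calls'', the variable names, and all port numbers are hypothetical.

```python
def dyser_calls(sends, recvs, in_map, out_map):
    """Build DySEND/DyRECV lines using physical ports looked up from the maps.

    sends: list of (c_variable, logical_input_port)
    recvs: list of (logical_output_port, c_variable)
    """
    lines = [f"DySEND({var}, {in_map[port]});" for var, port in sends]
    lines += [f"DyRECV({out_map[port]}, {var});" for port, var in recvs]
    return lines

# Hypothetical logical -> physical mappings copied by hand from the conf file.
in_map = {0: 4, 1: 5}
out_map = {0: 2}

calls = dyser_calls(
    sends=[("a", 0), ("b", 1)],
    recvs=[(0, "result")],
    in_map=in_map,
    out_map=out_map,
)
print("\n".join(calls))
```

A logical port that is missing from the mapping raises ''KeyError'', which catches a port you forgot to copy before it reaches the C code.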

[TODO: tutorial of genRom and broadcast load]