Next Generation Build System

Contents


  1. Desired Features
  2. Useful Terminology
  3. Semantics
    Submit-side Semantics
    Remote-side Semantics
  4. Description of parameters used in the NMI submit file
  5. Environment Variables available to the users
  6. Using Macros in the NMI submit file
  7. Submitting your build/test
  8. Seeing the results of your build/test
  9. Local Builder
    Submitting your local build/test
    Seeing the results of your local build/test
  10. Examples
    Gsissh
    Condor
    Additional examples

Desired Features

The aim of the new world order build/test framework is to separate the functionality of the framework from the component-related build/test scripts. This is achieved by having clear semantics for how the component-related scripts are invoked.

It should have the following features:

  1. Ease of use.
  2. Ease of adding new components without modifying the build system code.
  3. Ease of adding new platforms, preferably without modifying the build system code.
  4. Building multiple components at the same time if they are independent, and following the dependencies when required.
  5. Logging the build/test related information in the database.
  6. Transferring the required externals for a particular component to the remote build machine. This can even include externals like Java. Flexibility on this feature is debatable.
  7. Handling operating system signals, and reporting failure if any of the scripts is killed.
  8. Web interface for component builders/testers.

Useful Terminology

Semantics

The goal of the framework is to make simple builds simple to do, without requiring the user to understand the complexity available for more complex builds/tests.

For example, a product foo that is built by fetching its sources from the web, untarring them, and running make (e.g., "ncftp foo.edu:/foo.tar.gz; tar zxvf foo.tar.gz; configure; make") needs nothing special for different platforms; the build behaves just as it would on a single system. You write each of those commands in a script, specify the scripts for the corresponding build steps in your submit file, add a remote_post to tar up whatever build output you want to save, and you're done.

Thus the build system functionality is divided into two parts:

  1. Tasks done on the submit machine
  2. Tasks done on the remote machine

These semantics are shown below. Steps named after submit-file parameters (pre_all, platform_pre, platform_post, post_all, and the remote_* scripts) are the "Glue-Scripts" provided by the component builder (user); all other steps are functionality provided by the build system. The accompanying remarks give more information, such as the directory in which a step runs.

Submit-side Semantics

The submit side of the framework handles the preliminary tasks performed before the component build/test is dispatched to the remote machine(s), and the post-build tasks performed after the results arrive back from the remote machine(s).

The user initiates the build/test by passing the NMI submit file to the framework. This file has all the information that the framework requires to build/test the component: information about the component, where the glue scripts are, and which glue script to invoke at a given time. A sample NMI submit file is shown in the Examples section. Once the framework has parsed this information, it creates an initial database record for this build/test run and gets a runid. It also creates a working directory where the build/test related information is stored, with the required sub-directories based on the platform list.

Once these general tasks are done, the framework creates a DAG of Condor jobs for the requested build/test run. Some of these jobs are executed on the submit machine and some on the remote machine(s). The framework then fetches the component sources as instructed in the "sources" option of the NMI submit file and places them in a known location. It then invokes the pre_all script for the component, if declared. The framework also invokes the platform_pre and platform_post scripts, if declared, at the appropriate time.

Once the Condor job running on the remote machine(s) completes (successfully or unsuccessfully), the framework invokes the post_all script.

The framework also updates the database with the results of each subtask of your build/test.

Read the NMI submit file
Create an initial database record and get a runid
Fetch sources                            .............. Fetches sources in $Workspace/$gid/common

pre_all                                  .............. Wakes up in $Workspace/$gid/common
for each platform {
    cp common/* <arch>/
    cd <arch>
    platform_pre                         .............. Wakes up in $Workspace/$gid/$platform
    submit vanilla job
    update database
    platform_post                        .............. Wakes up in $Workspace/$gid/$platform
    update database
}

post_all                                 .............. Wakes up in $Workspace/$gid/common
update database
display html
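
The ordering of the submit-side steps and the directory each glue script wakes up in can be sketched in a few lines of Python. This is only an illustration of the sequence described above, not the framework's code: the function name submit_side_plan and its arguments are hypothetical, and the real framework runs these steps as a DAG of Condor jobs rather than as a plain loop.

```python
from typing import List, Tuple


def submit_side_plan(platforms: List[str], workspace: str, gid: str) -> List[Tuple[str, str]]:
    """Return (step, working directory) pairs in the order of the submit-side semantics.

    Hypothetical helper for illustration; an empty directory string means the
    source does not state where the step runs.
    """
    common = f"{workspace}/{gid}/common"
    plan = [
        ("read NMI submit file", ""),
        ("create database record and get runid", ""),
        ("fetch sources", common),
        ("pre_all", common),
    ]
    for platform in platforms:
        platform_dir = f"{workspace}/{gid}/{platform}"
        plan += [
            ("copy common/* into platform directory", platform_dir),
            ("platform_pre", platform_dir),
            ("submit vanilla job", platform_dir),
            ("platform_post", platform_dir),
        ]
    plan.append(("post_all", common))
    plan.append(("display html", ""))
    return plan
```

Calling submit_side_plan(["x86_rh_9", "sun4u_sol_5.9"], "/ws", "g1") lists pre_all once, each platform_pre and platform_post once per platform, and post_all last among the glue scripts.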

 

Remote-side Semantics (per platform)

Once the Condor job (the job that builds the component on a targeted platform) arrives at the remote machine, the very first script to be invoked is the framework wrapper script, which actually executes the glue scripts. The user can split the build/test into several subtasks of his/her choice in remote_declare. If the user needs to take any action before declaring the remote tasklist, he/she can do it in remote_pre_declare. Similarly, any tasks to be executed after remote_declare but before the first build/test task can be done in remote_pre.

Then, for every subtask declared by the user, the build system runs the subtask, saves its statistics and results, and streams the output back to the submit machine. When the component is built/tested and all the subtasks are completed, the remote_post script is run. This serves for tasks like packaging the component or correlating the test results in a particular way, which are not part of the build/test but are required for the component distribution. The framework expects the glue scripts to create a single tar file called "results.tar.gz" that contains everything that is to be transferred back to the submit side. Stdout and stderr for all the subtasks are transferred back by default. Any other user log files should be included in results.tar.gz by the user.
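
As an illustration of the packaging step, a remote_post glue script could build results.tar.gz along the following lines. This is a minimal sketch: the helper name pack_results and the choice to skip missing paths are assumptions made for the example, and only the archive name results.tar.gz comes from the framework's convention.

```python
import tarfile
from pathlib import Path
from typing import Iterable


def pack_results(paths: Iterable[str], archive: str = "results.tar.gz") -> str:
    """Bundle the given files or directories into a gzip-compressed tar archive.

    Hypothetical helper: paths that do not exist are skipped, so that a failed
    task does not prevent the remaining output from being returned.
    """
    with tarfile.open(archive, "w:gz") as tar:
        for p in paths:
            path = Path(p)
            if path.exists():
                # Store each entry under its base name rather than its full path.
                tar.add(str(path), arcname=path.name)
    return archive
```

A remote_post script might call pack_results(["build.log", "install"]), where both names are placeholders for whatever output the component's tasks actually produce.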

Component build wrapper
remote_pre_declare
remote_declare
remote_pre
if declare_list is empty {
    insert a special noop task in the tasklist
}
for each task in declare_list {
    remote_task
    record task runtime
    save error and output
    save return status
}

remote_post
send back all results
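
The task loop of the wrapper can be sketched as follows. This is not the framework's wrapper, only an illustration of the loop above under stated assumptions: the function name run_tasks, the task name "noop", and passing the task name to remote_task through the _NMI_TASKNAME environment variable are choices made for this example.

```python
import os
import subprocess
import time
from typing import Dict, List


def run_tasks(task_names: List[str], remote_task_cmd: List[str]) -> List[Dict]:
    """Run the remote_task command once per declared task and collect the results."""
    if not task_names:
        task_names = ["noop"]  # assumed name for the special no-op task
    results = []
    for name in task_names:
        # Assumption: the current task name is exposed as _NMI_TASKNAME.
        env = dict(os.environ, _NMI_TASKNAME=name)
        start = time.time()
        proc = subprocess.run(remote_task_cmd, env=env, capture_output=True, text=True)
        results.append({
            "task": name,
            "runtime": time.time() - start,   # record task runtime
            "stdout": proc.stdout,            # save output
            "stderr": proc.stderr,            # save error
            "status": proc.returncode,        # save return status
        })
    return results
```

Each entry in the returned list corresponds to one row of per-task information that the framework saves and reports back to the submit side.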

Description of parameters used in the NMI submit file

Refer to the examples in case of doubt.

Parameter             Description
description           Component description string
project               Name of the project for which you are building or testing
project_release       Version of the project for which you are building or testing
component             Name of the component you are building or testing
component_version     Version of the component you are building or testing
run_type              <BUILD | TEST> If nothing is specified, it defaults to UNKNOWN in the database. It is always good to specify the run_type in order to distinguish builds from tests.
sources (deprecated)  Deprecated. Use the parameter "inputs" (see below) instead.
inputs

List of sources to be fetched. You must also fetch the glue separately if it is not available with the sources. This list is actually a list of NMI submit files, each telling how to fetch an individual source. For example, if you want to fetch the glue from a CVS repository and the component sources from the web, write the instructions in two separate files, say glue.cvs and source.ftp, and your sources option should look like this:

sources = glue.cvs, source.ftp

Note that if you use multiple inputs and they download a file with the same name, you will end up with just one of the copies of the file, and there is no guarantee which one it will be.

Parameters available for different methods for fetching sources:
Parameters marked by "*" are optional

  1. Method: ftp

    method = ftp
    ftp_root = <ftp:// | http://>
    ftp_target = Remainder of the URL for the source file
    ftp_args* = Arguments you want to pass to ftp
    untar = <true | false> (Defaults to false) If the transferred file ends in .tar, .tar.gz, .tgz, or .tar.Z, it is uncompressed and untarred.

     

  2. Method: scp

    method = scp
    scp_file = Path to the file or directory to be copied
    recursive* = <true | false> (Defaults to false)
    NOTE: Required when copying a directory
    untar = <true | false> (Defaults to false) If the transferred file ends in .tar, .tar.gz, .tgz, or .tar.Z, it is uncompressed and untarred.

     

  3. Method: cvs

    method = cvs
    cvs_root = Should look something like :ext:parag@chopin.cs.wisc.edu:/p/condor/repository/nmi
    cvs_server* = In case you have authenticated CVS access, for example /afs/cs.wisc.edu/p/condor/public/bin/auth-cvs
    cvs_ssh = ssh
    cvs_module = Module you want to check out from CVS
    cvs_tag* = Specific CVS tag of the module you want to check out

     

  4. Method: nmi (Not available for local builder)

    method = nmi
    input_runids = Comma-separated list of runids (from the database) whose results should be taken as the input to the current build/test
    platforms = Override the platform list specified in the main NMI submit file
    ignore_missing_platforms* = <true | false> (Defaults to false) If true, do not error out if some platforms are missing in the input runid
    skip_failed_builds* = <true | false> (Defaults to false) If true, do not error out if the input run exited with failed status
    untar_results* = <true | false> (Defaults to true) If false, do not untar the results; copy the results.tar.gz of the input run instead


     

platforms

List of platforms to build on. To get the list of available platforms from Condor, run the following command from the submit machine (grandcentral.cs.wisc.edu):

/usr/local/bin/condor_status -l | grep nmi_platform

Specify "platforms" as a comma-separated list of the platforms you want to build/test on.

notify Email notification list: a comma-separated list of email addresses.
priority Priority of the user's own jobs relative to each other
prereqs

prereqs_<platform>

Comma-separated list of the software that must be present on the system in order to build/test your component. For example, you may require a specific version of java, binutils, etc. to build/test your component.

Prereqs should be specified in the format prereq_name-prereq_version (Example: java-1.4.2_05).

To list the prereqs available on the remote machines, run the following command:

condor_status -l | grep has_

This will show the locations where the prereq software is installed. Copy the prereq names in the format shown and put them in your list.

If you are building/testing on multiple platforms and need a different version of a prereq on different platforms, use the parameter prereqs_<platform> and replace <platform> with the actual platform name.

For example, if you need gcc-2.95.3 on platform sun4u_sol_5.9 and gcc-3.2.2 on all the other platforms, your NMI submit file should look like this:

prereqs = gcc-3.2.2, .......

prereqs_sun4u_sol_5.9 = gcc-2.95.3

pre_all
platform_pre
platform_post
post_all
remote_pre_declare
remote_declare
remote_pre
remote_task
remote_post
remote_post_always
Glue scripts. Refer to the Semantics section.
pre_all_args
platform_pre_args
platform_post_args
post_all_args
remote_pre_declare_args
remote_declare_args
remote_pre_args
remote_task_args
remote_post_args
remote_post_always_args
Arguments to the respective glue scripts
+<condor_parameter>

If you would like to pass a specific parameter to Condor, add a "+" before it.
For example, to pass periodic_remove to the Condor build/test jobs, the following should do the trick:

+periodic_remove = true

++<user_parameter>

If you would like to pass a user-defined parameter to Condor, add a "++" before it.
For example, to pass foo = bar to the Condor build/test jobs, the following should do the trick:

++foo = bar
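
To make the interaction between the prereqs and prereqs_<platform> parameters described above concrete, here is a small Python sketch. It is not the framework's resolution logic: the function names are hypothetical, and the merge rule (a platform-specific entry replaces the general entry with the same name and is otherwise added to the list) is an assumption chosen to be consistent with the gcc example.

```python
from typing import Dict, List, Tuple


def parse_prereq(spec: str) -> Tuple[str, str]:
    """Split 'prereq_name-prereq_version' at the last hyphen, e.g. 'apache-ant-1.6.2'."""
    name, version = spec.strip().rsplit("-", 1)
    return name, version


def prereqs_for(platform: str, params: Dict[str, str]) -> List[str]:
    """Return the prereq list for one platform (assumed merge-by-name semantics)."""
    merged = dict(
        parse_prereq(s) for s in params.get("prereqs", "").split(",") if s.strip()
    )
    specific = params.get(f"prereqs_{platform}", "")
    # Assumption: a platform-specific version overrides the general one by name.
    merged.update(parse_prereq(s) for s in specific.split(",") if s.strip())
    return [f"{name}-{version}" for name, version in merged.items()]
```

With prereqs = gcc-3.2.2, make-3.80 and prereqs_sun4u_sol_5.9 = gcc-2.95.3, this sketch yields gcc-2.95.3 and make-3.80 for sun4u_sol_5.9, and gcc-3.2.2 and make-3.80 for every other platform.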

Environment Variables available to the users

The framework makes some information available in the environment. This information can be used in the glue scripts. The available environment variables are:

ENV Variable       Local/Remote   Description
NMI_<parameter>    Both           All the parameters used in the NMI submit file are available in the environment in the format NMI_<parameter>. For example, the parameter "component" is available to glue scripts as the $NMI_component environment variable.
NMI_PLATFORM       Remote         Name of the current platform
_NMI_TASKNAME      Remote         Name of the remote (user-defined) task
_NMI_STEP_FAILED   Remote         If this is present in the environment, the last NMI task failed
NMI_BIN            Local          Bin directory where the NMI executables are located
_NMI_NMIDIR        Local          Your workspace directory
_NMI_USERDIR       Local          Userdir in the workspace
_NMI_DBLOGDIR      Local          Directory where database logs are stored
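
A glue script written in Python could read these variables as sketched below. The function name describe_step and the fallback value "unknown" are hypothetical; the variable names are the ones listed above, and _NMI_STEP_FAILED is tested only for presence, as its description suggests.

```python
import os
from typing import Dict, Mapping


def describe_step(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Collect the NMI-provided context that a remote glue script might log."""
    return {
        # Submit-file parameters appear as NMI_<parameter>.
        "component": env.get("NMI_component", "unknown"),
        "platform": env.get("NMI_PLATFORM"),
        "task": env.get("_NMI_TASKNAME"),
        # Only the presence of the variable matters, not its value.
        "previous_step_failed": "_NMI_STEP_FAILED" in env,
    }
```

Passing the environment in as a mapping keeps the function easy to try out with a plain dictionary before using it inside a real glue script.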

Using Macros in the NMI submit file

The framework allows basic macro substitution in the NMI submit file. If you define an environment variable $_NMI_FOO, you can use the macro $(FOO).

One default macro is provided by the system:

Macro Description
$(USER) The framework substitutes this with your login name.
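
The substitution rule can be illustrated with a short Python sketch. This is not the framework's parser: the function name expand_macros is hypothetical, and two behaviours are assumptions made for the example, namely that $(USER) always resolves to the login name and that a macro with no matching _NMI_ variable is left untouched.

```python
import re
from typing import Mapping

# Matches macros of the form $(NAME).
_MACRO = re.compile(r"\$\((\w+)\)")


def expand_macros(line: str, env: Mapping[str, str], user: str) -> str:
    """Replace $(FOO) with the value of _NMI_FOO, and $(USER) with the login name."""
    def lookup(match):
        name = match.group(1)
        if name == "USER":
            return user
        # Assumption: unknown macros are kept as written.
        return env.get(f"_NMI_{name}", match.group(0))
    return _MACRO.sub(lookup, line)
```

For instance, with _NMI_FOO set to nmi-6.0 and the login name parag, the line platform_pre_args = /space/$(USER)/$(FOO)/bundles expands to platform_pre_args = /space/parag/nmi-6.0/bundles.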

Submitting your build/test

Set your path:

[$prompt] export PATH=$PATH:/nmi/bin

Assuming that your NMI submit file is named "cmdfile", run the following to build the component:

[$prompt] nmi_submit cmdfile

Seeing the results of your build/test

A very basic web interface is available for users to see the results of their build/test runs. It is available at http://grandcentral.cs.wisc.edu/build/. We are working on making it more user friendly by providing more filters for easy access.

For every run that a user submits, the framework creates a record in the "Run" table and assigns a runid to it. You can click the link corresponding to the Run table to see information related to this record. The framework also creates a record for every task that is run, such as fetching the sources, the pre_all script, platform_pre, the tasks defined in tasklist.nmi, etc. Each task has its own unique taskid. To get the information for the tasks related to a particular run, look at the tasks that have its runid. The web interface also provides access to the logs, output, and error files.

Local Builder

Local Builder is a stripped-down version of the framework. It enables users to build/test their components on the same system it is run from. This also means that users cannot build/test binaries for platforms other than the local system. Conceptually it works on the same principles as the framework, i.e. it uses the same glue scripts; however, it does not rely on jobs being run by Condor. All the glue and framework-related scripts are invoked locally. This enables users to leverage the work done by the NMI group and build/test the binaries on their own system. All the discussion above for the framework also applies to the local builder unless otherwise stated.

Submitting your local build/test

[$prompt] export PATH=$PATH:/nmi/bin
[$prompt] nmi_run_local cmdfile

Seeing the results of your local build/test

Since the local builder does not rely on the database being present, information for local builds/tests is not logged in the database. nmi_run_local creates a local working directory (henceforth called $workdir) named in the format <username>_<machine name>_<epoch secs>_$$. This $workdir is created in the current directory, from which the build/test is started. The results for your run can be found in $workdir/userdir/<platform>/results.tar.gz
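
For scripts that need to locate the output of a local run, the naming scheme can be written down as a small Python sketch. The function names are hypothetical, and the sketch assumes that $$ stands for the process ID of nmi_run_local, as it does in a shell; the machine name is taken as given, because the text does not say whether it is the short or the fully qualified host name.

```python
from pathlib import PurePosixPath


def local_workdir_name(username: str, machine: str, epoch_secs: int, pid: int) -> str:
    """Compose <username>_<machine name>_<epoch secs>_<pid>."""
    return f"{username}_{machine}_{epoch_secs}_{pid}"


def results_path(workdir: str, platform: str) -> str:
    """Return $workdir/userdir/<platform>/results.tar.gz as a POSIX path string."""
    return str(PurePosixPath(workdir) / "userdir" / platform / "results.tar.gz")
```

Since the timestamp and process ID are only known once nmi_run_local has started, in practice a script would more likely search the current directory for a directory matching this pattern than compute the name in advance.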

Examples

The examples below show the command description files for GsiOpenssh and Condor. The scripts used to build the products are also linked from the description file.

1. GsiOpenssh

NMI submit file (cmdfile) - GsiOpenssh

description = Gsissh build for nmi

project = nmi
project_release = 6.0

component = gsi_openssh
component_version = 3.5

sources = gsissh-glue.cvs, gsissh-prereq.scp, gsissh-compat.ftp, gsissh-setup.ftp, gsissh-src.ftp

platform_pre = nwo/glue/gsissh/build/platform_pre
platform_pre_args = /space/parag/nmi-6.0/bundles
remote_declare = nwo/glue/gsissh/build/remote_declare
remote_task = nwo/glue/gsissh/build/remote_task
remote_task_args = nondebug-bins
remote_post = nwo/glue/gsissh/build/remote_post
platform_post = nwo/glue/gsissh/build/platform_post
platform_post_args = /space/parag/nmi-6.0/bundles

platforms = x86_rh_9, x86_rh_7.2, sun4u_sol_5.9

prereqs = java-1.4.2_05, apache-ant-1.6.2, junit-3.8.1, perl-5.8.5, tar-1.14, patch-2.5.4, m4-1.4.1, binutils-2.15, flex-2.5.4a, make-3.80, byacc-1.9, bison-1.25, gzip-1.2.4

prereqs_sun4u_sol_5.9 = gcc-2.95.3

notify = parag@cs.wisc.edu

priority = 1

2. Condor

NMI submit file (cmdfile) - condor

description = nightly condor 6.7.x build run

project = condor
project_release = 6, 7, x

component = condor
component_version = 6, 7, x

sources = condor_srcsfile-BUILD-V6_7-branch-2004-8-25

pre_all = nmi_glue/build/pre_all
remote_declare = nmi_glue/build/remote_declare
remote_pre = nmi_glue/build/remote_pre
remote_task = nmi_glue/build/remote_task
remote_post = nmi_glue/build/remote_post
platform_post = nmi_glue/build/platform_post
post_all = nmi_glue/build/post_all

platforms = x86_rh_9, x86_rh_8.0, x86_rh_7.2, sun4u_sol_5.9

prereqs = perl-5.8.5, tar-1.14, patch-2.5.4, m4-1.4.1, binutils-2.15, flex-2.5.4a, make-3.80, byacc-1.9, bison-1.25, gzip-1.2.4, gcc-2.95.3, coreutils-5.2.1

notify = condor-staff@cs.wisc.edu

priority = 1
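
The "key = value" layout of the two submit files above is simple enough to read with a few lines of Python, which can be handy for checking a cmdfile before submitting it. This is a rough sketch and not the framework's parser: the function names are hypothetical, and it assumes one parameter per line, split at the first "=", with blank lines ignored.

```python
from typing import Dict, List


def parse_submit_file(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines into a dictionary (assumed one parameter per line)."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def as_list(value: str) -> List[str]:
    """Split a comma-separated value such as platforms or prereqs."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

For the GsiOpenssh example, as_list(parse_submit_file(text)["platforms"]) would give the three platform names x86_rh_9, x86_rh_7.2, and sun4u_sol_5.9.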

3. Additional examples

You can check out the glue scripts for NMI components from CVS as follows:


[$prompt] export CVSROOT=":ext:<username>@<cvs server>:/p/condor/repository/nmi"
[$prompt] export CVS_RSH="ssh"
[$prompt] export CVS_SERVER="/p/condor/public/bin/auth-cvs"
[$prompt] cvs co nwo/glue

Please replace <username> with your login name and <cvs server> with the name of the CSL machine where you ran the "stashticket" command. You can find more details about this here.