The Alliance Grid Testbed
There's not much here yet.

The AGT conference call is on Mondays at 2 PM Central. The phone number is 888 677 9189, code 1742 (local number is 608 316 0022).

The mailing list archives are here: agt-discuss

Layer Zero - Physical Hardware

There are two types of clusters: a 24-node version and a 16-node version.

They live in 44U 4-post open racks. The racks are about 77x19x30 inches, I think.

A compute node is:

  • A Tyan S2722 Motherboard
  • 2 x 2.4 GHz P4 Xeon processors
  • 2 GB of RAM
  • 60 GB IDE drive
  • 460 W power supply (an EMACS Model HP2-6460P)

    Each compute node draws about 1 amp of current in a steady state, and the power-on spike that we measured was about 3 amps. We're planning on 7 compute nodes per 20-amp circuit.
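
    As a rough check on the circuit plan, the arithmetic can be sketched in Python. The figures (1 amp steady, 3 amp power-on spike, 7 nodes, 20-amp circuit) are the ones quoted above; the function name and the assumption that per-node draws simply add up when nodes power on together are illustrative only.

```python
def circuit_load(nodes, amps_per_node):
    """Total current in amps if every node draws amps_per_node at the same time."""
    return nodes * amps_per_node

NODES_PER_CIRCUIT = 7   # planned compute nodes per circuit (from the text)
STEADY_AMPS = 1         # approximate steady-state draw per node
SPIKE_AMPS = 3          # approximate measured power-on spike per node
CIRCUIT_AMPS = 20       # circuit rating

steady_total = circuit_load(NODES_PER_CIRCUIT, STEADY_AMPS)
spike_total = circuit_load(NODES_PER_CIRCUIT, SPIKE_AMPS)
# Assumption: spikes coincide exactly, which is the worst case.
max_simultaneous = CIRCUIT_AMPS // SPIKE_AMPS

print(f"steady state: {steady_total} A of {CIRCUIT_AMPS} A")
print(f"all nodes powering on at once: {spike_total} A of {CIRCUIT_AMPS} A")
print(f"nodes that can power on together: {max_simultaneous}")
```

    Under these assumptions, seven nodes sit well within the circuit in steady state, but powering all seven on at the same instant would briefly reach about 21 amps, so staggering the power-on may be worth considering.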

    Some warnings:

  • On the storage node, the key does not prevent you from removing a drive, but it does prevent you from putting the drive back in. The pins are difficult to straighten once bent.

    Grid Layer 1: Cluster Stack

    You're free to run whatever software you'd like on the nodes, so long as the installation meets these requirements:

  • It runs Linux
  • All compute nodes are addressable via the Internet

    Please leave a wide range of ports open.

    Our recommended stack is OSCAR. The TeraGrid is building clusters that look like this:

  • SuSE SLES 8, RC4
  • xCAT 1.2.0 beta4
  • PBS, 2.3.16 with patches, xCAT installation
  • Maui Scheduler, 3.2.5-p2, xCAT installation
  • MPICH, 1.2.5 with builds for Intel and gcc compilers
  • GM 1.6.3 and MPICH-GM 1.2.5..10pre3
  • VMI 1.1 and MPICH-VMI 1.1.2
  • openSSH/SSL 3.1
  • GPT 2.2.9
  • GPT Wizard
  • Globus 2.2.4
  • gsi-openssh 2.1
  • gsi-ncftp 3.0.3
  • Condor-G (NMI 2.1 binary bundle) + gahp_server 6.4.7 rebuild
  • Intel C/C++ and FORTRAN compilers, builds 7.0.086 and 7.0.87, respectively
  • gcc 3.2.1
  • softenv 1.4.2
  • Python with XML

    You'll want to install the head node and storage node first. The storage node should be NFS mounted on every machine, probably with two partitions:

  • /home, which is where all the users' home directories should live
  • /apps, which should be, say, 20 or 30 GB, with space for shared application installs
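
    One way to picture this layout is as the /etc/fstab entries each machine would carry for the two shared partitions. The sketch below generates them in Python; the host name "storage" and the mount options are assumptions for illustration, not the AGT's actual configuration.

```python
def fstab_line(server, export, mount_point, options="defaults"):
    """Format one NFS entry for /etc/fstab.

    Fields: device, mount point, filesystem type, options, dump, pass.
    """
    return f"{server}:{export}\t{mount_point}\tnfs\t{options}\t0 0"

STORAGE_NODE = "storage"          # hypothetical host name of the storage node
SHARED_DIRS = ["/home", "/apps"]  # the two partitions described above

# Mount each export at the same path it has on the storage node.
fstab_lines = [fstab_line(STORAGE_NODE, d, d) for d in SHARED_DIRS]
for line in fstab_lines:
    print(line)
```

    Mounting each partition at the same path on every machine keeps home directories and application installs at identical locations across the cluster.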

    User Environment

    A user logging into the cluster should see environment variables like TeraGrid's. The TeraGrid Shell Environment List should get us started, but this list will be in flux a lot between July and August of 2003.

    Drivers

    Intel's site for the Linux Gigabit Ethernet driver: http://www.intel.com/support/network/adapter/1000/e1000.htm

    Direct link to driver source: ftp://aiedownload.intel.com/df-support/5599/eng/e1000-5.0.43.tar.gz
    It will even build a handy RPM for you; just run

    rpmbuild -tb e1000-5.0.43.tar.gz

    The RPM will be in /usr/src/redhat/RPMS/i386.

    Grid Services

    This is Grid layer 2 - the Grid Interface layer.

    Joe Greenseid has put together a list of NCSA Grid Services. The VDT provides most of this.

    You can view the grid from here:

  • The AGT, as viewed from Kentucky
  • The AGT, as viewed from UW
  • The AGT, as viewed from NCSA
  • The AGT, as viewed from OSC
  • The AGT, as viewed from BU

    Transaction related

    This is Grid layer 3 - the Virtual Organization management software. It manages two things:
  • Account and project creation
  • Usage reporting

    Account and Project creation

    Someday, we're going to have a glorious system to manage the Alliance Grid Testbed Virtual Organization.
    For today, we've got a sample gridmap file. Please send the output of
    grid-cert-info -subject
    
    if you'd like to be added.
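
    For reference, a gridmap file maps a certificate subject to a local account, one entry per line, with the subject in double quotes. A minimal Python sketch of building such an entry follows; the subject and the account name "jdoe" are made up for illustration, and the helper name is hypothetical.

```python
def gridmap_entry(subject_dn, local_user):
    """Return one gridmap line: the quoted certificate subject, then the local account."""
    subject_dn = subject_dn.strip()  # drop the trailing newline from command output
    if '"' in subject_dn:
        raise ValueError("subject DN must not contain double quotes")
    return f'"{subject_dn}" {local_user}'

# Hypothetical subject, in the form printed by `grid-cert-info -subject`.
subject = "/O=Grid/OU=example.edu/CN=Jane Doe\n"
entry = gridmap_entry(subject, "jdoe")
print(entry)
```

    The quotes matter because subjects commonly contain spaces, as the common name does here.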

    Usage reporting

    The config file and the script

    Documentation

    Transactions29.doc is the latest version of the documentation for the Transaction system.
    Account_Management_Data_Dictionary12.xls is a list of what each option in the transaction system means.
    Account_Management_Overview.doc is the Account Management Overview.
    NCSA_Usage_Data_Format.doc documents what each field in the usage reporting means.

    Public Keys

    NCSA has a public key that it will use to contact a site with a transaction. Here are the new UW keys: authorized_keys and authorized_keys2.

    Implementations

    ParseTransacationv0.4.tar.gz is the reference implementation of the transaction system. It handles all cases, but does not actually perform any actions.
    file_transfer.pl is a script to copy transaction files and their checksums between sites.
    AIRS is the New Mexico implementation of the Transaction system, along with an accounting management system. Read their whitepaper for a complete introduction.
    gibtool-release1.tar.gz is the server side of an implementation of the NCSA transaction system. No one actually needs this, unless they are curious.
    vmr-gib-release1.tar.gz is the client side of an implementation of the NCSA transaction system. Note it is extremely, extremely simple, and will only go as far as adding accounts to a machine.