The Alliance Grid Testbed
There's not much here yet.
The AGT call is on Mondays at 2 PM Central. The phone number is 888 677 9189, code 1742 (local number: 608 316 0022).
The mailing list archives are here:
agt-discuss
Layer Zero - Physical Hardware
There are two types of clusters: a 24-node version and a 16-node
version.
They live in 44U 4-post open racks. They're 77x19x30 inches, I think.
A compute node is:
A Tyan S2722 Motherboard
2 x 2.4 GHz P4 Xeon processors
2 GB of RAM
60 GB IDE drive
460 W Power supply - An EMACS Model HP2-6460P
Each compute node draws about 1 amp of current in a steady state, and the
power-on spike that we measured was about 3 amps. We're planning on 7
compute nodes per 20-amp circuit.
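As a rough check of that plan, the arithmetic can be sketched as below. The current figures are the measurements quoted above; the function names, and the idea of powering nodes on one at a time, are illustrative assumptions rather than an established procedure.

```python
# Rough circuit budget using the figures measured above.
STEADY_AMPS = 1.0       # steady-state draw per compute node
SPIKE_AMPS = 3.0        # measured power-on spike per compute node
CIRCUIT_AMPS = 20.0     # circuit rating
NODES_PER_CIRCUIT = 7   # planned nodes per circuit


def steady_load(nodes, steady=STEADY_AMPS):
    """Total draw once every node is up and settled."""
    return nodes * steady


def simultaneous_spike(nodes, spike=SPIKE_AMPS):
    """Worst case: every node hits its power-on spike at the same moment."""
    return nodes * spike


def staggered_peak(nodes, steady=STEADY_AMPS, spike=SPIKE_AMPS):
    """Peak draw if nodes are powered on one at a time (an assumed procedure):
    the last node spikes while the others are already in steady state."""
    if nodes == 0:
        return 0.0
    return (nodes - 1) * steady + spike


if __name__ == "__main__":
    print("steady load:", steady_load(NODES_PER_CIRCUIT), "A")
    print("all-at-once spike:", simultaneous_spike(NODES_PER_CIRCUIT), "A")
    print("staggered peak:", staggered_peak(NODES_PER_CIRCUIT), "A")
    print("circuit rating:", CIRCUIT_AMPS, "A")
```

With these numbers, the steady load is 7 amps and a one-at-a-time power-on peaks at 9 amps, both well under the 20-amp rating. All seven nodes spiking at once would come to 21 amps, so it may be worth staggering power-on if the spikes really do coincide.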
Some warnings:
On the storage node, the key does not prevent you from removing a drive, but
it does prevent you from putting the drive back in. The pins are difficult
to straighten again.
Grid Layer 1: Cluster Stack
You're free to run whatever software you'd like on the nodes, so long as
it does the following:
Is Linux
Has all compute nodes addressable via the Internet
Leaves a wide range of ports open
Our recommended stack is OSCAR. The TeraGrid is building clusters that look like this:
SuSE SLES 8, RC4
x-cat 1.2.0 beta4
PBS, 2.3.16 with patches, xcat installation
Maui Scheduler, 3.2.5-p2, xcat installation
MPICH, 1.2.5 with builds for Intel and gcc compilers
GM 1.6.3 and MPICH-GM 1.2.5..10pre3
VMI 1.1 and MPICH-VMI 1.1.2
openSSH/SSL 3.1
GPT 2.2.9
GPT Wizard
Globus 2.2.4
gsi-openssh 2.1
gsi-ncftp 3.0.3
Condor-G (NMI 2.1 binary bundle) + gahp_server 6.4.7 rebuild
Intel C/C++ and FORTRAN compilers, builds 7.0.086 and 7.0.87, respectively
gcc 3.2.1
softenv 1.4.2
Python with XML
You'll want to install the head node and storage node first. The storage
node should be NFS mounted on every machine, probably with two partitions:
/home, which is where all the users' home directories should live
/apps, which should be around 20 or 30 GB, with space for shared application installs
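One way to confirm the layout on a given node is to look at its mount table. The sketch below parses text in the /proc/mounts format and reports whether /home and /apps are NFS mounts. The function name, the sample data, and the server name "storage" are hypothetical and only for illustration.

```python
REQUIRED_MOUNTS = ("/home", "/apps")

# Hypothetical sample in /proc/mounts format:
# device, mount point, filesystem type, options, dump, pass
SAMPLE_MOUNTS = """\
/dev/hda1 / ext3 rw 0 0
storage:/home /home nfs rw,hard,intr 0 0
storage:/apps /apps nfs rw,hard,intr 0 0
"""


def nfs_mount_status(mounts_text, required=REQUIRED_MOUNTS):
    """Return {mount_point: bool} saying whether each required path
    appears as an NFS mount in /proc/mounts-style text."""
    status = {path: False for path in required}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if mount_point in status and fs_type.startswith("nfs"):
            status[mount_point] = True
    return status


if __name__ == "__main__":
    for path, ok in nfs_mount_status(SAMPLE_MOUNTS).items():
        print(path, "NFS-mounted" if ok else "NOT NFS-mounted")
```

On a real node, the contents of /proc/mounts would be passed in place of the sample text.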
User Environment
A user logging into the cluster should see environment variables like the TeraGrid's.
The TeraGrid Shell Environment List
should get us started, but this list will be in flux a lot between July and
August of 2003.
Drivers
Intel's Site for Linux Gig Ethernet Driver
http://www.intel.com/support/network/adapter/1000/e1000.htm
Direct link to driver source:
ftp://aiedownload.intel.com/df-support/5599/eng/e1000-5.0.43.tar.gz
It will even build a handy rpm for you. Just run
rpmbuild -tb e1000-5.0.43.tar.gz
and the rpm will be in /usr/src/redhat/RPMS/i386.
Grid Services
This is Grid layer 2 - the Grid Interface layer.
Joe Greenseid has put together a list of NCSA Grid Services
The VDT provides most of this.
You can take a view of the grid from here:
The AGT, as viewed from Kentucky
The AGT, as viewed from UW
The AGT, as viewed from NCSA
The AGT, as viewed from OSC
The AGT, as viewed from BU
Transaction related
This is Grid layer 3 - the Virtual Organization management software. It manages
two things:
Account and project creation
Usage reporting
Account and project creation
Someday, we're going to have a glorious system to manage the Alliance Grid Testbed Virtual Organization.
For today, we've got a sample gridmap file. Please send the output of
grid-cert-info -subject
if you'd like to be added.
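For anyone maintaining their own copy of the gridmap file, each entry is a quoted certificate subject followed by the local account it maps to. The sketch below turns the output of grid-cert-info -subject into such a line. The function name, the example subject, and the account name "jdoe" are hypothetical.

```python
def gridmap_entry(subject, local_user):
    """Format one gridmap line: the quoted certificate subject (DN)
    followed by the local account name it maps to."""
    subject = subject.strip()  # command output ends with a newline
    if not subject.startswith("/"):
        raise ValueError("expected a subject like /O=.../CN=...")
    if '"' in subject:
        raise ValueError("subject must not contain double quotes")
    if not local_user or any(ch.isspace() for ch in local_user):
        raise ValueError("local user must be a single word")
    return f'"{subject}" {local_user}'


if __name__ == "__main__":
    # Hypothetical subject, as grid-cert-info -subject might print it.
    example_subject = "/O=Grid/OU=Example/CN=Jane Doe\n"
    print(gridmap_entry(example_subject, "jdoe"))
```

The quotes matter because subjects usually contain spaces, as in the common name here.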
Usage reporting
The config file and the script
Documentation
Transactions29.doc is the latest version
of the documentation for the Transaction system.
Account_Management_Data_Dictionary12.xls is a list of what each option in the transaction system means.
Account_Management_Overview.doc is the Account Management Overview.
NCSA_Usage_Data_Format.doc documents what each field in the usage reporting means.
Public Keys
NCSA has a public key that it will use to contact
a site with a transaction.
Here are the new UW keys: authorized_keys and
authorized_keys2
Implementations
ParseTransacationv0.4.tar.gz is the reference implementation of the transaction system. It handles all cases, but does not actually perform any actions.
file_transfer.pl is a script to copy transaction
files and their checksums between sites.
AIRS is the New Mexico implementation
of the Transaction system, along with an accounting management system. Read their
whitepaper for
a complete introduction.
gibtool-release1.tar.gz is the server
side of an implementation of the NCSA transaction system. No one actually needs
this, unless they are curious.
vmr-gib-release1.tar.gz is the client
side of an implementation of the NCSA transaction system. Note it is extremely,
extremely simple, and will only go as far as adding accounts to a machine.
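For the transfer step that file_transfer.pl handles, the basic idea of checking a transaction file against its checksum can be sketched as follows. This is not taken from file_transfer.pl: the SHA-256 algorithm, the side-by-side ".sha256" file naming, and the function names are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path, algorithm="sha256"):
    """Write the digest next to the file (assumed naming: <file>.<algorithm>)."""
    checksum_path = path + "." + algorithm
    with open(checksum_path, "w") as handle:
        handle.write(file_digest(path, algorithm) + "\n")
    return checksum_path


def verify_checksum(path, algorithm="sha256"):
    """Return True if the file still matches its stored digest."""
    with open(path + "." + algorithm) as handle:
        fields = handle.read().split()
    expected = fields[0] if fields else ""
    return file_digest(path, algorithm) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical transaction file with made-up contents.
        sample = os.path.join(workdir, "transaction.txt")
        with open(sample, "w") as handle:
            handle.write("example transaction\n")
        write_checksum(sample)
        print("intact file verifies:", verify_checksum(sample))
```

The sending site would write the checksum before the copy, and the receiving site would verify after it, rejecting any file that does not match.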