PFS Technical Manual

Version 0.9, 8 October 2002

Front Matter

PFS is Copyright (C) 2001-2002 Douglas Thain.

This program is released under the GNU General Public License. See the file COPYING for details.

This manual may be out of date. Please check the PFS Web Page for the most recent version.

Overview

The Pluggable File System (PFS) is a tool for attaching old programs to new storage systems. PFS makes a user-level storage system appear as a file system to a legacy application. PFS does not require any special privileges, any recompiling, or any change whatsoever to existing programs. It can be used by normal users doing normal tasks. For example, an HTTP service is presented to vi like so:
% vi /http/www.cs.wisc.edu/index.html

It's that simple.

PFS is useful to developers of distributed systems, because it allows rapid deployment of new code to real applications and real users who do not have the time, skill, or permissions to build a kernel-level filesystem.

PFS is useful to users of distributed systems, because it frees them from two unpleasant options: (1) rewriting code to work with new systems, or (2) relying on a remote administrator to trust and install new software.

PFS currently supports the following distributed systems:
  • HTTP
  • FTP (anonymous and authenticated), via FTP-Lite
  • GSIFTP
  • Globus GASS
  • NeST
  • Kangaroo

We are continually adding support for new protocols. If you would like to contribute a protocol module, we would be happy to accept it!

Installation

Supported Platforms

PFS ought to run on almost any POSIX-compliant system. We use it daily on Red Hat Linux 7.2 machines. It has been tested to a lesser extent on Solaris 2.8 machines. It will likely compile with small changes on other platforms. Your mileage may vary.

Installation

First, download a PFS distribution from the PFS home page. You may download either a source package or a binary distribution. Please skip to the relevant section.

Source Installation

Building PFS from source is quite complicated. It is necessarily complicated because PFS is a multiplexer between many different kinds of systems. You probably don't want to do this. However, if you really must proceed:
  1. Download and install dttools.
  2. Download and install Bypass.
  3. Install any of the optional systems you want PFS to 'mate' with. Each may be included or excluded as you see fit. If you have no external packages, then only HTTP support will be included.
    • Kangaroo
    • FTP-Lite
    • Globus
    • NeST
  4. Unpack the PFS tarball in a scratch directory:
    % gunzip pfs.tar.gz
    % tar xvf pfs.tar
    
  5. Now, you must run a rather complicated 'configure' step. For each of the packages you have installed, give a --with-XXX-path option, something like this:
    % cd pfs
    % ./configure --prefix /home/fred/pfs --with-bypass-path /home/fred/bypass --with-globus-path /usr/local/globus ...
    
  6. Finally, build and install:
    % make
    % make install
    
At long last, you're done! Please skip to "Runtime Setup."

Binary Installation

Simply unpack the tarball in any directory that you like.
% gunzip pfs-xxx-yyy.tar.gz
% tar xvf pfs-xxx-yyy.tar
That's all! Go on to "Runtime Setup."

Runtime Setup

Finally, you must set some environment variables to point your shell to PFS. If you are using a C-like shell:
setenv PFS_INSTALL_DIR /home/fred/pfs
setenv PATH ${PFS_INSTALL_DIR}/bin:${PATH}
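
If you use a Bourne-like shell (sh, bash, ksh) instead, the equivalent setup might look like the sketch below. The install path /home/fred/pfs is the same placeholder used above. The final check is our own addition and simply assumes that pfsrun lives in the bin directory of the installation.

```shell
# Bourne-shell equivalent of the setenv lines above.
# /home/fred/pfs is a placeholder; substitute your own install directory.
PFS_INSTALL_DIR=/home/fred/pfs
PATH="${PFS_INSTALL_DIR}/bin:${PATH}"
export PFS_INSTALL_DIR PATH

# Optional sanity check (an assumption, not part of PFS): warn if pfsrun
# cannot be found through the new PATH.
command -v pfsrun >/dev/null || echo "pfsrun not found; check PFS_INSTALL_DIR" >&2
```

Put these lines in your shell startup file if you want them applied to every session.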

Special Considerations

Note to IRIX users:
PFS has trouble with some of the IRIX utilities. (See why below.) You'll have better luck with the GNU utilities. Put /usr/gnu/bin first in your path before trying this section.

Examples

To load PFS, you simply use the pfsrun command followed by any other UNIX program. For example, to load PFS for use with vi, run this command:
% pfsrun vi
Now, you may make use of PFS services as normal pathnames. Within vi, you can load a web page with the following command:
:r /http/www.yahoo.com/index.html
Of course, it can be clumsy to put pfsrun before every command you run, so try starting a shell with PFS already loaded:
% pfsrun tcsh
Now, you should be able to run any standard command using PFS filenames. Here are some examples to get you thinking:
% vi /http/www.cs.wisc.edu/condor
% cp /http/www.cnn.com /tmp/cnn
% grep Google /http/www.google.com
We have used HTTP for the examples, as it is the only service that we know everybody has. Of course, you may use all of the other supported systems in a similar manner. For example, to copy a file from an anonymous FTP server:
% cp /anonftp/ftp.cs.wisc.edu/RoadMap .
If you want to log in under a particular username, use /ftp instead of /anonftp, and you will be prompted for a name and a password:
% vi /ftp/gatekeeper.dec.com/README.ftp
gatekeeper.dec.com login: 
password:
The FTP protocol sends names and passwords in the clear, so a more security-conscious approach is to use the GSIFTP protocol developed by the Globus project. You must have the Globus tools and an appropriate certificate in order to use this module.
% grid-proxy-init
% cp /gsiftp/some.where.com/README .
The variants of the FTP module all have good support for directories, so you may use tools like ls and the tab-completion feature of the shell:
% ls -l /gsiftp/mss.ncsa.uiuc.edu/
% cat /anonftp/ftp.cs.wisc.edu/ [TAB]
FTP support is provided by the FTP-Lite module, which is distributed separately.

The Globus GASS module also makes use of the above protocols, but stages and caches whole files. Notice that the underlying protocol must be named in the path immediately after /gass.

% grep Globus /gass/http/www.globus.org/index.html
The NeST module is more suited towards fine-grained access:
% sort /nest/nest.cs.wisc.edu/~johnbent/hosts
Finally, Kangaroo specializes in hiding latencies on outputs:
% gcc test.c -o /kangaroo/ftp.cs.wisc.edu/output.exe

Limitations

The major limitation of PFS is that it can only be applied to dynamically-linked programs. This includes the large majority of standard and commercial applications. However, a few programs remain statically linked, such as tools in /sbin on IRIX machines. The ldd utility can be used to tell the difference between the two. If dynamic, ldd will list the libraries used by an application. If static, ldd will report "not a dynamic executable."
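
The ldd check can be wrapped in a small script. The following is only a sketch: the function names classify_ldd_output and check_linkage are made up for illustration, and it assumes that your ldd prints the exact message quoted above. Other platforms may word it differently.

```shell
# Classify ldd output read from standard input.
# Assumes the "not a dynamic executable" message quoted in this manual.
classify_ldd_output() {
    if grep -q "not a dynamic executable"; then
        echo static
    else
        echo dynamic
    fi
}

# Hypothetical helper: print "dynamic" or "static" for a program on PATH.
# Returns 2 if the name does not resolve to a regular file.
check_linkage() {
    prog=$(command -v "$1") || prog=
    if [ ! -f "$prog" ]; then
        echo "$1: not an executable file on PATH" >&2
        return 2
    fi
    # ldd writes its complaint to stderr on some systems, so merge streams.
    ldd "$prog" 2>&1 | classify_ldd_output
}
```

For example, check_linkage vi should print "dynamic" if vi can be used with PFS.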

A more subtle limitation is that most distributed systems do not support all of the possible file operations. For example, the HTTP module provides a bogus result for list operations, because the protocol has no mechanism for listing a directory:

% ls -l /http/www.yahoo.com/index.html
-rwxrwxrwx    1 thain    23330           0 Dec 31  1969 /http/www.yahoo.com/index.html

% ls /http/www.yahoo.com/
/http/www.yahoo.com

PFS implements a large portion of the POSIX functionality. We frequently use it in a batch setting to deal with scientific applications. However, it certainly does not deal with every last tricky corner of the POSIX interface. Please see the bugs and surprises web page if you are having trouble or would like to know more.

Mount Lists

Arbitrary storage devices may be spliced into your view of the filesystem by way of a mountlist. A mountlist is similar to /etc/fstab in UNIX and describes how logical names may be mapped to physical devices. A mountlist is simply a file with two columns. The first column gives a logical directory or file name, while the second gives the physical path that it must be connected to.

For example, if a database is stored on a web server at the address /http/www.cs.wisc.edu/db, you may splice it into the filesystem under /database with a mount list like this:

     /database       /http/www.cs.wisc.edu/db
If the mountlist is stored in a file named mlist, invoke PFS as follows:
% pfsrun -mountlist mlist tcsh
% cd /database
% sort data
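
To make the two-column mapping concrete, here is a rough sketch that writes the mount list above to a file and translates a logical name by hand. The resolve_mount function is purely illustrative and is not part of PFS. In particular, taking the first matching entry and matching only on whole path components are our assumptions about the semantics, not documented behavior.

```shell
# Write the example mount list (logical name, then physical path).
mlist=$(mktemp)
printf '%s\t%s\n' /database /http/www.cs.wisc.edu/db > "$mlist"

# Hypothetical helper: translate a logical path using a two-column mount list.
# $1 = mount list file, $2 = logical path. Unmatched paths are printed as-is.
resolve_mount() {
    awk -v path="$2" '
        NF >= 2 {
            prefix = $1
            n = length(prefix)
            # Match the logical name exactly, or as a leading directory.
            if (path == prefix || substr(path, 1, n + 1) == (prefix "/")) {
                print $2 substr(path, n + 1)
                found = 1
                exit
            }
        }
        END { if (!found) print path }
    ' "$1"
}

resolve_mount "$mlist" /database/data
```

With the mount list shown above, the last line prints /http/www.cs.wisc.edu/db/data, which is the physical path behind the "sort data" example.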

Command Line Options

pfsrun has several command line options:
  • -debug Turn on debugging messages.
  • -debugfile <file> Send all debugging messages to this file.
  • -trace Turn on tracing messages. This produces a trace of all I/O operations performed by the application, in a manner similar to strace(1) or truss(1).
  • -tracefile <file> Send all tracing messages to this file.
  • -mountlist <file> Use the given file as a mount list.
  • -clean <secs> Periodically reset memory maps at this interval.
  • -blocksize <bytes> Hint that applications should use this block size.
  • -help Show the known options.
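
Options may be combined. As a sketch, the wrapper below runs a command under pfsrun with tracing sent to a file. The function name pfs_traced and the PFS_DRY_RUN variable are invented for this example. We also assume that -trace should be given together with -tracefile, and that options come before the program name, as in the mount list example.

```shell
# Hypothetical wrapper: run a command under pfsrun with I/O tracing to a file.
# $1 = trace file; remaining arguments = the program and its arguments.
# Set PFS_DRY_RUN=1 to print the command line instead of running it.
pfs_traced() {
    tracefile=$1
    shift
    if [ "${PFS_DRY_RUN:-0}" = 1 ]; then
        echo pfsrun -trace -tracefile "$tracefile" "$@"
    else
        pfsrun -trace -tracefile "$tracefile" "$@"
    fi
}
```

For example, pfs_traced /tmp/sort.trace sort /database/data would leave a record of the I/O operations performed by sort in /tmp/sort.trace.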

Environment Variables

Several environment variables are available in addition to the command line options of pfsrun.