PFS Technical Manual
Version 0.9, 8 October 2002
Front Matter
PFS is Copyright (C) 2001-2002 Douglas Thain.
This program is released under the GNU General Public License.
See the file COPYING for details.
This manual may be out of date. Please check the PFS Web Page for the most recent version.
Overview
The Pluggable File System (PFS) is a tool for attaching old programs to new storage systems. PFS makes a user-level storage system appear as a file system to a legacy application. PFS does not require any special privileges, any recompiling, or any change whatsoever to existing programs. It can be used by normal users doing normal tasks. For example, an HTTP service is presented to vi like so:
% vi /http/www.cs.wisc.edu/index.html
It's that simple.
PFS is useful to developers of distributed systems, because it allows rapid deployment of new code to real applications and real users who do not have the time, skill, or permissions to build a kernel-level filesystem.
PFS is useful to users of distributed systems, because it frees them from two unpleasant options: (1) rewriting code to work with new systems; (2) relying on a remote administrator to trust and install new software.
PFS currently supports the following distributed systems: HTTP, FTP (anonymous and authenticated), GSIFTP, Globus GASS, NeST, and Kangaroo.
We are continually adding support for new protocols. If you would like to contribute a protocol module, we would be happy to accept it!
Installation
Supported Platforms
PFS ought to run on almost any POSIX-compliant system. We use it daily on Red Hat Linux 7.2 machines. It has been tested to a lesser extent on Solaris 2.8 machines. It will likely compile with small changes on other platforms. Your mileage may vary.
Installation
First, download a PFS distribution from the PFS home page. You may download either a source package or a binary distribution. Please skip to the relevant section.
Source Installation
Building PFS from source is quite complicated.
It is necessarily complicated because PFS is a multiplexer
between many different kinds of systems.
You probably don't want to do this.
However, if you really must proceed:
- Download and install dttools
- Download and install Bypass
- Install any of the optional systems you want PFS to 'mate' with. Each may be included or excluded as you see fit. If you have no external packages, then only HTTP support will be included.
- Kangaroo
- FTP-Lite
- Globus
- NeST
- Unpack the PFS tarball in a scratch directory:
% gunzip pfs.tar.gz
% tar xvf pfs.tar
- Now, you must run a rather complicated 'configure' step. For each of the packages you have installed, give a --with-XXX-path option, something like this:
% cd pfs
% ./configure --prefix /home/fred/pfs --with-bypass-path /home/fred/bypass --with-globus-path /usr/local/globus ...
- Finally, build and install:
% make
% make install
At long last, you're done! Please skip to "Runtime Setup."
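The configure step above can be assembled mechanically. The following is only a sketch, assuming a Bourne-style shell: the helper name add_pkg is hypothetical, the directories are the ones used in the example above, and only the two --with-XXX-path options shown in this manual (bypass and globus) are used. It prints the configure command rather than running it, so you can inspect it first.

```shell
# Append "--with-NAME-path DIR" to $args, but only if DIR actually exists.
# add_pkg is a hypothetical helper, not part of the PFS distribution.
add_pkg() {
    if [ -d "$2" ]; then
        args="$args --with-$1-path $2"
    fi
}

# Start with the install prefix, then add each package that is present.
args="--prefix $HOME/pfs"
add_pkg bypass "$HOME/bypass"
add_pkg globus /usr/local/globus

# Print the command for review instead of executing it.
echo "./configure$(printf ' %s' $args)"
```

Packages whose directories are missing are silently left out, which matches the rule that each optional system may be included or excluded as you see fit.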
Binary Installation
Simply unpack the tarball in any directory that you like.
% gunzip pfs-xxx-yyy.tar.gz
% tar xvf pfs-xxx-yyy.tar
That's all! Go on to "Runtime Setup."
Runtime Setup
Finally, you must set some environment variables to point your shell to PFS.
If you are using a C-like shell:
setenv PFS_INSTALL_DIR /home/fred/pfs
setenv PATH ${PFS_INSTALL_DIR}/bin:${PATH}
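If you are using a Bourne-like shell (sh, ksh, bash) instead, the equivalent setup might look like the sketch below. The install directory is the same example path used above; the final check is only a convenience and is not something PFS requires.

```shell
# Bourne-style equivalent of the setenv commands above.
PFS_INSTALL_DIR=/home/fred/pfs
PATH=${PFS_INSTALL_DIR}/bin:${PATH}
export PFS_INSTALL_DIR PATH

# Optional sanity check: report whether the shell can now find pfsrun.
if command -v pfsrun >/dev/null 2>&1; then
    echo "pfsrun found"
else
    echo "pfsrun not found; check PFS_INSTALL_DIR"
fi
```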
Special Considerations
Note to IRIX users:
PFS has trouble with some of the IRIX utilities. (See why below.) You'll have better luck with the GNU utilities. Put /usr/gnu/bin first in your path before trying this section.
Examples
To load PFS, you simply use the pfsrun command followed by any other UNIX program. For example, to load PFS for use with vi, run this command:
% pfsrun vi
Now, you may make use of PFS services as normal pathnames. Within vi, you can load a web page with the following command:
:r /http/www.yahoo.com/index.html
Of course, it can be clumsy to put pfsrun before every command you run, so try starting a shell with PFS already loaded:
% pfsrun tcsh
Now, you should be able to run any standard command using PFS filenames. Here are some examples to get you thinking:
% vi /http/www.cs.wisc.edu/condor
% cp /http/www.cnn.com /tmp/cnn
% grep Google /http/www.google.com
We have used HTTP for the examples, as it is the only service that we know everybody has. Of course, you may use all of the other supported systems in a similar manner. For example, to copy a file from an anonymous FTP server:
% cp /anonftp/ftp.cs.wisc.edu/RoadMap .
If you want to log in under a particular username, use the /ftp server, and you will be prompted for a name and a password:
% vi /ftp/gatekeeper.dec.com/README.ftp
gatekeeper.dec.com login:
password:
The FTP protocol sends names and passwords in the clear, so a more security-conscious approach is to use the GSIFTP protocol developed by the Globus project. You must have the Globus tools and an appropriate certificate in order to use this module.
% grid-proxy-init
% cp /gsiftp/some.where.com/README .
The variants of the FTP module all have good support for directories, so you may use tools like ls and the tab-completion feature of the shell:
% ls -l /gsiftp/mss.ncsa.uiuc.edu/
% cat /anonftp/ftp.cs.wisc.edu/ [TAB]
FTP support is provided by the FTP-Lite module, which is distributed separately.
The Globus GASS module also makes use of the above protocols, but stages and caches whole files. Notice that the specific protocol must be used in addition to /gass.
% grep Globus /gass/http/www.globus.org/index.html
The NeST module is more suited towards fine-grained access:
% sort /nest/nest.cs.wisc.edu/~johnbent/hosts
Finally, Kangaroo specializes in hiding latencies on outputs:
% gcc test.c -o /kangaroo/ftp.cs.wisc.edu/output.exe
Limitations
The major limitation of PFS is that it can only be applied to dynamically-linked programs. This includes the large majority of standard and commercial applications. However, a few codes remain statically linked, such as tools in /sbin on IRIX machines. The ldd utility can be used to tell the difference between the two. If dynamic, ldd will list the libraries used by an application. If static, ldd will report "not a dynamic executable."
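The ldd check can be wrapped in a small shell function, sketched here. The function name is_dynamic and the program /bin/ls are only illustrations. The sketch assumes an ldd that exits with a non-zero status for static or missing files, as GNU ldd does; other platforms may behave differently, in which case read the ldd output by eye instead.

```shell
# Succeed (status 0) if the program named by $1 appears dynamically linked.
# Assumption: ldd exits non-zero for static executables and missing files.
is_dynamic() {
    ldd "$1" >/dev/null 2>&1
}

prog=/bin/ls   # hypothetical program to check
if is_dynamic "$prog"; then
    echo "$prog: dynamically linked, PFS should be able to attach"
else
    echo "$prog: static or unreadable, PFS cannot attach"
fi
```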
A more subtle limitation is due to the fact that most distributed systems do not support all of the possible file operations. For example, the HTTP module provides a bogus result for list operations. There is no protocol mechanism for getting a directory:
% ls -l /http/www.yahoo.com/index.html
-rwxrwxrwx 1 thain 23330 0 Dec 31 1969 /http/www.yahoo.com/index.html
% ls /http/www.yahoo.com/
/http/www.yahoo.com
PFS implements a large portion of the POSIX functionality.
We frequently use it in a batch setting to deal with scientific
applications. However, it certainly does not deal with every
last tricky corner of the POSIX interface. Please see the
bugs and surprises web page if you are having trouble or
would like to know more.
Mount Lists
Arbitrary storage devices may be spliced into your view of the
filesystem by way of a mountlist. A mountlist is similar
to /etc/fstab in UNIX and describes how logical names may be
mapped to physical devices. A mountlist is simply a file with two
columns. The first column gives a logical directory or file name,
while the second gives the physical path that it must be connected
to.
For example, if a database is stored on a web server at the address
/http/www.cs.wisc.edu/db, you may splice it into the
filesystem under /database with a mount list like this:
/database /http/www.cs.wisc.edu/db
If the mountlist is stored in a file named mlist, invoke
PFS as follows:
% pfsrun -mountlist mlist
% cd /database
% sort data
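A mountlist can be written and checked from the shell before it is handed to pfsrun. This is a sketch only: the first entry is the one from the example above, while the second entry (/mirror) is a hypothetical mapping made up for illustration. The awk check simply confirms the two-column layout described above and says nothing about whether the physical paths are reachable.

```shell
# Write a two-column mountlist: logical name, then physical path.
# The /mirror entry is hypothetical and only illustrates a second line.
cat > mlist <<'EOF'
/database /http/www.cs.wisc.edu/db
/mirror /anonftp/ftp.cs.wisc.edu
EOF

# Flag any non-empty line that does not have exactly two columns.
awk 'NF > 0 && NF != 2 { print FILENAME ": line " NR ": expected 2 columns"; bad = 1 }
     END { exit bad }' mlist && echo "mlist looks well-formed"
```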
Command Line Options
pfsrun has several command line options:
-debug Turn on debugging messages.
-debugfile <file> Send all debugging messages to this file.
-trace Turn on tracing messages. This produces a trace of all I/O operations performed by the application, in a similar manner to strace(1) or truss(1).
-tracefile <file> Send all tracing messages to this file.
-mountlist <file> Use the given file as a mount list.
-clean <secs> Periodically reset memory maps at this interval.
-blocksize <bytes> Hint that applications should use this block size.
-help Show the known options.
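As an illustration of combining options, the sketch below wraps pfsrun in a small Bourne-shell function that sends an I/O trace to a file. The function name pfs_trace is hypothetical. It assumes that -trace and -tracefile may be given together and that options come before the program name, as in the mountlist example; neither point is confirmed here beyond the option list above.

```shell
# Run a command under PFS with tracing written to a file.
# Usage: pfs_trace <tracefile> <command> [args...]
# pfs_trace is a hypothetical helper, not part of the PFS distribution.
pfs_trace() {
    tracefile=$1
    shift
    # Assumption: pfsrun accepts its own options before the program name.
    pfsrun -trace -tracefile "$tracefile" "$@"
}
```

With PFS on your path, a call such as pfs_trace /tmp/pfs.trace cat /http/www.cs.wisc.edu/index.html would then leave the trace in /tmp/pfs.trace for later inspection.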
Environment Variables
Several environment variables are available in addition to the command line options of pfsrun.
- PFS_FTP_DEBUG
If set, then all FTP interactions will be sent to the standard error stream
in great detail.
- PFS_xxx_BLOCK_SIZE
Varying the transfer block size can have a dramatic effect on system performance. This variable allows the user to control what block size PFS suggests that the standard I/O library use. For example, to recommend the block size to be used with Kangaroo, set PFS_KANGAROO_BLOCK_SIZE to the number of bytes. To give the same setting to all modules, simply use PFS_BLOCK_SIZE. If this variable is not set, a block size of 32768 is assumed.
- PFS_HTTP_PROXY
If you require a proxy server to access the web, set this variable
to the hostname of the proxy server.
- PFS_HTTP_PROXY_PORT
If you require a proxy server to access the web, set this variable
to the port number of the proxy server.
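A typical session might set several of these variables before starting pfsrun. The sketch below uses Bourne-shell syntax; C-shell users would use setenv as in Runtime Setup. The proxy hostname and port are placeholders, and the 64 KB Kangaroo block size is an arbitrary illustration rather than a recommended value.

```shell
# Placeholder proxy settings; substitute your site's real proxy.
PFS_HTTP_PROXY=proxy.example.com
PFS_HTTP_PROXY_PORT=3128

# Suggest a 64 KB block size for Kangaroo only (the default is 32768).
PFS_KANGAROO_BLOCK_SIZE=$((64 * 1024))

# Show FTP interactions on standard error; any value counts as "set".
PFS_FTP_DEBUG=1

export PFS_HTTP_PROXY PFS_HTTP_PROXY_PORT PFS_KANGAROO_BLOCK_SIZE PFS_FTP_DEBUG
```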