Building Data-Pipelines for High Performance Bulk Data Transfers in a Heterogeneous Grid Environment
Tevfik Kosar, George Kola, Miron Livny
The drastic increase in the data requirements of scientific applications, combined with an increasing trend towards collaborative research, has resulted in the need to transfer large amounts of data among the participating sites. The heterogeneous nature of the storage systems employed by the different sites makes transferring data among them a difficult problem. The general tendency has been either to use simple scripts, which require human intervention to deal with failures, or to dump data to tapes and mail them. We introduce a method to build and operate data-pipelines between mass-storage systems that lack a common interface. This method can be applied easily and efficiently to transfer data between various mass-storage systems. It needs no human intervention during transfers, and it recovers automatically from various kinds of storage system, network, and software failures, guaranteeing completion of the transfers.