Pagerank Computation and Keyword Search on Distributed Systems and P2P Networks
| Sorted by Date | Classified by Publication Type | Classified by Project |
Karthikeyan Sankaralingam, Madhulika Yalamanchi, Simha Sethumadhavan, and James C. Browne. Pagerank Computation and Keyword Search on Distributed Systems and P2P Networks. Journal of Grid Computing, 1(3):291-307, 2003.
Download
Abstract
This paper presents a fully distributed computation for Google'spagerank algorithm. The computation is based on solution of the matrixequation defining pageranks by a distributed implementation ofasynchronous iteration. Pageranks for the documents stored on a webserver or on a host in a peer-to-peer network are computed in placeand stored with the documents. The matrix is never assembled and nocrawls of the web are required. Continuously accurate pageranks areenabled by incremental computation of pageranks for documents as theyare inserted onto a network storage host and incremental recomputationof pageranks when documents are deleted. Intrahost and intradomaindominance of document link structure is naturally exploited by thedistributed asynchronous iteration algorithm.Three implementations: (i) a simulation which was previously reported,(ii) an implementation of the algorithm in a peer-to-peercomputational system and (iii) an embedding of the computation in webservers, are described. Application of the three implementations tothree different workloads, two constructed following power law networkmodels for link distributions and one derived from the Governmentdocument database are reported. Convergence for computation of acomplete set of pageranks is rapid: 1% accuracy in 10 or fewermessages per document. Incremental computation of pageranks resultingfrom addition or deletion of documents also converges rapidly, usuallyrequiring 10 or fewer messages per document.Coupling locally stored pageranks with the documents in a peer-to-peernetwork dramatically diminishes the volume of data which must betransmitted to satisfy keyword searches in peer-to-peer networks. Theweb server implementation shows that the distributed algorithm can beused to enable web servers to compute pageranks for the documents theystore and thus potentially enable effective keyword searches for thedocuments stored on the web servers of intranets by utilizing unusedprocessing power of the web servers.
Additional Information
This is a test of the extra info broadcasting system.
BibTeX
@Article{grid2003, author = "Karthikeyan Sankaralingam and Madhulika Yalamanchi and Simha Sethumadhavan and James C. Browne", title = "{Pagerank Computation and Keyword Search on Distributed Systems and P2P Networks}", journal = "Journal of Grid Computing", volume = "1", number = "3", year = {2003}, pages = "291-307" abstract = { This paper presents a fully distributed computation for Google's pagerank algorithm. The computation is based on solution of the matrix equation defining pageranks by a distributed implementation of asynchronous iteration. Pageranks for the documents stored on a web server or on a host in a peer-to-peer network are computed in place and stored with the documents. The matrix is never assembled and no crawls of the web are required. Continuously accurate pageranks are enabled by incremental computation of pageranks for documents as they are inserted onto a network storage host and incremental recomputation of pageranks when documents are deleted. Intrahost and intradomain dominance of document link structure is naturally exploited by the distributed asynchronous iteration algorithm. Three implementations: (i) a simulation which was previously reported, (ii) an implementation of the algorithm in a peer-to-peer computational system and (iii) an embedding of the computation in web servers, are described. Application of the three implementations to three different workloads, two constructed following power law network models for link distributions and one derived from the Government document database are reported. Convergence for computation of a complete set of pageranks is rapid: 1\% accuracy in 10 or fewer messages per document. Incremental computation of pageranks resulting from addition or deletion of documents also converges rapidly, usually requiring 10 or fewer messages per document. Coupling locally stored pageranks with the documents in a peer-to-peer network dramatically diminishes the volume of data which must be transmitted to satisfy keyword searches in peer-to-peer networks. The web server implementation shows that the distributed algorithm can be used to enable web servers to compute pageranks for the documents they store and thus potentially enable effective keyword searches for the documents stored on the web servers of intranets by utilizing unused processing power of the web servers. }, bib_dl_pdf = "http://www.cs.wisc.edu/~karu/docs/papers/utexas/pagerank-journal-version.pdf", bib_dl="http://springerlink.metapress.com/content/x9841h68u4161x3v/?p=0019c14ae2c34132ab2bc615fcb16ad8&pi=4", bib_pubtype = {Journal}, bib_rescat = {proj-opinion}, bib_extra_info = {This is a test of the extra info broadcasting system.} }
Generated by bib.pl (written by Patrick Riley ) on Sun Sep 26, 2021 16:14:28 time=1207019082