The ADvanced Systems Laboratory (ADSL)

Analysis of HDFS Under HBase: A Facebook Messages Case Study
      
Tyler Harter,
Dhruba Borthakur*,
Siying Dong*,
Amitanand Aiyer*,
Liyin Tang*,
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau

Abstract:
We present a multilayer study of the Facebook Messages stack,
  which is based on HBase and HDFS.  We collect and analyze HDFS
  traces to identify potential improvements, which we then evaluate
  via simulation.  Messages represents a new HDFS workload: whereas
  HDFS was built to store very large files and receive
  mostly-sequential I/O, 90% of files are smaller than 15MB and I/O
  is highly random.  We find hot data is too large to easily fit in
  RAM and cold data is too large to easily fit in flash; however, cost
  simulations show that adding a small flash tier improves performance
  more than equivalent spending on RAM or disks.  HBase's layered
  design offers simplicity, but at the cost of performance; our
  simulations show that network I/O can be halved if compaction
  bypasses the replication layer.  Finally, although Messages is
  read-dominated, several features of the stack (i.e., logging,
  compaction, replication, and caching) amplify write I/O, causing
  writes to dominate disk I/O.
 