The Relationship Between Precision-Recall and ROC Curves
Jesse Davis, Mark Goadrich
Receiver Operator Characteristic (ROC) curves and Precision-Recall (PR) curves are commonly used to present results for binary decision problems in machine learning. When the class distribution is close to uniform, ROC curves have many desirable properties. However, when dealing with a highly skewed dataset, PR curves give a more accurate picture of an algorithm's performance. We show that a deep connection exists between ROC space and PR space. We prove that a curve dominates in ROC space if and only if it dominates in PR space. An important corollary to this proof is the notion of an achievable PR curve, and we show an efficient algorithm for computing the achievable PR curve. While it cannot be called a convex hull, this curve has properties much like the convex hull in ROC space. Finally, we show that differences in the two types of curves are significant for algorithm design. For example, in PR space it is incorrect to linearly interpolate between points. Furthermore, an algorithm that optimizes the area under the ROC curve is not guaranteed to optimize the area under the PR curve.
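The claim about interpolation can be illustrated with a small sketch. It assumes that intermediate points between two classifiers are obtained by interpolating the underlying true-positive and false-positive counts, and then recomputing precision and recall from those counts. The function names and the confusion-matrix numbers below are hypothetical and chosen only for illustration; the abstract itself does not give a formula.

```python
def pr_point(tp, fp, total_pos):
    """Return (recall, precision) for given true/false positive counts."""
    return tp / total_pos, tp / (tp + fp)


def interpolate_pr(tp_a, fp_a, tp_b, fp_b, total_pos):
    """Interpolate from point A to point B by stepping through counts.

    Assumption: false positives grow at a constant rate per additional
    true positive between A and B (this is a straight line in ROC space).
    """
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for x in range(tp_b - tp_a + 1):
        tp = tp_a + x
        fp = fp_a + fp_per_tp * x
        points.append(pr_point(tp, fp, total_pos))
    return points


def linear_precision(recall, point_a, point_b):
    """Naive straight-line interpolation of precision in PR space."""
    (r_a, p_a), (r_b, p_b) = point_a, point_b
    return p_a + (recall - r_a) / (r_b - r_a) * (p_b - p_a)


# Hypothetical data: 20 positive examples in total.
# Point A: TP=5, FP=5.  Point B: TP=10, FP=30.
total_pos = 20
curve = interpolate_pr(5, 5, 10, 30, total_pos)

for recall, precision in curve:
    naive = linear_precision(recall, curve[0], curve[-1])
    print(f"recall={recall:.2f}  count-based={precision:.3f}  linear={naive:.3f}")
```

With these numbers, the count-based precision falls below the straight line at every intermediate recall value, so connecting the two points linearly would overstate performance in the region between them.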