Receiver Operating Characteristic (ROC) Curve

Purpose

Understanding ROC Curve

ROC

ROC stands for Receiver Operating Characteristic. The name comes from electrical engineering around the Second World War, when electrical and radar engineers used such curves to detect enemy planes. Since then, the concept has found applications in many fields, machine learning being one of the more recent.

Sensitivity

True Positive Rate

$\frac{\text{True Positive}}{\text{True Positive + False Negative}}$

Specificity

True Negative Rate

$\frac{\text{True Negative}}{\text{True Negative + False Positive}}$

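There is a trade-off between specificity and sensitivity. The two extremes are a model that predicts every label as positive and a model that predicts every label as negative:

All labels predicted as	Sensitivity	Specificity
Positive	100%	0%
Negative	0%	100%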
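The two extreme rows of the table can be checked with a small sketch. The label list `y_true` and the helper names below are illustrative assumptions, not part of the original post.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical ground truth: 3 positives and 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0]

# Extreme classifier 1: predict every label as positive.
tp, fp, tn, fn = confusion_counts(y_true, [1] * len(y_true))
sens_all_pos, spec_all_pos = sensitivity(tp, fn), specificity(tn, fp)

# Extreme classifier 2: predict every label as negative.
tp, fp, tn, fn = confusion_counts(y_true, [0] * len(y_true))
sens_all_neg, spec_all_neg = sensitivity(tp, fn), specificity(tn, fp)

print(sens_all_pos, spec_all_pos)
print(sens_all_neg, spec_all_neg)
```

Any useful classifier sits somewhere between these two extremes, which is the range the ROC curve traces out.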

ROC Curve

A plot of the TPR (True Positive Rate) against the FPR (False Positive Rate) as the classification threshold varies.
- TPR is Sensitivity
- FPR is 1 - Specificity
The ROC curve shows the trade-off between sensitivity and specificity.
- For a given model, moving the threshold to increase sensitivity generally decreases specificity.
100% sensitivity means the model has a 0% false negative rate.
- TP/(TP + FN) = 1 means that FN = 0.
Sensitivity and specificity are inversely related.
- As the threshold moves, one typically goes up while the other goes down.
The ROC curve shows how well the test separates the two classes: the closer the curve is to the top and left-hand borders, the more accurate the test.
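To make the threshold sweep concrete, here is a minimal sketch that builds the ROC points by hand. The function names `roc_points` and `trapezoid_auc`, and the four labels and scores, are assumptions chosen for illustration; in practice a library routine such as scikit-learn's `roc_curve` does this work.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Start above every score (nothing predicted positive), then use each
    # distinct score as a threshold in descending order.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for thr in thresholds:
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted scores for four samples.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
auc = trapezoid_auc(points)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc:.2f}")
```

Each lower threshold can only add predicted positives, so TPR and FPR both rise or stay flat along the curve. That is the trade-off described above: sensitivity is gained at the cost of specificity.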

Question

100 people are tested for a disease. 15 of them have the disease and 85 do not. Of the 15 who have the disease, 10 are diagnosed with it; of the 85 who do not have it, 40 are diagnosed with it anyway. Find the number of true negatives, the sensitivity, and the positive predictive value.

	Actual Positive	Actual Negative	Total
Predicted Positive	10	40	50
Predicted Negative	5	45	50
Total	15	85	100

True Negatives = 85 - 40 = 45
Sensitivity = TP/(TP + FN) = 10/(10+5) = 10/15 = 2/3 = 66.67%
Specificity = TN/(TN + FP) = 45/(45+40) = 45/85 = 52.94%
Positive Predictive Value = TP/(TP + FP) = 10/(10+40) = 10/50 = 1/5 = 20%
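The arithmetic can be double-checked with a few lines of Python. The counts come straight from the table above; the variable names are only illustrative.

```python
# Counts from the confusion matrix in the question.
tp, fp = 10, 40   # predicted positive: actually diseased / actually healthy
fn, tn = 5, 45    # predicted negative: actually diseased / actually healthy

sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
ppv = tp / (tp + fp)                  # positive predictive value (precision)
false_positive_rate = 1 - specificity # x-axis value on the ROC plot

print(f"Sensitivity: {sensitivity:.2%}")
print(f"Specificity: {specificity:.2%}")
print(f"PPV:         {ppv:.2%}")
print(f"FPR:         {false_positive_rate:.2%}")
```

This single confusion matrix corresponds to one point on an ROC curve, at roughly (FPR, TPR) = (0.47, 0.67).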

ROC Curve - Python

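# Assumes a fitted classifier `logreg` and held-out data `X_test`, `y_test` from earlier steps.
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain script.
%matplotlib inline
# Use the predicted probability of the positive class for both the AUC and the curve.
# Hard labels from predict() would collapse the scores to a single threshold.
y_score = logreg.predict_proba(X_test)[:, 1]
logit_roc_auc = roc_auc_score(y_test, y_score)
fpr, tpr, thresholds = roc_curve(y_test, y_score)
plt.figure()
plt.plot(fpr, tpr, label='Logistic Regression (area = %0.2f)' % logit_roc_auc)
# Diagonal reference line: the expected curve for random guessing.
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")
# Save before show(), since show() may clear the figure.
plt.savefig('Log_ROC')
plt.show()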
