Classification Evaluation Metrics: The Ultimate Guide to Accurate Predictions
This blog is completely dedicated to the crucial metrics used in classification problems. You might have come across problem statements where we have to use metrics other than the well-known ‘accuracy’ score. Let us try to understand the confusion matrix, accuracy, precision, recall, F1 score, the ROC-AUC curve, and when to use each.
Confusion Matrix:
We need a Confusion Matrix for classification evaluation because it provides a clear picture of how well a model is performing by showing not just the overall accuracy, but also where the model is making specific errors. It helps us understand:
- Correct predictions: how many instances the model classified correctly.
- Errors: how many times the model predicted incorrectly, broken down by error type (false positives and false negatives).
This detailed view helps in diagnosing issues with the model and improving its performance.
The confusion matrix is built from four basic counts:
- True Positive (TP): Number of correctly identified positive class instances
- False Positive (FP): Number of negative class instances wrongly identified as positive class instances
- True Negative (TN): Number of correctly identified negative class instances
- False Negative (FN): Number of positive class instances wrongly identified as negative class instances
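As a minimal sketch of how these counts can be obtained (using scikit-learn's confusion_matrix and the same example labels that appear in the Accuracy section below):
from sklearn.metrics import confusion_matrix
# Ground-truth labels and model predictions (same example as in the Accuracy section)
y_true = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 1]
# For binary labels {0, 1}, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f'TP={tp}, FP={fp}, TN={tn}, FN={fn}')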
CLASSIFICATION EVALUATION METRICS
1. Accuracy
Definition:
Accuracy is the ratio of correctly predicted instances (both true positives and true negatives) to the total instances. It provides a simple way to evaluate the performance of a classification model.
It is suitable for balanced data.
Formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where:
- TP = True Positives
- TN = True Negatives
- FP = False Positives
- FN = False Negatives
Advantage:
- Easy to understand and calculate.
- Useful when the classes are balanced (i.e., the number of instances in each class is roughly equal).
Disadvantage:
- Misleading for imbalanced datasets, where one class dominates the other.
- Accuracy does not distinguish between false positives and false negatives, so it ignores their different costs (see the short illustration after this list).
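As a quick illustration of this pitfall (with hypothetical, heavily imbalanced labels), a model that always predicts the majority class still reaches 95% accuracy while never detecting a single positive:
from sklearn.metrics import accuracy_score
# Hypothetical imbalanced dataset: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class
y_pred = [0] * 100
print(f'Accuracy: {accuracy_score(y_true, y_pred)}')  # 0.95, despite missing every positive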
When to Use:
- When you have a balanced dataset or when class distribution is not skewed.
Desirable Value:
- The higher the accuracy, the better the model.
Python Implementation:
from sklearn.metrics import accuracy_score
# Example: ground-truth labels and model predictions
y_true = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 1]
# Fraction of predictions that match the true labels
accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy}')
Output
Accuracy: 0.8
2. Precision
Definition:
Precision is the ratio of true positive predictions to the total positive predictions made by the model. It measures how many of the predicted positive instances are actually positive.
It is the metric to focus on when false positives are costly.
Formula:
Precision = TP / (TP + FP)
Advantage:
- Useful in scenarios where the cost of false positives is high (e.g., spam detection).
Disadvantage:
- Can be misleading if not considered alongside recall, especially in cases of imbalanced datasets.
When to Use:
- When you want to minimize the number of false positives, such as in medical diagnostics or fraud detection.
Desirable Value:
- The higher the precision, the better the model.
Python Implementation:
from sklearn.metrics import precision_score
# Example: reuses y_true and y_pred from the accuracy example above
precision = precision_score(y_true, y_pred)
print(f'Precision: {precision}')
Output
Precision: 0.8
3. Recall
Definition:
Recall (also known as sensitivity or true positive rate) is the ratio of true positive predictions to the total actual positives. It measures how many of the actual positive instances the model correctly identified.
It is the metric to focus on when false negatives are costly.
Formula:
Recall = TP / (TP + FN)
Advantage:
- Useful in scenarios where the cost of false negatives is high (e.g., detecting cancer).
Disadvantage:
- Can lead to high false positives if not balanced with precision.
When to Use:
- When missing a positive instance has a higher cost than incorrectly predicting a positive instance, such as in medical testing or security screening.
Desirable Value:
- The higher the recall, the better the model.
Python Implementation:
from sklearn.metrics import recall_score
# Example: reuses y_true and y_pred from the accuracy example above
recall = recall_score(y_true, y_pred)
print(f'Recall: {recall}')
Output
Recall: 0.8
4. F1 Score
Definition:
The F1 Score is the harmonic mean of precision and recall. It provides a single metric that balances both precision and recall, especially useful in cases of imbalanced datasets.
It is suitable for imbalanced data.
Formula:
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Advantage:
- Balances the trade-off between precision and recall.
- Useful when the classes are imbalanced and both precision and recall are important.
Disadvantage:
- Less interpretable compared to individual precision and recall scores.
When to Use:
- When you need a balance between precision and recall, especially in cases of imbalanced datasets.
Desirable Value:
- The higher the F1 score, the better the model.
Python Implementation:
from sklearn.metrics import f1_score
# Example: reuses y_true and y_pred from the accuracy example above
f1 = f1_score(y_true, y_pred)
print(f'F1 Score: {f1}')
Output
F1 Score: 0.8
5. AUC-ROC Curve
Definition:
The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) measures the model’s ability to distinguish between classes.
The ROC curve plots the true positive rate (recall) against the false positive rate.
The AUC represents the area under this curve.
It is also used for threshold selection.
Formula:
There is no single closed-form formula for AUC-ROC; the ROC curve is plotted across all classification thresholds as:
- x-axis: False Positive Rate (FPR) = FP / (FP + TN)
- y-axis: True Positive Rate (TPR) = TP / (TP + FN)
Advantage:
- Provides a comprehensive view of model performance across all classification thresholds.
- Useful for comparing different models.
Disadvantage:
- Can be less intuitive to interpret.
- A higher AUC-ROC doesn’t always translate to better real-world performance, especially with imbalanced datasets.
When to Use:
- When you want to evaluate the model’s ability to separate classes and compare the performance of different models.
Desirable Value:
- The higher the AUC, the better the model (1.0 is a perfect classifier; 0.5 is no better than random guessing).
Python Implementation:
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt
# Example: ground-truth labels and predicted probabilities (scores)
y_true = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6, 0.2, 0.3, 0.9, 0.5]
# AUC summarizes performance across all thresholds
auc = roc_auc_score(y_true, y_scores)
print(f'AUC: {auc:.2f}')
# ROC curve: one (FPR, TPR) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.show()
Output
AUC: 0.76 (the ROC curve plot is also displayed)
How to Select the Optimal Threshold Value:
Depending on which errors are more costly, raising the threshold reduces false positives while lowering it reduces false negatives. A common heuristic is to pick the threshold whose ROC point is closest to the top-left corner (FPR = 0, TPR = 1).
Steps:
- Build a DataFrame from the roc_curve output with columns for FPR, TPR, and threshold, then compute each point's distance from the top-left corner of the ROC plot:
df['distance'] = (df['fpr']**2 + (1 - df['tpr'])**2)**0.5
- Choose the threshold whose distance is the minimum. For the example above, that is a threshold of 0.35 (see the sketch below).
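The following is a minimal sketch of this procedure, assuming pandas is available and reusing the labels and scores from the AUC-ROC example above; the distance-to-corner rule is just one common heuristic.
import pandas as pd
from sklearn.metrics import roc_curve
# Reusing the example labels and scores from the AUC-ROC section
y_true = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.6, 0.2, 0.3, 0.9, 0.5]
# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
df = pd.DataFrame({'fpr': fpr, 'tpr': tpr, 'threshold': thresholds})
# Euclidean distance of each ROC point from the ideal corner (FPR = 0, TPR = 1)
df['distance'] = (df['fpr']**2 + (1 - df['tpr'])**2)**0.5
# The threshold whose point is closest to the top-left corner
print(df.loc[df['distance'].idxmin()])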
Summary
- Accuracy: a simple overall measure, best suited to balanced datasets.
- Precision: prioritize when false positives are costly (e.g., spam detection).
- Recall: prioritize when false negatives are costly (e.g., cancer detection).
- F1 Score: a balance of precision and recall, useful for imbalanced data.
- AUC-ROC: a threshold-independent measure for comparing models and selecting an operating threshold.
Enjoyed this article?
If you found this post helpful and insightful, please take a moment to like it. Your feedback helps me continue creating content that matters to you.
I’d love to hear your thoughts and questions — leave a comment below and let’s start a conversation!
For more articles on Data Science, follow my Medium page to stay updated with the latest content and updates. Your support means a lot!
Thank you for reading, and I look forward to connecting with you!