Mastering Regression Evaluation Metrics: A Comprehensive Guide to Measuring Model Performance
This blog is all about making regression evaluation metrics easy to understand. We’ll break down MSE, MAD, RMSE, R² score, and adjusted R² score, helping you learn how to accurately judge your model’s performance and make meaningful improvements.
Overview
This article is structured to provide a comprehensive understanding of regression metrics, covering the following key aspects for each one:
- Metric Name
- Definition
- Formula
- Example
- Advantages
- Disadvantages
- Python Implementation
Regression Metrics
1. Mean Absolute Deviation (MAD)
Definition
The Mean Absolute Deviation (MAD) measures the average magnitude of errors in a set of predictions, without considering their direction. It’s the average over the test sample of the absolute differences between prediction and actual observation.
Formula
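With $y_i$ the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations:

$$\text{MAD} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$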
Example
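Take y_true = [3.0, 4.0, 5.0] and y_pred = [2.5, 4.1, 5.2] (the same values used in the Python snippet below). The absolute errors are 0.5, 0.1, and 0.2, so MAD = (0.5 + 0.1 + 0.2) / 3 ≈ 0.267.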
Advantages
- Easy to understand and calculate.
- Provides a clear interpretation in the same units as the data; the units are not squared or otherwise transformed.
Disadvantages
- Treats all errors equally; it does not penalize large errors more heavily than small ones.
- The absolute value function is not differentiable at zero, which makes it less convenient as a loss for gradient-based optimization.
Python Implementation
import numpy as np

def mean_absolute_deviation(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Example usage
y_true = np.array([3.0, 4.0, 5.0])
y_pred = np.array([2.5, 4.1, 5.2])
mad = mean_absolute_deviation(y_true, y_pred)
print("MAD:", mad)
Output
MAD: 0.2666666666666666
2. Mean Squared Error (MSE)
Definition
Mean Squared Error (MSE) is the average of the squares of the errors, where the error is the difference between the actual value and the predicted value.
It penalizes larger errors more than smaller ones.
Formula
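Using the same notation as above:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$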
Example
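For y_true = [3.0, 4.0, 5.0] and y_pred = [2.5, 4.1, 5.2], the squared errors are 0.25, 0.01, and 0.04, so MSE = (0.25 + 0.01 + 0.04) / 3 = 0.1.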
Advantages
- Penalizes larger errors more significantly.
- Useful when large errors are especially undesirable, since they dominate the score.
Disadvantages
- Sensitive to outliers; large errors have a disproportionate effect.
- The units are squared (e.g., metres become square metres), which makes the value harder to interpret.
Python Implementation
from sklearn.metrics import mean_squared_error
y_true = [3.0, 4.0, 5.0]
y_pred = [2.5, 4.1, 5.2]
mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse)
Output
MSE: 0.10
3. Root Mean Squared Error (RMSE)
Definition
Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors.
It provides an estimate of the standard deviation of the prediction errors.
Formula
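RMSE is simply the square root of MSE:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} = \sqrt{\text{MSE}}$$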
Example
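Continuing the running example, MSE = 0.1, so RMSE = √0.1 ≈ 0.316.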
Advantages
- Provides error magnitude in the same units as the original data.
- Useful for evaluating the precision of predictions.
- Smooth and differentiable almost everywhere, which makes it convenient for gradient-based optimization.
Disadvantages
- Sensitive to outliers and large errors.
Python Implementation
from sklearn.metrics import mean_squared_error
import numpy as np

y_true = [3.0, 4.0, 5.0]
y_pred = [2.5, 4.1, 5.2]
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
print("RMSE:", rmse)
Output
RMSE: 0.316
4. R² Score
Definition
The R² score, or coefficient of determination, indicates how well the model’s predictions approximate the actual values.
It measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
Formula
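With $\bar{y}$ the mean of the actual values:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$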
Example
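For y_true = [3.0, 4.0, 5.0] and y_pred = [2.5, 4.1, 5.2], the residual sum of squares is 0.25 + 0.01 + 0.04 = 0.3. The mean of y_true is 4.0, so the total sum of squares is (3 − 4)² + (4 − 4)² + (5 − 4)² = 2.0. Hence R² = 1 − 0.3 / 2.0 = 0.85.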
Advantages
- Provides an overall measure of how well the model explains the variability of the data.
- Has a maximum of 1, which indicates a perfect fit; a score of 0 means the model does no better than always predicting the mean.
Disadvantages
- May be misleading for models with many predictors (R² never decreases when a predictor is added) or when the relationship is not linear; it can also be negative when the model fits worse than a horizontal line at the mean.
Python Implementation
from sklearn.metrics import r2_score
y_true = [3.0, 4.0, 5.0]
y_pred = [2.5, 4.1, 5.2]
r2 = r2_score(y_true, y_pred)
print("R2 Score:", r2)
Output
R2 Score: 0.85
5. Adjusted R² Score
Definition
The Adjusted R² Score adjusts the R² score for the number of predictors in the model.
It provides a more accurate measure of model performance when multiple predictors are used.
Formula
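With $n$ the number of observations and $p$ the number of predictors:

$$R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$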
Example
Assuming n = 3 observations and p = 1 predictor, and using the same data as the Python example below (which yields R² = 0.85): Adjusted R² = 1 − (1 − 0.85) × (3 − 1) / (3 − 1 − 1) = 1 − 0.15 × 2 = 0.70.
Advantages
- Provides a more accurate measure of goodness-of-fit for models with multiple predictors.
- Adjusts for the complexity of the model.
Disadvantages
- Can be complex to calculate manually for large datasets.
- Not meaningful when the number of observations is very small relative to the number of predictors (n − p − 1 ≤ 0).
Python Implementation
from sklearn.metrics import r2_score

def adjusted_r2_score(y_true, y_pred, n, p):
    """
    Calculate the Adjusted R² Score.

    Parameters:
        y_true (list or array): Actual values
        y_pred (list or array): Predicted values
        n (int): Number of observations
        p (int): Number of predictors

    Returns:
        float: Adjusted R² Score
    """
    r2 = r2_score(y_true, y_pred)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return adj_r2

# Example data
y_true = [3.0, 4.0, 5.0]
y_pred = [2.5, 4.1, 5.2]

# Number of observations and predictors
n = len(y_true)
p = 1  # Assuming 1 predictor

# Calculate Adjusted R² Score
adj_r2 = adjusted_r2_score(y_true, y_pred, n, p)
print("Adjusted R² Score:", adj_r2)
Output
Adjusted R² Score: 0.7
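As a quick recap, all five metrics can be computed on the same data in one pass. This is a minimal sketch that assumes scikit-learn is available and uses its `mean_absolute_error`, which matches the MAD definition used in this article:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Same data as the per-metric examples above
y_true = np.array([3.0, 4.0, 5.0])
y_pred = np.array([2.5, 4.1, 5.2])

mad = mean_absolute_error(y_true, y_pred)  # MAD as defined above
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

n, p = len(y_true), 1  # 1 predictor assumed, as in Section 5
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"MAD: {mad:.3f}  MSE: {mse:.3f}  RMSE: {rmse:.3f}")
print(f"R²: {r2:.3f}  Adjusted R²: {adj_r2:.3f}")
```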
Desirable Values of Regression Evaluation Metrics for a Better Model
- MAD, MSE, RMSE: lower is better, with 0 indicating perfect predictions.
- R² and Adjusted R²: higher is better, with values close to 1 indicating a good fit.
— — — — — — — — — -Thank You! — — — — — — — — —
Thank you for taking the time to read my article. I hope you found it useful and informative. Your support means a lot, and I appreciate you joining me on this journey of exploration and learning. If you have any questions or feedback, feel free to reach out!
— — — — — — — — — Contact — — — — — — — — — — —
LinkedIn - https://www.linkedin.com/in/md-tahseen-equbal-/
GitHub - https://github.com/Md-Tahseen-Equbal
Kaggle - https://www.kaggle.com/mdtahseenequbal