# Understanding Key Metrics for Evaluating Regression Models


## Chapter 1: Introduction to Regression Metrics

In the realm of machine learning, regression models are pivotal for forecasting continuous values. These models learn from a set of input features (X) to predict a target variable (y). A common example of a continuous target is the price of a house. The most straightforward regression technique is linear regression, particularly simple linear regression, which involves a single feature and a target variable and is expressed as:

yᵢ = β₀ + β₁xᵢ + εᵢ

In a basic scenario, consider predicting the miles per gallon (MPG) of a car based on its horsepower. In this case:

- yᵢ denotes the MPG (target variable).
- xᵢ denotes the horsepower (feature).
- β₁ is the coefficient (slope) applied to the horsepower.
- β₀ is the y-intercept of the regression line.
- εᵢ represents the error term, capturing variation the linear relationship does not explain.

Using the Seaborn library in Python, we can visualize a simple linear regression for such a problem.
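A minimal sketch of such a visualization is shown below. Since loading a real dataset would require a download, it uses synthetic horsepower/MPG data as a stand-in; the variable names and the noise parameters are illustrative assumptions, not values from the original article.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic horsepower/MPG data (a stand-in for a real car dataset):
# MPG decreases roughly linearly with horsepower, plus some noise.
rng = np.random.default_rng(0)
horsepower = rng.uniform(50, 230, size=100)
mpg = 40 - 0.12 * horsepower + rng.normal(0, 2, size=100)

# regplot draws the scatter of observations plus the fitted regression line.
ax = sns.regplot(x=horsepower, y=mpg)
ax.set(xlabel="horsepower", ylabel="mpg")
plt.savefig("mpg_regression.png")

# The same fit via ordinary least squares: polyfit returns [slope, intercept],
# i.e. estimates of beta_1 and beta_0.
beta_1, beta_0 = np.polyfit(horsepower, mpg, deg=1)
print(f"beta_0 = {beta_0:.2f}, beta_1 = {beta_1:.3f}")
```

The fitted slope should come out close to the −0.12 used to generate the data, with the intercept near 40.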

The regression line in the plot illustrates predicted values. Notably, some data points deviate from this line, indicating model error. The primary objective of linear regression is to minimize this error by optimizing the values of β₀ and β₁.

As with any machine learning task, various metrics exist to evaluate a model's performance, particularly how effectively it reduces overall error. Below, I will discuss three vital metrics employed to assess regression models.

### Section 1.1: R-squared (R²)

R-squared, or the coefficient of determination, gauges the closeness of observed values to the fitted regression line. It reflects the extent of variation in the dependent variable explained by the model. Observing the previous graph, we identify the regression line, yet many actual data points do not align perfectly with it; the discrepancies are termed residuals.

To compute R-squared, we take the residual for each observation (the difference between the observed value yᵢ and the predicted value ŷᵢ), square it, and sum the squares to measure the unexplained variance. Dividing by the total variance of the observations around their mean ȳ and subtracting from 1 gives the usual formula:

R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²

For ordinary least squares with an intercept, R-squared values range from 0 to 1, where a value of 1 indicates a model that accounts for all of the variance in the dependent variable. The following images compare regression lines for different independent variables. For instance, the R-squared value for weight vs. MPG is 0.69, meaning the model explains 69% of the variance, while the value for acceleration vs. MPG is considerably lower at 0.18, explaining only 18%.
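The calculation can be sketched in a few lines of NumPy. The observed and predicted values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical observed MPG values and a model's predictions for them.
y_true = np.array([21.0, 23.5, 18.0, 30.1, 26.4])
y_pred = np.array([20.2, 24.1, 19.0, 29.0, 25.5])

ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variance (squared residuals)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variance around the mean
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

Because the predictions track the observations closely here, the resulting R-squared is high (about 0.95).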

R-squared should not be the sole measure for model evaluation: it says nothing about whether the model is biased, so a biased model can still produce a high R-squared. It is therefore essential to consider additional performance metrics.

### Section 1.2: Root Mean Squared Error (RMSE)

Residuals serve as a means to quantify error in a regression model. RMSE evaluates the dispersion of these residuals, providing a standardized method to gauge overall error in the model.

To calculate RMSE, one first determines the mean squared error (MSE) by squaring the residuals, summing them, and dividing by the number of observations, n (statistical texts sometimes divide by n − 2 instead, the residual degrees of freedom in simple linear regression). RMSE is the square root of this value:

RMSE = √( Σ(yᵢ − ŷᵢ)² / n )
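A short sketch of the calculation, reusing hypothetical observed and predicted values chosen only for illustration:

```python
import numpy as np

# Hypothetical observed MPG values and a model's predictions for them.
y_true = np.array([21.0, 23.5, 18.0, 30.1, 26.4])
y_pred = np.array([20.2, 24.1, 19.0, 29.0, 25.5])

mse = np.mean((y_true - y_pred) ** 2)  # average squared residual
rmse = np.sqrt(mse)                    # back in the units of the target
print(round(rmse, 3))
```

Here the residuals are all roughly one MPG in magnitude, so the RMSE comes out just under 0.9 MPG.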

A lower RMSE indicates a better fit to the data, and because RMSE shares the same units as the dependent variable, it is easy to interpret. Note that squaring the residuals penalizes large errors disproportionately, which makes RMSE particularly useful when large errors are especially undesirable, but also sensitive to outliers.

### Section 1.3: Mean Absolute Error (MAE)

MAE measures the average absolute deviation of predicted values from observed ones. Like RMSE, its units match those of the dependent variable, but it treats errors linearly: each additional unit of error increases the MAE by the same amount, regardless of the error's size.

Calculating MAE involves summing the absolute values of the residuals and dividing by the number of observations:

MAE = Σ|yᵢ − ŷᵢ| / n
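The same hypothetical observed/predicted pairs used above make the calculation concrete:

```python
import numpy as np

# Hypothetical observed MPG values and a model's predictions for them.
y_true = np.array([21.0, 23.5, 18.0, 30.1, 26.4])
y_pred = np.array([20.2, 24.1, 19.0, 29.0, 25.5])

mae = np.mean(np.abs(y_true - y_pred))  # average absolute residual
print(round(mae, 3))
```

Because no residual is squared, the MAE (0.88 MPG) is slightly smaller than the RMSE for the same data: squaring gives the larger residuals extra weight.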

As with other metrics, there isn't a universally accepted threshold for MAE. Therefore, comparisons to baseline models are crucial.

In summary, selecting the appropriate metric for evaluating regression models depends on various factors, including the training data and the model's intended application. It is common practice to employ multiple metrics for a comprehensive performance assessment against benchmarks.

## Chapter 2: Additional Resources

In the video "Episode 6: Simple and Basic Evaluation Metrics For Regression," viewers can explore fundamental evaluation metrics used in regression, presented in an easy-to-understand format.

The video titled "Machine Learning Regression Models Metrics" further elaborates on various metrics applicable to regression models, enriching the viewer's knowledge on this essential topic.

Thank you for reading!