# Summary: Busted Assumptions


The dataset we’ll be using for this tutorial is from Kaggle’s “House Prices: Advanced Regression Techniques” competition (LINK), as I am currently working on submitting my results. Because this is a regression problem, where we are tasked with predicting house prices, we need to check whether we have met all the major assumptions behind regression.

In general, if you have violated any of these assumptions, the results obtained from your model can be very misleading. Some violations are much more serious than others, but in every case we should take great care to process our data correctly.

Before checking many of the assumptions behind regression, we first have to fit a regression model (OLS in our case). This is because many assumption tests rely on the calculated residuals, or the error in our model. A residual is simply the difference between an actual value and the corresponding predicted value; it is signed, so it can be positive or negative. It is also important to remember that the data I have fit to our regression model has been thoroughly processed: imputing missing values, removing outliers, dealing with high and low cardinality, transforming for skew (both features and target), target encoding categorical features, and standardizing using StandardScaler().
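To make the definition of a residual concrete, here is a minimal sketch using NumPy. The prices below are made-up numbers for illustration only, not values from the Kaggle dataset.

```python
import numpy as np

# Hypothetical actual and predicted sale prices (illustrative values only).
actual = np.array([200_000.0, 150_000.0, 320_000.0])
predicted = np.array([210_000.0, 140_000.0, 320_000.0])

# A residual is actual minus predicted, so its sign tells us the direction
# of the error: negative means the model over-predicted.
residuals = actual - predicted

# The absolute distance discards that sign.
absolute_errors = np.abs(residuals)

print(residuals)
print(absolute_errors)
```

Keeping the sign matters for the assumption checks, since several of them look for patterns in whether the model over- or under-predicts.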

First, we add a constant column to the features so that the model estimates an intercept. When fitting the model, y_train is passed as a 2D array rather than a 1D one. Finally, we fit the model and print the results. I won’t go into explaining the results table here.

Linearity: Linear regression assumes there is a linear relationship between the target and each independent variable or feature. This is one of the most important assumptions, because violating it means your model is trying to fit a linear relationship to non-linear data, which will result in the model severely under-fitting. It is also important to check for outliers, as they can have a significant effect on this assumption.
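One way to see what a linearity violation does to a model is to fit a straight line to data that is linear and to data that is curved, then look for leftover structure in the residuals. The sketch below is not the article's method. It uses only NumPy and synthetic data, and the helpers `fit_line` and `curvature_signal` are hypothetical names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=500)
y_linear = 2.0 * x + rng.normal(scale=0.5, size=500)  # truly linear target
y_curved = x ** 2 + rng.normal(scale=0.5, size=500)   # quadratic target

def fit_line(x, y):
    """Least-squares line with an intercept; returns fitted values and residuals."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def curvature_signal(x, y):
    """Correlation between the residuals and the centered squared feature.

    Near 0 suggests the line captured the relationship; a large magnitude
    suggests curvature that the linear fit missed.
    """
    _, resid = fit_line(x, y)
    return np.corrcoef(resid, (x - x.mean()) ** 2)[0, 1]

signal_linear = curvature_signal(x, y_linear)
signal_curved = curvature_signal(x, y_curved)
print(f"linear data: {signal_linear:.3f}")
print(f"curved data: {signal_curved:.3f}")
```

For the linear target the residuals look like noise, while for the quadratic target they still track the squared feature closely, which is the under-fitting described above. In practice, a scatter plot of residuals against fitted values shows the same pattern visually.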

**Read the complete article at:** towardsdatascience.com