Photo by Stephan Henning on Unsplash

Linear Regressions

Introduction

Key concepts

Dependent Variable

Independent Variable

Covariance

  • $\sigma_{XY}$ = Covariance between $X$ and $Y$
  • $x_i$ = $i^{th}$ element of $X$
  • $y_i$ = $i^{th}$ element of $Y$
  • $n$ = number of data points ($X$ and $Y$ must have the same number of data points)
  • $\mu_x$ = mean of the independent variable $X$
  • $\mu_y$ = mean of the dependent variable $Y$
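  • If $\sigma_{XY}$ is positive, the variables are positively related: they tend to increase together.
  • If $\sigma_{XY}$ is negative, they are negatively related: one tends to decrease as the other increases.
  • If $\sigma_{XY}$ is zero, there is no linear relationship between the variables.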
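Here is a minimal sketch of the definitions above in plain Python. It assumes the population form of the covariance, $\sigma_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)$. The sample form divides by $n - 1$ instead. The function name `covariance` and the example data are illustrative and do not come from the original article.

```python
def covariance(x: list[float], y: list[float]) -> float:
    """Population covariance of x and y (divides by n, not n - 1)."""
    if len(x) != len(y) or not x:
        raise ValueError("X and Y must be non-empty and have the same number of data points")
    n = len(x)
    mu_x = sum(x) / n  # mean of X
    mu_y = sum(y) / n  # mean of Y
    # Average product of the deviations of each pair from its mean
    return sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / n


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(covariance(x, y))        # positive: X and Y rise together
print(covariance(x, y[::-1]))  # negative: Y falls as X rises
```

Python's standard library also provides `statistics.covariance` (Python 3.10+), which returns the sample covariance and therefore divides by $n - 1$.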

Correlation

  • $r$ = Pearson Correlation Coefficient
  • $x_i$ = $i^{th}$ element of $X$
  • $y_i$ = $i^{th}$ element of $Y$
  • $n$ = number of data points ($X$ and $Y$ must have the same number of data points)
  • $\mu_x$ = mean of the independent variable $X$
  • $\mu_y$ = mean of the dependent variable $Y$
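The formula itself is not shown here. A standard form consistent with the symbols listed is $r = \frac{\sum_{i}(x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i}(x_i - \mu_x)^2}\sqrt{\sum_{i}(y_i - \mu_y)^2}}$, which is the covariance divided by the product of the standard deviations of $X$ and $Y$. Below is a minimal pure-Python sketch of that form. `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between equally long sequences x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("X and Y need the same number of data points (at least 2)")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    dx = [xi - mu_x for xi in x]  # deviations from the mean of X
    dy = [yi - mu_y for yi in y]  # deviations from the mean of Y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when X or Y is constant")
    # The 1/n factors of covariance and standard deviations cancel out
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0]))
```

Unlike the covariance, $r$ has no units and always lies between $-1$ and $1$. `statistics.correlation` (Python 3.10+) computes the same coefficient.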

Coefficient of Determination

  • $\sum_i(y_i - \hat y_i)^2$ is the residual sum of squares: the sum of the squared differences between each observed value $y_i$ and its predicted value $\hat y_i$. It is the part of the variation in $y$ that the model does not explain.
  • $\sum_i(y_i - \overline y)^2$ is the total sum of squares: the sum of the squared differences between each $y_i$ and the mean $\overline y$.
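The two sums above combine into $R^2 = 1 - \frac{\sum_i(y_i - \hat y_i)^2}{\sum_i(y_i - \overline y)^2}$. Computing it requires predictions $\hat y_i$. The sketch below therefore assumes the line is fitted by ordinary least squares, with slope $\sum_i (x_i - \mu_x)(y_i - \mu_y) / \sum_i (x_i - \mu_x)^2$ and intercept $\mu_y - \text{slope} \cdot \mu_x$. The article does not specify a fitting method, and `fit_line` and `r_squared` are hypothetical names.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    sxy = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mu_x) ** 2 for xi in x)
    b1 = sxy / sxx          # slope
    b0 = mu_y - b1 * mu_x   # intercept: the line passes through (mu_x, mu_y)
    return b0, b1


def r_squared(y: list[float], y_hat: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu_y = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mu_y) ** 2 for yi in y)                 # total variation
    return 1 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
print(f"intercept={b0:.2f}, slope={b1:.2f}, R^2={r_squared(y, y_hat):.2f}")
```

For this data the fit has a slope of 0.8 and $R^2 = 0.64$. That equals the square of the Pearson coefficient for the same points (0.8), because $R^2 = r^2$ for a line with a single predictor and an intercept.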

Assumptions

  • The variables have a linear relationship. A scatter plot of the data will quickly tell you if this is the case.
  • Residuals are normally distributed. A histogram or a Q-Q plot of the residuals shows whether this holds.
  • Homoscedasticity: the residuals have a constant variance. A scatter plot of the residuals against the fitted values reveals whether this assumption holds; a funnel shape suggests it does not.
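These checks are usually done visually. The function below computes the data behind the two residual plots, using only the standard library. It returns (fitted value, residual) pairs for the homoscedasticity check. It also returns normal Q-Q pairs for the normality check, using plotting positions $(i - 0.5)/n$ for $i = 1, \dots, n$. This is a sketch that assumes an ordinary least squares fit, and `residual_diagnostics` is a hypothetical name. The returned pairs can be passed to any plotting tool.

```python
from statistics import NormalDist, fmean, pstdev


def residual_diagnostics(x: list[float], y: list[float]):
    """Return (fitted, residual) pairs and normal Q-Q pairs for an OLS line."""
    # Ordinary least squares fit of y = b0 + b1 * x (assumed fitting method)
    mu_x, mu_y = fmean(x), fmean(y)
    b1 = (sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y))
          / sum((xi - mu_x) ** 2 for xi in x))
    b0 = mu_y - b1 * mu_x
    fitted = [b0 + b1 * xi for xi in x]
    res = [yi - fi for yi, fi in zip(y, fitted)]

    # Homoscedasticity: plotted, these pairs should show a roughly constant vertical spread
    resid_vs_fitted = list(zip(fitted, res))

    # Normality: sorted standardized residuals against standard-normal quantiles at
    # (i + 0.5) / n for i = 0..n-1, i.e. (i - 0.5) / n with 1-based i
    mu, sd = fmean(res), pstdev(res)
    if sd == 0:
        raise ValueError("residuals are all identical; the Q-Q plot is undefined")
    n = len(res)
    std_res = sorted((r - mu) / sd for r in res)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    qq = list(zip(theoretical, std_res))
    return resid_vs_fitted, qq


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
resid_vs_fitted, qq = residual_diagnostics(x, y)
for theo, obs in qq:
    print(f"{theo:+.2f}  {obs:+.2f}")  # points near the line y = x suggest normal residuals
```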

Final notes


Data Scientist
