Linear Regression

Computing the least-squares regression line from raw paired data and from summary statistics, and using the resulting line to make predictions.

Tutorial

The Least-Squares Regression Line

Given paired data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the least-squares regression line is the line $\hat{y} = b_0 + b_1 x$ that minimizes the sum of the squared vertical distances from the data points to the line.

The slope $b_1$ and intercept $b_0$ are given by

$$b_1 = \dfrac{\sum\limits_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum\limits_{i=1}^n (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x},$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the $x$- and $y$-values.
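
The slope and intercept formulas translate almost directly into code. The following is a minimal Python sketch, not a reference implementation: the function name `least_squares_line` and the sample data are illustrative assumptions, and the input checks are one reasonable choice among several.

```python
from statistics import fmean


def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x.

    Hypothetical helper written for illustration only.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")

    # Sample means of the x- and y-values.
    x_bar = fmean(xs)
    y_bar = fmean(ys)

    # Numerator and denominator of the slope formula.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Illustrative data lying exactly on the line y = 1 + 2x.
b0, b1 = least_squares_line([0, 1, 2], [1, 3, 5])
print(b0, b1)  # 1.0 2.0
```

Because the sample points lie exactly on a line, the fitted intercept and slope recover that line. The guard on `s_xx` reflects that the slope formula divides by zero when every $x$-value is the same.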

For instance, consider the three points $(1, 2), (2, 2), (3, 5)$. We have $\bar{x} = 2$ and $\bar{y} = 3$, so

$$\begin{align*} b_1 &= \dfrac{(-1)(-1) + (0)(-1) + (1)(2)}{(-1)^2 + 0^2 + 1^2} = \dfrac{3}{2} = 1.5, \\[3pt] b_0 &= 3 - 1.5 \cdot 2 = 0. \end{align*}$$

The least-squares regression line is $\hat{y} = 1.5\,x.$
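
As a numerical spot check (not a proof), the sum of squared vertical distances for this line can be compared with the same quantity for a few nearby lines. The helper name `sse` and the choice of perturbations below are assumptions made for illustration.

```python
# The three points from the worked example.
points = [(1, 2), (2, 2), (3, 5)]


def sse(b0, b1):
    """Sum of squared vertical distances from the points to y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


# Fitted line from the example: intercept 0, slope 1.5.
best = sse(0.0, 1.5)
print(best)  # 1.5

# Nudging the intercept or the slope in either direction increases the sum.
for d_b0, d_b1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sse(0.0 + d_b0, 1.5 + d_b1) > best
```

The residuals for the fitted line are $0.5$, $-1$, and $0.5$, so the sum of squares is $0.25 + 1 + 0.25 = 1.5$; each perturbed line tried here gives a larger value.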
