The Sample Covariance Matrix

Defines the sample covariance between two variables and assembles the sample covariance matrix S for two or more variables, including its symmetry and the identification of its diagonal as sample variances.

Tutorial

Sample Covariance Between Two Variables

The sample variance measures how a single variable spreads around its own mean. When we record two variables on the same subjects, the analogous quantity measuring how they vary together is the sample covariance.

Given paired observations $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$ with sample means $\bar{x}$ and $\bar{y}$, the sample covariance between $x$ and $y$ is

$$s_{xy} = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}).$$

This formula mirrors the sample variance — the sample variance is just the covariance of a variable with itself:

$$s_x^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = s_{xx}.$$
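The definition and the identity $s_x^2 = s_{xx}$ can be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `sample_covariance` and the data in `xs` are illustrative choices, and the standard library's `statistics.variance` is used only as an independent check of the $n-1$ divisor.

```python
import statistics


def sample_covariance(xs, ys):
    """Sample covariance of paired observations, using the n - 1 divisor."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations from the respective means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (n - 1)


xs = [1, 3, 5]  # illustrative data, assumed for this sketch

s_xx = sample_covariance(xs, xs)       # covariance of x with itself
s_x_squared = statistics.variance(xs)  # sample variance from the standard library

print(s_xx, s_x_squared)  # both equal 4
```

Passing the same list as both arguments turns each product $(x_i-\bar{x})(y_i-\bar{y})$ into a squared deviation, which is why the two results agree.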

The sign of $s_{xy}$ indicates the direction of linear association:

  • $s_{xy}>0$: when $x_i$ is above $\bar{x}$, $y_i$ tends to be above $\bar{y}$ as well.
  • $s_{xy}<0$: when one variable is above its mean, the other tends to be below.
  • $s_{xy}=0$: no linear association is detected in the sample.
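The three cases above can be seen on small made-up data sets. This is a sketch under stated assumptions: the helper `sample_covariance` and all of the data are hypothetical, chosen only to produce one example of each sign. The third case uses $y = x^2$ on symmetric $x$ values, so a covariance of zero there reflects the absence of a linear trend, not the absence of any relationship.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor (helper assumed for this sketch)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: y rises with x, then y falls with x.
x = [1, 2, 3, 4]
rising = [2, 4, 5, 9]
falling = [9, 5, 4, 2]

# Hypothetical data: y = x**2 on x values symmetric about zero.
x_sym = [-1, 0, 1]
curved = [1, 0, 1]

pos = sample_covariance(x, rising)       # positive: the variables move together
neg = sample_covariance(x, falling)      # negative: the variables move oppositely
zero = sample_covariance(x_sym, curved)  # zero: no linear trend in the sample

print(pos > 0, neg < 0, zero == 0)
```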

To illustrate, consider the three paired observations $(1,2),(3,6),(5,4)$. We have $\bar{x}=3$ and $\bar{y}=4$, so

$$s_{xy}=\dfrac{1}{2}\Big[(-2)(-2)+(0)(2)+(2)(0)\Big]=\dfrac{4}{2}=2.$$
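The same arithmetic can be reproduced step by step. The sketch below uses the three observations from the example; the variable names are illustrative, and each intermediate quantity is kept so it can be compared with the hand calculation.

```python
pairs = [(1, 2), (3, 6), (5, 4)]  # the three paired observations from the example
n = len(pairs)

# Sample means of each coordinate.
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Product of deviations for each observation.
products = [(x - x_bar) * (y - y_bar) for x, y in pairs]

# Divide the sum of products by n - 1.
s_xy = sum(products) / (n - 1)

print(x_bar, y_bar)  # 3.0 4.0
print(products)      # [4.0, 0.0, 0.0]
print(s_xy)          # 2.0
```

Only the first observation contributes to the sum, because each of the other two has a zero deviation in one coordinate.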