Author Topic: Need help understanding the proof of Ordinary Least Squares  (Read 891 times)


Offline engineheatTopic starter

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: us
Need help understanding the proof of Ordinary Least Squares
« on: January 04, 2020, 10:06:57 pm »
Hi, In this page:
https://en.wikipedia.org/wiki/Proofs_involving_ordinary_least_squares

section " Derivation directly in terms of matrices",

they differentiated S(beta) with respect to the beta vector to get the normal equation.

I'm really having a hard time following that step since matrix calculus is not my strong suit.

Can someone explain in more detail? thanks
 

Offline GlennSprigg

  • Super Contributor
  • ***
  • Posts: 1259
  • Country: au
  • Medically retired Tech. Old School / re-learning !
Re: Need help understanding the proof of Ordinary Least Squares
« Reply #1 on: January 05, 2020, 02:00:24 pm »
Sorry mate... I had a look at your link like a lot of the 65 other readers did...
And had a brain-fart ! sorry...  Given time, we may understand!
Diagonal of 1x1 square = Root-2. Ok.
Diagonal of 1x1x1 cube = Root-3 !!!  Beautiful !!
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: Need help understanding the proof of Ordinary Least Squares
« Reply #2 on: January 05, 2020, 05:57:37 pm »
Okay, I'll have a go.  I'll use bold for matrices (all except the function to be minimized!), and \$\mathbf{b}\$ instead of \$\mathbf{\beta}\$ for clarity (as \$\mathbf{\beta}\$ does not look boldfaced).

The objective is to minimize
$$S(\mathbf{b}) = \mathbf{y}^T \mathbf{y} - 2 \mathbf{b}^T \mathbf{X}^T \mathbf{y} + \mathbf{b}^T \mathbf{X}^T \mathbf{X} \mathbf{b}$$
where \$\mathbf{b}\$ and \$\mathbf{y}\$ are one-column \$N\$-row matrices (i.e., column vectors) and \$\mathbf{X}\$ is an \$N \times N\$ square matrix. The \${}^T\$ indicates matrix transpose.
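
For reference, this three-term form is just the squared residual from the Wikipedia derivation, expanded:
$$S(\mathbf{b}) = (\mathbf{y} - \mathbf{X} \mathbf{b})^T (\mathbf{y} - \mathbf{X} \mathbf{b}) = \mathbf{y}^T \mathbf{y} - \mathbf{y}^T \mathbf{X} \mathbf{b} - \mathbf{b}^T \mathbf{X}^T \mathbf{y} + \mathbf{b}^T \mathbf{X}^T \mathbf{X} \mathbf{b}$$
The two middle terms are equal (each is a scalar, and one is the transpose of the other), so they combine into \$-2 \mathbf{b}^T \mathbf{X}^T \mathbf{y}\$.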

Expanding \$S(\mathbf{b})\$ we get
$$S(\mathbf{b}) = \left( \sum_i y_i^2 \right) - 2 \left( \sum_j \sum_i b_i x_{j i} y_j \right) + \left( \sum_j \sum_i \sum_k b_i x_{k i} x_{k j} b_j \right )$$
You can verify this easily by expanding the expressions for \$N=3\$ and \$N=4\$.
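
If you'd rather not expand it by hand, here is a small NumPy sketch (illustrative only, with random data) that checks the matrix form against the explicit scalar sums:

import numpy as np

# Sanity check: the matrix form of S(b) versus the explicit scalar sums,
# using random data for a small N.  (Illustrative only.)
N = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(N, N))
y = rng.normal(size=N)
b = rng.normal(size=N)

# Matrix form: S(b) = y'y - 2 b'X'y + b'X'X b
S_matrix = y @ y - 2 * b @ X.T @ y + b @ X.T @ X @ b

# Scalar expansion; note X[j, i] is the row-j, column-i component
S_scalar = (sum(y[j]**2 for j in range(N))
            - 2 * sum(b[i] * X[j, i] * y[j] for i in range(N) for j in range(N))
            + sum(b[i] * X[k, i] * X[k, j] * b[j]
                  for i in range(N) for j in range(N) for k in range(N)))

print(S_matrix, S_scalar)  # should agree to rounding error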

Although the argument \$\mathbf{b}\$ is a vector, the value of the function \$S(\mathbf{b})\$ is a scalar.  As usual, at an extremum (minimum or maximum) its derivative is zero, so we look for the \$\mathbf{b}\$ where the derivative vanishes.

The derivative of scalar \$S\$ with respect to vector \$\mathbf{b}\$ is a vector.  Each component of the derivative is the partial derivative of the original scalar with respect to the corresponding vector component.  A partial derivative is calculated by treating that particular component as the variable, and all the other components as constants.

The partial derivatives are
$$\frac{\partial S}{\partial b_i} = \left( 0 \right) - 2 \left( \sum_j x_{j i} y_j  \right) + \left( 2 \sum_j \sum_k x_{k i} x_{k j} b_j \right)$$
(Again, \$b_i\$ is the variable, and all the other \$b_j, j \ne i\$ are treated as constants.  In the quadratic term, \$b_i\$ appears both as the left and as the right factor; the two contributions are equal because \$\sum_k x_{k i} x_{k j} = \sum_k x_{k j} x_{k i}\$, which is where the factor of two comes from.)
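
To convince yourself of the component formula, you can compare it against finite differences; a quick NumPy sketch (illustrative only, with random data), using the fact that \$S\$ is the same thing as the squared residual \$(\mathbf{y} - \mathbf{X}\mathbf{b})^T (\mathbf{y} - \mathbf{X}\mathbf{b})\$, which is the convenient form to evaluate:

import numpy as np

# Compare the analytic gradient of S(b) with a finite-difference estimate.
# (Illustrative only; random data, small N.)
N = 4
rng = np.random.default_rng(1)
X = rng.normal(size=(N, N))
y = rng.normal(size=N)
b = rng.normal(size=N)

def S(b):
    r = y - X @ b          # S(b) written as the squared residual
    return r @ r

# Analytic: dS/db_i = -2 sum_j x_ji y_j + 2 sum_j sum_k x_ki x_kj b_j
grad_analytic = -2 * X.T @ y + 2 * X.T @ X @ b

# Finite differences, one component at a time
h = 1e-6
e = np.eye(N)
grad_numeric = np.array([(S(b + h * e[i]) - S(b - h * e[i])) / (2 * h)
                         for i in range(N)])

print(np.max(np.abs(grad_analytic - grad_numeric)))  # should be tiny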

At the extremum, all \$N\$ partial derivatives are zero:
$$- 2 \left( \sum_j x_{j i} y_j \right) + \left( 2 \sum_j \sum_k x_{k i} x_{k j} b_j \right) = 0$$
We can obviously halve each side, getting
$$- \left( \sum_j x_{j i} y_j \right) + \left( \sum_j \sum_k x_{k i} x_{k j} b_j  \right) = 0$$
Note that this is a set of \$N\$ equations, one for each \$i\$.

Since \$\sum_j x_{j i} y_j\$ is the \$i\$-th component of \$\mathbf{X}^T \mathbf{y}\$ and \$\sum_k x_{k i} x_{k j}\$ is the \$(i, j)\$ component of \$\mathbf{X}^T \mathbf{X}\$, we can write the whole set in matrix form,
$$- \mathbf{X}^T \mathbf{y} + (\mathbf{X}^T \mathbf{X}) \mathbf{b} = \mathbf{0}$$
where the right-hand side is a zero vector (an \$N\$-row column of zeros), not just a scalar zero.
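
In other words, \$(\mathbf{X}^T \mathbf{X}) \mathbf{b} = \mathbf{X}^T \mathbf{y}\$, which is the normal equation the Wikipedia page arrives at.  If you just want numbers, you can solve it directly; here is a small NumPy sketch (illustrative only, with random data; in practice numpy.linalg.lstsq is numerically more robust than forming \$\mathbf{X}^T \mathbf{X}\$ explicitly):

import numpy as np

# Solve the normal equation (X'X) b = X'y and compare against
# NumPy's built-in least-squares solver.  (Illustrative only.)
N = 4
rng = np.random.default_rng(2)
X = rng.normal(size=(N, N))
y = rng.normal(size=N)

b_normal = np.linalg.solve(X.T @ X, X.T @ y)      # from the normal equation
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # library least-squares solver

print(np.allclose(b_normal, b_lstsq))             # expected: True
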
« Last Edit: January 06, 2020, 03:49:41 am by Nominal Animal »
 
The following users thanked this post: RoGeorge

Offline German_EE

  • Super Contributor
  • ***
  • Posts: 2399
  • Country: de
Re: Need help understanding the proof of Ordinary Least Squares
« Reply #3 on: January 05, 2020, 07:47:07 pm »
:scared: :scared: :scared: :scared: :scared: :scared: :scared: :scared:

And in other news, we have more pictures of cats:

https://www.eevblog.com/forum/chat/post-a-picture-of-a-cat!/575/
Should you find yourself in a chronically leaking boat, energy devoted to changing vessels is likely to be more productive than energy devoted to patching leaks.

Warren Buffett
 

Offline engineheatTopic starter

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: us
Re: Need help understanding the proof of Ordinary Least Squares
« Reply #4 on: January 05, 2020, 09:43:47 pm »

Wow, thanks.
I printed this out and will digest it slowly...
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
Re: Need help understanding the proof of Ordinary Least Squares
« Reply #5 on: January 06, 2020, 04:01:35 am »
I printed this out and will digest it slowly...
Let me know if you have any questions.  (My edit after your response only fixed an insignificant typo, changing the \${}^t\$ to \${}^T\$.)

I used \$x_{j i}\$ for the component of \$\mathbf{X}\$ on row \$j\$, column \$i\$, as usual; for the vectors, the single subscript indicates the component.  Both are numbered from \$1\$ to \$N\$, inclusive.

I'm not a mathematician, so the way I usually do matrix calculus is to expand everything into scalar expressions, with subscript indices denoting the components.  Essentially, I treat the vector/matrix expressions as a set of equations on scalars, because those are so much easier to work with (for me!).  After the operations, the set of equations can almost always be expressed again in concise vector/matrix form.  Those who do this sort of thing often enough know the most common identities (or memorize them, or have references listing them), so they can skip the expand-to-scalars, operate, and recombine-into-matrix-form steps entirely.

Mathematicians very commonly skip these steps, probably because they see them as "mechanical work".  To me, though, taking the "math is a tool for solving problems" view, knowing how to carry out such steps is much more important than any single particular result (which is usually what the mathematicians get so excited about).
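
As an illustration of that workflow, here is a small SymPy sketch (purely illustrative, assuming SymPy is available) that expands \$S\$ into scalars for \$N = 2\$, differentiates with respect to each component of \$\mathbf{b}\$, and checks that the result matches the matrix form:

import sympy as sp

# Expand S(b) into scalars, differentiate component by component, and
# compare with the matrix form -2 X'y + 2 X'X b.  (Illustrative, N = 2.)
N = 2
b = sp.Matrix(sp.symbols(f'b1:{N+1}'))                  # column vector (b1, b2)
y = sp.Matrix(sp.symbols(f'y1:{N+1}'))
X = sp.Matrix(N, N, lambda i, j: sp.Symbol(f'x{i+1}{j+1}'))

S = sp.expand(((y - X * b).T * (y - X * b))[0, 0])      # scalar S(b)

# Partial derivatives with respect to each component of b
grad = sp.Matrix([sp.diff(S, bi) for bi in b])

# Gradient in matrix form, for comparison
matrix_form = -2 * X.T * y + 2 * X.T * X * b

print((grad - matrix_form).expand())                    # expected: zero vector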
 

