# MATH 323 Lecture 21

« previous | Thursday, November 8, 2012 | next »

## Least Squares Problem

Given a subspace ${\displaystyle S\subset \mathbb {R} ^{n}}$ and a vector ${\displaystyle {\vec {v}}\not \in S}$, find the closest approximation ${\displaystyle {\vec {p}}\in S}$. Here ${\displaystyle {\vec {p}}}$ is the vector projection of ${\displaystyle {\vec {v}}}$ onto ${\displaystyle S}$, and ${\displaystyle \|{\vec {p}}\|}$ is the scalar projection ${\displaystyle \alpha }$.

When represented by ${\displaystyle A{\vec {x}}={\vec {b}}}$, ${\displaystyle S=R(A)}$ is the column space of ${\displaystyle A}$, and ${\displaystyle {\hat {x}}}$ is the vector such that ${\displaystyle {\vec {p}}=A{\hat {x}}}$. We get ${\displaystyle r({\hat {x}})={\vec {b}}-{\vec {p}}={\vec {b}}-A{\hat {x}}}$ is the residual vector.

### Normal Equation

${\displaystyle A^{T}A{\hat {x}}=A^{T}{\vec {b}}}$

### Theorem 5.3.2

Let ${\displaystyle A}$ be an ${\displaystyle m\times n}$ matrix of rank ${\displaystyle n}$ (full column rank). Then the normal equation ${\displaystyle A^{T}A{\vec {x}}=A^{T}{\vec {b}}}$ has a unique solution given by

${\displaystyle {\hat {x}}=(A^{T}A)^{-1}A^{T}{\vec {b}}}$

And ${\displaystyle {\hat {x}}}$ is the unique least squares solution to the system ${\displaystyle A{\vec {x}}={\vec {b}}}$.

#### Proof

The theorem rests on the premise that ${\displaystyle A^{T}A}$ is nonsingular.

Suppose ${\displaystyle A^{T}A{\vec {z}}={\vec {0}}}$ for some vector ${\displaystyle {\vec {z}}}$. For ${\displaystyle A^{T}A}$ to be nonsingular, ${\displaystyle {\vec {z}}={\vec {0}}}$ must be the only solution.

We know that

• ${\displaystyle A{\vec {z}}\in N(A^{T})}$ (since ${\displaystyle A^{T}(A{\vec {z}})={\vec {0}}}$)
• ${\displaystyle A{\vec {z}}\in R(A)}$
• and ${\displaystyle R(A)\perp N(A^{T})}$

Therefore ${\displaystyle A{\vec {z}}\in R(A)\cap N(A^{T})=\{{\vec {0}}\}}$, so ${\displaystyle A{\vec {z}}={\vec {0}}}$. Since ${\displaystyle A}$ has rank ${\displaystyle n}$, ${\displaystyle N(A)=\{{\vec {0}}\}}$, hence ${\displaystyle {\vec {z}}={\vec {0}}}$.

Q.E.D.

#### Corollary

We already know that ${\displaystyle {\vec {p}}=A{\hat {x}}}$, and ${\displaystyle {\hat {x}}=(A^{T}A)^{-1}A^{T}{\vec {b}}}$, so

${\displaystyle {\vec {p}}=\underbrace {A(A^{T}A)^{-1}A^{T}} _{P}{\vec {b}}}$

The projection matrix ${\displaystyle P}$ is interesting because ${\displaystyle P^{2}=P}$:

${\displaystyle P^{2}=A{\cancel {(A^{T}A)^{-1}}}{\cancel {A^{T}A}}(A^{T}A)^{-1}A^{T}=A(A^{T}A)^{-1}A^{T}}$
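The idempotence can be checked numerically; a quick sketch in Python with NumPy, using an arbitrary full-column-rank matrix:

```python
import numpy as np

# Any m x n matrix with full column rank (rank = n)
A = np.array([[1.0, 1.0],
              [-2.0, 3.0],
              [2.0, -1.0]])

# Projection matrix onto the column space R(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Idempotence: projecting twice equals projecting once
print(np.allclose(P @ P, P))  # True
```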

#### Example

Overdetermined system

{\displaystyle {\begin{aligned}x_{1}+x_{2}&=3\\-2x_{1}+3x_{2}&=1\\2x_{1}-x_{2}&=2\end{aligned}}}

{\displaystyle {\begin{aligned}A&={\begin{pmatrix}1&1\\-2&3\\2&-1\end{pmatrix}}\\A^{T}&={\begin{pmatrix}1&-2&2\\1&3&-1\end{pmatrix}}\\{\vec {b}}&={\begin{pmatrix}3\\1\\2\end{pmatrix}}\end{aligned}}}

${\displaystyle A^{T}A{\vec {x}}=A^{T}{\vec {b}}\quad \Longrightarrow \quad {\begin{pmatrix}9&-7\\-7&11\end{pmatrix}}{\vec {x}}={\begin{pmatrix}5\\4\end{pmatrix}}}$

${\displaystyle {\hat {x}}={\begin{pmatrix}{\frac {83}{50}}\\{\frac {71}{50}}\end{pmatrix}}}$
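As a sanity check, the normal equation can be solved numerically; a sketch in Python with NumPy (`np.linalg.lstsq` solves the same least squares problem directly):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 3.0],
              [2.0, -1.0]])
b = np.array([3.0, 1.0, 2.0])

# Solve the normal equation A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)  # [1.66 1.42], i.e. [83/50, 71/50]

# lstsq minimizes ||Ax - b|| directly and agrees with x_hat
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```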

### Regression

Given measurements ${\displaystyle y_{1},\ldots ,y_{n}}$ taken at points ${\displaystyle x_{1},\ldots ,x_{n}}$, each pair defines a point ${\displaystyle (x_{i},y_{i})}$.

#### Linear Regression

Find the line ${\displaystyle y=c_{0}+c_{1}x}$ that best approximates the system of equations

${\displaystyle {\begin{bmatrix}1&x_{1}\\1&x_{2}\\\vdots &\vdots \\1&x_{n}\end{bmatrix}}{\begin{bmatrix}c_{0}\\c_{1}\end{bmatrix}}={\begin{bmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\end{bmatrix}}}$

Solution for ${\displaystyle A{\vec {c}}={\vec {y}}}$ is given by ${\displaystyle A^{T}A{\vec {c}}=A^{T}{\vec {y}}}$
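A minimal sketch of this setup in Python with NumPy; the sample points here are made-up values purely for illustration:

```python
import numpy as np

# Hypothetical sample data (x_i, y_i)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Design matrix: a column of ones for the intercept c0, then x for c1
A = np.column_stack([np.ones_like(x), x])

# Normal equation: A^T A c = A^T y
c = np.linalg.solve(A.T @ A, A.T @ y)
c0, c1 = c
print(c0, c1)  # intercept and slope of the fitted line
```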

## Inner Product Spaces

Vector space ${\displaystyle V}$. Suppose we have a function such that for all ${\displaystyle x,y\in V}$, the inner product of ${\displaystyle x,y}$ (notation ${\displaystyle \left\langle x,y\right\rangle }$) is a real number.

We want to have the following properties:

1. ${\displaystyle \left\langle x,x\right\rangle \geq 0}$ and is equal to 0 iff ${\displaystyle x=0}$
2. ${\displaystyle \left\langle x,y\right\rangle =\left\langle y,x\right\rangle }$ (symmetry)
3. ${\displaystyle \left\langle \alpha \,x+\beta \,y,z\right\rangle =\alpha \left\langle x,z\right\rangle +\beta \left\langle y,z\right\rangle }$ (linearity in the first argument; by symmetry, the same holds in the second)

### Example 1

${\displaystyle \mathbb {R} ^{n}}$, ${\displaystyle \left\langle {\vec {x}},{\vec {y}}\right\rangle ={\vec {x}}\cdot {\vec {y}}}$ is the scalar product.

Given weights ${\displaystyle w_{1},\ldots ,w_{n}}$ with ${\displaystyle w_{i}>0}$ for all ${\displaystyle i}$, the weighted inner product is

${\displaystyle \left\langle {\vec {x}},{\vec {y}}\right\rangle _{\vec {w}}=\sum _{i=1}^{n}w_{i}\,x_{i}\,y_{i}}$
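A small sketch of this weighted inner product in Python; the vectors and weights below are arbitrary example values:

```python
import numpy as np

def weighted_inner(x, y, w):
    """Weighted inner product <x, y>_w = sum_i w_i * x_i * y_i (requires w_i > 0)."""
    return np.sum(w * x * y)

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
w = np.array([2.0, 0.5])

print(weighted_inner(x, y, w))  # 2*1*3 + 0.5*2*(-1) = 5.0
```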

### Example 2

Given two matrices ${\displaystyle A,B\in M_{m,n}(\mathbb {R} )}$, let ${\displaystyle \left\langle A,B\right\rangle =\sum _{i=1}^{m}\sum _{j=1}^{n}a_{ij}\,b_{ij}}$.
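This entrywise sum can equivalently be computed as ${\displaystyle \operatorname {trace} (A^{T}B)}$; a quick sketch in Python with arbitrary example matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# <A, B> = sum of entrywise products
ip = np.sum(A * B)
print(ip)  # 5 + 12 + 21 + 32 = 70.0

# Equivalent trace formulation
print(np.trace(A.T @ B))  # 70.0
```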

### Example 3

Given two functions ${\displaystyle f,g\in C[a,b]}$, let ${\displaystyle \left\langle f,g\right\rangle =\int _{a}^{b}f(x)\,g(x)\,\mathrm {d} x}$

1. ${\displaystyle \left\langle f,f\right\rangle =\int _{a}^{b}f^{2}(x)\,\mathrm {d} x\geq 0}$ and ${\displaystyle \left\langle f,f\right\rangle =\int _{a}^{b}f^{2}(x)\,\mathrm {d} x=0\iff f\equiv 0}$

### Properties

Given a vector space ${\displaystyle V}$ and an inner product function ${\displaystyle \left\langle \cdot ,\cdot \right\rangle }$

We can redefine:

• length / norm: ${\displaystyle \left\|v\right\|={\sqrt {\left\langle v,v\right\rangle }}}$
• Orthogonality: ${\displaystyle u\perp v\iff \left\langle u,v\right\rangle =0}$
• Scalar projection: ${\displaystyle \alpha ={\frac {\left\langle u,v\right\rangle }{\left\|v\right\|}}}$
• Vector projection: ${\displaystyle p=\alpha \cdot \left({\frac {1}{\left\|v\right\|}}\,v\right)={\frac {\left\langle u,v\right\rangle }{\left\langle v,v\right\rangle }}\,v}$
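With the dot product on ${\displaystyle \mathbb {R} ^{n}}$ these reduce to the familiar formulas; a small sketch in Python with arbitrarily chosen vectors:

```python
import numpy as np

def project(u, v):
    """Vector projection of u onto v: p = (<u,v> / <v,v>) v, here with the dot product."""
    return (np.dot(u, v) / np.dot(v, v)) * v

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
p = project(u, v)
print(p)  # [3. 0.]

# The residual u - p is orthogonal to v
print(np.dot(u - p, v))  # 0.0
```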

### Theorem 5.4.1

The Pythagorean Law

If ${\displaystyle u\perp v}$, then ${\displaystyle \|u+v\|^{2}=\|u\|^{2}+\|v\|^{2}}$

#### Proof

{\displaystyle {\begin{aligned}\|u+v\|^{2}&=\langle u+v,u+v\rangle \\&=\langle u,u\rangle +\langle u,v\rangle +\langle v,u\rangle +\langle v,v\rangle \\&=\|u\|^{2}+0+0+\|v\|^{2}\end{aligned}}}

### Orthogonality of Functions

${\displaystyle 1\perp x}$, where the inner product is defined as ${\displaystyle \langle f,g\rangle =\int _{-1}^{1}f(x)\,g(x)\,\mathrm {d} x}$

${\displaystyle \langle 1,x\rangle =\int _{-1}^{1}1\cdot x\,\mathrm {d} x=0}$
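The same integral can be approximated numerically; a sketch in Python using a simple trapezoidal rule (the grid size is an arbitrary choice):

```python
import numpy as np

# <f, g> = integral over [a, b] of f(x) g(x) dx, via the trapezoidal rule
def inner(f, g, a=-1.0, b=1.0, n=10001):
    x = np.linspace(a, b, n)
    y = f(x) * g(x)
    h = (b - a) / (n - 1)
    return h * np.sum((y[:-1] + y[1:]) / 2)

one = lambda x: np.ones_like(x)
ident = lambda x: x

# <1, x> = 0 on [-1, 1]: the constant function 1 is orthogonal to x
print(abs(inner(one, ident)) < 1e-9)  # True
```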

### Theorem 5.4.2

The Cauchy-Schwarz Inequality

${\displaystyle \left|\left\langle u,v\right\rangle \right|\leq \left\|u\right\|\left\|v\right\|}$

The inequality holds for all ${\displaystyle u,v}$; equality holds iff ${\displaystyle u}$ and ${\displaystyle v}$ are linearly dependent.
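A quick numerical illustration in Python with randomly chosen vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(5)
v = rng.standard_normal(5)

# |<u, v>| <= ||u|| ||v|| for any u, v
lhs = abs(np.dot(u, v))
rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(lhs <= rhs)  # True

# Equality when u and v are linearly dependent (w is a scalar multiple of u)
w = 2.5 * u
print(np.isclose(abs(np.dot(u, w)), np.linalg.norm(u) * np.linalg.norm(w)))  # True
```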