# MATH 323 Lecture 21

Thursday, November 8, 2012

## Least Squares Problem

Given a subspace $S\subset \mathbb {R}^{n}$ and a vector ${\vec {v}}\not \in S$, find the closest approximation ${\vec {p}}\in S$. Here ${\vec {p}}$ is the vector projection of ${\vec {v}}$ onto $S$, and the scalar projection is $\alpha = \|{\vec {p}}\|$.

When the problem is written as $A{\vec {x}}={\vec {b}}$, $S=R(A)$ is the column space of $A$, and ${\hat {x}}$ is the vector such that ${\vec {p}}=A{\hat {x}}$. The residual vector is $r({\hat {x}})={\vec {b}}-{\vec {p}}={\vec {b}}-A{\hat {x}}$.
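As a sketch of the projection and residual, here is the one-dimensional case in $\mathbb{R}^3$ with the ordinary dot product (the vectors are made up for illustration):

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(v, u):
    """Vector projection of v onto span{u}: p = (v.u / u.u) u."""
    c = dot(v, u) / dot(u, u)
    return [c * a for a in u]

v = [3.0, 4.0, 0.0]
u = [1.0, 0.0, 0.0]
p = project(v, u)           # closest vector to v in span{u}
r = [a - b for a, b in zip(v, p)]   # residual v - p
print(p)          # [3.0, 0.0, 0.0]
print(dot(r, u))  # 0.0 -- the residual is orthogonal to the subspace
```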

### Normal Equation

$A^{T}A{\hat {x}}=A^{T}{\vec {b}}$

### Theorem 5.3.2

Let $A$ be an $m\times n$ matrix of rank $n$ (full column rank). Then the normal equation $A^{T}A{\vec {x}}=A^{T}{\vec {b}}$ has a unique solution given by

${\hat {x}}=(A^{T}A)^{-1}A^{T}{\vec {b}}$

and ${\hat {x}}$ is the unique least squares solution to the system $A{\vec {x}}={\vec {b}}$.
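A minimal pure-Python sketch of the formula ${\hat {x}}=(A^{T}A)^{-1}A^{T}{\vec {b}}$ for a small full-rank system, using exact fractions and Cramer's rule for the $2\times 2$ solve (the particular $A$ and ${\vec{b}}$ are just a worked instance):

```python
from fractions import Fraction

# Overdetermined system A x = b (3 equations, 2 unknowns)
A = [[1, 1], [-2, 3], [2, -1]]
b = [3, 1, 2]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

At = transpose(A)
AtA = matmul(At, A)                    # 2x2 Gram matrix A^T A
Atb = matmul(At, [[x] for x in b])     # A^T b as a column

# Solve the normal equation A^T A x = A^T b by Cramer's rule
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_hat = [Fraction(AtA[1][1] * Atb[0][0] - AtA[0][1] * Atb[1][0], det),
         Fraction(AtA[0][0] * Atb[1][0] - AtA[1][0] * Atb[0][0], det)]
print(x_hat)  # [Fraction(83, 50), Fraction(71, 50)]
```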

#### Proof

It suffices to show that $A^{T}A$ is nonsingular.

Suppose $A^{T}A{\vec {z}}={\vec {0}}$ for some vector ${\vec {z}}$. For $A^{T}A$ to be nonsingular, ${\vec {z}}={\vec {0}}$ must be the only solution.

We know that

- $A{\vec {z}}\in N(A^{T})$, since $A^{T}(A{\vec {z}})={\vec {0}}$
- $A{\vec {z}}\in R(A)$
- $R(A)\perp N(A^{T})$

Therefore $A{\vec {z}}\in R(A)\cap N(A^{T})=\{{\vec {0}}\}$, so $A{\vec {z}}={\vec {0}}$. Since $A$ has rank $n$, its columns are linearly independent, hence ${\vec {z}}={\vec {0}}$.

Q.E.D.

#### Corollary

We already know that ${\vec {p}}=A{\hat {x}}$ and ${\hat {x}}=(A^{T}A)^{-1}A^{T}{\vec {b}}$, so

${\vec {p}}=\underbrace {A(A^{T}A)^{-1}A^{T}} _{P}{\vec {b}}$

The projection matrix $P$ is interesting because $P^{2}=P$:

$P^{2}=A\,{\cancel {(A^{T}A)^{-1}}}\,{\cancel {A^{T}A}}\,(A^{T}A)^{-1}A^{T}=A(A^{T}A)^{-1}A^{T}=P$

#### Example

Overdetermined system

${\begin{aligned}x_{1}+x_{2}&=3\\-2x_{1}+3x_{2}&=1\\2x_{1}-x_{2}&=2\end{aligned}}$

$A={\begin{pmatrix}1&1\\-2&3\\2&-1\end{pmatrix}},\quad A^{T}={\begin{pmatrix}1&-2&2\\1&3&-1\end{pmatrix}},\quad {\vec {b}}={\begin{pmatrix}3\\1\\2\end{pmatrix}}$

Solving $A^{T}A{\vec {x}}=A^{T}{\vec {b}}$ gives

${\hat {x}}={\begin{pmatrix}{\frac {83}{50}}\\{\frac {71}{50}}\end{pmatrix}}$

### Regression

Given a set of measurements $y_{1},\ldots ,y_{n}$ at points $x_{1},\ldots ,x_{n}$, each pair of values defines a data point $(x_{i},y_{i})$.

#### Linear Regression

Find an equation $y=c_{0}+c_{1}x$ that best approximates the data, i.e., the least squares solution of the system of equations

${\begin{bmatrix}1&x_{1}\\1&x_{2}\\\vdots &\vdots \\1&x_{n}\end{bmatrix}}{\begin{bmatrix}c_{0}\\c_{1}\end{bmatrix}}={\begin{bmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\end{bmatrix}}$

The least squares solution of $A{\vec {c}}={\vec {y}}$ is given by the normal equation $A^{T}A{\vec {c}}=A^{T}{\vec {y}}$.

## Inner Product Spaces

Let $V$ be a vector space. Suppose we have a function that assigns to each pair $x,y\in V$ a real number $\left\langle x,y\right\rangle$, the inner product of $x$ and $y$.

We want to have the following properties:

1. $\left\langle x,x\right\rangle \geq 0$, with equality iff $x=0$
2. $\left\langle x,y\right\rangle =\left\langle y,x\right\rangle$ (symmetry)
3. $\left\langle \alpha \,x+\beta \,y,z\right\rangle =\alpha \left\langle x,z\right\rangle +\beta \left\langle y,z\right\rangle$ and the same applies for the second component.

### Example 1

$\mathbb {R} ^{n}$ , $\left\langle {\vec {x}},{\vec {y}}\right\rangle ={\vec {x}}\cdot {\vec {y}}$ is the scalar product.

Given weights $w_{1},\ldots ,w_{n}$ with $w_{i}>0$ for all $i$,

$\left\langle {\vec {x}},{\vec {y}}\right\rangle _{\vec {w}}=\sum _{i=1}^{n}w_{i}\,x_{i}\,y_{i}$

### Example 2

Given two matrices $A,B\in M_{m,n}(\mathbb {R} )$ , let $\left\langle A,B\right\rangle =\sum _{i=1}^{m}\sum _{j=1}^{n}a_{ij}\,b_{ij}$ .
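A short sketch of this entrywise matrix inner product (the sample matrices are arbitrary):

```python
def mat_inner(A, B):
    """Entrywise inner product <A, B> = sum over i, j of a_ij * b_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_inner(A, B))  # 1*5 + 2*6 + 3*7 + 4*8 = 70
print(mat_inner(A, A))  # 30, nonnegative as axiom 1 requires
```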

### Example 3

Given two functions $f,g\in C[a,b]$, let

$\left\langle f,g\right\rangle =\int _{a}^{b}f(x)\,g(x)\,\mathrm {d} x$

Axiom 1 holds: $\left\langle f,f\right\rangle =\int _{a}^{b}f^{2}(x)\,\mathrm {d} x\geq 0$, and $\left\langle f,f\right\rangle =\int _{a}^{b}f^{2}(x)\,\mathrm {d} x=0\iff f\equiv 0$.

### Properties

Given a vector space $V$ with an inner product $\left\langle \cdot ,\cdot \right\rangle$, we can redefine:

- length / norm: $\left\|v\right\|={\sqrt {\left\langle v,v\right\rangle }}$
- orthogonality: $u\perp v\iff \left\langle u,v\right\rangle =0$
- scalar projection: $\alpha ={\frac {\left\langle u,v\right\rangle }{\left\|v\right\|}}$
- vector projection: $p=\alpha \cdot \left({\frac {1}{\left\|v\right\|}}\,v\right)={\frac {\left\langle u,v\right\rangle }{\left\langle v,v\right\rangle }}\,v$

### Theorem 5.4.1

The Pythagorean Law

If $u\perp v$, then $\|u+v\|^{2}=\|u\|^{2}+\|v\|^{2}$.

#### Proof

${\begin{aligned}\|u+v\|^{2}&=\langle u+v,u+v\rangle \\&=\langle u,u\rangle +\langle u,v\rangle +\langle v,u\rangle +\langle v,v\rangle \\&=\|u\|^{2}+0+0+\|v\|^{2}\end{aligned}}$

### Orthogonality of Functions

$1\perp x$ on $[-1,1]$, where the inner product is defined as $\langle f,g\rangle =\int _{-1}^{1}f(x)\,g(x)\,\mathrm {d} x$:

$\langle 1,x\rangle =\int _{-1}^{1}1\cdot x\,\mathrm {d} x=0$

### Theorem 5.4.2

The Cauchy-Schwarz Inequality

$\left|\left\langle u,v\right\rangle \right|\leq \left\|u\right\|\left\|v\right\|$

Equality holds iff $u$ and $v$ are linearly dependent.
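Both the orthogonality example above and the Cauchy-Schwarz inequality can be checked numerically for the integral inner product on $C[-1,1]$. A sketch using midpoint-rule quadrature (the test functions $x^{2}$ and $e^{x}$ are arbitrary choices):

```python
import math

def inner(f, g, a=-1.0, b=1.0, n=10_000):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b] (midpoint rule)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))

def norm(f):
    return math.sqrt(inner(f, f))

one = lambda x: 1.0
ident = lambda x: x

# 1 is orthogonal to x on [-1, 1]: the inner product is (numerically) zero
print(abs(inner(one, ident)) < 1e-9)           # True

# Cauchy-Schwarz: |<f, g>| <= ||f|| ||g||
f = lambda x: x ** 2
g = lambda x: math.exp(x)
print(abs(inner(f, g)) <= norm(f) * norm(g))   # True
```

Since the quadrature itself is a positively weighted sum, the discrete approximation is also an inner product, so the inequality holds for it exactly, not just up to rounding.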