Solving system

Basics of solving square linear systems

Solving a system $Ax = b$, where $A$ is an $n\times n$ matrix of scalars (complex, real, or otherwise), is possible if $A$ is invertible, and the solution is theoretically given by $x = A^{-1}b$.


Let $\hat{x}$ be the solution of $\hat{A} \hat{x} = \hat{b}$, where the matrix $A$ is perturbed as $\hat{A} = A + \Delta A$ and the right hand side as $\hat{b} = b + \Delta b$.

The absolute error of $x$ is $\Delta x = \hat{x} - x$ and the relative errors are defined as $\delta A = \|\Delta A\|/\|A\|$, $\delta b = \|\Delta b\|/\|b\|$ and $\delta x = \|\Delta x\|/\|x\|$. We assume that the relative errors in the inputs are reasonably small.


If only $b$ is perturbed, the error is bounded by $$\delta x \leq \kappa(A) \delta b,$$ where $\kappa(A) = \|A\|\|A^{-1}\|$ is the condition number of the matrix $A$.

In general, the error is bounded by

$$\delta x \leq \frac{\kappa(A)}{1 - \kappa(A) \delta A} (\delta A + \delta b).$$

Therefore, if the condition number $\kappa(A)$ is large, the accuracy of the solution of the system can deteriorate.
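
The bound can be checked numerically. Below is a minimal Eigen sketch (the matrix, right hand side and perturbation are made up purely for illustration); the condition number in the 2-norm is computed from the singular values as $\sigma_{\max}/\sigma_{\min}$.

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        // Illustrative 3x3 system; the values are made up.
        Eigen::Matrix3d A;
        A << 4, 1, 0,
             1, 3, 1,
             0, 1, 2;
        Eigen::Vector3d b(1, 2, 3);
        Eigen::Vector3d x = A.partialPivLu().solve(b);

        // Condition number in the 2-norm: kappa(A) = sigma_max / sigma_min.
        Eigen::JacobiSVD<Eigen::Matrix3d> svd(A);
        double kappa = svd.singularValues()(0) / svd.singularValues()(2);

        // Perturb only the right hand side and compare delta_x with kappa(A) * delta_b.
        Eigen::Vector3d db = 1e-8 * Eigen::Vector3d::Random();
        Eigen::Vector3d x_hat = A.partialPivLu().solve(b + db);
        double delta_b = db.norm() / b.norm();
        double delta_x = (x_hat - x).norm() / x.norm();
        std::cout << "delta x = " << delta_x
                  << ", bound = " << kappa * delta_b << std::endl;
    }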

If $A$ is not invertible, the Moore–Penrose pseudoinverse can be used to compute a "solution": $x = A^+b$.

Decompositions

Solving linear systems is not done by computing the inverse of $A$, but by suitably decomposing $A$. Many decompositions are available, with $LU$ decomposition with partial pivoting being the go-to option, at a cost of $\frac23 n^3$. Additionally, there are the $QR$ and $SVD$ decompositions if more robust behaviour is desired, at a greater cost.

See the full list of Eigen's available decompositions: https://eigen.tuxfamily.org/dox/group__TopicLinearAlgebraDecompositions.html

If many systems $Ax = b_i$ that differ only in the right hand side need to be solved, the decomposition of $A$ can be computed once in $O(n^3)$ and stored, with subsequent solutions taking only $O(n^2)$ time.
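
A minimal Eigen sketch of this pattern (the data is made up): the $LU$ factorisation is computed once and reused for every right hand side.

    #include <Eigen/Dense>

    int main() {
        // Made-up square system and two right hand sides.
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(100, 100);
        Eigen::VectorXd b1 = Eigen::VectorXd::Random(100);
        Eigen::VectorXd b2 = Eigen::VectorXd::Random(100);

        // LU with partial pivoting: the O(n^3) factorisation is done once.
        Eigen::PartialPivLU<Eigen::MatrixXd> lu(A);

        // Each solve reuses the stored factors and costs only O(n^2).
        Eigen::VectorXd x1 = lu.solve(b1);
        Eigen::VectorXd x2 = lu.solve(b2);
    }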

If the matrix is symmetric and positive definite, Cholesky (called $VV^\T$ or $LL^\T$) decomposition can be used, with $\frac13 n^3$ cost.
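
For the symmetric positive definite case, a minimal sketch with Eigen's LLT (Cholesky) solver; the matrix below is constructed to be positive definite purely for illustration.

    #include <Eigen/Dense>

    int main() {
        // Build a symmetric positive definite matrix as M^T M + I (illustrative only).
        Eigen::MatrixXd M = Eigen::MatrixXd::Random(50, 50);
        Eigen::MatrixXd A = M.transpose() * M + Eigen::MatrixXd::Identity(50, 50);
        Eigen::VectorXd b = Eigen::VectorXd::Random(50);

        // Cholesky (LL^T) factorisation, roughly half the cost of LU.
        Eigen::LLT<Eigen::MatrixXd> llt(A);
        Eigen::VectorXd x = llt.solve(b);
    }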

Basics of solving overdetermined linear systems

Ordinary least squares

Let $A$ be a matrix of size $n \times m$ and $b$ a vector of size $n$, with $n \geq m$, and let $A$ have full rank. If $n=m$ the system $Ax=b$ is square and can be treated as such, although the derivation below applies as well.

The system $Ax=b$ has $n$ equations for $m$ unknowns $x_1, \ldots, x_m$ and is overdetermined. The solution is most often sought in the least-squares sense, defining the solution $x$ as the solution of the minimisation problem $$ \min_x \|Ax-b\|_2. $$ The solution can be obtained using calculus, by looking for stationary points of the function $\|Ax-b\|_2^2 = (Ax-b)^\T(Ax-b)$ (which has the same minimiser), which yields the system of normal equations $$A^\T A x = A^\T b.$$ The solution can theoretically be expressed as $x = (A^\T A)^{-1} A^\T b$.
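
A small Eigen sketch of this formulation (the data is made up): the theoretical formula is transcribed literally and the stationarity condition $A^\T(Ax-b)=0$ is checked; the sections below discuss numerically sounder ways to compute $x$.

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        // Made-up overdetermined system: n = 6 equations, m = 3 unknowns.
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 3);
        Eigen::VectorXd b = Eigen::VectorXd::Random(6);

        // Literal transcription of x = (A^T A)^{-1} A^T b (fine for illustration only).
        Eigen::VectorXd x = (A.transpose() * A).inverse() * A.transpose() * b;

        // Stationarity: the residual A x - b is orthogonal to the columns of A.
        std::cout << "||A^T (A x - b)|| = "
                  << (A.transpose() * (A * x - b)).norm() << std::endl;
    }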

Weighted least squares

The weighted least squares problem has the same setup as ordinary least squares, but we instead minimise the norm $\|Ax-b\|_{2,w}$, where $\|z\|_{2,w}^2 = \sum_{i=1}^n (w_iz_i)^2$. We assume all weights $w_i$ are positive.

By observing that $\|z\|_{2,w} = \|Wz\|_2$, where $W = \operatorname{diag}(w_1, \ldots, w_n)$, we can reduce the problem to ordinary least squares. Writing $\|Ax-b\|_{2,w} = \|W(Ax-b)\|_2 = \|WA x - Wb\|_2$, we obtain an ordinary least squares problem with matrix $WA$ and vector $Wb$, which has the normal equations $$A^\T W^2Ax = A^\T W^2b$$ and the solution $x = (A^\T W^2A)^{-1} A^\T W^2b$.
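
A minimal Eigen sketch of this reduction (the data and weights are made up): the rows of $A$ and $b$ are scaled by the weights and the result is handed to an ordinary least-squares solver.

    #include <Eigen/Dense>

    int main() {
        // Made-up overdetermined system and positive weights.
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 3);
        Eigen::VectorXd b = Eigen::VectorXd::Random(6);
        Eigen::VectorXd w(6);
        w << 1, 2, 1, 0.5, 3, 1;

        // Scale the rows: the weighted problem becomes an ordinary one with WA and Wb.
        Eigen::MatrixXd WA = w.asDiagonal() * A;
        Eigen::VectorXd Wb = w.asDiagonal() * b;

        // Solve with any least-squares method, e.g. column-pivoted QR.
        Eigen::VectorXd x = WA.colPivHouseholderQr().solve(Wb);
    }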

Numerical solving

Techniques for numerically solving (weighted) least squares systems are described below. A list of decompositions available in Eigen can be found here: https://eigen.tuxfamily.org/dox/group__LeastSquares.html

As when solving square linear systems, if many systems $Ax = b_i$ that differ only in the right hand side need to be solved, the matrix $A$ need only be decomposed once, with subsequent solutions taking $O(n^2)$ time.

Normal equations

The normal equations above can be used directly to solve the system. The matrix $A^\T A$ is symmetric and positive definite, so the normal equations can be solved using the Cholesky decomposition. However, this can be poorly behaved with respect to round-off errors, since the condition number $\kappa(A^\T A)$ is the square of $\kappa(A)$.

The complexity of the Cholesky decomposition is $\frac{m^3}{3}$ and the complexity of forming $A^\T A$ is $2m^2n$. To perform the Cholesky decomposition, $A$ must have full rank.

Pros:

  • simple to implement
  • low computational complexity

Cons:

  • numerically unstable
  • full rank requirement
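
A minimal Eigen sketch of the normal-equations approach (the data is made up); llt() is Eigen's Cholesky factorisation.

    #include <Eigen/Dense>

    int main() {
        // Made-up full-rank overdetermined system with n = 100, m = 10.
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(100, 10);
        Eigen::VectorXd b = Eigen::VectorXd::Random(100);

        // Form the normal equations A^T A x = A^T b (cost ~ 2 m^2 n).
        Eigen::MatrixXd AtA = A.transpose() * A;
        Eigen::VectorXd Atb = A.transpose() * b;

        // Cholesky factorisation of the symmetric positive definite m x m matrix (cost ~ m^3/3).
        Eigen::VectorXd x = AtA.llt().solve(Atb);
    }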

$QR$ Decomposition

The $n\times m$ matrix $A$ is decomposed as
$$ A = QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1R_1,$$
where $Q$ is a unitary matrix ($Q^{-1}=Q^\T$) of size $n\times n$, $R$ is trapezoidal and the same size as $A$ ($n \times m$), $Q_1$ is also the same size as $A$ ($n \times m$) and $R_1$ is a triangular $m \times m$ matrix. A useful property of unitary matrices is that multiplying by them does not alter the (Euclidean) norm of a vector, i.e., $$ \| Qx \| = \| x \|. $$ Therefore
$$ \| Ax - b \|^2 = \| Q^\T (Ax-b) \|^2 = \| Q^\T Q R x - Q^\T b \|^2 = \left\| \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - \begin{bmatrix} Q_1^\T \\ Q_2^\T \end{bmatrix} b \right\|^2 = \| R_1x - Q_1^\T b \|^2 + \| Q_2^\T b \|^2. $$
Of the two terms, we have no control over the second, and we can render the first one zero by solving $$ R_1x = Q_1^\T b, $$ which gives the minimiser $x = R_1^{-1} Q_1^\T b$.

When computing the $QR$ decomposition, only the thin decomposition $Q_1$, $R_1$ is needed to solve the least squares problem.

The complexity of the $QR$ decomposition is $2nm^2 - 2m^3/3$.

Pros: better stability compared to normal equations; cons: higher complexity.
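
A minimal Eigen sketch of a $QR$-based least-squares solve (made-up data); solve() effectively computes $x = R_1^{-1} Q_1^\T b$ using the thin factorisation.

    #include <Eigen/Dense>

    int main() {
        // Made-up tall system with n = 100 equations and m = 10 unknowns.
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(100, 10);
        Eigen::VectorXd b = Eigen::VectorXd::Random(100);

        // Householder QR without pivoting; solve() back-substitutes with R_1 after applying Q_1^T.
        Eigen::VectorXd x = A.householderQr().solve(b);

        // Column-pivoted variant: somewhat slower, but more robust for nearly rank-deficient A.
        Eigen::VectorXd x_piv = A.colPivHouseholderQr().solve(b);
    }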

SVD decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix.

The singular value decomposition of an $n \times m$ real or complex matrix $A$ is a factorization of the form $A= U\Sigma V^\T$, where $U$ is an $n \times n$ unitary matrix, $\Sigma$ is an $n \times m$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $V$ is an $m \times m$ unitary matrix. The diagonal entries $\Sigma_{ii}$ are known as the singular values of $A$. The $n$ columns of $U$ and the $m$ columns of $V$ are called the left-singular vectors and right-singular vectors of $A$, respectively.

The singular value decomposition and the eigendecomposition are closely related. Namely:

  • The left-singular vectors of $A$ are eigenvectors of $A A^\T$.
  • The right-singular vectors of $A$ are eigenvectors of $A^\T A$.
  • The non-zero singular values of $A$ (found on the diagonal entries of $\Sigma$) are the square roots of the non-zero eigenvalues of both $A^\T A$ and $A A^\T$.

With the SVD we can thus write $A$ as \[A= U\Sigma V^\T, \] where $U$ and $V$ are again unitary matrices and $\Sigma$ is the diagonal matrix of singular values.

The SVD can be used to compute the pseudoinverse \[ A^{+} = \left( U\Sigma V^\T\right)^{+} = V \Sigma^{+}U^\T, \] where $\Sigma^{+}$ is trivial to compute: replace every non-zero diagonal entry by its reciprocal and transpose the resulting matrix.

This can be used to solve the least squares problem, ordinary systems or underdetermined systems. Note that usually only the thin variant is needed.

One can also choose a threshold $t$ below which a singular value is considered to be $0$ and truncate all singular values whose magnitude is below $t$. Because very small values are not inverted, the stability of the solution increases, at the cost of a slightly inexact solution. This approach is called "truncated SVD", which is a form of regularization of badly conditioned problems. Another approach would be Tikhonov regularization.

The complexity of computing the SVD is \[ 2nm^2+2m^3, \] i.e. $O(n^3)$ for square matrices.

Pros: stable; cons: high complexity.
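
A minimal Eigen sketch of an SVD-based least-squares solve with truncation (made-up data, illustrative threshold); solve() applies the pseudoinverse $V\Sigma^{+}U^\T$ to the right hand side.

    #include <Eigen/Dense>

    int main() {
        // Made-up, possibly badly conditioned overdetermined system.
        Eigen::MatrixXd A = Eigen::MatrixXd::Random(100, 10);
        Eigen::VectorXd b = Eigen::VectorXd::Random(100);

        // Thin SVD: only the first m columns of U and all of V are computed.
        Eigen::BDCSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);

        // Truncated SVD: singular values below the threshold are treated as zero and not inverted.
        svd.setThreshold(1e-10);  // illustrative cutoff
        Eigen::VectorXd x = svd.solve(b);
    }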