Basics of solving square linear systems

Solving a system $Ax = b$, where $A$ is an $n\times n$ matrix of scalars (complex, real, or otherwise), is possible if $A$ is invertible, and the solution is theoretically given by $x = A^{-1}b$.


Let $\hat{x}$ be the solution of $\hat{A} \hat{x} = \hat{b}$, where the matrix $A$ is perturbed as $\hat{A} = A + \Delta{A}$ and the right hand side as $\hat{b} = b + \Delta b$.

The absolute error of $x$ is $\Delta x = \hat{x} - x$, and the relative errors are defined as $\delta A = \|\Delta A\|/\|A\|$, $\delta b = \|\Delta b\|/\|b\|$ and $\delta x = \|\Delta x\|/\|x\|$. We assume that the relative errors in the inputs are reasonably small.


If only $b$ is perturbed, the error is bounded by $$\delta x \leq \kappa(A) \delta b,$$ where $\kappa(A) = \|A\|\|A^{-1}\|$ is the condition number of the matrix $A$.

In general, the error is bounded by

$$\delta x \leq \frac{\kappa(A)}{1 - \kappa(A) \delta A} (\delta A + \delta b).$$

Therefore, if the condition number $\kappa(A)$ is large, the accuracy of the solution of the system can deteriorate.

If $A$ is not invertible, the Moore-Penrose pseudoinverse can be used to compute a "solution": $x = A^+b$.
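
For illustration, here is a minimal C++/Eigen sketch (the matrix and right hand side values are made up for the example) that solves a small square system and estimates $\kappa(A)$ from the ratio of the largest to the smallest singular value:

<syntaxhighlight lang="cpp">
#include <iostream>
#include <Eigen/Dense>

int main() {
    // A small square system Ax = b; the values are made up for the example.
    Eigen::MatrixXd A(3, 3);
    A << 4, 1, 0,
         1, 3, 1,
         0, 1, 2;
    Eigen::VectorXd b(3);
    b << 1, 2, 3;

    // Solve using LU decomposition with partial pivoting.
    Eigen::VectorXd x = A.partialPivLu().solve(b);

    // Estimate kappa(A) = sigma_max / sigma_min from the singular values;
    // in the 2-norm this equals ||A|| * ||A^{-1}||.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A);
    const auto& sv = svd.singularValues();
    double kappa = sv(0) / sv(sv.size() - 1);

    std::cout << "x = " << x.transpose() << "\nkappa(A) = " << kappa << "\n";
}
</syntaxhighlight>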

Basics of solving underdetermined linear systems

Least norm solution

Consider the system $Ax = b$, where $A$ is an $n \times m$ matrix of full rank and $n \leq m$. The number of variables $m$ is larger than the number of equations. This means that all equations can be satisfied and some degrees of freedom are left over to obtain additional properties of the solution. The common problem is to find the least-norm solution: $$\min_x \|x\|_2, \text{ such that } Ax = b.$$

The solution is given by $x = A^\T (AA^\T)^{-1} b$, and it can be obtained by solving the $n \times n$ system $AA^\T y = b$ and computing $x = A^\T y$. The solution can also be written as $x = A^+ b$.

To verify this, consider any other $z$ such that $Az=b$. Then, $$ (z-x)^\T x = (z-x)^\T A^\T (AA^\T)^{-1} b = (A(z-x))^\T (AA^\T)^{-1} b = (b-b)^\T (AA^\T)^{-1} b = 0, $$ and consequently $$ \|z\|^2 = \|z-x+x\|^2 = \|z-x\|^2 + 2(z-x)^\T x + \|x\|^2 = \|z-x\|^2 + \|x\|^2 \geq \|x\|^2, $$ which shows that no other solution $z$ has a smaller norm than $x$.

An alternative derivation is to minimize $x^\T x$ subject to $Ax=b$ using Lagrange multipliers.
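
A short C++/Eigen sketch of this formula (the matrix, right hand side and the comparison solution $z$ are made up for the example), which also checks the least-norm property numerically:

<syntaxhighlight lang="cpp">
#include <iostream>
#include <Eigen/Dense>

int main() {
    // Underdetermined system with 2 equations and 4 unknowns; values are illustrative.
    Eigen::MatrixXd A(2, 4);
    A << 1, 2, 0, 1,
         0, 1, 1, 3;
    Eigen::VectorXd b(2);
    b << 1, 2;

    // Least-norm solution x = A^T (A A^T)^{-1} b.
    Eigen::VectorXd x = A.transpose() * (A * A.transpose()).ldlt().solve(b);

    // Any other solution differs from x by a null-space vector of A and has a larger norm.
    Eigen::FullPivLU<Eigen::MatrixXd> lu(A);
    Eigen::VectorXd z = x + lu.kernel().col(0);  // kernel() returns a basis of the null space

    std::cout << "||Ax-b|| = " << (A * x - b).norm()
              << ", ||x|| = " << x.norm()
              << ", ||z|| = " << z.norm() << "\n";  // ||z|| >= ||x||
}
</syntaxhighlight>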

Weighted case

The weighted case is similar, except that the minimization problem becomes $$\min_x \|x\|_{2,w}, \text{ such that } Ax = b.$$ With $W = \operatorname{diag}(w_1, \ldots, w_m)$, the minimization can be written as $\min_x \|Wx\|_{2}, \text{ such that } Ax = b$, or, substituting $Wx = y$, we obtain the following ordinary least norm problem: $$\min_y \|y\|_{2}, \text{ such that } AW^{-1}y = b,$$ with matrix $AW^{-1}$ and right hand side $b$, and with solution $x = W^{-1} y$.

Thus, the final solution is given by $x = W^{-2} A^\T (A W^{-2} A^\T)^{-1} b$ and can be computed by first solving the $n \times n$ system $AW^{-2}A^\T y = b$ and then computing $x = W^{-2} A^\T y$.
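
A possible C++/Eigen sketch of this computation (the sizes, values and weights below are chosen only for illustration):

<syntaxhighlight lang="cpp">
#include <iostream>
#include <Eigen/Dense>

int main() {
    // Illustrative weighted problem: 2 equations, 4 unknowns, positive weights.
    Eigen::MatrixXd A(2, 4);
    A << 1, 2, 0, 1,
         0, 1, 1, 3;
    Eigen::VectorXd b(2);
    b << 1, 2;
    Eigen::VectorXd w(4);
    w << 1, 2, 1, 4;

    // W^{-2} is diagonal, so it is applied as a coefficient-wise scaling with entries 1/w_i^2.
    Eigen::VectorXd winv2 = w.array().square().inverse().matrix();

    // Solve the n x n system (A W^{-2} A^T) y = b, then x = W^{-2} A^T y.
    Eigen::MatrixXd AW = A * winv2.asDiagonal();                 // = A W^{-2}
    Eigen::VectorXd y = (AW * A.transpose()).ldlt().solve(b);
    Eigen::VectorXd x = winv2.asDiagonal() * (A.transpose() * y);

    std::cout << "weighted least-norm x = " << x.transpose() << "\n";
}
</syntaxhighlight>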

Numerical solving

Techniques for numerically solving (weighted) underdetermined systems are described below. A list of decompositions available in Eigen can be found here: https://eigen.tuxfamily.org/dox/group__LeastSquares.html

As when solving overdetermined systems, if many systems $Ax = b_i$ that differ only in the right hand side need to be solved, the matrix $A$ needs to be decomposed only once, with subsequent solutions taking $O(n^2)$ time.

Normal equations

We can solve the system directly by solving the $n \times n$ system $AA^\T y = b$ using the Cholesky decomposition and computing $x = A^\T y$.

This decomposition takes $\frac13 n^3 + 2n^2m$ time.
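
One possible C++/Eigen sketch of this approach (the helper name least_norm_solve and the sizes are ours, chosen for illustration), reusing the stored Cholesky factor for several right hand sides:

<syntaxhighlight lang="cpp">
#include <Eigen/Dense>

// Least-norm solve via the normal equations for a full-rank n x m matrix A with n <= m.
// The Cholesky factorization of A A^T is computed once and reused for many right hand sides.
Eigen::VectorXd least_norm_solve(const Eigen::LLT<Eigen::MatrixXd>& llt,
                                 const Eigen::MatrixXd& A,
                                 const Eigen::VectorXd& b) {
    Eigen::VectorXd y = llt.solve(b);  // solve (A A^T) y = b, O(n^2) per right hand side
    return A.transpose() * y;          // x = A^T y
}

int main() {
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(5, 20);  // illustrative sizes, full rank
    Eigen::LLT<Eigen::MatrixXd> llt(A * A.transpose());  // decompose A A^T once
    Eigen::VectorXd x1 = least_norm_solve(llt, A, Eigen::VectorXd::Random(5));
    Eigen::VectorXd x2 = least_norm_solve(llt, A, Eigen::VectorXd::Random(5));
}
</syntaxhighlight>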

QR decomposition

Writing $A^\T = Q_1R_1$ as when solving overdetermined systems, we can get the solution $$x = Q_1R_1 (R_1^\T Q_1^\T Q_1 R_1)^{-1}b = Q_1 R_1 R_1^{-1} R_1^{-\T} b = Q_1 R_1^{-\T} b.$$

The computational costs are the same as in the overdetermined case.
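
A hedged C++/Eigen sketch of this approach (the helper name least_norm_qr is ours, and the extraction of the thin factors shown here is one possible way to do it):

<syntaxhighlight lang="cpp">
#include <Eigen/Dense>

// Least-norm solution of Ax = b via the thin QR decomposition A^T = Q1 R1,
// so that x = Q1 R1^{-T} b. A is n x m with full rank and n <= m.
Eigen::VectorXd least_norm_qr(const Eigen::MatrixXd& A, const Eigen::VectorXd& b) {
    const Eigen::Index n = A.rows(), m = A.cols();
    Eigen::HouseholderQR<Eigen::MatrixXd> qr(A.transpose());  // QR of the m x n matrix A^T

    // Thin factors: Q1 is m x n, R1 is n x n upper triangular.
    Eigen::MatrixXd Q1 = qr.householderQ() * Eigen::MatrixXd::Identity(m, n);
    Eigen::MatrixXd R1 = qr.matrixQR().topRows(n).triangularView<Eigen::Upper>();

    // x = Q1 R1^{-T} b: solve the triangular system R1^T z = b, then x = Q1 z.
    Eigen::VectorXd z = R1.transpose().triangularView<Eigen::Lower>().solve(b);
    return Q1 * z;
}
</syntaxhighlight>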

SVD decomposition

The pseudoinverse can be computed using SVD and the solution obtained as $x = A^{+} b$.

The computational costs are the same as in the overdetermined case.
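
A minimal C++/Eigen sketch (the helper name least_norm_svd is ours); Eigen's SVD solver returns the minimum-norm solution for underdetermined systems:

<syntaxhighlight lang="cpp">
#include <Eigen/Dense>

// Least-norm solution via SVD: for an underdetermined system, Eigen's SVD solve()
// returns the minimum-norm solution x = A^+ b. Only the thin factors are needed.
Eigen::VectorXd least_norm_svd(const Eigen::MatrixXd& A, const Eigen::VectorXd& b) {
    Eigen::BDCSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    return svd.solve(b);
}
</syntaxhighlight>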

Decompositions

Linear systems are not solved by computing the inverse of $A$, but by suitably decomposing $A$. Many decompositions are available, with the $LU$ decomposition with partial pivoting being the go-to option, at a cost of $\frac23 n^3$. Additionally, the $QR$ and $SVD$ decompositions are available if more robust decompositions are desired, at a greater cost.

See the full list of Eigen's available decompositions: https://eigen.tuxfamily.org/dox/group__TopicLinearAlgebraDecompositions.html

If many systems $Ax = b_i$ that differ only in the right hand side need to be solved, the decomposition of $A$ can be computed in $O(n^3)$ time and stored, with subsequent solutions taking only $O(n^2)$ time.

If the matrix is symmetric and positive definite, the Cholesky decomposition (written $VV^\T$ or $LL^\T$) can be used, with $\frac13 n^3$ cost.
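
A short C++/Eigen sketch of the decompose-once-and-reuse pattern (the sizes and matrices are made up for the example):

<syntaxhighlight lang="cpp">
#include <Eigen/Dense>

int main() {
    const int n = 100;
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);  // illustrative system matrix
    Eigen::VectorXd b1 = Eigen::VectorXd::Random(n), b2 = Eigen::VectorXd::Random(n);

    // Decompose once in O(n^3), then reuse the stored decomposition
    // for each right hand side in O(n^2).
    Eigen::PartialPivLU<Eigen::MatrixXd> lu(A);
    Eigen::VectorXd x1 = lu.solve(b1);
    Eigen::VectorXd x2 = lu.solve(b2);

    // For a symmetric positive definite matrix, Cholesky is about twice as cheap.
    Eigen::MatrixXd S = A * A.transpose() + Eigen::MatrixXd::Identity(n, n);  // SPD by construction
    Eigen::VectorXd y = S.llt().solve(b1);
}
</syntaxhighlight>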

Basics of solving overdetermined linear systems

Ordinary least squares

Let $A$ be a matrix of size $n \times m$ and $b$ a vector of size $n$, with $n \geq m$, and let $A$ have full rank. If $n=m$, the system $Ax=b$ is square and can be treated as such; however, the derivation below applies as well.

The system $Ax=b$ has $n$ equations for $m$ unknowns $x_1, \ldots, x_m$ and is overdetermined. The solution is most often sought in the least-squares sense, defining the solution $x$ as the solution of the minimisation problem $$ \min_x \|Ax-b\|_2. $$ The solution can be obtained using calculus, by looking for stationary points of the function $\|Ax-b\|_2^2 = (Ax-b)^\T(Ax-b)$ (which has the same minimum), which gives the system of normal equations $$A^\T A x = A^\T b.$$ The solution can be expressed theoretically as $x = (A^\T A)^{-1} A^\T b$.
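
As a quick illustration, the following C++/Eigen sketch (with made-up sizes) computes a least-squares solution and checks that it satisfies the normal equations:

<syntaxhighlight lang="cpp">
#include <iostream>
#include <Eigen/Dense>

int main() {
    // Illustrative overdetermined system: 6 equations, 3 unknowns.
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 3);
    Eigen::VectorXd b = Eigen::VectorXd::Random(6);

    // Least-squares solution, here computed via QR (see the sections below).
    Eigen::VectorXd x = A.colPivHouseholderQr().solve(b);

    // The residual Ax - b is orthogonal to the columns of A,
    // i.e. x satisfies the normal equations A^T A x = A^T b.
    std::cout << "||A^T (Ax - b)|| = " << (A.transpose() * (A * x - b)).norm() << "\n";
}
</syntaxhighlight>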

Weighted least squares

The weighted least squares problem has the same setup as ordinary least squares, but we instead minimise the norm $\|Ax-b\|_{2,w}$, where $\|z\|_{2,w} = \sqrt{\sum_{i=1}^n (w_iz_i)^2}$. We assume all weights $w_i$ are positive.

By observing that $\|z\|_{2,w} = \|Wz\|_2$, where $W = \operatorname{diag}(w_1, \ldots, w_n)$, we can reduce the problem to ordinary least squares. Writing $\|Ax-b\|_{2,w} = \|W(Ax-b)\|_2 = \|WA x - Wb\|_2$, we obtain an ordinary least squares problem with matrix $WA$ and vector $Wb$, which has the normal equations $$A^\T W^2Ax = A^\T W^2b$$ and the solution $x = (A^\T W^2A)^{-1} A^\T W^2b$.
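
A minimal C++/Eigen sketch of this reduction (the helper name weighted_lsq is ours):

<syntaxhighlight lang="cpp">
#include <Eigen/Dense>

// Weighted least squares by reduction to ordinary least squares:
// minimise ||W(Ax - b)||_2 with W = diag(w_1, ..., w_n).
Eigen::VectorXd weighted_lsq(const Eigen::MatrixXd& A,
                             const Eigen::VectorXd& b,
                             const Eigen::VectorXd& w) {
    Eigen::MatrixXd WA = w.asDiagonal() * A;    // scale the rows of A by the weights
    Eigen::VectorXd Wb = w.asDiagonal() * b;    // scale the entries of b by the weights
    return WA.colPivHouseholderQr().solve(Wb);  // ordinary least squares on (WA, Wb)
}
</syntaxhighlight>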

Numerical solving

Techniques for numerically solving (weighted) least squares systems are described below. A list of decompositions available in Eigen can be found here: https://eigen.tuxfamily.org/dox/group__LeastSquares.html

As when solving linear systems, if many systems $Ax = b_i$ that differ only in the right hand side need to be solved, the matrix $A$ needs to be decomposed only once, with subsequent solutions taking $O(n^2)$ time.

Normal equations

The normal equations above can be used directly to solve the system. The matrix $A^\T A$ is symmetric and positive definite, and the normal equations can be solved using the Cholesky decomposition. However, this can be poorly behaved with respect to round-off errors, since the condition number $\kappa(A^\T A)$ is the square of $\kappa(A)$.

The complexity of the Cholesky decomposition is $\frac{m^3}{3}$ and the complexity of the matrix multiplication is $2m^2n$. To perform the Cholesky decomposition, $A$ must have full rank.

Pros:

  • simple to implement
  • low computational complexity

Cons:

  • numerically unstable
  • full rank requirement
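
A short C++/Eigen sketch of the normal-equations approach (the helper name normal_equations_lsq is ours):

<syntaxhighlight lang="cpp">
#include <Eigen/Dense>

// Least squares via the normal equations A^T A x = A^T b.
// The m x m Cholesky factorization can be stored and reused for several right hand sides.
Eigen::VectorXd normal_equations_lsq(const Eigen::MatrixXd& A, const Eigen::VectorXd& b) {
    // A^T A is symmetric positive definite when A has full rank.
    Eigen::LLT<Eigen::MatrixXd> llt(A.transpose() * A);  // about m^3/3 plus 2 m^2 n for the product
    return llt.solve(A.transpose() * b);
}
</syntaxhighlight>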

$QR$ Decomposition

The $n\times m$ matrix $A$ is decomposed as $$ A = QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1R_1,$$ where $Q$ is a unitary matrix ($Q^{-1}=Q^\T$) of size $n\times n$, $R$ is trapezoidal and of the same size as $A$ ($n \times m$), $Q_1$ is also of the same size as $A$ ($n \times m$) and $R_1$ is a triangular $m \times m$ matrix. A useful property of unitary matrices is that multiplying by them does not alter the (Euclidean) norm of a vector, i.e., $$ \| Qx \| = \| x \|. $$ Therefore we can write $$ \| Ax- b \|^2 = \| Q^\T (Ax-b) \|^2 = \| Q^\T Ax - Q^\T b \|^2 = \| Q^\T QRx - Q^\T b \|^2 = \| [R_1; 0] x - [Q_1, Q_2]^\T b \|^2 = \| R_1x - Q_1^\T b \|^2 + \| Q_2^\T b \|^2. $$ Of the two terms, we have no control over the second, and we can make the first one zero by solving $$ R_1x = Q_1^\T b, $$ which gives the minimiser $x = R_1^{-1} Q_1^\T b$.

When computing the $QR$ decomposition, only the thin decomposition $Q_1$, $R_1$ is needed to solve the least squares problem.

The complexity of $QR$ decomposition is $2nm^2 - 2m^3/3$.

Pros: better stability in comparison to normal equations. Cons: higher complexity.
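
A minimal C++/Eigen sketch (the helper name qr_lsq is ours); Eigen's QR solvers perform the triangular solve $R_1 x = Q_1^\T b$ internally:

<syntaxhighlight lang="cpp">
#include <Eigen/Dense>

// Least squares via the (thin) QR decomposition. HouseholderQR assumes A has full rank;
// ColPivHouseholderQR is the more robust (rank-revealing) variant.
Eigen::VectorXd qr_lsq(const Eigen::MatrixXd& A, const Eigen::VectorXd& b) {
    return A.householderQr().solve(b);  // internally solves R1 x = Q1^T b
}
</syntaxhighlight>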

SVD decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix.

The singular value decomposition of an $n \times m$ real or complex matrix $A$ is a factorization of the form $A= U\Sigma V^\T$, where $U$ is an $n \times n$ unitary matrix, $\Sigma$ is an $n \times m$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $V$ is an $m \times m$ unitary matrix. The diagonal entries $\Sigma_{ii}$ are known as the singular values of $A$. The columns of $U$ and the columns of $V$ are called the left-singular vectors and right-singular vectors of $A$, respectively.

The singular value decomposition and the eigendecomposition are closely related. Namely:

  • The left-singular vectors of $A$ are eigenvectors of $A A^\T$.
  • The right-singular vectors of $A$ are eigenvectors of $A^\T A$.
  • The non-zero singular values of $A$ (found on the diagonal of $\Sigma$) are the square roots of the non-zero eigenvalues of both $A^\T A$ and $AA^\T$.

With the SVD we can write $A$ as \[A= U\Sigma V^\T, \] where $U$ and $V$ are again unitary matrices and $\Sigma$ is the diagonal matrix of singular values.

The SVD can be used to compute the pseudoinverse \[ A^{+} = \left( U\Sigma V^\T\right)^{+} = V \Sigma^{+}U^\T, \] where $\Sigma^{+}$ is trivial to compute: replace every non-zero diagonal entry by its reciprocal and transpose the resulting matrix.

This can be used to solve least squares problems, square systems or underdetermined systems. Note that usually only the thin variant of the decomposition is needed.

One can also choose a threshold $t$ below which a singular value is considered to be $0$, and truncate all singular values whose magnitude falls below $t$. Because very small values are not inverted, the stability of the solution increases, at the cost of a slightly inexact solution. This approach is called the "truncated SVD decomposition" and is a form of regularization of badly conditioned problems. Another approach would be Tikhonov regularization.
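
A possible C++/Eigen sketch of a truncated-SVD least-squares solve (the helper name svd_lsq and the default threshold are ours):

<syntaxhighlight lang="cpp">
#include <Eigen/Dense>

// Least squares via SVD. setThreshold() makes solve() treat singular values below the
// given (relative) threshold as zero, i.e. the truncated SVD regularization described above.
Eigen::VectorXd svd_lsq(const Eigen::MatrixXd& A, const Eigen::VectorXd& b,
                        double threshold = 1e-12) {
    Eigen::BDCSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    svd.setThreshold(threshold);  // singular values below the threshold are truncated
    return svd.solve(b);          // x = A^+ b using only the retained singular values
}
</syntaxhighlight>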

The complexity of the SVD decomposition is \[ 2nm^2+2m^3 = O(n^3). \]

Pros: stable. Cons: high complexity.