<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://e6.ijs.si/medusa/wiki/api.php?action=feedcontributions&amp;user=Mdepolli&amp;feedformat=atom</id>
		<title>Medusa: Coordinate Free Meshless Method implementation - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://e6.ijs.si/medusa/wiki/api.php?action=feedcontributions&amp;user=Mdepolli&amp;feedformat=atom"/>
		<link rel="alternate" type="text/html" href="http://e6.ijs.si/medusa/wiki/index.php/Special:Contributions/Mdepolli"/>
		<updated>2026-04-24T08:15:20Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.27.1</generator>

	<entry>
		<id>http://e6.ijs.si/medusa/wiki/index.php?title=Pardiso&amp;diff=1223</id>
		<title>Pardiso</title>
		<link rel="alternate" type="text/html" href="http://e6.ijs.si/medusa/wiki/index.php?title=Pardiso&amp;diff=1223"/>
				<updated>2017-07-19T13:19:31Z</updated>
		
		<summary type="html">&lt;p&gt;Mdepolli: /* Pardiso library */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Pardiso library=&lt;br /&gt;
&lt;br /&gt;
The Pardiso library can be downloaded from [http://www.pardiso-project.org/], along with examples of use, the manual, etc. Note that the source of the library is not available; only pre-compiled objects can be downloaded, and only after registering on the site.&lt;br /&gt;
* &amp;quot;The package PARDISO is a thread-safe, high-performance, robust, memory efficient and easy to use software for solving large sparse symmetric and unsymmetric linear systems of equations on shared-memory and distributed-memory multiprocessors. The solver uses a combination of left- and right-looking Level-3 BLAS supernode techniques.&amp;quot;&lt;br /&gt;
* Pardiso is proprietary software, but a yearly license for academic use can be obtained by registering.&lt;br /&gt;
* Parallelization is implemented using both OpenMP and MPI.&lt;br /&gt;
* MPI-based numerical factorization and parallel forward/backward substitution on distributed-memory architectures is implemented for '''symmetric indefinite matrices'''.&lt;br /&gt;
&lt;br /&gt;
==Example of use==&lt;br /&gt;
&lt;br /&gt;
This minimal example uses OpenMP parallelization to run the Pardiso solver on a sample matrix. &lt;br /&gt;
&lt;br /&gt;
test.cpp:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;c++&amp;quot; line&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;sstream&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
#include &amp;lt;math.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
// PARDISO prototypes.&lt;br /&gt;
// There is no pardiso.h, so the function prototypes have to be declared in the user's code.&lt;br /&gt;
extern &amp;quot;C&amp;quot; void pardisoinit (void   *, int    *,   int *, int *, double *, int *);&lt;br /&gt;
extern &amp;quot;C&amp;quot; void pardiso     (void   *, int    *,   int *, int *,    int *, int *, &lt;br /&gt;
                  double *, int    *,    int *, int *,   int *, int *,&lt;br /&gt;
                     int *, double *, double *, int *, double *);&lt;br /&gt;
extern &amp;quot;C&amp;quot; void pardiso_chkmatrix  (int *, int *, double *, int *, int *, int *);&lt;br /&gt;
extern &amp;quot;C&amp;quot; void pardiso_chkvec     (int *, int *, double *, int *);&lt;br /&gt;
extern &amp;quot;C&amp;quot; void pardiso_printstats (int *, int *, double *, int *, int *, int *, double *, int *);&lt;br /&gt;
&lt;br /&gt;
struct TestCase {&lt;br /&gt;
    // the dimension of the problem: matrix A is n-by-n, while vectors b and x have n elements&lt;br /&gt;
    int n;&lt;br /&gt;
    // CSR row pointers: the non-zero values of row i are stored at positions ia[i] .. (ia[i+1]-1) of A and ja&lt;br /&gt;
    std::vector&amp;lt;int&amp;gt; ia;&lt;br /&gt;
    // column indices of non-zero elements&lt;br /&gt;
    std::vector&amp;lt;int&amp;gt; ja;&lt;br /&gt;
    // values of non-zero elements&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; A;&lt;br /&gt;
    // right hand side&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; b;&lt;br /&gt;
    // solution vector&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; x;&lt;br /&gt;
&lt;br /&gt;
    // Pardiso workspace variable (using void* as the type ensures enough space is allocated on 32 and 64-bit systems)&lt;br /&gt;
    void*    pt[64];&lt;br /&gt;
    // Pardiso solver parameter variables&lt;br /&gt;
    int      iparm[64];&lt;br /&gt;
    double   dparm[64];&lt;br /&gt;
    &lt;br /&gt;
    void createTestMatrix(int n) {&lt;br /&gt;
        this-&amp;gt;n = n;&lt;br /&gt;
        &lt;br /&gt;
        // -2 on the diagonal (except topmost and bottommost, which are 1), 1 on the +1 and -1 diagonals (except topmost and bottommost, which are 0)&lt;br /&gt;
        ia.reserve(n+1);  // n rows + 1 end marker&lt;br /&gt;
        ja.reserve(n*3);&lt;br /&gt;
        A.reserve(n*3);  // 3 per row&lt;br /&gt;
        &lt;br /&gt;
        // 1st row (only 1 element)&lt;br /&gt;
        ia.push_back(0);&lt;br /&gt;
        A.push_back(1);&lt;br /&gt;
        ja.push_back(0);&lt;br /&gt;
&lt;br /&gt;
        for (int i = 1; i &amp;lt; n-1; ++i) {&lt;br /&gt;
            // a new row is starting&lt;br /&gt;
            ia.push_back(A.size());&lt;br /&gt;
            &lt;br /&gt;
            // the 3 diagonal elements&lt;br /&gt;
            A.push_back(1);&lt;br /&gt;
            ja.push_back(i-1);&lt;br /&gt;
            &lt;br /&gt;
            A.push_back(-2);&lt;br /&gt;
            ja.push_back(i);&lt;br /&gt;
            &lt;br /&gt;
            A.push_back(1);&lt;br /&gt;
            ja.push_back(i+1);&lt;br /&gt;
        }&lt;br /&gt;
        &lt;br /&gt;
        // last row (only 1 element)&lt;br /&gt;
        ia.push_back(A.size());&lt;br /&gt;
        A.push_back(1);&lt;br /&gt;
        ja.push_back(n-1);&lt;br /&gt;
        &lt;br /&gt;
        // the last entry of ia points one past the end of the matrix (as if an (n+1)-th row started there)&lt;br /&gt;
        ia.push_back(A.size());&lt;br /&gt;
        &lt;br /&gt;
        // Convert matrix from 0-based to 1-based indexing (Pardiso follows Fortran conventions)&lt;br /&gt;
        for (size_t i = 0; i &amp;lt; ia.size(); i++) &lt;br /&gt;
            ia[i] += 1;&lt;br /&gt;
        for (size_t i = 0; i &amp;lt; ja.size(); i++) &lt;br /&gt;
            ja[i] += 1;&lt;br /&gt;
            &lt;br /&gt;
        // set b to 1 (topmost and bottommost elements) and 1/n² (the other elements)&lt;br /&gt;
        b.resize(n, 1);&lt;br /&gt;
        for (size_t i = 1; i + 1 &amp;lt; b.size(); i++) &lt;br /&gt;
            b[i] /= (n*n);&lt;br /&gt;
    }&lt;br /&gt;
    &lt;br /&gt;
    void solve(int numThreads) {&lt;br /&gt;
        // the solution placeholder&lt;br /&gt;
        x.resize(n, 0);&lt;br /&gt;
        &lt;br /&gt;
        int error = 0;&lt;br /&gt;
        int solver = 0;     // use sparse direct solver&lt;br /&gt;
        int maxfct, mnum, phase, msglvl;&lt;br /&gt;
        int mtype = 11;     // Real unsymmetric matrix&lt;br /&gt;
        double ddum;        // double dummy variable&lt;br /&gt;
        int idum;           // integer dummy variable&lt;br /&gt;
        int nrhs = 1;       // number of right hand sides&lt;br /&gt;
        maxfct = 1;         // Maximum number of numerical factorizations.&lt;br /&gt;
        mnum   = 1;         // Which factorization to use.&lt;br /&gt;
        msglvl = 1;         // Print statistical information&lt;br /&gt;
        error  = 0;         // Initialize error flag to no-error&lt;br /&gt;
        iparm[2] = numThreads;  // initialize the number of threads &lt;br /&gt;
        &lt;br /&gt;
        pardisoinit (pt,  &amp;amp;mtype, &amp;amp;solver, iparm, dparm, &amp;amp;error);&lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;error = &amp;quot; &amp;lt;&amp;lt; error &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
        pardiso_chkmatrix  (&amp;amp;mtype, &amp;amp;n, A.data(), ia.data(), ja.data(), &amp;amp;error);&lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;error = &amp;quot; &amp;lt;&amp;lt; error &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
        pardiso_chkvec (&amp;amp;n, &amp;amp;nrhs, b.data(), &amp;amp;error);&lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;error = &amp;quot; &amp;lt;&amp;lt; error &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
        pardiso_printstats (&amp;amp;mtype, &amp;amp;n, A.data(), ia.data(), ja.data(), &amp;amp;nrhs, b.data(), &amp;amp;error);&lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;error = &amp;quot; &amp;lt;&amp;lt; error &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
   &lt;br /&gt;
        phase = 11;     // reordering &amp;amp; symbolic factorization&lt;br /&gt;
        pardiso (pt, &amp;amp;maxfct, &amp;amp;mnum, &amp;amp;mtype, &amp;amp;phase, &amp;amp;n, A.data(), ia.data(), ja.data(), &amp;amp;idum, &amp;amp;nrhs, iparm, &amp;amp;msglvl, &amp;amp;ddum, &amp;amp;ddum, &amp;amp;error, dparm); &lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;error = &amp;quot; &amp;lt;&amp;lt; error &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
        phase = 22;     // numerical factorization&lt;br /&gt;
        pardiso (pt, &amp;amp;maxfct, &amp;amp;mnum, &amp;amp;mtype, &amp;amp;phase, &amp;amp;n, A.data(), ia.data(), ja.data(), &amp;amp;idum, &amp;amp;nrhs, iparm, &amp;amp;msglvl, &amp;amp;ddum, &amp;amp;ddum, &amp;amp;error, dparm);&lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;error = &amp;quot; &amp;lt;&amp;lt; error &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
        phase = 33;     // back substitution and refinement&lt;br /&gt;
        iparm[7] = 10;  // number of refinement steps&lt;br /&gt;
        pardiso (pt, &amp;amp;maxfct, &amp;amp;mnum, &amp;amp;mtype, &amp;amp;phase, &amp;amp;n, A.data(), ia.data(), ja.data(), &amp;amp;idum, &amp;amp;nrhs, iparm, &amp;amp;msglvl, b.data(), x.data(), &amp;amp;error, dparm);&lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;error = &amp;quot; &amp;lt;&amp;lt; error &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
        phase = -1;     // finalization and memory release&lt;br /&gt;
        pardiso (pt, &amp;amp;maxfct, &amp;amp;mnum, &amp;amp;mtype, &amp;amp;phase, &amp;amp;n, A.data(), ia.data(), ja.data(), &amp;amp;idum, &amp;amp;nrhs, iparm, &amp;amp;msglvl, b.data(), x.data(), &amp;amp;error, dparm);&lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;error = &amp;quot; &amp;lt;&amp;lt; error &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
    }&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
int main(int argc, char** argv) {&lt;br /&gt;
    assert(argc == 2 &amp;amp;&amp;amp; &amp;quot;First argument is the size of the system.&amp;quot;);&lt;br /&gt;
    int n, nThreads = 1;  // default to a single thread if OMP_NUM_THREADS is not set&lt;br /&gt;
    std::stringstream ss(argv[1]);&lt;br /&gt;
    ss &amp;gt;&amp;gt; n;&lt;br /&gt;
    auto omp_num_threads = getenv(&amp;quot;OMP_NUM_THREADS&amp;quot;);&lt;br /&gt;
    if (omp_num_threads != nullptr) {&lt;br /&gt;
        std::cerr &amp;lt;&amp;lt; &amp;quot;OMP_NUM_THREADS = &amp;quot; &amp;lt;&amp;lt; omp_num_threads &amp;lt;&amp;lt; &amp;quot;\n&amp;quot;;&lt;br /&gt;
        nThreads = atoi(omp_num_threads);      // number of threads&lt;br /&gt;
    }&lt;br /&gt;
    std::cout &amp;lt;&amp;lt; &amp;quot;n = &amp;quot; &amp;lt;&amp;lt; n &amp;lt;&amp;lt; &amp;quot;, num threads = &amp;quot; &amp;lt;&amp;lt; nThreads &amp;lt;&amp;lt; std::endl;&lt;br /&gt;
    &lt;br /&gt;
    TestCase testcase;&lt;br /&gt;
    testcase.createTestMatrix(n);&lt;br /&gt;
    testcase.solve(nThreads);&lt;br /&gt;
    return 0;&lt;br /&gt;
} &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The code is compiled with GCC using the following command line: &lt;br /&gt;
&amp;lt;code&amp;gt;g++ test.cpp -o test -O3 -L./ -lpardiso500-GNU481-X86-64 -lgfortran -fopenmp -lpthread -lm -llapack -lblas&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then it is executed:&lt;br /&gt;
&amp;lt;code&amp;gt;OMP_NUM_THREADS=8 ./test 1000000&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This takes quite some time on the experimental machine (Intel i7 870 @ 2.93 GHz), as diligently reported by Pardiso itself:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
Summary PARDISO 5.0.0: ( reorder to reorder )&lt;br /&gt;
    Time fulladj: 0.054887 s&lt;br /&gt;
    Time reorder: 2.192858 s&lt;br /&gt;
    Time symbfct: 0.189367 s&lt;br /&gt;
    Time parlist: 0.059304 s&lt;br /&gt;
    Time malloc : 0.122711 s&lt;br /&gt;
    Time total  : 2.871472 s total - sum: 0.252344 s&lt;br /&gt;
Summary PARDISO 5.0.0: ( factorize to factorize )&lt;br /&gt;
    Time A to LU: 0.000000 s&lt;br /&gt;
    Time numfct : 0.196008 s&lt;br /&gt;
    Time malloc : -0.000004 s&lt;br /&gt;
    Time total  : 0.196070 s total - sum: 0.000066 s&lt;br /&gt;
Summary PARDISO 5.0.0: ( solve to solve )&lt;br /&gt;
    Time solve  : 0.033254 s&lt;br /&gt;
    Time total  : 0.200364 s total - sum: 0.167110 s&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>Mdepolli</name></author>	</entry>

	<entry>
		<id>http://e6.ijs.si/medusa/wiki/index.php?title=Solving_sparse_systems&amp;diff=1221</id>
		<title>Solving sparse systems</title>
		<link rel="alternate" type="text/html" href="http://e6.ijs.si/medusa/wiki/index.php?title=Solving_sparse_systems&amp;diff=1221"/>
				<updated>2017-07-19T10:57:19Z</updated>
		
		<summary type="html">&lt;p&gt;Mdepolli: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;There are many methods available for solving sparse systems. We compare some of them here.&lt;br /&gt;
&amp;lt;figure id=&amp;quot;fig:matrix1&amp;quot;&amp;gt;&lt;br /&gt;
[[File:matrix.png|300px|thumb|upright=2|alt=Matrix of the discretized PDE.|&amp;lt;caption&amp;gt;Matrix of the discretized PDE. &amp;lt;/caption&amp;gt;]]&lt;br /&gt;
&amp;lt;/figure&amp;gt;&lt;br /&gt;
Mathematica has the following methods available (https://reference.wolfram.com/language/ref/LinearSolve.html#DetailsAndOptions)&lt;br /&gt;
* direct: banded, cholesky, multifrontal (direct sparse LU)&lt;br /&gt;
* iterative: Krylov&lt;br /&gt;
&lt;br /&gt;
Matlab has the following methods:&lt;br /&gt;
* direct: https://www.mathworks.com/help/matlab/ref/mldivide.html#bt42omx_head&lt;br /&gt;
* iterative: https://www.mathworks.com/help/matlab/math/systems-of-linear-equations.html#brzoiix, including bicgstab, gmres &lt;br /&gt;
&lt;br /&gt;
Eigen has the following methods: (https://eigen.tuxfamily.org/dox-devel/group__TopicSparseSystems.html)&lt;br /&gt;
* direct: sparse LU&lt;br /&gt;
* iterative: bicgstab, cg&lt;br /&gt;
&lt;br /&gt;
Solving a simple sparse system $A x = b$ for the steady state of the heat equation in 1D with $n$ nodes results in the matrix shown in &amp;lt;xr id=&amp;quot;fig:matrix1&amp;quot;/&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The following timings of solvers are given in seconds:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! $n = 10^6$&lt;br /&gt;
! Matlab&lt;br /&gt;
! Mathematica&lt;br /&gt;
! Eigen&lt;br /&gt;
|-&lt;br /&gt;
! Banded&lt;br /&gt;
| 0.16&lt;br /&gt;
| 0.28&lt;br /&gt;
| 0.04&lt;br /&gt;
|-&lt;br /&gt;
! SparseLU&lt;br /&gt;
| /&lt;br /&gt;
| 1.73&lt;br /&gt;
| 0.82&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
An incomplete LU preconditioner was used for BiCGSTAB.&lt;br /&gt;
Without the preconditioner, BiCGSTAB does not converge.&lt;br /&gt;
&lt;br /&gt;
==Parallel execution ==&lt;br /&gt;
BiCGSTAB can be run in parallel, as explained in the general notes on multi-threading in Eigen (https://eigen.tuxfamily.org/dox/TopicMultiThreading.html), and specifically:&lt;br /&gt;
&lt;br /&gt;
'''&amp;quot;When using sparse matrices, best performance is achieved for a row-major sparse matrix format.&amp;lt;br&amp;gt;Moreover, in this case multi-threading can be exploited if the user code is compiled with OpenMP enabled&amp;quot;.'''&lt;br /&gt;
&lt;br /&gt;
Eigen uses the number of threads specified by OpenMP, unless &amp;lt;code&amp;gt;Eigen::setNbThreads(n);&amp;lt;/code&amp;gt; was called.&lt;br /&gt;
Minimal working example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;figure id=&amp;quot;fig:eigen_par&amp;quot;&amp;gt;&lt;br /&gt;
[[File:eigen_parallel_2.png|1200px|thumb|upright=2|alt=Matrix of the discretized PDE.|&amp;lt;caption&amp;gt;Memory and CPU usage. C stands for construction of the system, L stands for the calculation of ILUT preconditioner and S for BICGStab iteration.&amp;lt;/caption&amp;gt;]]&lt;br /&gt;
&amp;lt;/figure&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;c++&amp;quot; line&amp;gt;&lt;br /&gt;
#include &amp;lt;iostream&amp;gt;&lt;br /&gt;
#include &amp;lt;sstream&amp;gt;&lt;br /&gt;
#include &amp;lt;cassert&amp;gt;&lt;br /&gt;
#include &amp;lt;vector&amp;gt;&lt;br /&gt;
#include &amp;quot;Eigen/Sparse&amp;quot;&lt;br /&gt;
#include &amp;quot;Eigen/IterativeLinearSolvers&amp;quot;&lt;br /&gt;
&lt;br /&gt;
using namespace std;&lt;br /&gt;
using namespace Eigen;&lt;br /&gt;
&lt;br /&gt;
int main(int argc, char* argv[]) {&lt;br /&gt;
&lt;br /&gt;
    assert(argc == 2 &amp;amp;&amp;amp; &amp;quot;Second argument is size of the system.&amp;quot;);&lt;br /&gt;
    stringstream ss(argv[1]);&lt;br /&gt;
    int n;&lt;br /&gt;
    ss &amp;gt;&amp;gt; n;&lt;br /&gt;
    cout &amp;lt;&amp;lt; &amp;quot;n = &amp;quot; &amp;lt;&amp;lt; n &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
    // Fill the matrix&lt;br /&gt;
    VectorXd b = VectorXd::Ones(n) / n / n;&lt;br /&gt;
    b(0) = b(n-1) = 1;&lt;br /&gt;
    SparseMatrix&amp;lt;double, RowMajor&amp;gt; A(n, n);&lt;br /&gt;
    A.reserve(vector&amp;lt;int&amp;gt;(n, 3));  // 3 per row&lt;br /&gt;
    for (int i = 0; i &amp;lt; n-1; ++i) {&lt;br /&gt;
        A.insert(i, i) = -2;&lt;br /&gt;
        A.insert(i, i+1) = 1;&lt;br /&gt;
        A.insert(i+1, i) = 1;&lt;br /&gt;
    }&lt;br /&gt;
    A.coeffRef(0, 0) = 1;&lt;br /&gt;
    A.coeffRef(0, 1) = 0;&lt;br /&gt;
    A.coeffRef(n-1, n-2) = 0;&lt;br /&gt;
    A.coeffRef(n-1, n-1) = 1;&lt;br /&gt;
&lt;br /&gt;
    // Solve the system&lt;br /&gt;
    BiCGSTAB&amp;lt;SparseMatrix&amp;lt;double, RowMajor&amp;gt;, IncompleteLUT&amp;lt;double&amp;gt;&amp;gt; solver;&lt;br /&gt;
    solver.setTolerance(1e-10);&lt;br /&gt;
    solver.setMaxIterations(1000);&lt;br /&gt;
    solver.compute(A);&lt;br /&gt;
    VectorXd x = solver.solve(b);&lt;br /&gt;
    cout &amp;lt;&amp;lt; &amp;quot;#iterations:     &amp;quot; &amp;lt;&amp;lt; solver.iterations() &amp;lt;&amp;lt; endl;&lt;br /&gt;
    cout &amp;lt;&amp;lt; &amp;quot;estimated error: &amp;quot; &amp;lt;&amp;lt; solver.error()      &amp;lt;&amp;lt; endl;&lt;br /&gt;
    cout &amp;lt;&amp;lt; &amp;quot;sol: &amp;quot; &amp;lt;&amp;lt; x.head(6).transpose() &amp;lt;&amp;lt; endl;&lt;br /&gt;
&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The code above was compiled with &amp;lt;code&amp;gt;g++ -o parallel_solve -O3 -fopenmp solver_test_parallel.cpp&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;xr id=&amp;quot;fig:eigen_par&amp;quot;/&amp;gt; was produced when the program above was run as &amp;lt;code&amp;gt;./parallel_solve 10000000&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==ILUT factorization==&lt;br /&gt;
This is a method for computing a general algebraic preconditioner which offers some control over the memory and time complexity of the solution.&lt;br /&gt;
It is described in the paper&lt;br /&gt;
&lt;br /&gt;
  Saad, Yousef. &amp;quot;ILUT: A dual threshold incomplete LU factorization.&amp;quot; Numerical linear algebra with applications 1.4 (1994): 387-402.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
It has two parameters, the tolerance $\tau$ and the fill factor $f$. Elements of the $L$ and $U$ factors that fall below the relative tolerance $\tau$ (relative, say, to the 2-norm of the current row) are dropped. &lt;br /&gt;
Then, only $f$ elements per row are allowed in $L$ and $U$, and the $f$ largest (if that many exist) are kept. The diagonal elements are always kept. This means that the number of non-zero&lt;br /&gt;
elements in a row of the preconditioner cannot exceed $2f+1$. &lt;br /&gt;
&lt;br /&gt;
A greater fill factor $f$ means a slower but better $ILUT$ factorization; typical values are $5, 10, 20, 50$.&lt;br /&gt;
A lower $\tau$ means that more small elements are kept, resulting in a more precise but longer computation; typical values range from 1e-2&lt;br /&gt;
to 1e-6. With a large $f$, the factorization approaches the complete $LU$ factorization as $\tau \to 0$.&lt;br /&gt;
&lt;br /&gt;
Whether the method converges or diverges changes very suddenly with the parameters, so suitable values should be determined for each application separately. An example of &lt;br /&gt;
such a convergence graph for the diffusion equation is presented in the figures below:&lt;br /&gt;
&lt;br /&gt;
[[File:bicgstab_err.png|400px]][[File:bicgstab_conv.png|400px]][[File:bicgstab_iter.png|400px]]&lt;br /&gt;
&lt;br /&gt;
We can see that a low BiCGSTAB error estimate coincides with the range of small actual error relative to the analytical solution, as well as with the range of lower iteration counts.&lt;br /&gt;
&lt;br /&gt;
==Pardiso library==&lt;br /&gt;
[[Pardiso]]&lt;br /&gt;
&lt;br /&gt;
==Hertzian contact==&lt;br /&gt;
The matrix of the [[Hertzian contact]] problem is shown below:&lt;br /&gt;
 [[File:matrix_hertzian.png|500px]]&lt;br /&gt;
&lt;br /&gt;
The condition number is rather large and grows superlinearly with the system size in the example below.&lt;br /&gt;
 [[File:cond.png|600px]]&lt;/div&gt;</summary>
		<author><name>Mdepolli</name></author>	</entry>

	<entry>
		<id>http://e6.ijs.si/medusa/wiki/index.php?title=Weighted_Least_Squares_(WLS)&amp;diff=1170</id>
		<title>Weighted Least Squares (WLS)</title>
		<link rel="alternate" type="text/html" href="http://e6.ijs.si/medusa/wiki/index.php?title=Weighted_Least_Squares_(WLS)&amp;diff=1170"/>
				<updated>2017-05-19T13:44:53Z</updated>
		
		<summary type="html">&lt;p&gt;Mdepolli: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;One of the most important building blocks of the meshless methods is the Moving Least Squares approximation, which is implemented in the [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classEngineMLS.html EngineMLS class]. Check [https://gitlab.com/e62Lab/e62numcodes/blob/master/test/mls_test.cpp EngineMLS unit tests] for examples.&lt;br /&gt;
&lt;br /&gt;
= Notation Cheat sheet =&lt;br /&gt;
\begin{align*}&lt;br /&gt;
  m \in \N                  &amp;amp; \dots \text{number of basis functions} \\&lt;br /&gt;
  n \geq m \in \N           &amp;amp; \dots \text{number of points in support domain} \\&lt;br /&gt;
  k \in \mathbb{N}          &amp;amp; \dots \text{dimensionality of vector space} \\&lt;br /&gt;
  \vec s_j \in \R^k         &amp;amp; \dots \text{point in support domain } \quad j=1,\dots,n \\&lt;br /&gt;
  u_j \in \R                &amp;amp; \dots \text{value of function to approximate in }\vec{s}_j \quad j=1,\dots,n \\&lt;br /&gt;
  \vec p \in \R^k           &amp;amp; \dots \text{center point of approximation} \\&lt;br /&gt;
  b_i\colon \R^k \to \R     &amp;amp; \dots \text{basis functions } \quad i=1,\dots,m \\&lt;br /&gt;
  B_{j, i} \in \R           &amp;amp; \dots \text{value of basis functions in support points } b_i(s_j-p) \quad j=1,\dots,n, \quad i=1,\dots,m\\&lt;br /&gt;
  \omega \colon \R^k \to \R &amp;amp; \dots \text{weight function} \\&lt;br /&gt;
  w_j \in \R                &amp;amp; \dots \text{weights } \omega(\vec{s}_j-\vec{p})  \quad j=1,\dots,n \\&lt;br /&gt;
  \alpha_i \in \R           &amp;amp; \dots \text{expansion coefficients around point } \vec{p} \quad i=1,\dots,m \\&lt;br /&gt;
  \hat u\colon \R^k \to \R  &amp;amp; \dots \text{approximation function (best fit)} \\&lt;br /&gt;
  \chi_j \in \R          &amp;amp; \dots \text{shape coefficient for point }\vec{p} \quad j=1,\dots,n \\&lt;br /&gt;
\end{align*}&lt;br /&gt;
&lt;br /&gt;
We will also use \(\b{s}, \b{u}, \b{b}, \b{\alpha}, \b{\chi} \) to denote the columns of the corresponding values,&lt;br /&gt;
$W$ the $n\times n$ diagonal matrix with $w_j$ on the diagonal, and $B$ the $n\times m$ matrix with entries $B_{j, i}$.&lt;br /&gt;
&lt;br /&gt;
= Definition of local approximation =&lt;br /&gt;
&amp;lt;figure id=&amp;quot;fig:1DWLS&amp;quot;&amp;gt;&lt;br /&gt;
[[File:image_1avhdsfej1b9cao01029m1e13o69.png|600px|thumb|upright=2|alt=1D MLS example|&amp;lt;caption&amp;gt;Example of 1D WLS approximation &amp;lt;/caption&amp;gt;]]&lt;br /&gt;
&amp;lt;/figure&amp;gt;&lt;br /&gt;
Our wish is to approximate an unknown function $u\colon \R^k \to \R$ while knowing $n$ values $u(\vec{s}_j) := u_j$.&lt;br /&gt;
The vector of known values will be denoted by $\b{u}$ and the vector of coordinates where those values were achieved by $\b{s}$.&lt;br /&gt;
Note that $\b{s}$ is not a vector in the usual sense, since its components $\vec{s}_j$ are elements of $\R^k$, but we will call it a vector anyway.&lt;br /&gt;
The values of $\b{s}$ are called ''nodes'' or ''support nodes'' or ''support''. The known values $\b{u}$ are also called ''support values''.&lt;br /&gt;
&lt;br /&gt;
In general, an approximation function around point $\vec{p}\in\R^k$ can be&lt;br /&gt;
written as \[\hat{u} (\vec{x}) = \sum_{i=1}^m \alpha_i b_i(\vec{x}) = \b{b}(\vec{x})^\T \b{\alpha} \]&lt;br /&gt;
where $\b{b} = (b_i)_{i=1}^m$ is a set of ''basis functions'', $b_i\colon \R^k \to\R$, and $\b{\alpha} = (\alpha_i)_{i=1}^m$ are the unknown coefficients.&lt;br /&gt;
&lt;br /&gt;
In MLS the goal is to minimize the error of the approximation, $\b{e} = \hat u(\b{s}) - \b{u}$,&lt;br /&gt;
between the approximation function and the target function in the known points $\b{s}$. The error can also be written as $B\b{\alpha} - \b{u}$,&lt;br /&gt;
where $B$ is a rectangular matrix of dimensions $n \times m$ whose rows contain the basis functions evaluated in the points $\vec{s}_j$.&lt;br /&gt;
\[ B =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
b_1(\vec{s}_1) &amp;amp; \ldots &amp;amp; b_m(\vec{s}_1) \\&lt;br /&gt;
\vdots &amp;amp; \ddots &amp;amp; \vdots \\&lt;br /&gt;
b_1(\vec{s}_n) &amp;amp; \ldots &amp;amp; b_m(\vec{s}_n)&lt;br /&gt;
\end{bmatrix} =&lt;br /&gt;
 [b_i(\vec{s}_j)]_{j=1,i=1}^{n,m} = [\b{b}(\vec{s}_j)^\T]_{j=1}^n. \]&lt;br /&gt;
&lt;br /&gt;
We can choose to minimize any norm of the error vector $\b{e}$,&lt;br /&gt;
and usually choose to minimize the $2$-norm or square norm \[ \|\b{e}\| = \|\b{e}\|_2 = \sqrt{\sum_{j=1}^n e_j^2}. \]&lt;br /&gt;
Commonly, we also choose to minimize a weighted norm&lt;br /&gt;
&amp;lt;ref&amp;gt;Note that our definition is a bit unusual, usually weights are not&lt;br /&gt;
 squared with the values. However, we do this to avoid computing square&lt;br /&gt;
 roots when doing MLS. If you are used to the usual definition,&lt;br /&gt;
consider the weight to be $\omega^2$.&amp;lt;/ref&amp;gt;&lt;br /&gt;
instead \[ \|\b{e}\|_{2,w} = \|\b{e}\|_w = \sqrt{\sum_{j=1}^n (w_j e_j)^2}. \]&lt;br /&gt;
The ''weights'' $w_j$ are assumed to be non-negative and are assembled in a vector $\b{w}$ or a matrix $W = \operatorname{diag}(\b{w})$, and are usually obtained from a weight function.&lt;br /&gt;
A ''weight function'' is a function $\omega\colon \R^k \to[0,\infty)$. We calculate $w_j$ as $w_j := \omega(\vec{p}-\vec{s}_j)$, so&lt;br /&gt;
good choices for $\omega$ are functions which have higher values close to $0$ (making closer nodes more important), like the normal distribution.&lt;br /&gt;
If we choose $\omega \equiv 1$, we get the unweighted version.&lt;br /&gt;
&lt;br /&gt;
The choice of minimizing the square norm gave this method its name: Least Squares approximation. If we use the weighted version, we get Weighted Least Squares or WLS.&lt;br /&gt;
In the most general case we wish to minimize&lt;br /&gt;
\[ \|\b{e}\|_{2,w}^2 = \b{e}^\T W^2 \b{e} = (B\b{\alpha} - \b{u})^\T W^2(B\b{\alpha} - \b{u}) =  \sum_{j=1}^n w_j^2 (\hat{u}(\vec{s}_j) - u_j)^2  \]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The problem of finding the coefficients $\b{\alpha}$ that minimize the error $\b{e}$ can be solved with at least three approaches:&lt;br /&gt;
* Normal equations (fastest, least accurate) - using Cholesky decomposition of $B^\T B$ (requires full rank and $m \leq n$)&lt;br /&gt;
* QR decomposition of $B$ (requires full rank and $m \leq n$, more precise)&lt;br /&gt;
* SVD decomposition of $B$ (more expensive, even more reliable, no rank demand)&lt;br /&gt;
&lt;br /&gt;
In MM we use SVD with regularization described below.&lt;br /&gt;
&lt;br /&gt;
= Computing approximation coefficients =&lt;br /&gt;
&lt;br /&gt;
== Normal equations ==&lt;br /&gt;
We seek the minimum of&lt;br /&gt;
\[ \|\b{e}\|_2^2 = (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u}) \]&lt;br /&gt;
Setting the gradient with respect to the coefficients $\alpha_i$ to zero,&lt;br /&gt;
\[\frac{\partial}{\partial \alpha_i} (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u})  = 0,\]&lt;br /&gt;
we obtain the normal equations&lt;br /&gt;
\[ B^\T B\b{\alpha} = B^\T \b{u}. \]&lt;br /&gt;
The coefficient matrix $B^\T B$ is symmetric and, for full-rank $B$, positive definite. However, solving the above problem directly is&lt;br /&gt;
poorly behaved with respect to round-off errors since the condition number $\kappa(B^\T B)$ is the square&lt;br /&gt;
of $\kappa(B)$.&lt;br /&gt;
&lt;br /&gt;
In case of WLS the equations become&lt;br /&gt;
\[ (WB)^\T WB \b{\alpha} = (WB)^\T W\b{u}. \]&lt;br /&gt;
&lt;br /&gt;
Complexity of the Cholesky decomposition is $\frac{m^3}{3}$ and the complexity of the matrix multiplication is $nm^2$. To perform the Cholesky decomposition, $WB$ must have full rank.&lt;br /&gt;
&lt;br /&gt;
'''Pros:'''&lt;br /&gt;
* simple to implement&lt;br /&gt;
* low computational complexity&lt;br /&gt;
&lt;br /&gt;
'''Cons:'''&lt;br /&gt;
* numerically unstable&lt;br /&gt;
* full rank requirement&lt;br /&gt;
&lt;br /&gt;
== [https://en.wikipedia.org/wiki/QR_decomposition $QR$ Decomposition] ==&lt;br /&gt;
\[{\bf{B}} = {\bf{QR}} = \left[ {{{\bf{Q}}_1},{{\bf{Q}}_2}} \right]\left[ {\begin{array}{*{20}{c}}&lt;br /&gt;
{{{\bf{R}}_1}}\\&lt;br /&gt;
0&lt;br /&gt;
\end{array}} \right]\]&lt;br /&gt;
\[{\bf{B}} = {{\bf{Q}}_1}{{\bf{R}}_1}\]&lt;br /&gt;
$\bf{Q}$ is a unitary matrix ($\bf{Q}^{-1}=\bf{Q}^\T$). A useful property of unitary matrices is that multiplying a vector by them does not alter its (Euclidean) norm, i.e.,&lt;br /&gt;
\[\left\| {{\bf{Qx}}} \right\| = \left\| {\bf{x}} \right\|,\]&lt;br /&gt;
and ${\bf{R}}_1$ is an upper triangular matrix, with&lt;br /&gt;
\[{\bf{R}} = \begin{pmatrix} {{\bf{R}}_{1}} \\ 0 \end{pmatrix},\]&lt;br /&gt;
therefore we can write&lt;br /&gt;
\[\begin{array}{l}&lt;br /&gt;
\left\| {{\bf{B\alpha }} - {\bf{u}}} \right\|^2 = \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{B\alpha }} - {\bf{u}}} \right)} \right\|^2 = \left\| {{{\bf{Q}}^{\rm{T}}}{\bf{B\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\|^2\\&lt;br /&gt;
 = \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{QR}}} \right){\bf{\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\|^2 = \left\| {\left( {{{\bf{R}}_1},0} \right){\bf{\alpha }} - {{\left( {{{\bf{Q}}_1},{{\bf{Q}}_{\bf{2}}}} \right)}^{\rm{T}}}{\bf{u}}} \right\|^2\\&lt;br /&gt;
 = \left\| {{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} - {\bf{Q}}_1^{\rm{T}}{\bf{u}}} \right\|^2 + \left\| {{\bf{Q}}_2^{\rm{T}}{\bf{u}}} \right\|^2&lt;br /&gt;
\end{array}\]&lt;br /&gt;
Of the two terms on the right we have no control over the second, and we can render the first one&lt;br /&gt;
zero by solving&lt;br /&gt;
\[{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} = {\bf{Q}}_{{\bf{1}}}^{\rm{T}}{\bf{u}},\]&lt;br /&gt;
which results in the minimum. We could also compute it with the pseudoinverse&lt;br /&gt;
	\[\mathbf{\alpha }={{\mathbf{B}}^{+}}\mathbf{u},\]&lt;br /&gt;
where the pseudoinverse is simply \[{{\mathbf{B}}^{+}}=\mathbf{R}_{1}^{-1}\mathbf{Q}_{1}^{\rm{T}}\] (once again, $R_1$ is an upper triangular matrix and $Q$ is a unitary matrix).&lt;br /&gt;
And for the weighted case&lt;br /&gt;
	\[\mathbf{\alpha }={{\left( \mathbf{W}\mathbf{B} \right)}^{+}}\left( \mathbf{W}\mathbf{u} \right).\]&lt;br /&gt;
&lt;br /&gt;
Complexity of $QR$ decomposition \[\frac{2}{3}m{{n}^{2}}+{{n}^{2}}+\frac{1}{3}n-2=O({{n}^{3}})\]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Pros:&amp;lt;/strong&amp;gt; better stability in comparison with normal equations. &amp;lt;strong&amp;gt;Cons:&amp;lt;/strong&amp;gt; higher complexity.&lt;br /&gt;
&lt;br /&gt;
== [https://en.wikipedia.org/wiki/Singular_value_decomposition SVD decomposition] ==&lt;br /&gt;
In linear algebra, the [https://en.wikipedia.org/wiki/Singular_value_decomposition singular value decomposition (SVD)]&lt;br /&gt;
is a factorization of a real or complex matrix. It has many useful&lt;br /&gt;
applications in signal processing and statistics.&lt;br /&gt;
&lt;br /&gt;
Formally, the singular value decomposition of an $m \times n$ real or complex&lt;br /&gt;
matrix $\bf{B}$ is a factorization of the form $\bf{B}= \bf{U\Sigma V^\T}$, where&lt;br /&gt;
$\bf{U}$ is an $m \times m$ real or complex unitary matrix, $\bf{\Sigma}$ is an $m \times n$&lt;br /&gt;
rectangular diagonal matrix with non-negative real numbers on the diagonal, and&lt;br /&gt;
$\bf{V}^\T$  is an $n \times n$ real or complex unitary matrix. The diagonal entries&lt;br /&gt;
$\Sigma_{ii}$ are known as the singular values of $\bf{B}$. The $m$ columns of&lt;br /&gt;
$\bf{U}$ and the $n$ columns of $\bf{V}$ are called the left-singular vectors and&lt;br /&gt;
right-singular vectors of $\bf{B}$, respectively.&lt;br /&gt;
&lt;br /&gt;
The singular value decomposition and the eigen decomposition are closely&lt;br /&gt;
related. Namely:&lt;br /&gt;
&lt;br /&gt;
* The left-singular vectors of $\bf{B}$ are eigenvectors of $\bf{BB}^\T$.&lt;br /&gt;
* The right-singular vectors of $\bf{B}$ are eigenvectors of $\bf{B}^\T\bf{B}$.&lt;br /&gt;
* The non-zero singular values of $\bf{B}$ (found on the diagonal entries of $\bf{\Sigma}$) are the square roots of the non-zero eigenvalues of both $\bf{B}^\T\bf{B}$ and $\bf{B}\bf{B}^\T$.&lt;br /&gt;
&lt;br /&gt;
With SVD we can write $\bf{B}$ as \[\bf{B}=\bf{U\Sigma{{V}^{\T}}},\] where $\bf{U}$ and $\bf{V}$ are again unitary matrices and $\bf{\Sigma}$&lt;br /&gt;
is the diagonal matrix of singular values.&lt;br /&gt;
&lt;br /&gt;
Again we can solve either the system or compute the pseudoinverse as&lt;br /&gt;
&lt;br /&gt;
\[ \bf{B}^{+} = \left( \bf{U\Sigma V}^\T\right)^{+} = \bf{V}\bf{\Sigma^{+}U}^\T, \]&lt;br /&gt;
where $\bf{\Sigma}^{+}$ is trivial to compute: just replace every non-zero diagonal entry by&lt;br /&gt;
its reciprocal and transpose the resulting matrix. The stability gain lies&lt;br /&gt;
exactly here: one can set a threshold below which a singular value is&lt;br /&gt;
considered to be $0$, basically truncating all singular values below some value and&lt;br /&gt;
thus stabilizing the inverse.&lt;br /&gt;
&lt;br /&gt;
SVD decomposition complexity \[ 2mn^2+2n^3 = O(n^3) \]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;strong&amp;gt;Pros:&amp;lt;/strong&amp;gt; stable. &amp;lt;strong&amp;gt;Cons:&amp;lt;/strong&amp;gt; high complexity.&lt;br /&gt;
&lt;br /&gt;
The method used in MM is SVD with regularization.&lt;br /&gt;
&lt;br /&gt;
= Weighted Least Squares =&lt;br /&gt;
Weighted least squares approximation is the simplest version of the procedure described above. Given support $\b{s}$, values $\b{u}$&lt;br /&gt;
and an anchor point $\vec{p}$, we calculate the coefficients $\b{\alpha}$ using one of the above methods.&lt;br /&gt;
Then, to approximate a function in the neighbourhood of $\vec p$ we use the formula&lt;br /&gt;
\[&lt;br /&gt;
\hat{u}(\vec x) = \b{b}(\vec x)^\T \b{\alpha} = \sum_{i=1}^m \alpha_i b_i(\vec x).&lt;br /&gt;
\]&lt;br /&gt;
&lt;br /&gt;
To approximate the derivative $\frac{\partial u}{\partial x_i}$, or any linear partial differential operator $\mathcal L$ applied to $u$, we&lt;br /&gt;
simply take the same linear combination of the transformed basis functions $\mathcal L b_i$. Here we have treated the coefficients $\alpha_i$ as&lt;br /&gt;
constants and applied the linearity of $\mathcal L$.&lt;br /&gt;
\[&lt;br /&gt;
 \widehat{\mathcal L u}(\vec x) = \sum_{i=1}^m \alpha_i (\mathcal L b_i)(\vec x).&lt;br /&gt;
\]&lt;br /&gt;
&lt;br /&gt;
= WLS at fixed point with fixed support and unknown function values =&lt;br /&gt;
Suppose now we are given a support $\b{s}$ and a point $\vec{p}$, and want to construct the function approximation from values $\b{u}$.&lt;br /&gt;
We proceed as usual, solving the overdetermined system $WB \b{\alpha} = W\b{u}$ for coefficients $\b{\alpha}$ using the pseudoinverse&lt;br /&gt;
\[ \b{\alpha} = (WB)^+W\b{u}, \]&lt;br /&gt;
where $A^+$ denotes the Moore-Penrose pseudoinverse that can be calculated using SVD.&lt;br /&gt;
&lt;br /&gt;
Writing down the approximation function $\hat{u}$ we get&lt;br /&gt;
\[&lt;br /&gt;
\hat u (\vec{p}) = \b{b}(\vec{p})^\T \b{\alpha} = \b{b}(\vec{p})^\T (WB)^+W\b{u} = \b{\chi}(\vec{p}) \b{u}.&lt;br /&gt;
\]&lt;br /&gt;
&lt;br /&gt;
We have defined $\b{\chi}$ to be&lt;br /&gt;
\[ \b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T (WB)^+W. \]&lt;br /&gt;
Vector $\b{\chi}$  is a row vector, also called a ''shape function''. The name comes from the fact that it captures all the information&lt;br /&gt;
about the geometry of the support and the choice of approximation in a single row vector, allowing us to approximate&lt;br /&gt;
a function value from given support values $\b{u}$ with a single dot product. For any values $\b{u}$, the value $\b{\chi}(\vec{p}) \b{u}$&lt;br /&gt;
gives us the approximation $\hat{u}(\vec{p})$ of $u$ in the point $\vec{p}$.&lt;br /&gt;
Mathematically speaking, $\b{\chi}(\vec{p})$ is a functional, $\b{\chi}(\vec{p})\colon \R^n \to \R$, mapping $n$-tuples of known function values to&lt;br /&gt;
their approximations in point $\vec{p}$.&lt;br /&gt;
&lt;br /&gt;
The same approach works for any linear operator $\mathcal L$ applied to $u$, just replace every $b_i$ in definition of $\b{\chi}$ with $\mathcal Lb_i$.&lt;br /&gt;
For example, take a $1$-dimensional case for approximation of derivatives with weight equal to $1$ and $n=m=3$, with equally spaced support values at distances $h$.&lt;br /&gt;
We wish to approximate $u''$ in the middle support point, just by making a weighted sum of the values, something like the finite difference&lt;br /&gt;
\[ u'' \approx \frac{u_1 - 2u_2 + u_3}{h^2}. \]&lt;br /&gt;
This is exactly the formula we would have arrived at by computing $\b{\chi}$, except that our approach is far more general. One should think of&lt;br /&gt;
$\b{\chi}$ as one thinks of a finite difference scheme: it is a rule telling us how to compute the derivative.&lt;br /&gt;
\[ u''(s_2) \approx \underbrace{\begin{bmatrix} \frac{1}{h^2} &amp;amp; \frac{-2}{h^2} &amp;amp; \frac{1}{h^2} \end{bmatrix}}_{\b{\chi}} \begin{bmatrix}u_1 \\ u_2 \\ u_3 \end{bmatrix}  \]&lt;br /&gt;
&lt;br /&gt;
The fact that $\b{\chi}$ is independent of the function values $\b{u}$ and depends only on the domain geometry means that&lt;br /&gt;
'''we can just compute the shape functions $\b{\chi}$ for points of interest and then approximate any linear operator&lt;br /&gt;
of any function, given its values, very fast, using only a single dot product.'''&lt;br /&gt;
&lt;br /&gt;
= MLS =&lt;br /&gt;
&lt;br /&gt;
&amp;lt;figure id=&amp;quot;fig:comparisonMLSandWLS&amp;quot;&amp;gt;&lt;br /&gt;
[[File:mlswls.svg|thumb|upright=2|&amp;lt;caption&amp;gt;Comparison of WLS and MLS approximation&amp;lt;/caption&amp;gt;]]&lt;br /&gt;
&amp;lt;/figure&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When using WLS the approximation gets worse as we move away from the central point $\vec{p}$.&lt;br /&gt;
This is partially due to not being in the center of the support any more and partially due to weight&lt;br /&gt;
being distributed in such a way to assign more importance to nodes closer to $\vec{p}$.&lt;br /&gt;
&lt;br /&gt;
We can battle this problem in two ways: when we wish to approximate in a new point that is sufficiently far&lt;br /&gt;
away from $\vec{p}$ we can compute new support, recompute the new coefficients $\b{\alpha}$ and approximate again.&lt;br /&gt;
This is very costly and we would like to avoid it. A partial fix is to keep the support the same, but only&lt;br /&gt;
recompute the weight vector $\b{w}$, which will now assign higher weights to nodes close to the new point.&lt;br /&gt;
We still need to recompute the coefficients $\b{\alpha}$, however we avoid the cost of setting up new support&lt;br /&gt;
and function values and recomputing $B$. This approach is called Moving Least Squares due to recomputing&lt;br /&gt;
the weighted least squares problem whenever we move the point of approximation.&lt;br /&gt;
&lt;br /&gt;
Note that if our weight is constant, or if $n = m$, where the approximation reduces to interpolation, the weights do not play&lt;br /&gt;
any role and this method is redundant. In fact, its benefits arise when supports are rather large.&lt;br /&gt;
&lt;br /&gt;
See &amp;lt;xr id=&amp;quot;fig:comparisonMLSandWLS&amp;quot;/&amp;gt; for a comparison between the MLS and WLS approximations. The MLS approximation remains close to the&lt;br /&gt;
actual function while still inside the support domain, while the WLS approximation deteriorates once&lt;br /&gt;
we move out of the reach of the weight function.&lt;br /&gt;
&lt;br /&gt;
{{reflist}}&lt;/div&gt;</summary>
		<author><name>Mdepolli</name></author>	</entry>

	<entry>
		<id>http://e6.ijs.si/medusa/wiki/index.php?title=Weighted_Least_Squares_(WLS)&amp;diff=1169</id>
		<title>Weighted Least Squares (WLS)</title>
		<link rel="alternate" type="text/html" href="http://e6.ijs.si/medusa/wiki/index.php?title=Weighted_Least_Squares_(WLS)&amp;diff=1169"/>
				<updated>2017-05-19T12:49:47Z</updated>
		
		<summary type="html">&lt;p&gt;Mdepolli: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;One of the most important building blocks of the meshless methods is the Moving Least Squares approximation, which is implemented in the [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classEngineMLS.html EngineMLS class]. Check [https://gitlab.com/e62Lab/e62numcodes/blob/master/test/mls_test.cpp EngineMLS unit tests] for examples.&lt;br /&gt;
&lt;br /&gt;
= Notation Cheat sheet =&lt;br /&gt;
\begin{align*}&lt;br /&gt;
  m \in \N                  &amp;amp; \dots \text{number of basis functions} \\&lt;br /&gt;
  n \geq m \in \N           &amp;amp; \dots \text{number of points in support domain} \\&lt;br /&gt;
  k \in \mathbb{N}          &amp;amp; \dots \text{dimensionality of vector space} \\&lt;br /&gt;
  \vec s_j \in \R^k         &amp;amp; \dots \text{point in support domain } \quad j=1,\dots,n \\&lt;br /&gt;
  u_j \in \R                &amp;amp; \dots \text{value of function to approximate in }\vec{s}_j \quad j=1,\dots,n \\&lt;br /&gt;
  \vec p \in \R^k           &amp;amp; \dots \text{center point of approximation} \\&lt;br /&gt;
  b_i\colon \R^k \to \R     &amp;amp; \dots \text{basis functions } \quad i=1,\dots,m \\&lt;br /&gt;
  B_{j, i} \in \R           &amp;amp; \dots \text{value of basis functions in support points } b_i(s_j-p) \quad j=1,\dots,n, \quad i=1,\dots,m\\&lt;br /&gt;
  \omega \colon \R^k \to \R &amp;amp; \dots \text{weight function} \\&lt;br /&gt;
  w_j \in \R                &amp;amp; \dots \text{weights } \omega(\vec{s}_j-\vec{p})  \quad j=1,\dots,n \\&lt;br /&gt;
  \alpha_i \in \R           &amp;amp; \dots \text{expansion coefficients around point } \vec{p} \quad i=1,\dots,m \\&lt;br /&gt;
  \hat u\colon \R^k \to \R  &amp;amp; \dots \text{approximation function (best fit)} \\&lt;br /&gt;
  \chi_j \in \R          &amp;amp; \dots \text{shape coefficient for point }\vec{p} \quad j=1,\dots,n \\&lt;br /&gt;
\end{align*}&lt;br /&gt;
&lt;br /&gt;
We will also use \(\b{s}, \b{u}, \b{b}, \b{\alpha}, \b{\chi} \) to annotate a column of corresponding values,&lt;br /&gt;
$W$ as a $n\times n$ diagonal matrix filled with $w_j$ on the diagonal and $B$ as a $n\times m$ matrix filled with $B_{j, i}$.&lt;br /&gt;
&lt;br /&gt;
= Definition of local approximation =&lt;br /&gt;
&amp;lt;figure id=&amp;quot;fig:1DWLS&amp;quot;&amp;gt;&lt;br /&gt;
[[File:image_1avhdsfej1b9cao01029m1e13o69.png|600px|thumb|upright=2|alt=1D MLS example|&amp;lt;caption&amp;gt;Example of 1D WLS approximation &amp;lt;/caption&amp;gt;]]&lt;br /&gt;
&amp;lt;/figure&amp;gt;&lt;br /&gt;
Our wish is to approximate an unknown function $u\colon \R^k \to \R$ while knowing $n$ values $u(\vec{s}_j) := u_j$.&lt;br /&gt;
The vector of known values will be denoted by $\b{u}$ and the vector of coordinates where those values were achieved by $\b{s}$.&lt;br /&gt;
Note that $\b{s}$ is not a vector in the usual sense since its components $\vec{s}_j$ are elements of $\R^k$, but we will call it vector anyway.&lt;br /&gt;
The values of $\b{s}$ are called ''nodes'' or ''support nodes'' or ''support''. The known values $\b{u}$ are also called ''support values''.&lt;br /&gt;
&lt;br /&gt;
In general, an approximation function around point $\vec{p}\in\R^k$ can be&lt;br /&gt;
written as \[\hat{u} (\vec{x}) = \sum_{i=1}^m \alpha_i b_i(\vec{x}) = \b{b}(\vec{x})^\T \b{\alpha} \]&lt;br /&gt;
where $\b{b} = (b_i)_{i=1}^m$ is a set of ''basis functions'', $b_i\colon \R^k \to\R$, and $\b{\alpha} = (\alpha_i)_{i=1}^m$ are the unknown coefficients.&lt;br /&gt;
&lt;br /&gt;
In MLS the goal is to minimize the error of the approximation, $\b{e} = \hat u(\b{s}) - \b{u}$,&lt;br /&gt;
between the approximation function and the target function in the known points $\b{s}$. The error can also be written as $B\b{\alpha} - \b{u}$,&lt;br /&gt;
where $B$ is rectangular matrix of dimensions $n \times m$ with rows containing basis function evaluated in points $\vec{s}_j$.&lt;br /&gt;
\[ B =&lt;br /&gt;
\begin{bmatrix}&lt;br /&gt;
b_1(\vec{s}_1) &amp;amp; \ldots &amp;amp; b_m(\vec{s}_1) \\&lt;br /&gt;
\vdots &amp;amp; \ddots &amp;amp; \vdots \\&lt;br /&gt;
b_1(\vec{s}_n) &amp;amp; \ldots &amp;amp; b_m(\vec{s}_n)&lt;br /&gt;
\end{bmatrix} =&lt;br /&gt;
 [b_i(\vec{s}_j)]_{j=1,i=1}^{n,m} = [\b{b}(\vec{s}_j)^\T]_{j=1}^n. \]&lt;br /&gt;
&lt;br /&gt;
We can choose to minimize any norm of the error vector $e$&lt;br /&gt;
and usually choose to minimize the $2$-norm or square norm \[ \|\b{e}\| = \|\b{e}\|_2 = \sqrt{\sum_{j=1}^n e_j^2}. \]&lt;br /&gt;
Commonly, we also choose to minimize a weighted norm&lt;br /&gt;
&amp;lt;ref&amp;gt;Note that our definition is a bit unusual, usually weights are not&lt;br /&gt;
 squared with the values. However, we do this to avoid computing square&lt;br /&gt;
 roots when doing MLS. If you are used to the usual definition,&lt;br /&gt;
consider the weight to be $\omega^2$.&amp;lt;/ref&amp;gt;&lt;br /&gt;
instead \[ \|\b{e}\|_{2,w} = \|\b{e}\|_w = \sqrt{\sum_{j=1}^n (w_j e_j)^2}. \]&lt;br /&gt;
The ''weights'' $w_j$ are assumed to be non-negative and are assembled in a vector $\b{w}$ or a matrix $W = \operatorname{diag}(\b{w})$, and are usually obtained from a weight function.&lt;br /&gt;
A ''weight function'' is a function $\omega\colon \R^k \to[0,\infty)$. We calculate $w_j$ as $w_j := \omega(\vec{p}-\vec{s}_j)$, so&lt;br /&gt;
good choices for $\omega$ are functions which have higher values close to $0$ (making closer nodes more important), like the normal distribution.&lt;br /&gt;
If we choose $\omega \equiv 1$, we get the unweighted version.&lt;br /&gt;
&lt;br /&gt;
The choice of minimizing the square norm gave this method its name: Least Squares approximation. If we use the weighted version, we get Weighted Least Squares or WLS.&lt;br /&gt;
In the most general case we wish to minimize&lt;br /&gt;
\[ \|\b{e}\|_{2,w}^2 = \b{e}^\T W^2 \b{e} = (B\b{\alpha} - \b{u})^\T W^2(B\b{\alpha} - \b{u}) =  \sum_j^n w_j^2 (\hat{u}(\vec{s}_j) - u_j)^2  \]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The problem of finding the coefficients $\b{\alpha}$ that minimize the error $\b{e}$ can be solved with at least three approaches:&lt;br /&gt;
* Normal equations (fastest, least accurate) - using Cholesky decomposition of $B^\T B$ (requires full rank and $m \leq n$)&lt;br /&gt;
* QR decomposition of $B$ (requires full rank and $m \leq n$, more precise)&lt;br /&gt;
* SVD decomposition of $B$ (more expensive, even more reliable, no rank demand)&lt;br /&gt;
&lt;br /&gt;
In MM we use SVD with regularization described below.&lt;br /&gt;
&lt;br /&gt;
= Computing approximation coefficients =&lt;br /&gt;
&lt;br /&gt;
== Normal equations ==&lt;br /&gt;
We seek the minimum of&lt;br /&gt;
\[ \|\b{e}\|_2^2 = (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u}) \]&lt;br /&gt;
By seeking the zero gradient in terms of coefficients $\alpha_i$&lt;br /&gt;
\[\frac{\partial}{\partial \alpha_i} (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u})  = 0\]&lt;br /&gt;
resulting in&lt;br /&gt;
\[ B^\T B\b{\alpha} = B^\T \b{u}. \]&lt;br /&gt;
The coefficient matrix $B^\T B$ is symmetric and positive definite. However, solving above problem directly is&lt;br /&gt;
poorly behaved with respect to round-off errors since the condition number $\kappa(B^\T B)$ is the square&lt;br /&gt;
of $\kappa(B)$.&lt;br /&gt;
&lt;br /&gt;
In case of WLS the equations become&lt;br /&gt;
\[ (WB)^\T WB \b{\alpha} = (WB)^\T W\b{u}. \]&lt;br /&gt;
&lt;br /&gt;
Complexity of the Cholesky decomposition is $\frac{m^3}{3}$ and the complexity of the matrix multiplication is $nm^2$. To perform the Cholesky decomposition, $WB$ must have full rank.&lt;br /&gt;
&lt;br /&gt;
'''Pros:'''&lt;br /&gt;
* simple to implement&lt;br /&gt;
* low computational complexity&lt;br /&gt;
&lt;br /&gt;
'''Cons:'''&lt;br /&gt;
* numerically unstable&lt;br /&gt;
* full rank requirement&lt;br /&gt;
&lt;br /&gt;
== [https://en.wikipedia.org/wiki/QR_decomposition $QR$ Decomposition] ==&lt;br /&gt;
\[{\bf{B}} = {\bf{QR}} = \left[ {{{\bf{Q}}_1},{{\bf{Q}}_2}} \right]\left[ {\begin{array}{*{20}{c}}&lt;br /&gt;
{{{\bf{R}}_1}}\\&lt;br /&gt;
0&lt;br /&gt;
\end{array}} \right]\]&lt;br /&gt;
\[{\bf{B}} = {{\bf{Q}}_1}{{\bf{R}}_1}\]&lt;br /&gt;
$\bf{Q}$ is a unitary matrix ($\bf{Q}^{-1}=\bf{Q}^\T$). A useful property of unitary matrices is that multiplying a vector by them does not alter its (Euclidean) norm, i.e.,&lt;br /&gt;
\[\left\| {{\bf{Qx}}} \right\| = \left\| {\bf{x}} \right\|,\]&lt;br /&gt;
and ${\bf{R}}_1$ is an upper triangular matrix, with&lt;br /&gt;
\[{\bf{R}} = \begin{pmatrix} {{\bf{R}}_{1}} \\ 0 \end{pmatrix},\]&lt;br /&gt;
therefore we can write&lt;br /&gt;
\[\begin{array}{l}&lt;br /&gt;
\left\| {{\bf{B\alpha }} - {\bf{u}}} \right\|^2 = \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{B\alpha }} - {\bf{u}}} \right)} \right\|^2 = \left\| {{{\bf{Q}}^{\rm{T}}}{\bf{B\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\|^2\\&lt;br /&gt;
 = \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{QR}}} \right){\bf{\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\|^2 = \left\| {\left( {{{\bf{R}}_1},0} \right){\bf{\alpha }} - {{\left( {{{\bf{Q}}_1},{{\bf{Q}}_{\bf{2}}}} \right)}^{\rm{T}}}{\bf{u}}} \right\|^2\\&lt;br /&gt;
 = \left\| {{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} - {\bf{Q}}_1^{\rm{T}}{\bf{u}}} \right\|^2 + \left\| {{\bf{Q}}_2^{\rm{T}}{\bf{u}}} \right\|^2&lt;br /&gt;
\end{array}\]&lt;br /&gt;
Of the two terms on the right we have no control over the second, but we can make the first one&lt;br /&gt;
zero by solving&lt;br /&gt;
\[{{\bf{R}}_1}{\bf{\alpha }} = {\bf{Q}}_1^{\rm{T}}{\bf{u}},\]&lt;br /&gt;
which yields the minimum. Equivalently, we can write the solution with the pseudoinverse&lt;br /&gt;
	\[\mathbf{\alpha }={{\mathbf{B}}^{+}}\mathbf{u},\]&lt;br /&gt;
where the pseudoinverse is simply \[{{\mathbf{B}}^{+}}=\mathbf{R}_{1}^{-1}\mathbf{Q}_{1}^{\text{T}}\] (once again, $\mathbf{R}_1$ is upper triangular and thus easy to invert, and $\mathbf{Q}_1$ has orthonormal columns).&lt;br /&gt;
And for the weighted case&lt;br /&gt;
	\[\mathbf{\alpha }={{\left( {{\mathbf{W}}^{0.5}}\mathbf{B} \right)}^{+}}\left( {{\mathbf{W}}^{0.5}}\mathbf{u} \right).\]&lt;br /&gt;
&lt;br /&gt;
The complexity of the $QR$ decomposition is \[\frac{2}{3}m{{n}^{2}}+{{n}^{2}}+\frac{1}{3}n-2=O({{n}^{3}})\]&lt;br /&gt;
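The $QR$ route above translates directly into code. The following is an illustrative NumPy sketch (the basis, sample data and sizes are made up for the example and are not taken from the Medusa library), solving ${\bf{R}}_1{\bf{\alpha}}={\bf{Q}}_1^{\rm{T}}{\bf{u}}$ for a small overdetermined system:&lt;br /&gt;

```python
import numpy as np

# Illustrative QR-based least squares solve (example data, monomial basis).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
B = np.vander(x, 3, increasing=True)   # m x n basis matrix, columns 1, x, x^2
u = 1.0 + 2.0 * x - 0.5 * x**2 + 1e-3 * rng.standard_normal(x.size)

Q1, R1 = np.linalg.qr(B)               # reduced ("economy") QR: B = Q1 R1
alpha = np.linalg.solve(R1, Q1.T @ u)  # solve R1 alpha = Q1^T u

ref = np.linalg.lstsq(B, u, rcond=None)[0]  # generic least squares solution
print(np.allclose(alpha, ref))              # True
```

The comparison with `np.linalg.lstsq` only confirms that both routes minimize the same residual; the $QR$ form makes the triangular solve explicit.&lt;br /&gt;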
&lt;br /&gt;
'''Pros:'''&lt;br /&gt;
* better stability than the normal equations&lt;br /&gt;
&lt;br /&gt;
'''Cons:'''&lt;br /&gt;
* higher complexity&lt;br /&gt;
&lt;br /&gt;
== [https://en.wikipedia.org/wiki/Singular_value_decomposition SVD decomposition] ==&lt;br /&gt;
In linear algebra, the [https://en.wikipedia.org/wiki/Singular_value_decomposition singular value decomposition (SVD)]&lt;br /&gt;
is a factorization of a real or complex matrix. It has many useful&lt;br /&gt;
applications in signal processing and statistics.&lt;br /&gt;
&lt;br /&gt;
Formally, the singular value decomposition of an $m \times n$ real or complex&lt;br /&gt;
matrix $\bf{B}$ is a factorization of the form $\bf{B}= \bf{U\Sigma V^\T}$, where&lt;br /&gt;
$\bf{U}$ is an $m \times m$ real or complex unitary matrix, $\bf{\Sigma}$ is an $m \times n$&lt;br /&gt;
rectangular diagonal matrix with non-negative real numbers on the diagonal, and&lt;br /&gt;
$\bf{V}$ is an $n \times n$ real or complex unitary matrix. The diagonal entries&lt;br /&gt;
$\Sigma_{ii}$ are known as the singular values of $\bf{B}$. The $m$ columns of&lt;br /&gt;
$\bf{U}$ and the $n$ columns of $\bf{V}$ are called the left-singular vectors and&lt;br /&gt;
right-singular vectors of $\bf{B}$, respectively.&lt;br /&gt;
&lt;br /&gt;
The singular value decomposition and the eigen decomposition are closely&lt;br /&gt;
related. Namely:&lt;br /&gt;
&lt;br /&gt;
* The left-singular vectors of $\bf{B}$ are eigenvectors of $\bf{B}\bf{B}^\T$.&lt;br /&gt;
* The right-singular vectors of $\bf{B}$ are eigenvectors of $\bf{B}^\T\bf{B}$.&lt;br /&gt;
* The non-zero singular values of $\bf{B}$ (found on the diagonal entries of $\bf{\Sigma}$) are the square roots of the non-zero eigenvalues of both $\bf{B}^\T\bf{B}$ and $\bf{B}\bf{B}^\T$.&lt;br /&gt;
&lt;br /&gt;
With SVD we can write $\bf{B}$ as \[\bf{B}=\bf{U\Sigma{{V}^{\T}}}\] where $\bf{U}$ and $\bf{V}$ are again unitary matrices and $\bf{\Sigma}$&lt;br /&gt;
stands for the diagonal matrix of singular values.&lt;br /&gt;
&lt;br /&gt;
Again we can either solve the system or compute the pseudoinverse as&lt;br /&gt;
&lt;br /&gt;
\[ \bf{B}^{+} = \left( \bf{U\Sigma V}^\T\right)^{+} = \bf{V}\bf{\Sigma^{+}U}^\T, \]&lt;br /&gt;
where computing $\bf{\Sigma}^{+}$ is trivial: replace every non-zero diagonal entry by&lt;br /&gt;
its reciprocal and transpose the resulting matrix. This is exactly where the stability gain&lt;br /&gt;
lies: one can set a threshold below which a singular value is&lt;br /&gt;
considered $0$, i.e., truncate all singular values below some value, and&lt;br /&gt;
thus stabilize the inverse.&lt;br /&gt;
&lt;br /&gt;
The complexity of the SVD decomposition is \[ 2mn^2+2n^3 = O(n^3) \]&lt;br /&gt;
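The truncation described above can be sketched in a few lines of NumPy (the matrix and the threshold value are made up for the example; Medusa's own implementation may differ):&lt;br /&gt;

```python
import numpy as np

# Sketch of the truncated-SVD pseudoinverse: singular values below a chosen
# threshold are treated as 0, which stabilizes the inverse.
def truncated_pinv(B, tol=1e-10):
    U, s, Vt = np.linalg.svd(B, full_matrices=False)  # B = U diag(s) V^T
    s_inv = np.where(s > tol, 1.0 / s, 0.0)           # truncate tiny values
    return Vt.T @ (s_inv[:, None] * U.T)              # V Sigma^+ U^T

B = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(np.allclose(truncated_pinv(B), np.linalg.pinv(B)))  # True
```

For a well-conditioned matrix the truncated pseudoinverse coincides with the ordinary Moore-Penrose pseudoinverse; the threshold only matters when near-zero singular values appear.&lt;br /&gt;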
&lt;br /&gt;
'''Pros:'''&lt;br /&gt;
* stable&lt;br /&gt;
&lt;br /&gt;
'''Cons:'''&lt;br /&gt;
* high complexity&lt;br /&gt;
&lt;br /&gt;
The method used in MLMS is SVD with regularization.&lt;br /&gt;
&lt;br /&gt;
= Weighted Least Squares =&lt;br /&gt;
Weighted least squares approximation is the simplest version of the procedure described above. Given support $\b{s}$, values $\b{u}$&lt;br /&gt;
and an anchor point $\vec{p}$, we calculate the coefficients $\b{\alpha}$ using one of the above methods.&lt;br /&gt;
Then, to approximate a function in the neighbourhood of $\vec p$ we use the formula&lt;br /&gt;
\[&lt;br /&gt;
\hat{u}(\vec x) = \b{b}(\vec x)^\T \b{\alpha} = \sum_{i=1}^m \alpha_i b_i(\vec x).&lt;br /&gt;
\]&lt;br /&gt;
&lt;br /&gt;
To approximate the derivative $\frac{\partial u}{\partial x_i}$, or any linear partial differential operator $\mathcal L$ on $u$, we&lt;br /&gt;
simply take the same linear combination of the transformed basis functions $\mathcal L b_i$. Here we treat the coefficients $\alpha_i$ as&lt;br /&gt;
constants and apply the linearity of $\mathcal L$.&lt;br /&gt;
\[&lt;br /&gt;
 \widehat{\mathcal L u}(\vec x) = \sum_{i=1}^m \alpha_i (\mathcal L b_i)(\vec x).&lt;br /&gt;
\]&lt;br /&gt;
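Both formulas above can be demonstrated with a short NumPy sketch (unit weights and made-up sample data; an illustration, not Medusa code): the coefficients $\b{\alpha}$ are computed once, then reused for the function value and, via the differentiated basis, for $u'$:&lt;br /&gt;

```python
import numpy as np

# Illustrative WLS sketch: fit alpha once, then reuse it for the value
# sum_i alpha_i b_i(x) and the derivative sum_i alpha_i b_i'(x).
s = np.linspace(-0.5, 0.5, 9)            # support nodes
u = np.exp(s)                            # sampled function values
B = np.vander(s, 4, increasing=True)     # basis {1, x, x^2, x^3}
alpha = np.linalg.lstsq(B, u, rcond=None)[0]

x = 0.1
u_hat = sum(a * x**i for i, a in enumerate(alpha))                      # value
du_hat = sum(i * a * x**(i - 1) for i, a in enumerate(alpha) if i > 0)  # u'
print(abs(u_hat - np.exp(x)) < 1e-3, abs(du_hat - np.exp(x)) < 1e-2)    # True True
```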
&lt;br /&gt;
= WLS at fixed point with fixed support and unknown function values =&lt;br /&gt;
Suppose now we are given support $\b{s}$ and a point $\vec{p}$, and want to construct the function approximation from values $\b{u}$.&lt;br /&gt;
We proceed as usual, solving the overdetermined system $WB \b{\alpha} = W\b{u}$ for the coefficients $\b{\alpha}$ using the pseudoinverse&lt;br /&gt;
\[ \b{\alpha} = (WB)^+W\b{u}, \]&lt;br /&gt;
where $A^+$ denotes the Moore-Penrose pseudoinverse that can be calculated using SVD.&lt;br /&gt;
&lt;br /&gt;
Writing down the approximation function $\hat{u}$ we get&lt;br /&gt;
\[&lt;br /&gt;
\hat u (\vec{p}) = \b{b}(\vec{p})^\T \b{\alpha} = \b{b}(\vec{p})^\T (WB)^+W\b{u} = \b{\chi}(\vec{p}) \b{u}.&lt;br /&gt;
\]&lt;br /&gt;
&lt;br /&gt;
We have defined $\b{\chi}$ to be&lt;br /&gt;
\[ \b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T (WB)^+W. \]&lt;br /&gt;
Vector $\b{\chi}$ is a row vector, also called a ''shape function''. The name comes from the fact that all the information&lt;br /&gt;
about the shape of the domain and the choice of approximation can be stored in a single row vector, so that&lt;br /&gt;
a function value can be approximated from given support values $\b{u}$ with a single dot product. For any values $\b{u}$, the value $\b{\chi}(\vec{p}) \b{u}$&lt;br /&gt;
gives us the approximation $\hat{u}(\vec{p})$ of $u$ in the point $\vec{p}$.&lt;br /&gt;
Mathematically speaking, $\b{\chi}(\vec{p})$ is a functional, $\b{\chi}(\vec{p})\colon \R^n \to \R$, mapping $n$-tuples of known function values to&lt;br /&gt;
their approximations in the point $\vec{p}$.&lt;br /&gt;
&lt;br /&gt;
The same approach works for any linear operator $\mathcal L$ applied to $u$, just replace every $b_i$ in definition of $\b{\chi}$ with $\mathcal Lb_i$.&lt;br /&gt;
For example, take a $1$-dimensional case for approximation of derivatives with weight equal to $1$ and $n=m=3$, with equally spaced support values at distances $h$.&lt;br /&gt;
We wish to approximate $u''$ in the middle support point, just by making a weighted sum of the values, something like the finite difference&lt;br /&gt;
\[ u'' \approx \frac{u_1 - 2u_2 + u_3}{h^2}. \]&lt;br /&gt;
This is exactly the same formula as we would have obtained by computing $\b{\chi}$, except that our approach is far more general. One should think of&lt;br /&gt;
$\b{\chi}$ as one would of a finite difference scheme: it is a rule telling us how to compute the derivative.&lt;br /&gt;
\[ u''(s_2) \approx \underbrace{\begin{bmatrix} \frac{1}{h^2} &amp;amp; \frac{-2}{h^2} &amp;amp; \frac{1}{h^2} \end{bmatrix}}_{\b{\chi}} \begin{bmatrix}u_1 \\ u_2 \\ u_3 \end{bmatrix}  \]&lt;br /&gt;
&lt;br /&gt;
The fact that $\b{\chi}$ is independent of the function values $\b{u}$ and depends only on the domain geometry means that&lt;br /&gt;
'''we can just compute the shape functions $\b{\chi}$ for points of interest and then approximate any linear operator&lt;br /&gt;
of any function, given its values, very fast, using only a single dot product.'''&lt;br /&gt;
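The worked $1$-dimensional example above can be reproduced numerically. A sketch (monomial basis $\{1, x, x^2\}$ and $W = I$, as in the text; the value of $h$ is arbitrary):&lt;br /&gt;

```python
import numpy as np

# Shape function chi for u'' at the middle of three equally spaced nodes.
h = 0.1
s = np.array([-h, 0.0, h])               # support nodes, anchor p = 0
B = np.vander(s, 3, increasing=True)     # B[i, j] = b_j(s_i)
Lb = np.array([0.0, 0.0, 2.0])           # (b_j)''(p): second derivatives at p

chi = Lb @ np.linalg.pinv(B)             # chi = (L b)(p)^T B^+

print(np.allclose(chi, [1 / h**2, -2 / h**2, 1 / h**2]))  # True: the FD stencil
```

The computed $\b{\chi}$ is exactly the familiar central difference stencil $\left(\frac{1}{h^2}, \frac{-2}{h^2}, \frac{1}{h^2}\right)$.&lt;br /&gt;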
&lt;br /&gt;
= MLS =&lt;br /&gt;
&lt;br /&gt;
&amp;lt;figure id=&amp;quot;fig:comparisonMLSandWLS&amp;quot;&amp;gt;&lt;br /&gt;
[[File:mlswls.svg|thumb|upright=2|&amp;lt;caption&amp;gt;Comparison of WLS and MLS approximation&amp;lt;/caption&amp;gt;]]&lt;br /&gt;
&amp;lt;/figure&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When using WLS the approximation gets worse as we move away from the central point $\vec{p}$.&lt;br /&gt;
This is partially due to no longer being in the center of the support and partially due to the weight&lt;br /&gt;
being distributed in such a way as to assign more importance to nodes closer to $\vec{p}$.&lt;br /&gt;
&lt;br /&gt;
We can battle this problem in two ways: when we wish to approximate in a new point that is sufficiently far&lt;br /&gt;
away from $\vec{p}$ we can compute new support, recompute the new coefficients $\b{\alpha}$ and approximate again.&lt;br /&gt;
This is very costly and we would like to avoid it. A partial fix is to keep the support the same and only&lt;br /&gt;
recompute the weight vector $\b{w}$, which will now assign higher values to nodes close to the new point.&lt;br /&gt;
We still need to recompute the coefficients $\b{\alpha}$, however we avoid the cost of setting up a new support&lt;br /&gt;
and function values and of recomputing $B$. This approach is called Moving Least Squares due to recomputing&lt;br /&gt;
the weighted least squares problem whenever we move the point of approximation.&lt;br /&gt;
&lt;br /&gt;
Note that if our weight is constant, or if $n = m$, when the approximation reduces to interpolation, the weights do not play&lt;br /&gt;
any role and this method is redundant. In fact, its benefits arise when the supports are rather large.&lt;br /&gt;
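The MLS procedure described above can be sketched as follows (an illustration with a Gaussian weight and made-up sample data, not the Medusa API): the support, values and basis stay fixed, and only the weights are recentred at each evaluation point before the coefficients are recomputed.&lt;br /&gt;

```python
import numpy as np

# Minimal MLS sketch: recentre the Gaussian weights at each evaluation
# point x, then redo the weighted least squares fit.
def mls_eval(x, s, u, sigma=0.3):
    B = np.vander(s, 3, increasing=True)               # fixed basis {1, x, x^2}
    w = np.exp(-((s - x) / sigma) ** 2)                # weights recentred at x
    alpha = np.linalg.pinv(np.sqrt(w)[:, None] * B) @ (np.sqrt(w) * u)
    return np.polynomial.polynomial.polyval(x, alpha)  # b(x)^T alpha

s = np.linspace(0.0, 1.0, 7)
u = np.sin(s)
print(abs(mls_eval(0.8, s, u) - np.sin(0.8)) < 1e-2)  # True
```

Only the weight computation and the (small) least squares solve are repeated per point; the support search and function values are reused, which is precisely the saving MLS offers over a full recomputation.&lt;br /&gt;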
&lt;br /&gt;
See &amp;lt;xr id=&amp;quot;fig:comparisonMLSandWLS&amp;quot;/&amp;gt; for a comparison between the MLS and WLS approximations. The MLS approximation remains close to the&lt;br /&gt;
actual function while still inside the support domain, while the WLS approximation deteriorates once&lt;br /&gt;
we leave the reach of the weight function.&lt;br /&gt;
&lt;br /&gt;
{{reflist}}&lt;/div&gt;</summary>
		<author><name>Mdepolli</name></author>	</entry>

	</feed>