Medusa: Coordinate Free Mehless Method implementation - User contributions [en]

Quantum Mechanics

2018-01-11T13:55:40Z

Mkolman: /* Particle in a box */

= Introduction =
The quantum world is governed by the [https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation Schrödinger equation]

\[{\displaystyle {\hat {H}}|\psi (t)\rangle =i\hbar {\frac {\partial }{\partial t}}|\psi (t)\rangle } \]

where $\hat H$ is the [https://en.wikipedia.org/wiki/Hamiltonian_(quantum_mechanics) Hamiltonian], $|\psi (t)\rangle$ is the [https://en.wikipedia.org/wiki/Wave_function quantum state function] and $\hbar$ is the reduced [https://en.wikipedia.org/wiki/Planck_constant Planck constant].

The Hamiltonian consists of kinetic energy $\hat T$ and potential energy $\hat V$. As in classical mechanics, potential energy is a function of time and space, whereas the kinetic energy differs from the classical world and is calculated as

\[\hat T = - \frac{\hbar^2}{2m} \nabla^2 .\]

The final version of the single particle Schrödinger equation can be written as

\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) \psi(t, \mathbf r) = i\hbar {\frac {\partial }{\partial t}}\psi(t, \mathbf r) \]

Quantum state function is a complex function, so it is usually split into the real part and imaginary part

\[ u, v \in C(\mathbb R)\colon \psi = u + i v , \]

which for a real $V$ yields a system of two real equations

\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) u(t, \mathbf r) = -\hbar {\frac {\partial }{\partial t}} v(t, \mathbf r) , \]
\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) v(t, \mathbf r) = \hbar {\frac {\partial }{\partial t}} u(t, \mathbf r) , \]

which may be easier to handle.

= Harmonic oscilator =

By selecting the potential $V(t, \mathbf r)$ and the initial state $\psi(0, \mathbf r)$ we get a unique solution for time propagation of the quantum state function. Probably the most used and well known example is the quantum harmonic oscilator, where we select a quadratic potential

\[V(t, \mathbf r) = V(\mathbf r) = \frac{1}{2} m \omega^2 r^2 , \]

where $m$ is the mass of the particle and $\omega$ is the angular frequency of the oscilator.

The 1D harmonic oscilator has known eigenstate solutions

\[\psi _{n}(x)={\frac {1}{\sqrt {2^{n}\,n!}}}\cdot \left({\frac {m\omega }{\pi \hbar }}\right)^{1/4}\cdot e^{-{\frac {m\omega x^{2}}{2\hbar }}}\cdot H_{n}\left({\sqrt {\frac {m\omega }{\hbar }}}x\right),\qquad n=0,1,2,\ldots .\]

where the functions $H_n$ are the physicists' [https://en.wikipedia.org/wiki/Hermite_polynomials Hermite polynomials]. Time propagation of eigenstates is described with

\[\psi_n(t, x) = \mathrm e ^ {-i (n+0.5) \omega t} \psi_n(x)\]

= Particle in a box =

A theoretical one dimensional potential

\[\displaystyle V(x)={\begin{cases}0,&0<x<L,\\\infty ,&{\text{otherwise,}}\end{cases}}\]

is known as an infinite potential well. Its time independent eigenfunctions are

\[\sqrt{\frac{2}{L}}\psi_n(x) = \sin\left(k_n x \right), \qquad n = 1,2,3,...\]

where $k_n = \frac{\pi n}{L}$. With a time dependency similar to Harmonic oscilator

\[\psi_n(t, x) = \mathrm e ^ {-i \omega_n t} \psi_n(x),\]

where $\omega_n$ and $k_n$ are connected through dispersion relation through energy $E_n$

\[{\displaystyle E_{n}=\hbar \omega _{n}={\frac {n^{2}\pi ^{2}\hbar ^{2}}{2mL^{2}}}={\frac {\hbar ^{2} k_n^2}{2m}}}.\]

Quantum Mechanics

2018-01-11T13:39:27Z

Mkolman:

Quantum Mechanics

2018-01-11T12:38:27Z

Mkolman:

Quantum Mechanics

2017-12-21T15:45:30Z

Mkolman: /* Harmonic oscilator */

= Introduction =
The quantum world is governed by the [https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation Schrödinger equation]

\[{\displaystyle {\hat {H}}|\psi (t)\rangle =i\hbar {\frac {\partial }{\partial t}}|\psi (t)\rangle } \]

where $\hat H$ is the [https://en.wikipedia.org/wiki/Hamiltonian_(quantum_mechanics) Hamiltonian], $|\psi (t)\rangle$ is the [https://en.wikipedia.org/wiki/Wave_function quantum state function] and $\hbar$ is the reduced [https://en.wikipedia.org/wiki/Planck_constant Planck constant].

The Hamiltonian consists of kinetic energy $\hat T$ and potential energy $\hat V$. As in classical mechanics, potential energy is a function of time and space, whereas the kinetic energy differs from the classical world and is calculated as

\[\hat T = - \frac{\hbar^2}{2m} \nabla^2 .\]

The final version of the single particle Schrödinger equation can be written as

\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) \psi(t, \mathbf r) = i\hbar {\frac {\partial }{\partial t}}\psi(t, \mathbf r) \]

Quantum state function is a complex function, so it is usually split into the real part and imaginary part

\[ u, v \in C(\mathbb R)\colon \psi = u + i v , \]

which for a real $V$ yields a system of two real equations

\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) u(t, \mathbf r) = -\hbar {\frac {\partial }{\partial t}} v(t, \mathbf r) , \]
\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) v(t, \mathbf r) = \hbar {\frac {\partial }{\partial t}} u(t, \mathbf r) , \]

which may be easier to handle.

= Harmonic oscilator =

By selecting the potential $V(t, \mathbf r)$ and the initial state $\psi(0, \mathbf r)$ we get a unique solution for time propagation of the quantum state function. Probably the most used and well known example is the quantum harmonic oscilator, where we select a quadratic potential

\[V(t, \mathbf r) = V(\mathbf r) = \frac{1}{2} m \omega^2 r^2 , \]

where $m$ is the mass of the particle and $\omega$ is the angular frequency of the oscilator.

The 1D harmonic oscilator has known eigenstate solutions

\[\psi _{n}(x)={\frac {1}{\sqrt {2^{n}\,n!}}}\cdot \left({\frac {m\omega }{\pi \hbar }}\right)^{1/4}\cdot e^{-{\frac {m\omega x^{2}}{2\hbar }}}\cdot H_{n}\left({\sqrt {\frac {m\omega }{\hbar }}}x\right),\qquad n=0,1,2,\ldots .\]

where the functions Hn are the physicists' [https://en.wikipedia.org/wiki/Hermite_polynomials Hermite polynomials].Time propagation of eigenstates is described with

\[\psi(t, x) = \mathrm e ^ {-i\omega t} \psi(x)\]

Quantum Mechanics

2017-12-21T15:45:14Z

Mkolman: /* Introduction */

= Introduction =
The quantum world is governed by the [https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation Schrödinger equation]

\[{\displaystyle {\hat {H}}|\psi (t)\rangle =i\hbar {\frac {\partial }{\partial t}}|\psi (t)\rangle } \]

where $\hat H$ is the [https://en.wikipedia.org/wiki/Hamiltonian_(quantum_mechanics) Hamiltonian], $|\psi (t)\rangle$ is the [https://en.wikipedia.org/wiki/Wave_function quantum state function] and $\hbar$ is the reduced [https://en.wikipedia.org/wiki/Planck_constant Planck constant].

The Hamiltonian consists of kinetic energy $\hat T$ and potential energy $\hat V$. As in classical mechanics, potential energy is a function of time and space, whereas the kinetic energy differs from the classical world and is calculated as

\[\hat T = - \frac{\hbar^2}{2m} \nabla^2 .\]

The final version of the single particle Schrödinger equation can be written as

\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) \psi(t, \mathbf r) = i\hbar {\frac {\partial }{\partial t}}\psi(t, \mathbf r) \]

Quantum state function is a complex function, so it is usually split into the real part and imaginary part

\[ u, v \in C(\mathbb R)\colon \psi = u + i v , \]

which for a real $V$ yields a system of two real equations

\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) u(t, \mathbf r) = -\hbar {\frac {\partial }{\partial t}} v(t, \mathbf r) , \]
\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) v(t, \mathbf r) = \hbar {\frac {\partial }{\partial t}} u(t, \mathbf r) , \]

which may be easier to handle.

= Harmonic oscilator =

By selecting the potential $V(t, \mathbf r)$ and the initial state $\psi(0, \mathbf r)$ we get a unique solution for time propagation of the quantum state function. Probably the most used and well known example is the quantum harmonic oscilator, where we select a quadratic potential

\[V(t, \mathbf r) = V(\mathbf r) = \frac{1}{2} m \omega^2 r^2 \],

where $m$ is the mass of the particle and $\omega$ is the angular frequency of the oscilator.

The 1D harmonic oscilator has known eigenstate solutions

\[\psi _{n}(x)={\frac {1}{\sqrt {2^{n}\,n!}}}\cdot \left({\frac {m\omega }{\pi \hbar }}\right)^{1/4}\cdot e^{-{\frac {m\omega x^{2}}{2\hbar }}}\cdot H_{n}\left({\sqrt {\frac {m\omega }{\hbar }}}x\right),\qquad n=0,1,2,\ldots .\],

where the functions Hn are the physicists' [https://en.wikipedia.org/wiki/Hermite_polynomials Hermite polynomials].Time propagation of eigenstates is described with

\[\psi(t, x) = \mathrm e ^ {-i\omega t} \psi(x)\]

Quantum Mechanics

2017-12-21T15:44:42Z

Mkolman: /* Introduction */

= Introduction =
The quantum world is governed by the [https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation Schrödinger equation]

\[{\displaystyle {\hat {H}}|\psi (t)\rangle =i\hbar {\frac {\partial }{\partial t}}|\psi (t)\rangle } \]

where $\hat H$ is the [https://en.wikipedia.org/wiki/Hamiltonian_(quantum_mechanics) Hamiltonian], $|\psi (t)\rangle$ is the [https://en.wikipedia.org/wiki/Wave_function quantum state function] and $\hbar$ is the reduced [https://en.wikipedia.org/wiki/Planck_constant Planck constant].

The Hamiltonian consists of kinetic energy $\hat T$ and potential energy $\hat V$. As in classical mechanics, potential energy is a function of time and space, whereas the kinetic energy differs from the classical world and is calculated as

\[\hat T = - \frac{\hbar^2}{2m} \nabla^2 .\]

The final version of the single particle Schrödinger equation can be written as

\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) \psi(t, \mathbf r) = i\hbar {\frac {\partial }{\partial t}}\psi(t, \mathbf r) \]

Quantum state function is a complex function, so it is usually split into the real part and imaginary part

\[ u, v \in C(\mathbb R)\colon \psi = u + i v \],

which for a real $V$ yields a system of two real equations

\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) u(t, \mathbf r) = -\hbar {\frac {\partial }{\partial t}} v(t, \mathbf r) \],
\[\left(- \frac{\hbar^2}{2m} \nabla^2 + V(t, \mathbf r)\right) v(t, \mathbf r) = \hbar {\frac {\partial }{\partial t}} u(t, \mathbf r) \],

which may be easier to handle.

= Harmonic oscilator =

By selecting the potential $V(t, \mathbf r)$ and the initial state $\psi(0, \mathbf r)$ we get a unique solution for time propagation of the quantum state function. Probably the most used and well known example is the quantum harmonic oscilator, where we select a quadratic potential

\[V(t, \mathbf r) = V(\mathbf r) = \frac{1}{2} m \omega^2 r^2 \],

where $m$ is the mass of the particle and $\omega$ is the angular frequency of the oscilator.

The 1D harmonic oscilator has known eigenstate solutions

\[\psi _{n}(x)={\frac {1}{\sqrt {2^{n}\,n!}}}\cdot \left({\frac {m\omega }{\pi \hbar }}\right)^{1/4}\cdot e^{-{\frac {m\omega x^{2}}{2\hbar }}}\cdot H_{n}\left({\sqrt {\frac {m\omega }{\hbar }}}x\right),\qquad n=0,1,2,\ldots .\],

where the functions Hn are the physicists' [https://en.wikipedia.org/wiki/Hermite_polynomials Hermite polynomials].Time propagation of eigenstates is described with

\[\psi(t, x) = \mathrm e ^ {-i\omega t} \psi(x)\]

Quantum Mechanics

2017-12-15T10:47:48Z

Mkolman: Created page with "= Introduction = The quantum world is governed by the [https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation Schrödinger equation] \[{\displaystyle {\hat {H}}|\psi (t)\ran..."

Fluid Mechanics

2017-11-23T11:05:48Z

Mkolman: /* Artificial compressibility method */

= Introduction =
Computational fluid dynamics (CFD) is a field of a great interest among researchers in many fields of science, e.g. studying mathematical fundaments of numerical methods, developing novel physical models, improving computer implementations, and many others. Pushing the limits of all involved fields of science helps community to deepen the understanding of several natural and technological phenomena. Weather forecast, ocean dynamics, water transport, casting, various energetic studies, etc., are just few examples where fluid dynamics plays a crucial role. The core problem of the CFD is solving the Navier-Stokes Equation or its variants, e.g. Darcy or Brinkman equation for flow in porous media. Here, we discuss basic algorithms for solving CFD problems. Check reference list on the [[Main Page]] for more details about related work.

Long story short, we want to solve
\begin{equation}
\frac{\partial \b{v}}{\partial t}+\nabla \cdot \left( \rho \b{vv}\right)=-\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f}
\label{NavierStokes}
\end{equation}
also known as a Navier-Stokes equation. In many cases we are interested in the incompressible fluids (Ma<0.3), reducing the continuity equation to
\begin{equation}
\nabla \cdot \b{v}=0
\label{contuinity}
\end{equation}
which implies a simplification

\[\frac{\partial \left( \rho \b{v} \right)}{\partial t}+\nabla \cdot \left( \rho \b{vv} \right)=\frac{\partial \left( \rho \b{v} \right)}{\partial t}+(\rho \b{v}\cdot \nabla )\cdot \b{v}. \]

Note that the $\b{v}\b{v}$ stands for the tensor or dyadic product \[ \b{v}\b{v} = \b{v}\otimes\b{v} = \b{v}\b{v}^\T = \left[ \begin{matrix}
{{v}_{1}}{{v}_{1}} & \cdots & {{v}_{1}}{{v}_{n}} \\
\vdots & \ddots & \vdots \\
{{v}_{n}}{{v}_{1}} & \cdots & {{v}_{n}}{{v}_{n}} \\
\end{matrix} \right]\]
An example of incompressible variant of advection term in 2D would therefore be
\[\left( \b{v}\cdot \nabla \right)\b{v}=\left( \left( \begin{matrix}
u \\
v \\
\end{matrix} \right) \cdot \left( \begin{matrix}
\frac{\partial }{\partial x} \\
\frac{\partial }{\partial y} \\
\end{matrix} \right) \right)\left( \begin{matrix}
u \\
v \\
\end{matrix} \right)=\left( \begin{matrix}
u\frac{\partial u}{\partial x}+v\frac{\partial u}{\partial y} \\
u\frac{\partial v}{\partial x}+v\frac{\partial v}{\partial y} \\
\end{matrix} \right)\]

The goal of CFD is to solve system \ref{NavierStokes} and \ref{contuinity}. It is obvious that a special treatment will be needed to couple both equations. In following discussion we cover some basic approaches, how this can be accomplished.

= Solutions algorithms =
== Artificial compressibility method ==
The simplest, completely explicit approach, is an artificial compressibility method (ACM), where a compressibility term is included in the mass continuity
\[\frac{\partial \b{v}}{\partial t}+(\b{v}\cdot\nabla )\b{v}=-\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f}\]
\[\frac{ 1 }{ \rho } \frac{\partial \rho }{\partial t}+\nabla \cdot \b{v}=0\]
\[\frac{ 1 }{ \rho } \frac{\partial \rho }{\partial p}\frac{\partial p}{\partial t}+\nabla \cdot \b{v}=0\]
Now, the above system can be solved directly.

The addition of the time derivative of the pressure term physically means that waves of finite speed (the propagation of which depends on the magnitude of the ACM)
are introduced into the flow field as a mean to distribute the pressure within the domain. In a true
incompressible flow, the pressure field is affected instantaneously throughout the whole domain. In ACM there is a time delay between the flow disturbance and its effect on the
pressure field. Upon rearranging the equation yields
\[\frac{\partial p}{\partial t}+\rho {{C}^{2}}\nabla \cdot \b{v}=0\]
where the continuity equation is perturbed by the quantity $\frac{\partial p}{\partial t}$ denominated herein
as the AC parameter/artificial sound speed recognized by
$C$ [m/s] - speed of sound
\[\frac{1}{C^2}=\frac{\partial \rho }{\partial p}\]
Or in another words
\[C^2=\left( \frac{\partial p}{\partial \rho}\right)_S\]
where $\rho$ is the density of the material. It follows, by replacing partial derivatives, that the isentropic compressibility can be expressed as:
\[\beta =\frac{1}{\rho {{C}^{2}}}\]
The evaluation of the local ACM parameter in incompressible flows is inspired by the
speed of sound computations in compressible flows (for instance, from the perfect gas law).
However, in the incompressible flow situation, employing such a relation is difficult, but an artificial
relation can be developed from the convective and diffusive velocities.
Reverting to the justification of continuity modification, it can be immediately seen that the
artificial sound speed must be sufficiently large to have a significant regularizing effect and at
the same time must be as small as possible to minimizing perturbations on the incompressibility
equation. Therefore, $C$ influences the convergence rate and stability of the solution method. In other words,
assists in reducing large disparity in the eigenvalues, leading to a well-conditioned system.
The $C$ can be '''estimated''' with

\[ C = \beta \max \left( \left|\b{v}\right|_2, \left|\b{v}_{ref}\right|_2 \right),\]
where $\b{v}_{ref}$ stands for a reference velocity.
Values for $\beta$ in the range of 1–10 are recommended for better convergence to the steady state at which the
mass conservation is enforced. In addition, Equation ensures that $C$ does not reach zero at stagnation points
that cause instabilities in pseudo-time, effecting convergence

== Explicit/Implicit pressure calculation ==

Applying divergence on \ref{NavierStokes} yields
\[\nabla \cdot \frac{\partial \b{v}}{\partial t}+\nabla \cdot (\b{v}\cdot \nabla )\b{v}=-\frac{1}{\rho }{{\nabla }^{2}}p+\nabla \cdot \nu {{\nabla }^{2}}\b{v}+\nabla \cdot \b{f}\]

And since $\nabla \cdot \b{v}=0$ and we can change order in $\nabla \cdot \nabla^2$ and $ \nabla^2 \cdot \nabla$ equation simplifies to
\[\frac{1}{\rho }{{\nabla }^{2}}p=\nabla \cdot \b{f}-\nabla \cdot (\b{v}\cdot \nabla )\b{v}\]
Now, we need boundary conditions that can be obtained by multiplying the equation with a boundary normal vector
\[\b{\hat{n}}\cdot \left( \frac{\partial \b{v}}{\partial t}+(\b{v}\cdot \nabla )\b{v} \right)=\b{\hat{n}}\cdot \left( -\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f} \right)\cdot \b{\hat{n}}\]
\[\frac{\partial p}{\partial \b{\hat{n}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f}-\frac{\partial \b{v}}{\partial t}-(\b{v}\cdot\nabla ) \b{v} \right)\cdot \b{\hat{n}}\]

Note that using tangential boundary vector gives equivalent BCs
\[\frac{\partial p}{\partial \b{\hat{t}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f}-\frac{\partial \b{v}}{\partial t}-(\b{v}\cdot\nabla ) \b{v} \right)\cdot \b{\hat{t}}\]
For no-slip boundaries BCs simplify to
\[\frac{\partial p}{\partial \b{\hat{n}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f} \right)\cdot \b{\hat{n}}\]
Otherwise an appropriate expression regarding the velocity can be written, i.e. write full and taken in account velocity BCs. For example, Neumann velocity $\frac{\partial u}{\partial x}=0$ in 2D
\[\frac{\partial p}{\partial x}=\left( \nu {{\nabla }^{2}}u + {{f}_{x}}-\frac{\partial u}{\partial t}+v\frac{\partial u}{\partial y} \right)\]
Note that you allready know everything about the velocity and thus you can compute all the terms explicitely.

So the procedure is:

* Compute Navier Stokes either explicitly or implicitly
* Solve pressure equations with computed velocities
* March in time

Basic boundary conditions
Wall: $\b{v}=0$, \[\frac{\partial p}{\partial \hat{n}}=\left( \nabla \cdot \left( \nu \nabla \b{v} \right)+\b{f} \right)\cdot \hat{n}\]
Inlet: $\b{v}=\b{a}$, \[\frac{\partial p}{\partial \hat{n}}=\left( \nabla \cdot \left( \nu \nabla \b{v} \right)+\b{f}-\nabla \cdot (\rho \b{v}\b{v})-\rho \frac{\partial \b{v}}{\partial t} \right)\cdot \hat{n}\]

Above system can be linearized (advection term) and solved either explicitly or implicitly.

Further reading:

W. D. Henshaw, A fourth-order accurate method for the incompressible Navier–Stokes equations on overlapping grids, J. Comput. Phys. 113, 13 (1994)

J. C. Strikwerda, Finite difference methods for the Stokes and Navier–Stokes equations, SIAM J. Sci. Stat.
Comput. 5(1), 56 (1984)

== Explicit Pressure correction ==
Another possibility is to solve pressure correction equation. Again Consider the momentum equation and mass continuity and discretize it explicitly
\[\frac{{{\b{v}}_{2}}-{{\b{v}}_{1}}}{\Delta t}=-\frac{1}{\rho }\nabla {{p}_{1}}-({{\b{v}}_{1}}\nabla )\cdot {{\b{v}}_{1}}+\nu {{\nabla }^{2}}{{\b{v}}_{1}}+\b{f}\]
Computed velocity obviously does not satisfy the mass contunity and therefore let’s call it intermediate velocity. Intermediate velocity is calculated from guessed pressure and old velocity values.
\[{{\b{v}}^{inter}}=\b{v}_1 + \Delta t\left( -\frac{1}{\rho }\nabla {{p}_{1}}-({{\b{v}}_{1}}\nabla )\cdot {{\b{v}}_{1}}+\nu {{\nabla }^{2}}{{\b{v}}_{1}}+\b{f} \right)\]
A correction term is added that drives velocity to divergence free field
\[\nabla \cdot ({{\b{v}}^{inter}}+{{\b{v}}^{corr}})=0 \qquad \to \qquad \nabla \cdot {{\b{v}}^{inter}}=-\nabla \cdot {{\b{v}}^{corr}}\]

Velocity correction is affected only by effect of pressure correction. This fact is obvious due to all terms except gradient of pressure on the right side of equation are constant.
\[{{\b{v}}^{corr}}=-\frac{\Delta t}{\rho }\nabla {{p}^{corr}} \]

Note that corrected velocity also satisfies boundary conditions
\[\b{v}^{iter}+\b{v}^{corr}=\b{v}^{BC}\]
Applying divergence and we get '''pressure correction poisson equation'''.
\[\,{{\nabla }^{2}}{{p}^{corr}}\,=\frac{\rho }{\Delta t}\nabla \cdot {{\mathbf{v}}^{iter}}\,\]

Boundary conditions can be obtained by mulitplying the equation with a unit normal vector $\b{\hat{n}}$
\[\frac{\Delta t}{\rho }\frac{\partial {p}^{corr}}{\partial \b{\hat{n}}} = \b{\hat{n}} \cdot \left(\b{v}^{iter} - \b{v}^{BC} \right) \]
The most straightforward approach, for dirichlet BCs, is to take into account velocity boundary condition in computation of intermediate velocity, and clearly in such cases, pressure boundary condition simplifies to
\[\frac{\partial p^{corr}}{\partial \b{\hat{n}}} = 0 \]
As ${{\b{v}}^{\operatorname{int}er}}={{\b{v}}^{BC}}$ . Another option is to explicitely compute intermediate velocity also on boundaries and then correct it through pressure correction.

The pressure poisson equation is, at given boundary conditions, defined only up to a constant. One solution is to select a node and set it to a constant, e.g. p(0, 0) = 0, however much more stable approach is to enforce solution with additional condition, also referred to as a regularization
\[\int_{\Omega }^{{}}{pd}\Omega =0\]
\[\,{{\nabla }^{2}}{{p}^{corr}}\,-\alpha =\frac{\rho }{\Delta t}\nabla \cdot {{\b{v}}^{iter}}\,\]
Where $\alpha $ stands for Lagrange multiplier. Or in discrete form
\[\sum\limits_{i}{p\left( {{x}_{i}} \right)=0}\]
\[\b{Mp}-\alpha \b{1}=\frac{\rho }{\Delta t}\nabla \cdot {{\b{v}}^{iter}}\]

where $\b{M}$ holds Laplace shape functions, i.e. the discrete version of Laplace differential operator.

Solution of a system

\[\left[ \begin{matrix}
{{M}_{11}} & .. & {{M}_{1n}} & 1 \\
.. & .. & .. & 1 \\
{{M}_{n1}} & ... & {{M}_{nn}} & 1 \\
1 & 1 & 1 & 0 \\
\end{matrix} \right]\left[ \begin{matrix}
{{p}_{1}} \\
... \\
{{p}_{n}} \\
\alpha \\
\end{matrix} \right]=\frac{\rho }{\Delta t}\left[ \begin{matrix}
\nabla \cdot \b{v}_{_{1}}^{\text{iter}} \\
... \\
\nabla \cdot \b{v}_{n}^{\text{iter}} \\
0 \\
\end{matrix} \right]\]
Gives us a solution of pressure correction.

== CBS Algorithm ==
With explicit temporal discretization problem is formulated as
\[\b{\hat{v}}={{\b{v}}_{0}}+\Delta t\left( -\nabla {{p}_{0}}+\frac{1}{Re}{{\nabla }^{2}}{{\b{v}}_{0}}-\nabla \cdot ({{\b{v}}_{0}}{{\b{v}}_{0}}) \right)\]
\[p={{p}_{0}}-\xi \Delta {{t}_{F}}\nabla \b{\hat{v}}+\xi \Delta {{t}_{F}}\Delta t{{\nabla }^{2}}{{\overset{\scriptscriptstyle\frown}{P}}_{0}},\]
where $\b{\hat{v}}$, $\Delta t$, $\xi$ and $\Delta t_F$ stand for intermediate velocity, time step, relaxation parameter, and artificial time step, respectively, and index 0 stands for previous time / iteration step. First, the intermediate velocity is computed from previous time step. Second, the velocity is driven towards solenoidal field by correcting the pressure. Note that no special boundary conditions for pressure are used, i.e., the pressure on boundaries is computed with the same approach as in the interior of the domain. In general, the internal iteration with an artificial time step is required until the divergence of the velocity field is not below required criteria. However, if one is interested only in a steady-state solution, the internal iteration can be skipped and $\Delta t$ equals $\Delta {{t}_{F}}$. Without internal stepping the transient of the solution is distorted by artificial compressibility effect. This approach is also known as ACM with Characteristics-based discretization of continuity equation, where the relaxation parameter relates to the artificial speed of sound [35].

The relaxation parameter should be set between 1-10, lower number more stable solution.

And also dimensional form

\[p={{p}_{0}}-{{C}^{2}}\Delta {{t}_{F}}\rho \nabla \b{\hat{v}}+{{C}^{2}}\Delta {{t}_{F}}\Delta t{{\nabla }^{2}}{{p}_{0}},\]

Where C is speed of sound [m/s]

== Numerical examples==
* [[Lid driven cavity]]
* [[de Vahl Davis natural convection test]]

Fluid Mechanics

2017-11-17T11:32:21Z

Mkolman: Fix typo

= Introduction =
Computational fluid dynamics (CFD) is a field of a great interest among researchers in many fields of science, e.g. studying mathematical fundaments of numerical methods, developing novel physical models, improving computer implementations, and many others. Pushing the limits of all involved fields of science helps community to deepen the understanding of several natural and technological phenomena. Weather forecast, ocean dynamics, water transport, casting, various energetic studies, etc., are just few examples where fluid dynamics plays a crucial role. The core problem of the CFD is solving the Navier-Stokes Equation or its variants, e.g. Darcy or Brinkman equation for flow in porous media. Here, we discuss basic algorithms for solving CFD problems. Check reference list on the [[Main Page]] for more details about related work.

Long story short, we want to solve
\begin{equation}
\frac{\partial \b{v}}{\partial t}+(\b{v}\cdot\nabla )\cdot \b{v}=-\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f}
\label{NavierStokes}
\end{equation}
also known as a Navier-Stokes equation. In many cases we are interested in the incompressible fluids (Ma<0.3), reducing the continuity equation to
\begin{equation}
\nabla \cdot \b{v}=0
\label{contuinity}
\end{equation}
which implies a simplification

\[\frac{\partial \left( \rho \b{v} \right)}{\partial t}+\nabla \cdot \left( \rho \b{vv} \right)=\frac{\partial \left( \rho \b{v} \right)}{\partial t}+(\rho \b{v}\cdot \nabla )\cdot \b{v}. \]

Note that the $\b{v}\b{v}$ stands for the tensor or dyadic product \[ \b{v}\b{v} = \b{v}\otimes\b{v} = \b{v}\b{v}^\T = \left[ \begin{matrix}
{{v}_{1}}{{v}_{1}} & \cdots & {{v}_{1}}{{v}_{n}} \\
\vdots & \ddots & \vdots \\
{{v}_{n}}{{v}_{1}} & \cdots & {{v}_{n}}{{v}_{n}} \\
\end{matrix} \right]\]
An example of incompressible variant of advection term in 2D would therefore be
\[\left( \b{v}\cdot \nabla \right)\b{v}=\left( \left( \begin{matrix}
u \\
v \\
\end{matrix} \right) \cdot \left( \begin{matrix}
\frac{\partial }{\partial x} \\
\frac{\partial }{\partial y} \\
\end{matrix} \right) \right)\left( \begin{matrix}
u \\
v \\
\end{matrix} \right)=\left( \begin{matrix}
u\frac{\partial u}{\partial x}+v\frac{\partial u}{\partial y} \\
u\frac{\partial v}{\partial x}+v\frac{\partial v}{\partial y} \\
\end{matrix} \right)\]

The goal of CFD is to solve system \ref{NavierStokes} and \ref{contuinity}. It is obvious that a special treatment will be needed to couple both equations. In following discussion we cover some basic approaches, how this can be accomplished.

= Solutions algorithms =
== Artificial compressibility method ==
The simplest, completely explicit approach, is an artificial compressibility method (ACM), where a compressibility term is included in the mass continuity
\[\frac{\partial \b{v}}{\partial t}+(\b{v}\cdot\nabla )\b{v}=-\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f}\]
\[\frac{\partial \rho }{\partial t}+\nabla \cdot \b{v}=0\]
\[\frac{\partial \rho }{\partial p}\frac{\partial p}{\partial t}+\nabla \cdot \b{v}=0\]
Now, the above system can be solved directly.

The addition of the time derivative of the pressure term physically means that waves of finite speed (the propagation of which depends on the magnitude of the ACM)
are introduced into the flow field as a mean to distribute the pressure within the domain. In a true
incompressible flow, the pressure field is affected instantaneously throughout the whole domain. In ACM there is a time delay between the flow disturbance and its effect on the
pressure field. Upon rearranging the equation yields
\[\frac{\partial p}{\partial t}+\rho {{C}^{2}}\nabla \cdot \b{v}=0\]
where the continuity equation is perturbed by the quantity $\frac{\partial p}{\partial t}$ denominated herein
as the AC parameter/artificial sound speed recognized by
$C$ [m/s] - speed of sound
\[\frac{1}{C^2}=\frac{\partial \rho }{\partial p}\]
Or in another words
\[C^2=\left( \frac{\partial p}{\partial \rho}\right)_S\]
where $\rho$ is the density of the material. It follows, by replacing partial derivatives, that the isentropic compressibility can be expressed as:
\[\beta =\frac{1}{\rho {{C}^{2}}}\]
The evaluation of the local ACM parameter in incompressible flows is inspired by the
speed of sound computations in compressible flows (for instance, from the perfect gas law).
However, in the incompressible flow situation, employing such a relation is difficult, but an artificial
relation can be developed from the convective and diffusive velocities.
Reverting to the justification of continuity modification, it can be immediately seen that the
artificial sound speed must be sufficiently large to have a significant regularizing effect and at
the same time must be as small as possible to minimizing perturbations on the incompressibility
equation. Therefore, $C$ influences the convergence rate and stability of the solution method. In other words,
assists in reducing large disparity in the eigenvalues, leading to a well-conditioned system. Values
of in the range of 1–10 are recommended for better convergence to the steady state at which the
mass conservation is enforced. In addition, Equation ensures that $C$ does not reach zero at stagnation points
that cause instabilities in pseudo-time, effecting convergence

== Explicit/Implicit pressure calculation ==

Applying divergence on \ref{NavierStokes} yields
\[\nabla \cdot \frac{\partial \b{v}}{\partial t}+\nabla \cdot (\b{v}\cdot \nabla )\b{v}=-\frac{1}{\rho }{{\nabla }^{2}}p+\nabla \cdot \nu {{\nabla }^{2}}\b{v}+\nabla \cdot \b{f}\]

And since $\nabla \cdot \b{v}=0$ and we can change order in $\nabla \cdot \nabla^2$ and $ \nabla^2 \cdot \nabla$ equation simplifies to
\[\frac{1}{\rho }{{\nabla }^{2}}p=\nabla \cdot \b{f}-\nabla \cdot (\b{v}\cdot \nabla )\b{v}\]
Now, we need boundary conditions that can be obtained by multiplying the equation with a boundary normal vector
\[\b{\hat{n}}\cdot \left( \frac{\partial \b{v}}{\partial t}+(\b{v}\cdot \nabla )\b{v} \right)=\b{\hat{n}}\cdot \left( -\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f} \right)\cdot \b{\hat{n}}\]
\[\frac{\partial p}{\partial \b{\hat{n}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f}-\frac{\partial \b{v}}{\partial t}-(\b{v}\cdot\nabla ) \b{v} \right)\cdot \b{\hat{n}}\]

Note that using tangential boundary vector gives equivalent BCs
\[\frac{\partial p}{\partial \b{\hat{t}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f}-\frac{\partial \b{v}}{\partial t}-(\b{v}\cdot\nabla ) \b{v} \right)\cdot \b{\hat{t}}\]
For no-slip boundaries BCs simplify to
\[\frac{\partial p}{\partial \b{\hat{n}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f} \right)\cdot \b{\hat{n}}\]
Otherwise an appropriate expression regarding the velocity can be written, i.e. write full and taken in account velocity BCs. For example, Neumann velocity $\frac{\partial u}{\partial x}=0$ in 2D
\[\frac{\partial p}{\partial x}=\left( \nu {{\nabla }^{2}}u + {{f}_{x}}-\frac{\partial u}{\partial t}+v\frac{\partial u}{\partial y} \right)\]
Note that you allready know everything about the velocity and thus you can compute all the terms explicitely.

So the procedure is:

* Compute Navier Stokes either explicitly or implicitly
* Solve pressure equations with computed velocities
* March in time

Basic boundary conditions
Wall: $\b{v}=0$, \[\frac{\partial p}{\partial \hat{n}}=\left( \nabla \cdot \left( \nu \nabla \b{v} \right)+\b{f} \right)\cdot \hat{n}\]
Inlet: $\b{v}=\b{a}$, \[\frac{\partial p}{\partial \hat{n}}=\left( \nabla \cdot \left( \nu \nabla \b{v} \right)+\b{f}-\nabla \cdot (\rho \b{v}\b{v})-\rho \frac{\partial \b{v}}{\partial t} \right)\cdot \hat{n}\]

Above system can be linearized (advection term) and solved either explicitly or implicitly.

Further reading:

W. D. Henshaw, A fourth-order accurate method for the incompressible Navier–Stokes equations on overlapping grids, J. Comput. Phys. 113, 13 (1994)

J. C. Strikwerda, Finite difference methods for the Stokes and Navier–Stokes equations, SIAM J. Sci. Stat.
Comput. 5(1), 56 (1984)

== Explicit Pressure correction ==
Another possibility is to solve pressure correction equation. Again Consider the momentum equation and mass continuity and discretize it explicitly
\[\frac{{{\b{v}}_{2}}-{{\b{v}}_{1}}}{\Delta t}=-\frac{1}{\rho }\nabla {{p}_{1}}-({{\b{v}}_{1}}\nabla )\cdot {{\b{v}}_{1}}+\nu {{\nabla }^{2}}{{\b{v}}_{1}}+\b{f}\]
Computed velocity obviously does not satisfy the mass contunity and therefore let’s call it intermediate velocity. Intermediate velocity is calculated from guessed pressure and old velocity values.
\[{{\b{v}}^{inter}}=\b{v}_1 + \Delta t\left( -\frac{1}{\rho }\nabla {{p}_{1}}-({{\b{v}}_{1}}\nabla )\cdot {{\b{v}}_{1}}+\nu {{\nabla }^{2}}{{\b{v}}_{1}}+\b{f} \right)\]
A correction term is added that drives velocity to divergence free field
\[\nabla \cdot ({{\b{v}}^{inter}}+{{\b{v}}^{corr}})=0 \qquad \to \qquad \nabla \cdot {{\b{v}}^{inter}}=-\nabla \cdot {{\b{v}}^{corr}}\]

Velocity correction is affected only by effect of pressure correction. This fact is obvious due to all terms except gradient of pressure on the right side of equation are constant.
\[{{\b{v}}^{corr}}=-\frac{\Delta t}{\rho }\nabla {{p}^{corr}} \]

Note that corrected velocity also satisfies boundary conditions
\[\b{v}^{iter}+\b{v}^{corr}=\b{v}^{BC}\]
Applying divergence and we get '''pressure correction poisson equation'''.
\[\,{{\nabla }^{2}}{{p}^{corr}}\,=\frac{\rho }{\Delta t}\nabla \cdot {{\mathbf{v}}^{iter}}\,\]

Boundary conditions can be obtained by mulitplying the equation with a unit normal vector $\b{\hat{n}}$
\[\frac{\Delta t}{\rho }\frac{\partial {p}^{corr}}{\partial \b{\hat{n}}} = \b{\hat{n}} \cdot \left(\b{v}^{iter} - \b{v}^{BC} \right) \]
The most straightforward approach, for dirichlet BCs, is to take into account velocity boundary condition in computation of intermediate velocity, and clearly in such cases, pressure boundary condition simplifies to
\[\frac{\partial p^{corr}}{\partial \b{\hat{n}}} = 0 \]
As ${{\b{v}}^{\operatorname{int}er}}={{\b{v}}^{BC}}$ . Another option is to explicitely compute intermediate velocity also on boundaries and then correct it through pressure correction.

The pressure poisson equation is, at given boundary conditions, defined only up to a constant. One solution is to select a node and set it to a constant, e.g. p(0, 0) = 0, however much more stable approach is to enforce solution with additional condition, also referred to as a regularization
\[\int_{\Omega }^{{}}{pd}\Omega =0\]
\[\,{{\nabla }^{2}}{{p}^{corr}}\,-\alpha =\frac{\rho }{\Delta t}\nabla \cdot {{\b{v}}^{iter}}\,\]
Where $\alpha $ stands for Lagrange multiplier. Or in discrete form
\[\sum\limits_{i}{p\left( {{x}_{i}} \right)=0}\]
\[\b{Mp}-\alpha \b{1}=\frac{\rho }{\Delta t}\nabla \cdot {{\b{v}}^{iter}}\]

where $\b{M}$ holds Laplace shape functions, i.e. the discrete version of Laplace differential operator.

Solution of a system

\[\left[ \begin{matrix}
{{M}_{11}} & .. & {{M}_{1n}} & 1 \\
.. & .. & .. & 1 \\
{{M}_{n1}} & ... & {{M}_{nn}} & 1 \\
1 & 1 & 1 & 0 \\
\end{matrix} \right]\left[ \begin{matrix}
{{p}_{1}} \\
... \\
{{p}_{n}} \\
\alpha \\
\end{matrix} \right]=\frac{\rho }{\Delta t}\left[ \begin{matrix}
\nabla \cdot \b{v}_{_{1}}^{\text{iter}} \\
... \\
\nabla \cdot \b{v}_{n}^{\text{iter}} \\
0 \\
\end{matrix} \right]\]
Gives us a solution of pressure correction.

== CBS Algorithm ==
With explicit temporal discretization problem is formulated as
\[\b{\hat{v}}={{\b{v}}_{0}}+\Delta t\left( -\nabla {{p}_{0}}+\frac{1}{Re}{{\nabla }^{2}}{{\b{v}}_{0}}-\nabla \cdot ({{\b{v}}_{0}}{{\b{v}}_{0}}) \right)\]
\[p={{p}_{0}}-\xi \Delta {{t}_{F}}\nabla \b{\hat{v}}+\xi \Delta {{t}_{F}}\Delta t{{\nabla }^{2}}{{\overset{\scriptscriptstyle\frown}{P}}_{0}},\]
where $\b{\hat{v}}$, $\Delta t$, $\xi$ and $\Delta t_F$ stand for intermediate velocity, time step, relaxation parameter, and artificial time step, respectively, and index 0 stands for previous time / iteration step. First, the intermediate velocity is computed from previous time step. Second, the velocity is driven towards solenoidal field by correcting the pressure. Note that no special boundary conditions for pressure are used, i.e., the pressure on boundaries is computed with the same approach as in the interior of the domain. In general, the internal iteration with an artificial time step is required until the divergence of the velocity field is not below required criteria. However, if one is interested only in a steady-state solution, the internal iteration can be skipped and $\Delta t$ equals $\Delta {{t}_{F}}$. Without internal stepping the transient of the solution is distorted by artificial compressibility effect. This approach is also known as ACM with Characteristics-based discretization of continuity equation, where the relaxation parameter relates to the artificial speed of sound [35].

The relaxation parameter should be set between 1-10, lower number more stable solution.

And also dimensional form

\[p={{p}_{0}}-{{C}^{2}}\Delta {{t}_{F}}\rho \nabla \b{\hat{v}}+{{C}^{2}}\Delta {{t}_{F}}\Delta t{{\nabla }^{2}}{{p}_{0}},\]

Where C is speed of sound [m/s]

== Numerical examples==
* [[Lid driven cavity]]
* [[de Vahl Davis natural convection test]]

Fluid Mechanics

2017-11-17T10:40:02Z

Mkolman: Removed extra the

= Introduction =
Computational fluid dynamics (CFD) is a field of a great interest among researchers in many fields of science, e.g. studying mathematical fundaments of numerical methods, developing novel physical models, improving computer implementations, and many others. Pushing the limits of all involved fields of science helps community to deepen the understanding of several natural and technological phenomena. Weather forecast, ocean dynamics, water transport, casting, various energetic studies, etc., are just few examples where fluid dynamics plays a crucial role. The core problem of the CFD is solving the Navier-Stokes Equation or its variants, e.g. Darcy or Brinkman equation for flow in porous media. Here, we discuss basic algorithms for solving CFD problems. Check reference list on the [[Main Page]] for more details about related work.

Long story short, we want to solve
\begin{equation}
\frac{\partial \b{v}}{\partial t}+(\b{v}\cdot\nabla )\cdot \b{v}=-\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f}
\label{NavierStokes}
\end{equation}
also known as a Navier-Stokes equation. In many cases we are interested in the incompressible fluids (Ma<0.3), reducing the continuity equation to
\begin{equation}
\nabla \cdot \b{v}=0
\label{contuinity}
\end{equation}
which implies a simplification

\[\frac{\partial \left( \rho \b{v} \right)}{\partial t}+\nabla \cdot \left( \rho \b{vv} \right)=\frac{\partial \left( \rho \b{v} \right)}{\partial t}+(\rho \b{v}\cdot \nabla )\cdot \b{v}. \]

Note that the $\b{v}\b{v}$ stands for the tensor or dyadic product \[ \b{v}\b{v} = \b{v}\otimes\b{v} = \b{v}\b{v}^\T = \left[ \begin{matrix}
{{v}_{1}}{{v}_{1}} & \cdots & {{v}_{1}}{{v}_{n}} \\
\vdots & \ddots & \vdots \\
{{v}_{n}}{{v}_{1}} & \cdots & {{v}_{n}}{{v}_{n}} \\
\end{matrix} \right]\]
An example of incompressible variant of advection term in 2D would therefore be
\[\left( \b{v}\cdot \nabla \right)\b{v}=\left( \left( \begin{matrix}
u \\
v \\
\end{matrix} \right) \cdot \left( \begin{matrix}
\frac{\partial }{\partial x} \\
\frac{\partial }{\partial y} \\
\end{matrix} \right) \right)\left( \begin{matrix}
u \\
v \\
\end{matrix} \right)=\left( \begin{matrix}
u\frac{\partial u}{\partial x}+v\frac{\partial u}{\partial y} \\
u\frac{\partial v}{\partial x}+v\frac{\partial v}{\partial y} \\
\end{matrix} \right)\]

The goal of CFD is to solve system \ref{NavierStokes} and \ref{contuinity}. It is obvious that a special treatment will be needed to couple both equations. In following discussion we cover some basic approaches, how this can be accomplished.

= Solutions algorithms =
== Artificial compressibility method ==
The simplest, completely explicit approach, is an artificial compressibility method (ACM), where a compressibility term is included in the mass continuity
\[\frac{\partial \b{v}}{\partial t}+(\b{v}\cdot\nabla )\b{v}=-\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f}\]
\[\frac{\partial \rho }{\partial t}+\nabla \cdot \b{v}=0\]
\[\frac{\partial \rho }{\partial p}\frac{\partial p}{\partial t}+\nabla \cdot \b{v}=0\]
Now, the above system can be solved directly.

The addition of the time derivative of the pressure term physically means that waves of finite speed (the propagation of which depends on the magnitude of the ACM)
are introduced into the flow field as a mean to distribute the pressure within the domain. In a true
incompressible flow, the pressure field is affected instantaneously throughout the whole domain. In ACM there is a time delay between the flow disturbance and its effect on the
pressure field. Upon rearranging the equation yields
\[\frac{\partial p}{\partial t}+\rho {{C}^{2}}\nabla \cdot \b{v}=0\]
where the continuity equation is perturbed by the quantity $\frac{\partial p}{\partial t}$ denominated herein
as the AC parameter/artificial sound speed recognized by
$C$ [m/s] - speed of sound
\[\frac{1}{C^2}=\frac{\partial \rho }{\partial p}\]
Or in another words
\[C^2=\left( \frac{\partial p}{\partial \rho}\right)_S\]
where $\rho$ is the density of the material. It follows, by replacing partial derivatives, that the isentropic compressibility can be expressed as:
\[\beta =\frac{1}{\rho {{C}^{2}}}\]
The evaluation of the local ACM parameter in incompressible flows is inspired by the
speed of sound computations in compressible flows (for instance, from the perfect gas law).
However, in the incompressible flow situation, employing such a relation is difficult, but an artificial
relation can be developed from the convective and diffusive velocities.
Reverting to the justification of continuity modification, it can be immediately seen that the
artificial sound speed must be sufficiently large to have a significant regularizing effect and at
the same time must be as small as possible to minimizing perturbations on the incompressibility
equation. Therefore, $C$ influences the convergence rate and stability of the solution method. In other words,
assists in reducing large disparity in the eigenvalues, leading to a well-conditioned system. Values
of in the range of 1–10 are recommended for better convergence to the steady state at which the
mass conservation is enforced. In addition, Equation ensures that $C$ does not reach zero at stagnation points
that cause instabilities in pseudo-time, effecting convergence

== Explicit/Implicit pressure calculation ==

Applying divergence on \ref{NavierStokes} yields
\[\nabla \cdot \frac{\partial \b{v}}{\partial t}+\nabla \cdot (\b{v}\cdot \nabla )\b{v}=-\frac{1}{\rho }{{\nabla }^{2}}p+\nabla \cdot \nu {{\nabla }^{2}}\b{v}+\nabla \cdot \b{f}\]

And since $\nabla \cdot \b{v}=0$ and we can change order in $\nabla \cdot \nabla^2$ and $ \nabla^2 \cdot \nabla$ quation simplifies to
\[\frac{1}{\rho }{{\nabla }^{2}}p=\nabla \cdot \b{f}-\nabla \cdot (\b{v}\cdot \nabla )\b{v}\]
Now, we need boundary conditions that can be obtained by multiplying the equation with a boundary normal vector
\[\b{\hat{n}}\cdot \left( \frac{\partial \b{v}}{\partial t}+(\b{v}\cdot \nabla )\b{v} \right)=\b{\hat{n}}\cdot \left( -\frac{1}{\rho }\nabla p+\nu {{\nabla }^{2}}\b{v}+\b{f} \right)\cdot \b{\hat{n}}\]
\[\frac{\partial p}{\partial \b{\hat{n}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f}-\frac{\partial \b{v}}{\partial t}-(\b{v}\cdot\nabla ) \b{v} \right)\cdot \b{\hat{n}}\]

Note that using tangential boundary vector gives equivalent BCs
\[\frac{\partial p}{\partial \b{\hat{t}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f}-\frac{\partial \b{v}}{\partial t}-(\b{v}\cdot\nabla ) \b{v} \right)\cdot \b{\hat{t}}\]
For no-slip boundaries BCs simplify to
\[\frac{\partial p}{\partial \b{\hat{n}}}=\left( \nu {{\nabla }^{2}}\b{v}+\b{f} \right)\cdot \b{\hat{n}}\]
Otherwise an appropriate expression regarding the velocity can be written, i.e. write full and taken in account velocity BCs. For example, Neumann velocity $\frac{\partial u}{\partial x}=0$ in 2D
\[\frac{\partial p}{\partial x}=\left( \nu {{\nabla }^{2}}u + {{f}_{x}}-\frac{\partial u}{\partial t}+v\frac{\partial u}{\partial y} \right)\]
Note that you allready know everything about the velocity and thus you can compute all the terms explicitely.

So the procedure is:

* Compute Navier Stokes either explicitly or implicitly
* Solve pressure equations with computed velocities
* March in time

Basic boundary conditions
Wall: $\b{v}=0$, \[\frac{\partial p}{\partial \hat{n}}=\left( \nabla \cdot \left( \nu \nabla \b{v} \right)+\b{f} \right)\cdot \hat{n}\]
Inlet: $\b{v}=\b{a}$, \[\frac{\partial p}{\partial \hat{n}}=\left( \nabla \cdot \left( \nu \nabla \b{v} \right)+\b{f}-\nabla \cdot (\rho \b{v}\b{v})-\rho \frac{\partial \b{v}}{\partial t} \right)\cdot \hat{n}\]

Above system can be linearized (advection term) and solved either explicitly or implicitly.

Further reading:

W. D. Henshaw, A fourth-order accurate method for the incompressible Navier–Stokes equations on overlapping grids, J. Comput. Phys. 113, 13 (1994)

J. C. Strikwerda, Finite difference methods for the Stokes and Navier–Stokes equations, SIAM J. Sci. Stat.
Comput. 5(1), 56 (1984)

== Explicit Pressure correction ==
Another possibility is to solve pressure correction equation. Again Consider the momentum equation and mass continuity and discretize it explicitly
\[\frac{{{\b{v}}_{2}}-{{\b{v}}_{1}}}{\Delta t}=-\frac{1}{\rho }\nabla {{p}_{1}}-({{\b{v}}_{1}}\nabla )\cdot {{\b{v}}_{1}}+\nu {{\nabla }^{2}}{{\b{v}}_{1}}+\b{f}\]
Computed velocity obviously does not satisfy the mass contunity and therefore let’s call it intermediate velocity. Intermediate velocity is calculated from guessed pressure and old velocity values.
\[{{\b{v}}^{inter}}=\b{v}_1 + \Delta t\left( -\frac{1}{\rho }\nabla {{p}_{1}}-({{\b{v}}_{1}}\nabla )\cdot {{\b{v}}_{1}}+\nu {{\nabla }^{2}}{{\b{v}}_{1}}+\b{f} \right)\]
A correction term is added that drives velocity to divergence free field
\[\nabla \cdot ({{\b{v}}^{inter}}+{{\b{v}}^{corr}})=0 \qquad \to \qquad \nabla \cdot {{\b{v}}^{inter}}=-\nabla \cdot {{\b{v}}^{corr}}\]

Velocity correction is affected only by effect of pressure correction. This fact is obvious due to all terms except gradient of pressure on the right side of equation are constant.
\[{{\b{v}}^{corr}}=-\frac{\Delta t}{\rho }\nabla {{p}^{corr}} \]

Note that corrected velocity also satisfies boundary conditions
\[\b{v}^{iter}+\b{v}^{corr}=\b{v}^{BC}\]
Applying divergence and we get '''pressure correction poisson equation'''.
\[\,{{\nabla }^{2}}{{p}^{corr}}\,=\frac{\rho }{\Delta t}\nabla \cdot {{\mathbf{v}}^{iter}}\,\]

Boundary conditions can be obtained by mulitplying the equation with a unit normal vector $\b{\hat{n}}$
\[\frac{\Delta t}{\rho }\frac{\partial {p}^{corr}}{\partial \b{\hat{n}}} = \b{\hat{n}} \cdot \left(\b{v}^{iter} - \b{v}^{BC} \right) \]
The most straightforward approach, for dirichlet BCs, is to take into account velocity boundary condition in computation of intermediate velocity, and clearly in such cases, pressure boundary condition simplifies to
\[\frac{\partial p^{corr}}{\partial \b{\hat{n}}} = 0 \]
As ${{\b{v}}^{\operatorname{int}er}}={{\b{v}}^{BC}}$ . Another option is to explicitely compute intermediate velocity also on boundaries and then correct it through pressure correction.

The pressure poisson equation is, at given boundary conditions, defined only up to a constant. One solution is to select a node and set it to a constant, e.g. p(0, 0) = 0, however much more stable approach is to enforce solution with additional condition, also referred to as a regularization
\[\int_{\Omega }^{{}}{pd}\Omega =0\]
\[\,{{\nabla }^{2}}{{p}^{corr}}\,-\alpha =\frac{\rho }{\Delta t}\nabla \cdot {{\b{v}}^{iter}}\,\]
Where $\alpha $ stands for Lagrange multiplier. Or in discrete form
\[\sum\limits_{i}{p\left( {{x}_{i}} \right)=0}\]
\[\b{Mp}-\alpha \b{1}=\frac{\rho }{\Delta t}\nabla \cdot {{\b{v}}^{iter}}\]

where $\b{M}$ holds Laplace shape functions, i.e. the discrete version of Laplace differential operator.

Solution of a system

\[\left[ \begin{matrix}
{{M}_{11}} & .. & {{M}_{1n}} & 1 \\
.. & .. & .. & 1 \\
{{M}_{n1}} & ... & {{M}_{nn}} & 1 \\
1 & 1 & 1 & 0 \\
\end{matrix} \right]\left[ \begin{matrix}
{{p}_{1}} \\
... \\
{{p}_{n}} \\
\alpha \\
\end{matrix} \right]=\frac{\rho }{\Delta t}\left[ \begin{matrix}
\nabla \cdot \b{v}_{_{1}}^{\text{iter}} \\
... \\
\nabla \cdot \b{v}_{n}^{\text{iter}} \\
0 \\
\end{matrix} \right]\]
Gives us a solution of pressure correction.

== CBS Algorithm ==
With explicit temporal discretization problem is formulated as
\[\b{\hat{v}}={{\b{v}}_{0}}+\Delta t\left( -\nabla {{p}_{0}}+\frac{1}{Re}{{\nabla }^{2}}{{\b{v}}_{0}}-\nabla \cdot ({{\b{v}}_{0}}{{\b{v}}_{0}}) \right)\]
\[p={{p}_{0}}-\xi \Delta {{t}_{F}}\nabla \b{\hat{v}}+\xi \Delta {{t}_{F}}\Delta t{{\nabla }^{2}}{{\overset{\scriptscriptstyle\frown}{P}}_{0}},\]
where $\b{\hat{v}}$, $\Delta t$, $\xi$ and $\Delta t_F$ stand for intermediate velocity, time step, relaxation parameter, and artificial time step, respectively, and index 0 stands for previous time / iteration step. First, the intermediate velocity is computed from previous time step. Second, the velocity is driven towards solenoidal field by correcting the pressure. Note that no special boundary conditions for pressure are used, i.e., the pressure on boundaries is computed with the same approach as in the interior of the domain. In general, the internal iteration with an artificial time step is required until the divergence of the velocity field is not below required criteria. However, if one is interested only in a steady-state solution, the internal iteration can be skipped and $\Delta t$ equals $\Delta {{t}_{F}}$. Without internal stepping the transient of the solution is distorted by artificial compressibility effect. This approach is also known as ACM with Characteristics-based discretization of continuity equation, where the relaxation parameter relates to the artificial speed of sound [35].

The relaxation parameter should be set between 1-10, lower number more stable solution.

And also dimensional form

\[p={{p}_{0}}-{{C}^{2}}\Delta {{t}_{F}}\rho \nabla \b{\hat{v}}+{{C}^{2}}\Delta {{t}_{F}}\Delta t{{\nabla }^{2}}{{p}_{0}},\]

Where C is speed of sound [m/s]

== Numerical examples==
* [[Lid driven cavity]]
* [[de Vahl Davis natural convection test]]

Weighted Least Squares (WLS)

2017-07-06T12:49:29Z

Mkolman: /* Numerical calculation of the shape functions */

One of the most important building blocks of the meshless methods is the Moving Least Squares (MLS) approximation , which is implemented in the [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classEngineMLS.html EngineMLS class]. Check [https://gitlab.com/e62Lab/e62numcodes/blob/master/test/mls_test.cpp EngineMLS unit tests] for examples.

= Notation Cheat sheet =
\begin{align*}
m \in \N & \dots \text{number of basis functions} \\
n \geq m \in \N & \dots \text{number of points in support domain} \\
k \in \mathbb{N} & \dots \text{dimensionality of vector space} \\
\vec s_j \in \R^k & \dots \text{point in support domain } \quad j=1,\dots,n \\
u_j \in \R & \dots \text{value of function to approximate in }\vec{s}_j \quad j=1,\dots,n \\
\vec p \in \R^k & \dots \text{center point of approximation} \\
b_i\colon \R^k \to \R & \dots \text{basis functions } \quad i=1,\dots,m \\
B_{j, i} \in \R & \dots \text{value of basis functions in support points } b_i(s_j-p) \quad j=1,\dots,n, \quad i=1,\dots,m\\
\omega \colon \R^k \to \R & \dots \text{weight function} \\
w_j \in \R & \dots \text{weights } \omega(\vec{s}_j-\vec{p}) \quad j=1,\dots,n \\
\alpha_i \in \R & \dots \text{expansion coefficients around point } \vec{p} \quad i=1,\dots,m \\
\hat u\colon \R^k \to \R & \dots \text{approximation function (best fit)} \\
\chi_j \in \R & \dots \text{shape coefficient for point }\vec{p} \quad j=1,\dots,n \\
\end{align*}

We will also use $\b{s}, \b{u}, \b{b}, \b{\alpha}, \b{\chi} $ to annotate a column of corresponding values,
$W$ as a $n\times n$ diagonal matrix filled with $w_j$ on the diagonal and $B$ as a $n\times m$ matrix filled with $B_{j, i}$.

= Definition of local approximation =
<figure id="fig:1DWLS">
[[File:image_1avhdsfej1b9cao01029m1e13o69.png|600px|thumb|upright=2|alt=1D MLS example|<caption>Example of 1D WLS approximation </caption>]]
</figure>
Our wish is to approximate an unknown function $u\colon \R^k \to \R$ while knowing $n$ values $u(\vec{s}_j) := u_j$.
The vector of known values will be denoted by $\b{u}$ and the vector of coordinates where those values were achieved by $\b{s}$.
Note that $\b{s}$ is not a vector in the usual sense since its components $\vec{s}_j$ are elements of $\R^k$, but we will call it vector anyway.
The values of $\b{s}$ are called ''nodes'' or ''support nodes'' or ''support''. The known values $\b{u}$ are also called ''support values''.

In general, an approximation function around point $\vec{p}\in\R^k$ can be
written as \[\hat{u} (\vec{x}) = \sum_{i=1}^m \alpha_i b_i(\vec{x}) = \b{b}(\vec{x})^\T \b{\alpha} \]
where $\b{b} = (b_i)_{i=1}^m$ is a set of ''basis functions'', $b_i\colon \R^k \to\R$, and $\b{\alpha} = (\alpha_i)_{i=1}^m$ are the unknown coefficients.

In MLS the goal is to minimize the error of approximation in given values, $\b{e} = \hat u(\b{s}) - \b{u}$
between the approximation function and target function in the known points $\b{x}$. The error can also be written as $B\b{\alpha} - \b{u}$,
where $B$ is rectangular matrix of dimensions $n \times m$ with rows containing basis function evaluated in points $\vec{s}_j$.
\[ B =
\begin{bmatrix}
b_1(\vec{s}_1) & \ldots & b_m(\vec{s}_1) \\
\vdots & \ddots & \vdots \\
b_1(\vec{s}_n) & \ldots & b_m(\vec{s}_n)
\end{bmatrix} =
[b_i(\vec{s}_j)]_{j=1,i=1}^{n,m} = [\b{b}(\vec{s}_j)^\T]_{j=1}^n. \]

We can choose to minimize any norm of the error vector $e$
and usually choose to minimize the $2$-norm or square norm \[ \|\b{e}\| = \|\b{e}\|_2 = \sqrt{\sum_{j=1}^n e_j^2}. \]
Commonly, we also choose to minimize a weighted norm
<ref>Note that our definition is a bit unusual, usually weights are not
squared with the values. However, we do this to avoid computing square
roots when doing MLS. If you are used to the usual definition,
consider the weight to be $\omega^2$.</ref>
instead \[ \|\b{e}\|_{2,w} = \|\b{e}\|_w = \sqrt{\sum_{j=1}^n (w_j e_j)^2}. \]
The ''weights'' $w_i$ are assumed to be non negative and are assembled in a vector $\b{w}$ or a matrix $W = \operatorname{diag}(\b{w})$ and usually obtained from a weight function.
A ''weight function'' is a function $\omega\colon \R^k \to[0,\infty)$. We calculate $w_j$ as $w_i := \omega(\vec{p}-\vec{s}_j)$, so
good choices for $\omega$ are function which have higher values close to $0$ (making closer nodes more important), like the normal distribution.
If we choose $\omega \equiv 1$, we get the unweighted version.

A choice of minimizing the square norm gave this method its name - Least Squares approximation. If we use the weighted version, we get the Weighted Least Squares or WLS.
In the most general case we wish to minimize
\[ \|\b{e}\|_{2,w}^2 = \b{e}^\T W^2 \b{e} = (B\b{\alpha} - \b{u})^\T W^2(B\b{\alpha} - \b{u}) = \sum_j^n w_j^2 (\hat{u}(\vec{s}_j) - u_j)^2 \]

The problem of finding the coefficients $\b{\alpha}$ that minimize the error $\b{e}$ can be solved with at least three approaches:
* Normal equations (fastest, less accurate) - using Cholesky decomposition of $B$ (requires full rank and $m \leq n$)
* QR decomposition of $B$ (requires full rank and $m \leq n$, more precise)
* SVD decomposition of $B$ (more expensive, even more reliable, no rank demand)

In our Meshless Machine MLS engine we use SVD with regularization described below.

= Computing approximation coefficients =

== [http://mathworld.wolfram.com/NormalEquation.html Normal equations] ==
We seek the minimum of
\[ \|\b{e}\|_2^2 = (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u}) \]
By seeking the zero gradient in terms of coefficients $\alpha_i$
\[\frac{\partial}{\partial \alpha_i} (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u}) = 0\]
resulting in
\[ B^\T B\b{\alpha} = B^\T \b{u}. \]
The coefficient matrix $B^\T B$ is symmetric and positive definite. However, solving above problem directly is
poorly behaved with respect to round-off errors since the condition number $\kappa(B^\T B)$ is the square
of $\kappa(B)$.

In case of WLS the equations become
\[ (WB)^\T WB \b{\alpha} = WB^\T \b{u}. \]

Complexity of Cholesky decomposition is $\frac{n^3}{3}$ and complexity of matrix multiplication $nm^2$. To preform the Cholesky decomposition the rank of $WB$ must be full.

'''Pros:'''
* simple to implement
* low computational complexity

'''Cons:'''
* numerically unstable
* full rank requirement

== [https://en.wikipedia.org/wiki/QR_decomposition $QR$ Decomposition] ==
\[{\bf{B}} = {\bf{QR}} = \left[ {{{\bf{Q}}_1},{{\bf{Q}}_2}} \right]\left[ {\begin{array}{*{20}{c}}
{{{\bf{R}}_1}}\\
0
\end{array}} \right]\]
\[{\bf{B}} = {{\bf{Q}}_1}{{\bf{R}}_1}\]
$\bf{Q}$ is unitary matrix ($\bf{Q}^{-1}=\bf{Q}^T$). Useful property of a unitary matrices is that multiplying with them does not alter the (Euclidean) norm of a vector, i.e.,
\[\left\| {{\bf{Qx}}} \right\|{\bf{ = }}\left\| {\bf{x}} \right\|\]
And $\bf{R}$ is upper diagonal matrix
\[{\bf{R = (}}{{\bf{R}}_{\bf{1}}}{\bf{,}}0{\bf{)}}\]
therefore we can say
\[\begin{array}{l}
\left\| {{\bf{B\alpha }} - {\bf{u}}} \right\| = \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{B\alpha }} - {\bf{u}}} \right)} \right\| = \left\| {{{\bf{Q}}^{\rm{T}}}{\bf{B\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\|\\
= \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{QR}}} \right){\bf{\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\| = \left\| {\left( {{{\bf{R}}_1},0} \right){\bf{\alpha }} - {{\left( {{{\bf{Q}}_1},{{\bf{Q}}_{\bf{2}}}} \right)}^{\rm{T}}}{\bf{u}}} \right\|\\
= \left\| {{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} - {{\bf{Q}}_{\bf{1}}}{\bf{u}}} \right\| + \left\| {{\bf{Q}}_2^{\rm{T}}{\bf{u}}} \right\|
\end{array}\]
Of the two terms on the right we have no control over the second, and we can render the first one
zero by solving
\[{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} = {\bf{Q}}_{_{\bf{1}}}^{\rm{T}}{\bf{u}}\]
Which results in a minimum. We could also compute it with pseudo inverse
\[\mathbf{\alpha }={{\mathbf{B}}^{-1}}\mathbf{u}\]
Where pseudo inverse is simply \[{{\mathbf{B}}^{-1}}=\mathbf{R}_{\text{1}}^{\text{-1}}{{\mathbf{Q}}^{\text{T}}}\] (once again, $R$ is upper diagonal matrix, and $Q$ is unitary matrix).
And for weighted case
\[\mathbf{\alpha }={{\left( {{\mathbf{W}}^{0.5}}\mathbf{B} \right)}^{-1}}\left( {{\mathbf{W}}^{0.5}}\mathbf{u} \right)\]

Complexity of $QR$ decomposition \[\frac{2}{3}m{{n}^{2}}+{{n}^{2}}+\frac{1}{3}n-2=O({{n}^{3}})\]

Pros: better stability in comparison with normal equations cons: higher complexity

== [https://en.wikipedia.org/wiki/Singular_value_decomposition SVD decomposition] ==
In linear algebra, the [https://en.wikipedia.org/wiki/Singular_value_decomposition singular value decomposition (SVD)]
is a factorization of a real or complex matrix. It has many useful
applications in signal processing and statistics.

Formally, the singular value decomposition of an $m \times n$ real or complex
matrix $\bf{B}$ is a factorization of the form $\bf{B}= \bf{U\Sigma V^\T}$, where
$\bf{U}$ is an $m \times m$ real or complex unitary matrix, $\bf{\Sigma}$ is an $m \times n$
rectangular diagonal matrix with non-negative real numbers on the diagonal, and
$\bf{V}^\T$ is an $n \times n$ real or complex unitary matrix. The diagonal entries
$\Sigma_{ii}$ are known as the singular values of $\bf{B}$. The $m$ columns of
$\bf{U}$ and the $n$ columns of $\bf{V}$ are called the left-singular vectors and
right-singular vectors of $\bf{B}$, respectively.

The singular value decomposition and the eigen decomposition are closely
related. Namely:

* The left-singular vectors of $\bf{B}$ are eigen vectors of $\bf{BB}^\T$.
* The right-singular vectors of $\bf{B}$ are eigen vectors of $\bf{B}^\T{B}$.
* The non-zero singular values of $\bf{B}$ (found on the diagonal entries of $\bf{\Sigma}$) are the square roots of the non-zero eigenvalues of both $\bf{B}^\T\bf{B}$ and $\bf{B}^\T \bf{B}$.

with SVD we can write $\bf{B}$ as \[\bf{B}=\bf{U\Sigma{{V}^{\T}}}\] where are $\bf{U}$ and $\bf{V}$ again unitary matrices and $\bf{\Sigma}$
stands for diagonal matrix of singular values.

Again we can solve either the system or compute pseudo inverse as

\[ \bf{B}^{-1} = \left( \bf{U\Sigma V}^\T\right)^{-1} = \bf{V}\bf{\Sigma^{-1}U}^\T \]
where $\bf{\Sigma}^{-1}$ is trivial, just replace every non-zero diagonal entry by
its reciprocal and transpose the resulting matrix. The stability gain is
exactly here, one can now set threshold below which the singular value is
considered as $0$, basically truncate all singular values below some value and
thus stabilize the inverse.

SVD decomposition complexity \[ 2mn^2+2n^3 = O(n^3) \]

Pros: stable cons: high complexity

Method used in MLSM is SVD with regularization.

= Weighted Least Squares =
Weighted least squares approximation is the simplest version of the procedure described above. Given support $\b{s}$, values $\b{u}$
and an anchor point $\vec{p}$, we calculate the coefficients $\b{\alpha}$ using one of the above methods.
Then, to approximate a function in the neighbourhood of $\vec p$ we use the formula
\[
\hat{u}(\vec x) = \b{b}(\vec x)^\T \b{\alpha} = \sum_{i=1}^m \alpha_i b_i(\vec x).
\]

To approximate the derivative $\frac{\partial u}{\partial x_i}$, or any linear partial differential operator $\mathcal L$ on $u$, we
simply take the same linear combination of transformed basis functions $\mathcal L b_i$. We have considered coefficients $\alpha_i$ to be
constant and applied the linearity.
\[
\widehat{\mathcal L u}(\vec x) = \sum_{i=1}^m \alpha_i (\mathcal L b_i)(\vec x).
\]

= WLS at fixed point with fixed support and unknown function values :: Shape functions =
Suppose now we are given support $\b{s}$ and a point $\b{p}$ and want to construct the function approximation from values $\b{u}$.
We proceed as usual, solving the overdetermined system $WB \b{\alpha} = W\b{u}$ for coefficients $\b{\alpha}$ using the pseudoinverse
\[ \b{\alpha} = (WB)^+W\b{u}, \]
where $A^+$ denotes the Moore-Penrose pseudoinverse that can be calculated using SVD.

Writing down the approximation function $\hat{u}$ we get
\[
\hat u (\vec{p}) = \b{b}(\vec{p})^\T \b{\alpha} = \b{b}(\vec{p})^\T (WB)^+W\b{u} = \b{\chi}(\vec{p}) \b{u}.
\]

We have defined $\b{\chi}$ to be
\[ \b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T (WB)^+W. \]
Vector $\b{\chi}$ is a row vector, also called a ''shape function''. The name comes from being able to take all the information
about shape of the domain and choice of approximation and store it in a single row vector, being able to approximate
a function value from given support values $\b{u}$ with a single dot product. For any values $\b{u}$, value $\b{\chi}(p) \b{u}$
gives us the approximation $\hat{u}(\vec{p})$ of $u$ in point $\vec{p}$.
Mathematically speaking, $\b{\chi}(\vec{p})$ is a functional, $\b{\chi}(\vec{p})\colon \R^n \to \R$, mapping $n$-tuples of known function values to
their approximations in point $\vec{p}$.

The same approach works for any linear operator $\mathcal L$ applied to $u$, just replace every $b_i$ in definition of $\b{\chi}$ with $\mathcal Lb_i$.
For example, take a $1$-dimensional case for approximation of derivatives with weight equal to $1$ and $n=m=3$, with equally spaced support values at distances $h$.
We wish to approximate $u''$ in the middle support point, just by making a weighted sum of the values, something like the finite difference
\[ u'' \approx \frac{u_1 - 2u_2 + u_3}{h^2}. \]
This is exactly the same formula as we would have come to by computing $\b{\chi}$, except that our approach is a lot more general. But one should think about
$\b{\chi}$ as one would about the finite difference scheme, it is a rule, telling us how to compute the derivative.
\[ u''(s_2) \approx \underbrace{\begin{bmatrix} \frac{1}{h^2} & \frac{-2}{h^2} & \frac{1}{h^2} \end{bmatrix}}_{\b{\chi}} \begin{bmatrix}u_1 \\ u_2 \\ u_3 \end{bmatrix} \]

The fact that $\b{\chi}$ is independent of the function values $\b{u}$ but depend only on domain geometry means that
'''we can just compute the shape functions $\b{\chi}$ for points of interest and then approximate any linear operator
of any function, given its values, very fast, using only a single dot product.'''

== Numerical calculation of the shape functions ==
The expression
\[ \b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T (WB)^+W \]
can be evaluated directly, but this is not the most optimal approach. A numerically cheaper and more stable way is to translate the problem of inverting the matrix to solving a linear system of equations.

'''Invertible $B$ case:'''
If $B$ is invertible, then $\b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T B^{-1}$, transposing the equation then multiplying it from the left by $B$,
$\b{\chi}$ can be thought as a solution of a system $B^\T\chi(\vec{p})^\T = \b{b}(\vec{p})$, which can be solved using LU or Cholesky decomposition for example.

'''General case:'''
For a system written as $Ax = b$, where $A$ is a $n\times m$ matrix, $x$ is a vector of length $m$ and $b$ a vector of length $n$, a generalized solution
is defined as $x$ that minimizes $\|A x - b\|_2^2$. If more $x$ attain the minimal value, $x$ with the minimal $\|x\|$ is chosen. Note that this generalizes the solution
a general system ($A$ is invertible) and over-determined system ($n > m$ and $A$ has full rank). Such an $x$ can be computed using the pseudoinverse $x = A^{+} b$.

In our case, let us denote a part of the solution containing the pseudoinverse by $\tilde{\b{\chi}}$.
\[ \b{\chi}(\vec{p}) = \underbrace{\b{b}(\vec{p})^\T (WB)^+}_{\tilde{\b{\chi}}} W \]
We have an expression $\tilde{\b{\chi}} = \b{b}(\vec{p})^\T (WB)^+$ which after transposition takes the form $\tilde{\b{\chi}}^\T = ((WB)^\T)^+\b{b}(\vec{p})$, the same as $x = A^+b$ above.
Therefore, $\tilde{\b{\chi}}^\T$ is the solution of an (indeterminate) system $(WB)^\T \tilde{\b{\chi}}^\T = \b{b}(\vec{p})$.
After solving that, we can get the shape function $\b\chi(\vec{p}) = \tilde{\b{\chi}} W$ by multiplying by matrix $W$.
The system before can be solved using any decomposition of matrix $(WB)^\T = B^\T W$ necessary, mosr generally the SVD decompostion, but depending on our knowledge of the
problem, we can use Cholesky ($B^\T W$ is positive definite), $LDL^\T$ if it is symmetric, $LU$ for a general square matrix, $QR$ for full rank overdetermined system and SVD for a general system.
Is more shapes need to be calculated using the same matrix $B^\T W$ and only different right hand sides, it can be done efficiently by storing the decomposition of $B^\T W$.

= MLS =

<figure id="fig:comparisonMLSandWLS">
[[File:mlswls.svg|thumb|upright=2|<caption>Comparison of WLS and MLS approximation</caption>]]
</figure>

When using WLS the approximation gets worse as we move away from the central point $\vec{p}$.
This is partially due to not being in the center of the support any more and partially due to weight
being distributed in such a way to assign more importance to nodes closer to $\vec{p}$.

We can battle this problem in two ways: when we wish to approximate in a new point that is sufficiently far
away from $\vec{p}$ we can compute new support, recompute the new coefficients $\b{\alpha}$ and approximate again.
This is very costly and we would like to avoid that. A partial fix is to keep support the same, but only
recompute the weight vector $\b{w}$, that will now assign weight values to nodes close to the new point.
We still need to recompute the coefficients $\b{\alpha}$, however we avoid the cost of setting up new support
and function values and recomputing $B$. This approach is called Moving Least Squares due to recomputing
the weighted least squares problem whenever we move the point of approximation.

Note that if out weight is constant or if $n = m$, when approximation reduces to interpolation, the weights do not play
any role and this method is redundant. In fact, its benefits arise when supports are rather large.

See <xr id="fig:comparisonMLSandWLS"/> for comparison between MLS and WLS approximations. MLS approximation remains close to
actual function while still inside the support domain, while WLS approximation becomes bad when
we come out of the reach of the weight function.

{{reflist}}

Weighted Least Squares (WLS)

2017-07-06T12:24:04Z

Mkolman: /* Numerical calculation of the shape functions */

One of the most important building blocks of the meshless methods is the Moving Least Squares (MLS) approximation , which is implemented in the [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classEngineMLS.html EngineMLS class]. Check [https://gitlab.com/e62Lab/e62numcodes/blob/master/test/mls_test.cpp EngineMLS unit tests] for examples.

= Notation Cheat sheet =
\begin{align*}
m \in \N & \dots \text{number of basis functions} \\
n \geq m \in \N & \dots \text{number of points in support domain} \\
k \in \mathbb{N} & \dots \text{dimensionality of vector space} \\
\vec s_j \in \R^k & \dots \text{point in support domain } \quad j=1,\dots,n \\
u_j \in \R & \dots \text{value of function to approximate in }\vec{s}_j \quad j=1,\dots,n \\
\vec p \in \R^k & \dots \text{center point of approximation} \\
b_i\colon \R^k \to \R & \dots \text{basis functions } \quad i=1,\dots,m \\
B_{j, i} \in \R & \dots \text{value of basis functions in support points } b_i(s_j-p) \quad j=1,\dots,n, \quad i=1,\dots,m\\
\omega \colon \R^k \to \R & \dots \text{weight function} \\
w_j \in \R & \dots \text{weights } \omega(\vec{s}_j-\vec{p}) \quad j=1,\dots,n \\
\alpha_i \in \R & \dots \text{expansion coefficients around point } \vec{p} \quad i=1,\dots,m \\
\hat u\colon \R^k \to \R & \dots \text{approximation function (best fit)} \\
\chi_j \in \R & \dots \text{shape coefficient for point }\vec{p} \quad j=1,\dots,n \\
\end{align*}

We will also use $\b{s}, \b{u}, \b{b}, \b{\alpha}, \b{\chi} $ to annotate a column of corresponding values,
$W$ as a $n\times n$ diagonal matrix filled with $w_j$ on the diagonal and $B$ as a $n\times m$ matrix filled with $B_{j, i}$.

= Definition of local approximation =
<figure id="fig:1DWLS">
[[File:image_1avhdsfej1b9cao01029m1e13o69.png|600px|thumb|upright=2|alt=1D MLS example|<caption>Example of 1D WLS approximation </caption>]]
</figure>
Our wish is to approximate an unknown function $u\colon \R^k \to \R$ while knowing $n$ values $u(\vec{s}_j) := u_j$.
The vector of known values will be denoted by $\b{u}$ and the vector of coordinates where those values were achieved by $\b{s}$.
Note that $\b{s}$ is not a vector in the usual sense since its components $\vec{s}_j$ are elements of $\R^k$, but we will call it vector anyway.
The values of $\b{s}$ are called ''nodes'' or ''support nodes'' or ''support''. The known values $\b{u}$ are also called ''support values''.

In general, an approximation function around point $\vec{p}\in\R^k$ can be
written as \[\hat{u} (\vec{x}) = \sum_{i=1}^m \alpha_i b_i(\vec{x}) = \b{b}(\vec{x})^\T \b{\alpha} \]
where $\b{b} = (b_i)_{i=1}^m$ is a set of ''basis functions'', $b_i\colon \R^k \to\R$, and $\b{\alpha} = (\alpha_i)_{i=1}^m$ are the unknown coefficients.

In MLS the goal is to minimize the error of approximation in given values, $\b{e} = \hat u(\b{s}) - \b{u}$
between the approximation function and target function in the known points $\b{x}$. The error can also be written as $B\b{\alpha} - \b{u}$,
where $B$ is rectangular matrix of dimensions $n \times m$ with rows containing basis function evaluated in points $\vec{s}_j$.
\[ B =
\begin{bmatrix}
b_1(\vec{s}_1) & \ldots & b_m(\vec{s}_1) \\
\vdots & \ddots & \vdots \\
b_1(\vec{s}_n) & \ldots & b_m(\vec{s}_n)
\end{bmatrix} =
[b_i(\vec{s}_j)]_{j=1,i=1}^{n,m} = [\b{b}(\vec{s}_j)^\T]_{j=1}^n. \]

We can choose to minimize any norm of the error vector $e$
and usually choose to minimize the $2$-norm or square norm \[ \|\b{e}\| = \|\b{e}\|_2 = \sqrt{\sum_{j=1}^n e_j^2}. \]
Commonly, we also choose to minimize a weighted norm
<ref>Note that our definition is a bit unusual, usually weights are not
squared with the values. However, we do this to avoid computing square
roots when doing MLS. If you are used to the usual definition,
consider the weight to be $\omega^2$.</ref>
instead \[ \|\b{e}\|_{2,w} = \|\b{e}\|_w = \sqrt{\sum_{j=1}^n (w_j e_j)^2}. \]
The ''weights'' $w_i$ are assumed to be non negative and are assembled in a vector $\b{w}$ or a matrix $W = \operatorname{diag}(\b{w})$ and usually obtained from a weight function.
A ''weight function'' is a function $\omega\colon \R^k \to[0,\infty)$. We calculate $w_j$ as $w_i := \omega(\vec{p}-\vec{s}_j)$, so
good choices for $\omega$ are function which have higher values close to $0$ (making closer nodes more important), like the normal distribution.
If we choose $\omega \equiv 1$, we get the unweighted version.

A choice of minimizing the square norm gave this method its name - Least Squares approximation. If we use the weighted version, we get the Weighted Least Squares or WLS.
In the most general case we wish to minimize
\[ \|\b{e}\|_{2,w}^2 = \b{e}^\T W^2 \b{e} = (B\b{\alpha} - \b{u})^\T W^2(B\b{\alpha} - \b{u}) = \sum_j^n w_j^2 (\hat{u}(\vec{s}_j) - u_j)^2 \]

The problem of finding the coefficients $\b{\alpha}$ that minimize the error $\b{e}$ can be solved with at least three approaches:
* Normal equations (fastest, less accurate) - using Cholesky decomposition of $B$ (requires full rank and $m \leq n$)
* QR decomposition of $B$ (requires full rank and $m \leq n$, more precise)
* SVD decomposition of $B$ (more expensive, even more reliable, no rank demand)

In our Meshless Machine MLS engine we use SVD with regularization described below.

= Computing approximation coefficients =

== [http://mathworld.wolfram.com/NormalEquation.html Normal equations] ==
We seek the minimum of
\[ \|\b{e}\|_2^2 = (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u}) \]
By seeking the zero gradient in terms of coefficients $\alpha_i$
\[\frac{\partial}{\partial \alpha_i} (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u}) = 0\]
resulting in
\[ B^\T B\b{\alpha} = B^\T \b{u}. \]
The coefficient matrix $B^\T B$ is symmetric and positive definite. However, solving above problem directly is
poorly behaved with respect to round-off errors since the condition number $\kappa(B^\T B)$ is the square
of $\kappa(B)$.

In case of WLS the equations become
\[ (WB)^\T WB \b{\alpha} = WB^\T \b{u}. \]

Complexity of Cholesky decomposition is $\frac{n^3}{3}$ and complexity of matrix multiplication $nm^2$. To preform the Cholesky decomposition the rank of $WB$ must be full.

'''Pros:'''
* simple to implement
* low computational complexity

'''Cons:'''
* numerically unstable
* full rank requirement

== [https://en.wikipedia.org/wiki/QR_decomposition $QR$ Decomposition] ==
\[{\bf{B}} = {\bf{QR}} = \left[ {{{\bf{Q}}_1},{{\bf{Q}}_2}} \right]\left[ {\begin{array}{*{20}{c}}
{{{\bf{R}}_1}}\\
0
\end{array}} \right]\]
\[{\bf{B}} = {{\bf{Q}}_1}{{\bf{R}}_1}\]
$\bf{Q}$ is unitary matrix ($\bf{Q}^{-1}=\bf{Q}^T$). Useful property of a unitary matrices is that multiplying with them does not alter the (Euclidean) norm of a vector, i.e.,
\[\left\| {{\bf{Qx}}} \right\|{\bf{ = }}\left\| {\bf{x}} \right\|\]
And $\bf{R}$ is upper diagonal matrix
\[{\bf{R = (}}{{\bf{R}}_{\bf{1}}}{\bf{,}}0{\bf{)}}\]
therefore we can say
\[\begin{array}{l}
\left\| {{\bf{B\alpha }} - {\bf{u}}} \right\| = \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{B\alpha }} - {\bf{u}}} \right)} \right\| = \left\| {{{\bf{Q}}^{\rm{T}}}{\bf{B\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\|\\
= \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{QR}}} \right){\bf{\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\| = \left\| {\left( {{{\bf{R}}_1},0} \right){\bf{\alpha }} - {{\left( {{{\bf{Q}}_1},{{\bf{Q}}_{\bf{2}}}} \right)}^{\rm{T}}}{\bf{u}}} \right\|\\
= \left\| {{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} - {{\bf{Q}}_{\bf{1}}}{\bf{u}}} \right\| + \left\| {{\bf{Q}}_2^{\rm{T}}{\bf{u}}} \right\|
\end{array}\]
Of the two terms on the right we have no control over the second, and we can render the first one
zero by solving
\[{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} = {\bf{Q}}_{_{\bf{1}}}^{\rm{T}}{\bf{u}}\]
Which results in a minimum. We could also compute it with pseudo inverse
\[\mathbf{\alpha }={{\mathbf{B}}^{-1}}\mathbf{u}\]
Where pseudo inverse is simply \[{{\mathbf{B}}^{-1}}=\mathbf{R}_{\text{1}}^{\text{-1}}{{\mathbf{Q}}^{\text{T}}}\] (once again, $R$ is upper diagonal matrix, and $Q$ is unitary matrix).
And for weighted case
\[\mathbf{\alpha }={{\left( {{\mathbf{W}}^{0.5}}\mathbf{B} \right)}^{-1}}\left( {{\mathbf{W}}^{0.5}}\mathbf{u} \right)\]

Complexity of $QR$ decomposition \[\frac{2}{3}m{{n}^{2}}+{{n}^{2}}+\frac{1}{3}n-2=O({{n}^{3}})\]

Pros: better stability in comparison with normal equations cons: higher complexity

== [https://en.wikipedia.org/wiki/Singular_value_decomposition SVD decomposition] ==
In linear algebra, the [https://en.wikipedia.org/wiki/Singular_value_decomposition singular value decomposition (SVD)]
is a factorization of a real or complex matrix. It has many useful
applications in signal processing and statistics.

Formally, the singular value decomposition of an $m \times n$ real or complex
matrix $\bf{B}$ is a factorization of the form $\bf{B}= \bf{U\Sigma V^\T}$, where
$\bf{U}$ is an $m \times m$ real or complex unitary matrix, $\bf{\Sigma}$ is an $m \times n$
rectangular diagonal matrix with non-negative real numbers on the diagonal, and
$\bf{V}^\T$ is an $n \times n$ real or complex unitary matrix. The diagonal entries
$\Sigma_{ii}$ are known as the singular values of $\bf{B}$. The $m$ columns of
$\bf{U}$ and the $n$ columns of $\bf{V}$ are called the left-singular vectors and
right-singular vectors of $\bf{B}$, respectively.

The singular value decomposition and the eigen decomposition are closely
related. Namely:

* The left-singular vectors of $\bf{B}$ are eigen vectors of $\bf{BB}^\T$.
* The right-singular vectors of $\bf{B}$ are eigen vectors of $\bf{B}^\T{B}$.
* The non-zero singular values of $\bf{B}$ (found on the diagonal entries of $\bf{\Sigma}$) are the square roots of the non-zero eigenvalues of both $\bf{B}^\T\bf{B}$ and $\bf{B}^\T \bf{B}$.

with SVD we can write $\bf{B}$ as \[\bf{B}=\bf{U\Sigma{{V}^{\T}}}\] where are $\bf{U}$ and $\bf{V}$ again unitary matrices and $\bf{\Sigma}$
stands for diagonal matrix of singular values.

Again we can solve either the system or compute pseudo inverse as

\[ \bf{B}^{-1} = \left( \bf{U\Sigma V}^\T\right)^{-1} = \bf{V}\bf{\Sigma^{-1}U}^\T \]
where $\bf{\Sigma}^{-1}$ is trivial, just replace every non-zero diagonal entry by
its reciprocal and transpose the resulting matrix. The stability gain is
exactly here, one can now set threshold below which the singular value is
considered as $0$, basically truncate all singular values below some value and
thus stabilize the inverse.

SVD decomposition complexity \[ 2mn^2+2n^3 = O(n^3) \]

Pros: stable cons: high complexity

Method used in MLSM is SVD with regularization.

= Weighted Least Squares =
Weighted least squares approximation is the simplest version of the procedure described above. Given support $\b{s}$, values $\b{u}$
and an anchor point $\vec{p}$, we calculate the coefficients $\b{\alpha}$ using one of the above methods.
Then, to approximate a function in the neighbourhood of $\vec p$ we use the formula
\[
\hat{u}(\vec x) = \b{b}(\vec x)^\T \b{\alpha} = \sum_{i=1}^m \alpha_i b_i(\vec x).
\]

To approximate the derivative $\frac{\partial u}{\partial x_i}$, or any linear partial differential operator $\mathcal L$ on $u$, we
simply take the same linear combination of transformed basis functions $\mathcal L b_i$. We have considered coefficients $\alpha_i$ to be
constant and applied the linearity.
\[
\widehat{\mathcal L u}(\vec x) = \sum_{i=1}^m \alpha_i (\mathcal L b_i)(\vec x).
\]

= WLS at fixed point with fixed support and unknown function values :: Shape functions =
Suppose now we are given support $\b{s}$ and a point $\b{p}$ and want to construct the function approximation from values $\b{u}$.
We proceed as usual, solving the overdetermined system $WB \b{\alpha} = W\b{u}$ for coefficients $\b{\alpha}$ using the pseudoinverse
\[ \b{\alpha} = (WB)^+W\b{u}, \]
where $A^+$ denotes the Moore-Penrose pseudoinverse that can be calculated using SVD.

Writing down the approximation function $\hat{u}$ we get
\[
\hat u (\vec{p}) = \b{b}(\vec{p})^\T \b{\alpha} = \b{b}(\vec{p})^\T (WB)^+W\b{u} = \b{\chi}(\vec{p}) \b{u}.
\]

We have defined $\b{\chi}$ to be
\[ \b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T (WB)^+W. \]
Vector $\b{\chi}$ is a row vector, also called a ''shape function''. The name comes from being able to take all the information
about shape of the domain and choice of approximation and store it in a single row vector, being able to approximate
a function value from given support values $\b{u}$ with a single dot product. For any values $\b{u}$, value $\b{\chi}(p) \b{u}$
gives us the approximation $\hat{u}(\vec{p})$ of $u$ in point $\vec{p}$.
Mathematically speaking, $\b{\chi}(\vec{p})$ is a functional, $\b{\chi}(\vec{p})\colon \R^n \to \R$, mapping $n$-tuples of known function values to
their approximations in point $\vec{p}$.

The same approach works for any linear operator $\mathcal L$ applied to $u$, just replace every $b_i$ in definition of $\b{\chi}$ with $\mathcal Lb_i$.
For example, take a $1$-dimensional case for approximation of derivatives with weight equal to $1$ and $n=m=3$, with equally spaced support values at distances $h$.
We wish to approximate $u''$ in the middle support point, just by making a weighted sum of the values, something like the finite difference
\[ u'' \approx \frac{u_1 - 2u_2 + u_3}{h^2}. \]
This is exactly the same formula as we would have come to by computing $\b{\chi}$, except that our approach is a lot more general. But one should think about
$\b{\chi}$ as one would about the finite difference scheme, it is a rule, telling us how to compute the derivative.
\[ u''(s_2) \approx \underbrace{\begin{bmatrix} \frac{1}{h^2} & \frac{-2}{h^2} & \frac{1}{h^2} \end{bmatrix}}_{\b{\chi}} \begin{bmatrix}u_1 \\ u_2 \\ u_3 \end{bmatrix} \]

The fact that $\b{\chi}$ is independent of the function values $\b{u}$ but depend only on domain geometry means that
'''we can just compute the shape functions $\b{\chi}$ for points of interest and then approximate any linear operator
of any function, given its values, very fast, using only a single dot product.'''

== Numerical calculation of the shape functions ==
The expression
\[ \b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T (WB)^+W \]
can be evaluated directly, but this is not the most optimal approach. A numerically cheaper and more stable way is to translate the problem of inverting the matrix to solving a linear system of equations.

'''Invertible $B$ case:'''
If $B$ is invertible, then $\b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T B^{-1}$, transposing the equation then multiplying it from the left by $B$,
$\b{\chi}$ can be thought as a solution of a system $B^\T\chi(\vec{p})^\T = \b{b}(\vec{p})$, which can be solved using LU or Cholesky decomposition for example.

'''General case:'''
For a system written as $Ax = b$, where $A$ is a $n\times m$ matrix, $x$ is a vector of length $m$ and $b$ a vector of length $m$, a generalized solution
$x$ is defined as such an $x$ that minimizes $\|A x - b\|_2^2$. If more $x$ attain the minimal value, $x$ with the minimal $\|x\|$ is chosen. Note that this generalizes the solution
a general sistem ($A$ is invertible) and over-determined system ($n > m$ and $A$ has full rank). Such an $x$ can be computed using the pseudoinverse $x = A^{+} b$.

In our case, let us denote a part of the solution containing the pseudoinverse by $\tilde{\b{\chi}}$.
\[ \b{\chi}(\vec{p}) = \underbrace{\b{b}(\vec{p})^\T (WB)^+}_{\tilde{\b{\chi}}} W \]
We have an expression $\tilde{\b{\chi}} = \b{b}(\vec{p})^\T (WB)^+$ which after transposition takes the form $\tilde{\b{\chi}}^\T = ((WB)^\T)^+\b{b}(\vec{p})$, the same as $x = A^+b$ above.
Therefore, $\tilde{\b{\chi}}^\T$ is the solution of an (indeterminate) system $(WB)^\T \tilde{\b{\chi}}^\T = \b{b}(\vec{p})$.
After solving that, we can get the shape function $\b\chi(\vec{p}) = \tilde{\b{\chi}} W$ by multiplying by matrix $W$.
The system before can be solved using any decomposition of matrix $(WB)^\T = B^\T W$ necessary, mosr generally the SVD decompostion, but depending on our knowledge of the
problem, we can use Cholesky ($B^\T W$ is positive definite), $LDL^\T$ if it is symmetric, $LU$ for a general square matrix, $QR$ for full rank overdetermined system and SVD for a general system.
Is more shapes need to be calculated using the same matrix $B^\T W$ and only different right hand sides, it can be done efficiently by storing the decomposition of $B^\T W$.

= MLS =

<figure id="fig:comparisonMLSandWLS">
[[File:mlswls.svg|thumb|upright=2|<caption>Comparison of WLS and MLS approximation</caption>]]
</figure>

When using WLS the approximation gets worse as we move away from the central point $\vec{p}$.
This is partially due to not being in the center of the support any more and partially due to weight
being distributed in such a way to assign more importance to nodes closer to $\vec{p}$.

We can battle this problem in two ways: when we wish to approximate in a new point that is sufficiently far
away from $\vec{p}$ we can compute new support, recompute the new coefficients $\b{\alpha}$ and approximate again.
This is very costly and we would like to avoid that. A partial fix is to keep support the same, but only
recompute the weight vector $\b{w}$, that will now assign weight values to nodes close to the new point.
We still need to recompute the coefficients $\b{\alpha}$, however we avoid the cost of setting up new support
and function values and recomputing $B$. This approach is called Moving Least Squares due to recomputing
the weighted least squares problem whenever we move the point of approximation.

Note that if out weight is constant or if $n = m$, when approximation reduces to interpolation, the weights do not play
any role and this method is redundant. In fact, its benefits arise when supports are rather large.

See <xr id="fig:comparisonMLSandWLS"/> for comparison between MLS and WLS approximations. MLS approximation remains close to
actual function while still inside the support domain, while WLS approximation becomes bad when
we come out of the reach of the weight function.

{{reflist}}

Weighted Least Squares (WLS)

2017-07-06T12:23:12Z

Mkolman: /* Numerical calculation of the shape functions */

One of the most important building blocks of the meshless methods is the Moving Least Squares (MLS) approximation , which is implemented in the [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classEngineMLS.html EngineMLS class]. Check [https://gitlab.com/e62Lab/e62numcodes/blob/master/test/mls_test.cpp EngineMLS unit tests] for examples.

= Notation Cheat sheet =
\begin{align*}
m \in \N & \dots \text{number of basis functions} \\
n \geq m \in \N & \dots \text{number of points in support domain} \\
k \in \mathbb{N} & \dots \text{dimensionality of vector space} \\
\vec s_j \in \R^k & \dots \text{point in support domain } \quad j=1,\dots,n \\
u_j \in \R & \dots \text{value of function to approximate in }\vec{s}_j \quad j=1,\dots,n \\
\vec p \in \R^k & \dots \text{center point of approximation} \\
b_i\colon \R^k \to \R & \dots \text{basis functions } \quad i=1,\dots,m \\
B_{j, i} \in \R & \dots \text{value of basis functions in support points } b_i(s_j-p) \quad j=1,\dots,n, \quad i=1,\dots,m\\
\omega \colon \R^k \to \R & \dots \text{weight function} \\
w_j \in \R & \dots \text{weights } \omega(\vec{s}_j-\vec{p}) \quad j=1,\dots,n \\
\alpha_i \in \R & \dots \text{expansion coefficients around point } \vec{p} \quad i=1,\dots,m \\
\hat u\colon \R^k \to \R & \dots \text{approximation function (best fit)} \\
\chi_j \in \R & \dots \text{shape coefficient for point }\vec{p} \quad j=1,\dots,n \\
\end{align*}

We will also use $\b{s}, \b{u}, \b{b}, \b{\alpha}, \b{\chi} $ to annotate a column of corresponding values,
$W$ as a $n\times n$ diagonal matrix filled with $w_j$ on the diagonal and $B$ as a $n\times m$ matrix filled with $B_{j, i}$.

= Definition of local approximation =
<figure id="fig:1DWLS">
[[File:image_1avhdsfej1b9cao01029m1e13o69.png|600px|thumb|upright=2|alt=1D MLS example|<caption>Example of 1D WLS approximation </caption>]]
</figure>
Our wish is to approximate an unknown function $u\colon \R^k \to \R$ while knowing $n$ values $u(\vec{s}_j) := u_j$.
The vector of known values will be denoted by $\b{u}$ and the vector of coordinates where those values were achieved by $\b{s}$.
Note that $\b{s}$ is not a vector in the usual sense since its components $\vec{s}_j$ are elements of $\R^k$, but we will call it vector anyway.
The values of $\b{s}$ are called ''nodes'' or ''support nodes'' or ''support''. The known values $\b{u}$ are also called ''support values''.

In general, an approximation function around point $\vec{p}\in\R^k$ can be
written as \[\hat{u} (\vec{x}) = \sum_{i=1}^m \alpha_i b_i(\vec{x}) = \b{b}(\vec{x})^\T \b{\alpha} \]
where $\b{b} = (b_i)_{i=1}^m$ is a set of ''basis functions'', $b_i\colon \R^k \to\R$, and $\b{\alpha} = (\alpha_i)_{i=1}^m$ are the unknown coefficients.

In MLS the goal is to minimize the error of approximation in given values, $\b{e} = \hat u(\b{s}) - \b{u}$
between the approximation function and target function in the known points $\b{x}$. The error can also be written as $B\b{\alpha} - \b{u}$,
where $B$ is rectangular matrix of dimensions $n \times m$ with rows containing basis function evaluated in points $\vec{s}_j$.
\[ B =
\begin{bmatrix}
b_1(\vec{s}_1) & \ldots & b_m(\vec{s}_1) \\
\vdots & \ddots & \vdots \\
b_1(\vec{s}_n) & \ldots & b_m(\vec{s}_n)
\end{bmatrix} =
[b_i(\vec{s}_j)]_{j=1,i=1}^{n,m} = [\b{b}(\vec{s}_j)^\T]_{j=1}^n. \]

We can choose to minimize any norm of the error vector $e$
and usually choose to minimize the $2$-norm or square norm \[ \|\b{e}\| = \|\b{e}\|_2 = \sqrt{\sum_{j=1}^n e_j^2}. \]
Commonly, we also choose to minimize a weighted norm
<ref>Note that our definition is a bit unusual, usually weights are not
squared with the values. However, we do this to avoid computing square
roots when doing MLS. If you are used to the usual definition,
consider the weight to be $\omega^2$.</ref>
instead \[ \|\b{e}\|_{2,w} = \|\b{e}\|_w = \sqrt{\sum_{j=1}^n (w_j e_j)^2}. \]
The ''weights'' $w_i$ are assumed to be non negative and are assembled in a vector $\b{w}$ or a matrix $W = \operatorname{diag}(\b{w})$ and usually obtained from a weight function.
A ''weight function'' is a function $\omega\colon \R^k \to[0,\infty)$. We calculate $w_j$ as $w_i := \omega(\vec{p}-\vec{s}_j)$, so
good choices for $\omega$ are function which have higher values close to $0$ (making closer nodes more important), like the normal distribution.
If we choose $\omega \equiv 1$, we get the unweighted version.

A choice of minimizing the square norm gave this method its name - Least Squares approximation. If we use the weighted version, we get the Weighted Least Squares or WLS.
In the most general case we wish to minimize
\[ \|\b{e}\|_{2,w}^2 = \b{e}^\T W^2 \b{e} = (B\b{\alpha} - \b{u})^\T W^2(B\b{\alpha} - \b{u}) = \sum_j^n w_j^2 (\hat{u}(\vec{s}_j) - u_j)^2 \]

The problem of finding the coefficients $\b{\alpha}$ that minimize the error $\b{e}$ can be solved with at least three approaches:
* Normal equations (fastest, less accurate) - using Cholesky decomposition of $B$ (requires full rank and $m \leq n$)
* QR decomposition of $B$ (requires full rank and $m \leq n$, more precise)
* SVD decomposition of $B$ (more expensive, even more reliable, no rank demand)

In our Meshless Machine MLS engine we use SVD with regularization described below.

= Computing approximation coefficients =

== [http://mathworld.wolfram.com/NormalEquation.html Normal equations] ==
We seek the minimum of
\[ \|\b{e}\|_2^2 = (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u}) \]
By seeking the zero gradient in terms of coefficients $\alpha_i$
\[\frac{\partial}{\partial \alpha_i} (B\b{\alpha} - \b{u})^\T(B\b{\alpha} - \b{u}) = 0\]
resulting in
\[ B^\T B\b{\alpha} = B^\T \b{u}. \]
The coefficient matrix $B^\T B$ is symmetric and positive definite. However, solving above problem directly is
poorly behaved with respect to round-off errors since the condition number $\kappa(B^\T B)$ is the square
of $\kappa(B)$.

In case of WLS the equations become
\[ (WB)^\T WB \b{\alpha} = WB^\T \b{u}. \]

Complexity of Cholesky decomposition is $\frac{n^3}{3}$ and complexity of matrix multiplication $nm^2$. To preform the Cholesky decomposition the rank of $WB$ must be full.

'''Pros:'''
* simple to implement
* low computational complexity

'''Cons:'''
* numerically unstable
* full rank requirement

== [https://en.wikipedia.org/wiki/QR_decomposition $QR$ Decomposition] ==
\[{\bf{B}} = {\bf{QR}} = \left[ {{{\bf{Q}}_1},{{\bf{Q}}_2}} \right]\left[ {\begin{array}{*{20}{c}}
{{{\bf{R}}_1}}\\
0
\end{array}} \right]\]
\[{\bf{B}} = {{\bf{Q}}_1}{{\bf{R}}_1}\]
$\bf{Q}$ is unitary matrix ($\bf{Q}^{-1}=\bf{Q}^T$). Useful property of a unitary matrices is that multiplying with them does not alter the (Euclidean) norm of a vector, i.e.,
\[\left\| {{\bf{Qx}}} \right\|{\bf{ = }}\left\| {\bf{x}} \right\|\]
And $\bf{R}$ is upper diagonal matrix
\[{\bf{R = (}}{{\bf{R}}_{\bf{1}}}{\bf{,}}0{\bf{)}}\]
therefore we can say
\[\begin{array}{l}
\left\| {{\bf{B\alpha }} - {\bf{u}}} \right\| = \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{B\alpha }} - {\bf{u}}} \right)} \right\| = \left\| {{{\bf{Q}}^{\rm{T}}}{\bf{B\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\|\\
= \left\| {{{\bf{Q}}^{\rm{T}}}\left( {{\bf{QR}}} \right){\bf{\alpha }} - {{\bf{Q}}^{\rm{T}}}{\bf{u}}} \right\| = \left\| {\left( {{{\bf{R}}_1},0} \right){\bf{\alpha }} - {{\left( {{{\bf{Q}}_1},{{\bf{Q}}_{\bf{2}}}} \right)}^{\rm{T}}}{\bf{u}}} \right\|\\
= \left\| {{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} - {{\bf{Q}}_{\bf{1}}}{\bf{u}}} \right\| + \left\| {{\bf{Q}}_2^{\rm{T}}{\bf{u}}} \right\|
\end{array}\]
Of the two terms on the right we have no control over the second, and we can render the first one
zero by solving
\[{{\bf{R}}_{\bf{1}}}{\bf{\alpha }} = {\bf{Q}}_{_{\bf{1}}}^{\rm{T}}{\bf{u}}\]
Which results in a minimum. We could also compute it with pseudo inverse
\[\mathbf{\alpha }={{\mathbf{B}}^{-1}}\mathbf{u}\]
Where pseudo inverse is simply \[{{\mathbf{B}}^{-1}}=\mathbf{R}_{\text{1}}^{\text{-1}}{{\mathbf{Q}}^{\text{T}}}\] (once again, $R$ is upper diagonal matrix, and $Q$ is unitary matrix).
And for weighted case
\[\mathbf{\alpha }={{\left( {{\mathbf{W}}^{0.5}}\mathbf{B} \right)}^{-1}}\left( {{\mathbf{W}}^{0.5}}\mathbf{u} \right)\]

Complexity of $QR$ decomposition \[\frac{2}{3}m{{n}^{2}}+{{n}^{2}}+\frac{1}{3}n-2=O({{n}^{3}})\]

Pros: better stability in comparison with normal equations cons: higher complexity

== [https://en.wikipedia.org/wiki/Singular_value_decomposition SVD decomposition] ==
In linear algebra, the [https://en.wikipedia.org/wiki/Singular_value_decomposition singular value decomposition (SVD)]
is a factorization of a real or complex matrix. It has many useful
applications in signal processing and statistics.

Formally, the singular value decomposition of an $m \times n$ real or complex
matrix $\bf{B}$ is a factorization of the form $\bf{B}= \bf{U\Sigma V^\T}$, where
$\bf{U}$ is an $m \times m$ real or complex unitary matrix, $\bf{\Sigma}$ is an $m \times n$
rectangular diagonal matrix with non-negative real numbers on the diagonal, and
$\bf{V}^\T$ is an $n \times n$ real or complex unitary matrix. The diagonal entries
$\Sigma_{ii}$ are known as the singular values of $\bf{B}$. The $m$ columns of
$\bf{U}$ and the $n$ columns of $\bf{V}$ are called the left-singular vectors and
right-singular vectors of $\bf{B}$, respectively.

The singular value decomposition and the eigen decomposition are closely
related. Namely:

* The left-singular vectors of $\bf{B}$ are eigen vectors of $\bf{BB}^\T$.
* The right-singular vectors of $\bf{B}$ are eigen vectors of $\bf{B}^\T{B}$.
* The non-zero singular values of $\bf{B}$ (found on the diagonal entries of $\bf{\Sigma}$) are the square roots of the non-zero eigenvalues of both $\bf{B}^\T\bf{B}$ and $\bf{B}^\T \bf{B}$.

with SVD we can write $\bf{B}$ as \[\bf{B}=\bf{U\Sigma{{V}^{\T}}}\] where are $\bf{U}$ and $\bf{V}$ again unitary matrices and $\bf{\Sigma}$
stands for diagonal matrix of singular values.

Again we can solve either the system or compute pseudo inverse as

\[ \bf{B}^{-1} = \left( \bf{U\Sigma V}^\T\right)^{-1} = \bf{V}\bf{\Sigma^{-1}U}^\T \]
where $\bf{\Sigma}^{-1}$ is trivial, just replace every non-zero diagonal entry by
its reciprocal and transpose the resulting matrix. The stability gain is
exactly here, one can now set threshold below which the singular value is
considered as $0$, basically truncate all singular values below some value and
thus stabilize the inverse.

SVD decomposition complexity \[ 2mn^2+2n^3 = O(n^3) \]

Pros: stable cons: high complexity

Method used in MLSM is SVD with regularization.

= Weighted Least Squares =
Weighted least squares approximation is the simplest version of the procedure described above. Given support $\b{s}$, values $\b{u}$
and an anchor point $\vec{p}$, we calculate the coefficients $\b{\alpha}$ using one of the above methods.
Then, to approximate a function in the neighbourhood of $\vec p$ we use the formula
\[
\hat{u}(\vec x) = \b{b}(\vec x)^\T \b{\alpha} = \sum_{i=1}^m \alpha_i b_i(\vec x).
\]

To approximate the derivative $\frac{\partial u}{\partial x_i}$, or any linear partial differential operator $\mathcal L$ on $u$, we
simply take the same linear combination of transformed basis functions $\mathcal L b_i$. We have considered coefficients $\alpha_i$ to be
constant and applied the linearity.
\[
\widehat{\mathcal L u}(\vec x) = \sum_{i=1}^m \alpha_i (\mathcal L b_i)(\vec x).
\]

= WLS at fixed point with fixed support and unknown function values :: Shape functions =
Suppose now we are given support $\b{s}$ and a point $\b{p}$ and want to construct the function approximation from values $\b{u}$.
We proceed as usual, solving the overdetermined system $WB \b{\alpha} = W\b{u}$ for coefficients $\b{\alpha}$ using the pseudoinverse
\[ \b{\alpha} = (WB)^+W\b{u}, \]
where $A^+$ denotes the Moore-Penrose pseudoinverse that can be calculated using SVD.

Writing down the approximation function $\hat{u}$ we get
\[
\hat u (\vec{p}) = \b{b}(\vec{p})^\T \b{\alpha} = \b{b}(\vec{p})^\T (WB)^+W\b{u} = \b{\chi}(\vec{p}) \b{u}.
\]

We have defined $\b{\chi}$ to be
\[ \b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T (WB)^+W. \]
Vector $\b{\chi}$ is a row vector, also called a ''shape function''. The name comes from being able to take all the information
about shape of the domain and choice of approximation and store it in a single row vector, being able to approximate
a function value from given support values $\b{u}$ with a single dot product. For any values $\b{u}$, value $\b{\chi}(p) \b{u}$
gives us the approximation $\hat{u}(\vec{p})$ of $u$ in point $\vec{p}$.
Mathematically speaking, $\b{\chi}(\vec{p})$ is a functional, $\b{\chi}(\vec{p})\colon \R^n \to \R$, mapping $n$-tuples of known function values to
their approximations in point $\vec{p}$.

The same approach works for any linear operator $\mathcal L$ applied to $u$, just replace every $b_i$ in definition of $\b{\chi}$ with $\mathcal Lb_i$.
For example, take a $1$-dimensional case for approximation of derivatives with weight equal to $1$ and $n=m=3$, with equally spaced support values at distances $h$.
We wish to approximate $u''$ in the middle support point, just by making a weighted sum of the values, something like the finite difference
\[ u'' \approx \frac{u_1 - 2u_2 + u_3}{h^2}. \]
This is exactly the same formula as we would have come to by computing $\b{\chi}$, except that our approach is a lot more general. But one should think about
$\b{\chi}$ as one would about the finite difference scheme, it is a rule, telling us how to compute the derivative.
\[ u''(s_2) \approx \underbrace{\begin{bmatrix} \frac{1}{h^2} & \frac{-2}{h^2} & \frac{1}{h^2} \end{bmatrix}}_{\b{\chi}} \begin{bmatrix}u_1 \\ u_2 \\ u_3 \end{bmatrix} \]

The fact that $\b{\chi}$ is independent of the function values $\b{u}$ but depend only on domain geometry means that
'''we can just compute the shape functions $\b{\chi}$ for points of interest and then approximate any linear operator
of any function, given its values, very fast, using only a single dot product.'''

== Numerical calculation of the shape functions ==
The expression
\[ \b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T (WB)^+W \]
can be evaluated directly, but this is not the most optimal approach. A numerically cheaper and more stable way is to translate the problem of inverting the matrix to solving a linear system of equations.

'''Invertible $B$ case:'''
If $B$ is invertible, then $\b{\chi}(\vec{p}) = \b{b}(\vec{p})^\T B^{-1}$, transposing the equation then multiplying it from the left by $B$,
$\b{\chi}$ can be thought as a solution of a system $B^\T\chi(\vec{p})^\T = \b{b}(\vec{p})$, which can be solved using LU or Cholesky decomposition for example.

'''General case:'''
For a system written as $Ax = b$, where a is a $n\times m$ matrix, $x$ is a vector of length $m$ and $b$ a vector of length $m$, a generalized solution
$x$ is defined as such an $x$ that minimizes $\|A x - b\|_2^2$. If more $x$ attain the minimal value, $x$ with the minimal $\|x\|$ is chosen. Note that this generalizes the solution
a general sistem ($A$ is invertible) and over-determined system ($n > m$ and $A$ has full rank). Such an $x$ can be computed using the pseudoinverse $x = A^{+} b$.

In our case, let us denote a part of the solution containing the pseudoinverse by $\tilde{\b{\chi}}$.
\[ \b{\chi}(\vec{p}) = \underbrace{\b{b}(\vec{p})^\T (WB)^+}_{\tilde{\b{\chi}}} W \]
We have an expression $\tilde{\b{\chi}} = \b{b}(\vec{p})^\T (WB)^+$ which after transposition takes the form $\tilde{\b{\chi}}^\T = ((WB)^\T)^+\b{b}(\vec{p})$, the same as $x = A^+b$ above.
Therefore, $\tilde{\b{\chi}}^\T$ is the solution of an (indeterminate) system $(WB)^\T \tilde{\b{\chi}}^\T = \b{b}(\vec{p})$.
After solving that, we can get the shape function $\b\chi(\vec{p}) = \tilde{\b{\chi}} W$ by multiplying by matrix $W$.
The system before can be solved using any decomposition of matrix $(WB)^\T = B^\T W$ necessary, mosr generally the SVD decompostion, but depending on our knowledge of the
problem, we can use Cholesky ($B^\T W$ is positive definite), $LDL^\T$ if it is symmetric, $LU$ for a general square matrix, $QR$ for full rank overdetermined system and SVD for a general system.
Is more shapes need to be calculated using the same matrix $B^\T W$ and only different right hand sides, it can be done efficiently by storing the decomposition of $B^\T W$.

= MLS =

<figure id="fig:comparisonMLSandWLS">
[[File:mlswls.svg|thumb|upright=2|<caption>Comparison of WLS and MLS approximation</caption>]]
</figure>

When using WLS the approximation gets worse as we move away from the central point $\vec{p}$.
This is partially due to not being in the center of the support any more and partially due to weight
being distributed in such a way to assign more importance to nodes closer to $\vec{p}$.

We can battle this problem in two ways: when we wish to approximate in a new point that is sufficiently far
away from $\vec{p}$ we can compute new support, recompute the new coefficients $\b{\alpha}$ and approximate again.
This is very costly and we would like to avoid that. A partial fix is to keep support the same, but only
recompute the weight vector $\b{w}$, that will now assign weight values to nodes close to the new point.
We still need to recompute the coefficients $\b{\alpha}$, however we avoid the cost of setting up new support
and function values and recomputing $B$. This approach is called Moving Least Squares due to recomputing
the weighted least squares problem whenever we move the point of approximation.

Note that if out weight is constant or if $n = m$, when approximation reduces to interpolation, the weights do not play
any role and this method is redundant. In fact, its benefits arise when supports are rather large.

See <xr id="fig:comparisonMLSandWLS"/> for comparison between MLS and WLS approximations. MLS approximation remains close to
actual function while still inside the support domain, while WLS approximation becomes bad when
we come out of the reach of the weight function.

{{reflist}}

How to build

2017-06-12T09:16:40Z

Mkolman: /* Building on Mac OSX */

=Installation=
To make this work from plain Ubuntu installation, run
<syntaxhighlight lang="bash">
sudo apt-get install cmake doxygen graphviz libboost-dev libsfml-dev libhdf5-serial-dev
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
./run_tests.sh
</syntaxhighlight>
which installs dependancies, clones the repository, goes into the root folder of the repository and runs tests. This will check the configuration, notify you of potentially missing dependencies, build and run all tests, check code style and docs. If this works, you are ready to go! Otherwise install any missing packages and if it still fails, raise an issue!

=Building=

List of dependencies:

* Build tools, like <syntaxhighlight lang="bash" inline> cmake >= 2.8.12</syntaxhighlight>, <syntaxhighlight lang="bash" inline> g++ >= 4.8</syntaxhighlight>, <syntaxhighlight lang="bash" inline>make</syntaxhighlight>
* [http://www.boost.org/ Boost]
* <syntaxhighlight lang="bash" inline> doxygen >= 1.8.8 </syntaxhighlight> and Graphviz
* for drawing [http://www.sfml-dev.org/ SMFL library version 2]
* for IO [https://www.hdfgroup.org/ HDF5 library]

Out of source builds are preferred. Run
<syntaxhighlight lang="bash">
mkdir -p build
cd build
cmake ..
make
</syntaxhighlight>
Note that you only have to run <syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> once, after that only <syntaxhighlight lang="bash" inline> make</syntaxhighlight> is sufficient.

Binaries are placed into <syntaxhighlight lang="bash" inline> bin </syntaxhighlight> folder. Test can be run all at once via <syntaxhighlight lang="bash" inline> make
run_all_tests </syntaxhighlight> or individually via eg. <syntaxhighlight lang="bash" inline>make basisfunc_run_tests </syntaxhighlight>.

==Linear Algebra==
We use [http://eigen.tuxfamily.org/ Eigen] as our matrix library. See
[http://eigen.tuxfamily.org/dox-devel/group__QuickRefPage.html here] for use
reference and documentation. For a quick transition from Matlab see
[http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt here].

==Drawing==
Some tests include drawing. We are using [http://www.sfml-dev.org/ SMFL library], which can be installed on most linux systems easly
as <syntaxhighlight lang="bash" inline> sudo apt-get install libsfml-dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S sfml </syntaxhighlight>. After the
installation uncomment a test case in <syntaxhighlight lang="bash" inline> domain_draw_test.cpp </syntaxhighlight> and run <syntaxhighlight lang="bash" inline> make
test_domain_draw </syntaxhighlight> to see the visual effect.

Binaries using SFML require additional linker flags <syntaxhighlight lang="bash" inline> -lsfml-graphics
-lsfml-window -lsfml-system </syntaxhighlight>, but the makefile should take care of that for you.

==HDF5==

In order to use IO you need [https://www.hdfgroup.org/ hdf5 library].
You can install it easily using the command <syntaxhighlight lang="bash" inline> sudo apt-get install libhdf5-
dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S hdf5-cpp-fortran </syntaxhighlight>.

Ubuntu places hdf5 libs in a werid folder <code>/usr/lib/x86_64-linux-gnu/hdf5/serial/</code>.
If you get error similar to <code>-lhdf5 not found</code> and you have hdf5 installed, you might have to link the libraries into
a discoverable place, like <code>/usr/lib/</code> or add the above directory to the linker path.
If using cmake, you can add the following line to your CMakeLists.txt
link_directories(/usr/lib/x86_64-linux-gnu/hdf5/serial/)

== Using Intel Math Kernel Library (MKL) ==
Eigen has great support for MKL all you have to do is define a EIGEN_USE_MKL_ALL macro before any includes.
You can see further instructions [https://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html on their website].

Besides setting <syntaxhighlight lang="c++" inline>#define EIGEN_USE_MKL_ALL</syntaxhighlight> in your code,
some linker and compilation fixes are needed. You have to set MKL and MKLROOT variables in cmake. You can define
the variable MKLROOT as a system variable (using export) which is enough. You can also define it manually when calling
cmake. If it is not set in either way it will default to "/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl".
<syntaxhighlight lang="bash">
cmake .. -DMKL=ON -DMKLROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl
</syntaxhighlight>
Your target has to be linked with some MKL libraries so make sure to add the following link to your cmake file.
<syntaxhighlight lang="cmake">
target_link_libraries(target ${LMKL})
</syntaxhighlight>

== Building on Mac OSX ==
This method was tested on El Capitan. Linking the OpenMP library is still not functioning as intended.

First install all dependencies from homebrew
<syntaxhighlight lang="bash">
brew install llvm cmake homebrew/science/hdf5 SFML
</syntaxhighlight>
Now you can clone and build the project with CLang using the following commands
<syntaxhighlight lang="bash">
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
mkdir build
cd build
cmake .. -DCMAKE_C_COMPILER=/usr/local/opt/llvm/bin/clang -DCMAKE_CXX_COMPILER=/usr/local/opt/llvm/bin/clang++
make domain_run_tests
</syntaxhighlight>

== Using Intel C/C++ Compiler ==

In order to use Intel's compiler you have to first set the <syntaxhighlight lang="bash" inline>CXX</syntaxhighlight>
and <syntaxhighlight lang="bash" inline>CC</syntaxhighlight> bash variables. Before calling
<syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> for the first time you have to export the following:

<syntaxhighlight lang="bash">
export CXX="icpc"
export CC="icc"
</syntaxhighlight>

or you can define the compiler when first calling cmake like so:
<syntaxhighlight lang="bash">
cmake .. -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

You can also compile it directly for Intel® Xeon Phi™ Coprocessor. You do this by adding <syntaxhighlight lang="bash" inline>-Dmmic=ON</syntaxhighlight>
flag to the <syntaxhighlight lang="bash" inline>cmake</syntaxhighlight> command:
<syntaxhighlight lang="bash">
cmake .. -Dmmic=ON -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

Note: All features that depend on system third-party libraries are not available on MIC (Many Integrated Core).
This includes:

* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1HDF5IO.html HDF5IO] class in <syntaxhighlight lang="bash" inline>io.hpp</syntaxhighlight>
* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1Monitor.html Monitor] class in <syntaxhighlight lang="bash" inline>util.hpp</syntaxhighlight>
* all of [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/util_8hpp_source.html <syntaxhighlight lang="bash" inline>draw.hpp</syntaxhighlight>]

How to build

2017-06-12T09:15:10Z

Mkolman:

=Installation=
To make this work from plain Ubuntu installation, run
<syntaxhighlight lang="bash">
sudo apt-get install cmake doxygen graphviz libboost-dev libsfml-dev libhdf5-serial-dev
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
./run_tests.sh
</syntaxhighlight>
which installs dependancies, clones the repository, goes into the root folder of the repository and runs tests. This will check the configuration, notify you of potentially missing dependencies, build and run all tests, check code style and docs. If this works, you are ready to go! Otherwise install any missing packages and if it still fails, raise an issue!

=Building=

List of dependencies:

* Build tools, like <syntaxhighlight lang="bash" inline> cmake >= 2.8.12</syntaxhighlight>, <syntaxhighlight lang="bash" inline> g++ >= 4.8</syntaxhighlight>, <syntaxhighlight lang="bash" inline>make</syntaxhighlight>
* [http://www.boost.org/ Boost]
* <syntaxhighlight lang="bash" inline> doxygen >= 1.8.8 </syntaxhighlight> and Graphviz
* for drawing [http://www.sfml-dev.org/ SMFL library version 2]
* for IO [https://www.hdfgroup.org/ HDF5 library]

Out of source builds are preferred. Run
<syntaxhighlight lang="bash">
mkdir -p build
cd build
cmake ..
make
</syntaxhighlight>
Note that you only have to run <syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> once, after that only <syntaxhighlight lang="bash" inline> make</syntaxhighlight> is sufficient.

Binaries are placed into <syntaxhighlight lang="bash" inline> bin </syntaxhighlight> folder. Test can be run all at once via <syntaxhighlight lang="bash" inline> make
run_all_tests </syntaxhighlight> or individually via eg. <syntaxhighlight lang="bash" inline>make basisfunc_run_tests </syntaxhighlight>.

==Linear Algebra==
We use [http://eigen.tuxfamily.org/ Eigen] as our matrix library. See
[http://eigen.tuxfamily.org/dox-devel/group__QuickRefPage.html here] for use
reference and documentation. For a quick transition from Matlab see
[http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt here].

==Drawing==
Some tests include drawing. We are using [http://www.sfml-dev.org/ SMFL library], which can be installed on most linux systems easly
as <syntaxhighlight lang="bash" inline> sudo apt-get install libsfml-dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S sfml </syntaxhighlight>. After the
installation uncomment a test case in <syntaxhighlight lang="bash" inline> domain_draw_test.cpp </syntaxhighlight> and run <syntaxhighlight lang="bash" inline> make
test_domain_draw </syntaxhighlight> to see the visual effect.

Binaries using SFML require additional linker flags <syntaxhighlight lang="bash" inline> -lsfml-graphics
-lsfml-window -lsfml-system </syntaxhighlight>, but the makefile should take care of that for you.

==HDF5==

In order to use IO you need [https://www.hdfgroup.org/ hdf5 library].
You can install it easily using the command <syntaxhighlight lang="bash" inline> sudo apt-get install libhdf5-
dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S hdf5-cpp-fortran </syntaxhighlight>.

Ubuntu places hdf5 libs in a werid folder <code>/usr/lib/x86_64-linux-gnu/hdf5/serial/</code>.
If you get error similar to <code>-lhdf5 not found</code> and you have hdf5 installed, you might have to link the libraries into
a discoverable place, like <code>/usr/lib/</code> or add the above directory to the linker path.
If using cmake, you can add the following line to your CMakeLists.txt
link_directories(/usr/lib/x86_64-linux-gnu/hdf5/serial/)

== Using Intel Math Kernel Library (MKL) ==
Eigen has great support for MKL all you have to do is define a EIGEN_USE_MKL_ALL macro before any includes.
You can see further instructions [https://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html on their website].

Besides setting <syntaxhighlight lang="c++" inline>#define EIGEN_USE_MKL_ALL</syntaxhighlight> in your code,
some linker and compilation fixes are needed. You have to set MKL and MKLROOT variables in cmake. You can define
the variable MKLROOT as a system variable (using export) which is enough. You can also define it manually when calling
cmake. If it is not set in either way it will default to "/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl".
<syntaxhighlight lang="bash">
cmake .. -DMKL=ON -DMKLROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl
</syntaxhighlight>
Your target has to be linked with some MKL libraries so make sure to add the following link to your cmake file.
<syntaxhighlight lang="cmake">
target_link_libraries(target ${LMKL})
</syntaxhighlight>

== Building on Mac OSX ==
This method was tested on El Capitan. Linking the OpenMP library is still not functioning as intended.

First install all dependencies from homebrew
<syntaxhighlight lang="bash">
brew install llvm cmake homebrew/science/hdf5 SFML
</syntaxhighlight>
Now you can clone and build the project with CLang using the following commands
<syntaxhighlight lang="bash">
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
mkdir build
cd build
cmake .. -DCMAKE_C_COMPILER=//clang -DCMAKE_CXX_COMPILER=/clang++
make domain_run_tests
</syntaxhighlight>

== Using Intel C/C++ Compiler ==

In order to use Intel's compiler you have to first set the <syntaxhighlight lang="bash" inline>CXX</syntaxhighlight>
and <syntaxhighlight lang="bash" inline>CC</syntaxhighlight> bash variables. Before calling
<syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> for the first time you have to export the following:

<syntaxhighlight lang="bash">
export CXX="icpc"
export CC="icc"
</syntaxhighlight>

or you can define the compiler when first calling cmake like so:
<syntaxhighlight lang="bash">
cmake .. -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

You can also compile it directly for Intel® Xeon Phi™ Coprocessor. You do this by adding <syntaxhighlight lang="bash" inline>-Dmmic=ON</syntaxhighlight>
flag to the <syntaxhighlight lang="bash" inline>cmake</syntaxhighlight> command:
<syntaxhighlight lang="bash">
cmake .. -Dmmic=ON -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

Note: All features that depend on system third-party libraries are not available on MIC (Many Integrated Core).
This includes:

* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1HDF5IO.html HDF5IO] class in <syntaxhighlight lang="bash" inline>io.hpp</syntaxhighlight>
* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1Monitor.html Monitor] class in <syntaxhighlight lang="bash" inline>util.hpp</syntaxhighlight>
* all of [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/util_8hpp_source.html <syntaxhighlight lang="bash" inline>draw.hpp</syntaxhighlight>]

File:Vec novec size timed log normed.png

2017-03-30T11:08:18Z

Mkolman: Mkolman uploaded a new version of File:Vec novec size timed log normed.png

File uploaded with MsUpload

Execution on Intel® Xeon Phi™ co-processor

2017-03-30T10:57:47Z

Mkolman: /* Speedup by vectorization and parallelization */

== Speedup by parallelization ==

We tested the speedups on the Intel® Xeon Phi™ with the following code:
<syntaxhighlight lang="c++" line>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <omp.h>
#include <math.h>

int main(int argc, char *argv[]) {
int numthreads;
int n;

assert(argc == 3 && "args: numthreads n");
sscanf(argv[1], "%d", &numthreads);
sscanf(argv[2], "%d", &n);

printf("Init...\n");
printf("Start (%d threads)...\n", numthreads);
printf("%d test cases\n", n);

int m = 1000000;
double ttime = omp_get_wtime();

int i;
double d = 0;
#pragma offload target(mic:0)
{
#pragma omp parallel for private (i) schedule(static) num_threads(numthreads)
for(i = 0; i < n; ++i) {
for(int j = 0; j < m; ++j) {
d = sin(d) + 0.1 + j;
d = pow(0.2, d)*j;
}
}
}
double time = omp_get_wtime() - ttime;
fprintf(stderr, "%d %d %.6f\n", n, numthreads, time);
printf("time: %.6f s\n", time);
printf("Done d = %.6lf.\n", d);

return 0;
}
</syntaxhighlight>

The code essentially distributes a problem of size $n\cdot m$ among <syntaxhighlight lang="bash" inline>numthreads</syntaxhighlight> cores,
We tested the time of execution for $n$ from the set $\{1, 10, 20, 50, 100, 200, 500, 1000\}$
and <syntaxhighlight lang="bash" inline> numthreads </syntaxhighlight> from $1$ to $350$. The plots of exectuion times and performance speeups are shown below.

<figure id="times">
[[File:times.png|thumb|center|upright=3|alt=A square of nodes coloured according to the solution(with smaller and larger node density)|<caption>A picture of our solution (with smaller and larger node density)</caption>]]
</figure>

 
<figure id="fig:speedups">
[[File:speedups.png|thumb|center|upright=3|alt=A square of nodes coloured according to the solution(with smaller and larger node density)|<caption>A picture of our solution (with smaller and larger node density)</caption>]]
</figure>

 
The code was compiled using: 
<syntaxhighlight lang="bash" inline> icc -openmp -O3 -qopt-report=2 -qopt-report-phase=vec -o test test.cpp</syntaxhighlight> 
without warnings or errors. Then, in order to offload to Intel Phi, user must be logged in as root: 
<syntaxhighlight lang="bash" inline> sudo su </syntaxhighlight> 
To run correctly, intel compiler and runtime variables must be sourced: 
<syntaxhighlight lang="bash" inline> source /opt/intel/bin/compilervars.sh intel64</syntaxhighlight> 
Finally, the code was tested using the following command, where <syntaxhighlight lang="bash" inline> test </syntaxhighlight> is the name of the compiled executable: 
<syntaxhighlight lang="bash" inline> for n in 1 10 20 50 100 200 500 1000; do for nt in {1..350}; echo $nt $n; ./test $nt $n 2>> speedups.txt; done; done</syntaxhighlight>

== Speedup by vectorization ==

Intel Xeon Phi has a 512 bit of space for simultaneous computation, which means it can calculate 8 double (or 16 single) operations at the same time. This is called vectorization and greatly improves code execution.

Consider the following code of speedtest.cpp:
<syntaxhighlight lang="c++" line>
#include <cmath>
#include <iostream>

int main() {
const int N = 104;
double a[N];
for (int i = 0; i < 1e5; i++)
for (int j = 0; j < N; j++)
a[j] = std::sin(std::exp(a[j]-j)*3 * i + i*j);
std::cout << a[4] << "\n";
return 0;
}
</syntaxhighlight>
Intel's C++ compiler ICPC will successfully vectorize the inner for loop, so that it will run significantly faster than with vectorization disabled.

The code can be compiled with or without vectorization
<syntaxhighlight lang="bash">
$ icpc speedtest.cpp -o vectorized_speedtest -O3
$ icpc speedtest.cpp -o unvectorized_speedtest -O3 -no-vec
</syntaxhighlight>

The below table shows execution times of the code displayed before on different machines with different settings. Two times represent execution time with double and float data type, respectively.

{| class="wikitable"
|-
! Machine
| ASUS ZenBook Pro UX501VW
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon Phi™ Coprocessor SE10/7120
| Intel® Xeon Phi™ Coprocessor SE10/7120
|-
! Compiler
| g++-6.3.1
| g++-4.8.5
| icpc-16.0.1
| icpc-16.0.1 -no-vec
| icpc-16.0.1 -mmic
| icpc-16.0.1 -mmic -no-vec
|-
! Double time[s]
| 0.63 - 0.66
| 0.65 - 0.66
| 0.155 - 0.160
| 0.50 - 0.51
| 0.25 - 0.26
| 11.1 - 11.2
|-
! Float time[s]
| 0.65 - 0.71
| 0.53 - 0.55
| 0.155 - 0.160
| 0.17 - 0.19
| 0.37 - 0.38
| 4.2 - 4.3
|}

We can see massive 44 fold speedup with and without vectorization.

=== Code incapable of vectorization ===

On the other hand there is a very similar code that can not be vectorized. Now all iterations of the inner loop access the same variable instead of each its own element in a list. ICPC is now unable to vectorize the code resulting in no difference when using -no-vec compile flag.

<syntaxhighlight lang="c++" line>
#include <cmath>
#include <iostream>

int main() {
const int N = 104;
double a;
for (int i = 0; i < 1e5; i++)
for (int j = 0; j < N; j++)
a = std::sin(std::exp(a-j)*3 * i + i*j);
std::cout << a << "\n";
return 0;
}
</syntaxhighlight>

{| class="wikitable"
|-
! Machine
| ASUS ZenBook Pro UX501VW
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon Phi™ Coprocessor SE10/7120
| Intel® Xeon Phi™ Coprocessor SE10/7120
|-
! Compiler
| g++-6.3.1
| g++-4.8.5
| icpc-16.0.1
| icpc-16.0.1 -no-vec
| icpc-16.0.1 -mmic
| icpc-16.0.1 -mmic -no-vec
|-
! Double time[s]
| 0.80 - 0.82
| 0.72 - 0.73
| 0.58 - 0.59
| 0.58 - 0.59
| 10.9 - 11.0
| 10.9 - 11.0
|-
! Float time[s]
| 0.69 - 0.72
| 0.66 - 0.67
| 0.32 - 0.33
| 0.32 - 0.34
| 4.1 - 4.2
| 4.1 - 4.2
|}

== Speedup by vectorization and parallelization ==

Consider the following code:
<syntaxhighlight lang="c++" line>#include <stdio.h>
int main(){
double *a, *b, *c;
int i,j,k, ok, n=1000; // or n=10000
// allocated memory on the heap aligned to 64 byte boundary
ok = posix_memalign((void**)&a, 64, n*n*sizeof(double));
ok = posix_memalign((void**)&b, 64, n*n*sizeof(double));
ok = posix_memalign((void**)&c, 64, n*n*sizeof(double));
// initialize matrices
for (i = 0; i < n*n; ++i) {
a[i] = 1;
b[i] = 1;
c[i] = 1;
}
//parallelize via OpenMP on MIC
#pragma omp parallel for
for( i = 0; i < n; i++ ) {
for( k = 0; k < n; k++ ) {
#pragma vector aligned
#pragma ivdep
for( j = 0; j < n; j++ ) {
//c[i][j] = c[i][j] + a[i][k]*b[k][j];
c[i*n+j] = c[i*n+j] + a[i*n+k]*b[k*n+j];
}
}
}
printf("%f\n", c[n]);
}
</syntaxhighlight>

It is designed to be parallelizable and vectorizable. But with smaller systems (N=1000) we lose the benefits of vectorization with larger number of threads used.

However even the larger test case (N=10000) is barely faster than execution on the host processor. With 61 threads and vectorization it runs for a flat minute,
where the host with 24 threads needs 1 minute and 15 seconds.

It is interesting to note that so-called "real time", i.e. the total processor time used, behaves differently for vectorized and nonvectorized code. For $N=10^4$
nonvectorized code constantly uses 133 minutes for completion regardless of thread number, but vectorized code goes from 33 minutes with one thread to an hour
of total processing time with 61 threads. Similarly with $N=10^3$ and 61 threads nonvectorized code uses 125% of it beginning processing time, where for vectorized
that figure is 600%.

<figure id="times">
[[File:vec_novec_timed.png|thumb|center|upright=3|alt=Times used|<caption>Times used for the sample problem with and without vectorization using different number threads for two different N.</caption>]]
</figure>

Below is a scan over different sizes $N$. Two parallelization regimes were used; Number of threads equal to the physical number of cores, and twice as many threads
to take advantage of hyperthreading. All modes of execution show the same basic shape. All are inefficient for small $N$ but approach the complexity $N^3$ for large $N$.

<figure id="times">
[[File:vec_novec_size_timed_log.png|thumb|center|upright=3|alt=Times used|<caption>Times used for a single operation with and without vectorization for different N.</caption>]]
</figure>

Below we take a closer look at the time used for a single operation for larger $N$.

<figure id="times">
[[File:vec_novec_size_timed_log_normed.png|thumb|center|upright=3|alt=Times used|<caption>Times used for the sample problem with and without vectorization for different N.</caption>]]
</figure>

File:Vec novec size timed log normed.png

2017-03-30T10:49:27Z

Mkolman: File uploaded with MsUpload

File uploaded with MsUpload

File:Vec novec timed.png

2017-03-30T10:44:39Z

Mkolman: Mkolman uploaded a new version of File:Vec novec timed.png

File uploaded with MsUpload

File:Vec novec size timed log.png

2017-03-30T10:41:33Z

Mkolman: File uploaded with MsUpload

File uploaded with MsUpload

File:Vec novec timed.png

2017-03-28T18:47:23Z

Mkolman: Mkolman uploaded a new version of File:Vec novec timed.png

File uploaded with MsUpload

Execution on Intel® Xeon Phi™ co-processor

2017-03-28T13:56:57Z

Mkolman: /* Speedup by vectorization and parallelization */

Execution on Intel® Xeon Phi™ co-processor

2017-03-28T09:40:46Z

Mkolman: /* Speedup by vectorization and parallelization */

== Speedup by parallelization ==

We tested the speedups on the Intel® Xeon Phi™ with the following code:
<syntaxhighlight lang="c++" line>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <omp.h>
#include <math.h>

int main(int argc, char *argv[]) {
int numthreads;
int n;

assert(argc == 3 && "args: numthreads n");
sscanf(argv[1], "%d", &numthreads);
sscanf(argv[2], "%d", &n);

printf("Init...\n");
printf("Start (%d threads)...\n", numthreads);
printf("%d test cases\n", n);

int m = 1000000;
double ttime = omp_get_wtime();

int i;
double d = 0;
#pragma offload target(mic:0)
{
#pragma omp parallel for private (i) schedule(static) num_threads(numthreads)
for(i = 0; i < n; ++i) {
for(int j = 0; j < m; ++j) {
d = sin(d) + 0.1 + j;
d = pow(0.2, d)*j;
}
}
}
double time = omp_get_wtime() - ttime;
fprintf(stderr, "%d %d %.6f\n", n, numthreads, time);
printf("time: %.6f s\n", time);
printf("Done d = %.6lf.\n", d);

return 0;
}
</syntaxhighlight>

The code essentially distributes a problem of size $n\cdot m$ among <syntaxhighlight lang="bash" inline>numthreads</syntaxhighlight> cores,
We tested the time of execution for $n$ from the set $\{1, 10, 20, 50, 100, 200, 500, 1000\}$
and <syntaxhighlight lang="bash" inline> numthreads </syntaxhighlight> from $1$ to $350$. The plots of exectuion times and performance speeups are shown below.

<figure id="times">
[[File:times.png|thumb|center|upright=3|alt=A square of nodes coloured according to the solution(with smaller and larger node density)|<caption>A picture of our solution (with smaller and larger node density)</caption>]]
</figure>

 
<figure id="fig:speedups">
[[File:speedups.png|thumb|center|upright=3|alt=A square of nodes coloured according to the solution(with smaller and larger node density)|<caption>A picture of our solution (with smaller and larger node density)</caption>]]
</figure>

 
The code was compiled using: 
<syntaxhighlight lang="bash" inline> icc -openmp -O3 -qopt-report=2 -qopt-report-phase=vec -o test test.cpp</syntaxhighlight> 
without warnings or errors. Then, in order to offload to Intel Phi, user must be logged in as root: 
<syntaxhighlight lang="bash" inline> sudo su </syntaxhighlight> 
To run correctly, intel compiler and runtime variables must be sourced: 
<syntaxhighlight lang="bash" inline> source /opt/intel/bin/compilervars.sh intel64</syntaxhighlight> 
Finally, the code was tested using the following command, where <syntaxhighlight lang="bash" inline> test </syntaxhighlight> is the name of the compiled executable: 
<syntaxhighlight lang="bash" inline> for n in 1 10 20 50 100 200 500 1000; do for nt in {1..350}; echo $nt $n; ./test $nt $n 2>> speedups.txt; done; done</syntaxhighlight>

== Speedup by vectorization ==

Intel Xeon Phi has a 512 bit of space for simultaneous computation, which means it can calculate 8 double (or 16 single) operations at the same time. This is called vectorization and greatly improves code execution.

Consider the following code of speedtest.cpp:
<syntaxhighlight lang="c++" line>
#include <cmath>
#include <iostream>

int main() {
const int N = 104;
double a[N];
for (int i = 0; i < 1e5; i++)
for (int j = 0; j < N; j++)
a[j] = std::sin(std::exp(a[j]-j)*3 * i + i*j);
std::cout << a[4] << "\n";
return 0;
}
</syntaxhighlight>
Intel's C++ compiler ICPC will successfully vectorize the inner for loop, so that it will run significantly faster than with vectorization disabled.

The code can be compiled with or without vectorization
<syntaxhighlight lang="bash">
$ icpc speedtest.cpp -o vectorized_speedtest -O3
$ icpc speedtest.cpp -o unvectorized_speedtest -O3 -no-vec
</syntaxhighlight>

The below table shows execution times of the code displayed before on different machines with different settings. Two times represent execution time with double and float data type, respectively.

{| class="wikitable"
|-
! Machine
| ASUS ZenBook Pro UX501VW
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon Phi™ Coprocessor SE10/7120
| Intel® Xeon Phi™ Coprocessor SE10/7120
|-
! Compiler
| g++-6.3.1
| g++-4.8.5
| icpc-16.0.1
| icpc-16.0.1 -no-vec
| icpc-16.0.1 -mmic
| icpc-16.0.1 -mmic -no-vec
|-
! Double time[s]
| 0.63 - 0.66
| 0.65 - 0.66
| 0.155 - 0.160
| 0.50 - 0.51
| 0.25 - 0.26
| 11.1 - 11.2
|-
! Float time[s]
| 0.65 - 0.71
| 0.53 - 0.55
| 0.155 - 0.160
| 0.17 - 0.19
| 0.37 - 0.38
| 4.2 - 4.3
|}

We can see massive 44 fold speedup with and without vectorization.

=== Code incapable of vectorization ===

On the other hand there is a very similar code that can not be vectorized. Now all iterations of the inner loop access the same variable instead of each its own element in a list. ICPC is now unable to vectorize the code resulting in no difference when using -no-vec compile flag.

<syntaxhighlight lang="c++" line>
#include <cmath>
#include <iostream>

int main() {
const int N = 104;
double a;
for (int i = 0; i < 1e5; i++)
for (int j = 0; j < N; j++)
a = std::sin(std::exp(a-j)*3 * i + i*j);
std::cout << a << "\n";
return 0;
}
</syntaxhighlight>

{| class="wikitable"
|-
! Machine
| ASUS ZenBook Pro UX501VW
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon Phi™ Coprocessor SE10/7120
| Intel® Xeon Phi™ Coprocessor SE10/7120
|-
! Compiler
| g++-6.3.1
| g++-4.8.5
| icpc-16.0.1
| icpc-16.0.1 -no-vec
| icpc-16.0.1 -mmic
| icpc-16.0.1 -mmic -no-vec
|-
! Double time[s]
| 0.80 - 0.82
| 0.72 - 0.73
| 0.58 - 0.59
| 0.58 - 0.59
| 10.9 - 11.0
| 10.9 - 11.0
|-
! Float time[s]
| 0.69 - 0.72
| 0.66 - 0.67
| 0.32 - 0.33
| 0.32 - 0.34
| 4.1 - 4.2
| 4.1 - 4.2
|}

== Speedup by vectorization and parallelization ==

Consider the following code:
<syntaxhighlight lang="c++" line>#include <stdio.h>
int main(){
double *a, *b, *c;
int i,j,k, ok, n=1000; // or n=10000
// allocated memory on the heap aligned to 64 byte boundary
ok = posix_memalign((void**)&a, 64, n*n*sizeof(double));
ok = posix_memalign((void**)&b, 64, n*n*sizeof(double));
ok = posix_memalign((void**)&c, 64, n*n*sizeof(double));
// initialize matrices
for (i = 0; i < n*n; ++i) {
a[i] = 1;
b[i] = 1;
c[i] = 1;
}
//parallelize via OpenMP on MIC
#pragma omp parallel for
for( i = 0; i < n; i++ ) {
for( k = 0; k < n; k++ ) {
#pragma vector aligned
#pragma ivdep
for( j = 0; j < n; j++ ) {
//c[i][j] = c[i][j] + a[i][k]*b[k][j];
c[i*n+j] = c[i*n+j] + a[i*n+k]*b[k*n+j];
}
}
}
printf("%f\n", c[n]);
}
</syntaxhighlight>

<figure id="times">
[[File:vec_novec_timed.png|thumb|center|upright=3|alt=Times used|<caption>Times used for the sample problem with and without vectorization for two different N.</caption>]]
</figure>

Execution on Intel® Xeon Phi™ co-processor

2017-03-28T09:38:53Z

Mkolman: /* Speedup by vectorization and parallelization */

== Speedup by parallelization ==

We tested the speedups on the Intel® Xeon Phi™ with the following code:
<syntaxhighlight lang="c++" line>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <omp.h>
#include <math.h>

int main(int argc, char *argv[]) {
int numthreads;
int n;

assert(argc == 3 && "args: numthreads n");
sscanf(argv[1], "%d", &numthreads);
sscanf(argv[2], "%d", &n);

printf("Init...\n");
printf("Start (%d threads)...\n", numthreads);
printf("%d test cases\n", n);

int m = 1000000;
double ttime = omp_get_wtime();

int i;
double d = 0;
#pragma offload target(mic:0)
{
#pragma omp parallel for private (i) schedule(static) num_threads(numthreads)
for(i = 0; i < n; ++i) {
for(int j = 0; j < m; ++j) {
d = sin(d) + 0.1 + j;
d = pow(0.2, d)*j;
}
}
}
double time = omp_get_wtime() - ttime;
fprintf(stderr, "%d %d %.6f\n", n, numthreads, time);
printf("time: %.6f s\n", time);
printf("Done d = %.6lf.\n", d);

return 0;
}
</syntaxhighlight>

The code essentially distributes a problem of size $n\cdot m$ among <syntaxhighlight lang="bash" inline>numthreads</syntaxhighlight> cores,
We tested the time of execution for $n$ from the set $\{1, 10, 20, 50, 100, 200, 500, 1000\}$
and <syntaxhighlight lang="bash" inline> numthreads </syntaxhighlight> from $1$ to $350$. The plots of exectuion times and performance speeups are shown below.

<figure id="times">
[[File:times.png|thumb|center|upright=3|alt=A square of nodes coloured according to the solution(with smaller and larger node density)|<caption>A picture of our solution (with smaller and larger node density)</caption>]]
</figure>

 
<figure id="fig:speedups">
[[File:speedups.png|thumb|center|upright=3|alt=A square of nodes coloured according to the solution(with smaller and larger node density)|<caption>A picture of our solution (with smaller and larger node density)</caption>]]
</figure>

 
The code was compiled using: 
<syntaxhighlight lang="bash" inline> icc -openmp -O3 -qopt-report=2 -qopt-report-phase=vec -o test test.cpp</syntaxhighlight> 
without warnings or errors. Then, in order to offload to Intel Phi, user must be logged in as root: 
<syntaxhighlight lang="bash" inline> sudo su </syntaxhighlight> 
To run correctly, intel compiler and runtime variables must be sourced: 
<syntaxhighlight lang="bash" inline> source /opt/intel/bin/compilervars.sh intel64</syntaxhighlight> 
Finally, the code was tested using the following command, where <syntaxhighlight lang="bash" inline> test </syntaxhighlight> is the name of the compiled executable: 
<syntaxhighlight lang="bash" inline> for n in 1 10 20 50 100 200 500 1000; do for nt in {1..350}; echo $nt $n; ./test $nt $n 2>> speedups.txt; done; done</syntaxhighlight>

== Speedup by vectorization ==

Intel Xeon Phi has a 512 bit of space for simultaneous computation, which means it can calculate 8 double (or 16 single) operations at the same time. This is called vectorization and greatly improves code execution.

Consider the following code of speedtest.cpp:
<syntaxhighlight lang="c++" line>
#include <cmath>
#include <iostream>

int main() {
const int N = 104;
double a[N];
for (int i = 0; i < 1e5; i++)
for (int j = 0; j < N; j++)
a[j] = std::sin(std::exp(a[j]-j)*3 * i + i*j);
std::cout << a[4] << "\n";
return 0;
}
</syntaxhighlight>
Intel's C++ compiler ICPC will successfully vectorize the inner for loop, so that it will run significantly faster than with vectorization disabled.

The code can be compiled with or without vectorization
<syntaxhighlight lang="bash">
$ icpc speedtest.cpp -o vectorized_speedtest -O3
$ icpc speedtest.cpp -o unvectorized_speedtest -O3 -no-vec
</syntaxhighlight>

The below table shows execution times of the code displayed before on different machines with different settings. Two times represent execution time with double and float data type, respectively.

{| class="wikitable"
|-
! Machine
| ASUS ZenBook Pro UX501VW
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon Phi™ Coprocessor SE10/7120
| Intel® Xeon Phi™ Coprocessor SE10/7120
|-
! Compiler
| g++-6.3.1
| g++-4.8.5
| icpc-16.0.1
| icpc-16.0.1 -no-vec
| icpc-16.0.1 -mmic
| icpc-16.0.1 -mmic -no-vec
|-
! Double time[s]
| 0.63 - 0.66
| 0.65 - 0.66
| 0.155 - 0.160
| 0.50 - 0.51
| 0.25 - 0.26
| 11.1 - 11.2
|-
! Float time[s]
| 0.65 - 0.71
| 0.53 - 0.55
| 0.155 - 0.160
| 0.17 - 0.19
| 0.37 - 0.38
| 4.2 - 4.3
|}

We can see massive 44 fold speedup with and without vectorization.

=== Code incapable of vectorization ===

On the other hand there is a very similar code that can not be vectorized. Now all iterations of the inner loop access the same variable instead of each its own element in a list. ICPC is now unable to vectorize the code resulting in no difference when using -no-vec compile flag.

<syntaxhighlight lang="c++" line>
#include <cmath>
#include <iostream>

int main() {
const int N = 104;
double a;
for (int i = 0; i < 1e5; i++)
for (int j = 0; j < N; j++)
a = std::sin(std::exp(a-j)*3 * i + i*j);
std::cout << a << "\n";
return 0;
}
</syntaxhighlight>

{| class="wikitable"
|-
! Machine
| ASUS ZenBook Pro UX501VW
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon® CPU E5-2620 v3
| Intel® Xeon Phi™ Coprocessor SE10/7120
| Intel® Xeon Phi™ Coprocessor SE10/7120
|-
! Compiler
| g++-6.3.1
| g++-4.8.5
| icpc-16.0.1
| icpc-16.0.1 -no-vec
| icpc-16.0.1 -mmic
| icpc-16.0.1 -mmic -no-vec
|-
! Double time[s]
| 0.80 - 0.82
| 0.72 - 0.73
| 0.58 - 0.59
| 0.58 - 0.59
| 10.9 - 11.0
| 10.9 - 11.0
|-
! Float time[s]
| 0.69 - 0.72
| 0.66 - 0.67
| 0.32 - 0.33
| 0.32 - 0.34
| 4.1 - 4.2
| 4.1 - 4.2
|}

== Speedup by vectorization and parallelization ==

Consider the following code:
<syntaxhighlight lang="c++" line>#include <stdio.h>
int main(){
double *a, *b, *c;
int i,j,k, ok, n=1000; // or n=10000
// allocated memory on the heap aligned to 64 byte boundary
ok = posix_memalign((void**)&a, 64, n*n*sizeof(double));
ok = posix_memalign((void**)&b, 64, n*n*sizeof(double));
ok = posix_memalign((void**)&c, 64, n*n*sizeof(double));
// initialize matrices
for (i = 0; i < n*n; ++i) {
a[i] = 1;
b[i] = 1;
c[i] = 1;
}
//parallelize via OpenMP on MIC
#pragma omp parallel for
for( i = 0; i < n; i++ ) {
for( k = 0; k < n; k++ ) {
#pragma vector aligned
#pragma ivdep
for( j = 0; j < n; j++ ) {
//c[i][j] = c[i][j] + a[i][k]*b[k][j];
c[i*n+j] = c[i*n+j] + a[i*n+k]*b[k*n+j];
}
}
}
printf("%f\n", c[n]);
}
</syntaxhighlight>

<figure id="times">
[[File:vec_novec_timed.png|thumb|center|upright=3|alt=Times used for the sample problem with and without vectorization for two different N.</caption>]]
</figure>

Execution on Intel® Xeon Phi™ co-processor

2017-03-28T09:35:35Z

Mkolman: /* Speedup by vectorization */

File:Vec novec timed.png

2017-03-28T09:35:13Z

Mkolman: File uploaded with MsUpload

File uploaded with MsUpload

How to build

2017-03-09T12:16:08Z

Mkolman: /* Intel C/C++ Compiler */

=Installation=
To make this work from plain Ubuntu installation, run
<syntaxhighlight lang="bash">
sudo apt-get install cmake doxygen graphviz libboost-dev libsfml-dev libhdf5-serial-dev
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
./run_tests.sh
</syntaxhighlight>
which installs dependancies, clones the repository, goes into the root folder of the repository and runs tests. This will check the configuration, notify you of potentially missing dependencies, build and run all tests, check code style and docs. If this works, you are ready to go! Otherwise install any missing packages and if it still fails, raise an issue!

=Building=

List of dependencies:

* Build tools, like <syntaxhighlight lang="bash" inline> cmake >= 2.8.12</syntaxhighlight>, <syntaxhighlight lang="bash" inline> g++ >= 4.8</syntaxhighlight>, <syntaxhighlight lang="bash" inline>make</syntaxhighlight>
* [http://www.boost.org/ Boost]
* <syntaxhighlight lang="bash" inline> doxygen >= 1.8.8 </syntaxhighlight> and Graphviz
* for drawing [http://www.sfml-dev.org/ SMFL library version 2]
* for IO [https://www.hdfgroup.org/ HDF5 library]

Out of source builds are preferred. Run
<syntaxhighlight lang="bash">
mkdir -p build
cd build
cmake ..
make
</syntaxhighlight>
Note that you only have to run <syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> once, after that only <syntaxhighlight lang="bash" inline> make</syntaxhighlight> is sufficient.

Binaries are placed into <syntaxhighlight lang="bash" inline> bin </syntaxhighlight> folder. Test can be run all at once via <syntaxhighlight lang="bash" inline> make
run_all_tests </syntaxhighlight> or individually via eg. <syntaxhighlight lang="bash" inline>make basisfunc_run_tests </syntaxhighlight>.

==Linear Algebra==
We use [http://eigen.tuxfamily.org/ Eigen] as our matrix library. See
[http://eigen.tuxfamily.org/dox-devel/group__QuickRefPage.html here] for use
reference and documentation. For a quick transition from Matlab see
[http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt here].

==Drawing==
Some tests include drawing. We are using [http://www.sfml-dev.org/ SMFL library], which can be installed on most linux systems easly
as <syntaxhighlight lang="bash" inline> sudo apt-get install libsfml-dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S sfml </syntaxhighlight>. After the
installation uncomment a test case in <syntaxhighlight lang="bash" inline> domain_draw_test.cpp </syntaxhighlight> and run <syntaxhighlight lang="bash" inline> make
test_domain_draw </syntaxhighlight> to see the visual effect.

Binaries using SFML require additional linker flags <syntaxhighlight lang="bash" inline> -lsfml-graphics
-lsfml-window -lsfml-system </syntaxhighlight>, but the makefile should take care of that for you.

==HDF5==

In order to use IO you need [https://www.hdfgroup.org/ hdf5 library].
You can install it easily using the command <syntaxhighlight lang="bash" inline> sudo apt-get install libhdf5-
dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S hdf5-cpp-fortran </syntaxhighlight>.

Ubuntu places hdf5 libs in a werid folder <code>/usr/lib/x86_64-linux-gnu/hdf5/serial/</code>.
If you get error similar to <code>-lhdf5 not found</code> and you have hdf5 installed, you might have to link the libraries into
a discoverable place, like <code>/usr/lib/</code> or add the above directory to the linker path.
If using cmake, you can add the following line to your CMakeLists.txt
link_directories(/usr/lib/x86_64-linux-gnu/hdf5/serial/)

== Using Intel Math Kernel Library (MKL) ==
Eigen has great support for MKL all you have to do is define a EIGEN_USE_MKL_ALL macro before any includes.
You can see further instructions [https://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html on their website].

Besides setting <syntaxhighlight lang="c++" inline>#define EIGEN_USE_MKL_ALL</syntaxhighlight> in your code,
some linker and compilation fixes are needed. You have to set MKL and MKLROOT variables in cmake. You can define
the variable MKLROOT as a system variable (using export) which is enough. You can also define it manually when calling
cmake. If it is not set in either way it will default to "/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl".
<syntaxhighlight lang="bash">
cmake .. -DMKL=ON -DMKLROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl
</syntaxhighlight>
Your target has to be linked with some MKL libraries so make sure to add the following link to your cmake file.
<syntaxhighlight lang="cmake">
target_link_libraries(target ${LMKL})
</syntaxhighlight>

== Using Intel C/C++ Compiler ==

In order to use Intel's compiler you have to first set the <syntaxhighlight lang="bash" inline>CXX</syntaxhighlight>
and <syntaxhighlight lang="bash" inline>CC</syntaxhighlight> bash variables. Before calling
<syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> for the first time you have to export the following:

<syntaxhighlight lang="bash">
export CXX="icpc"
export CC="icc"
</syntaxhighlight>

or you can define the compiler when first calling cmake like so:
<syntaxhighlight lang="bash">
cmake .. -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

You can also compile it directly for Intel® Xeon Phi™ Coprocessor. You do this by adding <syntaxhighlight lang="bash" inline>-Dmmic=ON</syntaxhighlight>
flag to the <syntaxhighlight lang="bash" inline>cmake</syntaxhighlight> command:
<syntaxhighlight lang="bash">
cmake .. -Dmmic=ON -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

Note: All features that depend on system third-party libraries are not available on MIC (Many Integrated Core).
This includes:

* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1HDF5IO.html HDF5IO] class in <syntaxhighlight lang="bash" inline>io.hpp</syntaxhighlight>
* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1Monitor.html Monitor] class in <syntaxhighlight lang="bash" inline>util.hpp</syntaxhighlight>
* all of [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/util_8hpp_source.html <syntaxhighlight lang="bash" inline>draw.hpp</syntaxhighlight>]

How to build

2017-03-09T12:15:41Z

Mkolman: /* Intel Math Kernel Library (MKL) */

=Installation=
To make this work from plain Ubuntu installation, run
<syntaxhighlight lang="bash">
sudo apt-get install cmake doxygen graphviz libboost-dev libsfml-dev libhdf5-serial-dev
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
./run_tests.sh
</syntaxhighlight>
which installs dependancies, clones the repository, goes into the root folder of the repository and runs tests. This will check the configuration, notify you of potentially missing dependencies, build and run all tests, check code style and docs. If this works, you are ready to go! Otherwise install any missing packages and if it still fails, raise an issue!

=Building=

List of dependencies:

* Build tools, like <syntaxhighlight lang="bash" inline> cmake >= 2.8.12</syntaxhighlight>, <syntaxhighlight lang="bash" inline> g++ >= 4.8</syntaxhighlight>, <syntaxhighlight lang="bash" inline>make</syntaxhighlight>
* [http://www.boost.org/ Boost]
* <syntaxhighlight lang="bash" inline> doxygen >= 1.8.8 </syntaxhighlight> and Graphviz
* for drawing [http://www.sfml-dev.org/ SMFL library version 2]
* for IO [https://www.hdfgroup.org/ HDF5 library]

Out of source builds are preferred. Run
<syntaxhighlight lang="bash">
mkdir -p build
cd build
cmake ..
make
</syntaxhighlight>
Note that you only have to run <syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> once, after that only <syntaxhighlight lang="bash" inline> make</syntaxhighlight> is sufficient.

Binaries are placed into <syntaxhighlight lang="bash" inline> bin </syntaxhighlight> folder. Test can be run all at once via <syntaxhighlight lang="bash" inline> make
run_all_tests </syntaxhighlight> or individually via eg. <syntaxhighlight lang="bash" inline>make basisfunc_run_tests </syntaxhighlight>.

==Linear Algebra==
We use [http://eigen.tuxfamily.org/ Eigen] as our matrix library. See
[http://eigen.tuxfamily.org/dox-devel/group__QuickRefPage.html here] for use
reference and documentation. For a quick transition from Matlab see
[http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt here].

==Drawing==
Some tests include drawing. We are using [http://www.sfml-dev.org/ SMFL library], which can be installed on most linux systems easly
as <syntaxhighlight lang="bash" inline> sudo apt-get install libsfml-dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S sfml </syntaxhighlight>. After the
installation uncomment a test case in <syntaxhighlight lang="bash" inline> domain_draw_test.cpp </syntaxhighlight> and run <syntaxhighlight lang="bash" inline> make
test_domain_draw </syntaxhighlight> to see the visual effect.

Binaries using SFML require additional linker flags <syntaxhighlight lang="bash" inline> -lsfml-graphics
-lsfml-window -lsfml-system </syntaxhighlight>, but the makefile should take care of that for you.

==HDF5==

In order to use IO you need [https://www.hdfgroup.org/ hdf5 library].
You can install it easily using the command <syntaxhighlight lang="bash" inline> sudo apt-get install libhdf5-
dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S hdf5-cpp-fortran </syntaxhighlight>.

Ubuntu places hdf5 libs in a werid folder <code>/usr/lib/x86_64-linux-gnu/hdf5/serial/</code>.
If you get error similar to <code>-lhdf5 not found</code> and you have hdf5 installed, you might have to link the libraries into
a discoverable place, like <code>/usr/lib/</code> or add the above directory to the linker path.
If using cmake, you can add the following line to your CMakeLists.txt
link_directories(/usr/lib/x86_64-linux-gnu/hdf5/serial/)

== Using Intel Math Kernel Library (MKL) ==
Eigen has great support for MKL all you have to do is define a EIGEN_USE_MKL_ALL macro before any includes.
You can see further instructions [https://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html on their website].

Besides setting <syntaxhighlight lang="c++" inline>#define EIGEN_USE_MKL_ALL</syntaxhighlight> in your code,
some linker and compilation fixes are needed. You have to set MKL and MKLROOT variables in cmake. You can define
the variable MKLROOT as a system variable (using export) which is enough. You can also define it manually when calling
cmake. If it is not set in either way it will default to "/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl".
<syntaxhighlight lang="bash">
cmake .. -DMKL=ON -DMKLROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl
</syntaxhighlight>
Your target has to be linked with some MKL libraries so make sure to add the following link to your cmake file.
<syntaxhighlight lang="cmake">
target_link_libraries(target ${LMKL})
</syntaxhighlight>

==Intel C/C++ Compiler==

In order to use Intel's compiler you have to first set the <syntaxhighlight lang="bash" inline>CXX</syntaxhighlight>
and <syntaxhighlight lang="bash" inline>CC</syntaxhighlight> bash variables. Before calling
<syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> for the first time you have to export the following:

<syntaxhighlight lang="bash">
export CXX="icpc"
export CC="icc"
</syntaxhighlight>

or you can define the compiler when first calling cmake like so:
<syntaxhighlight lang="bash">
cmake .. -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

You can also compile it directly for Intel® Xeon Phi™ Coprocessor. You do this by adding <syntaxhighlight lang="bash" inline>-Dmmic=ON</syntaxhighlight>
flag to the <syntaxhighlight lang="bash" inline>cmake</syntaxhighlight> command:
<syntaxhighlight lang="bash">
cmake .. -Dmmic=ON -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

Note: All features that depend on system third-party libraries are not available on MIC (Many Integrated Core).
This includes:

* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1HDF5IO.html HDF5IO] class in <syntaxhighlight lang="bash" inline>io.hpp</syntaxhighlight>
* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1Monitor.html Monitor] class in <syntaxhighlight lang="bash" inline>util.hpp</syntaxhighlight>
* all of [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/util_8hpp_source.html <syntaxhighlight lang="bash" inline>draw.hpp</syntaxhighlight>]

How to build

2017-03-09T12:13:06Z

Mkolman: /* Intel */

=Installation=
To make this work from plain Ubuntu installation, run
<syntaxhighlight lang="bash">
sudo apt-get install cmake doxygen graphviz libboost-dev libsfml-dev libhdf5-serial-dev
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
./run_tests.sh
</syntaxhighlight>
which installs dependancies, clones the repository, goes into the root folder of the repository and runs tests. This will check the configuration, notify you of potentially missing dependencies, build and run all tests, check code style and docs. If this works, you are ready to go! Otherwise install any missing packages and if it still fails, raise an issue!

=Building=

List of dependencies:

* Build tools, like <syntaxhighlight lang="bash" inline> cmake >= 2.8.12</syntaxhighlight>, <syntaxhighlight lang="bash" inline> g++ >= 4.8</syntaxhighlight>, <syntaxhighlight lang="bash" inline>make</syntaxhighlight>
* [http://www.boost.org/ Boost]
* <syntaxhighlight lang="bash" inline> doxygen >= 1.8.8 </syntaxhighlight> and Graphviz
* for drawing [http://www.sfml-dev.org/ SMFL library version 2]
* for IO [https://www.hdfgroup.org/ HDF5 library]

Out of source builds are preferred. Run
<syntaxhighlight lang="bash">
mkdir -p build
cd build
cmake ..
make
</syntaxhighlight>
Note that you only have to run <syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> once, after that only <syntaxhighlight lang="bash" inline> make</syntaxhighlight> is sufficient.

Binaries are placed into <syntaxhighlight lang="bash" inline> bin </syntaxhighlight> folder. Test can be run all at once via <syntaxhighlight lang="bash" inline> make
run_all_tests </syntaxhighlight> or individually via eg. <syntaxhighlight lang="bash" inline>make basisfunc_run_tests </syntaxhighlight>.

==Linear Algebra==
We use [http://eigen.tuxfamily.org/ Eigen] as our matrix library. See
[http://eigen.tuxfamily.org/dox-devel/group__QuickRefPage.html here] for use
reference and documentation. For a quick transition from Matlab see
[http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt here].

==Drawing==
Some tests include drawing. We are using [http://www.sfml-dev.org/ SMFL library], which can be installed on most linux systems easly
as <syntaxhighlight lang="bash" inline> sudo apt-get install libsfml-dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S sfml </syntaxhighlight>. After the
installation uncomment a test case in <syntaxhighlight lang="bash" inline> domain_draw_test.cpp </syntaxhighlight> and run <syntaxhighlight lang="bash" inline> make
test_domain_draw </syntaxhighlight> to see the visual effect.

Binaries using SFML require additional linker flags <syntaxhighlight lang="bash" inline> -lsfml-graphics
-lsfml-window -lsfml-system </syntaxhighlight>, but the makefile should take care of that for you.

==HDF5==

In order to use IO you need [https://www.hdfgroup.org/ hdf5 library].
You can install it easily using the command <syntaxhighlight lang="bash" inline> sudo apt-get install libhdf5-
dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S hdf5-cpp-fortran </syntaxhighlight>.

Ubuntu places hdf5 libs in a werid folder <code>/usr/lib/x86_64-linux-gnu/hdf5/serial/</code>.
If you get error similar to <code>-lhdf5 not found</code> and you have hdf5 installed, you might have to link the libraries into
a discoverable place, like <code>/usr/lib/</code> or add the above directory to the linker path.
If using cmake, you can add the following line to your CMakeLists.txt
link_directories(/usr/lib/x86_64-linux-gnu/hdf5/serial/)

== Intel Math Kernel Library (MKL) ==
Eigen has great support for MKL all you have to do is define a EIGEN_USE_MKL_ALL macro before any includes.
You can see further instructions [https://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html on their website].

Besides setting <syntaxhighlight lang="c++" inline>#define EIGEN_USE_MKL_ALL</syntaxhighlight> in your code,
some linker and compilation fixes are needed. You have to set MKL and MKLROOT variables in cmake. You can define
the variable MKLROOT as a system variable (using export) which is enough. You can also define it manually when calling
cmake. If it is not set in either way it will default to "/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl".
<syntaxhighlight lang="bash">
cmake .. -DMKL=ON -DMKLROOT=/opt/intel/compilers_and_libraries_2016.1.150/linux/mkl
</syntaxhighlight>
Your target has to be linked with some MKL libraries so make sure to add the following link to your cmake file.
<syntaxhighlight lang="cmake">
target_link_libraries(target ${LMKL})
</syntaxhighlight>

==Intel C/C++ Compiler==

In order to use Intel's compiler you have to first set the <syntaxhighlight lang="bash" inline>CXX</syntaxhighlight>
and <syntaxhighlight lang="bash" inline>CC</syntaxhighlight> bash variables. Before calling
<syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> for the first time you have to export the following:

<syntaxhighlight lang="bash">
export CXX="icpc"
export CC="icc"
</syntaxhighlight>

or you can define the compiler when first calling cmake like so:
<syntaxhighlight lang="bash">
cmake .. -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

You can also compile it directly for Intel® Xeon Phi™ Coprocessor. You do this by adding <syntaxhighlight lang="bash" inline>-Dmmic=ON</syntaxhighlight>
flag to the <syntaxhighlight lang="bash" inline>cmake</syntaxhighlight> command:
<syntaxhighlight lang="bash">
cmake .. -Dmmic=ON -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

Note: All features that depend on system third-party libraries are not available on MIC (Many Integrated Core).
This includes:

* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1HDF5IO.html HDF5IO] class in <syntaxhighlight lang="bash" inline>io.hpp</syntaxhighlight>
* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1Monitor.html Monitor] class in <syntaxhighlight lang="bash" inline>util.hpp</syntaxhighlight>
* all of [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/util_8hpp_source.html <syntaxhighlight lang="bash" inline>draw.hpp</syntaxhighlight>]

How to build

2017-03-09T10:27:19Z

Mkolman: /* Intel */

=Installation=
To make this work from plain Ubuntu installation, run
<syntaxhighlight lang="bash">
sudo apt-get install cmake doxygen graphviz libboost-dev libsfml-dev libhdf5-serial-dev
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
./run_tests.sh
</syntaxhighlight>
which installs dependancies, clones the repository, goes into the root folder of the repository and runs tests. This will check the configuration, notify you of potentially missing dependencies, build and run all tests, check code style and docs. If this works, you are ready to go! Otherwise install any missing packages and if it still fails, raise an issue!

=Building=

List of dependencies:

* Build tools, like <syntaxhighlight lang="bash" inline> cmake >= 2.8.12</syntaxhighlight>, <syntaxhighlight lang="bash" inline> g++ >= 4.8</syntaxhighlight>, <syntaxhighlight lang="bash" inline>make</syntaxhighlight>
* [http://www.boost.org/ Boost]
* <syntaxhighlight lang="bash" inline> doxygen >= 1.8.8 </syntaxhighlight> and Graphviz
* for drawing [http://www.sfml-dev.org/ SMFL library version 2]
* for IO [https://www.hdfgroup.org/ HDF5 library]

Out of source builds are preferred. Run
<syntaxhighlight lang="bash">
mkdir -p build
cd build
cmake ..
make
</syntaxhighlight>
Note that you only have to run <syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> once, after that only <syntaxhighlight lang="bash" inline> make</syntaxhighlight> is sufficient.

Binaries are placed into <syntaxhighlight lang="bash" inline> bin </syntaxhighlight> folder. Test can be run all at once via <syntaxhighlight lang="bash" inline> make
run_all_tests </syntaxhighlight> or individually via eg. <syntaxhighlight lang="bash" inline>make basisfunc_run_tests </syntaxhighlight>.

==Linear Algebra==
We use [http://eigen.tuxfamily.org/ Eigen] as our matrix library. See
[http://eigen.tuxfamily.org/dox-devel/group__QuickRefPage.html here] for use
reference and documentation. For a quick transition from Matlab see
[http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt here].

==Drawing==
Some tests include drawing. We are using [http://www.sfml-dev.org/ SMFL library], which can be installed on most linux systems easly
as <syntaxhighlight lang="bash" inline> sudo apt-get install libsfml-dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S sfml </syntaxhighlight>. After the
installation uncomment a test case in <syntaxhighlight lang="bash" inline> domain_draw_test.cpp </syntaxhighlight> and run <syntaxhighlight lang="bash" inline> make
test_domain_draw </syntaxhighlight> to see the visual effect.

Binaries using SFML require additional linker flags <syntaxhighlight lang="bash" inline> -lsfml-graphics
-lsfml-window -lsfml-system </syntaxhighlight>, but the makefile should take care of that for you.

==HDF5==

In order to use IO you need [https://www.hdfgroup.org/ hdf5 library].
You can install it easily using the command <syntaxhighlight lang="bash" inline> sudo apt-get install libhdf5-
dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S hdf5-cpp-fortran </syntaxhighlight>.

Ubuntu places hdf5 libs in a werid folder <code>/usr/lib/x86_64-linux-gnu/hdf5/serial/</code>.
If you get error similar to <code>-lhdf5 not found</code> and you have hdf5 installed, you might have to link the libraries into
a discoverable place, like <code>/usr/lib/</code> or add the above directory to the linker path.
If using cmake, you can add the following line to your CMakeLists.txt
link_directories(/usr/lib/x86_64-linux-gnu/hdf5/serial/)

==Intel==

In order to use Intel's compiler you have to first set the <syntaxhighlight lang="bash" inline>CXX</syntaxhighlight>
and <syntaxhighlight lang="bash" inline>CC</syntaxhighlight> bash variables. Before calling
<syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> for the first time you have to export the following:

<syntaxhighlight lang="bash">
export CXX="icpc"
export CC="icc"
</syntaxhighlight>

or you can define the compiler when first calling cmake like so:
<syntaxhighlight lang="bash">
cmake .. -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

You can also compile it directly for Intel® Xeon Phi™ Coprocessor. You do this by adding <syntaxhighlight lang="bash" inline>-Dmmic=ON</syntaxhighlight>
flag to the <syntaxhighlight lang="bash" inline>cmake</syntaxhighlight> command:
<syntaxhighlight lang="bash">
cmake .. -Dmmic=ON -DCMAKE_C_COMPILER=$(which icc) -DCMAKE_CXX_COMPILER=$(which icpc)
</syntaxhighlight>

Note: All features that depend on system third-party libraries are not available on MIC (Many Integrated Core).
This includes:

* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1HDF5IO.html HDF5IO] class in <syntaxhighlight lang="bash" inline>io.hpp</syntaxhighlight>
* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1Monitor.html Monitor] class in <syntaxhighlight lang="bash" inline>util.hpp</syntaxhighlight>
* all of [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/util_8hpp_source.html <syntaxhighlight lang="bash" inline>draw.hpp</syntaxhighlight>]

Execution on Intel® Xeon Phi™ co-processor

2017-03-01T17:41:24Z

Mkolman: /* Speedup by vectorization */

Execution on Intel® Xeon Phi™ co-processor

2017-03-01T17:38:44Z

Mkolman: /* Speedup by vectorization */

Execution on Intel® Xeon Phi™ co-processor

2017-03-01T17:37:58Z

Mkolman: /* Speedup by vectorization */

Execution on Intel® Xeon Phi™ co-processor

2017-03-01T17:21:11Z

Mkolman: /* Speedup by vectorization */

Execution on Intel® Xeon Phi™ co-processor

2017-03-01T16:45:45Z

Mkolman: /* Speedup by vectorization */

Execution on Intel® Xeon Phi™ co-processor

2017-03-01T16:45:14Z

Mkolman:

How to build

2016-12-28T11:27:22Z

Mkolman: /* Intel */

=Installation=
To make this work from plain Ubuntu installation, run
<syntaxhighlight lang="bash">
sudo apt-get install cmake doxygen graphviz libboost-dev libsfml-dev libhdf5-serial-dev
git clone https://gitlab.com/e62Lab/e62numcodes.git
cd e62numcodes
./run_tests.sh
</syntaxhighlight>
which installs dependancies, clones the repository, goes into the root folder of the repository and runs tests. This will check the configuration, notify you of potentially missing dependencies, build and run all tests, check code style and docs. If this works, you are ready to go! Otherwise install any missing packages and if it still fails, raise an issue!

=Building=

List of dependencies:

* Build tools, like <syntaxhighlight lang="bash" inline> cmake >= 2.8.12</syntaxhighlight>, <syntaxhighlight lang="bash" inline> g++ >= 4.8</syntaxhighlight>, <syntaxhighlight lang="bash" inline>make</syntaxhighlight>
* [http://www.boost.org/ Boost]
* <syntaxhighlight lang="bash" inline> doxygen >= 1.8.8 </syntaxhighlight> and Graphviz
* for drawing [http://www.sfml-dev.org/ SMFL library version 2]
* for IO [https://www.hdfgroup.org/ HDF5 library]

Out of source builds are preferred. Run
<syntaxhighlight lang="bash">
mkdir -p build
cd build
cmake ..
make
</syntaxhighlight>
Note that you only have to run <syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> once, after that only <syntaxhighlight lang="bash" inline> make</syntaxhighlight> is sufficient.

Binaries are placed into <syntaxhighlight lang="bash" inline> bin </syntaxhighlight> folder. Test can be run all at once via <syntaxhighlight lang="bash" inline> make
run_all_tests </syntaxhighlight> or individually via eg. <syntaxhighlight lang="bash" inline>make basisfunc_run_tests </syntaxhighlight>.

==Linear Algebra==
We use [http://eigen.tuxfamily.org/ Eigen] as our matrix library. See
[http://eigen.tuxfamily.org/dox-devel/group__QuickRefPage.html here] for use
reference and documentation. For a quick transition from Matlab see
[http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt here].

==Drawing==
Some tests include drawing. We are using [http://www.sfml-dev.org/ SMFL library], which can be installed on most linux systems easly
as <syntaxhighlight lang="bash" inline> sudo apt-get install libsfml-dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S sfml </syntaxhighlight>. After the
installation uncomment a test case in <syntaxhighlight lang="bash" inline> domain_draw_test.cpp </syntaxhighlight> and run <syntaxhighlight lang="bash" inline> make
test_domain_draw </syntaxhighlight> to see the visual effect.

Binaries using SFML require additional linker flags <syntaxhighlight lang="bash" inline> -lsfml-graphics
-lsfml-window -lsfml-system </syntaxhighlight>, but the makefile should take care of that for you.

==HDF5==

In order to use IO you need [https://www.hdfgroup.org/ hdf5 library].
You can install it easily using the command <syntaxhighlight lang="bash" inline> sudo apt-get install libhdf5-
dev </syntaxhighlight> or <syntaxhighlight lang="bash" inline> sudo pacman -S hdf5-cpp-fortran </syntaxhighlight>.

Ubuntu places hdf5 libs in a werid folder <code>/usr/lib/x86_64-linux-gnu/hdf5/serial/</code>.
If you get error similar to <code>-lhdf5 not found</code> and you have hdf5 installed, you might have to link the libraries into
a discoverable place, like <code>/usr/lib/</code> or add the above directory to the linker path.
If using cmake, you can add the following line to your CMakeLists.txt
link_directories(/usr/lib/x86_64-linux-gnu/hdf5/serial/)

==Intel==

In order to use Intel's compiler you have to first set the <syntaxhighlight lang="bash" inline>CXX</syntaxhighlight>
and <syntaxhighlight lang="bash" inline>CC</syntaxhighlight> bash variables. Before calling
<syntaxhighlight lang="bash" inline> cmake </syntaxhighlight> for the first time you have to export the following:

<syntaxhighlight lang="bash">
export CXX="icpc"
export CC="icc"
</syntaxhighlight>

You can also compile it directly for Intel® Xeon Phi™ Coprocessor. You do this by adding <syntaxhighlight lang="bash" inline>-Dmmic=ON</syntaxhighlight>
flag to the <syntaxhighlight lang="bash" inline>cmake</syntaxhighlight> command:
<syntaxhighlight lang="bash">
cmake .. -Dmmic=ON
</syntaxhighlight>

Note: All features that depend on system third-party libraries are not available on MIC (Many Integrated Core).
This includes:

* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1HDF5IO.html HDF5IO] class in <syntaxhighlight lang="bash" inline>io.hpp</syntaxhighlight>
* [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/classmm_1_1Monitor.html Monitor] class in <syntaxhighlight lang="bash" inline>util.hpp</syntaxhighlight>
* all of [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/technical_docs/html/util_8hpp_source.html <syntaxhighlight lang="bash" inline>draw.hpp</syntaxhighlight>]

How to build

2016-12-22T16:12:22Z

Mkolman: /* Intel */

How to build

2016-12-22T15:10:08Z

Mkolman: /* Building */

Solid Mechanics

2016-11-07T13:50:55Z

Mkolman: /* Point contact on a 2D half-plane */

=Basic equations of elasticity=

To determine the distribution of static stresses and displacements in a solid body we must obtain a solution (either analytically or numerically) to the basic equations of the theory of elasticity, satisfying the boundary conditions on forces and/or displacements. The equations thus form a boundary value problem. For a general three dimensional solid object the equations governing its behavior are:
* equations of equilibrium (3)
* strain-displacement equations (6)
* stress-strain equations (6)
where the number in brackets indicates the number of equations. The equations of equilibrium are three tensor partial differential equations for the balance of linear momentum, the strain-displacement equations are relations that stem from infinitesimal strain theory and the stress-strain equations are a set of linear algebraic constitutive relations (3D Hooke's law). In two dimensions these equations simplify to 8 equations (2 equilibrium, 3 strain-displacement, and 3 stress-strain).

A large amount of confusion regarding linear elasticity originates from the many different notations used in this field. For this reason we first provide an overview of the different notations that are used.

=== Direct tensor form ===
In direct tensor form (independent of coordinate system) the governing equations are:
*Equation of motion (an expression of Newton's second law):
\[
\boldsymbol{\nabla} \cdot \boldsymbol{\sigma} + \boldsymbol{F} = \rho \ddot{\boldsymbol{u}}
\]
*Strain-displacement equations:
\[
\boldsymbol{\varepsilon} = \frac{1}{2}\left[ \boldsymbol{\nabla} \boldsymbol{u} + (\boldsymbol{\nabla} \boldsymbol{u})^T \right]
\]
*Stress-displacement equations (constitutive equations). For a linear elastic material this is Hooke's law:
\[
\boldsymbol{\sigma} = \boldsymbol{C} : \boldsymbol{\varepsilon}
\]

where $\boldsymbol{\sigma}$ is the Cauchy stress tensor, $\boldsymbol{\varepsilon}$ is the infinitesimal strain tensor, $\boldsymbol{u}$ is the displacement vector, $\boldsymbol{C}$ is the fourth order stiffness tensor, $\boldsymbol{F}$ is the body force per unit volume (a vector quantity), $\rho$ is the mass density, $\boldsymbol{\nabla}(\bullet)$ is the gradient operator, $\boldsymbol{\nabla}\cdot(\bullet)$ is the divergence operator, $(\bullet)^T$ represents a transpose, $\ddot{(\bullet)}$ is the second derivative with respect to time, and $\boldsymbol{A}:\boldsymbol{B}$ is the inner product of two second order tensors (a tensor contraction).

=== Cartesian coordinate form ===

Using the Einstein summation convention (implied summations over repeated indexes) the equations are:
*Equation of motion (an expression of Newton's second law):
\[
\sigma_{ji,j} + F_i = \rho \partial_{tt} u_i
\]
*Strain-displacement equations:
\[
\varepsilon_{ij} = \frac{1}{2}\left(u_{j,i}+u_{i,j}\right)
\]
*Stress-displacement equations (constitutive equations). For a linear elastic material this is Hooke's law:
\[
\sigma_{i,j} = C_{ijkl} \varepsilon_{kl},
\]
where $i,j = 1,2,3$ represent, respectively, $x$, $y$ and $z$, the $(\bullet),j$ subscript is a shorthand for partial derivative $\partial(\bullet)/\partial x_j$ and $\partial_{tt}$ is shorthand notation for $\partial^2/\partial t^2$, $\sigma_{ij}=\sigma_{ji}$ is the Cauchy stress tensor (with 6 independent components), $F_i$ are the body forces, $\rho$ is the mass density, $u_i$ is the displacement, $\varepsilon_{ij} = \varepsilon_{ji}$ is the strain tensor (also with 6 independent components), and, finally $C_{ijkl}$ is the fourth-order stiffness tensor that due to symmetry requirements $C_{ijkl} = C_{klij} = C_{jikl} =C_{ijkl}$ can be reduced to 21 different elements.

=== Matrix-vector (FEM) notation ===

*Equation of motion:
\[
\boldsymbol{L}^T\boldsymbol{\sigma} + \boldsymbol{F} = \rho\ddot{\boldsymbol{u}}
\]
*Strain-displacement equations:
\[
\boldsymbol{\varepsilon} = \boldsymbol{L}\boldsymbol{u}
\]
*Stress-displacement equations (constitutive equations). For a linear elastic material this is Hooke's law:
\[
\boldsymbol{\sigma} = \boldsymbol{C}\boldsymbol{\varepsilon}
\]
where $\boldsymbol{\sigma}$ is the Cauchy stress tensor represented in vector form (6 components), $\boldsymbol{L}$ is a differential operator matrix (size $3 \times 6$), $\bullet^T$ is transpose of a matrix $\bullet$, $\boldsymbol{F}$ is the body force vector, $\rho$ is the mass density, $\boldsymbol{u}$ is the displacement vector, $\boldsymbol{\varepsilon}$ is the strain tensor represented in vector form (6 components), and $\boldsymbol{C}$ is the symmetric stress-strain matrix (size $6 \times 6$ with 21 material constants $C_{ij} = C_{ji}$). Certain literature prefers using the symbol $\boldsymbol{D}$ instead of $\boldsymbol{C}$ for the stress-strain matrix. The symbol $C$ is then used as the "compliance tensor" that relates the strains to the stresses, e.g. $\boldsymbol{\varepsilon} = \boldsymbol{C}\boldsymbol{\sigma}$.

To solve the basic equations of elasticity two approaches exist according to the boundary conditions of the boundary value problem. In the '''displacement formulation''' the displacements are prescribed everywhere at the boundaries and the stresses and strains are eliminated from the formulation. The other possible option is that the surface tractions are prescribed everywhere on the surface boundary. The equations of elasticity are then manipulated to leave the stresses as the unknown to be solved for. This approach is known as the '''stress formulation'''.

== Displacement formulation ==

In this approach our goal is to eliminate the strains and stresses from the equations and leave only the displacements as the unknown to be solved for. The first step is to substitute

=== Navier equations ===
The Navier or Navier-Cauchy equations describe the dynamic of a solid through the displacement vector field $\b{u}$. The equation is as follows
\begin{equation}
\rho \frac{\partial^2 \b{u}}{\partial t^2} = \mu \nabla^2 \b{u} + (\lambda + \mu) \nabla(\nabla \cdot \b{u}) + \b{F}\ ,
\end{equation}
where $\mu$ and $\lambda$ are Lamé constants, $\rho$ is the object density and $\b{F}$ are the external forces.

== Stress formulation ==
* [https://en.wikipedia.org/wiki/Linear_elasticity#Stress_formulation Stress formulation at Wikipedia]
* [https://en.wikipedia.org/wiki/Stress_functions Stress functions at Wikipedia]



== Two-dimensional stress distributions ==
Many problems in elasticity can be simplified as two-dimensional problems described by ''plane theory of elasticity''. In general there are two types of problems we may encounter in plane analysis: '''plane stress''' and '''plane strain'''. The first problem arises in analysis of thin plates loaded in the plane of the plate, while the second is used for elongated bodies of constant cross section subject to uniform loading.

=== Plane stress ===

Plane stress distributions build on the assumption that the normal stress and shear stresses directed perpendicular to the $x$-$y$ plane are assumed zero:
\begin{equation}\label{eq:pstress_assump}
\sigma_{zz} = \sigma_{zx} = \sigma_{zy} = 0.
\end{equation}
It is also assumed that the stress components do not vary through the thickness of the plate (the assumptions do violate some compatibility conditions, but are still sufficiently accurate for practical applications if the plate is thin).

Using (\ref{eq:pstress_assump}) the three-dimensional Hooke's law can be reduced to:
\begin{equation}\label{eq:planestress}
\boldsymbol{\sigma} = \boldsymbol{C}\boldsymbol{\varepsilon}
\end{equation}
in matrix form where for isotropic materials, we have
\begin{equation}\label{eq:planestressmatrix}
\boldsymbol{C}
= \frac{E}{1-\nu^2}
\begin{bmatrix}
1 & \nu & 0 \\
\nu & 1 & 0 \\
0 & 0 & \frac{1-\nu}{2}
\end{bmatrix} \qquad \text{(Plane stress)}
\end{equation}
and \[\boldsymbol{\sigma} = \{\sigma_{xx} \quad \sigma_{yy} \quad \sigma_{xy}\},\] \[\boldsymbol{\varepsilon} = \{\varepsilon_{xx} \quad \varepsilon_{yy} \quad \varepsilon_{xy}\},\] where the brackets $\{\,\}$ indicate that these are column vectors.

=== Plane strain ===

The plane strain problem arises in analysis of walls, dams, tunnels where one dimension of the structure is very large in comparison to the other two dimensions ($x$- and $y$- coordinates). It is also appropriate for small-scale problems such as bars and rollers compressed by forces normal to their cross section. In all such problems the body may be imagined as a prismatic cylinder with one dimension much larger that the other two. The applied forces act in the $x$-$y$ plane and do not vary in the $z$ direction, leading to the assumption
\begin{equation}
\frac{\partial}{\partial z} = u_z = 0.
\end{equation}

With the above assumption it follows immediately that
\begin{equation}
\varepsilon_{zz} = \varepsilon_{zx} = \varepsilon_{zy} = 0.
\end{equation}
The three dimensional Hooke's law can now be reduced to
\begin{equation}\label{eq:planestrain}
\boldsymbol{\sigma} = \boldsymbol{C}\boldsymbol{\varepsilon}
\end{equation}
where the matrix $C$ is given by
\begin{equation}\label{eq:matrixplanestrain}
\boldsymbol{C}
= \frac{E}{(1+\nu)(1-2\nu)}
\begin{bmatrix}
1-\nu & \nu & 0 \\
\nu & 1-\nu & 0 \\
0 & 0 & \frac{1-2\nu}{2}
\end{bmatrix} \qquad \text{(Plane strain)}
\end{equation}
The vectors $\boldsymbol{\sigma}$ and $\boldsymbol{\varepsilon}$ are the same as above for plane stress. In the case of plane strain we have additonal non-zero components of the stress tensor:
\begin{equation}\label{eq:sigmazz}
\sigma_{zz} = \nu(\sigma_{xx}+\sigma_{yy})
\end{equation}
\begin{equation}
\sigma_{yz} = \sigma_{zx} = 0
\end{equation}
The reason $\sigma_{zz}$ is not included in the matrix stress-strain equation (\ref{eq:planestrain}) is because it is linearly dependent on the normal stresses $\sigma_{xx}$ and $\sigma_{zz}$.

=== Connection between plane stress and plane strain ===

For '''isotropic''' materials with elastic modulus $E$ and Poisson's ratio $\nu$ it is possible to go from plane stress to plane strain, or vice-versa, by replacing $E$ and $\nu$ in the stress-strain matrix with a fictitious modulus $E^*$ and fictitious Poisson ratio $\nu^*$. This allows us to "reuse" a plane stress program to solve plane strain or again vice-versa (as long as the material is isotropic). A few exercises on this topic are given [http://www.colorado.edu/engineering/cas/courses.d/IFEM.d/IFEM.Ch14.d/IFEM.Ch14.pdf at this link (page 13)].

To go from plane stres'''s''' ($s$) to plane strai'''n''' ($n$) insert the fictitious quantities
\begin{equation}\label{eq:ston1}
E_n^* = \frac{E_s}{1-\nu_s^2},
\end{equation}
\begin{equation}\label{eq:ston2}
\nu_n^* = \frac{\nu_s}{1-\nu_s}.
\end{equation}

<div class="toccolours mw-collapsible mw-collapsed">
'''Substitution from plane stress to plane strain'''
<div class="mw-collapsible-content">Note: this derivation is shown just to confirm the above formulas. In a numerical program, we keep the stress-strain matrix and just use the above formulas (\ref{eq:ston1}) and (\ref{eq:ston2}) to update the values of $E$ and $\nu$. Also note we have omitted the indexes ($s$) and ($n$) in the following derivations.

We start with the plane stress matrix with the inserted fictitious values $E^*$ and $\nu^*$
\[
\frac{E^*}{1-{\nu^*}^2}
\begin{bmatrix}
1 & \nu^* & 0 \\
\nu^* & 1 & 0 \\
0 & 0 & \frac{1}{2}(1-\nu^*)
\end{bmatrix}
\]
and make the above substitutions (\ref{eq:ston1}) and (\ref{eq:ston2}) leading to:
\[
\frac{E}{\left(1-\nu^2\right)\left(1-\left(\frac{\nu}{1-\nu}\right)^2\right)}
\begin{bmatrix}
1 & \frac{\nu}{1-\nu} & 0 \\
\frac{\nu}{1-\nu} & 1 & 0 \\
0 & 0 & \frac{1}{2}(1-\frac{\nu}{1-\nu})
\end{bmatrix}.
\]
We can then use the rule to convert sums of squares into products as well as bring the factor $1/(1-v)$ out of the matrix:
\[
\frac{E}{\left(1-\nu\right)\left(1+\nu\right)\left(1-\frac{\nu}{1-\nu}\right)\left(1+\frac{\nu}{1-\nu}\right)\left(1-\nu\right)}
\begin{bmatrix}
1-\nu & \nu & 0 \\
\nu & 1-\nu & 0 \\
0 & 0 & \frac{1}{2}(1-2\nu)
\end{bmatrix}.
\]
By joining some of the factors to a common denominator and rearranging leads to
\[
\frac{E\left(1-\nu\right)\left(1-\nu\right)}{\left(1-\nu\right)\left(1+\nu\right)\left(1-2\nu\right)1\left(1-\nu\right)}
\begin{bmatrix}
1-\nu & \nu & 0 \\
\nu & 1-\nu & 0 \\
0 & 0 & \frac{1}{2}(1-2\nu)
\end{bmatrix}.
\]
The final step is canceling the factors that occur in both the numerator and denominator
\[
\frac{E}{\left(1+\nu\right)\left(1-2\nu\right)}
\begin{bmatrix}
1-\nu & \nu & 0 \\
\nu & 1-\nu & 0 \\
0 & 0 & \frac{1}{2}(1-2\nu)
\end{bmatrix}
\]
which leads exactly to the relationship for plane strain given in (\ref{eq:planestrain}).
</div>
</div>

In the opposite case we want to go from plane strai'''n''' ($n$) to plane stres'''s''' ($s$) we can use:
\begin{equation}\label{eq:ntos1}
E_s^* = \frac{E_n(1+2\nu_n)}{(1+\nu_n)^2},
\end{equation}
\begin{equation}\label{eq:ntos2}
\nu_s^* = \frac{\nu_n}{1+\nu_n}.
\end{equation}

<div class="toccolours mw-collapsible mw-collapsed">
'''Substitution from plane strain to plane stress'''
<div class="mw-collapsible-content">
Note that we have again omitted the indexes ($s$) and ($n$) since our wish is expressed by the bold title.

We start with the plane strain matrix with the inserted fictitious values $E^*$ and $\nu^*$
\[
\frac{E^*}{\left(1+\nu^*\right)\left(1-2\nu^*\right)}
\begin{bmatrix}
1-\nu^* & \nu^* & 0 \\
\nu^* & 1-\nu^* & 0 \\
0 & 0 & \frac{1}{2}(1-2\nu^*)
\end{bmatrix}
\]
and make the above substitutions (\ref{eq:ntos1}) and (\ref{eq:ntos2}) leading to:
\[
\frac{E(1+2\nu)}{\left(1+\nu\right)^2\left(1+\frac{\nu}{1+\nu}\right)\left(1-2\frac{\nu}{1+\nu}\right)}
\begin{bmatrix}
1-\frac{\nu}{1+\nu} & \frac{\nu}{1+\nu} & 0 \\
\frac{\nu}{1+\nu} & 1-\frac{\nu}{1+\nu} & 0 \\
0 & 0 & \frac{1}{2}(1-2\frac{\nu}{1+\nu})
\end{bmatrix}.
\]
Writing some of the sums with common denominators and rearranging leads to
\[
\frac{E(1+2\nu)(1+\nu)(1+\nu)}{\left(1+\nu\right)^2\left(1+2\nu\right)\left(1-\nu\right)}
\begin{bmatrix}
\frac{1}{1+\nu} & \frac{\nu}{1+\nu} & 0 \\
\frac{\nu}{1+\nu} & \frac{1}{1+\nu} & 0 \\
0 & 0 & \frac{1}{2}\left(\frac{1-\nu}{1+\nu}\right)
\end{bmatrix}.
\]
We can also bring the factor $1/(1+\nu)$ out from the matrix components. After canceling all factors that occur in both denominator and numerator we are left with
\[
\frac{E}{\left(1+\nu\right)\left(1-\nu\right)}
\begin{bmatrix}
1 & \nu & 0 \\
\nu & 1 & 0 \\
0 & 0 & \frac{1}{2}(1-\nu)
\end{bmatrix}.
\]
Rewriting the product of sums as a sum of squares gives us the matrix for plane stress in (\ref{eq:planestress}).
</div>
</div>

=Point contact on a 2D half-plane=

A starting point to solve problems in contact mechanics is to understand the effect of a point-load applied to a homogeneous, linear elastic, isotropic half-plane. This problem may be defined either as plane stress or plain strain (for solution with FreeFem++ we have choosen the latter). The traction boundary conditions for this problem are:
\begin{equation}\label{eq:bc}
\sigma_{xy}(x,0) = 0, \quad \sigma_{yy}(x,y) = -P\delta(x,y)
\end{equation}
where $\delta(x,y)$ is the Dirac delta function. Together these boundary conditions state that there is a singular normal force $P$ applied at $(x,y) = (0,0)$ and there are no shear stresses on the surface of the elastic half-plane.

The analytical relations for the stresses can be found from the [https://en.wikipedia.org/wiki/Flamant_solution Flamant solution] (stress distributions in a linear elastic wedge loaded by point forces a the tip. When the "wedge" is flat we get a half-plane. The derivation uses polar coordinates.) and are given as:
\begin{equation}
\sigma_{xx} = -\frac{2P}{\pi} \frac{x^2y}{\left(x^2+y^2\right)^2},
\end{equation}
\begin{equation}
\sigma_{yy} = -\frac{2P}{\pi} \frac{y^3}{\left(x^2+y^2\right)^2},
\end{equation}
\begin{equation}
\sigma_{xy} = -\frac{2P}{\pi} \frac{xy^2}{\left(x^2+y^2\right)^2},
\end{equation}
for some point $(x,y)$ in the half-plane. From this stress field the strain components and thus the displacements $(u_x,u_y)$ can be determined. The displacements are given by
\begin{align}
u_x &= -\frac{P}{4\pi\mu}\left((\kappa-1)\theta - \frac{2xy}{r^2}\right), \label{eq:dispx}\\
u_y &= -\frac{P}{4\pi\mu}\left((\kappa+1)\log r + \frac{2x^2}{r^2}\right), \label{eq:dispy}
\end{align}
where $$r = \sqrt{x^2+y^2}$$ and $$\tan \theta = \frac{x}{y}.$$ The symbol $\kappa$ is known as Dundars constant and is defined as
\[
\kappa = \begin{cases} 3 - 4\nu & \quad \text{(Plane strain)}, \\
\cfrac{3 - \nu}{1+\nu} & \quad \text{(Plane stress)}. \end{cases}
\]
The last remaining symbol is $\mu$ which represents the shear modulus (sometimes also denoted with $G$).

==Numerical solution with [http://www.freefem.org/ FreeFem++]==
Due to the known analytical solution the point-contact problem can be used for benchmarking numerical PDE solvers in terms of accuracy (as well as computational efficiency). The purpose of this section is to compare the numerical solution obtained by FreeFem++ with the analytical solution, as well as provide a reference numerical solution for the [http://www-e6.ijs.si/ParallelAndDistributedSystems/MeshlessMachine/wiki/index.php/Main_Page C++ library] developed in our laboratory.

For purposes of simplicity we limit ourselves to the domain $(x,y) \in \Omega = [-1,1] \times[-1,-0.1]$ and prescribe Dirichlet displacement on the boundaries $\Gamma_D$ from the known analytical solution (\ref{eq:dispx}, \ref{eq:dispy}). This way we avoid having to deal with the Dirac delta traction boundary condition (\ref{eq:bc}). The problem can be described as find $\boldsymbol{u(\boldsymbol{x})}$ that satisfies
\begin{equation}
\boldsymbol{\nabla}\cdot\boldsymbol{\sigma}= 0 \qquad \text{on }\Omega
\end{equation}
and
\begin{equation}
\boldsymbol{u} = \boldsymbol{u}_{\text{analytical}} \qquad \text{on }\Gamma_D
\end{equation}
where $\boldsymbol{u}_\text{analytical}$ is given in equations (\ref{eq:dispx}) and (\ref{eq:dispy}).

To solve the point-contact problem in FreeFem++ we must first provide the weak form of the balance equation:
\begin{equation*}
\boldsymbol{\nabla}\cdot\boldsymbol{\sigma} + \boldsymbol{b} = 0.
\end{equation*}
The corresponding weak formulation is
\begin{equation}\label{eq:weak}
\int_\Omega \boldsymbol{\sigma} : \boldsymbol{\varepsilon}(\boldsymbol{v}) \, d\Omega - \int_\Omega \boldsymbol{b}\cdot\boldsymbol{v}\,d\Omega = 0
\end{equation}
where $:$ denotes the tensor scalar product (tensor contraction), i.e. $\boldsymbol{A}:\boldsymbol{B} =\sum_{i,j} A_{ij}B_{ij}$. The vector $\boldsymbol{v}$ is the test function or so-called "virtual displacement".

Equation (\ref{eq:weak}) can be handed to FreeFem++ with the help of [https://en.wikipedia.org/wiki/Voigt_notation#Mandel_notation Voigt or Mandel notation], that reduces the symmetric tensors $\boldsymbol{\sigma}$ and $\boldsymbol{\varepsilon}$ to vectors. The benefit of [https://en.wikipedia.org/wiki/Voigt_notation#Mandel_notation Mandel notation] is that it allows the tensor scalar product to be performed as a scalar product of two vectors.
For this reason we create the following macros:
macro u [ux,uy] // displacements
macro v [vx,vy] // test function
macro b [bx,by] // body forces
macro e(u) [dx(u[0]),dy(u[1]),(dx(u[1])+dy(u[0]))/2] // strain (for post-processing)
macro em(u) [dx(u[0]),dy(u[1]),sqrt(2)*(dx(u[1])+dy(u[0]))/2] // strain in Mandel notation
macro A [[2*mu+lambda,mu,0],[mu,2*mu+lambda,0],[0,0,2*mu]] // stress-strain matrix

The weak form (\ref{eq:weak}) can then be expressed naturally in FreeFem++ syntax as
int2d(Th)((A*em(u))'*em(v)) - int2d(Th)(b'*v)

===Stress and displacement fields===

===Convergence studies===

<figure id="fig:point_contact_convergence">
[[File:Convergence.png|thumb|upright=2|<caption>Convergence results for the point contact problem. The colours blue, red and green represent linear, quadratic and cubic finite elements, respectively.</caption>]]
</figure>

For the convergence between analytical and numerical solution we vary the number of nodes by increasing the grid size in both $x$- and $y$- directions simultaneously by powers of two from $2^2$ (16 nodes all together) to $2^7$ (16384 nodes all together).
The $L^2$ error norm is used to measure the "difference" between solutions. Since the displacements are the variable that obtain from FreeFem++, we use the displacement magnitude $|\boldsymbol{u}| = \sqrt{u_x^2+u_y^2}$ to define our $L^2$-error norm. The exact equation we have used is
\begin{equation}
L^2\text{-norm} = \sqrt{\frac{\int_\Omega (|\boldsymbol{u_{\text{numerical}}}|-|\boldsymbol{u_{\text{analytical}}}|)^2d\Omega}{\int_\Omega|\boldsymbol{u_{\text{analytical}}}|^2d\Omega}}.
\end{equation}
Results are shown in <xr id="fig:point_contact_convergence"/>.

=Contact between parallel cylinders=

=References=
* Theory of matrix structural analysis

Solid Mechanics

2016-10-27T13:29:25Z

Mkolman: /* Strain-displacement equations */

Solid Mechanics

2016-10-27T13:07:21Z

Mkolman: /* Connection between plane stress and plane strain */