Pardiso
From Medusa: Coordinate Free Mehless Method implementation
Revision as of 12:21, 19 July 2017 by Mdepolli (talk | contribs) (Created page with "=Pardiso library= Pardiso library can be downloaded from [http://www.pardiso-project.org/]. There are also examples of use, the manual, etc. Note that the source of the libra...")
Pardiso library
Pardiso library can be downloaded from [1]. There are also examples of use, the manual, etc. Note that the source of the library is not available. There are only pre-compiled objects available on download after one registers on the site.
- The package PARDISO is a thread-safe, high-performance, robust, memory efficient and easy to use software for solving large sparse symmetric and unsymmetric linear systems of equations on shared-memory and distributed-memory multiprocessors. The solver uses a combination of left- and right-looking Level-3 BLAS supernode techniques".
- Pardiso is proprietary software but is available for academic use; by registering a yearly license can be obtained.
- Parallelization is implemented using both OpenMP and MPI.
Example of use
This minimal example makes use of the OpenMP parallelization to run Pardiso solver on a sample matrix.
test.cpp:
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <iostream>
4 #include <sstream>
5 #include <vector>
6 #include <cassert>
7 #include <math.h>
8
9 // PARDISO prototype.
10 // There is no pardiso.h, therefore function prototypes have to be defined within the user's code
11 extern "C" void pardisoinit (void *, int *, int *, int *, double *, int *);
12 extern "C" void pardiso (void *, int *, int *, int *, int *, int *,
13 double *, int *, int *, int *, int *, int *,
14 int *, double *, double *, int *, double *);
15 extern "C" void pardiso_chkmatrix (int *, int *, double *, int *, int *, int *);
16 extern "C" void pardiso_chkvec (int *, int *, double *, int *);
17 extern "C" void pardiso_printstats (int *, int *, double *, int *, int *, int *, double *, int *);
18
19 struct TestCase {
20 // the dimension of the problem, Matrix A will have n*n elements, while vectors b and x will have n elements
21 int n;
22 // indices of individual rows in compact matrix definition: row[0] contains non zero elements at column indices ia[0] .. (ia[1]-1)
23 std::vector<int> ia;
24 // column indices of non-zero elements
25 std::vector<int> ja;
26 // values of non-zero elements
27 std::vector<double> A;
28 // right hand side
29 std::vector<double> b;
30 // solution vector
31 std::vector<double> x;
32
33 // Pardiso workspace variable (using void* as the type ensures enough space is allocated on 32 and 64-bit systems)
34 void* pt[64];
35 // Pardiso solver parameter variables
36 int iparm[64];
37 double dparm[64];
38
39 void createTestMatrix(int n) {
40 this->n = n;
41
42 // -2 on the diagonal (except topmost and bottommost, which are 1), 1 on the +1 and -1 diagonals (except topmost and bottommost, which are 0)
43 ia.reserve(n*2);
44 ja.reserve(n*3);
45 A.reserve(n*3); // 3 per row
46
47 // 1st row (only 1 element)
48 ia.push_back(0);
49 A.push_back(1);
50 ja.push_back(0);
51
52 for (int i = 1; i < n-1; ++i) {
53 // new row is stating
54 ia.push_back(A.size());
55
56 // the 3 diagonal elements
57 A.push_back(1);
58 ja.push_back(i-1);
59
60 A.push_back(-2);
61 ja.push_back(i);
62
63 A.push_back(1);
64 ja.push_back(i+1);
65 }
66
67 // last row (only 1 element)
68 ia.push_back(A.size());
69 A.push_back(1);
70 ja.push_back(n-1);
71
72 // the last element points at the end of the matrix (n+1 row)
73 ia.push_back(A.size());
74
75 // Convert matrix from 0-based to 1-based notation (Fortran...)
76 for (auto i = 0; i < ia.size(); i++)
77 ia[i] += 1;
78 for (auto i = 0; i < ja.size(); i++)
79 ja[i] += 1;
80
81 // set b to 1 (topmost and bottommost elements) and 1/n² (the other elements)
82 b.resize(n, 1);
83 for (auto i = 0; i < b.size(); i++)
84 b[i] /= (n*n);
85 }
86
87 void solve(int numThreads) {
88 // the solution placeholder
89 x.resize(n, 0);
90
91 int error = 0;
92 int solver = 0; // use sparse direct solver
93 int maxfct, mnum, phase, msglvl;
94 int mtype = 11; // Real unsymmetric matrix
95 double ddum; // double dummy variable
96 int idum; // integer dummy variable
97 int nrhs = 1; // number of right hand sides
98 maxfct = 1; // Maximum number of numerical factorizations.
99 mnum = 1; // Which factorization to use.
100 msglvl = 1; // Print statistical information
101 error = 0; // Initialize error flag to no-error
102 iparm[2] = numThreads; // initialize the number of threads
103
104 pardisoinit (pt, &mtype, &solver, iparm, dparm, &error);
105 std::cerr << "error = " << error << "\n";
106 pardiso_chkmatrix (&mtype, &n, A.data(), ia.data(), ja.data(), &error);
107 std::cerr << "error = " << error << "\n";
108 pardiso_chkvec (&n, &nrhs, b.data(), &error);
109 std::cerr << "error = " << error << "\n";
110 pardiso_printstats (&mtype, &n, A.data(), ia.data(), ja.data(), &nrhs, b.data(), &error);
111 std::cerr << "error = " << error << "\n";
112
113 phase = 11; // reordering & symbolic factorization
114 pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, A.data(), ia.data(), ja.data(), &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error, dparm);
115 std::cerr << "error = " << error << "\n";
116 phase = 22; // numerical factorization
117 pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, A.data(), ia.data(), ja.data(), &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error, dparm);
118 std::cerr << "error = " << error << "\n";
119 phase = 33; // back substitution and refinement
120 iparm[7] = 10; // number of refinement steps
121 pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, A.data(), ia.data(), ja.data(), &idum, &nrhs, iparm, &msglvl, b.data(), x.data(), &error, dparm);
122 std::cerr << "error = " << error << "\n";
123 phase = -1; // finalization and memory release
124 pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, A.data(), ia.data(), ja.data(), &idum, &nrhs, iparm, &msglvl, b.data(), x.data(), &error, dparm);
125 std::cerr << "error = " << error << "\n";
126 }
127 };
128
129
130 int main(int argc, char** argv) {
131 assert(argc == 2 && "Second argument is size of the system.");
132 int n, nThreads;
133 std::stringstream ss(argv[1]);
134 ss >> n;
135 auto omp_num_threads = getenv("OMP_NUM_THREADS");
136 if (omp_num_threads != nullptr) {
137 std::cerr << "OMP_NUM_THREADS = " << omp_num_threads << "\n";
138 nThreads = atoi(omp_num_threads); // number of threads
139 }
140 std::cout << "n = " << n << ", num threads = " << nThreads << std::endl;
141
142 TestCase testcase;
143 testcase.createTestMatrix(n);
144 testcase.solve(nThreads);
145 return 0;
146 }
The code is compiled with GCC using the following command line:
g++ test.cpp -o test -O3 -L./ -lpardiso500-GNU481-X86-64 -lgfortran -fopenmp -lpthread -lm -llapack -lblas
Then it is executed:
OMP_NUM_THREADS=8 ./test 1000000
This takes quite some time on the experimental machine (Intel i7 870 @ 2.93 GHz), which is dilligently reported by Pardiso itself:
Summary PARDISO 5.0.0: ( reorder to reorder )
Time fulladj: 0.054887 s
Time reorder: 2.192858 s
Time symbfct: 0.189367 s
Time parlist: 0.059304 s
Time malloc : 0.122711 s
Time total : 2.871472 s total - sum: 0.252344 s
Summary PARDISO 5.0.0: ( factorize to factorize )
Time A to LU: 0.000000 s
Time numfct : 0.196008 s
Time malloc : -0.000004 s
Time total : 0.196070 s total - sum: 0.000066 s
Summary PARDISO 5.0.0: ( solve to solve )
Time solve : 0.033254 s
Time total : 0.200364 s total - sum: 0.167110 s