Pardiso

From Medusa: Coordinate Free Mehless Method implementation
Revision as of 14:19, 19 July 2017 by Mdepolli (talk | contribs) (Pardiso library)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Pardiso library

Pardiso library can be downloaded from [1]. There are also examples of use, the manual, etc. Note that the source of the library is not available. There are only pre-compiled objects available on download after one registers on the site.

  • The package PARDISO is a thread-safe, high-performance, robust, memory efficient and easy to use software for solving large sparse symmetric and unsymmetric linear systems of equations on shared-memory and distributed-memory multiprocessors. The solver uses a combination of left- and right-looking Level-3 BLAS supernode techniques".
  • Pardiso is proprietary software but is available for academic use; by registering a yearly license can be obtained.
  • Parallelization is implemented using both OpenMP and MPI.
  • MPI-based numerical factorization and parallel forward/backward substitution on distributed-memory architectures is implemented for symmetric indefinite matrices.

Example of use

This minimal example makes use of the OpenMP parallelization to run Pardiso solver on a sample matrix.

test.cpp:

  1 #include <stdio.h>
  2 #include <stdlib.h>
  3 #include <iostream>
  4 #include <sstream>
  5 #include <vector>
  6 #include <cassert>
  7 #include <math.h>
  8 
  9 // PARDISO prototype.
 10 // There is no pardiso.h, therefore function prototypes have to be defined within the user's code
 11 extern "C" void pardisoinit (void   *, int    *,   int *, int *, double *, int *);
 12 extern "C" void pardiso     (void   *, int    *,   int *, int *,    int *, int *, 
 13                   double *, int    *,    int *, int *,   int *, int *,
 14                      int *, double *, double *, int *, double *);
 15 extern "C" void pardiso_chkmatrix  (int *, int *, double *, int *, int *, int *);
 16 extern "C" void pardiso_chkvec     (int *, int *, double *, int *);
 17 extern "C" void pardiso_printstats (int *, int *, double *, int *, int *, int *, double *, int *);
 18 
 19 struct TestCase {
 20     // the dimension of the problem, Matrix A will have n*n elements, while vectors b and x will have n elements
 21     int n;
 22     // indices of individual rows in compact matrix definition: row[0] contains non zero elements at column indices ia[0] .. (ia[1]-1)
 23     std::vector<int> ia;
 24     // column indices of non-zero elements
 25     std::vector<int> ja;
 26     // values of non-zero elements
 27     std::vector<double> A;
 28     // right hand side
 29     std::vector<double> b;
 30     // solution vector
 31     std::vector<double> x;
 32 
 33     // Pardiso workspace variable (using void* as the type ensures enough space is allocated on 32 and 64-bit systems)
 34     void*    pt[64];
 35     // Pardiso solver parameter variables
 36     int      iparm[64];
 37     double   dparm[64];
 38     
 39     void createTestMatrix(int n) {
 40         this->n = n;
 41         
 42         // -2 on the diagonal (except topmost and bottommost, which are 1), 1 on the +1 and -1 diagonals (except topmost and bottommost, which are 0)
 43         ia.reserve(n*2);
 44         ja.reserve(n*3);
 45         A.reserve(n*3);  // 3 per row
 46         
 47         // 1st row (only 1 element)
 48         ia.push_back(0);
 49         A.push_back(1);
 50         ja.push_back(0);
 51 
 52         for (int i = 1; i < n-1; ++i) {
 53             // new row is stating
 54             ia.push_back(A.size());
 55             
 56             // the 3 diagonal elements
 57             A.push_back(1);
 58             ja.push_back(i-1);
 59             
 60             A.push_back(-2);
 61             ja.push_back(i);
 62             
 63             A.push_back(1);
 64             ja.push_back(i+1);
 65         }
 66         
 67         // last row (only 1 element)
 68         ia.push_back(A.size());
 69         A.push_back(1);
 70         ja.push_back(n-1);
 71         
 72         // the last element points at the end of the matrix (n+1 row)
 73         ia.push_back(A.size());
 74         
 75         // Convert matrix from 0-based to 1-based notation (Fortran...)
 76         for (auto i = 0; i < ia.size(); i++) 
 77             ia[i] += 1;
 78         for (auto i = 0; i < ja.size(); i++) 
 79             ja[i] += 1;
 80             
 81         // set b to 1 (topmost and bottommost elements) and 1/n² (the other elements)
 82         b.resize(n, 1);
 83         for (auto i = 0; i < b.size(); i++) 
 84             b[i] /= (n*n);
 85     }
 86     
 87     void solve(int numThreads) {
 88         // the solution placeholder
 89         x.resize(n, 0);
 90         
 91         int error = 0;
 92         int solver = 0;     // use sparse direct solver
 93         int maxfct, mnum, phase, msglvl;
 94         int mtype = 11;     // Real unsymmetric matrix
 95         double ddum;        // double dummy variable
 96         int idum;           // integer dummy variable
 97         int nrhs = 1;       // number of right hand sides
 98         maxfct = 1;         // Maximum number of numerical factorizations.
 99         mnum   = 1;         // Which factorization to use.
100         msglvl = 1;         // Print statistical information
101         error  = 0;         // Initialize error flag to no-error
102         iparm[2] = numThreads;  // initialize the number of threads 
103         
104         pardisoinit (pt,  &mtype, &solver, iparm, dparm, &error);
105         std::cerr << "error = " << error << "\n";
106         pardiso_chkmatrix  (&mtype, &n, A.data(), ia.data(), ja.data(), &error);
107         std::cerr << "error = " << error << "\n";
108         pardiso_chkvec (&n, &nrhs, b.data(), &error);
109         std::cerr << "error = " << error << "\n";
110         pardiso_printstats (&mtype, &n, A.data(), ia.data(), ja.data(), &nrhs, b.data(), &error);
111         std::cerr << "error = " << error << "\n";
112    
113         phase = 11;     // reordering & symbolic factorization
114         pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, A.data(), ia.data(), ja.data(), &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error, dparm); 
115         std::cerr << "error = " << error << "\n";
116         phase = 22;     // numerical factorization
117         pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, A.data(), ia.data(), ja.data(), &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error, dparm);
118         std::cerr << "error = " << error << "\n";
119         phase = 33;     // back substitution and refinement
120         iparm[7] = 10;  // number of refinement steps
121         pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, A.data(), ia.data(), ja.data(), &idum, &nrhs, iparm, &msglvl, b.data(), x.data(), &error, dparm);
122         std::cerr << "error = " << error << "\n";
123         phase = -1;     // finalization and memory release
124         pardiso (pt, &maxfct, &mnum, &mtype, &phase, &n, A.data(), ia.data(), ja.data(), &idum, &nrhs, iparm, &msglvl, b.data(), x.data(), &error, dparm);
125         std::cerr << "error = " << error << "\n";
126     }
127 };
128 
129 
130 int main(int argc, char** argv) {
131     assert(argc == 2 && "Second argument is size of the system.");
132     int n, nThreads;
133     std::stringstream ss(argv[1]);
134     ss >> n;
135     auto omp_num_threads = getenv("OMP_NUM_THREADS");
136     if (omp_num_threads != nullptr) {
137         std::cerr << "OMP_NUM_THREADS = " << omp_num_threads << "\n";
138         nThreads = atoi(omp_num_threads);      // number of threads
139     }
140     std::cout << "n = " << n << ", num threads = " << nThreads << std::endl;
141     
142     TestCase testcase;
143     testcase.createTestMatrix(n);
144     testcase.solve(nThreads);
145     return 0;
146 }

The code is compiled with GCC using the following command line: g++ test.cpp -o test -O3 -L./ -lpardiso500-GNU481-X86-64 -lgfortran -fopenmp -lpthread -lm -llapack -lblas

Then it is executed: OMP_NUM_THREADS=8 ./test 1000000

This takes quite some time on the experimental machine (Intel i7 870 @ 2.93 GHz), which is dilligently reported by Pardiso itself:

Summary PARDISO 5.0.0: ( reorder to reorder )
    Time fulladj: 0.054887 s
    Time reorder: 2.192858 s
    Time symbfct: 0.189367 s
    Time parlist: 0.059304 s
    Time malloc : 0.122711 s
    Time total  : 2.871472 s total - sum: 0.252344 s
Summary PARDISO 5.0.0: ( factorize to factorize )
    Time A to LU: 0.000000 s
    Time numfct : 0.196008 s
    Time malloc : -0.000004 s
    Time total  : 0.196070 s total - sum: 0.000066 s
Summary PARDISO 5.0.0: ( solve to solve )
    Time solve  : 0.033254 s
    Time total  : 0.200364 s total - sum: 0.167110 s