Contents

Table
Graph


Table:

Number of  Orig. Bcast  New Bcast    Orig. Bcast  New Bcast    Orig. Bcast  New Bcast    Orig. Bcast  New Bcast
doubles    p=1          p=1          p=2          p=2          p=3          p=3          p=4          p=4
1          0.000004     0.000004     0.000223     0.000262     0.000275     0.000267     0.000459     0.000290
10         0.000004     0.000004     0.000268     0.000268     0.000290     0.000276     0.000457     0.000312
100        0.000004     0.000007     0.000746     0.000741     0.000818     0.000739     0.001220     0.000894
1000       0.000006     0.000037     0.002422     0.002329     0.003103     0.003093     0.005784     0.003748
10000      0.000052     0.000379     0.017470     0.015790     0.029754     0.027422     0.049730     0.040134

Graph:




These are the results of the tests I performed using the original broadcast function (Orig. Bcast) provided by the MPI framework and the function I wrote myself (New Bcast) using point-to-point (P2P) communication. Each test was run 100 times and the results were averaged. The number of processes used for each test is denoted in the table by p=x, where x is the number of processes. As it is not necessary for performance testing, I have not implemented my own functions for all datatypes and all reduction functions, but only for MPI_DOUBLE and MPI_SUM.
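The measurement setup described above can be sketched roughly as follows. This is only an illustration, not the code used for the tests: it assumes a naive linear broadcast in which the root sends to every other rank in turn (the algorithm behind New Bcast is not stated here), it uses the mpi4py bindings rather than the C interface, and the names `p2p_bcast`, `average_time` and `main` are made up for this example.

```python
def p2p_bcast(comm, buf, root=0, tag=0):
    """Naive linear broadcast (assumed algorithm): root sends buf to each other rank."""
    rank = comm.Get_rank()
    size = comm.Get_size()
    if rank == root:
        for dest in range(size):
            if dest != root:
                comm.Send(buf, dest=dest, tag=tag)
    else:
        comm.Recv(buf, source=root, tag=tag)


def average_time(comm, bcast, buf, wtime, repetitions=100):
    """Average wall-clock time of bcast(comm, buf) over `repetitions` runs."""
    total = 0.0
    for _ in range(repetitions):
        comm.Barrier()  # let all ranks start each repetition together
        start = wtime()
        bcast(comm, buf)
        total += wtime() - start
    return total / repetitions


def main():
    # Imported here so the helpers above can be loaded without MPI installed.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    for count in (1, 10, 100, 1000, 10000):
        buf = np.zeros(count, dtype=np.float64)  # corresponds to MPI_DOUBLE
        t_orig = average_time(comm, lambda c, b: c.Bcast(b, root=0), buf, MPI.Wtime)
        t_new = average_time(comm, p2p_bcast, buf, MPI.Wtime)
        if comm.Get_rank() == 0:
            print(f"{count:6d} {t_orig:.6f} {t_new:.6f}")
```

To try it, one would call `main()` at the end of a script and launch it with something like `mpiexec -n 4 python bench.py` (the file name is hypothetical). The barrier before each repetition is a design choice of this sketch, so that a slow rank from the previous run does not distort the next measurement.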

The results show that the performance of the original and the self-written functions mostly does not differ much. Interestingly, the original function is faster with a single process (with 10000 doubles at p=1 it needs roughly one seventh of the time), the two are close at p=2 and p=3, and my own function is faster at p=4, where it takes about 20% less time for 10000 doubles. The cause could be a restriction on broadcast traffic on the machines I tested the program on; I used the machines at the university in the late evening. As I do not know which algorithms the original implementation uses, I cannot explain the differences between the original and the new implementation in more detail.
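To make the comparison easier to read, the ratio of the two averages can be computed per process count. The following is a small sketch using the values from the table; the names `RESULTS` and `ratios` are only chosen for this example, and the decimal commas are written as points.

```python
# Average times copied from the results table.
# Each row: number of doubles -> (Orig. Bcast, New Bcast) pairs for p = 1, 2, 3, 4.
RESULTS = {
    1:     [(0.000004, 0.000004), (0.000223, 0.000262),
            (0.000275, 0.000267), (0.000459, 0.000290)],
    10:    [(0.000004, 0.000004), (0.000268, 0.000268),
            (0.000290, 0.000276), (0.000457, 0.000312)],
    100:   [(0.000004, 0.000007), (0.000746, 0.000741),
            (0.000818, 0.000739), (0.001220, 0.000894)],
    1000:  [(0.000006, 0.000037), (0.002422, 0.002329),
            (0.003103, 0.003093), (0.005784, 0.003748)],
    10000: [(0.000052, 0.000379), (0.017470, 0.015790),
            (0.029754, 0.027422), (0.049730, 0.040134)],
}


def ratios(results):
    """Return {count: [orig / new for p = 1..4]}.

    A value above 1 means New Bcast was faster, below 1 that Orig. Bcast was.
    """
    return {n: [orig / new for orig, new in row] for n, row in results.items()}


# Print one line per message size, one ratio column per process count.
for n, row in ratios(RESULTS).items():
    print(f"{n:6d} " + " ".join(f"{r:5.2f}" for r in row))
```

In this view every p=4 entry is above 1, while the p=1 entries drop well below 1 as the message grows.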