'up to 20% Numerical error or Bug in ten line code block [closed]
Rewriting a single tiny block of code of an application has yielded a considerable performance improvement. The code is 100% sequential, thus there should be no hidden perturbations to the values stored in memory.
Double-checking that the results after computation are the same has shown an up-to 20% relative error in the results, so the question is if this can be explained by numerical error in the algorithm or there is an error in the algorithm itself and the two blocks are not equivalent.
Conceptually, the main modification has been replacing this
for m=0,...
result += gradient * temp[m]
with this
for m=0,...
sum_temp += temp[m]
result = gradient * sum_temp
While in the process dealing with the process of defining new arrays and initializing them.
EDIT: (for clarity) so-called 'result' in the code below are SoGrSca
and SoGrSca
's elements
EDIT: typical values for (final) result are +-1e-8
Actual code is as follows (C++, weird-named nested arrays have been kept untouched at the expense of clarity)
// Three relevant macros for tidyness
//
#define FOR_I3 for (int i = 0; i < 3; ++i)
#define LOOP_n_N(NUM) for (int n = 0; n < NUM; n++)
#define LOOP_m_N(NUM) for (int m = 0; m < NUM; m++)
// Toy definitions for eye-friendlyness
// on Stackoverflow
//
const int NDIM = 6;
const int ORDER = 20;
// More definitions...
//
// 1) Arrays of doubles
//
// SoGrSca, SoGrVec, PrimSca, Jaco_Sol, PrimVec, mat_der_sol
//
// 2) Arrays of integers
//
// mat_co1, mat_co2
// Version 1 of the code
//
FOR_I3 SoGrSca[0][nk + l][i] = 0.0;
FOR_I3 FOR_J3 SoGrVec[0][nk + l][j][i] = 0.0;
//
LOOP_n_N(NDIM) LOOP_m_N(ORDER) {
int idof1 = ORDER*mat_co1[NDIM*l + n] + m;
int idof2 = mat_Gr[ndof*n + ORDER*mat_co2[NDIM*l + n] + m ];
double grp_temp = mat_der_sol[idof1]*GsGm1*PrimSca[1][nk + idof2]/PrimSca[0][nk + idof2];
FOR_I3 SoGrSca[0][nk + l][i] += Jaco_Sol[nk + l][n][i]*grp_temp;
FOR_J3 {
double grp_temp = mat_der_sol[idof1]*PrimVec[0][nk + idof2][j];
FOR_I3 SoGrVec[0][nk + l][j][i] += Jaco_Sol[nk + l][n][i]*grp_temp;
}
}
// Store SoGrSca and SoGrVec values ...
// Version 2 of the code
//
FOR_I3 SoGrSca[0][nk + l][i] = 0.0;
FOR_I3 FOR_J3 SoGrVec[0][nk + l][j][i] = 0.0;
//
double grp_temp_A_v[NDIM];
double grp_temp_B_v[NDIM][3];
LOOP_n_N(NDIM){
grp_temp_A_v[n] = 0.0;
FOR_I3 grp_temp_B_v[n][i] = 0.0;
}
LOOP_n_N(NDIM) LOOP_m_N(ORDER) {
int idof1 = ORDER * mat_co1[NDIM*l + n] + m;
int idof2 = mat_Gr[ndof*n + ORDER*mat_co2[NDIM*l + n] + m ];
grp_temp_A_v[n] += mat_der_sol[idof1] * GsGm1*PrimSca[1][nk + idof2]/PrimSca[0][nk + idof2];
FOR_J3 grp_temp_B_v[n][j] += mat_der_sol[idof1] * PrimVec[0][nk + idof2][j];
}
LOOP_n_N(NDIM) {
FOR_I3 SoGrSca[0][nk + l][i] += Jaco_Sol[nk + l][n][i] * grp_temp_A_v[n];
FOR_J3 {
FOR_I3 SoGrVec[0][nk + l][j][i] += Jaco_Sol[nk + l][n][i] * grp_temp_B_v[n][j];
}
}
// Compare SoGrSca and SoGrVec...
//
// ERROR: they are different
Comparing algorithm is, given y_old
and y_new
,
( abs( y_old - y_new ) / ( abs(y_old) + z ) ) < epsilon // 'true' means equality
where z protects against zero division errors,
z=1e-10
and epsilon required to pass is huge, i.e. epsilon=0.2
While it should be closer to e.g. 1e-7
FUN EDIT: just another example of joining a project with already-well-established macros lol
Solution 1:[1]
Floating point number arithmetic is not distributive. That means there are cases where the following statement is true
x * (a + b) != x * a + x * b;
If you move the multiplication out of your loop and expect the exact same result, you assume that the distributive law is always true, which it is not. As an example I came up with this godbolt snippet.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Jakob Stark |