'up to 20% Numerical error or Bug in ten line code block [closed]

Rewriting a single tiny block of code of an application has yielded a considerable performance improvement. The code is 100% sequential, thus there should be no hidden perturbations to the values stored in memory.

Double-checking that the results after computation are the same has shown an up-to 20% relative error in the results, so the question is if this can be explained by numerical error in the algorithm or there is an error in the algorithm itself and the two blocks are not equivalent.

Conceptually, the main modification has been replacing this

for m=0,...
  result += gradient * temp[m]

with this

for m=0,...
  sum_temp += temp[m]
result = gradient * sum_temp

While in the process dealing with the process of defining new arrays and initializing them.

EDIT: (for clarity) so-called 'result' in the code below are SoGrSca and SoGrSca's elements

EDIT: typical values for (final) result are +-1e-8

Actual code is as follows (C++, weird-named nested arrays have been kept untouched at the expense of clarity)

// Three relevant macros for tidyness
//
#define FOR_I3 for (int i = 0; i < 3; ++i)
#define LOOP_n_N(NUM)  for (int n = 0; n < NUM; n++)
#define LOOP_m_N(NUM)  for (int m = 0; m < NUM; m++)

// Toy definitions for eye-friendlyness
// on Stackoverflow
//
const int NDIM = 6; 
const int ORDER = 20;

// More definitions...
//
// 1) Arrays of doubles
//
// SoGrSca, SoGrVec, PrimSca, Jaco_Sol, PrimVec, mat_der_sol
//
// 2) Arrays of integers
//
// mat_co1, mat_co2



// Version 1 of the code
//
FOR_I3 SoGrSca[0][nk + l][i]           = 0.0;
FOR_I3 FOR_J3 SoGrVec[0][nk + l][j][i] = 0.0;
//
LOOP_n_N(NDIM) LOOP_m_N(ORDER) {

  int idof1 = ORDER*mat_co1[NDIM*l + n] + m;
  int idof2 = mat_Gr[ndof*n +   ORDER*mat_co2[NDIM*l + n] + m           ];

  double grp_temp = mat_der_sol[idof1]*GsGm1*PrimSca[1][nk + idof2]/PrimSca[0][nk + idof2];
  FOR_I3 SoGrSca[0][nk + l][i]      += Jaco_Sol[nk + l][n][i]*grp_temp;
  FOR_J3 {
    double grp_temp = mat_der_sol[idof1]*PrimVec[0][nk + idof2][j];
    FOR_I3 SoGrVec[0][nk + l][j][i] += Jaco_Sol[nk + l][n][i]*grp_temp;
  }
}

// Store SoGrSca and SoGrVec values ...

// Version 2 of the code
//
FOR_I3 SoGrSca[0][nk + l][i]           = 0.0;
FOR_I3 FOR_J3 SoGrVec[0][nk + l][j][i] = 0.0;
//
double grp_temp_A_v[NDIM];
double grp_temp_B_v[NDIM][3];
LOOP_n_N(NDIM){                    
        grp_temp_A_v[n] = 0.0;                  
        FOR_I3  grp_temp_B_v[n][i] = 0.0;       
}
LOOP_n_N(NDIM) LOOP_m_N(ORDER) {
  int idof1 = ORDER * mat_co1[NDIM*l + n] + m;
  int idof2 = mat_Gr[ndof*n +   ORDER*mat_co2[NDIM*l + n] + m           ];
  grp_temp_A_v[n] += mat_der_sol[idof1] * GsGm1*PrimSca[1][nk + idof2]/PrimSca[0][nk + idof2];
  FOR_J3 grp_temp_B_v[n][j] += mat_der_sol[idof1] * PrimVec[0][nk + idof2][j];
}

LOOP_n_N(NDIM) {
  FOR_I3 SoGrSca[0][nk + l][i]      += Jaco_Sol[nk + l][n][i] * grp_temp_A_v[n];
  FOR_J3 {
    FOR_I3 SoGrVec[0][nk + l][j][i] += Jaco_Sol[nk + l][n][i] * grp_temp_B_v[n][j];
  }
}


// Compare SoGrSca and SoGrVec... 
//
// ERROR: they are different

Comparing algorithm is, given y_old and y_new,

( abs( y_old - y_new ) / ( abs(y_old) + z ) ) < epsilon // 'true' means equality

where z protects against zero division errors,

z=1e-10

and epsilon required to pass is huge, i.e. epsilon=0.2 While it should be closer to e.g. 1e-7

FUN EDIT: just another example of joining a project with already-well-established macros lol



Solution 1:[1]

Floating point number arithmetic is not distributive. That means there are cases where the following statement is true

x * (a + b) != x * a + x * b;

If you move the multiplication out of your loop and expect the exact same result, you assume that the distributive law is always true, which it is not. As an example I came up with this godbolt snippet.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jakob Stark