steve_bank
Diabetic retinopathy and poor eyesight. Typos ...
Precomputed lookup tables can be faster, but there is always a trade-off between speed and memory usage. For 32- or 64-bit ints, how big does the array need to be?
For x = n1 * n2; there are 2 memory moves and the mult instruction.
Disregarding the table generation, the lookup algorithm has 4 moves, 2 subtractions, and 1 addition. I haven’t figured out how to count instruction cycles with VS yet, but I’d say it is not likely to be faster, though I could be wrong. Do two subs and one add execute faster than a mult?
And BTW, a function call incurs a time penalty. It would be faster to make osq[n1 + n2] - osq[n1 - n2] a macro instead of a function.
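#include <stdio.h>
#define BOUND 21000                  /* operands must satisfy |n1|, |n2| <= BOUND */
/* assumed declaration: static table addressed from its midpoint so osq[-a] is valid */
static int sq[4 * BOUND + 1];
#define osq (sq + 2 * BOUND)
int main(void) {
    int a, c = 0, n1 = 21000, n2 = 21000, x;
    /* osq[a] = floor(a*a / 4), mirrored so osq[-a] == osq[a] */
    for (a = 0; a <= BOUND * 2; a++) osq[-a] = osq[a] = a * a / 4;
    c = osq[n1 + n2] - osq[n1 - n2];
    x = n1 * n2;
    printf("n1 %d n2 %d x %d c %d \n", n1, n2, x, c);
    return 0;
}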
c = osq[n1 + n2] - osq[n1 - n2];
005F5B32 mov eax,dword ptr [n1]
005F5B35 add eax,dword ptr [n2]
005F5B38 mov ecx,dword ptr [n1]
005F5B3B sub ecx,dword ptr [n2]
005F5B3E mov edx,dword ptr [eax*4+621240h]
005F5B45 sub edx,dword ptr [ecx*4+621240h]
005F5B4C mov dword ptr [c],edx
x = n1 * n2;
005F5B4F mov eax,dword ptr [n1]
005F5B52 imul eax,dword ptr [n2]
005F5B56 mov dword ptr [x],eax