Author Topic: 'Float' faster than 'int' data type for multiplication and division?  (Read 1828 times)

David Knight (OP)

  • Posts: 577
I have been exploring ways of optimising code on and off for the last few months. I always assumed that multiplication and division were slow, particularly with 'float' data types compared to 'int', because of the extra calculation required, so I tend to avoid them where possible (e.g. x + x instead of 2 * x), particularly where the compiler may have a hard time optimising because the operands are variables.

I know that the Ingenic jz4770 CPU in the gcw0 has a Floating Point Unit (FPU) which greatly speeds up floating point calculations but I wanted to see how much slower float calculations were compared to int with this co-processor.

To test this I compiled and ran a sample program (from here).

Code:
#include <stdio.h>
#ifdef _WIN32
#include <sys/timeb.h>
#else
#include <sys/time.h>
#endif
#include <time.h>
#include <cstdlib>

double
mygettime(void) {
# ifdef _WIN32
  struct _timeb tb;
  _ftime(&tb);
  return (double)tb.time + (0.001 * (double)tb.millitm);
# else
  struct timeval tv;
  if(gettimeofday(&tv, 0) < 0) {
    perror("oops");
  }
  return (double)tv.tv_sec + (0.000001 * (double)tv.tv_usec);
# endif
}

template< typename Type >
void my_test(const char* name) {
  Type v  = 0;
  // Do not use constants or repeating values
  //  to avoid loop unroll optimizations.
  // All values >0 to avoid division by 0
  // Perform ten ops/iteration to reduce
  //  impact of ++i below on measurements
  Type v0 = (Type)(rand() % 256)/16 + 1;
  Type v1 = (Type)(rand() % 256)/16 + 1;
  Type v2 = (Type)(rand() % 256)/16 + 1;
  Type v3 = (Type)(rand() % 256)/16 + 1;
  Type v4 = (Type)(rand() % 256)/16 + 1;
  Type v5 = (Type)(rand() % 256)/16 + 1;
  Type v6 = (Type)(rand() % 256)/16 + 1;
  Type v7 = (Type)(rand() % 256)/16 + 1;
  Type v8 = (Type)(rand() % 256)/16 + 1;
  Type v9 = (Type)(rand() % 256)/16 + 1;

  double t1 = mygettime();
  for (size_t i = 0; i < 100000000; ++i) {
    v += v0;
    v -= v1;
    v += v2;
    v -= v3;
    v += v4;
    v -= v5;
    v += v6;
    v -= v7;
    v += v8;
    v -= v9;
  }
  // Pretend we make use of v so compiler doesn't optimize out
  //  the loop completely
  printf("%s add/sub: %f [%d]\n", name, mygettime() - t1, (int)v&1);
  t1 = mygettime();
  for (size_t i = 0; i < 100000000; ++i) {
    v /= v0;
    v *= v1;
    v /= v2;
    v *= v3;
    v /= v4;
    v *= v5;
    v /= v6;
    v *= v7;
    v /= v8;
    v *= v9;
  }
  // Pretend we make use of v so compiler doesn't optimize out
  //  the loop completely
  printf("%s mul/div: %f [%d]\n", name, mygettime() - t1, (int)v&1);
}

int main() {
  my_test< short >("short");
  my_test< long >("long");
  my_test< long long >("long long");
  my_test< float >("float");
  my_test< double >("double");

  return 0;
}

Compile flags are -O0 -fomit-frame-pointer -Wall -Winline -ansi -march=mips32r2 -fpermissive

Results:
Code:
short add/sub: 10.552326 [0]
short mul/div: 22.954214 [1]
long add/sub: 9.499934 [0]
long mul/div: 29.107459 [0]
long long add/sub: 15.575820 [0]
long long mul/div: 214.377224 [0]
float add/sub: 20.134216 [0]
float mul/div: 20.636165 [0]
double add/sub: 33.790495 [0]
double mul/div: 29.725830 [0]

From these results I conclude that floating-point multiplication and division are marginally faster than integer multiplication and division on the gcw0, whereas floating-point addition and subtraction are about twice as slow as their integer counterparts.
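To compare the timings on a per-operation basis, the totals can be divided by the number of operations executed. The snippet below is only a sketch: `ns_per_op` is a made-up helper name, and its defaults assume the loop shape of the program above (100,000,000 iterations, ten operations per iteration).

```cpp
#include <cassert>

// Convert the wall-clock time of one benchmark loop into nanoseconds per
// arithmetic operation. The defaults mirror the test program:
// 100,000,000 iterations with ten operations in each loop body.
// (Hypothetical helper, not part of the original program.)
double ns_per_op(double seconds,
                 double iterations = 100000000.0,
                 double ops_per_iteration = 10.0) {
  assert(seconds >= 0.0 && iterations > 0.0 && ops_per_iteration > 0.0);
  return seconds * 1e9 / (iterations * ops_per_iteration);
}
```

Because each loop performs 10^9 operations in total, the time in seconds is numerically the cost in nanoseconds per operation: `ns_per_op(20.636165)` is about 20.6 ns for float mul/div and `ns_per_op(29.107459)` is about 29.1 ns for long mul/div. These figures still include loop and memory-access overhead, so they are upper bounds on the cost of the arithmetic itself.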

Does this mean that if I were to convert int types to float when doing multiplication and division I should expect an increase in speed or am I misinterpreting the code?

Senor Quack

  • Posts: 215
« Reply #1 on: January 25, 2016, 10:27:04 pm »
Removed my post until I can figure out some crap in my bench results ;)
« Last Edit: January 26, 2016, 12:44:19 am by Senor Quack »

Nebuleon

  • Posts: 37
« Reply #2 on: January 26, 2016, 01:07:35 am »
What I know from benchmarks I did:

The GCW Zero has an integer multiplication instruction, MULT/MULTU [unsigned], that takes a different number of cycles from the integer division instruction, DIV/DIVU [unsigned]. Also, the time taken by DIV/DIVU varies with the number of bits in the result (e.g. dividing 2,075,121,490 by 2 takes longer than dividing 2 by 5), ranging from 11 to 60 cycles (so 11 to 60 nanoseconds, given that our CPU runs at 1 GHz). MULT and MULTU always take 7 cycles.
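As a rough sanity check, these cycle counts can be turned into a time estimate for the mul/div loop in the original post (five divisions and five multiplications per iteration). This is a back-of-the-envelope sketch: the helper names are made up, the cycle counts are the ones quoted above rather than measured, and it assumes the instructions run strictly one after another with no other overhead.

```cpp
#include <cassert>

// Cycle counts as quoted in this post (not measured here).
const int kMultCycles   = 7;   // MULT/MULTU
const int kDivMinCycles = 11;  // DIV/DIVU, best case
const int kDivMaxCycles = 60;  // DIV/DIVU, worst case

// Cycles for one pass through the benchmark's mul/div loop body:
// five divisions and five multiplications, assumed to run back to back.
int cycles_per_iteration(int div_cycles) {
  assert(div_cycles >= kDivMinCycles && div_cycles <= kDivMaxCycles);
  return 5 * div_cycles + 5 * kMultCycles;
}

// Seconds for `iterations` passes at `clock_hz` (1 GHz assumed by default).
double loop_seconds(int div_cycles,
                    double iterations = 100000000.0,
                    double clock_hz = 1e9) {
  assert(clock_hz > 0.0);
  return cycles_per_iteration(div_cycles) * iterations / clock_hz;
}
```

With these numbers the loop body costs between 90 and 335 cycles, so `loop_seconds(kDivMinCycles)` gives 9.0 s and `loop_seconds(kDivMaxCycles)` gives 33.5 s. The 29.107459 s measured for `long` falls inside that range, but the measurement also includes loop and memory-access overhead, so the agreement may be partly coincidental.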

Integer ADD and SUB are both 1 cycle.

I don't know much about the floating-point instructions, but I do know that the floating-point division instructions, DIV.S (32-bit) and DIV.D (64-bit), don't have the operand-dependent timing that integer DIV/DIVU do.

Senor Quack did benchmarks on floating-point instructions, but I found out that they are dominated by the memory access timings rather than by the operations themselves. I'll work with him to make sure the overhead in his assembly code becomes as small as in the loop in the original post, which performs ten operations per iteration. :)
The Cloud is nice, but not if it decides to rain on your parade.

Nebuleon

  • Posts: 37
« Reply #3 on: January 26, 2016, 01:22:44 am »
David Knight wrote: "Does this mean that if I were to convert int types to float when doing multiplication and division I should expect an increase in speed or am I misinterpreting the code?"

I had missed this part of your post. Converting to float and back in order to use the FPU will give you worse performance overall, and for integer inputs that don't fit in float's 24 bits of "mantissa" (23 stored bits plus an implicit leading 1), much lower precision.
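The precision loss is easy to demonstrate. A minimal sketch, assuming float is IEEE 754 single precision (the helper name is made up for illustration):

```cpp
#include <cassert>
#include <limits>

// True if converting x to float and back returns the same integer.
// (Hypothetical helper for illustration.)
bool survives_float_round_trip(int x) {
  // Assumption: float is IEEE 754 single precision (24-bit significand).
  assert(std::numeric_limits<float>::digits == 24);
  // volatile forces the value to really be rounded to float precision.
  volatile float f = static_cast<float>(x);
  // Compare in long long so values that round up past INT_MAX stay in range.
  return static_cast<long long>(f) == x;
}
```

Every integer up to 16777216 (2^24) survives the round trip, but 16777217 comes back as 16777216, and larger values lose more and more of their low bits.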

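You'll also break code that depends on the truncating behavior of integer division, because floating-point division keeps the fractional part. For example:
Code:
int tileY = y / TILE_SIZE * TILE_SIZE; /* round down to a multiple of TILE_SIZE */
With float operands, y / TILE_SIZE is no longer truncated, so the expression gives back y (up to rounding error) instead of a multiple of TILE_SIZE.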
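Here is a small sketch of that difference. The value of TILE_SIZE and the function names are assumptions picked for illustration, and the floor-based variant is only one possible way to get the integer behavior back for non-negative inputs.

```cpp
#include <cassert>
#include <cmath>

const int TILE_SIZE = 16;  // hypothetical tile size, for illustration only

// Integer version: the division truncates, so this snaps y down to a
// multiple of TILE_SIZE (for non-negative y).
int snap_int(int y) {
  return y / TILE_SIZE * TILE_SIZE;
}

// Same expression with a float operand: nothing is truncated, so the
// division and multiplication simply cancel out.
float snap_float_naive(float y) {
  return y / TILE_SIZE * TILE_SIZE;
}

// Float version with the truncation made explicit. Restricted to y >= 0,
// because floor and integer division differ for negative values.
float snap_float_floor(float y) {
  assert(y >= 0.0f);
  return std::floor(y / TILE_SIZE) * TILE_SIZE;
}
```

With y = 37, `snap_int(37)` returns 32, while `snap_float_naive(37.0f)` returns 37 and only `snap_float_floor(37.0f)` returns 32 again. The extra floor call is one more operation that the integer version gets for free.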

David Knight (OP)

  • Posts: 577
« Reply #4 on: February 08, 2016, 06:24:47 pm »
tldr: bad idea. Got it  ;D