What I know from benchmarks I did:
The GCW Zero's CPU has an integer multiplication instruction, MULT/MULTU [unsigned], that takes a different number of cycles from the integer division instruction, DIV/DIVU [unsigned]. Additionally, the time taken by DIV/DIVU depends on the number of bits in the result (e.g. dividing 2,075,121,490 by 2 takes longer than dividing 2 by 5), ranging from 11 to 60 cycles (so 11 to 60 nanoseconds, given that our CPU runs at 1 GHz). MULT and MULTU always take 7 cycles.
Integer ADD and SUB are both 1 cycle.
I don't know much about the floating-point instructions, but I do know that the floating-point division instructions, DIV.S (32-bit) and DIV.D (64-bit), don't have the result-dependent timing that integer DIV/DIVU do.
Senor Quack did benchmarks on floating-point instructions, but I found that his results are dominated by memory access timings rather than by the operations themselves. I'll work with him to get the overhead in his assembly code as low as in the 1/10-overhead loop in the Original Post.