Fast log2(float x) implementation in C++

I need a very fast implementation of the log2(float x) function in C++.
I found a very interesting (and extremely fast!) implementation:
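#include <intrin.h>  // MSVC-only header
// Index of the highest set bit, i.e. floor(log2(x)) for x > 0.
// The result is undefined when x == 0.
inline unsigned long log2(int x)
{
    unsigned long y;
    _BitScanReverse(&y, x);
    return y;
}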
But this function only works for integer inputs.
Question: is there any way to adapt this function to take a double input?
Update:
I found this implementation:
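#include <cstdint>
#include <cstring>
// floor(log2(x)) for positive, normal floats, read from the IEEE 754
// exponent field (bits 23..30, bias 127).
static inline std::int32_t ilog2(float x)
{
    std::uint32_t ix;
    std::memcpy(&ix, &x, sizeof ix);        // well-defined type punning
    std::uint32_t exp = (ix >> 23) & 0xFF;  // biased exponent
    return std::int32_t(exp) - 127;         // remove the bias
}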
which is much faster than the previous example, but it only returns an integer.
Is it possible to make this function return a double type?
Thanks in advance!

If you just need the integer part of the logarithm, then you can extract that directly from the floating point number.
Portably:
#include <cmath>
int log2_fast(double d) {
int result;
std::frexp(d, &result);
return result-1;
}
Possibly faster, but relying on unspecified and undefined behaviour:
int log2_evil(double d) {
return ((reinterpret_cast<unsigned long long&>(d) >> 52) & 0x7ff) - 1023;
}
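If you want the exponent-extraction trick without the aliasing problem, std::memcpy gives a well-defined way to read the bits. This is a sketch assuming IEEE 754 binary64 doubles (11 exponent bits, bias 1023); like log2_evil it only yields floor(log2(d)) for positive, finite, normal inputs, and the name log2_bits is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// floor(log2(d)) for positive, normal doubles, read from the exponent field.
// Assumes IEEE 754 binary64: 1 sign bit, 11 exponent bits, 52 fraction bits.
inline int log2_bits(double d)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double expected");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);                   // well-defined type punning
    return static_cast<int>((bits >> 52) & 0x7ff) - 1023;  // remove the bias
}
```

For example, log2_bits(1000.0) gives 9 and log2_bits(0.75) gives -1, matching the frexp-based log2_fast.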

An MSVC- and GCC-compatible version that gives XX.XXXXXXX ± 0.0054545:
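#include <cstdint>
// Approximate log2 for positive, normal floats. Reading u.val after writing
// u.x is union type punning, which GCC documents as supported and MSVC accepts.
float mFast_Log2(float val) {
    union { float val; int32_t x; } u = { val };
    float log_2 = (float)(((u.x >> 23) & 255) - 128);  // unbiased exponent minus 1
    u.x &= ~(255 << 23);                               // clear the exponent field
    u.x += 127 << 23;                                  // mantissa now in [1, 2)
    log_2 += ((-0.3358287811f) * u.val + 2.0f) * u.val - 0.65871759316667f;
    return (log_2);
}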

Edit: See link by Job in the comments below for a better version.
Fast log() function (5× faster approximately)
This may be of interest to you. The code works for me, though it is not infinitely precise. Because the code on that web page is broken (the > characters have been stripped), I'll post it here:
#include <cstring>
inline float fast_log2 (float val)
{
    int x;
    std::memcpy(&x, &val, sizeof x);            // read the float's bits (assumes 32-bit int)
    const int log_2 = ((x >> 23) & 255) - 128;  // unbiased exponent minus 1
    x &= ~(255 << 23);                          // clear the exponent field
    x += 127 << 23;                             // mantissa now in [1, 2)
    std::memcpy(&val, &x, sizeof val);
    val = ((-1.0f/3) * val + 2) * val - 2.0f/3; // (1)
    return (val + log_2);
}
inline float fast_log (const float &val)
{
return (fast_log2 (val) * 0.69314718f);
}

You can take a look at this implementation, but:
it may not work on some platforms
it might not beat std::log

C++11 added std::log2 to <cmath>.
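A minimal sketch of the standard route, assuming a C++11 compiler: std::log2 gives the full result, and std::ilogb from the same header returns the unbiased binary exponent as an int, which for positive finite inputs equals floor(log2(x)) and covers the integer-only case from the question. The wrapper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Full-precision base-2 logarithm (C++11 <cmath>).
inline double log2_full(double x) { return std::log2(x); }

// Integer part of log2(x) for positive finite x: the unbiased exponent.
// std::ilogb(0) returns FP_ILOGB0, so check for zero before calling.
inline int log2_int(double x) { return std::ilogb(x); }
```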

This is an improvement on the first answer that does not rely on the IEEE representation, although I imagine it is only fast on IEEE machines, where frexp() is essentially free.
Instead of discarding the fraction that frexp returns, one can use it to interpolate linearly. For positive inputs the fraction lies in [0.5, 1.0), so we stretch it to [0.0, 1.0) and add it to the exponent.
In practice, this fast evaluation seems to be good to about 5-10%, and it always returns a value that is a little low. I'm sure it could be improved by tweaking the scaling factor of 2.
#include <cmath>
double log2_fast(double d) {
    int exponent;
    double fraction = std::frexp(d, &exponent);  // d = fraction * 2^exponent, fraction in [0.5, 1)
    return (exponent-1) + 2* (fraction - 0.5);   // stretch fraction to [0, 1) and add it
}
You can verify that this is a reasonable fast approximation with this:
#include <cmath>
#include <iostream>
// log2_fast from above must be defined before main()
int main()
{
for(double x=0.001;x<1000;x+=0.1)
{
std::cout << x << " " << std::log2(x) << " " << log2_fast(x) << "\n";
}
}
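To put a number on "a little low", here is a sketch that measures the worst absolute error over the octave [1, 2), where the estimate reduces to x - 1. Because the exponent part is exact, the same error pattern repeats in every octave. It repeats the corrected function under the hypothetical name log2_lerp so that it compiles on its own. The largest gap between log2(1 + t) and the straight line t is about 0.086, reached at t = 1/ln 2 - 1 ≈ 0.443.

```cpp
#include <cassert>
#include <cmath>

// Same linear interpolation as log2_fast above (hypothetical name).
double log2_lerp(double d)
{
    int exponent;
    double fraction = std::frexp(d, &exponent);  // d = fraction * 2^exponent
    return (exponent - 1) + 2 * (fraction - 0.5);
}

// Largest absolute error of log2_lerp against std::log2 over [1, 2).
// The chord lies below the concave log2 curve, so the error is never negative.
double max_lerp_error(int samples)
{
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        double x = 1.0 + static_cast<double>(i) / samples;
        double err = std::log2(x) - log2_lerp(x);
        if (err > worst) worst = err;
    }
    return worst;
}
```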

No, but if you only need the integral part of the result and don't insist on portability, there is an even faster way: all you need is to extract the exponent part of the float!

This function is not standard C++; it's MSVC-specific. Also, I highly doubt that any such intrinsic exists for floating-point input, and if it did, the Standard library function would simply be implemented with it. So just call the Standard library.
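If only the integer version is needed, a portable replacement for _BitScanReverse is easy to sketch. C++20's <bit> header provides std::countl_zero, and floor(log2(x)) == 31 - std::countl_zero(x) for a nonzero 32-bit x; the shift loop is a fallback for older standards. The name ilog2_u32 is hypothetical, and x == 0 is excluded just as it is for the intrinsic.

```cpp
#include <cassert>
#include <cstdint>
#if __cplusplus >= 202002L
#include <bit>
#endif

// floor(log2(x)) for a nonzero 32-bit x, without compiler intrinsics.
// Like _BitScanReverse, the result is meaningless for x == 0.
inline unsigned ilog2_u32(std::uint32_t x)
{
#if defined(__cpp_lib_bitops)
    return 31u - static_cast<unsigned>(std::countl_zero(x));  // C++20 <bit>
#else
    unsigned r = 0;
    while (x >>= 1) ++r;  // count how often x can be halved before reaching 1
    return r;
#endif
}
```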

Related

Efficiently convert two Integers x and y into the float x.y

Given two integers X and Y, what's the most efficient way of converting them into an X.Y float value in C++?
E.g.
X = 3, Y = 1415 -> 3.1415
X = 2, Y = 12 -> 2.12
Here are some cocktail-napkin benchmark results, on my machine, for all solutions converting two ints to a float, as of the time of writing.
Caveat: I've now added a solution of my own, which seems to do well, and am therefore biased! Please double-check my results.
Test | Iterations | ns / iteration
@aliberro's conversion v2 | 79,113,375 | 13
@3Dave's conversion | 84,091,005 | 12
@einpoklum's conversion | 1,966,008,981 | 0
@Ripi2's conversion | 47,374,058 | 21
@TarekDakhran's conversion | 1,960,763,847 | 0
CPU: Quad Core Intel Core i5-7600K speed/min/max: 4000/800/4200 MHz
Devuan GNU/Linux 3
Kernel: 5.2.0-3-amd64 x86_64
GCC 9.2.1, with flags: -O3 -march=native -mtune=native
Benchmark code (GitHub Gist).
float sum = x + y / pow(10,floor(log10(y)+1));
log10 returns log (base 10) of its argument. For 1234, that'll be 3 point something.
Breaking this down:
log10(1234) = 3.091315159697223
floor(log10(1234)+1) = 4
pow(10,4) = 10000.0
3 + 1234 / 10000.0 = 3.1234.
But, as @einpoklum pointed out, log10(0) is -infinity, which makes the expression evaluate 0 / 0 = NaN, so you have to check for y == 0.
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;
float foo(int x, unsigned int y)
{
if (0==y)
return x;
float den = pow(10,-1 * floor(log10(y)+1));
return x + y * den;
}
int main()
{
vector<vector<int>> tests
{
{3,1234},
{1,1000},
{2,12},
{0,0},
{9,1}
};
for(auto& test: tests)
{
cout << "Test: " << test[0] << "," << test[1] << ": " << foo(test[0],test[1]) << endl;
}
return 0;
}
See runnable version at:
https://onlinegdb.com/rkaYiDcPI
With test output:
Test: 3,1234: 3.1234
Test: 1,1000: 1.1
Test: 2,12: 2.12
Test: 0,0: 0
Test: 9,1: 9.1
Edit
Small modification to remove division operation.
(reworked solution)
Initially, my idea was to improve the performance of power-of-10 and division-by-power-of-10 by writing specialized versions of these functions for integers. Then there was @TarekDakhran's comment about doing the same for counting the number of digits. And then I realized that's essentially doing the same thing twice, so let's just integrate everything. Specifically, this lets us avoid any divisions or inversions at runtime:
inline float convert(int x, int y) {
float fy (y);
if (y == 0) { return float(x); }
if (y >= 1e9) { return float(x + fy * 1e-10f); }
if (y >= 1e8) { return float(x + fy * 1e-9f); }
if (y >= 1e7) { return float(x + fy * 1e-8f); }
if (y >= 1e6) { return float(x + fy * 1e-7f); }
if (y >= 1e5) { return float(x + fy * 1e-6f); }
if (y >= 1e4) { return float(x + fy * 1e-5f); }
if (y >= 1e3) { return float(x + fy * 1e-4f); }
if (y >= 1e2) { return float(x + fy * 1e-3f); }
if (y >= 1e1) { return float(x + fy * 1e-2f); }
return float(x + fy * 1e-1f);
}
Additional notes:
This works for y == 0, but not for negative x or y values. Adapting it for negative values is pretty easy and not very expensive, though.
Not sure if this is absolutely optimal. Perhaps a binary-search for the number of digits of y would work better?
A loop would make the code look nicer; but the compiler would need to unroll it. Would it unroll the loop and compute all those floats beforehand? I'm not sure.
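On the binary-search question, here is a sketch of what that could look like: the digit count is found with at most four comparisons, and a precomputed table of 10^-n replaces the if-chain of constants, so there are still no runtime divisions. Both helper names are hypothetical, inputs are assumed non-negative (as in the note above), and whether this beats the linear chain on a given CPU would need benchmarking.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of decimal digits of y (1..10), by binary search over powers of ten.
inline int digits_bs(std::uint32_t y)
{
    if (y < 100000u) {                        // 1..5 digits
        if (y < 100u) return y < 10u ? 1 : 2;
        if (y < 1000u) return 3;
        return y < 10000u ? 4 : 5;
    }
    if (y < 10000000u) return y < 1000000u ? 6 : 7;
    if (y < 100000000u) return 8;
    return y < 1000000000u ? 9 : 10;
}

// x + y * 10^-digits(y), using a lookup table instead of a division.
inline float convert_bs(std::uint32_t x, std::uint32_t y)
{
    static const float inv_pow10[11] = {1e0f, 1e-1f, 1e-2f, 1e-3f, 1e-4f, 1e-5f,
                                        1e-6f, 1e-7f, 1e-8f, 1e-9f, 1e-10f};
    if (y == 0) return static_cast<float>(x);
    return static_cast<float>(x) + static_cast<float>(y) * inv_pow10[digits_bs(y)];
}
```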
I put some effort into optimizing my previous answer and ended up with this.
#include <cstdint>
inline uint32_t digits_10(uint32_t x) {
return 1u
+ (x >= 10u)
+ (x >= 100u)
+ (x >= 1000u)
+ (x >= 10000u)
+ (x >= 100000u)
+ (x >= 1000000u)
+ (x >= 10000000u)
+ (x >= 100000000u)
+ (x >= 1000000000u)
;
}
inline uint64_t pow_10(uint32_t exp) {
uint64_t res = 1;
while(exp--) {
res *= 10u;
}
return res;
}
inline double fast_zip(uint32_t x, uint32_t y) {
return x + static_cast<double>(y) / pow_10(digits_10(y));
}
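double IntsToDbl(int ipart, int decpart)
{
    // The decimal part: divide by 10 until it is below 1 (e.g. 1415 -> 0.1415).
    // The loop uses >= so that 10 becomes 0.1, not 1.0.
    double dp = (double) decpart;
    while (dp >= 1)
    {
        dp /= 10;
    }
    // Join both parts
    return ipart + dp;
}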
A simple solution is to convert both values x and y to strings, concatenate them with a '.', and then convert the result to a floating-point number:
#include <string>
// Build the text "x.y" and parse it back as a float (assumes y >= 0).
float joinAsFloat(int x, int y)
{
    std::string x_string = std::to_string(x);
    std::string y_string = std::to_string(y);
    return std::stof(x_string + "." + y_string);
}
(Answer based on the fact that OP has not indicated what they want to use the float for.)
The fastest (most efficient) way is to do it implicitly, but not actually do anything (after compiler optimizations).
That is, write a "pseudo-float" class whose members are the integers x and y (the parts before and after the decimal point), and give it operators for whatever you were going to do with the float: operator+, operator*, operator/, operator-, and maybe even implementations of pow(), log2(), log10() and so on.
Unless you literally need to save a 4-byte float somewhere for later use, it would almost certainly be faster to work directly with x, y and your next operand than to actually create a float from x and y, losing precision and wasting time.
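A minimal sketch of such a pseudo-float, assuming non-negative parts and showing only addition: the value is kept exactly as whole + frac / scale, where scale is the power of ten just above frac, and a real floating-point number is produced only when asked for. The type and member names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "pseudo-float" x.y kept as exact integers:
// value = whole + frac / scale, where scale is a power of 10.
struct PseudoFloat {
    std::int64_t whole;
    std::int64_t frac;   // digits after the decimal point
    std::int64_t scale;  // 10^(number of digits in frac)

    static PseudoFloat from_parts(std::int64_t x, std::int64_t y) {
        std::int64_t scale = 10;
        while (scale <= y) scale *= 10;  // smallest power of 10 above y
        return {x, y, scale};
    }

    // Add exactly by bringing both fractions to a common scale.
    friend PseudoFloat operator+(PseudoFloat a, PseudoFloat b) {
        std::int64_t scale = a.scale > b.scale ? a.scale : b.scale;
        std::int64_t frac = a.frac * (scale / a.scale) + b.frac * (scale / b.scale);
        std::int64_t whole = a.whole + b.whole + frac / scale;  // carry
        return {whole, frac % scale, scale};
    }

    // Only materialise a floating-point value when it is really needed.
    double to_double() const {
        return static_cast<double>(whole) + static_cast<double>(frac) / scale;
    }
};
```

For example, from_parts(1, 5) + from_parts(2, 75) stays the exact pair (4, 25) at scale 100, that is 4.25.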
Try this
#include <iostream>
#include <cmath>
#include <cstdlib>
using namespace std;
float int2Float(int integer,int decimal)
{
    float sign = integer < 0 ? -1.0f : 1.0f;  // no division, so integer == 0 is safe
    float tm = abs(integer), tm2 = abs(decimal);
    int base = decimal == 0 ? -1 : (int)log10(abs(decimal));  // digits in decimal, minus 1
    tm2/=pow(10,base+1);
    return (tm+tm2)*sign;
}
int main()
{
int x,y;
cin >>x >>y;
cout << int2Float(x,y);
return 0;
}
version 2, try this out
#include <iostream>
#include <cmath>
#include <cstdlib>
using namespace std;
float getPlaces(int x)
{
    unsigned char p=0;
    while(x!=0)
    {
        x/=10;
        p++;
    }
    // an int has at most 10 digits, so the table needs 11 entries
    float pow10[] = {1e0f,1e1f,1e2f,1e3f,1e4f,1e5f,1e6f,1e7f,1e8f,1e9f,1e10f};
    return pow10[p];
}
float int2Float(int x,int y)
{
if(y == 0) return x;
float sign = x != 0 ? x/abs(x) : 1;
float tm = abs(x), tm2 = abs(y);
tm2/=getPlaces(y);
return (tm+tm2)*sign;
}
int main()
{
int x,y;
cin >>x >>y;
cout << int2Float(x,y);
return 0;
}
If you want something that is simple to read and follow, you could try something like this:
float convertToDecimal(int x)
{
float y = (float) x;
while( y >= 1 ){  // >= so that, e.g., 10 becomes 0.1 rather than 1
y = y / 10;
}
return y;
}
float convertToDecimal(int x, int y)
{
return (float) x + convertToDecimal(y);
}
This simply divides the second integer by 10 until it is below 1 and adds the result to the first one.
This does become a problem if you ever want to represent a number like 1.0012 as two integers, but that isn't part of the question. To solve it, I would use a third integer giving the negative power of 10 by which to multiply the second number, i.e. 1.0012 would be 1, 12, 4. This would then be coded as follows:
#include <cmath>
// Scales num by 10^-e, e.g. (12, 4) -> 0.0012.
float scaledDecimal(int num, int e)
{
    return ((float) num) / std::pow(10.0f, e);
}
float convertToDecimal(int x, int y, int e)
{
    return (float) x + scaledDecimal(y, e);
}
It is a little more concise, but it doesn't really answer your question. It might, however, show a problem with using only two integers if you stick with that data model.

Save a float into an integer without losing floating point precision

I want to save the value of a float variable named f in the third element of an array named i in a way that the floating point part isn't wiped (i.e. I don't want to save 1 instead of 1.5). After that, complete the last line in a way that we see 1.5 in the output (don't use cout<<1.5; or cout<<f; or some similar tricks!)
float f=1.5;
int i[3];
i[2] = ... ;
cout<<... ;
Does anybody have any idea?
Use type punning with a union if the two types have the same size in your compilation environment:
#include <iostream>
static_assert(sizeof(int) == sizeof(float), "int and float must have the same size");
int castFloatToInt(float f) {
    union { float f; int i; } u;
    u.f = f;
    return u.i;
}
float castIntToFloat(int i) {
    union { float f; int i; } u;
    u.i = i;
    return u.f;
}
// ...
float f=1.5;
int i[3];
i[2] = castFloatToInt(f);
std::cout << castIntToFloat(i[2]);
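Using a union avoids the aliasing problem on compilers that explicitly support union type punning, such as GCC; otherwise the compiler may generate incorrect results due to optimization. Strictly by the C++ standard, though, reading a union member other than the one last written is undefined behaviour (C allows it).
This is a common technique for manipulating the bits of a float directly, although uint32_t is normally used instead of int.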
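A sketch of the same round trip without relying on union punning, using std::memcpy, which is well-defined for copying an object's bytes (C++20 also offers std::bit_cast for this). It assumes a 32-bit float, which the static_assert checks; floatBits and floatFromBits are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the bytes of a float into a 32-bit unsigned integer.
std::uint32_t floatBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float expected");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Copy the bytes back into a float.
float floatFromBits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

With an array of std::uint32_t instead of int, i[2] = floatBits(f); followed by std::cout << floatFromBits(i[2]); prints 1.5.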
Generally speaking, you cannot store a float in an int without loss of precision.
You could multiply your number with a factor, store it and after that divide again to get some decimal places out of it.
Note that this will not work for all numbers and you have to choose your factor carefully.
float f = 1.5f;
const float factor = 10.0f;
int i[3];
i[2] = static_cast<int>(f * factor);
std::cout << static_cast<float>(i[2]) / factor;
If we can assume that int is 32 bits then you can do it with type-punning:
float f = 1.5;
int i[3];
i[2] = *(int *)&f;
cout << *(float *)&i[2];
but this is getting into Undefined Behaviour territory (breaking aliasing rules), since it accesses a type via a pointer to a different (incompatible) type.

C++: convert the fractional part of a number into an integer

I need to convert the fractional part of a number into an integer, without the decimal point.
For example, from 3.35 I want to get just 35, without the leading zero or the point.
I used the modf() function to extract the fractional part, but it gives me 0.35.
If there is any way to do that, or to filter out the '0.' part, I would be very grateful if you could show me how with the shortest code possible.
A bit more efficient than converting to a string and back again:
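#include <cmath>
// Fractional digits of number, e.g. (3.35, 2) -> 35.
int fractional_part_as_int(double number, int number_of_decimal_places) {
    double dummy;
    double frac = std::modf(number, &dummy);  // fractional part, e.g. 0.35
    return (int)std::round(frac * std::pow(10, number_of_decimal_places));
}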
#include <iostream>
#include <cmath>
// Round half away from zero (named so it does not clash with ::round).
double round_half_away(double r) {
    return (r > 0.0) ? std::floor(r + 0.5) : std::ceil(r - 0.5);
}
double floor_to_zero(double f) {
    return (f > 0.0) ? std::floor(f) : std::ceil(f);
}
double sign(double s) {
    return (s < 0.0) ? -1.0 : 1.0;
}
int frac(double f, int prec) {
    return round_half_away((f - floor_to_zero(f)) * prec) * sign(f);
}
int main() {
double a = 1.2345;
double b = -34.567;
std::cout << frac(a, 100) << " " << frac(b, 100) << std::endl; // 23 57
}
another solution
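#include <cmath>
int precision = 100;
double number = 3.35;
int f = std::floor(number);
double temp = ( f - number ) * -1;                    // the fractional part, 0.35
int fractional_part = std::lround(temp * precision);  // round instead of truncating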
If you need it as a string, a fairly easy C-style solution (which should work for a variable number of decimal places) would be:
#include <cstdio>
#include <cstdlib>
#include <cstring>
double yourNumber = 0.35;
char buffer[32];
snprintf(buffer, 32, "%g", yourNumber);
strtok(buffer, ".");                          // the part before the '.'
char* fraction = strtok(NULL, ".");           // the part after the '.', or NULL
int fractionAsInt = fraction ? atoi(fraction) : 0;
This example lacks error handling in case of a bad string and is not feasible if you just need a fixed number of decimal places, since the arithmetic approaches work better there.
Something like this should work:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
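static int get_frac(double value, unsigned short precision)
{
    /* round, so that e.g. 0.45639999... * 10^4 gives 4564 rather than 4563 */
    return (int)lround((value - (long)value) * pow(10, precision));
}
static int get_frac_no_trailing_zeros(double value, unsigned short precision)
{
    int v = get_frac(value, precision);
    while (v != 0 && v % 10 == 0)  /* stop at 0, which would otherwise loop forever */
        v /= 10;
    return v;
}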
int main(int argc, char *argv[])
{
double v;
v = 123.4564;
printf("%.4f = %d\n", v, get_frac(v, 2));
printf("%.4f = %d\n", v, get_frac(v, 4));
printf("%.4f = %d\n", v, get_frac(v, 6));
printf("%.4f = %d\n", v, get_frac_no_trailing_zeros(v, 6));
return EXIT_SUCCESS;
}
You may also want to avoid calling pow, either by having the caller supply a power of 10 in the first place, or by using a lookup table.
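The lookup-table variant could look like the sketch below (C++ rather than C, and the names kPow10 and get_frac_table are hypothetical). Unlike get_frac it rounds instead of truncating, so 123.4564 at two places gives 46, and a value such as 0.999 at two places rounds up to 100.

```cpp
#include <cassert>
#include <cmath>

// Powers of ten for precision 0..9 (enough for a 32-bit int result).
static const double kPow10[] = {1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9};

// Fractional digits of value, rounded to `precision` places (0..9).
// Uses a lookup table instead of pow(); precision is not range-checked here.
int get_frac_table(double value, unsigned precision)
{
    double frac = value - std::trunc(value);  // keeps the sign of value
    return static_cast<int>(std::lround(frac * kPow10[precision]));
}
```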
Using some STL magic, here is some sample code:
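#include <cmath>
#include <utility>
typedef std::pair<int, int> SplitFloat;
SplitFloat Split(float value, int precision)
{
    // Get integer part.
    float left = std::floor(value);
    // Get decimal part (3.35f is stored as 3.3499999..., so round, don't truncate).
    float right = (value - left) * float(std::pow(10, precision));
    return SplitFloat(static_cast<int>(left), static_cast<int>(std::lround(right)));
}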
It can be improved, but is pretty straightforward.
I just did something close to what you are trying to do, though I'm still pretty new. Nonetheless, maybe this will help someone in the future, as I landed here looking for answers to my own problem.
The first step is making sure that the variable that contains 3.35 is a double, but that's probably obvious.
Next, create a variable that is only an integer and set its value equal to the value of the double. It will then contain only the whole number.
Then subtract the whole number (int) from the double. You will be left with the fraction/decimal value. From there, just multiply by 100.
Beyond the hundredths place you would obviously have to do a little more configuring, but it should be fairly simple to do with an if statement: if the decimal value is greater than .99, multiply by 1000 instead, and so on.
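The steps above, sketched for two decimal places (the function name is hypothetical). One deviation from the description: the product is rounded with std::lround rather than truncated, because the subtraction can leave something like 0.3499999... for some inputs.

```cpp
#include <cassert>
#include <cmath>

// Two decimal places of d, following the steps described above.
int fraction_two_places(double d)
{
    int whole = static_cast<int>(d);  // an int holds only the whole number
    double fraction = d - whole;      // what is left is the fraction
    return static_cast<int>(std::lround(fraction * 100));  // then multiply by 100
}
```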
Here's how I would do it.
#include <iomanip>
#include <sstream>
#include <string>
int main()
{
    double d = 3.35;    // this is your number
    int precision = 2;  // your desired number of decimal places
    std::ostringstream out;
    out << std::setprecision(precision) << std::fixed
        << std::showpoint << d;
    std::istringstream in(out.str());
    std::string wholePart; // you won't need this.
    int fractionalPart;
    std::getline(in, wholePart, '.');
    in >> fractionalPart;
    // now fractionalPart contains your desired value.
}
I'm pretty sure that instead of two different istringstream and ostringstream objects you could have gotten away with just one stringstream object, but I am not sure about the details (never used that class) so I didn't use it in the example.
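For what it's worth, a single std::stringstream does work here: it keeps separate read and write positions, so after writing the formatted number you can read it back from the start of the same buffer. A sketch, with a hypothetical function name:

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Same idea as above with one std::stringstream instead of two streams.
int fractional_part_via_stream(double d, int precision)
{
    std::stringstream ss;
    ss << std::fixed << std::setprecision(precision) << d;  // e.g. "3.35"
    std::string wholePart;
    std::getline(ss, wholePart, '.');  // consume "3" and the '.'
    int fractionalPart = 0;
    ss >> fractionalPart;              // read "35"
    return fractionalPart;
}
```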

C++ floating point to integer type conversions

What are the different techniques used to convert float type of data to integer in C++?
#include <iostream>
using namespace std;
struct database {
int id, age;
float salary;
};
int main() {
struct database employee;
employee.id = 1;
employee.age = 23;
employee.salary = 45678.90;
/*
How can i print this value as an integer
(with out changing the salary data type in the declaration part) ?
*/
cout << endl << employee.id << endl << employee.age << endl << employee.salary << endl;
return 0;
}
What you are looking for is 'type casting'. Type casting (putting the type you want in parentheses) tells the compiler you know what you are doing and are fine with it. The old way, inherited from C, is as follows.
float var_a = 9.99;
int var_b = (int)var_a;
If you had only tried to write
int var_b = var_a;
You might get a warning (for example MSVC's C4244 "possible loss of data", or GCC/Clang with -Wconversion), because the implicit conversion from float to int discards the fractional part.
This is referred to as the old way because C++ offers a superior alternative, static_cast, which provides a much safer way of converting from one type to another. The equivalent method (and the way you should do it) is:
float var_x = 9.99;
int var_y = static_cast<int>(var_x);
This form may look a bit more long-winded, but it handles mistakes much better: for example, a static_cast between types that cannot be converted is rejected at compile time. For more information on why you should be using static_cast, see this question.
Normal way is to:
float f = 3.4;
int n = static_cast<int>(f);
The range of some floating-point types may exceed the range of int.
This example shows a safe conversion of any float type to int using the int safeFloatToInt(const FloatType &num); function:
#include <iostream>
#include <limits>
using namespace std;
template <class FloatType>
int safeFloatToInt(const FloatType &num) {
    // Check the value against int's range directly: precision
    // (numeric_limits::digits) says nothing about range, and float has fewer
    // digits than int but a far larger range. Any comparison with NaN is
    // false, so NaN is reported as unsafe too.
    if ( (num < static_cast<FloatType>( numeric_limits<int>::max())) &&
         (num > static_cast<FloatType>( numeric_limits<int>::min())) ) {
        return static_cast<int>(num); //safe to cast
    }
    cerr << "Unsafe conversion of value:" << num << endl;
    //NaN is not defined for int return the largest int value
    return numeric_limits<int>::max();
}
int main(){
double a=2251799813685240.0;
float b=43.0;
double c=23333.0;
//unsafe cast
cout << safeFloatToInt(a) << endl;
cout << safeFloatToInt(b) << endl;
cout << safeFloatToInt(c) << endl;
return 0;
}
Result:
Unsafe conversion of value:2.2518e+15
2147483647
43
23333
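For most cases, use the rounding functions from <cmath>: std::lround (returning long, enough for floats) or std::llround (returning long long, for double and long double):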
long a{ std::lround(1.5f) }; //2l
long long b{ std::llround(std::floor(1.5)) }; //1ll
Check out the Boost NumericConversion library. It lets you explicitly control how to deal with issues such as overflow and truncation.
I believe you can do this using a cast:
float f_val = 3.6f;
int i_val = (int) f_val;
The easiest technique is to just assign the float to an int, for example:
int i;
float f;
f = 34.0098;
i = f;
This truncates everything after the decimal point; alternatively, you can round the float first.
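The difference between truncating and rounding first matters most for negative values. A small sketch of the usual options from <cmath>, each assuming the value fits in an int (the function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Common float-to-int conversions and how each treats the fraction.
int by_truncation(double d) { return static_cast<int>(d); }               // toward zero
int by_floor(double d)      { return static_cast<int>(std::floor(d)); }   // toward -infinity
int by_ceil(double d)       { return static_cast<int>(std::ceil(d)); }    // toward +infinity
int by_rounding(double d)   { return static_cast<int>(std::lround(d)); }  // nearest, halves away from zero
```

For example, -2.7 truncates to -2 but floors and rounds to -3.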
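One thing I want to add: sometimes there can be precision loss, so you may want to add a small epsilon value before converting. This works because a result that should be exactly 8 can be stored as 7.999999..., which truncation turns into 7; the epsilon pushes it back above 8 first (for non-negative values).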
int someint = (somedouble+epsilon);
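A concrete case: with IEEE 754 doubles, (0.1 + 0.7) * 10 evaluates to 7.999999999999999, so plain truncation yields 7. The sketch compares truncation, the epsilon nudge, and std::lround; the epsilon of 1e-9 is an arbitrary assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Plain truncation, as with int someint = somedouble;
int truncate_plain(double d) { return static_cast<int>(d); }

// Epsilon nudge before truncating (only sensible for non-negative values);
// 1e-9 is an arbitrary choice for this sketch.
int truncate_with_epsilon(double d, double epsilon = 1e-9)
{
    return static_cast<int>(d + epsilon);
}

// Rounding to nearest avoids choosing an epsilon at all.
int round_nearest(double d) { return static_cast<int>(std::lround(d)); }
```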
This is one way to convert an IEEE 754 float to a 32-bit integer if you can't use floating-point operations. It also has a scaler parameter to include more digits in the result; useful values for scaler are 1, 10 and 100.
#define EXPONENT_LENGTH 8
#define MANTISSA_LENGTH 23
// to convert float to int without floating point operations
int ownFloatToInt(int floatBits, int scaler) {
int sign = (floatBits >> (EXPONENT_LENGTH + MANTISSA_LENGTH)) & 1;
int exponent = (floatBits >> MANTISSA_LENGTH) & ((1 << EXPONENT_LENGTH) - 1);
int mantissa = (floatBits & ((1 << MANTISSA_LENGTH) - 1)) | (1 << MANTISSA_LENGTH);
int result = mantissa * scaler; // possible overflow
exponent -= ((1 << (EXPONENT_LENGTH - 1)) - 1); // exponent bias
exponent -= MANTISSA_LENGTH; // modify exponent for shifting the mantissa
if (exponent <= -(int)sizeof(result) * 8) {
return 0; // underflow
}
if (exponent > 0) {
result <<= exponent; // possible overflow
} else {
result >>= -exponent;
}
if (sign) result = -result; // handle sign
return result;
}

How to perform a bitwise operation on floating point numbers

I tried this:
float a = 1.4123;
a = a & (1 << 3);
I get a compiler error saying that the operand of & cannot be of type float.
When I do:
float a = 1.4123;
a = (int)a & (1 << 3);
I get the program running. The only thing is that the bitwise operation is done on the integer obtained by truncating the number.
The following is also not allowed.
float a = 1.4123;
a = (void*)a & (1 << 3);
I don't understand why int can be cast to void* but not float.
I am doing this to solve the problem described in Stack Overflow question How to solve linear equations using a genetic algorithm?.
At the language level, there's no such thing as "bitwise operation on floating-point numbers". Bitwise operations in C/C++ work on the value representation of a number, and the value representation of floating-point numbers is not defined in C/C++ (unsigned integers are an exception in this regard, as their shift is defined as-if they are stored in 2's complement). Floating-point numbers don't have bits at the level of value representation, which is why you can't apply bitwise operations to them.
All you can do is analyze the bit content of the raw memory occupied by the floating-point number. For that you need to either use a union as suggested below or (equivalently, and only in C++) reinterpret the floating-point object as an array of unsigned char objects, as in
float f = 5;
unsigned char *c = reinterpret_cast<unsigned char *>(&f);
// inspect memory from c[0] to c[sizeof f - 1]
And please, don't try to reinterpret a float object as an int object, as other answers suggest. That doesn't make much sense, and is not guaranteed to work in compilers that follow strict-aliasing rules in optimization. The correct way to inspect memory content in C++ is by reinterpreting it as an array of [signed/unsigned] char.
Also note that you technically aren't guaranteed that the floating-point representation on your system is IEEE 754 (although in practice it is, unless you explicitly allow deviations, for example with fast-math compiler options, and then only with respect to -0.0, ±infinity and NaN).
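Both points can be made concrete: std::numeric_limits<float>::is_iec559 lets you assert IEEE 754 at compile time, and reading the object through unsigned char is always allowed. A sketch with a hypothetical function name; the byte order it reports depends on the platform's endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE 754 floats");

// Hex dump of a float's bytes in memory order, read through unsigned char,
// which may alias any object.
std::string float_bytes_hex(float f)
{
    const unsigned char* c = reinterpret_cast<const unsigned char*>(&f);
    std::string out;
    for (std::size_t k = 0; k < sizeof f; ++k) {
        char buf[3];
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c[k]));
        out += buf;
    }
    return out;
}
```

On a little-endian machine float_bytes_hex(5.0f) gives "0000a040"; on a big-endian one, "40a00000".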
If you are trying to change the bits in the floating-point representation, you could do something like this:
union fp_bit_twiddler {
float f;
int i;
} q;
q.f = a;
q.i &= (1 << 3);
a = q.f;
As AndreyT notes, accessing a union like this invokes undefined behavior, and the compiler could grow arms and strangle you. Do what he suggests instead.
You can work around the strict-aliasing rule and perform bitwise operations on a float type-punned as a uint32_t (if your implementation defines it, which most do) without undefined behavior by using memcpy():
#include <cstdint>
#include <cstring>
float a = 1.4123f;
uint32_t b;
std::memcpy(&b, &a, sizeof b);  // copy the float's bits into b
// perform bitwise operation
b &= 1u << 3;
std::memcpy(&a, &b, sizeof a);  // copy the modified bits back into a
float a = 1.4123;
unsigned int* inta = reinterpret_cast<unsigned int*>(&a);
*inta = *inta & (1 << 3);
Have a look at the following. Inspired by fast inverse square root:
#include <iostream>
using namespace std;
int main()
{
float x, td = 2.0;
int ti = *(int*) &td;
cout << "Cast int: " << ti << endl;
ti = ti>>4;
x = *(float*) &ti;
cout << "Recast float: " << x << endl;
return 0;
}
FWIW, there is a real use case for bit-wise operations on floating point (I just ran into it recently) - shaders written for OpenGL implementations that only support older versions of GLSL (1.2 and earlier did not have support for bit-wise operators), and where there would be loss of precision if the floats were converted to ints.
The bit-wise operations can be implemented on floating point numbers using remainders (modulo) and inequality checks. For example:
float A = 0.625; //value to check; ie, 160/256
float mask = 0.25; //bit to check; ie, 1/4
bool result = (mod(A, 2.0 * mask) >= mask); //non-zero if bit 0.25 is on in A
The above assumes that A is between [0..1) and that there is only one "bit" in mask to check, but it could be generalized for more complex cases.
This idea is based on some of the information found in the question "Is it possible to implement bitwise operators using integer arithmetic?"
If there is not even a built-in mod function, then that can also be implemented fairly easily. For example:
float mod(float num, float den)
{
return num - den * floor(num / den);
}
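The same check ported from GLSL to C++, as a sketch: mod_floor mirrors the mod() above (floored, like GLSL's built-in), and fraction_bit_set tests a single power-of-two "bit" of a value in [0, 1). Both names are hypothetical. For 0.625 = 0.101 in binary, the 0.5 and 0.125 bits are set and the 0.25 bit is not.

```cpp
#include <cassert>
#include <cmath>

// Floored modulo, as in GLSL mod(): num - den * floor(num / den).
float mod_floor(float num, float den)
{
    return num - den * std::floor(num / den);
}

// True if the power of two `mask` is "set" in A, for A in [0, 1).
bool fraction_bit_set(float A, float mask)
{
    return mod_floor(A, 2.0f * mask) >= mask;
}
```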
@mobrule:
Better:
#include <stdint.h>
...
union fp_bit_twiddler {
float f;
uint32_t u;
} q;
/* mutatis mutandis ... */
For these values int will likely be ok, but generally, you should use
unsigned ints for bit shifting to avoid the effects of arithmetic shifts. And
the uint32_t will work even on systems whose ints are not 32 bits.
The Python implementation in Floating point bitwise operations (Python recipe) of floating point bitwise operations works by representing numbers in binary that extends infinitely to the left as well as to the right from the fractional point. Because floating point numbers have a signed zero on most architectures it uses ones' complement for representing negative numbers (well, actually it just pretends to do so and uses a few tricks to achieve the appearance).
I'm sure it can be adapted to work in C++, but care must be taken so as to not let the right shifts overflow when equalizing the exponents.
Bitwise operators should NOT be used on floats, as float representations are hardware-specific, regardless of how similar they are on whatever hardware you might have. Which project/job do you want to risk on "well, it worked on my machine"? Instead, in C++ you can get a similar "feel" for the bit-shift operators by overloading the shift operators (<< and >>) on an object wrapper for a float:
// Simple object wrapper for float type as templates want classes.
class Float
{
float m_f;
public:
Float( const float & f )
: m_f( f )
{
}
operator float() const
{
return m_f;
}
};
float operator>>( const Float & left, int right )
{
    float temp = left;
    for( ; right > 0; --right )  // halve once per shift position
    {
        temp /= 2.0f;
    }
    return temp;
}
float operator<<( const Float & left, int right )
{
    float temp = left;
    for( ; right > 0; --right )  // double once per shift position
    {
        temp *= 2.0f;
    }
    return temp;
}
int main( int argc, char ** argv )
{
int a1 = 40 >> 2;
int a2 = 40 << 2;
int a3 = 13 >> 2;
int a4 = 256 >> 2;
int a5 = 255 >> 2;
float f1 = Float( 40.0f ) >> 2;
float f2 = Float( 40.0f ) << 2;
float f3 = Float( 13.0f ) >> 2;
float f4 = Float( 256.0f ) >> 2;
float f5 = Float( 255.0f ) >> 2;
}
You will have a remainder, which you can throw away based on your desired implementation.
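The loops in the wrapper scale by one power of two per iteration; std::ldexp(x, n) computes x * 2^n in a single call, which makes a compact alternative. A sketch with hypothetical helper names, mirroring the Float examples above:

```cpp
#include <cassert>
#include <cmath>

// "Shift" a float by scaling with a power of two: std::ldexp(x, n) == x * 2^n.
inline float float_shift_right(float x, int n) { return std::ldexp(x, -n); }
inline float float_shift_left(float x, int n)  { return std::ldexp(x, n); }
```

As with the loop version, 13.0f >> 2 gives 3.25f rather than the integer 3, and the fractional part is the remainder mentioned above.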