C++ Integer overflow problem when casting from double to unsigned int

C++ Integer overflow problem when casting from double to unsigned int - c++

I need to convert time from one format to another in C++ and it must be cross-platform compatible. I have created a structure as my time container. The structure fields must also be unsigned int as specified by legacy code.
struct time{
unsigned int timeInteger;
unsigned int timeFraction;
} time1, time2;
Mathematically the conversion is as follows:
time2.timeInteger = time1.timeInteger + 2208988800
time2.timeFraction = (time1.timeFraction * 20e-6) * 2e32
Here is my original code in C++ however when I attempt to write to a binary file, the converted time does not match with the truth data. I think this problem is due to a type casting mistake? This code will compile in VS2008 and will execute.
void convertTime(){
time2.timeInteger = unsigned int(time1.timeInteger + 2209032000);
time2.timeFraction = unsigned int(double(time1.timeFraction) * double(20e-6)*double(pow(double(2),32)));
}

Just a guess, but are you assuming that 2e32 == 2^32? This assumption would make sense if you're trying to scale the result into a 32 bit integer. In fact 2e32 == 2 * 10^32

Slightly unrelated, I think you should rethink your type design. You are basically talking about two different types here. They happen to store the same data, albeit in different results.
To minimize errors in their usage, you should define them as two completely distinct types that have a well-defined conversion between them.
Consider for example:
struct old_time {
unsigned int timeInteger;
unsigned int timeFraction;
};
struct new_time {
public:
new_time(unsigned int ti, unsigned int tf) :
timeInteger(ti), timeFraction(tf) { }
new_time(new_time const& other) :
timeInteger(other.timeInteger),
timeFraction(other.timeFraction) { }
new_time(old_time const& other) :
timeInteger(other.timeInteger + 2209032000U),
timeFraction(other.timeFraction * conversion_factor) { }
operator old_time() const {
old_time other;
other.timeInteger = timeInteger - 2209032000U;
other.timeFraction = timeFraction / conversion_factor;
return other;
}
private:
unsigned int timeInteger;
unsigned int timeFraction;
};
(EDIT: of course this code doesn’t work for the reasons pointed out below.
Now this code can be used frictionless in a safe way:
time_old told; /* initialize … */
time_new tnew = told; // converts old to new format
time_old back = tnew; // … and back.

The problem is that (20 ^ -6) * (2 e32) is far bigger than UINT_MAX. Maybe you meant 2 to the power of 32, or UINT_MAX, rather than 2e32.
In addition, your first line with the integer, the initial value must be less than (2^32 - 2209032000), and depending on what this is measured in, it could wrap round too. In my opinion, set the first value to be a long long (normally 64bits) and change 2e32.
If you can't change the type, then it may become necessary to store the field as it's result in a double, say, and then cast to unsigned int before use.

Related

How to manage endianess of double from network

I have a BIG problem with the answer to this question Swap bits in c++ for a double
Yet, this question is more or less what I search for:
I receive a double from the network and I want to encoded it properly in my machine.
In the case I receive an int I perform this code using ntohl :
int * piData = reinterpret_cast<int*>((void*)pData);
//manage endianness of incomming network data
unsigned long ulValue = ntohl(*piData);
int iValue = static_cast<int>(ulValue);
But in the case I receive an double, I don't know what to do.
The answer to the question suggest to do:
template <typename T>
void swap_endian(T& pX)
{
char& raw = reinterpret_cast<char&>(pX);
std::reverse(&raw, &raw + sizeof(T));
}
However , if I quote this site:
The ntohl() function converts the unsigned integer netlong from network byte order to host byte order.
When the two byte orders are different, this means the endian-ness of the data will be changed. When the two byte orders are the same, the data will not be changed.
On the contrary #GManNickG's answer to the question always does the inversion with std::reverse .
Am I wrong considering that this answer is false ? ( in the extent of network management of endianess which the use of ntohl suggest though it was not precisely said in the title of the OP question).
In the end: Should I split my double into two parts of 4 bytes and apply the ntohl function on the two parts ? Are there more cannonical solutions ?
There's also this interesting question in C, host to network double?, but it limits to 32 bits values. And the answer says doubles should be converted to strings because of architecture differences... I'm also gonna work with audio samples, should I really consider converting all the samples to strings in my database ? ( the doubles come from a database that I query over the network)

If your doubles are in IEEE 754 format that you should be relatively OK. Now you have to divide their 64 bits into two 32-bit halves and then transmit them in big-endian order (which is network order);
How about:
void send_double(double d) {
long int i64 = *((reinterpret_cast<int *>)(&d)); /* Ugly, but works */
int hiword = htonl(static_cast<int>(i64 >> 32));
send(hiword);
int loword = htonl(static_cast<int>(i64));
send(loword);
}
double recv_double() {
int hiword = ntohl(recv_int());
int loword = ntohl(recv_int());
long int i64 = (((static_cast<long int>) hiword) << 32) | loword;
return *((reinterpret_cast<double *>(&i64));
}

Assuming you have a compile-time option to determine endianness:
#if BIG_ENDIAN
template <typename T>
void swap_endian(T& pX)
{
// Don't need to do anything here...
}
#else
template <typename T>
void swap_endian(T& pX)
{
char& raw = reinterpret_cast<char&>(pX);
std::reverse(&raw, &raw + sizeof(T));
}
#endif
Of course, the other option is to not send double across the network at all - considering that it's not guaranteed to be IEEE-754 compatible - there are machines out there using other floating point formats... Using for example a string would work much better...

I could not make John Källén code work on my machine. Moreover, it might be more useful to convert the double into bytes (8 bit, 1 char):
template<typename T>
string to_byte_string(const T& v)
{
char* begin_ = reinterpret_cast<char*>(v);
return string(begin_, begin_ + sizeof(T));
}
template<typename T>
T from_byte_string(std::string& s)
{
assert(s.size() == sizeof(T) && "Wrong Type Cast");
return *(reinterpret_cast<T*>(&s[0]));
}
This code will also works for structs which are using POD types.
If you really want the double as two ints
double d;
int* data = reinterpret_cast<int*>(&d);
int first = data[0];
int second = data[1];
Finally, long int will not always be a 64bit integer (I had to use long long int to make a 64bit int on my machine).

If you want to know system endianless
ONLY #if __cplusplus > 201703L
#include <bit>
#include <iostream>
using namespace std;
int main()
{
if constexpr (endian::native == endian::big)
cout << "big-endian";
else if constexpr (endian::native == endian::little)
cout << "little-endian";
else
cout << "mixed-endian";
}
For more info: https://en.cppreference.com/w/cpp/types/endian

Endianess of float numbers

I want to convert float numbers from little endian to big endian but am not able to do it .
I have succesfuly converted endianess of int numbers but can somebody help with float numbers please

#include <cstring> // for std::memcpy
#include <algorithm> // for std::reverse
#include <iterator> // For C++11 std::begin() and std::end()
// converting from float to bytes for writing out
float f = 10.0;
char c[sizeof f];
std::memcpy(c,&f,sizeof f);
std::reverse(std::begin(c),std::end(c)); // begin() and end() are C++11. For C++98 say std::reverse(c,c + sizeof f);
// ... write c to network, file, whatever ...
going the other direction:
char c[] = { 41, 36, 42, 59 };
static_assert(sizeof(float) == sizeof c,"");
std::reverse(std::begin(c),std::end(c));
float f;
std::memcpy(&f,c,sizeof f);
The representation of floating point values is implementation defined, so the values resulting from this could be different between different implementations. That is, 10.0 byte swapped could be 1.15705e-041, or something else, or it might not be a valid floating point number at all.
However any implementation which uses IEEE 754 (which most do, and which you can check by seeing if std::numeric_limits<float>.is_iec559 is true), should give you the same results. (std::numeric_limits is from #include <limits>.)
The above code converts a float to bytes, modifies the bytes, and then converts those bytes back to float. If you have some byte values that you want to read as a float then you could set the values of the char array to your bytes and then use memcpy() as shown above (by the line after std::reverse()) to put those bytes into f.
Often people will recommend using reinterpret_cast for this sort of thing but I think it's good to avoid casts. People often use them incorrectly and get undefined behavior without realizing it. In this case reinterpret_cast can be used legally, but I still think it's better to avoid it.
Although it does reduce 4 lines to 1...
std::reverse(reinterpret_cast<char*>(&f),reinterpret_cast<char*>(&f) + sizeof f);
And here's an example of why you shouldn't use reinterpret_cast. The following will probably work but may result in undefined behavior. Since it works you probably wouldn't even notice you've done anything wrong, which is one of the least desirable outcomes possible.
char c[] = { 41, 36, 42, 59 };
static_assert(sizeof(float) == sizeof c,"");
float f = *reinterpret_cast<float*>(&c[0]);

The correct way to do such things is to use a union.
union float_int {
float m_float;
int32_t m_int;
};
That way you can convert your float in an integer and since you already know how to convert your integer endianess, you're all good.
For a double it goes like this:
union double_int {
double m_float;
int64_t m_int;
};
The int32_t and int64_t are usually available in stdint.h, boost offers such and Qt has its own set of definitions. Just make sure that the size of the integer is exactly equal to the size of the float. On some systems you also have long double defined:
union double_int {
long double m_float;
int128_t m_int;
};
If the int128_t doesn't work, you can use a struct as this:
union long_double_int {
long double m_float;
struct {
int32_t m_int_low;
int32_t m_int_hi;
};
};
Which could make you think that in all cases, instead of using an int, you could use bytes:
union float_int {
float m_float;
unsigned char m_bytes[4];
};
And that's when you discover that you don't need all the usual shifts used when doing such a conversion... because you can also declare:
union char_int {
int m_int;
unsigned char m_bytes[4];
};
Now your code looks very simple:
float_int fi;
char_int ci;
fi.m_float = my_float;
ci.m_bytes[0] = fi.m_bytes[3];
ci.m_bytes[1] = fi.m_bytes[2];
ci.m_bytes[2] = fi.m_bytes[1];
ci.m_bytes[3] = fi.m_bytes[0];
// now ci.m_int is the float in the other endian
fwrite(&ci, 1, 4, f);
[...snip...]
fread(&ci, 1, 4, f);
// here ci.m_int is the float in the other endian, so restore:
fi.m_bytes[0] = ci.m_bytes[3];
fi.m_bytes[1] = ci.m_bytes[2];
fi.m_bytes[2] = ci.m_bytes[1];
fi.m_bytes[3] = ci.m_bytes[0];
my_float = fi.m_float;
// now my_float was restored from the file
Obviously the endianess is swapped in this example. You probably also need to know whether you indeed need to do such a swap if your program is to be compiled on both LITTLE_ENDIAN and BIG_ENDIAN computers (check against BYTE_ENDIAN.)

Packing 32bit floats into 30 bits (c++)

Here are the goals I'm trying to achieve:
I need to pack 32 bit IEEE floats into 30 bits.
I want to do this by decreasing the size of mantissa by 2 bits.
The operation itself should be as fast as possible.
I'm aware that some precision will be lost, and this is acceptable.
It would be an advantage, if this operation would not ruin special cases like SNaN, QNaN, infinities, etc. But I'm ready to sacrifice this over speed.
I guess this questions consists of two parts:
1) Can I just simply clear the least significant bits of mantissa? I've tried this, and so far it works, but maybe I'm asking for trouble... Something like:
float f;
int packed = (*(int*)&f) & ~3;
// later
f = *(float*)&packed;
2) If there are cases where 1) will fail, then what would be the fastest way to achieve this?
Thanks in advance

You actually violate the strict aliasing rules (section 3.10 of the C++ standard) with these reinterpret casts. This will probably blow up in your face when you turn on the compiler optimizations.
C++ standard, section 3.10 paragraph 15 says:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Specifically, 3.10/15 doesn't allow us to access a float object via an lvalue of type unsigned int. I actually got bitten myself by this. The program I wrote stopped working after turning on optimizations. Apparently, GCC didn't expect an lvalue of type float to alias an lvalue of type int which is a fair assumption by 3.10/15. The instructions got shuffled around by the optimizer under the as-if rule exploiting 3.10/15 and it stopped working.
Under the following assumptions
float really corresponds to a 32bit IEEE-float,
sizeof(float)==sizeof(int)
unsigned int has no padding bits or trap representations
you should be able to do it like this:
/// returns a 30 bit number
unsigned int pack_float(float x) {
unsigned r;
std::memcpy(&r,&x,sizeof r);
return r >> 2;
}
float unpack_float(unsigned int x) {
x <<= 2;
float r;
std::memcpy(&r,&x,sizeof r);
return r;
}
This doesn't suffer from the "3.10-violation" and is typically very fast. At least GCC treats memcpy as an intrinsic function. In case you don't need the functions to work with NaNs, infinities or numbers with extremely high magnitude you can even improve accuracy by replacing "r >> 2" with "(r+1) >> 2":
unsigned int pack_float(float x) {
unsigned r;
std::memcpy(&r,&x,sizeof r);
return (r+1) >> 2;
}
This works even if it changes the exponent due to a mantissa overflow because the IEEE-754 coding maps consecutive floating point values to consecutive integers (ignoring +/- zero). This mapping actually approximates a logarithm quite well.

Blindly dropping the 2 LSBs of the float may fail for small number of unusual NaN encodings.
A NaN is encoded as exponent=255, mantissa!=0, but IEEE-754 doesn't say anything about which mantiassa values should be used. If the mantissa value is <= 3, you could turn a NaN into an infinity!

You should encapsulate it in a struct, so that you don't accidentally mix the usage of the tagged float with regular "unsigned int":
#include <iostream>
using namespace std;
struct TypedFloat {
private:
union {
unsigned int raw : 32;
struct {
unsigned int num : 30;
unsigned int type : 2;
};
};
public:
TypedFloat(unsigned int type=0) : num(0), type(type) {}
operator float() const {
unsigned int tmp = num << 2;
return reinterpret_cast<float&>(tmp);
}
void operator=(float newnum) {
num = reinterpret_cast<int&>(newnum) >> 2;
}
unsigned int getType() const {
return type;
}
void setType(unsigned int type) {
this->type = type;
}
};
int main() {
const unsigned int TYPE_A = 1;
TypedFloat a(TYPE_A);
a = 3.4;
cout << a + 5.4 << endl;
float b = a;
cout << a << endl;
cout << b << endl;
cout << a.getType() << endl;
return 0;
}
I can't guarantee its portability though.

How much precision do you need? If 16-bit float is enough (sufficient for some types of graphics), then ILM's 16-bit float ("half"), part of OpenEXR is great, obeys all kinds of rules (http://www.openexr.com/), and you'll have plenty of space left over after you pack it into a struct.
On the other hand, if you know the approximate range of values they're going to take, you should consider fixed point. They're more useful than most people realize.

I can't select any of the answers as the definite one, because most of them have valid information, but not quite what I was looking for. So I'll just summarize my conclusions.
The method for conversion I've posted in my question's part 1) is clearly wrong by C++ standard, so other methods to extract float's bits should be used.
And most important... as far as I understand from reading the responses and other sources about IEEE754 floats, it's ok to drop the least significant bits from mantissa. It will mostly affect only precision, with one exception: sNaN. Since sNaN is represented by exponent set to 255, and mantissa != 0, there can be situation where mantissa would be <= 3, and dropping last two bits would convert sNaN to +/-Infinity. But since sNaN are not generated during floating point operations on CPU, its safe under controlled environment.

How to use an int as an array of ints/bools?

I noticed while making a program that a lot of my int type variables never went above ten. I figure that because an int is 2 bytes at the shortest (1 if you count char), so I should be able to store 4 unsigned ints with a max value of 15 in a short int, and I know I can access each one individually using >> and <<:
short unsigned int SLWD = 11434;
S is (SLWD >> 12), L is ((SLWD << 4) >> 12),
W is ((SLWD << 8) >> 12), and D is ((SLWD << 8) >> 12)
However, I have no idea how to encompase this in a function of class, since any type of GetVal() function would have to be of type int, which defeats the purpose of isolating the bits in the first place.

First, remember the Rules of Optimization. But this is possible in C or C++ using bitfields:
struct mystruct {
unsigned int smallint1 : 3; /* 3 bits wide, values 0 -- 7 */
signed int smallint2 : 4; /* 4 bits wide, values -8 -- 7 */
unsigned int boolean : 1; /* 1 bit wide, values 0 -- 1 */
};
It's worth noting that while you gain by not requiring so much storage, you lose because it becomes more costly to access everything, since each read or write now has a bunch of bit twiddling mechanics associated with it. Given that storage is cheap, it's probably not worth it.
Edit: You can also use vector<bool> to store 1-bit bools; but beware of it because it doesn't act like a normal vector! In particular, it doesn't provide iterators. It's sufficiently different that it's fair to say a vector<bool> is not actually a vector. Scott Meyers wrote very clearly on this topic in 'Effective STL'.

In C, and for the sole purpose of saving space, you can reinterpret the unsigned short as a structure with bitfields (or use such structure without messing with reinterpretations):
#include <stdio.h>
typedef struct bf_
{
unsigned x : 4;
unsigned y : 4;
unsigned z : 4;
unsigned w : 4;
} bf;
int main(void)
{
unsigned short i = 5;
bf *bitfields = (bf *) &i;
bitfields->w = 12;
printf("%d\n", bitfields->x);
// etc..
return 0;
}

That's a very common technique. You usually allocate an array of the larger primitive type (e.g., ints or longs), and have some abstraction to deal with the mapping. If you're using an OO language, it's usually a good idea to actually define some sort of BitArray or SmartArray or something like that, and impement a getVal() that takes an index. The important thing is to make sure you hide the details of the internal representation (e.g., for when you move between platforms).
That being said, most mainstream languages already have this functionality available.
If you just want bits, WikiPedia has a good list.
If you want more than bits, you can still find something, or implement it yourself with a similar interface. Take a look at the Java BitSet for reference

Converting floating point to fixed point

In C++, what's the generic way to convert any floating point value (float) to fixed point (int, 16:16 or 24:8)?
EDIT: For clarification, fixed-point values have two parts to them: an integer part and a fractional part. The integer part can be represented by a signed or unsigned integer data type. The fractional part is represented by an unsigned data integer data type.
Let's make an analogy with money for the sake of clarity. The fractional part may represent cents -- a fractional part of a dollar. The range of the 'cents' data type would be 0 to 99. If a 8-bit unsigned integer were to be used for fixed-point math, then the fractional part would be split into 256 evenly divisible parts.
I hope that clears things up.

Here you go:
// A signed fixed-point 16:16 class
class FixedPoint_16_16
{
short intPart;
unsigned short fracPart;
public:
FixedPoint_16_16(double d)
{
*this = d; // calls operator=
}
FixedPoint_16_16& operator=(double d)
{
intPart = static_cast<short>(d);
fracPart = static_cast<unsigned short>
(numeric_limits<unsigned short> + 1.0)*d);
return *this;
}
// Other operators can be defined here
};
EDIT: Here's a more general class based on anothercommon way to deal with fixed-point numbers (and which KPexEA pointed out):
template <class BaseType, size_t FracDigits>
class fixed_point
{
const static BaseType factor = 1 << FracDigits;
BaseType data;
public:
fixed_point(double d)
{
*this = d; // calls operator=
}
fixed_point& operator=(double d)
{
data = static_cast<BaseType>(d*factor);
return *this;
}
BaseType raw_data() const
{
return data;
}
// Other operators can be defined here
};
fixed_point<int, 8> fp1; // Will be signed 24:8 (if int is 32-bits)
fixed_point<unsigned int, 16> fp1; // Will be unsigned 16:16 (if int is 32-bits)

A cast from float to integer will throw away the fractional portion so if you want to keep that fraction around as fixed point then you just multiply the float before casting it. The below code will not check for overflow mind you.
If you want 16:16
double f = 1.2345;
int n;
n=(int)(f*65536);
if you want 24:8
double f = 1.2345;
int n;
n=(int)(f*256);

**** Edit** : My first comment applies to before Kevin's edit,but I'll leave it here for posterity. Answers change so quickly here sometimes!
The problem with Kevin's approach is that with Fixed Point you are normally packing into a guaranteed word size (typically 32bits). Declaring the two parts separately leaves you to the whim of your compiler's structure packing. Yes you could force it, but it does not work for anything other than 16:16 representation.
KPexEA is closer to the mark by packing everything into int - although I would use "signed long" to try and be explicit on 32bits. Then you can use his approach for generating the fixed point value, and bit slicing do extract the component parts again. His suggestion also covers the 24:8 case.
( And everyone else who suggested just static_cast.....what were you thinking? ;) )

I gave the answer to the guy that wrote the best answer, but I really used a related questions code that points here.
It used templates and was easy to ditch dependencies on the boost lib.

This is fine for converting from floating point to integer, but the O.P. also wanted fixed point.
Now how you'd do that in C++, I don't know (C++ not being something I can think in readily). Perhaps try a scaled-integer approach, i.e. use a 32 or 64 bit integer and programmatically allocate the last, say, 6 digits to what's on the right hand side of the decimal point.

There isn't any built in support in C++ for fixed point numbers. Your best bet would be to write a wrapper 'FixedInt' class that takes doubles and converts them.
As for a generic method to convert... the int part is easy enough, just grab the integer part of the value and store it in the upper bits... decimal part would be something along the lines of:
for (int i = 1; i <= precision; i++)
{
if (decimal_part > 1.f/(float)(i + 1)
{
decimal_part -= 1.f/(float)(i + 1);
fixint_value |= (1 << precision - i);
}
}
although this is likely to contain bugs still

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ Integer overflow problem when casting from double to unsigned int - c++

Just a guess, but are you assuming that 2e32 == 2^32? This assumption would make sense if you're trying to scale the result into a 32 bit integer. In fact 2e32 == 2 * 10^32

Related

How to manage endianess of double from network

Endianess of float numbers

Packing 32bit floats into 30 bits (c++)

How to use an int as an array of ints/bools?

Converting floating point to fixed point

Categories

Resources