Kinds of integer overflow on subtraction - c++

I'm making an attempt to learn C++ over again, using Sams Teach Yourself C++ in 21 Days (6th ed.). I'm trying to work through it very thoroughly, making sure I understand each chapter (although I'm acquainted with C-syntax languages already).
Near the start of chapter 5 (Listing 5.2), a point is made about unsigned integer overflow. Based on their example I wrote this:
#include <iostream>
int main() {
    unsigned int bignum = 100;
    unsigned int smallnum = 50;
    unsigned int udiff;
    int diff;

    udiff = bignum - smallnum;
    std::cout << "Difference (1) is " << udiff << "\n";
    udiff = smallnum - bignum;
    std::cout << "Difference (2) is " << udiff << "\n";
    diff = bignum - smallnum;
    std::cout << "Difference (3) is " << diff << "\n";
    diff = smallnum - bignum;
    std::cout << "Difference (4) is " << diff << "\n";
    return 0;
}
This gives the following output, which is not surprising to me:
Difference (1) is 50
Difference (2) is 4294967246
Difference (3) is 50
Difference (4) is -50
If I change the program so that the line declaring bignum reads instead unsigned int bignum = 3000000000; then the output is instead
Difference (1) is 2999999950
Difference (2) is 1294967346
Difference (3) is -1294967346
Difference (4) is 1294967346
The first of these is obviously fine. The number 1294967346 is explained by the fact that 1294967346 is precisely 2^32 - 3000000000. I don't understand why the second line doesn't read 1294967396, owing to the 50 contributed by smallnum.
The third and fourth lines I can't explain. How do these results come about?
Edit: For the third line - does it give this result just by finding the solution modulo 2^32 that fits in the range of values allowed for a signed int?

2^32 - 3000000000 = 1294967296 (!)
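To spell lines 3 and 4 out (a sketch, assuming 32-bit int/unsigned int and the usual two's-complement representation): both operands are unsigned int, so the subtraction itself is always done in unsigned arithmetic, i.e. modulo 2^32; only the later assignment to int reinterprets an out-of-range value.

#include <iostream>

int main() {
    unsigned int bignum = 3000000000u;
    unsigned int smallnum = 50u;

    // Both operands are unsigned, so the subtraction is performed modulo 2^32.
    unsigned int u1 = bignum - smallnum;   // 2999999950
    unsigned int u2 = smallnum - bignum;   // 50 - 3000000000 + 2^32 = 1294967346

    // Converting an unsigned value that doesn't fit into int is
    // implementation-defined before C++20 (and modular since C++20);
    // on a typical machine it wraps by 2^32.
    int d3 = u1;   // 2999999950 - 4294967296 = -1294967346  -> "Difference (3)"
    int d4 = u2;   // 1294967346 fits in int, so it is unchanged -> "Difference (4)"

    std::cout << u1 << ' ' << u2 << ' ' << d3 << ' ' << d4 << '\n';
}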

Showing binary representation of floating point types in C++ [closed]

Consider the following code for integral types:
#include <bitset>
#include <iostream>
#include <string>

template <class T>
std::string as_binary_string( T value ) {
    return std::bitset<sizeof( T ) * 8>( value ).to_string();
}

int main() {
    unsigned char a(2);
    char b(4);
    unsigned short c(2);
    short d(4);
    unsigned int e(2);
    int f(4);
    unsigned long long g(2);
    long long h(4);

    std::cout << "a = " << +a << " " << as_binary_string( a ) << std::endl;
    std::cout << "b = " << +b << " " << as_binary_string( b ) << std::endl;
    std::cout << "c = " << c << " " << as_binary_string( c ) << std::endl;
    std::cout << "d = " << d << " " << as_binary_string( d ) << std::endl;
    std::cout << "e = " << e << " " << as_binary_string( e ) << std::endl;
    std::cout << "f = " << f << " " << as_binary_string( f ) << std::endl;
    std::cout << "g = " << g << " " << as_binary_string( g ) << std::endl;
    std::cout << "h = " << h << " " << as_binary_string( h ) << std::endl;

    std::cout << "\nPress any key and enter to quit.\n";
    char q;
    std::cin >> q;
    return 0;
}
Pretty straightforward; it works well and is quite simple.
EDIT
How would one go about writing a function to extract the binary or bit pattern of arbitrary floating point types at compile time?
When it comes to floats, I have not found anything similar in any existing library that I know of. I searched Google for days looking for one, and then resorted to trying to write my own function, without success. I no longer have the attempted code available since I originally asked this question, so I cannot show all of the different attempted implementations along with their compiler/build errors. I was interested in generating the bit pattern for floats in a generic way at compile time, and I wanted to integrate that into my existing class that seamlessly does the same for any integral type. As for the floating-point types themselves, I have taken into consideration the different formats as well as architecture endianness. For my general purposes, the standard IEEE versions of the floating-point types are all I should need to be concerned with.
iBug suggested that I write my own function when I originally asked this question, which is what I was attempting to do. I understand binary numbers, memory sizes, and the mathematics, but putting it all together with how floating-point types are stored in memory with their different parts {sign bit, exponent & mantissa} is where I had the most trouble.
Since then, with the suggestions of those who gave a great answer and example, I was able to write a function that fits nicely into my already existing class template, and it now works for my intended purposes.
What about writing one by yourself?
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

static_assert(sizeof(float) == sizeof(std::uint32_t));
static_assert(sizeof(double) == sizeof(std::uint64_t));

std::string as_binary_string( float value ) {
    std::uint32_t t;
    std::memcpy(&t, &value, sizeof(value));
    return std::bitset<sizeof(float) * 8>(t).to_string();
}

std::string as_binary_string( double value ) {
    std::uint64_t t;
    std::memcpy(&t, &value, sizeof(value));
    return std::bitset<sizeof(double) * 8>(t).to_string();
}
You may need to change the helper variable t in case the sizes for the floating point numbers are different.
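If C++20 is available, std::bit_cast from <bit> expresses the same memcpy idea more directly and also checks the size match at compile time. A sketch along the same lines:

#include <bit>
#include <bitset>
#include <cstdint>
#include <string>

std::string as_binary_string( float value ) {
    // bit_cast fails to compile if the sizes don't match exactly.
    return std::bitset<32>( std::bit_cast<std::uint32_t>(value) ).to_string();
}

std::string as_binary_string( double value ) {
    return std::bitset<64>( std::bit_cast<std::uint64_t>(value) ).to_string();
}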
You can alternatively copy them bit by bit. This is slower but works for any type.
#include <bitset>
#include <climits>
#include <cstdint>
#include <cstring>
#include <string>

template <typename T>
std::string as_binary_string( T value )
{
    const std::size_t nbytes = sizeof(T), nbits = nbytes * CHAR_BIT;
    std::bitset<nbits> b;
    std::uint8_t buf[nbytes];
    std::memcpy(buf, &value, nbytes);

    for (std::size_t i = 0; i < nbytes; ++i)
    {
        std::uint8_t cur = buf[i];
        std::size_t offset = i * CHAR_BIT;
        for (int bit = 0; bit < CHAR_BIT; ++bit)
        {
            b[offset] = cur & 1;
            ++offset;  // Move to next bit in b
            cur >>= 1; // Move to next bit in array
        }
    }
    return b.to_string();
}
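A quick usage sketch for the generic version (note that the resulting string reflects the bytes in memory order, so it depends on the machine's endianness):

#include <iostream>

int main() {
    // On a little-endian machine this prints the familiar MSB-first patterns, e.g.
    // 1.0f -> 00111111100000000000000000000000 (0x3F800000)
    std::cout << as_binary_string(1.0f) << '\n';
    std::cout << as_binary_string(42u)  << '\n';
}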
You said it doesn't need to be standard. So, here is what works in clang on my computer:
#include <iostream>
#include <algorithm>
using namespace std;

int main()
{
    char *result;
    result = new char[33]();        // value-initialized, so the 33rd byte is already '\0'
    fill(result, result + 32, '0');
    float input;
    cin >> input;

    asm(
        "mov %0,%%eax\n"
        "mov %1,%%rbx\n"
        ".intel_syntax\n"
        "mov rcx,20h\n"
        "loop_begin:\n"
        "shr eax\n"
        "jnc loop_end\n"
        "inc byte ptr [rbx+rcx-1]\n"
        "loop_end:\n"
        "loop loop_begin\n"
        ".att_syntax\n"
        :
        : "m" (input), "m" (result)
        : "eax", "rbx", "rcx", "cc", "memory"   // clobber list so the compiler knows what the asm modifies
    );

    cout << result << endl;
    delete[] result;
    return 0;
}
This code makes a bunch of assumptions about the computer architecture, and I am not sure on how many computers it would work.
EDIT:
My computer is a 64-bit MacBook Air. This program basically works by allocating a 33-byte string and filling the first 32 bytes with '0' (the 33rd byte will automatically be '\0').
Then it uses inline assembly to store the float into a 32-bit register and then it repeatedly shifts it to the right by one bit.
If the last bit in the register was 1 before the shift, it gets stored into the carry flag.
The assembly code then checks the carry flag and, if it contains 1, it increases the corresponding byte in the string by 1.
Since it was previously initialized to '0', it will turn to '1'.
So, effectively, when the loop in the assembly is finished, the binary representation of a float is stored into a string.
This code only works for x64 (it uses 64-bit registers "rbx" and "rcx" to store the pointer and the counter for the loop), but I think it's easy to tweak it to work on other processors.
An IEEE 754 double-precision floating-point number looks like the following:

sign    exponent    mantissa
1 bit   11 bits     52 bits

Note that there's a hidden 1 before the mantissa, and the exponent is biased so that a stored value of 1023 means an exponent of 0; it is not two's complement.
By memcpy()ing to a 64 bit unsigned integer you can then apply AND and
OR masks to get the bit pattern. The arrangement could be big endian
or little endian.
You can easily work out which arrangement you have by passing easy numbers
such as 1 or 2.
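For example, a sketch of extracting the three fields of a double with memcpy and masks (this assumes the common case where double is IEEE 754 binary64 and shares the byte order of the integer types):

#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    double d = -6.25;                      // -1.5625 * 2^2
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // copy the object representation

    std::uint64_t sign     = bits >> 63;                 // 1 bit
    std::uint64_t exponent = (bits >> 52) & 0x7FF;       // 11 bits, biased by 1023
    std::uint64_t mantissa = bits & 0xFFFFFFFFFFFFFull;  // 52 bits, hidden leading 1 not stored

    std::cout << "sign=" << sign
              << " exponent=" << exponent
              << " (unbiased " << static_cast<long long>(exponent) - 1023 << ")"
              << " mantissa=0x" << std::hex << mantissa << '\n';
    // prints: sign=1 exponent=1025 (unbiased 2) mantissa=0x9000000000000
}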
Generally people either use std::hexfloat or cast a pointer to the floating-point value to a pointer to an unsigned integer of the same size and print the indirected value in hex format. Both methods facilitate bit-level analysis of floating-point in a productive fashion.
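For example, with the std::hexfloat manipulator (C++11) the significand and exponent are visible directly; a quick sketch:

#include <iostream>

int main() {
    std::cout << std::hexfloat << 1.5 << ' ' << 0.1 << ' ' << -6.25 << '\n';
    // prints something like: 0x1.8p+0 0x1.999999999999ap-4 -0x1.9p+2
}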
You could roll your own by casting the address of the float/double to a char pointer and iterating over the bytes that way:
#include <iomanip>
#include <iostream>
#include <limits>
#include <memory>
#include <string>

template <typename T>
std::string getBits(T t) {
    std::string returnString{""};
    char *base{reinterpret_cast<char *>(std::addressof(t))};
    char *tail{base + sizeof(t) - 1};
    do {
        for (int bits = std::numeric_limits<unsigned char>::digits - 1; bits >= 0; bits--) {
            returnString += ( ((*tail) & (1 << bits)) ? '1' : '0');
        }
    } while (--tail >= base);
    return returnString;
}

int main() {
    float f{10.0};
    double d{100.0};
    double nd{-100.0};
    std::cout << std::setprecision(1);
    std::cout << getBits(f) << std::endl;
    std::cout << getBits(d) << std::endl;
    std::cout << getBits(nd) << std::endl;
}
Output on my machine (note the sign flip in the third output):
01000001001000000000000000000000
0100000001011001000000000000000000000000000000000000000000000000
1100000001011001000000000000000000000000000000000000000000000000
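As a sanity check of the first line: 10.0f is 1.25 × 2^3, so the stored fields are sign 0, exponent 127 + 3 = 130 = 10000010, and mantissa 0100…0 (the leading 1 of 1.25 is implicit). Grouped that way, 0 10000010 01000000000000000000000 is exactly the pattern printed above.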

float to int conversion going wrong (even though the float is already an int)

I was writing a little function to calculate the binomial coefficient using the tgamma function provided by C++. tgamma returns floating-point values, but I wanted to return an integer. Please take a look at this example program comparing three ways of converting the floating-point result back to an int:
#include <iostream>
#include <cmath>

int BinCoeffnear(int n, int k) {
    return std::nearbyint( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}

int BinCoeffcast(int n, int k) {
    return static_cast<int>( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}

int BinCoeff(int n, int k) {
    return (int) std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1));
}

int main()
{
    int n = 7;
    int k = 2;
    std::cout << "Correct: " << std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1)); //returns 21
    std::cout << " BinCoeff: " << BinCoeff(n,k); //returns 20
    std::cout << " StaticCast: " << BinCoeffcast(n,k); //returns 20
    std::cout << " nearby int: " << BinCoeffnear(n,k); //returns 21
    return 0;
}
Why is it that, even though the calculation returns a float equal to 21, 'normal' conversion fails and only nearbyint returns the correct value? What is the nicest way to implement this?
EDIT: According to the C++ documentation, tgamma(int) returns a double.
From the std::tgamma reference:
If arg is a natural number, std::tgamma(arg) is the factorial of arg-1. Many implementations calculate the exact integer-domain factorial if the argument is a sufficiently small integer.
It seems that the compiler you're using is doing that, calculating the factorial of 7 for the expression std::tgamma(7+1).
The result might differ between compilers, and also between optimization levels. As demonstrated by Jonas there is a big difference between optimized and unoptimized builds.
The remark by #nos is on point. Note that the first line
std::cout << "Correct: " <<
std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1));
prints a double value and does not perform a floating-point to integer conversion.
The result of your calculation in floating point is indeed less than 21, yet this double precision value is printed by cout as 21.
On my machine (x86_64, gnu libc, g++ 4.8, optimization level 0) setting cout.precision(18) makes the results explicit.
Correct: 20.9999999999999964 BinCoeff: 20 StaticCast: 20 nearby int: 21
In this case it is practical to replace integer operations with floating-point operations, but one has to keep in mind that the result must be an integer. The right tool for that is std::round.
The problem with std::nearbyint is that depending on the rounding mode it may produce different results.
std::fesetround(FE_DOWNWARD);
std::cout << " nearby int: " << BinCoeffnear(n,k);
would return 20.
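A self-contained version of that experiment might look like this (a sketch; it inlines the formula instead of calling BinCoeffnear, and strictly speaking changing the rounding mode also requires #pragma STDC FENV_ACCESS, which not all compilers honour):

#include <cfenv>
#include <cmath>
#include <iostream>

int main() {
    std::fesetround(FE_DOWNWARD);   // round toward negative infinity
    double v = std::tgamma(7 + 1) / (std::tgamma(2 + 1) * std::tgamma(7 - 2 + 1));
    std::cout << std::nearbyint(v) << '\n';   // 20 when v is just below 21
}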
So with std::round the BinCoeff function might look like
int BinCoeffRound(int n, int k) {
    return static_cast<int>(
        std::round(
            std::tgamma(n+1) /
            (std::tgamma(k+1) * std::tgamma(n-k+1))
        ));
}
Floating-point numbers have rounding errors associated with them. Here is a good article on the subject: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
In your case the floating-point number holds a value very close but less than 21. Rules for implicit floating–integral conversions say:
The fractional part is truncated, that is, the fractional part is
discarded.
Whereas std::nearbyint:
Rounds the floating-point argument arg to an integer value in floating-point format, using the current rounding mode.
In this case the floating-point number will be exactly 21 and the following implicit conversion would return 21.
The first cout outputs 21 because of rounding that happens in cout by default. See std::setprecision.
What is the nicest way to implement this?
Use the exact integer factorial function that takes and returns unsigned int instead of tgamma.
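A sketch in that spirit, staying entirely in integer arithmetic (it uses the multiplicative formula rather than three separate factorials so that intermediate values stay small; the function name is mine):

#include <iostream>

// n choose k in pure integer arithmetic. After each loop iteration the running
// value is itself a binomial coefficient, so the division is always exact.
unsigned long long binCoeffInt(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;                 // C(n,k) == C(n,n-k); take the shorter loop
    unsigned long long result = 1;
    for (unsigned i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;    // multiply first, then divide exactly
    return result;
}

int main() {
    std::cout << binCoeffInt(7, 2) << '\n';   // 21, with no floating point involved
}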
The problem is in how the floats are handled.
A float may not store 21 exactly as 21 but as something like 20.99999.
Converting to int then drops the fractional part.
So instead of converting to int immediately, first round the value, for example by calling the ceil function declared in cmath or math.h.
This code will return 21 in all cases:
#include <iostream>
#include <cmath>

int BinCoeffnear(int n, int k) {
    return std::nearbyint( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}

int BinCoeffcast(int n, int k) {
    return static_cast<int>( ceil(std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1))) );
}

int BinCoeff(int n, int k) {
    return (int) ceil(std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)));
}

int main()
{
    int n = 7;
    int k = 2;
    std::cout << "Correct: " << (std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1))); //returns 21
    std::cout << " BinCoeff: " << BinCoeff(n,k); //returns 21
    std::cout << " StaticCast: " << BinCoeffcast(n,k); //returns 21
    std::cout << " nearby int: " << BinCoeffnear(n,k); //returns 21
    std::cout << "\n" << (int)(2.9995) << "\n";
}

C++ safeguards exceeding limits of integer

I am working on a chapter review of a book: at the end of the chapter there are some questions/tasks which you are to complete.
I decided to do them in the format of a program rather than a text file:
#include <iostream>
int main(int argc, char* argv[]) {
    std::cout << "Chapter review\n"
              << "1. Why does C++ have more than one integer type?\n"
              << "\tTo be able to represent more accurate values & save memory by only allocating what is needed for the task at hand.\n"
              << "2. Declare variables matching the following descriptions:\n"
              << "a.\tA short integer with the value 80:\n";
    short myVal1 = 80;
    std::cout << "\t\t\"short myVal1 = 80;\": " << myVal1 << std::endl
              << "b.\tAn unsigned int integer with the value 42,110:\n";
    unsigned int myVal2 = 42110;
    std::cout << "\t\t\"unsigned int myVal2 = 42110;\": " << myVal2 << std::endl
              << "c.\tAn integer with the value 3,000,000,000:\n";
    float myVal3 = 3E+9;
    std::cout << "\t\t\"float myVal3 = 3E+9;\": " << static_cast<unsigned int>(myVal3) << std::endl
              << "3. What safeguards does C++ provide to keep you from exceeding the limits of an integer type?\n"
              << "\tWhen it reaches the maximum number it starts from the beginning again (lowest point).\n"
              << "4. What is the distinction between 33L and 33?\n"
              << "\t33L is of type long, 33 is of type int.\n"
              << "5. Consider the two C++ statements that follow:\n\tchar grade = 65;\n\tchar grade = 'A';\nAre they equivalent?\n"
              << "\tYes, the ASCII decimal number for 'A' is '65'.\n"
              << "6. How could you use C++ to find out which character the code 88 represents?\nCome up with at least two ways.\n"
              << "\t1: \"static_cast<char>(88);\": " << static_cast<char>(88) << std::endl; // 1.
    char myChar = 88;
    std::cout << "\t2: \"char myChar = 88;\": " << myChar << std::endl // 2.
              << "\t3: \"std::cout << (char) 88;\" " << (char) 88 << std::endl // 3.
              << "\t4: \"std::cout << char (88);\": " << char (88) << std::endl // 4.
              << "7. Assigning a long value to a float can result in a rounding error. What about assigning long to double? long long to double?\n"
              << "\tlong -> double: Rounding error.\n\tlong long -> double: Significantly incorrect number and/or rounding error.\n"
              << "8. Evaluate the following expressions as C++ would:\n"
              << "a.\t8 * 9 + 2\n"
              << "\t\tMultiplication (8 * 9 = 72) -> addition (72 + 2 = 74).\n"
              << "b.\t6 * 3 / 4\n"
              << "\t\tMultiplication (6 * 3 = 18) -> division (18 / 4 = 4).\n"
              << "c.\t3 / 4 * 6\n"
              << "\t\tDivision (3 / 4 = 0) -> multiplication (0 * 6 = 0).\n"
              << "d.\t6.0 * 3 / 4\n"
              << "\t\tMultiplication (6.0 * 3 = 18.0) -> division (18.0 / 4 = 4.5).\n"
              << "e.\t 15 % 4\n"
              << "\t\tDivision (15 / 4 = 3.75), then the remainder is returned: 4 goes into 15 three times (3*4 = 12), leaving a remainder of 3.\n"
              << "9. Suppose x1 and x2 are two type double variables that you want to add as integers and assign to an integer variable. Construct a C++ statement for doing so. What if you wanted to add them as type double and then convert to int?\n"
              << "\t1: \"int myInt = static_cast<double>(doubleVar);\"\n\t2: \"int myInt = int (doubleVar);\".\n"
              << "10. What is the variable type for each of the following declarations?\n"
              << "a.\t\"auto cars = 15;\"\n\t\tint\n"
              << "b.\t\"auto iou = 150.37f;\"\n\t\tfloat\n"
              << "c.\t\"auto level = 'B';\"\n\t\tchar\n"
              << "d.\t\"auto crat = U'\\U00002155';\"\n\t\twchar_t ?\n"
              << "e.\t\"auto fract = 8.25f/.25;\"\n\t\tfloat" << std::endl;
    return 0;
}
It's been a while since I read chapter 3 due to moving/some other real life stuff.
What I am unsure about here is basically question number 3: it says safeguards as in plural.
However I am only aware of one: that it starts from the beginning again after reaching maximum value? Am I missing something here?
Let me know if you see any other errors also - I am doing this to learn after all :).
Basically I can't accept a comment as an answer, so to sum it up:
There are no safeguards; I misunderstood that question, which #n.m. clarified for me.
10.e was wrong, as #Jarod42 correctly pointed out (8.25f/.25 is a double, not a float).
Thanks!
As for me, "Declare a variable of integer type with the value 3,000,000,000" is:
unsigned anInteger = 3000000000;
because C++11 supplies 15 integer types, and unsigned int is the smallest one that can store an integer as big as 3,000,000,000.
C++ classifies signed integer overflow as "Undefined Behavior" - anything can happen as a result of it (unsigned arithmetic never overflows; it wraps around modulo 2^N). This by itself may be called a "safeguard" (though it's a stretch), by the following thinking:
gcc has that -ftrapv compilation switch that makes your program crash when integer overflow happens. This allows you to debug your overflows easily. This feature is possible because C++ made it legal (by nature of Undefined Behavior) to make your program crash in these circumstances. I think the C++ Committee had this exact scenario in mind when making that part of the C++ Standard.
This is different from e.g. Java, where integer overflow causes wraparound, and is probably harder to debug.
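To make the -ftrapv point concrete, here is a tiny program whose signed overflow the flag turns into an abort at run time (a sketch; built without the flag, the behaviour is simply undefined):

#include <climits>
#include <iostream>

int main(int argc, char**) {
    int x = INT_MAX;
    x += argc;                // signed overflow (argc is 1 when run without arguments): undefined behaviour
    std::cout << x << '\n';   // built with g++ -ftrapv, the addition aborts before this prints
}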

Horrible number comes from nowhere

Out of nowhere I get quite a big result for this function... It should be very simple, but I can't see it now.
double prob_calculator_t::pimpl_t::B_full_term() const
{
    double result = 0.0;
    for (uint32_t j = 0, j_end = U; j < j_end; j++)
    {
        uint32_t inhabited_columns = doc->row_sums[j];
        // DEBUG
        cout << "inhabited_columns: " << inhabited_columns << endl;
        cout << "log_of_sum[j]: " << log_of_sum[j] << endl;
        cout << "sum_of_log[j]: " << sum_of_log[j] << endl;
        // end DEBUG

        result += ( -inhabited_columns * log( log_of_sum[j] ) + sum_of_log[ j ] );
        cout << "result: " << result << endl;
    }
    return result;
}
and here is the trace:
inhabited_columns: 1
log_of_sum[j]: 110.56
sum_of_log[j]: -2.81341
result: 2.02102e+10
inhabited_columns: 42
log_of_sum[j]: 110.56
sum_of_log[j]: -143.064
result: 4.04204e+10
Thanks for the help!
inhabited_columns is unsigned and I see a unary - just before it: -inhabited_columns.
(Note that unary - has a really high operator precedence; higher than * etc).
That is where your problem is! To quote Mike Seymour's answer:
When you negate it, the result is still unsigned; the value is reduced modulo 2^32 to give a large positive value.
One fix would be to write
-(inhabited_columns * log(log_of_sum[j]))
as then the negation will be carried out in floating point
inhabited_columns is an unsigned type. When you negate it, the result is still unsigned; the value is reduced modulo 2^32 to give a large positive value.
You should change it to a sufficiently large signed type (maybe int32_t, if you're not going to have more than a couple of billion columns), or perhaps double since you're about to use it in double-precision arithmetic.
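A minimal sketch of both the failure and the fix (the numbers are taken from the first trace entry above):

#include <cmath>
#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t inhabited_columns = 1;
    double log_of_sum = 110.56;

    // Unary minus on the unsigned value wraps: -1u == 4294967295.
    double wrong = -inhabited_columns * std::log(log_of_sum);   // ~2.02e+10

    // Negate after the (floating-point) multiplication instead,
    // or cast inhabited_columns to a signed type / double first.
    double right = -(inhabited_columns * std::log(log_of_sum)); // ~-4.71

    std::cout << wrong << '\n' << right << '\n';
}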

Narrowing type conversion in C++ using pointers

I have been having some problems with downward type conversion in C++ using pointers, and before I came up with the idea of doing it this way Google basically told me this is impossible and it wasn't covered in any books I learned C++ from. I figured this would work...
long int TheLong=723330;
int TheInt1=0;
int TheInt2=0;
long int * pTheLong1 = &TheLong;
long int * pTheLong2 = &TheLong + 0x4;
TheInt1 = *pTheLong1;
TheInt2 = *pTheLong2;
cout << "The double is " << TheLong << " which is "
<< TheInt1 << " * " << TheInt2 << "\n";
The increment on line five might not be correct, but the output has me worried that the compiler I am using (gcc 3.4.2) is automatically turning TheInt1 into a long int or something. The output looks like this...
The double is 723330 which is 723330 * 4067360
The output from TheInt1 is impossibly high, and the output from TheInt2 is absent.
I have three questions...
Am I even on the right track?
What is the proper increment for line five?
Why the hell is TheInt1/TheInt2 allowing such a large value?
int is probably 32 bit, which gives it a range of -2*10^9 to 2*10^9.
In the line long int * pTheLong2 = &TheLong + 0x4; you are doing pointer arithmetic to a long int*, which means the address will increase by the size of 0x4 long ints. I guess you are assuming that long int is twice the size of int. This is absolutely not guaranteed, but probably true if you are compiling in 64 bit mode. So you want to add half the size of a long int -- exactly the size of an int under your assumption -- to your pointer. int * pTheLong2 = (int*)(&TheLong) + 1; achieves this.
You are on the right track, but please keep in mind, as others have pointed out, that you are now exploring undefined behaviour. This means that portability is broken and optimization flags may very well change the behaviour.
By the way, a more correct thing to output (assuming that the machine is little-endian) would be:
cout << "The long is " << TheLong << " which is "
<< TheInt1 << " + " << TheInt2 << " * 2^32" << endl;
For completeness' sake, a well-defined conversion of a 32 bit integer to two 16 bit ones:
#include <cstdint>
#include <iostream>

int main() {
    uint32_t fullInt = 723330;
    uint16_t lowBits  = (fullInt >>  0) & 0x0000FFFF;
    uint16_t highBits = (fullInt >> 16) & 0x0000FFFF;
    std::cout << fullInt << " = "
              << lowBits << " + " << highBits << " * 2^16"
              << std::endl;
    return 0;
}
Output: 723330 = 2434 + 11 * 2^16
Am I even on the right track?
Probably not. You seem confused.
What is the proper increment for line five?
There is none. Pointer arithmetic is only valid inside arrays, and you have no arrays here. So
long int * pTheLong2 = &TheLong + 0x4;
is undefined behavior and any value other than 0 (and possibly 1) by which you'd replace 0x4 would also be UB.
Why the hell is TheInt1/TheInt2 allowing such a large value?
int and long int often have the same range of possible values.
TheInt2 = *pTheLong2;
This invokes undefined behavior, because the C++ Standard does not give any guarantee as to which memory location pTheLong2 is pointing to, as it's initialized as:
long int * pTheLong2 = &TheLong + 0x4;
&TheLong is the memory location of the variable TheLong, and pTheLong2 is initialized to a memory location which is either not a part of the program (and hence illegal to access), or points somewhere within the program itself, though you don't know where exactly; the C++ Standard gives no guarantee as to where it points.
Hence, dereferencing such a pointer invokes undefined behavior.