Binary File Reads Negative Integers After Writing - c++

I came from this question where I wanted to write 2 integers to a single byte that were garunteed to be between 0-16 (4 bits each).
Now if I close the file, and run a different program that reads....
for (int i = 0; i < 2; ++i)
{
char byteToRead;
file.seekg(i, std::ios::beg);
file.read(&byteToRead, sizeof(char));
bool correct = file.bad();
unsigned int num1 = (byteToRead >> 4);
unsigned int num2 = (byteToRead & 0x0F);
}
The issue is, sometimes this works but other times I'm having the first number come out negative and the second number is something like 10 or 9 all the time and they were most certainly not the numbers I wrote!
So here, for example, the first two numbers work, but the next number does not. For examplem, the output of the read above would be:
At byte 0, num1 = 5 and num2 = 6
At byte 1, num1 = 4294967289 and num2 = 12
At byte 1, num1 should be 9. It seems the 12 writes fine but the 9 << 4 isn't working. The byteToWrite on my end is byteToWrite -100 'œ''
I checked out this question which has a similar problem I think but I feel like my endian is right here.

The right-shift operator preserves the value of the left-most bit. If the left-most bit is 0 before the shift, it will still be 0 after the shift; if it is 1, it will still be 1 after the shift. This allow to preserve the value's sign.
In your case, you combine 9 (0b1001) with 12 (0b1100), so you write 0b10011100 (0x9C). The bit #7 is 1.
When byteToRead is right-shifted, you get 0b11111001 (0xF9), but it is implicitly converted to an int. The convertion from char to int also preserve the value's sign, so it produce 0xFFFFFFF9. Then the implicit int is implicitly converted to a unsigned int. So num1 contains 0xFFFFFFF9 which is 4294967289.
There is 2 solutions:
cast byteToRead into a unsigned char when doing the right-shift;
apply a mask to the shift's result to only keep the 4 bits you want.

The problem originates with byteToRead >> 4 . In C, any arithmetic operations are performed in at least int precision. So the first thing that happens is that byteToRead is promoted to int.
These promotions are value-preserving. Your system has plain char as signed, i.e. having range -128 through to 127. Your char might have been initially -112 (bit pattern 10010000), and then after promotion to int it retains its value of -112 (bit pattern 11111...1110010000).
The right-shift of a negative value is implementation-defined but a common implementation is to do an "arithmetic shift", i.e. perform division by two; so you end up with the result of byteToRead >> 4 being -7 (bit pattern 11111....111001).
Converting -7 to unsigned int results in UINT_MAX - 6 which is 4295967289, because unsigned arithmetic is defined as wrapping around mod UINT_MAX+1 .
To fix this you need to convert to unsigned before performing the arithmetic . You could cast (or alias) byteToRead to unsigned char, e.g.:
unsigned char byteToRead;
file.read( (char *)&byteToRead, 1 );

Related

Is a negative integer summed with a greater unsigned integer promoted to unsigned int?

After getting advised to read "C++ Primer 5 ed by Stanley B. Lipman" I don't understand this:
Page 66. "Expressions Involving Unsigned Types"
unsigned u = 10;
int i = -42;
std::cout << i + i << std::endl; // prints -84
std::cout << u + i << std::endl; // if 32-bit ints, prints 4294967264
He said:
In the second expression, the int value -42 is converted to unsigned before the addition is done. Converting a negative number to unsigned behaves exactly as if we had attempted to assign that negative value to an unsigned object. The value “wraps around” as described above.
But if I do something like this:
unsigned u = 42;
int i = -10;
std::cout << u + i << std::endl; // Why the result is 32?
As you can see -10 is not converted to unsigned int. Does this mean a comparison occurs before promoting a signed integer to an unsigned integer?
-10 is being converted to a unsigned integer with a very large value, the reason you get a small number is that the addition wraps you back around. With 32 bit unsigned integers -10 is the same as 4294967286. When you add 42 to that you get 4294967328, but the max value is 4294967296, so we have to take 4294967328 modulo 4294967296 and we get 32.
Well, I guess this is an exception to "two wrongs don't make a right" :)
What's happening is that there are actually two wrap arounds (unsigned overflows) under the hood and the final result ends up being mathematically correct.
First, i is converted to unsigned and as per the wrap around behavior the value is std::numeric_limits<unsigned>::max() - 9.
When this value is summed with u the mathematical result would be std::numeric_limits<unsigned>::max() - 9 + 42 == std::numeric_limits<unsigned>::max() + 33 which is an overflow and we get another wrap around. So the final result is 32.
As a general rule in an arithmetic expression if you only have unsigned overflows (no matter how many) and if the final mathematical result is representable in the expression data type, then the value of the expression will be the mathematically correct one. This is a consequence of the fact that unsigned integers in C++ obey the laws of arithmetic modulo 2n (see bellow).
Important notice. According to C++ unsigned arithmetic does not overflow:
§6.9.1 Fundamental types [basic.fundamental]
Unsigned integers shall obey the laws of arithmetic modulo 2n where n
is the number of bits in the value representation of that particular
size of integer 49
49) This implies that unsigned arithmetic does not overflow because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting unsigned integer type.
I will however leave "overflow" in my answer to express values that cannot be represented in regular arithmetic.
Also what we colloquially call "wrap around" is in fact just the arithmetic modulo nature of the unsigned integers. I will however use "wrap around" also because it is easier to understand.
i is in fact promoted to unsigned int.
Unsigned integers in C and C++ implement arithmetic in ℤ / 2nℤ, where n is the number of bits in the unsigned integer type. Thus we get
[42] + [-10] ≡ [42] + [2n - 10] ≡ [2n + 32] ≡ [32],
with [x] denoting the equivalence class of x in ℤ / 2nℤ.
Of course, the intermediate step of picking only non-negative representatives of each equivalence class, while it formally occurs, is not necessary to explain the result; the immediate
[42] + [-10] ≡ [32]
would also be correct.
"In the second expression, the int value -42 is converted to unsigned before the addition is done"
yes this is true
unsigned u = 42;
int i = -10;
std::cout << u + i << std::endl; // Why the result is 32?
Supposing we are in 32 bits (that change nothing in 64b, this is just to explain) this is computed as 42u + ((unsigned) -10) so 42u + 4294967286u and the result is 4294967328u truncated in 32 bits so 32. All was done in unsigned
This is part of what is wonderful about 2's complement representation. The processor doesn't know or care if a number is signed or unsigned, the operations are the same. In both cases, the calculation is correct. It's only how the binary number is interpreted after the fact, when printing, that is actually matters (there may be other cases, as with comparison operators)
-10 in 32BIT binary is FFFFFFF6
42 IN 32bit BINARY is 0000002A
Adding them together, it doesn't matter to the processor if they are signed or unsigned, the result is: 100000020. In 32bit, the 1 at the start will be placed in the overflow register, and in c++ is just disappears. You get 0x20 as the result, which is 32.
In the first case, it is basically the same:
-42 in 32BIT binary is FFFFFFD6
10 IN 32bit binary is 0000000A
Add those together and get FFFFFFE0
FFFFFFE0 as a signed int is -32 (decimal). The calculation is correct! But, because it is being PRINTED as an unsigned, it shows up as 4294967264. It's about interpreting the result.

why declare "score[11] = {};" and "grade" as "unsigned" instead of "int'

I'm new to C++ and is trying to learn the concept of array. I saw this code snippet online. For the sample code below, does it make any difference to declare:
unsigned scores[11] = {};
unsigned grade;
as:
int scores[11] = {};
int grade;
I guess there must be a reason why score[11] = {}; and grade is declared as unsigned, but what is the reason behind it?
int main() {
unsigned scores[11] = {};
unsigned grade;
while (cin >> grade) {
if (0 <= grade <= 100) {
++scores[grade / 10];
}
}
for (int i = 0; i < 11; i++) {
cout << scores[i] << endl;
}
}
unsigned means that the variable will not hold a negative values (or even more accurate - It will not care about the sign-). It seems obvious that scores and grades are signless values (no one scores -25). So, it is natural to use unsigned.
But note that: if (0 <= grade <= 100) is redundant. if (grade <= 100) is enough since no negative values are allowed.
As Blastfurnace commented, if (0 <= grade <= 100) is not right even. if you want it like this you should write it as:
if (0 <= grade && grade <= 100)
Unsigned variables
Declaring a variable as unsigned int instead of int has 2 consequences:
It can't be negative. It provides you a guarantee that it never will be and therefore you don't need to check for it and handle special cases when writing code that only works with positive integers
As you have a limited size, it allows you to represent bigger numbers. On 32 bits, the biggest unsigned int is 4294967295 (2^32-1) whereas the biggest int is 2147483647 (2^31-1)
One consequence of using unsigned int is that arithmetic will be done in the set of unsigned int. So 9 - 10 = 4294967295 instead of -1 as no negative number can be encoded on unsigned int type. You will also have issues if you compare them to negative int.
More info on how negative integer are encoded.
Array initialization
For the array definition, if you just write:
unsigned int scores[11];
Then you have 11 uninitialized unsigned int that have potentially values different than 0.
If you write:
unsigned int scores[11] = {};
Then all int are initialized with their default value that is 0.
Note that if you write:
unsigned int scores[11] = { 1, 2 };
You will have the first int intialized to 1, the second to 2 and all the others to 0.
You can easily play a little bit with all these syntax to gain a better understanding of it.
Comparison
About the code:
if(0 <= grade <= 100)
as stated in the comments, this does not do what you expect. In fact, this will always evaluate to true and therefore execute the code in the if. Which means if you enter a grade of, say, 20000, you should have a core dump. The reason is that this:
0 <= grade <= 100
is equivalent to:
(0 <= grade) <= 100
And the first part is either true (implicitly converted to 1) or false (implicitly converted to 0). As both values are lower than 100, the second comparison is always true.
unsigned integers have some strange properties and you should avoid them unless you have a good reason. Gaining 1 extra bit of positive size, or expressing a constraint that a value may not be negative, are not good reasons.
unsigned integers implement arithmetic modulo UINT_MAX+1. By contrast, operations on signed integers represent the natural arithmetic that we are familiar with from school.
Overflow semantics
unsigned has well defined overflow; signed does not:
unsigned u = UINT_MAX;
u++; // u becomes 0
int i = INT_MAX;
i++; // undefined behaviour
This has the consequence that signed integer overflow can be caught during testing, while an unsigned overflow may silently do the wrong thing. So use unsigned only if you are sure you want to legalize overflow.
If you have a constraint that a value may not be negative, then you need a way to detect and reject negative values; int is perfect for this. An unsigned will accept a negative value and silently overflow it into a positive value.
Bit shift semantics
Bit shift of unsigned by an amount not greater than the number of bits in the data type is always well defined. Until C++20, bit shift of signed was undefined if it would cause a 1 in the sign bit to be shifted left, or implementation-defined if it would cause a 1 in the sign bit to be shifted right. Since C++20, signed right shift always preserves the sign, but signed left shift does not. So use unsigned for some kinds of bit twiddling operations.
Mixed sign operations
The built-in arithmetic operations always operate on operands of the same type. If they are supplied operands of different types, the "usual arithmetic conversions" coerce them into the same type, sometimes with surprising results:
unsigned u = 42;
std::cout << (u * -1); // 4294967254
std::cout << std::boolalpha << (u >= -1); // false
What's the difference?
Subtracting an unsigned from another unsigned yields an unsigned result, which means that the difference between 2 and 1 is 4294967295.
Double the max value
int uses one bit to represent the sign of the value. unsigned uses this bit as just another numerical bit. So typically, int has 31 numerical bits and unsigned has 32. This extra bit is often cited as a reason to use unsigned. But if 31 bits are insufficient for a particular purpose, then most likely 32 bits will also be insufficient, and you should be considering 64 bits or more.
Function overloading
The implicit conversion from int to unsigned has the same rank as the conversion from int to double, so the following example is ill formed:
void f(unsigned);
void f(double);
f(42); // error: ambiguous call to overloaded function
Interoperability
Many APIs (including the standard library) use unsigned types, often for misguided reasons. It is sensible to use unsigned to avoid mixed-sign operations when interacting with these APIs.
Appendix
The quoted snippet includes the expression 0 <= grade <= 100. This will first evaluate 0 <= grade, which is always true, because grade can't be negative. Then it will evaluate true <= 100, which is always true, because true is converted to the integer 1, and 1 <= 100 is true.
Yes it does make a difference. In the first case you declare an array of 11 elements a variable of type "unsigned int". In the second case you declare them as ints.
When the int is on 32 bits you can have values from the following ranges
–2,147,483,648 to 2,147,483,647 for plain int
0 to 4,294,967,295 for unsigned int
You normally declare something unsigned when you don't need negative numbers and you need that extra range given by unsigned. In your case I assume that that by declaring the variables unsigned, the developer doesn't accept negative scores and grades. You basically do a statistic of how many grades between 0 and 10 were introduced at the command line. So it looks like something to simulate a school grading system, therefore you don't have negative grades. But this is my opinion after reading the code.
Take a look at this post which explains what unsigned is:
what is the unsigned datatype?
As the name suggests, signed integers can be negative and unsigned cannot be. If we represent an integer with N bits then for unsigned the minimum value is 0 and the maximum value is 2^(N-1). If it is a signed integer of N bits then it can take the values from -2^(N-2) to 2^(N-2)-1. This is because we need 1-bit to represent the sign +/-
Ex: signed 3-bit integer (yes there are such things)
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
But, for unsigned it just represents the values [0,7]. The most significant bit (MSB) in the example signifies a negative value. That is, all values where the MSB is set are negative. Hence the apparent loss of a bit in its absolute values.
It also behaves as one might expect. If you increment -1 (111) we get (1 000) but since we don't have a fourth bit it simply "falls off the end" and we are left with 000.
The same applies to subtracting 1 from 0. First take the two's complement
111 = twos_complement(001)
and add it to 000 which yields 111 = -1 (from the table) which is what one might expect. What happens when you increment 011(=3) yielding 100(=-4) is perhaps not what one might expect and is at odds with our normal expectations. These overflows are troublesome with fixed point arithmetic and have to be dealt with.
One other thing worth pointing out is the a signed integer can take one negative value more than it can positive which has a consequence for rounding (when using integer to represent fixed point numbers for example) but am sure that's better covered in the DSP or signal processing forums.

When is int value ordinarily converted to unsigned?

Here is from C++ Prime 5th:
if we use both unsigned and int values in an arithmetic expression, the int value ordinarily is converted to unsigned.
And I tried the following code:
int i = -40;
unsigned u = 42;
unsigned i2 = i;
std::cout << i2 << std::endl; // 4294967256
std::cout << i + u << std::endl; // 2
For i + u, if i is converted to unsigned, it shall be 4294967256, and then i + u cannot equal to 2.
Is int value always converted to unsigned if we use both unsigned and int values in an arithmetic expression?
Yes.
What you're experiencing is integer overflow. The sign of a negative number is given by the leading sign bit, which is either 0 (positive) or 1 (negative). However, for unsigned integers, this leading bit just gives you one more bit of integer precision, rather than an indication of sign. "Negative" signed numbers end up counting backwards from the maximum unsigned integer size.
How does this relate to your particular code? Well, -40 does get turned into 4294967256. However, 4294967256+42 can't fit in an unsigned integer, as if you remember -40 just turned into the max unsigned integer minus 39. So by adding 42, you exceed the capacity of your data-type. The would-be 33rd bit just gets chopped off, and you start over from 0. Hence, you still get 2 as an answer.
Signed: -42, -41, ..., -1, 0, 1, 2
Unsigned: 4294967256, 4294967257, ..., 4294967295, 0, 1, 2, ...
^--^ Overflow!
You'd probably be interested in reading about Two's Complement

Why does the unsigned int give a negative value in c++?

I have two functions add and main as follows.
int add(unsigned int a, unsigned int b)
{
return a+b;
}
int main()
{
unsigned int a,b;
cout << "Enter a value for a: ";
cin >> a;
cout << "Enter a value for b: ";
cin >> b;
cout << "a: " << a << " b: "<<b <<endl;
cout << "Result is: " << add(a,b) <<endl;
return 0;
}
When I run this program I get the following results:
Enter a value for a: -1
Enter a value for b: -2
a: 4294967295 b: 4294967294
Result is: -3
Why is the result -3?
Because add returns an int (no unsigned int) which cannot represent 4294967295 + 4294967294 = 4294967293 (unsigned integer arithmetic is defined mod 2^n with n = 32 in this case) because the result is too big.
Thus, you have signed integer overflow (or, more precisely, an implicit conversion from a source integer that cannot be represented as int) which has an implementation defined result, i.e. any output (that is representable as int) would be "correct".
The reason for getting exactly -3 is that the result is 2^32 - 3 and that gets converted to -3 on your system. But still, note that any result would be equally legal.
int add(unsigned int a, unsigned int b)
{
return a+b;
}
The expression a+b adds two operands of type unsigned int, yielding an unsigned int result. Unsigned addition, strictly speaking, does not "overflow"; rather than result is reduced modulo MAX + 1, where MAX is the maximum value of the unsigned type. In this case, assuming 32-bit unsigned int, the result of adding 4294967295 + 4294967294 is well defined: it's 4294967293, or 232-3.
Since add is defined to return an int result, the unsigned value is implicitly converted from unsigned int to int. Unlike an arithmetic overflow, an unsigned-to-signed conversion that can't be represented in the target type yields an implementation-defined result. On a typical implementation, such a conversion (where the source and target have the same size) will reinterpret the representation, yielding -3. Other results are possible, depending on the implementation, but not particularly likely.
As for why a and b were set to those values in the first place, apparently that's how cin >> a behaves when a is an unsigned value and the input is negative. I'm not sure whether that behavior is defined by the language, implementation-defined, or undefined. In any case, once you have those values, the result returned by add follows as described above.
If you are intending to return an unsigned int then you need to add unsigned to your function declaration. If you changed your return type to an unsigned int
and you use the values -1 & -2 then this will be your output:
a: 4294967295 b: 4294967294
Result: 4294967293
unsigned int ranges from [0, 4294967295] provided an unsigned int is 4bytes in size on your local machine. When you input -1 for an unsigned int you have buffer overflow and what happens here is the compiler will set -1 to be the largest possible valid number in an unsigned int. When you pass -2 into your function the same thing happens but you are being index back to the second largest value. With unsigned int there is no "sign" for negatives stored. If you take the largest possible value of an unsigned as stated above by (a) and add 1 it will give you a value of 0. These values are passed into the function, and the function creates 2 stack variables of local scope within this function.
So now you have a copy of a & b on the stack and a has a value of unsigned int max size and b has a value of unsigned int max size - 1. Now you are performing an addition on these two values which exceeds the max value size of an unsigned int and wrapping occurs in the opposite direction. So if the index value starts at .....95 and you add 1 this gives you 0 for the last digit, so if you take the second value which is max value - 1 : .....94 subtract one from it because we already reached 0 now we are in the positive direction. This will give you a result of ......93 which the function is returning if your return type is unsigned int.
This same concept applies if your return type is int, however what happens here
is this: The addition of the two unsigned values are the same result giving you
.....93 then the compiler will implicitly cast this to an int. Well here the range value for a signed int is -2147483648 to 2147483647 - one bit is used to store the (signed value) but it also depends on if two's compliment is being used etc.
The compiler here is smart enough to know that the signed int has a range of these values but the wrapping still occurs. What happens when we store 4294967293 into a singed int? Well the max value of int in the positive is 2147483647 so if we subtract the two (4294967293 - 2147483647) we would get 2147483646 left over. At this point does this value fit in the range of a signed int max value? It does however because of the implicit casting being done from an unsigned to a signed value we have 2 bits to account for, the signed itself and the wrapping value meaning max_value + 1 = 0 to account for, except this doesn't happen with signed values when you add 1 to max_value you get the largest possible -max_value.
For the unsigned values:
- ...95 = -1
- ...94 = -2
- ...93 = -3
With signed values the negative is preserved in its own bit, or if twos compliment is used etc., pending on the definitions to a signed int within the compiler. Here the compiler recognizes this unsigned value as being negative even though negative values are not stored in an unsigned for the sign is not preserved. So when it explicitly casts from an unsigned to signed one bit is used to store the signed value then the calculations are done and the wrapping occurs from a buffer overflow. So as we subtracted the actual unsigned value that would represent -3 : 4294967293 with the max+ value for a signed int 2147483647 we got the value 2147483646 now with signed ints if you add 1 to the max value it does not give you 0 like an unsigned does, it will give you the largest -signed value which for a 4byte int is -2147483648, since this does fit in the max+ value and the compiler knows that this is supposed to be a negative value if we add our remainder by the -max_value for a signed int : 2147483646 + (-2147483648) this will give us -2, but because of the fact that the wrapping with int is different we have to subtract 1 from this -2 - 1 = -3.
When it converts from an unsigned int to an int, the number is over what a signed integer can hold. This causes it to overflow as a negative value.
If you change the return value to an unsigned integer, your problem should be solved.

Is it legal to use char overflow in C++ code

Good day, colleagues!
I need to obtain cyclic series on successive numbers from 0 to 255. Is it legal to use unsigned char overflow like this:
unsigned char test_char = 0;
while (true) {
std::cout << test_char++ << " ";
}
Or will be more safely to use this code:
int test_int = 0;
while (true) {
std::cout << test_int++ % 256 << " ";
}
Of course, in real code there will be reasonable condition instead of while (true).
3.9.1/4 "Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer"
"This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type"
So, yes it is legal. And the second form is preferred, since it's more readable.
Even though sizeof(char) will always be 1, it is not necessary that a char will be exactly 8 bits. (I am guessing unsigned char will be similar).
So of the two, if given a choice, I would prefer the latter as the former might not even be correct.
btw, You probably intended unsigned int instead of int for the latter? Modulus with negative numbers could get tricky (after the int overflows, as Jimmy noted). If I recollect correctly, I believe it is compiler dependent.
unsigned char, like all other unsigned integral types, follows modulo 2n arithmetic, so basically both your methods are equivalent. Use the first
There is no such thing as unsigned overflow, per 3.9.1/4 as quoted by Erik. However, as Moron says, it is possible that the modulus of the unsigned char number system is greater than 256.
Note that your expression does not store the result of % 256 back to test_int. The safe way to do this is
test_int = ( test_int + 1 ) % 256;
std::cout << test_int << " ";
the output of the 2 samples is completely different.
the first one will print characters (a b c d e f g h ...)
the second one will print integers (0 1 2 3 4 ... 255 0 ...)
anyway it depends if you have a a control for overflowexception (.NET) otherwise in old C++ it the value is always valid and goes from 0 to 255