Why does the ASCII value vary for signed and unsigned characters in C++?

I have tried the following two programs:
#include <iostream>
#include <conio.h>
using namespace std;

int main()
{
    int val = -125;
    char code = val;
    cout << "\t" << code << " " << (int)code;
    getch();
}
The output I got is: a^ -125
The second program is:
#include <iostream>
#include <conio.h>
using namespace std;

int main()
{
    int val = -125;
    unsigned char code = val;
    cout << "\t" << code << " " << (int)code;
    getch();
}
The output I got is: a^ 131
After trying both programs, is it safe to conclude that a character can have two ASCII values, or is my approach to finding the ASCII value(s) flawed?
P.S. I was unable to upload pictures of my output, so I had to type it out; the character I got (written above as a^) isn't present on a standard keyboard.

In both examples 'code' ends up with the same bit pattern. The most significant bit is 1, because the assigned value was negative. Since both 'code' variables hold the same bits, the character that gets printed is the same (printing a char sends the byte to the terminal regardless of sign).
After that you convert the character back to a (signed) integer. This conversion respects the type, and therefore the sign, of your char (sketched below):
-> unsigned char -> int: the result is always non-negative
-> char -> int: the result has the same sign as the char (and because the most significant bit was 1, it is negative here)
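A minimal sketch of the difference (assuming an 8-bit char that is signed by default, as it appears to be on the asker's platform):
#include <iostream>
using namespace std;

int main()
{
    char sc = -125;           // bit pattern 10000011
    unsigned char uc = -125;  // same bit pattern, value wraps to 131

    cout << (int)sc << "\n";  // -125: sign bit propagated in char -> int
    cout << (int)uc << "\n";  // 131:  zero-extended in unsigned char -> int
}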

Unsigned integers in C++ have modulo 2^n behavior, where n is the number of value bits.
That means if your char has 8 bits, then unsigned char has modulo-256 behavior.
This behavior is as if the values 0 through 255 were placed on a clock face: any operation that produces a result past the 0/255 divide simply wraps around, just like arithmetic with hours on a clock face.
Which means that assigning the value -125 yields the corresponding value in the range 0 through 255, namely -125 + 256 = 131.
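A small demonstration of this clock-face behavior (a sketch assuming an 8-bit unsigned char, so modulo 256):
#include <iostream>
using namespace std;

int main()
{
    unsigned char u;

    u = -125;                // -125 + 256 = 131
    cout << (int)u << "\n";

    u = 255;
    u = u + 1;               // wraps past the 255/0 divide
    cout << (int)u << "\n";  // 0

    u = 0;
    u = u - 1;               // wraps the other way
    cout << (int)u << "\n";  // 255
}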

Related

Bitwise complement operator (~) not working at a point in C

This code takes a 2-byte int and exchanges its bytes.
Why does the line I commented in the code seem not to work?
INPUT/OUTPUT
* When I input 4, the expected output is 1024; instead, the 9th to 16th bits are all set to 1 after that line runs.
* Then I tried the input 65280, whose expected output is 255; instead it outputs 65535 (all 16 bits set to 1).
#include <stdio.h>
int main(void)
{
    short unsigned int num;
    printf("Enter the number: ");
    fscanf(stdin, "%hu", &num);
    printf("\nNumber with no swap between bytes---> %hu\n", num);
    unsigned char swapa, swapb;
    swapa = ~num;
    num >>= 8;
    swapb = ~num;
    num = ~swapa;
    num <<= 8;
    num = ~swapb;   // this line is not working, why?
    printf("Swapped bytes value----> %hu\n", num);
}
Integral promotions are at play here, and in addition the current value of num is being clobbered by the commented line; you probably want |=, +=, or ^= instead of plain = (see the sketch below).
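A sketch of the corrected program (not the poster's original, just the minimal change suggested above): the commented line becomes an OR, and the cast masks the promoted ~swapb back down to a single byte so the extra bits introduced by integral promotion are discarded:
#include <stdio.h>

int main(void)
{
    unsigned short num;
    printf("Enter the number: ");
    if (fscanf(stdin, "%hu", &num) != 1)
        return 1;
    printf("\nNumber with no swap between bytes---> %hu\n", num);

    unsigned char swapa, swapb;
    swapa = ~num;                  /* complement of the low byte  */
    num >>= 8;
    swapb = ~num;                  /* complement of the high byte */

    num = ~swapa;                  /* restores the low byte; the garbage left in */
    num <<= 8;                     /* the high byte is shifted out here          */
    num |= (unsigned char)~swapb;  /* OR in the old high byte as the new low byte */

    printf("Swapped bytes value----> %hu\n", num);
    return 0;
}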

Why does the unsigned int give a negative value in C++?

I have two functions add and main as follows.
#include <iostream>
using namespace std;

int add(unsigned int a, unsigned int b)
{
    return a + b;
}

int main()
{
    unsigned int a, b;
    cout << "Enter a value for a: ";
    cin >> a;
    cout << "Enter a value for b: ";
    cin >> b;
    cout << "a: " << a << " b: " << b << endl;
    cout << "Result is: " << add(a, b) << endl;
    return 0;
}
When I run this program I get the following results:
Enter a value for a: -1
Enter a value for b: -2
a: 4294967295 b: 4294967294
Result is: -3
Why is the result -3?
Because add returns an int (not an unsigned int), which cannot represent 4294967295 + 4294967294 = 4294967293 (unsigned integer arithmetic is defined mod 2^n, with n = 32 in this case); the result is too big.
Thus you have an implicit conversion of a value that cannot be represented as int (not, strictly speaking, signed integer overflow), and that conversion has an implementation-defined result, i.e. any output representable as int would be "correct".
The reason for getting exactly -3 is that the result is 2^32 - 3, and that gets converted to -3 on your system. But note that any result would be equally legal.
int add(unsigned int a, unsigned int b)
{
    return a + b;
}
The expression a+b adds two operands of type unsigned int, yielding an unsigned int result. Unsigned addition, strictly speaking, does not "overflow"; rather, the result is reduced modulo MAX + 1, where MAX is the maximum value of the unsigned type. In this case, assuming a 32-bit unsigned int, the result of adding 4294967295 + 4294967294 is well defined: it's 4294967293, or 2^32 - 3.
Since add is defined to return an int, that value is implicitly converted from unsigned int to int. Unlike an arithmetic overflow, an unsigned-to-signed conversion whose value can't be represented in the target type yields an implementation-defined result. On a typical implementation, such a conversion (where the source and target have the same size) will simply reinterpret the representation, yielding -3. Other results are possible, depending on the implementation, but not particularly likely.
As for why a and b were set to those values in the first place, apparently that's how cin >> a behaves when a is an unsigned value and the input is negative. I'm not sure whether that behavior is defined by the language, implementation-defined, or undefined. In any case, once you have those values, the result returned by add follows as described above.
If you intend to return an unsigned int, then you need to make the function's return type unsigned. If you change the return type to unsigned int and use the values -1 and -2, this will be your output:
a: 4294967295 b: 4294967294
Result: 4294967293
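A minimal sketch of that change (values hard-coded for illustration; the unsigned return type keeps the wrapped sum instead of converting it to int):
#include <iostream>
using namespace std;

unsigned int add(unsigned int a, unsigned int b)
{
    return a + b;   // well defined: reduced mod 2^32 on a 32-bit unsigned int
}

int main()
{
    unsigned int a = -1;  // wraps to 4294967295
    unsigned int b = -2;  // wraps to 4294967294
    cout << "a: " << a << " b: " << b << endl;
    cout << "Result: " << add(a, b) << endl;   // 4294967293
    return 0;
}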
unsigned int ranges over [0, 4294967295], provided an unsigned int is 4 bytes in size on your machine. An unsigned int stores no sign bit, so when you read -1 into it the value wraps around and you get the largest possible value of the type; -2 wraps the same way, landing on the second-largest value. (Conversely, taking the largest unsigned value and adding 1 gives 0.) These values are passed into the function, which creates two local copies on its stack.
So now you have copies of a and b on the stack: a holds the maximum unsigned value and b holds the maximum minus one. Adding them exceeds the maximum value of an unsigned int, so the result wraps around in the same clock-like way: starting from ...95, adding 1 brings you back to 0, and adding the rest of ...94 lands you on ...93. That is the value the function would return if its return type were unsigned int.
The same wrapping concept applies when the return type is int, but with an extra step: the addition of the two unsigned values still produces ...93, and the compiler then implicitly converts that value to an int. A signed int has the range -2147483648 to 2147483647 (one bit pattern region is reserved for negative values, represented in two's complement on typical machines).
The value 4294967293 does not fit in that range, so the conversion cannot preserve the value. What happens instead is implementation-defined; on a two's-complement machine the bit pattern is kept as-is and reinterpreted as a signed number, which maps the values just below 2^32 onto small negative numbers.
For the unsigned values near the top of the range, the int values they convert to are:
- ...95 = -1
- ...94 = -2
- ...93 = -3
With signed values the sign is encoded in the representation itself (two's complement on practically every modern compiler), whereas an unsigned int preserves no sign at all. So when the unsigned value 4294967293 is converted to a signed int, its bit pattern, which in two's complement is exactly the representation of -3, is reinterpreted as -3, and that is why the program prints -3.
When the result is converted from unsigned int to int, the number is larger than a signed integer can hold, so it wraps around to a negative value.
If you change the return type to unsigned int, your problem should be solved.

Binary File Reads Negative Integers After Writing

I came from this question, where I wanted to write 2 integers, each guaranteed to be between 0-16 (4 bits each), into a single byte.
Now if I close the file and run a different program that reads it back...
for (int i = 0; i < 2; ++i)
{
    char byteToRead;
    file.seekg(i, std::ios::beg);
    file.read(&byteToRead, sizeof(char));
    bool correct = file.bad();
    unsigned int num1 = (byteToRead >> 4);
    unsigned int num2 = (byteToRead & 0x0F);
}
The issue is that sometimes this works, but other times the first number comes out negative and the second number is something like 10 or 9, and they were most certainly not the numbers I wrote!
So here, for example, the first two numbers work, but the next one does not. The output of the read above would be:
At byte 0, num1 = 5 and num2 = 6
At byte 1, num1 = 4294967289 and num2 = 12
At byte 1, num1 should be 9. It seems the 12 is written fine, but the 9 << 4 isn't working. The byteToWrite on my end is -100 ('œ').
I checked out this question, which I think has a similar problem, but I feel like my endianness is right here.
On most implementations, the right-shift of a signed value preserves the left-most (sign) bit: if it is 0 before the shift it stays 0, and if it is 1 it stays 1. This preserves the value's sign (an arithmetic shift).
In your case, you combined 9 (0b1001) with 12 (0b1100), so you wrote 0b10011100 (0x9C). Bit #7 is 1.
When byteToRead is right-shifted it is first implicitly converted to an int; the conversion from char to int also preserves the value's sign, so the shifted result is 0xFFFFFFF9 (low byte 0b11111001, i.e. 0xF9). That int is then implicitly converted to an unsigned int, so num1 contains 0xFFFFFFF9, which is 4294967289.
There are two solutions, both sketched below:
- cast byteToRead to an unsigned char when doing the right-shift;
- apply a mask to the shift's result to keep only the 4 bits you want.
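A sketch of both fixes, reusing the question's variable names:
unsigned int num1 = static_cast<unsigned char>(byteToRead) >> 4;  // 1) shift an unsigned value
unsigned int num1_alt = (byteToRead >> 4) & 0x0F;                  // 2) mask off everything but the low 4 bits
unsigned int num2 = byteToRead & 0x0F;                             // the low nibble is fine either way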
The problem originates with byteToRead >> 4. In C and C++, arithmetic operations are performed in at least int precision, so the first thing that happens is that byteToRead is promoted to int.
These promotions are value-preserving. Your system has plain char as signed, i.e. with range -128 through 127. Your char might have been, say, -112 (bit pattern 10010000), and after promotion to int it retains its value of -112 (bit pattern 11111...1110010000).
The right-shift of a negative value is implementation-defined, but a common implementation is the "arithmetic shift", i.e. each shift divides by two, rounding toward negative infinity; so byteToRead >> 4 ends up as -7 (bit pattern 11111...11111001).
Converting -7 to unsigned int results in UINT_MAX - 6, which is 4294967289, because unsigned arithmetic is defined as wrapping around mod UINT_MAX + 1.
To fix this you need to convert to unsigned before performing the arithmetic. You could cast (or alias) byteToRead to unsigned char, e.g.:
unsigned char byteToRead;
file.read((char *)&byteToRead, 1);
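For completeness, a sketch of the question's read loop with that fix applied (it assumes file is an already-open std::ifstream in binary mode, as in the question):
for (int i = 0; i < 2; ++i)
{
    unsigned char byteToRead;
    file.seekg(i, std::ios::beg);
    file.read(reinterpret_cast<char *>(&byteToRead), 1);

    unsigned int num1 = byteToRead >> 4;    // high nibble, now always non-negative
    unsigned int num2 = byteToRead & 0x0F;  // low nibble
}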

What if I try to assign values greater than pow(2,64)-1 to an unsigned long long in C++?

If I have two unsigned long long values, say pow(10,18) and pow(10,19), and I multiply them and store the result in another variable of type unsigned long long, the value I get is obviously not the mathematical answer, but does it follow any logic? I get what looks like a junk value each time I try this with arbitrarily large numbers, but do the outputs have any logical relationship with the input values?
Unsigned integral types in C++ obey the rules of modular arithmetic, i.e. they represent the integers modulo 2^N, where N is the number of value bits of the integral type (possibly less than its sizeof times CHAR_BIT); specifically, the type holds the values [0, 2^N).
So when you multiply two numbers, the result is the remainder of the mathematical result divided by 2^N.
The number N is obtainable programmatically via std::numeric_limits<T>::digits.
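A small sketch tying these pieces together (it assumes a 64-bit unsigned long long, so N = 64):
#include <iostream>
#include <limits>

int main()
{
    // N = number of value bits of unsigned long long (typically 64)
    const int N = std::numeric_limits<unsigned long long>::digits;
    std::cout << "value bits: " << N << "\n";

    unsigned long long a = 1000000000000000000ULL;    // 10^18
    unsigned long long b = 10000000000000000000ULL;   // 10^19
    // Well defined, not junk: the printed value is (10^18 * 10^19) mod 2^N.
    std::cout << a * b << "\n";
}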
Yes, there's a logic to it.
As KerreK wrote, integer results are "wrapped around" modulo 2^N, where N is the number of bits of their datatype.
To make it easy, let's consider the following:
#include <iostream>
#include <cmath>
using namespace std;

int main() {
    unsigned char tot;
    unsigned char ca = 200;
    unsigned char ca2 = 200;
    tot = ca * ca2;
    cout << (int)tot;
    return 0;
}
(try it: http://ideone.com/nWDYjO)
In the above example an unsigned char is 1 byte wide (max value 255 decimal). When multiplying 200 * 200 we get 40000, which obviously won't fit into the unsigned char.
The value is then "wrapped around", that is, tot gets the result of
(ca * ca2) % 256
where 256 = 2^8 is the number of distinct values an unsigned char (1 byte, 8 bits) can hold; here 40000 % 256 = 64.
In your case you would get
(pow(10,18) * pow(10,19)) % 2^(number of bits of unsigned long long)
where the number of bits is architecture dependent (typically 64).

Unexpected results when looking at ASCII codes in C++

The bit of code below extracts ASCII codes from characters.
When I convert characters in the normal ASCII region I get the value I expect.
When I convert £ and € from the extended region, I get a load of 1s padding the int that I'm storing the character in.
e.g. the output of the code below is:
45 (ASCII 'E' as expected)
FFFFFF80 (extended character € as expected, but padded with ones)
It's not causing me an issue, but I'm just wondering why this happens.
Here's the code...
unsigned int asciichar[3];
string cTextToEncode = "E€";
for (unsigned int i = 0; i < cTextToEncode.length(); i++)
{
    asciichar[i] = (unsigned int)cTextToEncode[i];
    cout << hex << asciichar[i] << "\n";
}
Can anyone explain why this is?
Thanks
Depending on the implementation, a char can be either signed or unsigned. In your case it appears to be signed, so 0x80 is interpreted as -128 instead of 128; hence, when cast to an integer, it becomes 0xFFFFFF80.
By the way, this has nothing at all to do with ASCII.
First, there's no € in ASCII (extended or otherwise) because the euro didn't exist when ASCII was created. However, several ASCII-friendly 8-bit encodings do support the € character, but the conversion is done by your source code editor (the compiler merely sees a byte which happens to represent € in your editor, but might be something else entirely on, say, a computer in Israel).
Second, (unsigned int) casts do not extract the ASCII encoding of a character. They merely convert the value of the underlying numeric char type to an unsigned integer. This causes strange things to happen when the converted value is negative: on your compiler char happens to be signed, and thus characters with a code larger than 127 end up as negative char values.
You should convert to an unsigned char first, and then to an unsigned int.
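A minimal sketch of that fix applied to the loop above:
asciichar[i] = (unsigned int)(unsigned char)cTextToEncode[i];
cout << hex << asciichar[i] << "\n";   // now prints 80 rather than ffffff80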
You should be careful when promoting signed values.
When a signed char is promoted to a signed int, the first bit (the sign bit) is taken into account. The algorithm roughly looks like this:
1) If you have 1XXXXXXX (the char in binary, X = any binary digit), then the int will start with 24 ones: 1...1-1XXXXXXX (binary) -> 0xFFFFFFYY (hex).
2) If you have 0XXXXXXX (binary), then you'll get 24 leading zeroes: 0...0-0XXXXXXX (binary) -> 0x000000YY (hex).
In your case you want rule #2 to apply all the time. To achieve this you need to tell the compiler not to treat the first bit as a sign bit, and for that you need to use unsigned char.