Truncation of digits - C++

I was going through the following code. It basically truncates the digits of the value entered through the cin object. The problem is I don't know how assigning an int value to a character object truncates the digits except for the first.
#include <iostream>
using namespace std;
int main() {
    unsigned int integer;
    unsigned char character;
    cin >> integer;
    character = integer;
    cout << character;
}

Let's for the sake of illustration assume that char is unsigned and is 8 bits wide, and int is 32 bits wide. What such an assignment would do is chop off the top 24 bits, leaving the bottom 8.
The truncation does not have anything to do with the decimal digits of the integer. For example, 9999 would become 15 (because 9999 & 0xFF == 15).
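A minimal sketch of that worked example, assuming an 8-bit unsigned char (the cast to int is only there so the numeric value is printed rather than a character):
#include <iostream>
using namespace std;
int main() {
    unsigned int integer = 9999;          // binary ...0010 0111 0000 1111
    unsigned char character = integer;    // keeps only the bottom 8 bits: 0000 1111
    cout << static_cast<int>(character);  // prints 15, the same as 9999 & 0xFF
}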

I am not sure what you mean by "except for the first," but let me see if I can explain what is happening.
unsigned char is, I believe, required by the standard to be 1 byte in length. int is typically much longer; 4 bytes is typical. Thus when you enter a number greater than 255, it loses all the value above that, since all the char can hold is one byte, and the 3 higher-order bytes of data are lost.

Related

Why does char occupy 7 bits when the length is 1 byte, i.e. 8 bits?

I've seen that the program below takes only 7 bits of memory to store the character, but everything I've studied says that char occupies 1 byte of memory, i.e. 8 bits.
Does a single character require 8 bits or 7 bits?
If it requires 8 bits, what will be stored in the other bit?
#include <iostream>
using namespace std;
int main()
{
    char ch = 'a';
    int val = ch;
    while (val > 0)
    {
        (val % 2) ? cout << 1 << " " : cout << 0 << " ";
        val /= 2;
    }
    return 0;
}
Output:
1 0 0 0 0 1 1
The code below shows the memory gap between the characters, i.e. 7 bits:
9e9 <-> 9f0 <->......<-> a13
#include <iostream>
using namespace std;
int main()
{
    char arr[] = {'k','r','i','s','h','n','a'};
    for (int i = 0; i < 7; i++)
        cout << &arr + i << endl;
    return 0;
}
Output:
0x7fff999019e9
0x7fff999019f0
0x7fff999019f7
0x7fff999019fe
0x7fff99901a05
0x7fff99901a0c
0x7fff99901a13
Your first code sample doesn't print leading zero bits. Since ASCII characters all have the upper bit set to zero, you'll only get at most seven bits printed when using ASCII characters. Extended ASCII characters or UTF-8 use the upper bit for characters outside the basic ASCII character set.
Your second example actually appears to show that each character is seven bytes long, which is obviously incorrect. If you change the size of the array you are using to not be seven characters long, you'll see different results.
&arr + i is equivalent to (&arr) + i. As &arr is a pointer to char[7], which has a size of 7, the + i adds 7 * i bytes to the pointer. (&arr) + 1 points just past the end of the array; if you try printing the values these pointers point to, **(&arr + i), you'll get junk or a crash.
Your code should be static_cast<void*>(&arr[i]); you'll then see the pointer going up by one for each iteration. The cast to void* is necessary to stop the standard library from trying to print the pointer as a null-terminated string.
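A sketch of the corrected loop, with the same array as above, so the addresses go up by one per iteration:
#include <iostream>
using namespace std;
int main()
{
    char arr[] = {'k','r','i','s','h','n','a'};
    for (int i = 0; i < 7; i++)
        cout << static_cast<void*>(&arr[i]) << endl;  // one byte per character
    return 0;
}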
It has nothing to do with the space assigned to char. You are simply converting the ASCII representation of the char into binary.
ASCII is a 7-bit character set, normally represented in C by an 8-bit char. If the highest bit in an 8-bit byte is set, it is not an ASCII character; the eighth bit was historically used for parity, to communicate information between computers using different encodings.
ASCII stands for American Standard Code for Information Interchange, with the emphasis on American. The character set could not represent things like Arabic letters or accented Latin characters (letters with umlauts, for example).
Various encodings "extended" the ASCII set to use the extra 128 values that became available by using all 8 bits, which caused its own problems. Eventually, Unicode came along, which can represent every character, but 8 bits remained the standard size for char.

Bitwise complement operator (~) not working at a point in C

This code takes a 2-byte int and exchanges its bytes.
Why does the line I commented in this code seem not to work?
INPUT/OUTPUT
* When I input 4, the expected output is 1024; instead, the 9th to 16th bits were all set to "1" after passing that line.
* Then I tried input 65280, whose expected output is 255; instead it outputs 65535 (sets all 16 bits to "1").
#include <stdio.h>
int main(void)
{
    short unsigned int num;
    printf("Enter the number: ");
    fscanf(stdin, "%hu", &num);
    printf("\nNumber with no swap between bytes---> %hu\n", num);
    unsigned char swapa, swapb;
    swapa = ~num;
    num >>= 8;
    swapb = ~num;
    num = ~swapa;
    num <<= 8;
    num = ~swapb; //this line is not working why
    printf("Swaped bytes value----> %hu\n", num);
}
Integral promotions are at play. Also, the current value of num is getting clobbered by the commented line; you probably want |=, +=, or ^= instead of plain assignment.
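One way to see the fix, sketched under the assumption of a 16-bit unsigned short: extract both bytes first and combine them with | rather than overwriting num (this drops the complement trick entirely, which also sidesteps the promotion issue):
#include <stdio.h>

int main(void)
{
    unsigned short num;
    printf("Enter the number: ");
    if (fscanf(stdin, "%hu", &num) != 1)
        return 1;
    printf("\nNumber with no swap between bytes---> %hu\n", num);

    unsigned char lo = num & 0xFF;          /* low byte  */
    unsigned char hi = (num >> 8) & 0xFF;   /* high byte */

    num = (unsigned short)((lo << 8) | hi); /* combine with |, do not overwrite */
    printf("Swapped bytes value----> %hu\n", num);
    return 0;
}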

Scanf does not read past leading 0s if the digit after them is 8 or 9

I have a very strange question which stems from a bug in a C++11 program I have been writing.
See the following code:
#include <cstdio>
#include <iostream>

long long a[1000];
int main(int argc, char * argv[]) {
    for (long long i = 0; i < 300; ++i) {
        scanf("%lli", &a[i]);
        std::cout << a[i] << std::endl;
    }
    return 0;
}
Trying the inputs 1, 2, etc., we get the outputs 1\n, 2\n, etc., as expected. This also works for inputs like 001, where we get 1\n, and 0004, where we get 4\n.
However, when the digit after the leading zeros is an 8 or 9, scanf() reads the leading zeros first, then reads the digits after them separately.
For example:
Input: 0009, output: 000\n9\n.
Input: 08, output 0\n8\n.
Input: 00914, output 00\n914\n.
I've done some testing and for these cases it seems the scanf() reads the leading zeros first, and the remaining digits are left in the buffer, which are picked up on the second run of the loop.
Can someone hint at what is going on?
I am using Xcode 11.3.7 and compiling with Clang C++11. (I haven't messed with the project settings.)
Thank you in advance!!!
Use %lld instead of %lli.
The reason %i doesn't work is that a leading 0 is interpreted as a prefix for octal numbers, and the digits 8 and 9 don't exist in octal:
d   Matches an optionally signed decimal integer; the next pointer must be a pointer to int.
i   Matches an optionally signed integer; the next pointer must be a pointer to int. The integer is read in base 16 if it begins with 0x or 0X, in base 8 if it begins with 0, and in base 10 otherwise. Only characters that correspond to the base are used.
You would also get the wrong answer for other numbers, e.g. 010 in octal would be parsed as 8.
Or, even better: use C++ instead of C.
std::cin >> a[i];
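A small self-contained sketch of the difference; the hard-coded "0009" input is just for illustration:
#include <cstdio>

int main() {
    long long a = 0, b = 0;
    std::sscanf("0009", "%lli", &a);  // leading 0 selects octal, so only "000" is consumed: a == 0
    std::sscanf("0009", "%lld", &b);  // plain decimal: the whole token is read, b == 9
    std::printf("%lld %lld\n", a, b); // prints: 0 9
}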

Binary File Reads Negative Integers After Writing

I came from this question, where I wanted to write 2 integers, guaranteed to be between 0 and 16 (4 bits each), to a single byte.
Now if I close the file, and run a different program that reads....
for (int i = 0; i < 2; ++i)
{
    char byteToRead;
    file.seekg(i, std::ios::beg);
    file.read(&byteToRead, sizeof(char));
    bool correct = file.bad();
    unsigned int num1 = (byteToRead >> 4);
    unsigned int num2 = (byteToRead & 0x0F);
}
The issue is, sometimes this works, but other times the first number comes out negative and the second number is something like 10 or 9 all the time, and they were most certainly not the numbers I wrote!
So here, for example, the first two numbers work, but the next number does not. The output of the read above would be:
At byte 0, num1 = 5 and num2 = 6
At byte 1, num1 = 4294967289 and num2 = 12
At byte 1, num1 should be 9. It seems the 12 writes fine, but the 9 << 4 isn't working. The byteToWrite on my end is -100 ('œ').
I checked out this question which has a similar problem I think but I feel like my endian is right here.
The right-shift operator preserves the value of the left-most bit. If the left-most bit is 0 before the shift, it will still be 0 after the shift; if it is 1, it will still be 1 after the shift. This preserves the value's sign.
In your case, you combine 9 (0b1001) with 12 (0b1100), so you write 0b10011100 (0x9C). Bit #7 is 1.
When byteToRead is right-shifted, you get 0b11111001 (0xF9), but it is implicitly converted to an int. The conversion from char to int also preserves the value's sign, so it produces 0xFFFFFFF9. Then that int is implicitly converted to an unsigned int, so num1 contains 0xFFFFFFF9, which is 4294967289.
There are 2 solutions:
cast byteToRead to an unsigned char when doing the right-shift;
apply a mask to the shift's result to keep only the 4 bits you want.
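A minimal sketch of both solutions, using the 0x9C byte (9 in the upper nibble, 12 in the lower) from the question, and assuming plain char is signed as it is on the asker's system:
#include <iostream>

int main() {
    char byteToRead = static_cast<char>(0x9C);  // bit pattern 1001 1100, i.e. -100 as a signed char

    // Solution 1: convert to unsigned char before shifting, so no sign bit is dragged in.
    unsigned int num1 = static_cast<unsigned char>(byteToRead) >> 4;

    // Solution 2: shift first, then mask off everything but the 4 bits you want.
    unsigned int alt = (byteToRead >> 4) & 0x0F;

    unsigned int num2 = byteToRead & 0x0F;

    std::cout << num1 << " " << alt << " " << num2 << '\n';  // prints: 9 9 12
}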
The problem originates with byteToRead >> 4. In C and C++, arithmetic operations are performed in at least int precision, so the first thing that happens is that byteToRead is promoted to int.
These promotions are value-preserving. Your system has plain char as signed, i.e. with a range of -128 through 127. Your char might initially have been -112 (bit pattern 10010000), and after promotion to int it retains its value of -112 (bit pattern 11111...1110010000).
The right-shift of a negative value is implementation-defined, but a common implementation is an "arithmetic shift", i.e. it behaves like division by two, so you end up with the result of byteToRead >> 4 being -7 (bit pattern 11111....111001).
Converting -7 to unsigned int results in UINT_MAX - 6, which is 4294967289, because unsigned arithmetic is defined as wrapping around modulo UINT_MAX + 1.
To fix this you need to convert to unsigned before performing the arithmetic. You could cast (or alias) byteToRead to unsigned char, e.g.:
unsigned char byteToRead;
file.read( (char *)&byteToRead, 1 );

Why does the ASCII value vary for signed and unsigned char in C++?

I have tried these 2 following codes:
#include <iostream>
#include <conio.h>   // for getch()
using namespace std;

int main()
{
    int val = -125;
    char code = val;
    cout << "\t" << code << " " << (int)code;
    getch();
}
The output I got is: a^ -125
The second code is:
#include <iostream>
#include <conio.h>   // for getch()
using namespace std;

int main()
{
    int val = -125;
    unsigned char code = val;
    cout << "\t" << code << " " << (int)code;
    getch();
}
The output I got is: a^ 131
After trying both of these codes, is it safe to conclude that a character can have 2 ASCII values, or is my approach to finding the ASCII value(s) flawed?
P.S.-
I was unable to upload pictures of my output, so I have typed the output even though the character I got isn't present on a standard keyboard.
In both examples 'code' has the same bitwise value. The first bit is 1, because it was a negative number. Since both 'code's have the same value, the output character is the same (converting from number to character treats the number as an unsigned value).
After that you convert your character back to a (signed) integer. This conversion respects the type and the sign of your char:
-> unsigned char -> int: always positive
-> char -> int: has the same sign as the char (and because the first bit was 1, it's negative here)
Unsigned integers in C++ have modulo 2^n behavior, where n is the number of value bits.
That means if your char has 8 bits, then unsigned char has modulo 256 behavior.
This behavior is as if the values 0 through 255 were placed on a clock face: any operation that produces a result past the 0-255 divide effectively wraps around, just like arithmetic with hours on a clock face.
Which means that assigning the value -125 yields the corresponding value in the range 0 through 255, namely -125 + 256 = 131.
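A compact sketch of the two conversions side by side, assuming an 8-bit char on a platform where plain char is signed:
#include <iostream>

int main() {
    int val = -125;

    char s = val;           // bit pattern 1000 0011 in both cases
    unsigned char u = val;

    std::cout << static_cast<int>(s) << '\n';  // -125 (plain char is signed here)
    std::cout << static_cast<int>(u) << '\n';  // 131, i.e. -125 + 256
}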