How many bits should I put in the stoi() function? - C++

the following code will make the program crash:
string test="b1";
unsigned __int8 t1 = stoi(test, 0, 8);
But 'b1' = 177, which should be OK for 8 bits, right? If I use
string test="b1";
unsigned __int8 t1 = stoi(test, 0, 16);
everything looks OK. Why do I need to use 16 bits for 'b1'?
A more complicated situation: 16 bits makes it right, but 32 bits makes it wrong!
string test="0800";
unsigned __int16 t1 = stoi(test, 0, 16);

std::stoi's third parameter has nothing to do with any number of bits. It's the base that the number is represented in.
2 means binary, 8 means octal, 10 means decimal, 16 means hexadecimal, etc. all the way up to base-36. 0 means to determine the base from the prefix: strings starting with "0x" or "0X" are interpreted as hexadecimal, strings starting with "0" are interpreted as octal, and all other strings are interpreted as decimal.
When you call std::stoi("b1", 0, 8), std::stoi will throw a std::invalid_argument exception since b is not a valid digit in base-8, and your program will crash if that exception goes uncaught.
std::stoi("0800", 0, 16) and std::stio("0800", 0, 32) are both totally valid, but of course 80016 and 80032 represent different numbers, so the two calls will return different results.

Base 8 has exactly 8 different digits. The valid digits are 0, 1, 2, 3, 4, 5, 6, and 7.
Notice that b is not a valid digit in base 8. Only bases greater than or equal to 12 have the digit b.
if I use
unsigned __int8 t1 = stoi(test, 0, 16);
everything looks ok
16 is greater than or equal to 12, so b is a valid digit in base 16.
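A small sketch of the original example with the exception handled:
#include <iostream>
#include <stdexcept>
#include <string>

int main()
{
    std::string test = "b1";
    try {
        std::cout << std::stoi(test, nullptr, 8) << '\n';  // throws: 'b' is not an octal digit
    } catch (const std::invalid_argument&) {
        std::cout << "\"b1\" is not a valid base-8 number\n";
    }
    std::cout << std::stoi(test, nullptr, 16) << '\n';      // 0xb1 == 177, fits in 8 bits
    return 0;
}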

Related

Why does char occupy 7 bits when its length is 1 byte, i.e. 8 bits?

I've seen that the program below takes only 7 bits of memory to store the character, but everything I've studied says that a char occupies 1 byte of memory, i.e. 8 bits.
Does a single character require 8 bits or 7 bits?
If it requires 8 bits, what will be stored in the other bit?
#include <iostream>
using namespace std;
int main()
{
    char ch = 'a';
    int val = ch;
    while (val > 0)
    {
        (val % 2) ? cout << 1 << " " : cout << 0 << " ";
        val /= 2;
    }
    return 0;
}
Output:
1 0 0 0 0 1 1
The code below shows the memory gap between the characters, i.e. 7 bits:
9e9 <-> 9f0 <->......<-> a13
#include <iostream>
using namespace std;
int main()
{
    char arr[] = {'k','r','i','s','h','n','a'};
    for (int i = 0; i < 7; i++)
        cout << &arr + i << endl;
    return 0;
}
Output:
0x7fff999019e9
0x7fff999019f0
0x7fff999019f7
0x7fff999019fe
0x7fff99901a05
0x7fff99901a0c
0x7fff99901a13
Your first code sample doesn't print leading zero bits. Since ASCII characters all have the upper bit set to zero, you'll get at most seven bits printed when using ASCII characters. Extended ASCII characters and UTF-8 use the upper bit for characters outside the basic ASCII character set.
Your second example is actually printing addresses that suggest each character is seven bytes long, which is obviously incorrect. If you change the array so it isn't seven characters long, you'll see different results.
&arr + i is equivalent to (&arr) + i: &arr is a pointer to char[7], which has a size of 7, so + i adds 7 * i bytes to the pointer. (&arr) + 1 points one past the end of the array; if you try printing the values these pointers point to, you'll get junk or a crash: **(&arr + i).
Your code should be static_cast<void*>(&arr[i]); you'll then see the pointer going up by one for each iteration. The cast to void* is necessary to stop the standard library from trying to print the pointer as a null-terminated string.
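A corrected sketch of both samples, assuming you just want to see all 8 bits (std::bitset pads the leading zero) and the per-element addresses:
#include <bitset>
#include <iostream>
using namespace std;

int main()
{
    // All 8 bits of 'a', leading zero included: 01100001
    char ch = 'a';
    cout << bitset<8>(static_cast<unsigned char>(ch)) << endl;

    // Addresses of the individual elements; the cast to void* stops
    // operator<< from treating the pointer as a C string.
    char arr[] = {'k','r','i','s','h','n','a'};
    for (int i = 0; i < 7; i++)
        cout << static_cast<void*>(&arr[i]) << endl;
    return 0;
}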
It has nothing to do with the space assigned to char. You are simply converting the ASCII representation of the char into binary.
ASCII is a 7-bit character set, normally represented in C by an 8-bit char. If the highest bit in an 8-bit byte is set, it is not an ASCII character; the eighth bit was historically used for parity when communicating between computers using different encodings.
ASCII stands for American Standard Code for Information Interchange, with the emphasis on American. The character set could not represent accented letters (things with umlauts, for example) or non-Latin scripts such as Arabic.
Various "extended ASCII" sets used the extra 128 values that became available by using all 8 bits, which caused compatibility problems. Eventually, Unicode came along, which can represent every character, but 8 bits remained the standard size for char.

A bit field of enumeration type and a value stored to it

I wrote the following and I expected that 16 would be printed.
#include <iostream>
enum E : long { e = 16 };
struct X
{
    E e : 5;
};
X x;
int main(){ x.e = E::e; std::cout << static_cast<int>(x.e) << std::endl; }
But it wasn't. I got a compiler warning and -16 was printed instead. The warning was:
warning: implicit truncation from 'E' to bitfield changes value from 16 to -16
It's unclear to me why the warning was displayed and why -16 was printed. I declared the bit-field with a size of 5 bits, which is enough to store 16.
It is a two's complement issue with signed values. You're going outside the range that a 5-bit signed value can represent.
If you only have 5 bits to store the value 16, you get 10000. The leading 1 indicates that this is a negative value; only 4 bits represent the magnitude when you have 5 bits for a signed value. To determine the absolute value of a two's complement value, you flip all the bits and add 1, so
10000 -> 01111 -> 10000, which is 16, so the stored value is negative 16.
Your options are to use 6 bits instead of 5 if you want to represent the signed range of values, or to use an unsigned long, in which case you can use all 5 bits for the magnitude.
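A minimal sketch of both suggestions (the names ES, EU, and Fixed are made up for illustration):
#include <iostream>

enum ES : long          { es = 16 };  // signed underlying type
enum EU : unsigned long { eu = 16 };  // unsigned underlying type

struct Fixed
{
    ES a : 6;   // one extra bit for the sign: holds -32..31
    EU b : 5;   // all 5 bits are magnitude:   holds 0..31
};

int main()
{
    Fixed f;
    f.a = ES::es;
    f.b = EU::eu;
    std::cout << static_cast<int>(f.a) << ' ' << static_cast<int>(f.b) << std::endl; // 16 16
    return 0;
}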

Binary File Reads Negative Integers After Writing

I came from this question, where I wanted to write 2 integers, each guaranteed to be between 0-16 (4 bits each), into a single byte.
Now if I close the file, and run a different program that reads....
for (int i = 0; i < 2; ++i)
{
    char byteToRead;
    file.seekg(i, std::ios::beg);
    file.read(&byteToRead, sizeof(char));
    bool correct = file.bad();
    unsigned int num1 = (byteToRead >> 4);
    unsigned int num2 = (byteToRead & 0x0F);
}
The issue is, sometimes this works, but other times the first number comes out negative and the second number is something like 10 or 9 all the time, and they are most certainly not the numbers I wrote!
So here, for example, the first two numbers work, but the next number does not. The output of the read above would be:
At byte 0, num1 = 5 and num2 = 6
At byte 1, num1 = 4294967289 and num2 = 12
At byte 1, num1 should be 9. It seems the 12 writes fine, but the 9 << 4 isn't working. The byteToWrite on my end shows as -100 ('œ').
I checked out this question, which has a similar problem I think, but I feel like my endianness is right here.
The right-shift operator preserves the value of the left-most bit. If the left-most bit is 0 before the shift, it will still be 0 after the shift; if it is 1, it will still be 1 after the shift. This preserves the value's sign.
In your case, you combine 9 (0b1001) with 12 (0b1100), so you write 0b10011100 (0x9C). Bit #7 is 1.
When byteToRead is right-shifted, you get 0b11111001 (0xF9), but it is implicitly converted to an int. The conversion from char to int also preserves the value's sign, so it produces 0xFFFFFFF9. That int is then implicitly converted to an unsigned int, so num1 contains 0xFFFFFFF9, which is 4294967289.
There are two solutions:
cast byteToRead to an unsigned char when doing the right shift;
apply a mask to the shift's result to keep only the 4 bits you want (both options are sketched below).
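Roughly, applied to the values from the question (0x9C is just the byte built from 9 and 12):
#include <iostream>

int main()
{
    char byteToRead = static_cast<char>(0x9C);   // 9 << 4 | 12, as in the question

    // Option 1: make the byte unsigned before shifting, so no sign bit is dragged in.
    unsigned int num1 = static_cast<unsigned char>(byteToRead) >> 4;

    // Option 2: shift first, then mask away everything but the low 4 bits.
    unsigned int num1b = (byteToRead >> 4) & 0x0F;

    unsigned int num2 = byteToRead & 0x0F;       // the low nibble was already fine

    std::cout << num1 << ' ' << num1b << ' ' << num2 << '\n'; // 9 9 12
    return 0;
}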
The problem originates with byteToRead >> 4. In C and C++, arithmetic operations are performed in at least int precision, so the first thing that happens is that byteToRead is promoted to int.
These promotions are value-preserving. Your system has plain char as signed, i.e. with a range of -128 through 127. Your char might have been initially -112 (bit pattern 10010000), and after promotion to int it retains its value of -112 (bit pattern 11111...1110010000).
The right-shift of a negative value is implementation-defined, but a common implementation is an "arithmetic shift", i.e. it behaves like division by two, so you end up with the result of byteToRead >> 4 being -7 (bit pattern 11111....111001).
Converting -7 to unsigned int results in UINT_MAX - 6, which is 4294967289, because unsigned arithmetic is defined as wrapping around mod UINT_MAX + 1.
To fix this you need to convert to unsigned before performing the arithmetic. You could cast (or alias) byteToRead to unsigned char, e.g.:
unsigned char byteToRead;
file.read((char*)&byteToRead, 1);  // reading into an unsigned char keeps the later shift from sign-extending

C++ int bit manipulation: is 2UL = 10UL?

I have a quick question.
I've been playing around with bit manipulation in C/C++ for a while, and I recently discovered that when I compare 2UL and 10UL against a regular unsigned int, they seem to match the same bit.
For example,
#define JUMP 2UL
#define FALL 10UL
unsigned int flags = 0UL;
this->flags |= FALL;
//this returns true
this->is(JUMP);
bool Player::is(const unsigned long &isThis)
{
    return ((this->flags & isThis) == isThis);
}
Please confirm whether 2U equals 10U, and if so, how I would get around it if I need more than 8(?) flags in a single unsigned integer.
Kind regards,
-Markus
Of course. 10UL is 1010 in binary and 2UL is 10. Therefore, doing x |= 10 also sets the bit with value 2.
You probably wanted to use 0x10 and 0x2 as your flags. These would work as you expect.
As an aside: a single digit in hex notation represents 4 bits, not 8.
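A quick sketch with those hex constants, standalone here rather than inside the Player class:
#include <iostream>

#define JUMP 0x2UL   // binary 00010
#define FALL 0x10UL  // binary 10000

int main()
{
    unsigned int flags = 0;
    flags |= FALL;
    std::cout << std::boolalpha
              << ((flags & JUMP) == JUMP) << '\n'   // false: FALL no longer implies JUMP
              << ((flags & FALL) == FALL) << '\n';  // true
    return 0;
}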
JUMP, 2: 0010
FALL, 10: 1010
FALL & JUMP = JUMP = 0010
Decimal 2 in binary is 0010, whereas decimal 10 is binary 1010. If you bitwise-AND them (2 & 10), that yields binary 0010, or decimal 2. So 10 & 2 is indeed equal to 2. Maybe your intention was to test for 1ul << 2 and 1ul << 10, which would be bits number 2 and 10 respectively. Or maybe you meant to use hexadecimal 10 (decimal 16, binary 10000), which is written 0x10.
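Another common pattern, which also answers the "more than 8 flags" part, is to build each flag from a shift: an unsigned long has at least 32 bits, so 1UL << 0 through 1UL << 31 give 32 independent flags. A rough sketch (DUCK is just a made-up third flag):
#include <iostream>

const unsigned long JUMP = 1UL << 0;
const unsigned long FALL = 1UL << 1;
const unsigned long DUCK = 1UL << 2;   // ... up to 1UL << 31

int main()
{
    unsigned long flags = 0;
    flags |= FALL;
    std::cout << std::boolalpha
              << ((flags & JUMP) == JUMP) << '\n'   // false
              << ((flags & FALL) == FALL) << '\n';  // true
    return 0;
}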

How does Microsoft Visual C++ store signed chars and how can I test individual bits in a signed char?

OK, so say I have a signed char with value -103:
char s_char = -103;
How does the computer store this char in bits? Is the first bit 0 because the char is negative? If so, would the computer store the char as 01100101, because 1100101 (base 2) is 103 in base 10?
And a second question: how can I access or test a single bit in the signed char? Would
s_char & (0x80 >> pos)
give me the value of the bit at position pos counting from the left?
char is just an integer type, 8 bits in most cases. So -103 is just:
10011001
To access a single bit in a char, you can do it the same way as any other integer:
char s_char = -103;
s_char & (1 << n)
will get you the nth bit from the bottom.
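For example, printing the whole pattern with that mask, highest bit first (just a quick sketch):
#include <iostream>

int main()
{
    char s_char = -103;                  // stored as 10011001 in two's complement
    for (int n = 7; n >= 0; --n)
        std::cout << ((s_char & (1 << n)) ? 1 : 0);
    std::cout << '\n';                   // prints 10011001
    return 0;
}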
Signed values are usually stored using Two's Complement. http://en.wikipedia.org/wiki/Two%27s_complement
This essentially provides a sign bit, which determines whether the number stored is negative or positive. If you're using an 8-bit int, for example, the range of possible signed numbers is -128 to 127. This breaks down into a series of 8 bits where the left-most bit represents a value of -128. Each following bit "holds" half the value of the bit to its left, but is positive instead. An 8-bit number in binary form would look like this:
   0    0    0    0    0    0    0    0
-128   64   32   16    8    4    2    1
Since a char is an integer type, it would be stored in the same way as a regular int would be. A char with the value of -103 would break down to something like this:
   1    0    0    1    1    0    0    1
-128   64   32   16    8    4    2    1
If you want to test a single bit, you could use a mask. For example, if you wanted to test if the left most bit was set, you could do something like this:
s_char & (0x80)
This evaluates to a non-zero (true) value if the left-most bit of s_char is set to 1, regardless of the other bits. I hope that helps!
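The 0x80 >> pos mask from the question works the same way, counting positions from the left; a small sketch (the cast to unsigned char just makes the intent explicit):
#include <iostream>

int main()
{
    char s_char = -103;                                  // stored as 10011001
    for (int pos = 0; pos < 8; ++pos)
    {
        bool set = (static_cast<unsigned char>(s_char) & (0x80u >> pos)) != 0;
        std::cout << set;
    }
    std::cout << '\n';                                   // prints 10011001
    return 0;
}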