Incrementing a uint8_t variable, strange outcome - c++

In a C++ class I've the following code/while loop:
uint8_t len = 0;
while (*s != ',') {
len = (uint8_t)(len + 1u);
++s;
}
return (len);
The outcome should be a value between 0 and max 20.
I received a strange result and started debugging. When I step through the loop,
I get the following values for the variable len:
'\01', '\02', '\03', '\04', '\05', '\06', '\a', '\b', '\t'
I don't understand the change from '\06' to '\a'!
Can somebody explain this? I expect the value of len to simply increase by 1 until the character pointer s reaches the ',' character.

The values are correct, but your debugger interprets them as char type, not an integer type.
You can see escape sequences used in C++ here (and the corresponding values in ASCII).
\01 - 1 in octal, 1 in decimal
\02 - 2 in octal, 2 in decimal
...
\06 - 6 in octal, 6 in decimal
\a - equivalent to \07, the ASCII code to use the computer bell
\b - equivalent to \010 (10 octal, 8 decimal), the ASCII code for "backspace" character
\t - equivalent to \011 (11 octal, 9 decimal), the ASCII code for tabulator
etc.
I don't know if you can change the way your debugger interprets the data. Worst case, you can always print the value after casting it to int.
(gdb)p static_cast<int>(len)
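The same effect appears when streaming a uint8_t to an output stream, not only in the debugger. The following is a minimal sketch, assuming that uint8_t is an alias for unsigned char (true on mainstream platforms); the helper names as_text_raw and as_text_number are hypothetical and only serve the illustration.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value unchanged. Where uint8_t is unsigned char, the stream
// writes one character with that code, not the decimal number.
std::string as_text_raw(std::uint8_t len) {
    std::ostringstream out;
    out << len;
    return out.str();
}

// Converts to int first, so the stream writes decimal digits.
std::string as_text_number(std::uint8_t len) {
    std::ostringstream out;
    out << static_cast<int>(len);
    return out.str();
}
```

With len equal to 7, as_text_raw yields the single bell character '\a', while as_text_number yields the text "7".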

Related

Why does char occupy 7 bits when the length is 1 byte ie 8 bits?

I've seen that the program below uses only 7 bits of memory to store the character, but everything I've read says that a char occupies 1 byte of memory, i.e. 8 bits.
Does a single character require 8 bits or 7 bits?
If it requires 8 bits, what will be stored in the other bit?
#include <iostream>
using namespace std;
int main()
{
char ch = 'a';
int val = ch;
while (val > 0)
{
(val % 2)? cout<<1<<" " : cout<<0<<" ";
val /= 2;
}
return 0;
}
Output:
1 0 0 0 0 1 1
The code below shows the gap in memory between the characters, which is 7 bits:
9e9 <-> 9f0 <->......<-> a13
#include <iostream>
using namespace std;
int main()
{
char arr[] = {'k','r','i','s','h','n','a'};
for(int i=0;i<7;i++)
cout<<&arr+i<<endl;
return 0;
}
Output:
0x7fff999019e9
0x7fff999019f0
0x7fff999019f7
0x7fff999019fe
0x7fff99901a05
0x7fff99901a0c
0x7fff99901a13
Your first code sample doesn't print leading zero bits. ASCII characters all have the upper bit set to zero, so you'll get at most seven bits printed if you use ASCII characters. Extended ASCII characters and UTF-8 use the upper bit for characters outside the basic ASCII character set.
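To see the leading zero bit, one option is to print a fixed width of 8 bits instead of dividing until the value reaches zero. This is only a sketch, assuming 8-bit bytes and an ASCII-compatible character set; bits_of is a hypothetical helper name.

```cpp
#include <bitset>
#include <string>

// Returns all 8 bits of a character, most significant bit first,
// including any leading zeros that a divide-until-zero loop skips.
std::string bits_of(char ch) {
    return std::bitset<8>(static_cast<unsigned char>(ch)).to_string();
}
```

For 'a' (97 in ASCII) this gives 01100001. The loop in the question prints the same seven low bits in the opposite order, least significant bit first, and stops before the final zero.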
Your second example is actually printing that each character is seven bytes long, which is obviously incorrect. If you change the array to hold some number of characters other than seven, you'll see different results.
&arr + i is equivalent to (&arr) + i. Because &arr is a pointer to char[7], which has a size of 7, the + i adds 7 * i bytes to the pointer. (&arr) + 1 points one past the end of the array; if you try printing the values these pointers point to, you'll get junk or a crash: **(&arr + i).
Your code should print static_cast<void*>(&arr[i]); you'll then see the pointer going up by one for each iteration. The cast to void* is necessary to stop the standard library from trying to print the pointer as a null-terminated string.
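The difference between the two pointer types can be measured directly. The sketch below is illustrative only; byte_distance, element_step and array_step are hypothetical names, and the array is the one from the question.

```cpp
#include <cstddef>

// Distance in bytes between two addresses.
std::ptrdiff_t byte_distance(const void* from, const void* to) {
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

// Step between consecutive elements: &arr[0] has type char*,
// so moving to the next element advances by sizeof(char), which is 1.
std::ptrdiff_t element_step() {
    char arr[] = {'k', 'r', 'i', 's', 'h', 'n', 'a'};
    return byte_distance(&arr[0], &arr[1]);
}

// Step between &arr and &arr + 1: &arr has type char (*)[7],
// so adding 1 advances by sizeof(arr), which is 7.
std::ptrdiff_t array_step() {
    char arr[] = {'k', 'r', 'i', 's', 'h', 'n', 'a'};
    return byte_distance(&arr, &arr + 1);
}
```

The 7-byte stride seen in the question's output comes from array_step, not from the size of a char.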
It has nothing to do with the space allocated for a char. You are simply converting the ASCII representation of the char into binary.
ASCII is a 7-bit character set. In C it is normally represented by an 8-bit char. If the highest bit in an 8-bit byte is set, it is not an ASCII character. The eighth bit was sometimes used for parity when information was exchanged between computers that used different encodings.
ASCII stands for American Standard Code for Information Interchange, with the emphasis on American. The character set could not represent things like Arabic letters or accented Latin letters (those with umlauts, for example).
Various schemes "extended" the ASCII set to use the extra 128 values that became available by using all 8 bits, which caused problems. Eventually Unicode came along, which can represent characters from nearly every writing system. But 8 bits became the standard size for a char.

What does '0' mean in a subtraction? [duplicate]

class Complex
{
public:
int a,b;
void input(string s)
{
int v1=0;
int i=0;
while(s[i]!='+')
{
v1=v1*10+s[i]-'0'; // <<---------------------------here
i++;
}
while(s[i]==' ' || s[i]=='+'||s[i]=='i')
{
i++;
}
int v2=0;
while(i<s.length())
{
v2=v2*10+s[i]-'0';
i++;
}
a=v1;
b=v2;
}
};
This is a class Complex; the function input takes a string and converts it into the integers a and b of the class.
Why is it necessary to subtract '0' in this code?
The characters representing the digits, '0' through '9', have values that are (and must be) sequential. For example, in the ASCII character set the '0' character is encoded with the value 48 (decimal), '1' is 49, '2' is 50 and so on, until '9', which is 57. Other encoding systems may use different actual values for the digits (for example, in EBCDIC, '0' is 240 and '9' is 249), but the C standard requires that they are contiguous and in increasing order. From §5.2.1 of the C11 (ISO/IEC 9899:201x) Draft:
In both the source and execution basic character sets, the value of
each character after 0 in the above list of decimal digits shall be
one greater than the value of the previous.
Thus, when you subtract the '0' character from another character that represents a digit, you get the numerical value of that digit (rather than its encoded value).
So, in the code:
int a = '6' - '0';
the value of the a will be 6 (and similarly for other digits).
The reason for not just using a value of (say) 48, rather than writing '0' is that the former would only work on systems that use that particular (i.e. ASCII) character encoding, whereas the latter will work on any compliant system.
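As a minimal sketch of the same idea outside the class, the function below accumulates a decimal number digit by digit. The name parse_leading_digits is hypothetical, and the sketch does no overflow checking.

```cpp
#include <string>

// Converts the run of decimal digits at the start of s into an int.
// Stops at the first character that is not a digit.
int parse_leading_digits(const std::string& s) {
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') {
            break;  // not a digit: stop parsing
        }
        // c - '0' is the numeric value of the digit (0 to 9),
        // whatever the actual code of '0' is on this platform.
        value = value * 10 + (c - '0');
    }
    return value;
}
```

For the input "123+4i" this returns 123, which is what the first loop in the question computes for v1.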
"What does '0' means in c++" - The symbol '0' designates a character constant for the digit 0, which, when interpreted as an ASCII character (which it almost certainly will be), has the numerical value 0x30 (or 48 in decimal). So, you are basically just subtracting 48.
I don't quite understand the logic of this function but I hope this will help:
'0' is a character literal for 0 in ASCII. The [] operator of string returns a character. So most likely s[i] - '0' is supposed to get you the digit stored in s[i] as a character. Example: '3' -'0' = 3. Note lack of ' around the 3.
The C and C++ standards require that the characters '0'..'9' be
contiguous and increasing. So to convert one of those characters to
the digit that it represents you subtract '0' and to convert a digit
to the character that represents it you add '0'.
In this case the goal is to convert the character into the integer digit that it represents.

Difference between converting int to char by (char) and by ASCII

I have an example:
int var = 5;
char ch = (char)var;
char ch2 = var+48;
cout << ch << endl;
cout << ch2 << endl;
I had some other code. (char) returned wrong answer, but +48 didn't. When I changed ONLY (char) to +48, then my code got corrected.
What is the difference between converting int to char by using (char) and +48 (ASCII) in C++?
char ch=(char)var; has the same effect as char ch=var; and assigns the numeric value 5 to ch. You're using ASCII (supported by all modern systems), and ASCII character code 5 represents Enquiry (ENQ), an old terminal control code. Perhaps some old timer has a clue what it did!
char ch2 = var+48; assigns the numeric value 53 to ch2 which happens to represent the ASCII character for the digit '5'. ASCII 48 is zero (0) and the digits all appear in the ASCII table in order after that. So 48+5 lands on 53 (which represents the character '5').
In C++ char is an integer type. The value is interpreted as representing an ASCII character but it should be thought of as holding a number.
Its numeric range is either [-128,127] or [0,255]. That's because C++ requires sizeof(char)==1 and all modern platforms have 8 bit bytes.
NB: C++ doesn't actually mandate ASCII, but again that will be the case on all modern platforms.
PS: I think it's an unfortunate artifact of C (inherited by C++) that sizeof(char)==1 and there isn't a separate fundamental type called byte.
A char is simply the base integral denomination in c++. Output statements, like cout and printf map char integers to the corresponding character mapping. On Windows computers this is typically ASCII.
Note that code 5 in ASCII maps to the Enquiry character, which has no printable representation, while code 53 maps to the printable character 5.
A generally accepted idiom to store a number 0-9 in a char as its digit character is: const char ch = var + '0'. It's important to note two points here:
The value of '0' differs between character sets (it is 48 in ASCII), but the C and C++ standards require the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 to be laid out contiguously and in order, so var + '0' works on any conforming implementation, whereas a hard-coded var + 48 only works with ASCII-compatible encodings
If var is outside the 0 - 9 range, var + '0' will map to something other than a digit character
A way to get the most significant digit of a non-negative number regardless of either point is to use:
const auto ch = to_string(var).front()
Generally char represents a number, as int does. Casting an int value to char doesn't provide its ASCII representation.
The ASCII codes for the digits range from 48 (== '0') to 57 (== '9'). So to get the printable digit you have to add '0' (or 48).
The difference is that casting with (char) explicitly converts the value to type char, whereas adding 48 does not.
It's important to note that an int is typically 32 bits and a char is typically 8 bits. This means that the number you can store in a char ranges from -128 to +127 (or 0 to 255, i.e. 2^8 - 1, if you use unsigned char) and in an int from −2,147,483,648 (−2^31) to 2,147,483,647 (2^31 − 1) (or 0 to 2^32 - 1 for unsigned).
Adding 48 to a value does not change its type to char.
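The two conversions discussed in these answers can be put side by side. This is a sketch only; cast_only and to_digit_char are hypothetical names, and the assert documents the assumption that the input is a single digit.

```cpp
#include <cassert>

// Keeps the numeric value: 5 stays 5, which is a control code, not a digit.
char cast_only(int var) {
    return static_cast<char>(var);
}

// Maps 0..9 to the characters '0'..'9' by offsetting from the code of '0'.
char to_digit_char(int var) {
    assert(var >= 0 && var <= 9);  // only single digits have a digit character
    return static_cast<char>('0' + var);
}
```

With var equal to 5, cast_only returns the character whose code is 5, while to_digit_char returns '5' (code 53 in ASCII).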

Why '1' and (char)1 are not equal when compared in c++?

My main goal is to convert an int to a char. I used (char)1 as a type cast, but it doesn't seem to work, judging by the following result.
I compare '1' and (char)1 in C++ with the following code:
if ('1' == (char)1)
{
return 1;
}
However, it seems that the comparison is either invalid because the types differ, or the two are actually not the same thing. I always thought converting integer 1 to a character is (char)1. Can anyone tell me how I can convert integer 1 to char '1'?
'1' is equal to (char)49 according to http://www.asciitable.com/
(char)1 is equal to SOH (start of heading) which is a non-printable character.
Because the ASCII equivalent of '1' is 49, not 1.
'1' == The character CODE value for the printable 1, traditionally ASCII value, but today, the code point value in whatever charset is used.
The old trick is (ch - '0') to get the numeric value.
Depending on the language you should use a conversion function for a full string.
C++ - stoi, stol, strtol or stringstream
C - atoi or atol (these work in C++ too)
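For whole numbers rather than single digits, the standard library functions mentioned above can be wrapped as follows. The wrapper names int_to_text and text_to_int are hypothetical; std::to_string and std::stoi are the standard C++11 functions.

```cpp
#include <string>

// int -> text, e.g. 1 becomes "1"; multi-digit and negative values work too.
std::string int_to_text(int value) {
    return std::to_string(value);
}

// text -> int. std::stoi throws std::invalid_argument if the text does not
// start with a number, and std::out_of_range if the value does not fit.
int text_to_int(const std::string& text) {
    return std::stoi(text);
}
```

To get the single character '1' from the integer 1, int_to_text(1)[0] or the expression 1 + '0' both work.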
As ibiza said, char(49) is in fact what '1' is. This is because character values are taken from the ASCII table.
Because when you do (char)X with X a number, you are just converting X into the range of a char, either -128 to 127 or 0 to 255 (like a modulo).
For example, (char)300 gives 44 (because 300 % 256 = 44) and (char)1 gives 1. As said in the other comments, 1 is the ASCII equivalent of SOH (Start of Heading), and not of the character '1'.
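The modulo behaviour is easiest to check with unsigned char, where the conversion is defined as reduction modulo 2^N. The sketch below assumes 8-bit bytes; narrow_to_byte is a hypothetical name. For a plain or signed char, an out-of-range value gives an implementation-defined result before C++20.

```cpp
// Converts an int to unsigned char. Values outside 0..255 wrap around
// modulo 256 on platforms with 8-bit bytes (assumed here).
unsigned char narrow_to_byte(int x) {
    return static_cast<unsigned char>(x);
}
```

Here narrow_to_byte(300) is 44 and narrow_to_byte(1) is 1, which is not equal to '1'.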

what does that mean, C programm for RLE

I am new to C so I do not understand what is happening in this line:
out[counter++] = recurring_count + '0';
What does +'0' mean?
Additionally, can you please help me by writing comments for most of the code? I don't understand it well, so I hope you can help me. Thank you.
#include "stdafx.h"
#include<iostream>
void encode(char mass[], char* out, int size)
{
int counter = 0;
int recurring_count = 0;
for (int i = 0; i < size - 1; i++)
{
if (mass[i] != mass[i + 1])
{
recurring_count++;
out[counter++] = mass[i];
out[counter++] = recurring_count + '0';
recurring_count = 0;
}
else
{
recurring_count++;
}
}
}
int main()
{
char data[] = "yyyyyyttttt";
int size = sizeof(data) / sizeof(data[0]);
char * out = new char[size + 1]();
encode(data, out, size);
std::cout << out;
delete[] out;
std::cin.get();
return 0;
}
It adds the character encoding value of '0' to the value in recurring_count. If we assume ASCII encoded characters, that means adding 48.
This is common practice for making a "readable" digit from an integer value in the range 0..9 - in other words, convert a single digit number to an actual digit representation in a character form. And as long as all digits are "in sequence" (only digits between 0 and 9), it works for any encoding, not just ASCII - so a computer using EBCDIC encoding would still have the same effect.
recurring_count + '0' is a simple way of converting the int recurring_count value into an ASCII character.
As you can see over on Wikipedia, the ASCII character code of 0 is 48. Adding the value to that takes you to the corresponding character code for that value.
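Since the question also asks for a commented version, here is a sketch of a run-length encoder that produces the same output format. It is not the asker's code: it uses std::string and std::to_string instead of raw buffers, so counts above 9 are written correctly, which count + '0' cannot do. The name rle_encode is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Run-length encodes text: each run of equal characters becomes the
// character followed by the run length in decimal, e.g. "aaab" -> "a3b1".
std::string rle_encode(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        // Count how many times text[i] repeats starting at position i.
        std::size_t run = 1;
        while (i + run < text.size() && text[i + run] == text[i]) {
            ++run;
        }
        out += text[i];              // the repeated character
        out += std::to_string(run);  // its count, possibly several digits
        i += run;                    // jump to the start of the next run
    }
    return out;
}
```

For the input "yyyyyyttttt" from the question this yields "y6t5".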
You see, computers may not really know about letters, digits, symbols; like the letter a, or the digit 1, or the symbol ?. All they know is zeroes and ones. True or not. To exist or not.
Here's one bit: 1
Here's another one: 0
These two are the only things that a bit can be: existence or absence.
Yet computers can know about, say, 5. How? Well, 5 is 5 only in base 10; in base 4, it would be 11, and in base 2, it would be 101. You don't have to know about base 4, but let's examine base 2, to make sure you know about that:
How would you represent 0 if you had only 0s and 1s? 0, right? You probably would also represent the 1 as 1. Then for 2? Well, you'd write 2 if you could, but you can't... So you write 10 instead.
This is exactly analogous to what you do while advancing from 9 to 10 in base 10. You cannot write 10 inside a single digit, so you rather reset the last digit to zero, and increase the next digit by one. Same thing while advancing from 19 to 20, you attempt to increase 9 by one, but you can't, because there is no single digit representation of 10 in base 10, so you rather reset that digit, and increase the next digit.
This is how you represent numbers with just 0s and 1s.
Now that you have numbers, how would you represent letters and symbols and character-digits, like the 4 and 3 inside the silly string L4M3 for example? You could map them; map them so, for example, that the number 1 would from then on represent the character A, and then 2 would represent B.
Of course, it would be a little problematic; because when you do that the number 1 would represent both the number 1 and the character A. This is exactly the reason why if you write...
printf( "%d %c", 65, 65 );
You will have the output "65 A", provided that the environment you're on is using ASCII encoding, because in ASCII 65 has been mapped to represent A when interpreted as a character. A full list can be found over there.
In short
'A' with single quotes around delivers the message that, "Hey, this A over here is to receive whatever the representative integer value of A is", and in most environments it will just be 65. Same for '0', which evaluates to 48 with ASCII encoding.
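The printf line above can be wrapped in a small function to try other values. This is a sketch assuming an ASCII-compatible environment; show_both is a hypothetical name.

```cpp
#include <cstdio>
#include <string>

// Formats the same integer twice: once as a decimal number (%d)
// and once as the character with that code (%c).
std::string show_both(int value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%d %c", value, value);
    return buf;
}
```

With ASCII, show_both(65) gives "65 A" and show_both('0') gives "48 0": the stored number is the same in both columns, only the interpretation differs.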