Understanding how to create atoi; How are characters compared?

I am trying to improve my understanding of C++, pointer arithmetic especially. I use atoi pretty often, but I have rarely given thought as to how it works. Looking up how it is done, I understand it mostly, but there is one thing that I am confused about.
Here is an example of a solution I have found online:
int atoi( char* pStr )
{
    int iRetVal = 0;
    if ( pStr )
    {
        while ( *pStr && *pStr <= '9' && *pStr >= '0' )
        {
            iRetVal = (iRetVal * 10) + (*pStr - '0');
            pStr++;
        }
    }
    return iRetVal;
}
I think the main reason I have had a hard time grasping how atoi has been done in the past is the way characters are compared. The "while" statement is saying: while the character exists, and the character is less-than-or-equal-to 9, and it is greater-than-or-equal-to 0, then do stuff. This statement says two things to me:
Characters can be compared to other characters logically (but what is the returned value?).
Before I looked into this I suppose I knew it subconsciously but I never actually thought about it, but a '5' character is "smaller" than a '6' character in the same way that 5 is less than 6, so you can compare the characters as integers, essentially (for this intent).
Somehow while (*sPtr) and *sPtr != 0 are different. This seems obvious to me, but I find that I cannot put it into words, which means I know this is true but I do not understand why.
Edit: I have no idea what the *pStr - '0' part would do.
Any help making sense of these observations would be very... helpful! Thanks!

while the character exists
No, not really. It says "while the character is not 0 (or '\0')". Basically, the ASCII character '\0' indicates the end of a "C" string. Since you don't want to go past the end of a character array (and the exact length is not known), every character is tested for '\0'.
Characters can be compared to other characters logically
That's right. A character is nothing but a number, at least in ASCII encoding. In ASCII, for instance, '0' corresponds to a decimal value of 48, '1' is 49, 'Z' is 90 (you can look these up in any ASCII table). So yeah, you can compare characters just like you compare integers.
Somehow while (*sPtr) and *sPtr != 0 are different.
Not different at all. A decimal 0 is a special ASCII symbol (nul) that is used to indicate the end of "C" string, as I mentioned in the beginning. You cannot see or print (nul), but it's there.
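To make the two points above concrete, here is a minimal sketch. The function names are made up for illustration and are not part of the question's code. The first helper compares characters by their integer codes, and the other two show that the implicit and explicit end-of-string tests walk the string the same way.

```cpp
#include <cstddef>

// Returns true when c lies in the range '0'..'9'. The comparison works on
// the characters' integer codes, just like the condition in the question.
bool is_decimal_digit(char c) {
    return c >= '0' && c <= '9';
}

// Counts characters up to the terminating '\0' using the implicit test.
std::size_t length_implicit(const char* s) {
    std::size_t n = 0;
    while (*s) {
        ++n;
        ++s;
    }
    return n;
}

// Same loop with the test written out; it stops at the same place.
std::size_t length_explicit(const char* s) {
    std::size_t n = 0;
    while (*s != 0) {
        ++n;
        ++s;
    }
    return n;
}
```

Both length functions return the same count for any null-terminated string, which is the sense in which while (*s) and while (*s != 0) are the same test.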

The *pStr - '0' converts the character to its numeric value, e.g. '1' - '0' = 1.
The while loop checks if we are not at the end of the string and that we have a valid digit.

A character in C is represented simply as an ASCII value. Since all the digits are consecutive in ASCII (i.e. 0x30 == '0' and 0x39 == '9' with all the other digits in between), you can determine if a character is a digit by simply doing a range check, and you can get the digit's value by subtracting '0'.
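The range check and the subtraction described above fit in one small helper. This is only a sketch; digit_value is a name invented here, and the -1 return for non-digits is an arbitrary choice.

```cpp
// Returns the numeric value of a digit character, or -1 if c is not a digit.
// Hypothetical helper: the -1 sentinel is an assumption made for this sketch.
int digit_value(char c) {
    if (c < '0' || c > '9') {
        return -1;  // outside the contiguous digit range
    }
    return c - '0';  // distance from '0' is the digit's value
}
```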

Note that the posted implementation of atoi is not complete. The real atoi can process negative values.
Somehow while (*sPtr) and *sPtr != 0 are different.
These two expressions are the same. When used as condition, *sPtr is considered true when value stored at address sPtr is not zero, and *sPtr != 0 is true when value stored at address sPtr is not zero. Difference is when used somewhere else, then second expression evaluates to true or false, but the first one evaluates to stored value.
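As a rough sketch of what a more complete version could look like, the following adds leading-whitespace skipping and an optional sign to the loop from the question. simple_atoi is a hypothetical name, this is not the library's implementation, and overflow is deliberately not handled.

```cpp
#include <cctype>

// Hypothetical helper, not the standard atoi: skips leading whitespace,
// accepts one optional sign, then accumulates digits. No overflow handling.
int simple_atoi(const char* s) {
    if (!s) {
        return 0;  // null pointer: nothing to parse
    }
    while (std::isspace(static_cast<unsigned char>(*s))) {
        ++s;
    }
    bool negative = false;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        ++s;
    }
    int value = 0;
    // '\0' is below '0', so the range check also stops at the terminator.
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (*s - '0');
        ++s;
    }
    return negative ? -value : value;
}
```

With this sketch, simple_atoi("  -42abc") yields -42: parsing stops at the first character that is not a digit.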

C-style strings are null-terminated.
Therefore:
while ( *pStr && *pStr <= '9' && *pStr >= '0' )
This tests:
*pStr that we have not yet reached the end of the string and is equivalent to writing *pStr != 0 (note without the single quote, ASCII value 0, or NUL).
*pStr >= '0' && *pStr <= '9' (perhaps more logically) that the character at *pStr is in the range '0' (ASCII value 48) to '9' (ASCII value 57), that is a digit.

The representation of '0' in memory is 0x30 and the representation of '9' is 0x39. This is what the computer sees, and when it compares them with logical operators, it uses these values. The nul-termination character is represented as 0x00 (aka zero). The key here is that chars are just like any other int to the machine.
Therefore, the while statement is saying:
While the char we are examining is valid (aka NOT zero and therefore NOT a nul-terminator), and its value (as the machine sees it) is less than or equal to 0x39 and its value is greater than or equal to 0x30, proceed.
The body of the while loop then calculates the appropriate value to add to the accumulator based on the integer's position in the string. It then increments the pointer and goes again. Once it's done, it returns the accumulated value.

This chunk of code is using ASCII values to accumulate an integer tally of its alpha equivalent.
In regards to your first numbered bullet, it seems quite trivial that when comparing anything the result is boolean. Although I feel like you were trying to ask if the compiler actually understands "characters". To my understanding, this comparison is done using the ASCII values of the characters, i.e. 'a' < 'b' is interpreted as (97 < 98).
(Note that it is also easy to see that ASCII values are used when you compare 'a' and 'A', as 'A' is less than 'a'.)
Concerning your second bullet, the while loop is checking that the current character is not NUL (ASCII value 0). The && operator produces false as soon as a false operand is encountered, so the range comparisons are not evaluated on the NUL char. As for the rest of the while loop, it is doing ASCII comparison as I mentioned about bullet 1. It is just checking whether or not the given character corresponds to an ASCII value that represents a digit, i.e. between '0' and '9' (or ASCII: between 48 and 57).
LASTLY
the (*ptr-'0') is the most interesting part in my opinion. This statement returns an integer between 0 and 9 inclusive. If you take a look at an ascii chart you will notice the numbers 0 through 9 are beside each other. So imagine '3'-'0' which is 51 - 48 and produces 3! :D So in simpler terms, it is doing ascii subtraction and returning the corresponding integer value. :D
Cheers, and I hope this explains a bit

Let's break it down:
if ( pStr )
If you pass atoi a null pointer, pStr will be 0x00 - and this will be false. Otherwise, we have something to parse.
while ( *pStr && *pStr <= '9' && *pStr >= '0' )
Ok, there's a bunch of things going on here. *pStr means we check if the value pStr is pointing to is 0x00 or not. If you look at an ASCII table, the ASCII for 0x00 is 'null' and in C/C++ the convention is that strings are null terminated (as opposed to Pascal and Java style strings, which tell you their length then have that many characters). So, when *pStr evaluates to false, our string has come to an end and we should stop.
*pStr <= '9' && *pStr >= '0' works because the values for the ASCII characters '0' '1' '2' '3' '4' '5' '6' '7' '8' '9' are all contiguous - '0' is 0x30 and '9' is 0x39, for example. So, if pStr's pointed to value is outside this range, then we're not parsing an integer and we should stop.
iRetVal = (iRetVal * 10) + (*pStr - '0');
Because the ASCII codes of the numerals are contiguous, it so happens that if we know we have a numeral, *pStr - '0' evaluates to its numerical value - 0 for '0' (0x30 - 0x30), 1 for '1' (0x31 - 0x30)... 9 for '9'. So we shift our number up and slide in the new place.
pStr++;
By adding one to the pointer, the pointer points to the next address in memory - the next character in the string we are converting to an integer.
Note that this function will screw up if the string is not null terminated, it has any non numerals (such as '-') or if it is non-ASCII in any way. It's not magic, it just relies on these things being true.
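To see the "shift up and slide in" step in action, here is a small sketch that records the accumulator after every digit. accumulate_steps is a name invented for this illustration. For the input "123" the accumulator goes 0*10+1 = 1, then 1*10+2 = 12, then 12*10+3 = 123.

```cpp
#include <vector>

// Hypothetical tracing helper: runs the same loop as the question's atoi
// and stores the accumulator value after each digit is consumed.
std::vector<int> accumulate_steps(const char* s) {
    std::vector<int> steps;
    int value = 0;
    while (s && *s >= '0' && *s <= '9') {
        value = value * 10 + (*s - '0');  // shift left one decimal place, add digit
        steps.push_back(value);
        ++s;  // move to the next character
    }
    return steps;
}
```

Note that in this sketch a non-digit such as '-' simply ends the loop, so "12-3" produces only the two steps for "12".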

Related

Incrementing a uint8_t variable, strange outcome

In a C++ class I've the following code/while loop:
uint8_t len = 0;
while (*s != ',') {
len = (uint8_t)(len + 1u);
++s;
}
return (len);
The outcome should be a value between 0 and max 20.
Because I received a strange outcome, I started debugging. When I step through this
I get the following values for the variable len:
‘\01’, ‘\02’, ‘\03’, ‘\04’, ‘\05’, ‘\06’, ‘\a’, ‘\b’, ‘\t’
I don’t understand the change from ‘\06’ to ‘\a’!
Can somebody explain this? I expect that the Len value is simply increased by 1 until character array pointer s hits the ',' char.

The values are correct, but your debugger interprets them as char type, not an integer type.
You can see escape sequences used in C++ here (and the corresponding values in ASCII).
\01 - 1 in octal, 1 in decimal
\02 - 2 in octal, 2 in decimal
...
\06 - 6 in octal, 6 in decimal
\a - equivalent to \07, the ASCII code to use the computer bell
\b - equivalent to \010 (10 octal, 8 decimal), the ASCII code for "backspace" character
\t - equivalent to \011 (11 octal, 9 decimal), the ASCII code for tabulator
etc.
I don't know if you can change the way your debugger interprets the data. Worst case, you can always print the value after casting it to int.
(gdb)p static_cast<int>(len)
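The same effect shows up outside the debugger when streaming the value. The sketch below assumes that uint8_t is an alias for unsigned char, which is the usual case but not something the question states; the helper names are invented here.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value as-is. Assuming uint8_t is unsigned char, the stream
// writes the character with that code rather than a decimal number.
std::string show_as_char(std::uint8_t v) {
    std::ostringstream out;
    out << v;
    return out.str();
}

// Converts to int first, so the stream writes the decimal number.
std::string show_as_number(std::uint8_t v) {
    std::ostringstream out;
    out << static_cast<int>(v);
    return out.str();
}
```

Under that assumption, show_as_char(7) produces the bell character "\a" while show_as_number(7) produces "7", which mirrors the jump from '\06' to '\a' seen in the debugger.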

Value of '\0' in OpenCV

When iterating through and finding the uchar of every color of every pixel I noticed I was getting '\0' every so often which would show up as 0 when cast to an int. If what I remember holds true, '\0' isn't supposed to be 0 but rather like 140? Is this the correct thinking?

If what I remember holds true, '\0' isn't supposed to be 0 but rather like 140? Is this the correct thinking?
No.
A character literal of '\0' is numeric 0 written as an octal escape sequence. An octal escape is a backslash followed by one to three octal digits, so the compiler allows \0, \00 and \000 to be used for numeric 0.
The character '0' is the ASCII character 0 (Unicode codepoint U+0030 DIGIT ZERO), which has a numeric value of 48.
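A quick way to check the difference yourself is sketched below. The function names are made up for this example, and the value 48 for '0' holds on ASCII-based systems only.

```cpp
// True when all three octal escapes denote the numeric value 0.
constexpr bool nul_is_zero() {
    return '\0' == 0 && '\00' == 0 && '\000' == 0;
}

// Numeric code of the digit character '0'. This is 48 on ASCII systems;
// other encodings may give a different value.
constexpr int code_of_digit_zero() {
    return '0';
}
```

So a uchar pixel channel that the debugger shows as '\0' really does hold 0, and casting it to int gives 0.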

What does '0' mean in a subtraction? [duplicate]

class Complex
{
public:
    int a,b;
    void input(string s)
    {
        int v1=0;
        int i=0;
        while(s[i]!='+')
        {
            v1=v1*10+s[i]-'0'; // <<---------------------------here
            i++;
        }
        while(s[i]==' ' || s[i]=='+'||s[i]=='i')
        {
            i++;
        }
        int v2=0;
        while(i<s.length())
        {
            v2=v2*10+s[i]-'0';
            i++;
        }
        a=v1;
        b=v2;
    }
};
This is a class Complex, and the function input takes a string and converts it into the integers a and b of the class.
What is the purpose of subtracting '0' in this code?

The characters representing the digits, '0' thru '9', have values that are (and must be) sequential. For example, in the ASCII character set the '0' character is encoded with the value 48 (decimal), '1' is 49, '2' is 50 and so on, until '9', which is 57. Other encoding systems may use different actual values for the digits (for example, in EBCDIC, '0' is 240 and '9' is 249), but the C standard requires that they are consecutive and increasing. From §5.2.1 of the C11 (ISO/IEC 9899:201x) Draft:
In both the source and execution basic character sets, the value of
each character after 0 in the above list of decimal digits shall be
one greater than the value of the previous.
Thus, when you subtract the '0' character from another character that represents a digit, you get the numerical value of that digit (rather than its encoded value).
So, in the code:
int a = '6' - '0';
the value of a will be 6 (and similarly for other digits).
The reason for not just using a value of (say) 48, rather than writing '0' is that the former would only work on systems that use that particular (i.e. ASCII) character encoding, whereas the latter will work on any compliant system.

"What does '0' means in c++" - The symbol '0' designates a single character (constant) with the value 0, which, when interpreted as an ASCII character (which it will be) has the numerical value 0x30 (or 48 in decimal). So, you are basically just subtracting 48.

I don't quite understand the logic of this function but I hope this will help:
'0' is a character literal for 0 in ASCII. The [] operator of string returns a character. So most likely s[i] - '0' is supposed to get you the digit stored in s[i] as a character. Example: '3' -'0' = 3. Note lack of ' around the 3.

The C and C++ standards require that the characters '0'..'9' be
contiguous and increasing. So to convert one of those characters to
the digit that it represents you subtract '0' and to convert a digit
to the character that represents it you add '0'.
In this case the goal is to convert the character into the integer digit that it represents.
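The opposite direction, adding '0' to turn a digit into its character, can be sketched the same way. to_decimal_string is a hypothetical helper written for this illustration, building a number's text one digit at a time.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: builds the decimal text of n by adding '0' to each
// digit. Digits come out least significant first, so the result is reversed.
std::string to_decimal_string(unsigned int n) {
    std::string out;
    do {
        out.push_back(static_cast<char>('0' + n % 10));  // digit -> character
        n /= 10;
    } while (n != 0);
    std::reverse(out.begin(), out.end());
    return out;
}
```

This is roughly the inverse of the parsing loops in the question: those subtract '0' while reading characters, this adds '0' while producing them.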

Why '1' and (char)1 are not equal when compared in c++?

My main goal is to convert int to char type. I used (char)1 to type cast, but it doesn't seem to work, as the following result shows.
I compare '1' and (char)1 in C++ with the following code:
if ('1' == (char)1)
{
return 1;
}
However, it seems that the comparison is either invalid due to different variable type or they are actually not the same thing. I always thought converting integer 1 to character is (char)1. Can anyone tell me how I can convert integer 1 to char '1'?

'1' is equal to (char)49 according to http://www.asciitable.com/
(char)1 is equal to SOH (start of heading) which is a non-printable character.

Because the ASCII equivalent of '1' is 49, not 1.

'1' == The character CODE value for the printable 1, traditionally ASCII value, but today, the code point value in whatever charset is used.
The old trick is (ch - '0') to get the numeric value.
Depending on the language you should use a conversion function for a full string.
C++ - stoi, stol or strtol or stringstream
C - atoi or atol (these work in C++ too)

As ibiza said, char(49) is in fact what '1' is. This is because char values are interpreted using the ASCII character set.

Because when you do (char)X with X a number, you are just converting X into the range of a char, either -128 to 127 or 0 to 255 (like a modulo).
For example, (char)300 gives 44 (because 300 % 256 = 44) and (char)1 gives 1. As said in the other comments, 1 is the ASCII equivalent of SOH (Start of Heading), and not of the character '1'.
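Putting the answers together, a minimal sketch of the conversion the question asks for might look like this. The helper names are invented here, and the '1' == 49 relationship assumes an ASCII-based system.

```cpp
// Converts a single digit 0..9 to its character by offsetting from '0'.
// Hypothetical helper; the caller is assumed to pass a value in 0..9.
constexpr char digit_char(int d) {
    return static_cast<char>('0' + d);
}

// A plain cast keeps the numeric code instead of producing the digit glyph.
constexpr bool cast_equals_literal() {
    return static_cast<char>(1) == '1';
}

// Conversion to unsigned char wraps modulo 256, so 300 becomes 44.
constexpr int wrapped_300() {
    return static_cast<unsigned char>(300);
}
```

Here digit_char(1) gives '1', while cast_equals_literal() is false because (char)1 is the SOH control character.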

Extra numbers being appended to string, don't know why?

Number(string binary)
{
    int raw_string_int[size];
    char raw_string_char[size];
    strcpy(raw_string_char,binary.c_str());
    printf("Raw String is %s",raw_string_char);
    for (int i=0;i<size;i++)
    {
        raw_string_int[i] = int(raw_string_char[i]);
        printf("%i\n",int(raw_string_char[i]));
        if (raw_string_int[i] != 0 || raw_string_int[i] != 1)
        {
            printf("ERROR NOT A BINARY NUMBER\n");
            exit(0);
        }
    }
Hi, I'm entering 0001 as binary at the command prompt, but raw_string_char is being appended with two extra numbers. Can anyone explain to me why this is? Is the carriage return being brought in as a char?
Here is What I'm getting at the command prompt:
./test
0001
Raw String is 000148
ERROR NOT A BINARY NUMBER

You forgot the "\n" in your first printf. The 48 is from the second printf, and is the result of casting the first '0' (ASCII 0x30 = 48) to an int.
To convert a textual 0 or 1 to the corresponding integer, you need to subtract 0x30.

Your assumption that char('0') == int(0) and char('1') == int(1) just doesn't hold. In ASCII these characters have the values of 48 and 49.
What you should do to get integer values of digit characters is subtract '0' instead of simple casting (raw_string_int[x] = raw_string_char[x] - '0';).
I think you have conceptual problems though. The array can't be full of valid values to the end (the corresponding C-string would at least contain a null-terminator, which is not a valid binary character). You can use the string's size() method to find out how many characters the string actually contains. And naturally you are risking buffer overflows, should the binary string contain size characters or more.
If the intention is to check if the input is a valid binary number, why can't you test the original string, why would you copy data around to two more arrays?
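As one possible way to test the original string directly, the sketch below validates each character and builds the value in the same pass, with no intermediate arrays. parse_binary and its out-parameter are hypothetical choices for this example, and very long inputs would silently wrap around because the accumulator is an unsigned int.

```cpp
#include <string>

// Hypothetical helper: returns false if text is empty or contains anything
// other than '0' and '1'; otherwise stores the parsed value in `value`.
bool parse_binary(const std::string& text, unsigned int& value) {
    if (text.empty()) {
        return false;
    }
    unsigned int result = 0;
    for (char c : text) {
        if (c != '0' && c != '1') {
            return false;  // not a binary digit
        }
        // Shift left one binary place and add the digit's numeric value.
        result = result * 2 + static_cast<unsigned int>(c - '0');
    }
    value = result;
    return true;
}
```

For the input "0001" this returns true and sets the value to 1, while "0012" is rejected.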

You're printing every character in raw_string_char. C-style strings go until the first zero character (that's '\0', not '0').
Change to for (int i = 0; i < size && raw_string_char[i] != 0; i++), so the bounds check happens before the array is read.

Like others said, '0' is converted to an integer 48. You don't really need to convert the C++ string to a C style string. You can use iterators or the index operator [] on the C++ string. You also need to use logical AND && rather than logical OR || in your if statement.
#include <cstdio>
#include <string>
void Number(std::string binary) {
    for(std::string::const_iterator i = binary.begin(); i != binary.end(); i++ )
        if( *i != '0' && *i != '1')
        {
            printf("ERROR NOT A BINARY NUMBER\n");
            return;
        }
}
int main() {
    Number("0001");
}

The raw_string_char array is never initialized; the extra characters are possibly due to this. Use memset to initialize the array.
memset(raw_string_char, 0, size);
