Non-integer characters in a string and using atoi - c++

If there are non-numeric characters in a string and you call atoi (I'm assuming wtoi will do the same), how will atoi treat the string?
Let's say, for example, I have the following strings:
1. "20234543"
2. "232B"
3. "B"
I'm sure that 1 will return the integer 20234543. What I'm curious about is whether 2 will return 232 (that's what I need to solve my problem). Also, 3 should not return a value. Are these beliefs false? And if 2 does act as I believe, how does atoi handle an e character at the end of the string (which is typically used in exponential notation)?

You can test this sort of thing yourself. I copied the code from the cplusplus.com reference site. It looks like your intuition about the first two examples is correct, but the third example returns 0. 'E' and 'e' are treated just like 'B' in the second example.
So the rules are
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.

According to the standard, "The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined." (7.20.1, Numeric conversion functions in C99).
So, technically, anything could happen. Even for the first case, since INT_MAX is guaranteed to be at least 32767, and since 20234543 is greater than that, it could fail as well.
For better error checking, use strtol:
#include <errno.h>
#include <stdlib.h>

int main(void)
{
    const char *s = "232B";
    char *eptr;
    long value = strtol(s, &eptr, 10); /* 10 is the base */
    /* now, value is 232, eptr points to "B" */
    s = "20234543";
    value = strtol(s, &eptr, 10);
    s = "123456789012345";
    errno = 0; /* strtol sets errno only on error, so clear it before the call */
    value = strtol(s, &eptr, 10);
    /* If there was no overflow (e.g. where long is 64 bits), value will contain 123456789012345,
       otherwise, value will contain LONG_MAX and errno will be ERANGE */
    return 0;
}
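Putting those checks together, here is a minimal sketch of a validating replacement for atoi. The name parse_int and its return-a-bool interface are hypothetical, not part of any library; it uses only the standard strtol and rejects empty input, trailing characters, and values outside the range of int.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parses the whole string s as a base-10 int.
// Returns true and stores the result in *out only if s holds an optionally
// whitespace-prefixed, optionally signed run of digits that fits in an int.
bool parse_int(const char* s, int* out) {
    char* end = nullptr;
    errno = 0;                                   // strtol only sets errno on failure
    long value = std::strtol(s, &end, 10);
    if (end == s) return false;                  // no digits at all, e.g. "B"
    if (*end != '\0') return false;              // trailing characters, e.g. "232B"
    if (errno == ERANGE) return false;           // out of range even for long
    if (value < INT_MIN || value > INT_MAX) return false;  // fits in long but not in int
    *out = static_cast<int>(value);
    return true;
}
```

With it, "232B" and "B" are reported as failures instead of silently becoming 232 and 0; if you actually want the 232 from "232B", drop the trailing-characters check and inspect end instead.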
If you need to parse numbers with "e" in them (exponential notation), then you should use strtod. Of course, such numbers are floating-point, and strtod returns double. If you want to make an integer out of it, you can do a conversion after checking for the correct range.
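A sketch of that strtod route, assuming you only want to accept values that are whole numbers within the range of int. parse_double_to_int is a hypothetical name, and rejecting fractional values such as 3.7 is a design choice of this sketch (you could round or truncate instead).

```cpp
#include <cerrno>
#include <climits>
#include <cmath>
#include <cstdlib>

// Hypothetical helper: parses s (which may use exponential notation, e.g. "2.5e3")
// with strtod, then converts to int only if the value is integral and in range.
bool parse_double_to_int(const char* s, int* out) {
    char* end = nullptr;
    errno = 0;
    double d = std::strtod(s, &end);
    if (end == s || *end != '\0') return false;          // no number, or trailing characters
    if (errno == ERANGE) return false;                   // overflow/underflow inside strtod
    if (!(d >= INT_MIN && d <= INT_MAX)) return false;   // out of int range (also rejects NaN)
    if (d != std::trunc(d)) return false;                // this sketch rejects fractions
    *out = static_cast<int>(d);
    return true;
}
```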

atoi first skips any leading whitespace, then accepts an optional '+' or '-' sign (which selects the sign of the result), and then reads digits until it hits a character that isn't a digit. It returns 0 if it saw no digits.
So to answer your specific questions: 1 returns 20234543. 2 returns 232. 3 returns 0. The character 'e' is not a digit, so once atoi has read some digits it stops at the 'e' and returns the value accumulated so far.
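The rules above can be written out as a small function. This is only a sketch of the described behaviour, not the C library's actual source; the name sketch_atoi is hypothetical, and where the real atoi has undefined behaviour on overflow, this sketch clamps to INT_MIN/INT_MAX (the behaviour the cplusplus.com page describes).

```cpp
#include <cctype>
#include <climits>

// Sketch of atoi's parsing rules: skip leading whitespace, optional sign, then digits.
int sketch_atoi(const char* s) {
    // 1. skip leading whitespace only
    while (std::isspace(static_cast<unsigned char>(*s))) ++s;
    // 2. an optional '+' or '-' selects the sign
    bool negative = false;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        ++s;
    }
    // 3. read digits until the first non-digit ("232B" stops at 'B', "2e5" stops at 'e')
    long long value = 0;
    const long long cap = static_cast<long long>(INT_MAX) + 1;  // keeps the accumulator bounded
    while (std::isdigit(static_cast<unsigned char>(*s))) {
        value = value * 10 + (*s - '0');
        if (value > cap) value = cap;
        ++s;
    }
    if (negative) value = -value;
    // clamp instead of undefined behaviour (a choice of this sketch)
    if (value > INT_MAX) return INT_MAX;
    if (value < INT_MIN) return INT_MIN;
    return static_cast<int>(value);  // 0 if no digits were seen
}
```

Note that whitespace is only skipped before the number: "12 34" yields 12, because the space after the digits is just another non-digit.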

If atoi encounters a non-number character, it returns the number formed up until that point.

I tried using atoi() in a project, but it wouldn't work if there were non-digit characters in the mix that came before the digit characters: it returns zero. It doesn't seem to mind if they come after the digits, since it simply stops at the first non-digit.
Here's a pretty bare-bones string-to-int converter I wrote that doesn't have that problem, because it skips every non-digit character wherever it appears (so "a1b2" becomes 12). It's bare-bones in that it doesn't handle negative numbers and has no error handling or overflow checking, but it might be helpful in specific cases.
#include <string>

// Bare-bones: skips every non-digit character, ignores signs, no overflow checking.
int stringToInt(const std::string& newIntString)
{
    unsigned int dataElement = 0;
    for (char ch : newIntString)
    {
        if (ch >= '0' && ch <= '9')
        {
            // shift the digits read so far one decimal place left and append this one
            dataElement = dataElement * 10 + static_cast<unsigned int>(ch - '0');
        }
    }
    return static_cast<int>(dataElement);
}

I ran into this atoi behaviour myself while learning to code, when I wrote a program with a function that calculates the factorial of an integer given as a command-line parameter.
atoi returns 0 if the value is not numeric at all, and "3asdf" returns 3. As we all know, C passes command-line parameters as an array of char pointers.
I was told that the book "Linux Hater's Handbook" discusses why programmers don't really like atoi: it's kind of foolish, because there is no way to check whether the given input was valid.
Someone asked me why I didn't bother to use strtol from stdlib.h and gave me an example adapted to my recursive factorial function. But I don't care if the factorial result exceeds the range of int (when the base number is too large); in my program that just shows up as negative values (strictly speaking, signed integer overflow is undefined behaviour).
I solved my problem by first checking whether the user's input parameter is truly numeric, and only if it is, converting it with atoi and calculating the factorial.
Using the isdigit() function from ctype.h, the check looks like this:
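#include <ctype.h>
#include <string.h>

/* Returns 0 if every character of str is a digit, 1 otherwise; call it as checkInput(argv[1]). */
int checkInput(const char *str) {
    size_t len = strlen(str);
    if (len == 0) return 1; /* an empty argument is not a number */
    for (size_t x = 0; x < len; ++x)
    {
        /* cast to unsigned char: isdigit's argument must be representable as unsigned char or EOF */
        if (!isdigit((unsigned char)str[x])) return 1;
    }
    return 0;
}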
A friend on another Linux programming forum told me that with strtol I could handle out-of-range values, or even parse into an unsigned long type so that -0 and other negative values are not accepted. (Note that strtoul does not reject a minus sign by itself: it negates the value using unsigned wraparound, so the sign still has to be checked explicitly.)
What matters in my code is checking whether a character is not numeric. Checking it the negated way, the function reports failure as soon as it meets the first non-numeric character in the string (or char array, in C).
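If the negative results do matter, one way to avoid them is to check before each multiplication whether it would overflow. This is a swapped-in technique, not the strtol example mentioned above; checked_factorial is a hypothetical name, and the sketch is iterative and uses unsigned long long, so 20! is the largest result it can return where that type is 64 bits.

```cpp
#include <climits>

// Hypothetical helper: computes n! into *out, returning false instead of
// overflowing when the result would not fit in an unsigned long long.
bool checked_factorial(unsigned n, unsigned long long* out) {
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i) {
        if (result > ULLONG_MAX / i) return false;  // result * i would overflow
        result *= i;
    }
    *out = result;
    return true;
}
```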

Writing simple code and looking to see what it does is magical and illuminating.
On point #3, it won't return "nothing." It can't. It'll return something, but that something won't be useful to you.
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.

Related

C++ Turning Character types into int type

So I read, and was taught, that subtracting '0' from a character turns it into an int. However, Visual Studio doesn't accept it here, saying that a value of type "const char*" cannot be used to initialize an entity of type int.
bigint::bigint(const char* number) : bigint() {
int number1 = number - '0'; // error code
for (int i = 0; number1 != 0 ; ++i)
{
digits[i] = number1 % 10;
number1 /= 10;
digits[i] = number1;
}
}
The goal of the first half is simply to turn the given number into an int. The second half outputs that number backwards with no leading zeroes. Please note that this function is part of the class declared in a header file:
class bigint {
public:
    static const int MAX_DIGITS = 50;
private:
    int digits[MAX_DIGITS];
public:
    // constructors
    bigint();
    bigint(int number);
    bigint(const char * number);
};
Is there any way to convert the char parameter to an int so I can then output an int? I'd like to do it without the std library or strlen; I know there is a way using the '0' char, but I can't seem to get it right.
You can turn a single character in the range '0'..'9' into a single digit 0..9 by subtracting '0', but you cannot turn a string of characters into a number by subtracting '0'. You need a parsing function like std::stoi() to do the conversion work character-by-character.
But that's not what you need here. If you convert the string to a number, you then have to take the number apart. The string is already in pieces, so:
bigint::bigint(const char* number) : bigint() {
    int i = 0; // index of the next free slot in digits
    while (*number != '\0' && i < MAX_DIGITS) // keep looping until we hit the string's null terminator (or run out of slots)
    {
        digits[i] = *number - '0'; // store the digit for the current character
        i++;
        number++; // advance the string to the next character
    }
}
There could be some extra work involved in a more advanced version, such as sizing digits appropriately to fit the number of digits in number. Currently we have no way to know how many slots are actually in use in digits, and this will lead to problems later when the program has to figure out where to stop reading digits.
I don't know what your understanding is, so I will go over everything I see in the code snippet.
First, what you're passing to the function is a pointer to a char, with the const keyword making the char immutable, or "read only" if you prefer.
A char is actually an 8-bit integer (see note 1 below). It stores a numerical value in binary form, which can also be interpreted as a character.
Fundamental types - cppreference.com
The standard also expects char to be a "type for character representation". The encoding is usually ASCII (or a superset of it), but it could be something else, such as EBCDIC. So remember that ASCII is not guaranteed, although you are unlikely ever to use a system where it isn't ASCII. It's not that char somehow enforces an encoding: it's the functions you pass those chars and char pointers to that interpret their contents as ASCII characters, while on some obscure or legacy platforms they could interpret them in a less common encoding. The standard does, however, demand that the encoding has this property: the codes for the characters '0' to '9' are consecutive, so '9' - '0' means "subtract the code of '0' from the code of '9'", and the result is 9 in any conforming encoding. In ASCII the ranges 'a'-'z' and 'A'-'Z' are consecutive as well, but the standard does not guarantee that (they are not consecutive in EBCDIC), so things get a little trickier if your input is in a base higher than 10, like the popular base 16, called hexadecimal.
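To illustrate the guarantee, here is a hypothetical helper that decodes one hexadecimal digit: for '0' to '9' it relies on subtraction, which the standard guarantees, while the letters are mapped explicitly, because the standard does not promise that 'a' to 'f' are consecutive (even though they are in ASCII).

```cpp
// Hypothetical helper: value of a single hexadecimal digit, or -1 if ch is not one.
int hex_digit_value(char ch) {
    // '0'..'9' are guaranteed to be consecutive, so subtraction is portable
    if (ch >= '0' && ch <= '9') return ch - '0';
    // letters are not guaranteed to be consecutive, so map them one by one
    switch (ch) {
        case 'a': case 'A': return 10;
        case 'b': case 'B': return 11;
        case 'c': case 'C': return 12;
        case 'd': case 'D': return 13;
        case 'e': case 'E': return 14;
        case 'f': case 'F': return 15;
        default: return -1;
    }
}
```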
A pointer stores an address, so the most basic functionality for it is to "point" to a variable. But it can be used in various ways, one of which, very frequent in C, is to store address of the beginning of an array of variables of the same type. Those could be chars. We could interpret such an array as a line of text, or a string (a concept, not to be confused with C++ specific string class).
Since a pointer does not contain information on the length or end of such an array, we need to get that information across to the function we pass the pointer to. Sometimes we can just provide the length, sometimes we provide an end pointer. When dealing with "lines of text" or C-style strings, we use (and C standard library functions expect) what is called a null-terminated string. In such a string, the first char after the last one used for the text is a null character, which, to simplify, is basically a 0. A 0, but not a '0'.
So what you're passing to the function, and what you interpret as, say, 416, is actually a pointer to a place in memory where '4' is encoded and stored as a number, followed by '1' and then '6', taking up three bytes. And depending on how you obtained this line of text, '6' is probably followed by the null character '\0' (not to be confused with the NULL pointer macro), that is, a zero.
NULL - cppreference.com
Converting such a string to a number first requires a data type able to hold it. In the case of 416 it could be anything from short upwards. If you wanted to do that on your own, you would need to iterate over the entire line of text and add up the digits multiplied by the proper powers of 10, take care of the sign too, and check for any edge cases. You could, however, use a standard function like int atoi (const char * str);
atoi - cplusplus.com
Now, that would be nice of course, but you're trying to work with "bigints". However you define them, your class's purpose is to deal with numbers too big to be stored in built-in types, so there is no way you can convert them like that.
What you're trying to do right now seems to be a constructor that creates a bigint out of a number represented as a C-style string. You want to store your bigint internally as an array of its digits in base 10 (a good choice for code simplicity, readability and maintainability, as well as for interoperating with a base-10 textual representation, although it doesn't make efficient use of memory and processing power). Your input is also an array of base-10 digits, except that internally you store the digits as numbers, while your input holds encoded characters. You need to:
sanitize the input: decide on criteria for what input is acceptable (e.g. whether there can be leading or trailing whitespace, whether the number can be followed by non-numeric characters that are discarded, how to represent the sign, whether + for positive numbers is optional or forbidden, etc.), and throw an exception if the input is invalid;
convert whatever convention you enforce for your input into the uniform convention you use internally, e.g. strip leading whitespace, remove the + sign if it's optional and you don't use it internally, etc.;
once you know which positions in your internal array correspond to which positions in the input string, iterate over it and copy every digit, decoding it first from its character code (by subtracting '0').
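Here is a minimal sketch of those three steps, under assumptions of my own: the only accepted input is a non-empty run of '0' to '9' (no sign, no whitespace), digits are stored least significant first, and a count member records how many slots are in use, which addresses the problem of not knowing where the number ends. The count member, to_string, and the exception choices are hypothetical additions to the class from the question.

```cpp
#include <stdexcept>
#include <string>

class bigint {
public:
    static const int MAX_DIGITS = 50;
private:
    int digits[MAX_DIGITS];  // digits[0] is the least significant digit
    int count;               // hypothetical member: how many slots of digits are in use
public:
    bigint() : digits{}, count(1) {}  // zero: a single digit 0
    explicit bigint(const char* number);
    std::string to_string() const;
};

bigint::bigint(const char* number) : bigint() {
    // 1. sanitize: find the end of the string, rejecting anything but '0'..'9'
    const char* end = number;
    while (*end != '\0') {
        if (*end < '0' || *end > '9') throw std::invalid_argument("not a decimal number");
        ++end;
    }
    if (end == number) throw std::invalid_argument("empty input");
    // 2. convert to the internal convention: drop leading zeros, keeping at least one digit
    while (*number == '0' && number + 1 != end) ++number;
    if (end - number > MAX_DIGITS) throw std::out_of_range("too many digits");
    // 3. copy, decoding each character and reversing the order
    count = 0;
    for (const char* p = end; p != number; ) {
        --p;
        digits[count++] = *p - '0';
    }
}

std::string bigint::to_string() const {
    std::string s;
    for (int i = count - 1; i >= 0; --i) s.push_back(static_cast<char>('0' + digits[i]));
    return s;
}
```

Storing the least significant digit first also makes later arithmetic (carrying from the lowest digit upwards) straightforward.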
A side note: I can't be sure exactly what you expect your input to be. It is only likely to be a textual representation; it could just as easily be an array of raw, unencoded digit values. From your post it's obviously the former, but the function prototype (the line with the return type and argument types) doesn't guarantee that to anyone reading it. Just another thing to be aware of.
Hope this answer helped you understand what is happening there.
PS. I cannot emphasize strongly enough that the biggest problem with your code is that even if this line worked:
int number1 = number - '0'; // error code
You'd be trying to store a number on the order of 10^50 into a variable capable of holding on the order of 10^9
The crucial part of this problem, which I have a vague feeling you may have found on spoj.com, is that you're handling BIGints: integers too big to be stored in a trivial manner.
1) The standard does not directly require char to be this size, but it indirectly requires it to be at least 8 bits, possibly more on obscure platforms. And yes, I think there have been platforms where it was indeed wider than 8 bits. The same goes for pointers, which may behave strangely on obscure architectures.

Filtering out negative numbers using scanf

My aim is to scan for a positive-only a; if a negative number is entered, the function should print an error:
if ( scanf("%u %lf", &a, &b) != 2 ) {
//error
}
The theory is that scanf returns the number of successfully assigned items, so if I enter a negative number, scanf shouldn't return 2. My theory seems to be incorrect; why?
Obviously I could simply scanf %d and then check whether %d is negative but right now I'm curious why my initial theory is incorrect. So is there a way without scanning and then comparing?
Your theory doesn't work because scanf neither fails nor refuses to store the converted input at the memory address specified, even if a signed number is entered when an unsigned number is expected. Therefore, as per the scanf documentation, scanf returns the number of items stored.
My theory seems to be incorrect, why?
When %u is used as the format specifier, scanf expects (from http://en.cppreference.com/w/cpp/io/c/fscanf):
The format of the number is the same as expected by strtoul() with the value 10 for the base argument.
The strtoul documentation says this about negative numbers:
If the minus sign was part of the input sequence, the numeric value calculated from the sequence of digits is negated as if by unary minus in the result type, which applies unsigned integer wraparound rules.
Hence %u does not fail even when you enter a negative number.
scanf() still succeeds (and increments the return value) if the input is not the right type of number.
For example, if scanf("%u", &my_unsigned_int); reads the text "-1", it will return 1 and my_unsigned_int will be written to like my_unsigned_int = (unsigned int) -1;
A decimal value doesn't make it fail either: if the user enters 3.5, %u reads the 3 and stops at the '.', leaving ".5" in the input, where the following %lf will pick it up. It doesn't look like you rule out floats, so you probably just want to scanf("%f", &some_float); and then check the value of some_float.
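As for doing it without scanning and then comparing: one option is a scanset, which accepts only the listed characters, so a leading '-' makes the conversion itself fail. A sketch, assuming the line has already been read into a string (for example with fgets) so that sscanf can be used; read_nonnegative is a hypothetical name, it accepts 0, and it also rejects an explicit '+'.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: reads "a b" where a must be written with digits only and b is a double.
bool read_nonnegative(const char* line, unsigned long* a, double* b) {
    char digits[20];
    int consumed = 0;
    // " %19[0-9]" skips whitespace, then matches only '0'..'9': "-5" is a matching failure
    if (std::sscanf(line, " %19[0-9]%n", digits, &consumed) != 1) return false;
    // the digit run must end at whitespace, so "5.5" or "5x" is not silently cut to 5
    char next = line[consumed];
    if (next != ' ' && next != '\t') return false;
    if (std::sscanf(line + consumed, "%lf", b) != 1) return false;
    char* end = nullptr;
    errno = 0;
    *a = std::strtoul(digits, &end, 10);
    return errno != ERANGE;  // up to 19 digits may still exceed unsigned long
}
```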

Correct way of using isDigit() function in c++

I am new to C++ and I have studied some basics of C language. Here's my code snippet.
#include <cctype>
#include <iostream>
using namespace std;
int main() {
    int a = 108;
    if (!isdigit(a)) {
        cout << "The number is not a digit";
    }
    else
        cout << "It's a Number!";
}
I don't know why, but the condition is satisfied. It should have output "It's a Number!". Please correct me, and if you have a better solution to this, do suggest it! (To make it clearer: I want to check whether the entered int is actually composed of digits.) Thank you
First of all, I'm not sure if you realise that there is a difference between a digit and a number. A digit is a single character from 0 to 9, a number is composed of digits.
Second, std::isdigit has a lousy, confusing legacy interface. As the documentation will tell you, it takes an int but requires its argument to be representable as unsigned char or equal to EOF to avoid undefined behaviour. The int you pass to the function represents a single character; whether the mapping follows ASCII is not mandated by C++ and is thus implementation-defined.
Nevertheless, your C++ implementation very likely uses ASCII or a superset thereof. In ASCII, 108 is the lower-case letter 'l'. isdigit therefore returns false.
I can see where your confusion comes from. The prototype of isdigit says it takes a single int parameter; but if it checked whether an int is a number, it would be pointless, since every int is a number!
This is where you can see the big difference between cplusplus.com and cppreference.com. The former shows little information, while the latter explains a lot more. cppreference gives you the real hint:
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF
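The function expects a value in the range of unsigned char (typically 0 to 255) or EOF, and you can see on the page linked that in ASCII the digits 0123456789 are represented by the codes 48 to 57. As others have pointed out, 108 is actually the ASCII character l.
#include <cctype>
#include <iostream>

int main()
{
    // print every ASCII code (0..127) that isdigit classifies as a digit
    for (unsigned int i = 0; i < 128; ++i)
    {
        if (std::isdigit(i))
        {
            std::cout << i << " is a digit\n";
        }
    }
}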
You can't check a number like 108, you would have to check each digit.
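If the goal is to check that what the user typed is made only of digits, read it as text (for example, std::string input; std::cin >> input;) rather than into an int, and test each character. A sketch; is_all_digits is a hypothetical name, and it treats an empty string and a leading sign as "not all digits".

```cpp
#include <cctype>
#include <string>

// Returns true if s is non-empty and every character is a decimal digit.
bool is_all_digits(const std::string& s) {
    if (s.empty()) return false;
    for (char ch : s) {
        // cast to unsigned char: passing a negative char value to isdigit is undefined behaviour
        if (!std::isdigit(static_cast<unsigned char>(ch))) return false;
    }
    return true;
}
```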
isdigit uses the character representation of the int value 108, which is ASCII for l, which is not a digit.
Function Prototype of isdigit()
int isdigit(int argument);
If you pass a = 108 to the function, it treats the value as a character code and returns zero (false), because in ASCII 108 is 'l', and 'l' is not a digit.
Now pass a = 48 instead: 48 is the code of the character '0', so the function returns a nonzero value (true).
You are using isdigit wrong. As the answers above told you, it's meant to be used with character codes, to check whether a certain char is a digit or not. You can check this page for more help on isdigit: http://www.tutorialspoint.com/c_standard_library/c_function_isdigit.htm
As to your question: I guess you are trying to check whether the number you are sending is a single-digit number. For this you can simply do:
if (a >= 0 && a <= 9){
// a is a single digit...
}

array of char is equal to int?

I am trying to figure out why an int value is assigned to an element of a char array, and I am a little confused about using the cast operator.
I don't understand what happens in the do statement; I hope somebody can explain.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *readword()
{
    int c, i;
    char t[255];
    char *p;
    //jump over chars who aren't letters
    while ((c = getchar()) < 'A' || (c > 'Z' && c < 'a') || c > 'z')
        if (c == EOF) return 0;
    i = 0;
    do {
        t[i++] = c; // shouldn't be like (char)c
    } while (i < 254 && (((c = getchar()) >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))); // i < 254 leaves room for '\0'
    if (c == EOF) // note: a word that ends right at end of input is discarded here
        return 0;
    t[i++] = '\0';
    //keep the word in heap memory
    if ((p = (char *)malloc(i)) == 0)
    {
        printf(" not enough memory\n");
        exit(1);
    }
    strcpy(p, t);
    return p;
}
The getchar() function returns an int, and it is important to use an int to capture its return value. This is because if getchar() fails (or reaches end of file), it returns EOF, a negative int (as per chux's comment). When it succeeds, it returns the character as an unsigned char value converted to int, which is suitable for storing in a char.
The question code is building a char string or array, one char at a time:
t[i++]=c;
The above line could be written:
t[i++]=(char)c;
Either is fine, because in the first case the compiler converts the int to char implicitly.
The mixture of char and int is fairly simple: EOF is intended as a value that can be distinguished from any value you could have read from the file.
To support that, you need to initially read the data from the file into something larger than a char, so it can accommodate at least one value that couldn't possibly have come from the file. The type they chose for that purpose was int.
So, you read a character from the file, into an int. You compare that to EOF to see if it's really a character that came from the file or not. If (and only if) you verify that it really came from the file, you save the value into a char, because you now know that's what it really represents.
That said, I'd consider it pretty poor code as it stands right now. Just for one particularly obvious example, instead of the (c=getchar())<'A' || (c>'Z' && c<'a') || c>'z' style of code, you almost certainly want to use isalpha(c).
It's also a lot easier to do this with scanf instead.
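A sketch of the same word reader using isalpha, written in C++ (an assumption: the question is C, so std::istream::get stands in for getchar). The character is still read into an int so that end of input can be told apart from real characters, the std::string removes the 255-character limit, and a word that ends exactly at end of input is returned rather than dropped. read_word is a hypothetical name; an empty result means there are no more words.

```cpp
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, convenient for trying read_word on fixed text
#include <string>

std::string read_word(std::istream& in) {
    const int eof = std::char_traits<char>::eof();
    int c;
    // skip everything that is not a letter; get() returns an int, like getchar()
    while ((c = in.get()) != eof && !std::isalpha(c)) {
    }
    std::string word;
    // collect letters; c is a valid unsigned char value here, so isalpha(c) is safe
    while (c != eof && std::isalpha(c)) {
        word.push_back(static_cast<char>(c));
        c = in.get();
    }
    return word;
}
```

For example, reading from std::istringstream in("  hello, world!") returns "hello", then "world", then an empty string.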
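You can assign any int value to a char. On typical implementations with 8-bit chars, only the lowest 8 bits are kept (this is guaranteed for unsigned char; for a signed char, the result of an out-of-range conversion was implementation-defined before C++20). A cast would be more "correct" in terms of communicating your intent; people might not otherwise remember that anything larger than an 8-bit value gets truncated, and the results are likely to be unexpected.
Note that since you didn't say "unsigned char t[255]", on platforms where plain char is signed (which is implementation-defined) you effectively get 7 value bits, and the most significant (8th) bit is interpreted as the sign. So, for example, if you were to say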
char t = 0xFF;
then you would in fact get -1 assigned to t.
If you assign numbers > 0xFF then all bits higher than the 8th bit will get stripped. So if you were to say:
char t = 0x101;
Then in fact you'd get the value 1 assigned to t.
The code in question is correct because getchar() returns an int, and EOF (typically -1) is an error value, so it's important to check for it. For non-error cases the return value is in the range of unsigned char, so it fits in an 8-bit char.
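To see the truncation without relying on implementation-defined signed behaviour, convert to unsigned char, where keeping only the low bits is guaranteed. A small sketch; low_byte is a hypothetical name, and the static_assert makes the 8-bit assumption explicit.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit chars");

// Conversion to unsigned char is defined to keep the value modulo 2^CHAR_BIT,
// i.e. only the lowest 8 bits survive.
unsigned char low_byte(int value) {
    return static_cast<unsigned char>(value);
}
```

With it, low_byte(0x101) is 1 and low_byte(-1) is 255. For a plain or signed char, char t = 0xFF; gives -1 on common two's-complement compilers, but before C++20 that result is implementation-defined.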

how compilers detect overflow in numbers while compiling?

Compilers deal with source code as text, so in C++, for example, when the compiler encounters a statement like unsigned char x = 150; it knows from the type limits that an unsigned char must be in the range 0 to 255.
My question is: while the number 150 is still a string, what algorithm does the compiler use to compare the digit sequence (150 in this case) against the type limits?
I made a simple algorithm to do that for type int for decimal, octal, hexadecimal and little-endian binary, but I don't think compilers do anything like that to detect overflow in numbers.
The algorithm I made, coded in C++:
#include <cstring>

typedef signed char int8;
typedef signed int int32;
#define DEC 0
#define HEX 1
#define OCT 2
#define BIN 3
bool isOverflow(const char* value, int32 base)
{
    // digit strings of INT_MAX and of the magnitude of INT_MIN (32-bit int assumed)
    static const char* max_numbers[4][2] =
    {
        // INT_MAX INT_MIN
        { "2147483647", "2147483648" }, // decimal
        { "7fffffff", "80000000" }, // hexadecimal (lowercase digits assumed)
        { "17777777777", "20000000000" }, // octal
        { "01111111111111111111111111111111", "10000000000000000000000000000000" } // binary
    };
    // size of strings in max_numbers array
    static const int32 number_sizes[] = { 10, 8, 11, 32 };
    // input string size
    int32 str_len = static_cast<int32>(std::strlen(value));
    // does a sign mark exist in the input string?
    int32 signExist = ((base == DEC || base == OCT) && *value == '-');
    // first non-zero digit in input number
    int32 non_zero_index = signExist;
    // locate first non-zero index (compare with the character '0', not the value 0)
    while (non_zero_index < str_len && value[non_zero_index] == '0') non_zero_index++;
    // if non_zero_index equals length then all digits are zero
    if (non_zero_index == str_len) return false;
    // get number of digits that actually represent the number
    int32 diff = str_len - non_zero_index;
    // fewer digits than the limit: no overflow
    if (diff < number_sizes[base]) return false;
    // more digits than the limit: overflow
    if (diff > number_sizes[base]) return true;
    // left digit in input and search strings
    int8 left1 = 0, left2 = 0;
    // same number of digits: compare digits from left to right
    for (int32 i = 0; non_zero_index < str_len; non_zero_index++, i++)
    {
        // get input digit
        left1 = value[non_zero_index];
        // get matching digit of the limit (row = base, column = sign)
        left2 = max_numbers[base][signExist][i];
        // at the first differing digit, overflow occurs if the input digit is greater
        if (left1 != left2) return left1 > left2;
    }
    // overflow won't happen
    return false;
}
This algorithm can be adapted to work with all integer types, but for floating point I would have to write a new one that works with the IEEE floating-point representation.
I think compilers use a more efficient algorithm than mine to detect overflow, don't you?
Compilers handle it pretty much the easiest possible way: they convert the number to an integer or float as appropriate. There's no law that says the compiler can't convert from strings to some other representation as appropriate.
But now, consider your original problem; what about if you took the digits and just built routines to treat them as numbers? Say, for example, an algorithm that could take
6 + 5
and compute the sum as a two-digit string 11? Extend that to other operations and you could compute whether 32769 is greater than 32768 directly.
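Such digit-string arithmetic is easy to sketch for addition: add from the rightmost digit and carry into the next column, exactly as on paper. add_decimal_strings is a hypothetical name; it assumes both inputs are non-empty strings of decimal digits without a sign.

```cpp
#include <algorithm>
#include <string>

// Adds two non-negative decimal numbers given as digit strings, e.g. "6" + "5" -> "11".
std::string add_decimal_strings(const std::string& a, const std::string& b) {
    std::string result;
    int i = static_cast<int>(a.size()) - 1;
    int j = static_cast<int>(b.size()) - 1;
    int carry = 0;
    // walk both strings from the least significant digit, carrying as on paper
    while (i >= 0 || j >= 0 || carry != 0) {
        int sum = carry;
        if (i >= 0) sum += a[i--] - '0';
        if (j >= 0) sum += b[j--] - '0';
        result.push_back(static_cast<char>('0' + sum % 10));
        carry = sum / 10;
    }
    // digits were produced least significant first
    std::reverse(result.begin(), result.end());
    return result;
}
```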
It seems simplest for the compiler to convert the string representation into an integer in one step, and then compare against upper and lower bounds of the type in a secondary step.
I can't imagine why it would be better to compare strings.
For floats, the problem is harder due to precision and rounding.
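A sketch of that convert-then-compare approach for decimal literals and int, assuming a 32-bit int in the usage below: the digits are accumulated into a 64-bit unsigned integer (anything that doesn't even fit there certainly doesn't fit in int), and the result is compared once against the limit. This is not taken from any real compiler; decimal_literal_fits_in_int is a hypothetical name, and the negative flag stands for a preceding unary minus, since in C++ -2147483648 is really the literal 2147483648 with minus applied.

```cpp
#include <climits>
#include <cstdint>
#include <string>

// Sketch: convert the literal's digits in one pass, then compare against int's limits.
bool decimal_literal_fits_in_int(const std::string& digits, bool negative) {
    if (digits.empty()) return false;
    std::uint64_t value = 0;
    for (char ch : digits) {
        if (ch < '0' || ch > '9') return false;            // not a decimal literal
        unsigned d = static_cast<unsigned>(ch - '0');
        if (value > (UINT64_MAX - d) / 10) return false;   // too big even for 64 bits
        value = value * 10 + d;
    }
    // the magnitude allowed for a negative value is that of INT_MIN
    const std::uint64_t limit = negative
        ? static_cast<std::uint64_t>(-static_cast<long long>(INT_MIN))
        : static_cast<std::uint64_t>(INT_MAX);
    return value <= limit;
}
```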
I'm not sure what particular algorithms most compilers employ to do this, but here are a few options that could work:
The compiler could try using an existing library (for example, in C++, a stringstream) to try to convert the string into the number of the appropriate type. This could then be used to check for errors.
The compiler could convert the string into a very high-precision number format (for example, a 128-bit integer) and then check, whenever an assignment is made from a numeric literal to a primitive type, whether the value could fit in that range without a cast.
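The stringstream option from the first item, sketched: since C++11, extracting a value that is out of range for the target type sets failbit, so the stream reports the overflow for you. stream_to_int is a hypothetical name, and treating leftover non-whitespace as an error is a choice of this sketch.

```cpp
#include <sstream>
#include <string>

// Converts text to int with std::istringstream; returns false if extraction fails
// (no number, or out of range for int) or if non-whitespace characters are left over.
bool stream_to_int(const std::string& text, int* out) {
    std::istringstream in(text);
    int value = 0;
    if (!(in >> value)) return false;  // out-of-range input sets failbit
    char extra;
    if (in >> extra) return false;     // trailing characters such as "12abc"
    *out = value;
    return true;
}
```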
Seeing that compilers will have to convert to the integral/numeric type anyway, they can just as well let their atoi, atol, atof functions raise an error when the destination capacity gets exceeded.
There is no need to operate on strings beforehand, and convert in a separate step.
Most likely, I'd think, compilers will convert to integral types directly in their (highly optimized) parser's semantic actions.
In most compiler theory, the text of a program (translation unit) is converted into tokens and values. For example, the text "150" would be converted into a token of constant integer with a value of 150. This is of course, after the preprocessor has run.
The compiler then begins the process of syntax and semantic checking. So an assignment statement is evaluated for syntax (correct spelling and format), then checked for semantics.
The compiler can either complain about a value that is out of range (such as -150 for unsigned char) or apply some transformation. In the case of -150, the value would be converted modulo 256 (assuming 8-bit chars), giving 106, which is how conversion to an unsigned type is defined; many compilers also warn about it. I am not a language lawyer, so I don't know exactly how much freedom the compiler has beyond that, nor whether a warning is required.
In summary, the compiler has some freedom when evaluating statements and checking their semantics. All text is converted into an internal representation of tokens and values (a more compact data structure). Checking whether a constant integer literal is within range for an assignment statement takes place during the semantic stage of compilation. Semantics are determined by the language standard or, where it leaves freedom, by the compiler vendor's policy. Some of these choices are turned into compiler options and left to the programmer.