std::string to int / double in one pass


I'm parsing a string which may contain either a real or an integral value. I would like to parse that string and get either the integral or the real value in a single pass.
I could use std::stoi and std::stod, but if I call stoi first and the string holds a real, it will only convert the part before the dot, and I will have to call stod as well, causing a second parse. And if I call stod first and the string contains an integer, it will accept it as a valid real value, losing the information that it was an integer.
Is there some kind of function that can parse both types in a single pass? Or do I first have to look for a dot manually and call the right function?
Thank you. :)

You will not find a standard call to achieve this for the simple reason that a string of digits without a dot is both a valid integer and a valid double.
If your criterion is "double if and only if there is a dot", then look for the dot by hand. Alternatively, read the value as a double and check that its fractional part is zero.

Since you said (in the comments above) that simple dot notation is all you want in real numbers, that you want a single pass (i.e. no back-stepping to already-parsed input), and (again from your comment) that you are more interested in the programming experience than in efficiency / maintainability / extendability, how about this:
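#include <cstdio>   // std::printf
#include <cstdlib>  // std::strtol, std::strtod

int main()
{
    char const * input = "-123.456";   // example input; no leading whitespace assumed
    char * parse_end;                  // strtol/strtod take a char**, so not char const*
    bool negative = ( input[0] == '-' );
    // parse integer (or pre-dot part of real)
    long integer = std::strtol( input, &parse_end, 10 );
    if ( *parse_end == '.' )
    {
        // you have a real number -- parse the post-dot part, starting at the dot
        input = parse_end;
        double fraction = std::strtod( input, &parse_end );   // e.g. 0.456
        // the fraction carries no sign, so apply the sign of the whole input
        double real = negative ? integer - fraction : integer + fraction;
        std::printf( "real: %f\n", real );
    }
    else
    {
        // integer is your result
        std::printf( "integer: %ld\n", integer );
    }
    // in either case, parse_end is your position
}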
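Why did I use C functions? std::stoi can report where it stopped (through its optional size_t* pos argument), but std::stod only accepts a string, not a starting position, so continuing from that index would need substr() or similar. The C functions work with pointers, which makes resuming at the stop position easy.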
What I said in my comment holds true: As a brain experiment this holds some value, but any real parsing work should make use of existing solutions like Boost.Spirit. Getting familiar with such building blocks is, IMHO, more valuable than learning how to roll your own.

You should parse it by yourself, using std::string::substr, std::string::find_first_of, std::string::find_first_not_of, etc.
As you know, std::stoi and std::stof each convert the longest prefix (after leading whitespace) that matches a valid representation of the requested type. You might think that, when both succeed, the integer result always differs from the floating-point result, but it doesn't.
Example 1: consider "123.". std::stoi will parse the substring "123" and std::stof will parse the whole "123.". "123." is a valid floating-point literal, but it represents an exact integer.
Example 2: consider "123.0". This is an ordinary real-value representation. std::stoi will parse the substring "123" and std::stof will parse the whole "123.0". The two results are arithmetically equal.
This is where you have to decide what to accept and what not. See the cppreference.com articles on integer literals and floating-point literals for the possible patterns.
Because of these difficulties, many lexers simply tokenize the input (splitting it at whitespace) and check whether the whole token matches one of the valid representations. If you don't know whether the input is integral or an approximate real, I think you should just parse it with std::stod.
In addition, solutions that convert the floating-point result back to int can misbehave. Converting an int to float is not guaranteed to preserve its value, because float is commonly IEEE 754 binary32 (single precision in IEEE 754-1985), which has a 24-bit significand. So a valid string representation of an integer that fits in a 32-bit signed int may not fit exactly in a float: you lose precision. double, commonly IEEE 754 binary64 with a 53-bit significand, holds every int32_t value exactly, but has the same problem with int64_t and wider types.

Related

C++ Turning Character types into int type

So I read and was taught that subtracting '0' from a character turns it into an int, but Visual Studio doesn't accept that here: it says a value of type "const char*" cannot be used to initialize an entity of type int.
bigint::bigint(const char* number) : bigint() {
int number1 = number - '0'; // error code
for (int i = 0; number1 != 0 ; ++i)
{
digits[i] = number1 % 10;
number1 /= 10;
digits[i] = number1;
}
}
The goal of the first half is simply to turn the given number into an int. The second half outputs that number backwards with no leading zeroes. Please note this function is a part of the class declared in a header file:
class bigint {
public:
static const int MAX_DIGITS = 50;
private:
int digits[MAX_DIGITS];
public:
// constructors
bigint();
bigint(int number);
bigint(const char * number);
};
Is there any way to convert the char parameter to an int so I can then output an int, without using the std library or strlen? I know there is a way to use the '0' char, but I can't seem to get it right.

You can turn a single character in the range '0'..'9' into a single digit 0..9 by subtracting '0', but you cannot turn a string of characters into a number by subtracting '0'. You need a parsing function like std::stoi() to do the conversion work character-by-character.
But that's not what you need here. If you convert the string to a number, you then have to take the number apart. The string is already in pieces, so:
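bigint::bigint(const char* number) : bigint() {
    int i = 0;
    while (*number != '\0' && i < MAX_DIGITS) // keep looping until we hit the string's null terminator (or run out of slots)
    {
        digits[i] = *number - '0'; // store the digit for the current character (most significant digit first)
        number++;                  // advance the string to the next character
        i++;                       // move to the next slot in digits
    }
}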
There could be some extra work involved in a more advanced version, such as sizing digits appropriately to fit the number of digits in number. Currently we have no way to know how many slots are actually in use in digits, and this will lead to problems later when the program has to figure out where to stop reading digits.

I don't know what your understanding is, so I will go over everything I see in the code snippet.
First, what you're passing to the function is a pointer to a char, with const keyword making the char immutable or "read only" if you prefer.
A char is actually an 8-bit¹ integer. It can store a numerical value in binary form, which can also be interpreted as a character.
Fundamental types - cppreference.com
The standard also expects char to be a "type for character representation". The characters may be encoded in ASCII, but they could be in something else, such as EBCDIC. ASCII is not guaranteed, although you are unlikely ever to use a system where it isn't ASCII. And it's not really char that enforces an encoding: it's the functions you pass those chars and char pointers to that interpret their content as characters in ASCII, while on some obscure or legacy platforms they could interpret them in a less common encoding. The standard does, however, demand that the encoding has this property: the codes for the characters '0' to '9' are consecutive, so '9' - '0' means "subtract the code of '0' from the code of '9'", and the result is 9 in every conforming encoding. In ASCII the ranges 'a'-'z' and 'A'-'Z' are consecutive as well, but the standard does not guarantee that (EBCDIC letters are not contiguous), so converting input in a base higher than 10, such as the popular base 16 called hexadecimal, is a little trickier.
A pointer stores an address, so its most basic use is to "point" to a variable. But it can be used in various ways, one of which, very common in C, is to store the address of the beginning of an array of variables of the same type. Those could be chars. We can interpret such an array as a line of text, or a string (a concept, not to be confused with the C++ std::string class).
Since a pointer carries no information about the length or end of such an array, we need to get that information to the function we pass the pointer to. Sometimes we just provide the length, sometimes we provide an end pointer. When dealing with "lines of text", or C-style strings, we use (and C standard library functions expect) what is called a null-terminated string. In such a string, the char after the last one used for the text is a null character, which is, to simplify, basically a 0. A 0, but not a '0'.
So what you're passing to the function, and what you interpret as, say, 416, is actually a pointer to a place in memory where '4' is encoded and stored as a number, followed by '1' and then '6', taking up three bytes. And depending on how you obtained this line of text, '6' is probably followed by a null character ('\0'), that is, a zero.
NULL - cppreference.com
Converting such a string to a number first requires a data type able to hold it. For 416 it could be anything from short upwards. If you wanted to do that on your own, you would need to iterate over the entire line of text and add up the digits multiplied by the proper powers of 10, take care of the sign too, and check for any edge cases. You could, however, use a standard function like int atoi (const char * str);
atoi - cplusplus.com
Now, that would be nice of course, but you're trying to work with "bigints". However you define them, your class's purpose is to deal with numbers too big to be stored in built-in types. So there is no way you can convert them like that.
What you're trying to write right now is a constructor that creates a bigint out of a number represented as a C-style string. How shall I put it... you want to store your bigint internally as an array of its digits in base 10 (a good choice for code simplicity, readability and maintainability, as well as interoperation with base-10 textual representation, but it doesn't make efficient use of memory and processing power), and your input is also an array of digits in base 10, except that internally you're storing digits as numbers, while your input is encoded characters. You need to:
1. Sanitize the input: you need criteria for what kind of input is acceptable (e.g. whether there can be leading or trailing whitespace, whether the number can be followed by non-numerical characters to be discarded, how to represent the sign, whether + for positive numbers is optional or forbidden, etc.), and throw an exception if the input is invalid.
2. Convert whatever standard you enforce for your input into whatever uniform standard you employ internally, e.g. strip leading whitespace, remove the + sign if it's optional and you don't use it internally, etc.
3. When you know which positions in your internal array correspond to which positions in the input string, iterate over it and copy every digit, decoding it from its character code first.
A side note: I can't be sure exactly what you expect your input to be. It's only likely that it is a textual representation; it could just as easily be an array of unencoded chars. Of course it's obviously the former, which I know because of your post, but the function prototype (the line with the return type and argument types) does not assure anyone of that. Just another thing to be aware of.
Hope this answer helped you understand what is happening there.
PS. I cannot emphasize strongly enough that the biggest problem with your code is that even if this line worked:
int number1 = number - '0'; // error code
You'd be trying to store a number on the order of 10^50 into a variable capable of holding on the order of 10^9
The crucial part of this problem, which I have a vague feeling you may have found on spoj.com, is that you're handling BIGints: integers too big to be stored in a trivial manner.
1) The standard does not actually require char to be this size directly, but indirectly it requires it to be at least 8 bits, possibly more on obscure platforms. And yes, there have been platforms where it was indeed over 8 bits. Same thing with pointers, which may behave strangely on obscure architectures.

What is the correct type in C/C++ to store a COM VT_DECIMAL?

I'm trying to write a wrapper for ADO.
A DECIMAL is one type a COM VARIANT can hold, when the VARIANT type is VT_DECIMAL.
I'm trying to put it into a native C data type and keep the variable's value.
It seems that the correct type is long double, but I get a "no suitable conversion" error.
For example:
_variant_t v;
...
if(v.vt == VT_DECIMAL)
{
double d = (double)v; //this works but I'm afraid there can be loss of data...
long double ld1 = (long double)v; //error: more than one conversion from variant to long double applies.
long double ld2 = (long double)v.decVal; //error: no suitable conversion function from DECIMAL to long double exists.
}
So my questions are:
is it totally safe to use double to store all possible decimal values?
if not, how can I convert the decimal to a long double?
How to convert a decimal to string? (using the << operator, sprintf is also good for me)

The internal representation of DECIMAL is not a double-precision floating-point value; it is an integer with sign and scale options. If you are going to initialize the DECIMAL parts yourself, you should initialize these fields (96-bit integer value, scale, sign) to get a valid decimal VARIANT value.
DECIMAL on MSDN:
scale - The number of decimal places for the number. Valid values are from 0 to 28. So 12.345 is represented as 12345 with a scale of 3.
sign - Indicates the sign; 0 for positive numbers or DECIMAL_NEG for negative numbers. So -1 is represented as 1 with the DECIMAL_NEG bit set.
Hi32 - The high 32 bits of the number.
Lo64 - The low 64 bits of the number. This is an _int64.
Your questions:
is it totally safe to use double to store all possible decimal values?
No. A DECIMAL cannot be stored directly as a double (VT_R8). You can create a double variant and use the variant conversion API to convert it to VT_DECIMAL (or the reverse), but a small rounding may be applied to the value.
if not, how can I convert the decimal to a long double?
How to convert a decimal to string? (using the << operator, sprintf is also good for me)
VariantChangeType can convert decimal variant to variant of another type, including integer, double, string - you provide the type to convert to. Vice versa, you can also convert something different to decimal.

"Safe" isn't exactly the correct word, the point of DECIMAL is to not introduce rounding errors due to base conversions. Calculations are done in base 10 instead of base 2. That makes them slow but accurate, the kind of accuracy that an accountant likes. He won't have to chase a billionth-of-a-penny mismatches.
Use _variant_t::ChangeType() to make conversions. Pass VT_R8 to convert to double precision. Pass VT_BSTR to convert to a string, the kind that the accountant likes. No point in chasing long double: that 10-byte x87 FPU type is history, and with Microsoft's compiler long double is the same 64-bit type as double anyway.

This snippet is taken from http://hackage.haskell.org/package/com-1.2.1/src/cbits/AutoPrimSrc.c
Hackage.org says:
Hackage is the Haskell community's central package archive of open source software.
But please check the author's license terms before reusing it.
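void writeVarWord64( unsigned int hi, unsigned int lo, VARIANT* v )
{
    ULONGLONG r;
    r = (ULONGLONG)hi;
    r <<= 32;             // move hi into the upper 32 bits
    r += (ULONGLONG)lo;   // lo fills the lower 32 bits
    if (!v) return;
    VariantInit(v);
    v->vt = VT_DECIMAL;
    v->decVal.Lo64 = r;   // the whole 64-bit value fits in Lo64
    v->decVal.Hi32 = 0;
    v->decVal.sign = 0;   // positive
    v->decVal.scale = 0;  // no digits after the decimal point
}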

If I understood Microsoft's documentation (https://msdn.microsoft.com/en-us/library/cc234586.aspx) correctly, VT_DECIMAL is an exact 96-bit integer value with a per-value scale (0 to 28 decimal places) and a sign. In that case you can't store it without loss of information in a float, a double or a 64-bit integer variable.
Your best bet would be to store it in a 128-bit integer like __int128, but compiler support varies (GCC and Clang provide it on 64-bit targets; MSVC does not). I'm also not sure you will be able to just cast one to the other without resorting to some bit manipulation.

Is it totally safe to use double to store all possible decimal values?
It actually depends what you mean by safe. If you mean "is there any risk of introducing some degree of conversion imprecision?", yes there is a risk. The internal representations are far too different to guarantee perfect conversion, and conversion noise is likely to be introduced.
How can I convert the decimal to a long double / a string?
It depends (again) on what you want to do with the object:
For floating-point computation, see #Gread.And.Powerful.Oz's link to the following answer: C++ converting Variant Decimal to Double Value
For display, see MSDN documentation on string conversion
For storage without any conversion imprecision, you should probably store the decimal as a scaled integer: the 96-bit mantissa (which needs more than a long long, e.g. the Hi32/Lo64 pair or a 128-bit integer), the number of digits to the right of the decimal point, and the sign. This representation is as close as possible to the decimal's internal representation, will not introduce any conversion imprecision and won't waste CPU resources on integer-to-string formatting.

How to convert the double to become integer

It is hard to explain the question: I would like to convert a double number to an integer without rounding away the digits after the decimal point.
For example
double a = 123.456;
I want to convert it to
int b = 123456;
I want to know how many digits there are after the decimal point, so that I can move the point back after the calculation to get 123.456 again.
PS: I just want a pure mathematical method to solve this, without converting it to characters.

Sorry, there's no solution to your problem because the number 123.456 does not exist as a double. It's rounded to 123.4560000000000030695446184836328029632568359375, and this number obviously does not fit into any integer type after you remove the decimal point.
If you want 123.456 to be treated as the exact number 123.456, then the only remotely simple way to do this is to convert it to a string and remove the decimal point from the string. This can be achieved with something like
char buf[64];
snprintf(buf, sizeof buf, "%.13f", 123.456);
Actually figuring out the number of places you want to print it to, however, is rather difficult. If you use too many, you'll end up picking up part of the exact value I showed above. If you use too few, then obviously you'll drop places you wanted to keep.

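Try this:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double a = 123.456;
    char str[32];
    char str2[32];
    int j = 0;
    snprintf(str, sizeof str, "%.3f", a);   // "%d" is wrong for a double; 3 decimals assumed here
    for (int i = 0; str[i] != '\0'; i++)
    {
        if (str[i] != '.')                  // copy every character except the decimal point
        {
            str2[j++] = str[i];
        }
    }
    str2[j] = '\0';
    int b = atoi(str2);                     // b == 123456
    printf("%d\n", b);
    return 0;
}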

I believe the canonical way to do this would be
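#include <math.h>
#include <stdio.h>
int main()
{
    double d = 123.456;
    double int_part;
    double fract_part;
    fract_part = modf(d, &int_part);
    // round, don't truncate: the scaled fraction can land just below an integer
    // (e.g. 4.35 with a factor of 100 gives 34.99999...), and (int) would cut it to 34
    int i = (int)int_part * 1000 + (int)lround(fract_part * 1000);
    printf("%d\n", i);
    return 0;
}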
where the literal 1000 is a constant determining the number of desired decimals.

If you have the text "123.456" you can simply remove the decimal point and convert the resulting text representation to an integer value. If you have already converted the text to a floating-point value (double a = 123.456;) then all bets are off: the floating-point value does not have a pre-set number of decimal digits, because it is represented as a binary fraction. It's sort of like 1/3 versus .3333 in ordinary usage: they do not have the same value, even though we usually pretend that .3333 means 1/3.

Multiply the original value by 10^i, increasing i each time, until abs(value' - round(value')) < epsilon for a very small epsilon. value' should be computed from the original each time, e.g.
value' = value * pow(10, i)
if ( abs(value' - round(value')) < epsilon ) then stop
Originally I suggested that you should simply multiply by ten repeatedly, but as R.. pointed out, that accumulates the numerical error at each step. As a result you might get e.g. 123.456999 for epsilon = .0000001 instead of 123.456000 due to floating-point math.
Please note that you might exceed the int type's boundaries this way, and you might want to handle infinity values as well.
As Ignacio Vazquez-Abrams noted, this might lead to problems in scenarios where you want to convert 123.500 to 123500. You might solve it by adding a very small value first (smaller than epsilon). Adding such a value could introduce a numeric error, though.

parsing integer in exponential notation from string

Apparently std::stoi does not accept strings representing integers in exponential notation, like "1e3" (= 1000). Is there an easy way to parse such a string into an integer? One would think that since this notation works in C++ source code, the standard library has a way to parse this.

You can use stod (see docs) to do this, by parsing it as a double first. Be wary of precision issues when casting back though...
#include <iostream> // std::cout
#include <string> // std::string, std::stod
int main () {
std::string text ("1e3");
std::string::size_type sz; // alias of size_t
double result = std::stod(text,&sz);
std::cout << "The result is " << (int)result << std::endl; // outputs 1000
return 0;
}

One would think that since this notation works in C++ source code, the standard library has a way to parse this.
The library's conversion functions and the compiler's parsing of literals are separate things. The reason this syntax works in C++ is that the language allows you to assign expressions of type double to integer variables:
int n = 1E3;
assigns a double expression (i.e. a numeric literal of type double) to an integer variable.
Knowing what's going on here you should be able to easily identify the function in the Standard C++ Library that does what you need.

You can read it as a double using standard streams, for example
double d;
std::cin >> d; //will read scientific notation properly
and then cast it to an int, but obviously double can represent far more values than int, so be careful about that.

If std::stoi accepted exponential notation, out-of-range results would be far too common (1e100 is only five characters), and signed integer overflow in C++ is undefined behaviour.
You need to build your own, where you can tailor the edge cases to your specific requirements.
I'd be inclined not to go down the std::stod route, since converting a double to int is undefined behaviour if the integral part of the double cannot be represented by the int.

Non-Integer numbers in an String and using atoi

If there are non-digit characters in a string and you call atoi (I'm assuming wtoi will do the same), how will atoi treat the string?
Let's say, for example, I have the following strings:
1. "20234543"
2. "232B"
3. "B"
I'm sure that 1 will return the integer 20234543. What I'm curious about is whether 2 will return 232 (that's what I need to solve my problem). Also, 3 should not return a value. Are these beliefs false? Also, if 2 does act as I believe, how does it handle an 'e' character in the string (as typically used in exponential notation)?

You can test this sort of thing yourself. I copied the code from the cplusplus.com reference site. It looks like your intuition about the first two examples is correct, but the third example returns 0. 'E' and 'e' are treated just like 'B' in the second example.
So the rules are
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.

According to the standard, "The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined." (7.20.1, Numeric conversion functions in C99).
So, technically, anything could happen. Even the first case could fail, since INT_MAX is only guaranteed to be at least 32767, and 20234543 is greater than that.
For better error checking, use strtol:
#include <errno.h>
#include <stdlib.h>

int main(void)
{
    const char *s = "232B";
    char *eptr;
    long value = strtol(s, &eptr, 10); /* 10 is the base */
    /* now, value is 232, eptr points to "B" */
    s = "20234543";
    value = strtol(s, &eptr, 10);
    s = "123456789012345";
    errno = 0; /* strtol only sets errno on error, so clear it first */
    value = strtol(s, &eptr, 10);
    /* If there was no overflow, value will contain 123456789012345,
       otherwise, value will contain LONG_MAX and errno will be ERANGE */
    return 0;
}
If you need to parse numbers with "e" in them (exponential notation), then you should use strtod. Of course, such numbers are floating-point, and strtod returns double. If you want to make an integer out of it, you can do a conversion after checking for the correct range.

atoi reads digits from the buffer until it can't any more. It first skips leading whitespace and accepts a single '+' or '-' (which selects the sign of the result), then stops at the first character that isn't a digit. It returns 0 if it saw no digits.
So to answer your specific questions: 1 returns 20234543. 2 returns 232. 3 returns 0. The character 'e' is not whitespace, a digit, '+' or '-' so atoi stops and returns if it encounters that character.

If atoi encounters a non-number character, it returns the number formed up until that point.

I tried using atoi() in a project, but it wouldn't work if there were any non-digit characters in the mix and they came before the digit characters: it returned zero. It doesn't mind if they come after the digits, because atoi simply stops at the first non-digit.
Here's a pretty bare-bones string-to-int converter I wrote that doesn't have that problem (bare-bones in that it doesn't work with negative numbers and doesn't include any error handling, but it might be helpful in specific instances).
#include <string>

int stringToInt(const std::string& newIntString)
{
    unsigned int dataElement = 0;
    for (char c : newIntString)
    {
        if (c >= '0' && c <= '9')   // skip every non-digit character
        {
            // shift the digits read so far one place left, then append this one
            dataElement = dataElement * 10 + static_cast<unsigned int>(c - '0');
        }
    }
    return static_cast<int>(dataElement);   // no sign or overflow handling
}

I ran into this atoi behaviour myself while learning, when I wrote a program that calculates the factorial of an integer given as a command-line parameter.
atoi returns 0 if the value is something other than a number, and "3asdf" returns 3. As we all know, C hands command-line parameters to the program as an array of char pointers.
I was told that the book "Linux Hater's Handbook" has some discussion of why computer geeks don't really like atoi: it's foolish in that there's no way to check the validity of the given input.
Someone asked me why I don't bother to use the strtol function from <stdlib.h>, and gave me an example applied to my recursive factorial method. I don't check whether the factorial result exceeds the range of int (a too-large base number), so it shows up as negative values in my program.
I solved my problem with atoi by first checking whether the user's input parameter is truly a numerical value, and only if it is, calculating the factorial.
Using the isdigit() function from <ctype.h>, it looks like this:
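#include <ctype.h>
#include <stddef.h>

/* returns 1 if str is empty or contains a non-digit character, 0 otherwise */
int checkInput(const char *str) {
    if (*str == '\0') return 1;   /* an empty string is not a number */
    for (size_t x = 0; str[x] != '\0'; ++x)
    {
        if (!isdigit((unsigned char)str[x])) return 1;
    }
    return 0;
}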
My forum pal on another Linux programming forum told me that with strtol I could handle out-of-range values, or parse into an unsigned long so that -0 and other negative values are not accepted. (Note, though, that strtoul itself accepts a leading '-' and negates the result, so negative input still has to be rejected explicitly.)
In my code it's important to check whether a character is NOT a digit: with the negated check, the function reports failure as soon as it meets the first non-digit character in the string (a char array in C).

Writing simple code and looking to see what it does is magical and illuminating.
On point #3, it won't return "nothing." It can't. It'll return something, but that something won't be useful to you.
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.
