efficiency of using stringstream to convert string to int? - c++

Is the code below less (or more, or equally) efficient than:
make substring from cursor
make stringstream from substring
extract integer using stream operator
? (question edit) or is it less (or more, or equally) efficient than:
std::stoi
? and why?
Could this function be made more efficient?
(The class brings these into scope:)
std::string expression // has some numbers and other stuff in it
int cursor // points somewhere in the string
The code:
int Foo_Class::read_int()
{
/** reads an integer out of the expression from the cursor */
// make stack of digits
std::stack<char> digits;
while (isdigit(expression[cursor])) // this is safe, returns false, for the end of the string (ISO/IEC 14882:2011 21.4.5)
{
digits.push(expression[cursor] - 48); // convert from ascii
++cursor;
}
// add up the stack of digits
int total = 0;
int exponent = 0; // 10 ^ exponent
int this_digit;
while (! digits.empty())
{
this_digit = digits.top();
for (int i = exponent; i > 0; --i)
this_digit *= 10;
total += this_digit;
++exponent;
digits.pop();
}
return total;
}
(I know it doesn't handle overflow.)
(I know someone will probably say something about the magic numbers.)
(I tried pow(10, exponent) and got incorrect results. I'm guessing because of floating point arithmetic, but not sure why because all the numbers are integers.)

I find using std::stringstream to convert numbers is really quite slow.
Better to use the many dedicated number conversion functions like std::stoi, std::stol, std::stoll. Or std::strtol, std::strtoll.
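As a rough sketch of how those dedicated functions could replace the stream-based approach when reading from a cursor: the two free functions below are hypothetical stand-ins for the question's member function, with `expression` and `cursor` passed in explicitly. Note that, unlike the hand-written loop, both accept leading whitespace and a sign.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical free-function versions of read_int(); `expression` and
// `cursor` mirror the class members from the question.

// std::stoi needs a std::string, so the tail of the expression is copied.
// It throws std::invalid_argument if no digits are found and
// std::out_of_range if the value does not fit in an int.
int read_int_stoi(const std::string& expression, std::size_t& cursor)
{
    std::size_t consumed = 0;
    int value = std::stoi(expression.substr(cursor), &consumed);
    cursor += consumed; // advance past the characters that were parsed
    return value;
}

// std::strtol parses in place (no copy) and reports where it stopped.
long read_int_strtol(const std::string& expression, std::size_t& cursor)
{
    const char* start = expression.c_str() + cursor;
    char* end = nullptr;
    long value = std::strtol(start, &end, 10);
    cursor += static_cast<std::size_t>(end - start);
    return value;
}
```

The strtol version avoids the substring copy, which may matter if the expression is long and read_int is called often; measuring both on real input is the only way to know.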

I found lots of information on this page:
http://www.kumobius.com/2013/08/c-string-to-int/
As Galik said, std::stringstream is very slow compared to everything else.
std::stoi is much faster than std::stringstream
The manual code can be faster still, but as has been pointed out, it doesn't do all the error checking and could have problems.
This website also has an improvement over the code above, multiplying the total by 10, instead of the digit before it's added to the total (in sequential order, instead of reverse, with the stack). This makes for less multiplying by 10.
int Foo_Class::read_int()
{
/** reads an integer out of the expression from the cursor */
int to_return = 0;
while (isdigit(expression[cursor])) // this is safe, returns false, for the end of the string (ISO/IEC 14882:2011 21.4.5)
{
to_return *= 10;
to_return += (expression[cursor] - '0'); // convert from ascii
++cursor;
}
return to_return;
}
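The loop above still overflows silently on long digit runs. A minimal sketch of the same left-to-right accumulation with an overflow check follows; `read_int_checked` and its signature are hypothetical, not part of the original class.

```cpp
#include <cctype>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical variant of read_int(). Returns false, leaving `cursor` and
// `out` untouched, if there are no digits or the value would overflow an int.
bool read_int_checked(const std::string& expression, std::size_t& cursor, int& out)
{
    const int max = std::numeric_limits<int>::max();
    std::size_t pos = cursor;
    int value = 0;
    // The cast avoids undefined behaviour in isdigit for negative char values.
    while (pos < expression.size() &&
           std::isdigit(static_cast<unsigned char>(expression[pos])))
    {
        const int digit = expression[pos] - '0';
        if (value > (max - digit) / 10)
            return false; // value * 10 + digit would exceed INT_MAX
        value = value * 10 + digit;
        ++pos;
    }
    if (pos == cursor)
        return false; // no digits at the cursor
    cursor = pos;
    out = value;
    return true;
}
```

The check costs one comparison and one division per digit, so it gives up a little of the speed advantage in exchange for defined behaviour.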

Related

How to convert large number strings into integer in c++?

Suppose, I have a long string number input in c++. and we have to do numeric operations on it. We need to convert this into the integer or any possible way to do operations, what are those?
string s="12131313123123213213123213213211312321321321312321213123213213";
Looks like the numbers you want to handle are way too big for any standard integer type, so just "converting" them won't get you far. You have two options:
(Highly recommended!) Use a big integer library like e.g. gmp. Such libraries typically also provide functions for parsing and formatting the big numbers.
Implement your big numbers yourself; you could, for example, use an array of uintmax_t to store them. You will have to implement all the arithmetic you might need yourself, which isn't exactly an easy task. For parsing the number, you can use a reversed double dabble implementation. As an example, here's some code I wrote a while ago in C. You can probably use it as-is, but you need to provide some helper functions, and you might want to rewrite it using C++ facilities like std::string and replace the struct used here with a std::vector; it's just here to document the concept:
typedef struct hugeint
{
size_t s; // number of used elements in array e
size_t n; // number of total elements in array e
uintmax_t e[];
} hugeint;
hugeint *hugeint_parse(const char *str)
{
char *buf;
// allocate and initialize:
hugeint *result = hugeint_create();
// this is just a helper function copying all numeric characters
// to a freshly allocated buffer:
size_t bcdsize = copyNum(&buf, str);
if (!bcdsize) return result;
size_t scanstart = 0;
size_t n = 0;
size_t i;
uintmax_t mask = 1;
for (i = 0; i < bcdsize; ++i) buf[i] -= '0';
while (scanstart < bcdsize)
{
if (buf[bcdsize - 1] & 1) result->e[n] |= mask;
mask <<= 1;
if (!mask)
{
mask = 1;
// this function increases the storage size of the flexible array member:
if (++n == result->n) result = hugeint_scale(result, result->n + 1);
}
for (i = bcdsize - 1; i > scanstart; --i)
{
buf[i] >>= 1;
if (buf[i-1] & 1) buf[i] |= 8;
}
buf[scanstart] >>= 1;
while (scanstart < bcdsize && !buf[scanstart]) ++scanstart;
for (i = scanstart; i < bcdsize; ++i)
{
if (buf[i] > 7) buf[i] -= 3;
}
}
free(buf);
return result;
}
Your best bet would be to use a large-number computational library.
One of the best out there is the GNU Multiple Precision Arithmetic Library
Example of a useful function to solve your problem:
Function: int mpz_set_str (mpz_t rop, const char *str, int base)
Set the value of rop from str, a null-terminated C string in base
base. White space is allowed in the string, and is simply ignored.
The base may vary from 2 to 62, or if base is 0, then the leading
characters are used: 0x and 0X for hexadecimal, 0b and 0B for binary,
0 for octal, or decimal otherwise.
For bases up to 36, case is ignored; upper-case and lower-case letters
have the same value. For bases 37 to 62, upper-case letter represent
the usual 10..35 while lower-case letter represent 36..61.
This function returns 0 if the entire string is a valid number in base
base. Otherwise it returns -1.
Documentation: https://gmplib.org/manual/Assigning-Integers.html#Assigning-Integers
If the string contains a number that is no greater than std::numeric_limits<uint64_t>::max(), then std::stoull() is the best option.
unsigned long long value = std::stoull(s);
C++11 and later.
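For a string like the one in the question, std::stoull does not return a truncated value; it throws std::out_of_range. A small sketch of a wrapper that turns both documented exceptions into a boolean result (`try_parse_ull` is a hypothetical name):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true and stores the value if `s` starts with
// a number that fits in unsigned long long, false otherwise.
bool try_parse_ull(const std::string& s, unsigned long long& out)
{
    try {
        out = std::stoull(s);
        return true;
    } catch (const std::invalid_argument&) {
        return false; // no digits to convert
    } catch (const std::out_of_range&) {
        return false; // too large for unsigned long long
    }
}
```

When this returns false for a string that is all digits, the value needs one of the big-number approaches described above.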

Arithmetic Error When Converting String to Double

I'm writing a function to convert a user provided string into a double. It works quite well for certain values, but fails for others. For example
string_to_double("123.45") = 123.45
string_to_double("12345") = 12345
but
string_to_double("123.4567") = 123.457
I'm fairly certain that this is some kind of round-off error, but I'm not using approximations, nor am I using very small or large values. My question is two-fold: why am I getting these strange results, and how can I change my code to get more accurate results? I'm also doing this as a personal challenge, so suggestions to use methods such as std::stod are not helpful. I believe the problem occurs in the second for-loop, but I felt it was wise to include the entire method because, if I missed something, it isn't that much extra code to read.
My Code
template <class T>
double numerical_descriptive_measures<T>::string_to_double(std::string user_input)
{
double numeric_value = 0;//Stores numeric value of string. Return value.
int user_input_size = user_input.size();
int power = 0;
/*This loop is for the characteristic portion of the input
once this loop finishes, we know what to multiply the
characteristic portion by(e.g. 1234 = 1*10^3 + 2*10^2 + 3*10^1 + 4)
*/
for(int i = 0;i < user_input_size;i++)
{
if(user_input[i] == '.')
break;
else
power++;
}
/*This loop is for the mantissa. If this portion is zero,
the loop doesn't execute because i will be greater than
user_input_size.*/
for(int i = 0;i < user_input_size;i++)
{
if(user_input[i] != '.')
{
numeric_value += ((double)user_input[i] - 48.0)*pow(10,power-i-1);
}
else
{
double power = -1.0;
for(int j = i+1;j < user_input_size;j++)
{
numeric_value += ((double)user_input[j] - 48.0)*pow(10.0,power);
power = power-1.0;
}
break;
}
}
return numeric_value;
}
The problem is not that you are producing the wrong floating point value, the problem is that you are printing it with insufficient precision:
std::cout<<data<<std::endl;
This will only print about six digits of precision. You can use std::setprecision or other methods to print more.
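A short sketch of the effect of std::setprecision, using a string stream so the result can be inspected (std::cout behaves the same way); `format_double` is a hypothetical helper, not part of the question's class.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats `value` with the given number of significant digits, using the
// stream's default floating-point notation.
std::string format_double(double value, int significant_digits)
{
    std::ostringstream out;
    out << std::setprecision(significant_digits) << value;
    return out.str();
}
```

With six significant digits (the stream default), 123.4567 comes out as 123.457; asking for ten gives 123.4567, because the default notation drops trailing zeros.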
Your code is not producing an incorrect value for "123.4567" but it will produce incorrect values in general. For example, string_to_double("0.0012") produces (on Visual Studio 2015)
0.0012000000000000001117161918529063768801279366016387939453125
but the correct answer is
0.00119999999999999989487575735580549007863737642765045166015625
(You would have to print them to 17 significant digits to tell the difference.)
The problem is that you can't use floating-point to convert to floating-point -- it does not have enough precision in general.
(I've written a lot about this on my site; for example, see http://www.exploringbinary.com/quick-and-dirty-decimal-to-floating-point-conversion/ and http://www.exploringbinary.com/decimal-to-floating-point-needs-arbitrary-precision/ .)
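For short inputs there is a way to avoid the accumulated error without arbitrary precision: collect all the digits in an integer and divide once by a power of ten. The sketch below assumes IEEE 754 doubles, at most 15 digits in total, no sign and no exponent; under those assumptions both operands are exactly representable, so the single division is the only rounding step. `short_decimal_to_double` is a hypothetical name, and this is an illustration rather than a general-purpose converter.

```cpp
#include <cstdint>
#include <string>

// Sketch only: handles unsigned input of the form "digits[.digits]" with at
// most 15 digits in total. Returns false for anything else.
bool short_decimal_to_double(const std::string& text, double& out)
{
    std::uint64_t digits = 0;  // all digits as one exact integer
    int digit_count = 0;
    int fraction_digits = 0;
    bool seen_point = false;

    for (char c : text) {
        if (c == '.' && !seen_point) {
            seen_point = true;
        } else if (c >= '0' && c <= '9') {
            if (++digit_count > 15)
                return false; // too long for the exactness argument to hold
            digits = digits * 10 + static_cast<std::uint64_t>(c - '0');
            if (seen_point)
                ++fraction_digits;
        } else {
            return false; // sign, exponent, second '.', or junk
        }
    }
    if (digit_count == 0)
        return false;

    // Powers of ten up to 10^22 are exact in a double; here at most 10^15.
    double scale = 1.0;
    for (int i = 0; i < fraction_digits; ++i)
        scale *= 10.0;

    // Both operands are exact, so the division rounds only once.
    out = static_cast<double>(digits) / scale;
    return true;
}
```

Longer inputs, or inputs with large exponents, fall outside this fast path and need the arbitrary-precision techniques discussed in the linked articles.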

What is wrong with the following code? It converts double to string without using sprintf or ostream

I wrote the following code for converting double to string. I was not supposed to use sprintf or ostream. The output is quite erratic.
The list of input with corresponding output :
2.0 2.0
2.5 2.5
-2.0 -2.0
2.987 2.9879947598364142621957397469375
-2.987 -2.9879947598364142621957397469375
Where did these extra digits come from, and how do I overcome this? My code can be found below.
#include <iostream>
#include <math.h>
using namespace std;
string reverse(string input);
string myDtoA(double num);
string itoa(int num);
int main()
{
double inp=-2.987;
cout<<myDtoA(inp)<<endl;
}
string myDtoA(double num)
{
if(num>0)
{
int inPart;
double intPart,fractPart;
fractPart = modf(num,&intPart);
inPart=(int)intPart;
string ret;
ret = itoa(inPart);
if(fractPart!=0)
{
ret.append(".");
double ceilOfFraction = ceil(fractPart);
while(ceilOfFraction !=0)
{
double inP,fP;
fractPart*=10;
fP=modf(fractPart,&inP);
int a =(int)inP;
ret.append(itoa(a));
fractPart=fP;
ceilOfFraction = ceil(fractPart);
}
}
else
{ret.append(".0");}
return ret;
}
else if(num==0)
{
return "0";
}
else if(num<0)
{
string ret = "-";
ret.append(myDtoA(-num));
return ret;
}
}
string itoa(int num)
{
char* str = new char[120];
int i=0;
// Process individual digits
while (num != 0)
{
int rem = num % 10;
str[i++] = (rem > 9)? (rem-10) + 'a' : rem + '0';
num = num/10;
}
string ret(str);
return reverse(ret);
}
/* A utility function to reverse a string */
string reverse(string input)
{
return std::string(input.rbegin(), input.rend());
}
Rounding floating point output is hard.
Here's the most recent paper I found on the subject:
http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
In the footnotes, you'll find a reference to:
[Steele Jr. and White(2004)] G. L. Steele Jr. and J. L. White. How to
print floating-point numbers accurately (retrospective). In 20 Years of
the ACM SIGPLAN Conference on Programming Language Design and
Implementation 1979-1999, A Selection, pages 372–374. ACM, 2004. ISBN
1-58113-623-4. doi: 10.1145/989393.989431.
which is a wonderful exposition. No one is going to be able to pick through your program and tell you what to do to it.
The problem is in your implementation of itoa. What happens if the input to itoa is 0?
Your output of -2.9879947598364142621957397469375 for an input of -2.987 should be -2.9870000000000000994759830064140260219573974609375. Notice that the zeros in my result are missing from yours. Those missing zeros are because of that bug in itoa.
Once you get to the point of dealing with single decimal digits, your itoa is complete overkill. It would be better to use an array that maps the integers 0 to 9 to the characters '0' to '9'. (Or you could just use the fact that '0' to '9' are contiguous characters; the C and C++ standards guarantee this for the decimal digits.)
Even better would be to recognize that the substring starting with 99475983… is completely extraneous. It would be better to print this as -2.9870000000000001, and even better to print it as -2.987.
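A minimal sketch of an itoa replacement that handles zero, which is the case that drops the zeros from the output above. `int_to_string` is a hypothetical name; it covers non-negative values only, which is all that myDtoA passes in.

```cpp
#include <algorithm>
#include <string>

// Sketch of a replacement for the question's itoa(); non-negative input only.
std::string int_to_string(int num)
{
    if (num == 0)
        return "0"; // the original loop never runs for 0, so no digit is emitted

    std::string digits;
    while (num != 0) {
        // '0' to '9' are contiguous, so adding the digit value is enough.
        digits.push_back(static_cast<char>('0' + num % 10));
        num /= 10;
    }
    std::reverse(digits.begin(), digits.end()); // digits were produced lowest first
    return digits;
}
```

Using std::string throughout also removes the unterminated, never-freed `new char[120]` buffer from the original.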

How to convert large integers to base 2^32?

First off, I'm doing this for myself so please don't suggest "use GMP / xint / bignum" (if it even applies).
I'm looking for a way to convert large integers (say, OVER 9000 digits) into an int32 array of base 2^32 digits. The numbers will start out as base 10 strings.
For example, if I wanted to convert string a = "4294967300" (in base 10), which is just over UINT32_MAX, to the new base 2^32 array, it would be int32_t b[] = {1,4}. If int32_t b[] = {3,2485738}, the base 10 number would be 3 * 2^32 + 2485738. Obviously the numbers I'll be working with are beyond the range of even int64, so I can't exactly turn the string into an integer and mod my way to success.
I have a function that does subtraction in base 10. Right now I'm thinking I'll just do subtraction(char* number, "2^32") and count how many times before I get a negative number, but that will probably take a long time for larger numbers.
Can someone suggest a different method of conversion? Thanks.
EDIT
Sorry in case you didn't see the tag, I'm working in C++
Assuming your bignum class already has multiplication and addition, it's fairly simple:
bignum str_to_big(char* str) {
bignum result(0);
while (*str) {
result *= 10;
result += (*str - '0');
str = str + 1;
}
return result;
}
Converting the other way is the same concept, but requires division and modulo
std::string big_to_str(bignum num) {
std::string result;
do {
result.push_back('0' + num%10); // digit value to character; assumes bignum % int yields an integral type
num /= 10;
} while(num > 0);
std::reverse(result.begin(), result.end());
return result;
}
Both of these are for unsigned only.
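If there is no bignum class yet, the division and modulo can be done directly on the base 2^32 digits. The sketch below assumes the digits ("limbs") live in a std::vector<std::uint32_t> with the least significant limb first; that layout and the name `limbs_to_decimal` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Converts base-2^32 limbs (least significant first) to a decimal string
// by repeatedly dividing the whole number by 10 and collecting remainders.
std::string limbs_to_decimal(std::vector<std::uint32_t> limbs)
{
    // Drop zero limbs at the most significant end.
    while (!limbs.empty() && limbs.back() == 0)
        limbs.pop_back();
    if (limbs.empty())
        return "0";

    std::string result;
    while (!limbs.empty()) {
        std::uint64_t remainder = 0;
        // Long division by 10, from the most significant limb down.
        for (std::size_t i = limbs.size(); i-- > 0; ) {
            const std::uint64_t current = (remainder << 32) | limbs[i];
            limbs[i] = static_cast<std::uint32_t>(current / 10);
            remainder = current % 10;
        }
        result.push_back(static_cast<char>('0' + remainder));
        while (!limbs.empty() && limbs.back() == 0)
            limbs.pop_back();
    }
    std::reverse(result.begin(), result.end()); // remainders come out lowest digit first
    return result;
}
```

Because the remainder is always below 10, `(remainder << 32) | limbs[i]` fits in 64 bits and the quotient fits back into 32.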
To convert from base 10 strings to your numbering system, starting with zero continue adding and multiplying each base 10 digit by 10. Every time you have a carry add a new digit to your base 2^32 array.
The simplest (not the most efficient) way to do this is to write two functions, one to multiply a large number by an int, and one to add an int to a large number. If you ignore the complexities introduced by signed numbers, the code looks something like this:
(EDITED to use vector for clarity and to add code for actual question)
void mulbig(vector<uint32_t> &bignum, uint16_t multiplicand)
{
uint32_t carry=0;
for( unsigned i=0; i<bignum.size(); i++ ) {
uint64_t r=((uint64_t)bignum[i] * multiplicand) + carry;
bignum[i]=(uint32_t)(r&0xffffffff);
carry=(uint32_t)(r>>32);
}
if( carry )
bignum.push_back(carry);
}
void addbig(vector<uint32_t> &bignum, uint16_t addend)
{
uint32_t carry=addend;
for( unsigned i=0; carry && i<bignum.size(); i++ ) {
uint64_t r=(uint64_t)bignum[i] + carry;
bignum[i]=(uint32_t)(r&0xffffffff);
carry=(uint32_t)(r>>32);
}
if( carry )
bignum.push_back(carry);
}
Then, implementing atobignum() using those functions is trivial:
void atobignum(const char *str,vector<uint32_t> &bignum)
{
bignum.clear();
bignum.push_back(0);
while( *str ) {
mulbig(bignum,10);
addbig(bignum,*str-'0');
++str;
}
}
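The multiply and add steps can also be folded into a single pass per input digit by seeding the carry with the new digit. This is a compact sketch of that idea, not a replacement for the functions above; `decimal_to_limbs` is a hypothetical name, and the input is assumed to contain only the characters '0' to '9'.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns base-2^32 digits, least significant first.
std::vector<std::uint32_t> decimal_to_limbs(const std::string& decimal)
{
    std::vector<std::uint32_t> limbs(1, 0);
    for (char c : decimal) {
        // limbs = limbs * 10 + digit, with the digit as the initial carry.
        std::uint64_t carry = static_cast<std::uint64_t>(c - '0');
        for (std::uint32_t& limb : limbs) {
            const std::uint64_t r = static_cast<std::uint64_t>(limb) * 10 + carry;
            limb = static_cast<std::uint32_t>(r & 0xffffffffu);
            carry = r >> 32;
        }
        if (carry)
            limbs.push_back(static_cast<std::uint32_t>(carry));
    }
    return limbs;
}
```

For the question's example "4294967300" this yields the limbs 4 and 1 in that order, i.e. the question's array with the least significant digit stored first. Each input digit touches every limb, so the conversion is quadratic in the length of the number.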
I think Docjar: gnu/java/math/MPN.java might contain what you're looking for, specifically the code for public static int set_str (int dest[], byte[] str, int str_len, int base).
Start by converting the number to binary. Starting from the right, each group of 32 bits is a single base 2^32 digit.

Fastest way to determine whether a string contains a real or integer value

I'm trying to write a function that is able to determine whether a string contains a real or an integer value.
This is the simplest solution I could think of:
int containsStringAnInt(char* strg){
for (int i =0; i < strlen(strg); i++) {if (strg[i]=='.') return 0;}
return 1;
}
But this solution is really slow when the string is long... Any optimization suggestions?
Any help would really be appreciated!
What's the syntax of your real numbers?
1e-6 is a valid C++ literal for a real number, but your test will classify it as an integer.
Is your string hundreds of characters long? Otherwise, don't care about any possible performance issues.
The only inefficiency is that you are using strlen() in a bad way, which means a lot of iterations over the string (inside strlen). For a simpler solution, with the same time complexity (O(n)), but probably slightly faster, use strchr().
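A sketch of the strchr version, which makes one pass over the string and stops at the first '.'; `hasNoDecimalPoint` is a hypothetical name for the same test as the question's function.

```cpp
#include <cstring>

// Returns 1 if the string contains no '.', 0 otherwise.
int hasNoDecimalPoint(const char* strg)
{
    return std::strchr(strg, '.') == nullptr ? 1 : 0;
}
```

Like the original, this only looks for a dot, so it shares the original's blind spots (exponents, non-numeric text).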
You are using strlen, which means you are not worried about Unicode. In that case, why use strlen or strchr at all? Just check for '\0' (the null character):
int containsStringAnInt(char* strg){
for (int i =0;strg[i]!='\0'; i++) {
if (strg[i]=='.') return 0;}
return 1; }
This walks the string only once, rather than walking it again on every iteration of the loop.
Your function does not take into account exponential notation of reals (1E7, 1E-7 are both doubles)
Use strtol() to try to convert the string to integer first; it will also return the first position in the string where the parsing failed (this will be '.' if the number is real). If the parsing stopped at '.', use strtod() to try to convert to double. Again, the function will return the position in the string where the parsing stopped.
Don't worry about performance, until you have profiled the program. Otherwise, for fastest possible code, construct a regular expression that describes acceptable syntax of numbers, and hand-convert it first into a FSM, then into highly optimized code.
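To make the state-machine idea concrete, here is a hand-written sketch for one assumed grammar: an optional sign, digits with an optional '.', and an optional exponent. The grammar, the `NumberKind` enum and `classify_number` are assumptions for illustration; whitespace, hex, "inf"/"nan" and locale-specific decimal separators are deliberately left out.

```cpp
enum class NumberKind { Integer, Real, Neither };

// Accepts [+-] digits [ . digits ] [ (e|E) [+-] digits ], with at least one
// digit before or after the '.'. Makes a single pass over the string.
NumberKind classify_number(const char* s)
{
    enum State { Start, Sign, IntDigits, LeadingPoint, FracDigits, Exp, ExpSign, ExpDigits };
    State state = Start;

    for (; *s != '\0'; ++s) {
        const char c = *s;
        const bool digit = (c >= '0' && c <= '9');
        const bool sign = (c == '+' || c == '-');
        const bool exponent = (c == 'e' || c == 'E');

        switch (state) {
        case Start:
        case Sign:
            if (digit) state = IntDigits;
            else if (c == '.') state = LeadingPoint;
            else if (sign && state == Start) state = Sign;
            else return NumberKind::Neither;
            break;
        case IntDigits:
            if (digit) break;
            if (c == '.') state = FracDigits;       // "12." counts as real
            else if (exponent) state = Exp;
            else return NumberKind::Neither;
            break;
        case LeadingPoint:                           // '.' seen, no digit yet
            if (digit) state = FracDigits;
            else return NumberKind::Neither;
            break;
        case FracDigits:
            if (digit) break;
            if (exponent) state = Exp;
            else return NumberKind::Neither;
            break;
        case Exp:
            if (sign) state = ExpSign;
            else if (digit) state = ExpDigits;
            else return NumberKind::Neither;
            break;
        case ExpSign:
            if (digit) state = ExpDigits;
            else return NumberKind::Neither;
            break;
        case ExpDigits:
            if (!digit) return NumberKind::Neither;
            break;
        }
    }

    // Only some states are accepting; e.g. "1e" or "-" end in a non-accepting one.
    if (state == IntDigits) return NumberKind::Integer;
    if (state == FracDigits || state == ExpDigits) return NumberKind::Real;
    return NumberKind::Neither;
}
```

Unlike the strtol/strtod approach, this never converts the value, so it does not detect integers that are too large for any built-in type.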
So the standard note first, please don't worry about performance too much if not profiled yet :)
I'm not sure about the manual loop and checking for a dot. Two issues
Depending on the locale, the dot can actually be a "," too (here in Germany that's the case :)
As others noted, there is the issue with numbers like 1e7
Previously I had a version using sscanf here. But measuring performance showed that sscanf is significantly slower for bigger data sets. So I'll show the faster solution first (well, it's also a whole lot simpler; I had several bugs in the sscanf version until I got it working, while the strto[ld] version worked on the first try):
enum {
REAL,
INTEGER,
NEITHER_NOR
};
int what(char const* strg){
char *endp;
strtol(strg, &endp, 10);
if(*strg && !*endp)
return INTEGER;
strtod(strg, &endp);
if(*strg && !*endp)
return REAL;
return NEITHER_NOR;
}
Just for fun, here is the version using sscanf:
int what(char const* strg) {
// test for int
{
int d; // converted value
int n = 0; // number of chars read
int rd = std::sscanf(strg, "%d %n", &d, &n);
if(!strg[n] && rd == 1) {
return INTEGER;
}
}
// test for double
{
double v; // converted value
int n = 0; // number of chars read
int rd = std::sscanf(strg, "%lf %n", &v, &n);
if(!strg[n] && rd == 1) {
return REAL;
}
}
return NEITHER_NOR;
}
I think that should work. Have fun.
Test was done by converting test strings (small ones) randomly 10000000 times in a loop:
6.6s for sscanf
1.7s for strto[dl]
0.5s for manual looping until "."
Clear win for strto[ld], considering it will parse numbers correctly I will praise it as the winner over manual looping. Anyway, 1.2s/10000000 = 0.00000012 difference roughly for one conversion isn't all that much in the end.
Strlen walks the string to find the length of the string.
You are calling strlen on every pass of the loop. Hence, you are walking the string many more times than necessary. This tiny change should give you a huge performance improvement:
int containsStringAnInt(char* strg){
int len = strlen(strg);
for (int i =0; i < len; i++) {if (strg[i]=='.') return 0;}
return 1;
}
Note that all I did was find the length of the string once, at the start of the function, and refer to that value repeatedly in the loop.
Please let us know what kind of performance improvement this gets you.
@Aaron, your way also traverses the string twice: once within strlen, and once again in the for loop.
The best way to traverse an ASCII string in a for loop is to check for the null character in the loop itself. Have a look at my answer, which parses the string only once within the for loop, and may parse only part of it if it finds a '.' before the end. That way, if a string is like 0.01xxx (another 100 chars), you need not go to the end to find the length.
#include <stdlib.h>
int containsStringAnInt(char* strg){
if (atof(strg) == atoi(strg))
return 1;
return 0;
}