Suppose I have a long number as a string input in C++, and I have to do numeric operations on it. I need to convert it to an integer, or find some other way to do the operations. What are the options?
string s="12131313123123213213123213213211312321321321312321213123213213";
It looks like the numbers you want to handle are way too big for any standard integer type, so just "converting" the string won't get you far. You have two options:
(Highly recommended!) Use a big integer library like e.g. gmp. Such libraries typically also provide functions for parsing and formatting the big numbers.
Implement your big numbers yourself; you could, for example, use an array of uintmax_t to store them. You will have to implement all the arithmetic you might need yourself, and this isn't exactly an easy task. For parsing the number, you can use a reversed double dabble implementation. As an example, here's some code I wrote a while ago in C. You can probably use it as-is, but you need to provide some helper functions, and you might want to rewrite it using C++ facilities like std::string and replace the struct used here with a std::vector. It's just here to document the concept:
typedef struct hugeint
{
size_t s; // number of used elements in array e
size_t n; // number of total elements in array e
uintmax_t e[];
} hugeint;
hugeint *hugeint_parse(const char *str)
{
char *buf;
// allocate and initialize:
hugeint *result = hugeint_create();
// this is just a helper function copying all numeric characters
// to a freshly allocated buffer:
size_t bcdsize = copyNum(&buf, str);
if (!bcdsize) return result;
size_t scanstart = 0;
size_t n = 0;
size_t i;
uintmax_t mask = 1;
for (i = 0; i < bcdsize; ++i) buf[i] -= '0';
while (scanstart < bcdsize)
{
if (buf[bcdsize - 1] & 1) result->e[n] |= mask;
mask <<= 1;
if (!mask)
{
mask = 1;
// this function increases the storage size of the flexible array member:
if (++n == result->n) result = hugeint_scale(result, result->n + 1);
}
for (i = bcdsize - 1; i > scanstart; --i)
{
buf[i] >>= 1;
if (buf[i-1] & 1) buf[i] |= 8;
}
buf[scanstart] >>= 1;
while (scanstart < bcdsize && !buf[scanstart]) ++scanstart;
for (i = scanstart; i < bcdsize; ++i)
{
if (buf[i] > 7) buf[i] -= 3;
}
}
free(buf);
return result;
}
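For comparison, here is a minimal C++ sketch of the same parsing step using a std::vector. It does not use double dabble; it swaps in the simpler "multiply the running value by 10 and add the next digit" technique. The name parse_decimal and the choice of 32-bit limbs are assumptions made for illustration, not part of the code above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: parses a decimal string into little-endian 32-bit
// limbs (least significant limb first) by repeatedly computing
// value = value * 10 + digit. Non-digit characters are skipped.
std::vector<std::uint32_t> parse_decimal(const std::string& text)
{
    std::vector<std::uint32_t> limbs(1, 0);
    for (char ch : text)
    {
        if (ch < '0' || ch > '9') continue;
        // The new digit enters as the carry into the lowest limb.
        std::uint64_t carry = static_cast<std::uint64_t>(ch - '0');
        for (std::uint32_t& limb : limbs)
        {
            std::uint64_t cur = static_cast<std::uint64_t>(limb) * 10 + carry;
            limb = static_cast<std::uint32_t>(cur); // keep the low 32 bits
            carry = cur >> 32;                      // pass the rest upward
        }
        if (carry) limbs.push_back(static_cast<std::uint32_t>(carry));
    }
    return limbs;
}
```

Each digit costs one pass over all limbs, so parsing is quadratic in the length of the input. That is fine for a few hundred digits, but a library like GMP will do much better on very long inputs.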
Your best bet would be to use an arbitrary-precision arithmetic library.
One of the best out there is the GNU Multiple Precision Arithmetic Library (GMP).
An example of a useful function for your problem:
Function: int mpz_set_str (mpz_t rop, const char *str, int base)
Set the value of rop from str, a null-terminated C string in base
base. White space is allowed in the string, and is simply ignored.
The base may vary from 2 to 62, or if base is 0, then the leading
characters are used: 0x and 0X for hexadecimal, 0b and 0B for binary,
0 for octal, or decimal otherwise.
For bases up to 36, case is ignored; upper-case and lower-case letters
have the same value. For bases 37 to 62, upper-case letter represent
the usual 10..35 while lower-case letter represent 36..61.
This function returns 0 if the entire string is a valid number in base
base. Otherwise it returns -1.
Documentation: https://gmplib.org/manual/Assigning-Integers.html#Assigning-Integers
If the string contains a number that is at most std::numeric_limits<uint64_t>::max(), then std::stoull() is the best option.
unsigned long long value = std::stoull(s);
C++11 and later.
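Note that std::stoull throws when the number does not fit, so the call usually needs a guard. A minimal sketch, assuming you want a true/false result instead of an exception (try_parse_ull is a hypothetical name):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: returns true and stores the value only if the whole
// string is a plain decimal number that fits in unsigned long long.
bool try_parse_ull(const std::string& s, unsigned long long& out)
{
    // std::stoull accepts leading whitespace and a sign (a leading '-' wraps
    // around), so insist on a leading digit.
    if (s.empty() || s[0] < '0' || s[0] > '9') return false;
    try
    {
        std::size_t consumed = 0;
        unsigned long long value = std::stoull(s, &consumed, 10);
        if (consumed != s.size()) return false; // trailing non-digits
        out = value;
        return true;
    }
    catch (const std::invalid_argument&) { return false; } // not a number
    catch (const std::out_of_range&) { return false; }     // too large
}
```

With the 62-digit string from the question this returns false, which is the signal to fall back to a big integer type.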
Is the code below less (or more, or equally) efficient than:
make substring from cursor
make stringstream from substring
extract integer using stream operator
? (question edit) or is it less (or more, or equally) efficient than:
std::stoi
? and why?
Could this function be made more efficient?
(The class brings these into scope:)
std::string expression // has some numbers and other stuff in it
int cursor // points somewhere in the string
The code:
int Foo_Class::read_int()
{
/** reads an integer out of the expression from the cursor */
// make stack of digits
std::stack<char> digits;
while (isdigit(expression[cursor])) // this is safe, returns false, for the end of the string (ISO/IEC 14882:2011 21.4.5)
{
digits.push(expression[cursor] - 48); // convert from ascii
++cursor;
}
// add up the stack of digits
int total = 0;
int exponent = 0; // 10 ^ exponent
int this_digit;
while (! digits.empty())
{
this_digit = digits.top();
for (int i = exponent; i > 0; --i)
this_digit *= 10;
total += this_digit;
++exponent;
digits.pop();
}
return total;
}
(I know it doesn't handle overflow.)
(I know someone will probably say something about the magic numbers.)
(I tried pow(10, exponent) and got incorrect results. I'm guessing because of floating point arithmetic, but not sure why because all the numbers are integers.)
I find using std::stringstream to convert numbers is really quite slow.
Better to use the many dedicated number conversion functions like std::stoi, std::stol, std::stoll. Or std::strtol, std::strtoll.
I found lots of information on this page:
http://www.kumobius.com/2013/08/c-string-to-int/
As Galik said, std::stringstream is very slow compared to everything else.
std::stoi is much faster than std::stringstream
The manual code can be faster still, but as has been pointed out, it doesn't do all the error checking and could have problems.
That page also shows an improvement over the code above: process the digits in order (no stack needed) and multiply the running total by 10 before adding each digit, instead of scaling every digit separately. This needs far fewer multiplications by 10.
int Foo_Class::read_int()
{
/** reads an integer out of the expression from the cursor */
int to_return = 0;
while (isdigit(expression[cursor])) // this is safe, returns false, for the end of the string (ISO/IEC 14882:2011 21.4.5)
{
to_return *= 10;
to_return += (expression[cursor] - '0'); // convert from ascii
++cursor;
}
return to_return;
}
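If the missing error checking matters, the same loop can be extended to detect overflow. The sketch below is a free-function variant under assumed names (read_int_checked and its parameters are hypothetical, and cursor is a std::size_t rather than an int); it is an illustration, not a drop-in replacement for the member function.

```cpp
#include <cctype>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical variant of read_int: reads digits starting at cursor and
// advances cursor past them. Returns false, leaving cursor and out untouched,
// if there is no digit at cursor or the value would not fit in an int.
bool read_int_checked(const std::string& expression, std::size_t& cursor, int& out)
{
    std::size_t pos = cursor;
    int total = 0;
    while (pos < expression.size() &&
           std::isdigit(static_cast<unsigned char>(expression[pos])))
    {
        int digit = expression[pos] - '0';
        // total * 10 + digit <= INT_MAX  <=>  total <= (INT_MAX - digit) / 10
        if (total > (INT_MAX - digit) / 10) return false;
        total = total * 10 + digit;
        ++pos;
    }
    if (pos == cursor) return false; // no digits at the cursor
    cursor = pos;
    out = total;
    return true;
}
```

The overflow test runs before the multiplication, so the signed arithmetic never actually overflows.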
So I've created my own function to compare two C Strings:
bool list::compareString(const char array1[], const char array2[])
{
unsigned char count;
for (count = 0; array1[count] != '\0' && array2[count] != '\0' && (array1[count] == array2[count] || array1[count + 32] == array2[count] || array1[count] == array2[count+32]); count++);
if (array1[count] == '\0' && array2[count] == '\0')
return true;
else
return false;
}
The condition of my for loop is very long because it advances count to the end of at least one of the strings, and compares each char in each array in such a way that their case won't matter (adding 32 to an uppercase char turns that char into its lowercase counterpart).
Now, I'm guessing that this is the most efficient way to go about comparing two C Strings, but that for loop is hard to read because of its length. What I've been told is to use a for loop instead of a while loop whenever possible because a for loop has the starting, ending, and incrementing conditions in its starting parameter, but for this, that seems like it may not apply.
What I'm asking is, how should I format this loop, and is there a more efficient way to do it?
Instead of indexing into the arrays with count, which you don't know the size of, you can instead operate directly on the pointers:
bool list::compareString(const char* array1, const char* array2)
{
while (*array1 != '\0' || *array2 != '\0')
if (*array1++ != *array2++) return false; // not the same character
return true;
}
For case insensitive comparison, replace the if condition with:
if (tolower(*array1++) != tolower(*array2++)) return false;
This does a safe character conversion to lower case.
The while loop checks whether the strings have terminated. It continues while at least one of the strings is not yet terminated. If only one string has terminated, the next line, the if statement, will see that the characters don't match (since only one character is '\0') and return false.
If the strings differ at any point, the if statement returns false.
The if statement also post-increments the pointers so that it tests the next character in the next iteration of the while loop.
If both strings are equal, and terminate at the same time, at some point, the while condition will become false. In this case, the return true statement will execute.
If you want to write the tolower function yourself, you need to check that the character is a capital letter, and not a different type of character (e.g. a number or symbol).
This would be:
inline char tolower(char ch)
{
return (ch >= 'A' && ch <= 'Z' ? (ch + 'a' - 'A') : ch);
}
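Putting the pieces of this answer together, a complete case-insensitive version might look like the sketch below. It is written as a free function under a hypothetical name (equals_ignore_case) and uses the standard std::tolower. The cast to unsigned char is an addition: passing a negative char value to std::tolower is undefined.

```cpp
#include <cctype>

// Hypothetical free function; the original is a member of a class named list.
// Compares two C strings for equality, ignoring case.
bool equals_ignore_case(const char* a, const char* b)
{
    // Continue while at least one string has not terminated. If only one has,
    // '\0' is compared against a non-zero character and the test below fails.
    while (*a != '\0' || *b != '\0')
    {
        int ca = std::tolower(static_cast<unsigned char>(*a++));
        int cb = std::tolower(static_cast<unsigned char>(*b++));
        if (ca != cb) return false; // not the same character
    }
    return true; // both strings ended at the same position
}
```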
I guess you are trying to do a case-insensitive comparison here. If you just need the fastest version, use a library function: strcasecmp or stricmp or strcmpi (name depends on your platform).
If you need to understand how to do it (I mean, is your question for learning purpose?), start with a readable version, something like this:
for (index = 0; ; ++index)
{
if (array1[index] == '\0' && array2[index] == '\0')
return true; // end of string reached
if (tolower(array1[index]) != tolower(array2[index]))
return false; // different characters discovered
}
Then measure its performance. If it's good enough, done. If not, investigate why (by looking at the machine code generated by the compiler). The first step in optimization might be replacing the tolower library function by a hand-crafted piece of code (which disregards non-English characters - is it what you want to do?):
int tolower(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 'a' - 'A';
    return c; // not an uppercase letter: leave it unchanged
}
Note that I am still keeping the code readable. Readable code can be fast, because the compiler is going to optimize it.
array1[count + 32] == array2[count]
can read past the end of the array, which is undefined behavior in C++ (no exception is thrown), if the string is shorter than count + 32 characters.
You can use strcmp for comparing two strings
You have a few problems with your code.
What I'd do here is move some of your logic into the body of the for loop. Cramming everything into the for loop expression massively reduces readability without giving you any performance boosts that I can think of. The code just ends up being messy. Keep the conditions of the loop to testing incrementation and put the actual task in the body.
I'd also point out that you're not adding 32 to the character at all. You're adding it to the index of the array putting you at risk of running out of bounds. You need to test the value at the index, not the index itself.
Using an unsigned char to index an array gives you no benefits and only serves to reduce the maximum length of the strings that you can compare. Use an int.
You could restructure the code so that it looks like this:
bool list::compareString(const char array1[], const char array2[])
{
    int count = 0;
    // Iterate until we reach the termination character of at least one string
    for (; array1[count] != '\0' && array2[count] != '\0'; count++) {
        // Note 0x20 is hexadecimal 32. Setting that bit maps ASCII uppercase
        // letters to lowercase; it is only a valid case fold for letters.
        if ( (array1[count] | 0x20) != (array2[count] | 0x20) ) {
            // Return false if the letters aren't equal
            return false;
        }
    }
    // The strings are equal only if both ended at the same position.
    return array1[count] == array2[count];
}
As for efficiency, it looks to me like you were trying to reduce:
The size of the variables that you're using to store data in memory
The number of individual lines of code in your solution
Neither of these is worth your time. Efficiency is about how many steps (not lines of code, mind you) it will take to perform a task and how those steps scale as the inputs get bigger. For instance, how much slower would it be to compare the content of two novels for equality than two single word strings?
I hope that helps :)
As I read, in signed arithmetic there are many cases of undefined behaviour. Thus, I prefer to calculate results (even signed ones) using unsigned arithmetic, which is specified without undefined cases.
However, when the result is obtained using unsigned arithmetic, the last step, the conversion to the signed value remains to be done.
Here is a code I wrote and my question is if the code works in accordance with the rules, i.e., whether it is safe, not relying on some undefined/unspecified behaviour?
/*
function to safely convert given unsigned value
to signed result having the required sign
sign == 1 means the result shall be negative,
sign == 0 means the result shall be nonnegative
returns 1 on success, 0 on failure
*/
int safe_convert(unsigned value, int sign, int *result) {
if (sign) {
if (value > -(unsigned)INT_MIN) return 0; // value too big
if (!value) return 0; // cannot convert zero to negative int
*result = INT_MIN + (int)((-(unsigned)INT_MIN) - value);
} else {
if (value > (unsigned)INT_MAX) return 0; //value too big
*result = (int)value;
}
return 1;
}
Eventually, is there a way that is simpler, not relying on undefined/unspecified behaviour and doing the same thing?
Eventually, is there a way that is simpler, not relying on undefined behaviour and doing the same thing?
short x = (short) value;
int y = (int) value;
But be sure which integral type you are casting to: value may be outside the range of the signed type used.
The only value that could be problematic is INT_MIN. Therefore I would just do something like
int safe_convert(unsigned value, int sign, int *result) {
if (sign) {
if (value > -(unsigned)INT_MIN) return 0; // value too big
if (-(unsigned)INT_MIN > (unsigned)INT_MAX // compile constant
&&
value == -(unsigned)INT_MIN) // special case
*result = INT_MIN;
else *result = -(int)value;
} else {
if (value > (unsigned)INT_MAX) return 0; //value too big
*result = (int)value;
}
return 1;
}
I don't think that the case of asking for a negative zero justifies an error return.
Conversion from unsigned to signed is not undefined, but implementation defined. From C++ Standard, chapter 4.7 Integral conversions, paragraph 3:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined
Therefore the following is implementation defined and on many platforms exactly what you may expect (wrap around):
unsigned u = -1;
int i = (int)u;
The case where sign is false (a nonnegative number) is already well handled; it is the case where sign is true (a negative number) that is tricky. So rather than:
if (value > -(unsigned)INT_MIN) return 0; // value too big
*result = INT_MIN + (int)((-(unsigned)INT_MIN) - value);
suggest
// 1st half is for 2's complement machines
// 2nd half is for symmetric ints like 1's complement and sign-magnitude ints
// Optimization will simplify the real code to 1 compare
if ((((INT_MIN + 1) == -INT_MAX) && (value > ((unsigned)INT_MAX + 1u))) ||
(( INT_MIN == -INT_MAX) && (value > (unsigned)INT_MAX ))) return 0;
int i = (int) value;
*result = -i;
The INT_MIN == -INT_MAX tests could be used to conditionally allow a signed zero.
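As one possible simplification, the negative branch can avoid the INT_MIN special case by negating value - 1 and then subtracting 1. The sketch below is a C++ variant under assumed names (safe_convert_sketch, a bool sign flag, a reference out-parameter). Unlike the question's code, it accepts a "negative" zero and returns 0 for it.

```cpp
#include <climits>

// Hypothetical variant of safe_convert. Returns true on success.
bool safe_convert_sketch(unsigned value, bool negative, int& result)
{
    if (!negative)
    {
        if (value > static_cast<unsigned>(INT_MAX)) return false; // too big
        result = static_cast<int>(value);
        return true;
    }
    // Magnitude of INT_MIN, computed entirely in unsigned arithmetic.
    const unsigned min_magnitude = 0u - static_cast<unsigned>(INT_MIN);
    if (value > min_magnitude) return false; // too big
    if (value == 0) { result = 0; return true; }
    // value - 1 is at most INT_MAX here, so the cast is value-preserving,
    // and -(value - 1) - 1 cannot overflow.
    result = -static_cast<int>(value - 1) - 1;
    return true;
}
```

Every conversion to int in this sketch is applied to a value that is already known to fit, so nothing relies on the implementation-defined out-of-range conversion.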
I wrote a program to write numbers in different bases (base 10, binary, base 53, whatever...)
I initially wrote it as a Win32 console application in Visual C++ 2010, and then converted it to a Windows Forms application (I know, I know...)
In the original form, it worked perfectly, but after the conversion, it stopped working. I narrowed down the problem to this:
The program uses a function that receives a digit and returns a char:
char corresponding_digit(int digit)
{
char corr_digit[62] = {'0','1','2','3','4','5','6','7','8','9',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P' ,'Q','R','S' ,'T','U','V','W','X','Y','Z',
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p' , 'q','r','s' ,'t','u','v','w','x','y','z'};
return corr_digit[digit];
}
This function takes numbers from 0 to 61 and returns the corresponding character: 0-9 -> 0-9; 10-35 -> A-Z; 36-61 -> a-z.
The program uses the function like this:
//on a button click
//base is an integer got from the user
String^ number_base_n = "";
if(base == 10)
number_base_n = value.ToString();
else if(base==0)
number_base_n = "0";
else
{
int digit, power, copy_value;
bool number_started = false;
copy_value = value;
if(copy_value > (int)pow(float(base), MAX_DIGITS)) //cmath was included
number_base_n = "Number too big";
else
{
for(int i = MAX_DIGITS; i >= 0; i--)
{
power = (int)pow(float(base), i);
if(copy_value >= power)
{
digit = copy_value/power;
copy_value -= digit*power;
number_started = true;
number_base_n += corresponding_digit(digit);
}
else if (number_started || i==0)
{
number_base_n += "0";
}
}
}
}
textBox6->Text = number_base_n;
After debugging a bit, I realized the problem happens when function corresponding_digit is called with digit value "1", which should return '1', in the expression
//number base_n equals ""
number_base_n += String(corresponding_digit(digit));
//number_base_n equals "49"
number_base_n, starting at "", ends with "49", which is actually the ASCII value of 1. I looked online, and all I got was converting the result, with String(value) or value.ToString(), but apparently I can't do
number_base_n += corresponding_digit(digit).ToString();
I tried using an auxiliary variable:
aux = corresponding_digit(digit);
number_base_n += aux.ToString();
but I got the exact same (wrong) result... (Same thing with String(value) )
I fumbled around a bit more, but not anything worth mentioning, I believe.
So... any help?
Also: base 10 and base 0 are working perfectly
In C++/CLI, char is the same thing as it is in C++: a single byte, representing a single character. In C#, char (or System.Char) is a two-byte UTF-16 code unit. The C++ and C++/CLI equivalent to C#'s char is wchar_t. C++'s char is equivalent to System::Byte in C#.
As you have it now, attempting to do things with managed strings results in the managed APIs treating your C++ char as a C# byte, which is a number, not a character. That's why you're getting the ASCII value of the character, because it's being treated as a number, not a character.
To be explicit about things, I'd recommend you switch the return type of your corresponding_digit method to be System::Char. This way, when you operate with managed strings, the managed APIs will know that the data in question are characters, and you'll get your expected results.
System::Char corresponding_digit(int digit)
{
System::Char corr_digit[62] = {'0','1','2','3','4','5','6','7','8','9',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};
return corr_digit[digit];
}
Other possible changes you could make:
Use a StringBuilder instead of appending strings.
Switch corr_digit to a managed array (array<System::Char>^), and store it somewhere reusable. As the code is written now, the corresponding_digit method has to re-create this array from scratch every time the method is called.
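The "store the table somewhere reusable" idea can be sketched in standard C++ (not C++/CLI, so std::string stands in for String^ and StringBuilder). The function name to_base is hypothetical, and the sketch swaps the pow-based loop from the question for repeated division, which avoids floating point entirely.

```cpp
#include <algorithm>
#include <string>

// Hypothetical standard C++ helper: converts a non-negative value to the
// given base (2..62) using the same digit alphabet as corr_digit.
std::string to_base(unsigned value, unsigned base)
{
    // Built once and shared by every call.
    static const char digits[] =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    if (base < 2 || base > 62) return std::string(); // unsupported base
    std::string out;
    do
    {
        out.push_back(digits[value % base]); // least significant digit first
        value /= base;
    } while (value != 0);
    std::reverse(out.begin(), out.end());
    return out;
}
```

Because the digits come out least significant first, the string is reversed once at the end instead of being prepended to on every step.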