Custom strtoull functions producing invalid results - c++

I do not have access to anything like strtoull on my actual platform, so I need to find or hand-roll one. I've tried all four from this answer and they all give me the same wrong answer on my Windows testing platform. I also tried online compilers.
One of the functions is
unsigned long long strtoull_simple(const char *s) {
unsigned long long sum = 0;
while (*s) {
sum = sum*10 + (*s++ - '0');
}
return sum;
}
And given "87ddb08343547aec" it produces 9277008343552481 instead of the real value 9790175242790140652 evident here and also evident if you use strtoull where available. Why do all of those functions fail to provide the correct value?

As mentioned in the comments, the code looks like it is for base-10 numbers but the number you are trying to parse looks base-16. You can add a parameter to specify the base:
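#include <ctype.h> // tolower, isdigit
unsigned long long strtoull_simple(const char *s, int base)
{
    // Must be unsigned long long: unsigned long is only 32 bits on Windows.
    unsigned long long res = 0;
    while (*s) {
        // TODO: handle invalid chars
        char c = tolower((unsigned char)*s);
        res = (res * base) + (isdigit((unsigned char)c) ? c - '0' : c - 'a' + 10);
        ++s;
    }
    return res;
}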
And call it with
printf("%llu", strtoull_simple("87ddb08343547aec", 16));
Output:
9790175242790140652
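The TODO in the function above leaves invalid characters and overflow unhandled. A minimal sketch of one way to cover both follows. The name parse_ull and the bool-plus-out-parameter interface are assumptions for illustration, not part of the answer, and the letter ranges assume an ASCII-compatible character set.

```cpp
#include <climits>

// Hypothetical helper, not part of the original answer.
// Parses s as an unsigned number in the given base (2..36).
// Returns false on an empty string, an invalid digit, or overflow.
bool parse_ull(const char *s, int base, unsigned long long *out)
{
    if (s == nullptr || *s == '\0' || base < 2 || base > 36) return false;
    unsigned long long res = 0;
    for (; *s != '\0'; ++s) {
        int digit;
        if (*s >= '0' && *s <= '9')      digit = *s - '0';
        else if (*s >= 'a' && *s <= 'z') digit = *s - 'a' + 10;
        else if (*s >= 'A' && *s <= 'Z') digit = *s - 'A' + 10;
        else return false;
        // A digit that is valid in some base may still be too large for this one.
        if (digit >= base) return false;
        // res * base + digit must not exceed ULLONG_MAX.
        if (res > (ULLONG_MAX - digit) / base) return false;
        res = res * base + digit;
    }
    *out = res;
    return true;
}
```

With this sketch, parsing "87ddb08343547aec" in base 10 is rejected outright instead of silently producing a wrong value, which would have exposed the original problem immediately.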

Related

How to convert large number strings into integer in c++?

Suppose, I have a long string number input in c++. and we have to do numeric operations on it. We need to convert this into the integer or any possible way to do operations, what are those?
string s="12131313123123213213123213213211312321321321312321213123213213";
It looks like the numbers you want to handle are way too big for any standard integer type, so just "converting" them won't get you far. You have two options:
(Highly recommended!) Use a big integer library like e.g. gmp. Such libraries typically also provide functions for parsing and formatting the big numbers.
Implement your big numbers yourself; you could, for example, use an array of uintmax_t to store them. You will have to implement all the arithmetic you might need yourself, and this isn't exactly an easy task. For parsing the number, you can use a reversed double dabble implementation. As an example, here's some code I wrote a while ago in C. You can probably use it as-is, but you need to provide some helper functions, and you might want to rewrite it using C++ facilities like std::string and replace the struct used here with a std::vector. It's just here to document the concept:
typedef struct hugeint
{
size_t s; // number of used elements in array e
size_t n; // number of total elements in array e
uintmax_t e[];
} hugeint;
hugeint *hugeint_parse(const char *str)
{
char *buf;
// allocate and initialize:
hugeint *result = hugeint_create();
// this is just a helper function copying all numeric characters
// to a freshly allocated buffer:
size_t bcdsize = copyNum(&buf, str);
if (!bcdsize) return result;
size_t scanstart = 0;
size_t n = 0;
size_t i;
uintmax_t mask = 1;
for (i = 0; i < bcdsize; ++i) buf[i] -= '0';
while (scanstart < bcdsize)
{
if (buf[bcdsize - 1] & 1) result->e[n] |= mask;
mask <<= 1;
if (!mask)
{
mask = 1;
// this function increases the storage size of the flexible array member:
if (++n == result->n) result = hugeint_scale(result, result->n + 1);
}
for (i = bcdsize - 1; i > scanstart; --i)
{
buf[i] >>= 1;
if (buf[i-1] & 1) buf[i] |= 8;
}
buf[scanstart] >>= 1;
while (scanstart < bcdsize && !buf[scanstart]) ++scanstart;
for (i = scanstart; i < bcdsize; ++i)
{
if (buf[i] > 7) buf[i] -= 3;
}
}
free(buf);
return result;
}
Your best bet would be to use a large-number computation library.
One of the best out there is the GNU Multiple Precision Arithmetic Library.
Example of a useful function to solve your problem:
Function: int mpz_set_str (mpz_t rop, const char *str, int base)
Set the value of rop from str, a null-terminated C string in base
base. White space is allowed in the string, and is simply ignored.
The base may vary from 2 to 62, or if base is 0, then the leading
characters are used: 0x and 0X for hexadecimal, 0b and 0B for binary,
0 for octal, or decimal otherwise.
For bases up to 36, case is ignored; upper-case and lower-case letters
have the same value. For bases 37 to 62, upper-case letter represent
the usual 10..35 while lower-case letter represent 36..61.
This function returns 0 if the entire string is a valid number in base
base. Otherwise it returns -1.
Documentation: https://gmplib.org/manual/Assigning-Integers.html#Assigning-Integers
If the string contains a number no greater than std::numeric_limits<uint64_t>::max(), then std::stoull() is the best option.
unsigned long long value = std::stoull(s);
C++11 and later.
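std::stoull throws when the text does not fit, so a caller that may receive strings like the one in the question needs to handle that. A minimal sketch, assuming a hypothetical wrapper named to_ull that reports failure through its return value:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: returns true and stores the value in out, or returns
// false when the text is not a plain decimal number or does not fit in
// unsigned long long.
bool to_ull(const std::string &s, unsigned long long &out)
{
    // std::stoull accepts leading whitespace and a sign, so reject anything
    // that does not start with a digit.
    if (s.empty() || s[0] < '0' || s[0] > '9') return false;
    try {
        std::size_t used = 0;
        unsigned long long value = std::stoull(s, &used, 10);
        if (used != s.size()) return false;  // trailing non-digit characters
        out = value;
        return true;
    } catch (const std::invalid_argument &) {
        return false;
    } catch (const std::out_of_range &) {
        return false;  // too large: fall back to a big integer library
    }
}
```

For the 62-digit string from the question this returns false, which is the signal to switch to one of the big-number approaches described above.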

C Code Acting Differently to C++ on Lookup

I have the following code block (NOT written by me), which performs mapping and recodes ASCII characters to EBCDIC.
// Variables.
CodeHeader* tchpLoc = {};
...
memset(tchpLoc->m_ucpEBCDCMap, 0xff, 256);
for (i = 0; i < 256; i++) {
if (tchpLoc->m_ucpASCIIMap[i] != 0xff) {
ucTmp2 = i;
asc2ebn(&ucTmp1, &ucTmp2, 1);
tchpLoc->m_ucpEBCDCMap[ucTmp1] = tchpLoc->m_ucpASCIIMap[i];
}
}
The CodeHeader definition is
typedef struct {
...
UCHAR* m_ucpASCIIMap;
UCHAR* m_ucpEBCDCMap;
} CodeHeader;
and the method that seems to be giving me problems is
void asc2ebn(char* szTo, char* szFrom, int nChrs)
{
while (nChrs--)
*szTo++ = ucpAtoe[(*szFrom++) & 0xff];
}
[Note, the unsigned char array ucpAtoe[256] is copied at the end of the question for reference].
Now, I have an old C application and my C++11 conversion running side by side. Both programs write a massive .bin file, and there is a tiny discrepancy which I have traced to the above code. What happens in both programs is that the block
...
if (tchpLoc->m_ucpASCIIMap[i] != 0xff) {
ucTmp2 = i;
asc2ebn(&ucTmp1, &ucTmp2, 1);
tchpLoc->m_ucpEBCDCMap[ucTmp1] = tchpLoc->m_ucpASCIIMap[i];
}
gets entered for i = 32, and the asc2ebn method returns ucTmp1 as 64 or '@' for both the C and C++ variants, great. The next entry is for i = 48; for this value the C code returns ucTmp1 as 240 or 'ð', while the C++ code returns ucTmp1 as -16 or 'ð'. My question is: why is this lookup/conversion producing different results for exactly the same input and lookup array (copied below)?
In this case the old C code is taken as correct, so I want the C++ to produce the same result for this lookup/conversion. Thanks for your time.
static UCHAR ucpAtoe[256] = {
'\x00','\x01','\x02','\x03','\x37','\x2d','\x2e','\x2f',/*00-07*/
'\x16','\x05','\x25','\x0b','\x0c','\x0d','\x0e','\x0f',/*08-0f*/
'\x10','\x11','\x12','\xff','\x3c','\x3d','\x32','\xff',/*10-17*/
'\x18','\x19','\x3f','\x27','\x22','\x1d','\x35','\x1f',/*18-1f*/
'\x40','\x5a','\x7f','\x7b','\x5b','\x6c','\x50','\xca',/*20-27*/
'\x4d','\x5d','\x5c','\x4e','\x6b','\x60','\x4b','\x61',/*28-2f*/
'\xf0','\xf1','\xf2','\xf3','\xf4','\xf5','\xf6','\xf7',/*30-37*/
'\xf8','\xf9','\x7a','\x5e','\x4c','\x7e','\x6e','\x6f',/*38-3f*/
'\x7c','\xc1','\xc2','\xc3','\xc4','\xc5','\xc6','\xc7',/*40-47*/
'\xc8','\xc9','\xd1','\xd2','\xd3','\xd4','\xd5','\xd6',/*48-4f*/
'\xd7','\xd8','\xd9','\xe2','\xe3','\xe4','\xe5','\xe6',/*50-57*/
'\xe7','\xe8','\xe9','\xad','\xe0','\xbd','\xff','\x6d',/*58-5f*/
'\x79','\x81','\x82','\x83','\x84','\x85','\x86','\x87',/*60-67*/
'\x88','\x89','\x91','\x92','\x93','\x94','\x95','\x96',/*68-6f*/
'\x97','\x98','\x99','\xa2','\xa3','\xa4','\xa5','\xa6',/*70-77*/
'\xa7','\xa8','\xa9','\xc0','\x6a','\xd0','\xa1','\xff',/*78-7f*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*80-87*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*88-8f*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*90-97*/
'\xff','\xff','\xff','\x4a','\xff','\xff','\xff','\xff',/*98-9f*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*a0-a7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*a8-af*/
'\xff','\xff','\xff','\x4f','\xff','\xff','\xff','\xff',/*b0-b7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*b8-bf*/
'\xff','\xff','\xff','\xff','\xff','\x8f','\xff','\xff',/*c0-c7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*c8-cf*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*d0-d7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*d8-df*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*e0-e7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff',/*e8-ef*/
'\xff','\xff','\xff','\x8c','\xff','\xff','\xff','\xff',/*f0-f7*/
'\xff','\xff','\xff','\xff','\xff','\xff','\xff','\xff' };
In both C and C++, the standard doesn't specify whether plain char is signed or unsigned. It's implementation-defined, and apparently your C compiler treats char as unsigned, while your C++ compiler treats it as signed.
For GCC, the flag to make char to be unsigned char is -funsigned-char. For MSVC, it's /J.
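Both 240 and -16 are the same byte, 0xF0; only the interpretation differs. As an alternative to the compiler flags, the signedness can be made explicit in the code. A minimal sketch, with hypothetical helper names and assuming an 8-bit char on a two's complement platform:

```cpp
// Hypothetical helper: turn a char into a table index in 0..255,
// regardless of whether plain char is signed on this compiler.
int as_index(char c)
{
    return static_cast<unsigned char>(c);
}

// The same byte seen through both explicit character types.
int as_signed(unsigned char byte)   { return static_cast<signed char>(byte); }
int as_unsigned(unsigned char byte) { return byte; }
```

Here as_unsigned(0xf0) gives 240 and as_signed(0xf0) gives -16 on such platforms, matching the two values seen in the question. Declaring the temporaries and the asc2ebn parameters as unsigned char would make both builds behave like the C one without relying on a flag.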

C++ function convertCtoD

I'm new to C++. As part of an assignment we have to write two functions, but I don't know what the teacher means by what he is requesting. Has anyone seen this, or can anyone at least point me in the right direction? I don't want you to write the function; I just don't know what the output should be or what he is asking. I'm actually clueless right now.
Thank you
convertCtoD( )
This function is sent a null terminated character array
where each character represents a Decimal (base 10) digit.
The function returns an integer which is the base 10 representation of the characters.
convertBtoD( )
This function is sent a null terminated character array
where each character represents a Binary (base 2) digit.
The function returns an integer which is the base 10 representation of the character.
This function is sent a null terminated character array where each character represents a Decimal (base 10) digit. The function returns an integer which is the base 10 representation of the characters.
I'll briefly mention the fact that "an integer which is the base 10 representation of the characters" is useless here, the integer will represent the value whereas "base 10 representation" is the presentation of said value.
However, the description given simply means you take in a (C-style) string of digits and put out an integer. So you would start with:
int convertCtoD(char *decimalString) {
int retVal = 0;
// TBD: needs actual implementation.
return retVal;
}
This function is sent a null terminated character array where each character represents a Binary (base 2) digit. The function returns an integer which is the base 10 representation of the character.
This will be very similar:
int convertBtoD(char *binaryString) {
int retVal = 0;
// TBD: needs actual implementation.
return retVal;
}
You'll notice I've left the return type as signed even though there's no need to handle signed values at all. You'll see why in the example implementation I provide below as I'm using it to return an error condition. The reason I'm providing code even though you didn't ask for it is that I think five-odd years is enough of a gap to ensure you can't cheat by passing off my code as your own :-)
Perhaps the simplest example would be:
int convertCtoD(char *str) {
// Initialise accumulator to zero.
int retVal = 0;
// Process each character.
while (*str != '\0') {
// Check character for validity, add to accumulator (after
// converting char to int) then go to next character.
if ((*str < '0') || (*str > '9')) return -1;
retVal *= 10;
retVal += *str++ - '0';
}
return retVal;
}
The binary version would basically be identical except that it would use '1' as the upper limit and 2 as the multiplier (as opposed to '9' and 10).
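Following that description, a sketch of the binary counterpart might look like this. It mirrors the decimal routine; the const qualifier on the parameter is an addition here so that string literals can be passed directly.

```cpp
// Binary counterpart of the decimal routine: '1' is the largest valid
// digit and 2 is the multiplier. Returns -1 on an invalid character.
// Like the simple decimal version, it does not detect overflow.
int convertBtoD(const char *str) {
    // Initialise accumulator to zero.
    int retVal = 0;
    // Process each character.
    while (*str != '\0') {
        // Only '0' and '1' are valid binary digits.
        if ((*str < '0') || (*str > '1')) return -1;
        retVal *= 2;
        retVal += *str++ - '0';
    }
    return retVal;
}
```

For example, "1011" is processed as ((1*2 + 0)*2 + 1)*2 + 1, which is 11.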
That's the simplest form but there's plenty of scope for improvement to make your code more robust and readable:
Since the two functions are very similar, you could refactor out the common bits so as to reduce duplication.
You may want to consider an empty string as invalid rather than just returning zero as it currently does.
You probably want to detect overflow as an error.
With those in mind, it may be that the following is a more robust solution:
#include <stdbool.h>
#include <limits.h>
int convertBorCtoD(char *str, bool isBinary) {
// Configure stuff that depends on binary/decimal choice.
int maxDigit = isBinary ? '1' : '9';
int multiplier = maxDigit - '0' + 1;
// Initialise accumulator to zero.
int retVal = 0;
// Optional check for empty string as error.
if (*str == '\0') return -1;
// Process each character.
while (*str != '\0') {
// Check character for validity.
if ((*str < '0') || (*str > maxDigit)) return -1;
// Add to accumulator, checking for overflow.
if (INT_MAX / multiplier < retVal) return -1;
retVal *= multiplier;
if (INT_MAX - (*str - '0') < retVal) return -1;
retVal += *str++ - '0';
}
return retVal;
}
int convertCtoD(char *str) { return convertBorCtoD(str, false); }
int convertBtoD(char *str) { return convertBorCtoD(str, true); }

Shortest way to calculate difference between two numbers?

I'm about to do this in C++ but I have had to do it in several languages, it's a fairly common and simple problem, and this is the last time. I've had enough of coding it as I do, I'm sure there must be a better method, so I'm posting here before I write out the same long winded method in yet another language;
Consider the (lilies!) following code;
// I want the difference between these two values as a positive integer
int x = 7;
int y = 3;
int diff;
// This means you have to find the largest number first
// before making the subtract, to keep the answer positive
if (x>y) {
diff = (x-y);
} else if (y>x) {
diff = (y-x);
} else if (x==y) {
diff = 0;
}
This may sound petty but that seems like a lot to me, just to get the difference between two numbers. Is this in fact a completely reasonable way of doing things and I'm being unnecessarily pedantic, or is my spidey sense tingling with good reason?
Just get the absolute value of the difference:
#include <cstdlib>
int diff = std::abs(x-y);
Using the std::abs() function is one clear way to do this, as others here have suggested.
But perhaps you are interested in succinctly writing this function without library calls.
In that case
diff = x > y ? x - y : y - x;
is a short way.
In your comments, you suggested that you are interested in speed. In that case, you may be interested in ways of performing this operation that do not require branching. This link describes some.
#include <cstdlib>
int main()
{
int x = 7;
int y = 3;
int diff = std::abs(x-y);
}
All the existing answers will overflow on extreme inputs, giving undefined behaviour. @craq pointed this out in a comment.
If you know that your values will fall within a narrow range, it may be fine to do as the other answers suggest, but to handle extreme inputs (i.e. to robustly handle any possible input values), you cannot simply subtract the values then apply the std::abs function. As craq rightly pointed out, the subtraction may overflow, causing undefined behaviour (consider INT_MIN - 1), and the std::abs call may also cause undefined behaviour (consider std::abs(INT_MIN)). It's no better to determine the min and max of the pair and to then perform the subtraction.
More generally, a signed int is unable to represent the maximum difference between two signed int values. The unsigned int type should be used for the output value.
I see 3 solutions. I've used the explicitly-sized integer types from stdint.h here, to close the door on uncertainties like whether long and int are the same size and range.
Solution 1. The low-level way.
// I'm unsure if it matters whether our target platform uses 2's complement,
// due to the way signed-to-unsigned conversions are defined in C and C++:
// > the value is converted by repeatedly adding or subtracting
// > one more than the maximum value that can be represented
// > in the new type until the value is in the range of the new type
uint32_t difference_int32(int32_t i, int32_t j) {
static_assert(
(-(int64_t)INT32_MIN) == (int64_t)INT32_MAX + 1,
"Unexpected numerical limits. This code assumes two's complement."
);
// Map the signed values across to the number-line of uint32_t.
// Preserves the greater-than relation, such that an input of INT32_MIN
// is mapped to 0, and an input of 0 is mapped to near the middle
// of the uint32_t number-line.
// Leverages the wrap-around behaviour of unsigned integer types.
// It would be more intuitive to set the offset to (uint32_t)(-1 * INT32_MIN)
// but that multiplication overflows the signed integer type,
// causing undefined behaviour. We get the right effect subtracting from zero.
const uint32_t offset = (uint32_t)0 - (uint32_t)(INT32_MIN);
const uint32_t i_u = (uint32_t)i + offset;
const uint32_t j_u = (uint32_t)j + offset;
const uint32_t ret = (i_u > j_u) ? (i_u - j_u) : (j_u - i_u);
return ret;
}
I tried a variation on this using bit-twiddling cleverness taken from https://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax but modern code-generators seem to generate worse code with this variation. (I've removed the static_assert and the comments.)
uint32_t difference_int32(int32_t i, int32_t j) {
const uint32_t offset = (uint32_t)0 - (uint32_t)(INT32_MIN);
const uint32_t i_u = (uint32_t)i + offset;
const uint32_t j_u = (uint32_t)j + offset;
// Surprisingly it helps code-gen in MSVC 2019 to manually factor-out
// the common subexpression. (Even with optimisation /O2)
const uint32_t t = (i_u ^ j_u) & -(i_u < j_u);
const uint32_t min = j_u ^ t; // min(i_u, j_u)
const uint32_t max = i_u ^ t; // max(i_u, j_u)
const uint32_t ret = max - min;
return ret;
}
Solution 2. The easy way. Avoid overflow by doing the work using a wider signed integer type. This approach can't be used if the input signed integer type is the largest signed integer type available.
uint32_t difference_int32(int32_t i, int32_t j) {
return (uint32_t)std::abs((int64_t)i - (int64_t)j);
}
Solution 3. The laborious way. Use flow-control to work through the different cases. Likely to be less efficient.
uint32_t difference_int32(int32_t i, int32_t j)
{ // This static assert should pass even on 1's complement.
// It's just about impossible that int32_t could ever be capable of representing
// *more* values than can uint32_t.
// Recall that in 2's complement it's the same number, but in 1's complement,
// uint32_t can represent one more value than can int32_t.
static_assert( // Must use int64_t to subtract negative number from INT32_MAX
((int64_t)INT32_MAX - (int64_t)INT32_MIN) <= (int64_t)UINT32_MAX,
"Unexpected numerical limits. Unable to represent greatest possible difference."
);
uint32_t ret;
if (i == j) {
ret = 0;
} else {
if (j > i) { // Swap them so that i > j
const int32_t i_orig = i;
i = j;
j = i_orig;
} // We may now safely assume i > j
uint32_t magnitude_of_greater; // The magnitude, i.e. abs()
bool greater_is_negative; // Zero is of course non-negative
uint32_t magnitude_of_lesser;
bool lesser_is_negative;
if (i >= 0) {
magnitude_of_greater = i;
greater_is_negative = false;
} else { // Here we know 'lesser' is also negative, but we'll keep it simple
// magnitude_of_greater = -i; // DANGEROUS, overflows if i == INT32_MIN.
magnitude_of_greater = (uint32_t)0 - (uint32_t)i;
greater_is_negative = true;
}
if (j >= 0) {
magnitude_of_lesser = j;
lesser_is_negative = false;
} else {
// magnitude_of_lesser = -j; // DANGEROUS, overflows if j == INT32_MIN.
magnitude_of_lesser = (uint32_t)0 - (uint32_t)j;
lesser_is_negative = true;
}
// Finally compute the difference between lesser and greater
if (!greater_is_negative && !lesser_is_negative) {
ret = magnitude_of_greater - magnitude_of_lesser;
} else if (greater_is_negative && lesser_is_negative) {
ret = magnitude_of_lesser - magnitude_of_greater;
} else { // One negative, one non-negative. Difference is sum of the magnitudes.
// This will never overflow.
ret = magnitude_of_lesser + magnitude_of_greater;
}
}
return ret;
}
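For comparison, here is a more compact variant of the same idea, offered as a sketch rather than as a fourth solution from this answer. It compares the signed inputs directly and does only the subtraction in uint32_t, relying on the fact that signed-to-unsigned conversion and unsigned subtraction are both defined modulo 2^32. The name abs_diff_u32 is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: absolute difference of two int32_t values, returned
// as uint32_t so that the largest possible difference is representable.
std::uint32_t abs_diff_u32(std::int32_t a, std::int32_t b)
{
    // Converting to unsigned is well defined (reduction modulo 2^32).
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    // Subtract the smaller from the larger; the wrap-around of unsigned
    // arithmetic yields the true difference, which always fits in 32 bits.
    return (a > b) ? ua - ub : ub - ua;
}
```

The extreme case abs_diff_u32(INT32_MIN, INT32_MAX) yields UINT32_MAX, with no signed overflow at any step.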
Well, it depends on what you mean by shortest: the fastest runtime, the fastest compilation, the fewest lines, or the least memory. I'll assume you mean runtime.
#include <algorithm> // std::max/min
int diff = std::max(x,y)-std::min(x,y);
This does two comparisons and one operation (this one is unavoidable but could be optimized through certain bitwise operations with specific cases, compiler might actually do this for you though). Also if the compiler is smart enough it could do only one comparison and save the result for the other comparison. E.g if X>Y then you know from the first comparison that Y < X but I'm not sure if compilers take advantage of this.

How to convert large integers to base 2^32?

First off, I'm doing this for myself so please don't suggest "use GMP / xint / bignum" (if it even applies).
I'm looking for a way to convert large integers (say, OVER 9000 digits) into an int32 array of base 2^32 digits. The numbers will start out as base 10 strings.
For example, if I wanted to convert string a = "4294967300" (in base 10), which is just over UINT32_MAX, to the new base 2^32 array, it would be int32_t b[] = {1,4}, since 4294967300 = 1 * 2^32 + 4. If int32_t b[] = {3,2485738}, the base 10 number would be 3 * 2^32 + 2485738. Obviously the numbers I'll be working with are beyond the range of even int64, so I can't exactly turn the string into an integer and mod my way to success.
I have a function that does subtraction in base 10. Right now I'm thinking I'll just do subtraction(char* number, "2^32") and count how many times before I get a negative number, but that will probably take a long time for larger numbers.
Can someone suggest a different method of conversion? Thanks.
EDIT
Sorry in case you didn't see the tag, I'm working in C++
Assuming your bignum class already has multiplication and addition, it's fairly simple:
bignum str_to_big(char* str) {
bignum result(0);
while (*str) {
result *= 10;
result += (*str - '0');
str = str + 1;
}
return result;
}
Converting the other way is the same concept, but requires division and modulo
std::string big_to_str(bignum num) {
std::string result;
do {
result.push_back('0' + (int)(num%10)); // digit value to character; assumes bignum converts to int
num /= 10;
} while(num > 0);
std::reverse(result.begin(), result.end());
return result;
}
Both of these are for unsigned only.
To convert from base 10 strings to your numbering system, start with zero and, for each base 10 digit in turn, multiply the running value by 10 and add the digit. Every time a carry comes out of the top word, add a new digit to your base 2^32 array.
The simplest (not the most efficient) way to do this is to write two functions, one to multiply a large number by an int, and one to add an int to a large number. If you ignore the complexities introduced by signed numbers, the code looks something like this:
(EDITED to use vector for clarity and to add code for actual question)
void mulbig(vector<uint32_t> &bignum, uint16_t multiplicand)
{
uint32_t carry=0;
for( unsigned i=0; i<bignum.size(); i++ ) {
uint64_t r=((uint64_t)bignum[i] * multiplicand) + carry;
bignum[i]=(uint32_t)(r&0xffffffff);
carry=(uint32_t)(r>>32);
}
if( carry )
bignum.push_back(carry);
}
void addbig(vector<uint32_t> &bignum, uint16_t addend)
{
uint32_t carry=addend;
for( unsigned i=0; carry && i<bignum.size(); i++ ) {
uint64_t r=(uint64_t)bignum[i] + carry;
bignum[i]=(uint32_t)(r&0xffffffff);
carry=(uint32_t)(r>>32);
}
if( carry )
bignum.push_back(carry);
}
Then, implementing atobignum() using those functions is trivial:
void atobignum(const char *str,vector<uint32_t> &bignum)
{
bignum.clear();
bignum.push_back(0);
while( *str ) {
mulbig(bignum,10);
addbig(bignum,*str-'0');
++str;
}
}
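The multiply and add passes can also be fused into one loop per input digit. A minimal self-contained sketch of that variant follows; the name decimal_to_limbs is hypothetical. As in the code above, the least significant base 2^32 digit is stored first, which is the reverse of the order used in the question's example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-pass variant: for each decimal digit,
// compute limbs = limbs * 10 + digit in one sweep.
// Limbs are base 2^32 digits, least significant first.
std::vector<std::uint32_t> decimal_to_limbs(const char *str)
{
    std::vector<std::uint32_t> limbs(1, 0);
    // Stops at the terminating '\0' or at the first non-digit character.
    for (; *str >= '0' && *str <= '9'; ++str) {
        // The incoming digit acts as the initial carry.
        std::uint64_t carry = static_cast<std::uint64_t>(*str - '0');
        for (std::size_t i = 0; i < limbs.size(); ++i) {
            std::uint64_t r = static_cast<std::uint64_t>(limbs[i]) * 10 + carry;
            limbs[i] = static_cast<std::uint32_t>(r & 0xffffffffu);
            carry = r >> 32;
        }
        // A carry out of the top limb grows the number by one limb.
        if (carry) limbs.push_back(static_cast<std::uint32_t>(carry));
    }
    return limbs;
}
```

For the input "4294967300" this produces the limbs 4 and 1, in that order, because 4294967300 = 1 * 2^32 + 4.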
I think Docjar: gnu/java/math/MPN.java might contain what you're looking for, specifically the code for public static int set_str (int dest[], byte[] str, int str_len, int base).
Start by converting the number to binary. Starting from the right, each group of 32 bits is a single base 2^32 digit.
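The grouping step can be sketched as follows, assuming the number is already available as a string of '0' and '1' characters with the most significant bit first. Producing that binary string from decimal input still needs big-number arithmetic, which is not shown here, and the name bits_to_limbs is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: bits is a string of '0'/'1', most significant first.
// Returns the base 2^32 digits, least significant first.
std::vector<std::uint32_t> bits_to_limbs(const std::string &bits)
{
    std::vector<std::uint32_t> limbs;
    std::size_t end = bits.size();
    // Walk from the right, taking up to 32 bits at a time.
    while (end > 0) {
        std::size_t begin = (end >= 32) ? end - 32 : 0;
        std::uint32_t limb = 0;
        for (std::size_t i = begin; i < end; ++i)
            limb = (limb << 1) | static_cast<std::uint32_t>(bits[i] == '1');
        limbs.push_back(limb);
        end = begin;
    }
    // Treat an empty string as zero.
    if (limbs.empty()) limbs.push_back(0);
    return limbs;
}
```

The leftmost group may be shorter than 32 bits; it simply becomes the most significant digit.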