String (const char*, size_t) to int? - c++

What's the fastest way to convert a string represented by (const char*, size_t) to an int?
The string is not null-terminated.
Both these ways involve a string copy (and more) which I'd like to avoid.
And yes, this function is called a few million times a second. :p
int to_int0(const char* c, size_t sz)
{
return atoi(std::string(c, sz).c_str());
}
int to_int1(const char* c, size_t sz)
{
return boost::lexical_cast<int>(std::string(c, sz));
}

Given a counted string like this, you may be able to gain a little speed by doing the conversion yourself. Depending on how robust the code needs to be, this may be fairly difficult though. For the moment, let's assume the easiest case -- that we're sure the string is valid, containing only digits, (no negative numbers for now) and the number it represents is always within the range of an int. For that case:
int to_int2(char const *c, size_t sz) {
int retval = 0;
for (size_t i=0; i<sz; i++) {
retval *= 10;
retval += c[i] -'0';
}
return retval;
}
From there, you can get about as complex as you want -- handling leading/trailing whitespace, '-' (but doing so correctly for the maximally negative number in 2's complement isn't always trivial [edit: see Nawaz's answer for one solution to this]), digit grouping, etc.

Another slow version, for uint32:
void str2uint_aux(unsigned& number, unsigned& overflowCtrl, const char*& ch)
{
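unsigned digit = static_cast<unsigned char>(*ch - '0'); // keep digit in 0..255 so that characters below '0' also fail the check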
++ch;
number = number * 10 + digit;
unsigned overflow = (digit + (256 - 10)) >> 8;
// if digit < 10 then overflow == 0
overflowCtrl += overflow;
}
unsigned str2uint(const char* s, size_t n)
{
unsigned number = 0;
unsigned overflowCtrl = 0;
// for VC++10 the Duff's device is faster than loop
switch (n)
{
default:
throw std::invalid_argument(__FUNCTION__ " : `n' too big");
case 10: str2uint_aux(number, overflowCtrl, s);
case 9: str2uint_aux(number, overflowCtrl, s);
case 8: str2uint_aux(number, overflowCtrl, s);
case 7: str2uint_aux(number, overflowCtrl, s);
case 6: str2uint_aux(number, overflowCtrl, s);
case 5: str2uint_aux(number, overflowCtrl, s);
case 4: str2uint_aux(number, overflowCtrl, s);
case 3: str2uint_aux(number, overflowCtrl, s);
case 2: str2uint_aux(number, overflowCtrl, s);
case 1: str2uint_aux(number, overflowCtrl, s);
}
// here we can check that all chars were digits
if (overflowCtrl != 0)
throw std::invalid_argument(__FUNCTION__ " : `s' is not a number");
return number;
}
Why is it slow? Because it processes chars one by one. If we had a guarantee that we could access bytes up to s+16, we could use vectorization for *ch - '0' and digit + 246.
Like in this code:
uint32_t digitsPack = *(uint32_t*)s - 0x30303030; // subtract '0' from each byte; the shifts below assume s[0] is loaded into the high byte (big-endian)
overflowCtrl |= digitsPack | (digitsPack + 0x06060606); // if one byte is not in range [0;10), high nibble will be non-zero
number = number * 10 + ((digitsPack >> 24) & 0xFF); // parentheses needed: & binds more loosely than +
number = number * 10 + ((digitsPack >> 16) & 0xFF);
number = number * 10 + ((digitsPack >> 8) & 0xFF);
number = number * 10 + (digitsPack & 0xFF);
s += 4;
Small update for range checking:
the first snippet has redundant shift (or mov) on every iteration, so it should be
unsigned digit = static_cast<unsigned char>(*s - '0'); // keep digit in 0..255 so that characters below '0' also set bit 8
overflowCtrl |= (digit + 256 - 10);
...
if (overflowCtrl >> 8 != 0) throw ...
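The four-bytes-at-a-time idea above can be sketched as a self-contained function. This is only a sketch under assumptions: the name parse4 is hypothetical, the word is assembled with explicit shifts so the result does not depend on byte order, and the digits are combined with a SWAR multiply-add rather than the four separate multiply-add lines shown above.

```cpp
#include <cstdint>

// Hypothetical helper: parses exactly four ASCII digits starting at s.
// Returns false if any of the four bytes is not in '0'..'9'.
bool parse4(const char* s, unsigned& out) {
    // Assemble the word with s[0] in the low byte, independent of endianness.
    std::uint32_t w = static_cast<std::uint32_t>(static_cast<unsigned char>(s[0]))
                    | static_cast<std::uint32_t>(static_cast<unsigned char>(s[1])) << 8
                    | static_cast<std::uint32_t>(static_cast<unsigned char>(s[2])) << 16
                    | static_cast<std::uint32_t>(static_cast<unsigned char>(s[3])) << 24;
    w -= 0x30303030u;  // subtract '0' from every byte at once
    // A byte outside 0..9 leaves a non-zero high nibble in w or in w + 6.
    if (((w | (w + 0x06060606u)) & 0xF0F0F0F0u) != 0) return false;
    // Combine neighbouring digits: each 16-bit lane becomes a two-digit number.
    std::uint32_t pairs = (w & 0x000F000Fu) * 10 + ((w & 0x0F000F00u) >> 8);
    // The low lane holds the first two digits, the high lane the last two.
    out = (pairs & 0xFFu) * 100 + (pairs >> 16);
    return true;
}
```

A caller would still handle lengths that are not a multiple of four with a scalar loop; whether this beats the plain loop depends on the compiler and the input, so it should be measured.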

Fastest:
int to_int(char const *s, size_t count)
{
int result = 0;
size_t i = 0 ;
if ( s[0] == '+' || s[0] == '-' )
++i;
while(i < count)
{
if ( s[i] >= '0' && s[i] <= '9' )
{
//see Jerry's comments for explanation why I do this
int value = (s[0] == '-') ? ('0' - s[i] ) : (s[i]-'0');
result = result * 10 + value;
}
else
throw std::invalid_argument("invalid input string");
i++;
}
return result;
}
Since in the above code, the comparison (s[0] == '-') is done in every iteration, we can avoid this by calculating result as negative number in the loop, and then return result if s[0] is indeed '-', otherwise return -result (which makes it a positive number, as it should be):
int to_int(char const *s, size_t count)
{
size_t i = 0 ;
if ( s[0] == '+' || s[0] == '-' )
++i;
int result = 0;
while(i < count)
{
if ( s[i] >= '0' && s[i] <= '9' )
{
result = result * 10 - (s[i] - '0'); //assume negative number
}
else
throw std::invalid_argument("invalid input string");
i++;
}
return s[0] == '-' ? result : -result; //-result is positive!
}
That is an improvement!
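In C++11, you could however use any function from the std::stoi family, although these take a std::string, so a counted buffer still has to be copied first. There is also the std::to_string family for the reverse conversion.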
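If C++17 is available, std::from_chars from <charconv> works directly on a [first, last) character range, so a counted, non-null-terminated buffer needs no copy. The sketch below is one way to wrap it; the wrapper name to_int_fc and its bool-plus-out-parameter shape are assumptions made for illustration.

```cpp
#include <charconv>
#include <cstddef>
#include <system_error>

// Hypothetical wrapper: returns false on malformed or out-of-range input.
bool to_int_fc(const char* c, std::size_t sz, int& out) {
    int value = 0;
    const char* last = c + sz;
    // from_chars accepts a leading '-' but no '+' and no leading whitespace.
    auto [ptr, ec] = std::from_chars(c, last, value);
    if (ec != std::errc() || ptr != last) return false;  // error or trailing junk
    out = value;
    return true;
}
```

Unlike the hand-written loops, this reports overflow (std::errc::result_out_of_range) instead of silently wrapping.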

llvm::StringRef s(c,sz);
int n;
s.getAsInteger(10,n);
return n;
http://llvm.org/docs/doxygen/html/classllvm_1_1StringRef.html

You'll have to either write custom routine or use 3rd party library if you're dead set on avoiding string copy.
You probably don't want to write atoi from scratch (it is still possible to make a bug here), so I'd advise to grab existing atoi from public domain or BSD-licensed code and modify it. For example, you can get existing atoi from FreeBSD cvs tree.

If you run the function that often, I bet you parse the same number many times. My suggestion is to BCD encode the string into a static char buffer (you know it's not going to be very long, since atoi only can handle +-2G) when there's less than X digits (X=8 for 32 bit lookup, X=16 for 64 bit lookup) then place a cache in a hash map.
When you're done with the first version, you can probably find nice optimizations, such as skipping the BCD encoding entirely and just using X characters in the string (when length of string <= X) for lookup in the hash table. If the string is longer, you fallback to atoi.
Edit: ... or fallback instead of atoi to Jerry Coffin's solution, which is as fast as they come.
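The caching idea could look roughly like the sketch below, which skips the BCD step and uses up to 8 raw characters as the lookup key. Everything here is an assumption made for illustration: the names cached_to_int and parse_digits, the choice of std::unordered_map, and the premise that the input contains only digits. It is not thread-safe, and whether a hash lookup actually beats a short parsing loop would need to be measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical fallback parser: digits only, no validation.
static int parse_digits(const char* c, std::size_t sz) {
    int v = 0;
    for (std::size_t i = 0; i < sz; ++i) v = v * 10 + (c[i] - '0');
    return v;
}

// Looks short strings up in a cache keyed by their raw bytes.
int cached_to_int(const char* c, std::size_t sz) {
    if (sz == 0 || sz > 8) return parse_digits(c, sz);  // too long for the key
    std::uint64_t key = 0;
    std::memcpy(&key, c, sz);  // remaining bytes stay zero, so lengths cannot collide
    static std::unordered_map<std::uint64_t, int> cache;
    auto it = cache.find(key);
    if (it != cache.end()) return it->second;
    int v = parse_digits(c, sz);
    cache.emplace(key, v);
    return v;
}
```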

Related

Hexadecimal to decimal conversion problem. Also, how to convert a char number to an actual int number

Please help me identify the error in this program. It looks correct to me and I have checked it, but it gives wrong answers.
In this program I check explicitly for A, B, C, D, E and F and assign their respective values.
[Edited]: This question also relates to how a character digit is converted to an actual integer.
#include<iostream>
#include<cmath>
#include<bits/stdc++.h>
using namespace std;
void convert(string num)
{
long int last_digit;
int s=num.length();
int i;
long long int result=0;
reverse(num.begin(),num.end());
for(i=0;i<s;i++)
{
if(num[i]=='a' || num[i]=='A')
{
last_digit=10;
result+=last_digit*pow(16,i);
}
else if(num[i]=='b'|| num[i]=='B')
{
last_digit=11;
result+=last_digit*pow(16,i);
}
else if(num[i]=='c' || num[i]=='C')
{
last_digit=12;
result+=last_digit*pow(16,i);
}
else if(num[i]=='d'|| num[i]=='D' )
{
last_digit=13;
result+=last_digit*pow(16,i);
}
else if(num[i]=='e'|| num[i]=='E' )
{
last_digit=14;
result+=last_digit*pow(16,i);
}
else if(num[i]=='f' || num[i]=='F')
{
last_digit=15;
result+=last_digit*pow(16,i);
}
else {
last_digit=num[i];
result+=last_digit*pow(16,i);
}
}
cout<<result;
}
int main()
{
string hexa;
cout<<"Enter the hexadecimal number:";
getline(cin,hexa);
convert(hexa);
}
Your code is very convoluted and wrong.
You probably want this:
void convert(string num)
{
long int last_digit;
int s = num.length();
int i;
long long int result = 0;
for (i = 0; i < s; i++)
{
result <<= 4; // multiply by 16, using pow is overkill
auto digit = toupper(num[i]); // convert to upper case
if (digit >= 'A' && digit <= 'F')
last_digit = digit - 'A' + 10; // digit is in range 'A'..'F'
else
last_digit = digit - '0'; // digit is (hopefully) in range '0'..'9'
result += last_digit;
}
cout << result;
}
But this is still not very good:
the function should return a long long int instead of printing the result
a few other things can be done more elegantly
So a better version would be this:
#include <iostream>
#include <string>
using namespace std;
long long int convert(const string & num) // always pass objects as const & if possible
{
long long int result = 0;
for (const auto & ch : num) // use range based for loops whenever possible
{
result <<= 4;
auto digit = toupper(ch);
long int last_digit; // declare local variables in the inner most scope
if (digit >= 'A' && digit <= 'F')
last_digit = digit - 'A' + 10;
else
last_digit = digit - '0';
result += last_digit;
}
return result;
}
int main()
{
string hexa;
cout << "Enter the hexadecimal number:";
getline(cin, hexa);
cout << convert(hexa);
}
There is still room for more improvements as the code above assumes that the string to convert contains only hexadecimal characters. Ideally a check for invalid characters should be done somehow. I leave this as an exercise.
The line last_digit = digit - 'A' + 10; assumes that the codes for letters A to F are contiguous, which in theory might not be the case. But the probability that you'll ever encounter an encoding scheme where this is not the case is close to zero though. The vast majority of computer systems in use today use the ASCII encoding scheme, some use EBCDIC, but in both of these encoding schemes the character codes for letters A to F are contiguous. I'm not aware of any other encoding scheme in use today.
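For the check left as an exercise above, one possible sketch follows. The name convert_checked and the choice to throw std::invalid_argument are assumptions, not part of the answer above, and overflow for inputs longer than a long long can hold is still not detected.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical checked variant: rejects empty input and non-hex characters.
long long convert_checked(const std::string& num) {
    if (num.empty()) throw std::invalid_argument("empty hexadecimal string");
    long long result = 0;
    for (unsigned char ch : num) {  // unsigned char keeps the <cctype> calls well-defined
        if (!std::isxdigit(ch)) throw std::invalid_argument("invalid hexadecimal digit");
        int digit = std::isdigit(ch) ? ch - '0' : std::toupper(ch) - 'A' + 10;
        result = result * 16 + digit;
    }
    return result;
}
```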
Your problem is in the else case, in which you convert num[i] from char to its ASCII code. Thus, for instance, if you try to convert A0, the 0 is converted into 48 instead of 0.
To correct this, you should instead convert num[i] into its equivalent integer (not its ASCII code).
To do so, replace :
else {
last_digit=num[i];
result+=last_digit*pow(16,i);
}
with
else {
last_digit = num[i]-'0';
result+=last_digit*pow(16,i);
}
In the new line, last_digit = num[i]-'0'; is equivalent to last_digit = (int)num[i]-(int)'0';, which subtracts the character code of '0' from the character code of the digit in num[i].
It works because the C++ standard guarantees that the character codes of the 10 decimal digits are contiguous and in increasing order (official ref iso-cpp, stated in chapter 2.3, paragraph 3).
Thus, if you take the code (for instance the ASCII code) of any one-digit number num[i] and subtract the code of '0' (which is 48 in ASCII), you directly obtain the number itself as an integer value.
An example of execution after the correction would give:
A0
160
F5
245
A small code review:
You repeat result+=last_digit*pow(16,i); many times. You could do it only once at the end of the loop body. But that's another matter.
You are complicating the problem more than you need to (std::pow is also kinda slow). std::stoul can take a numerical base and automatically convert to an integer for you:
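#include <cstddef>
#include <iostream>
#include <string>
int main()
{
std::size_t char_count{0u};
std::string hexa{};
std::getline(std::cin, hexa);
hexa = "0x" + hexa; // optional: with base 16, std::stoul accepts the digits with or without this prefix
unsigned long value_uint = std::stoul(hexa, &char_count, 16); // char_count receives the number of characters consumed
std::cout << value_uint << '\n';
}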
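The std::stoul approach can also reject bad input by checking how many characters were consumed and catching the exceptions it throws. The following is a sketch under assumptions: the name parse_hex and the std::optional return type (C++17) are choices made here for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the value, or std::nullopt for bad input.
std::optional<unsigned long> parse_hex(const std::string& text) {
    try {
        std::size_t used = 0;
        unsigned long value = std::stoul(text, &used, 16);
        if (used != text.size()) return std::nullopt;  // trailing non-hex characters
        return value;
    } catch (const std::invalid_argument&) {
        return std::nullopt;  // no digits could be parsed
    } catch (const std::out_of_range&) {
        return std::nullopt;  // value does not fit in unsigned long
    }
}
```

Note that std::stoul follows strtoul, so it also accepts leading whitespace and a sign; stricter input rules need an extra check before the call.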

Checking the size of hashes in C++

As one would do with a blockchain, I want to check if a hash satisfies a size requirement. This is fairly easy in Python, but I am having some difficulty implementing the same system in C++. To be clear about what I am after, this first example is the python implementation:
difficulty = 25
hash = "0000004fbbc4261dc666d31d4718566b7e11770c2414e1b48c9e37e380e8e0f0"
print(int(hash, 16) < 2 ** (256 - difficulty))
The main problem I'm having is with these numbers - it is difficult to deal with such large numbers in C++ (2 ** 256, for example). This is solved with the boost/multiprecision library:
boost::multiprecision::cpp_int x = boost::multiprecision::pow(2, 256);
However, I cannot seem to find a way to convert my hash into a numeric value for comparison. Here is a generic example of what I am trying to do:
int main() {
string hash = "0000004fbbc4261dc666d31d4718566b7e11770c2414e1b48c9e37e380e8e0f0";
double difficulty = 256 - 25;
cpp_int requirement = boost::multiprecision::pow(2, difficulty);
// Something to convert hash into a number for comparison (converted_hash)
if (converted_hash < requirement) {
cout << "True" << endl;
}
return 1;
}
The hash is either being received from my web server or from a local python script, in which case the hash is read into the C++ program via fstream. Either way, it will be a string upon arrival.
Since I am already integrating python into this project, I am not entirely opposed to simply using the Python version of this algorithm; however, sometimes taking the easier path prevents you from learning, so unless this is a really cumbersome task, I would like to try to accomplish it in C++.
Your basic need is to compute how many zero bits exist before the first non-zero bit. This has nothing to do with multi-precision really, it can be reformulated into a simple counting problem:
// takes hexadecimal ASCII [0-9a-fA-F]
inline int count_zeros(char ch) {
if (ch < '1') return 4;
if (ch < '2') return 3;
if (ch < '4') return 2;
if (ch < '8') return 1;
return 0; // see ASCII table, [a-zA-Z] are all greater than '8'
}
int count_zeros(const std::string& hash) {
int sum = 0;
for (char ch : hash) {
int zeros = count_zeros(ch);
sum += zeros;
if (zeros < 4)
break;
}
return sum;
}
A fun optimization is to realize there are two termination conditions for the loop, and we can fold them together if we check for characters less than '0' which includes the null terminator and also will stop on any invalid input:
// takes hexadecimal [0-9a-fA-F]
inline int count_zeros(char ch) {
if (ch < '0') return 0; // change 1
if (ch < '1') return 4;
if (ch < '2') return 3;
if (ch < '4') return 2;
if (ch < '8') return 1;
return 0; // see ASCII table, [a-zA-Z] are all greater than '8'
}
int count_zeros(const std::string& hash) {
int sum = 0;
for (const char* it = hash.c_str(); ; ++it) { // change 2
int zeros = count_zeros(*it);
sum += zeros;
if (zeros < 4)
break;
}
return sum;
}
This produces smaller code when compiled with g++ -Os.
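To connect the zero-counting idea back to the Python check: for a 64-digit (256-bit) hash, int(hash, 16) < 2 ** (256 - difficulty) holds exactly when the top difficulty bits are zero. A minimal sketch follows; the names zero_bits_in_digit, leading_zero_bits and meets_difficulty are hypothetical, and the input is assumed to be exactly 64 ASCII hexadecimal digits.

```cpp
#include <string>

// Leading zero bits of one hexadecimal digit (ASCII assumed); 4 for '0'.
inline int zero_bits_in_digit(char ch) {
    if (ch < '0') return 0;  // invalid input stops the count
    if (ch < '1') return 4;
    if (ch < '2') return 3;
    if (ch < '4') return 2;
    if (ch < '8') return 1;
    return 0;                // '8'..'9' and letters have the top bit set
}

// Counts zero bits before the first set bit of the whole hash.
int leading_zero_bits(const std::string& hash) {
    int sum = 0;
    for (char ch : hash) {
        int zeros = zero_bits_in_digit(ch);
        sum += zeros;
        if (zeros < 4) break;  // first non-zero digit ends the run
    }
    return sum;
}

// Assumes hash holds exactly 64 hex digits, i.e. a 256-bit value.
bool meets_difficulty(const std::string& hash, int difficulty) {
    return leading_zero_bits(hash) >= difficulty;
}
```

With the hash from the question, six '0' digits contribute 24 bits and the '4' contributes one more, so a difficulty of 25 is met and 26 is not.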

Convert a HEX single literal character to its value

I need to convert a hex literal character to its value. Consider the following:
char hex1 = 'f'; // hex1 equals 102, as 'f' is ASCII 102.
char hexvalue = converter(hex1); // I need on hexvalue 0x0F, or 1111 binary
What would be the most straightforward converter function here?
Thanks for helping.
A straightforward converter function would use a lookup string:
unsigned int Convert_Char_Digit_To_Hex(char digit)
{
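static const std::string char_to_hex = "0123456789ABCDEF"; // a single string, not an array of strings
const std::string::size_type posn =
char_to_hex.find(static_cast<char>(toupper(static_cast<unsigned char>(digit)))); // accept lower-case digits too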
if (posn != std::string::npos)
{
return posn;
}
return 0; // Error if here.
}
But why write your own when you can use existing functions to convert from textual representation to internal representation?
See also strtol, strtoul, std::istringstream, sscanf.
Edit 1: Comparisons
Another alternative is to use comparisons and math:
unsigned int Hex_Char_Digit_To_Int(char digit)
{
unsigned int value = 0U;
digit = toupper(digit);
if ((digit >= '0') and (digit <= '9'))
{
value = digit - '0';
}
else
{
if ((digit >= 'A') and (digit <= 'F'))
{
value = digit - 'A' + 10;
}
}
return value;
}
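Both functions above return 0 for invalid input, which a caller cannot tell apart from the digit '0'. A small sketch that signals errors with -1 instead is shown below; the name hex_digit_value and the -1 convention are assumptions, and the letter ranges are assumed to be contiguous (true for ASCII and EBCDIC).

```cpp
// Hypothetical helper: returns 0..15 for a valid hexadecimal digit, -1 otherwise.
int hex_digit_value(char digit) {
    if (digit >= '0' && digit <= '9') return digit - '0';
    if (digit >= 'a' && digit <= 'f') return digit - 'a' + 10;
    if (digit >= 'A' && digit <= 'F') return digit - 'A' + 10;
    return -1;  // not a hexadecimal digit
}
```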

hash function for well-defined string c++

I have a string which consists of exactly one number between 1 and 30 followed by one of the characters 'R', 'T' or 'M'. Let me illustrate with some examples.
string a="15T","1R","12M","24T","24M" ... // they are all valid for my string
Now I need a hash function which gives me a unique hash value for every input string. Since my input comes from a finite set, I think it is possible.
Can anyone tell me what kind of hash function I could define?
By the way, I'll create my hash table using a vector, so I guess size is not an important issue, but I'll define 10000 as an upper bound. I mean, I assume I cannot have more than 10000 such strings.
Thanks in advance.
Just have a large enough integer type and put the (maximal) three characters into the integer:
std::size_t hash(const char* s) {
std::size_t result = 0;
while(*s) {
result <<= 8;
result |= *s++;
}
return result;
}
You could define an algebraic function:
result = string[0] * 0x010000
+ string[1] * 0x000100
+ string[2];
Basically, each character fits into a uint8_t, which has 256 possible values. So each column is a power of 256.
Yes, there are big gaps, but this ensures a unique hash.
You could compress the gaps by using various "powers" for the different character columns.
Given a two-digit string such as "15T":
result = ((string[0] - '0') * 10 // 10 == number of choices in the 2nd digit column
+ (string[1] - '0')) * 3; // 3 == number of choices in the letter column
switch (string[2])
{
case 'T' : result += 0; break;
case 'M' : result += 1; break;
case 'R' : result += 2; break;
}
It's a number / counting system where each column has a different number of digits.
Something along the line of:
unsigned myhash(const char * str)
{
int n = 0;
// Parse the number part
for ( ; *str >= '0' && *str <= '9'; ++str)
n = n * 10 + (*str - '0');
int c = *str == 'R' ? 0 :
*str == 'T' ? 1 :
*str == 'M' ? 2 :
3;
// Check for invalid strings
if ( c == 3 || n <= 0 || n > 30 || *(++str) != 0 )
{
// Some error or anything
// (Or replace the if condition with an assert)
throw std::runtime_error("Invalid string");
}
// Since 0 <= c < 3 and 0 <= (n-1) < 30
// There are only 90 possible values
return c * 30 + (n-1);
}
In my experience whenever you have to deal with something like this it is often better to do the opposite, that is work with integers and have a function to perform the opposite conversion if necessary.
You can rebuild the original string with:
int n = hash % 30 + 1;
int c = hash / 30; // 0 is 'R', 1 is 'T', 2 is 'M'
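The reverse conversion above can be wrapped in a small function. This is a sketch: the name unhash is hypothetical, and it assumes the hash was produced by myhash above, so it lies in the range 0..89.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical inverse of myhash: rebuilds e.g. "15T" from its hash value.
std::string unhash(unsigned hash) {
    if (hash >= 90) throw std::out_of_range("hash out of range");
    static const char letters[] = {'R', 'T', 'M'};  // 0 is 'R', 1 is 'T', 2 is 'M'
    return std::to_string(hash % 30 + 1) + letters[hash / 30];
}
```

For example, "15T" hashes to 1 * 30 + 14 = 44, and unhash(44) gives back "15T".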

Testing a string to see if a number is present and assigning that value to a variable while skipping all the non-numeric values?

Given a string, say " a 19 b c d 20", how do I test to see if at a particular position in the string there is a number? (Not just the character '1' but the whole number '19' or '20'.)
char s[80];
strcpy(s,"a 19 b c d 20");
int i=0;
int num=0;
int digit=0;
for (i =0;i<strlen(s);i++){
if ((s[i] <= '9') && (s[i] >= '0')){ //how do i test for the whole integer value not just a digit
//if number then convert to integer
digit = s[i]-48;
num = num*10+digit;
}
if (s[i] == ' '){
break; //is this correct here? do nothing
}
if (s[i] == 'a'){
//copy into a temp char
}
}
These are C solutions:
Are you just trying to parse the numbers out of the string? Then you can just walk the string using strtol().
long num = 0;
char *endptr = NULL;
while (*s) {
num = strtol(s, &endptr, 10);
if (endptr == s) { // Not a number here, move on.
s++;
continue;
}
// Found a number and it is in num. Move to next location.
s = endptr;
// Do something with num.
}
If you have a specific location and number to check for you can still do something similar.
For example: Is '19' at position 10?
int pos = 10;
int value = 19;
if (pos >= strlen(s))
return false;
if (value == strtol(s + pos, &endptr, 10) && endptr != s + pos)
return true;
return false;
Are you trying to parse out the numbers without using any library routines?
Note: I haven't tested this...
int num=0;
int sign=1;
while (*s) {
// This could be done with an if, too.
switch (*s) {
case '-':
sign = -1;
case '+':
s++;
if (*s < '0' || *s > '9') {
sign = 1;
break;
}
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
// Parse number, start with zero.
num = 0;
do {
num = (num * 10) + (*s - '0');
s++;
} while (*s >= '0' && *s <= '9');
num *= sign;
// Restore sign, just in case
sign = 1;
// Do something with num.
break;
default:
// Not a number
s++;
}
}
It seems like you want to parse the string and extract all the numbers from it; if so, here's a more "C++" way to do it:
string s = "a 19 b c d 20"; // your char array will work fine here too
istringstream buffer(s);
string token;
int num;
while (!buffer.eof())
{
buffer >> num; // Try to read a number
if (!buffer.fail()) { // if it doesn't work, failbit is set
cout << num << endl; // It's a number, do what you want here
} else {
buffer.clear(); // wasn't a number, clear the failbit
buffer >> token; // pull out the non-numeric token
}
}
This should print out the following:
19
20
The stream extraction operator pulls out space-delimited tokens automatically, so you're saved from having to do any messy character-level operations or manual integer conversion. You'll need to #include <sstream> for the stringstream class.
You can use atoi().
After your if, you need to switch to a while loop to collect subsequent digits until you hit a non-digit.
BUT, more importantly, have you clearly defined your requirements? Will you allow whitespace between the digits? What if there are two numbers, like abc123def456gh?
It's not very clear what you are looking for. Assuming you want to extract all the digits from a string and then form a whole number from the found digits, you can try the following:
int i;
unsigned long num=0; // to hold the whole number.
for (i =0;s[i] != '\0';i++){ // stop at the terminator, not when i reaches a character code
// if the ith char is a digit, append it to the number
if (isdigit((unsigned char)s[i])) {
num = num * 10 + (s[i] - '0');
}
}
It is assumed that all the digits in your string, when concatenated to form the whole number, will not overflow the unsigned long data type.
There's no way to test for a whole number. Writing a lexer, as you've done is one way to go. Another would be to try and use the C standard library's strtoul function (or some similar function depending on whether the string has floating point numbers etc).
Your code needs to allow for whitespaces and you can use the C library's isdigit to test if the current character is a digit or not:
vector<int> parse(string const& s) {
vector<int> vi;
for (size_t i = 0; i < s.length();) {
while (::isspace((unsigned char)s[ i ])) i++;
if (::isdigit((unsigned char)s[ i ])) {
int num = 0; // start at zero; the loop below consumes every digit, including the first
while (::isdigit((unsigned char)s[ i ])) {
num = num * 10 + (s[ i ] - '0');
++i;
}
vi.push_back(num);
}
....
Another approach will be to use boost::lexical_cast:
vector<string> tokenize(string const& input) {
vector<string> tokens;
size_t off = 0, start = 0;
while ((off = input.find(' ', start)) != string::npos) {
tokens.push_back(input.substr(start, off-start));
start = off + 1;
}
tokens.push_back(input.substr(start)); // the last token has no trailing space
return tokens;
}
vector<int> getint(vector<string> tokens) {
vector<int> vi;
for (vector<string>::iterator b = tokens.begin(), e = tokens.end(); b != e; ++b) {
try
{
vi.push_back(lexical_cast<short>(*b)); // store the converted value in the result vector
}
catch(bad_lexical_cast &) {}
}
return vi;
}