Integer to string optimized function? - c++

I currently have this function to convert an unsigned integer to a string (I need a function that works on non-standard types like __uint128_t):
#include <iostream>
#include <algorithm>
#include <string>
template <typename UnsignedIntegral>
std::string stringify(UnsignedIntegral number, const unsigned int base)
{
static const char zero = '0';
static const char alpha = 'A';
static const char ten = 10;
std::string result;
char remainder = 0;
do {
remainder = number%base;
result += (remainder < ten) ? (zero+remainder) : (alpha+remainder-ten);
number /= base;
} while (number > 0);
std::reverse(std::begin(result), std::end(result));
return result;
}
int main()
{
std::cout<<stringify(126349823, 2)<<std::endl;
return 0;
}
Is there any way to optimize this code ?

You may want to read this article by Andrei Alexandrescu, where he talks about low-level optimizations, using a (fixed-radix) integer-to-string conversion as an example:
https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920
Keep in mind that the most important thing when optimizing is always profiling.

One simple thing is avoiding multiple heap allocations, which can be done either with result.reserve(CHAR_BIT * sizeof(UnsignedIntegral)) (the largest possible string being base 2) or by building the string in a local array first and then creating the std::string from it. Even with that, I agree with @SebastianRedl; you cannot optimize without measuring. Also, your code doesn't take into account negative numbers.

If you're lucky you will be inside "short string optimization" buffer size for the string. If not, then you incur a dynamic memory allocation which is probably at least an order of magnitude slower than the conversion code. So first, get rid of that std::string, and add support for determining suitable raw buffer size.
When you've done that, get rid of the branching caused by the conditional operator. A table lookup might be faster (or not). It's also possible to use bit tricks, such as turning a small negative number into all ones by an arithmetic right shift and then using that as a mask.
Finally, instead of reversing the result you can build it directly backward from the end of the supplied buffer, and produce pointer to start as function result.
All this said, do remember to MEASURE.
For optimizations that logically can't be significantly worse than the original, such as the above, measuring might be more work than simply coding the optimizations. But when you have done the obvious and you're interested in eking out the last bit of performance, measuring is necessary. Also, for most programmers measuring is necessary just to avoid wasting time on completely unnecessary optimization, or introducing new inefficiency.
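As a rough illustration of the three suggestions above (fixed-size raw buffer, table lookup instead of a branch, and filling the buffer backward so no reversal is needed), here is a minimal sketch. The name stringify_backward is made up for this example, and the function assumes an unsigned type and a base between 2 and 36. It still returns a std::string for convenience, which costs one allocation when the result exceeds the small-string buffer. Whether any of this is faster in your case still has to be measured.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical name, not from the original post.
// Assumes an unsigned integer type and 2 <= base <= 36.
template <typename UnsignedIntegral>
std::string stringify_backward(UnsignedIntegral number, unsigned base)
{
    // Digit table: replaces the (remainder < 10) branch with a lookup.
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Worst case is base 2: one character per bit of the type.
    char buffer[CHAR_BIT * sizeof(UnsignedIntegral)];
    char* const end = buffer + sizeof buffer;
    char* first = end;

    // Write the digits from the end of the buffer toward the front,
    // so they come out in the right order without std::reverse.
    do {
        *--first = digits[static_cast<unsigned>(number % base)];
        number /= base;
    } while (number > 0);

    return std::string(first, end);
}
```

For example, stringify_backward(255u, 16) yields "FF".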

You are asking for a way to optimize the code. There is indeed a faster way than pure digit-by-digit conversion: you can work with groups of digits, i.e. in a base that is a power of the desired base.
For example:
Base 2 -> base 256 (8 bits at a time)
Base 8 -> base 512 (3 octal digits at a time)
Base 10 -> base 100 (2 decimal digits at a time)
Base 16 -> base 256 (2 hexadecimal digits at a time)
You will need to pre-tabulate the representation of the digit combinations as short ASCII character strings. And you will need to add special processing of the high-order digits to avoid or undo leading zeroes.
OctalStrings[]= { "000", "001", "002" ... }
But the main loop will remain of the form:
do
Q= N / Base
R= N - Q * Base
N= Q
Insert(Strings[R])
while N>0
or, for a binary base:
do
R= N & (Base - 1)
N= N >> LgBase
Insert(Strings[R])
while N>0
You can also do the conversion directly left-to-right by precomputing all powers of the base and using quotient/remainder.
Base100Powers[]= { 1, 100, 10000, 1000000... }
do
Q= N / Powers[k]
N= N - Powers[k] * Q
Append(Strings[Q])
k--
while k>=0
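The digit-grouping idea can be sketched for the decimal case as follows: the loop works in base 100 and copies two pre-tabulated characters per division. This is only an illustration under stated assumptions: the name to_decimal_pairs is invented here, the input is limited to 64-bit unsigned values, and the high-order group is handled separately to avoid a leading zero.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: decimal conversion in base 100 (two digits per step).
std::string to_decimal_pairs(std::uint64_t n)
{
    // Pre-tabulated "00".."99", two characters per entry.
    static const char pairs[] =
        "00010203040506070809"
        "10111213141516171819"
        "20212223242526272829"
        "30313233343536373839"
        "40414243444546474849"
        "50515253545556575859"
        "60616263646566676869"
        "70717273747576777879"
        "80818283848586878889"
        "90919293949596979899";

    char buffer[20];                    // UINT64_MAX has 20 decimal digits
    char* const end = buffer + sizeof buffer;
    char* p = end;

    // Main loop: one division yields two output characters.
    while (n >= 100) {
        const unsigned r = static_cast<unsigned>(n % 100);
        n /= 100;
        *--p = pairs[2 * r + 1];
        *--p = pairs[2 * r];
    }

    // High-order group: one or two digits, without a leading zero.
    if (n >= 10) {
        *--p = pairs[2 * n + 1];
        *--p = pairs[2 * n];
    } else {
        *--p = static_cast<char>('0' + n);
    }
    return std::string(p, end);
}
```

For example, to_decimal_pairs(1005) yields "1005" after one pass through the loop and one high-order group.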

Here is what I think is a more efficient version, which I have just coded:
#include <iostream>
#include <type_traits>
#include <algorithm>
#include <string>
#include <array>
template <bool Upper = true,
typename Char = char,
Char Zero = '0',
Char Nine = '9',
Char Alpha = Upper ? 'A' : 'a',
Char Zeta = Upper ? 'Z' : 'z',
Char One = 1,
Char Ten = Nine-Zero+One,
Char Size = (Nine-Zero+One)+(Zeta-Alpha+One),
typename... Types,
class = typename std::enable_if<
(std::is_convertible<Char, char>::value) &&
(sizeof...(Types) == Size)>::type>
constexpr std::array<char, Size> alphabet(const Types&... values)
{
return {{values...}};
}
template <bool Upper = true,
typename Char = char,
Char Zero = '0',
Char Nine = '9',
Char Alpha = Upper ? 'A' : 'a',
Char Zeta = Upper ? 'Z' : 'z',
Char One = 1,
Char Ten = Nine-Zero+One,
Char Size = (Nine-Zero+One)+(Zeta-Alpha+One),
typename... Types,
class = typename std::enable_if<
(std::is_convertible<Char, char>::value) &&
(sizeof...(Types) < Size)>::type>
constexpr std::array<char, Size> alphabet(Types&&... values)
{
return alphabet<Upper, Char, Zero, Nine, Alpha, Zeta, One, Ten, Size>
(std::forward<Types>(values)...,
Char(sizeof...(values) < Ten ? Zero+sizeof...(values)
: Alpha+sizeof...(values)-Ten));
}
template <typename Integral,
Integral Base = 10,
Integral Zero = 0,
Integral One = 1,
Integral Value = ~Zero,
class = typename std::enable_if<
(std::is_convertible<Integral, int>::value) &&
(Value >= Zero) &&
(Base > One)>::type>
constexpr Integral digits()
{
return (Value != ~Zero)+
(Value > Zero ? digits<Integral, Base, Zero, One, Value/Base>()
: Zero);
}
template <bool Upper = true,
typename Integral,
std::size_t Size = digits<Integral, 2>()>
std::string stringify(Integral number, const std::size_t base)
{
static constexpr auto letters = alphabet<Upper>();
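std::array<char, Size+1> string = {}; // zero-filled, so string[Size] stays '\0'
std::size_t i = 0;
do {
string[Size-1-i++] = letters[number%base]; // fill backward, just before the terminator
} while ((number /= base) > 0);
return &string[Size-i]; // first digit written; the text ends at string[Size]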
}
int main()
{
std::cout<<stringify(812723U, 16)<<std::endl;
return 0;
}
It could be optimized more efficiently using the technique of Yves Daoust (using a base that is a power of the provided base).

Related

How to convert large number strings into integer in c++?

Suppose I have a long number as a string input in C++, and we have to do numeric operations on it. We need to convert it into an integer, or find some other way to operate on it. What are the options?
string s="12131313123123213213123213213211312321321321312321213123213213";
It looks like the numbers you want to handle are way too big for any standard integer type, so just "converting" them won't help much. You have two options:
(Highly recommended!) Use a big integer library like e.g. gmp. Such libraries typically also provide functions for parsing and formatting the big numbers.
Implement your big numbers yourself, you could e.g. use an array of uintmax_t to store them. You will have to implement all sorts of arithmetics you'd possibly need yourself, and this isn't exactly an easy task. For parsing the number, you can use a reversed double dabble implementation. As an example, here's some code I wrote a while ago in C, you can probably use it as-is, but you need to provide some helper functions and you might want to rewrite it using C++ facilities like std::string and replacing the struct used here with a std::vector -- it's just here to document the concept
typedef struct hugeint
{
size_t s; // number of used elements in array e
size_t n; // number of total elements in array e
uintmax_t e[];
} hugeint;
hugeint *hugeint_parse(const char *str)
{
char *buf;
// allocate and initialize:
hugeint *result = hugeint_create();
// this is just a helper function copying all numeric characters
// to a freshly allocated buffer:
size_t bcdsize = copyNum(&buf, str);
if (!bcdsize) return result;
size_t scanstart = 0;
size_t n = 0;
size_t i;
uintmax_t mask = 1;
for (i = 0; i < bcdsize; ++i) buf[i] -= '0';
while (scanstart < bcdsize)
{
if (buf[bcdsize - 1] & 1) result->e[n] |= mask;
mask <<= 1;
if (!mask)
{
mask = 1;
// this function increases the storage size of the flexible array member:
if (++n == result->n) result = hugeint_scale(result, result->n + 1);
}
for (i = bcdsize - 1; i > scanstart; --i)
{
buf[i] >>= 1;
if (buf[i-1] & 1) buf[i] |= 8;
}
buf[scanstart] >>= 1;
while (scanstart < bcdsize && !buf[scanstart]) ++scanstart;
for (i = scanstart; i < bcdsize; ++i)
{
if (buf[i] > 7) buf[i] -= 3;
}
}
free(buf);
return result;
}
Your best bet would be to use a large-number computational library.
One of the best out there is the GNU Multiple Precision Arithmetic Library.
Example of a useful function to solve your problem:
Function: int mpz_set_str (mpz_t rop, const char *str, int base)
Set the value of rop from str, a null-terminated C string in base
base. White space is allowed in the string, and is simply ignored.
The base may vary from 2 to 62, or if base is 0, then the leading
characters are used: 0x and 0X for hexadecimal, 0b and 0B for binary,
0 for octal, or decimal otherwise.
For bases up to 36, case is ignored; upper-case and lower-case letters
have the same value. For bases 37 to 62, upper-case letter represent
the usual 10..35 while lower-case letter represent 36..61.
This function returns 0 if the entire string is a valid number in base
base. Otherwise it returns -1.
Documentation: https://gmplib.org/manual/Assigning-Integers.html#Assigning-Integers
If the string contains a number no greater than std::numeric_limits<uint64_t>::max(), then std::stoull() is the best option.
unsigned long long value = std::stoull(s);
C++11 and later.
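Since std::stoull throws std::out_of_range when the value does not fit, one way to decide between the built-in type and a big-integer library is to try the conversion and catch the failure. The sketch below is only an illustration; parse_if_fits is a made-up name, and it deliberately rejects signs, leading whitespace, and trailing characters, all of which std::stoull itself would tolerate.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true and stores the value in `out`
// if `s` is a plain decimal number that fits in unsigned long long.
bool parse_if_fits(const std::string& s, unsigned long long& out)
{
    // std::stoull accepts leading whitespace and a sign; reject both here.
    if (s.empty() || s[0] < '0' || s[0] > '9') return false;
    try {
        std::size_t used = 0;
        const unsigned long long value = std::stoull(s, &used);
        if (used != s.size()) return false;  // trailing non-digit characters
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false;                        // no conversion could be performed
    } catch (const std::out_of_range&) {
        return false;                        // too large: use a big-integer library
    }
}
```

With the 62-digit string from the question, parse_if_fits returns false, which is the signal to fall back to something like GMP.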

Is there a shorter way to write compound 'if' conditions? [duplicate]

Just instead of:
if ( ch == 'A' || ch == 'B' || ch == 'C' || .....
For example, to do it like:
if ( ch == 'A', 'B', 'C', ...
is there even a shorter way to summarize conditions?
strchr() can be used to see if the character is in a list.
const char* list = "ABCXZ";
if (strchr(list, ch)) {
// 'ch' is 'A', 'B', 'C', 'X', or 'Z'
}
In this case you could use a switch:
switch (ch) {
case 'A':
case 'B':
case 'C':
// do something
break;
case 'D':
case 'E':
case 'F':
// do something else
break;
...
}
While this is slightly more verbose than using strchr, it doesn't involve any function calls. It also works for both C and C++.
Note that the alternate syntax you suggested won't work as you might expect because of the use of the comma operator:
if ( ch == 'A', 'B', 'C', 'D', 'E', 'F' )
This first compares ch to 'A' and then discards the result. Then 'B' is evaluated and discarded, then 'C', and so forth until 'F' is evaluated. Then 'F' becomes the value of the conditional. Since any non-zero value evaluates to true in a boolean context (and 'F' is non-zero), the above expression will always be true.
Templates allow us to express ourselves in this way:
if (range("A-F").contains(ch)) { ... }
It requires a little plumbing, which you can put in a library.
This actually compiles out to be incredibly efficient (at least on gcc and clang).
#include <cstdint>
#include <tuple>
#include <utility>
#include <iostream>
namespace detail {
template<class T>
struct range
{
constexpr range(T first, T last)
: _begin(first), _end(last)
{}
constexpr T begin() const { return _begin; }
constexpr T end() const { return _end; }
template<class U>
constexpr bool contains(const U& u) const
{
return _begin <= u and u <= _end;
}
private:
T _begin;
T _end;
};
template<class...Ranges>
struct ranges
{
constexpr ranges(Ranges...ranges) : _ranges(std::make_tuple(ranges...)) {}
template<class U>
struct range_check
{
template<std::size_t I>
bool contains_impl(std::integral_constant<std::size_t, I>,
const U& u,
const std::tuple<Ranges...>& ranges) const
{
return std::get<I>(ranges).contains(u)
or contains_impl(std::integral_constant<std::size_t, I+1>(),u, ranges);
}
bool contains_impl(std::integral_constant<std::size_t, sizeof...(Ranges)>,
const U& u,
const std::tuple<Ranges...>& ranges) const
{
return false;
}
constexpr bool operator()(const U& u, std::tuple<Ranges...> const& ranges) const
{
return contains_impl(std::integral_constant<std::size_t, 0>(), u, ranges);
}
};
template<class U>
constexpr bool contains(const U& u) const
{
range_check<U> check {};
return check(u, _ranges);
}
std::tuple<Ranges...> _ranges;
};
}
template<class T>
constexpr auto range(T t) { return detail::range<T>(t, t); }
template<class T>
constexpr auto range(T from, T to) { return detail::range<T>(from, to); }
// this is the little trick which turns an ascii string into
// a range of characters at compile time. It's probably a bit naughty
// as I am not checking syntax. You could write "ApZ" and it would be
// interpreted as "A-Z".
constexpr auto range(const char (&s)[4])
{
return range(s[0], s[2]);
}
template<class...Rs>
constexpr auto ranges(Rs...rs)
{
return detail::ranges<Rs...>(rs...);
}
int main()
{
std::cout << range(1,7).contains(5) << std::endl;
std::cout << range("a-f").contains('b') << std::endl;
auto az = ranges(range('a'), range('z'));
std::cout << az.contains('a') << std::endl;
std::cout << az.contains('z') << std::endl;
std::cout << az.contains('p') << std::endl;
auto rs = ranges(range("a-f"), range("p-z"));
for (char ch = 'a' ; ch <= 'z' ; ++ch)
{
std::cout << ch << rs.contains(ch) << " ";
}
std::cout << std::endl;
return 0;
}
expected output:
1
1
1
1
0
a1 b1 c1 d1 e1 f1 g0 h0 i0 j0 k0 l0 m0 n0 o0 p1 q1 r1 s1 t1 u1 v1 w1 x1 y1 z1
For reference, here was my original answer:
template<class X, class Y>
bool in(X const& x, Y const& y)
{
return x == y;
}
template<class X, class Y, class...Rest>
bool in(X const& x, Y const& y, Rest const&...rest)
{
return in(x, y) or in(x, rest...);
}
int main()
{
int ch = 6;
std::cout << in(ch, 1,2,3,4,5,6,7) << std::endl;
std::string foo = "foo";
std::cout << in(foo, "bar", "foo", "baz") << std::endl;
std::cout << in(foo, "bar", "baz") << std::endl;
}
If you need to check a character against an arbitrary set of characters, you could try writing this:
std::set<char> allowed_chars = {'A', 'B', 'C', 'D', 'E', 'G', 'Q', '7', 'z'};
if(allowed_chars.find(ch) != allowed_chars.end()) {
/*...*/
}
Yet another answer on this overly-answered question, which I'm just including for completeness. Between all of the answers here you should find something that works in your application.
So another option is a lookup table:
// On initialization:
bool isAcceptable[256] = { false };
isAcceptable[(unsigned char)'A'] = true;
isAcceptable[(unsigned char)'B'] = true;
isAcceptable[(unsigned char)'C'] = true;
// When you want to check:
char c = ...;
if (isAcceptable[(unsigned char)c]) {
// it's 'A', 'B', or 'C'.
}
Scoff at the C-style static casts if you must, but they do get the job done. I suppose you could use an std::vector<bool> if arrays keep you up at night. You can also use types besides bool. But you get the idea.
Obviously this becomes cumbersome with e.g. wchar_t, and virtually unusable with multibyte encodings. But for your char example, or for anything that lends itself to a lookup table, it'll do. YMMV.
Similarly to the C strchr answer, in C++ you can construct a string and check the character against its contents:
#include <string>
...
std::string("ABCDEFGIKZ").find(c) != std::string::npos;
The above will return true for 'F' and 'Z' but false for 'z' or 'O'. This code does not assume contiguous representation of characters.
This works because std::string::find returns std::string::npos when it can't find the character in the string.
Edit:
There's another C++ method which doesn't involve dynamic allocation, but does involve an even longer piece of code:
#include <algorithm> // std::find
#include <iterator> // std::begin and std::end
...
char const chars[] = "ABCDEFGIKZ";
return std::find(std::begin(chars), std::end(chars), c) != std::end(chars);
This works similarly to the first code snippet: std::find searches the given range and returns a specific value if the item isn't found. Here, said specific value is the range's end.
One option is the unordered_set. Put the characters of interest into the set. Then just check the count of the character in question:
#include <iostream>
#include <unordered_set>
using namespace std;
int main() {
unordered_set<char> characters;
characters.insert('A');
characters.insert('B');
characters.insert('C');
// ...
if (characters.count('A')) {
cout << "found" << endl;
} else {
cout << "not found" << endl;
}
return 0;
}
There is a solution to your problem, not in the language but in coding practice: refactoring.
I'm quite sure that readers will find this answer very unorthodox, but - Refactoring can, and is used often to, hide a messy piece of code behind a method call. That method can be cleaned later or it can be left as it is.
You can create the following method:
bool characterIsValid(char ch) {
return (ch == 'A' || ch == 'B' || ch == 'C' || ..... );
}
and then this method can be called in a short form as:
if (characterIsValid(ch)) ...
That method, which bundles all the checks and only returns a boolean, can then be reused anywhere.
For a simple and effective solution, you can use memchr():
#include <string.h>
const char list[] = "ABCXZ";
if (memchr(list, ch, sizeof(list) - 1)) {
// 'ch' is 'A', 'B', 'C', 'X', or 'Z'
}
Note that memchr() is better suited than strchr() for this task as strchr() would find the null character '\0' at the end of the string, which is incorrect for most cases.
If the list is dynamic or external and its length is not provided, the strchr() approach is better, but you should check if ch is different from 0 as strchr() would find it at the end of the string:
#include <string.h>
extern char list[];
if (ch && strchr(list, ch)) {
// 'ch' is one of the characters in the list
}
Another more efficient but less terse C99 specific solution uses an array:
#include <limits.h>
const char list[UCHAR_MAX + 1] = { ['A'] = 1, ['B'] = 1, ['C'] = 1, ['X'] = 1, ['Z'] = 1 };
if (list[(unsigned char)ch]) {
/* ch is one of the matching characters */
}
Note however that all of the above solutions assume ch to have char type. If ch has a different type, they would accept false positives. Here is how to fix this:
#include <string.h>
extern char list[];
if (ch == (char)ch && ch && strchr(list, ch)) {
// 'ch' is one of the characters in the list
}
Furthermore, beware of pitfalls if you are comparing unsigned char values:
unsigned char ch = 0xFF;
if (ch == '\xFF') {
/* test fails if `char` is signed by default */
}
if (memchr("\xFF", ch, 1)) {
/* test succeeds in all cases, is this OK? */
}
For this specific case you can use the fact that char is an integer and test for a range:
if(ch >= 'A' && ch <= 'C')
{
...
}
But in general this is not possible unfortunately. If you want to compress your code just use a boolean function
if(compare_char(ch))
{
...
}
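The answer does not show compare_char itself, so here is one possible shape for it, offered only as a sketch. The set of accepted characters ('A', 'B', 'C', 'X', 'Z') is invented for the example; the point is that a non-contiguous set can still hide behind a single boolean call.

```cpp
// Hypothetical definition of the compare_char function mentioned above.
// The accepted characters are an arbitrary example set.
bool compare_char(char ch)
{
    switch (ch) {
    case 'A': case 'B': case 'C':   // a contiguous run
    case 'X': case 'Z':             // plus isolated characters
        return true;
    default:
        return false;
    }
}
```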
The X-Y answer on the vast majority of modern systems is don't bother.
You can take advantage of the fact that practically every character encoding used today stores the alphabet in one sequentially-ordered contiguous block. A is followed by B, B is followed by C, etc... on to Z. This allows you to do simple math tricks on letters to convert the letter to a number. For example the letter C minus the letter A, 'C' - 'A', is 2, the distance between C and A.
Some character sets, EBCDIC was discussed in the comments above, are not sequential or contiguous for reasons that are out of scope for discussion here. They are rare, but occasionally you will find one. When you do... Well, most of the other answers here provide suitable solutions.
We can use this to make a mapping of letter values to letters with a simple array:
// a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p, q,r,s,t,u,v,w,x,y, z
int lettervalues[] = {1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10};
So 'c' - 'a' is 2 and lettervalues[2] will result in 3, the letter value of C.
No if statements or conditional logic required what-so-ever. All the debugging you need to do is proof reading lettervalues to make sure you entered the correct values.
As you study more in C++, you will learn that lettervalues should be static (current translation unit-only access) and const (cannot be changed), possibly constexpr (cannot be changed and fixed at compile time). If you don't know what I'm talking about, don't worry. You'll cover all three later. If not, google them. All are very useful tools.
Using this array could be as simple as
int ComputeWordScore(std::string in)
{
int score = 0;
for (char ch: in) // for all characters in string
{
score += lettervalues[ch - 'a'];
}
return score;
}
But this has two fatal blind spots:
The first is capital letters. Sorry Ayn Rand, but 'A' is not 'a', and 'A'-'a' is not zero. This can be solved by using std::tolower or std::toupper to convert all input to a known case.
int ComputeWordScore(std::string in)
{
int score = 0;
for (char ch: in) // for all characters in string
{
score += lettervalues[std::tolower(ch) - 'a'];
}
return score;
}
The other is input of characters that aren't letters. For example, '1'. '1' - 'a' will result in an array index that is not in the array. This is bad. If you're lucky your program will crash, but anything could happen, including looking as though your program works. Read up on Undefined Behaviour for more.
Fortunately this also has a simple fix: Only compute the score for good input. You can test for valid alphabet characters with std::isalpha.
int ComputeWordScore(std::string in)
{
int score = 0;
for (char ch: in) // for all characters in string
{
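if (std::isalpha(static_cast<unsigned char>(ch))) // cast: passing a negative char is undefined behaviour
{
score += lettervalues[std::tolower(static_cast<unsigned char>(ch)) - 'a'];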
}
else
{
// do something that makes sense here.
}
}
return score;
}
My something else would be return -1;. -1 is an impossible word score, so anyone who calls ComputeWordScore can test for -1 and reject the user's input. What they do with it is not ComputeWordScore's problem. Generally the stupider you can make a function, the better, and errors should be handled by the closest piece of code that has all the information needed to make a decision. In this case, whatever read in the string would likely be tasked with deciding what to do with bad strings and ComputeWordScore can keep on computing word scores.
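Putting the -1 convention into code might look like the sketch below. It is an illustration rather than the answer's final version: the function name ComputeWordScoreOrFail is invented, the table is copied from earlier in the answer, and it assumes the default "C" locale and a character set where 'a' to 'z' are contiguous.

```cpp
#include <cctype>
#include <string>

// Letter values for a..z, copied from the table earlier in the answer.
static const int lettervalues[] = {1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10};

// Hypothetical variant: returns the word score, or -1 if any character
// is not a letter. The caller decides what to do about a -1.
int ComputeWordScoreOrFail(const std::string& in)
{
    int score = 0;
    for (char ch : in)
    {
        // The <cctype> functions require a value representable as unsigned char.
        const unsigned char uch = static_cast<unsigned char>(ch);
        if (!std::isalpha(uch))
        {
            return -1; // impossible score, signals bad input
        }
        score += lettervalues[std::tolower(uch) - 'a'];
    }
    return score;
}
```

For example, "Cab" scores 3 + 1 + 3 = 7, while "c4b" returns -1.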
Most of the terse versions have been covered, so I will cover the optimized cases with some helper macros to make them a little more terse.
It just so happens that if your range falls within your number of bits per long that you can combine all of your constants using a bitmask and just check that your value falls in the range and the variable's bitmask is non-zero when bitwise-anded with the constant bitmask.
/* This macro assumes the bits will fit in a long integer type,
* if it needs to be larger (64 bits on x32 etc...),
* you can change the shifted 1ULs to 1ULL or if range is > 64 bits,
* split it into multiple ranges or use SIMD
* It also assumes that a0 is the lowest and a9 is the highest,
* You may want to add compile time assert that:
* a9 (the highest value) - a0 (the lowest value) < max_bits
* and that a1-a8 fall within a0 to a9
*/
#define RANGE_TO_BITMASK_10(a0,a1,a2,a3,a4,a5,a6,a7,a8,a9) \
(1 | (1UL<<((a1)-(a0))) | (1UL<<((a2)-(a0))) | (1UL<<((a3)-(a0))) | \
(1UL<<((a4)-(a0))) | (1UL<<((a5)-(a0))) | (1UL<<((a6)-(a0))) | \
(1UL<<((a7)-(a0))) | (1UL<<((a8)-(a0))) | (1UL<<((a9)-(a0))) )
/*static inline*/ bool checkx(int x){
const unsigned long bitmask = /* assume 64 bits */
RANGE_TO_BITMASK_10('A','B','C','F','G','H','c','f','y','z');
unsigned temp = (unsigned)x-'A';
return ( ( temp <= ('z'-'A') ) && !!( (1ULL<<temp) & bitmask ) );
}
Since all of a# values are constants, they will be combined into 1 bitmask at compile time. That leaves 1 subtraction and 1 compare for the range, 1 shift and 1 bitwise and ... unless the compiler can optimize further, it turns out clang can (it uses the bit test instruction BTQ):
checkx: # #checkx
addl $-65, %edi
cmpl $57, %edi
ja .LBB0_1
movabsq $216172936732606695, %rax # imm = 0x3000024000000E7
btq %rdi, %rax
setb %al
retq
.LBB0_1:
xorl %eax, %eax
retq
It may look like more code on the C side, but if you are looking to optimize, this looks like it may be worth it on the assembly side. I'm sure someone could get creative with the macro to make it more useful in real programming situations than this "proof of concept".
That will get a little complex as a macro, so here is an alternative set of macros to setup a C99 lookup table.
#include <limits.h>
#define INIT_1(v,a) [ a ] = v
#define INIT_2(v,a,...) [ a ] = v, INIT_1(v, __VA_ARGS__)
#define INIT_3(v,a,...) [ a ] = v, INIT_2(v, __VA_ARGS__)
#define INIT_4(v,a,...) [ a ] = v, INIT_3(v, __VA_ARGS__)
#define INIT_5(v,a,...) [ a ] = v, INIT_4(v, __VA_ARGS__)
#define INIT_6(v,a,...) [ a ] = v, INIT_5(v, __VA_ARGS__)
#define INIT_7(v,a,...) [ a ] = v, INIT_6(v, __VA_ARGS__)
#define INIT_8(v,a,...) [ a ] = v, INIT_7(v, __VA_ARGS__)
#define INIT_9(v,a,...) [ a ] = v, INIT_8(v, __VA_ARGS__)
#define INIT_10(v,a,...) [ a ] = v, INIT_9(v, __VA_ARGS__)
#define ISANY10(x,...) ((const unsigned char[UCHAR_MAX+1]){ \
INIT_10(-1, __VA_ARGS__) \
})[x]
bool checkX(int x){
return ISANY10(x,'A','B','C','F','G','H','c','f','y','z');
}
This method will use a (typically) 256 byte table and a lookup that reduces to something like the following in gcc:
checkX:
movslq %edi, %rdi # x, x
cmpb $0, C.2.1300(%rdi) #, C.2
setne %al #, tmp93
ret
NOTE: Clang doesn't fare as well on the lookup table in this method because it sets up const tables that occur inside functions on the stack on each function call, so you would want to use INIT_10 to initialize a static const unsigned char [UCHAR_MAX+1] outside of the function to achieve similar optimization to gcc.
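If the code is C++ rather than C, the bitmask variant can be written without a fixed-arity macro. The following is a sketch only, not part of the original answer: mask_of, kMask, and in_letter_set are invented names, the code targets C++11 constexpr rules, and it assumes every listed character lies within 64 positions of the lowest one ('A' here).

```cpp
#include <cstdint>

// Hypothetical C++11 sketch of the bitmask technique; all names are illustrative.
// Base case: no characters left, empty mask.
constexpr std::uint64_t mask_of(char /*low*/) { return 0; }

// Recursive case: set the bit for `first`, then handle the rest.
// Assumes 0 <= first - low < 64 for every character passed in.
template <typename... Rest>
constexpr std::uint64_t mask_of(char low, char first, Rest... rest)
{
    return (std::uint64_t{1} << (first - low)) | mask_of(low, rest...);
}

// Same character set as the macro example, folded at compile time.
constexpr std::uint64_t kMask =
    mask_of('A', 'A', 'B', 'C', 'F', 'G', 'H', 'c', 'f', 'y', 'z');

static_assert(kMask == 0x3000024000000E7ULL,
              "should match the immediate in the clang listing above");

constexpr bool in_letter_set(int x)
{
    // One unsigned subtraction and compare for the range,
    // then one shift and mask for the membership bit.
    return static_cast<unsigned>(x) - 'A' <= static_cast<unsigned>('z' - 'A')
        && ((kMask >> (static_cast<unsigned>(x) - 'A')) & 1u) != 0;
}
```

The unsigned subtraction makes values below 'A' wrap to large numbers, so a single comparison covers both ends of the range.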

Shortest way to calculate difference between two numbers?

I'm about to do this in C++ but I have had to do it in several languages, it's a fairly common and simple problem, and this is the last time. I've had enough of coding it as I do, I'm sure there must be a better method, so I'm posting here before I write out the same long winded method in yet another language;
Consider the (lilies!) following code;
// I want the difference between these two values as a positive integer
int x = 7;
int y = 3;
int diff;
// This means you have to find the largest number first
// before making the subtract, to keep the answer positive
if (x>y) {
diff = (x-y);
} else if (y>x) {
diff = (y-x);
} else if (x==y) {
diff = 0;
}
This may sound petty but that seems like a lot to me, just to get the difference between two numbers. Is this in fact a completely reasonable way of doing things and I'm being unnecessarily pedantic, or is my spidey sense tingling with good reason?
Just get the absolute value of the difference:
#include <cstdlib>
int diff = std::abs(x-y);
Using the std::abs() function is one clear way to do this, as others here have suggested.
But perhaps you are interested in succinctly writing this function without library calls.
In that case
diff = x > y ? x - y : y - x;
is a short way.
In your comments, you suggested that you are interested in speed. In that case, you may be interested in ways of performing this operation that do not require branching. This link describes some.
#include <cstdlib>
int main()
{
int x = 7;
int y = 3;
int diff = std::abs(x-y);
}
All the existing answers will overflow on extreme inputs, giving undefined behaviour. @craq pointed this out in a comment.
If you know that your values will fall within a narrow range, it may be fine to do as the other answers suggest, but to handle extreme inputs (i.e. to robustly handle any possible input values), you cannot simply subtract the values then apply the std::abs function. As craq rightly pointed out, the subtraction may overflow, causing undefined behaviour (consider INT_MIN - 1), and the std::abs call may also cause undefined behaviour (consider std::abs(INT_MIN)). It's no better to determine the min and max of the pair and to then perform the subtraction.
More generally, a signed int is unable to represent the maximum difference between two signed int values. The unsigned int type should be used for the output value.
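To make the range argument concrete for 32-bit values, the widest possible gap can be computed in a wider type and checked at compile time. This is a small sketch; widest_gap is a name chosen for the example.

```cpp
#include <cstdint>

// Widest possible gap between two int32_t values, computed in a wider type
// so that the subtraction itself cannot overflow.
constexpr std::int64_t widest_gap =
    static_cast<std::int64_t>(INT32_MAX) - static_cast<std::int64_t>(INT32_MIN);

static_assert(widest_gap > INT32_MAX,
              "the gap does not fit in int32_t");
static_assert(widest_gap == UINT32_MAX,
              "the gap fits exactly in uint32_t");
```

The gap is 2^32 - 1, which is why the functions below return uint32_t.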
I see 3 solutions. I've used the explicitly-sized integer types from stdint.h here, to close the door on uncertainties like whether long and int are the same size and range.
Solution 1. The low-level way.
// I'm unsure if it matters whether our target platform uses 2's complement,
// due to the way signed-to-unsigned conversions are defined in C and C++:
// > the value is converted by repeatedly adding or subtracting
// > one more than the maximum value that can be represented
// > in the new type until the value is in the range of the new type
uint32_t difference_int32(int32_t i, int32_t j) {
static_assert(
(-(int64_t)INT32_MIN) == (int64_t)INT32_MAX + 1,
"Unexpected numerical limits. This code assumes two's complement."
);
// Map the signed values across to the number-line of uint32_t.
// Preserves the greater-than relation, such that an input of INT32_MIN
// is mapped to 0, and an input of 0 is mapped to near the middle
// of the uint32_t number-line.
// Leverages the wrap-around behaviour of unsigned integer types.
// It would be more intuitive to set the offset to (uint32_t)(-1 * INT32_MIN)
// but that multiplication overflows the signed integer type,
// causing undefined behaviour. We get the right effect subtracting from zero.
const uint32_t offset = (uint32_t)0 - (uint32_t)(INT32_MIN);
const uint32_t i_u = (uint32_t)i + offset;
const uint32_t j_u = (uint32_t)j + offset;
const uint32_t ret = (i_u > j_u) ? (i_u - j_u) : (j_u - i_u);
return ret;
}
I tried a variation on this using bit-twiddling cleverness taken from https://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax but modern code-generators seem to generate worse code with this variation. (I've removed the static_assert and the comments.)
uint32_t difference_int32(int32_t i, int32_t j) {
const uint32_t offset = (uint32_t)0 - (uint32_t)(INT32_MIN);
const uint32_t i_u = (uint32_t)i + offset;
const uint32_t j_u = (uint32_t)j + offset;
// Surprisingly it helps code-gen in MSVC 2019 to manually factor-out
// the common subexpression. (Even with optimisation /O2)
const uint32_t t = (i_u ^ j_u) & -(i_u < j_u);
const uint32_t min = j_u ^ t; // min(i_u, j_u)
const uint32_t max = i_u ^ t; // max(i_u, j_u)
const uint32_t ret = max - min;
return ret;
}
Solution 2. The easy way. Avoid overflow by doing the work using a wider signed integer type. This approach can't be used if the input signed integer type is the largest signed integer type available.
uint32_t difference_int32(int32_t i, int32_t j) {
return (uint32_t)std::abs((int64_t)i - (int64_t)j);
}
Solution 3. The laborious way. Use flow-control to work through the different cases. Likely to be less efficient.
uint32_t difference_int32(int32_t i, int32_t j)
{ // This static assert should pass even on 1's complement.
// It's just about impossible that int32_t could ever be capable of representing
// *more* values than can uint32_t.
// Recall that in 2's complement it's the same number, but in 1's complement,
// uint32_t can represent one more value than can int32_t.
static_assert( // Must use int64_t to subtract negative number from INT32_MAX
((int64_t)INT32_MAX - (int64_t)INT32_MIN) <= (int64_t)UINT32_MAX,
"Unexpected numerical limits. Unable to represent greatest possible difference."
);
uint32_t ret;
if (i == j) {
ret = 0;
} else {
if (j > i) { // Swap them so that i > j
const int32_t i_orig = i;
i = j;
j = i_orig;
} // We may now safely assume i > j
uint32_t magnitude_of_greater; // The magnitude, i.e. abs()
bool greater_is_negative; // Zero is of course non-negative
uint32_t magnitude_of_lesser;
bool lesser_is_negative;
if (i >= 0) {
magnitude_of_greater = i;
greater_is_negative = false;
} else { // Here we know 'lesser' is also negative, but we'll keep it simple
// magnitude_of_greater = -i; // DANGEROUS, overflows if i == INT32_MIN.
magnitude_of_greater = (uint32_t)0 - (uint32_t)i;
greater_is_negative = true;
}
if (j >= 0) {
magnitude_of_lesser = j;
lesser_is_negative = false;
} else {
// magnitude_of_lesser = -j; // DANGEROUS, overflows if j == INT32_MIN.
magnitude_of_lesser = (uint32_t)0 - (uint32_t)j;
lesser_is_negative = true;
}
// Finally compute the difference between lesser and greater
if (!greater_is_negative && !lesser_is_negative) {
ret = magnitude_of_greater - magnitude_of_lesser;
} else if (greater_is_negative && lesser_is_negative) {
ret = magnitude_of_lesser - magnitude_of_greater;
} else { // One negative, one non-negative. Difference is sum of the magnitudes.
// This will never overflow.
ret = magnitude_of_lesser + magnitude_of_greater;
}
}
return ret;
}
Well, it depends on what you mean by shortest: the fastest runtime, the fastest compilation, the fewest lines, or the least memory. I'll assume you mean runtime.
#include <algorithm> // std::max/min
int diff = std::max(x,y)-std::min(x,y);
This does two comparisons and one subtraction. The subtraction is unavoidable, although in specific cases it can be optimized with bitwise operations, and the compiler might do that for you anyway. A sufficiently smart compiler could also do only one comparison and reuse its result: if x > y, then y < x is already known from the first comparison. I'm not sure whether compilers take advantage of this.
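If a single comparison is the goal, it can also be written out by hand. The following is only a sketch: the name abs_diff is made up for illustration, and it assumes the inputs are int32_t and that an unsigned result is acceptable. Doing the subtraction in uint32_t also avoids the signed overflow that max - min on plain int can hit when the operands are far apart.

```cpp
#include <cstdint>

// Hypothetical helper: absolute difference with a single comparison.
// The subtraction is done in uint32_t, where wraparound is well defined,
// so the result is correct even for INT32_MIN and INT32_MAX.
uint32_t abs_diff(int32_t x, int32_t y)
{
    return (x > y) ? (uint32_t)x - (uint32_t)y
                   : (uint32_t)y - (uint32_t)x;
}
```

For example, abs_diff(-5, 5) yields 10, and abs_diff(INT32_MIN, INT32_MAX) yields UINT32_MAX.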

C++ Integer [?]

In Java, strings have a charAt() function.
In C++, that function is simply stringname[INDEX]
However, what if I wanted to use a particular number at a certain index of an integer?
E.g.
int value = 9123;
Let's say I wanted to work with the index 0, which is just the 9.
Is there a way to use index at's with integers?
int value = 9123;
std::stringstream tmp;
tmp << value;
char digit = (tmp.str())[0];
No, there is no standard function to extract decimal digits from an integer.
In C++11, there is a function to convert to a string:
std::string string = std::to_string(value);
If you can't use C++11, then you could use a string stream:
std::ostringstream stream;
stream << value;
std::string string = stream.str();
or old-school C formatting:
char buffer[32]; // Make sure it's large enough
snprintf(buffer, sizeof buffer, "%d", value);
std::string string = buffer;
or if you just want one digit, you could extract it arithmetically:
int digits = 0;
for (int temp = value; temp != 0; temp /= 10) {
++digits;
}
// This could be replaced by "value /= std::pow(10, digits-index-1)"
// if you don't mind using floating-point arithmetic.
for (int i = digits-index-1; i > 0; --i) {
value /= 10;
}
int digit = value % 10;
Handling negative numbers in a sensible way is left as an exercise for the reader.
You can use the following formula (pseudo-code) :
currDigit = (absolute(value) / 10^index) modulo 10; // (where ^ is power-of)
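A minimal C++ sketch of that formula follows. The function name digit_from_right is an assumption for illustration. Note that the formula counts from the right, so index 0 is the last digit (the 3 in 9123), not the first. The value is widened to long long before taking the absolute value so that INT_MIN does not overflow, and the power of ten is applied by repeated division rather than floating-point pow.

```cpp
#include <cstdlib>

// Hypothetical helper: returns the decimal digit of value at position index,
// where index 0 is the least significant (rightmost) digit.
// Positions beyond the most significant digit yield 0.
int digit_from_right(int value, int index)
{
    long long v = std::llabs(static_cast<long long>(value));
    for (int k = 0; k < index; ++k) {
        v /= 10; // same effect as dividing by 10^index
    }
    return static_cast<int>(v % 10);
}
```

With value = 9123, digit_from_right(value, 0) gives 3 and digit_from_right(value, 3) gives 9.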
Just to make things complete, you can also use boost::lexical_cast; for more information, check out its documentation.
Basically it's just a nice wrapper around the code found in Andreas Brinck's answer.
Another solution, which uses 0 for the leftmost digit. digits is used to break value down into individual digits in written order (i.e. "9347" becomes 9,3,4,7). We then discard the first index values, i.e. to get the 3rd digit, we discard the first two and take the new front.
if (value==0 && index ==0) return 0; // Special case.
if (value <0) { ... } // Unclear what to do with this.
std::list<char> digits;
while (value) {
digits.push_front(value % 10);
value /= 10;
}
for(; index > 0 && !digits.empty(); index--) {
digits.pop_front();
}
if (!digits.empty()) {
return digits.front();
} else
{
throw std::invalid_argument("Index too large");
}
An integer is not a string, and therefore you cannot do that. What you need is to convert the integer to a string. You can use the non-standard itoa or have a look here.
Try sprintf to write the integer out to a string:
http://www.cplusplus.com/reference/clibrary/cstdio/sprintf/
Then you can index into the char array that you've just printed into.
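As a rough sketch of that approach (the helper name digit_char_at is hypothetical, and snprintf is used in place of sprintf so the write is bounded by the buffer size):

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: prints value into a local buffer and returns the
// character at index (0 is the leftmost character, which is '-' for
// negative values). Returns '\0' if index is out of range.
char digit_char_at(int value, std::size_t index)
{
    char buffer[32]; // large enough for any 32-bit or 64-bit int
    const int length = std::snprintf(buffer, sizeof buffer, "%d", value);
    if (length < 0 || index >= static_cast<std::size_t>(length)) {
        return '\0';
    }
    return buffer[index];
}
```

Here digit_char_at(9123, 0) returns '9', matching the indexing the question asks for.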
I've implemented a variant of giorashc's solution, with all the suggested fixes applied and issues resolved. It's a bit long, but it should be fast if everything is inlined. Most of the code is tests, which I've left in for completeness.
#include <cassert>
#include <cstdint>
#include <iostream>
char get_kth_digit( int v, int index)
{
assert(v>0);
// Compute 10^index in integer arithmetic; pow() returns a double and may round.
int mask = 1;
for( int k=0; k<index; ++k ) mask *= 10;
return '0'+(v/mask)%10;
}
int count_digits( int v )
{
assert(v>0);
int c=0;
while(v>0)
{
++c;
v/=10;
}
return c;
}
char get_int_index(int v, int index)
{
if( v==0 ) return '0';
if( v < 0 )
{
if(index==0) { return '-'; }
return get_int_index(-v,index-1);
}
// get_kth_digit counts the wrong way, so we need to reverse the count
int digits = count_digits(v);
return get_kth_digit( v, digits-index-1);
}
template<typename X, typename Y>
void compare(const X & v1, const Y & v2, const char * v1t, const char * v2t, uint32_t line, const char * fname )
{
if(v1!=v2)
{
std::cerr<<fname<<":"<<line<<": Equality test failed "<< v1t << "("<<v1<<") <> " << v2t <<" ("<<v2<<")"<<std::endl;
}
}
#define test_eq(X,Y) compare(X,Y,#X,#Y,__LINE__,__FILE__)
int main()
{
test_eq( 1, count_digits(1) );
test_eq( 1, count_digits(9) );
test_eq( 2, count_digits(10) );
test_eq( 2, count_digits(99) );
test_eq( 3, count_digits(100) );
test_eq( 3, count_digits(999) );
test_eq( '1', get_kth_digit(123,2) );
test_eq( '2', get_kth_digit(123,1) );
test_eq( '3', get_kth_digit(123,0) );
test_eq( '0', get_kth_digit(10,0) );
test_eq( '1', get_kth_digit(10,1) );
test_eq( '1', get_int_index(123,0) );
test_eq( '2', get_int_index(123,1) );
test_eq( '3', get_int_index(123,2) );
test_eq( '-', get_int_index(-123,0) );
test_eq( '1', get_int_index(-123,1) );
test_eq( '2', get_int_index(-123,2) );
test_eq( '3', get_int_index(-123,3) );
}
A longer explanation of Andreas Brinck's answer.
The C++ library is designed so that between "sequences" and "values" there is a "mediator" named "stream", which acts as a translator from values to their corresponding sequences.
"Sequence" is an abstract concept whose concrete implementations are "strings" and "files".
"Stream" is another abstract concept whose corresponding concrete implementations are "stringstream" and "fstream". These are implemented in terms of the helper classes "stringbuf" and "filebuf" (both derived from the abstract "streambuf") and a helper object of the "locale" class, which contains some "facets".
The code in the cited answer works this way:
The tmp object of class stringstream is default-constructed. This also constructs, internally, a stringbuf and a string, plus a locale referencing the facets of the system global locale (by default, the "classic" or "C" locale).
The operator<< between stream and int is called: there is one overload for each of the basic types.
The "int version" gets the num_put facet from the locale, and a "buffer iterator" from the buffer, and calls the put function passing the format flags of the given stream.
The put function actually converts the number into the character sequence, thus filling the buffer.
When the buffer is full, when a particular character is inserted, or when the str function is called, the buffer content is "sent" (copied, in this case) to the string, and the string content is returned.
This process looks convoluted at first, but:
It can be completely hidden (resulting in two lines of code)
It can be extended to virtually anything, but...
It is often kept as a (sort of) mystery in its details in most C++ courses and tutorials
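To make the steps above a little less mysterious, here is a sketch that performs by hand roughly what operator<< does internally: it fetches the num_put facet from the stream's locale and calls put with an iterator over the stream's buffer. The function name format_with_num_put is made up for illustration, and the value is taken as long because num_put::put has no overload for plain int.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats value by calling the num_put facet directly,
// mirroring what "stream << value" does behind the scenes.
std::string format_with_num_put(long value)
{
    std::ostringstream stream; // owns a stringbuf and a locale
    const std::num_put<char>& facet =
        std::use_facet<std::num_put<char> >(stream.getloc());
    // put() writes the digits through the buffer iterator, honouring the
    // stream's format flags and fill character.
    facet.put(std::ostreambuf_iterator<char>(stream), stream, stream.fill(), value);
    return stream.str(); // copy the buffer content out as a string
}
```

With the default "C" locale, format_with_num_put(9123) produces the string "9123".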
I would convert it to a string, then index it -- CPP also has the:
str.at(i)
function similar to Java's.
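For instance, a small sketch (the name digit_at is an assumption for illustration). Unlike operator[], at() checks the index and throws std::out_of_range when it is past the end of the string:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts value to a string and returns the character
// at index (0 is the leftmost character).
// Throws std::out_of_range if index is not a valid position.
char digit_at(int value, std::size_t index)
{
    const std::string text = std::to_string(value);
    return text.at(index);
}
```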
Another simpler loop in C++11 would be a range based loop --
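int i = 0;
for(auto s : str){ // str must be a std::string, e.g. std::to_string(value)
if(i == idx)
cout << s;
i++; // always advance, otherwise every later character is printed too
}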
I guess this isn't easier than the standard for loop. I thought auto might be helpful, but not really. I know this is already answered, but I prefer simple and familiar answers.
Zach

How to convert large integers to base 2^32?

First off, I'm doing this for myself so please don't suggest "use GMP / xint / bignum" (if it even applies).
I'm looking for a way to convert large integers (say, OVER 9000 digits) into an int32 array in base 2^32 representation. The numbers will start out as base 10 strings.
For example, if I wanted to convert string a = "4294967300" (in base 10), which is just over UINT32_MAX, to the new base 2^32 array, it would be int32_t b[] = {1,4}. If int32_t b[] = {3,2485738}, the base 10 number would be 3 * 2^32 + 2485738. Obviously the numbers I'll be working with are beyond the range of even int64 so I can't exactly turn the string into an integer and mod my way to success.
I have a function that does subtraction in base 10. Right now I'm thinking I'll just do subtraction(char* number, "2^32") and count how many times before I get a negative number, but that will probably take a long time for larger numbers.
Can someone suggest a different method of conversion? Thanks.
EDIT
Sorry in case you didn't see the tag, I'm working in C++
Assuming your bignum class already has multiplication and addition, it's fairly simple:
bignum str_to_big(char* str) {
bignum result(0);
while (*str) {
result *= 10;
result += (*str - '0');
str = str + 1;
}
return result;
}
Converting the other way is the same concept, but requires division and modulo
std::string big_to_str(bignum num) {
std::string result;
do {
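// Convert the digit value to its character; assumes bignum converts to int.
result.push_back(static_cast<char>('0' + static_cast<int>(num%10)));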
num /= 10;
} while(num > 0);
std::reverse(result.begin(), result.end());
return result;
}
Both of these are for unsigned only.
To convert from base 10 strings to your numbering system, start with zero and, for each base 10 digit in turn, multiply the running result by 10 and add the digit. Every time the top word produces a carry, add a new digit to your base 2^32 array.
The simplest (not the most efficient) way to do this is to write two functions, one to multiply a large number by an int, and one to add an int to a large number. If you ignore the complexities introduced by signed numbers, the code looks something like this:
(EDITED to use vector for clarity and to add code for actual question)
void mulbig(vector<uint32_t> &bignum, uint16_t multiplicand)
{
uint32_t carry=0;
for( unsigned i=0; i<bignum.size(); i++ ) {
uint64_t r=((uint64_t)bignum[i] * multiplicand) + carry;
bignum[i]=(uint32_t)(r&0xffffffff);
carry=(uint32_t)(r>>32);
}
if( carry )
bignum.push_back(carry);
}
void addbig(vector<uint32_t> &bignum, uint16_t addend)
{
uint32_t carry=addend;
for( unsigned i=0; carry && i<bignum.size(); i++ ) {
uint64_t r=(uint64_t)bignum[i] + carry;
bignum[i]=(uint32_t)(r&0xffffffff);
carry=(uint32_t)(r>>32);
}
if( carry )
bignum.push_back(carry);
}
Then, implementing atobignum() using those functions is trivial:
void atobignum(const char *str,vector<uint32_t> &bignum)
{
bignum.clear();
bignum.push_back(0);
while( *str ) {
mulbig(bignum,10);
addbig(bignum,*str-'0');
++str;
}
}
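Going back to a decimal string needs the opposite operation, repeated division by 10. The following is only a sketch under the same storage assumption as the code above (least significant 32-bit word first); the names divbig10 and bignumtoa are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: divides a base-2^32 number (least significant word
// first) by 10 in place and returns the remainder.
uint32_t divbig10(std::vector<uint32_t> &bignum)
{
    uint64_t rem = 0;
    for (std::size_t i = bignum.size(); i-- > 0;) { // most significant word first
        const uint64_t cur = (rem << 32) | bignum[i];
        bignum[i] = static_cast<uint32_t>(cur / 10);
        rem = cur % 10;
    }
    while (bignum.size() > 1 && bignum.back() == 0)
        bignum.pop_back(); // drop leading zero words
    return static_cast<uint32_t>(rem);
}

// Hypothetical helper: converts the number to a base 10 string.
// Takes the vector by value because the division destroys it.
std::string bignumtoa(std::vector<uint32_t> bignum)
{
    if (bignum.empty())
        return "0";
    std::string result;
    do {
        result.push_back(static_cast<char>('0' + divbig10(bignum)));
    } while (!(bignum.size() == 1 && bignum[0] == 0));
    std::reverse(result.begin(), result.end()); // digits were produced low to high
    return result;
}
```

For the number from the question, the words {4, 1} (that is, 1 * 2^32 + 4) convert back to "4294967300".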
I think Docjar: gnu/java/math/MPN.java might contain what you're looking for, specifically the code for public static int set_str (int dest[], byte[] str, int str_len, int base).
Start by converting the number to binary. Starting from the right, each group of 32 bits is a single base 2^32 digit.
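The grouping step can be sketched as follows, assuming the binary form is already available as a string of '0' and '1' characters with the most significant bit first. The name limbs_from_binary and the choice to return the least significant word first are assumptions for illustration. Producing the binary string from a decimal string is the hard part and is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: splits a binary string (most significant bit first)
// into base-2^32 digits, returned least significant word first.
// Assumes bits contains only '0' and '1'.
std::vector<uint32_t> limbs_from_binary(const std::string &bits)
{
    std::vector<uint32_t> limbs;
    std::size_t end = bits.size();
    while (end > 0) {
        // Take up to 32 bits, working from the right end of the string.
        const std::size_t begin = (end >= 32) ? end - 32 : 0;
        uint32_t limb = 0;
        for (std::size_t i = begin; i < end; ++i)
            limb = (limb << 1) | static_cast<uint32_t>(bits[i] - '0');
        limbs.push_back(limb);
        end = begin;
    }
    if (limbs.empty())
        limbs.push_back(0); // an empty string is treated as zero
    return limbs;
}
```

A 33-bit input consisting of a 1, twenty-nine 0s and then 100 yields the words {4, 1}, i.e. 1 * 2^32 + 4.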