How to convert a string into an arbitrary length integer - C++

I'm working on a project to implement multi-precision arithmetic in C++. I've sort of fallen at the first hurdle. I'm trying to convert a C string into the binary representation of an integer that will probably contain more bits than an int can hold (in theory it may be an arbitrary number of bits). I essentially want to create an array of longs that will hold the binary representation of the number contained in the string, with index 0 being the least significant "limb" of the number. I am assuming that the number is in base ten.
I have already looked into using code from GMP, but it is needlessly complex for my needs and has huge amounts of platform-dependent code.
Any help would be great! If you require more details let me know.

As @SteveJessop suggested:
class Number {
public:
    Number();
    void FromString( const char * );
    void operator *= ( int );
    void operator += ( int );
    void operator = ( int );
};

void Number::FromString( const char * string )
{
    // Repeatedly multiply by 10 and add the next decimal digit.
    *this = 0;
    while( *string != '\0' ) {
        *this *= 10;
        *this += *string - '0';
        string++;
    }
}
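With those operators defined, converting a decimal string is just:
    Number n;
    n.FromString( "123456789012345678901234567890" );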

The first thing you want to do is have a working test engine. This is a brain-dead, easy to understand, arbitrary precision arithmetic engine.
The purpose of this engine is twofold. First, it makes converting strings into arbitrary precision integers really easy. Second, it is a means to test your later, improved engines. Even if it is really slow, you'll be more convinced it is correct (and having two independent implementations means corner case errors in one might be caught by the other, even if you aren't more confident in either).
This assumes short is at least 16 bits and char is at least 8 (use the actual <cstdint> types such as uint8_t and uint16_t if your compiler supports them):
unsigned short Add(unsigned char left, unsigned char right, unsigned char extra=0) {
    return (unsigned short)left + (unsigned short)right + (unsigned short)extra;
}
unsigned short Multiply(unsigned char left, unsigned char right) {
    return (unsigned short)left * (unsigned short)right;
}
// Split a 16-bit intermediate into (low byte, carry byte).
std::pair<unsigned char,unsigned char> CarryCalc(unsigned short input) {
    std::pair<unsigned char,unsigned char> retval;
    retval.first = input & ((1<<8)-1);
    retval.second = input>>8;
    return retval;
}
struct BigNum {
    // Least significant limb first, one byte per limb (base 256).
    std::vector<unsigned char> base256;

    BigNum& operator+=( BigNum const& right ) {
        if (right.base256.size() > base256.size())
            base256.resize(right.base256.size());
        auto lhs = base256.begin();
        auto rhs = right.base256.begin();
        unsigned char carry = 0;
        for(; rhs != right.base256.end(); ++rhs, ++lhs) {
            auto result = CarryCalc( Add( *lhs, *rhs, carry ) );
            *lhs = result.first;
            carry = result.second;
        }
        // Propagate any remaining carry through the longer number.
        while( carry && lhs != base256.end() ) {
            auto result = CarryCalc( Add( *lhs, 0, carry ) );
            *lhs = result.first;
            carry = result.second;
            ++lhs;
        }
        if (carry)
            base256.push_back(carry);
        return *this;
    }
    BigNum& scaleByChar( unsigned char right ) {
        unsigned char carry = 0;
        for(auto lhs = base256.begin(); lhs != base256.end(); ++lhs) {
            unsigned short product = Multiply( *lhs, right );
            product += carry;
            auto result = CarryCalc( product );
            *lhs = result.first;
            carry = result.second;
        }
        if (carry)
            base256.push_back(carry);
        return *this;
    }
    BigNum& shiftRightBy8BitsTimes( unsigned int x ) {
        if (x > base256.size()) {
            base256.clear();
            return *this;
        }
        base256.erase( base256.begin(), base256.begin()+x );
        return *this;
    }
    // very slow, O(x * n) -- should be O(n) at worst
    BigNum& shiftLeftBy8BitsTimes( unsigned int x ) {
        while( x != 0 ) {
            base256.insert( base256.begin(), 0 );
            --x;
        }
        return *this;
    }
    // very slow, like O(n^3) or worse (should be O(n^2) at worst, fix shiftLeft)
    BigNum& operator*=( BigNum const& right ) {
        unsigned int digit = 0;
        BigNum retval;
        while (digit < right.base256.size()) {
            BigNum tmp = *this;
            tmp.shiftLeftBy8BitsTimes( digit );
            tmp.scaleByChar( right.base256[digit] );
            retval += tmp;
            ++digit;
        }
        *this = retval;
        return *this;
    }
};
which is a quick and dirty arbitrary precision integer type (not even compiled yet) with horrible performance. Test something like the above, convince yourself it is solid, then build up from there.
Much of your code could take the actual BigNum class in question as a template argument, so you can do the same algorithm with two different implementations, and compare the results for testing purposes.
Oh, and another piece of advice -- write a template class that "improves" a bare-bones arbitrary precision library via CRTP. The goal is to only have to write *=, +=, unary -, and maybe /= and some shift_helper and compare_helper functions, and have the rest of your methods automatically written for you by the template. By putting the boilerplate in one spot it makes it easier to maintain more than one version of your BigNum class: and having more than one version is very important for testing purposes.
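A minimal sketch of that CRTP idea; the operator set shown here is illustrative, assuming the derived class supplies += and *=:
template<class Derived>
struct NumberOps {
    // operator+ written once, in terms of the derived +=
    friend Derived operator+(Derived left, Derived const& right) {
        left += right;
        return left;
    }
    // likewise operator* in terms of the derived *=
    friend Derived operator*(Derived left, Derived const& right) {
        left *= right;
        return left;
    }
};

struct BigNum : NumberOps<BigNum> {
    BigNum& operator+=(BigNum const& right);  // as above
    BigNum& operator*=(BigNum const& right);
    // data members as before...
};
Each implementation you want to test just derives from the same boilerplate template, so the free operators only get written once.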

Related

Efficient way to convert int to string

I'm creating a game in which I have a main loop. During one cycle of this loop, I have to convert an int value to a string about 50-100 times. So far I've been using this function:
std::string Util::intToString(int val)
{
    std::ostringstream s;
    s << val;
    return s.str();
}
But it doesn't seem to be very efficient, as I've seen the FPS drop from ~120 (without using this function) to ~95 (while using it).
Is there any other way to convert int to string that would be much more efficient than my function?
It's 1-72 range. I don't have to deal with negatives.
Pre-create an array/vector of 73 string objects, and use an index to get your string. Returning a const reference will let you save on allocations/deallocations, too:
// Initialize smallNumbers to the strings "0", "1", "2", ...
static std::vector<std::string> smallNumbers;

const std::string& smallIntToString(unsigned int val) {
    return smallNumbers[val < smallNumbers.size() ? val : 0];
}
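One way to populate that table up front, reusing the asker's stream conversion for the one-time cost (the 72 comes from the stated range):
#include <sstream>
#include <string>
#include <vector>

static std::vector<std::string> smallNumbers;

// One-time setup: build "0".."72" (index 0 doubles as the fallback entry).
void initSmallNumbers() {
    for (int i = 0; i <= 72; ++i) {
        std::ostringstream s;
        s << i;
        smallNumbers.push_back(s.str());
    }
}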
The standard std::to_string function might be useful.
However, in this case I'm wondering whether the copying of the string when returning it might be just as big a bottleneck. If so, you could pass the destination string to the function as a reference argument instead. That said, if you have std::to_string then the compiler is probably C++11 compatible and can use move semantics instead of copying.
Yep: fall back on functions from C, as explored in this previous answer:
namespace boost {
    template<>
    inline std::string lexical_cast<std::string>(const int& arg)
    {
        char buffer[65]; // large enough for arg < 2^200
        ltoa( arg, buffer, 10 );
        return std::string( buffer ); // RVO will take place here
    }
} //namespace boost
In theory, this new specialisation will take effect throughout the rest of the Translation Unit in which you defined it. ltoa is much faster (despite being non-standard) than constructing and using a stringstream.
However, I've experienced problems with name conflicts between instantiations of this specialisation, and instantiations of the original function template, between competing shared libraries.
In order to get around that, I actually just give this function a whole new name entirely:
template <typename T>
inline std::string fast_lexical_cast(const T& arg)
{
    return boost::lexical_cast<std::string>(arg);
}

template <>
inline std::string fast_lexical_cast(const int& arg)
{
    char buffer[65];
    if (!ltoa(arg, buffer, 10)) {
        boost::throw_exception(boost::bad_lexical_cast(
            typeid(std::string), typeid(int)
        ));
    }
    return std::string(buffer);
}
Usage: std::string myString = fast_lexical_cast(42);
Disclaimer: this modification is reverse-engineered from Kirill's original SO code, not the version that I created and put into production from my company codebase. I can't think right now, though, of any other significant modifications that I made to it.
Something like this:
std::string intToString(int val)
{
    const int size = 12; // enough for a 32-bit int, sign, and terminator
    char buf[size + 1];
    buf[size] = 0;
    int index = size;
    bool neg = false;
    if (val < 0) { // Obviously don't need this if val is always positive.
        neg = true;
        val = -val; // note: overflows for INT_MIN
    }
    do
    {
        buf[--index] = (val % 10) + '0';
        val /= 10;
    } while (val);
    if (neg)
    {
        buf[--index] = '-';
    }
    return std::string(&buf[index]);
}
I use this:
void append_uint_to_str(std::string & s, unsigned int i)
{
    // Recurse to emit the most significant digits first.
    if (i > 9)
        append_uint_to_str(s, i / 10);
    s += '0' + i % 10;
}
If you want negatives, insert:
if(i < 0)
{
    s += '-';
    i = -i;
}
at the beginning of the function (taking i as a signed int in that case).
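For example:
    std::string s;
    append_uint_to_str(s, 1234); // s is now "1234"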

Pointer type casting altering unintended memory

#define ARRAY_SIZE 20

float DataSource[ARRAY_SIZE];

void Read(unsigned char const *Source, unsigned char *Destination, unsigned long DataSize)
{
    for ( unsigned long i = 0; i < DataSize; i++)
    {
        *(Destination + i*DataSize) = *(Source + i*DataSize);
    }
}

void fun()
{
    int Index;
    float Dest;
    for ( Index = 0; Index < ARRAY_SIZE; Index++ )
    {
        Read((unsigned char *)&DataSource[Index], (unsigned char *)&Dest, sizeof(DataSource[Index]));
    }
}
I'm having an issue with the above code where upon calling Read(), my Index variable gets overwritten and I am certain the ugly pointer casting is the culprit, but I'm having trouble understanding exactly what is happening here.
The unsigned char pointer types are mandatory because the above code is intended to simulate some driver level software and maintain the same prototype.
Can someone help me to understand the issue here? All the above code is changeable except for the prototype of Read().
The error is here:
for ( unsigned long i = 0; i < DataSize; i++)
{
    //              vvvvvvvvvv                      vvvvvvvvvv
    *(Destination + i*DataSize) = *(Source + i*DataSize);
}
i * DataSize jumps DataSize bytes at a time instead of one, so for every i >= 1 you get an "out of bounds" access.
Replace with:
for ( unsigned long i = 0; i < DataSize; i++)
{
    *(Destination + i) = *(Source + i);
}
You pass in a single float's address to Read (&Dest) and then proceed to write many values to consecutive memory locations. Since you're writing to memory you don't own at that point, it's not unlikely that it overwrites Index (and other stuff), because stacks usually grow downwards.
This is wrong:
*(Destination + i*DataSize) = *(Source + i*DataSize);
You want to copy DataSize adjacent bytes, not bytes DataSize apart (total span DataSize*DataSize)
Just say
Destination[i] = Source[i];
An amusing (to me) C++ way.
#include <cassert> // for assert
#include <cstddef> // for size_t

template<typename Data>
struct MemBlockRefHelper {
    typedef Data value_type;
    Data* data;
    size_t size;
    MemBlockRefHelper( Data* d, size_t s ):data(d), size(s) {}

    template<typename Target, typename Other=typename Target::value_type>
    Target& Assign( MemBlockRefHelper<Other> const& other ) {
        assert(size == other.size);
        for (size_t i = 0; i < size; ++i) {
            if (i < other.size) {
                data[i] = other.data[i];
            } else {
                data[i] = 0;
            }
        }
        Target* self = static_cast<Target*>(this);
        return *self;
    }
};

struct MemBlockRef;

struct MemBlockCRef:MemBlockRefHelper<const unsigned char> {
    MemBlockCRef( const unsigned char* d, size_t s ):MemBlockRefHelper<const unsigned char>( d, s ) {}
    MemBlockCRef( const MemBlockRef& other );
};

struct MemBlockRef:MemBlockRefHelper<unsigned char> {
    MemBlockRef( unsigned char* d, size_t s ):MemBlockRefHelper<unsigned char>( d, s ) {}
    MemBlockRef& operator=( MemBlockRef const& other ) {
        return Assign< MemBlockRef >( other );
    }
    MemBlockRef& operator=( MemBlockCRef const& other ) {
        return Assign< MemBlockRef, const unsigned char >( other );
    }
};

inline MemBlockCRef::MemBlockCRef( const MemBlockRef& other ): MemBlockRefHelper<const unsigned char>( other.data, other.size ) {}

void Read( unsigned char const* Source, unsigned char* Dest, unsigned long DataSize ) {
    MemBlockCRef src( Source, DataSize );
    MemBlockRef dest( Dest, DataSize );
    dest = src;
}
Massively over-engineered, but the idea is to wrap up a block of POD memory of a certain size and provide reference semantics to its contents (initialization creates a new reference to the same data; assignment copies over the referred-to data).
Once you have such classes, the code for Read becomes a three-liner. Well, you can do it in one:
MemBlockRef( Dest, DataSize ) = MemBlockCRef( Source, DataSize );
but that is needless. Well, so is this entire framework.
But I was amused by writing it.
Let's take a closer look at your Read(): i changes from 0 to DataSize-1; each time you access memory at an offset of i*DataSize... that is, at offsets from 0 to DataSize*(DataSize-1). That looks wrong, as a span of DataSize*DataSize - DataSize bytes makes no dimensional sense.
Unlike other answers, I don't want to guess what you wanted. I'm just showing a kind of "dimensional analysis" that can help spot the most suspect part of the code without reading the author's mind.
You are treating the scalar variable Dest declared inside fun() as an array inside Read(). It seems that Dest and your Index variable are adjacent on the stack, which explains why Index gets overwritten exactly when the loop inside Read() executes for i==1.
So the solution is: declare Dest as an array, too:
float Dest[ARRAY_SIZE];
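Putting the two fixes together (byte-wise indexing in Read(), and an array destination as suggested), a corrected version might look like:
#define ARRAY_SIZE 20

float DataSource[ARRAY_SIZE];
float Dest[ARRAY_SIZE];

void Read(unsigned char const *Source, unsigned char *Destination, unsigned long DataSize)
{
    for ( unsigned long i = 0; i < DataSize; i++ )
    {
        Destination[i] = Source[i]; // copy DataSize adjacent bytes
    }
}

void fun()
{
    for ( int Index = 0; Index < ARRAY_SIZE; Index++ )
    {
        Read((unsigned char *)&DataSource[Index],
             (unsigned char *)&Dest[Index],
             sizeof(DataSource[Index]));
    }
}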

boost::dynamic_bitset concat performance

I want to concat a big bitset with a smaller one in a way that won't kill performance. Currently my application spends 20% of its CPU time in just the following code:
boost::dynamic_bitset<> encode(const std::vector<char>& data)
{
    boost::dynamic_bitset<> result;
    std::for_each(data.begin(), data.end(), [&](unsigned char symbol)
    {
        // codes_[symbol].size() averages ~5 bits
        for(size_t n = 0; n < codes_[symbol].size(); ++n)
            result.push_back(codes_[symbol][n]);
    });
    return result;
}
I have read this post, which proposes a solution that unfortunately will not work for me, as the difference between the sizes of the destination and source bitsets is very large.
Any ideas?
If this is not possible to do efficiently with boost::dynamic_bitset then I'm open for other suggestions.
This is because you keep using push_back(), but in actual fact you already know the size in advance. This means lots of redundant copying and reallocation. You should resize it first. In addition, you don't have to push_back() every value: it should be possible for you to use some form of insert() (I don't actually know its exact interface, but I think append() is the name) to insert the whole target vector at once, which should be significantly better.
In addition, you're leaving the dynamic_bitset's block type as unsigned long, but as far as I can see you're only actually inserting unsigned char values into it. Changing that could make life easier for you.
I'm also curious what type codes_ is: if it's a map you could replace it with a vector, or in fact, since it's maximally sized statically (an unsigned char can take at most 256 values), a static array.
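For what it's worth, a minimal sketch of the resize-first advice, assuming codes_ is indexable by symbol and each code supports size() and operator[]:
boost::dynamic_bitset<> encode(const std::vector<char>& data)
{
    // First pass: total bit count, so we allocate exactly once.
    size_t total = 0;
    for (unsigned char symbol : data)
        total += codes_[symbol].size();

    boost::dynamic_bitset<> result(total);

    // Second pass: write bits by index instead of push_back().
    size_t pos = 0;
    for (unsigned char symbol : data)
        for (size_t n = 0; n < codes_[symbol].size(); ++n, ++pos)
            result[pos] = codes_[symbol][n];

    return result;
}
This trades one extra pass over the input for eliminating all of the incremental growth.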
I've tried using boost bitset in performance code before and been disappointed. I dug into it a bit, and concluded I'd be better off implementing my own bit-buffer class, although I forget the details of what convinced me boost's class was never going to be fast (I did get as far as inspecting the assembly produced).
I still don't know what the fastest way of building bit-buffers/bitsets/bitstreams (or whatever you want to call them) is. A colleague is trying to find out with this related question, but at the time of writing it's still awaiting a good answer.
I wrote my own bitset class. I appreciate any suggestions for improvements. I will try to look into SSE and see if there is anything useful there.
With my very rough benchmark I got an 11x performance increase while appending 6 bits at a time.
class fast_bitset
{
public:
    typedef unsigned long block_type;
    static const size_t bits_per_block = sizeof(block_type)*8;

    fast_bitset()
        : is_open_(true)
        , blocks_(1)
        , space_(blocks_.size()*bits_per_block){}

    void append(const fast_bitset& other)
    {
        assert(!other.is_open_);
        for(size_t n = 0; n < other.blocks_.size()-1; ++n)
            append(other.blocks_[n], bits_per_block);
        append(other.blocks_.back() >> other.space_, bits_per_block - other.space_);
    }

    void append(block_type value, size_t n_bits)
    {
        assert(is_open_);
        assert(n_bits <= bits_per_block); // <=, since append(fast_bitset) passes whole blocks
        if(space_ < n_bits)
        {
            blocks_.back() = blocks_.back() << space_;
            blocks_.back() = blocks_.back() | (value >> (n_bits - space_));
            blocks_.push_back(value);
            space_ = bits_per_block - (n_bits - space_);
        }
        else
        {
            blocks_.back() = blocks_.back() << n_bits;
            blocks_.back() = blocks_.back() | value;
            space_ -= n_bits;
        }
    }

    void push_back(bool bit)
    {
        append(bit, 1);
    }

    bool operator[](size_t index) const
    {
        assert(!is_open_);
        static const block_type high_bit = block_type(1) << (bits_per_block-1);
        const size_t block_index = index / bits_per_block;
        const size_t bit_index = index % bits_per_block;
        const block_type bit_mask = high_bit >> bit_index;
        return blocks_[block_index] & bit_mask;
    }

    void close()
    {
        blocks_.back() = blocks_.back() << space_;
        is_open_ = false;
    }

    size_t size() const
    {
        return blocks_.size()*bits_per_block-space_;
    }

    const std::vector<block_type>& blocks() const {return blocks_;}

    class reader
    {
    public:
        reader(const fast_bitset& bitset)
            : bitset_(bitset)
            , index_(0)
            , size_(bitset.size()){}

        bool next_bit(){return bitset_[index_++];}
        bool eof() const{return index_ >= size_;}

    private:
        const fast_bitset& bitset_;
        size_t index_;
        size_t size_;
    };

private:
    bool is_open_;
    std::vector<block_type> blocks_;
    size_t space_;
};
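A quick usage sketch of the class above (the code values are arbitrary):
void demo()
{
    fast_bitset bits;
    bits.append(0x2A, 6); // append the 6-bit code 101010
    bits.append(0x5, 3);  // then a 3-bit code
    bits.close();         // flush the final partial block

    fast_bitset::reader r(bits);
    while(!r.eof())
    {
        bool bit = r.next_bit();
        // consume bit...
    }
}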

What is the best way of comparing a string variable to a set of string constants?

An if statement looks too awkward, because I need the ability to increase the number of constants.
Sorry for misleading you with the word "constant"; it isn't quite what I meant.
Add all your constants to a std::set; then you can check whether the set contains your string:
std::set<std::string> myLookup;
//populate the set with your strings here

std::set<std::string>::size_type i;
i = myLookup.count(searchTerm);
if( i )
    std::cout << "Found";
else
    std::cout << "Not found";
Depends whether you care about performance.
If not, then the simplest code is probably to put the various strings in an array (or vector if you mean you want to increase the number of constants at run time). This will also be pretty fast for a small number of strings:
static const char *const strings[] = { "fee", "fie", "fo", "fum" };
static const int num_strings = sizeof(strings) / sizeof(char*);
Then either:
int main() {
    const char *search = "foe";
    bool match = false;
    for (int i = 0; i < num_strings; ++i) {
        if (std::strcmp(search, strings[i]) == 0) match = true;
    }
}
Or:
struct stringequal {
    const char *const lhs;
    stringequal(const char *l) : lhs(l) {}
    bool operator()(const char *rhs) {
        return std::strcmp(lhs, rhs) == 0;
    }
};

int main() {
    const char *search = "foe";
    std::find_if(strings, strings+num_strings, stringequal(search));
}
[Warning: I haven't tested the above code, and I've got the signatures wrong several times already...]
If you do care about performance, and there are a reasonable number of strings, then one quick option would be something like a trie. But that's a lot of effort, since there isn't one in the standard C++ library. You can get much of the benefit either by using a sorted array/vector searched with std::binary_search:
// These strings MUST be in ASCII-alphabetical order. Don't add "foo" to the end!
static const char *const strings[] = { "fee", "fie", "fo", "fum" };
static const int num_strings = sizeof(strings) / sizeof(char*);

bool stringcompare(const char *lhs, const char *rhs) {
    return std::strcmp(lhs, rhs) < 0;
}

std::binary_search(strings, strings+num_strings, "foe", stringcompare);
... or by using a std::set. But unless you're changing the set of strings at runtime, there is no advantage to using a set over a sorted array with binary search, and a set (or vector) has to be filled in with code, whereas an array can be statically initialized. I think C++0x will improve things, with initializer lists for collections.
Put the strings to be compared in a static vector or set and then use the std::find algorithm.
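A minimal sketch of that suggestion (the container here would be filled elsewhere at startup):
#include <algorithm>
#include <string>
#include <vector>

static std::vector<std::string> constants; // populated once at startup

bool isConstant(const std::string& s) {
    // Linear scan; fine for a small, growable list of strings.
    return std::find(constants.begin(), constants.end(), s) != constants.end();
}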
The technically best solution is to build a 'perfect hash function' tailored to your set of string constants, so that there are no collisions during hashing (a generator such as GNU gperf can build one for you).
const char * values[]= { "foo", "bar", ..., 0 };

bool IsValue( const std::string & s ) {
    int i = 0;
    while( values[i] ) {
        if ( s == values[i] ) {
            return true;
        }
        i++;
    }
    return false;
}
Or use a std::set.

C++ Qt: bitwise operations

I'm working on a little project for college. I need to model transmission over a network, and to implement and visualize different sorts of error correction algorithms. My improvised packet consists of one quint8: I need to convert it into a bit array, like QBitArray, append a check bit to it, transfer it over UDP, check the success of the transmission with the check bit, and then construct the quint8 out of it.
Once again, it's not a practical but an educational task, so don't suggest I use real algorithms like CRC...
So my question is: how do I convert any data type (in this case quint8) into a QBitArray? I mean, any data in a computer is a bit array; the question is how to access it.
Thanks, Dmitri.
Let's see if we can get it correct:
template < class T >
static QBitArray toQBit ( const T &obj ) {
    int const bitsInByte = 8;
    int const bytesInObject = sizeof(T);

    // View the object's storage as raw bytes.
    const quint8 *data = reinterpret_cast<const quint8*>(&obj);
    QBitArray result(bytesInObject*bitsInByte);
    for ( int byte = 0; byte < bytesInObject; ++byte ) {
        for ( int bit = 0; bit < bitsInByte; ++bit ) {
            result.setBit( byte*bitsInByte + bit, data[byte] & (1<<bit) );
        }
    }
    return result;
}

void Foo () {
    Bar b;
    QBitArray qb = toQBit( b );
}
qint8 is actually a signed char, so you can treat your object as a char array.
template < class T >
QBitArray toQBit ( T &obj ) {
    int len = sizeof(obj) * 8;
    qint8 *data = (qint8*)(&obj);
    QBitArray result( len ); // size the array up front
    for ( size_t i = 0; i < sizeof(obj); ++i ) { // sizeof(obj), not sizeof(data)!
        for ( int j = 0; j < 8; ++j ) {
            result.setBit( i*8 + j, data[i] & (1<<j) );
        }
    }
    return result;
}

void Foo () {
    Bar b;
    QBitArray qb = toQBit( b );
}
I don't see any point in declaring a template function and then casting its argument to uint8.
A solution for types that can be promoted to unsigned long:
#include <bitset>

template <typename T>
QBitArray toQBit(T val) {
    std::bitset<sizeof(T) * 8> bs(val);
    QBitArray result(int(bs.size()));
    for (size_t ii = 0; ii < bs.size(); ++ii) {
        result.setBit(int(ii), bs.test(ii));
    }
    return result;
}
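For the asker's concrete case (one quint8 plus a check bit), a minimal round-trip sketch built on the toQBit above; the fromBits helper and the even-parity scheme are illustrative, not Qt API:
// Reconstruct a quint8 from the first 8 bits (illustrative helper).
quint8 fromBits(const QBitArray &bits) {
    quint8 value = 0;
    for (int bit = 0; bit < 8; ++bit)
        if (bits.testBit(bit))
            value |= quint8(1 << bit);
    return value;
}

void demo() {
    quint8 payload = 0xA5;
    QBitArray packet = toQBit(payload);
    packet.resize(9);
    packet.setBit(8, (packet.count(true) % 2) != 0); // even-parity bit over the data
    // ... transfer over UDP ...
    bool ok = (packet.count(true) % 2) == 0;         // parity check on receipt
    quint8 decoded = fromBits(packet);
    Q_UNUSED(ok); Q_UNUSED(decoded);
}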
There is no way to generically convert any data type to a bit array. Especially if your data type contains pointers, you probably want to transfer the pointee, not the pointer, so any complex type should be treated separately. Also be aware of different endianness (little-endian and big-endian) on different architectures. I think std::bitset is safe with respect to this problem, but, for example, casting a pointer to a struct to a char array and storing its bits may not be.