What data structure should I use for BigInt class - c++

I would like to implement a BigInt class which will be able to handle really big numbers. I only want to add and multiply numbers; however, the class should also handle negative numbers.
I wanted to represent the number as a string, but there is a big overhead in converting between string and int for every addition. I want to implement addition the way it is done in high school: add the corresponding digits, and if the result is 10 or more, carry over into the next digit.
Then I thought that it would be better to handle it as an array of unsigned long long int and keep the sign in a separate bool. What I'm afraid of here is the size of the int, as the C++ standard, as far as I know, only guarantees that int < float < double. Correct me if I'm wrong. So when I reach some limit I should move forward in the array and start adding to the next array position.
Is there any data structure that is appropriate or better for this?

So, you want a dynamic array of integers of a well-known size?
Sounds like vector<uint32_t> should work for you.
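For illustration, here is a minimal sketch of that storage, with the sign kept in a separate bool; the class and member names are placeholders of mine, and the limbs are stored least significant first:
#include <cstdint>
#include <vector>
struct BigInt {
    bool negative = false;             // sign kept separately from the magnitude
    std::vector<uint32_t> limbs;       // limbs[0] is the least significant "digit" in base 2^32
};
Addition and multiplication then only operate on the limbs vector, and the sign of the result is decided from the signs of the two operands.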

As you already found out, you will need to use types with a known, fixed size -- either platform-specific ones or, with C++11, the fixed-width types the language provides. A common big-number implementation uses 32-bit integers and ensures that only the lower 16 bits are set. This enables you to operate on the digits (where a digit is in [0..2^16)) and then normalize the result by applying the carry-overs.
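To make that concrete, here is a hedged sketch of schoolbook addition over base-2^16 digits stored in uint32_t; the function name and the least-significant-digit-first order are my assumptions:
#include <cstddef>
#include <cstdint>
#include <vector>
// a and b hold base-2^16 digits, least significant first; every element is < 2^16.
std::vector<uint32_t> add(const std::vector<uint32_t>& a, const std::vector<uint32_t>& b) {
    std::vector<uint32_t> sum;
    uint32_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size(); ++i) {
        uint32_t d = carry;
        if (i < a.size()) d += a[i];
        if (i < b.size()) d += b[i];
        sum.push_back(d & 0xFFFF);     // keep the low 16 bits as the digit
        carry = d >> 16;               // the carry-over goes to the next digit
    }
    if (carry != 0) sum.push_back(carry);
    return sum;
}
Because every digit is below 2^16, the per-digit sum plus carry always fits comfortably in 32 bits, so the intermediate result can never overflow.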

On a modern, 64-bit x86 platform, the best approach is probably to store your bigint as a dynamically allocated array of unsigned 32-bit integers, so that the intermediate results of your arithmetic fit in 64 bits. You can handle the sign separately, as a member variable of the class, or you can use 2's-complement arithmetic (which is how signed ints are typically represented).
The standard C <stdint.h> header defines uint32_t and uint64_t, so you can avoid platform-dependent integer types. Or, if your platform doesn't provide these, you can improvise and define this kind of thing yourself -- preferably in a separate "platform_dependent.h" file...
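To show how the 64-bit intermediate comes into play, here is a rough sketch of multiplying a number stored as 32-bit limbs by a single 32-bit factor; the function name and the least-significant-limb-first order are assumptions on my part:
#include <cstddef>
#include <cstdint>
#include <vector>
void mul_by_limb(std::vector<uint32_t>& limbs, uint32_t factor) {
    uint64_t carry = 0;
    for (std::size_t i = 0; i < limbs.size(); ++i) {
        // 32-bit * 32-bit plus a 32-bit carry always fits in 64 bits.
        uint64_t p = static_cast<uint64_t>(limbs[i]) * factor + carry;
        limbs[i] = static_cast<uint32_t>(p);   // low 32 bits stay in this limb
        carry = p >> 32;                       // high 32 bits carry into the next limb
    }
    if (carry != 0) limbs.push_back(static_cast<uint32_t>(carry));
}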

Related

Parsing long strings of binary data with C++

I am looking for ideas on how to parse long binary data, for example: "10100011111000111001"
bits 0-4 are the id
bits 5-15 are the data
etc. etc...
The binary data structure can change, so I need to build a kind of database that will store how to parse each string.
Illustration (it could be ~200 bits):
Any ideas on how to implement it?
Thanks
Edit
What am I missing here?
struct Bitfield {
    uint16_t a : 10, b : 6;
};

void diag() {
    uint16_t t = 61455;
    struct Bitfield test = {t};
    cout << "a: " << test.a << endl;
    cout << "b: " << test.b << endl;
    return;
}
and the output is:
a: 15
b: 0
Options available
To manage a large structured set of bits, you have the following options:
C++ bit-fields: you define a structure with bit-field members. You can have as many members as you want, provided that no single one has more bits than an unsigned long long.
It's super easy to use; the compiler manages the access to bits or groups of bits for you. The major inconvenience is that the bit layout is implementation-dependent, so this is not an option for writing portable code that exchanges data in a binary format.
Container of unsigned integral type: you define an array large enough to hold all the bits, and access bits or groups of bits using a combination of bitwise operations.
It requires being at ease with bitwise operations and is not practical when groups of bits are split across consecutive elements. For exchanging data in binary format with the outside world in a portable way, you'd need to either take care of the differences between big-endian and little-endian architectures or use arrays of uint8_t (see the sketch after this list).
std::vector<bool>: gives you total flexibility to manage your bits. The main constraint is that you need to address each bit separately. Moreover, there's no data() member that could give direct access to the binary data.
std::bitset: is very similar to vector<bool> for accessing bits. It has a fixed size at compile time, but offers useful features such as reading and writing in ASCII from strings or streams, converting from binary values of integral types, and bitwise operations on the full bitset.
A combination of these techniques
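To give a feel for the second option, here is a hedged sketch that extracts the id (bits 0-4) and data (bits 5-15) fields of the question from a buffer of uint8_t; the helper name and the bit-numbering convention (bit 0 = least significant bit of the first byte) are my assumptions:
#include <cstdint>
// Extract `count` bits starting at bit position `pos` from a byte buffer.
uint32_t extract_bits(const uint8_t* buf, unsigned pos, unsigned count) {
    uint32_t value = 0;
    for (unsigned i = 0; i < count; ++i) {
        unsigned bit = pos + i;
        uint32_t b = (buf[bit / 8] >> (bit % 8)) & 1u;   // fetch one bit from the buffer
        value |= b << i;                                  // place it in the result
    }
    return value;
}
// Usage against the layout in the question:
//   uint32_t id   = extract_bits(message, 0, 5);    // bits 0-4
//   uint32_t data = extract_bits(message, 5, 11);   // bits 5-15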
Make your choice
To communicate with the outside world in a portable way, the easiest approach is to use bitsets. Bitsets offer easy input/output/string conversion in a format using ASCII '0' or '1' (or any substitutes thereof):
bitset<msg_header_size> bh, bh2;
bitset<msg_body_size> bb, bb2;
cin >> bh >> bb;                          // reads a string of ASCII '0' and '1'
cout << bh << "-" << bb << endl << endl;  // writes a string of ASCII '0' and '1'
You can also convert from/to binary data (but only from/to a single integral value, large enough for the bitset size):
bitset<8> b(static_cast<uint8_t>(c));
cout << b << endl;
cout << b.to_ulong() << endl;
For reading/writing large sets, you'd need to read small bitsets and use bitwise operators to aggregate them into a larger bitset. If this seems time-consuming, it's in fact very close to what you'd do with containers of integrals, but without having to care about byte boundaries.
In your case, with a fixed-size header and a maximum size, the bitset seems to be a good choice for exchanging binary data with the external world (be careful, however, because the variable part is right-justified).
For working with the data content, it's easy to access a specific bit, but you have to use some bitwise operations (shift, AND) to access groups of bits. Moreover, if you want readable and maintainable code, it's better to abstract the bit layout.
Conclusion:
I would therefore strongly advise using a bit-field structure internally for working with the data (keeping a memory footprint comparable to that of the original data) and, at the same time, using bitsets just to convert from/to this structure for the purpose of external data exchanges.
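A rough sketch of that combination, assuming a 16-bit header with a 5-bit id and an 11-bit data field (the sizes, names, and the memcpy-based conversion are mine, and the bit-field layout remains implementation-dependent, so this is only suitable as an internal representation):
#include <bitset>
#include <cstdint>
#include <cstring>
#include <iostream>
struct Header {                 // internal working representation
    uint16_t id   : 5;
    uint16_t data : 11;
};
static_assert(sizeof(Header) == sizeof(uint16_t), "unexpected bit-field packing");
int main() {
    std::bitset<16> raw("1010001111100011");              // external representation: ASCII '0'/'1'
    uint16_t bits = static_cast<uint16_t>(raw.to_ulong());
    Header h;
    std::memcpy(&h, &bits, sizeof bits);                  // reinterpret the 16 bits as the struct
    std::cout << "id: " << h.id << ", data: " << h.data << "\n";
}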
The "best way" depends on the details of the problem.
If the whole number fits into the largest integer type available (usually long long), convert the string into an integer first (for example with the stoi/stol/stoll functions, passing 2 as the base, assuming C++11 is available). Then use bit-shifting combined with bitwise AND (&) to extract the sections of the value you are interested in.
If the whole number does not fit into the largest integer type available, chop it up as a string (using the substr function) and then convert the substrings into integers one by one.
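A small sketch of the first case, using the 20-bit example string from the question; I take bit 0 to be the leftmost character, which is an assumption:
#include <cstdint>
#include <iostream>
#include <string>
int main() {
    std::string s = "10100011111000111001";
    unsigned long long v = std::stoull(s, nullptr, 2);                      // parse the whole string as base 2
    unsigned id   = static_cast<unsigned>((v >> (s.size() - 5))  & 0x1F);   // bits 0-4  (5 bits)
    unsigned data = static_cast<unsigned>((v >> (s.size() - 16)) & 0x7FF);  // bits 5-15 (11 bits)
    std::cout << "id: " << id << ", data: " << data << "\n";
}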

Is using `int` preferable to using `unsigned int`? [duplicate]

Should one ever declare a variable as an unsigned int if they don't require the extra range of values? For example, when declaring the variable in a for loop, if you know it's not going to be negative, does it matter? Is one faster than the other? Is it bad to declare an unsigned int just as unsigned in C++?
To reiterate, should it be done even if the extra range is not required? I heard they should be avoided because they cause confusion (IIRC that's why Java doesn't have them).
The reason to use uints is that it gives the compiler a wider variety of optimizations. For example, it may replace an instance of 'abs(x)' with 'x' if it knows that x is positive. It also opens up a variety of bitwise 'strength reductions' that only work for positive numbers. If you always multiply or divide an int by a power of two, then the compiler may replace the operation with a bit shift (i.e. x*8 == x<<3), which tends to perform much faster. Unfortunately, for division this only holds straightforwardly when 'x' is known to be non-negative, because signed division truncates toward zero, which a plain right shift does not. With ints, the compiler may apply this trick only if it can prove that the value is always positive (or can be modified earlier in the code to be so). In the case of uints, this attribute is trivial to prove, which greatly increases the odds of it being applied.
Another example might be the equation y = 16 * x + 12. If x can be negative, then a multiply and an add would be required. Yet if x is always positive, then not only can the x*16 term be replaced with x<<4, but since that term always ends in four zero bits, the '+ 12' can be replaced with a bitwise OR (as long as the '12' term is less than 16). The result would be y = (x<<4) | 12.
In general, the 'unsigned' qualifier gives the compiler more information about the variable, which in turn allows it to squeeze in more optimizations.
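As a small illustration of the division case (a sketch; the helper names are mine and the exact code generated is of course compiler- and target-dependent):
#include <cstdint>
// Usually compiles to a single logical right shift (x >> 3).
uint32_t div8_unsigned(uint32_t x) { return x / 8; }
// Must round toward zero for negative x, so compilers typically emit a shift
// plus a small sign-dependent adjustment rather than a bare arithmetic shift.
int32_t div8_signed(int32_t x) { return x / 8; }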
You should use unsigned integers when it doesn't make sense for them to have negative values. This is completely independent of the range issue. So yes, you should use unsigned integer types even if the extra range is not required, and no, you shouldn't use unsigned ints (or anything else) if not necessary, but you need to revise your definition of what is necessary.
More often than not, you should use unsigned integers.
Their behavior on overflow is well defined: they wrap around, whereas signed overflow is undefined behavior.
This is a huge subject of its own, so I won't say much more about it.
It's a very good reason to avoid signed integers unless you actually need signed values.
Also, they are easier to work with when range-checking -- you don't have to check for negative values.
Typical rules of thumb:
If you are writing a forward for loop with an index as the control variable, you almost always want unsigned integers. In fact, you almost always want size_t.
If you're writing a reverse for loop with an index as the control variable, you should probably use signed integers, for obvious reasons. Probably ptrdiff_t would do.
The one thing to be careful with is when casting between signed and unsigned values of different sizes.
You probably want to double-check (or triple-check) to make sure the cast is working the way you expect.
int is the general purpose integer type. If you need an integer, and int meets your requirements (range [-32767,32767]), then use it.
If you have more specialized purposes, then you can choose something else. If you need an index into an array, then use size_t. If you need an index into a vector, then use std::vector<T>::size_type. If you need specific sizes, then pick something from <cstdint>. If you need something larger than 64 bits, then find a library like gmp.
I can't think of any good reasons to use unsigned int. At least, not directly (size_t and some of the specifically sized types from <cstdint> may be typedefs of unsigned int).
The problem with the systematic use of unsigned when values can't be negative isn't that Java doesn't have unsigned; it is that expressions with unsigned values, especially when mixed with signed ones, sometimes give confusing results if you think about unsigned as an integer type with a shifted range. Unsigned is a modular type, not a restriction of integers to positive or zero.
Thus the traditional view is that unsigned should be used when you need a modular type or for bitwise manipulation. That view is implicit in K&R — look how int and unsigned are used —, and more explicit in TC++PL (2nd edition, p. 50):
The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules.
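A tiny example of that "modular type" behavior (the variable names are mine):
#include <cstddef>
#include <iostream>
int main() {
    std::size_t a = 2, b = 3;
    std::cout << a - b << "\n";   // not -1: the subtraction wraps around to a huge value (SIZE_MAX)
    // (a - b < 0) is always false for unsigned operands, so such range checks silently fail
}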
On almost all architectures the cost of a signed operation and an unsigned operation is the same, so efficiency-wise you won't get any advantage from using unsigned over signed. But as you pointed out, if you use unsigned you will have a bigger range.
Even if you have variables that should only take non-negative values, unsigned can be a problem. Here is an example. Suppose a programmer is asked to write code to print all pairs of integers (a, b) with 0 <= a < b <= n, where n is a given input. An incorrect attempt is:
for (unsigned b = 0; b <= n; b++)
    for (unsigned a = 0; a <= b - 1; a++)
        cout << a << ',' << b << '\n';
This is easy to correct (when b is 0, b - 1 wraps around to UINT_MAX, so the inner loop effectively never terminates), but thinking with unsigned is a bit less natural than thinking with int.
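For comparison, one possible corrected version, still using unsigned but with the bounds rearranged so nothing is ever subtracted below zero:
#include <iostream>
int main() {
    unsigned n = 4;                               // example input
    for (unsigned b = 1; b <= n; b++)             // b starts at 1, so "b - 1" is never needed
        for (unsigned a = 0; a < b; a++)          // "a < b" expresses the bound without subtraction
            std::cout << a << ',' << b << '\n';
}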

Is there a way to increase the size of an int in C++ without using long?

If the range of int only goes up to 32767, but I have to input a value of around 50000 and use it, I want to input it without using long and, if possible, without using typecasting either. Is there any way to do it? I want the datatype to remain int only.
No built-in type can be altered or expanded in any sense. You have to switch to a different type.
The type int has the following requirements:
represents at least the range -32767 to 32767 (16 bits)
is at least as large as short (sizeof(short) <= sizeof(int))
This means that, strictly speaking (although most platforms use at least 32 bits for int), you can't safely store the value 50000 in an int.
If you need a guaranteed range, use int16_t, int32_t or int64_t. They are defined in the header <cstdint>. There is no arbitrary precision integer type in the language or in the standard library.
If you only need to observe the range of valid integers, use the header <limits>:
std::cout << std::numeric_limits<int>::min() << " to " << std::numeric_limits<int>::max() << "\n";
You may try unsigned int. It is the same size as int but with a non-negative range (if you really don't want to use long).
See this for the range of the data types.
Suggestion:
You might as well consider switching your compiler. From the range you've mentioned for int, it seems you are using a 16-bit compiler (probably Turbo C). A 16-bit compiler restricts the unsigned int range to 0 to 65,535 (2^16 - 1) and signed int to -32,768 to 32,767.
No!
An int depends on the native machine word, which really means it depends on three things: the processor, the OS, and the compiler.
The only ways you can "increase" an int foo; (not a long foo; an int is not a long) are:
You are compiling with Turbo-C or a legacy 16-bit DOS compiler on a modern computer, likely because your university requires you to use that, because that's what your professor knows. Switch the compiler. If your professor insists you use it, switch the university.
You are compiling with a 32-bit compiler on a 64-bit OS. Switch the compiler.
You have 32-bit OS on a 64-bit computer. Reinstall a 64-bit OS.
You have a 32-bit processor. Buy a new computer.
You have a 16-bit processor. Really, buy a new computer.
Several possibilities come to mind.
@abcthomas had the idea to use unsigned; since you are restricted to int, you may abuse int as unsigned. That will probably work, although it is UB according to the standard (cf. Integer overflow in C: standards and compilers).
Use two ints. This probably involves writing your own scanf and printf versions, but that shouldn't be too hard. Strictly speaking, though, you still haven't expanded the range of an int.
[Use long long] Not possible since you must use int.
You can always use some big number library. Probably not allowed either.
Keep the numbers in strings and do arithmetic digit-wise on the strings. Doesn't use int though.
But you'll never ever be able to store something > INT_MAX in an int.
Try splitting up your value (that would fit inside a 64-bit int) into two 32-bit chunks of data, then use two 32-bit ints to store it. A while ago, I wrote some code that helped me split 16-bit values into 8-bit ones. If you alter this code a bit, then you can split your 64-bit values into two 32-bit values each.
#include <stdint.h>

#define BYTE_T uint8_t
#define TWOBYTE_T uint16_t
#define LOWBYTE(x) ((BYTE_T)(x))
#define HIGHBYTE(x) ((BYTE_T)((TWOBYTE_T)(x) >> 0x8))
#define BYTE_COMBINE(h, l) ((TWOBYTE_T)(((TWOBYTE_T)(h) << 0x8) + (BYTE_T)(l)))
I don't know if this is helpful or not, since it doesn't actually answer your original question, but at least you could store your values this way even if your platform only supports 32-bit ints.
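Adapted to the 64-bit case the answer describes, a hedged sketch (the macro names are mine):
#include <stdint.h>

#define LOW32(x)   ((uint32_t)((uint64_t)(x) & 0xFFFFFFFFu))
#define HIGH32(x)  ((uint32_t)((uint64_t)(x) >> 32))
#define COMBINE64(hi, lo) (((uint64_t)(hi) << 32) | (uint64_t)(lo))

/* Example:
   uint64_t v  = 50000;                     // anything up to 2^64 - 1
   uint32_t lo = LOW32(v), hi = HIGH32(v);  // store the two halves separately
   uint64_t back = COMBINE64(hi, lo);       // back == v
*/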
Here is an idea to actually store values larger than INT_MAX in an int. It is based on the condition that there is only a small, known number of possible values.
You could write a compression method which computes something akin to a 2-byte hash. The hashes would have to have a bijective (1:1) relation to the known set of possible values. That way you would actually store the value (in compressed form) in the int, and not in a string as before, and thus expand the range of possible values at the cost of not being able to represent every value within that range.
The hashing algorithm would depend on the set of possible values. As a simple example let's assume that the possible values are 2^0, 2^1, 2^2... 2^32767. The obvious hash algorithm is to store the exponent in the int. A stored value of 4 would represent the value 16, 5 would represent 32, 1000 would represent a number close to 10^301 etc. One can see that one can "store" extraordinarily large numbers in a 16 bit int ;-). Less regular sets would require more complicated algorithms, of course.

Type that can hold the product of two size_t

I have two size_t integers and need to take their product. In what type should I store the result?
#include <limits>
#include <vector>
#include <iostream>
int main() {
    typedef std::size_t size_t;
    typedef unsigned long long product_t;
    std::vector<double> a(100000);
    std::vector<double> b(100000);
    size_t na {a.size()};
    size_t nb {b.size()};
    product_t prod = na * nb;
    std::cout << prod << std::endl;
}
It looks like gcc defines size_t as an unsigned long long so I am not guaranteed I will be able to store the product... any alternatives?
Edit:
The point here is that I am developing a library that needs to handle vectors of arbitrary size and compute some statistic on them:
double stat = computeStatisticOnVectors(a, b);
and then compute the following:
double result = stat / prod
It really depends on what you are trying to achieve with your code.
If you are later on going to use the value as a size_t (in other words, for sizing a vector, allocating memory, or some such), then you probably should do some checks that it's not overflowing, but store the value as a size_t. You won't be able to use a bigger type anyway, if the purpose is to create a new object based on the size.
If you are doing something like "calculating the number of possible combinations from these X vectors", then using a floating point type will probably be "good enough".
Have you considered not restricting yourself to a primitive type? If it's important to your application that such huge size_type values are handled, why not create a custom type which holds both original values?
Up to 128 bits, and assuming you don't need much portability, you may just use compiler-provided types such as unsigned __int128 (supported at least by gcc and clang on x86_64 platforms).
If you wish for more portability than this, then 128-bit integers are not standard, so you will need to:
Define your own; a pair of 64-bit integers with overloaded operators would work
Use an existing library, such as GMP (LGPL though, but much more generic)
From Marc Glisse: Boost.Multiprecision (without the license issue)
Of course, if you could simply eliminate this requirement it would be easier; this product you are computing does not seem to mean much in itself, so just doing stat / na / nb might well be enough.
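Going back to the first option, a minimal sketch (unsigned __int128 is a gcc/clang extension, not standard C++, and iostreams cannot print it directly, hence the narrowing cast for display):
#include <cstddef>
#include <cstdint>
#include <iostream>
int main() {
    std::size_t na = 100000, nb = 100000;
    unsigned __int128 prod = static_cast<unsigned __int128>(na) * nb;   // full product, no overflow
    std::cout << static_cast<std::uint64_t>(prod) << "\n";              // 10^10 fits easily in 64 bits
}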
In your example you're multiplying 100000 and 100000 (rather untypically large values for sizes), where apparently you would want to obtain 10^10 exactly as a result.
As a rough calculation based on 2^10 ~= 10^3, divide the 10's exponent by 3 and multiply by 10 to get the number of bits. Now 10*10/3 is roughly 33, which means you need more than 32 bits, which means 64 bits. Thus, use a 64-bit type.
In addition to being 64-bit, the type should be signed, because it's a good idea to use signed types for numbers (otherwise, due to implicit conversions, you risk inadvertently using modular arithmetic, with hard-to-track-down bugs). So the built-in type that you're looking for is a signed 64-bit integer: long long.
Or you can use one of the type aliases from <stdint.h>.
That said, why on Earth are you multiplying large sizes, and why do you need the result as an exact integer?
Your statistics will be computed on vectors whose sizes will not exceed the capacity of size_t.
I think detecting overflow is enough in your case.
One way is to convert each size to double and then compare the two products (the size_t-based product vs. the double product).
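A hedged sketch of that check; the function name and the tolerance are mine (a double only carries about 53 bits of precision, but an overflowed 64-bit product differs from the double estimate by at least 2^64, which dwarfs any rounding error):
#include <cmath>
#include <cstddef>
// Returns true if na * nb does not fit in size_t, using the double-based comparison.
bool product_overflowed(std::size_t na, std::size_t nb) {
    std::size_t wrapped = na * nb;                                    // modular (wrapped) result on overflow
    double estimate = static_cast<double>(na) * static_cast<double>(nb);
    return std::fabs(estimate - static_cast<double>(wrapped)) > 1e18; // far larger than any rounding error
}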
