Hello everybody out there! I have a homework assignment where I need to build a high-precision calculator that operates on very large numbers. The whole point of the assignment is that storing the values in arrays with one digit per array cell is not allowed.
That is, a memory representation of the number
335897294593872
like so
int number[] = {3, 3, 5, 8, 9, 7, 2, 9, 4, 5, 9, 3, 8, 7, 2};
is not legit,
nor
char number[] = {3, 3, 5, 8, 9, 7, 2, 9, 4, 5, 9, 3, 8, 7, 2};
nor
std::string number("335897294593872");
What I want to do is split the whole number into 32-bit chunks and store each individual chunk in a separate array cell whose data type is uint32_t.
Since I get the input from the keyboard, I store all values in a std::string initially and later put them into integer arrays to perform operations.
How do I put the binary representation of a large number into an integer array, filling in all the bits properly?
Thank you in advance.
EDIT: Using standard C++ libraries only
EDIT2: I want to be able to add, subtract, multiply, and divide these arrays of large numbers, so I don't mean to merely cut the string up and store a decimal representation in an integer array, but rather to preserve the bit order of the number itself so I can calculate carries.
This is a rather naïve solution:
If the last digit in the string is odd, store a 1 in the result (otherwise leave it 0).
Divide the digits in the string by 2 (considering carries).
If 32 bits have been written, add another element to the result vector.
Repeat this until the string contains only 0s.
Source Code:
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cstdint> // for uint32_t

std::vector<uint32_t> toBigInt(std::string text)
{
    // convert string to BCD-like
    for (char &c : text) c -= '0';
    // build result vector
    std::vector<uint32_t> value(1, 0);
    uint32_t bit = 1;
    for (;;) {
        // set next bit if last digit is odd
        if (text.back() & 1) value.back() |= bit;
        // divide BCD-like by 2
        bool notNull = false; int carry = 0;
        for (char &c : text) {
            const int carryNew = c & 1;
            c /= 2; c += carry * 5;
            carry = carryNew;
            notNull |= c;
        }
        if (!notNull) break;
        // shift bit
        bit <<= 1;
        if (!bit) {
            value.push_back(0); bit = 1;
        }
    }
    // done
    return value;
}
std::ostream& operator<<(std::ostream &out, const std::vector<uint32_t> &value)
{
    std::ios fmtOld(nullptr); fmtOld.copyfmt(out);
    for (size_t i = value.size(); i--;) {
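        // setw()/setfill() affect only the next insertion, so every element
        // after the most significant one is zero-padded to 8 hex digits.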
        out << std::hex << value[i] << std::setfill('0') << std::setw(sizeof (uint32_t) * 2);
    }
    out.copyfmt(fmtOld);
    return out;
}
int main()
{
    std::string tests[] = {
        "0", "1",
        "4294967295", // 0xffffffff
        "4294967296", // 0x100000000
        "18446744073709551615", // 0xffffffffffffffff
        "18446744073709551616", // 0x10000000000000000
    };
    for (const std::string &test : tests) {
        std::cout << test << ": " << toBigInt(test) << '\n';
    }
    return 0;
}
Output:
0: 0
1: 1
4294967295: ffffffff
4294967296: 100000000
18446744073709551615: ffffffffffffffff
18446744073709551616: 10000000000000000
Notes:
The vector returned by toBigInt() is little-endian: the least significant element comes first. (The stream output prints the most significant chunk first.)
For the tests, I used numbers whose hex representation is easy to check by eye.
Using an array to store the different parts of a big number is a common way to do this. Something else to think about is how different architectures represent signed ints; that forces you (as the usual big-integer libraries do) to make some sacrifices to handle signed/unsigned conversions between the parts of your number (you have several options here), and to decide how you are going to implement the different arithmetic operations.
I don't generally recommend the long long integer types for the array cells, as they are generally not the native word size of the architecture. To give the architecture a chance to do things efficiently, I would use a standard unsigned type reduced by at least one bit, so that you can see the carries out of one extended digit into the next (for example, GNU libgmp used 24-bit integers in each array cell, last time I checked). It's also common to make the cell size a multiple of the char size, so shifts and reallocation of numbers are easier than doing 31-bit displacements on a full array of bits.
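To make the carry handling concrete, here is a minimal sketch of my own (not from any particular library) that adds two little-endian arrays of 32-bit limbs; the 64-bit intermediate makes the carry out of each limb explicit:

#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint32_t> add(const std::vector<uint32_t> &a,
                          const std::vector<uint32_t> &b)
{
    std::vector<uint32_t> sum;
    uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size(); ++i) {
        uint64_t t = carry;                       // at most 1
        if (i < a.size()) t += a[i];
        if (i < b.size()) t += b[i];
        sum.push_back(static_cast<uint32_t>(t));  // keep the low 32 bits
        carry = t >> 32;                          // carry into the next limb
    }
    if (carry) sum.push_back(static_cast<uint32_t>(carry));
    return sum;
}

The same pattern (widen, keep the low limb, carry the high part) extends to subtraction with borrows and to multiplication.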
It's common, when you work with money or similarly delicate numbers, to use integers, because you can guarantee many things about them. So my recommendation is: whenever you work with these big numbers, simulate fixed-point or floating-point arithmetic with two ints, so you can "watch" how everything executes; you could also check the IEEE 754 standard for floating point.
If you store the number in an array, make sure the operations you perform while manipulating it take a constant number of steps, which can be tricky.
I recommend you trust the integers, but fix the number of bits.
But if you really want to try interesting stuff, play with the bitwise operators; you might get something interesting out of it.
You could check the details of the data types, in particular signed short int and long long int, and confirm their sizes, on a C++ reference.
I have an enum like below:
enum types : uint16_t
{
    A = 1 << 0,
    B = 1 << 1,
    C = 1 << 2,
    D = 1 << 3,
    E = 1 << 4,
    F = 1 << 5,
    G = 1 << 6
};
Assume I have the number:
uint16_t val = A | C | F;
How do I split val into an array? I know I can do this with a for loop:
for (int i = 0; i < 7; ++i){
    if (val & (1 << i)){
        // push_back(1 << i)
    }
}
but what if the enum has 1000 rows?
Is there any simpler and faster way to do this?
Use a std::bitset<N> to store bit flags like that. Instead of values like 1 << 5, have the enum be a normal incrementing enum. Then use the bitset like this:
std::bitset<1000> myBits;
myBits.set(A);
if (myBits[A]) {
    // do some bit flag logic.
}
but what if the enum has 1000 rows?
C++ doesn't have a primitive type with 1000 bits (even __int128 has only 128), so we will have to make our own.
struct Bits1000
{
    uint8_t bits[1000 / 8] = {};
};
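A hedged sketch of set/test helpers for that struct; the function names are mine, not a standard API:

#include <cstddef>
#include <cstdint>

// Using the Bits1000 struct defined above.
// Bit i lives in byte i / 8, at position i % 8 within that byte.
void setBit(Bits1000 &b, std::size_t i)
{
    b.bits[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
}

bool testBit(const Bits1000 &b, std::size_t i)
{
    return (b.bits[i / 8] >> (i % 8)) & 1u;
}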
Is there any simple and faster way to do this?
One way or the other, you will end up using a loop; what can be done are other optimizations to improve performance. The reason is that C++ does not have a way to index individual bits (bits[0] being the first bit, unless we're talking about std::bitset). Even a struct containing a bit-field gets packed to whole bytes. And since we have an if condition that checks whether a certain enum value is present, this process has to be done at runtime.
Optimizations
Since we are looking at the powers of two, we can always advance the index by multiplying it by two. This way we don't have to do the bit-shift operation in addition to the index increment, which might help reduce the total CPU instruction count.
/**
 * Helper to calculate the power.
 */
template<size_t Base, size_t Exponent>
struct Power
{
    static constexpr size_t Value = Base * Power<Base, Exponent - 1>::Value;
};

/**
 * Helper template specialization.
 */
template<size_t Base>
struct Power<Base, 1>
{
    static constexpr size_t Value = Base;
};

// ...

for (int i = 1; i < Power<2, sizeof(uint8_t) * 8>::Value; i *= 2)
{
    if (val & i)
    {
        // Do your work here.
    }
}
Another optimization would be to precompute the powers of two and store them in an array, then index that array to get the powers.

uint64_t powers[64] = { 1, 2, 4, 8, ... };

for (int i = 0; i < sizeof(uint8_t) * 8; i++)
{
    if (val & powers[i])
    {
        // Do your work here.
    }
}
The trick here is that with standard types you get away with almost constant time (O(8), O(16), O(32) and O(64) time complexities), so the performance hit is pretty low. Even with custom types, we will be using the absolute minimum that's available to us.
I should also note that if we're dealing with very large numbers (for example 1000 bits), a single iterating integer cannot cover them (nor can bit shifting, because the result is at most a uint64_t, depending on the build). In that case, make sure to iterate in chunks of the maximum available width (4 bytes on x86, 8 bytes on x64).
Bit scanning is a technique to do this sort of thing efficiently. For that you'll need bit-level operations; traditionally, that has involved compiler extensions (like GCC's builtins). Starting with C++20 you'll find what you need in the <bit> header.
Example using your types type:
#include <bit>
#include <cstdint>
#include <vector>

std::vector<types> parse_num(uint16_t num)
{
    std::vector<types> res;
    while (num) {
        auto bit = std::countr_zero(num);
        auto mask = 1 << bit;
        res.push_back(static_cast<types>(mask));
        num &= ~mask;
    }
    return res;
}
This is something you may want to reach for in the "the enum has 1000 rows" kind of case, especially if the data is sparse, rather than in the simple example from the question, where your loop is simpler and possibly more performant. As always, it depends.
For a (non-portable) pre-C++20 solution you can:
Replace std::countr_zero(num) with __builtin_ctz(num) for GCC/Clang.
Use _BitScanForward for MSVC.
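A hedged sketch of how those two could be wrapped behind one helper (the wrapper name is mine; callers must guard against a zero input):

#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Count trailing zero bits of a nonzero 16-bit value, pre-C++20.
inline int countr_zero16(std::uint16_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctz(v);            // undefined for v == 0
#elif defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, v);         // result unset if v == 0
    return static_cast<int>(index);
#else
    int n = 0;                          // portable fallback: linear scan
    while (!(v & 1u)) { v >>= 1; ++n; }
    return n;
#endif
}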
I have an array of bytes and the length of that array. The goal is to output a string containing that number represented as a base-10 number.
My array is little endian, meaning the first byte (arr[0]) is the least significant byte. This is an example:
#include <iostream>
using namespace std;

typedef unsigned char Byte;

int main(){
    int len = 5;
    Byte *arr = new Byte[5];
    int i = 0;
    arr[i++] = 0x12;
    arr[i++] = 0x34;
    arr[i++] = 0x56;
    arr[i++] = 0x78;
    arr[i++] = 0x9A;
    cout << hexToDec(arr, len) << endl;
}
The array consists of [0x12, 0x34, 0x56, 0x78, 0x9A]. The function hexToDec which I want to implement should return 663443878930, which is that number in decimal.
But the problem is that my machine is 32-bit, so it instead outputs 2018915346 (notice that this number comes from integer overflow). The problem is that I am using the naive way: iterating over the array, multiplying each byte by 256 raised to its position in the array, and adding the results to the sum. This of course yields integer overflow.
I also tried long long int, but at some point integer overflow occurs, of course.
The arrays I want to represent as decimal numbers can be very long (more than 1000 bytes), which definitely requires a cleverer algorithm than my naive one.
Question
What would be a good algorithm to achieve that? Another question I must ask: what is the optimal complexity of such an algorithm? Can it be done in linear complexity O(n), where n is the length of the array? I really cannot think of a good idea; implementation is not the problem, my lack of ideas is.
Advice or an idea of how to do it will be enough, but if it's easier to explain with code, feel free to write it in C++.
You both can and cannot achieve this in O(n); it all depends on the internal representation of your number.
For truly binary form (power of 2 base like 256)
this is not solvable in O(n). The hex print of such a number is O(n), however, and you can convert a HEX string to decimal and back like this:
How to convert a gi-normous integer (in string format) to hex format?
Creating the hex string does not require bignum math: you just print the array from MSW to LSW in HEX, which is O(n). The conversion to DEC is not.
To print a bigint in decimal you need to repeatedly mod/div it by 10, obtaining digits from LSD to MSD until the subresult is zero, then print them in reverse order. The division and modulus can be done at once, as they are the same operation. So if your number has N decimal digits, you need N bigint divisions. Each bigint division can be done, for example, by binary division, so we need log2(n) bit shifts and subtractions, which are all O(n); the complexity of naive bigint printing is therefore:
O(N·n·log2(n))
We can compute N from n by logarithms, so for BYTEs:

N = log10(base^n)
  = log10(2^(8·n))
  = log2(2^(8·n)) / log2(10)
  = 8·n / log2(10)
  = 8·n · 0.30103
  = 2.40824·n

So the complexity will be:

O(2.40824·n·n·log2(n)) = O(n²·log2(n))
which is insane for really big numbers.
Power of 10 base binary form
To do this in O(n) you need to slightly change the base of your number. It will still be represented in binary form, but the base will be a power of 10.
For example, if your number is represented by 16-bit WORDs, you can use base 10000, the highest power of 10 that still fits (the 16-bit maximum is 65535). Now you print in decimal easily: just print each word in your array consecutively, from MSW to LSW.
Example:
Let's have the big number 1234567890 stored as BYTEs with base 100, where the MSW goes first. The number will be stored as follows:
BYTE x[] = { 12, 34, 56, 78, 90 }
But as you can see, with BYTEs and base 100 we waste space, as only 100/256 ≈ 39% of the full BYTE range is used. Operations on such numbers are slightly different than in raw binary form, as we need to handle overflow/underflow and the carry flag differently.
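A small sketch of that printing rule, assuming base-10000 limbs stored most significant first in a non-empty vector (the function name is mine):

#include <cstddef>
#include <iomanip>
#include <iostream>
#include <vector>

// Print base-10000 limbs; every limb after the first must be
// zero-padded to 4 decimal digits.
void printBase10000(const std::vector<unsigned> &limbs)
{
    std::cout << limbs.front();
    for (std::size_t i = 1; i < limbs.size(); ++i)
        std::cout << std::setw(4) << std::setfill('0') << limbs[i];
    std::cout << '\n';
}

int main()
{
    printBase10000({12, 3456, 7890});   // prints 1234567890
}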
BCD (binary coded decimal)
There is also another option: BCD (binary-coded decimal). It is almost the same as the previous option, but base 10 is used for each single digit of the number, so each nibble (4 bits) contains exactly one digit. Processors usually have an instruction set for this number representation: after each arithmetic operation a BCD recovery instruction such as DAA is called, which uses the Carry and Auxiliary Carry flag states to recover the BCD encoding of the result. To print a BCD value in decimal, you just print the value as HEX. The number from the previous example would be encoded in BCD like this:
BYTE x[] = { 0x12, 0x34, 0x56, 0x78, 0x90 }
Of course, both #2 and #3 make HEX printing of your number impossible in O(n).
The number you posted, 0x9a78563412, represented in little-endian format as you have it, can be converted to a proper uint64_t with the following code:
#include <iostream>
#include <stdint.h>

int main()
{
    uint64_t my_number = 0;
    const int base = 0x100; /* base 256 */
    uint8_t array[] = { 0x12, 0x34, 0x56, 0x78, 0x9a };
    /* go from right to left, as it is little endian */
    for (int i = sizeof array; i > 0;) {
        my_number *= base;
        my_number += array[--i];
    }
    std::cout << my_number << std::endl; /* conversion uses 10 base by default */
}
A sample run gives:
$ num
663443878930
As we are in a base that is an exact power of 2, we can optimize the code by using
my_number <<= 8; /* left shift by 8 */
my_number |= array[--i]; /* bit or */
As these operations are simpler than integer multiplication and addition, some (but not much) efficiency improvement can be expected from doing it that way. Still, it may be more expressive to leave it as in the first example, as that better represents an arbitrary base conversion.
You'll need to brush up on your elementary school skills and implement long division.
I think you'd be better off implementing the long division in base 16 (divide the number by 0x0A each iteration). Take the remainder of each division: these are your decimal digits (the first one is the least significant digit).
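The answer suggests working in base 16; the sketch below does the same long division directly on the question's little-endian byte array (base 256), dividing by 10 each pass. The signature is my own stand-in for the asker's hexToDec:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::string hexToDec(std::vector<std::uint8_t> num) // little endian
{
    std::string digits;
    bool nonZero = true;
    while (nonZero) {
        nonZero = false;
        unsigned rem = 0;
        // One pass of long division by 10, most significant byte first.
        for (std::size_t i = num.size(); i--;) {
            unsigned cur = rem * 256 + num[i];
            num[i] = static_cast<std::uint8_t>(cur / 10);
            rem = cur % 10;
            if (num[i]) nonZero = true;
        }
        digits.push_back(static_cast<char>('0' + rem)); // least significant first
    }
    std::reverse(digits.begin(), digits.end());
    return digits;
}

Each pass over the array is O(n), and the number of passes equals the number of decimal digits, so the conversion is O(n²) overall, matching the complexity discussion in the earlier answer.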
My question is the one in the title; but if not, how could I get away with using only 4 bits to represent an integer?
EDIT: Really, my question is how. I am aware that there are 1-byte data types in a language like C, but how could I use something like a char to store two integers?
In C or C++ you can use a struct to allocate the required number of bits to a variable as given below:
#include <stdio.h>

struct packed {
    unsigned char a:4, b:4;
};

int main() {
    struct packed p;
    p.a = 10;
    p.b = 20;
    printf("p.a %d p.b %d size %zu\n", p.a, p.b, sizeof(struct packed));
    return 0;
}
The output is p.a 10 p.b 4 size 1, showing that p takes only 1 byte of storage, and that numbers needing more than 4 bits (larger than 15) get truncated, so 20 (0x14) becomes 4. This is simpler to use than the manual bit-shifting and masking in the other answer, but it is probably not any faster.
You can store two 4-bit numbers in one byte (call it b which is an unsigned char).
Using hex makes it easy to see: in b = 0xAE the two numbers are A and E.
Use a mask to isolate them:
a = (b & 0xF0) >> 4
and
e = b & 0x0F
You can easily define functions to set/get both numbers in the proper portion of the byte.
Note: if the 4-bit numbers need to have a sign, things can become a tad more complicated since the sign must be extended correctly when packing/unpacking.
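For instance, a minimal sketch of such helpers for the unsigned case (names are mine):

#include <cstdint>

// High nibble holds 'a', low nibble holds 'e'.
inline std::uint8_t pack(std::uint8_t a, std::uint8_t e)
{
    return static_cast<std::uint8_t>(((a & 0x0F) << 4) | (e & 0x0F));
}

inline std::uint8_t high(std::uint8_t b) { return (b & 0xF0) >> 4; }
inline std::uint8_t low(std::uint8_t b)  { return b & 0x0F; }

// pack(0xA, 0xE) == 0xAE; high(0xAE) == 0xA; low(0xAE) == 0xE.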
Say I have a function vector<unsigned char> byteVector(long long UID), returning a byte representation of the UID, a 64-bit integer, as a vector. This vector is later used to write this data to a file.
Now, because I decided I want to read that file with Python, I have to comply with the UTF-8 standard, which means I can only use the lower 7 bits of each char. If the most significant bit is 1, I can't decode it to a string anymore, since those strings support only ASCII characters. Also, I'll have to pass those strings to other processes via a command-line interface, which likewise supports only the ASCII character set.
Before that problem arose, my approach to splitting the 64-bit integer into 8 separate bytes was the following, which worked really well:
vector<unsigned char> outputVector = vector<unsigned char>();
unsigned char * uidBytes = (unsigned char*) &UID_;
for (int i = 0; i < 8; i++){
    outputVector.push_back(uidBytes[i]);
}
Of course that doesn't work anymore, as the constraint "the high bit may not be 1" limits the maximum value of each unsigned char to 127.
My easiest option now would of course be to replace the one push_back call with this:
outputVector.push_back(uidBytes[i] / 128);
outputVector.push_back(uidBytes[i] % 128);
But this seems kind of wasteful, as the first of each unsigned char pair can only be 0 or 1 and I would be wasting some space (6 bytes) I could otherwise use.
As I need to save 64 bits, and can use 7 bits per byte, I'll need 64//7 + 64%7 = 10 bytes.
It isn't really much (none of the files I write ever even reached the 1kB mark), but I was using 8 bytes before and it seems a bit wasteful to use 16 now when ten (not 9, I'm sorry) would suffice. So:
How do I convert a 64bit integer to a vector of ten 7-bit integers?
This is probably too much optimization, but there could be some very cool solution for this problem (probably using shift operators) and I would be really interested in seeing it.
You can use bit shifts to take 7-bit pieces of the 64-bit integer. However, you need ten 7-bit integers, nine is not enough: 9 * 7 = 63, one bit short.
std::uint64_t uid = 42; // Your 64-bit input here.
std::vector<std::uint8_t> outputVector;
for (int i = 0; i < 10; i++)
{
    outputVector.push_back((uid >> (i * 7)) & 0x7f);
}
In every iteration, we shift the input bits right by a multiple of 7 and mask out a 7-bit part. The most significant bit of each 8-bit number will be zero. Note that the numbers in the vector are "reversed": the least significant bits have the lowest index. This is irrelevant, though, if you decode the parts the corresponding way. Decoding can be done as follows:
std::uint64_t decoded = 0;
for (int i = 0; i < 10; i++)
{
    decoded |= static_cast<std::uint64_t>(outputVector[i]) << (i * 7);
}
Please note that it seems like a bad idea to interpret the resulting vector as UTF-8 encoded text: the sequence can still contain control characters and '\0'. If you want to encode your 64-bit integer in printable characters, take a look at base64. In that case, you will need one more character (eleven in total) to encode 64 bits.
I suggest using assembly language.
Many assembly languages have instructions for shifting a bit into a "spare" carry bit and shifting the carry bit into a register. The C language has no convenient or efficient method to do this.
The algorithm:
for i = 0; i < 7; ++i
{
    right shift 64-bit word into carry.
    right shift carry into character.
}
You should also look into using std::bitset.
I am learning C/C++ programming and have encountered the usage of "bit arrays" or "bit vectors". I am not able to understand their purpose. Here are my doubts:
Are they used as boolean flags?
Can one use int arrays instead? (more memory of course, but..)
What's this concept of Bit-Masking?
If bit-masking is just simple bit operations to get an appropriate flag, how does one program with them? Isn't it difficult to do this operation in your head to see what the flag would be, as opposed to decimal numbers?
I am looking for applications, so that I can understand better. For example:
Q. You are given a file containing integers in the range 1 to 1 million. There are some duplicates, and hence some numbers are missing. Find the fastest way of finding the missing numbers.
For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?
I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.
I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1, 2, 3, 4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occurred?
Now, try labelling the errors 1, 2, 4, 8, 16... increasing powers of two, basically. Why does this work? Well, when you work in base 2 you are manipulating a number like 00000000, where each digit corresponds to a power of 2 determined by its position from the right. So let's say errors 1, 4 and 8 occur. That can be represented as 00001101: reading from the right, the first digit = 1*2^0, the third digit = 1*2^2 and the fourth digit = 1*2^3. Adding them all up gives you 13.
Now, we are able to test whether such an error has occurred by applying a bitmask. For example, if you wanted to work out if error 8 has occurred, use the bit representation of 8 = 00001000. Then, to extract whether or not that error occurred, use a binary AND like so:
  00001101
& 00001000
= 00001000
I'm sure you know how an AND works or can deduce it from the above: working digit-wise, if two digits are both 1, the result is 1, else it is 0.
Now, in C:
int func(...)
{
    int retval = 0;
    if ( some_test_that_means_an_error )
    {
        retval += 1;
    }
    if ( some_test_that_means_another_error )
    {
        retval += 2;
    }
    return retval;
}

int anotherfunc(...)
{
    uint8_t x = func(...);
    /* binary and with 8 and shift 3 places to the right
     * so that the resultant expression is either 1 or 0 */
    if ( ( ( x & 0x08 ) >> 3 ) == 1 )
    {
        /* that error occurred */
    }
}
Now, to practicalities. When memory was sparse and protocols didn't have the luxury of verbose XML and the like, it was common to define a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) a certain meaning, apply binary operations to deduce whether they are set, and then operate on them.
I should also add that binary operations are close in spirit to the underlying electronics of a computer. Imagine the bit fields corresponded to the outputs of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.
Regarding the usage of the bit array: if you know there are "only" 1 million numbers, you use an array of 1 million bits. In the beginning all bits are zero, and every time you read a number you use it as an index and set the bit at that index to one (if it isn't one already).
After reading all the numbers, the missing numbers are the indices of the zeros in the array.
For example, if we had only the numbers 0 to 4, the array would look like this in the beginning: 0 0 0 0 0.
If we read the numbers 3, 3, 2,
the array would evolve like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. Check the indices of the zeros: 0, 1, 4 - those are the missing numbers.
BTW, of course you can use integers instead of bits, but that may take (depending on the system) 32 times the memory.
Bit arrays or bit vectors can be thought of as arrays of boolean values. Normally a boolean variable needs at least one byte of storage, but in a bit array/vector only one bit is needed per value.
This gets handy if you have lots of such data, because you save memory at large.
Another usage is if you have numbers which do not exactly fit a standard 8-, 16-, 32- or 64-bit variable. You could, for example, store into a 16-bit vector one number that is 4 bits, one that is 2 bits and one that is 10 bits in size, as in the sketch below. Normally you would have to use three variables of 8, 8 and 16 bits, so 50% of the storage is wasted.
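A hedged sketch of that 4 + 2 + 10 split using C++ bit-fields; the exact layout and total size are implementation-defined:

#include <cstdint>
#include <iostream>

struct Packed {
    std::uint16_t four : 4;   // values 0..15
    std::uint16_t two  : 2;   // values 0..3
    std::uint16_t ten  : 10;  // values 0..1023
};

int main()
{
    Packed p{9, 3, 1000};
    std::cout << p.four << ' ' << p.two << ' ' << p.ten
              << ", sizeof = " << sizeof(Packed) << '\n';
    // Typically prints: 9 3 1000, sizeof = 2
}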
But all these uses are very rare in business applications; they come into use most often when interfacing with drivers through P/Invoke/interop functions and when doing low-level programming.
Bit arrays or bit vectors are used as a mapping from position to some bit value. Yes, it's basically the same thing as an array of bool, but a typical bool implementation is one to four bytes long and uses too much space.
We can store the same amount of data much more efficiently by using arrays of words together with binary masking operations and shifts to store and retrieve it (less overall memory used, fewer memory accesses, fewer cache misses, fewer memory page swaps). The code to access individual bits is still quite straightforward.
There is also some built-in bit-field support in the C language (you write things like int i:1; to say "only consume one bit"), but it is not available for arrays, and you have less control over the overall result (implementation details depend on the compiler and on alignment issues).
Below is a possible way to answer your "search missing numbers" question. I fixed the int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could be made faster by using >> 5 instead of / 32 and & 31 instead of % 32, but that is just to give the idea.
#include <stdio.h>
#include <errno.h>
#include <stdint.h>

int main(){
    /* put all numbers from 0 to 999999 in a file, except 765, 777760 and 777791 */
    {
        printf("writing test file\n");
        int x = 0;
        FILE * f = fopen("testfile.txt", "w");
        for (x = 0; x < 1000000; ++x){
            if (x == 765 || x == 777760 || x == 777791){
                continue;
            }
            fprintf(f, "%d\n", x);
        }
        fprintf(f, "%d\n", 57768); /* this one is a duplicate */
        fclose(f);
    }

    uint32_t bitarray[1000000 / 32] = {0}; /* all bits start cleared */

    /* read file containing integers in the range [0,999999] */
    /* any non number is considered as separator */
    /* the goal is to find missing numbers */
    printf("Reading test file\n");
    {
        unsigned int x = 0;
        FILE * f = fopen("testfile.txt", "r");
        while (1 == fscanf(f, " %u", &x)){
            bitarray[x / 32] |= 1u << (x % 32);
        }
        fclose(f);
    }

    /* find missing numbers in bitarray */
    {
        int x = 0;
        for (x = 0; x < (1000000 / 32); ++x){
            uint32_t n = bitarray[x];
            if (n != (uint32_t)-1){
                printf("Missing number(s) between %d and %d [%x]\n",
                       x * 32, (x + 1) * 32, bitarray[x]);
                int b;
                for (b = 0; b < 32; ++b){
                    if (0 == (n & (1u << b))){
                        printf("missing number is %d\n", x * 32 + b);
                    }
                }
            }
        }
    }
}
That is used for bit-flag storage, as well as for parsing the fields of binary protocols, where one byte is divided into a number of bit-fields. This is widely used in protocols from TCP/IP up to ASN.1 encodings, OpenPGP packets, and so on.
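As an illustration only (the field layout below is invented, not taken from any real protocol spec), extracting such packed fields from a single byte looks like this:

#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t octet = 0xB5;               // 1011 0101

    unsigned version = (octet >> 4) & 0x0F;  // high 4 bits
    unsigned type    = (octet >> 1) & 0x07;  // next 3 bits
    unsigned last    =  octet       & 0x01;  // lowest bit

    std::cout << "version=" << version
              << " type=" << type
              << " last=" << last << '\n';   // version=11 type=2 last=1
}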