Algorithm for modulo bitset - c++

Say I have 2 bitsets
bitset<1024> test, current;
How can I compute current modulo test and store the result in another bitset<1024>? Note that test may have any value, not just a power of two.
I'm looking for an answer with either complete code or complete pseudocode. I will not accept answers that convert to any type other than bitset: although bitsets may be slower here, they will be very fast later in the program.

Here's something you could try if you don't want to implement the modulo algorithm yourself:
1. Instead of std::bitset, use boost::dynamic_bitset.
2. Use boost::to_block_range to copy the bitset's underlying blocks to a buffer.
3. Use one of the many bigint libraries to represent a 128-byte (1024-bit) integer.
4. Make the 128-byte bigint use the bytes copied in step 2.
5. Perform the modulo operation on the bigint.
6. Convert the result back to a dynamic_bitset.
7. Profit.
Hopefully, there's a bigint library out there that lets you access its buffer, so that you can copy the bytes from the dynamic_bitset directly into the bigint.
And hopefully, the overhead of copying the 128 bytes around is negligible compared to the modulo operation itself.
Oh, and the bigint representation should have the same byte order as the dynamic_bitset.
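If staying with std::bitset is a hard requirement, the modulo can also be computed directly with schoolbook shift-and-subtract long division. The following is only a minimal sketch, not a tuned implementation: the helper names bitset_less, bitset_subtract and bitset_mod are made up for this example, and the code works one bit at a time, so it needs on the order of N*N bit operations.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: true if a < b when both are read as unsigned integers.
template <std::size_t N>
bool bitset_less(const std::bitset<N>& a, const std::bitset<N>& b) {
    for (std::size_t i = N; i-- > 0;) {
        if (a[i] != b[i]) return b[i];  // the first differing bit from the top decides
    }
    return false;  // equal
}

// Hypothetical helper: a -= b with a ripple borrow; the caller guarantees a >= b.
template <std::size_t N>
void bitset_subtract(std::bitset<N>& a, const std::bitset<N>& b) {
    bool borrow = false;
    for (std::size_t i = 0; i < N; ++i) {
        const bool ai = a[i];
        const bool bi = b[i];
        a[i] = (ai != bi) != borrow;  // ai XOR bi XOR borrow
        borrow = (!ai && (bi || borrow)) || (bi && borrow);
    }
}

// Hypothetical helper: value % divisor by binary long division.
template <std::size_t N>
std::bitset<N> bitset_mod(const std::bitset<N>& value, const std::bitset<N>& divisor) {
    if (divisor.none()) throw std::domain_error("modulo by zero");
    std::bitset<N> rem;
    for (std::size_t i = N; i-- > 0;) {
        // rem never exceeds the already processed top bits of value,
        // so this shift cannot push a set bit out of the bitset.
        rem <<= 1;
        rem[0] = value[i];
        if (!bitset_less(rem, divisor)) bitset_subtract(rem, divisor);
    }
    return rem;
}
```

With the declarations from the question, the call would be bitset<1024> result = bitset_mod(current, test);.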

Related

Bitwise operator on two large sparse vectors without looping?

I have three large arbitrary sparse boolean vectors, all of the same size - say: pool1, pool2, intersection_of_other_pools. I'm interested in performing a bitwise operator. It'd be great if I could do: intersection_of_other_pools |= pool1 | pool2, but that doesn't seem to be an option - as far as I could find.
Since all these vectors are very large, and pool1 and pool2 are very sparse, I'd be interested in a way to perform a bitwise operation on these vectors without looping. I understand that the under-the-hood implementation of std::vector<bool> is just an array of bits, which led me to believe it's possible to do this without looping.
I'm open to strange bitwise hacky solutions in the name of speed.
Of course, if the fastest way (or the only way) to do this is just looping, then I'll happily accept that as an answer as well.
I've checked out valarray as a potential alternative to vector, but I couldn't tell whether it loops or does some magical bitwise operation. Ideally, though, I don't want to change the existing codebase.
Don't use std::vector<bool> or similar for a sparse array.
A truly sparse array should have the ability to skip over large sections.
Encode your data as block headers, which state how long a region it is in bytes. Use all 1s in the length to say "the length of the length field is twice as long and follows", recursively.
So 0xFF0100 states that there is a block of length 512 following. (you can do a bit better by not permitting 0 nor 1-254, but that is rounding error).
Alternate blocks of "all 0s" with blocks of mixed 1s and 0s.
Don't read the block headers directly; use memcpy into aligned storage.
Once you have this, your | or & operation is more of a stitch than a bitwise operation. Only in the rare case where both have a non-zero block do you actually do bitwise work.
After doing & you might want to check if any of the non-0 regions are actually all 0.
This assumes an extremely sparse bitfield, where typically only 1 bit in every few tens of thousands is set. If by sparse you mean "1 in 10", then just use a vector of uint64_t or something.
Implement it as std::vector<uint64_t>; your CPU will probably be quite fast at doing the bitwise "or" on these. They will be memory-aligned, so cache friendly.
The loop is not as bad as you think, as there would be a hidden implicit loop anyway with a different data structure.
If it's extremely sparse (<< 1 in 1000), then just store the indices of the "set" bits in a (sorted) vector and use std::set_intersection to do the matching (or std::set_union for an "or").
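Both suggestions can be sketched in a few lines. This is an illustration only: the function names or_into, sparse_or and sparse_and are invented for the example, and the sparse variants assume each index vector is sorted and free of duplicates.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Dense case: OR src into dst, 64 bits per loop iteration.
// Words beyond the shorter of the two vectors are left untouched.
void or_into(std::vector<std::uint64_t>& dst, const std::vector<std::uint64_t>& src) {
    const std::size_t n = std::min(dst.size(), src.size());
    for (std::size_t i = 0; i < n; ++i) dst[i] |= src[i];
}

// Sparse case: each vector holds the sorted indices of the set bits.
// OR of two bit vectors is the union of their index lists.
std::vector<std::size_t> sparse_or(const std::vector<std::size_t>& a,
                                   const std::vector<std::size_t>& b) {
    std::vector<std::size_t> out;
    out.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// AND of two bit vectors is the intersection of their index lists.
std::vector<std::size_t> sparse_and(const std::vector<std::size_t>& a,
                                    const std::vector<std::size_t>& b) {
    std::vector<std::size_t> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

The sparse versions do work proportional to the number of set bits rather than to the length of the vectors, which is where the saving comes from when very few bits are set.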

should I use a bit set or a vector? C++

I need to be able to store an array of binary numbers in C++, which will be passed through different methods and eventually output to a file and the terminal.
What are the key differences between vectors and bit sets and which would be easiest and/or more efficient to use?
(I do not know how many bits I need to store)
std::bitset size should be known at compile time, so your choice is obvious - use std::vector<bool>.
Since it is implemented differently from std::vector<char> (a single element typically takes one bit, not a full char), it should be a good solution in terms of memory use.
It all depends on what you want to do with the binary data.
You could also use boost::dynamic_bitset, which is like std::bitset but without a fixed number of bits.
The main drawback is dependency on boost if you don't already use it.
You could also store your input in std::vector<char> and use a bitset per char to convert binary notation.
As others have already said: std::bitset uses a fixed number of bits.
std::vector<bool> is not always advised because it has its quirks, as it is not a real container (gotw).
If you don't know at compile time how many bits you need to store, you cannot use bitset, as its size is fixed. Because of this, you should use vector<bool>, as it can be resized dynamically. If you want an array of numbers stored like this, you can then use vector< vector<bool> >.
If you don't have any specific upper bound for the size of the numbers you need to store, then you need to store your data in two different dimensions:
The first dimension will be the number, whose size varies.
The second dimension will be your array of numbers.
For the latter, using std::vector is fine if you require your values to be contiguous in memory. For the numbers themselves you don't need any data structure at all: just allocate memory using new and an unsigned primitive type such as unsigned char, uint8_t or any other if you have alignment constraints.
If, on the other hand, you know that your numbers won't be larger than let's say 64 bits, then use a data type that you know will hold this amount of data such as uint64_t.
PS: remember that what you store are numbers. The computer will store them in binary whether you use them as such or with any other representation.
I think that in your case you should use std::vector with the value type of std::bitset. Using such an approach you can consider your "binary numbers" either like strings or like objects of some integer type and at the same time it is easy to do binary operations like setting or resetting bits.

How to save integer greater than 64 bits in C++?

I want to store integers with greater than 64 bits in length. Number of bits per integer can go up to millions as each entry gets added in the application.
And then for 64 such integers (of equal length) bit-wise AND operation has to be performed.
So what would be the best C++ data structure for operations to be time efficient?
Earlier I had considered vectors for this, as they would allow the length to grow dynamically. The other option is to use std::bitset.
But I am not sure how to perform bitwise ANDs with either approach in the most time-efficient manner.
Thanks
The GNU Multiprecision Library is a good arbitrary-precision integer library. It is most likely heavily optimized down to specifics for your compiler/CPU, so I'd go with that as a first start and if it's not fast enough roll your own specific implementation.
It is quite expensive to reallocate memory for a vector when it holds a lot of data, so I would define
struct int_node {
    bitset<256> holder;   // 256 bits of the number
    int_node *next_node;  // next chunk of the same number
};
I think this approach would save time on memory management and save some cycles on bitwise ops.
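For comparison, a plain word-based layout is often enough for the AND itself. The sketch below is an assumption-laden illustration, not a benchmarked solution: BigBits and and_all are names invented here, and every integer is assumed to be stored as the same number of 64-bit words.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One large integer, stored as 64-bit words (hypothetical alias).
using BigBits = std::vector<std::uint64_t>;

// AND together all integers in values; they must all have the same word count.
BigBits and_all(const std::vector<BigBits>& values) {
    if (values.empty()) return {};
    BigBits result = values.front();
    for (std::size_t k = 1; k < values.size(); ++k) {
        const BigBits& v = values[k];
        if (v.size() != result.size()) throw std::invalid_argument("length mismatch");
        for (std::size_t i = 0; i < result.size(); ++i) result[i] &= v[i];
    }
    return result;
}
```

Each inner iteration handles 64 bits at once, and the contiguous storage gives the compiler a chance to vectorize the loop.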

How can I manage bits/binary in c++?

What I need to do is open a text file with 0s and 1s to find patterns between the columns in the file.
So my first thought was to parse each column into a big array of bools, and then do the logic between the columns (now in arrays). Then I found that the size of a bool is actually a byte, not a bit, so I would be wasting 7/8 of the memory by assigning each value to a bool.
Is it even relevant in a grid of 800x800 values? What would be the best way to handle this?
I would appreciate a code snippet in case it's a complicated answer.
You could use std::bitset or Boost's dynamic_bitset, which provide various methods that will help you manage your bits.
For example, they support constructors that create bitsets from other built-in types like int or char. You can also export the bitset to an unsigned long (to_ulong) or to a string (to_string), which could then be turned into a bitset again, etc.
I once asked about concatenating those, which turned out not to be possible efficiently. But perhaps you could use the info in that question too.
You can use std::vector<bool>, which is a specialization of vector that typically uses a compact store for booleans: 1 bit per value, not 8 bits.
I think it was Knuth who said "premature optimization is the root of all evil." Let's find out a little bit more about the problem. Your array is 800**2 == 640,000 bytes, which is no big deal on anything more powerful than a digital watch.
Storing it as bytes may seem wasteful (as you say, 7/8ths of the memory is redundant), but on the other hand, most machines don't do bit operations as efficiently as byte operations; by saving the memory, you might waste so much effort masking and testing that you would have been better off with the bytes model.
On the other hand, if what you want to do with it is look for larger patterns, you might want to use a bitwise representation because you can do things with 8 bits at a time.
The real point here is that there are several possibilities, but no one can tell you the "right" representation without knowing what the problem is.
For a grid of that size, your array of bools would be about 640KB. Whether that is a problem depends on how much memory you have. It would probably be the simplest option for the logic analysis code.
By grouping the bits and storing in an array of int you could drop the memory requirement to 80KB, but the logic code would be more complicated as you'd be always isolating the bits you wanted to check.
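To make the packed option concrete, here is a small sketch that stores each column of the file as one std::bitset and compares columns with whole-bitset operations. It is only an illustration: the names kRows, Column, columns_from_lines and mismatches are invented, and the grid height of 800 is taken from the question.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kRows = 800;  // grid height, as in the question

using Column = std::bitset<kRows>;

// Build one bitset per column from lines of '0'/'1' characters.
// Bit r of column c is set when line r has a '1' at position c.
std::vector<Column> columns_from_lines(const std::vector<std::string>& lines) {
    const std::size_t width = lines.empty() ? 0 : lines.front().size();
    std::vector<Column> cols(width);
    for (std::size_t r = 0; r < lines.size() && r < kRows; ++r) {
        for (std::size_t c = 0; c < width && c < lines[r].size(); ++c) {
            if (lines[r][c] == '1') cols[c].set(r);
        }
    }
    return cols;
}

// Number of rows in which two columns differ (XOR, then count the set bits).
std::size_t mismatches(const Column& a, const Column& b) {
    return (a ^ b).count();
}
```

Here the masking is hidden inside std::bitset, so the comparison logic stays about as simple as with an array of bools.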

When to use STL bitsets instead of separate variables?

In what situation would it be more appropriate for me to use a bitset (STL container) to manage a set of flags rather than having them declared as a number of separate (bool) variables?
Will I get a significant performance gain if I used a bitset for 50 flags rather than using 50 separate bool variables?
Well, 50 bools as a bitset need only 7 bytes of bits (in practice sizeof(std::bitset<50>) is usually rounded up to 8), while 50 bools as bools will take 50 bytes. These days that's not really a big deal, so using bools is probably fine.
However, one place a bitset might be useful is if you need to pass those bools around a lot, especially if you need to return the set from a function. Using a bitset you have less data that has to be moved around on the stack for returns. Then again, you could just use refs instead and have even less data to pass around. :)
std::bitset will give you extra points when you need to serialize / deserialize it: you can just write it to a stream or read it from a stream. But the separate bools are certainly going to be faster. They are optimized for this kind of use after all, while a bitset is optimized for space and still has function calls involved. It will never be faster than separate bools.
Bitset:
- Very space efficient
- Less efficient due to bit fiddling
- Provides serialize / de-serialize with op<< and op>>
- All bits packed together: you will have the flags in one place.
Separate bools:
- Very fast
- Bools are not packed together. They will be members somewhere.
Decide based on these facts. Personally, I would use std::bitset for code that is not performance-critical, and would use bools if I either have only a few of them (so they are easy to keep track of), or if I need the extra performance.
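The serialization point can be illustrated with the stream operators that std::bitset provides. This is just a sketch: flags_to_text and flags_from_text are names made up for the example, and a string stream stands in for whatever file or socket stream would be used in practice.

```cpp
#include <bitset>
#include <sstream>
#include <string>

// Write the 50 flags as a string of '0'/'1' characters (highest bit first).
std::string flags_to_text(const std::bitset<50>& flags) {
    std::ostringstream out;
    out << flags;
    return out.str();
}

// Read the flags back from the textual form produced above.
std::bitset<50> flags_from_text(const std::string& text) {
    std::istringstream in(text);
    std::bitset<50> flags;
    in >> flags;
    return flags;
}
```

With 50 separate bool variables, the same round trip would need one read and one write per variable.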
It depends what you mean by 'performance gain'. If you only need 50 of them, and you're not low on memory, then separate bools are pretty much always a better choice than a bitset. They will take more memory, but the bools will be much faster. A bitset is usually implemented as an array of ints (the bools are packed into those ints). So the first 32 bools (bits) in your bitset will only take up a single 32-bit int, but to read each value you have to do some bitwise operations first to mask out all the values you don't want. E.g. to read the 2nd bit of a bitset, you need to:
Find the int that contains the bit you want (in this case, it's the first int)
Bitwise And that int with '2' (i.e. value & 0x02) to find out if that bit is set
However, if memory is a bottleneck and you have a lot of bools, using a bitset could make sense (e.g. if your target platform is a mobile phone, or it's some state in a very busy web service).
NOTE: A std::vector of bool usually has a specialisation to use the equivalent of a bitset, thus making it much smaller and also slower for the same reasons. So if speed is an issue, you'll be better off using a vector of char (or even int), or even just use an old school bool array.
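The two steps described above (find the int, then mask) look roughly like this when written out by hand. This is a sketch of the general technique rather than of any particular library's internals; test_bit and set_bit are hypothetical names, and 32-bit words are assumed as in the description.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read bit number index from an array of 32-bit words.
bool test_bit(const std::vector<std::uint32_t>& words, std::size_t index) {
    const std::uint32_t word = words[index / 32];                 // step 1: find the int
    const std::uint32_t mask = std::uint32_t{1} << (index % 32);  // e.g. 0x02 for index 1
    return (word & mask) != 0;                                    // step 2: mask and test
}

// Set bit number index in an array of 32-bit words.
void set_bit(std::vector<std::uint32_t>& words, std::size_t index) {
    words[index / 32] |= std::uint32_t{1} << (index % 32);
}
```

A separate bool needs neither the division nor the mask, which is the extra cost the answer refers to.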
RE @Wilka:
Actually, C and C++ support bit-fields, which give you packed flags without doing your own masking. I don't remember the exact syntax, but it's something like this:
struct MyBitset {
    bool firstOption : 1;           // one bit each
    bool secondOption : 1;
    bool thirdOption : 1;
    unsigned int fourBitNumber : 4; // unsigned, so it holds 0..15
};
You can reference any value in that struct by just using dot notation, and the right things will happen:
MyBitset bits{};  // value-initialize so every field starts at 0/false
bits.firstOption = true;
bits.fourBitNumber = 2;
if (bits.thirdOption) {
    // Whatever!
}
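You can use arbitrary bit widths for the fields. The exact layout and size are implementation-defined: the compiler packs the fields together, but typically rounds the struct up to the size and alignment of the declared types, so it may be larger than the minimum number of bytes needed to store the bits you defined.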