Bitwise operator on two large sparse vectors without looping? - c++

I have three large, arbitrary, sparse boolean vectors, all of the same size - say: pool1, pool2, intersection_of_other_pools. I'm interested in performing a bitwise operation on them. It'd be great if I could do: intersection_of_other_pools |= pool1 | pool2, but that doesn't seem to be an option - as far as I could find.
Since all of these vectors are very large, and pool1 and pool2 are very sparse, I'd be interested in a way to perform a bitwise operation on these vectors without looping. I understand that the under-the-hood implementation of std::vector<bool> is just an array of bits, which led me to believe it's possible to do this without looping.
I'm open to strange bitwise hacky solutions in the name of speed.
Of course, if the fastest way (or the only way) to do this is just looping, then I'll happily accept that as an answer as well.
I've checked out valarray as a potential alternative to vector, but I couldn't tell whether it loops or does some magical bitwise operation. Ideally, though, I don't want to change the existing codebase.

Don't use std::vector<bool> or similar for a sparse array.
A truly sparse array should have the ability to skip over large sections.
Encode your data as blocks with headers that state how long the region is, in bytes. Use all 1s in the length field to say "the length field is twice as long and follows", recursively.
So 0xFF0100 states that there is a block of length 256 following. (You can do a bit better by not permitting 0-254 in the longer field, but that is a rounding error.)
Alternate blocks of "all 0s" with blocks of mixed 1s and 0s.
Don't read the block headers directly; use memcpy into aligned storage.
Once you have this, your | or & operation is more of a stitch than a bitwise operation. Only in the rare case where both have a non-zero block do you actually do bitwise work.
After doing & you might want to check if any of the non-0 regions are actually all 0.
This assumes an extremely sparse bitfield, where something like 1 set bit per 10,000 is a typical case. If by sparse you mean "1 in 10", then just use a vector of uint64_t or something.
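A minimal sketch of the blocked idea, using explicit (block index, payload) chunks of a fixed size rather than the variable-length headers described above; the names and the fixed block size are illustrative choices, not a drop-in implementation:

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each stored block covers a fixed, aligned range of BLOCK_WORDS 64-bit words;
// blocks that are entirely zero are simply not stored.
constexpr std::size_t BLOCK_WORDS = 64;   // 64 * 64 = 4096 bits per block

struct Block {
    std::size_t index;                         // block number: bit 0 of the block is index * 4096
    std::array<std::uint64_t, BLOCK_WORDS> w;  // the block's payload
};

using SparseBits = std::vector<Block>;         // sorted by index, no duplicates

// OR is mostly a "stitch": whole blocks are copied; real bitwise work happens
// only where both inputs have a non-zero block at the same position.
SparseBits sparse_or(const SparseBits& a, const SparseBits& b)
{
    SparseBits out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].index < b[j].index) {
            out.push_back(a[i++]);             // only 'a' has data here
        } else if (b[j].index < a[i].index) {
            out.push_back(b[j++]);             // only 'b' has data here
        } else {
            Block blk = a[i];                  // both non-zero: do the bitwise work
            for (std::size_t k = 0; k < BLOCK_WORDS; ++k)
                blk.w[k] |= b[j].w[k];
            out.push_back(blk);
            ++i; ++j;
        }
    }
    out.insert(out.end(), a.begin() + i, a.end());
    out.insert(out.end(), b.begin() + j, b.end());
    return out;
}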

Implement it as std::vector<uint64_t>; your CPU will probably be quite fast at doing the bitwise "or" on these. They will be memory-aligned, so cache friendly.
The loop is not as bad as you think, as there would be a hidden implicit loop anyway with any other data structure.
If it's extremely sparse (<< 1 in 1000), then just store the indices of the "set" bits in a (sorted) vector and use std::set_union / std::set_intersection to combine them, as in the sketch below.
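Rough sketches of both suggestions (function names are mine):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Dense representation: one bit per element, packed into 64-bit words.
// 'a |= b' becomes a tight, cache-friendly loop the compiler can vectorize.
void or_into(std::vector<std::uint64_t>& a, const std::vector<std::uint64_t>& b)
{
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        a[i] |= b[i];
}

// Extremely sparse representation: just the sorted indices of the set bits.
std::vector<std::size_t> union_of(const std::vector<std::size_t>& a,
                                  const std::vector<std::size_t>& b)
{
    std::vector<std::size_t> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

std::vector<std::size_t> intersection_of(const std::vector<std::size_t>& a,
                                         const std::vector<std::size_t>& b)
{
    std::vector<std::size_t> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}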

Related

memcmp - is there a faster way to bitwise compare two buffers

I want to most quickly and efficiently find out if two memory buffers - holding arbitrarily defined values - are identical in a bitwise comparison.
I'm not interested in anything but the Boolean "is identical" and I want the method to return as quickly as possible, i.e. at first difference found.
What is the best way to achieve this?
I'm currently first comparing the overall size - which I know - and using
memcmp if they are the same size:
memcmp( buf1_ptr, buf2_ptr, sizeof(buf1) )
Is this the most efficient thing I can do? Should I split the comparison into chunks in a for-loop?
In general, memcmp will have been written in assembler by experts. It is very, very unlikely you can do any better than them at the general-purpose problem it solves.
If you can promise that the pointers will always be (e.g.) aligned on a 16-byte boundary, and that the length will always be a multiple of 16 bytes, you might be able to do a little better by using some vectorized solution like SSE. (memcmp will probably end up using SSE too under those circumstances, but it will have to do some tests first to make sure - and you can save the cost of those tests.)
Otherwise - just use memcmp.
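A minimal sketch of the "compare sizes first, then memcmp" approach; buf1 and buf2 are assumed here to be std::vector<unsigned char>:

#include <cstring>
#include <vector>

bool identical(const std::vector<unsigned char>& buf1,
               const std::vector<unsigned char>& buf2)
{
    if (buf1.size() != buf2.size())
        return false;                       // different sizes can never match
    if (buf1.empty())
        return true;                        // both empty: trivially identical
    return std::memcmp(buf1.data(), buf2.data(), buf1.size()) == 0;
}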

Computing lots of bools with OpenCL

I'm using OpenCL to compute values, and then check if they're on a whitelist or not. I then need to store and eventually return the results of this check to the host.
The nature of my calculations is once the CL Kernel gets some initial data from the host, it can keep calculating successive values without host intervention - which, as I understand it, is a Good Thing™.
Obviously, the limitation on the number of calculations in this case is device memory - the amount needed to store all the results of the calculations increases exponentially with each iteration of the kernel.
I'm currently using unsigned chars to hold the booleans, one char for each calculation. This results in an 8x larger memory usage than plain bitfields... unfortunately, OpenCL does not support bitfields.
What's the most efficient way to store lots of bools with OpenCL?
Although there is no built-in bitfield type, nothing keeps you from doing bit-coded operations on any integral type. Plus, you can further increase your throughput by using vector types and calculating multiple outputs at once.
If I were you, I'd make my calculations (if possible) use vector types all the way through, with one element of a vector contributing to one boolean - int4 as a general rule of thumb. Set each element to either 1 or 0, then bit-shift the results and OR-collect 32 such operations into an int4 using |=, which will be your output.
Thus one kernel instance can produce 4*32 bits = 128 boolean values and calculate all of them in a vectorized manner. Register pressure will depend on the intensity of the function producing the booleans; if it is too high, you might have to fall back to scalar types.
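The packing idea, sketched in plain C++ rather than OpenCL C; compute_flag is a hypothetical stand-in for one kernel calculation, and each 0/1 result is shifted into place and OR-ed into a 32-bit word so that 32 booleans share one uint:

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical placeholder for the per-item calculation the kernel would perform.
bool compute_flag(std::size_t i) { return i % 7 == 0; }

// Pack one boolean result per bit, 32 results per 32-bit word.
std::vector<std::uint32_t> pack_flags(std::size_t count)
{
    std::vector<std::uint32_t> out((count + 31) / 32, 0u);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t bit = compute_flag(i) ? 1u : 0u;
        out[i / 32] |= bit << (i % 32);   // shift into place and OR-collect
    }
    return out;
}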

How to save integer greater than 64 bits in C++?

I want to store integers greater than 64 bits in length. The number of bits per integer can go up to millions as entries get added in the application.
Then a bitwise AND operation has to be performed across 64 such integers (of equal length).
So what would be the best C++ data structure to make these operations time-efficient?
Earlier I had considered vectors, as they would allow the length to grow dynamically. The other option is to use std::bitset.
But I am not sure how to perform bitwise ANDs with either of these approaches in the most time-efficient manner.
Thanks
The GNU Multiprecision Library is a good arbitrary-precision integer library. It is most likely heavily optimized down to specifics for your compiler/CPU, so I'd go with that as a first attempt, and if it's not fast enough, roll your own specific implementation.
It is quite expensive to reallocate memory for a vector when holding large data, so I would define
struct int_node {
    std::bitset<256> holder;   // 256 bits of the big integer
    int_node* next_node;       // next 256-bit chunk, or nullptr
};
I think this approach would save time on memory management and save some cycles on bitwise ops.
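A minimal sketch of AND-ing two such chains chunk by chunk, using the int_node struct above and assuming both chains have the same number of nodes:

#include <bitset>

// AND 'src' into 'dst', one 256-bit chunk at a time; stops at the shorter chain.
void and_into(int_node* dst, const int_node* src)
{
    while (dst != nullptr && src != nullptr) {
        dst->holder &= src->holder;   // std::bitset provides &= directly
        dst = dst->next_node;
        src = src->next_node;
    }
}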

Algorithm for modulo bitset

Say I have 2 bitsets
bitset<1024> test, current;
How am I supposed to compute current modulo test and output the result in another bitset<1024>? Note that test may be of any form, not just a power of two.
Looking for an answer with either complete code or complete pseudocode. I will not accept answers involving converting to another type besides bitset, because although bitsets may be slower here, later in the program they are going to be very fast.
Here's something you could try if you don't want to implement the modulo algorithm yourself:
Instead of std::bitset, use boost::dynamic_bitset.
Use boost::to_block_range to copy the bitset's bytes to a buffer.
Use one of the many bigint libraries to represent a 128-byte (1024-bit) integer.
Make the 128-byte bigint use the bytes copied in step #2.
Perform the modulo operation on the bigint.
Convert the result back to a dynamic_bitset.
Profit
Hopefully, there's a bigint library out there that lets you access its buffer, so that you can copy the bytes from the dynamic_bitset directly into the bigint.
And hopefully, the overhead of copying the 128 bytes around is negligible compared to the modulo operation itself.
Oh, and the bigint representation should have the same byte order as the dynamic_bitset.
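A hedged sketch of this recipe, using boost::multiprecision::cpp_int as the bigint (any bigint library with block import/export would do); the assumption that dynamic_bitset's block order matches least-significant-block-first import/export is worth verifying before relying on it:

#include <boost/dynamic_bitset.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <iterator>
#include <limits>
#include <vector>

boost::dynamic_bitset<> bitset_mod(const boost::dynamic_bitset<>& value,
                                   const boost::dynamic_bitset<>& divisor)
{
    using block_t = boost::dynamic_bitset<>::block_type;
    using boost::multiprecision::cpp_int;
    constexpr unsigned block_bits = std::numeric_limits<block_t>::digits;

    // Step 2: copy each bitset's blocks into a buffer.
    std::vector<block_t> v_blocks, d_blocks;
    boost::to_block_range(value,   std::back_inserter(v_blocks));
    boost::to_block_range(divisor, std::back_inserter(d_blocks));

    // Steps 3-5: build bigints from those blocks and take the modulo.
    cpp_int v, d;
    boost::multiprecision::import_bits(v, v_blocks.begin(), v_blocks.end(),
                                       block_bits, /*msv_first=*/false);
    boost::multiprecision::import_bits(d, d_blocks.begin(), d_blocks.end(),
                                       block_bits, /*msv_first=*/false);
    cpp_int r = v % d;

    // Step 6: convert the remainder back into a bitset of the original size.
    std::vector<block_t> r_blocks;
    boost::multiprecision::export_bits(r, std::back_inserter(r_blocks),
                                       block_bits, /*msv_first=*/false);
    boost::dynamic_bitset<> result(r_blocks.begin(), r_blocks.end());
    result.resize(value.size());
    return result;
}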

How can I manage bits/binary in c++?

What I need to do is open a text file with 0s and 1s to find patterns between the columns in the file.
So my first thought was to parse each column into a big array of bools, and then do the logic between the columns (now in arrays). Until I found that the size of a bool is actually a byte, not a bit, so I would be wasting 7/8 of the memory by assigning each value to a bool.
Is it even relevant in a grid of 800x800 values? What would be the best way to handle this?
I would appreciate a code snippet in case it's a complicated answer.
You could use std::bitset or Boost's dynamic_bitset, which provide various methods that will help you manage your bits.
For example, they support constructors that create bitsets from built-in types like int or char. You can also export the bitset to an unsigned long or to a string (which could then be turned into a bitset again, etc.).
I once asked about concatenating those, which couldn't be done performantly. But perhaps you could use the info in that question too.
You can use std::vector<bool>, which is a specialization of vector that uses compact storage for booleans: 1 bit, not 8 bits.
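A minimal sketch of the bitset idea for this 800x800 case (names are illustrative, file parsing omitted): one std::bitset<800> per column, so combining two columns is a single bitwise expression.

#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t GRID = 800;
using Column = std::bitset<GRID>;            // one bit per row of a column

// Record one parsed character ('0' or '1') into the column bitsets.
void set_cell(std::vector<Column>& cols, std::size_t row, std::size_t col, char ch)
{
    cols[col][row] = (ch == '1');
}

// Rows where both columns contain a 1: a single bitwise AND over 800 bits.
Column common_ones(const Column& a, const Column& b)
{
    return a & b;
}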
I think it was Knuth who said "premature optimization is the root of all evil." Let's find out a little bit more about the problem. Your array is 800**2 == 640,000 bytes, which is no big deal on anything more powerful than a digital watch.
Storing it as bytes may seem wasteful - as you say, 7/8ths of the memory is redundant - but on the other hand, most machines don't do bit operations as efficiently as byte operations; by saving the memory, you might waste so much effort masking and testing that you would have been better off with the bytes model.
On the other hand, if what you want to do with it is look for larger patterns, you might want to use a bitwise representation because you can do things with 8 bits at a time.
The real point here is that there are several possibilities, but no one can tell you the "right" representation without knowing what the problem is.
For that size grid, your array of bools would be about 640KB; whether that is a problem depends on how much memory you have. It would probably be the simplest for the logic-analysis code.
By grouping the bits and storing them in an array of int you could drop the memory requirement to 80KB, but the logic code would be more complicated, as you'd always be isolating the bits you wanted to check.