Consider a (sorted) vector of unsigned int values.
std::vector<unsigned int> data = {1234, 1254, 1264, 1265, 1267, 1268, 1271, 1819,
                                  1832, 1856, 1867, 1892, 3210, 3214, 3256, 3289};
Assuming each unsigned int is 4 bytes, this vector of 16 elements would take at least 64 bytes of RAM.
I am thinking it would be possible to reduce the memory usage by grouping those values by common radix. Consider for example a data representation of the kind
data =
{
    {12..
        ..34, ..54, ..64, ..65, ..67, ..68, ..71
    },
    {18..
        ..19, ..32, ..56, ..67, ..92
    },
    {32..
        ..10, ..14, ..56, ..89
    }
};
In the above example, I grouped the values in blocks of 100. It would be more logical to group them in blocks of 2^8 = 256 or 2^16 = 65536.
Is there a data type (in std:: or boost:: or other) that can do this kind of trick for me or do I have to code my own type of container for that? Does it sound like a potentially good idea?
It is a workable idea for particular inputs in very constrained scenarios. Alas, the standard library does not provide such a structure. Your particular suggestion could reasonably be implemented in a tree-like fashion.
Beware of general radix-aware containers: they typically use large node lists and therefore consume more memory than a plain vector would.
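For illustration, a minimal sketch of that prefix-grouping idea, assuming 2^16-sized blocks (PrefixGroupedSet and its members are illustrative names, not anything from std:: or boost::):

#include <cstdint>
#include <map>
#include <vector>

// Sketch: group 32-bit values by their upper 16 bits and store only the
// lower 16 bits inside each group, halving per-element storage.
class PrefixGroupedSet {
public:
    void insert(uint32_t v) {
        groups_[uint16_t(v >> 16)].push_back(uint16_t(v & 0xFFFF));
    }
    bool contains(uint32_t v) const {
        auto it = groups_.find(uint16_t(v >> 16));
        if (it == groups_.end()) return false;
        for (uint16_t low : it->second)
            if (low == uint16_t(v & 0xFFFF)) return true;
        return false;
    }
private:
    std::map<uint16_t, std::vector<uint16_t>> groups_; // prefix -> suffixes
};

Note that the map nodes themselves cost memory, so the scheme only pays off when each group holds enough suffixes to amortize that overhead.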
I'm modernizing the code that reads and writes to a custom binary file format now.
I'm allowed to use C++17 and have already modernized large parts of the code base.
There are mainly two problems at hand.
binary selectors (my own name)
cased selectors (my own name as well)
For #1 it is as follows:
Depending on whether a single bit in the binary stream is set, you read/write one of two completely different structs.
For example, if bit 17 is set to true, it means bits 18+ should be streamed with Struct1.
But if bit 17 is false, bits 18+ should be streamed with Struct2.
Struct1 and Struct2 are completely different with minimal overlap.
For #2 it is basically the same but as follows:
Given x selector bits in the bit stream, you have a pool of completely different structs. The number of structs can be anywhere in [0, 2**x] (inclusive range).
For instance, in one case you might have 3 bits and 5 structs.
But in another case, you might have 3 bits and 8 structs.
Again the overlap between the structs is minimal.
I'm currently using std::variant for this.
For case #1, it is just the two structs: std::variant<Struct1, Struct2>.
For case #2, it is a flat list of the structs, again using std::variant.
The selector I use is naturally the index into the variant, but it needs to be remapped to the bit pattern that is actually written to/read from the format.
Have any of you used or encountered some better strategies for these cases?
Is there a generally known approach to solve this in a much better way?
Is there a generally known approach to solve this in a much better way?
Nope, it's highly specific.
Have any of you used or encountered some better strategies for these cases?
The bit patterns should be encoded in the type, somehow. Almost all the (de)serialization can be generic so long as the required information is stored somewhere.
For example,
template <uint8_t BitPattern, typename T>
struct IdentifiedVariant;
// ...
using Field1 = std::variant< IdentifiedVariant<0x01, Field1a>,
                             IdentifiedVariant<0x02, Field1b> >;
I've absolutely used types like this in the past to automate all the boilerplate, but the details are extremely specific to the format and rarely reusable.
Note that even though you can't overlay your variant type on a buffer, there's no need for (de)serialization to proceed bit-by-bit. There's hardly any speed penalty so long as the data is already read into a buffer - if you really need to go full zero-copy, you can just have your FieldNx types keep a pointer into the buffer and decode fields on demand.
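To make that concrete, here is a hedged sketch of how such a type could drive generic decoding; IdentifiedVariant's members, Field1a/Field1b, and decode are all illustrative names, not an established API:

#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <variant>

template <uint8_t BitPattern, typename T>
struct IdentifiedVariant {
    static constexpr uint8_t pattern = BitPattern;
    T value;
};

struct Field1a { /* ... */ };
struct Field1b { /* ... */ };

using Field1 = std::variant<IdentifiedVariant<0x01, Field1a>,
                            IdentifiedVariant<0x02, Field1b>>;

// Walk the alternatives at compile time and construct the one whose
// pattern matches the selector bits read from the stream.
template <typename Variant, std::size_t I = 0>
Variant decode(uint8_t selector) {
    if constexpr (I < std::variant_size_v<Variant>) {
        using Alt = std::variant_alternative_t<I, Variant>;
        if (Alt::pattern == selector)
            return Variant{Alt{}}; // real code would read Alt's fields here
        return decode<Variant, I + 1>(selector);
    } else {
        throw std::runtime_error("unknown selector bit pattern");
    }
}

// Usage: Field1 f = decode<Field1>(0x02); // holds IdentifiedVariant<0x02, Field1b>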
If you want your data to be bit-contiguous, you can't use std::variant. You would need std::bitset or fully manual memory management. But that is rarely practical: your structs will not be byte-aligned, so every read/write has to be done bit by bit. That greatly reduces speed, so I only recommend this route if you want to save every bit of memory even at the cost of speed. And with such storage it is hard to find the nth element; you have to iterate.
std::variant<T1, T2> wastes some space because it always reserves enough storage for the larger alternative, but using it instead of bit manipulation improves both speed and code readability (and is easier to write).
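To see that size cost concretely, a tiny sketch (Small and Big are made-up types):

#include <cstdint>
#include <iostream>
#include <variant>

struct Small { uint8_t a; };
struct Big   { uint8_t a[32]; };

int main() {
    // At least as large as the biggest alternative, plus the discriminator
    // (and padding), regardless of which alternative is currently held.
    std::cout << sizeof(std::variant<Small, Big>) << '\n'; // typically 33+
}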
I am searching for a high performance C++ structure for a table. The table will have void* as keys and uint32 as values.
The table itself is very small and will not change after creation. The first idea that came to my mind is using something like ska::flat_hash_map<void*, int32_t> or std::unordered_map<void*, int32_t>. However that will be overkill and will not provide me the performance I want (those tables are suited for high number of items too).
So I thought about using std::vector<std::pair<void*, int32_t>>, sorting it on creation, and probing it linearly. The next idea is to use SIMD instructions, but is that possible with the current structure?
Another solution which I will shortly evaluate is like that:
struct Group
{
    void* keys[5];      // search these using SIMD
    int32_t values[5];
};                      // 40 + 20 = 60 bytes: fits in a cache line

struct Table
{
    Group* groups;
    size_t capacity;
};
Are there any better options? I need only one operation: finding values by keys; no modification, nothing else. Thanks!
EDIT: another thing I think I should mention are the access patterns: suppose I have an array of those hash tables, each time I will look up from a random one in the array.
Linear probing is likely the fastest solution in this case on common mainstream architectures, especially since the number of elements is very small and bounded (i.e. <10). Sorting the items should not speed up the probing with so few items (it would only be useful for a binary search, which is much more expensive in this case).
If you want to use SIMD instructions, then you need to use a structure of arrays instead of an array of structures for the sake of performance. This means using std::pair<std::vector<void*>, std::vector<int32_t>> instead of std::vector<std::pair<void*, int32_t>> (which interleaves void* keys and int32_t values in memory, with padding overhead due to the alignment constraints of void* on 64-bit architectures). Having two std::vectors is not great either, because you pay their overhead twice. As mentioned by @JorgeBellon in the comments, you can simply use std::array instead of std::vector, assuming the number of items is known or bounded.
A possible optimization with SIMD instructions is to compact the key pointers on 64-bit architectures by splitting them into 32-bit lower/upper parts. Indeed, it is very unlikely that two pointers share the same lower part (least significant bits) while having different upper parts. This trick lets you check twice as many pointers at a time.
Note that using SIMD instructions may not be so great in this case in practice. This is especially true if the number of items is smaller than what fits in a SIMD vector. For example, with AVX2 (on x86-64 processors), you can work on 4 64-bit values at a time (or 8 32-bit values), but if you have fewer than 8 values, then you need to mask out the unwanted values (or even avoid loading them if the memory buffer does not contain padding). This introduces additional overhead. This is less of a problem with AVX-512 and SVE (available only on a small fraction of processors so far), since they provide advanced masking operations. Moreover, some processors lower their frequency when executing SIMD instructions (especially AVX-512, although the down-clocking is less pronounced with integer instructions). SIMD instructions also introduce some additional latency compared to the scalar version (which can be better pipelined), and modern processors tend to be able to execute more scalar instructions in parallel than SIMD ones. For all these reasons, it is certainly a good idea to try writing a scalar branchless implementation (possibly unrolled for better performance if the number of items is known at compile time).
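As a rough illustration of that last suggestion, here is a scalar, branch-light lookup over a structure-of-arrays table (N and the not-found sentinel are assumptions for the sketch):

#include <array>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
struct SmallTable {
    std::array<const void*, N> keys;
    std::array<int32_t, N> values;

    // Unconditionally scans all N slots using a select instead of a branch;
    // for small fixed N the compiler can fully unroll this loop.
    int32_t find(const void* key, int32_t not_found = -1) const {
        int32_t result = not_found;
        for (std::size_t i = 0; i < N; ++i)
            result = (keys[i] == key) ? values[i] : result;
        return result;
    }
};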
You may want to look into perfect hashing -- not too difficult, and can provide simple constant time lookups. It can take technically unbounded time to create the table, though, and it's not as fast as a regular hash table when the regular hash table gets lucky.
I think a nice alternative is an optimization of your simple linear probing idea.
Your lookup procedure would look like this:
Slot *s = &table[hash(key)];
Slot *e = s + s->max_extent;
for (; s < e; ++s) {
    if (s->key == key) {
        return s->value;
    }
}
return NOT_FOUND;
table[h].max_extent is the maximum number of elements you may have to look at if you're looking for an element with hash code h. You would pre-calculate this when you generate the table, so your lookup doesn't have to iterate until it gets a null. This greatly reduces the amount of probing you have to do for misses.
Of course you want max_extent to be as small as possible. Pick a hash range (at least 2n) to make it <= 1 in most cases, and try a few different hash functions before picking the one that produces the best results by whatever metric you like. Your hash can be as simple as key % P, where trying different hashes means trying different P values. Fill your hash table in hash(key) order to produce the best result.
NOTE that we do not wrap around from the end to the start of the table while probing. Just allocate however many extra slots you need to avoid it.
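A sketch of the build step under those assumptions (the Slot layout mirrors the lookup code above; hashing and table sizing are left to the caller):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Slot {
    const void* key = nullptr;
    int32_t value = 0;
    uint8_t max_extent = 0; // probe length for lookups that start here
};

// Insert with linear probing and no wrap-around (the table carries spare
// tail slots), then record how far entries homed at h ended up.
inline void insert(std::vector<Slot>& table, std::size_t h,
                   const void* key, int32_t value) {
    std::size_t i = h;
    while (table[i].key != nullptr) ++i;
    table[i].key = key;
    table[i].value = value;
    table[h].max_extent = static_cast<uint8_t>(
        std::max<std::size_t>(table[h].max_extent, i - h + 1));
}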
I need to be able to store an array of binary numbers in C++ which will be passed through different methods and eventually output to file and terminal.
What are the key differences between vectors and bitsets, and which would be easiest and/or more efficient to use?
(I do not know how many bits I need to store.)
A std::bitset's size must be known at compile time, so your choice is obvious: use std::vector<bool>.
Since it is not implemented like std::vector<char> (a single element takes a bit, not a full char), it is a good solution in terms of memory use.
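A minimal usage sketch of that choice (n stands in for a size known only at run time):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::size_t n = 12;               // determined at run time
    std::vector<bool> bits(n, false);
    bits[3] = true;
    for (bool b : bits) std::cout << b; // e.g. prints 000100000000
    std::cout << '\n';
}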
It all depends on what you want to do with the binary data.
You could also use boost::dynamic_bitset, which is like std::bitset but without the fixed size.
The main drawback is the dependency on Boost if you don't already use it.
You could also store your input in a std::vector<char> and use a bitset per char to convert to binary notation.
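For completeness, a small boost::dynamic_bitset sketch (assuming Boost is available):

#include <boost/dynamic_bitset.hpp>
#include <iostream>

int main() {
    boost::dynamic_bitset<> bits(10); // 10 bits, all zero, sized at run time
    bits.set(3);
    bits.push_back(true);             // grows to 11 bits
    std::cout << bits << '\n';        // streams most significant bit first
}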
As others have already said, std::bitset uses a fixed number of bits.
std::vector<bool> is not always advised because it has its quirks; it is not a real container (see the GotW discussion).
If you don't know at compile time how many bits you need to store, you cannot use bitset, as its size is fixed. Because of this, you should use vector<bool>, which can be resized dynamically. If you want an array of bit sequences saved like this, you can then use vector<vector<bool>>.
If you don't have any specific upper bound for the size of the numbers you need to store, then you need to store your data in two different dimensions:
The first dimension will be the number, whose size varies.
The second dimension will be your array of numbers.
For the latter, using std::vector is fine if you require your values to be contiguous in memory. For the numbers themselves you don't need any data structure at all: just allocate memory using new and an unsigned primitive type such as unsigned char, uint8_t, or any other if you have alignment constraints.
If, on the other hand, you know that your numbers won't be larger than let's say 64 bits, then use a data type that you know will hold this amount of data such as uint64_t.
PS: remember that what you store are numbers. The computer will store them in binary whether you use them as such or with any other representation.
I think that in your case you should use std::vector with std::bitset as the value type. With such an approach you can treat your "binary numbers" either as strings or as objects of some integer type, and at the same time it is easy to do bit operations like setting or resetting bits.
I know it's ridiculous, but I need it for storage optimization. Is there any good way to implement it in C++?
It has to be flexible enough so that I can use it as a normal data type, e.g. Vector<int20>, operator overloading, etc.
If storage is your main concern, I suspect you need quite a few 20-bit variables. How about storing them in pairs? You could create a class representing two such variables and store them in 2.5+2.5 = 5 bytes.
To access the variables conveniently you could override the []-operator so you could write:
int fst = pair[0];
int snd = pair[1];
Since you may want to allow for manipulations such as
pair[1] += 5;
you would not want to return a copy of the backing bytes, but a reference. However, you can't return a direct reference to the backing bytes (since assigning through it would mess up its neighboring value), so you'd actually need to return a proxy for the backing bytes (which in turn has a reference to the backing bytes) and let the proxy overload the relevant operators.
As a matter of fact, as @Tony suggests, you could generalize this to a container holding N such 20-bit variables.
(I've done this myself in a specialization of a vector for efficient storage of booleans (as single bits).)
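A hedged sketch of that pair idea (Int20Pair, its byte layout, and the proxy are all illustrative choices, not a library type):

#include <cstdint>

// Two 20-bit values packed into 5 bytes; byte 2 is shared between them.
class Int20Pair {
    uint8_t bytes_[5] = {};

    uint32_t get(int i) const {
        if (i == 0)
            return bytes_[0] | (bytes_[1] << 8) | ((bytes_[2] & 0x0F) << 16);
        return (bytes_[2] >> 4) | (bytes_[3] << 4) | (bytes_[4] << 12);
    }
    void set(int i, uint32_t v) {
        if (i == 0) {
            bytes_[0] = v & 0xFF;
            bytes_[1] = (v >> 8) & 0xFF;
            bytes_[2] = (bytes_[2] & 0xF0) | ((v >> 16) & 0x0F);
        } else {
            bytes_[2] = (bytes_[2] & 0x0F) | ((v & 0x0F) << 4);
            bytes_[3] = (v >> 4) & 0xFF;
            bytes_[4] = (v >> 12) & 0xFF;
        }
    }

public:
    // Proxy standing in for a reference to one packed 20-bit element.
    class Proxy {
        Int20Pair& p_;
        int i_;
    public:
        Proxy(Int20Pair& p, int i) : p_(p), i_(i) {}
        operator uint32_t() const { return p_.get(i_); }
        Proxy& operator=(uint32_t v) { p_.set(i_, v); return *this; }
        Proxy& operator+=(uint32_t v) { return *this = p_.get(i_) + v; }
    };
    Proxy operator[](int i) { return Proxy(*this, i); }
};

// Usage: Int20Pair pair; pair[1] += 5; int snd = pair[1];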
No... you can't do that as a single value-semantic type... any class data must be a multiple of the 8-bit character size (inviting all the usual quips about CHAR_BIT etc.).
That said, let's clutch at straws...
Unfortunately, you're obviously handling very many data items. If this is more than 64k, any proxy object into a custom container of packed values will probably need a >16 bit index/handle too, but still one of the few possibilities I can see worth further consideration. It might be suitable if you're only actively working with and needing value semantic behaviour for a small subset of the values at one point in time.
struct Proxy
{
    Int20_Container& container_; // might not be needed if a singleton
    Int20_Container::size_type index_;
    ...
};
So, the proxy might be 32, 64 or more bits - the potential benefit is only if you can create them on the fly from indices into the container, have them write directly back into the container, and keep them short-lived with few concurrently. (One simple way - not necessarily the fastest - to implement this model is to use an STL bitset or vector as the Int20_Container, and either store 20 times the logical index in index_, or multiply on the fly.)
It's also vaguely possible that although your values range over a 20-bit space, you have fewer than, say, 64k distinct values in actual use. If you have some such insight into your data set, you can create a lookup table where 16-bit array indices map to 20-bit values.
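A quick sketch of that dictionary idea (the names are mine):

#include <cstddef>
#include <cstdint>
#include <vector>

// If fewer than 64k distinct 20-bit values actually occur, store one
// shared table of values plus a 16-bit index per logical element.
struct Int20Dictionary {
    std::vector<uint32_t> distinct_values; // at most 65536 entries
    std::vector<uint16_t> indices;         // one per logical element

    uint32_t at(std::size_t i) const { return distinct_values[indices[i]]; }
};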
Use a class. As long as you respect the copy/assign/clone/etc... STL semantics, you won't have any problem.
But it will not optimize the memory space on your computer. Especially if you put it in a flat array, each 20-bit value will likely be aligned on a 32-bit boundary, so the benefit of a 20-bit type there is lost.
In that case, you will need to define your own optimized array type, which could be made compatible with the STL. But don't expect it to be fast. It won't be.
Use a bitfield. (I'm really surprised nobody has suggested this.)
struct int20_and_something_else {
    int less_than_a_million : 20;
    int less_than_four_thousand : 12; // total 32 bits
};
This only works as a mutual optimization of elements in a structure, where you can spackle the gaps with some other data. But it works very well!
If you truly need to optimize a gigantic array of 20-bit numbers and nothing else, there is:
struct int20_x3 {
    int one : 20;
    int two : 20;
    int three : 20; // 60 bits is almost 64
    void set( int index, int value );
    int get( int index );
};
You can add getter/setter functions to make it prettier if you like, but you can't take the address of a bitfield, and they can't participate in an array. (Of course, you can have an array of the struct.)
Use as:
int20_x3 *big_array = new int20_x3[ array_size / 3 + 1 ];
big_array[ index / 3 ].set( index % 3, value );
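One possible way to fill in the declared accessors (a sketch; bit-field members cannot be indexed or have their address taken, so a switch is needed):

void int20_x3::set( int index, int value ) {
    switch (index) {
        case 0:  one = value;   break;
        case 1:  two = value;   break;
        default: three = value; break;
    }
}

int int20_x3::get( int index ) {
    switch (index) {
        case 0:  return one;
        case 1:  return two;
        default: return three;
    }
}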
You can use C++ std::bitset. Store everything in a bitset and access your data using the correct index (x20).
You're not going to be able to get exactly 20 bits as a type (even with a bit-packed struct), as it will always be aligned (at the smallest granularity) to a byte. IMO the only way to go, if you must have 20 bits, is to create a bit stream to handle the data (which you can overload to accept indexing etc.).
You can use the union keyword together with bit fields. I've used that way back when bit fields were a necessity. Otherwise, you can create a class that holds 3 bytes, but through bitwise operations exposes just the most significant 20 bits.
As far as I know that isn't possible.
The easiest option would be to define a custom type that uses an int32_t as the backing storage and implements the appropriate maths via overloaded operators.
For better storage density, you could store 3 int20 in a single int64_t value.
Just an idea: use optimized storage (5 bytes for two instances), and for operations, convert it into 32-bit int and then back.
While it's possible to do this in a number of ways, one possibility would be to use bit twiddling to store them as the left and right parts of a 5-byte array, with a class to store/retrieve that converts your desired array entry to an entry in the byte array and extracts the left or right half as appropriate.
However, most hardware requires integers to be word-aligned, so as well as the bit twiddling to extract the integer, you would need some bit shifting to align it properly.
I think it would be more efficient to increase your swap space and let virtual memory take care of your large array (after all, 20 vs 32 is not much of a saving!), always assuming you have a 64-bit OS.
In what situation would it be more appropriate for me to use a bitset (STL container) to manage a set of flags rather than having them declared as a number of separate (bool) variables?
Will I get a significant performance gain if I used a bitset for 50 flags rather than using 50 separate bool variables?
Well, 50 bools as a bitset will take 7 bytes, while 50 bools as bools will take 50 bytes. These days that's not really a big deal, so using bools is probably fine.
However, one place a bitset might be useful is if you need to pass those bools around a lot, especially if you need to return the set from a function. Using a bitset you have less data that has to be moved around on the stack for returns. Then again, you could just use refs instead and have even less data to pass around. :)
std::bitset will give you extra points when you need to serialize/deserialize it: you can just write it to a stream or read it from a stream. But certainly the separate bools are going to be faster. They are optimized for this kind of use, after all, while a bitset is optimized for space and still has function calls involved. It will never be faster than separate bools.
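That stream support looks like this in practice (a tiny round-trip sketch):

#include <bitset>
#include <iostream>
#include <sstream>

int main() {
    std::bitset<8> flags("10110010");
    std::stringstream ss;
    ss << flags;                   // serialize
    std::bitset<8> restored;
    ss >> restored;                // deserialize
    std::cout << restored << '\n'; // prints 10110010
}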
Bitset
- Very space efficient.
- Less efficient due to bit fiddling.
- Provides serialize/deserialize with op<< and op>>.
- All bits packed together: you will have the flags in one place.

Separate bools
- Very fast.
- Bools are not packed together. They will be members somewhere.
Decide on the facts. Personally, I would use std::bitset for anything not performance-critical, and bools if I either have only a few (so it stays easy to keep an overview) or if I need the extra performance.
It depends what you mean by 'performance gain'. If you only need 50 of them and you're not low on memory, separate bools are pretty much always a better choice than a bitset. They will take more memory, but the bools will be much faster. A bitset is usually implemented as an array of ints (the bools are packed into those ints). So the first 32 bools (bits) in your bitset will take up only a single 32-bit int, but to read each value you have to do some bitwise operations first to mask out the values you don't want. E.g. to read the 2nd bit of a bitset, you need to:
Find the int that contains the bit you want (in this case, it's the first int)
Bitwise And that int with '2' (i.e. value & 0x02) to find out if that bit is set
However, if memory is a bottleneck and you have a lot of bools, using a bitset could make sense (e.g. if your target platform is a mobile phone, or it's some state in a very busy web service).
NOTE: A std::vector of bool usually has a specialisation to use the equivalent of a bitset, thus making it much smaller and also slower for the same reasons. So if speed is an issue, you'll be better off using a vector of char (or even int), or even just use an old school bool array.
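For reference, the masking described above boils down to something like this sketch (an illustrative layout, not how any particular implementation stores its bits):

#include <cstddef>
#include <cstdint>

// Read bit `pos` from a bitset stored as an array of 32-bit words.
bool test_bit(const uint32_t* words, std::size_t pos) {
    uint32_t word = words[pos / 32];                  // find the containing int
    return (word & (uint32_t{1} << (pos % 32))) != 0; // mask out the bit
}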
RE #Wilka:
Actually, bit fields are supported by C/C++ in a way that doesn't require you to do your own masking. I don't remember the exact syntax, but it's something like this:
struct MyBitset {
    bool firstOption:1;
    bool secondOption:1;
    bool thirdOption:1;
    int fourBitNumber:4;
};
You can reference any value in that struct by just using dot notation, and the right things will happen:
MyBitset bits;
bits.firstOption = true;
bits.fourBitNumber = 2;
if (bits.thirdOption) {
    // Whatever!
}
You can use arbitrary bit sizes for things. The resulting struct can be up to 7 bits larger than the data you define (its size is always the minimum number of bytes needed to store the data you defined).