std::bitset c++ external initialization - c++

I want to wrap or cast a std::bitset over a given constant data array,
or, to formulate it differently, initialize a bitset with foreign data.
The user knows the index of the bit, which they can then check via bitset.test(i). The data is big, so it must be efficient. (Machine bit order does not matter; we can store it the right way.)
That's what I tried:
constexpr uint32_t data[32] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32};
constexpr std::bitset<1000> bset(1); // bitset initialized with a value here
constexpr std::bitset<1000> bset2(data); //init it with our data, this is not working
The number of bits held by data is 32*32 = 1024. With my bitset I can address almost the full range; the user does not need more than 1000. Can someone please explain to me how this is done in C++ in my example above with bset2?

Unfortunately, std::bitset does not have a suitable design for what you want.
It is not an aggregate (unlike std::array), so aggregate initialization is impossible (and copying bits into it with std::memcpy is undefined behavior).
Its constexpr constructor can take only one unsigned long long.
operator[] and the set method become constexpr in C++23, so there will be a way after that.
Until then, just use a constexpr raw array or std::array and add bit-accessing methods.

Related

What does size_type in the constructor of vector mean?

The specification for the constructor of vector I am using is:
vector(size_type count, const T& value, const Allocator& alloc = Allocator());
I am trying to initialize a vector, and I am not very familiar with size_type. Neither cplusplus nor cppreference has an entry for size_type. A quick Google tells me it is some kind of data type for representing sizes, capacities, etc. for things like containers (I think). I'm still not very sure whether I'm understanding it correctly or how to use it, though.
Let's say I wish to initialise an int vector of count (10*n/3) + 1 where n is of type int. Can I cast count as type long? Am I even doing it right? How do I understand and use size_type?
Please ignore hardware considerations like whether the computer can even assign enough memory in the first place. I'll worry about that later, for now I just want to focus on understanding this concept.
It's a typedef defined inside std::vector; it's actually a synonym for std::size_t, which, in turn, is a typedef for an implementation-defined unsigned integer type capable of holding the size of the biggest object that is possible to create on the current machine. In practice, you can think of it as some kind of unsigned integer, which is always used throughout the std::vector interface when referring to an index or an element count.
On "regular" machines (where you have 32-bit integers), and if you are not hitting the limits of "regular" ints for element counts in your code, you can use int for indexes without problems (and you are actually safer from the subtle bugs that arise from arithmetic/comparisons on unsigned integers).
std::vector<T>::size_type is an unsigned integer type which can represent the size of the largest object in the allocation model. This will often be something like std::size_t.
Unless you think you're going to be bumping up against max integral values, you shouldn't need to worry about this; just pass in an integral type and the compiler will tell you if you did something outrageous.
std::size_t is a data type, that is large enough to hold the size of any structure that your machine can address. So on 32 bit machines, this is usually a 32 bit unsigned integer.
T::size_type is defined by a lot of containers and is usually just std::size_t (but strictly speaking does not have to be). The reason to define it is that you can just write T::size_type x = T::your_operation(); without having to think about which container you are using. You can later switch the container, and T::size_type is still valid C++ and needs no reworking.

C++ Variable Width Bit Field

I'm writing a program that is supposed to manipulate very long strings of boolean values. I was originally storing them as a dynamic array of unsigned long long int variables and running C-style bitwise operations on them.
However, I don't want the overhead that comes with having to iterate over an array even if the processor is doing it at the machine code level - i.e. it is my belief that the compiler is probably more efficient than I am.
So, I'm wondering if there's a way to store them as a bit field. The only problem with that is that I heard you need a compile-time constant for the width, and I can't provide one as I don't know how many bits I need when the program starts. Is there a way to do this?
As per the comments, std::bitset or std::vector<bool> are probably what you need. bitset is fixed-length, vector<bool> is dynamic.
vector<bool> is a specialization of vector that only uses one bit per value, rather than sizeof(bool), like you might expect... While good for memory use, this exception is actually disliked by the standards body these days, because (among other things) vector<bool> doesn't fulfil the same contract that vector<T> does - it returns proxy objects instead of references, which wreaks havoc in generic code.

Correct storage of container size on 32, 64 bit

I am currently converting an application to 64 bit.
I have some occurrences of the following pattern:
class SomeOtherClass;
class SomeClass
{
    std::vector<SomeOtherClass*> mListOfThings;

    void SomeMemberFunction(void)
    {
        // Needs to know the size of the member list variable
        unsigned int listSize = mListOfThings.size();
        // use listSize in further computations
        // ...
    }
};
Obviously in a practical case I will not have more than INT_MAX items in my list. But I wondered if there is consensus about the 'best' way to represent this type.
Each collection defines its own return type for size(), so my first approximation would be:
std::vector<SomeOtherClass*>::size_type listSize = mListOfThings.size();
I would assume this to be correct, but (personally) I don't find this 'easy reading', so -1 for clarity.
For a C++11-aware compiler I could write
auto listSize = mListOfThings.size();
which is clearly more readable.
So my question, is the latter indeed the best way to handle storing container sizes in a variable and using them in computations, regardless of underlying architecture (win32, win64, linux, macosx) ?
What exactly you want to use is a matter of how "purist" you want your code to be.
If you're on C++11, you can just use auto and be done with.
Otherwise, in extremely generic code (which is designed to work with arbitrary allocators), you can use the container's nested typedef size_type. That is taken verbatim from the container's allocator.
In normal use of standard library containers, you can use std::size_t. That is the size_type used by the default allocators, and is the type guaranteed to be able to store any object size.
I wouldn't recommend using [unsigned] int, as that will likely be smaller than necessary on 64-bit platforms (it's usually left at 32 bits, although this of course depends on compiler and settings). I've actually seen production code fail due to unsigned int not being enough to index a container.
It depends on why you need the size, and what is going to be in the vector. Internally, vector uses std::size_t. But that's an unsigned type, inappropriate for numerical values. If you just want to display the value, or something, fine, but if you're using it in any way as a numerical value, the unsignedness will end up biting you.
Realistically, there are a lot of times when the semantics of the code ensure that the number of values cannot be more than INT_MAX. For example, when evaluating financial instruments, the maximum number of elements is less than 20000, so there's no need to worry about overflowing an int. In other cases, you'll validate your input first, to ensure that there will never be overflow. If you can't do this, the best solution is probably ptrdiff_t (which is the type you get from subtracting two iterators), or, if you're using non-standard allocators, MyVectorType::difference_type.
Not sure if you've already considered this, but what is wrong with size_t?
It is what your compiler uses for sizes of built-in containers (i.e. arrays).

Bitset Reference

From http://www.cplusplus.com/reference/stl/bitset/:
Because no such small elemental type exists in most C++ environments, the individual elements are accessed as special references which mimic bool elements.
How, exactly, does this bit reference work?
The only way I could think of would be to use a static array of chars, but then each instance would need to store its index in the array. Since each reference instance would have at least the size of a size_t, that would destroy the compactness of the bitset. Additionally, resizing may be slow, and bit manipulation is expected to be fast.
I think you are confusing two things.
The bitset class stores the bits in a compact representations, e.g. in a char array, typically 8 bits per char (but YMMV on "exotic" platforms).
The bitset::reference class is provided to allow users of the bitset class to have reference-like objects to the bits stored in a bitset.
Because regular pointers and references don't have enough granularity to point to the single bits stored in the bitset (their minimum granularity is the char), such a class mimics the semantics of a reference to fake reference-like lvalue operations on the bits. This is needed, in particular, to allow the value returned by operator[] to work "normally" as an lvalue (and it probably constitutes 99% of its "normal" use). In this case it can be seen as a "proxy object".
This behavior is achieved by overloading the assignment operator and the conversion-to-bool operator; the bitset::reference class will probably encapsulate a reference to the parent bitset object and the offset (bytes+bit) of the referenced bit, that are used by such operators to retrieve and store the value of the bit.
---EDIT---
Actually, the g++ implementation makes bitset::reference store directly a pointer to the memory word in which the bit is stored, and the bit's position within that word. This, however, is just an implementation detail to boost its performance.
By the way, in the library sources I found a very compact but clear explanation of what bitset::reference is and what it does:
/**
* This encapsulates the concept of a single bit. An instance of this
* class is a proxy for an actual bit; this way the individual bit
* operations are done as faster word-size bitwise instructions.
*
* Most users will never need to use this class directly; conversions
* to and from bool are automatic and should be transparent. Overloaded
* operators help to preserve the illusion.
*
* (On a typical system, this <em>bit %reference</em> is 64
* times the size of an actual bit. Ha.)
*/
I haven't looked at the STL source, but I would expect a Bitset reference to contain a pointer to the actual bitset, and a bit number of size size_t. The references are only created when you attempt to get a reference to a bitset element.
Normal use of bitsets is most unlikely to use references extensively (if at all), so there shouldn't be much of a performance issue. And, it's conceptually similar to char types. A char is normally 8 bits, but to store a 'reference' to a char requires a pointer, so typically 32 or 64 bits.
I've never looked at the reference implementation, but obviously it must know the bitset it is referring to via a reference, and the index of the bit it is responsible for changing. It then can use the rest of the bitsets interface to make the required changes. This can be quite efficient. Note bitsets cannot be resized.
I am not quite sure what you are asking, but I can tell you a way to access individual bits in a byte, which is perhaps what bitsets do. Mind you that the following code is not my own and is Microsoft spec (!).
Create a struct as such:
struct Byte
{
    bool bit1 : 1;
    bool bit2 : 1;
    bool bit3 : 1;
    bool bit4 : 1;
    bool bit5 : 1;
    bool bit6 : 1;
    bool bit7 : 1;
    bool bit8 : 1;
};
The ':1' part of this code are bitfields. http://msdn.microsoft.com/en-us/library/ewwyfdbe(v=vs.80).aspx
They define how many bits a variable is desired to occupy, so in this struct, there are 8 bools that occupy 1 bit each. In total, the 'Byte' struct is therefore 1 byte in size.
Now if you have a byte of data, such as a char, you can store this data in a Byte object as follows:
char a = 'a';
Byte oneByte;
oneByte = *(Byte*)(&a); // Take the address of a (a pointer, basically),
                        // cast this char* pointer to a Byte*, then
                        // dereference it to store the data it points to
                        // in the variable oneByte.
Now you can access (and alter) the individual bits by accessing the bool member variables of oneByte. In order to store the altered data in a char again, you can do as follows:
char b;
b = *(char*)(&oneByte); // Basically, this is the reverse of what you do to
// store the char in a Byte.
I'll try to find the source of this technique, to give credit where credit is due.
Also, again I am not entirely sure whether this answer is of any use to you. I interpreted your question as being 'how would access to individual bits be handled internally?'.

C++ 2.5 bytes (20-bit) integer

I know it's ridiculous, but I need it for storage optimization. Is there any good way to implement it in C++?
It has to be flexible enough so that I can use it as a normal data type, e.g. Vector<int20>, operator overloading, etc.
If storage is your main concern, I suspect you need quite a few 20-bit variables. How about storing them in pairs? You could create a class representing two such variables and store them in 2.5+2.5 = 5 bytes.
To access the variables conveniently you could override the []-operator so you could write:
int fst = pair[0];
int snd = pair[1];
Since you may want to allow for manipulations such as
pair[1] += 5;
you would not want to return a copy of the backing bytes, but a reference. However, you can't return a direct reference to the backing bytes (since writing through it would mess up its neighboring value), so you'd actually need to return a proxy for the backing bytes (which in turn holds a reference to them) and let the proxy overload the relevant operators.
As a matter of fact, as @Tony suggests, you could generalize this to have a general container holding N such 20-bit variables.
(I've done this myself in a specialization of a vector for efficient storage of booleans (as single bits).)
No... you can't do that as a single value-semantic type... any class data must be a multiple of the 8-bit character size (inviting all the usual quips about CHAR_BIT etc.).
That said, let's clutch at straws...
Unfortunately, you're obviously handling very many data items. If this is more than 64k, any proxy object into a custom container of packed values will probably need a >16 bit index/handle too, but still one of the few possibilities I can see worth further consideration. It might be suitable if you're only actively working with and needing value semantic behaviour for a small subset of the values at one point in time.
struct Proxy
{
    Int20_Container& container_; // might not need this if a singleton
    Int20_Container::size_type index_;
    ...
};
So, the proxy might be 32, 64 or more bits - the potential benefit is only if you can create them on the fly from indices into the container, have them write directly back into the container, and keep them short-lived with few concurrently. (One simple way - not necessarily the fastest - to implement this model is to use an STL bitset or vector as the Int20_Container, and either store 20 times the logical index in index_, or multiply on the fly.)
It's also vaguely possible that although your values range over a 20-bit space, you've less than say 64k distinct values in actual use. If you have some such insight into your data set, you can create a lookup table where 16-bit array indices map to 20-bit values.
Use a class. As long as you respect the copy/assign/clone/etc. STL semantics, you won't have any problem.
But it will not optimize the memory space on your computer. In particular, if you put it in a flat array, the 20-bit value will likely be aligned on a 32-bit boundary, so the benefit of a 20-bit type there is lost.
In that case, you will need to define your own optimized array type, which could be made compatible with the STL. But don't expect it to be fast. It won't be.
Use a bitfield. (I'm really surprised nobody has suggested this.)
struct int20_and_something_else {
int less_than_a_million : 20;
int less_than_four_thousand : 12; // total 32 bits
};
This only works as a mutual optimization of elements in a structure, where you can spackle the gaps with some other data. But it works very well!
If you truly need to optimize a gigantic array of 20-bit numbers and nothing else, there is:
struct int20_x3 {
int one : 20;
int two : 20;
int three : 20; // 60 bits is almost 64
void set( int index, int value );
int get( int index );
};
You can add getter/setter functions to make it prettier if you like, but you can't take the address of a bitfield, and they can't participate in an array. (Of course, you can have an array of the struct.)
Use as:
int20_x3 *big_array = new int20_x3[ array_size / 3 + 1 ];
big_array[ index / 3 ].set( index % 3, value );
You can use C++ std::bitset. Store everything in a bitset and access your data using the correct index (multiply by 20).
You're not going to be able to get exactly 20 bits as a type (even with a bit-packed struct), as it will always be aligned (at smallest granularity) to a byte. IMO the only way to go, if you must have 20 bits, is to create a bitstream to handle the data (which you can overload to accept indexing etc.).
You can use the union keyword to create a bit field. I've used it way back when bit fields were a necessity. Otherwise, you can create a class that holds 3 bytes, but through bitwise operations exposes just the most significant 20.
As far as I know that isn't possible.
The easiest option would be to define a custom type that uses an int32_t as the backing storage and implements appropriate maths as overloaded operators.
For better storage density, you could store 3 int20 in a single int64_t value.
Just an idea: use optimized storage (5 bytes for two instances), and for operations, convert it into 32-bit int and then back.
While it's possible to do this in a number of ways, one possibility would be to use bit twiddling to store them as the left and right parts of a 5-byte array, with a class for store/retrieve that converts your desired array entry to an entry in the byte array and extracts the left or right half as appropriate.
However, most hardware requires integers to be word-aligned, so as well as the bit twiddling to extract the integer, you would need some bit shifting to align it properly.
I think it would be more efficient to increase your swap space and let virtual memory take care of your large array (after all, 20 vs 32 bits is not much of a saving!), always assuming you have a 64-bit OS.