C++: passing a vector of bitsets to a function

I want to implement an algorithm in C++ that needs a huge, dynamically allocated vector of bitsets (512 × 18,000,000 bits; I have 16 GB of RAM).
a) This works fine:
int nofBits = ....; // value read from the db by a function
vector<bitset<nofBits>> flags;
flags.resize(512);
But how do I pass it (by reference) to a function? Keep in mind that I do not know nofBits at compile time.
I could use a vector<vector<bool>>, but wouldn't that be worse in terms of memory usage?

I had the same problem recently. However, just like with a std::array, you need to know the size of a std::bitset at compile time, since it's a template parameter. I found boost::dynamic_bitset as an alternative, and it worked like a charm.

std::vector<bool> is specialised to use memory efficiently. It is roughly as space efficient as std::bitset<N> (a few extra bytes because its size is dynamic and the bits live on the heap).
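Note, however, that std::vector<bool> has well-known quirks (for example, its operator[] returns a proxy object rather than a bool&, so it does not behave like other containers), so tread lightly.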

Related

How do I best force-flatten a (one dimensional) vector for N values?

I need something that behaves like a std::vector (interface, features, etc.), but I need it to be flat, i.e. it mustn't dynamically allocate a buffer. Clearly, this doesn't work in general, as the available size must be determined at compile time. But I want the type to be able to deal with N objects without additional allocations, and to resort to dynamic allocation only if further items are pushed.
Some implementations of std::vector already do this, but only to the extent that they reuse their existing members when the accumulated size of the content fits (I believe about three pointers' worth of payload). So, firstly, this is not guaranteed, and secondly, it is not configurable at compile time.
My thoughts are that I could either
A) self-cook a type (probably bad, because I'd lose the ridiculous performance optimisations in vector)
B) use some sort of variant<vector<T>, array<T, N>> with an access wrapper (oh, the boilerplate)
C) come up with a MyAllocator<T, N> that has an array<T, N> member, which could hold the first N items and afterwards defer to allocator<T>. I'm not sure this can work, though, because I cannot find out whether vector must permanently hold an instance of its allocator type as a member (I believe it does not).
I figure I'm not the first person to want this, so perhaps there are already approaches to this? Some empirical values or perhaps even a free library?
You might find folly/small_vector of use:
folly::small_vector is a sequence container that implements small buffer optimization. It behaves similarly to std::vector, except until a certain number of elements are reserved it does not use the heap.
Like standard vector, it is guaranteed to use contiguous memory. (So, after it spills to the heap, all the elements live in the heap buffer.)
Simple usage example:
small_vector<int, 2> vec;
vec.push_back(0); // Stored in situ (inside the object itself)
vec.push_back(1); // Still in situ
vec.push_back(2); // Switches to a heap buffer.
// With space for 32 in situ unique pointers, and only using a
// 4-byte size_type.
small_vector<std::unique_ptr<int>, 32, uint32_t> v1;
// An inline vector of up to 256 ints which will not use the heap.
small_vector<int, 256, NoHeap> v2;
// Same as the above, but making the size_type smaller too.
small_vector<int, 256, NoHeap, uint16_t> v3;

Memory allocation of C++ vector<bool>

The vector<bool> class in the C++ STL is optimized for memory: it allocates one bit per bool stored, rather than one byte. Every time I output sizeof(x) for vector<bool> x, the result is 40 bytes, which is the size of the vector structure itself. sizeof(x.at(0)) always returns 16 bytes, which must be the allocated memory for many bool values, not just the one at position zero. How many elements do the 16 bytes cover? 128 exactly? What if my vector has more or fewer elements?
I would like to measure the size of the vector and all of its contents. How would I do that accurately? Is there a C++ library available for viewing allocated memory per variable?
I don't think there's any standard way to do this. The only information a vector<bool> implementation gives you about how it works is the reference member type, but there's no reason to assume that this has any congruence with how the data are actually stored internally; it's just that you get a reference back when you dereference an iterator into the container.
So you've got the size of the container itself, and that's fine, but to get the amount of memory taken up by the data, you're going to have to inspect your implementation's standard library source code and derive a solution from that. Though, honestly, this seems like a strange thing to want in the first place.
Actually, using vector<bool> is kind of a strange thing to want in the first place. All of the above is essentially why its use is frowned upon nowadays: it's almost entirely incompatible with conventions set by other standard containers… or even those set by other vector specialisations.

CArray and memory pre-allocation

I'm working with code in an MFC project that uses the CArray class to work with dynamic arrays. It works like this:
CArray<CUSTOM_STRUCT> arr;
while (some_criteria)
{
    CUSTOM_STRUCT cs;
    arr.Add(cs);
}
This approach works, but it becomes really slow with a large number of additions to the dynamic array. So I was curious: is there a way to preallocate memory in CArray before I begin calling the Add() method?
There's one caveat, though: I can only approximately estimate the resulting number of elements in the array before I enter my while() loop.
PS. I cannot use any arrays other than CArray.
PS2. Due to the complexity of this project, I would prefer to keep adding to the array via the Add() method.
Really, really consider swapping out for a std::vector. It is surprisingly easy.
This is an attempt to make CArray follow a std::vector-like growth policy, instead of growing by one element each time:
CArray<CUSTOM_STRUCT> arr;
while (some_criteria) {
    CUSTOM_STRUCT cs;
    // Keep the current size, but grow by at least ~50% whenever the buffer must expand.
    arr.SetSize(arr.GetSize(), 1 + arr.GetSize() / 2);
    arr.Add(cs);
}
When I run into this problem, I replace the CArray with a std::vector, so I haven't tested the above. Reading the docs, it should work. Test it and see if you get a massive performance increase (the total cost of n additions should drop from O(n^2) to amortized O(n)).
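Use the CArray::SetSize() method to preallocate the memory.
Note that SetSize() changes the number of elements, not just the reserved capacity, so after preallocating you should assign elements through CArray::operator[] instead of calling CArray::Add, which would append after them.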

std::vector to hold much more than 2^32 elements

std::vector::size returns a size_t so I guess it can hold up to 2^32 elements.
Is there a standard container that can hold many more elements, e.g. 2^64, OR a way to tweak std::vector to be "indexed" by e.g. an unsigned long long?
Sure. Compile a 64-bit program. size_t will be 64 bits wide then.
But really, what you should be doing is take a step back, and consider why you need such a big vector. Because most likely, you don't, and there's a better way to solve whatever problem you're working on.
size_t doesn't have a predefined size, although on 32-bit platforms it is typically 32 bits wide, which caps its values at 2^32 - 1.
Since a std::vector must hold contiguous memory for all elements, you will run out of memory before exceeding the size.
Compile your program as a 64-bit executable and you'll have far more address space.
Better still, reconsider if std::vector is appropriate. Why do you want to hold trillions of adjacent objects directly in memory?
Consider a std::map<unsigned long long, YourData> if you only want large indexes and aren't really trying to store trillions of objects.

What is the difference between std::array and std::vector? When do you use one over the other?

What is the difference between std::array and std::vector? When do you use one over the other?
I have always used and considered std::vector as a C++ way of using C arrays, so what is the difference?
std::array is just a class version of the classic C array. That means its size is fixed at compile time and it will be allocated as a single chunk (e.g. taking space on the stack). The advantage it has is slightly better performance because there is no indirection between the object and the arrayed data.
std::vector is a small class containing pointers into the heap. (So once a std::vector holds elements, they live in dynamically allocated memory obtained through its allocator, by default via operator new.) They are slightly slower to access because those pointers have to be chased to get to the arrayed data... But in exchange for that, they can be resized and they only take a trivial amount of stack space no matter how large they are.
As for when to use one over the other, honestly std::vector is almost always what you want. Creating large objects on the stack is generally frowned upon, and the extra level of indirection is usually irrelevant. (For example, if you iterate through all of the elements, the extra memory access only happens once at the start of the loop.)
The vector's elements are guaranteed to be contiguous, so you can pass vec.data() (or &vec[0] for a non-empty vector) to any function expecting a pointer to an array, e.g. C library routines. (As an aside, std::vector<char> buf(8192); is a great way to allocate a local buffer for calls to read/write or similar without directly invoking new.)
That said, the lack of that extra level of indirection, plus the compile-time constant size, can make std::array significantly faster for a very small array that gets created/destroyed/accessed a lot.
So my advice would be: Use std::vector unless (a) your profiler tells you that you have a problem and (b) the array is tiny.
I'm going to assume that you know that std::array is compile-time fixed in size, while std::vector is variable size. Also, I'll assume you know that std::array doesn't do dynamic allocation. So instead, I'll answer why you would use std::array instead of std::vector.
Have you ever found yourself doing this:
std::vector<SomeType> vecName(10);
And then you never actually increase the size of the std::vector? If so, then std::array is a good alternative.
But really, std::array (coupled with initializer lists) exists to make C-style arrays almost entirely worthless. They don't generally compete with std::vectors; they compete more with C-style arrays.
Think of it as the C++ committee doing their best to kill off almost all legitimate use of C-style arrays.
std::array
- is an aggregate
- is fixed-size
- requires default-constructible elements unless every element is supplied in an aggregate initializer (vs. copy (C++03) or move (C++11) constructible elements for std::vector)
- is linearly swappable (vs. constant time)
- is linearly movable (vs. constant time)
- potentially pays one less indirection than std::vector
A good use case is when doing things 'close-to-the-metal', while keeping the niceties of C++ and keeping all the bad things of raw arrays out of the way.
The same reasoning applies as when choosing a C-style static array over a std::vector.
std::array has a fixed (compile time) size, while std::vector can grow.
As such, std::array is like using a C array, while std::vector is like dynamically allocating memory.
I use my own hand-coded Array<> template class, which has a simpler API than std::array or std::vector. For example:
To use a dynamic Array:
Array<> myDynamicArray; // Note array size is not given at compile time
myDynamicArray.resize(N); // N is a run time value
...
To use a static Array, fixed size at compile time:
Array<100> myFixedArray;
I believe it has a better syntax than std::array or std::vector. It is also extremely efficient.