std::bitset<N> implementation causes size overhead - c++

It seems that std::bitset<N> is, under the hood, an array of unsigned longs, which means there will be a (heavy?) overhead when N is small: sizeof(std::bitset<8>) is 8 bytes!
Is there a reason why the type of the underlying array is not itself a template parameter? Why does the implementation not use uint32_t/uint16_t/uint8_t when that would be more appropriate? I do not see anything in the implementation that prevents this.
I am guessing I am missing a particular reason, but I am unsure how to look for it; or maybe there is no reason at all. Since this is such a simple container, I cannot understand how the zero-overhead principle of C++ seems to be set aside here.
GCC Impl: https://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a00775_source.html
I believe clang is similar (used sizeof to confirm)

I am not able to understand how the zero overhead principle of C++ seems to be avoided here.
The zero-overhead principle is a principle, not an absolute rule of C++.
Many people use std::vector in contexts where a compile-time fixed capacity would be useful. Such a type could store only two pointers instead of three and thus be a third smaller. Many people use std::string in contexts where an immutable string would work just as well, if not better; it would reduce the size of the string (ignoring SSO), as well as its complexity. And so forth.
These all represent inefficiencies relative to the standard type. No standard library type can handle every possible usage scenario. The goal for such types is to be broadly useful, not perfect.
There is nothing preventing someone from writing a bitset-style type with the exact same interface which has a user-provided underlying type. But the standard has no such type.
Indeed, there's nothing preventing implementations of bitset from choosing an underlying type based on the given number of bits. Your implementation doesn't do that, but it could have.
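To illustrate that last point, here is a sketch of how an implementation could pick its word type from N. This is hypothetical (names like smallest_word_t are mine, and no mainstream standard library does exactly this), but it shows nothing in the interface prevents it:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical: pick the narrowest unsigned word that holds N bits.
template <std::size_t N>
using smallest_word_t =
    std::conditional_t<(N <= 8),  std::uint8_t,
    std::conditional_t<(N <= 16), std::uint16_t,
    std::conditional_t<(N <= 32), std::uint32_t,
                                  std::uint64_t>>>;

template <std::size_t N, class Word = smallest_word_t<N>>
class small_bitset {
    static constexpr std::size_t bits_per_word = 8 * sizeof(Word);
    static constexpr std::size_t words = (N + bits_per_word - 1) / bits_per_word;
    Word data_[words ? words : 1] = {};
public:
    constexpr bool test(std::size_t pos) const {
        return (data_[pos / bits_per_word] >> (pos % bits_per_word)) & 1u;
    }
    constexpr void set(std::size_t pos) {
        data_[pos / bits_per_word] |= Word(Word(1) << (pos % bits_per_word));
    }
};

static_assert(sizeof(small_bitset<8>)  == 1, "one byte, not one long");
static_assert(sizeof(small_bitset<32>) == 4, "four bytes");
```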

Related

Is it a good idea to base a non-owning bit container on std::vector<bool>? std::span?

In a couple of projects of mine I have had an increasing need to deal with contiguous sequences of bits in memory - efficiently (*). So far I've written a bunch of inline-able standalone functions, templated on the choice of a "bit container" type (e.g. uint32_t), for getting and setting bits, applying 'or' and 'and' to their values, locating the container, converting lengths in bits to sizes in bytes or lengths in containers, etc. ... it looks like it's class-writing time.
I know the C++ standard library has a specialization of std::vector<bool>, which is considered by many to be a design flaw, as its iterators do not expose actual bools but rather proxy objects. Whether that's a good idea or a bad one for a specialization, it's definitely something I'm considering: an explicit bit proxy class, which will hopefully "always" be optimized away (with a nice greasing-up of constexpr, noexcept and inline). So I was thinking of possibly adapting std::vector code from one of the standard library implementations.
On the other hand, my intended class:
Will never own the data / the bits - it'll receive a starting bit container address (assuming alignment) and a length in bits, and won't allocate or free.
It will not be able to resize the data, dynamically or otherwise - not even while retaining the same amount of space like std::vector::resize(); its length will be fixed during its lifespan/scope.
It shouldn't know anything about the heap (and should work when there is no heap).
In this sense, it's more like a span class for bits. So maybe start out with a span then? I don't know, spans are still not standard; and there are no proxies in spans...
So what would be a good basis (edit: NOT a base class) for my implementation? std::vector<bool>? std::span? Both? None? Or - maybe I'm reinventing the wheel and this is already a solved problem?
Notes:
The bit sequence length is known at run time, not compile time; otherwise, as @SomeProgrammerDude suggests, I could use std::bitset.
My class doesn't need to "be-a" span or "be-a" vector, so I'm not thinking of specializing any of them.
(*) - So far not SIMD-efficiently but that may come later. Also, this may be used in CUDA code where we don't SIMDize but pretend the lanes are proper threads.
Rather than std::vector or std::span I suspect an implementation of your class would share more in common with std::bitset, since it is pretty much the same thing, except with a (fixed) runtime-determined size.
In fact, you could probably take a typical std::bitset implementation and move the <size_t N> template parameter into the class as a size_t size_ member (or whatever name you like), and you'll have your dynamic bitset class with almost no changes. You may want to get rid of anything you consider cruft, like the constructors that take std::string and friends.
The last step is then to remove ownership of the underlying data: basically you'll remove the creation of the underlying array in the constructor and maintain a view of an existing array with some pointers.
If your clients disagree on what the underlying unsigned integer type to use for storage (what you call the "bit container"), then you may also need to make your class a template on this type, although it would be simpler if everyone agreed on say uint64_t.
As far as std::vector<bool> goes, you don't need much from it: everything vector does that you want, std::bitset probably does too. The main thing vector adds is dynamic growth - but you've said you don't want that. vector<bool> has the proxy-object concept to represent a single bit, but so does std::bitset.
From std::span you take the idea of non-ownership of the underlying data, but I don't think this actually represents a lot of underlying code. You might want to consider the std::span approach of having either a compile-time known size or a runtime provided size (indicated by Extent == std::dynamic_extent) if that would be useful for you (mostly if you sometimes use compile-time sizes and could specialize some methods to be more efficient in that case).
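Putting those pieces together, a minimal sketch of such a view might look like the following (all names are hypothetical; iterators, error handling and the compile-time-extent variant are omitted):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical non-owning bit view over caller-provided words.
template <class Word = std::uint64_t>
class bit_span {
    Word*       data_;  // not owned; caller guarantees lifetime and alignment
    std::size_t size_;  // length in bits, fixed for the view's lifetime
    static constexpr std::size_t wbits = 8 * sizeof(Word);

public:
    constexpr bit_span(Word* data, std::size_t size_in_bits)
        : data_(data), size_(size_in_bits) {}

    // Proxy standing in for a bool&, like vector<bool>::reference.
    class reference {
        Word* word_;
        Word  mask_;
    public:
        constexpr reference(Word* w, Word m) : word_(w), mask_(m) {}
        constexpr operator bool() const { return (*word_ & mask_) != 0; }
        constexpr reference& operator=(bool b) {
            if (b) *word_ |= mask_; else *word_ &= ~mask_;
            return *this;
        }
    };

    constexpr std::size_t size() const { return size_; }
    constexpr reference operator[](std::size_t i) {
        return reference(&data_[i / wbits], Word(Word(1) << (i % wbits)));
    }
    constexpr bool operator[](std::size_t i) const {
        return (data_[i / wbits] >> (i % wbits)) & 1u;
    }
};
```

The caller keeps ownership of the storage; the view is just a pointer plus a bit count, exactly in the spirit of std::span.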

Why do I need to use `size_t` in C++?

As a beginner, I'm really confused about size_t. I can use int, float or other types - why declare a size_t type at all? I don't see its advantages.
I've viewed some pages, but I still can't understand it.
Its main advantage is that it's the right tool for the job.
size_t is literally defined to be big enough to represent the size of any object on your platform. The others are not. So, when you want to store the size of an object, why would you use anything else?
You can use int if you like, but you'll be deliberately choosing the inferior option that leads to bugs. I don't quite understand why you'd want to do so, but hey it's your code.
If you choose to use float, though, please tell us what program you're writing so we can avoid it. :)
Using a float would be horrible, since that would be a misuse of floating-point types; on top of that, type promotion would mean that arithmetic involving the size of anything would take place in floating point!
Using an int would also be horrible, since the specifics of int are intentionally loosely defined by the C++ standard. (It could be as small as 16 bits.)
But a size_t type is guaranteed to adequately represent the size of pretty much anything and certainly the sizes of containers in the C++ standard library. Its specific details are dependent on a particular platform and architecture. The fact that it's an unsigned type is the subject of much debate. (I personally believe it was a mistake to make it unsigned as it can mess up code using relational operators and introduce pernicious bugs that are difficult to spot).
I would advise you to use size_t whenever you want to store the sizes of classes or structures, or when you deal with raw memory (e.g. storing the size of a raw memory block, or indexing into a raw array). However, for indexing/iterating over standard containers (such as std::vector), I recommend using the underlying size type of the given container (e.g. std::vector::size_type).

Performance of std::copy of portion of std::vector

I want to copy part of a vector to itself, e.g.
size_t offset; /* some offset */
std::vector<T> a = { /* blah blah blah */};
std::copy(a.begin() + offset, a.begin() + (offset*2), a.begin());
However, I'm concerned about the performance of this approach. I'd like to have this boil down to a single memmove (or equivalent) when the types in question allow it, but still behave as one would expect when given a non-trivially-copyable type.
When the template type T is trivially copyable (in particular int64_t, if it matters), does this result in one memmove of length sizeof(T) * offset, or in offset distinct memmoves of length sizeof(T)? I assume the latter would give noticeably worse performance because it requires many separate memory reads. Or should I just assume that caching will make the performance in these situations effectively equivalent for relatively small offsets (< 100)?
In cases where the template type T is not trivially copyable, is it guaranteed to result in offset distinct calls to the copy assignment operator T::operator=, or will something stranger happen?
If std::copy doesn't yield the result I'm looking for, is there some alternative approach that would satisfy my performance constraints without just writing template-specializations of the copy code for all the types in question?
Edit: GCC 5.1.0, compiling with -O3
There are no guarantees about how standard library functions are implemented, other than the guarantees which explicitly appear in the standard which cover:
the observable effect of valid invocations, and
space and time complexity (in this case: strictly linear in the number of objects to copy, assuming that copying an object is O(1)).
So std::copy might or might not do the equivalent of memmove. It might do an element-by-element copy in a simple loop. Or it might unroll the loop. Or it might call memmove. Or it might find an even faster solution, based on the compiler's knowledge of the alignment of the data types, possibly using a vectorizing optimization.
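As an illustration of the kind of compile-time dispatch an implementation (or you) could do - this is a hand-rolled sketch, not any library's actual code:

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Sketch: bulk-move trivially copyable ranges, fall back to an
// element-by-element loop otherwise. Real implementations are more refined.
template <class T>
T* copy_sketch(const T* first, const T* last, T* out) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if constexpr (std::is_trivially_copyable_v<T>) {
        // One bulk call; memmove also tolerates overlapping ranges.
        std::memmove(out, first, n * sizeof(T));
    } else {
        for (std::size_t i = 0; i < n; ++i) out[i] = first[i];  // element-wise
    }
    return out + n;
}
```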
<rant>Contrary to what seems to be popular opinion, the authors of the standard C++ library are not in a conspiracy to slow down your code, nor are they so incompetent that anyone with a couple of months of coding experience could easily generate faster code. For particular use cases, you might be able to leverage your knowledge of the data being moved around to find a faster solution, but in general - and particularly without profiling real code - your best bet is to assume that the standard library authors are excellent coders dedicated to making your programmes as efficient as possible.
</rant>
If the question is about the standard, the answer is "anything can happen". It might do memmove(), it might not. On the other hand, if the question is about a particular implementation, then you should not ask, but instead check your implementation.
On my implementation, it is a memmove() call.
By the way, it is hard to imagine an implementation doing offset separate memory moves. It would either be a single call to memmove() or a looped element-by-element copy; issuing offset small memmove() calls just makes no sense.

How to store a boost::quantity with possible different boost::dimension

I am using the boost::units library to enforce physical consistency in a scientific project. I have read and tried several examples from the boost documentation. I am able to create my dimensions, units and quantities, and I did some calculations; it works very well. It is exactly what I expected, except that...
In my project, I deal with time series which have several different units (temperature, concentration, density, etc.) based on six dimensions. In order to allow safe and easy unit conversions, I would like to add a member to each channel class representing the dimensions and units of its time series. Also, data treatment (import, conversion, etc.) is user-driven, and therefore dynamic.
My problem is the following: because of the boost::units structure, quantities within a homogeneous system but with different dimensions have different types. Therefore you cannot directly declare a member such as:
boost::units::quantity channelUnits;
The compiler will complain that you have to specify the dimension using template angle brackets. But if you do so, you will not be able to store different types of quantities (say, quantities with different dimensions).
Then I looked at the boost::units::quantity declaration to find out whether there is a base class that I could use polymorphically. I haven't found one; instead I discovered that boost::units heavily uses template metaprogramming, which is not an issue in itself but does not fit my dynamic needs, since everything is resolved at compile time, not at run time.
After more reading, I tried to wrap different quantities in a boost::variant object (nice to meet it for the very first time).
typedef boost::variant<
    boost::units::quantity<dim1>,
    ...
> channelUnitsType;
channelUnitsType channelUnits;
I performed some tests and it seems to work. But I am not confident with boost::variant and the visitor-pattern.
My questions are the following:
Is there another - maybe better - way to get run-time type resolution?
Is dynamic_cast one of them? Unit conversions will not happen very often, and only a small amount of data is concerned.
If boost::variant is a suitable solution, what are its drawbacks?
Going deeper in my problem I read two articles providing tracks for a solution:
Kostadin Damevski, Expressing Measurements Units in Interfaces for Scientific Component Software;
Lingxiao Jiang, A Practical Type System for Validating Dimensional Units Correctness of C Programs.
The first gives good ideas for the interface implementation. The second gives a complete overview of what you must cope with.
I keep in mind that boost::units is a complete and efficient way to get dimensional consistency at compile time without runtime overhead. Still, for runtime dimensional consistency involving dimension changes, you do need a dynamic structure that boost::units does not provide. So here I am: designing a units class that will exactly fit my needs. More work to do, more satisfaction at the end...
About the original questions:
boost::variant works well for this job (it provides the dynamism boost::units is missing), and furthermore it can be serialized out of the box. Thus it is an effective approach. But it adds a layer of abstraction for a simple - I am not saying trivial - task that could be done by a single class.
Casting is achieved with boost::get<> (or a visitor) rather than dynamic_cast<>.
boost::any could be easier to implement, but serialization then becomes hard.
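For the record, the visiting itself looks like this. The sketch below uses C++17 std::variant and made-up stand-in types (Kelvin, KgPerM3) rather than real boost::units quantities, but the shape is identical with boost::variant plus boost::apply_visitor:

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Stand-ins for quantity<si::temperature>, quantity<si::mass_density>, ...
struct Kelvin  { double value; };
struct KgPerM3 { double value; };

using ChannelUnits = std::variant<Kelvin, KgPerM3>;

// The visitor dispatches on the alternative actually stored at run time.
std::string unit_symbol(const ChannelUnits& u) {
    return std::visit([](const auto& q) -> std::string {
        using T = std::decay_t<decltype(q)>;
        if constexpr (std::is_same_v<T, Kelvin>) return "K";
        else                                     return "kg/m^3";
    }, u);
}
```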
I have been thinking about this problem and came up with the following conclusions:
1. Implement type erasure (pros: nice interfaces, cons: memory overhead)
It looks impossible to store a general quantity with a common dimension without overhead; that would break one of the design principles of the library. Even type erasure won't help here.
2. Implement a convertible type (pros: nice interfaces, cons: operational overhead)
The only way I see without storage overhead is to choose a conventional (possibly hidden) system that all units are converted to and from. There is no memory overhead, but there is a multiplication overhead on almost every query of a value, a tremendous number of conversions, and some loss of precision for quantities with large exponents (think of converting to and from Avogadro's number).
3. Allow implicit conversions (pros: nice interfaces, cons: harder to debug, unexpected operational overheads)
Another option, mostly in the practical side to alleviate the problem is to allow implicit conversion at the interface level, see here: https://groups.google.com/d/msg/boost-devel-archive/JvA5W9OETt8/5fMwXWuCdDsJ
4. Template/generic code (pros: no runtime or memory overhead, conceptually correct, philosophy follows that of the library; cons: harder to debug, ugly interfaces, possible code bloat, lots of template parameters everywhere)
If you ask the library designers, they will probably tell you that you need to make your functions generic. This is possible, but it complicates the code. For example:
template<class Length>
auto square(Length l) -> decltype(l*l){return l*l;}
I use C++11 to simplify the example here (it is possible to do this in C++98), and also to show that this is becoming easier in C++11 (and even simpler in C++14 with decltype(auto)).
I know that this is not the type of code you had in mind, but it is consistent with the design of the library. You may think: well, how do I restrict this function to physical lengths and not something else? The answer is that you don't need to do this; however, if you insist, in the worst case...
template<class Length,
         class = typename std::enable_if<
             std::is_same<typename boost::units::get_dimension<Length>::type,
                          boost::units::length_dimension>::value>::type>
auto square(Length l) -> decltype(l*l) { return l*l; }
(In better cases decltype will do the SFINAE job.)
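The same enable_if shape, stripped of boost::units so it is self-contained (length_tag, tagged_quantity and the nested dimension trait are made up for the illustration):

```cpp
#include <type_traits>

struct length_tag {};
struct mass_tag {};

// Hypothetical quantity carrying its dimension as a nested type.
template <class Tag>
struct tagged_quantity {
    using dimension = Tag;
    double value;
};

// Participates in overload resolution only when the dimension is length.
template <class Q,
          class = typename std::enable_if<
              std::is_same<typename Q::dimension, length_tag>::value>::type>
double square(Q q) { return q.value * q.value; }

// square(tagged_quantity<mass_tag>{2.0}) would fail to compile.
```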
In my opinion, option 4, possibly combined with 3, is the most elegant way ahead.
References:
https://www.boost.org/doc/libs/1_69_0/boost/units/get_dimension.hpp

Is std::vector::size() allowed to require non-trivial computations? When would it make sense?

I'm reviewing a piece of code and see a class where a std::vector is stored as a member variable, and the size of that std::vector is stored as a separate member variable. Both the std::vector and its "stored copy" of the size never change during the containing object's lifetime, and the comments say the size is stored separately "for convenience and for cases when an implementation computes the size each time".
My first reaction was "WT*? Shouldn't it always be trivial to extract a std::vector's size?"
Now I've carefully read 23.2.4 of the C++ Standard and can't see anything saying whether such implementations are allowed in the first place, and I can't imagine why it would be necessary to implement std::vector in such a way that its current size needs non-trivial computation.
Is an implementation in which std::vector::size() requires some non-trivial actions allowed? When would having such an implementation make sense?
C++03 says in Table 65, found in §23.1, that size() should have constant complexity. (In C++0x, this is required for all containers.) You'd be hard-pressed to find a std::vector<> where it's not.
Typically, as Steve says, this is just the difference between two pointers, a simple operation.
I would guess that your definition of "trivial" doesn't match that of the author of the code.
If size isn't stored, I'd expect begin and end to be stored, and size to be computed as the difference of the two, and that code to be inlined. So we're basically talking two (nearby) memory accesses and a subtraction, instead of one memory access.
For most practical purposes, both of those are trivial, and if the standard library author thinks that the result of that computation isn't worth caching, then personally I am happy to accept their opinion. But the author of that code comment might think otherwise.
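A stripped-down sketch of that layout (hypothetical, but close to what mainstream implementations store):

```cpp
#include <cstddef>

// Three pointers; size() is just a pointer subtraction, typically inlined.
template <class T>
struct mini_vector {
    T* begin_;
    T* end_;       // one past the last element
    T* capacity_;  // one past the end of allocated storage

    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
};
```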
IIRC the standard says somewhere that size "should" be O(1); I'm not sure whether that's in the text for sequences or for containers, and I don't think it specifies anywhere that it must be for vector. But even if we read that as a non-requirement, there's a fundamental QOI issue here: what on earth am I doing optimizing my code for such a poor implementation at the expense of normal implementations?
If someone uses such an implementation, presumably that's because they want their code to run slowly. Who am I to judge otherwise? ;-)
It's also possible that the author of the code has profiled using a number of end-begin implementations, and measured a significant improvement by caching the size. But I think that's less likely than that the author is being too pessimistic about the worst case their code needs to handle.