How to avoid unnecessary copies with `boost::variant` recursive XML structure

Let me refer to the following example from the boost spirit tutorial about parsing
a "mini XML" data structure. My question doesn't actually have anything to do with
spirit, it's really about boost::variant, and making efficient "recursive" variant
structures.
Here's the code:
struct mini_xml;
typedef
boost::variant<
boost::recursive_wrapper<mini_xml>
, std::string
>
mini_xml_node;
struct mini_xml
{
std::string name; // tag name
std::vector<mini_xml_node> children; // children
};
In the tutorial, they go on to show how to "adapt" the struct mini_xml for boost::fusion,
and then write spirit grammars that load data into it.
However, it occurred to me recently that there's a subtle issue in this example that may lead to significant overhead.
The issue is that the variant type, mini_xml_node, is not no-throw move constructible in this example. The reason is that it contains boost::recursive_wrapper. The recursive_wrapper<T>
always represents a heap-allocated instance of T, and it does not have an empty state. When a
variant containing recursive_wrapper is moved from, a new dynamic allocation is made and a new T
is move constructed from the old T -- we can't simply take ownership of the old T, because that would leave the old variant in an empty state. (I only looked into this after I was trying to implement my own variant type -- this is indeed what boost::variant does afaict, and it is indeed not no-throw move constructible in this example.)
Because mini_xml_node is not no-throw move constructible, the container std::vector<mini_xml_node> children will be on the "slow path" -- when it uses std::move_if_noexcept, it will elect to copy the elements rather
than move them. This happens for instance every time the vector capacity increases. And when we copy a mini_xml_node, we copy all of its children.
So for instance, if I want to parse from disk an xml tree which has depth d and branching factor b, I estimate that we will end up copying each leaf of the tree about d * log b times, since pushing back into a vector b times causes reallocation about log b times, and it will happen for each ancestor of the leaf.
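To make the "slow path" concrete, here is a minimal sketch that does not use boost at all. `ThrowingMove` and `NoexceptMove` are hypothetical stand-ins (not part of the tutorial): the first plays the role of `mini_xml_node`, with a move constructor that is not `noexcept`; the second differs only in that one keyword. The counters record what `std::vector` does when it reallocates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy/move counters for the two hypothetical element types below.
struct Counters {
    std::size_t copies = 0;
    std::size_t moves = 0;
};

Counters throwing_counts;  // updated by ThrowingMove
Counters noexcept_counts;  // updated by NoexceptMove

// Stand-in for a type like mini_xml_node: the move constructor may throw.
struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++throwing_counts.copies; }
    ThrowingMove(ThrowingMove&&) { ++throwing_counts.moves; }  // not noexcept
};

// Same type, but the move constructor is declared noexcept.
struct NoexceptMove {
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++noexcept_counts.copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++noexcept_counts.moves; }
};

// Appends n default-constructed elements in place, so every copy or move
// that gets counted comes from the vector reallocating its storage.
template <typename T>
void grow(std::size_t n) {
    std::vector<T> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.emplace_back();
    }
}

// Usage: reallocation copies ThrowingMove but moves NoexceptMove.
void check_growth() {
    grow<ThrowingMove>(100);
    grow<NoexceptMove>(100);
    assert(throwing_counts.moves == 0 && throwing_counts.copies > 0);
    assert(noexcept_counts.copies == 0 && noexcept_counts.moves > 0);
}
```

With a real `mini_xml_node`, each of those counted copies would be a deep copy of a whole subtree.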
I don't have an actual application right now where I care about this overhead, but it's easy for me to imagine that
I might. For instance I might want to write a high-performance parser using spirit for some small utility program
that will check that e.g. a data file has a certain form, or count some statistic of it or something. And then it
might very well be that the time to parse dominates the overall runtime of the utility, so these copies might be performance critical.
The question is, what is the simplest and cleanest way to change the code to prevent these copies from happening? I thought of a few approaches
but maybe you see a better way.
Quick and dirty: Write a wrapper class that contains a mini_xml_node but the move constructor is explicitly marked
noexcept, even though it isn't really noexcept. This will cause termination if an exception is thrown, but that should be pretty rare...
Use an alternative to std::vector that doesn't use std::move_if_noexcept, and instead just uses move. If I understand right, boost::vector does this. The tradeoff is it doesn't have strong exception safety, but it's not clear if that's really a problem here.
(This one doesn't actually work.) Add boost::blank as one of the options for mini_xml_node.
When blank is one of the value types, boost::variant has special code that adapts its behavior -- it will use
blank as a natural empty state when default constructing and when making type-changing assignments.
However, it appears that putting blank into boost::variant doesn't go so far as allowing me to no-throw move construct
the variant even though it contains a boost::recursive_wrapper type. Tested on coliru with boost 1.63:
http://coliru.stacked-crooked.com/a/87620e443470b70c
(Maybe this would be a useful feature for boost::variant in the future? What I'd like is that the move constructor transfers ownership of the recursively-wrapped guy, and puts the old variant in the blank state, and marks the move constructor as noexcept.)
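For reference, the "quick and dirty" wrapper from the first approach could look roughly like the sketch below. The names `assume_noexcept_move` and `may_throw_on_move` are made up for illustration; `may_throw_on_move` only stands in for `mini_xml_node` so that the example compiles without boost. If the wrapped move ever does throw, `std::terminate` is called.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for mini_xml_node: movable, but not noexcept-movable.
struct may_throw_on_move {
    std::string text;
    may_throw_on_move() = default;
    explicit may_throw_on_move(std::string t) : text(std::move(t)) {}
    may_throw_on_move(const may_throw_on_move&) = default;
    may_throw_on_move(may_throw_on_move&& other) : text(std::move(other.text)) {}
    may_throw_on_move& operator=(const may_throw_on_move&) = default;
};

// Wrapper that promises a noexcept move on behalf of T.
// An exception escaping T's move constructor terminates the program.
template <typename T>
class assume_noexcept_move {
public:
    assume_noexcept_move() = default;
    explicit assume_noexcept_move(T value) : value_(std::move(value)) {}
    assume_noexcept_move(const assume_noexcept_move&) = default;
    assume_noexcept_move& operator=(const assume_noexcept_move&) = default;
    assume_noexcept_move(assume_noexcept_move&& other) noexcept
        : value_(std::move(other.value_)) {}

    T& get() noexcept { return value_; }
    const T& get() const noexcept { return value_; }

private:
    T value_;
};

static_assert(!std::is_nothrow_move_constructible<may_throw_on_move>::value,
              "the stand-in is meant to have a potentially throwing move");
static_assert(std::is_nothrow_move_constructible<
                  assume_noexcept_move<may_throw_on_move>>::value,
              "the wrapper puts std::vector back on the move path");

// Usage: the vector may reallocate several times; elements are moved, not copied.
std::vector<assume_noexcept_move<may_throw_on_move>> make_children(std::size_t n) {
    std::vector<assume_noexcept_move<may_throw_on_move>> children;
    for (std::size_t i = 0; i < n; ++i) {
        children.emplace_back(may_throw_on_move(std::to_string(i)));
    }
    return children;
}
```

The cost is that every access goes through `get()`, and the wrapper would also need to be adapted for fusion/spirit, which is not shown here.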

Related

How to implement/use: class for a dynamic array of fixed size (known only at run time)

I'm introducing myself to C++, and sadly it's starting to seem like the support for dynamically created arrays of fixed size (but with the size known only at run time) is very poor in C++, as new[] can't call an arbitrary user-specified constructor with user-set arguments.
Consider class A which has a number of constructors, each with some parameters. Assume that a constructor without parameters would be useless (I don't want to have to write one if I essentially don't need it). I guess the following doesn't matter, but, just in case: assume that A contains only a possibly large std::vector<Internal> (Internal is a private class, T and S parameterize A) and an integer counter as far as data members go. Also, A is parameterized.
Assume we want n instances of A stored contiguously in memory as an array, where n is determined at run time and constant afterwards. We want to be able create and initialize the structure with a single call that passes arguments to a constructor of A, or something similar. So each instance in the array gets the same, but programmatic initialization. EDIT: sorry, I didn't mean to say I want O(1) initialization, as that's impossible, I just wanted O(n) initialization, but so that I can create the array in one statement. I.e., so that I don't have to write an initialization loop for every array I create.
A possible, but suboptimal, solution is std::vector<A<T,S>>, but assume we can't live with the inefficiency. (Remember that std::vector supports resizing.)
How to implement and/or use an efficient solution with a nice API?
I would prefer a solution that doesn't reimplement half of the standard library, i.e. consider C++20 features and the standard library available for the implementation. Also, don't make me violate the C++ aliasing rules.
A possibly related question is why is such a "fixed_size_vector" class missing from the standard library?
(BTW: not that it matters, but please don't say "just use vector", because in this case I'm indeed going to go with the mentioned suboptimal solution, as the performance is not significant for my toy program, but in the real world the performance will matter one day and I want to be prepared. EDIT: I did not mean I want to optimize my toy program, rather I was referring to the fact that one day I will have to optimize some other program.)
EDIT: answering to some commenters: wrapping std::vector could provide the right abstraction, but it would be unnecessarily inefficient. A comment linked a question whose top answer explains this nicely:
dynarray is smaller and simpler than vector, because it doesn't need
to manage separate size and capacity values, and it doesn't need to
store an allocator
(dynarray here was a proposed addition to stdlib that seems to be what I wanted, except that it was also supposed to rely on special compiler support for some of its semantics). Of course, this difference compared to std::vector won't matter most of the time, but it would still be good if I was able to simply use the right tool for the job.

There is a proposal to add a fixed capacity vector to the standard.
Note that this proposal proposes the capacity be known at compile-time, so it's not applicable in your case.
There are also some open source libraries that implement one, e.g., Boost's static_vector. If you really want a fixed-capacity vector, you can use one of the open source implementations that exist out there.
If you really know what you're doing, you could write one on your own, but that's not the case for >99% of C++ users.
However, it should be noted that reserve()ing space on a vector will probably have the effect you want, and there's probably no need for an actual fixed capacity vector.

Since you mention that the size is only known at runtime this is exactly what std::vector is meant to be used for.
template <typename T, typename...Args>
auto make_vector(std::size_t size, const Args&...args) -> std::vector<T>
{
auto result = std::vector<T>{};
result.reserve(size); // whatever the known size is
for (std::size_t i = 0; i < size; ++i) { // std::size_t avoids a signed/unsigned comparison
result.emplace_back(args...);
}
return result;
}
// Use like:
auto vec = make_vector<std::string>(20, "hello world");
This will pre-allocate enough room for size entries of type T, and the loop will call T's constructor with whatever arguments you pass it.
Be aware that:
No additional constructors are called.
No extra memory is used.
No copies or relocations are performed.
The returned vector is not copied (or even moved) with c++17 or above thanks to guaranteed copy elision.
Doing this is as optimal as you can get whether you use a specialized container or otherwise. This is why every experienced C++ developer will tell you the same thing: std::vector is the solution.[2]
Note: The above function uses const Args&... for propagation and not proper forwarding references, since rvalue references could result in use-after-move bugs.[1]
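To illustrate the use-after-move hazard from the note, here is a sketch of what a forwarding variant would do. `make_vector_forwarding` is a hypothetical counterpart of `make_vector`, written only to show the problem; it is not a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical variant of make_vector that perfectly forwards one argument.
// With an rvalue argument, the first emplace_back moves from it and every
// later iteration receives the moved-from object.
template <typename T, typename Arg>
std::vector<T> make_vector_forwarding(std::size_t size, Arg&& arg) {
    std::vector<T> result;
    result.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
        result.emplace_back(std::forward<Arg>(arg));
    }
    return result;
}

// Usage: only the first element owns the int; the rest are null,
// because a moved-from std::unique_ptr is guaranteed to be empty.
void check_use_after_move() {
    auto v = make_vector_forwarding<std::unique_ptr<int>>(
        3, std::make_unique<int>(42));
    assert(v[0] != nullptr && *v[0] == 42);
    assert(v[1] == nullptr);
    assert(v[2] == nullptr);
}
```

For most other types the moved-from state is unspecified rather than null, which makes the bug harder to notice.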
A specialized container like a fixed_size_vector that you mention will either be one of two things:
Fixed at compile-time on the max size, in which case it wouldn't work for you since you mentioned the size is only known at runtime
Fixed at runtime on the max size, in which case it will do exactly what I suggested above, since it will reserve the storage space up-front.
It is not possible at the language level to dynamically construct N objects only known at runtime using a custom constructor. Full stop. This could be done if the sequence is known at compile-time, but not runtime.
C++ is statically compiled, so we cannot variadically expand a runtime n value into a pack of T{...} constructor calls; it's simply not possible. This means there will be a loop every time. Thus the most optimal thing you can do is allocate n objects once, and call T's constructor n times.
[1] A short-hand syntax for passing a list of arguments to all of a sequence's constructors is not a good general solution in C++. In fact, it would be suboptimal. It would either force copies via const lvalue references, or it would allow rvalues, in which case only the first object constructed gets a valid value and everything after it receives a moved-from object! Just imagine a unique_ptr passed to a sequence of T's: only the first instance gets a valid pointer, and every other one receives nullptr.
[2] Honestly, about the only real optimization you might be able to make on this solution would be to use a custom allocator, such as a std::pmr::vector with a stack-allocated memory buffer resource.
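As a rough idea of what footnote [2] refers to, the sketch below backs a `std::pmr::vector` with a buffer on the stack via `std::pmr::monotonic_buffer_resource` (C++17). The function name and the 4096-byte buffer size are arbitrary choices for the example; if the buffer runs out, the resource falls back to its upstream allocator (by default the global heap).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Sums 0..n-1 using a vector whose storage comes from a stack buffer.
int sum_first_n(int n) {
    std::array<std::byte, 4096> buffer;  // arbitrary size for the sketch
    std::pmr::monotonic_buffer_resource resource(buffer.data(), buffer.size());

    std::pmr::vector<int> values(&resource);
    values.reserve(static_cast<std::size_t>(n));  // one allocation up front
    for (int i = 0; i < n; ++i) {
        values.push_back(i);
    }

    int sum = 0;
    for (int v : values) {
        sum += v;
    }
    return sum;
}

// Usage: 0 + 1 + ... + 9
void check_sum() {
    assert(sum_first_n(10) == 45);
}
```

Whether this is actually faster than a plain `std::vector` depends on the workload, so it is worth measuring before adopting it.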
Footnote
I strongly advise you to get over the "efficiency first" mentality. Most developers' intuition on what is and is not efficient is wrong; this is why profilers are so important. Things like speculative execution, cache locality, and pipelining play a huge role in performance -- and these things are far more complex than simply constructing a dynamic array of objects.
Real software is written for other developers, not for the machine. It's better to have code that is maintainable and scalable, and optimized in places where bottlenecks have been identified through proper tooling.

std::unique_ptr::release() vs std::move()

I have a class that represents a runtime context and builds a tree, the tree root is held in a unique_ptr. When building the tree is done I want to extract the tree. This is how it looks (not runnable, this is not a debugging question):
class Context {
private:
std::unique_ptr<Node> root{new Node{}};
public:
// imagine a constructor, attributes and methods to build a tree
std::unique_ptr<Node> extractTree() {
return std::move(this->root);
}
};
So I used std::move() to extract the root node from the Context instance.
However there are alternatives to using std::move() e.g.:
std::unique_ptr<Node> extractTree() {
// This seems less intuitive to me
return std::unique_ptr<Node>{this->root.release()};
}
Is std::move() the best choice?

You should definitely go with the first version, as the second one basically does everything the first version does with more code and less readability.
Philosophically speaking, the second version moves the unique pointer, no more, no less. So why go around the houses instead of using the already existing, more readable and more idiomatic std::unique_ptr(std::unique_ptr&&)?
And lastly, if your smart pointer holds a custom deleter somewhere in the future, the first version will make sure the deleter moves as well, whereas the second version does not move the deleter. I can definitely imagine a programmer using release() to build a unique_ptr without a custom deleter out of a unique_ptr with one. With move semantics, that program will fail to compile.
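The deleter point can be sketched as follows. `CountingDeleter` and the two helper functions are hypothetical names for this example; the pointee is an `int` rather than a `Node` to keep it short.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stateful deleter that records how often it ran.
struct CountingDeleter {
    int* calls;
    void operator()(int* p) const {
        ++*calls;
        delete p;
    }
};

using CountedPtr = std::unique_ptr<int, CountingDeleter>;

// Moving transfers the pointer and the deleter together.
CountedPtr extract_by_move(CountedPtr& root) {
    return std::move(root);
}

// With release(), the deleter has to be carried over by hand. Writing
// std::unique_ptr<int>{root.release()} instead would compile, but would
// silently fall back to the default deleter.
CountedPtr extract_by_release(CountedPtr& root) {
    CountingDeleter deleter = root.get_deleter();
    return CountedPtr(root.release(), deleter);
}

// Usage: the deleter travels with the pointer and runs exactly once.
void check_deleter() {
    int calls = 0;
    {
        CountedPtr root(new int(7), CountingDeleter{&calls});
        CountedPtr out = extract_by_move(root);
        assert(!root && *out == 7);
        assert(out.get_deleter().calls == &calls);
    }
    assert(calls == 1);
}
```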

What you're doing is dangerous. Once you've called extractTree(), you must not call it a second time, which is not clear from the interface. You may want to reconsider your design (e.g. a shared_ptr might do a better job, or simply store the root as a raw pointer and manually take care of the deallocation).
Anyway, using std::move is the better option of the two if you want to stick with your design, as it makes your intent clearer.
EDIT: apparently "must not" expresses a prohibition in English, which I was not aware of. It is fine to call the function twice or as many times as you want, but after the first call it will no longer return a pointer to a valid object.

Tradeoffs when returning a collection

There are various ways of returning a collection of items from a method of a class in C++.
For example, consider the class MessageSpy that listens on all Messages sent over a connection. A client could access the messaging information in a number of ways.
1. const CollectionClass MessageSpy::getMessages()
2. Iterator MessageSpy::begin(), Iterator MessageSpy::end()
3. void MessageSpy::getMessages(OutputIterator)
4. void MessageSpy::eachMessage(Functor)
5. others...
Each approach has its trade-offs. For example: Approach 1 would require copying the whole collection which is expensive for large collections. While approach 2 makes the class look like a collection which is inappropriate for a view...
Since I'm always struggling to choose the most appropriate approach, I wonder what you consider the trade-offs/costs when weighing these approaches?

I suggest an iterator based/callback based approach in cases where you demand the most lightweight solution possible.
The reason is that it decouples the supplier from the usage patterns by the consumer.
In particular, slamming the result into a collection [1] (even though the result may be "optimized", most likely through (N)RVO or by moving instead of copying the object) would still allocate a complete container for the full capacity.
Edit: [1] an excellent addition to the "obligatory papers" (they're not; they're just incredibly helpful if you want to understand things): "Want Speed? Pass by Value" by Dave Abrahams.
Now
this is overkill if the consumer actually stops processing data after the first few elements
for(auto f=myType.begin(), l=myType.end(); f!=l; ++f)
{
if (!doProcessing(*f))
break;
}
this can be suboptimal even if the consumer processes all elements eventually: there might not be a need to have all elements copied at any particular moment, so the 'slot' for the 'current element' can be reused, reducing memory requirements and increasing cache locality. E.g.:
for(auto f=myType.begin(), l=myType.end(); f!=l; ++f)
{
myElementType const& slot = *f; // making the temp explicit
doProcessing(slot);
}
Note that iterator interfaces are simply still superior if the consumer did want a collection containing all elements:
std::vector<myElementType> v(myType.begin(), myType.end());
// look: the client gets to _decide_ what container he wants!
std::set<myElementType, myComparer> s(myType.begin(), myType.end());
Try getting this flexibility otherwise.
Finally, there are some elements of style:
by nature it's easy to expose (const) references to the elements using iterators; this makes it much easier to avoid object slicing and to enable clients to use the elements polymorphically.
iterator-style interfaces could be overloaded to return non-const references on dereference. A container to be returned, couldn't contain references (directly)
if you adhere to the requirements of range-based-for in C++11 you can have some syntactic sugar:
for (auto& slot : myType)
{
doProcessing(slot);
}
Finally, (as shown above), in the general sense iterators work nicely with the standard library.
The callback style (and similarly the output-iterator style) has many of the benefits of the iterator style (namely, you could use return values to abort iteration halfway, and you could do processing without allocating copies of all elements up front), but it seems to me to be slightly less flexible in use. Of course, there may be situations where you want to encourage a particular usage pattern, and this might be a good way to go.
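A minimal sketch of how the iterator and callback styles could sit side by side on `MessageSpy` is shown below. The `std::string` message type, the `record` member and the "return false to stop" convention are assumptions made for the example; the question does not specify them.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class MessageSpy {
public:
    using const_iterator = std::vector<std::string>::const_iterator;

    // Hypothetical hook that the connection would call for each message.
    void record(std::string message) { messages_.push_back(std::move(message)); }

    // Iterator style: the client decides how to traverse and what to copy.
    const_iterator begin() const { return messages_.begin(); }
    const_iterator end() const { return messages_.end(); }

    // Callback style: visit returns false to abort the iteration early.
    template <typename Functor>
    void eachMessage(Functor visit) const {
        for (const std::string& message : messages_) {
            if (!visit(message)) {
                return;
            }
        }
    }

private:
    std::vector<std::string> messages_;
};

// Usage: stop after two messages without copying anything.
int count_first_two(const MessageSpy& spy) {
    int seen = 0;
    spy.eachMessage([&seen](const std::string&) { return ++seen < 2; });
    return seen;
}
```

Because `begin()` and `end()` are provided, the same class also works with range-based for and with container constructors that take an iterator pair.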

The first thing (you somehow didn't mention at all) I would think about is
const CollectionClass& MessageSpy::getMessages()
Note the &. That returns you const reference which can't be modified but can be freely accepted.
No copying, unless you really want to copy.
If that's not suitable, Qt, for example, uses "implicit data sharing" for plenty of classes.
I.e. your classes are "kinda" returned by value, BUT their internal data is shared until you attempt to perform write operation on one of them. In this case, class you're trying to write into, performs a deep copy, and data stops being shared. That means less data is moved around.
And there's return value optimization, which some people on SO seem to love too much. Basically, when you return something big by value, some compilers in certain situations can eliminate the extra copy and construct the value directly in its destination, which may be faster than returning by reference. I wouldn't rely on it too much, but if you profiled your code and figured out that using RVO provides a good speed-up, then it is worth using.
I wouldn't recommend "iterators", because using them on C++03 compiler without auto keyword is royal pain in the #&#. Long names or many typedefs. I would return const reference to container itself instead.

How much work should constructor of my class perform?

I have a class that represents a data stream, it basically
reads or writes into a file, but first the data are being encrypted/decrypted and there is also an underlying codec object that handles the media being accessed.
I'm trying to write this class in a RAII way and I'd like a clean, nice, usable design.
What bothers me is that right now there is a lot of work being done in the constructor.
Before the object's I/O routines can be safely used, first of all the codec needs to be initialized (this isn't very demanding), but then a key is taken into account and crypto and other things are initialized - these require some analysis of the media which takes quite a lot of computation.
Right now I'm doing all this in the constructor, which makes it take a long time. I'm thinking of moving the crypto init stuff (most work) out of the ctor into a separate method (say, Stream::auth(key)), but then again, this would move some responsibility to the user of the class, as they'd be required to run auth() before they call any I/O ops. This also means I'd have to place a check in the I/O calls to verify that auth() had been called.
What do you think is a good design?
P.S. I did read similar questions but I wasn't really able to apply the answers to this case. They're mostly like "It depends"... :-/
Thanks

The only truly golden unbreakable rule is that the class must be in a valid, consistent, state after the constructor has executed.
You can choose to design the class so that it is in some kind of "empty"/"inactive" state after the constructor has run, or you can put it directly into the "active" state that it is intended to be in.
Generally, it should be preferred to have the constructor construct your class. Usually, you wouldn't consider a class fully "constructed", until it's actually ready to be used, but exceptions do exist.
However, keep in mind that in RAII, one of the key ideas is that the class shouldn't exist unless it is ready, initialized and usable. That's why its destructor does the cleanup, and that's why its constructor should do the setup.
Again, exceptions do exist (for example, some RAII objects allow you to release the resource and perform cleanup early, and then have the destructor do nothing.)
So at the end of the day, it depends, and you'll have to use your own judgment.
Think of it in terms of invariants. What can I rely on if I'm given an instance of your class? The more I can safely assume about it, the easier it is to use. If it might be ready to use, and might be in some "constructed, but not initialized" state, and might be in a "cleaned up but not destroyed" state, then using it quickly becomes painful.
On the other hand, if it guarantees that "if the object exists, it can be used as-is", then I'll know that I can use it without worrying about what was done to it before.
It sounds like your problem is that you're doing too much in the constructor.
What if you split the work up into multiple smaller classes? Have the codec be initialized separately, then I can simply pass the already-initialized codec to your constructor. And all the authentication and cryptography stuff and whatnot could possibly be moved out into separate objects as well, and then simply passed to "this" constructor once they're ready.
Then the remaining constructor doesn't have to do everything from scratch, but can start from a handful of helper objects which are already initialized and ready to be used, so it just has to connect the dots.
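As a rough sketch of that split: `Codec`, `Cipher` and `Stream` below are hypothetical classes, and the XOR "encryption" is only a placeholder for the expensive key and media analysis described in the question. The point is that `Stream`'s constructor only stores collaborators that are already fully constructed.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Cheap to construct: just remembers which media it handles.
class Codec {
public:
    explicit Codec(std::string media) : media_(std::move(media)) {}
    const std::string& media() const { return media_; }

private:
    std::string media_;
};

// The expensive part lives here. The toy key schedule (sum of key bytes)
// stands in for the real analysis of the media.
class Cipher {
public:
    Cipher(const Codec& codec, const std::string& key) : mask_(0) {
        for (unsigned char c : codec.media() + key) {
            mask_ = static_cast<unsigned char>(mask_ + c);
        }
    }
    // XOR is its own inverse, so this both encrypts and decrypts.
    std::string apply(std::string data) const {
        for (char& c : data) {
            c = static_cast<char>(static_cast<unsigned char>(c) ^ mask_);
        }
        return data;
    }

private:
    unsigned char mask_;
};

// The stream's constructor just connects the ready-made pieces.
class Stream {
public:
    Stream(Codec codec, Cipher cipher)
        : codec_(std::move(codec)), cipher_(std::move(cipher)) {}
    std::string encode(const std::string& plain) const { return cipher_.apply(plain); }
    std::string decode(const std::string& secret) const { return cipher_.apply(secret); }

private:
    Codec codec_;
    Cipher cipher_;
};
```

A caller would construct `Codec`, then `Cipher`, then `Stream`, and can choose when to pay for the `Cipher` step, while every object that exists is still fully usable.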

you could just place the check in the IO calls to see if auth has been called, and if it has, then continue, if not, then call it.
this removes the burden from the user, and delays the expense until needed.

Basically, this all boils down to which design to choose from the following three:
Designs
Disclaimer: this post is not encouraging the use of exception specifications or exceptions for that matter. The errors may equivalently be reported using error codes if you wish. Exception specifications as used here are just meant to illustrate when different errors can occur using a concise syntax.
Design 1
This is the most recurring design out there, and totally non-RAII. The constructor just puts the object in some stale state and each instance must be initialized manually after construction takes place.
class SecureStream
{
public:
SecureStream();
void initialize(Stream&,const Key&) throw(InvalidKey,AlreadyInitialized);
std::size_t get( void*,std::size_t) throw(NotInitialized,IOError);
std::size_t put(const void*,std::size_t) throw(NotInitialized,IOError);
};
Pros:
Users have control over when to invoke the "heavy" initialization process
The object can be created before the key exists. This is important for frameworks such as COM, where all objects must have a default constructor (CoCreateInstance() does not allow you to forward extra arguments to the object's constructor). Sometimes, there are still workarounds, such as a builder object.
Cons:
Objects must be checked for the stale state before using the object. This may be enforced by the object by returning an error code or throwing an exception. Personally, I hate objects that allow me to use them and just appear to ignore my calls (e.g. a failed std::ostream).
Design 2
This is the RAII approach. Make sure the object is 100% usable with no extra artefacts (e.g. manually calling stream.initialize(...); on each instance).
class SecureStream
{
public:
SecureStream(Stream&,const Key&) throw(InvalidKey);
std::size_t get( void*,std::size_t) throw(IOError);
std::size_t put(const void*,std::size_t) throw(IOError);
};
Pros:
The object can always be assumed to be in a valid state. This is so much simpler to use.
Cons:
Constructor might take a long time to execute.
All required arguments must be available at the instance construction. This has once in a while been a problem for me, especially if most other objects in the code base use design #1.
Design 3
Somewhat of a compromise between the two previous cases. Don't initialize yet, but have the other methods lazily invoke the internal .initialize(...) method when necessary.
class SecureStream
{
public:
SecureStream(Stream&,const Key&);
std::size_t get( void*,std::size_t) throw(InvalidKey,IOError);
std::size_t put(const void*,std::size_t) throw(InvalidKey,IOError);
private:
void initialize() throw(InvalidKey);
};
Pros:
Almost as easy to use as design #1. Almost (see below).
Cons:
If the initialization step may fail, it may now fail anywhere there is a first call to any of the public methods. Proper error handling for this scenario is extremely difficult.
Discussion
If you absolutely must pay for the initialization for every instance, then design #1 is out of the question as it just results in more bugs in the software.
The question is just about when to pay for the initialization cost. Do you prefer paying it upfront, or on first use? In most scenarios, I prefer paying upfront because I don't want to assume users can handle errors later in the program. However, there might be specific threading semantics in your program, and you might not be able to stall threads at creation time (or, conversely, at use time).
In any case, you can still get the benefits of design #3 by using dynamic allocation of the class in design #2.
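One way to read that last remark is sketched below: the design #2 class stays strictly RAII, and its owner simply allocates it on first use. `Session` is a hypothetical owner class, and this `SecureStream` is a stripped-down stand-in whose constructor takes only a key and throws `std::invalid_argument` in place of `InvalidKey`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified design #2 class: if an instance exists, it is ready to use.
class SecureStream {
public:
    explicit SecureStream(const std::string& key) {
        if (key.empty()) {
            throw std::invalid_argument("invalid key");
        }
        // ... the expensive initialization would happen here ...
    }
    std::size_t put(const std::string& data) { return data.size(); }
};

// Hypothetical owner that defers the construction until the first send().
class Session {
public:
    explicit Session(std::string key) : key_(std::move(key)) {}

    bool opened() const { return stream_ != nullptr; }

    std::size_t send(const std::string& data) { return stream().put(data); }

private:
    SecureStream& stream() {
        if (!stream_) {
            stream_ = std::make_unique<SecureStream>(key_);  // may throw
        }
        return *stream_;
    }

    std::string key_;
    std::unique_ptr<SecureStream> stream_;
};
```

The drawback listed for design #3 still applies to `Session`: an invalid key now surfaces at the first `send()` rather than at construction.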
Conclusion
Basically, if the only reason you are hesitating is for some philosophical ideal where constructors execute quickly, I would just go with the pure RAII design.

There's no hard and fast rule on this, but in general it's best to avoid heavy constructors for two reasons that come to mind (maybe others as well):
The order in which objects are created in the initializer list can give rise to subtle bugs
What to do with exceptions in the constructor? Will you need to handle partially-constructed objects in your app?

Boost shared_ptr use_count function

My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of boost::shared_ptr<foo> except for the one in the vector. This will drop the reference counter in the shared_ptr to 1. Is it practical to use shared_ptr.use_count() to see if it is equal to 1, to know when all other parts of my app are done with the data?
One additional reason I'm asking the question is that the boost documentation on the shared pointer shared_ptr recommends not using "use_count" for production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.

My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost.Pool allocator, which is generally at least somewhat similar.
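A bare-bones version of the free-list idea might look like the sketch below. The `Foo` layout and the `idle_count` helper are invented for illustration, the code is not thread-safe, and blocks on the free list are never handed back to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Foo final {
public:
    double payload[64] = {};  // stands in for the "large structure"

    // Reuse a block from the free list if there is one, else allocate.
    static void* operator new(std::size_t size) {
        if (size == sizeof(Foo) && free_list_ != nullptr) {
            FreeNode* node = free_list_;
            free_list_ = node->next;
            return node;
        }
        return ::operator new(size);
    }

    // Instead of freeing, store the link inside the dead object's memory.
    static void operator delete(void* p) noexcept {
        if (p == nullptr) {
            return;
        }
        free_list_ = ::new (p) FreeNode{free_list_};
    }

    // Helper for the example: number of blocks waiting for reuse.
    static std::size_t idle_count() {
        std::size_t n = 0;
        for (FreeNode* node = free_list_; node != nullptr; node = node->next) {
            ++n;
        }
        return n;
    }

private:
    struct FreeNode {
        FreeNode* next;
    };
    static FreeNode* free_list_;
};

Foo::FreeNode* Foo::free_list_ = nullptr;

// Usage: delete parks the block, the next new picks it up again.
void check_reuse() {
    Foo* first = new Foo;
    delete first;
    assert(Foo::idle_count() == 1);
    Foo* second = new Foo;
    assert(Foo::idle_count() == 0);
    delete second;
}
```

Code that uses `shared_ptr<Foo>` does not need to change, provided the objects are created with a plain `new Foo`; `make_shared` allocates the object together with its control block and bypasses the class's `operator new`.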

If you want to know whether or not the use count is 1, use the unique() member function.

I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.

I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.
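A sketch of the custom-deleter idea follows. It uses `std::shared_ptr` as a stand-in for `boost::shared_ptr` (both accept a deleter in the constructor); `Foo` and `FooPool` are hypothetical, and the pool is assumed to outlive every handle it gives out and to be used from a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Foo {
    double payload[64] = {};  // stands in for the large structure
};

class FooPool {
public:
    FooPool() = default;
    FooPool(const FooPool&) = delete;  // deleters capture `this`
    FooPool& operator=(const FooPool&) = delete;

    // Hands out an idle Foo, or creates one if none is idle. The deleter
    // runs when the last copy of the returned shared_ptr goes away and
    // marks the object as idle instead of destroying it.
    std::shared_ptr<Foo> acquire() {
        Foo* raw = nullptr;
        if (idle_.empty()) {
            storage_.push_back(std::make_unique<Foo>());
            idle_.reserve(storage_.size());  // so the deleter never allocates
            raw = storage_.back().get();
        } else {
            raw = idle_.back();
            idle_.pop_back();
        }
        return std::shared_ptr<Foo>(raw, [this](Foo* p) { idle_.push_back(p); });
    }

    std::size_t idle() const { return idle_.size(); }
    std::size_t total() const { return storage_.size(); }

private:
    std::vector<std::unique_ptr<Foo>> storage_;  // owns the objects
    std::vector<Foo*> idle_;                     // objects not in use
};

// Usage: the object is only idle again once every copy has been dropped.
void check_pool() {
    FooPool pool;
    {
        std::shared_ptr<Foo> a = pool.acquire();
        std::shared_ptr<Foo> b = a;
        assert(pool.idle() == 0);
    }
    assert(pool.idle() == 1 && pool.total() == 1);
}
```

The vector holds `unique_ptr`s rather than `shared_ptr`s, so the reference count tracks only the outstanding processing and `use_count()` is never consulted.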

I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. this way you will have full control over this rather than using the shared_ptr's one which, as you rightly suggest, is not recommended. You can also pre-set your own counter to allow for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.
