boost::serialization to serialize only the keys of a map - c++

I have a class with a map, and I want to serialize the class using boost serialize.
std::map<int, ComplicatedThing> stuff;
ComplicatedThing is derivable just by knowing the int. I want to serialize this efficiently. One way (ick, but works) is to make a vector of the keys and serialize the vector.
// illustrative, not test-compiled
std::vector<int> v;
for (std::map<int, ComplicatedThing>::const_iterator it = stuff.begin(); it != stuff.end(); ++it)
    v.push_back(it->first);
// or, rebuilding the map from the keys after loading:
for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
    stuff[*it] = ComplicatedThing(*it);
// ...and later, at serialize/deserialize time
template<class Archive>
void srd::leaf::serialize(Archive &ar, const unsigned int version)
{
ar & v;
}
But this is inelegant. Using BOOST_SERIALIZATION_SPLIT_MEMBER() and load/save methods, I think I should be able to skip the allocation of the intermediate vector completely. And there I am stuck.
Perhaps my answer lies in understanding boost/serialization/collections_load_imp.hpp. Hopefully there is a simpler path.

You can serialize it as a list of ints (I don't mean std::list) instead of serializing it as a container (map or vector): first write the number of elements, then write them one by one, and deserialize accordingly. It's a 10-minute task. If you need this solution in many places, wrap the map in your own class and define serialization for it.
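A minimal sketch of what that could look like in the class holding stuff (the member names come from the question; everything else is assumed and not tested):
// in the class, together with BOOST_SERIALIZATION_SPLIT_MEMBER()
template<class Archive>
void save(Archive &ar, const unsigned int /*version*/) const
{
    std::size_t count = stuff.size();
    ar & count;
    for (std::map<int, ComplicatedThing>::const_iterator it = stuff.begin();
         it != stuff.end(); ++it)
    {
        int key = it->first;
        ar & key;
    }
}
template<class Archive>
void load(Archive &ar, const unsigned int /*version*/)
{
    std::size_t count = 0;
    ar & count;
    stuff.clear();
    for (std::size_t i = 0; i != count; ++i)
    {
        int key = 0;
        ar & key;
        stuff.insert(std::make_pair(key, ComplicatedThing(key)));
    }
}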

If you want to make it not look clumsy, use range adaptors
ar & (stuff | transformed(boost::bind(&map_type::value_type::first, _1)));
Or if you include the appropriate headers, I suppose you could reduce this to
ar & (stuff | transformed(&map_type::value_type::first));
Disclaimer
All of this assumes that Boost Serialization ships with serializers for Boost Ranges (haven't checked)
This might not work well in a bidirectional serialize setting (you'll want to read http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/serialization.html#splitting)
I haven't brought the above into the vicinity of a compiler

Related

Map, pair-vector or two vectors...?

I read through some posts and "wikis" but still cannot decide what approach is suitable for my problem.
I create a class called Sample which contains a certain number of compounds (let's say this is another class, Nuclide) at a certain relative quantity (double).
Thus, something like (pseudo):
class Sample {
    map<Nuclide, double> nuclides;
};
If I had the nuclides Ba-133, Co-60 and Cs-137 in the sample, I would have to use exactly those names in code to access those nuclides in the map. However, the only thing I need to do is iterate through the map to perform calculations (which nuclides they are is of no interest), so I will use a for loop. I want to iterate without paying any attention to the key names, so I would need to use an iterator for the map, am I right?
An alternative would be a vector<pair<Nuclide, double> >
class Sample {
    vector<pair<Nuclide, double> > nuclides;
};
or simply two independent vectors
class Sample {
    vector<Nuclide> nuclides;
    vector<double> quantities;
};
while in the last option the link between a nuclide and its quantity would be "meta-information", given by the position in the respective vector only.
Due to my lack of profound experience, I'd kindly ask for suggestions on which approach to choose. I want the iteration through all available compounds to be fast and easy, and at the same time keep the logical link between the corresponding keys and values.
PS.: It's possible that the number of compounds in a sample is very low (1 to 5)!
PPS.: Could the last option be modified by some const statements to prevent changes and thus keep the correct order?
If iteration needs to be fast, you don't want std::map<...>: its iteration is a tree-walk which quickly gets bad. std::map<...> is really only reasonable if you have many mutations to the sequence and you need the sequence ordered by the key. If you have mutations but you don't care about the order std::unordered_map<...> is generally a better alternative. Both kinds of maps assume you are looking things up by key, though. From your description I don't really see that to be the case.
std::vector<...> is fast to iterate. It isn't ideal for look-ups, though. If you keep it ordered you can use std::lower_bound() to do a std::map<...>-like look-up (i.e., the complexity is also O(log n)), but the effort of keeping it sorted may make that option too expensive. However, it is an ideal container for keeping a bunch of objects together which are iterated.
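For example, a map-like lookup on a sorted vector could look roughly like this (a sketch only; it assumes the entries are kept sorted by their first member and that Nuclide has, or is given, an operator<):
#include <algorithm>
#include <utility>
#include <vector>

// returns a pointer to the stored quantity, or nullptr if the key is absent
const double* find_quantity(const std::vector<std::pair<Nuclide, double> >& entries,
                            const Nuclide& key)
{
    auto it = std::lower_bound(entries.begin(), entries.end(), key,
                               [](const std::pair<Nuclide, double>& entry, const Nuclide& k)
                               { return entry.first < k; });
    return (it != entries.end() && !(key < it->first)) ? &it->second : nullptr;
}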
Whether you want one std::vector<std::pair<...>> or rather two std::vector<...>s depends on how the elements are accessed: if both parts of an element are bound to be accessed together, you want a std::vector<std::pair<...>>, as that keeps the data which is accessed together adjacent. On the other hand, if you normally only access one of the two components, using two separate std::vector<...>s will make the iteration faster, as more iteration elements fit into a cache line, especially if they are reasonably small like doubles.
In any case, I'd recommend not exposing the internal structure to the outside world and rather providing an interface which lets you change the underlying representation later. That is, to achieve maximum flexibility you don't want to bake the representation into all your code. For example, if you use accessor function objects (property maps in terms of the BGL, or projections in terms of Eric Niebler's Range proposal) to access the elements based on an iterator, rather than accessing the elements directly, you can change the internal layout without having to touch any of the algorithms (you'll need to recompile the code, though):
// version using std::vector<std::pair<Nuclide, double> >
// - it would just use std::vector<std::pair<Nuclide, double>::iterator as iterator
auto nuclide_projection = [](std::pair<Nuclide, double>& entry) -> Nuclide& {
    return entry.first;
};
auto value_projection = [](std::pair<Nuclide, double>& entry) -> double& {
    return entry.second;
};
// version using two std::vectors:
// - it would use an iterator over indices, yielding a std::size_t for *it
struct nuclide_projector {
    std::vector<Nuclide>& nuclides;
    Nuclide& operator()(std::size_t index) const { return nuclides[index]; }
};
struct value_projector {
    std::vector<double>& values;
    double& operator()(std::size_t index) const { return values[index]; }
};
// (these would be constructed against the actual vectors, e.g.
//  nuclide_projector nuclide_projection{sample.nuclides};
//  value_projector   value_projection{sample.values};)
With one such pair of projections in place, an algorithm simply running over the elements and printing them could, for example, look like this:
template <typename Iterator>
void print(std::ostream& out, Iterator begin, Iterator end) {
    for (; begin != end; ++begin) {
        out << "nuclide=" << nuclide_projection(*begin) << ' '
            << "value=" << value_projection(*begin) << '\n';
    }
}
Both representations are entirely different but the algorithm accessing them is entirely independent. This way it is also easy to try different representations: only the representation and the glue to the algorithms accessing it need to be changed.

Boost property_tree for storing pointers

Is it possible to store pointers to objects in boost property trees, and then use an iterator to retrieve the data? I'm trying to do something like:
property_tree::ptree pt;
pt.put<MyObject*>("1.2.3.4", new MyObject());
//... more tree construction here...
and then recursively itererate through all the tree nodes with something like:
property_tree::ptree::iterator iter = treeNode.begin();
property_tree::ptree::iterator iter_end = treeNode.end();
for ( ; iter != iter_end; ++iter )
{
MyObject *obj = lexical_cast<MyObject*>(iter->second.data());
//... etc
The problem is I get the error lexical_cast.hpp:1112: error: no match for 'operator>>' in 'stream >> output' on the lexical cast line.
and adding the following to MyObject doesn't help
friend std::istream& operator>>(std::istream& in, MyObject& obj){ return in; }
I've also tried C-style casts and dynamic casts, to no avail.
Is using pointers even possible inside a ptree? I'm about to just create my own tree structure as a workaround, but I figured I'd ask here first.
Cheers.
Adding an operator>> for a reference to MyObject won't help when you're actually trying to lexical_cast to a pointer to MyObject. You could conceivably create an operator>>(std::istream&, MyObject*&). However, remember that property_tree is designed for reading configuration from text files, so you'll have the joy of converting your object to and from text.
Don't use property_tree as a generic data structure. Internally it will be expecting to deal with text.
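For illustration only (everything below is assumed, not taken from the question): to make the lexical_cast route even compile, you would need text round-tripping in both directions for the pointer type, along these lines:
#include <iostream>

struct MyObject {
    int id;
    explicit MyObject(int i = 0) : id(i) {}
};

// write the pointee as text (here just its id)
std::ostream& operator<<(std::ostream& out, MyObject* const& obj)
{
    return out << obj->id;
}

// read the text back and allocate a fresh object; note the murky ownership,
// which is one more reason property_tree is a poor fit for storing pointers
std::istream& operator>>(std::istream& in, MyObject*& obj)
{
    int id = 0;
    in >> id;
    obj = new MyObject(id);
    return in;
}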
It is looking a lot like you wanted a serialization solution. I covered some ground (including storing through pointers) in this post:
copying and repopulating a struct instance with pointers
This example also shows serialization to XML, collections containing (potentially) duplicated pointers. On deserialization, the pointers will be reconstructed faithfully (including the duplication).
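As a rough illustration of that behaviour (a minimal sketch with made-up types, not the code from the linked answer): Boost.Serialization tracks objects written through pointers, so a pointer that occurs twice is restored as two pointers to one object.
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/vector.hpp>
#include <fstream>
#include <vector>

struct Node {
    int value;
    explicit Node(int v = 0) : value(v) {}
    template<class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/)
    {
        ar & BOOST_SERIALIZATION_NVP(value);
    }
};

int main()
{
    Node* shared = new Node(42);
    std::vector<Node*> nodes;
    nodes.push_back(shared);      // the same object twice...
    nodes.push_back(shared);
    nodes.push_back(new Node(7)); // ...plus a distinct one
    {
        std::ofstream ofs("nodes.xml");
        boost::archive::xml_oarchive oa(ofs);
        oa << BOOST_SERIALIZATION_NVP(nodes);
    }
    std::vector<Node*> restored;
    {
        std::ifstream ifs("nodes.xml");
        boost::archive::xml_iarchive ia(ifs);
        ia >> boost::serialization::make_nvp("nodes", restored);
    }
    // restored[0] == restored[1] holds again: the duplication survives the round trip
    return 0;
}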

C++ how to mix a map with a circular buffer?

I wonder whether it is possible to have a map that would work like boost::circular_buffer, meaning it would have a limited size and, once it reaches that size, start overwriting the first-inserted elements. I also want to be able to search through such a buffer and find or create elements with [name]. Is it possible to create such a thing, and how would I do it?
What you want is an LRU (least recently used) Map, or LRA (least recently added) Map depending on your needs.
Implementations already exist.
Well, I don't think that structure is present out of the box in Boost (it may exist elsewhere, though), so you should create it. I wouldn't recommend using operator[](), though, at least as it is implemented in std::map, because it can make it difficult to track elements added to the map (for example, looking up a missing key with operator[]() inserts a default-constructed value for it), so go for more explicit get and put operations for adding and retrieving elements of the map.
As for the easiest implementation, I would go for using an actual map as the storage, and a deque to keep track of the keys in insertion order (not tested):
#include <deque>
#include <map>

template <typename K, typename V>
struct BoundedSpaceMap
{
    typedef std::map<K,V> map_t;
    typedef std::deque<K> deque_t;
    // ...
    typedef typename map_t::value_type value_type;
    // Reuse map's iterators
    typedef typename map_t::iterator iterator;
    // ...
    iterator begin() { return map_.begin(); }
    // put
    void put(K k, V v)
    {
        map_.insert(std::make_pair(k, v));
        deque_.push_back(k);
        _ensure(); // enforce the size limit, evicting the oldest element if needed
    }
    // ...
private:
    static const std::size_t LIMIT = 100; // capacity bound, pick to taste
    map_t map_;
    deque_t deque_;
    void _ensure() {
        if (deque_.size() > LIMIT) {
            map_.erase(deque_.front());
            deque_.pop_front();
        }
    }
};
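Rough usage of the sketch above (relying on the put() interface and the LIMIT constant assumed there):
BoundedSpaceMap<std::string, int> cache;
cache.put("alpha", 1);
cache.put("beta", 2);
// once more than LIMIT keys have been put, the oldest key is evicted from
// both the map and the deque, so a later lookup for it will fail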
Well not really a "circular buffer" since that doesn't make much sense for a map, but we can use a simple array without any additional linked lists or anything.
This is called closed hashing; the wiki article summarizes it quite nicely. Double hashing is used most often, as it avoids clustering (which leads to worse performance), but it has its own problems (locality).
Edit: Since you want a specific implementation: I don't think Boost has one, but this or this were mentioned in another SO post about closed hashing.
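To make the double-hashing idea concrete, a tiny generic sketch (not tied to either linked implementation): the i-th probe position for a key is derived from two independent hashes.
#include <cstddef>

// i-th slot probed for a key with hashes h1 and h2 in a table of table_size slots;
// forcing the step to be odd keeps it non-zero (and cycles through the whole table
// when table_size is a power of two)
std::size_t probe(std::size_t h1, std::size_t h2, std::size_t i, std::size_t table_size)
{
    return (h1 + i * (h2 | 1)) % table_size;
}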

Order a container by member with STL

Suppose I have some data stored in a container of unique_ptrs:
struct MyData {
int id; // a unique id for this particular instance
data some_data; // arbitrary additional data
};
// ...
std::vector<std::unique_ptr<MyData>> my_data_vec;
The ordering of my_data_vec is important. Suppose now I have another vector of IDs of MyDatas:
std::vector<int> my_data_ids;
I now want to rearrange my_data_vec such that the elements are in the sequence specified by my_data_ids. (Don't forget moving a unique_ptr requires move-semantics with std::move().)
What's the most algorithmically efficient way to achieve this, and do any of the STL algorithms lend themselves well to achieving this? I can't see that std::sort would be any help.
Edit: I can use O(n) memory space (not too worried about memory), but the IDs are arbitrary (in my specific case they are actually randomly generated).
Create a map that maps ids to their index in my_data_ids.
Create a function object that compares std::unique_ptr<MyData> based on their ID's index in that map.
Use std::sort to sort the my_data_vec using that function object.
Here's a sketch of this:
// Beware, brain-compiled code ahead!
typedef std::vector<int> my_data_ids_type;
typedef std::map<int, my_data_ids_type::size_type> my_data_ids_map_type;

class my_id_comparator : public std::binary_function< std::unique_ptr<MyData>
                                                     , std::unique_ptr<MyData>
                                                     , bool > {
public:
    my_id_comparator(const my_data_ids_map_type& my_data_ids_map)
        : my_data_ids_map_(my_data_ids_map) {}
    bool operator()( const std::unique_ptr<MyData>& lhs
                   , const std::unique_ptr<MyData>& rhs ) const
    {
        my_data_ids_map_type::const_iterator it_lhs = my_data_ids_map_.find(lhs->id);
        my_data_ids_map_type::const_iterator it_rhs = my_data_ids_map_.find(rhs->id);
        if( it_lhs == my_data_ids_map_.end() || it_rhs == my_data_ids_map_.end() )
            throw "dammit!"; // whatever
        return it_lhs->second < it_rhs->second;
    }
private:
    const my_data_ids_map_type& my_data_ids_map_;
};
//...
my_data_ids_map_type my_data_ids_map;
// ...
// populate my_data_ids_map with the IDs and their indexes from my_data_ids
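// e.g., one straightforward way to do that (not spelled out in the original sketch):
for (my_data_ids_type::size_type i = 0; i != my_data_ids.size(); ++i)
    my_data_ids_map[my_data_ids[i]] = i;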
// ...
std::sort( my_data_vec.begin(), my_data_vec.end(), my_id_comparator(my_data_ids_map) );
If memory is scarce, but time doesn't matter, you could do away with the map and search the IDs in the my_data_ids vector for each comparison. However, you would have to be really desperate for memory to do that, since two linearly complex operations per comparison are going to be quite expensive.
Why don't you try moving the data into an STL set? You only need to implement the comparison function, and you will end up with a perfectly ordered set of data very fast.
Why don't you just use a map<int, unique_ptr<MyData>> (or multimap)?

How to use array optimization in boost serialization

I have to serialize an object that contains a std::vector<unsigned char> that can hold thousands of elements; at those vector sizes the serialization doesn't scale well.
According to the documentation, Boost provides a wrapper class array that wraps the vector for optimization, but it generates the same XML output. Diving into the Boost code, I've found a class named use_array_optimization that seems to control the optimization but is somehow deactivated by default. I've also tried to override the serialize function, with no results.
I would like to know how to activate that optimization, since the Boost documentation is unclear on this point.
The idea behind the array optimization is that, for arrays of types that can be archived by simply "dumping" their representation as-is to the archive, "dumping" the whole array at once is faster than "dumping" one element after the other.
I understand from your question that you are using the xml archive. The array optimization does not apply in that case because the serialization of the elements implies a transformation anyway.
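For contrast, a minimal sketch (not from the original answer) of a case where the optimization can kick in: a binary archive and a vector of an arithmetic type, which the library can write as one contiguous block.
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>
#include <fstream>
#include <vector>

void save_binary(const std::vector<unsigned char>& vData)
{
    std::ofstream ofs("data.bin", std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    oa << vData; // contiguous arithmetic data can be written in one "dump"
}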
Finally, I used the BOOST_SERIALIZATION_SPLIT_MEMBER() macro and coded two functions for loading and saving. The Save function looks like:
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    using boost::serialization::make_nvp;
    std::string sdata;
    Vector2String(vData, sdata);
    ar & make_nvp("vData", sdata);
}
The Vector2String function simply takes the data in vector and format it to a std::string. The load function uses a function that reverses the encoding.
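For illustration, one possible Vector2String/String2Vector pair (the answer does not say which encoding it actually used; a plain hex encoding is assumed here):
#include <cstdio>
#include <string>
#include <vector>

void Vector2String(const std::vector<unsigned char>& v, std::string& s)
{
    s.clear();
    s.reserve(v.size() * 2);
    char buf[3];
    for (std::size_t i = 0; i != v.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(v[i]));
        s += buf;
    }
}

void String2Vector(const std::string& s, std::vector<unsigned char>& v)
{
    v.clear();
    v.reserve(s.size() / 2);
    for (std::size_t i = 0; i + 1 < s.size(); i += 2) {
        unsigned int byte = 0;
        std::sscanf(s.c_str() + i, "%2x", &byte);
        v.push_back(static_cast<unsigned char>(byte));
    }
}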
You have several ways to serialize a vector with Boost Serialization to XML.
From what I read in the comments, you are looking for Case 2 below.
I think you cannot change how std::vector is serialized by the library once you have included boost/serialization/vector.hpp; however, you can replace that code with your own, along the lines of Case 2.
0. Library default, not optimized
The first is to use the default given by the library, which as far as I know won't optimize anything:
#include <boost/archive/xml_oarchive.hpp>
#include <boost/serialization/vector.hpp>
...
std::vector<double> vec(4);
std::iota(begin(vec), end(vec), 0);
std::ofstream ofs{"default.xml"};
boost::archive::xml_oarchive xoa{ofs, boost::archive::no_header};
xoa << BOOST_NVP(vec);
output:
<vec>
<count>4</count>
<item_version>0</item_version>
<item>0.00000000000000000e+00</item>
<item>1.00000000000000000e+00</item>
<item>2.00000000000000000e+00</item>
<item>3.00000000000000000e+00</item>
</vec>
1. Manually, exploiting that the data is contiguous
#include <boost/serialization/array_wrapper.hpp> // for make_array
...
std::ofstream ofs{"array.xml"};
boost::archive::xml_oarchive xoa{ofs, boost::archive::no_header};
auto const size = vec.size();
xoa << BOOST_NVP(size) << boost::serialization::make_nvp("data", boost::serialization::make_array(vec.data(), vec.size()));
output:
<size>4</size>
<data>
<item>0.00000000000000000e+00</item>
<item>1.00000000000000000e+00</item>
<item>2.00000000000000000e+00</item>
<item>3.00000000000000000e+00</item>
</data>
2. Manually, exploiting that the data is binary and contiguous
#include <boost/serialization/binary_object.hpp>
...
std::ofstream ofs{"binary.xml"};
boost::archive::xml_oarchive xoa{ofs, boost::archive::no_header};
auto const size = vec.size();
xoa << BOOST_NVP(size) << boost::serialization::make_nvp("binary_data", boost::serialization::make_binary_object(vec.data(), vec.size()*sizeof(double)));
output:
<size>4</size>
<binary_data>
AAAAAAAAAAAAAAAAAADwPwAAAAAAAABAAAAAAAAACEA=
</binary_data>
I think this makes the XML technically not portable.