How does boost::serialization allocate memory when deserializing through a pointer?


In short, I'd like to know how boost::serialization allocates memory for an object when deserializing through a pointer. Below, you'll find an example of my question, illustrated with companion code. This code should be fully functional and compile fine; there are no errors per se, just a question about how the code actually works.
#include <cstddef> // NULL
#include <iomanip>
#include <iostream>
#include <fstream>
#include <string>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
class non_default_constructor; // Forward declaration for boost serialization namespacing below
// In order to "teach" boost how to save and load your class with a non-default-constructor, you must override these functions
// in the boost::serialization namespace. Prototype them here.
namespace boost { namespace serialization {
template<class Archive>
inline void save_construct_data(Archive& ar, const non_default_constructor* ndc, const unsigned int version);
template<class Archive>
inline void load_construct_data(Archive& ar, non_default_constructor* ndc, const unsigned int version);
}}
// Here is the actual class definition with no default constructor
class non_default_constructor
{
public:
explicit non_default_constructor(std::string initial)
: some_initial_value{initial}, state{0}
{
}
std::string get_initial_value() const { return some_initial_value; } // For save_construct_data
private:
std::string some_initial_value;
int state;
// Notice that we only serialize state here, not the
// some_initial_value passed into the ctor
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
std::cout << "serialize called" << std::endl;
ar & state;
}
};
// Define the save and load overrides here.
namespace boost { namespace serialization {
template<class Archive>
inline void save_construct_data(Archive& ar, const non_default_constructor* ndc, const unsigned int version)
{
std::cout << "save_construct_data called." << std::endl;
ar << ndc->get_initial_value();
}
template<class Archive>
inline void load_construct_data(Archive& ar, non_default_constructor* ndc, const unsigned int version)
{
std::cout << "load_construct_data called." << std::endl;
std::string some_initial_value;
ar >> some_initial_value;
// Use placement new to construct a non_default_constructor class at the address of ndc
::new(ndc)non_default_constructor(some_initial_value);
}
}}
int main(int argc, char *argv[])
{
// Now let's say that we want to save and load a non_default_constructor class through a pointer.
non_default_constructor* my_non_default_constructor = new non_default_constructor{"initial value"};
std::ofstream outputStream("non_default_constructor.dat");
boost::archive::text_oarchive outputArchive(outputStream);
outputArchive << my_non_default_constructor;
outputStream.close();
// The above is all fine and dandy. We've serialized an object through a pointer.
// non_default_constructor will call save_construct_data then will call serialize()
// The output archive file will look exactly like this:
/*
22 serialization::archive 17 0 1 0
0 13 initial value 0
*/
/*If I want to load that class back into an object at a later time
I'd declare a pointer to a non_default_constructor */
non_default_constructor* load_from_archive;
// Notice load_from_archive was not initialized with any value. It doesn't make
// sense to initialize it with a value, because we're trying to load from
// a file, not create a whole new object with "new".
std::ifstream inputStream("non_default_constructor.dat");
boost::archive::text_iarchive inputArchive(inputStream);
// <><><> HERE IS WHERE I'M CONFUSED <><><>
inputArchive >> load_from_archive;
// The above should call load_construct_data which will attempt to
// construct a non_default_constructor object at the address of
// load_from_archive, but HOW DOES IT KNOW HOW MUCH MEMORY A NON_DEFAULT_CONSTRUCTOR
// class uses?? Placement new just constructs at the address, assuming
// memory at the passed address has been allocated for construction.
// So my question is this:
// I want to verify that *something* is (or isn't) allocating memory for a non_default_constructor
// class to be constructed at the address of load_from_archive.
std::cout << load_from_archive->get_initial_value() << std::endl; // This works.
return 0;
}
Per the boost::serialization documentation, when a class with a non-default constructor is to be (de)serialized, load_construct_data/save_construct_data are used, but I'm not actually seeing a place where memory is allocated for the object to be loaded into, just a place where placement new constructs an object at a memory address. But what allocated the memory at that address?
It's probably a misunderstanding with how this line works:
::new(ndc)non_default_constructor(some_initial_value);
but I'd like to know where my misunderstanding lies. This is my first question, so I apologize if I've made some sort of mistake in how I've asked it. Thanks for your time.

That's one excellent example program, with very apt comments. Let's dig in.
// In order to "teach" boost how to save and load your class with a
// non-default-constructor, you must override these functions in the
// boost::serialization namespace. Prototype them here.
You don't have to. Any overload (not override) accessible via ADL suffices, apart from the in-class option.
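A minimal Boost-free sketch of that ADL mechanism, assuming nothing about Boost's internals (the namespaces and the names make_from and load_hook are hypothetical stand-ins, not Boost APIs): the generic code makes an unqualified call, and argument-dependent lookup finds the overload declared next to the type, without any forward declaration.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

namespace generic_lib {
// Hypothetical library entry point. load_hook is called unqualified with an
// argument of dependent type, so lookup is deferred to instantiation, where
// ADL searches the namespace of T.
template <class T>
T* make_from(void* storage, const std::string& arg) {
    T* p = static_cast<T*>(storage);
    load_hook(p, arg);  // no declaration needed here: ADL finds it later
    return p;
}
}  // namespace generic_lib

namespace mylib {
class widget {
public:
    explicit widget(std::string v) : value(std::move(v)) {}
    std::string value;
};

// Found through ADL because its first parameter type lives in mylib.
inline void load_hook(widget* p, const std::string& arg) {
    ::new (p) widget(arg);  // construct in caller-provided storage
}
}  // namespace mylib
```

The caller supplies suitably aligned storage, exactly like Boost supplies the pointer passed to load_construct_data.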
Skipping to the meat of it:
// So my question is this: I want to verify that *something* is (or isn't)
// allocating memory for a non_default_constructor
// class to be constructed at the address of load_from_archive.
Yes. The documentation states this. But it's a little trickier, because the allocation is conditional. The reason is object tracking: if we serialize multiple pointers to the same object, the object itself is serialized only once.
On deserialization, the later occurrences are represented in the archive stream only by the object's tracking id. Only the first occurrence leads to an allocation.
See documentation.
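To make the tracking rule concrete, here is a simplified model, not Boost's actual code: the Record layout, Thing and load_pointers are all hypothetical. A loader keeps an id-to-pointer table and allocates only when it meets an id for the first time; back-references reuse the pointer it already created.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical archive record: either a full object (first occurrence)
// or a back-reference to an object id that was already loaded.
struct Record {
    int id;
    bool is_reference;
    std::string payload;  // only meaningful when !is_reference
};

struct Thing {
    explicit Thing(std::string v) : value(std::move(v)) {}
    std::string value;
};

// Returns one pointer per record; allocates a Thing only for first occurrences.
std::vector<Thing*> load_pointers(const std::vector<Record>& archive,
                                  std::vector<std::unique_ptr<Thing>>& owned) {
    std::map<int, Thing*> seen;  // tracking table: id -> loaded object
    std::vector<Thing*> result;
    for (const Record& r : archive) {
        if (r.is_reference) {
            result.push_back(seen.at(r.id));  // reuse, no allocation
        } else {
            owned.push_back(std::make_unique<Thing>(r.payload));
            seen[r.id] = owned.back().get();
            result.push_back(owned.back().get());
        }
    }
    return result;
}
```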
Here's a simplified counter-example, which demonstrates ADL, demonstrates object tracking, and removes all forward declarations (they're unnecessary because of the template point of instantiation):
It serializes a vector with 10 copies of the pointer. I used unique_ptr to avoid leaking the instances (both the one manually created in main, as well as the one created by the deserialization).
#include <cassert>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <memory>
#include <string>
#include <vector>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/vector.hpp>
namespace mylib {
// Here is the actual class definition with no default constructor
class non_default_constructor {
public:
explicit non_default_constructor(std::string initial)
: some_initial_value{ initial }, state{ 0 } {}
std::string get_initial_value() const {
return some_initial_value;
} // For save_construct_data
private:
std::string some_initial_value;
int state;
// Notice that we only serialize state here, not the some_initial_value
// passed into the ctor
friend class boost::serialization::access;
template <class Archive> void serialize(Archive& ar, unsigned) {
std::cout << "serialize called" << std::endl;
ar& state;
}
};
// Define the save and load overloads here (found via ADL).
template<class Archive>
inline void save_construct_data(Archive& ar, const non_default_constructor* ndc, unsigned)
{
std::cout << "save_construct_data called." << std::endl;
ar << ndc->get_initial_value();
}
template<class Archive>
inline void load_construct_data(Archive& ar, non_default_constructor* ndc, unsigned)
{
std::cout << "load_construct_data called." << std::endl;
std::string some_initial_value;
ar >> some_initial_value;
// Use placement new to construct a non_default_constructor class at the address of ndc
::new(ndc)non_default_constructor(some_initial_value);
}
}
int main() {
using NDC = mylib::non_default_constructor;
auto owned = std::make_unique<NDC>("initial value");
{
std::ofstream outputStream("vector.dat");
boost::archive::text_oarchive outputArchive(outputStream);
// serialize 10 copies, for fun
std::vector v(10, owned.get());
outputArchive << v;
}
/*
22 serialization::archive 17 0 0 10 0 1 1 0
0 13 initial value 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
*/
std::vector<NDC*> restore;
{
std::ifstream inputStream("vector.dat");
boost::archive::text_iarchive inputArchive(inputStream);
inputArchive >> restore;
}
std::unique_ptr<NDC> take_ownership(restore.front());
for (auto& el : restore) {
assert(el == take_ownership.get());
}
std::cout << "restored: " << restore.size() << " copies with " <<
std::quoted(take_ownership->get_initial_value()) << "\n";
}
Prints
save_construct_data called.
serialize called
load_construct_data called.
serialize called
restored: 10 copies with "initial value"
The vector.dat file contains:
22 serialization::archive 17 0 0 10 0 1 1 0
0 13 initial value 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
The Library Internals
You shouldn't really care, but you can of course read the source code. Predictably, it's way more involved than you'd naively expect; after all, this is C++.
The library handles types that overload operator new: in that case it calls T::operator new instead of the global operator new. Either way it passes sizeof(T), as you correctly surmised.
The code lives in an exception-safe wrapper in detail/iserializer.hpp:
struct heap_allocation {
explicit heap_allocation() { m_p = invoke_new(); }
~heap_allocation() {
if (0 != m_p)
invoke_delete(m_p);
}
T* get() const { return m_p; }
T* release() {
T* p = m_p;
m_p = 0;
return p;
}
private:
T* m_p;
};
Yes, this code could be simplified a lot with C++11 or later. Also, the null guard in the destructor is redundant for compliant implementations of operator delete, which must accept a null pointer.
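For comparison, a C++11-style sketch of the same two-step idea (allocate sizeof(T) bytes, then construct in place, freeing the raw memory if construction throws). This is not the Boost code: free_storage, destroy_and_free, construct_at_address and allocate_and_load are hypothetical names, and construct_at_address merely plays the role of load_construct_data.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Frees raw storage only: used while no object lives in it yet.
struct free_storage {
    void operator()(void* p) const noexcept { ::operator delete(p); }
};

// Destroys the object, then frees the storage it was constructed in.
template <class T>
struct destroy_and_free {
    void operator()(T* p) const noexcept {
        p->~T();
        ::operator delete(p);
    }
};

template <class T>
using loaded_ptr = std::unique_ptr<T, destroy_and_free<T>>;

// Stand-in for load_construct_data: placement new only, no allocation.
template <class T, class... Args>
void construct_at_address(T* where, Args&&... args) {
    ::new (static_cast<void*>(where)) T(std::forward<Args>(args)...);
}

// Allocate sizeof(T) bytes, then construct. If construction throws,
// `storage` frees the memory; otherwise ownership passes to loaded_ptr.
template <class T, class... Args>
loaded_ptr<T> allocate_and_load(Args&&... args) {
    std::unique_ptr<void, free_storage> storage(::operator new(sizeof(T)));
    T* p = static_cast<T*>(storage.get());
    construct_at_address(p, std::forward<Args>(args)...);
    storage.release();  // constructed successfully: hand over ownership
    return loaded_ptr<T>(p);
}
```

The unique_ptr with a custom deleter replaces the hand-written heap_allocation class and its manual release().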
Now, of course, invoke_new and invoke_delete are where the action is. Here they are, condensed:
static T* invoke_new() {
typedef typename mpl::eval_if<boost::has_new_operator<T>,
mpl::identity<has_new_operator>,
mpl::identity<doesnt_have_new_operator>>::type typex;
return typex::invoke_new();
}
static void invoke_delete(T* t) {
typedef typename mpl::eval_if<boost::has_new_operator<T>,
mpl::identity<has_new_operator>,
mpl::identity<doesnt_have_new_operator>>::type typex;
typex::invoke_delete(t);
}
struct has_new_operator {
static T* invoke_new() { return static_cast<T*>((T::operator new)(sizeof(T))); }
static void invoke_delete(T* t) { (operator delete)(t); }
};
struct doesnt_have_new_operator {
static T* invoke_new() { return static_cast<T*>(operator new(sizeof(T))); }
static void invoke_delete(T* t) { (operator delete)(t); }
};
There's some conditional compilation and there are verbose comments, so peruse the source code if you want the full picture.
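The mpl::eval_if dispatch above can be sketched in C++17 with a detection trait and if constexpr. This is an illustration, not Boost's implementation: Boost uses its own boost::has_new_operator trait, while has_class_new, allocate_raw, deallocate_raw and the Tracked demo type are hypothetical, and the sketch assumes a type with its own operator new also declares operator delete(void*).

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Detects a class-specific T::operator new(std::size_t). Qualified lookup in
// T only searches T and its bases, so the global operator new is not found.
template <class T, class = void>
struct has_class_new : std::false_type {};

template <class T>
struct has_class_new<T, std::void_t<decltype(T::operator new(std::size_t{}))>>
    : std::true_type {};

template <class T>
T* allocate_raw() {
    if constexpr (has_class_new<T>::value)
        return static_cast<T*>(T::operator new(sizeof(T)));
    else
        return static_cast<T*>(::operator new(sizeof(T)));
}

template <class T>
void deallocate_raw(T* p) {
    if constexpr (has_class_new<T>::value)
        T::operator delete(p);  // assumes a matching operator delete(void*)
    else
        ::operator delete(p);
}

// Demo type with its own allocation functions; counts class-specific allocations.
inline int tracked_allocations = 0;
struct Tracked {
    static void* operator new(std::size_t n) { ++tracked_allocations; return ::operator new(n); }
    static void operator delete(void* p) noexcept { ::operator delete(p); }
    int value = 0;
};
```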


Making a safe buffer holder in C++

There are situations in which I need to pass a char* buffer back and forth. My idea is to create an object which can hold the object that owns the data, but also expose the data as char* for someone to read. Since this object holds the owner, there are no memory leaks because the owner is destructed with the object when it's no longer necessary.
I came up with the implementation below, which has a segfault; I explain below why it happens. In fact, I know how to fix it, but it's a mistake my class kind of lured me into making. So I consider what I've done to be not good and maybe there's a better way of doing this in C++ that is safer.
Please take a look at my class that holds the buffer owner and also holds the raw pointer to that buffer. I used GenericObjectHolder to be something that holds the owner for me, without my Buffer class being parametrized by this owner.
#include <iostream>
#include <string>
#include <memory>
#include <queue>
//The library:
class GenericObjectHolder
{
public:
GenericObjectHolder()
{
}
virtual ~GenericObjectHolder() {
};
};
template <class T, class Holder = GenericObjectHolder>
class Buffer final
{
public:
//Ownership WILL be passed to this object
static Buffer fromOwned(T rawBuffer, size_t size)
{
return Buffer(std::make_unique<T>(rawBuffer), size);
}
//Creates a buffer from an object that holds the buffer
//ownership and saves the object too so it's only destructed
//when this buffer itself is destructed
static Buffer fromObject(T rawBuffer, size_t size, Holder *holder)
{
return Buffer(rawBuffer, std::make_unique<T>(rawBuffer), size, holder);
}
//Allocates a new buffer with a size
static Buffer allocate(size_t size)
{
return Buffer(std::make_unique<T>(new T[size]), size);
}
~Buffer()
{
if (_holder)
delete _holder;
}
virtual T data()
{
return _rawBuffer;
}
virtual size_t size() const
{
return _size;
}
Buffer(T rawBuffer, std::unique_ptr<T> buffer, size_t size)
{
_rawBuffer = rawBuffer;
_buffer = std::move(buffer);
_size = size;
}
Buffer(T rawBuffer, std::unique_ptr<T> buffer, size_t size, Holder *holder)
{
_rawBuffer = rawBuffer;
_buffer = std::move(buffer);
_size = size;
_holder = holder;
}
Buffer(const Buffer &other)
: _size(other._size),
_holder(other._holder),
_buffer(std::make_unique<T>(*other._buffer))
{
}
private:
Holder *_holder;
T _rawBuffer;
std::unique_ptr<T> _buffer;
size_t _size = 0;
};
//Usage:
template <class T>
class MyHolder : public GenericObjectHolder
{
public:
MyHolder(T t) : t(t)
{
}
~MyHolder()
{
}
private:
T t;
};
int main()
{
std::queue<Buffer<const char*, MyHolder<std::string>>> queue;
std::cout << "begin" << std::endl;
{
//This string is going to be deleted, but `MyHolder` will still hold
//its buffer
std::string s("hello");
auto h = new MyHolder<std::string>(s);
auto b = Buffer<const char*, MyHolder<std::string>>::fromObject(s.c_str(),s.size(), h);
queue.emplace(b);
}
{
auto b = queue.front();
//We try to print the buffer from a deleted string, segfault
printf(b.data());
printf("\n");
}
std::cout << "end" << std::endl;
}
As you can see, the s string is copied inside the object holder, but the raw pointer still points into s, which gets destructed right afterwards. So when I try to access the raw buffer, I get a segfault.
Of course, I could simply copy the buffer from the s string into a new buffer inside my object, but it'd be inefficient.
Maybe there's a better way of doing such thing or maybe there's even something ready in C++ that does what I need.
PS: string is just an example. In practice I could be dealing with any type of object that owns a char* buffer.
Live example: https://repl.it/repls/IncredibleHomelySdk

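Your core problem is that you want your Holder to be movable. But when the Owner object moves, the buffer it owns might also move (for example, a short std::string typically stores its characters inside the string object itself, thanks to the small-string optimization). That will invalidate your pointer. You can avoid that by putting the owner in a fixed heap location via unique_ptr: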
#include <string>
#include <memory>
#include <queue>
#include <functional>
template <class B, class Owner>
class Buffer
{
public:
Buffer(std::unique_ptr<Owner>&& owner, B buf, size_t size) :
_owner(std::move(owner)), _buf(std::move(buf)), _size(size)
{}
B data() { return _buf; }
size_t size() { return _size; }
private:
std::unique_ptr<Owner> _owner;
B _buf;
size_t _size;
};
//Allocates a new buffer with a size
template<typename T>
Buffer<T*, T[]> alloc_buffer(size_t size) {
auto buf = std::make_unique<T[]>(size);
// Safe: std::move only casts; the constructor moves from owner after buf.get() was evaluated
return {std::move(buf), buf.get(), size};
}
Here's a repl link: https://repl.it/repls/TemporalFreshApi
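A usage sketch for the question's std::string case: the Buffer template is re-declared here (with const accessors) so the block is self-contained, and buffer_from_string is a hypothetical helper. Because the string object lives on the heap behind the unique_ptr, moving the Buffer into and out of a queue never moves the string, so data() stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <utility>

template <class B, class Owner>
class Buffer {
public:
    Buffer(std::unique_ptr<Owner>&& owner, B buf, std::size_t size)
        : _owner(std::move(owner)), _buf(std::move(buf)), _size(size) {}
    B data() const { return _buf; }
    std::size_t size() const { return _size; }
private:
    std::unique_ptr<Owner> _owner;  // fixed heap address: survives Buffer moves
    B _buf;
    std::size_t _size;
};

// Hypothetical helper: take ownership of a string and expose its characters.
inline Buffer<const char*, std::string> buffer_from_string(std::string s) {
    auto owner = std::make_unique<std::string>(std::move(s));
    const char* p = owner->c_str();  // points into the heap-allocated string
    std::size_t n = owner->size();
    return {std::move(owner), p, n};
}
```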
If you want to have a type-erased Buffer, you can do that like this:
template <class B>
class Buffer
{
public:
virtual ~Buffer() = default;
B* data() { return _buf; }
size_t size() { return _size; }
protected:
Buffer(B* buf, size_t size) :
_buf(buf), _size(size) {};
B* _buf;
size_t _size;
};
template <class B, class Owner>
class BufferImpl : public Buffer<B>
{
public:
BufferImpl(std::unique_ptr<Owner>&& owner, B* buf, size_t size) :
Buffer<B>(buf, size), _owner(std::move(owner))
{}
private:
std::unique_ptr<Owner> _owner;
};
//Allocates a new buffer with a size
template<typename T>
std::unique_ptr<Buffer<T>> alloc_buffer(size_t size) {
auto buf = std::make_unique<T[]>(size);
return std::make_unique<BufferImpl<T, T[]>>(std::move(buf), buf.get(), size);
}
Again, repl link: https://repl.it/repls/YouthfulBoringSoftware#main.cpp

You wrote:
There are situations in which I need to pass a char* buffer back and
forth.
and
So I consider what I've done to be not good and maybe there's a better
way of doing this in C++ that is safer.
It's not exactly clear what you are aiming at, but when I have this need I will sometimes use std::vector<char>. A std::vector (like a std::string) is just that: a managed buffer. Calling data() on a vector gives you a raw pointer to the buffer, which you can pass on to legacy interfaces, or use wherever you just need a buffer that you manage yourself. Hint: use resize() or the size constructor to allocate the buffer.
So you see, there's no need to store the internal pointer of std::string in your example. Instead just call data() on a need basis.
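A minimal sketch of that pattern, assuming a C-style API that writes into a caller-provided buffer (legacy_fill and call_legacy are hypothetical): the vector owns the memory, and data() is only borrowed for the duration of the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical legacy API that writes into a caller-provided char buffer.
inline int legacy_fill(char* out, std::size_t cap) {
    return std::snprintf(out, cap, "hello %d", 42);
}

inline std::string call_legacy() {
    std::vector<char> buf(64);  // allocates 64 zero-initialized chars
    int written = legacy_fill(buf.data(), buf.size());
    assert(written >= 0 && static_cast<std::size_t>(written) < buf.size());
    return std::string(buf.data(), static_cast<std::size_t>(written));
}  // buf releases its memory here; no manual delete
```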
It seems like you are concerned about copies and efficiency. If you use objects that support move semantics and the emplace family of functions, there shouldn't be any copying going on, at least in C++17. Most, if not all, containers support moving as well.
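One way to check the "no copying" claim is to compare data() pointers: std::vector's move constructor hands its buffer over to the new vector, so the characters never move. A small sketch (moves_without_copy is a hypothetical name):

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Moves a buffer into a queue and back out; returns true if the element
// storage was handed over rather than copied.
inline bool moves_without_copy() {
    std::vector<char> v(1024, 'x');
    const char* original = v.data();
    std::queue<std::vector<char>> q;
    q.emplace(std::move(v));  // move-constructs the element inside the queue
    std::vector<char> out = std::move(q.front());
    q.pop();
    return out.data() == original && out.size() == 1024;
}
```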

The class std::unique_ptr is already a "buffer holder" that guarantees deletion, with no string copies, no dangling references and no segfaults:
#include <iostream>
#include <memory>
#include <queue>
#include <string>
int main()
{
std::queue<std::unique_ptr<std::string>> queue;
std::cout << "begin" << std::endl;
{
auto h = std::make_unique<std::string>("Hello");
queue.emplace( std::move(h) ); // move into the queue without copy
}
{
auto b = std::move(queue.front()); // move out from queue without copy
std::cout << *b << std::endl;
} // when b goes out of scope it delete the string
std::cout << "end" << std::endl;
}
https://godbolt.org/z/neP838

Using Metadata/Inheritance to factor out code across multiple classes

I have two classes that will represent two very simple databases, and each has a "Save" function which will write what's in the class to a file. Since the code within the "Save" function is very similar, I was wondering if I could factor it out.
One of my colleagues said this might be possible with inheritance and/or metadata, so I tried looking into it myself with Google. However, I couldn't find anything that was helpful and am still unsure if what I want to do is even possible.
If it's possible to factor out, then I think I'd need to have another class or function know about each class's types and iterate through them somehow (metadata?). It would check the type of every data member, and depending on what the type is, it would make sure that it's correctly output to the text file.
(I know data like name, age, etc. should be private, but to keep this simple I just had everything be public)
#include <fstream>
#include <string>
class A
{
public:
A() : name(""), age(0) {};
void Save(void)
{
std::string filename = "A.txt";
std::string data;
data += name + "\n";
data += std::to_string(age) + "\n";
std::ofstream outfile(filename);
outfile.write(data.c_str(), data.size());
outfile.close();
}
std::string name;
int age;
};
class B
{
public:
B() : ID(0), points(0) {};
void Save(void)
{
std::string filename = "B.txt";
std::string data;
data += std::to_string(ID) + "\n";
data += std::to_string(points) + "\n";
std::ofstream outfile(filename);
outfile.write(data.c_str(), data.size());
outfile.close();
}
int ID;
int points;
};
int main(void)
{
A a;
B b;
a.name = "Bob"; a.age = 20;
b.ID = 4; b.points = 95;
a.Save();
b.Save();
return 0;
}

A possible solution could be to use metaprogramming (not sure what you mean by metadata), i.e. templates, to reuse the common parts:
#include <fstream>
#include <sstream>
#include <string>
template<typename T1, typename T2>
void TSave(const std::string& filename, const T1& p1, const T2& p2) {
std::stringstream data;
data << p1 << "\n";
data << p2 << "\n";
std::ofstream outfile(filename);
const std::string text = data.str(); // build the string once instead of calling str() twice
outfile.write(text.c_str(), text.size());
}
class A {
...
void Save(void) {
TSave("A.txt", name, age);
}
std::string name;
int age;
};
class B {
...
void Save(void) {
TSave("B.txt", ID, points);
}
int ID;
int points;
};
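The two-parameter TSave generalizes to any number of fields with a variadic template and a C++17 fold expression. A sketch, not part of the original answer (save_fields is a hypothetical name); it writes straight to the file stream instead of going through a stringstream.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes each field on its own line, for any number of streamable fields.
template <typename... Fields>
void save_fields(const std::string& filename, const Fields&... fields) {
    std::ofstream outfile(filename);
    ((outfile << fields << '\n'), ...);  // fold over the comma operator
}  // outfile is flushed and closed by its destructor
```

With it, A::Save becomes save_fields("A.txt", name, age) and B::Save becomes save_fields("B.txt", ID, points).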

What you are looking for is serialization: saving objects to a file (and, sooner or later, restoring them).
Of course, you could write your own serialization framework, and Marco's answer is an interesting start in that direction. But alternatively, you could consider existing libraries, such as boost::serialization:
#include <fstream>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
class A {
private:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & name;
ar & age;
}
...
};
class B {
private:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & ID;
ar & points;
}
...
};
int main() {
A a;
B b;
...
{
std::ofstream ofs("myfile");
boost::archive::text_oarchive arch(ofs);
arch << a << b;
}
}
As you can see, you still need to say what is to be written to the file. However, the code is simplified: you don't have to worry about file management and data conversion. It also works with standard containers.
You won't find a C++ trick that automatically determines for a class what is to be saved, for two reasons:
First, C++ allows metaprogramming, but it is not reflective: there is no standard way to find out at execution time which members make up a class.
Second, some data in an object can be transient, i.e. it only means something at execution time and depends on the context. Pointers are an example: you could save the value of a pointer to a file, but it will mean nothing when you reload it later (the pointer is only valid until you free the object). The proper way would be to save the object that is pointed to (but where, when, how?).
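Without reflection, a common middle ground is to list each class's members once, as a tuple of references, and write the saving logic generically. A sketch of that idiom (the fields() member and the save function are hypothetical names, not a library API):

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <tuple>

// Each class names its members once, in fields(); everything else is shared.
struct A {
    std::string name;
    int age = 0;
    auto fields() const { return std::tie(name, age); }
};

struct B {
    int ID = 0;
    int points = 0;
    auto fields() const { return std::tie(ID, points); }
};

// Generic save: unpacks the tuple of references and writes one field per line.
template <class T>
void save(const T& obj, const std::string& filename) {
    std::ofstream out(filename);
    std::apply([&out](const auto&... f) { ((out << f << '\n'), ...); },
               obj.fields());
}
```

The same fields() list could later drive a matching load function, which keeps the member list in one place per class.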

copy constructor with template class not getting called

I was trying to write a sample implementation of a shared pointer (just for practice).
In the following example:
why is the compiler not complaining about modifying other_T?
And why is the copy constructor SharedPtr(const T& other_T) not getting called?
Here is the code snippet.
#include <iostream>
using namespace std;
#define DBG cout<<"[DEBUG]"<<__PRETTY_FUNCTION__<<endl
class RefCount
{
protected:
int m_ref;
RefCount(){ DBG; m_ref = 1 ; }
void reference(){ DBG; ++m_ref; }
void dereference(){ DBG;--m_ref; }
};
template <class T>
class SharedPtr : public RefCount
{
T* m_T;
public:
SharedPtr() { DBG; m_T = new T; }
SharedPtr(const T& other_T){
DBG;
m_T = other_T.m_T;
other_T.dereference();
other_T.m_T = NULL;
}
~SharedPtr() {
DBG;
dereference();
cout<<m_ref<<endl;
if(m_ref <= 0 && m_T != NULL ){
cout<<"Destroying"<<endl;
delete m_T;
m_T = NULL;
}
}
};
class A{};
int main()
{
SharedPtr<A> obj;
cout<<"assigning "<<endl;
SharedPtr<A> obj2 = obj;
cout<<"END"<<endl;
return 0;
}
The result is a segfault.

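Your primary problem is that the copy constructor is being called, but you haven't defined one, so you get the copy constructor that the compiler defines implicitly. SharedPtr(const T& other_T) is not a copy constructor: it takes a T (an A), not a SharedPtr<T>. That also explains why the compiler doesn't complain about modifying other_T: the body of a member function of a class template is only instantiated when the function is used, and since nothing calls SharedPtr(const T&), errors in its body that depend on T are never diagnosed.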
That copy constructor just does a member-wise copy. That means you've allocated one A with new, then pointed two SharedPtr objects at that same A. The first one to get destroyed deletes the A object. Then the second one gets destroyed, attempts to delete the same object again, and havoc ensues.
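Both effects can be reproduced in a few lines (Holder and Payload are hypothetical names). A constructor taking const T& is not a copy constructor, so copying still uses the implicit member-wise one; and because its body depends on T and is never called, it is never instantiated, even though it could not compile for T = Payload.

```cpp
#include <cassert>

struct Payload {};  // has no member m_p

template <class T>
class Holder {
public:
    Holder() = default;
    // Not a copy constructor: it takes a T, not a Holder<T>. Its body would
    // fail for T = Payload, but it is never instantiated because nothing calls it.
    Holder(const T& other) : m_p(other.m_p) {}

    int* m_p = nullptr;
};
```

Copying a Holder<Payload> then just duplicates the raw pointer, which is exactly how two SharedPtr objects in the question end up deleting the same A.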
In the end, though, it doesn't look to me like much (any?) of this is going to make any real difference. I'm pretty sure your basic design is broken. To get a working shared pointer, you have a single reference count and a "raw" pointer to the final object. Then you have N SharedPtr objects referring to that one ref count/pointer structure, which in turn refers to the final object.
You're trying to combine the raw pointer/ref count into the individual SharedPtr, and I don't see any way that can actually work.
It also seems to me that the basic concept of what you've called a RefCount is really part of the design of a SharedPtr. As such, I think its definition should be nested inside that of SharedPtr (and probably made private, since the outside world has no reason to know it exists, not to mention being able to access it directly).
With those taken into account, the code might end up something like this:
#include <iostream>
using namespace std;
#define DBG cout<<"[DEBUG]"<<__PRETTY_FUNCTION__<<endl
template <class T>
class SharedPtr {
template <class U>
struct Ref {
mutable int m_ref;
U *data;
Ref(T *data) : m_ref(1), data(data) { DBG; }
void add_ref() const { DBG; ++m_ref; std::cout << "m_ref=" << m_ref << "\n"; }
void sub_ref() const { DBG; --m_ref; std::cout << "m_ref=" << m_ref << "\n"; }
~Ref() { delete data; }
};
Ref<T> *r;
public:
SharedPtr(T *data) { DBG; r = new Ref<T>(data); }
SharedPtr(SharedPtr const &p) : r(p.r) { DBG; r->add_ref(); }
~SharedPtr() {
DBG;
r->sub_ref();
if (0 == r->m_ref) {
delete r;
std::cout << "deleted pointee\n";
}
}
};
class A{};
int main() {
SharedPtr<A> obj(new A);
cout<<"copying "<<endl;
SharedPtr<A> obj2 = obj;
cout<<"END"<<endl;
return 0;
}
Notes: though this fixes at least some of the basic design, it's still quite a ways short of usable. It's missing the dereference operator, so you can't use the pointer to get to the value it points at. It'll break completely in a multi-threaded environment. I haven't thought enough about it to be sure, but my immediate guess is that it's probably not exception safe either.
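A sketch that addresses some of those gaps, still practice code rather than a replacement for std::shared_ptr: the count lives in a shared control block as a std::atomic (so the reference count itself can be updated from several threads), operator* and operator-> are provided, copy assignment uses copy-and-swap, and the constructor does not leak its argument if allocating the control block fails.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

template <class T>
class SharedPtr {
    struct Control {
        std::atomic<long> count{1};
        T* data;
        explicit Control(T* d) : data(d) {}
        ~Control() { delete data; }
    };
    Control* c = nullptr;

    void release() noexcept {
        // fetch_sub returns the previous value: the last owner deletes.
        if (c && c->count.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete c;
    }

public:
    explicit SharedPtr(T* data) {
        try {
            c = new Control(data);
        } catch (...) {
            delete data;  // don't leak the pointee if the control block fails
            throw;
        }
    }
    SharedPtr(const SharedPtr& other) noexcept : c(other.c) {
        if (c) c->count.fetch_add(1, std::memory_order_relaxed);
    }
    // Copy-and-swap: handles self-assignment and releases the old pointee.
    SharedPtr& operator=(SharedPtr other) noexcept {
        std::swap(c, other.c);
        return *this;
    }
    ~SharedPtr() { release(); }

    T& operator*() const { return *c->data; }
    T* operator->() const { return c->data; }
    long use_count() const { return c ? c->count.load() : 0; }
};
```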

Derived class serialization without class tracking in Boost (C++)

I have some problems with boost serialization when serializing a derived class through a base class pointer. I need a system that serializes objects as they are received, so I need to serialize over time. This is not really a problem, since I can open a boost::archive::binary_oarchive and serialize objects when required. I quickly noticed that boost was performing object tracking by memory address, so the first problem was that different objects that occupied the same memory address at different times were saved as the same object. This can be fixed by using the following macro in the required derived class:
BOOST_CLASS_TRACKING(className, boost::serialization::track_never)
This works fine, but when the base class is not abstract, the base class is not serialized properly. In the following example, the base class serialization method is only called once, for the first object: boost assumes that this object has been serialized before, although the object has a different type.
#include <iostream>
#include <fstream>
#include <boost/serialization/export.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/list.hpp>
#include <boost/serialization/map.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/serialization/shared_ptr.hpp>
#include <boost/archive/archive_exception.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
using namespace std;
class AClass{
public:
AClass(){}
virtual ~AClass(){}
private:
double a;
double b;
//virtual void virtualMethod() = 0;
private:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & a;
ar & b;
cout << "A" << endl;
}
};
//BOOST_SERIALIZATION_ASSUME_ABSTRACT(AClass)
//BOOST_CLASS_TRACKING(AClass, boost::serialization::track_never)
class BClass : public AClass{
public:
BClass(){}
virtual ~BClass(){}
private:
double c;
double d;
virtual void virtualMethod(){};
private:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & boost::serialization::base_object<AClass>(*this);
ar & c;
ar & d;
cout << "B" << endl;
}
};
// define export to be able to serialize through base class pointer
BOOST_CLASS_EXPORT(BClass)
BOOST_CLASS_TRACKING(BClass, boost::serialization::track_never)
class CClass : public AClass{
public:
CClass(){}
virtual ~CClass(){}
private:
double c;
double d;
virtual void virtualMethod(){};
private:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & boost::serialization::base_object<AClass>(*this);
ar & c;
ar & d;
cout << "C" << endl;
}
};
// define export to be able to serialize through base class pointer
BOOST_CLASS_EXPORT(CClass)
BOOST_CLASS_TRACKING(CClass, boost::serialization::track_never)
int main() {
cout << "Serializing...." << endl;
{
ofstream ofs("serialization.dat");
boost::archive::binary_oarchive oa(ofs);
for(int i=0;i<5;i++)
{
AClass* baseClassPointer = new BClass();
// serialize object through base pointer
oa << baseClassPointer;
// free the pointer so next allocation can reuse memory address
delete baseClassPointer;
}
for(int i=0;i<5;i++)
{
AClass* baseClassPointer = new CClass();
// serialize object through base pointer
oa << baseClassPointer;
// free the pointer so next allocation can reuse memory address
delete baseClassPointer;
}
}
getchar();
cout << "Deserializing..." << endl;
{
ifstream ifs("serialization.dat");
boost::archive::binary_iarchive ia(ifs);
try{
while(true){
AClass* a;
ia >> a;
delete a;
}
}catch(boost::archive::archive_exception const& e)
{
}
}
return 0;
}
When executing this piece of code, the result is as follow:
Serializing....
A
B
B
B
B
B
C
C
C
C
C
Deserializing...
A
B
B
B
B
B
C
C
C
C
C
So the base class is only serialized once, although the derived class explicitly has the track_never flag. There are two different workarounds to fix this behaviour. The first one is to make the base class abstract with a pure virtual method and call the macro BOOST_SERIALIZATION_ASSUME_ABSTRACT(AClass); the second one is to put the track_never flag on the base class as well (commented out in the code).
Neither of these solutions meets my requirements: in the future I want to take occasional serializations (snapshots) of the system state, which would require tracking for a given DClass extending AClass (not BClass or CClass), and AClass should not be abstract either.
Any hints? Is there any way to call explicitly the base class serialization method avoiding the tracking feature in the base class (that already has been disabled in the derived class)?
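The address-reuse problem described in the question can be reproduced without Boost. In this simplified model (AddressTracker and Sample are hypothetical, not Boost code), an object is identified only by its address; placement new into one buffer makes the reuse deterministic, whereas with new/delete it depends on the allocator.

```cpp
#include <cassert>
#include <map>
#include <new>

// Simplified model of address-based object tracking: an object is
// identified only by its address.
class AddressTracker {
public:
    // Returns the id for this address, assigning a fresh one if unseen.
    int id_for(const void* address) {
        auto it = ids_.find(address);
        if (it != ids_.end()) return it->second;  // treated as "already saved"
        int id = next_id_++;
        ids_.emplace(address, id);
        return id;
    }
private:
    std::map<const void*, int> ids_;
    int next_id_ = 0;
};

struct Sample { int value; };
```

Two distinct objects that live at the same address one after the other receive the same id, which is why track_never (or a fresh archive) is needed for this use case.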

After taking a closer look at boost::serialization, I'm also convinced there is no straightforward solution for your request.
As you already mentioned, the tracking behavior for serialization is declared on a class-by-class basis with BOOST_CLASS_TRACKING.
This constant, global information is then interpreted in the virtual method tracking() of the class oserializer.
virtual bool tracking(const unsigned int /* flags */) const
Because oserializer is a class template, you can explicitly specialize this method for your classes.
namespace boost {
namespace archive {
namespace detail {
template<>
bool oserializer<class binary_oarchive, class AClass >::tracking(const unsigned int f /* flags */) const {
return do_your_own_tracking_decision();
}
}}}
Now you could, for example, use a global variable and change the tracking behavior from time to time (e.g. depending on which derived class is written to the archive).
This seems to work for serializing, but deserializing then throws an exception.
The reason is that the tracking state of each class is only written once to the archive. Therefore, the deserializer always expects the data for AClass when BClass or CClass is read (at least if the first write of AClass was done with tracking disabled).
One possible solution could be to use the flags parameter of the tracking() method.
This parameter represents the flags the archive is created with, by default 0.
binary_oarchive(std::ostream & os, unsigned int flags = 0)
The archive flags are declared in basic_archive.hpp
enum archive_flags {
no_header = 1, // suppress archive header info
no_codecvt = 2, // suppress alteration of codecvt facet
no_xml_tag_checking = 4, // suppress checking of xml tags
no_tracking = 8, // suppress ALL tracking
flags_last = 8
};
no_tracking does not currently seem to be supported, but you can now add this behavior to tracking():
template<>
bool oserializer<boost::archive::binary_oarchive, AClass>::tracking(const unsigned int f /* flags */) const {
return !(f & boost::archive::no_tracking);
}
Now you can at least decide, per archive, whether AClass should be tracked:
boost::archive::binary_oarchive oa_nt(ofs, boost::archive::archive_flags::no_tracking);
Here are the changes applied to your example:
namespace boost {
namespace archive {
namespace detail {
// Must come before main(): an explicit specialization has to be declared
// before the first use that would implicitly instantiate it.
template<>
bool oserializer<boost::archive::binary_oarchive, AClass>::tracking(const unsigned int f /* flags */) const {
return !(f & boost::archive::no_tracking);
}
}}}
int main() {
cout << "Serializing...." << endl;
{
ofstream ofs("serialization1.dat", std::ios::binary);
boost::archive::binary_oarchive oa_nt(ofs, boost::archive::archive_flags::no_tracking);
//boost::archive::binary_oarchive oa(ofs);
for(int i=0;i<5;i++)
{
AClass* baseClassPointer = new BClass();
// serialize object through base pointer
oa_nt << baseClassPointer;
// free the pointer so next allocation can reuse memory address
delete baseClassPointer;
}
ofstream ofs2("serialization2.dat", std::ios::binary);
boost::archive::binary_oarchive oa(ofs2);
for(int i=0;i<5;i++)
{
AClass* baseClassPointer = new CClass();
// serialize object through base pointer
oa << baseClassPointer;
// free the pointer so next allocation can reuse memory address
delete baseClassPointer;
}
}
getchar();
cout << "Deserializing..." << endl;
{
ifstream ifs("serialization1.dat", std::ios::binary);
boost::archive::binary_iarchive ia(ifs);
try{
while(true){
AClass* a;
ia >> a;
delete a;
}
}catch(boost::archive::archive_exception const& e)
{
// end of archive reached (or read error): stop reading
}
ifstream ifs2("serialization2.dat", std::ios::binary);
boost::archive::binary_iarchive ia2(ifs2);
try{
while(true){
AClass* a;
ia2 >> a;
delete a;
}
}catch(boost::archive::archive_exception const& e)
{
// end of archive reached (or read error): stop reading
}
}
return 0;
}
This may still not be what you are looking for. There are many more methods that could be adapted with your own implementation, or you could derive your own archive class.

Ultimately the problem seems to be that a boost::serialization archive represents state at a single point in time, and you want your archive to contain state that has changed, i.e. pointers that have been reused. I don't think there is a simple boost::serialization flag that induces the behavior you want.
However, I think there are other workarounds that might be sufficient. You can encapsulate the serialization for a class into its own archive, and then archive the encapsulation. That is, you can implement the serialization for B like this (note that you have to split serialize() into save() and load()):
// #include <sstream>
// #include <boost/archive/binary_iarchive.hpp>
// #include <boost/archive/binary_oarchive.hpp>
// #include <boost/serialization/base_object.hpp>
// #include <boost/serialization/split_member.hpp>
// #include <boost/serialization/string.hpp>
// Replace serialize() member function with this.
template<class Archive>
void save(Archive& ar, const unsigned int version) const {
// Serialize instance to a string (or other container).
// std::stringstream used here for simplicity. You can avoid
// some buffer copying with alternative stream classes that
// directly access an external container or iterator range.
std::ostringstream os;
{
// Inner archive is scoped so it is finished before os.str() is read.
boost::archive::binary_oarchive oa(os);
oa << boost::serialization::base_object<AClass>(*this);
oa << c;
oa << d;
}
// Archive string to top level.
const std::string s = os.str();
ar & s;
cout << "B" << endl;
}
template<class Archive>
void load(Archive& ar, const unsigned int version) {
// Unarchive string from top level.
std::string s;
ar & s;
// Deserialize instance from string.
std::istringstream is(s);
boost::archive::binary_iarchive ia(is);
ia >> boost::serialization::base_object<AClass>(*this);
ia >> c;
ia >> d;
cout << "B" << endl;
}
BOOST_SERIALIZATION_SPLIT_MEMBER()
Because each instance of B is serialized into its own archive, A is effectively not tracked: there is only one reference to it per nested archive. This produces:
Serializing....
A
B
A
B
A
B
A
B
A
B
A
C
C
C
C
C
Deserializing...
A
B
A
B
A
B
A
B
A
B
A
C
C
C
C
C
A potential objection to this technique is the storage overhead of encapsulation. The original test program produces 319 bytes, while the modified test program produces 664 bytes. However, if gzip is applied to both output files, the sizes are 113 bytes for the original and 116 bytes for the modification. If space is a concern, I would recommend adding compression to the outer serialization, which can easily be done with boost::iostreams.
Another possible workaround is to extend the life of instances to the lifespan of the archive so pointers are not reused. You could do this by associating a container of shared_ptr instances to your archive, or by allocating instances from a memory pool.
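A minimal standard-C++ sketch of the first idea, assuming the objects written during one archive session can simply be parked in a container until the archive is closed. `archive_session` and `make_tracked` are hypothetical names, not part of boost::serialization; the point is only that objects kept alive by a `std::shared_ptr` never free their address, so a tracking archive cannot mistake a new object for an old one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical helper: owns every object created during one archive
// session, so no address is freed (and reused) before the session ends.
class archive_session {
public:
    template <typename T, typename... Args>
    T* make_tracked(Args&&... args) {
        auto sp = std::make_shared<T>(std::forward<Args>(args)...);
        keep_alive_.push_back(sp);  // shared_ptr<T> converts to shared_ptr<void>
        return sp.get();
    }
    std::size_t size() const { return keep_alive_.size(); }

private:
    // shared_ptr<void> still runs T's destructor when the session is destroyed.
    std::vector<std::shared_ptr<void>> keep_alive_;
};
```

In the original test loop you would replace the `new BClass()`/`delete` pair with `session.make_tracked<BClass>()` and write the returned pointer with `oa << ptr`; the objects are destroyed only when `session` goes out of scope, after the archive has been written.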

How to avoid successive deallocations/allocations in C++?

Consider the following code:
class A
{
B* b; // an A object owns a B object
A() : b(NULL) { } // we don't know what b will be when constructing A
void calledVeryOften(…)
{
if (b)
delete b;
b = new B(param1, param2, param3, param4);
}
};
My goal: I need to maximize performance, which, in this case, means minimizing the number of memory allocations.
The obvious thing to do here is to change B* b; to B b;. I see two problems with this approach:
I need to initialize b in the constructor. Since I don't know what b will be, this means I need to pass dummy values to B's constructor. Which, IMO, is ugly.
In calledVeryOften(), I'll have to do something like this: b = B(…), which is wrong for two reasons:
The destructor of b won't be called.
A temporary instance of B will be constructed, then copied into b, then the destructor of the temporary instance will be called. The copy and the destructor call could be avoided. Worse, calling the destructor could very well result in undesired behavior.
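The sequence described in the second point can be observed with a counting stand-in for B. `counted_b` and `reassign` are hypothetical names; the sketch only shows what the language does for `b = B(...)` when B has a user-declared copy assignment operator (and therefore no implicit move assignment): the temporary is constructed, copied in and destroyed, while `b` itself is never destroyed.

```cpp
#include <cassert>

// Hypothetical stand-in for B that counts special-member calls.
struct counted_b {
    static int ctor, dtor, copy_assign;
    int value;
    explicit counted_b(int v) : value(v) { ++ctor; }
    counted_b(const counted_b& o) : value(o.value) { ++ctor; }
    counted_b& operator=(const counted_b& o) {
        value = o.value;  // b's state is overwritten; b is not destroyed
        ++copy_assign;
        return *this;
    }
    ~counted_b() { ++dtor; }
};
int counted_b::ctor = 0;
int counted_b::dtor = 0;
int counted_b::copy_assign = 0;

// Mirrors `b = B(...)` inside calledVeryOften().
void reassign(counted_b& b, int v) {
    b = counted_b(v);  // temporary constructed, copied into b, then destroyed
}
```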
So what solutions do I have to avoid using new? Please keep in mind that:
I only have control over A. I don't have control over B, and I don't have control over the users of A.
I want to keep the code as clean and readable as possible.

I liked Klaim's answer, so I wrote this up real fast. I don't claim perfect correctness but it looks pretty good to me. (i.e., the only testing it has is the sample main below)
It's a generic lazy-initializer. The space for the object is allocated once, and the object starts at null. You can then create, over-writing previous objects, with no new memory allocations.
It implements all the necessary constructors, destructor, copy/assignment, swap, yadda-yadda. Here you go:
#include <cassert>
#include <new>
#include <utility> // std::swap
template <typename T>
class lazy_object
{
public:
// types
typedef T value_type;
typedef const T const_value_type;
typedef value_type& reference;
typedef const_value_type& const_reference;
typedef value_type* pointer;
typedef const_value_type* const_pointer;
// creation
lazy_object(void) :
mObject(0),
mBuffer(::operator new(sizeof(T)))
{
}
lazy_object(const lazy_object& pRhs) :
mObject(0),
mBuffer(::operator new(sizeof(T)))
{
if (pRhs.exists())
{
mObject = new (buffer()) T(pRhs.get());
}
}
lazy_object& operator=(lazy_object pRhs)
{
pRhs.swap(*this);
return *this;
}
~lazy_object(void)
{
destroy();
::operator delete(mBuffer);
}
// need to make multiple versions of this.
// variadic templates/Boost.Preprocessor
// would help immensely. For now, I give
// two, but it's easy to make more.
void create(void)
{
destroy();
mObject = new (buffer()) T();
}
template <typename A1>
void create(const A1 pA1)
{
destroy();
mObject = new (buffer()) T(pA1);
}
void destroy(void)
{
if (exists())
{
mObject->~T();
mObject = 0;
}
}
void swap(lazy_object& pRhs)
{
std::swap(mObject, pRhs.mObject);
std::swap(mBuffer, pRhs.mBuffer);
}
// access
reference get(void)
{
return *get_ptr();
}
const_reference get(void) const
{
return *get_ptr();
}
pointer get_ptr(void)
{
assert(exists());
return mObject;
}
const_pointer get_ptr(void) const
{
assert(exists());
return mObject;
}
void* buffer(void)
{
return mBuffer;
}
// query
const bool exists(void) const
{
return mObject != 0;
}
private:
// members
pointer mObject;
void* mBuffer;
};
// explicit swaps for generality
template <typename T>
void swap(lazy_object<T>& pLhs, lazy_object<T>& pRhs)
{
pLhs.swap(pRhs);
}
// Note: adding an overload of swap to namespace std is not allowed
// (only specializations are), so no std::swap overload is provided here.
// Unqualified calls such as `using std::swap; swap(a, b);` find the
// lazy_object swap above through argument-dependent lookup.
// test use
#include <iostream>
int main(void)
{
// basic usage
lazy_object<int> i;
i.create();
i.get() = 5;
std::cout << i.get() << std::endl;
// asserts (not created yet)
lazy_object<double> d;
std::cout << d.get() << std::endl;
}
In your case, just create a member in your class: lazy_object<B> and you're done. No manual releases or making copy-constructors, destructors, etc. Everything is taken care of in your nice, small re-usable class. :)
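For readers on C++17 or later, `std::optional` offers the same lazy, in-place construction without even the one heap allocation: the storage lives inside the owning object, and `emplace()` destroys any existing value before constructing the new one in that storage. This swaps in the standard library for the hand-written class; `heavy_b` and its two-int constructor are hypothetical stand-ins for B.

```cpp
#include <cassert>
#include <optional>

// Hypothetical B with no default constructor and no copying.
struct heavy_b {
    static int live;  // number of currently constructed objects
    int x, y;
    heavy_b(int x_, int y_) : x(x_), y(y_) { ++live; }
    heavy_b(const heavy_b&) = delete;
    heavy_b& operator=(const heavy_b&) = delete;
    ~heavy_b() { --live; }
};
int heavy_b::live = 0;

class A {
public:
    void calledVeryOften(int p1, int p2) {
        // Destroys the previous heavy_b (if any), then constructs the new one
        // in the optional's internal storage: no heap allocation.
        b_.emplace(p1, p2);
    }
    bool has_b() const { return b_.has_value(); }
    const heavy_b& b() const { return *b_; }

private:
    std::optional<heavy_b> b_;  // starts empty, so no dummy B is needed
};
```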
EDIT
Removed the need for vector, should save a bit of space and what-not.
EDIT2
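This uses aligned_storage and alignment_of to store the object inside the lazy_object_stack itself (so on the stack when the wrapper is a local variable) instead of on the heap. I used Boost, but this functionality exists in both TR1 and C++0x. We lose the ability to copy, and therefore swap.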
#include <boost/type_traits/aligned_storage.hpp>
#include <cassert>
#include <new>
template <typename T>
class lazy_object_stack
{
public:
// types
typedef T value_type;
typedef const T const_value_type;
typedef value_type& reference;
typedef const_value_type& const_reference;
typedef value_type* pointer;
typedef const_value_type* const_pointer;
// creation
lazy_object_stack(void) :
mObject(0)
{
}
~lazy_object_stack(void)
{
destroy();
}
// need to make multiple versions of this.
// variadic templates/Boost.Preprocessor
// would help immensely. For now, I give
// two, but it's easy to make more.
void create(void)
{
destroy();
mObject = new (buffer()) T();
}
template <typename A1>
void create(const A1 pA1)
{
destroy();
mObject = new (buffer()) T(pA1);
}
void destroy(void)
{
if (exists())
{
mObject->~T();
mObject = 0;
}
}
// access
reference get(void)
{
return *get_ptr();
}
const_reference get(void) const
{
return *get_ptr();
}
pointer get_ptr(void)
{
assert(exists());
return mObject;
}
const_pointer get_ptr(void) const
{
assert(exists());
return mObject;
}
void* buffer(void)
{
return mBuffer.address();
}
// query
const bool exists(void) const
{
return mObject != 0;
}
private:
// types
typedef boost::aligned_storage<sizeof(T),
boost::alignment_of<T>::value> storage_type;
// members
pointer mObject;
storage_type mBuffer;
// non-copyable
lazy_object_stack(const lazy_object_stack& pRhs);
lazy_object_stack& operator=(lazy_object_stack pRhs);
};
// test use
#include <iostream>
int main(void)
{
// basic usage
lazy_object_stack<int> i;
i.create();
i.get() = 5;
std::cout << i.get() << std::endl;
// asserts (not created yet)
lazy_object_stack<double> d;
std::cout << d.get() << std::endl;
}
And there we go.
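The comments in both classes note that variadic templates would remove the need for one `create` overload per arity. A minimal C++11 sketch of that, assuming `alignas` storage in place of `boost::aligned_storage`; `lazy_inline` is a hypothetical name, and copying is disabled just as in `lazy_object_stack`.

```cpp
#include <cassert>
#include <new>
#include <utility>

template <typename T>
class lazy_inline {
public:
    lazy_inline() : mObject(nullptr) {}
    ~lazy_inline() { destroy(); }
    lazy_inline(const lazy_inline&) = delete;
    lazy_inline& operator=(const lazy_inline&) = delete;

    // One create() for any constructor signature, via perfect forwarding.
    template <typename... Args>
    T& create(Args&&... args) {
        destroy();
        // If T's constructor throws, mObject stays null, so we never
        // destroy an object that was not constructed.
        mObject = ::new (static_cast<void*>(mBuffer)) T(std::forward<Args>(args)...);
        return *mObject;
    }
    void destroy() {
        if (mObject) {
            mObject->~T();
            mObject = nullptr;
        }
    }
    bool exists() const { return mObject != nullptr; }
    T& get() { assert(exists()); return *mObject; }

private:
    T* mObject;
    alignas(T) unsigned char mBuffer[sizeof(T)];  // storage lives inside *this
};
```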

Simply reserve the memory required for b (via a pool or by hand) and reuse it each time you delete/new instead of reallocating each time.
Example:
#include <cstdlib> // std::malloc, std::free
#include <new> // placement new
class A
{
B* b; // an A object owns a B object
bool initialized;
public:
A() : b( static_cast<B*>( std::malloc( sizeof(B) ) ) ), initialized(false) { } // We reserve memory for b
~A() { if(initialized) destroy(); std::free(b); } // release memory only once we don't use it anymore
void calledVeryOften(…)
{
if (initialized)
destroy();
create( param1, param2, param3, param4 );
}
private:
A(const A&); // non-copyable: A owns raw memory
A& operator=(const A&);
void destroy() { b->~B(); initialized = false; } // explicit call to the destructor
// P1..P4 stand for the types of B's constructor parameters
void create( P1 param1, P2 param2, P3 param3, P4 param4 )
{
b = new (b) B( param1, param2, param3, param4 ); // placement new: only construct, don't allocate; use the memory b points to
initialized = true;
}
};
In some cases a Pool or ObjectPool could be a better implementation of the same idea.
The construction/destruction cost will then depend only on the constructor and destructor of the B class.
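One way to read the Pool/ObjectPool suggestion, sketched in standard C++: a fixed-capacity pool that allocates its slots once and recycles them through a free list. `fixed_pool` is a hypothetical name (Boost.Pool's `object_pool` is a ready-made library alternative); the sketch assumes the capacity is known up front and is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity pool: storage for `capacity` objects is
// allocated once; create()/destroy() only construct and destroy in place.
// Callers must destroy() every object before the pool itself is destroyed.
template <typename T>
class fixed_pool {
public:
    explicit fixed_pool(std::size_t capacity) : slots_(capacity) {
        free_.reserve(capacity);  // so destroy() never reallocates
        for (std::size_t i = 0; i < capacity; ++i)
            free_.push_back(&slots_[i]);
    }
    fixed_pool(const fixed_pool&) = delete;
    fixed_pool& operator=(const fixed_pool&) = delete;

    // Constructs a T in a free slot; returns nullptr if the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_.empty()) return nullptr;
        void* raw = free_.back();
        T* p = ::new (raw) T(std::forward<Args>(args)...);
        free_.pop_back();  // claim the slot only after construction succeeded
        return p;
    }
    // Destroys the object and returns its slot to the free list.
    void destroy(T* p) {
        void* raw = p;  // take the address before ending the object's lifetime
        p->~T();
        free_.push_back(raw);
    }
    std::size_t available() const { return free_.size(); }

private:
    struct slot { alignas(T) unsigned char bytes[sizeof(T)]; };
    std::vector<slot> slots_;  // never resized, so slot addresses stay valid
    std::vector<void*> free_;
};
```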

How about allocating the memory for B once (or for its biggest possible variant) and using placement new?
A would store char memB[sizeof(BiggestB)]; (suitably aligned for BiggestB) and a B*. Sure, you'd need to call the destructors manually (and B needs a virtual destructor if it has derived variants), but no memory would be allocated or deallocated.
void* p = memB;
B* b = new(p) SomeB();
...
b->~B(); // explicit destructor call when needed.
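A sketch of that idea with the buffer correctly aligned (a plain `char` array is not guaranteed to be suitably aligned for `BiggestB`), assuming C++11 `alignas` and a polymorphic hierarchy whose base has a virtual destructor, so that `b->~B()` destroys the right type. `SmallB`, `BiggestB`, `kind()` and `make()` are hypothetical.

```cpp
#include <cassert>
#include <new>

struct B {
    virtual ~B() {}  // virtual, so b->~B() runs the derived destructor
    virtual int kind() const = 0;
};
struct SmallB : B { int kind() const override { return 1; } };
struct BiggestB : B {
    double data[8] = {};
    int kind() const override { return 2; }
};

class A {
public:
    A() : b(nullptr) {}
    ~A() { reset(); }
    A(const A&) = delete;
    A& operator=(const A&) = delete;

    // Destroys the current variant (if any) and builds Derived in memB.
    template <typename Derived>
    B& make() {
        static_assert(sizeof(Derived) <= sizeof(memB), "buffer too small");
        static_assert(alignof(Derived) <= alignof(BiggestB), "buffer under-aligned");
        reset();
        b = ::new (static_cast<void*>(memB)) Derived();
        return *b;
    }
    void reset() {
        if (b) { b->~B(); b = nullptr; }  // explicit destructor call, no free
    }

private:
    alignas(BiggestB) unsigned char memB[sizeof(BiggestB)];
    B* b;
};
```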

If B correctly implements its copy assignment operator then b = B(...) should not call any destructor on b. It is the most obvious solution to your problem.
If, however, B cannot be appropriately 'default' initialized you could do something like this. I would only recommend this approach as a last resort as it is very hard to get safe. Untested, and very probably with corner case exception bugs:
// Used to clean up raw memory if construction of B fails
struct PlacementHelper
{
PlacementHelper() : placement(NULL)
{
}
~PlacementHelper()
{
operator delete(placement);
}
void* placement;
};
void calledVeryOften(....)
{
PlacementHelper hp;
if (b == NULL)
{
hp.placement = operator new(sizeof(B));
}
else
{
hp.placement = b;
b->~B();
b = NULL; // We can't let b be non-null but point at an invalid B
}
// If construction throws, hp will clean up the raw memory
b = new (hp.placement) B(param1, param2, param3, param4);
// Stop hp from cleaning up; b points at a valid object
hp.placement = NULL;
}
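The same guard idea in C++11, sketched with `std::unique_ptr` and a custom deleter in place of the hand-written `PlacementHelper`. `raw_deleter`, `holder` and `maybe_throw` are hypothetical; the test type throws on demand to show that a failed construction leaves `b` null (the raw memory is released by the guard).

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <stdexcept>
#include <utility>

// Releases raw memory obtained from ::operator new (no destructor call).
struct raw_deleter {
    void operator()(void* p) const { ::operator delete(p); }
};

template <typename B>
class holder {
public:
    holder() : b(nullptr) {}
    ~holder() {
        if (b) { void* raw = b; b->~B(); ::operator delete(raw); }
    }
    holder(const holder&) = delete;
    holder& operator=(const holder&) = delete;

    template <typename... Args>
    void calledVeryOften(Args&&... args) {
        std::unique_ptr<void, raw_deleter> guard;
        if (b == nullptr) {
            guard.reset(::operator new(sizeof(B)));
        } else {
            void* raw = b;
            b->~B();
            guard.reset(raw);
            b = nullptr;  // never leave b pointing at a destroyed object
        }
        // If this throws, guard frees the raw memory.
        b = ::new (guard.get()) B(std::forward<Args>(args)...);
        guard.release();  // construction succeeded: b owns the memory again
    }
    B* get() const { return b; }

private:
    B* b;
};

// Hypothetical B whose constructor throws for negative input.
struct maybe_throw {
    int v;
    explicit maybe_throw(int x) : v(x) {
        if (x < 0) throw std::runtime_error("negative");
    }
};
```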

A quick test of Martin York's assertion that this is a premature optimisation, and that new/delete are optimised well beyond the ability of mere programmers to improve. Obviously the questioner will have to time his own code to see whether avoiding new/delete helps him, but it seems to me that for certain classes and uses it will make a big difference:
#include <iostream>
#include <vector>
int g_construct = 0;
int g_destruct = 0;
struct A {
std::vector<int> vec;
A (int a, int b) : vec((a*b) % 2) { ++g_construct; }
~A() {
++g_destruct;
}
};
int main() {
const int times = 10*1000*1000;
#if DYNAMIC
std::cout << "dynamic\n";
A *x = new A(1,3);
for (int i = 0; i < times; ++i) {
delete x;
x = new A(i,3);
}
#else
std::cout << "automatic\n";
char x[sizeof(A)];
A* yzz = new (x) A(1,3);
for (int i = 0; i < times; ++i) {
yzz->~A();
new (x) A(i,3);
}
#endif
std::cout << g_construct << " constructors and " << g_destruct << " destructors\n";
}
$ g++ allocperf.cpp -oallocperf -O3 -DDYNAMIC=0 -g && time ./allocperf
automatic
10000001 constructors and 10000000 destructors
real 0m7.718s
user 0m7.671s
sys 0m0.030s
$ g++ allocperf.cpp -oallocperf -O3 -DDYNAMIC=1 -g && time ./allocperf
dynamic
10000001 constructors and 10000000 destructors
real 0m15.188s
user 0m15.077s
sys 0m0.047s
This is roughly what I expected: the delete/new code takes twice as long as the GMan-style (destruct/placement new) code, and is presumably doing twice as much allocation. If the vector member of A is replaced with an int, then the GMan-style code takes a fraction of a second. That's GCC 3.
$ g++-4 allocperf.cpp -oallocperf -O3 -DDYNAMIC=1 -g && time ./allocperf
dynamic
10000001 constructors and 10000000 destructors
real 0m5.969s
user 0m5.905s
sys 0m0.030s
$ g++-4 allocperf.cpp -oallocperf -O3 -DDYNAMIC=0 -g && time ./allocperf
automatic
10000001 constructors and 10000000 destructors
real 0m2.047s
user 0m1.983s
sys 0m0.000s
This I'm not so sure about, though: now the delete/new takes three times as long as the destruct/placement new version.
[Edit: I think I've figured it out: GCC 4 is faster on the 0-sized vectors, in effect subtracting a constant time from both versions of the code. Changing (a*b)%2 to (a*b)%2+1 restores the 2:1 time ratio, with 3.7s vs 7.5s.]
Note that I've not taken any special steps to correctly align the stack array, but printing the address shows it's 16-aligned.
Also, -g doesn't affect the timings. I left it in accidentally after looking at the objdump to check that -O3 hadn't completely removed the loop. That's also why the pointer is called yzz: searching for "y" didn't go quite as well as I'd hoped. But I've just re-run without it.
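A variant of the benchmark that addresses the alignment caveat with `alignas` and times both loops in one run with `std::chrono`, assuming C++11. `run_dynamic` and `run_placement` are hypothetical names; absolute timings depend entirely on compiler, allocator and flags, so treat any numbers it prints as machine-specific. Unlike the original, both loops also destroy their final object, so constructor and destructor counts match.

```cpp
#include <cassert>
#include <chrono>
#include <new>
#include <vector>

int g_construct = 0;
int g_destruct = 0;

struct A {
    std::vector<int> vec;
    A(int a, int b) : vec((a * b) % 2) { ++g_construct; }
    ~A() { ++g_destruct; }
};

// new/delete on every iteration; returns elapsed seconds.
double run_dynamic(int times) {
    auto start = std::chrono::steady_clock::now();
    A* x = new A(1, 3);
    for (int i = 0; i < times; ++i) {
        delete x;
        x = new A(i, 3);
    }
    delete x;
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    return d.count();
}

// Destruct + placement new into one correctly aligned buffer.
double run_placement(int times) {
    auto start = std::chrono::steady_clock::now();
    alignas(A) unsigned char buf[sizeof(A)];
    A* x = ::new (static_cast<void*>(buf)) A(1, 3);
    for (int i = 0; i < times; ++i) {
        x->~A();
        x = ::new (static_cast<void*>(buf)) A(i, 3);
    }
    x->~A();
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    return d.count();
}
```

To reproduce the original comparison, call each function with 10*1000*1000 and print the returned seconds.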

Are you sure that memory allocation is the bottleneck you think it is? Is B's constructor trivially fast?
If memory allocation is the real problem, then placement new or some of the other solutions here might well help.
If the types and ranges of param1..param4 are reasonable, and the B constructor is "heavy", you might also consider using a cached set of B. This presumes you are actually allowed to have more than one at a time (for example, that B does not front a unique resource).
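A sketch of the cached-set idea, assuming (hypothetically) that B's constructor takes four ints which together form a usable key, and that several B instances may legitimately exist at once. `b_cache` and this `B` are illustrative stand-ins; `std::map::try_emplace` (C++17) builds a B only the first time a given parameter combination is requested, and map nodes keep returned references stable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical B with an expensive constructor.
struct B {
    static int constructed;
    int sum;
    B(int p1, int p2, int p3, int p4) : sum(p1 + p2 + p3 + p4) { ++constructed; }
};
int B::constructed = 0;

class b_cache {
public:
    // Returns the cached B for these parameters, constructing it on first use.
    const B& get(int p1, int p2, int p3, int p4) {
        auto result = cache_.try_emplace(std::make_tuple(p1, p2, p3, p4),
                                         p1, p2, p3, p4);
        return result.first->second;
    }
    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::tuple<int, int, int, int>, B> cache_;
};
```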

As others have already suggested: try placement new.
Here is a complete example:
#include <new>
#include <stdio.h>
class B
{
public:
int dummy;
B (int arg)
{
dummy = arg;
printf ("C'Tor called\n");
}
~B ()
{
printf ("D'tor called\n");
}
};
void called_often (B * arg)
{
// call D'tor without freeing memory:
arg->~B();
// call C'tor without allocating memory:
arg = new(arg) B(10);
}
int main (int argc, char **args)
{
B test(1);
called_often (&test);
}

I'd go with boost::scoped_ptr here:
#include <boost/noncopyable.hpp>
#include <boost/scoped_ptr.hpp>
class A: boost::noncopyable
{
typedef boost::scoped_ptr<B> b_ptr;
b_ptr pb_;
public:
A() : pb_() {}
void calledVeryOften( /*…*/ )
{
pb_.reset( new B( params )); // old instance deallocated
// safely use *pb_ as reference to instance of B
}
};
No need for a hand-crafted destructor, and A is non-copyable, as it should have been in your original code, so it cannot leak memory on copy/assignment.
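In C++11 and later, `std::unique_ptr` gives the same behavior without Boost: `reset()` deletes the old instance, and the owning class is automatically non-copyable because `unique_ptr` is move-only. A sketch with a hypothetical two-int B; note that this still performs one new/delete pair per call, it only makes the ownership safe.

```cpp
#include <cassert>
#include <memory>

// Hypothetical B; counts live instances to show the old one is freed.
struct B {
    static int live;
    int a, b;
    B(int a_, int b_) : a(a_), b(b_) { ++live; }
    ~B() { --live; }
};
int B::live = 0;

class A {
public:
    void calledVeryOften(int p1, int p2) {
        // The new B is built first, then reset() deletes the old one.
        pb_.reset(new B(p1, p2));
        // C++14 alternative: pb_ = std::make_unique<B>(p1, p2);
    }
    const B& b() const { return *pb_; }

private:
    std::unique_ptr<B> pb_;  // A's copy operations are implicitly deleted
};
```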
I'd suggest rethinking the design, though, if you need to reallocate some inner state object very often. Look into the Flyweight and State patterns.

Erm, is there some reason you can't do this?
A() : b(new B()) { }
void calledVeryOften(…)
{
b->setValues(param1, param2, param3, param4);
}
(or set them individually, since you don't have access to the B class; those values do have mutator methods, right?)

Just have a pile of previously used Bs, and re-use them.
