How to read and write shared_ptr? - c++

When my data uses a shared_ptr that is shared by a number of entries, is there a good way to read and write the data so that the sharing is preserved? For example,
I have a Data structure
struct Data
{
int a;
int b;
};
Data data;
data.a = 2;
data.b = 2;
I can write it out in a file like data.txt as
2 2
and when I read the file back, I get data with the values a = 2 and b = 2. However, if Data uses shared_ptr, it becomes difficult. For example,
struct Data
{
shared_ptr<int> a;
shared_ptr<int> b;
};
Data data;
data can be
data.a.reset(new int(2));
data.b = data.a;
or
data.a.reset(new int(2));
data.b.reset(new int(2));
The two cases are different. How can I write the data to data.txt and then read the file back into data, so that I get the same values with the same relationship between a and b?

This is a data serialization problem: you want to serialize your Data, which has pointer types within it. When you serialize pointer values, the data they point to is written somewhere in the file, and the pointers are converted into the file offsets of that data.
In your case, you can think of the int values as being written out right after the object, and the "pointer" values are represented by the number of bytes after the object. So, each Data in your file could look like:
|total-bytes-of-data|
|offset-a|
|offset-b|
|value[]|
If a and b point to the same instance, they would have the same offset. If a and b point to different instances, they would have different offsets.
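A minimal C++ sketch of this idea, assuming both pointers are non-null. Instead of a byte offset it writes an index into a table of distinct values, which follows the same principle: equal indices mean a shared object. write_data and read_data are hypothetical names, and the text format simply extends the question's data.txt idea:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <vector>

struct Data {
    std::shared_ptr<int> a;
    std::shared_ptr<int> b;
};

// Writes: number of distinct ints, the ints themselves, then one index per pointer.
// Two pointers that share an object get the same index.
void write_data(std::ostream& out, const Data& d) {
    std::map<const int*, int> index;  // object address -> slot in the value table
    std::vector<int> values;          // each distinct object's value, written once
    auto slot = [&](const std::shared_ptr<int>& p) {
        assert(p && "this sketch does not handle null pointers");
        auto it = index.find(p.get());
        if (it != index.end()) return it->second;  // already seen: shared object
        int id = static_cast<int>(values.size());
        index[p.get()] = id;
        values.push_back(*p);
        return id;
    };
    int ia = slot(d.a);
    int ib = slot(d.b);
    out << values.size();
    for (int v : values) out << ' ' << v;
    out << ' ' << ia << ' ' << ib << '\n';
}

// Reads the same format: one new int per table entry, then pointers share by index.
Data read_data(std::istream& in) {
    std::size_t n = 0;
    in >> n;
    std::vector<std::shared_ptr<int>> objects;
    for (std::size_t i = 0; i < n; ++i) {
        int v = 0;
        in >> v;
        objects.push_back(std::make_shared<int>(v));
    }
    int ia = 0, ib = 0;
    in >> ia >> ib;
    return Data{objects.at(static_cast<std::size_t>(ia)),
                objects.at(static_cast<std::size_t>(ib))};
}
```

With data.b = data.a this writes 1 2 0 0; with two separate allocations it writes 2 2 2 0 1, and reading either line back restores the original relationship. Keying the map on the address returned by get() is what detects the sharing.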
I'll leave as an exercise the problem of detecting and dealing with sharing that happens between different Data instances.

Map a layout onto memory address

In C++, is there a way to "map" my desired layout onto data in memory without memcpy-ing it?
I.e. there is a void* buffer, and I know its layout:
byte1: uint8_t
byte2-3: uint16_t
byte4: uint8_t
I know I can create a struct, memcpy the data into it, and then access the values as fields of the struct.
But is there a way to achieve this without copying? The data is already there; I just need to get some fields, and I'm looking for something that can help with the layout.
(I could keep some static ints for the memory offsets, but I'm hoping for something more generic.)
I.e., I would have several "layouts", and based on the type of the raw data I'd map the appropriate layout and access its fields, which would still point to the original data.
I know I can point structs to data, it is easy:
struct message {
uint8_t type;
};
struct request:message {
uint8_t rid;
uint8_t other;
};
struct response:message {
uint8_t result;
};
vector<uint8_t> data;
data.push_back(1); //type
data.push_back(10);
data.push_back(11);
data.push_back(12);
data.push_back(13);
struct request* ptrRequest;
ptrRequest = (struct request*)&data[0]; // request starts at the type byte, so rid is data[1]
cout << (int)ptrRequest->rid; //10
cout << (int)ptrRequest->other; //11
But what I'd like to achieve is to have a map with the layouts, i.e:
map<int, struct message*> messagetypes;
But I have no idea how to proceed, as emplacing would need a new object, and casting is also challenging if the map stores only base pointers.
If your layout structure is a POD, you can use a placement new-expression without an initializer, which serves as an object-creation marker. E.g.:
#include <new> // Placement new.
#include <type_traits> // std::is_pod.
// ...
uint8_t* data = ...; // Read from disk, network, or elsewhere.
static_assert(std::is_pod<request>::value, "struct request must be POD.");
request* ptrRequest = new (static_cast<void*>(data)) request;
That only works with PODs. This is a long-standing issue documented in P0593R6, "Implicit creation of objects for low-level object manipulation".
If your target architecture requires data to be aligned, add a check that the data pointer is suitably aligned.
As another answer states, the compiler may eliminate the memcpy entirely; examine the assembly output to confirm.
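For comparison, a hedged sketch of the plain memcpy route mentioned above, applied to the question's request struct. load is a hypothetical helper name. Because memcpy only needs the bytes, this version does not need the alignment check, and compilers commonly turn such a fixed-size memcpy into ordinary loads:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct message { std::uint8_t type; };
struct request : message { std::uint8_t rid; std::uint8_t other; };

// Copies sizeof(T) raw bytes into a local T. No object is created inside the
// original buffer, so there are no aliasing or alignment concerns.
template <class T>
T load(const std::uint8_t* p) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    T t;
    std::memcpy(&t, p, sizeof t);
    return t;
}
```

On common ABIs, load<request>(bytes) reads type, rid and other from bytes 0, 1 and 2.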
In C++ is there a way to "map" my desired layout onto a memory data, without memcopying it?
No, not in standard C++.
If the layout matches that of the class [1], then what you might be able to do is write the data into a class instance in the first place, so that no copying is needed afterwards.
If the above is not possible, then what you might do is copy (yes, this is memcopy, but hold that thought) the data onto an automatic instance of the class, then placement-new a copy of the automatic instance onto the source array. A good optimiser can see that these copies back and forth do not change the value, and can optimise them away. Matching layout is also necessary here. Example:
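#include <cstddef>
#include <cstdint>
#include <cstring> // std::memcpy
#include <memory> // std::align
#include <new> // placement new
#include <stdexcept>
struct data {
std::uint8_t byte;
std::uint8_t another;
std::uint16_t properly_aligned;
};
void* buffer = get_some_buffer();
// With space == sizeof(data), std::align cannot shift the pointer forward,
// so it fails (returns nullptr) exactly when buffer is misaligned.
std::size_t space = sizeof(data);
if (!std::align(alignof(data), sizeof(data), buffer, space))
throw std::invalid_argument("bad alignment");
data local{};
std::memcpy(&local, buffer, sizeof local); // copy the bytes out...
data* dataptr = new(buffer) data{local}; // ...and back, creating a data object in place
std::uint16_t value_from_offset = dataptr->properly_aligned;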
Notice in the generated assembly (https://godbolt.org/z/uvrXS2) that there is no call to std::memcpy.
One thing to consider here is that the multi-byte integers must have the same byte order as the CPU uses natively. Therefore the data is not portable across systems with different endianness. More advanced deserialisation is required for portability.
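One common form of that more advanced deserialisation is to assemble multi-byte fields from individual bytes with shifts. A minimal sketch for the question's layout, assuming the uint16_t is stored little-endian on the wire; parsed, load_le16 and parse are hypothetical names. Since it never performs a 16-bit load from the buffer, it also sidesteps the misaligned uint16_t at offset 1:

```cpp
#include <cassert>
#include <cstdint>

// Layout from the question: byte 0 uint8_t, bytes 1-2 uint16_t, byte 3 uint8_t.
struct parsed {
    std::uint8_t byte1;
    std::uint16_t value;
    std::uint8_t byte4;
};

// Builds the value from individual bytes, so the result is the same on any
// host byte order and no unaligned 16-bit access takes place.
std::uint16_t load_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

parsed parse(const unsigned char* p) {
    return parsed{p[0], load_le16(p + 1), p[3]};
}
```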
[1] It seems unlikely, however, that the data could match the layout of the class, because the second element, a uint16_t, is not aligned to a 16-bit boundary from the start of the layout.

precision about structure loading in memory

typedef struct sample2_s
{
int a;
int b;
int c;
int d;
} sample2;
typedef struct sample_s
{
int sampleint;
sample2 b;
} sample;
int main()
{
sample t;
}
In this example, when I create the instance t of the sample structure, I will also load sample2 in memory.
The question is: how can I load only sampleint into memory?
Is there a way to load only part of a structure into memory?
If the answer is inheritance, as I suspect, how does it work exactly? Will there be a runtime cost due to a hash table?
I am asking these questions because I want to develop a DOD (data-oriented design) program and I want to understand better how structures are managed in memory.
Thank you
If you just want to copy sampleint, you can declare int s = t.sampleint; You can also memcpy() a range of memory defined with the offsetof macro from <stddef.h> to copy a run of consecutive member variables.
It seems as if what you want is one of the following:
Declare a samplebase type that, in C++, sample can inherit from.
Declare storage for only the individual members you want to copy.
Have sample hold a pointer to a sample2, and set that to NULL if you aren’t allocating one.
Declare the sample as a temporary in a block of code, copy the parts you want, let the memory be reclaimed when it goes out of scope.
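A rough C++ sketch of the first and third options, plus the offsetof copy mentioned above; samplebase, sample_lazy and copy_a_to_c are hypothetical names. Plain (non-virtual) inheritance involves no vtable and no hash table, so reaching sampleint through the base costs nothing at runtime:

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstring>  // std::memcpy

struct sample2 { int a, b, c, d; };

// Option 1: the frequently used field lives in a base type.
struct samplebase { int sampleint; };
struct sample : samplebase { sample2 b; };

// Option 3: sample2 is held through a pointer, null when not allocated.
struct sample_lazy {
    int sampleint;
    sample2* b = nullptr;
};

// Copies the consecutive members a, b and c of a sample2, using offsetof
// to find where the run starts and ends.
void copy_a_to_c(int out[3], const sample2& s) {
    std::memcpy(out, reinterpret_cast<const char*>(&s) + offsetof(sample2, a),
                offsetof(sample2, d) - offsetof(sample2, a));
}
```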

C++ Transferring objects from memory to a file

Sometimes a collection of linked objects needs to be stored in a file for later use. There are two obvious approaches, neither of which seems quite satisfactory. The first is to create a mapping from pointers to file offsets, like this:
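#include <list>
#include <map>
struct B; // forward declarations so the pointer lists compile
struct C;
struct A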
{
int data;
std::list<B*> links;
};
struct B
{
char data;
std::list<C*> links;
};
typedef unsigned Offset;
std::map<void*,Offset> ptr2ofs;
The problem with this approach is that it requires an additional map, which may be hashed for faster access but will still introduce time and space overhead for each saved link.
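A minimal sketch of the first approach, assuming B carries no links of its own (so C is left out) and using a record index in place of a byte offset; save is a hypothetical name. The hashed map is exactly the extra per-link lookup whose cost is described above:

```cpp
#include <cassert>
#include <list>
#include <ostream>
#include <sstream>
#include <unordered_map>
#include <vector>

struct B { char data; };
struct A { int data; std::list<B*> links; };

typedef unsigned Offset;

// Writes every distinct B once, then each A with its links as B record indices.
void save(std::ostream& out, const std::vector<const A*>& as) {
    std::unordered_map<const void*, Offset> ptr2ofs;  // address -> record index
    std::vector<const B*> bs;                         // distinct Bs, in index order
    for (const A* a : as)
        for (const B* b : a->links)
            if (ptr2ofs.emplace(b, static_cast<Offset>(bs.size())).second)
                bs.push_back(b);  // first time this B is seen
    out << bs.size();
    for (const B* b : bs) out << ' ' << static_cast<int>(b->data);
    out << '\n' << as.size();
    for (const A* a : as) {
        out << ' ' << a->data << ' ' << a->links.size();
        for (const B* b : a->links) out << ' ' << ptr2ofs.at(b);
    }
    out << '\n';
}
```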
The second approach is to include the offset field directly in the data structures:
struct A
{
int data;
Offset offset;
std::list<B*> links;
};
This makes writing operations much faster, but the offset fields become redundant after saving, and will produce memory overhead after loading. So, in this case two sets of structures will be needed, one for saving data, and another one for loading it:
struct A_write
{
int data;
Offset offset;
std::list<B*> links;
};
struct A_read
{
int data;
std::list<B*> links;
};
Thus, both approaches have significant drawbacks, and neither can be considered the reference approach. Is there a way to improve on them?
Ever heard of offset pointers? They store the relative address rather than the absolute address, so if you just want to plunk down a block of memory into a file and read it back into memory another day, all the address references will still be valid.
If you want to use a serialization library, I'd look at one of the following. Yeah, you could design it yourself, but someone's already been through this trouble before....
Google Protocol Buffers
Cap'n Proto
Google FlatBuffers

Serialize complex objects using mmap

I would like to serialize a complex object into a binary file using mmap in C++. By complex object I mean an object that contains pointers (like a tree data structure).
The idea is to be able to load the object later from the file with mmap in that way :
my_structure* obj = (my_structure*)mmap(...);
without having to reload the whole data structure, for performance reasons (it is a huge data structure!).
All the examples I found on the internet are very simplistic (like how to put an int in a file...), and I can't find anything about how to write the memory of an object that contains pointers. How can we do that?
Note: I'm on Mac OS X.
There is one interesting way of doing it that I have seen, but its use is somewhat limited.
First, you can't serialize pointers or any other non-POD type directly. The way to serialize a structure containing pointer references is to use a special type that, instead of the pointer value, keeps an offset from its own memory location:
example:
struct void_ptr
{
int offset;
void * get ()
{
return ((char*)this) + offset;
}
};
//or for generic type:
template <class T>
struct t_ptr
{
int offset;
T * get ()
{
return (T*)(((char*)this) + offset);
}
};
Second, you need a special serializer that computes the offsets of all the members inside the class/structure.
Let's take an example: you want to serialize struct A:
struct A
{
t_ptr<int> pointer_to_int;//let's suppose it points to an array of 2 ints
int my_value;
};
The total memory requirement for this structure is 16 bytes, or 4 ints, assuming a 4-byte int: one int for my_value, one for the pointer_to_int offset, and 2 for the array that pointer_to_int points to.
The array that pointer_to_int points to needs to be located in memory right after the A structure's data, and the offset stored in pointer_to_int should be sizeof(A) (8 bytes here), because the offset is measured from pointer_to_int itself, which is the first member of A. For example:
int m[] = { 8, 1, 2, 3 };
A& a = *(A*)&m[0];
std::cout << a.my_value << std::endl;
std::cout << a.pointer_to_int.get()[0] << std::endl;
std::cout << a.pointer_to_int.get()[1] << std::endl;
It's very important to know about and handle memory alignment when doing something like this!
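A hedged sketch of the "special serializer" step for struct A, restating t_ptr and A from above so the block stands alone. serialize is a hypothetical name. The buffer is a std::vector<int>, which keeps everything int-aligned, and the code assumes sizeof(A) is a multiple of sizeof(int):

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

template <class T>
struct t_ptr {
    int offset;  // byte distance from this t_ptr to the target
    T* get() { return reinterpret_cast<T*>(reinterpret_cast<char*>(this) + offset); }
};

struct A {
    t_ptr<int> pointer_to_int;
    int my_value;
};

// Lays out one A followed by its int array in a single buffer and stores the
// relative offset, so the whole buffer can be written to a file as-is.
std::vector<int> serialize(int my_value, const std::vector<int>& array) {
    static_assert(sizeof(A) % sizeof(int) == 0, "A must occupy a whole number of ints");
    const std::size_t header_ints = sizeof(A) / sizeof(int);
    std::vector<int> buf(header_ints + array.size());
    A* a = new (buf.data()) A{};           // create the A object at the start
    a->my_value = my_value;
    int* tail = buf.data() + header_ints;  // the array goes right after A
    for (std::size_t i = 0; i < array.size(); ++i) tail[i] = array[i];
    a->pointer_to_int.offset = static_cast<int>(
        reinterpret_cast<char*>(tail) - reinterpret_cast<char*>(&a->pointer_to_int));
    return buf;
}
```

With a 4-byte int, serialize(1, {2, 3}) produces the same four ints as the m array in the example, {8, 1, 2, 3}.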
What you're trying to do is dangerous in C++. A single pointer or reference member is enough to make it fail: the addresses of data change between runs, so you won't be able to restore such members directly when deserializing.
Most probably you'd want to check out the following pages:
Cap’n Proto
msgpack
Protocol Buffers
You will also probably need to restructure your program, such that for the serialized data you don't use pointers as members, as most libraries dictate a suitable serializable data structure for you: either their own generated classes or combinations of plain data and STL.
Depending on the nature of the data, you might want to split or chunk the data.

What is the best way to implement addressable fields in C++?

I'm trying to implement deserialization where the mapping to fields/members is only known at runtime (it's complicated). Anyway, what I'm trying to do is something like the following:
class A
{
public:
int a; // index 0
float b; // index 1
char c; // index 2
};
Then I have two arrays, one with the index of the field and the other with something that indicates the type. I then want to iterate over the arrays and write to the fields from a byte stream.
Sorry for the crappy description but I just don't know how to implement it in code. Any ideas would be appreciated thanks!
Yes, you can, but there are two things you need to watch out for when doing it.
First of all, make sure you start writing from (char*)&obj.a, where obj is your A instance, rather than from the address of the object itself. For classes with virtual functions, compilers typically place hidden data at the start of the object (Visual C++ puts the vtable pointer there, for instance), so if you start from the address of the object you won't be writing what you think you are.
Second, you might want to use #pragma pack(1) (supported by MSVC, GCC and Clang) before declaring any class that needs to be written to disk, because compilers insert padding between members to keep them aligned for efficient memory access, and that padding would otherwise become part of what you read and write.
As for the dynamic part: if making one class definition for each field combination you want is acceptable, then it's OK to do it like this. Otherwise, you'd be better off including a hash table in your class and serializing/deserializing its contents by writing key-value pairs to the file.
I can't think of a language construct that will give you a field address from an index at runtime. If the "type" array actually included field sizes, you could do something like:
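#include <cstddef>
#include <istream>
// in, field_size and num_of_fields come from wherever you get them
void read_fields(std::istream &in, const std::size_t *field_size, std::size_t num_of_fields, A &a)
{
char *ptr = reinterpret_cast<char *>(&a);
for (std::size_t i = 0; i < num_of_fields; i++)
{
in.read(ptr, static_cast<std::streamsize>(field_size[i]));
ptr += field_size[i];
}
}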
Note that this only works if your class is simple and has no virtual member functions (and does not inherit from a class that has them). If it does, you would do better to include a dummy member that marks the byte offset where the fields start within the class:
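class A
{
public:
int dummy_; /* must be the first data member in the class */
...
<rest of your class definition here>
};
and now change the initialization of ptr as follows, so that reading starts right after the dummy (offsetof is in <cstddef>; for a class with virtual functions it is only conditionally supported, and GCC and Clang warn about it):
ptr = reinterpret_cast<char *>(&a) + offsetof(A, dummy_) + sizeof(a.dummy_);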
Another implicit assumption for this code is that machine byte-order is the same for both the machine running this code and the machine from which the serialized data is received. If not, then you will need to convert the byte ordering of the data read from the stream. This conversion is of course type dependent but you could have another array of conversion functions per field.
There are many issues and decisions involved. At the simplest, you could keep an offset into A for each field, switch on the field's type, and set the field through a pointer. For example, assuming an int16_t field number precedes each value in the input stream and a field number of 0 terminates the input (and making no effort to use static_cast<> etc. where it would be a little nicer):
A a;
char* p_a = (char*)&a;
char* p_v = (char*)&input_buffer;
...
while ((field_num = *(int16_t*)p_v) && (p_v += sizeof(int16_t)))
switch (type[field_num])
{
case Int32:
*(int32_t*)(p_a + offset[field_num]) = *(int32_t*)(p_v);
p_v += sizeof(int32_t);
break;
...
}
You may want to consider using e.g. ntohl() etc. to handle endianness conversions.
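A self-contained sketch of the offset-table approach, with some assumptions that go beyond the answer: field numbers are the question's indices 0 to 2, so -1 (rather than 0) terminates the input; values are in this machine's byte order; and std::memcpy replaces the pointer casts to avoid misaligned loads. The member a is declared as std::int32_t, and field_offset, field_type and decode are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>  // offsetof
#include <cstdint>
#include <cstring>  // std::memcpy

struct A {
    std::int32_t a;  // index 0
    float b;         // index 1
    char c;          // index 2
};

enum class Type { Int32, Float32, Char };

// Per-field tables, indexed by field number.
const std::size_t field_offset[] = {offsetof(A, a), offsetof(A, b), offsetof(A, c)};
const Type field_type[] = {Type::Int32, Type::Float32, Type::Char};
constexpr std::int16_t num_fields = 3;

// Reads (int16 field number, value) pairs into obj until a field number of -1
// (or any unknown number). Returns the position just past the terminator.
const unsigned char* decode(A& obj, const unsigned char* p) {
    char* base = reinterpret_cast<char*>(&obj);
    for (;;) {
        std::int16_t field_num;
        std::memcpy(&field_num, p, sizeof field_num);
        p += sizeof field_num;
        if (field_num < 0 || field_num >= num_fields) break;
        std::size_t size = 0;
        switch (field_type[field_num]) {
            case Type::Int32:   size = sizeof(std::int32_t); break;
            case Type::Float32: size = sizeof(float); break;
            case Type::Char:    size = sizeof(char); break;
        }
        std::memcpy(base + field_offset[field_num], p, size);  // set the field
        p += size;
    }
    return p;
}
```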
Let the compiler do it:
Write an operator>> function.
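A minimal sketch of that suggestion for the question's class A, assuming a whitespace-separated text stream rather than raw bytes. Each member is extracted with the operator>> for its own type, so the compiler picks the right conversion and no offsets or casts are needed:

```cpp
#include <cassert>
#include <istream>
#include <sstream>

class A {
public:
    int a;    // index 0
    float b;  // index 1
    char c;   // index 2
};

// Reads the fields in declaration order; the stream's state reports errors.
std::istream& operator>>(std::istream& in, A& x) {
    return in >> x.a >> x.b >> x.c;
}
```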