I would like to serialize a complex object into a binary file using mmap in C++. By complex object I mean an object that contains pointers (like a tree data structure).
The idea is to be able to load the object later from the file with mmap, like this:
my_structure* obj = (my_structure*)mmap(...);
without having to rebuild the whole data structure, for performance reasons (it is a huge data structure!).
All the examples I found on the internet are very simplistic (like how to put an int in a file), and I can't find anything about how to write out the memory of an object that contains pointers. How can we do that?
Note: I'm on Mac OS X.
There is one interesting way of doing it that I have seen, but its use is somewhat limited:
First, you can't serialize raw pointers or any other non-POD type. The way to serialize a structure that refers to other data is to use a special type that, instead of keeping the pointer value, keeps an offset from its own memory location:
Example:
struct void_ptr
{
int offset;
void * get ()
{
return ((char*)this) + offset;
}
};
//or for generic type:
template <class T>
struct t_ptr
{
int offset;
T * get ()
{
return (T*)(((char*)this) + offset);
}
};
Second, you need a special serializer that computes the offsets of all the members inside the class/structure.
Let's take an example. You want to serialize struct A:
struct A
{
t_ptr<int> pointer_to_int;//let's suppose it points to an array of 2 ints
int my_value;
};
The total memory requirement for this structure is 16 bytes, or 4 ints assuming a 4-byte int (one int for the pointer_to_int offset, one for my_value, and 2 for the int array that pointer_to_int points to).
The array that pointer_to_int points to needs to be located in memory right after the A structure's data, and the offset stored in pointer_to_int should be sizeof(A), because get() measures the offset from the address of the t_ptr member itself, and pointer_to_int is the first member of A.
Example:
int m[] = { 8, 1, 2, 3 };
A& a = *(A*)&m[0];
std::cout << a.my_value << std::endl;
std::cout << a.pointer_to_int.get()[0] << std::endl;
std::cout << a.pointer_to_int.get()[1] << std::endl;
It's very important to know and handle the memory alignment when doing such a thing!!!
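To make the idea concrete, here is a minimal sketch of the write side and the read side together. The helper names make_image and view are made up for this illustration, and the sketch assumes a 4-byte-aligned int and a buffer that is suitably aligned for A (true for heap storage and for page-aligned mmap regions). It is not a complete serializer, only the offset trick in isolation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Self-relative pointer: stores the distance in bytes from its own address.
template <class T>
struct t_ptr {
    std::int32_t offset;
    const T* get() const {
        return reinterpret_cast<const T*>(
            reinterpret_cast<const char*>(this) + offset);
    }
};

struct A {
    t_ptr<int> pointer_to_int;  // first member, so it shares A's address
    int my_value;
};

// Hypothetical helper: lays out an A followed immediately by `count` ints.
std::vector<unsigned char> make_image(int my_value, const int* values,
                                      std::size_t count) {
    static_assert(sizeof(A) % alignof(int) == 0, "array must stay aligned");
    A header{};
    header.pointer_to_int.offset = static_cast<std::int32_t>(sizeof(A));
    header.my_value = my_value;

    std::vector<unsigned char> image(sizeof(A) + count * sizeof(int));
    std::memcpy(image.data(), &header, sizeof(A));
    std::memcpy(image.data() + sizeof(A), values, count * sizeof(int));
    return image;
}

// Hypothetical helper: views the image in place, with no per-field loading.
// Assumes the buffer is suitably aligned for A.
const A* view(const std::vector<unsigned char>& image) {
    assert(image.size() >= sizeof(A));
    return reinterpret_cast<const A*>(image.data());
}
```

Because the offset is relative, the same bytes stay valid wherever they end up in memory, which is what makes this layout a candidate for writing to a file and mapping it back later.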
What you're trying to do is dangerous in C++. It's enough to have a pointer or reference as a member to fail, as you won't be able to restore those when deserializing. You won't be able to restore pointers directly since the addresses of data change between runs.
Most probably you'd want to check out the following pages:
Cap’n Proto
msgpack
Protocol Buffers
You will also probably need to restructure your program, such that for the serialized data you don't use pointers as members, as most libraries dictate a suitable serializable data structure for you: either their own generated classes or combinations of plain data and STL.
Depending on the nature of the data, you might want to split or chunk the data.
In C++ is there a way to "map" my desired layout onto a memory data, without memcopying it?
I.e. there is a void* buffer, and I know its layout:
byte1: uint8_t
byte2-3: uint16_t
byte4: uint8_t
I know I can create a struct, and memcpy the data to the struct, and then I can have the values as fields of struct.
But is there a way to achieve this without copying? The data is already there, I just need to read some fields, and I'm looking for something that can help with the layout.
(I could keep some static ints for the memory offsets, but I'm hoping for something more generic.)
I.e. I would have several "layouts", and based on the type of the raw data I'd map the appropriate layout and access its fields, which would still point to the original data.
I know I can point structs to data, it is easy:
struct message {
uint8_t type;
};
struct request:message {
uint8_t rid;
uint8_t other;
};
struct response:message {
uint8_t result;
};
vector<uint8_t> data;
data.push_back(1); //type
data.push_back(10);
data.push_back(11);
data.push_back(12);
data.push_back(13);
struct request* ptrRequest;
ptrRequest = (struct request*)&data[0]; // start at the type byte so that rid and other line up
cout << (int)ptrRequest->rid; //10
cout << (int)ptrRequest->other; //11
But what I'd like to achieve is to have a map with the layouts, i.e:
map<int, struct message*> messagetypes;
But I have no clue how to proceed, as emplacing would need a new object, and casting is also challenging if the map stores only base pointers.
If your layout structure is POD you can do placement new-expression with no initialization, that serves as an object creation marker. E.g.:
#include <new> // Placement new.
// ...
uint8_t* data = ...; // Read from disk, network, or elsewhere.
static_assert(std::is_pod<request>::value, "struct request must be POD.");
request* ptrRequest = new (static_cast<void*>(data)) request;
That only works with PODs. This is a long-standing issue documented in P0593R6, "Implicit creation of objects for low-level object manipulation".
If your target architecture requires data to be aligned, add an alignment check on the data pointer.
As another answer states, the memcpy may be eliminated by the compiler; examine the assembly output to check.
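For the packed layout in the question (uint8_t, uint16_t, uint8_t, with the 16-bit field at an odd offset), one option is a small view type that reads each field with memcpy on demand. This is only a sketch: LayoutView and read_at are names invented here, and the 16-bit value is read in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a T from an arbitrary (possibly unaligned) position in the buffer.
// Compilers commonly turn this fixed-size memcpy into a plain load.
template <class T>
T read_at(const unsigned char* buffer, std::size_t offset) {
    assert(buffer != nullptr);
    T value;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}

// Non-owning view over the 4-byte layout from the question.
struct LayoutView {
    const unsigned char* bytes;

    std::uint8_t first() const { return read_at<std::uint8_t>(bytes, 0); }
    std::uint16_t second() const { return read_at<std::uint16_t>(bytes, 1); }
    std::uint8_t third() const { return read_at<std::uint8_t>(bytes, 3); }
};
```

The view keeps pointing at the original data, so nothing is copied until a field is actually read, and a different view type can be chosen per message type.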
In C++ is there a way to "map" my desired layout onto a memory data, without memcopying it?
No, not in standard C++.
If the layout matches that of the class [1], then what you might be able to do is write the memory data directly into the class instance in the first place, so that no copying is needed afterwards.
If the above is not possible, then what you might do is copy (yes, this is memcpy, but hold that thought) the data into an automatic instance of the class, then placement-new a copy of the automatic instance onto the source array. A good optimiser can see that these copies back and forth do not change the value, and can optimise them away. Matching layout is also necessary here. Example:
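#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint8_t, std::uint16_t
#include <cstring>   // std::memcpy
#include <memory>    // std::align
#include <new>       // placement new
#include <stdexcept> // std::invalid_argument
struct data {
    std::uint8_t byte;
    std::uint8_t another;
    std::uint16_t properly_aligned;
};
void* buffer = get_some_buffer();
// With space == sizeof(data), std::align succeeds only if buffer is already aligned.
std::size_t space = sizeof(data);
if (!std::align(alignof(data), sizeof(data), buffer, space))
    throw std::invalid_argument("bad alignment");
data local{};
std::memcpy(&local, buffer, sizeof local);
data* dataptr = new(buffer) data{local};
std::uint16_t value_from_offset = dataptr->properly_aligned;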
https://godbolt.org/z/uvrXS2 Notice how there is no call to std::memcpy in the generated assembly.
One thing to consider here is that the multi-byte integers must have the same byte order as the CPU uses natively. Therefore the data is not portable across systems of different endianness. More advanced de-serialisation is required for portability.
[1] It seems unlikely, however, that the data could match the layout of the class, because the second element, which is a uint16_t, is not aligned to a 16-bit boundary from the start of the layout.
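As a small illustration of the byte-order point, a 16-bit field can be decoded from individual bytes so that the result does not depend on the host CPU. The function names load_le16 and load_be16 are made up for this sketch, and which of the two applies depends on how the data was written.

```cpp
#include <cassert>
#include <cstdint>

// Decodes a 16-bit value stored least-significant byte first.
std::uint16_t load_le16(const unsigned char* p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Decodes a 16-bit value stored most-significant byte first.
std::uint16_t load_be16(const unsigned char* p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

Since these work byte by byte, they also sidestep the alignment problem of a uint16_t at an odd offset.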
I have a structure that contains other structures, as
shown in the following question:
How to dynamically fill the structure which is a pointer to pointer of arrays in C++ implementing xfs
I need to copy values from that structure into another structure that I have created. My structure needs to be treated as an array of structures.
typedef struct Sp_cashinfo
{
LPSTR lpPhysicalPositionName;
ULONG ulInitialCount;
ULONG ulCount;
}SP_CASHUNITINFO;
I use an array of this structure since I need to store the values in a 2D form (i.e. 7 entries).
int CashUnitInfo(SP_CASHUNITINFO *Sp_cdm_cashinfo)
{
try
{
-----assigned the values----------------
hResult = WFSGetInfo (hService,dwCategory,lpQueryDetails,dwTimeOut,&lppResult); //assigned the values ,got the response 0 ie success
fwCashUnitInfo = (LPWFSCDMCUINFO)lppResult->lpBuffer;
USHORT NumPhysicalCUs;
USHORT count =(USHORT)fwCashUnitInfo->usCount;
Sp_cdm_cashinfo = (SP_CASHUNITINFO*)malloc(7*sizeof(SP_CASHUNITINFO));
for(int i=0;i<(int)count;i++)
{
NumPhysicalCUs =fwCashUnitInfo->lppList[i]->usNumPhysicalCUs;
for(int j=0;j<NumPhysicalCUs;j++)//storing the values of structure
{
Sp_cdm_cashinfo[i].lpPhysicalPositionName =fwCashUnitInfo->lppList[i]->lppPhysical[j]->lpPhysicalPositionName;
Sp_cdm_cashinfo[i].ulInitialCount =fwCashUnitInfo->lppList[i]->lppPhysical[j]->ulInitialCount;
}
}
return (int)hResult;
}
The above code is written in a class library, and the values need to be displayed from that class library.
But due to a memory allocation problem, I get garbage values in the structure that I have created.
I have successfully filled the main structure (i.e. the structure within a structure), and I need just a few specific members from it.
You have this struct:
typedef struct Sp_cashinfo
{
LPSTR lpPhysicalPositionName;
ULONG ulInitialCount;
ULONG ulCount;
}SP_CASHUNITINFO;
Assuming that LPSTR is from the Windows types, it is a typedef for char * on most modern systems. If that is the case, then you need to allocate memory for that string in addition to the space for the struct. When you create space for this struct, you set aside enough memory for the pointer and the other 2 data members; however, the pointer doesn't yet point to anything valid. All you have done is put aside enough space to store the pointer. In the code snippet it looks like the char array was never actually allocated any memory, hence the garbage values.
I would, however, change this struct to a more idiomatic C++ design like the following:
#include <cstdint>
#include <string>
#include <utility>
struct Sp_cashinfo
{
    std::string lpPhysicalPositionName;
    std::uint32_t ulInitialCount;
    std::uint32_t ulCount;
    Sp_cashinfo(std::string name, std::uint32_t initialCount, std::uint32_t count):
        lpPhysicalPositionName(std::move(name)),
        ulInitialCount(initialCount),
        ulCount(count)
    {}
};
Memory management is a lot easier to deal with using this approach.
You can then store these structs in a std::vector and make a utility function to convert to a raw array if need be.
Keeping all your data stored in containers then converting at the boundaries of your code where you call the existing libraries is a better way of managing the complexity of a situation like this.
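A rough sketch of that conversion at the boundary could look like the following. RawPhysicalUnit is a hypothetical stand-in for the library's record type (it is not the real XFS structure), and CashUnitInfo and copy_units are names invented for the example. Returning the vector by value also avoids the situation in the question's code, where the result of malloc is assigned to a pointer parameter that was passed by value and therefore never reaches the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct CashUnitInfo {
    std::string physicalPositionName;  // owns its own copy of the text
    std::uint32_t initialCount;
    std::uint32_t count;
};

// Hypothetical stand-in for one record handed back by the device library.
struct RawPhysicalUnit {
    const char* name;
    unsigned long initialCount;
    unsigned long count;
};

// Copies the fields of interest out of the library-owned records.
std::vector<CashUnitInfo> copy_units(const RawPhysicalUnit* const* list,
                                     std::size_t n) {
    std::vector<CashUnitInfo> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const RawPhysicalUnit* raw = list[i];
        assert(raw != nullptr);
        result.push_back(CashUnitInfo{
            raw->name ? raw->name : "",
            static_cast<std::uint32_t>(raw->initialCount),
            static_cast<std::uint32_t>(raw->count)});
    }
    return result;
}
```

Because each std::string holds its own copy, the result stays valid even after the library releases or reuses its buffers.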
I have an assignment which asks one to write a function for any data type. The function is supposed to print the bytes of the structure and identify the total number of bytes the data structure uses along with differentiating between bytes used for members and bytes used for padding.
My immediate reaction, along with most of the class's reaction, was to use templates. This allows you to write the function once and gather the type of the objects passed into the function. Using memset and typeid one can easily accomplish what has been asked. However, our prof. just saw our discussion about templates and damned templates to hell.
After seeing this I was thrown for a loop and I'm looking for a little guidance as the best way to get around this. Some things I've looked into:
void pointers with explicit casting (this seems like it'd get messy)
base class with virtual functions only from which all data structures inherit from, seems a bit odd to do.
a base class with 'friendships' to each of our data structures.
rewriting a function for each data structure in our problem set (what I imagine is the worst possible solution).
Was hoping I overlooked a common c++ tool, does anyone have any ideas?
Treat the function as stupid as possible, in fact, treat it as if it doesn't know anything and all information must be passed to it.
Parameters to the function:
Structure address, as a uint8_t *. (Needed to print the bytes)
Structure size, in bytes. (Needed to print the bytes and to print the total size)
A vector of member information: member length OR the sum of the bytes used by the members.
The vector is needed to fulfill the requirement of printing the bytes used by the members and the bytes used by padding. Optionally you could pass the sum of the members.
Example:
void Analyze_Structure(uint8_t const * p_structure,
size_t size_of_structure,
size_t size_occupied_by_members);
The trick of this assignment is to figure out how to have the calling function determine these items.
Hope this helps.
Edit 1:
struct Apple
{
char a;
int weight;
double protein_per_gram;
};
int main(void)
{
Apple granny_smith;
Analyze_Structure((uint8_t *) &granny_smith,
sizeof(Apple),
sizeof(granny_smith.a)
+ sizeof(granny_smith.weight)
+ sizeof(granny_smith.protein_per_gram));
return 0;
}
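One possible body for Analyze_Structure, as a sketch only: the helpers hex_dump and padding_bytes are names introduced here, and the output format is an arbitrary choice. The function stays "stupid" as described above and works purely from the three values the caller passes in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Formats the raw bytes as two-digit hex values separated by spaces.
std::string hex_dump(const std::uint8_t* p, std::size_t size) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < size; ++i) {
        if (i != 0) out << ' ';
        out << std::setw(2) << static_cast<unsigned>(p[i]);
    }
    return out.str();
}

// Whatever sizeof reports beyond the members' own sizes must be padding.
std::size_t padding_bytes(std::size_t size_of_structure,
                          std::size_t size_occupied_by_members) {
    assert(size_occupied_by_members <= size_of_structure);
    return size_of_structure - size_occupied_by_members;
}

void Analyze_Structure(const std::uint8_t* p_structure,
                       std::size_t size_of_structure,
                       std::size_t size_occupied_by_members) {
    std::cout << "bytes:   " << hex_dump(p_structure, size_of_structure) << '\n'
              << "total:   " << size_of_structure << '\n'
              << "members: " << size_occupied_by_members << '\n'
              << "padding: "
              << padding_bytes(size_of_structure, size_occupied_by_members)
              << '\n';
}
```

This reports how many bytes are padding but not where they are; locating them needs more information from the caller.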
I have an assignment which asks one to write a function for any data type.
This means either templates (which your prof. dismissed), void*, or a variable number of arguments (similar to printf).
The function is supposed to print the bytes of the structure
void your_function(void* data, std::size_t size)
{
std::uint8_t* bytes = reinterpret_cast<std::uint8_t*>(data);
for(auto x = bytes; x != bytes + size; ++x)
std::clog << "0x" << std::hex << static_cast<std::uint32_t>(*x) << " ";
}
[...] and identify the total number of bytes the data structure uses along with differentiating between bytes used for members and bytes used for padding.
On this one, I'm lost: the function only receives an address and a size, so it cannot tell which bytes belong to members and which are padding. Padding bytes, where they exist, are counted by sizeof; a struct like the one below typically has none. Consider:
struct x { char c; char d; char e; }; // sizeof(x) == 3;
x instance{ 0, 0, 0 };
your_function(&instance, sizeof(x)); // passes 3: three chars typically need no padding
Theoretically, you could also pass alignof(x) to the function, but that won't tell you where each field sits in memory (as far as I know the exact layout is implementation-defined, but I may be wrong).
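For what it's worth, the caller can find out where each field sits with the standard offsetof macro, which is defined for standard-layout types. The sketch below uses the Apple struct from the other answer; PaddingReport and apple_padding are names made up for the illustration, and the actual numbers depend on the platform's ABI.

```cpp
#include <cassert>
#include <cstddef>

struct Apple {
    char a;
    int weight;
    double protein_per_gram;
};

// Padding found after each member, in bytes.
struct PaddingReport {
    std::size_t after_a;
    std::size_t after_weight;
    std::size_t at_end;
};

PaddingReport apple_padding() {
    PaddingReport r{};
    // Gap between the end of one member and the start of the next.
    r.after_a = offsetof(Apple, weight)
              - (offsetof(Apple, a) + sizeof(Apple::a));
    r.after_weight = offsetof(Apple, protein_per_gram)
                   - (offsetof(Apple, weight) + sizeof(Apple::weight));
    // Tail padding: whatever sizeof adds after the last member.
    r.at_end = sizeof(Apple)
             - (offsetof(Apple, protein_per_gram)
                + sizeof(Apple::protein_per_gram));

    const std::size_t members = sizeof(Apple::a) + sizeof(Apple::weight)
                              + sizeof(Apple::protein_per_gram);
    assert(r.after_a + r.after_weight + r.at_end + members == sizeof(Apple));
    return r;
}
```

The caller still has to name every member, so this does not remove the need for cooperation; it only makes the padding positions exact.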
There are a few possibilities here:
Your prof. learned "hacky" C++ that was considered good code 10 or 20 years ago and didn't update his knowledge (C-style code, pointers, direct memory access and "smart hacks" are all in here).
He didn't know how to express exactly what he wanted or the terminology to use ("write a function for any data type" is too vague: as a developer, if I got this assignment, the first thing to do would be to ask for details - like "how will it be used?" and "what is the expected function signature").
For example, this could be achieved - to a degree - with macros, but if he wants you to use macros in place of functions and templates, you should probably contemplate changing professors.
He meant that you should write some arbitrary data type (like my struct x above) and define your API around that (unlikely).
I am not sure that such a function can be built without a minimum of introspection: you need to know what the struct members are, otherwise you only have access to the size of the struct.
Anyway, here is my proposal for a solution that should work without introspection, provided the user of the code "cooperates".
Your functions will take as arguments void* and size_t for the address and sizeof of the struct.
0) let the user create a struct of the desired type.
1) let the user call a function of yours that sets all bytes to 0.
2) let the user assign a value to every field of the struct.
3) let the user call a function of yours that keeps a record of every byte that is still 0.
4) let the user call a function of yours that sets all bytes to 1.
5) let the user assign a value to every field of the struct again. (Same values as the first time!)
6) let the user call a function of yours and count the bytes that are still 1 AND were marked before. These are padding bytes.
The reason to try with values 0 then 1 is that the values assigned by the user could include bytes equal to 0, but a byte can't be 0 and 1 at the same time, so one of the two tests will exclude it.
struct _S { int I; char C; } S;
Fill0(&S, sizeof(S));
// User cooperation
S.I= 0;
S.C= '\0';
Mark0(&S, sizeof(S)); // Has some form of static storage
Fill1(&S, sizeof(S));
// User cooperation
S.I= 0;
S.C= '\0';
DetectPadding(&S, sizeof(S));
You can pack all of this in a single function that takes a callback function argument that does the member assignments.
void Assign(void* pS) // User-written callback
{
struct _S& S= *(struct _S*)pS;
S.I= 0;
S.C= '\0';
}
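Packed into a single function, the procedure might look roughly like this. The names AssignFn, Sample, AssignSample and CountPadding are invented for the sketch. It relies on member assignments leaving padding bytes untouched, which is what compilers typically do but is not something the language guarantees, so treat the result as a heuristic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

using AssignFn = void (*)(void* object);  // user-written callback

// Returns one flag per byte: true where the byte looks like padding.
std::vector<bool> DetectPadding(void* object, std::size_t size,
                                AssignFn assign) {
    assert(object != nullptr && assign != nullptr);
    const unsigned char* bytes = static_cast<const unsigned char*>(object);
    std::vector<bool> still_zero(size), padding(size);

    std::memset(object, 0x00, size);        // steps 1 and 2
    assign(object);
    for (std::size_t i = 0; i < size; ++i)  // step 3
        still_zero[i] = (bytes[i] == 0x00);

    std::memset(object, 0x01, size);        // steps 4 and 5
    assign(object);
    for (std::size_t i = 0; i < size; ++i)  // step 6
        padding[i] = still_zero[i] && (bytes[i] == 0x01);
    return padding;
}

// Example type and callback, mirroring the struct used above.
struct Sample { int I; char C; };

void AssignSample(void* p) {
    Sample& s = *static_cast<Sample*>(p);
    s.I = 42;
    s.C = 'x';
}

// Counts the bytes flagged as padding.
std::size_t CountPadding(const std::vector<bool>& flags) {
    std::size_t n = 0;
    for (bool flag : flags) n += flag ? 1 : 0;
    return n;
}
```

A call such as DetectPadding(&s, sizeof(Sample), AssignSample) then yields a per-byte map, so the positions of the padding bytes are known as well as their number.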
I am learning C++ from a game development standpoint, coming from a long time developing in C# outside of gaming, but I am having a fairly difficult time grasping the concept and use of pointers and dereferencing. I have read the two chapters in my current class's textbook literally 3 times and even googled some different pages relating to them, but it doesn't seem to be coming together all that well.
I think I get this part:
#include <iostream>
int main()
{
int myValue = 5;
int* myPointer = nullptr;
std::cout << "My value: " << myValue << std::endl; // Returns value of 5.
std::cout << "My Pointer: " << &myValue << std::endl; // Returns some hex address.
myPointer = &myValue; // This would set it to the address of memory.
*myPointer = 10; // Essentially sets myValue to 10.
std::cout << "My value: " << myValue << std::endl; // Returns value of 10.
std::cout << "My Pointer: " << &myValue << std::endl; // Returns same hex address.
}
I think what I'm not getting is, why? Why not just say myValue = 5, then myValue = 10? What is the purpose in going through the added layer for another variable or pointer? Any helpful input, real life uses or links to some reading that would help make sense of this would be GREATLY appreciated!
The purpose of pointers is something you will not fully realize until you actually need them for the first time. The example you provide is a situation where pointers are not needed, but can be used. It is really just to show how they work. A pointer is a way to remember where memory is without having to copy around everything it points to. Read this tutorial because it may give you a different view than the class book does:
http://www.cplusplus.com/doc/tutorial/pointers/
Example: If you have an array of game entities defined like this:
std::vector<Entity*> entities;
And you have a Camera class that can "track" a particular Entity:
class Camera
{
private:
Entity *mTarget; //Entity to track
public:
void setTarget(Entity *target) { mTarget = target; }
};
In this case, the only way for a Camera to refer to an Entity is by the use of pointers.
entities.push_back(new Entity());
Camera camera;
camera.setTarget(entities.front());
Now whenever the position of the Entity changes in your game world, the Camera will automatically have access to the latest position when it renders to the screen. If you had instead not used a pointer to the Entity and passed a copy, you would have an outdated position to render the Camera.
TL;DR: pointers are useful when multiple places need access to the same information
In your example they aren't doing much, like you said it's just showing how they can be used. One thing pointers are used for is to connect nodes like in a tree. If you have a node structure like so...
struct myNode
{
myNode *next;
int someData;
};
You can create several nodes and link each one to the previous myNode's next member. You can do this without pointers, but the neat thing with pointers is because they are all linked together, when you pass around the myNode list you only need to pass the first (root) node.
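As a small sketch of that idea, the function below walks the whole list given only the first node. The helper names append and sum_list are made up for the example.

```cpp
#include <cassert>

struct myNode
{
    myNode *next;
    int someData;
};

// Links `node` after `tail`, which must currently be the last node.
void append(myNode& tail, myNode& node) {
    assert(tail.next == nullptr);
    tail.next = &node;
}

// Visits every node reachable from root and adds up the data.
int sum_list(const myNode* root) {
    int total = 0;
    for (const myNode* n = root; n != nullptr; n = n->next)
        total += n->someData;
    return total;
}
```

Only the address of the first node is passed around; the rest of the list is found by following the next pointers.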
The cool thing about pointers is that if two pointers are referencing the same memory address, any changes to the memory address are recognized by everything referencing that memory address. So if you did:
int a = 5; // set a to 5
int *b = &a; // tell b to point to a
int *c = b; // copy b into c, so c also points to a
*b = 3; // set the value at 'a' to 3
cout << *c << endl; // this prints '3' because c points to the same place as b
This has some practical uses. Consider you have a list of nodes linked together. The data in each node defines some sort of task that needs to be done that will be handled by some function. As new tasks are added to the list, they get appended to the end. Since the function has a pointer to the node list, as tasks are added on it receives those as well. On the other hand, the function can also remove tasks as it completes them, which are then reflected back across any other pointers that are looking at the node list.
Pointers are also used for dynamic memory. Say you want the user to enter in a series of numbers, and they tell you how many numbers they want to use. You could define an array of 100 elements to allow for up to 100 numbers, or you could use dynamic memory.
int count = 0;
cout << "How many numbers do you want?\n> ";
cin >> count;
// Create a dynamic array with size 'count'
int *myArray = new int[count];
for(int i = 0; i < count; i++)
{
// Ask for numbers here
}
// Make sure to delete it afterwards
delete[] myArray;
If you pass an int by value, you will not be able to change the caller's value. But if you pass a pointer to the int, you can change it. This is how C lets a function modify its caller's variables. C++ can also pass arguments by reference, so this is less useful there.
#include <iostream>
void f(int i)
{
    i = 10; // changes only the local copy
    std::cout << "f value: " << i << std::endl;
}
void f2(int *pi)
{
    *pi = 10; // changes the caller's variable
    std::cout << "f2 value: " << *pi << std::endl;
}
int main()
{
    int i = 5;
    f(i);
    std::cout << "main f value: " << i << std::endl;
    f2(&i);
    std::cout << "main f2 value: " << i << std::endl;
}
In main, the first print should still be 5. The second one should be 10.
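For comparison, here is a sketch of the by-reference form mentioned above next to the other two. The function names are made up for the example.

```cpp
#include <cassert>

// Receives a copy; the caller's variable is untouched.
void set_by_value(int i) { i = 10; }

// Receives the address; the caller must write &variable at the call site.
void set_by_pointer(int* pi) {
    assert(pi != nullptr);
    *pi = 10;
}

// Receives an alias for the caller's variable; the call site looks like a
// plain by-value call, and there is no null case to handle.
void set_by_reference(int& ri) { ri = 20; }
```

The pointer form is still needed when "no object" is a valid argument or when the function must work with C code.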
What is the purpose in going through the added layer for another variable or pointer?
There isn't one. It's a deliberately contrived example to show you how the mechanism works.
In reality, objects are often stored, or accessed from distant parts of your codebase, or allocated dynamically, or otherwise cannot be scope-bound. In any of these scenarios you may find yourself in need of indirectly referring to objects, and this is achieved using pointers and/or references (depending on your need).
For example some objects have no name. It can be an allocated memory or an address returned from a function or it can be an iterator.
In your simple example of course there is no need to declare the pointer. However in many cases as for example when you deal with C string functions you need to use pointers. A simple example
char s[] = "It is pointer?";
if ( char *p = std::strchr( s, '?' ) ) *p = '!';
We use pointers mainly when we need to allocate memory dynamically, for example to implement data structures like linked lists, trees, etc.
From a C# point of view, a pointer is much the same as an object reference in C#: it is just the address in memory where the actual data is stored, and by dereferencing it you can manipulate this data.
First off, non-pointer data like the int in your example is allocated on the stack. This means that when it goes out of scope, the memory it used is freed. On the other hand, data allocated with operator new is placed on the heap (just like when you create any object in C#), which means this data is not freed when you lose its pointer. So using data in heap memory makes you do one of the following:
use a garbage collector to remove the data later (as done in C#)
manually free the memory when you don't need it anymore (the C++ way, with operator delete).
Why is it needed?
There are basically three use-cases:
stack memory is fast but limited, so if you need to store a big amount of data you have to use the heap
copying big data around is expensive. When you pass a simple value between functions on the stack, it gets copied. When you pass a pointer, the only thing copied is its address (just like in C#).
some objects in C++ might be non-copyable, like threads for example, due to their nature.
Take the example where you have a pointer to a class.
struct A
{
int thing;
double other;
A() {
thing = 4;
other = 7.2;
}
};
Let's say we have a method which takes an 'A':
void otherMethod()
{
int num = 12;
A mine;
doMethod(num, mine);
std::cout << "doobie " << mine.thing;
}
void doMethod(int num, A foo)
{
for(int i = 0; i < num; ++i)
std::cout << "blargh " << foo.other;
foo.thing--;
}
When the doMethod is called, the A object is passed by value. This means a NEW A object is created (as a copy). The foo.thing-- line won't modify the mine object at all as they're two separate objects.
What you need to do is to pass in a pointer to the original object. When you pass in a pointer, then the foo.thing-- will modify the original object instead of creating a copy of the old object into a new one.
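A sketch of the two variants side by side, using the struct A from above with the printing left out. The names doMethodByValue and doMethodByPointer are made up here to keep both versions in one file.

```cpp
#include <cassert>

struct A
{
    int thing = 4;
    double other = 7.2;
};

// Works on a copy: the caller's object keeps its old value.
void doMethodByValue(A foo) {
    foo.thing--;
}

// Works on the caller's object through its address.
void doMethodByPointer(A* foo) {
    assert(foo != nullptr);
    foo->thing--;
}
```

With the pointer version the call site becomes doMethodByPointer(&mine), and mine.thing really changes.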
Pointers (or references) are vital for the use of dynamic polymorphism in C++. They are how you use a class hierarchy.
Shape * myShape = new Circle();
myShape->Draw(); // this draws a circle
// in fact there is likely no implementation for Shape::Draw
Attempts to use a derived class through a value (instead of pointer or reference) to a base class will often result in slicing and losing the derived data portion of the object.
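The difference can be seen in a small sketch. Note one assumption: Shape is given a default implementation here (Name instead of Draw, so the result can be inspected), because an abstract Shape could not be passed by value at all. The function names are invented for the example.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string Name() const { return "shape"; }
};

struct Circle : Shape {
    std::string Name() const override { return "circle"; }
};

// Through a pointer, the call is dispatched to the object's real type.
std::string name_via_pointer(const Shape* s) {
    assert(s != nullptr);
    return s->Name();
}

// By value, only the Shape part is copied: the Circle part is sliced off.
std::string name_via_value(Shape s) {
    return s.Name();
}
```

Passing the same Circle to both functions gives different answers, which is the slicing problem described above.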
It makes a lot more sense when you're passing the pointer to a function, see this example:
void setNumber(int *number, int value) {
*number = value;
}
int aNumber = 5;
setNumber(&aNumber, 10);
// aNumber is now 10
What we're doing here is setting the value of *number, this would not be possible without the use of pointers.
If you defined it like this instead:
void setNumber(int number, int value) {
number = value;
}
int aNumber = 5;
setNumber(aNumber, 10);
// aNumber is still 5 since you're only copying its value
It also gives better performance and you're not wasting as much memory when you're passing a reference to a larger object (such as a class) to a function, instead of passing the whole object.
Well, using pointers is a pretty important concept in programming. For dynamically allocating memory it is essential to use pointers to store the address of the first location of the memory we have reserved, and the same is the case for releasing the memory: we need pointers. It's true, as someone said in an answer above, that you cannot understand the use of pointers until you need them. One example is that you can make a variable-size array using pointers and dynamic memory allocation. Another important thing is that using pointers we can change the actual value at a location, because we are accessing the location indirectly. Moreover, when we need to pass a value by reference, there are times when references do not work, so we need pointers.
The code you have written uses the dereference operator. As I said, we access the memory location indirectly through pointers, so it changes the actual value at that location, just as a reference would; that is why it prints 10.
When my data uses shared_ptr objects that are shared by a number of entries, is there a good way to write and read the data so that the sharing is preserved? For example,
I have a Data structure
struct Data
{
int a;
int b;
};
Data data;
data.a = 2;
data.b = 2;
I can write it out in a file like data.txt as
2 2
and by reading the file I can get the data back with values a = 2 and b = 2. However, if Data uses shared_ptr, it becomes difficult. For example,
struct Data
{
shared_ptr<int> a;
shared_ptr<int> b;
};
Data data;
data can be
data.a.reset(new int(2));
data.b = data.a;
or
data.a.reset(new int(2));
data.b.reset(new int(2));
The 2 cases are different. How can I write the data to the data.txt file and then read the file back, so that I get the same data with the same relation between a and b?
This is a kind of data serialization problem. Here, you want to serialize your Data, which has pointer types within it. When you serialize pointer values, the data they point to is written somewhere, and the pointers are converted into offsets into the file that holds the data.
In your case, you can think of the int values as being written out right after the object, and the "pointer" values are represented by the number of bytes after the object. So, each Data in your file could look like:
|total-bytes-of-data|
|offset-a|
|offset-b|
|value[]|
If a and b point to the same instance, they would have the same offset. If a and b point to different instances, they would have different offsets.
I'll leave as an exercise the problem of detecting and dealing with sharing that happens between different Data instances.
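For a single Data, a text variant of this scheme could be sketched as follows. Instead of byte offsets it stores indices into a list of distinct values, which plays the same role in a text file. The format "count, values, index of a, index of b" and the function names write_data and read_data are assumptions made for this example, and both pointers are assumed to be non-null.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct Data
{
    std::shared_ptr<int> a;
    std::shared_ptr<int> b;
};

// Writes "<count> <values...> <index of a> <index of b>".
std::string write_data(const Data& d) {
    assert(d.a && d.b);
    std::vector<int> values;
    std::map<const int*, std::size_t> slot_of;  // address -> index in values
    auto slot = [&](const std::shared_ptr<int>& p) -> std::size_t {
        auto found = slot_of.find(p.get());
        if (found != slot_of.end()) return found->second;  // already written
        values.push_back(*p);
        slot_of[p.get()] = values.size() - 1;
        return values.size() - 1;
    };
    const std::size_t ia = slot(d.a);
    const std::size_t ib = slot(d.b);

    std::ostringstream out;
    out << values.size();
    for (int v : values) out << ' ' << v;
    out << ' ' << ia << ' ' << ib;
    return out.str();
}

// Rebuilds the values first, then points a and b at them by index.
Data read_data(const std::string& text) {
    std::istringstream in(text);
    std::size_t count = 0;
    in >> count;
    std::vector<std::shared_ptr<int>> pool;
    for (std::size_t i = 0; i < count; ++i) {
        int v = 0;
        in >> v;
        pool.push_back(std::make_shared<int>(v));
    }
    std::size_t ia = 0, ib = 0;
    in >> ia >> ib;
    return Data{pool.at(ia), pool.at(ib)};
}
```

With this format the shared case is written as "1 2 0 0" and the separate case as "2 2 2 0 1", so the two situations from the question remain distinguishable after reading the file back.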