In C++ is there a way to "map" my desired layout onto a memory data, without memcopying it?
I.e. there is a void* buffer, and I know its layout:
byte1: uint8_t
byte2-3: uint16_t
byte4: uint8_t
I know I can create a struct, and memcpy the data to the struct, and then I can have the values as fields of struct.
But is there a way to achieve this without copying? The data is already there, I just need to read some fields, and I'm looking for something that can help with the layout.
(I could keep some static ints for the memory offsets, but I'm hoping for something more generic.)
I.e., I would have several "layouts", and based on the type of the raw data I'd map the appropriate layout and access its fields, which would still point to the original data.
I know I can point structs to data, it is easy:
struct message {
uint8_t type;
};
struct request:message {
uint8_t rid;
uint8_t other;
};
struct response:message {
uint8_t result;
};
vector<uint8_t> data;
data.push_back(1); //type
data.push_back(10);
data.push_back(11);
data.push_back(12);
data.push_back(13);
struct request* ptrRequest;
ptrRequest = (struct request*)&data[0]; // start at the type byte so rid and other line up
cout << (int)ptrRequest->rid; //10
cout << (int)ptrRequest->other; //11
But what I'd like to achieve is to have a map with the layouts, i.e:
map<int, struct message*> messagetypes;
But I have no clue how to proceed, since emplacing would need a new object, and casting is also challenging if the map stores only base pointers.
If your layout structure is POD, you can use a placement new-expression with no initialization, which serves as an object-creation marker. E.g.:
#include <new> // Placement new.
// ...
uint8_t* data = ...; // Read from disk, network, or elsewhere.
static_assert(std::is_pod<request>::value, "struct request must be POD.");
request* ptrRequest = new (static_cast<void*>(data)) request;
That only works with PODs. This is a long-standing issue documented in P0593R6, "Implicit creation of objects for low-level object manipulation".
If your target architecture requires data to be aligned, add an alignment check on the data pointer.
As another answer states, the memcpy may be eliminated by the compiler; examine the assembly output.
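If the goal is mainly to read a few fields from the buffer, a small "view" class that loads each field with memcpy avoids both the struct copy and the alignment question. The following is a minimal sketch under assumptions: RecordView, its accessor names, and the zero-based offsets are hypothetical, chosen to match the 4-byte layout described in the question, and multi-byte fields are read in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical view over the 4-byte layout from the question:
// byte 0: uint8_t, bytes 1-2: uint16_t, byte 3: uint8_t.
// The view stores only a pointer; the data stays where it is.
class RecordView {
public:
    explicit RecordView(const void* p)
        : base_(static_cast<const unsigned char*>(p)) {}

    std::uint8_t  first()  const { return load<std::uint8_t>(0); }
    std::uint16_t middle() const { return load<std::uint16_t>(1); } // unaligned offset is fine with memcpy
    std::uint8_t  last()   const { return load<std::uint8_t>(3); }

private:
    // Copy sizeof(T) bytes out of the buffer into a properly typed local.
    template <class T>
    T load(std::size_t offset) const {
        T value;
        std::memcpy(&value, base_ + offset, sizeof value);
        return value;
    }

    const unsigned char* base_;
};

void demo_record_view() {
    const unsigned char raw[4] = {0x01, 0x34, 0x34, 0x04};
    RecordView view(raw);
    assert(view.first() == 0x01);
    assert(view.middle() == 0x3434); // both bytes equal, so host byte order does not matter here
    assert(view.last() == 0x04);
}
```

One view class per message type, selected by the leading type byte, gives the "map of layouts" effect without storing any objects in the map.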
In C++ is there a way to "map" my desired layout onto a memory data, without memcopying it?
No, not in standard C++.
If the layout matches that of the class [1], then you might be able to write the memory data directly into the class instance in the first place, so that no copying is needed afterwards.
If the above is not possible, then what you might do is copy (yes, this is memcopy, but hold that thought) the data onto an automatic instance of the class, then placement-new a copy of the automatic instance onto the source array. A good optimiser can see that these copies back and forth do not change the value, and can optimise them away. Matching layout is also necessary here. Example:
struct data {
std::uint8_t byte;
std::uint8_t another;
std::uint16_t properly_aligned;
};
void* buffer = get_some_buffer();
std::size_t space = get_buffer_size(); // hypothetical helper: number of bytes available in buffer
if (!std::align(alignof(data), sizeof(data), buffer, space))
throw std::invalid_argument("bad alignment");
data local{};
std::memcpy(&local, buffer, sizeof local);
data* dataptr = new(buffer) data{local};
std::uint16_t value_from_offset = dataptr->properly_aligned;
https://godbolt.org/z/uvrXS2 Notice how there is no call to std::memcpy in the generated assembly.
One thing to consider here is that the multi-byte integers must have the same byte order as the CPU uses natively. Therefore the data is not portable across systems of different endianness. More advanced de-serialisation is required for portability.
[1] However, it seems unlikely that the data could match the layout of the class, because the second element, a uint16_t, is not aligned to a 16-bit boundary from the start of the layout.
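For the portability point, one option is to decode multi-byte fields byte by byte, so the result does not depend on the CPU's byte order. This is a small sketch; the function names are hypothetical, and which byte order the data actually uses is something only the data format can tell you.

```cpp
#include <cassert>
#include <cstdint>

// Decode a 16-bit value stored least-significant byte first.
inline std::uint16_t load_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Decode a 16-bit value stored most-significant byte first.
inline std::uint16_t load_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

void demo_byte_order() {
    const unsigned char bytes[2] = {0x34, 0x12};
    assert(load_le16(bytes) == 0x1234);
    assert(load_be16(bytes) == 0x3412);
}
```

Because these helpers only read individual bytes, they also work at unaligned offsets such as the uint16_t at byte offset 1 in the question's layout.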
Related
There are problems, where we need to fill buffers with mixed types. Two examples:
programming OpenGL/DirectX, we need to fill vertex buffers, which can have mixed types (which is basically an array of struct, but the struct maybe described by a run-time data)
creating a memory allocator: putting header/trailer information to the buffer (size, flags, next/prev pointer, sentinels, etc.)
The problem can be described like this:
there is an allocation function, which gives back some memory (new, malloc, OS dependent allocation function, like mmap or VirtualAlloc)
there is a need to put mixed types into an allocated buffer, at various offsets
A solution can be this, for example writing an int to an offset:
void *buffer = <allocate>;
int offset = <some_offset>;
char *ptr = static_cast<char*>(buffer);
*reinterpret_cast<int*>(ptr+offset) = int_value;
However, this is inconvenient, and has UB in at least two places:
ptr+offset is UB, as there is no char array at ptr
writing to the result of reinterpret_cast is UB, as there is no int there
To solve the inconvenience problem, this solution is often used:
union Pointer {
void *asVoid;
bool *asBool;
byte *asByte;
char *asChar;
short *asShort;
int *asInt;
Pointer(void *p) : asVoid(p) { }
};
So, with this union, we can do this:
Pointer p = <allocate>;
p.asChar += offset;
*p.asInt++ = int_value; // write an int to offset
*p.asShort++ = short_value; // then a short afterwards
// other writes here
This solution is convenient for filling buffers, but has further UB, as the solution uses non-active union members.
So, my question is: how can one solve this problem in a strictly standard conformant, and most convenient way? I mean, I'd like to have the functionality which the union solution gives me, but in a standard conformant way.
(Note: suppose, that we have no alignment issues here, alignment is taken care of by using proper offsets)
A simple (and conformant) way to handle these things is leveraging std::memcpy to move whatever values you need into the correct offsets in your storage area, e.g.
std::int32_t value;
char *ptr;
int offset;
// ...
std::memcpy(ptr+offset, &value, sizeof(value));
Do not worry about performance, since your compiler will not actually perform std::memcpy calls in many cases (e.g. small values). Of course, check the assembly output (and profile!), but it should be fine in general.
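To recover the convenience of the union-based cursor, the memcpy call can be wrapped in a small writer that advances after each write. This is a sketch under assumptions: BufferWriter and its members are hypothetical names, the storage is assumed to be an array of unsigned char, and values are written in the host's byte order with no padding between them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical cursor that appends trivially copyable values to a byte buffer.
class BufferWriter {
public:
    BufferWriter(void* buffer, std::size_t size)
        : cur_(static_cast<unsigned char*>(buffer)), end_(cur_ + size) {}

    // Copies the bytes of value to the current position and advances.
    // Returns false (and writes nothing) if the value does not fit.
    template <class T>
    bool write(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "memcpy requires a trivially copyable type");
        if (remaining() < sizeof(T)) return false;
        std::memcpy(cur_, &value, sizeof(T));
        cur_ += sizeof(T);
        return true;
    }

    std::size_t remaining() const { return static_cast<std::size_t>(end_ - cur_); }

private:
    unsigned char* cur_; // declared first so end_ can be initialized from it
    unsigned char* end_;
};

void demo_buffer_writer() {
    unsigned char storage[8] = {};
    BufferWriter writer(storage, sizeof storage);

    const std::int32_t int_value = 42;
    const std::int16_t short_value = 7;
    const bool wrote_int = writer.write(int_value);     // write an int at offset 0
    const bool wrote_short = writer.write(short_value); // then a short right after it
    assert(wrote_int && wrote_short);
    assert(writer.remaining() == 2);

    std::int16_t read_back = 0;
    std::memcpy(&read_back, storage + sizeof int_value, sizeof read_back);
    assert(read_back == 7);

    const bool wrote_again = writer.write(int_value);   // only 2 bytes left
    assert(!wrote_again);
}
```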
i'm trying to write a handle allocator in C++. this allocator would "handle" (hue hue hue) the allocation of handles for referencing assets (such as textures, uniforms, etc) in a game engine. for instance, inside a function for creating a texture, the handle allocator would be called to create a TextureHandle. when the texture was destroyed, the handle allocator would free the TextureHandle.
i'm reading through the source of BX, a library that includes a handle allocator just for this purpose - it's the base library of the popular library BGFX, a cross-platform abstraction over different rendering APIs.
before i start explaining what's baffling me, let me first outline what this class essentially looks like:
class HandleAllocator {
public:
constructor, destructor
getters: getNumHandles, getMaxHandles
u16 alloc();
void free(u16 handle);
bool isValid(u16 handle) const;
void reset();
private:
u16* getDensePointer() const;
u16* getSparsePointer() const;
u16 _numHandles;
u16 _maxHandles;
}
here's what getDensePointer() looks like:
u8* ptr = (u8*)reinterpret_cast<const u8*>(this);
return (u16*)&ptr[sizeof(HandleAllocator)];
as far as i understand it, this function is returning a pointer to the end of the class in memory, although i don't understand why the this pointer is first cast to a uint8_t* before being dereferenced and used with the array-index operator on the next line.
here's what's weird to me. the constructor calls the reset() function, which looks like this.
_numHandles = 0;
u16* dense = getDensePointer();
for(u16 ii=0, num = _maxHandles; ii < num; ++ii) {
dense[ii] = ii;
}
if getDensePointer returns a pointer to the end of the class in memory, how is it safe to be writing to memory beyond the end of the class in this for loop? how do i know this isn't stomping on something stored adjacent to it?
i'm a total noob, i realize the answer to this is probably obvious and betrays a total lack of knowledge on my part, but go easy on me..
To answer the first question, ask yourself why pointers have a type. In the end, they are just variables that are meant to store memory addresses. Any variable with a range large enough to store all possible memory addresses could do. Then what is the difference between, let's say, int* and u8*?
The difference is the way operations are performed on them. Besides dereferencing, which is another story, pointer arithmetic is also involved. Take the following declarations: int *p; u8 *u;. For p+2 to make sense, it must yield the address 8 bytes past p, assuming a 4-byte int (the address of p[2], if you'd like), while u+2 yields the address 2 bytes past u (since u8 has a size of 1).
Now, sizeof gives you the size of the type in bytes. You want to move sizeof(x) bytes, so you need to index the array (or do pointer arithmetic, they are equivalent here) on a byte-sized data type. And that's why you cast it to u8*.
Now, for the second question,
how do i know this isn't stomping on something stored adjacent to it?
simply by making sure nothing is there. This is done when the handle allocator is created. For example, if you have:
HandleAllocator *h = new HandleAllocator[3];
you can freely call reset on h[0] and have two allocators' worth of memory to play with. Without more details, it's hard to tell exactly how this excess memory is allocated and what its purpose is.
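A more common way to make sure nothing else lives after the object is to allocate the object and its trailing array in one block, then construct the object at the front with placement new. The sketch below assumes that pattern; the question does not show how the library actually creates its allocator, and TinyHandleAlloc, createTinyHandleAlloc and destroyTinyHandleAlloc are hypothetical names. Like the original, it relies on treating the bytes after the object as a u16 array, which is a widely used technique rather than something the standard spells out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical, stripped-down allocator that keeps its handle table
// in the memory immediately following the object itself.
class TinyHandleAlloc {
public:
    explicit TinyHandleAlloc(std::uint16_t maxHandles)
        : numHandles_(0), maxHandles_(maxHandles) {
        std::uint16_t* dense = densePtr();
        for (std::uint16_t i = 0; i < maxHandles_; ++i) dense[i] = i;
    }

    // Step sizeof(*this) bytes past the object; the byte-sized pointer makes
    // the arithmetic count in bytes rather than in whole objects.
    std::uint16_t* densePtr() {
        unsigned char* bytes = reinterpret_cast<unsigned char*>(this);
        return reinterpret_cast<std::uint16_t*>(bytes + sizeof(TinyHandleAlloc));
    }

    std::uint16_t numHandles() const { return numHandles_; }
    std::uint16_t maxHandles() const { return maxHandles_; }

private:
    std::uint16_t numHandles_;
    std::uint16_t maxHandles_;
};

// Allocate room for the object plus maxHandles trailing u16 slots,
// so writes through densePtr() stay inside memory this object owns.
TinyHandleAlloc* createTinyHandleAlloc(std::uint16_t maxHandles) {
    const std::size_t bytes =
        sizeof(TinyHandleAlloc) + maxHandles * sizeof(std::uint16_t);
    void* mem = std::malloc(bytes);
    if (!mem) return nullptr;
    return new (mem) TinyHandleAlloc(maxHandles);
}

void destroyTinyHandleAlloc(TinyHandleAlloc* alloc) {
    if (!alloc) return;
    alloc->~TinyHandleAlloc();
    std::free(alloc);
}

void demo_tiny_handle_alloc() {
    TinyHandleAlloc* alloc = createTinyHandleAlloc(4);
    assert(alloc != nullptr);
    assert(alloc->maxHandles() == 4);
    assert(alloc->densePtr()[3] == 3);
    destroyTinyHandleAlloc(alloc);
}
```

With this arrangement the object must only ever be created through the factory function; constructing it on the stack or with plain new would leave no space behind it.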
I have a structure containing a nested structure, as shown in the following question:
How to dynamically fill the structure which is a pointer to pointer of arrays in C++ implementing xfs
I need to copy values from that structure into another structure that I have created. This structure should be treated as an array of structures.
typedef struct Sp_cashinfo
{
LPSTR lpPhysicalPositionName;
ULONG ulInitialCount;
ULONG ulCount;
}SP_CASHUNITINFO;
This is an array of structures, since I need to store the values in 2D form (i.e., 7 entries).
int CashUnitInfo(SP_CASHUNITINFO *Sp_cdm_cashinfo)
{
try
{
-----assigned the values----------------
hResult = WFSGetInfo (hService,dwCategory,lpQueryDetails,dwTimeOut,&lppResult); //assigned the values ,got the response 0 ie success
fwCashUnitInfo = (LPWFSCDMCUINFO)lppResult->lpBuffer;
USHORT NumPhysicalCUs;
USHORT count =(USHORT)fwCashUnitInfo->usCount;
Sp_cdm_cashinfo = (SP_CASHUNITINFO*)malloc(7*sizeof(SP_CASHUNITINFO));
for(int i=0;i<(int)count;i++)
{
NumPhysicalCUs =fwCashUnitInfo->lppList[i]->usNumPhysicalCUs;
for(int j=0;j<NumPhysicalCUs;j++)//storing the values of structure
{
Sp_cdm_cashinfo[i].lpPhysicalPositionName =fwCashUnitInfo->lppList[i]->lppPhysical[j]->lpPhysicalPositionName;
Sp_cdm_cashinfo[i].ulInitialCount =fwCashUnitInfo->lppList[i]->lppPhysical[j]->ulInitialCount;
}
}
return (int)hResult;
}
The above code is written in a class library, and the values need to be displayed in a class library.
But due to a memory allocation problem, I keep getting garbage values in the structure that I have created.
I have successfully filled the main structure (i.e., the structure within a structure) and I only need specific members from it.
You have this struct:
typedef struct Sp_cashinfo
{
LPSTR lpPhysicalPositionName;
ULONG ulInitialCount;
ULONG ulCount;
}SP_CASHUNITINFO;
Assuming that LPSTR is from the Windows types, it is a typedef for char * on most modern systems. If that is the case, then you need to allocate memory for that array along with the space for the struct. When you create space for this struct you set aside enough memory for storing the pointer and the other 2 data members; however, the pointer doesn't yet point to anything valid. All you have done is put aside enough space to store the pointer. In the code snippet it looks like the char array was never actually allocated any memory, hence the garbage values.
I would however change this struct to a more idiomatic c++ design like the following:
#include <cstdint>
#include <string>
#include <utility>
struct Sp_cashinfo
{
std::string lpPhysicalPositionName;
std::uint32_t ulInitialCount;
std::uint32_t ulCount;
Sp_cashinfo(std::string name, std::uint32_t initialCount, std::uint32_t count):
lpPhysicalPositionName(std::move(name)),
ulInitialCount(initialCount),
ulCount(count)
{}
};
Memory management is a lot easier to deal with using this approach.
You can then store these structs in a std::vector and make a utility function to convert to a raw array if need be.
Keeping all your data stored in containers then converting at the boundaries of your code where you call the existing libraries is a better way of managing the complexity of a situation like this.
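The container-based approach could look roughly like the following. This is a sketch under assumptions: RawPhysicalUnit is a hypothetical stand-in for whatever the device API returns, and CashUnitInfo and collectCashUnits are illustrative names, not part of any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Owning copy of the fields we care about; std::string copies the
// characters, so the result stays valid after the API buffer is freed.
struct CashUnitInfo {
    std::string physicalPositionName;
    std::uint32_t initialCount;
    std::uint32_t count;
};

// Hypothetical stand-in for one physical unit as returned by the device API.
struct RawPhysicalUnit {
    const char* name;
    std::uint32_t initialCount;
    std::uint32_t count;
};

// Returns the collected data by value, so the caller actually receives it
// (assigning to a pointer parameter inside the function would not do that).
std::vector<CashUnitInfo> collectCashUnits(const RawPhysicalUnit* units, std::size_t n) {
    std::vector<CashUnitInfo> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const char* name = units[i].name ? units[i].name : "";
        out.push_back(CashUnitInfo{name, units[i].initialCount, units[i].count});
    }
    return out;
}

void demo_collect_cash_units() {
    const RawPhysicalUnit raw[2] = {{"SLOT1", 100, 90}, {nullptr, 50, 50}};
    const std::vector<CashUnitInfo> units = collectCashUnits(raw, 2);
    assert(units.size() == 2);
    assert(units[0].physicalPositionName == "SLOT1");
    assert(units[1].physicalPositionName.empty());
    assert(units[1].initialCount == 50);
}
```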
I would like to serialize a complex object into a binary file using mmap in C++. By complex object I mean an object that contains pointers (like a tree data structure).
The idea is to be able to load the object later from the file with mmap in that way :
my_structure* obj = (my_structure*)mmap(...)
without needing to reload all the data structure, for purposes of performance (because it is a huge data structure !).
All the examples I found on the internet are very reductive (like how to put an int in a file ...) and I can't find anything about how to write the memory corresponding to an object that contains pointers. How can we do that?
Note : I'm on mac osx
There is one interesting way of doing it that I have seen, but its use is somewhat limited:
First, you can't serialize pointers or any other non-POD type. The way to serialize a structure that references other data is to use a special type that, instead of keeping the pointer value, keeps an offset from its own memory location:
example:
struct void_ptr
{
int offset;
void * get ()
{
return ((char*)this) + offset;
}
};
//or for generic type:
template <class T>
struct t_ptr
{
int offset;
T * get ()
{
return (T*)(((char*)this) + offset);
}
};
Second, you need to have a special serializer that will compute the offsets of all the members inside the class/structure
let's take an example , you want to serialize struct A:
struct A
{
t_ptr<int> pointer_to_int;//let's suppose it points to an array of 2 ints
int my_value;
};
The total memory requirement for this structure is 16 bytes, or 4 ints, assuming a 4-byte int (one int for my_value, one for the pointer_to_int offset, and 2 for the int array that pointer_to_int points to).
The array that pointer_to_int points to needs to be located in memory right after the A structure's data, and the offset stored in pointer_to_int should be sizeof(A), because get() adds the offset to the address of the t_ptr member itself, which sits at the start of A.
example:
int m[] = { 8, 1, 2, 3 };
A& a = *(A*)&m[0];
std::cout << a.my_value << std::endl;
std::cout << a.pointer_to_int.get()[0] << std::endl;
std::cout << a.pointer_to_int.get()[1] << std::endl;
It's very important to know and handle the memory alignment when doing such a thing!!!
What you're trying to do is dangerous in C++. It's enough to have a pointer or reference as a member to fail, as you won't be able to restore those when deserializing. You won't be able to restore pointers directly since the addresses of data change between runs.
Most probably you'd want to check out the following pages:
Cap’n Proto
msgpack
Protocol Buffers
You will also probably need to restructure your program, such that for the serialized data you don't use pointers as members, as most libraries dictate a suitable serializable data structure for you: either their own generated classes or combinations of plain data and STL.
Depending on the nature of the data, you might want to split or chunk the data.
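One way to restructure pointer-based data so that it survives being written to a file and mapped back is to store all nodes in one contiguous array and replace child pointers with array indices. The sketch below shows only that layout idea; FlatNode and sumTree are hypothetical names, and the file and mmap handling is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A binary tree node that refers to its children by index into the node
// array instead of by address, so the array is position-independent.
struct FlatNode {
    std::int32_t value;
    std::int32_t left;  // index of the left child, or -1 if none
    std::int32_t right; // index of the right child, or -1 if none
};

// Sum all values in the subtree rooted at index root.
std::int64_t sumTree(const FlatNode* nodes, std::int32_t root) {
    if (root < 0) return 0;
    const FlatNode& n = nodes[root];
    return n.value + sumTree(nodes, n.left) + sumTree(nodes, n.right);
}

void demo_flat_tree() {
    // Root at index 0 with two leaf children at indices 1 and 2.
    const std::vector<FlatNode> nodes = {{10, 1, 2}, {3, -1, -1}, {4, -1, -1}};
    assert(sumTree(nodes.data(), 0) == 17);
}
```

Since sumTree only needs a pointer to the first node, the same function would work on an array obtained from a mapped file, provided the file was written by a program with the same struct layout and byte order.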
I have the following declaration in a file that gets generated by a perl script ( during compilation ):
struct _gamedata
{
short res_count;
struct
{
void * resptr;
short id;
short type;
} res_table[3];
}
_gamecoderes =
{
3,
{
{ &char_resource_ID_RES_welcome_object_ID,1002, 1001 },
{ &blah_resource_ID_RES_another_object_ID,1004, 1003 },
{ &char_resource_ID_RES_someting_object_ID,8019, 1001 },
}
};
My problem is that struct _gamedata is generated during compile time and the number of items in res_table will vary. So I can't provide a type declaring the size of res_table in advance.
I need to parse an instance of this structure. Originally I was doing this via a pointer to char (and not defining struct _gamedata as a type), but I am defining res_table.
e.g.
char * pb = (char *)&_gamecoderes;
// i.e. pb points to the instance of `struct _gamedata`.
short res_count = *(short *)pb;
pb+=2;
res_table * entry = (res_table *)pb;
for( int i = 0; i < res_count; i++ )
{
do_something_with_entry(*entry);
}
I'm getting weird results with this. I'm not sure how to declare a type struct _gamedata, as I need to be able to handle a variable length for res_table at compile time.
Since the struct is anonymous, there's no way to refer to the type of this struct. (res_table is just the member name, not the type's name). You should provide a name for the struct:
struct GameResult {
void* resptr; // member order matches the generated initializers: pointer, id, type
short id;
short type;
};
struct _gamedata {
short res_count;
GameResult res_table[3];
};
Also, you shouldn't cast the data to a char*. The res_count and the entries can be accessed using the -> operator. This way the member offsets are computed correctly.
_gamedata* data = ...;
short res_count = data->res_count;
GameResult* entry = data->res_table;
or simply:
_gamedata* data = ...;
for (int i = 0; i < data->res_count; ++ i)
do_something_with_entry(data->res_table[i]);
Your problem is alignment. There will be at least two bytes of padding in between res_count and res_table, so you cannot simply add two to pb. The correct way to get a pointer to res_table is:
res_table *table = data->res_table;
If you insist on casting to char* and back, you must use offsetof:
#include <stddef.h>
...
res_table *table = (res_table *) (pb + offsetof(_gamedata, res_table));
Note: in C++ you may not use offsetof with "non-POD" data types (approximately "types you could not have declared in plain C"). The correct idiom -- without casting to char* and back -- works either way.
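The padding can be checked directly with offsetof. The sketch below uses hypothetical named types (ResEntry, GameData) in place of the generated anonymous struct, and the exact offset it finds depends on the platform's pointer size and alignment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical named equivalents of the generated declaration.
struct ResEntry {
    void* resptr;
    short id;
    short type;
};

struct GameData {
    short res_count;
    ResEntry res_table[3];
};

// The table can never start before the end of res_count, and it must start
// at an address suitable for ResEntry, which is what forces the padding.
static_assert(offsetof(GameData, res_table) >= sizeof(short),
              "res_table starts after res_count");
static_assert(offsetof(GameData, res_table) % alignof(ResEntry) == 0,
              "res_table is aligned for ResEntry");

// Compute the table address from a byte pointer using the real offset
// instead of a hard-coded 2.
inline ResEntry* tableFromBytes(char* pb) {
    return reinterpret_cast<ResEntry*>(pb + offsetof(GameData, res_table));
}

void demo_offsetof() {
    static int dummy_resource;
    GameData g = {1, {{&dummy_resource, 1002, 1001}}};
    char* pb = reinterpret_cast<char*>(&g);
    ResEntry* table = tableFromBytes(pb);
    assert(table == g.res_table);
    assert(table[0].id == 1002);
    assert(table[0].type == 1001);
}
```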
Ideally use memcpy(3), at least use type _gamedata, or define a protocol
We can consider two use cases. In what I might call the programmer-API type, serialization is an internal convenience and the record format is determined by the compiler and library. In the more formally defined and bulletproof implementation, a protocol is defined and a special-purpose library is written to portably read and write a stream.
The best practice will differ depending on whether it makes sense to create a versioned protocol and develop stream I/O operations.
API
The best and most completely portable implementation when reading from compiler-object serialized streams would be to declare or dynamically allocate an exact or max-sized _gamedata and then use memcpy(3) to pull the data out of the serial stream or device memory or whatever it is. This lets the compiler allocate the object that is accessed by compiler code and it lets the developer allocate the object that is accessed by developer (i.e., char *) logic.
But at a minimum, set a pointer to _gamedata and the compiler will do everything for you. Note also that res_table[n] will always be at the "right" address regardless of the size of the res_table[] array. It's not like making it bigger changes the location of the first element.
General serialization best practice
If the _gamedata object itself is in a buffer and potentially misaligned, i.e., if it is anything other than an object allocated for a _gamedata type by the compiler or dynamically by a real allocator, then you still have potential alignment issues and the only correct solution is to memcpy(3) each discrete type out of the buffer.
A typical error is to use the misaligned pointer anyway, because it works (slowly) on x86. But it may not work on mobile devices, or future architectures, or on some architectures when in kernel mode, or with advanced optimizations enabled. It's best to stick with real C99.
It's a protocol
Finally, when serializing binary data in any fashion you are really defining a protocol. So, for maximum robustness, don't let the compiler define your protocol. Since you are in C, you can generally handle each fundamental object discretely with no loss in speed. If both the writer and reader do it, then only the developers have to agree on the protocol, not the developers and the compilers and the build team, and the C99 authors, and Dennis M. Ritchie, and probably some others.
As @Zack points out, there is padding between elements of your structure.
I'm assuming you have a char* because you've serialized the structure (in a cache, on disk, or over the network). Just because you are starting with a char * doesn't mean you have to access the entire struct the hard way. Cast it to a typed pointer, and let the compiler do the work for you:
_gamedata * data = (_gamedata *) my_char_pointer;
for( int i = 0; i < data->res_count; i++ )
{
do_something_with_entry(data->res_table[i]);
}