I'm trying to store a 2D array of variable-length C strings in a struct so I can transmit it over a network socket and rebuild it on the other side.
The plan is to have rows and cols in the packet header, so they can be used to read the variable-sized lens and arr that follow. I believe I'm writing the pointer syntax incorrectly, or there's some kind of auxiliary pointer I need to use when setting them into the struct.
struct STORAGE {
int rows; // hdr
int cols; // hdr
int** lens;
const char*** arr;
};
// code
int rows = 11;
int cols = 2;
int lens[rows][cols];
const char* arr[rows][cols];
// ... fill with strings ...
// ... along with lens ...
STORAGE store;
store.rows = rows;
store.cols = cols;
store.lens = lens;
store.arr = arr;
I get these errors when compiling this code:
error: invalid conversion from int to int** [-fpermissive]
error: cannot convert const char* [11][2] to `const char***' in assignment
I come from mostly a Java background, but I do understand how pointers work. The syntax of this one is just a little sideways for someone with my background (I mostly write Java/C++ and less C). Any suggestions?
Note: the reason why I'm not using more complex types like strings, maps, vectors, etc is that I need to transmit the structure over the network (ie pointers to the heap won't work if they have variable sizes). It must be low-level arrays unless someone can offer a better solution.
It must be low-level arrays unless someone can offer a better solution.
A unidimensional std::vector<int> or std::vector<uint8_t> already provides you with a low-level array allocated contiguously, accessible via the std::vector::data() member.
Any further dimensions you need can be obtained by sectioning that data properly. For network transmission, you would send the necessary sectioning dimensions up front, and send the data afterwards.
Something like:
Transmit num_of_dimensions
Transmit dim_size_1, dim_size_2, dim_size_3, ...
Transmit data
Receive num_of_dimensions
Loop Receiving dimension sizes
Receive dim_size_1 * dim_size_2 * dim_size_3 * ... of data
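A minimal sketch of that scheme, assuming both ends share the same byte order (pack and unpack are assumed helper names, and 32-bit dimension fields and an int32_t payload are illustrative choices):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Serialize: num_of_dimensions, then each dim_size, then the raw data.
std::vector<uint8_t> pack(const std::vector<size_t>& dims,
                          const std::vector<int32_t>& data) {
    std::vector<uint8_t> out;
    auto put = [&out](const void* p, size_t n) {
        const uint8_t* b = static_cast<const uint8_t*>(p);
        out.insert(out.end(), b, b + n);
    };
    uint32_t nd = static_cast<uint32_t>(dims.size());
    put(&nd, sizeof nd);                              // num_of_dimensions
    for (size_t d : dims) {
        uint32_t v = static_cast<uint32_t>(d);
        put(&v, sizeof v);                            // dim_size_1, dim_size_2, ...
    }
    put(data.data(), data.size() * sizeof(int32_t));  // data
    return out;
}

// Deserialize in the same order, multiplying the dims to size the payload.
std::vector<int32_t> unpack(const std::vector<uint8_t>& in,
                            std::vector<size_t>& dims) {
    size_t off = 0;
    auto get = [&in, &off](void* p, size_t n) {
        std::memcpy(p, in.data() + off, n);
        off += n;
    };
    uint32_t nd = 0;
    get(&nd, sizeof nd);
    dims.resize(nd);
    size_t total = 1;
    for (uint32_t i = 0; i < nd; ++i) {
        uint32_t v = 0;
        get(&v, sizeof v);
        dims[i] = v;
        total *= v;
    }
    std::vector<int32_t> data(total);
    get(data.data(), total * sizeof(int32_t));
    return data;
}
```

A real implementation would also fix the byte order of each field (e.g. with htonl/ntohl) before putting it on the wire.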
What I'd probably use to handle such a situation is a class / struct looking like:
template<typename T>
class MultiDimensional {
size_t num_dimensions_; // If known in advance can be made a template parameter also
std::vector<size_t> dimension_sizes_;
std::vector<T> data_;
public:
const T& indexing_accessor(...) const;
T& indexing_accessor(...);
std::vector<uint8_t> render_transmission_data();
// construct from transmission data
MultiDimensional(std::vector<uint8_t>& transmission_data);
};
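The indexing_accessor above would boil down to a row-major flat-index computation over dimension_sizes_; a sketch (flat_index is an assumed helper name, not part of any library):

```cpp
#include <cstddef>
#include <vector>

// Compute the flat offset into the contiguous data_ vector for a set of
// per-dimension indices, row-major (the last index varies fastest).
size_t flat_index(const std::vector<size_t>& dim_sizes,
                  const std::vector<size_t>& idx) {
    size_t flat = 0;
    for (size_t d = 0; d < dim_sizes.size(); ++d)
        flat = flat * dim_sizes[d] + idx[d];
    return flat;
}
```

For an 11x2 array, element (3, 1) lands at offset 3 * 2 + 1 = 7.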
Using low-level stuff like arrays won't help you much; it's complicated enough already. It can also land you in a hell of compatibility problems (like having to think about byte order).
Unless you have very strict performance constraints, use a solution designed specifically for networking instead: protocol buffers. This is a bit of overkill for your case, but it scales well in case you need to add anything.
To use protocol buffers, you first define "messages" (structures) in a .proto file, then compile them to C++ with the protobuf compiler.
You define your message like this (this is a complete .proto file):
syntax = "proto2";
package test;
message Storage {
message Row {
repeated string foo = 1;
}
repeated Row row = 1;
}
There is no direct support for 2D arrays, but an array of arrays will do just fine (repeated means that there can be multiple values in a given field; it is basically a vector). You could add fields for array size if you need quick access to them, but checking the size of the repeated fields should suffice in most practical cases.
What you get is a class that has all the fields you need, takes care of memory management, and has a bunch of methods to serialize and deserialize.
The C++ code gets a little longer in places, as you need to use getters and setters, but it should be well offset by the fact that you never need to think about serialization - it happens all by itself.
Example use of this thing in C++ could look like this:
#include "test.pb.h" // Generated from test.proto
using ::test::Storage;
int main() {
Storage s;
Storage::Row* row1 = s.add_row();
row1->add_foo("foo 0,0");
row1->add_foo("foo 0,1");
Storage::Row* row2 = s.add_row();
row2->add_foo("foo 1,0");
row2->add_foo("foo 1,1");
assert(s.row_size() == 2);
assert(s.row(0).foo_size() == 2);
s.PrintDebugString(); // prints to stdout
}
As a result, you get this output (note that this is debug output, not real serialization):
row {
foo: "foo 0,0"
foo: "foo 0,1"
}
row {
foo: "foo 1,0"
foo: "foo 1,1"
}
For completeness: in the above example the source files were test.proto and test.cpp, compiled using:
protoc --cpp_out=. test.proto
g++ test.cpp test.pb.cc -o test -lprotobuf
Related
I am not sure if this is possible at all in standard C++, so whether it is even possible to do could be a secondary way of putting my question.
I have this binary data which I want to read and re-create using structs. The data is originally created as a stream, with the content appended to a buffer one field at a time; nothing special about that. I could simply read it back as a stream, the same way it was written. Instead, I wanted to see if I could let the compiler do the math for me by representing the binary data as a data structure instead.
The fields of the binary data have a predictable order which allows it to be represented as a data type, the issue I am having is with the depth and variable length of repeating fields. I am hoping the example code below makes it clearer.
Simple Example
struct Common {
int length;
};
struct Boo {
long member0;
char member1;
};
struct FooSimple : Common {
int count;
Boo boo_list[];
};
char buffer[1024];
int index = 15;
((FooSimple *)buffer)->boo_list[index].member0;
Advanced Example
struct Common {
int length;
};
struct Boo {
long member0;
char member1;
};
struct Goo {
int count;
Boo boo_list[];
};
struct FooAdvanced : Common {
int count;
Goo goo_list[];
};
char buffer[1024];
int index0 = 5, index1 = 15;
((FooAdvanced *)buffer)->goo_list[index0].boo_list[index1].member0;
The examples are not supposed to relate. I re-used some code due to lack of creativity for unique names.
For the simple example, there is nothing unusual about it. The Boo struct is of fixed size, therefore the compiler can do the calculations just fine, to reach the member0 field.
For the advanced example, as far as I can tell at least, it isn't as trivial a case. The problem I see is that if I use the array subscript operator to select a Goo object from the inline array of Goo elements (goo_list), the compiler cannot do the offset calculations properly unless it makes some assumptions; possibly assuming that all preceding Goo elements in the array have zero Boo elements in their inline array (boo_list), or some other constant value. Naturally, that won't be the case.
Question(s):
What ways are there to have the offset computations done by the compiler, despite the inline arrays having variable lengths? Unless I am missing something, I believe templates can't help, due to their compile-time nature.
Is this even possible to achieve in C++?
How do you handle the case of instantiating a FooAdvanced object, by feeding a variable number of Goo and Boo element counts to the goo_list and boo_list members, respectively?
If it is impossible, would I have to write some sort of wrapper code to handle the calculations instead?
I'm trying to implement deserialization where the mapping to field/member is only known at runtime (its complicated). Anyway what I'm trying to do is something like the following:
class A
{
public:
    int a;   // index 0
    float b; // index 1
    char c;  // index 2
};
Then I have two arrays, one with the index of the field and the other with something that indicates the type. I then want to iterate over the arrays and write to the fields from a byte stream.
Sorry for the crappy description but I just don't know how to implement it in code. Any ideas would be appreciated thanks!
Yes you can; there are two things you need to look out for when doing it, though.
First of all, make sure you start writing from (const char*)&A.a, because compilers prepend things that don't really concern you at the start of an object (Visual C++ puts the vtable there, for instance), and you won't be writing what you think you are if you start from the address of the object.
Second, you might want to do a #pragma pack(1) before declaring any class that needs to be written to disk, because compilers usually pad and align class members for access efficiency, and you might end up having problems with that as well.
On the dynamic part of it: if making one class definition for each field combination you want is acceptable, then it's OK to do it like this. Otherwise you'd be better off including a hash table in your class and serializing/deserializing its contents by writing key-value pairs to the file.
I can't think of a language construct that can give you a field's address given an index at runtime. If your "type" array actually included field sizes, you would be able to do something like:
istream &in = <get it somehow>;
size_t *field_size = <get it somehow>;
size_t num_of_fields = <get it somehow>;
A a;
char *ptr = reinterpret_cast<char *>(&a);
for (int i = 0; i < num_of_fields; i++)
{
in.read(ptr, field_size[i]);
ptr += field_size[i];
}
Note that this will only hold if your class is simple and doesn't have any virtual function members (or inherit from such a class). If it does, you would do better to include a dummy member for getting to the byte offset where the fields start within the class:
class A
{
int __dummy; /* must be the first data member in the class */
...
<rest of your class definition here>
};
and now change the initialization of ptr as follows:
ptr = reinterpret_cast<char *>(&a) + offsetof(A, __dummy);
Another implicit assumption for this code is that machine byte-order is the same for both the machine running this code and the machine from which the serialized data is received. If not, then you will need to convert the byte ordering of the data read from the stream. This conversion is of course type dependent but you could have another array of conversion functions per field.
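Such a per-field conversion table might look like this (the fixup names are assumptions; each function byte-swaps a just-read field in place, via memcpy to stay alignment-safe):

```cpp
#include <arpa/inet.h>  // ntohl, ntohs
#include <cstdint>
#include <cstring>

// One conversion function per field type, parallel to the field_size array.
typedef void (*fixup_fn)(char*);

void fix_u32(char* p) {
    uint32_t v;
    std::memcpy(&v, p, sizeof v);
    v = ntohl(v);                 // network (big-endian) to host order
    std::memcpy(p, &v, sizeof v);
}

void fix_u16(char* p) {
    uint16_t v;
    std::memcpy(&v, p, sizeof v);
    v = ntohs(v);
    std::memcpy(p, &v, sizeof v);
}

void fix_none(char*) {}           // single bytes need no swapping
```

After each in.read(ptr, field_size[i]) in the loop above, you would call fixup[i](ptr) before advancing ptr.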
There are a lot of issues and decisions needed. At the simplest, you could keep an offset into A per field, switch on the type, and assign through a pointer to the field. For example - assuming there's an int16_t encoding field numbers in the input stream, making no effort to use static_cast<> etc. where it's a little nicer to do so, and assuming a 0 field number as input terminator...
A a;
char* p_a = (char*)&a;
char* p_v = (char*)&input_buffer;
...
while ((field_num = *(int16_t*)p_v) && (p_v += sizeof(int16_t)))
    switch (type[field_num])
    {
    case Int32:
        *(int32_t*)(p_a + offset[field_num]) = *(int32_t*)p_v;
        p_v += sizeof(int32_t);
        break;
    ...
    }
You may want to consider using e.g. ntohl() etc. to handle endianness conversions.
Let the compiler do it:
Write an operator>> function.
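A minimal sketch of that suggestion, reusing the class A from the question (the field-by-field binary reads here are one possible wire format, not the only one):

```cpp
#include <istream>
#include <sstream>

class A {
public:
    int a;
    float b;
    char c;
};

// Read each field in declared order; the compiler's member access handles
// all offsets, so padding and alignment never enter the picture.
std::istream& operator>>(std::istream& in, A& v) {
    in.read(reinterpret_cast<char*>(&v.a), sizeof v.a);
    in.read(reinterpret_cast<char*>(&v.b), sizeof v.b);
    in.read(reinterpret_cast<char*>(&v.c), sizeof v.c);
    return in;
}
```

A matching operator<< that writes the fields in the same order gives you round-trip serialization with no casts at the call site.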
EDIT TO CODE: I messed up badly with the code below when I originally posted. The problem I had in the end was that I was taking the sizeof of the size value which I had already passed to the function, so I was always writing 4 bytes instead of the actual size. Basically I did something like this:
size = sizeof(valueIWantToSend);
finalSizeToWrite = sizeof(size); //derpderpderp
I have fixed the example code below.
Tried to do something a little bit more complex to try and clean up my code when creating a packet of network data to be sent.
The idea is to have a function like:
AppendPacket(dataToBeSent, sizeOfData)
The part I'm stuck on is getting the dataToBeSent to copy across correctly via this function.
Previously I would have a few messy lines of memcpy which would stuff the data into the buffer at a specified position. This works, tried and tested, but as the example below shows, it's messy and a pain to maintain.
memcpy((char*)GetBufferPositionIndex(), (char*)&dataToBeSent, sizeof(dataToBeSent));
StepBufferPositionForward(sizeof(dataToBeSent));
So yeah, that's a more understandable version of what I currently use in various places throughout my game code to package data I need to send to multiplayer clients.
Instead of copying those lines all over the place I want to use my AppendPacket() function. Initially it looks something similar to this:
//How I tend to use the function in my code
double playerVelocity = 22.34;
AppendPacket( &playerVelocity, sizeof(playerVelocity) );
//Definition
void AppendPacket( void* dataToBeSent, size_t sizeOfData )
{
    memcpy( (char*)GetBufferPositionIndex(), dataToBeSent, sizeOfData );
    StepBufferPositionForward( sizeOfData );
}
From what I can tell by debugging, I'm actually sending the address of the data and not the data itself.
I've tried various combinations of the pointer and address-of symbols in the function and the "dataToBeSent" part, but I'm not having any luck :(
The above needs to work with all sorts of data formats such as structs, arrays of characters, single bytes, doubles, ints, floats, etc.
Any ideas as to how I'm supposed to be doing it? Been racking my brain all night and morning :(
For the curious:
The GetBufferPositionIndex() essentially returns the position of where to begin writing next in the buffer.
unsigned char packetToBeSentBuffer[PACKET_SIZE];
unsigned char* bufferIndexCurrent;
unsigned char* bufferIndexStartPosition;
unsigned char* GetBufferPositionIndex();
void StepBufferPositionForward(int sizeOfVar);
unsigned char* GetBufferPositionIndex()
{
return bufferIndexCurrent;
}
void PlayerObj::StepBufferPositionForward(int sizeOfVar)
{
bufferIndexCurrent += sizeOfVar;
if (bufferIndexCurrent > (( bufferIndexStartPosition + PACKET_SIZE) - 200))
{
cout << "WARNING: Packet for this tick has less than 200 bytes left until full\n";
}
}
I am not entirely sure where the exact problem is right now, without knowing how the code eventually gets to the socket reads/writes and how your buffer works, but here is a suggestion to further simplify your code.
The code is still error-prone as it is, since it is up to you to make sure the size of the data you're passing is correct. One wrong copy-paste of your new function with the wrong size, and you have bugs again. For these cases I always suggest a simple template:
template <typename T>
void AppendPacket( T& data )
{
std::size_t size = sizeof(T);
memcpy( some_destination, &data, size );
StepBufferPositionForward( size );
}
struct whatever a;
AppendPacket(a);
double b;
AppendPacket(b);
int c[5] = {1, 2, 3, 4, 5};
AppendPacket(c);
This way, the compiler figures out the size of the data for you each time.
About the buffer, having a fixed size will be problematic, because currently AppendPacket does not handle the case when the packet size is exceeded. There are two solutions:
Throw an exception when getting the position to write within the buffer
Have a dynamically resizeable buffer
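A sketch of the first option, with a fixed PACKET_SIZE and illustrative names (FixedPacket, Append are assumptions, not from the question's code):

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

const std::size_t PACKET_SIZE = 1024;

class FixedPacket {
    unsigned char buf_[PACKET_SIZE];
    std::size_t used_ = 0;
public:
    template <typename T>
    void Append(const T& data) {
        // Refuse the write instead of silently running off the buffer end.
        if (used_ + sizeof(T) > PACKET_SIZE)
            throw std::length_error("packet full");
        std::memcpy(buf_ + used_, &data, sizeof(T));
        used_ += sizeof(T);
    }
    std::size_t size() const { return used_; }
};
```

The caller then decides whether a full packet means "flush and start a new one" or a hard error.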
I'll give an example of how to handle the second case:
#include <algorithm>  // std::copy_n
#include <cstddef>
#include <iterator>   // std::back_inserter
#include <vector>

class Client
{
    std::vector<char> nextPacket;
public:
    template <typename T>
    void AppendToPacket( T const& data )
    {
        // figure out the size
        std::size_t size = sizeof(T);
        // make room for the extra bytes
        nextPacket.reserve( nextPacket.size() + size );
        // append the bytes of data to the vector
        std::copy_n( reinterpret_cast<char const*>(&data),
                     size,
                     std::back_inserter(nextPacket) );
    }
};
I have the following declaration in a file that gets generated by a perl script ( during compilation ):
struct _gamedata
{
short res_count;
struct
{
void * resptr;
short id;
short type;
} res_table[3];
}
_gamecoderes =
{
3,
{
{ &char_resource_ID_RES_welcome_object_ID,1002, 1001 },
{ &blah_resource_ID_RES_another_object_ID,1004, 1003 },
{ &char_resource_ID_RES_someting_object_ID,8019, 1001 },
}
};
My problem is that struct _gamedata is generated at compile time and the number of items in res_table will vary, so I can't provide a type declaring the size of res_table in advance.
I need to parse an instance of this structure. Originally I was doing this via a pointer to char, without defining struct _gamedata as a type, but I am defining res_table.
e.g.
char * pb = (char *)&_gamecoderes;
// i.e. pb points to the instance of `struct _gamedata`.
short res_count = *(short *)pb;
pb+=2;
res_table * entry = (res_table *)pb;
for( int i = 0; i < res_count; i++ )
{
do_something_with_entry(*entry);
}
I'm getting weird results with this. I'm not sure how to declare the type struct _gamedata, as I need to be able to handle a variable length for res_table at compile time.
Since the struct is anonymous, there's no way to refer to the type of this struct. (res_table is just the member name, not the type's name). You should provide a name for the struct:
struct GameResult {
short type;
short id;
void* resptr;
};
struct _gamedata {
short res_count;
GameResult res_table[3];
};
Also, you shouldn't cast the data to a char*. The res_count and entry's can be extracted using the -> operator. This way the member offsets can be computed correctly.
_gamedata* data = ...;
short res_count = data->res_count;
GameResult* entry = data->res_table;
or simply:
_gamedata* data;
for (int i = 0; i < data->res_count; ++ i)
do_something_with_entry(data->res_table[i]);
Your problem is alignment. There will be at least two bytes of padding in between res_count and res_table, so you cannot simply add two to pb. The correct way to get a pointer to res_table is:
res_table *table = &data->res_table;
If you insist on casting to char* and back, you must use offsetof:
#include <stddef.h>
...
res_table *table = (res_table *) (pb + offsetof(_gamedata, res_table));
Note: in C++ you may not use offsetof with "non-POD" data types (approximately "types you could not have declared in plain C"). The correct idiom -- without casting to char* and back -- works either way.
Ideally use memcpy(3), at least use type _gamedata, or define a protocol
We can consider two use cases. In what I might call the programmer-API type, serialization is an internal convenience and the record format is determined by the compiler and library. In the more formally defined and bulletproof implementation, a protocol is defined and a special-purpose library is written to portably read and write a stream.
The best practice will differ depending on whether it makes sense to create a versioned protocol and develop stream I/O operations.
API
The best and most completely portable implementation, when reading from compiler-serialized streams, would be to declare or dynamically allocate an exact or max-sized _gamedata and then use memcpy(3) to pull the data out of the serial stream or device memory or whatever it is. This lets the compiler allocate the object that is accessed by compiler code, and it lets the developer allocate the object that is accessed by developer (i.e., char *) logic.
But at a minimum, set a pointer to _gamedata and the compiler will do everything for you. Note also that res_table[n] will always be at the "right" address regardless of the size of the res_table[] array. It's not like making it bigger changes the location of the first element.
General serialization best practice
If the _gamedata object itself is in a buffer and potentially misaligned, i,e., if it is anything other than an object allocated for a _gamedata type by the compiler or dynamically by a real allocator, then you still have potential alignment issues and the only correct solution is to memcpy(3) each discrete type out of the buffer.
A typical error is to use the misaligned pointer anyway, because it works (slowly) on x86. But it may not work on mobile devices, or future architectures, or on some architectures when in kernel mode, or with advanced optimizations enabled. It's best to stick with real C99.
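A minimal sketch of the memcpy(3)-per-field approach, which is correct at any alignment (read_short is an assumed helper name):

```cpp
#include <cstring>

// Copy a short out of a possibly misaligned byte buffer. memcpy has no
// alignment requirement, unlike dereferencing a cast pointer.
short read_short(const char* p) {
    short v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

The same pattern applies to every fundamental type in the record: one memcpy per field, advancing the cursor by the field's size.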
It's a protocol
Finally, when serializing binary data in any fashion you are really defining a protocol. So, for maximum robustness, don't let the compiler define your protocol. Since you are in C, you can generally handle each fundamental object discretely with no loss in speed. If both the writer and reader do it, then only the developers have to agree on the protocol, not the developers and the compilers and the build team, and the C99 authors, and Dennis M. Ritchie, and probably some others.
As #Zack points out, there is padding between elements of your structure.
I'm assuming you have a char* because you've serialized the structure (in a cache, on disk, or over the network). Just because you are starting with a char * doesn't mean you have to access the entire struct the hard way. Cast it to a typed pointer, and let the compiler do the work for you:
_gamedata * data = (_gamedata *) my_char_pointer;
for( int i = 0; i < data->res_count; i++ )
{
do_something_with_entry(data->res_table[i]);
}
I need to craft a packet that has a header, a trailer, and a variable length payload field. So far I have been using a vector for the payload so my struct is set up like this:
struct a_struct{
hdr a_hdr;
vector<unsigned int> a_vector;
tr a_tr;
};
When I try to access members of the vector I get a seg fault, and sizeof on the entire struct gives me 32 (after I've added about 100 elements to the vector).
Is this a good approach? What is better?
I found this post
Variable Sized Struct C++
He was using a char array, and I'm using a vector though.
Even though the vector type is inlined in the struct, the only data member of the vector is likely a pointer. Adding elements to the vector won't increase the size of the vector type itself, only the memory it points to. That's why you won't ever see the size of the struct increase in memory, and why treating the struct as a flat block of bytes gives you a seg fault.
Usually when people want to make a variable-sized struct, they do so by adding an array as the last member of the struct and setting its length to 1. They then allocate more memory for the structure than sizeof() reports is required, in order to "expand" the structure. This is almost always accompanied by an extra member in the struct recording the size of the expanded array.
The reason for using 1 is thoroughly documented on Raymond's blog
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
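A sketch of that length-1 idiom, the classic C "struct hack" (Packet and make_packet are illustrative names; indexing past the declared array into the over-allocated block is exactly the cheat being described):

```cpp
#include <cstddef>
#include <cstdlib>

struct Packet {
    int count;            // number of valid entries in data[]
    unsigned int data[1]; // declared as 1; really count elements (see below)
};

// Allocate sizeof(Packet) plus room for the remaining count-1 elements,
// so data[0] .. data[count-1] all land inside the block.
Packet* make_packet(int count) {
    std::size_t bytes = sizeof(Packet) + (count - 1) * sizeof(unsigned int);
    Packet* p = static_cast<Packet*>(std::malloc(bytes));
    p->count = count;
    return p;
}
```

Note that sizeof(Packet) still reports the declared size, not the expanded one; the count member is what tells you the true extent.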
The solution in the other SO answer is C-specific and relies on the peculiarities of C arrays; even in C, sizeof() won't help you find the "true" size of a variable-sized struct. Essentially, it's cheating, and it's a kind of cheating that isn't necessary in C++.
What you are doing is fine. To avoid seg faults, access the vector as you would any other vector in C++:
a_struct a;
for(int i = 0; i < 100; ++i) a.a_vector.push_back(i);
cout << a.a_vector[22] << endl; // Prints 22
I saw this implementation in Boost; it looks really neat for getting a variable-length payload:
class msg_hdr_t
{
public:
std::size_t len; // Message length
unsigned int priority;// Message priority
//!Returns the data buffer associated with this message
void * data(){ return this+1; } //
};
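A sketch of how a header class like this is typically used (make_msg is an assumed helper, not part of Boost): allocate one contiguous block of sizeof(msg_hdr_t) + len bytes, so that data() points just past the header at the payload.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

class msg_hdr_t
{
public:
    std::size_t len;       // Message length
    unsigned int priority; // Message priority
    //!Returns the data buffer associated with this message
    void* data() { return this + 1; }
};

msg_hdr_t* make_msg(const char* payload, std::size_t len)
{
    char* block = new char[sizeof(msg_hdr_t) + len]; // header + payload together
    msg_hdr_t* hdr = new (block) msg_hdr_t;          // construct header in place
    hdr->len = len;
    hdr->priority = 0;
    std::memcpy(hdr->data(), payload, len);          // payload follows the header
    return hdr;
}
```

Because `this + 1` advances by one whole msg_hdr_t, data() lands exactly at the first payload byte of the block.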
This may be totally unrelated to the question, but I wanted to share the info.