Storing dynamic length data 'inside' structure - c++

Problem statement: The user provides some data which I have to store inside a structure. The data arrives in a data structure that lets the user add to it dynamically.
Requirement: I need a way to store this data 'inside' the structure, contiguously.
E.g. suppose the user can pass me strings which I have to store. So I wrote something like this:
void pushData( string userData )
{
struct
{
string junk;
} data;
data.junk = userData;
}
Problem: When I store it this way, the actual data is not really stored 'inside' the structure, because string is not a POD type. The same problem arises when I receive a vector or a list.
Then I could do something like this :
void pushData( string userData )
{
struct
{
char junk[100];
} data;
// Copy userdata into array junk
}
This stores the data 'inside' the structure, but then I can't put an upper limit on the size of the string the user can provide.
Can someone suggest an approach?
P.S.: I read something about serializability, but couldn't really make out whether it would be helpful in my case. If it is the way forward, can someone give an idea of how to proceed with it?
Edit :
No this is not homework.
I have written an implementation which can pass this kind of structure over message queues. It works fine with PODs, but I need to extend it to pass on dynamic data as well.
This is how message queue takes data:
i. Give it a pointer and tell the size till which it should read and transfer data.
ii. For plain old data types, the data is stored inside the structure, so I can easily pass a pointer to this structure to the message queue for other processes.
iii. But in case of vector/string/list etc, actual data is not inside the structure and thus if I pass on the pointer of this structure, message queue will not really pass on the actual data, but rather the pointers which would be stored inside this structure.
You can see this and this. I am trying to achieve something similar.

void pushData( string userData )
{
struct Data
{
char junk[1];
};
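// sizeof(Data) already includes one byte of junk, which serves as the terminator slot.
Data* data = static_cast<Data*>(malloc(sizeof(Data) + userData.size()));
memcpy(data->junk, userData.data(), userData.size());
data->junk[userData.size()] = '\0'; // assuming you want null termination
// ... use data, then release it with free(data)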
}
Here we use an array of length 1, but we allocate the struct using malloc so it can actually have any size we want.

You ostensibly have some rather artificial constraints, but to answer the question: for a single struct to contain a variable amount of data is not possible... the closest you can come is to have the final member be say char [1], put such a struct at the start of a variably-sized heap region, and use the fact that array indexing is not checked to access memory beyond that character. To learn about this technique, see http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html (or the answer John Zwinck just posted)
Another approach is e.g. template <size_t N> struct X { char data_[N]; };, but each instantiation will be a separate struct type, and you can't pre-instantiate every size you might want at run-time (given you've said you don't want an upper bound). Even if you could, writing code that handles different instantiations as the data grows would be nightmarish, as would the code bloat caused.
Having a structure in one place with a string member with data in another place is almost always preferable to the hackery above.
Taking a hopefully-not-so-wild guess, I assume your interest is in serialising the object based on starting address and size, in some generic binary block read/write...? If so, that's still problematic even if your goal were satisfied, as you need to find out the current data size from somewhere. Writing struct-specific serialisation routines that incorporates the variable-length data on the heap is much more promising.
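To make the "struct-specific serialisation routine" idea concrete, here is a minimal sketch, assuming the message queue only needs a pointer and a byte count. The names packString and unpackString and the layout (a 4-byte length in host byte order followed by the characters) are hypothetical choices for illustration, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout: [4-byte length, host byte order][length bytes of text].
// The result is one contiguous block that can be handed over as pointer + size.
std::vector<char> packString(const std::string& userData)
{
    // Assumption: the text fits in a 32-bit length field.
    assert(userData.size() <= UINT32_MAX);
    const std::uint32_t length = static_cast<std::uint32_t>(userData.size());

    std::vector<char> block(sizeof(length) + userData.size());
    std::memcpy(block.data(), &length, sizeof(length));
    std::memcpy(block.data() + sizeof(length), userData.data(), userData.size());
    return block;
}

// Rebuild the string on the receiving side, checking the declared length
// against the number of bytes actually received.
std::string unpackString(const char* block, std::size_t blockSize)
{
    std::uint32_t length = 0;
    if (blockSize < sizeof(length))
        throw std::runtime_error("block too small for the length header");
    std::memcpy(&length, block, sizeof(length));
    if (blockSize - sizeof(length) < length)
        throw std::runtime_error("block shorter than its declared length");
    return std::string(block + sizeof(length), length);
}
```

The sender would pass block.data() and block.size() to the queue, and the receiver would call unpackString on the bytes it gets back. Because the length is stored in host byte order, this sketch only suits processes on the same machine.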

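Simple solution: estimate a maximum size for the data (e.g. 1000) and store it in a fixed-size array. This avoids the memory fragmentation that repeatedly freeing and reallocating blocks of different sizes can cause when pushData is called many times.
#define MAX_SIZE 1000
void pushData( string userData )
{
struct Data
{
char junk[MAX_SIZE];
};
Data data;
// Truncate input that would not fit, leaving room for the terminator.
size_t n = userData.size() < MAX_SIZE - 1 ? userData.size() : MAX_SIZE - 1;
memcpy(data.junk, userData.data(), n);
data.junk[n] = '\0'; // assuming you want null termination
}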

As mentioned by John Zwinck, you can use dynamic memory allocation to solve your problem.
void pushData( string userData )
{
struct Data
{
char *junk;
};
Data *d = static_cast<Data*>(calloc(1, sizeof(Data)));
d->junk = static_cast<char*>(malloc(userData.size() + 1));
strcpy(d->junk, userData.c_str());
}

Related

Binary Serialization of multiple data types

I'm trying to serialize a struct with multiple data types into a binary file in C++ (Visual Studio) and de-serialize it all at once, but I'm facing a memory allocation problem with strings when reading the data back. I know what the problem is, and I know there are open source libraries I could use, but I do not want to use them unless it is really necessary. I also know that I can write/read the members one by one, but that is too tedious for a struct containing a large number of members. I want to perform the write/read operations in one go without using any open source library.
Here is a struct for example:
struct Frame {
bool isPass{ true };
uint64_t address{ 0 };
uint32_t age{ 0 };
float marks{ 0.0 };
std::string userName;
};
Is there any way to perform the write/read operation in one go in binary format?
Thank you
Not using existing libraries is NEVER good. Still,...
You could, for example, create a pure virtual (abstract) class like
class Serializable
{
public:
virtual ~Serializable() = default;
virtual std::vector<char> serialize() = 0;
};
Then:
1) You implement it for all of your own classes.
2) You implement serialization methods for all STL and PoD types that you use (std::string, PoD types and structs with only PoD members) inside some static class. Basically, during serialization you can write something like [size][type][data ~ [size][type][data][size][type][data]].
3) Then, when you process a class for serialization, you create a buffer, first put the size into it, then the type identifier, then all the bytes of all members, serialized by the methods you implemented in 1) and 2).
4) When you read anything from such an array, you do the same backwards: read N bytes from the array (first field), determine the actual type (second field), read all members, and deserialize everything included.
The process is recursive.
But... man, it's really a bad idea. Use protobuf, or boost::serialization. Or anything else; there are a lot of serialization libraries on the internet. Read these precious comments under your question. People are right.
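For what it's worth, a hand-rolled version for the Frame struct from the question could look roughly like the sketch below. It is a simplification of the scheme above: no type identifiers, just the fixed-size members copied byte by byte and the string written as a length followed by its characters. The helper names appendPod and readPod are made up for this example, and the output is only readable by a program built for the same architecture.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

struct Frame {
    bool isPass{ true };
    std::uint64_t address{ 0 };
    std::uint32_t age{ 0 };
    float marks{ 0.0f };
    std::string userName;
};

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void appendPod(std::vector<char>& out, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivially copyable type");
    const char* bytes = reinterpret_cast<const char*>(&value);
    out.insert(out.end(), bytes, bytes + sizeof(T));
}

// Read a trivially copyable value back and advance the read position.
template <typename T>
T readPod(const std::vector<char>& in, std::size_t& pos)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivially copyable type");
    assert(pos <= in.size());
    if (in.size() - pos < sizeof(T))
        throw std::runtime_error("truncated buffer");
    T value;
    std::memcpy(&value, in.data() + pos, sizeof(T));
    pos += sizeof(T);
    return value;
}

// Write the members one after another; the string goes last as [length][characters].
std::vector<char> serialize(const Frame& f)
{
    std::vector<char> out;
    appendPod(out, f.isPass);
    appendPod(out, f.address);
    appendPod(out, f.age);
    appendPod(out, f.marks);
    appendPod(out, static_cast<std::uint64_t>(f.userName.size()));
    out.insert(out.end(), f.userName.begin(), f.userName.end());
    return out;
}

// Read the members back in exactly the order they were written.
Frame deserialize(const std::vector<char>& in)
{
    std::size_t pos = 0;
    Frame f;
    f.isPass = readPod<bool>(in, pos);
    f.address = readPod<std::uint64_t>(in, pos);
    f.age = readPod<std::uint32_t>(in, pos);
    f.marks = readPod<float>(in, pos);
    const std::uint64_t length = readPod<std::uint64_t>(in, pos);
    if (in.size() - pos < length)
        throw std::runtime_error("truncated string data");
    f.userName.assign(in.data() + pos, static_cast<std::size_t>(length));
    return f;
}
```

The whole buffer can then be written to or read from a file in one call, which is about as close to "one go" as a struct with a std::string member gets without a library.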
Assuming you have some other mechanism to keep track of your frame size rewrite your struct as:
struct Frame {
bool isPass{ true };
uint8_t pad1[3]{};
uint32_t age{ 0 };
uint64_t address{ 0 };
double marks{ 0.0 };
char userName[];
};
If we have a pointer Frame* frame, we can write this using write(fd, frame, frame_size), where frame_size > sizeof(Frame).
Assuming you have read the frame into a buffer, you can access the data using:
auto frame = reinterpret_cast<const Frame*>(buf);
The length of userName will therefore be frame_size - sizeof(Frame). You can now access the elements through your struct.
This is very C-like, and the approach is limited to only one variable-length element at the end of the struct. Note that a flexible array member such as char userName[] is a C99 feature that C++ compilers accept only as an extension.

Passing struct to a function

This works:
struct client {
string address;
int toPay;
int id;
};
int main() {
struct client clients[10];
...
file.read( (char*)&clients, sizeof (clients) );
}
What I want to do, is do those things inside of a function.
But how would I have to pass the struct to the function?
If I pass it likes this, read doesn't work:
void newFunction ( struct client *clients_t) {
...
file.read( (char*)&clients_t, sizeof (clients_t) );
}
This is not going to work, because the string data is not embedded into the struct. Instead, it has a couple of pointers to the string content. That is why file.read( (char*)&clients...) is not going to produce a valid result: the string will point to the place where a saved string once pointed, but it would no longer represent the data of interest.
If you would like to serialize the data like that, embed an entire char array in the struct, with the obvious limitation that there would be a cap on the number of characters and some wasted space.
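A minimal sketch of that fixed-size approach, assuming a 64-byte cap on the address is acceptable. ClientRecord, setAddress, writeClients and readClients are hypothetical names for this example, and the stream can be any std::ostream or std::istream (an fstream opened in binary mode, or a std::stringstream for testing).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical fixed-size variant of the client struct; the 64-byte cap is an assumption.
struct ClientRecord {
    char address[64];
    int toPay;
    int id;
};
static_assert(std::is_trivially_copyable<ClientRecord>::value,
              "raw read/write is only valid for trivially copyable types");

// Copy a string into the fixed buffer, truncating if needed and always null-terminating.
void setAddress(ClientRecord& record, const std::string& address)
{
    const std::size_t n = std::min(address.size(), sizeof(record.address) - 1);
    std::memcpy(record.address, address.data(), n);
    std::memset(record.address + n, 0, sizeof(record.address) - n);
}

// Write count records as raw bytes. The size is computed from the element
// type and the count, never from the pointer itself.
void writeClients(std::ostream& out, const ClientRecord* clients, std::size_t count)
{
    assert(clients != nullptr || count == 0);
    out.write(reinterpret_cast<const char*>(clients),
              static_cast<std::streamsize>(sizeof(ClientRecord) * count));
}

// Read count records back; returns false if the stream ran short.
bool readClients(std::istream& in, ClientRecord* clients, std::size_t count)
{
    assert(clients != nullptr || count == 0);
    in.read(reinterpret_cast<char*>(clients),
            static_cast<std::streamsize>(sizeof(ClientRecord) * count));
    return static_cast<bool>(in);
}
```

Passing the element count alongside the pointer also sidesteps the sizeof problem from the question, since inside the function sizeof applied to the pointer gives only the size of a pointer.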
Remove the & in the file.read() call.
So:
file.read((char *)clients_t, sizeof(*clients_t) * 10 );
You are passing the address of the pointer itself, but what you want is the address of the structure array, and to pass its correct size.
However, while that makes the read technically valid, it won't create string objects for you, so that fragment would only work in the unusual case that you had written out the references to your own objects earlier in the lifetime of that one process.
As a learning experience, reading and writing binary data is a great idea.
IRL, though, usually you don't want to do it at all, except perhaps via a DBMS. It's hard to debug and can be architecture-specific by exposing the byte order. Think YAML, XML, or CSV instead.
Why don't you simply create an object of struct in the function.
It will work that way too.
and then you can access the members of the struct in that function using the object you created.
A struct is just an object, so you treat it just as you would treat other object.
void Foo(client& ClientObj)
{
//anything written in here directly affects the client object you passed
}
Notice you need to pass by reference; otherwise you would only pass a copy of the object.
//in main
client CObj;
Foo(CObj);
&clients_t is a pointer to a pointer.
clients_t is a pointer which is what is required.
void newFunction ( struct client *clients_t) {
...
file.read( (char*)clients_t, sizeof (*clients_t) * 10 ); // 10 elements, as in main; sizeof (clients_t) is only the size of the pointer
}

Need help copying c-strings from embedded SQL fetch to another c-string in a separate struct

I've hit a wall with a program that uses embedded SQL to fetch rows from a database table, stores the row data in a struct, and then has that data processed, with the results being stored in another struct and pushed to a linked list. The struct where the fetched data is stored is as follows:
struct rowstruct {
char *first;
char *last;
long amt;
} client;
and my struct that I use to store the processed data (and subsequently push as a node in my linked list) is like this:
struct mystruct {
char *firstN;
char *lastN;
long total;
} data;
My problem is that with each fetch loop that occurs, I need to copy the client.first and client.last values into data.firstN and data.lastN, but I can't get it to work. The following, using the assignment operator, just seems to be copying the pointer, rather than the value:
data.firstN = client.first;
data.lastN = client.last;
If I output data.firstN and data.lastN after the first iteration of my loop, the values appear correct, but after the second fetch iteration, the first node in my list will reflect the values from the second fetch, rather than the first.
strcpy will compile, but fails at runtime due to segmentation fault, which from reading on here is due to the char* being used, though I don't think I can use char[] or string when fetching the data using embedded SQL, so that seems like a dead end.
I'm sure there's a way to do this, and it's probably obvious to most here, but I'm at a loss. Any help would be appreciated.
Thanks!
If the code says to copy the pointer, that is exactly what happens.
Probably what you want is something along the lines of
data.firstN = strdup (client.first);
data.lastN = strdup (client.last);
There are more C++-ish ways of doing the same thing, but this should get you over the hump.
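One of those more C++-ish ways, sketched under the assumption that only the processed struct needs to change: give it std::string members, which copy the characters when they are constructed from a char*. ProcessedRow and makeRow are hypothetical names for this example, and the fetch buffers are assumed to be non-null, null-terminated strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical C++ counterpart of mystruct: the std::string members own their characters.
struct ProcessedRow {
    std::string firstN;
    std::string lastN;
    long total;
};

// Build a node from the fetch buffers. Constructing a std::string from a
// const char* copies the characters, so a later fetch that overwrites
// `first` and `last` does not change rows that were stored earlier.
ProcessedRow makeRow(const char* first, const char* last, long amt)
{
    assert(first != nullptr && last != nullptr);
    return ProcessedRow{ first, last, amt };
}
```

Each fetch iteration could then push makeRow(client.first, client.last, client.amt) onto a std::vector or std::list, with no manual free needed afterwards.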
You don't need to re-declare the structure. Instead, you can declare data with
struct rowstruct data;

How do I fit a variable sized char array in a struct?

I don't understand how the reallocation of memory for a struct allows me to insert a larger char array into my struct.
Struct definition:
typedef struct props
{
char northTexture[1];
char southTexture[1];
char eastTexture[1];
char westTexture[1];
char floorTexture[1];
char ceilingTexture[1];
} PROPDATA;
example:
void function SetNorthTexture( PROPDATA* propData, char* northTexture )
{
if( strlen( northTexture ) != strlen( propData->northTexture ) )
{
PROPDATA* propPtr = (PROPDATA*)realloc( propData, sizeof( PROPDATA ) +
sizeof( northTexture ) );
if( propPtr != NULL )
{
strcpy( propData->northTexture, northTexture );
}
}
else
{
strcpy( propData->northTexture, northTexture );
}
}
I have tested something similar to this and it appears to work, I just don't understand how it does work. Now I expect some people are thinking "just use a char*" but I can't for whatever reason. The string has to be stored in the struct itself.
My confusion comes from the fact that I haven't resized my struct for any specific purpose. I haven't somehow indicated that I want the extra space to be allocated to the north texture char array in that example. I imagine the extra bit of memory I allocated is used for actually storing the string, and somehow when I call strcpy, it realises there is not enough space...
Any explanations on how this works (or how this is flawed even) would be great.
Is this C or C++? The code you've posted is C, but if it's actually C++ (as the tag implies) then use std::string. If it's C, then there are two options.
If (as you say) you must store the strings in the structure itself, then you can't resize them. C structures simply don't allow that. That "array of size 1" trick is sometimes used to bolt a single variable-length field onto the end of a structure, but can't be used anywhere else because each field has a fixed offset within the structure. The best you can do is decide on a maximum size, and make each an array of that size.
Otherwise, store each string as a char*, and resize with realloc.
This answer is not to promote the practice described below, but to explain things. There are good reasons not to use malloc, and the suggestions in other answers to use std::string are valid.
I think You have come across the trick used, for example, by Microsoft to avoid the cost of a pointer dereference. In the case of Unsized Arrays in Structures (please check the link) it relies on a non-standard extension to the language. You can use a trick like that even without the extension, but only for the struct member that is positioned at its end in memory. Usually the last member in the structure declaration is also the last in memory, but check this question to know more about it. For the trick to work, You also have to make sure the compiler won't add padding bytes at the end of the structure.
The general idea is like this: Suppose You have a structure with an array at the end like
struct MyStruct
{
int someIntField;
char someStr[1];
};
When allocating on the heap, You would normally say something like this
MyStruct* msp = (MyStruct*)malloc(sizeof(MyStruct));
However, if You allocate more space than Your struct actually occupies, You can reference the bytes that are laid out in memory right behind the struct with "out of bounds" access to the array elements. Assuming some typical sizes for the int and the char, and no padding bytes at the end, if You write this:
MyStruct* msp = (MyStruct*)malloc(sizeof(MyStruct) + someMoreBytes);
The memory layout should look like:
| msp | msp+1 | msp+2 | msp+3 | msp+4 | msp+5 | msp+6 | ... |
| <- someIntField -> |someStr[0]| <- someMoreBytes -> |
In that case, You can reference the byte at the address msp+6 like this:
msp->someStr[2];
strcpy is not that intelligent, and it is not really working.
The call to realloc() allocates enough space for the string - so it doesn't actually crash but when you strcpy the string to propData->northTexture you may be overwriting anything following northTexture in propData - propData->southTexture, propData->westTexture etc.
For example, if you called SetNorthTexture(prop, "texture");
and printed out the different textures then you would probably find that:
northTexture is "texture"
southTexture is "exture"
eastTexture is "xture" etc (assuming that the arrays are byte aligned).
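The overlap can be checked without invoking any out-of-bounds writes on the struct itself. The sketch below looks at the member offsets with offsetof and then simulates the copy in an ordinary char buffer; viewFromOffset is a hypothetical helper written just for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

typedef struct props
{
    char northTexture[1];
    char southTexture[1];
    char eastTexture[1];
    char westTexture[1];
    char floorTexture[1];
    char ceilingTexture[1];
} PROPDATA;

// Simulate "strcpy to offset 0, then read a C string starting at `offset`"
// inside a plain buffer that is large enough, so nothing is overrun here.
std::string viewFromOffset(const char* text, std::size_t offset)
{
    char block[64] = {};
    assert(offset < sizeof(block));
    std::strncpy(block, text, sizeof(block) - 1);
    return std::string(block + offset);
}
```

On mainstream compilers offsetof(PROPDATA, southTexture) is 1 and offsetof(PROPDATA, eastTexture) is 2, so viewFromOffset("texture", 1) gives "exture" and viewFromOffset("texture", 2) gives "xture", matching the output described above.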
Assuming you don't want to statically allocate char arrays big enough to hold the largest strings, and if you absolutely must have the strings in the structure then you can store the strings one after the other at the end of the structure. Obviously you will need to dynamically malloc your structure to have enough space to hold all the strings + offsets to their locations.
This is very messy and inefficient as you need to shuffle things around if strings are added, deleted or changed.
"My confusion comes from the fact that I haven't resized my struct for any specific purpose."
In low level languages like C there is some kind of distinction between structs (or types in general) and actual memory. Allocation basically consists of two steps:
Allocation of raw memory buffer of right size
Telling the compiler that this piece of raw bytes should be treated as a structure
When you do realloc, you do not change the structure, but you change the buffer it is stored in, so you can use extra space beyond structure.
Note that, although your program will not crash, it's not correct. When you put text into northTexture, you will overwrite other structure fields.
NOTE: This has no char array example but it is the same principle. It is just a guess of mine of what are you trying to achieve.
My opinion is that you have seen somewhere something like this:
typedef struct tagBITMAPINFO {
BITMAPINFOHEADER bmiHeader;
RGBQUAD bmiColors[1];
} BITMAPINFO, *PBITMAPINFO;
What you are trying to obtain can happen only when the array is at the end of the struct (and only one array).
For example, you allocate sizeof(BITMAPINFO)+15*sizeof(RGBQUAD) when you need to store 16 RGBQUAD structures (1 from the structure and 15 extra).
PBITMAPINFO info = (PBITMAPINFO)malloc(sizeof(BITMAPINFO)+15*sizeof(RGBQUAD));
You can access all the RGBQUAD structures like they are inside the BITMAPINFO structure:
info->bmiColors[0]
info->bmiColors[1]
...
info->bmiColors[15]
You can do something similar to an array declared as char bufStr[1] at the end of a struct.
Hope it helps.
One approach to keeping a struct and all its strings together in a single allocated memory block is something like this:
struct foo {
ptrdiff_t s1, s2, s3, s4;
size_t bufsize;
char buf[1];
} bar;
Allocate sizeof(struct foo)+total_string_size bytes and store the offsets to each string in the s1, s2, etc. members and bar.buf+bar.s1 is then a pointer to the first string, bar.buf+bar.s2 a pointer to the second string, etc.
You can use pointers rather than offsets if you know you won't need to realloc the struct.
Whether it makes sense to do something like this at all is debatable. One benefit is that it may help fight memory fragmentation or malloc/free overhead when you have a huge number of tiny data objects (especially in threaded environments). It also reduces error handling cleanup complexity if you have a single malloc failure to check for. There may be cache benefits to ensuring data locality. And it's possible (if you use offsets rather than pointers) to store the object on disk without any serialization (keeping in mind that your files are then machine/compiler-specific).
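Here is a rough sketch of the offsets idea that stays within standard C++ by keeping the block in a std::vector<char> and copying the header in and out with memcpy, rather than overallocating a struct with a trailing buf[1]. PackedHeader, packStrings and stringAt are hypothetical names, and the count of four strings simply mirrors s1 to s4 above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical header; the character data follows it directly in the same block.
struct PackedHeader {
    std::ptrdiff_t offset[4]; // start of each string, relative to the end of the header
    std::size_t bufsize;      // total bytes of character data, terminators included
};

// Lay out the header followed by the four null-terminated strings in one block.
std::vector<char> packStrings(const std::array<std::string, 4>& strings)
{
    PackedHeader header{};
    for (std::size_t i = 0; i < strings.size(); ++i) {
        header.offset[i] = static_cast<std::ptrdiff_t>(header.bufsize);
        header.bufsize += strings[i].size() + 1; // +1 for the terminating '\0'
    }

    std::vector<char> block(sizeof(PackedHeader) + header.bufsize, '\0');
    std::memcpy(block.data(), &header, sizeof(header));
    for (std::size_t i = 0; i < strings.size(); ++i) {
        std::memcpy(block.data() + sizeof(PackedHeader) + header.offset[i],
                    strings[i].data(), strings[i].size());
    }
    return block;
}

// Return a pointer to the index-th string inside the block.
const char* stringAt(const std::vector<char>& block, std::size_t index)
{
    assert(index < 4);
    assert(block.size() >= sizeof(PackedHeader));
    PackedHeader header;
    std::memcpy(&header, block.data(), sizeof(header));
    return block.data() + sizeof(PackedHeader) + header.offset[index];
}
```

Because only offsets are stored, the block can be copied, reallocated or written to disk as-is, with the caveat already mentioned that the result is machine- and compiler-specific.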

Handling different datatypes in a single structure

I need to send some information on a VxWorks message queue. The information to be sent is decided at runtime and may be of different data types. I am using a structure for this -
struct structData
{
char m_chType; // variable to indicate the data type - long, float or string
long m_lData; // variable to hold long value
float m_fData; // variable to hold float value
string m_strData; // variable to hold string value
};
I am currently sending an array of structData over the message queue.
structData arrStruct[MAX_SIZE];
The problem here is that only one variable in the structure is useful at a time; the other two are useless. The message queue is therefore unnecessarily overloaded.
I can't use unions because the datatype and the value are required.
I tried using templates, but that doesn't solve the problem. I can only send an array of structures of one datatype at a time.
template <typename T>
struct structData
{
char m_chType;
T m_Data;
};
structData<int> arrStruct[MAX_SIZE];
Is there a standard way to hold such information?
I don't see why you cannot use a union. This is the standard way:
struct structData
{
char m_chType; // variable to indicate the data type - long, float or string
union
{
long m_lData; // variable to hold long value
float m_fData; // variable to hold float value
char *m_strData; // variable to hold string value
};
};
Normally then, you switch on the data type, and then access on the field which is valid for that type.
Note that you cannot put a string into a union, because the string type is a non-POD type. I have changed it to use a pointer, which could be a C zero-terminated string. You must then consider the possibility of allocating and deleting the string data as necessary.
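To show the "switch on the data type" step, here is a small sketch. The tag characters 'l', 'f' and 's' and the describe function are assumptions made for this example, and the string pointer is not owned by the struct, so whoever fills it in must keep the characters alive.

```cpp
#include <cassert>
#include <string>

struct structData
{
    char m_chType; // 'l' = long, 'f' = float, 's' = string (tag values are an assumption)
    union
    {
        long m_lData;          // valid when m_chType == 'l'
        float m_fData;         // valid when m_chType == 'f'
        const char* m_strData; // valid when m_chType == 's'; not owned by the struct
    };
};

// Inspect the tag first, then touch only the union member that matches it.
std::string describe(const structData& d)
{
    switch (d.m_chType) {
    case 'l':
        return "long: " + std::to_string(d.m_lData);
    case 'f':
        return "float: " + std::to_string(d.m_fData);
    case 's':
        assert(d.m_strData != nullptr);
        return std::string("string: ") + d.m_strData;
    default:
        return "unknown";
    }
}
```

This keeps both the type and the value while only paying for the largest member. The pointer member is still a problem for a message queue, though, since only the pointer would travel, not the characters it points to.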
You can use boost::variant for this.
There are many ways to handle different datatypes. Besides the union solution you can use a generic struct like :
typedef struct
{
char m_type;
void* m_data;
}
structData;
This way you know the type and you can cast the void* pointer into the right type.
This is like the union solution a more C than C++ way of doing things.
The C++ way would be something using inheritance. You define a base "Data" class and use inheritance to specialize the data. You can use RTTI to check for the type if needed.
But as you stated, you need to send your data over a VxWorks queue. I'm no specialist, but if those queues are OS real-time queues, none of the previous solutions is a good one. Your problem is that your data has variable length (in particular the string), and you need to send it through a queue that probably asks for something like a fixed-length data struct and the actual length of that struct.
In my experience, the right way to handle this is to serialize the data into something like a buffer class/struct. This way you can optimize the size (you only serialize what you need) and you can send your buffer through your queue.
To serialize you can use something like 1 byte for type then data. To handle variable length data, you can use 1 to n bytes to encode data length, so you can deserialize the data.
For a string :
1 byte to code the type (0x01 = string, ...)
2 bytes to code the string length (if you need less than 65536 bytes)
n data bytes
So the string "Hello" will be serialized as (with the length in big-endian order):
0x01 0x00 0x05 0x48 0x65 0x6c 0x6c 0x6f
You need a buffer class and a serializer/deserializer class. Then you do something like :
serialize data
send serialized data into queue
and on the other side
receive data
deserialize data
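A minimal sketch of the string case, assuming the 2-byte length is stored most significant byte first. The function names encodeString and decodeString are made up for this example; only the 0x01 type tag and the type/length/data layout come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

const std::uint8_t kTypeString = 0x01; // type tag for strings, as in the list above

// Encode as: 1 type byte, 2 length bytes (big-endian, an assumption), then the data bytes.
std::vector<std::uint8_t> encodeString(const std::string& text)
{
    if (text.size() > 0xFFFF)
        throw std::length_error("string too long for a 2-byte length");
    std::vector<std::uint8_t> out;
    out.push_back(kTypeString);
    out.push_back(static_cast<std::uint8_t>(text.size() >> 8));
    out.push_back(static_cast<std::uint8_t>(text.size() & 0xFF));
    out.insert(out.end(), text.begin(), text.end());
    assert(out.size() == 3 + text.size());
    return out;
}

// Decode a record produced by encodeString, validating the tag and the length.
std::string decodeString(const std::vector<std::uint8_t>& in)
{
    if (in.size() < 3 || in[0] != kTypeString)
        throw std::runtime_error("not a string record");
    const std::size_t length = (static_cast<std::size_t>(in[1]) << 8) | in[2];
    if (in.size() - 3 < length)
        throw std::runtime_error("truncated string record");
    return std::string(in.begin() + 3,
                       in.begin() + 3 + static_cast<std::ptrdiff_t>(length));
}
```

With this layout, encodeString("Hello") produces the eight bytes 0x01 0x00 0x05 0x48 0x65 0x6c 0x6c 0x6f, and the sender hands the vector's data pointer and size to the queue.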
I hope it helps and that I have not misunderstood your problem. The serialization part is overkill if the VxWorks queues are not what I think ...
Be very careful with the "string" member in the message queue. Under the hood, it's a pointer to some malloc'd memory that contains the actual string characters, so you're only passing the 'pointer' in your queue, not the real string.
The receiving process may potentially not be able to access the string memory, or -worse - it may have already been destroyed by the time your message reader tries to get it.
+1 for 1800 and Ylisar.
Using a union for this kind of thing is probably the way to go. But, as others pointed out, it has several drawbacks:
inherently error prone.
not safely extensible.
can't handle members with constructors (although you can use pointers).
So unless you can build a nice wrapper, going the boost::variant way is probably safer.
This is a bit offtopic, but this issue is one of the reasons why languages of the ML family have such a strong appeal (at least for me). For example, your issue is elegantly solved in OCaml with:
(*
* LData, FData and StrData are constructors for this sum type,
* they can have any number of arguments
*)
type structData = LData of int | FData of float | StrData of string
(*
* the compiler automatically infers the function signature
* and checks the match exhaustiveness.
*)
let print x =
match x with
| LData(i) -> Printf.printf "%d\n" i
| FData(f) -> Printf.printf "%f\n" f
| StrData(s) -> Printf.printf "%s\n" s
Try QVariant in Qt