This works:
struct client {
string address;
int toPay;
int id;
};
int main() {
struct client clients[10];
...
file.read( (char*)&clients, sizeof (clients) );
}
What I want to do, is do those things inside of a function.
But how would I have to pass the struct to the function?
If I pass it like this, read doesn't work:
void newFunction ( struct client *clients_t) {
...
file.read( (char*)&clients_t, sizeof (clients_t) );
}
This is not going to work, because the string data is not embedded into the struct. Instead, it has a couple of pointers to the string content. That is why file.read( (char*)&clients...) is not going to produce a valid result: the string will point to the place where a saved string once pointed, but it would no longer represent the data of interest.
If you would like to serialize the data like that, embed an entire char array in the struct, with the obvious limitation that there would be a cap on the number of characters and some wasted space.
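A minimal sketch of that suggestion, assuming a hypothetical ClientRecord type with a 64-byte address field and hypothetical helpers setAddress, writeClients and readClients (none of these names come from the question). Because the text lives inside the object, the raw bytes can be written and read back directly:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <type_traits>

// Hypothetical fixed-size variant of the client struct: the characters are
// stored inside the object, so its raw bytes are meaningful on their own.
struct ClientRecord {
    char address[64];   // assumed cap: 63 characters plus the terminator
    int toPay;
    int id;
};
static_assert(std::is_trivially_copyable<ClientRecord>::value,
              "raw read/write is only valid for trivially copyable types");

// Copies at most 63 characters and always null-terminates (longer text is cut).
void setAddress(ClientRecord& c, const std::string& text) {
    std::strncpy(c.address, text.c_str(), sizeof(c.address) - 1);
    c.address[sizeof(c.address) - 1] = '\0';
}

// The element count travels with the pointer, because sizeof(records) inside
// the function would only be the size of the pointer itself.
bool writeClients(const std::string& path, const ClientRecord* records, std::size_t count) {
    assert(records != nullptr);
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(records),
              static_cast<std::streamsize>(count * sizeof(ClientRecord)));
    out.close();
    return !out.fail();
}

bool readClients(const std::string& path, ClientRecord* records, std::size_t count) {
    assert(records != nullptr);
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(records),
            static_cast<std::streamsize>(count * sizeof(ClientRecord)));
    return !in.fail();
}
```

The file produced this way is still tied to the compiler's padding and the machine's byte order, so it is best read back by the same program on the same platform.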
Remove the & in the file.read() call.
So:
file.read((char *)clients_t, sizeof(*clients_t) * 10 );
You are passing the address of the pointer itself, but what you want is the address of the structure array, and to pass its correct size.
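The size difference can be made visible with a small sketch. Item and both function names are hypothetical, chosen only for illustration: a pointer parameter reports the size of the pointer, while a reference to the array keeps the full extent.

```cpp
#include <cstddef>

// Hypothetical element type, standing in for the client struct.
struct Item { int a; int b; };

// An array argument decays to a pointer, so sizeof sees only the pointer.
std::size_t sizeSeenThroughPointer(Item* items) {
    return sizeof(items);
}

// A reference to an array keeps the element count as part of the type.
template <std::size_t N>
std::size_t sizeSeenThroughReference(Item (&items)[N]) {
    return sizeof(items);
}
```

For an array declared as Item arr[10], the first function returns sizeof(Item*) and the second returns 10 * sizeof(Item).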
However, while that makes the read technically valid, it won't create string objects for you, so that fragment would only work in the unusual case that you had written out the references to your own objects earlier in the lifetime of that one process.
As a learning experience, reading and writing binary data is a great idea.
IRL, though, usually you don't want to do it at all, except perhaps via a DBMS. It's hard to debug and can be architecture-specific by exposing the byte order. Think YAML, XML, or CSV instead.
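As an illustration of the text-based route, here is a rough CSV sketch. The Client layout matches the question, but writeCsv and readCsv are hypothetical names, and the format assumes that an address never contains a newline (it is written last, so commas inside it are harmless):

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Client {
    std::string address;
    int toPay;
    int id;
};

// One client per line: id,toPay,address
void writeCsv(std::ostream& out, const std::vector<Client>& clients) {
    for (const Client& c : clients)
        out << c.id << ',' << c.toPay << ',' << c.address << '\n';
}

// Lines that do not start with two comma-separated integers are skipped.
std::vector<Client> readCsv(std::istream& in) {
    std::vector<Client> clients;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Client c;
        char comma1 = 0, comma2 = 0;
        if (fields >> c.id >> comma1 >> c.toPay >> comma2 &&
            comma1 == ',' && comma2 == ',') {
            std::getline(fields, c.address);  // the rest of the line
            clients.push_back(c);
        }
    }
    return clients;
}
```

The same functions work with std::ofstream and std::ifstream, and the resulting file can be inspected in any text editor.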
Why don't you simply create an object of the struct in the function?
It will work that way too,
and then you can access the members of the struct in that function using the object you created.
A struct is just an object, so you treat it just as you would treat any other object.
void Foo(client& ClientObj)
{
//anything written in here directly affects the client object you passed
}
Notice that you need to pass by reference; otherwise you would only pass a copy of the object.
//in main
client CObj;
Foo(CObj);
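&clients_t is a pointer to a pointer.
clients_t is a pointer, which is what is required. Note that sizeof (clients_t) is then only the size of the pointer, so the element count has to be passed in as well:
void newFunction ( struct client *clients_t, size_t count ) {
...
file.read( (char*)clients_t, sizeof (*clients_t) * count );
}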
I have a class that wraps C functions for reading and writing data using file descriptors
I'm currently stuck at read method.
I want to create a read method that wraps the C function ssize_t read(int fd, void *buf, size_t count);
The function above uses void *buf as an output and returns the number of bytes written in the buffer.
I want to have a method read that would return a variable size object that would contain that data or nullptr if no data was read.
What is the best way to do that?
EDIT: I already have a char array[4096] that I use to read data. I just want to return them and also give the caller the ability to know the length of the data that I return.
The char array[4096] is a member of the class that wraps C read. I use it to store the data temporarily before returning it to the caller. Every time I call the wrapper read, the char array will be overwritten, by design. An upper layer will be responsible for concatenating the data and constructing messages. This upper layer is the one that needs to know how much data has arrived.
The size of the char array[4096] is randomly chosen. It could be very small but more calls would be needed.
The object that contains the member char array will always be global.
I use C++17
Should I use std::vector or std::queue ?
The general answer here is: Don't use mutable global state. It breaks reentrancy and threading. And don't compound the issue by trying to return views of mutable global state, which makes even sequential calls a problem.
Just allocate a per-call buffer and use that; if you want to allow the caller to provide a buffer, that's also acceptable. Examples would look like:
// Some class assumed to have an fd member for reading via the C API
class Reader
{
// Define member attributes, e.g. fd
public:
std::string_view read(std::string& buf) {
// ::read names the C function; an unqualified read would only find the member overloads
ssize_t numread = ::read(fd, buf.data(), buf.size());
// Error checking if applicable, presumably handling negative return values
// by raising exception
return std::string_view(buf.data(), numread); // Guaranteed copy-elision
}
std::string read(size_t max_read) {
std::string buf(max_read, '\0'); // Allocate appropriately sized buffer
auto view = read(buf); // Delegate to view-based API
buf.resize(view.size()); // Resize to match amount actually read
return buf; // Likely (but not guaranteed) NRVO based copy-elision
}
};
std::string and std::string_view could be replaced with std::vector and std::span of some type in C++20 if you preferred (std::span would allow receiving a std::span instead of std::string& in C++20, making the code more generic).
This provides the caller with multiple options:
1. Call read with an existing pre-sized std::string (maybe change to std::span for C++20) that the caller can reuse over and over
2. Call read with an explicit size and get a freshly allocated std::string with few if any copies involved (NRVO will avoid copying the std::string being returned in most cases, though if the underlying read reads very little, the resize call might reallocate the underlying storage and trigger a copy of whatever real data exists)
For maximum efficiency, many callers calling this repeatedly would choose #1 (they'd just create a local std::string of a given size, pass it in by reference, then use the returned std::string_view to limit how much of the buffer they actually work with), but for simple one-off uses, option #2 is convenient.
EDIT: I already have a char array[4096] that I use to read data. I just want to return them and also give the caller the ability to know the length of the data that I return.
Right, so the key information is that you don't want to copy that (or at least you don't want to force an additional copy).
Current preferred return type is std::span, but that's C++20 and you're still on 17.
Second preference is std::string_view. It'll work fine for binary data but may confuse people who expect it to be printable, not contain null terminators and so on.
Otherwise you can obviously return some struct or tuple with pointer & length (and possibly errno, which is otherwise discarded).
Returning something that might be nullptr is pretty much the least preferred option. Don't do it. It's actually harder to use correctly than the original C interface.
You could use function overloading:
void read(int fileDescriptor, short int & variable)
{
static_cast<void>(::read(fileDescriptor, &variable, sizeof(variable)));
}
void read(int fileDescriptor, int & variable)
{
static_cast<void>(::read(fileDescriptor, &variable, sizeof(variable)));
}
You may want to also look into using templates.
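A template version could be sketched as follows. readValue is a hypothetical name, and unlike the overloads above it reports whether all sizeof(T) bytes actually arrived instead of discarding the return value:

```cpp
#include <cstddef>
#include <type_traits>
#include <unistd.h>

// Reads exactly sizeof(T) bytes into value.
// Returns false on end-of-file, on error, or on a short read.
template <typename T>
bool readValue(int fileDescriptor, T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be filled from raw bytes");
    const ssize_t n = ::read(fileDescriptor, &value, sizeof(T));
    return n == static_cast<ssize_t>(sizeof(T));
}
```

One function template then covers short, int, and any other trivially copyable type without a separate overload for each.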
I am writing a parser in C++ to parse a well-defined binary file. I have declared all the required structs. Since only particular fields are of interest to me, I have skipped the non-required fields in my structs by declaring char arrays whose size equals the number of skipped bytes. So I just read the file into a char array and cast the char pointer to my struct pointer. The problem is that all data fields in that binary file are in big-endian order, so after the cast I need to change the endianness of all the struct fields. One way is to do it manually for each and every field, but there are various structs with many fields, so that would be very cumbersome. What is the best way to achieve this? Since I'll be parsing very large files (terabytes, say), I need a fast way to do it.
EDIT: I have used __attribute__((packed)), so there is no need to worry about padding.
If you can do misaligned accesses with no penalty, and you don't mind compiler- or platform-specific tricks to control padding, this can work. (I assume you are OK with this since you mention __attribute__((packed))).
In this case the nicest approach is to write value wrappers for your raw data types, and use those instead of the raw type when declaring your struct in the first place. Remember the value wrapper must be trivial/POD-like for this to work. If you have a POSIX platform you can use ntohs/ntohl for the endian conversion; it's likely to be better optimized than whatever you write yourself.
If misaligned accesses are illegal or slow on your platform, you need to deserialize instead. Since we don't have reflection yet, you can do this with the same value wrappers (plus an Ignore<N> placeholder that skips N bytes for fields you're not interested in), and declare them in a tuple instead of a struct - you can iterate over the members in a tuple and tell each to deserialize itself from the message.
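The tuple idea can be sketched in C++17 with std::apply and a fold expression. BigEndianField, the deserialize member, and the free function deserialize are hypothetical names; the sketch assumes the caller has already checked that the buffer is long enough for the whole message:

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>

// Hypothetical field type: consumes sizeof(T) big-endian bytes from the cursor.
template <typename T>
struct BigEndianField {
    static_assert(std::is_unsigned<T>::value, "unsigned integers only in this sketch");
    T value{};
    void deserialize(const unsigned char*& cursor) {
        value = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            value = static_cast<T>((value << 8) | *cursor++);
    }
};

// Placeholder that skips N bytes of no interest.
template <std::size_t N>
struct Ignore {
    void deserialize(const unsigned char*& cursor) { cursor += N; }
};

// Visits the tuple members in declaration order; each one consumes its own
// bytes. Returns the position just past the message.
template <typename... Fields>
const unsigned char* deserialize(std::tuple<Fields...>& message,
                                 const unsigned char* cursor) {
    std::apply([&cursor](auto&... field) { (field.deserialize(cursor), ...); },
               message);
    return cursor;
}
```

A message is then declared as, for example, std::tuple<BigEndianField<std::uint16_t>, Ignore<3>, BigEndianField<std::uint32_t>>, and the interesting members are reached with std::get.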
One way to do that is to combine the C preprocessor with C++ operators. Write a couple of C++ classes like this one:
#include "immintrin.h"
class FlippedInt32
{
int value;
public:
inline operator int() const
{
return _bswap( value );
}
};
class FlippedInt64
{
__int64 value;
public:
inline operator __int64() const
{
return _bswap64( value );
}
};
Then,
#define int FlippedInt32
before including the header that defines these structures. #undef immediately after the #include.
This will replace all int fields in the structures with FlippedInt32, which has the same size but returns flipped bytes.
If it’s your own structures which you can modify you don’t need the preprocessor part. Just replace the integers with the byte-flipping classes.
If you can come up with a list of offsets (in-bytes, relative to the top of the file) of the fields that need endian-conversion, as well as the size of those fields, then you could do all of the endian-conversion with a single for-loop, directly on the char array. E.g. something like this (pseudocode):
struct EndianRecord {
size_t offsetFromTop;
size_t fieldSizeBytes;
};
std::vector<EndianRecord> todoList;
// [populate the todo list here...]
char * rawData = [pointer to the raw data]
for (size_t i=0; i<todoList.size(); i++)
{
const EndianRecord & er = todoList[i];
ByteSwap(&rawData[er.offsetFromTop], er.fieldSizeBytes);
}
struct MyPackedStruct * data = (struct MyPackedStruct *) rawData;
// Now you can just read the member variables
// as usual because you know they are already
// in the correct endian-format.
... of course the difficult part is coming up with the correct todoList, but since the file format is well-defined, it should be possible to generate it algorithmically (or better yet, create it as a generator with e.g. a GetNextEndianRecord() method that you can call, so that you don't have to store a very large vector in memory)
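A compilable version of the loop above might look like the following. ByteSwap is implemented here with std::reverse, SwapAllFields is a hypothetical wrapper name, and the bounds check is an added assumption about how malformed records should be caught:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct EndianRecord {
    std::size_t offsetFromTop;
    std::size_t fieldSizeBytes;
};

// Reverses the bytes of one field in place.
void ByteSwap(char* field, std::size_t sizeInBytes) {
    std::reverse(field, field + sizeInBytes);
}

// Applies every record in the todo list to the raw buffer.
void SwapAllFields(char* rawData, std::size_t rawSize,
                   const std::vector<EndianRecord>& todoList) {
    for (const EndianRecord& er : todoList) {
        assert(er.offsetFromTop + er.fieldSizeBytes <= rawSize);
        ByteSwap(rawData + er.offsetFromTop, er.fieldSizeBytes);
    }
}
```

For very large files the same loop can run chunk by chunk, with the records produced on demand rather than stored in a vector.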
I am using an older network transmission function for a legacy product, which takes a char array and transmits it over the network. This char array is just data; there is no need for it to make sense (or be null terminated). As such, in the past the following occurred:
struct robot_info {
int robot_number;
int robot_type;
...
}; // A robot info data structure for sending info.
char str[1024], *currentStrPos = str;
robot_info r_info;
... // str has some header data added to it.
... // robot info structure is filled out
memcpy(currentStrPos, &r_info, sizeof(robot_info)); // Add the robot info
currentStrPos += sizeof(robot_info); // Advance past the bytes just added
scanSocket.writeTo(str, currentStrPos - str); // Write to the socket.
We have just added a bunch of stuff to robot_info, but I am not happy with the fixed-length approach of the above code. I would prefer a dynamically allocated RAII type in order to be expandable, especially since there can be multiple robot_info structures. I propose the following:
std::vector<char> str;
... // str has some header information added to it.
... // r_info is filled out.
str.insert(str.end(), (char *)&r_info, (char *)&r_info + sizeof r_info);
scanSocket.writeTo(str.data(), str.size());
Live example.
Using the std::vector insert function (with a pointer to the start of r_info as the iterator) and relying on the fact that a struct here would be aligned to at least a char and can be operated on like this. The struct has no dynamic memory elements, and no inheritance.
Will this have well defined behavior? Is there a better way to perform the same action?
While this works, it is ultimately solving a compile time problem with a run time solution. Since robot_info is a defined type, a better solution would be this:
std::array<char, sizeof(robot_info)> str;
memcpy(str.data(), &r_info, sizeof(robot_info));
scanSocket.writeTo(str.data(), str.size());
This has the advantages:
Can never be over size, or undersized
Automatic Storage duration and stack allocation means this is potentially faster
Problem statement: The user provides some data which I have to store inside a structure. The data I receive comes in a data structure which allows the user to dynamically add data to it.
Requirement: I need a way to store this data 'inside' the structure, contiguously.
eg. Suppose user can pass me strings which I have to store. So I wrote something like this :
void pushData( string userData )
{
struct
{
string junk;
} data;
data.junk = userData;
}
Problem : When I do this kind of storage, actual data is not really stored 'inside' the structure because string is not POD. Similar problem comes when I receive vector or list.
Then I could do something like this :
void pushData( string userData )
{
struct
{
char junk[100];
} data;
// Copy userdata into array junk
}
This stores the data 'inside' the structure, but I can't put an upper limit on the size of the string the user can provide.
Can someone suggest some approach ?
P.S.: I read something about serializability, but couldn't really make out whether it could be helpful in my case. If it is the way forward, can someone give me an idea of how to proceed with it?
Edit :
No this is not homework.
I have written an implementation which can pass this kind of structure over message queues. It works fine with PODs, but I need to extend it to pass on dynamic data as well.
This is how message queue takes data:
i. Give it a pointer and tell the size till which it should read and transfer data.
ii. For plain old data types, data is store inside the structure, I can easily pass on the pointer of this structure to message queue to other processes.
iii. But in case of vector/string/list etc, actual data is not inside the structure and thus if I pass on the pointer of this structure, message queue will not really pass on the actual data, but rather the pointers which would be stored inside this structure.
You can see this and this. I am trying to achieve something similar.
void pushData( string userData )
{
struct Data
{
char junk[1];
};
struct Data* data = (struct Data*)malloc(sizeof(struct Data) + userData.size()); // C++ needs the cast
memcpy(data->junk, userData.data(), userData.size());
data->junk[userData.size()] = '\0'; // assuming you want null termination
}
Here we use an array of length 1, but we allocate the struct using malloc so it can actually have any size we want.
You ostensibly have some rather artificial constraints, but to answer the question: for a single struct to contain a variable amount of data is not possible... the closest you can come is to have the final member be say char [1], put such a struct at the start of a variably-sized heap region, and use the fact that array indexing is not checked to access memory beyond that character. To learn about this technique, see http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html (or the answer John Zwinck just posted)
Another approach is e.g. template <size_t N> struct X { char data_[N]; };, but each instantiation will be a separate struct type, and you can't pre-instantiate every size you might want at run-time (given you've said you don't want an upper bound). Even if you could, writing code that handles different instantiations as the data grows would be nightmarish, as would the code bloat caused.
Having a structure in one place with a string member with data in another place is almost always preferable to the hackery above.
Taking a hopefully-not-so-wild guess, I assume your interest is in serialising the object based on starting address and size, in some generic binary block read/write...? If so, that's still problematic even if your goal were satisfied, as you need to find out the current data size from somewhere. Writing struct-specific serialisation routines that incorporates the variable-length data on the heap is much more promising.
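A struct-specific serialisation routine of that kind could be sketched as below. Message, serialize and deserialize are hypothetical names; the format (id, then a 32-bit length, then the characters) is an assumption, and it uses the native byte order, which is adequate for a message queue between processes on one machine:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical message with one fixed-size and one variable-length member.
struct Message {
    std::int32_t id;
    std::string text;
};

// Flattens the message into one contiguous block: id, length, characters.
std::vector<char> serialize(const Message& m) {
    const std::uint32_t length = static_cast<std::uint32_t>(m.text.size());
    std::vector<char> out(sizeof(m.id) + sizeof(length) + m.text.size());
    char* p = out.data();
    std::memcpy(p, &m.id, sizeof(m.id));
    p += sizeof(m.id);
    std::memcpy(p, &length, sizeof(length));
    p += sizeof(length);
    std::memcpy(p, m.text.data(), m.text.size());
    return out;
}

// Rebuilds the message; the asserts guard against truncated input.
Message deserialize(const char* data, std::size_t size) {
    Message m;
    std::uint32_t length = 0;
    assert(size >= sizeof(m.id) + sizeof(length));
    std::memcpy(&m.id, data, sizeof(m.id));
    std::memcpy(&length, data + sizeof(m.id), sizeof(length));
    assert(size >= sizeof(m.id) + sizeof(length) + length);
    m.text.assign(data + sizeof(m.id) + sizeof(length), length);
    return m;
}
```

The block returned by serialize is what gets handed to the queue as pointer plus size; the stored length answers the question of how much data the receiver should expect.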
Simple solution: estimate a maximum size for the data (e.g. 1000) and use a fixed-size buffer. This avoids the memory fragmentation that repeatedly freeing a block and allocating a new one of a different size can cause when pushData is called many times.
#define MAX_SIZE 1000
void pushData( string userData )
{
struct Data
{
char junk[MAX_SIZE];
};
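struct Data data;
size_t length = userData.size() < MAX_SIZE - 1 ? userData.size() : MAX_SIZE - 1;
memcpy(data.junk, userData.data(), length); // longer input is truncated to fit
data.junk[length] = '\0'; // assuming you want null termination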
}
As mentioned by John Zwinck....you can use dynamic memory allocation to solve your problem.
void pushData( string userData )
{
struct Data
{
char *junk;
};
struct Data *d = (struct Data *)calloc(1, sizeof(struct Data));
d->junk = (char *)malloc(userData.size() + 1);
strcpy(d->junk, userData.c_str());
}
I don't understand how the reallocation of memory for a struct allows me to insert a larger char array into my struct.
Struct definition:
typedef struct props
{
char northTexture[1];
char southTexture[1];
char eastTexture[1];
char westTexture[1];
char floorTexture[1];
char ceilingTexture[1];
} PROPDATA;
example:
void function SetNorthTexture( PROPDATA* propData, char* northTexture )
{
if( strlen( northTexture ) != strlen( propData->northTexture ) )
{
PROPDATA* propPtr = (PROPDATA*)realloc( propData, sizeof( PROPDATA ) +
sizeof( northTexture ) );
if( propPtr != NULL )
{
strcpy( propData->northTexture, northTexture );
}
}
else
{
strcpy( propData->northTexture, northTexture );
}
}
I have tested something similar to this and it appears to work, I just don't understand how it does work. Now I expect some people are thinking "just use a char*" but I can't for whatever reason. The string has to be stored in the struct itself.
My confusion comes from the fact that I haven't resized my struct for any specific purpose. I haven't somehow indicated that I want the extra space to be allocated to the north texture char array in that example. I imagine the extra bit of memory I allocated is used for actually storing the string, and somehow when I call strcpy, it realises there is not enough space...
Any explanations on how this works (or how this is flawed even) would be great.
Is this C or C++? The code you've posted is C, but if it's actually C++ (as the tag implies) then use std::string. If it's C, then there are two options.
If (as you say) you must store the strings in the structure itself, then you can't resize them. C structures simply don't allow that. That "array of size 1" trick is sometimes used to bolt a single variable-length field onto the end of a structure, but can't be used anywhere else because each field has a fixed offset within the structure. The best you can do is decide on a maximum size, and make each an array of that size.
Otherwise, store each string as a char*, and resize with realloc.
This answer is not meant to promote the practice described below, but to explain things. There are good reasons not to use malloc, and the suggestions in other answers to use std::string are valid.
I think You have come across the trick used for example by Microsoft to avoid the cost of a pointer dereference. In the case of Unsized Arrays in Structures (please check the link) it relies on a non-standard extension to the language. You can use a trick like that, even without the extension, but only for the struct member that is positioned at its end in memory. Usually the last member in the structure declaration is also the last in memory, but check this question to know more about it. For the trick to work, You also have to make sure the compiler won't add padding bytes at the end of the structure.
The general idea is like this: Suppose You have a structure with an array at the end like
struct MyStruct
{
int someIntField;
char someStr[1];
};
When allocating on the heap, You would normally say something like this
MyStruct* msp = (MyStruct*)malloc(sizeof(MyStruct));
However, if You allocate more space than Your struct actually occupies, You can reference the bytes that are laid out in memory right behind the struct with "out of bounds" access to the array elements. Assuming some typical sizes for the int and the char, and no padding bytes at the end, if You write this:
MyStruct* msp = (MyStruct*)malloc(sizeof(MyStruct) + someMoreBytes);
The memory layout should look like:
| msp | msp+1 | msp+2 | msp+3 | msp+4 | msp+5 | msp+6 | ... |
| <- someIntField -> |someStr[0]| <- someMoreBytes -> |
In that case, You can reference the byte at the address msp+6 like this:
msp->someStr[2];
strcpy is not that intelligent, and it is not really working.
The call to realloc() allocates enough space for the string - so it doesn't actually crash but when you strcpy the string to propData->northTexture you may be overwriting anything following northTexture in propData - propData->southTexture, propData->westTexture etc.
For example, if you called SetNorthTexture(prop, "texture");
and printed out the different textures then you would probably find that:
northTexture is "texture"
southTexture is "exture"
eastTexture is "xture" etc (assuming that the arrays are byte aligned).
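The overlap can be reproduced without touching memory outside a real object by copying into a plain byte block at the same offsets. fieldAfterCopy is a hypothetical helper written only for this illustration, and the static_asserts state the layout that mainstream compilers produce for this struct:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

typedef struct props
{
    char northTexture[1];
    char southTexture[1];
    char eastTexture[1];
    char westTexture[1];
    char floorTexture[1];
    char ceilingTexture[1];
} PROPDATA;

// The layout is fixed at compile time; realloc cannot change these offsets.
static_assert(offsetof(PROPDATA, southTexture) == 1,
              "southTexture starts one byte after northTexture");
static_assert(sizeof(PROPDATA) == 6, "six one-byte arrays, no padding");

// Copies text to the place where northTexture lives inside an oversized byte
// block (standing in for the realloc'd memory), then returns the string that
// a reader of the field at fieldOffset would see.
std::string fieldAfterCopy(const char* text, std::size_t fieldOffset)
{
    char block[sizeof(PROPDATA) + 64] = {};
    assert(std::strlen(text) < sizeof(block));
    assert(fieldOffset < sizeof(block));
    std::memcpy(block + offsetof(PROPDATA, northTexture), text, std::strlen(text) + 1);
    return std::string(block + fieldOffset);
}
```

Calling fieldAfterCopy("texture", offsetof(PROPDATA, southTexture)) gives "exture", which is the overlap described above.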
Assuming you don't want to statically allocate char arrays big enough to hold the largest strings, and if you absolutely must have the strings in the structure then you can store the strings one after the other at the end of the structure. Obviously you will need to dynamically malloc your structure to have enough space to hold all the strings + offsets to their locations.
This is very messy and inefficient as you need to shuffle things around if strings are added, deleted or changed.
My confusion comes from the fact that
I haven't resized my struct for any
specific purpose.
In low level languages like C there is some kind of distinction between structs (or types in general) and actual memory. Allocation basically consists of two steps:
Allocation of raw memory buffer of right size
Telling the compiler that this piece of raw bytes should be treated as a structure
When you do realloc, you do not change the structure, but you change the buffer it is stored in, so you can use extra space beyond structure.
Note that, although your program will not crash, it's not correct. When you put text into northTexture, you will overwrite other structure fields.
NOTE: This has no char array example, but it is the same principle. It is just my guess at what you are trying to achieve.
My opinion is that you have seen somewhere something like this:
typedef struct tagBITMAPINFO {
BITMAPINFOHEADER bmiHeader;
RGBQUAD bmiColors[1];
} BITMAPINFO, *PBITMAPINFO;
What you are trying to obtain can happen only when the array is at the end of the struct (and only one array).
For example, you allocate sizeof(BITMAPINFO)+15*sizeof(RGBQUAD) when you need to store 16 RGBQUAD structures (1 from the structure and 15 extra).
PBITMAPINFO info = (PBITMAPINFO)malloc(sizeof(BITMAPINFO)+15*sizeof(RGBQUAD));
You can access all the RGBQUAD structures like they are inside the BITMAPINFO structure:
info->bmiColors[0]
info->bmiColors[1]
...
info->bmiColors[15]
You can do something similar to an array declared as char bufStr[1] at the end of a struct.
Hope it helps.
One approach to keeping a struct and all its strings together in a single allocated memory block is something like this:
struct foo {
ptrdiff_t s1, s2, s3, s4;
size_t bufsize;
char buf[1];
} bar;
Allocate sizeof(struct foo)+total_string_size bytes and store the offsets to each string in the s1, s2, etc. members and bar.buf+bar.s1 is then a pointer to the first string, bar.buf+bar.s2 a pointer to the second string, etc.
You can use pointers rather than offsets if you know you won't need to realloc the struct.
Whether it makes sense to do something like this at all is debatable. One benefit is that it may help fight memory fragmentation or malloc/free overhead when you have a huge number of tiny data objects (especially in threaded environments). It also reduces error handling cleanup complexity if you have a single malloc failure to check for. There may be cache benefits to ensuring data locality. And it's possible (if you use offsets rather than pointers) to store the object on disk without any serialization (keeping in mind that your files are then machine/compiler-specific).
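A rough sketch of the offsets idea follows. It deliberately swaps the char buf[1] member for a separate header copied to the front of a std::vector<char>, so that no array is indexed past its declared bounds; FooHeader, makeBlock and stringAt are hypothetical names, and only two strings are stored to keep it short:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Header stored at the front of the block; offsets are relative to the
// first byte after the header.
struct FooHeader {
    std::ptrdiff_t s1, s2;
    std::size_t bufsize;   // total bytes of string data, terminators included
};

// Builds one contiguous block: header, then both strings with terminators.
std::vector<char> makeBlock(const std::string& a, const std::string& b) {
    FooHeader h;
    h.s1 = 0;
    h.s2 = static_cast<std::ptrdiff_t>(a.size() + 1);
    h.bufsize = a.size() + 1 + b.size() + 1;
    std::vector<char> block(sizeof(FooHeader) + h.bufsize);
    std::memcpy(block.data(), &h, sizeof(h));
    std::memcpy(block.data() + sizeof(h) + h.s1, a.c_str(), a.size() + 1);
    std::memcpy(block.data() + sizeof(h) + h.s2, b.c_str(), b.size() + 1);
    return block;
}

// Returns string 1 or 2 from a block produced by makeBlock.
const char* stringAt(const std::vector<char>& block, int which) {
    assert(which == 1 || which == 2);
    assert(block.size() >= sizeof(FooHeader));
    FooHeader h;
    std::memcpy(&h, block.data(), sizeof(h));
    return block.data() + sizeof(h) + (which == 1 ? h.s1 : h.s2);
}
```

Because only offsets are stored, a byte-for-byte copy of the block (after a realloc, or after a round trip through a file) still resolves both strings correctly.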