prepend and remove from a ( void * ) in C - c++

I think this is a pretty straightforward problem, but I still cannot figure it out.
I have a function which sends a stream over the network. Naturally, it takes a const void * as its argument:
void network_send(const void* data, long data_length)
I am trying to prepend a specific header, in the form of a char*, to this data before sending it out over the socket:
long sent_size = strlen(header)+data_length;
data_to_send = malloc(sent_size);
memcpy(data_to_send,header,strlen(header)); /*first copy the header*/
memcpy((char*)data_to_send+strlen(header),data,data_length); /*now copy the actual data*/
This works fine as long as the data is actually char*. But if it changes to some other data type, it stops working.
When receiving, I need to remove the header from the data before processing it. This is how I do it:
void network_data_received(const void* data, long data_length)
{
........
memmove(data_from_network,(char*)data_from_network + strlen(header),data_length); /*move the data to the beginning of the array*/
ProcessFurther(data_from_network ,data_length - strlen(header)); /*data_length - strlen(header) causes the function ProcessFurther to read only certain part of the array*/
}
This again works OK if the data is of char type, but crashes if it is of any other type.
Can anyone suggest how to implement this properly?
Regards,
Khan

Sounds like alignment could be the issue, but you don't specify which platform you're doing this on (different CPU architectures have different alignment requirements).
If the header's length is "wrong" for the alignment of the following data, that could cause access violations.

Something surprises me in this code. Is your header actually a string? If it is a struct, or something similar, you should replace strlen with sizeof. Calling strlen on a string that is not zero-terminated is likely to cause crashes.
The second thing that surprises me is that, when reading received data, you should copy the header somewhere. If you are not using it, why bother sending it over the wire?
EDIT: OK, the header is some http like header string. There should not be any problem from there, and it indeed does not need to be analysed if you're just testing.
And you should move the data to the place you actually need it, moving it to the beginning of the buffer does not look like the right thing to do.
If the problem comes from alignment, it will disappear if you copy the data to some variable of the real target type at byte level before using it.
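As a minimal sketch of that byte-level copy (not the asker's actual code): the payload type Sample and the helper names add_header and extract_sample are hypothetical, and the header is assumed to be a plain text string. The point is that the payload is copied with memcpy into a properly aligned object instead of casting a pointer into the middle of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical payload type, used only for illustration.
struct Sample {
    std::int32_t id;
    double value;
};

// Build "header + payload" as a flat byte buffer.
std::vector<unsigned char> add_header(const std::string& header,
                                      const void* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::vector<unsigned char> packet(header.size() + length);
    if (!header.empty()) std::memcpy(packet.data(), header.data(), header.size());
    if (length != 0) std::memcpy(packet.data() + header.size(), data, length);
    return packet;
}

// Copy the payload bytes into a properly aligned Sample instead of casting
// (packet.data() + header.size()) to Sample*, which may be misaligned.
bool extract_sample(const std::vector<unsigned char>& packet,
                    const std::string& header, Sample& out) {
    if (packet.size() != header.size() + sizeof(Sample)) return false;
    if (std::memcmp(packet.data(), header.data(), header.size()) != 0) return false;
    std::memcpy(&out, packet.data() + header.size(), sizeof(Sample));
    return true;
}
```

This only round-trips correctly when sender and receiver agree on the layout and byte order of Sample; it addresses alignment, nothing more.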
There is another solution: allocate your buffer with malloc and put the data structure you want at the beginning. Then you should be able to cast it. Addresses returned by malloc are compatible with any type.
Also be aware that if you were working with C++, casting to a non-trivial class is unlikely to work (for one thing, vtable pointers are likely to end up with wrong addresses, and there are other issues).
Another possible source of problem is the way you get data_length. It should be a number of bytes. Are you sure it is not a number of items ? To be sure we need some hint of the calling code.

memcpy's behaviour is undefined if the source and target overlap (as in this instance); you should be using memmove().
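A small sketch of stripping a header in place with memmove, assuming the header length is known in bytes. The function name strip_header is made up for illustration. Note that only total_len - header_len bytes are moved; moving the full buffer length starting from the offset would read past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Remove the first header_len bytes of buffer in place and return the
// number of payload bytes left at the start of the buffer.
std::size_t strip_header(unsigned char* buffer, std::size_t total_len,
                         std::size_t header_len) {
    assert(buffer != nullptr && header_len <= total_len);
    const std::size_t payload_len = total_len - header_len;
    // Source and destination overlap, so memmove (not memcpy) is required.
    std::memmove(buffer, buffer + header_len, payload_len);
    return payload_len;
}
```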
What exactly is happening when what is not char*? These functions will generally cast to void* before actually doing any work...

It's possible that data_length is not calculated correctly in the calling code. Otherwise this code seems to be fine apart from the possible alignment issues mentioned by @unwind.
How is header declared? Does it have variable length? Are you missing a terminating NUL character after the header?

I'd also check to make sure that both sender and receiver use the same byte ordering architecture (little endian vs. big endian).

Using unsigned char * solved the issue. Thank you all for your comments.

Related

Resizing struct / char array (to reduce memory usage)

This is my first project on Arduino/C++/ESP32. I wrote a fairly big program and got almost everything working - except that in the end I realized that the device would run out of breath (memory) periodically and go for a reboot. The reboot is because I configured a watchdog to do so.
There is one area where I think there's a chance to reduce the memory usage but my experience on c++ is "not there yet" for me to be able to write this by myself. Any pointers (no pun intended) please? I have been on this since yesterday and getting rid of one error only results in another new error popping up. Moreover I don't want to come up with something that is hacky or might break later. It should be a quick answer for the experienced people here.
Let me explain the code that I prefer to refactor/optimize.
I need to store a bunch of records that I would need to read/manipulate later. I declared a struct (because they are related fields) globally. Now the issue is that I may need to store 1 record, 2 records or 5 records which I would only know later once I read the data from the EEPROM. And this has to be accessible to all the functions so it has to be a global declaration.
To summarize
Question 1 - how to set "NumOfrecs" later in the program once the data is read from the eeprom.
Question 2 - The size (sizeOfUsername) of the char array username can also change depending upon the length of the username read from the eeprom. At times it might be 5 characters long, at times it could be 25. I can set it to a max of 25 and solve this problem, but then wouldn't I be wasting memory if many usernames were just 4-5 characters long? So in short: just before copying the data from the eeprom into the "username" char array, is it possible to set its size to the optimal size required for holding that data (which is the data size + 1 byte for null termination)?
struct stUSRREC {
char username[sizeOfUsername];
bool online;
};
stUSRREC userRecords[NumOfrecs];
I familiarized myself with a whole bunch of functions like strcpy, memset, malloc etc but now I have run out of time and need to keep the learning part for another day.
I can try to do this in a slightly different manner where I don't use the struct and instead use individual char arrays ( for each field like username ). But then again I'll have to resize the arrays as I read the data from the eeprom.
I can explain all the things I have tried but that will make this question unnecessarily long and perhaps result in losing some clarity. Greatly appreciate any help.
While responding to Q&A on SO I was trying some random stuff and at least this little piece of code below seems to work ( in terms of storing smaller/bigger values )
struct stUSRREC {
char username[];
bool online;
};
stUSRREC userRecords[5];
Then manipulate it this way
strcpy(userRecords[0].username, "MYUSERNAME");
strcpy(userRecords[0].username, "test");
strcpy(userRecords[0].username, "MYVERYBIGUSERNAME");
I have been able to write/rewrite different lengths (above) and can read all of them back correctly. Resizing "userRecords" might be a different game but that can wait a little
One thing I forgot to mention was that I will need to size/resize the array ( holding username ) ONLY ONCE. In the setup() itself I can read/load the required data into those arrays. I am not sure if that opens up any other possibility. The rest of the struct/array I need to manipulate during the running are only boolean and int values. This is not an issue at all because there is no resizing required to do so.
On a side note I am pretty sure I am not the only one who faced this situation. Any tips/clues/pointers could be of help to many others. The constraints on little devices like ESP32 become more visible when you really start loading them with a bunch of things. I had it all working with "Strings" (the capital S) but the periodic reboot (cpu starvation?) required me to get rid of the Strings. Even otherwise I hear that using Strings (on ESP, Arduino and gang) is a bad idea.
You tagged this question as C++, so I'll ask:
Can you use vector and string in your embedded code?
#include <string>
#include <vector>
struct stUSRREC {
std::string username;
bool online;
stUSRREC(const char* name, bool isOnline) :
username(name),
online(isOnline)
{
}
};
std::vector<stUSRREC> userRecords;
The use of string as the username type means you only allocate as many characters as are needed to hold the name, instead of allocating an assumed max size of sizeOfUsername. The use of vector allows you to dynamically grow your record set.
Then to add a new record:
stUSRREC record("bob", true);
userRecords.push_back(record);
And you may not need NumOfrecs anymore. That's covered by userRecords.size().
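Since the names only need to be sized once during setup, the table could be built in one go. The following is a rough sketch, assuming std::string and std::vector are available on the target and that the names have already been read out of the EEPROM into some container; UserRecord, load_records and set_online are illustrative names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct UserRecord {
    std::string username;  // holds only as many characters as the name needs
    bool online;
};

// Build the record table once, e.g. from names already read out of storage.
std::vector<UserRecord> load_records(const std::vector<std::string>& names) {
    std::vector<UserRecord> records;
    records.reserve(names.size());  // one allocation for the table itself
    for (const std::string& name : names) {
        records.push_back(UserRecord{name, false});
    }
    return records;
}

// Later updates only touch the flag, so no resizing happens at run time.
void set_online(std::vector<UserRecord>& records, std::size_t index, bool online) {
    assert(index < records.size());
    records[index].online = online;
}
```

Reserving up front avoids repeated reallocation while loading, which tends to matter on small heaps.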

Integer pointer to char array

I am working on an application (C++ language on visual studio) where all the strings are referred with integer pointer.
For example, the class I am using has this integer pointer to the data and a variable for size.
{
..
..
unsigned short int *pData;
int iLen
}
I would like to know
Are there any advantages of using int pointer instead of char pointer?
After thinking a lot, I suspect that the reason may be to avoid the application crash which may happen if the char pointer is used without a null termination. But I am not 100% sure.
During debugging, how can we check the content of the pointer, where the content is a char array or string (on Visual studio).
I can only see the address when I check the content during debugging. Because of this, I am having difficulty debugging.
Using printf would work to display the content but I can't do it in all places.
I suspect that the reason for using an integer pointer may be to avoid the application crash which may happen if a char pointer is used without a null termination. But I am not 100% sure.
Maybe it was made an integer pointer to avoid such programming errors.
The class I am using has this integer pointer to the data and a variable for the size.
{
..
..
unsigned int *pData;
int iLen
}
Please correct me if you think this can not be the reason.
Please let us know which language you're using so that we can help you a bit better. As for your questions:
There is no benefit to using an int array vs. a char array. In fact, an integer array takes up more space (if each int represents its own character). This is because a char takes up one byte, whereas an int typically takes up four.
As for printing things when debugging, I'm not a master of Visual Studio since I haven't used it in some time, but most modern IDEs allow you to cast things before you print them. For example, in lldb you can do po (char)myIntArray[0] (po stands for print object). In Visual Studio, writing a custom visualizer should do the trick.
I am not sure why you would want to do this, but if you wanted to store an EOF character in your string for some reason, you would need a pointer to int.
MS Visual Studio often uses UTF-16 to store strings. UTF-16 requires a 16-bit data type, your application uses unsigned short for that, while a more correct name would be uint16_t (or maybe wchar_t, but I am not sure about that).
Another way to store strings uses UTF-8; if you use this way, use a pointer to char (not convenient) or std::string (more convenient). However, you don't have any choice here; you are not going to change how your application stores strings (it's probably too tedious).
To view your UTF-16-encoded string, use a dedicated format specifier. For example, in a quick-watch window, enter:
object->pData,su
instead of just
object->pData
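Where a watch window is not available, a tiny helper can turn the buffer into something printable. This is only a sketch: debug_ascii is a hypothetical name, the buffer is assumed to hold UTF-16 code units as suggested above, and anything outside printable ASCII is simply shown as '?' rather than being decoded properly.

```cpp
#include <cassert>
#include <string>

// Produce a printable, ASCII-only approximation of a UTF-16 buffer.
// Code units outside printable ASCII are shown as '?'.
std::string debug_ascii(const unsigned short* data, int length) {
    assert(data != nullptr || length <= 0);
    std::string out;
    for (int i = 0; i < length; ++i) {
        const unsigned short unit = data[i];
        out.push_back(unit >= 0x20 && unit < 0x7F ? static_cast<char>(unit) : '?');
    }
    return out;
}
```

A call such as debug_ascii(object->pData, object->iLen) could then be logged or inspected in the debugger.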

Write a C++ struct to a file and read file using another programming language?

I have a challenging situation; we will have programs on Mac, PC, iOS and Android receiving files in a legacy format and parsing data from those files. We cannot change how those files are created.
The files are produced by a C++ program filling a struct with numbers and Strings and then writing it out. Here's a sanitized version.
struct MyObject {
String Kfkj(MAXKYS);
String Oern(MAXKYS);
String Vdflj(MAXKYS, 9);
int Muic;
int Tdfkj;
int VdfkAsdk;
int SsdjsdDsldsk;
int Ndsoief;
String TdflsajPdlj;
String TdckjdfPas;
String AdsfakjIdd;
int IdkfjdKasdkj;
int AsadkjaKadkja(MAXKYS);
int Kasldsdkj;
bool Usadl;
String PsadkjOasdj(9);
String PasdkjOsdkj;
};
Primitives and Strings, as you can see.
Then here is how they write it out to a file:
MyObject MyInstance;
FileName = "C:\\MyFile.ab2";
ofstream fout (FileName, ios::binary);
fout.write((char*)& MyInstance, sizeof(MyInstance));
There is no option for us to translate it once and then distribute the file to other platforms; we must translate it on each and every different platform, and this is what we have to work with. I'd appreciate any information on how C++ serializes data, so we know how to parse the file.
EDIT: solution
The feedback I received from multiple answers here was VERY helpful. Using that, I did extensive analysis with hex editors and discovered:
the elements come in the file one after another
a "String," in this case, starts with an int describing how many characters follow the int for that String. If the String does not exist, it will still have that int with a value of 0.
integers, for the files and machines I saw, are two bytes, little-endian, and MOSTLY unsigned (there were a few that were signed, just to keep me on my toes)
the boolean was two bytes, with apparently -1 (FF FF) representing "true"
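The findings above could be turned into a small reader along the following lines. This is only a sketch under those observations: the class name Reader is made up, the String length prefix is assumed to be one of the two-byte little-endian integers, and any nonzero boolean value is treated as true.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Cursor over the raw file bytes; every read checks bounds.
class Reader {
public:
    explicit Reader(const std::vector<unsigned char>& bytes) : bytes_(bytes) {}

    // Two-byte little-endian unsigned integer.
    std::uint16_t read_u16() {
        require(2);
        const std::uint16_t value = static_cast<std::uint16_t>(
            bytes_[pos_] | (bytes_[pos_ + 1] << 8));
        pos_ += 2;
        return value;
    }

    // Length-prefixed string: a count followed by that many characters.
    std::string read_string() {
        const std::size_t length = read_u16();
        require(length);
        const auto first = bytes_.begin() + static_cast<std::ptrdiff_t>(pos_);
        std::string text(first, first + static_cast<std::ptrdiff_t>(length));
        pos_ += length;
        return text;
    }

    // Two-byte boolean; FF FF was observed for true, so nonzero means true.
    bool read_bool() { return read_u16() != 0; }

private:
    void require(std::size_t count) const {
        assert(pos_ <= bytes_.size());
        if (bytes_.size() - pos_ < count) throw std::runtime_error("truncated file");
    }
    const std::vector<unsigned char>& bytes_;
    std::size_t pos_ = 0;
};
```

Because the bytes are assembled explicitly, the same code behaves identically on big-endian and little-endian hosts.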
So far we have not run into issues with different padding or endianness on different devices, but those are very real concerns. The skilled notes and warnings in these answers provide us with more ammunition to try to convince the client to change to a less fragile alternative, such as XML or JSON, for transferring data online across platforms.
As for those of you asking if the developer was fired... well, let's just say their code is very old, but after multiple conversations we're still having trouble convincing them writing out the C++ struct and trying to read that on different platforms is not a good idea.
You're going to run into many problems.
C++ doesn't have a specific format for serializing data per se. It is highly dependent on the computer architecture/processor that you are running on.
The compiler is allowed to add padding to help alignment on systems. When we say alignment we basically are referring to an architecture/processor's affinity for having data lie on specific byte boundaries. For example, some processors vastly prefer floating point numbers to lie at 4 or 8 byte boundaries - if they don't the processor may work much slower or may not work at all.
So, you can't simply know what padding your system is adding magically.
What you can do is wrap the struct definition in #pragma pack(push, 1) / #pragma pack(pop) to stop your compiler from inserting padding between its members.
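To see what packing changes, here is a small sketch comparing a padded and a packed version of the same struct. The struct names are invented for illustration, and the push/pop form of the pragma is assumed to be supported (it is by GCC, Clang and MSVC, but it is not standard C++).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Padded {
    char tag;
    std::uint32_t value;  // the compiler normally inserts padding before this
};

#pragma pack(push, 1)     // save the current packing, then pack to 1 byte
struct Packed {
    char tag;
    std::uint32_t value;  // starts immediately after tag
};
#pragma pack(pop)         // restore the previous packing

// Runtime check of the layout the pragma is expected to produce.
void check_layout() {
    assert(sizeof(Packed) == 5);
    assert(offsetof(Packed, value) == 1);
    assert(sizeof(Padded) >= sizeof(Packed));
}
```

Packing removes padding only; it does nothing about integer sizes or byte order.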
PS: you also have to worry about endianness. What if one computer is running on big-endian and one is little endian? They will interpret bytes differently without a conversion.
Simply put, you either have to fix the application generating the files so it uses a proper serialization scheme OR you need to look at it running on a SPECIFIC computer, look at exactly how it writes the files, and write a translator for every target platform (which is just silly).
Interesting Suggestion
If you're really stuck, write an app that monitors the folder where you write files. Have the app pick up the files (since it's on the same PC it'll be able to read their format without issue). Have it write the files back in XML or some other true serialization format and distribute those instead.
Whoa - that's crazy. So String objects don't contain any pointers? Must not- because you claim this is working code.
Anyway, that code isn't doing any serialization. It's just writing the structure out to file exactly the way it is laid out in memory. The only issue you have is that on some platforms the padding and the sizeof of integral types like int may be different.
You'll have to find the size of the integral types, and use that information in reader/writer for newer platforms to make sure they get laid out the same way on the legacy platform.
You're running a real risk with that code though. As it is, a compiler change could suddenly cause the file layout to change.
The format of your data file is entirely down to the compiler that your C++ program is compiled with, and the definition of your String class. You can rely on the fields being in the order they're declared in, and in this case, I think you can rely on there not being any padding at the start, but that's about all. Some tips that might help you out in this case:-
You don't give the definition of the String class you're using. If it's a typedef for std::string, you're completely screwed, because the contents of the string aren't in the memory. I assume your C++ programmers are using some special local buffer, in which case I'll guess you will find the first bytes of the object are the string, and there is some amount of useless padding afterwards. I hope the struct contains an int at the start telling you how much data in it is useful.
You'll probably find the int fields are four bytes long.
You'll probably find the bool field is one byte long, followed by three bytes of useless padding. Only one bit, most likely the bottom bit, will be set.
That's about all the useful guesswork I can offer you. In your target language, make sure to read the whole file in as the closest thing to a byte array available in the language, and only after that, use the language features to convert it into the right kind of thing in your language. Don't try reading it in as integers, as that won't let you byte-swap if you're on a platform with different endianness to the C++ program. I suggest also looking through the file in a text editor to reverse-engineer it and help you find the offset of each field.
Last piece of advice: consider printing P45s (or pink slips, or whatever you have in your country) for whichever programmers or project managers thought this kind of 'serialization' was a good idea. This kind of sloppy work might have been acceptable in a life-or-death situation, but they have seriously screwed you over in a way you're going to find it very hard to recover from. Writing the code to read in these files will not be that hard, if it's only one struct like this, but keeping it reliable will be a world of pain, and they've effectively made it impossible for themselves to change compilers or compiler version safely.
The way it's done, the struct is written in raw form to a file. So basically what you need to know to parse this file is the binary layout of your struct.
Basically, the fields are just one after the other, so to read an int, you just read 4 bytes and cast that to an int, etc.
Strings are a particular case. It's not clear from your code whether this "String" type is an inline array of characters, or a pointer to such an array. In the first case, you need to know how many characters each string contains and simply read that number of characters sequentially. In the second case, you won't be able to get the string back, since it won't have been written to file. The pointer will be useless to you.
One last concern is whether the struct is packed or not. Since you gave no indication of that, assume the default: each field is aligned to its natural boundary (commonly 4 bytes for an int), so there may be padding, for instance after the boolean field, that you need to account for. If the struct is packed, then each field comes directly after the previous one.
So, to make a long story short, figure out your struct binary layout using its definition and, if all else fails, inspecting the memory at run-time with the debugger, or use a hex editor to study the output file. Then write that specification down somewhere and this will give you what you need to read from the file. It's impossible to tell exactly what that layout is simply by looking at the pseudo-definition you gave.
Writing to an ofstream does not serialize data. This code writes the raw memory content of the struct as if it were a string of char. Depending on your compiler, its version, its options and the system it is running on, the content will be completely different.
Even the number of bits in a char is allowed to differ between C++ implementations.
Data referenced by the object of the struct won't be written (forget the content of std::string).
If you cannot change the writer code, you must know the alignment policy, the size of the base types and the data representation. You will have to analyze the produced files by hand, for example with a hexadecimal editor like this one
http://www.physics.ohio-state.edu/~prewett/hexedit/
, and probably look at your compiler documentation.
If you can change the writer code, use proper serialization such as JSON, protocol buffers or simply XML.
No one has pointed out something that sticks out to me as particularly problematic (maybe because I've been bit by it). That problem: the data member bool Usadl;. sizeof(bool) varies across platforms, across compilers, and even across releases of the same compiler. Common values for sizeof(bool) are 4 and 1. This will bite you. It's getting hard to find a big endian machine nowadays, very, very hard to find a computer where CHAR_BIT is not 8 or sizeof(int) is not 4. This is not the case for sizeof(bool).
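One way to sidestep the sizeof(bool) problem, when the writer can be changed, is to give the flag an explicit one-byte encoding. A minimal sketch with hypothetical helper names put_bool and get_bool:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a bool as exactly one byte (0 or 1), independent of sizeof(bool).
void put_bool(std::vector<std::uint8_t>& out, bool value) {
    out.push_back(value ? 1 : 0);
}

// Read it back; any nonzero byte counts as true.
bool get_bool(const std::vector<std::uint8_t>& in, std::size_t offset) {
    assert(offset < in.size());
    return in[offset] != 0;
}
```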
In agreement with everyone else, Chad's team needs to document the structure of the records in the file, and then make sure the program that produces the file writes this structure explicitly, including element sizes, padding, and endianness. Don't depend on class layout to do this for you. That's just asking for trouble.
The best way would probably be to use JSON or if you want a more robust solution go with something like Avro. Avro has a C++ API and a Java API, so it covers most of the cases you're encountering.

Structs Being Weird - C++

I have been having a lot of trouble with this stupid struct. I don't see why it is doing this, and I am really not sure how to fix it. The only way I know how to fix it is by removing the struct and doing it some other way (which I don't want to do).
So I am reading data from a file, and I am reading it in to a struct pointer all at once. It seems like the offset/pointer of my 'long long' gets messed up everytime. View in details below.
So here is my struct:
struct Entry
{
unsigned short type;
unsigned long long identifier;
unsigned int offset_specifier, length;
};
And here is my code for reading all the crap into the struct pointer/array:
Entry *entries = new Entry[SOME_DYNAMIC_AMOUNT];
fread(entries, sizeof(Entry), SOME_DYNAMIC_AMOUNT, openedFile);
As you can see, I write all that into my struct array. Now, I will show you the data I am reading(for the first struct in this example).
So this is the data that is going into the first element in 'entries'. The first item(the short, 'type'), seems to be read fine. After that, when the 'identifier' is read, it seems like the whole struct is shifted X amount of bytes. Here is a picture of the first element(after reversing the endian):
And here is the data in memory(the red square is where it begins):
I know that was a bit confusing, but I tried to explain it as well as possible. Thanks for any help, Hetelek. :)
Structures are padded with extra bytes so that the fields are faster to access. You can prevent this with #pragma pack:
#pragma pack(push, 1)
struct Entry
{
/* ... */
};
#pragma pack(pop)
Note that this might not be 100% portable (I know that at least GCC and MSVC support it for x86).
Reading and writing structs to a file in binary is perilous.
The problem you're running into here is that the compiler inserts padding (needed for alignment) between the type and identifier members of your structure. Apparently whatever program wrote the data (which you haven't told us about) used a different layout than the program that's trying to read the data.
This could happen if the two systems (the one writing the data and the one reading it) have different alignment requirements, and therefore different layouts for the Entry type.
Alignment is not the only potential problem, though; differences in endianness can also be a serious problem. Different systems might have differing sizes for the predefined integer types. You can't assume that struct Entry will have a consistent layout unless all the code that deals with it runs on a single system -- and ideally with the same version of the same compiler.
You might be able to use #pragma pack to work around this, but I don't recommend it. It's not portable, and it can be unsafe. At best, it will work around the problem of padding between members; there are still plenty of ways the layout can vary from one system to another.
It's impossible to give you a definitive solution without knowing where and how the data layout of the file you're reading is defined.
If we assume that the file layout for each record is, for example:
A 2-byte unsigned integer in network byte order (type)
An 8-byte integer in network byte order (identifier)
Two 4-byte integers in network byte order (offset_specifier, length)
with no padding between them
then you should either read the data into an unsigned char[] buffer, or into objects of type uint16_t, uint32_t, and uint64_t (defined in <cstdint> or <stdint.h>), and then translate it from network byte order to local byte order.
You can wrap this conversion in a function that reads from the file and converts the data, storing it in an Entry struct.
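Such a conversion function might look like the sketch below. It assumes the example layout described above (a 2-byte type, an 8-byte identifier, then two 4-byte fields, all big-endian with no padding, 18 bytes per record); the names read_be, decode_entry and kRecordSize are illustrative, and the real file format may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Entry {
    std::uint16_t type;
    std::uint64_t identifier;
    std::uint32_t offset_specifier, length;
};

// Bytes per record in the file under the assumed layout: 2 + 8 + 4 + 4.
constexpr std::size_t kRecordSize = 18;

// Read an n-byte big-endian (network order) unsigned integer.
std::uint64_t read_be(const unsigned char* p, std::size_t n) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < n; ++i) value = (value << 8) | p[i];
    return value;
}

// Decode one record from raw bytes, independent of host endianness and padding.
Entry decode_entry(const unsigned char* record) {
    assert(record != nullptr);
    Entry e;
    e.type             = static_cast<std::uint16_t>(read_be(record, 2));
    e.identifier       = read_be(record + 2, 8);
    e.offset_specifier = static_cast<std::uint32_t>(read_be(record + 10, 4));
    e.length           = static_cast<std::uint32_t>(read_be(record + 14, 4));
    return e;
}
```

The file would be read into an unsigned char buffer of count * kRecordSize bytes, with decode_entry called on each 18-byte slice, rather than reading straight into an Entry array.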
If you're able to assume that the program will only run on a restricted set of systems, then you can bypass some of this. For example, you might be able to tweak the declaration of struct Entry so it matches the file format, and read and write it directly. Doing so will mean your code isn't portable to some systems. You'll have to decide which price you're willing to pay.

Reading Superblock into a C Structure

I have a disk image which contains a standard image using fuse. The Superblock contains the following, and I have a function read_superblock(*buf) that returns the following raw data:
Bytes 0-3: Magic Number (0xC0000112)
4-7: Block Size (1024)
8-11: Total file system size (in blocks)
12-15: FAT length (in blocks)
16-19: Root Directory (block number)
20-1023: NOT USED
I am very new to C and to get me started on this project I am curious what is a simple way to read this into a structure or some variables and simply print them out to the screen using printf for debugging.
I was initially thinking of doing something like the following, thinking I could see the raw data, but I think this is not the case. There is also no structure for me to grab data out of, and I am trying to read it in as a string, which also seems terribly wrong. Is there a way for me to specify the structure and define the number of bytes in each variable?
char *buf;
read_superblock(*buf);
printf("%s", buf);
Yes, I think you'd be better off reading this into a structure. The fields containing useful data are all 32-bit integers, so you could define a structure that looks like this (using the types defined in the standard header file stdint.h):
typedef struct SuperBlock_Struct {
uint32_t magic_number;
uint32_t block_size;
uint32_t fs_size;
uint32_t fat_length;
uint32_t root_dir;
} SuperBlock_t;
You can cast the structure to a char* when calling read_superblock, like this:
SuperBlock_t sb;
read_superblock((char*) &sb);
Now to print out your data, you can make a call like the following:
printf("%u %u %u %u %u\n",
(unsigned) sb.magic_number,
(unsigned) sb.block_size,
(unsigned) sb.fs_size,
(unsigned) sb.fat_length,
(unsigned) sb.root_dir);
Note that you need to be aware of your platform's endianness when using a technique like this, since you're reading integer data (i.e., you may need to swap bytes when reading your data). You should be able to determine that quickly using the magic number in the first field.
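A sketch of that magic-number check, assuming the first four bytes of the block hold 0xC0000112 as listed in the question. The function names are hypothetical; the idea is simply to compare the raw value against the magic number and its byte-swapped form.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kMagic = 0xC0000112u;  // from the superblock layout

// Reverse the byte order of a 32-bit value.
std::uint32_t swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Returns 0 if the on-disk order matches the host, 1 if fields need swapping,
// and -1 if the buffer does not start with the magic number at all.
int detect_byte_order(const unsigned char* block) {
    assert(block != nullptr);
    std::uint32_t raw;
    std::memcpy(&raw, block, sizeof raw);  // avoids alignment/aliasing issues
    if (raw == kMagic) return 0;
    if (swap32(raw) == kMagic) return 1;
    return -1;
}
```

If the result is 1, every 32-bit field read from the block would be passed through swap32 before use.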
Note that it's usually preferable to pass a structure like this without casting it; this allows you to take advantage of the compiler's type-checking and eliminates potential problems that casting may hide. However, that would entail changing your implementation of read_superblock to read data directly into a structure. This is not difficult and can be done using the standard C runtime function fread (assuming your data is in a file, as hinted at in your question), like so:
fread(&sb.magic_number, sizeof(sb.magic_number), 1, fp);
fread(&sb.block_size, sizeof(sb.block_size), 1, fp);
...
Two things to add here:
It's a good idea, when pulling raw data into a struct, to set the struct to have zero padding, even if it's entirely composed of 32-bit unsigned integers. In gcc you do this with #pragma pack(push, 1) before the struct definition and #pragma pack(pop) after it.
For dealing with potential endianness issues, two calls to look at are ntohs() and ntohl(), for 16- and 32-bit values respectively. Note that these swap from network byte order to host byte order; if these are the same (which they aren't on x86-based platforms), they do nothing. You go from host to network byte order with htons() and htonl(). However, since this data is coming from your filesystem and not the network, I don't know if endianness is an issue. It should be easy enough to figure out by comparing the values you expect (e.g. the block size) with the values you get, in hex.
It's not difficult to print the data once you have successfully copied it into the structure Emerick proposed. Suppose the instance of the structure you use to hold the data is named SuperBlock_t_Instance.
Then you can print its fields like this:
printf("Magic Number:\t%u\nBlock Size:\t%u\n etc",
SuperBlock_t_Instance.magic_number,
SuperBlock_t_Instance.block_size);