How to serialize numeric data into char* - c++

I have a need to serialize int, double, long, and float
into a character buffer and this is the way I currently do it
int value = 42;
char* data = new char[64];
std::sprintf(data, "%d", value);
// check
printf( "%s\n", data );
First I am not sure if this is the best way to do it but my immediate problem is determining the size of the buffer. The number 64 in this case is purely arbitrary.
How can I know the exact size of the passed numeric so I can allocate exact memory; not more not less than is required?
Either a C or C++ solution is fine.
EDIT
Based on John's answer (allocate a large enough buffer) below, I am thinking of doing this:
char *data = 0;
int value = 42;
char buffer[999];
std::sprintf(buffer, "%d", value);
data = new char[strlen(buffer)+1];
memcpy(data,buffer,strlen(buffer)+1);
printf( "%s\n", data );
This avoids wasted memory, perhaps at the cost of speed, but it does not entirely solve the potential overflow. Or could I just use the maximum size sufficient to represent the type?

In C++ you can use a string stream and stop worrying about the size of the buffer:
#include <sstream>
...
std::ostringstream os;
int value=42;
os<<value; // you use string streams as regular streams (cout, etc.)
std::string data = os.str(); // now data contains "42"
(If you want you can get a const char * from an std::string via the c_str() method)
In C, you can instead use snprintf to "fake" the write and get the size of the buffer to allocate: if you pass 0 as the second argument of snprintf, you can pass NULL as the target string, and the return value is the number of characters that would have been written. So in C you can do:
int value = 42;
char * data;
size_t bufSize=snprintf(NULL, 0, "%d", value)+1; /* +1 for the NUL terminator */
data = malloc(bufSize);
if(data==NULL)
{
// ... handle allocation failure ...
}
snprintf(data, bufSize, "%d", value);
// ...
free(data);
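The same two-pass snprintf technique can be wrapped for C++ callers. This is a minimal sketch, assuming a C++11 compiler; the helper name format_number is made up for this example, and the caller is responsible for passing a printf specifier that matches the value's type.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one numeric value with the given printf
// specifier and returns exactly the characters snprintf produces.
// The caller must pass a specifier that matches T (e.g. "%d" for int).
template <typename T>
std::string format_number(const char* spec, T value)
{
    // First pass: a null buffer of size 0 only measures the output.
    int needed = std::snprintf(nullptr, 0, spec, value);
    if (needed < 0)
        return std::string();  // formatting error
    std::string out(static_cast<std::size_t>(needed) + 1, '\0');
    // Second pass: write into storage of the measured size (+1 for NUL).
    std::snprintf(&out[0], out.size(), spec, value);
    out.resize(static_cast<std::size_t>(needed));  // drop the terminator
    return out;
}
```

For example, format_number("%d", 42) yields a string holding "42", and its c_str() can be handed to code that expects a const char *.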

I would serialize to a 'large enough' buffer then copy to an allocated buffer. In C
char big_buffer[999], *small_buffer;
sprintf(big_buffer, "%d", some_value);
small_buffer = malloc(strlen(big_buffer) + 1);
strcpy(small_buffer, big_buffer);
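On the question of using "the max value sufficient to represent the type": for integer types, an upper bound on the text length can be computed at compile time, so the large buffer does not have to be a guess. The sketch below is one way to do that in C++11; max_decimal_chars and format_int are hypothetical names, and the bound applies to plain decimal output of integers only (not to float or double).

```cpp
#include <cstdio>
#include <limits>

// Hypothetical helper: upper bound on the characters that decimal output
// of integer type T can need, including the NUL terminator.
// digits10 counts the digits that always fit; the largest values have one
// more digit, and one extra slot covers a leading minus sign.
template <typename T>
struct max_decimal_chars
{
    static_assert(std::numeric_limits<T>::is_integer, "integer types only");
    static constexpr int value = std::numeric_limits<T>::digits10 + 3;
};

// Formats an int into a caller-provided buffer sized so it cannot overflow.
inline int format_int(char (&buf)[max_decimal_chars<int>::value], int v)
{
    return std::snprintf(buf, sizeof buf, "%d", v);
}
```

The stack buffer is then at most a dozen or so bytes for a 32-bit int, and the result can still be copied into an exactly sized allocation as in the answer above.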

Related

Subsetting char array without copying it in C++

I have a long array of char (coming from a raster file via GDAL), all composed of 0 and 1. To compact the data, I want to convert it to an array of bits (thus dividing the size by 8), 4 bytes at a time, writing the result to a different file. This is what I have come up with by now:
uint32_t bytes2bits(char b[33]) {
b[32] = 0;
return strtoul(b,0,2);
}
const char data[36] = "00000000000000000000000010000000101"; // 101 is to be ignored
char word[33];
strncpy(word,data,32);
uint32_t byte = bytes2bits(word);
printf("Data: %d\n",byte); // 128
The code is working, and the result is going to be written in a separate file. What I'd like to know is: can I do that without copying the characters to a new array?
EDIT: I'm using a const variable here just to make a minimal, reproducible example. In my program it's a char *, which is continually changing value inside a loop.
Yes, you can, as long as you can modify the source string (in your example code you can't because it is a constant, but I assume in reality you have the string in writable memory):
uint32_t bytes2bits(const char* b) {
return strtoul(b,0,2);
}
void compress (char* data) {
// You would need to make sure that the `data` argument always has
// at least 33 characters in length (the null terminator at the end
// of the original string counts)
char temp = data[32];
data[32] = 0;
uint32_t byte = bytes2bits(data);
data[32] = temp;
printf("Data: %d\n",byte); // 128
}
With a writable char* buffer holding the long data, there is no need to copy each part into a temporary buffer before converting it to a number.
Just step through the buffer one 32-byte period at a time; the character right after each period must temporarily be replaced with a terminating 0 byte.
So your code would look like:
uint32_t bytes2bits(const char* b) {
    return strtoul(b,0,2);
}
void compress (char* data) {
    int dataLen = strlen(data);
    int periodLen = 32;
    char* periodStr = data;       // start of the current 32-character period
    char tmp;
    int periodPos = periodLen;    // index just past the current period
    uint32_t byte;
    while(periodPos < dataLen)
    {
        // temporarily terminate the period, convert it, then restore the character
        tmp = data[periodPos];
        data[periodPos] = 0;
        byte = bytes2bits(periodStr);
        printf("Data: %u\n",byte);
        data[periodPos] = tmp;
        periodStr = data + periodPos;
        periodPos += periodLen;
    }
    // the last period is already terminated by the string's own null byte
    if(periodPos - periodLen < dataLen)
    {
        byte = bytes2bits(periodStr);
        printf("Data: %u\n",byte);
    }
}
Be careful with the last period, which can be shorter than 32 bytes.
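If modifying the buffer is not an option, the conversion can also be done without strtoul, by reading exactly as many characters as needed. This is a sketch of that alternative, not something taken from the answers above; pack_bits is a hypothetical name, and any character other than '1' is treated as a 0 bit.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: packs up to 32 characters of '0'/'1' starting at p
// into an integer, most significant bit first. It reads exactly n
// characters, so no terminator or temporary copy is needed.
inline std::uint32_t pack_bits(const char* p, std::size_t n)
{
    std::uint32_t word = 0;
    for (std::size_t i = 0; i < n && i < 32; ++i)
        word = (word << 1) | static_cast<std::uint32_t>(p[i] == '1');
    return word;
}
```

Because the length is passed explicitly, the same call handles a final period that is shorter than 32 characters, and it works on const data.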
const char data[36]
You are in violation of your contract with the compiler if you declare something as const and then modify it.
Generally speaking, the compiler won't let you modify it...so to even try to do so with a const declaration you'd have to cast it (but don't)
char *sneaky_ptr = (char*)data;
sneaky_ptr[0] = 'U'; /* the U is for "undefined behavior" */
See: Can we change the value of an object defined with const through pointers?
So if you wanted to do this, you'd have to be sure the data was legitimately non-const.
The right way to do this in modern C++ is by using std::string to hold your string and std::string_view to process parts of that string without copying it.
You can use string_view with the char array you already have, though. It's commonly used to modernize code built around the classical null-terminated const char* string.
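A possible sketch of the string_view approach, assuming a C++17 compiler: std::from_chars parses a character range in a given base without needing a terminator, so the view is neither copied nor modified. The function name parse_word is invented for this example.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical helper (C++17): parses the first 32 characters of `bits`
// as a base-2 number. Returns false if there are fewer than 32
// characters or if any of them is not '0' or '1'.
inline bool parse_word(std::string_view bits, std::uint32_t& out)
{
    if (bits.size() < 32)
        return false;
    const char* first = bits.data();
    const char* last = first + 32;
    std::from_chars_result r = std::from_chars(first, last, out, 2);
    // Success means no error and every one of the 32 characters consumed.
    return r.ec == std::errc() && r.ptr == last;
}
```

To walk a longer buffer, the caller can advance with bits.substr(32), which only adjusts the view and copies nothing.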

Dynamic memory allocation to char array

I have given the array size manually as below:
int main(int argc, char *argv[] )
{
char buffer[1024];
strcpy(buffer,argv[1]);
...
}
But if the data passed in the argument exceeds this size, it will cause problems.
Is this the correct way to allocate memory dynamically?
int main(int argc, char *argv[] )
{
int length;
char *buffer;
length = sizeof(argv[1]); //or strlen(argv[1])?
buffer = (char*)malloc(length*sizeof(char *));
...
}
sizeof tells you the size of char*. You want strlen instead
if (argc < 2) {
printf("Error - insufficient arguments\n");
return 1;
}
length=strlen(argv[1]);
buffer = (char*)malloc(length+1); // cast required for C++ only
I've suggested a few other changes here
you need to add an extra byte to buffer for the null terminator
you should check that the user passed in an argv[1]
sizeof(char *) is incorrect when calculating storage required for a string. A C string is an array of chars so you need sizeof(char), which is guaranteed to be 1 so you don't need to multiply by it
Alternatively, if you're running on a Posix-compatible system, you could simplify things and use strdup instead:
buffer = strdup(argv[1]);
Finally, make sure to free this memory when you're finished with it
free(buffer);
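The points above (use strlen, add one byte for the terminator, check for failure) can be collected into one small function. This is a sketch that behaves roughly like strdup for systems that lack it; the name duplicate_cstring is made up for this example.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns a malloc'ed copy of src, or NULL if src is
// NULL or the allocation fails. The caller releases it with free().
inline char* duplicate_cstring(const char* src)
{
    if (src == NULL)
        return NULL;
    std::size_t size = std::strlen(src) + 1;   // +1 for the terminator
    char* copy = static_cast<char*>(std::malloc(size));
    if (copy != NULL)
        std::memcpy(copy, src, size);          // copies the terminator too
    return copy;
}
```

In main this could be called as duplicate_cstring(argc > 1 ? argv[1] : NULL), followed by a NULL check before the buffer is used.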
The correct way is to use std::string and let C++ do the work for you
#include <string>
int main(int argc, char *argv[])
{
std::string buffer = argv[1];
}
but if you want to do it the hard way then this is correct
int main(int argc, char *argv[])
{
int length = strlen(argv[1]);
char* buffer = (char*)malloc(length + 1);
}
Don't forget to +1 for the null terminator used in C style strings.
In C++, you can do this to get your arguments in a nice data structure.
const std::vector<std::string> args(argv, argv + argc);
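As a small illustration of the vector approach, the sketch below wraps the construction in a function so it can be reused and tested; collect_args is a hypothetical name. Each std::string owns its own copy, so no manual sizing or freeing is involved.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies the C-style argument array into owned strings.
// The iterator-range constructor builds one std::string per argv entry.
inline std::vector<std::string> collect_args(int argc, char* argv[])
{
    return std::vector<std::string>(argv, argv + argc);
}
```

After the call, args.size() should still be checked before reading args[1], just as argc is checked in the C version.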
length = strlen(argv[1]); // not sizeof(argv[1])
and
//extra byte of space is to store Null character.
buffer = (char*)malloc((length+1) * sizeof(char));
Since sizeof(char) is always one, you can also use this:
buffer = (char*)malloc(length+1);
Firstly, if you use C++, I think it's better to use new instead of malloc.
Secondly, your malloc size is wrong; it should be buffer = malloc(sizeof(char) * length); because you are allocating a buffer of char, not of char*.
Thirdly, you must allocate one more byte for the end of the string and store '\0' there.
Finally, sizeof only gives the size of the type, not the length of a string; you must use strlen to get the string's length.
You need to add an extra byte to hold the terminating null byte of the string:
length=strlen(argv[1]) + 1;
Then it should be OK.

Typecasting from byte[] to struct

I'm currently working on a small C++ project where I use a client-server model someone else built. Data gets sent over the network and in my opinion it's in the wrong order. However, that's not something I can change.
Example data stream (simplified):
0x20 0x00 (C++: short with value 32)
0x10 0x35 (C++: short with value 13584)
0x61 0x62 0x63 0x00 (char*: abc)
0x01 (bool: true)
0x00 (bool: false)
I can represent this specific stream as :
struct test {
short sh1;
short sh2;
char abc[4];
bool bool1;
bool bool2;
};
And I can typecast it with test *t = (test*)stream; However, the char* has a variable length. It is, however, always null terminated.
I understand that there's no way of actually casting the stream to a struct, but I was wondering whether there would be a better way than struct test() { test(char* data) { ... }} (convert it via the constructor)
This is called Marshalling or serialization.
What you must do is read the stream one byte at a time (or put all in a buffer and read from that), and as soon as you have enough data for a member in the structure you fill it in.
When it comes to the string, you simply read until you hit the terminating zero, and then allocate memory and copy the string to that buffer and assign it to a pointer in the struct.
Reading strings this way is simplest and most efficient if you already have all of the message in a buffer, because then you don't need a temporary buffer for the string.
Remember though, that with this scheme you have to manually free the memory containing the string when you are done with the structure.
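The field-by-field approach described above might look like the following sketch for the example stream in the question. It assumes the whole message is already in a buffer and that the shorts are little-endian (as the example bytes 0x20 0x00 for 32 suggest). Message, read_le16 and parse_message are hypothetical names, and std::string is used for the text so that nothing has to be freed by hand.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical in-memory form of the message; std::string owns the
// variable-length field.
struct Message
{
    short sh1;
    short sh2;
    std::string abc;
    bool bool1;
    bool bool2;
};

// Reads a 16-bit little-endian value (low byte first).
inline short read_le16(const unsigned char* p)
{
    return static_cast<short>(p[0] | (p[1] << 8));
}

// Parses one message from buf[0..len). Returns false if the buffer ends
// before the message is complete.
inline bool parse_message(const unsigned char* buf, std::size_t len, Message& out)
{
    if (len < 4)
        return false;
    out.sh1 = read_le16(buf);
    out.sh2 = read_le16(buf + 2);

    // Find the terminating zero of the string without running off the end.
    const unsigned char* start = buf + 4;
    const void* nul = std::memchr(start, 0, len - 4);
    if (nul == NULL)
        return false;
    const unsigned char* end = static_cast<const unsigned char*>(nul);
    out.abc.assign(reinterpret_cast<const char*>(start),
                   static_cast<std::size_t>(end - start));

    // Two single-byte booleans follow the terminator.
    if (static_cast<std::size_t>(buf + len - end) < 3)
        return false;
    out.bool1 = end[1] != 0;
    out.bool2 = end[2] != 0;
    return true;
}
```

Decoding the shorts byte by byte, rather than casting the buffer to a struct, also avoids depending on the host's byte order, alignment and struct padding.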
Just add a member function that takes in the character buffer(function input parameter char *) and populates the test structure by parsing it.
This makes it more clear and readable as well.
If you provide an implicit conversion constructor, you create a menace that will perform the conversion when you least expect it.
When reading variable-length data from a sequence of bytes,
you shouldn't try to fit everything into a single structure or variable.
Use a pointer for the variable-length part and store its length alongside it.
The following suggestion is not tested:
#include <cstdlib>
#include <cstring>
// data is stored in memory
// in a different way,
// NOT as the sequence of bytes
// that was provided
struct data {
    short sh1;
    short sh2;
    int abclength;
    // a pointer to a block whose size varies
    char* abc;
    bool bool1;
    bool bool2;
};
// "byte" is not defined in the original; assumed here to be unsigned char
typedef unsigned char byte;
// reads a single byte
bool readByte(byte* MyByteBuffer)
{
    // your reading code goes here,
    // character by character, from stream,
    // file, pipe, whatever.
    // The result should be true if there was no error,
    // false if nothing more can be read
    return false; // placeholder
}
// used for reading several variables,
// with different sizes in bytes
int readBuffer(void* Buffer, int BufferSize)
{
    int RealCount = 0;
    byte* p = (byte*)Buffer;
    // test the bound first so that nothing is written past the buffer
    while (RealCount < BufferSize && readByte(p))
    {
        RealCount++;
        p++;
    }
    return RealCount;
}
void read()
{
    // real data here:
    data MyData;
    // long enough, used to hold the variable-length string temporarily
    char temp[64000];
    // fill buffer for string with null values
    memset(temp, '\0', sizeof(temp));
    int RealCount = 0;
    // try read "sh1" field
    RealCount = readBuffer(&(MyData.sh1), sizeof(short));
    if (RealCount == sizeof(short))
    {
        // try read "sh2" field
        RealCount = readBuffer(&(MyData.sh2), sizeof(short));
        if (RealCount == sizeof(short))
        {
            // read the string one byte at a time, stopping after its terminating zero
            RealCount = 0;
            while (RealCount < (int)sizeof(temp) && readByte((byte*)&temp[RealCount]))
            {
                RealCount++;
                if (temp[RealCount - 1] == '\0')
                    break;
            }
            if (RealCount > 0 && temp[RealCount - 1] == '\0')
            {
                // store real bytes count (includes the terminating zero)
                MyData.abclength = RealCount;
                // allocate dynamic memory block for variable length data
                MyData.abc = (char*)malloc(RealCount);
                // copy data from the temporary buffer into the allocated block;
                // array names and pointers are already addresses, so no "&" is needed
                memcpy(MyData.abc, temp, RealCount);
                // continue with rest of data
                RealCount = readBuffer(&(MyData.bool1), sizeof(bool));
                if (RealCount > 0)
                {
                    // continue with rest of data
                    RealCount = readBuffer(&(MyData.bool2), sizeof(bool));
                }
            }
        }
    }
} // void read()
Cheers.

c++ Function to format time_t as std::string: buffer length?

I want a function that will take a time_t parameter and an arbitrary format string and format it. I want something like this:
std::string GetTimeAsString(std::string formatString, time_t theTime)
{
struct tm *timeinfo;
timeinfo = localtime( &theTime);
char buffer[100];
strftime(buffer, 100, formatString.c_str(), timeinfo);
std::string result(buffer);
return result;
}
However one problem I'm running into is the buffer length. I was thinking of doing something like formatString * 4 as the buffer length. But I guess you can't dynamically set the buffer length? Maybe I could pick an arbitrarily large buffer? I'm a little stuck as to how to make it generic.
How can I write a function to achieve this?
If you have C++11:
std::string GetTimeAsString(const std::string& formatString, time_t theTime)
{
    struct tm *timeinfo;
    timeinfo = localtime( &theTime);
    std::string format = formatString + '\a'; //force at least one character in the result
    std::string buffer;
    buffer.resize(format.size());
    size_t len = strftime(&buffer[0], buffer.size(), format.c_str(), timeinfo);
    while (len == 0) {
        buffer.resize(buffer.size()*2);
        len = strftime(&buffer[0], buffer.size(), format.c_str(), timeinfo);
    }
    buffer.resize(len-1); //remove that trailing '\a'
    return buffer;
}
Note I take formatString as a const reference (for speed and safety), and use the result string as the buffer, which is faster than doing an extra copy later. I also start at the same size as the format string, and double the size with each attempt, but that's easily changeable to something more appropriate for the results of strftime.
C++11 solution with std::put_time():
std::string GetTimeAsString(std::string formatString, time_t theTime)
{
const struct tm* timeinfo = localtime(&theTime);
std::ostringstream os;
os << std::put_time(timeinfo, formatString.c_str());
return os.str();
}
Use a vector<char> for the buffer instead of an array. Repeatedly increase the size until strftime returns non-zero.
I would think your best bet would be to provide a fixed buffer that is likely to handle the vast majority of cases, and then do special handling for the rest. Something like (untested, except in the wetware inside my skull):
std::string GetTimeAsString (std::string formatString, time_t theTime) {
struct tm *timeinfo;
char buffer[100], *pBuff = buffer;
int rc, buffSize = 100;
timeinfo = localtime (&theTime);
rc = strftime(pBuff, 100, formatString.c_str(), timeinfo);
// Most times, we shouldn't enter this loop.
while (rc == 0) {
// Free previous if it was allocated.
if (pBuff != buffer)
delete[] pBuff;
// Try with larger buffer.
buffSize += 100;
pBuff = new char [buffSize];
rc = strftime(pBuff, buffSize, formatString.c_str(), timeinfo);
}
// Make string then free buffer if it was allocated.
std::string result(pBuff);
if (pBuff != buffer)
delete[] pBuff;
return result;
}
strftime will return zero if the provided buffer wasn't big enough. In that case, you start allocating bigger buffers until it fits.
Your non-allocated buffer size and the increment you use for allocation size can be tuned to your needs. This method has the advantage that you won't notice an efficiency hit (however small it may be) except for the rare cases - no allocation is done for that vast majority.
In addition, you could choose some other method (e.g., +10%, doubling, etc) for increasing the buffer size.
The strftime() function returns 0 if the buffer's size is too small to hold the expected result. Using this property, you could allocate the buffer on the heap and try the consecutive powers of 2 as its size: 1, 2, 4, 8, 16 etc. until the buffer is big enough. The advantage of using the powers of 2 is that the solution's complexity is logarithmically proportional to the result's length.
There's also a special case that needs to be thought of: the format might be such that the result's size will always be 0 (e.g. an empty format). Not sure how to handle that.
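One way to deal with that special case is to put an upper limit on the buffer growth, so a format whose result is always empty ends the loop instead of spinning forever. The sketch below combines the vector<char> suggestion with such a cap, assuming C++11; the name FormatTime and the 64 KiB limit are arbitrary choices for this example, and an empty string is returned both for a genuinely empty result and when the cap is hit.

```cpp
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical helper: formats theTime with strftime, growing the buffer
// until the result fits. kMaxSize is an arbitrary cap chosen for this
// sketch so that formats producing no output cannot loop forever.
inline std::string FormatTime(const std::string& format, std::time_t theTime)
{
    const std::size_t kMaxSize = 64 * 1024;
    if (format.empty())
        return std::string();

    const std::tm* local = std::localtime(&theTime);
    if (local == NULL)
        return std::string();
    const std::tm timeinfo = *local;   // copy out of localtime's static storage

    std::vector<char> buffer(64);
    for (;;)
    {
        std::size_t len = std::strftime(buffer.data(), buffer.size(),
                                        format.c_str(), &timeinfo);
        if (len > 0)
            return std::string(buffer.data(), len);
        if (buffer.size() >= kMaxSize)
            return std::string();      // empty result or unreasonably long
        buffer.resize(buffer.size() * 2);
    }
}
```

The trailing '\a' trick from the first answer is the other way to distinguish an empty result from a too-small buffer; the cap is simpler but cannot tell those two cases apart.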

Reading std::string from binary file

I have a couple of functions I created a while ago for reading and writing std::strings to a FILE* opened for reading in binary mode. They have worked fine before (and WriteString() still works) but ReadString() keeps giving me memory corruption errors at run-time. The strings are stored by writing their size as an unsigned int before the string data as char.
bool WriteString(std::string t_str, FILE* t_fp) {
// Does the file stream exist and is it valid? If not, return false.
if (t_fp == NULL) return false;
// Create char pointer from string.
char* text = const_cast<char*>(t_str.c_str());
// Find the length of the string.
unsigned int size = t_str.size();
// Write the string's size to the file.
fwrite(&size, sizeof(unsigned int), 1, t_fp);
// Followed by the string itself.
fwrite(text, 1, size, t_fp);
// Everything worked, so return true.
return true;
}
std::string ReadString(FILE* t_fp) {
// Does the file stream exist and is it valid? If not, return an empty string.
if (t_fp == NULL) return std::string();
// Create new string object to store the retrieved text and to return to the calling function.
std::string str;
// Create a char pointer for temporary storage.
char* text = new char;
// UInt for storing the string's size.
unsigned int size;
// Read the size of the string from the file and store it in size.
fread(&size, sizeof(unsigned int), 1, t_fp);
// Read [size] number of characters from the string and store them in text.
fread(text, 1, size, t_fp);
// Store the contents of text in str.
str = text;
// Resize str to match the size else we get extra cruft (line endings methinks).
str.resize(size);
// Finally, return the string to the calling function.
return str;
}
Can anyone see any problems with this code or have any alternative suggestions?
The biggest problem that jumped out at me:
// Create a char pointer for temporary storage.
char* text = new char;
// ...
// Read [size] number of characters from the string and store them in text.
fread(text, 1, size, t_fp);
This creates text as a pointer to a single character, and then you try to read an arbitrary number of characters (potentially many more than one) into it. In order for this to work right, you would have to create text as an array of characters after you figured out what the size was, like this:
// UInt for storing the string's size.
unsigned int size;
// Read the size of the string from the file and store it in size.
fread(&size, sizeof(unsigned int), 1, t_fp);
// Create a char pointer for temporary storage.
char* text = new char[size];
// Read [size] number of characters from the string and store them in text.
fread(text, 1, size, t_fp);
Second, you don't free the memory that you allocated to text. You need to do that:
// Free the temporary storage
delete[] text;
Finally, is there a good reason why you are choosing to use C file I/O in C++? Using C++-style iostreams would have alleviated all of this and made your code much, much shorter and more readable.
The problem is:
char* text = new char;
you're allocating a single character. Do the allocation after you know size, and allocate all the size characters you need (e.g. with new char[size]). (To avoid a leak, delete[] it later after copying it, of course.)
I'm sorry but the chosen answer doesn't work for me.
// UInt for storing the string's size.
unsigned int size;
// Read the size of the string from the file and store it in size.
fread(&size, sizeof(unsigned int), 1, t_fp);
// Create a char pointer for temporary storage.
char* text = new char[size];
// Read [size] number of characters from the string and store them in text.
fread(text, 1, size, t_fp);
The size ends up being a very large number. Am I missing something?
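A sketch that combines the fixes suggested above and also checks the return values of fread and fwrite. If the read of size fails (for example at end of file or on a stream positioned in the wrong place), the variable is never filled in, which is one possible, unconfirmed source of a huge value. The function names are hypothetical, and the std::string itself serves as the buffer so there is nothing to delete.

```cpp
#include <cstdio>
#include <string>

// Writes the length as an unsigned int, then the raw characters.
inline bool WriteStringChecked(const std::string& str, std::FILE* fp)
{
    if (fp == NULL)
        return false;
    unsigned int size = static_cast<unsigned int>(str.size());
    if (std::fwrite(&size, sizeof size, 1, fp) != 1)
        return false;
    return std::fwrite(str.data(), 1, size, fp) == size;
}

// Reads a string written by WriteStringChecked into out.
// Returns false on a null stream, a short read, or end of file.
inline bool ReadStringChecked(std::FILE* fp, std::string& out)
{
    if (fp == NULL)
        return false;
    unsigned int size = 0;
    if (std::fread(&size, sizeof size, 1, fp) != 1)
        return false;                 // size was never filled in
    out.resize(size);                 // the string itself is the buffer
    if (size == 0)
        return true;
    return std::fread(&out[0], 1, size, fp) == size;
}
```

Both the writer and the reader must use the same unsigned int width and byte order, and the file should be opened in binary mode ("wb" / "rb") so that no bytes are translated on the way in or out.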