Properly serializing/unserializing primitives with Endian in mind - c++

For a project, I'm trying to serialize some primitives and send them over the wire to a locally hosted Java server. The Java part isn't so important right now, but what is important is that Java interprets its data as Big Endian, whereas my C++ program is Little Endian.
For a small example, I have a double:
_fields.degLatitude = 50.0;
I then want to serialize that data, but also swap the byte order (to conform to Little->Big Endian):
char buffer[sizeof(struct Fields)];
char* p = buffer;
writeReversedData<double>(&p, _fields.degLatitude);
writeReversedData's implementation is here:
template <typename T> void JsbSimWrapper::writeReversedData(char** bb, T& data)
{
const char* charBuffer = std::to_string(data).c_str();
for (int i = 0; i < sizeof(T); i++)
{
(*bb)[i] = charBuffer[sizeof(T) - i - 1];
}
*bb += sizeof(T);
}
Then just as a test on my side, I wanted to unserialize that data then swap it back to Little Endian:
std::cout << getReversedData<double>(&p) << std::endl;
And here's getReversedData's implementation:
template <typename T> T JsbSimWrapper::getReversedData(char** p)
{
union temp
{
char bytes[sizeof(T)];
T tt;
};
temp t;
for (int i = 0; i < sizeof(T); i++)
{
t.bytes[i] = (*p)[(sizeof(T) - 1) - i];
}
*p += sizeof(T);
return t.tt;
}
Unfortunately, cout just prints huge numbers that aren't even close to 50.0. I also tested the data received on the Java end, and it matches what cout prints on the C++ side, so it has to be something with my write call.
I know for sure that my getReversedData works, because I tested it with the Java side sending me serialized data, and the C++ side interprets it just fine.

This line stores a pointer into a temporary std::string. The temporary is destroyed at the end of the statement, so charBuffer is dangling as soon as the statement completes.
const char* charBuffer = std::to_string(data).c_str();
Also, in the for loop
for (int i = 0; i < sizeof(T); i++)
{
(*bb)[i] = charBuffer[sizeof(T) - i - 1];
}
there is no rule that the string representation is sizeof(T) characters long. For example, a 4-byte int might convert to 10 digits.
It would be much better to store the converted result in a std::string and then use the length() of that string.
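If the goal is to send the raw bytes of the value in reversed order rather than its decimal text, the object representation can be copied directly. The following is a minimal sketch, assuming T is trivially copyable and that both sides agree on the floating-point layout; write_reversed and read_reversed are hypothetical free-function stand-ins for the member functions in the question.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copies the bytes of `data` into *bb in reverse order, then advances *bb.
template <typename T>
void write_reversed(char** bb, const T& data)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies need a trivially copyable type");
    char raw[sizeof(T)];
    std::memcpy(raw, &data, sizeof(T));  // object representation in host byte order
    for (std::size_t i = 0; i < sizeof(T); ++i)
        (*bb)[i] = raw[sizeof(T) - 1 - i];
    *bb += sizeof(T);
}

// Reads sizeof(T) bytes from *p in reverse order, then advances *p.
template <typename T>
T read_reversed(const char** p)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies need a trivially copyable type");
    char raw[sizeof(T)];
    for (std::size_t i = 0; i < sizeof(T); ++i)
        raw[i] = (*p)[sizeof(T) - 1 - i];
    *p += sizeof(T);
    T value;
    std::memcpy(&value, raw, sizeof(T));
    return value;
}
```

Note also that the test in the question reads through p after the write has already advanced it, so the read-back has to start from the beginning of buffer again.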

Related

Corrupted struct char arrays - sqlite C++

I am a bit puzzled by this one. Everything has been fine so far with using SQLite but now I am trying to store my query results in a simple struct. When I do this in my callback, all my data looks great in my SQLItems vector but as soon as the callback exits, my SQLItems vector holding my rows of data is suddenly corrupted. Any ideas what could be causing this?
// Simple struct to hold column name and row data
struct SQLrow {
char * Column;
char * Data;
};
// static Vector to hold SQL rows
static std::vector<SQLrow> SQLItems;
...
// static callback that handles placing query results into structs and into SQLItems vector
// SQLItems column/row data gets corrupted after this function exits
static int countTablesCallback(void *data, int count, char **rows, char **azColName) {
int i;
for (i = 0; i < count; i++) {
SQLrow newItem = { azColName[i] ,rows[i] };
SQLItems.push_back(newItem);
}
*static_cast<std::vector<SQLrow>*>(data) = SQLItems; // Tried this too but throws an exception
return 0;
}
I also thought maybe it is only possible to statically cast from the callback to save the vector but that is throwing an exception as well. Stumped here. Thanks for any advice!
Your vector is fine, and the static_cast makes no sense there unless data is actually used as an out parameter. Your problem is, most likely, that SQLrow holds char pointers and SQLite frees or reuses the pointed-to strings after the callback returns. Changing your struct to
struct SQLrow {
std::string Column;
std::string Data;
};
should solve the problem.
Just looking at the code, it appears that the data pointed to by rows will be invalidated/destroyed/changed once the callback returns. So you can't retain those pointers for later use, and will have to make a copy of the data.
One easy way is to change Column and Data from char * to std::string. Failing that, you'll have to do some sort of manual memory management (allocate space with new, then delete it later) which is error prone and not really advisable these days.
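A callback that copies the strings could look roughly like the sketch below. It keeps the parameter list of the callback in the question; collectRowsCallback is a hypothetical name, and it assumes the data argument points at the std::vector<SQLrow> that should receive the results.

```cpp
#include <string>
#include <vector>

struct SQLrow {
    std::string Column;
    std::string Data;
};

// Copies every column name and value into owned std::string objects, so the
// results stay valid after the callback returns.
// Assumption: `data` points at the std::vector<SQLrow> to fill.
static int collectRowsCallback(void* data, int count, char** rows, char** azColName)
{
    std::vector<SQLrow>* out = static_cast<std::vector<SQLrow>*>(data);
    for (int i = 0; i < count; ++i) {
        SQLrow item;
        item.Column = azColName[i];
        // A NULL column value arrives as a null pointer, which std::string
        // cannot be constructed from, so it is stored as an empty string here.
        item.Data = rows[i] ? rows[i] : "";
        out->push_back(item);
    }
    return 0;
}
```

With this shape, the address of the vector would be passed as the fourth argument of sqlite3_exec, which SQLite hands back as the first callback argument, and no static vector is needed.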
In my opinion, there are very few cases in which you want or need to use raw strings in C++, and yours isn't one of them. Still, I hope this will help you or someone else in some way:
#include <vector>
#include <stdio.h>
#include <string.h>
#include <iostream>
struct SQLrow {
char* Column;
char* Data;
};
void your_callback(int count, char **rows, char **azColName) {
std::vector<SQLrow> rows_list;
for (int i = 0; i < count; i++) {
/* Uncomment this if you want
your copy of the strings. If you
use this, don't forget to free the
memory yourself with delete[] s1 and
s2.
size_t s1_len = strlen(rows[i]);
size_t s2_len = strlen(azColName[i]);
char* s1 = new char [sizeof(char) * (s1_len + 1)];
char* s2 = new char [sizeof(char) * (s2_len + 1)];
memcpy(s1, rows[i], s1_len);
s1[s1_len] = '\0';
memcpy(s2, azColName[i], s2_len);
s2[s2_len] = '\0';
SQLrow r = { s1, s2 }; */
SQLrow r = { rows[i], azColName[i] };
rows_list.push_back(r);
}
// test the result
for (int i = 0; i < count; i++) {
SQLrow r = rows_list.at(i);
std::cout << "rows:" << r.Column << " azColName:" << r.Data << std::endl;
}
}
// These two lines only simulate the data.
// Expect the warning 'ISO C++ forbids converting a string constant to char*'.
char* rows[] = {"row1", "row2" , "row3" };
char* colName[] = {"name1", "name2", "name3" };
int main()
{
your_callback(3, rows, colName);
return 0;
}

How to calculate the length of a mpz_class in bytes?

I want to implement RSA with padding but first I have to find out the length in bytes of the message which is a mpz_class item. Which function would be useful in cpp to accomplish this?
const mpz_class m(argv[1])
What is the length of m in bytes?
Thank you!
@Shawn's comment is correct: the bytes occupied in memory by your class are not what you should be concerned about. Not only does the layout of the bytes in memory depend on how your compiler decides to pack them, but their order can also depend on the hardware used.
So, instead of doing some awkward and very fragile memcpy-style thing that is almost guaranteed to break at some point, you should construct the message yourself (search keyword: serialization). This also has the advantage that your class can contain stuff that you don't want to add to the message (caches with temporary results, or other implementation/optimization details).
To the best of my knowledge, C++ (unlike, for example, C#) does not come with built-in serialization support, but there are likely libraries that can do a lot of it for you. Otherwise you just have to write your "data member to byte array" functions yourself.
Super simple example:
#include <cstdint>
#include <vector>
#include <iostream>
class myClass
{
int32_t a;
public:
myClass(int32_t val) : a(val) {}
// Deserializer
bool initFromBytes(std::vector<uint8_t> msg)
{
if (msg.size() < 4)
return false;
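// Assemble the value in an unsigned type: shifting a byte into the sign bit of an int is not portable.
uint32_t u = 0;
for (int i = 0; i < 4; ++i)
{
u |= static_cast<uint32_t>(msg[i]) << (i * 8);
}
a = static_cast<int32_t>(u);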
return true;
}
// Serializer
std::vector<uint8_t> toBytes()
{
std::vector<uint8_t> res;
for (int i = 0; i < 4; ++i)
{
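// Least significant byte first; the casts make the truncation to one byte explicit.
res.push_back(static_cast<uint8_t>(static_cast<uint32_t>(a) >> (i * 8)));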
}
return res;
}
void print() { std::cout << "myClass: " << a << std::endl; }
};
int main()
{
myClass myC(123456789);
myC.print();
std::vector<uint8_t> message = myC.toBytes();
myClass recreate(0);
if (recreate.initFromBytes(message))
recreate.print();
else
std::cout << "Error" << std::endl;
return 0;
}

Efficient way to convert int to string

I'm creating a game in which I have a main loop. During one cycle of this loop, I have to convert int value to string about ~50-100 times. So far I've been using this function:
std::string Util::intToString(int val)
{
std::ostringstream s;
s << val;
return s.str();
}
But it doesn't seem to be quite efficient as I've encountered FPS drop from ~120 (without using this function) to ~95 (while using it).
Is there any other way to convert int to string that would be much more efficient than my function?
It's 1-72 range. I don't have to deal with negatives.
Pre-create an array/vector of 73 string objects, and use an index to get your string. Returning a const reference will let you save on allocations/deallocations, too:
// Initialize smallNumbers to strings "0", "1", "2", ...
static vector<string> smallNumbers;
const string& smallIntToString(unsigned int val) {
return smallNumbers[val < smallNumbers.size() ? val : 0];
}
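The snippet above leaves the initialization of the table open. One way to fill it, as a sketch that assumes the 0 to 72 range mentioned in the question (makeSmallNumbers is a hypothetical helper name):

```cpp
#include <string>
#include <vector>

// Builds the strings "0" .. "72" once.
static std::vector<std::string> makeSmallNumbers()
{
    std::vector<std::string> table;
    for (int i = 0; i <= 72; ++i)
        table.push_back(std::to_string(i));
    return table;
}

// Returns a reference into the table, so no string is allocated per call.
// Out-of-range values fall back to entry 0, as in the snippet above.
const std::string& smallIntToString(unsigned int val)
{
    static const std::vector<std::string> smallNumbers = makeSmallNumbers();
    return smallNumbers[val < smallNumbers.size() ? val : 0];
}
```

The function-local static is built on the first call, which avoids having to remember a separate initialization step before the main loop starts.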
The standard std::to_string function might be useful.
However, in this case I wonder whether copying the string when returning it might be just as big a bottleneck. If so, you could pass the destination string to the function as a reference argument instead. Then again, if you have std::to_string, your compiler is probably C++11 compatible and can use move semantics instead of copying.
Yep — fall back on functions from C, as explored in this previous answer:
namespace boost {
template<>
inline std::string lexical_cast(const int& arg)
{
char buffer[65]; // large enough for arg < 2^200
ltoa( arg, buffer, 10 );
return std::string( buffer ); // RVO will take place here
}
}//namespace boost
In theory, this new specialisation will take effect throughout the rest of the Translation Unit in which you defined it. ltoa is much faster (despite being non-standard) than constructing and using a stringstream.
However, I've experienced problems with name conflicts between instantiations of this specialisation, and instantiations of the original function template, between competing shared libraries.
In order to get around that, I actually just give this function a whole new name entirely:
template <typename T>
inline std::string fast_lexical_cast(const T& arg)
{
return boost::lexical_cast<std::string>(arg);
}
// Specialisation for int; it must use the same name as the primary template.
template <>
inline std::string fast_lexical_cast(const int& arg)
{
char buffer[65];
if (!ltoa(arg, buffer, 10)) {
boost::throw_exception(boost::bad_lexical_cast(
typeid(std::string), typeid(int)
));
}
return std::string(buffer);
}
Usage: std::string myString = fast_lexical_cast(42);
Disclaimer: this modification is reverse-engineered from Kirill's original SO code, not the version that I created and put into production from my company codebase. I can't think right now, though, of any other significant modifications that I made to it.
Something like this:
const int size = 12;
char buf[size+1];
buf[size] = 0;
int index = size;
bool neg = false;
if (val < 0) { // Obviously don't need this if val is always positive.
neg = true;
val = -val;
}
do
{
buf[--index] = (val % 10) + '0';
val /= 10;
} while(val);
if (neg)
{
buf[--index] = '-';
}
return std::string(&buf[index]);
I use this:
void append_uint_to_str(string & s, unsigned int i)
{
if(i > 9)
append_uint_to_str(s, i / 10);
s += '0' + i % 10;
}
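If you need negative values, change the parameter to a signed int (the check can never be true for an unsigned int) and insert:
if(i < 0)
{
s += '-';
i = -i;
}
at the beginning of the function.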

efficient way to save objects into binary files

I've a class that consists basically of a matrix of vectors: vector< MyFeatVector<T> > m_vCells, where the outer vector represents the matrix. Each element in this matrix is then a vector (I extended the stl vector class and named it MyFeatVector<T>).
I'm trying to code an efficient method to store objects of this class in binary files.
Up to now, I require three nested loops:
foutput.write( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
where this->at(dy,dx,dz) retrieves the dz element of the vector at position [dy,dx].
Is there any possibility to store the m_vCells private member without using loops? I tried something like: foutput.write(reinterpret_cast<char*>(&(this->m_vCells[0])), (this->m_vCells.size())*sizeof(CFeatureVector<T>)); which seems not to work correctly. We can assume that all the vectors in this matrix have the same size, although a more general solution is also welcomed :-)
Furthermore, with my nested-loop implementation, storing objects of this class in binary files seems to require more disk space than storing the same objects in plain-text files, which is a bit weird.
I was trying to follow the suggestion under http://forum.allaboutcircuits.com/showthread.php?t=16465 but couldn't arrive at a proper solution.
Thanks!
Below a simplified example of my serialization and unserialization methods.
template < typename T >
bool MyFeatMatrix<T>::writeBinary( const string & ofile ){
ofstream foutput(ofile.c_str(), ios::out|ios::binary);
foutput.write(reinterpret_cast<char*>(&this->m_nHeight), sizeof(int));
foutput.write(reinterpret_cast<char*>(&this->m_nWidth), sizeof(int));
foutput.write(reinterpret_cast<char*>(&this->m_nDepth), sizeof(int));
//foutput.write(reinterpret_cast<char*>(&(this->m_vCells[0])), nSze*sizeof(CFeatureVector<T>));
for(register int dy=0; dy < this->m_nHeight; dy++){
for(register int dx=0; dx < this->m_nWidth; dx++){
for(register int dz=0; dz < this->m_nDepth; dz++){
foutput.write( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
}
}
}
foutput.close();
return true;
}
template < typename T >
bool MyFeatMatrix<T>::readBinary( const string & ifile ){
ifstream finput(ifile.c_str(), ios::in|ios::binary);
int nHeight, nWidth, nDepth;
finput.read(reinterpret_cast<char*>(&nHeight), sizeof(int));
finput.read(reinterpret_cast<char*>(&nWidth), sizeof(int));
finput.read(reinterpret_cast<char*>(&nDepth), sizeof(int));
this->resize(nHeight, nWidth, nDepth);
for(register int dy=0; dy < this->m_nHeight; dy++){
for(register int dx=0; dx < this->m_nWidth; dx++){
for(register int dz=0; dz < this->m_nDepth; dz++){
finput.read( reinterpret_cast<char*>( &(this->at(dy,dx,dz)) ), sizeof(T) );
}
}
}
finput.close();
return true;
}
The most efficient method is to store the objects in an array (or other contiguous buffer), then blast the buffer to the file in one write. An advantage is that the disk platters don't have to waste time ramping up, and the writing can be performed contiguously instead of at random locations.
If this is your performance bottleneck, you may want to consider using multiple threads, with one extra thread to handle the output. Dump the objects into a buffer and set a flag; the writing thread then handles the output, relieving your main thread to perform more important tasks.
Edit 1: Serializing Example
The following code has not been compiled and is for illustrative purposes only.
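#include <algorithm>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
using std::ofstream;
class binary_stream_interface
{
public:
virtual ~binary_stream_interface() {}
virtual void load_from_buffer(const unsigned char *& buf_ptr) = 0;
virtual size_t size_on_stream(void) const = 0;
virtual void store_to_buffer(unsigned char *& buf_ptr) const = 0;
};
struct Pet
: public binary_stream_interface
{
std::string name;
unsigned int age;
const unsigned int max_name_length;
// max_name_length is a data member, so it is initialized in the constructor.
Pet() : age(0), max_name_length(32) {}
void load_from_buffer(const unsigned char *& buf_ptr)
{
// memcpy avoids the alignment problems of casting buf_ptr to unsigned int *.
std::memcpy(&age, buf_ptr, sizeof(unsigned int));
buf_ptr += sizeof(unsigned int);
// The name field is zero padded on store, so it is always NUL terminated.
name = std::string(reinterpret_cast<const char *>(buf_ptr));
buf_ptr += max_name_length;
}
size_t size_on_stream(void) const
{
return sizeof(unsigned int) + max_name_length;
}
void store_to_buffer(unsigned char *& buf_ptr) const
{
std::memcpy(buf_ptr, &age, sizeof(unsigned int));
buf_ptr += sizeof(unsigned int);
// Zero the whole field, then copy at most max_name_length - 1 characters
// so that the last byte stays '\0'.
std::fill(buf_ptr, buf_ptr + max_name_length, static_cast<unsigned char>(0));
std::strncpy(reinterpret_cast<char *>(buf_ptr), name.c_str(), max_name_length - 1);
buf_ptr += max_name_length;
}
};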
int main(void)
{
Pet dog;
dog.name = "Fido";
dog.age = 5;
ofstream data_file("pet_data.bin", std::ios::binary);
// Determine size of buffer
size_t buffer_size = dog.size_on_stream();
// Allocate the buffer
unsigned char * buffer = new unsigned char [buffer_size];
unsigned char * buf_ptr = buffer;
// Write / store the object into the buffer.
dog.store_to_buffer(buf_ptr);
// Write the buffer to the file / stream.
data_file.write((char *) buffer, buffer_size);
data_file.close();
delete [] buffer;
return 0;
}
Edit 2: A class with a vector of strings
class Many_Strings
: public binary_stream_interface
{
enum {MAX_STRING_SIZE = 32};
size_t size_on_stream(void) const
{
return m_string_container.size() * MAX_STRING_SIZE // Total size of strings.
+ sizeof(size_t); // with room for the quantity variable.
}
void store_to_buffer(unsigned char *& buf_ptr) const
{
// Treat the vector<string> as a variable length field.
// Store the quantity of strings into the buffer,
// followed by the content.
size_t string_quantity = m_string_container.size();
*((size_t *) buf_ptr) = string_quantity;
buf_ptr += sizeof(size_t);
for (size_t i = 0; i < string_quantity; ++i)
{
// Each string is a fixed length field.
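// Pad with '\0' first, then copy at most MAX_STRING_SIZE - 1 characters
// so that the field always ends with '\0'.
std::fill(buf_ptr, buf_ptr + MAX_STRING_SIZE, static_cast<unsigned char>(0));
strncpy(reinterpret_cast<char *>(buf_ptr), m_string_container[i].c_str(), MAX_STRING_SIZE - 1);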
buf_ptr += MAX_STRING_SIZE;
}
}
void load_from_buffer(const unsigned char *& buf_ptr)
{
// The actual coding is left as an exercise for the reader.
// Pseudo code:
// Clear / empty the string container.
// load the quantity variable.
// increment the buffer variable by the size of the quantity variable.
// for each new string (up to the quantity just read)
// load a temporary string from the buffer via buffer pointer.
// push the temporary string into the vector
// increment the buffer pointer by the MAX_STRING_SIZE.
// end-for
}
std::vector<std::string> m_string_container;
};
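The pseudocode for load_from_buffer can be fleshed out along the following lines. This is only a sketch under the same assumptions as the class above (a size_t count followed by fixed-length, zero-padded fields); it uses hypothetical free functions instead of the member functions so that it stands alone, and it truncates strings longer than MAX_STRING_SIZE - 1 characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

const std::size_t MAX_STRING_SIZE = 32;

// Bytes needed: the quantity variable plus one fixed-length field per string.
std::size_t size_on_stream(const std::vector<std::string>& strings)
{
    return sizeof(std::size_t) + strings.size() * MAX_STRING_SIZE;
}

void store_to_buffer(const std::vector<std::string>& strings, unsigned char*& buf_ptr)
{
    const std::size_t quantity = strings.size();
    std::memcpy(buf_ptr, &quantity, sizeof(quantity));
    buf_ptr += sizeof(quantity);
    for (std::size_t i = 0; i < quantity; ++i) {
        // Zero the field, then copy the text; the last byte always stays '\0'.
        std::fill(buf_ptr, buf_ptr + MAX_STRING_SIZE, static_cast<unsigned char>(0));
        const std::size_t n = std::min<std::size_t>(strings[i].size(), MAX_STRING_SIZE - 1);
        std::memcpy(buf_ptr, strings[i].data(), n);
        buf_ptr += MAX_STRING_SIZE;
    }
}

void load_from_buffer(std::vector<std::string>& strings, const unsigned char*& buf_ptr)
{
    strings.clear();
    // Load the quantity variable and step over it.
    std::size_t quantity = 0;
    std::memcpy(&quantity, buf_ptr, sizeof(quantity));
    buf_ptr += sizeof(quantity);
    for (std::size_t i = 0; i < quantity; ++i) {
        // Each field is NUL terminated, so the string ends at the first '\0'.
        strings.push_back(std::string(reinterpret_cast<const char*>(buf_ptr)));
        buf_ptr += MAX_STRING_SIZE;
    }
}
```

Because the quantity is stored as a raw size_t, a file written this way is only readable by a program with the same size_t width and byte order.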
I'd suggest reading the C++ FAQ on serialization; you can then choose what best fits your needs.
When you're working with structures and classes, you have to take care of two things:
Pointers inside the class
Padding bytes
Both of these can produce nasty results in your output. IMO, the object itself should implement its serialization and de-serialization. The object knows its own structure, pointer data, etc., so it can decide which format can be implemented efficiently.
You will have to iterate anyway, or wrap the iteration somewhere, when implementing the serialization and de-serialization functions (which you can write either as operators or as ordinary functions). Especially when you're working with stream objects, overloading the << and >> operators makes it easy to pass the object around.
Regarding your question about using the underlying pointer of the vector: it might work if it's a single vector, but it's not a good idea otherwise.
Update according to the question update.
There are a few things you should keep in mind before inheriting from STL containers. They're not really good candidates for inheritance because they don't have virtual destructors. If you're using basic data types and POD-like structures, it won't cause much trouble. But if you use them in a truly object-oriented way, you may face some unpleasant behavior.
Regarding your code
Why are you casting it to char*?
The way you serialize the object is your choice. IMO, what you did is a basic file write operation in the name of serialization.
Serialization is down to the object, i.e. the parameter 'T' in your template class. If you're using POD or basic types, no special handling is needed. Otherwise you have to choose carefully how to write the object.
Choosing a text format or a binary format is your choice. A text format always has a cost, but it is easier to inspect and manipulate than a binary format.
For example, the following code does a simple write and read in text format.
fstream fr("test.txt", ios_base::out | ios_base::binary );
for( int i =0;i <_countof(arr);i++)
fr << arr[i] << ' ';
fr.close();
fstream fw("test.txt", ios_base::in| ios_base::binary);
int j = 0;
// Stop when the array is full or the next extraction fails.
while( j < _countof(arrout) && (fw >> arrout[j]) )
{
j++;
}
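On the text versus binary trade-off, and the observation in the question that the binary files came out larger: one possible explanation, assuming the stored values are mostly small numbers, is that a short decimal string plus a separator can take fewer bytes than sizeof(T). The two hypothetical helpers below just compute both sizes for a vector of int so the effect can be checked.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Bytes needed when each value is written as decimal text followed by one space.
std::size_t text_size(const std::vector<int>& values)
{
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i)
        out << values[i] << ' ';
    return out.str().size();
}

// Bytes needed when each value is written as its raw object representation.
std::size_t binary_size(const std::vector<int>& values)
{
    return values.size() * sizeof(int);
}
```

Single-digit values cost two bytes each as text but sizeof(int) bytes each in binary, while large values tip the balance the other way.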
It seems to me that the most direct route to generating a binary file containing a vector is to memory-map the file and place the vector in the mapped region. As pointed out by sarat, you need to worry about how pointers are used within the class. The Boost.Interprocess library has a tutorial on how to do this using its shared memory regions, which include memory-mapped files.
First off, have you looked at Boost.multi_array? Always good to take something ready-made rather than reinventing the wheel.
That said, I'm not sure if this is helpful, but here's how I would implement the basic data structure, and it'd be fairly easy to serialize:
#include <array>
template <typename T, size_t DIM1, size_t DIM2, size_t DIM3>
class ThreeDArray
{
typedef std::array<T, DIM1 * DIM2 * DIM3> array_t;
array_t m_data;
public:
inline size_t size() const { return m_data.size(); }
inline size_t byte_size() const { return sizeof(T) * m_data.size(); }
inline T & operator()(size_t i, size_t j, size_t k)
{
return m_data[i + j * DIM1 + k * DIM1 * DIM2];
}
inline const T & operator()(size_t i, size_t j, size_t k) const
{
return m_data[i + j * DIM1 + k * DIM1 * DIM2];
}
inline const T * data() const { return m_data.data(); }
};
You can serialize the data buffer directly:
ThreeDArray<int, 4, 6, 11> arr;
/* ... */
std::ofstream outfile("file.bin", std::ios::binary);
outfile.write(reinterpret_cast<const char*>(arr.data()), arr.byte_size());

Dumping the memory contents of a object

In a game that I mod for, they recently made some changes which broke a specific entity. After speaking with someone who figured out a fix for it, the only information they gave me was that they "patched it", and they wouldn't share any more.
I am basically trying to remember how to dump the memory contents of a class object at runtime. I vaguely remember doing something similar before, but it has been a very long time. Any help on remembering how to go about this would be most appreciated.
template <class T>
void dumpobject(T const *t) {
unsigned char const *p = reinterpret_cast<unsigned char const *>(t);
for (size_t n = 0 ; n < sizeof(T) ; ++n)
printf("%02x ", p[n]); // two hex digits per byte
printf("\n");
}
Well, you may reinterpret_cast your object instance as a char array and display that.
Foo foo; // Your object
// Here comes the ugly cast
const unsigned char* a = reinterpret_cast<const unsigned char*>(&foo);
for (size_t i = 0; i < sizeof(foo); ++i)
{
using namespace std;
cout << hex << setfill('0') << setw(2) << static_cast<unsigned int>(a[i]) << " ";
}
This is ugly but should work.
Anyway, dealing with the internals of some implementation is usually not a good idea.
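For comparing dumps it can be handy to get the bytes as a string instead of printing them directly. A small sketch along the lines of the two answers above (hexDump is a hypothetical name); keep in mind that padding bytes hold unspecified values and that pointer members show up as addresses, not as the data they point to.

```cpp
#include <cstddef>
#include <string>

// Returns the bytes of `obj` as two-digit hexadecimal values separated by spaces.
template <class T>
std::string hexDump(const T& obj)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::string out;
    for (std::size_t n = 0; n < sizeof(T); ++n) {
        if (n != 0)
            out += ' ';
        out += digits[p[n] >> 4];    // high nibble
        out += digits[p[n] & 0x0F];  // low nibble
    }
    return out;
}
```

Dumping the object before and after the broken behaviour and comparing the two strings is one way to spot which bytes changed.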