Parsing protocol buffers in C++

I wanted to write some protocol buffers to a socket and read them back on the client side. Things didn't work, so I wrote some decoding code in the server itself, right after the encoding. Can you please take a look at the code below and tell me what I am doing wrong?
(I had to use ArrayOutputStream and CodedOutputStream so that I could write a delimiter.)
int bytes_written = tData.ByteSize() + sizeof(google::protobuf::uint32);
google::protobuf::uint8 buffer[bytes_written];
memset(buffer, '\0', bytes_written);
google::protobuf::io::ArrayOutputStream aos(buffer,bytes_written);
google::protobuf::io::CodedOutputStream *coded_output = new google::protobuf::io::CodedOutputStream(&aos);
google::protobuf::uint32 size_ = tData.ByteSize();
coded_output->WriteVarint32(size_);
tData.SerializeToCodedStream(coded_output);
int sent_bytes = 0;
std::cout << buffer << std::endl;
if ( (sent_bytes = send(liveConnections.at(i), buffer, bytes_written, MSG_NOSIGNAL)) == -1 )
liveConnections.erase(liveConnections.begin() + i);
else
std::cout << "sent " << sent_bytes << " bytes to " << i << std::endl;
delete coded_output;
////////////////
google::protobuf::uint8 __buffer[sizeof(google::protobuf::uint32)];
memset(__buffer, '\0', sizeof(google::protobuf::uint32));
memcpy (__buffer, buffer, sizeof(google::protobuf::uint32));
google::protobuf::uint32 __size = 0;
google::protobuf::io::ArrayInputStream ais(__buffer,sizeof(google::protobuf::uint32));
google::protobuf::io::CodedInputStream coded_input(&ais);
coded_input.ReadVarint32(&__size);
std::cout <<" size of payload is "<<__size << std::endl;
google::protobuf::uint8 databuffer[__size];
memset(databuffer, '\0', __size);
memcpy (databuffer, buffer+sizeof(google::protobuf::uint32), __size);
std::cout << "databuffs " << "size " << __size << " "<< databuffer << std::endl;
google::protobuf::io::ArrayInputStream array_input(databuffer,__size);
google::protobuf::io::CodedInputStream _coded_input(&array_input);
data_model::terminal_data* tData = new data_model::terminal_data();
if (!tData->ParseFromCodedStream(&_coded_input))
{
std::cout << "data could not be parsed" << std::endl;
}
else
{
std::cout <<" SYMBOL --" << tData->symbol_name() << std::endl;
}
delete tData;
Output of the program:
size of payload is 55
databuffs size 55 C109"056* BANKNIFTY0���20140915#�J 145406340
data could not be parsed
C109"056* BANKNIFTY0���20140915#�J 145406340

WriteVarint32 doesn't necessarily write 4 bytes, and ReadVarint32 doesn't read 4 bytes. "Var" stands for "variable", as in "variable length encoding".
When encoding, you write the size (which can take as little as one byte) immediately followed by the proto. When decoding, you read the size but then skip ahead by a full four bytes before reading the proto. So you are parsing starting from the wrong offset.
Use CurrentPosition() after ReadVarint32 to figure out how many bytes the size indicator consumed. Advance by that number of bytes.

Related

Inconsistent results while reading a file in binary in C++

As shown in the first picture, if I read 2 bytes at offset 254786 and print them in hexadecimal, I should get 0xffd9, and I do get exactly that value if I seek directly to 254786. However, if I set the offset to something far before 254786 and run the while loop shown in the second picture, I do not get 0xffd9. I really don't know where I could possibly be going wrong here.
std::ifstream myfile ("test-01.jpg");
if(!myfile) throw std::runtime_error("unable to open input file");
myfile.seekg(254786,myfile.beg);
std::string buf {};
buf.resize(2);
myfile.read(&buf[0], 2);
std::cout << std::hex << std::showbase << big_endian_2_bytes_to_int(buf);
0xffd9
int offset = 17000;
std::cout << offset << std::endl;
myfile.seekg(offset,myfile.beg);
myfile.read(&buffer[0],2);
while ( big_endian_2_bytes_to_int(buffer) != 0xffd9){
offset++;
myfile.seekg(offset,myfile.beg);
myfile.read(&buffer[0],2);
if (offset == 254786){
std::cout << offset <<std::endl;
std::cout << std::hex << std::showbase << big_endian_2_bytes_to_int(buffer) << std::endl;
return 0;
}
}

C++ Reading back "incorrect" values from binary file?

The project I'm working on has a custom file format consisting of a header of a few different variables, followed by the pixel data. My colleagues have developed a GUI where processing, writing, reading and displaying this file format all work fine.
But my problem is that, while I helped write the code that writes this data to disk, I cannot read this kind of file back myself and get satisfactory values. I am able to read the first variable back (a char array) but not the following value(s).
So the file format matches the following structure:
typedef struct {
char hxtLabel[8];
u64 hxtVersion;
int motorPositions[9];
int filePrefixLength;
char filePrefix[100];
..
} HxtBuffer;
In the code, I create an object of the above structure and then set these example values:
setLabel("MY_LABEL");
setFormatVersion(3);
setMotorPosition( 2109, 5438, 8767, 1234, 1022, 1033, 1044, 1055, 1066);
setFilePrefixLength(7);
setFilePrefix( string("prefix_"));
setDataTimeStamp( string("000000_000000"));
My code for opening the file:
// Open data file, binary mode, reading
ifstream datFile(aFileName.c_str(), ios::in | ios::binary);
if (!datFile.is_open()) {
cout << "readFile() ERROR: Failed to open file " << aFileName << endl;
return false;
}
// How large is the file?
datFile.seekg(0, datFile.end);
int length = datFile.tellg();
datFile.seekg(0, datFile.beg);
cout << "readFile() file " << setw(70) << aFileName << " is: " << setw(15) << length << " long\n";
// Allocate memory for buffer:
char * buffer = new char[length];
// Read data as one block:
datFile.read(buffer, length);
datFile.close();
/// Looking at the start of the buffer, I should be seeing "MY_LABEL"?
cout << "buffer: " << buffer << " " << *(buffer) << endl;
int* mSSX = reinterpret_cast<int*>(*(buffer+8));
int* mSSY = reinterpret_cast<int*>(&buffer+9);
int* mSSZ = reinterpret_cast<int*>(&buffer+10);
int* mSSROT = reinterpret_cast<int*>(&buffer+11);
int* mTimer = reinterpret_cast<int*>(&buffer+12);
int* mGALX = reinterpret_cast<int*>(&buffer+13);
int* mGALY = reinterpret_cast<int*>(&buffer+14);
int* mGALZ = reinterpret_cast<int*>(&buffer+15);
int* mGALROT = reinterpret_cast<int*>(&buffer+16);
int* filePrefixLength = reinterpret_cast<int*>(&buffer+17);
std::string filePrefix; std::string dataTimeStamp;
// Read file prefix character by character into stringstream object
std::stringstream ss;
char* cPointer = (char *)(buffer+18);
int k;
for(k = 0; k < *filePrefixLength; k++)
{
//read string
char c;
c = *cPointer;
ss << c;
cPointer++;
}
filePrefix = ss.str();
// Read timestamp character by character into stringstream object
std::stringstream timeStampStream;
/// Need not increment cPointer, already pointing # 1st char of timeStamp
for (int l= 0; l < 13; l++)
{
char c;
c = * cPointer;
timeStampStream << c;
}
dataTimeStamp = timeStampStream.str();
cout << 25 << endl;
cout << " mSSX: " << mSSX << " mSSY: " << mSSY << " mSSZ: " << mSSZ;
cout << " mSSROT: " << mSSROT << " mTimer: " << mTimer << " mGALX: " << mGALX;
cout << " mGALY: " << mGALY << " mGALZ: " << mGALZ << " mGALROT: " << mGALROT;
Finally, what I see is below. I added the 25 just to double-check that not everything was coming out in hexadecimal. As you can see, I am able to see the label "MY_LABEL" as expected. But the 9 motorPositions all come out looking suspiciously like addresses, not values. The file prefix and the data timestamp (which should be strings, or at least characters) are just empty.
buffer: MY_LABEL M
25
mSSX: 0000000000000003 mSSY: 00000000001BF618 mSSZ: 00000000001BF620 mSSROT: 00000000001BF628 mTimer: 00000000001BF630 mGALX: 00000000001BF638 mGALY: 00000000001BF640 mGALZ: 00000000001BF648 mGALROT: 00000000001BF650filePrefix: dataTimeStamp:
I'm sure the solution can't be too complicated, but I've reached a stage where I'm just spinning and cannot make sense of things.
Many thanks for reading this somewhat long post.
-- Edit--
I might hit the maximum length allowed for a post, but just in case, I thought I should post the code that generates the data I'm trying to read back:
bool writePixelOutput(string aOutputPixelFileName) {
// Write pixel histograms out to binary file
ofstream pixelFile;
pixelFile.open(aOutputPixelFileName.c_str(), ios::binary | ios::out | ios::trunc);
if (!pixelFile.is_open()) {
LOG(gLogConfig, logERROR) << "Failed to open output file " << aOutputPixelFileName;
return false;
}
// Write binary file header
string label("MY_LABEL");
pixelFile.write(label.c_str(), label.length());
pixelFile.write((const char*)&mFormatVersion, sizeof(u64));
// Include File Prefix/Motor Positions/Data Time Stamp - if format version > 1
if (mFormatVersion > 1)
{
pixelFile.write((const char*)&mSSX, sizeof(mSSX));
pixelFile.write((const char*)&mSSY, sizeof(mSSY));
pixelFile.write((const char*)&mSSZ, sizeof(mSSZ));
pixelFile.write((const char*)&mSSROT, sizeof(mSSROT));
pixelFile.write((const char*)&mTimer, sizeof(mTimer));
pixelFile.write((const char*)&mGALX, sizeof(mGALX));
pixelFile.write((const char*)&mGALY, sizeof(mGALY));
pixelFile.write((const char*)&mGALZ, sizeof(mGALZ));
pixelFile.write((const char*)&mGALROT, sizeof(mGALROT));
// Determine length of mFilePrefix string
int filePrefixSize = (int)mFilePrefix.size();
// Write prefix length, followed by prefix itself
pixelFile.write((const char*)&filePrefixSize, sizeof(filePrefixSize));
size_t prefixLen = 0;
if (mFormatVersion == 2) prefixLen = mFilePrefix.size();
else prefixLen = 100;
pixelFile.write(mFilePrefix.c_str(), prefixLen);
pixelFile.write(mDataTimeStamp.c_str(), mDataTimeStamp.size());
}
// Continue writing header information that is common to both format versions
pixelFile.write((const char*)&mRows, sizeof(mRows));
pixelFile.write((const char*)&mCols, sizeof(mCols));
pixelFile.write((const char*)&mHistoBins, sizeof(mHistoBins));
// Write the actual data - taken out for brevity's sake
// ..
pixelFile.close();
LOG(gLogConfig, logINFO) << "Written output histogram binary file " << aOutputPixelFileName;
return true;
}
-- Edit 2 (11:32 09/12/2015) --
Thank you for all the help, I'm closer to solving the issue now. Going with the answer from muelleth, I try:
/// Read into char buffer
char * buffer = new char[length];
datFile.read(buffer, length);// length determined by ifstream.seekg()
/// Let's try HxtBuffer
HxtBuffer *input = new HxtBuffer;
cout << "sizeof HxtBuffer: " << sizeof *input << endl;
memcpy(input, buffer, length);
I can then display the different struct variables:
qDebug() << "Slice BUFFER label " << QString::fromStdString(input->hxtLabel);
qDebug() << "Slice BUFFER version " << QString::number(input->hxtVersion);
qDebug() << "Slice BUFFER hxtPrefixLength " << QString::number(input->filePrefixLength);
for (int i = 0; i < 9; i++)
{
qDebug() << i << QString::number(input->motorPositions[i]);
}
qDebug() << "Slice BUFFER filePrefix " << QString::fromStdString(input->filePrefix);
qDebug() << "Slice BUFFER dataTimeStamp " << QString::fromStdString(input->dataTimeStamp);
qDebug() << "Slice BUFFER nRows " << QString::number(input->nRows);
qDebug() << "Slice BUFFER nCols " << QString::number(input->nCols);
qDebug() << "Slice BUFFER nBins " << QString::number(input->nBins);
The output is then mostly as expected:
Slice BUFFER label "MY_LABEL"
Slice BUFFER version "3"
Slice BUFFER hxtPrefixLength "2"
0 "2109"
1 "5438"
...
7 "1055"
8 "1066"
Slice BUFFER filePrefix "-1"
Slice BUFFER dataTimeStamp "000000_000000P"
Slice BUFFER nRows "20480"
Slice BUFFER nCols "256000"
Slice BUFFER nBins "0"
EXCEPT dataTimeStamp, which is 13 chars long but displays 14 chars instead. The 3 variables that follow (nRows, nCols and nBins) are then incorrect. (They should be nRows=80, nCols=80, nBins=1000.) My guess is that the bits belonging to the 14th char of dataTimeStamp are being read along with nRows, and the misalignment cascades on to produce the incorrect nCols and nBins.
I have separately verified (not shown here) using qDebug that what I'm writing into the file, really are the values I expect, and their individual sizes.
I personally would try to read exactly the number of bytes your struct occupies from the file, i.e. something like
int length = sizeof(HxtBuffer);
and then simply use memcpy to assign a local structure from the read buffer:
HxtBuffer input;
memcpy(&input, buffer, length);
You can then access your data e.g. like:
std::cout << "Data: " << input.hxtLabel << std::endl;
Why do you read into a buffer, instead of reading directly into the structure?
HxtBuffer data;
datFile.read(reinterpret_cast<char *>(&data), sizeof data);
if (!datFile || datFile.gcount() != sizeof data)
    throw io_exception();
// Can use data.
If you want to read into a character buffer, then your way of getting the data is just wrong. You probably want to do something like this.
char *buf_offset=buffer+8+sizeof(u64); // Skip label (8 chars) and version (int64)
int mSSX = *reinterpret_cast<int*>(buf_offset);
buf_offset+=sizeof(int);
int mSSY = *reinterpret_cast<int*>(buf_offset);
buf_offset+=sizeof(int);
int mSSZ = *reinterpret_cast<int*>(buf_offset);
/* etc. */
Or, a little better (provided you don't change the contents of the buffer).
int *ptr_motors=reinterpret_cast<int *>(buffer+8+sizeof(u64));
int &mSSX = ptr_motors[0];
int &mSSY = ptr_motors[1];
int &mSSZ = ptr_motors[2];
/* etc. */
Notice that I don't declare mSSX, mSSY etc. as pointers. Your code was printing them as addresses because you told the compiler that they were addresses (pointers).

Server not receiving trailing longs in c++ socket

I am transferring a struct over a socket in C++. I read some earlier questions on sending structs, and one suggested approach was to transfer the struct as a char* after a cast. Since both server and client are on the same machine, there are no endianness issues here.
A couple of questions. I get the size of the struct as 48; by my calculation shouldn't it be 43 (8x4 + 10 + 1)?
Secondly, on the server side, when I print the received buffer I only get the text elements. The long integers are not received.
struct testStruct{
char type;
char field1[10];
char field2[8];
char field3[8];
long num1, num2;
};
testStruct ls;
ls.type = 'U';
strcpy(ls.field1, "NAVEENSHAR");
strcpy(ls.field2, "abcd1234");
strcpy(ls.field3, "al345678");
ls.num1 = 40;
ls.num2 = 200;
char* bytes = static_cast<char*>(static_cast<void*>(&ls));
bytes_sent = send(socketfd, bytes, sizeof(ls), 0);
cout << "bytes sent: " << bytes_sent<< "\n";
// On the server side
char incomming_data_buffer[1000];
bytes_recieved = recv(new_sd, incomming_data_buffer,1000, 0);
cout << "|" << incomming_data_buffer << "|\n";
It shows 48 bytes received, but none of the trailing integers I added.
Any idea why this could be happening? I have read about sending structs using Boost serialization, but that overhead is huge for simple structs.
You are almost certainly receiving all the data. The problem is with this line:
cout << "|" << incomming_data_buffer << "|\n";
which prints incomming_data_buffer as a C-style string, so it stops at the first zero byte. Since your long values are encoded in binary form, there will be zero bytes at least there (there may also be zeros in the padding between fields).
You could try doing something like:
cout << "|";
for (int i = 0; i < bytes_recieved; i++)
{
cout << hex << (((int)incomming_data_buffer[i]) & 0xff) << " ";
}
cout << "|\n";
to show all bytes of the package you received.

Issue with GPB SerializeTo functions

I have the below code.
int main()
{
test::RouteMessage *Rtmesg = new test::RouteMessage;
test::RouteV4Prefix *prefix = new test::RouteV4Prefix;
test::RouteMessage testRtmesg;
prefix->set_family(test::RouteV4Prefix::RT_AFI_V4);
prefix->set_prefix_len(24);
prefix->set_prefix(1000);
Rtmesg->set_routetype(test::RouteMessage::RT_TYPE_BGP);
Rtmesg->set_allocated_v4prefix(prefix);
Rtmesg->set_flags(test::RouteMessage::RT_FLGS_NONE);
Rtmesg->set_routeevnt(test::RouteMessage::BGP_EVNT_V4_RT_ADD);
Rtmesg->set_nexthop(100);
Rtmesg->set_ifindex(200); Rtmesg->set_metric(99);
Rtmesg->set_pref(1);
int size = Rtmesg->ByteSize();
char *rt_msg = (char *)malloc(size);
google::protobuf::io::ArrayOutputStream oarr(rt_msg, size);
google::protobuf::io::CodedOutputStream output(&oarr);
Rtmesg->SerializeToCodedStream(&output);
// Below code is just to see if everything is fine.
google::protobuf::io::ArrayInputStream iarr(rt_msg, size);
google::protobuf::io::CodedInputStream Input(&iarr);
testRtmesg.ParseFromCodedStream(&Input);
test::RouteV4Prefix test_v4Prefix = testRtmesg.v4prefix();
cout << std::endl;
std::cout << "Family " << test_v4Prefix.family() << std::endl;
std::cout << "Prefix " << test_v4Prefix.prefix()<< std::endl;
std::cout << "PrefixLen " << test_v4Prefix.prefix_len() << std::endl;
// All the above outputs are fine.
cout << std::endl;
cout << rt_msg; // <------------ This prints absolute junk.
cout << std::endl;
amqp_bytes_t str2;
str2 = amqp_cstring_bytes(rt_msg); // <----- This just crashes.
printf("\n str2=%s %d", str2.bytes, str2.len);
}
Any operation on the above rt_msg just crashes. I want to use this buffer to send to a socket and to RabbitMQ publish APIs.
Is there anybody out there who has had a similar issue or worked on similar code?
Protocol Buffers is a binary serialization format, not text. This means:
Yes, if you write the binary data to cout, it will look like junk (or crash).
The data is not NUL-terminated like C strings. Therefore, you cannot pass it into a function like amqp_cstring_bytes which expects a NUL-terminated char* -- it may cut the data short at the first 0 byte, or it may search for a 0 byte past the end of the buffer and crash. In general, any function that takes a char* but does not also take a length won't work.
I'm not familiar with AMQP, but it looks like the function you are trying to call, amqp_cstring_bytes, just builds an amqp_bytes_t, which is defined as follows:
typedef struct amqp_bytes_t_ {
size_t len;
void *bytes;
} amqp_bytes_t;
So, all you have to do is something like:
amqp_bytes_t str2;
str2.bytes = rt_msg;
str2.len = size;

C++ Websocket recv 2 bytes

Hello. I'm working on a C++ WebSocket library. Everything was OK until a strange problem appeared.
int n = 0, n_add = 0;
char *buf = (char*)malloc(BUFLEN);
char new_buffer[4096];
while ((n = recv(client_id, buf, BUFLEN, 0)) > 0) {
strcat(new_buffer, buf);
int new_buffer_length = strlen(new_buffer);
int buf_length = strlen(buf);
n_add+= n;
// debug
cout << "buf: '" << buf << "'" << endl;
cout << "new_buffer_length: '" << new_buffer_length << "'" << endl;
cout << "buf_length: '" << buf_length << "'" << endl;
cout << "n: '" << n << "'" << endl;
cout << "n_add: '" << n_add << "'" << endl;
memset(buf, '\0', BUFLEN);
if (n_add == new_buffer_length && n < BUFLEN) {
cout << "new_buffer: '" << new_buffer << "'" << endl;
// if client is already connected
if (ws_clients[client_id][2] == WS_READY_STATE_OPEN) {
this->ws_client_message(client_id, new_buffer, new_buffer_length);
}
// if client needs a handshake
if (ws_clients[client_id][2] == WS_READY_STATE_CONNECTING) {
this->ws_client_handshake(client_id, new_buffer);
}
memset(&new_buffer, '\0', 4096);
n_add = 0;
FD_ZERO(&this->tmp_fds);
}
}
The handshake works perfectly, as well as any payload shorter than 126 bytes. When that length is exceeded I get this:
buf: 'þ'
new_buffer_length: '2'
buf_length: '2'
n: '128'
n_add: '128'
buf: '½HÆA¯J'
new_buffer_length: '8'
buf_length: '6'
n: '6'
n_add: '134'
n says I received 128 bytes but it's actually only 2; the second time it gives me 6 bytes, and those are OK. If I change my BUFLEN, which is set to 128, down to 2, I get everything OK except for the last loop: n_add never reaches 134, the actual payload length.
OK, so if anybody has any idea: I'm using http://www.websocket.org/echo.html for testing, and I've tried everything. Please give me some hints.
The data framing section of RFC6455 shows that websocket messages are not plain text and are not null-terminated strings. You can't use C string handling functions like strlen or strcat on messages you read.
If you want to keep using C strings, use n to determine the number of bytes you have read and strncpy to add that to a buffer. Since you're writing a C++ server though, you'll probably find things easier if you switch to using std::string instead.
The difference in handling messages around 128 chars probably comes from the fact that the number of bytes used to indicate the length of the message increases for payloads longer than 125 bytes. For messages between 126 and 255 bytes long, there will be a zero byte in the extended payload length. Your code will (incorrectly) interpret this as the terminator of a C string.
A char* is not necessarily a C string; it has to be properly terminated.
while ((n = recv(client_id, buf, BUFLEN, 0)) > 0) {
strcat(new_buffer, buf);
....
This spells trouble: whatever is received in buf may not be '\0'-terminated, so when you call strcat, it won't work.
Check the manuals of the strcat and strncat functions, and make sure you use these functions correctly.