Binary through http - c++

I'm using C++ to send a POST request with binary data. The code looks like this:
int binary[4] = { 1, 2, 3, 4 };
std::stringstream out;
out << "POST /address HTTP/1.1\r\n";
out << "Host: localhost\r\n";
out << "Connection: Keep-Alive\r\n";
out << "Content-Type: application/octet-stream\r\n";
out << "Content-Transfer-Encoding: binary\r\n";
out << "Content-Length: " << 4*sizeof(int) << "\r\n\r\n"; // 4 elements of integer type
And sending data into opened connection in socket:
std::string headers = out.str();
socket.send(headers.c_str(), headers.size()); // Send headers first
socket.send(reinterpret_cast<const char*>(binary), sizeof(binary)); // And the array of numbers (16 bytes, matching Content-Length)
But I was told that sending pure bytes through the HTTP protocol is wrong. Is that right? For example, I can't send 0 (zero), because it's used by the protocol.
If that's right (I can't handle that POST request and get back the data I sent), what could I use instead? Maybe convert the array to hex or base64url?
Thanks.

The problem the people who say it's wrong are pointing at is endianness. You can of course transfer binary data over HTTP, but the receiving end must be able to interpret it correctly. Suppose your machine is little-endian; your 32-bit integers will be stored in memory as
01 00 00 00
02 00 00 00
03 00 00 00
04 00 00 00
and you send these 16 bytes as they are. Now suppose the receiving machine reads the data naively, disregarding who sent it and how, and suppose that machine is big-endian; on such a machine, the memory layout for the integers 1, 2, 3, 4 would be
00 00 00 01
00 00 00 02
00 00 00 03
00 00 00 04
This means that, for the receiving machine, the first integer is 0x01000000, not 0x00000001 as the sender intended.
If you decide that your integers must always be sent as big-endian, then a little-endian sender needs to rearrange the bytes of each integer before sending. Functions such as hton* (host to network) convert host 32/16-bit integers to network byte order, which is big-endian (and vice versa with ntoh*, network to host).
Note that the data are not scrambled; they are sent as they are, so to speak. What changes is how you store them in memory and how you interpret them when reading. Usually this is not an issue, since data are sent according to a format that, if needed, specifies the endianness of multi-byte values (e.g. see the PNG format spec, sec. 2.1, integers byte order: PNG uses network byte order, i.e. big-endian).
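A portable sketch of the sender-side conversion, written with shifts so it behaves the same on little- and big-endian hosts (htonl from <arpa/inet.h> or <winsock2.h> does the same for a single 32-bit value). to_network_order and from_network_order are hypothetical names; the sender would pass the result to socket.send and use its size as the Content-Length:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Serialize 32-bit integers in network byte order (big-endian):
// the most significant byte of each value is written first.
std::vector<unsigned char> to_network_order(const std::vector<std::int32_t>& values) {
    std::vector<unsigned char> out;
    out.reserve(values.size() * 4);
    for (std::int32_t v : values) {
        const auto u = static_cast<std::uint32_t>(v);
        out.push_back(static_cast<unsigned char>(u >> 24));
        out.push_back(static_cast<unsigned char>(u >> 16));
        out.push_back(static_cast<unsigned char>(u >> 8));
        out.push_back(static_cast<unsigned char>(u));
    }
    return out;
}

// Receiver side: rebuild each integer from 4 big-endian bytes,
// regardless of the receiving host's own byte order.
std::vector<std::int32_t> from_network_order(const std::vector<unsigned char>& bytes) {
    std::vector<std::int32_t> values;
    for (std::size_t i = 0; i + 4 <= bytes.size(); i += 4) {
        const std::uint32_t u = (std::uint32_t(bytes[i]) << 24) |
                                (std::uint32_t(bytes[i + 1]) << 16) |
                                (std::uint32_t(bytes[i + 2]) << 8) |
                                std::uint32_t(bytes[i + 3]);
        values.push_back(static_cast<std::int32_t>(u));
    }
    return values;
}
```

Because both sides agree on the wire order, neither needs to know the other's native endianness.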

But I was told that sending pure bytes through the HTTP protocol is wrong. Is that right?
No, it is fine in the body, depending on the Content-Type of course. application/octet-stream is fine in this regard, and yes, it can contain zero bytes.
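Zero bytes only cause trouble when code treats the body as a C string (strlen, or a std::string built from a bare char pointer). A sketch of building the whole request with explicit lengths; build_post is a hypothetical helper, and the result should be sent with req.data() and req.size():

```cpp
#include <cstddef>
#include <string>

// Build a complete POST request whose body is raw bytes. Every append uses an
// explicit length, so '\0' bytes in the body are copied like any other byte.
std::string build_post(const std::string& host, const std::string& path,
                       const unsigned char* body, std::size_t body_len) {
    std::string req = "POST " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Content-Type: application/octet-stream\r\n";
    req += "Content-Length: " + std::to_string(body_len) + "\r\n\r\n";
    // append(const char*, size_t) copies exactly body_len bytes, zeros included.
    req.append(reinterpret_cast<const char*>(body), body_len);
    return req;
}
```

The receiver should likewise read exactly Content-Length bytes after the blank line rather than stopping at the first zero.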

There is nothing wrong with sending binary data via HTTP.
This happens all the time with images and file uploads.

Related

Interpreting OleDb representation of the database numeric(20, 0) using native C++

Context: I am extending an old console application that queries an SQL Server database and stores the SELECT result in a DBF table. The application is written in native C++. It includes oledb.h and uses the documented ATL objects (atldbcli.h):
CCommand<CManualAccessor, CRowset> cmd;
CTable<CManualAccessor, CRowset> dstTable;
Then a common buffer is allocated and later shared by the source table and dstTable. Binding the columns to the appropriate parts of the buffer performs the conversion when copying to the destination table. So far, so good.
What I need: I need to implement an extension in which the source table content is interpreted. The values of a row from the source table should be used to build another SQL SELECT command. Strings are no problem. However, I need to get the value of a field that is defined as NUMERIC(20, 0) in the database. The column type really is DBTYPE_NUMERIC, colInfo.bPrecision shows 20, and colInfo.bScale is zero. The length of that part of the buffer is 19 bytes.
Buffer content: The buffer for the field shows the value
04 00 01 10 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00
I know that the exact value of the field should be 7440, which is 0x1d10. I can see it from the fourth byte onward (stored little-endian as 10 1d). For a quick-and-dirty hack I can get the value; however, I would like to understand the details so I can implement it properly for any NUMERIC(x, y)...
What do the first three bytes mean? Are there any ready-to-use functions in oledb.h or atldbcli.h to get the value?
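For what it's worth, the 19-byte slot lines up with the DB_NUMERIC structure declared in oledb.h: one byte each for precision, scale and sign (1 = positive, 0 = negative), then a 16-byte val array holding the unscaled integer in little-endian order. Read that way, the buffer above means precision 4, scale 0, positive, magnitude 0x1d10. A minimal sketch decoding it under that assumption; NumericBytes, numeric_from_buffer and unscaled_value are hypothetical names, and values needing more than 64 bits are rejected rather than handled:

```cpp
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical mirror of DB_NUMERIC from oledb.h, so the sketch compiles
// without Windows headers. Every member is a byte, so there is no padding.
struct NumericBytes {
    std::uint8_t precision;  // significant digits in this value
    std::uint8_t scale;      // digits to the right of the decimal point
    std::uint8_t sign;       // 1 = positive, 0 = negative
    std::uint8_t val[16];    // unsigned scaled integer, little-endian
};
static_assert(sizeof(NumericBytes) == 19, "expected the 19-byte DB_NUMERIC layout");

// Copy the 19 bytes of the bound buffer slot into the struct.
NumericBytes numeric_from_buffer(const unsigned char* p) {
    NumericBytes n;
    std::memcpy(&n, p, sizeof n);
    return n;
}

// Unscaled value (value * 10^scale) if the magnitude fits in int64_t;
// for NUMERIC(x, y) with y > 0 the caller still divides by 10^scale.
std::optional<std::int64_t> unscaled_value(const NumericBytes& n) {
    for (int i = 8; i < 16; ++i)
        if (n.val[i] != 0) return std::nullopt;  // needs more than 64 bits
    std::uint64_t mag = 0;
    for (int i = 7; i >= 0; --i)  // little-endian: walk from val[7] down to val[0]
        mag = (mag << 8) | n.val[i];
    if (mag > static_cast<std::uint64_t>(INT64_MAX)) return std::nullopt;
    const auto v = static_cast<std::int64_t>(mag);
    return n.sign == 1 ? v : -v;
}
```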

How to retrieve underlying block device IO error

Consider a device in the system, something under /dev/hdd[sg][nvme]xx
You open the device, get a file descriptor, and work with it (read(v)/write(v)/lseek, etc.). At some point you may get EIO. How do you retrieve the underlying error reported by the device driver?
EDIT001: if it is impossible using the unistd functions, maybe there are other ways to work with block devices that can provide lower-level information, like sg_scsi_sense_hdr?
You can't get any more error detail out of the POSIX functions. You're on the right track with the SCSI generic stuff, though. But, boy, it's loaded with hair. Check out the example in sg3_utils of how to do a SCSI READ(16). It lets you look at the sense data when it comes back:
https://github.com/hreinecke/sg3_utils/blob/master/examples/sg_simple16.c
Of course, this technique doesn't work with NVMe drives. (At least, not to my knowledge).
One concept I've played with in the past is to use the normal POSIX/libc block I/O functions like pread and pwrite until you get EIO. At that point, you can bring in the SCSI generic versions to try to figure out what happened. In the ideal case, a pread or lseek/read fails with EIO. You then re-issue it as an SG READ(10) or READ(16). If it's not just a transient failure, this may return sense data that your application can use.
Here's an example using the command-line sg_read program. I have an iSCSI-attached disk that I'm reading and writing. On the target, I remove its LUN mapping, and dd reports EIO:
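The re-issue step could look roughly like the following Linux-only sketch, which builds a READ(10) CDB (opcode 0x28, big-endian LBA in bytes 2-5, big-endian block count in bytes 7-8) and sends it through the SG_IO ioctl from <scsi/sg.h>. read10_cdb and sg_read10 are hypothetical helpers, the 20-second timeout is an arbitrary choice, and a real caller should also inspect io.status, io.host_status and io.driver_status rather than only the sense length. As a sanity check, read10_cdb(0, 128) yields 28 00 00 00 00 00 00 00 80 00, the same bytes sg_read prints as its read cdb in the example that follows.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <scsi/sg.h>    // sg_io_hdr_t, SG_IO, SG_DXFER_FROM_DEV (Linux only)
#include <sys/ioctl.h>

// Build a SCSI READ(10) CDB: opcode 0x28, 32-bit big-endian LBA in bytes 2-5,
// 16-bit big-endian transfer length (in blocks) in bytes 7-8.
std::array<unsigned char, 10> read10_cdb(std::uint32_t lba, std::uint16_t blocks) {
    std::array<unsigned char, 10> cdb{};
    cdb[0] = 0x28;
    cdb[2] = static_cast<unsigned char>(lba >> 24);
    cdb[3] = static_cast<unsigned char>(lba >> 16);
    cdb[4] = static_cast<unsigned char>(lba >> 8);
    cdb[5] = static_cast<unsigned char>(lba);
    cdb[7] = static_cast<unsigned char>(blocks >> 8);
    cdb[8] = static_cast<unsigned char>(blocks);
    return cdb;
}

// Re-issue a failed read through SG_IO so that, on a CHECK CONDITION, the
// device's sense data lands in `sense`. buf_len must be blocks * block size.
// Returns the number of sense bytes written (0 if none), or -1 if the ioctl
// itself failed.
int sg_read10(int fd, std::uint32_t lba, std::uint16_t blocks, void* buf,
              unsigned int buf_len, unsigned char (&sense)[32]) {
    auto cdb = read10_cdb(lba, blocks);
    sg_io_hdr_t io;
    std::memset(&io, 0, sizeof io);
    io.interface_id = 'S';
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.cmd_len = static_cast<unsigned char>(cdb.size());
    io.cmdp = cdb.data();
    io.dxfer_len = buf_len;
    io.dxferp = buf;
    io.mx_sb_len = sizeof sense;
    io.sbp = sense;
    io.timeout = 20000;  // milliseconds
    if (ioctl(fd, SG_IO, &io) < 0) return -1;
    return io.sb_len_wr;
}
```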
# dd if=/dev/sdb of=/tmp/output bs=512 count=1 iflag=direct
dd: error reading ‘/dev/sdb’: Input/output error
but sg_read reports some more useful information:
[root@localhost src]# sg_read blk_sgio=1 bs=512 cdbsz=10 count=512 if=/dev/sdb odir=1 verbose=10
Opened /dev/sdb for SG_IO with flags=0x4002
read cdb: 28 00 00 00 00 00 00 00 80 00
duration=9 ms
reading: SCSI status: Check Condition
Fixed format, current; Sense key: Illegal Request
Additional sense: Logical unit not supported
Raw sense data (in hex):
70 00 05 00 00 00 00 0a 00 00 00 00 25 00 00 00
00 00
sg_read: SCSI READ failed
Some error occurred, remaining block count=512
0+0 records in
You can see the Logical unit not supported additional sense code in the above output, indicating that there's no such LU at the target.
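Decoding the raw sense bytes yourself is mostly a matter of knowing the offsets. In fixed format (response code 0x70 or 0x71) the sense key is the low nibble of byte 2 and the ASC/ASCQ pair sits in bytes 12 and 13; in descriptor format (0x72/0x73) they are in bytes 1, 2 and 3. A small sketch with hypothetical names; applied to the raw sense data above, it gives sense key 0x05 (Illegal Request) and ASC/ASCQ 0x25/0x00, which sg_read rendered as "Logical unit not supported":

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

struct SenseInfo {
    std::uint8_t key;   // sense key, e.g. 0x05 = ILLEGAL REQUEST
    std::uint8_t asc;   // additional sense code
    std::uint8_t ascq;  // additional sense code qualifier
};

// Decode fixed-format (0x70/0x71) or descriptor-format (0x72/0x73) sense data.
std::optional<SenseInfo> parse_sense(const unsigned char* s, std::size_t len) {
    if (len < 1) return std::nullopt;
    const unsigned code = s[0] & 0x7F;  // bit 7 is the VALID flag
    if ((code == 0x70 || code == 0x71) && len >= 14)
        return SenseInfo{static_cast<std::uint8_t>(s[2] & 0x0F), s[12], s[13]};
    if ((code == 0x72 || code == 0x73) && len >= 4)
        return SenseInfo{static_cast<std::uint8_t>(s[1] & 0x0F), s[2], s[3]};
    return std::nullopt;  // unknown response code or too short
}
```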
Possible? Yes. But as you can see from the code in sg_simple16.c, it's not easy!

boost asio async_read() seems to be skipping some nulls

I'm going a bit crazy with a simple boost asio TCP conversation.
I have a server and a client. I use length-prefixed messages. The client sends "one" and the server responds with "two". This is what I see happen:
The client sends, and the server receives, 00 00 00 03 6F 6E 65 (length 3, then "one").
The server responds by sending 00 00 00 03 74 77 6F (length 3, then "two").
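For reference, the framing described above (a 4-byte big-endian length, then the body) can be sketched in plain C++ without Boost. frame and FrameDecoder are hypothetical names. The decoder appends whatever bytes arrive, in whatever chunks, to a buffer it owns and only hands out a message once the header and the full body are present; keeping that buffer alive across reads matters just as much with async_read, whose buffer must remain valid until the completion handler runs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Prefix a message with its length as a 4-byte big-endian integer.
std::string frame(const std::string& body) {
    const auto n = static_cast<std::uint32_t>(body.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    out += body;
    return out;
}

// Reassembles messages from arbitrarily chunked input.
class FrameDecoder {
public:
    void feed(const char* data, std::size_t len) { buf_.append(data, len); }

    // Copies the next complete message into msg; returns false if more bytes are needed.
    bool next(std::string& msg) {
        if (buf_.size() < 4) return false;
        std::uint32_t n = 0;
        for (std::size_t i = 0; i < 4; ++i)
            n = (n << 8) | static_cast<unsigned char>(buf_[i]);
        if (buf_.size() - 4 < n) return false;
        msg = buf_.substr(4, n);
        buf_.erase(0, 4 + static_cast<std::size_t>(n));
        return true;
    }

private:
    std::string buf_;  // owned here, so it outlives every individual read
};
```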
Now here is where it is very strange (code below). If the client reads four bytes, I expect it to get 00 00 00 03. If it reads seven, I expect to see 00 00 00 03 74 77 6F. (In fact, it will read four (the length header), then three (the body).)
But what I actually see is that, while if I read seven at once I do see 00 00 00 03 74 77 6F, if I only ask for four, I see 74 77 6F 03. This doesn't make any sense to me.
Here is the code I'm using to receive it (minus some print statements and such):
const int kTcpHeaderSize = 4;
const int kTcpMessageSize = 2048;
std::array<char, kTcpMessageSize + kTcpHeaderSize> receive_buffer_;
void TcpConnection::ReceiveHeader() {
boost::asio::async_read(
socket_, boost::asio::buffer(receive_buffer_, kTcpHeaderSize),
[this](boost::system::error_code error_code,
std::size_t received_length) {
if (error_code) {
LOG_WARNING << "Header read error: " << error_code;
socket_.close(); // TODO: Recover better.
return;
}
if (received_length != kTcpHeaderSize) {
LOG_ERROR << "Header length " << received_length
<< " != " << kTcpHeaderSize;
socket_.close(); // TODO: Recover better.
return;
}
uint32_t read_length_network;
memcpy(&read_length_network, receive_buffer_.data(),
kTcpHeaderSize);
uint32_t read_length = ntohl(read_length_network);
// Error: read_length is in the billions.
ReceiveBody(read_length);
});
}
Note that kTcpHeaderSize is 4. If I change it to 7 (which makes no sense, but just for the experiment) I see the stream of 7 bytes I expect. When it is 4, I see a stream that is not the first four bytes of what I expect.
Any pointers what I am doing wrong?
From what I can see in your code it should work according to the async_read documentation:
The asynchronous operation will continue until one of the following conditions is true:
The supplied buffers are full. That is, the bytes transferred is equal to the sum of the buffer sizes.
An error occurred.
However see the remark at the bottom:
This overload is equivalent to calling:
boost::asio::async_read(
s, buffers,
boost::asio::transfer_all(),
handler);
It looks like the transfer_all condition might be the only thing checked.
Try using the transfer_exactly condition and, if that works, report an issue at https://github.com/boostorg/asio/issues.
The suggestion by @sergiopm to use transfer_all was good, and I'm pretty sure it helped. The other issue involved buffer lifetimes in the asynchronous send/receive functions. I apparently got a bit confused about how long certain things would live and how long I needed them to live, so I was overwriting things from time to time. That may have been more important than transfer_all, but I'm still happy to give @sergiopm credit for helping get me on my way.
The intent has just been to have a simple tcp client or server that I can declare, hand it a callback, and then go on my way knowing that I can only pay attention to those callbacks.
I'm pretty sure something like this must exist (thousands of times over). Do feel free to comment below, both for me and for those who come after, if you think there are better libraries than asio for this task (i.e., ones that would involve substantially less code on my part). The principal constraint is that, due to multiple languages and services, we need to own the wire protocol. Otherwise we get into things like "does library X have a module for language Y?".
As an aside, it's interesting to me that essentially every example I've found does length-prefix encoding rather than beginning/end of packet encoding. Length prefix is really easy to implement but, unless I'm quite mistaken, suffers from re-sync hell: if a stream is interrupted ("I'm going to send you 100 bytes, here are the first 50 but then I died") it's not clear to me that there aren't scenarios where I'm unable to resync properly.
Anyway, I learned a lot along the way, I recommend the exercise.

How to read a QTcpSocket from R

I'm trying to send data from Qt to R. I am new to the QtNetwork module and relatively new to Qt overall. As such I am also trying to figure out how QIODevice encodes data for the purposes of reading and writing.
If I run the Fortune Server Example and connect to it with the following code in R:
connection <- socketConnection(host="localhost", port=50743, open="rb", timeout=10)
readBin(connection, what="raw", n = 1000)
the following raw hexadecimal vector is returned
00 00 00 56 00 59 00 6f 00 75 00 20 00 77 00 69 00 6c 00 6c 00 20 00 66 00 65 00 65 00 6c 00 20 00 68 00 75 00 6e 00 67 00 72 00 79 00 20 00 61 00 67 00 61 00 69 00 6e 00 20 00 69 00 6e 00 20 00 61 00 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 68 00 6f 00 75 00 72 00 2e
Removing the first five bytes and all the remaining null characters and converting to char I get:
"You will feel hungry again in another hour."
So what I want to know is where do all the characters that are not part of the fortune come from? The fourth byte seems to be the byte length of the message from the sixth byte to the end, the rest of the "non-fortune" characters are all null.
I read that QByteArray terminates each byte with a null character and QByteArray is converted to a QBuffer before being written by QTcpSocket, is that what is happening here? QBuffer adds the length of the message (but what of the other four bytes) and every second byte of a QByteArray is the null character? Also, the last byte is not null (did the readBin operation consume it/ how did readBin know where the message ended)?
Is this the only way to write data to the socket? If I wanted to transmit values of type double would I have to convert them to QByteArray to transmit them in this fashion? Is there not some non-text way of transmitting data through a socket?
Any enlightenment would be much appreciated!
EDIT:
Thanks for the answer! For completeness sake here is how you might decode the string in R
connection <- socketConnection(host="localhost", port=50743, open="rb", timeout=10)
# Read first 32 bits, which contains the size of the string in bytes
len.raw <- readBin(connection, what="raw", n = 4)
# convert to integer
len <- strtoi(paste(c("0x",len.raw),collapse=""))
# Read raw message
msg.raw <- readBin(connection, what="raw", n = len)
# convert to char using UTF-16BE
msg <- iconv(list(msg.raw),from="UTF-16BE")
close(connection)
cat(msg)
If you take a look at how the Fortune Server Example is implemented, you can see that it uses QDataStream to serialize fortunes (QStrings) over the socket:
QByteArray block;
QDataStream out(&block, QIODevice::WriteOnly);
out.setVersion(QDataStream::Qt_4_0);
out << fortunes.at(qrand() % fortunes.size());
So, the question is reduced to "How does QDataStream serialize QStrings?", and this is answered extensively in the documentation page about serializing Qt data types. You can see that a QString's serialization looks like this:
If the string is null: 0xFFFFFFFF (quint32)
Otherwise: The string length in bytes (quint32) followed by the data in UTF-16
And this is exactly what you are seeing in your question. The first four bytes are the string length in bytes, and the "nulls" you are seeing later appear because of using UTF-16 encoding.
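The same rule can be applied by hand on the C++ side (or ported to any other language). A sketch with a hypothetical read_qstring helper, assuming the quint32 length is big-endian (QDataStream's default byte order). In the dump above, 0x56 = 86 bytes, i.e. 43 UTF-16 code units, which is exactly the length of the fortune text:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Decode one QString as serialized by QDataStream: a big-endian quint32 byte
// count (0xFFFFFFFF marks a null QString), then that many bytes of UTF-16BE.
// Advances pos past the string; returns std::nullopt for a null QString.
std::optional<std::u16string> read_qstring(const std::vector<unsigned char>& in,
                                           std::size_t& pos) {
    if (pos + 4 > in.size()) throw std::runtime_error("truncated length");
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < 4; ++i) n = (n << 8) | in[pos + i];
    pos += 4;
    if (n == 0xFFFFFFFFu) return std::nullopt;
    if (n % 2 != 0 || n > in.size() - pos) throw std::runtime_error("bad length");
    std::u16string s;
    for (std::size_t i = 0; i < n; i += 2)  // each code unit: high byte first
        s.push_back(static_cast<char16_t>((in[pos + i] << 8) | in[pos + i + 1]));
    pos += n;
    return s;
}
```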
Is this the only way to write data to the socket? If I wanted to transmit values of type double would I have to convert them to QByteArray to transmit them in this fashion? Is there not some non-text way of transmitting data through a socket?
You can use any serialization format you like. QDataStream is widely used in Qt since it supports most Qt data types out of the box. This has nothing to do with using QByteArray; you can let QDataStream write to the socket directly. QDataStream is, actually, a binary (non-text) format, as you can see. If you want a textual, human-readable format, you can use JSON.
But if you are aiming to send data from Qt to R using QDataStream, you'll have to write your own QDataStream deserializer for R. I would recommend using a common data serialization format that has implementations in both C++ and R (instead of reinventing the wheel). I believe JSON meets this criterion, and if you want a binary format, msgpack might be interesting for you, since it supports a lot of programming languages (including R and C++).
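If you do stay with QDataStream for doubles, my understanding is that it writes a double as the 8 bytes of its IEEE 754 representation in the stream's byte order, which defaults to big-endian. A sketch of producing or reading that encoding without Qt, assuming the host double is IEEE 754 binary64; encode_double_be and decode_double_be are hypothetical names:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Encode a double as the 8 big-endian bytes of its IEEE 754 bit pattern.
std::array<unsigned char, 8> encode_double_be(double d) {
    static_assert(sizeof(double) == 8, "expected 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // reinterpret the bits without UB
    std::array<unsigned char, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));
    return out;
}

// Inverse: rebuild the bit pattern from big-endian bytes, then copy it into a double.
double decode_double_be(const std::array<unsigned char, 8>& in) {
    std::uint64_t bits = 0;
    for (unsigned char b : in) bits = (bits << 8) | b;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

On the R side, readBin(connection, what = "double", size = 8, endian = "big") should read such a value.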

Setting endianness of VS debugger

I am using VS 2012 and programming in C++. I have a wide string
const wchar_t *str = L"Hello, world!";
Technically I read the string from a file, but I don't know if that makes a difference. When I look at str in the Memory window it looks like this:
00 48 00 65 00 6c 00 6c 00 6f 00 2c 00 20 00 77 00 6f 00 72 00 6c 00 64 00 21 00
As you can see the string is stored in memory as big-endian.
When I hover my mouse over the string I get:
L"䠀攀氀氀漀Ⰰ 眀漀爀氀搀℀"
And after I reverse the endianness of str the memory looks like:
48 00 65 00 6c 00 6c 00 6f 00 2c 00 20 00 77 00 6f 00 72 00 6c 00 64 00 21 00 00
And the hover over looks like:
L"Hello, world!"
It seems that the debugger displays UTF-16 in little-endian by default. My program reads big-endian files so it is very tedious to keep reversing the endianness of all strings to debug them. Is there any way to change the endianness of the debugger's display?
Except for debug purposes I can do all my processing in big endian.
It's not only the debugger. Visual Studio's wchar_t functions are little-endian, as the host is. When you want to process the data, you need to reverse the string's endianness to little-endian anyway.
It's worth making this change even if you write the strings to a file with a different endianness. Strings are defined as a byte sequence; applying your endianness to a string looks strange anyhow.
Your best shot at getting this to work is to define your own type and create a debugger type visualizer for it (see Customizing the Visual Studio Debugger Display of Your Data).
Or maybe you can quick-hack it by shifting the address by 1 byte in the Watch window.
You're working with a non-native string format that just happens to "feel" similar to the native format, so you are tempted to think there should almost be a way to do it. But to the debugger, it's just a foreign binary format. The debugger is not designed to handle foreign endianness, just as it is not designed to visualize an OGG stream packet.
If you want to use available tools for manipulating native-endian Unicode strings, you'll need to convert to native-endian Unicode format.
As has been pointed out, VS uses the native endianness, which is little-endian on an Intel/AMD. The problem is that you're not reading the strings correctly; you should imbue the std::istream with a locale which reads UTF-16BE (since this is apparently the encoding form you're trying to read). std::istream (or rather the backing std::filebuf) will automatically do the code translation on the fly when reading and writing.
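A minimal sketch of that approach, assuming the file is pure UTF-16BE without a byte-order mark. It uses std::codecvt_utf16, whose default mode is big-endian; note that <codecvt> has been deprecated since C++17, although the major standard libraries still ship it. read_utf16be is a hypothetical name:

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Read a whole UTF-16BE file into a native-endian std::wstring.
std::wstring read_utf16be(const std::string& path) {
    std::wifstream in;
    // Imbue before opening so the conversion applies from the first byte.
    // codecvt_utf16 without std::little_endian in its mode reads big-endian.
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf16<wchar_t, 0x10FFFF>));
    in.open(path, std::ios::binary);  // binary: no CR/LF translation on Windows
    return std::wstring(std::istreambuf_iterator<wchar_t>(in),
                        std::istreambuf_iterator<wchar_t>());
}
```

Once the strings are in native order, the debugger's default display shows them correctly without any byte swapping.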
You can set the endianness of the Memory window using the context menu. Right-click in the Memory window and check "Big Endian".