Cannot read .jpg binary data, buffer only has 4 bytes of data - c++

My question is almost exactly the same as this one, which is unanswered. I am trying to read the binary data of a .jpg to send as an HTTP response on a simple web server using C++. The code for reading the data is below.
FILE *f = fopen(file.c_str(),"rb");
if(f){
    fseek(f,0,SEEK_END);
    int length = ftell(f);
    fseek(f,0,SEEK_SET);
    char* buffer = (char*)malloc(length+1);
    if(buffer){
        int b = fread(buffer,1,length,f);
        std::cout << "bytes read: " << b << std::endl;
        buffer[length] = '\0'; // only touch the buffer if malloc succeeded
    }
    fclose(f);
    return buffer;
}
return NULL;
When the request for the image is made and this code runs, fread() returns 25253 bytes being read, which seems correct. However, when I perform strlen(buffer) I get only 4. Of course, this gives an error in the browser when it tries to display the image. I have also tried manually setting the HTTP content length to 25253, but I then receive curl error 18, indicating the transfer ended early (as only 4 bytes exist).
As the other poster mentioned in their question, the 5th byte of the image (and I assume most .jpg images) is 0x00, but I am unsure if this has an effect on saving to the buffer.
I have verified the .jpg images I am loading are in the directory, valid, and display properly when opened normally. I have also tried 2 different methods of loading the binary data, and both also give only 4 bytes, so I am really at a loss. Any help is much appreciated.

"When the request for the image is made and this code runs, fread() returns 25253 bytes being read, which seems correct. However, when I perform strlen(buffer) I get only 4."
Well, there is your problem: you are reading binary data, not text. Bytes such as newline or the null character do not mark any text structure here; they are simply numbers.
strlen counts the characters that precede the first '\0' (a zero byte). A binary file like a JPEG usually contains many zero bytes, and because of the binary header structure there seems to always be a zero at position 5, so strlen stops at the first one it finds and returns 4. The real size is the value fread returned (25253), and that is the number you have to carry around with the buffer.
Also, you seem to be sending this "text-interpreted" JPEG over HTTP. Of course the other end will complain: you cannot send binary data as if it were a null-terminated string. Either send the raw bytes with a Content-Length header that matches the number of bytes you actually write, or encode the data (base64 is very popular). You also have to tell the HTTP client the type by setting the proper MIME type in the Content-Type header.
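As a minimal sketch of the point above, the following keeps the byte count attached to the data by using std::vector<char>, and derives Content-Length from that count instead of strlen. The function names read_binary_file and make_http_response are hypothetical, and the response headers are only an assumption about what a simple server would send.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file in binary mode. The byte count
// travels with the vector (data.size()), so embedded 0x00 bytes are harmless.
// Returns false if the file cannot be opened or read completely.
bool read_binary_file(const std::string& path, std::vector<char>& data) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    long length = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (length < 0) { std::fclose(f); return false; }
    data.resize(static_cast<std::size_t>(length));
    std::size_t got = data.empty() ? 0 : std::fread(data.data(), 1, data.size(), f);
    std::fclose(f);
    return got == data.size();
}

// Hypothetical helper: builds a complete HTTP/1.1 response. Content-Length
// comes from body.size(), never from strlen().
std::string make_http_response(const std::vector<char>& body,
                               const std::string& mime_type) {
    std::string out = "HTTP/1.1 200 OK\r\n";
    out += "Content-Type: " + mime_type + "\r\n";
    out += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    out += "Connection: close\r\n\r\n";
    out.append(body.begin(), body.end());  // range append keeps zero bytes
    return out;
}
```

When the result is written to the socket, the number of bytes to send is the size() of the returned string, for the same reason.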

Related

writing file bytes to a file client side c++

I have a server (Python) that sends bytes from a file to a client in C++. I am using libcurl to make requests to the Python server and Flask to do all of the "hard" work for me in Python. After I get the file bytes from the server, I want to write them to a zip file on the client side. Initially, I was going to use libcurl to do it for me, but I decided I didn't want to do that, as it would require an extra function in my wrapper which is not necessary.
FILE* zip_file = fopen(zip_name, "wb");
//make request and store the bytes from the server in a string
fwrite(response_information.first.c_str(), sizeof(char), sizeof(response_information.first.c_str()), zip_file);
//response_information is a pair . First = std::string, Second = curl response code
I do plan on switching to fopen_s (the safe version of fopen), but I want to get a working program first. This is part of a bigger project, so I can't provide code that can be run. Some things to note that I think could be causing this: storing the response as a string, then attempting to get the C string version and write it to the file. When storing the return value of fwrite, I get "8", which means "*" bytes written apparently. Also, when I'm on Windows, it says that the file was modified after I run my program, but nothing is in the zip file itself. How can I write the response bytes to a file?
The third parameter of fwrite is the count of items to write, so sizeof is not what you need. response_information.first.c_str() is a pointer, so sizeof(response_information.first.c_str()) returns the size of a pointer (8 bytes here, which matches the return value you saw). For text without embedded zero bytes you could use:
fwrite(response_information.first.c_str(), sizeof(char), strlen(response_information.first.c_str()), zip_file);
but a zip file is binary and strlen stops at the first zero byte, so use the string's own length:
fwrite(response_information.first.c_str(), sizeof(char), response_information.first.length(), zip_file);
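The length-based call can be wrapped up as in the sketch below. The helper name write_bytes_to_file is hypothetical; it takes the item count from the std::string itself and checks that fwrite and fclose both succeeded.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: writes every byte of `bytes` to `path` in binary mode.
// Returns true only if the whole payload was written and the file closed cleanly.
bool write_bytes_to_file(const std::string& path, const std::string& bytes) {
    std::FILE* out = std::fopen(path.c_str(), "wb");
    if (!out) return false;
    // Item size 1, item count = bytes.size(): the count comes from the string
    // itself, not from sizeof (pointer size) or strlen (stops at 0x00).
    std::size_t written = std::fwrite(bytes.data(), 1, bytes.size(), out);
    bool closed_ok = (std::fclose(out) == 0);
    return written == bytes.size() && closed_ok;
}
```

This only helps if the std::string already holds the complete response, i.e. it was filled with a length-aware append rather than from a C string.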

How to 'read' from a (binary==true) boost::beast::websocket::stream<tcp::socket> into a buffer (boost::beast::flat_buffer?) so it is not escaped?

I am using boost::beast to read data from a websocket into a std::string. I am closely following the example websocket_sync_client.cpp in boost 1.71.0, with one change--the I/O is sent in binary, there is no text handler at the server end, only a binary stream. Hence, in the example, I added one line of code:
// Make the stream binary?? https://github.com/boostorg/beast/issues/1045
ws.binary(true);
Everything works as expected, I 'send' a message, then 'read' the response to my sent message into a std::string using boost::beast::buffers_to_string:
// =============================================================
// This buffer will hold the incoming message
beast::flat_buffer wbuffer;
// Read a message into our buffer
ws.read(wbuffer);
// =============================================================
// ==flat_buffer to std::string=================================
string rcvdS = beast::buffers_to_string(wbuffer.data());
std::cout << "<string_rcvdS>" << rcvdS << "</string_rcvdS>" << std::endl;
// ==flat_buffer to std::string=================================
This just about works as I expected, except there is some kind of escaping happening on the data of the (binary) stream.
There is no doubt some layer of boost logic (perhaps character traits?) that has enabled/caused all non-printable characters to be '\u????' escaped, human-readable text.
The binary data that is read contains many (intentional) non-printable ASCII control characters to delimit/organize chunks of data in the message:
I would rather not have the stream escaping these non-printable characters, since I will have to "undo" that effort anyway, if I cannot coerce the 'read' buffer into leaving the data as-is, raw. If I have to find another boost API to undo the escaping, that is just wasted processing that no doubt is detrimental to performance.
My question has to have a simple solution. How can I cause the resulting flat_buffer that is ws.read into 'rcvdS' to contain truly raw, unescaped bytes of data? Is it possible, or is it necessary for me to simply choose a different buffer template/class, so that the escaping does not happen?
Here is a visual aid - showing expected vs. actual data:
Beast does not alter the contents of the message in any way. The only thing that binary() and text() do is set a flag in the message which the other end receives. Text messages are validated against the legal character set, while binary messages are not. Message data is never changed by Beast. buffers_to_string just transfers the bytes in the buffer to a std::string; it does not escape anything. So if the buffer contains a null, or let's say a ctrl+A, you will get a 0x00 and a 0x01 in the std::string respectively.
If your message is being encoded or translated, it isn't Beast that is doing it. Perhaps it is a consequence of writing the raw bytes to the std::cout? Or it could be whatever you are using to display those messages in the image you posted. I note that the code you provided does not match the image.
If anyone else lands here, rest assured: it is your server end, not the client end, that is escaping your data.
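One way to check what a received std::string really contains, independent of how a terminal or viewer renders control characters, is to print the byte values. The helper below is a hypothetical sketch that uses only the standard library; in the Beast client it would be applied to the string returned by buffers_to_string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: renders each byte as two hex digits separated by
// spaces, so control characters are shown by value instead of being
// interpreted by whatever displays the output.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0F];  // low nibble
    }
    return out;
}
```

If the dump already shows the six bytes of a literal backslash-u escape sequence where a single control byte was expected, the escaping happened before the data reached the client.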

Sending Binary File Data via Google Protobuf

I have my protobuf-message set up fine it seems, all other fields I have transmit correctly across the network and do not truncate. I only have one problem, when I read the binary data of a picture or file then send it through google protobuf as bytes array type, on the other side it only contains the first 4 elements of the array. If the picture is say 200kb, on the other end it comes out as 1kb(Basically only contains a header or identifier). This problem is kinda complex so I will try to give a run down. Sorry if I make this impossible to understand. I may be going about this completely the wrong way.
Example below contains conceptual work, and was written in class. It very well could contain small errors. The code compiles at home, and if it is a typo let me know and I can fix it.
FILE* file;
FILE* ofile;
file = fopen("red.png", "rb");
fseek(file, 0, SEEK_END);
long fSize = ftell(file);
rewind(file);
BYTE* ret = new BYTE[fSize];
fread(ret, 1, fSize, file);
fclose(file);
char dataStream[1024]; //yes it is large enough
myPacket.set_file(ret);
//set other fields here
myPacket.SerializeToArray(dataStream,sizeof(dataStream));
//send through sockets below, works for all but file field.
I can include more when I get back home to my main work computer, sorry, was just hoping I could let this stew while at class. If this is not enough info feel free to give me the smack down, it's alright just looking for advice. I also know that certain image formats can be read certain ways, but I was able to copy a png and rewrite it through binary locally, just not over protobuf
Thanks for reading my pseudo book guys, I am finally trying to leap into improving my knowledge.
Edit: fixed a quickly typed pointer error, (&ret) to (ret). Also, should the size then be sizeof(myPacket) instead?
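You have written this:
char dataStream[1024]; //yes it is large enough
But how could a 1024-byte buffer be large enough if you want to store 200,000 bytes in it? Also note that if set_file(ret) receives a plain character pointer, it has to treat the data as a null-terminated C string and stops at the first zero byte; for a bytes field, pass the length explicitly with the generated setter that also takes a size, e.g. myPacket.set_file(ret, fSize).
Better to allocate a bigger buffer on the heap, e.g.: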
std::vector<char> dataStream(500000);
myPacket.SerializeToArray(&dataStream[0], dataStream.size());

Curlpp, incomplete data from request

I am using Curlpp to send requests to various webservices to send and receive data.
So far this has worked fine, since I have only used it for sending/receiving JSON data.
Now I have a situation where a webservice returns a zip file in binary form. This is where I encountered a problem: the data received is not complete.
I first had Curl set to write any data to an ostringstream by using the option WriteStream, but this proved not to be the correct approach, since the data contained null characters and thus the data stopped at the first null char.
After that, instead of using WriteStream, I used WriteFunction with a callback function.
The problem in this case is that this function is always called 2 or 3 times, regardless of the amount of data.
This results in always having a few chunks of data that don't seem to be the first part of the file, although the data always contains PK as the first 2 characters, indicating a zip file.
I used several tools to verify that the data is being sent to my application in its entirety, so this is not a problem with the webservice.
Here is the code. Do note that options like hostname, port, headers and postfields are set elsewhere.
string requestData;
size_t WriteStringCallback(char* ptr, size_t size, size_t nmemb)
{
requestData += ptr;
int totalSize= size*nmemb;
return totalSize;
}
const string CurlRequest::Perform()
{
curlpp::options::WriteFunction wf(WriteStringCallback);
this->request.setOpt( wf );
this->request.perform();
return requestData;
}
I hope someone can help me out with this issue, because I've run dry of leads on how to fix it, also because curlpp is poorly documented (and even worse since the curlpp website disappeared).
The problem with the code is that the data is put into a std::string, despite having the data in binary (ZIP) format. I'd recommend to put the data into a stream (or a binary array).
You can also register a callback with curlpp::options::HeaderFunction to retrieve the response headers, and act in the WriteCallback according to the "Content-Type".
std::string is not a problem, but the concatenation is:
requestData += ptr;
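Appending a char* treats ptr as a zero-terminated C string, so if the input contains any zero bytes, it will be truncated there. The chunk libcurl passes to the callback is also not zero-terminated, so the append can run past the end of the chunk. You should wrap it into a string which knows the length of its data: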
requestData += std::string(ptr, size*nmemb);
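The same fix, written out as a complete callback, might look like the sketch below. It keeps the three-argument shape of the question's WriteStringCallback; the names g_response and write_chunk are hypothetical, and the curlpp registration is left out so the snippet depends only on the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical accumulator standing in for the question's requestData.
static std::string g_response;

// Same shape as the question's callback: ptr points at size * nmemb bytes
// that are NOT zero-terminated.
std::size_t write_chunk(char* ptr, std::size_t size, std::size_t nmemb) {
    const std::size_t total = size * nmemb;
    g_response.append(ptr, total);  // copies exactly `total` bytes, zeros included
    return total;                   // report that the whole chunk was consumed
}
```

Being called two or three times per transfer is expected, since the data arrives in chunks; with a length-aware append the chunks simply accumulate in order.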

feof() returning true when EOF is not reached

I'm trying to read from a file at a specific offset (simplified version):
typedef unsigned char u8;
FILE *data_fp = fopen("C:\\some_file.dat", "r");
fseek(data_fp, 0x004d0a68, SEEK_SET); // move filepointer to offset
u8 *data = new u8[0x3F0];
fread(data, 0x3F0, 1, data_fp);
delete[] data;
fclose(data_fp);
The problem is that data will not contain 1008 bytes, but 529 (seems random). When it reaches 529 bytes, calls to feof(data_fp) will start returning true.
I've also tried to read in smaller chunks (8 bytes at a time) but it just looks like it's hitting EOF when it's not there yet.
A simple look in a hex editor shows there are plenty of bytes left.
Opening a file in text mode, like you're doing, makes the library translate some of the file contents, potentially triggering an unwarranted EOF or bad offset calculations.
Open the file in binary mode by passing the "b" option to the fopen call:
fopen(filename, "rb");
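A minimal sketch of the offset read in binary mode follows. The function name read_at is hypothetical. It passes an element size of 1 to fread so the return value is a byte count, which makes a short read visible. On Windows, text mode commonly treats a 0x1A byte (Ctrl-Z) as end of file, which would explain a read that stops at a seemingly random position; "rb" avoids that.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads up to `count` bytes starting at `offset`.
// The file is opened with "rb" so no newline or end-of-file translation
// happens. The returned vector's size is the number of bytes actually read.
std::vector<unsigned char> read_at(const char* path, long offset, std::size_t count) {
    std::vector<unsigned char> data;
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return data;
    if (std::fseek(fp, offset, SEEK_SET) == 0) {
        data.resize(count);
        // Element size 1 makes fread return a byte count.
        std::size_t got = count ? std::fread(data.data(), 1, count, fp) : 0;
        data.resize(got);
    }
    std::fclose(fp);
    return data;
}
```

For the question's case the call would be read_at("C:\\some_file.dat", 0x004d0a68, 0x3F0), and a result shorter than 0x3F0 bytes would mean the file really ends there.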
Is the file being written to in parallel by some other application? Perhaps there's a race condition, so that the file ends at wherever the read stops, when the read is running, but later when you inspect it the rest has been written. That would explain the randomness, too.
Maybe it's a difference between textual and binary file. If you're on Windows, newlines are CRLF, which is two characters in file, but converted to only one when read. Try using fopen(..., "rb")
I can't see your link from work, but if your computer claims no more bytes exist, I'd tend to believe it. Why don't you print the size of the file rather than doing things by hand in a hex editor?
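Also, you'd be better off using level 2 I/O; the f-calls are ancient C ugliness, and you're using C++ since you have new.
int fh = open(filename, O_RDONLY); // needs <fcntl.h>; on Windows, add O_BINARY to avoid text-mode translation
struct stat s;
fstat(fh, &s); // fstat takes a pointer to the struct (<sys/stat.h>)
cout << "size=" << hex << s.st_size << "\n";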
Now do your seeking and reading using level 2 I/O calls, which are faster anyway, and let's see what the size of the file really is.