Sending Binary File Data via Google Protobuf - c++

I have my protobuf-message set up fine it seems, all other fields I have transmit correctly across the network and do not truncate. I only have one problem, when I read the binary data of a picture or file then send it through google protobuf as bytes array type, on the other side it only contains the first 4 elements of the array. If the picture is say 200kb, on the other end it comes out as 1kb(Basically only contains a header or identifier). This problem is kinda complex so I will try to give a run down. Sorry if I make this impossible to understand. I may be going about this completely the wrong way.
Example below contains conceptual work, and was written in class. It very well could contain small errors. The code compiles at home, and if it is a typo let me know and I can fix it.
FILE* file;
FILE* ofile;
file = fopen("red.png", "rb");
fseek(file, 0, SEEK_END);
long fSize = ftell(file);
rewind(file);
BYTE* ret = new BYTE[fSize];
fread(ret, 1, fSize, file);
fclose(file);
char dataStream[1024] //yes it is large enough
myPacket.set_file(ret);
//set other fields here
myPacket.SerializeToArray(dataStream,sizeof(dataStream));
//send through sockets below, works for all but file field.
I can include more when I get back home to my main work computer, sorry, was just hoping I could let this stew while at class. If this is not enough info feel free to give me the smack down, it's alright just looking for advice. I also know that certain image formats can be read certain ways, but I was able to copy a png and rewrite it through binary locally, just not over protobuf
Thanks for reading my pseudo book guys, I am finally trying to leap into improving my knowledge.
Edited quickly typed pointer error(&ret) to (ret). Also then should size of be sizeof(myPacket) rather.

You have written this:
char dataStream[1024] //yes it is large enough
But how could 1024 bytes buffer be large enough if you want to store 200 000 bytes into it?
Better allocate a bigger buffer on the heap, e.g.:
std::vector<char> dataStream(500000);
myPacket.SerializeToArray(&dataStream[0], dataStream.size());

Related

Cannot read .jpg binary data, buffer only has 4 bytes of data

My question almost exactly the same as this one which is unanswered. I am trying to read the binary data of a .jpg to send as an HTTP response on a simple web server using C++. The code for reading the data is below.
FILE *f = fopen(file.c_str(),"rb");
if(f){
fseek(f,0,SEEK_END);
int length = ftell(f);
fseek(f,0,SEEK_SET);
char* buffer = (char*)malloc(length+1);
if(buffer){
int b = fread(buffer,1,length,f);
std::cout << "bytes read: " << b << std::endl;
}
fclose(f);
buffer[length] = '\0';
return buffer;
}
return NULL;
When the request for the image is made and this code runs, fread() returns 25253 bytes being read, which seems correct. However, when I perform strlen(buffer) I get only 4. Of course, this gives an error on a browser when the image tries to display. I have also tried manually setting the HTTP content length to 25253 but I then a receive a curl error 18, indicating the transfer ended early (as only 4 bytes exist).
As the other poster mentioned in their question, the 5th byte of the image (and I assume most .jpg images) is 0x00, but I am unsure if this has an effect on saving to the buffer.
I have verified the .jpg images I am loading are in the directory, valid, and display properly when opened normally. I have also tried 2 different methods of loading the binary data, and both also give only 4 bytes, so I am really at a loss. Any help is much appreciated.
When the request for the image is made and this code runs, fread()
returns 25253 bytes being read, which seems correct. However, when I
perform strlen(buffer) I get only 4.
Well there is your problem: You read binary data, not text, meaning that special characters like newline or the null character is not a something that indicates the structure of a text, its simple numbers.
strlen is a function to give you the count of characters other than '\0' or simply 0. However in a binary file like jpeg there a dozen of zeros usually in there, and because of a binary header structure, there seems to be always a zero at position 5 so, so strlen will stop at the first it found and return 4.
Also you seem confused by the fact that you try to send this "text interpreted" jpeg to a HTTP server. Of course it will complain, because you can not simply send binary data as text in HTTP, you either have to encode it, base64 is very popular, or set the content length header. Of course you also have to tell the HTTP client/server the type by setting the proper MIME header.

How to read in a large metadata file into an array in C++

I am currently coding a search algorithm to find all of the cases of a specific byte (char) array found in the metadata of a video. I am trying to load all of the contents of a very large file (about 2GB) into an array, but I keep getting a bad_alloc() exception when I run the program because of the size of the file.
I think one solution would be to create a buffer in order to "chunk" the contents of the file, but I am not sure how to go about coding this.
So far, my code looks like this:
string file = "metadata.ts";
ifstream fl(file);
fl.seekg(0, ios::end);
size_t len = fl.tellg();
char *byteArray = new char[len];
fl.seekg(0, ios::beg);
fl.read(byteArray, len);
fl.close();
It works for smaller video files, but when I try a file that's slightly under 2GB, it crashes with a bad_alloc() exception.
Thanks in advance for any help - I'm open to all solutions.
EDIT:
I have already checked out the other solutions on Stack OverFlow, and they are not exactly what I'm looking for. I am trying to "chunk" the data and use a buffer to put it into an array, which is not what the other solutions are doing.

zlib's compress function is not doing anything. Why?

before = new unsigned char[mSizeNeeded*4];
uLong value = compressBound(mSizeNeeded*4);
after = new unsigned char[value];
compress(after, &value, before, mSizeNeeded*4);
fwrite(&after, 1, value, file);
'before' has a bunch of audio data stored into it and I am trying to compress it and store it into 'after'. I then write it into a file. The file is the same size as the original file, it also contains the same data that was in before (as far as I can tell).
Compress also returns OK so I know that the compression is not failing.
Okay, so it looks like my only problem is somewhere in the compression (I think). I am able to run compress and then I can uncompress and get the correct data out. Also, it is writing into the file and fwrite returns 561152 but the count (value) is 684964. So it looks like something is wrong with fwrite. I looked more carefully and the after data is different than the before data.
561152 is the same size as the original audio data in a .wav file that I have (stripped of the .wav headers of course).
Based on your original text:
fwrite (&before, ...
I am trying to compress it and store it into 'after'. I then write it into a file.
I think not. You are writing the original data to the file, you should probably be writing after instead.
The other thing you should get in the habit of doing is checking return values from functions that you care about. In other words, compress() will tell you if a problem occurs yet you seem to be totally ignoring the possibility.
Similarly, fwrite() also uses its return value to indicate whether it was successful or not. Since you haven't included the code showing how that's set up, this is also a distinct possibility. In particular fwrite is under no obligation to write your entire block to the file in one hit (device may be full, etc), that's why it has a return value, so you can detect and adjust for that situation. Often, a better option than:
fwrite (&after, 1, value, file);
is:
fwrite (&after, value, 1, file);
since the latter will always give you one for a fully successful write, something else for a failure of some description.
That would be my first step in establishing where the problem lies.
On top of that, there are numerous other (generally-applicable) methods you can use to track down the issue, such as:
outputting all variables after they change or are set (like the return values of functions, after, before, value and so on).
delete the output file before running your program, to ensure it's created afresh.
run the code through a debugger so you can see what's happening under the covers.
clearing after to all zero bytes (or a known pattern) to ensure you don't get stale data in there.
And, as a final approach (given that the zlib source code is freely available), you can also modify (or debug into) it so that you can clearly see what's going on under the covers.

C++ read text line-by-line, speed/efficiency savings needed

I have a series of large text files (10s - 100s of thousands of lines) that I want to parse line-by-line. The idea is to check if the line has a specific word/character/phrase and to, for now, record to a secondary file if it does.
The code I've used so far is:
ifstream infile1("c:/test/test.txt");
while (getline(infile1, line)) {
if (line.empty()) continue;
if (line.find("mystring") != std::string::npos) {
outfile1 << line << '\n';
}
}
The end goal is to be writing those lines to a database. My thinking was to write them to the file first and then to import the file.
The problem I'm facing is the time taken to complete the task. I'm looking to minimize the time as far as possible, so any suggestions as to time savings on the read/write scenario above would be most welcome. Apologies if anything is obvious, I've only just started moving into C++.
Thanks
EDIT
I should say that I'm using VS2015
EDIT 2
So this was my own dumb fault, when switching to Release and changing the architecture type I had noticeable speed increases. Thanks to everyone for pointing me in that direction. I'm also looking at the mmap stuff and that's proving useful too. Thanks guys!
When you use ifstream to read and process to/from really big files, you have to increase the default buffer size that is used (normally 512 bytes).
The best buffer size depends on your needs, but as a hint you can use the partition block size of the file(s) your reading/writing. To know that information you can use a lot of tools or even code.
Example in Windows:
fsutil fsinfo ntfsinfo c:
Now, you have to create a new buffer to ifstream like this:
size_t newBufferSize = 4 * 1024; // 4K
char * newBuffer = new char[newBufferSize];
ifstream infile1;
infile1.rdbuf()->pubsetbuf(newBuffer, newBufferSize);
infile1.open("c:/test/test.txt");
while (getline(infile1, line)) {
/* ... */
}
delete newBuffer;
Do the same with the output stream and don't forget set new buffer before open file or it may not work.
You can play with values to find the very best size for you.
You'll note the difference.
C-style I/O functions are much faster than fstream.
You may use fgets/fputs to read/write each text line.

Write at specific position at a file with open()

Hello I am trying to simulate two programs that send and receive files in C++ from the network, something like client and server. To begin with I have to split a file to pages of 4096 bytes and send it to the other program in order to create the file. The way I send and receive files through the network is by write and read. So in the client programm I must create a function tha receives the packages and puts them into a file. I cannot figure a way to put the packages in to the file. For example I a file has 2 pages I must create another file using these 2 pages. Also i cannot know if they come in order so I must create the file and put them in the right position.
/*consider the connections are ok and the file's name is at char* name*/
int file=open(name,"O_CREAT | O_WRONLY,0666);
char buffer[4096];
int pagenumber;
for(int i=0;i<page_number;i++){
read(socket,&pagenumber,sizeof(int));
read(socket,buffer,sizeof(int));
write(file(pagenumber*4096),buffer,4096);
}
This code works for pagenumber=0 but for pagenumber=1 nothing happens! Can you help me? Thanks in advance!
To write at a certain position in the file you must use lseek
off_t lseek(int fd, off_t offset, int whence);
It takes the descriptor, the offset and the final parameter is a constant in these:
SEEK_SET The offset is set to offset bytes.
SEEK_CUR The offset is set to its current location plus offset bytes.
SEEK_END The offset is set to the size of the file plus offset bytes.
If you know how big is the file going to be, you can use ftruncate for it.
int ftruncate(int fd, off_t length);
Anyway even if you create a file that is huge, since most filesystems on Linux support sparse files, the actual file on disk will be the sum of the blocks that have been written.
The first argument to write() is a filedescriptor, which you optained with open(). So it should be
int file = open(...);
...
write(file,buffer,4096);
not
write(file(pagenumber*4096),buffer,4096);
Regarding the question as to how to write at a specific position. You can prepare the file beforehand with write, and then use seek() to position the file where you want to write at. For a description of seek you can look here.
Mario, first of all, lets no rely on garbage in 'pagenumber' to continue the loop (which is happening when loop boundary condition is checked here for the first time). Now, if you are writing page number '0' and then page following it, pagenumber will be initialized to 0 and your loop will come out. Also, please check bytes written and read in write and read system calls respectively.
try pwrite
int file=open(name,"O_CREAT | O_WRONLY,0666);
char buffer[4096];
int pagenumber;
for(int i=0;i<page_number;i++){
read(socket,&pagenumber,sizeof(int));
read(socket,buffer,sizeof(int));
pwrite(file,buffer,4096,4096*i);
}