I have a large file containing strings. I have to read this file and store it in a buffer using C or C++. I tried to do it as follows:
FILE* file = fopen(fileName.c_str(), "r");
assert(file != NULL);
size_t BUF_SIZE = 10 * 1024 * 1024;
char* buf = new char[BUF_SIZE];
string contents;
size_t n;
while ((n = fread(buf, 1, BUF_SIZE, file)) > 0)
{
    // fread returns the number of items read (a size_t, never -1),
    // and buf is not NUL-terminated, so append exactly n bytes.
    contents.append(buf, n);
}
The file contains strings, and I have to find the character with the maximum frequency. Is it possible to optimize this code further? Will using BinaryReader improve performance? Could you share other approaches if you know of any?
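Since the goal is the character with the maximum frequency, one option is to count bytes while streaming the file in fixed-size chunks instead of accumulating the whole file in a string first. A minimal sketch; the function name and chunk size here are my own assumptions, not from the question:

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Stream the file in 1 MiB chunks and tally byte frequencies as we go,
// so the whole file never has to sit in memory at once.
// Returns the most frequent byte value, or -1 if the file cannot be opened.
int mostFrequentByte(const char* fileName)
{
    FILE* f = std::fopen(fileName, "rb");
    if (!f) return -1;
    std::vector<char> buf(1 << 20);   // 1 MiB chunk (an arbitrary choice)
    long long freq[256] = {0};
    size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
        for (size_t i = 0; i < n; ++i)
            ++freq[static_cast<unsigned char>(buf[i])];
    std::fclose(f);
    int best = 0;
    for (int c = 1; c < 256; ++c)
        if (freq[c] > freq[best]) best = c;
    return best;
}
```

Counting during the read also sidesteps the large contents string entirely, which may matter more than any tweak to the read loop itself.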
Scenario: I have a file that is 8,203,685 bytes long in binary, and I am using fread() to read in the file.
Problem: Hexdumping the data after the fread() yields different results on Linux and Windows. Both hexdump files are the same size, but on Linux the dump matches the original input file, whereas on Windows everything from byte 8,200,193 onward is 0.
Code:
#include <cstdio>
#include <cstdlib>
#include <fstream>

int main(void)
{
    FILE * fp = fopen("input.exe", "rb");
    unsigned char * data = NULL;
    long size = 0;
    if (fp)
    {
        fseek(fp, 0, SEEK_END);
        size = ftell(fp);
        fseek(fp, 0, SEEK_SET);
        data = (unsigned char *)malloc(size);
        size_t read_bytes = fread(data, 1, size, fp);
        // print out read_bytes, value is equal to size

        // Hex dump using ofstream. Hexdump file is different here on Windows vs
        // on Linux. Last ~3000 bytes are all 0's on Windows.
        std::ofstream out("hexdump.bin", std::ios::binary | std::ios::trunc);
        out.write(reinterpret_cast<char *>(data), size);
        out.close();

        FILE * out_file = fopen("hexdump_with_FILE.bin", "wb");
        fwrite(data, 1, size, out_file);
        fflush(out_file);
        fclose(out_file);
    }
    if (fp) fclose(fp);
    if (data) free(data);
    return 0;
}
Has anyone seen this behavior before, or have an idea of what might be causing the behavior that I am seeing?
P.S. Everything works as expected when using ifstream and its read function
Thanks!
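One way to narrow a problem like this down is to read in smaller chunks and check every fread() return value, so a short read or stream error surfaces at the exact offset where it happens instead of being absorbed by one big call. A minimal sketch; the helper name and chunk size are my own, not from the original post:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Read an entire file in fixed-size chunks, checking every fread() return,
// so a short read or error is caught immediately rather than papered over.
// Returns the number of bytes read, or -1 on failure; the caller frees *out.
long read_all(const char *path, unsigned char **out)
{
    FILE *fp = std::fopen(path, "rb");
    if (!fp) return -1;
    size_t cap = 1 << 16, len = 0;      // start with a 64 KiB buffer
    unsigned char *data = (unsigned char *)std::malloc(cap);
    if (!data) { std::fclose(fp); return -1; }
    for (;;)
    {
        if (len == cap)                  // buffer full: double it
        {
            unsigned char *tmp = (unsigned char *)std::realloc(data, cap *= 2);
            if (!tmp) { std::free(data); std::fclose(fp); return -1; }
            data = tmp;
        }
        size_t n = std::fread(data + len, 1, cap - len, fp);
        len += n;
        if (n == 0) break;               // end of file or error
    }
    if (std::ferror(fp)) { std::free(data); std::fclose(fp); return -1; }
    std::fclose(fp);
    *out = data;
    return (long)len;
}
```

This avoids relying on a single fseek/ftell size probe, and distinguishes a genuine end-of-file from a read error via ferror().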
I have binary data in a C++ buffer, as below:
int len = 39767;
uint16_t * buffer = (uint16_t *) malloc(len);
FILE * fp = fopen("rahul.jpg", "rb"); // size of file is 39767 bytes
fread(buffer, len, 1, fp);
fclose(fp);
fp = fopen("rahul3.jpg", "wb");
fwrite(buffer, len, 1, fp); // Here it is written correctly.
fclose(fp);
I want to pass this buffer to Node.js and write it to a file. I used the line below to convert it to a Local and then wrap it in an Object:
Local<String> str = Nan::New(buffer).ToLocalChecked();
return scope.Escape(str);
But in Node.js, when I check the length of the received data, it prints only 9, and the value seems corrupted.
console.log(data);
console.log("len = " + data.length );
fs.writeFileSync('rahul2.jpg', data, 'binary');
Here rahul2.jpg is corrupted and is only 9 bytes. How can I make the rahul2.jpg written by the Node.js code identical to rahul.jpg in C++? Which Nan::New() should I use to pass binary data through unchanged? Please help. Thanks.
Try something like this:
Local<Value> returnValue = Nan::CopyBuffer(buffer, len).ToLocalChecked();
and, by the way, fread returns the number of complete items read, not bytes, so to track a byte count, read with an item size of 1 and keep the result:
size_t truelen = fread(buffer, 1, len, fp);
. . .
fwrite(buffer, 1, truelen, fp);
Local<String> str = Nan::NewOneByteString((uint8_t *) buffer, len).ToLocalChecked();
I used the code above in C++, which solved the issue. In Node.js I used the line below to write the file:
fs.writeFileSync('rahul2.jpg', data, 'binary');
Thanks.
I have a binary file I want to transmit, and I was wondering whether converting the C string into a std::string would affect the end result. I send a C string after using read() (opened for binary, not text), but then I put it into a std::string and convert it back to a C string. If that's no good, is there a simple way to get it back into binary form?
FILE *file = fopen(filename, "ab");
int size = 0;
do {
    size = recvfrom(s, buffer, 128, 0, (LPSOCKADDR) &sa_in, &senderSize);
    if (size > 0)
    {
        fwrite(buffer, sizeof(char), size, file);
    }
} while (size > 0);
Here the C string (binary data) is turned into a std::string and then back into a C string:
FILE *file = fopen(filename, "ab");
int size = 0;
do {
    size = recvfrom(s, buffer, 128, 0, (LPSOCKADDR) &sa_in, &senderSize);
    if (size > 0)
    {
        string bufferstring(buffer);
        strcpy(buffer, bufferstring);
        fwrite(buffer, sizeof(char), size, file);
    }
} while (size > 0);
Doing this:
string bufferstring(buffer);
means to use a null-terminated string as the input. The data in buffer is probably not a null-terminated string of exactly length 127. If it's shorter you have data loss, and if there is no null terminator in buffer then you cause undefined behaviour.
The next line, strcpy(buffer, bufferstring);, doesn't even compile: a std::string cannot be passed as an argument to strcpy.
Was there some problem with your first version of code that makes you want to change it?
Binary data may have NULs mixed in, so the line
string bufferstring(buffer);
may truncate the data. This line:
strcpy(buffer, bufferstring);
has the same problem of truncation, and also, you need to call std::string::c_str() to get the char * representation. Use memcpy() to avoid truncation.
Lastly, I don't like the do...while() pattern.
while ((size = recvfrom(s, buffer, 128, 0, (LPSOCKADDR) &sa_in, &senderSize)) > 0) {
    string bufferstring(buffer, size);
    memcpy(buffer, bufferstring.c_str(), bufferstring.size());
    fwrite(buffer, sizeof(char), size, file);
}
If I am reading a file in C++ like this:
//Begin to read a file
FILE *f = fopen("vids/18.dat", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *m_sendingStream = (char*)malloc(pos);
fread(m_sendingStream, pos, 1, f);
fclose(f);
//Finish reading a file
I have two questions. First: does this read the entire file? (I want it to.) Second: how can I write a while loop that continues until the end of the file is reached? I have:
while (i < sizeof(m_sendingStream))
but I am not sure whether this works. I've been reading around (I've never programmed in C++ before), and I thought I could use eof(), but apparently that's bad practice.
A loop should not be necessary when reading from a file, since you will get the entire contents with your code in one go. You should still record and check the return value, of course:
size_t const n = fread(buf, pos /*bytes in a record*/, 1 /*max number of records to read*/, f);
if (n != 1) { /* error! */ }
You can also write a loop that reads until the end of the file without knowing the file size in advance (e.g. read from a pipe or growing file):
#define CHUNKSIZE 65536

char * buf = malloc(CHUNKSIZE);
{
    size_t n = 0, r = 0;
    while ((r = fread(buf + n, 1 /*bytes in a record*/, CHUNKSIZE /*max records*/, f)) != 0)
    {
        n += r;
        char * tmp = realloc(buf, n + CHUNKSIZE);
        if (tmp) { buf = tmp; }
        else { /* big fatal error */ }
    }
    if (!feof(f))
    {
        perror("Error reading file");
    }
}
This is the C style of working with files; the C++ style would use the fstream library.
As for your second question, a good way to check whether you have reached the end of the file is the feof function: after a read comes up short, feof tells you whether you stopped at end-of-file rather than on an error.
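For reference, the fstream style mentioned above might look like this minimal sketch (the helper name is my own; the stream is opened in binary mode so no byte is altered):

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file into a std::string via ifstream. The iterator-pair
// constructor consumes the stream until end-of-file, so no size probe
// or manual loop is needed.
std::string slurp(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Because std::string stores an explicit length, embedded zero bytes in the file survive this round trip.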
I am trying to read data from one PNG file, write that data to a new file, and save it. I do it like this:
FILE *fp = fopen("C:\\dev\\1.png", "rb");
fseek(fp, 0, SEEK_END);
long size = ftell(fp);
rewind(fp);
char *buffer = (char*)malloc(sizeof(char)*size);
size_t result = fread(buffer, 1, size, fp);
FILE *tmpf = fopen("C:\\dev\\1_1.png", "wb");
fputs(buffer, tmpf);
fflush(tmpf);
fclose(tmpf);
The problem is that the second file's entire content is only this: ‰PNG SUB
While debugging, I checked that long size = 652521, and size_t result has the same value...
I don't understand why I can't write all the data to the second file...
Don't use fputs; use fwrite. fputs is for strings and will stop at the first zero byte.
Change:
fputs(buffer, tmpf);
to:
fwrite(buffer, 1, size, tmpf);
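Putting the fwrite fix together with the reading code, a byte-for-byte copy might look like this minimal sketch (the helper name and the -1 error convention are my own assumptions):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Copy src to dst byte for byte. fwrite() with an explicit byte count is
// what lets embedded zero bytes survive, unlike fputs(), which stops at
// the first NUL it finds in the buffer.
int copy_file(const char *src, const char *dst)
{
    FILE *in = std::fopen(src, "rb");
    if (!in) return -1;
    std::fseek(in, 0, SEEK_END);
    long size = std::ftell(in);
    std::rewind(in);
    char *buffer = (char *)std::malloc((size_t)size);
    if (!buffer) { std::fclose(in); return -1; }
    size_t got = std::fread(buffer, 1, (size_t)size, in);
    std::fclose(in);
    FILE *out = std::fopen(dst, "wb");
    if (!out) { std::free(buffer); return -1; }
    std::fwrite(buffer, 1, got, out);
    std::fclose(out);
    std::free(buffer);
    return 0;
}
```

Writing got (the actual byte count fread reported) rather than size also keeps the output consistent if the read ever comes up short.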