Ifstream read strange behavior - C++

I'm using C++ to read some chars from a file and store them in a buffer, however, I'm witnessing strange behavior with ifstream's read function.
To start with, I'm using this code snippet to get the file's length:
input.seekg (0, input.end);
int length = input.tellg();
input.seekg (0, input.beg);
After that, I call read() to get length bytes from the file.
It works fine, except for one thing:
If I use input.gcount() to see how many bytes were read, this number is much less than the length we got above (but it seems to show the actual number of bytes in the file).
Do you guys know anything about the difference between the file's length, found by using tellg(), and the number of bytes read afterwards, as reported by gcount()?
Sorry for any formatting issues (I'm using my phone).
Thanks a lot.
Edit :
That's the code (more or less) I'm using:
ifstream input("test.txt");
input.seekg (0, input.end);
int length = input.tellg();
input.seekg (0, input.beg);
char * buffer = new char [length];
input.read(buffer, length);
int extracted = input.gcount();

Fstream's tellg / seekg returning higher value than expected
Just found this link... It explains it nicely!
Turns out I need to search a little bit more before posting...
Thank you all for your answers
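For future readers, the short version of what that link describes: the stream is opened in text mode, so on Windows each \r\n in the file is delivered as a single \n by read(), which makes gcount() smaller than the tellg() length. A minimal sketch of the binary-mode variant, assuming the same test.txt as above:
ifstream input("test.txt", std::ios::binary);
input.seekg (0, input.end);
int length = input.tellg();
input.seekg (0, input.beg);
char * buffer = new char [length];
input.read(buffer, length);
// with no newline translation, input.gcount() now matches length
delete[] buffer;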

Related

How to read a large metadata file into an array in C++

I am currently coding a search algorithm to find all occurrences of a specific byte (char) array in the metadata of a video. I am trying to load the entire contents of a very large file (about 2GB) into an array, but I keep getting a std::bad_alloc exception when I run the program because of the size of the file.
I think one solution would be to create a buffer in order to "chunk" the contents of the file, but I am not sure how to go about coding this.
So far, my code looks like this:
string file = "metadata.ts";
ifstream fl(file);
fl.seekg(0, ios::end);
size_t len = fl.tellg();
char *byteArray = new char[len];
fl.seekg(0, ios::beg);
fl.read(byteArray, len);
fl.close();
It works for smaller video files, but when I try a file that's slightly under 2GB, it crashes with a std::bad_alloc exception.
Thanks in advance for any help - I'm open to all solutions.
EDIT:
I have already checked out the other solutions on Stack Overflow, and they are not exactly what I'm looking for. I am trying to "chunk" the data and use a buffer to put it into an array, which is not what the other solutions are doing.
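A minimal sketch of that chunked approach, with a hypothetical CHUNK_SIZE and the pattern search left as a comment (both are my assumptions, not from the original post):
#include <fstream>
#include <vector>

int main() {
    std::ifstream fl("metadata.ts", std::ios::binary);
    const std::size_t CHUNK_SIZE = 64 * 1024 * 1024; // 64MB per pass; tune as needed
    std::vector<char> buffer(CHUNK_SIZE);
    while (fl) {
        fl.read(buffer.data(), buffer.size());
        std::streamsize got = fl.gcount(); // bytes actually read this pass
        if (got <= 0) break;
        // search buffer[0..got) for the byte array here; note that a match
        // can straddle a chunk boundary, so keep the last (patternLength - 1)
        // bytes around and prepend them to the next chunk
    }
}
Since the buffer is reused, memory use stays at CHUNK_SIZE no matter how large the file is.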

Sending Binary File Data via Google Protobuf

I have my protobuf message set up fine, it seems; all the other fields I have transmit correctly across the network and are not truncated. I have only one problem: when I read the binary data of a picture or file and send it through Google protobuf as a bytes field, on the other side it contains only the first 4 elements of the array. If the picture is, say, 200kb, on the other end it comes out as 1kb (basically just a header or identifier). This problem is kind of complex, so I will try to give a rundown. Sorry if I make this impossible to understand; I may be going about this completely the wrong way.
The example below is conceptual work written in class, so it may well contain small errors. The code compiles at home; if there is a typo, let me know and I can fix it.
FILE* file;
FILE* ofile;
file = fopen("red.png", "rb");
fseek(file, 0, SEEK_END);
long fSize = ftell(file);
rewind(file);
BYTE* ret = new BYTE[fSize];
fread(ret, 1, fSize, file);
fclose(file);
char dataStream[1024]; //yes it is large enough
myPacket.set_file(ret);
//set other fields here
myPacket.SerializeToArray(dataStream,sizeof(dataStream));
//send through sockets below, works for all but file field.
I can include more when I get back home to my main work computer; sorry, I was just hoping I could let this stew while at class. If this is not enough info, feel free to give me the smackdown, it's alright; I'm just looking for advice. I also know that certain image formats can be read certain ways, but I was able to copy a PNG and rewrite it through binary locally, just not over protobuf.
Thanks for reading my pseudo book guys, I am finally trying to leap into improving my knowledge.
Edit: fixed a quickly-typed pointer error, (&ret) to (ret). Also, should the size there be sizeof(myPacket) instead?
You have written this:
char dataStream[1024]; //yes it is large enough
But how could a 1024-byte buffer be large enough if you want to store 200,000 bytes in it?
Better allocate a bigger buffer on the heap, e.g.:
std::vector<char> dataStream(500000);
myPacket.SerializeToArray(&dataStream[0], dataStream.size());
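Two more hedged observations, since only fragments of the code are shown: for a bytes field, the generated setter also has an overload taking an explicit length, and passing it avoids any chance of the data being cut off at an embedded zero byte (PNG data is full of them), which may be what truncates it to a few elements. Letting protobuf size the output itself also sidesteps the buffer guesswork:
std::string dataStream;
myPacket.set_file(ret, fSize);           // explicit length; binary data contains '\0' bytes
myPacket.SerializeToString(&dataStream); // grows to the full serialized size automatically
// then send dataStream.data() / dataStream.size() through the socket
Depending on the protobuf version, the pointer argument may need a cast to const char* or const void*.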

How to correctly buffer lines using fread

I want to use fread() for I/O for some reason (speed, among other things). I have a file with lines of different sizes. Using code like:
while (!feof(fp)) {
    fread(buffer, 1, BUFFER_SIZE, fp); // BUFFER_SIZE on the order of 500MB
    // process buffer
}
the last line may be read only partially, and we would have to read it again on the next pass. So how can I force fread() to continue from the beginning of that last line?
Or, if possible, how can I force fread() to read more than 500MB, until reaching a \n or another specific character?
Thanks All
Ameer.
Assuming a buffer of bytes in which you have reverse-searched and found a \n character at position pos, you want to roll back by the length of the buffer minus pos. Call this step.
You can use fseek to move the file pointer back by this much:
int fseek( FILE *stream, long offset, int origin );
In your case:
int ret = fseek(stream, -step, SEEK_CUR);
This will involve re-reading part of the file, and a fair bit of jumping around - the comments have suggested alternative ways that may be quicker.
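A minimal sketch of that rollback, under stated assumptions: the FILE* is opened in binary mode, process() is a stand-in for the real work, and the buffer is far smaller than 500MB for illustration:
#include <stdio.h>

#define BUF_SIZE (1 << 20) /* 1MB for illustration; scale up as needed */

static void process(const char *buf, size_t len) {
    fwrite(buf, 1, len, stdout); /* stand-in: echo the complete lines */
}

static void read_by_lines(FILE *fp) {
    static char buffer[BUF_SIZE];
    size_t got;
    while ((got = fread(buffer, 1, BUF_SIZE, fp)) > 0) {
        if (got == BUF_SIZE) {
            /* a full buffer probably cut a line in half:
               reverse-scan for the last '\n' */
            size_t pos = got;
            while (pos > 0 && buffer[pos - 1] != '\n')
                pos--;
            if (pos > 0) {
                long step = (long)(got - pos);  /* bytes past the last '\n' */
                fseek(fp, -step, SEEK_CUR);     /* roll back so the next fread
                                                   restarts at the cut line */
                got = pos;
            }
            /* if pos == 0 there is no '\n' at all: the line is longer than
               the buffer and needs separate handling */
        }
        process(buffer, got);
    }
}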

Is istream::seekg with ios_base::end reliable?

Consider I created a file using this way:
std::ofstream osf("MyTextFile.txt");
std::string buffer = "spam\neggs\n";
osf.write(buffer.data(), buffer.length());
osf.close();
When I was trying to read that file in the following way, I realized that more characters than present were read.
std::ifstream is("MyTextFile.txt");
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
char * buffer = new char [length];
is.read (buffer,length);
//work with buffer
delete[] buffer;
For example, if the file contains spam\neggs\n, then this procedure reads 12 characters instead of 10. The first 10 chars are spam\neggs\n as expected, but there are 2 more, each with the integer value 65533.
Moreover, this problem happens only when \n is present in the file. For example, there is no problem if the file contains spam\teggs\t instead.
The question is:
Am I doing anything wrong, or does this procedure not work the way it should?
Bonus Q: Can you suggest an alternative for reading the whole file at once?
Note: I found this way here.
The problem is that you wrote the string
"spam\neggs\n"
initially to an ofstream without setting the std::ios::binary flag at open time (or in the constructor). This causes the runtime to translate to the "native text format", i.e., to convert each \n to \r\n on output (as you are on Windows). So, after being written, the contents of your file were actually:
"spam\r\neggs\r\n"
(i.e., 12 chars). That was returned by
int length = is.tellg();
But when you tried to read 12 chars, you got
"spam\neggs\n"
back, because the runtime converted each \r\n back to \n.
As a final piece of advice: please don't use new char[length]... use std::string (and reserve) so you won't leak memory. And if your file can be very big, it may not be a good idea to slurp the whole file into memory at once anyway.
Just an idea, since the number 2 corresponds to the count of \ns: Are you doing this on Windows? It might have something to do with the file actually containing \r\n. What happens if you open the file in binary mode (std::ios::binary)?
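For illustration, a minimal sketch of that binary-mode variant (same file contents as above):
std::ofstream osf("MyTextFile.txt", std::ios::binary);
std::string buffer = "spam\neggs\n";
osf.write(buffer.data(), buffer.length()); // exactly 10 bytes, no \n -> \r\n translation
osf.close();

std::ifstream is("MyTextFile.txt", std::ios::binary);
is.seekg (0, is.end);
int length = is.tellg(); // now 10, matching what read() will extract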
Can you suggest an alternative for reading the whole file at once?
Yes:
std::ifstream is("MyTextFile.txt");
std::string str( std::istreambuf_iterator<char>{is}, {} ); // requires <iterator>
str now contains the file. Does this solve your problem?
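If the file can be large, one small refinement (my addition, not part of the original answer) is to reserve the capacity up front so the string doesn't reallocate repeatedly while reading:
std::ifstream is("MyTextFile.txt");
is.seekg (0, is.end);
std::string str;
str.reserve(is.tellg()); // hint the final size
is.seekg (0, is.beg);
str.assign(std::istreambuf_iterator<char>{is}, {}); // requires <iterator>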

feof() returning true when EOF is not reached

I'm trying to read from a file at a specific offset (simplified version):
typedef unsigned char u8;
FILE *data_fp = fopen("C:\\some_file.dat", "r");
fseek(data_fp, 0x004d0a68, SEEK_SET); // move filepointer to offset
u8 *data = new u8[0x3F0];
fread(data, 0x3F0, 1, data_fp);
delete[] data;
fclose(data_fp);
The problem is that data will not contain 1008 bytes, but 529 (the number seems random). Once it reaches those 529 bytes, calls to feof(data_fp) start returning true.
I've also tried to read in smaller chunks (8 bytes at a time) but it just looks like it's hitting EOF when it's not there yet.
A simple look in a hex editor shows there are plenty of bytes left.
Opening a file in text mode, like you're doing, makes the library translate some of the file contents to other things, potentially triggering an unwarranted EOF or bad offset calculations.
Open the file in binary mode by passing the "b" option to the fopen call:
fopen(filename, "rb");
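For completeness, a sketch of the question's snippet with that one change (I've also swapped fread's size and count arguments so the return value is a byte count; that part is my adjustment, not the original answer's):
FILE *data_fp = fopen("C:\\some_file.dat", "rb"); // "b": no text-mode translation
fseek(data_fp, 0x004d0a68, SEEK_SET);
u8 *data = new u8[0x3F0];
size_t got = fread(data, 1, 0x3F0, data_fp); // should now be the full 0x3F0 (1008) bytes
delete[] data;
fclose(data_fp);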
Is the file being written to in parallel by some other application? Perhaps there's a race condition: the file ends wherever the read stops while the read is running, but by the time you inspect it, the rest has been written. That would explain the randomness, too.
Maybe it's a difference between textual and binary file. If you're on Windows, newlines are CRLF, which is two characters in file, but converted to only one when read. Try using fopen(..., "rb")
I can't see your link from work, but if your computer claims no more bytes exist, I'd tend to believe it. Why not print the size of the file rather than checking by hand in a hex editor?
Also, you'd be better off using level 2 I/O; the f-calls are ancient C ugliness, and you're clearly using C++, since you have new.
int fh = open(filename, O_RDONLY);
struct stat s;
fstat(fh, &s);
cout << "size=" << hex << s.st_size << "\n";
Now do your seeking and reading using level 2 I/O calls, which are faster anyway, and let's see what the size of the file really is.
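A sketch of what that could look like, reusing the offset and length from the question; this assumes a POSIX-style environment (on Windows/MSVC the _open/_read/_lseek variants in <io.h> are the close equivalents):
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iostream>

int main() {
    int fh = open("C:\\some_file.dat", O_RDONLY);
    struct stat s;
    fstat(fh, &s);
    std::cout << "size=" << std::hex << s.st_size << "\n";

    unsigned char data[0x3F0];
    lseek(fh, 0x004d0a68, SEEK_SET);            // same offset as the question
    ssize_t got = read(fh, data, sizeof(data)); // no text-mode translation here
    std::cout << "read " << std::dec << got << " bytes\n";
    close(fh);
}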