fstream::read() reads empty if input size is too big - C++

I have tried to read a file using istream& read (char* s, streamsize n). The description at http://www.cplusplus.com/reference/istream/istream/read/ says:
If the input sequence runs out of characters to extract (i.e., the end-of-file is reached) before n characters have been successfully read, the array pointed to by s contains all the characters read until that point, and both the eofbit and failbit flags are set for the stream.
Because of that, I passed a very large number as n, trusting the caller to allocate a buffer big enough for the read. But I always receive 0 bytes read. I tried the following code to read a txt file of 90 bytes:
std::wstring name(L"C:\\Users\\dle\\Documents\\01_Project\\01_VirtualMachine\\99_SharedFolder\\lala.txt");
std::ifstream ifs;
ifs.open(name, std::ifstream::binary | std::ifstream::in);
if (ifs)
{
    // get length of file:
    ifs.seekg(0, ifs.end);
    int length = ifs.tellg();
    ifs.seekg(0, ifs.beg);
    char *buffer = new char[length];
    ifs.read(buffer, UINT32_MAX);
    int success = ifs.gcount();
    cout << "success: " << success << endl;
    cout << "length: " << length;
    delete[] buffer;
    ifs.close();
}
I even tried with a smaller number, e.g. 500,000, and it still failed. I have realized that n and the size of the file are related somehow: n cannot be much larger than the file size, or else the read comes back empty....
I know we could fix this easily by passing the correct size to read(), but I wonder why it happens like that? It should read until EOF and then stop, right? Could anyone explain this to me, please?
EDIT: I simply want to read to EOF using istream& read, without caring about the file size. According to the definition of istream& read(char* s, streamsize n), it should work.

ifs.read(buffer, UINT32_MAX);
The second parameter to fstream::read is std::streamsize, which is defined as (emphasis mine)...
...a signed integral type...
I therefore guess (as I don't have a Windows environment to test on at this point) that you're working on a machine where std::streamsize is 32 bits, so your UINT32_MAX ends up as -1 (while #john tested on a machine where sizeof( std::streamsize ) > 4, so his UINT32_MAX doesn't wrap into the negative).
Try again with std::numeric_limits< std::streamsize >::max()... or even better yet, use length because, well, you have the file size right there and don't have to rely on the EOF behavior of fstream::read to save you.
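For illustration, a minimal sketch of both options (file name shortened; the commented line shows the numeric_limits variant, which relies on read() stopping at EOF):
#include <fstream>
#include <iostream>
#include <limits>
#include <vector>

int main()
{
    std::ifstream ifs("lala.txt", std::ifstream::binary | std::ifstream::in);
    if (!ifs) return 1;
    // get length of file:
    ifs.seekg(0, ifs.end);
    std::streamsize length = ifs.tellg();
    ifs.seekg(0, ifs.beg);
    std::vector<char> buffer(static_cast<std::size_t>(length));
    // variant 1: rely on EOF, but with a non-negative n:
    // ifs.read(buffer.data(), std::numeric_limits<std::streamsize>::max());
    // variant 2: just use the length you already have:
    ifs.read(buffer.data(), length);
    std::cout << "read: " << ifs.gcount() << " bytes\n";
    return 0;
}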
I am not sure whether C++ changed the definition of streams from what the C standard says, but note that C's definition on binary streams states that they...
...may, however, have an implementation-defined number of null characters appended to the end of the stream.
So your assumption, or the user's, that a buffer big enough to hold the data written earlier is also big enough to hold the data read until EOF might actually fail.

Related

Read the whole binary file into a buffer, then parse it in a specific format

Here is my C++ homework. Given a binary file consisting of data units, where every data unit contains two parts: the first part is 1 char and the second part is 1 int. Read the whole file into a buffer at once and then extract all data units from the buffer.
Now I've read the file into a buffer successfully like this:
char* readBinaryFile(const char* fileName) {
    ifstream file(fileName, ios::binary || ios::ate);
    // get the size of the file
    streampos beg, end;
    beg = file.tellg();
    file.seekg(0, ios::end);
    end = file.tellg();
    long size = end - beg;
    char* buffer = new char[size];
    // now read the file into the buffer
    file.seekg(0, ios::beg);
    file.read(buffer, size);
    file.close();
    return buffer;
}
So my problem is how can I get the data unit from the buffer?
I'm not going to write the code for you, but think about this for a moment...
At buffer[0] is your first char. At buffer[1] through buffer[4] is your first int. It repeats, so buffer[5] is the character for the second set of data.
There are five bytes for the character and the int together. If you know the amount of data you've read, you could divide that by 5 and know the number of "sets" of data there is.
You can now use something like a for loop to iterate from zero to the numbers of sets minus one. Let's say this iterator variable is i, then you could access the character of each "set" of data with buffer[i * 5], the first byte of the int at buffer[i * 5 + 1], etc.
So, a for loop and a little bit of math will help you extract the information from that buffer. You'll have 5 individual bytes, and you'll need to reassemble 4 of those bytes back into an int. There are a variety of ways of accomplishing this, which I'll let you attempt to discover.
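If you do get stuck, a minimal sketch of such a loop might look like this (assuming you also return or pass along size from readBinaryFile, and that the ints were written in the machine's native byte order):
#include <cstring>

const int unitSize = sizeof(char) + sizeof(int); // 5 bytes per unit when int is 4 bytes
int numUnits = size / unitSize;
for (int i = 0; i < numUnits; ++i) {
    char c = buffer[i * unitSize];
    int n;
    std::memcpy(&n, buffer + i * unitSize + 1, sizeof(int)); // reassemble the int bytes
    // ... use c and n ...
}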
Could your issue stem from the fact that you're using:
ios::binary || ios::ate
when I think you mean:
ios::binary | ios::ate
The former evaluates to 1, since "binary logical-or at-end" is true; the latter is a bitmask that says "open this file in binary mode, positioned at the end". The way you have written it is actually the equivalent of
ios::app
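With the corrected operator, the open line in readBinaryFile becomes:
ifstream file(fileName, ios::binary | ios::ate);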

ifstream and ofstream issue

Just this:
int size = getFileSize(path); //Listed below
ifstream fs(path, ios::in);
ofstream os(path2, ios::out);
//Check - both streams are valid
char buff[CHUNK_SIZE]; //512
while (size > CHUNK_SIZE)
{
    fs >> buff;
    os << buff;
    size -= CHUNK_SIZE;
}
char* dataLast = new char[size];
fs >> dataLast;
os << dataLast;
fs.close();
os.close();
//Found on SO, works fine
int getFileSize(string path)
{
    FILE *pFile = NULL;
    if (fopen_s(&pFile, path.c_str(), "rb"))
    {
        return 0;
    }
    fseek(pFile, 0, SEEK_END);
    int Size = ftell(pFile);
    fclose(pFile);
    return Size;
}
The file at path2 is corrupted and less than 1 KB (the initial file is 30 KB).
I don't need advice on how to copy a file; I am curious about what is wrong with this example.
First, an important warning: never (as in really never) use the formatted input operator for char* without setting the width()! You open yourself up to a buffer overrun. This is basically the C++ version of writing gets(), which was bad enough to be removed (not just deprecated) from the C standard! If you insist on using formatted input with char* (normally you are much better off using std::string), set the width, e.g.:
char buffer[512];
in >> std::setw(sizeof(buffer)) >> buffer;
OK, with this out of the way: it seems you actually want to change two important things:
You probably don't want to use formatted input, i.e., operator>>(): the formatted input operators start off by skipping whitespace. When reading into char*, extraction also stops upon reaching whitespace (or, when the width() is non-zero, after having read as many characters as fit while still leaving space to store a terminating null; note that the set width() is reset to 0 after each of these reads). That is, you probably want to use unformatted input, e.g., in.read(buffer, sizeof(buffer)), which sets in.gcount() to the number of characters actually read, which may be less than the size parameter, e.g., at the end of the stream.
You probably should open the file in std::ios_base::binary mode. Although it doesn't matter on some systems (e.g., POSIX systems), on others reading in text mode merges a line-end sequence, e.g. \r\n on Windows, into the single line-end character \n. Likewise, when writing a \n in text mode, it will be replaced by a line-end sequence on some systems, i.e., you probably also want to open the output stream in binary mode.
The input and output operators, when used with strings (which is what buff is, from the library's point of view), read space-delimited words only.
If you want to read chunks, then use std::istream::read, and use std::istream::gcount to get the number of bytes actually read. Then write with std::ostream::write.
And if the data in the file is binary, you should use the binary open mode.
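Putting those points together, a minimal sketch of the copy loop could look like this (same path, path2, and CHUNK_SIZE as in the question):
std::ifstream fs(path, std::ios::in | std::ios::binary);
std::ofstream os(path2, std::ios::out | std::ios::binary);
char buff[CHUNK_SIZE];
while (fs.read(buff, CHUNK_SIZE) || fs.gcount() > 0) {
    os.write(buff, fs.gcount()); // write exactly what was read, including a short final chunk
}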

C++ reading leftover data at the end of a file

I am taking input from a file in binary mode using C++; I read the data into unsigned ints, process them, and write them to another file. The problem is that sometimes, at the end of the file, there might be a little data left that isn't large enough to fill an unsigned int; in this case, I want to pad the data with 0s until it is large enough to fill an unsigned int, and record how much padding was needed.
Here is how I am reading from the file:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if (fin) {
    unsigned int m;
    while (fin >> m) {
        // processing the data and writing to another file here
    }
    // TODO: read the remaining data and pad it here prior to processing
} else {
    // output to error stream and exit with failure condition
}
The TODO in the code is where I'm having trouble. After the file input finishes and the loop exits, I need to read in the remaining data at the end of the file that was too small to fill an unsigned int. I then need to pad the end of that data with binary 0s, recording enough about how much padding was done to be able to un-pad the data in the future.
How is this done, and is this already done automatically by C++?
NOTE: I cannot read the data into anything but an unsigned int, as I am processing the data as if it were an unsigned integer for encryption purposes.
EDIT: It was suggested that I simply read what remains into an array of chars. Am I correct in assuming that this will read in ALL remaining data from the file? It is important to note that I want this to work on any file that C++ can open for input and/or output in binary mode. Thanks for pointing out that I failed to include the detail of opening the file in binary mode.
EDIT: The files my code operates on are not created by anything I have written; they could be audio, video, or text. My goal is to make my code format-agnostic, so I can make no assumptions about the amount of data within a file.
EDIT: ok, so based on constructive comments, this is something of the approach I am seeing, documented in comments where the operations would take place:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if (fin) {
    unsigned int m;
    while (fin >> m) {
        // processing the data and writing to another file here
    }
    // 1: declare char array
    // 2: fill it with what remains in the file
    // 3: pad the rest of it until it's the same size as an unsigned int
} else {
    // output to error stream and exit with failure condition
}
The question, at this point, is this: is this truly format-agnostic? In other words, are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size? I should know this, I know, but I've got to ask it anyway.
Are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size?
No data type can be smaller than a byte, and your file is represented as an array of char, meaning each character is one byte. Thus it is impossible to get anything other than a whole-number measure in bytes.
Here are steps one, two, and three as per your post:
while (fin >> m)
{
// ...
}
fin.clear(); // clear the failbit left by the failed extraction so the stream can be read again
std::ostringstream buffer;
buffer << fin.rdbuf(); // slurp everything that remains in the file
std::string contents = buffer.str();
// pad with 0s up to the size of an unsigned int
contents.resize(sizeof(unsigned int), '\0');
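If you need to un-pad later, record contents.size() before the resize; the padded bytes can then be reassembled into the final unsigned int, e.g. (a sketch assuming native byte order, needing <cstring>):
unsigned int last = 0;
std::memcpy(&last, contents.data(), sizeof(unsigned int));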

C++ copying files. Short on data

I'm trying to copy a file, but whatever I try, the copy seems to be a few bytes short.
_file is an ifstream set to binary mode.
void FileProcessor::send()
{
    // If no file is opened, return
    if (!_file.is_open()) return;
    // Reset position to beginning
    _file.seekg(0, ios::beg);
    // Result buffer
    char * buffer;
    char * partBytes = new char[_bufferSize];
    //Packet *p;
    // Read the file and send it over the network
    while (_file.read(partBytes, _bufferSize))
    {
        //buffer = Packet::create(Packet::FILE, std::string(partBytes));
        //p = Packet::create(buffer);
        //cout << p->getLength() << "\n";
        //writeToFile(p->getData().c_str(), p->getLength());
        writeToFile(partBytes, _bufferSize);
        //delete[] buffer;
    }
    //cout << *p << "\n";
    delete[] partBytes;
}
_writeFile is the file to be written to.
void FileProcessor::writeToFile(const char *buffer, unsigned int size)
{
    if (_writeFile.is_open())
    {
        _writeFile.write(buffer, size);
        _writeFile.flush();
    }
}
In this case I'm trying to copy a zip file.
But opening both the original and the copy in Notepad, I noticed that while they look identical, the copy is missing a few bytes at the end.
Any suggestions?
You are assuming that the file's size is a multiple of _bufferSize. You have to check what was left in the buffer after the while loop:
while (_file.read(partBytes, _bufferSize)) {
    writeToFile(partBytes, _bufferSize);
}
if (_file.gcount())
    writeToFile(partBytes, _file.gcount());
Your while loop will terminate when it fails to read _bufferSize bytes because it hits an EOF.
The final call to read() might have read some data (just not a full buffer) but your code ignores it.
After your loop you need to check _file.gcount() and if it is not zero, write those remaining bytes out.
Are you copying from one type of media to another? Perhaps different sector sizes are causing the apparent weirdness.
What if _bufferSize doesn't divide evenly into the size of the file? That could leave the final partial chunk unwritten.
You don't want to always do writeToFile(partBytes, _bufferSize); since it's possible that fewer than _bufferSize bytes were read at the end. Also, as pointed out in the comments on this answer, the ifstream is no longer "true" once EOF is reached, so the last chunk isn't copied (this is your posted problem). Instead, use gcount() to get the number of bytes read:
do
{
    _file.read(partBytes, _bufferSize);
    writeToFile(partBytes, (unsigned int)_file.gcount());
} while (_file);
For comparisons of zip files, you might want to consider using a non-text editor to do the comparison; HxD is a great (free) hex editor with a file compare option.

C++ Read Binary file containing numbers of type double

I have a binary file that contains numbers of type double.
The example input file is available here: www.bobdanani.net/download/A.0.0
I would like to read the file and print the numbers in it.
This is what I have done:
char* buffer;
int length;
string filename = "A.0.0";
ifstream ifs;
ifs.open(filename.c_str(), ios::in | ios::binary);
// get length of file:
ifs.seekg(0, ios::end);
length = ifs.tellg();
ifs.seekg(0, ios::beg);
// allocate memory:
buffer = new char[length];
// read data as a block:
ifs.read(buffer, length);
ifs.close();
cout.write (buffer,length);
cout << buffer << endl;
delete[] buffer;
I have also tried casting to double when printing the numbers, but I got strange characters. What is the best way to do this? I need the data in this binary file as input to a function for a parallel program, but that is outside the scope of this question.
While I could be wrong, since you said the numbers are separated by a tab/space, I'm willing to bet this is actually ASCII data, not raw binary data. Therefore the best way to work with the floating point values would be to use operator>> on the ifstream object and extract into a double. That does an automatic conversion of the input text into a double, whereas what you've done merely copies the character bytes that compose a floating point value, but are not a floating point value themselves. Additionally, if you try to output your buffer like a string, it is not explicitly null-terminated, so printing will keep reading past its end until it encounters a null terminator, or until you get a segmentation fault from accessing memory the OS won't let you access. Either way, in the end, your buffer is not a representation of a double data type.
So you would have something like:
double my_double_val;
ifs.open(filename.c_str());
if (ifs)
{
    ifs >> my_double_val;
}
else
{
    cerr << "Error opening file" << endl;
}
ifs.close();
cout << "Double floating point value: " << my_double_val << endl;
cout.write (buffer,length);
Don't do this! The above is going to dump the binary data to standard output.
cout << buffer << endl;
Don't do this either! The above will dump the binary data up to the first byte that happens to be zero to standard output. If there is no such byte, this will just keep on going past the end of the buffer (so undefined behavior).
If the buffer truly does contain doubles, and only doubles, you can do something nasty like
double * dbuf = reinterpret_cast<double*>(buffer);
int dlength = length / sizeof(double);
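You could then walk the reinterpreted array, e.g. (a sketch that still assumes the file really holds native-format doubles):
for (int i = 0; i < dlength; ++i)
    std::cout << dbuf[i] << std::endl;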
Use the system function call in C++ (assuming you are using a Unix OS) and pass 'od -e filename' as the argument. You can then easily pipe the values it returns and read them. This is one approach; of course there are many others.