read in mingw under windows does not read entire file. Why? - c++

Using mingw under windows the following code which works under linux does not work:
int fh = open(filename, O_RDONLY);
const int size=100000;
int bytesRead = read(fh, buffer, size);
The file is bigger than 100k yet bytes read is just 232. I think this has something to do with binary files in windows?
This code with ifstream will work in windows and Linux.
ifstream in(filename, ios::binary);
const int size=100000;
in.read(buffer, size);
Is there a way to make level 2 IO calls work on windows as well?

Reading manuals is very useful technics.
_read
Return Value
_read returns the number of bytes read, which might be less than buffer_size if there are fewer than buffer_size bytes left in the file, or if the file was opened in text mode. In text mode, each carriage return-line feed pair \r\n is replaced with a single line feed character \n. Only the single line feed character is counted in the return value. The replacement does not affect the file pointer.
Text and Binary Mode File I/O
File I/O operations take place in one of two translation modes, text or binary, depending on the mode in which the file is opened. Data files are usually processed in text mode.
Use the function _set_fmode to change the default mode for newly opened files. Use _get_fmode to find the current default mode. The initial default setting is text mode (_O_TEXT).
Change the default translation mode directly by setting the global variable _fmode in your program. The function _set_fmode sets the value of this variable, but it can also be set directly.
open(filename, O_RDONLY); opens files in text mode in Windows by default.
open(filename, O_RDONLY | O_BINARY); opens files in binary mode in Windows and further read will read all requested bytes if there's enough bytes left in the file.

Related

C++ - Missing end of line characters in file read

I am using the C++ streams to read in a bunch of files in a directory and then write them to another directory. Since these files may be of different types, I am using a the generic ios::binary flag when reading/writing these files.
Example code below:
std::fstream inf( "ex.txt", std::ios::in | std::ios::binary);
char c;
while( inf >> c ) {
// writing to another file in binary format
}
The issue I have is that in the case of files containing text, the end of line characters in these text files are not being written to the output file.
Edit: Or at least they do not appear to be as when the newly written file is opened, there is only a single continuous line of characters.
Edit again: The problem (of the continuous string) appears to persist even when the read / write is made in text mode.
Thus, I was wondering if there was a way to check if a file has text or binary and then read/write it appropriately. Else, is there any way to preserve the end of line characters even when opening the file in binary format?
Edit: I am using the g++ 4.8.2 compiler
When you want to manipulate bytes, you need to use read and write methods, not >> << operators.
You can get the intended behavior with inp.flags(inp.flags() & ~std::ios_base::skipws);, though.

Difference in using read/write when stream is opened with/without ios::binary mode

In my experiments with the following code snippet, I did not find any particular difference whether i created the streams with/without the ios:binary mode:
int main()
{
ifstream ostr("Main.cpp", ios::in | ios::binary | ios::ate);
if (ostr.is_open())
{
int size = ostr.tellg();
char * memBlock = new char[size + 1];
ostr.seekg(0, ios::beg);
ostr.read(memBlock, size);
memBlock[size] = '\0';
ofstream file("trip.cpp", ios::out | ios::binary);
file.write(memBlock, size);
ostr.close();
}
}
Here I am trying to copy the original source file into another file with a different name.
My question is what is the difference between the read/write calls(which are associated with binary file IO) when an fstream object is opened with/without ios::binary mode ?
Is there any advantage of using the binary mode ? when to and when not to use it when doing file IO ?
The only difference between binary and text mode is how the '\n' character is treated.
In binary mode there is no translation.
In text mode \n is translated on write into a the end of line sequence.
In text mode end of line sequence is translated on read into \n.
The end of line sequence is platform dependant.
Examples:
ASCII based systems:
LF ('\0x0A'): Multics, Mac OS X, BeOS, Amiga, RISC OS
CRLF ('\0x0D\0x0A'): Microsoft Windows, DEC TOPS-10, RT-11
CR: ('\0x0D'): TRS-80, Mac OS Pre X
RS: ('\0x1E'): QNX pre-POSIX implementation.
When you want to write files in binary, with no modifications taking place to your data, specify the ios::binary flag. When you want to write files in text mode, don't specify ios::binary, and you may get things like line ending translation. If you're on a UNIX-like platform, binary and text formats are the same, so you won't see any difference.

Mismatch between characters put and read

I'm trying to write a Huffman encoder but I'm getting some compression errors. I identified the problem as mismatches between characters that were put() to the ofstream and the characters read() from the same file.
One specific instance of this problem :
The put() writes ASCII character 10 (Line feed)
The read() reads ASCII character 13 (Carriage return)
I thought read and put read and write raw data ( no character translations ) I'm not sure why this is happening. Can someone help me out?
Here is the ofstream instance for writing the compressed file:
std::ofstream compressedFileStream(getCompressedFileName(),std::ios::binary||std::ios::ate);
and the ifstream instance for reading the same
std::ifstream fileInput(getFileName()+".huf",std::ios::binary);
The code is running on Windows 7 and all streams in the program are opened in binary mode.
Not opening in binary mode due to a typo:
std::ofstream compressedFileStream(getCompressedFileName(),std::ios::binary||std::ios::ate)
should be:
std::ofstream compressedFileStream(getCompressedFileName(),std::ios::binary|std::ios::ate)
// ^
|, not ||.
The symptoms show that you are creating the ofsteam with text mode or you are creating it using a filedesc that is opened in text mode.
You will want to pass ios::binary to it at construction time or it may run in text mode on Windows.
After you added the code, the reason proves to be a typo;
std::ios::binary||std::ios::ate
should be
std::ios::binary|std::ios::ate
On Windows, if you are writing binary data, you need to open the file with the appropriate attributes.
Similarly, if you are reading binary data, you need to open the file with the appropriate attributes.

feof() returning true when EOF is not reached

I'm trying to read from a file at a specific offset (simplified version):
typedef unsigned char u8;
FILE *data_fp = fopen("C:\\some_file.dat", "r");
fseek(data_fp, 0x004d0a68, SEEK_SET); // move filepointer to offset
u8 *data = new u8[0x3F0];
fread(data, 0x3F0, 1, data_fp);
delete[] data;
fclose(data_fp);
The problem becomes, that data will not contain 1008 bytes, but 529 (seems random). When it reaches 529 bytes, calls to feof(data_fp) will start returning true.
I've also tried to read in smaller chunks (8 bytes at a time) but it just looks like it's hitting EOF when it's not there yet.
A simple look in a hex editor shows there are plenty of bytes left.
Opening a file in text mode, like you're doing, makes the library translate some of the file contents to other stuff, potentially triggering a unwarranted EOF or bad offset calculations.
Open the file in binary mode by passing the "b" option to the fopen call
fopen(filename, "rb");
Is the file being written to in parallel by some other application? Perhaps there's a race condition, so that the file ends at wherever the read stops, when the read is running, but later when you inspect it the rest has been written. That would explain the randomness, too.
Maybe it's a difference between textual and binary file. If you're on Windows, newlines are CRLF, which is two characters in file, but converted to only one when read. Try using fopen(..., "rb")
I can't see your link from work, but if your computer claims no more bytes exist, I'd tend to believe it. Why don't you print the size of the file rather than doing things by hand in a hex editor?
Also, you'd be better off using level 2 I/O the f-calls are ancient C ugliness, and you're using C++ since you have new.
int fh =open(filename, O_RDONLY);
struct stat s;
fstat(fh, s);
cout << "size=" << hex << s.st_size << "\n";
Now do your seeking and reading using level 2 I/O calls, which are faster anyway, and let's see what the size of the file really is.

Difference between files written in binary and text mode

What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On windows, however, this is not the case. If you take a look at the description of the fopen() function at: MSDN, you will see that specifying the "t" option will have the following effect:
line feeds ('\n') will be translated to '\r\n" sequences on output
carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a ctrl-z character (character 26) and that character removed, if possible. It will also interpret the presence of that character as being the end of file. This is an unfortunate holdover from the days of CPM (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the ctrl-z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read any binary data in text mode won't work, it will be corrupted. You can read text ok in binary mode though - it just won't do automatic translations of "\n" to "\r\n".
See fopen
Additionally, when you fopen a file with "rt" the input is terminated on a Crtl-Z character.
Another difference is when using fseek
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support the SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to std::ftell on a stream associated with the same file (which only works with origin of SEEK_SET.
Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Crtl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>
int main() {
FILE *f;
char string[] = "A\nB";
int len;
len = strlen(string);
printf("As you'd expect string has %d characters... ", len); /* prints 3*/
f = fopen("test.txt", "w"); /* Text mode */
fwrite(string, 1, len, f); /* On windows "A\r\nB" is writen */
printf ("but %ld bytes were writen to file", ftell(f)); /* prints 4 on Windows, 3 on Linux*/
fclose(f);
return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were writen to file
Of course you can also open the file with a text editor like Notepad++ and see yourself the characters:
The inverse conversion is performed on Windows when reading the file in text mode.
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement is that we can store our current position in the file (we used fgetpos), close the file and then later to reopen the file and seek to that position (we used fsetpos).
However, where a file has mixtures of line endings then this process failed to seek to the actual same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
In 'w' mode, the file is opened in write mode and the basic coding is 'utf-8'
in 'wb' mode, the file is opened in write -binary mode and it is resposible for writing other special characters and the encoding may be 'utf-16le' or others