Difference between files written in binary and text mode - c++

What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);

I believe that most platforms will ignore the "t" (text mode) option when dealing with streams. On Windows, however, this is not the case. If you take a look at the description of the fopen() function on MSDN, you will see that specifying the "t" option has the following effects:
line feeds ('\n') will be translated to "\r\n" sequences on output;
carriage return/line feed sequences will be translated to line feeds ('\n') on input.
If the file is opened in append mode, the end of the file will be examined for a Ctrl-Z character (character 26) and, if possible, that character will be removed. Its presence will also be interpreted as the end of the file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children unto the third or fourth generation). Contrary to previously stated opinion, the Ctrl-Z character will not be appended.

In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read binary data in text mode won't work: it will be corrupted. You can read text fine in binary mode, though; it just won't do the automatic translation of "\n" to "\r\n".
See fopen

Additionally, when you fopen a file with "rt", input is terminated at a Ctrl-Z character.

Another difference is when using fseek
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to std::ftell on a stream associated with the same file (which only works with an origin of SEEK_SET).
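The text-mode restriction can be respected portably by seeking only to values previously returned by ftell. A minimal sketch of the pattern (the helper name and file layout are hypothetical):

```cpp
#include <cstdio>

// Read the second line of a text file twice, seeking only to a
// position previously returned by ftell - the portable pattern that
// works even when ftell returns an opaque "cookie" in text mode.
// Returns the first character of the re-read line, or EOF on error.
int reread_second_line(const char *path) {
    FILE *f = fopen(path, "r");           // text mode
    if (!f) return EOF;
    char line[128];
    if (!fgets(line, sizeof line, f)) { fclose(f); return EOF; }
    long pos = ftell(f);                  // a value fseek may legally accept
    fgets(line, sizeof line, f);          // read the second line once...
    fseek(f, pos, SEEK_SET);              // ...then jump back to its start...
    if (!fgets(line, sizeof line, f)) { fclose(f); return EOF; }
    fclose(f);
    return (unsigned char)line[0];        // ...and re-read it
}
```

Seeking to an arbitrary computed offset, by contrast, is only well-defined in binary mode.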

Even though this question has already been answered and clearly explained, I think it would be interesting to show the main issue (the translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>
int main() {
    FILE *f;
    char string[] = "A\nB";
    int len;
    len = strlen(string);
    printf("As you'd expect string has %d characters... ", len); /* prints 3 */
    f = fopen("test.txt", "w"); /* text mode */
    fwrite(string, 1, len, f); /* on Windows "A\r\nB" is written */
    printf("but %ld bytes were written to file", ftell(f)); /* prints 4 on Windows, 3 on Linux */
    fclose(f);
    return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were written to file
Of course you can also open the file with a text editor like Notepad++ and see the characters for yourself.
The inverse conversion is performed on Windows when reading the file in text mode.

We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement was that we could store our current position in the file (we used fgetpos), close the file, and later reopen it and seek to that position (we used fsetpos).
However, where a file has a mixture of line endings, this process failed to seek to the same actual position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
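In binary mode, the save-position/close/reopen/seek pattern described above becomes reliable, because ftell returns a plain byte offset with no translation in the way. A minimal sketch (the file name and helper names are hypothetical):

```cpp
#include <cstdio>

// Remember a byte offset in a stream; valid for later seeking only
// because the stream is opened in binary mode, where ftell returns a
// plain byte count rather than a text-mode cookie.
long remember_position(FILE *f) {
    return ftell(f);
}

// Reopen the same file in binary mode and resume from a saved offset.
// Returns NULL on failure.
FILE *reopen_at(const char *path, long offset) {
    FILE *f = fopen(path, "rb");      // binary: no newline translation
    if (f && fseek(f, offset, SEEK_SET) != 0) {
        fclose(f);
        f = NULL;
    }
    return f;
}
```

With mixed "\n\r" / "\n" endings as above, the reopened stream lands on exactly the byte it left off at, which is what the text-mode version failed to guarantee.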

In 'w' mode, the file is opened in text mode, so line-ending translation applies.
In 'wb' mode, the file is opened in write-binary mode, and all bytes, including special characters, are written exactly as given. Note that neither mode implies a particular character encoding such as UTF-8 or UTF-16LE; the encoding of the file is determined entirely by the bytes you write to it.

Related

I want to print all the data in the text file into the edit controller

I can get one line of the text file with "fgets" and print it using "SetWindowTextA", like this code:
FILE *p_file = fopen("Test.txt", "r");
if (p_file != NULL) {
    text = fgets(temp, sizeof(temp), p_file);
    m_Edit_load.SetWindowTextA(text);
    fclose(p_file);
}
But I want to print out all the lines. I've used the code below, but only the last line is printed:
FILE *p_file = fopen("Test.txt", "r");
if (p_file != NULL) {
    while (NULL != fgets(temp, sizeof(temp), p_file)) {
        m_Edit_load.SetWindowTextA(temp);
    }
    fclose(p_file);
}
How can I print out all the lines?
The problem here is that SetWindowTextA sets the text, it does not append. Hence your window ends up showing only the last line. To fix this, first build up the whole text in a dynamically allocated buffer, appending each line, and then call SetWindowTextA once at the end.
The most straightforward way is to open the file in binary mode, load it into a buffer and put it in the control through a single SetWindowText() call.
Depending on the format of the text file, it may require some additional steps:
If the file is ASCII, and in the same codepage as the system it runs on, a SetWindowTextA() call is OK.
If the file is Unicode, it can be loaded into the control by calling SetWindowTextW() - the control must be Unicode as well.
If the file is UTF-8, or ASCII in a codepage other than that of the system, the text must be converted to Unicode using the MultiByteToWideChar() function before being loaded into the control.
Another conversion that may be needed is LF to CR-LF, if the lines appear joined in the control. You need to write some code for this.
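That LF to CR-LF conversion can be sketched as a single pass over the text. The helper name is hypothetical; "\r\n" pairs already present are left untouched:

```cpp
#include <string>

// Expand bare "\n" line endings to "\r\n", which multiline edit
// controls expect. Existing "\r\n" pairs are not doubled.
std::string lf_to_crlf(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out += '\r';              // insert the missing carriage return
        out += in[i];
    }
    return out;
}
```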
As already stated in one of the other answers, the problem is that SetWindowText will overwrite the text. Your code seems to incorrectly assume that this function will append the text instead.
If you want to set the edit control to the entire text of a file, then you will have to read the entire text of the file into a memory buffer, and pass a pointer to that memory buffer to SetWindowText.
The function fgets is used for reading a single line. Although you can solve your problem with fgets, there is no reason to limit yourself to only reading one line at a time. It would therefore be more efficient to read as much data as possible in one function call, for example by using the function fread instead of fgets.
Another issue is that you are opening your text file in text mode, which means that the \r\n line endings will get translated to \n. However, this is not what you want, because when using SetWindowText on a multiline edit control, the line endings must be \r\n, not \n. Therefore, you should change the line
FILE *p_file = fopen("Test.txt", "r");
to
FILE *p_file = fopen("Test.txt", "rb");
in order to open the file in binary mode.
The whole code should look like this:
FILE *fp = fopen( "Test.txt", "rb" );
if ( fp != NULL )
{
    char buffer[4096];
    size_t bytes_read;
    bytes_read = fread( buffer, 1, (sizeof buffer) - 1, fp );
    buffer[bytes_read] = '\0';
    m_Edit_load.SetWindowTextA( buffer );
    fclose( fp );
}
If it is possible that 4096 bytes is not sufficient to contain the entire file, then you could increase the size of the buffer. However, you should not increase it too much, because otherwise, there is a danger of a stack overflow. Instead of allocating the memory buffer on the stack, you could also allocate it on the heap, by using malloc instead. Another alternative would be to use a static buffer, which also does not get allocated on the stack.
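A heap-allocated variant of the same idea might look like the sketch below. The helper name is hypothetical, and it sizes the buffer by seeking to the end of the file first (fine in binary mode on common platforms, though not strictly guaranteed by the C standard):

```cpp
#include <cstdio>
#include <cstdlib>

// Read an entire file into a heap buffer and NUL-terminate it.
// The caller frees the result. Returns NULL on failure.
char *read_whole_file(const char *path, long *out_len) {
    FILE *fp = fopen(path, "rb");        // binary: size == bytes read
    if (!fp) return NULL;
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);               // byte size, valid in binary mode
    fseek(fp, 0, SEEK_SET);
    char *buf = (char *)malloc((size_t)size + 1);
    if (buf) {
        size_t got = fread(buf, 1, (size_t)size, fp);
        buf[got] = '\0';                 // terminate for SetWindowTextA
        if (out_len) *out_len = (long)got;
    }
    fclose(fp);
    return buf;
}
```

This avoids both the fixed 4096-byte limit and the stack-overflow risk of a large automatic array.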

read in mingw under windows does not read entire file. Why?

Using mingw under windows the following code which works under linux does not work:
int fh = open(filename, O_RDONLY);
const int size=100000;
int bytesRead = read(fh, buffer, size);
The file is bigger than 100k, yet bytesRead is just 232. I think this has something to do with binary files on Windows?
This code with ifstream will work in windows and Linux.
ifstream in(filename, ios::binary);
const int size=100000;
in.read(buffer, size);
Is there a way to make level 2 IO calls work on windows as well?
Reading the manual is a very useful technique.
_read
Return Value
_read returns the number of bytes read, which might be less than buffer_size if there are fewer than buffer_size bytes left in the file, or if the file was opened in text mode. In text mode, each carriage return-line feed pair \r\n is replaced with a single line feed character \n. Only the single line feed character is counted in the return value. The replacement does not affect the file pointer.
Text and Binary Mode File I/O
File I/O operations take place in one of two translation modes, text or binary, depending on the mode in which the file is opened. Data files are usually processed in text mode.
Use the function _set_fmode to change the default mode for newly opened files. Use _get_fmode to find the current default mode. The initial default setting is text mode (_O_TEXT).
Change the default translation mode directly by setting the global variable _fmode in your program. The function _set_fmode sets the value of this variable, but it can also be set directly.
open(filename, O_RDONLY); opens files in text mode in Windows by default.
open(filename, O_RDONLY | O_BINARY); opens files in binary mode in Windows and further read will read all requested bytes if there's enough bytes left in the file.
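A portable way to write that call is to define O_BINARY as 0 on platforms that don't have it, so the same code compiles on both Windows and Linux. A sketch (the wrapper name is hypothetical):

```cpp
#include <fcntl.h>
#ifdef _WIN32
#include <io.h>
#else
#include <unistd.h>
#endif

// O_BINARY exists only on Windows; defining it as 0 elsewhere lets the
// same open() call compile everywhere and behave identically, since
// POSIX systems have no text/binary distinction at this level.
#ifndef O_BINARY
#define O_BINARY 0
#endif

// Open a file for low-level reading with no newline or Ctrl-Z
// translation, so read() returns every byte in the file.
int open_binary_read(const char *filename) {
    return open(filename, O_RDONLY | O_BINARY);
}
```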

Is there a way to ignore end of file characters while reading files in c++?

So I am trying to read a file into a program in C++, but there are stray end-of-file characters thrown in throughout the file. When trying to read the file, ifstream stops reading when it hits an end-of-file character.
This is the code that I am using to try to read the file
size_t bytesAvailable = 1000;
std::ifstream file(directory, std::ifstream::in);
unsigned char headDataBuffer[1000];
file.read((char*)(&headDataBuffer[0]), bytesAvailable);
Reading gets this far into the file but then stops when it reaches a certain character, which I later found out to be an end-of-file character; there is plenty of text afterwards, but I can't seem to get ifstream to read anything after it. Is there a way to read the entire file without having to break it up into smaller chunks?
First few lines of the file:
˜1È£….ƒÑäÄÕ!õÏ]ÀåM”Ú2jó8ÒQ;Fb#Ãë»Cé‚ 1³¸)æ¸)¼™Â¢¼mí¾J”ÜT’S·Õ}xÇ\'Ò¬Ëëk|&cõe´„[zÊN4äHH•Æpé€i‹,ɶ‰v%••¡ÁÎ:ïÂOÚåÀ‡É=wí7iÓOQ3Fg,‚¹ªGô“(stops right here) I9á¸"æ£/¼™Ù£«|¿¿FI€À^‚ ‚2 tÁ[;Åéúî2`9es¹Va°ÝNe-˜1È´’},••°ÛÙuòŸLÚቜÕ/9ñ7,Õ[uv/†í]¼CúŸ
Try opening the file in binary mode. On some platforms, text mode and binary mode behave differently; for example, text mode may translate line endings to LF, or interpret a control character (possibly Ctrl+D or Ctrl+Z) as an end-of-file marker.
size_t bytesAvailable = 1000;
std::ifstream file(directory, std::ifstream::in|std::ifstream::binary);
unsigned char headDataBuffer[1000];
file.read((char*)(&headDataBuffer[0]), bytesAvailable);

c++: Istream counts every newline in a .txt file as two

I've got a slight problem. It appears that, for some reason, my function, when counting the size of a .txt file, counts a newline as if it were two chars instead of one. Here's the function:
#define IN_FILE "in_mat.txt"
#define IN_BUF
#ifdef IN_BUF
void inBuf(char *(&b)) {
    streampos size;
    ifstream f(IN_FILE, ios::in);
    f.seekg(0, ios::end);
    size = f.tellg();
    b = new char[size];
    f.seekg(0, ios::beg);
    f.read(b, size);
    f.close();
}
#endif
And here's the read file:
2 2
1 0
0 1
2 2
i 0
0 -i
2 2
0 1
-1 0
2 2
0 i
i 0
Earlier, I put in some couts, and it appears that size = 60, while the actual size is 49 (I checked), and the number of newlines in the file is 11, which is exactly 60 - 49. Could somebody help me with that?
To add to the other answers, if you want to read special characters such as newline characters, you should open your file in binary mode, not text mode.
ifstream f(IN_FILE, ios::in | ios::binary);
If you don't open the file in binary mode, the characters that actually make up a line ending on disk (on Windows, "\r\n") are translated by the runtime into a single character (namely '\n'). So in text mode, you don't get the "real" version of the file in terms of the actual bytes the file consists of.
In addition, functions such as seekg() and tellg() will not work as expected with a file opened in text mode, or at the very least will give you "wrong" results (not wrong to the functions themselves, but wrong if you're writing a program that tries to home in on a position within the file). Again, the newline (and EOF) translation done under the hood by the runtime gets in the way of these functions working as you would expect.
On the other hand, a file opened in binary mode allows these functions to work as expected -- no translation of newline, or EOF -- whatever the individual bytes that makes up the file contents are, that is what you get.
The next thing you need to determine is whether it is a Unix text file or a Windows text file. Depending on which one it is, the line endings will be different.
Windows uses "\r\n": a carriage return ('\r') to return to the beginning of the line and a line feed ('\n') to begin a new one.
To remove them from your count you have to read the whole file and count the number of '\r's.
Windows stores newlines as two characters: '\r\n', known as carriage return and line feed. That's why it's counted twice: there are actually two characters to be counted.
I am assuming that you are running on Windows. If not, disregard my answer below.
Windows stores new line characters in text files as two characters (CR LF or '\r' '\n'). So, seeking to the end of the file and calling tellg() will return the binary size of the file (60), not the text size (49).
In order to get the correct text size (49), one solution would be to count each new line character (11) and subtract that number from the total byte size.
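That subtraction can be done by reading the file in binary mode and counting the "\r\n" pairs; what text mode would have stripped is exactly one byte per pair. A sketch (the helper name is hypothetical):

```cpp
#include <cstdio>
#include <fstream>

// Compute the "text size" of a Windows text file: its byte size minus
// one byte for each "\r\n" pair (the '\r' that text mode would strip).
// Returns -1 if the file cannot be opened.
long text_size(const char *path) {
    std::ifstream f(path, std::ios::in | std::ios::binary);
    if (!f) return -1;
    long total = 0, pairs = 0;
    int prev = 0, c;
    while ((c = f.get()) != EOF) {
        ++total;
        if (prev == '\r' && c == '\n') ++pairs;
        prev = c;
    }
    return total - pairs;
}
```

For the asker's file this would return 49 where tellg() at the end reports 60.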

inserting text to a file (end line isn't entered)

I have a strange problem:
I read a file into a buf and tried to run it over ssh (Linux).
my file contains:
We
I
a
so this is my buf:
now I create a new file and paste the buf into this new file:
FILE *nem_file_name;
nem_file_name = fopen("email1.clear", "wb"); // create the file if it doesn't exist
fwrite(buf, sizeof(char), strlen(buf), nem_file_name); // write the new censored mail to the file
in this case the file email1.clear was created, but this is what it contains:
We Ia
when I copy it to the clipboard and paste it into this topic, it is pasted as:
We
I
a
why is there no end-of-line in my file? I want it to be like what I have in my clipboard :/
UPDATE
I tried to create the buf manually by:
char buf[10];
buf[0] = 'W';
buf[1] = 'e';
buf[2] = 32;
buf[3] = 13;
buf[4] = 10;
buf[5] = 'I';
buf[6] = 13;
buf[7] = 10;
buf[8] = 'a';
buf[9] = 0;
(note that I didn't read a file into buf, but do it manually)
and then:
FILE *nem_file_name;
nem_file_name = fopen("email1.clear", "wb"); // create the file if it doesn't exist
fwrite(buf, sizeof(char), strlen(buf), nem_file_name);
and the file email1.clear was created as I want:
We
I
a
I can't understand it!
Is the debugger screenshot actually from your Linux environment, or did you create it in a Windows debugger?
It depends on how you read the original file. If you're using text mode ("r" or "rt" in the fopen call) on Windows, each CRLF pair (13,10) is converted into a single LF (10) character during reading; on Linux, text mode performs no such translation, but a file created on Linux will normally contain only LF in the first place. When such a buffer is written to a new file in binary mode ("wb", as in your code), each line ending stays a single LF.
Notepad cannot handle single LF characters as newlines; your web browser, however, obviously can.
UPDATE:
End-of-line characters are handled differently by different operating systems. When opening a file in text mode, the differences are handled during reading/writing and converted to/from the system's convention. In binary mode, the bytes are read and written as-is, without conversion (see the fopen documentation).
It depends on where the program will run and what clients will read the output (Linux/Windows). When your code runs on Linux, reads text files from Linux and generates text files to be used on Linux, use text mode (the same applies on Windows). If you need to mix platforms, you might have to convert line endings yourself.
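If you do need to convert line endings yourself, e.g. normalizing Windows CR-LF pairs to bare LF, a single pass over the buffer is enough. A sketch (the helper name is hypothetical):

```cpp
#include <string>

// Normalize "\r\n" pairs to bare "\n", e.g. when a buffer produced on
// Windows must be written out for Linux tools byte-for-byte.
// A lone '\r' not followed by '\n' is kept as-is.
std::string crlf_to_lf(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n')
            continue;                 // drop the '\r' of a CRLF pair
        out += in[i];
    }
    return out;
}
```

Applied to the asker's manually built buffer "We \r\nI\r\na", this yields "We \nI\na", the Linux-style form of the same text.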
It's text, so why do you write it to a binary file ("wb")? Just work with text files and everything should be fine (remove the b from your file open mode both when you read the file and when you write it).
I think strlen(buf) returns the length of buf without the null character that marks the end of the string. You can try writing to your file like this:
fwrite (buf, sizeof(char), strlen(buf), nem_file_name);
char eos = '\0';
fputc (eos, nem_file_name);
Just my guess. Good luck!