C++ on Windows: I can't put the Enter character into a .txt file

I made a program which uses Huffman coding to compress and decompress .txt files (ANSI, Unicode, UTF-8, Big Endian Unicode...).
During decompression I take characters from a binary tree and write them into a .txt file opened in binary mode:
ofstream F;
F.open("example.txt", ios::binary);
I have to write the .txt file in binary mode because I need to decompress every type of .txt file (not only ANSI), so my symbols are the individual bytes.
On Windows it writes every symbol but seems to ignore the Enter character!
For example, if I have this example.txt file:
Hello
World!
=)
I compress it into example.dat and save the Huffman tree into another file (exampletree.dat).
Now, to decompress example.dat, I take characters from the tree saved in exampletree.dat and write them into a new .txt file with put() or fwrite(), but on Windows the result looks like this:
HelloWorld!=)
On Ubuntu it works perfectly and also saves the Enter character!
It isn't a code error, because if I print the decompressed .txt file to the console, the Enter characters are printed too! So the problem is on Windows! Could someone help me?

Did you try opening the file in WordPad or another advanced text editor (e.g. Notepad++) that recognizes a lone LF as a newline? The default editor, Notepad, would show it on a single line like you described.
This may not be the solution you are looking for, but the problem looks to be due to having LF as the line break instead of the Windows default CR/LF.

It looks like it is the difference in end-of-line handling on Linux vs. Windows. The EOL can be just "\n" or "\r\n", i.e. Windows usually puts 0x0D,0x0A at the end of lines.
On Windows there's a difference between:
fopen( "filename", "w" );
fopen( "filename", "tw" );
quote:
In text mode, carriage return–linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return–linefeed combinations on output
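A minimal sketch of that difference (the file names are arbitrary; the 't' mode flag is a Microsoft extension, and plain "w" is also text mode on Windows):

#include <cstdio>

int main()
{
    // Text mode "wt": on Windows each '\n' written becomes "\r\n".
    FILE* ft = std::fopen("text_mode.txt", "wt");
    std::fputs("Hello\nWorld!\n=)\n", ft);
    std::fclose(ft);

    // Binary mode "wb": bytes go out exactly as given; '\n' stays a bare LF.
    FILE* fb = std::fopen("binary_mode.txt", "wb");
    std::fputs("Hello\nWorld!\n=)\n", fb);
    std::fclose(fb);

    // On Windows, text_mode.txt ends up 3 bytes larger (one extra '\r' per
    // line); old Notepad shows binary_mode.txt as a single line, matching
    // the symptom in the question.
    return 0;
}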

Related

C++: Problem of Korean alphabet encoding in text file write process with std::ofstream

I have code that saves a log as a text file.
It usually works well, but I found a case where it doesn't:
{Id": "testman", "ip": "192.168.1.1", "target": "?뚯뒪??exe", "desc": "?덈뀞諛⑷??뚯슂"}
My code is simple logic that saves the log string as a text file.
It works well when the log is in English, but there is a problem when the log contains Korean.
After various experiments, I confirmed that the Korean text would be fine if the file could be saved in UTF-8 format.
I think that when Korean text is included in the log string, C++ saves the file in ANSI format by default.
This is my C++ code:
string logFilePath = {path};
string log = "{\"Id\": \"testman\", \"ip\": \"192.168.1.1\", \"target\": \"테스트.exe\", \"desc\": \"안녕방가워요\"}";
ofstream output(logFilePath, ios::app);
output << log << endl;
output.close();
Is there a way to save log files as UTF-8, or some other good approach?
Please give me some advice.
In Visual Studio you could set UTF-8 via File->Advanced Save Options (this saves the source file itself, and therefore your string literal, as UTF-8).
If you do not find that entry, you can add Advanced Save Options via Tools->Customize->Commands->Add Command...->File.
TL;DR: write 0xEF 0xBB 0xBF (the 3-byte UTF-8 BOM) at the beginning of the file before writing out your string.
One of the hints that text-viewer software uses to determine whether a file should be shown as Unicode is something called the Byte Order Mark (or BOM for short). It is basically a series of bytes at the beginning of a stream of text that specifies the encoding and endianness of the text string. For UTF-8 it is these three bytes: 0xEF 0xBB 0xBF.
You can experiment with this by opening Notepad, writing a single character, and saving the file in the ANSI format. Then look at the size of the file in bytes: it will be 1 byte. Now open the file and save it as UTF-8, and look at the size again: it will be 4 bytes, that is, three bytes for the BOM and one byte for the single character you put in there. You can confirm this by viewing both files in a hex editor.
That being said, you need to insert these bytes into your files before writing your string to them. So why UTF-8, you may ask? Well, it depends on the encoding of the original string (your std::string log), which in this case is a string literal written in a source file whose encoding is (most likely) UTF-8. Therefore the bytes that make up the string are produced according to that encoding and are put into your executable.
Note that std::string can hold a Unicode string; it just can't make sense of it. For example, it reports its length in bytes rather than characters. But it is perfectly fine for carrying a Unicode string around.
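A minimal sketch of that suggestion (the file name is arbitrary; the BOM must be written once, at the start of a new file, before any log text):

#include <fstream>
#include <string>

int main()
{
    // Assumes this source file itself is saved as UTF-8, so the bytes of
    // the literal below are already UTF-8 encoded.
    std::string log = "{\"Id\": \"testman\", \"desc\": \"안녕방가워요\"}";

    std::ofstream output("log.txt", std::ios::binary);
    output << "\xEF\xBB\xBF";   // the 3-byte UTF-8 BOM, written before the text
    output << log << '\n';
    return 0;
}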

What is EOL in a text file and a normal file?

Now I am quite confused about the end-of-line character. I am working with C++ and I know that text files have an end-of-line marker which sets the limit for reading a line with a single shifting operator (>>). Data is read continuously until the EOL character does not appear, and while opening a file in text mode carriage return (CR) is converted into CRLF which is the EOL marker, so if I add white spaces in my text then would it act as an EOL marker, because it does.
Now I created a normal file, i.e. a file without .txt, e.g.:
ifstream("test"); // No .txt
Now what is the EOL marker in this case?
The ".txt" at the end of the filename is just a convention. It's just part of the filename.
It does not signify any magical property of the file, and it certainly doesn't change how the file is handled by your operating system kernel or file system driver.
So, in short, what difference is there? None.
I know that text files have an end-of-line marker which sets the limit for reading a line with a single shifting operator (>>)
That is incorrect.
Data is read continuously until the EOL character does not appear
Also incorrect. Some operating systems (e.g. Windows IIRC) inject an EOF (not EOL!) character into the stream to signify to calling applications that there is no more data to read. Other operating systems don't even do that. But in neither case is there an actual EOF character at the end of the actual file.
while opening a file in text mode carriage return (CR) is converted into CRLF which is the EOL marker
That conversion may or may not happen and, either way, EOL is not EOF.
if I add white spaces in my text then would it act as an EOL marker, because it does.
That's a negative, star command.
I'm not sure where you're getting all this stuff from, but you've been heavily mistaught. I suggest a good, peer-reviewed, well-recommended book from Amazon about how computer operating systems work.
When reading strings in C++ using the extraction operator >>, the default is to skip spaces.
If you want the entire line verbatim, use std::getline.
A typical input loop is:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main(void)
{
    std::string text_from_file;
    std::ifstream input_file("My_data.txt");
    if (!input_file)
    {
        std::cerr << "Error opening My_data.txt for reading.\n";
        return EXIT_FAILURE;
    }
    while (input_file >> text_from_file)
    {
        // Process the variable text_from_file (one whitespace-delimited token).
    }
    return EXIT_SUCCESS;
}
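If you want whole lines instead of whitespace-delimited tokens, the read loop in the same program becomes:

while (std::getline(input_file, text_from_file))
{
    // text_from_file now holds one entire line, embedded spaces included.
}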
A lot of old and mainframe operating systems required a record structure of all data files which, for text files, originated with a Hollerith (punch) card of 80 columns and was faithfully preserved through disk file records, magnetic tapes, output punch card decks, and line printer lines. No line ending was used because the record structure required that every record have 80 columns (and were typically filled with spaces). In later years (1960s+), having variable length records with an 80 column maximum became popular. Today, even OpenVMS still requires the file creator to specify a file format (sequential, indexed, or "stream") and record size (fixed, variable) where the maximum record size must be specified in advance.
In the modern era of computing (which effectively began with Unix) it is widely considered a bad idea to force a structure on data files. Any programmer is free to do that to themselves, and there are plenty of record-oriented data formats like compiler/linker object files (.obj, .so, .o, .lib, .exe, etc.) and most media formats (.gif, .tiff, .flv, .mov, .mp3, etc.).
For communicating text lines, the paradigm is to target a terminal or printer, and for that, line endings should be indicated. Most operating-system environments (except MSDOS and Windows) use the '\n' character, which is encoded in ASCII as a linefeed (ASCII 10). MSDOS and its ilk use "\r\n", encoded as carriage return then linefeed (ASCII 13, 10). There are advantages and disadvantages to both schemes. But text files may also contain other controls, most commonly the ANSI escape sequences, which control devices in specific ways (see the short example after this list):
clear the screen, either in part or all of it
eject a printer page, skip some lines, reverse feed, and other little-used features
establish a scrolling region
change the text color
selecting a font, text weight, page size, etc.
For these operations, line endings are not a concern.
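For illustration, two such sequences as they might be emitted from a C++ program (assuming a terminal that honors ANSI escapes):

#include <cstdio>

int main()
{
    // ESC [ 2 J clears the screen; ESC [ 3 1 m switches to red text;
    // ESC [ 0 m resets attributes. No line endings are involved.
    std::printf("\x1b[2J\x1b[31mred text\x1b[0m\n");
    return 0;
}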
Also, data files encoded in ASCII such as JSON and XML (especially HTML with embedded Javascript), might not have any line endings, especially when the data is obfuscated or compressed.
To answer your questions:
I am quite confused about the end-of-line character. I am working with C++ and I know that text files have an end-of-line marker
Maybe. Maybe not. From a C or C++ program's viewpoint, writing \n indicates to the runtime environment the end of a line. What the system does with that varies by runtime operating environment. For Unix and Linux, no translation occurs (though writing to a terminal-like device converts to \r\n). In MSDOS, '\n' is translated to \r\n. In OpenVMS, '\n' is removed and that record's size is set. Reading does the inverse translation.
which sets the limit for reading a line with a single shifting operator (>>).
There is no such limit: A program can choose to read data byte-by-byte if it wants as well as ignore the line boundaries.
The "shifting operators" are overloaded for filestreams to input or output data but are not related to bit twiddling shifts. These operators were chosen for visual approximation of input/output and due to their low operator precedence.
Data is read continuously until the EOL character does not appear
This bit is confusing: I think you meant until the EOL character appears, which is indeed how the line-oriented functions gets() and fgets() work.
and while opening a file in text mode carriage return (CR) is converted into CRLF which is the EOL marker, so if I add white spaces in my text then would it act as an EOL marker, because it does.
Opening the file does not convert anything, but reading from a file might. However, no environment (that I know of) converts input to CR LF. MSDOS converts CR LF on input to \n.
Adding spaces has no effect on ends of lines, end of file, or anything else. Spaces are just data. However, the C++ stream operations that read/write numbers and some other data types use whitespace (spaces, horizontal tabs, vertical tabs, form feeds, and newlines) as a delimiter. This convenience feature may cause some confusion.
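A small sketch of that convenience feature, assuming any mix of spaces, tabs, and newlines between the values:

#include <iostream>
#include <sstream>

int main()
{
    // Spaces, tabs, and newlines are all treated alike as delimiters.
    std::istringstream in("1 2\t3\n4");
    int n;
    while (in >> n)
        std::cout << n << '\n';   // prints 1, 2, 3, 4, one per line
    return 0;
}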
Now I created a normal file, i.e. a file without .txt, e.g.:
ifstream("test"); // No .txt
Now what is the EOL marker in this case?
The filename does not determine the file type. In fact, file.txt may not be a text file at all. Using a particular file extension is convenient for humans to communicate a file's purpose, but it is not obligatory.

Read text-file in C++ with fopen without linefeed conversion

I'm working with text-files (UTF-8) on Windows and want to read them using C++.
To open the file correctly, I use fopen. As described here, there are two options for opening the file:
Text mode "rt" (Carriage return + Linefeed will automatically be converted into Linefeed; Short "\r\n" becomes "\n").
Binary mode "rb" (The file will be read byte by byte).
Now it becomes tricky. I don't want to open the file in binary mode, since I would lose the correct handling of my UTF-8 characters (and there are special characters in my text files which are corrupted when interpreted as ANSI characters). But I also don't want fopen to convert all my CR+LF pairs into LF.
Is there a way to combine the two modes, to read a text file into a string without tampering with the linefeeds, while still being able to read UTF-8 correctly?
I am aware that the reverse conversion would happen if I wrote the string back out the same way, but the string is sent to another application that expects Windows-style line endings.
The difference between opening files in text and binary mode is exactly the handling of line-end sequences in text mode versus not touching them in binary mode. Nothing more, nothing less. Since the ASCII characters use the same code points in Unicode, and UTF-8 retains the encoding of ASCII characters (i.e., every ASCII file happens to be a UTF-8-encoded Unicode file), whether you use binary or text mode won't affect the other bytes.
It may be worth having a look at James McNellis' "Unicode in C++" presentation from C++Now 2014.
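Following that reasoning, a minimal sketch (the file name and function name are arbitrary) that reads the whole file in binary mode, keeping both the CR+LF pairs and the UTF-8 bytes intact:

#include <cstdio>
#include <string>

std::string read_file_verbatim(const char* path)
{
    std::string contents;
    if (FILE* f = std::fopen(path, "rb"))   // binary: no CR+LF translation
    {
        char buffer[4096];
        std::size_t n;
        while ((n = std::fread(buffer, 1, sizeof buffer, f)) > 0)
            contents.append(buffer, n);     // UTF-8 bytes are copied as-is
        std::fclose(f);
    }
    return contents;
}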

How to change stream from text mode to binary in C++?

In a game I'm making I need to read a map from a file. Assuming some of the data at the beginning is written as characters, but the tile map is written in binary, I would open the file in text mode and then switch it to binary mode once it reaches the tile data.
Is there an easy, or standard, way of changing an ifstream from text mode to binary mode while keeping the same position in the file?
This also applies to the writing part: I will need to start writing to the file using characters, then change to binary mode.
EDIT: I'm using text mode to make this readable and to read strings of unknown size. For example, this line:
map-name=TestMap
I'd read this with
getline( mapFile, attribute, '=' );
getline( mapFile, mapName, '\n' );
How would I read this in binary mode if there won't be newline characters?
The mode is established when the file is opened, and cannot be changed later. If there is any binary data in the file, you must use binary mode. But where is the problem? You can read text in binary mode; line endings might appear a bit strange (but not if you also wrote it in binary mode), but otherwise there should be no problem as long as the binary data actually is text.
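For instance, a sketch along those lines (the file name, header layout, and tile count are invented for illustration): open once in binary mode, read the header lines with getline, then read the raw tile bytes from the same stream:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::ifstream mapFile("level1.map", std::ios::binary);

    // Header lines still end in '\n' bytes, so getline works in binary mode.
    std::string attribute, mapName;
    std::getline(mapFile, attribute, '=');
    std::getline(mapFile, mapName);   // may keep a trailing '\r' if CRLF endings

    // Hypothetical layout: the binary tile data follows immediately.
    std::vector<char> tiles(16 * 16);
    mapFile.read(tiles.data(), tiles.size());

    std::cout << attribute << " = " << mapName << '\n';
    return 0;
}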
If you are responsible for writing the files as well, the simplest (and perhaps sanest) solution might be to write two files.
One in text, for text you wish to be human readable.
And the second as binary, for things like maps. In fact that way you could have one binary map file for each map.

How can I read Notepad++ file in DOS or Fortran?

I received a text file created with Notepad++ that I'm trying to read with a Fortran 95 program on both a Mac and a PC. The read line is:
read(lun,'(a)',iostat=io1) input
Since I don't know what the line lengths are, I defined input to be 512 characters long. With non-Notepad++ files, when the end of a line is found the read "stops" and automatically advances to the next line of text. With the Notepad++ file, it reads 512 characters, skipping over the carriage returns. When I open the file in the DOS editor on the PC I see carriage-return symbols (ASCII character 13), but there is no break between lines; they are all appended to one another.
I've tried searching for ichar(13) and ichar(10), backspacing to the beginning of the line and trying to force an advance to the next line, and reading with the format '(a,/)', but I haven't been able to get anything to work.
What you need is a pipeline-type design. The basic routine is one called getline, which gets a line of data up to the carriage return. In the initialization, open the file as a binary file and read in a buffer of, say, 1024 characters. Whenever getline is called, return the next run of characters up to a CR. If there aren't enough characters in the buffer, move the unprocessed characters to the front and read in more to fill the remainder.
It is basically how compilers work: they consume a stream of tokens which, in your case, is a string of characters ending with a CR, and then they process the tokens.
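The same idea sketched in C++ (the question is about Fortran, but the buffering scheme is language-independent; the function name here is invented, and std::ifstream does the block buffering internally):

#include <fstream>
#include <string>

// Returns one line, treating a bare CR (ASCII 13) as the line terminator.
bool get_cr_line(std::ifstream& in, std::string& line)
{
    line.clear();
    char c;
    while (in.get(c))   // the stream should be opened with std::ios::binary
    {
        if (c == '\r')
        {
            if (in.peek() == '\n')
                in.get();   // swallow the LF of a CRLF pair, if present
            return true;
        }
        line += c;
    }
    return !line.empty();   // the last line may lack a terminator
}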