C/C++ print custom EOL - c++

I want to generate on Windows a file (script) for UNIX. So I need to output just LF characters, without outputting CR characters.
When I do fprintf(fpout, "some text\n");, the \n character is automatically replaced by \r\n in the file.
Is there a way to output specifically just \n (LF) character?
The language is C++, but the I/O functions come from C.

You can open the file in binary mode, e.g.
FILE *fpout = fopen("unixfile.txt", "wb");
fprintf(fpout, "some text\n"); // no \r inserted before \n
As a consequence, every byte you pass to fprintf is written verbatim, which suppresses the conversion from \n to \r\n.
From cppreference on std::fopen:
File access mode flag "b" can optionally be specified to open a file in binary mode. This flag has no effect on POSIX systems, but on Windows, for example, it disables special handling of '\n' and '\x1A'.

Related

Why does char 0xA change to 0xD + 0xA characters when writing it into a file with a console?

I used
std::cout << (char)0xA
in my C++ code.
Then I ran myProgram.exe > file.txt from the console.
Next, I opened file.txt with a hex editor and found 0D 0A instead of 0A.
I don't know why this happened. Please help.
HEX Editor screenshot (look at 0x82 and 0x19B)
The C and C++ Standards both specify that when a stream is open in text mode, sending a \n to it will do whatever action is necessary on the target system to advance the file to the next line. On a Unix system, that simply means outputting a \n to a file. On some record-based systems, it means flushing the current line-output record and advancing to the next one. On MS-DOS and Windows, it means sending both a \r and \n to the stream.
Historically, sending a \r to a teletype would reset the carriage to the left edge, and sending a \n would advance the paper. Someone realized that while the ability to reset the carriage without advancing paper was useful, having \n advance the paper without resetting the carriage was far less so, and thus some devices will respond to \n by advancing to the start of the next line. MS-DOS, however, opted to have files stored in a way that would produce meaningful output if sent directly to a printer where \n would advance to the current location on the next line, and one would have to send both \r and \n if one wanted to go to the start of the next line.
Welcome to the magic of end of lines across OS!
The C language was originally developed for Unix systems, where the end of line is marked by the single '\n' (ASCII code 0x0A) character.
When it was ported to other systems, it was decided that, as seen from the programmer, an end of line would always be a single '\n', and that the drivers and the standard library would convert it to the appropriate native line ending when the file is opened in text mode.
That convention was later kept in C++.
As Windows uses "\r\n" as its end-of-line marker, the standard library converts every '\n' to the pair "\r\n" (ASCII codes 0x0D and 0x0A).

What is the substitute character and how do I process it on Windows

I get a text file from AIX and try to process it on Windows,
but some lines contain a strange character that UltraEdit displays as a placeholder glyph.
When the fgets function encounters such a line, it raises an error and stops reading. It then refuses to continue even if I force fgets to run again after it meets the line.
The hex code of the character is 1A.
The explanation of this character in ASCII table is substitute character, which is used to replace the character that cannot be represented on the device.
Does that mean
I have an AIX-specific character and there is no way to process it on Windows?
Does this happen only in the case of a cross-platform file?
Thanks!
There are several issues.
If you use fopen with the "r" mode, the file will be opened in text mode, and then the ASCII character 0x1A will be interpreted as an end-of-file character. Furthermore, if your file comes from AIX, the line endings are almost certainly "\n" (0x0A) instead of the "\r\n" (0x0D 0x0A) used on Windows, and fgets recognizes only "\r\n" as a line ending there.
You need to implement your own fgets like function by reading the file character by character with the fgetc function, and you must fopen the aix file with "rb" mode instead of the "r" mode.
Your new fgets like function should be no more than 5 or 6 lines long.

Find the system's line terminator

Is there a header file somewhere that stores the line termination character/s for the system (so that without any #ifdefs it will work on all platforms, MAC, Windows, Linux, etc)?
You should open the file in "text mode" (that is, not use binary), and a newline is always '\n', whatever the native format is. The C library will translate whatever native character(s) indicate newlines into '\n' whenever appropriate [that is, when reading/writing text files]. Note that this also means you can't rely on counting the number of characters read and using that count to seek back to a previous location.
If the file is binary, then newlines aren't newlines anyways.
And unless you plan on running on really ancient systems, and you REALLY want to do this, I would do:
#ifdef __WINDOWS__ // Or something like that
#define END_LINE "\r\n"
#else
#define END_LINE "\n"
#endif
This won't work for MacOS before MacOS X, but surely nobody is using pre-MacOS X hardware any longer?
No, because it's \n everywhere. That expands to the correct newline character(s) when you write it to a text file.
Posix requires it to be \n. So if _POSIX_VERSION is defined, it's \n. Otherwise, special-case the only non-POSIX OS, windows, and you're done.
It doesn't look like there's anything in the standard library to obtain the current platform's line terminator.
The closest looking API is
char_type std::basic_ios::widen(char c);
It "converts a character c to its equivalent in the current locale" (cppreference). I was pointed at it by the documentation for std::endl, which "inserts a newline character into the output sequence os and flushes it as if by calling os.put(os.widen('\n')) followed by os.flush()" (cppreference).
On Posix,
widen('\n') returns '\n' (as a char, for char-based streams);
endl inserts a '\n' and flushes the buffer.
On Windows, when the stream is open in binary mode, they do exactly the same. In fact
#include <iostream>
#include <fstream>
using namespace std;

int main() {
    ofstream f;
    f.open("aaa.txt", ios_base::out | ios_base::binary);
    f << "aaa" << endl << "bbb";
    f.close();
    return 0;
}
will result in a file with just '\n' as a line terminator.
As others have suggested, when the file is open in text mode (the default) the '\n' will be automatically converted to '\r' '\n' on Windows.
(I've rewritten this answer because I had incorrectly assumed that std::endl translated to "\r\n" on Windows)
The answer to your question can be extended a little further: the same code can read both Windows-based and Unix-based text files on Windows, MacOS, and Linux/Unix systems (excluding the ancient Macintosh systems that used \r as the line delimiter).
As already pointed out by others, \n can be used as the line delimiter on all of the above systems because the underlying C library converts it to the native delimiter of each system. Therefore, the following code reads text files that use either \n or \r\n as line delimiters while discarding all delimiter characters:
// Open a file in text mode
std::ifstream file_stream(file_name, std::ios_base::in);
// Use widened '\n' as the line delimiter
for (std::string text_line; std::getline(file_stream, text_line, file_stream.widen('\n'));)
{
    if (!text_line.empty())
    {
        // Discard the '\r' left over when reading a Windows-based file on Unix-like systems
        if (text_line.back() == '\r') text_line.pop_back();
        // Do more with text_line
    }
}
In the above code, read-in lines containing '\r' will only be encountered when reading Windows-based text files on Unix-like systems, because a single '\n' is used as the delimiter while Windows-based text files end lines with "\r\n". On the other hand, when reading text files on Windows-based systems, line endings of either "\r\n" or "\n" are stripped: std::getline removes the widened '\n', and the pop_back removes the leftover '\r'. Note that this code doesn't remove any '\r' that is not adjacent to a '\n', because text files containing such characters are not correctly formed on Windows, Mac, or Linux/Unix systems.

0x0A after 0x0D when reading file

I read a file and find that there is a 0x0D before every 0x0A.
I only know that it is Windows that does the conversion.
But I have used binary mode; shouldn't that prevent it?
ifstream input(inputname, ios::binary);
input.get(ch);
How do I avoid it? I only want to get the \n.
What about writing a file?
Thanks in advance.
If you're on a system that does use \r\n line endings then opening a file in text mode will cause the system to automatically convert these to the standard \n without \r. Opening a file in binary mode prevents this conversion.
If you're on a system that does not use this convention then there's no mode that will convert the line endings. You will have to convert them manually yourself or preprocess the file using an external tool.
If you want to detect whether a file uses \r\n you'll have to do it manually. Scan through the text file and see if every \n is preceded by a \r.
As an alternative, instead of trying to preemptively detect what kind of line endings a file uses, you could simply add logic in your processing code to specially handle \r followed by \n. Something like:
for (int i = 0; i < n; ++i) {
    if (text[i] == '\r' && i + 1 < n && text[i + 1] == '\n')
        ++i; // skip the carriage return, just handle the newline
    if (text[i] == '\n') {
        // handle newline...
    } else {
        // handle other characters...
    }
}
Hmm. If you use binary mode, ios::binary tells the library that you want to read the file as it is stored, in binary (uncooked, raw). Under MS-DOS (some people nowadays call it Windows NT), lines in text files are terminated by 0D 0A. So if you don't want to see these two chars, you have to open the file in text mode (just omit the ios::binary). Or you have to convert these files to Unix style with a utility like dos2unix, but then, if you are on a Windows system, e.g. Notepad may not be able to display the files as expected...

c++ getline reads entire file

I'm using std::getline() to read from a text file, line by line. However, the first call to getline reads in the entire file! I've also tried specifying the delimiter as '\n' explicitly. Any ideas why this might be happening?
My code:
std::ifstream serialIn;
...
serialIn.open(argv[3]);
...
std::string tmpStr;
std::getline(serialIn, tmpStr, '\n');
// All 570 lines of the file are in tmpStr!
...
std::string serialLine;
std::getline(serialIn, serialLine);
// serialLine == "" here
I am using Visual Studio 2008. The text file has 570 lines (I'm viewing it in Notepad++ fwiw).
Edit: I worked around this problem by using Notepad++ to convert the line endings in my input text file to "Windows" line endings. The file was written with '\n' at the end of each line, using C++ code. Why would getline() require the Windows line endings (\r\n)? Does this have to do with character width, or with the Microsoft implementation?
Just guessing, but could your file have Unix line-endings and you're running on Windows?
You're confusing the newline you see in code ('\n') with the actual line-ending representation for the platform (some combination of carriage-return (CR) and linefeed (LF) bytes).
The standard I/O library functions automatically convert line-endings for your platform to and from conceptual newlines for text-mode streams (the default). See What's the difference between text and binary I/O? from the comp.lang.c FAQ. (Although that's from the C FAQ, the concepts apply to C++ as well.) Since you're on Windows, the standard I/O functions by default write newlines as CR-LF and expect CR-LF for newlines when reading.
If you don't want these conversions done and would prefer to see the raw, unadulterated data, then you should set your streams to binary mode. In binary mode, \n corresponds to just LF, and \r corresponds to just CR.
In C, you can specify binary mode by passing "b" as one of the flags to fopen:
FILE* file = fopen(filename, "rb"); // Open a file for reading in binary mode.
In C++:
std::ifstream in;
in.open(filename, std::ios::binary);
or:
std::ifstream in(filename, std::ios::binary);