What is substitute character and how to process it in windows - c++

I got a text file from AIX and am trying to process it on Windows,
but some lines contain a strange character like , in UltraEdit it is displayed like
When fgets encounters such a line, it sets the error indicator (ferror) and stops at that point. It then refuses to continue, even if I force another fgets call after it has hit the line.
The hex code of the character is 1A.
The ASCII table describes this character as the substitute character, which is used to replace a character that cannot be represented on the device.
Does that mean
I have an AIX-specific character and there is no way to process it on Windows?
Does this happen only with cross-platform files?
Thanks!

There are several issues.
If you use fopen with the "r" mode, the file is opened in text mode, and on Windows the ASCII character 0x1A (Ctrl-Z) is then interpreted as an end-of-file marker. Furthermore, if your file comes from AIX, the line endings are certainly "\n" (0x0A) instead of the "\r\n" (0x0D 0x0A) used on Windows, so you should not rely on the text-mode translation to handle them for you.
You need to implement your own fgets-like function that reads the file character by character with fgetc, and you must fopen the AIX file with the "rb" mode instead of the "r" mode.
Your new fgets-like function should be no more than 5 or 6 lines long.
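A minimal sketch of such a function, assuming the file was opened with "rb". The name read_line and its signature are hypothetical, chosen only for illustration. It treats 0x1A as ordinary data and accepts both "\n" and "\r\n" line endings:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: reads one line from a stream opened in binary mode.
// Returns false only when no more bytes could be read.
bool read_line(std::FILE* fp, std::string& line) {
    line.clear();
    bool got_any = false;
    int c;
    while ((c = std::fgetc(fp)) != EOF) {
        got_any = true;
        if (c == '\n') break;                 // end of line on AIX and Windows
        line.push_back(static_cast<char>(c)); // 0x1A is kept like any other byte
    }
    // Drop the '\r' of a Windows-style "\r\n" ending, if there is one.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return got_any;
}
```

Called in a loop such as while (read_line(fp, line)) { ... }, it keeps going past any 0x1A bytes; what to do with those bytes afterwards is left to the caller.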

Related

Why does char 0xA change to 0xD + 0xA characters when writing it into a file with a console?

I used
std::cout << (char)0xA
in my C++ code.
Then I ran myProgram.exe > file.txt from the console.
Next, I opened file.txt with a hex editor and found 0D 0A instead of 0A.
I don't know why this happened. Please help.
HEX Editor screenshot (look at 0x82 and 0x19B)
The C and C++ Standards both specify that when a stream is open in text mode, sending a \n to it will do whatever action is necessary on the target system to advance the file to the next line. On a Unix system, that simply means outputting a \n to a file. On some record-based systems, it means flushing the current line-output record and advancing to the next one. On MS-DOS and Windows, it means sending both a \r and \n to the stream.
Historically, sending a \r to a teletype would reset the carriage to the left edge, and sending a \n would advance the paper. Someone realized that while the ability to reset the carriage without advancing paper was useful, having \n advance the paper without resetting the carriage was far less so, and thus some devices will respond to \n by advancing to the start of the next line. MS-DOS, however, opted to have files stored in a way that would produce meaningful output if sent directly to a printer where \n would advance to the current location on the next line, and one would have to send both \r and \n if one wanted to go to the start of the next line.
Welcome to the magic of end of lines across OS!
The C language was originally developed for Unix systems, where the end of a line is marked by the single '\n' (ASCII code 0x0A) character.
When it was ported to other systems, it was decided that, from the programmer's point of view, an end of line would always be a single '\n', and that the drivers and the standard library would convert it to the appropriate line ending whenever a file is opened in text mode.
That convention was later kept in C++.
As Windows uses "\r\n" as its end-of-line marker, the standard library converts every '\n' to the pair "\r\n" (ASCII codes 0x0D and 0x0A).

C/C++ print custom EOL

I want to generate on Windows a file (script) for UNIX. So I need to output just LF characters, without outputting CR characters.
When I do fprintf(fpout, "some text\n");, character \n is automatically replaced by \r\n in the file.
Is there a way to output specifically just \n (LF) character?
The language is C++, but the I/O functions come from C.
You can open the file in binary mode, e.g.
FILE *fpout = fopen("unixfile.txt", "wb");
fprintf(fpout, "some text\n"); // no \r inserted before \n
As a consequence, every byte you pass to fprintf is interpreted as a byte and nothing else, which should omit the conversion from \n to \r\n.
From cppreference on std::fopen:
File access mode flag "b" can optionally be specified to open a file in binary mode. This flag has no effect on POSIX systems, but on Windows, for example, it disables special handling of '\n' and '\x1A'.
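The same idea can be sketched with C++ streams, assuming std::ios::binary is acceptable in place of the "wb" flag. The helper names write_raw and read_raw are hypothetical; the second one only exists to check which bytes actually ended up in the file:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes text to path with no newline translation.
bool write_raw(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary);
    out << text;   // '\n' stays a single 0x0A byte, even on Windows
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: reads the file back byte for byte.
std::string read_raw(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Reading the file back in binary mode is what makes the check meaningful: in text mode on Windows, any "\r\n" would be folded back into '\n' and hide the difference.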

How can i convert linux text file to windows text file by using qt?

When I copy text files to a USB flash drive with Qt on a Raspberry Pi 3 and then open these text files on Windows, the '\n' characters do not seem to work on Windows.
I searched for this topic and saw that text file formats differ between Linux and Windows. So I have to copy Linux-based text files to the flash drive with Qt and be able to open these files on Windows.
There are a few characters which can indicate a new line. The usual ones are these two:
'\n' or '0x0A' (10 in decimal) -> This character is called "Line Feed" (LF).
'\r' or '0x0D' (13 in decimal) -> This one is called "Carriage return" (CR).
Different Operating Systems handle newlines in a different way. Here is a short list of the most common ones:
DOS and Windows :
They expect a newline to be the combination of two characters, namely '\r\n' (or 13 followed by 10).
Unix (and hence Linux as well) :
Unix uses a single '\n' to indicate a new line.
Mac :
Macs use a single '\r'.
EDIT : As MSalters mentioned Mac OSX is Unix and uses \n. The single \r is ancient Mac OS9
I guess you are just transporting the file, not doing anything with it, but I can't think of any option other than opening it and rewriting the line endings.
If you open the .txt file on Windows and read from it (with C++ or C++/Qt) and then write the lines as you get them to a new file, the line endings should then fit the Windows specifics.
You can read the file like this:
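// needs <fstream> and <string>
std::ifstream file(filePath);   // source file with Unix line endings
std::ofstream file2(filePath2); // text mode: '\n' is written as "\r\n" on Windows
std::string line;
while (std::getline(file, line))
{
    file2 << line << '\n'; // getline strips the delimiter, so write it back
}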
The documentation of std::getline at least states that it searches for '\n', so it should work on both Windows and Unix. If it doesn't, you can still set the delimiter to '\n' explicitly.
If you want to write the file 'Windows-like' on your Raspberry Pi, you can try to replace the '\n' characters with '\r\n'.
It should look something like this:
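std::string myFileAsString; // assumed to already hold the file contents
const std::string toReplace = "\n";
const std::string replaceWith = "\r\n";
// Replace every occurrence, not only the first one, and stop when find fails.
for (std::size_t pos = 0; (pos = myFileAsString.find(toReplace, pos)) != std::string::npos; pos += replaceWith.length())
    myFileAsString.replace(pos, toReplace.length(), replaceWith);
where std::string::find searches for the next '\n' and std::string::replace then replaces it with '\r\n'.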
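One caveat with a plain search and replace is that a file which already contains "\r\n" would end up with "\r\r\n". A small sketch that avoids this, assuming the whole file fits in a std::string (the function name to_crlf is hypothetical):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of in where every lone '\n'
// becomes "\r\n"; existing "\r\n" pairs are left untouched.
std::string to_crlf(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Insert '\r' only when the '\n' is not already preceded by one.
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out += '\r';
        out += in[i];
    }
    return out;
}
```

The result has to be written through a stream opened in binary mode; a text-mode stream on Windows would expand each '\n' a second time.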

0x0A after 0x0D when reading file

I read a file and find that there is a 0x0D before every 0x0A.
I only know that it is Windows that does the conversion.
But I have used binary mode; can't that prevent it?
ifstream input(inputname, ios::binary);
input.get(ch);
How do I avoid it? I only want to get the \n.
What about writing a file?
Thanks in advance.
If you're on a system that does use \r\n line endings then opening a file in text mode will cause the system to automatically convert these to the standard \n without \r. Opening a file in binary mode prevents this conversion.
If you're on a system that does not use this convention then there's no mode that will convert the line endings. You will have to convert them manually yourself or preprocess the file using an external tool.
If you want to detect whether a file uses \r\n you'll have to do it manually. Scan through the text file and see if every \n is preceded by a \r.
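A rough sketch of that scan, assuming the file contents have already been read into a std::string in binary mode. The enum LineEnding and the function detect_line_ending are hypothetical names:

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of what the scan found.
enum class LineEnding { None, LF, CRLF, Mixed };

// Checks, for every '\n', whether it is preceded by a '\r'.
LineEnding detect_line_ending(const std::string& text) {
    bool saw_lf = false;
    bool saw_crlf = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '\n') continue;
        if (i > 0 && text[i - 1] == '\r')
            saw_crlf = true;
        else
            saw_lf = true;
    }
    if (saw_lf && saw_crlf) return LineEnding::Mixed;
    if (saw_crlf) return LineEnding::CRLF;
    if (saw_lf) return LineEnding::LF;
    return LineEnding::None;
}
```

The Mixed case is worth reporting separately, since files edited on several systems often contain both conventions.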
As an alternative, instead of trying to preemptively detect what kind of line endings a file uses, you could simply add logic in your processing code to specially handle \r followed by \n. Something like:
for (int i = 0; i < n; ++i) {
    if ('\r' == text[i] && (i + 1 < n) && '\n' == text[i + 1])
        ++i; // skip carriage return, just handle newline
    if ('\n' == text[i]) {
        // handle newline...
    } else {
        // handle other characters
    }
}
Hmm. If you use binary mode, ios::binary tells the library that you want to read the file exactly as it is stored (uncooked, raw). Under MS-DOS (some people nowadays call it Windows NT), lines in text files are terminated by 0D 0A. So if you don't want to see these two characters, you have to open the file in text mode (just omit the ios::binary). Alternatively, you can convert these files to Unix style with a utility like dos2unix, but then, if you are on a Windows system, Notepad, for example, may not be able to display these files as expected...

c++ getline reads entire file

I'm using std::getline() to read from a text file, line by line. However, the first call to getline is reading in the entire file! I've also tried specifying the delimiter as '\n' explicitly. Any ideas why this might be happening?
My code:
std::ifstream serialIn;
...
serialIn.open(argv[3]);
...
std::string tmpStr;
std::getline(serialIn, tmpStr, '\n');
// All 570 lines in the file are in tmpStr!
...
std::string serialLine;
std::getline(serialIn, serialLine);
// serialLine == "" here
I am using Visual Studio 2008. The text file has 570 lines (I'm viewing it in Notepad++ fwiw).
Edit: I worked around this problem by using Notepad++ to convert the line endings in my input text file to "Windows" line endings. The file was written with '\n' at the end of each line, using C++ code. Why would getline() require the Windows line endings (\r\n)? Does this have to do with character width, or with the Microsoft implementation?
Just guessing, but could your file have Unix line-endings and you're running on Windows?
You're confusing the newline you see in code ('\n') with the actual line-ending representation for the platform (some combination of carriage-return (CR) and linefeed (LF) bytes).
The standard I/O library functions automatically convert line-endings for your platform to and from conceptual newlines for text-mode streams (the default). See What's the difference between text and binary I/O? from the comp.lang.c FAQ. (Although that's from the C FAQ, the concepts apply to C++ as well.) Since you're on Windows, the standard I/O functions by default write newlines as CR-LF and expect CR-LF for newlines when reading.
If you don't want these conversions done and would prefer to see the raw, unadulterated data, then you should set your streams to binary mode. In binary mode, \n corresponds to just LF, and \r corresponds to just CR.
In C, you can specify binary mode by passing "b" as one of the flags to fopen:
FILE* file = fopen(filename, "rb"); // Open a file for reading in binary mode.
In C++:
std::ifstream in;
in.open(filename, std::ios::binary);
or:
std::ifstream in(filename, std::ios::binary);
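Once a stream is in binary mode, the code has to cope with a possible trailing '\r' itself. A minimal sketch of one way to do that; the helper names getline_any and split_lines are hypothetical, and split_lines is only there to show the helper on an in-memory string:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: like std::getline, but also drops the '\r'
// left over from a "\r\n" ending when the stream is in binary mode.
bool getline_any(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return true;
}

// Hypothetical demo: splits a buffer that may mix both line endings.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getline_any(in, line)) lines.push_back(line);
    return lines;
}
```

With this, the same loop works whether the input uses LF or CR-LF, so the file does not need to be converted beforehand.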