Attempt to compare character in string to carriage return does not work - c++

I have some code that reads a text-based file format, and it checks for an empty line with:
line == ""
where line is a string that receives a text line obtained through getline.
It worked with my own text-based file format, but it did not work with another text-based file format (not mine).
I opened the file with gedit and saw nothing. The more and less utilities didn't show anything either. Then I tried vi, and it showed:
^M on all of the lines that had seemed empty until then.
Did some research and it seems that opening the file in text mode, all I needed to do was to compare it to '\n'. So I wrote the line:
if (line[0] == '^M' || line[0] == '\n')
break;
to end a while loop where this "if" is inside, but it did not work. What do I need to do?

As you have already surmised, those ^Ms are vi's way of showing you that there are carriage return characters at the end of each line. The file probably originated on Windows.
As other commenters have mentioned, a carriage return character is written in C / C++ as '\r', and the line endings in that particular file will almost certainly be \r\n (CRLF).
So, now that you know how it all works, you have some code to write. getline will remove the \n, but you'll have to strip the \r (if there is one) off the end of the line yourself. Go to it.

Related

Loading a file with the LoadFromFile() function with a newline

I load a .txt text file using the LoadFromFile() function, and in the middle of one line the text contains a newline character '\n'.
The LoadFromFile() function treats this character as a new line and divides the line in that place by creating a new line.
In Windows Notepad, the text looks like this: **Ala has ace**
In my program it looks different. This is the code that loads the file:
plik->LoadFromFile( path, TEncoding::ASCII);
for( short int i = 0; i < plik->Count; ++i )
Memo1->Lines->Add( plik->Strings[i] );
In Memo1 the text looks like this:
**Ala**
**has ace**
Can I remove the '\n' character so that the whole line stays together, and if so, how?
I answered this same question on the Embarcadero forums earlier today, but I will answer it here, too.
plik is a TStringList (according to the other discussion), so its LoadFrom...() method treats bare-CR, bare-LF, and CRLF line breaks equally when the TStrings::LineBreak property matches the RTL's global sLineBreak constant. If the LineBreak property does not match sLineBreak, then TStrings only splits on line breaks that match its LineBreak property.
Since the RTL's sLineBreak constant is CRLF on Windows, and you don't want to split on bare-LF line breaks, you are going to have to parse the file data manually, not use TStrings::LoadFromFile() at all.
For instance, you could read the whole file into a System::String using the System::Classes::TStreamReader::ReadToEnd() or System::Ioutils::TFile::ReadAllText() method (TStreamReader and TFile both have methods for reading lines, but they both treat all three forms of line break equally), and then parse that String to extract CRLF-delimited substrings while ignoring any bare-LF characters.
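A rough sketch of that parsing step in standard C++, using std::string in place of System::String (an assumption; the VCL/RTL types mentioned above are not used here). The function name split_crlf is hypothetical. It splits only at CRLF pairs, so a bare LF stays inside its line; whether to then drop it or replace it with a space is left to the application.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text only at "\r\n" pairs.
// A bare '\n' (or bare '\r') is treated as ordinary text and stays in the line.
std::vector<std::string> split_crlf(const std::string& text) {
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = text.find("\r\n", start);
        if (pos == std::string::npos) {
            if (start < text.size())
                lines.push_back(text.substr(start));  // last line has no trailing CRLF
            break;
        }
        lines.push_back(text.substr(start, pos - start));
        start = pos + 2;  // skip over the CRLF pair
    }
    return lines;
}
```

For the example above, text such as "Ala\nhas ace\r\n" would come back as a single line "Ala\nhas ace".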
Ideally, you would load a file into a TMemo by using its own LoadFromFile() method. But, in this situation, that will not work, either, because TMemo normalizes all three forms of line breaks to CRLF before passing the data to the Win32 API, so that is not useful to you.

How can I read a Notepad++ file in DOS or Fortran?

I received a text file created with Notepad++ that I'm trying to read with a Fortran 95 program on both a Mac and a PC. The read statement is:
read(lun,'(a)',iostat=io1) input
Since I don't know what the line lengths are, I declared input to be 512 characters long. With files not created in Notepad++, the read "stops" when the end of the line is found and automatically advances to the next line of text. With the Notepad++ file, it reads 512 characters, skipping over the carriage returns. When I open the file in the DOS editor on the PC, I see carriage return symbols (ASCII char 13), but there is no break between lines; they are all appended to one another.
I've tried searching for ichar(13) and ichar(10), backspacing to the beginning of the line and trying to force an advance to the next line, and reading with format '(a,/)', but I haven't been able to get anything to work.
What you need is a pipeline type design. The basic routine is one called getline, which gets a line of data up to the carriage return. Inside the initialization, what you do is open the file as a binary file and read a buffer of say 1024 characters in. Whenever getline is called, return the next lot of characters until you get to a CR. If there aren't enough characters, move the unprocessed characters to the front and read in the remaining characters.
It is basically how compilers work - they get a stream of tokens, which, in your case is a string of characters ending with a CR, and then they process the tokens.

Vim: Inter-String Line Breaking

Consider the three lines shown below.
std::ostringstream ss;
cc::write(ss, "Error parsing first magic byte of header: expected 'P', but got '{0}'.", c);
return io_error{ss.str()};
The second line automatically breaks because it exceeds the text width (&tw), but it does so unsatisfactorily for two reasons:
1. When the line breaks on a string, the procedure is a little more complicated than usual. Vim needs to close the string literal at the end of the broken line and open a new string literal at the beginning of the newly created line. But it would be awkward for the line to be broken in the middle of a word, so Vim needs to back up until it finds a word boundary such that adding a " character after it would not exceed the text width. If it can find no such word boundary, then the entire string needs to be begun on the next line.
2. When the line breaks in the middle of a string, I do not want any indentation to be inserted at the beginning of the following line.
Are there any native features of Vim or plugins that I can use to get behaviors (1) and (2), or do I have to write my own plugin?
To have this special line breaking behavior both with auto-format and gq, you have to write a custom 'formatexpr' that takes this into account.
I'm not aware of any existing plugin, but maybe you can find something on vim.org to get you started.

QString::split() and "\r", "\n" and "\r\n" convention

I understand that QString::split should be used to get a QStringList from a multiline QString. But if I have a file and I don't know if it comes from Mac, Windows or Unix, I'm not sure if QString.split("\n") would work well in all the cases. What is the best way to handle this situation?
If it's acceptable to remove blank lines, you can try:
QString.split(QRegExp("[\r\n]"),QString::SkipEmptyParts);
This splits the string wherever either newline character (line feed or carriage return) is found. A \r\n pair counts as two delimiters with an empty part between them, and consecutive line breaks (e.g. \r\n\r\n or \n\n) likewise produce empty parts; SkipEmptyParts discards all of them.
Emanuele Bezzi's answer misses a couple of points.
In most cases, a string read from a text file will have been read using a text stream, which automatically translates the OS's end-of-line representation to a single '\n' character. So if you're dealing with native text files, '\n' should be the only delimiter you need to worry about. For example, if your program is running on a Windows system, reading input in text mode, line endings will be marked in memory with single \n characters; you'll never see the "\r\n" pairs that exist in the file.
But sometimes you do need to deal with "foreign" text files.
Ideally, you should probably translate any such files to the local format before reading them, which avoids the issue. Only the translation utility needs to be aware of variant line endings; everything else just deals with text.
But that's not always possible; sometimes you might want your program to handle Windows text files when running on a POSIX system (Linux, UNIX, etc.), or vice versa.
A Windows-format text file on a POSIX system will appear to have an extra '\r' character at the end of each line.
A POSIX-format text file on a Windows system will appear to consist of one very long line with embedded '\n' characters.
The most general approach is to read the file in binary mode and deal with the line endings explicitly.
I'm not familiar with QString.split, but I suspect that this:
QString.split(QRegExp("[\r\n]"),QString::SkipEmptyParts);
will ignore empty lines, which will appear either as "\n\n" or as "\r\n\r\n", depending on the format. Empty lines are perfectly valid text data; you shouldn't ignore them unless you're certain that it makes sense to do so.
If you need to deal with text input delimited either by "\n", "\r\n", or "\r", then I think something like this:
QString.split(QRegExp("\n|\r\n|\r"));
would do the job. (Thanks to parsley72's comment for helping me with the regular expression syntax.)
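Outside Qt, the "read in binary mode and handle the endings yourself" approach might look like this in standard C++. This is a sketch, not code from either answer, and read_file_binary and split_lines are hypothetical names. Like the "\n|\r\n|\r" pattern, it accepts all three conventions; unlike the SkipEmptyParts version, it keeps blank lines. It treats a final line break as a terminator rather than a separator, so a file ending in a newline does not produce an extra empty last line.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads the whole file without any end-of-line translation.
// Returns an empty string if the file cannot be opened.
std::string read_file_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Splits on "\r\n" (Windows), "\n" (UNIX) or a lone "\r" (Mac OS 9 and earlier).
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\r' || c == '\n') {
            lines.push_back(current);  // blank lines are kept
            current.clear();
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
                ++i;                   // CRLF counts as a single break
        } else {
            current += c;
        }
    }
    if (!current.empty())
        lines.push_back(current);      // last line without a terminator
    return lines;
}
```

Typical use would be split_lines(read_file_binary("input.txt")), where the file name is just a placeholder.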
Another point: you're not likely to encounter text files that use just '\r' to delimit lines. That's the format used by Mac OS up to version 9. Mac OS X is based on UNIX, and it uses standard UNIX-style '\n' line endings (though it probably tolerates '\r' line endings as well).

How to detect newline character(s) in string using Visual Studio 6 C++

I have a multi-line ASCII string coming from some (Windows/UNIX/...) system. Now, I know about differences in newline character in Windows and UNIX (CR-LF / LF) and I want to parse this string on both (CR and LF) characters to detect which newline character(s) is used in this string, so I need to know what "\n" in VS6 C++ means.
My question: if I write a piece of code like this in Visual Studio 6 for Windows:
bool FindNewline (string & inputString) {
size_t found;
found = inputString.find ("\n");
return (found != string::npos ? true : false);
}
Does this search for CR+LF or only LF? Should I use "\r\n", or does the compiler interpret "\n" as CR+LF?
inputString.find ("\n");
will search for the LF character (alone).
Library routines may 'translate' between CR/LF and '\n' when I/O is performed on a text stream, but inside the realm of your program code, '\n' is just a line-feed.
"\n" means "\n". Nothing else. So you search for LF only. However Microsoft CRT does some conversions for you when you read a file in text mode, so you can write simpler code, sometimes.
All translation between "\n" and "\r\n" happens during I/O. At all other times, "\n" is just that and nothing more.
Somehow: return (found != string::npos ? true : false); reminds me of another answer I wrote a while back.
Apart from the VS6 part (you really, really want to upgrade; the compiler is way out of date and Microsoft doesn't really support it anymore), the answer to the question depends on how you are getting the string.
For example, if you read it from a file in text mode, the runtime library will translate \r\n into \n. So if all your text strings are read in text mode via the usual file-based APIs, your search for \n (i.e., newline only) would be sufficient.
If the strings originate in files that are read in binary mode on Windows and are known to contain the DOS/Windows line separator \r\n, then you're better off searching for that character sequence.
EDIT: If you do get it in binary form, yes, ideally you'd have to check for both \r\n and \n. However I would expect that they aren't mixed within one string and still carry the same meaning unless it's a really messed up data format. I would probably check for \r\n first and then \n second if the strings are short enough and scanning them twice doesn't make that much of a difference. If it does, I'd write some code that checks for both \r\n and single \n in a single pass.
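A single-pass check along those lines might look like the sketch below. The enum and function names are made up for illustration, and the code deliberately sticks to C++98 features (a plain enum, no auto) so that an old compiler such as VS6 should accept it; that compatibility is an assumption and has not been verified here. Each CR immediately followed by LF counts as one CRLF; any other CR or LF counts on its own.

```cpp
#include <string>

// Hypothetical result type: which line separator(s) a string contains.
enum NewlineStyle { NL_NONE, NL_LF, NL_CRLF, NL_CR, NL_MIXED };

// Scans the string once and classifies its line separators.
NewlineStyle detect_newline_style(const std::string& s) {
    bool lf = false, crlf = false, cr = false;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '\r') {
            if (i + 1 < s.size() && s[i + 1] == '\n') {
                crlf = true;
                ++i;  // consume the '\n' belonging to this CRLF pair
            } else {
                cr = true;
            }
        } else if (s[i] == '\n') {
            lf = true;  // a bare LF, not preceded by CR
        }
    }
    int kinds = (lf ? 1 : 0) + (crlf ? 1 : 0) + (cr ? 1 : 0);
    if (kinds == 0) return NL_NONE;
    if (kinds > 1) return NL_MIXED;
    if (crlf) return NL_CRLF;
    return lf ? NL_LF : NL_CR;
}
```

A caller could then split on "\r\n" when the result is NL_CRLF, on "\n" when it is NL_LF, and fall back to handling each separator individually for NL_MIXED.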