Loading a file that contains a newline with the LoadFromFile() function - c++

I load a .txt text file using the LoadFromFile() function, and the text contains a newline character '\n' in the middle of a line.
The LoadFromFile() function treats this character as a line break and splits the line at that point, creating a new line.
In Windows Notepad the text looks like this: **Ala has ace**
In the program that loads this file it looks different:
plik->LoadFromFile( path, TEncoding::ASCII);
for( short int i = 0; i < plik->Count; ++i )
Memo1->Lines->Add( plik->Strings[i] );
In Memo1 the text looks like this:
**Ala**
**has ace**
Can I remove the '\n' character so that the entire line stays together, and if so, how?

I answered this same question on the Embarcadero forums earlier today, but I will answer it here, too.
plik is a TStringList (according to the other discussion), so its LoadFrom...() method treats bare-CR, bare-LF, and CRLF line breaks equally when the TStrings::LineBreak property matches the RTL's global sLineBreak constant. If the LineBreak property does not match sLineBreak, then TStrings only splits on line breaks that match its LineBreak property.
Since the RTL's sLineBreak constant is CRLF on Windows, and you don't want to split on bare-LF line breaks, you are going to have to parse the file data manually and not use TStrings::LoadFromFile() at all.
For instance, you could read the whole file into a System::String using the System::Classes::TStreamReader::ReadToEnd() or System::Ioutils::TFile::ReadAllText() method (TStreamReader and TFile both have methods for reading lines, but they both treat all three forms of line break equally), and then parse that String to extract CRLF-delimited substrings while ignoring any bare-LF characters.
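As a rough illustration of that manual parsing, here is a minimal sketch in standard C++. It assumes the file contents have already been read into a std::string (used here in place of System::String so the example stays portable), and the function name splitOnCrLfOnly is hypothetical. Only CRLF pairs end a line; bare-LF characters are simply dropped.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` on CRLF pairs only.
// A bare '\n' is NOT treated as a line break; it is dropped so that the
// surrounding text stays on one line. A bare '\r' is kept as ordinary data.
std::vector<std::string> splitOnCrLfOnly(const std::string& text)
{
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            lines.push_back(current);   // CRLF ends the current line
            current.clear();
            ++i;                        // skip the '\n' of the pair
        } else if (c == '\n') {
            // bare LF: ignore it (append ' ' here instead if a space is wanted)
        } else {
            current += c;
        }
    }
    if (!current.empty())
        lines.push_back(current);       // last line without a trailing CRLF
    return lines;
}
```

Each returned line could then be converted to a String and passed to Memo1->Lines->Add(), instead of going through TStringList::LoadFromFile().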
Ideally, you would load a file into a TMemo by using its own LoadFromFile() method. But, in this situation, that will not work, either, because TMemo normalizes all three forms of line breaks to CRLF before passing the data to the Win32 API, so that is not useful to you.

Related

Attempt to compare character in string to carriage return does not work

I have some code that reads a text-based file format and checks for an empty line with:
line == ""
where line is a string that receives a text line obtained through getline.
It worked with my own text-based file format, but it did not work with another text-based file format (not mine).
I opened the file with gedit and saw nothing. The more and less utilities also did not show anything. Then I tried vi and it showed ^M on all the lines that had seemed empty until then.
I did some research, and it seems that, with the file opened in text mode, all I needed to do was compare it to '\n'. So I wrote the line:
if (line[0] == '^M' || line[0] == '\n')
break;
to end a while loop where this "if" is inside, but it did not work. What do I need to do?
As you have already surmised, those ^Ms are vi's way of showing you that there are carriage return characters at the end of each line. The file probably originated on Windows.
As other commentators have mentioned, the way a carriage return character is represented in C / C++ is '\r', and the line endings in that particular file will almost certainly actually be \r\n (CRLF).
So, now that you know how it all works, you have some code to write. getline will remove the \n, but you'll have to strip the \r (if there is one) off the end of the line yourself. Go to it.

Reading in quoted CSV data without newline as endline

I have an issue with a file I am trying to read in and I don't know how to solve it.
The file is a CSV, but there are also commas in the text of the file, so there are quotes around the commas indicating new values.
For instance:
"1","hello, ""world""","and then this" // In text " is written as ""
I would like to know how to deal with quotes using a QFileStream (though I haven't seen a base solution either).
Furthermore, another problem is that I also can't read line by line as within these quotes there might be newlines.
In R, there is an option of quotes="" which solves these problems.
There must be something in C++. What is it?
In Qt you can split on a quote character (or on any other special character, such as '\') by escaping it with a backslash. For example, string.split("\""); splits the string on the '"' character.
Here is a simple console app that splits your file (the easiest solution so far seems to be splitting on the "," sequence):
// open split.csv, in this case located in the project folder
QFile file("split.csv");
file.open(QIODevice::ReadOnly);
// dump all of its contents to stdout, just for testing
std::cout << QString(file.readAll()).toStdString() << std::endl;
// reset the file so it can be read again
file.reset();
// read the whole file into a QByteArray, pass it to the QString constructor,
// split that string on the "," sequence and store the pieces in a QStringList,
// where every element of the list is the value of one cell in the CSV file
QStringList list = QString(file.readAll()).split("\",\"", QString::SkipEmptyParts);
// add back the quotes that were removed by split
for (int i = 0; i < list.size(); i++) {
    if (i != 0) list[i].prepend("\"");
    if (i != (list.size() - 1)) list[i].append("\"");
}
// print the results to stdout (not using QDebug, because it would add
// more quotes to output that is already confusing enough)
foreach (QString i, list) std::cout << i.toStdString() << std::endl;
where split.csv contains "1","hello, ""world""","and then this" and the output is:
"1"
"hello, ""world"""
"and then this"
After some googling I found a ready-made solution. See this article about qxt.
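For the parts of the question that splitting on "," does not cover (newlines inside quotes and multiple rows), a small character-by-character parser is one alternative. The following is only a sketch in standard C++, not the Qt or qxt approach mentioned above. It assumes the whole file is already in a std::string, and the names CsvRow and parseQuotedCsv are hypothetical.

```cpp
#include <string>
#include <vector>

using CsvRow = std::vector<std::string>;

// Hypothetical helper: parse CSV text where fields may be wrapped in double
// quotes, "" inside a quoted field means one literal quote, and quoted
// fields may contain commas and newlines. Blank lines are skipped.
std::vector<CsvRow> parseQuotedCsv(const std::string& text)
{
    std::vector<CsvRow> rows;
    CsvRow row;
    std::string field;
    bool inQuotes = false;    // currently inside a quoted field?
    bool rowHasData = false;  // has the current row seen any content?

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (inQuotes) {
            if (c != '"') {
                field += c;                       // commas and newlines are data here
            } else if (i + 1 < text.size() && text[i + 1] == '"') {
                field += '"';                     // "" -> one literal quote
                ++i;
            } else {
                inQuotes = false;                 // closing quote
            }
        } else if (c == '"') {
            inQuotes = true;
            rowHasData = true;
        } else if (c == ',') {
            row.push_back(field);                 // end of field
            field.clear();
            rowHasData = true;
        } else if (c == '\n' || c == '\r') {
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
                ++i;                              // CRLF counts as one row break
            if (rowHasData) {
                row.push_back(field);
                rows.push_back(row);
            }
            row.clear();
            field.clear();
            rowHasData = false;
        } else {
            field += c;
            rowHasData = true;
        }
    }
    if (rowHasData) {                             // last row without a final newline
        row.push_back(field);
        rows.push_back(row);
    }
    return rows;
}
```

For the sample line from the question, this yields one row with three fields, the second being hello, "world" with the doubled quotes collapsed.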

How can I read Notepad++ file in DOS or Fortran?

I received a textfile created with Notepad++ that I'm trying to read with a Fortran 95 program on both a Mac and a PC. The read line is:
read(lun,'(a)',iostat=io1) input
Since I don't know what the line lengths are, I defined input to be 512 characters long. With non-Notepad++ files, when the end of a line is found the read "stops" and automatically advances to the next line of text. With the Notepad++ file, it reads 512 characters, skipping over the carriage returns. When I open the file using the DOS editor on the PC, I see carriage return symbols (ASCII char 13), but there is no break between lines; they are all appended to one another.
I've tried searching for ichar(13) and ichar(10), backspacing to the beginning of the line and trying to force an advance to the next line, and reading in with format '(a,/)', but haven't been able to get anything to work.
What you need is a pipeline type design. The basic routine is one called getline, which gets a line of data up to the carriage return. Inside the initialization, what you do is open the file as a binary file and read a buffer of say 1024 characters in. Whenever getline is called, return the next lot of characters until you get to a CR. If there aren't enough characters, move the unprocessed characters to the front and read in the remaining characters.
It is basically how compilers work - they get a stream of tokens, which, in your case is a string of characters ending with a CR, and then they process the tokens.

Vim: Inter-String Line Breaking

Consider the three lines shown below.
std::ostringstream ss;
cc::write(ss, "Error parsing first magic byte of header: expected 'P', but got '{0}'.", c);
return io_error{ss.str()};
The second line automatically breaks because it exceeds the text width (&tw), but it does so unsatisfactorily for two reasons:
(1) When the line breaks on a string, the procedure is a little more complicated than usual. Vim needs to close the string literal at the end of the broken line, and open a new string literal at the beginning of the newly created line. But it would be awkward for the line to be broken in the middle of a word, so Vim needs to back up until it finds a word boundary such that adding a " character after it would not exceed the text width. If it can find no such word boundary, then the entire string needs to begin on the next line.
(2) When the line breaks in the middle of a string, I do not want any indentation to be inserted at the beginning of the following line.
Are there any native features of Vim or plugins that I can use to get behaviors (1) and (2), or do I have to write my own plugin?
To have this special line breaking behavior both with auto-format and gq, you have to write a custom 'formatexpr' that takes this into account.
I'm not aware of any existing plugin, but maybe you can find something on vim.org to get you started.

QString::split() and "\r", "\n" and "\r\n" convention

I understand that QString::split should be used to get a QStringList from a multiline QString. But if I have a file and I don't know if it comes from Mac, Windows or Unix, I'm not sure if QString.split("\n") would work well in all the cases. What is the best way to handle this situation?
If it's acceptable to remove blank lines, you can try:
QString.split(QRegExp("[\r\n]"),QString::SkipEmptyParts);
This splits the string wherever either newline character (line feed or carriage return) is found. Any consecutive line breaks (e.g. \r\n\r\n or \n\n) will be considered multiple delimiters with empty parts between them, which will be skipped.
Emanuele Bezzi's answer misses a couple of points.
In most cases, a string read from a text file will have been read using a text stream, which automatically translates the OS's end-of-line representation to a single '\n' character. So if you're dealing with native text files, '\n' should be the only delimiter you need to worry about. For example, if your program is running on a Windows system, reading input in text mode, line endings will be marked in memory with single \n characters; you'll never see the "\r\n" pairs that exist in the file.
But sometimes you do need to deal with "foreign" text files.
Ideally, you should probably translate any such files to the local format before reading them, which avoids the issue. Only the translation utility needs to be aware of variant line endings; everything else just deals with text.
But that's not always possible; sometimes you might want your program to handle Windows text files when running on a POSIX system (Linux, UNIX, etc.), or vice versa.
A Windows-format text file on a POSIX system will appear to have an extra '\r' character at the end of each line.
A POSIX-format text file on a Windows system will appear to consist of one very long line with embedded '\n' characters.
The most general approach is to read the file in binary mode and deal with the line endings explicitly.
I'm not familiar with QString.split, but I suspect that this:
QString.split(QRegExp("[\r\n]"),QString::SkipEmptyParts);
will ignore empty lines, which will appear either as "\n\n" or as "\r\n\r\n", depending on the format. Empty lines are perfectly valid text data; you shouldn't ignore them unless you're certain that it makes sense to do so.
If you need to deal with text input delimited either by "\n", "\r\n", or "\r", then I think something like this:
QString.split(QRegExp("\n|\r\n|\r"));
would do the job. (Thanks to parsley72's comment for helping me with the regular expression syntax.)
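For comparison, the same splitting rule can be written without Qt. This is a sketch in standard C++ that assumes the text was read in binary mode into a std::string; the function name splitAnyLineEnding is hypothetical. Like a split without SkipEmptyParts, it keeps empty lines.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split on "\r\n", "\n" or "\r", treating CRLF as a
// single break and keeping empty lines.
std::vector<std::string> splitAnyLineEnding(const std::string& text)
{
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
                ++i;                   // CRLF counts as one break
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    lines.push_back(current);          // text after the last break (may be empty)
    return lines;
}
```

As with a plain split, text that ends in a line break produces a final empty element.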
Another point: you're not likely to encounter text files that use just '\r' to delimit lines. That's the format used by Mac OS up to version 9. Mac OS X is based on UNIX, and it uses standard UNIX-style '\n' line endings (though it probably tolerates '\r' line endings as well).