Portable char newline - c++

Is there a way to write a cross-platform parser that reads chars until a newline character is found? I'm using '\0' in Linux, but I'm not sure that this can be done on Windows too.
std::string line;
// fill the line
QTextStream ss(&line);
for(;;)
{
ss >> c;
if(c == '"' || c=='\0' ) // here I want to continue parsing until a new-line character or a ending double quote is found
break;
}

If you are working with the C++ text streams (std::istream and std::ostream, unless the ios_base::binary flag has been set when opening a file stream), then C++ treats input and output of \n in a platform-independent manner.
That means that reading a file which contains \r\n on Windows will treat this as if it were \n, and likewise outputting \n will write the platform-specific newline sequence.
If you need to read consecutive lines, the easiest way is to use getline:
std::string line;
while (getline(std::cin, line)) {
// process line
}
\0 is never treated as a newline character.

The newline character in C is '\n' not '\0'. It will be converted to whatever the current platform uses.

Related

Ifstream stops reading file after a few lines

I am using an ifstream into a stringstream for reading a file but it stops after a couple lines...
string read(string filename)
{
ifstream inFile;
inFile.open(filename);
stringstream strStream;
strStream << inFile.rdbuf();
inFile.close();
string str = strStream.str();
return str;
}
This code stops after 'zh¬'
I think they may be control characters in the ASCII table; the first character after the point where it stops has the value 26.
But I wouldn't think that matters.
Your ifstream is being opened in text mode. Try opening the file in binary mode:
std::ifstream inFile(filename, std::ios::binary);
A text stream is an ordered sequence of characters composed into lines (zero or more characters plus a terminating '\n'). Whether the last line requires a terminating '\n' is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to the conventions for representing text in the OS (in particular, C streams on Windows OS convert \n to \r\n on output, and convert \r\n to \n on input)
Data read in from a text stream is guaranteed to compare equal to the data that were earlier written out to that stream only if all of the following is true:
the data consist only of printing characters and the control characters \t and \n (in particular, on Windows OS, the character '\x1A' terminates input)
no \n is immediately preceded by a space character (space characters that are written out immediately before a \n may disappear when read)
the last character is \n
A binary stream is an ordered sequence of characters that can transparently record internal data. Data read in from a binary stream always equals to the data that were earlier written out to that stream. Implementations are only allowed to append a number of null characters to the end of the stream. A wide binary stream doesn't need to end in the initial shift state.
https://en.cppreference.com/w/cpp/io/c#Binary_and_text_modes
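A minimal sketch of the binary-mode variant of the read function from the question. The name read_file_binary is an assumption for illustration; the only substantive change is the std::ios::binary flag, which keeps the runtime from treating byte 26 (0x1A) as end of input or rewriting \r\n.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the raw bytes of a file, or an empty string
// if the file cannot be opened.
std::string read_file_binary(const std::string& path)
{
    // Binary mode: no newline translation and no special meaning for 0x1A.
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::string();
    std::ostringstream contents;
    contents << in.rdbuf();   // copy everything up to the real end of file
    return contents.str();
}
```

With this version, a file containing the bytes 'a', 'b', 0x1A, 'c', 'd' should come back with all five bytes, whereas a text-mode read on Windows may stop after "ab".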

How to read string from file with line break using ifstream c++?

I used this code to read lines from file, but I noticed, that it didn't read line breaks:
ifstream fs8(sourceFile);
string line;
while (getline(fs8, line))
{
// here I am converting from UTF-8 to UTF-16, but I also need to convert the "\n" symbol
}
How can I read lines together with their line breaks?
std::getline() reads data up to a delimiter, which is not stored. By default, that delimiter is '\n'. So you would have to either:
a) Pick a different delimiter -- but then you would no longer read "lines".
b) Add the newline to the data read (line += '\n').
I'd go for b), if you really need that newline converted. (I don't quite see why that would be necessary, but who am I to judge. ;-) )

Split string by delimiter by using vectors - how to split by newline?

I have function like this (I found it somewhere, it works with \t separator).
vector<string> delimited_str_to_vector(string& str, string delimiter)
{
vector<string> retVect;
size_t pos = 0;
while(str.substr(pos).find(delimiter) != string::npos)
{
retVect.push_back(str.substr(pos, str.substr(pos).find(delimiter)));
pos += str.substr(pos).find(delimiter) + delimiter.size();
}
retVect.push_back(str.substr(pos));
return retVect;
}
I have problem with splitting string by "\r\n" delimiter. What am I doing wrong?
string data = get_file_contents("csvfile.txt");
vector<string> csvRows = delimited_str_to_vector(data, "\r\n");
I'm sure, that my file uses CRLF for new line.
You can use getline to read the file line by line, which:
Extracts characters from is and stores them into str until the delimitation character delim is found (or the newline character, '\n' ...) If the delimiter is found, it is extracted and discarded, i.e. it is not stored and the next input operation will begin after it.
Perhaps you are already reading the file through a function that removes line endings.
If you open your file in text mode, i.e., you don't mention std::ios_base::binary (or one of its alternate spellings), it is likely that the system-specific end-of-line sequences are replaced by \n characters. That is, even if your source file used \r\n, you may not see this character sequence when reading the file. Add the binary flag when opening the file if you really want to process these sequences.
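For reference, here is a sketch of a splitter that searches the original string from a moving offset instead of building a new substring on every step. The name split_on is made up for this example. If the data was read in text mode on Windows, it will contain only "\n", so searching for "\r\n" finds nothing and the whole text comes back as a single element.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text on every occurrence of delimiter.
// An empty delimiter returns the whole text as one element.
std::vector<std::string> split_on(const std::string& text, const std::string& delimiter)
{
    std::vector<std::string> parts;
    if (delimiter.empty()) {
        parts.push_back(text);
        return parts;
    }
    std::string::size_type start = 0;
    std::string::size_type hit;
    // find() with a start offset avoids copying the remaining text each time.
    while ((hit = text.find(delimiter, start)) != std::string::npos) {
        parts.push_back(text.substr(start, hit - start));
        start = hit + delimiter.size();
    }
    parts.push_back(text.substr(start));   // whatever follows the last delimiter
    return parts;
}
```

So either read the file in binary mode and split on "\r\n", or keep text mode and split on "\n".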

Using getline() with multiple types of end of line characters

I am using std::getline() in the following manner:
std::fstream verify;
verify.open(myURI.c_str());
std::string countingLine;
if(verify.is_open()){
std::getline(verify, countingLine);
std::istringstream iss(countingLine);
size_t pos;
// Check for the conventional myFile header.
pos = iss.str().find("Time,Group,Percent,Sign,Focus");
if(pos == std::string::npos){//its not there
headerChk = false;
this->setStatusMessage("Invalid header for myFile file");
return 0;
}
// loop that does more validation
iss.clear();
}
The problem is I'm coding on a mac (and some files get modified with both windows tools and apple tools). Some end of line characters are \r instead of \n, so my file string is never broken into lines. I believe there is also a third one I should be checking for. I'm having trouble finding an example of setting up the delim parameter for multiple endOfLine characters.
If someone could help with that example or a different approach that would be great.
Thanks
std::getline() only supports one end of line character. When opening a file in text mode, the system's end of line sequences are converted into one single end of line character (\n). However, this doesn't deal with end of line character sequences from other systems. Practically, all that really needs to be done is to remove the \r character from the input which remains. The best way to remove characters is probably to create a filtering stream buffer. Here is a trivial, untested, and probably slow one (it isn't buffering, which means there is a virtual function call for each individual character; this is horrific; creating a buffered version isn't much harder, though):
#include <streambuf>
#include <string> // std::char_traits
// Filtering stream buffer that drops every '\r' read from the wrapped buffer.
class normalizebuf
: public std::streambuf { // public base, so that std::istream can accept a pointer to it
std::streambuf* sbuf_;
char buffer_[1];
public:
normalizebuf(std::streambuf* sbuf): sbuf_(sbuf) {}
int underflow() {
int c = this->sbuf_->sbumpc();
// skip over any carriage returns
while (c == std::char_traits<char>::to_int_type('\r')) {
c = this->sbuf_->sbumpc();
}
if (c != std::char_traits<char>::eof()) {
// expose the single character through the one-element get area
this->buffer_[0] = std::char_traits<char>::to_char_type(c);
this->setg(this->buffer_, this->buffer_, this->buffer_ + 1);
}
return c;
}
};
You'd use this filter with an existing stream buffer, something like this:
std::ifstream fin("foo");
normalizebuf sbuf(fin.rdbuf());
std::istream in(&sbuf);
... and then you'd use in to read the file with all \r characters removed.

Reading from ifstream won't read whitespace

I'm implementing a custom lexer in C++ and when attempting to read in whitespace, the ifstream won't read it out. I'm reading character by character using >>, and all the whitespace is gone. Is there any way to make the ifstream keep all the whitespace and read it out to me? I know that when reading whole strings, the read will stop at whitespace, but I was hoping that by reading character by character, I would avoid this behaviour.
Attempted: .get(), recommended by many answers, but it has the same effect as std::noskipws, that is, I get all the spaces now, but not the new-line character that I need to lex some constructs.
Here's the offending code (extended comments truncated)
while(input >> current) {
always_next_struct val = always_next_struct(next);
if (current == L' ' || current == L'\n' || current == L'\t' || current == L'\r') {
continue;
}
if (current == L'/') {
input >> current;
if (current == L'/') {
// explicitly empty while loop
while(input.get(current) && current != L'\n');
continue;
}
I'm breaking on the while line and looking at every value of current as it comes in, and \r or \n are definitely not among them- the input just skips to the next line in the input file.
There is a manipulator to disable the whitespace skipping behavior:
stream >> std::noskipws;
The operator>> eats whitespace (space, tab, newline). Use yourstream.get() to read each character.
Edit:
Beware: Platforms (Windows, Un*x, Mac) differ in coding of newline. It can be '\n', '\r' or both. It also depends on how you open the file stream (text or binary).
Edit (analyzing code):
After
while(input.get(current) && current != L'\n');
continue;
there will be a \n in current, unless the end of the file was reached. After that you continue with the outermost while loop. There the first character on the next line is read into current. Is that not what you wanted?
I tried to reproduce your problem (using char and cin instead of wchar_t and wifstream):
//: get.cpp : compile, then run: get < get.cpp
#include <iostream>
int main()
{
char c;
while (std::cin.get(c))
{
if (c == '/')
{
char last = c;
if (std::cin.get(c) && c == '/')
{
// std::cout << "Read to EOL\n";
while(std::cin.get(c) && c != '\n'); // this comment will be skipped
// std::cout << "go to next line\n";
std::cin.putback(c);
continue;
}
else { std::cin.putback(c); c = last; }
}
std::cout << c;
}
return 0;
}
This program, applied to itself, eliminates all C++ line comments in its output. The inner while loop doesn't eat up all text to the end of file. Please note the putback(c) statement. Without that the newline would not appear.
If it doesn't work the same for wifstream, it would be very strange except for one reason: when the opened text file is not saved as 16bit char and the \n char ends up in the wrong byte...
You could open the stream in binary mode:
std::wifstream stream(filename, std::ios::binary);
You'll lose any formatting operations provided by the stream if you do this.
The other option is to read the entire stream into a string and then process the string:
std::wostringstream ss;
ss << filestream.rdbuf();
Of course, getting the string from the ostringstream requires an additional copy of the string, so you could consider changing this at some point to use a custom stream if you feel adventurous.
EDIT: someone else mentioned istreambuf_iterator, which is probably a better way of doing it than reading the whole stream into a string.
Wrap the stream (or its buffer, specifically) in a std::istreambuf_iterator? That should ignore all formatting, and also give you a nice iterator interface.
Alternatively, a much more efficient, and fool-proof, approach might be to just use the Win32 API (or Boost) to memory-map the file. Then you can traverse it using plain pointers, and you're guaranteed that nothing will be skipped or converted by the runtime.
You could just wrap the stream in a std::istreambuf_iterator to get the data with all whitespace and newlines, like this:
/*Open the stream in default mode.*/
std::ifstream myfile("myfile.txt");
if(myfile.good()) {
/*Read data using streambuffer iterators.*/
vector<char> buf((std::istreambuf_iterator<char>(myfile)), (std::istreambuf_iterator<char>()));
/*str_buf holds all the data including whitespaces and newline .*/
string str_buf(buf.begin(),buf.end());
myfile.close();
}
By default, the skipws flag is already set on the ifstream object, so we must disable it. The ifstream object has these default flags because of std::basic_ios::init, which is called on every new stream object. Any of the following would work:
in_stream.unsetf(std::ios_base::skipws);
in_stream >> std::noskipws; // Using the extraction operator, same as below
std::noskipws(in_stream); // Explicitly calling noskipws instead of using operator>>
Other flags are listed on cpp reference.
The stream extractors behave the same and skip whitespace.
If you want to read every byte, you can use the unformatted input functions, like stream.get(c).
Why not simply use getline?
You will get all the whitespace, and while you won't get the end-of-line characters, you will still know where they lie :)
Just use getline.
while (getline(input,current))
{
cout<<current<<"\n";
}
I ended up just cracking open the Windows API and using it to read the whole file into a buffer first, and then reading that buffer character by character. Thanks guys.