Split string by delimiter by using vectors - how to split by newline? - c++

I have a function like this (I found it somewhere; it works with a \t separator).
vector<string> delimited_str_to_vector(string& str, string delimiter)
{
vector<string> retVect;
size_t pos = 0;
while(str.substr(pos).find(delimiter) != string::npos)
{
retVect.push_back(str.substr(pos, str.substr(pos).find(delimiter)));
pos += str.substr(pos).find(delimiter) + delimiter.size();
}
retVect.push_back(str.substr(pos));
return retVect;
}
I have a problem splitting a string by the "\r\n" delimiter. What am I doing wrong?
string data = get_file_contents("csvfile.txt");
vector<string> csvRows = delimited_str_to_vector(data, "\r\n");
I'm sure that my file uses CRLF line endings.

You can use getline to read the file line by line, which:
Extracts characters from is and stores them into str until the delimitation character delim is found (or the newline character, '\n' ...) If the delimiter is found, it is extracted and discarded, i.e. it is not stored and the next input operation will begin after it.
Perhaps you are already reading the file through a function that removes line endings.

If you open your file in text mode, i.e., you don't mention std::ios_base::binary (or one of its alternate spellings), it is likely that the system-specific end-of-line sequence is replaced by a \n character. That is, even if your source file uses \r\n, you may not see this character sequence when reading the file. Add the binary flag when opening the file if you really want to process these sequences.
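A minimal sketch of that advice, assuming the whole file fits in memory. The helper names read_file_binary and split are hypothetical (they are not the asker's get_file_contents or delimited_str_to_vector); split searches with std::string::find(delimiter, pos) rather than taking repeated substrings.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a file byte for byte, with no newline translation.
std::string read_file_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}

// Split str on every occurrence of delimiter (delimiter must not be empty).
std::vector<std::string> split(const std::string& str, const std::string& delimiter) {
    std::vector<std::string> parts;
    std::size_t pos = 0;
    std::size_t hit;
    while ((hit = str.find(delimiter, pos)) != std::string::npos) {
        parts.push_back(str.substr(pos, hit - pos));
        pos = hit + delimiter.size();
    }
    parts.push_back(str.substr(pos));  // remainder after the last delimiter
    return parts;
}
```

With these, split(read_file_binary("csvfile.txt"), "\r\n") should yield one element per CRLF-terminated row, plus a final empty element if the file ends with "\r\n".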

Related

Ifstream stops reading file after a few lines

I am using an ifstream into a stringstream for reading a file but it stops after a couple lines...
string read(string filename)
{
ifstream inFile;
inFile.open(filename);
stringstream strStream;
strStream << inFile.rdbuf();
inFile.close();
string str = strStream.str();
return str;
}
This code stops after 'zh¬'
I suspect there are control characters from the ASCII table in the file; the first character after the point where it stops has the value 26.
But I wouldn't think that matters.
Your ifstream is being opened in text mode. Try opening the file in binary mode:
std::ifstream inFile(filename, std::ios::binary);
A text stream is an ordered sequence of characters composed into lines (zero or more characters plus a terminating '\n'). Whether the last line requires a terminating '\n' is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to the conventions for representing text in the OS (in particular, C streams on Windows OS convert \n to \r\n on output, and convert \r\n to \n on input)
Data read in from a text stream is guaranteed to compare equal to the data that were earlier written out to that stream only if all of the following is true:
the data consist only of printing characters and the control characters \t and \n (in particular, on Windows OS, the character '\x1A', i.e. value 26, terminates input)
no \n is immediately preceded by a space character (space characters that are written out immediately before a \n may disappear when read)
the last character is \n
A binary stream is an ordered sequence of characters that can transparently record internal data. Data read in from a binary stream always equals to the data that were earlier written out to that stream. Implementations are only allowed to append a number of null characters to the end of the stream. A wide binary stream doesn't need to end in the initial shift state.
https://en.cppreference.com/w/cpp/io/c#Binary_and_text_modes
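As a sketch of the fix applied to the asker's function: the names slurp and read_binary are hypothetical, and the only substantive change from the original read is the std::ios::binary flag, which keeps byte 26 from being treated as end of input on Windows.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Copy everything left in the stream into a string via its stream buffer.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical wrapper: open in binary mode so that no byte is translated
// or interpreted as an end-of-file marker.
std::string read_binary(const std::string& filename) {
    std::ifstream in(filename, std::ios::binary);
    if (!in) return std::string();  // could not open; caller sees an empty string
    return slurp(in);
}
```

Note that in binary mode any \r\n pairs in the file arrive unchanged, so later line handling has to cope with the \r itself.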

How to read string from file with line break using ifstream c++?

I used this code to read lines from file, but I noticed, that it didn't read line breaks:
ifstream fs8(sourceFile);
string line;
while (getline(fs8, line))
{
// here I convert from UTF-8 to UTF-16, but I also need to convert the "\n" character
}
How can I read a line together with its line break?
std::getline() reads data up to a delimiter, which is not stored. By default, that delimiter is '\n'. So you would have to either:
a) Pick a different delimiter -- but then you would no longer read "lines".
b) Add the newline to the data read (line += '\n').
I'd go for b), if you really need that newline converted. (I don't quite see why that would be necessary, but who am I to judge. ;-) )
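Option b) could look roughly like the sketch below. The function name read_lines_keep_newline is made up for illustration, and the eof() check is an extra assumption on my part: it avoids adding a '\n' to a final line that had none in the input.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every line and put back the '\n' that std::getline discards.
std::vector<std::string> read_lines_keep_newline(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // If getline stopped at end of input rather than at '\n', the last
        // line had no terminator, so do not invent one.
        if (!in.eof()) line += '\n';
        lines.push_back(line);
    }
    return lines;
}
```

Each returned string can then be passed through the UTF-8 to UTF-16 conversion with its newline included.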

How to NOT use \n as delimiter in getline()

I'm trying to read in lines from a plain text file, but there are line breaks in the middle of sentences, so getline() reads until a line break as well as until a period. The text file looks like:
then he come tiptoeing down and stood right between us. we could
a touched him nearly. well likely it was minutes and minutes that
there warnt a sound and we all there so close together. there was a
place on my ankle that got to itching but i dasnt scratch it.
My read-in code:
// read in sentences
while (file)
{
string s, record;
if (!getline( file, s )) break;
istringstream ss(s);
while (ss)
{
string s;
if (!getline(ss, s, '.')) break;
record = s;
if(record[0] == ' ')
record.erase(record.begin());
sentences.push_back(record);
}
}
// output sentences
for (vector<string>::size_type i = 0; i < sentences.size(); i++)
cout << sentences[i] << "[][][][]" << endl;
The purpose of the [ ][ ][ ][ ] was to check if linebreaks were used as delimiters and were not just being read into the string. The output would look like:
then he come tiptoeing down and stood right between us.[][][][]
we could[][][][]
a touched him nearly.[][][][]
well likely it was minutes and minutes that[][][][]
there warnt a sound and we all there so close together.[][][][]
there was a[][][][]
place on my ankle that got to itching but i dasnt scratch it.[][][][]
What exactly is your question?
You're using getline() to read from the file stream with a newline delimiter, then parsing that line with a second getline() on the istringstream ss with the delimiter '.'. So of course your strings are broken at both the newline and the '.'.
getdelim() works like getline(), except that a line delimiter other than newline can be specified as the delimiter argument. As with getline(), a delimiter character is not added if one was not present in the input before end of file was reached.
ssize_t getdelim(char **restrict lineptr, size_t *restrict n, int delimiter, FILE *restrict stream);
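In C++ the same idea works without getdelim(): std::getline accepts a delimiter argument, so it can be called directly on the file stream with '.'. The sketch below is one possible approach, not the asker's code; read_sentences is a hypothetical name, embedded line breaks are turned into spaces, and the '.' itself is discarded by getline.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split the input into sentences at '.', treating line breaks as ordinary spaces.
std::vector<std::string> read_sentences(std::istream& in) {
    std::vector<std::string> sentences;
    std::string s;
    while (std::getline(in, s, '.')) {
        // A newline in the middle of a sentence becomes a space.
        std::replace(s.begin(), s.end(), '\n', ' ');
        // Drop leading whitespace left over from the previous sentence.
        std::size_t first = s.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // nothing but whitespace
        sentences.push_back(s.substr(first));
    }
    return sentences;
}
```

Calling read_sentences(file) replaces both nested loops in the question, because the newline is no longer used as a delimiter at any point.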

Reading a text file in c++

string numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
inputFile >> numbers;
inputFile.close();
cout << numbers;
And my text.txt file is:
1 2 3 4 5
basically a set of integers separated by tabs.
The problem is the program only reads the first integer in the text.txt file and ignores the rest for some reason. If I remove the tabs between the integers it works fine, but with tabs between them, it won't work. What causes this? As far as I know it should ignore any white space characters or am I mistaken? If so is there a better way to get each of these numbers from the text file?
When reading formatted strings the input operator starts with ignoring leading whitespace. Then it reads non-whitespace characters up to the first space and stops. The non-whitespace characters get stored in the std::string. If there are only whitespace characters before the stream reaches end of file (or some error for that matter), reading fails. Thus, your program reads one "word" (in this case a number) and stops reading.
Unfortunately, you only said what you are doing and what the problems are with your approach (and your problem description failed to cover the case where reading the input fails in the first place). Here are a few things you might want to try:
If you want to read multiple words, you can do so, e.g., by reading all words:
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(inputFile),
std::istream_iterator<std::string>(),
std::back_inserter(words));
This will read all words from inputFile and store them as a sequence of std::strings in the vector words. Since your file contains numbers, you might want to replace std::string with int to read the numbers in a readily accessible form.
If you want to read a line rather than a word you can use std::getline() instead:
if (std::getline(inputFile, line)) { ... }
If you want to read multiple lines, you'd put this operation into a loop. There is, unfortunately, no ready-made approach to read a sequence of lines as there is for words.
If you want to read the entire file, not just the first line, into a string, you can also use std::getline(), but you need to know a character value that doesn't occur in your file, e.g., the null character:
if (std::getline(inputFile, text, char())) { ... }
This approach considers a "line" a sequence of characters up to a null character. You can use any other character value as well. If you can't be sure about the character values, you can read an entire file using std::string's constructor taking iterators:
std::string text((std::istreambuf_iterator<char>(inputFile)),
std::istreambuf_iterator<char>());
Note that the extra pair of parentheses around the first argument is, unfortunately, necessary (if you are using C++ 2011 you can avoid them by using braces instead of parentheses).
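For the asker's tab-separated integers, the istream_iterator suggestion with int in place of std::string might look like this. It is a small sketch; read_ints is a hypothetical name, and reading stops at the first token that is not a valid integer.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Read whitespace-separated integers until end of input or the first bad token.
std::vector<int> read_ints(std::istream& in) {
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
```

Because formatted extraction skips tabs, spaces, and newlines alike, read_ints(inputFile) handles the file from the question without any special treatment of the tab separators.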
Use getline to do the reading.
string numbers;
if (inputFile.is_open())//checking if open
{
getline (inputFile,numbers); //fetches entire line into string numbers
inputFile.close();
}
Your program behaves exactly as you describe: inputFile >> numbers; extracts only the first whitespace-delimited token from the input file, so if you remove the tabs, it extracts the single token 12345, not the five numbers [1,2,3,4,5].
A better method:
vector<int> numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(), ios_base::in);
int n;
while (inputFile >> n) // operator>> skips tabs, spaces and newlines between numbers
{
numbers.push_back(n); // store the parsed value, not a character code
}
inputFile.close();

Portable char newline

Is there a way to write a cross-platform parser that reads chars until a newline character is found? I'm using '\0' in Linux, but I'm not sure that this can be done on Windows too.
std::string line;
// fill the line
QTextStream ss(&line);
for(;;)
{
ss >> c;
if(c == '"' || c=='\0' ) // here I want to continue parsing until a new-line character or a ending double quote is found
break;
}
If you are working with the C++ text streams (std::istream and std::ostream, unless the ios_base::binary flag has been set when opening a file stream), then C++ treats input and output of \n in a platform-independent manner.
That means that reading a file which contains \r\n on Windows will treat this as if it were \n, and likewise outputting \n will output the platform-specific newline sequence.
If you need to read consecutive lines, the easiest way is to use getline:
std::string line;
while (getline(std::cin, line)) {
// process line
}
\0 is never treated as a newline character.
The newline character in C is '\n' not '\0'. It will be converted to whatever the current platform uses.
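Applied to the asker's loop, the stop condition becomes '"' or '\n'. The sketch below uses plain std::string rather than QTextStream, which is a substitution on my part; take_until_quote_or_newline is a hypothetical name. It also strips a trailing '\r', which only appears if the data was read in binary mode from a CRLF file.

```cpp
#include <cstddef>
#include <string>

// Return the characters of text from start up to (not including)
// the first '"' or '\n'.
std::string take_until_quote_or_newline(const std::string& text,
                                        std::size_t start = 0) {
    if (start >= text.size()) return std::string();
    std::size_t end = text.find_first_of("\"\n", start);
    bool stopped_at_newline = (end != std::string::npos && text[end] == '\n');
    if (end == std::string::npos) end = text.size();
    std::string token = text.substr(start, end - start);
    // Binary-mode input from a CRLF file leaves '\r' just before the '\n'.
    if (stopped_at_newline && !token.empty() && token.back() == '\r')
        token.pop_back();
    return token;
}
```

The same comparison against '\n' works on Linux and Windows, since text-mode streams have already normalized the platform's line ending by the time the characters reach the parser.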