std::string manipulation: whitespace, "newline escapes '\'" and comments #

std::string manipulation: whitespace, "newline escapes '\'" and comments # - c++

Kind of looking for affirmation here. I have some hand-written code, which I'm not shy to say I'm proud of, which reads a file, removes leading whitespace, processes newline escapes '\' and removes comments starting with #. It also removes all empty lines (also whitespace-only ones). Any thoughts/recommendations? I could probably replace some std::cout's with std::runtime_errors... but that's not a priority here :)
const int RecipeReader::readRecipe()
{
ifstream is_recipe(s_buffer.c_str());
if (!is_recipe)
cout << "unable to open file" << endl;
while (getline(is_recipe, s_buffer))
{
// whitespace+comment
removeLeadingWhitespace(s_buffer);
processComment(s_buffer);
// newline escapes + append all subsequent lines with '\'
processNewlineEscapes(s_buffer, is_recipe);
// store the real text line
if (!s_buffer.empty())
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}
is_recipe.close();
return 0;
}
void RecipeReader::processNewlineEscapes(string &s_string, ifstream &is_stream)
{
string s_temp;
size_t sz_index = s_string.find_first_of("\\");
while (sz_index <= s_string.length())
{
if (getline(is_stream,s_temp))
{
removeLeadingWhitespace(s_temp);
processComment(s_temp);
s_string = s_string.substr(0,sz_index-1) + " " + s_temp;
}
else
cout << "Error: newline escape '\' found at EOF" << endl;
sz_index = s_string.find_first_of("\\");
}
}
void RecipeReader::processComment(string &s_string)
{
size_t sz_index = s_string.find_first_of("#");
s_string = s_string.substr(0,sz_index);
}
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
const size_t sz_length = s_string.size();
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index <= sz_length)
s_string = s_string.substr(sz_index);
else if ((sz_index > sz_length) && (sz_length != 0)) // "empty" lines with only whitespace
s_string.clear();
}
Some extra info: the first s_buffer passed to the ifstream contains the filename, std::string s_buffer is a class data member, so is std::vector v_s_recipe. Any comment is welcome :)
UPDATE: for the sake of not being ungrateful, here is my replacement, all-in-one function that does what I want for now (future holds: parenthesis, maybe quotes...):
void readRecipe(const std::string &filename)
{
string buffer;
string line;
size_t index;
ifstream file(filename.c_str());
if (!file)
throw runtime_error("Unable to open file.");
while (getline(file, line))
{
// whitespace removal
line.erase(0, line.find_first_not_of(" \t\r\n\v\f"));
// comment removal TODO: store these for later output
index = line.find_first_of("#");
if (index != string::npos)
line.erase(index, string::npos);
// ignore empty buffer
if (line.empty())
continue;
// process newline escapes
index = line.find_first_of("\\");
if (index != string::npos)
{
line.erase(index,string::npos); // ignore everything after '\'
buffer += line;
continue; // read next line
}
else // no newline escapes found
{
buffer += line;
recipe.push_back(buffer);
buffer.clear();
}
}
}

Definitely ditch the hungarian notation.

It's not bad, but I think you're thinking of std::basic_string<T> too much as a string and not enough as an STL container. For example:
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
s_string.erase(s_string.begin(),
std::find_if(s_string.begin(), s_string.end(), std::not1(isspace)));
}

A few comments:
As another answer (+1 from me) said - ditch the hungarian notation. It really doesn't do anything but add unimportant trash to every line. In addition, ifstream yielding an is_ prefix is ugly. is_ usually indicates a boolean.
Naming a function with processXXX gives very very little information on what it is actually doing. Use removeXXX, like you did with the RemoveLeadingWhitespace function.
The processComment function does an unnecessary copy and assignment. Use s.erase(index, string::npos); (npos is default, but this is more obvious).
It's not clear what your program does, but you might consider storing a different file format (like html or xml) if you need to post-process your files like this. That would depend on the goal.
using find_first_of('#') may give you some false positives. If it's present in quotes, it's not necessarily indicating a comment. (But again, this depends on your file format)
using find_first_of(c) with one character can be simplified to find(c).
The processNewlineEscapes function duplicates some functionality from the readRecipe function. You may consider refactoring to something like this:
-
string s_buffer;
string s_line;
while (getline(is_recipe, s_line)) {
// Sanitize the raw line.
removeLeadingWhitespace(s_line);
removeComments(s_line);
// Skip empty lines.
if (s_line.empty()) continue;
// Add the raw line to the buffer.
s_buffer += s_line;
// Collect buffer across all escaped lines.
if (*s_line.rbegin() == '\\') continue;
// This line is not escaped, now I can process the buffer.
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}

I'm not big on methods that modify the parameters. Why not return strings rather than modifying the input arguments? For example:
string RecipeReader::processComment(const string &s)
{
size_t index = s.find_first_of("#");
return s_string.substr(0, index);
}
I personally feel this clarifies intent and makes it more obvious what the method is doing.

I'd consider replacing all your processing code (almost everything you've written) with boost::regex code.

A few comments:
If s_buffer contains the file name to be opened, it should have a better name like s_filename.
The s_buffer member variable should not be reused to store temporary data from reading the file. A local variable in the function would do as well, no need for the buffer to be a member variable.
If there is not need to have the filename stored as a member variable it could just be passed as a parameter to readRecipe()
processNewlineEscapes() should check that the found backslash is at the end of the line before appending the next line. At the moment any backslash at any position triggers adding of the next line at the position of the backslash. Also, if there are several backslashes, find_last_of() would probably easier to use than find_first_of().
When checking the result of find_first_of() in processNewlineEscapes() and removeLeadingWhitespace() it would be cleaner to compare against string::npos to check if anything was found.
The logic at the end of removeLeadingWhitespace() could be simplified:
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index != s_string.npos)
s_string = s_string.substr(sz_index);
else // "empty" lines with only whitespace
s_string.clear();

You might wish to have a look at Boost.String. It's a simple collection of algorithms to work with streams, and notably features trim methods :)
Now, on to the review itself:
Don't bother to remove the hungarian notation, if it's your style then use it, however you should try and improve the names of methods and variables. processXXX is definitely not indicating anything useful...
Functionally, I am worried about your assumptions: the main issue here is that you do not care for espace sequences (\n uses a backslash for example) and you do not worry for the presence of strings of charachters: std::cout << "Process #" << pid << std::endl; would yield an invalid line because of your "comment" preprocessing
Furthermore, since you remove the comments before processing the newline escapes:
i = 3; # comment \
running comment
will be parsed as
i = 3; running comment
which is syntactically incorrect.
From an interface point of view: there is not benefit in having the methods being class members here, you don't need an instance of RecipeReader really...
And finally, I find it awkward that two methods would read from the stream.
Little peeve of mine: returning by const value does not serve any purpose.
Here is my own version, as I believe than showing is easier than discussing:
// header file
std::vector<std::string> readRecipe(const std::string& fileName);
std::string extractLine(std::ifstream& file);
std::pair<std:string,bool> removeNewlineEscape(const std::string& line);
std::string removeComment(const std::string& line);
// source file
#include <boost/algorithm/string.hpp>
std::vector<std::string> readRecipe(const std::string& fileName)
{
std::vector<std::string> result;
ifstream file(fileName.c_str());
if (!file) std::cout << "Could not open: " << fileName << std::endl;
std::string line = extractLine(file);
while(!line.empty())
{
result.push_back(line);
line = extractLine(file);
} // looping on the lines
return result;
} // readRecipe
std::string extractLine(std::ifstream& file)
{
std::string line, buffer;
while(getline(file, buffer))
{
std::pair<std::string,bool> r = removeNewlineEscape(buffer);
line += boost::trim_left_copy(r.first); // remove leading whitespace
// based on the current locale
if (!r.second) break;
line += " "; // as we append, we insert a whitespace
// in order unintended token concatenation
}
return removeComment(line);
} // extractLine
//< Returns the line, minus the '\' character
//< if it was the last significant one
//< Returns a boolean indicating whether or not the line continue
//< (true if it's necessary to concatenate with the next line)
std::pair<std:string,bool> removeNewlineEscape(const std::string& line)
{
std::pair<std::string,bool> result;
result.second = false;
size_t pos = line.find_last_not_of(" \t");
if (std::string::npos != pos && line[pos] == '\')
{
result.second = true;
--pos; // we don't want to have this '\' character in the string
}
result.first = line.substr(0, pos);
return result;
} // checkNewlineEscape
//< The main difficulty here is NOT to confuse a # inside a string
//< with a # signalling a comment
//< assuming strings are contained within "", let's roll
std::string removeComment(const std::string& line)
{
size_t pos = line.find_first_of("\"#");
while(std::string::npos != pos)
{
if (line[pos] == '"')
{
// We have detected the beginning of a string, we move pos to its end
// beware of the tricky presence of a '\' right before '"'...
pos = line.find_first_of("\"", pos+1);
while (std::string::npos != pos && line[pos-1] == '\')
pos = line.find_first_of("\"", pos+1);
}
else // line[pos] == '#'
{
// We have found the comment marker in a significant position
break;
}
pos = line.find_first_of("\"#", pos+1);
} // looking for comment marker
return line.substr(0, pos);
} // removeComment
It is fairly inefficient (but I trust the compiler for optmizations), but I believe it behaves correctly though it's untested so take it with a grain of salt. I have focused mainly on solving the functional issues, the naming convention I follow is different from yours but I don't think it should matter.

I want to point out a small and sweet version which lacks \ support but skips whitespace-lines and comments. (Note the std::ws in the call to std::getline.
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::stringstream input(
" # blub\n"
"# foo bar\n"
" foo# foo bar\n"
"bar\n"
);
std::string line;
while (std::getline(input >> std::ws, line)) {
line.erase(std::find(line.begin(), line.end(), '#'), line.end());
if (line.empty()) {
continue;
}
std::cout << "line: \"" << line << "\"\n";
}
}
Output:
line: "foo"
line: "bar"

Related

C++: Parsing a log with split but one entry can have several lines

I am started to learn C++, and my current project should extend my knowledge in using files, split and finally do a regexp on a varchar string.
The problem:
I have a logfile wich contains data like
<date> <time> <username> (<ip:port>) <uuid> - #<id> "<varchar text>"
e.g:
10.03.2016 07:40:38: blacksheep (127.0.0.1:54444) #865 "(this can have text
over several lines
without ending marker"
10.03.2016 07:40:38: blacksheep (127.0.0.1:54444) #865 "A new line, just one without \n"
So I am starting with the following but I am stuck now with how to get the lines with \n into the string. How can this be solved the right way without unnecessary steps like splitting several times and how can I define where a complete line (even if it's having some \n within) stops?
With fin.ignore(80, '\n');, \ns are being ignored, but this implicates that I will only have one line... Short text before # and a very large string after :-|
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
std::vector<std::string> split(std::string str, char seperator) {
std::vector<std::string> result;
std::string::size_type token_offset = 0;
std::string::size_type seperator_offset = 0;
while (seperator_offset != std::string::npos) {
seperator_offset = str.find(seperator, seperator_offset);
std::string::size_type token_length;
if(seperator_offset == std::string::npos) {
token_length = seperator_offset;
} else {
token_length = seperator_offset - token_offset;
seperator_offset++;
}
std::string token = str.substr(token_offset, token_length);
if (!token.empty()) {
result.push_back(token);
}
token_offset = seperator_offset;
}
return result;
}
int main(int argc, char **argv) {
std::fstream fin("input.dat");
while(!fin.eof()) {
std::string line;
getline(fin, line, ';');
fin.ignore(80, '\n');
std::vector<std::string> strs = split(line, ',');
for(int i = 0; i < strs.size(); ++i) {
std::cout << strs[i] << std::endl;
}
}
fin.close();
return 0;
}
Regards
Blacksheep

There is no canned C++ library function for swallowing input like that. std::getline reads the next line of text, up until the next newline character (by default). That's it. std::getline does not do any further examination on the input, beyond that.
I will suggest the following approach for you.
Initialize a buffer representing the entire logical line just read.
Read the next line of input, using std::getline(), and append the line to the input buffer.
Count the number of quote characters in the buffer.
Is the number of quotes even? Stop. If the quote character count is odd, append a newline to the buffer, then go back and read another line of input.
Some obvious optimizations are possible here, of course, but this should be a good start.

Reading in only letters from a text file

I am trying to read in from a text file a poem that contains commas, spaces, periods, and newline character. I am trying to use getline to read in each separate word. I do not want to read in any of the commas, spaces, periods, or newline character. As I read in each word I am capitalizing each letter then calling my insert function to insert each word into a binary search tree as a separate node. I do not know the best way to separate each word. I have been able to separate each word by spaces but the commas, periods, and newline characters keep being read in.
Here is my text file:
Roses are red,
Violets are blue,
Data Structures is the best,
You and I both know it is true.
The code I am using is this:
string inputFile;
cout << "What is the name of the text file?";
cin >> inputFile;
ifstream fin;
fin.open(inputFile);
//Input once
string input;
getline(fin, input, ' ');
for (int i = 0; i < input.length(); i++)
{
input[i] = toupper(input[i]);
}
//check for duplicates
if (tree.Find(input, tree.Current, tree.Parent) == true)
{
tree.Insert(input);
countNodes++;
countHeight = tree.Height(tree.Root);
}
Basically I am using the getline(fin,input, ' ') to read in my input.

I was able to figure out a solution. I was able to read in an entire line of code into the variable line, then I searched each letter of the word and only kept what was a letter and I stored that into word.Then, I was able to call my insert function to insert the Node into my tree.
const int MAXWORDSIZE = 50;
const int MAXLINESIZE = 1000;
char word[MAXWORDSIZE], line[MAXLINESIZE];
int lineIdx, wordIdx, lineLength;
//get a line
fin.getline(line, MAXLINESIZE - 1);
lineLength = strlen(line);
while (fin)
{
for (int lineIdx = 0; lineIdx < lineLength;)
{
//skip over non-alphas, and check for end of line null terminator
while (!isalpha(line[lineIdx]) && line[lineIdx] != '\0')
++lineIdx;
//make sure not at the end of the line
if (line[lineIdx] != '\0')
{
//copy alphas to word c-string
wordIdx = 0;
while (isalpha(line[lineIdx]))
{
word[wordIdx] = toupper(line[lineIdx]);
wordIdx++;
lineIdx++;
}
//make it a c-string with the null terminator
word[wordIdx] = '\0';
//THIS IS WHERE YOU WOULD INSERT INTO THE BST OR INCREMENT FREQUENCY COUNTER IN THE NODE
if (tree.Find(word) == false)
{
tree.Insert(word);
totalNodes++;
//output word
//cout << word << endl;
}
else
{
tree.Counter();
}
}

This is a good time for a technique I've posted a few times before: define a ctype facet that treats everything but letters as white space (searching for imbue will show several examples).
From there, it's a matter of std::transform with istream_iterators on the input side, a std::set for the output, and a lambda to capitalize the first letter.

You can make a custom getline function for multiple delimiters:
std::istream &getline(std::istream &is, std::string &str, std::string const& delims)
{
str.clear();
// the 3rd parameter type and the condition part on the right side of &&
// should be all that differs from std::getline
for(char c; is.get(c) && delims.find(c) == std::string::npos; )
str.push_back(c);
return is;
}
And use it:
getline(fin, input, " \n,.");

You can use std::regex to select your tokens
Depending on the size of your file you can read it either line by line or entirely in an std::string.
To read the file you can use :
std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
and this will do the matching for space separated string.
std::regex word_regex(",\\s]+");
auto what =
std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();
std::vector<std::string> v;
for (;what!=wend ; wend) {
std::smatch match = *what;
V.push_back(match.str());
}
I think to separate tokens separated either by , space or new line you should use this regex : (,| \n| )[[:alpha:]].+ . I have not tested though and it might need you to check this out.

Splitting sentences and placing in vector

I was given a code from my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input and separate them into sentences from periods and put those strings into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.

This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.

You have:
string l; // string to hold a line
cin >> l; // get line
The last line does not read a line unless the entire line has non-white space characters. To read a line of text, use:
std::getline(std::cin, l);
It's hard telling whether that is tripping your code up since you haven't posted any sample input.

I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
while (std::getline(std::cin, temp, '.'))
s.push_back(temp);
std::transform(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"),
[](std::string const &s) { return s.substr(s.find_first_not_of(" \t\n")); });
}
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

c++ char* parsing

SO, I am trying to write a function that will return a vector<char**>, as such:
vector<char**> test(string mystr) {
char*temp=new char[mystr.size()+1];
strcpy(temp,mystr.c_str());
char*subStr=strtok(temp,":");
while(subStr!=NULL) {
int i=0;
char**args=new char*[200];
char*tempsta=newchar[strlen(subStr)+1];
strcpy(tempsta, subStr);
args[i]=strtok(tempsta," ");
while(args[i]!=NULL) {
i++;
args[i]=strtok(NULL," ");
}
fullVec.push_back(args);
//cout<<subStr<<endl;
subStr=strtok(NULL,":");
}
return fullVec;
}
so I want to split the parameter string up with ":" delimeter, then with " " delimeter. On the cout<<subStr call I get what is expected if I comment out everything from int i=0 to fullVec.push_back(args). If I do not comment out all of those lines I only get the first substring (until the first ":") is encountered, and then the largest while loop exits.
for what is expected I mean; let us assume the parameter is "my name is: bon jovi: xxx ab"
if everything is commented out, the following lines will be printed:
my name is
bon jovi
xxx ab
if I leave it as is, what will happen is only
my name is
will print, and the large loop will exit
any assistance is appreciated, thanks! (Yes, I am aware that this seems like a silly exercise which can be done much more elegantly/easily...however I would like to get this solution to work before I entertain using string etc.)

Your problem is that strtok() maintains state between calls.
If the first parameter is not NULL then it uses to reset the state otherwise it uses the state it has saved to continue parsing from where it left off.
Since you have two nested calls to strtok() the second call is messing with state of the outer call.
This call:
args[i]=strtok(tempsta," ");
Is resetting the internal state of strtok(). It now no longer knows anything about the state from your outer call. Thus when you get to the end of the string in your inner loop.
This call:
subStr=strtok(NULL,":");
Is now using the saved state of the inner loop. So it basically just terminates as you have already reached the end of that tokenizing stream.

As it was perfectly pointed out by Loki already, you should not mix C and C++. In case you want C++ solution for your problem, then it's better to stick with STL classes that will take care of ugly memory management for you (see RAII idiom) such as std::string, std::vector, std::istringstream.
This is how your function could look like:
typedef std::vector<std::string> Line;
std::vector<Line> parse(std::string inputString)
{
std::vector<Line> lines;
std::istringstream inputStream(inputString);
for (std::string line; std::getline(inputStream, line, ':'); )
{
if (!line.empty())
{
lines.push_back(Line());
std::istringstream lineStream(line);
for (std::string word; std::getline(lineStream, word, ' '); )
{
if (!word.empty())
lines.back().push_back(word);
}
}
}
return lines;
}
Example of usage:
std::vector<Line> lines = parse("my name is: bon jovi: xxx ab");
for (int li = 0; li < lines.size(); ++li)
{
for (int wi = 0; wi < lines[li].size(); ++wi)
std::cout << lines[li][wi] << "_";
std::cout << std::endl;
}
outputs
my_name_is_
bon_jovi_
xxx_ab_
Hope this helps :)

As mentioned in comments, you are mixing C-style and C++-style code which results in quite a mess. Unless you have "good reason" to resort to char* instead of std::string then it's better practice to make use of the stl or boost.
The boostway:
std::string delims = " :";
boost::split(vector, mystr, boost::is_any_of(delims));
The stl way:
vector<string> result;
std::string delims = " :";
std::istringstream ss( mystr );
while (!ss.eof())
{
getline( ss, field, delims);
if ((empties == split::no_empties) && field.empty()) continue;
result.push_back( field );
}
For more approaches and a good comparison, see this cplusplus article

Something like istream::getline() but with alternative delim characters?

What's the cleanest way of getting the effect of istream::getline(string, 256, '\n' OR ';')?
I know it's quite straightforward to write a loop, but I feel that I might be missing something. Am I?
What I used:
while ((is.peek() != '\n') && (is.peek() != ';'))
stringstream.put(is.get());

Unfortunately there is no way to have multiple "line endings". What you can do is read the line with e.g. std::getline and put it in an std::istringstream and use std::getline (with the ';' separator) in a loop on the istringstream.
Although you could check the Boost iostreams library to see it it has functionality for it.

There's std::getline.
For more complex scenarios one might try splitting istream_iterator or istreambuf_iterator with boost split or regex_iterator (here is an example of using stream iterators).

Here is a working implementation:
enum class cascade { yes, no };
std::istream& getline(std::istream& stream, std::string& line, const std::string& delim, cascade c = cascade::yes){
line.clear();
std::string::value_type ch;
bool stream_altered = false;
while(stream.get(ch) && (stream_altered = true)){
if(delim.find(ch) == std::string::npos)
line += ch;
else if(c == cascade::yes && line.empty())
continue;
else break;
}
if(stream.eof() && stream_altered) stream.clear(std::ios_base::eofbit);
return stream;
}
The cascade::yes option collapses consecutive delimiters found. With cascade::no, it will return an empty string for each a second consecutive delimeter found.
Usage:
const std::string punctuation = ",.';:?";
std::string words;
while(getline(istream_object, words, punctuation))
std::cout << word << std::endl;
See its usage Live on Coliru
A more generic version will be this

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

std::string manipulation: whitespace, "newline escapes '\'" and comments # - c++

Definitely ditch the hungarian notation.

I'd consider replacing all your processing code (almost everything you've written) with boost::regex code.

Related

C++: Parsing a log with split but one entry can have several lines

Reading in only letters from a text file

Splitting sentences and placing in vector

c++ char* parsing

Something like istream::getline() but with alternative delim characters?

Categories

Resources