Something like istream::getline() but with alternative delim characters? - c++

What's the cleanest way of getting the effect of istream::getline(string, 256, '\n' OR ';')?
I know it's quite straightforward to write a loop, but I feel that I might be missing something. Am I?
What I used:
while ((is.peek() != '\n') && (is.peek() != ';'))
stringstream.put(is.get());

Unfortunately there is no way to have multiple "line endings". What you can do is read the line with e.g. std::getline and put it in an std::istringstream and use std::getline (with the ';' separator) in a loop on the istringstream.
You could also check the Boost iostreams library to see if it has functionality for this.
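The two-stage approach described above can be sketched as follows. The helper name split_on_newline_or_semicolon is made up for illustration, and the sketch assumes that collecting all fields into a vector is acceptable:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: treats both '\n' and ';' as field terminators.
std::vector<std::string> split_on_newline_or_semicolon(std::istream& in)
{
    std::vector<std::string> fields;
    std::string line;
    while (std::getline(in, line)) {                    // outer split on '\n'
        std::istringstream line_stream(line);
        std::string field;
        while (std::getline(line_stream, field, ';')) { // inner split on ';'
            fields.push_back(field);
        }
    }
    return fields;
}
```

Note that an empty line produces no field at all, and a ';' at the very end of a line does not produce a trailing empty field, so this is not an exact equivalent of a getline with two delimiters.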

There's std::getline.
For more complex scenarios one might try splitting istream_iterator or istreambuf_iterator with boost split or regex_iterator (here is an example of using stream iterators).
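As a rough sketch of the iterator-based idea, using the standard std::regex facilities rather than Boost: the stream is read through istreambuf_iterator into a string, which is then split with std::sregex_token_iterator. The function name split_with_regex and the delimiter set are assumptions made for this example:

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits the whole stream on ';' or '\n'.
std::vector<std::string> split_with_regex(std::istream& in)
{
    // Regex iterators need bidirectional iterators, so buffer the stream first.
    const std::string text((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    static const std::regex delimiters("[;\n]");
    // Submatch index -1 selects the text between delimiter matches.
    std::sregex_token_iterator first(text.begin(), text.end(), delimiters, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

This buffers the entire input, so it suits small inputs better than long-running streams.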

Here is a working implementation:
enum class cascade { yes, no };
std::istream& getline(std::istream& stream, std::string& line, const std::string& delim, cascade c = cascade::yes){
line.clear();
std::string::value_type ch;
bool stream_altered = false;
while(stream.get(ch) && (stream_altered = true)){
if(delim.find(ch) == std::string::npos)
line += ch;
else if(c == cascade::yes && line.empty())
continue;
else break;
}
if(stream.eof() && stream_altered) stream.clear(std::ios_base::eofbit);
return stream;
}
The cascade::yes option collapses consecutive delimiters. With cascade::no, it returns an empty string for each additional consecutive delimiter found.
Usage:
const std::string punctuation = ",.';:?";
std::string word;
while(getline(istream_object, word, punctuation))
std::cout << word << std::endl;
See its usage Live on Coliru
A more generic version will be this

Related

Read lines from input stream with ability to skip chunks

I have an input stream with a series of bytecode-like instructions
function foo
push x
pop y
...
return
function bar
...
return
function other
...
I.e. a series of function declarations back-to-back. Each function is defined from one "function" until the next. There may be multiple "returns" within a function so I cannot use that as a delimiter. All instructions must be inside a function (i.e. the first line of the stream is always a "function" and the last line is always a "return").
I want to basically remove certain functions from the list. I have a list of the functions I want to keep and I thought about copying to an output stream, skipping over any function not on the list, something like
vector<string> wanted_functions = { "foo", "other" };
ostringstream oss;
bool skip = false;
for (string line; getline(input_stream, line);) {
    istringstream iss(line);
    string command;
    iss >> command;
    if (command == "function") {
        skip = false;
        string function_name;
        iss >> function_name;
        if (std::find(wanted_functions.begin(), wanted_functions.end(), function_name)
                == wanted_functions.end()) {
            skip = true;
        }
    }
    if (!skip) oss << line << '\n';
}
I haven't tested the above solution; it looks like it may work but I don't think it's very elegant.
I feel like stream iterators would be good here but I don't know how to use them. How can I achieve the skipping behavior using iterators, or maybe native stream methods like ignore() or seekg()?
Bonus: If there's a better way to read the first two words in the line than creating a new stream just for them, I'd also like to know please.
Edit: Functions are always sequential. There are no nested functions. I.e. "function" is always immediately preceded by "return".
If it's text, you can't easily jump or skip ahead (seekg) without actually reading it, because you don't have a known offset to go to (many binary file formats contain such information). You can, however, filter what you do read, and the code in your question nearly does this.
istream_iterator<std::string> gives you each whitespace-delimited word, but you can't tell where the newlines are. You can make an istream_iterator that reads lines instead, but the simplest way involves subclassing std::string to redefine operator >>, which is basically what getline gets you anyway. Alternatively, you might make your own type containing more useful information (below).
You might use std::unordered_set<std::string> for wanted_functions, since checking whether an item exists is easier than searching a std::vector (with std::find or similar). The skip flag also ends up reading slightly oddly, because you set it for "unwanted" functions and then test if (!skip).
unordered_set<string> wanted_functions = { "foo", "other" };
bool is_wanted_function = false;
for (string line; getline(input_stream, line);) {
istringstream iss(line);
string command;
iss >> command;
if (command == "function") {
string function_name;
iss >> function_name;
is_wanted_function = wanted_functions.count(function_name) != 0;
}
if (is_wanted_function) {
oss << line << std::endl;
}
}
An alternative to the is_wanted_function flag would be to consume the whole function inside the if (command == "function") { block. This needs more careful management of reading the next line, so as not to accidentally skip the line following the inner loop:
unordered_set<string> wanted_functions = { "foo", "other" };
string line;
getline(input_stream, line);
while (input_stream) {
istringstream iss(line);
string command;
iss >> command;
if (command == "function") {
string function_name;
iss >> function_name;
if (wanted_functions.count(function_name)) {
oss << line << std::endl;
while (getline(input_stream, line) && line.rfind("function", 0) != 0) {
oss << line << std::endl;
}
continue; // already have a line
}
}
getline(input_stream, line); // next line
}
As is I don't think that that is much of an improvement, but if the actual parsing (iss >> command;, iss >> function_name, etc.) was refactored out elsewhere, then it would be somewhat simpler.
You might make the actual parsing (getting the command name like "function", and arguments like "foo") its own class, which tidies away the istringstream iss(line); iss >> command; etc. that currently sit directly in this code.
istream_iterator basically just uses operator >> to get the next item until the stream is in a failure state, so can be used with your own types, although you can get something very similar doing largely the same yourself without istream_iterator.
#include <iostream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>
class command
{
public:
const std::string &cmd()const { return _cmd; }
const std::string &source_line()const { return _source_line; }
const std::string &arg(size_t i)const
{
if (i < _args.size()) return _args[i];
else throw std::out_of_range("Command does not have this many arguments.");
}
friend std::istream &operator >> (std::istream &is, command &cmd)
{
if (std::getline(is, cmd._source_line))
{
std::stringstream ss(cmd._source_line);
ss >> cmd._cmd;
cmd._args.clear(); // istream_iterator uses the same command object every time
while (true)
{
std::string val;
ss >> val;
if (!ss) break;
cmd._args.push_back(std::move(val));
}
}
return is;
}
private:
std::string _source_line;
std::string _cmd;
std::vector<std::string> _args;
};
int main()
{
using namespace std;
std::stringstream input_stream(
"function foo\n"
"push x\n"
"pop y\n"
"...\n"
"return\n"
"function bar\n"
"...\n"
"return\n"
"function other\n"
"...\n"
"return\n");
std::ostream &oss = std::cout;
std::unordered_set<string> wanted_functions = { "foo", "other" };
std::istream_iterator<command> eos; // end of stream
std::istream_iterator<command> it(input_stream); // iterator
while (it != eos)
{
if (it->cmd() == "function" && wanted_functions.count(it->arg(0)))
{
do
{
oss << it->source_line() << std::endl;
} while (++it != eos && it->cmd() != "function");
}
else ++it; // on true the while loop already advanced
}
}
istream_iterator of course does also bring compatibility with the other iterator based algorithms and constructors (std::find, etc.), and you can build some more complex things out of that. For example if you add another layer on top of this to create a istream_iterator<function>, then maybe you could use the Boost C++ filter_iterator, and then you will have an iterator with just the functions you want.
Note that if you need to start dealing with any nested constructs (like if (...) { ... } else if (...) { ... }), you might find parsing into a tree structure more convenient to do operations on than a flat sequence. See Abstract Syntax Tree. This somewhat depends on your syntax, e.g. if you use just goto if offset/label instead of while(expr), if(expr), else if, else, etc. type constructs.

Taking into account \r\n

I am trying to solve a problem on SPOJ. Apparently the input lines end with \r\n, as per the comments. What I know about \r\n from previous questions is that it's a Windows thing. What I want to know is how to take it into account. Currently I am using getline(cin, str) in C++. What do I do to take the \r\n into account?
When you use std::getline(std::cin, str) the '\n' is already taken care of: std::getline() will read characters until it finds a '\n' and inserts these into str. It doesn't insert the '\n', however.
Thus, you may be stuck with a '\r' at the end of the string. If you are on Windows you can just open your file in text mode and the stream will remove the '\r' characters, too. If that's not the way to go, you can just determine if your str ends with a '\r' and remove it:
if (!str.empty() && str[str.size() - 1] == '\r') {
str.erase(str.end() - 1);
}
If you want to remove all carriage returns (there may, in theory, be some embedded in the string), you can use
str.erase(std::remove(str.begin(), str.end(), '\r'), str.end());
Finally, if you don't want to ever encounter the carriage returns, you can create a filtering stream buffer which just removes all '\r' (or just those from a "\r\n" sequence). Below is a quick example how a simple filtering stream buffer can be implemented:
#include <algorithm>
#include <iostream>
#include <streambuf>
#include <string>
class crfilter
: std::streambuf
{
std::istream* stream;
std::streambuf* sbuf;
char buffer[8];
int underflow() {
std::streamsize n;
while (this->gptr() == this->egptr()
&& (n = this->sbuf->sgetn(buffer, 8))) {
char* end = std::remove(buffer, buffer + n, '\r');
this->setg(buffer, buffer, end);
}
return this->gptr() == this->egptr()
? std::char_traits<char>::eof()
: std::char_traits<char>::to_int_type(*this->gptr());
}
public:
crfilter(std::istream& in): stream(&in), sbuf(in.rdbuf(this)) {}
~crfilter() { stream->rdbuf(this->sbuf); }
};
int main()
{
crfilter filter(std::cin);
std::string str;
while (std::getline(std::cin, str)) {
std::cout << "str='" << str << "'\n";
}
}
They are carriage return/line feeds telling you the end of the line and beginning of the next.

Tokenize stringstream based on type

I have an input stream containing integers and special meaning characters '#'. It looks as follows:
... 12 18 16 # 22 24 26 15 # 17 # 32 35 33 ...
The tokens are separated by space. There's no pattern for the position of '#'.
I was trying to tokenize the input stream like this:
int value;
std::ifstream input("data");
if (input.good()) {
string line;
while(getline(data, line) != EOF) {
if (!line.empty()) {
sstream ss(line);
while (ss >> value) {
//process value ...
}
}
}
}
The problem with this code is that the processing stops when the first '#' is encountered.
The only solution I can think of is to extract each individual token into a string (not '#') and use atoi() function to convert the string to an integer. However, it's very inefficient as the majority tokens are integer. Calling atoi() on the tokens introduces big overhead.
Is there a way I can parse the individual token by its type? ie, for integers, parse it as integers while for '#', skip it. Thanks!
One possibility would be to explicitly skip whitespace (ss >> std::ws), and then to use ss.peek() to find out if a # follows. If yes, use ss.get() to read it and continue, otherwise use ss >> value to read the value.
If the positions of # don't matter, you could also remove all '#' from the line before initializing the stringstream with it.
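Both suggestions can be sketched roughly as follows. The names ParsedLine, parse_line and without_markers are invented for this example, and the sketch simply stops at the first token that is neither an integer nor a '#':

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the integers of a line plus the number of '#' markers.
struct ParsedLine {
    std::vector<int> values;
    int markers = 0;
};

// First suggestion: skip whitespace, then peek to decide how to read the next token.
ParsedLine parse_line(const std::string& line)
{
    ParsedLine result;
    std::istringstream ss(line);
    while (ss >> std::ws && ss.peek() != std::char_traits<char>::eof()) {
        if (ss.peek() == '#') {
            ss.get();                    // consume the marker
            ++result.markers;
        } else {
            int value;
            if (!(ss >> value)) break;   // neither an integer nor '#': give up
            result.values.push_back(value);
        }
    }
    return result;
}

// Second suggestion: drop every '#' before handing the line to a stringstream.
std::string without_markers(std::string line)
{
    line.erase(std::remove(line.begin(), line.end(), '#'), line.end());
    return line;
}
```

The first variant keeps the information about where each '#' occurred available at the point it is read; the second discards it entirely.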
Usually not worth testing against good()
if (input.good()) {
Unless your next operation is generating an error message or exception. If it is not good all further operations will fail anyway.
Don't test against EOF.
while(getline(data, line) != EOF) {
The result of std::getline() is not an integer. It is a reference to the input stream. The input stream is convertible to a bool-like object that can be used in a boolean context (like while, if, etc.). So what you want to do is:
while(getline(data, line)) {
I am not sure I would read a line. You could just read a word (since the input is space separated). Using the >> operator on string
std::string word;
while(data >> word) { // reads one space separated word
Now you can test the word to see if it is your special character:
if (word[0] == '#')
If not convert the word into a number.
This is what I would do:
// define a class that will read either value from a stream
class MyValue
{
public:
bool isSpec() const {return isSpecial;}
int value() const {return intValue;}
friend std::istream& operator>>(std::istream& stream, MyValue& data)
{
std::string item;
stream >> item;
if (item[0] == '#') {
data.isSpecial = true;
} else
{ data.isSpecial = false;
data.intValue = atoi(&item[0]);
}
return stream;
}
private:
bool isSpecial;
int intValue;
};
// Now your loop becomes:
MyValue val;
while(file >> val)
{
if (val.isSpec()) { /* Special processing */ }
else { /* We have an integer */ }
}
Maybe you can read all values as std::string and then check if it's "#" or not (and if not - convert to int)
int value;
std::ifstream input("data");
std::string chunk;
while (std::getline(input, chunk, '#')) {      // first split on '#'
    std::istringstream ss(chunk);
    std::string token;
    while (std::getline(ss, token, ' ')) {     // then split on ' '
        if (token.empty()) continue;           // consecutive spaces give empty tokens
        std::istringstream ss2(token);
        if (ss2 >> value) {
            // process value ...
        }
    }
}
In here we first split the line by the token '#' in the first while loop then in the second while loop we split the line by ' '.
Personally, if your separator is always going to be space regardless of what follows, I'd recommend you just take the input as string and parse from there. That way, you can take the string, see if it's a number or a # and whatnot.
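A minimal sketch of that idea, assuming whitespace-separated tokens and using std::strtol for the conversion so that malformed tokens can be detected. The function name read_integers_skipping_markers is hypothetical:

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every token as a string and classifies it afterwards.
std::vector<int> read_integers_skipping_markers(std::istream& in)
{
    std::vector<int> values;
    std::string token;
    while (in >> token) {                 // one whitespace-separated token at a time
        if (token == "#") continue;       // special marker: skipped in this sketch
        char* end = nullptr;
        const long parsed = std::strtol(token.c_str(), &end, 10);
        // Accept the token only if strtol consumed all of it.
        if (end != token.c_str() && *end == '\0')
            values.push_back(static_cast<int>(parsed));
    }
    return values;
}
```

Tokens that are neither "#" nor a complete integer are silently ignored here; real code would probably want to report them.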
I think you should re-examine your premise that "Calling atoi() on the tokens introduces big overhead."
There is no magic to std::cin >> val. Under the hood, it ends up calling (something very similar to) atoi.
If your tokens are huge, there might be some overhead to creating a std::string but as you say, the vast majority are numbers (and the rest are #'s) so they should mostly be short.

How do I show results that start with a letter e.g A only

I have a list.txt file.
It contains about 100 records, but if the user enters a letter (via cin), e.g. A, I just want to show all records containing A in the loop.
Records are separated by line breaks. In a shell command we would use A*, but how do we do it in C++?
Example:
Alfred
Alpha
Augustine
Bravo
Charlie
Delta
Here's a bunch of ways to do it; choose the one you like most ;)
Crappy solution with strings and streams:
std::vector< std::string > vec;//this will hold the file data
std::ifstream ifs("test.txt");//the input file stream
std::string tmp;//a temporary string
while( ifs >> tmp )//reading the whole data from the file
vec.push_back(tmp);
for( int i = 0; i < vec.size(); i++ )
if(vec[i][0] == 'a')//vec[i][0] stands for "the first symbol of element number i in vector"
std::cout << vec[i] << std::endl;//outputting the string if it starts with 'a'
If you have c++11, you can replace the for with this range-based for:
for( std::string & s : vec )
if(s.at(0) == 'a')
std::cout << s << std::endl;
Or, you can complicate things further and replace the for with std::copy_if and lambdas from c++11 (IMO it's much too complicated and hard to read for such a simple occasion, but still I'll include it):
//this will copy all strings starting with 'a' into res vector.
std::vector< std::string > res;
std::copy_if(vec.begin(), vec.end(), back_inserter(res), [](const std::string & s){ return s[0]=='a'; } );
If you don't need to store the strings anywhere, it's easier:
std::ifstream ifs("test.txt");
std::string tmp;
while( ifs >> tmp )
if( tmp.at(0) == 'a' )
std::cout << tmp << std::endl;
A more old-school solution without using streams or strings:
FILE * f = fopen("test.txt", "r");//opening the file
if( !f )//checking in case it didn't open
return -1;
char buffer[255];//buffer for the strings being read from file
while( fgets(buffer, 255, f) )//getting a string; the loop ends at EOF or on a read error
{
if(buffer[0] == 'a')//printing if it starts with 'a'
printf("%s", buffer);
}
fclose(f);//don't forget to close the file
Here's a decently elegant, possibly more idiomatic solution:
#include <algorithm> //for copy_if
#include <cctype> //for tolower
#include <fstream> //for ifstream
#include <iostream> //for cout, cin
#include <iterator> //for istream_iterator, ostream_iterator
#include <string> //for string
int main() {
char letter;
std::cout << "Enter the letter to look for: ";
std::cin >> letter; //I didn't validate it
std::ifstream fin ("names.txt");
std::istream_iterator<std::string> ifbegin (fin); //begin file iter
std::istream_iterator<std::string> ifend; //end file iter
std::ostream_iterator<std::string> obegin (std::cout, " "); //begin out iter
std::copy_if (ifbegin, ifend, obegin, //copy from file to output if
[letter] (const std::string &str) { //capture letter
return std::tolower (str [0]) == std::tolower (letter);
} //copy if starts with upper/lower case of entered letter
);
}
Note that it does require C++11 for copy_if and the lambda. This outputs every name in the file starting with the upper/lower case of the letter entered, separated by spaces. It performs the same when the data is sorted as it does when the data is unsorted.
As Luc points out below, though, this will read separate names for lines with spaces. If you want to get around that, you need a custom replacement type whose operator>> reads a whole line.
Step 1: Create the replacement:
struct Line {
std::string text; //note I made this public to save time
operator std::string() const {return text;} //less work later
};
Step 2: Modify operator>> to read a line for the struct:
std::istream &operator>> (std::istream &in, Line &line) {
std::getline (in, line.text); //get whole line
return in;
}
Step 3: Change the iterators to use our custom struct. Note that the last stays a string because it's implicitly convertible to one. Let's also separate the printing by newlines so we can tell that it was a line, not a word:
std::istream_iterator<Line> ifbegin (fin); //begin file iter
std::istream_iterator<Line> ifend; //end file iter
std::ostream_iterator<std::string> obegin (std::cout, "\n"); //begin out iter
Step 4: Change the lambda to suit our needs:
[letter] (const std::string &line) {
return !line.empty() //we introduced the possibility of ""
&& (std::tolower (line [0]) == std::tolower (letter));
}
4 easy steps later, we're done!
I would recommend using either remove_copy_if if you actually need the new list or for_each using a lambda expression that checks the predicate if you don't. Can you elaborate a bit what you mean by "show"?

std::string manipulation: whitespace, "newline escapes '\'" and comments #

Kind of looking for affirmation here. I have some hand-written code, which I'm not shy to say I'm proud of, which reads a file, removes leading whitespace, processes newline escapes '\' and removes comments starting with #. It also removes all empty lines (also whitespace-only ones). Any thoughts/recommendations? I could probably replace some std::cout's with std::runtime_errors... but that's not a priority here :)
const int RecipeReader::readRecipe()
{
ifstream is_recipe(s_buffer.c_str());
if (!is_recipe)
cout << "unable to open file" << endl;
while (getline(is_recipe, s_buffer))
{
// whitespace+comment
removeLeadingWhitespace(s_buffer);
processComment(s_buffer);
// newline escapes + append all subsequent lines with '\'
processNewlineEscapes(s_buffer, is_recipe);
// store the real text line
if (!s_buffer.empty())
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}
is_recipe.close();
return 0;
}
void RecipeReader::processNewlineEscapes(string &s_string, ifstream &is_stream)
{
string s_temp;
size_t sz_index = s_string.find_first_of("\\");
while (sz_index <= s_string.length())
{
if (getline(is_stream,s_temp))
{
removeLeadingWhitespace(s_temp);
processComment(s_temp);
s_string = s_string.substr(0,sz_index-1) + " " + s_temp;
}
else
cout << "Error: newline escape '\\' found at EOF" << endl;
sz_index = s_string.find_first_of("\\");
}
}
void RecipeReader::processComment(string &s_string)
{
size_t sz_index = s_string.find_first_of("#");
s_string = s_string.substr(0,sz_index);
}
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
const size_t sz_length = s_string.size();
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index <= sz_length)
s_string = s_string.substr(sz_index);
else if ((sz_index > sz_length) && (sz_length != 0)) // "empty" lines with only whitespace
s_string.clear();
}
Some extra info: the first s_buffer passed to the ifstream contains the filename, std::string s_buffer is a class data member, so is std::vector v_s_recipe. Any comment is welcome :)
UPDATE: for the sake of not being ungrateful, here is my replacement, all-in-one function that does what I want for now (future holds: parenthesis, maybe quotes...):
void readRecipe(const std::string &filename)
{
string buffer;
string line;
size_t index;
ifstream file(filename.c_str());
if (!file)
throw runtime_error("Unable to open file.");
while (getline(file, line))
{
// whitespace removal
line.erase(0, line.find_first_not_of(" \t\r\n\v\f"));
// comment removal TODO: store these for later output
index = line.find_first_of("#");
if (index != string::npos)
line.erase(index, string::npos);
// ignore empty buffer
if (line.empty())
continue;
// process newline escapes
index = line.find_first_of("\\");
if (index != string::npos)
{
line.erase(index,string::npos); // ignore everything after '\'
buffer += line;
continue; // read next line
}
else // no newline escapes found
{
buffer += line;
recipe.push_back(buffer);
buffer.clear();
}
}
}
Definitely ditch the hungarian notation.
It's not bad, but I think you're thinking of std::basic_string<T> too much as a string and not enough as an STL container. For example:
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
// C++11 lambda; requires <algorithm> and <cctype>
s_string.erase(s_string.begin(),
std::find_if(s_string.begin(), s_string.end(),
[](unsigned char c) { return !std::isspace(c); }));
}
A few comments:
As another answer (+1 from me) said - ditch the hungarian notation. It really doesn't do anything but add unimportant trash to every line. In addition, ifstream yielding an is_ prefix is ugly. is_ usually indicates a boolean.
Naming a function processXXX gives very little information on what it is actually doing. Use removeXXX, like you did with the removeLeadingWhitespace function.
The processComment function does an unnecessary copy and assignment. Use s.erase(index, string::npos); (npos is default, but this is more obvious).
It's not clear what your program does, but you might consider storing a different file format (like html or xml) if you need to post-process your files like this. That would depend on the goal.
using find_first_of('#') may give you some false positives. If it's present in quotes, it's not necessarily indicating a comment. (But again, this depends on your file format)
using find_first_of(c) with one character can be simplified to find(c).
The processNewlineEscapes function duplicates some functionality from the readRecipe function. You may consider refactoring to something like this:
string s_buffer;
string s_line;
while (getline(is_recipe, s_line)) {
// Sanitize the raw line.
removeLeadingWhitespace(s_line);
removeComments(s_line);
// Skip empty lines.
if (s_line.empty()) continue;
// Add the raw line to the buffer.
s_buffer += s_line;
// Collect buffer across all escaped lines, dropping the trailing backslash.
if (*s_line.rbegin() == '\\') {
s_buffer.erase(s_buffer.size() - 1);
continue;
}
// This line is not escaped, now I can process the buffer.
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}
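The two helpers that the loop above calls could be as small as the following sketch, which applies the erase and find suggestions from the list. It assumes free functions rather than RecipeReader members, and it still treats every '#' as a comment start:

```cpp
#include <string>

// Sketch: erase in place instead of copying a substring back into the argument.
void removeLeadingWhitespace(std::string& line)
{
    // find_first_not_of returns npos for blank lines, so erase(0, npos) clears them.
    line.erase(0, line.find_first_not_of(" \t"));
}

// Sketch: find(c) is enough when looking for a single character.
void removeComments(std::string& line)
{
    const std::string::size_type pos = line.find('#');
    if (pos != std::string::npos)
        line.erase(pos);   // erase from '#' to the end of the line
}
```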
I'm not big on methods that modify the parameters. Why not return strings rather than modifying the input arguments? For example:
string RecipeReader::processComment(const string &s)
{
size_t index = s.find_first_of("#");
return s.substr(0, index);
}
I personally feel this clarifies intent and makes it more obvious what the method is doing.
I'd consider replacing all your processing code (almost everything you've written) with boost::regex code.
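As a rough illustration of the regex route, here is a sketch that uses the standard std::regex (C++11) instead of boost::regex. The function name cleanLine and the pattern are assumptions for this example. It handles only leading blanks and '#' comments, not the backslash continuation:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: strips leading spaces/tabs and a trailing '#' comment.
std::string cleanLine(const std::string& line)
{
    // Either a run of blanks at the start of the line, or '#' and everything after it.
    static const std::regex noise(R"(^[ \t]+|#.*)");
    return std::regex_replace(line, noise, "");
}
```

Like the hand-written version, this treats a '#' inside quotes as a comment start.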
A few comments:
If s_buffer contains the file name to be opened, it should have a better name like s_filename.
The s_buffer member variable should not be reused to store temporary data from reading the file. A local variable in the function would do as well, no need for the buffer to be a member variable.
If there is no need to have the filename stored as a member variable, it could just be passed as a parameter to readRecipe().
processNewlineEscapes() should check that the found backslash is at the end of the line before appending the next line. At the moment any backslash at any position triggers adding of the next line at the position of the backslash. Also, if there are several backslashes, find_last_of() would probably be easier to use than find_first_of().
When checking the result of find_first_of() in processNewlineEscapes() and removeLeadingWhitespace() it would be cleaner to compare against string::npos to check if anything was found.
The logic at the end of removeLeadingWhitespace() could be simplified:
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index != s_string.npos)
s_string = s_string.substr(sz_index);
else // "empty" lines with only whitespace
s_string.clear();
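The end-of-line check suggested for processNewlineEscapes() could be sketched like this. The function name stripLineContinuation is hypothetical, and the sketch assumes that blanks after the backslash should be tolerated:

```cpp
#include <string>

// Hypothetical helper: returns true and removes the backslash (plus any trailing
// blanks) only if the backslash is the last significant character of the line.
bool stripLineContinuation(std::string& line)
{
    const std::string::size_type last = line.find_last_not_of(" \t");
    if (last == std::string::npos || line[last] != '\\')
        return false;          // blank line, or no trailing backslash
    line.erase(last);          // drop the backslash and everything after it
    return true;
}
```

A backslash in the middle of a line is left alone, and the comparison against npos makes the "nothing found" case explicit.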
You might wish to have a look at Boost.String (the Boost String Algorithms library). It's a simple collection of algorithms to work with strings, and notably features trim methods :)
Now, on to the review itself:
Don't bother to remove the hungarian notation, if it's your style then use it, however you should try and improve the names of methods and variables. processXXX is definitely not indicating anything useful...
Functionally, I am worried about your assumptions: the main issue here is that you do not care for escape sequences (\n uses a backslash, for example) and you do not account for the presence of strings of characters: std::cout << "Process #" << pid << std::endl; would yield an invalid line because of your "comment" preprocessing.
Furthermore, since you remove the comments before processing the newline escapes:
i = 3; # comment \
running comment
will be parsed as
i = 3; running comment
which is syntactically incorrect.
From an interface point of view: there is no benefit in having the methods be class members here, you don't really need an instance of RecipeReader...
And finally, I find it awkward that two methods would read from the stream.
Little peeve of mine: returning by const value does not serve any purpose.
Here is my own version, as I believe that showing is easier than discussing:
// header file
std::vector<std::string> readRecipe(const std::string& fileName);
std::string extractLine(std::ifstream& file);
std::pair<std::string,bool> removeNewlineEscape(const std::string& line);
std::string removeComment(const std::string& line);
// source file
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <boost/algorithm/string.hpp>
std::vector<std::string> readRecipe(const std::string& fileName)
{
std::vector<std::string> result;
std::ifstream file(fileName.c_str());
if (!file) std::cout << "Could not open: " << fileName << std::endl;
std::string line = extractLine(file);
while(!line.empty())
{
result.push_back(line);
line = extractLine(file);
} // looping on the lines
return result;
} // readRecipe
std::string extractLine(std::ifstream& file)
{
std::string line, buffer;
while(getline(file, buffer))
{
std::pair<std::string,bool> r = removeNewlineEscape(buffer);
line += boost::trim_left_copy(r.first); // remove leading whitespace
// based on the current locale
if (!r.second) break;
line += " "; // as we append, we insert a whitespace
// in order to avoid unintended token concatenation
}
return removeComment(line);
} // extractLine
//< Returns the line, minus the '\' character
//< if it was the last significant one
//< Returns a boolean indicating whether or not the line continue
//< (true if it's necessary to concatenate with the next line)
std::pair<std::string,bool> removeNewlineEscape(const std::string& line)
{
std::pair<std::string,bool> result;
result.second = false;
size_t pos = line.find_last_not_of(" \t");
if (std::string::npos != pos && line[pos] == '\\')
{
result.second = true;
--pos; // we don't want to have this '\' character in the string
}
result.first = line.substr(0, pos + 1); // keep everything up to and including index pos
return result;
} // removeNewlineEscape
//< The main difficulty here is NOT to confuse a # inside a string
//< with a # signalling a comment
//< assuming strings are contained within "", let's roll
std::string removeComment(const std::string& line)
{
size_t pos = line.find_first_of("\"#");
while(std::string::npos != pos)
{
if (line[pos] == '"')
{
// We have detected the beginning of a string, we move pos to its end
// beware of the tricky presence of a '\' right before '"'...
pos = line.find_first_of("\"", pos+1);
while (std::string::npos != pos && line[pos-1] == '\\')
pos = line.find_first_of("\"", pos+1);
}
else // line[pos] == '#'
{
// We have found the comment marker in a significant position
break;
}
pos = line.find_first_of("\"#", pos+1);
} // looking for comment marker
return line.substr(0, pos);
} // removeComment
It is fairly inefficient (but I trust the compiler for optimizations), but I believe it behaves correctly, though it's untested, so take it with a grain of salt. I have focused mainly on solving the functional issues; the naming convention I follow is different from yours, but I don't think it should matter.
I want to point out a small and sweet version which lacks \ support but skips whitespace-only lines and comments. (Note the std::ws in the call to std::getline.)
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::stringstream input(
" # blub\n"
"# foo bar\n"
" foo# foo bar\n"
"bar\n"
);
std::string line;
while (std::getline(input >> std::ws, line)) {
line.erase(std::find(line.begin(), line.end(), '#'), line.end());
if (line.empty()) {
continue;
}
std::cout << "line: \"" << line << "\"\n";
}
}
Output:
line: "foo"
line: "bar"