How to delete comments while parsing text in C++

I'm trying to parse text from a .ppm file in C++ using ifstream, but I want to skip comments in the file, which start with the character '#' and run to the end of the line. I can detect the comment character with the code below. Can anyone help with how to discard the rest of the words up to the '\n' character?
string word;
file >> word;
if (word == "#") {
    // TO DO... discard all characters till the end of the line
}

Use std::getline() & continue the while loop if line[0] == '#':
std::ifstream file( "foo.txt" );
std::string line;
while( std::getline( file, line ) )
{
    if( line.empty() )
        continue;
    if( '#' == line[0] )
        continue;

    std::istringstream liness( line );
    // pull words out of liness...
}
Or if the # can occur mid-line you can just ignore everything after it:
std::ifstream file( "foo.txt" );
std::string line;
while( std::getline( file, line ) )
{
    std::istringstream liness( line.substr( 0, line.find_first_of( '#' ) ) );
    // pull words out of liness...
}
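Putting that together for the original .ppm case, a minimal sketch (the file name and the word-by-word handling are placeholders):
#include <fstream>
#include <sstream>
#include <string>
#include <iostream>

int main()
{
    std::ifstream file( "image.ppm" );   // assumed file name
    std::string line;
    while( std::getline( file, line ) )
    {
        // keep only the part of the line before any '#' comment
        std::istringstream liness( line.substr( 0, line.find_first_of( '#' ) ) );
        std::string word;
        while( liness >> word )
            std::cout << word << '\n';   // or store the header fields / pixel data
    }
}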

Depending on the complexity of the comments you want to strip, you might consider using regular expressions:
Removing hash comments that are not inside quotes
For example, which of these would be considered comments:
# Start of line comment
Stuff here # mid-line comment
Contact "Tel# 911"
Would you want to strip all three of the above examples after the #?
Or are you only considering it a comment if the very first character of the line is a #?

Related

How to skip a line of a file if it starts with # in c++

So say I have a txt file that goes like:
#unwanted line
something=another thing
something2=another thing 2
#unwanted_line_2=unwanted
something3=another thing 3
and I am reading it with
getline(inFile,astring,'=');
to separate each something from its value (inside a while loop). How do I skip entire lines that start with #?
Also I'm storing this in a vector, if it is of any matter.
Use getline() without a delimiter to read an entire line up to \n. Then check if the line begins with #, and if so then discard it and move on. Otherwise, put the string into an istringstream and use getline() with '=' as the delimiter to split the line (or, just use astring.find() and astring.substr() instead).
For example:
while (getline(inFile, astring))
{
    if (!astring.empty() && astring[0] != '#')
    {
        istringstream iss(astring);
        getline(iss, aname, '=');
        getline(iss, avalue);
        ...
    }
}
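For comparison, a sketch of the astring.find()/astring.substr() alternative mentioned above, assuming each wanted line contains a single '=':
while (getline(inFile, astring))
{
    if (astring.empty() || astring[0] == '#')
        continue;

    string::size_type eq = astring.find('=');
    if (eq != string::npos)
    {
        string aname  = astring.substr(0, eq);
        string avalue = astring.substr(eq + 1);
        // push aname/avalue into the vector...
    }
}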

Unix c++: getline and line.empty not working

Happy New Year, everyone!
I have a text file that looks like this:
A|AAAAA|1|2
R|RAAAA
R|RAAAA
A|BBBBB|1|2
R|RBBBB
R|RBBBB
A|CCCCC|1|2
R|RCCCC
The following code searches for the relevant text in the file based on the key and returns all the lines that belong to the key:
while( std::getline( ifs, line ) && line.find(search_string) != 0 );
if( line.find(search_string) != 0 )
{
    navData = "N/A" ;
}
else {
    navData = line + '\n' ; // result initially contains the first line
    // now keep reading line by line till we get an empty line or eof
    while( std::getline( ifs, line ) && !line.empty() )
    {
        navData += line + '\n';
    }
}
ifs.close();
return navData;
In Windows I get what I need:
A|BBBBB|1|2
R|RBBBB
R|RBBBB
On Mac, however, the "&& !line.empty()" condition seems to get ignored, since I get the following:
A|BBBBB|1|2
R|RBBBB
R|RBBBB
A|CCCCC|1|2
R|RCCCC
Does anyone know why?
Cheers, everyone!
Windows and Mac have different opinions about what an empty line looks like. On Windows, lines are terminated by "\r\n". On Mac, lines are terminated by "\n", so the preceding "\r" stays at the end of each line and the line is never empty.
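A common fix (a sketch, not part of the original code) is to strip a trailing '\r' right after each getline, so the empty-line check behaves the same on both platforms:
while( std::getline( ifs, line ) )
{
    // remove a trailing '\r' left over from Windows line endings
    if( !line.empty() && line[line.size() - 1] == '\r' )
        line.erase( line.size() - 1 );

    if( line.empty() )
        break;               // a "\r\n" line now counts as empty on Mac too

    navData += line + '\n';
}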

How to skip header row in csv using C++

In my scenario, I need to create a parameters file using CSV. Every row holds one piece of config data, with the first field of the row treated as a header and used as an identifier. A CSV format like the one below would be easy for me to parse:
1,field1,field2,field3,field4 // 1 indicates the TARGET that the other fields will be written to
1,field1,field2,field3,field4
2,field1,field2,field3,field4
2,field1,field2,field3,field4
...
But that is not friendly to users. So I define a CSV file like below:
HeaderLine_Begin,1
field1,field2,field3,field4
field1,field2,field3,field4
HeaderLine_Begin,2
field1,field2,field3,field4
field1,field2,field3,field4
This means every row holds data that will be written to the target given by the preceding HeaderLine_Begin; I just separate the ID from the real data.
Then I create a struct like this:
enum myenum
{
    ON, OFF, NOCHANGE
};

struct Setting
{
    int TargetID;
    string field1;
    string field2;
    myenum field3;
    myenum field4;
};
I know how to read the CSV line by line, like below:
filename +=".csv";
std::ifstream file(filename.c_str());
std::string line;
while ( file.good() )
{
getline ( file, line, '\n' ); // read a line until last
if(line.compare(0,1,"#") == 0) // ignore the comment line
continue;
ParseLine();// DONE.Parse the line if it's header row OR data row
}
file.close(); // close file
What I want to do is create a list, like vector<Setting> settings, to keep the data. The flow should be: find the first header ID (say, 1), then look at the next line. If the next line is a data line, treat it as a data line belonging to header ID 1. If the next line is another header ID, loop again.
The problem is that there is no std::getnextline(int lineIndex) for me to fetch the rows after I find the header row.
Your input loop should be more like:
int id = -1;
while (getline(file, line))
{
    if (line.empty() || line[0] == '#')
        continue;

    if (starts_with_and_remove(line, "HeaderLine_Begin,"))
        id = boost::lexical_cast<int>(line); // or id = atoi(line.c_str())
    else
    {
        assert(id != -1);
        ...parse CSV, knowing "id" is in effect... (see the sketch after the helper below)
    }
}
With:
bool starts_with_and_remove(std::string& lhs, const std::string& rhs)
{
    if (lhs.compare(0, rhs.size(), rhs) == 0) // rhs.size() > lhs.size() IS safe
    {
        lhs.erase(0, rhs.size());
        return true;
    }
    return false;
}
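For the "...parse CSV..." branch, a possible sketch that fills the Setting struct from the question, assuming the four fields are comma-separated and that field3/field4 hold the enum names as text (the mapping here is illustrative only):
Setting s;
s.TargetID = id;

std::istringstream fields(line);
std::string f3, f4;
std::getline(fields, s.field1, ',');
std::getline(fields, s.field2, ',');
std::getline(fields, f3, ',');
std::getline(fields, f4);
s.field3 = (f3 == "ON") ? ON : (f3 == "OFF") ? OFF : NOCHANGE;
s.field4 = (f4 == "ON") ? ON : (f4 == "OFF") ? OFF : NOCHANGE;

settings.push_back(s);   // settings is the std::vector<Setting> being built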
The simplest solution would be to use regular expressions:
std::string line;
int currentId = 0;
while ( std::getline( source, line ) ) {
    trimCommentsAndWhiteSpace( line );
    static std::regex const header( "HeaderLine_Begin,(\\d+)" );
    std::smatch match;
    if ( line.empty() ) {
        // ignore
    } else if ( std::regex_match( line, match, header ) ) {
        std::istringstream s( match[ 1 ] );
        s >> currentId;
    } else {
        // ...
    }
}
I regularly use this strategy to parse .ini files, which pose
the same problem: section headers have a different syntax to
other things.
trimCommentsAndWhiteSpace can be as simple as:
void
trimCommentsAndWhiteSpace( std::string& line )
{
    if ( !line.empty() && line[0] == '#' ) {
        line = "";
    }
}
It's fairly easy to expand it to handle end of line comments as
well, however, and it's usually a good policy (in contexts like
this) to trim leading and trailing whitespace---trailing
especially, since a human reader won't see it when looking at
the file.
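One way that expansion might look (a sketch; the whitespace set here is an assumption, and '#' inside quoted strings is deliberately not handled):
void
trimCommentsAndWhiteSpace( std::string& line )
{
    // drop everything from '#' to the end of the line
    std::string::size_type hash = line.find( '#' );
    if ( hash != std::string::npos ) {
        line.erase( hash );
    }
    // trim trailing, then leading, whitespace
    line.erase( line.find_last_not_of( " \t\r" ) + 1 );
    line.erase( 0, line.find_first_not_of( " \t" ) );
}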
Alternatively, of course, you could use a regular expression for
the lines you want to treat as comments ("\s*#.*"); this works
well with your current definition, but doesn't really extend
well for end of line comments, especially if you want to allow
# in quoted strings in your fields.
And one final comment: your loop is incorrect. You don't test
that getline succeeded before using its results, and
file.good() may return true even if there is nothing more to
read. (file.good() is one of those things that are there for
historical reasons; there's no case where it makes sense to use
it.)

break a long string into multiple strings in C++

I have a string that is received from third party. This string is actually the text from a text file and it may contain UNIX LF or Windows CRLF for line termination. How can I break this into multiple strings ignoring blank lines? I was planning to do the following, but am not sure if there is a better way. All I need to do is read line by line. Vector here is just a convenience and I can avoid it.
* Unfortunately I do not have access to the actual file. I only receive the string object. *
string textLine;
vector<string> tokens;
size_t pos = 0;
while( true ) {
    size_t nextPos = textLine.find_first_of( "\r\n", pos );
    if( nextPos == textLine.npos )
        break;
    tokens.push_back( string( textLine.substr( pos, nextPos - pos ) ) );
    pos = nextPos + 1;
}
You could use std::getline as you're reading from the file instead of reading the whole thing into a string. That will break things up line by line by default. You can simply not push_back any string that comes up empty.
string line;
vector<string> tokens;
while (getline(file, line))
{
    if (!line.empty()) tokens.push_back(line);
}
UPDATE:
If you don't have access to the file, you can use the same code by initializing a stringstream with the whole text. std::getline works on all stream types, not just files.
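For example (a sketch; "text" stands for the received string object):
istringstream file(text);   // same loop as above, just fed from the string
string line;
vector<string> tokens;
while (getline(file, line))
{
    if (!line.empty()) tokens.push_back(line);
}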
I'd use getline to create new strings based on \n, and then manipulate the line endings.
string textLine;
vector<string> tokens;
istringstream sTextLine(textLine);   // textLine holds the received text
string line;
while(getline(sTextLine, line)) {
    if(line.empty()) continue;
    if(line[line.size()-1] == '\r') line.resize(line.size()-1);
    if(line.empty()) continue;
    tokens.push_back(line);
}
EDIT: Use istringstream instead of stringstream.
I would use the approach given here (std::getline on a std::istringstream)...
Splitting a C++ std::string using tokens, e.g. ";"
... except omit the ';' parameter to std::getline.
A lot depends on what is already present in your toolkit. I work a lot
with files which come from Windows and are read under Unix, and vice
versa, so I have most of the tools for converting CRLF into LF at hand.
If you don't have any, you might want a function along the lines of:
void addLine( std::vector<std::string>& dest, std::string line )
{
    if ( !line.empty() && *(line.end() - 1) == '\r' ) {
        line.erase( line.end() - 1 );
    }
    if ( !line.empty() ) {
        dest.push_back( line );
    }
}
to do your insertions. As for breaking the original text into lines,
you can use std::istringstream and std::getline, as others have
suggested; it's simple and straightforward, even if it is overkill.
(The std::istringstream is a fairly heavy mechanism, since it supports
all sorts of input conversions you don't need.) Alternatively, you
might consider a loop along the lines of:
std::string::const_iterator start = textLine.begin();
std::string::const_iterator end = textLine.end();
std::string::const_iterator next = std::find( start, end, '\n' );
while ( next != end ) {
    addLine( tokens, std::string( start, next ) );
    start = next + 1;
    next = std::find( start, end, '\n' );
}
addLine( tokens, std::string( start, end ) );
Or you could break things down into separate operations:
textLine.erase(
    std::remove( textLine.begin(), textLine.end(), '\r'),
    textLine.end() );
to get rid of all of the CR's,
std::vector<std::string> tokens( split( textLine, '\n' ) );
, to break it up into lines, where split is a generalized function
along the lines of the above loop (a useful tool to add to your
toolkit; a sketch follows at the end of this answer), and finally:
tokens.erase(
    std::remove_if( tokens.begin(), tokens.end(),
        boost::bind( &std::string::empty, _1 ) ),
    tokens.end() );
. (Generally speaking: if this is a one-off situation, use the
std::istringstream based solution. If you think you may have to do
something like this from time to time in the future, add the split
function to your toolkit, and use it.)
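A possible sketch of that split helper (the exact signature is an assumption):
// needs <algorithm>, <string>, <vector>
std::vector<std::string>
split( std::string const& text, char separator )
{
    std::vector<std::string> results;
    std::string::const_iterator start = text.begin();
    std::string::const_iterator end = text.end();
    std::string::const_iterator next = std::find( start, end, separator );
    while ( next != end ) {
        results.push_back( std::string( start, next ) );
        start = next + 1;
        next = std::find( start, end, separator );
    }
    results.push_back( std::string( start, end ) );
    return results;
}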
You could use strtok.
Split string into tokens:
A sequence of calls to this function splits str into tokens, which are sequences of contiguous characters separated by any of the characters that are part of delimiters.
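A minimal sketch of that approach (strtok modifies its input, so the text is copied into a writable buffer first; note that strtok skips runs of delimiters, so blank lines never show up):
#include <cstring>
#include <string>
#include <vector>

std::vector<std::string> splitLines( const std::string& textLine )
{
    std::vector<char> buf( textLine.begin(), textLine.end() );
    buf.push_back( '\0' );

    std::vector<std::string> tokens;
    for ( char* p = std::strtok( &buf[0], "\r\n" ); p != NULL; p = std::strtok( NULL, "\r\n" ) )
    {
        tokens.push_back( p );
    }
    return tokens;
}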
I would put the string in a stringstream and then use the getline method like the previous answer mentioned. Then, you could just act like you were reading the text in from a file when it really comes from another string.

parsing an sstream

I am parsing a file which contains both strings and numerical values. I'd like to process the file field by field, each delimited by a space or an end-of-line character.
The ifstream::getline() operation only allows a single delimiting character. What I currently do is thus a getline with the character ' ' as a delimiter, and then manually go back to the previous position in the stream if a '\n' has been encountered:
ifstream ifs ( filename , ifstream::in );
streampos pos;
while (ifs.good())
{
    char curField[255];
    pos = ifs.tellg();
    ifs.getline(curField, 255, ' ');
    string s(curField);
    if (s.find("\n")!=string::npos)
    {
        ifs.seekg(pos);
        ifs.getline(curField, 255, '\n');
        s = string(curField);
    }
    // process the field contained in the string s...
}
However, the "seekg" seems to position the stream one character too late (I thus miss the first character of each field before each line break).
I know there are other ways to code such a parser, by scanning line by line etc.., but I'd really like to understand why this particular piece of code fails...
Thank you very much!
As Loadmaster said, there may be unaccounted for characters, or this could just be an off-by-one error.
But this just has to be said... you can replace this:
ifstream ifs ( filename , ifstream::in );
streampos pos;
while (ifs.good())
{
    char curField[255];
    pos = ifs.tellg();
    ifs.getline(curField, 255, ' ');
    string s(curField);
    if (s.find("\n")!=string::npos)
    {
        ifs.seekg(pos);
        ifs.getline(curField, 255, '\n');
        s = string(curField);
    }
    // process the field contained in the string s...
}
With this:
ifstream ifs ( filename , ifstream::in );
string s;
while ( ifs >> s )
{
    // process the field contained in the string s...
}
To get the behavior you want.
There may be a look-ahead/push-back character in the input stream. IIRC, the seek/tell functions are not aware of this.