c++ Reading numbers from text files, ignoring comments

c++ Reading numbers from text files, ignoring comments - c++

So I've seen lots of solutions on this site and tutorials about reading in from a text file in C++, but have yet to figure out a solution to my problem. I'm new at C++ so I think I'm having trouble piecing together some of the documentation to make sense of it all.
What I am trying to do is read a text file numbers while ignoring comments in the file that are denoted by "#". So an example file would look like:
#here is my comment
20 30 40 50
#this is my last comment
60 70 80 90
My code can read numbers fine when there aren't any comments, but I don't understand parsing the stream well enough to ignore the comments. Its kind of a hack solution right now.
/////////////////////// Read the file ///////////////////////
std::string line;
if (input_file.is_open())
{
//While we can still read the file
while (std::getline(input_file, line))
{
std::istringstream iss(line);
float num; // The number in the line
//while the iss is a number
while ((iss >> num))
{
//look at the number
}
}
}
else
{
std::cout << "Unable to open file";
}
/////////////////////// done reading file /////////////////
Is there a way I can incorporate comment handling with this solution or do I need a different approach? Any advice would be great, thanks.

If your file contains # always in the first column, then just test, if the line starts with # like this:
while (std::getline(input_file, line))
{
if (line[0] != "#" )
{
std::istringstream iss(line);
float num; // The number in the line
//while the iss is a number
while ((iss >> num))
{
//look at the number
}
}
}
It is wise though to trim the line of leading and trailing whitespaces, like shown here for example: Remove spaces from std::string in C++

If this is just a one of use, for line oriented input like yours, the
simplest solution is just to strip the comment from the line you just
read:
line.erase( std::find( line.begin(), line.end(), '#' ), line.end() );
A more generic solution would be to use a filtering streambuf, something
like:
class FilterCommentsStreambuf : public std::streambuf
{
std::istream& myOwner;
std::streambuf* mySource;
char myCommentChar;
char myBuffer;
protected:
int underflow()
{
int const eof = std::traits_type::eof();
int results = mySource->sbumpc();
if ( results == myCommentChar ) {
while ( results != eof && results != '\n') {
results = mySource->sbumpc(0;
}
}
if ( results != eof ) {
myBuffer = results;
setg( &myBuffer, &myBuffer, &myBuffer + 1 );
}
return results;
}
public:
FilterCommentsStreambuf( std::istream& source,
char comment = '#' )
: myOwner( source )
, mySource( source.rdbuf() )
, myCommentChar( comment )
{
myOwner.rdbuf( this );
}
~FilterCommentsStreambuf()
{
myOwner.rdbuf( mySource );
}
};
In this case, you could even forgo getline:
FilterCommentsStreambuf filter( input_file );
double num;
while ( input_file >> num || !input_file.eof() ) {
if ( ! input_file ) {
// Formatting error, output error message, clear the
// error, and resynchronize the input---probably by
// ignore'ing until end of line.
} else {
// Do something with the number...
}
}
(In such cases, I've found it useful to also track the line number in
the FilterCommentsStreambuf. That way you have it for error
messages.)

An alternative to the "read aline and parse it as a string", can be use the stream itself as the incoming buffer:
while(input_file)
{
int n = 0;
char c;
input_file >> c; // will skip spaces ad read the first non-blank
if(c == '#')
{
while(c!='\n' && input_file) input_file.get(c);
continue; //may be not soooo beautiful, but does not introduce useless dynamic memory
}
//c is part of something else but comment, so give it back to parse it as number
input_file.unget(); //< this is what all the fuss is about!
if(input_file >> n)
{
// look at the nunber
continue;
}
// something else, but not an integer is there ....
// if you cannot recover the lopop will exit
}

Related

reading csv file for specific information

I am wondering how to read a specific value from a csv file in C++, and then read the next four items in the file. For example, this is what the file would look like:
fire,2.11,2,445,7891.22,water,234,332.11,355,5654.44,air,4535,122,334.222,16,earth,453,46,77.3,454
What I want to do is let my user select one of the values, let's say "air" and also read the next four items(4535 122 334.222 16).
I only want to use fstream,iostream,iomanip libraries. I am a newbie, and I am horrible at writing code, so please, be gentle.

You should read about parsers. Full CSV specifications.
If your fields are free of commas and double quotes, and you need a quick solution, search for getline/strtok, or try this (not compiled/tested):
typedef std::vector< std::string > svector;
bool get_line( std::istream& is, svector& d, const char sep = ',' )
{
d.clear();
if ( ! is )
return false;
char c;
std::string s;
while ( is.get(c) && c != '\n' )
{
if ( c == sep )
{
d.push_back( s );
s.clear();
}
else
{
s += c;
}
}
if ( ! s.empty() )
d.push_back( s );
return ! s.empty();
}
int main()
{
std::ifstream is( "test.txt" );
if ( ! is )
return -1;
svector line;
while ( get_line( is, line ) )
{
//...
}
return 0;
}

How to extract the string pattern in C++ efficiently?

I have a pattern that in the following format:
AUTHOR, "TITLE" (PAGES pp.) [CODE STATUS]
For example, I have a string
P.G. Wodehouse, "Heavy Weather" (336 pp.) [PH.409 AVAILABLE FOR LENDING]
I want to extract
AUTHOR = P.G. Wodehouse
TITLE = Heavy Weather
PAGES = 336
CODE = PH.409
STATUS = AVAILABLE FOR LENDING
I only know how to do that in Python, however, are there any efficient way to do the same thing in C++?

Exactly the same way as in Python. C++11 has regular expressions (and for earlier C++, there's Boost regex.) As for the read loop:
std::string line;
while ( std::getline( file, line ) ) {
// ...
}
is almost exactly the same as:
for line in file:
# ...
The only differences are:
The C++ version will not put the trailing '\n' in the buffer. (In general, the C++ version may be less flexible with regards to end of line handling.)
In case of a read error, the C++ version will break the loop; the Python version will raise an exception.
Neither should be an issue in your case.
EDIT:
It just occurs to me that while regular expressions in C++ and in Python are very similar, the syntax for using them isn't quite the same. So:
In C++, you'd normally declare an instance of the regular expression before using it; something like Python's re.match( r'...', line ) is theoretically possible, but not very idiomatic (and it would still involve explicitly constructuing a regular expression object in the expression). Also, the match function simply returns a boolean; if you want the captures, you need to define a separate object for them. Typical use would probably be something like:
static std::regex const matcher( "the regular expression" );
std::smatch forCaptures;
if ( std::regex_match( line, forCaptures, matcher ) ) {
std::string firstCapture = forCaptures[1];
// ...
}
This corresponds to the Python:
m = re.match( 'the regular expression', line )
if m:
firstCapture = m.group(1)
# ...
EDIT:
Another answer has suggested overloading operator>>; I heartily concur. Just out of curiousity, I gave it a go; something like the following works well:
struct Book
{
std::string author;
std::string title;
int pages;
std::string code;
std::string status;
};
std::istream&
operator>>( std::istream& source, Book& dest )
{
std::string line;
std::getline( source, line );
if ( source )
{
static std::regex const matcher(
R"^(([^,]*),\s*"([^"]*)"\s*\((\d+) pp.\)\s*\[(\S+)\s*([^\]]*)\])^"
);
std::smatch capture;
if ( ! std::regex_match( line, capture, matcher ) ) {
source.setstate( std::ios_base::failbit );
} else {
dest.author = capture[1];
dest.title = capture[2];
dest.pages = std::stoi( capture[3] );
dest.code = capture[4];
dest.status = capture[5];
}
}
return source;
}
Once you've done this, you can write things like:
std::vector<Book> v( (std::istream_iterator<Book>( inputFile )),
(std::istream_iterator<Book>()) );
And load an entire file in the initialization of a vector.
Note the error handling in the operator>>. If a line is misformed, we set failbit; this is the standard convention in C++.
EDIT:
Since there's been so much discussion: the above is fine for small, one time programs, things like school projects, or one time programs which will read the current file, output it in a new format, and then be thrown away. In production code, I would insist on support for comments and empty lines; continuing in case of error, in order to report multiple errors (with line numbers), and probably continuation lines (since titles can get long enough to become unwieldly). It's not practical to do this with operator>>, if for no other reason than the need to output line numbers, so I'd use a parser along the following line:
int
getContinuationLines( std::istream& source, std::string& line )
{
int results = 0;
while ( source.peek() == '&' ) {
std::string more;
std::getline( source, more ); // Cannot fail, because of peek
more[0] = ' ';
line += more;
++ results;
}
return results;
}
void
trimComment( std::string& line )
{
char quoted = '\0';
std::string::iterator position = line.begin();
while ( position != line.end() && (quoted != '\0' || *position == '#') ) {
if ( *position == '\' && std::next( position ) != line.end() ) {
++ position;
} else if ( *position == quoted ) {
quoted = '\0';
} else if ( *position == '\"' || *position == '\'' ) {
quoted = *position;
}
++ position;
}
line.erase( position, line.end() );
}
bool
isEmpty( std::string const& line )
{
return std::all_of(
line.begin(),
line.end(),
[]( unsigned char ch ) { return isspace( ch ); } );
}
std::vector<Book>
parseFile( std::istream& source )
{
std::vector<Book> results;
int lineNumber = 0;
std::string line;
bool errorSeen = false;
while ( std::getline( source, line ) ) {
++ lineNumber;
int extraLines = getContinuationLines( source, line );
trimComment( line );
if ( ! isEmpty( line ) ) {
static std::regex const matcher(
R"^(([^,]*),\s*"([^"]*)"\s*\((\d+) pp.\)\s*\[(\S+)\s*([^\]]*)\])^"
);
std::smatch capture;
if ( ! std::regex_match( line, capture, matcher ) ) {
std::cerr << "Format error, line " << lineNumber << std::endl;
errorSeen = true;
} else {
results.emplace_back(
capture[1],
capture[2],
std::stoi( capture[3] ),
capture[4],
capture[5] );
}
}
lineNumber += extraLines;
}
if ( errorSeen ) {
results.clear(); // Or more likely, throw some sort of exception.
}
return results;
}
The real issue here is how you report the error to the caller; I suspect that in most cases, and exception would be appropriate, but depending on the use case, other alternatives may be valid as well. In this example, I just return an empty vector. (The interaction between comments and continuation lines probably needs to be better defined as well, with modifications according to how it has been defined.)

Your input string is well delimited so I'd recommend using an extraction operator over a regex, for speed and for ease of use.
You'd first need to create a struct for your books:
struct book{
string author;
string title;
int pages;
string code;
string status;
};
Then you'd need to write the actual extraction operator:
istream& operator>>(istream& lhs, book& rhs){
lhs >> ws;
getline(lhs, rhs.author, ',');
lhs.ignore(numeric_limits<streamsize>::max(), '"');
getline(lhs, rhs.title, '"');
lhs.ignore(numeric_limits<streamsize>::max(), '(');
lhs >> rhs.pages;
lhs.ignore(numeric_limits<streamsize>::max(), '[');
lhs >> rhs.code >> ws;
getline(lhs, rhs.status, ']');
return lhs;
}
This gives you a tremendous amount of power. For example you can extract all the books from an istream into a vector like this:
istringstream foo("P.G. Wodehouse, \"Heavy Weather\" (336 pp.) [PH.409 AVAILABLE FOR LENDING]\nJohn Bunyan, \"The Pilgrim's Progress\" (336 pp.) [E.1173 CHECKED OUT]");
vector<book> bar{ istream_iterator<book>(foo), istream_iterator<book>() };

Use flex (it generates C or C++ code, to be used as a part or as the full program)
%%
^[^,]+/, {printf("Autor: %s\n",yytext );}
\"[^"]+\" {printf("Title: %s\n",yytext );}
\([^ ]+/[ ]pp\. {printf("Pages: %s\n",yytext+1);}
..................
.|\n {}
%%
(untested)

Here's the code:
#include <iostream>
#include <cstring>
using namespace std;
string extract (string a)
{
string str = "AUTHOR = "; //the result string
int i = 0;
while (a[i] != ',')
str += a[i++];
while (a[i++] != '\"');
str += "\nTITLE = ";
while (a[i] != '\"')
str += a[i++];
while (a[i++] != '(');
str += "\nPAGES = ";
while (a[i] != ' ')
str += a[i++];
while (a[i++] != '[');
str += "\nCODE = ";
while (a[i] != ' ')
str += a[i++];
while (a[i++] == ' ');
str += "\nSTATUS = ";
while (a[i] != ']')
str += a[i++];
return str;
}
int main ()
{
string a;
getline (cin, a);
cout << extract (a) << endl;
return 0;
}
Happy coding :)

Parse a file in c++ and ignore some characters

I'm trying to parse a file that has the following information
test_wall ; Comments!!
je,5
forward
goto,1
test_random;
je,9
I'm supposed to ignore comments after the ";" and move on to the next line. When there is a comma I'm trying to ignore the comma and store the second value.
string c;
int a;
c=getChar();
ifstream myfile (filename);
if (myfile.is_open())
{
while ( c != ';' && c != EOF)
{
c = getchar();
if( c == ',')
{
a= getChar();
}
}
}
myfile.close();
}

Here's some code. I'm not entirely sure I've understood the problem correctly, but if not hopefully this will set you on the right direction.
ifstream myfile (filename);
if (myfile.is_open())
{
// read one line at a time until EOF
string line;
while (getline(myFile, line))
{
// does the line have a semi-colon?
size_t pos = line.find(';');
if (pos != string::npos)
{
// remove the semi-colon and everything afterwards
line = line.substr(0, pos);
}
// does the line have a comma?
pos = line.find(',');
if (pos != string::npos)
{
// get everything after the comma
line = line.substr(pos + 1);
// store the string
...
}
}
}
I've left the section commented 'store the string' blank because I'm not certain what you want to do here. Possibly you are asking to convert the string into an integer before storing it. If so then add that code, or ask if you don't know how to do that. Actually don't ask, search on stack overflow, because that question has been asked hundreds of times.

Need a regular expression to extract only letters and whitespace from a string

I'm building a small utility method that parses a line (a string) and returns a vector of all the words. The istringstream code I have below works fine except for when there is punctuation so naturally my fix is to want to "sanitize" the line before I run it through the while loop.
I would appreciate some help in using the regex library in c++ for this. My initial solution was to us substr() and go to town but that seems complicated as I'll have to iterate and test each character to see what it is then perform some operations.
vector<string> lineParser(Line * ln)
{
vector<string> result;
string word;
string line = ln->getLine();
istringstream iss(line);
while(iss)
{
iss >> word;
result.push_back(word);
}
return result;
}

Don't need to use regular expressions just for punctuation:
// Replace all punctuation with space character.
std::replace_if(line.begin(), line.end(),
std::ptr_fun<int, int>(&std::ispunct),
' '
);
Or if you want everything but letters and numbers turned into space:
std::replace_if(line.begin(), line.end(),
std::not1(std::ptr_fun<int,int>(&std::isalphanum)),
' '
);
While we are here:
Your while loop is broken and will push the last value into the vector twice.
It should be:
while(iss)
{
iss >> word;
if (iss) // If the read of a word failed. Then iss state is bad.
{ result.push_back(word);// Only push_back() if the state is not bad.
}
}
Or the more common version:
while(iss >> word) // Loop is only entered if the read of the word worked.
{
result.push_back(word);
}
Or you can use the stl:
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(result)
);

[^A-Za-z\s] should do what you need if your replace the matching characters by nothing. It should remove all characters that are not letters and spaces. Or [^A-Za-z0-9\s] if you want to keep numbers too.
You can use online tools like this one : http://gskinner.com/RegExr/ to test out your patterns (Replace tab). Indeed some modifications can be required based on the regex lib you are using.

I'm not positive, but I think this is what you're looking for:
#include<iostream>
#include<regex>
#include<vector>
int
main()
{
std::string line("some words: with some punctuation.");
std::regex words("[\\w]+");
std::sregex_token_iterator i(line.begin(), line.end(), words);
std::vector<std::string> list(i, std::sregex_token_iterator());
for (auto j = list.begin(), e = list.end(); j != e; ++j)
std::cout << *j << '\n';
}
some
words
with
some
punctuation

The simplest solution is probably to create a filtering
streambuf to convert all non alphanumeric characters to space,
then to read using std::copy:
class StripPunct : public std::streambuf
{
std::streambuf* mySource;
char myBuffer;
protected:
virtual int underflow()
{
int result = mySource->sbumpc();
if ( result != EOF ) {
if ( !::isalnum( result ) )
result = ' ';
myBuffer = result;
setg( &myBuffer, &myBuffer, &myBuffer + 1 );
}
return result;
}
public:
explicit StripPunct( std::streambuf* source )
: mySource( source )
{
}
};
std::vector<std::string>
LineParser( std::istream& source )
{
StripPunct sb( source.rdbuf() );
std::istream src( &sb );
return std::vector<std::string>(
(std::istream_iterator<std::string>( src )),
(std::istream_iterator<std::string>()) );
}

How to count lines of a file in C++?

How can I count lines using the standard classes, fstream and ifstream?

How about this :-
std::ifstream inFile("file");
std::count(std::istreambuf_iterator<char>(inFile),
std::istreambuf_iterator<char>(), '\n');

You read the file line by line.
Count the number of lines you read.

This is the correct version of Craig W. Wright's answer:
int numLines = 0;
ifstream in("file.txt");
std::string unused;
while ( std::getline(in, unused) )
++numLines;

kernel methods following #Abhay
A complete code I've done :
#include <fstream>
std::size_t count_line(std::istream &is) {
// skip when is not open or got bad
if (!is || is.bad()) { return 0; }
// save state
auto state_backup = is.rdstate();
// clear state
is.clear();
auto pos_backup = is.tellg();
is.seekg(0);
size_t line_cnt;
size_t lf_cnt = std::count(std::istreambuf_iterator<char>(is), std::istreambuf_iterator<char>(), '\n');
line_cnt = lf_cnt;
// if the file is not end with '\n' , then line_cnt should plus 1
// but we should check whether file empty firstly! or it will raise bug
if (is.tellg() != 0) {
is.unget();
if (is.get() != '\n') { ++line_cnt; }
}
// recover state
is.clear() ; // previous reading may set eofbit
is.seekg(pos_backup);
is.setstate(state_backup);
return line_cnt;
}
it will not change the origin file stream state and including '\n'-miss situation processing for the last line.
Thanks #masoomyf for pointing my bug and I was too stupid to figure it out!

int aNumOfLines = 0;
ifstream aInputFile(iFileName);
string aLineStr;
while (getline(aInputFile, aLineStr))
{
if (!aLineStr.empty())
aNumOfLines++;
}
return aNumOfLines;

This works for me:
std::ifstream fin{"source.txt"};
std::count(std::istream_iterator<char>(fin >> std::noskipws), {}, '\n');

int numLines = 0;
ifstream in("file.txt");
//while ( ! in.eof() )
while ( in.good() )
{
std::string line;
std::getline(in, line);
++numLines;
}
There is a question of how you treat the very last line of the file if it does not end with a newline. Depending upon what you're doing you might want to count it and you might not. This code counts it.
See: http://www.cplusplus.com/reference/string/getline/

Divide the file size by the average number of characters per line!

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

c++ Reading numbers from text files, ignoring comments - c++

Related

reading csv file for specific information

How to extract the string pattern in C++ efficiently?

Parse a file in c++ and ignore some characters

Need a regular expression to extract only letters and whitespace from a string

How to count lines of a file in C++?

Categories

Resources