Function for searching for a string in a file - C++

This is some code I wrote to check a string's presence in a file:
bool aviasm::in1(string s)
{
    ifstream in("optab1.txt", ios::in); //opening the optab
    //cout << "entered in1 func" << endl;
    char c;
    string x, y;
    while((c = in.get()) != EOF)
    {
        in.putback(c);
        in >> x;
        in >> y;
        if(x == s)
            return true;
    }
    return false;
}
It is certain that the string being searched for lies in the first column of optab1.txt, and every row of optab1.txt has exactly two columns.
Now the problem is that no matter what string is passed as the parameter s, the function always returns false. Can you tell me why this happens?

What a hack! Why not use standard C++ string and file reading functions:
bool find_in_file(const std::string& needle)
{
    std::ifstream in("optab1.txt");
    std::string line;
    while (std::getline(in, line)) // remember this idiom!!
    {
        // if (line.substr(0, needle.length()) == needle) // not so efficient
        if (line.length() >= needle.length() && std::equal(needle.begin(), needle.end(), line.begin())) // better
        // if (std::search(line.begin(), line.end(), needle.begin(), needle.end()) != line.end()) // for arbitrary position
        {
            return true;
        }
    }
    return false;
}
You can replace substr with more advanced string-searching functions if the search string isn't required to be at the beginning of a line. The substr version is the most readable, but it makes a copy of the substring. The equal version compares the two strings in place (but requires the additional size check). The search version finds the substring anywhere, not just at the beginning of the line (but at a price).

It's not too clear what you're trying to do, but the terminating condition of the while will never be met if plain char is unsigned. (It usually isn't, so you might get away with it.) Also, you're not extracting the end of line in the loop, so you'll probably see it instead of EOF and make one pass too many through the loop. I'd write this more along the lines of:
bool
in1( std::string const& target )
{
    std::ifstream in( "optab1.txt" );
    if ( ! in.is_open() ) {
        //  Some sort of error handling, maybe an exception.
    }
    std::string line;
    while ( std::getline( in, line )
            && ( line.size() < target.size()
                 || ! std::equal( target.begin(), target.end(), line.begin() ) ) )
        ;
    return static_cast<bool>( in ); // true only if getline succeeded, i.e. a matching line was found
}
Note the check that the open succeeded. One possible reason you're
always returning false is that you're not successfully opening the file.
(But we can't know unless you check the status after the open.)

Related

Reading from FileStream with arbitrary delimiter

I have encountered a problem reading messages from a file using C++. Usually what people do is create a file stream and then use the getline() function to fetch messages. getline() can accept an additional parameter as the delimiter so that it returns each "line" separated by that delimiter rather than by the default '\n'. However, the delimiter has to be a char. In my use case, the delimiter in the message may be something else, like "|--|", so I am looking for a solution that accepts a string as the delimiter instead of a char.
I have searched Stack Overflow a bit and found some interesting posts.
Parse (split) a string in C++ using string delimiter (standard C++)
This one gives a solution that uses string::find() and string::substr() to parse with an arbitrary delimiter. However, all the solutions there assume the input is a string instead of a stream. In my case, the file stream data is too big to fit into memory at once, so it should be read in message by message (or a bulk of messages at a time).
Actually, reading through the gcc implementation of the std::getline() function, it seems much easier to handle the case where the delimiter is a single char: every time you load a chunk of characters, you can simply search for the delimiter and split there. It is different if your delimiter is more than one char; the delimiter itself may straddle two different chunks, which causes many corner cases.
Not sure whether anyone else has faced this kind of requirement before and how you handled it elegantly. It seems it would be nice to have a standard function like istream& getNext (istream&& is, string& str, string delim). This seems like a general use case to me. Why isn't something like this in the standard library, so that people no longer need to implement their own version separately?
Thank you very much.
The STL simply does not natively support what you are asking for. You will have to write your own function (or find a 3rd party function) that does what you need.
For instance, you can use std::getline() to read up to the first character of your delimiter, and then use std::istream::get() to read subsequent characters and compare them to the rest of your delimiter. For example:
std::istream& my_getline(std::istream &input, std::string &str, const std::string &delim)
{
    if (delim.empty())
        throw std::invalid_argument("delim cannot be empty!");
    if (delim.size() == 1)
        return std::getline(input, str, delim[0]);
    str.clear();
    std::string temp;
    char ch;
    bool found = false;
    do
    {
        // read up to the first character of the delimiter
        if (!std::getline(input, temp, delim[0]))
            break;
        str += temp;
        found = true;
        // check whether the remaining delimiter characters follow
        for (std::string::size_type i = 1; i < delim.size(); ++i)
        {
            if (!input.get(ch))
            {
                if (input.eof())
                    input.clear(std::ios_base::eofbit); // keep eofbit but clear failbit so the partial read is usable
                str.append(delim.c_str(), i);
                return input;
            }
            if (delim[i] != ch)
            {
                // not the delimiter after all: keep what was consumed and keep looking
                str.append(delim.c_str(), i);
                str += ch;
                found = false;
                break;
            }
        }
    }
    while (!found);
    return input;
}
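A quick usage sketch, assuming the helper above is in scope (the file name msgs.txt is a placeholder; "|--|" is the delimiter from the question):
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream file("msgs.txt"); // hypothetical input file
    std::string msg;
    while (my_getline(file, msg, "|--|"))
        std::cout << msg << '\n';
    return 0;
}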
If you are OK with reading byte by byte, you could build a state-transition-table implementation of a finite state machine to recognize your stop condition:
std::string delimiter = "someString";
// initialize table with a row per target-string character, a column per possible char, and all zeros
std::vector<std::vector<int> > table(delimiter.size(), std::vector<int>(256, 0));
int endState = delimiter.size();
// for the state that is looking for character i, finding that character advances to the next state
// (note: there are no failure transitions, so a partial match overlapping a real match can be missed)
for(unsigned int i = 0; i < delimiter.size(); i++){
    table[i][(unsigned char)delimiter[i]] = i + 1;
}
Now you can use it like this (where in is whatever std::istream you are reading from):
int currentState = 0;
int read = 0;
bool done = false;
while(!done && (read = in.get()) >= 0){ // istream::get() returns EOF (-1) at end of stream
    if(read >= 256){
        currentState = 0;
    }else{
        currentState = table[currentState][read];
    }
    if(currentState == endState){
        done = true;
    }
    // do your streamy stuff
}
Granted, this only works if the delimiter is in extended ASCII, but it will work fine for something like your example.
It seems easiest to create something like getline(): read to the last character of the separator, then check whether the string is long enough to contain the separator and, if so, whether it ends with the separator. If not, carry on reading:
std::string getline(std::istream& in, std::string& value, std::string const& separator) {
    std::istreambuf_iterator<char> it(in), end;
    if (separator.empty()) { // empty separator -> return the entire stream
        value.assign(it, end);
        return value;
    }
    value.clear();
    char last(separator.back());
    for (; it != end; ++it) {
        value.push_back(*it);
        if (value.back() == last
            && separator.size() <= value.size()
            && value.substr(value.size() - separator.size()) == separator) {
            value.resize(value.size() - separator.size()); // strip the separator before returning
            return value;
        }
    }
    return value; // no separator was found
}
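A quick usage sketch (an istringstream stands in for the file stream; "|--|" is the delimiter from the question, and ::getline refers to the helper above rather than std::getline):
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::istringstream in("first msg|--|second msg|--|third");
    std::string msg;
    while (in.peek() != std::char_traits<char>::eof())
        std::cout << ::getline(in, msg, "|--|") << '\n';
    return 0;
}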

How to skip header row in csv using C++

In my scenario, I need to create a parameters file using CSV. Every row holds config data; the first field of the row is treated as the header and used as an identifier. A CSV format like the one below would be easy for me to parse:
1,field1,field2,field3,field4 // 1 indicates the TARGET that the other fields will be written to.
1,field1,field2,field3,field4
2,field1,field2,field3,field4
2,field1,field2,field3,field4........
But it's not friendly to users. So, I define a CSV file like below:
HeaderLine_Begin,1
field1,field2,field3,field4
field1,field2,field3,field4
HeaderLine_Begin,2
field1,field2,field3,field4
field1,field2,field3,field4
This means every row is data that will be written to the target given by the preceding HeaderLine_Begin. I just separate the ID from the real data.
Then, I create a struct like this:
enum myenum
{
    ON, OFF, NOCHANGE
};

struct Setting
{
    int TargetID;
    string field1;
    string field2;
    myenum field3;
    myenum field4;
};
I know how to write some code for reading the CSV line by line, like below:
filename += ".csv";
std::ifstream file(filename.c_str());
std::string line;
while ( file.good() )
{
    getline ( file, line, '\n' ); // read a line until last
    if(line.compare(0,1,"#") == 0) // ignore the comment line
        continue;
    ParseLine(); // DONE. Parse the line if it's a header row OR data row
}
file.close(); // close file
What I want to do is create a list, like a vector of Setting, to keep the data. The flow should be: find the first header ID (say headerID1), then read the next line. If the next line is a data line, treat it as a data line belonging to headerID1; if the next line is another header ID, loop again.
The problem is, there is no std::getnextline(int lineIndex) for me to fetch the rows after I have found the header row.
Your input loop should be more like:
int id = -1;
while (getline(file, line))
{
    if (line.empty() || line[0] == '#')
        continue;
    if (starts_with_and_remove(line, "HeaderLine_Begin,"))
        id = boost::lexical_cast<int>(line); // or id = atoi(line.c_str())
    else
    {
        assert(id != -1);
        ...parse CSV, knowing "id" is in effect...
    }
}
With:
bool starts_with_and_remove(std::string& lhs, const std::string& rhs)
{
    if (lhs.compare(0, rhs.size(), rhs) == 0) // rhs.size() > lhs.size() IS safe
    {
        lhs.erase(0, rhs.size());
        return true;
    }
    return false;
}
The simplest solution would be to use regular expressions:
std::string line;
int currentId = 0;
while ( std::getline( source, line ) ) {
    trimCommentsAndWhiteSpace( line );
    static std::regex const header( "HeaderLine_Begin,(\\d+)" );
    std::smatch match;
    if ( line.empty() ) {
        // ignore
    } else if ( std::regex_match( line, match, header ) ) {
        std::istringstream s( match[ 1 ] );
        s >> currentId;
    } else {
        // ...
    }
}
I regularly use this strategy to parse .ini files, which pose
the same problem: section headers have a different syntax to
other things.
trimCommentsAndWhiteSpace can be as simple as:
void
trimCommentsAndWhiteSpace( std::string& line )
{
    if ( !line.empty() && line[0] == '#' ) {
        line = "";
    }
}
It's fairly easy to expand it to handle end-of-line comments as well, however, and it's usually a good policy (in contexts like this) to trim leading and trailing whitespace, trailing especially, since a human reader won't see it when looking at the file.
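For illustration, a hedged sketch of such an expanded version might look like the following; it strips everything from the first '#' and then trims leading and trailing whitespace (and, as noted below, it does nothing to protect '#' inside quoted fields):
void
trimCommentsAndWhiteSpace( std::string& line )
{
    //  Drop everything from the first '#' on (end-of-line comments).
    std::string::size_type pos = line.find( '#' );
    if ( pos != std::string::npos ) {
        line.erase( pos );
    }
    //  Trim leading and trailing whitespace.
    static char const whitespace[] = " \t\r\n";
    std::string::size_type first = line.find_first_not_of( whitespace );
    if ( first == std::string::npos ) {
        line.clear();
    } else {
        std::string::size_type last = line.find_last_not_of( whitespace );
        line = line.substr( first, last - first + 1 );
    }
}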
Alternatively, of course, you could use a regular expression for the lines you want to treat as comments ("\s*#.*"); this works well with your current definition, but doesn't really extend well to end-of-line comments, especially if you want to allow # in quoted strings in your fields.
And one final comment: your loop is incorrect. You don't test
that getline succeeded before using its results, and
file.good() may return true even if there is nothing more to
read. (file.good() is one of those things that are there for
historical reasons; there's no case where it makes sense to use
it.)

How do I check that stream extraction has consumed all input?

In the following function, I try to see if a string s is convertible to type T by seeing if I can read a type T, and if the input is completely consumed afterwards.
template <class T>
bool can_be_converted_to(const std::string& s, T& t)
{
    std::istringstream i(s);
    i >> std::boolalpha;
    i >> t;
    if (i and i.eof())
        return true;
    else
        return false;
}
However, can_be_converted_to<bool>("true") evaluates to false, because i.eof() is false at the end of the function.
This is correct, even though the function has read the entire string, because it hasn't attempted to read past the end of the string. (So, apparently this function works for int and double because istringstream reads past the end when reading these.)
So, assuming that I should indeed be checking (i and <input completely consumed>):
Q: How do I check that the input was completely consumed w/o using eof()?
Use peek() or get() to check what's next in the stream:
return (i >> std::boolalpha >> t && i.peek() == EOF);
Your version doesn't work for integers, either. Consider this input: 123 45. It'll read 123 and report true, even though there are still some characters left in the stream.
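Folded back into the question's function, that check might look like this (a sketch that simply combines the original template with the peek() test above):
#include <ios>      // std::boolalpha
#include <sstream>
#include <string>

template <class T>
bool can_be_converted_to(const std::string& s, T& t)
{
    std::istringstream i(s);
    // true only if extraction succeeded and nothing at all is left in the stream
    // (note that trailing whitespace counts as leftover input here)
    return (i >> std::boolalpha >> t) && i.peek() == std::char_traits<char>::eof();
}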
In many implementations of the standard library the eof will only be set after you tried reading beyond the end. You can verify that in your code by doing:
char _;
if (i && !(i >> _)) { // i is in a valid state, but
                      // reading a single extra char fails,
    return true;      // so the whole string was consumed
}
return false;
Extending on jrok's answer, you can use i.get() just as easily as
i.peek(), at least in this case. (I don't know if there is any reason
to prefer one to the other.)
Also, following the convention that white space is never anything but a
separator, you might want to extract it before checking for the end.
Something like:
return i >> std::ws && i.get() == std::istream::traits_type::eof();
Some older implementations of std::ws were buggy, and would put the
stream in an error state. In that case, you'd have to invert the test,
and do something like:
return !(i >> std::ws) || i.get() == std::istream::traits_type::eof();
Or just read the std::ws before the condition, and rely solely on
the i.get().
(I don't know if buggy std::ws is still a problem. I developed a
version of it that worked back when it was, and I've just continued to
use it.)
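A minimal sketch of that last variant, in the context of the question's function (skip the whitespace first, then rely on get()):
// ...after i >> std::boolalpha >> t;
i >> std::ws;  // discard any trailing whitespace
return i && i.get() == std::istream::traits_type::eof();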
I would like to offer a completely different approach:
Take your input string, tokenise it yourself, and then convert the individual fields using boost::lexical_cast<T>.
Reason: I wasted an afternoon on parsing a string containing 2 int and 2 double fields, separated by spaces. Doing the following:
int i, j;
double x, y;
std::istringstream ins{str};
ins >> i >> j >> x >> y;
// how to check errors???...
parses the correct input such as
`"5 3 9.9e+01 5.5e+02"`
correctly, but does not detect the problem with this:
`"5 9.6e+01 5.5e+02"`
What happens is that i will be set to 5 (OK), j will be set to 9 (??), x to 6.0 (=0.6e+01), y to 550 (OK). I was quite surprised to see failbit not being set... (platform info: OS X 10.9, Apple Clang++ 6.0, C++11 mode).
Of course you can say now, "But wait, the Standard states that it should be so", and you may be right, but knowing that it is a feature rather than a bug does not reduce the pain if you want to do proper error checking without writing miles of code.
OTOH, if you use "Marius"'s excellent tokeniser function and split str first on whitespace then suddenly everything becomes very easy. Here is a slightly modified version of the tokeniser. I re-wrote it to return a vector of strings; the original is a template that puts the tokens in a container with elements convertible to strings. (For those who need such a generic approach please consult the original link above.)
// \param str: the input string to be tokenized
// \param delimiters: string of delimiter characters
// \param trimEmpty: if true then empty tokens will be trimmed
// \return a vector of strings containing the tokens
std::vector<std::string> tokenizer(
    const std::string& str,
    const std::string& delimiters = " ",
    const bool trimEmpty = false
) {
    std::vector<std::string> tokens;
    std::string::size_type pos, lastPos = 0;
    const char* strdata = str.data();
    while(true) {
        pos = str.find_first_of(delimiters, lastPos);
        if(pos == std::string::npos) {
            // no more delimiters
            pos = str.length();
            if(pos != lastPos || !trimEmpty) {
                tokens.emplace_back(strdata + lastPos, pos - lastPos);
            }
            break;
        } else {
            if(pos != lastPos || !trimEmpty) {
                tokens.emplace_back(strdata + lastPos, pos - lastPos);
            }
        }
        lastPos = pos + 1;
    }
    return tokens;
}
and then just use it like this (ParseError is some exception object):
std::vector<std::string> tokens = tokenizer(str, " \t", true);
if (tokens.size() < 4)
    throw ParseError{"Too few fields in " + str};
try {
    unsigned int i{ boost::lexical_cast<unsigned int>(tokens[0]) },
                 j{ boost::lexical_cast<unsigned int>(tokens[1]) };
    double x{ boost::lexical_cast<double>(tokens[2]) },
           y{ boost::lexical_cast<double>(tokens[3]) };
    // print or process i, j, x, y ...
} catch(const boost::bad_lexical_cast& error) {
    throw ParseError{"Could not parse " + str};
}
Note: you can use the Boost split or the tokenizer if you wish, but they were slower than Marius' tokeniser (at least in my environment).
Update: Instead of boost::lexical_cast<T> you can use the C++11 "std::sto*" functions (e.g. stoi to convert a string token to an int). These throw two kinds of exceptions: std::invalid_argument if the conversion could not be performed and std::out_of_range if the converted value cannot be represented.
You could either catch these separately or catch their common base, std::logic_error (or std::exception). Modifications to the example code above are left as an exercise for the reader :-)
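As a hedged illustration of that pattern (not the full modification), converting one token with std::stoi might look like this; the helper name parse_int_field is hypothetical, and ParseError is the same exception type assumed by the code above:
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert one token to int with std::stoi and translate
// its exceptions into the ParseError used above.
int parse_int_field(const std::string& token)
{
    std::size_t consumed = 0;
    int value = 0;
    try {
        value = std::stoi(token, &consumed);
    } catch (const std::logic_error&) { // std::invalid_argument or std::out_of_range
        throw ParseError{"Could not parse integer field: " + token};
    }
    if (consumed != token.size())       // reject trailing junk, e.g. "9.6e+01"
        throw ParseError{"Trailing characters in integer field: " + token};
    return value;
}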

Skip reading a line in an INI file if its length is greater than n in C++

I want to skip reading a line in the INI file if it has more than 1000 characters. This is the code I'm using:
#define MAX_LINE 1000
char buf[MAX_LINE];
CString strTemp;
str.Empty();
for(;;)
{
    is.getline(buf,MAX_LINE);
    strTemp=buf;
    if(strTemp.IsEmpty()) break;
    str+=strTemp;
    if(str.Find("^")>-1)
    {
        str=str.Left( str.Find("^") );
        do
        {
            is.get(buf,2);
        } while(is.gcount()>0);
        is.getline(buf,2);
    }
    else if(strTemp.GetLength()!=MAX_LINE-1) break;
}
//is.getline(buf,MAX_LINE);
return is;
...
The problem I'm facing is that if the characters exceed 1000, it seems to fall into an infinite loop (unable to read the next line). How can I make getline skip that line and read the next one?
const std::size_t max_line = 1000; // not a macro, macros are disgusting
std::string line;
while (std::getline(is, line))
{
    if (line.length() > max_line)
        continue;
    // else process the line ...
}
How about checking the return value of getline and breaking if that fails?
...or, if is is an istream, you could check for an eof() condition to break out.
#define MAX_LINE 1000
char buf[MAX_LINE];
CString strTemp;
str.Empty();
while(is.eof() == false)
{
    is.getline(buf,MAX_LINE);
    strTemp=buf;
    if(strTemp.IsEmpty()) break;
    str+=strTemp;
    if(str.Find("^")>-1)
    {
        str=str.Left( str.Find("^") );
        do
        {
            is.get(buf,2);
        } while((is.gcount()>0) && (is.eof() == false));
        is.getline(buf,2);
    }
    else if(strTemp.GetLength()!=MAX_LINE-1)
    {
        break;
    }
}
return is;
For something completely different:
std::string strTemp;
str.Empty();
while(std::getline(is, strTemp)) {
    if(strTemp.empty()) break;
    str += strTemp.c_str(); // don't need .c_str() if str is also a std::string
    int pos = str.Find("^"); // extracted this for speed
    if(pos > -1){
        str = str.Left(pos);
        // Did not translate this part since it was buggy
    } else {
        // not sure of the intent here either
        // it would stop reading if the line was less than 1000 characters.
    }
}
return is;
This uses strings for ease of use, and no maximum limits on lines. It also uses the std::getline for the dynamic/magic everything, but I did not translate the bit in the middle since it seemed very buggy to me, and I couldn't interpret the intent.
The part in the middle simply reads two characters at a time until it reaches the end of the file, and then everything after that would have done bizarre stuff since you weren't checking return values. Since it was completely wrong, I didn't interpret it.

Cleaning a string of punctuation in C++

OK, so before I even ask my question I want to make one thing clear: I am currently a Computer Science student at NIU, and this does relate to one of my assignments for a class there. So if anyone has a problem with that, read no further and just go on about your business.
Now, for anyone who is willing to help, here's the situation. For my current assignment we have to read a file that is just a block of text. For each word in the file we are to strip any punctuation from the word (e.g. "can't" would end up as "can" and "that--to" would end up as "that", obviously without the quotes; the quotes are just there to mark the examples).
The problem I've run into is that I can clean the string fine and then insert it into the map that we are using, but for some reason the code I have written allows an empty string to be inserted into the map. I've tried everything I can come up with to stop this from happening, and the only thing I've found is to use the erase method of the map itself.
So what I am looking for is two things: a) any suggestions about how I could fix this without simply erasing the entry, and b) any improvements that I could make to the code I have already written.
Here are the functions I have written to read in from the file and then the one that cleans it.
Note: the function that reads in from the file calls the clean_entry function to get rid of punctuation before anything is inserted into the map.
Edit: Thank you Chris. Numbers are allowed :). If anyone has any improvements to the code I've written or any criticisms of something I did, I'll listen. At school we really don't get feedback on the correct, proper, or most efficient way to do things.
int get_words(map<string, int>& mapz)
{
    int cnt = 0; //set out counter to zero
    map<string, int>::const_iterator mapzIter;
    ifstream input; //declare instream
    input.open( "prog2.d" ); //open instream
    assert( input ); //assure it is open

    string s; //temp strings to read into
    string not_s;

    input >> s;
    while(!input.eof()) //read in until EOF
    {
        not_s = "";
        clean_entry(s, not_s);
        if((int)not_s.length() == 0)
        {
            input >> s;
            clean_entry(s, not_s);
        }
        mapz[not_s]++; //increment occurence
        input >> s;
    }
    input.close(); //close instream

    for(mapzIter = mapz.begin(); mapzIter != mapz.end(); mapzIter++)
        cnt = cnt + mapzIter->second;
    return cnt; //return number of words in instream
}

void clean_entry(const string& non_clean, string& clean)
{
    int i, j, begin, end;
    for(i = 0; isalnum(non_clean[i]) == 0 && non_clean[i] != '\0'; i++);
    begin = i;
    if(begin ==(int)non_clean.length())
        return;
    for(j = begin; isalnum(non_clean[j]) != 0 && non_clean[j] != '\0'; j++);
    end = j;
    clean = non_clean.substr(begin, (end-begin));
    for(i = 0; i < (int)clean.size(); i++)
        clean[i] = tolower(clean[i]);
}
The problem with empty entries is in your while loop. If you get an empty string, you clean the next one, and add it without checking. Try changing:
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
    input >> s;
    clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >> s;
to
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() > 0)
{
    mapz[not_s]++; //increment occurence
}
input >> s;
EDIT: I notice you are checking if the characters are alphanumeric. If numbers are not allowed, you may need to revisit that area as well.
Further improvements would be to
declare variables only when you use them, and in the innermost scope
use c++-style casts instead of the c-style (int) casts
use empty() instead of length() == 0 comparisons
use the prefix increment operator for the iterators (i.e. ++mapzIter); a short sketch applying these suggestions follows the list
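For illustration only, here is a hedged sketch of a couple of those suggestions applied to the question's code (same behaviour, just the style changes; it follows the question's unqualified names):
// the empty-string test written with empty() instead of a cast:
if (!not_s.empty())
    mapz[not_s]++; // increment occurrence

// the counting loop with the iterator declared where it is used and the prefix increment:
int cnt = 0;
for (map<string, int>::const_iterator mapzIter = mapz.begin();
     mapzIter != mapz.end(); ++mapzIter)
    cnt += mapzIter->second;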
A blank string is a valid instance of the string class, so there's nothing special about adding it into the map. What you could do is first check if it's empty, and only increment in that case:
if (!not_s.empty())
mapz[not_s]++;
Style-wise, there are a few things I'd change; one would be to return clean from clean_entry instead of modifying it:
string not_s = clean_entry(s);
...
string clean_entry(const string &non_clean)
{
    string clean;
    ... // as before
    if(begin ==(int)non_clean.length())
        return clean;
    ... // as before
    return clean;
}
This makes it clearer what the function is doing (taking a string, and returning something based on that string).
The function 'getWords' is doing a lot of distinct actions that could be split out into other functions. There's a good chance that by splitting it up into its individual parts, you would have found the bug yourself.
From the basic structure, I think you could split the code into (at least):
getNextWord: Return the next (non blank) word from the stream (returns false if none left)
clean_entry: What you have now
getNextCleanWord: Calls getNextWord, and if 'true', calls clean_entry. Returns 'false' if no words are left.
The signatures of 'getNextWord' and 'getNextCleanWord' might look something like:
bool getNextWord (std::ifstream & input, std::string & str);
bool getNextCleanWord (std::ifstream & input, std::string & str);
The idea is that each function does a smaller more distinct part of the problem. For example, 'getNextWord' does nothing but get the next non blank word (if there is one). This smaller piece therefore becomes an easier part of the problem to solve and debug if necessary.
The main component of 'getWords' then can be simplified down to:
std::string nextCleanWord;
while (getNextCleanWord (input, nextCleanWord))
{
++map[nextCleanWord];
}
An important aspect to development, IMHO, is to try to Divide and Conquer the problem. Split it up into the individual tasks that need to take place. These sub-tasks will be easier to complete and should also be easier to maintain.
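To make the split concrete, here is a hedged sketch of what the two helpers might look like, using the signatures above (clean_entry is the question's function, assumed unchanged):
// Reads the next whitespace-separated word; returns false once the stream is exhausted.
bool getNextWord (std::ifstream & input, std::string & str)
{
    input >> str;
    return !input.fail();
}

// Keeps reading words until one survives cleaning; returns false if none are left.
bool getNextCleanWord (std::ifstream & input, std::string & str)
{
    std::string raw;
    while (getNextWord (input, raw))
    {
        std::string cleaned;
        clean_entry (raw, cleaned); // strip punctuation and lower-case, as in the question
        if (!cleaned.empty ())
        {
            str = cleaned;
            return true;
        }
    }
    return false;
}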