Find exact string match in C++

Here is the code I used to detect the string in a line from a txt file:
int main()
{
std::ifstream file( "C:\\log.txt" );
std::string line;
while(!file.eof())
{
while( std::getline( file, line ) )
{
int found = -1;
if((found = line.find("GetSA"))>-1)
std::cout<<"We found GetSA."<<std::endl;
else if ((found = line.find("GetVol"))>-1)
std::cout<<"We found GetVol."<<std::endl;
else if ((found = line.find("GetSphereSAandVol"))>-1)
std::cout<<"We found GetSphereSAandVol."<<std::endl;
else
std::cout<<"We found nothing!"<<std::endl;
}
}
std::cin.get();
}
And here is my log file:
GetSA (3.000000)
GetVol (3.000000)
GetSphereSAandVol (3.000000)
GetVol (3.000000)
GetSphereSAandVol (3.000000)
GetSA (3.00000)
The error is, the program will not go to find "GetSphereSAandVol", because it stops at "GetSA". Obviously, the program thinks "GetSphereSAandVol" contains "GetSA", so it will execute:
if(found = line.find("GetSA"))
std::cout<<"We found GetSA."<<std::endl;
which is not exactly what I want, because I am expecting the program to execute:
else if (found = line.find("GetSphereSAandVol"))
std::cout<<"We found GetSphereSAandVol."<<std::endl;
Is there any way to avoid this and get the behavior I want? Thanks a lot.

You misunderstand how find works. Read the documentation.
The conditionals should go like this:
if ((found = line.find("xyz")) != line.npos) { /* found "xyz" */ }
I would write your entire program like this:
int main(int argc, char * argv[])
{
if (argc != 2) { std::cout << "Bad invocation\n"; return 0; }
std::ifstream infile(argv[1]);
if (!infile) { std::cout << "Bad filename '" << argv[1] << "'\n"; return 0; }
for (std::string line; std::getline(infile, line); )
{
std::string::size_type pos; // find returns size_type; an int cannot represent npos
if ((pos = line.find("abc")) != line.npos)
{
std::cout << "Found line 'abc'\n";
continue;
}
if ((pos = line.find("xyz")) != line.npos)
{
std::cout << "Found line 'xyz'\n";
continue;
}
// ...
std::cout << "Line '" << line << "' did not match anything.\n";
}
}

Two errors, one you asked about and one you didn't.
Your if statements are wrong. You misunderstand how string::find works. This is the correct way
if ((found = line.find("GetSA")) != string::npos)
...
else if ((found = line.find("GetVol")) != string::npos)
...
etc.
If string::find does not find what it's looking for it returns a special value string::npos. This is what your if conditions should test for.
Second error, lose the while (!file.eof()) loop, it's completely unnecessary.
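A minimal sketch of the difference between the two tests. The helper names `contains` and `truthy_find` are invented here for illustration and do not appear in the question:

```cpp
#include <string>

// Hypothetical helper: true when needle occurs anywhere in haystack.
bool contains(const std::string& haystack, const std::string& needle)
{
    return haystack.find(needle) != std::string::npos;
}

// Hypothetical helper: what `if (line.find(...))` actually tests.
// The index is converted to bool, so 0 (match at the start) is false
// and npos (no match) is true.
bool truthy_find(const std::string& haystack, const std::string& needle)
{
    return haystack.find(needle) != 0;
}
```

With the log lines from the question, `contains("GetSA (3.000000)", "GetSA")` is true while `truthy_find` on the same arguments is false, because the match sits at index 0. Note also that `contains("GetSphereSAandVol (3.000000)", "GetSA")` is false: in that name "GetS" is followed by "p", so "GetSA" is not a substring of it.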

The string::find function returns string::npos if nothing is found; otherwise it returns the index of the match. You are assuming it returns a boolean and are testing accordingly. That will not work, because string::npos is non-zero and so converts to true. Conversely, a match at index zero returns 0, which converts to false.
You must instead do this:
if( std::string::npos != (found = line.find("GetSA")) )
// etc...
Personally, I don't like the style of setting a value and testing in this way, but that's up to you. I might do this instead with a simple helper function:
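bool FindSubString( const std::string& str, const char *substr, std::string::size_type& pos )
{
// pos is a size_type so it can be compared with npos safely;
// the variable passed in (found) must be declared with that type too
pos = str.find(substr);
return pos != std::string::npos;
}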
Then:
if( FindSubString( line, "GetSA", found ) )
// etc...
But in your case, you're not even using the found variable. So you can ignore what I've said about style and just do:
if( std::string::npos != line.find("GetSA") )
// etc...
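If the goal is an exact match on the function name rather than a substring match, one option is to compare the first whitespace-delimited token of each line with `==`. This is only a sketch: `first_token` and `is_call_to` are hypothetical names, and it assumes the name is separated from the argument list by whitespace, as in the log shown in the question.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: first whitespace-delimited word of a line,
// or an empty string if the line is blank.
std::string first_token(const std::string& line)
{
    std::istringstream in(line);
    std::string token;
    in >> token;  // leaves token empty when nothing can be read
    return token;
}

// Hypothetical helper: true only when the line starts with exactly this name.
bool is_call_to(const std::string& line, const std::string& name)
{
    return first_token(line) == name;
}
```

Because the comparison is on the whole token, the order of the `if`/`else if` tests no longer matters, and a longer name that happens to begin with a shorter one cannot be mistaken for it.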

Related

how to point to delimiter at fixed position in std::string

Say I have text with '#' as a delimiter.
For example:
std::string key = "012#txt1#txt2#txt3#txt4# #some other text:";
I have to insert modified text between the 5th '#' and the 6th '#' (the field shown above that contains only a space).
To accomplish this I need to find the 5th and 6th '#'.
I wrote a small piece of code, but it is not doing what I expect: it always returns the first '#' found. Can someone please advise me?
std::string temp = key;
size_t found = 0;
size_t pos_key = temp.find('#');
while( ( found !=5 )&& ( pos_key != std::string::npos ) )
{
found++;
temp.find_first_of('#', pos_key + 1 );
temp.erase(0, pos_key );
}
std::cout << " the pos key is " << pos_key << std::endl ;
There are a couple of problems here. First, you never update pos_key, so every call to erase removes the same prefix and mangles your string; I am not sure why you are erasing at all. If you need to find the nth symbol you can use a function like:
size_t find_nth(const std::string & line, const std::string & symbol, size_t nth)
{
size_t pos = 0;
size_t counter = 0;
while (counter < nth && (pos = line.find(symbol, pos)) != std::string::npos)
{
counter++; // found a match so increment
pos++; // increment so we search for the next one
}
return pos; // for nth >= 1: one past the nth match, or std::string::npos if there are fewer matches
}
And you can see it running in this Live Example
It seems you have two problems.
First you are not remembering the position of the '#' when you find it, you need to assign the return value of the std::string::find_first_of function to pos_key.
Second you keep deleting the contents of the string up to the position you find. That throws off all the position information you got from the std::string::find_first_of function.
I think this might be what you need:
int main()
{
std::string key = "012#txt1#txt2#txt3#txt4# #some other text:";
std::string temp = key;
size_t found = 0;
size_t pos_key = temp.find('#');
while((found != 5) && (pos_key != std::string::npos))
{
found++;
// this line does nothing with the found position
// temp.find_first_of('#', pos_key + 1);
// instead record the position of the latest '#'
pos_key = temp.find_first_of('#', pos_key + 1);
// this line just deletes most of the string
// for no apparent reason
// temp.erase(0, pos_key);
}
std::cout << " the pos key is " << pos_key << std::endl;
}
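Putting the pieces together, here is a hedged sketch of the original goal: replacing whatever sits between the nth and the next delimiter. Both `nth_index` and `replace_between` are hypothetical helpers written for this example; counting is 1-based, and the input is returned unchanged when the delimiters are not found.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: index of the nth (1-based) occurrence of ch,
// or std::string::npos if there are fewer than nth occurrences.
std::size_t nth_index(const std::string& s, char ch, std::size_t nth)
{
    std::size_t pos = std::string::npos;
    std::size_t start = 0;
    for (std::size_t i = 0; i < nth; ++i) {
        pos = s.find(ch, start);
        if (pos == std::string::npos)
            return pos;          // ran out of delimiters
        start = pos + 1;         // continue after this match
    }
    return pos;
}

// Hypothetical helper: replace the text between the nth delimiter and the
// one that follows it. Returns a copy; the input is not modified.
std::string replace_between(const std::string& s, char delim,
                            std::size_t nth, const std::string& text)
{
    const std::size_t first = nth_index(s, delim, nth);
    if (first == std::string::npos)
        return s;
    const std::size_t second = s.find(delim, first + 1);
    if (second == std::string::npos)
        return s;
    std::string out = s;
    out.replace(first + 1, second - first - 1, text);
    return out;
}
```

For the key in the question, the 5th '#' is at index 23 and the 6th at index 25, so `replace_between(key, '#', 5, "new")` swaps the single space between them for "new".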

c++ splitting string on non alphabetic characters

I am reading a file line by line and want to split each line on non-alphabetic characters, and if possible remove all non-alphabetic characters at the same time so I don't have to do it later.
I would like to use isalpha, but can't figure out how to use it with str.find() or similar functions, as those usually take a single delimiter as a string.
while(getline(fileToOpen,str))
{
unsigned int pos= 0;
string token;
//transform(str.begin(),str.end(),str.begin(),::tolower);
while (pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))
{
token = str.substr(0, pos);
//transform(str.begin(),str.end(),str.begin(),::tolower);
Node<t>* ptr=search(token,root);
if (ptr!=NULL)
{
ptr->count++;
cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token,root);
cout<<token<<" added to tree.\n";
}
ptr=NULL;
str.erase(0, pos);
}
}
My latest attempt which doesn't work... All of examples I could find were based on str.find("single delimiter")
Which is no good to me.
Found a way to use isalpha
template<typename t>
void Tree<t>::readFromFile(string filename)
{
string str;
ifstream fileToOpen(filename.c_str());
if (fileToOpen.is_open())
{
while(getline(fileToOpen,str))
{
unsigned int pos= 0;
string token;
//transform(str.begin(),str.end(),str.begin(),::tolower);
while (pos = find_if(str.begin(),str.end(),aZCheck)!=str.end()!=string::npos)
{
token = str.substr(0, pos);
transform(token.begin(),token.end(),token.begin(),::tolower);
Node<t>* ptr=search(token,root);
if (ptr!=NULL)
{
ptr->count++;
// cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token,root);
cout<<token<<" added to tree.\n";
}
ptr=NULL;
str.erase(0, pos);
}
}
fileToOpen.close();
}
else
cout<<"Unable to open file!\n";
}
template<typename t>
inline bool Tree<t>::aZCheck(char c)
{
return !isalpha(c);
}
But the issue still persists: the string is being split into single characters instead of words. Also, is whitespace considered valid by isalpha?
#include <algorithm>
#include <cctype>
...
template<typename t>
void Tree<t>::readFromFile(std::string filename)
{
std::string str;
std::ifstream fileToOpen(filename.c_str());
if (fileToOpen.is_open())
{
for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )
{
for (pos = std::find_if(str.begin(), str.end(), isalpha); pos != str.end();
pos = std::find_if(prev, str.end(), isalpha))
{
prev = std::find_if_not(pos, str.end(), isalpha);
std::string token(pos, prev);
std::transform(token.begin(), token.end(), token.begin(), ::tolower);
Node<t>* ptr = search(token, root);
if (ptr != NULL)
{
ptr->count++;
// cout<< token << " already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token, root);
cout << token << " added to tree.\n";
}
}
}
fileToOpen.close();
}
else
cout<<"Unable to open file!\n";
}
Online demo
Also, since you say you want to save time, it would help if your insert function did a little extra: insert the value with its counter set to 1 if it is not in the tree, and simply increment the counter if it is. This saves you from traversing the tree twice (once in search and once in insert), which matters because your tree may be unbalanced.
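The combined insert-or-increment idea could look roughly like this. The question does not show its `Node` or `Tree` definitions, so `WordNode`, `insert_or_increment`, and `destroy` are stand-ins invented for this sketch of a plain unbalanced binary search tree:

```cpp
#include <string>

// Hypothetical node type; the question's own Node<t> is not shown.
struct WordNode {
    std::string word;
    int count;
    WordNode* left;
    WordNode* right;
    explicit WordNode(const std::string& w)
        : word(w), count(1), left(nullptr), right(nullptr) {}
};

// Walks the tree once. If the word is present its counter is incremented;
// otherwise a new node with count 1 is linked in. Returns the new count.
int insert_or_increment(WordNode*& root, const std::string& w)
{
    WordNode** link = &root;
    while (*link != nullptr) {
        if (w == (*link)->word)
            return ++(*link)->count;
        link = (w < (*link)->word) ? &(*link)->left : &(*link)->right;
    }
    *link = new WordNode(w);
    return 1;
}

// Frees every node in the tree.
void destroy(WordNode* node)
{
    if (node == nullptr)
        return;
    destroy(node->left);
    destroy(node->right);
    delete node;
}
```

The return value tells the caller whether the word was new (1) or already present (greater than 1), so the separate `search` call is no longer needed.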
Try this test case. There are two problems.
1 - pos is 0 when a delimiter is found at the start of the string (either initially or after truncation).
This causes it to break out of the while loop. Compare against npos in the condition instead.
2 - You have to advance the position past the delimiter when you erase, otherwise
it finds the same one over and over.
string::size_type pos = 0; // same type as string::npos, so the comparison below is well defined
string token;
string str = "Thisis(asdfasdfasdf)and!this)))";
while ((pos=str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))!= string::npos )
{
if ( pos != 0 )
{
// Found a token
token = str.substr(0, pos);
cout << "Found: " << token << endl;
}
else
{
// Found another delimiter
// Just move on to next one
}
str.erase(0, pos+1); // Always remove pos+1 to get rid of delimiter
}
// Cover the last (or only) token
if ( str.length() > 0 )
{
token = str;
cout << "Found: " << token << endl;
}
Outputs >>
Found: Thisis
Found: asdfasdfasdf
Found: and
Found: this
Press any key to continue . . .
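An alternative that avoids erasing from the string at all is to walk the line once and collect runs of alphabetic characters. This is a sketch under the assumption that plain `char` text in the default "C" locale is enough; `split_alpha` is a hypothetical name. It also answers the side question: `isalpha` returns false for whitespace, so spaces act as separators like any other non-letter.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns the lowercase alphabetic words of a line.
// Every non-alphabetic character (digits, punctuation, whitespace) is a
// separator and is dropped.
std::vector<std::string> split_alpha(const std::string& line)
{
    std::vector<std::string> words;
    std::string current;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        // The <cctype> functions need a value representable as unsigned char.
        const unsigned char ch = static_cast<unsigned char>(line[i]);
        if (std::isalpha(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);  // last word, if the line ends with a letter
    return words;
}
```

Each returned word can then be passed to the tree's insert function in turn.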

How to eliminate this extra element?

I am writing a C++ function for reading the nth column of a tab-delimited text file. Here is what I have done:
typedef unsigned int uint;
inline void fileExists (const std::string& name) {
if ( access( name.c_str(), F_OK ) == -1 ) {
throw std::string("File does not exist!");
}
}
size_t bimNCols(std::string fn) {
try {
fileExists(fn);
std::ifstream in_file(fn);
std::string tmpline;
std::getline(in_file, tmpline);
std::vector<std::string> strs;
strs = boost::split(strs, tmpline, boost::is_any_of("\t"), boost::token_compress_on);
return strs.size();
} catch (const std::string& e) {
std::cerr << "\n" << e << "\n";
exit(EXIT_FAILURE);
}
}
typedef std::vector<std::string> vecStr;
vecStr bimReadCol(std::string fn, uint ncol_select) {
try {
size_t ncols = bimNCols(fn);
if(ncol_select < 1 or ncol_select > ncols) {
throw std::string("Your column selection is out of range!");
}
std::ifstream in_file(fn);
std::string tmpword;
vecStr colsel; // holds the column of strings
while (in_file) {
for(int i=1; i<ncol_select; i++) {
in_file >> tmpword;
}
in_file >> tmpword;
colsel.push_back(tmpword);
in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
return colsel;
} catch (const std::string& e) {
std::cerr << "\n" << e << "\n";
exit(EXIT_FAILURE);
}
}
The problem is, in the bimReadCol function, at the last line, after
in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
in_file.good() still evaluates to true. So, suppose I have a text file test.txt like this:
a 1 b 2
a 1 b 2
a 1 b 2
bimReadCol("test.txt", 3) would return a vector (b, b, b, b), with an extra element.
Any idea how to fix this?
The usual solution for line oriented input is to read line by
line, then parse each line:
std::string line;
while ( std::getline( in_file, line ) ) {
std::istringstream parser( line );
for ( uint i = 1; i <= ncol_select && parser >> tmpword; ++ i ) { // test i first so no extra word is read
}
if ( parser ) {
colsel.push_back( tmpword );
}
// No need for any ignore.
}
The important thing is that you must absolutely test after the
input (be it from in_file or parser) before you use the
value. A test before the value was read doesn't mean anything
(as you've seen).
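As a self-contained sketch of the line-by-line approach, the function below reads from any `std::istream` so it can be exercised without a file. `read_column` is a hypothetical name, columns are 1-based as in the question, and lines with fewer columns than requested are skipped (an assumption; the question does not say what should happen to them).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the ncol_select-th whitespace-separated
// word of every line that has at least that many words.
std::vector<std::string> read_column(std::istream& in, unsigned ncol_select)
{
    std::vector<std::string> column;
    std::string line;
    while (std::getline(in, line)) {       // test the read itself
        std::istringstream parser(line);
        std::string word;
        unsigned i = 0;
        while (i < ncol_select && parser >> word)
            ++i;                            // counts successful reads only
        if (ncol_select > 0 && i == ncol_select)
            column.push_back(word);
    }
    return column;
}
```

Because every value is used only after the read that produced it has succeeded, it makes no difference whether the last line ends with a newline.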
Ok, I got it. The last line of the text file does not contain a newline, that's why in_file evaluates
to true at the last line.
I think I should calculate the number of lines of the file, then replace while(in_file) with a
for loop.
If someone has a better idea, please post it and I will accept.
Update
The fix turns out to be rather simple, just check if tmpword is empty:
vecStr bimReadCol(std::string fn, uint ncol_select) {
try {
size_t ncols = bimNCols(fn);
if(ncol_select < 1 or ncol_select > ncols) {
throw std::string("Your column selection is out of range!");
}
std::ifstream in_file(fn);
vecStr colsel; // holds the column of strings
std::string tmpword;
while (in_file) {
tmpword = "";
for(int i=1; i<=ncol_select; i++) {
in_file >> tmpword;
}
if(tmpword != "") {
colsel.push_back(tmpword);
}
in_file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
return colsel;
} catch (const std::string& e) {
std::cerr << "\n" << e << "\n";
exit(EXIT_FAILURE);
}
}
As @James Kanze has pointed out, even if the last line contains a newline, in_file
would still evaluate to true, but since we are at the end of file, the next reading into
tmpword will be empty, so we will be fine as long as we check that.

How to get the last but not empty line in a txt file

I want to get the last but not empty line in a txt file.
This is my code:
string line1, line2;
ifstream myfile(argv[1]);
if(myfile.is_open())
{
while( !myfile.eof() )
{
getline(myfile, line1);
if( line1 != "" || line1 != "\t" || line1 != "\n" || !line1.empty() )
line2 = line1;
}
myfile.close();
}
else
cout << "Unable to open file";
The problem is that I cannot detect the empty lines.
Okay, let's start with the obvious part. This: while( !myfile.eof() ) is essentially always wrong, so you're not going to detect the end of the file correctly. Since you're using getline to read the data, you want to check its return value:
while (getline(myfile, line1)) // ...
Likewise, the logic here:
if( line1 != "" || line1 != "\t" || line1 != "\n" || !line1.empty() )
line2 = line1;
...is clearly wrong. I'm guessing you really want && instead of || for this. As it stands, the result is always true, because no matter what value line1 contains, it must be unequal to at least one of those values (i.e., it can't simultaneously contain only a tab and contain only a new-line and contain nothing at all -- but that would be necessary for the result to be false). Testing for both !line1.empty() and line1 != "" appears redundant as well.
Why not read the file backwards? That way you don't have to scan the entire file to accomplish this. Seems like it ought to be possible.
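int main(int argc, char **argv)
{
if (argc < 2) { std::cerr << "Usage: " << argv[0] << " <file>\n"; return 1; }
std::string fn = argv[1]; // fn was undefined in the original snippet
std::cout<<"Opening "<<fn<<std::endl;
std::fstream fin(fn.c_str(), std::ios_base::in);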
//go to end
fin.seekg(0, std::ios_base::end);
int currpos = fin.tellg();
//go to 1 before end of file
if(currpos > 0)
{
//collect the chars here...
std::vector<char> chars;
fin.seekg(currpos - 1);
currpos = fin.tellg();
while(currpos > 0)
{
char c = fin.get();
if(!fin.good())
{
break;
}
chars.push_back(c);
currpos -= 1;
fin.seekg(currpos);
}
//do whatever u want with chars...
//this is the reversed order
for(std::vector<char>::size_type i = 0; i < chars.size(); ++i)
{
std::cout<<chars[i];
}
//this is the forward order...
for(std::vector<char>::size_type i = chars.size(); i != 0; --i)
{
std::cout<<chars[i-1];
}
}
return 0;
}
It wouldn't be enough to change your ||'s to &&'s to check if the line is empty. What if there are seven spaces, a tab character, another 3 spaces and finally a newline? You can't list all the ways of getting only whitespace in a line. Instead, check every character in the line to see if it is whitespace.
In this code, is_empty will be false if any non-space character is found in the line.
bool is_empty = true;
for (string::size_type i = 0; i < line.size(); i++) {
unsigned char ch = line[i]; // isspace needs a value representable as unsigned char
is_empty = is_empty && isspace(ch);
}
Full solution:
#include <iostream>
#include <fstream>
#include <cctype>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
string line;
string last_line;
ifstream myfile(argv[1]);
if(myfile.is_open())
{
while( getline(myfile, line) ) {
bool is_empty = true;
for (string::size_type i = 0; i < line.size(); i++) {
unsigned char ch = line[i]; // isspace needs a value representable as unsigned char
is_empty = is_empty && isspace(ch);
}
if (!is_empty) {
last_line = line;
}
}
myfile.close();
cout << "Last line: " << last_line << endl;
}
else {
cout << "Unable to open file";
}
return 0;
}
In addition to what the others said:
You can skip leading whitespace by doing myfile >> std::ws before you call std::getline(). This consumes all leading whitespace, including blank lines.
Then your condition reduces to !line1.empty(). This also works when a line contains nothing but whitespace, for which your version fails.
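A short sketch of the `std::ws` suggestion, written against `std::istream` so it works with files and string streams alike. `last_non_blank_line` is a hypothetical name. One consequence to be aware of: because `std::ws` eats leading whitespace, the returned line has its leading indentation stripped (trailing whitespace is kept).

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the last line that contains something other
// than whitespace, or an empty string if there is no such line.
std::string last_non_blank_line(std::istream& in)
{
    std::string line;
    std::string last;
    // std::ws skips spaces, tabs and newlines, so blank lines never reach
    // getline; getline then fails once only end-of-file is left.
    while (in >> std::ws && std::getline(in, line)) {
        if (!line.empty())
            last = line;
    }
    return last;
}
```

It can be tried without a file by wrapping a string in a `std::istringstream` and passing that in.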
I wasn't able to find a get_last_line function that suited my needs, so here's what I came up with. You can even read several trailing non-empty lines by calling the istream overload of get_last_line repeatedly without resetting the stream position. It also handles a file containing a single character. I added the reset parameter, which can be set to ios_base::end to allow output operations after reading the last line(s).
std::string& get_last_line(
std::istream& in_stream,
std::string& output, // no default: standard C++ cannot bind a temporary to a non-const reference
std::ios_base::seekdir reset = std::ios_base::cur)
{
output.clear();
std::streambuf& buf = *in_stream.rdbuf();
bool text_found = false;
while(buf.pubseekoff(-1, std::ios_base::cur) >= 0)
{
char c = buf.sgetc();
if(!isspace(c))
text_found = true;
if(text_found)
{
if(c == '\n' || c == -1)
break;
output.insert(0, sizeof c, c);
}
}
buf.pubseekoff(0, reset);
return output;
}
std::string& get_last_line(
const std::string& file_name,
std::string& output) // no default: standard C++ cannot bind a temporary to a non-const reference
{
std::ifstream file_in(
file_name.c_str(),
std::ios_base::in | std::ios_base::ate);
if(!file_in.is_open())
{
output.clear();
return output;
}
get_last_line(file_in, output);
file_in.close();
return output;
}

How to check if a position inside a std::string exists (C++)

I have a long string variable and I want to search it for specific words and trim the text around those words.
Say I have the following text:
"This amazing new wearable audio solution features a working speaker embedded into the front of the shirt and can play music or sound effects appropriate for any situation. It's just like starring in your own movie"
and the words : "solution" , "movie".
I want to extract from the big string (like Google does on its results page):
"...new wearable audio solution features a working speaker embedded..."
and
"...just like starring in your own movie"
for that i'm using the code :
for (std::vector<string>::iterator it = words.begin(); it != words.end(); ++it)
{
int loc1 = (int)desc.find( *it, 0 );
if( loc1 != string::npos )
{
while(desc.at(loc1-i) && i<=80)
{
i++;
from=loc1-i;
if(i==80) fromdots=true;
}
i=0;
while(desc.at(loc1+(int)(*it).size()+i) && i<=80)
{
i++;
to=loc1+(int)(*it).size()+i;
if(i==80) todots=true;
}
for(int i=from;i<=to;i++)
{
if(fromdots) mini+="...";
mini+=desc.at(i);
if(todots) mini+="...";
}
}
but desc.at(loc1-i) throws an out_of_range exception. I don't know how to check whether that position exists without causing an exception.
Help please!
This is an excellent exercise in taking advantage of what the STL has to offer. You simply open a reference and cherry-pick algorithms and classes for your solution!
#include <algorithm>
#include <cctype>
#include <functional>
#include <iostream>
#include <list>
#include <string>
#include <boost/assign.hpp>
using namespace std;
struct remove_from {
remove_from(string& text) : text(text) { }
void operator()(const string& str) {
typedef string::iterator striter;
striter e(search(text.begin(), text.end(), str.begin(), str.end()));
while( e != text.end() ) {
striter b = e;
advance(e, str.length());
e = find_if(e, text.end(), not1(ptr_fun<int,int>(isspace)));
text.erase(b, e);
e = search(text.begin(), text.end(), str.begin(), str.end());
}
}
private:
string& text;
};
int main(int argc, char* argv[])
{
list<string> toremove = boost::assign::list_of("solution")("movie");
string text("This amazing new wearable ...");
for_each(toremove.begin(), toremove.end(), remove_from(text));
cout << text << endl;
return 0;
}
You can just check desc.size(): at(i) throws when i is not less than desc.size(), so test the index against the size before calling at.
The problem is that you start iterating at the first word, then try and check the word before it, hence the OutOfRange Exception.
Your first if could be:
if( loc1 != string::npos && loc1 != 0)
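Rather than probing each position with `at`, the bounds can be computed up front and clamped to the string, after which a single `substr` call is safe. The following is a sketch only: `make_snippet` is a hypothetical name, `context` stands in for the 80-character window used in the question, and the function cuts at a character offset without trying to respect word boundaries.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns up to `context` characters on each side of the
// first occurrence of word, with "..." marking where text was cut off.
// Returns an empty string when word does not occur in desc.
std::string make_snippet(const std::string& desc, const std::string& word,
                         std::string::size_type context)
{
    const std::string::size_type pos = desc.find(word);
    if (pos == std::string::npos)
        return std::string();

    // Clamp both ends so neither index can leave the string.
    const std::string::size_type from = (pos > context) ? pos - context : 0;
    const std::string::size_type to =
        std::min(desc.size(), pos + word.size() + context);

    std::string snippet;
    if (from > 0)
        snippet += "...";
    snippet += desc.substr(from, to - from);
    if (to < desc.size())
        snippet += "...";
    return snippet;
}
```

For example, `make_snippet("abcdefghij", "ef", 2)` yields "...cdefgh...", while a match at the very start or end of the text gets dots on one side only.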