How to make string::find("is") omit "this" - c++

I have a list containing four nodes ("this"; "test example"; "is something of"; "a small"), and I want to find every string that contains the word "is" (only one positive with this list). This topic has been posted many times, and those posts helped me get this far. However, I can't see anywhere how to omit "this" from the positive results. I could probably use string::c_str and search the characters myself after reducing my much larger list. Or is there a way I could use string::find_first_of? It seems there should be a better way. Thanks.
EDIT: I know that I can omit a particular string, but I'm looking for the bigger picture because my list is quite large (e.g. a poem).
for(it = phrases.begin(); it != phrases.end(); ++it)
{
found = it->find(look);
if(found != string::npos)
cout << i++ << ". " << *it << endl;
else
{
i++;
insert++;
}
}

Just to clarify: what are you struggling with?
What you want to do is check whether what you have found begins at the start of a word (or of the phrase) and also ends at the end of a word (or of the phrase),
i.e. check that:
found is 0 (the start of the string) OR the character preceding found is a space
AND the position look.length() characters after found is the end of the string OR holds a space
EDIT: You can access the characters around the match through found. In the code below, X stands for the length of the string you are searching for (look.length()):
found = it->find(look);
if(found!=string::npos)
{
if((found==0 || it->at(found-1)==' ')
&& (found==it->length()-X || it->at(found+X)==' '))
{
// Actually found it
}
} else {
// Do whatever
}
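The check above only looks at the first place where the search string occurs, so a phrase such as "this is" would be rejected because the first hit is inside "this". A minimal sketch that keeps searching until it finds a hit with word boundaries on both sides could look like this. The name containsWord is made up for illustration, and only spaces are treated as word separators, as in the answer above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original answer): returns true if `word`
// occurs in `text` delimited by spaces or by the ends of the string.
bool containsWord(const std::string& text, const std::string& word)
{
    if (word.empty())
        return false;

    std::size_t pos = text.find(word);
    while (pos != std::string::npos)
    {
        const std::size_t after = pos + word.length();
        const bool startsWord = (pos == 0) || (text[pos - 1] == ' ');
        const bool endsWord = (after == text.length()) || (text[after] == ' ');
        if (startsWord && endsWord)
            return true;
        // Not a whole word here (e.g. the "is" inside "this"); keep looking.
        pos = text.find(word, pos + 1);
    }
    return false;
}
```

Inside the loop from the question, the condition would then become something like if(containsWord(*it, look)).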

We can use Boost.Regex to search with regular expressions. Below is an example. Regular expressions make it possible to build complex search patterns.
#include <boost/regex.hpp>
#include <string>
#include <iostream>
#include <boost/tokenizer.hpp>
using namespace boost;
using namespace std;
int main()
{
std::string list[4] = {"this","hi how r u ","is this fun is","no"};
regex ex("^is$"); // anchored at both ends so that only the whole token "is" matches, not e.g. "island"
for(int i =0;i<4;++i) // i indexes the phrases; x below counts the matches
{
string::const_iterator start, end;
boost::char_separator<char> sep(" ");
boost::tokenizer<boost::char_separator<char> > token(list[i],sep);
cout << "Search string: " << list[i] <<"\n"<< endl;
int x = 0;
for(boost::tokenizer<boost::char_separator<char> >::iterator itr = token.begin();
itr!=token.end();++itr)
{
start = (*itr).begin();
end = (*itr).end();
boost::match_results<std::string::const_iterator> what;
boost::match_flag_type flags = boost::match_default;
if(boost::regex_search(start, end, what, ex, flags))
{
++x;
cout << "Found--> " << what.str() << endl;
}
}
cout<<"found pattern "<<x <<" times."<<endl<<endl;
}
return 0;
}
Output:
Search string: this
found pattern 0 times.
Search string: hi how r u
found pattern 0 times.
Search string: is this fun is
Found--> is
Found--> is
found pattern 2 times.
Search string: no
found pattern 0 times.

I didn't realize you only wanted to match "is". You can do this by using a std::istringstream (from <sstream>) to tokenize each phrase for you:
std::string term("is");
for(std::list<std::string>::const_iterator it = phrases.begin();
it != phrases.end(); ++it)
{
std::istringstream ss(*it);
std::string token;
while(ss >> token)
{
if(token == term)
std::cout << "Found " << token << "\n";
}
}

Related

Using find_first_of with a string instead of a set of predefined characters in c++

I want to take in a code, for example ABC, and check whether the characters in the code appear in that exact order in a string. For example, with the code ABC, the string HAPPYBIRTHDAYCACEY meets the criteria. The string TRAGICBIRTHDAYCACEY with the code ABC, however, does not pass, because there's a "C" before the "B" after the "A". I want to use the find_first_of function to search through my string, but I want to check for any of the characters in "code" without knowing beforehand which characters are in "code". Here is my program so far:
#include <iostream>
#include <string>
using namespace std;
int main() {
string code, str, temp;
int k = 0;
int pos = 0;
cin >> code >> str;
while (k < code.size()) {
pos = str.find_first_of(code,pos);
temp[k] = str[pos];
++k;
++pos;
}
cout << temp << endl; // debug: this just outputs a newline when I run the program
if (temp == code) {
cout << "PASS" << endl;
}
else {
cout << "FAIL" << endl;
}
return 0;
}
I think your best bet is to find just the first character; once it is found, find the next one in the remainder of the string, and repeat until the end of the string is reached or all characters have been found (returning false or true, respectively).
I don't think there's anything built in for this. If the characters needed to appear directly after each other, you could use std::string::find(), which searches for a substring, but that is not what you want.
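As a rough sketch of that one-character-at-a-time search: the function below looks for each code character in the remainder of the string. It also makes one assumption that is read out of the question's TRAGICBIRTHDAYCACEY example and is not stated in the answer: while searching for code[k], running into a later code character first counts as a failure. The name matchesInOrder is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: scans `str` once, looking for the characters of `code`
// one at a time. While searching for code[k], meeting any later code
// character first counts as a failure (assumed reading of the question).
bool matchesInOrder(const std::string& code, const std::string& str)
{
    std::size_t pos = 0;
    for (std::size_t k = 0; k < code.size(); ++k)
    {
        // First position at or after `pos` that holds code[k] or a later code character.
        const std::size_t hit = str.find_first_of(code.substr(k), pos);
        if (hit == std::string::npos || str[hit] != code[k])
            return false;
        pos = hit + 1; // continue in the remainder of the string
    }
    return true;
}
```

If a plain in-order search is enough (later characters may appear early), str.find(code[k], pos) can replace the find_first_of call and the second half of the condition can be dropped.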

how to find a substring or string literal

I am trying to write code that will search userInput for the word "darn" and, if it is found, print out "Censored". If it is not found, it will just print out the userInput. It works in some cases, but not others. If the userInput is "That darn cat!", it will print out "Censored". However, if the userInput is "Dang, that was scary!", it also prints out "Censored". I am trying to use find() to search for the string literal "darn " (the space is there so that it can distinguish between the word "darn" and words like "darning"; I am not worrying about punctuation after "darn"). However, it seems as though find() is not doing what I would like. Is there another way I could search for a string literal? I tried using substr() but I couldn't figure out what the index and the len should be.
#include <iostream>
#include <string>
using namespace std;
int main() {
string userInput;
userInput = "That darn cat.";
if (userInput.find("darn ") > 0){
cout << "Censored" << endl;
}
else {
cout << userInput << endl;
} //userText.substr(0, 7)
return 0;
}
The problem here is your condition. std::string::find returns an object of type std::string::size_type, which is an unsigned integer type. That means it can never be less than 0, and when find doesn't find anything it returns std::string::npos, the largest value of that type. As a result,
if (userInput.find("darn ") > 0)
will always be true unless userInput starts with "darn " (the only case where find returns 0). What you need to do is compare against std::string::npos, like
if (userInput.find("darn ") != std::string::npos)
Do note that userInput.find("darn ") will not work in all cases. If userInput is just "darn" or "Darn" then it won't match. The space needs to be handled as a separate element. For example:
std::string::size_type position = userInput.find("darn");
if (position != std::string::npos) {
// now you can check which character is at userInput[position + 4]
}
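To make that neighbour check concrete, here is a small sketch. It assumes that "a separate word" means there is no letter directly before or after the match, so punctuation and the end of the string both count as boundaries. The name containsStandaloneWord is invented for this example, and the comparison is case-sensitive, so "Darn" would still need separate handling (for instance by lower-casing a copy of the input first).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `word` appears in `text` with no letter
// directly before or after it, so "darn." matches but "darning" does not.
bool containsStandaloneWord(const std::string& text, const std::string& word)
{
    if (word.empty())
        return false;

    for (std::size_t pos = text.find(word); pos != std::string::npos;
         pos = text.find(word, pos + 1))
    {
        const std::size_t after = pos + word.size();
        const bool letterBefore =
            pos > 0 && std::isalpha(static_cast<unsigned char>(text[pos - 1]));
        const bool letterAfter =
            after < text.size() && std::isalpha(static_cast<unsigned char>(text[after]));
        if (!letterBefore && !letterAfter)
            return true;
    }
    return false;
}
```

With this helper the condition in the question would read if (containsStandaloneWord(userInput, "darn")).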
std::search and std::string::replace were made for this:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string userInput;
userInput = "That darn cat is a varmint.";
static const string bad_words [] = {
"darn",
"varmint"
};
for(auto&& bad : bad_words)
{
const auto size = distance(bad.begin(), bad.end());
auto i = userInput.begin();
while ((i = std::search(i, userInput.end(), bad.begin(), bad.end())) != userInput.end())
{
// modify this part to allow more or fewer leading letters from the offending words
constexpr std::size_t leading_letters = 1;
// never allow the whole word to appear - hide at least the last letter
auto leading = std::min(leading_letters, std::size_t(size - 1));
auto replacement = std::string(i, i + leading) + std::string(size - leading, '*');
userInput.replace(i, i + size, replacement.begin(), replacement.end());
i += size;
}
}
cout << userInput << endl;
return 0;
}
expected output:
That d*** cat is a v******.

How to compare two arrays and return non matching values in C++

I would like to parse through two vectors of strings and find the strings that match each other and the ones that do not.
Example of what I want to get:
input vector 1 would look like: [string1, string2, string3]
input vector 2 would look like: [string2, string3, string4]
Ideal output:
string1: No Match
string2: Match
string3: Match
string4: No Match
At the moment I use this code:
vector<string> function(vector<string> sequences, vector<string> second_sequences){
for(vector<string>::size_type i = 0; i != sequences.size(); i++) {
for(vector<string>::size_type j = 0; j != second_sequences.size(); j++){
if (sequences[i] == second_sequences[j]){
cout << "Match: " << sequences[i];
}else{
cout << "No Match: " << sequences[i];
cout << "No Match: " << second_sequences[j];
}
}
}
}
It works great for the ones that match, but iterates over everything so many times,
and the ones that do not match get printed a large number of times.
How can I improve this?
The best code is the code that you did not have to write.
If you use a (STL) map container, it will take care of sorting and remembering the different strings you encounter.
So let the container work for us.
Here is a small, quickly written example. Its syntax requires at least the C++11 option of your compiler (-std=c++11 on gcc, for example). The pre-C++11 syntax is much more verbose (but is worth knowing from an academic point of view).
There are no nested loops: each vector is traversed only once.
This is only a hint (my code does not take into account that string4 could be present more than once in the second vector, but I leave it to you to adapt it to your exact needs).
#include <iostream>
#include <vector>
#include <string>
#include <map>
using namespace std;
vector<string> v1 { "string1","string2","string3"};
vector<string> v2 { "string2","string3","string4"};
//ordered map will take care of "alphabetical" ordering
//The key are the strings
//the value is a counter ( or could be any object of your own
//containing more information )
map<string,int> my_map;
int main()
{
//The first vector feeds the map before comparison with
//the second vector
for ( const auto & cstr_ref:v1)
my_map[cstr_ref] = 0;
//We will look into the second vector ( it could also be the third,
//the fourth... )
for ( const auto & cstr_ref:v2)
{
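//find returns end() when the key is absent; equal_range().first is only a
//lower bound and may point at a different, larger key
auto iter = my_map.find(cstr_ref);
if ( my_map.end() != iter )
{
//if the element already exists we increment the counter
iter->second += 1;
}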
else
{
//otherwise we put the string inside the map
my_map[cstr_ref] = 0;
}
}
for ( const auto & map_iter: my_map)
{
if ( 0 < map_iter.second )
{
cout << "Match :";
}
else
{
cout << "No Match :" ;
}
cout << map_iter.first << endl;
}
return 0;
}
Output:
No Match :string1
Match :string2
Match :string3
No Match :string4
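Alternatively, sort both vectors and let std::set_intersection collect the common elements (this needs <algorithm> and <iterator>):
std::sort(std::begin(v1), std::end(v1));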
std::sort(std::begin(v2), std::end(v2));
std::vector<std::string> common_elements;
std::set_intersection(std::begin(v1), std::end(v1)
, std::begin(v2), std::end(v2)
, std::back_inserter(common_elements));
for(auto const& s : common_elements)
{
std::cout<<s<<std::endl;
}
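The snippet above only yields the matching strings. For the "No Match" half of the question, one possible sketch uses std::set_symmetric_difference on the same sorted ranges; it returns the elements that occur in exactly one of the two vectors. The function name unmatched is made up for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the strings that occur in exactly one of the
// two vectors. The arguments are taken by value because they are sorted here.
std::vector<std::string> unmatched(std::vector<std::string> a,
                                   std::vector<std::string> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<std::string> result;
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::back_inserter(result));
    return result;
}
```

For the example vectors from the question this should give string1 and string4. If it matters which vector an unmatched string came from, two calls to std::set_difference (one in each direction) would keep them apart.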

c++11 regex : check if a set of characters exist in a string

If for example, I have the string: "asdf{ asdf }",
I want to check if the string contains any character in the set []{}().
How would I go about doing this?
I'm looking for a general solution that checks if the string has the characters in the set, so that I can continue to add lookup characters in the set in the future.
Your question is unclear on whether you only want to detect if any of the characters in the search set are present in the input string, or whether you want to find all matches.
In either case, use std::regex to create the regular expression object. Because all the characters in your search set have special meanings in regular expressions, you'll need to escape all of them.
std::regex r{R"([\[\]\{\}\(\)])"};
char const *str = "asdf{ asdf }";
If you want to only detect whether at least one match was found, use std::regex_search.
std::cmatch results;
if(std::regex_search(str, results, r)) {
std::cout << "match found\n";
}
On the other hand, if you want to find all the matches, use std::regex_iterator.
auto first = std::cregex_iterator(str, str + std::strlen(str), r); // std::strlen needs <cstring>
auto last = std::cregex_iterator();
if(first != last) std::cout << "match found\n";
while(first != last) {
std::cout << (*first++).str() << '\n';
}
I know you are asking about regex, but this specific problem can be solved without it using std::string::find_first_of(), which finds the position of the first character in the string (s) that is contained in a set (g):
#include <string>
#include <iostream>
int main()
{
std::string s = "asdf{ asdf }";
std::string g = "[]{}()";
// Does the string contain one of the characters?
if(s.find_first_of(g) != std::string::npos)
std::cout << s << " contains one of " << g << '\n';
// find the position of each occurrence of the characters in the string
for(size_t pos = 0; (pos = s.find_first_of(g, pos)) != std::string::npos; ++pos)
std::cout << s << " contains " << s[pos] << " at " << pos << '\n';
}
OUTPUT:
asdf{ asdf } contains one of []{}()
asdf{ asdf } contains { at 4
asdf{ asdf } contains } at 11

How do I find all the positions of a substring in a string?

I want to search a large string for all the locations of a string.
The two other answers are correct, but they are very slow and have O(N^2) complexity. However, there is the Knuth-Morris-Pratt algorithm, which finds all occurrences of a substring in O(N) time.
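For reference, a compact sketch of Knuth-Morris-Pratt is shown here. It is a textbook-style version written for this thread rather than code from the answer, and the name kmpFindAll is hypothetical. It builds the usual failure table for the pattern and then scans the text once.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of Knuth-Morris-Pratt; kmpFindAll is a hypothetical name.
// Returns the start index of every (possibly overlapping) occurrence.
std::vector<std::size_t> kmpFindAll(const std::string& text, const std::string& pattern)
{
    std::vector<std::size_t> positions;
    const std::size_t m = pattern.size();
    if (m == 0)
        return positions;

    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of it.
    std::vector<std::size_t> fail(m, 0);
    for (std::size_t i = 1, k = 0; i < m; ++i)
    {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];
        if (pattern[i] == pattern[k])
            ++k;
        fail[i] = k;
    }

    // Scan the text once; k is the number of pattern characters matched so far.
    for (std::size_t i = 0, k = 0; i < text.size(); ++i)
    {
        while (k > 0 && text[i] != pattern[k])
            k = fail[k - 1];
        if (text[i] == pattern[k])
            ++k;
        if (k == m)
        {
            positions.push_back(i + 1 - m);
            k = fail[k - 1]; // allow overlapping matches
        }
    }
    return positions;
}
```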
Edit:
Also, there is another algorithm: the so-called "Z-function", which also has O(N) complexity. I couldn't find an English source for this algorithm (maybe because there is another, more famous function with the same name, the Z-function of Riemann), so I will just put its code here and explain what it does.
void calc_z (string &s, vector<int> & z)
{
int len = s.size();
z.resize (len);
int l = 0, r = 0;
for (int i=1; i<len; ++i)
if (z[i-l]+i <= r)
z[i] = z[i-l];
else
{
l = i;
if (i > r) r = i;
for (z[i] = r-i; r<len; ++r, ++z[i])
if (s[r] != s[z[i]])
break;
--r;
}
}
int main()
{
string main_string = "some string where we want to find substring or sub of string or just sub";
string substring = "sub";
string working_string = substring + main_string;
vector<int> z;
calc_z(working_string, z);
//after this, z[i] is the length of the longest prefix of working_string
//that matches the part of working_string starting at position i.
//So the positions where z[i] >= substring.size()
//are the positions where the substring occurs.
for(int i = substring.size(); i < working_string.size(); ++i)
if(z[i] >=substring.size())
cout << i - substring.size() << endl; //to get position in main_string
}
Use std::string::find. You can do something like:
std::string::size_type start_pos = 0;
while( std::string::npos !=
( start_pos = mystring.find( my_sub_string, start_pos ) ) )
{
// do something with start_pos or store it in a container
++start_pos;
}
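Wrapped into a function, the loop above might look like the following sketch. The name findAll and the choice to return a std::vector of positions are assumptions made for illustration. Advancing by one character after each hit means overlapping occurrences are reported as well.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper around the loop above: collects every start position.
std::vector<std::size_t> findAll(const std::string& haystack, const std::string& needle)
{
    std::vector<std::size_t> positions;
    if (needle.empty())
        return positions; // an empty needle would match at every index

    for (std::size_t pos = haystack.find(needle); pos != std::string::npos;
         pos = haystack.find(needle, pos + 1))
    {
        positions.push_back(pos);
    }
    return positions;
}
```

To skip overlapping matches instead, the search could resume at pos + needle.size() rather than pos + 1.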
EDIT: Doh! Thanks for the remark, Nawaz! Better?
I'll add for completeness that another approach is possible with std::search. It works like std::string::find; the difference is that you work with iterators, something like:
std::string::iterator it(str.begin()), end(str.end());
std::string::iterator s_it(search_str.begin()), s_end(search_str.end());
it = std::search(it, end, s_it, s_end);
while(it != end)
{
// do something with this position..
// a tiny optimisation could be to buffer the result of the std::distance - heyho..
// std::advance returns void, so std::next (C++11, <iterator>) is used to step past the match
it = std::search(std::next(it, std::distance(s_it, s_end)), end, s_it, s_end);
}
I find that this sometimes outperforms std::string::find, esp. if you represent your string as a vector<char>.
Simply use std::string::find() which returns the position at which the substring was found, or std::string::npos if none was found.
Here is the documentation.
And here is the example taken from that documentation:
// string::find
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string str ("There are two needles in this haystack with needles.");
string str2 ("needle");
size_t found;
// different member versions of find in the same order as above:
found=str.find(str2);
if (found!=string::npos)
cout << "first 'needle' found at: " << int(found) << endl;
found=str.find("needles are small",found+1,6);
if (found!=string::npos)
cout << "second 'needle' found at: " << int(found) << endl;
found=str.find("haystack");
if (found!=string::npos)
cout << "'haystack' also found at: " << int(found) << endl;
found=str.find('.');
if (found!=string::npos)
cout << "Period found at: " << int(found) << endl;
// let's replace the first needle:
str.replace(str.find(str2),str2.length(),"preposition");
cout << str << endl;
return 0;
}