I'm working on a multithreading project where, for one segment of the project, I need to find out whether a given character sequence exists within a string. I'm wondering if C or C++ has any prebuilt functions that can handle this, but I'm having trouble figuring out the right term to search for.
I know about 'strstr' and 'find'; the issue is that the function needs to be able to find a sequence which is SPLIT across a string.
Given the string 'Hello World', I need a function that returns true if the sequence 'H-W-l' exists. Is there anything prebuilt which can handle this?
As far as I know, subsequence searching as such is not part of either the standard C library or the standard C++ library.
However, you can express subsequence searching as either a regular expression or a "glob". Posix mandates both regex and glob matching functions, and the C++ standard library has included regular expressions since C++11. Both of these techniques require modifying the search string:
Regular expression: HWl ⇒ H.*W.*l. regexec will do a search for the regular expression (unless anchored, which this one is not); in C++, you would want to use std::regex_search rather than std::regex_match.
Glob: HWl ⇒ *H*W*l*. Glob matching is always a complete match, although in all the implementations I know of, a trailing * is optimized. This is available as the fnmatch function in the Posix header fnmatch.h. For this application, provide 0 for the flags parameter.
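A minimal C++11 sketch of the regular-expression route, assuming the needle may contain regex metacharacters that must be matched literally. The helper names `subsequence_regex` and `has_subsequence_regex` are hypothetical, not library functions:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: turns "HWl" into "H.*W.*l", escaping each needle
// character so that metacharacters such as '.' or '*' are matched literally.
// Note: '.' does not match newlines in the default ECMAScript grammar; use
// [\s\S]* instead of .* if the haystack can span several lines.
std::string subsequence_regex(const std::string& needle) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string pattern;
    for (std::size_t i = 0; i < needle.size(); ++i) {
        if (i > 0) pattern += ".*";  // allow any characters between needle chars
        if (special.find(needle[i]) != std::string::npos) pattern += '\\';
        pattern += needle[i];
    }
    return pattern;
}

// Unanchored search: regex_search (not regex_match) lets the match start anywhere.
bool has_subsequence_regex(const std::string& haystack, const std::string& needle) {
    return std::regex_search(haystack, std::regex(subsequence_regex(needle)));
}
```

For example, `has_subsequence_regex("Hello World", "HWl")` is true, while the reversed needle `"lWH"` is not found. Building a regex for every query has a cost, so the plain loops in the other answers are likely cheaper for one-off checks.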
If you don't like any of the above, you can use the standard C strchr function in a simple loop:
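#include <stdbool.h>
#include <string.h>
/* True if needle's characters occur in haystack in order, not necessarily adjacent. */
bool has_subsequence(const char* haystack, const char* needle) {
    const char* p = haystack;
    /* Search for each needle char just past the previous match (hence ++p). */
    for (; *needle && (p = strchr(p, *needle)) != NULL; ++needle, ++p) {
    }
    return p != NULL;
}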
If I understand correctly, you're trying to search for characters in a given order, but they aren't necessarily contiguous. If you're in C++, I don't see why you couldn't use the std::find function from the <algorithm> header. I would load both into strings and then search as follows:
#include <algorithm>
#include <string>

bool has_noncontig_sequence(const std::string& str, const std::string& subStr)
{
    typedef std::string::const_iterator iter;
    iter start = str.begin();
    // Loop over subStr, searching for each character after the previous match.
    for (iter i = subStr.begin(); i != subStr.end(); ++i)
    {
        start = std::find(start, str.end(), *i);
        // Character not found: the sequence is not present.
        if (start == str.end())
            return false;
        ++start;  // the next character must come after this one
    }
    return true;
}
std::find positions start on the next occurrence of the current character in str, and the search for the following character resumes just past it, so a single character in str is never used twice. If a character can't be found, std::find returns str.end(), and the function reports failure.
If I want to check whether a string exactly matches any one of the strings in a vector, I use
std::find(vectOfStrings.begin(), vectOfStrings.end(), "<targetString>") != vectOfStrings.end()
This is true if the target string matches any of the strings in the vector.
But what if I want to check whether a string matches any one of a vector of regular expressions?
Is there anything in the standard library I can use to make it work like
std::find(vectOfRegExprsns.begin(), vectOfRegExprsns.end(), "<targetString>") != vectOfRegExprsns.end()?
Any suggestions would be highly appreciated.
How about using std::find_if() with a lambda?
bool matched = std::find_if(
    vectOfRegExprsns.begin(), vectOfRegExprsns.end(),
    [&](const std::string& pattern) {  // each element is a regex pattern string
        return std::regex_match(targetString, std::regex(pattern));
    }) != vectOfRegExprsns.end();
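If only a yes/no answer is needed and the same patterns are checked repeatedly, a C++11 sketch along these lines compiles each pattern once and uses std::any_of. The helpers `compile_all` and `matches_any` are hypothetical names; regex_match requires the whole target to match, so swap in regex_search if a partial match should count:

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: compile each pattern once so repeated checks don't
// pay the regex construction cost every time.
std::vector<std::regex> compile_all(const std::vector<std::string>& patterns) {
    std::vector<std::regex> compiled;
    compiled.reserve(patterns.size());
    for (const std::string& p : patterns) compiled.emplace_back(p);
    return compiled;
}

// True if target matches at least one pattern in full (regex_match, not regex_search).
bool matches_any(const std::string& target, const std::vector<std::regex>& regexes) {
    return std::any_of(regexes.begin(), regexes.end(),
                       [&](const std::regex& re) { return std::regex_match(target, re); });
}
```

With the patterns `"ab+c"` and `"x.*"`, the strings `"abbbc"` and `"xyz"` match, but `"abcd"` does not, because neither pattern matches it in full.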
I have a substring defined by two iterators (start and end). I need to check if this substring is present in another string.
Is there a standard library algorithm or string member I can use or adapt to do this without creating a whole new string object (std::string(start, end)) just for this purpose?
e.g.
struct Substring
{
std::string::const_iterator start, end;
};
auto found = std::contains(whole.begin(), whole.end(), substring.start, substring.end); // ???
Use std::search:
bool found =
std::search(hay.begin(), hay.end(), needle.begin(), needle.end()) != hay.end();
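Applied to the asker's Substring struct, a sketch could look like this. `contains_slice` is a hypothetical name, and the empty-slice check is an assumption about the desired behaviour (std::string::find treats an empty needle as found, whereas std::search on an empty haystack returns end()):

```cpp
#include <algorithm>
#include <string>

// The asker's iterator pair, describing a slice of some other string.
struct Substring {
    std::string::const_iterator start, end;
};

// Hypothetical helper: searches `whole` for the characters in [sub.start, sub.end)
// without building a temporary std::string.
bool contains_slice(const std::string& whole, const Substring& sub) {
    if (sub.start == sub.end) return true;  // an empty slice is trivially contained
    return std::search(whole.begin(), whole.end(), sub.start, sub.end) != whole.end();
}
```

std::search accepts two different iterator types, so the slice can come from any string (or any other char sequence) without copying.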
You can use the std::string::find method:
auto found = (whole.find(&*substring.start, 0, substring.end - substring.start)
!= std::string::npos);
The advantage over std::search is that std::string::find works on strings specifically and could be implemented using Boyer-Moore. Sadly, that is not how gcc's libstdc++ implements it.
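If you want Boyer-Moore explicitly, C++17 lets you pass a searcher object to std::search. This is a sketch of that standard-library option, not a description of what std::string::find does internally; the helper name `bm_find` is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// C++17: std::search with an explicit Boyer-Moore searcher built from the
// needle's iterator range. Returns the match offset, or npos if not found.
std::size_t bm_find(const std::string& whole,
                    std::string::const_iterator first, std::string::const_iterator last) {
    std::boyer_moore_searcher<std::string::const_iterator> searcher(first, last);
    auto it = std::search(whole.begin(), whole.end(), searcher);
    return it == whole.end() ? std::string::npos
                             : static_cast<std::size_t>(it - whole.begin());
}
```

Building the searcher's skip tables has a setup cost, so this is more likely to pay off for long needles or for reusing one searcher across many haystacks.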
I've successfully used:
boost::algorithm::boyer_moore_search<const char *,const char *>( haystack, haystack_end, needle, needle_end )
to look for a needle in a haystack. Now I'd like to use BM_search to do a case-insensitive search for the needle. Since my haystack is giant, my plan is to convert the needle to lower case and have the haystack iterator treat the haystack characters as a special class whose compare function converts alphabetics to lower case before comparing. However, I haven't been able to express this correctly. I'm trying:
class casechar {
public:
char ch;
// other stuff...it's not right, but I don't think the compiler is even getting this far
} ;
class caseiter : public std::iterator<random_access_iterator_tag,casechar> {
const casechar *ptr;
public:
// various iterator functions, but not enough of them, apparently!
} ;
boost::algorithm::boyer_moore_search<const caseiter,const char *>( HaYsTaCk, HaYsTaCk_EnD, needle, needle_end );
The compiler (g++ on OSX) is complaining about an attempt to instantiate hash<casechar>, I guess for some BM internal thing. I'm lost in a maze of template<twisty_passages,all_different>. Could I impose on someone for a bit of direction? I suspect I just need to provide certain implementations in casechar and/or caseiter, but I don't know which ones.
Thanks!
The first problem you will run into is this:
BOOST_STATIC_ASSERT (( boost::is_same<
typename std::iterator_traits<patIter>::value_type,
typename std::iterator_traits<corpusIter>::value_type>::value ));
This requires the value types of the iterator for the pattern and the iterator for the corpus to be the same type. In other words, you'll need to use casechar for the pattern as well.
Here's what I'd do:
Write a casechar with a custom operator== (and other comparison operators as needed) for case-insensitive comparison.
No need to write a custom iterator. const casechar * is a perfectly acceptable random access iterator.
Write a std::hash<casechar> specialization. This specialization should probably simply return something like std::hash<char>()(std::tolower(ch)).
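A sketch of those steps follows. To keep it self-contained, it assumes the C++17 std::boyer_moore_searcher as a stand-in for boost::algorithm::boyer_moore_search, because both rely on a hash and an equality that agree with each other. It also copies the text into a std::vector<casechar> rather than reinterpreting the haystack's memory. `fold_case` and `find_icase` are hypothetical names:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Lower-cases one char; the unsigned char cast keeps std::tolower well-defined.
inline char fold_case(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// A char wrapper whose operator== ignores case.
struct casechar {
    char ch;
    explicit casechar(char c = '\0') : ch(c) {}
};

inline bool operator==(casechar a, casechar b) {
    return fold_case(a.ch) == fold_case(b.ch);
}

// The hash must agree with operator==, so it hashes the case-folded char.
namespace std {
template <>
struct hash<casechar> {
    size_t operator()(casechar c) const noexcept {
        return hash<char>()(::fold_case(c.ch));
    }
};
}  // namespace std

// Hypothetical helper: case-insensitive search returning the match offset or npos.
std::size_t find_icase(const std::string& hay, const std::string& needle) {
    std::vector<casechar> h, n;
    for (char c : hay) h.emplace_back(c);
    for (char c : needle) n.emplace_back(c);
    std::boyer_moore_searcher<std::vector<casechar>::const_iterator> searcher(n.cbegin(), n.cend());
    auto it = std::search(h.cbegin(), h.cend(), searcher);
    return it == h.cend() ? std::string::npos : static_cast<std::size_t>(it - h.cbegin());
}
```

Because casechar is not a plain char type, the searcher falls back to a hash-map skip table, which is the same performance concern the answer raises.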
That said, I'm somewhat doubtful that this will actually net you a performance gain, compared to just converting everything to lower case. The skip table for chars uses an optimization that uses arrays instead of unordered_maps for faster indexing and fewer heap allocations. This optimization is not used for a custom type such as casechar.
Here is some sample code:
std::vector<std::wstring> names;
names.push_back(L"Rahul");
names.push_back(L"John");
names.push_back(L"Alexa");
names.push_back(L"Tejas");
names.push_back(L"Alexandra");
std::vector<std::wstring> pattern;
pattern.push_back(L"Tejas");
auto itr = boost::algorithm::boyer_moore_search<std::vector<std::wstring>,
std::vector<std::wstring>>(names, pattern);
if (itr != names.end())
{
OutputDebugString(std::wstring(L"pattern found in the names " + *itr).c_str());
}
else
{
OutputDebugString(std::wstring(L"pattern not found in the names").c_str());
}
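For comparison, the same element-by-element sequence search can be done with the standard std::search and no Boost dependency. This is a sketch; `find_sequence` is a hypothetical name. Each std::wstring must compare equal as a whole, so L"Alexa" does not match L"Alexandra":

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: index of the first run of `names` equal to `pattern`, or -1.
// std::search compares elements with ==, so each std::wstring must match exactly.
std::ptrdiff_t find_sequence(const std::vector<std::wstring>& names,
                             const std::vector<std::wstring>& pattern) {
    auto it = std::search(names.begin(), names.end(), pattern.begin(), pattern.end());
    return it == names.end() ? -1 : it - names.begin();
}
```

With the names above, searching for {L"Tejas"} gives index 3, and {L"Alexa", L"Tejas"} gives index 2.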
I'm struggling with this one, and I've reached the point where I'm not making any headway, so it's time to ask for help. My familiarity with the boost libraries is only slightly better than superficial. I'm trying to do a progressive scan through a rather large string. In fact, it's the entire contents of a file read into a std::string object (the file isn't going to be that large; it's the output from a command-line program).
The output of this program, pnputil, is repetitive. I'm looking for certain patterns in an effort to find the "oemNNN.inf" file I want. Essentially, my algorithm is to find the first "oemNNN.inf" and search for identifying characteristics of that file. If it's not the one I want, I move on to the next.
In code, it's something like:
std::string filesContents;
std::string::size_type index(filesContents.find_first_of("oem"));
std::string::iterator start(filesContents.begin() + index);
boost::match_results<std::string::const_iterator> matches;
while(!found) {
if(boost::regex_search(start, filesContents.end(), matches, re))
{
// do important stuff with the matches
found = true; // found is used outside of loop too
break;
}
index = filesContents.find_first_of("oem", index + 1);
if(std::string::npos == index) break;
start = filesContents.begin() + index;
}
I'm using this example from the boost library documentation for 1.47 (the version I'm using). Could someone please explain how my usage differs from the example (aside from the fact that I'm not storing results in maps and such)?
From what I can tell, I'm using the same type of iterators the example uses. Yet when I compile the code, Microsoft's compiler tells me: no instance of overloaded function boost::regex_search matches argument list. IntelliSense, however, shows this function with the arguments I'm using, although the iterator parameters are named BidiIterator. I don't know the significance of this, but given the example, I'm assuming that whatever BidiIterator is, it can be constructed from a std::string::iterator (perhaps a bad assumption, but it seems to make sense given the example). The example also shows a fifth argument, match_flags, but that argument defaults to boost::match_default, so it should be unnecessary. Still, just for kicks and grins, I've added that fifth argument, and it still doesn't work. How am I misusing the arguments, especially compared with the example?
Below is a simple program which demonstrates the problem without the looping algorithm.
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string haystack("This is a string which contains stuff I want to find");
boost::regex needle("stuff");
boost::match_results<std::string::const_iterator> what;
if(boost::regex_search(haystack.begin(), haystack.end(), what, needle, boost::match_default)) {
std::cout << "Found some matches" << std::endl;
std::cout << what[0].first << std::endl;
}
return 0;
}
If you decide to compile, I am compiling and linking against 1.47 of the boost library. The project that I'm working with uses this version extensively and updating isn't for me to decide.
Thanks for any help. This is most frustrating.
Andy
In general, the iterator types are different.
std::string haystack("This is a string which contains stuff I want to find");
Because haystack is not const, the values returned by begin() and end() are std::string::iterator.
But your match type is
boost::match_results<std::string::const_iterator> what;
std::string::iterator and std::string::const_iterator are different types, and regex_search deduces a single BidiIterator type from the iterator arguments and the match_results, so no overload matches.
So there are a few options:
declare string as const (i.e. const std::string haystack;)
declare iterators as const_iterators (i.e. std::string::const_iterator begin = haystack.begin(), end = haystack.end();) and pass them to regex_search.
use boost::match_results<std::string::iterator> what;
if you have C++11 you can use haystack.cbegin() and haystack.cend()
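A sketch of the fix combined with the progressive scan from the question, assuming std::regex (C++11) as a stand-in for boost::regex, since the iterator-type rule is the same for both. `find_all` is a hypothetical name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch using std::regex in place of boost::regex; the iterator-type rule is the same.
std::vector<std::string> find_all(const std::string& haystack, const std::regex& re) {
    std::vector<std::string> found;
    std::smatch what;  // smatch is match_results<std::string::const_iterator>
    // const_iterators (cbegin/cend) so the types agree with `what`.
    std::string::const_iterator start = haystack.cbegin();
    const std::string::const_iterator end = haystack.cend();
    // Progressive scan: each search resumes where the previous match ended.
    while (std::regex_search(start, end, what, re)) {
        found.push_back(what.str());
        if (what.length(0) == 0) break;  // empty match: stop rather than loop forever
        start = what[0].second;
    }
    return found;
}
```

Resuming from `what[0].second` avoids the manual find_first_of bookkeeping in the question's loop. Note that each search treats `start` as the beginning of the input, which affects `^` and `\b` at the resume point.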
Is it possible to detect whether a string is 'all numeric' using tr1 regex?
If so, please help me with a snippet as well, since I am new to regex.
I'm looking at tr1 regex for something like this because I don't want to create a separate function for detecting whether the string is numeric. I want to do it inline in the rest of the client code, but I don't want it to look ugly either. I feel tr1 regex might help, but I'm not sure. Any advice on this?
If you just want to test whether the string has all numeric characters, you can use std::find_if_not and std::isdigit:
std::find_if_not(s.begin(), s.end(), (int(*)(int))std::isdigit) == s.end()
If you do not have a Standard Library implementation with std::find_if_not, you can easily write it:
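// Fallback for pre-C++11 libraries: returns the first element for which pred is false.
template <typename ForwardIt, typename Predicate>
ForwardIt find_if_not(ForwardIt first, ForwardIt last, Predicate pred)
{
    for (; first != last; ++first)
        if (!pred(*first))  // the predicate takes the element, not the iterator
            return first;
    return first;  // == last: every element satisfied pred
}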
You can use the std::string::find_first_not_of member function to check that a string contains only digits (note that an empty string also passes this test):
if (mystring.find_first_not_of("0123456789") == std::string::npos)
{
std::cout << "numeric only!";
}
The regular expression for this is rather trivial: search for "\\D", which matches any character that's not a digit. If regex_search finds a match, the string is not all-numeric. If you'd like to allow a decimal separator too, you could use "[^\\d\\.]", which translates to "not a digit or dot".
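A sketch of the "\\D" idea with C++11 std::regex, which grew out of the tr1 regex library; with tr1 the same calls should live in std::tr1, though I haven't checked every tr1 implementation. The function names are hypothetical:

```cpp
#include <regex>
#include <string>

// A string is all digits when a search for any non-digit (\D) finds nothing.
// Note: this also returns true for the empty string.
bool all_digits_regex(const std::string& s) {
    static const std::regex non_digit("\\D");
    return !std::regex_search(s, non_digit);
}

// Variant that also allows '.' (the "[^\\d.]" idea); "1.2.3" would still pass.
bool digits_or_dots_regex(const std::string& s) {
    static const std::regex bad_char("[^\\d.]");
    return !std::regex_search(s, bad_char);
}
```

Inline, this reads as `!std::regex_search(s, std::regex("\\D"))`, at the cost of compiling the regex on every call.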
However, how about simply using strtol() to read the number? It gives you a pointer to the first character it did not convert, so if that pointer points to the end of the string, the whole string was a number. The upside is that you won't even need TR1 for this.
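A sketch of the strtol check, assuming base 10 and that the caller also wants the parsed value. `parses_as_long` is a hypothetical name. Unlike the digit-only tests, strtol skips leading whitespace and accepts a sign:

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Sketch: true only if std::strtol converts the entire string as a base-10 long.
// Note that strtol itself skips leading whitespace and accepts a '+' or '-' sign,
// so " -42" is accepted; add your own checks if that is not wanted.
bool parses_as_long(const std::string& s, long& out) {
    const char* begin = s.c_str();
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(begin, &end, 10);
    if (end == begin) return false;             // no digits converted (e.g. "")
    if (end != begin + s.size()) return false;  // stopped at a non-digit character
    if (errno == ERANGE) return false;          // value does not fit in a long
    out = value;
    return true;
}
```

Comparing `end` with `begin + s.size()` rather than checking for '\0' also rejects strings with embedded NUL characters.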