boost::regex_search refuses to take my arguments - c++

I'm struggling with this one, and I'm at the point where I'm not making any headway and it's time to ask for help. My familiarity with the boost libraries is only slightly better than superficial. I'm trying to do a progressive scan through a rather large string. In fact, it's the entire contents of a file read into a std::string object (the file isn't going to be that large, it's the output from a command line program).
The output of this program, pnputil, is repetitive. I'm looking for certain patterns in an effort to find the "oemNNN.inf" file I want. Essentially, my algorithm is to find the first "oemNNN.inf", search for identifying characteristics for that file. If it's not the one I want, move on to the next.
In code, it's something like:
std::string filesContents;
std::string::size_type index(filesContents.find_first_of("oem"));
std::string::iterator start(filesContents.begin() + index);
boost::match_results<std::string::const_iterator> matches;
while(!found) {
if(boost::regex_search(start, filesContents.end(), matches, re))
{
// do important stuff with the matches
found = true; // found is used outside of loop too
break;
}
index = filesContents.find_first_of("oem", index + 1);
if(std::string::npos == index) break;
start = filesContents.begin() + index;
}
I'm using this example from the boost library documentation for 1.47 (the version I'm using). Someone please explain to me how my usage differs from what this example has (aside from the fact that I'm not storing stuff into maps and such).
From what I can tell, I'm using the same type of iterators the example uses. Yet, when I compile the code, Microsoft's compiler tells me that: no instance of overloaded function boost::regex_search matches argument list. Yet, the intellisense shows this function with the arguments I'm using, although the iterators are named something BidiIterator. I don't know the significance of this, but given the example, I'm assuming that whatever the BidiIterator is, it takes a std::string::iterator for construction (perhaps a bad assumption, but seems to make sense given the example). The example does show a fifth argument, match_flags, but that argument is defaulted to the value: boost::match_default. Therefore, it should be unnecessary. However, just for kicks and grins, I've added that fifth argument and still it doesn't work. How am I misusing the arguments? Especially, when considering the example.
Below is a simple program which demonstrates the problem without the looping algorithm.
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main() {
std::string haystack("This is a string which contains stuff I want to find");
boost::regex needle("stuff");
boost::match_results<std::string::const_iterator> what;
if(boost::regex_search(haystack.begin(), haystack.end(), what, needle, boost::match_default)) {
std::cout << "Found some matches" << std::endl;
std::cout << what[0].first << std::endl;
}
return 0;
}
If you decide to compile, I am compiling and linking against 1.47 of the boost library. The project that I'm working with uses this version extensively and updating isn't for me to decide.
Thanks for any help. This is most frustrating.
Andy

The problem is that the iterator types don't match.
std::string haystack("This is a string which contains stuff I want to find");
Because haystack is not const, begin() and end() return std::string::iterator.
But your match type is
boost::match_results<std::string::const_iterator> what;
std::string::iterator and std::string::const_iterator are different types, and regex_search requires the iterators it is given and the iterator type of the match_results to be the same, so no overload matches.
So there are a few options:
declare the string as const (i.e. const std::string haystack;)
declare the iterators as const_iterators (i.e. std::string::const_iterator begin = haystack.begin(), end = haystack.end();) and pass them to regex_search
use boost::match_results<std::string::iterator> what;
if you have C++11, use haystack.cbegin() and haystack.cend()

Related

Spell-Checker c++; checking whether the word is in the dictionary text

I'm new to this place so I might not ask my questions clearly. But I really do need help. So my homework is to create a spell-checker in C++ which takes a text file and compares it with another dictionary text file. There is one specific snippet of code I need help with. I created a helper function isValidWord which takes in the dictionary, which is an unordered_set, and a string. The function should return true if the string matches a word in the dictionary. I'll just show you what I have so far. My problem is that the string isn't compared against everything in the dictionary; only some of the entries seem to be checked.
#include <unordered_set>
#include <string>
bool isValidWord(std::unordered_set<std::string> dictionary, std::string& word) {
std::unordered_set<std::string>::iterator it;
for (it = dictionary.begin(); it != dictionary.end(); ++it) {
if (word == *it) {
return true;
}
}
return false;
}
There is a built-in find method in unordered_set that you can utilize instead of reinventing the wheel. Also it is a good idea to pass dictionary by reference to avoid pointless copying.
You may simplify your method with (I added missing const and reference too):
bool isValidWord(const std::unordered_set<std::string>& dictionary,
const std::string& word)
{
return dictionary.count(word) != 0;
}
Your current implementation is correct but not performant:
You pass your dictionary by copy (so you recreate it on each call).
You use a linear search, whereas the container provides better complexity (std::unordered_set::find or std::unordered_set::count).
Final note: if you want to retrieve all invalid words, you may look at std::set_difference (which requires both the words and the dictionary to be sorted).

Unexpected results from boost::lexical_cast<int> with boost::iterator_range

I tried converting a substring (expressed by a pair of iterators) to an integer by boost::lexical_cast:
#include <iostream>
#include <boost/lexical_cast.hpp>
int main()
{
// assume [first, last) as substring
const std::string s("80");
auto first = s.begin(), last = s.end();
std::cout << boost::lexical_cast<int>(boost::make_iterator_range(first, last)) << std::endl;
return 0;
}
Output: (wandbox)
1
I got the expected result (80) with a workaround: boost::make_iterator_range(&*first, last - first).
Question: Why does the above code not work as expected? And where does the 1 come from?
1. lexical_cast does not support iterator_range<std::string::(const_)iterator>
2. misuse of lexical_cast or iterator_range
3. bugs in lexical_cast or iterator_range
4. some other reason
The short answer is number 2 from your list, misuse of iterator_range - specifically you're using it without explicitly including the proper header for it.
Adding this:
#include <boost/range/iterator_range.hpp>
will make it behave as you expect.
The iterator_range and related functionality is split into two headers, iterator_range_core.hpp and iterator_range_io.hpp. The first one contains the class definition, the second one contains, among other things, the operator<< overload which makes it streamable and so usable by lexical_cast (usable in the sense that it will actually work as you expect).
Because you didn't include the proper header, you would normally get a compiler error, but in this case you're not getting one because lexical_cast.hpp includes the first of those two headers, iterator_range_core.hpp. This makes everything build fine, but it doesn't get the operator<< from the second header. Without that overload, when lexical_cast writes the range to the stream to perform the conversion, the best overload it finds is the one taking a bool parameter (because iterator_range has an implicit conversion to bool). That's why you're seeing that 1: it's actually writing true to the underlying conversion stream.
You can test this easily with something like this:
auto someRange = boost::make_iterator_range(first, last);
std::cout << std::boolalpha << someRange;
Without #include <boost/range/iterator_range.hpp> this will print true, with that include it will print your string (80).

For loops vs standard library algorithms with a relatively old compiler

I know code is better when there are not any confusing for loops in it. And it is always good to reuse the standard library algorithms when possible. However, I find that the syntax of iterators and algorithms looks really confusing.
I want to give a real life example from my current project: I want to copy the contents of vector<vector<QString>> in into vector<QVariant> out. I can't see the difference between:
for (int i = 0; i < in[0].size(); i++ )
{
if(in[0][i].isNull() || in[0][i].isEmpty() )
out[i] = "NONE";
else
out[i] = in[0][i];
}
and that:
std::transform(in[0].begin(), in[0].end(), out.begin(), [](const QString& a)->QVariant{
if(a.isNull() || a.isEmpty() )
return "NONE";
else
return a;
});
Since we have Visual Studio 2012, I even have to spell out the return type of my lambda. After using ranges like:
in[0].map!( a => a.isNull() || a.isEmpty() ? "NONE" : a ).copy(out);
in the D language, I simply can't live with the std::transform code above. And I am not even sure whether it is better than a basic for loop. My question is: is the code using std::transform above better than the for loop?
At least in my opinion, the main problem here is that transform is simply the wrong tool for the job.
What you're trying to do is exactly what std::replace_copy_if does, so (no big surprise) it does it a lot more neatly.
I don't have Qt installed on the machine at hand, so I took the liberty of replacing your QVariant and QString code with just a std::vector<std::string>, but I believe the same basic idea should apply with the Qt types as well.
#include <vector>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <string>
int main() {
std::vector<std::string> input { "one", "two", "", "three" };
std::vector<std::string> output;
// copy input to output, replacing the appropriate strings:
std::replace_copy_if(input.begin(), input.end(),
std::back_inserter(output),
[](std::string const &s) { return s.empty(); },
"NONE");
// and display output to show the results:
std::copy(output.begin(), output.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
For the moment, this just replaces empty strings with NONE, but adding the null check should be pretty trivial (with a type for which isNull is meaningful, of course).
With the data above, I get the result you'd probably expect:
one
two
NONE
three
I should probably add, however, that even this is clearly pretty verbose. It will be nice when we at least have ranges added to the standard library, so (for example) the input.begin(), input.end() can be replaced with just input. The result still probably won't be as terse as the D code you gave, but at least it reduces the verbosity somewhat (and the same applies to most other algorithms as well).
If you care about that, there are a couple of range libraries you might want to look at--Boost Range for one, and (much more interesting, in my opinion) Eric Niebler's range library.
Your code can be improved by using ? : (it might be sensible to create a static QVariant QVNone; that you could use).
std::transform(in[0].begin(), in[0].end(), out.begin(),
[](const QString& a) // for C++14: (auto& a)
{ return a.isNull() || a.isEmpty() ? QVariant("NONE") : a; }
);
Note: this page documents QVariant(const QString&), so the compiler should be able to work out a common type for the ? : values.
C++11 provides automatic determination of lambda return type when there's a single return statement - see syntax (3) here. C++14 already introduces the ability to accept the argument ala (auto& a). Ranges over container elements would help simplify such loops further; I think they're proposed for C++17; a relevant paper's available here.
There are also functional (non-Standard) libraries for C++ that may offer you a notation more like the one you document for D. Library recommendations are off-topic here, but Google should turn up some candidates without much effort.

Use of std::regex_iterator<std::string::iterator> according to CPlusPlus.com

I'm reading the documentation on std::regex_iterator<std::string::iterator> since I'm trying to learn how to use it for parsing HTML tags. The example the site gives is
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this subject has a submarine as a subsequence");
std::regex e ("\\b(sub)([^ ]*)"); // matches words beginning by "sub"
std::regex_iterator<std::string::iterator> rit ( s.begin(), s.end(), e );
std::regex_iterator<std::string::iterator> rend;
while (rit!=rend) {
std::cout << rit->str() << std::endl;
++rit;
}
return 0;
}
(http://www.cplusplus.com/reference/regex/regex_iterator/regex_iterator/)
and I have one question about that: If rend is never initialized, then how is it being used meaningfully in the rit!=rend?
Also, is this the tool I should be using for getting attributes out of HTML tags? What I want to do is take a string like "class='class1 class2' id = 'myId' onclick ='myFunction()' >" and break it into pairs
("class", "class1 class2"), ("id", "myId"), ("onclick", "myFunction()")
and then work with them from there. The regular expression I'm planning to use is
([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2
and so I plan to iterate through expression of that type while keeping track of whether I'm still in the tag (i.e. whether I've passed a '>' character). Is it going to be too hard to do this?
Thank you for any guidance you can offer me.
What do you mean by "if rend is never initialized"? Clearly, std::regex_iterator<I> has a default constructor. Since the iteration is forward-only, the end iterator just needs to be something suitable for detecting that the end has been reached. The default constructor can set up rend accordingly.
This is an idiom used in a few other places in the standard C++ library, e.g., for std::istream_iterator<T>. Ideally, the end iterator could be indicated using a different type (see, e.g., Eric Niebler's discussion on this issue, the link is to the first of four pages) but the standard currently requires that the two types match when using algorithms.
With respect to parsing HTML using regular expression please refer to this answer.
rend is not uninitialized, it is default-constructed. The page you linked is clear that:
The default constructor (1) constructs an end-of-sequence iterator.
Since default-construction appears to be the only way to obtain an end-of-sequence iterator, comparing rit to rend is the correct way to test whether rit is exhausted.

Example class for Boost boyer_moore search corpusIter type?

I've successfully used:
boost::algorithm::boyer_moore_search<const char *,const char *>( haystack, haystack_end, needle, needle_end )
to look for a needle in a haystack. Now I'd like to use BM_search to do a case-insensitive search for the needle. Since my haystack is giant, my plan is to convert the needle to lower case and have the haystack iterator treat the haystack characters as a special class whose compare function converts alphabetics to lower case before comparing. However, I haven't been able to express this correctly. I'm trying:
class casechar {
public:
char ch;
// other stuff...it's not right, but I don't think the compiler is even getting this far
} ;
class caseiter : public std::iterator<random_access_iterator_tag,casechar> {
const casechar *ptr;
public:
// various iterator functions, but not enough of them, apparently!
} ;
boost::algorithm::boyer_moore_search<const caseiter,const char *>( HaYsTaCk, HaYsTaCk_EnD, needle, needle_end );
The compiler (g++ on OSX) is complaining about an attempt to instantiate hash<casechar>, I guess for some BM internal thing. I'm lost in a maze of template<twisty_passages,all_different>. Could I impose on someone for a bit of direction? I suspect I just need to provide certain implementations in casechar and/or caseiter, but I don't know which ones.
Thanks!
The first problem you will run into is this:
BOOST_STATIC_ASSERT (( boost::is_same<
typename std::iterator_traits<patIter>::value_type,
typename std::iterator_traits<corpusIter>::value_type>::value ));
This requires the value types of the iterator for the pattern and the iterator for the corpus to be the same type. In other words, you'll need to use casechar for the pattern as well.
Here's what I'd do:
Write a casechar, with custom operator == etc for case-insensitive comparison.
No need to write a custom iterator. const casechar * is a perfectly acceptable random access iterator.
Write a std::hash<casechar> specialization. This specialization should probably simply return something like std::hash<char>()(std::tolower(ch)).
That said, I'm somewhat doubtful that this will actually net you a performance gain, compared to just converting everything to lower case. The skip table for chars uses an optimization that uses arrays instead of unordered_maps for faster indexing and fewer heap allocations. This optimization is not used for a custom type such as casechar.
Here is some sample code:
std::vector<std::wstring> names;
names.push_back(L"Rahul");
names.push_back(L"John");
names.push_back(L"Alexa");
names.push_back(L"Tejas");
names.push_back(L"Alexandra");
std::vector<std::wstring> pattern;
pattern.push_back(L"Tejas");
auto itr = boost::algorithm::boyer_moore_search<std::vector<std::wstring>,
std::vector<std::wstring>>(names, pattern);
if (itr != names.end())
{
OutputDebugString(std::wstring(L"pattern found in the names " + *itr).c_str());
}
else
{
OutputDebugString(std::wstring(L"pattern not found in the names").c_str());
}
For a working demo of the code, I have created a video:
Boyer moore search | Boost api | C++ tutorial