Is it possible to detect whether a string is 'all numeric' using tr1 regex?
If so, please help me with a snippet as well, since I am new to regex.
The reason I am looking at tr1 regex for this is that I don't want to create a separate function for detecting whether the string is numeric. I want to do it inline in the rest of the client code, but I don't want it to look ugly either. I feel tr1 regex might help, but I'm not sure. Any advice on this?
If you just want to test whether the string has all numeric characters, you can use std::find_if_not and std::isdigit:
std::find_if_not(s.begin(), s.end(), (int(*)(int))std::isdigit) == s.end()
If you do not have a Standard Library implementation with std::find_if_not, you can easily write it:
template <typename ForwardIt, typename Predicate>
ForwardIt find_if_not(ForwardIt first, ForwardIt last, Predicate pred)
{
    for (; first != last; ++first)
        if (!pred(*first))  // test the element, not the iterator itself
            return first;
    return first;
}
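As a related sketch of the same idea, the check can also be written with std::all_of from C++11. The helper name is_all_digits is made up for illustration, and treating the empty string as "not numeric" is an assumption you may want to change. The lambda takes unsigned char because passing a negative char value to std::isdigit is undefined behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true when s is non-empty and every character is a decimal digit.
bool is_all_digits(const std::string& s)
{
    return !s.empty() &&
           std::all_of(s.begin(), s.end(), [](unsigned char c) {
               // unsigned char avoids undefined behavior for negative char values
               return std::isdigit(c) != 0;
           });
}
```

Used inline, this reads as if (is_all_digits(input)) { ... }.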
You can use the string::find_first_not_of member function to test for numeric characters.
if (mystring.find_first_not_of("0123456789") == std::string::npos)
{
    std::cout << "numeric only!";
}
The regular expression for this is rather trivial. Just search for "\\D", which matches any character that's not a digit; if the search finds nothing, the string is all numeric. If you'd like to allow a decimal separator too, you could use "[^\\d\\.]", which translates to "not a digit or dot".
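A minimal sketch of both directions, assuming C++11 <regex> rather than TR1 (with TR1 the names live in std::tr1 instead). The function names are invented for this example. The first helper asks whether the whole string consists of digits, the second asks whether any non-digit occurs.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the whole string must be one or more digits.
bool is_numeric_regex(const std::string& s)
{
    static const std::regex digits_only("\\d+");
    return std::regex_match(s, digits_only);   // regex_match requires a full match
}

// Hypothetical helper: true if at least one non-digit character is present.
bool contains_non_digit(const std::string& s)
{
    static const std::regex non_digit("\\D");
    return std::regex_search(s, non_digit);    // regex_search looks anywhere in s
}
```

Note that the two differ on the empty string: is_numeric_regex("") is false, while contains_non_digit("") is also false.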
However, how about simply using strtol() to read the number? It gives you a pointer to the first character that is not part of the number. If that pointer is at the end of the string, the whole string was consumed as a number. The upside is that you don't even need TR1 for this.
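Here is a rough sketch of the strtol() approach. The helper name parses_as_long and the choice to reject out-of-range values are assumptions for illustration. Keep in mind that strtol also accepts leading whitespace and a sign, so this is "parses as an integer" rather than strictly "digits only".

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: true if the whole string parses as a base-10 long.
bool parses_as_long(const std::string& s, long& out)
{
    if (s.empty())
        return false;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(s.c_str(), &end, 10);
    if (errno == ERANGE)
        return false;              // value does not fit in a long
    if (end == s.c_str() || *end != '\0')
        return false;              // nothing parsed, or trailing characters remain
    out = value;
    return true;
}
```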
If I want to verify that a string exactly matches any one of the strings in a vector, I can use
std::find(vectOfStrings.begin(), vectOfStrings.end(), "<targetString>") != vectOfStrings.end()
If the target string matches any string in the vector, this returns true.
But what if I want to check whether a string matches any one of a vector of regular expressions?
Is there anything in the standard library I can use to make it work like
std::find(vectOfRegExprsns.begin(), vectOfRegExprsns.end(), "<targetString>") != vectOfRegExprsns.end()?
Any suggestions would be highly appreciated.
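How about using std::find_if() with a lambda? Note that the target string is the subject of the match and each vector element is the pattern:
std::find_if(
    vectOfRegExprsns.begin(), vectOfRegExprsns.end(),
    [&targetString](const std::string& item) {
        return std::regex_match(targetString, std::regex(item));
    }) != vectOfRegExprsns.end();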
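If the patterns are used more than once, it may be worth storing compiled std::regex objects instead of strings, so each pattern is compiled only once. A minimal sketch, assuming a std::vector<std::regex> and using the made-up helper name matches_any:

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true if target fully matches at least one pattern.
bool matches_any(const std::vector<std::regex>& patterns, const std::string& target)
{
    return std::any_of(patterns.begin(), patterns.end(),
                       [&target](const std::regex& re) {
                           return std::regex_match(target, re);
                       });
}
```

Swap std::regex_match for std::regex_search if a partial match anywhere in the target should count.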
I can't get my head around this. I'm trying to remove all occurrences of a certain character within a string until the string becomes empty. I know we can remove all character occurrences from an std::string by using the combination of string::erase and std::remove like so:
s.erase(remove(s.begin(), s.end(), '.'), s.end());
where the '.' is the actual character to be removed. It even works if I try to remove certain characters. Now let's consider the following string: 'abababababababa'. What I'm trying to achieve is to reduce this string to ashes by removing all 'a's for starters, which will leave me with a couple of 'b's, and then removing all those 'b's, which will leave me with an empty string. Of course this is just a part of my task, but I could narrow the problem down to this. Here's my naive approach based on the above combination of functions:
string s = "abababababababa";
while (!s.empty()) {
    ...
    s.erase(remove(s.begin(), s.end(), s[0]), s.end());
    ...
}
Of course it doesn't work, I just can't seem to find out why. By debugging the application I can see how the "s" string is being modified. While the s.erase... works perfectly if I set a character constant for remove's third parameter it fails if I try to use char variables. Here's what the s string looks like after each iteration:
Removing[a] from [abababababababa] Result: baaaaaaa
Removing[b] from [baaaaaaa] Result: a
Removing[a] from [a] Result: -
While I expected 2 operations until the string becomes empty (which works if I hardcode the letters by hand and call s.erase twice), it actually takes 3 iterations. The most frustrating part, however, is that while I'm removing 'a' in the first iteration, only the first 'a' is removed, along with all of the 'b's.
Why is this happening? Is it because of how erase / remove works internally?
You have undefined behavior.
You get these results because std::remove takes the value to remove by reference, and s[0] is itself an element of the range being rewritten. Once std::remove overwrites s[0] with another character, the reference no longer refers to the value you meant to remove.
The simple solution is to copy s[0] into a local variable and pass that variable instead.
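A minimal sketch of that fix, wrapped in a made-up function strip_until_empty that returns the number of passes so the behavior is easy to check:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: repeatedly removes every occurrence of the first
// character until the string is empty, and returns how many passes it took.
int strip_until_empty(std::string s)
{
    int passes = 0;
    while (!s.empty()) {
        const char c = s[0];  // copy the value; do not pass s[0] itself
        s.erase(std::remove(s.begin(), s.end(), c), s.end());
        ++passes;
    }
    return passes;
}
```

With the copy in place, "abababababababa" is emptied in two passes, as the question expected.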
The behavior of the remove() function template is equivalent to:
template <class ForwardIterator, class T>
ForwardIterator remove (ForwardIterator first, ForwardIterator last, const T& val)
{
    ForwardIterator result = first;
    while (first!=last) {
        if (!(*first == val)) {
            *result = move(*first);
            ++result;
        }
        ++first;
    }
    return result;
}
As you can see, the function moves the elements that differ from val to the front of the range.
So in your case, with "ababababab",
if you call remove() as you did, s[0] is initially 'a', but it is overwritten with 'b' during the remove() call. From then on the code removes 'b's instead, so the result is wrong.
As Joachim says, assign s[0] to a temporary variable.
The code is taken from http://www.cplusplus.com/reference/algorithm/remove/?kw=remove
I'm reading the documentation on std::regex_iterator<std::string::iterator> since I'm trying to learn how to use it for parsing HTML tags. The example the site gives is
#include <iostream>
#include <string>
#include <regex>
int main ()
{
    std::string s ("this subject has a submarine as a subsequence");
    std::regex e ("\\b(sub)([^ ]*)"); // matches words beginning by "sub"
    std::regex_iterator<std::string::iterator> rit ( s.begin(), s.end(), e );
    std::regex_iterator<std::string::iterator> rend;
    while (rit!=rend) {
        std::cout << rit->str() << std::endl;
        ++rit;
    }
    return 0;
}
(http://www.cplusplus.com/reference/regex/regex_iterator/regex_iterator/)
and I have one question about that: If rend is never initialized, then how is it being used meaningfully in the rit!=rend?
Also, is this the tool I should be using for getting attributes out of HTML tags? What I want to do is take a string like "class='class1 class2' id = 'myId' onclick ='myFunction()' >" and break it into pairs
("class", "class1 class2"), ("id", "myId"), ("onclick", "myFunction()")
and then work with them from there. The regular expression I'm planning to use is
([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2
and so I plan to iterate through expression of that type while keeping track of whether I'm still in the tag (i.e. whether I've passed a '>' character). Is it going to be too hard to do this?
Thank you for any guidance you can offer me.
What do you mean by "if rend is never initialized"? Clearly, std::regex_iterator<I> has a default constructor. Since the iteration is forward-only, the end iterator just needs to be something suitable for detecting that the end has been reached. The default constructor can set up rend accordingly.
This is an idiom used in a few other places in the standard C++ library, e.g., for std::istream_iterator<T>. Ideally, the end iterator could be indicated using a different type (see, e.g., Eric Niebler's discussion on this issue, the link is to the first of four pages) but the standard currently requires that the two types match when using algorithms.
With respect to parsing HTML using regular expression please refer to this answer.
rend is not uninitialized, it is default-constructed. The page you linked is clear that:
The default constructor (1) constructs an end-of-sequence iterator.
Since default-construction appears to be the only way to obtain an end-of-sequence iterator, comparing rit to rend is the correct way to test whether rit is exhausted.
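To illustrate the default-constructed end iterator on the attribute string from the question, here is a rough sketch. The function name parse_attributes is invented, and the pattern is the asker's with one small change: the hyphen is placed unescaped at the end of the character class. This is only a demonstration of the iterator idiom on well-behaved input, not a general HTML parser.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (name, value) pairs such as ("id", "myId").
std::vector<std::pair<std::string, std::string>>
parse_attributes(const std::string& text)
{
    // Group 1: attribute name, group 2: opening quote, group 3: value,
    // \2: the same quote character that opened the value.
    static const std::regex attr("([A-Za-z0-9-]+)\\s*=\\s*(['\"])(.*?)\\2");

    std::vector<std::pair<std::string, std::string>> result;
    const std::sregex_iterator end;  // default-constructed: end-of-sequence
    for (std::sregex_iterator it(text.begin(), text.end(), attr); it != end; ++it) {
        result.emplace_back((*it)[1].str(), (*it)[3].str());
    }
    return result;
}
```

std::sregex_iterator is the standard typedef for std::regex_iterator<std::string::const_iterator>, which is why it works with a const std::string here.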
I'm working on a multithreading project where, for one segment of the project, I need to find out whether a given character sequence exists within a string. I'm wondering if C or C++ has any pre-built functions which can handle this, but I am having trouble figuring out the exact 'definition' to search for.
I know about 'strstr' and 'find'; the issue is that the function needs to be able to find a sequence which is SPLIT across a string.
Given the string 'Hello World', I need a function that returns true if the sequence 'H-W-l' exists. Is there anything prebuilt which can handle this?
As far as I know, subsequence searching as such is not part of either the standard C library or the standard C++ library.
However, you can express subsequence searching as either a regular expression or a "glob". Posix mandates both regex and glob matching functions, while the C++ standard library includes regular expressions since C++11. Both of these techniques require modifying the search string:
Regular expression: HWl ⇒ H.*W.*l. regexec will do a search for the regular expression (unless anchored, which this one is not); in C++, you would want to use std::regex_search rather than std::regex_match.
Glob: HWl ⇒ *H*W*l*. Glob matching is always a complete match, although in all the implementations I know of a trailing * is optimized. This is available as the fnmatch function in the Posix header fnmatch.h. For this application, provide 0 for the flags parameter.
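The regular expression variant could be sketched in C++11 roughly as follows. The helper name has_subsequence_regex is made up, and the escaping of regex metacharacters is an addition that the description above does not mention; it is there so that a needle such as "a.b" is searched for literally. Note that in the default ECMAScript grammar "." does not match line terminators, so this sketch will not find subsequences that span multiple lines.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: turns "HWl" into "H.*W.*l" and searches for it.
bool has_subsequence_regex(const std::string& haystack, const std::string& needle)
{
    static const std::string metachars = "^$\\.*+?()[]{}|";
    std::string pattern;
    for (std::size_t i = 0; i < needle.size(); ++i) {
        if (i != 0)
            pattern += ".*";               // allow anything between needle characters
        if (metachars.find(needle[i]) != std::string::npos)
            pattern += '\\';               // treat regex metacharacters literally
        pattern += needle[i];
    }
    // regex_search rather than regex_match: the pattern is not anchored.
    return std::regex_search(haystack, std::regex(pattern));
}
```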
If you don't like any of the above, you can use the standard C strchr function in a simple loop:
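bool has_subsequence(const char* haystack, const char* needle) {
    const char* p;
    // After each hit, step past the matched character so that a repeated
    // needle character cannot match the same haystack position twice.
    for (p = haystack; *needle && (p = strchr(p, *needle)); ++needle, ++p) {
    }
    return p != NULL;
}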
If I understand correctly, you're trying to search for chars that appear in a given order but aren't necessarily contiguous. If you're in C++, I don't see why you couldn't use the std::find function from the <algorithm> header. I would load both into a string and then search as follows:
bool has_noncontig_sequence(const std::string& str, const std::string& subStr)
{
    typedef std::string::const_iterator iter;
    iter start = str.begin();
    // loop over subStr, searching for each character after the previous hit
    for (iter i = subStr.begin(); i != subStr.end(); ++i)
    {
        start = std::find(start, str.end(), *i);
        if (start == str.end())
            return false;  // character not found in the rest of str
        ++start;           // step past the hit so repeated characters need separate positions
    }
    return true;
}
The std::find function positions start on the next occurrence of the wanted character in str if there is one, and the search for the following character resumes just after it. If there isn't one, start ends up at str.end(), indicating failure.
I've successfully used:
boost::algorithm::boyer_moore_search<const char *,const char *>( haystack, haystack_end, needle, needle_end )
to look for a needle in a haystack. Now I'd like to use BM_search to do a case-insensitive search for the needle. Since my haystack is giant, my plan is to convert the needle to lower case and have the haystack iterator treat the haystack characters as a special class whose compare function converts alphabetics to lower case before comparing. However, I haven't been able to express this correctly. I'm trying:
class casechar {
public:
    char ch;
    // other stuff...it's not right, but I don't think the compiler is even getting this far
} ;
class caseiter : public std::iterator<random_access_iterator_tag,casechar> {
    const casechar *ptr;
public:
    // various iterator functions, but not enough of them, apparently!
} ;
boost::algorithm::boyer_moore_search<const caseiter,const char *>( HaYsTaCk, HaYsTaCk_EnD, needle, needle_end );
The compiler (g++ on OSX) is complaining about an attempt to instantiate hash<casechar>, I guess for some BM internal thing. I'm lost in a maze of template<twisty_passages,all_different>. Could I impose on someone for a bit of direction? I suspect I just need to provide certain implementations in casechar and/or caseiter, but I don't know which ones.
Thanks!
The first problem you will run into is this:
BOOST_STATIC_ASSERT (( boost::is_same<
    typename std::iterator_traits<patIter>::value_type,
    typename std::iterator_traits<corpusIter>::value_type>::value ));
This requires the value types of the iterator for the pattern and the iterator for the corpus to be the same type. In other words, you'll need to use casechar for the pattern as well.
Here's what I'd do:
Write a casechar, with custom operator == etc for case-insensitive comparison.
No need to write a custom iterator. const casechar * is a perfectly acceptable random access iterator.
Write a std::hash<casechar> specialization. This specialization should probably simply return something like std::hash<char>()(std::tolower(ch)).
That said, I'm somewhat doubtful that this will actually net you a performance gain, compared to just converting everything to lower case. The skip table for chars uses an optimization that uses arrays instead of unordered_maps for faster indexing and fewer heap allocations. This optimization is not used for a custom type such as casechar.
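For reference, the casechar type and the hash specialization described above might look roughly like this. It is only a sketch: the helper fold_case is an invented name, the Boost call itself is left out, and I have not verified that equality plus a hash is everything the Boost skip table needs from a custom type.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>

// A char wrapper whose comparison ignores case.
struct casechar {
    char ch;
};

// Hypothetical helper: lower-cases a char without passing a negative value to tolower.
inline char fold_case(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

inline bool operator==(casechar a, casechar b) { return fold_case(a.ch) == fold_case(b.ch); }
inline bool operator!=(casechar a, casechar b) { return !(a == b); }

namespace std {
// Equal casechars must hash equally, so hash the folded character.
template <>
struct hash<casechar> {
    std::size_t operator()(casechar c) const
    {
        return std::hash<char>()(fold_case(c.ch));
    }
};
}
```

Both the pattern and the corpus would then be iterated as const casechar *, as noted above.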
Here is some sample code:
std::vector<std::wstring> names;
names.push_back(L"Rahul");
names.push_back(L"John");
names.push_back(L"Alexa");
names.push_back(L"Tejas");
names.push_back(L"Alexandra");
std::vector<std::wstring> pattern;
pattern.push_back(L"Tejas");
auto itr = boost::algorithm::boyer_moore_search<std::vector<std::wstring>,
    std::vector<std::wstring>>(names, pattern);
if (itr != names.end())
{
    OutputDebugString(std::wstring(L"pattern found in the names " + *itr).c_str());
}
else
{
    OutputDebugString(std::wstring(L"pattern not found in the names").c_str());
}
For a working demo of the code, I have created a video; please see:
Boyer moore search | Boost api | C++ tutorial