Regex - count all numbers - c++

I'm looking for a regex pattern that returns true if a given string contains 7 digits. The digits can appear anywhere in the string, so a string such as "100 my, str1ng y000" should match.

A plain regex search won't check for an exact count: it returns true even if the string contains more than 7 digits, because it only looks for at least 7 digits.
You can use the code below (JavaScript) to test for an exact number of digits (7 in your case) in any string:
var temp = "100 my, str1ng y000 3c43fdgd";
var count = (temp.match(/\d/g) || []).length;
alert(count == 7);

Here is a C++ example that shows:
a regex for extracting digit groups
a regex for matching at least 7 digits
whether the requested predicate matches
the number of digits in the string (no regex needed)
the groups of digits
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>
// Our test data
std::string testData("100 my, str1ng y000");
std::regex re1(R"#((\d+))#"); // For extracting digit groups
// For regex_match: the whole string must contain at least 7 digits, wherever they are
std::regex re2(R"#((?:\D*\d){7,}\D*)#");
int main(void)
{
    // Define the variable id as vector of string and use the range constructor to read the test data and tokenize it
    std::vector<std::string> id{ std::sregex_token_iterator(testData.begin(), testData.end(), re1, 1), std::sregex_token_iterator() };
    // Match the regex. Should have at least 7 digits somewhere
    std::smatch base_match;
    bool containsAtLeast7Digits = std::regex_match(testData, base_match, re2);
    // Show result on screen
    std::cout << "\nEvaluating string '" << testData <<
        "'\n\nThe predicate 'contains-at-least-7-digits' is " << std::boolalpha << containsAtLeast7Digits <<
        "\n\nIt contains overall " <<
        std::count_if(
            testData.begin(),
            testData.end(),
            [](const char c) {
                // Cast to unsigned char: std::isdigit is undefined for negative values
                return std::isdigit(static_cast<unsigned char>(c)) != 0;
            }
        ) << " digits and " << id.size() << " digit groups. These are:\n\n";
    // Print complete vector to std::cout
    std::copy(id.begin(), id.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}
Please note: use std::count_if for counting. It is faster and simpler than a regex.
Hope this helps . . .

Related

c++11 regexp retrieving all groups with +/* modifiers

I don't understand how to retrieve all groups using regexp in c++
An example:
const std::string s = "1,2,3,5";
std::regex lrx("^(\\d+)(,(\\d+))*$");
std::smatch match;
if (std::regex_search(s, match, lrx))
{
    int i = 0;
    for (auto m : match)
        std::cout << " submatch " << i++ << ": "<< m << std::endl;
}
Gives me the result
submatch 0: 1,2,3,5
submatch 1: 1
submatch 2: ,5
submatch 3: 5
I am missing 2 and 3
You cannot use the current approach, because std::regex does not keep every value captured by a repeated group. Each time the group captures part of the string, the previous value is overwritten, so only the last captured value is available after a match is found and returned. And since you defined 3 capturing groups in the pattern, you get 3+1 groups in the output.
Mind also, that std::regex_search only returns one match, while you will need multiple matches here.
So, what you may do is to perform 2 steps: 1) validate the string using the pattern you have (no capturing is necessary here), 2) extract the digits (or split with a comma, that depends on the requirements).
A C++ demo:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main() {
    std::regex rx_extract("[0-9]+");
    std::regex rx_validate(R"(^\d+(?:,\d+)*$)");
    std::string s = "1,2,3,5";
    if (regex_match(s, rx_validate)) {
        for (std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), rx_extract);
             i != std::sregex_iterator();
             ++i)
        {
            std::smatch m = *i;
            std::cout << m.str() << '\n';
        }
    }
    return 0;
}
Output:
1
2
3
5

Is it possible to find two strings in one string using regular expressions? [duplicate]

I'm a bit confused about the following C++11 code:
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string haystack("abcdefabcghiabc");
    std::regex needle("abc");
    std::smatch matches;
    std::regex_search(haystack, matches, needle);
    std::cout << matches.size() << std::endl;
}
I'd expect it to print out 3 but instead I get 1. Am I missing something?
You get 1 because regex_search returns only one match, and size() returns the number of capture groups plus one for the whole match.
Your matches object is an:
Object of a match_results type (such as cmatch or smatch) that is filled by this function with information about the match results and any submatches found.
If [the regex search is] successful, it is not empty and contains a series of sub_match objects: the first sub_match element corresponds to the entire match, and, if the regex expression contained sub-expressions to be matched (i.e., parentheses-delimited groups), their corresponding sub-matches are stored as successive sub_match elements in the match_results object.
Here is code that will find multiple matches:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main() {
    string str("abcdefabcghiabc");
    int i = 0;
    regex rgx1("abc");
    smatch smtch;
    while (regex_search(str, smtch, rgx1)) {
        std::cout << i << ": " << smtch[0] << std::endl;
        i += 1;
        str = smtch.suffix().str();
    }
    return 0;
}
See IDEONE demo returning abc 3 times.
Since this method destroys the input string, here is an alternative based on std::sregex_iterator (use std::wsregex_iterator when your subject is a std::wstring object):
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::regex r("ab(c)");
    std::string s = "abcdefabcghiabc";
    for (std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
         i != std::sregex_iterator();
         ++i)
    {
        std::smatch m = *i;
        std::cout << "Match value: " << m.str() << " at Position " << m.position() << '\n';
        std::cout << " Capture: " << m[1].str() << " at Position " << m.position(1) << '\n';
    }
    return 0;
}
See IDEONE demo, returning
Match value: abc at Position 0
Capture: c at Position 2
Match value: abc at Position 6
Capture: c at Position 8
Match value: abc at Position 12
Capture: c at Position 14
What you're missing is that matches is populated with one entry for each capture group (including the entire matched substring as the 0th capture).
If you write
std::regex needle("a(b)c");
then you'll get matches.size()==2, with matches[0]=="abc", and matches[1]=="b".
EDIT: Some people have downvoted this answer. That may be for a variety of reasons, but if it is because it does not apply to the answer I criticized (no one left a comment to explain the decision), they should take note that W. Stribizew changed the code two months after I wrote this, and I was unaware of it until today, 2021-01-18. The rest of the answer is unchanged from when I first wrote it.
@stribizhev's solution has quadratic worst-case complexity for sane regular expressions, because it copies the remaining string on every iteration. For insane ones that can match the empty string (e.g. "y*"), it doesn't terminate. In some applications, these issues could be DoS attacks waiting to happen. Here's a fixed version:
string str("abcdefabcghiabc");
int i = 0;
regex rgx1("abc");
smatch smtch;
auto beg = str.cbegin();
while (regex_search(beg, str.cend(), smtch, rgx1)) {
    std::cout << i << ": " << smtch[0] << std::endl;
    i += 1;
    // Resume right after the end of this match, wherever it started
    beg = smtch[0].second;
    if (smtch.length(0) == 0) {
        // Empty match: step one character forward to guarantee progress
        if (beg == str.cend())
            break;
        ++beg;
    }
}
According to my personal preference, this will find n+1 matches of an empty regex in a string of length n. You might also just exit the loop after an empty match.
If you want to compare the performance for a string with millions of matches, add the following lines after the definition of str (and don't forget to turn on optimizations), once for each version:
for (int j = 0; j < 20; ++j)
str = str + str;

regex hanging my program

I'm trying to write a c++ regex to essentially match a few symbols and identifiers as part of a tokenizer. Currently, I have this:
EDITED
regex tokens("([a-zA-Z_][a-zA-Z0-9_]*)|(\\S?)|(\\S)");
vector<string> identifiers(std::sregex_token_iterator(str.begin(), str.end(),
    tokens), std::sregex_token_iterator());
https://regex101.com/r/mFTC1Y/2
The problem is that it hangs my program (it just takes forever and I never get to the matches). I don't understand how that can be. The regex tester I'm using says it takes about 7 ms to match...
Please help!
JUST EDITED: so this regex matches what I want, but only via group captures. If it parses:
main()
It will return
main( // full match
main // group 1
( // group 2
new match
) // full match
) // group 3
I just want the group matches without having to explicitly check the respective groups (i.e. I don't want the full match returned to me). How can I update my code to do that?
EDIT
So, this is the full, working code. I'd prefer it be more elegant.
regex TOKENS("([a-zA-Z_][a-zA-Z0-9_]*)|(\\S?)|(\\S)");
auto identifier = sregex_iterator(str.cbegin(), str.cend(), TOKENS);
auto it = sregex_iterator();
for_each(identifier, it, [&](smatch const& m){
    string group1(m[1].str());
    string group2(m[2].str());
    string group3(m[3].str());
    if(isKeyword(keywords, group1)) cout << "<keyword> " << group1 << " </keyword>" << endl;
    else if(group1 != "") cout << "<identifier> " << group1 << " </identifier>" << endl;
    if (isSymbol(symbols, group2)) cout << "<symbol> " << group2 << " </symbol>" << endl;
    if (isSymbol(symbols, group3)) cout << "<symbol> " << group3 << " </symbol>" << endl;
});
Something more elegant would probably come at the cost of a very complex regex, or else a very clever one, since essentially what I'm trying to do is tokenize code into one of three types: KEYWORD, ID and SYMBOL - all with one regex. Next I'll have to tackle INT/STRING const and comments. What I'm trying to avoid is tokenizing char by char, because then I'll have even more control-flow statements (which I don't want).
I am not sure if your regex is correct.
Try the code below:
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <vector>
#include <regex>
// Our test data (raw string). So, containing also \n and so on
std::string testData(
R"#( :-) IDcorrect1 _wrongID I2DCorrect
3FALSE lowercasecorrect Underscore_not_allowed
i3DCorrect,i4 :-)
}
)#");
std::regex re("(\\b[a-zA-Z][a-zA-Z0-9]*\\b)");
int main(void)
{
    // Define the variable id as vector of string and use the range constructor to read the test data and tokenize it
    std::vector<std::string> id{ std::sregex_token_iterator(testData.begin(), testData.end(), re, 1), std::sregex_token_iterator() };
    // For debug output. Print complete vector to std::cout
    std::copy(id.begin(), id.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}
All IDs will be in the vector. Then you can further check.

C++11 Regex IfThenElse - Single, closed brackets matched OR no brackets matched

How can I define a c++11/ECMAScript compatible regex statement that matches strings either:
Containing a single, closed, pair of round brackets containing an alphanumeric string of length greater than 0 - for example the regex statement "\(\w+\)", which correctly matches "(abc_123)" and ignores the incorrect "(abc_123", "abc_123)" and "abc_123". However, the above expression does not ignore input strings containing multiple balanced/unbalanced bracketing - I would like to exclude "((abc_123)", "(abc_123))", and "((abc_123))" from my matched results.
Or a single, alphanumeric word, without any unbalanced brackets - for example something like the regex statement "\w+" correctly matches "abc_123", but unfortunately incorrectly matches with "(abc_123", "abc_123)", "((abc_123)", "(abc_123))", and "((abc_123))"...
For clarity, the required matchings for each the test cases above are:
"abc_123" = Match,
"(abc_123)" = Match,
"(abc_123" = Not matched,
"abc_123)" = Not matched,
"((abc_123)" = Not matched,
"(abc_123))" = Not matched,
"((abc_123))" = Not matched.
I've been playing around with implementing the IfThenElse format suggested by http://www.regular-expressions.info/conditional.html, but haven't gotten very far... Is there some way to limit the number of occurrences of a particular group [e.g. "(\(){0,1}" matches zero or one left hand round bracket], AND pass the number of repetitions of a previous group to a later group [say "num\1" equals the number of times the "(" bracket appears in "(\(){0,1}", then I could pass this to the corresponding closing bracket group, "(\)){num\1}" say...]
This is probably not exactly what you want, and it is not really elegant, but...
With alternation (|) you get a better-than-nothing solution based on "\\(\\w+\\)|\\w+".
A full example follows:
#include <regex>
#include <string>
#include <iostream>
bool isMatch (std::string const & str)
{
    static std::regex const
        rgx { "\\(\\w+\\)|\\w+" };
    std::smatch srgx;
    // regex_match succeeds only if the whole string matches one of the alternatives
    return std::regex_match(str, srgx, rgx);
}
int main()
{
    std::cout << isMatch("abc_123") << std::endl; // print 1
    std::cout << isMatch("(abc_123)") << std::endl; // print 1
    std::cout << isMatch("(abc_123") << std::endl; // print 0
    std::cout << isMatch("abc_123)") << std::endl; // print 0
    std::cout << isMatch("((abc_123)") << std::endl; // print 0
    std::cout << isMatch("(abc_123))") << std::endl; // print 0
    std::cout << isMatch("((abc_123))") << std::endl; // print 0
}

C++ RegEx matching - Pull the matching numbers

Ok, so I'm working with C++ regex and I'm not quite sure how to go about extracting the numbers that I want from my expression.
I'm building an expression BASED on numbers, but not sure how to pull them back out.
Here's my string:
+10.7% Is My String +5 And Some Extra Stuff Here
I use that string to pull the numbers
10 , 7 , 5 out and add them to a vector, no big deal.
I then change that string to become a regex expression.
\+([0-9]+)\.([0-9]+)% Is My String \+([0-9]+) And Some Extra Stuff Here
Now how do I go about using that regexp expression to MATCH my starting string and extracting the numbers back out.
Something along the lines of using the match table?
You must iterate over the submatches to extract them.
Example:
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::string input = "+10.7% Is My String +5 And Some Extra Stuff Here";
    std::regex rx("\\+([0-9]+)\\.([0-9]+)% Is My String \\+([0-9]+) And Some Extra Stuff Here");
    std::smatch match;
    if (std::regex_match(input, match, rx))
    {
        for (std::size_t i = 0; i < match.size(); ++i)
        {
            std::ssub_match sub_match = match[i];
            std::string num = sub_match.str();
            std::cout << " submatch " << i << ": " << num << std::endl;
        }
    }
}
Output:
submatch 0: +10.7% Is My String +5 And Some Extra Stuff Here
submatch 1: 10
submatch 2: 7
submatch 3: 5
live example: https://ideone.com/01XJDF