C++: Matching regex, what is in smatch? [duplicate] - c++

I'm using a modified regex example from Stroustrup C++ 4th Ed. Page 127 & 128. I'm trying to understand what is in the vector smatch matches.
$ ./a.out
AB00000-0000
AB00000-0000.-0000.
$ ./a.out
AB00000
AB00000..
It seems like the matches in parentheses () appear in matches[1], matches[2], ... while the total match appears in matches[0].
Appreciate any insight into this.
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main(int argc, char *argv[])
{
// ZIP code pattern: XXddddd-dddd and variants
regex pat (R"(\w{2}\s*\d{5}(-\d{4})?)");
for (string line; getline(cin,line);) {
smatch matches; // matched strings go here
if (regex_search(line, matches, pat)) { // search for pat in line
for (auto p : matches) {
cout << p << ".";
}
}
cout << endl;
}
return 0;
}

The type of matches is a std::match_results, not a vector, but it does have an operator[].
From the reference:
If n == 0, returns a reference to the std::sub_match representing the part of the target sequence matched by the entire matched regular expression.
If n > 0 and n < size(), returns a reference to the std::sub_match representing the part of the target sequence that was matched by the nth captured marked subexpression.
where n is the argument to operator[]. So matches[0] contains the entire matched expression, and matches[1], matches[2], ... contain consecutive capture group expressions.

Related

Avoid empty elements in match when optional substrings are not present

I am trying to create a regex that match the strings returned by diff terminal command.
These strings start with a decimal number, might have a substring composed by a comma and a number, then a mandatory character (a, c, d) another mandatory decimal number followed by another optional group as the one before.
Examples:
27a27
27a27,30
28c28
28,30c29,31
1d1
1,10d1
I am trying to extract all the groups separately but the optional ones without ,.
I am doing this in C++:
#include<iostream>
#include<string>
#include<fstream>
#include <regex>
using namespace std;
int main(int argc, char* argv[])
{
string t = "47a46";
std::string result;
std::regex re2("(\\d+)(?:,(\\d+))?([acd])(\\d+)(?:,(\\d+))?");
std::smatch match;
std::regex_search(t, match, re2);
cout<<match.size()<<endl;
cout<<match.str(0)<<endl;
if (std::regex_search(t, match, re2))
{
for (int i=1; i<match.size(); i++)
{
result = match.str(i);
cout<<i<<":"<<result<< " ";
}
cout<<endl;
}
return 0;
}
The string variable t is the string I want to manipulate.
My regular expression
(\\d+)(?:,(\\d+))?([acd])(\\d+)(?:,(\\d+))?
is working, but with strings that do not have the optional subgroups (such as 47a46), the match variable will contain empty elements in the corresponding positions of the expected substrings.
For example in the program above the elements of match (preceded by their index) are:
1:47 2: 3:a 4:46 5:
Elements in position 2 and 5 correspond to the optional substring that in this case are not present so I would like match to avoid retrieving them so that it would be:
1:47 2:a 3:46
How can I do it?
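I think the best RE for you would be like this:
std::regex re2(R"((\d+)(?:,\d+)?([a-z])(\d+)(?:,\d+)?)");
- that way only the required groups are captured; the optional ",number" parts are still matched but no longer captured, so they leave no empty elements in match. Note that [a-z] accepts any lowercase letter; use [acd] to stay strict.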
output:
4
47a46
1:47 2:a 3:46
Note: re2's argument is written as a C++11 raw string literal, so the backslashes do not need to be doubled.
EDIT: simplified RE a bit

C++ regex finds only 1 sub match [duplicate]

// Example program
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string strr("1.0.0.0029.443");
std::regex rgx("([0-9])");
std::smatch match;
if (std::regex_search(strr, match, rgx)) {
for(int i=0;i<match.size();i++)
std::cout << match[i] << std::endl;
}
}
this program should write
1
0
0
0
0
2
9
4
4
3
but it writes
1
1
checked it here http://cpp.sh/ and on visual studio, both same results.
Why does it find only 2 matches and why are they same?
As I understand from the answers here, regex_search stops at the first match, and the match variable holds the (sub)string values needed to continue searching for further matches by repeating the call. Also, since it stops at the first match, the () characters only define sub-matches within that result.
Being called once, regex_search returns only the first match in the match variable. The collection in match comprises the match itself and capture groups if there are any.
In order to get all matches call regex_search in a loop:
while(regex_search(strr, match, rgx))
{
std::cout << match[0] << std::endl;
strr = match.suffix();
}
Note that in your case the first capture group is the same as the whole match, so there is no need for the group and you may define the regex simply as [0-9] (without parentheses).
Demo: https://ideone.com/pQ6IsO
Problems:
Using if only gives you one match. You need to use a while loop to find all the matches. You need to search past the previous match in the next iteration of the loop.
std::smatch::size() returns 1 + the number of capture groups in the regex, not the number of matches found. See its documentation. std::smatch can contain sub-matches. To get the entire matched text, use match[0].
Here's an updated version of your program:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string strr("1.0.0.0029.443");
std::regex rgx("([0-9])");
std::smatch match;
while (std::regex_search(strr, match, rgx)) {
std::cout << match[0] << std::endl;
strr = match.suffix();
}
}

C++11 regex matching capturing group multiple times

Could someone please help me to extract the text between the : and the ^ symbols using a JavaScript (ECMAScript) regular expression in C++11. I do not need to capture the hw-descriptor itself - but it does have to be present in the line in order for the rest of the line to be considered for a match. Also the :p....^, :m....^ and :u....^ can arrive in any order and there has to be at least 1 present.
I tried using the following regular expression:
static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
against the following text line:
"hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^"
Here is the code, which I also posted on a live Coliru. It shows how I attempted to solve this problem; however, I am only getting 1 match. I need to see how to extract each of the potential 3 matches corresponding to the p, m or u characters described earlier.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <regex>
int main()
{
static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^";
// I seem to only get 1 match here, I was expecting
// to loop through each of the matches, looks like I need something like
// a pcre global option but I don't know how.
std::for_each(std::sregex_iterator(foo.cbegin(), foo.cend(), gRegex), std::sregex_iterator(),
[&](const auto& rMatch) {
for (int i=0; i< static_cast<int>(rMatch.size()); ++i) {
std::cout << rMatch[i] << std::endl;
}
});
}
The above program gives the following output:
g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
:uTEXT3^
TEXT3
With std::regex, you cannot keep multiple repeated captures when matching a string with consecutive repeated patterns: a capture group inside a quantifier only retains its last iteration.
What you may do is to match the overall texts containing the prefix and the repeated chunks, capture the latter into a separate group, and then use a second smaller regex to grab all the occurrences of the substrings you want separately.
The first regex here may be
hw-descriptor((?::[pmu][^^]*\\^)+)
See the online demo. It will match hw-descriptor and ((?::[pmu][^^]*\\^)+) will capture into Group 1 one or more repetitions of :[pmu][^^]*\^ pattern: :, p/m/u, 0 or more chars other than ^ and then ^. Upon finding a match, use :[pmu][^^]*\^ regex to return all the real "matches".
C++ demo:
static const std::regex gRegex("hw-descriptor((?::[pmu][^^]*\\^)+)", std::regex::icase);
static const std::regex lRegex(":[pmu][^^]*\\^", std::regex::icase);
std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^ hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^";
std::smatch smtch;
for(std::sregex_iterator i = std::sregex_iterator(foo.begin(), foo.end(), gRegex);
i != std::sregex_iterator();
++i)
{
std::smatch m = *i;
std::cout << "Match value: " << m.str() << std::endl;
std::string x = m.str(1);
for(std::sregex_iterator j = std::sregex_iterator(x.begin(), x.end(), lRegex);
j != std::sregex_iterator();
++j)
{
std::cout << "Element value: " << (*j).str() << std::endl;
}
}
Output:
Match value: hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
Element value: :pTEXT1^
Element value: :mTEXT2^
Element value: :uTEXT3^
Match value: hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^
Element value: :pTEXT8^
Element value: :mTEXT8^
Element value: :uTEXT83^

C++ std::regex track 2

I want to extract track 2 data from a string, using std::regex in C++.
I have a piece of code, but it does not work. This is the code:
std::string buff("this is ateststring;5581123456781323=160710212423468?hjks");
std::regex e (";\d{0,19}=\d{7}\w*\?", std::regex_constants::basic);
if(std::regex_match(buff, e))
{
cout << "Found!";
}
From the regex_match documentation:
The entire target sequence must match the regular expression for this function to return true (i.e., without any additional characters before or after the match). For a function that returns true when the match is only part of the sequence, see regex_search.
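So try using regex_search instead. The backslashes also have to be doubled inside an ordinary string literal, and the basic (POSIX BRE) grammar does not understand \d, \w or an unescaped {m,n}, so use the default ECMAScript grammar:
std::regex e (";\\d{0,19}=\\d{7}\\w*\\?");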
if(std::regex_search(buff, e))
{
cout << "Found!";
}

Get String Between 2 Strings

How can I get a string that is between two other declared strings, for example:
String 1 = "[STRING1]"
String 2 = "[STRING2]"
Source:
"832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
How can I get the "I need this text here"?
Since this is homework, only clues:
Find index1 of occurrence of String1
Find index2 of occurrence of String2
Substring from index1+lengthOf(String1) (inclusive) to index2 (exclusive) is what you need
Copy this to a result buffer if necessary (don't forget to null-terminate)
Might be a good case for std::regex, which is part of C++11.
#include <iostream>
#include <string>
#include <regex>
int main()
{
using namespace std::string_literals;
auto start = "\\[STRING1\\]"s;
auto end = "\\[STRING2\\]"s;
std::regex base_regex(start + "(.*)" + end);
auto example = "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"s;
std::smatch base_match;
std::string matched;
if (std::regex_search(example, base_match, base_regex)) {
// The first sub_match is the whole string; the next
// sub_match is the first parenthesized expression.
if (base_match.size() == 2) {
matched = base_match[1].str();
}
}
std::cout << "example: \""<<example << "\"\n";
std::cout << "matched: \""<<matched << "\"\n";
}
Prints:
example: "832h0ufhu0sdf4[STRING1]I need this text here[STRING2]afyh0fhdfosdfndsf"
matched: "I need this text here"
What I did was create a program that builds two strings, start and end, that serve as my start and end markers. I then build a regular expression that looks for those and captures anything in between (including nothing). Then I use regex_search to find the matching part of the input, and set matched to the captured group.
For more info, see http://en.cppreference.com/w/cpp/regex and http://en.cppreference.com/w/cpp/regex/regex_search
Use strstr http://www.cplusplus.com/reference/clibrary/cstring/strstr/ . Calling it once for each marker gives you 2 pointers; check that neither is null and that pointer1 < pointer2, and if so, read the chars between the end of the first marker and pointer2.