PCRE (PHP) regex equivalent in C++

Hello, this is a PCRE regex (PHP regex):
/\h*(.*?)\h*[=]\h*("(.*?(?:[\\\\]".*?)*)")\h*([,|.*?])/
This regex works for this string:
data1 = "value 1", data2 = "value 2", data3 = " data4(" hey ") ",
and gets:
data, data2, data3
val, val2, data4("val3")
What is the equivalent of this regex in C++ (std::regex)?

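Replace \h with \s, since the default ECMAScript grammar of std::regex has no \h, and inside a raw string literal write [\\] instead of [\\\\], because a raw string needs no extra escaping for the compiler. The double quotes in the pattern are written as \" so that the sequence )" does not end the raw string literal early.
Refer to the following example code: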
#include <string>
#include <iostream>
#include <regex>

int main() {
    std::string pat = R"(\s*(.*?)\s*=\s*(\"(.*?(?:[\\]\".*?)*)\")\s*([,|.*?]))";
    std::regex r(pat);
    std::cout << pat << "\n";
    std::string s = R"(data1 = "value 1", data2 = "value 2", data3 = " data4(" hey ") ",)";
    std::cout << s << "\n";
    for (std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
         i != std::sregex_iterator();
         ++i)
    {
        std::smatch m = *i;
        std::cout << "Capture 1: " << m[1].str() << " at Position " << m.position(1) << '\n';
        std::cout << "Capture 3: " << m[3].str() << " at Position " << m.position(3) << '\n';
    }
    return 0;
}


C++ regex: how to extract given char values

How do I extract the letter and number values?
std::regex legit_command("^\\([A-Z]+[0-9]+\\-[A-Z]+[0-9]+\\)$");
std::string input;
Let's say the user types in
(AA11-BB22)
I want to get
first_character = "aa"
first_number = 11
second_character = "bb"
second_number = 22
You could use capture groups. In the example below, I replaced (AA11+BB22) with (AA11-BB22) to match the regex you posted. Note that regex_match succeeds only if the entire string matches the pattern, so the beginning- and end-of-line assertions (^ and $) are not required.
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    const string input = "(AA11-BB22)";
    const regex legit_command("\\(([A-Z]+)([0-9]+)-([A-Z]+)([0-9]+)\\)");
    smatch matches;
    if (regex_match(input, matches, legit_command)) {
        cout << "first_character " << matches[1] << endl;
        cout << "first_number " << matches[2] << endl;
        cout << "second_character " << matches[3] << endl;
        cout << "second_number " << matches[4] << endl;
    }
}
Output:
$ c++ main.cpp && ./a.out
first_character AA
first_number 11
second_character BB
second_number 22

C++ regex: letters without the equal sign

I'm new to regex and C++.
My problem is that '=' matches when I search for [a-zA-Z]. Shouldn't this match only the letters a-z, not '='?
Can anyone help me, please?
string string1 = "s=s;";
enum states state = s1;
regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
regex rg_left_letter("[a-zA-Z]");
regex rg_equal("[=]");
regex rg_right_letter("[a-zA-Z0-9]");
regex rg_semicolon("[;]");
for (const auto &s : string1) {
    cout << "Current Value: " << s << endl;
    // step(&state, s);
    if (regex_search(&s, rg_left_letter)) {
        cout << "matching: " << s << endl;
    } else {
        cout << "not matching: " << s << endl;
    }
    // cout << "Step Executed with sate: " << state << endl;
}
This outputs:
Current Value: s
matching: s
Current Value: =
matching: =
Current Value: s
matching: s
Current Value: ;
not matching: ;
When you write
regex_search(&s, rg_left_letter)
you search the C string that starts at &s, that is, everything from the current character up to the terminating null character. Therefore, your loop searches for a match in each of the remaining substrings:
s=s;
=s;
s;
;
This succeeds every time except the last, because each of the first three substrings contains at least one letter that your regex matches. Note that this relies on the characters after &s ending with a null terminator. Since C++11, std::string stores its characters contiguously followed by '\0' (data() and c_str() point to the same buffer), so the code is not undefined behavior, but it is fragile and searches far more than the single character you intended.
What you really want is regex_match, which succeeds only if the whole string matches, used with your original statement regex:
#include <iostream>
#include <regex>
int main()
{
    std::regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
    if (std::regex_match("s=s;", statement)) { std::cout << "Hooray!\n"; }
}
This works for me (each character is passed to regex_match as a one-character string):
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main(void) {
    string string1 = "s=s;";
    // enum states state = s1;  (from the question; not needed here)
    regex statement("[a-zA-Z]+[=][a-zA-Z0-9]+[;]");
    regex rg_left_letter("[a-zA-Z]");
    regex rg_equal("[=]");
    regex rg_right_letter("[a-zA-Z0-9]");
    regex rg_semicolon("[;]");
    //for (const auto &s : string1) {
    for (size_t i = 0; i < string1.size(); i++) {
        cout << "Current Value: " << string1[i] << endl;
        // step(&state, s);
        // substr(i, 1) is a one-character string, so regex_match tests only that character
        if (regex_match(string1.substr(i, 1), rg_left_letter)) {
            cout << "matching: " << string1[i] << endl;
        } else {
            cout << "not matching: " << string1[i] << endl;
        }
        // cout << "Step Executed with state: " << state << endl;
    }
    cout << endl;
    return 0;
}

How to use regex_token_iterator<std::string::iterator> to get a submatch's position in the original string from the iterator itself?

Below is code that finds the matches of "\b(sub)([^ ]*)" in "this subject has a submarine as a subsequence". I also want to get the positions of those matches in the original string from the regex_token_iterator itself. The result should be 5, 19, 34.
// regex_token_iterator example
#include <iostream>
#include <string>
#include <regex>
int main ()
{
    std::string s ("this subject has a submarine as a subsequence");
    std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning by "sub"
    // default constructor = end-of-sequence:
    std::regex_token_iterator<std::string::iterator> rend;
    std::cout << "entire matches:";
    std::regex_token_iterator<std::string::iterator> a ( s.begin(), s.end(), e );
    while (a!=rend) std::cout << " [" << *a++ << "]";
    std::cout << std::endl;
    return 0;
}
Output:
entire matches: [subject] [submarine] [subsequence]
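*a returns a std::sub_match, which derives from a std::pair of two iterators into s; its first member marks where the token starts. Read the position before incrementing the iterator, otherwise a->first refers to the next token (and to the end iterator after the last one). You could try this:
while (a != rend) { std::cout << " [" << *a << ' ' << a->first - s.begin() << "]"; ++a; }
or this, with std::distance from <iterator>:
while (a != rend) { std::cout << " [" << *a << ' ' << std::distance(s.begin(), a->first) << "]"; ++a; }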

regex_search and substring matching

Here is my code:
std::string var = "(1,2)";
std::smatch match;
std::regex rgx("[0-9]+");
if (std::regex_search(var, match, rgx))
    for (size_t i = 0; i < match.size(); ++i)
        std::cout << i << ": " << match[i] << '\n';
I want to extract both 1 and 2, but so far the output is only the first match (1). I can't figure out why, and my brain is fried. It's probably something obvious.
The elements of std::smatch (std::match_results) are the capture groups of a single match: element 0 is the whole match and elements 1 to n are the groups in the regex. regex_search stops after the first match.
In a slightly modified example:
std::string var = "(11b,2x)";
std::smatch match;
std::regex rgx("([0-9]+)([a-z])");
if (std::regex_search(var, match, rgx))
    for (size_t i = 0; i < match.size(); ++i)
        std::cout << i << ": " << match[i] << '\n';
You'd get the following output:
0: 11b
1: 11
2: b
What you want is std::sregex_iterator (std::regex_iterator over a std::string), which walks over all the matches:
auto b = std::sregex_iterator(var.cbegin(), var.cend(), rgx);
auto e = std::sregex_iterator();
std::for_each(b, e, [](std::smatch const& m) {  // std::for_each is in <algorithm>
    std::cout << "match: " << m.str() << std::endl;
});
Applied to your original var ("(1,2)") and rgx ("[0-9]+"), this yields the desired output:
match: 1
match: 2

How to match a sequence of whitespace characters with C++11 regex

std::string str = "ahw \t\n";
std::regex re(R"((\s)*)");
std::smatch mr;
if (std::regex_search(str, mr, re))
{
    std::cout << "match found: " << mr.size() << "\n";
    for (size_t i = 0; i < mr.size(); ++i)
    {
        std::string strrep = mr.str(i);
        int len = mr.length(i);
        std::cout << "index: " << i << " len: " << len << " string: '" << strrep << "'\n";
    }
}
std::string newStr = std::regex_replace(str, re, "");
std::cout << "new string: '" << newStr << "'\n";
result:
What I expect: only one match, with strrep equal to ' \t\n' and len equal to its length (3 characters: space, tab, newline). But both VC12 and GCC 4.9.2 show the result above.
What's wrong with my understanding? How can I match the whitespace sequence ' \t\n'?
Change (\s)* to \s+ in your regex. \s* matches zero or more whitespace characters, so it also matches the empty string: regex_search succeeds immediately with an empty match at position 0, before it ever reaches the whitespace. You also don't need the capturing group.