preg_match_all equivalent in PCRE for C++ - c++

Hello this is my string
last_name, first_name
bjorge, philip
kardashian, kim
mercury, freddie
In PHP I am using preg_match_all (PCRE) to run the regex:
preg_match_all("/(.*), (.*)/", $input_lines, $output_array);
Now I have installed PCRE for C++, and I want to know which C++ PCRE function is equivalent to my PHP code. Which function works like PHP's preg_match_all?

Since C++11, regular expressions are supported by the standard library, so you don't need PCRE unless you have a specific reason to use it.
The example above can be handled with standard regular expressions, e.g.:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
int main()
{
std::vector<std::string> input = {
"last_name, first_name",
"bjorge, philip",
"kardashian, kim",
"mercury, freddie"
};
std::regex re("(.*), (.*)");
std::smatch pieces;
for (const std::string &s : input) {
if (std::regex_match(s, pieces, re)) {
std::cout << "Pieces: " << pieces.size() << std::endl;
for (size_t i = 0; i < pieces.size(); ++i) {
std::cout << pieces[i].str() << std::endl;
}
}
}
return 0;
}

Related

c++ regular matching ABC HHHH fetID_3141 ProID_1045

ABC HHHH fetID_3141 ProID_1045
The above is a string. I need to extract fetID_3141 and ProID_1045; mainly I need the numbers 3141 and 1045. How can I do this with regular expression matching in C++?
You can use std::regex for this.
#include <iostream>
#include <regex>
#include <string>
int main()
{
std::regex r("[a-zA-Z]+_([0-9]+)");
std::smatch m;
std::string s = "ABC HHHH fetID_3141 ProID_1045";
while (std::regex_search(s, m, r))
{
std::cout << m[1] << std::endl;
s = m.suffix().str();
}
}

Regex match the entire string [duplicate]

I'm reading a text file in the form of
People list
[Jane]
Female
31
...
and for each line I want to loop through and find the line that contains "[...]"
For example, [Jane]
I came up with the regex expression
"(^[\w+]$)"
which I tested that it works using regex101.com.
However, when I try to use that in my code, it fails to match with anything.
Here's my code:
void Jane::JaneProfile() {
// read each line, for each [title], add the next lines into its array
std::smatch matches;
for(int i = 0; i < m_numberOfLines; i++) { // #lines in text file
std::regex pat ("(^\[\w+\]$)");
if(regex_search(m_lines.at(i), matches, pat)) {
std::cout << "smatch " << matches.str(0) << std::endl;
std::cout << "smatch.size() = " << matches.size() << std::endl;
} else
std::cout << "wth" << std::endl;
}
}
When I run this code, every line falls through to the else branch and nothing matches...
I searched for answers, but I got confused when I saw that in C++ you have to use double backslashes instead of a single backslash to escape... But it didn't work for my code even when I used double backslashes...
Where did I go wrong?
By the way, I'm using Qt Creator 3.6.0 Based on (Desktop) Qt 5.5.1 (Clang 6.1 (Apple), 64 bit)
---Edit----
I tried doing:
std::regex pat (R"(^\[\\w+\]$)");
But I get an error saying
Use of undeclared identifier 'R'
I already have #include <regex> but do I need to include something else?
Either escape the backslashes or use a raw string literal with a delimiter that won't appear in the regex:
escaped:
std::regex pat("^\\[\\w+\\]$");
raw character string:
std::regex pat(R"regex(^\[\w+\]$)regex");
working demo (adapted from OPs posted code):
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
int main()
{
auto test_data =
"People list\n"
"[Jane]\n"
"Female\n"
"31";
// initialise test data
std::istringstream source(test_data);
std::string buffer;
std::vector<std::string> lines;
while (std::getline(source, buffer)) {
lines.push_back(std::move(buffer));
}
// test the regex
// read each line, for each [title], add the next lines into its array
std::smatch matches;
for(std::size_t i = 0; i < lines.size(); ++i) { // #lines in text file
static const std::regex pat ("(^\\[\\w+\\]$)");
if(regex_search(lines.at(i), matches, pat)) {
std::cout << "smatch " << matches.str() << std::endl;
std::cout << "smatch.size() = " << matches.size() << std::endl;
} else
std::cout << "wth" << std::endl;
}
return 0;
}
expected output:
wth
smatch [Jane]
smatch.size() = 2
wth
wth

Regex search overlapping matches c++11

What regex expression should I use to search all occurrences that match:
Start with 55 or 66
followed by a minimum 8 characters in the range of [0-9a-fA-F] (HEX numbers)
Ends with \r (a carriage return)
Example string: 0205065509085503400066\r09\r
My expected result:
5509085503400066\r
5503400066\r
My current result:
5509085503400066\r
Using
(?:55|66)[0-9a-fA-F]{8,}\r
As you can see, this finds only the first result but not the second one.
Edit clarification
I search the string using Regex. It'll select the message for further parsing. The target string can start anywhere in the string. The target string is only valid if it only contains base-16 (HEX) numbers, and ends with a carriage return.
[start] [information part, minimum 8 chars] [end symbol: carriage return]
I'm using the std::regex library in c++11 with the flag ECMAScript
Edit
I have created an alternative solution that gives me the expected result. But this is not pure regex.
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string log("0055\r0655036608090705\r");
std::regex r("(?:55|66)[0-9a-fA-F]{8,}\r");
std::smatch sm;
while(regex_search(log, sm, r))
{
std::cout << sm.str() << '\n';
log = sm.str();
log += sm.suffix();
log[0] = 'a' ;
}
}
** Edit: Working regex solution based on comments **
#include <iostream>
#include <string>
#include <regex>
int main()
{
// repeated search (see also std::regex_iterator)
std::string s("0055\r06550003665508090705\r0970");
std::regex r("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
auto words_begin =
std::sregex_iterator(s.begin(), s.end(), r);
auto words_end = std::sregex_iterator();
std::cout << "Found "
<< std::distance(words_begin, words_end)
<< " words:\n";
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
std::smatch match = *i;
std::string match_str = s.substr(match.position(1), match.length(1) - 1); //-1 cr
std::cout << match_str << " =" << match.position(1) << '\n';
}
}
You are actually looking for overlapping matches. This can be achieved using a regex lookahead like this:
(?=((?:55|66)[0-9a-fA-F]{8,}\/r))
You will find the matches in question in group 1. The full-match, however, is empty.
Regex Demo (using /r instead of a carriage return for demonstration purposes only)
Sample Code:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
std::string subject("0055\r06550003665508090705\r0970");
try {
std::regex re("(?=((?:55|66)[0-9a-fA-F]{8,}\r))");
std::sregex_iterator next(subject.begin(), subject.end(), re);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str(1) << "\n";
next++;
}
} catch (std::regex_error& e) {
// Syntax error in the regular expression
}
return 0;
}
See also: Regex-Info: C++ Regular Expressions with std::regex

How does Regex work in Boost Regex C++

I am currently using the Boost Regex library and am trying to extract the arguments of a function call in C++. For instance, I have an HTML page that contains a JavaScript function call, something like
XsrfToken.setToken('54sffds');
What I currently have, which isn't working.
std::string response = request->getResponse();
boost::regex expression;
if (type == "CSRF") {
expression = {"XsrfToken.setToken\('(.*?)'\)"};
}
boost::smatch results;
if (boost::regex_search(response, results, expression)) {
std::cout << results[0] << " TOKEN" << std::endl;
}
Where response is the HTML web page, and expression is the regex. The conditional statement is running, therefore I think something is wrong with my regex, but I do not know.
[EDITED]
Forgot to mention that that regex was extracted from PHP and works in a PHP regex checker/debugger
Your mistake is not in the regex syntax (the ? after * only makes the match non-greedy) but in the C++ string literal: each backslash must itself be escaped with a backslash:
#include <boost/regex.hpp>
#include <iostream>
#include <string>
std::string response("XsrfToken.setToken('ABC')");
boost::regex expression("XsrfToken.setToken\\('(.*?)'\\)");
int main() {
boost::smatch results;
if (boost::regex_search(response, results, expression)) {
std::cout << results[0] << " TOKEN" << std::endl;
}
}

C++ split string on multiple substrings

In C++, I'd like to do something similar to:
Split on substring
However, I'd like to specify more than one substring to split on. For example, I'd like to split on "+", "foo", "ba" for the string "fda+hifoolkjba4" into a vector of "fda", "hi", "lkj", "4". Any suggestions? Preferably within STL and Boost (I'd rather not have to incorporate the Qt framework or additional libraries if I can avoid it).
I would go with regular expressions, either from <regex> or <boost/regex.hpp>; the regular expression you need would be something like (.*?)(\+|foo|ba) (plus the final token).
Here's a cut-down example using Boost:
std::string str(argv[1]);
boost::regex r("(.*?)(\\+|foo|ba)");
boost::sregex_iterator rt(str.begin(), str.end(), r), rend;
std::string final;
for ( ; rt != rend; ++rt)
{
std::cout << (*rt)[1] << std::endl;
final = rt->suffix();
}
std::cout << final << std::endl;
I suggest using regular expression support in boost. See here for an example.
Here is sample code that splits the string:
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main()
{
boost::regex re("(\\+|foo|ba)");
std::string s("fda+hifoolkjba4");
boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
boost::sregex_token_iterator j;
while (i != j) {
std::cout << *i++ << " ";
}
std::cout << std::endl;
return 0;
}