While working on a solution to this question, I came up with the following C++ regex:
#include <regex>
#include <string>
#include <iostream>
std::string remove_password(std::string const& input)
{
// I think this should work for skipping escaped quotes in the password.
// It works in javascript, but not in the standard library implementation.
// anyone have any ideas?
// (.*password\(("|'))(?:\\\2|[^\2])*?(\2.*)
// const char prog[] = R"__regex((.*password\(')([^']*)('.*)))__regex";
const char prog[] = R"__regex((.*password\(("|'))(?:\\\2|[^\2])*?(\2.*))__regex";
auto reg = std::regex(prog, std::regex_constants::syntax_option_type::ECMAScript);
std::smatch match;
std::regex_match(input, match, reg);
// match[0] is the entire string
// match[1] is pre-password
// match[2] is the password
// match[3] is post-password
return match[1].str() + "********" + match[3].str();
}
int main()
{
using namespace std::literals;
auto test_string = R"__(select * from run_on_hive(server('hdp230m2.labs.teradata.com'),username('vijay'),password('vijay'),dbname('default'),query('analyze table default.test01 compute statistics'));)__";
std::cout << remove_password(test_string);
}
I wanted to capture passwords, even if they contained an escaped quote or double-quote.
However the regex does not compile in clang or gcc.
It compiles correctly on regex101.com when using the JavaScript syntax.
Am I wrong, or is the implementation incorrect?
Note that ECMAScript is the default flavor in C++ std::regex, so you do not have to specify it explicitly. In any case, std::regex_constants::syntax_option_type::ECMAScript causes the first error: the compiler expects a std::regex_constants value there. The simplest fix is to remove the argument or to write std::regex(prog, std::regex_constants::ECMAScript).
The [^\2] pattern causes the second error, Unexpected character in bracket expression. You cannot use backreferences inside bracket expressions, but you can use a negative lookahead to restrict a . or [^] pattern so that it matches any character except the one Group 2 holds.
Use
const char prog[] = R"((.*password\((["']))(?:\\\2|(?!\2)[^])*?(\2.*))";
See your fixed C++ demo.
However, it seems you may use a "cleaner" approach using std::regex_replace:
std::string remove_password(std::string const& input)
{
const char prog[] = R"((.*password\((["']))(?:\\\2|(?!\2)[^])*?(\2.*))";
auto reg = std::regex(prog);
return std::regex_replace(input, reg, "$1********$3");
}
See another C++ demo. The $1 and $3 are the placeholders for Group 1 and 3 values.
I want to split the mathematical expression -1+33+4.4+sin(3)-2-x^2 into tokens using a regex. I tested my expression on an online regex tester, which reports nothing wrong. When I use the regex in my C++ code, it throws the following error: Invalid special open parenthesis. I looked for a solution and found a related Stack Overflow question, but it did not help me solve my problem.
My regex is (?<=[-+*\/^()])|(?=[-+*\/^()]). In the C++ code I do not use \.
The other problem is that I do not know how to determine whether a minus sign is a unary or a binary operator. If the minus is a unary operator, I want the token to look like this: {-1}
I want the tokens to look like this: {-1,+,33,+4.4,+,sin,(,3,),-,2,-,x,^,2}
The unary minus can be anywhere in the string.
If I do not use ^, it is still wrong.
code:
std::vector<std::string> split(const std::string& s, std::string rgx_str) {
std::vector<std::string> elems;
std::regex rgx (rgx_str);
std::sregex_token_iterator iter(s.begin(), s.end(), rgx);
std::sregex_token_iterator end;
while (iter != end) {
elems.push_back(*iter);
++iter;
}
return elems;
}
int main() {
std::string str = "-1+33+4.4+sin(3)-2-x^2";
std::string reg = "(?<=[-+*/()^])|(?=[-+*/()^])";
std::vector<std::string> s = split(str,reg);
for(auto& a : s)
std::cout << a << std::endl;
return 0;
}
C++ uses a modified ECMAScript regular expression grammar for its std::regex by default. It supports the lookaheads (?=...) and (?!...), but not lookbehinds. So (?<=...) is not valid std::regex syntax, and constructing the std::regex throws.
There is a proposal to add this in C++23, but it is not currently implemented.
I am trying to replace one backslash with two. To do that I tried using the following code
str = "d:\test\text.txt"
str.replace("\\","\\\\");
The code does not work. The whole idea is to pass str to the deletefile function, which requires double backslashes.
Since C++11, you can try using regex:
#include <regex>
#include <iostream>
int main() {
auto s = std::string(R"(\tmp\)");
s = std::regex_replace(s, std::regex(R"(\\)"), R"(\\)");
std::cout << s << std::endl;
}
A bit overkill, but it does the trick if you want a "quick" solution.
There are two errors in your code.
First line: you forgot to double the \ in the literal string.
It happens that \t is a valid escape representing the tab character, so you get no compiler error, but your string doesn't contain what you expect.
Second line: according to the reference of string::replace,
you can replace a substring by another substring based on the substring position.
However, there is no version that makes a substitution, i.e. replaces all occurrences of a given substring by another one.
This doesn't exist in the standard library. It exists for example in the boost library, see boost string algorithms. The algorithm you are looking for is called replace_all.
string "I am 5 years old"
regex "(?!am )\d"
if you go to http://regexr.com/ and apply regex to the string you'll get 5.
I would like to get this result with std::regex, but I do not understand how to use match results and probably regex has to be changed as well.
std::regex expression("(?!am )\\d");
std::smatch match;
std::string what("I am 5 years old.");
if (regex_search(what, match, expression))
{
//???
}
The std::smatch is an instantiation of the match_results class template for matches on string objects (with string::const_iterator as its iterator type). The members of this class are those described for match_results, but using string::const_iterator as its BidirectionalIterator template parameter.
std::match_results supports an operator[]:
If n > 0 and n < size(), returns a reference to the std::sub_match representing the part of the target sequence that was matched by the nth captured marked subexpression.
If n == 0, returns a reference to the std::sub_match representing the part of the target sequence matched by the entire matched regular expression.
If n >= size(), returns a reference to a std::sub_match representing an unmatched sub-expression (an empty subrange of the target sequence).
In your case, regex_search finds the first match only, and then match[0] holds the entire match text, match[1] would contain the text captured with the first capturing group (the first parenthesized pattern part), etc. In this case, though, your regex does not contain capturing groups.
You need to use a capturing mechanism here since std::regex does not support lookbehind. You used a lookahead, which checks the text that immediately follows the current location, so the regex you have is not doing what you think it is.
So, use the following code:
#include <regex>
#include <string>
#include <iostream>
using namespace std;
int main() {
std::regex expression(R"(am\s+(\d+))");
std::smatch match;
std::string what("I am 5 years old.");
if (regex_search(what, match, expression))
{
cout << match.str(1) << endl;
}
return 0;
}
Here, the pattern is am\s+(\d+). It matches am, one or more whitespace characters, and then captures one or more digits with (\d+). Inside the code, match.str(1) gives access to the value captured by a capturing group. As there is only one (...) in the pattern, there is one capturing group and its ID is 1. So, str(1) returns the text captured into this group.
The raw string literal (R"(...)") allows using a single backslash for regex escapes (like \d, \s, etc.).
first time regex (in c++ that is)
I have a hard time writing
(?<=name=")(?:[^\\"]+|\\.)*(?=")
that matches for example name="blabla" xyz as blabla as code...
How do I
std::regex TheName("(?<=name=")(?:[^\\"]+|\\.)*(?=")");
correctly please?
You need to use capturing rather than positive lookbehind in C++ regex. Also, it is advisable to use the unroll-the-loop principle to unroll your ([^"\\]|\\.)* subpattern to make the regex as fast as it can be, see [^\"\\]*(?:\\.[^\"\\]*)*. Also, it is advisable to use raw string literals (see R"(<PATTERN>)") when defining regex patterns in order to avoid overescaping.
See the C++ demo:
#include <iostream>
#include <regex>
using namespace std;
int main() {
std::string s = "name=\"bla \\\"bla\\\"\"";
std::regex TheName(R"(name=\"([^\"\\]*(?:\\.[^\"\\]*)*)\")");
std::smatch m;
if (regex_search(s, m, TheName)) {
std::cout << m[1].str() << std::endl;
}
return 0;
}
Result: bla \"bla\"
I need to match the text '\0' with the same regex that I would match 'a' or 'b'. (a regex for a character constant in C++). I've tried a bunch of different regexes, but haven't gotten a successful one yet. My latest attempt:
^['].|\\0[']
Most of the other things I've tried have given seg faults, so this is really the closest I've gotten.
This works pretty nicely with what I've tested ('a','b','\0').
If you don't have std::regex or boost::regex I guess what you can get out of it is the fact that the regex I used is ('.'|'\\0').
#include <boost/regex.hpp>
#include <string>
#include <iostream>
#include <vector>
int main() {
std::vector<std::string> strings;
strings.push_back(R"('a')");
strings.push_back(R"('b')");
strings.push_back(R"('\0')");
boost::regex rgx(R"(('.'|'\\0'))");
boost::smatch match;
for(auto& i : strings) {
if(boost::regex_match(i,match, rgx)) {
boost::ssub_match submatch = match[1];
std::cout << submatch.str() << '\n';
}
}
}
Example
There's nothing magic about '\0'; it's just a character like any other, and there's (almost) nothing special you have to do to use it in a regular expression. The only problem you might run into is if it appears in the middle of a string literal that you pass to a function that treats it as the end of the string. To avoid that, force it into a std::string with an explicit length:
const char s[] = "a\0b";
std::string not_my_str(s); // not_my_str holds "a"
std::string str(s, 3); // str holds "a\0b"
Once you've constructed the string object, the embedded '\0' gets no special treatment. Except, of course, if you copy the contents with a function that treats it specially.
The regex that works (in this instance, using the C header ) is:
^('(.|([\\]0))')
Thanks to #WhozCraig for the help!