I want to split the mathematical expression -1+33+4.4+sin(3)-2-x^2 into tokens using a regex. I tested my regex on this site (link), which reports nothing wrong with it. When I use the same regex in C++, however, it throws the error Invalid special open parenthesis. I looked for a solution and found this Stack Overflow question (link), but it did not help me solve my problem.
My regex is (?<=[-+*\/^()])|(?=[-+*\/^()]). In the C++ code I do not use the \.
The other problem is that I do not know how to determine whether a minus sign is a unary or a binary operator. If the minus is unary, I want the token to look like this: {-1}
I want the tokens to look like this: {-1,+,33,+,4.4,+,sin,(,3,),-,2,-,x,^,2}
The unary minus can be anywhere in the string.
Removing ^ from the regex does not fix the error.
code:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
std::vector<std::string> split(const std::string& s, std::string rgx_str) {
std::vector<std::string> elems;
std::regex rgx (rgx_str);
std::sregex_token_iterator iter(s.begin(), s.end(), rgx);
std::sregex_token_iterator end;
while (iter != end) {
elems.push_back(*iter);
++iter;
}
return elems;
}
int main() {
std::string str = "-1+33+4.4+sin(3)-2-x^2";
std::string reg = "(?<=[-+*/()^])|(?=[-+*/()^])";
std::vector<std::string> s = split(str,reg);
for(auto& a : s)
std::cout << a << std::endl;
return 0;
}
C++ uses a modified ECMAScript regular expression grammar for its std::regex by default. It does support lookaheads (?=) and (?!), but not lookbehinds. So, the (?<=) is not a valid std::regex syntax.
There is a proposal to add this in C++23, but it is not currently implemented.
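One way around the missing lookbehind is to match the tokens themselves instead of splitting at operator boundaries. The sketch below is only an illustration under assumptions: the function name tokenize and the token grammar (unsigned decimal numbers, identifiers, single-character operators) are hypothetical and not from the question. A minus is treated as unary when it appears at the start or directly after another operator or an opening parenthesis, and it is merged into the number that follows. Characters outside the grammar, such as spaces, are silently skipped.

```cpp
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes an arithmetic expression without lookbehind.
std::vector<std::string> tokenize(const std::string& expr) {
    // number | identifier | single-character operator or parenthesis
    static const std::regex token_rx(
        R"([0-9]+(?:\.[0-9]+)?|[A-Za-z_][A-Za-z0-9_]*|[-+*/^()])");

    std::vector<std::string> tokens;
    bool prev_is_operand = false;  // true after a number, an identifier, or ')'
    bool pending_minus = false;    // a unary '-' waiting for the token after it

    const std::sregex_iterator end;
    for (std::sregex_iterator it(expr.begin(), expr.end(), token_rx); it != end; ++it) {
        const std::string tok = it->str();
        const unsigned char first = static_cast<unsigned char>(tok[0]);
        const bool is_number = std::isdigit(first) != 0;
        const bool is_name = std::isalpha(first) != 0 || first == '_';

        // A '-' with no operand on its left is unary; hold it back for now.
        if (tok == "-" && !prev_is_operand && !pending_minus) {
            pending_minus = true;
            continue;
        }
        if (pending_minus && is_number) {
            tokens.push_back("-" + tok);  // merge the sign into the number
        } else {
            if (pending_minus) tokens.push_back("-");  // e.g. -x or -(...)
            tokens.push_back(tok);
        }
        pending_minus = false;
        prev_is_operand = is_number || is_name || tok == ")";
    }
    if (pending_minus) tokens.push_back("-");  // trailing '-' with nothing after it
    return tokens;
}
```

With the expression from the question, tokenize("-1+33+4.4+sin(3)-2-x^2") should give -1, +, 33, +, 4.4, +, sin, (, 3, ), -, 2, -, x, ^, 2. A unary minus in front of an identifier or a parenthesis is left as a separate - token, so a real parser would still need to distinguish that case.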
I'm trying to create a std::regex(__FILE__) as part of a unit test that checks some exception output containing the file name.
On Windows it fails with:
regex_error(error_escape): The expression contained an invalid escaped character, or a trailing escape.
because the __FILE__ macro expansion contains un-escaped backslashes.
Is there a more elegant way to escape the backslashes than to loop through the resulting string (i.e. with a std algorithm or some std::string function)?
File paths can contain many characters that have special meaning in regular expression patterns. Escaping just the backslashes is not enough for robust checking in the general case.
Even a simple path, like C:\Program Files (x86)\Vendor\Product\app.exe, contains several special characters. If you want to turn that into a regular expression (or part of a regular expression), you would need to escape not only the backslashes but also the parentheses and the period (dot).
Fortunately, we can solve our regular expression problem with more regular expressions:
std::string EscapeForRegularExpression(const std::string &s) {
static const std::regex metacharacters(R"([\\\.\^\$\-\+\(\)\[\]\{\}\|\?\*])");
return std::regex_replace(s, metacharacters, "\\$&");
}
(File paths can't contain * or ?, but I've included them to keep the function general.)
If you don't abide by the "no raw loops" guideline, a probably faster implementation would avoid regular expressions:
std::string EscapeForRegularExpression(const std::string &s) {
static const char metacharacters[] = R"(\.^$-+()[]{}|?*)";
std::string out;
out.reserve(s.size());
for (auto ch : s) {
if (std::strchr(metacharacters, ch))
out.push_back('\\');
out.push_back(ch);
}
return out;
}
Although the loop adds some clutter, this approach allows us to drop a level of escaping on the definition of metacharacters, which is a readability win over the regex version.
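As a usage sketch, the escaped path can be embedded in a larger pattern, which is what the unit test in the question needs. The helper EndsWithFileAndLine and the "path:line" message format are assumptions made for illustration; the escaping function is a compact variant of the regex-based one above.

```cpp
#include <regex>
#include <string>

// Prefixes every regex metacharacter (including the backslash) with a backslash.
std::string EscapeForRegularExpression(const std::string& s) {
    static const std::regex metacharacters(R"([\\.^$\-+()\[\]{}|?*])");
    return std::regex_replace(s, metacharacters, "\\$&");
}

// Hypothetical test helper: true if `message` ends with "<path>:<line number>".
bool EndsWithFileAndLine(const std::string& message, const std::string& path) {
    // The path is matched literally; only the suffix is a real pattern.
    const std::regex pattern(EscapeForRegularExpression(path) + R"(:\d+$)");
    return std::regex_search(message, pattern);
}
```

A test could then call EndsWithFileAndLine(e.what(), __FILE__) without caring which characters the path contains.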
Here is polymapper.
It takes an operation that takes an element and returns a range, the "map operation".
It produces a function object that takes a container, and applies the "map operation" to each element. It returns the same type as the container, where each element has been expanded/contracted by the "map operation".
template<class Op>
auto polymapper( Op&& op ) {
return [op=std::forward<Op>(op)](auto&& r) {
using std::begin;
using R=std::decay_t<decltype(r)>;
using iterator = decltype( begin(r) );
using T = typename std::iterator_traits<iterator>::value_type;
std::vector<T> data;
for (auto&& e:decltype(r)(r)) {
for (auto&& out:op(e)) {
data.push_back(out);
}
}
return R{ data.begin(), data.end() };
};
}
Here is escape_stuff:
auto escape_stuff = polymapper([](char c)->std::vector<char> {
if (c != '\\') return {c};
else return {c,c};
});
live example.
int main() {
std::cout << escape_stuff(std::string(__FILE__)) << "\n";
}
The advantage of this approach is that the action of messing with the guts of the container is factored out. You write code that messes with the characters or elements, and the overall logic is not your problem.
The disadvantage is polymapper is a bit strange, and needless memory allocations are done. (Those could be optimized out, but that makes the code more convoluted).
EDIT
In the end, I switched to @AdrianMcCarthy's more robust approach.
Here's the inelegant method in which I solved the problem in case someone stumbles on this actually looking for a workaround:
std::string escapeBackslashes(const std::string& s)
{
std::string out;
for (auto c : s)
{
out += c;
if (c == '\\')
out += c;
}
return out;
}
and then
std::regex(escapeBackslashes(__FILE__));
It's O(N) which is probably as good as you can do here, but involves a lot of string copying which I'd like to think isn't strictly necessary.
While working on a solution to this question, I came up with the following c++ regex:
#include <regex>
#include <string>
#include <iostream>
std::string remove_password(std::string const& input)
{
// I think this should work for skipping escaped quotes in the password.
// It works in javascript, but not in the standard library implementation.
// anyone have any ideas?
// (.*password\(("|'))(?:\\\2|[^\2])*?(\2.*)
// const char prog[] = R"__regex((.*password\(')([^']*)('.*)))__regex";
const char prog[] = R"__regex((.*password\(("|'))(?:\\\2|[^\2])*?(\2.*))__regex";
auto reg = std::regex(prog, std::regex_constants::syntax_option_type::ECMAScript);
std::smatch match;
std::regex_match(input, match, reg);
// match[0] is the entire string
// match[1] is pre-password
// match[2] is the password
// match[3] is post-password
return match[1].str() + "********" + match[3].str();
}
int main()
{
using namespace std::literals;
auto test_string = R"__(select * from run_on_hive(server('hdp230m2.labs.teradata.com'),username('vijay'),password('vijay'),dbname('default'),query('analyze table default.test01 compute statistics'));)__";
std::cout << remove_password(test_string);
}
I wanted to capture passwords, even if they contained an escaped quote or double-quote.
However the regex does not compile in clang or gcc.
It compiles correctly in regex101.com when using the javascript syntax.
Am I wrong, or is the implementation incorrect?
Note that ECMAScript is the default flavor in C++ std::regex, you do not have to specify it explicitly. At any rate, std::regex_constants::syntax_option_type::ECMAScript causes one error here since the compiler expects a std::regex_constants value here, and the simplest fix is to remove it or use std::regex(prog, std::regex_constants::ECMAScript).
The [^\2] pattern causes the second issue, Unexpected character in bracket expression. You cannot use backreferences inside bracket expressions, but you may use a negative lookahead to restrict a . / [^] pattern to match anything but what Group 2 holds.
Use
const char prog[] = R"((.*password\((["']))(?:\\\2|(?!\2)[^])*?(\2.*))";
See your fixed C++ demo.
However, it seems you may use a "cleaner" approach using std::regex_replace:
std::string remove_password(std::string const& input)
{
const char prog[] = R"((.*password\((["']))(?:\\\2|(?!\2)[^])*?(\2.*))";
auto reg = std::regex(prog);
return std::regex_replace(input, reg, "$1********$3");
}
See another C++ demo. The $1 and $3 are the placeholders for Group 1 and 3 values.
There may be a similar question on Stack Overflow already,
but my question is:
First I tried to match every x in the string, so I wrote the following code, which works well:
string str = line;
regex rx("x");
vector<int> index_matches; // results saved here
for (auto it = std::sregex_iterator(str.begin(), str.end(), rx);
it != std::sregex_iterator();
++it)
{
index_matches.push_back(it->position());
}
Next I tried to match every { by replacing
regex rx("x"); with regex rx("{"); and with regex rx("\{");.
Both throw an exception. I think that is expected, because { is used in regular expression syntax
and the parser expects a matching } later in the pattern.
So first, is my explanation correct?
Second, I need to match every { using the same code as above. Can I change regex rx("{"); to something that works?
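You need to escape characters with special meaning in regular expressions, i.e. use the regular expression \{. But \ also has special meaning in C++ string literals, and \{ is not a standard escape sequence there: compilers typically warn about it and pass a plain { to the regex, which is why both of your attempts fail the same way. So you also need to escape the backslash for the C++ string literal, i.e. write: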
regex rx("\\{");
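A raw string literal avoids the double escaping. The sketch below wraps the loop from the question in a function; the name find_open_braces is hypothetical, and R"(\{)" is the same two-character pattern as "\\{".

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of every '{' in `text`.
std::vector<int> find_open_braces(const std::string& text) {
    // Raw string literal: the regex engine receives exactly \{
    static const std::regex rx(R"(\{)");

    std::vector<int> positions;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), rx); it != end; ++it) {
        positions.push_back(static_cast<int>(it->position()));
    }
    return positions;
}
```

For a single literal character, text.find('{') in a loop would also work and avoids the regex engine entirely.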
Somehow, I've failed to find out how to put only the first occurrence of a regular expression match into a string. I can create a regex object:
static const boost::regex e("<(From )?([A-Za-z0-9_]+)>(.*?)");
Now I need to extract what ([A-Za-z0-9_]+) matched into a std::string, say playername.
std::string chat_input("<Darker> Hello");
std::string playername = e.some_match_method(chat_input, 1); //Get contents of the second (...)
What have I missed?
What should be instead of some_match_method and what parameters should it take?
You can do something like this:
static const regex e("<(From )?([A-Za-z0-9_]+)>(.*?)");
string chat_input("<Darker> Hello");
smatch mr;
string playername;
if (regex_search(chat_input, mr, e)) { // pass the string itself: smatch needs const_iterators
playername = mr[2].str(); // Get contents of the second (...)
}
Please note that regex has been part of the standard library since C++11, so you don't need Boost for it unless your regular expression is complex (standard library implementations can still have difficulties processing complex regular expressions).
I think what you're missing is that boost::regex is the regular expression, but it doesn't do the parsing against a given input. You need to actually use it as a parameter to boost::regex_search or boost::regex_match, which evaluate a string (or iterator pairs) against the regular expression.
static const boost::regex e("<(From )?([A-Za-z0-9_]+)>(.*?)");
std::string chat_input("<Darker> Hello");
boost::match_results<std::string::const_iterator> results;
if (boost::regex_match(chat_input, results, e))
{
std::string playername = results[2]; //Get contents of the second (...)
}
I'm familiar with Regex itself, but whenever I try to find any examples or documentation to use regex with Unix computers, I just get tutorials on how to write regex or how to use the .NET specific libraries available for Windows. I've been searching for a while and I can't find any good tutorials on C++ regex on Unix machines.
What I'm trying to do:
Parse a string using regex by breaking it up and then reading the different subgroups. To make a PHP analogy, something like preg_match that returns all $matches.
Consider using Boost.Regex.
An example (from the website):
bool validate_card_format(const std::string& s)
{
static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
return regex_match(s, e);
}
Another example:
// match any format with the regular expression:
const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");
std::string machine_readable_card_number(const std::string s)
{
return regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}
std::string human_readable_card_number(const std::string s)
{
return regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}
Look up the documentation for TR1 regexes or (almost equivalently) Boost.Regex. Both work quite nicely on various Unix systems. The TR1 regex classes were accepted into C++0x and became part of the standard as <regex> in C++11, so with a C++11 compiler the std::tr1:: prefix can be replaced with std::.
Edit: To break a string into subgroups, you can use an sregex_token_iterator. You can specify either what you want matched as tokens, or what you want matched as separators. Here's a quickie demo of both:
#include <iterator>
#include <regex>
#include <string>
#include <iostream>
int main() {
std::string line;
std::cout << "Please enter some words: " << std::flush;
std::getline(std::cin, line);
std::tr1::regex r("[ .,:;\\t\\n]+");
std::tr1::regex w("[A-Za-z]+");
std::cout << "Matching words:\n";
std::copy(std::tr1::sregex_token_iterator(line.begin(), line.end(), w),
std::tr1::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n"));
std::cout << "\nMatching separators:\n";
std::copy(std::tr1::sregex_token_iterator(line.begin(), line.end(), r, -1),
std::tr1::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n"));
return 0;
}
If you give it input like this: "This is some 999 text", the result is like this:
Matching words:
This
is
some
text
Matching separators:
This
is
some
999
text
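For the preg_match-style use case in the question (one match plus its subgroups), a std::smatch holds the whole match at index 0 and each capture group after it. The following is a minimal sketch using the C++11 std::regex classes; the helper name match_groups is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, loosely analogous to PHP's preg_match:
// returns an empty vector if nothing matches, otherwise the whole
// match followed by the text of each capture group.
std::vector<std::string> match_groups(const std::string& subject,
                                      const std::regex& pattern) {
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(subject, m, pattern)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            groups.push_back(m[i].str());  // empty string for an unmatched group
        }
    }
    return groups;
}
```

For example, matching (\d{4})-(\d{2})-(\d{2}) against "2024-05-17" should return the full date followed by "2024", "05" and "17".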
You are looking for regcomp, regexec and regfree.
One thing to be careful about is that the POSIX regular expressions actually implement two different languages, basic (the default) and extended (include the flag REG_EXTENDED in the call to regcomp). If you are coming from the PHP world, the extended language is closer to what you are used to.
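A minimal sketch of the three calls, assuming a POSIX system that provides <regex.h>; the wrapper name posix_match is hypothetical. Note that POSIX extended syntax has no \d, so digit classes are written as [0-9].

```cpp
#include <cstddef>
#include <regex.h>  // POSIX header, not part of standard C++
#include <string>
#include <vector>

// Hypothetical wrapper: returns the whole match followed by each capture
// group, or an empty vector if the pattern fails to compile or match.
std::vector<std::string> posix_match(const std::string& pattern,
                                     const std::string& subject) {
    std::vector<std::string> groups;
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED) != 0) {
        return groups;  // compile error; regerror() could describe it
    }

    // re_nsub is the number of capture groups; slot 0 is the whole match.
    std::vector<regmatch_t> m(re.re_nsub + 1);
    if (regexec(&re, subject.c_str(), m.size(), m.data(), 0) == 0) {
        for (const regmatch_t& g : m) {
            if (g.rm_so < 0) {
                groups.push_back(std::string());  // group did not participate
            } else {
                groups.push_back(subject.substr(
                    static_cast<std::size_t>(g.rm_so),
                    static_cast<std::size_t>(g.rm_eo - g.rm_so)));
            }
        }
    }
    regfree(&re);  // always release the compiled pattern
    return groups;
}
```

Calling posix_match("([0-9]{4})-([0-9]{2})", "on 2024-05-17") should return "2024-05", "2024" and "05".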
For perl-compatible regular expressions (pcre/preg), I'd suggest boost.regex.
My best bet would be boost::regex.
Try pcre. And pcrepp.
Feel free to have a look at this small color grep tool I wrote, available on GitHub.
It uses regcomp, regexec and regfree that R Samuel Klatchko refers to.
I use "GNU regex": http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html
It works well, but I can't find a clear solution for UTF-8 regexps.
Regards