May we have similar question here stackoverflow:
But my question is:
First I tried to match all x in the string so I write the following code, and it's working well:
string str = line;
regex rx("x");
vector<int> index_matches; // results saved here
for (auto it = std::sregex_iterator(str.begin(), str.end(), rx);
it != std::sregex_iterator();
++it)
{
index_matches.push_back(it->position());
}
Now if I tried to match all { I tried to replace
regex rx("x"); with regex rx("{"); andregex rx("\{");.
So I got an exception and I think it should throw an exception because we use {
sometimes to express the regular expression, and it expect to have } in the regex at the end that's why it throw an exception.
So first is my explanation correct?
Second question I need to match all { using the same code above, is that possible to change the regex rx("{"); to something else?
You need to escape characters with special meaning in regular expressions, i.e. use \{ regular expression. But, \ has special meaning in C++ string literals. So, next you need to escape characters with special meaning in C++ string literals, i.e. write:
regex rx("\\{");
Related
I want to split the following mathematical expression -1+33+4.4+sin(3)-2-x^2 into tokens using regex. I use the following site to test my regex expression link, this says that nothing wrong. When I implement the regex into my C++, throwing the following error Invalid special open parenthesis I looked for the solution and I find the following stackoverflow site link but it do not helped me solve my problem.
My regex code is (?<=[-+*\/^()])|(?=[-+*\/^()]). In the C++ code I do not use \.
The other problem is that I do not know how to determine the minus sign is an unary operator or a binary operator, if the minus is an unary operator I want to look like this {-1}
I want the tokens looks like this : {-1,+,33,+4.4,+,sin,(,3,),-,2,-,x,^,2}
The unary minus can be anywhere in the string.
If I do not use ^ it still wrong.
code:
std::vector<std::string> split(const std::string& s, std::string rgx_str) {
std::vector<std::string> elems;
std::regex rgx (rgx_str);
std::sregex_token_iterator iter(s.begin(), s.end(), rgx);
std::sregex_token_iterator end;
while (iter != end) {
elems.push_back(*iter);
++iter;
}
return elems;
}
int main() {
std::string str = "-1+33+4.4+sin(3)-2-x^2";
std::string reg = "(?<=[-+*/()^])|(?=[-+*/()^])";
std::vector<std::string> s = split(str,reg);
for(auto& a : s)
cout << a << endl;
return 0;
}
C++ uses a modified ECMAScript regular expression grammar for its std::regex by default. It does support lookaheads (?=) and (?!), but not lookbehinds. So, the (?<=) is not a valid std::regex syntax.
There is a proposal to add this in C++23, but it is not currently implemented.
In verilog language, the statements are enclosed in a begin-end delimiter instead of bracket.
always# (*) begin
if (condA) begin
a = c
end
else begin
b = d
end
end
I'd like to parse outermost begin-end with its statements to check coding rule in python. Using regular expression, I want results with regular expression like:
if (condA) begin
a = c
end
else begin
b = d
end
I found similar answer for bracket delimiter.
int funcA() {
if (condA) {
b = a
}
}
regular expression:
/({(?>[^{}]+|(?R))*})/g
However, I don't know how to modify atomic group ([^{}]) for "begin-end"?
/(begin(?>[??????]+|(?R))*end)/g
The point of the [??????]+ part is to match any text that does not match a char that is equal or is the starting point of the delimiters.
So, in your case, you need to match any char other than a char that starts either begin or end substring:
/begin(?>(?!begin|end).|(?R))*end/gs
See the regex demo
The . here will match any char including line break chars due to the s modifier. Note that the actual implementation might need adjustments (e.g. in PHP, the g modifier should not be used as there are specific functions/features for that).
Also, since you recurse the whole pattern, you need no outer parentheses.
Here's the regular expression:
let legalStr = "(?:[eE][\\+\\-]?[0-9]{1,3})?$"
Here's the invocation:
if let match = sender.stringValue.rangeOfString(legalStr, options: .RegularExpressionSearch) {
print("\(sender.stringValue) is legal")
}
else {
print( "\(sender.stringValue) is not legal")
}
If I type garbage, like "abcd" is returns illegal string.
If I type something like "e123" it returns legal string.
(note that the empty string is also legal.)
However, if I type "e1234" it still returns "legal". I'd expect it to return "not legal". Am I missing something here? BTW, note the "$" at the end of the regular expression. The three digits should appear at the end of the string.
If it's not immediately clear, the source of the string is a text edit box.
Your pattern is only anchored at the end, and matches the empty string. So any string at all will match successfully by just matching your pattern as an empty string at the end.
Add a ^ to the front to anchor it on that side, too.
What is the regular expression for removing the suffix of file names? For example, if I have a file name in a string such as "vnb.txt", what is the regular expression to remove ".txt"?
Thanks.
Do you really need a regular expression to do this? Why not just look for the last period in the string, and trim the string up to that point? Frankly, there's a lot of overhead for a regular expression, and I don't think you need it in this case.
As suggested by tstenner, you can try one of the following, depending on what kinds of strings you're using:
std::strrchr
std::string::find_last_of
First example:
char* str = "Directory/file.txt";
size_t index;
char* pStr = strrchr(str,'.');
if(nullptr != pStr)
{
index = pStr - str;
}
Second example:
int index = string("Directory/file.txt").find_last_of('.');
If you are using Qt already, you could use QFileInfo, and use the baseName() function to get just the name (if one exists), or the suffix() function to get the extension (if one exists).
If you're looking for a solution that will give you anything except for the suffix, you should use string::find_last_of.
Your code could look like this:
const std::string removesuffix(const std::string& s) {
size_t suffixbegin = s.find_last_of('.');
//This will handle cases like "directory.foo/bar"
size_t dir = s.find_last_of('/');
if(dir != std::string::npos && dir > suffixbegin) return s;
if(suffixbegin == std::string::npos) return s;
else return s.substr(0,suffixbegin);
}
If you're looking for a regular expression, use \.[^.]+$.
You have to escape the first ., otherwise it will match any character, and put a $ at the end, so it will only match at the end of a string.
Different operating systems may allow different characters in filenams, the simplest regex might be (.+)\.txt$. Get the first capture group to get the filename sans extension.
I thought that $ indicates the end of string. However, the following piece of code gives "testbbbccc" as a result, which is quite astonishing to me... This means that $ actually matches end of line, not end of the whole string.
#include <iostream>
#include <regex>
using namespace std;
int main()
{
tr1::regex r("aaa([^]*?)(ogr|$)");
string test("bbbaaatestbbbccc\nddd");
vector<int> captures;
captures.push_back(1);
const std::tr1::sregex_token_iterator end;
for (std::tr1::sregex_token_iterator iter(test.begin(), test.end(), r, captures); iter != end; )
{
string& t1 = iter->str();
iter++;
cout << t1;
}
}
I have been trying to find a "multiline" switch (which actually can be easily found in PCRE), but without success... Can someone point me to the right direction?
Regards,
R.P.
As Boost::Regex was selected for tr1, try the following:
From Boost::Regex
Anchors:
A '^' character shall match the start
of a line when used as the first
character of an expression, or the
first character of a sub-expression.
A '$' character shall match the end of
a line when used as the last character
of an expression, or the last
character of a sub-expression.
So the behavior you observed is correct.
From: Boost Regex as well:
\A Matches at the start of a buffer
only (the same as \`).
\z Matches at
the end of a buffer only (the same as
\').
\Z Matches an optional sequence
of newlines at the end of a buffer:
equivalent to the regular expression
\n*\z
I hope that helps.
There is no multiline switch in TR1 regexs. It's not exactly the same, but you could get the same functionality matching everything:
(.|\r|\n)*?
This matches non-greedily every character, including new line and carriage return.
Note: Remember to escape the backslashes '\' like this '\\' if your pattern is a C++ string in code.
Note 2: If you don't want to capture the matched contents, append '?:' to the opening bracket:
(?:.|\r|\n)*?