Regex to extract everything inside brackets - C++

I need to extract the content inside the brackets () from the following string in C++:
#82=IFCCLASSIFICATIONREFERENCE($,'E05.11.a','Rectangular',#28);
I tried the following regex, but it gives an output with the brackets intact.
std::regex e2 ("\\((.*?)\\)");
if (std::regex_search(sLine,m,e2)){
}
Output should be:
$,'E05.11.a','Rectangular',#28

The result you are looking for is in the first matching subexpression, i.e. the half-open interval [m[1].first, m[1].second), which m[1].str() returns as a string.
This is because your regex also matches the enclosing parentheses, so the whole match m[0] includes them, but you specified a grouping subexpression, (.*?), which captures only the content between them. Here is a starting point to some documentation

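Use lookarounds: "(?<=\\()[^)]*?(?=\\))". Watch out, as this won't work for nested parentheses. Note also that the lookbehind (?<=...) is not supported by std::regex's default ECMAScript grammar, so this pattern needs another engine, such as Boost.Regex or PCRE.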
You can also use backreferences.

(?<=\().*(?=\))
Try this. I only tested it in one tester, but it worked. It looks for any characters after a ( and before a ), without including them. Because .* is greedy, the match runs from the first ( to the last ) in the string.
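To compare the approaches from the answers above side by side, here is a minimal sketch in Python rather than C++ (Python's re module supports lookbehind, which makes it possible to run both variants). The variable names are illustrative only. In C++ the equivalent of group(1) would be m[1].str().

```python
import re

line = "#82=IFCCLASSIFICATIONREFERENCE($,'E05.11.a','Rectangular',#28);"

# Capture group: the brackets are part of the whole match,
# but group 1 holds only the content between them.
m = re.search(r"\((.*?)\)", line)
whole_match = m.group(0)  # brackets included
inner = m.group(1)        # brackets excluded

# Lookarounds: the brackets are asserted but never consumed,
# so the whole match is already the inner content.
inner_lookaround = re.search(r"(?<=\()[^)]*(?=\))", line).group(0)

# Nested brackets: the lazy form stops at the first ")",
# the greedy form runs to the last ")".
nested = "f(a,(b),c)"
lazy_nested = re.search(r"\((.*?)\)", nested).group(1)
greedy_nested = re.search(r"\((.*)\)", nested).group(1)

print(whole_match)
print(inner)
print(inner_lookaround)
print(lazy_nested, "|", greedy_nested)
```

Neither form handles arbitrarily nested brackets correctly; that generally needs a small parser or an engine with recursion support.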

Related

regular expression replace removes first and last character when using $1

I have a string like this:
&breakUp=Mumbai;city,Puma;brand&
where Mumbai;city and Puma;brand are filters (let's say) separated by a comma (,). I have to add more filters, like Delhi;State.
I am using the following regular expression to find the above string:
&breakUp=.([\w;,]*).&
and the following regular expression to replace it:
&breakUp=$1,Delhi;State&
It finds the string correctly, but when replacing it removes the first and last characters, giving the following result:
&breakUp=umbai;city,Puma;bran,Delhi;State&
How to resolve this?
Also, if I have no filters, I don't want that first comma. For example,
&breakUp=&
should become
&breakUp=Delhi;State&
How to do it?
My guess is that your expression is almost fine: there are two extra . characters in it, each of which matches (and consumes) one arbitrary character outside the capture group. Removing them gives:
&breakUp=([\w;,]*)&
In this demo, the expression is explained, if you might be interested.
To bypass &breakUp=&, we can likely apply this expression:
&breakUp=([^&]+)&
Your problem seems to be the leading and trailing periods: each one matches any single character, so the first and last characters of the filter list fall outside the capture group.
Try using this regex:
&breakUp=([\w;,]*)&
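The corrected pattern still leaves the second part of the question open: no leading comma when the filter list is empty. One possible way to handle both cases is to build the replacement in a callback. This is a sketch in Python, and add_filter is a hypothetical helper name, not something from the original posts.

```python
import re

# Corrected pattern: no stray "." on either side of the group.
PATTERN = re.compile(r"&breakUp=([\w;,]*)&")

def add_filter(query, new_filter):
    """Append new_filter, adding a comma only if filters already exist."""
    def repl(match):
        existing = match.group(1)
        joined = f"{existing},{new_filter}" if existing else new_filter
        return f"&breakUp={joined}&"
    return PATTERN.sub(repl, query)

print(add_filter("&breakUp=Mumbai;city,Puma;brand&", "Delhi;State"))
print(add_filter("&breakUp=&", "Delhi;State"))
```

In an engine that only offers a plain $1 replacement string, the same effect would need two separate substitutions, one for the empty case and one for the non-empty case.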

Regular expression to find a same consecutive words

I'm a newbie to regular expressions, and I have a problem identifying the same consecutive words using a regular expression. Below is the scenario.
Here is the data :
;af;aj;am;an;ao;ap12;aq123;ar;as;ad;af1223;
and my current regular expression is (;[a-z][a-z];). It only matches the sets ;af; , ;am; , ;ao; , ;ar; , ;ad; but my expectation is to match all of these sets: ;af;aj;am;an;ao; and ;ar;as;ad;.
Could you please guide me on how to match these patterns?
It seems like you're trying to extract the substrings which are in the ;[a-z][a-z]; format. If so, you could simply put your regex inside a lookahead to do an overlapping match.
(?=(;[a-z][a-z];))
(;[a-z][a-z](?=;))
Try this. It returns the groups you are looking for, though it's not clear in what sense they are the same.
The reason yours was not working is that (;[a-z][a-z];) consumes the trailing ;, which leaves no ; for the next element to start with, so the next element cannot match. A lookahead assertion doesn't consume the ;, thereby enabling all matches.
See demo.
http://regex101.com/r/tF4jD3/4
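The difference between consuming and not consuming the trailing ; can be checked with a short Python sketch. The last pattern, which matches whole runs, is an extra suggestion based on the asker's expected output and does not come from the answers above.

```python
import re

data = ";af;aj;am;an;ao;ap12;aq123;ar;as;ad;af1223;"

# Original pattern: the trailing ";" is consumed, so every second item is skipped.
plain = re.findall(r"(;[a-z][a-z];)", data)

# Whole pattern inside a lookahead: nothing is consumed, matches may overlap.
overlapping = re.findall(r"(?=(;[a-z][a-z];))", data)

# Only the trailing ";" inside a lookahead: it stays available for the next item.
without_trailing = re.findall(r"(;[a-z][a-z](?=;))", data)

# Assumption: the asker wants each unbroken run of two-letter items as one match.
runs = re.findall(r"(?:;[a-z][a-z])+;", data)

print(plain)
print(overlapping)
print(without_trailing)
print(runs)
```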

Regex to capture most of alternating pattern

I have the following file names which should pass through regex
6505208533_95d2834be5_b#2x.jpg
6505208533_95d2834be5_b~ipad.jpg
6505208533_95d2834be5_b~ipad#2x.jpg
6505218557_8407260688_b#2x.png
6505218557_8407260688_b~ipad.png
6505218557_8407260688_b~ipad#2x.png
6505237749_b71c648be2_b#2x.jpg
6505237749_b71c648be2_b~ipad.jpg
6505237749_b71c648be2_b~ipad#2x.jpg
The following regex should capture all file name suffixes: ~ipad#2x, #2x and ~ipad.
(.+)(#2x|~ipad|~ipad#2x)\.(?:jpg|png)
However, it does NOT capture ~ipad#2x. How to solve it?
You should use the lazy operator after .+:
(.+?)
instead of:
(.+)
Otherwise .+ is greedy: it matches as much as possible, swallowing ~ipad into the first group and leaving only #2x for the second.
It is also clearer to list the longest suffix first, since the combined suffix "~ipad#2x" begins with "~ipad", which would otherwise be tried before it. Reordering alone is not enough with a greedy (.+), so keep the lazy quantifier as well:
(.+?)(~ipad#2x|#2x|~ipad)\.(?:jpg|png)
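The effect of the greedy and lazy quantifiers on the captured suffix can be checked with a short Python sketch. The suffix helper is a hypothetical name used only for this illustration.

```python
import re

GREEDY = re.compile(r"(.+)(#2x|~ipad|~ipad#2x)\.(?:jpg|png)")
LAZY = re.compile(r"(.+?)(~ipad#2x|#2x|~ipad)\.(?:jpg|png)")

def suffix(pattern, name):
    """Return the captured suffix (group 2), or None if the name does not match."""
    m = pattern.fullmatch(name)
    return m.group(2) if m else None

names = [
    "6505208533_95d2834be5_b#2x.jpg",
    "6505208533_95d2834be5_b~ipad.jpg",
    "6505208533_95d2834be5_b~ipad#2x.jpg",
]

for name in names:
    print(name, "greedy:", suffix(GREEDY, name), "lazy:", suffix(LAZY, name))
```

With the greedy pattern the third name yields only #2x, because ~ipad has already been absorbed by the first group.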

how to avoid to match the last letter in this regexp?

I have a question about regexp in Tcl:
first output: TIP_12.3.4 %
second output: TIP_12.3.4 %
and sometimes the output may look like:
first output: TIP_12 %
second output: TIP_12 %
I want to get the number 12.3.4 or 12 using the following regexp:
output: TIP_(/[0-9].*/[0-9])
but why does it not match 12.3.4 or 12?
You need to escape the dot, else it stands for "match every character". Also, I'm not sure about the slashes in your regexp. Better solution:
/TIP_(\d+\.?)+/
Your problem is that / is not special in Tcl's regular expression language at all. It's just an ordinary printable non-letter character. (Other languages are a little different, as it is quite common to enclose regular expressions in / characters; this is not the case in Tcl.) Because it is a simple literal, using it in your RE makes it expect it in the input (despite it not being there); unsurprisingly, that makes the RE not match.
Fixing things: I'd use a regular expression like this: output: TIP_([\d.]+) under the assumption that the data is reasonably well formatted. That would lead to code like this:
regexp {output: TIP_([0-9.]+)} $input -> dottedDigits
Everything not in parentheses is a literal here, so that the code is able to find what to match. Inside the parentheses (the bit we're saving for later) we want one or more digits or periods; putting them inside a square-bracketed-set is perfect and simple. The net effect is to store the 12.3.4 in the variable dottedDigits (if found) and to yield a boolean result that says whether it matched (i.e., you can put it in an if condition usefully).
NB: the regular expression is enclosed in braces because square brackets are also Tcl language metacharacters; putting the RE in braces avoids trouble with misinterpretation of your script. (You could use backslashes instead, but they're ugly…)
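For readers who want to try the same pattern outside Tcl, here is a rough Python equivalent of the regexp call above. It is a sketch only: extract_version is a hypothetical helper, and the sample lines are taken from the question.

```python
import re

def extract_version(line):
    """Return the dotted digits after 'output: TIP_', or None if absent."""
    # Everything outside the parentheses is literal; the group collects
    # one or more digits or periods.
    m = re.search(r"output: TIP_([0-9.]+)", line)
    return m.group(1) if m else None

print(extract_version("first output: TIP_12.3.4 %"))
print(extract_version("second output: TIP_12 %"))
```

As in the Tcl version, the result doubles as a success test: None plays the role of the false return value of regexp.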
Try this :
output: TIP_(/([0-9\.^%]*)/[0-9])
Capture group 1.
Demo here :
http://regexr.com?31f6g
The following expression works for me:
{TIP_((\d+\.?)+)}

Regular expression to match text between either square or curly brackets

Related to my previous question, I have a string on the following format:
this {is} a [sample] string with [some] {special} words. [another one]
What is the regular expression to extract the words within either square or curly brackets, ie.
{is}
[sample]
[some]
{special}
[another one]
Note: In my use case, brackets cannot be nested. I would also like to keep the enclosing characters, so that I can tell the difference between them when processing the results.
Simply or (|) the different things you wish to match together:
\[.*?\]|\{.*?\}
This one seems to work:
[[{].*?[}\]]
Or this one:
\[.*?\]|{.*?}
If you want to catch the cases mentioned in the comments below.
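The two patterns can be compared on the sample string with a short Python sketch. The mismatched input is an invented test case, included to show where the character-class version is more permissive.

```python
import re

text = "this {is} a [sample] string with [some] {special} words. [another one]"

# Alternation: each opening bracket must be closed by its own kind.
strict = re.findall(r"\[.*?\]|\{.*?\}", text)

# Character classes: any opener followed by any closer.
loose = re.findall(r"[\[{].*?[}\]]", text)

# Invented input with a mismatched pair.
mismatched = "a {b] c"
strict_mismatched = re.findall(r"\[.*?\]|\{.*?\}", mismatched)
loose_mismatched = re.findall(r"[\[{].*?[}\]]", mismatched)

print(strict)
print(loose)
print(strict_mismatched, loose_mismatched)
```

On well-formed input both patterns return the same list, with the enclosing characters kept, so the caller can still tell the two bracket kinds apart.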
You can use an online regex tester to try these things out. I think http://gskinner.com/RegExr/ is one of the more user-friendly options.