Regular expression to match string in brackets - regex

I'm building a regular expression that has to extract strings from brackets. This is an example string:
((?X is parent ?Y)(?X is child ?Z))
I need to get the strings '?X is parent ?Y' and '?X is child ?Z'. This is what I've created so far:
^(\((.*?)\))+$
The problem is that it matches only the string in the second bracket. Could anybody help me to improve the expression so that it matches both strings in brackets?
Note: brackets can contain any content, like ((AAA)(BBB)). In this case 'AAA' and 'BBB' should be matched.
Thanks in advance.

Based on your comments, it seems that you just want to match anything inside the brackets. For that you can use:
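import java.util.regex.Matcher;
import java.util.regex.Pattern;

String sample1 = "((something)(world)(example))";
// Optional outer "(", then "(", lazily captured content, ")", optional outer ")"
Pattern regex = Pattern.compile("\\(?\\((.*?)\\)\\)?");
Matcher regexMatcher = regex.matcher(sample1);
while (regexMatcher.find()) {
    // Group 1 holds the text between one pair of inner brackets
    System.out.println(regexMatcher.group(1));
}
// prints something, world and example, one per line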
Regex Explanation
Match the character “(” literally «\(?»
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character “(” literally «\(»
Match the regular expression below and capture its match into backreference number 1 «(.*?)»
    Match any single character that is not a line break character «.*?»
        Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “)” literally «\)»
Match the character “)” literally «\)?»
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
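For comparison, here is a minimal Python sketch of the same idea. It assumes the brackets are nested no deeper than in the examples from the question, and it swaps the lazy .*? for a negated character class, so each match is one innermost bracket pair. The name bracket_contents is made up for illustration.

```python
import re

# "(", then any run of characters that are not brackets, then ")".
# Only innermost pairs can match, so the outer wrapper is skipped.
INNER_BRACKETS = re.compile(r"\(([^()]*)\)")


def bracket_contents(text):
    """Return the content of every innermost (...) pair in text."""
    return INNER_BRACKETS.findall(text)


print(bracket_contents("((?X is parent ?Y)(?X is child ?Z))"))
print(bracket_contents("((AAA)(BBB))"))
```

Because the character class cannot cross a bracket, no optional outer \(? or \)? is needed here.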

This seems to work:
Pattern.compile("[\\(]{0,1}(\\((.*?)\\))")
Thanks all for replies and comments.

Related

regex to match if string has dot as its first occurrence?

I have to write a regex that matches only if the string has a dot as its first character.
I want to match the string below:
.hello
That is, a string which starts with a dot.
The string below must not match:
helo.h
I have tried the pattern below, but it doesn't work:
/\.(.*)/g
https://regexr.com/4ibiu
You can try the following:
^\..+$
^ asserts position at start of a line
\. matches the character . literally (case sensitive)
.+ matches any character (except for line terminators)
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
https://regex101.com/r/1WQHWA/1
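As a quick check of the pattern explained above, here is a small Python sketch. The helper name starts_with_dot is hypothetical; the pattern is the one from this answer.

```python
import re

# ^\..+$ : a literal dot at the start, then at least one more character.
STARTS_WITH_DOT = re.compile(r"^\..+$")


def starts_with_dot(text):
    """True if text begins with a dot and has something after it."""
    return STARTS_WITH_DOT.match(text) is not None


for candidate in (".hello", "helo.h", "."):
    print(repr(candidate), starts_with_dot(candidate))
```

A lone "." is rejected because .+ needs at least one character after the dot.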
The answer to my question is
https://regexr.com/4ibl0
The answer was given by @The fourth bird
Probably something like this:
/(^\..+)/
regexr.com/4ibjd
Just note that this will also match a string of all dots. If you want a word after a single dot, you can try something like this: /(^\.[a-z]+)/
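Try this:
[^a-z](\.(.*))
It basically does the same as your pattern, but requires that the character in front of the dot is not a-z. Note that it needs some character before the dot, so it does not match a dot at the very start of the string.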

Regular expression to remove syslog date in filebeat?

I would like to parse some syslog lines that look like
Oct 20 16:34:59 artguard TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I would like to turn them into
TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
So I was wondering what the regular expression should look like to do this, since the first part changes every day because it is prepended by syslog.
EDIT: to avoid duplicates, I am trying to use a regex with filebeat, where not all regex features are supported, as explained here
(TTN-.*$)
Explained
1st Capturing Group (TTN-.*$)
TTN- matches the characters TTN- literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
The regular expression TTN-\S* is probably one way of doing what you're looking for. Here it is in a JavaScript example:
var value = "Oct 20 16:34:59 artguard TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
var matches = value.match(
new RegExp("TTN-\\S*", "gi")
);
document.writeln(matches);
It works in two main parts:
The TTN- matches TTN- (obviously)
The \S* matches any character that is not a white-space, this is done as many times as possible.
Currently it always expects at least a '-' after TTN, but if you replace the '-' with '-{0,1}' in the regex, it will expect TTN, optionally a dash, followed by 0-n characters that are not white-space. You could also replace \S* with \w* to get only letters, digits and underscores, or with .* to get all characters up to the end of the line (everything except the \n character). Note that TTN-\S*[^\s{2}] does not end the match at two spaces: inside a character class {2} is taken literally, so [^\s{2}] matches a single character that is not white-space, '{', '2' or '}'. Hope this was helpful.
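To try the TTN-\S* pattern outside the browser, here is a minimal Python sketch. It only checks the pattern against the sample line from the question; it says nothing about how the pattern is wired into filebeat, and the function name extract_ttn is an assumption made for this example.

```python
import re

# "TTN-" followed by as many non-whitespace characters as possible.
TTN_TOKEN = re.compile(r"TTN-\S*")


def extract_ttn(line):
    """Return the TTN-... token from a syslog line, or None if absent."""
    match = TTN_TOKEN.search(line)
    return match.group(0) if match else None


sample = "Oct 20 16:34:59 artguard TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
print(extract_ttn(sample))
```

Using search rather than match lets the date and host prefix be skipped without describing them in the pattern.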

regex to match a word with unique (non-repeating) characters

I'm looking for a regex that will match a word only if all its characters are unique, meaning, every character in the word appears only once.
Example:
abcdefg -> will return MATCH
abcdefgbh -> will return NO MATCH (because the letter b appears more than once)
Try this, it might work,
^(?:([A-Za-z])(?!.*\1))*$
Explanation
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below «(?:([A-Za-z])(?!.*\1))*»
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match the regular expression below and capture its match into backreference number 1 «([A-Za-z])»
        Match a single character present in the list below «[A-Za-z]»
            A character in the range between “A” and “Z” «A-Z»
            A character in the range between “a” and “z” «a-z»
    Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*\1)»
        Match any single character that is not a line break character «.*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
        Match the same text as most recently matched by capturing group number 1 «\1»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
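The lookahead pattern can be exercised with a short Python sketch like the one below. The helper name has_unique_letters is invented for this example; the pattern is the one given in this answer.

```python
import re

# Each letter is captured, then the negative lookahead rejects the word
# if the same letter shows up again anywhere later in the string.
UNIQUE_LETTERS = re.compile(r"^(?:([A-Za-z])(?!.*\1))*$")


def has_unique_letters(word):
    """True if word consists only of letters and none of them repeats."""
    return UNIQUE_LETTERS.match(word) is not None


for word in ("abcdefg", "abcdefgbh"):
    print(word, "MATCH" if has_unique_letters(word) else "NO MATCH")
```

Two things to keep in mind: the comparison is case-sensitive, so "aA" counts as unique, and the empty string also matches because of the * quantifier.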
You can check whether there are 2 instances of the character in the string:
^.*(.).*\1.*$
(I simply capture one character and check with a backreference whether it has a copy elsewhere. The remaining .* are don't-cares.)
If the regex above matches, then the string has a repeating character. If it doesn't match, then all the characters are unique.
The advantage of the regex above is that it still works when the regex engine doesn't support lookaround.
Apparently John Woo's solution is a beautiful way to check for uniqueness directly. It asserts at every character that the rest of the string does not contain the current character.
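The inverted check can be sketched in Python as follows. The function names are hypothetical; the pattern is the backreference-only regex from this answer, and uniqueness is simply the negation of a match.

```python
import re

# Some character, captured, that appears a second time later on.
HAS_REPEAT = re.compile(r"^.*(.).*\1.*$")


def has_repeated_char(word):
    """True if any character occurs at least twice in word."""
    return HAS_REPEAT.match(word) is not None


def all_unique(word):
    """True if no character in word repeats."""
    return not has_repeated_char(word)


for word in ("abcdefg", "abcdefgbh"):
    print(word, "MATCH" if all_unique(word) else "NO MATCH")
```

Unlike the lookahead version, this one does not restrict the input to letters, since the capture uses a plain dot.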
This one would also provide a full match to any length word with non-repeating letters:
^(?!.*(.).*\1)[a-z]+$
I slightly revised the answer provided by @Bohemian to another question a while ago to get this.
It has been a while since the question above was asked, but I thought it would be nice to have this regex pattern here as well.

Regex for url rewrite with if then else

Given these urls:
1: http://site/page-name-one-123/
2: http://site/page-name-set2/
3: http://site/set20
I wrote this expression that will be applied to last url segment:
(?(?<=set[\d])([\d]+)|([^/]+))
What I want to do is capture the digits that follow 'set', but only if the URL segment starts with 'set' immediately followed by a digit; otherwise I want to use the whole segment (excluding slashes).
As written, this regex matches any character that is not a '/'. I think I'm doing something wrong in the test condition.
Could anyone point me in the right direction?
Thanks
UPDATE
Thanks to Josh's input I played around for a bit and found that this one fits my needs better:
set-(?P<number>[0-9]+)|(?P<segment>[^/]+)
I hope this pattern helps; I put it together based on your requirements. You may want to make some of the groups non-capturing so that you only get the segments you need. However, it does separately capture your set URLs without 'set' at the start.
((?<=/{1})(((?<!set)[\w|-]*?)(\d+(?=/?))|((?:set)\d+)))
I suggest using RegExr to pick it apart if you need to.
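The if-then-else from the question can also be expressed as a plain alternation with named groups, much like the pattern in the question's update. The Python sketch below is one possible reading of the requirement, not the asker's final rule: it assumes the "set" case means the whole last segment is 'set' followed by digits, with no hyphen, as in the third example URL. The function name rewrite_target is made up.

```python
import re

# First alternative: "set" plus digits; second: the whole segment.
SEGMENT = re.compile(r"set(?P<number>\d+)|(?P<segment>[^/]+)")


def rewrite_target(url):
    """Return the digits after 'set', or else the whole last URL segment."""
    last = url.rstrip("/").rsplit("/", 1)[-1]
    # fullmatch forces the chosen alternative to cover the entire segment.
    match = SEGMENT.fullmatch(last)
    if match is None:
        return None
    return match.group("number") or match.group("segment")


for url in ("http://site/page-name-one-123/",
            "http://site/page-name-set2/",
            "http://site/set20"):
    print(url, "->", rewrite_target(url))
```

Ordering the alternatives this way gives the "then" branch priority, and the second alternative acts as the "else" branch, so no conditional construct is needed.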
Try this:
((?<=/)set\d+|(?<=/)[^/]+?set\d+)
Explanation
Options: ^ and $ match at line breaks
Match the regular expression below and capture its match into backreference number 1 «((?<=/)set\d+|(?<=/)[^/]+?set\d+)»
    Match either the regular expression below (attempting the next alternative only if this one fails) «(?<=/)set\d+»
        Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=/)»
            Match the character “/” literally «/»
        Match the characters “set” literally «set»
        Match a single digit 0..9 «\d+»
            Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Or match regular expression number 2 below (the entire group fails if this one fails to match) «(?<=/)[^/]+?set\d+»
        Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=/)»
            Match the character “/” literally «/»
        Match any character that is NOT a “/” «[^/]+?»
            Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
        Match the characters “set” literally «set»
        Match a single digit 0..9 «\d+»
            Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

regex to parse a iCalendar file in ActionScript

I use a library to parse an iCalendar file, but I don't understand the regex to split property.
An iCalendar property comes in 3 different styles:
BEGIN:VEVENT
DTSTART;VALUE=DATE:20080402
RRULE:FREQ=YEARLY;WKST=MO
The library uses this regex that I would like to understand:
var matches:Array = data.match(/(.+?)(;(.*?)=(.*?)((,(.*?)=(.*?))*?))?:(.*)$/);
p.name = matches[1];
p.value = matches[9];
p.paramString = matches[2];
Thanks.
That's a terrible regular expression! .* and .*? match as many (greedy) or as few (lazy) of any character as possible. These should only be used as a last resort. Improper use can result in catastrophic backtracking when the regex cannot match the input text. All you need to understand about this regular expression is that you don't want to write regexes like this.
Let me show how I would approach the problem. Apparently the iCalendar File Format is line-based. Each line has a property and a value separated by a colon. The property can have parameters that are separated from it by a semicolon. This implies that a property cannot contain line breaks, semicolons or colons, that the optional parameters cannot contain line breaks or colons, and that the value cannot contain line breaks. This knowledge allows us to write an efficient regular expression that uses negated character classes:
([^\r\n;:]+)(;[^\r\n:]+)?:(.+)
Or in ActionScript:
var matches:Array = data.match(/([^\r\n;:]+)(;[^\r\n:]+)?:(.+)/);
p.name = matches[1];
p.value = matches[3];
p.paramString = matches[2];
As explained by RegexBuddy:
Match the regular expression below and capture its match into backreference number 1 «([^\r\n;:]+)»
    Match a single character NOT present in the list below «[^\r\n;:]+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
        A carriage return character «\r»
        A line feed character «\n»
        One of the characters “;:” «;:»
Match the regular expression below and capture its match into backreference number 2 «(;[^\r\n:]+)?»
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
    Match the character “;” literally «;»
    Match a single character NOT present in the list below «[^\r\n:]+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
        A carriage return character «\r»
        A line feed character «\n»
        The character “:” «:»
Match the character “:” literally «:»
Match the regular expression below and capture its match into backreference number 3 «(.+)»
    Match any single character that is not a line break character «.+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
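To see what the three groups capture on the sample lines from the question, here is a minimal Python sketch of the same negated-character-class pattern. The function parse_property and the dictionary keys are assumptions that mirror the p.name, p.paramString and p.value fields of the ActionScript snippet; this is not the library's code.

```python
import re

# name, optional ";params", then ":" and the value.
PROPERTY = re.compile(r"([^\r\n;:]+)(;[^\r\n:]+)?:(.+)")


def parse_property(line):
    """Split one iCalendar content line into name, paramString and value."""
    match = PROPERTY.match(line)
    if match is None:
        return None
    name, param_string, value = match.groups()
    return {"name": name, "paramString": param_string, "value": value}


for line in ("BEGIN:VEVENT",
             "DTSTART;VALUE=DATE:20080402",
             "RRULE:FREQ=YEARLY;WKST=MO"):
    print(parse_property(line))
```

In this sketch paramString keeps its leading semicolon, and it is None when the property has no parameters, as in the BEGIN and RRULE lines.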