Find specific segments using regex - regex

I have a string that I want to split into specific segments, but I can't match the correct segment because the same pattern occurs twice in it.
My string:
#if(text.text isempty){<customer_comment>#cc{txt_without_comments}cc#</customer_comment>}else{#if(text.answer=='no'){<customer_comment>#{text.text}</customer_comment>}else{<answer>#{text.text}</answer>}endif#}endif#
I need to match: #if(text.text isempty){#cc{txt_without_comments}cc#}else{....}endif#
and not the nested #if inside the else block (shown as dots above).
Here is my incomplete regex:
(?<match>(?<open>#if\((?<statement>[^)]*)\)\s*{)(?<ifblock>(.+?)(?:}else{)(?<elseblock>.*))(?<-open>)}endif#)
This regex is too greedy in the ifblock group; it is supposed to stop at the first }else{ pattern.
Edit:
This is the exact result i want to produce:
match: #if(text.text isempty){<customer_comment>#cc{txt_without_comments}cc#</customer_comment>}else{#if(text.answer=='no'){<customer_comment>#{text.text}</customer_comment>}else{<answer>#{text.text}</answer>}endif#}endif#
statement: text.text isempty
ifblock: <customer_comment>#cc{txt_without_comments}cc#</customer_comment>
elseblock: #if(text.answer=='no'){<customer_comment>#{text.text}</customer_comment>}else{<answer>#{text.text}</answer>}endif#

You are not using balancing groups correctly. A balancing group pushes a value onto the group stack with one capture and pops it with another; a conditional construct is then needed to check whether the stack is empty and, if it is not, to fail the match and force backtracking.
So, if a regex is the only way for you to match these strings, use the following:
(?s)(?<match>#if\((?<statement>[^)]*)\)\s*{\s*(?<ifblock>.*?)\s*}\s*else\s*{\s*(?<elseblock>#if\s*\((?:(?!#if\s*\(|\}\s*endif#).|(?<a>)#if\s*\(|(?<-a>)\}\s*endif#)*(?(a)(?!)))\}\s*endif#)
See the regex demo. However, writing a custom parser might turn out to be a better approach here.
Pattern details:
(?s) - single line mode on (. matches newline)
(?<match> - start of the outer group "match"
#if\( - a literal char sequence #if(
(?<statement>[^)]*) - Group "statement" capturing 0+ chars other than )
\)\s*{\s* - ), 0+ whitespaces, {, 0+ whitespaces
(?<ifblock>.*?) - Group "ifblock" that captures any 0+ chars, as few as possible up to the first...
\s*}\s*else\s*{\s* - 0+ whitespaces, }, 0+ whitespaces, else, 0+ whitespaces, {, 0+ whitespaces
(?<elseblock>#if\s*\((?:(?!#if\s*\(|\}\s*endif#).|(?<a>)#if\s*\(|(?<-a>)\}\s*endif#)*(?(a)(?!))) - Group "elseblock" capturing:
#if\s*\( - #if, 0+ whitespaces, (
(?: - start of the alternation group, that is repeated 0+ times
(?!#if\s*\(|\}\s*endif#).| - any char not starting the #if, 0+ whitespaces, ( sequence and not starting the }, 0+ whitespaces, endif# sequence or...
(?<a>)#if\s*\(| - Group "a" pushing the #if, 0+ whitespaces and ( into stack
(?<-a>)\}\s*endif# - }, 0+ whitespaces, endif# removed from "a" group stack
)* - end of the alternation group
(?(a)(?!)) - conditional checking if the balanced amount of if and endif are matched
\}\s*endif# - }, 0+ whitespaces, endif#
) - end of the outer "match" group.
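As a rough illustration of the custom-parser route mentioned above, here is a minimal Python sketch. It does not use balancing groups (Python's re module has none); instead it counts nesting depth while walking over the #if( / }else{ / }endif# tokens. The function name split_if_block and the returned dictionary layout are hypothetical, chosen only for this example.

```python
import re

# Header of an #if block: "#if(", the statement, ")" and the opening "{".
HEADER = re.compile(r"#if\s*\((?P<statement>[^)]*)\)\s*\{")
# The three tokens that change nesting or split the block.
TOKEN = re.compile(
    r"(?P<open>#if\s*\()|(?P<else>\}\s*else\s*\{)|(?P<close>\}\s*endif#)"
)

def split_if_block(text):
    """Split the first outermost #if(...){...}else{...}endif# in text (hypothetical helper)."""
    header = HEADER.search(text)
    if header is None:
        raise ValueError("no #if(...){ header found")
    depth = 0          # number of nested #if blocks currently open
    else_tok = None    # the }else{ that belongs to the outer block
    for tok in TOKEN.finditer(text, header.end()):
        kind = tok.lastgroup
        if kind == "open":
            depth += 1
        elif kind == "else":
            if depth == 0 and else_tok is None:
                else_tok = tok
        elif depth > 0:
            depth -= 1  # this }endif# closes a nested #if
        else:
            # this }endif# closes the outer block
            if_end = else_tok.start() if else_tok else tok.start()
            return {
                "match": text[header.start():tok.end()],
                "statement": header.group("statement"),
                "ifblock": text[header.end():if_end],
                "elseblock": text[else_tok.end():tok.start()] if else_tok else None,
            }
    raise ValueError("unbalanced #if / endif#")

sample = (
    "#if(text.text isempty){<customer_comment>#cc{txt_without_comments}cc#"
    "</customer_comment>}else{#if(text.answer=='no'){<customer_comment>"
    "#{text.text}</customer_comment>}else{<answer>#{text.text}</answer>}endif#}endif#"
)
result = split_if_block(sample)
```

Because nested blocks only change the depth counter, the first }else{ seen at depth 0 is the one that splits the outer block, which is the behaviour the question asks for.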


Regex for not only spaces

I am looking for a regex that rejects input consisting only of spaces when there is more than one of them. A single blank space should be allowed.
I have something like .*\S.* or .*[^ ].*, but I want to allow a single space while rejecting input that is made up of more than one space and nothing else.
You can use
pattern="\S*(?:\s\S*)?"
The pattern will get parsed as a ^(?:\S*(?:\s\S*)?)$ pattern and will match
^ - start of string
(?: - start of a non-capturing group:
\S* - zero or more chars other than whitespace
(?:\s\S*)? - an optional sequence of a whitespace and zero or more non-whitespace chars
) - end of a non-capturing group
$ - end of string.
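To try the pattern outside an HTML form, a small Python check could look like the sketch below. It assumes re.fullmatch as a stand-in for the implicit anchoring of the pattern attribute; the helper name is_allowed and the sample values are made up for this example.

```python
import re

# Same pattern as in the pattern attribute; fullmatch supplies the anchoring.
ALLOWED = re.compile(r"\S*(?:\s\S*)?")

def is_allowed(value):
    """Return True if value contains at most one whitespace character."""
    return ALLOWED.fullmatch(value) is not None

samples = ["", " ", "  ", "abc", "a b", "a b c", "a  b"]
results = {s: is_allowed(s) for s in samples}
```

Note that the pattern permits at most one whitespace character anywhere in the value, so "a b c" is rejected as well as "  ".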

Regex to capture everything after optional token

I have fields which contain data in the following possible formats (each line is a different possibility):
AAA - Something Here
AAA - Something Here - D
Something Here
Note that the first group of letters (AAA) can be of varying lengths.
What I am trying to capture is the "Something Here" or "Something Here - D" (if it exists) using PCRE, but I can't get the Regex to work properly for all three cases. I have tried:
- (.*) which works fine for cases 1 and 2 but obviously not 3;
(?<= - )(.*) which also works fine for cases 1 and 2;
(?! - )(.+)| - (.+) works for cases 2 and 3 but not 1.
I feel like I'm on the verge of it but I can't seem to crack it.
Thanks in advance for your help.
Edit: I realized that I was unclear in my requirements. If there is a trailing " - D" (the letter in the data is arbitrary but should only be a single character), that needs to be captured as well.
About the patterns that you tried:
- (.*) This pattern will match the first occurrence of - followed by the rest of the line. It will match too much for the second example, as the .* will also match the second occurrence of -
(?<= - )(.*) This pattern will match the same as the first one but without the - as it only asserts that it should occur directly to the left
(?! - )(.+)| - (.+) This pattern uses a negative lookahead which asserts that what is directly to the right is not - . As none of the examples start with - , the whole line will be matched directly after the negative lookahead due to .+ and the second part after the alternation | will not be evaluated
If the first group of letters can be of varying length, you could make the match more specific by matching 1 or more uppercase characters with [A-Z]+, or 1+ word characters with \w+.
For a broader match, you could match 1 or more non whitespace characters using \S+
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*
Explanation
^ Start of string
(?:\S+\h-\h)? Optionally match the first group of non whitespace chars followed by - between horizontal whitespace chars
\K Clear the match buffer (Forget what is currently matched)
\S+ Match 1+ non whitespace characters
(?: Non capture group
\h(?!-\h) Match a horizontal whitespace char and assert what is directly to the right is not - followed by another horizontal whitespace char
\S+ Match 1+ non whitespace chars
)* Close non capture group and repeat 0+ times to match more "words" separated by spaces
Regex demo
Edit
To match an optional hyphen and trailing single character, you could add an optional non capturing group (?:-\h\S\h*)?$ and assert the end of the string if the pattern should match the whole string:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*\h*(?:-\h\S\h*)?$
Regex demo
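The patterns above are written for PCRE. As a sketch of the same idea in Python, whose re module supports neither \K nor \h, the version below uses a capture group instead of \K and [ \t] as an approximation of \h (which covers more horizontal whitespace characters). The name extract is hypothetical.

```python
import re

FIELD = re.compile(
    r"^(?:\S+[ \t]-[ \t])?"          # optional leading token such as "AAA - "
    r"(\S+(?:[ \t](?!-[ \t])\S+)*"   # group 1: words separated by a single blank
    r"[ \t]*(?:-[ \t]\S[ \t]*)?)$"   # still group 1: optional trailing "- D"
)

def extract(field):
    """Return the part after the optional leading token, or None if no match."""
    m = FIELD.match(field)
    return m.group(1) if m else None

case1 = extract("AAA - Something Here")
case2 = extract("AAA - Something Here - D")
case3 = extract("Something Here")
```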
You may use
^(?:.*? - )?\K.*?(?= - | *$)
^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$)
See the regex demo
Details
^ - start of string
(?:.*? - )? - an optional non-capturing group matching any 0+ chars other than line break chars, as few as possible, up to the first space-hyphen-space
\K - match reset operator
.*? - any 0+ chars other than line break chars as few as possible
(?= - | *$) - space-hyphen-space or 0+ spaces till the end of string should follow immediately on the right.
Note that \h matches any horizontal whitespace chars.
^(?:[A-Z]+ - \K)?.*\S
demo
Since "Something Here" can be anything, there's no reason to describe the possible trailing letter separately in the pattern. You don't need anything more complicated.
With this pattern I assume that you are not interested in the trailing spaces, which is why I ended it with \S. If you want to keep them, remove the \S and change the previous quantifier to +.

Matching the character before a linebreak, excluding whitespaces?

So I currently have a regex (https://regex101.com/r/zBE4Ju/1) that highlights the words before and after a linebreak. This is nice, but the issue is sometimes there are whitespaces after the word that appears BEFORE the line break. So they end up
You can see on my regex101 how the issue happens, and I have outlined the problem. I need to recognize the word before and after the line break, regardless of if there is a space after the word.
(\w*(?:[\n](?![\n])\w*)+)
You can see it in action here https://regex101.com/r/zBE4Ju/3
Expected: Line 1
Actual: Line 3
You can use $1 from:
/([^ ]+) *(\r|\n)/gm
https://regex101.com/r/o87VP7/5
If you want to highlight the last "word" in the sentence followed by possible spaces and a newline, you could repeat 0+ times a group matching 1+ non whitespace chars followed by 1+ spaces.
Then capture 1+ non whitespace chars in a group (\S+) and match possible spaces followed by a newline.
^ *(?:\S+ +)*(\S+) *\r?\n
Explanation
^ Start of string
 * Match 0+ times a space
(?: Non capturing group
\S+ + Match 1+ non whitespace chars and 1+ spaces
)* Close non capturing group and repeat 0+ times (to also match a single word at the beginning)
(\S+) Capture group 1, match 1+ times a non whitespace char
 *\r?\n Match 0+ times a space followed by a newline
Regex demo
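A quick way to check this pattern is a short Python sketch like the one below. It assumes multiline mode so that ^ matches at the start of every line; the sample text is invented for this example.

```python
import re

# ^ must match at every line start, hence re.MULTILINE.
LAST_WORD = re.compile(r"^ *(?:\S+ +)*(\S+) *\r?\n", re.MULTILINE)

# Hypothetical input: the first line has trailing spaces before the newline.
text = "first line here  \nsecond line\nlast"

# Group 1 holds the last word of each line that ends in a newline.
words = LAST_WORD.findall(text)
```

The final line "last" is not reported because no newline follows it.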

regex combination of two lookaround - regexstorm.net

I have to collect two pieces of information from a text using regex, the name and the database, and relate them in one table. But I can only collect them individually.
This is an example. I have many blocks like this, and two of them don't have a database value; those I need to ignore
[SCD] {I need the name between []}
Driver=/opt/pcenter/pc961/ODBC7.1/lib/DWmsss27.so
Description=
Database=scd {I need the value after Defaut|Database}
Address=#######
LogonID=######
Password=######
QuoteId=No
AnsiNPW=No
ApplicationsUsingThreads=1
The regex to find the name is:
(?<=\[)(.*)(?=\])
The regex to find the value after database is
(?<=Defaut|Database=)(.*)
How can I combine both of them into one regex?
To match both values you could use 2 capturing groups instead, together with a repeating pattern and a negative lookahead that skips every line that does not start with Default or Database until one does.
\[([^]]+)\](?:\r?\n(?!Default|Database).*)*\r?\n(?:Default|Database)=(\S+)
About the pattern
\[ Match [
( Capture group 1
[^]]+ match 1+ times not ]
) Close group 1
\] Match ]
(?: Non capturing group
\r?\n Match newline,
(?! Negative lookahead, assert what is directly on the right is not
Default|Database Match one of the options
).* Close negative lookahead and match any char except a newline 0+ times
)* Close non capturing group and repeat 0+ times
\r?\n(?:Default|Database)= Match newline, any of the options and =
(\S+) Capturing group 2, match 1+ times a non whitespace char (or use (.+) to match any char 1+ times)
regexstorm demo
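The pattern can be tried in Python as sketched below. The sample text, including the block names NODB and ABC and the shortened driver paths, is invented for this example. The second pattern, STRICT, is not part of the answer above: it is one possible variation, offered as an assumption, that adds \[ to the lookahead so that a block without a Database line cannot run on into the next block and borrow its value.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(
    r"\[([^]]+)\](?:\r?\n(?!Default|Database).*)*\r?\n(?:Default|Database)=(\S+)"
)
# Assumed variation: also stop skipping lines at the next "[...]" header.
STRICT = re.compile(
    r"\[([^]]+)\](?:\r?\n(?!Default|Database|\[).*)*\r?\n(?:Default|Database)=(\S+)"
)

# Hypothetical sample: the NODB block has no Database line.
sample = "\n".join([
    "[SCD]",
    "Driver=/opt/x.so",
    "Description=",
    "Database=scd",
    "Address=a",
    "[NODB]",
    "Driver=/opt/y.so",
    "[ABC]",
    "Driver=/opt/z.so",
    "Database=abc",
])

pairs = PATTERN.findall(sample)        # NODB picks up the value of ABC
strict_pairs = STRICT.findall(sample)  # NODB is skipped
```

With the original pattern, the block without a database is paired with the next block's value; with the extra \[ alternative in the lookahead it is ignored, which seems closer to what the question asks for.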

Regular expression number starting with zero,one or more whitespaces followed by plus or minus

I have following regex;
^(\s)*[+-]?\d+$
It fails if the input contains multiple whitespaces before the first non-whitespace character.
Currently it works on the following examples
- :false
-1 :true
+1 :true
What I want is the same logic when there are 0, 1 or more whitespaces at the beginning:
: true (empty input string)
: true (one or more spaces)
-: false
-1: true
+1: true
235: true
Here I'm matching numbers, but as a more general scheme I would like the same behaviour for decimals, some special words, etc.
So, basically, I want my regex to match if there is an empty string or any number of whitespaces at the beginning, followed by something I want to match (number, email, special words...)
You need to make the whole pattern optional with an optional grouping construct and put the \s* before the group:
^\s*(?:[+-]?\d+)?$
    ^^^        ^^
See the regex demo
Details:
^ - start of a string
\s* - 0+ whitespaces
(?: - start of a non-capturing group (if the engine does not support non-capturing groups, remove ?:) matching....
[+-]? - an optional (1 or 0 occurrences) + or - symbols
\d+ - 1+ digits
)? - .... 1 or 0 times
$ - end of string.
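The examples from the question can be checked against this pattern with a short Python sketch; the helper name is_valid is hypothetical.

```python
import re

NUMBER = re.compile(r"^\s*(?:[+-]?\d+)?$")

def is_valid(value):
    """True for empty or whitespace-only input, or an optionally signed integer."""
    return NUMBER.search(value) is not None

# Inputs taken from the question, plus one with several leading spaces.
checks = {s: is_valid(s) for s in ["", "   ", "-", "-1", "+1", "235", "   -1"]}
```

Only "-" is rejected: the optional group cannot match a sign without digits, and skipping the group leaves the sign unmatched before $.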
I think you want the asterisk in with the \s:
^\s*[-+]?\d+$