Regex: match after capture group 1 match, using its result - regex

I'm having trouble understanding what I did wrong in the regex below:
/^([^#]+)#(\1.*?(?=\||$))/gi
It correctly matches foo#foo,xyz,asd123|bar,asd123,xyz, with the desired result being foo,xyz,asd123.
But it does not match bar#foo,xyz,asd123|bar,asd123,xyz. Expected output would be bar,asd123,xyz.
Basically, I need to use the result of the capture group 1 to search further in the string after the # character. However, it's only working for the match immediately after # and nothing else. I feel like I'm missing a very basic thing here.
regexr.com/6ussf

You can use
^([^#]+)#.*?(\1.*?)(?=\||$)
Details
^ - start of string
([^#]+) - Group 1 (\1): one or more chars other than #
# - a # char
.*? - zero or more chars other than line break chars as few as possible
(\1.*?) - Group 2: same value as captured into Group 1 and then any zero or more chars other than line break chars as few as possible
(?=\||$) - a positive lookahead that requires | or end of string immediately to the right of the current location.
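The original pattern fails on the second input because \1 is required immediately after the # character; the added .*? lets the engine skip ahead to the first later occurrence of the Group 1 value. A minimal Python sketch of this, where the names PATTERN and entry_for_key are illustrative and not part of the original answer:

```python
import re

# Pattern from the answer: Group 1 is the key before '#'. Group 2 starts at the
# first occurrence of that key after '#' and runs up to the next '|' or the end.
PATTERN = re.compile(r'^([^#]+)#.*?(\1.*?)(?=\||$)')

def entry_for_key(text):
    """Return the Group 2 match, or None if the pattern does not match."""
    m = PATTERN.search(text)
    return m.group(2) if m else None

print(entry_for_key("foo#foo,xyz,asd123|bar,asd123,xyz"))  # foo,xyz,asd123
print(entry_for_key("bar#foo,xyz,asd123|bar,asd123,xyz"))  # bar,asd123,xyz
```

Note that the backreference matches the key wherever it first appears after #, so a key that also occurs inside an earlier entry would start Group 2 there.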

Related

Problem with regular expression with 2 capture group, one is optional

I'm struggling to write the correct regex to match the data below. I want to capture the "Focus+Terminal" and its optional parameter "NYET". How can I re-write my incorrect regex?
user:\/\/(.*)(?:=(.*+))?
I also tried and failed:
user:\/\/(.*)=?(?:(.*+))?
Sample Data
* user://Focus+Terminal=NYET
* user://Focus+Terminal
You can use
user:\/\/(.*?)(?:=(.*))?$
Details:
user:\/\/ - a user:// string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
(?:=(.*))? - an optional non-capturing group that matches a = and then captures into Group 2 any zero or more chars other than line break chars as many as possible
$ - end of string.
As an alternative you might use a negated character class excluding matching a newline or equals sign for the first capture group.
user:\/\/([^=\n]*)(?:=(.*))?
Explanation
user:\/\/ Match user://
([^=\n]*) Capture group 1, match optional chars other than = or a newline
(?:=(.*))? Optionally match = and capture the rest of the line in group 2
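Both suggested patterns can be checked against the two sample lines with a short Python sketch. The names LAZY, NEGATED and parse are illustrative only; the \/ escapes are dropped because / needs no escaping in Python's re module.

```python
import re

# First answer: lazy Group 1, anchored with $ so the lazy group must expand.
LAZY = re.compile(r'user://(.*?)(?:=(.*))?$')
# Second answer: Group 1 cannot cross '=' or a newline, so no anchor is needed.
NEGATED = re.compile(r'user://([^=\n]*)(?:=(.*))?')

def parse(pattern, text):
    """Return (name, parameter); parameter is None when '=' is absent."""
    m = pattern.search(text)
    return m.groups() if m else None

for sample in ("user://Focus+Terminal=NYET", "user://Focus+Terminal"):
    print(parse(LAZY, sample), parse(NEGATED, sample))
# ('Focus+Terminal', 'NYET') ('Focus+Terminal', 'NYET')
# ('Focus+Terminal', None) ('Focus+Terminal', None)
```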

How to add an if condition to a regex in Golang to match only if the group exists

I am testing two strings and want a common regex that applies one pattern if a group exists and another pattern otherwise, but for some reason the online regex editor does not seem to recognize ?=() as an if/else condition.
I have the following 2 test strings:
/public/weltweit/nfsk/2022/05/18/668e9f57-30be-40b6-bc85-5bf66671e41d/668e9f57-30be-40b6-bc85-5bf66671e41d_AVC-270.mp4
Expected extraction using ^\/(\bpublic\b[\/])*(.+[a-z]{1,}.*[\/|_]+)+.*?$ is
weltweit/nfsk/2022/05/18/668e9f57-30be-40b6-bc85-5bf66671e41d/668e9f57-30be-40b6-bc85-5bf66671e41d_
which is expected but for the other test string :
/medp/ondemand/weltweit/fsk0/258/2580407/2580407_40256616.mp4 with same Regex I get medp/ondemand/weltweit/fsk0/258/2580407/2580407_
My expected extraction is **medp/ondemand/weltweit/fsk0/258/2580407/**
I want to add an if condition to the group (\bpublic\b[\/]) so that an underscore **_** is chosen if the group exists; otherwise a slash **/**
Any pointers are appreciated.
Thank you!
You can use
^/(?:public/(.*_)|(.*/))
The result is either in Group 1 or Group 2.
Details:
^ - start of string
/ - a slash
(?: - start of a non-capturing group:
public/ - a fixed string
(.*_) - Group 1: any zero or more chars other than line break chars as many as possible and then a _ char
| - or
(.*/) - Group 2: any zero or more chars other than line break chars as many as possible and then a / char
) - end of the group.
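Go's regexp package (RE2 syntax) has no lookarounds or conditionals, which is why the answer expresses the if/else as a plain alternation. The sketch below checks the pattern in Python rather than Go, purely as an illustration; extract_prefix is a hypothetical helper that returns whichever group took part in the match.

```python
import re

# Alternation stands in for the conditional: Group 1 when the path starts
# with /public/ (cut at the last '_'), otherwise Group 2 (cut at the last '/').
PATTERN = re.compile(r'^/(?:public/(.*_)|(.*/))')

def extract_prefix(path):
    """Return the captured prefix from Group 1 or Group 2, or None."""
    m = PATTERN.search(path)
    if not m:
        return None
    return m.group(1) if m.group(1) is not None else m.group(2)

print(extract_prefix("/medp/ondemand/weltweit/fsk0/258/2580407/2580407_40256616.mp4"))
# medp/ondemand/weltweit/fsk0/258/2580407/
```

In Go, the same pattern should be usable with regexp.MustCompile and FindStringSubmatch, where the group that did not participate is expected to come back as an empty string.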

How to build a regexp to match optional patterns

I have the following strings sample:
MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750
CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750
I need to separate them into 5 parts (where I put the | in the following samples):
MAREMMA TOSCANA BIANCO DOC |2020| CALASOLE MONTEMASSI|0,750
CHIANTI CLASSICO DOCG |2012| RISERVA ALBOLA |LT.|0,750
As you can see, the fourth part is optional.
I tried some variations of this regexp on https://regex101.com/r/NX3DE3/1, but the LT. part is absorbed into the preceding one:
([A-Za-z ]+)((20\d\d)|(19\d\d))([A-Za-z ]*)((LT))\.?[0-9,]*
The ((LT)) group is optional, but if I add a ? it works for the first example and not for the second, and vice versa.
I would also like to trim the different parts, but I really don't know how!
You can use
^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\s* - zero or more whitespaces
((?:20|19)\d\d) - Group 2: 20 or 19 and then two digits
\s* - zero or more whitespaces
(.*?) - Group 3: any zero or more chars other than line break chars as few as possible
(?:\s+(LT)[. ])? - an optional non-capturing group matching one or more whitespaces and then capturing into Group 4 LT and then a space or .
(\d[\d,]*) - Group 5: a digit and then zero or more digits or commas.
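To see how the five groups come out, including the trimming done by the \s* and \s+ parts, here is a small Python sketch. The name split_label is made up for this illustration.

```python
import re

# Pattern from the answer; Group 4 is None when the optional LT part is absent.
PATTERN = re.compile(
    r'^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)'
)

def split_label(line):
    """Return the five groups as a tuple, or None if the line does not match."""
    m = PATTERN.search(line)
    return m.groups() if m else None

print(split_label("MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750"))
# ('MAREMMA TOSCANA BIANCO DOC', '2020', 'CALASOLE MONTEMASSI', None, '0,750')
print(split_label("CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750"))
# ('CHIANTI CLASSICO DOCG', '2012', 'RISERVA ALBOLA', 'LT', '0,750')
```

Group 4 captures LT without the trailing dot, since the dot is matched outside the capturing parentheses.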

RegEx - if then else

I am trying to work out a regex expression but struggle with conditionals. I have a list of 100s of URLs that look like this:
/name/something/details/55334
/name/page/1/2
/name/somethingdifferent/34523
/name/page/1
/name/something/553/1
The bottom line is that I want to remove everything from the first number onward, except when the segment immediately before the number is the word 'page'.
1. /name/something/details/
2. /name/page/1/2
3. /name/somethingdifferent/
4. /name/page/1
5. /name/something
I will be removing it with Google Analytics Content Grouping or potentially with DataStudio. I already removed /name/ so I have:
1. /something/details/55334
2. /page/1/2
3. /somethingdifferent/34523
4. /page/1
5. /something/553/1
but want to add another rule and remove the numbers so I get:
1. /something/details/
2. /page/1/2
3. /somethingdifferent/
4. /page/1
5. /something
I have already tried:
\(?(?=(page\/[0-9]+))(\2)|(\/\d+)
following the syntax of:
(?(?=condition))(IF)|(ELSE)
but it highlights all numbers after text.
Thanks for your help.
sampak
Try ^(\/page.*|[^0-9]*); it works with your examples.
A version for paths that still include /name: ^(page[\/\d]*|[^\d\s])*
One option might be to match not a whitespace or digit while not matching /page.
Then match a forward slash and 1+ digits followed by any char 0+ times to omit that from the result.
^((?:(?!\/page)[^\d\s])*\/)\d.*
In parts
^ Start of string
( Capture group 1
(?: Non capturing group
(?!\/page) Negative lookahead, assert what is directly to the right is not /page
[^\d\s] Match any char except a digit or whitespace char
)* Close non capturing group and repeat 0+ times
\/ Match /
) Close group 1
\d.* Match a digit followed by any char except a newline 0+ times
In the replacement use the first capturing group
If you also want to remove /name you could use:
^\/name((?:(?!\/page)[^\d\s])*\/)\d.*
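The replacement step can be tried out with a short Python sketch, assuming the tool in use supports lookaheads and group references in the replacement. STRIP, STRIP_WITH_NAME and strip_numbers are illustrative names, not part of the answer.

```python
import re

# Patterns from the answer; '/' needs no escaping in Python.
STRIP = re.compile(r'^((?:(?!/page)[^\d\s])*/)\d.*')
STRIP_WITH_NAME = re.compile(r'^/name((?:(?!/page)[^\d\s])*/)\d.*')

def strip_numbers(path, pattern=STRIP):
    """Replace the match with Group 1; non-matching paths are returned as-is."""
    return pattern.sub(r'\1', path)

for url in ("/something/details/55334", "/page/1/2",
            "/somethingdifferent/34523", "/page/1", "/something/553/1"):
    print(strip_numbers(url))
# /something/details/
# /page/1/2
# /somethingdifferent/
# /page/1
# /something/
```

Paths containing /page do not match at all, so they pass through unchanged. With STRIP_WITH_NAME that also means /name/page/1/2 keeps its /name prefix.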

Is there a regex for adding the first 4 characters to end of string and the last 4 characters to start of string?

I have some lines which I need to alter. They are protein sequences. How would I copy the first 4 characters of the line to the end of the line, and also copy the last 4 characters to the beginning of the line?
The strings are variable which complicates it, for example:
>X
LTGLGIGTGMAATIINAISVGLSAATILSLISGVASGGAWVLAGAKQALKEGGKKAGIAF
>Y
LVATGMAAGVAKTIVNAVSAGMDIATALSLFSGAFTAAGGIMALIKKYAQKKLWKQLIAA
Moreover, how could I exclude lines with a '>' at the beginning (these are names of the corresponding sequence)?
Does anyone know a regex which will allow this to work?
I've already tried some regex solutions but I'm not very experienced with this sort of thing and I can find the end string but can't get it to replace:
Find:
(...)$
Replace:
^$2$1"
An example of what I want to achieve is:
>1
ABCDEFGHIJKLMNOPQRSTUVWXYZ
becomes:
>1
WXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCD
Thanks
Try doing a find, in regex mode, on the following pattern:
^([A-Z]{4}).*([A-Z]{4})$
Then replace with the last four characters, the whole match ($0), and the first four characters:
$2$0$1
You can use the regex below.
^(([A-Z]{4})([A-Z]*)([A-Z]{4}))$
^ asserts the position at the start of the line, so nothing can come before it.
( is the start of a capture group, this is group 1.
( is the start of a capture group, this is group 2. This group is inside group 1.
[A-Z]{4} means exactly 4 capital characters from A to Z.
) is the end of capture group 2.
( is the start of a capture group, this is group 3.
[A-Z]* matches capital characters from A to Z between zero and infinite times.
) is the end of capture group 3.
( is the start of a capture group, this is group 4.
[A-Z]{4} means exactly 4 capital characters from A to Z.
) is the end of capture group 4.
$ asserts the position at the end of the line, so nothing can come after it.
See how it works with a replace here: https://regex101.com/r/W786uL/3.
$4$1$2
$4 means put capture group 4 here. Which is the last 4 characters.
$1 means put capture group 1 here. Which is everything in the entire string.
$2 means put capture group 2 here. Which is the first 4 characters.
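The same find and replace can be sketched in Python, where group references in the replacement are written \g<n> instead of $n. The re.MULTILINE flag is an assumption here so that ^ and $ apply per line, and pad_sequences is a made-up name.

```python
import re

# Four-group pattern from the answer. Header lines start with '>', which is
# outside [A-Z], so they never match and are left untouched.
PATTERN = re.compile(r'^(([A-Z]{4})([A-Z]*)([A-Z]{4}))$', re.MULTILINE)

def pad_sequences(text):
    """Prefix each sequence line with its last 4 letters and append its first 4."""
    return PATTERN.sub(r'\g<4>\g<1>\g<2>', text)

print(pad_sequences(">1\nABCDEFGHIJKLMNOPQRSTUVWXYZ"))
# >1
# WXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCD
```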
You can use
^(.{4})(.*?)(.{4})$
^ - start of string
(.{4}) - Match any four characters except a newline
(.*?) - Match any character zero or more times (lazy mode)
$ - End of string
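Because . matches any character, this pattern would also rewrite a header line of eight or more characters. The sketch below adds a (?!>) lookahead to skip such lines; that lookahead, the replacement string, the re.MULTILINE flag and the name pad_line are all assumptions for illustration and not part of the answer above.

```python
import re

# Dot-based pattern from the answer, plus an assumed (?!>) guard so that
# header lines beginning with '>' are never rewritten.
PATTERN = re.compile(r'^(?!>)(.{4})(.*?)(.{4})$', re.MULTILINE)

def pad_line(text):
    """Emit last 4 chars + whole line + first 4 chars for each matching line."""
    # \g<0> is the entire match, \g<1> the first four, \g<3> the last four.
    return PATTERN.sub(r'\g<3>\g<0>\g<1>', text)

print(pad_line(">sequence_one\nABCDEFGHIJKLMNOPQRSTUVWXYZ"))
# >sequence_one
# WXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCD
```

Lines shorter than eight characters do not match and are left as they are.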