How to make an If Then Else Regex conditional statement

I am trying to make an If-Then-Else conditional statement in regular expressions.
The regex takes as input a string representing a filename.
Here are my test strings...
The Edge Of Seventeen 2016 720p.mp4
20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
In the first string, 2016 represents a year but in the other two strings 2314 and 2059 represent times in 24 hour clock format.
The filename should be retained unchanged if it matches this regex:
\d{8} \d{4} -.*?- .*?\.ts
Which I have tested and it works. It can match these test strings:
20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
If the filename does not match that first regex then this regex should be applied to it:
(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?
This is a cleandatetime regexp that is used by Kodi to remove everything from a string AFTER a four digit number, if it exists, representing a date between 1900 and 2099. I have also tested this and it works.
Here is what I have tried to make the If-Then-Else Regex but it doesn't work:
I use this format --> (?(A)X|Y)
(?(\d{8} \d{4} -.*?- .*?\.ts)^.*$|(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?)
This is A
\d{8} \d{4} -.*?- .*?\.ts
This is X
^.*$
This is Y
(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?
This is the expected output...
Test string:
The Edge Of Seventeen 2016 720p.mp4
Expected output:
"The Edge Of Seventeen 2016 " (quotes only included to show that a trailing space can be left at the end)
Test String:
20180511 2314 - Film4 - Northern Soul.ts
Expected output:
20180511 2314 - Film4 - Northern Soul.ts
Test String:
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
Expected output:
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
I am looking for a solution entirely in regular expression syntax. Can someone help me to make it work please?
Cheers,
Flex

You may use a PCRE pattern like
^(?!\d{8} \d{4} -.*?- .*?\.ts$)(.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2})(?:[ _,.()\[\]-]|[^0-9]$)?.*
Replace the match with $1, the contents of Group 1.
It matches
^ - start of string
(?!\d{8} \d{4} -.*?- .*?\.ts$) - the negative lookahead fails the match if the whole string matches
\d{8} \d{4} - 8 digits, space, 4 digits, space
-.*?- .*? - -, then any 0 or more chars other than line break chars, as few as possible, - and a space and then again 0 or more chars other than line break chars, as few as possible
\.ts$ - .ts at the end of string
(.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2}) - Group 1:
.* - any 0+ chars other than line break chars, as many as possible
[^ _,.()\[\]-] - a char other than a space, _, ,, ., (, ), [, ] or -
[ _.()\[\]-]+ - 1+ spaces, _, ., (, ), [, ] or -
(?:19|20) - 19 or 20
[0-9]{2} - two digits
(?:[ _,.()\[\]-]|[^0-9]$)? - an optional non-capturing group matching a space, _, ,, ., (, ), [, ] or -, or any char other than a digit at the end of the string
.* - the rest of the string: any 0+ chars other than line break chars, as many as possible.
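The replacement described above can be tried out with a minimal sketch like the following. It assumes Python's re module rather than PCRE (the pattern only uses syntax both engines share), and clean_filename is a hypothetical helper name, not something from the question.

```python
import re

# Pattern from the answer above. The negative lookahead rejects names that
# already have the "date time - channel - title.ts" shape; otherwise Group 1
# captures everything up to and including the four-digit year.
PATTERN = re.compile(
    r"^(?!\d{8} \d{4} -.*?- .*?\.ts$)"
    r"(.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2})"
    r"(?:[ _,.()\[\]-]|[^0-9]$)?.*"
)

def clean_filename(name: str) -> str:
    """Hypothetical helper: keep only Group 1 on a match, else return name unchanged."""
    return PATTERN.sub(r"\1", name)

for name in [
    "The Edge Of Seventeen 2016 720p.mp4",
    "20180511 2314 - Film4 - Northern Soul.ts",
    "20150526 2059 - BBC Four - We Need to Talk About Kevin.ts",
]:
    print(repr(clean_filename(name)))
```

Because Group 1 ends at the year, the first name comes back as "The Edge Of Seventeen 2016" without the trailing space, which the question allows but does not require.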

Since you have mentioned that A, X and Y are tested and found working, and since there are only 2 patterns, I think this pattern will work (Python style):
pattern = "(.?(?=" + A + ")" + X + ")|(" + Y + ")"
which means:
(.?(?=A)X)|(Y)
Explanation:
1. There are two groups: one for X and one for Y.
2. The group for capturing X starts with .? just to make the engine start moving and check whether a part matching A lies ahead (a lookahead). If it does, the engine continues with matching X, since it will encounter it after the lookahead block.
3. If the lookahead in (2) doesn't match, the | (or) part, which is Y, takes over. If that matches, you get a result. Otherwise, there is no output.
(Sadly, the patterns for A and Y you posted were not working for me on Python, so I replaced them with my own for testing. Please do confirm if the pattern is working with the original ones.)
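As a quick check of this construction, here is a minimal sketch that assembles the pattern from the A, X and Y strings given in the question and reads the whole match with group(0). It assumes Python's re module; first_match is a hypothetical helper name.

```python
import re

# A, X and Y exactly as posted in the question.
A = r"\d{8} \d{4} -.*?- .*?\.ts"
X = r"^.*$"
Y = (r"(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+"
     r"(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?")

# (.?(?=A)X)|(Y): take X when A lies ahead, otherwise fall back to Y.
pattern = re.compile("(.?(?=" + A + ")" + X + ")|(" + Y + ")")

def first_match(name):
    """Hypothetical helper: return the text of the first match, or None."""
    m = pattern.search(name)
    return m.group(0) if m else None

print(repr(first_match("The Edge Of Seventeen 2016 720p.mp4")))
print(repr(first_match("20180511 2314 - Film4 - Northern Soul.ts")))
```

Here the Y branch keeps the separator after the year, so the first name yields "The Edge Of Seventeen 2016 " with the trailing space mentioned in the question.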

Related

How to build a regexp to match optional patterns

I have the following strings sample:
MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750
CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750
I need to split each into 5 parts (marked with | in the following samples):
MAREMMA TOSCANA BIANCO DOC |2020| CALASOLE MONTEMASSI|0,750
CHIANTI CLASSICO DOCG |2012| RISERVA ALBOLA |LT.|0,750
As you can see, the fourth part is optional.
I tried some variations of this regexp on https://regex101.com/r/NX3DE3/1, but the LT. part is absorbed into the preceding part:
([A-Za-z ]+)((20\d\d)|(19\d\d))([A-Za-z ]*)((LT))\.?[0-9,]*
The ((LT)) group should be optional, but if I add a ? it works for the first example and not for the second, and vice versa.
I would also like to trim the different parts, but I really don't know how!

You can use
^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\s* - zero or more whitespaces
((?:20|19)\d\d) - Group 2: 20 or 19 and then two digits
\s* - zero or more whitespaces
(.*?) - Group 3: any zero or more chars other than line break chars as few as possible
(?:\s+(LT)[. ])? - an optional non-capturing group matching one or more whitespaces and then capturing into Group 4 LT and then a space or .
(\d[\d,]*) - Group 5: a digit and then zero or more digits or commas.
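A minimal sketch of how the five groups come out, assuming Python's re module; split_label is a hypothetical helper name. Note that the lazy groups plus the surrounding \s* already trim the parts, and that Group 4 is None when the LT part is absent.

```python
import re

# Pattern from the answer above.
WINE = re.compile(r"^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)")

def split_label(label):
    """Hypothetical helper: return the five groups as a tuple, or None if no match."""
    m = WINE.match(label)
    return m.groups() if m else None

print(split_label("MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750"))
# ('MAREMMA TOSCANA BIANCO DOC', '2020', 'CALASOLE MONTEMASSI', None, '0,750')
print(split_label("CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750"))
# ('CHIANTI CLASSICO DOCG', '2012', 'RISERVA ALBOLA', 'LT', '0,750')
```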

Regex for matching latitude, longitude pairs without any other characters

I am looking for one regex which strictly allows 2 floating point numbers which are comma separated.
Test cases:
0,0
0.021312311323,0
0,0.012312312312
1.1,0.9836373
Regex that I have tried is
^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$\D+|\d*\.?\d+
These are latitudes and longitudes, but I just want 2 values within these parameters.
This regex fails on:
-10a, 10a
10a,10b
I would really appreciate any help and guidance.

Your regex ends with redundant patterns: remove \D+|\d*\.?\d+ after $. Since $ means the end of the string, no more text can follow it, so the first alternative, which requires one or more non-digit chars after $, cannot match ordinary input. The | also creates a second, unanchored alternative, \d*\.?\d+, which matches any float or integer number anywhere in the input; this is what matched your unwelcome strings.
You can use
^([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?))$
Note that I converted some capturing groups into non-capturing ones, so that just two capturing groups remain in the pattern, one per number.
Details
^ - start of string
([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)) - Group 1:
[-+]? - an optional - or +
(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?) - either a number from 0 to 89 ([1-8]?\d) and then an optional fractional part ((?:\.\d+)?) or 90 and then an optional . followed with one or more 0 chars
,\s* - a comma and 0+ whitespace chars
([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?)) - Group 2:
[-+]? - an optional - or +
(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?) - either a 180 number followed with an optional . + one or more 0 chars, or a number from 0 to 179 and then an optional fractional part
$ - end of string.
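A minimal sketch of the corrected pattern in use, assuming Python's re module; parse_pair is a hypothetical helper that returns the two captured groups as floats.

```python
import re

# Corrected pattern from the answer above: Group 1 is the latitude
# (-90..90), Group 2 is the longitude (-180..180).
LAT_LON = re.compile(
    r"^([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*"
    r"([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?))$"
)

def parse_pair(text):
    """Hypothetical helper: return (lat, lon) as floats, or None if invalid."""
    m = LAT_LON.match(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

for case in ["0,0", "0.021312311323,0", "0,0.012312312312",
             "1.1,0.9836373", "-10a, 10a", "10a,10b"]:
    print(case, "->", parse_pair(case))
```

The last two inputs from the question now return None, because nothing in the pattern can match once the anchored alternative fails.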

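Your regular expression is almost correct. You should have stopped at $, which marks the end of the string.
// Inputs from the question: the first four are valid, the last two are not.
const testCases = [
  "0,0",
  "0.021312311323,0",
  "0,0.012312312312",
  "1.1,0.9836373",
  "-10a, 10a",
  "10a,10b"
];
// No g flag: a global regex keeps lastIndex between test() calls,
// which a one-shot validity check does not need.
const re = /^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$/;
testCases.forEach(tc => {
  // test() returns true only if the whole string is a valid pair.
  if (re.test(tc)) {
    console.log(" VALID : " + tc);
  } else {
    console.log("NOT VALID : " + tc);
  }
});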

RegEx match anything except linebreaks up to positive lookahead

I'm trying to match certain text lines up to a specific string in RegEx (PCRE). Here's an example:
000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730
-
Dr. Max Mustermann
In this text, I'd like to match exactly this part:
Amoxicillin 1000 Heumann 20 Filmtbl. N2
What these lines have in common is the PZN part, followed by a 7-8 digit number, at the end of every line I'd like to match. However, the PZN part may sometimes be on the next line instead of directly after the text:
000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2
- PZN: 04472730
-
Dr. Max Mustermann
So it's either directly behind it or in the next line. I've tried to do so using this RegEx:
.*(?=[ \-\r\n]+PZN)
This does work, however, in the first example above, it matches this:
Amoxicillin 1000 Heumann 20 Filmtbl. N2 -
Notice the " -" at the end. This should not be included in the match. I suppose RegEx prioritizes the .* part since it's working from left to right, and therefore only strips the very last character of the lookahead. I can't wrap my head around as to how to do it otherwise though.
Any ideas?

One option is to use a capturing group and match 0+ whitespace chars before the - PZN: part.
^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$
^ Start of line
(?![^\S\r\n]*$) Assert not an empty line
(.+)\s* Capture in group 1 any char 1+ times, then match 0+ whitespace chars
 - PZN:  Match a space, -, a space, then PZN: and a space
\d{7,8} Match 7-8 digits
$ End of line
Another option is the same pattern in the form of using a lookahead
^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)

This would work:
^(.+?)(?=\s?- PZN:)
^(.+?) - at the start of a line lazily match everything
(?=\s?- PZN:) - tell .+? to quit matching once we detect an upcoming PZN:
https://regex101.com/r/dhpth0/1/
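A minimal sketch of this lazy pattern applied to both layouts from the question, assuming Python's re module with the MULTILINE flag (so that ^ matches at each line start) and plain \n line endings; drug_name is a hypothetical helper name.

```python
import re

# Pattern from the answer above; MULTILINE makes ^ match at every line start.
PZN_LINE = re.compile(r"^(.+?)(?=\s?- PZN:)", re.MULTILINE)

def drug_name(text):
    """Hypothetical helper: return the text before the PZN part, or None."""
    m = PZN_LINE.search(text)
    return m.group(1) if m else None

same_line = (
    "000000\n999999900\n20.10.19\n"
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730\n"
    "-\nDr. Max Mustermann"
)
next_line = (
    "000000\n999999900\n20.10.19\n"
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2\n"
    "- PZN: 04472730\n"
    "-\nDr. Max Mustermann"
)

print(drug_name(same_line))
print(drug_name(next_line))
```

In both cases the \s? in the lookahead absorbs the single space or the single line break in front of "- PZN:", so the " -" suffix is left out of Group 1.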

Match all types of numbers

I need a regular expression that extracts all numbers with different delimiters (single whitespace, comma, dot). Each number can use none or all of them.
Example:
text: 'numbers: 3.14 2 544 345,345.55 506 test 120 100 100'
output: '3.14', '2 544', '345,345.55', '506', '120 100 100'
I created this regex: \d+[(.|,|\s)\d+]+, but it does not work properly.

I assume the numbers you need to extract are separated with 2 or more whitespaces, else it would be impossible to differentiate between the end of the previous number and the start of a new one.
If you need to extract the numbers in the formats as shown above, XXX XXX.XXX or XXX,XXX,XXX.XX or XXX or XXX XXX XXX, you may use
\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b
Details:
\b - leading word boundary
\d{1,3} - 1 to 3 digits
(?:[, ]\d{3})* - 0+ sequences of a comma or space ([, ]) and 3 digits (\d{3})
(?:\.\d+)? - an optional sequence of a dot followed with 1+ digits
\b - trailing word boundary
A less restrictive pattern would be the same as above, but with limiting quantifiers replaced with a +:
\b\d+(?:[, ]\d+)*(?:\.\d+)?\b
It will also match numbers like 1234566 and 124354354.343344.
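A minimal sketch comparing the two patterns, assuming Python's re module. The sample text below follows the answer's assumption that separate numbers are at least two spaces apart; that spacing is an assumption of this sketch, not something the question confirms.

```python
import re

# Both patterns from the answer above.
STRICT = re.compile(r"\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b")
LOOSE = re.compile(r"\b\d+(?:[, ]\d+)*(?:\.\d+)?\b")

# Assumed input: two spaces between numbers, one space inside a number.
text = "numbers: 3.14  2 544  345,345.55  506 test  120 100 100"

print(STRICT.findall(text))
# ['3.14', '2 544', '345,345.55', '506', '120 100 100']

# Ungrouped digit runs are only accepted by the less restrictive pattern.
ungrouped = "1234566 and 124354354.343344"
print(STRICT.findall(ungrouped))
print(LOOSE.findall(ungrouped))
```

With single spaces between numbers, adjacent values such as "2 544" and "345,345.55" would merge into one match, which is why the separator assumption matters.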

Regex match depending on lookbehind match

I need to match these values:
(First approach to a regex that roughly does what I want)
\d+([.,]\d{3})*[.,]\d{2}
like
24,56
24.56
1.234,56
1,234.56
1234,56
1234.56
but I need to not match
1.234.56
1,234,56
So somehow I need to check the last occurrence of "." or "," to not be the same as the previous "." or ",".
Background: Amounts shall be matched in English and German format with (optional) 1000-Separators.
But even with help of regex101 I completely fail at coming up with a correctly working look-behind. Any suggestions are highly appreciated.
UPDATE
Based on the answers I got so far, I came up with this (demo):
\d{1,3}(?:([\.,'])?\d{3})*(?!\1)[\.,\s]\d{2}
But it matches for example 1234.567,23 which is not desirable.

You may capture the digit grouping symbol and use a negative lookahead with a backreference to restrict the decimal separator:
^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$
Group 1 will contain the last value of the digit grouping symbol and (?!\1)[.,] will match the other symbol.
Details:
^ - start of string
(?:\d+|\d{1,3}(?:([.,])\d{3})*) - either of the two alternatives:
\d+ - 1+ digits
| - or
\d{1,3} - 1 to 3 digits,
(?:([.,])\d{3})* - zero or more sequences of:
([.,]) - Group 1 capturing . or ,
\d{3} - 3 digits
(?!\1)[.,] - a . or , but not equal to what was last captured with ([.,]) pattern above
\d{2} - 2 digits
$ - end of string.
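A minimal sketch that runs this pattern over the values from the question, assuming Python's re module (where a backreference to a group that did not participate simply fails, so the negative lookahead passes when no grouping symbol was captured); is_amount is a hypothetical helper name.

```python
import re

# Pattern from the answer above: Group 1 holds the digit grouping symbol,
# and (?!\1) forbids the decimal separator from being that same symbol.
AMOUNT = re.compile(r"^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$")

def is_amount(value):
    """Hypothetical helper: True if value is an English or German style amount."""
    return AMOUNT.match(value) is not None

should_match = ["24,56", "24.56", "1.234,56", "1,234.56", "1234,56", "1234.56"]
should_fail = ["1.234.56", "1,234,56", "1234.567,23"]

for value in should_match + should_fail:
    print(value, "->", is_amount(value))
```

The last entry, 1234.567,23, is the case from the question's update; it is rejected because the grouped alternative allows at most three leading digits.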

You can use
^\d+(([.,])\d{3})*(?!\2)[.,]\d{2}$
