I am trying to make an If-Then-Else conditional statement in regular expressions.
The regex takes as input a string representing a filename.
Here are my test strings...
The Edge Of Seventeen 2016 720p.mp4
20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
In the first string, 2016 represents a year but in the other two strings 2314 and 2059 represent times in 24 hour clock format.
The filename should be retained unchanged if it matches this regex:
\d{8} \d{4} -.*?- .*?\.ts
I have tested it and it works. It matches these test strings:
20180511 2314 - Film4 - Northern Soul.ts
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
If the filename does not match that first regex then this regex should be applied to it:
(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?
This is a cleandatetime regexp that is used by Kodi to remove everything from a string AFTER a four digit number, if it exists, representing a date between 1900 and 2099. I have also tested this and it works.
Here is what I have tried to make the If-Then-Else Regex but it doesn't work:
I use this format --> (?(A)X|Y)
(?(\d{8} \d{4} -.*?- .*?\.ts)^.*$|(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?)
This is A
\d{8} \d{4} -.*?- .*?\.ts
This is X
^.*$
This is Y
(.*[^ _\,\.\(\)\[\]\-])[ _\.\(\)\[\]\-]+(19[0-9][0-9]|20[0-9][0-9])([ _\,\.\(\)\[\]\-]|[^0-9]$)?
This is the expected output...
Test string:
The Edge Of Seventeen 2016 720p.mp4
Expected output:
"The Edge Of Seventeen 2016 " (quotes only included to show that a trailing space can be left at the end)
Test String:
20180511 2314 - Film4 - Northern Soul.ts
Expected output:
20180511 2314 - Film4 - Northern Soul.ts
Test String:
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
Expected output:
20150526 2059 - BBC Four - We Need to Talk About Kevin.ts
I am looking for a solution entirely in regular expression syntax. Can someone help me to make it work please?
Cheers,
Flex
You may use a PCRE pattern like
^(?!\d{8} \d{4} -.*?- .*?\.ts$)(.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2})(?:[ _,.()\[\]-]|[^0-9]$)?.*
Replace with $1, see the regex demo.
It matches
^ - start of string
(?!\d{8} \d{4} -.*?- .*?\.ts$) - the negative lookahead fails the match if the whole string matches
\d{8} \d{4} - 8 digits, space, 4 digits, space
-.*?- .*? - -, then any 0 or more chars other than line break chars, as few as possible, - and a space and then again 0 or more chars other than line break chars, as few as possible
\.ts$ - .ts at the end of string
(.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2})(?:[ _,.()\[\]-]|[^0-9]$)?.*: Group 1, then an optional delimiter, and then the rest of the string:
.* - any 0+ chars other than line break chars as many as possible
[^ _,.()\[\]-] - a char other than a space, _, ,, ., (, ), [, ] or -
[ _.()\[\]-]+ - 1+ spaces, _, ., (, ), [, ] or -
(?:19|20) - 19 or 20
[0-9]{2} - two digits
(?:[ _,.()\[\]-]|[^0-9]$)? - an optional non-capturing group matching a space, _, ,, ., (, ), [, ] or -, or any char other than a digit at the end of the string.
.* - any 0+ chars other than line break chars as many as possible.
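As a rough sketch of how the replacement behaves, here is the same pattern run through Python's re module, which accepts this syntax unchanged. The helper name clean_title is made up for illustration, and Python writes the $1 replacement as \1. Note that the result carries no trailing space, because the optional delimiter sits outside Group 1.

```python
import re

# Pattern copied from the answer above, split over three lines for readability.
CLEAN_RE = re.compile(
    r"^(?!\d{8} \d{4} -.*?- .*?\.ts$)"                    # leave recorder-style names alone
    r"(.*[^ _,.()\[\]-][ _.()\[\]-]+(?:19|20)[0-9]{2})"  # Group 1: title up to and including the year
    r"(?:[ _,.()\[\]-]|[^0-9]$)?.*"                       # optional delimiter, then the rest
)


def clean_title(filename: str) -> str:
    """Hypothetical helper: keep only Group 1 when the pattern matches."""
    # When the lookahead rejects the string, sub() finds no match and
    # returns the filename unchanged.
    return CLEAN_RE.sub(r"\1", filename)


for name in (
    "The Edge Of Seventeen 2016 720p.mp4",
    "20180511 2314 - Film4 - Northern Soul.ts",
    "20150526 2059 - BBC Four - We Need to Talk About Kevin.ts",
):
    print(repr(clean_title(name)))
```

The first name is cut after 2016, and the two .ts names come back untouched.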
Since you have mentioned that A, X and Y are tested and found working, and since there are only 2 patterns, I think this pattern will work (Python style):
pattern = "(.?(?=" + A + ")" + X + ")|(" + Y + ")"
which means:
(.?(?=A)X)|(Y)
Explanation:
1. There are two groups - one for X and one for Y.
2. The group for capturing X starts with .? just to make the engine start moving and check if there is a part matching X ahead (a lookahead). If yes, it continues with matching X since it will encounter it after the lookahead block.
3. If in (2), the lookahead doesn't match, then the | (or) part, which is Y will take over. If that matches, you get a result. Else, no output.
(Sadly, the patterns for A and Y you posted were not working for me on Python, so I replaced them with my own for testing. Please do confirm if the pattern is working with the original ones.)
I have the following strings sample:
MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750
CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750
I need to split each string into 5 parts (at the positions marked with | in the following samples):
MAREMMA TOSCANA BIANCO DOC |2020| CALASOLE MONTEMASSI|0,750
CHIANTI CLASSICO DOCG |2012| RISERVA ALBOLA |LT.|0,750
As you can see, the fourth part is optional.
I tried some variations of this regexp on https://regex101.com/r/NX3DE3/1, but the LT. part is absorbed into the preceding part:
([A-Za-z ]+)((20\d\d)|(19\d\d))([A-Za-z ]*)((LT))\.?[0-9,]*
The ((LT)) group should be optional, but if I add a ? it works for the first example and not for the second, and vice versa.
I would also like to trim the different parts, but really don't know how!
You can use
^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)
See the regex demo. Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\s* - zero or more whitespaces
((?:20|19)\d\d) - Group 2: 20 or 19 and then two digits
\s* - zero or more whitespaces
(.*?) - Group 3: any zero or more chars other than line break chars as few as possible
(?:\s+(LT)[. ])? - an optional non-capturing group matching one or more whitespaces and then capturing into Group 4 LT and then a space or .
(\d[\d,]*) - Group 5: a digit and then zero or more digits or commas.
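To see the five groups on the two sample strings, here is a small sketch using Python's re module (an assumption on my side, the question does not name a language). The helper name split_label is made up. The \s* parts around the year do the trimming, and Group 4 comes back as None when LT is absent.

```python
import re

# Pattern copied from the answer above.
WINE_RE = re.compile(r"^(.*?)\s*((?:20|19)\d\d)\s*(.*?)(?:\s+(LT)[. ])?(\d[\d,]*)")


def split_label(line):
    """Hypothetical helper: return the five groups, or None if the line does not match."""
    m = WINE_RE.match(line)
    return m.groups() if m else None


print(split_label("MAREMMA TOSCANA BIANCO DOC 2020 CALASOLE MONTEMASSI0,750"))
print(split_label("CHIANTI CLASSICO DOCG 2012 RISERVA ALBOLA LT.0,750"))
```

Note that Group 4 holds LT without the dot, since the [. ] after it is outside the capturing group.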
I am looking for one regex which strictly allows 2 floating point numbers which are comma separated.
Test cases:
0,0
0.021312311323,0
0,0.012312312312
1.1,0.9836373
Regex that I have tried is
^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$\D+|\d*\.?\d+
These are latitudes and longitudes, but I just want 2 values in these parameters.
This regex fails on these inputs, which it should reject:
-10a, 10a
10a,10b
I would really appreciate any help and guidance.
Your regex ends with a couple of redundant patterns: you should remove \D+|\d*\.?\d+ after $. Since $ means the end of string, there can be no more text after it, so the \D+ that follows it can never match. The \d*\.?\d+ alternative, on the other hand, matches any float or integer number anywhere in the string, and that is what matched your unwelcome strings.
You can use
^([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?))$
See the regex demo. Note I converted some capturing groups into non-capturing, so that there remain just two "notional" capturing groups in the pattern.
Details
^ - start of string
([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)) - Group 1:
[-+]? - an optional - or +
(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?) - either a number from 0 to 89 ([1-8]?\d) and then an optional fractional part ((?:\.\d+)?) or 90 and then an optional . followed with one or more 0 chars
,\s* - a comma and 0+ whitespace chars
([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?)) - Group 2:
[-+]? - an optional - or +
(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?) - either a 180 number followed with an optional . + one or more 0 chars, or a number from 0 to 179 and then an optional fractional part
$ - end of string.
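If it helps, here is a quick check of the pattern against the test cases from the question, sketched with Python's re module (my choice of language, not the asker's). The helper name parse_pair is made up. It returns the two captured numbers as strings, or None when the input is rejected.

```python
import re

# Pattern copied from the answer above, split after the comma part for readability.
PAIR_RE = re.compile(
    r"^([-+]?(?:[1-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*"
    r"([-+]?(?:180(?:\.0+)?|(?:1[0-7]\d|[1-9]?\d)(?:\.\d+)?))$"
)


def parse_pair(text):
    """Hypothetical helper: return (latitude, longitude) strings, or None."""
    m = PAIR_RE.match(text)
    return m.groups() if m else None


for case in ("0,0", "0.021312311323,0", "0,0.012312312312",
             "1.1,0.9836373", "-10a, 10a", "10a,10b"):
    print(case, "->", parse_pair(case))
```

The first four inputs yield a pair of groups and the last two yield None.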
Your regular expression is almost correct. You should have stopped at $ indicating the end of the string.
// Inputs to check: the first four should be valid, the last two should not.
const testCases = ["0,0",
    "0.021312311323,0",
    "0,0.012312312312",
    "1.1,0.9836373",
    "-10a, 10a",
    "10a,10b"];
// The pattern from the question, cut off at $. No g flag is needed to test one whole string.
const re = /^[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)$/;
testCases.forEach(tc => {
    if (re.test(tc)) {
        console.log(" VALID : " + tc);
    } else {
        console.log("NOT VALID : " + tc);
    }
});
I'm trying to match certain text lines up to a specific string in RegEx (PCRE). Here's an example:
000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730
-
Dr. Max Mustermann
In this text, I'd like to match exactly this part:
Amoxicillin 1000 Heumann 20 Filmtbl. N2
What the lines have in common is the PZN part followed by a 7-8 digit number at the end of every line I'd like to match. However, the PZN part might sometimes be on the next line instead of directly behind it:
000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2
- PZN: 04472730
-
Dr. Max Mustermann
So it's either directly behind it or in the next line. I've tried to do so using this RegEx:
.*(?=[ \-\r\n]+PZN)
This does work, however, in the first example above, it matches this:
Amoxicillin 1000 Heumann 20 Filmtbl. N2 -
Notice the " -" at the end. This should not be included in the match. I suppose RegEx prioritizes the .* part since it's working from left to right, and therefore only strips the very last character of the lookahead. I can't wrap my head around as to how to do it otherwise though.
Any ideas?
One option is to use a capturing group and match 0+ whitespace chars before the - PZN: part.
^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$
^ Start of line
(?![^\S\r\n]*$) Assert not an empty line
(.+)\s* Capture in group 1 matching any char 1+ times followed by 0+ times a whitespace char
- PZN: Match a space - and space followed by PZN: and space
\d{7,8} Match 7-8 digits
$ End of line
Regex demo
Another option is the same pattern written with a lookahead instead of a capturing group:
^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)
Regex demo
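A minimal sketch of both variants, assuming Python's re module with re.MULTILINE standing in for the PCRE multiline flag used in the demos. It only exercises the first sample, where the PZN part sits on the same line; the variable names are mine.

```python
import re

# First sample from the question, PZN on the same line as the product name.
TEXT = "\n".join([
    "000000",
    "999999900",
    "20.10.19",
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730",
    "-",
    "Dr. Max Mustermann",
])

# Both patterns copied from the answer above; MULTILINE makes ^ and $ work per line.
CAPTURE_RE = re.compile(r"^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$", re.MULTILINE)
LOOKAHEAD_RE = re.compile(r"^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)", re.MULTILINE)

captured = CAPTURE_RE.search(TEXT).group(1)       # wanted text is in Group 1
looked_ahead = LOOKAHEAD_RE.search(TEXT).group(0)  # wanted text is the whole match

print(repr(captured))
print(repr(looked_ahead))
```

Both variables end up holding the product name without the trailing " -".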
This would work:
^(.+?)(?=\s?- PZN:)
^(.+?) - at the start of a line lazily match everything
(?=\s?- PZN:) - tell .+? to quit matching once we detect an upcoming PZN:
https://regex101.com/r/dhpth0/1/
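Here is a small sketch of that pattern on both layouts from the question, assuming Python's re module and re.MULTILINE so that ^ matches at each line start. The sample strings are shortened versions of the question's text and the variable names are mine.

```python
import re

# Pattern copied from the answer above.
PZN_RE = re.compile(r"^(.+?)(?=\s?- PZN:)", re.MULTILINE)

# PZN directly behind the product name.
SAME_LINE = "\n".join([
    "20.10.19",
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730",
    "-",
    "Dr. Max Mustermann",
])

# PZN on the following line.
NEXT_LINE = "\n".join([
    "20.10.19",
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2",
    "- PZN: 04472730",
    "-",
    "Dr. Max Mustermann",
])

same_line_hit = PZN_RE.search(SAME_LINE).group(1)
next_line_hit = PZN_RE.search(NEXT_LINE).group(1)

print(repr(same_line_hit))
print(repr(next_line_hit))
```

In both cases the \s? in the lookahead absorbs the single space or the line break in front of "- PZN:", so Group 1 stops right after N2.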
I need a regular expression that extracts all numbers with different delimiters (single whitespace, comma, dot). Each number can use none or all of them.
Example:
text: 'numbers: 3.14 2 544 345,345.55 506 test 120 100 100'
output: '3.14', '2 544', '345,345.55', '506', '120 100 100'
I created this regex: \d+[(.|,|\s)\d+]+, but it does not work properly.
I assume the numbers you need to extract are separated with 2 or more whitespaces, else it would be impossible to differentiate between the end of the previous number and the start of a new one.
If you need to extract the numbers in the formats as shown above, XXX XXX.XXX or XXX,XXX,XXX.XX or XXX or XXX XXX XXX, you may use
\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b
See the regex demo
Details:
\b - leading word boundary
\d{1,3} - 1 to 3 digits
(?:[, ]\d{3})* - 0+ sequences of a comma or space ([, ]) and 3 digits (\d{3})
(?:\.\d+)? - an optional sequence of a dot followed with 1+ digits
\b - trailing word boundary
A less restrictive pattern would be the same as above, but with limiting quantifiers replaced with a +:
\b\d+(?:[, ]\d+)*(?:\.\d+)?\b
See this regex demo
It will also match numbers like 1234566 and 124354354.343344.
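A short sketch comparing the two patterns with Python's re.findall (the language is my assumption). The sample text is built with two spaces between separate numbers, as assumed at the top of this answer; with single spaces, neighbouring numbers such as "2 544 345,345.55" would merge into one match.

```python
import re

# Both patterns copied from the answer above.
STRICT_RE = re.compile(r"\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b")
LOOSE_RE = re.compile(r"\b\d+(?:[, ]\d+)*(?:\.\d+)?\b")

# Two spaces separate distinct numbers (assumed input format).
GAP = " " * 2
text = "numbers: " + GAP.join(["3.14", "2 544", "345,345.55", "506 test 120 100 100"])

print(STRICT_RE.findall(text))
print(LOOSE_RE.findall(text))

# Only the loose pattern accepts a long run of digits without separators.
print(STRICT_RE.findall("1234566"), LOOSE_RE.findall("1234566"))
```

On the sample text both patterns return the same five numbers; they differ only on inputs like 1234566.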
I need to match these values:
(First approach to a regex that roughly does what I want)
\d+([.,]\d{3})*[.,]\d{2}
like
24,56
24.56
1.234,56
1,234.56
1234,56
1234.56
but I need to not match
1.234.56
1,234,56
So somehow I need to check the last occurrence of "." or "," to not be the same as the previous "." or ",".
Background: Amounts shall be matched in English and German format with (optional) 1000-Separators.
But even with help of regex101 I completely fail at coming up with a correctly working look-behind. Any suggestions are highly appreciated.
UPDATE
Based on the answers I got so far, I came up with this (demo):
\d{1,3}(?:([\.,'])?\d{3})*(?!\1)[\.,\s]\d{2}
But it matches for example 1234.567,23 which is not desirable.
You may capture the digit grouping symbol and use a negative lookahead with a backreference to restrict the decimal separator:
^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$
                  ^    ^        ^^^^^
See the regex demo
Group 1 will contain the last value of the digit grouping symbol and (?!\1)[.,] will match the other symbol.
Details:
^ - start of string
(?:\d+|\d{1,3}(?:([.,])\d{3})*) - either of the two alternatives:
\d+ - 1+ digits
| - or
\d{1,3} - 1 to 3 digits,
(?:([.,])\d{3})* - zero or more sequences of:
([.,]) - Group 1 capturing . or ,
\d{3} - 3 digits
(?!\1)[.,] - a . or , but not equal to what was last captured with ([.,]) pattern above
\d{2} - 2 digits
$ - end of string.
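Below is a quick check of this pattern against the values from the question, sketched with Python's re module (my assumption, the question does not name an engine). It relies on a backreference to a group that did not take part in the match failing, as it does in PCRE and in Python, so (?!\1) passes when no grouping symbol was captured. Engines where such a backreference matches the empty string would behave differently. The helper name is_amount is made up.

```python
import re

# Pattern copied from the answer above.
AMOUNT_RE = re.compile(r"^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$")


def is_amount(text):
    """Hypothetical helper: True if text is an amount in English or German format."""
    return AMOUNT_RE.match(text) is not None


should_match = ["24,56", "24.56", "1.234,56", "1,234.56", "1234,56", "1234.56"]
should_fail = ["1.234.56", "1,234,56", "1234.567,23"]

for value in should_match + should_fail:
    print(value, "->", is_amount(value))
```

The \d+ alternative covers amounts without a grouping symbol, while \d{1,3} before the repeated group is what rejects 1234.567,23.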
You can use
^\d+(([.,])\d{3})*(?!\2)[.,]\d{2}$
live demo