Regex - date format not using mixed separators - regex

I've written a regex that identifies dates in the form dd/mm/yyyy or dd.mm.yyyy, but it currently also accepts dd/mm.yyyy. I don't want mixed separators to be accepted as valid. How would I modify my regex to fix this?
My Regex is:
^(0[1-9]|[12][0-9]|3[01])[/|./.](0[1-9]|1[012])[/./.](19|20)\d\d$

Use a look-ahead to require that the same separator is used both times:
^(?=.*([/.]).*\1)<your regex here>$
The expression (?=.*([/.]).*\1) is a look-ahead containing a back reference \1 to the first separator captured by ([/.]), meaning that separator must be repeated later in the input.
The whole regex would be (simplifying the separator expression to just [/.]):
^(?=.*([/.]).*\1)(0[1-9]|[12][0-9]|3[01])[/.](0[1-9]|1[012])[/.](19|20)\d\d$
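A quick way to check the look-ahead version is to run it against consistent and mixed separators. This is only a minimal sketch in Python; the names DATE_RE and is_valid_date are made up for the illustration and are not part of the question.

```python
import re

# Look-ahead version: group 1 (inside the look-ahead) captures a separator,
# and \1 requires that same character to appear again later in the input.
DATE_RE = re.compile(
    r"^(?=.*([/.]).*\1)(0[1-9]|[12][0-9]|3[01])[/.](0[1-9]|1[012])[/.](19|20)\d\d$"
)

def is_valid_date(text):
    """Return True for dd/mm/yyyy or dd.mm.yyyy with a single separator type."""
    return DATE_RE.match(text) is not None

for sample in ("01/02/2018", "01.02.2018", "01/02.2018"):
    print(sample, is_valid_date(sample))
```

Note that the look-ahead adds a capturing group at the front, so the day, month and century groups are numbered 2, 3 and 4 in this version.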

Try
^(?:0[1-9]|[12][0-9]|3[01])(\/|\.)(?:0[1-9]|1[012])\1(19|20)\d\d$
This will match
01/02/2018
or
01.02.2018
But will not match
01/02.2018
\1 matches the same text that was matched by the first capturing group, which is (\/|\.) in this case. This is called a "back reference". So the second separator has to repeat whatever the first group matched.
Using (?:...) instead of a plain (...) keeps the group from being numbered for back references. That makes the pattern easier to write, because the separator stays group 1, and it avoids the small cost of storing each group's match for a possible back reference. So use (?:...) when you need parentheses only for grouping.
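To see the numbering in practice, here is a small sketch in Python (DATE_RE and separator_of are hypothetical names chosen for the example). Because the day and month are non-capturing, group 1 is the separator and group 2 is the century.

```python
import re

# Only the separator and the century are capturing groups here;
# the day and month use (?:...), so the separator stays group 1.
DATE_RE = re.compile(
    r"^(?:0[1-9]|[12][0-9]|3[01])(\/|\.)(?:0[1-9]|1[012])\1(19|20)\d\d$"
)

def separator_of(text):
    """Return the separator used by a valid date, or None if text does not match."""
    m = DATE_RE.match(text)
    return m.group(1) if m else None

for sample in ("01/02/2018", "01.02.2018", "01/02.2018"):
    print(sample, separator_of(sample))
```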

Solution for PHP and Python.
Regex: ^(?:0[1-9]|[12][0-9]|3[01])(?:(\/)|\.)(?:0[1-9]|1[0-2])(?(1)\/|\.)(?:19|20)\d{2}$
Details:
(?:) Non capturing group
() Capturing group
[] Match a single character present in the list
| Or
(?(1)yes|no) Conditional: use the yes branch if group 1 took part in the match, otherwise the no branch.
Output:
01/12/1999 true
25.12.1999 true
23.12/1999 false
23/12.1999 false
23,12/1999 false
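The output above can be reproduced in Python, whose re module supports the (?(1)yes|no) conditional. This is a sketch under one assumption: the day part is written as 0[1-9]|[12][0-9]|3[01] so that a day of 00 is rejected. DATE_RE and check are names invented for the example.

```python
import re

# (\/) is group 1. The conditional (?(1)\/|\.) demands "/" as the second
# separator if group 1 matched, and "." otherwise.
# Assumption: the day alternative is tightened so that "00" is not accepted.
DATE_RE = re.compile(
    r"^(?:0[1-9]|[12][0-9]|3[01])(?:(\/)|\.)(?:0[1-9]|1[0-2])(?(1)\/|\.)(?:19|20)\d{2}$"
)

def check(text):
    """Return True if text is a date with consistent separators."""
    return DATE_RE.match(text) is not None

for sample in ("01/12/1999", "25.12.1999", "23.12/1999", "23/12.1999", "23,12/1999"):
    print(sample, str(check(sample)).lower())
```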

Related

Generalized replacement by matching group id

Given a string of the form <digit>-<non-digit> or <non-digit>-<digit>, I need to remove the hyphen (in Python). I.e. 2-f becomes 2f, f-2 becomes f2.
So far I have (?:\d-\D)|(?:\D-\d), which finds the patterns but I can't figure out a way to replace the hyphen with blank. In particular:
if I sub the regex above, it will replace the surrounding characters (because they are the ones matched);
I can do (?:(\d)-(\D))|(?:(\D)-(\d)) to explicitly capture the characters, and then substituting \1\2 correctly turns 2-f into 2f. But it fails for f-2, of course, because those characters are in the 3rd and 4th groups, so we would need to substitute \3\4. Giving the groups names failed as well, because all names need to be unique.
I know I can just run it through 2 sub statements, but is there a more elegant solution? I know regex is super-powerful if you know what you're doing... Thank you!
As an alternative, you could use \1\2 in the replacement with the regex module from PyPI, which supports a branch reset group (?|...) so that both sides of the alternation share the same group numbers.
(?|(\d)-(\D)|(\D)-(\d))
Note that \D can also match a space or a newline. If you want to match a non whitespace char other than a digit, you could also use [^\s\d] instead of \D.
For example:
import regex
pattern = r"(?|(\d)-(\D)|(\D)-(\d))"
s = "2-f or f-2"
print(regex.sub(pattern, r"\1\2", s))
Output
2f or f2
There is nothing that stops you from replacing with \1\2\3\4:
import re
text = "2-f becomes 2f, f-2 becomes f2"
print( re.sub(r"(\d)-(\D)|(\D)-(\d)", r"\1\2\3\4", text) )
This is possible because all backreferences pointing to groups that did not participate in the match are initialized with an empty string beginning with Python 3.5 (before, they were not and that caused issues, see Empty string instead of unmatched group error, and you would have to use a callable as a replacement argument).
Certainly, (?<=\d)-(?=\D)|(?<=\D)-(?=\d) regex, with positive lookarounds instead of capturing groups, looks much cleaner in the current scenario, but it will not work if the boundary patterns are of variable length.
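For comparison, here is a sketch of the lookaround version next to the callable replacement mentioned above for older Python versions. Both use only the standard re module; the function names are made up for the illustration.

```python
import re

# Lookaround version: only the hyphen itself is matched, so the
# replacement is simply the empty string.
HYPHEN_RE = re.compile(r"(?<=\d)-(?=\D)|(?<=\D)-(?=\d)")

# Capturing version, meant to be used with a callable replacement.
PAIR_RE = re.compile(r"(\d)-(\D)|(\D)-(\d)")

def drop_hyphens_lookaround(text):
    return HYPHEN_RE.sub("", text)

def join_groups(match):
    # Groups that did not take part in the match are None here,
    # so they are turned into empty strings before joining.
    return "".join(g or "" for g in match.groups())

def drop_hyphens_callable(text):
    return PAIR_RE.sub(join_groups, text)

print(drop_hyphens_lookaround("2-f or f-2"))
print(drop_hyphens_callable("2-f or f-2"))
```

The callable does not depend on how unmatched groups are expanded in a replacement string, which is why it also works before Python 3.5.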

Regex: a number vs. a backreference to a capture group

I've been studying regular expressions, and I'm scratching my head on this one. On this page (https://www.regular-expressions.info/conditional.html) I see that, in a conditional regex, a reference to a numbered backreference is just a number. For example,
(a)?b(?(1)c|d)
How does regex know that we aren't supposed to match the number "1" instead of the backreference to the 1st capture group? Previously in the lessons I had learned that a backreference would be escaped, such as \1, \2, etc.
As per the regex tutorial you're following:
A special construct (?ifthen|else) allows you to create conditional regular expressions. If the if part evaluates to true, then the regex engine will attempt to match the then part. Otherwise, the else part is attempted instead. The syntax consists of a pair of parentheses. The opening bracket must be followed by a question mark, immediately followed by the if part, immediately followed by the then part. This part can be followed by a vertical bar and the else part. You may omit the else part, and the vertical bar with it.
Alternatively, you can check in the if part whether a capturing group has taken part in the match thus far. Place the number of the capturing group inside parentheses, and use that as the if part.
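The number is read as a group reference only because it sits directly inside the (?(...) construct; a bare 1 anywhere else in the pattern is a literal character.
RegEx Demo of \b(a)?b(?(1)c|d)\b
Note that I have added word boundaries to avoid partially matching a string like abd.
Your second question is this:
What if someone actually wanted to match the literal 1 this way?
valid input: 1c or d; invalid input: 1d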
That would be:
\b(1)?(?(1)c|d)\b
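Both patterns can be tried in Python, which uses the same (?(1)then|else) syntax. This is just a sketch; COND_RE, LITERAL_ONE_RE and full are names invented for the example, and fullmatch is used so that partial matches do not count.

```python
import re

# Group 1 is (a)?; the conditional picks "c" if it matched, "d" otherwise.
COND_RE = re.compile(r"\b(a)?b(?(1)c|d)\b")

# Here group 1 captures a literal "1"; the 1 inside (?(1)...) is still
# a group number, not a character to match.
LITERAL_ONE_RE = re.compile(r"\b(1)?(?(1)c|d)\b")

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

for sample in ("abc", "bd", "abd", "bc"):
    print(sample, full(COND_RE, sample))

for sample in ("1c", "d", "1d"):
    print(sample, full(LITERAL_ONE_RE, sample))
```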

Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match first occurrence of window.location.replace("http://stackoverflow.com") in some HTML string.
Especially I want to capture the URL of the first window.location.replace entry in whole HTML string.
To capture the URL, I formulated these 2 rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve it I think I need to use lookbehind (for 1st rule) and lookahead (for 2nd rule).
I ended up with this regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure that it's legal to mix both rules like I did.
Can you please help me translate my rules into a regex? Other ways of doing this (without lookahead or lookbehind) are also appreciated.
The pattern you wrote is not the one you need, as it matches something very different from what you expect: in the text window.location.redirect("=") something, it matches window.location.redirect("="). And it will only work in PCRE/Python if you remove the ? before \" (as lookbehinds must be fixed-width there). It will work with the ? in .NET regex.
If it is JS, note that lookbehind is only available in engines that implement ES2018; older JavaScript engines do not support it.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
Leaving out the /g modifier matches only the first occurrence. The value you need is in Group 1.
The ([^"]*) captures 0+ characters other than a double quote (the URLs you need should not contain one). If your URLs can contain a ", use the second approach, as (.*?) matches any 0+ characters other than a newline up to the first ").
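The same capturing-group idea carries over to other languages. Below is a minimal sketch in Python; re.search stops at the first match, which plays the role of omitting /g. The names REDIRECT_RE and first_redirect_url are hypothetical, and the pattern assumes the call is spelled window.location.redirect, as in the rules above.

```python
import re

# Group 1 captures everything between the opening and closing double quote.
REDIRECT_RE = re.compile(r'window\.location\.redirect\("([^"]*)"\)')

def first_redirect_url(html):
    """Return the URL of the first redirect call in html, or None if there is none."""
    m = REDIRECT_RE.search(html)
    return m.group(1) if m else None

html = (
    '<script>window.location.redirect("http://stackoverflow.com");'
    'window.location.redirect("http://example.com");</script>'
)
print(first_redirect_url(html))
```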

Regex Expression to allow comma only inside a string (within quotes) and not outside it

I am kind of new to regex. I am looking for a regex to use as a constraint that disallows a comma outside a string.
My input is like
"1,212121,121212","Extra_data"
Here the regex should not check for commas in the first value within quotes, "1,212121,121212", but should check after the quotes, including ,"Extra_data". In short, the expression should allow a comma only inside quotes and not outside.
Kindly help me with the expression.
I think this is what you're looking for: essentially a group of digits or commas surrounded by quotes, followed by a comma and another phrase (not necessarily digits) in quotes. Capturing group #1 gives you "1,212121,121212" and capturing group #2 gives you ,"Extra_data"
("[\d,]+")(,"[^"]+")
It would be helpful to see more of how your input might come in. The biggest remaining question is whether that first group always contains only digits and commas, or whether it sometimes contains other characters such as letters or underscores. If the first group contains only digits and commas, as I've assumed, then this should work. If it doesn't, then this will not work.
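As a rough check of the two capturing groups, here is a sketch in Python using the sample input from the question. PAIR_RE and split_record are names made up for the example, and fullmatch is an assumption on my part so that the whole line has to fit the pattern.

```python
import re

# Group 1: a quoted run of digits and commas.
# Group 2: a comma followed by a second quoted value.
PAIR_RE = re.compile(r'("[\d,]+")(,"[^"]+")')

def split_record(line):
    """Return (group 1, group 2) if the whole line matches, else None."""
    m = PAIR_RE.fullmatch(line)
    return m.groups() if m else None

print(split_record('"1,212121,121212","Extra_data"'))
print(split_record('"1,2a","Extra_data"'))
```

The second call returns None because the first value contains a letter, which is exactly the limitation described above.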
Edit:
"\s*(,\s*"[^"]+")
Try this:
".*?(?=,).*?"
It only matches a comma that is inside quotes.
Try the following regex:
"[^"]*"(,)[^"]*"[^"]*"
It will capture the commas you need. But note that PHP has no support for captures of the same groups. For example, in your case:
If the input is: "1,212121,121212","Extra_data","hel,lo","a,bc"
It will capture commas before "Extra_data" and "a,bc" but will exclude the comma before "hel,lo". For that you'll have to use recursion.
You can try using this regex.
(^,)|("\s*,\s*")|(,$)
If you find any match for this regex, then the string will be invalid.

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture text between the last slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You need to use lookaround assertions.
(?<=\/)[^\/]*(?=\/[^\/]*$)
or
Use the below regex and then grab the string you want from group index 1.
\/([^\/]*)\/[^\/]*$
The easy way
Match:
1. every character that is not a "/"
   Get what was matched here. This is done by creating a capturing group, i.e. putting it inside parentheses.
2. followed by "/" and then the end of string $
Code:
([^/]*)/$
Get the text in group(1)
Harder to read, only if you want to avoid groups
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
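Both variants can be compared side by side. This is a minimal sketch in Python (the function names are invented for the example); the first reads group(1), the second reads the whole match.

```python
import re

def last_segment_group(path):
    """Capturing-group variant: the segment is in group 1."""
    m = re.search(r"([^/]*)/$", path)
    return m.group(1) if m else None

def last_segment_lookahead(path):
    """Lookahead variant: the trailing "/" is checked but not consumed."""
    m = re.search(r"[^/]*(?=/$)", path)
    return m.group(0) if m else None

print(last_segment_group("/1/2/test/"))
print(last_segment_lookahead("/1/2/test/"))
```

Both return None when the path does not end in "/", since the final slash is required by either pattern.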
The issue with your code is your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The regex flavor you're using affects the results. For instance, older JavaScript engines lack lookbehind. However, the above uses only a lookahead and should work as intended. In PHP, a / inside the pattern must be escaped when / is the delimiter (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
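For reference, the two patterns translate to Python roughly as follows. This is only a sketch: Python has no /.../ delimiters, so the slashes need no escaping, and findall stands in for the g flag. The names are hypothetical.

```python
import re

# Every segment that is followed by another "/" (the lookahead does not
# consume that slash, so it can start the next match).
ALL_SEGMENTS_RE = re.compile(r"/([^/]*?)(?=/)")

# Only the segment between the last two slashes.
LAST_SEGMENT_RE = re.compile(r"/([^/]*?)/$")

def all_segments(path):
    return ALL_SEGMENTS_RE.findall(path)

def last_segment(path):
    m = LAST_SEGMENT_RE.search(path)
    return m.group(1) if m else None

print(all_segments("/1/2/test/"))
print(last_segment("/1/2/test/"))
```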