I am trying to put together a regex that matches each of the date formats below.
* Mar 7, 2017
Mar. 7, 2017
* March 7, 2017
3-7-2017
03-07-2017
3-7-17
03-07-17
* 03/7/2017
* 03/07/17
* 3/7/17
Mar-07-2017
Mar-7-2017
March-07-2017
The regex below matches the date formats marked with an asterisk above. I have tried to extend it to cover the rest, without success.
([0-9]+)/([0-9]+)/([0-9]+)|([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))|\w+\s\d{2},\s\d{4}|(?i)\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec](?:ember)?)\b
(?:0?[1-9]|[1-2][0-9]|3[01]),? \d{4}
Any help is always appreciated!
* Bonus question *
On some occasions there may be multiple date matches, and I need to find the one that follows a certain word. In the past I've used the syntax below, placing the regex between the parentheses after the period.
(?<=Word).(StatementHere)
Try this then ...
([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|\.)?|Feb(?:ruary|\.)?|Mar(?:ch|\.)?|Apr(?:il|\.)?|May|Jun(?:e|\.)?|Jul(?:y|\.)?|Aug(?:ust|\.)?|Sep(?:tember|\.)?|Oct(?:ober|\.)?|Nov(?:ember|\.)?|Dec(?:ember|\.)?)([ ](?:0?[1-9]|[1-2][0-9]|3[01]),?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4})
https://regex101.com/r/k1vaVN/1
Readable version
( [0-9]+ ) # (1)
/
( [0-9]+ ) # (2)
/
( [0-9]+ ) # (3)
|
( # (4 start)
( 0? [1-9] | 1 [0-2] ) # (5)
-
( 0? [1-9] | [12] \d | 3 [01] ) # (6)
-
( \d{4} | \d{2} ) # (7)
) # (4 end)
|
\w+ \s \d{2} , \s \d{4}
|
(?i)
\b
( # (8 start)
Jan
(?: uary | \. )?
| Feb
(?: ruary | \. )?
| Mar
(?: ch | \. )?
| Apr
(?: il | \. )?
| May
| Jun
(?: e | \. )?
| Jul
(?: y | \. )?
| Aug
(?: ust | \. )?
| Sep
(?: tember | \. )?
| Oct
(?: ober | \. )?
| Nov
(?: ember | \. )?
| Dec
(?: ember | \. )?
) # (8 end)
( # (9 start)
[ ]
(?: 0? [1-9] | [1-2] [0-9] | 3 [01] )
,? [ ]
| -
(?: 0? [1-9] | [1-2] [0-9] | 3 [01] )
-
) # (9 end)
( \d{4} ) # (10)
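As a quick check, the combined pattern can be run against all thirteen sample dates from the question. The sketch below uses Python and makes two assumptions that are not in the answer: the inline (?i) is replaced by the re.IGNORECASE flag, because recent Python versions reject a global inline flag that is not at the very start of the pattern, and fullmatch is used so that each sample must be consumed entirely.

```python
import re

# The answer's pattern, split over several string literals for readability.
# Assumption: (?i) is expressed as re.IGNORECASE instead of an inline flag.
DATE_RE = re.compile(
    r"([0-9]+)/([0-9]+)/([0-9]+)"
    r"|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))"
    r"|\w+\s\d{2},\s\d{4}"
    r"|\b(Jan(?:uary|\.)?|Feb(?:ruary|\.)?|Mar(?:ch|\.)?|Apr(?:il|\.)?|May"
    r"|Jun(?:e|\.)?|Jul(?:y|\.)?|Aug(?:ust|\.)?|Sep(?:tember|\.)?"
    r"|Oct(?:ober|\.)?|Nov(?:ember|\.)?|Dec(?:ember|\.)?)"
    r"([ ](?:0?[1-9]|[1-2][0-9]|3[01]),?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)"
    r"(\d{4})",
    re.IGNORECASE,
)

SAMPLES = [
    "Mar 7, 2017", "Mar. 7, 2017", "March 7, 2017",
    "3-7-2017", "03-07-2017", "3-7-17", "03-07-17",
    "03/7/2017", "03/07/17", "3/7/17",
    "Mar-07-2017", "Mar-7-2017", "March-07-2017",
]

# Collect every sample the pattern fails to match in full.
unmatched = [s for s in SAMPLES if not DATE_RE.fullmatch(s)]
print(unmatched)
```

An empty list means every sample format is covered.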
update
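Just wrap the date alternatives in a (?: ) group, then put whatever qualifier you need in front of it. In engines that support it, such as PCRE, the \K after the qualifier drops the qualifier from the reported match, so only the date is returned.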
word[ ]or[ ]phrase[ ]+\K(?:([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|\.)?|Feb(?:ruary|\.)?|Mar(?:ch|\.)?|Apr(?:il|\.)?|May|Jun(?:e|\.)?|Jul(?:y|\.)?|Aug(?:ust|\.)?|Sep(?:tember|\.)?|Oct(?:ober|\.)?|Nov(?:ember|\.)?|Dec(?:ember|\.)?)([ ](?:0?[1-9]|[1-2][0-9]|3[01]),?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4}))
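Python's built-in re module does not support \K. A rough Python equivalent, sketched here, puts the date in a capturing group after the qualifier and reads that group instead. The keyword "Due:" and the shortened numeric-only date pattern are stand-ins chosen for this example and are not part of the answer.

```python
import re

# Simplified stand-in for the full date alternation (numeric dates only).
DATE = r"\d{1,2}[-/]\d{1,2}[-/]\d{2,4}"

# Hypothetical qualifier "Due:", one or more spaces, then the captured date.
AFTER_WORD = re.compile(r"Due:[ ]+(" + DATE + r")")

text = "Issued 03/01/2017, Due: 03/07/17"
match = AFTER_WORD.search(text)

# Group 1 holds only the date, which is what \K would have left in the match.
due_date = match.group(1) if match else None
print(due_date)
```

The first date in the text is skipped because it does not follow the qualifier.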
Related
There's a brilliant bit of regex by Macs Dickinson that validates DD/MM/YYYY strings, taking into account the allowable days for each month (e.g. 28 vs 30 vs 31) and allowing February 29th only in leap years:
^(((0[1-9]|[12][0-9]|3[01])[- /.](0[13578]|1[02])|(0[1-9]|[12][0-9]|30)[- /.](0[469]|11)|(0[1-9]|1\d|2[0-8])[- /.]02)[- /.]\d{4}|29[- /.]02[- /.](\d{2}(0[48]|[2468][048]|[13579][26])|([02468][048]|[1359][26])00))$
I'm looking to re-arrange this to use for MM/DD/YYYY strings, but I can't wrap my head around it enough to get it there. Any help is much appreciated!
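The regex you posted has a mistake in the leap-year section: the century part uses [1359][26] where it needs [13579][26], so leap years such as 7200 and 7600 are rejected.
Otherwise it is a fair facsimile of that form.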
I've fixed his mistake, and rearranged the day/month for you.
It now matches MM/DD/YYYY.
((?:0[13578]|1[02])[- /.](?:0[1-9]|[12][0-9]|3[01])|(?:0[469]|11)[- /.](?:0[1-9]|[12][0-9]|30)|02[- /.](?:0[1-9]|1\d|2[0-8]))[- /.](\d{4})|(02[- /.]29)[- /.]([0-9]{2}(?:0[48]|[13579][26]|[2468][048])|(?:[02468][048]|[13579][26])00)
Formatted / explained
( # (1 start), Non-LeapYr MM/DD
(?: 0 [13578] | 1 [02] ) # Months with 31 days
[- /.]
(?: 0 [1-9] | [12] [0-9] | 3 [01] )
|
(?: 0 [469] | 11 ) # Months with 30 days
[- /.]
(?: 0 [1-9] | [12] [0-9] | 30 )
|
02 # February with 28 days
[- /.]
(?: 0 [1-9] | 1 \d | 2 [0-8] )
) # (1 end)
[- /.]
( \d{4} ) # (2), Any year 0000 - 9999
| # OR,
( 02 [- /.] 29 ) # (3), LeapYear MM/DD
[- /.]
( # (4 start), Leap Years 0000 - 9996
[0-9]{2}
(?: 0 [48] | [13579] [26] | [2468] [048] )
|
(?: [02468] [048] | [13579] [26] )
00
) # (4 end)
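One way to gain confidence in a pattern like this is to compare it against a calendar library. The sketch below does that in Python for a handful of years; the helper names is_real_date and disagreements are made up for the example, and only the "/" separator is exercised.

```python
import re
from datetime import date

# The MM/DD/YYYY pattern from the answer, split across string literals.
MMDDYYYY = re.compile(
    r"((?:0[13578]|1[02])[- /.](?:0[1-9]|[12][0-9]|3[01])"
    r"|(?:0[469]|11)[- /.](?:0[1-9]|[12][0-9]|30)"
    r"|02[- /.](?:0[1-9]|1\d|2[0-8]))[- /.](\d{4})"
    r"|(02[- /.]29)[- /.]"
    r"([0-9]{2}(?:0[48]|[13579][26]|[2468][048])|(?:[02468][048]|[13579][26])00)"
)

def is_real_date(year, month, day):
    """Reference check using the standard library instead of a regex."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

def disagreements(years):
    """Return (year, month, day) triples where regex and datetime disagree."""
    bad = []
    for y in years:
        for m in range(1, 13):
            for d in range(1, 32):
                text = f"{m:02d}/{d:02d}/{y:04d}"
                if bool(MMDDYYYY.fullmatch(text)) != is_real_date(y, m, d):
                    bad.append((y, m, d))
    return bad

# 1900 is not a leap year, 2000 and 2020 are, 2019 is not.
print(disagreements([1900, 2000, 2019, 2020]))
```

An empty list means the regex and datetime agree on every day of the tested years.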
Below are two lines of the text that I am matching against:
6 |UDP |ENABLED | |15006 |010.247.060.120 | UDP/IP Communications | UDP/IP Communications GH1870
10 |Gway |ONLINE | |41794 |127.000.000.001 | DM-MD64x64 | DM-MD64x64
Below is the regex I have so far, but it only matches the bottom line
(?i)(?<cipid>([\w\.]+))\s*\|\s*(?<ty>\w+)?\s*\|\s*(?<stat>[\w ]+)\s*\|\s*(?<devid>\w+)?\s*\|\s*(?<prt>\d+)\s*\|\s*(?<ip>([\d\.]+))\s*\|\s*(?<mdl>[\w-]+)\s*\|\s*(?<desc>.+)
I was wondering if I could have a regular expression that just matches every character between every vertical line, instead of having to explicitly say what is between the vertical lines
Thanks all
This usually works. (?:^|(?<=\|))[^|]*?(?=\||$)
https://regex101.com/r/KMNc47/1
Formatted
(?: ^ | (?<= \| ) ) # BOS or Pipe behind
[^|]*? # Optional non-pipe chars
(?= \| | $ ) # Pipe ahead or EOS
Here it is with whitespace trim and includes a capture group.
(?:^|(?<=\|))\s*([^|]*?)\s*(?=\||$)
https://regex101.com/r/KMNc47/2
Formatted
(?: ^ | (?<= \| ) ) # BOS or Pipe behind
\s*
( [^|]*? ) # (1), Optional non-pipe chars
\s*
(?= \| | $ ) # Pipe ahead or EOS
Here it is in a Capture Collection configuration.
(?:(?:^|\|)\s*([^|]*?)\s*(?=\||$))+
https://regex101.com/r/KMNc47/3
Formatted
(?:
(?: ^ | \| ) # BOS or Pipe
\s*
( [^|]*? ) # (1), Optional non-pipe chars
\s*
(?= \| | $ ) # Pipe ahead or EOS
)+
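Python's re module keeps only the last capture of a repeated group, so the capture-collection form does not carry over directly. A minimal sketch of the closest Python equivalent is to apply the trimmed single-field pattern with findall, which returns group 1 for every match. The sample line is the second one from the question.

```python
import re

# Trimmed single-field pattern from the answer; group 1 is the field text.
FIELD_RE = re.compile(r"(?:^|(?<=\|))\s*([^|]*?)\s*(?=\||$)")

line = "10 |Gway |ONLINE | |41794 |127.000.000.001 | DM-MD64x64 | DM-MD64x64"

# With one capture group, findall returns that group's text for each match.
fields = FIELD_RE.findall(line)
print(fields)

# For comparison: plain string splitting followed by a strip.
split_fields = [part.strip() for part in line.split("|")]
print(fields == split_fields)
```

For simple rows like these, the split-and-strip version gives the same fields without a regex, including the empty fourth column.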
I'm trying to come up with two regular expressions: one for a latitude value, -85.05112878 < lat < 85.05112878, and one for a longitude value, -180.0 < long < 180.0.
Help is much appreciated.
It's not very pretty, but you can try this one for the latitude:
-85.05112878 < lat < 85.05112878
^(?:-?85\.0(?:000000\d*|0{1,5}(?:[1-9]\d*)?|[1-4]\d*|5(?:0\d*)?|5(?:1(?:0\d*)?)?|511(?:[0-1]\d*)?|5112(?:[0-7]\d*)?|51128(?:[0-6]\d*)?|511287[0-8]?)?0*|(?:-[1-9]|-?[1-7]\d|-?8[0-4]|\d)\.\d+)$
Expanded
^
(?:
-? 85
\.0
(?:
000000 \d*
| 0{1,5} (?: [1-9] \d* )?
| [1-4] \d*
| 5 (?: 0 \d* )?
| 5 (?: 1 (?: 0 \d* )? )?
| 511 (?: [0-1] \d* )?
| 5112 (?: [0-7] \d* )?
| 51128 (?: [0-6] \d* )?
| 511287 [0-8]?
)?
0*
|
(?:
- [1-9]
| -? [1-7] \d
| -? 8 [0-4]
| \d
)
\. \d+
)
$
And this for the longitude
-180.0 < long < 180.0
^(?:-?180\.0+|(?:-[1-9]|-?[1-9]\d|-?1[0-7]\d|\d)\.\d+)$
Expanded
^
(?:
-? 180 \. 0+
|
(?:
- [1-9]
| -? [1-9] \d
| -? 1 [0-7] \d
| \d
)
\. \d+
)
$
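To see where the boundaries fall, both patterns can be exercised on a few values around the limits. This is a small Python sketch; the helper name check and the sample values are chosen for the example.

```python
import re

# Patterns copied from the answer, split across string literals.
LAT_RE = re.compile(
    r"^(?:-?85\.0(?:000000\d*|0{1,5}(?:[1-9]\d*)?|[1-4]\d*|5(?:0\d*)?"
    r"|5(?:1(?:0\d*)?)?|511(?:[0-1]\d*)?|5112(?:[0-7]\d*)?"
    r"|51128(?:[0-6]\d*)?|511287[0-8]?)?0*"
    r"|(?:-[1-9]|-?[1-7]\d|-?8[0-4]|\d)\.\d+)$"
)
LON_RE = re.compile(r"^(?:-?180\.0+|(?:-[1-9]|-?[1-9]\d|-?1[0-7]\d|\d)\.\d+)$")

def check(pattern, samples):
    """Map each sample string to whether the pattern accepts it."""
    return {s: bool(pattern.match(s)) for s in samples}

lat_results = check(LAT_RE, ["45.5", "85.05112878", "85.05112879", "85.1", "-85.05"])
lon_results = check(LON_RE, ["179.999", "-180.0", "180.1", "181.0"])
print(lat_results)
print(lon_results)
```

Two behaviours are worth knowing when using the patterns as written. The end values 85.05112878 and 180.0 themselves are accepted. Negative values between -1 and 0, such as -0.5, are rejected, because the only branch that allows a minus sign before a single digit is -[1-9].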
edit
This is the same as above except it matches partial (valid) forms like
54
54.
54.1
etc ...
lat
^(?:-?85(?:\.(?:0(?:000000\d*|0{1,5}(?:[1-9]\d*)?|[1-4]\d*|5(?:0\d*)?|5(?:1(?:0\d*)?)?|511(?:[0-1]\d*)?|5112(?:[0-7]\d*)?|51128(?:[0-6]\d*)?|511287[0-8]?)?)?0*)?|(?:-[1-9]|-?[1-7]\d|-?8[0-4]|\d)(?:\.\d*)?)$
Expanded
^
(?:
-? 85
(?:
\.
(?:
0
(?:
000000 \d*
| 0{1,5} (?: [1-9] \d* )?
| [1-4] \d*
| 5 (?: 0 \d* )?
| 5 (?: 1 (?: 0 \d* )? )?
| 511 (?: [0-1] \d* )?
| 5112 (?: [0-7] \d* )?
| 51128 (?: [0-6] \d* )?
| 511287 [0-8]?
)?
)?
0*
)?
|
(?:
- [1-9]
| -? [1-7] \d
| -? 8 [0-4]
| \d
)
(?: \. \d* )?
)
$
long
^(?:-?180(?:\.0*)?|(?:-[1-9]|-?[1-9]\d|-?1[0-7]\d|\d)(?:\.\d*)?)$
Expanded
^
(?:
-? 180
(?: \. 0* )?
|
(?:
- [1-9]
| -? [1-9] \d
| -? 1 [0-7] \d
| \d
)
(?: \. \d* )?
)
$
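The partial forms are handy when a value may be incomplete, for example while it is still being typed (an assumed use case, not one stated in the question). A short Python sketch with the longitude variant:

```python
import re

# Partial-form longitude pattern copied from the answer.
LON_PARTIAL_RE = re.compile(
    r"^(?:-?180(?:\.0*)?|(?:-[1-9]|-?[1-9]\d|-?1[0-7]\d|\d)(?:\.\d*)?)$"
)

candidates = ["54", "54.", "54.1", "180", "180.", "180.5", "181"]

# Keep only the candidates the pattern accepts.
accepted = [s for s in candidates if LON_PARTIAL_RE.match(s)]
print(accepted)
```

Integer and trailing-dot forms pass, while anything beyond 180 is still refused.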
I have a .txt file which contains:
"'the url address i checked is: https://www.google.com/ for 2times and it's awesome!."
After parsing, the expected output should be:
['"',"'",'the','url','address','i','checked','is',':','https://www.google.com/','for','2','times','and',"it's",'awesome','!','.','"']
How do I split this string to get that output using the re module?
I came up with this pattern:
pattern = re.compile(r"\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]")
but this is also splitting my URL.
Can anyone please help?
Just pick a URL regex from somewhere and make it the first alternative.
An example only -
# (?!mailto:)(?:(?:https?|ftp)://)?(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))|localhost)(?::\d{2,5})?(?:/[^\s]*)?|\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]
(?! mailto: )
(?:
(?: https? | ftp )
://
)?
(?:
\S+
(?: : \S* )?
@
)?
(?:
(?:
(?:
[1-9] \d?
| 1 \d\d
| 2 [01] \d
| 22 [0-3]
)
(?:
\.
(?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5] )
){2}
(?:
\.
(?:
[1-9] \d?
| 1 \d\d
| 2 [0-4] \d
| 25 [0-4]
)
)
| (?:
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)
(?:
\.
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)*
(?:
\.
(?: [a-z\u00a1-\uffff]{2,} )
)
)
| localhost
)
(?: : \d{2,5} )?
(?: / [^\s]* )?
| \d+
| [a-zA-Z]+ [a-zA-Z']*
| [^\w\s]
Outputs:
['"',"'",'the','url','address','i','checked','is',':','https://www.google.com/','for','2','times','and',"it's",'awesome','!','.','"']
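The same idea in a compact, runnable form: the sketch below puts a URL alternative in front of the asker's original three alternatives. The loose pattern https?://\S+ is an assumption standing in for a full URL regex and is only meant to show the ordering.

```python
import re

# Assumption: a deliberately loose URL pattern stands in for a full URL regex.
URL = r"https?://\S+"

# The URL alternative comes first, so it is tried before the word and
# punctuation rules that would otherwise break the URL apart.
TOKEN_RE = re.compile(URL + r"|\d+|[a-zA-Z]+[a-zA-Z']*|[^\w\s]")

text = "\"'the url address i checked is: https://www.google.com/ for 2times and it's awesome!.\""

# The pattern has no capture groups, so findall returns the whole matches.
tokens = TOKEN_RE.findall(text)
print(tokens)
```

Because \S+ runs to the next whitespace, this stand-in would also absorb punctuation written directly after a URL; a stricter URL pattern avoids that.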
I have defined the following regex for a specific date:
(0[1-9]|1[012]|[1-9])[\/-]
(0[1-9]|1[0-9]|2[0-9]|3[0]|[1-9])[\/-]
(18[0-9]+|19[0-9]+|20[0-9]+|0[1-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|6[0-9]|7[0-9]|8[0-9]|9[0-9])
The first line defines the month, the second the day, and the third the year formats.
I am happy with the limits for days, months, and years, but I don't know how to reject mixed formats like mm/dd-yyyy or mm-dd/yyyy.
Can someone please help?
You can match the first delimiter, then use a back reference to it.
# /(0[1-9]|1[012]|[1-9])([\/-])(0[1-9]|1[0-9]|2[0-9]|3[0]|[1-9])\2(18[0-9]+|19[0-9]+|20[0-9]+|0[1-9]|[1-9][0-9])/
( 0 [1-9] | 1 [012] | [1-9] ) # (1), Month
( [/-] ) # (2), Delimiter / or -
( # (3 start), Day
0 [1-9]
| 1 [0-9]
| 2 [0-9]
| 3 [0]
| [1-9]
) # (3 end)
\2 # Delimiter backreference
( # (4 start), Year
18 [0-9]+
| 19 [0-9]+
| 20 [0-9]+
| 0 [1-9]
| [1-9] [0-9]
) # (4 end)
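A short Python sketch of the backreference at work. The pattern is the one above without the surrounding slashes, and fullmatch is used here (an assumption about how it would be applied) so that the whole string must be a date.

```python
import re

# \2 must repeat whichever delimiter group 2 matched, so mixed forms fail.
DATE_RE = re.compile(
    r"(0[1-9]|1[012]|[1-9])([/-])(0[1-9]|1[0-9]|2[0-9]|3[0]|[1-9])\2"
    r"(18[0-9]+|19[0-9]+|20[0-9]+|0[1-9]|[1-9][0-9])"
)

samples = ["03/07/2017", "03-07-2017", "3-7-17", "03/07-2017", "03-07/2017"]

# Map each sample to whether the whole string is accepted.
results = {s: bool(DATE_RE.fullmatch(s)) for s in samples}
print(results)
```

The first three samples use one delimiter throughout and are accepted; the last two mix "/" and "-" and are rejected.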