Matching anything else but a date - regex

I'm using the REGEX below to effectively check if a string is a YYYY-MM-DD date.
[0-9]{4}-[0-9]{2}-[0-9]{2}
How do I do the reverse and use a similar REGEX to check that a string is NOT a date in this format?

You could use a negative look-ahead to solve this:
(?![0-9]{4}-[0-9]{2}-[0-9]{2})
Edit: After reading the post by Lucasus, I've made a new regex with stricter validation:
The year can be any combination of 4 digits, which allows, for example, dates before 1900.
The month is in the range 1-12.
The day is in the range 1-31.
The format is YYYY-MM-DD.
New regex:
(?!([0-9]{4})-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01]))

Your regex matches more than only a valid date (for example "3333-99-99"). You can use a longer expression:
^(19[0-9][0-9]|20[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])$
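If you want to match everything but the date, use a negative look-ahead, as Marcus wrote. The look-ahead must be followed by .* so that the pattern can consume a non-date string; with only ^ and $ around it, the pattern could match nothing but the empty string:
^(?!((19[0-9][0-9]|20[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01]))$).*$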
The regex is from this link
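A minimal sketch of how the anchored pattern and its negation behave, written in Python as an assumption (the question does not name a language). The names DATE, IS_DATE, NOT_DATE, is_date and is_not_date are made up for illustration. A look-ahead wrapped only in ^ and $ can match nothing but the empty string, so the negated form here ends with .*

```python
import re

# Date body from the answer above: years 1900-2099, months 1-12, days 1-31.
DATE = r"(19[0-9][0-9]|20[0-9][0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])"

IS_DATE = re.compile(r"^(?:" + DATE + r")$")
# The look-ahead rejects a full-string date; .* then consumes any other input.
NOT_DATE = re.compile(r"^(?!(?:" + DATE + r")$).*$")


def is_date(s):
    return IS_DATE.match(s) is not None


def is_not_date(s):
    return NOT_DATE.match(s) is not None
```

The check is still only syntactic: a string such as 2011-02-31 passes is_date even though that day does not exist.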

import datetime

try:
    # strptime raises ValueError if the string does not match the format
    datetime.datetime.strptime('2011-10-27', '%Y-%m-%d')
except ValueError:
    print('The string is NOT in the right format')
else:
    print('The string is in the right format')
Indeed this isn't regex, but it might perform better - may be worth benchmarking...
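As a rough sketch, the same idea can be wrapped in a reusable helper. The function name is_iso_date is hypothetical, not something from the answer above.

```python
from datetime import datetime


def is_iso_date(s):
    """Return True if s parses as a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(s, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

Unlike the plain regex, this rejects impossible dates such as 2011-02-30. It can also accept non-padded fields such as 2011-1-5, so combine it with the regex if the exact width matters.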

First, your regex doesn't ensure that the string is a date, just that it contains one. If you wanted to make sure the string contains nothing but a date (according to your formulation), you would need to anchor it:
^[0-9]{4}-[0-9]{2}-[0-9]{2}$
...and the simplest way to ensure that the string is not a date is to try to match it with that regex and negate the result. How you do that depends on the language; in C# you could do this:
if ( !Regex.IsMatch(s, "^[0-9]{4}-[0-9]{2}-[0-9]{2}$") )
If you must do the check with the regex itself (for example, if you're using a simple regex-based validation control), you can use a negative lookahead, as other responders advised:
^(?![0-9]{4}-[0-9]{2}-[0-9]{2}$).*
Starting from the beginning of the string (because of the ^), the lookahead tries to match a date followed by the end of the string ($). If that fails, the match position is reset to the beginning, and the .* goes ahead and consumes the whole string.
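The two approaches described above (negating the match result, and the negative lookahead) should agree on every input. Here is a small sketch in Python, chosen only for illustration; the function names are made up.

```python
import re

DATE = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")
NOT_DATE = re.compile(r"^(?![0-9]{4}-[0-9]{2}-[0-9]{2}$).*")


def not_date_by_negation(s):
    # Match the positive pattern and negate the result.
    return DATE.match(s) is None


def not_date_by_lookahead(s):
    # Let the regex itself express "anything but a date".
    return NOT_DATE.match(s) is not None


samples = ["2011-10-27", "2011-10-27 extra", "27/10/2011", ""]
results = [(not_date_by_negation(s), not_date_by_lookahead(s)) for s in samples]
```

Negating in code is usually easier to read; the lookahead form is only needed when the tool accepts nothing but a pattern.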

Related

Regex to get all data before second last special character

I couldn't find any previously asked question similar to this. I need a regex to get all the data before the second-last special character.
For example:
suite 1, street 1, zip city, country
I need only suite 1, street 1.
I know how to get the data before just the last special character using [^,]*$ but not the second last one.
You can use the following regex and the first capturing group will have your desired substring:
(.*)(?:,[^,]*){2}$
Demo: https://regex101.com/r/AWpsL3/1
Or, if the tool you're using does not support capturing groups, you can use the following regex with a lookahead instead:
.*(?=(?:,[^,]*){2}$)
Demo: https://regex101.com/r/AWpsL3/4
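A short sketch of both patterns in Python (an assumption; the question does not say which tool is used). The rsplit line is an extra, regex-free alternative that is not part of the answer above.

```python
import re

address = "suite 1, street 1, zip city, country"

# Capturing-group variant: group 1 is everything before the second-last comma.
m = re.match(r"(.*)(?:,[^,]*){2}$", address)
by_group = m.group(1) if m else None

# Lookahead variant: the whole match is the wanted prefix.
m = re.match(r".*(?=(?:,[^,]*){2}$)", address)
by_lookahead = m.group(0) if m else None

# No regex: split from the right at most twice and keep the left part.
by_rsplit = address.rsplit(",", 2)[0]
```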
You can use a lookahead:
.+(?=.*,.*,)
Explanation:
.+ matches everything up to the position where the lookahead starts, provided the lookahead does not fail.
The positive lookahead (?=.*,.*,)
asserts that at least two commas remain after that position.
check demo
Depending on the implementation of regex, it may not support lookaround (which is what the above solution uses). A workaround is to split the string on your delimiter character (in this case, a comma), drop the last two elements, and join the rest. For the example, that leaves the first two elements.
function beforeSecondLastComma(mystr) {
  var parts = mystr.split(',');
  // Drop the last two fields and rejoin the remainder.
  return parts.slice(0, -2).join(',');
}
Try (([ \w]+,?){2})(?=,) and don't use the global flag (with it, matching doesn't stop after the first match).

Regular expression: matching part of words [duplicate]

I'm trying to make a Regex that matches this string {Date HH:MM:ss}, but here's the trick: HH, MM and ss are optional, but it needs to be "HH", not just "H" (the same thing applies to MM and ss). If a single "H" shows up, the string shouldn't be matched.
I know I can use H{2} to match HH, but I can't seem to use that functionality plus the ? to match zero or one time (zero because it's optional, and one time max).
So far I'm doing this (which is obviously not working):
Regex dateRegex = new Regex(@"\{Date H{2}?:M{2}?:s{2}?\}");
Next question. Now that I have the match on the first string, I want to take only the HH:MM:ss part and put it in another string (that will be the format for a TimeStamp object).
I used the same approach, like this:
Regex dateFormatRegex = new Regex(@"(HH)?:?(MM)?:?(ss)?");
But when I try that on "{Date HH:MM}" I don't get any matches. Why?
If I add a space like this Regex dateFormatRegex = new Regex(@" (HH)?:?(MM)?:?(ss)?");, I have the result, but I don't want the space...
I thought that the first parenthesis needed to be escaped, but \( won't work in this case. I guess because it's not a parenthesis that is part of the string to match, but a key-character.
(H{2})? matches zero or two H characters.
However, in your case, writing it twice would be more readable:
Regex dateRegex = new Regex(@"\{Date (HH)?:(MM)?:(ss)?\}");
Besides that, make sure there are no functions available for whatever you are trying to do. Parsing dates is pretty common and most programming languages have functions in their standard library - I'd almost bet 1k of my reputation that .NET has such functions, too.
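To see how the optional groups behave, here is a small sketch in Python rather than C# (an assumption made only to keep the example short; the pattern itself is the one from this answer). DATE_TOKEN and is_valid are illustrative names.

```python
import re

# (HH)? makes the whole two-letter token optional; a lone "H" cannot match.
DATE_TOKEN = re.compile(r"\{Date (HH)?:(MM)?:(ss)?\}")


def is_valid(s):
    return DATE_TOKEN.fullmatch(s) is not None
```

The colons stay mandatory in this pattern, so {Date HH:MM} is rejected while {Date HH::} is accepted; making them optional with :? changes that.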
In your edit you mention an unwanted leading space in the result. To check a leading or trailing condition with your regex without including it in the result, you can use the lookaround feature of regex:
new Regex(@"(?<=Date )(HH)?:?(MM)?:?(ss)?")
(?<=...) is a lookbehind pattern.
Regex test site with this example.
The input Date HH:MM:ss matches both regexes (with or without the lookbehind).
The input FooBar HH:MM:ss still matches the simple regex, but the lookbehind fails here. Lookaround doesn't change the content of the result, but it prevents false matches (e.g., this second input, which is not a Date).
Find more information on regex and lookaround here.
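The difference is easy to check in a short sketch, written here in Python on the assumption that .NET behaves the same way for this pattern. One likely reason the version without the space seemed to return nothing is that every part of it is optional, so it can succeed with an empty match at the very start of the input. PLAIN and WITH_LOOKBEHIND are illustrative names.

```python
import re

PLAIN = re.compile(r"(HH)?:?(MM)?:?(ss)?")
WITH_LOOKBEHIND = re.compile(r"(?<=Date )(HH)?:?(MM)?:?(ss)?")

text = "{Date HH:MM}"

# Every part of PLAIN is optional, so it matches the empty string at index 0.
plain_match = PLAIN.search(text)

# The lookbehind forces the match to start right after "Date ".
anchored_match = WITH_LOOKBEHIND.search(text)

# Without "Date " anywhere in the input, the lookbehind version finds nothing.
no_match = WITH_LOOKBEHIND.search("FooBar HH:MM:ss")
```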

Regex for url route with query string

I am having a hard time learning regex and honestly I have no time at the moment.
I am looking for a regex that would match a URL route with a query string.
What I need is a regex to match population?filter=nation, where nation can be any string.
Based on my current regex knowledge I have also tried with regex expression /^population\/(?P<filterval>\d+)\/filter$/ to match population/nation/filter but this does not work.
Any suggestion and help is welcome.
This matches only your first query string format:
population\?filter=[\w]+[-_]?[\w]+
Additionally, it allows a single - or _ as a separator between words. Note that it requires at least two characters after filter=. If you know that your string ends right there, you can also add a $ to the end to mark it so.
If you know that the nation consists only of word characters (letters, digits, and underscores), you can use the simplified version:
population\?filter=[\w]+
Demo
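A minimal sketch of the simplified pattern in Python, assuming the (?P<name>...) named-group syntax from the question's own attempt is available. The group name filterval is taken from the question; ROUTE and filter_value are made-up names, and allowing - inside the value is an extra assumption.

```python
import re

# Capture the value after filter= ; hyphens are allowed as an assumption.
ROUTE = re.compile(r"population\?filter=(?P<filterval>[\w-]+)")


def filter_value(route):
    # fullmatch requires the whole route to fit the pattern.
    m = ROUTE.fullmatch(route)
    return m.group("filterval") if m else None
```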

Regular expression for repeated sequence

I am learning regular expressions. I am trying to extract the dates from the string below. The element <ext:serviceitem> can be repeated up to 20 times in the actual XML. I need only the date strings: for any attribute whose name ends with Date, I need its value, which is a date (for example, StartDate and EndDate). I want all of those dates, and only those, to be printed out.
<ext:serviceitem><ext:name>EnhancedSupport</ext:name><ext:serviceItemData><ext:serviceItemAttribute name="Name">E69D7F93-81F4-09E2-E043-9D3226AD8E1D-1</ext:serviceItemAttribute><ext:serviceItemAttribute name="ProductionDatabase">P1APRD</ext:serviceItemAttribute><ext:serviceItemAttribute name="SupportType">Monthly</ext:serviceItemAttribute><ext:serviceItemAttribute name="Environment">DV1</ext:serviceItemAttribute><ext:serviceItemAttribute name="StartDate">2013-11-04 10:02</ext:serviceItemAttribute><ext:serviceItemAttribute name="EndDate">2013-11-12 10:02</ext:serviceItemAttribute><ext:serviceItemAttribute name="No_of_WeeksSupported"></ext:serviceItemAttribute><ext:serviceItemAttribute name="Cost"></ext:serviceItemAttribute><ext:serviceItemAttribute name="SupportNotes"></ext:serviceItemAttribute><ext:serviceItemAttribute name="FiscalQuarterNumber"></ext:serviceItemAttribute><ext:subscription><ext:loginID>kbasavar</ext:loginID><ext:ouname>020072748</ext:ouname></ext:subscription></ext:serviceItemData></ext:serviceitem><ext:serviceitem><ext:name>EnhancedSupport</ext:name><ext:serviceItemData><ext:serviceItemAttribute name="Name">E69D7F93-81F4-09E2-E043-9D3226AD8E1D-2</ext:serviceItemAttribute><ext:serviceItemAttribute name="ProductionDatabase">P1BPRD</ext:serviceItemAttribute><ext:serviceItemAttribute name="SupportType">Quarterly</ext:serviceItemAttribute><ext:serviceItemAttribute name="Environment">TS2</ext:serviceItemAttribute><ext:serviceItemAttribute name="StartDate">2013-11-11 10:03</ext:serviceItemAttribute><ext:serviceItemAttribute name="EndDate">2013-11-28 10:03</ext:serviceItemAttribute><ext:serviceItemAttribute name="No_of_WeeksSupported"></ext:serviceItemAttribute><ext:serviceItemAttribute name="Cost"></ext:serviceItemAttribute><ext:serviceItemAttribute name="SupportNotes"></ext:serviceItemAttribute><ext:serviceItemAttribute 
name="FiscalQuarterNumber"></ext:serviceItemAttribute><ext:subscription><ext:loginID>kbasavar</ext:loginID><ext:ouname>020072748</ext:ouname></ext:subscription></ext:serviceItemData></ext:serviceitem>
I tried the regex below, but it returns the rest of the string after the first occurrence.
(?<=Date\"\>).*(?=\<\/ext\:serviceItemAttribute\>)
Any help would be highly appreciated.
Your problem is that .* is greedy, meaning that it will grab from the first instance of Date to the last instance of </ext:ser..... Replace the .* with .*? and it will alter the behaviour to what you're after.
#(?<=Date">).*?(?=</ext:serviceItemAttribute>)#i
You should have .*? in a capture group: (.*?).
#(?<=Date">)(.*?)(?=</ext:serviceItemAttribute>)#i
You could also do it - more simply - like:
#Date">(.*?)</ext#i
Update
As has been pointed out in the comment below, the solution above relies on the use of non-greedy matching.
To get around this you could use ([^<]*) instead of (.*?).
NOTE: This does not impact the alternatives below.
Alternatives
/(\d{4}-\d{2}-\d{2})/
/(\d{4}-\d{2}-\d{2} \d{2}:\d{2})/
The above patterns will match dates in the format YYYY-XX-XX and YYYY-XX-XX HH:MM respectively
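The greedy and non-greedy behaviours can be compared side by side. The sketch below uses Python instead of the PHP-style delimiters above, and a shortened, made-up excerpt of the XML, so treat it as an illustration only.

```python
import re

# Shortened stand-in for the XML from the question.
xml = (
    '<ext:serviceItemAttribute name="StartDate">2013-11-04 10:02</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="EndDate">2013-11-12 10:02</ext:serviceItemAttribute>'
    '<ext:serviceItemAttribute name="Cost"></ext:serviceItemAttribute>'
)

# Greedy: .* runs on to the last closing tag, giving one oversized match.
greedy = re.findall(r'(?<=Date">).*(?=</ext:serviceItemAttribute>)', xml)

# Non-greedy: .*? stops at the first closing tag after each date.
lazy = re.findall(r'(?<=Date">).*?(?=</ext:serviceItemAttribute>)', xml)

# Negated character class: no reliance on non-greedy matching.
negated = re.findall(r'Date">([^<]*)</ext', xml)
```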

Regex to extract date with negative lookahead

I am using this pattern to extract confirmation dates from a text file and converting them to a date object (see my post here Extract/convert date from string in MS Access).
The current pattern matches all strings that look like a date, but may not be the confirmation date (which is always preceded by Confirmed by), and moreover, may not have complete date information (e.g. no AM or PM).
Pattern: (\d+/\d+/\d+\s+\d+:\d+:\d+\s+\w+|\d+-\w+-\d+\s+\d+:\d+:\d+)
Sample text:
WHEN COMPARED WITH RESULT OF 7/13/12 09:06:42 NO SIGNIFICANT
CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;
The above pattern matches the following:
WHEN COMPARED WITH RESULT OF 7/13/12 09:06:42 NO SIGNIFICANT
                             ^^^^^^^^^^^^^^^^^^^^
CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;
                                               ^^^^^^^^^^^^^^^^^^^^
I want the pattern to look for the date in the segment of the text file that begins with Confirmed by and ends with a semi-colon. Also, in order to properly convert the time, the pattern should match only AM or PM at the end. How can I restrict the pattern to this segment and add the additional AM or PM criteria?
Can anyone help?
In order to match the end of the string, use $ at the end of your regex. To match the entire phrase "Confirmed by <someone> on <date>", use plain text (remember that plain text can be used in a regex as well: if you aren't using special characters, the matcher will match your query verbatim). You need to use a negative look-ahead to exclude entire words. So maybe something like this:
Confirmed by (?!\ on\ )(\d+/\d+/\d+\s+\d+:\d+:\d+\s+\w+|\d+-\w+-\d+\s+\d+:\d+:\d+)$
Which will allow you to match a string that starts with "Confirmed by", followed by anything except for " on ", followed by the date that you capture, and the end of the string.
Edit: the negative look-ahead part is tricky, look at the answer below for more reference:
A regular expression to exclude a word/string
I don't see any need for a lookahead here, positive or negative. This works correctly on your sample string:
Confirmed by [^;]*(\d+/\d+/\d+\s+\d+:\d+:\d+(?:\s+(?:AM|PM))?|\d+-\w+-\d+\s+\d+:\d+:\d+);
The [^;]* effectively corrals the match between a Confirmed by sequence and its closing semicolon. (I'm assuming the semicolon will always be present.)
(?:\s+(?:AM|PM))? makes the AM/PM optional, along with its leading whitespace.
The actual date will be stored in capturing group #1.
Try this:
(\d+/\d+/\d+\s+\d+:\d+:\d+\s+(?:AM|PM));
The simplest answer is more often than not a good enough solution. By turning off the default greedy behavior (using the question mark: .*?), the regular expression will instead try to find the shortest match for the pattern. Matches never overlap, which means that each Confirmed by can only be coupled with one date, which in this case is the next one to follow.
Confirmed by.*?(\d+/\d+/\d+\s+\d+:\d+:\d+\s+(?:AM|PM));
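A quick sketch of this pattern against the sample text, in Python rather than the MS Access environment of the question (an assumption made for illustration). The strptime format string assumes US-style month/day/year with a 12-hour clock, as in the sample, and an English locale for AM/PM.

```python
import re
from datetime import datetime

text = (
    "WHEN COMPARED WITH RESULT OF 7/13/12 09:06:42 NO SIGNIFICANT\n"
    "CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;"
)

# Lazy .*? skips ahead only as far as the first date that follows "Confirmed by".
CONFIRMED = re.compile(r"Confirmed by.*?(\d+/\d+/\d+\s+\d+:\d+:\d+\s+(?:AM|PM));")

m = CONFIRMED.search(text)
confirmed_raw = m.group(1) if m else None

# Convert the captured text to a datetime (format is an assumption, see above).
confirmed_at = (
    datetime.strptime(confirmed_raw, "%m/%d/%Y %I:%M:%S %p") if confirmed_raw else None
)
```

The earlier date on the first line is ignored because it is not preceded by Confirmed by and has no AM or PM.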