I have a date string in the format 06/09/2011 03:00 PM. I want to remove all of the forward slashes and, if the month (06) or the day (09) starts with a zero, remove that leading zero as well. Can anybody help me out?
thanks!
The usual way to do this is to use an available date parser: you give it the input format and have it write the date back out in a different output format.
Patterns differ, and so do implementations. Parsing dates with a regex is neither convenient nor practical.
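Something like this, replacing each match with the three captured groups:
0?([0-9]+)/0?([0-9]+)/([0-9]+)
Of course, it only works on valid dates; it does not parse the date or anything. The leading zero is optional here so that dates such as 10/09/2011, where only one part has a leading zero, still have their slashes removed.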
BTW: I find fyr's answer better (it is more readable and detects errors in a more meaningful manner). This is just to show that it can be done with a regex, in case fyr's solution is not available on your platform.
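Both approaches from the answers above can be sketched in Python. This is only an illustration: the function names are made up, and it assumes the input is month/day/year, which the question does not state explicitly.

```python
import re
from datetime import datetime


def strip_with_parser(text):
    """Parse the date part, then rebuild it without slashes or leading zeros."""
    date_part, rest = text.split(" ", 1)
    # Assumption: month/day/year order. strptime raises ValueError on bad dates.
    parsed = datetime.strptime(date_part, "%m/%d/%Y")
    return f"{parsed.month}{parsed.day}{parsed.year} {rest}"


def strip_with_regex(text):
    """Same result with a substitution only; no validation of the date."""
    return re.sub(r"\b0?(\d{1,2})/0?(\d{1,2})/(\d{4})\b", r"\1\2\3", text)


print(strip_with_parser("06/09/2011 03:00 PM"))
print(strip_with_regex("06/09/2011 03:00 PM"))
```

The parser version rejects impossible input such as 13/45/2011 with a ValueError, while the regex version rewrites it without complaint.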
Related
First off, I'm sorry if this has already been asked somewhere else, it's just I could not find it. If it has been, I apologize deeply.
I am terrible at Regular Expressions and generally avoid them but I know the problem I have can be simply solved using them so I have come here for help.
I have a text field containing some information about a company (name, address, identifier, etc), but not all information always appears on the field and the order the information appears in is not set.
What I need is the company identifier, a 14-digit number that may or may not be formatted like this: XX.XXX.XXX/XXXX-XX
What expression could I use that would identify if there are either 14 digits in a row or the number formatted in the manner described above?
/[0-9]{2}[.]{1}[0-9]{3}[.]{1}[0-9]{3}[\/]{1}[0-9]{4}[-]{1}[0-9]{2}/ for XX.XXX.XXX/XXXX-XX
/[0-9]{14}/ for 14 digits
There are probably some edge cases in here somewhere.
There's also probably a way to do both of these in one expression, but I have neither the patience nor the time to figure it out.
Try Regex: \b(?:\d{14}|\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2})\b
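A minimal Python sketch of how the combined pattern could be used to pull the identifier out of free text. The names COMPANY_ID, find_identifiers and digits_only, and the sample values, are invented for illustration.

```python
import re

# Either 14 digits in a row or the formatted XX.XXX.XXX/XXXX-XX variant.
COMPANY_ID = re.compile(r"\b(?:\d{14}|\d{2}\.\d{3}\.\d{3}/\d{4}-\d{2})\b")


def find_identifiers(text):
    """Return every identifier found in the text, in order of appearance."""
    return COMPANY_ID.findall(text)


def digits_only(identifier):
    """Normalize an identifier to its bare 14 digits."""
    return re.sub(r"\D", "", identifier)


field = "ACME Ltda, Rua A 100, 12.345.678/0001-95"
print([digits_only(found) for found in find_identifiers(field)])
```

The word boundaries keep the pattern from matching 14 digits that are part of a longer run of digits.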
I know it's easy enough to do manual corrections on date typos, but I want to automate such corrections using one or more SAS functions, given that my dataset is large and typos are frequent.
For instance, it seems that whoever created the dataset I am cleaning often transposed digits in the year of someone's birthdate (e.g., '2102' rather than '2012', '2110' instead of '2010', etc). I'm aware of string functions such as INDEX() that find certain character values or strings and then allow for the replacement of said characters in the same position (i.e., replace "ABCD" with "ABBB", regardless of the string's location in a value). Can the same process be replicated with numeric (and specifically date) values?
I don't think SAS has any functions that would check numeric values for digit patterns. I often do data cleaning and address this issue by making a character variable out of the numeric date variable, then using character functions and Perl regex to clean the character values, and then storing the cleaned values as numeric date.
For specifically date values, you could try using SAS date functions (e.g. DAY(), MONTH(), YEAR(), MDY(), etc.) to extract parts of the date value, error-check them, and put them all back together into a date value. This could be a good quick solution if you expect a limited set of typos and you roughly know what they are. For a more thorough error check, converting the numeric values to character and using char or regex functions would give you more options.
The only really concise suggestion I can think of is using mdy (assuming these are date, not datetime, variables).
For example:
data want;
  set have;
  /* Treat years past 2100 as typos, e.g. 2104 entered instead of 2014. */
  if year(datevar) > 2100 then
    datevar = mdy(month(datevar), day(datevar), year(datevar) - 90);
run;
This would correct any '2104' to '2014'. That's a very simple correction (and it may well do as much harm as good, since '2114' is also a possible typo), but the general approach is along those lines: break the date up into its pieces, verify the pieces, and reconstruct the date using mdy.
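The "break up, verify, reconstruct" idea can be sketched in Python rather than SAS. The function name and the max_year cutoff are assumptions made for this example. It only handles the case where the two middle digits of the year were swapped (2102 for 2012), so a value like 2110 is left alone for manual review.

```python
from datetime import date


def fix_transposed_year(d, max_year=2025):
    """Swap the two middle digits of an implausible year, e.g. 2102 -> 2012."""
    if d.year <= max_year:
        return d  # the year is plausible, leave the date alone
    y = str(d.year)
    candidate = int(y[0] + y[2] + y[1] + y[3])
    if candidate > max_year:
        return d  # swapping did not help (e.g. 2110); needs manual review
    try:
        return d.replace(year=candidate)
    except ValueError:
        return d  # e.g. 29 February moved into a non-leap year


print(fix_transposed_year(date(2102, 3, 15)))
```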
I wrote a regular expression to validate dates (assuming that every month has 31 days and that the year is between 1900 and 2099):
^(((((0?[1-9]|1[012])[- /.\\](0?[1-9]|[12][0-9]|3[01]))|((0?[1-9]|(1|2)[0-9]|3[01])[- /.\\](0?[1-9]|[1][012])))([- /.\\](19|20)\d{2})))$
The accepted date formats are:
dd-mm-yyyy
mm-dd-yyyy
0m-0d-yyyy
0d-0m-yyyy
m-d-yyyy
d-m-yyyy
Everything works fine except for one thing: a date like 32-10-2010 should not be recognized at all, but in a regex tester the 2-10-2010 part of it is matched. Is there any way to modify the regular expression to prevent this?
After removing the / at the end, your RegEx is working for me. Here's a simple Sublime Text RegEx Find/Replace:
Here is the adjusted regex:
^(((((0?[1-9]|1[012])[- /.\\](0?[1-9]|[12][0-9]|3[01]))|((0?[1-9]|(1|2)[0-9]|3[01])[- /.\\](0?[1-9]|[1][012])))([- /.\\](19|20)\d{2})))$
But a better solution would be to use the language's native date functionality. I can't think of a language that doesn't have built-in methods for these sorts of things.
For example, using JavaScript's Date object, or some such...
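As a rough illustration of the same idea, here is a sketch in Python (used here instead of JavaScript). The function name is made up. It tries both day-month-year and month-day-year and returns every reading that is a real calendar date.

```python
from datetime import datetime


def parse_dmy_or_mdy(text):
    """Return every valid reading of text as dd-mm-yyyy or mm-dd-yyyy."""
    readings = []
    for fmt in ("%d-%m-%Y", "%m-%d-%Y"):
        try:
            readings.append(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not a valid date in this format
    return readings


print(parse_dmy_or_mdy("32-10-2010"))  # no valid reading
print(parse_dmy_or_mdy("2-10-2010"))   # two readings: ambiguous input
```

Because the whole string has to parse, 32-10-2010 is rejected outright instead of being partially matched, and impossible dates such as 29-02-1900 are rejected too.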
Try this one:
^((3[01]|0?[1-9]|[1-2][0-9])-(1[012]|0?[1-9])|((1[012]|0?[1-9])-(3[01]|0?[1-9]|[1-2][0-9])))-(19|20)[0-9][0-9]$
I've already given such an answer here.
This matches one invalid date (29-02-1900) but is correct for any other date between 01-01-1900 and 31-12-2099.
The valid date format pattern in your case is:
/^\d{1,2}-\d{1,2}-\d{4}$/
With a regex you should validate only the format of a date, not whether the date itself is correct; doing the latter is bad practice. Months have different numbers of days, so good luck writing a pattern that takes that into account.
If you want to check whether a date is correct, use the built-in functions of your language, for example checkdate in PHP.
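A minimal sketch of that split in Python: the regex checks only the shape, and the standard library's date type plays the role that checkdate plays in PHP. The function name is invented, and day-month-year order is an assumption.

```python
import re
from datetime import date

# Shape only: one or two digits, one or two digits, four digits.
FORMAT = re.compile(r"(\d{1,2})-(\d{1,2})-(\d{4})")


def is_valid_dmy(text):
    """True if text looks like d-m-yyyy and names a real calendar date."""
    match = FORMAT.fullmatch(text)
    if match is None:
        return False
    day, month, year = map(int, match.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


print(is_valid_dmy("29-02-2012"), is_valid_dmy("29-02-1900"))
```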
I need an expression for the form yyyy-mm-dd (like a date format), and I used this regular expression: ^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$
But it only accepts dates up to the year 2099, which is not what I need. How can I raise the upper limit to allow dates like 3000-01-01, 5000-01-01, etc.?
You need something like this:
^\d{4}[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$
I did not test it...
The problem lies with the "19|20" you see at the beginning. That's saying that the year must start with 19 or 20. You can generalize it by replacing (19|20) with \d\d, which allows any two digits. That gives the following:
^\d\d\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$
Whether this is acceptable or not is entirely up to you, though the more flexible you are with these sorts of things, the easier it is for a user to insert a value which they probably didn't mean to insert.
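To make that trade-off concrete, here is a quick Python check of the generalized pattern against a few hand-picked sample strings. The names ANY_YEAR, samples and results are made up for the example.

```python
import re

# The generalized pattern: any four-digit year, then month and day.
ANY_YEAR = re.compile(
    r"^\d{4}[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$"
)

samples = ["1999-12-31", "3000-01-01", "5000-01-01", "0000-02-31", "2011-13-01"]
results = {s: bool(ANY_YEAR.match(s)) for s in samples}

for sample, ok in results.items():
    print(sample, ok)
```

Years beyond 2099 are now accepted, but so is 0000-02-31: the pattern checks the shape of the string only, not whether the date exists.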
Also do keep in mind that there are other date formats used in the world. :)
I have this regex (\d{4})-(\d{2})-(\d{2}) to detect a valid date, however, it is not perfect as some of the incoming data are 2009-24-09 (YYYY-DD-MM) and some are 2009-09-24 (YYYY-MM-DD).
Is it possible to have a one-line regex to detect whether the second & third portion is greater than 12 to better validate the date?
If you don't know the format, you will get ambiguous results.
Take 2010-01-04: is that January 4th or April 1st?
You can't validate that with a regex.
As Albert said, try to parse the date, and make sure users know which format to use. You might try to separate the month and year portions into different fields or comboboxes.
Regexes are not really good at date validation. In my opinion it is better to try to parse the date, and you could keep the regex as a sanity check before parsing.
But if you still need it, you can fix the month section using the regex (\d{4})-(\d{2})-((1[012])|(0\d)|\d). It goes downhill after that, though, since you also need to check for the correct number of days in each month and for leap years.
(\d{4})-(?:(0[1-9]|1[0-2])-(\d{2})|(\d{2})-(0[1-9]|1[0-2]))
YYYY-(MM-DD|DD-MM)
To validate YYYY-MM-DD or YYYY-DD-MM:
// Days run from 01 to 31; either order of month and day is accepted.
$ptn = '/(\d{4})-(?:(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[01])|(0[1-9]|' .
'[1-2][0-9]|3[01])-(0[1-9]|1[0-2]))/';
echo preg_match_all($ptn, '2009-24-09 2009-09-24 dd', $m); // returns 2
Even so, the date could still be invalid, e.g. 2010-02-29. To deal with that, there's checkdate():
checkdate(2, 29, 2010); // returns false
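Putting the answers in this thread together, a Python sketch might look like the following. The function name is hypothetical. It uses the regex only as a shape check, tries both orders, and returns every reading that is a real date, so the caller can see when the input is ambiguous.

```python
import re
from datetime import date

SHAPE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def interpretations(text):
    """Return the distinct valid dates text can mean as Y-M-D or Y-D-M."""
    match = SHAPE.fullmatch(text)
    if match is None:
        return []
    year, first, second = map(int, match.groups())
    found = []
    for month, day in ((first, second), (second, first)):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # this order is not a real calendar date
        if candidate not in found:
            found.append(candidate)
    return found


print(interpretations("2009-24-09"))  # one reading
print(interpretations("2010-01-04"))  # two readings: ambiguous
```

An empty list means the string is not a date in either order (2010-02-29, for example), and a list with two entries means the format has to be known from somewhere else.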