Imagine having 500 string like this with different dates:
The certificate has expired on 02/05/2014 15:43:01 UTC.
Given that this is a String and I'am using powershell. I need to treat the date (02/05/2014) as an object, so I can use operatators (-lt -gt).
Is the only way doing this is using RegEx, and in this case - can anyone help me finding the first 6 numbers (which change every time) using regEx.
>$regexStr = "(?<date>\d{2}\/\d{2}\/\d{4})"
>$testStr = "The certificate has expired on 02/05/2014 15:43:01 UTC."
>$testStr -match $regexStr
# $Matches will contain the regex group called "date"
>$Matches.date
02/05/2014
>$date = Get-Date ($Matches.date)
>$date
Wednesday, February 5, 2014 12:00:00 AM
If you need to parse the date string with another format you can do:
>$dateObj = [datetime]::ParseExact($Matches.date,”dd/MM/yyyy”,$null)
>$dateObj.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True DateTime System.ValueType
Hope that helps
Related
I wanted to separate the time and date from this string using REGEX because I feel like it is the only way I can separate it. But I am not really familiar on how to do it maybe someone can help me out here.
The original string: Your item was delivered in or at the mailbox at 3:34 pm on September 1, 2016 in TEXAS, MT 59102
The output i want to achieve/populate:
lv_time = 3:34 pm
lv_date = September 1, 2016
Here's the code I was trying to do but I am only able to cut it like this:
lv_status = Your item was delivered in or at the mailbox at
lv_time = 3
lv_date = :34 pm on September 1, 2016 in TEXAS, MT 59102.
Here's the code I have so far:
DATA: lv_status TYPE string,
lv_time TYPE string,
lv_date TYPE string,
lv_off TYPE i.
lv_status = 'Your item was delivered in or at the mailbox at 3:34 pm on September 1, 2016 in TEXAS, MT 59102.'.
FIND REGEX '(\d+)\s*(.*)' IN lv_status SUBMATCHES lv_time lv_date MATCH OFFSET lv_off.
lv_status = lv_status(lv_off).
You asked for it, here it comes:
\b((1[0-2]|0?[1-9]):([0-5][0-9]) ([AaPp][Mm])) on (January|February|March|April|May|June|July|August|September|October|November|December)\D?(\d{1,2}\D?)?\D?((?:19[7-9]\d|20\d{2})|\d{2})
This accepts time in HH:MM am/pm format, and dates in Jan-Dec, dd 1970-2999.
Each part is captured in its own group.
The demo shows a version that allows abbreviated month names:
Demo
I am trying to use the value.match command in OpenRefine 2.6 for splitting the information presents in a column into (at least) 2 columns.
The data are, however, quite messed up.
I have sometimes full dates:
May 30, 1949
Sometimes full dates are combined with other dates and attributes:
May 30, 1949, published 1979
May 30, 1949 and 1951, published 1979
May 30, 1949, printed 1980
May 30, 1949, print executed 1988
May 30, 1949, prints executed 1988
published 1940
Sometimes you have timespan:
1905-05 OR 1905-1906
Sometimes only the year
1905
Sometimes year with attributes
August or September 1908
Doesn't seems to follow any specific schema or order.
I would like to extract (at least)ca start and end date year, in order to have two columns:
-----------------------
|start_date | end_date|
|1905 | 1906 |
-----------------------
without the rest of the attributes.
I can find the last date using
value.match(/.*(\d{4}).*?/)[0]
and the first one with
value.match(/.*^(\d{4}).*?/)[0]
but I got some trouble with the two formulas.
The latter cannot match anything in case of:
May 30, 1949 and 1951, published 1979
while in the case of:
Paris, winter 1911-12
The latter formula cannot match anything and the former formula match 1911
Anyone know how I can resolve the problem?
I would need a solution that take the first date as start_date and final date as end_date, or better (don't know if it is possible) earliest date as start_date and latest date as end_date.
Moreover, I would be glad to have some clue about how to extract other information, such as
if published or printed or executed is present in the text -> copy date to a new column name “execution”.
should be something like create a new column
if(value.match("string1|string2|string3" + (\d{4}), "perform the operation", do nothing)
value.match() is a very useful but sometimes tricky function. To extract a pattern from a text, I prefer to use Python/Jython's regular expressions :
import re
pattern = re.compile(r"\d{4}")
return pattern.findall(value)
From there, you can create a string with all the years concatenated:
return ",".join(pattern.findall(value))
Or select only the first:
return pattern.findall(value)[0]
Or the last:
return pattern.findall(value)[-1]
etc.
Same thing for your sub-question:
import re
pattern = re.compile(r"(published|printed|executed)\s+(\d+)")
return pattern.findall(value)[0][1]
Or :
import re
pattern = re.compile(r"(published|printed|executed)\s+(\d+)")
m = re.search(pattern, value)
return m.group(2)
Example:
Here is a regex which will extract start_date and end_date in named groups :
If there is only one date, then it consider it's the start_date :
((?<start_date>\d{4}).*?)?(?<end_date>\d{4}|(?<=-)\d{2})?$
Demo
The query
SELECT REGEXP_SUBSTR('Outstanding Trade Ticket Report_08 Apr 14.xlsx', '\_(.*)\.') AS FILE_DATE FROM DUAL
gives the OUTPUT:
_08 Apr 14.
Please advise the correct regex to be used for getting the date without the characters.
I can use RTRIM and LTRIM but want to try it using regex.
You can use:
SELECT REGEXP_SUBSTR('Outstanding Trade Ticket Report_08 Apr 14.xlsx', '\_(.*)\.',
1, 1, NULL, 1) from dual
The last argument is used to determine which matched group to return.
Link to Fiddler
I have been struggling making a regex to extract the information in below divided in 3 part between the ",". Only the first and second sequence (Friday and the date has succeded).
Friday, 26 Apr 2013, 18:30
I hope someone has the experience.
Best regards
Why not simply split the string and trim the excess whitespace of the individual parts? For example, verbosely written in C#:
string input = "Friday, 26 Apr 2013, 18:30";
string[] parts = input.Split(',');
for (int i = 0; i < parts.Length; i++)
{
parts[i] = parts[i].Trim();
}
Console.WriteLine(parts[0]); // "Friday"
Console.WriteLine(parts[1]); // "26 Apr 2013"
Console.WriteLine(parts[2]); // "18:30"
If you really want to use a regular expression for this, ^(.*),(.*),(.*)$ should work:
string input = "Friday, 26 Apr 2013, 18:30";
Regex regex = new Regex("^(.*),(.*),(.*)$", RegexOptions.Singleline);
Match match = regex.Match(input);
Console.WriteLine(match.Groups[1].Value.Trim()); // "Friday"
Console.WriteLine(match.Groups[2].Value.Trim()); // "26 Apr 2013"
Console.WriteLine(match.Groups[3].Value.Trim()); // "18:30"
Adding appropriate error checking is left as an exercise for the reader.
The following Regex expression is matching this whole part :
, 18:30
I hope someone has the experience.
Best regards
,+\s[0-9]+:[0-9]+ \r*.*
But yeah, that's kind of ultra specific to this ", Hour:Minuts [...]" format. You should do a split if you're using PHP or the equivalent in your language.
I think what you really want is something like this:
from datetime import datetime
s="Friday, 26 Apr 2013, 18:30"
d=datetime.strptime(s, "%A, %d %b %Y, %H:%M")
d
Out[7]: datetime.datetime(2013, 4, 26, 18, 30)
See the strptime and date format docs for details :)
Edit: sorry, I was somehow assuming you were using Python. Other languages have similar idioms though, e.g. PHP's date_parse, C#'s DateTime.Parse, etc.
You didn't specify a language so I'm going to answer this with a standard REGEX approach.
(?<=(^|,\s+)).+?(?=(,|$)) Will work for you.
Let me break up what it's doing.
(?<=(^|,\s+) - Look ahead for the start of a string or a comma followed by whitespace, but don't include it in the match. All matches must have this in front of them.
.+? - Grab all characters, but don't be greedy.
(?=(,|$)) - Look behind for the end of string or a comma. All matches must have this behind them.
When ran on your test case of Friday, 26 Apr 2013, 18:30, I get 3 matches:
Friday
26 Apr 2013
18:30
Like m01's answer, you could try this approach with C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
namespace TestDate
{
class Program
{
static void Main(string[] args)
{
string dateString = "Friday, 26 Apr 2013, 18:30"; // Modified from MSDN
string format = "dddd, dd MMM yyyy, HH:mm";
DateTime dateTime = DateTime.ParseExact(dateString, format, CultureInfo.InvariantCulture);
Console.WriteLine(dateTime);
Console.Read();
}
}
}
This will print out the localized date and time that is configured on the user's machine. For me it printed out 4/16/2013 6:30:00 PM.
Is there any way I can easily check if a string conforms to the SortableDateTimePattern ("s"), or do I need to write a regular expression?
I've got a form where users can input a copyright date (as a string), and these are the allowed formats:
Year: YYYY (eg 1997)
Year and month: YYYY-MM (eg 1997-07)
Complete date: YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes: YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds: YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a second
YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
I don't have much experience of writing regular expressions so if there's an easier way of doing it I'd be very grateful!
Not thoroughly tested and hence not foolproof, but the following seems to work:
var regex:RegExp = /(?<=\s|^)\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{2})?)?\+\d{2}:\d{2})?)?)?(?=\s|$)/g;
var test:String = "23 1997 1998-07 1995-07s 1937-04-16 " +
"1970-0716 1993-07-16T19:20+01:01 1979-07-16T19:20+0100 " +
"2997-07-16T19:20:30+01:08 3997-07-16T19:20:30.45+01:00";
var result:Object
while(result = regex.exec(test))
trace(result[0]);
Traced output:
1997
1998-07
1937-04-16
1993-07-16T19:20+01:01
2997-07-16T19:20:30+01:08
3997-07-16T19:20:30.45+01:00
I am using ActionScript here, but the regex should work in most flavors. When implementing it in your language, note that the first and last / are delimiters and the last g stands for global.
I'd split the input field into many (one for year, month, day etc.).
You can use Javscript to advance from one field to the next once full (i.e. once four characters are in the year box, move focus to month) for smoother entry.
You can then validate each field independently and finally construct the complete date string.