Is there a way to extract, in one call, all the matched subgroups of a string according to a regular expression?
I have a date like this:
Thu, 07 Apr 2022 15:03:32 GMT
And I created the following regexp to extract all the parts of this date:
let re = Str.regexp
  {|\([a-zA-Z]+\), \([0-9]+\) \([a-zA-Z]+\) \([0-9]+\) \([0-9]+\):\([0-9]+\):\([0-9]+\).*|}
To extract each part, I use it like this:
let parse_date date =
  let re = Str.regexp
    {|\([a-zA-Z]+\), \([0-9]+\) \([a-zA-Z]+\) \([0-9]+\) \([0-9]+\):\([0-9]+\):\([0-9]+\).*|} in
  let wday = Str.replace_first re {|\1|} date in
  let day = Str.replace_first re {|\2|} date in
  let mon = Str.replace_first re {|\3|} date in
  let year = Str.replace_first re {|\4|} date in
  let hour = Str.replace_first re {|\5|} date in
  let min = Str.replace_first re {|\6|} date in
  let sec = Str.replace_first re {|\7|} date in
  Format.eprintf "RE DATE: %s %s %s %s %s %s %s@." wday day mon year hour min sec
If the parts were stored in an array I could easily use it like this:
let parse_date date =
  let re = Str.regexp
    {|\([a-zA-Z]+\), \([0-9]+\) \([a-zA-Z]+\) \([0-9]+\) \([0-9]+\):\([0-9]+\):\([0-9]+\).*|} in
  let parts = Str.match_groups re date in (* this function doesn't exist *)
  let wday = parts.(1) in
  let day = parts.(2) in
  let mon = parts.(3) in
  let year = parts.(4) in
  let hour = parts.(5) in
  let min = parts.(6) in
  let sec = parts.(7) in
  Format.eprintf "RE DATE: %s %s %s %s %s %s %s@." wday day mon year hour min sec
but this doesn't appear to exist. Is there another way to do it or is my solution the only one available?
Since this isn't an XY problem: my goal is really to extract each part of a date, so if there's a solution other than Str, I'll be happy to use it.
You can use Str.matched_group to return a particular capture group's match:
let parse_date date =
  let re = Str.regexp
    {|\([a-zA-Z]+\), \([0-9]+\) \([a-zA-Z]+\) \([0-9]+\) \([0-9]+\):\([0-9]+\):\([0-9]+\).*|} in
  if Str.string_match re date 0 then
    (* matched_group reads the groups recorded by the last successful match *)
    let wday = Str.matched_group 1 date in
    let day = Str.matched_group 2 date in
    let mon = Str.matched_group 3 date in
    let year = Str.matched_group 4 date in
    let hour = Str.matched_group 5 date in
    let min = Str.matched_group 6 date in
    let sec = Str.matched_group 7 date in
    Format.sprintf "RE DATE: %s %s %s %s %s %s %s@." wday day mon year hour min sec
  else
    "RE DATE: Not matched"
let _ = parse_date "Thu, 07 Apr 2022 15:03:32 GMT" |> print_endline
The Str library is pretty primitive, though. I'd suggest using a different library for regular expressions, like PCRE-OCaml. It does have a way to get an array of matched groups:
let parse_date2 date =
  let rex = Pcre.regexp
    {|([a-zA-Z]+), ([0-9]+) ([a-zA-Z]+) ([0-9]+) ([0-9]+):([0-9]+):([0-9]+).*|} in
  try
    (* Pcre.exec raises Not_found when the pattern does not match *)
    let parts = Pcre.exec ~rex date |> Pcre.get_substrings in
    let wday = parts.(1) in
    let day = parts.(2) in
    let mon = parts.(3) in
    let year = parts.(4) in
    let hour = parts.(5) in
    let min = parts.(6) in
    let sec = parts.(7) in
    Format.sprintf "RE DATE: %s %s %s %s %s %s %s@." wday day mon year hour min sec
  with Not_found -> "RE DATE: Not matched"
let _ = parse_date2 "Thu, 07 Apr 2022 15:03:32 GMT" |> print_endline
For a simple format with a fixed number of fields and separators, Scanf might be enough:
let date s = Scanf.sscanf s "%s@, %02d %s %d %d:%d:%d %s"
  (fun day_name day month year h m s timezone ->
    (* assumed completion: return the parsed fields as a tuple *)
    (day_name, day, month, year, h, m, s, timezone))
let x = date "Thu, 07 Apr 2022 15:03:32 GMT"
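For readers comparing libraries: the "all groups in one call" behaviour the question asks for is what some regex APIs provide directly. Below is a minimal Python sketch of the same date pattern, not OCaml, and only meant to illustrate the idea. The names DATE_RE and parse_date are made up for this example.

```python
import re

# Same shape as the Str/PCRE patterns above, written in Python's regex syntax.
DATE_RE = re.compile(
    r"([a-zA-Z]+), ([0-9]+) ([a-zA-Z]+) ([0-9]+) ([0-9]+):([0-9]+):([0-9]+).*"
)


def parse_date(date):
    """Return the seven captured parts as a tuple, or None if there is no match."""
    m = DATE_RE.match(date)
    if m is None:
        return None
    # groups() returns every capture group in one call (group 0 is excluded).
    return m.groups()


parts = parse_date("Thu, 07 Apr 2022 15:03:32 GMT")
if parts is not None:
    wday, day, mon, year, hour, minute, sec = parts
    print("RE DATE:", wday, day, mon, year, hour, minute, sec)
```

This mirrors what Pcre.get_substrings does in the answer above, except that the whole-match entry is left out of the tuple.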
I wrote the following regex to match date strings looking like:
2019/01/02 08:20:19
The regex is val reg = "([\\d]{4})/([\\d]{2})/([\\d]{2}) ([\\d]{2}).*".r
The Scala function is:
val dateExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(year, month, day, hour) =>
        Map(YEAR -> year, MONTH -> month, DAY -> day, HOUR -> hour)
      case _ => Map(YEAR -> "", MONTH -> "", DAY -> "", HOUR -> "")
    }
  }
}
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val HOUR = "HOUR"
I want to get the year, month, day and hour from the regex.
But the date above is not parsed as expected and I get a null result. Any idea how to fix this?
I would use java.time for such a problem, like:
import java.time.{LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter
val input = "2019/01/02 08:20:19"
val formatter = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss")
val dt = LocalDateTime.from(formatter.parse(input)).atZone(ZoneId.systemDefault())
dt.getYear() // 2019
dt.getMonthValue() // 1
dt.getDayOfMonth() // 2
dt.getHour() // 8
I wrote the following code:
val reg = "([\\d]{4})-([\\d]{2})-([\\d]{2})(T)([\\d]{2}):([\\d]{2})".r
val dataExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(year, month, day, symbol, hour, minutes) =>
        Map(YEAR -> year, MONTH -> month, DAY -> day, HOUR -> hour)
      case _ => Map(YEAR -> "", MONTH -> "", DAY -> "", HOUR -> "")
    }
  }
}
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val HOUR = "HOUR"
This function is supposed to be applied to strings having the following format: 2018-08-22T19:10:53.094Z
When I call the function :
Your pattern, for all its deficiencies, does work. You just have to unanchor it.
val reg = "([\\d]{4})-([\\d]{2})-([\\d]{2})(T)([\\d]{2}):([\\d]{2})".r.unanchored
. . .
//res0: Map[String,String] = Map(YEAR -> 2018, MONTH -> 08, DAY -> 22, HOUR -> 19)
But the comment from @CAustin is correct: you could just let the Java LocalDateTime API handle all the heavy lifting.
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter._
val dt = LocalDateTime.parse("2018-08-22T19:10:53.094Z", ISO_DATE_TIME)
Now you have access to all the data without actually saving it to a Map.
dt.getYear //res0: Int = 2018
dt.getMonthValue //res1: Int = 8
dt.getDayOfMonth //res2: Int = 22
dt.getHour //res3: Int = 19
dt.getMinute //res4: Int = 10
dt.getSecond //res5: Int = 53
Your pattern matches only strings that look exactly like yyyy-mm-ddThh:mm, while the one you are testing against has milliseconds and a Z at the end.
You can append .* at the end of your pattern to cover strings that have additional characters at the end.
In addition, let me show you a more idiomatic way of writing your code:
// Create a type for the data instead of using a map.
case class Timestamp(year: Int, month: Int, day: Int, hour: Int, minutes: Int)
// Use triple quotes to avoid extra escaping.
// Don't capture parts that you will not use.
// Add .* at the end to account for milliseconds and timezone.
val reg = """(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}).*""".r
// Instead of empty strings, use Option to represent a value that can be missing.
// Convert to Int after parsing.
def dataExtraction(str: String): Option[Timestamp] = str match {
  case reg(y, m, d, h, min) => Some(Timestamp(y.toInt, m.toInt, d.toInt, h.toInt, min.toInt))
  case _ => None
}
// It works!
dataExtraction("2018-08-22T19:10:53.094Z") // => Some(Timestamp(2018,8,22,19,10))
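The anchoring point made in both answers (a Scala Regex used in a pattern match must cover the whole input unless it is unanchored or padded with .*) can be seen in other regex APIs too. Here is a rough Python sketch of the same three cases. Python stands in for Scala, and the names PATTERN and TEXT are only illustrative.

```python
import re

PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})")
TEXT = "2018-08-22T19:10:53.094Z"

# fullmatch requires the entire string to match, like an anchored extractor.
anchored = PATTERN.fullmatch(TEXT)

# search accepts a match anywhere in the string, similar in spirit to .unanchored.
unanchored = PATTERN.search(TEXT)

# Appending .* lets a whole-string match absorb the trailing ":53.094Z".
padded = re.compile(PATTERN.pattern + ".*").fullmatch(TEXT)

print(anchored)             # None: the trailing seconds and zone are not covered
print(unanchored.groups())  # ('2018', '08', '22', '19', '10')
print(padded.groups())      # ('2018', '08', '22', '19', '10')
```

Either fix works. Padding with .* keeps the whole-string requirement at the start of the input, while an unanchored search would also accept leading junk.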
I have two variables holding two times from a datePicker, and I need to store the difference between them in a variable.
let timeFormatter = DateFormatter()
timeFormatter.dateFormat = "HHmm"
time2 = timeFormatter.string(from:!
I have tried getting the timeIntervalSince1970 of both and subtracting them to get the difference in milliseconds, which I would then turn back into hours and minutes, but I get a very big number which doesn't correspond to the actual time.
let dateTest = time2.timeIntervalSince1970 - time1.timeIntervalSince1970
Then I tried using time2.timeIntervalSince(date: time1), but again the resulting milliseconds are far more than the actual time.
How can I get the correct time difference between two times and have the result as hours and minutes in the format "0823" for 8 hours and 23 minutes?
The recommended way to do any date math is Calendar and DateComponents:
let difference = Calendar.current.dateComponents([.hour, .minute], from: time1, to: time2)
let formattedString = String(format: "%02ld%02ld", difference.hour!, difference.minute!)
The format %02ld adds the padding zero.
If you need a standard format with a colon between hours and minutes, DateComponentsFormatter() could be a more convenient way:
let formatter = DateComponentsFormatter()
formatter.allowedUnits = [.hour, .minute]
print(formatter.string(from: time1, to: time2)!)
TimeInterval measures seconds, not milliseconds:
let date1 = Date()
let date2 = Date(timeIntervalSinceNow: 12600) // 3:30
let diff = Int(date2.timeIntervalSince1970 - date1.timeIntervalSince1970)
let hours = diff / 3600
let minutes = (diff - hours * 3600) / 60
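The arithmetic above (a difference in seconds split into whole hours and leftover minutes) is the same in any language. Here is a small Python sketch of it that also produces the zero-padded "0823" form asked for in the question. The function name format_hhmm and the sample dates are made up for illustration.

```python
from datetime import datetime


def format_hhmm(seconds):
    """Format a non-negative duration in seconds as zero-padded HHmm."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes = remainder // 60
    return "%02d%02d" % (hours, minutes)


# Hypothetical start and end times, 8 hours and 23 minutes apart.
start = datetime(2020, 1, 1, 9, 0)
end = datetime(2020, 1, 1, 17, 23)

# total_seconds() is in seconds, just like TimeInterval.
elapsed = (end - start).total_seconds()

print(format_hhmm(elapsed))  # 0823
print(format_hhmm(12600))    # 0330
```

Treating the value as milliseconds instead of seconds would inflate the result by a factor of 1000, which matches the "very big number" described in the question.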
To get the duration in seconds between two timestamps, this can be used:
let time1 = Date(timeIntervalSince1970: startTime)
let time2 = Date(timeIntervalSince1970: endTime)
let difference = Calendar.current.dateComponents([.second], from: time1, to: time2)
let duration = difference.second
Now you can do it in Swift 5 this way:
func getDateDiff(start: Date, end: Date) -> Int {
    let calendar = Calendar.current
    let dateComponents = calendar.dateComponents([Calendar.Component.second], from: start, to: end)
    let seconds = dateComponents.second
    return Int(seconds!)
}
The date format that I'm passing my DateFormatter is not working.
Why is the year appearing as the first component of the date when I specify month in the dateFormat? Why is my am/pm marker in the dateFormat being ignored? Finally, why or how do I correct for time zone? I didn't specify, yet a different time zone has been used.
if let publishDateString = post["publishDate"] as? String {
    print("publishDateString is \(publishDateString)") // 2/17/2016 2:49:00 PM
    let myDateFormatter = DateFormatter()
    myDateFormatter.dateFormat = "M/d/yyyy h:mm:ss a"
    let dateFromString = myDateFormatter.date(from: publishDateString)!
    print("My date from string is \(dateFromString)") // 2016-02-17 19:49:00 +0000
}
I am trying to get a proper structured output into a csv.
import pandas as pd
from datetime import datetime,time
import numpy as np
fn = r'00_Dart.csv'
cols = ['UserID','StartTime','StopTime', 'gps1', 'gps2']
df = pd.read_csv(fn, header=None, names=cols)
df['m'] = df.StopTime + df.StartTime
df['d'] = df.StopTime - df.StartTime
# 'start' and 'end' for the reporting DF: `r`
# which will contain equal intervals (1 hour in this case)
start = pd.to_datetime(df.StartTime.min(), unit='s').date()
end = pd.to_datetime(df.StopTime.max(), unit='s').date() + pd.Timedelta(days=1)
# building reporting DF: `r`
freq = '1H' # 1 Hour frequency
idx = pd.date_range(start, end, freq=freq)
r = pd.DataFrame(index=idx)
r['start'] = (r.index - pd.datetime(1970,1,1)).total_seconds().astype(np.int64)
# 1 hour in seconds, minus one second (so that we will not count it twice)
interval = 60*60 - 1
r['LogCount'] = 0
r['UniqueIDCount'] = 0
for i, row in r.iterrows():
    # intervals overlap test
    # i've slightly simplified the calculations of m and d
    # by getting rid of division by 2,
    # because it can be done eliminating common terms
    u = df[np.abs(df.m - 2*row.start - interval) < df.d + interval].UserID
    r.ix[i, ['LogCount', 'UniqueIDCount']] = [len(u), u.nunique()]
r['Day'] = pd.to_datetime(r.start, unit='s').dt.weekday_name.str[:3]
r['StartTime'] = pd.to_datetime(r.start, unit='s').dt.time
r['EndTime'] = pd.to_datetime(r.start + interval + 1, unit='s').dt.time
#df.to_csv((r[r.LogCount > 0])'example.csv')
#print(r[r.LogCount > 0]) -- This gives the correct count and unique count but I want to write the output in a structure.
print (r['StartTime'], ['EndTime'], ['Day'], ['LogCount'], ['UniqueIDCount'])
Output: This is the output that I am getting which is not what I am looking for.
(2004-01-05 00:00:00 00:00:00
2004-01-05 01:00:00 01:00:00
2004-01-05 02:00:00 02:00:00
2004-01-05 03:00:00 03:00:00
2004-01-05 04:00:00 04:00:00
2004-01-05 05:00:00 05:00:00
2004-01-05 06:00:00 06:00:00
2004-01-05 07:00:00 07:00:00
2004-01-05 08:00:00 08:00:00
2004-01-05 09:00:00 09:00:00
And the Expected output headers are
StartTime, EndTime, Day, Count, UniqueIDCount
How do I structure the write statement in code so that my output CSV has the above-mentioned columns?
Try this:
rout = r[['StartTime', 'EndTime', 'Day', 'LogCount', 'UniqueIDCount']]
print(rout)
rout.to_csv('results.csv', index=False)
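To show what the column selection and index=False do, here is a self-contained sketch with a tiny made-up frame standing in for the report frame r. The values are invented, and it writes to an in-memory buffer rather than results.csv so it can be run anywhere pandas is installed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the report frame `r`; the values are made up.
r = pd.DataFrame({
    "start": [0, 3600],
    "LogCount": [3, 1],
    "UniqueIDCount": [2, 1],
    "Day": ["Thu", "Thu"],
    "StartTime": ["00:00:00", "01:00:00"],
    "EndTime": ["01:00:00", "02:00:00"],
})

columns = ["StartTime", "EndTime", "Day", "LogCount", "UniqueIDCount"]

# Indexing with a list selects those columns, in that order, as a DataFrame.
rout = r[columns]

# index=False keeps the row index out of the CSV.
buffer = io.StringIO()
rout.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The original print call passed a Series followed by plain lists such as ['EndTime'], which is why only the StartTime column was printed. If the header should read Count rather than LogCount, renaming first with rout.rename(columns={'LogCount': 'Count'}) is one option.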