How to write a regex for a date-time string - regex

dateTime = "SATURDAY1200PM1230PMWEEKLY"
Desired Result: "12:00 PM - 12:30 PM"
I tried doing this: let str = "SATURDAY600PM630PMWEEKLY".split(/[^A-Z][0-9]{3,4}(A|P)M/);
But I keep getting an array with chars/numbers. I am unsure if split is the way to go here.

Try a match approach:
var dateTime = "SATURDAY1200PM1230PMWEEKLY";
var ts = dateTime.match(/\d{3,4}[AP]M/g)
.map(x => x.replace(/(\d{1,2})(\d{2})([AP]M)/, "$1:$2 $3"))
.join(" - ");
console.log(ts);
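For readers outside JavaScript, the same match-then-reformat idea can be sketched in Python. This is only an illustration: the helper name `format_time_range` is made up here and does not come from the answer above.

```python
import re

def format_time_range(text):
    # Find every run of 3-4 digits followed by AM or PM, e.g. "1200PM".
    times = re.findall(r"\d{3,4}[AP]M", text)
    # Split each hit into hours, minutes and meridiem, then rebuild it.
    pretty = [re.sub(r"(\d{1,2})(\d{2})([AP]M)", r"\1:\2 \3", t) for t in times]
    return " - ".join(pretty)

print(format_time_range("SATURDAY1200PM1230PMWEEKLY"))
```

Because the minutes group always takes exactly two digits, the hour group backtracks to one digit for inputs such as "630PM".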

As the programming language was not given I will provide a straightforward solution in Ruby which I expect could be converted easily to most other languages.
str = "SATURDAY1130AM130PMWEEKLY"
rgx = /\A[A-Z]+(\d{1,2})(\d{2})([AP]M)(\d{1,2})(\d{2})([AP]M)[A-Z]+\z/
m = str.match(rgx)
#=> #<MatchData "SATURDAY1130AM130PMWEEKLY" 1:"11" 2:"30" 3:"AM" 4:"1" 5:"30" 6:"PM">
"%s:%s %s - %s:%s %s" % [$1, $2, $3, $4, $5, $6]
#=> "11:30 AM - 1:30 PM"
Demo
The regular expression could be broken down as follows.
\A # match beginning of string
[A-Z]+ # match one or more uppercase letters
(\d{1,2}) # match 1 or 2 digits, save to capture group 1
(\d{2}) # match 2 digits, save to capture group 2
([AP]M) # match 'AM' or 'PM', save to capture group 3
(\d{1,2}) # match 1 or 2 digits, save to capture group 4
(\d{2}) # match 2 digits, save to capture group 5
([AP]M) # match 'AM' or 'PM', save to capture group 6
[A-Z]+ # match one or more uppercase letters
\z # match end of string
The last statement could also be written:
"%s:%s %s - %s:%s %s" % m.captures
#=> "11:30 AM - 1:30 PM"
which of course is specific to Ruby.
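As a rough Python counterpart to the Ruby pattern and its breakdown, the sketch below uses `re.VERBOSE` to keep the per-part comments and `fullmatch` in place of the `\A` and `\z` anchors. The names `PATTERN` and `time_range` are hypothetical.

```python
import re

# Same structure as the Ruby pattern, written in verbose mode.
PATTERN = re.compile(r"""
    [A-Z]+            # leading uppercase letters (day name)
    (\d{1,2})(\d{2})  # start hour, start minutes
    ([AP]M)           # start meridiem
    (\d{1,2})(\d{2})  # end hour, end minutes
    ([AP]M)           # end meridiem
    [A-Z]+            # trailing uppercase letters
""", re.VERBOSE)

def time_range(text):
    # fullmatch requires the whole string to fit the pattern.
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    return "%s:%s %s - %s:%s %s" % m.groups()

print(time_range("SATURDAY1130AM130PMWEEKLY"))
```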
Another way is to make use of a language's date-time library. Again, this could be done as follows in Ruby.
require 'time'
s1, s2 = str.scan(/\d{3,4}[AP]M/).map do |s|
s.sub(/(?=\d{2}[AP])/, ' ')
end
#=> ["11 30AM", "1 30PM"]
t1 = DateTime.strptime(s1, '%I %M%p')
#=> #<DateTime: 2022-02-01T11:30:00+00:00
# ((2459612j,41400s,0n),+0s,2299161j)>
t2 = DateTime.strptime(s2, '%I %M%p')
#=> #<DateTime: 2022-02-01T13:30:00+00:00
# ((2459612j,48600s,0n),+0s,2299161j)>
t1.strftime('%l:%M %p') + " - " + t2.strftime('%l:%M %p')
#=> "11:30 AM - 1:30 PM"
If you are wondering why .map do |s| s.sub(/(?=\d{2}[AP])/, ' ') end is needed in calculating s1 and s2 try removing it and changing the format string to '%I%M%p'.
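The date-time-library route might look like this in Python. It is a sketch under a few assumptions: the default C locale is in effect (so `%p` recognises "AM" and "PM"), and the helper names `parse_times` and `show` are invented for the example. As in the Ruby version, a space is inserted between hours and minutes before parsing so the format string is unambiguous.

```python
import re
from datetime import datetime

def parse_times(text):
    parsed = []
    for raw in re.findall(r"\d{3,4}[AP]M", text):
        # Put a space between hours and minutes: "130PM" -> "1 30PM".
        spaced = re.sub(r"(\d{2}[AP]M)$", r" \1", raw)
        parsed.append(datetime.strptime(spaced, "%I %M%p"))
    return parsed

def show(t):
    # Build a 12-hour clock string without a leading zero.
    hour = t.hour % 12 or 12
    meridiem = "AM" if t.hour < 12 else "PM"
    return f"{hour}:{t.minute:02d} {meridiem}"

t1, t2 = parse_times("SATURDAY1130AM130PMWEEKLY")
print(show(t1) + " - " + show(t2))
```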

The solution is to use match and then convert the result into the string you want:
let str = "SATURDAY600PM630PMWEEKLY"
.match(/[\d]{3,4}(A|P)M/g)
.map((time) => {
const AMPM = time.slice(-2);
const m = time.slice(-4,-2);
const h = time.slice(0,-4);
return `${h}:${m} ${AMPM}`;
})
.join(' - ')
console.log(str)

Related

Remove only non-leading and non-trailing spaces from a string in Ruby?

I'm trying to write a Ruby method that will return true only if the input is a valid phone number, which means, among other rules, it can have spaces and/or dashes between the digits, but not before or after the digits.
In a sense, I need a method that does the opposite of String#strip! (remove all spaces except leading and trailing spaces), plus the same for dashes.
I've tried using String#gsub!, but when I try to match a space or a dash between digits, then it replaces the digits as well as the space/dash.
Here's an example of the code I'm using to remove spaces. I figure once I know how to do that, it will be the same story with the dashes.
def valid_phone_number?(number)
phone_number_pattern = /^0[^0]\d{8}$/
# remove spaces
number.gsub!(/\d\s+\d/, "")
return number.match?(phone_number_pattern)
end
What happens is if I call the method with the following input:
valid_phone_number?(" 09 777 55 888 ")
I get false because line 5 transforms the number into " 0788 ", i.e. it gets rid of the digits around the spaces as well as the spaces. What I want it to do is just to get rid of the inner spaces, so as to produce " 0977755888 ".
I've tried
number.gsub!(/\d(\s+)\d/, "") and number.gsub!(/\d(\s+)\d/) { |match| "" } to no avail.
Thank you!!
If you want to return a boolean, you might for example use a pattern that accepts leading and trailing spaces, and matches 10 digits (as in your example data) where there can be optional spaces or hyphens in between.
^ *\d(?:[ -]*\d){9} *$
For example
def valid_phone_number?(number)
phone_number_pattern = /^ *\d(?:[ -]*\d){9} *$/
return number.match?(phone_number_pattern)
end
See a Ruby demo and a regex demo.
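The same validation idea, sketched in Python for comparison. The names `PHONE` and `valid_phone_number` are illustrative, and `fullmatch` stands in for the `^` and `$` anchors.

```python
import re

# Optional leading/trailing spaces; ten digits with optional spaces or
# hyphens between them.
PHONE = re.compile(r" *\d(?:[ -]*\d){9} *")

def valid_phone_number(number):
    # True only when the entire string fits the pattern.
    return PHONE.fullmatch(number) is not None

print(valid_phone_number(" 09 777 55 888 "))
```

Like the Ruby pattern, this only checks the shape (ten digits with optional separators); any further rules, such as the leading 0, would need an extra check.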
To remove spaces and hyphens in between digits, try:
(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)
See an online regex demo
(?: - Open non-capture group;
\d+ - Match 1+ digits;
| - Or;
\G(?!^)\d+ - Assert position at end of previous match but (negate start-line) with following 1+ digits;
)\K - Close non-capture group and reset matching point;
[- ]+ - Match 1+ space/hyphen;
(?=\d) - Assert position is followed by digits.
p " 09 777 55 888 ".gsub(/(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)/, '')
Prints: " 0977755888 "
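Python's standard `re` module supports neither `\G` nor `\K`, so a Python version has to swap in a different technique. The sketch below uses a lookbehind and a lookahead instead; the function name `strip_inner_separators` is hypothetical.

```python
import re

def strip_inner_separators(text):
    # Remove runs of spaces/hyphens that have a digit on both sides.
    return re.sub(r"(?<=\d)[- ]+(?=\d)", "", text)

print(repr(strip_inner_separators(" 09 777 55 888 ")))
```

Leading and trailing spaces survive because they do not have a digit on both sides.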
Using a very simple regex (/\d/ tests for a digit):
str = " 09 777 55 888 "
r = str.index(/\d/)..str.rindex(/\d/)
str[r] = str[r].delete(" -")
p str # => " 0977755888 "
Passing a block to gsub is an option, capture groups available as globals:
>> str = " 09 777 55 888 "
# simple, easy to understand
>> str.gsub(/(^\s+)([\d\s-]+?)(\s+$)/){ "#$1#{$2.delete('- ')}#$3" }
=> " 0977755888 "
# a different take on @steenslag's answer, to avoid using range.
>> s = str.dup; s[/^\s+([\d\s-]+?)\s+$/, 1] = s.delete("- "); s
=> " 0977755888 "
Benchmark, not that it matters that much:
require 'benchmark'
n = 1_000_000
puts(Benchmark.bmbm do |x|
# just a match
x.report("match") { n.times {str.match(/^ *\d(?:[ -]*\d){9} *$/) } }
# use regex in []=
x.report("[//]=") { n.times {s = str.dup; s[/^\s+([\d\s-]+?)\s+$/, 1] = s.delete("- "); s } }
# use range in []=
x.report("[..]=") { n.times {s = str.dup; r = s.index(/\d/)..s.rindex(/\d/); s[r] = s[r].delete(" -"); s } }
# block in gsub
x.report("block") { n.times {str.gsub(/(^\s+)([\d\s-]+?)(\s+$)/){ "#$1#{$2.delete('- ')}#$3" }} }
# long regex
x.report("regex") { n.times {str.gsub(/(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)/, "")} }
end)
Rehearsal -----------------------------------------
match 0.997458 0.000004 0.997462 ( 0.998003)
[//]= 1.822698 0.003983 1.826681 ( 1.827574)
[..]= 3.095630 0.007955 3.103585 ( 3.105489)
block 3.515401 0.003982 3.519383 ( 3.521392)
regex 4.761748 0.007967 4.769715 ( 4.772972)
------------------------------- total: 14.216826sec
user system total real
match 1.031670 0.000000 1.031670 ( 1.032347)
[//]= 1.859028 0.000000 1.859028 ( 1.860013)
[..]= 3.074159 0.003978 3.078137 ( 3.079825)
block 3.751532 0.011982 3.763514 ( 3.765673)
regex 4.634857 0.003972 4.638829 ( 4.641259)

Scala regex on a whole column

I have the following pattern that I could parse using pandas in Python, but struggle with translating the code into Scala.
grade string_column
85 (str:ann smith,14)(str:frank chase,15)
86 (str:john foo,15)(str:al more,14)
In python I used:
df.set_index('grade')['string_column']\
.str.extractall(r'\((str:[^,]+),(\d+)\)')\
.droplevel(1)
with the output:
grade 0 1
85 str:ann smith 14
85 str:frank chase 15
86 str:john foo 15
86 str:al more 14
In Scala I tried to duplicate the approach, but it's failing:
import scala.util.matching.Regex
val pattern = new Regex("((str:[^,]+),(\d+)\)")
val str = "(str:ann smith,14)(str:frank chase,15)"
println(pattern findAllIn(str)).mkString(","))
There are a few notes about the code:
There is an unmatched parenthesis for a group, but that one should be escaped
The backslashes should be double escaped
In the println you don't have to use all the parentheses and the dot
findAllIn returns a MatchIterator, and looping those will expose a matched string. Joining those matched strings with a comma, will in this case give back the same string again.
For example
import scala.util.matching.Regex
val pattern = new Regex("\\((str:[^,]+),(\\d+)\\)")
val str = "(str:ann smith,14)(str:frank chase,15)"
println(pattern findAllIn str mkString ",")
Output
(str:ann smith,14),(str:frank chase,15)
But if you want to print out the group 1 and group 2 values, you can use findAllMatchIn that returns a collection of Regex Matches:
import scala.util.matching.Regex
val pattern = new Regex("\\((str:[^,]+),(\\d+)\\)")
val str = "(str:ann smith,14)(str:frank chase,15)"
pattern findAllMatchIn str foreach(m => {
println(m.group(1))
println(m.group(2))
}
)
Output
str:ann smith
14
str:frank chase
15
In Python, Series.str.extractall only returns captured substrings. In Scala, findAllIn returns the matched values if you do not query its matchData property that in its turn contains a subgroups property.
So, to get the captures only in Scala, you need to use
val pattern = """\((str:[^,()]+),(\d+)\)""".r
val str = "(str:ann smith,14)(str:frank chase,15)"
(pattern findAllIn str).matchData foreach {
m => println(m.subgroups.mkString(","))
}
Output:
str:ann smith,14
str:frank chase,15
See the Scala online demo.
Here, m.subgroups accesses all subgroups (captures) of each match (m).
Also, note you do not need to double backslashes in triple-quoted string literals. \((str:[^,()]+),(\d+)\) matches
\( - a ( char
(str:[^,()]+) - Group 1: str: and one or more chars other than ,, ( and )
, - a comma
(\d+) - Group 2: one or more digits
\) - a ) char.
If you just want to get all matches without captures, you can use
val pattern = """\((str:[^,]+),(\d+)\)""".r
println((pattern findAllIn str).matchData.mkString(","))
Output:
(str:ann smith,14),(str:frank chase,15)
See the online demo.
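For comparison with the Scala behaviour described above, Python's standard `re` module shows the same split between captures and whole matches. This is a small sketch using only the sample string from the question.

```python
import re

pattern = re.compile(r"\((str:[^,()]+),(\d+)\)")
text = "(str:ann smith,14)(str:frank chase,15)"

# findall returns only the captures when the pattern has groups.
captures = pattern.findall(text)

# finditer yields match objects; group(0) is the whole match.
whole = [m.group(0) for m in pattern.finditer(text)]

print(captures)
print(",".join(whole))
```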

Trouble sorting a list after using regex

The code below is parsing data from this text sample:
rf-Parameters-v1020
supportedBandCombination-r10: 128 items
Item 0
BandCombinationParameters-r10: 1 item
Item 0
BandParameters-r10
bandEUTRA-r10: 2
bandParametersUL-r10: 1 item
Item 0
CA-MIMO-ParametersUL-r10
ca-BandwidthClassUL-r10: a (0)
bandParametersDL-r10: 1 item
Item 0
CA-MIMO-ParametersDL-r10
ca-BandwidthClassDL-r10: a (0)
supportedMIMO-CapabilityDL-r10: fourLayers (1)
I am having trouble replacing the first 'a' from the "ca-BandwidthClassUL-r10" line with 'u' and placing it before 'm' in the final output: [2 a(0) u m]
import re
regex = r"bandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassUL-r10:).*)*\r?\nca-BandwidthClassUL-r10*: *(\w.*)(" \
r"?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *(" \
r"\w.*)\nsupportedMIMO-CapabilityDL-r10: *(.*) "
regex2 = r"^.*bandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassUL-r10:).*)*\r?\nca-BandwidthClassUL-r10*: *(\w.*)(?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *(\w.*)\nsupportedMIMO-CapabilityDL-r10: *(.*)(?:\r?\n(?!bandEUTRA-r10:).*)*\r?\nbandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *(\w.*)\nsupportedMIMO-CapabilityDL-r10: *(.*)"
my_file = open("files.txt", "r")
content = my_file.read().replace("fourLayers", 'm').replace("twoLayers", " ")
#print(content)
#if 'BandCombinationParameters-r10: 1 item' in content:
result = ["".join(m) for m in re.findall(regex, content, re.MULTILINE)]
print(result)
You might use an optional part where you capture group 2.
Then you can print group 3 concatenated with u if there is group 2, else only print group 3.
As you are already matching the text in the regex, you don't have to do the separate replacement calls. You can use the text in the replacement itself.
bandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassUL-r10:).*)*(?:\r?\n(ca-BandwidthClassUL-r10)?: *(\w.*))(?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *\w.*\nsupportedMIMO-CapabilityDL-r10:
Regex demo | Python demo
For example
import re
regex = r"bandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassUL-r10:).*)*(?:\r?\n(ca-BandwidthClassUL-r10)?: *(\w.*))(?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *\w.*\nsupportedMIMO-CapabilityDL-r10:"
s = "here the example data with and without ca-BandwidthClassUL-r10"
matches = re.finditer(regex, s, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    result = "{0}{1} m".format(
        match.group(1),
        match.group(3) + " u" if match.group(2) else match.group(3)
    )
    print(result)
Output
2a (0) u m
2a (0) m
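The optional-group idea is easier to see on a cut-down input. The sketch below uses hypothetical, simplified records (`band:`, `ul:`, `dl:` are stand-ins, not the real field names) purely to show how an unmatched optional group comes back as `None` and drives the conditional output.

```python
import re

# Hypothetical, simplified records: the "ul:" line is optional.
sample = "band: 2\nul: a (0)\ndl: a (0)\n\nband: 4\ndl: a (0)\n"

pattern = re.compile(r"band: *(\d+)\n(?:ul: *(\w.*)\n)?dl: *(\w.*)")

results = []
for m in pattern.finditer(sample):
    # group(2) is None when the optional "ul:" line was absent.
    suffix = " u" if m.group(2) else ""
    results.append(f"{m.group(1)} {m.group(3)}{suffix} m")

print(results)
```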

Regex capture optional group in any order

I would like to capture groups based on a consecutive occurrence of matched groups in any order. And when one set type is repeated without the alternative set type, the alternative set is returned as nil.
So the following:
"123 dog cat cow 456 678 890 sheep"
Would return the following:
[["123", "dog"], [nil, "cat"], ["456", "cow"], ["678", nil], ["890", "sheep"]]
A regular expression can get us part of the way, but I do not believe all the way.
r = /
(?: # begin non-capture group
\d+ # match 1+ digits
[ ] # match 1 space
[^ \d]+ # match 1+ chars other than digits and spaces
| # or
[^ \d]+ # match 1+ chars other than digits and spaces
[ ] # match 1 space
\d+ # match 1+ digits
| # or
[^ ]+ # match 1+ chars other than spaces
) # end non-capture group
/x # free-spacing regex definition mode
str = "123 dog cat cow 456 678 890 sheep"
str.scan(r).map do |s|
case s
when /\d [^ \d]/
s.split(' ')
when /[^ \d] \d/
s.split(' ').reverse
when /\d/
[s,nil]
else
[nil,s]
end
end
#=> [["123", "dog"], [nil, "cat"], ["456", "cow"],
# ["678", nil], ["890", "sheep"]]
Note:
str.scan r
#=> ["123 dog", "cat", "cow 456", "678", "890 sheep"]
This regular expression is conventionally written
/(?:\d+ [^ \d]+|[^ \d]+ \d+|[^ ]+)/
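A Python rendering of the same scan-then-classify approach might look like the sketch below. `TOKEN` and `pair_up` are names chosen for the example, and `None` plays the role of Ruby's `nil`.

```python
import re

# Same three alternatives as the Ruby pattern; no capture groups,
# so findall returns the whole matched chunks.
TOKEN = re.compile(r"\d+ [^ \d]+|[^ \d]+ \d+|[^ ]+")

def pair_up(text):
    pairs = []
    for chunk in TOKEN.findall(text):
        if re.fullmatch(r"\d+ [^ \d]+", chunk):
            number, word = chunk.split(" ")      # "123 dog"
        elif re.fullmatch(r"[^ \d]+ \d+", chunk):
            word, number = chunk.split(" ")      # "cow 456"
        elif re.search(r"\d", chunk):
            number, word = chunk, None           # lone number
        else:
            number, word = None, chunk           # lone word
        pairs.append((number, word))
    return pairs

print(pair_up("123 dog cat cow 456 678 890 sheep"))
```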
Here is another solution that only uses regular expressions incidentally.
def doit(str)
str.gsub(/[^ ]+/).with_object([]) do |s,a|
prev = a.empty? ? [0,'a'] : a.last
case s
when /\A\d+\z/ # all digits
if prev[0].nil?
a[-1][0] = s
else
a << [s,nil]
end
when /\A\D+\z/ # all non-digits
if prev[1].nil?
a[-1][1] = s
else
a << [nil,s]
end
else
raise ArgumentError
end
end
end
doit str
#=> [["123", "dog"], [nil, "cat"], ["456", "cow"], ["678", nil],
# ["890", "sheep"]]
This uses the form of String#gsub that has no block and therefore returns an enumerator:
enum = str.gsub(/[^ ]+/)
#=> #<Enumerator: "123 dog cat cow 456 678 890 sheep":gsub(/[^ ]+/)>
enum.next
#=> "123"
enum.next
#=> "dog"
...
enum.next
#=> "sheep"
enum.next
#=> StopIteration (iteration reached an end)
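The token-by-token pairing done by doit can also be sketched in Python, where `str.split` takes the place of the block-less gsub enumerator (an assumption of this sketch: it splits on any whitespace, not only single spaces). The name `pair_tokens` is hypothetical.

```python
import re

def pair_tokens(text):
    pairs = []
    for token in text.split():
        if re.fullmatch(r"\d+", token):
            slot = 0          # numbers go in the first position
        elif re.fullmatch(r"\D+", token):
            slot = 1          # words go in the second position
        else:
            raise ValueError(f"mixed token: {token!r}")
        # Fill the open slot of the previous pair, else start a new pair.
        if pairs and pairs[-1][slot] is None:
            pairs[-1][slot] = token
        else:
            pair = [None, None]
            pair[slot] = token
            pairs.append(pair)
    return pairs

print(pair_tokens("123 dog cat cow 456 678 890 sheep"))
```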

Replace '-' with space if the next character is a letter not a digit and remove when it is at the start

I have a list of strings, i.e.
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
I want to remove the '-' when it is the first character and is followed by letters rather than digits. If the '-' has a digit or letter before it and letters after it, it should be replaced with a space instead.
So for the list slist I want the output as
["args", "-111111", "20 args", "20 - 20", "20-10", "args deep"]
I have tried
import re
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
nlist = list()
for estr in slist:
    nlist.append(re.sub("((^-[a-zA-Z])|([0-9]*-[a-zA-Z]))", "", estr))
print (nlist)
and I get the output
['rgs', '-111111', 'rgs', '20 - 20', '20-10', 'argseep']
You may use
nlist.append(re.sub(r"-(?=[a-zA-Z])", " ", estr).lstrip())
or
nlist.append(re.sub(r"-(?=[^\W\d_])", " ", estr).lstrip())
Result: ['args', '-111111', '20 args', '20 - 20', '20-10', 'args deep']
See the Python demo.
The -(?=[a-zA-Z]) pattern matches a hyphen before an ASCII letter (-(?=[^\W\d_]) matches a hyphen before any letter), and replaces the match with a space. Since - may be matched at the start of a string, the space may appear at that position, so .lstrip() is used to remove the space(s) there.
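Put together as a complete script, the lookahead approach could look like this. The wrapper function `clean` is only a convenience added for the example.

```python
import re

slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]

def clean(item):
    # Replace a hyphen that is directly followed by an ASCII letter with a
    # space, then drop the space left behind when that hyphen came first.
    return re.sub(r"-(?=[a-zA-Z])", " ", item).lstrip()

nlist = [clean(item) for item in slist]
print(nlist)
```

Hyphens followed by a digit or a space are never matched, which is why "-111111", "20 - 20" and "20-10" pass through unchanged.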
Here, we might just want to capture the first letter after a starting -, then replace it with that letter only, maybe with an i flag expression similar to:
^-([a-z])
DEMO
Test
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"^-([a-z])"
test_str = ("-args\n"
"-111111\n"
"20-args\n"
"20 - 20\n"
"20-10\n"
"args-deep")
subst = "\\1"
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE | re.IGNORECASE)
if result:
    print (result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Demo
const regex = /^-([a-z])/gmi;
const str = `-args
-111111
20-args
20 - 20
20-10
args-deep`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
RegEx
If this expression wasn't desired, it can be modified or changed in regex101.com.
One option is to do two replacements. First, match a hyphen at the start of the string when only letters follow it:
^-(?=[a-zA-Z]+$)
Regex demo
In the replacement use an empty string.
Then capture one or more letters or digits in group 1, match the -, and capture one or more letters in group 2.
^([a-zA-Z0-9]+)-([a-zA-Z]+)$
Regex demo
In the replacement use r"\1 \2"
For example
import re
regex1 = r"^-(?=[a-zA-Z]+$)"
regex2 = r"^([a-zA-Z0-9]+)-([a-zA-Z]+)$"
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
slist = list(map(lambda s: re.sub(regex2, r"\1 \2", re.sub(regex1, "", s)), slist))
print(slist)
Result
['args', '-111111', '20 args', '20 - 20', '20-10', 'args deep']
Python demo