Regex to match time

Regex to match time - regex

I want my users to be able to enter a time form.
If more info necessary, users use this to express how much time is needed to complete a task, and it will be saved in a database if filled.
here is what I have:
/^$|^([0-1]?[0-9]|2[0-4]):([0-5][0-9])(:[0-5][0-9])?$/
It matches an empty form or 01:30 and 01:30:00 formatted times. I really won't need the seconds as every task takes a minute at least, but I tried removing it and it just crashed my code and removed support for empty string.. I really don't understand regex at all.
What I'd like, is for it to also match simple minutes and simple hours, like for instance 3:30, 3:00, 5. Is this possible? It would greatly improve the user experience and limit waste typing. But I'd like to keep the zero optional in case some users find it natural to type it.

I think the following pattern does what you want:
p="((([01]?\d)|(2[0-4])):)?([0-5]\d)?(:[0-5]\d)?"
The first part:
(([01]?\d)|(2[0-3])):)?
is an optional group which deals with hours in format 00-24.
The second part:
([0-5]\d)?
is an optional group which deals with minutes if hours or seconds are present in your expression. The group also deals with expressions containing only minutes or only hours.
The third part:
(:[0-5]\d)?
is an optional group dealing with seconds.
The following samples show the pattern at work:
In [180]: re.match(p,'14:25:30').string
Out[180]: '14:25:30'
In [182]: re.match(p,'2:34:05').string
Out[182]: '2:34:05'
In [184]: re.match(p,'02:34').string
Out[184]: '02:34'
In [186]: re.match(p,'59:59').string
Out[186]: '59:59'
In [188]: re.match(p,'59').string
Out[188]: '59'
In [189]: re.match(p,'').string
Out[189]: ''
As every group is optional the pattern matches also the empty string. I've tested it with Python but I think it will work with other languages too with minimal changes.

Related

POSIX ERE Regex - Creating Efficient Regex

I'm working to create some regex entries that are well-formed, and efficient. I'll place an emphasis on efficient, as these regex entries can see thousands of logs per second. Inefficient regex entries can cause severe performance impacts.
Question: Does regex101 (through one flavor) support POSIX ERE Regex? Googling shows that PCRE2 should support BRE+ERE and more.
Regex Type: POSIX ERE
Syslog App: rsyslog (EL7)
Sample Payload (Well formed - Sensitive Information Stripped):
Jul 10 00:00:00 Firewall-Name-Removed CEF:0|Fortinet|FortiGate-removed|1.2.3,build1111 (GA)|0000000013|forward traffic accept|5|start=Jul 10 2022 00:00:00 logver=604091966 deviceExternalId=FG9A9A9A9999999 dvchost=Firewall-Name-Removed ad.vd=root ad.eventtime=1111111111111111111 ad.tz=-9999 ad.logid=0000000013 cat=traffic ad.subtype=forward deviceSeverity=notice src=1.1.1.1 shost=RandomHost1 spt=62119 deviceInboundInterface=DII-Out ad.srcintfrole=lan ad.srcssid=SSID Has Been Removed ad.apsn=ABC123D ad.ap=CHL-07 ad.channel=157 ad.radioband=802.11ac n-only ad.signal=-40 ad.snr=55 dst=2.2.2.2 dpt=53 deviceOutboundInterface=DOI-Out ad.dstintfrole=undefined ad.srccountry=Reserved ad.dstcountry=CountryRemoved externalID=123456789 proto=00 act=accept ad.policyid=000 ad.policytype=policy ad.poluuid=UUID-Removed ad.policyname=policy_name_removed app=DNS ad.trandisp=noop ad.appid=16195 ad.app=DNS ad.appcat=Network.Service ad.apprisk=elevated ad.applist=UTM Name - Removed ad.duration=180 out=0 in=205 ad.sentpkt=0 ad.rcvdpkt=1 ad.utmaction=allow ad.countdns=1 ad.osname=Windows ad.srcswversion=10 ad.mastersrcmac=MAC removed ad.srcmac=MAC removed ad.srcserver=0 tz="-9999"
What I'm attempting to do is remove specific logs that are not required. Normally I'd do this at a SIEM level through something like routing rules (where I can utilize fields), but this isn't possible for the foreseeable future. In this particular case: I'm trying to exclude on the following pieces of information.
Source IP: Is in a specific range
deviceOutboundInterface: is DOI-Out
Current Regex: "\bsrc=1.1.1[4-5]{0,1}.[0-9]{0,3}\b.*?\bdeviceOutboundInterface=DOI-Out\b" (Regex101 link in PCRE2). If that is matched, the log is rejected (through the stop call). Otherwise, it moves onto the other entries to check for unnecessary logs.
Most of my regex entries are in the low double-digits because they're a lot simpler. Is there a better way to make the more complex regex more efficient?
Thank you for any insight you can offer.

You might be able to cut some time with:
src=1\.1\.1[4-5]{0,1}\.[0-9]{0,3}.*?deviceOutboundInterface=DOI-Out
changes:
remove word boundaries
change the . to . in IP address
regex101 has the original efficiency at 383 steps, new is 301 so a potential savings of ~21%. Not terrible but you'll want to make sure any removals were OK.
to be honest, what you have looks pretty good to me.

This RE reduces the number of steps on Reg101 from 383 to 270 (~ -29.5%):
src=1\.1\.1[45]?\.\d{0,3}.*?O[boundIter]*?face=DOI-Out
The original RE already is quite simple, only matching one pattern and one literal string which makes it difficult to optimize. But we can do if we know (from the documentation of the text in question, here the Log Message manual) that an even simpler pattern will not lead to ambiguities.
Changes:
matching literal text whereever possible
replacing range '4-5' with simple elements
instead of matching the long 'deviceOutboundInterface=', use a pattern which will just barely match this string but would possibly match other words if they ever occurred in log messages - but we know they don't.

Complex Regex is not validating all repetitions of one specific rule

I need help to validate a field using regex. It will run in Postgres 9.5.
The rules are
The string must contain all seven services: Oil, Wiper blades, Air filter, Tires, Battery, Brake, Antifreeze
All services must have the operation hours, and the accepted values are HH[:MM]{am|pm}-HH[:MM]{am|pm}, or the literals ”working hours”, ”after hours”, ”not available” (this is the rule that I couldn't find the solution)
It is case insensitive, and the spaces should be irrelevant.
The services as separated by a pipe, and the service and working hours are separated by a colon
I did the regex:
^(?=.*(Oil))(?=.*(Wiper blades))(?=.*(Air filter))(?=.*(Tires))(?=.*(Battery))(?=.*(Brake))(?=.*(Antifreeze))(?=.*(\s{0,}(1{0,1}[0-2]|[1-9])(:[0-5][0-9]){0,1}\s{0,}([ap]m)\s{0,}-\s{0,}(1{0,1}[0-2]|[1-9])(:[0-5][0-9]){0,1}\s{0,}([ap]m)|working hours|after hours|not availabl)).+
This part of the regex is validating only one sequence, not all seven sequences.
(?=.*(\s{0,}(1{0,1}[0-2]|[1-9])(:[0-5][0-9]){0,1}\s{0,}([ap]m)\s{0,}-\s{0,}(1{0,1}[0-2]|[1-9])(:[0-5][0-9]){0,1}\s{0,}([ap]m)|working hours|after hours|not availabl))
Example of good string
Oil:8AM-10PM|Wiper blades:8 AM -10 PM|Air filter:8AM-10pm|Tires:8AM-10PM|Battery:8AM-10PM|Brake:8AM-9PM|Antifreeze:not available
Example of bad strings
Oil:8AM-10PM|Wiper blades:8AM-10PM|Air filter:8AM-10PM|Tires:8AM-10PM|Battery:8AM-10PM|Brake:8AM-9PM|Antifreeze:fsdfdsfs
Oil:8AM-10PM|Wiper blades:8AM-10PM|Air filter:8AM|Tires:8AM-10PM|Battery:8AM-10PM|Brake:8AM-9PM|Antifreeze:
Oil:8AM-10PM|Wiper blades:8AM-10PM|Air filter:8AM-10PM|Tires:8AM-10PM|Battery:|Brake:|Antifreeze:8AM-9PM
Oil:8AM-10PM|Wiper blades:8AM-10PM
Do someone have any idea what is missing to validate the seven occurrences?

I've made another regex that works :
^(((oil|Air\ filter|Wiper\ blades|Tires|Battery|Brake|Antifreeze):((((\d{1,2})((A|P)M)(-?)){2})|(not available))(\|?)){7})$
How ever, this regex does not take counts of repetition. Which mean, you could have Oil two time it will still works.
I've create a regex101 if you wish to tests more cases.

Excluding % from a Regex number search

I'm attempting to create a Regex that finds only 2-digit integers or numbers with a precision of 2 decimal points.
In the example string at the bottom, I want to find only the following:
21 and 10.50
Using this expression, 100% is getting captured, in addition to the strings I desire to capture:
(\d){1,2}(\.?)([0-9]?[0-9]?){1,2}
I know I need to use ^% somewhere, but I can't figure out where it goes. Any suggestions are greatly appreciated.
Here's my sample string:
Earn Up to $21 Per Hour - Deliver Food with !!
Delivery Drivers work when they want and make great money when they do.
All orders are prepaid, just pick them up and deliver them to hungry diners. No waiting in line or fumbling with receipts and prepaid cards.
It's fast and easy to start working. Get started today.
Apply Now
Why choose ?
More orders than any other takeout platform
100% of our restaurants are official partners
Competitive pay: Per order fee + mileage + tips
We guarantee an hourly minimum of $10.50/hour*
Create your own schedule & work the hours you want

Word boundaries in your regular expression will grant you a bit more control.
Since word boundaries are a bit strict, we need to introduce an OR condition to address both cases which will satisfy your regex.
(\b[\d]{2}\.[\d]{2}\b)|(\b[\d]{2}\b)
Edit: Try this one,
\b[\d]{2}\b(\.[\d]{2})?
The first example has a chance to fail as it is order dependent due to the way it short-circuits. This I believe should address multiple cases properly.

I think this should work:
(?<!\d)((\d+\.\d\d)|(\d\d))(?!%|\d)
Demo (and explanation)
EDIT:
Improved version:
(?<!\d)(\d{1,2}(?:\.\d{1,2})?)(?!%|\d)
Demo (and explanation)

You can try this variant: (\d{1,}|[\d.])\b(?!%)
It uses negative lookahead (?!%) to exclude digits following by % sign.
Details at regex101

Regular expression to expand to sentence

I'm trying to extract regions around keywords from longer passages of text. They should include complete sentences, based on the following conditions:
n=250 Charactars before / after keyword should be included if existing (the keyword can be closer then this to the start / end of the text)
from there it should expand further to include the complete sentence (let's assume here we can define sentence borders with ".?! or :" knowing it's not completely accurate)
I already achieved the expanding to the end of the last sentence, but not to start of the first in the following example, where vitamin is the keyword and the italic is captured by the regex. However, it should capture from "An extra 24 hours..."
Apparently I don't get the corresponding group up front, neither using lazy nor using lookbehind.
((.{0,250}(vitamin)\b.{0,250})(.+?(\.|\!|\?|\:))?)/ig
Well, this year you’re getting an extra day to get ahead on your taxes or (finally) clean out the garage. (Hey, we’re not trying to tell you what do but you might as well be productive.) February 29 is back on the calendar this year because it’s a leap year. Whether you love or loathe the extra winter day, you’re probably wondering why it happens in the first place. An extra 24 hours — or day — is built into the calen dar every four years to ensure it aligns with the Earth’s movement around the sun. There’s 365 days in a calendar year, but it actually takes longer for the Earth’s annual journey — about 365.2421 days — around the star that gives us light, life and vitamin D. The difference may seem like no big deal to us, but over time, it adds up. “To ensure consistency with the true astronomical year, it is necessary to periodically add in an extra day to make up the lost time and get the calendar back in sync with the heavens,” according the history. com.
Acknowledgement of the need for a leap year happened around the time of Julius Caesar. In 46 B.C., Caesar enlisted the help of astronomer Sosigenes to update the calendar so that it had 12 months and 365 days, including a leap year every four years.,

You can try something like this:
(([.?!:][^.?!:]*.{250}\bvitamin\b.{250})[^.?!:]*[.?!:])
It works by consuming 250 characters of text before and after the keyword "vitamin". From that point it finds the first punctuation point (.?!:) before/after the 250 characters of text.
Here's a sample of it in action.
You can you use extra parentheses () to strategically group what exact output you want. For example, the above answer includes the ending period from the preceding sentence in the output. So you could use
(([.?!:]([^.?!:]*.{250}\bvitamin\b.{250})[^.?!:]*[.?!:]))
and use group 3 from the result set which doesn't have this ending period.

I do not see how the specification in the question can be matched by a regex. It boils down to the following logic problem:
to match as many characters as possible but no more than 250 before/after the keyword, .{0,250} needs to be greedy and can neither be lazy .{0,250}? nor possessive .{0,250}+
if this part is greedy, you will miss the occurrences of the keyword that start before the .{0,250} part is matched.
The same logic applies to my understanding to the 'match back to the start of the sentnence as well.
I played around with the following more or less meaningful regex:
[.?!:]?([^.?!:]*?(.{0,250}\byear\b.{0,250})[^.?!:]*[.?!:]?) misses first 'year'
[.?!:]?([^.?!:]*?(.{0,250}?\byear\b.{0,250})[^.?!:]*[.?!:]?) gets the first 'year' but fails on others.
I suggest you write your on extraction logic in a function, eihter using regex or not, to achieve the extraction you want.
You could for example find the index of the start of the keyword \bkeyword\b and the full stops (\.[^\d]|[.?!:]$) and then with this information extract the part of the text you want.

Using RegEx to Find a Block of Text

I'm attempting to block a long string of unnecessary text that's on every page of a document.
Ex: "36075 This is another page and this is the date March 4 2013"
I know this must be very simple, but I'm hoping there is a way to block text verbatim. Is the only way to block this text by using a lot of /d/s/w+/+ etc or is there is a way to say, "match 36075 This is another page and this is the date March 4 2013".
This would be SO HELPFUL to know. Thank you for helping!

From what you wrote I assume you need to get leading numbers from string, to do it you just need to use this pattern: ^\d+ which from this input:
36075 This is another page and this is the date March 4 2013
will return this:
36075
For future, in case of such questions please provide example string and expected output. As well as what you have tried.

I realized the issue I was having. I didn't need to use RegEx. The program I was using has the functionality to match specific words or groups of words and pronounce them differently. What I discovered is that it will not match the words unless the word groups are input exactly the way the program typically reads them.
Ergo --> The channel saw
the end of the British hold over
Would have to be listed as one group for, "The channel saw" and a second group for "the end of the British hold over"
In addition, there were some numbers --> 11960_30_o_ho_
and if the program naturally read 119 and then 60_3 and then _o_ho_ then three strings would need to be input for each section.
A few frustrating hours later, problem solved :) Thank you for your assistance.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js