regex to find the data - regex

EDIT - I added the last 50 texts. I saw that they were sent by various people, so unfortunately it's not an automated email...
The list of all the texts is HERE
I'm struggling to find a matching pattern that will identify the needed items (date, start time, time zone) in this text:
1 April 20 16:00-16:30 Israel Time
Tomorrow, Wed Feb 12, 08:00-9:00 AM IST(IL)
Tomorrow, Wed Jan 22, 09:30-10:00 PM PST
11-May-20 19:00-20:30 Israel Time
The start time is an easy one: (\d+:\d+)- but I'm not sure what to do with the other words and digits.

Based on the data you provided, something like this would do it, with 3 captures as requested:
(\d+[-\s]\w+[-\s]\d+|\w+ \d+),?\s(\d+\:\d+)\-\d+\:\d+\s(?:AM\s|PM\s)?(.*)
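As a quick check, the pattern above can be run against the four sample lines from the question. This is only a minimal Python sketch; the names PATTERN, samples and extract are illustrative and not part of the original answer.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(
    r"(\d+[-\s]\w+[-\s]\d+|\w+ \d+),?\s(\d+\:\d+)\-\d+\:\d+\s(?:AM\s|PM\s)?(.*)"
)

# The four sample lines from the question.
samples = [
    "1 April 20 16:00-16:30 Israel Time",
    "Tomorrow, Wed Feb 12, 08:00-9:00 AM IST(IL)",
    "Tomorrow, Wed Jan 22, 09:30-10:00 PM PST",
    "11-May-20 19:00-20:30 Israel Time",
]


def extract(text):
    """Return (date, start_time, time_zone), or None if the line does not match."""
    match = PATTERN.search(text)
    return match.groups() if match else None


for line in samples:
    print(extract(line))
# ('1 April 20', '16:00', 'Israel Time')
# ('Feb 12', '08:00', 'IST(IL)')
# ('Jan 22', '09:30', 'PST')
# ('11-May-20', '19:00', 'Israel Time')
```

Note that the AM/PM marker is matched but not captured, so a start time such as 09:30 PM comes back as plain 09:30; if that matters, the non-capturing group would need to become a fourth capture.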
Online reference

Related

Regex extracting information

Hope you are fine. I was trying to build a regex for extracting some strings in some documents. This is a sample of the strings I have to extract:
5003t00001fJK8EAAWFri Jan 27 2023 11:37:45 GMT-0600
0033t00004A596CAARFri Jan 27 2023 11:44:04 GMT-0600
The regexes I have written so far are:
a) 500[\w._%+ :-]+(GMT)
b) (00).[0-9][\w._%+ :-]+(GMT)
However, I am getting some observations from 1) when using regex b) that I built.
Any suggestion?

How to isolate specific words and years in group 1 no matter the combination

I have some values coming in with various formats, such as
autumn ux-s 2021
2021 pes-3 autumn P-S
pes-3 autumn 2021 32
autumn usd- fosd 2021 2
I really want to isolate "autumn" and "2021", both in group 1.
Of course, "autumn" could also be "spring", "summer" or "winter", and the year should match whatever year is present.
It doesn't matter if I get "2021 autumn" or "autumn 2021", as long as I can isolate it within the same group 1.
How could I achieve this? I simply can't see how I can keep it within one single group.
I can isolate the location here, but of course it still matches the whole thing:
((?:(?:autumn|spring)(?:\s*[a-zA-Z]*\s*)\d{4})|(?:\d{4}(?:\s*[a-zA-Z]*\s*)(?:autumn|spring)))
Can I somehow extract only parts of this and combine them into a single group result?
I did not find a regex to capture what you want in 1 group but maybe this one-liner solution helps you?
import re
text = ["autumn ux-s 2021", "2021 pes-3 autumn P-S", "pes-3 autumn 2021 32", "autumn usd- fosd 2021 2"]
pattern = r"(autumn|summer|winter|spring).*(\d{4})|(\d{4}).*(autumn|summer|winter|spring)"
print([' '.join(filter(None, re.search(pattern, txt, re.IGNORECASE).groups())) for txt in text])
output: ['autumn 2021', '2021 autumn', 'autumn 2021', 'autumn 2021']
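Since a single capture group can only hold one contiguous piece of text, another option is to search for the season and the year independently and join the two results afterwards. The following is a hedged sketch of that alternative, not the approach from the answer above; SEASON, YEAR and season_and_year are hypothetical names, and it assumes the year is the first standalone four-digit number in the text.

```python
import re

# Two independent patterns instead of one combined alternation.
SEASON = re.compile(r"\b(autumn|spring|summer|winter)\b", re.IGNORECASE)
YEAR = re.compile(r"\b(\d{4})\b")  # assumption: a standalone 4-digit number is the year


def season_and_year(text):
    """Return 'season year' regardless of their order in text, or None."""
    season = SEASON.search(text)
    year = YEAR.search(text)
    if season is None or year is None:
        return None
    return f"{season.group(1)} {year.group(1)}"


for value in ["autumn ux-s 2021", "2021 pes-3 autumn P-S",
              "pes-3 autumn 2021 32", "autumn usd- fosd 2021 2"]:
    print(season_and_year(value))  # prints "autumn 2021" for each sample
```

Unlike the one-liner, this always yields the season first, which may or may not be desirable.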

Django ORM group by, and find latest item of each group (window functions)

Say we have a model as below
from django.db import models

class Cake(models.Model):
    baked_on = models.DateTimeField(auto_now_add=True)
    cake_name = models.CharField(max_length=20)
Now, there are multiple Cakes baked on the same day, and I need a query that will return me a monthly cake report which consists of each day of the month, and the names of the first and last cakes baked on that day.
For example, if the data is something like this:
baked_on        cake_name
11 Jan 12:30    Vanilla
11 Jan 14:30    Strawberry
11 Jan 20:45    Avocado
12 Jan 09:05    Raspberry
12 Jan 16:30    Sprinkles
12 Jan 20:11    Chocolate
My query's output should look like
date      first        last
11 Jan    Vanilla      Avocado
12 Jan    Raspberry    Chocolate
How should I go about doing this in a single ORM call?
Django 2.0 introduced window functions, which are made for this kind of query. A simple answer to your question would be:
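from django.db.models import F, Window
from django.db.models.functions import FirstValue, TruncDate

Cake.objects.annotate(
    # First cake of the day: earliest baked_on within each date partition.
    first_cake=Window(
        expression=FirstValue('cake_name'),
        partition_by=[TruncDate('baked_on')],
        order_by=F('baked_on').asc(),
    ),
    # Last cake of the day: first value when the partition is sorted descending.
    last_cake=Window(
        expression=FirstValue('cake_name'),
        partition_by=[TruncDate('baked_on')],
        order_by=F('baked_on').desc(),
    ),
    day=TruncDate('baked_on'),
).distinct().values_list('day', 'first_cake', 'last_cake')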
Why FirstValue in last_cake? Because when a window has an ordering, its default frame only extends from the start of the partition to the current row and does not look ahead, so for every row the last value is simply the current row. Using FirstValue together with descending ordering fixes that. Alternatively, you can define the frame over which the window function should operate:
from django.db.models import F, ValueRange, Window
from django.db.models.functions import FirstValue, LastValue, TruncDate

Cake.objects.annotate(
    first_cake=Window(
        expression=FirstValue('cake_name'),
        partition_by=[TruncDate('baked_on')],
        order_by=F('baked_on').asc(),
    ),
    last_cake=Window(
        expression=LastValue('cake_name'),
        partition_by=[TruncDate('baked_on')],
        order_by=F('baked_on').asc(),
        # ValueRange() with no bounds spans the whole partition.
        frame=ValueRange(),
    ),
    day=TruncDate('baked_on'),
).distinct().values_list('day', 'first_cake', 'last_cake')
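The effect of the default frame can be reproduced outside Django. The following is a minimal sketch that uses Python's built-in sqlite3 module and raw SQL instead of the ORM; it assumes an SQLite build with window-function support (3.25 or newer). The year 2020 and the helper name last_cake_per_day are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cake (baked_on TEXT, cake_name TEXT)")
conn.executemany(
    "INSERT INTO cake VALUES (?, ?)",
    [  # sample data from the question; the year is an assumption
        ("2020-01-11 12:30", "Vanilla"),
        ("2020-01-11 14:30", "Strawberry"),
        ("2020-01-11 20:45", "Avocado"),
        ("2020-01-12 09:05", "Raspberry"),
        ("2020-01-12 16:30", "Sprinkles"),
        ("2020-01-12 20:11", "Chocolate"),
    ],
)

FULL_PARTITION = "RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"


def last_cake_per_day(frame_clause=""):
    """Return distinct (day, last_cake) rows for the given window frame."""
    sql = f"""
        SELECT DISTINCT date(baked_on) AS day,
               LAST_VALUE(cake_name) OVER (
                   PARTITION BY date(baked_on)
                   ORDER BY baked_on {frame_clause}
               ) AS last_cake
        FROM cake
        ORDER BY day, last_cake
    """
    return conn.execute(sql).fetchall()


# Default frame: every row sees itself as the last value, so nothing collapses.
print(len(last_cake_per_day()))           # 6
# Explicit full-partition frame: one row per day with the true last cake.
print(last_cake_per_day(FULL_PARTITION))  # [('2020-01-11', 'Avocado'), ('2020-01-12', 'Chocolate')]
```

The explicit frame clause here plays the same role as frame=ValueRange() in the ORM query.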

All CSV values in column 0 are strings

For some reason, a CSV file I wrote with Python (on Win7) has all the values as a single string in column 0, and I cannot perform any operation on them.
It has no labels.
The format is (I would like to keep the last value - date - as a date format):
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
EDIT - When I read it with the csv module it prints it out like:
['Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016\t\t\t"']
What is the best way to convert the strings into comma separated values like this?
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:, Feb 04, 2016
Thanks a lot.
s="Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
print(s)
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date: Feb 04, 2016
To add a comma after "date:" you need to add some logic (for example, replacing "date:" with "date:,", or inserting a comma after the first word of that field).
First, your date field is quoted, which is ok (and needed) because there is a comma inside:
" date: Feb 04, 2016 "
But then the whole line also gets quoted (and thus seen as a single field). And because there are already quotes around the date field, those get escaped with another quote:
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
So, if you remove that last quoting, everything should be fine (but you might want to trim the date field):
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016 "
If you want it exactly like this, you need another comma after date: :
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:,"Feb 04, 2016"
On the other hand, it would be better to use a header instead:
Name,Name2,Ave,Max,Min,analist disp,date
Rob,Avanti,12.83,4.0,-21.9,-1.0,"Feb 04, 2016"
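If rewriting the file is not an option, the double quoting described above can also be undone while reading: parse each line once to get the single outer field, then parse that field again as CSV. This is a rough sketch using only the standard library; parse_double_quoted_row is a hypothetical helper name, and the date handling assumes the last field always looks like " date: Feb 04, 2016 ".

```python
import csv
import io
from datetime import datetime

# One line of the file as shown in the question: the whole row is wrapped
# in quotes, so the quotes around the date field are doubled.
raw_line = '"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """\n'


def parse_double_quoted_row(line):
    """Undo the outer quoting, then split the inner text into real fields."""
    outer = next(csv.reader(io.StringIO(line)))      # a single field
    return next(csv.reader(io.StringIO(outer[0])))   # the actual columns


fields = parse_double_quoted_row(raw_line)
print(len(fields))  # 11

# Drop the "date:" label and surrounding whitespace, then parse the date.
date_text = fields[-1].split(":", 1)[1].strip()
parsed_date = datetime.strptime(date_text, "%b %d, %Y").date()
print(parsed_date)  # 2016-02-04
```

The numeric columns are still strings at this point and would need float() before any arithmetic.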

Coldfusion crontime incorrectly running on weekend

I have a scheduled task that needs to run three times a day, on each weekday. The setup surrounding the task is Coldfusion, and it is in the Crontime format. It should run at 11:30, 15:45 and 18:30 server time.
For some reason the task is occasionally running on weekends, which it should not do.
Here are the three strings for each of the days:
0 30 11 ? * 1-5
0 45 15 ? * 1-5
0 30 18 ? * 1-5
Can anyone point out to me why the task is sometimes running on weekends? Is there a mistake in my string?
The Coldfusion crontime documentation can be found here:
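According to this, 1 = Sunday:
Days-of-Week can be specified as values between 1 and 7 (1 = Sunday) or by using the strings SUN, MON, TUE, WED, THU, FRI and SAT.
With that numbering, 1-5 means Sunday through Thursday, so the task runs on Sundays and never on Fridays. Try replacing 1-5 with MON-FRI.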
An example of a complete cron-expression is the string "0 0 12 ? * WED" - which means "every Wednesday at 12:00:00 pm".
Individual sub-expressions can contain ranges and/or lists. For example, the day of week field in the previous (which reads "WED") example could be replaced with "MON-FRI", "MON,WED,FRI", or even "MON-WED,SAT".
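To see which days a numeric range actually covers under the 1 = Sunday numbering quoted above, a small Python sketch can expand it into day names. QUARTZ_DAYS and expand_day_range are hypothetical helpers written for this illustration only; they are not part of ColdFusion or of any cron library, and they only handle simple "start-end" numeric ranges.

```python
# Day names in the order given by the quoted documentation (1 = Sunday).
QUARTZ_DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]


def expand_day_range(field):
    """Expand a numeric day-of-week range such as '1-5' into day names."""
    start, end = (int(part) for part in field.split("-"))
    return [QUARTZ_DAYS[day - 1] for day in range(start, end + 1)]


print(expand_day_range("1-5"))  # ['SUN', 'MON', 'TUE', 'WED', 'THU']
print(expand_day_range("2-6"))  # ['MON', 'TUE', 'WED', 'THU', 'FRI']
```

So the numeric equivalent of MON-FRI is 2-6, although the named form is harder to get wrong.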