Regex extracking information - regex

Hope you are fine. I was trying to build a regex for extracting some strings in some documents. This is a sample of the strings I have to extract:
5003t00001fJK8EAAWFri Jan 27 2023 11:37:45 GMT-0600
0033t00004A596CAARFri Jan 27 2023 11:44:04 GMT-0600
The regex I wrote so far I have:
a) 500[\w._%+ :-]+(GMT)
b) (00).[0-9][\w._%+ :-]+(GMT)
However I am getting some observations with from 1) using the b) Regex that I build.
Any suggestion?

Related

How to isolate specific words and years in group 1 no matter the combination

I got some values coming in with various look such as
autumn ux-s 2021
2021 pes-3 autumn P-S
pes-3 autumn 2021 32
autumn usd- fosd 2021 2
I really want to isolate "autumn" and "2021" - both in group 1
Of course the "autumn" could also be "spring", "summer", "winter" while the year of course should match the year.
It doesnt matter if i get "2021 autumn" and "autumn 2021" as long as i can isolate it within the same group 1
How could i achieve this? I simply cant see how i can keep it within one single group ?
I can isolate the location here, but of course still match the whole thing
((?:(?:autumn|spring)(?:\s*[a-zA-Z]*\s*)\d{4})|(?:\d{4}(?:\s*[a-zA-Z]*\s*)(?:autumn|spring)))
Can i somehow substract only partials from here and combine them into a single group result?
I did not find a regex to capture what you want in 1 group but maybe this one-liner solution helps you?
import re
text = ["autumn ux-s 2021", "2021 pes-3 autumn P-S", "pes-3 autumn 2021 32" ,"autumn usd- fosd 2021 2"]
pattern = r"(autumn|summer|winter|spring).*(\d{4})|(\d{4}).*(autumn|summer|winter|spring)"
print([' '.join(filter(None, re.search(pattern, txt, re.IGNORECASE).groups())) for txt in text])
output: ['autumn 2021', '2021 autumn', 'autumn 2021', 'autumn 2021']

regex to find the data

EDIT - I added all the last 50 texts, I saw that were sent from various people, unfortunately, it's not an automatic email...
list of all the text is HERE
I'm struggling to find a matched pattern that will identify the needed items (date, start time, time zone) from this text:
1 April 20 16:00-16:30 Israel Time
Tomorrow, Wed Feb 12, 08:00-9:00 AM IST(IL)
Tomorrow, Wed Jan 22, 09:30-10:00 PM PST
11-May-20 19:00-20:30 Israel Time
The start time is an easy one: (\d+:\d+)- but I'm not sure what to be done with the other words and digits.
Based on the data you provided, something like this would do it, with 3 captures as requested:
(\d+[-\s]\w+[-\s]\d+|\w+ \d+),?\s(\d+\:\d+)\-\d+\:\d+\s(?:AM\s|PM\s)?(.*)
Online reference

Substituting with regex in python does not work

I have a long string containing different dummy dates and want to replace a part of a mail multiple times with different date values in my dummy_date string. The below code return 1 result substituted with the last date of dummy_date. How can I do that for all date values in the dummy_date? I cannot iterate over string. Also, when I write my string to .txt file, just last item is written.
dummy_dates= "14 B-Laycan Dec I-Laycan 2012 I-Laycan
15 B-Laycan Jan I-Laycan 2013 I-Laycan
16 B-Laycan Feb I-Laycan 2014 I-Laycan
17 B-Laycan Mar I-Laycan 2014 I-Laycan"
dummy1= re.sub(r"((\d{1,2})([\w\W]{1,10})( B-Laycan)+(([\w\W]{1,10}( I-Laycan)+){0,6}))", dummy_dates ,mail)

All CSV values in column 0 are strings

For some reason a csv file I wrote (win7) with Python has all the values as a string in column 0 and cannot perform any operation.
It has no labels.
The format is (I would like to keep the last value - date - as a date format):
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
EDIT - When I read it with the csv module it prints it out like:
['Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016\t\t\t"']
What is the best way to convert the strings into comma separated values like this?
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:, Feb 04, 2016
Thanks a lot.
s="Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
print(s)
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date: Feb 04, 2016
to add a comma after "date:" you need to add some logic (like replace ":" with ":,"; or after first word etc.
First, your date field is quoted, which is ok (and needed) because there is a comma inside:
" date: Feb 04, 2016 "
But then the whole line also gets quoted (and thus seen as a single field). And because there are already quotes around the date field, those get escaped with another quote:
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
So, if you remove that last quoting, everything should be fine (but you might want to trim the date field):
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016 "
If you want it exactly like this, you need another comma after date: :
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:,"Feb 04, 2016"
On the other hand, it would be better to use a header instead:
Name,Name2,Ave,Max,Min,analist disp,date
Rob,Avanti,12.83,4.0,-21.9,-1.0,"Feb 04, 2016"

javascript regular expression: how do I find date without year or date with year<2010

I need to find date without year, or date with year<2010.
basically,
Feb 15
Feb 20
Feb 20, 2009
Feb 20, 1995
should be accepted
Feb 20, 2010
Feb 20, 2011
should be rejected
How do I do it?
Thanks,
Cheng
Try this:
(Jan|Feb|Mar...Dec)\s\d{1,2},\s([1][0-9][0-9][0-9]|200[0-9])
Note: Expand the month list with proepr names. I was too lazy to spell it all out.