How to extract part of a string using regex

I am trying to extract part of a string as a date/timestamp.
Example string:
Upgrade starting on Mon Aug 9 06:46:00 UTC 2021 with ...
The extracted value should be:
Mon Aug 9 06:46:00 UTC 2021
I tried applying the following regex to extract the timestamp:
(\d{2}:\d{2}:\d{2})
How can I extract the day, month, and year as well?

Use a regex that matches the whole timestamp and extract it from the raw string. Here is the complete code:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The raw string that contains the timestamp.
	str := "Upgrade starting on Mon Aug 9 06:46:00 UTC 2021 with ..."

	// Match "<weekday> <month> <day> <hh:mm:ss> <zone> <year>", e.g. "Mon Aug 9 06:46:00 UTC 2021".
	re := regexp.MustCompile(`(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2} \d{2}:\d{2}:\d{2} (\S{3}) \d{4}`)
	t := re.FindString(str)
	fmt.Println(t) // Mon Aug 9 06:46:00 UTC 2021
}
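The Go answer above returns the whole timestamp as one string. To also get the day, month, and year separately, as the question asks, you can give each part its own capture group. The sketch below uses Python's re module with named groups; extract_timestamp and the group names are hypothetical, and " +" is used instead of a single space so that padded days such as "Aug  9" also match. Go's regexp package accepts the same (?P<name>...) syntax, and FindStringSubmatch returns the groups there.

```python
import re

# Hypothetical pattern: one named group per part of a `date`-style timestamp.
TIMESTAMP = re.compile(
    r"(?P<weekday>Mon|Tue|Wed|Thu|Fri|Sat|Sun) +"
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +"
    r"(?P<day>\d{1,2}) +"
    r"(?P<time>\d{2}:\d{2}:\d{2}) +"
    r"(?P<tz>[A-Z]{3,5}) +"
    r"(?P<year>\d{4})"
)


def extract_timestamp(line):
    """Return (full_match, parts) or (None, {}) if no timestamp is found."""
    m = TIMESTAMP.search(line)
    if m is None:
        return None, {}
    return m.group(0), m.groupdict()


full, parts = extract_timestamp("Upgrade starting on Mon Aug 9 06:46:00 UTC 2021 with ...")
print(full)                                         # Mon Aug 9 06:46:00 UTC 2021
print(parts["day"], parts["month"], parts["year"])  # 9 Aug 2021
```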

Related

Get datetime using qmake

We can get the date and time using qmake's _DATE_ variable, which outputs:
Sat Mar 12 17:29:00 2022
Can we format this output?
You can pass the value of _DATE_ to QDateTime::fromString() and then format the result the way you want with toString().
You can try something like this:
QDateTime dt = QDateTime::fromString("Sat Mar 12 17:29:00 2022", "ddd MMM d hh:mm:ss yyyy");
dt.toString("ddd MMMM d hh:mm:ss yy"); // e.g. "Sat March 12 17:29:00 22" (English locale)
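The same parse-then-reformat idea can be sketched outside Qt, for example in a script that post-processes the qmake output. This uses Python's datetime rather than QDateTime, so the formats use strftime directives (%a, %b, %B, %y) instead of Qt's ddd/MMM/MMMM/yy tokens; note that %d zero-pads the day whereas Qt's d does not. It assumes the default C locale, which gives English day and month names.

```python
from datetime import datetime

# Value in the format qmake's _DATE_ produces (taken from the question).
raw = "Sat Mar 12 17:29:00 2022"

# Parse: %a/%b are abbreviated day/month names, %Y is a four-digit year.
dt = datetime.strptime(raw, "%a %b %d %H:%M:%S %Y")

# Re-format: %B is the full month name, %y the two-digit year.
formatted = dt.strftime("%a %B %d %H:%M:%S %y")
print(formatted)  # Sat March 12 17:29:00 22
```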

Regex - Slice Date - Aug 22, 2017 02:00 PM EDT

I'm trying to take a date string, for example Aug 22, 2017 02:00 PM EDT, and get the month, day, and year from it.
month = re.findall(r'', date)[0]
day = re.findall(r'', date)[0]
year = re.findall(r'', date)[0]
I've started with something like this:
(.*)(?<=[a-zA-Z]{3}\s)
for the month. Is there a better way to do this?
A better approach is to convert the string to a datetime first and then read the values you need, like this (the example uses a different date string):
from datetime import datetime
datetime_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
print(datetime_object.year, datetime_object.month, datetime_object.day)
Note that datetime.strptime always needs a format string that matches the input; it cannot guess the format on its own.
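Applied to the string from the question, both routes might look like the sketch below. The regex uses named groups to answer the question as asked; the strptime route splits off the trailing zone abbreviation first, because strptime's %Z only accepts a few names (such as UTC, GMT and the local zone names), so "EDT" can fail depending on the machine. The zone is kept as plain text here; turning it into a real UTC offset is left out.

```python
import re
from datetime import datetime

date = "Aug 22, 2017 02:00 PM EDT"

# Option 1: regex with named groups.
m = re.match(r"(?P<month>[A-Za-z]{3}) (?P<day>\d{1,2}), (?P<year>\d{4})", date)
month, day, year = m.group("month"), m.group("day"), m.group("year")
print(month, day, year)  # Aug 22 2017

# Option 2: strptime, after splitting off the zone abbreviation ("EDT").
stamp, _, zone = date.rpartition(" ")
dt = datetime.strptime(stamp, "%b %d, %Y %I:%M %p")
print(dt.year, dt.month, dt.day, zone)  # 2017 8 22 EDT
```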

Python Aggregate column C based on A & B

I have some log files that I am trying to analyze. Using a little regex I have gotten the following structure:
Month/Year, URL, Count
Sep 2016,/,100513
Sep 2016,/,68221
Oct 2016,/,536365
Oct 2016,/,362350
Oct 2016,/,89203
Nov 2016,/,526455
Nov 2016,/,351360
Nov 2016,/,88279
Dec 2016,/,538702
Dec 2016,/,156063
Dec 2016,/,89094
Jan 2017,/,535684
Jan 2017,/,105867
Jan 2017,/,87492
Feb 2017,/,483897
Feb 2017,/,80502
Feb 2017,/,47554
Mar 2017,/,434830
Mar 2017,/,72355
Mar 2017,/,43036
The file is several hundred thousand lines long, so I can't use Excel or Google Sheets; instead I am trying to aggregate Count by both Month and URL in Python. What is a good way to do this?
You can do this with pandas. Your data is CSV, so the following should work:
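import pandas as pd
# skipinitialspace=True strips the blanks after the commas in the header ("Month/Year, URL, Count")
df = pd.read_csv('x.csv', skipinitialspace=True)
print(df.groupby(['Month/Year', 'URL'])['Count'].sum())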
If you need a solution without external dependencies (maybe a strict corporate environment):
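# Sum Count per (Month/Year, URL) pair using only the standard library.
totals = {}
with open('./parsed-data.txt', 'r') as f:
    for line in f:
        # [Month, URL, Count]
        fields = [field.strip() for field in line.split(',')]
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # skip the header row, blank lines and malformed lines
        month, url, count = fields
        key = (month, url)
        totals[key] = totals.get(key, 0) + int(count)
# Do whatever with totals here, e.g. totals[('Sep 2016', '/')]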
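A dependency-free variant could also use the standard csv module, which handles quoted fields that contain commas, together with collections.Counter for the per-(Month/Year, URL) sums. This is a sketch that assumes the file starts with the "Month/Year, URL, Count" header shown in the question; aggregate_counts is a hypothetical helper name, and io.StringIO stands in for the real file.

```python
import csv
import io
from collections import Counter


def aggregate_counts(fileobj):
    """Sum the Count column per (Month/Year, URL) pair.

    Assumes a header row like "Month/Year, URL, Count" (spaces after commas allowed).
    """
    reader = csv.reader(fileobj, skipinitialspace=True)
    next(reader, None)  # skip the header row
    totals = Counter()
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        month, url, count = row
        totals[(month, url)] += int(count)
    return totals


sample = io.StringIO(
    "Month/Year, URL, Count\n"
    "Sep 2016,/,100513\n"
    "Sep 2016,/,68221\n"
    "Oct 2016,/,536365\n"
)
totals = aggregate_counts(sample)
print(totals[("Sep 2016", "/")])  # 168734
```

With a real file, open it with newline="" (as the csv documentation recommends) and pass the file object instead of sample.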

All CSV values in column 0 are strings

For some reason, a CSV file I wrote with Python (on Windows 7) has all the values in column 0 as a single string, and I cannot perform any operation on them.
It has no header row.
The format is as follows (I would like to keep the last value, the date, in a date format):
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
EDIT: When I read it with the csv module, it prints:
['Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016\t\t\t"']
What is the best way to convert the string into comma-separated values like this?
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:, Feb 04, 2016
Thanks a lot.
s="Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
print(s)
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date: Feb 04, 2016
Python joins the adjacent string literals, so the doubled quotes disappear. To add a comma after "date:", you need some extra logic (for example, replace "date:" with "date:," or split after the first word).
First, your date field is quoted, which is OK (and needed) because it contains a comma:
" date: Feb 04, 2016 "
But then the whole line also gets quoted (and is therefore read as a single field). And because there are already quotes around the date field, those get escaped by doubling them:
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
So, if you remove that last quoting, everything should be fine (but you might want to trim the date field):
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016 "
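The double quoting explained above can be checked with the csv module: the first parse of the stored line yields one field, and parsing that field again recovers the real columns. This is a sketch for repairing files that were already written this way; the spaces around the date are an assumption (the question's csv output shows tabs there), and strip() removes either.

```python
import csv
import io

# The line as stored in the file: the whole record was quoted as one field.
bad_line = '"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """\n'

# First pass: csv sees a single quoted field (what the question reports).
outer = next(csv.reader(io.StringIO(bad_line)))
print(len(outer))  # 1

# Second pass: parse that single field as a CSV line of its own.
fields = next(csv.reader(io.StringIO(outer[0])))
print(len(fields), fields[-1].strip())  # 11 date: Feb 04, 2016
```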
If you want exactly the layout from the question, you need another comma after "date:":
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:,"Feb 04, 2016"
That said, it would be better to use a header row instead:
Name,Name2,Ave,Max,Min,analist disp,date
Rob,Avanti,12.83,4.0,-21.9,-1.0,"Feb 04, 2016"
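A sketch of writing and reading that header-based layout with the csv module, so the quoting is handled for you and the date comes back as a real date. It passes each value as its own list item instead of pre-joining them into one string; the column names come from the suggested header above, and io.StringIO stands in for the real file.

```python
import csv
import io
from datetime import datetime

header = ["Name", "Name2", "Ave", "Max", "Min", "analist disp", "date"]
row = ["Rob", "Avanti", 12.83, 4.0, -21.9, -1.0, "Feb 04, 2016"]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(header)
writer.writerow(row)  # only "Feb 04, 2016" is quoted, because it contains a comma
print(buf.getvalue())

# Read it back: numbers are converted explicitly, the date is parsed with strptime.
buf.seek(0)
for record in csv.DictReader(buf):
    ave = float(record["Ave"])
    parsed_date = datetime.strptime(record["date"], "%b %d, %Y").date()
    print(record["Name"], ave, parsed_date)  # Rob 12.83 2016-02-04
```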

Multiple line regex perl [duplicate]

This question already has answers here: Extracting specific lines with Perl
I'm trying to parse data that spans multiple lines out of a log file (shown below).
Archiver Started: Fri May 16 00:35:00 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
Archiver Completed: Fri May 16 00:42:37 2014
I want to capture everything between the "Archiver Started:" line and the "Archiver Completed:" line, so I would be left with the following:
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
Sometimes there can be a single entry or multiple entries for one day, week, or month.
Is this possible with a Regex?
Use the range operator (..):
In scalar context it acts as a flip-flop and returns a sequence number starting at 1, and the last number in the range has the string "E0" appended to it. So you simply need to filter out 1 (the start line) and the ending number (the end line).
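use strict;
use warnings;

while (<DATA>) {
    # The flip-flop is true from the "Started" line through the "Completed" line.
    if (my $range = /Archiver Started/ .. /Archiver Completed/) {
        # Skip the first line (1) and the last line (number ending in "E0").
        print if $range != 1 && $range !~ /E/;
    }
}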
__DATA__
stuff
more stuff
Archiver Started: Fri May 16 00:35:00 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
Archiver Completed: Fri May 16 00:42:37 2014
other stuff
ending stuff
Outputs:
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
Alternatively, you can use a flag variable:
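use strict;
use warnings;

my @lines  = <>;    # e.g. read the log from the files named on the command line
my @result = ();
my $catch  = 0;

LINE:
for my $line (@lines) {
    if ($line =~ m/^Archiver Started/i) {
        $catch = 1;    # start collecting after this line
        next LINE;
    }
    elsif ($line =~ m/^Archiver Completed/i) {
        $catch = 0;    # stop collecting at this line
        next LINE;
    }
    next LINE unless $catch;
    push @result, $line;
}

print @result;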
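To the direct question "Is this possible with a Regex?": yes, if the whole log is read into one string. The sketch below shows the idea with Python's re module rather than Perl (Perl's /m and /s modifiers play the same roles): re.M makes ^ match at every line start, re.S lets .*? cross newlines, and the non-greedy group stops at the first "Archiver Completed" after each start line, so several blocks in one log are handled by finditer. archived_lines is a hypothetical helper name.

```python
import re

log = """stuff
Archiver Started: Fri May 16 00:35:00 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
Archiver Completed: Fri May 16 00:42:37 2014
other stuff
"""

# Everything after a "Started" line, up to (not including) the next "Completed" line.
BLOCK = re.compile(r"^Archiver Started:[^\n]*\n(.*?)^Archiver Completed:", re.M | re.S)


def archived_lines(text):
    """Return the lines between each Started/Completed pair, in order."""
    lines = []
    for m in BLOCK.finditer(text):
        lines.extend(m.group(1).splitlines())
    return lines


for line in archived_lines(log):
    print(line)
```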