Grok pattern for [Mon Jan 04 08:36:12 2021] - regex

I am shipping some logs to Elasticsearch using Logstash, and I am unable to figure out the grok pattern for [Mon Jan 04 08:36:12 2021]. The format is Day Month Date Time Year. Help and suggestions are most welcome.
Log - [Mon Jan 04 08:36:12 2021]
Grok I tried - \[%{DAY:day} %{MONTH:month} %{TIME:time} %{YEAR:year}]
Result Expected - Day:Mon Month:Jan Date:04 Hour:08 Minute:36 Second:12 Year:2021

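You forgot to specify %{MONTHDAY} between the month and time patterns.
You can use
\[%{DAY:day} %{MONTH:month} %{MONTHDAY:date} %{TIME:time} %{YEAR:year}\]
If you want the hour, minute and second as separate fields, as in your expected result, replace %{TIME:time} with %{HOUR:hour}:%{MINUTE:minute}:%{SECOND:second}.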
Grok pattern list used:
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
YEAR (?>\d\d){1,2}
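If you want to check the field layout outside Logstash, here is a rough Python sketch of the same idea. The regex is a simplified stand-in for the grok definitions listed above (three-letter English day and month abbreviations only), and the group names are simply taken from the expected result in the question, so treat it as an illustration rather than what grok does internally.

```python
import re
from datetime import datetime

LINE = "[Mon Jan 04 08:36:12 2021]"

# Simplified approximation of DAY, MONTH, MONTHDAY, HOUR, MINUTE, SECOND, YEAR.
PATTERN = re.compile(
    r"\[(?P<day>[A-Z][a-z]{2}) (?P<month>[A-Z][a-z]{2}) (?P<date>\d{1,2}) "
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2}) (?P<year>\d{4})\]"
)


def parse_log_timestamp(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


fields = parse_log_timestamp(LINE)
print(fields)

# The same layout as a strptime format (assumes English day/month names).
parsed = datetime.strptime(LINE, "[%a %b %d %H:%M:%S %Y]")
print(parsed.isoformat())
```

Leaving out the day-of-month group makes the match fail on this line, which is the same problem the original grok attempt had.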

What DAX will calculate amount for same month in last months report?

I am struggling to work out the correct DAX measure to sum amount for the same period (e.g. month) but from a previous report.
Full Year forecast reports are provided monthly with a Report Date of EOMONTH for the month provided. These contain Amounts for each month in the year (called Period End Date, also dated EOMONTH). These are imported to a "Forecasts" table.
The following sample table replicates data from 3 separate monthly forecast reports, but rows are limited only to Jan-Mar months for simplicity (note - I have created Amounts based on concatenated month numbers for Report Date & Period End Date, for ease of identifying correct amounts in expected results):
Report Date   Period End Date   Amount
31 Jan 22     31 Jan 22         11
31 Jan 22     28 Feb 22         12
31 Jan 22     31 Mar 22         13
28 Feb 22     31 Jan 22         21
28 Feb 22     28 Feb 22         22
28 Feb 22     31 Mar 22         23
31 Mar 22     31 Jan 22         31
31 Mar 22     28 Feb 22         32
31 Mar 22     31 Mar 22         33
There is a "Dates" table, relationship = Dates[Date] to Forecasts[Period End Date], 1 to many, single direction. The user selects a single Report Date via a slicer creating filter context, visuals categories use Forecasts[Period End Date] for row context.
I've tried various methods without success so far. I'm having a mind melt and am probably overthinking this rather than looking for the best method. One DAX example is:
Prev Month Forecast =
VAR _SelRptDate = SELECTEDVALUE( 'Forecasts'[Report Date] )
VAR _PreRptDate = EOMONTH( _SelRptDate, -1 )
VAR _Result =
    CALCULATE(
        SUM( 'Forecasts'[Amount] ),
        FILTER( 'Forecasts', 'Forecasts'[Report Date] = _PreRptDate )
    )
RETURN _Result
Ideally I'd like to use an appropriate time intelligence function if one works for this. Happy to create inactive relationships if needed (e.g. Dates[Date] to Forecasts[Period End Date]). Preferably a Measure, not calculated column.
The following examples demonstrate expected results based on Report Date selected by the user:
Report Date = 28 Feb 22
Period End Date   Amount   Amount in Previous Report Date for Same Period End Date
31 Jan 22         21       11
28 Feb 22         22       12
31 Mar 22         23       13
Report Date = 31 Mar 22
Period End Date   Amount   Amount in Previous Report Date for Same Period End Date
31 Jan 22         31       21
28 Feb 22         32       22
31 Mar 22         33       23
Really appreciate any help / direction.
Thanks
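To pin down the lookup being asked for, here is a small Python sketch (not DAX) that reproduces the expected results from the sample table. The function names and the in-memory FORECASTS list are made up for illustration. The logic is: take the selected Report Date, step back to the previous month end, and fetch the Amount with the same Period End Date from that earlier report.

```python
from datetime import date, timedelta

# (Report Date, Period End Date, Amount) rows copied from the sample table.
FORECASTS = [
    (date(2022, 1, 31), date(2022, 1, 31), 11),
    (date(2022, 1, 31), date(2022, 2, 28), 12),
    (date(2022, 1, 31), date(2022, 3, 31), 13),
    (date(2022, 2, 28), date(2022, 1, 31), 21),
    (date(2022, 2, 28), date(2022, 2, 28), 22),
    (date(2022, 2, 28), date(2022, 3, 31), 23),
    (date(2022, 3, 31), date(2022, 1, 31), 31),
    (date(2022, 3, 31), date(2022, 2, 28), 32),
    (date(2022, 3, 31), date(2022, 3, 31), 33),
]


def previous_month_end(d):
    """Last day of the month before d (the EOMONTH(d, -1) step)."""
    return d.replace(day=1) - timedelta(days=1)


def amounts_with_previous(rows, selected_report_date):
    """Rows of (Period End Date, Amount, Amount in previous report)."""
    prev_report = previous_month_end(selected_report_date)
    lookup = {(report, period): amount for report, period, amount in rows}
    return [
        (period, amount, lookup.get((prev_report, period)))
        for report, period, amount in rows
        if report == selected_report_date
    ]


for row in amounts_with_previous(FORECASTS, date(2022, 2, 28)):
    print(row)
```

For the earliest report (31 Jan 22) there is no previous report in the sample, so the third value comes back as None, which corresponds to a blank in a visual.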

Python Aggregate column C based on A & B

I have some log files that I am trying to analyze. Using a little regex I have gotten the following structure:
Month/Year, URL, Count
Sep 2016,/,100513
Sep 2016,/,68221
Oct 2016,/,536365
Oct 2016,/,362350
Oct 2016,/,89203
Nov 2016,/,526455
Nov 2016,/,351360
Nov 2016,/,88279
Dec 2016,/,538702
Dec 2016,/,156063
Dec 2016,/,89094
Jan 2017,/,535684
Jan 2017,/,105867
Jan 2017,/,87492
Feb 2017,/,483897
Feb 2017,/,80502
Feb 2017,/,47554
Mar 2017,/,434830
Mar 2017,/,72355
Mar 2017,/,43036
The file is several hundred thousand lines long, so I can't use Excel or Google Sheets. Instead I am trying to aggregate the Count by both Month and URL in Python. What is a good method to do this?
You can do this using pandas. Your example is a CSV file, so the following would work.
import pandas as pd
# skipinitialspace drops the blank after each comma in the header row
df = pd.read_csv('x.csv', skipinitialspace=True)
print(df.groupby(['Month/Year', 'URL'])['Count'].sum())
If you need a solution without external dependencies (maybe a strict corporate environment):
months = {}
urls = {}
totals = {}  # keyed by (month, url), which is what the question asks for
with open('./parsed-data.txt', 'r') as f:
    next(f)  # skip the header line; drop this if the file has no header row
    for line in f:
        # [Month, URL, Count]
        month, url, count = (field.strip() for field in line.split(','))
        months[month] = months.get(month, 0) + int(count)
        urls[url] = urls.get(url, 0) + int(count)
        totals[(month, url)] = totals.get((month, url), 0) + int(count)
# Do whatever with months, urls and totals here

All CSV values in column 0 are strings

For some reason, a CSV file I wrote with Python (on Windows 7) has all the values as a single string in column 0, so I cannot perform any operations on them.
It has no labels.
The format is (I would like to keep the last value - date - as a date format):
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
EDIT - When I read it with the csv module it prints it out like:
['Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016\t\t\t"']
What is the best way to convert the strings into comma separated values like this?
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:, Feb 04, 2016
Thanks a lot.
s="Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
print(s)
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date: Feb 04, 2016
To add a comma after "date:" you need some extra logic, such as replacing "date:" with "date:," or inserting a comma after the first word of that field.
First, your date field is quoted, which is ok (and needed) because there is a comma inside:
" date: Feb 04, 2016 "
But then the whole line also gets quoted (and thus seen as a single field). And because there are already quotes around the date field, those get escaped with another quote:
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
So, if you remove that last quoting, everything should be fine (but you might want to trim the date field):
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016 "
If you want it exactly like this, you need another comma after date: :
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:,"Feb 04, 2016"
On the other hand, it would be better to use a header instead:
Name,Name2,Ave,Max,Min,analist disp,date
Rob,Avanti,12.83,4.0,-21.9,-1.0,"Feb 04, 2016"
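If you cannot regenerate the file, one way to read the doubly quoted line as it stands is to run it through the csv module twice: once to peel off the outer quoting, and once more to split the real fields. This is only a sketch. It assumes every line has the shape shown in the question, and unwrap_row is a hypothetical helper name.

```python
import csv
from datetime import datetime

raw_line = ('"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,'
            '"" date: Feb 04, 2016 """')


def unwrap_row(line):
    """Parse a line whose whole content was quoted as one CSV field."""
    outer = next(csv.reader([line]))
    if len(outer) != 1:
        return outer  # already a normal multi-field row
    return next(csv.reader([outer[0]]))


fields = unwrap_row(raw_line)
print(fields)

# The last field looks like " date: Feb 04, 2016 "; drop the label and
# parse the rest so it can be handled as a real date.
label, _, date_text = fields[-1].strip().partition(":")
when = datetime.strptime(date_text.strip(), "%b %d, %Y").date()
print(label, when)
```

The numeric fields are still strings after this, so convert them with float() where you need to calculate with them.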

regexp to wrap a line with ${color} and $color

Is there a way to have this regex put ${color orange} at the beginning, and $color at the end of the line where the date is found?
DJS=`date +%_d`;
cat thisweek.txt | sed s/"\(^\|[^0-9]\)$DJS"'\b'/'\1${color orange}'"$DJS"'$color'/
With this expression I get this:
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
Thursday Aug ${color orange}18$color Not Currently Scheduled for This Day
Friday Aug 19 7am - 3:30pm 10:30am - 11:30am
What I want to have is this:
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
${color orange}Thursday Aug 18 Not Currently Scheduled for This Day$color
Friday Aug 19 7am - 3:30pm 10:30am - 11:30am
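Actually, it works for me. Depending on your version of sed, you might need to pass -r; in that case the group and the alternation are written without backslashes, as below. Also, as tripleee says, don't use cat here, since sed can read the file directly.
DJS=`date +%_d`
sed -r "s/(^|[^0-9])$DJS\b/\1\${color orange}$DJS\$color/" thisweek.txt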
EDIT: Ok, so with the new information I arrived at this:
sed -r "s/([^0-9]+19.+)/\${color orange}\1\$color/" thisweek.txt
This gives me the output
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
Thursday Aug 18 Not Currently Scheduled for This Day
${color orange}Friday Aug 19 7am - 3:30pm 10:30am - 11:30am $color
(Note that it differs from yours since it's Friday, at least in my time zone.)
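For comparison, the same whole-line wrapping can be sketched in Python. This version assumes each line is laid out as "Weekday Mon DD ..." and compares only the third field, so a time such as 9:30am is never mistaken for the day of the month. highlight_day is a made-up name, not part of any tool mentioned above.

```python
def highlight_day(lines, day, color="orange"):
    """Wrap the line whose day-of-month field equals `day` in ${color ...}/$color."""
    wrapped = []
    for line in lines:
        fields = line.split()
        # fields[2] is the day of month in "Weekday Mon DD ..." lines
        if len(fields) >= 3 and fields[2] == str(day):
            line = "${color " + color + "}" + line + "$color"
        wrapped.append(line)
    return wrapped


schedule = [
    "Sunday Aug 14 9:30am - 6pm 1pm - 2pm",
    "Wednesday Aug 17 Not Currently Scheduled for This Day",
    "Thursday Aug 18 Not Currently Scheduled for This Day",
    "Friday Aug 19 7am - 3:30pm 10:30am - 11:30am",
]

for line in highlight_day(schedule, 18):
    print(line)
```

In a real script the day would come from the current date instead of the hard-coded 18.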

javascript regular expression: how do I find date without year or date with year<2010

I need to match a date without a year, or a date with a year before 2010.
basically,
Feb 15
Feb 20
Feb 20, 2009
Feb 20, 1995
should be accepted
Feb 20, 2010
Feb 20, 2011
should be rejected
How do I do it?
Thanks,
Cheng
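Try this:
^(Jan|Feb|Mar...Dec)\s\d{1,2}(,\s(1[0-9]{3}|200[0-9]))?$
Note: Expand the month list with the proper names; I was too lazy to spell it all out. The year part is optional so that dates without a year are accepted, and the ^ and $ anchors stop "Feb 20, 2010" from matching on its "Feb 20" prefix.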
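Here is a quick way to check the idea against the examples from the question, written in Python. The pattern itself uses nothing Python-specific, so it should carry over to a JavaScript RegExp once ^ and $ are added around it (fullmatch plays that role here). The full month list and the 1000 to 2009 year range are my assumptions about what "year<2010" should cover.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Month, day, then an optional ", YYYY" where YYYY is 1000-1999 or 2000-2009.
DATE_BEFORE_2010 = re.compile(
    r"(?:" + MONTHS + r")\s\d{1,2}(?:,\s(?:1\d{3}|200\d))?"
)


def is_accepted(text):
    """True for a date with no year, or with a year before 2010."""
    return DATE_BEFORE_2010.fullmatch(text) is not None


for sample in ["Feb 15", "Feb 20", "Feb 20, 2009", "Feb 20, 1995",
               "Feb 20, 2010", "Feb 20, 2011"]:
    print(sample, is_accepted(sample))
```

Matching the whole string matters here: without it, "Feb 20, 2010" would still be accepted on the strength of its "Feb 20" prefix.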