How to grep lines with date formats? - regex

I have a log file that is created from a bash script that uses $(date), so there are dates in such a format:
Fri Apr 24 22:10:39 CEST 2015
The log file looks like this:
Using SCRIPTS_ROOTDIR: /home/gillin/moses/scripts
Using multi-thread GIZA
using gzip
(1) preparing corpus # Fri Apr 24 22:10:39 CEST 2015
Executing: mkdir -p /media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus
(1.0) selecting factors # Fri Apr 24 22:10:39 CEST 2015
Forking...
(1.2) creating vcb file /media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/en.vcb # Fri Apr 24 22:10:39 CEST 2015
(1.1) running mkcls # Fri Apr 24 22:10:39 CEST 2015
/home/gillin/moses/training-tools/mkcls -c50 -n2 -p/media/2tb/ccexp/corpus.exp/train-clean.en -V/media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/en.vcb.classes opt
Executing: /home/gillin/moses/training-tools/mkcls -c50 -n2 -p/media/2tb/ccexp/corpus.exp/train-clean.en -V/media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/en.vcb.classes opt
(1.1) running mkcls # Fri Apr 24 22:10:39 CEST 2015
/home/gillin/moses/training-tools/mkcls -c50 -n2 -p/media/2tb/ccexp/corpus.exp/train-clean.ru -V/media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/ru.vcb.classes opt
Executing: /home/gillin/moses/training-tools/mkcls -c50 -n2 -p/media/2tb/ccexp/corpus.exp/train-clean.ru -V/media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/ru.vcb.classes opt
Is there a way to grep all the lines that contain the output of $(date)?
Currently I'm using this regex:
[a-z].*[1-9] [0-2][1-9]:[0-6][0-9]:[0-6][0-9] CEST 2015
And it catches line like
preparing corpus # Fri Apr 24 22:10:39 CEST 2015
But I need the full line:
(1) preparing corpus # Fri Apr 24 22:10:39 CEST 2015
Also, the year and time zone are hard-coded. Is there a better regex or unix tool that can extract lines with $(date) outputs?

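Try this (the day and hour classes are widened to [0-9] so that days ending in 0 and hours such as 10 or 20 also match):
unalias grep
grep --color=never '.*[a-z].*[0-9] [0-2][0-9]:[0-6][0-9]:[0-6][0-9] CEST 2015' file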
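To avoid hard-coding the year and time zone, a more general pattern could look like the sketch below. It assumes the default C/POSIX-locale date layout (weekday, month, space-padded day, time, zone, year); date_re and grep_date_lines are illustrative names, and other locales or a custom date format would need a different pattern.

```shell
# Extended regex approximating default C-locale `date` output,
# e.g. "Fri Apr 24 22:10:39 CEST 2015" or "Fri Apr  3 02:00:09 UTC 2016".
# Assumption: English day/month abbreviations and a single-token time zone.
date_re='(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ 0-9][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2} [A-Za-z0-9+-]+ [0-9]{4}'

# grep_date_lines [FILE...]: print every full line containing such a date.
# With no FILE argument it reads standard input.
grep_date_lines() {
  grep -E "$date_re" "$@"
}
```

Called as grep_date_lines logfile, this prints whole lines, since grep outputs the full matching line unless told otherwise (for example with -o).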

AWS cron expression to run every other Monday

I want to schedule a CloudWatch event to run every other Monday and have started with this expression:
0 14 ? * 2 *
With the above expression, I currently get a weekly schedule of Monday executions:
Mon, 27 Jul 2020 14:00:00 GMT
Mon, 03 Aug 2020 14:00:00 GMT
Mon, 10 Aug 2020 14:00:00 GMT
Mon, 17 Aug 2020 14:00:00 GMT
Mon, 24 Aug 2020 14:00:00 GMT
Mon, 31 Aug 2020 14:00:00 GMT
Mon, 07 Sep 2020 14:00:00 GMT
Mon, 14 Sep 2020 14:00:00 GMT
Mon, 21 Sep 2020 14:00:00 GMT
Mon, 28 Sep 2020 14:00:00 GMT
However, I would like the schedule to be set to every other Monday, e.g.
Mon, 27 Jul 2020 14:00:00 GMT
Mon, 10 Aug 2020 14:00:00 GMT
Mon, 24 Aug 2020 14:00:00 GMT
Mon, 07 Sep 2020 14:00:00 GMT
Mon, 21 Sep 2020 14:00:00 GMT
I have seen examples that use exp and #, but I don't think AWS CloudWatch Events accepts these sorts of parameters.
Chris' answer is correct. Currently, there is no way that I could think of to express this as part of CloudWatch Scheduled Events.
However, a workaround could be to set it to every Monday (0 14 ? * 2 *) and trigger a Lambda function that checks whether it's in the on-week or the off-week before triggering the actual target.
Even though this adds some complexity, it would be a viable solution.
You won't be able to do any of the fancier commands (especially those using variables from the command line).
You could do this very basically, but it would require 2 separate events:
0 14 ? * 2#1 * - Run on the first Monday of the month.
0 14 ? * 2#3 * - Run on the third Monday of the month.
Unfortunately, scheduled expressions have no syntax for "every other week", so the two expressions above will occasionally leave a 3-week gap (whenever a month has five Mondays).
If you don't care about the Monday, you could of course use 0 14 1,15 * ? * to run on the 1st and 15th of each month (roughly every 2 weeks).
The final option would be to run every Monday but have the script exit when it is an off week; the expression would then just be 0 14 ? * 2 *.
More information about the syntax is available on the Cron Expressions section of the Scheduled Events page.
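The "run every Monday, exit on off weeks" option could be sketched roughly as follows. This is only an illustration: it assumes GNU date (the -d option), and ANCHOR and is_on_week are made-up names. The anchor is the first Monday from the desired schedule in the question.

```shell
#!/bin/sh
# Assumption: GNU date (supports -d); BSD/macOS date needs different flags.
# ANCHOR is a Monday that counts as an "on" week (hypothetical choice,
# taken from the question's desired schedule).
ANCHOR=2020-07-27

# is_on_week [DATE]: succeed if DATE (default: now) falls in an "on" week,
# i.e. an even number of whole weeks after ANCHOR.
# Only meaningful for dates on or after ANCHOR.
is_on_week() {
  anchor_s=$(date -u -d "$ANCHOR" +%s) || return 2
  day_s=$(date -u -d "${1:-now}" +%s) || return 2
  weeks=$(( (day_s - anchor_s) / 604800 ))   # 604800 seconds per week
  [ $(( weeks % 2 )) -eq 0 ]
}

# Intended use at the top of the weekly job:
#   is_on_week || exit 0
```

The same parity check could live in a Lambda function in front of the real target; the shell version is just the shortest way to show the arithmetic.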

RegEx: Match Within Bounded Groups

I need to match carriage-returns in blocks of text between a pre-determined tag and an indeterminate tag.
In this case, the bounding tags are:
Pre-determined: X-Gmail-Labels:
Indeterminate: (?:^[\w\-]+:), e.g. Delivered-To: or ABC123:
Thanks to Wiktor Stribiżew for his answer to this thread, I have a rough idea of the solution I should pursue.
I am unsure of how to apply what I believe is needed: an uncaptured lookahead group for the bounding indeterminate tag.
Plainly stated, I'd like to delete all the carriage-returns in the text associated with the X-Gmail-Labels:. If I can match them, I can delete them!
Initial attempted regex:
(?:\bX-Gmail-Labels:|(?!^)\G)[^\r]*\K\r
Sample data:
From 1604610346950104244#xxx Fri Jun 29 12:34:35 +0000 2018
X-GM-THRID: 1604610346950104244
X-Gmail-Labels: Archived thing,Unread
Delivered-To: joe.schmoe#gmail.com
Received: by 2002:a9f:3005:0:0:0:0:0 with SMTP id h5-v6csp731836uab;
Fri, 29 Jun 2018 05:34:36 -0700 (PDT)
From 1604610346950104244#xxx Fri Jun 29 12:34:35 +0000 2018
X-GM-THRID: 1604610346950104244
X-Gmail-Labels: Also Archived
Day-of-week: Tuesday
Received: by 2002:a9f:3005:0:0:0:0:0 with SMTP id h5-v6csp731836uab;
Fri, 29 Jun 2018 05:34:36 -0700 (PDT)
From 1604610346950104244#xxx Fri Jun 29 12:34:35 +0000 2018
X-GM-THRID: 1604610346950104244
X-Gmail-Labels: Archived
thing,
Unread
Favorite-fruit: bananas
Received: by 2002:a9f:3005:0:0:0:0:0 with SMTP id h5-v6csp731836uab;
Fri, 29 Jun 2018 05:34:36 -0700 (PDT)
From 1604610346950104244#xxx Fri Jun 29 12:34:35 +0000 2018
X-GM-THRID: 1604610346950104244
X-Gmail-Labels: Archived
,Read
ABC123: DoReMe
Received: by 2002:a9f:3005:0:0:0:0:0 with SMTP id h5-v6csp731836uab;
Fri, 29 Jun 2018 05:34:36 -0700 (PDT)
From 1604610346950104244#xxx Fri Jun 29 12:34:35 +0000 2018
X-GM-THRID: 1604610346950104244
X-Gmail-Labels: Archived
thing,Unread
emais
Received: by 2002:a9f:3005:0:0:0:0:0 with SMTP id h5-v6csp731836uab;
Fri, 29 Jun 2018 05:34:36 -0700 (PDT)
(?:^[\w\-]+:)
Above regex applied to data showing indeterminate tag pattern.
(?:\bX-Gmail-Labels:\G)[^\r]*\K\r
Above regex applied to data showing un-end-bounded matches.
Thanks!
-Fitz
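One possible way to get the same effect outside a regex engine is a small awk pass instead of the \G-based pattern. This is a sketch, not the approach from the linked answer: join_label_lines is an illustrative name, the end tag is approximated as ^[A-Za-z0-9_-]+: and, as a side effect, every line is written back with LF-only endings.

```shell
# join_label_lines: read mail headers on stdin and join each
# "X-Gmail-Labels:" value that spans several lines into one line.
# Joining stops at the next line that looks like "Some-Tag:".
join_label_lines() {
  awk '
    { sub(/\r$/, "") }                      # drop a trailing carriage return
    /^X-Gmail-Labels:/ {                    # start collecting a label block
      if (joining) print buf
      buf = $0; joining = 1; next
    }
    joining && /^[A-Za-z0-9_-]+:/ {         # next tag ends the block
      print buf; joining = 0
    }
    joining { buf = buf $0; next }          # continuation line: append as-is
    { print }
    END { if (joining) print buf }
  '
}
```

Continuation lines are concatenated with nothing in between, which matches "delete the carriage-returns" literally; a separator could be added in the append step if the wrapped text needs one.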

Print the first line occurrence of each matching pattern with sed

I would like to filter the output of the utility last based on a variable set of usernames.
This is sample output from last unfiltered,
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
reboot system boot system Wed Apr 6 13:08 - 13:15 (00:06)
user1 pts/0 server Wed Apr 6 13:06 - down (00:01)
reboot system boot system Wed Apr 6 13:06 - 13:07 (00:01)
user1 pts/0 server Wed Apr 6 12:59 - down (00:06)
What I would like to do is pipe the output of last to sed. Then, using sed I would print the first occurrence of each specified user name i.e. their last log entry in wtmp. The output should appear as so,
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
The sed expression that I particularly like is,
last|sed '/user1/{p;q;}'
Unfortunately this only gives me the ability to match the first occurrence of one username. Using this syntax, is there a way I could specify multiple usernames? Thanks in advance!
awk is a better fit here than sed due to awk's ability to use associative arrays:
last | awk '!seen[$1]++'
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
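The one-liner above prints the first line for every value of the first field. To limit it to a variable set of usernames, as the question asks, something like the following sketch might work; first_login_per_user is an illustrative name and the usernames are passed as one space-separated string.

```shell
# first_login_per_user "USER1 USER2 ...": read `last`-style lines on stdin
# and print only the first line seen for each listed username.
first_login_per_user() {
  awk -v users="$1" '
    BEGIN {
      n = split(users, list, " ")
      for (i = 1; i <= n; i++) want[list[i]] = 1
    }
    # print when the name is wanted and has not been seen before
    ($1 in want) && !seen[$1]++
  '
}

# Example use:
#   last | first_login_per_user "reboot user1"
```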

How to remove time date stamp from string

Hello sed/awk/bash experts,
I have thousands of certs to report on and I want to remove the time:
www.bob.com | Jul 28 19:22:38 2015 | Jul 27 19:22:38 2017
How can I (easily) remove 19:22:38 & 19:22:38 so I just have:
www.bob.com | Jul 28 2015 | Jul 27 2017
If you want to edit the file in place rather than just outputting to the screen, use a modified version of anubhava's command:
sed -E 's/[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}[[:blank:]]+//g' file > file.tmp && mv file.tmp file
It also has the added benefit of not wiping your original file should sed fail.
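The temp-file pattern can be wrapped in a small function, sketched here under the assumption that mktemp and sed -E are available; strip_times_in_place is an illustrative name.

```shell
# strip_times_in_place FILE: remove HH:MM:SS plus the blanks that follow it
# from every line of FILE, replacing FILE only if sed succeeds.
strip_times_in_place() {
  file=$1
  # Create the temp file next to the original so mv is a same-directory rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if sed -E 's/[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}[[:blank:]]+//g' "$file" > "$tmp"; then
    # Note: the replaced file takes on the temp file's permissions.
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```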
If you're using an older version of sed you could try the following:
$ echo "www.bob.com | Jul 28 19:22:38 2015 | Jul 27 19:22:38 2017" | sed 's/\([a-zA-Z]*[ ]*[0-9]*[ ]*\)[0-9:]*\([ ]*[0-9]*\)/\1\2/g'
www.bob.com | Jul 28 2015 | Jul 27 2017
Or perhaps to just remove the time, you could instead use:
echo "www.bob.com | Jul 28 19:22:38 2015 | Jul 27 19:22:38 2017" | sed 's/[0-9][0-9]:[0-9][0-9]:[0-9][0-9] //g'
www.bob.com | Jul 28 2015 | Jul 27 2017
Using awk:
awk '{$5=""; $10=""; print}' file
www.bob.com | Jul 28 2015 | Jul 27 2017
Using sed:
sed -E 's/[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}[[:blank:]]+//g' file
www.bob.com | Jul 28 2015 | Jul 27 2017

Confusion on grep pattern search

Consider this log file
SN PID Date Status
1 P01 Fri Feb 14 19:32:36 IST 2014 Alive
2 P02 Fri Feb 14 19:32:36 IST 2014 Alive
3 P03 Fri Feb 14 19:32:36 IST 2014 Alive
4 P04 Fri Feb 14 19:32:36 IST 2014 Alive
5 P05 Fri Feb 14 19:32:36 IST 2014 Alive
6 P06 Fri Feb 14 19:32:36 IST 2014 Alive
7 P07 Fri Feb 14 19:32:36 IST 2014 Alive
8 P08 Fri Feb 14 19:32:36 IST 2014 Alive
9 P09 Fri Feb 14 19:32:36 IST 2014 Alive
10 P010 Fri Feb 14 19:32:36 IST 2014 Alive
When I do => grep "P01" File
the output is (as expected):
1 P01 Fri Feb 14 19:32:36 IST 2014 Alive
10 P010 Fri Feb 14 19:32:36 IST 2014 Alive
But when I do => grep " P01 " File (notice the space before and after P01)
I do not get any output!
Question: grep matches a pattern within a line, so " P01 " (with a space on each side) should match the first PID, P01, since it has spaces around it. But this logic seems to be wrong. What obvious thing am I missing here?
If the log uses tabs, not spaces, your grep pattern won't match. I would add word boundaries to the word you want to find:
grep '\<P01\>' file
If you really want to use whitespace in your pattern, use one of:
grep '[[:blank:]]P01[[:blank:]]' file # horizontal whitespace, tabs and spaces
grep -P '\sP01\s' file # using Perl regex
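The tab explanation can be checked with a tiny reproduction. The sketch below assumes the log is tab-separated (the question does not confirm this) and builds two such lines in a temporary file; the variable names are illustrative.

```shell
# Build a two-line, tab-separated sample resembling the log in the question.
log=$(mktemp)
printf '1\tP01\tFri Feb 14 19:32:36 IST 2014\tAlive\n'   > "$log"
printf '10\tP010\tFri Feb 14 19:32:36 IST 2014\tAlive\n' >> "$log"

# grep -c exits non-zero when the count is 0, hence the "|| true".
space_hits=$(grep -c ' P01 ' "$log" || true)                   # literal spaces
blank_hits=$(grep -c '[[:blank:]]P01[[:blank:]]' "$log" || true) # tab or space
word_hits=$(grep -cw 'P01' "$log" || true)                     # whole word

echo "space=$space_hits blank=$blank_hits word=$word_hits"
# prints: space=0 blank=1 word=1
rm -f "$log"
```

Running od -c on the real file would show \t between columns if tabs are indeed the cause.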