Print the first matching line for each pattern with sed - regex

I would like to filter the output of the utility last based on a variable set of usernames.
This is sample unfiltered output from last:
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
reboot system boot system Wed Apr 6 13:08 - 13:15 (00:06)
user1 pts/0 server Wed Apr 6 13:06 - down (00:01)
reboot system boot system Wed Apr 6 13:06 - 13:07 (00:01)
user1 pts/0 server Wed Apr 6 12:59 - down (00:06)
What I would like to do is pipe the output of last to sed and then, using sed, print the first occurrence of each specified username, i.e. their most recent entry in wtmp. The output should look like this:
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
The sed expression I particularly like is:
last|sed '/user1/{p;q;}'
Unfortunately, this only lets me match the first occurrence of a single username. Using this syntax, is there a way to specify multiple usernames? Thanks in advance!

awk is a better fit here than sed because awk has associative arrays. This prints the first line for each distinct value of the first field (the username):
last | awk '!seen[$1]++'
reboot system boot server Wed Apr 6 13:15 - 14:24 (01:09)
user1 pts/0 server Wed Apr 6 13:08 - 13:15 (00:06)
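The one-liner above prints the first line for every distinct first field, not only for the names you care about. To restrict it to a chosen set of usernames, a sketch like the following could work. The function name first_login_per_user and passing the names as one space-separated awk variable are illustrative assumptions, not part of the original answer.

```shell
# Print the first (most recent) `last` line for each requested name.
# Hypothetical helper; usage: last | first_login_per_user "user1 reboot"
first_login_per_user() {
    awk -v users="$1" '
        BEGIN {
            # Turn the space-separated list into a lookup table
            n = split(users, list, " ")
            for (i = 1; i <= n; i++) wanted[list[i]] = 1
        }
        # Print a line only if its first field is wanted and not seen yet
        ($1 in wanted) && !seen[$1]++ {
            print
            # Stop reading once every requested name has been printed
            if (++found == n) exit
        }
    '
}
```

Exiting early means awk stops reading as soon as every listed name has been found, which can save time on a large wtmp.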

Related

How to grep over months with a defined start and end date

So here's my problem: I have big log files and want a script that greps certain periods of time and saves them to a file (sorted). Basically,
bash script.sh Jul 4 Sep 30
should return, for example:
Sep 30 user0 logged in
Sep 15 user1 logged in
Aug 6 user0 logged in
Aug 3 user1 logged in
Jul 28 user2 logged in
Jul 27 user2 logged in
Jul 4 user0 logged in
My first attempt was to give every month and day its own variable, like
bash script.sh Jul 4 Sep 3 0
so that I can use $1 for the start month (July), $2 for the start day (4), and so on, in grep like this:
for log in logs*
do
    # Lines from month $1, day $2 through 9
    grep -E "^$1 [$2-9]\s" "$log" >> result.txt
done
to get all logs from July 4 to 9. But I don't know how to get the logs for an entire period that spans more than one month, or that doesn't fit a digit range like 1-9 or 10-19, and so on.
Any help greatly appreciated!
EDIT:
As some people asked, here's what my log files look like (just much bigger and not sorted):
Sep 30 user0 logged in
Jul 27 user2 logged in
Aug 6 user0 logged in
Aug 31 user1 logged in
Jul 8 user2 logged in
Sep 5 user1 logged in
Jul 27 user2 logged in
Jul 14 user0 logged in
[...]
Here's my take:
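#!/bin/bash
# Usage: ./script.sh Jul 04 Sep 03  (needs GNU date for -d)
year="$(date +"%Y")"
# Start of the first day, and start of the day after the last day, as Unix timestamps
start="$(date -d"$1 $2, $year" +'%s')"
end="$(( $(date -d"$3 $4, $year" +'%s') + 86400 ))"
for log in logs*; do
    while IFS= read -r line; do
        # Parse the "Mon DD" prefix of each line into a timestamp
        d="$(date -d"$(cut -d' ' -f1,2 <<< "$line"), $year" +'%s')"
        if (( start <= d && d < end )); then
            echo "$line"
        fi
    done < "$log"
done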
Run it like this: ./script.sh Jul 04 Sep 03. Since the logs contain no year, it assumes that all dates (including the ones on the command line) are in the current year. It's probably not the most efficient solution, but it works. It relies on date, which it calls once per log line to parse each date into a Unix timestamp. Unix timestamps are convenient because they are plain numbers and can therefore be compared numerically.
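The bash script above prints matching lines in file order, while the question asks for sorted output. Assuming GNU sort (its -M option compares three-letter month abbreviations), the result could be sorted newest first as in this sketch; sort_by_date_desc is a hypothetical helper name.

```shell
# Sort "Mon DD ..." log lines newest first (assumes GNU sort).
# -k1,1Mr: month name, descending; -k2,2nr: day number, descending.
# LC_ALL=C makes -M use the English abbreviations Jan..Dec.
sort_by_date_desc() {
    LC_ALL=C sort -k1,1Mr -k2,2nr
}
```

For example: ./script.sh Jul 04 Sep 03 | sort_by_date_desc > result.txt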
$ range="Jul 4 Sep 30"
$ awk -v range="$range" '
BEGIN {
    numMths = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",m)
    for (i in m) {
        mths[m[i]] = i
    }
    split(range,r)
    beg = sprintf("%02d%02d", mths[r[1]], r[2])
    end = sprintf("%02d%02d", mths[r[3]], r[4])
}
{ cur = sprintf("%02d%02d", mths[$1], $2) }
(cur >= beg) && (cur <= end) { vals[$1,$2] = $0 }
END {
    for (mthNr=numMths; mthNr>0; mthNr--) {
        for (dayNr=31; dayNr>0; dayNr--) {
            date = m[mthNr] SUBSEP dayNr
            if (date in vals) {
                print vals[date]
            }
        }
    }
}
' file
Sep 30 user0 logged in
Sep 5 user1 logged in
Aug 31 user1 logged in
Aug 6 user0 logged in
Jul 27 user2 logged in
Jul 14 user0 logged in
Jul 8 user2 logged in

Select specific columns from a record using only 'sed' without using 'awk'

Here is some sample input I get from running ls -l:
-rwxr-xr-x 1 root root 1779 Jan 10 2014 zcmp
-rwxr-xr-x 1 root root 5766 Jan 10 2014 zdiff
-rwxr-xr-x 1 root root 142 Jan 10 2014 zegrep
-rwxr-xr-x 1 root root 142 Jan 10 2014 zfgrep
-rwxr-xr-x 1 root root 2133 Jan 10 2014 zforce
-rwxr-xr-x 1 root root 5940 Jan 10 2014 zgrep
lrwxrwxrwx 1 root root 8 Dec 5 2015 ypdomainname -> hostname
I would like to print the last column and the 5th column, using ONLY sed, like this:
zcmp 1779
zdiff 5766
zegrep 142
zfgrep 142
zforce 2133
zgrep 5940
ypdomainname -> hostname 8
I'm trying to find a regex that matches, but have not succeeded, and I'm not allowed to use awk or cut either.
Thank you in advance.
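Try this (GNU sed, for \s and \S):
ls -l | sed -nE 's/^(\S+\s+){4}(\S+)\s+(\S+\s+){3}(.*)$/\4 \2/p'
It skips the first four fields, captures the 5th (the size) as \2, skips the three date fields, and captures everything that remains as \4, so a symlink's "ypdomainname -> hostname" stays intact. With -n and the p flag, only lines that matched are printed, which drops the "total" line.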

How to grep lines with date formats?

I have a log file that is created by a bash script that uses $(date), so it contains dates in this format:
Fri Apr 24 22:10:39 CEST 2015
The log file looks like this:
Using SCRIPTS_ROOTDIR: /home/gillin/moses/scripts
Using multi-thread GIZA
using gzip
(1) preparing corpus # Fri Apr 24 22:10:39 CEST 2015
Executing: mkdir -p /media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus
(1.0) selecting factors # Fri Apr 24 22:10:39 CEST 2015
Forking...
(1.2) creating vcb file /media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/en.vcb # Fri Apr 24 22:10:39 CEST 2015
(1.1) running mkcls # Fri Apr 24 22:10:39 CEST 2015
/home/gillin/moses/training-tools/mkcls -c50 -n2 -p/media/2tb/ccexp/corpus.exp/train-clean.en -V/media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/en.vcb.classes opt
Executing: /home/gillin/moses/training-tools/mkcls -c50 -n2 -p/media/2tb/ccexp/corpus.exp/train-clean.en -V/media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/en.vcb.classes opt
(1.1) running mkcls # Fri Apr 24 22:10:39 CEST 2015
/home/gillin/moses/training-tools/mkcls -c50 -n2 -p/media/2tb/ccexp/corpus.exp/train-clean.ru -V/media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/ru.vcb.classes opt
Executing: /home/gillin/moses/training-tools/mkcls -c50 -n2 -p/media/2tb/ccexp/corpus.exp/train-clean.ru -V/media/2tb/ccexp/phrase-clustercat-mgiza/work.en-ru/training/corpus/ru.vcb.classes opt
Is there a way to grep all the lines that contain the output of $(date)?
Currently I'm using this regex:
[a-z].*[1-9] [0-2][1-9]:[0-6][0-9]:[0-6][0-9] CEST 2015
It matches only this part of the line:
preparing corpus # Fri Apr 24 22:10:39 CEST 2015
But I need the full line:
(1) preparing corpus # Fri Apr 24 22:10:39 CEST 2015
Also, the year and time are sort of hard-coded. Is there a better regex or Unix tool that can extract lines containing $(date) output?
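grep prints the whole matching line, but your regex only matches part of it, so if grep is aliased to use --color, only that part is highlighted. Try this: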
unalias grep
grep --color=never '.*[a-z].*[1-9] [0-2][1-9]:[0-6][0-9]:[0-6][0-9] CEST 2015' file
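As a sketch of a less hard-coded pattern: this assumes date's default output in an English/C locale (weekday, month, day padded with a space, HH:MM:SS, an alphabetic time zone, a four-digit year) and uses grep -E. The function name grep_date_lines is hypothetical.

```shell
# Print every line containing a timestamp shaped like `date` output,
# e.g. "Fri Apr 24 22:10:39 CEST 2015", for any time zone name and year.
# " +" before the day allows the extra space date uses for days 1-9.
grep_date_lines() {
    grep -E '[A-Z][a-z]{2} [A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2} [A-Z]+ [0-9]{4}' "$@"
}
```

Usage: grep_date_lines file (or pipe the log into it).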

Confusion on grep pattern search

Consider this log file:
SN PID Date Status
1 P01 Fri Feb 14 19:32:36 IST 2014 Alive
2 P02 Fri Feb 14 19:32:36 IST 2014 Alive
3 P03 Fri Feb 14 19:32:36 IST 2014 Alive
4 P04 Fri Feb 14 19:32:36 IST 2014 Alive
5 P05 Fri Feb 14 19:32:36 IST 2014 Alive
6 P06 Fri Feb 14 19:32:36 IST 2014 Alive
7 P07 Fri Feb 14 19:32:36 IST 2014 Alive
8 P08 Fri Feb 14 19:32:36 IST 2014 Alive
9 P09 Fri Feb 14 19:32:36 IST 2014 Alive
10 P010 Fri Feb 14 19:32:36 IST 2014 Alive
When I run grep "P01" File, the output is (as expected):
1 P01 Fri Feb 14 19:32:36 IST 2014 Alive
10 P010 Fri Feb 14 19:32:36 IST 2014 Alive
But when I run grep " P01 " File (note the space before and after P01), I get no output at all!
Question: grep matches a pattern anywhere in a line, so " P01 " (with spaces around it) should match the first PID, P01, since it has spaces around it. But this logic seems to be wrong. What obvious thing am I missing here?
If the log uses tabs rather than spaces, your grep pattern won't match. I would add word boundaries around the word you want to find instead:
grep '\<P01\>' file
If you really want to use whitespace in your pattern, use one of:
grep '[[:blank:]]P01[[:blank:]]' file # horizontal whitespace, tabs and spaces
grep -P '\sP01\s' file # using Perl regex
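To see why " P01 " finds nothing in a tab-separated file, a small reproduction can help. The sample lines are written with printf using \t, which is an assumption about how the real log is separated.

```shell
# Two tab-separated sample lines (assumed format of the log file)
sample() {
    printf '1\tP01\tFri Feb 14 19:32:36 IST 2014\tAlive\n'
    printf '10\tP010\tFri Feb 14 19:32:36 IST 2014\tAlive\n'
}
# Spaces around P01 never match when the separators are tabs
space_count=$(sample | grep -c ' P01 ' || true)
# [[:blank:]] matches a space or a tab, so only the P01 row matches
blank_count=$(sample | grep -c '[[:blank:]]P01[[:blank:]]' || true)
echo "space: $space_count, blank: $blank_count"
```

The `|| true` only keeps grep's non-zero exit status (no match) from aborting a script that runs with set -e; grep -c still prints the count.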

regexp to wrap a line with ${color} and $color

Is there a way to have this regex put ${color orange} at the beginning and $color at the end of the line where the date is found?
DJS=`date +%_d`;
cat thisweek.txt | sed s/"\(^\|[^0-9]\)$DJS"'\b'/'\1${color orange}'"$DJS"'$color'/
With this expression I get this:
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
Thursday Aug ${color orange}18$color Not Currently Scheduled for This Day
Friday Aug 19 7am - 3:30pm 10:30am - 11:30am
What I want is this:
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
${color orange}Thursday Aug 18 Not Currently Scheduled for This Day$color
Friday Aug 19 7am - 3:30pm 10:30am - 11:30am
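Actually, it works for me with GNU sed. Note that the expression uses basic-regex syntax (\( \) \|), so don't pass -r: with -r, \( matches a literal parenthesis and \1 becomes an invalid reference. Also, as tripleee says, don't use cat here; pass the file to sed directly:
DJS=`date +%_d`
sed s/"\(^\|[^0-9]\)$DJS"'\b'/'\1${color orange}'"$DJS"'$color'/ thisweek.txt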
EDIT: Ok, so with the new information I arrived at this:
sed -r "s/([^0-9]+19.+)/\${color orange}\1\$color/" thisweek.txt
This gives me the output
Saturday Aug 13 12pm - 9pm 4pm - 5pm
Sunday Aug 14 9:30am - 6pm 1pm - 2pm
Monday Aug 15 6:30pm - 11:30pm None
Tuesday Aug 16 6pm - 11pm None
Wednesday Aug 17 Not Currently Scheduled for This Day
Thursday Aug 18 Not Currently Scheduled for This Day
${color orange}Friday Aug 19 7am - 3:30pm 10:30am - 11:30am $color
(Note that it differs from yours because it's Friday, at least in my time zone.)
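Instead of hard-coding 19, the day could come from date. This sketch assumes GNU date (for %-d, which drops the padding) and that the day of the month is always the third whitespace-separated field, as in thisweek.txt above; wrap_day is a hypothetical helper name. It swaps in awk rather than sed because comparing a whole field avoids accidental matches such as the 5 in "5pm".

```shell
# Wrap the line whose 3rd field equals the given day in ${color orange} ... $color.
# Hypothetical helper; usage: wrap_day "$(date +%-d)" < thisweek.txt
wrap_day() {
    awk -v d="$1" '$3 == d { $0 = "${color orange}" $0 "$color" } 1'
}
```

The trailing 1 is an always-true pattern, so every line (wrapped or not) is printed.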