Grep logs between two timestamps in Shell - regex

I am writing a script where I need to grep the logs exactly between two given timestamps . I don't want to use regex as it not full proof. Is there any other way through which I can achieve this ?
e.g: between time range 04:15:00 to 05:15:00
Log Format:
170.37.144.10 - - [17/Dec/2015:04:00:00 -0500] "GET /abc/def/ghi/xyz.jsp HTTP/1.1" 200 337 3440 0000FqZTmTG2yuMTJeny7hPDOvG
170.37.144.10 - - [17/Dec/2015:05:10:09 -0500] "POST /abc/def/ghi/xyz.jsp HTTP/1.1" 200 27 21124 0000FqZTmTG2yuMTJ

This might be what you want to do, using GNU awk for time functions:
$ cat tst.awk
BEGIN { FS="[][ ]+"; beg=t2s(beg); end=t2s(end) }
{ cur = t2s($4) }
(cur >= beg) && (cur <= end)
function t2s(time, t) {
split(time,t,/[\/:]/)
t[2]=(match("JanFebMarAprMayJunJulAugSepOctNovDec",t[2])+2)/3
return mktime(t[3]" "t[2]" "t[1]" "t[4]+0" "t[5]+0" "t[6]+0)
}
$ awk -v beg="17/Dec/2015:04:15" -v end="17/Dec/2015:05:15" -f tst.awk file
access_log.aging.20151217040207:170.37.144.10 - - [17/Dec/2015:05:10:09 -0500] "POST /abc/def/ghi/xyz.jsp HTTP/1.1" 200 27 21124 0000FqZTmTG2yuMTJ
but it's hard to guess without more sample input and expected output.

If you don't want to use regular expressions nor patterns for matching lines, then grep alone is not enough.
Here's a Bash+date solution:
# start and stop may be parameters of your script ("$1" and "$2"),
# here they are hardcoded for convenience.
start="17/Dec/2015 04:15:00 -0500"
stop="17/Dec/2015 05:15:00 -0500"
get_tstamp() {
# '17/Dec/2015:05:10:09 -0500' -> '17/Dec/2015 05:10:09 -0500'
datetime="${1/:/ }"
# '17/Dec/2015 05:10:09 -0500' -> '17 Dec 2015 05:10:09 -0500'
datetime="${datetime//// }"
# datetime to unix timestamp
date -d "$datetime" '+%s'
}
start=$(get_tstamp "$start")
stop=$(get_tstamp "$stop")
while read -r line
do
datetime="${line%%:*}" # remove ':170.37.144.10 ...'
tstamp="$(get_tstamp "$datetime")"
# $tstamp now contains a number like 1450347009;
# check if it is in range $start..$stop
[[ "$tstamp" -ge "$start" && "$tstamp" -le "$stop" ]] && echo "$line"
done

Related

Retrieve log pattern via awk

I would like to retrieve from the following logs the date, the 5 URI length, the ab and cde:
10.10.10.10 - - [26/Oct/2020:19:50:13 +0000] "GET /five/six/seven/eight/nine/en?from=1603738800&to=1603785600ncludedInRange=false HTTP/1.1" 200 255441 "-" "Opera com.test.super/1.10.4;11072 (Linux;Neon KNWWWfj;0,02.2)" "10.10.10.10""f799b6b9-747f-4f22-a1bf-4b7de885fc60""-" "-" "-" "-"ab=0.110 cde=0.102
11.1.1.1 - - [26/Oct/2020:19:50:14 +0000] "GET /one/two/three/four/five/en HTTP/1.1" 200 2832 "-" "Opera com.test.super/1.10.4;11072 (Linux;Neon KNWWWfj;0,02.2)" "11.1.1.1""19a8ee3c-9cb3-4ba6-9732-eb4923601e92""-" "-" "-" "-"ab=0.111 cde=0.112
e.g.
26/Oct/2020:19:50:13 /five/six/seven/eight/nine ab=0.110 cde=0.102
I have tried the following command, but I get only a part of it. Can you please help?
awk '{print $4 "\t" $7 "\t" $(NF-1),"\t",$NF}' |sed 's/"-"//g'
$ awk -F'[][[:space:]"]+' -v OFS='\t' '{match($7,"(/[^/]*){5}"); print $4, substr($7,1,RLENGTH), $(NF-1), $NF}' file
26/Oct/2020:19:50:13 /five/six/seven/eight/nine ab=0.110 cde=0.102
26/Oct/2020:19:50:14 /one/two/three/four/five ab=0.111 cde=0.112
Based on #Ed Morton, but setting FS in 5 parts:
$ awk -v FS='[[]|\\+[[:digit:]]+[]]|GET |/en|"+-"' '{print $2,$4,$NF}' file
26/Oct/2020:19:50:13 /five/six/seven/eight/nine ab=0.110 cde=0.102
26/Oct/2020:19:50:14 /one/two/three/four/five ab=0.111 cde=0.112
Updated.
Thanks to #Ed Morton.

AWK catching a regular expression

I have been using this little script for months now with success. Today I realize there is one output it cant seem to catch, screen comes up blank with a new prompt:
user#computer ~]$ myscan ipsFile 23
user#computer ~]$
Here is the code
#!/bin/bash
sudo nmap -v -Pn -p T:$2 -reason -i $1 | awk ' {
if (/syn-ack/) {
print "Yes"
c++
}
else if (/no-response|reset|host-unreach/) {
print "No"
c++
}
}
END { print c} '
If I run the nmap against one of the IPs then it returns
Starting Nmap 5.51 ( http://nmap.org ) at 2017-09-26 11:44 CDT
Initiating Parallel DNS resolution of 1 host. at 11:44
Completed Parallel DNS resolution of 1 host. at 11:44, 0.00s elapsed
Initiating Connect Scan at 11:44
Scanning 1.1.1.1 [1 port]
Completed Connect Scan at 11:44, 0.20s elapsed (1 total ports)
Nmap scan report for 1.1.1.1
Host is up, received user-set (0.20s latency).
PORT STATE SERVICE REASON
23/tcp filtered telnet host-unreach
Read data files from: /usr/share/nmap
Nmap done: 1 IP address (1 host up) scanned in 0.26 seconds
How can I catch the 'host-unreach' portion?
Let's try and debug this. Execute this:
nmap -v -Pn -p T:23 -reason -i ipsFile | awk '{print $0}/syn-ack/{print "Yes";c++}/no-response|reset|host-unreach/{print "No";c++}END {print c}' > out.txt
The only difference here is that the awk script prints $0 (i.e. the output of your nmap calls) to file out.txt. Try to grep your unreach value.
I tried this myself and found that instead of a host-unreach I got a net-unreach. Might be the same thing in your case.
Have you tried piping stderr to stdout like
#!/bin/bash
sudo nmap -v -Pn -p T:$2 -reason -i $1 2>&1 | awk ' {
if (/syn-ack/) {
print "Yes"
c++
}
else if (/no-response|reset|host-unreach/) {
print "No"
c++
}
}
END { print c} '

awk regex magic (match first occurrence of character in each line)

Have been scratching my head over this one, hoping there's a simple solution that I've missed.
Summary
Simplified the following code can't cope with IPv6 addresses in the (here abbreviated) apache log parsed to it. Do I SED the variable before parsing to AWK or can I change the AWK regex to match only the first ":" on each line in $clog?
$ clog='djerk.nl:80 200.87.62.227 - - [20/Nov/2015:01:06:25 +0100] "GET /some_url HTTP/1.1" 404 37252
bogus.com:80 200.87.62.227 - - [20/Nov/2015:01:06:27 +0100] "GET /some_url HTTP/1.1" 404 37262
djerk.nl:80 200.87.62.227 - - [20/Nov/2015:01:06:29 +0100] "GET /another_url HTTP/1.1" 200 11142
ipv6.com:80 2a01:3e8:abcd:320::1 - - [20/Nov/2015:01:35:24 +0100] "GET /some_url HTTP/1.1" 200 273'
$ echo "$clog" | awk -F '[: -]+' '{ vHost[$1]+=$13 } END { for (var in vHost) { printf "%s %.0f\n", var, vHost[var] }}'
> bogus.com 37262
> djerk.nl 48394
> ipv6.com 0
As can be seen the last line of variable $clog, the vhost domain is caught but not the byte count which should come out at 273 instead of 0.
Original long question
The problem I have is with the ":" character. In addition to the other two characters (space and dash), I need AWK to match only the first occurrence of ":" in each line it's evaluating. the following splits each line by three characters which works fine, until the log entries contain IPv6 addresses.
matrix=$( echo "$clog" | awk -F '[: -]+' '{ vHost[$1]++; Bytes[$1]+=$13 } END { for (var in vHost) { printf "%s %.0f %.0f\n", var, vHost[var], Bytes[var] }}' )
The above code converts the following log entries (contained in variable $clog):
djerk.nl:80 200.87.62.227 - - [20/Nov/2015:01:06:25 +0100] "GET /some_url HTTP/1.1" 404 37252 "-" "Safari/11601.1.56 CFNetwork/760.0.5 Darwin/15.0.0 (x86_64)"
bogus.com:80 200.87.62.227 - - [20/Nov/2015:01:06:27 +0100] "GET /some_url HTTP/1.1" 404 37262 "-" "Safari/11601.1.56 CFNetwork/760.0.5 Darwin/15.0.0 (x86_64)"
djerk.nl:80 200.87.62.227 - - [20/Nov/2015:01:06:29 +0100] "GET /wordpress/2014/ssl-intercept-headaches HTTP/1.1" 200 11142 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B410 Safari/600.1.4"
djerk.nl:80 200.87.62.227 - - [20/Nov/2015:01:06:30 +0100] "GET /some_other_url HTTP/1.1" 404 37264 "-" "Safari/11601.1.56 CFNetwork/760.0.5 Darwin/15.0.0 (x86_64)"
Into a table like so, containing vhost name (sans TCP port number), hits and cumulative byte count. One line per vhost:
djerk.nl 3 85658
bogus.com 1 37262
But IPv6 addresses get unintentionally split due to their notation and this causes AWK to produce bogus output when evaluation these log entries. Sample IPv6 log entry:
djerk.nl:80 2a01:3e8:abcd:320::1 - - [20/Nov/2015:01:35:24 +0100] "POST /wordpress/wp-cron.php?doing_wp_cron=*** HTTP/1.0" 200 273 "-" "WordPress; http://www.djerk.nl/wordpress"
I guess a work around would be to mangle variable $clog to replace the first occurrence of ":" and remove this character from the AWK regex. But I don't think native bash substitution is capable of negotiating variables with multiple lines.
clog=$(sed 's/:/ /' <<< "$clog")
matrix=$( echo "$clog" | awk -F '[ -]+' '{ vHost[$1]++; Bytes[$1]+=$10 } END { for (var in vHost) { printf "%s %.0f %.0f\n", var, vHost[var], Bytes[var] }}' )
This works because $clog is quoted which preserves the line feeds and runs sed on each line individually. As a result (and shown) the AWK line needs to be adjusted to ignore ":" and grab $10 instead of $13 for the byte count.
So as it turns out, in writing this, I've already given myself a solution. But I'm sure someone will know of a better more efficient way.
Just don't split the entire line on colons. Remove the port number from the field you extract instead.
split($1, v, /:/); vHost[v[1]]++; ...
I don't see why you would split on dashes, either; either way, the field numbers will be renumbered, so you would end up with something like
awk '{ split($1, v, /:/); vHost[v[1]]++; Bytes[v[1]]+=$11 }
END { for (var in vHost)
printf "%s %.0f %.0f\n", var, vHost[var], Bytes[var] }'

in vi, how to search and append something?

I have a bash file
#!/bin/bash
curl '/entry/point.json?name=ab'
echo 1
sleep 60
curl '/entry/point.json?name=bc'
echo 2
sleep 60
and so on.
I'd like to append
|jsonpp
after each curl command. How am I supposed to do that in VI? Such as
:%s/curl/?????/g
1. :global
:g/curl/norm A|jsonpp
would be my favourite
2. 'interactive' repeats:
'Highlight' curl (using * or #)
A|jsonppEsc
repeat at leisure: n. n. n. ...
3. bash functions
Alternatively,
#!/bin/bash
function curl()
{
/usr/bin/curl "$#" | jsonpp
}
curl '/entry/point.json?name=ab'
echo 1
sleep 60
curl '/entry/point.json?name=bc'
echo 2
sleep 60
:%s/^\(curl .*\)$/\1 | jsonpp/g

regex to modificate domino http log to common format for Piwik import

I need to import old http log files from my Domino webserver into my piwik tracking.
the problem is the format of the log if an user is logged in.
Normal/good format example:
123.123.123 www.example.com - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810 "" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 234 "" "example"
Bad format example - produced if user is logged in
123.123.123 www.example.com "CN=SomeUser/OU=SomeOU/O=SomeO" - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810 "" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 234 "" "example
i am looking for a one-liner bash to remove those CN information if it is included.
UPDATE:
this is my solution to get a one liner to import an domino log file into piwik. maybe someday someone finds this thing and needn't flip his table
for i in `ls -v *.log`; do date && echo " bearbeite" $i && echo " " && awk '{sub(/ +"CN=[^"]+" +/," - ")}1' $i grep -v http.monitor | grep -v nagios > $i.cleanTmp && python /var/www/piwik/misc/log-analytics/import_logs.py --url=http://127.0.0.1/piwik --idsite=8 $i.cleanTmp --dry-run && rm $i.cleanTmp; done;
If You need a pure bash solution You can do something like this:
Example file
cat >infile <<XXX
123.123.123 www.example.com "CN=SomeUser/OU=SomeOU/O=SomeO" - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810 "" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 234 "" "example"
XXX
while read x; do
[[ $x =~ \ +\"CN=[^\"]+\"\ + ]] && x=${x/$BASH_REMATCH/ }
echo $x
done <infile
Output:
123.123.123 www.example.com - [17/Mar/2013:00:00:39 +0100] "GET /example.org HTTP/1.1" 200 3810 "" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 234 "" "example"
It parses for a string starting with spaces then "CN= and then any non " characters, then a " then some spaces. If this patten found, it replaces with a space.
If the log files are big ones (>1MB) and this should be done periodically, then use awk instead of the pure bash solution.
awk '{sub(/ +"CN=[^"]+" +/," ")}1' infile
So you just want to remove the "CN=SomeUser/OU=SomeOU/O=SomeO" part?
The regex to match that looks like this:
"CN=\w+\/OU=\w+\/O=\w+"