I have below simple regex expressions that works pretty well to split the given sample log. This would provides separate groups of object where I could access with $1 $2 $3 ... etc. I'm using this in Splunk.
Eg.
$1 = https
$2 = 2020-08-20T12:40:00.274478Z
$3 = app/my-aws-alb/e7538073dd1a6fd8
(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+?)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)
https 2020-08-20T12:40:00.274478Z app/my-aws-alb/e7538073dd1a6fd8 162.158.26.188:21098 172.0.51.37:80 0.000 0.004 0.000 405 405 974 424 "POST https://my-aws-alb-domain:443/api/ps/fpx/callback HTTP/1.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.2840.91 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:ap-southeast-1:111111111111:targetgroup/my-aws-target-group/41dbd234b301e3d84 "Root=1-5f3e6f20-3fdasdsfffdsf" "api.mydomain.com" "arn:aws:acm:ap-southeast-1:11111111111:certificate/be4344424-a40f-416e-8434c-88a8a3b072f5" 0 2020-08-20T12:40:00.270000Z "forward" "-" "-" "172.0.51.37:80" "405" "-" "-"
The problem here is, I want to separate IP:Port into separate group. There are multiple places which have the IP:Port. Those I need as a separate group like other object.
Eg.
$4 = 162.158.26.188
$5 = 21098
$6 = 172.0.51.37
$7 = 80
Can anyone help on this? Thank you!
Here's a regex that will pull all of the ip:port values from a field:
| rex field=_raw max_match=0 "(?<ip_port>\d+\.\d+\.\d+\.\d+\:\d+)"
Now expand the ip_port field:
| mvexpand ip_port
And then extract from ip_port into ip & port:
| rex field=ip_port "(?<ip>\d+\.\d+\.\d+\.\d+\)\:(?<port>\d+)"
Related
I have the following line from which I want to replace space with whitespace (tab) but want to keep the spaces within the double quotes as it is. I am on Notepad++.
[11/May/2020:10:10:20 -0400] "GET / HTTP/1.1" 302 523 52197 url.com - - TLSv1.2 19922 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" https://somelinkhere - -
Desired output:
[11/May/2020:10:10:20 -0400] "GET / HTTP/1.1" 302 523 52197 url.com - - TLSv1.2 19922 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" https://somelinkhere - -
Through the following regex I was able to select the string under the double quotes, but it's of no use for me.
"([^"]*)"
Can you please help me how this can be achieved?
You can use
("[^"]*")|[ ]
Replace with (?1$1:\t).
Details:
("[^"]*") - Capturing group 1: a ", then zero or more chars other than " and then a "
| - or
[ ] - matches a space (you may remove [ and ] here , they are used to make the space pattern visible in the answer).
See the demo screenshot:
My logs look like this:
200 59903 0.056 - [24/Jun/2020:00:06:56 +0530] "GET /xxxxx/xxxxx/xxxxx HTTP/1.1" xxxxx.com [xxxx:4900:xxxx:b798:xxxx:c8ba:xxxx:6a23] - - xxx.xxx.xxx.xxx - - - "http://xxxxx/xxxxx/xxxxx" 164551836 1 HIT "-" "-" "Mozilla/5.0 (Linux; Android 9; Mi A1 Build/PKQ1.180917.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/77.0.3865.92 Mobile Safari/537.36" "-" "-" "dhDebug=-" "-" - -
200 11485 0.000 - [24/Jun/2020:00:06:56 +0530] "GET /xxxxx/xxxxx/xxxxx/xxxxx HTTP/1.1" xxxxx.com xxx.xxx.xxx.xxx - - xxx.xxx.xxx.xxx - - - "-" 164551710 7 HIT "-" "-" "Dalvik/2.1.0 (Linux; U; Android 9; vivo 1915 Build/PPR1.180610.011)" "-" "-" "dhDebug=appVersion=13.0.8&osVersion=9&clientId=1271210612&conn_type=4G&conn_quality=NO_CONNECTION&sessionSource=organic&featureMask=1879044085&featureMaskV1=635" "-" 40 -
The two logs are almost same except the fact that the last one contains a detailed output of dhDebug.
This is how my parsers.conf looks like:
[PARSER]
Name head
Format regex
Regex (?<responseCode>\d{3})\s(?<responseSize>\d+)\s(?<responseTime>\d+.\d+)\s.*?\s\[(?<time>.*?)\]\s"(?<method>.*?)\s(?<url1>.*?)\s(?<protocol>.*?)"\s(?<servedBy>.*?)\s(?<Akamai_ip1>.*?)\s(?<ClientId_ip2>.*?)\s(?<ip3>.*?)\s(?<lb_ip4>.*?)\s(?<ip5>.*?)\s(?<ip6>.*?)\s(?<ip7>.*?)\s+"(?<url2>.*?)".*?".*?"\s".*?"\s"(?<agentInfo>.*?)"
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
Time_Keep On
Types responseTime:float
Please suggest any idea on how to implement the information of dhDebug in a separate key-value pair in the same regex that works on both the types of logs.
EDITED!!
You can use (?:case1|case2) for case1: is null and case2: is not null
So Regex will be:
(?<responseCode>\d{3})\s(?<responseSize>\d+)\s(?<responseTime>\d+.\d+)\s.*?\s\[(?<time>.*?)\]\s"(?<method>.*?)\s(?<url1>.*?)\s(?<protocol>.*?)"\s(?<servedBy>.*?)\s(?<Akamai_ip1>.*?)\s(?<ClientId_ip2>.*?)\s(?<ip3>.*?)\s(?<lb_ip4>.*?)\s(?<ip5>.*?)\s(?<ip6>.*?)\s(?<ip7>.*?)\s+"(?<url2>.*?)".*?".*?"\s".*?"\s"(?<agentInfo>.*?)"\s"-"\s"-"\s"dhDebug=(?:-|appVersion=(?<appVersion>.*?)&osVersion=(?<osVersion>.*?)&clientId=(?<clientId>.*?)&conn_type=(?<conn_type>.*?)&conn_quality=(?<conn_quality>.*?)&sessionSource=(?<sessionSource>.*?)&featureMask=(?<featureMask>.*?)&featureMaskV1=(?<featureMaskV1>.*?))"
With this you get null for each field name of dhDebug for the first log line and field names with values for the second one.
You can test it at http://grokdebug.herokuapp.com/
Need help in capturing the user agent details from the citrix logs. The log format of the citrix is quite different for the successful and denied. The samples are given below
For Successful authentication the user agent details are enclosed within "". Details are after the keyword Browser_type ""
For Denied traffic , useragent details are not present within the "". It is present after the keyword Browser
Denied
Dec 8 05:20:53 netscaler02 12/08/2017:05:20:53 netscaler02 0-PPE-0 : AAA LOGIN_FAILED -adasd92 0 : User renju - Client_ip X.X.X.X - Failure_reason "External authentication server denied access" - Browser Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Success
Dec 8 05:54:06 netscaler02 12/08/2017:11:54:06 GMT netscaler02 0-PPE-0 : SSLVPN LOGIN -78342434122 0 : Context renjus#1X.X.X.X - SessionId: xxx- User renju - Client_ip X.X.X.X - Nat_ip "Mapped Ip" - Vserver X.X.X.X:443 - Browser_type "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" - SSLVPN_asdasdat_type ICA - Group(s) "N/A"
I do have a regex to capture the browser agent within ""
(?P(?<=Browser_type\s\").?(?=\s(?:\w+=|\")))"
Bud need a regex that capture the user agent from all the format.
Thanks in advance.
Maybe you could match your logs like this:
Browser(?:_type)?\s"?(.+|[^"]+)
Match Browser with optional _type Browser(?:_type)
Followed by an optional double quote "?
Followed by a whitespace \s
Then capture in a group ( any character zero or more times .*
or |
all until you encounter a double quote [^"]+
Close the group )
Edit:
To capture "Browser" without the optional "_type" in a named capture group:
(?P<citrix_useragent>Browser)(?:_type)?\s"?(.+|[^"]+)
Got the initial push from The fourth bird and did some work around on it.
Browser((?:_type)?\s\"*)(?P(.+\"-|[^\"]+))
Working on an input extractor issue with IIS logs using an "advanced" IIS login tool to collect more than the basic logs provide. It's adding double quotes and spaces to many of the fields and we are trying to us the extractor to correct this. This is the beginning of an example message:
2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"
We've already written an extractor to remove all the added quotes before running it through all the other extractors to populate the fields, etc., but we want to replace all spaces between the quotes with + before we do that to match the old logging style.
Can anyone point us in the right direction for this? The closest I've come so far is catching " " between SITE and SOURCE and replacing that using something like "([\s]*)". Result:
2016-02-08 16:46:35.957 "SITE+SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX+HTTP/1.1+Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"
I can't seem to only look for spaces between the quotes.
Any help would be greatly appreciated. Thanks.
Further Clarification. This portion of the string:
"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"
Should be:
"Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+yie11;+rv:11.0)+like+Gecko"
Everything else should remain the same as those are the only spaces inside of a quoted section of the string.
Is this even possible with regex?
I'm afraid that regular expressions are not the best tool for this. You basically have to "count" quotes to determine whether a space is within quotes or not.
You can try something like this (Python):
text = '2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"'
escaped = ""
count = 0
for c in text:
if c == '"':
count += 1
if c == " " and count % 2 == 1:
escaped += "+"
else:
escaped += c
Afterwards, escaped is this:
2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+yie11;+rv:11.0)+like+Gecko"
I'm having trouble redirecting several domains & associated sub-domains to one other domain.
Keep falling into a 301 infinite loop.
I have 3 domains, proxied to the same lighttpd process, say :
dom.co
dom.info
dom.net
dom.net is my domain of choice, what I want is to get every one, including www.*, going to my domain of choice. (dom.net is working)
My lighttpd.conf insteresting parts :
$HTTP["host"] =~ "(^|\.)dom\.net$" {
/* working */
}
$HTTP["host"] =~ "(^|\.)dom\.co$" {
url.redirect = ( "^/(.*)" => "dom.net/$1" )
}
the log :
IP dom.co - [16/Nov/2012:20:51:33 +0100] "GET /dom.net/ HTTP/1.0" 301 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"
IP dom.co - [16/Nov/2012:20:51:33 +0100] "GET /dom.net/dom.net/dom.net/ HTTP/1.0" 301 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"
et caetera.
I understand what's happening, not how to fix it. Please help !
I went with this :
$HTTP["host"] =~ "^dom\.co" {
/* ... */
}
# some redirections to dom.co
$HTTP["host"] =~ "dom.net|dom.info|www.dom.net|www.dom.info|www.dom.co" {
url.redirect = ( "^/(.*)" => "http://dom.co/$1" )
}
I think the absence of http:// in front of the target url was what caused trouble, didn't tweak and try things around though, it's working now.
Idea comes from https://serverfault.com/questions/105920/how-do-i-redirect-multiple-domains-to-a-single-domain-in-lighttpd