Regex to find spaces between quotes in Graylog

We're working on an input extractor issue with IIS logs. We use an "advanced" IIS logging tool to collect more than the basic logs provide. It adds double quotes and spaces to many of the fields, and we are trying to use the extractor to correct this. This is the beginning of an example message:
2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"
We've already written an extractor to remove all the added quotes before running it through all the other extractors to populate the fields, etc., but we want to replace all spaces between the quotes with + before we do that to match the old logging style.
Can anyone point us in the right direction for this? The closest I've come so far is catching " " between SITE and SOURCE and replacing that using something like "([\s]*)". Result:
2016-02-08 16:46:35.957 "SITE+SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX+HTTP/1.1+Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"
I can't seem to only look for spaces between the quotes.
Any help would be greatly appreciated. Thanks.
Further Clarification. This portion of the string:
"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"
Should be:
"Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+yie11;+rv:11.0)+like+Gecko"
Everything else should remain the same as those are the only spaces inside of a quoted section of the string.
Is this even possible with regex?

I'm afraid that regular expressions are not the best tool for this. You basically have to "count" quotes to determine whether a space is within quotes or not.
You can try something like this (Python):
text = '2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; yie11; rv:11.0) like Gecko"'
escaped = ""
count = 0  # number of double quotes seen so far
for c in text:
    if c == '"':
        count += 1
    # An odd count means we are currently inside a quoted field.
    if c == " " and count % 2 == 1:
        escaped += "+"
    else:
        escaped += c
Afterwards, escaped is this:
2016-02-08 16:46:35.957 "SITE" "SOURCE" XX.XX.XX.XX GET /blah/etc/etc/file.ext - 80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0+(Windows+NT+6.1;+WOW64;+Trident/7.0;+yie11;+rv:11.0)+like+Gecko"
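If a regex is still preferred, one alternative is to match each quoted span as a whole and rewrite it in a callback, instead of trying to match the individual spaces. A minimal sketch in Python, assuming the quotes are always balanced and never escaped inside a field (the function name plus_spaces_in_quotes is made up for illustration):

```python
import re

def plus_spaces_in_quotes(line: str) -> str:
    """Replace every space inside a double-quoted span with '+'."""
    # "[^"]*" matches one complete quoted field; the callback rewrites only that field.
    return re.sub(r'"[^"]*"', lambda m: m.group(0).replace(" ", "+"), line)

sample = '80 - "XX.XX.XX.XX" "HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1) like Gecko"'
result = plus_spaces_in_quotes(sample)
print(result)
```

Because each match consumes a whole quoted field, the spaces between two adjacent fields are never touched. Whether Graylog's own replace-with-regex extractor can do the same depends on its replacement syntax, which this sketch does not cover.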

Related

Match Text in a group regardless if their place are changed or not RegEx

I have the following log message:
request="POST /api/settings/update HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive\r\nContent-Length: 601\r\nsec-ch-ua: %22 Not A;Brand%22;v=%2299%22, %22Chromium%22;v=%22101%22, %22Google Chrome%22;v=%22101%22\r\nAccept: application/json, text/plain, */*\r\nLang: ar\r\nsec-ch-ua-mobile: ?0\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36\r\nsec-ch-ua-platform: %22Windows%22\r\nContent-Type: application/json;charset=UTF-8\r\nOrigin: https://example.com\r\nSec-Fetch-Site: same-origin\r\nSec-Fetch-Mode: cors\r\nSec-Fetch-Dest: empty\r\nAccept-Encoding: gzip, deflate, br\r\nAccept-Language: en,ar;q=0.9,en-US;q=0.8\r\nCookie: .AspNetCore.Culture=c%3dh-AA%7fgujt%3Dar-AA; BPBBBBBBB=d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5d5; dm2=!v87cvbt78ffdv76tv87ht87vtsdb879bt879ftb7s6dbt87asdtf786astd7b6as76dftba87fa76sdfbt876asdtbf76sndtb7asdf76t7d6ft76dtbf/OUM=; .AspNetCore.Cookies=sdfgehth87s6dfh876sdfh765dfh675sfdsdf7g5fsrgerh76g-fy00-thrtPlNFc546v54eryrthrtjrtujrtuhv45634v5vkhrB-tyhtr6WRvdyhrth-_dfgtrhCFuQ05QmPch2trP-rtfoNhUrpl8C8xu-tdyhthrthyhth42i40S-hgjgsghjb-ghu56h_a9; BP40sdf215=408a4314117\r\nX-Forwarded-For: 5.65.57.55\r\n\r\n{%22confirmMessage%22:%22Thanks for your feedback%22}"
And I have the following Regex:
([^r]*(?:r(?!equest=")[^r]*)*request="(?<objectname>([^H]*(?:H(?!ost:)[^H]*)*Host:(?<dname>.*?)\\r\\n)?([^U]*(?:U(?!ser-Agent:)[^U]*)*User-Agent:(?<useragent>.*?)\\r\\n)?([^R]*(?:R(?!eferer:)[^R]*)*Referer:(?<object>[^\\])\\)?([^M]*(?:M(?!essage%22:%22)[^M]*)*Message%22:%22(?<subject>.*?)%22)??[^"]*)")?
The above works fine, but sometimes the header fields appear in a different order, as in the following:
request="POST /api/settings/update HTTP/1.1\r\nConnection: keep-alive\r\nContent-Length: 601\r\nsec-ch-ua: %22 Not A;Brand%22;v=%2299%22, %22Chromium%22;v=%22101%22, %22Google Chrome%22;v=%22101%22\r\nAccept: application/json, text/plain, */*\r\nLang: ar\r\nsec-ch-ua-mobile: ?0\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36\r\nsec-ch-ua-platform: %22Windows%22\r\nHost: example.com\r\nContent-Type: application/json;charset=UTF-8\r\nOrigin: https://example.com\r\nSec-Fetch-Site: same-origin\r\nSec-Fetch-Mode: cors\r\nSec-Fetch-Dest: empty\r\nAccept-Encoding: gzip, deflate, br\r\nAccept-Language: en,ar;q=0.9,en-US;q=0.8\r\n"
As shown in the message above, "User-Agent" comes first and then "Host", whereas in the first message Host came before User-Agent.
Is there a way to match them regardless of their order?
Regex Demo:
https://regex101.com/r/Iu7Db8/1
Thanks
The 'trick' here is to use look ahead. The look ahead starts from the beginning of the text every time.
I give you an example here with some of the header names. You can expand it with the names you want following this pattern.
Note that the 'test' name won't match so that group will be undefined:
/^(?=.*?request="(?<request>.*?(?=\\r\\n))|)(?=.*?Accept:(?<accept>.*?(?=\\r\\n))|)(?=.*?test:(?<test>.*?(?=\\r\\n))|)(?=.*?Connection:(?<connection>.*?(?=\\r\\n))|)/gm
Explanation:
^ - start from the beginning
(?= - look ahead for
.*?request=" - any characters zero or more times followed by 'request="'
(?<request> - group named 'request'
.*? - any characters zero or more times
(?=\\r\\n) - look ahead for '\r\n'
) - end group 'request'
| - OR empty
) - end look ahead
You can repeat this pattern with different group names as often as you need.
Now the regex doesn't care about the order.
Here's the link: Regex101
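The same look-ahead idea can be sketched in Python, which writes named groups as (?P<name>...). This is only an illustration under a few assumptions: the sample messages are shortened, the headers are separated by a literal backslash-r-backslash-n as in the log above, and the helper name extract_headers is invented here.

```python
import re

# Each look-ahead starts at position 0, so the header order does not matter.
# The empty alternative ("|") lets the look-ahead succeed when a header is missing.
HEADERS = re.compile(
    r'^(?=.*?Host:\s*(?P<host>.*?)(?=\\r\\n)|)'
    r'(?=.*?User-Agent:\s*(?P<useragent>.*?)(?=\\r\\n)|)'
)

def extract_headers(message: str) -> dict:
    """Return the captured header values; a missing header maps to None."""
    match = HEADERS.match(message)
    return match.groupdict() if match else {}

# Raw strings keep the literal \r\n text that appears in the log.
host_first = r'request="POST /x HTTP/1.1\r\nHost: example.com\r\nUser-Agent: Mozilla/5.0\r\n"'
agent_first = r'request="POST /x HTTP/1.1\r\nUser-Agent: Mozilla/5.0\r\nHost: example.com\r\n"'
no_host = r'request="POST /x HTTP/1.1\r\nUser-Agent: Mozilla/5.0\r\n"'

print(extract_headers(host_first))
print(extract_headers(agent_first))
print(extract_headers(no_host))
```

Both orderings give the same values, and the message without a Host header leaves that group unset instead of failing the whole match.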

How can I exclude search pattern within double quotes in Notepad++

I have the following line in which I want to replace each space with a tab, but keep the spaces within double quotes as they are. I am using Notepad++.
[11/May/2020:10:10:20 -0400] "GET / HTTP/1.1" 302 523 52197 url.com - - TLSv1.2 19922 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" https://somelinkhere - -
Desired output:
[11/May/2020:10:10:20 -0400] "GET / HTTP/1.1" 302 523 52197 url.com - - TLSv1.2 19922 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" https://somelinkhere - -
With the following regex I was able to select the strings inside the double quotes, but that is of no use to me.
"([^"]*)"
Can you please help me achieve this?
You can use
("[^"]*")|[ ]
Replace with (?1$1:\t).
Details:
("[^"]*") - Capturing group 1: a ", then zero or more chars other than " and then a "
| - or
[ ] - matches a space (you may remove [ and ] here; they are only used to make the space visible in the answer).
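For readers outside Notepad++, the same "match the quoted string first, otherwise match a space" idea can be sketched in Python with a replacement callback, which plays the role of the conditional replacement (?1$1:\t). The sample line is shortened and the function name is made up for illustration:

```python
import re

# Group 1 captures a whole quoted string; the second alternative is a bare space.
PATTERN = re.compile(r'("[^"]*")|[ ]')

def spaces_to_tabs_outside_quotes(line: str) -> str:
    """Keep quoted strings unchanged and turn every other space into a tab."""
    return PATTERN.sub(
        lambda m: m.group(1) if m.group(1) is not None else "\t",
        line,
    )

line = '"GET / HTTP/1.1" 302 523 "Mozilla/5.0 (Windows NT 10.0)" - -'
converted = spaces_to_tabs_outside_quotes(line)
print(repr(converted))
```

Note that a space inside the bracketed timestamp of the original log line is not protected by this pattern, since only double-quoted text is captured by group 1.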

Regex separate IP:Port from a log

I have the simple regex below, which works pretty well to split the given sample log. It provides separate groups that I can access with $1, $2, $3, etc. I'm using this in Splunk.
Eg.
$1 = https
$2 = 2020-08-20T12:40:00.274478Z
$3 = app/my-aws-alb/e7538073dd1a6fd8
(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+?)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)(.*?\s+)
https 2020-08-20T12:40:00.274478Z app/my-aws-alb/e7538073dd1a6fd8 162.158.26.188:21098 172.0.51.37:80 0.000 0.004 0.000 405 405 974 424 "POST https://my-aws-alb-domain:443/api/ps/fpx/callback HTTP/1.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.2840.91 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:ap-southeast-1:111111111111:targetgroup/my-aws-target-group/41dbd234b301e3d84 "Root=1-5f3e6f20-3fdasdsfffdsf" "api.mydomain.com" "arn:aws:acm:ap-southeast-1:11111111111:certificate/be4344424-a40f-416e-8434c-88a8a3b072f5" 0 2020-08-20T12:40:00.270000Z "forward" "-" "-" "172.0.51.37:80" "405" "-" "-"
The problem is that I want to split each IP:Port into separate groups. IP:Port appears in multiple places, and I need each part as its own group, like the other fields.
Eg.
$4 = 162.158.26.188
$5 = 21098
$6 = 172.0.51.37
$7 = 80
Can anyone help on this? Thank you!
Here's a regex that will pull all of the ip:port values from a field:
| rex field=_raw max_match=0 "(?<ip_port>\d+\.\d+\.\d+\.\d+\:\d+)"
Now expand the ip_port field:
| mvexpand ip_port
And then extract from ip_port into ip & port:
| rex field=ip_port "(?<ip>\d+\.\d+\.\d+\.\d+)\:(?<port>\d+)"
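To check the pattern outside Splunk, the same two-step idea (find every ip:port value, then split it) can be sketched in Python. The log line is shortened from the sample above and the variable names are illustrative only:

```python
import re

# Same shape as the rex pattern: four dotted numbers, a colon, then the port.
IP_PORT = re.compile(r'(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)')

log = (
    'https 2020-08-20T12:40:00.274478Z app/my-aws-alb/e7538073dd1a6fd8 '
    '162.158.26.188:21098 172.0.51.37:80 0.000 0.004 0.000 405 405 974 424'
)

# finditer plays the role of max_match=0: it returns every occurrence.
pairs = [(m.group("ip"), m.group("port")) for m in IP_PORT.finditer(log)]
print(pairs)
```

The timestamp and the timing fields are not picked up because neither contains four dot-separated numbers followed by a colon.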

Regex to capture the useragent from the citrix logs

I need help capturing the user agent details from Citrix logs. The Citrix log format is quite different for successful and denied logins. Samples are given below.
For successful authentication, the user agent details are enclosed in double quotes after the keyword Browser_type.
For denied traffic, the user agent details are not enclosed in double quotes. They appear after the keyword Browser.
Denied
Dec 8 05:20:53 netscaler02 12/08/2017:05:20:53 netscaler02 0-PPE-0 : AAA LOGIN_FAILED -adasd92 0 : User renju - Client_ip X.X.X.X - Failure_reason "External authentication server denied access" - Browser Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Success
Dec 8 05:54:06 netscaler02 12/08/2017:11:54:06 GMT netscaler02 0-PPE-0 : SSLVPN LOGIN -78342434122 0 : Context renjus#1X.X.X.X - SessionId: xxx- User renju - Client_ip X.X.X.X - Nat_ip "Mapped Ip" - Vserver X.X.X.X:443 - Browser_type "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" - SSLVPN_asdasdat_type ICA - Group(s) "N/A"
I do have a regex to capture the browser agent within ""
(?P(?<=Browser_type\s\").?(?=\s(?:\w+=|\")))"
But I need a regex that captures the user agent in both formats.
Thanks in advance.
Maybe you could match your logs like this:
Browser(?:_type)?\s"?(.+|[^"]+)
Match Browser with an optional _type: Browser(?:_type)?
Followed by a whitespace character: \s
Followed by an optional double quote: "?
Then capture in a group ( any character one or more times .+
or |
one or more characters that are not a double quote [^"]+
Close the group )
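Here is a hedged Python sketch of a variation on this idea, not the exact pattern above: it tries the quoted form first and only falls back to "everything up to the end of the line" when no quote follows. The group names, the helper name, and the shortened sample lines are assumptions made for illustration.

```python
import re
from typing import Optional

# Quoted form first ("..."), otherwise take the rest of the line.
USER_AGENT = re.compile(
    r'Browser(?:_type)?\s+(?:"(?P<quoted>[^"]*)"|(?P<bare>.+))'
)

def user_agent(line: str) -> Optional[str]:
    """Return the user agent from either log format, or None if absent."""
    m = USER_AGENT.search(line)
    if m is None:
        return None
    quoted = m.group("quoted")
    return quoted if quoted is not None else m.group("bare")

denied = 'User renju - Client_ip X.X.X.X - Browser Mozilla/5.0 (Windows NT 6.1) Safari/537.36'
success = 'Vserver X.X.X.X:443 - Browser_type "Mozilla/5.0 (Windows NT 6.1; Win64; x64) Safari/537.36" - Group(s) "N/A"'

print(user_agent(denied))
print(user_agent(success))
```

Putting the quoted alternative first matters: a greedy .+ placed first would also swallow the closing quote and the fields that follow it in the success format.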
Edit:
To capture "Browser" without the optional "_type" in a named capture group:
(?P<citrix_useragent>Browser)(?:_type)?\s"?(.+|[^"]+)
I got the initial push from The fourth bird's answer and reworked it a little:
Browser((?:_type)?\s\"*)(?P(.+\"-|[^\"]+))

lighttpd domain redirection

I'm having trouble redirecting several domains and their sub-domains to one other domain.
I keep falling into an infinite 301 loop.
I have 3 domains, proxied to the same lighttpd process, say :
dom.co
dom.info
dom.net
dom.net is my domain of choice. What I want is for every one of them, including www.*, to redirect to it. (dom.net itself is working.)
The interesting parts of my lighttpd.conf:
$HTTP["host"] =~ "(^|\.)dom\.net$" {
/* working */
}
$HTTP["host"] =~ "(^|\.)dom\.co$" {
url.redirect = ( "^/(.*)" => "dom.net/$1" )
}
The log:
IP dom.co - [16/Nov/2012:20:51:33 +0100] "GET /dom.net/ HTTP/1.0" 301 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"
IP dom.co - [16/Nov/2012:20:51:33 +0100] "GET /dom.net/dom.net/dom.net/ HTTP/1.0" 301 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"
et cetera.
I understand what's happening, but not how to fix it. Please help!
I went with this:
$HTTP["host"] =~ "^dom\.co" {
/* ... */
}
# some redirections to dom.co
$HTTP["host"] =~ "dom.net|dom.info|www.dom.net|www.dom.info|www.dom.co" {
url.redirect = ( "^/(.*)" => "http://dom.co/$1" )
}
I think the absence of http:// in front of the target URL was what caused the trouble. I didn't tweak it or try other things, though; it's working now.
The idea comes from https://serverfault.com/questions/105920/how-do-i-redirect-multiple-domains-to-a-single-domain-in-lighttpd
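The loop in the log is consistent with that guess. A small Python sketch, assuming the client resolves the redirect target as a relative reference against the URL it just requested (the helper name follow is made up for illustration):

```python
from urllib.parse import urljoin

def follow(request_url: str, location: str) -> str:
    """Resolve a redirect target against the URL that was requested."""
    return urljoin(request_url, location)

# Rule "^/(.*)" => "dom.net/$1" applied to "/" yields the relative target "dom.net/".
first = follow("http://dom.co/", "dom.net/")

# The next request is for "/dom.net/", so the rule now yields "dom.net/dom.net/".
second = follow(first, "dom.net/dom.net/")

# With a scheme in front, the target is absolute and the host changes.
fixed = follow("http://dom.co/", "http://dom.net/")

print(first)
print(second)
print(fixed)
```

The first two results stay on dom.co and grow by one path segment per hop, matching the /dom.net/ and /dom.net/dom.net/dom.net/ requests in the log, while the absolute target leaves dom.co immediately.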