Regex - String within the first parenthesis of string - regex

I am trying to get an overview of the visitors of my website by using AWS Logs Insights.
My query looks like this:
fields @timestamp, @message
| parse @message /(?<@ip>(?<=User-Agent)(.*)(?=X-Forwarded-Proto))/
| stats count() as requestCount by @ip
| filter ispresent(@ip)
| sort requestCount desc
Some of the results are like this:
=Mozilla/5.0 (iPhone; CPU iPhone OS 15_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Mobile/15E148 Safari/604.1,
=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15,
I am trying to get the string within the first parenthesis:
iPhone; CPU iPhone OS 15_1 like Mac OS X
Macintosh; Intel Mac OS X 10_15_7
I tried | parse @ip /(?<@device>(/\((.*?)\)/)/ from this answer, but it doesn't work.
Any ideas how I could make it work?
Thank you!

Looking at the result for the given pattern, you might use another named capture group (instead of lookarounds, you can also match the surrounding text directly):
User-Agent=[^()]*\((?<@device>[^()]*)\).*X-Forwarded-Proto
See a regex demo.
With both capture groups:
User-Agent(?<@ip>[^()]*\((?<@device>[^()]*)\).*X-Forwarded-Proto)
See another regex demo.
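To try the idea outside Logs Insights, here is a minimal Python sketch of the same pattern. It assumes Python's `re` syntax, so the group is named `device` with `(?P<...>)`. The `X-Forwarded-Proto=https` tail of the sample message is a made-up stand-in for the rest of the log line; only the User-Agent value comes from the question.

```python
import re

# Hypothetical log fragment: the User-Agent value is from the question,
# the trailing "X-Forwarded-Proto=https" part is an assumption.
SAMPLE = (
    "User-Agent=Mozilla/5.0 (iPhone; CPU iPhone OS 15_1 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Mobile/15E148 "
    "Safari/604.1, X-Forwarded-Proto=https"
)

# Skip everything up to the first "(", capture up to the first ")",
# then require X-Forwarded-Proto later in the message.
DEVICE_RE = re.compile(
    r"User-Agent=[^()]*\((?P<device>[^()]*)\).*X-Forwarded-Proto"
)


def first_parenthesized(message):
    """Return the text inside the first parentheses of the User-Agent, or None."""
    match = DEVICE_RE.search(message)
    return match.group("device") if match else None


print(first_parenthesized(SAMPLE))
```

Because `[^()]*` cannot cross a parenthesis, the capture always stops at the first closing parenthesis, which is what limits it to the first group.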

Related

Regex to capture the useragent from the citrix logs

Need help in capturing the user agent details from the citrix logs. The log format of the citrix is quite different for the successful and denied. The samples are given below
For Successful authentication the user agent details are enclosed within "". Details are after the keyword Browser_type ""
For Denied traffic , useragent details are not present within the "". It is present after the keyword Browser
Denied
Dec 8 05:20:53 netscaler02 12/08/2017:05:20:53 netscaler02 0-PPE-0 : AAA LOGIN_FAILED -adasd92 0 : User renju - Client_ip X.X.X.X - Failure_reason "External authentication server denied access" - Browser Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Success
Dec 8 05:54:06 netscaler02 12/08/2017:11:54:06 GMT netscaler02 0-PPE-0 : SSLVPN LOGIN -78342434122 0 : Context renjus@1X.X.X.X - SessionId: xxx- User renju - Client_ip X.X.X.X - Nat_ip "Mapped Ip" - Vserver X.X.X.X:443 - Browser_type "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" - SSLVPN_asdasdat_type ICA - Group(s) "N/A"
I do have a regex to capture the browser agent within ""
(?P(?<=Browser_type\s\").?(?=\s(?:\w+=|\")))"
But I need a regex that captures the user agent from both formats.
Thanks in advance.
Maybe you could match your logs like this:
Browser(?:_type)?\s"?(.+|[^"]+)
Match Browser with optional _type Browser(?:_type)?
Followed by a whitespace \s
Followed by an optional double quote "?
Then capture in a group ( any character one or more times .+
or |
all until you encounter a double quote [^"]+
Close the group )
Edit:
To capture "Browser" without the optional "_type" in a named capture group:
(?P<citrix_useragent>Browser)(?:_type)?\s"?(.+|[^"]+)
I got the initial push from The fourth bird's answer and worked out a variation of it:
Browser((?:_type)?\s\"*)(?P(.+\"-|[^\"]+))
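As a rough way to check one pattern against both log formats, here is a small Python sketch. It is a simplified variant of the patterns above, not either answer's exact regex: it keeps only the `[^"]+` alternative, so the capture stops at the closing quote when there is one and runs to the end of the line otherwise. The group name `useragent` and the shortened sample lines are assumptions for illustration.

```python
import re

# "Browser" or "Browser_type", one whitespace, an optional opening quote,
# then everything up to the next double quote (or the end of the line).
UA_RE = re.compile(r'Browser(?:_type)?\s"?(?P<useragent>[^"]+)')


def extract_user_agent(line):
    """Return the user agent from a Citrix log line, or None if absent."""
    match = UA_RE.search(line)
    return match.group("useragent").strip() if match else None


# Shortened versions of the two sample lines from the question.
denied = (
    'AAA LOGIN_FAILED : User renju - Client_ip X.X.X.X - Failure_reason '
    '"External authentication server denied access" - Browser Mozilla/5.0 '
    '(Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/60.0.3112.113 Safari/537.36'
)
success = (
    'SSLVPN LOGIN : User renju - Vserver X.X.X.X:443 - Browser_type '
    '"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, '
    'like Gecko) Chrome/56.0.2924.87 Safari/537.36" - Group(s) "N/A"'
)

print(extract_user_agent(denied))
print(extract_user_agent(success))
```

This relies on user agents not containing a double quote themselves, which holds for the samples shown but is not guaranteed in general.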

Regex Extract in Hive (regexp_extract)

My regex is:
(\bosName=(.iPhone.OS.|.Android.))|(?:\b(taAppVersion=)[0-9.]+)|(TAiApp|TATabletApp|TAaApp)
My String is:
Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Mobile/14A456 Mobile iPhone TAiApp TARX13 taAppVersion=161107060 appLang=en_UK osName='iPhone OS' deviceName=iPhone8,4 osVer=10.0.2 taAppVersionString=18.4 mcc=234 mnc=15 connection=cellular
I want to grab:
osName='iPhone OS' taAppVersion=161107060 TAiApp
My regex works in the tester, but when I run a Hive query I only get TAiApp from it. I also pass 0 as the group index to regexp_extract() so that it returns the whole match.
Thanks!
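A likely explanation, sketched here in Python rather than Hive: an alternation matches only one branch per match, and a function that returns a single match stops at the first one found scanning left to right. In the sample string TAiApp appears before the other two tokens, so it is the first match. Online testers usually run in "global" mode and show every match, which would explain the difference. The snippet uses the regex exactly as posted; collecting all matches with `finditer` is only an illustration of the behaviour, not a Hive API.

```python
import re

# The regex from the question, split across lines for readability.
PATTERN = re.compile(
    r"(\bosName=(.iPhone.OS.|.Android.))"
    r"|(?:\b(taAppVersion=)[0-9.]+)"
    r"|(TAiApp|TATabletApp|TAaApp)"
)

UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) "
    "AppleWebKit/602.1.50 (KHTML, like Gecko) Mobile/14A456 Mobile iPhone "
    "TAiApp TARX13 taAppVersion=161107060 appLang=en_UK osName='iPhone OS' "
    "deviceName=iPhone8,4 osVer=10.0.2 taAppVersionString=18.4 mcc=234 "
    "mnc=15 connection=cellular"
)

# A single search returns only the leftmost match (group 0 = whole match).
first_match = PATTERN.search(UA).group(0)

# Repeating the search over the string yields one match per alternative hit.
all_matches = [m.group(0) for m in PATTERN.finditer(UA)]

print(first_match)
print(all_matches)
```

In Hive, one way to get all three values would be to call the extraction once per sub-pattern and concatenate the results, since each call yields a single match.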

Chrome use-mobile-user-agent not working

Running chrome from command line with flag --use-mobile-user-agent does not open the browser in mobile context (user-agent).
chrome --use-mobile-user-agent= true
Note:
Passing the user-agent option does work, but I feel it's not the right way of doing things, since Chrome offers this flag to start in a mobile context.
--user-agent= Mozilla/5.0 (iPhone; U; CPU iPhone OS 5_1_1 like Mac OS X; ar) AppleWebKit/534.46.0 (KHTML, like Gecko) CriOS/19.0.1084.60 Mobile/9B206 Safari/7534.48.3
Chromium source code
Reading some of the Chromium source code, I see the following:
content_switches.cc
define kUseMobileUserAgent from "use-mobile-user-agent" flag:
Set when Chromium should use a mobile user agent.
const char kUseMobileUserAgent[] = "use-mobile-user-agent";
shell_content_client.cc
add "Mobile" to product if our variable switch is true/set.
std::string GetShellUserAgent() {
  std::string product = "Chrome/" CONTENT_SHELL_VERSION;
  base::CommandLine* command_line = base::CommandLine::ForCurrentProcess();
  if (command_line->HasSwitch(switches::kUseMobileUserAgent))
    product += " Mobile";
  return BuildUserAgentFromProduct(product);
}
Extra detail (running from selenium)
As an extra detail, I run Chrome through Selenium and pass this configuration:
...
"browserName": "chrome",
"chromeOptions": {
"args": [
"--user-agent= Mozilla/5.0 (iPhone; U; CPU iPhone OS 5_1_1 like Mac OS X; ar) AppleWebKit/534.46.0 (KHTML, like Gecko) CriOS/19.0.1084.60 Mobile/9B206 Safari/7534.48.3",
"--window-size=320,640",
"--disable-popup-blocking",
"--incognito",
"--test-type"
]
},
...
The string is built to "Chrome/53.0.2785.116 Mobile" in GetShellUserAgent. BuildUserAgentFromProduct does not use product itself; it passes it on to BuildUserAgentFromOSAndProduct, which is supposed to format a string like this:
"Mozilla/5.0 (%s) AppleWebKit/%d.%d (KHTML, like Gecko) %s Safari/%d.%d"
The product string is inserted into token four, where the fourth replacement token is before "Safari". Therefore "Chrome/53.0.2785.116 Mobile" should be placed there.
With and without the flag, my user agent is the same.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
So what does this mean, is it broken? Quite possibly.
In src/extensions/shell/common/shell_content_client.cc, BuildUserAgentFromProduct("Chrome/" PRODUCT_VERSION) is called in ShellContentClient::GetUserAgent. That just circumvents the call to GetShellUserAgent.
Well. There goes the mobile user agent flag. There are other places where the product could be replaced, but that's the one that stands out as the culprit.

Regex number range parsing

I am trying to parse out a specific number range, and can't seem to get it right. I am looking to extract specific browser versions from user agent strings. For example, I want to parse Chrome 1-42 and Firefox 1-40, but I can't figure out the syntax.
What I have so far is this, which kind of works, but it grabs the first number it sees and doesn't respect the 2 digit range:
Gecko..Chrome/([1-9].|[1-4][1-2].)
Sample:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.1847.137 Safari/537.36
Firefox 29: Mozilla/5.0 (Android; Mobile; rv:29.0) Gecko/29.0 Firefox/23.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:29.0) Gecko/20100101 Firefox/29.0
Any ideas? TIA.
((?:(?:Mozilla\/(?:[1-9]|[1-3][0-9]|40))|(?:Chrome\/(?:[1-9]|[1-3][0-9]|4[0-3])))\.[^ ]+)
Is this what you would like? /Edited/
Demo:
https://regex101.com/r/gH1nU9/2
Because regex only matches text and numbers are treated as text, to match something like 1 to 42 you would have to do something like this:
\b[1-9]\b|\b[1-3][0-9]\b|\b4[0-2]\b
This matches 1 to 9, or 10 to 39, or 40 to 42. I have added the boundaries \b so that nothing except these numbers is matched.
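To make the range idea concrete for the question's case (Chrome 1-42, Firefox 1-40), here is a minimal Python sketch. It is one possible combination, not the regex from either answer: each browser gets its own digit alternation, and the literal dot after the major version plays the role of the boundary, so Chrome/4 cannot match the start of Chrome/45. The function name and the Chrome/53 and Firefox/41 test strings are made up for illustration.

```python
import re

# Major version 1-42 for Chrome, 1-40 for Firefox. The trailing "\." forces
# the whole major version to be consumed before the first dot.
OLD_BROWSER_RE = re.compile(
    r"(?:Chrome/(?P<chrome>[1-9]|[1-3][0-9]|4[0-2])"
    r"|Firefox/(?P<firefox>[1-9]|[1-3][0-9]|40))\."
)


def old_browser(user_agent):
    """Return ('Chrome'|'Firefox', major_version) if in range, else None."""
    match = OLD_BROWSER_RE.search(user_agent)
    if not match:
        return None
    if match.group("chrome"):
        return ("Chrome", int(match.group("chrome")))
    return ("Firefox", int(match.group("firefox")))


samples = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:29.0) Gecko/20100101 Firefox/29.0",
    "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36",
]
for ua in samples:
    print(old_browser(ua))
```

If the engine allows it, capturing `\d+` and comparing the number in code is usually easier to maintain than encoding the range in the pattern.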

Is it possible to exclude specified GET parameters in apache access logs?

I need to exclude some sensitive details in my apache log, but I want to keep the log and the uri's in it. Is it possible to achieve following in my access log:
127.0.0.1 - - [27/Feb/2012:13:18:12 +0100] "GET /api.php?param=secret HTTP/1.1" 200 7600 "http://localhost/api.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
I want to replace "secret" with "[FILTERED]" like this:
127.0.0.1 - - [27/Feb/2012:13:18:12 +0100] "GET /api.php?param=[FILTERED] HTTP/1.1" 200 7600 "http://localhost/api.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
I know I probably should have used POST to send this variable, but the damage is already done. I've looked at http://httpd.apache.org/docs/2.4/logs.html and LogFormat, but could not find any possibilities to use regular expression or similar. Any suggestions?
[edit]
Do NOT send sensitive variables as GET parameters if you have the possibility to choose.
I've found one way to solve the problem. If I pipe the log output to sed, I can perform a regex replace on the output before I append it to the log file.
Example 1
CustomLog "|/bin/sed -E s/'param=[^& \t\n]*'/'param=\[FILTERED\]'/g >> /your/path/access.log" combined
Example 2
It's also possible to exclude several parameters:
exclude.sh
#!/bin/bash
while read x ; do
result=$x
for ARG in "$@"
do
cleanArg=`echo $ARG | sed -E 's|([^0-9a-zA-Z_])|\\\\\1|g'`
result=`echo $result | sed -E s/$cleanArg'=[^& \t\n]*'/$cleanArg'=\[FILTERED\]'/g`
done
echo $result
done
Move the script above to the folder /opt/scripts/ or somewhere else, give the script execute rights (chmod +x exclude.sh) and modify your apache config like this:
CustomLog "|/opt/scripts/exclude.sh param param1 param2 >> /your/path/access.log" combined
Documentation
http://httpd.apache.org/docs/2.4/logs.html#piped
http://www.gnu.org/software/sed/manual/sed.html
If you want to exclude several parameters but don't want to use a script, you can use a group like this:
CustomLog "|$/bin/sed -E s/'(email|password)=[^& \t\n]*'/'\\\\\1=\[FILTERED\]'/g >> /var/log/apache2/${APACHE_LOG_FILENAME}.access.log" combined
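The substitution itself can be tested offline before wiring it into Apache. Below is a minimal Python sketch of the same replacement the sed commands perform. The function name `redact` is made up, and the lookbehind that keeps `param` from matching inside a longer name such as `xparam` is an addition that the sed versions above do not have.

```python
import re


def redact(line, params):
    """Replace the value of each named query parameter with [FILTERED]."""
    names = "|".join(re.escape(p) for p in params)
    # Value runs until the next "&" or whitespace, as in the sed expression.
    pattern = re.compile(r"(?<![0-9A-Za-z_])(" + names + r")=[^& \t\n]*")
    return pattern.sub(r"\1=[FILTERED]", line)


log_line = (
    '127.0.0.1 - - [27/Feb/2012:13:18:12 +0100] '
    '"GET /api.php?param=secret HTTP/1.1" 200 7600'
)
print(redact(log_line, ["param"]))
print(redact("GET /api.php?email=a@b.c&password=x&id=3", ["email", "password"]))
```

A script like this could in principle be used as the piped log program, reading stdin line by line, though that setup is not shown here.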