Count IPs accessing URLs that begin with "domain/product" in Apache access_logs - regex

I am trying to use AWK to count accesses to a specific URL that always begins with "shop/product?traffic=ads", but I have not been able to get it to work.
The following code counts how often each IP address has accessed that exact URL:
awk -F'[ "]+' '$7 == "/shop/product?traffic=ads" { ipcount[$1]++ }
END { for (i in ipcount) {
printf "%15s - %d\n", i, ipcount[i] } }' /var/www/vhosts/domain.com/logs/access_ssl_log
Here is an example of the access_log (input file):
66.249.68.xx- - [19/Dec/2022:09:14:15 +0100] "GET /shop/other-product/1.0" 404 16996 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.xxx Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
109.42.242.xxx - - [19/Dec/2022:09:14:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
80.187.75.xx - - [20/Dec/2022:06:40:12 +0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
The "gclid" and the "dt" (session cookie) parameters are dynamic.
I tried experimenting with ^ after ads and before /shop, but I got no results.
For example, I want the following output:
6 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB
1 Clicks from 80.187.75.xx to https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791

You can check whether the string occurs in field 7 using index(), then store field 1 and field 7, separated by a space, as the array key, and retrieve both values in the END block by splitting on the space again.
awk -F'[ "]+' 'index($7, "/shop/product?traffic=ads") { ipcount[$1 " " $7]++ }
END {
  for (i in ipcount) {
    split(i, a, " ")
    # Use a format string so that a % in the URL is not treated as a conversion
    printf "%d Clicks from %s to %s\n", ipcount[i], a[1], a[2]
  }
}' file
Test data
66.249.68.xx- - [19/Dec/2022:09:14:15 +0100] "GET /shop/other-product/1.0" 404 16996 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.xxx Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
109.42.242.xxx - - [19/Dec/2022:09:14:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
109.42.242.xxx - - [19/Dec/2022:09:15:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
80.187.75.xx - - [20/Dec/2022:06:40:12 +0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
109.42.242.xxx - - [19/Dec/2022:09:15:55 +0100] "GET /shop/product?traffic=ads&gclid=Aj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
Output
1 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Aj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB
2 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB
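If you want to prototype the same logic outside awk, here is a minimal Python sketch of the index()-based approach. It assumes the lines are split on runs of spaces and double quotes, like -F'[ "]+' above, so the request path is the seventh field; count_clicks is a hypothetical helper name, and the sample lines are abridged from the test data (shortened gclid values and user agents).

```python
import re
from collections import Counter

# Abridged sample lines (gclid values and user agents shortened for readability).
SAMPLE = [
    '66.249.68.xx- - [19/Dec/2022:09:14:15 +0100] "GET /shop/other-product/1.0" 404 16996 "-" "Googlebot/2.1"',
    '109.42.242.xxx - - [19/Dec/2022:09:14:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0K HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0"',
    '109.42.242.xxx - - [19/Dec/2022:09:15:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0K HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0"',
    '80.187.75.xx - - [20/Dec/2022:06:40:12 +0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=EAIa&dt=1671461107791" "Mozilla/5.0"',
    '109.42.242.xxx - - [19/Dec/2022:09:15:55 +0100] "GET /shop/product?traffic=ads&gclid=Aj0K HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0"',
]


def count_clicks(lines, needle="/shop/product?traffic=ads"):
    """Count (IP, request path) pairs whose path contains needle, like awk's index()."""
    counts = Counter()
    for line in lines:
        # Equivalent of awk -F'[ "]+': split on runs of spaces and double quotes.
        fields = re.split(r'[ "]+', line)
        # awk's $1 and $7 are fields[0] and fields[6].
        if len(fields) > 6 and needle in fields[6]:
            counts[(fields[0], fields[6])] += 1
    return counts


if __name__ == "__main__":
    for (ip, path), n in sorted(count_clicks(SAMPLE).items()):
        print(f"{n} Clicks from {ip} to {path}")
```

Using a tuple as the key avoids joining the IP and URL into one string and splitting it again afterwards.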

With your shown samples, please try the following awk code. It uses the match function with the regex \/shop\/product\?traffic=ads\S+ (the / characters are escaped to match a literal /). If a match is found, it increments an array element whose index is $1, FS and the matched value. The END block then prints the values as required. Note that \S is a GNU awk (gawk) extension and may not be supported by other awk implementations; [^[:space:]] is the POSIX equivalent.
awk '
match($7,/\/shop\/product\?traffic=ads\S+/){
  value[$1 FS substr($7,RSTART,RLENGTH)]++
}
END{
  for(i in value){
    split(i,arr)
    print value[i] " Clicks from " arr[1] " to " arr[2]
  }
}
' Input_file
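A rough Python analogue of this match()-based version, for comparison: re.search plays the role of match($7, ...), and m.group(0) corresponds to substr($7, RSTART, RLENGTH). This sketch assumes default whitespace splitting (as awk does without -F), so $7 is the request path; count_ad_clicks and the abridged sample lines are illustrative only.

```python
import re
from collections import Counter

# Same pattern as the awk code; \S+ needs at least one character after "ads".
AD_URL = re.compile(r"/shop/product\?traffic=ads\S+")

# Abridged sample lines (gclid values and user agents shortened).
SAMPLE = [
    '109.42.242.xxx - - [19/Dec/2022:09:14:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0K HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0"',
    '109.42.242.xxx - - [19/Dec/2022:09:15:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0K HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0"',
    '80.187.75.xx - - [20/Dec/2022:06:40:12 +0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=EAIa&dt=1671461107791" "Mozilla/5.0"',
]


def count_ad_clicks(lines):
    counts = Counter()
    for line in lines:
        fields = line.split()  # default awk-style whitespace splitting
        if len(fields) < 7:
            continue
        m = AD_URL.search(fields[6])  # like match($7, /.../)
        if m:
            counts[(fields[0], m.group(0))] += 1
    return counts


if __name__ == "__main__":
    for (ip, url), n in sorted(count_ad_clicks(SAMPLE).items()):
        print(n, "Clicks from", ip, "to", url)
```

As with the awk version, the third line is not counted, because its request path is plain /shop/product and the ads URL only appears in the referrer.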

Related

How to parse User-Agent string with Powershell Match to get Browser and OS

I'm trying to parse an access log-file from Caddy with Powershell, and I have now gotten to the User-Agent string.
How would I go about getting the browser and operating system info out of the strings below?
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36
Mozilla/5.0 (Linux; Android 13; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Mobile Safari/537.36
Home Assistant/2023.1.1-3124 (Android 13; SM-G991B)
This is a User-Agent string from my own computer, and I can't fathom why Safari is in there when I use Chrome to access a page.
I thought about parsing the string with RegEx, but my RegEx skills are practically nonexistent.
I found a RegEx at https://regex101.com/r/2McsiK/1, but it captures a whole lot more than just the actual browser and OS:
\((?<info>.*?)\)(\s|$)|(?<name>.*?)\/(?<version>.*?)(\s|$)
and it does not seem to work well with PowerShell's -match operator, which only returns the first match:
PS C:\> "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36" -match "\((?<info>.*?)\)(\s|$)|(?<name>.*?)\/(?<version>.*?)(\s|$)" | Out-Null
PS C:\> $Matches
Name Value
---- -----
version 5.0
name Mozilla
2
0 Mozilla/5.0
Any advice would be helpful.
See Mathias' comments on the question for the perils of user-agent sniffing (parsing the user-agent string) in general.
Regex-based PowerShell-only solution:
The following tries hard to extract the relevant information, but it's impossible to tell if it will work meaningfully across all platforms and browsers, given the lack of standardization of user-agent strings.
# Sample user-agent strings, spanning
# * Windows, macOS, Linux, iOS, and Android
# * Chrome, Safari, Edge, Firefox, Opera
$userAgentStrings = @(
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Safari/605.1.15'
'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.41'
'Mozilla/5.0 (Linux; Android 13; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Mobile Safari/537.36'
'Home Assistant/2023.1.1-3124 (Android 13; SM-G991B)'
'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1'
'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0'
'Mozilla/5.0 (X11; Linux ppc64le; rv:75.0) Gecko/20100101 Firefox/75.0'
'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16.2'
)
$userAgentStrings | ForEach-Object {
  if ($_ -match '^(?<browser1>.+?) \((?<os>.+?)\)(?: (?<engine>\S+(?: \(.+?\))?)(?: Version/(?<version>\S+(?: Mobile/\S+)?))?(?: (?<browser2>\S+))?(?: \S+? (?<browser4>\S+/\S+$))?)?') {
    # Determine the true browser name and version.
    $browser = if ($Matches.browser4) { $Matches.browser4 } elseif ($Matches.browser2) { $Matches.browser2 } else { $Matches.browser1 }
    if ($Matches.version) {
      $browser = ($browser -split '/')[0] + '/' + $Matches.version
    }
    # Output the captured substrings via a custom object.
    [pscustomobject] @{
      OS       = $Matches.os
      Browser  = $browser
      Engine   = $Matches.engine
      IsMobile = $Matches.os -match '\bAndroid\b' -or $Matches.version -match '\bMobile\b'
    }
  }
}
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Output:
OS                                         Browser                      Engine                                   IsMobile
--                                         -------                      ------                                   --------
Windows NT 10.0; Win64; x64                Chrome/110.0.0.0             AppleWebKit/537.36 (KHTML, like Gecko)   False
Macintosh; Intel Mac OS X 10_15_7          Safari/16.2                  AppleWebKit/605.1.15 (KHTML, like Gecko) False
Windows NT 10.0                            Edg/110.0.1587.41            AppleWebKit/537.36 (KHTML, like Gecko)   False
Linux; Android 13; SM-G991B                Safari/537.36                AppleWebKit/537.36 (KHTML, like Gecko)   True
Android 13; SM-G991B                       Home Assistant/2023.1.1-3124                                          True
iPhone; CPU iPhone OS 16_0_3 like Mac OS X Safari/16.0 Mobile/15E148    AppleWebKit/605.1.15 (KHTML, like Gecko) True
iPad; CPU OS 6_0 like Mac OS X             Safari/6.0 Mobile/10A5355d   AppleWebKit/536.26 (KHTML, like Gecko)   True
Macintosh; Intel Mac OS X 10.15; rv:101.0  Firefox/101.0                Gecko/20100101                           False
X11; Linux ppc64le; rv:75.0                Firefox/75.0                 Gecko/20100101                           False
X11; Linux i686; Ubuntu/14.10              Opera/12.16.2                Presto/2.12.388                          False
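If you want to test the regex outside PowerShell, it ports to Python almost unchanged; the only syntax change is that .NET's named groups (?<name>...) are written (?P<name>...) in Python. This is a sketch, not a drop-in replacement: parse_user_agent is a hypothetical name, and the selection logic simply mirrors the PowerShell code above (browser4, then browser2, then browser1, with Version/ overriding the version).

```python
import re

# The regex from the PowerShell answer, with (?<name>...) rewritten as (?P<name>...).
UA_RE = re.compile(
    r"^(?P<browser1>.+?) \((?P<os>.+?)\)"
    r"(?: (?P<engine>\S+(?: \(.+?\))?)"
    r"(?: Version/(?P<version>\S+(?: Mobile/\S+)?))?"
    r"(?: (?P<browser2>\S+))?"
    r"(?: \S+? (?P<browser4>\S+/\S+$))?)?"
)


def parse_user_agent(ua):
    """Return a dict shaped like the PowerShell custom object, or None if no match."""
    m = UA_RE.match(ua)
    if not m:
        return None
    g = m.groupdict()
    # Same precedence as the PowerShell code.
    browser = g["browser4"] or g["browser2"] or g["browser1"]
    if g["version"]:
        browser = browser.split("/")[0] + "/" + g["version"]
    is_mobile = bool(
        re.search(r"\bAndroid\b", g["os"])
        or (g["version"] and re.search(r"\bMobile\b", g["version"]))
    )
    return {"OS": g["os"], "Browser": browser, "Engine": g["engine"], "IsMobile": is_mobile}


if __name__ == "__main__":
    print(parse_user_agent("Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16.2"))
```

The same caveat applies as for the PowerShell version: user-agent strings are not standardized, so this only covers the shapes in the sample list.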
More complete, web-service-based PowerShell solution:
https://useragentstring.com/pages/api.php offers an API that returns the parsed components as a JSON object, which a call via Invoke-RestMethod automatically converts to a PowerShell custom object.
While slower, this solution is more complete than the pure PowerShell solution, though it omits OS details.
$userAgentStrings = @(
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Safari/605.1.15'
'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.41'
'Mozilla/5.0 (Linux; Android 13; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Mobile Safari/537.36'
'Home Assistant/2023.1.1-3124 (Android 13; SM-G991B)'
'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1'
'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0'
'Mozilla/5.0 (X11; Linux ppc64le; rv:75.0) Gecko/20100101 Firefox/75.0'
'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16.2'
)
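$userAgentStrings |
  ForEach-Object {
    # URL-encode the user-agent string before embedding it in the query string.
    Invoke-RestMethod ('https://useragentstring.com?uas={0}&getJSON=all' -f [uri]::EscapeDataString($_))
  } |
  Format-Table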
Output:
agent_type agent_name agent_version os_type os_name os_versionName os_versionNumber os_producer os_producerURL linux_distibution
---------- ---------- ------------- ------- ------- -------------- ---------------- ----------- -------------- -----------------
Browser Chrome 110.0.0.0 Windows Windows 10 Null
Browser Safari 16.2 Macintosh OS X 10_15_7 Null
Browser Chrome 110.0.0.0 Windows Windows 10 Null
Browser Android Webkit Browser -- Android Android 13 Null
unknown unknown Android Android 13 Null
Browser Safari 16.0 Macintosh iPhone OS 16_0_3 Null
Browser Safari 6.0 Macintosh iPhone OS 6_0 Null
Browser Firefox 101.0 Macintosh OS X 10.15 Null
Browser Firefox 75.0 Linux Linux Null
Browser Opera 12.16.2 Linux Linux Ubuntu
See the following. You will need to trim the spaces.
$data = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
$pattern = "(?<browser>[^\s]+)\s*(\((?<comment>[^)]+)\))?"
$match = $data | select-string $pattern -AllMatches
foreach($m in $match.Matches)
{
    $m
    Write-Host "Browser = " $m.Groups['browser'], "Comment = " $m.Groups['comment']
}
Results
Groups : {0, 1, browser, comment}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 41
Value : Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Browser = Mozilla/5.0 Comment = Windows NT 10.0; Win64; x64
Groups : {0, 1, browser, comment}
Success : True
Name : 0
Captures : {0}
Index : 42
Length : 38
Value : AppleWebKit/537.36 (KHTML, like Gecko)
Browser = AppleWebKit/537.36 Comment = KHTML, like Gecko
Groups : {0, 1, browser, comment}
Success : True
Name : 0
Captures : {0}
Index : 81
Length : 17
Value : Chrome/110.0.0.0
Browser = Chrome/110.0.0.0 Comment =
Groups : {0, 1, browser, comment}
Success : True
Name : 0
Captures : {0}
Index : 98
Length : 13
Value : Safari/537.36
Browser = Safari/537.36 Comment =
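The same tokenizing pattern behaves the same way in Python's re module, where finditer returns every match much like Select-String -AllMatches. A minimal sketch, assuming the (?<name>...) groups are rewritten as (?P<name>...); tokenize_user_agent is a hypothetical helper. The named groups contain no surrounding whitespace; only the whole match (m.group(0)) includes the trailing space, which is the part that needs trimming.

```python
import re

# The Select-String pattern, with .NET's (?<name>...) written as Python's (?P<name>...).
TOKEN_RE = re.compile(r"(?P<browser>[^\s]+)\s*(\((?P<comment>[^)]+)\))?")


def tokenize_user_agent(ua):
    """Return (product, comment) pairs; comment is None when no parenthesized part follows."""
    return [(m.group("browser"), m.group("comment")) for m in TOKEN_RE.finditer(ua)]


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36")
    for product, comment in tokenize_user_agent(ua):
        print("Browser =", product, " Comment =", comment or "")
```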

Oauth2-proxy - 404 error when redirecting to upstream url (Django application web page)

I'm trying to protect a Django application with oauth2-proxy.
In the oauth2-proxy configuration (version 7.2.1 or 7.3.0):
When the upstream URL is set to something like this: --upstream="http://127.0.0.1:8000"
the redirection works fine (and it returns a home page I have defined in the application).
But if I use an upstream like this: --upstream="http://127.0.0.1:8000/hello"
it returns a 404 error instead of the hello page, which is also defined in the application.
The page http://127.0.0.1:8000/hello works fine when invoked directly, and Django logs "GET /hello HTTP/1.1" 200 136.
So I would say it is not a problem with the page.
This is the command line I'm using:
oauth2-proxy.exe ^
--http-address=127.0.0.1:4180 ^
--email-domain=* ^
--cookie-secure=false ^
--cookie-secret=adqeqpioqr809718 ^
--upstream="http://127.0.0.1:8000/hello" ^
--redirect-url=http://127.0.0.1:4180/oauth2/callback ^
--oidc-issuer-url=http://127.0.0.1:28081/auth/realms/testrealm ^
--insecure-oidc-allow-unverified-email=true ^
--provider=keycloak-oidc ^
--client-id=oauth2_proxy ^
--ssl-insecure-skip-verify=true ^
--client-secret=L2znXLhGX4N0j3nsZYxDKfdYpXHMGDkX ^
--skip-provider-button=true
When oauth2-proxy redirects successfully (--upstream="http://127.0.0.1:8000"), I get the page and the following output:
This is the output for the oauth2-proxy:
[2022/09/08 10:52:06] [proxy.go:89] mapping path "/" => upstream "http://127.0.0.1:8000"
[2022/09/08 10:52:06] [oauthproxy.go:148] OAuthProxy configured for Keycloak OIDC Client ID: oauth2_proxy
[2022/09/08 10:52:06] [oauthproxy.go:154] Cookie settings: name:_oauth2_proxy secure(https):false httponly:true expiry:168h0m0s domains: path:/ samesite: refresh:disabled
[2022/09/08 10:57:01] [oauthproxy.go:866] No valid authentication in request. Initiating login.
127.0.0.1:54337 - 9bbfcf75-da91-487a-a55e-40472e4adb23 - - [2022/09/08 10:57:01] 127.0.0.1:4180 GET - "/" HTTP/1.1 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.27" 302 380 0.001
127.0.0.1:54337 - e0d8ed12-e4dd-4da6-9fbb-cf689fc53f8f - mail@gmail.com [2022/09/08 10:57:09] [AuthSuccess] Authenticated via OAuth2: Session{email:mail@gmail.com user:93547bcc-2441-414a-9149-c7533c4f5d23 PreferredUsername:testuser token:true id_token:true created:2022-09-08 10:57:09.789934 -0300 -03 m=+303.019857301 expires:2022-09-08 11:02:09.7839238 -0300 -03 m=+603.013847101 refresh_token:true groups:[role:offline_access role:uma_authorization role:default-roles-testrealm role:account:manage-account role:account:manage-account-links role:account:view-profile]}
[2022/09/08 10:57:09] [session_store.go:163] WARNING: Multiple cookies are required for this session as it exceeds the 4kb cookie limit. Please use server side session storage (eg. Redis) instead.
127.0.0.1:54337 - e0d8ed12-e4dd-4da6-9fbb-cf689fc53f8f - - [2022/09/08 10:57:09] 127.0.0.1:4180 GET - "/oauth2/callback?state=ahuKzCYr7jR4P4mmjniIt67TttZKyxGv4mLfEwKlQio%3A%2F&session_state=86ac9bd1-9756-4916-83e9-ec0496b5b767&code=df3940e5-58f5-49ac-a821-5607f0f2faae.86ac9bd1-9756-4916-83e9-ec0496b5b767.cd30a162-8e4d-4a2d-bff6-168e444aed92" HTTP/1.1 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.27" 302 24 0.029
127.0.0.1:54337 - d58ace6e-afe9-4737-9b12-dbc17fdd0ca2 - mail@gmail.com [2022/09/08 10:57:09] 127.0.0.1:4180 GET / "/" HTTP/1.1 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.27" 200 138 0.005
On the Django side I get:
"GET / HTTP/1.1" 200 138
When oauth2-proxy fails to redirect (--upstream="http://127.0.0.1:8000/hello"), I get the following output:
This is the output for the oauth2-proxy:
[2022/09/08 10:33:58] [proxy.go:89] mapping path "/hello" => upstream "http://127.0.0.1:8000/hello"
[2022/09/08 10:33:58] [oauthproxy.go:148] OAuthProxy configured for Keycloak OIDC Client ID: oauth2_proxy
[2022/09/08 10:33:58] [oauthproxy.go:154] Cookie settings: name:_oauth2_proxy secure(https):false httponly:true expiry:168h0m0s domains: path:/ samesite: refresh:disabled
[2022/09/08 10:37:20] [oauthproxy.go:866] No valid authentication in request. Initiating login.
127.0.0.1:53615 - 54c0f3d8-b3c0-4d48-8353-fe69be0e4500 - - [2022/09/08 10:37:20] 127.0.0.1:4180 GET - "/" HTTP/1.1 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.27" 302 380 0.001
127.0.0.1:53615 - 0bec934e-05a3-4cc8-9306-fffc28597c8f - mail@gmail.com [2022/09/08 10:37:28] [AuthSuccess] Authenticated via OAuth2: Session{email:mail@gmail.com user:93547bcc-2441-414a-9149-c7533c4f5d23 PreferredUsername:testuser token:true id_token:true created:2022-09-08 10:37:28.6527488 -0300 -03 m=+210.486252601 expires:2022-09-08 10:42:28.6468518 -0300 -03 m=+510.480355601 refresh_token:true groups:[role:offline_access role:uma_authorization role:default-roles-testrealm role:account:manage-account role:account:manage-account-links role:account:view-profile]}
[2022/09/08 10:37:28] [session_store.go:163] WARNING: Multiple cookies are required for this session as it exceeds the 4kb cookie limit. Please use server side session storage (eg. Redis) instead.
127.0.0.1:53615 - 0bec934e-05a3-4cc8-9306-fffc28597c8f - - [2022/09/08 10:37:28] 127.0.0.1:4180 GET - "/oauth2/callback?state=nox0LM3fIlVU1kamoLBaktByeLCcIWiBvRLdHFIuhd4%3A%2F&session_state=808c0654-c9e7-4593-b5dc-95d3231438ea&code=e220414d-e949-4e2d-8d33-55de96f8f5d4.808c0654-c9e7-4593-b5dc-95d3231438ea.cd30a162-8e4d-4a2d-bff6-168e444aed92" HTTP/1.1 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.27" 302 24 0.024
127.0.0.1:53615 - 9454773f-cade-46fe-870f-70d09fc49ffb - mail@gmail.com [2022/09/08 10:37:28] 127.0.0.1:4180 GET - "/" HTTP/1.1 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.27" 404 19 0.000
On the Django side I get:
Nothing, because the Django app is never reached, so there are no logs.
Could you please help me find out what could be happening? I would really appreciate it!
It doesn't seem to be a problem with the application, as the pages work fine when invoked directly.
If it is a mistake in my oauth2-proxy command line/configuration, I would appreciate it if someone could point me to the error so I can correct it.
Otherwise, any hint would also be much appreciated.
The only thing I've noticed in the oauth2-proxy logs is that no matter what I put in --upstream, the final GET (which I think is the redirection to the upstream) is always GET - "/" ... It is the same in both attempts, and it only succeeds in the first one, because there it matches [proxy.go:89] mapping path "/".
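The reason for the 404 error was that --upstream sets the URL to which the proxy passes the request once it is authenticated, but the proxy is not going to redirect to that address unless you specifically ask for it in the original request. With --upstream="http://127.0.0.1:8000/hello", the log line mapping path "/hello" => upstream "http://127.0.0.1:8000/hello" shows that only the path /hello is mapped, so the final request for "/" after login has no matching upstream and gets a 404 from the proxy.
So the correct way of making the request is http://127.0.0.1:4180/hello, which includes the whole path to the endpoint you want to reach (instead of, for example, http://127.0.0.1:4180).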
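To make the routing behavior concrete, here is a toy Python model. It is not oauth2-proxy's implementation: it simply assumes that a request is forwarded only when its path falls under a configured mapping path (the "mapping path" shown in the [proxy.go:89] log lines) and is otherwise answered with 404. The route function and its longest-prefix rule are illustrative assumptions.

```python
def route(request_path, mappings):
    """Toy model (assumption, not oauth2-proxy's actual code): return the upstream
    for the longest mapping path that request_path falls under, or None (404)."""
    best = None
    for prefix in mappings:
        under = request_path == prefix or request_path.startswith(prefix.rstrip("/") + "/")
        if under and (best is None or len(prefix) > len(best)):
            best = prefix
    return mappings[best] if best is not None else None


if __name__ == "__main__":
    hello_only = {"/hello": "http://127.0.0.1:8000/hello"}
    print(route("/", hello_only))       # None -> 404, like the final GET "/" after login
    print(route("/hello", hello_only))  # forwarded to the /hello upstream
```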

How can I exclude search pattern within double quotes in Notepad++

I have the following line, in which I want to replace each space with a tab but keep the spaces within double quotes as they are. I am using Notepad++.
[11/May/2020:10:10:20 -0400] "GET / HTTP/1.1" 302 523 52197 url.com - - TLSv1.2 19922 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" https://somelinkhere - -
Desired output (each space outside the double quotes replaced by a tab; the tabs are not visible here):
[11/May/2020:10:10:20 -0400] "GET / HTTP/1.1" 302 523 52197 url.com - - TLSv1.2 19922 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" https://somelinkhere - -
With the following regex I was able to select the strings within double quotes, but that is of no use to me on its own.
"([^"]*)"
Can you please help me figure out how this can be achieved?
You can use
("[^"]*")|[ ]
Replace with (?1$1:\t).
Details:
("[^"]*") - Capturing group 1: a ", then zero or more chars other than " and then a "
| - or
[ ] - matches a space (you may remove the [ and ] here; they are only used to make the space visible in the answer).
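Outside Notepad++, the same idea (match either a whole quoted string or a single space, then decide per match what to put back) can be expressed with a replacement function. A minimal Python sketch: re.sub with a callable stands in for Notepad++'s conditional replacement (?1$1:\t), and spaces_to_tabs_outside_quotes is a hypothetical name.

```python
import re

# Group 1 captures a whole double-quoted string; the other branch is a single space.
PATTERN = re.compile(r'("[^"]*")| ')


def spaces_to_tabs_outside_quotes(line):
    # Like (?1$1:\t): keep group 1 when it matched, otherwise emit a tab.
    return PATTERN.sub(lambda m: m.group(1) if m.group(1) is not None else "\t", line)


if __name__ == "__main__":
    print(spaces_to_tabs_outside_quotes('302 url.com "GET / HTTP/1.1" - -'))
```

Because the quoted string is consumed as a single match, the spaces inside it are never seen by the space branch.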

Is it possible to write multiple regex for the same input in Fluent Bit?

My logs look like this:
200 59903 0.056 - [24/Jun/2020:00:06:56 +0530] "GET /xxxxx/xxxxx/xxxxx HTTP/1.1" xxxxx.com [xxxx:4900:xxxx:b798:xxxx:c8ba:xxxx:6a23] - - xxx.xxx.xxx.xxx - - - "http://xxxxx/xxxxx/xxxxx" 164551836 1 HIT "-" "-" "Mozilla/5.0 (Linux; Android 9; Mi A1 Build/PKQ1.180917.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/77.0.3865.92 Mobile Safari/537.36" "-" "-" "dhDebug=-" "-" - -
200 11485 0.000 - [24/Jun/2020:00:06:56 +0530] "GET /xxxxx/xxxxx/xxxxx/xxxxx HTTP/1.1" xxxxx.com xxx.xxx.xxx.xxx - - xxx.xxx.xxx.xxx - - - "-" 164551710 7 HIT "-" "-" "Dalvik/2.1.0 (Linux; U; Android 9; vivo 1915 Build/PPR1.180610.011)" "-" "-" "dhDebug=appVersion=13.0.8&osVersion=9&clientId=1271210612&conn_type=4G&conn_quality=NO_CONNECTION&sessionSource=organic&featureMask=1879044085&featureMaskV1=635" "-" 40 -
The two log lines are almost the same, except that the second one contains detailed dhDebug output.
This is what my parsers.conf looks like:
[PARSER]
Name head
Format regex
Regex (?<responseCode>\d{3})\s(?<responseSize>\d+)\s(?<responseTime>\d+.\d+)\s.*?\s\[(?<time>.*?)\]\s"(?<method>.*?)\s(?<url1>.*?)\s(?<protocol>.*?)"\s(?<servedBy>.*?)\s(?<Akamai_ip1>.*?)\s(?<ClientId_ip2>.*?)\s(?<ip3>.*?)\s(?<lb_ip4>.*?)\s(?<ip5>.*?)\s(?<ip6>.*?)\s(?<ip7>.*?)\s+"(?<url2>.*?)".*?".*?"\s".*?"\s"(?<agentInfo>.*?)"
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
Time_Keep On
Types responseTime:float
Please suggest how to extract the dhDebug information into separate key-value pairs with a single regex that works for both types of log lines.
You can use (?:case1|case2), where case1 matches the empty value (dhDebug=-) and case2 matches the populated one.
So the regex will be:
(?<responseCode>\d{3})\s(?<responseSize>\d+)\s(?<responseTime>\d+.\d+)\s.*?\s\[(?<time>.*?)\]\s"(?<method>.*?)\s(?<url1>.*?)\s(?<protocol>.*?)"\s(?<servedBy>.*?)\s(?<Akamai_ip1>.*?)\s(?<ClientId_ip2>.*?)\s(?<ip3>.*?)\s(?<lb_ip4>.*?)\s(?<ip5>.*?)\s(?<ip6>.*?)\s(?<ip7>.*?)\s+"(?<url2>.*?)".*?".*?"\s".*?"\s"(?<agentInfo>.*?)"\s"-"\s"-"\s"dhDebug=(?:-|appVersion=(?<appVersion>.*?)&osVersion=(?<osVersion>.*?)&clientId=(?<clientId>.*?)&conn_type=(?<conn_type>.*?)&conn_quality=(?<conn_quality>.*?)&sessionSource=(?<sessionSource>.*?)&featureMask=(?<featureMask>.*?)&featureMaskV1=(?<featureMaskV1>.*?))"
With this, you get null for each dhDebug field in the first log line, and the field names with their values for the second one.
You can test it at http://grokdebug.herokuapp.com/
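You can also check the optional alternation in isolation with Python's re module. This sketch assumes only the dhDebug tail matters and rewrites the (?<name>...) groups as Python's (?P<name>...); Fluent Bit itself uses the Onigmo regex engine, so behavior could differ in edge cases. dhdebug_fields is a hypothetical helper, and the test strings are the tails of the two sample log lines.

```python
import re

# Only the dhDebug tail of the Fluent Bit regex, with (?<name>...) rewritten as (?P<name>...).
DHDEBUG_RE = re.compile(
    r'"dhDebug=(?:-|appVersion=(?P<appVersion>.*?)&osVersion=(?P<osVersion>.*?)'
    r'&clientId=(?P<clientId>.*?)&conn_type=(?P<conn_type>.*?)'
    r'&conn_quality=(?P<conn_quality>.*?)&sessionSource=(?P<sessionSource>.*?)'
    r'&featureMask=(?P<featureMask>.*?)&featureMaskV1=(?P<featureMaskV1>.*?))"'
)


def dhdebug_fields(line):
    """Return the named dhDebug groups; every value is None for "dhDebug=-"."""
    m = DHDEBUG_RE.search(line)
    return m.groupdict() if m else None


# Tails of the two sample log lines from the question.
EMPTY = '"-" "-" "dhDebug=-" "-" - -'
FULL = ('"-" "-" "dhDebug=appVersion=13.0.8&osVersion=9&clientId=1271210612&conn_type=4G'
        '&conn_quality=NO_CONNECTION&sessionSource=organic&featureMask=1879044085'
        '&featureMaskV1=635" "-" 40 -')

if __name__ == "__main__":
    print(dhdebug_fields(EMPTY))
    print(dhdebug_fields(FULL))
```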

Regex number range parsing

I am trying to parse out a specific number range, and can't seem to get it right. I am looking to extract specific browser versions from user agent strings. For example, I want to parse Chrome 1-42 and Firefox 1-40, but I can't figure out the syntax.
What I have so far is this, which kind of works, but it grabs the first number it sees and doesn't respect the 2-digit range:
Gecko..Chrome/([1-9].|[1-4][1-2].)
Sample:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.1847.137 Safari/537.36
Firefox 29: Mozilla/5.0 (Android; Mobile; rv:29.0) Gecko/29.0 Firefox/23.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:29.0) Gecko/20100101 Firefox/29.0
Any ideas? TIA.
((?:(?:Firefox\/(?:[1-9]|[1-3][0-9]|40))|(?:Chrome\/(?:[1-9]|[1-3][0-9]|4[0-2])))\.[^ ]+)
Is this what you would like?
Demo:
https://regex101.com/r/gH1nU9/2
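A quick Python check of the range idea against the question's targets (Chrome 1-42, Firefox 1-40). This sketch ties each numeric range directly to the product name the question asks about; old_browser_version is a hypothetical helper, and the test strings come from the samples in this thread.

```python
import re

# Chrome 1-42 or Firefox 1-40, followed by the rest of the dotted version string.
OLD_BROWSER_RE = re.compile(
    r"((?:Chrome/(?:[1-9]|[1-3][0-9]|4[0-2])|Firefox/(?:[1-9]|[1-3][0-9]|40))\.[^ ]+)"
)


def old_browser_version(ua):
    """Return e.g. 'Chrome/34.0.1847.137' for an in-range version, else None."""
    m = OLD_BROWSER_RE.search(ua)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(old_browser_version(
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/34.0.1847.137 Safari/537.36"))
```

Requiring the literal \. right after the number is what stops a three-digit version such as Chrome/110 from matching on its first one or two digits.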
Because regex only matches text and numbers are treated as text, to match a range like 1 to 42 you would have to write something like this:
\b[1-9]\b|\b[1-3][0-9]\b|\b4[0-2]\b
This matches 1 to 9, 10 to 39, or 40 to 42. I have added the word boundaries \b so that only these numbers are matched, and not, for example, the 42 inside 142.
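A small Python check of the boundary idea. This sketch groups the alternatives as \b(?:...)\b, which applies the word boundaries to every alternative at once; the sample number list is made up for illustration.

```python
import re

# \b(?:...)\b puts the word boundaries around every alternative at once.
RANGE_1_TO_42 = re.compile(r"\b(?:[1-9]|[1-3][0-9]|4[0-2])\b")

if __name__ == "__main__":
    print(RANGE_1_TO_42.findall("0 1 9 10 39 40 42 43 142 100"))
    # ['1', '9', '10', '39', '40', '42']
    # Without a leading \b, the last alternative also matches inside longer numbers:
    print(re.findall(r"4[0-2]\b", "142"))
    # ['42']
```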