Blackberry 5.0 UA-String Regex - regex

I am trying to create a regex for the following UA string:
Mozilla/5.0 (BlackBerry; U; BlackBerry 9850; en-US) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.0.0.115 Mobile Safari/534.11+
I want to know whether the device is a BlackBerry 5.0 so I can serve a non-AJAX jQuery Mobile site.
I can match the Mozilla/5.0 part, but I'm really struggling to match the word BlackBerry.
Can anyone help?

According to this administrator's Blackberry Support Community Forums Post, AJAX support for BlackBerry phones was released with version 4.6
According to this list, BlackBerry UA Strings have always contained the word BlackBerry, have usually contained the phone model number, and have always contained a version number:
Mozilla/5.0 (BlackBerry; U; BlackBerry 9860; en-GB) AppleWebKit/534.11+ (KHTML, like Gecko) Version/7.0.0.296 Mobile Safari/534.11+
Mozilla/5.0 (BlackBerry; U; BlackBerry 9300; fr) AppleWebKit/534.8+ (KHTML, like Gecko) Version/6.0.0.570 Mobile Safari/534.8+
Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en-US) AppleWebKit/534.8+ (KHTML, like Gecko) Version/6.0.0.600 Mobile Safari/534.8+
Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en-US) AppleWebKit/534.1+ (KHTML, like Gecko) Version/6.0.0.246 Mobile Safari/534.1+
Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en) AppleWebKit/534.1+ (KHTML, Like Gecko) Version/6.0.0.141 Mobile Safari/534.1+
Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en-US) AppleWebKit/530.17 (KHTML, like Gecko) Version/6.0.0.62 Mobile Safari/530.17
BlackBerry9650/5.0.0.732 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/105
BlackBerry9700/5.0.0.351 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/123
BlackBerry9630/4.7.1.40 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
BlackBerry9000/4.6.0.167 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/102
BlackBerry8330/4.3.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
BlackBerry8830/4.2.2 Profile/MIDP-2.0 Configuration/CLOC-1.1 VendorID/105
BlackBerry8820/4.2.2 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/102
BlackBerry8703e/4.1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
BlackBerry8320/4.5.0.188 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/100
BlackBerry8330/4.3.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/106
BlackBerry8320/4.3.1 Profile/MIDP-2.0 Configuration/CLDC-1.1
BlackBerry8110/4.3.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/118
BlackBerry8130/4.5.0.89 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/106
BlackBerry7100i/4.1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/103
BlackBerry7130e/4.1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/104
BlackBerry7250/4.0.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
BlackBerry/3.6.0
BlackBerry7230/3.7.0
BlackBerry7230/3.7.1
BlackBerry7730/3.7.0
BlackBerry7730/3.7.1 UP.Link/5.1.2.5
The version number appears in one of two places: either directly after the word BlackBerry, an optional model number made of digits and letters, and a forward slash (/); or well after the word BlackBerry, immediately following the string Version/.
Using this expression:
BlackBerry(\w*/|.*?Version/)(((?:[0-3]|4\.[0-5])\.[.\d+]+)|((?:4\.[6-9]|1?[5-9])[.\d+]+)|([\w.]+))?
In a find-type regular expression parser (like PHP's preg_match(), .Net's Regex.Match(), or Java's matcher.find() functions) this expression will allow discerning between a version number from 0.0.X to 4.5.X and a version number from 4.6.X to X.X.X where X represents any number not previously matched.
What's that now? Sorry... in other words, using that regex against a user agent string should allow you to determine whether it's a BlackBerry browser or not AND whether the version number indicates support for AJAX (pseudo-code):
regex = "BlackBerry(\w*/|.*?Version/)(((?:[0-3]|4\.[0-5])\.[.\d+]+)|((?:4\.[6-9]|1?[5-9])[.\d+]+)|([\w.]+))?";
result = regex.find(UserAgentString);
if (result.matchFound)
{
    // Group 2 holds the version number, whichever branch matched it.
    actualVersion = result.matchGroup(2);
    if (result.matchGroup(3) != "")
    {
        // Versions 0.0.X to 4.5.X
        print("Version " + actualVersion + " does not support AJAX");
    }
    else if (result.matchGroup(4) != "")
    {
        // Versions 4.6.X and later
        print("Version " + actualVersion + " supports AJAX!");
    }
    else if (result.matchGroup(5) != "")
    {
        // Version string that does not follow the usual convention
        print("Unknown whether Version " + actualVersion + " supports AJAX!?!?");
    }
}
else
{
    print("Not A BlackBerry Browser");
}
Summary: Match group #1 (could be made optional) matches the part between BlackBerry and any matched version number. Group #2 matches the version number. Group #3 contains the version number if it is 0.0.X to 4.5.X. Group #4 contains the version number if it is 4.6.X or greater, if it only consists of digits and decimal points. If the version does not seem to match this convention, possibly if there are letters or underscores as well, then it will be captured into Group #5.
I think this is all you need (once translated into whichever language you are using). The expression should be supported by .Net, Java, PHP, or even JavaScript if necessary.
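As one possible translation of the pseudo-code, here is a minimal Python sketch. The expression is copied unchanged from above; the function name `classify` and the labels it returns are illustrative choices, not part of the original answer. One observation from trying the pattern: major versions 10 to 14 match neither group 3 nor group 4, so they fall through to group 5 and are reported as "unknown".

```python
import re

# The expression from the answer above, split across lines for readability.
BLACKBERRY_RE = re.compile(
    r"BlackBerry(\w*/|.*?Version/)"       # group 1: text between "BlackBerry" and the version
    r"(((?:[0-3]|4\.[0-5])\.[.\d+]+)"     # group 3: versions 0.0.X to 4.5.X
    r"|((?:4\.[6-9]|1?[5-9])[.\d+]+)"     # group 4: 4.6.X to 4.9.X, 5 to 9, 15 to 19
    r"|([\w.]+))?"                        # group 5: any other version-like text
)


def classify(user_agent):
    """Return (label, version). The labels are hypothetical names for this sketch."""
    match = BLACKBERRY_RE.search(user_agent)
    if match is None:
        return ("not-blackberry", None)
    version = match.group(2)  # the version number, whichever branch matched
    if match.group(3):
        return ("no-ajax", version)
    if match.group(4):
        return ("ajax", version)
    return ("unknown", version)


if __name__ == "__main__":
    print(classify("BlackBerry9700/5.0.0.351 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/123"))
```

Python returns `None` for a group that did not participate in the match, so the checks test truthiness instead of comparing with an empty string.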

If you just need to make sure that the term BlackBerry is in the text, you could do a simple, case-insensitive text search (it doesn't even need to be a regex):
BlackBerry
If you need to make sure that it comes after the Mozilla/5.0 bit, then you could just use this regex:
^Mozilla/5\.0 \(BlackBerry
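Both checks can be sketched in a few lines of Python. The variable names are made up for illustration; the sample string is the one from the question. The user-agent strings in this thread spell the name "BlackBerry", so the plain substring test lowercases the input first.

```python
import re

ua = ("Mozilla/5.0 (BlackBerry; U; BlackBerry 9850; en-US) AppleWebKit/534.11+ "
      "(KHTML, like Gecko) Version/7.0.0.115 Mobile Safari/534.11+")

# Plain substring search, made case-insensitive by lowercasing the input.
has_term = "blackberry" in ua.lower()

# Anchored regex: "BlackBerry" must directly follow "Mozilla/5.0 (".
starts_with_prefix = re.match(r"^Mozilla/5\.0 \(BlackBerry", ua) is not None

print(has_term, starts_with_prefix)
```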

Related

How to parse User-Agent string with Powershell Match to get Browser and OS

I'm trying to parse an access log-file from Caddy with Powershell, and I have now gotten to the User-Agent string.
How would I go about getting the Browser and Operating System info out of the below string?
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36
Mozilla/5.0 (Linux; Android 13; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Mobile Safari/537.36
Home Assistant/2023.1.1-3124 (Android 13; SM-G991B)
This is a User-Agent string from my own computer, and I can't fathom why Safari is in there when I use Chrome to access a page.
I thought about parsing the string with RegEx, but my RegEx skills barely exist.
I found a RegEx from https://regex101.com/r/2McsiK/1, but it captures a whole lot more than just the actual browser and OS
\((?<info>.*?)\)(\s|$)|(?<name>.*?)\/(?<version>.*?)(\s|$)
and it does not seem to work well with Powershell Match.
PS C:\> "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36" -match "\((?<info>.*?)\)(\s|$)|(?<name>.*?)\/(?<version>.*?)(\s|$)" | Out-Null
PS C:\> $Matches
Name Value
---- -----
version 5.0
name Mozilla
2
0 Mozilla/5.0
Any advice would be helpful.
See Mathias' comments on the question for the perils of user-agent sniffing (parsing the user-agent string) in general.
Regex-based PowerShell-only solution:
The following tries hard to extract the relevant information, but it's impossible to tell if it will work meaningfully across all platforms and browsers, given the lack of standardization of user-agent strings.
# Sample user-agent strings, spanning
# * Windows, macOS, Linux, iOS, and Android
# * Chrome, Safari, Edge, Firefox, Opera
$userAgentStrings = @(
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Safari/605.1.15'
'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.41'
'Mozilla/5.0 (Linux; Android 13; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Mobile Safari/537.36'
'Home Assistant/2023.1.1-3124 (Android 13; SM-G991B)'
'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1'
'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0'
'Mozilla/5.0 (X11; Linux ppc64le; rv:75.0) Gecko/20100101 Firefox/75.0'
'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16.2'
)
$userAgentStrings | ForEach-Object {
  if ($_ -match '^(?<browser1>.+?) \((?<os>.+?)\)(?: (?<engine>\S+(?: \(.+?\))?)(?: Version/(?<version>\S+(?: Mobile/\S+)?))?(?: (?<browser2>\S+))?(?: \S+? (?<browser4>\S+/\S+$))?)?') {
    # Determine the true browser name and version.
    $browser = if ($Matches.browser4) { $Matches.browser4 } elseif ($Matches.browser2) { $Matches.browser2 } else { $Matches.browser1 }
    if ($Matches.version) {
      $browser = ($browser -split '/')[0] + '/' + $Matches.version
    }
    # Output the captured substrings via a custom object.
    [pscustomobject] @{
      OS = $Matches.os
      Browser = $browser
      Engine = $Matches.engine
      IsMobile = $Matches.os -match '\bAndroid\b' -or $Matches.version -match '\bMobile\b'
    }
  }
}
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Output:
OS                                         Browser                      Engine                                 IsMobile
--                                         -------                      ------                                 --------
Windows NT 10.0; Win64; x64                Chrome/110.0.0.0             AppleWebKit/537.36 (KHTML, like Gecko) False
Macintosh; Intel Mac OS X 10_15_7          Safari/16.2                  AppleWebKit/605.1.15 (KHTML, like Gecko) False
Windows NT 10.0                            Edg/110.0.1587.41            AppleWebKit/537.36 (KHTML, like Gecko) False
Linux; Android 13; SM-G991B                Safari/537.36                AppleWebKit/537.36 (KHTML, like Gecko) True
Android 13; SM-G991B                       Home Assistant/2023.1.1-3124                                        True
iPhone; CPU iPhone OS 16_0_3 like Mac OS X Safari/16.0 Mobile/15E148    AppleWebKit/605.1.15 (KHTML, like Gecko) True
iPad; CPU OS 6_0 like Mac OS X             Safari/6.0 Mobile/10A5355d   AppleWebKit/536.26 (KHTML, like Gecko) True
Macintosh; Intel Mac OS X 10.15; rv:101.0  Firefox/101.0                Gecko/20100101                         False
X11; Linux ppc64le; rv:75.0                Firefox/75.0                 Gecko/20100101                         False
X11; Linux i686; Ubuntu/14.10              Opera/12.16.2                Presto/2.12.388                        False
More complete, web-service-based PowerShell solution:
https://useragentstring.com/pages/api.php offers an API that returns the parsed components as a JSON object, which a call via Invoke-RestMethod automatically converts to a PowerShell custom object.
While slower, this solution is more complete than the pure PowerShell solution, though it omits OS details.
$userAgentStrings = @(
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Safari/605.1.15'
'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.41'
'Mozilla/5.0 (Linux; Android 13; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Mobile Safari/537.36'
'Home Assistant/2023.1.1-3124 (Android 13; SM-G991B)'
'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1'
'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0'
'Mozilla/5.0 (X11; Linux ppc64le; rv:75.0) Gecko/20100101 Firefox/75.0'
'Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16.2'
)
$userAgentStrings |
ForEach-Object {
Invoke-RestMethod ('https://useragentstring.com?uas={0}&getJSON=all' -f $_)
} |
Format-Table
Output:
agent_type agent_name agent_version os_type os_name os_versionName os_versionNumber os_producer os_producerURL linux_distibution
---------- ---------- ------------- ------- ------- -------------- ---------------- ----------- -------------- -----------------
Browser Chrome 110.0.0.0 Windows Windows 10 Null
Browser Safari 16.2 Macintosh OS X 10_15_7 Null
Browser Chrome 110.0.0.0 Windows Windows 10 Null
Browser Android Webkit Browser -- Android Android 13 Null
unknown unknown Android Android 13 Null
Browser Safari 16.0 Macintosh iPhone OS 16_0_3 Null
Browser Safari 6.0 Macintosh iPhone OS 6_0 Null
Browser Firefox 101.0 Macintosh OS X 10.15 Null
Browser Firefox 75.0 Linux Linux Null
Browser Opera 12.16.2 Linux Linux Ubuntu
See the following. You will need to trim spaces from the matched values.
$data = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
$pattern = "(?<browser>[^\s]+)\s*(\((?<comment>[^)]+)\))?"
$match = $data | select-string $pattern -AllMatches
foreach($m in $match.Matches)
{
$m
Write-Host "Browser = " $m.Groups['browser'], "Comment = " $m.Groups['comment']
}
Results
Groups : {0, 1, browser, comment}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 41
Value : Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Browser = Mozilla/5.0 Comment = Windows NT 10.0; Win64; x64
Groups : {0, 1, browser, comment}
Success : True
Name : 0
Captures : {0}
Index : 42
Length : 38
Value : AppleWebKit/537.36 (KHTML, like Gecko)
Browser = AppleWebKit/537.36 Comment = KHTML, like Gecko
Groups : {0, 1, browser, comment}
Success : True
Name : 0
Captures : {0}
Index : 81
Length : 17
Value : Chrome/110.0.0.0
Browser = Chrome/110.0.0.0 Comment =
Groups : {0, 1, browser, comment}
Success : True
Name : 0
Captures : {0}
Index : 98
Length : 13
Value : Safari/537.36
Browser = Safari/537.36 Comment =
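For comparison, the same product/comment split can be sketched in Python. This is an illustrative port of the pattern above, not part of the original answer; the names `TOKEN_RE` and `split_user_agent` are made up. Reading the named groups rather than the whole match sidesteps the trailing spaces seen in the `Value` lines.

```python
import re

# A run of non-space characters, optionally followed by a parenthesised comment.
TOKEN_RE = re.compile(r"(?P<product>\S+)\s*(?:\((?P<comment>[^)]+)\))?")


def split_user_agent(user_agent):
    """Return a list of (product, comment) pairs; comment is None when absent."""
    return [(m.group("product"), m.group("comment"))
            for m in TOKEN_RE.finditer(user_agent)]


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36")
    for product, comment in split_user_agent(ua):
        print(product, "|", comment)
```

Like the PowerShell version, this splits on whitespace, so a product name that contains a space (such as "Home Assistant") would be cut in two.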

Regex to parse blue coat log file

I have this log file that I'm currently trying to parse.
Jan 12 2019, 14:51:23, 117, 10.0.0.1, neil.armstrong, standard-users, -, TCP_Connect, "sports betting", -, 201, accept, GET, text, https, www.best-site.com, 443, /pages/home.php, ?user=narmstrong&team=wizards, -, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome Safari/537.36", 192.168.1.1, 1400, 1463, -, -, -
Jan 12 2019, 14:52:14, 86, 10.0.0.1, neil.armstrong, standard-users, -, TCP_Connect, "sports betting", -, 200, accept, POST, text, https, www.upload.best-site.com, 443, /, -, -, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", 192.168.1.1, 230056, 600, -, -, -
Jan 12 2019, 14:52:54, 118, 10.0.0.1, neil.armstrong, standard-users, -, TCP_Connect, "sports betting", -, 200, accept, GET, text/javascript, http, google.fr, 80, /search, ?q=wizards, -, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", 192.168.1.1, 1717, 17930, -, -, -
This is the regex that I'm currently using: https://regex101.com/r/Asbpkx/3. It parses the log file fine until it reaches "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", where it splits at the comma in (KHTML, like Gecko).
How can I complete the regex so that this does not happen?
I looked into this more closely, and the log file is not in standard CSV format, which is why the CSV-parsing regex in my previous answer didn't work. (I also tried parsing it with Excel and Python's csv module, and both split at the comma after 'KHTML'.)
Using a negative lookbehind makes the example you gave parse correctly.
(.+?)(?<!KHTML),
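A small Python sketch of the lookbehind approach, run against a shortened, made-up log line. Note that the pattern only captures fields that are followed by a comma, so the last field is dropped. As an alternative that is not in the original answer, Python's `csv` reader accepts `skipinitialspace=True`, which ignores the blank after each delimiter so that the quoted user-agent field is recognized as quoted.

```python
import csv
import io
import re

# Shortened sample line (illustrative); the separators are ", " as in the log.
line = ('200, accept, GET, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", 192.168.1.1, 1717')

# Approach from the answer: a comma ends a field unless "KHTML" comes right before it.
lookbehind_fields = [field.strip() for field in re.findall(r"(.+?)(?<!KHTML),", line)]

# Alternative: skip the space after each comma so the quotes are honored.
csv_fields = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(lookbehind_fields)
print(csv_fields)
```

The lookbehind version is tied to this one user-agent shape; any other quoted field containing a comma would still be split.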
It looks like you are trying to parse csv using regex.
Use the regex described in this post:
https://stackoverflow.com/a/18147076/9397882
Regex: (?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)
Don't use regex for a CSV. Try these props.conf settings.
[mysourcetype]
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = ,
FIELD_QUOTE = "
FIELD_NAMES = Date, Time, Field3, IP_Addr, Field4, Field5, Field6
TIMESTAMP_FIELDS = Date, Time

Regex Extract in Hive (regexp_extract)

My regex is:
(\bosName=(.iPhone.OS.|.Android.))|(?:\b(taAppVersion=)[0-9.]+)|(TAiApp|TATabletApp|TAaApp)
My String is:
Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Mobile/14A456 Mobile iPhone TAiApp TARX13 taAppVersion=161107060 appLang=en_UK osName='iPhone OS' deviceName=iPhone8,4 osVer=10.0.2 taAppVersionString=18.4 mcc=234 mnc=15 connection=cellular
I want to grab:
osName='iPhone OS' taAppVersion=161107060 TAiApp
My regex works in the tester, but when I run a Hive query I only get TAiApp back, even though I pass 0 to regexp_extract() to capture all groups.
Thanks!
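A possible explanation, offered as an assumption rather than a confirmed answer: a single extract call returns only the first match of the pattern, and with an alternation the first match is whichever alternative occurs earliest in the string, which here is TAiApp. The sketch below uses Python's `re` module as a stand-in for Hive to show the difference between one search and iterating over every match; it says nothing about Hive's own functions beyond that assumption.

```python
import re

PATTERN = re.compile(
    r"(\bosName=(.iPhone.OS.|.Android.))"
    r"|(?:\b(taAppVersion=)[0-9.]+)"
    r"|(TAiApp|TATabletApp|TAaApp)"
)

ua = ("Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 "
      "(KHTML, like Gecko) Mobile/14A456 Mobile iPhone TAiApp TARX13 "
      "taAppVersion=161107060 appLang=en_UK osName='iPhone OS' deviceName=iPhone8,4 "
      "osVer=10.0.2 taAppVersionString=18.4 mcc=234 mnc=15 connection=cellular")

# One search: only the leftmost match is returned.
first_match = PATTERN.search(ua).group(0)

# Iterating: every non-overlapping match, in the order it appears in the string.
all_matches = [m.group(0) for m in PATTERN.finditer(ua)]

print(first_match)
print(all_matches)
```

If the same holds in Hive, one workaround would be three separate extract calls, one per alternative, concatenated in the desired order.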

Chrome use-mobile-user-agent not working

Running chrome from command line with flag --use-mobile-user-agent does not open the browser in mobile context (user-agent).
chrome --use-mobile-user-agent= true
Note:
Passing the user-agent option does work, but I feel it's not the right way of doing things, as Chrome offers this flag to boot in a mobile context.
--user-agent= Mozilla/5.0 (iPhone; U; CPU iPhone OS 5_1_1 like Mac OS X; ar) AppleWebKit/534.46.0 (KHTML, like Gecko) CriOS/19.0.1084.60 Mobile/9B206 Safari/7534.48.3
Chromium source code
Reading some of the Chromium source code, I see the following:
content_switches.cc
define kUseMobileUserAgent from "use-mobile-user-agent" flag:
// Set when Chromium should use a mobile user agent.
const char kUseMobileUserAgent[] = "use-mobile-user-agent";
shell_content_client.cc
add "Mobile" to product if our variable switch is true/set.
std::string GetShellUserAgent() {
  std::string product = "Chrome/" CONTENT_SHELL_VERSION;
  base::CommandLine* command_line = base::CommandLine::ForCurrentProcess();
  if (command_line->HasSwitch(switches::kUseMobileUserAgent))
    product += " Mobile";
  return BuildUserAgentFromProduct(product);
}
Extra detail (running from selenium)
As an extra detail, I run Chrome using Selenium and pass this configuration:
...
"browserName": "chrome",
"chromeOptions": {
"args": [
"--user-agent= Mozilla/5.0 (iPhone; U; CPU iPhone OS 5_1_1 like Mac OS X; ar) AppleWebKit/534.46.0 (KHTML, like Gecko) CriOS/19.0.1084.60 Mobile/9B206 Safari/7534.48.3",
"--window-size=320,640",
"--disable-popup-blocking",
"--incognito",
"--test-type"
]
},
...
The string is built as "Chrome/53.0.2785.116 Mobile" in GetShellUserAgent. BuildUserAgentFromProduct does not use the product itself; it passes it on to BuildUserAgentFromOSAndProduct, which is supposed to format a string like this:
"Mozilla/5.0 (%s) AppleWebKit/%d.%d (KHTML, like Gecko) %s Safari/%d.%d"
The product string is inserted into token four, where the fourth replacement token is before "Safari". Therefore "Chrome/53.0.2785.116 Mobile" should be placed there.
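The substitution described above can be mimicked with Python's `%` formatting, which accepts the same `%s` and `%d` tokens. This is only an illustration of where the product string would land; the function name `build_user_agent` and the default WebKit numbers are assumptions for the sketch, not Chromium code.

```python
USER_AGENT_FORMAT = "Mozilla/5.0 (%s) AppleWebKit/%d.%d (KHTML, like Gecko) %s Safari/%d.%d"


def build_user_agent(os_info, product, webkit_major=537, webkit_minor=36):
    """Fill the six tokens in order; the product is the fourth one."""
    return USER_AGENT_FORMAT % (
        os_info, webkit_major, webkit_minor, product, webkit_major, webkit_minor
    )


if __name__ == "__main__":
    print(build_user_agent("Windows NT 10.0; Win64; x64", "Chrome/53.0.2785.116 Mobile"))
```

With the flag honored, the result would contain "Chrome/53.0.2785.116 Mobile Safari/537.36" rather than "Chrome/53.0.2785.116 Safari/537.36".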
With and without the flag, my user agent is the same.
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36
So what does this mean, is it broken? Quite possibly.
In src/extensions/shell/common/shell_content_client.cc, BuildUserAgentFromProduct("Chrome/" PRODUCT_VERSION) is called in ShellContentClient::GetUserAgent. That just circumvents the call to GetShellUserAgent.
Well. There goes the mobile user agent flag. There's other places it's possible for the product to be replaced, but that's the one that sticks out as the culprit.

Regex number range parsing

I am trying to parse out a specific number range, and can't seem to get it right. I am looking to extract specific browser versions from user agent strings. For example, I want to parse Chrome 1-42 and Firefox 1-40, but I can't figure out the syntax.
What I have so far is this, which kind of works, but it grabs the first number it sees and doesn't respect the 2 digit range:
Gecko..Chrome/([1-9].|[1-4][1-2].)
Sample:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.1847.137 Safari/537.36
Firefox 29: Mozilla/5.0 (Android; Mobile; rv:29.0) Gecko/29.0 Firefox/23.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:29.0) Gecko/20100101 Firefox/29.0
Any ideas? TIA.
((?:(?:Firefox\/(?:[1-9]|[1-3][0-9]|40))|(?:Chrome\/(?:[1-9]|[1-3][0-9]|4[0-2])))\.[^ ]+)
Is this what you would like? /Edited/
Demo:
https://regex101.com/r/gH1nU9/2
Because regex does text matching only and numbers are treated as text, to match something like 1 to 42 you have to do something like this:
\b[1-9]\b|\b[1-3][0-9]\b|\b4[0-2]\b
This matches 1 to 9, or 10 to 39, or 40 to 42. I have added the boundaries \b so that nothing except these numbers is matched.
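To tie the range back to the user-agent strings in the question, here is a small Python sketch. The pattern names are illustrative, and the ranges (Chrome 1 to 42, Firefox 1 to 40) are taken from the question. Requiring the literal dot after the major version plays the role of the trailing boundary, so "Chrome/420." does not slip through as "42".

```python
import re

# Major version must be 1-9, 10-39, or 40-42, followed by the dot of the minor version.
CHROME_1_TO_42 = re.compile(r"Chrome/(?:[1-9]|[1-3][0-9]|4[0-2])\.")
# Same idea for Firefox, capped at 40.
FIREFOX_1_TO_40 = re.compile(r"Firefox/(?:[1-9]|[1-3][0-9]|40)\.")

samples = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:29.0) Gecko/20100101 Firefox/29.0",
]

for ua in samples:
    in_range = bool(CHROME_1_TO_42.search(ua) or FIREFOX_1_TO_40.search(ua))
    print(in_range, ua)
```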