Mod_rewrite syntax with query strings - regex

Embarrassing as this may be, I've hit a wall with mod_rewrite trying to come up with what seems to be a simple rule.
I'd like to accomplish the following mapping:
/cat/subcat which may have a "?PageId=123" afterwards
should become
/cat.php?cid=148 or (/cat.php?cid=148&PageId=123)
So for example, the following 2 mappings would occur:
/cat/subcat => /cat.php?cid=148 (the 148 part can be ignored, it's taken care of)
/cat/subcat?PageId=2 => /cat.php?cid=148&PageId=2
Note that there's an & in the second clause... The parameter will always be PageId
Can this be done?
Thanks so much in advance!

Apparently a little elbow grease worked (after 5 hours)...
It turns out the rule is just:
RewriteRule ^/cat/subcat /cat.php?cid=148 [QSA]
I was missing the QSA flag, which appends the original query string to the rewritten one.
-Adam
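For anyone unsure what QSA changes, here is a rough Python model of the rule above. It is only a sketch of the documented behaviour, not mod_rewrite itself, and the function name `rewrite` is made up for illustration. It assumes that without QSA a substitution containing its own query string replaces the incoming one, and that with QSA the incoming query string is appended after it.

```python
import re

def rewrite(path, query, qsa):
    """Toy model of: RewriteRule ^/cat/subcat /cat.php?cid=148 [QSA]"""
    if not re.match(r"^/cat/subcat", path):
        return path, query              # rule does not apply
    new_path, new_query = "/cat.php", "cid=148"
    if qsa and query:
        # QSA: keep the incoming query string, appended after the new one
        new_query = f"{new_query}&{query}"
    return new_path, new_query

with_qsa = rewrite("/cat/subcat", "PageId=2", qsa=True)
without_qsa = rewrite("/cat/subcat", "PageId=2", qsa=False)
print(with_qsa)     # ('/cat.php', 'cid=148&PageId=2')
print(without_qsa)  # ('/cat.php', 'cid=148')
```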

POSIX ERE Regex - Creating Efficient Regex

I'm working to create some regex entries that are well-formed, and efficient. I'll place an emphasis on efficient, as these regex entries can see thousands of logs per second. Inefficient regex entries can cause severe performance impacts.
Question: Does regex101 (through one flavor) support POSIX ERE Regex? Googling shows that PCRE2 should support BRE+ERE and more.
Regex Type: POSIX ERE
Syslog App: rsyslog (EL7)
Sample Payload (Well formed - Sensitive Information Stripped):
Jul 10 00:00:00 Firewall-Name-Removed CEF:0|Fortinet|FortiGate-removed|1.2.3,build1111 (GA)|0000000013|forward traffic accept|5|start=Jul 10 2022 00:00:00 logver=604091966 deviceExternalId=FG9A9A9A9999999 dvchost=Firewall-Name-Removed ad.vd=root ad.eventtime=1111111111111111111 ad.tz=-9999 ad.logid=0000000013 cat=traffic ad.subtype=forward deviceSeverity=notice src=1.1.1.1 shost=RandomHost1 spt=62119 deviceInboundInterface=DII-Out ad.srcintfrole=lan ad.srcssid=SSID Has Been Removed ad.apsn=ABC123D ad.ap=CHL-07 ad.channel=157 ad.radioband=802.11ac n-only ad.signal=-40 ad.snr=55 dst=2.2.2.2 dpt=53 deviceOutboundInterface=DOI-Out ad.dstintfrole=undefined ad.srccountry=Reserved ad.dstcountry=CountryRemoved externalID=123456789 proto=00 act=accept ad.policyid=000 ad.policytype=policy ad.poluuid=UUID-Removed ad.policyname=policy_name_removed app=DNS ad.trandisp=noop ad.appid=16195 ad.app=DNS ad.appcat=Network.Service ad.apprisk=elevated ad.applist=UTM Name - Removed ad.duration=180 out=0 in=205 ad.sentpkt=0 ad.rcvdpkt=1 ad.utmaction=allow ad.countdns=1 ad.osname=Windows ad.srcswversion=10 ad.mastersrcmac=MAC removed ad.srcmac=MAC removed ad.srcserver=0 tz="-9999"
What I'm attempting to do is remove specific logs that are not required. Normally I'd do this at a SIEM level through something like routing rules (where I can utilize fields), but this isn't possible for the foreseeable future. In this particular case: I'm trying to exclude on the following pieces of information.
Source IP: Is in a specific range
deviceOutboundInterface: is DOI-Out
Current Regex: "\bsrc=1.1.1[4-5]{0,1}.[0-9]{0,3}\b.*?\bdeviceOutboundInterface=DOI-Out\b" (Regex101 link in PCRE2). If that is matched, the log is rejected (through the stop call). Otherwise, it moves onto the other entries to check for unnecessary logs.
Most of my regex entries are in the low double-digits because they're a lot simpler. Is there a better way to make the more complex regex more efficient?
Thank you for any insight you can offer.
You might be able to cut some time with:
src=1\.1\.1[4-5]{0,1}\.[0-9]{0,3}.*?deviceOutboundInterface=DOI-Out
Changes:
remove the word boundaries
change the . to \. in the IP address, so that each dot matches only a literal dot
regex101 puts the original at 383 steps and the new version at 301, a potential saving of about 21%. Not terrible, but you'll want to make sure the removals are OK for your data.
To be honest, what you have looks pretty good to me.
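Escaping the dots is also a correctness fix, not just a speed tweak. A quick way to see it, using Python's `re` module as a stand-in (an assumption on my part: it is not a POSIX ERE engine, and the two sample strings below are invented):

```python
import re

# Pattern from the question (unescaped dots) and the escaped version above.
ORIGINAL = re.compile(
    r"\bsrc=1.1.1[4-5]{0,1}.[0-9]{0,3}\b.*?\bdeviceOutboundInterface=DOI-Out\b"
)
ESCAPED = re.compile(
    r"src=1\.1\.1[4-5]{0,1}\.[0-9]{0,3}.*?deviceOutboundInterface=DOI-Out"
)

# Invented sample lines, much shorter than a real CEF payload.
in_range = "src=1.1.14.25 shost=h spt=1 deviceOutboundInterface=DOI-Out"
lookalike = "src=10101420 shost=h deviceOutboundInterface=DOI-Out"

for name, line in (("in_range", in_range), ("lookalike", lookalike)):
    print(name, bool(ORIGINAL.search(line)), bool(ESCAPED.search(line)))
```

The unescaped pattern accepts the `lookalike` line because each `.` matches any character; the escaped one rejects it. Note that `\b` and the lazy `.*?` are Perl-style constructs, so a strict POSIX ERE engine may not treat them the way regex101's PCRE2 flavor does.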
This RE reduces the number of steps on regex101 from 383 to 270 (about 29.5% fewer):
src=1\.1\.1[45]?\.\d{0,3}.*?O[boundIter]*?face=DOI-Out
The original RE is already quite simple, matching only one pattern and one literal string, which makes it difficult to optimize. But we can do so if we know (from the documentation of the text in question, here the Log Message manual) that an even simpler pattern will not lead to ambiguities.
Changes:
matching literal text wherever possible
replacing the range '4-5' with the explicit elements '45'
instead of matching the long 'deviceOutboundInterface=', use a pattern which will just barely match this string but would possibly match other words if they ever occurred in log messages - but we know they don't.
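The trade-off in that last change can be checked with a small sketch. Again this uses Python's `re` as a stand-in rather than a POSIX ERE engine (where `\d` and `.*?` may not be available), and both sample strings are invented; `Outerface` is a made-up key that does not come from any real log.

```python
import re

LONG = re.compile(
    r"src=1\.1\.1[4-5]{0,1}\.[0-9]{0,3}.*?deviceOutboundInterface=DOI-Out"
)
SHORT = re.compile(r"src=1\.1\.1[45]?\.\d{0,3}.*?O[boundIter]*?face=DOI-Out")

# Shortened, invented log line with the same keys as the sample payload.
log_line = (
    "src=1.1.15.7 shost=RandomHost1 deviceInboundInterface=DII-Out "
    "dst=2.2.2.2 deviceOutboundInterface=DOI-Out proto=00"
)
# Hypothetical key that the looser pattern would also accept.
made_up = "src=1.1.15.7 Outerface=DOI-Out"

print(bool(LONG.search(log_line)), bool(SHORT.search(log_line)))
print(bool(LONG.search(made_up)), bool(SHORT.search(made_up)))
```

Both patterns accept the realistic line, but only the shorter one accepts the made-up key, which is exactly the ambiguity you have to rule out from the log documentation before using it.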

Mod_security rule exception for url/arg

An image on our site is triggering a ModSecurity rule, and I am trying to add a rule exception for only that occurrence. The number at the start of the flagged string is a session number, so I have used a regex in my exclusion.
I've tried various permutations but had no joy and would appreciate some advice.
Blocked URI:
https://www.website.com/application/login?0--preLoginHeaderPanel-companyLogo
Modsec log snippet:
[file "/usr/share/modsecurity-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "65"] [id "942100"] [msg "SQL Injection Attack Detected via libinjection"] [data "Matched Data: 1c found within ARGS_NAME:0--preLoginHeaderPanel-companyLogo: 0--preLoginHeaderPanel-companyLogo"]
Attempted exceptions (within apache.conf):
SecRuleUpdateTargetById 942100 !ARGS_NAMES:'[0-9][0-9]?--preLoginHeaderPanel-companyLogo'
Core Rule Set Dev on Duty here. Rule 942100 is one of our 'LibInjection' rules. LibInjection is quite opaque (it's a third party library/operator), so you're correct that a rule exclusion is the way to fix this issue.
The use of regular expressions in this context follows a specific form. They need to be sandwiched inside forward slashes, like so:
SecRuleUpdateTargetById 942100 "!ARGS_NAMES:/^[0-9][0-9]?--preLoginHeaderPanel-companyLogo/"
I added in a starting anchor at the beginning of the regular expression. You might want to think whether anchoring at the end is a good idea, as well.
For more examples and information, we have some great documentation on this here: https://coreruleset.org/docs/configuring/false_positives_tuning/#support-for-regular-expressions
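To illustrate the point about anchoring at the end, here is a small sketch using Python's `re` as a stand-in for the regex engine (an assumption: ModSecurity itself uses a PCRE-style engine, and the argument names below are invented test inputs):

```python
import re

START_ANCHORED = re.compile(r"^[0-9][0-9]?--preLoginHeaderPanel-companyLogo")
FULLY_ANCHORED = re.compile(r"^[0-9][0-9]?--preLoginHeaderPanel-companyLogo$")

# Invented argument names to compare the two variants.
arg_names = [
    "0--preLoginHeaderPanel-companyLogo",
    "12--preLoginHeaderPanel-companyLogo",
    "x0--preLoginHeaderPanel-companyLogo",
    "0--preLoginHeaderPanel-companyLogo' OR 1=1",
]

for name in arg_names:
    print(
        repr(name),
        bool(START_ANCHORED.search(name)),
        bool(FULLY_ANCHORED.search(name)),
    )
```

With only the starting anchor, the last name is still excluded from rule 942100 because the pattern matches its prefix. Adding `$` limits the exclusion to the exact argument name.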

Apache mod_rewrite mapping path to parameters

I'm moving over from IIS to Apache (on Windows) and struggling with adapting a rewrite rule (using Helicon ISAPI_Rewrite 3 in IIS).
The rule maps what looks like a directory structure path back into a set of query string parameters. There could be any number of parameters in the path.
E.g.
/basket/param1/value1/param2/value2/param3/value3 ...and so on...
Becomes...
/basket?param1=value1&param2=value2&param3=value3 ...and so on...
Rule in ISAPI_Rewrite:
# This rule simply reverts parameters that appear as folders back to standard parameters
# e.g. /search-results/search-value/red/results/10 becomes /search-results?search-value=red&results=10
RewriteRule ^/(.*?)/([^/]*)/([^/]*)(/.+)? /$1$4?$2=$3 [NC,LP,QSA]
I first spotted that Apache doesn't have the 'LP' flag, so swapped it for the N=10 as a test for looping...
RewriteRule ^(.*?)/([^/]*)/([^/]*)(/.+)? $1$4?$2=$3 [NC,N=10,QSA]
However, the Apache error logs show the same parameters being added over and over again until the loop limit on the N flag is reached, ending in an HTTP 500 error.
Any ideas where I'm going wrong?!?
After much head scratching and engaging my Google-fu, I found the solution to all my problems in another Stack Overflow post...
https://stackoverflow.com/a/5520004/14054970
Essentially...
apparently there's been an issue with mod_rewrite re-appending post-fix part in certain cases https://issues.apache.org/bugzilla/show_bug.cgi?id=38642
The problem:
If multiple RewriteRules within a .htaccess file match, unwanted copies of PATH_INFO may accumulate at the end of the URI.
If you are on Apache 2.2.12 or later, you can use the DPI flag to prevent this http://httpd.apache.org/docs/2.2/rewrite/flags.html
I'm using Apache 2.4, so my Rewrite rule now looks as follows (and I'll be adding the DPI flag to all rules to be safe)...
RewriteRule ^(.*?)/([^/]*)/([^/]*)(/.+)? $1$4?$2=$3 [NC,N=1000,QSA,DPI]
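To make the intended looping easier to follow, here is a rough Python model of what the rule is meant to do on each pass. It is a sketch only: the function name `fold_path_into_query` is made up, the path is given without a leading slash (as in the Apache version of the pattern), and it deliberately does not model the PATH_INFO re-appending that DPI works around.

```python
import re

# Same pattern as the RewriteRule; re.IGNORECASE stands in for [NC].
RULE = re.compile(r"^(.*?)/([^/]*)/([^/]*)(/.+)?", re.IGNORECASE)

def fold_path_into_query(path, query="", max_rounds=1000):
    """Repeatedly apply the rule, as the N flag would, until it stops matching."""
    for _ in range(max_rounds):
        m = RULE.match(path)
        if m is None:
            break
        base, name, value = m.group(1), m.group(2), m.group(3)
        rest = m.group(4) or ""
        path = base + rest                      # $1$4
        new_pair = f"{name}={value}"            # ?$2=$3
        # QSA: the existing query string is appended after the new pair
        query = f"{new_pair}&{query}" if query else new_pair
    return path, query

print(fold_path_into_query("basket/param1/value1/param2/value2/param3/value3"))
```

In this model the parameters come out in reverse path order, because each pass puts the newly extracted pair in front of the query string built so far.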

express router - regex - semicolon separated list

I have an express router which I want to accept a semicolon-separated list. The * should stand for 0 or more values, but in my case it accepts only one or more.
Here is my code:
App.get('/sth/((\\w+(\;\\w+)*))',
However, it accepts only
/sth/aaa;bbb
/sth/aaa;bbb;ccc
/sth/aaa;bbb;ccc;ddd
...
but not
/sth/aaa
How can I achieve my goal, or what's wrong with my regexp? I'm probably missing just one trivial thing.
Thanks.
Change it to:
/sth/\\w+(;\\w+)*
A workaround or a solution would be something like:
App.get('/sth/((\\w+(;\\w+){0,}))',
In my experience, express doesn't use standard regexp syntax for route strings but has its own implementation, in which the * has a different use. It'd be nice to know how it's treated, but to me it seems to accept everything from 1 to infinity.
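As a sanity check that the pattern itself is fine under an ordinary regex engine, here is a small sketch in Python (used only as a neutral stand-in; it says nothing about how express parses route strings). Both the `*` and the `{0,}` forms accept a single value there, which supports the idea that the difference comes from express's own handling of `*`.

```python
import re

STAR = re.compile(r"/sth/\w+(;\w+)*")
BRACES = re.compile(r"/sth/\w+(;\w+){0,}")

paths = ["/sth/aaa", "/sth/aaa;bbb", "/sth/aaa;bbb;ccc", "/sth/"]

# Map each path to (matches with *, matches with {0,}).
results = {
    p: (STAR.fullmatch(p) is not None, BRACES.fullmatch(p) is not None)
    for p in paths
}
for path, outcome in results.items():
    print(path, outcome)
```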

How do I imitate twitters url-shortener?

The main question is a bit short, so I'll elaborate.
I'm building an app for Twitter with which you can do the basic actions (get posts, post, reply, etc.).
Now I figured it would be a good idea to check the 140-character limit in my app.
So far so good. Then someone asked if I could also handle the URL-shortener thing.
So at the moment I have a regex that picks up most (in fact too many) URLs, takes their length, and either adds or deducts the difference from the 140 max.
It's still a bit buggy, but I can manage that.
Now my problem...
It seems Twitter is quite picky about what it considers a URL:
I've got the most basic ones (starting with http(s):// and such), but Twitter also replaces some TLDs very easily ((www.)google.com and [whatever].net/.biz/.info are just a few of them),
but not .nl, .de, or .tk.
Now I was wondering if perhaps someone has found out which ones they do and which ones they don't 'shorten'.
Because I'm pretty sure my regex isn't the best either, I'll drop that here as well:
((http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?)|([\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:\/~\+#]*[\w\-\@?^=%&\/~\+#])?)
http://support.twitter.com/articles/78124-how-to-shorten-links-urls# indicates that all URLs posted to Twitter will be rewritten to be exactly 19 characters long.
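If every URL really is rewritten to a fixed length, the character check becomes simple arithmetic. A minimal sketch, assuming the 19-character figure above and a deliberately naive URL pattern that only recognises links starting with http:// or https:// (the names `URL_RE` and `effective_length` are made up for illustration):

```python
import re

URL_RE = re.compile(r"https?://\S+")   # naive: scheme-prefixed links only
SHORT_URL_LENGTH = 19                  # figure taken from the support article
TWEET_LIMIT = 140

def effective_length(text):
    """Length of the text once each detected URL counts as SHORT_URL_LENGTH."""
    length = len(text)
    for match in URL_RE.finditer(text):
        length += SHORT_URL_LENGTH - len(match.group(0))
    return length

tweet = "see http://example.com/a/very/long/path now"
print(effective_length(tweet), effective_length(tweet) <= TWEET_LIMIT)
```

Note that a URL shorter than 19 characters makes the count go up, not down, so the adjustment has to work in both directions.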
I am using this: var url_expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi; Nobody has complained :)
I figured it out: I found a pretty important line on the TLD wiki page. It states that all country TLDs are two characters long and, the other way around, that all two-character TLDs are countries. With that in mind, I started testing a bunch of them with Twitter, and I'm pretty sure I now know which URLs Twitter shortens and which it doesn't.
All URLs starting with http:// or https://
All URLs like [something].[non-country TLD] # .com .biz .mobi etc. (except .arpa and .aero)
All URLs like [something].[something].[valid TLD] # including countries
Links like http://[user]:[pass]@[something].[tld] will NOT be shortened
Now to build a regex for it; I'll post it here as soon as I think I have it :D
This is what I've got so far:
/(^(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)?(?:(?:[-\w]+\.)+(?:com|asia|cat|coop|edu|int|tel|pro|org|net|gov|mil|biz|info|mobi|name|jobs|museum|travel|([a-z]{2})))(?::[\d]{1,5})?(?:(?:(?:\/(?:[-\w~!$+|.,=\(\)]|%[a-f\d]{2})+)+|\/)+|\?|#)?(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?)/gim;
One major flaw is still in it: it also accepts [domain].[tld], which Twitter doesn't.
I hope this will help someone in the future. I'm pretty sure there isn't a whole lot of easy-to-find info about this on the web (or at least I couldn't find it).