Matching text between strings and missing string - regex

I have a firewall config file and am trying to write an expression that will match when a user does not have a password.
The config is long, but a snippet of it looks like this:
config system custom-language
edit "en"
set filename "en"
next
more lines
could be many lines
end
config system admin
edit "user1"
set trusthost1 1.1.1.1 255.255.255.254
set vdom "root"
maybe more lines
maybe many more lines
set password ENC asdfasdfadsfasdfadsfasdf
next
edit "user2"
set trusthost1 1.1.1.1 255.255.255.254
set vdom "root"
maybe more lines here too
next
end
config system replacemsg-image
edit "logo_fnet"
set image-type gif
set image-base64 ''
next
end
other lines
end
Note that user2 is missing "set password ENC...". I know that I only want to match text between "config system admin" and its corresponding "end". I also know that each user starts with "edit "<username>"" and ends with "next".
I have the following regex, which at least starts at the right spot (config system admin) but seems to match both user blocks (and "config system replacemsg-image" for some reason):
(config\ssystem\sadmin(\n|.)*)(edit\s\".*\"(\n|.)*(?!set\spassword\sENC)(\n|.)*next)(\n|.)*end
How would I write the expression so it only returns true because "user2" (in this example) is missing "set password ENC"? I am using PCRE2.
EDIT:
After some additional work, I have the following (not working, but maybe closer?) expression:
(?<=(config\ssystem\sadmin))((\n)(\s+edit\s\".*\"(\n))((.|\n)*)((?!set\spassword).)*)(\nend)?
This begins the capture at "config system admin". But, in the regex testers I tried, it also highlights all the way down to the last "end", instead of stopping at the first for some reason.

Regex to match within specific block

I am trying to match a string between two other strings. The document looks something like this (there are many more lines in the real config):
#config-version=user=user1
#conf_file_ver=1311784161
#buildno=123
#global_vdom=adsf
config system global
set admin-something
set admintimeout 8289392839823
set alias "F5"
set gui-theme mariner
set hostname "something"
end
config system accprofile
edit "prof_admin"
set secfabgrp read
set ftviewgrp read
set vpngrp read
set utmgrp read
set wifi read
next
end
config system np6xlite
edit "np6xlite_0"
next
end
config system interface
edit "dmz"
set vdom "asdf"
set ip 1.1.1.1 255.255.255.0
set type physical
set role dmz
next
edit "wan1"
set vdom "root"
set ip 2.2.2.2 255.255.255.255
set type physical
set alias "jklk5"
set role wan
next
end
config system physical-switch
edit "sw0"
set age-val 0
next
end
config system virtual-switch
edit "lan"
set physical-switch "sw0"
config port
edit "port2"
next
edit "port3"
next
edit "port4"
next
edit "port5"
next
edit "port6"
next
end
next
end
config system custom-language
edit "en"
set filename "en"
next
edit "fr"
set filename "fr"
next
end
config system admin
edit "user1"
set vdom "root"
set password ENC SH2Tb1/aYYJB2U9ER2f5Ykj1MtE6U=
next
edit "user2"
set trusthost1 255.255.255.255 255.255.255.224
set trusthost2 255.255.255.254 255.255.255.224
next
end
config system ha
set override
end
config system replacemsg-image
edit "logo_fnet"
set image-type gif
set image-base64 ''
next
edit "logo_fguard_wf"
set image-type gif
set image-base64 ''
next
edit "logo_fw_auth"
set image-base64 ''
next
edit "logo_v2_fnet"
set image-base64 ''
next
edit "logo_v2_fguard_wf"
set image-base64 ''
next
edit "logo_v2_fguard_app"
set image-base64 ''
next
end
I care about every "edit" block between "config system admin" and its corresponding "end". Each "edit" block represents a user and I need to know if a user block (edit "" ...stuff on new lines... next) is missing the "set password" line.
This expression (multiline) captures the "edit "en"..." under "config system custom-language":
\h*edit ".*\n(?:\h*+(?!next|set password).*\n)*\h*next\n
Now I need to make sure to ignore any config sections before or after "config system admin". I tried this:
(?<=config system admin\n)\h*edit ".*\n(?:\h*+(?!next|set password).*\n)*\h*next\n(?=end)
That change results in zero matches. But if I change the lookbehind to:
(?<=config system custom-language\n)
Then I get a match, but it is in the wrong config block again. I tried sticking [\S\s] in front, but that results in zero matches:
[\S\s](?<=config system admin\n)\h*edit ".*\n(?:\h*+(?!next|set password).*\n)*\h*next\n(?=end)
How do I take the "set password" matching and make sure it only happens between "config system admin" and its corresponding "end"? I only need the first result, but getting multiple is fine. I am using PCRE2.
The following pattern starts at edit, stops before end or edit, and does not allow password, config system or set filename in between.
It is a bit long and clumsy, but it does find regular users when the word password is absent, and it does not match the two opening blocks.
As noted in the comments, it could malfunction if those keywords appear elsewhere in the file.
/edit((?!edit)(?!(edit|password|config sys|set filename))[\w\W])*(?=(edit|end))/gm
If you can use a simple script (bash, for example) that reads the file line by line, we could build something simpler that would be more reliable.
I think you want to work on this task from two levels. First, find the data that is in those config blocks, and then examine the users within them.
Here's something that is far simpler that may do what you need.
First, you want to look only at the lines between "config system admin" and "end", so use awk to find those.
$ awk '/^config system admin/,/^end/' config.txt
config system admin
edit "user1"
set vdom "root"
set password ENC SH2Tb1/aYYJB2U9ER2f5Ykj1MtE6U=
next
edit "user2"
set trusthost1 255.255.255.255 255.255.255.224
set trusthost2 255.255.255.254 255.255.255.224
next
end
Now search those results for either "edit" or "set password":
$ awk '/^config system admin/,/^end/' config.txt | grep -E 'edit|set password'
edit "user1"
set password ENC SH2Tb1/aYYJB2U9ER2f5Ykj1MtE6U=
edit "user2"
You can now eyeball the results and see who has set a password and who hasn't.
If you need to get more precise, then you can write a little more code to find "edit" lines that aren't followed by "set password".
In any case, the key is to break the problem into smaller problems.
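A minimal sketch of that "little more code", written in Python rather than awk. It assumes the whole config has been read into a string and that nested "config ... end" sections may appear inside a user block. The function name users_without_password is made up for this example.

```python
import re

def users_without_password(config_text):
    """Return the names of users in 'config system admin' with no 'set password' line."""
    missing = []
    in_admin = False
    depth = 0            # depth of nested "config" sections inside the admin block
    user = None          # user currently being read, if any
    has_password = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not in_admin:
            if line == "config system admin":
                in_admin = True
                depth = 0
            continue
        if line.startswith("config "):
            depth += 1                    # entering a nested section
        elif line == "end":
            if depth == 0:
                in_admin = False          # this is the admin block's own "end"
            else:
                depth -= 1
        elif depth == 0:                  # ignore edit/next inside nested sections
            m = re.match(r'edit "([^"]*)"', line)
            if m:
                user, has_password = m.group(1), False
            elif line.startswith("set password"):
                has_password = True
            elif line == "next" and user is not None:
                if not has_password:
                    missing.append(user)
                user = None
    return missing
```

Called on the example config from the question, this should report only user2.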
Update based on your new example text:
(?<=config system admin.*?)(edit "[^"]+"(?!.*?set password.*?next).*?next)(?=.*?end)
It requires the global and singleline flags. If you can't use singleline, replace dot (.) with [\s\S].
Explanation:
(?<=config system admin.*?) - look behind for 'config system admin' followed by any characters (non greedy)
edit "[^"]+" - match 'edit' and a username
(?!.*?set password.*?next) - look ahead for NOT 'set password', followed by any characters and 'next'
.*?next - match any characters and 'next'
(?=.*?end) - look ahead for any characters and 'end'
This should give you the text from 'edit' through 'next' when there's no 'set password' in between.

Regex - Converting URLs to clickable links

We have some regex code that converts URLs to clickable links. It works, but we run into an issue when a user submits an entry and forgets the space after a period: the text around the period is treated as a link as well.
Example: End of a sentence.This is a new sentence
It would create a hyperlink for sentence.This
Is there any way to validate the following code against, say, a proper domain like .com, .ca, etc.?
Here is the code:
$url = '#(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])#';
$output = preg_replace($url, '$0', trim($val[0]));
Thanks,
Aaron

Replacing full referrer using REGEX Google Data Studio

I'm using Google Data Studio to create a report analyzing specific referral sites. My data source is my site Google Analytics.
I want to replace the Full Referrer (e.g. of the format webaddress.com/page-name-one) with a text only value (i.e Page name one), so that it's clearer to see in the report which page is which in my charts and tables.
I've used the below formulae in the calculated fields, but none of them seem to change Full Referrer to match what I need it to. Data studio recognizes them all as valid formulae too.
I've anonymised my examples, but it has the same principles. I've tried:
REGEXP_REPLACE(Full Referrer,"[webaddress\\.com\\/page\\-name\\-one].*","Page name one")
REGEXP_REPLACE(Full Referrer, 'webaddress.com/page-name-one', 'Page name one')
REGEXP_REPLACE(Full Referrer, 'webaddress\\.com\\/page\\-name\\-one', 'Page name one')
REGEXP_REPLACE(Full Referrer, 'name', 'Page name one')
REGEXP_REPLACE(Full Referrer, 'page-name-one', 'Page name one')
REGEXP_REPLACE(Full Referrer, 'page\\-name\\-one', 'Page name one')
In testing this on one of my own GA data sources, I was able to achieve this using one of your patterns:
REGEXP_REPLACE(Full Referrer,'webaddress.com/page-name-one','Page name one')
It should be noted, however, that the . should be properly escaped (either by \ or wrapping it in a character class like [.]; see re2 syntax for details). Because you have to double-backslash, I also prefer to use something Data Studio borrowed from BigQuery (sort of an undocumented feature), which is the regular expression string type (r"" or r''). When using this, you only have to single-backslash (unless you want a literal backslash):
REGEXP_REPLACE(Full Referrer,r'webaddress\.com/page-name-one','Page name one')
Because you're using REGEXP_REPLACE, anything before or after your match string will still exist after the replacement—meaning that for a Full Referrer of "m.facebook.com/l", REGEXP_REPLACE(Full Referrer,r'facebook\.com','FB') would return "m.FB/l"
So your pattern above will match the value anywhere in the string, which likely isn't what you want. To anchor it to the beginning, use the ^ (start of string) assertion:
REGEXP_REPLACE(Full Referrer,r'^webaddress\.com/page-name-one','Page name one')
If you want to only match that exact value of Full Referrer (i.e. not including any additional path levels), make sure to use the $ (end of string) assertion as well:
REGEXP_REPLACE(Full Referrer,r'^webaddress\.com/page-name-one$','Page name one')
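To see the effect of the anchors outside Data Studio, here is a small Python sketch. Python's re module is only a stand-in for RE2 here (the two agree on patterns this simple), and the relabel helper is a made-up name.

```python
import re

LABEL = "Page name one"

UNANCHORED = r'webaddress\.com/page-name-one'
ANCHORED_START = r'^webaddress\.com/page-name-one'
ANCHORED_BOTH = r'^webaddress\.com/page-name-one$'

def relabel(pattern, full_referrer):
    """Replace whatever part of the value the pattern matches; leave the rest as is."""
    return re.sub(pattern, LABEL, full_referrer)

# Unanchored: text before and after the match survives the replacement.
print(relabel(UNANCHORED, "m.webaddress.com/page-name-one/extra"))

# Anchored at the start: a value with a leading "m." is left unchanged.
print(relabel(ANCHORED_START, "m.webaddress.com/page-name-one/extra"))

# Anchored at both ends: only the exact value is replaced.
print(relabel(ANCHORED_BOTH, "webaddress.com/page-name-one"))
```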
Keep in mind that if you're doing this in the data source as a calculated field, you aren't actually changing the original metric—you're working on a copy of it. So you need to replace Full Referrer with whatever you named your calculated field in the data source.
Often you want to do this for a bunch of sites or pages, so you can use CASE and REGEXP_MATCH to handle all this logic in a single field:
CASE
WHEN REGEXP_MATCH(Full Referrer,r'^webaddress\.com/page-name-one$') THEN 'Page name one'
WHEN REGEXP_MATCH(Full Referrer,r'^site2\.com/example$') THEN 'S2 Example'
ELSE Full Referrer
END
These matches are done in order, so you can even match a specific page or pages, and then still provide a different value for anything on that domain that you didn't match:
CASE
WHEN REGEXP_MATCH(Full Referrer,r'^site\.com/$') THEN 'Site - Home'
WHEN REGEXP_MATCH(Full Referrer,r'^site\.com/about$') THEN 'Site - About'
WHEN REGEXP_MATCH(Full Referrer,r'^site\.com/.*') THEN 'Site - (other)'
ELSE Full Referrer
END
You can also use the ELSE if you want to bucket all of the unmatched values into an "other" grouping instead of just leaving the original value.
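The first-match-wins behaviour of CASE can be mimicked in a few lines of Python, purely as an illustration. The sketch assumes REGEXP_MATCH tests the whole field value, which is why it uses re.fullmatch and gives the catch-all rule a trailing .* explicitly. RULES and label_referrer are hypothetical names.

```python
import re

# Ordered like the WHEN clauses: specific pages first, the catch-all for the domain last.
RULES = [
    (r'site\.com/', 'Site - Home'),
    (r'site\.com/about', 'Site - About'),
    (r'site\.com/.*', 'Site - (other)'),
]

def label_referrer(full_referrer, default=None):
    """Return the label of the first rule matching the whole value.

    With default=None the original value is kept (ELSE Full Referrer);
    pass a string such as "Other" to bucket all unmatched values instead.
    """
    for pattern, label in RULES:
        if re.fullmatch(pattern, full_referrer):
            return label
    return full_referrer if default is None else default
```

Moving the catch-all rule to the top of RULES would make it swallow the home and about pages, which is why the order matters.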
Another thing to remember is that due to shared fields in GA, things like Source (utm_source) also show up in Full Referrer, so you could be seeing values there that you wouldn't normally expect. Often you can get rid of these by also filtering to only the Default Channel Grouping of "Referral".
If your patterns still aren't matching, please update the question with some additional details such as what the output actually is, whether there's an error message, etc.—and also whether you're doing this as a calculated field in the data source or the "Create Field" button on a single chart.

AWS S3 trouble with anchor in filename #

I have some filenames stored with the # symbol. When I send a GET request to retrieve them I run into problems, as I believe GET requests are cut off at anchors within the path?
Example:
s3.amazonaws.com/path/to/my_file.jpg
versus (my browser stops reading at the #):
s3.amazonaws.com/path/to/my_other_#file.jpg
Is there a way to retrieve the file, or will I have to change the filenames so they do not contain #s?
You need to URL-encode the path, which replaces # with %23.
See this reference on URL encoding: https://www.w3schools.com/tags/ref_urlencode.asp
In JavaScript, use encodeURIComponent() on the file name to encode it. encodeURI() will not help here, because it leaves # unescaped:
https://www.w3schools.com/jsref/jsref_encodeURI.asp
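For comparison, the same percent-encoding in Python, as a rough sketch. The object_url helper, the https:// scheme and the default host are assumptions added for the example; only urllib.parse.quote is a real library call.

```python
from urllib.parse import quote

def object_url(key, host="s3.amazonaws.com"):
    """Build a URL for an object key, percent-encoding characters such as '#'.

    quote() leaves '/' alone by default, so the path structure is preserved.
    """
    return "https://" + host + "/" + quote(key)

print(object_url("path/to/my_other_#file.jpg"))
# https://s3.amazonaws.com/path/to/my_other_%23file.jpg
```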

How to configure Fiddler's Autoresponder to "map" a host to a folder?

I'm already using Fiddler to intercept requests for specific remote files while I'm working on them (so I can tweak them locally without touching the published contents).
i.e. I use many rules like this
match: regex:(?insx).+/some_file([?a-z0-9-=&]+\.)*
respond: c:\somepath\some_file
This works perfectly.
What I'd like to do now is taking this a step further, with something like this
match: regex:http://some_dummy_domain/(anything)?(anything)
respond: c:\somepath\(anything)?(anything)
or, in plain text,
Intercept any http request to 'some_dummy_domain', go inside 'c:\somepath' and grab the file with the same path and name that was requested originally. Query string should pass through.
Some scenarios to further clarify:
http://some_domain/somefile --> c:\somepath\somefile
http://some_domain/path1/somefile --> c:\somepath\path1\somefile
http://some_domain/path1/somefile?querystring --> c:\somepath\path1\somefile?querystring
I tried to leverage what I already had:
match: regex:(?insx).+//some_dummy_domain/([?a-z0-9-=&]+\.)*
respond: ...
Basically, I'm looking for //some_dummy_domain/ in requests. This seems to match correctly when testing, but I'm missing how to respond.
Can Fiddler use matches in responses, and how could I set this up properly ?
I tried to respond c:\somepath\$1 but Fiddler seems to treat it verbatim:
match: regex:(?insx).+//some_domain/([?a-z0-9-=&]+\.)*
respond: c:\somepath\$1
request: http://some_domain/index.html
response: c:\somepath\$1html <-----------
The problem is your use of insx at the front of your expression; the n means that you want to require explicitly-named capture groups, meaning that a group $1 isn't automatically created. You can either omit the n or explicitly name the capture group.
From the Fiddler Book:
Use RegEx Replacements in Action Text
Fiddler’s AutoResponder permits you to use regular expression group replacements to map text from the Match Condition into the Action Text. For instance, the rule:
Match Text: REGEX:.+/assets/(.*)
Action Text: http://example.com/mockup/$1
...maps a request for http://example.com/assets/Test1.gif to http://example.com/mockup/Test1.gif.
The following rule:
Match Text: REGEX:.+example\.com.*
Action Text: http://proxy.webdbg.com/p.cgi?url=$0
...rewrites the inbound URL so that all URLs containing example.com are passed as a URL parameter to a page on proxy.webdbg.com.
Match Text: REGEX:(?insx).+/assets/(?'fname'[^?]*).*
Action Text: C:\src\${fname}
...maps a request for http://example.com/assets/img/1.png?bunnies to C:\src\img\1.png.
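The named-group mapping in that last rule can be tried out in Python as a loose analogy. Python's re has no equivalent of the n (explicit capture) flag and writes named groups as (?P<fname>...) rather than (?'fname'...), so this only illustrates the capture-and-substitute step. The map_to_local helper and the conversion of / to \ are assumptions based on the book's example output.

```python
import re

# re.I and re.S stand in for the i and s flags of (?insx); n and x have no role here.
RULE = re.compile(r'.+/assets/(?P<fname>[^?]*).*', re.I | re.S)

def map_to_local(url, root="C:\\src\\"):
    """Return the local path for a matching URL, or None if the rule does not apply."""
    m = RULE.fullmatch(url)
    if m is None:
        return None
    # The query string falls outside the named group, so it is dropped.
    return root + m.group("fname").replace("/", "\\")

print(map_to_local("http://example.com/assets/img/1.png?bunnies"))
```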