Regex to match within specific block - regex

I am trying to match a string between two other strings. The document looks something like this (there are many more lines in the real config):
#config-version=user=user1
#conf_file_ver=1311784161
#buildno=123
#global_vdom=adsf
config system global
set admin-something
set admintimeout 8289392839823
set alias "F5"
set gui-theme mariner
set hostname "something"
end
config system accprofile
edit "prof_admin"
set secfabgrp read
set ftviewgrp read
set vpngrp read
set utmgrp read
set wifi read
next
end
config system np6xlite
edit "np6xlite_0"
next
end
config system interface
edit "dmz"
set vdom "asdf"
set ip 1.1.1.1 255.255.255.0
set type physical
set role dmz
next
edit "wan1"
set vdom "root"
set ip 2.2.2.2 255.255.255.255
set type physical
set alias "jklk5"
set role wan
next
end
config system physical-switch
edit "sw0"
set age-val 0
next
end
config system virtual-switch
edit "lan"
set physical-switch "sw0"
config port
edit "port2"
next
edit "port3"
next
edit "port4"
next
edit "port5"
next
edit "port6"
next
end
next
end
config system custom-language
edit "en"
set filename "en"
next
edit "fr"
set filename "fr"
next
end
config system admin
edit "user1"
set vdom "root"
set password ENC SH2Tb1/aYYJB2U9ER2f5Ykj1MtE6U=
next
edit "user2"
set trusthost1 255.255.255.255 255.255.255.224
set trusthost2 255.255.255.254 255.255.255.224
next
end
config system ha
set override
end
config system replacemsg-image
edit "logo_fnet"
set image-type gif
set image-base64 ''
next
edit "logo_fguard_wf"
set image-type gif
set image-base64 ''
next
edit "logo_fw_auth"
set image-base64 ''
next
edit "logo_v2_fnet"
set image-base64 ''
next
edit "logo_v2_fguard_wf"
set image-base64 ''
next
edit "logo_v2_fguard_app"
set image-base64 ''
next
end
I care about every "edit" block between "config system admin" and its corresponding "end". Each "edit" block represents a user and I need to know if a user block (edit "" ...stuff on new lines... next) is missing the "set password" line.
This expression (multiline) captures the "edit "en"..." under "config system custom-language":
\h*edit ".*\n(?:\h*+(?!next|set password).*\n)*\h*next\n
Now I need to make sure to ignore any config sections before or after "config system admin". I tried this:
(?<=config system admin\n)\h*edit ".*\n(?:\h*+(?!next|set password).*\n)*\h*next\n(?=end)
That change results in zero matches. But if I change the lookbehind to:
(?<=config system custom-language\n)
Then I get a match, but it is in the wrong config block again. I tried sticking [\S\s] in front, but that results in zero matches:
[\S\s](?<=config system admin\n)\h*edit ".*\n(?:\h*+(?!next|set password).*\n)*\h*next\n(?=end)
How do I take the "set password" matching and make sure it only happens between "config system admin" and its corresponding "end"? I only need the first result, but getting multiple is fine. I am using PCRE2.

The following pattern starts at edit, stops before end or the next edit, and does not allow password, config sys, or set filename in between.
It is a bit long and clumsy, but it does find regular users when the word password is absent, and it does not match the two opening blocks.
As noted in the comments, it could malfunction if those keywords appear elsewhere in the file.
/edit((?!edit)(?!(edit|password|config sys|set filename))[\w\W])*(?=(edit|end))/gm
If you can use a simple script (bash, for example) that reads the file line by line, we could build something simpler and more reliable.

I think you want to work on this task from two levels. First, find the data that is in those config blocks, and then examine the users within them.
Here's something that is far simpler that may do what you need.
First, you want to look only at the lines between "config system admin" and "end", so use awk to find those.
$ awk '/^config system admin/,/^end/' config.txt
config system admin
edit "user1"
set vdom "root"
set password ENC SH2Tb1/aYYJB2U9ER2f5Ykj1MtE6U=
next
edit "user2"
set trusthost1 255.255.255.255 255.255.255.224
set trusthost2 255.255.255.254 255.255.255.224
next
end
Now search those results for either "edit" or "set password":
$ awk '/^config system admin/,/^end/' config.txt | grep -E 'edit|set password'
edit "user1"
set password ENC SH2Tb1/aYYJB2U9ER2f5Ykj1MtE6U=
edit "user2"
You can now eyeball the results and see who has set a password and who hasn't.
If you need to get more precise, then you can write a little more code to find "edit" lines that aren't followed by "set password".
In any case, the key is to break the problem into smaller problems.
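The "little more code" could look something like the following minimal Ruby sketch. It is not part of the awk approach above: the method name admins_without_password and the SAMPLE text are made up for illustration, and it assumes every user block opens with edit "<name>" and closes with next, as in the question. It also assumes nested config ... end blocks inside a user should be skipped.

```ruby
# Hypothetical helper: returns the names of "edit" blocks inside
# "config system admin" that contain no "set password" line.
def admins_without_password(config_text)
  missing = []
  in_admin = false      # true once "config system admin" has been seen
  depth = 0             # nesting depth of "config" blocks inside the section
  user = nil            # name of the user block currently being read
  has_password = false

  config_text.each_line do |raw|
    line = raw.strip
    unless in_admin
      in_admin = true if line == 'config system admin'
      next
    end

    if line.start_with?('config ')
      depth += 1                      # nested block, e.g. inside a user
    elsif line == 'end'
      break if depth.zero?            # the "end" that closes the admin section
      depth -= 1
    elsif depth.zero?
      if (m = line.match(/\Aedit "([^"]*)"/))
        user = m[1]
        has_password = false
      elsif line.start_with?('set password')
        has_password = true
      elsif line == 'next' && user
        missing << user unless has_password
        user = nil
      end
    end
  end
  missing
end

# Shortened sample based on the config in the question.
SAMPLE = <<~CONFIG
  config system custom-language
  edit "en"
  set filename "en"
  next
  end
  config system admin
  edit "user1"
  set vdom "root"
  set password ENC SH2Tb1/aYYJB2U9ER2f5Ykj1MtE6U=
  next
  edit "user2"
  set trusthost1 255.255.255.255 255.255.255.224
  next
  end
  config system replacemsg-image
  edit "logo_fnet"
  set image-type gif
  next
  end
CONFIG

puts admins_without_password(SAMPLE)
```

For a real file, something like admins_without_password(File.read('config.txt')) should work the same way. An empty result means every admin has a password line.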

Update based on your new example text:
(?<=config system admin.*?)(edit "[^"]+"(?!.*?set password.*?next).*?next)(?=.*?end)
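It requires the global and singleline flags. If you can't use singleline, replace dot (.) with [\s\S]. Note that the unbounded lookbehind (?<=config system admin.*?) only works in engines that support variable-length lookbehind, such as .NET or recent JavaScript; PCRE2 rejects it.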
Explanation:
(?<=config system admin.*?) - look behind for 'config system admin' followed by any characters (non greedy)
edit "[^"]+" - match 'edit' and a username
(?!.*?set password.*?next) - look ahead for NOT 'set password', followed by any characters and 'next'
.*?next - match any characters and 'next'
(?=.*?end) - look ahead for any characters and 'end'
This should give you the text between 'edit' and 'next' when there's no 'set password' in between.

Related

Matching text between strings and missing string

I have a firewall config file and am trying to write an expression that will match when a user does not have a password.
The config is long, but a snippet of it looks like this:
config system custom-language
edit "en"
set filename "en"
next
more lines
could be many lines
end
config system admin
edit "user1"
set trusthost1 1.1.1.1 255.255.255.254
set vdom "root"
maybe more lines
maybe many more lines
set password ENC asdfasdfadsfasdfadsfasdf
next
edit "user2"
set trusthost1 1.1.1.1 255.255.255.254
set vdom "root"
maybe more lines here too
next
end
config system replacemsg-image
edit "logo_fnet"
set image-type gif
set image-base64 ''
next
end
other lines
end
Note that user2 is missing "set password ENC...". I know that I only want to match text between "config system admin" and its corresponding "end". I also know that each user starts with "edit "<username>"" and ends with "next".
I have the following regex, which at least starts at the right spot (config system admin) but seems to be matching on both user blocks (and "config system replacemsg-img" for some reason):
(config\ssystem\sadmin(\n|.)*)(edit\s\".*\"(\n|.)*(?!set\spassword\sENC)(\n|.)*next)(\n|.)*end
How would I write the expression so it only returns true because "user2" (in this example) is missing "set password ENC"? I am using PCRE2.
EDIT:
After some additional work, I have the following (not working, but maybe closer?) expression:
(?<=(config\ssystem\sadmin))((\n)(\s+edit\s\".*\"(\n))((.|\n)*)((?!set\spassword).)*)(\nend)?
This begins the capture at "config system admin". But, in the regex testers I tried, it also highlights all the way down to the last "end", instead of stopping at the first for some reason.

How can I use regex to construct an API call in my Jekyll plugin?

I'm trying to write my own Jekyll plugin to construct an api query from a custom tag. I've gotten as far as creating the basic plugin and tag, but I've run into the limits of my programming skills so looking to you for help.
Here's my custom tag for reference:
{% card "Arbor Elf | M13" %}
Here's the progress on my plugin:
module Jekyll
  class Scryfall < Liquid::Tag
    def initialize(tag_name, text, tokens)
      super
      @text = text
    end
    def render(context)
      # Store the name of the card, ie "Arbor Elf"
      @card_name =
      # Store the name of the set, ie "M13"
      @card_set =
      # Build the query
      @query = "https://api.scryfall.com/cards/named?exact=#{@card_name}&set=#{@card_set}"
      # Store a specific JSON property
      @card_art =
      # Finally we render out the result
      "<img src='#{@card_art}' title='#{@card_name}' />"
    end
  end
end
Liquid::Template.register_tag('cards', Jekyll::Scryfall)
For reference, here's an example query using the above details (paste it into your browser to see the response you get back)
https://api.scryfall.com/cards/named?exact=arbor+elf&set=m13
My initial attempt, after Googling around, was to use a regex to split @text at the |, like so:
@card_name = "#{@text}".split(/| */)
This didn't quite work, instead it output this:
[“A”, “r”, “b”, “o”, “r”, “ “, “E”, “l”, “f”, “ “, “|”, “ “, “M”, “1”, “3”, “ “]
I'm also then not sure how to access and store specific properties within the JSON response. Ideally, I can do something like this:
@card_art = JSONRESPONSE.image_uri.large
I'm well aware I'm asking a lot here, but I'd love to try and get this working and learn from it.
Thanks for reading.
Actually, your split should work. You just need to give it the correct regex (and you can call it on @text directly). You need to escape the pipe character, because an unescaped | means alternation in a regex; /| */ matches the empty string, which is why your input was split into single characters. You can use rubular.com to experiment with regexes.
parts = @text.split(/\|/)
# => ["Arbor Elf ", " M13"]
Note that the parts also contain some extra whitespace, which you can remove with strip.
@card_name = parts.first.strip
@card_set = parts.last.strip
This might also be a good time to answer questions like: what happens if the user inserts multiple pipes? What if they insert none? Will your code give them a helpful error message for this?
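One way to answer those questions is to validate the tag text before using it. The following is a minimal sketch, not the plugin's actual code: the method name parse_card_markup and the error wording are made up for illustration, and it assumes the tag text may arrive wrapped in double quotes, as in the example tag.

```ruby
# Hypothetical helper: split tag markup such as "Arbor Elf | M13"
# into [card_name, card_set], raising if the shape is unexpected.
def parse_card_markup(markup)
  # Drop surrounding whitespace and one optional pair of double quotes.
  cleaned = markup.strip.delete_prefix('"').delete_suffix('"')
  # The -1 limit keeps trailing empty fields, so "Arbor Elf |" yields two parts.
  parts = cleaned.split('|', -1).map(&:strip)
  unless parts.length == 2 && parts.none?(&:empty?)
    raise ArgumentError, "expected 'Card Name | SET', got #{markup.inspect}"
  end
  parts
end

card_name, card_set = parse_card_markup('"Arbor Elf | M13" ')
puts card_name
puts card_set
```

Raising an ArgumentError with the offending input makes a malformed tag fail loudly at build time instead of producing a broken URL.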
You'll also need to escape these values in your URL. What if one of your users adds a card containing a & character? Your URL will break:
https://api.scryfall.com/cards/named?exact=Sword of Dungeons & Dragons&set=und
That looks like a URL with three parameters, exact, set and Dragons. You need to encode the user input to be included in a URL:
require 'cgi'
query = "https://api.scryfall.com/cards/named?exact=#{CGI.escape(@card_name)}&set=#{CGI.escape(@card_set)}"
# => "https://api.scryfall.com/cards/named?exact=Sword+of+Dungeons+%26+Dragons&set=und"
What comes after that is a little less clear, because you haven't written the code yet. Try making the call with the Net::HTTP module and then parsing the response with the JSON module. If you have trouble, come back here and ask a new question.
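As a starting point, here is a rough sketch of that step. The method names fetch_body and large_image_from are hypothetical, and it assumes the card JSON contains an "image_uris" object with a "large" key, so check the real response before relying on it. The sample JSON below is invented so the parsing part can run without a network call.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: fetch the body at the given URL,
# raising if the server does not answer with a 2xx status.
def fetch_body(url)
  response = Net::HTTP.get_response(URI(url))
  raise "request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Hypothetical helper: pull the large image URL out of a card JSON string.
# Assumes an "image_uris" object with a "large" key; returns nil if absent.
def large_image_from(json_text)
  card = JSON.parse(json_text)
  card.dig('image_uris', 'large')
end

# Invented sample response, used here instead of a live request.
sample = '{"name":"Arbor Elf","image_uris":{"large":"https://example.com/large.jpg"}}'
puts large_image_from(sample)
```

In the plugin, the two would be chained, for example large_image_from(fetch_body(query)). Using dig means a missing key yields nil instead of raising, so the render method can decide what to output for a card without art.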

AWS S3 trouble with anchor in filename #

I have some filenames stored with the # symbol. If I send a GET request to retrieve them I am running into problems as I believe GET requests are cut off at anchors within the path?
ex:
s3.amazonaws.com/path/to/my_file.jpg
vs: my browser stops looking at the #
s3.amazonaws.com/path/to/my_other_#file.jpg
Is there a way to retrieve the file, or will I have to change the filenames so they do not contain #'s?
You need to URL-encode the path, which replaces # with %23.
See this reference on URL encoding: https://www.w3schools.com/tags/ref_urlencode.asp
In JavaScript, use encodeURIComponent() on each path segment to encode it. Note that encodeURI() leaves # unescaped, so on its own it will not fix this case.
https://www.w3schools.com/jsref/jsref_encodeURI.asp
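The same idea in Ruby, as a minimal sketch: percent-encode each path segment on its own so the / separators stay intact. The method name encode_key_path is hypothetical. ERB::Util.url_encode is from the Ruby standard library.

```ruby
require 'erb'

# Hypothetical helper: percent-encode each segment of an object path,
# leaving the "/" separators untouched.
def encode_key_path(path)
  path.split('/').map { |segment| ERB::Util.url_encode(segment) }.join('/')
end

puts encode_key_path('path/to/my_other_#file.jpg')
# prints: path/to/my_other_%23file.jpg
```

Encoding the whole path in one call would also turn every / into %2F, which is why the segments are encoded separately here.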

How to configure Fiddler's Autoresponder to "map" a host to a folder?

I'm already using Fiddler to intercept requests for specific remote files while I'm working on them (so I can tweak them locally without touching the published contents).
i.e. I use many rules like this
match: regex:(?insx).+/some_file([?a-z0-9-=&]+\.)*
respond: c:\somepath\some_file
This works perfectly.
What I'd like to do now is taking this a step further, with something like this
match: regex:http://some_dummy_domain/(anything)?(anything)
respond: c:\somepath\(anything)?(anything)
or, in plain text,
Intercept any http request to 'some_dummy_domain', go inside 'c:\somepath' and grab the file with the same path and name that was requested originally. Query string should pass through.
Some scenarios to further clarify:
http://some_domain/somefile --> c:\somepath\somefile
http://some_domain/path1/somefile --> c:\somepath\path1\somefile
http://some_domain/path1/somefile?querystring --> c:\somepath\path1\somefile?querystring
I tried to leverage what I already had:
match: regex:(?insx).+//some_dummy_domain/([?a-z0-9-=&]+\.)*
respond: ...
Basically, I'm looking for //some_dummy_domain/ in requests. This seems to match correctly when testing, but I'm missing how to respond.
Can Fiddler use matches in responses, and how could I set this up properly ?
I tried to respond c:\somepath\$1 but Fiddler seems to treat it verbatim:
match: regex:(?insx).+//some_domain/([?a-z0-9-=&]+\.)*
respond: c:\somepath\$1
request: http://some_domain/index.html
response: c:\somepath\$1html <-----------
The problem is your use of insx at the front of your expression; the n means that you want to require explicitly-named capture groups, meaning that a group $1 isn't automatically created. You can either omit the n or explicitly name the capture group.
From the Fiddler Book:
Use RegEx Replacements in Action Text
Fiddler’s AutoResponder permits you to use regular expression group replacements to map text from the Match Condition into the Action Text. For instance, the rule:
Match Text: REGEX:.+/assets/(.*)
Action Text: http://example.com/mockup/$1
...maps a request for http://example.com/assets/Test1.gif to http://example.com/mockup/Test1.gif.
The following rule:
Match Text: REGEX:.+example\.com.*
Action Text: http://proxy.webdbg.com/p.cgi?url=$0
...rewrites the inbound URL so that all URLs containing example.com are passed as a URL parameter to a page on proxy.webdbg.com.
Match Text: REGEX:(?insx).+/assets/(?'fname'[^?]*).*
Action Text: C:\src\${fname}
...maps a request for http://example.com/assets/img/1.png?bunnies to C:\src\img\1.png.

CALABASH - Renaming screenshot filenames without the iterator

In Calabash you can take a screenshot and rename it to whatever you want and save it to any directory like so:
screenshot({:prefix => "some/directory", :name=>"some_name.png"})
However it will always save as some_name_0.png and the next one will be some_name_1.png.
Does anyone know how to rename the filename completely without the iterator?
You can also just pass text from your steps on what to save the screendump as.
I have done this to easily set the prefix and name and only take the screendumps when I add "capture=true" to the start command.
def take_picture(prefix, name)
  # Only take a screenshot when the run was started with capture=true
  if ENV["capture"] == 'true'
    screenshot({:prefix => prefix, :name => name})
  end
end
From the steps I call it like this (this example does not add a special prefix):
take_picture("","SettingsMenu1")
In lib/calabash-cucumber/failure_helpers.rb the iterator is defined via @@screenshot_count ||= 0 and then incremented with @@screenshot_count += 1.
So I just overwrite that.