Get multiple lines using regex with Ansible - regex

I have been trying to modify some files with Ansible but I do not have the right regex.
The goal is to modify a set of files and replace everything between <Factory /> and </Factory> with "Not register". For example, I want to change this:
<Factory />
Replacement set
Madrid
</Factory>
to this:
<Factory />
Not register
</Factory>
What I have is the following:
- hosts: all
  tasks:
    - name: replace factory registration
      ansible.builtin.replace:
        path: /home/clientDatabase.xml
        regex: {'(?<=<Factory />.*?(?=</Factory>)', multiline = True}
        replace: 'Not register'
I have tried several expressions and this is the closest I have got. It works perfectly in Notepad++ with regular expression mode on and the ". matches newline" box checked, but it does nothing in Ansible.
My understanding is that (?<=) ... (?=) selects everything in between (.*), 0 or once (?), and that multiline = True makes it check across multiple lines to get the whole structure.
I have also tried \R for carriage return and line feed, as well as ^ and $, but none of my attempts work and I am running out of ideas.
Could someone give me any hints here?
Here are some resources I think helped me the most:
https://docs.ansible.com/ansible/latest/collections/ansible/builtin/replace_module.html
https://w3.unpocodetodo.info/utiles/regex.php
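For anyone reproducing this: the replace module documentation says it uses Python regular expressions in MULTILINE mode. MULTILINE only changes what ^ and $ match. It does not let . cross line breaks, which is what the Notepad++ ". matches newline" box (DOTALL) does. The following is a minimal Python sketch. It assumes the module passes the pattern to Python's re unchanged, and it restores the missing parenthesis in the attempted pattern:

```python
import re

# Sample content from the question (closing tag assumed to be </Factory>).
xml = "<Factory />\nReplacement set\nMadrid\n</Factory>\n"

# The attempted pattern, with the missing ")" after the lookbehind restored.
# MULTILINE only affects ^ and $, so "." still stops at the first newline.
attempt = r"(?<=<Factory />).*?(?=</Factory>)"
attempt_match = re.search(attempt, xml, re.MULTILINE)

# The inline (?s) flag enables DOTALL from inside the pattern string itself,
# so "." can now run across the newlines up to </Factory>.
fixed = r"(?s)(?<=<Factory />).*?(?=</Factory>)"
fixed_result = re.sub(fixed, "\nNot register\n", xml, flags=re.MULTILINE)

# \R (any line break) exists in engines such as PCRE, but Python's re rejects it.
try:
    re.compile(r"\R")
    backslash_r_supported = True
except re.error:
    backslash_r_supported = False

print(attempt_match)          # None
print(repr(fixed_result))     # '<Factory />\nNot register\n</Factory>\n'
print(backslash_r_supported)  # False
```

If that assumption about the module holds, putting (?s) at the front of the regexp value in the task should have the same effect as the Notepad++ checkbox.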
Update:
Finally, I followed your suggestion of using [^<]*? to match everything except the "<" character, and it worked perfectly. The missing parenthesis was a typo, sorry.
The final result is:
- hosts: all
  tasks:
    - name: replace factory registration
      ansible.builtin.replace:
        path: /home/clientDatabase.xml
        regex: '(?<=<Factory />)[^<]*?'
        replace: 'Not register'
My understanding is that this replaces all the content after <Factory /> up to the first <. With this pattern, neither the multiline nor the dotall flag is needed.

I suggest the following regex which allows words, spaces and newline between the tags <Factory /> and </Factory>.
Please note that the string is on 2 lines because it contains a newline as part of the pattern.
(?<=<Factory \/>)[\w\s
]*(?=<\/Factory>)
This could also be written on a single line:
(?<=<Factory \/>)[\w\s\n\r]*(?=<\/Factory>)
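Here is a quick Python check of the one-line form, using Python's re as a stand-in for the engine Ansible uses. Since \s already includes \n and \r, [\w\s]* would behave the same. The character class only allows word characters and whitespace, so a body containing punctuation is left untouched:

```python
import re

xml = "<Factory />\nReplacement set\nMadrid\n</Factory>\n"

# One-line form from the answer; [\w\s]* would be equivalent.
pattern = r"(?<=<Factory \/>)[\w\s\n\r]*(?=<\/Factory>)"
result = re.sub(pattern, "\nNot register\n", xml)

# A body containing a comma cannot be matched, so nothing is replaced.
with_comma = "<Factory />\nMadrid, Spain\n</Factory>\n"
unchanged = re.sub(pattern, "\nNot register\n", with_comma)

print(repr(result))             # '<Factory />\nNot register\n</Factory>\n'
print(unchanged == with_comma)  # True
```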

Related

Ansible: Insert word in GRUB cmdline

I'd like to use Ansible's lineinfile or replace module in order to add the word splash to the cmdline in GRUB.
It should work for all the following examples:
Example 1:
Before: GRUB_CMDLINE_DEFAULT=""
After: GRUB_CMDLINE_DEFAULT="splash"
Example 2:
Before: GRUB_CMDLINE_DEFAULT="quiet"
After: GRUB_CMDLINE_DEFAULT="quiet splash"
Example 3:
Before: GRUB_CMDLINE_DEFAULT="quiet nomodeset"
After: GRUB_CMDLINE_DEFAULT="quiet nomodeset splash"
The post "Ansible: insert a single word on an existing line in a file" explains well how this can be done without quotes. However, I can't get it to insert the word within the quotes.
What is the required entry in the Ansible role or playbook in order to add the word splash to the cmdline as shown?
You can do this without shell output, using two lineinfile tasks.
In your example you're searching for splash:
- name: check if splash is configured in the boot command
  lineinfile:
    backup: true
    path: /etc/default/grub
    regexp: '^GRUB_CMDLINE_LINUX=".*splash'
    state: absent
  check_mode: true
  register: grub_cmdline_check
  changed_when: false
- name: insert splash if missing
  lineinfile:
    backrefs: true
    path: /etc/default/grub
    regexp: "^(GRUB_CMDLINE_LINUX=\".*)\"$"
    line: '\1 splash"'
  when: grub_cmdline_check.found == 0
  notify: update grub
The trick is to try to remove the line if splash appears anywhere in it, but only as a dry run (check_mode: true). If the term was found (found > 0), we don't need to update the line. If it was not found, we need to insert it, which we do by appending it at the end with backrefs.
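The two-step logic can be sketched in plain Python on a single line of /etc/default/grub. The helper name insert_splash is hypothetical. It only mimics what the two lineinfile tasks do and is not how Ansible implements them:

```python
import re

CHECK = r'^GRUB_CMDLINE_LINUX=".*splash'  # regexp of the check_mode probe task
INSERT = r'^(GRUB_CMDLINE_LINUX=".*)"$'   # regexp of the backrefs task

def insert_splash(line: str) -> str:
    # Task 1: in check mode, "found" is non-zero if splash already appears.
    if re.search(CHECK, line):
        return line
    # Task 2: keep everything before the closing quote (\1) and append ' splash"'.
    m = re.search(INSERT, line)
    return m.expand(r'\1 splash"') if m else line

for before in ['GRUB_CMDLINE_LINUX=""', 'GRUB_CMDLINE_LINUX="quiet"',
               'GRUB_CMDLINE_LINUX="quiet nomodeset"']:
    after = insert_splash(before)
    print(after, insert_splash(after) == after)
```

The second call on each result returns the line unchanged, which is the idempotence that the check_mode probe provides.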
Inspired by Adam's answer, I use this one to enable IOMMU:
- name: Enable IOMMU
  ansible.builtin.lineinfile:
    path: /etc/default/grub
    regexp: '^GRUB_CMDLINE_LINUX_DEFAULT="((:?(?!intel_iommu=on).)*?)"$'
    line: 'GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on"'
    backup: true
    backrefs: true
  notify: update-grub
Please note that I had to set backrefs to true for the \1 reference to work; otherwise the captured group was not substituted.
Idempotency works fine as well.
EDIT: Please note this snippet only works with an Intel CPU and might need to be updated to fit your platform.
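Here is why this stays idempotent. The negative lookahead inside the repeated group makes the regexp fail on any line that already contains intel_iommu=on. The lineinfile documentation says that with backrefs: true the file is left unchanged when the regexp does not match. The following Python sketch shows that behaviour; apply_task is a hypothetical helper, not part of Ansible:

```python
import re

# Regexp and line copied from the task above.
REGEXP = r'^GRUB_CMDLINE_LINUX_DEFAULT="((:?(?!intel_iommu=on).)*?)"$'
LINE = r'GRUB_CMDLINE_LINUX_DEFAULT="\1 intel_iommu=on"'

def apply_task(line: str) -> str:
    # With backrefs: true, a line the regexp does not match is left alone.
    m = re.search(REGEXP, line)
    return m.expand(LINE) if m else line

first = apply_task('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"')
second = apply_task(first)
print(first)            # GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"
print(second == first)  # True: the lookahead rejects lines that already have the option
```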
One possible solution is to define two tasks as follows:
- name: "Checking GRUB cmdline"
  shell: "grep 'GRUB_CMDLINE_LINUX_DEFAULT=.*splash.*' /etc/default/grub"
  register: grub_cfg_grep
  changed_when: false
  failed_when: false
- name: "Configuring GRUB cmdline"
  replace:
    path: '/etc/default/grub'
    regexp: '^GRUB_CMDLINE_LINUX_DEFAULT="((\w.?)*)"$'
    replace: 'GRUB_CMDLINE_LINUX_DEFAULT="\1 splash"'
  when: '"splash" not in grub_cfg_grep.stdout'
Explanation: We first check whether the splash keyword is present in the required line using grep. Since grep exits with a non-zero return code when the string is not found, we suppress the error using failed_when: false. The output of grep is saved to the grub_cfg_grep variable.
Next, we make the replace task conditional on the keyword splash not being in the standard output of grep. The regular expression takes the old content inside the quotes and appends the splash keyword after it.
Note: In the case of an empty string before the execution, the result reads " splash" (with a space in front) but it is still a valid cmdline.
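Here is a plain-Python sketch of the two tasks. It checks the grep output (grub_cfg_grep.stdout) rather than the registered dictionary itself. The expression "splash" not in grub_cfg_grep tests the dictionary's keys, which is always true, so the replace would run on every play. The function configure and the sample file contents are hypothetical:

```python
import re

GREP = r'GRUB_CMDLINE_LINUX_DEFAULT=.*splash.*'
REGEXP = r'^GRUB_CMDLINE_LINUX_DEFAULT="((\w.?)*)"$'
REPLACE = r'GRUB_CMDLINE_LINUX_DEFAULT="\1 splash"'

def configure(content: str) -> str:
    # Stand-in for the grep task: collect the matching lines as "stdout".
    stdout = "\n".join(ln for ln in content.splitlines() if re.search(GREP, ln))
    # Only run the replace when grep found nothing.
    if "splash" in stdout:
        return content
    # The replace module applies its regexp in MULTILINE mode.
    return re.sub(REGEXP, REPLACE, content, flags=re.MULTILINE)

grub = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"\n'
once = configure(grub)
print(once)
print(configure(once) == once)  # True
```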
The difficulty is this line on the replace module page: "It is up to the user to maintain idempotence by ensuring that the same pattern would never match any replacements made." (https://docs.ansible.com/ansible/latest/modules/replace_module.html#id4). It's easy to insert the item, but quite tricky to make it idempotent so that the target file doesn't grow every time you run the task.
I found a way to do it in one shot with the replace module. You should be able to adapt this. My task checks the GRUB_CMDLINE_LINUX_DEFAULT line for "vt.default_red" and inserts some colour codes if not found.
My method was to copy-and-paste various nearly-there examples into the regex tester website and fiddle until it worked. I still don't grok the result, but it worked in my tests at https://www.regextester.com/ and it works in my playbook.
One problem I had was that Ansible's regex implementation apparently doesn't support conditionals, which gave me odd errors for a while.
- name: colours | configured grub command
  replace:
    path: /etc/default/grub
    regexp: '^GRUB_CMDLINE_LINUX_DEFAULT="((:?(?!vt\.default_red).)*?)"$'
    replace: 'GRUB_CMDLINE_LINUX_DEFAULT="\1 vt.default_red=0xee,..."'
The regex matches the literal string ("GRUB_CMDLINE_LINUX_DEFAULT=" and a double quote mark) at the start and the double quote mark at the end. Deconstructing the rest...
( - open capture group #1 (creates backref #1)
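(:? - open a group; this is most likely a typo for the non-capturing (?:. As written it is a second capturing group in which :? matches an optional literal colon, which happens to be harmless here
(?! - negative lookahead (ie. don't match if the following string comes next)
vt\.default_red - the string to look for, literal dot is escaped
) - close negative lookahead
.) - match a single char and close the group; because the lookahead is checked before each char is consumed, no position inside the quotes may start vt.default_red
* - try to match the group zero or more times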
? - ... lazily (ie. get the smallest possible match)
) - close capture group #1
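Here is a short Python check of the pattern as written versus the presumably intended (?:. Both capture the same text in group 1 and both reject a line that is already configured. Only the number of groups differs:

```python
import re

as_written = re.compile(r'^GRUB_CMDLINE_LINUX_DEFAULT="((:?(?!vt\.default_red).)*?)"$')
non_capturing = re.compile(r'^GRUB_CMDLINE_LINUX_DEFAULT="((?:(?!vt\.default_red).)*?)"$')

plain = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'
done = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet vt.default_red=0xee"'

print(as_written.groups, non_capturing.groups)            # 2 1
print(as_written.match(plain).group(1))                   # quiet splash
print(non_capturing.match(plain).group(1))                # quiet splash
print(as_written.match(done), non_capturing.match(done))  # None None
```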
Alternatively, you can call perl from Ansible to address your need:
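- name: Change items in the file
  ansible.builtin.command:
    # Append " splash" inside the quotes, unless the line already contains it
    cmd: perl -i -pe 's/^(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*)"/$1 splash"/ unless /splash/' /etc/default/grub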
Another way of looking at it. This is an old conversation, but it is still relevant.

RegEx find all XML tags

How do I match all the beginning tags in an XML document with RegEx? I just need to collect the tag names used.
This is what I have:
(?<=<)(.*?)((?= \/>)|(?=>))
This matches all the beginning and closing tags.
Example:
<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>
The above pattern matches:
Habazutty
/Habazutty
Vogons
Targ
/Targ
I only need
Habazutty
Vogons
Targ
I couldn't figure out a way to exclude the closing tags. Negative lookahead didn't work - found nothing. I must have messed up.
You can achieve this simply using:
<([^\/>]+)[\/]*>
The capture group will hold your output.
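Here is a Python check of this pattern on the example from the question. Two details are worth knowing. The space before /> in <Vogons /> ends up in the captured name, and a comment such as <!--test--> is also captured:

```python
import re

doc = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

names = re.findall(r"<([^\/>]+)[\/]*>", doc)
print(names)  # ['Habazutty', 'Vogons ', 'Targ']

# Strip the trailing space picked up from "<Vogons />".
stripped = [n.strip() for n in names]
print(stripped)  # ['Habazutty', 'Vogons', 'Targ']

# "!" is not excluded from the class, so comments are captured too.
comment = re.findall(r"<([^\/>]+)[\/]*>", "<!--test-->")
print(comment)  # ['!--test--']
```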
You could change (?<=<)(.*?)((?= \/>)|(?=>)) to (?<=<)([^\/]*?)((?= \/>)|(?=>)), i.e. instead of using (.*?) for the tag name, use ([^\/]*?). / is not allowed in tag names anyway.
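Here is the same check for this variant. Because the pattern has two groups, re.findall would return tuples, so the sketch uses re.finditer and reads group 1:

```python
import re

doc = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

pattern = r"(?<=<)([^\/]*?)((?= \/>)|(?=>))"
names = [m.group(1) for m in re.finditer(pattern, doc)]
print(names)  # ['Habazutty', 'Vogons', 'Targ']
```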
Found another solution:
((?=<)(?!<\/)<)(.*?)((?= \/>)|(?=>))
Basically, ((?=<)(?!<\/)<) matches a "<" ((?=<)) that does not start a closing "</" ((?!<\/)).
Redneb's answer is cleaner, though: fewer capturing groups, shorter and fancier.
<([^ >!\/]+)[^>]*>
matches test2, test3 and test5 in
<!--test-->
<test2>
<test3 x="1">
</test4>
<test5 />
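Here is the same check for this last pattern, on both samples. Because the space is excluded from the name, <Vogons /> is captured without a trailing space:

```python
import re

pattern = r"<([^ >!\/]+)[^>]*>"

sample = """<!--test-->
<test2>
<test3 x="1">
</test4>
<test5 />"""

doc = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

print(re.findall(pattern, sample))  # ['test2', 'test3', 'test5']
print(re.findall(pattern, doc))     # ['Habazutty', 'Vogons', 'Targ']
```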

RegEx for REST url substitutions

I have a URL like this:
http://www.url.me/en/cats/dogs/potatoes/tomatoes/
I need to replace the first two REST parameters to get a resulting URL like this:
http://www.url.me/FIRST/cats/dogs/potatoes/tomatoes/
I tried this regex \/([^/]+)\/ but it's not working as expected in CF:
<cfset ret.REDIRECT = reReplace(currentUrl, "\/([^/]+)\/", "FIRST", "all") />
What do you suggest, both for the regex and the cf code?
Thank you.
Firstly, you do not need to escape / in regex. (Sometimes you'll see it escaped, such as in JavaScript regex literals, but that is the JS side being escaped, not the regex.)
However, even with that change it won't do what you want: you'll be replacing every other /-delimited segment instead of just the first one after the host part.
To do what you want, use something like this:
reReplace(CurrentUrl, "^(https?://[^/]+/)[^/]+/", "\1FIRST/")
The ^ anchors the replace to the start of the input.
The (..) part captures the protocol and hostname so they can be re-inserted with \1 in the replacement string.
The final [^/]+/ matches the first segment of the request URI, which is replaced with FIRST/ in the replacement string.
(You can omit the trailing / if it's not required, or use (?=/) to assert that it is there without needing to put it in the replace side.)
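For comparison outside ColdFusion, here is the same pair of patterns in Python. Python's re.sub replaces all matches by default, like reReplace with "all". The translation is an illustration, not CF code:

```python
import re

url = "http://www.url.me/en/cats/dogs/potatoes/tomatoes/"

# Answer's pattern: anchor at the start, capture scheme + host, drop the first segment.
fixed = re.sub(r"^(https?://[^/]+/)[^/]+/", r"\1FIRST/", url)

# The question's pattern, applied to all matches, hits every other segment.
broken = re.sub(r"/([^/]+)/", "FIRST", url)

print(fixed)   # http://www.url.me/FIRST/cats/dogs/potatoes/tomatoes/
print(broken)  # http:/FIRSTenFIRSTdogsFIRSTtomatoes/
```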

Regex lookahead with multiple negative conditions

I am running a regex on an HTML string to fetch URLs. I want to fetch all hrefs and srcs that are not JavaScript. From another SO post I have the following pattern:
/(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js).)*"/
Which fetches me results like:
src="http://www.mydomain.com/path/to/resource/image.gif" alt="" border="0"
This is good because it excludes the .js results. It's bad because it also fetches the additional attributes in the element. I tried the following amendment to stop at the first ":
/(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js).)[^"]*"/
It works in that it returns href="$url", but it returns results ending in .js. Is there a way to combine a negative lookahead that says:
Match string until it comes across another " - i.e. [^"]*; and
Do not match string if it ends in .js"
Thanks in advance for any help/tips/pointers.
Add a "?" after the "*" before the last quote. This makes the "*" non-greedy (lazy), i.e. it stops matching at the first quote, not the last:
/(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js).)*?"/
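Here is a Python check of the non-greedy version on the sample from the question, plus two hypothetical URLs. One side effect to be aware of is that the lookahead rejects ".js" anywhere in the URL, not only at the end:

```python
import re

lazy = r'(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js).)*?"'

html = 'src="http://www.mydomain.com/path/to/resource/image.gif" alt="" border="0"'
script = 'src="http://www.mydomain.com/js/app.js"'
json_url = 'src="http://www.mydomain.com/lib.json/a.gif"'  # hypothetical

m = re.search(lazy, html)
print(m.group(0))                 # src="http://www.mydomain.com/path/to/resource/image.gif"
print(re.search(lazy, script))    # None
print(re.search(lazy, json_url))  # None: ".js" inside ".json" also blocks the match
```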
Here's something a bit different. I used Debuggex with this expression:
(?:src|href)=(?&.quotStr)(?<!\.js")
which compiled it to this one:
$regex = '/(?:src|href)=(?:"((?:\\\\.|[^"\\\\]){0,})")(?<!\\.js")/';
If you only want to reject .js at the end of the string, you can use the following for the last part of the string match (tested on Rubular):
"(?![^"]*\.js").*?"
EDIT
See: https://stackoverflow.com/a/18838123/1163653 for a better solution.
Fixed it:
/(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js"|").)*"/
Note that the lookahead checks that nothing after the domain contains .js" or ", either of which would make the match invalid. Hrefs ending in .css are still allowed, because the repeated token only stops when it reaches the first ", which is the behaviour needed.
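Here is a Python comparison of the original greedy pattern and this fix on the question's sample, plus two hypothetical URLs (a stylesheet and a script):

```python
import re

greedy = r'(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js).)*"'
tempered = r'(href|src)?\="http:\/\/www\.mydomain\.com\/(?:(?!\.js"|").)*"'

html = 'src="http://www.mydomain.com/path/to/resource/image.gif" alt="" border="0"'
css = 'href="http://www.mydomain.com/styles/site.css" rel="stylesheet"'
script = 'src="http://www.mydomain.com/js/app.js"'

# The greedy version backtracks only to the LAST quote, swallowing the other attributes.
print(re.search(greedy, html).group(0) == html)  # True
print(re.search(tempered, html).group(0))        # src="http://www.mydomain.com/path/to/resource/image.gif"
print(re.search(tempered, css).group(0))         # href="http://www.mydomain.com/styles/site.css"
print(re.search(tempered, script))               # None
```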

Selecting URLs using RegExp but ignoring them when surrounded by double quotes

I've searched around quite a bit now, but I can't get any suggestions to work in my situation. I've seen success with negative lookahead or lookaround, but I really don't understand it.
I wish to use RegExp to find URLs in blocks of text but ignore them when quoted. While not perfect yet I have the following to find URLs:
(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?
I want it to match the following:
www.test.com:50/stuff
http://player.vimeo.com/video/63317960
odd.name.amazone.com/pizza
But not match:
"www.test.com:50/stuff
http://plAyerz.vimeo.com/video/63317960"
"odd.name.amazone.com/pizza"
Edit:
To clarify, I could be passing a full paragraph of text through the expression. Sample paragraph of what I'd like below:
I would like the following link to be found www.example.com. However this link should be ignored "www.example.com". It would be nice, but not required, to have "www.example.com and www.example.com" ignored as well.
Below is a sample of a different one I have working. The language is PHP:
$articleEntry = "Hey guys! Check out this cool video on Vimeo: player.vimeo.com/video/63317960";
$pattern = array('/\n+/', '/(https?\:\/\/)?(player\.vimeo\.com\/video\/[0-9]+)/');
$replace = array('<br/><br/>',
'<iframe src="http://$2?color=40cc20" width="500" height="281" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>');
$articleEntry = preg_replace($pattern,$replace,$articleEntry);
The code above replaces any newlines "\n" with a double break "<br/><br/>" and embeds the Vimeo video by replacing the Vimeo address with an iframe.
I've found a solution!
(?=(([^"]+"){2})*[^"]*$)((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)
The first part, from (? to *$), is what makes it work for me. I found it in an answer to "java Regex - split but ignore text inside quotes?" by https://stackoverflow.com/users/548225/anubhava
While I had read that question before, I had overlooked his answer because it wasn't the one that "solved" the question. I just changed the single quote to double quote and it works out for me.
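Here is a Python sketch of the solution pattern on a paragraph like the one in the question. The lookahead only succeeds when an even number of double quotes remains after the current position, so it assumes quotes are balanced. An unmatched quote, as in the first "should not match" example, flips that parity:

```python
import re

# Solution pattern: the lookahead demands an even number of double quotes
# between the current position and the end of the text.
pattern = (r'(?=(([^"]+"){2})*[^"]*$)'
           r'((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)')

text = 'Find www.example.com. Ignore "www.ignored.com". Also "a.b.com and c.d.com" too.'
urls = [m.group(0) for m in re.finditer(pattern, text)]
print(urls)  # ['www.example.com']

# With a single unmatched quote, zero quotes remain after it, so the URL is found.
unbalanced = re.search(pattern, '"www.test.com:50/stuff').group(0)
print(unbalanced)  # www.test.com:50/stuff
```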
Add ^ and $ to your regex:
^(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?$
Please note you might need to escape the slashes after http (i.e. https?\:\/\/).
Update:
If you want it to be case sensitive, you shouldn't use \w but [a-z]. \w matches all letters, digits and the underscore, so be careful when using it.
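Here is a Python check of the anchored pattern against the lists from the question. The ^ and $ anchors only help when each string holds a single candidate URL. Applied to a whole paragraph (the case in the question's edit), they prevent the pattern from matching URLs embedded in longer text:

```python
import re

anchored = r'^(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?$'

should_match = ["www.test.com:50/stuff",
                "http://player.vimeo.com/video/63317960",
                "odd.name.amazone.com/pizza"]
should_not = ['"www.test.com:50/stuff',
              'http://plAyerz.vimeo.com/video/63317960"',
              '"odd.name.amazone.com/pizza"']

print([bool(re.search(anchored, s)) for s in should_match])  # [True, True, True]
print([bool(re.search(anchored, s)) for s in should_not])    # [False, False, False]
```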