Ansible extract substring from multiline string containing url - regex

I'm trying to extract a substring from a multiline string with an Ansible regex, without success.
I have this output from an executed command (teleport users add):
"stdout": "Signup token has been created and is valid for 3600 seconds. Share this URL with the user:\nhttps://main-proxy:3080/web/newuser/d32ed2bc0ebb0084a381123e3eff0bfa\n\nNOTE: make sure 'main-proxy' is accessible!"
I would like to extract just the token, here d32ed2bc0ebb0084a381123e3eff0bfa.
I registered the output in a result variable, and I'm trying to extract the token without success:
- set_fact:
    signup_token: '{{ result.stdout | regex_replace("^(?s)^https:\/\/.*\/(.+).*?$", "\\1") }}'
- debug: msg={{ signup_token }}
What's the right regex and syntax?

Why use such a complex regular expression? Take the 32 characters of [0-9a-f] that follow a /.
- set_fact:
    signup_token: "{{ mystr | regex_search(qry) }}"
  vars:
    qry: '(?<=\/)[a-f0-9]{32}'
Use sites like https://regex101.com/ to test your expressions.

The regex (?s).*https://.*/([^\r\n]+).* works just as I expected.
And yes, I could have taken the last 32 characters of the second line of standard output, but I prefer to be agnostic of the token length in case it changes, so a regex that extracts the token ID from the output is the right approach for my case.
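Ansible's regex_search and regex_replace filters are backed by Python's re module, so both answers can be checked outside a playbook. The following is a minimal sketch, assuming the sample stdout from the question; the function names are made up for illustration, and the last function uses the question's pattern with the inline flag moved to the front so that it compiles on current Python versions.

```python
import re

# Sample output copied from the question.
stdout = (
    "Signup token has been created and is valid for 3600 seconds. "
    "Share this URL with the user:\n"
    "https://main-proxy:3080/web/newuser/d32ed2bc0ebb0084a381123e3eff0bfa\n\n"
    "NOTE: make sure 'main-proxy' is accessible!"
)


def token_fixed_length(text):
    """First answer: 32 hex characters directly preceded by a slash."""
    match = re.search(r"(?<=/)[a-f0-9]{32}", text)
    return match.group(0) if match else None


def token_any_length(text):
    """Second answer: text after the last slash that follows https://,
    up to the end of that line, whatever its length."""
    return re.sub(r"(?s).*https://.*/([^\r\n]+).*", r"\1", text)


def original_attempt(text):
    """Question's pattern (flag moved to the front). Without MULTILINE,
    ^ only matches at the very start of the string, which is 'Signup'."""
    return re.sub(r"(?s)^https://.*/(.+).*?$", r"\1", text)


print(token_fixed_length(stdout))
print(token_any_length(stdout))
print(original_attempt(stdout) == stdout)  # nothing was replaced
```

The third call returns the input unchanged, which is consistent with the behaviour described in the question: the leading ^ anchors the pattern to the first line, not to the line holding the URL.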

Get multiple lines using regex with Ansible

I have been trying to modify some files with Ansible but I do not have the right regex.
The goal is to modify a set of files, replacing everything between <Factory /> and </Factory> with "Not register". For example, I want to change this:
<Factory />
Replacement set
Madrid
</Factory>
to this:
<Factory />
Not register
</Factory>
What I have is the following:
- hosts: all
  tasks:
    - name: replace factory registration
      ansible.builtin.replace:
        path: /home/clientDatabase.xml
        regex: {'(?<=<Factory />.*?(?=</Factory>)', multiline = True}
        replace: 'Not register'
I have tried several expressions and this is the closest I have got. It works perfectly in Notepad++ if you enable regular expressions and check the ". matches newline" box, but it does not do anything in Ansible.
My understanding is that (?<=) and (?=) delimit the match, (.*) takes everything in between, zero or one time (?), and multiline = True makes the check span several lines so the whole structure is covered.
I have also tried \R for carriage returns and line breaks, as well as the ^ and $ anchors, but none of my attempts worked and I am running out of ideas.
Could someone give me any hints here?
Here are some resources I think helped me the most:
https://docs.ansible.com/ansible/latest/collections/ansible/builtin/replace_module.html
https://w3.unpocodetodo.info/utiles/regex.php
Update:
Finally, I followed your suggestion and used [^<]* to match everything except the "<" character, and it worked perfectly. The missing parenthesis was a typo, sorry.
The final result is:
- hosts: all
  tasks:
    - name: replace factory registration
      ansible.builtin.replace:
        path: /home/clientDatabase.xml
        regexp: '(?<=<Factory />)[^<]*'
        replace: 'Not register'
As I understand it, this replaces all content from <Factory /> up to the first <. With this pattern neither the multiline nor the dotall flag is needed.
I suggest the following regex which allows words, spaces and newline between the tags <Factory /> and </Factory>.
Please note that the string is on 2 lines because it contains a newline as part of the pattern.
(?<=<Factory \/>)[\w\s
]*(?=<\/Factory>)
This could also be written:
(?<=<Factory \/>)[\w\s\n\r]*(?=<\/Factory>)

Ansible lineinfile duplication using insertafter

I am trying to add an entry to my /etc/hosts file using Ansible's lineinfile. The logic I want is: if the entry 127.0.0.1 mysite.local is found, do nothing; otherwise insert it after the line 127.0.1.1.
127.0.0.1 localhost
127.0.1.1 mypc
127.0.0.1 mysite.local
I have the insertafter part working, but the regex search apparently fails to find the existing entry, so 127.0.0.1 mysite.local keeps being inserted again.
The docs do say:
When modifying a line the regexp should typically match both the initial state of the line as well as its state after replacement by line to ensure idempotence.
But I'm not sure how that applies to my regex. Currently my play is:
- name: Add the site to hosts
  lineinfile:
    path: /etc/hosts
    # Escape special chars
    regex: "^{{ domain|regex_escape() }}"
    line: "127.0.0.1 {{ domain }}"
    insertafter: '127\.0\.1\.1'
    firstmatch: yes
  become: yes
where domain is mysite.local.
I have looked at this answer, but I'm pretty sure I cannot use backrefs, since the docs state:
This flag changes the operation of the module slightly; insertbefore and insertafter will be ignored, and if the regexp doesn't match anywhere in the file, the file will be left unchanged.
I have also tried:
regex: '127\.0\.0\.1\s+?{{ domain|regex_escape() }}'
with no luck either.
It seems that firstmatch: yes was breaking things. It works for me with the following task (I replaced the space with a tab for a tidier look, but spaces work as well):
- name: Add the site to hosts
  lineinfile:
    path: /etc/hosts
    # Escape special chars
    regexp: "{{ domain|regex_escape() }}"
    line: "127.0.0.1{{ '\t' }}{{ domain }}"
    insertafter: '127\.0\.1\.1'
According to this link, lineinfile scans the file and applies the regex one line at a time, meaning you cannot use a regex that looks through the whole file. I am unfamiliar with the lineinfile tool, but if you can use the "replace" tool used in the link above then you can use the following Python regex to match as you need:
\A((?:(?!127\.0\.0\.1\s)[\s\S])*?)(?:\Z|127\.0\.0\.1\s+(?!{{ domain|regex_escape() }})\S+\n|(127\.0\.1\.1\s+\S+(?![\s\S]*\n127\.0\.0\.1\s)\n))
With the substitution: "\1\2127.0.0.1 {{ domain }}\n"
The non-capturing group handles three distinct cases:
Case 1: 127.0.1.1 and 127.0.0.1 don't exist so insert at end
Case 2: 127.0.0.1 exists with a different host so replace the entry
Case 3: 127.0.1.1 exists so insert after it
It is the second case that tackles idempotence by avoiding matching an entry for "127.0.0.1" if one already exists.
The doc says:
insertafter: ... If regular expressions are passed to both regexp and insertafter, insertafter is only honored if no match for regexp is found.
The regex in the task expands to
regex: ^mysite\.local
This regex is not found because there is no line that begins with "mysite.local". Hence insertafter is honored and "line" is inserted after 127.0.1.1 .
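To see why the anchored pattern never matches, here is a small Python sketch that tests each line of the hosts file separately, which is how lineinfile is described as working earlier in this thread. re.escape stands in for the regex_escape filter; that substitution and the helper name matching_lines are assumptions made for illustration.

```python
import re

# Hosts file from the question, one entry per line.
hosts = [
    "127.0.0.1 localhost",
    "127.0.1.1 mypc",
    "127.0.0.1 mysite.local",
]
domain = "mysite.local"


def matching_lines(pattern, lines):
    """Return the lines in which the pattern is found (hypothetical helper)."""
    return [line for line in lines if re.search(pattern, line)]


# Anchored at the start of the line: no line begins with the domain.
anchored = matching_lines("^" + re.escape(domain), hosts)

# Unanchored: the existing entry is found, so no new line would be needed.
unanchored = matching_lines(re.escape(domain), hosts)

# Matching the whole entry, address included, is stricter still.
full_entry = matching_lines(r"^127\.0\.0\.1\s+" + re.escape(domain) + "$", hosts)

print(anchored)
print(unanchored)
print(full_entry)
```

Because the anchored pattern finds nothing, the insertafter rule quoted above takes over on every run, which would explain the duplicated entries.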

How can I get Regex_replace to match a special character in Ansible, and append a \ in front of it?

I am trying to create a regex that matches any special characters inside of a URL, and then adds the escape character \ in front of it. I have created the following regex, which correctly captures the right characters from the string, but it does not add the \ in front of the special characters.
For example - test-google.com would look like test\-google\.com
- hosts: localhost
  vars:
    site: "test-google.com"
    site2: "test.mywebsite.com"
  tasks:
    - name: Bad at regex
      debug:
        msg: "{{ site | regex_replace('[^\\w]', '[^\\\w]') }}"
      register: regex
    - debug:
        msg: "{{ regex }}"
I have tried '[^\\'\'w]' as well as '\w'
How could I accomplish this?
Thanks.
You may use either
msg: "{{ site | regex_replace(r'\W', r'\\\g<0>') }}"
or
msg: "{{ site | regex_replace('\\W', '\\\\\\g<0>') }}"
Here, \W matches any non-word char, and the replacement string contains a \ (expressed with 4 backslashes in the regular, non-raw, string literal, and 2 backslashes in a raw string literal) and then the whole match value expressed with the replacement backreference \g<0>.
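The backslash counting can be checked directly in Python, whose re.sub is what regex_replace relies on. This is a minimal sketch using the two sample values from the question; it only covers the Python-level escaping, and any extra layer added by YAML quoting is deliberately not modelled here.

```python
import re

site = "test-google.com"
site2 = "test.mywebsite.com"

# Raw strings: "\\" in the replacement is one literal backslash,
# and \g<0> is the whole match.
escaped_raw = re.sub(r"\W", r"\\\g<0>", site)

# Ordinary string literals: every backslash above is doubled once more.
escaped_plain = re.sub("\\W", "\\\\\\g<0>", site)

escaped_site2 = re.sub(r"\W", r"\\\g<0>", site2)

print(escaped_raw)
print(escaped_plain)
print(escaped_site2)
```

Both forms compile to the same pattern and replacement, so they produce identical output; the raw-string form is simply easier to read.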

Match literals with 'regex_replace' Ansible filter

I cannot find a way to match a literal (a dot) in Ansible's regex_replace filter. Here is the task:
- name: Display database name
  debug:
    msg: "{{ vhost | regex_replace('(.+\.)(.+)$', \\1) }}"
  tags: [debug]
My intention is to match and replace the whole URL like test.staging.domain.com with its first part (test in the example).
Ansible would report the following error:
debug:
msg: "{{ vhost | regex_replace('(.+\.)(.+)$', \\1) }}"
^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes. Always quote template expression brackets when they
start a value.
How can I match literals in Ansible's regex_replace filter?
It's actually possible to escape literals with double backslashes:
- name: Create database name and username
  set_fact:
    db_name: "{{ vhost | regex_replace('([^\\.]*)\\.(.+)$', '\\1') }}_stg"
The regexp above works correctly. The first capturing group extracts the first part of the URL until the first dot, the whole regex captures the whole URL. Passing test.staging.domain.com through it would produce just test.
I was trying to do the same thing and ended up doing it like this:
- name: Set hostname (minus the domain)
  debug: msg={{ inventory_hostname | regex_replace('^([^.]*).*', '\\1') }}
Edit: found a nicer way:
- name: Set hostname (minus the domain)
  debug: msg={{ inventory_hostname.split('.')[0] }}
There could be something wacky about escaping characters, but there's an escape-free way to write a literal dot:
[.]
So your regex could be written
(.+[.])(.+)$
Most characters lose their special meaning when in a character class, and the dot is one of them.
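The patterns in this thread can be compared side by side in Python, whose re.sub is what regex_replace relies on. A minimal sketch, assuming the example host name from the question; the variable names are illustrative.

```python
import re

vhost = "test.staging.domain.com"

# Accepted approach: capture everything before the first dot.
escaped_dot = re.sub(r"([^\.]*)\.(.+)$", r"\1", vhost)

# Second answer: anchor at the start and drop the rest.
anchored = re.sub(r"^([^.]*).*", r"\1", vhost)

# No regex at all.
split_first = vhost.split(".")[0]

# Character-class dot applied to the question's pattern: the dot is now
# literal, but the greedy .+ still runs up to the LAST dot.
greedy = re.sub(r"(.+[.])(.+)$", r"\1", vhost)

print(escaped_dot, anchored, split_first)
print(greedy)
```

The last result shows that escaping the dot is only half of the fix: to get just the first label, the first group also has to stop at the first dot, for example with [^.]* in place of .+.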

Ansible: how to handle dots while using search or match functions

Here are two scenarios; the variable prefix just holds another string.
1) This works:
when: ansible_hostname | search("{{ prefix }}-test-.*")
2) This doesn't work, probably because of the dots in the search string:
when: ansible_hostname | search("{{ prefix }}-test-.*.tin.com")
I even tried escaping the dots, without any success:
when: ansible_hostname | search("{{ prefix }}-test-.*\.tin\.com")
I finally understood that ansible_hostname gives only the short host name, not the FQDN, so my regex, which looked for an FQDN ending in .tin.com, could never match. For the time being I'll use groups['name'] to iterate over my hosts and get the FQDN.
Note: Ansible has ansible_fqdn to give the complete host name; however, that works only if DNS is configured.
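The difference is easy to reproduce with Python's re.search, which is what the search test is built on as far as I know. A minimal sketch, assuming a hypothetical prefix "web" and made-up host names; none of these values come from the original playbook.

```python
import re

prefix = "web"                 # hypothetical value of the prefix variable
short_name = "web-test-01"     # what a short host name might look like
fqdn = "web-test-01.tin.com"   # the corresponding fully qualified name

# Pattern with the dots escaped, as in the last attempt.
strict = re.escape(prefix) + r"-test-.*\.tin\.com"

# Pattern with plain dots, as in the second attempt: "." matches any character.
loose = re.escape(prefix) + r"-test-.*.tin.com"

short_matches = bool(re.search(strict, short_name))
fqdn_matches = bool(re.search(strict, fqdn))
loose_false_positive = bool(re.search(loose, "web-test-01xtinxcom"))

print(short_matches, fqdn_matches, loose_false_positive)
```

So escaping the dots was the right instinct for precision, but no pattern that requires the domain suffix can match a value that never contains it.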