Ansible: how to handle dots while using search or match functions - regex

Here are the two scenarios; the variable prefix just holds another string.
1) This works:
when: ansible_hostname | search("{{ prefix }}-test-.*")
2) This doesn't work, probably because of the dots in the search string:
when: ansible_hostname | search("{{ prefix }}-test-.*.tin.com")
I even tried escaping the dots, without any success:
when: ansible_hostname | search("{{ prefix }}-test-.*\.tin\.com")

I finally understood that ansible_hostname gives you only the short host name, not the FQDN, so my regex, which looks for an FQDN ending in .tin.com, could never match. For the time being I'll be using groups['name'] to iterate over my hosts and get the FQDN.
Note: Ansible has ansible_fqdn to give the complete host name however that works fine only if DNS is configured.
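To illustrate the point, here is a minimal Python sketch (Ansible's search test is built on Python's re.search). The host name web-test-01.tin.com, the prefix web, and the helper matches_host are made up for the example; the short name stands in for what ansible_hostname would typically hold.

```python
import re

def matches_host(name, prefix):
    # Escape the prefix and the literal dots; ".*" stays a wildcard.
    pattern = re.escape(prefix) + r"-test-.*\.tin\.com"
    return re.search(pattern, name) is not None

fqdn = "web-test-01.tin.com"   # hypothetical FQDN (what ansible_fqdn might return)
short = fqdn.split(".")[0]     # short name, comparable to ansible_hostname

print(matches_host(fqdn, "web"))    # the FQDN contains ".tin.com"
print(matches_host(short, "web"))   # the short name does not
```

The escaped pattern itself is fine; it simply has nothing to match against when the variable only holds the short name.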

Related

How to update new hostname in a file using ansible?

I am using the code below to replace the old hostname with a new one. It works except for hostnames starting with numbers. (OLD_HOSTNAME and NEW_HOSTNAME are vars.)
tasks:
  - name: "Updating file"
    replace:
      name: /tmp/interfaces
      backup: yes
      regexp: '(\s+){{ OLD_HOSTNAME }}(\s+)'
      replace: '\1{{ NEW_HOSTNAME }}\2'
If I replace \1 with \g<1>, hostnames starting with numbers are also replaced correctly. According to the Ansible docs, \1 is the ambiguous form of the backreference and \g<1> is the explicit form.
Question: Will this change impact any other format of hostname?
No, using the explicit form will not affect other hostname formats.
The reason why you have a problem when NEW_HOSTNAME begins with a number is that the replace string would become something like \123-server\2 if NEW_HOSTNAME was 23-server and there is no backreference \123. Using the explicit form preserves your original intent. In my example, replace would become \g<1>23-server\g<2>.
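The same effect can be reproduced directly with Python's re module, which the replace module relies on. This is only a sketch: rename_host is a hypothetical helper, and the sample line is invented.

```python
import re

def rename_host(text, old, new, explicit=True):
    """Replace `old` with `new` where it is surrounded by whitespace."""
    pattern = r"(\s+)" + re.escape(old) + r"(\s+)"
    if explicit:
        repl = r"\g<1>" + new + r"\g<2>"   # unambiguous group references
    else:
        repl = r"\1" + new + r"\2"         # "\1" runs into a leading digit of `new`
    return re.sub(pattern, repl, text)

line = "address old-host eth0"
print(rename_host(line, "old-host", "23-server"))   # address 23-server eth0

# The ambiguous form either raises re.error or produces a wrong string,
# depending on which digits follow "\1".
try:
    print(repr(rename_host(line, "old-host", "23-server", explicit=False)))
except re.error as exc:
    print("ambiguous form failed:", exc)
```

With a hostname that starts with a letter, both forms give the same result, which is why switching to \g<1> is safe for the other formats.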

Ansible lineinfile duplication using insertafter

I am trying to add an entry to my /etc/hosts file using Ansible's lineinfile. The logic I want is: if it finds the entry 127.0.0.1 mysite.local, do nothing; otherwise insert it after the line 127.0.1.1
127.0.0.1 localhost
127.0.1.1 mypc
127.0.0.1 mysite.local
I have the insert after part working but it appears the actual regex search is failing to find the existing entry so I keep getting duplication of the insertion of 127.0.0.1 mysite.local
The docs do say;
When modifying a line the regexp should typically match both the initial state of the line as well as its state after replacement by line to ensure idempotence.
But I'm not sure how that applies to my regex. Currently my play is;
- name: Add the site to hosts
  lineinfile:
    path: /etc/hosts
    # Escape special chars
    regex: "^{{ domain|regex_escape() }}"
    line: "127.0.0.1 {{ domain }}"
    insertafter: '127\.0\.1\.1'
    firstmatch: yes
  become: yes
where domain is mysite.local.
I have looked at this answer but I'm pretty sure I cannot use backrefs since the docs state;
This flag changes the operation of the module slightly; insertbefore and insertafter will be ignored, and if the regexp doesn't match anywhere in the file, the file will be left unchanged.
I have tried;
regex: '127\.0\.0\.1\s+?{{ domain|regex_escape() }}'
With no luck either
It seems that firstmatch: yes was breaking things. It worked for me with the following task (I replaced the space with a tab for a tidier look, but spaces work as well):
- name: Add the site to hosts
  lineinfile:
    path: /etc/hosts
    # Escape special chars
    regexp: "{{ domain|regex_escape() }}"
    line: "127.0.0.1{{ '\t' }}{{ domain }}"
    insertafter: '127\.0\.1\.1'
According to this link, lineinfile scans the file and applies the regex one line at a time, meaning you cannot use a regex that looks through the whole file. I am unfamiliar with the lineinfile tool, but if you can use the "replace" tool used in the link above then you can use the following Python regex to match as you need:
\A((?:(?!127\.0\.0\.1\s)[\s\S])*?)(?:\Z|127\.0\.0\.1\s+(?!{{ domain|regex_escape() }})\S+\n|(127\.0\.1\.1\s+\S+(?![\s\S]*\n127\.0\.0\.1\s)\n))
With the substitution: "\1\2127.0.0.1 {{ domain }}\n"
The non-capturing group handles three distinct cases:
Case 1: 127.0.1.1 and 127.0.0.1 don't exist so insert at end
Case 2: 127.0.0.1 exists with a different host so replace the entry
Case 3: 127.0.1.1 exists so insert after it
It is the second case that tackles idempotence by avoiding matching an entry for "127.0.0.1" if one already exists.
The doc says:
insertafter: ... If regular expressions are passed to both regexp and insertafter, insertafter is only honored if no match for regexp is found.
The regex in the task expands to
regex: ^mysite\.local
This regex is not found because there is no line that begins with "mysite.local". Hence insertafter is honored and "line" is inserted after 127.0.1.1 .
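A rough Python approximation of that line-by-line check shows why the anchored pattern never finds the existing entry. This is a sketch, not lineinfile's actual code; the helper any_line_matches is hypothetical and the lines are taken from the hosts file in the question.

```python
import re

hosts = [
    "127.0.0.1 localhost",
    "127.0.1.1 mypc",
    "127.0.0.1 mysite.local",
]
domain = "mysite.local"

anchored = re.compile("^" + re.escape(domain))   # what the original task expands to
unanchored = re.compile(re.escape(domain))       # pattern without the leading ^

def any_line_matches(regex, lines):
    # The regex is applied to one line at a time, never to the whole file.
    return any(regex.search(line) for line in lines)

print(any_line_matches(anchored, hosts))     # no line starts with the domain
print(any_line_matches(unanchored, hosts))   # the existing entry is found
```

Since the anchored pattern finds nothing, the module falls back to insertafter on every run, which is where the duplicates come from.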

Ansible extract substring from multiline string containing url

I'm trying to extract a substring from a multiline string with an Ansible regex, without success.
I have this output from an executed command (teleport users add):
"stdout": "Signup token has been created and is valid for 3600 seconds. Share this URL with the user:\nhttps://main-proxy:3080/web/newuser/d32ed2bc0ebb0084a381123e3eff0bfa\n\nNOTE: make sure 'main-proxy' is accessible!"
I would like to extract just the token. Here: d32ed2bc0ebb0084a381123e3eff0bfa.
I registered the output in a result variable, and I'm trying to extract the token, without success:
- set_fact:
    signup_token: '{{ result.stdout | regex_replace("^(?s)^https:\/\/.*\/(.+).*?$", "\\1") }}'
- debug: msg={{ signup_token }}
What's the right regex and syntax?
Why use such a complex regular expression? Take the 32 characters of [0-9a-f] after the /.
- set_fact:
    signup_token: "{{ mystr | regex_search(qry) }}"
  vars:
    qry: '(?<=\/)[a-f0-9]{32}'
Use sites like https://regex101.com/ to test your expressions.
The regex (?s).*https://.*/([^\r\n]+).* works just as I expected.
And yes, I could also have taken the last 32 characters from the second line of standard output, but I prefer to be agnostic of the token length in case it changes, so a regex that extracts the token ID from the output is the right way for my case now.
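Both approaches can be tried outside Ansible with plain Python, since regex_search and regex_replace are thin wrappers around re.search and re.sub. The function names below are made up for this sketch; the sample output is the one from the question.

```python
import re

stdout = (
    "Signup token has been created and is valid for 3600 seconds. "
    "Share this URL with the user:\n"
    "https://main-proxy:3080/web/newuser/d32ed2bc0ebb0084a381123e3eff0bfa\n\n"
    "NOTE: make sure 'main-proxy' is accessible!"
)

def token_fixed_length(text):
    # Lookbehind for "/" followed by exactly 32 lowercase hex characters.
    match = re.search(r"(?<=/)[a-f0-9]{32}", text)
    return match.group(0) if match else None

def token_any_length(text):
    # Keep whatever follows the last "/" of the URL, up to the end of that line.
    return re.sub(r"(?s).*https://.*/([^\r\n]+).*", r"\1", text)

print(token_fixed_length(stdout))
print(token_any_length(stdout))
```

The first variant is stricter about what counts as a token; the second keeps working if the token length changes.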

Need IP Address mask and DNS host name regular expressions?

I need to accept an IP address or DNS name from a text box. I am looking for a regular expression that validates IP addresses.
Right now I am using this regular expression:
/\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/
It works for the 0-255 range, but it allows invalid IPs such as 121.21.05.234.01, which has 5 parts.
I need a regular expression that works in all scenarios like the ones below:
10.2.22.1 - true
123.123.123.123 - true
123.123.023.12 - true
12.23.12.0 - true
121.21.05.234.01 - false
Please provide a DNS expression as well.
Try to anchor your regex with ^ and $, which will make it match the whole string.
Are you looking for a way to specify an occurrence count?
You can achieve this with curly brackets.
In your case, it would lead to:
/\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\b/
(I added a \ to escape the dot, too)
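Combining the two suggestions (anchors plus the {3} repetition) can be sketched in Python as follows. The names OCTET, UNANCHORED, ANCHORED and is_ipv4 are invented for the example; the test inputs come from the question.

```python
import re

OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Word boundaries only: still finds an address inside a longer dotted string.
UNANCHORED = re.compile(rf"\b{OCTET}(\.{OCTET}){{3}}\b")
# ^ and $ force the whole input to be exactly four octets.
ANCHORED = re.compile(rf"^{OCTET}(\.{OCTET}){{3}}$")

def is_ipv4(text):
    return ANCHORED.match(text) is not None

for candidate in ["10.2.22.1", "123.123.123.123", "123.123.023.12",
                  "12.23.12.0", "121.21.05.234.01"]:
    print(candidate, is_ipv4(candidate))
```

With \b alone, 121.21.05.234.01 still produces a match on its first four parts, which is the behaviour the question complains about.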

Regular expression to match DNS hostname or IP Address?

Does anyone have a regular expression handy that will match any legal DNS hostname or IP address?
It's easy to write one that works 95% of the time, but I'm hoping to get something that's well tested to exactly match the latest RFC specs for DNS hostnames.
You can use the following regular expressions separately or by combining them in a joint OR expression.
ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
ValidHostnameRegex = "^(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])$";
ValidIpAddressRegex matches valid IP addresses and ValidHostnameRegex matches valid host names. Depending on the language you use, \ may have to be escaped as \\.
ValidHostnameRegex is valid as per RFC 1123. Originally, RFC 952 specified that hostname segments could not start with a digit.
http://en.wikipedia.org/wiki/Hostname
The original specification of hostnames in RFC 952 mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits.
Valid952HostnameRegex = "^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
The hostname regex of smink does not observe the limitation on the length of individual labels within a hostname. Each label within a valid hostname may be no more than 63 octets long.
ValidHostnameRegex="^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])\
(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$"
Note that the backslash at the end of the first line (above) is Unix shell syntax for splitting the long line. It's not a part of the regular expression itself.
Here's just the regular expression alone on a single line:
^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$
You should also check separately that the total length of the hostname must not exceed 255 characters. For more information, please consult RFC-952 and RFC-1123.
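As a sketch of how the regex and the separate length check fit together, here is a small Python helper. The name is_valid_hostname is made up; the pattern is the single-line regex above, unchanged.

```python
import re

HOSTNAME_RE = re.compile(
    r"^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])"
    r"(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$"
)

def is_valid_hostname(name):
    # The regex limits each label to 63 characters; the total length
    # (at most 255) has to be checked separately.
    return len(name) <= 255 and HOSTNAME_RE.match(name) is not None

print(is_valid_hostname("example.com"))
print(is_valid_hostname("a" * 64 + ".com"))   # label too long
print(is_valid_hostname("-bad.example.com"))  # label starts with a hyphen
```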
To match a valid IP address use the following regex:
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}
instead of:
([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}
Explanation
Many regex engines take the first alternative in an OR sequence that matches. For instance, try both regexes against the following input:
10.48.0.200
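The difference is easy to see in Python, whose re module also tries alternatives from left to right. This is only a sketch; the names GOOD and BAD are labels for the two regexes above.

```python
import re

GOOD = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"
BAD = r"([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}"

address = "10.48.0.200"

# Without anchors, the "bad" ordering is satisfied by the two-digit
# alternative first and stops one character short.
print(re.search(GOOD, address).group(0))   # 10.48.0.200
print(re.search(BAD, address).group(0))    # 10.48.0.20
```

If the pattern is anchored with ^ and $, the engine is forced to backtrack and both orderings accept the full address, so the ordering matters mostly for unanchored searches.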
I don't seem to be able to edit the top post, so I'll add my answer here.
For IP addresses, the easy answer is the egrep example here -- http://www.linuxinsight.com/how_to_grep_for_ip_addresses_using_the_gnu_egrep_utility.html
egrep '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}'
However, this doesn't account for values like 0 in the first octet, or values greater than 254 (IP address) or 255 (netmask). Maybe an additional if statement would help.
As for a legal DNS hostname, provided that you are checking for internet hostnames only (and not intranet), I wrote the following snippet, a mix of shell and PHP, but it should be applicable as any regular expression.
First, go to the IANA website, then download and parse the list of legal top-level domain names:
tld=$(curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt | sed 1d | cut -f1 -d'-' | tr '\n' '|' | sed 's/\(.*\)./\1/')
echo "($tld)"
That should give you a nice piece of re code that checks for legality of top domain name, like .com .org or .ca
Then add the first part of the expression according to the guidelines found here -- http://www.domainit.com/support/faq.mhtml?category=Domain_FAQ&question=9 (any alphanumeric combination and the '-' symbol; the dash should not be at the beginning or end of a label).
(([a-z0-9]+|([a-z0-9]+[-]+[a-z0-9]+))[.])+
Then put it all together (PHP preg_match example):
$pattern = '/^(([a-z0-9]+|([a-z0-9]+[-]+[a-z0-9]+))[.])+(AC|AD|AE|AERO|AF|AG|AI|AL|AM|AN|AO|AQ|AR|ARPA|AS|ASIA|AT|AU|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BIZ|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CAT|CC|CD|CF|CG|CH|CI|CK|CL|CM|CN|CO|COM|COOP|CR|CU|CV|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EDU|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|GF|GG|GH|GI|GL|GM|GN|GOV|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|INFO|INT|IO|IQ|IR|IS|IT|JE|JM|JO|JOBS|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MG|MH|MIL|MK|ML|MM|MN|MO|MOBI|MP|MQ|MR|MS|MT|MU|MUSEUM|MV|MW|MX|MY|MZ|NA|NAME|NC|NE|NET|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|ORG|PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|PRO|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SY|SZ|TC|TD|TEL|TF|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TRAVEL|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|XN|XN|XN|XN|XN|XN|XN|XN|XN|XN|XN|YE|YT|YU|ZA|ZM|ZW)[.]?$/i';
if (preg_match($pattern, $matching_string)) {
    // ... do stuff
}
You may also want to add an if statement to check that the string you are checking is shorter than 256 characters -- http://www.ops.ietf.org/lists/namedroppers/namedroppers.2003/msg00964.html
It's worth noting that there are libraries for most languages that do this for you, often built into the standard library. And those libraries are likely to get updated a lot more often than code that you copied off a Stack Overflow answer four years ago and forgot about. And of course they'll also generally parse the address into some usable form, rather than just giving you a match with a bunch of groups.
For example, detecting and parsing IPv4 in (POSIX) C:
#include <arpa/inet.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    for (int i = 1; i != argc; ++i) {
        struct in_addr addr = {0};
        printf("%s: ", argv[i]);
        if (inet_pton(AF_INET, argv[i], &addr) != 1)
            printf("invalid\n");
        else
            printf("%u\n", addr.s_addr);
    }
    return 0;
}
Obviously, such functions won't work if you're trying to, e.g., find all valid addresses in a chat message—but even there, it may be easier to use a simple but overzealous regex to find potential matches, and then use the library to parse them.
For example, in Python:
>>> import ipaddress
>>> import re
>>> msg = "My address is 192.168.0.42; 192.168.0.420 is not an address"
>>> for maybeip in re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', msg):
...     try:
...         print(ipaddress.ip_address(maybeip))
...     except ValueError:
...         pass
I think this is the best IP validation regex. Please check it:
^(([01]?[0-9]?[0-9]|2([0-4][0-9]|5[0-5]))\.){3}([01]?[0-9]?[0-9]|2([0-4][0-9]|5[0-5]))$
import re

def isValidHostname(hostname):
    if len(hostname) > 255:
        return False
    if hostname[-1:] == ".":
        hostname = hostname[:-1]  # strip exactly one dot from the right, if present
    # each label: 1-63 letters, digits or hyphens, with no hyphen at either end
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))
"^((\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])\.){3}(\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])$"
This works for valid IP addresses:
regex = '^([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])[.]([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])[.]([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])[.]([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])$'
>>> import re
>>> my_hostname = "testhostn.ame"
>>> print(bool(re.match(r"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$", my_hostname)))
True
>>> my_hostname = "testhostn....ame"
>>> print(bool(re.match(r"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$", my_hostname)))
False
>>> my_hostname = "testhostn.A.ame"
>>> print(bool(re.match(r"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$", my_hostname)))
True
The new Network framework has failable initializers for struct IPv4Address and struct IPv6Address which handle the IP address portion very easily. Doing this in IPv6 with a regex is tough with all the shortening rules.
Unfortunately I don't have an elegant answer for hostname.
Note that Network framework is recent, so it may force you to compile for recent OS versions.
import Network
let tests = ["192.168.4.4","fkjhwojfw","192.168.4.4.4","2620:3","2620::33"]
for test in tests {
    if let _ = IPv4Address(test) {
        debugPrint("\(test) is valid ipv4 address")
    } else if let _ = IPv6Address(test) {
        debugPrint("\(test) is valid ipv6 address")
    } else {
        debugPrint("\(test) is not a valid IP address")
    }
}
output:
"192.168.4.4 is valid ipv4 address"
"fkjhwojfw is not a valid IP address"
"192.168.4.4.4 is not a valid IP address"
"2620:3 is not a valid IP address"
"2620::33 is valid ipv6 address"
/^(?:[a-zA-Z0-9]+|[a-zA-Z0-9][-a-zA-Z0-9]+[a-zA-Z0-9])(?:\.[a-zA-Z0-9]+|[a-zA-Z0-9][-a-zA-Z0-9]+[a-zA-Z0-9])?$/
Here is a regex that I used in Ant to obtain a proxy host IP or hostname out of ANT_OPTS. This was used to obtain the proxy IP so that I could run an Ant "isreachable" test before configuring a proxy for a forked JVM.
^.*-Dhttp\.proxyHost=(\w{1,}\.\w{1,}\.\w{1,}\.*\w{0,})\s.*$
I found this works pretty well for IP addresses. It validates like the top answer, but it also makes sure the IP is isolated, so no text or extra numbers/decimals come before or after it.
(?<!\S)(?:(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\b|\.\b){7}(?!\S)
AddressRegex = "^(ftp|http|https):\/\/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5})$";
HostnameRegex = /^(ftp|http|https):\/\/([a-z0-9]+\.)?[a-z0-9][a-z0-9-]*((\.[a-z]{2,6})|(\.[a-z]{2,6})(\.[a-z]{2,6}))$/i
These regexes are meant only for this specific kind of validation.
They match:
http://www.kk.com
http://www.kk.co.in
They do not match:
http://www.kk.com/
http://www.kk.co.in.kk
http://www.kk.com/dfas
http://www.kk.co.in/
try this:
((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
it works in my case.
Regarding IP addresses, it appears that there is some debate on whether to include leading zeros. It was once the common practice and is generally accepted, so I would argue that they should be flagged as valid regardless of the current preference. There is also some ambiguity over whether text before and after the string should be validated and, again, I think it should. 1.2.3.4 is a valid IP but 1.2.3.4.5 is not and neither the 1.2.3.4 portion nor the 2.3.4.5 portion should result in a match. Some of the concerns can be handled with this expression:
grep -E '(^|[^[:alnum:]])(([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.){3}([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])([^[:alnum:]]|$)'
The unfortunate part here is that the regex portion that validates an octet is repeated, as is true in many of the offered solutions. Although this is better than four instances of the pattern, the repetition can be eliminated entirely if subroutines are supported by the regex engine being used. The next example enables them with the -P switch of grep and also takes advantage of lookahead and lookbehind. (The subroutine name I selected is 'o' for octet. I could have used 'octet' as the name but wanted to be terse.)
grep -P '(?<![\d\w\.])(?<o>([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(\.\g<o>){3}(?![\d\w\.])'
The handling of the dot might actually create false negatives if IP addresses are in a file with text in the form of sentences, since a period could follow the address without being part of the dotted notation. A variant of the above would fix that:
grep -P '(?<![\d\w\.])(?<x>([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(\.\g<x>){3}(?!([\d\w]|\.\d))'
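Python's standard re module has no subroutine calls, but the same idea can be approximated by building the pattern from a single octet string, which also avoids writing the octet four times. A minimal sketch, with OCTET, IP_IN_TEXT and the sample sentence invented for the example:

```python
import re

OCTET = r"(?:[01]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])"

# Lookbehind: not preceded by a word character or a dot.
# Lookahead: not followed by a word character or by ".<digit>",
# so a sentence-ending period after the address is tolerated.
IP_IN_TEXT = re.compile(rf"(?<![\w.]){OCTET}(?:\.{OCTET}){{3}}(?!\w|\.\d)")

text = "The host is 1.2.3.4. It is not 1.2.3.4.5 or 999.1.1.1, but 192.168.0.200 is fine."
print(IP_IN_TEXT.findall(text))
```

Neither the 1.2.3.4 part nor the 2.3.4.5 part of the five-part string is reported, matching the behaviour argued for above.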
There's a further nuance here that's missing.
It's true that a HOSTNAME should match, basically, what's been given above.
What's missing is that a REFERENCE TO a hostname can be the same, plus an optional period on the end.
For example, with a trailing period, ping foo.bar.svc.cluster.local. will ping that hostname only, and not attempt any DNS search options in resolv.conf.
tldr - If you provide an input box to receive a hostname, what's entered does not actually need to be a valid hostname.
how about this?
([0-9]{1,3}\.){3}[0-9]{1,3}
on php: filter_var(gethostbyname($dns), FILTER_VALIDATE_IP) == true ? 'ip' : 'not ip'
Checking for host names like... mywebsite.co.in, thangaraj.name, 18thangaraj.in, thangaraj106.in etc.,
[a-z\d+].*?\\.\w{2,4}$
I thought about this simple regex matching pattern for IP address matching
\d+[.]\d+[.]\d+[.]\d+