Regex to Match range of IP addresses - regex

I want to match IP addresses in the ranges 10.0-29.x.x, 10.31-39.x.x, and 10.41-253.x.x.
Of the lines below, I want to capture the 3rd line and below.
network 10.40.5.0 0.0.0.255
network 10.255.5.0 0.0.0.255
network 10.23.3.0 0.0.0.255
netowrk 10.273.255.0 0.255.255
The way it will work is: if there is a match, a flag is set marking the configuration as invalid. I may have 10 invalid lines or just 1; it doesn't matter.

Regexes are not designed to do arithmetic.
However, for a single digit you can use a character class such as [3-4] if you want a 3 or a 4 (the {1} quantifier is redundant).
For more involved range checks, you might first match with a general IP regex and then check the numbers in whatever language you are using.
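A minimal sketch of that "match first, then do the math" idea, using awk for the numeric comparison. The function name check_config and the printed words "valid"/"invalid" are made up for illustration; the ranges are the ones from the question.

```shell
# Hypothetical helper: reads a config on stdin and prints "invalid" if any
# "network 10.N.x.x" line has N in 0-29, 31-39 or 41-253, "valid" otherwise.
check_config() {
    awk '
        # Only look at "network" lines whose address has four dotted parts,
        # starts with 10 and has a purely numeric second octet.
        $1 == "network" && split($2, o, ".") == 4 && o[1] == 10 && o[2] ~ /^[0-9]+$/ {
            n = o[2] + 0
            if (n <= 29 || (n >= 31 && n <= 39) || (n >= 41 && n <= 253))
                bad = 1
        }
        END { print (bad ? "invalid" : "valid") }
    '
}

printf 'network 10.40.5.0 0.0.0.255\nnetwork 10.23.3.0 0.0.0.255\n' | check_config
```

The comparison is done on numbers rather than on digit patterns, so changing a range only means changing a bound.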

The core of your problem is a regex that matches these number ranges: 0-29, 31-39, 41-253
An extended regex that matches this is:
^network 10\.([0-9]|1[0-9]|2[0-9]|3[1-9]|4[1-9]|[5-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-3])\.[0-9]+\.[0-9]+
The regex is divided in these steps:
0-9, 10-19, 20-29, 31-39, 41-49, 50-99, 100-199, 200-249, 250-253
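As a quick check of those steps, here is a small sketch that runs the extended regex over the four sample lines from the question and lists the matching lines with their line numbers. The variable sample and the function match_restricted are names chosen for this example only.

```shell
# The four sample lines from the question, one per line
sample='network 10.40.5.0 0.0.0.255
network 10.255.5.0 0.0.0.255
network 10.23.3.0 0.0.0.255
netowrk 10.273.255.0 0.255.255'

# Hypothetical helper: print matching lines from stdin, prefixed by line number
match_restricted() {
    grep -En '^network 10\.([0-9]|1[0-9]|2[0-9]|3[1-9]|4[1-9]|[5-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-3])\.[0-9]+\.[0-9]+'
}

printf '%s\n' "$sample" | match_restricted
```

With this input only the third line (10.23.3.0) is reported: 40 and 255 are outside the ranges, and the fourth line neither starts with "network" nor has a second octet within 0-253.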
A shell script that would work is:
if grep -Eq '^network 10\.([0-9]|1[0-9]|2[0-9]|3[1-9]|4[1-9]|[5-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-3])\.[0-9]+\.[0-9]+ ' input_file
then
echo action if matched
else
echo action if not matched
fi

Related

Grep PCRE Regex Non Capturing Groups

From the following text I wish to extract the following two strings:
ip-10-x-x-x.eu-west-2.compute.internal
And
topology.kubernetes.io/zone=eu-west-2a
Full blob:
ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a
Regex with Grep PCRE is being used to extract the strings.
The following regex works on https://regex101.com/
(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))
But when running it on Bash v4.2 with grep, it pulls back the full blob rather than the regex groups, as seen here:
echo "ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | grep -oP "(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))"
What am I missing here?
As Barmer comments, grep cannot output individual capture groups; with -o it prints each whole match. You need to modify the regex so that the matches themselves are the strings you want:
echo "ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | grep -oP "^ip\S+|(?<=\,)topology\.kubernetes\.io\/zone\S*(?=(?:\s|$))"
Output:
ip-10-x-x-x.eu-west-2.compute.internal
topology.kubernetes.io/zone=eu-west-2a
If you want to make use of your regex as is, try ripgrep:
echo "ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/os=linux,node.app/name=all,topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a" | rg --pcre2 "(((^ip.*?)(?=(\s)))(?:.*?)((?<=\,)(topology\.kubernetes\.io\/zone.*?)(?=(\s|$))))" -r '$2'$'\n''$5'
which will produce the same results.
In case you are ok with awk, please try following awk program.
awk '
match($0,/^ip\S+/){
print substr($0,RSTART,RLENGTH)
match($0,/,topology\.kubernetes\.io\/zone\S*/)
print substr($0,RSTART+1,RLENGTH-1)
}
' Input_file
Explanation: awk's match function first matches ^ip\S+ and substr prints the matched text. A second match against ,topology\.kubernetes\.io\/zone\S* then locates the second value the OP mentioned, and substr prints it without the leading comma.
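Note that \S is not part of POSIX awk, so the program above may depend on the awk implementation in use. A sketch of the same two-step match with a plain bracket expression instead, assuming fields are separated by spaces or tabs; the function name extract_node_zone is invented for this example.

```shell
# Hypothetical helper: reads node lines on stdin and prints the node name and
# the zone label on separate lines.
extract_node_zone() {
    awk '
        match($0, /^ip[^ \t]+/) {
            print substr($0, RSTART, RLENGTH)
            # Only print the zone when the second match actually succeeds
            if (match($0, /,topology\.kubernetes\.io\/zone[^ \t]*/))
                print substr($0, RSTART + 1, RLENGTH - 1)   # skip the leading comma
        }
    '
}

printf '%s\n' 'ip-10-x-x-x.eu-west-2.compute.internal Ready <none> 18d v1.20.4-eks-1-20-1 topology.kubernetes.io/region=eu-west-2,topology.kubernetes.io/zone=eu-west-2a' | extract_node_zone
```

Guarding the second print with its own match avoids printing an empty line when a node has no zone label.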
Your pattern already captures the parts that you want in groups, but it is inefficient: it has 7 capture groups where you only need 2.
There are also multiple lookaround assertions that are unnecessary; they can be turned into plain matches or omitted altogether.
As already commented and explained, you cannot use capture groups with grep, but if you can use GNU awk, its match function can store the groups in an array.
awk 'match($0, /^(ip\S+).*,(topology\.kubernetes\.io\/zone\S*)/, a) {
print a[1]
print a[2]
}' file
Output
ip-10-x-x-x.eu-west-2.compute.internal
topology.kubernetes.io/zone=eu-west-2a
Explanation about the pattern:
^(ip\S+).*,(topology\.kubernetes\.io\/zone\S*)
^ Start of string
(ip\S+) Capture group 1, match ip and 1+ non whitespace chars
.* Match any characters (greedy), backtracking as far as needed for the rest of the pattern to match
,(topology\.kubernetes\.io\/zone\S*) Match the , and capture in group 2 the string that you want after it, followed by optional non-whitespace chars using \S*
If you want to output only those 2 matches, another option could be using sed and replace the whole line with the 2 capture groups:
sed -E 's/^(ip[^[:blank:]]+).*,(topology\.kubernetes\.io\/zone[^[:blank:]]*).*/\1 \2/' file
Output
ip-10-x-x-x.eu-west-2.compute.internal topology.kubernetes.io/zone=eu-west-2a

how to edit a line having IPv4 address using sed command

I need to modify an NTP configuration file by adding some options to the lines containing IP addresses.
I have been trying for a long time with sed, but I am not able to modify the lines unless I already know the IP addresses.
Let's say I have a few lines such as:
server 172.0.0.1
server 10.0.0.1
I need to add the iburst option after the IP address.
I have tried commands like sed -e 's/(\d{1,3}\.\d{1.3}\.\d{1,3}\.\d{1,3})/ \1 iburst/g' ntp_file
or sed -e 's/^server +\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/server \1\.\2\.\3\ iburst/g' ntp_file
but they do not modify the lines. Any suggestions would be really appreciated.
The regex you used, interpreted as POSIX BRE, cannot match the expected strings for three reasons: sed does not support the \d shorthand class, one range quantifier contains a dot instead of a comma ({1.3}), and in BRE the grouping parentheses and the braces of range quantifiers must be escaped (\( \) and \{ \}).
You may use
sed -E -i 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/& iburst/g' ntp_file
The POSIX ERE (enabled with the -E option) expression means to match
[0-9]{1,3} - one to three digits
(\.[0-9]{1,3}){3} - three occurrences of a dot and one to three digits
The replacement pattern is & iburst where & stands for the whole match.
The g flag replaces all occurrences.
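A cautious variant, as a sketch only: it works on stdin instead of editing in place, touches only lines that start with server, and skips lines that already carry iburst so that running it twice is harmless. The function name add_iburst is an assumption for this example.

```shell
# Hypothetical helper: append " iburst" after the IPv4 address on "server"
# lines that do not already contain iburst. Reads stdin, writes stdout.
add_iburst() {
    sed -E '/^server[[:space:]]/{
/iburst/!s/[0-9]{1,3}(\.[0-9]{1,3}){3}/& iburst/
}'
}

printf 'server 172.0.0.1\nserver 10.0.0.1 iburst\n# 192.168.0.1 is not a server line\n' | add_iburst
```

Once the output looks right, the same script can be used with -i on the real file.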

Regex to find valid IP address using awk [duplicate]

This question already has answers here: Validating IPv4 addresses with regexp
I get all the IP addresses connected to the network along with other strings and the name of the network, but I want to extract only the IPs using an awk regex.
I tried:
awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}'
But it prints the IP addresses along with some other numbers and dates, e.g.
2019-12-13 12
192.168.1.1
123.168.1.12
0.00012
But I want just the IP address.
Could you please try the following. Since no sample input was given, I have not tested it.
awk 'match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/){print substr($0,RSTART,RLENGTH)}' Input_file
Why the OP's code did not work: the original regex used an unescaped ., which matches any character rather than a literal dot, so strings that are not IPs were matched too. In the code above the dot is escaped as \., which tells awk to look for a literal . character.
To be honest, I don't know the awk command, but as far as the regular expression goes, you can extract IP addresses with this expression:
/^([0-9]{1,3}\.){3}[0-9]{1,3}$/g
You can check it here:
IP address Regex test
In terms of regex, the PCRE-compatible expression (?:[12]?\d{1,2}\.){3}[12]?\d{1,2} should meet your needs. It's a simplified version of the more comprehensive IP regexes that can be found as answers on this question, and can be tested with this demo.
Unfortunately, awk is quite limited in its ability, and is not PCRE compatible. I would suggest using perl instead, but if you're insistent on using awk, the following command should work:
awk 'match($0, /[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]/) {print substr($0, RSTART, RLENGTH)}'
This uses awk-compatible regex to match IPs, and is an expanded form of the above regex. It matches and prints out only the IPs it finds, omitting the rest of the line.
Before you edited your question, your original regex was [0-9]+.[0-9]+.[0-9]+.[0-9]+, where the unescaped . matches any character, meaning hyphens, spaces, and digits were all valid matches. By specifying \. instead, the regex will match only the period character.
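The difference between . and \. is easy to see side by side. A small sketch with grep -oE; the function names unescaped and escaped are only labels for this example.

```shell
# Unescaped dots: "." matches any character, so a date can pass as an "IP"
unescaped() { grep -oE '[0-9]+.[0-9]+.[0-9]+.[0-9]+'; }

# Escaped dots: only literal periods are accepted between the numbers
escaped() { grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'; }

printf '2019-12-13 12\n192.168.1.1\n' | unescaped   # prints both lines
printf '2019-12-13 12\n192.168.1.1\n' | escaped     # prints only 192.168.1.1
```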
Something like this ?
$ cat file
172.27.1.256 # invalid ip
2019-12-13 12
192.168.1.1
123.168.1.12
0.00012
299.288.299.333 # invalid ip
$ grep -oE '((1?[0-9][0-9]?|2[0-4][0-9]|25[0-5])\.){3}((1?[0-9][0-9]?|2[0-4][0-9]|25[0-5]))\s+?$' file
192.168.1.1
123.168.1.12
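An alternative sketch that keeps the regex simple and leaves the 0-255 check to arithmetic: grep pulls out anything shaped like a dotted quad, then awk drops candidates with an octet above 255 or longer than three digits. The function name extract_ipv4 is invented here, and the approach assumes each address is delimited by non-digit, non-dot text.

```shell
# Hypothetical helper: print the valid IPv4 addresses found on stdin
extract_ipv4() {
    grep -oE '[0-9]+(\.[0-9]+){3}' | awk -F. '
        NF == 4 {
            # Reject the candidate as soon as one octet is out of range
            for (i = 1; i <= 4; i++)
                if ($i > 255 || length($i) > 3) next
            print
        }'
}

printf '172.27.1.256 # invalid ip\n2019-12-13 12\n192.168.1.1\n123.168.1.12\n0.00012\n299.288.299.333 # invalid ip\n' | extract_ipv4
```

Unlike the end-of-line anchored grep above, this does not rely on the invalid lines having trailing comments.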

How to grep/sed/awk for a range of output starting with a whitespace character

I have a file that looks something like this:
# cat $file
...
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny ip any any log
ip access-list extended CAT-IN
 permit icmp 10.13.10.0 0.0.0.255 any
 permit ip 10.14.10.0 0.0.0.255 host 10.15.10.10
 permit tcp 10.16.10.0 0.0.0.255 host 10.17.10.10 eq smtp
...
I want to be able to search by name (using a script) to get 'section' output for independent access-lists. I want the output to look like this:
# grep -i dog $file | sed <options??>
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny ip any any log
...with no further output of inapplicable non-indented lines.
I have tried the following:
grep -A 10 DOG $file | sed -n '/^[[:space:]]\{1\}/p'
...Which only gives me the 10 lines after DOG which begin with a single space (including lines not applicable to the searched access-list).
sed -n '/DOG/,/^[[:space:]]\{1\}/p' $file
...Which gives me the line containing DOG, and the next line beginning with a single space. (Need all the applicable lines of the access-list...)
I want the line containing DOG, and all lines after DOG which begin with a single space, until the next un-indented line. There are too many variables in the content to depend on any patterns other than the leading space (there is not always a deny on the end, etc...).
Using GNU sed (Linux):
name='dog' # case-INsensitive name of section to extract
sed -n "/$name/I,/^[^[:space:]]/ { /$name/I {p;d}; /^[^[:space:]]/q; p }" file
To make matching case-sensitive, remove the trailing I from both occurrences of /$name/I above.
-n suppresses default output so that output must explicitly be requested inside the script with functions such as p.
Note the use of double quotes ("...") around the sed script, so as to allow references to the shell variable $name: The double quotes ensure that the shell variable references are expanded BEFORE the script is handed to sed (sed itself has no access to shell variables).
Caveat: This technique is tricky, because (a) you must use shell escaping to escape shell metacharacters you want to pass through to sed, such as $ as \$, and (b) the shell-variable value must not contain sed metacharacters that could break the sed script; for generic escaping of shell-variable values for use in sed scripts, see this answer of mine, or use my awk-based answer.
/$name/I,/^[^[:space:]]/ uses a range to match the line of interest (/$name/I; the trailing I is GNU sed's case-insensitivity matching option) through the start of the next section (/^[^[:space:]]/ - i.e., the next line that does NOT start with whitespace); since sed ranges are always inclusive, the challenge is to selectively remove the last line of the range, IF it is the start of the next section - note that this will NOT be the case if the section of interest is the LAST one in the file.
Note that the commands inside { ... } are only executed for each line in the range.
/$name/I {p;d}; unconditionally prints the 1st line of the range: d deletes the line (which has already been printed) and starts the next cycle (proceeds to the next input line).
/^[^[:space:]]/q matches the last line in the range, IF it is the next section's first line, and quits processing altogether (q), without printing the line.
p is then only reached for section-interior lines and prints them.
Note:
The assumption is that header lines can be identified by NOT starting with a whitespace char., and that any other lines are non-header lines - if more sophisticated matching is required, see my awk-based answer.
This solution has the slight disadvantage that the range regexes must be duplicated, although you could mitigate that with shell variables.
FreeBSD/macOS sed can almost do the same, except that it lacks the case-insensitivity option, I.
name='DOG' # case-SENSITIVE name of section to extract
sed -n -e "/$name/,/^[^[:space:]]/ { /$name/ {p;d;}; /^[^[:space:]]/q; p; }" file
Note that FreeBSD/OSX sed generally has stricter syntax requirements, such as the ; after a command even when followed by }.
If you do need case-insensitivity, see my awk-based answer.
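awk '
/DOG/ {                 # header of the section of interest
found = 1
print
next
}
/^[[:space:]]/ {        # indented line: print only inside that section
if (found) print
next
}
{ found = 0 }           # any other non-indented line ends the section
'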
You can substitute any ERE in place of /DOG/, such as /(DOG)|(CAT)/, and the rest of the script will do the work. You can condense it if you like of course.
Note that just because a line begins with a space, that doesn't mean there is only one space. /^[[:space:]]{1}/ will match the leading space, even in a string like
   nonspace
meaning it is equivalent to /^[[:space:]]/. If your format is so rigid that there must always be exactly one space, use /^[[:space:]][^[:space:]]/ instead. Lines like the one with "nonspace" above will not be matched.
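To illustrate the difference, here is a small sketch that counts matching lines with grep -cE. The sample text and the two function names are made up for the example.

```shell
sample=' one space
   three spaces
no indent'

# {1} only requires one whitespace char at the start; more may follow
count_at_least_one() { grep -cE '^[[:space:]]{1}'; }

# One whitespace char followed by a non-whitespace char: exactly one
count_exactly_one() { grep -cE '^[[:space:]][^[:space:]]'; }

printf '%s\n' "$sample" | count_at_least_one   # prints 2
printf '%s\n' "$sample" | count_exactly_one    # prints 1
```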
I added a second answer after mklement0 pointed out a flaw in my logic.
Here is another very simple way to do it in Perl:
perl -ne '$p = 0 if /^\w/; $p = 1 if /DOG/; print if $p'
EXAMPLES:
cat /tmp/file | perl -ne '$p = 0 if /^\w/; $p = 1 if /DOG/; print if $p'
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny ip any any log
cat /tmp/file | perl -ne '$p = 0 if /^\w/; $p = 1 if /CAT/; print if $p'
ip access-list extended CAT-IN
 permit icmp 10.13.10.0 0.0.0.255 any
 permit ip 10.14.10.0 0.0.0.255 host 10.15.10.10
 permit tcp 10.16.10.0 0.0.0.255 host 10.17.10.10 eq smtp
EXPLANATION:
If the line starts with a word character (\w: a letter, digit, or underscore), $p is set to false.
If the line contains PATTERN (DOG in this case), $p is set to true.
If $p is true, the line is printed.
@mklement0 squeezed my already-inscrutable sed down to this:
sed '/^ip/!{H;$!d};x; /DOG/I!d'
which swaps accumulated multiline groups into the pattern buffer for processing -- the main logic (/DOG/I!d here) operates on whole groups.
The /^ip/! identifies continuation lines by the absence of a first-line marker and accumulates them, so the x only runs when an entire group has been accumulated.
Some corner cases don't apply here:
The first x swaps in a phantom empty group at the start. If that doesn't get dropped during ordinary processing, adding a 1d fixes that.
The last x also swaps out the last line of the file. That's usually just the last line of the last group, already accumulated by the H, but if some command might produce one-line groups you need to supply a fake one at the end, e.g. with echo "header phantom" | sed '/^header/!{H;$!d};x' realdata.txt -, or { showgroups; echo header phantom; } | sed '/^header/!{H;$!d};x'.
A shorter, POSIX-compliant awk solution, which is a generalized and optimized translation of @Tiago's excellent Perl-based answer.
One advantage of these answers over the sed solutions is that they use literal substring matching rather than regular expressions, which allows passing in arbitrary search strings, without needing to worry about escaping. That said, if you did want regex matching, use the ~ operator rather than the index() function; e.g., index($0, name) would become $0 ~ name. You then have to make sure that the value passed for name either contains no accidental regex metacharacters meant to be treated as literals or is an intentionally crafted regex.
name='DOG' # Case-sensitive name to search for.
awk -v name="$name" '/^[^[:space:]]/ {if (p) exit; if (index($0,name)) {p=1}} p' file
Option -v name="$name" defines awk variable name based on the value of shell variable $name (awk has no direct access to shell variables).
Variable p is used as a flag to indicate whether the current line should be printed, i.e., whether it is part of the section of interest; as long as p is not initialized, it is treated as 0 (false) in a Boolean context.
Pattern /^[^[:space:]]/ matches only header lines (lines that start with a non-whitespace character), and the associated action ({...}) is only processed for them:
if (p) exit exits processing altogether, if p is already set, because that implies that the next section has been reached. Exiting right away has the benefit of not having to process the remainder of the file.
if (index($0, name)) looks for the name of interest as a literal substring in the header line at hand, and, if found (in which case index() returns the 1-based position at which the substring was found, which is interpreted as true in a Boolean context), sets flag p to 1 ({p=1}).
p simply prints the current line, if p is 1, and does nothing otherwise. That is, once the section header of interest has been found, it and subsequent lines are printed (up until the next section or the end of the input file).
Note that this is an example of a pattern-only command: only a pattern (condition) is specified, without an associated action ({...}), in which case the default action is to print the current line, if the pattern evaluates to true. (That technique is used in the common shorthand 1 to simply unconditionally print the current record.)
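For reuse in a script, the command can be wrapped in a shell function. A sketch under a few assumptions: the function name print_section and the sample variable acl are invented here, and the bracket expression [^ \t] stands in for [^[:space:]] to stay compatible with awk builds that lack POSIX character classes.

```shell
# Hypothetical helper: print the section whose header contains the literal
# string $1. Input is read from stdin.
print_section() {
    awk -v name="$1" '
        /^[^ \t]/ {                    # header line (not indented)
            if (p) exit                # next section reached: stop
            if (index($0, name)) p = 1 # header of interest found
        }
        p                              # print while inside the section
    '
}

acl='ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 deny ip any any log
ip access-list extended CAT-IN
 permit icmp 10.13.10.0 0.0.0.255 any'

printf '%s\n' "$acl" | print_section DOG
```

With this sample, only the DOG-IN header and its two indented lines are printed.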
If case-INsensitivity is needed:
name='dog' # Case-INsensitive name to search for.
awk -v name="$name" \
'/^[^[:space:]]/ {if(p) exit; if(index(tolower($0),tolower(name))) {p=1}} p' file
Caveat: The BSD-based awk that comes with macOS (still applies as of 10.12.1) is not UTF-8-aware: the case-insensitive matching won't work with non-ASCII letters such as ü.
GNU awk alternative, using the special IGNORECASE variable:
awk -v name="$name" -v IGNORECASE=1 \
'/^[^[:space:]]/ {if(p) exit; if(index($0,name)) {p=1}} p' file
Another POSIX-compliant awk solution:
name='dog' # Case-insensitive name of section to extract.
awk -v name="$name" '
index(tolower($0),tolower(name)) {inBlock=1; print; next} # 1st section line found.
inBlock && !/^[[:space:]]/ {exit} # Exit at start of next section.
inBlock # Print 2nd, 3rd, ... section line.
' file
Note:
next skips the remaining pattern-action pairs and proceeds to the next line.
/^[[:space:]]/ matches lines that start with at least one whitespace char. As @Chrono Kitsune explains in his answer, if you wanted to match lines that start with exactly one whitespace char., use /^[[:space:]][^[:space:]]/. Also note that, despite its name, character class [:space:] matches ANY form of whitespace, not just spaces - see man isspace.
There's no need to initialize flag variable inBlock, as it defaults to 0 in numeric/Boolean contexts.
If you have GNU awk, you can more easily achieve case-insensitive matching by setting the IGNORECASE variable to a nonzero value (-v IGNORECASE=1) and simply using index($0, name) inside the program.
A GNU awk solution, if you can assume that all section header lines start with 'ip' (so as to break the input into sections that way, rather than looking for leading whitespace):
awk -v RS='(^|\n)ip' -F'\n' -v name="$name" -v IGNORECASE=1 '
index($1, name) { sub(/\n$/, ""); print "ip" $0; exit }
' file
-v RS='(^|\n)ip' breaks the input into records by lines that fall between line-starting instances of string 'ip'.
-F'\n' then breaks each record into fields ($1, ...) by lines.
index($1, name) looks for the name on the current record's first line - case-INsensitively, thanks to -v IGNORECASE=1.
sub(/\n$/, "") removes any trailing \n, which can stem from the section of interest being the last in the input file.
print "ip" $0 prints the matching record, comprising the entire section of interest; since the record doesn't include the separator, 'ip', it is prepended.
The simplest way I can think of is: sed '/DOG/, /^ip/ !d' | sed '$d'
cat file | sed '/DOG/, /^ip/ !d' | sed '$d'
ip access-list extended DOG-IN
 permit icmp 10.10.10.1 0.0.0.7 any
 permit tcp 10.11.10.1 0.0.0.7 eq www 443 10.12.10.0 0.0.0.63
 deny ip any any log
Explanation:
The first sed command keeps the lines from the one containing DOG through the next line starting with ip.
The second sed command deletes the last line (which is that next line starting with ip).
Note that if the DOG section is the last one in the file, there is no following ip line, so the second command would delete the section's own last line instead.

Bash based regex domain name validation

I want to create a script that will add new domains to our DNS Servers.
I found this "Fully qualified domain name validation" regex.
However, when I use it with sed, it is not working as I would expect:
echo test | sed '/(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(:[a-zA-Z]{2,})$)/p'
--------
Output is:
test
echo test.com | sed '/(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(:[a-zA-Z]{2,})$)/p'
--------
Output is:
test.com
I expected that the output of the first command should be a blank line.
What am I doing wrong?
I find this to be a more comprehensive regex:
(?=^.{4,253}$)(^(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)+([a-zA-Z]{2,}|xn--[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])$)
RFC 1034§3: Allows for a length of 4-253, with the shortest operational domain I'm aware of, "t.co", still matching where the other answers don't. 255 bytes is the maximum length, minus the length octet for each label (TLD and "primary" subdomain) gives us 253: (?=^.{4,253}$)
RFC 3696§2: Single-letter TLDs are technically permitted, meaning the minimum length would be 3, but as there are currently no single-letter TLDs a minimum length of 4 is practical.
RFC 1034§3: Allows numbers in subdomains, which Conor Clafferty's apparently doesn't (by not distinguishing other subdomains from "primary" subdomains -- i.e. the domain you register -- which the DNS spec doesn't)
RFC 1034§3: Restricts individual labels to 63 characters, permitting hyphens in the middle while restricting the beginning and end to alphanumerics (?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)
Requires a two-letter or larger TLD, but may be punycoded ([a-zA-Z]{2,}|xn--[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])
RFC 3696§2: The DNS spec technically permits numerics in the TLD, as well as single-letter TLDs; however, there are currently no single-letter TLDs or TLDs with numbers, and all-numeric TLDs are not permitted, so this part of the regex has been simplified to [a-zA-Z]{2,}.
--OR--
RFC 3490§5: an internationalized domain name ccTLD (IDN ccTLD) may be punycoded, as indicated by an "xn--" prefix, after which it may contain letters, numbers, or hyphens. This approximates to xn--[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9]
Be aware that this pattern does not validate a punycode TLD! Invalid punycode will be tolerated, e.g. "xn--qqqq", because attempting to validate punycode against the appropriate encoding mechanisms is beyond the scope of a regular expression. While punycode itself technically permits an encoded string ending in a hyphen, RFC 3492§5 observes and respects the IDNA limitation that labels may not end in a hyphen.
EDIT 02/2021: Hat tip to user2241415 for pointing out that IDN ccTLDs did not match the previously-specified regex.
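For tools without PCRE lookaheads (sed, plain grep -E), the pattern above can be approximated by checking the overall length in the shell and leaving the label structure to an ERE. A sketch only; the function name is_fqdn is made up for this example, and like the regex above it does not validate punycode.

```shell
# Hypothetical helper: succeed (exit status 0) if $1 looks like a fully
# qualified domain name according to the rules described above.
is_fqdn() {
    name=$1
    # The length test replaces the (?=^.{4,253}$) lookahead
    [ "${#name}" -ge 4 ] && [ "${#name}" -le 253 ] || return 1
    # Labels of 1-63 alphanumerics/hyphens, not starting or ending with a
    # hyphen, each followed by a dot, then a TLD of 2+ letters or an xn-- label
    printf '%s\n' "$name" |
        grep -Eq '^([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+([a-zA-Z]{2,}|xn--[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])$'
}

for d in test test.com t.co -bad.com; do
    if is_fqdn "$d"; then echo "$d: ok"; else echo "$d: rejected"; fi
done
```

Splitting the check this way keeps each piece within what POSIX tools support, at the cost of spreading the validation over two steps.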
You are missing a question mark in your regex :
(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)
You can test your regex here
You can do what you want with grep :
$ echo test.com | grep -P '(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)'
test.com
$ echo test | grep -P '(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)'
$
No sed implementation I am aware of supports the various Perl extensions you are using in that regex. Try with Perl or grep -P or pcregrep, or simplify the regex to something sed can cope with. Here is a quick and dirty adaptation which splits the regex into a script of three different regexes, and rejects when something fails to match (or matches, in the middlemost case).
echo 'test' | sed -r '/^.{5,254}$/!d
/^([^.]*\.)*[0-9]+\./d # Seems incorrect; 112.com is valid
/^([a-zA-Z0-9_\-]{1,63}\.?)+([a-zA-Z]{2,})$/!d' # should disallow underscore
# also, what's with the question mark after the literal dot?
This also completely fails to accept IDNA domains (which can contain dashes and numbers in the TLD, among other things) so I would definitely not recommend this, but hopefully it shows you how to adapt something like this to sed if you wish to.
Pierre-Louis' answer didn't quite work for me. e.g. "kittens" is considered a domain name.
I added one slight adjustment to ensure that the domain at least had a dot in it.
(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+\.(?:[a-z]{2,})$)
There's an extra \. just before it reads the last portion of the domain.
I use grep -P to do this.
echo test | grep -P "^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$"
--------
Output is:
echo www.test.com | grep -P "^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$"
--------
Output is: www.test.com
if the domain has to exist you can try:
$ cat test.sh
#!/bin/bash
for h in "bert" "ernie" "www.google.com"
do
if host "$h" > /dev/null 2>&1
then
echo "$h is a FQDN"
else
echo "$h is not a FQDN"
fi
done
jalderman@mba:/tmp$ ./test.sh
bert is not a FQDN
ernie is not a FQDN
www.google.com is a FQDN