Extract integers from string with bash - regex

From a variable how to extract integers that will be in format *\d+.\d+.\d+* (4.12.3123) using bash.
filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
I have tried:
filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
if [[ "$filename" =~ (.*)(\d+.\d+.\d+)(.*) ]]; then
echo ${BASH_REMATCH}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[3]}
else
echo 'nej'
fi
which does not work.

The easiest way to work with regexes in Bash, in terms of consistency between Bash versions and escaping, is to put the regex into a single-quoted variable and then use it unquoted, as below:
re='[0-9]+\.[0-9]+\.[0-9]+'
[[ $filename =~ $re ]] && printf '%s\n' "${BASH_REMATCH[#]}"
The main issue with your approach were that you were using the "Perl-style" \d, so in fact you could make your code work with:
if [[ "$filename" =~ (.*)([0-9]+\.[0-9]+\.[0-9]+)(.*) ]]; then
echo "${BASH_REMATCH[2]}"
fi
But this unnecessarily creates 3 capture groups, when you don't even need one. Note that I also changed . (any character) to \. (a literal .).

one way to extract:
grep -oP '\d\.\d+\.\d+' <<<$xfilename

There is one more way
$ filename="xzxzxzxz4.12.3123fsfsfsfsfsfs"
$ awk '{ if (match($0, /[0-9].[0-9]+.[0-9]+/, m)) print m[0] }' <<< "$filename"
4.12.3123

Related

Matching between angle brackets in Bash in Cygwin 2.4.1(0.293/5/3)

I have:
file=something.xml
IFS='
'
for i in $( cat $file )
do
if [[ $i =~ '>[^<]*<' ]]
then
echo $i | sed -e 's/.*>\([^<]*\)<.*/\1/g'
fi
done
The match in the if finds nothing.
If I then do:
file=something.xml
IFS='
'
for i in $( cat $file )
do
#if [[ $i =~ '>[^<]*<' ]]
#then
echo $i | sed -e 's/.*>\([^<]*\)<.*/\1/g'
#fi
done
Of course, every line of the file passes through the sed. Lines that match the pattern in the sed are correctly filtered by the sed.
So the regex in the sed is matching but not the regex in the if, even though the two regexes are exactly the same.
Can someone explain this? I'm a longtime, though casual, Bash user, so I may not be doing things according to best practices.
After some experimenting with cygwin and regex, I found that the apostrophes are problematic here. Instead using apostrophes, we can escape the > and < chars:
if [[ $i =~ \>[^\<]*\< ]]
I don't know if we can make it work with some sort of quotes too. However, this seems to be simple enough for this script.

Regex - validate IPv6 shell script

I am able to validate IPv6 addresses using java with following regex:
([0-9a-fA-F]{0,4}:){1,7}([0-9a-fA-F]){0,4}
But I need to do this in shell script to which I am new.
This regex doesn't seem to work in shell. Have tried some other combinations also but nothing helped.
#!/bin/bash
regex="([0-9a-fA-F]{0,4}:){1,7}([0-9a-fA-F]){0,4}"
var="$1"
if [[ "$var" =~ "$regex" ]]
then
echo "matches"
else
echo "doesn't match!"
fi
It gives output doesn't match! for 2001:0Db8:85a3:0000:0000:8a2e:0370:7334
How can I write this in shell script?
Java regex shown in question would work in bash as well but make sure to not to use quoted regex variable. If the variable or string on the right hand side of =~ operator is quoted, then it is treated as a string literal instead of regex.
I also recommend using anchors in regex. Otherwise it will print matches for invalid input as: 2001:0db8:85a3:0000:0000:8a2e:0370:7334:foo:bar:baz.
Following script should work for you:
#!/bin/bash
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
var="$1"
if [[ $var =~ $regex ]]; then
echo "matches"
else
echo "doesn't match!"
fi
[[ and =~ won't work with sh, and awk almost works everywhere.
Here is what I did
saved as ./check-ipv6.sh, chmod +x ./check-ipv6.sh
#!/bin/sh
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
echo -n "$1" | awk '$0 !~ /'"$regex"'/{print "not an ipv6=>"$0;exit 1}'
Or you prefer bash than sh
#!/bin/bash
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
awk '$0 !~ /'"$regex"'/{print "not an ipv6=>"$0;exit 1}' <<< "$1"
Test
~$ ./check-ipv6.sh 2001:0Db8:85a3:0000:0000:8a2e:0370:7334x
not an ipv6=>2001:0Db8:85a3:0000:0000:8a2e:0370:7334x
~$ echo $?
1
~$ ./check-ipv6.sh 2001:0Db8:85a3:0000:0000:8a2e:0370:7334
~$ echo $?
0

How to match this string in bash?

I'm reading a file in bash, line by line. I need to print lines that have the following format:
don't care <<< at least one character >>> don't care.
These are all the way which I have tried and none of them work:
if [[ $line =~ .*<<<.+>>>.* ]]; then
echo "$line"
fi
This has incorrect syntax
These two have correct syntax don't work
if [[ $line =~ '.*<<<.+>>>.*' ]]; then
echo "$line"
fi
And this:
if [[ $line == '*<<<*>>>*' ]]; then
echo "$line"
fi
So how to I tell bash to only print lines with that format? PD: I have tested and printing all lines works just fine.
Don't need regular expression. filename patterns will work just fine:
if [[ $line == *"<<<"?*">>>"* ]]; then ...
* - match zero or more characters
? - match exactly one character
"<<<" and ">>>" - literal strings: The angle brackets need to be quoted so bash does not interpret them as a here-string redirection.
$ line=foobar
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<x>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
$ line='foo<<<xyz>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
For maximum compatibility, it's always a good idea to define your regex pattern as a separate variable in single quotes, then use it unquoted. This works for me:
re='<<<.+>>>'
if [[ $line =~ $re ]]; then
echo "$line"
fi
I got rid of the redundant leading/trailing .*, by the way.
Of course, I'm assuming that you have a valid reason to process the file in native bash (if not, just use grep -E '<<<.+>>>' file)
<, <<, <<<, >, and >> are special in the shell and need quoting:
[[ $line =~ '<<<'.+'>>>' ]]
. and + shouldn't be quoted, though, to keep their special meaning.
You don't need the leading and trailing .* in =~ matching, but you need them (or their equivalents) in patterns:
[[ $line == *'<<<'?*'>>>'* ]]
It's faster to use grep to extract lines:
grep -E '<<<.+>>>' input-file
I don't even understand why you are reading the file line per line. I have just launched following command in the bash prompt and it's working fine:
grep "<<<<.+>>>>" test.txt
where test.txt contains following data:
<<<<>>>>
<<<<a>>>>
<<<<aa>>>>
The result of the command was:
<<<<a>>>>
<<<<aa>>>>

bash if [[ =~ regex compare not working?

I have a value in a variable that may be absolute or relative url, and I need to check which one it is.
I have found that there's a =~ operator in [[, but I can't get it to work. What am I doing wrong?
url="http://test"
if [[ "$url" =~ "^http://" ]];
then echo "absolute.";
fi;
You need to use regex without quote:
url="http://test"
if [[ "$url" =~ ^http:// ]]; then
echo "absolute."
fi
This outputs `absolute. as regex needs to be without quote in newer BASH (after BASH v3.1)
Or avoid regex and use glob matching:
if [[ "$url" == "http://"* ]]; then
echo "absolute."
fi

Bash - correct way to escape dollar in regex

What is the correct way to escape a dollar sign in a bash regex? I am trying to test whether a string begins with a dollar sign. Here is my code, in which I double escape the dollar within my double quotes expression:
echo -e "AB1\nAB2\n\$EXTERNAL_REF\nAB3" | while read value;
do
if [[ ! $value =~ "^\\$" ]];
then
echo $value
else
echo "Variable found: $value"
fi
done
This does what I want for one box which has:
GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
And the verbose output shows
+ [[ ! $EXTERNAL_REF =~ ^\$ ]]
+ echo 'Variable found: $EXTERNAL_REF'
However, on another box which uses
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
The comparison is expanded as follows
+ [[ ! $EXTERNAL_REF =~ \^\\\$ ]]
+ echo '$EXTERNAL_REF'
Is there a standard/better way to do this that will work across all implementations?
Many thanks
Why do you use a regular expression here? A glob is enough:
#!/bin/bash
while read value; do
if [[ "$value" != \$* ]]; then
echo "$value"
else
echo "Variable found: $value"
fi
done < <(printf "%s\n" "AB1" "AB2" '$EXTERNAL_REF' "AB3")
Works here with shopt -s compat32.
The regex doesn't need any quotes at all. This should work:
if [[ ! $value =~ ^\$ ]];
I would replace the double quotes with single quotes and remove a single \ and have the changes as below
$value =~ "^\\$"
can also be used as
$value =~ '^\$'
I never found the solution either, but for my purposes, I settled on the following workaround:
if [[ "$value" =~ ^(.)[[:alpha:]_][[:alnum:]_]+\\b && ${BASH_REMATCH[1]} == '$' ]]; then
echo "Variable found: $value"
else
echo "$value"
fi
Rather than trying to "quote" the dollar-sign, I instead match everything around it and I capture the character where the dollar-sign should be to do a direct-string comparison on. A bit of a kludge, but it works.
Alternatively, I've taken to using variables, but just for the backslash character (I don't like storing the entire regex in a variable because I find it confusing for the regex to not appear in the context where it's used):
bs="\\"
string="test\$test"
if [[ "$string" =~ $bs$ ]]; then
echo "output \"$BASH_REMATCH\""
fi