Bash regex to match substring with exact integer range

I need to match a string $str that contains any of
foo{77..93}
and capture the above substring in a variable.
So far I've got:
str=/random/string/containing/abc-foo78_efg/ # for example
if [[ $str =~ (foo[7-9][0-9]) ]]; then
id=${BASH_REMATCH[1]}
fi
echo $id # gives foo78
but this also captures ids outside of the target range (e.g. foo95).
Is there a way to restrict the regex to an exact integer range? (I tried foo[77-93], but that doesn't work.)
Thanks

If you want to use a regex, you're going to have to make it slightly more complex:
if [[ $str =~ foo(7[7-9]|8[0-9]|9[0-3]) ]]; then
id=${BASH_REMATCH[0]}
fi
Note that I have removed the capture group around the whole pattern and am now using the 0th element of the match array.
As an aside, for maximum compatibility with older versions of bash, I would recommend assigning the pattern to a variable and using it in the test like this:
re='foo(7[7-9]|8[0-9]|9[0-3])'
if [[ $str =~ $re ]]; then
id=${BASH_REMATCH[0]}
fi
An alternative to using a regex would be to use an arithmetic context, like this:
if (( "${str#foo}" >= 77 && "${str#foo}" <= 93 )); then
id=$str
fi
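This strips the "foo" part from the start of the variable so that the integer part can be compared numerically. Note that it only works when $str is exactly "foo" followed by the number (for example foo78), not when the id is embedded in a longer string.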
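The two ideas above can be combined: capture the digits with a regex, then do the range check arithmetically. This is a minimal sketch, not a definitive implementation. The function name foo_in_range and its lo/hi parameters are made up for illustration, and it only inspects the first "foo<digits>" occurrence in the string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first fooNN found in $1 if NN lies in [lo, hi].
foo_in_range() {
  local str=$1 lo=${2:-77} hi=${3:-93}
  local re='foo([0-9]+)'
  [[ $str =~ $re ]] || return 1
  # 10# forces base 10 so that a leading zero is not read as octal.
  local n=$((10#${BASH_REMATCH[1]}))
  (( n >= lo && n <= hi )) || return 1
  printf '%s\n' "${BASH_REMATCH[0]}"
}

foo_in_range /random/string/containing/abc-foo78_efg/   # prints foo78
```

Unlike the alternation pattern, this reads the whole run of digits, so foo780 is rejected rather than matched as foo78. Whether that is the behaviour you want depends on your data.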

Sure is easy to do with Perl:
$ echo foo{1..100} | tr ' ' '\n' | perl -lne 'print $_ if m/foo(\d+)/ and $1>=77 and $1<=93'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93
Or awk even:
$ echo foo{1..100} | tr ' ' '\n' | awk -F 'foo' '$2>=77 && $2<=93 {print}'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93

Related

'$' in regexp in bash

I really don't know what I'm doing.
In variable a, I want to find the first appearance of '$' after the first appearance of 'Bitcoin', and print everything after it until the first newline.
I have the following code:
a = 'something Bitcoin something againe $jjjkjk\n againe something'
if [[ $a =~ .*Bitcoin.*[\$](.*).* ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
In this example I would like to get 'jjjkjk'. All I get is 'no'.
This code might be really flawed, I have no experience in this. I think tho the problem might be with the '$' sign. Please help!
Properly handle newlines in bash with ANSI-C Quoting -- \n sequences become literal newlines.
a=$'something Bitcoin something againe $jjjkjk\n againe something'
regex=$'Bitcoin[^$]*[$]([^\n]+)'
[[ $a =~ $regex ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="Bitcoin something againe \$jjjkjk" [1]="jjjkjk")'
# .................................................................^^^^^^^^^^^^
To verify the contents contain newlines:
$ printf '%s' "$regex" | od -c
0000000 B i t c o i n [ ^ $ ] * [ $ ] (
0000020 [ ^ \n ] + )
0000026
Here is a working version of your code:
a='something Bitcoin something againe $jjjkjk\n againe something'
r=".*Bitcoin.*[\$]([^\n]*).*"
if [[ $a =~ $r ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
You need to find 'Bitcoin' and then a '$' after it, no matter what lies between, so use .* there. To capture text up to a specific character, the best tool is a negated bracket expression [^...]; in your case [^\n] stops the capture at the \n. (Because the string is single-quoted, that \n is a literal backslash followed by n, and inside a bracket expression [^\n] excludes a backslash and the letter n rather than a newline character.)
Your variable assignment was also broken: a = '...' is not valid because of the spaces around the equals sign. The correct form is a='...'.
Using double quotes there would be wrong too: the shell would expand $jjjkjk as an (empty) variable.
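To see how much the quoting style matters for \n inside a bracket expression, here is a small sketch. The variable names and the sample string are invented for the demonstration, and the behaviour described is that of POSIX extended regular expressions as used by bash's =~.

```bash
#!/usr/bin/env bash
s='xbanana'

re_plain='x([^\n]*)'    # single quotes: the bracket excludes a backslash and the letter n
re_ansi=$'x([^\n]*)'    # ANSI-C quoting: the bracket excludes a real newline character

[[ $s =~ $re_plain ]] && plain=${BASH_REMATCH[1]}
[[ $s =~ $re_ansi  ]] && ansi=${BASH_REMATCH[1]}

printf 'plain=%s ansi=%s\n' "$plain" "$ansi"
```

With the single-quoted pattern the capture stops at the first letter n, while the ANSI-C quoted pattern captures up to the first actual newline (here, the end of the string).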

Bash only get the first matched result when use regex

Here is an example string:
"j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"
I want to find the entries that match the format j2sdk/1.8.0_xxx, where xxx consists only of digits. Here, I want the strings below to be matched:
j2sdk/1.8.0_45
j2sdk/1.8.0_40
j2sdk/1.8.0_51
I wrote the code below, but when I run it, it only gets the first match, j2sdk/1.8.0_45. Is anything wrong with my code?
avail_versions="j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"
patern='j2sdk\/1\.8\.0_[0-9]+\s+'
if [[ $avail_versions =~ $patern ]];then
echo matched
echo ${BASH_REMATCH[0]}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
fi
The result is that BASH_REMATCH[0] is j2sdk/1.8.0_45, while BASH_REMATCH[1] and BASH_REMATCH[2] are empty.
I expected to get the matches in BASH_REMATCH[1], BASH_REMATCH[2], and BASH_REMATCH[3].
Is there another way in Bash to get the expected matches?
Thanks
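=~ only reports the first match: BASH_REMATCH[0] holds that match, and the higher indexes hold its capture groups, not later matches. So I split the input at spaces and add back the space after each word.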
for s in $avail_versions ; do
s="$s "
if [[ $s =~ $patern ]];then
echo ${BASH_REMATCH[0]}
fi
done
j2sdk/1.8.0_45
j2sdk/1.8.0_40
j2sdk/1.8.0_51
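A variation on the same idea, assuming the versions are always separated by whitespace: anchor the pattern with ^ and $ and test each word on its own, so there is no need to re-append the trailing space. The names re and matches are illustrative only.

```bash
#!/usr/bin/env bash
avail_versions="j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"

# Anchored pattern: the whole word must be j2sdk/1.8.0_ followed by digits only.
re='^j2sdk/1\.8\.0_[0-9]+$'

matches=()
for v in $avail_versions; do      # unquoted on purpose: split on whitespace
  [[ $v =~ $re ]] && matches+=("$v")
done

printf '%s\n' "${matches[@]}"
```

Collecting the results in an array keeps them available for later use, which is roughly what the question hoped BASH_REMATCH[1], [2], ... would provide.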

RegEx : How can I extract a certain part and modify it?

I'd like to extract a certain part of a string and modify it by using a regular expression.
A given string is TestcaseVzwPerformance_8_2_1_4_1_FDD2.
I'd like to extract the part 8_2_1_4_1 from the string and then replace the underscores _ with dots . So the expected result needs to be 8.2.1.4.1.
The numbers and length of the given string can be different.
For example,
Given string // Expected result
TestcaseVzwCqi_3_9_Test2 // 3.9
TestcaseVzwSvd1xRttAclr_6_6_2_3 // 6.6.2.3
TestcaseVzwCsiFading_9_4_1_1_1_FDD4 // 9.4.1.1.1
Here is my RegEx:
((?:\D{0,}_)(\d(_\d)*)(.*))
The numbered capturing group - $2 - contains 8_2_1_4_1 but with underscores.
Can I replace the underscores with dots?
It needs to be done in one RegEx and a Replace.
A regex on its own cannot modify text; you need a tool that performs substitution, for example sed:
echo TestcaseVzwPerformance_8_2_1_4_1_FDD2 |
sed -E 's/[^_]*_(([_0-9])+)_.*/\1/;s/_/./g'
8.2.1.4.1
If you have a Bash string, you can use a Bash regex to capture and Bash parameter expansions to replace:
$ s="TestcaseVzwSvd1xRttAclr_6_6_2_3"
$ [[ $s =~ ^[^_]*_([[:digit:]_]+)_* ]] && tmp=${BASH_REMATCH[1]//_/.} && echo "${tmp%.}"
6.6.2.3
Which can be in a loop:
while read -r line; do
if [[ $line =~ ^[^_]*_([[:digit:]_]+)_* ]]; then
tmp=${BASH_REMATCH[1]//_/.}
echo "\"$line\" => ${tmp%.}"
fi
done <<< 'Given string
TestcaseVzwCqi_3_9_Test2
TestcaseVzwSvd1xRttAclr_6_6_2_3
TestcaseVzwCsiFading_9_4_1_1_1_FDD4'
Prints:
"TestcaseVzwCqi_3_9_Test2" => 3.9
"TestcaseVzwSvd1xRttAclr_6_6_2_3" => 6.6.2.3
"TestcaseVzwCsiFading_9_4_1_1_1_FDD4" => 9.4.1.1.1
You can use the same loop to process a file.
If you have a file, you may as well use gawk:
$ gawk 'BEGIN{FPAT="_[[:digit:]_]+"}
/_[[:digit:]]/ {sub(/^_/,"", $1); sub(/_$/,"",$1); gsub(/_/,".",$1); print $1}' file
3.9
6.6.2.3
9.4.1.1.1
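The capture-then-substitute approach can also be wrapped in a function with a slightly stricter pattern, so that no trailing separator has to be trimmed afterwards. This is only a sketch under the assumption that the wanted part is the first run of underscore-separated numbers that is followed by an underscore or the end of the string; the function name dotted_id is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first "_N_N_..." run of $1 with dots instead of underscores.
dotted_id() {
  # Group 1: digits, optionally followed by more "_digits" parts;
  # the run must end at an underscore or at the end of the string.
  local re='_([0-9]+(_[0-9]+)*)(_|$)'
  [[ $1 =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]//_/.}"
}

dotted_id TestcaseVzwPerformance_8_2_1_4_1_FDD2
dotted_id TestcaseVzwSvd1xRttAclr_6_6_2_3
```

Because the pattern requires digits directly after each underscore, a name that contains a digit elsewhere (such as Svd1x) does not confuse it.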

regex in bash expression

I have 2 questions about regex in bash expression.
1.non-greedy mode
local temp_input='"a1b", "d" , "45"'
if [[ $temp_input =~ \".*?\" ]]
then
echo ${BASH_REMATCH[0]}
fi
The result is
"a1b", "d" , "45"
In java
String str = "\"a1b\", \"d\" , \"45\"";
Matcher m = Pattern.compile("\".*?\"").matcher(str);
while (m.find()) {
System.out.println(m.group());
}
I can get the result below.
"a1b"
"d"
"45"
But how can I use non-greedy mode in bash?
I can understand why \"[^\"]*\" works.
But I don't understand why \".*?\" does not work.
2.global matches
local temp_input='abcba'
if [[ $temp_input =~ b ]]
then
#I wanna echo 2 b here.
#How can I set the global flag?
fi
How can I get all the matches?
P.S. I only want to use regex.
For the second question, sorry for the confusion.
I want to echo "b" and "b", not count the occurrences of "b".
Help!
For your first question, an alternative is this:
[[ $temp_input =~ \"[^\"]*\" ]]
For your second question, you can do this:
temp_input=abcba
t=${temp_input//b}
echo "$(( (${#temp_input} - ${#t}) / 1 )) b"
Or for convenience place it on a function:
function count_matches {
local -i c1=${#1} c2=${#2}
if [[ c2 -gt 0 && c1 -ge c2 ]]; then
local t=${1//"$2"}
echo "$(( (c1 - ${#t}) / c2 )) $2"
else
echo "0 $2"
fi
}
count_matches abcba b
Both produce the output:
2 b
Update:
If you want to see the matches you can use a function like this. You can also try other regular expressions not just literals.
function find_matches {
MATCHES=()
local STR=$1 RE="($2)(.*)"
while [[ -n $STR && $STR =~ $RE ]]; do
MATCHES+=("${BASH_REMATCH[1]}")
STR=${BASH_REMATCH[2]}
done
}
Example:
> find_matches abcba b
> echo "${MATCHES[@]}"
b b
> find_matches abcbaaccbad 'a.'
> echo "${MATCHES[@]}"
ab aa ad
Your regular expression matches the string starting with the first quotation mark (before ab) and ending with the last quotation mark (after ef). This is greedy, even though your intention was to use a non-greedy match (*?). It seems that bash uses POSIX.2 regular expression (check your man 7 regex), which does not support a non-greedy Kleene star.
If you want just "ab", I'd suggest a different regular expression:
if [[ $temp_input =~ \"[^\"]*\" ]]
which explicitly says that you don't want quotation marks inside your strings.
I don't understand what you mean. If you want to find all matches (and there are two occurrences of b here), I think you cannot do it with a single =~ match.
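A loop over successive =~ matches can stand in for both the non-greedy quantifier and the global flag, much like the while (m.find()) loop in the Java version. This is a rough sketch: the variable names rest and matches are invented, and it assumes the pattern can never match the empty string (otherwise the loop would not terminate).

```bash
#!/usr/bin/env bash
temp_input='"a1b", "d" , "45"'

# A negated bracket expression replaces the unsupported non-greedy .*?
re='"[^"]*"'

rest=$temp_input
matches=()
while [[ $rest =~ $re ]]; do
  matches+=("${BASH_REMATCH[0]}")
  # Drop everything up to and including this match, then search again.
  rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${matches[@]}"
```

Each iteration records the leftmost match and continues with the remainder of the string, so the array ends up holding "a1b", "d" and "45" (quotes included) in order.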
This is my first post, and I am very amateur at bash, so apologies if I haven't understood the question, but I wrote a function for non-greedy regex using entirely bash:
regex_non_greedy () {
local string="$1"
local regex="$2"
local replace="$3"
while [[ $string =~ $regex ]]; do
local search=${BASH_REMATCH}
string=${string/"$search"/$replace}
done
printf "%s" "$string"
}
Example invocation:
regex_non_greedy "all cats are grey and green" "gre+." "white"
Which returns:
all cats are white and white

Extract version using Regular expressions in bash

All I need to do is extract the versioning information from the following file:
my_archive_1.1.1.201_x86_64.tgz
I am trying to extract both the version number which is 1.1.1 and the release number which is 201. Normally I use python for these purposes, but I have been asked not to. How do I do it by just using bash? The filename will always be of the form
([A-Za-z_]+)_([0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)_x86_64\.tgz
The groups are in parentheses. I need the second and third groups if you start counting from 1.
Use pure BASH:
s='my_archive_1.1.1.201_x86_64.tgz'
[[ $s =~ ^[^_]+_[^_]+_(([^.]+\.){2}[^.]+)\.([^_]+) ]] && \
echo "${BASH_REMATCH[1]}, ${BASH_REMATCH[3]}"
OUTPUT:
1.1.1, 201
Using your own regex:
[[ $s =~ ([A-Za-z_]+)_([0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)_x86_64\.tgz ]] && \
echo "${BASH_REMATCH[2]}, ${BASH_REMATCH[3]}"
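If the extraction is needed in more than one place, the same regex can be stored in a variable and wrapped in a function. The sketch below assumes the filename format given in the question; the function name parse_archive_name is hypothetical, and it sets the global variables version and release only when the name matches.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set $version and $release from an archive name,
# or return non-zero if the name does not have the expected form.
parse_archive_name() {
  local re='^([A-Za-z_]+)_([0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)_x86_64\.tgz$'
  [[ $1 =~ $re ]] || return 1
  version=${BASH_REMATCH[2]}
  release=${BASH_REMATCH[3]}
}

if parse_archive_name 'my_archive_1.1.1.201_x86_64.tgz'; then
  echo "version=$version release=$release"
else
  echo "unexpected file name" >&2
fi
```

Anchoring the pattern with ^ and $ makes the function reject names that merely contain something version-like in the middle.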
You can use simple string substitutions to extract substrings. You don't really need regular expressions. As a bonus, this is portable to other POSIX shells. Whether this is simpler or not is a matter of taste, and also depends on the problem.
s='my_archive_1.1.1.201_x86_64.tgz'
# ${s%%_[0-9]*} is 'my_archive'
s=${s#${s%%_[0-9]*}_}
# s='1.1.1.201_x86_64.tgz'
s=${s%%_*}
# s='1.1.1.201'
release=${s##*.}
version=${s%."$release"}
You might also want to experiment with set:
s='my_archive_1.1.1.201_x86_64.tgz'
oldIFS=$IFS
IFS=_
set $s
# $1 = my, $2=archive, $3=1.1.1.201, $4=x86, $5=64.tgz
# Shift until $1 contains only numbers and periods
while [ "$#" -gt 0 ]; do
case $1 in *[!.0-9]* ) shift ;; *) break ;; esac
done
IFS=.
set $1
version=$1.$2.$3
release=$4
IFS=$oldIFS
Another alternative without using regular expressions:
split=`echo "my_archive_1.1.1.201_x86_64.tgz" | cut -d'_' -f3`
versionnumber=`echo $split | cut -d'.' -f1,2,3`
releasenumber=`echo $split | cut -d'.' -f4`
echo "$versionnumber $releasenumber"