Removing a regular expression from a bash variable - regex

I have a bash variable that looks like
aaa-bb-cccc-r17, m12w_pp_r2, z-r123, etc.
I am looking to extract everything up to the (final) -rNNN (any number of digits), or in other words, remove the final -rNNN. If the variable does not end in -r followed by a number, I want to leave it unchanged.
I tried ${the_variable%-r[0-9]*} but it turns out the * is the shell * ("match anything") rather than the regular expression * ("match any number of occurrences of previous element"). Using + instead of * ("one or more") matched nothing.
Any solution (along this line or any other)?

You can do this with extended pattern support.
$ shopt -s extglob
$ the_variable=aaa-bb-cccc-r17
$ echo "${the_variable%-r+([0-9])}"
aaa-bb-cccc
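To check the behaviour on values that should be left alone, the expansion can be wrapped in a small function. This is only a sketch: strip_rev is a made-up helper name, and it assumes bash with extglob available.

```bash
#!/usr/bin/env bash
shopt -s extglob   # must be on before the +( ) pattern below is parsed

# strip_rev: hypothetical helper, prints its argument without a trailing -rNNN
strip_rev() {
  printf '%s\n' "${1%-r+([0-9])}"
}

# Values from the question plus one that must stay unchanged
for v in aaa-bb-cccc-r17 m12w_pp_r2 z-r123 z-r123g; do
  printf '%s -> %s\n' "$v" "$(strip_rev "$v")"
done
```

Only a suffix of the form -r followed by at least one digit is removed; m12w_pp_r2 and z-r123g come back as they went in.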

As you found, plain parameter expansion can't do what you want here directly.
You could play games with stripping everything up to the last - in the value and then checking that the remaining string matches your desired pattern, but at that point you might as well do the pattern match directly and be done.
$ pat='(.*)-r[0-9]+$'
$ var='aaa-bb-cccc-r17, m12w_pp_r2, z-r123'
$ [[ $var =~ $pat ]] && var=${BASH_REMATCH[1]}
$ declare -p var
declare -- var="aaa-bb-cccc-r17, m12w_pp_r2, z"
$ var='aaa-bb-cccc-r17, m12w_pp_r2, z-r123g'
$ [[ $var =~ $pat ]] && var=${BASH_REMATCH[1]}
$ declare -p var
declare -- var="aaa-bb-cccc-r17, m12w_pp_r2, z-r123g"
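The same regex idea as a reusable function, as a rough sketch. strip_rev_re is a hypothetical name, and the pattern assumes that at least one digit must follow -r (as the question states), so a bare trailing -r is left alone.

```bash
#!/usr/bin/env bash

# strip_rev_re: hypothetical helper, prints $1 without a final -rNNN
strip_rev_re() {
  local pat='^(.*)-r[0-9]+$'   # + requires at least one digit after -r
  if [[ $1 =~ $pat ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    printf '%s\n' "$1"          # no match: value is returned unchanged
  fi
}

strip_rev_re 'aaa-bb-cccc-r17'
strip_rev_re 'z-r123g'
strip_rev_re 'foo-r'
```

Unlike the extglob version, this needs no shell option, at the cost of a few more lines.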

Related

Bash script with regex and capturing group

I'm working on a bash script to rename automatically files on my Synology NAS.
I have a loop that goes over the files, and everything was fine until I tried to make my script more efficient with regexes.
I have several bits of code which work as expected:
filename="${filename//[-_.,\']/ }"
filename="${filename//[éèēěëê]/e}"
But I have this:
filename="${filename//t0/0}"
filename="${filename//t1/1}"
filename="${filename//t2/2}"
filename="${filename//t3/3}"
filename="${filename//t4/4}"
filename="${filename//t5/5}"
filename="${filename//t6/6}"
filename="${filename//t7/7}"
filename="${filename//t8/8}"
filename="${filename//t9/9}"
I would like to use a capture group to get something like this:
filename="${filename//t([0-9]{1,2})/\1}"
filename="${filename//t([0-9]{1,2})/${BASH_REMATCH[1]}}"
I've been looking for a working syntax without success...
The shell's parameter expansion facility does not support regular expressions. But you can approximate it with something like
filename=$(sed 's/t\([0-9]\)/\1/g' <<<"$filename")
This will work regardless of whether the first digit is followed by additional digits or not, so dropping that requirement simplifies the code.
If you want the last or all t[0-9]{1,2}s replaced:
$ filename='abt1cdt2eft3gh'; [[ "$filename" =~ (.*)t([0-9]{1,2}.*) ]] && filename="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"; echo "$filename"
abt1cdt2ef3gh
$ filename='abt1cdt2eft3gh'; while [[ "$filename" =~ (.*)t([0-9]{1,2}.*) ]]; do filename="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"; done; echo "$filename"
ab1cd2ef3gh
Note that the "replace all" case above would keep iterating until all t[0-9]{1,2}s are changed, even ones that didn't exist in the original input but were being created by the loop, e.g.:
$ filename='abtt123de'; while [[ "$filename" =~ (.*)t([0-9]{1,2}.*) ]]; do filename="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"; echo "$filename"; done
abt123de
ab123de
whereas the sed script in @tripleee's answer would not do that:
$ filename='abtt123de'; filename=$(sed 's/t\([0-9]\)/\1/g' <<<"$filename"); echo "$filename"
abt123de
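If you want to stay in pure bash but get the single-pass behaviour of the sed command, one option is to consume the string from left to right so that already-replaced text is never scanned again. A sketch, with strip_t as a made-up function name:

```bash
#!/usr/bin/env bash

# strip_t: hypothetical helper, removes every t that is directly followed by a digit
strip_t() {
  local rest=$1 out='' hit
  # =~ finds the leftmost "t<digit>" in what is left of the input
  while [[ $rest =~ t([0-9]) ]]; do
    hit=${BASH_REMATCH[0]}
    # keep the text before the match plus the digit, then drop both from $rest
    out+=${rest%%"$hit"*}${BASH_REMATCH[1]}
    rest=${rest#*"$hit"}
  done
  printf '%s\n' "$out$rest"
}

strip_t 'abt1cdt2eft3gh'
strip_t 'abtt123de'
```

Because the processed part is moved into out, a t that only ends up next to a digit after a replacement (as in abtt123de) is not touched.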

Check if a string starts with a possible range of values and extract the match

Given the following JavaScript code:
const branch = 'PRODUCT-1234-foobar';
const match = (/^(?:product|core|shop)-\d+/i).exec(branch);
const name = match ? match[0] : 'unknown';
Given a branch that starts with either PRODUCT-, CORE- or SHOP- and is followed by at least one number, this will give me the first part of the name of the branch, in this case PRODUCT-1234.
I've been trying to do the same in Bash, but I cannot seem to make it work. How would I do this? The answer should preferably be case-insensitive.
You may use shopt -s nocasematch to make the subsequent matches case-insensitive.
Use
shopt -s nocasematch
branch='PRODUCT-1234-foobar'
reg="^(product|core|shop)-[0-9]+"
if [[ $branch =~ $reg ]]; then
echo "${BASH_REMATCH[0]}"
fi
Pattern details
^ - string start
(product|core|shop) - one of the alternatives
- - a hyphen
[0-9]+ - one or more digits.
The ${BASH_REMATCH[0]} stands for the whole match value.
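To mirror the JavaScript snippet more closely, including the 'unknown' fallback, the match can go into a function. This is a sketch; branch_name is a hypothetical name, and the body runs in a subshell so that nocasematch does not leak into the rest of the script.

```bash
#!/usr/bin/env bash

# branch_name: hypothetical helper; the ( ) body is a subshell,
# so the shopt setting is local to this function call.
branch_name() (
  shopt -s nocasematch
  reg='^(product|core|shop)-[0-9]+'
  if [[ $1 =~ $reg ]]; then
    printf '%s\n' "${BASH_REMATCH[0]}"
  else
    printf '%s\n' 'unknown'
  fi
)

branch_name 'PRODUCT-1234-foobar'
branch_name 'feature-12-something'
```

The first call prints PRODUCT-1234 and the second prints unknown.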

regex - Edit Bash arrays in text file

I would like to change the following piece:
# Source
source=('10-nvidia-drm-outputclass.conf'
'20-nvidia.conf'
'linux-4.11.patch')
source_i686=("http://us.download.nvidia.com/XFree86/Linux-x86/$pkgver/NVIDIA-Linux-x86-$pkgver.run")
source_x86_64=("http://us.download.nvidia.com/XFree86/Linux-x86_64/$pkgver/$_pkg.run")
md5sums=('4f5562ee8f3171769e4638b35396c55d'
'2640eac092c220073f0668a7aaff61f7'
'897d9775dc484ab37934e7b102c5b325')
md5sums_i686=('8825cec1640739521689bd80121d1425')
md5sums_x86_64=('0e9590d48703c8baa034b6f0f8bbf1e5')
[[ $_pkg = NVIDIA-Linux-x86_64-$pkgver ]] && md5sums_x86_64=('1b74150e84fd99cc1207a51b9327112c')
into:
# Source
source=('10-nvidia-drm-outputclass.conf'
'20-nvidia.conf')
# 'linux-4.11.patch')
source_i686=("http://us.download.nvidia.com/XFree86/Linux-x86/$pkgver/NVIDIA-Linux-x86-$pkgver.run")
source_x86_64=("http://us.download.nvidia.com/XFree86/Linux-x86_64/$pkgver/$_pkg.run")
md5sums=('4f5562ee8f3171769e4638b35396c55d'
'2640eac092c220073f0668a7aaff61f7')
# '897d9775dc484ab37934e7b102c5b325')
md5sums_i686=('8825cec1640739521689bd80121d1425')
md5sums_x86_64=('0e9590d48703c8baa034b6f0f8bbf1e5')
[[ $_pkg = NVIDIA-Linux-x86_64-$pkgver ]] && md5sums_x86_64=('1b74150e84fd99cc1207a51b9327112c')
i.e. comment out the last item in source and md5sums and close each array with ) on the preceding line.
I only know how to do a quarter of this, namely commenting out the 'linux-4.11.patch') line with:
sed "/'linux-.*patch'/s/^/#/"
Sed version:
$ sed --version | head -1
sed (GNU sed) 4.4
Assuming no () characters inside the array elements and no NUL characters in file
$ sed -zE 's/((source|md5sums)=\([^)]*)\n([^)\n]*\))/\1)\n#\3/g' input_file
# Source
source=('10-nvidia-drm-outputclass.conf'
'20-nvidia.conf')
# 'linux-4.11.patch')
source_i686=("http://us.download.nvidia.com/XFree86/Linux-x86/$pkgver/NVIDIA-Linux-x86-$pkgver.run")
source_x86_64=("http://us.download.nvidia.com/XFree86/Linux-x86_64/$pkgver/$_pkg.run")
md5sums=('4f5562ee8f3171769e4638b35396c55d'
'2640eac092c220073f0668a7aaff61f7')
# '897d9775dc484ab37934e7b102c5b325')
md5sums_i686=('8825cec1640739521689bd80121d1425')
md5sums_x86_64=('0e9590d48703c8baa034b6f0f8bbf1e5')
[[ $_pkg = NVIDIA-Linux-x86_64-$pkgver ]] && md5sums_x86_64=('1b74150e84fd99cc1207a51b9327112c')
-z makes sed read the whole file as a single record (records are NUL-separated instead of newline-separated), so the regex can match across lines
-E enables extended regular expressions
((source|md5sums)=\([^)]*)\n([^)\n]*\)) matches source=(...) or md5sums=(...) in two halves, with the second half containing the last line
\1)\n#\3 closes the array after the first half and comments out the last line
If the number of lines is known and fixed:
sed '/^source=\|^md5sums=/ {N;N; s/\n/)\n#/2}' input_file
where the number of N commands and the 2 are both the number of lines in the array minus one
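For trying the -z command without touching a real PKGBUILD, something like the following can be used. It is a sketch that assumes GNU sed (for -z); the array contents and the example.invalid URL are placeholders, not the real values.

```bash
#!/usr/bin/env bash

# Throwaway input file with made-up array contents
input=$(mktemp)
cat >"$input" <<'EOF'
source=('a.conf'
  'b.conf'
  'c.patch')
source_i686=("http://example.invalid/x.run")
md5sums=('111'
  '222'
  '333')
EOF

# Same sed command as above; requires GNU sed for -z
result=$(sed -zE 's/((source|md5sums)=\([^)]*)\n([^)\n]*\))/\1)\n#\3/g' "$input")
printf '%s\n' "$result"
rm -f "$input"
```

Only the multi-line source= and md5sums= arrays are changed; source_i686= is skipped because the pattern requires = directly after the name.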

Regular expression Bash issue

I have to match a string made up only of lowercase characters and consisting of the same word repeated twice, for example ballball or printprint. The word ball on its own must be rejected because it is not repeated.
For this reason I have this code:
read input
expr='^(([a-z]*){2})$'
if [[ $input =~ $expr ]]; then
echo "OK string"
exit 0
fi
exit 10
but it doesn't work: for example, if I enter ball, the script prints "OK string".
What am I doing wrong?
Not all Bash versions support backreferences in regexes natively. If yours doesn't, you can use an external tool such as grep:
read input
re='^\([a-z]\+\)\1$'
if grep -q "$re" <<< "$input"; then
echo "OK string"
exit 0
fi
exit 1
grep -q is silent and has a successful exit status if there was a match. Notice how (, + and ) have to be escaped for grep. (grep -E would understand () without escaping.)
Also, I've replaced your * with + so we don't match the empty string.
Alternatively: your requirement means that a matching string has two identical halves, so we can check for just that, without any regexes:
read input
half=$(( ${#input} / 2 ))
if (( half > 0 )) && [[ ${input:0:$half} = "${input:$half}" ]]; then
echo "OK string"
fi
This uses Substring Expansion; the first check is to make sure that the empty string doesn't match.
Your requirement is to match strings made of two repeated words. This is easy to do by just checking if the first half of your string is equal to the remaining part. No need to use regexps...
$ var="byebye" && len=$((${#var}/2))
$ test "${var:0:$len}" = "${var:$len}" && { echo ok ; } || echo no
ok
$ var="abcdef" && len=$((${#var}/2))
$ test "${var:0:$len}" = "${var:$len}" && { echo ok ; } || echo no
no
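The half-comparison on its own does not check that the input is lowercase letters only, which the question also asks for. A sketch that combines both checks, with is_doubled as a made-up name; LC_ALL=C is set on the assumption that [a-z] should mean exactly the ASCII lowercase letters.

```bash
#!/usr/bin/env bash

# is_doubled: hypothetical helper; succeeds if $1 is a lowercase word repeated twice
is_doubled() {
  local LC_ALL=C                      # make [a-z] mean ASCII a..z only
  local s=$1 half=$(( ${#1} / 2 ))
  [[ $s =~ ^[a-z]+$ ]] || return 1    # non-empty, lowercase letters only
  (( ${#s} % 2 == 0 )) || return 1    # odd lengths cannot be two equal halves
  # the right-hand side is quoted so it is compared literally, not as a glob
  [[ ${s:0:half} == "${s:half}" ]]
}

for w in ballball ball printprint; do
  if is_doubled "$w"; then echo "$w: OK string"; else echo "$w: rejected"; fi
done
```

ballball and printprint are accepted, ball is rejected.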
The regex [a-z]* matches any string of lowercase letters, including the empty string.
([a-z]*){2} matches any two of those in a row; they need not be identical.
Ergo, ^(([a-z]*){2})$ matches any string of zero or more lowercase letters, which is why ball is accepted.
Using the suggestion from @hwnd (replacing {2} with \1) will enforce a match on two identical strings.
N.B: You will need a fairly recent version of bash. Tested in bash 4.3.11.

Prefix and postfix elements of a bash array

I want to pre- and postfix an array in bash similar to brace expansion.
Say I have a bash array
ARRAY=( one two three )
I want to be able to pre- and postfix it like the following brace expansion
echo prefix_{one,two,three}_suffix
The best I've been able to find uses bash regex to either add a prefix or a suffix
echo ${ARRAY[@]/#/prefix_}
echo ${ARRAY[@]/%/_suffix}
but I can't find anything on how to do both at once. Potentially I could use regex captures and do something like
echo ${ARRAY[@]/.*/prefix_$1_suffix}
but it doesn't seem like captures are supported in bash variable regex substitution. I could also store a temporary array variable like
PRE=(${ARRAY[@]/#/prefix_})
echo ${PRE[@]/%/_suffix}
This is probably the best I can think of, but it still seems sub par. A final alternative is to use a for loop akin to
EXPANDED=""
for E in ${ARRAY[@]}; do
EXPANDED="prefix_${E}_suffix $EXPANDED"
done
echo $EXPANDED
but that is super ugly. I also don't know how I would get it to work if I wanted spaces anywhere in the prefix, suffix or array elements.
Bash parameter expansion doesn't use regexes. The pattern is an ordinary shell glob, described in section 3.5.8.1 (Pattern Matching) of the Bash manual.
Your two-step solution is cool, but it needs some quotes for whitespace safety:
ARR_PRE=("${ARRAY[@]/#/prefix_}")
echo "${ARR_PRE[@]/%/_suffix}"
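To see that the quoted two-step version really keeps elements with spaces intact, here is a small sketch. The array contents and the TMP and EXPANDED names are just examples.

```bash
#!/usr/bin/env bash

ARRAY=( one "two words" three )

# Step 1: anchor an empty pattern at the start (#) and replace it with the prefix
TMP=( "${ARRAY[@]/#/prefix_}" )
# Step 2: anchor an empty pattern at the end (%) and replace it with the suffix
EXPANDED=( "${TMP[@]/%/_suffix}" )

# declare -p shows the element boundaries unambiguously
declare -p EXPANDED
```

The result still has three elements, and the middle one is "prefix_two words_suffix".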
You can also do it in some evil way:
eval "something $(printf 'pre_%q_suf ' "${ARRAY[@]}")"
Your last loop could be done in a whitespace-friendly way with:
EXPANDED=()
for E in "${ARRAY[@]}"; do
EXPANDED+=("prefix_${E}_suffix")
done
echo "${EXPANDED[@]}"
Prettier but essentially the same as the loop solution:
$ ARRAY=(A B C)
$ mapfile -t -d $'\0' EXPANDED < <(printf "prefix_%s_postfix\0" "${ARRAY[@]}")
$ echo "${EXPANDED[@]}"
prefix_A_postfix prefix_B_postfix prefix_C_postfix
mapfile reads rows into elements of an array. With -d $'\0' it instead reads null-delimited strings and -t omits the delimiter from the result. See help mapfile.
For arrays:
ARRAY=( one two three )
(IFS=,; eval echo prefix_\{"${ARRAY[*]}"\}_suffix)
For strings:
STRING="one two three"
eval echo prefix_\{${STRING// /,}\}_suffix
eval causes its arguments to be evaluated twice; in both cases, the first evaluation results in
echo prefix_{one,two,three}_suffix
and the second executes it.
In the array case, a subshell is used to avoid overwriting IFS.
You can also do this in zsh:
echo ${${ARRAY[@]/#/prefix_}/%/_suffix}
Perhaps this would be the most elegant solution:
$ declare -a ARRAY=( one two three )
$ declare -p ARRAY
declare -a ARRAY=([0]="one" [1]="two" [2]="three")
$
$ IFS=$'\n' ARRAY=( $(printf 'prefix %s_suffix\n' "${ARRAY[@]}") )
$
$ declare -p ARRAY
declare -a ARRAY=([0]="prefix one_suffix" [1]="prefix two_suffix" [2]="prefix three_suffix")
$
$ printf '%s\n' "${ARRAY[@]}"
prefix one_suffix
prefix two_suffix
prefix three_suffix
$
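Putting IFS=$'\n' in front of the array assignment makes word splitting happen on newlines only, which preserves spaces in the prefix, the suffix and the array elements. Note, however, that because no command follows the assignments, the new IFS value is not limited to that line: it stays in effect in the current shell until it is reset (for example with unset IFS or by restoring a saved copy). The unquoted $(...) is also still subject to pathname expansion, so elements containing *, ? or [ may be expanded.
printf is handy here because it reuses the format string (its first argument) for each additional argument it is given.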
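A sketch of an alternative that does not modify IFS at all: let mapfile split the printf output on newlines. It assumes bash 4 or later and that no array element contains a newline; the array contents are just examples.

```bash
#!/usr/bin/env bash

ARRAY=( one "two words" three )

# printf repeats the format once per element; mapfile -t reads one line
# per array element and strips the trailing newline from each.
mapfile -t ARRAY < <(printf 'prefix %s_suffix\n' "${ARRAY[@]}")

declare -p ARRAY
```

Since nothing is word-split or glob-expanded here, spaces and characters such as * in the elements are kept as they are.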
I had exactly the same question, and I came up with the following solution using sed's word boundary match mechanism:
myarray=( one two three )
newarray=( $(echo ${myarray[*]}|sed "s/\(\b[^ ]\+\)/pre-\1-post/g") )
echo ${newarray[@]}
> pre-one-post pre-two-post pre-three-post
echo ${#newarray[@]}
> 3
Waiting for more elegant solutions...