How do I remove multiple brackets within nested brackets using sed?

I want to write a shell program that edits the code in a file.
One of its commands is 'Remove ${} in arithmetic expansion $(( ))',
and I have a problem implementing it.
I want to change bash code like this:
cnt=$(( ${cnt} + ${cnt123} ))
to
cnt=$(( cnt + cnt123 ))
I want to remove the parameter expansion braces ${} inside the arithmetic expansion $(( )).
I tried this regular expression:
sed -Ei 's/(\$\(\()([^\)]*?)\$\{([^\}]+?)\}(.*?)(\)\))/\1\2\3\4\5/g' $file
but it only replaced the last ${...} on the line, even though there are other matches before it.
The result looked like this:
cnt=$(( ${cnt} + cnt123 ))
How do I remove the inner brackets inside the nested brackets?
(I would prefer to use only awk or sed, but if that is impossible, other bash commands are fine.)
Examples of the desired behavior:
s=$(( ${s} ** 2 ))
to
s=$(( s ** 2 ))
sum=$(( ${a} + ${b} ))
to
sum=$(( a + b ))
echo $(( (${var} * ${var2}) / ${var3} ))
to
echo $(( (var * var2) / var3 ))
echo ${d} $((${t1} + ${t2})) ${e}
to
echo ${d} $(( t1 + t2 )) ${e}
My sample input file (what it does is not important):
#! /bin/bash
cnt=0
cnt123=1
for filename in *
do
fname=$(basename $filename)
cname=$(echo $fname | tr A-Z a-z)
if [ "$fname" != "$cname" ]
then
if [ -e "$cname" ]
then
echo "$cname already exists"
exit 1
fi
echo "$fname is renamed $cname"
mv $fname $cname
cnt=$(( ${cnt}+ ${cnt123} ))
fi
done
echo "Total count: $cnt"
exit 0

As it is not an easy task for sed/awk regex and relevant functions,
please let me show a perl solution, although perl is not tagged in
your question.
perl -i -pe 's/(?<=\$\(\().+?(?=\)\))/ $_ = $&; s#\$\{(.+?)}#$1#g; $_ /ge' "$file"
Input file example:
s=$(( ${s} ** 2 ))
sum=$(( ${a} + ${b} ))
echo $(( (${var} * ${var2}) / ${var3} ))
echo ${d} $((${t1} + ${t2})) ${e}
Modified result:
s=$(( s ** 2 ))
sum=$(( a + b ))
echo $(( (var * var2) / var3 ))
echo ${d} $((t1 + t2)) ${e}
The perl substitution s/pattern/expression/e replaces the matched
pattern with the perl expression instead of literal (or fixed)
replacement. You can perform a dynamic replacement with this mechanism.
(?<=\$\(\().+?(?=\)\)) matches a substring which is preceded by $((
and followed by )). The matched substring is therefore the content within
$(( .. )), and the perl variable $& holds it.
The expression $_ = $&; s#\$\{(.+?)}#$1#g; $_ is perl code that removes
the paired ${ and } from the matched substring above. The g option
after the delimiter # works like that of sed.

Depending on the context, you may want to limit the match to alphanumeric characters only
sed -Ei.bak 's/\$\{([[:alnum:]]+)\}/\1/g' "$file"
to avoid unintentionally matching something else.

Using sed
$ sed -Ei.bak 's/\$\{([^}]*)}/\1/g' input_file
cnt=$(( cnt + cnt123 ))
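The sed commands above remove ${...} everywhere on a line, including outside $(( )). Below is a minimal sed-only sketch that restricts the edit to arithmetic expansions. The function name strip_arith_braces is made up for illustration, and the sketch assumes plain variable names, no nested $(( )), and an expansion that fits on one line.

```shell
# strip_arith_braces: hypothetical helper, reads stdin and writes stdout.
# The middle group (\)?[^)])*\)? matches any text that does not contain "))",
# so a match can never run past the end of the arithmetic expansion.
# Each pass removes one ${name}; "ta" repeats until nothing is left to replace.
strip_arith_braces() {
  sed -E \
    -e ':a' \
    -e 's/(\$\(\((\)?[^)])*\)?)\$\{([A-Za-z_][A-Za-z0-9_]*)\}/\1\3/' \
    -e 'ta'
}

printf '%s\n' 'echo ${d} $(( (${var} * ${var2}) / ${var3} )) ${e}' | strip_arith_braces
```

The ${d} and ${e} outside the arithmetic expansion are left alone. To edit a file in place, the same three -e expressions could be passed to sed -Ei.bak.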

printf "cnt=\$(( \${cnt} + \${cnt123} ))\n" | tr -d '{' | tr -d '}'
I used printf here because it is slightly more trustworthy than echo. Make sure you escape $, otherwise bash will try to expand the variables. Note that tr only deletes the braces, so the result is cnt=$(( $cnt + $cnt123 )), and braces anywhere else on the line are deleted as well.

Related

Bash regex not recognizing a single space " "

I'm trying to solve a problem that appeared in my script which doesn't let me match the date+time (YYYY-MM-DD HH:MM:SS) inside a for loop
list='"dt_txt":"2022-06-03 21:00:00"},'
regex_datehour='"dt_txt":"([0-9,-]*.[0-9,:]*)'
for i in $list; do
[[ $i =~ $regex_datehour ]] && echo "${BASH_REMATCH[1]}"
done
It seems that the "." between the two bracket expressions is not matching the space! If I replace the space between the date and the time in the list with a _, it works as intended: list='"dt_txt":"2022-06-03_21:00:00"},'
desired output:
2022-06-03 21:00:00
what I get:
2022-06-03
The problem here is one that catches a lot of people: word splitting on whitespace. In the for loop, your $list variable is not quoted, and it contains a space:
$ list='"dt_txt":"2022-06-03 21:00:00"},'
$ for i in $list ; do echo "i = $i" ; done ;
i = "dt_txt":"2022-06-03
i = 21:00:00"},
Make sure to put double-quotes around all strings that contain variables except regexes:
Using an array for list, which is what makes sense when using the for loop from your original code, it would look something like this:
#!/usr/bin/env bash
# filename: re.sh
list=(
'"dt_txt":"2022-06-03 21:00:00"},'
'"dt_txt":"2022-06-03 22:00:00"},'
'"dt_txt":"2022-06-03 23:00:00"},'
)
regex_datehour='"dt_txt":"([0-9,-]*.[0-9,:]*)'
for i in "${list[@]}" ; do
[[ "$i" =~ $regex_datehour ]] && echo "${BASH_REMATCH[1]}"
done
$ ./re.sh
2022-06-03 21:00:00
2022-06-03 22:00:00
2022-06-03 23:00:00
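If there is only a single string to test, the loop is not needed at all. A minimal sketch, assuming the date and time are always separated by exactly one space; the helper name extract_datehour is made up for illustration:

```shell
#!/usr/bin/env bash
# extract_datehour: hypothetical helper; prints "YYYY-MM-DD HH:MM:SS"
# from one JSON-like fragment passed as the first argument.
extract_datehour() {
  # Keeping the regex in a variable means the literal space needs no escaping.
  local re='"dt_txt":"([0-9-]+ [0-9:]+)'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

list='"dt_txt":"2022-06-03 21:00:00"},'
extract_datehour "$list"   # prints: 2022-06-03 21:00:00
```

Because "$list" is quoted when it is passed, the function receives the whole string, space included.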

Extracting filesnames in bash with regex

Can someone please help me set up a regular expression?
I have a large LaTeX3 TeXDoc file. LaTeX3 TeXDoc defines the macro \TestFiles{}, which lists the names of files that should be used as unit tests. You can name more than one file between the braces, so \TestFiles{foo-bar} and \TestFiles{foo-bar, bar+baz,foo_bar_baz} are both syntactically correct uses of this macro.
I would like to write a bash script that extracts all the unit test files named in the \TestFiles{} macros, compiles them with pdflatex, and checks whether pdflatex produces an output file successfully.
I have something like this in my script:
function get_filenames () {
## This regex works but is not sensible enough
# regex='\\TestFiles{(.*)}'
## This works also, but is again not precise enough
regex='\\TestFiles{([0-9a-zA-Z+-_, ]*)}'
## This should give more than one matching group
## (separated by ", " or ","), but this regex doesn't
## work. I have no idea why or how to modify, to get
## it working
while read -r line ; do
if [[ $line =~ $regex ]] ; then
i=1
while [ $i -le 3 ]; do
echo "Match $i: \"${BASH_REMATCH[$i]}\""
i=$(( i + 1 ))
done
echo
fi
done < mystyle.dtx
}
Here is an excerpt of the DTX file
\TestFiles{foo-bar}
\TestFiles{foo-bar, bar+baz,foo_bar_baz}
(You can store this as mystyle.dtx, in order to reproduce the next example.)
Using the above noted examples, my script gives me the following results:
get_filenames
Match 1: "foo-bar"
Match 2: ""
Match 3: ""
Match 1: "foo-bar, bar+baz,foo_bar_baz"
Match 2: ""
Match 3: ""
I wasn't able to modify my regular expression to split the content of the last example, \TestFiles{foo-bar, bar+baz,foo_bar_baz}, into three matching results.
I tried a regular expression like this: regex='\\TestFiles{([[:alnum:]+-_]*)[,]+[ ]*}'. I thought [[:alnum:]+-_]* would match the filenames. As far as I understand regular expressions, (...) forms a group, which should then be listed in the bash array BASH_REMATCH[$i].
The part [,]+ should reflect that filenames must be separated by at least one comma. Between the filenames there might be some whitespace, so something like [[:space:]]* or at least [ ]* should represent this. The quantifier * means any number of repetitions, including zero, while + means one or more.
But that regular expression did not work at all; it produced no matches.
How must the regex be defined to store each filename as a matching group? I am looking for the correct regular expression to get this result:
get_filenames
Match 1: "foo-bar"
Match 2: ""
Match 3: ""
Match 1: "foo-bar"
Match 2: "bar+baz"
Match 3: "foo_bar_baz"
EDIT: in my real-world files, there may be (and are) more than three test files.
Thanks in advance.
## This should give more than one matching group
regex='\\TestFiles{([0-9a-zA-Z+-_, ]*)}'
The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.
Your regex only has 1 "parenthesized subexpression" - which is why everything ends up in BASH_REMATCH[1]
$ regex='\\TestFiles{([0-9a-zA-Z+-_, ]*)}'
$ [[ $line =~ $regex ]]
$ declare -p BASH_REMATCH
declare -a BASH_REMATCH=(
[0]="\\TestFiles{foo-bar, bar+baz,foo_bar_baz}"
[1]="foo-bar, bar+baz,foo_bar_baz"
)
As you're trying to match an unknown number of filenames you would have to "dynamically" create your regex so it contains the needed amount of groups.
$ regex='\\TestFiles{([^, }]+)([,}] ?)'
$ [[ $line =~ $regex ]]
$ declare -p BASH_REMATCH
declare -a BASH_REMATCH=(
[0]="\\TestFiles{foo-bar, "
[1]="foo-bar"
[2]=", "
)
Add another group and see if it still matches:
$ regex+='([^, }]+)([,}] ?)'
$ [[ $line =~ $regex ]]
$ declare -p BASH_REMATCH
declare -a BASH_REMATCH=(
[0]="\\TestFiles{foo-bar, bar+baz,"
[1]="foo-bar"
[2]=", "
[3]="bar+baz"
[4]=","
)
You could keep looping until the regex no longer matches - or perhaps a simpler approach would be to count the number of , characters on the line.
regex='\\TestFiles{([^, }]+)([,}] ?)'
line='\TestFiles{foo-bar, bar+baz,foo_bar_baz}'
commas=${line//[!,]}
for ((i=0; i<${#commas}; i++))
do
regex+='([^, }]+)([,}] ?)'
done
[[ $line =~ $regex ]]
Which results in:
$ declare -p BASH_REMATCH
declare -a BASH_REMATCH=(
[0]="\\TestFiles{foo-bar, bar+baz,foo_bar_baz}"
[1]="foo-bar"
[2]=", "
[3]="bar+baz"
[4]=","
[5]="foo_bar_baz"
[6]="}"
)
Alternative approach using IFS
You can set IFS=', ' and have bash do the splitting for you.
line='\TestFiles{foo-bar, bar+baz,foo_bar_baz}'
[[ $line = \\TestFiles{* ]] && {
# Remove leading '\TestFiles{'
line=${line#*{}
# Remove trailing '}' (escaped so that it does not end the expansion early)
line=${line%\}}
IFS=', ' read -ra filenames <<< "$line"
declare -p filenames
}
declare -a filenames=([0]="foo-bar" [1]="bar+baz" [2]="foo_bar_baz")
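Since the real files may list more than three names, the same IFS idea can be wrapped in a loop over the file. This is only a sketch: the function name print_testfiles is made up, and it assumes at most one \TestFiles{...} per line and filenames without spaces or commas.

```shell
#!/usr/bin/env bash
# print_testfiles FILE: hypothetical helper that prints every name found
# inside \TestFiles{...}, numbered per line, however many there are.
print_testfiles() {
  local line body name i
  local -a names
  while IFS= read -r line; do
    [[ $line == *'\TestFiles{'* ]] || continue
    body=${line#*'\TestFiles{'}   # drop everything up to and including the opening brace
    body=${body%%\}*}             # drop the first closing brace and everything after it
    IFS=', ' read -ra names <<< "$body"   # split on commas and/or spaces
    i=1
    for name in "${names[@]}"; do
      printf 'Match %d: "%s"\n' "$i" "$name"
      i=$((i + 1))
    done
    echo
  done < "$1"
}

# Usage (assuming the excerpt was saved as mystyle.dtx):
# print_testfiles mystyle.dtx
```

Unlike the fixed three-slot output, this prints only the names that are actually present on each line.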
Use set with IFS to split each line into new positional parameters. Assign $@ to an array so that elements can be accessed by index. Trying this with $@ directly results in a bad substitution error.
get-filenames.sh
#!/usr/bin/env bash
get_filenames() {
local IFS=' {},'
declare -a names
while read -r line; do
set -- $line
names=("$@")
test "${names[0]}" == '\TestFiles' && {
for i in {1..3}; do
printf "Match %i: \"%s\"\n" "$i" "${names[$i]}"
done
}
echo
done < 'mystyle.dtx'
}
get_filenames
mystyle.dtx
\TestFiles{foo-bar}
\TestFiles{foo-bar, bar+baz,foo_bar_baz}
output
Match 1: "foo-bar"
Match 2: ""
Match 3: ""
Match 1: "foo-bar"
Match 2: "bar+baz"
Match 3: "foo_bar_baz"
EDIT (without external programs, though it's rather impractical, and tied to exactly three matches)
function get_filenames () {
p='([^, }]*) *,? *'
regex="\\TestFiles\{$p$p$p"
while read -r line ; do
if [[ $line =~ $regex ]] ; then
i=1
while [ $i -le 3 ]; do
echo "Match $i: \"${BASH_REMATCH[$i]}\""
i=$(( i + 1 ))
done
echo
fi
done < mystyle.dtx
}
If you really need to output exactly three file names (even empty) for each '\TestFiles' row then here's the code.
function get_filenames () {
MAX_FILES_CNT=3
IFS=$'\n'
for line in $(grep -oP '\\TestFiles\{\K[^}]*' < mystyle.dtx); do
filenames=()
for filename in $(grep -m $MAX_FILES_CNT -oP "[^, ]+" <<< "$line"); do
filenames+=("$filename")
done
i=0
while [ $i -lt $MAX_FILES_CNT ]; do
echo "Match $(($i+1)): \"${filenames[i]}\""
i=$(( i + 1 ))
done
echo ""
done
unset IFS
}
Match 1: "foo-bar"
Match 2: ""
Match 3: ""
Match 1: "foo-bar"
Match 2: "bar+baz"
Match 3: "foo_bar_baz"
By the way, BASH_REMATCH is not well suited to this task, because a repeated capture group keeps only its last match. Look:
[[ "asdf" =~ (.)* ]]
echo "${BASH_REMATCH[@]}"
asdf f
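One way around this limitation is to match once, print the match, cut it off the front of the string, and repeat. A minimal sketch; the function name each_match is made up, and the regex is assumed never to match the empty string (otherwise the loop would not terminate):

```shell
#!/usr/bin/env bash
# each_match STRING REGEX: hypothetical helper that prints every
# non-overlapping match of REGEX in STRING, one per line.
each_match() {
  local rest=$1 re=$2
  while [[ $rest =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[0]}"
    # Remove everything up to and including the text just matched.
    rest=${rest#*"${BASH_REMATCH[0]}"}
  done
}

each_match 'foo-bar, bar+baz,foo_bar_baz' '[^, ]+'
```

For the sample string this prints foo-bar, bar+baz and foo_bar_baz on separate lines.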
Also I would recommend to read this question https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice
Suggesting an awk script that would do the trick on one or more files.
get_filenames.awk
/\\TestFiles{[^}]*}/ { # handle only lines matching regex filter
filesCount = split($0, fileNamesArr, "\\\\TestFiles{[ ]*|[ ]*,[ ]*|[ ]*}"); # parse line to array fileNamesArr
for (i = 2; i < filesCount; i++) { # read elements 2 --> filesCount - 1
printf("Match %d in %s: \"%s\"\n", i - 1, FILENAME, fileNamesArr[i]); # format print fileNames
}
print"";
}
test file: input.1.txt
some text line 1
\TestFiles{foo-bar0}
some text \TestFiles{foo-bar1, bar+baz1, foo_bar_baz1}
some text \TestFiles{foo-bar2 ,bar+baz2 ,foo_bar_baz2 }
some text \TestFiles{ foo-bar3 , bar+baz3 , foo_bar_baz3 } some text
line 4
test file: input.2.txt
\TestFiles{file10, file11}
text
text \TestFiles{ file20 } some text
text\TestFiles{file30,file31,file32 }text
text
testing get_filenames.awk
awk -f get_filenames.awk input.1.txt input.2.txt
Match 1 in input.1.txt: "foo-bar0"
Match 1 in input.1.txt: "foo-bar1"
Match 2 in input.1.txt: "bar+baz1"
Match 3 in input.1.txt: "foo_bar_baz1"
Match 1 in input.1.txt: "foo-bar2"
Match 2 in input.1.txt: "bar+baz2"
Match 3 in input.1.txt: "foo_bar_baz2"
Match 1 in input.1.txt: "foo-bar3"
Match 2 in input.1.txt: "bar+baz3"
Match 3 in input.1.txt: "foo_bar_baz3"
Match 1 in input.2.txt: "file10"
Match 2 in input.2.txt: "file11"
Match 1 in input.2.txt: "file20"
Match 1 in input.2.txt: "file30"
Match 2 in input.2.txt: "file31"
Match 3 in input.2.txt: "file32"
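I believe this is the regular expression you're looking for (note that it relies on a variable-length lookbehind, which Bash's =~ operator does not support, so it needs a regex engine that allows one):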
(?<=\\TestFiles{.*)([\w\d\-\+_]+)[, }]+
You can see it working, modify it and have an explanation on what it does in the following link: https://regex101.com/r/0W8PBi/1

'$' in regexp in bash

I really don't know what I'm doing.
In variable a, I want to find the first appearance of '$' after the first appearance of 'Bitcoin', and print everything after it until the first newline.
I have the following code:
a = 'something Bitcoin something againe $jjjkjk\n againe something'
if [[ $a =~ .*Bitcoin.*[\$](.*).* ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
In this example I would like to get 'jjjkjk'. All I get is 'no'.
This code might be really flawed, I have no experience in this. I think tho the problem might be with the '$' sign. Please help!
Properly handle newlines in bash with ANSI-C Quoting -- \n sequences become literal newlines.
a=$'something Bitcoin something againe $jjjkjk\n againe something'
regex=$'Bitcoin[^$]*[$]([^\n]+)'
[[ $a =~ $regex ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="Bitcoin something againe \$jjjkjk" [1]="jjjkjk")'
# .................................................................^^^^^^^^^^^^
To verify the contents contain newlines:
$ printf '%s' "$regex" | od -c
0000000 B i t c o i n [ ^ $ ] * [ $ ] (
0000020 [ ^ \n ] + )
0000026
Here is a working version of your code:
a='something Bitcoin something againe $jjjkjk\n againe something'
r=".*Bitcoin.*[\$]([^\n]*).*"
if [[ $a =~ $r ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
You need to find 'Bitcoin' and then a '$' after it, no matter what is in between, so use .* there. When you want to capture text up to a specific character, the best way is a negated bracket expression [^...]; in your case [^\n], which here means capture everything up to the \n. In this version a is single-quoted, so \n is a literal backslash followed by n rather than a newline.
You also had an issue with your variable declaration: a = "..." is not valid, because spaces are not allowed around the =. The correct form is a="...".
Using double quotes for this string would be wrong too, because the shell would expand $jjjkjk as an (empty) variable.

RegEx : How can I extract a certain part and modify it?

I'd like to extract a certain part of a string and modify it by using a regular expression.
A given string is TestcaseVzwPerformance_8_2_1_4_1_FDD2.
I'd like to extract the part 8_2_1_4_1 from the string and then replace the underscores _ with dots . So the expected result needs to be 8.2.1.4.1.
The numbers and length of the given string can be different.
For example,
Given string // Expected result
TestcaseVzwCqi_3_9_Test2 // 3.9
TestcaseVzwSvd1xRttAclr_6_6_2_3 // 6.6.2.3
TestcaseVzwCsiFading_9_4_1_1_1_FDD4 // 9.4.1.1.1
Here is my RegEx:
((?:\D{0,}_)(\d(_\d)*)(.*))
The numbered capturing group - $2 - contains 8_2_1_4_1 but with underscores.
Can I replace the underscores with dots?
It needs to be done in one RegEx and a Replace.
A regex by itself cannot modify text; you need a tool that performs a substitution, for example sed:
echo TestcaseVzwPerformance_8_2_1_4_1_FDD2 |
sed -E 's/[^_]*_(([_0-9])+)_.*/\1/;s/_/./g'
8.2.1.4.1
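The command above appears to rely on an underscore plus a suffix following the digits, so an input that ends right after the number (such as TestcaseVzwSvd1xRttAclr_6_6_2_3) would lose its last component. A sketch that anchors on the digit groups instead; the function name to_dotted is made up, and every input line is assumed to have digits directly after its first underscore:

```shell
# to_dotted: hypothetical helper; reads test names on stdin and prints
# the dotted number for each one.
to_dotted() {
  # 1st expression: keep only digits joined by single underscores.
  # 2nd expression: turn the remaining underscores into dots.
  sed -E -e 's/^[^_]*_([0-9]+(_[0-9]+)*).*/\1/' -e 's/_/./g'
}

printf '%s\n' TestcaseVzwCqi_3_9_Test2 TestcaseVzwSvd1xRttAclr_6_6_2_3 | to_dotted
```

Lines that do not match the first expression still pass through the second one, so the input should be filtered first if it may contain other text.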
If you have a Bash string, you can use a Bash regex to capture and Bash parameter expansions to replace:
$ s="TestcaseVzwSvd1xRttAclr_6_6_2_3"
$ [[ $s =~ ^[^_]*_([[:digit:]_]+)_* ]] && tmp=${BASH_REMATCH[1]//_/.} && echo "${tmp%.}"
6.6.2.3
Which can be in a loop:
while read -r line; do
if [[ $line =~ ^[^_]*_([[:digit:]_]+)_* ]]; then
tmp=${BASH_REMATCH[1]//_/.}
echo "\"$line\" => ${tmp%.}"
fi
done <<< 'Given string
TestcaseVzwCqi_3_9_Test2
TestcaseVzwSvd1xRttAclr_6_6_2_3
TestcaseVzwCsiFading_9_4_1_1_1_FDD4'
Prints:
"TestcaseVzwCqi_3_9_Test2" => 3.9
"TestcaseVzwSvd1xRttAclr_6_6_2_3" => 6.6.2.3
"TestcaseVzwCsiFading_9_4_1_1_1_FDD4" => 9.4.1.1.1
You can use the same loop to process a file.
If you have a file, you may as well use gawk:
$ gawk 'BEGIN{FPAT="_[[:digit:]_]+"}
/_[[:digit:]]/ {sub(/^_/,"", $1); sub(/_$/,"",$1); gsub(/_/,".",$1); print $1}' file
3.9
6.6.2.3
9.4.1.1.1

Bash regex to match substring with exact integer range

I need to match a string $str that contains any of
foo{77..93}
and capture the above substring in a variable.
So far I've got:
str=/random/string/containing/abc-foo78_efg/ # for example
if [[ $str =~ (foo[7-9][0-9]) ]]; then
id=${BASH_REMATCH[1]}
fi
echo $id # gives foo78
but this also captures ids outside of the target range (e.g. foo95).
Is there a way to restrict the regex to an exact integer range? (I tried foo[77-93], but that doesn't work.)
Thanks
If you want to use a regex, you're going to have to make it slightly more complex:
if [[ $str =~ foo(7[7-9]|8[0-9]|9[0-3]) ]]; then
id=${BASH_REMATCH[0]}
fi
Note that I have removed the capture group around the whole pattern and am now using the 0th element of the match array.
As an aside, for maximum compatibility with older versions of bash, I would recommend assigning the pattern to a variable and using in the test like this:
re='foo(7[7-9]|8[0-9]|9[0-3])'
if [[ $str =~ $re ]]; then
id=${BASH_REMATCH[0]}
fi
An alternative to using a regex would be to use an arithmetic context, like this:
if (( "${str#foo}" >= 77 && "${str#foo}" <= 93 )); then
id=$str
fi
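This strips the "foo" part from the start of the variable so that the integer part can be compared numerically. It assumes $str consists of just foo followed by the number (e.g. foo78), not a longer string like the path in the question.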
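The regex and the arithmetic test can also be combined, so that the number is first captured from anywhere in the string and then compared numerically. A minimal sketch; the function name foo_id_in_range is made up, and only the first foo followed by digits in the string is checked:

```shell
#!/usr/bin/env bash
# foo_id_in_range STRING: hypothetical helper; prints fooNN and returns 0
# if STRING contains foo followed by a number in 77..93, otherwise returns 1.
foo_id_in_range() {
  local str=$1 n
  [[ $str =~ foo([0-9]+) ]] || return 1
  n=$((10#${BASH_REMATCH[1]}))   # force base 10 so a leading zero is not read as octal
  (( n >= 77 && n <= 93 )) || return 1
  printf 'foo%s\n' "${BASH_REMATCH[1]}"
}

str=/random/string/containing/abc-foo78_efg/
id=$(foo_id_in_range "$str") && echo "$id"   # prints: foo78
```

Changing the range then only means editing the two numbers in the comparison rather than rewriting the alternation in the regex.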
Sure is easy to do with Perl:
$ echo foo{1..100} | tr ' ' '\n' | perl -lne 'print $_ if m/foo(\d+)/ and $1>=77 and $1<=93'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93
Or awk even:
$ echo foo{1..100} | tr ' ' '\n' | awk -F 'foo' '$2>=77 && $2<=93 {print}'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93