Period wildcard not working in Bash pattern replacement - regex

Given this Bash code:
TEMP="1_2"
echo ${TEMP/_.*/}
why does it print out 1_2 instead of 1?
I've also tried these, but they don't work:
echo ${TEMP/_\.*/}
echo ${TEMP/_\\.*/}
This does work:
echo ${TEMP/_[0-9]*/}
but I want to know:
Why isn't the period acting as a wildcard?
What should I use instead?

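Parameter expansion uses glob patterns, not regular expressions. In a glob the period is a literal character; the single-character wildcard is the question mark. The asterisk is not a quantifier as it is in a regular expression: in Bash parameter expansions it is a multicharacter wildcard that matches any string, including the empty string.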
$ temp=1_2
$ echo "${temp/_*}"
1
The following also work in this particular situation. See Parameter Expansion in man bash for more information regarding the differences.
echo "${temp%_*}"
echo "${temp%%_*}"
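As a rough illustration of the points above (the value 1_2_3 and the variable names are made up for this sketch), here is how the glob characters behave in each expansion:

```shell
temp=1_2_3

dot=${temp/_.*/}      # "_.*" looks for a literal "_." so nothing matches: 1_2_3
one=${temp/_?/}       # "?" matches exactly one character, removes "_2": 1_3
all=${temp/_*/}       # "*" matches any run of characters, longest match: 1
short=${temp%_*}      # % removes the shortest matching suffix: 1_2
long=${temp%%_*}      # %% removes the longest matching suffix: 1

printf '%s\n' "$dot" "$one" "$all" "$short" "$long"
```

With only one underscore in the value, as in the question, the last three forms all give the same result.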
I recommend against using all-caps variable names in order to reduce the chance of name collision with shell or environment variables.

Related

Bash variable substitution with a regex not working as expected

Given a bash variable holding the following string:
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"
Why is the substitution ${INPUT/cf_clearance=[^;]*;/} producing the output: Cookie: instead of what I'd expect: Cookie: __cfduid=bar;
Testing the same regex in online regex validators confirms that cf_clearance=[^;]*; should match cf_clearance=foo; only, and not the rest of the string.
What am I doing wrong here?
Use the actual regular-expression matching features instead of parameter expansion, which works with patterns.
[[ $INPUT =~ (.*)(cf_clearance=[^;]*;)(.*) ]]
ans=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
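A slightly more defensive variant of the same idea, as a sketch: keeping the regex in a variable (here the hypothetical name re) avoids quoting surprises inside [[ ]], and the if guards against reading a stale BASH_REMATCH when nothing matches.

```shell
input='Cookie: cf_clearance=foo; __cfduid=bar;'
re='(.*)(cf_clearance=[^;]*;)(.*)'

if [[ $input =~ $re ]]; then
    # [1] is the text before the cookie, [2] the cookie itself, [3] the rest
    removed=${BASH_REMATCH[2]}
    ans=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
else
    # No match: leave the input as it is
    ans=$input
fi

printf '%s\n' "$ans"
```

The result keeps both spaces that surrounded the removed cookie.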
You can also use an extended pattern, which is equivalent to a regular expression in power:
shopt -s extglob
echo "${INPUT/cf_clearance=*([^;]);/}"
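To see the difference between the two pattern languages side by side, here is a small sketch (variable names are illustrative). In a plain glob, [^;] matches exactly one character and * then matches anything; the extglob form *([^;]) means zero or more characters other than a semicolon, which is what the regex [^;]* was meant to say.

```shell
shopt -s extglob

input='Cookie: cf_clearance=foo; __cfduid=bar;'

plain=${input/cf_clearance=[^;]*;/}       # runs to the last ";"
extended=${input/cf_clearance=*([^;]);/}  # stops at the first ";"

printf '[%s]\n' "$plain" "$extended"
```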
Use sed:
INPUT=$(sed 's/cf_clearance=[^;]*;//' <<< "$INPUT")
As you have been told in the comments, bash parameter substitution only supports glob patterns, not regular expressions. So the problem is really with your expectation, not with your code per se.
If you know that the expression can be anchored to the beginning of the string, you can use the ${INPUT#prefix} parameter substitution to grab the shortest possible match, and add back the Cookie: in front:
echo "Cookie: ${INPUT#Cookie: cf_clearance=*;}"
If you don't have this guarantee, something very similar can be approximated with a pair of parameter substitutions. Find which part precedes cf_clearance, find which part follows after the semicolon after cf_clearance; glue them together.
head=${INPUT%cf_clearance=*}
tail=${INPUT#*cf_clearance=*;}
echo "$head$tail"
(If you are not scared of complex substitutions, the temporary variables aren't really necessary or useful.
echo "${INPUT%cf_clearance=*}${INPUT#*cf_clearance=*;}"
This is a little dense even for my sophisticated taste, though.)
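One caveat with the head/tail approach: if cf_clearance does not occur at all, neither expansion removes anything and the two halves together print the input twice. A minimal sketch of a guarded version follows; remove_cookie is a hypothetical helper name, and it uses %% so that head and tail both refer to the first occurrence.

```shell
remove_cookie() {
    local input=$1 name=$2

    # Without a "name=...;" entry, return the input unchanged.
    case $input in
        *"$name="*";"*) ;;
        *) printf '%s\n' "$input"; return 0 ;;
    esac

    local head=${input%%"$name="*}    # everything before the first "name="
    local tail=${input#*"$name="*;}   # everything after the next ";"
    printf '%s\n' "$head$tail"
}

remove_cookie 'Cookie: cf_clearance=foo; __cfduid=bar;' cf_clearance
remove_cookie 'Cookie: __cfduid=bar;' cf_clearance
```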

Regex not working in Bash

I have this regex for now.
It should match something like this:
org.package;version="[1.0.41, 1.0.51)" followed by an optional "," if it is not the last element.
After the package name I added .* because the package could be "org.package.util.something" up to ";version".
I tried it in an online regex tool, where it works like this:
org.package(.*.*)?;version="[[0-9].[0-9].[0-9][0-9],\s[0-9].[0-9].[0-9][0-9])",?
but I don't know what I should change to make it work in Bash:
package="org.package"
sed -i "s/"$$package.*;version="\[[0-9].[0-9].[0-9][0-9],[[:space:]][0-9].[0-9].[0-9][0-9]\)",?"//g" "$file"
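Replace the double quotes around the sed script with single quotes. To expand $package, close the single quotes, put the variable in double quotes, and reopen the single quotes. The script also needs extended regex mode (-E), because \) and ? do not mean what you want in a basic regex, and the dots that should match literally have to be escaped:
package="org.package"
sed -E -i 's/'"$package"'.*;version="\[[0-9]\.[0-9]\.[0-9][0-9],[[:space:]][0-9]\.[0-9]\.[0-9][0-9]\)",?//g' "$file"
Before running the command with the -i option, check that the output is correct.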
There is more than one problem:
$$ will be replaced by bash with its PID, which is probably not what you want.
Online regex evaluators usually use extended regex or Perl regex syntax.
sed -r (or -E) enables extended regex mode (for grep there are -E and -P).
You use . when you want to match literal dots. However you should be using \., because . actually means "any character" in regular expressions.
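Putting these points together, a minimal sketch that can be tried on a string before touching a file. The sample line and the names pkg_re and cleaned are made up for illustration, and [^;]* is a substitution for .* so that the match cannot run past the first ;version.

```shell
package='org.package'
line='org.package.util;version="[1.0.41, 1.0.51)",other.pkg'

# Escape the dots in the package name so they match literally.
pkg_re=${package//./\\.}

# Single quotes protect the sed script; only the variable is double-quoted.
cleaned=$(printf '%s\n' "$line" |
    sed -E 's/'"$pkg_re"'[^;]*;version="\[[0-9]\.[0-9]\.[0-9][0-9],[[:space:]][0-9]\.[0-9]\.[0-9][0-9]\)",?//g')

printf '%s\n' "$cleaned"
```

Once the output looks right, the same script can be passed to sed with -i and a file name.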

BASH_REMATCH empty

I'm trying to capture some input with a regex in Bash, but BASH_REMATCH comes up empty.
#!/usr/bin/env /bin/bash
INPUT=$(cat input.txt)
TASK_NAME="MailAccountFetch"
MATCH_PATTERN="(${TASK_NAME})\s+([0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2})"
while read -r line; do
if [[ $line =~ $MATCH_PATTERN ]]; then
TASK_RESULT=${BASH_REMATCH[3]}
TASK_LAST_RUN=${BASH_REMATCH[2]}
TASK_EXECUTION_DURATION=${BASH_REMATCH[4]}
fi
done <<< "$INPUT"
My input is:
MailAccountFetch 2017-03-29 19:00:00 Success 5.0 Second(s) 2017-03-29 19:03:00
By debugging the script (VS Code+Bash ext) I can see the INPUT string matches as the code goes inside the IF but BASH_REMATCH is not populated with my two capture groups.
I'm on:
GNU bash, version 4.4.0(1)-release (x86_64-pc-linux-gnu)
What could be the issue?
LATER EDIT
Accepted Answer
Accepting most explanatory answer.
What finally resolved the issue:
The bashdb/VS Code environment is causing the empty BASH_REMATCH. The code works fine when run on its own.
As Cyrus shows in his answer, a simplified version of your code - with the same input - does work on Linux in principle.
That said, your code references capture groups 3 and 4, whereas your regex only defines 2.
In other words: ${BASH_REMATCH[3]} and ${BASH_REMATCH[4]} are empty by definition.
Note, however, that if =~ signals success, BASH_REMATCH is never fully empty: at the very least - in the absence of any capture groups - ${BASH_REMATCH[0]} will be defined.
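To make the index arithmetic concrete, here is a sketch built on the input line from the question. The two extra groups for the result and the duration are an assumption about the field layout (a word followed by a number), and [[:space:]] is used in place of \s for portability.

```shell
line='MailAccountFetch 2017-03-29 19:00:00 Success 5.0 Second(s) 2017-03-29 19:03:00'
task_name='MailAccountFetch'
sp='[[:space:]]+'
stamp='[0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]][0-9]{2}:[0-9]{2}:[0-9]{2}'

# Two groups: BASH_REMATCH gets indices 0 (whole match), 1 and 2 only.
two_groups="(${task_name})${sp}(${stamp})"
if [[ $line =~ $two_groups ]]; then
    group_count=${#BASH_REMATCH[@]}
    third=${BASH_REMATCH[3]-unset}
fi

# Four groups (assumed layout): indices 3 and 4 now exist.
four_groups="${two_groups}${sp}([[:alpha:]]+)${sp}([0-9.]+)"
if [[ $line =~ $four_groups ]]; then
    task_last_run=${BASH_REMATCH[2]}
    task_result=${BASH_REMATCH[3]}
    task_execution_duration=${BASH_REMATCH[4]}
fi

printf '%s\n' "$group_count" "$third" "$task_last_run" "$task_result" "$task_execution_duration"
```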
There are some general points worth making:
Your shebang line reads #!/usr/bin/env /bin/bash, which is effectively the same as #!/bin/bash.
/usr/bin/env is typically used if you want a version other than /bin/bash to execute, one you've installed later and put in the PATH:
#!/usr/bin/env bash
ghoti points out that another reason for using #!/usr/bin/env bash is to also support less common platforms such as FreeBSD, where bash, if installed, is located in /usr/local/bin rather than the usual /bin.
In either scenario it is less predictable which bash binary will be executed, because it depends on the effective $PATH value at the time of invocation.
=~ is one of the few Bash features that are platform-dependent: it uses the particular regex dialect implemented by the platform's regex libraries.
\s is a character class shortcut that is not available on all platforms, notably not on macOS; the POSIX-compliant equivalent is [[:space:]].
(In your particular case, \s should work, however, because your Bash --version output suggests that you are on a Linux distro.)
It's better not to use all-uppercase shell variable names such as INPUT, so as to avoid conflicts with environment variables and special shell variables.
Bash uses system libraries to parse regular expressions, and different parsers implement different features. You've come across a place where regex class shorthand strings do not work. Note the following:
$ s="one12345 two"
$ [[ $s =~ ^([a-z]+[0-9]{4})\S*\s+(.*) ]] && echo yep; declare -p BASH_REMATCH
declare -ar BASH_REMATCH=()
$ [[ $s =~ ^([a-z]+[0-9]{4})[^[:space:]]*[[:space:]]+(.*) ]] && echo yep; declare -p BASH_REMATCH
yep
declare -ar BASH_REMATCH=([0]="one12345 two" [1]="one1234" [2]="two")
I'm doing this on macOS as well, but I get the same behaviour on FreeBSD.
Simply replace \s with [[:space:]], \d with [[:digit:]], etc, and you should be good to go. If you avoid using RE shortcuts, your expressions will be more widely understood.
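For instance, a sketch of the same kind of substitution applied to \d (the pattern and sample strings are invented for illustration):

```shell
# \d{4}-\d{2}-\d{2}\s+\d{2}:\d{2} rewritten with POSIX character classes
re='^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}[[:space:]]+[[:digit:]]{2}:[[:digit:]]{2}$'

if [[ '2017-03-29 19:00' =~ $re ]]; then good=match; else good=no-match; fi
if [[ '2017-03-29T19:00' =~ $re ]]; then bad=match; else bad=no-match; fi

printf '%s\n' "$good" "$bad"
```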

Using multicase regex with sed on jenkins

OK, so I'm making a choice parameterized Jenkins job. The choices for the parameters are DEV STAGING QA and PROD and they are stored in ${ENV}
I need to change the variable ${ENV} to match a string in a URL. I'm trying to do this with a sed command using regex. Is it possible?
I tested PROD|ING|(?<!Q)A as the regex in Expresso, and it finds the necessary portions, (A,ING,PROD) which would leave me with either DEV QA STG or `` as my variable value if I replaced them with '', then I'll add something onto the end of it.
When I try to run echo "DEVSTAGINGQAPROD" | sed "s/PROD|ING|(?<!Q)A//g" to remove those chars on CentOS it returns -bash: !Q: event not found. I want it to return DEVSTGQA
echo "DEVSTAGINGQAPROD" | sed "s/PROD|ING//g" returns DEVSTAGQA as it should. The problem I seem to be having is the lookbehind, which should only remove the A if it doesn't have a Q before it.
Any ideas how to make this work?
One problem here is that sed doesn't understand negative lookbehind. Another is your choice of quotes. History expansion is enabled by default in the shell, so ! has a special meaning and must be escaped inside double quotes.
To deal with the first problem, I'd suggest using Perl instead of sed, as it has a much more advanced regular expression engine. For the second, just use single quotes, within which the ! will be interpreted literally:
$ echo "DEVSTAGINGQAPROD" | perl -pe 's/PROD|ING|(?<!Q)A//g'
DEVSTGQA
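If Perl is not available on the build node, the lookbehind can be approximated in sed -E by capturing the character before the A and putting it back. This is only a sketch: shorten_env is a hypothetical helper name, and the second substitution misses an A that directly follows another removed A, which does not occur in DEV, STAGING, QA or PROD.

```shell
shorten_env() {
    # Single quotes keep the shell from touching the sed script.
    printf '%s\n' "$1" | sed -E 's/PROD|ING//g; s/(^|[^Q])A/\1/g'
}

for name in DEV STAGING QA PROD DEVSTAGINGQAPROD; do
    printf '%s -> [%s]\n' "$name" "$(shorten_env "$name")"
done
```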

How to reference captures in bash regex replacement

How can I include the regex match in the replacement expression in BASH?
Non-working example:
#!/bin/bash
name=joshua
echo ${name//[oa]/X\1}
I expect to output jXoshuXa with \1 being replaced by the matched character.
This doesn't actually work though and outputs jX1shuX1 instead.
This is perhaps less intuitive than sed and arguably quite obscure, but for completeness: Bash will probably never support capture references in the replacement string (at least not in the usual fashion, since parentheses are used for extended pattern matching). It is still possible to capture a pattern when testing with the binary operator =~, which stores the matches in an array called BASH_REMATCH.
Making the following example possible:
#!/bin/bash
name='joshua'
[[ $name =~ ([ao].*)([oa]) ]] && \
echo ${name/$BASH_REMATCH/X${BASH_REMATCH[1]}X${BASH_REMATCH[2]}}
The conditional match of the regular expression ([ao].*)([oa]) captures the following values to $BASH_REMATCH:
$ echo ${BASH_REMATCH[*]}
oshua oshu a
If found we use the ${parameter/pattern/string} expansion to search for the pattern oshua in parameter with value joshua and replace it with the combined string Xoshu and Xa. However this only works for our example string because we know what to expect.
For something that behaves more like a global regex replacement, the following example greedily matches the last o or a that is not yet prefixed with X, inserting the X characters from back to front.
#!/bin/bash
name='joshua'
while [[ $name =~ .*[^X]([oa]) ]]; do
name=${name/$BASH_REMATCH/${BASH_REMATCH:0:-1}X${BASH_REMATCH[1]}}
done
echo $name
The first iteration changes $name to joshuXa and the second to jXoshuXa, after which the condition fails and the loop terminates. This example works similarly to the lookbehind expression /(?<!X)([oa])/X\1/, which only touches o or a characters that are not prefixed with an X.
The output for both examples:
jXoshuXa
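Another way to get a global replacement with a reference to the match, without a subprocess, is to consume the string from the left and rebuild it piece by piece. The following is a sketch rather than a general-purpose routine: the variable names are illustrative, and it assumes the matched text does not occur earlier in the remaining string than the position where the regex matched (true for a simple bracket expression like [oa]).

```shell
name='joshua'
re='[oa]'
rest=$name
out=

while [[ $rest =~ $re ]]; do
    match=${BASH_REMATCH[0]}
    [[ -n $match ]] || break          # an empty match would loop forever
    before=${rest%%"$match"*}         # text before the first occurrence
    out+=${before}X${match}           # X plays the role of X\1
    rest=${rest#*"$match"}            # continue after the match
done
out+=$rest                            # append whatever is left over

echo "$out"
```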
nJoy!
bash> name=joshua
bash> echo $name | sed 's/\([oa]\)/X\1/g'
jXoshuXa
The question bash string substitution: reference matched subexpressions was marked a duplicate of this one, in spite of the requirement that
The code runs in a long loop, it should be a one-liner that does not
launch sub-processes.
So the answer is:
If you really cannot afford launching sed in a subprocess, do not use bash! Use perl instead; its read-update-output loop will be several times faster, and the difference in syntax is small. (Well, you must not forget semicolons.)
I switched to perl, and there was only one gotcha: Unicode support was not available on one of the computers, I had to reinstall packages.