Bash variable substitution with a regex not working as expected

Given a bash variable holding the following string:
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"
Why is the substitution ${INPUT/cf_clearance=[^;]*;/} producing the output: Cookie: instead of what I'd expect: Cookie: __cfduid=bar;
Testing the same regex in online regex validators confirms that cf_clearance=[^;]*; should match cf_clearance=foo; only, and not the rest of the string.
What am I doing wrong here?

Use the actual regular-expression matching features instead of parameter expansion, which works with patterns.
[[ $INPUT =~ (.*)(cf_clearance=[^;]*;)(.*) ]]
ans=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
You can also use an extended pattern, which is equivalent to a regular expression in power:
shopt -s extglob
echo "${INPUT/cf_clearance=*([^;]);/}"
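As a rough side-by-side illustration of the three forms discussed above, here is a sketch that uses the INPUT value from the question. Storing the regular expression in a variable (`re`, a name chosen here) is a precaution of this sketch, not something the answer requires.

```shell
#!/bin/bash
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"

# Glob pattern: [^;] matches exactly one character and * then matches
# any run of characters, so the longest match extends to the last ';'.
glob_result=${INPUT/cf_clearance=[^;]*;/}

# Real regular expression via =~; the captures land in BASH_REMATCH.
re='(.*)(cf_clearance=[^;]*;)(.*)'
if [[ $INPUT =~ $re ]]; then
    regex_result=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
fi

# Extended glob: *(...) means "zero or more of the enclosed pattern".
shopt -s extglob
extglob_result=${INPUT/cf_clearance=*([^;]);/}

printf '[%s]\n' "$glob_result" "$regex_result" "$extglob_result"
```

The plain glob leaves only `Cookie: `, while the other two keep ` __cfduid=bar;` after it.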

Use sed:
INPUT=$(sed 's/cf_clearance=[^;]*;//' <<< "$INPUT")

Like you have been told in comments, bash parameter substitution only supports glob patterns, not regular expressions. So the problem is really with your expectation, not with your code per se.
If you know that the expression can be anchored to the beginning of the string, you can use the ${INPUT#prefix} parameter substitution to grab the shortest possible match, and add back the Cookie: in front:
echo "Cookie: ${INPUT#Cookie: cf_clearance=*;}"
If you don't have this guarantee, something very similar can be approximated with a pair of parameter substitutions. Find which part precedes cf_clearance, find which part follows after the semicolon after cf_clearance; glue them together.
head=${INPUT%cf_clearance=*}
tail=${INPUT#*cf_clearance=*;}
echo "$head$tail"
(If you are not scared of complex substitutions, the temporary variables aren't really necessary or useful.
echo "${INPUT%cf_clearance=*}${INPUT#*cf_clearance=*;}"
This is a little dense even for my sophisticated taste, though.)
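If this is needed in more than one place, the head/tail idea could be wrapped in a small function. The following is only a sketch: `remove_field` is a made-up name, it always targets the first occurrence of the key, and it adds a guard because without a match the two expansions would each return the whole string and the output would be doubled.

```shell
#!/bin/bash
# remove_field is a hypothetical helper name, not something from the answer.
# It removes the first "key=value;" field from a string, using only
# glob-based parameter expansion.
remove_field() {
    local input=$1 key=$2
    case $input in
        *"$key="*\;*) ;;                        # field present, carry on
        *) printf '%s\n' "$input"; return ;;    # nothing to remove
    esac
    local head=${input%%"$key="*}   # text before the first "key="
    local rest=${input#*"$key="}    # text after the first "key="
    local tail=${rest#*;}           # drop the value up to the first ';'
    printf '%s\n' "$head$tail"
}

remove_field "Cookie: cf_clearance=foo; __cfduid=bar;" cf_clearance
```

Note that the key is matched as a plain substring, so a key that is the tail end of another cookie name would also match.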

Related

Expand environment variable inside Perl regex

I am having trouble with a short bash script. It seems like all forward slashes need to be escaped. How can required characters in expanded (environment) variables be escaped before perl reads them? Or is there some other method that perl understands?
This is what I am trying to do, but this will not work properly.
eval "perl -pi -e 's/$HOME\/_TV_rips\///g'" '*$videoID.info.json'
That is part of a longer script where videoID=$1. (And for some reason perl expands variables both within single and double quotes.)
This simple workaround with no forward slash in the expanded environment variable $USER works. But I would like to not have /Users/ hard coded:
eval "perl -pi -e 's/\/Users\/$USER\/_TV_rips\///g'" '*$videoID.info.json'
This is probably solvable in some better way fetching home dir for files or something else. The goal is to remove the folder name in youtube-dl's json data.
I am using perl just because it can handle extended regex. But perl is not required. Any better substitute for extended regex on macOS is welcome.
You are building the following Perl program:
s//home/username\/_TV_rips\///g
That's quite wrong.
You shouldn't be attempting to build Perl code from the shell in the first place. There are a few ways you could pass values to the Perl code instead of generating Perl code. Since the value is conveniently in the environment, we can use
perl -i -pe's/\Q$ENV{HOME}\E\/_TV_rips\///' *"$videoID.info.json"
or better yet
perl -i -pe's{\Q$ENV{HOME}\E/_TV_rips/}{}' *"$videoID.info.json"
(Also note the lack of eval and the fixed quoting on the glob.)
Just assembling the ideas from the comments, this should achieve what you expected:
perl -pi -e 's{$ENV{HOME}/_TV_rips/}{}g' *$videoID.info.json
@ikegami thanks for your comment! It is indeed safer with \Q...\E, in case $HOME contains characters like $.
All regex delimiters must of course be escaped in the input string.
But as Stefen stated, you can use other delimiters in Perl, like % or §. That makes it easier if you have many slashes in your string.
Special characters:
# starts a Perl comment, so don't use it as a delimiter.
?, [], {}, $, ^ and . are regex control characters and must be escaped in the regex.
You should always write a comment to make clear you are using different delimiters, because this makes your regex hard to read for inexperienced users.
Try out your RegEx here: https://regex101.com/r/cIWk1o/1

Bash variable search and replace instead of sed

See Code Review
See Github Project
I need to parse out instances of +word+ line by line (replace +word+ with blank). I'm currently using the following (working) sed regex:
newLine=$(echo "$line" | sed "s/+[a-Z]\++//g")
This violates "SC2001" according to "ShellCheck" validation;
SC2001: See if you can use ${variable//search/replace} instead.
I've attempted several variations without success (The string "+word+" remains in the output):
newLine=$(line//+[a-Z]+/)
newLine=$(line/+[a-Z]+//)
newLine=$(line/+[a-Z]\++/)
newLine=${line//+[a-Z]+/}
and more..
I've heard that in some cases sed is necessary, but I would like to use Bash's built in find and replace if possible.
The substitution in parameter expansion doesn't use regular expressions, but patterns. To get closer to regular expressions, you can turn on extended patterns:
shopt -s extglob
# [[:alpha:]] is used instead of [a-Z], whose meaning depends on the locale
new_line=${line//++([[:alpha:]])+}
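A small self-contained run of the extended-pattern substitution might look like this. The sample line is invented, and `[[:alpha:]]` is used here as a locale-independent way to say "a letter".

```shell
#!/bin/bash
shopt -s extglob   # enables +(...), *(...), ?(...), @(...) and !(...)

line='keep +word+ this +other+ too'

# The leading and trailing + are literal; +([[:alpha:]]) is "one or more letters".
new_line=${line//++([[:alpha:]])+}

printf '%s\n' "$new_line"
```

Both `+word+` and `+other+` are removed, leaving the surrounding spaces in place.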

Period wildcard not working in Bash pattern replacement

Given this Bash code:
TEMP="1_2"
echo ${TEMP/_.*/}
why does it print out 1_2 instead of 1?
I've also tried these, but they don't work:
echo ${TEMP/_\.*/}
echo ${TEMP/_\\.*/}
This does work:
echo ${TEMP/_[0-9]*/}
but I want to know:
Why isn't the period acting as a wildcard?
What should I use instead?
A question mark is the single-character wildcard. However, it doesn't work like regular expressions where the asterisk is a quantifier. In Bash, in parameter expansions, an asterisk is a multicharacter wildcard.
$ temp=1_2
$ echo "${temp/_*}"
1
The following also work in this particular situation. See Parameter Expansion in man bash for more information regarding the differences.
echo "${temp%_*}"
echo "${temp%%_*}"
I recommend against using all-caps variable names in order to reduce the chance of name collision with shell or environment variables.
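To make the wildcard rules above concrete, here is a short sketch. It uses a slightly longer value than the question (`1_2_3`, chosen here) so that the difference between % and %% becomes visible.

```shell
#!/bin/bash
temp=1_2_3

any_one=${temp/_?}       # ? matches exactly one character
any_run=${temp/_*}       # * matches any run of characters (longest match)
shortest=${temp%_*}      # remove the shortest matching suffix
longest=${temp%%_*}      # remove the longest matching suffix
literal_dot=${temp/_.*}  # '.' is literal in a glob, so nothing matches

printf '%s\n' "$any_one" "$any_run" "$shortest" "$longest" "$literal_dot"
```

This prints `1_3`, `1`, `1_2`, `1` and `1_2_3`, one per line.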

How to reference captures in bash regex replacement

How can I include the regex match in the replacement expression in BASH?
Non-working example:
#!/bin/bash
name=joshua
echo ${name//[oa]/X\1}
I expect to output jXoshuXa with \1 being replaced by the matched character.
This doesn't actually work though and outputs jX1shuX1 instead.
This is perhaps not as intuitive as sed and arguably quite obscure, but in the spirit of completeness: Bash will probably never support capture variables in the replacement string (at least not in the usual fashion, since parentheses are used for extended pattern matching). It is still possible, however, to capture a pattern when testing with the binary operator =~, which stores the matches in an array called BASH_REMATCH.
Making the following example possible:
#!/bin/bash
name='joshua'
[[ $name =~ ([ao].*)([oa]) ]] && \
echo ${name/$BASH_REMATCH/X${BASH_REMATCH[1]}X${BASH_REMATCH[2]}}
The conditional match of the regular expression ([ao].*)([oa]) captures the following values to $BASH_REMATCH:
$ echo ${BASH_REMATCH[*]}
oshua oshu a
If a match is found, we use the ${parameter/pattern/string} expansion to search for the pattern oshua in parameter with value joshua and replace it with the combined string Xoshu and Xa. However, this only works for our example string because we know what to expect.
For something that behaves more like a match-all or global regex substitution, the following example greedily matches any unchanged o or a, inserting X from back to front.
#!/bin/bash
name='joshua'
while [[ $name =~ .*[^X]([oa]) ]]; do
name=${name/$BASH_REMATCH/${BASH_REMATCH:0:-1}X${BASH_REMATCH[1]}}
done
echo $name
The first iteration changes $name to joshuXa and the second to jXoshuXa, after which the condition fails and the loop terminates. This example works similarly to the lookbehind expression /(?<!X)([oa])/X\1/, in that it only touches o or a characters that are not already prefixed by an X.
The output for both examples:
jXoshuXa
nJoy!
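A variation on the same idea, scanning from left to right instead of back to front, could be packaged as a function. This is only a sketch: `prefix_matches` is a hypothetical name, and it assumes the regular expression never matches the empty string and contains no anchors.

```shell
#!/bin/bash
# prefix_matches is a hypothetical helper, not part of the answer above.
# Usage: prefix_matches STRING REGEX PREFIX
# Assumes REGEX never matches the empty string and uses no anchors.
prefix_matches() {
    local rest=$1 re=$2 prefix=$3 out='' match
    while [[ $rest =~ $re ]]; do
        match=${BASH_REMATCH[0]}
        out+=${rest%%"$match"*}$prefix$match   # text before the match, prefix, match
        rest=${rest#*"$match"}                 # continue after the match
    done
    printf '%s\n' "$out$rest"
}

prefix_matches joshua '[oa]' X
```

Because already-processed text is moved into `out`, no marker check such as [^X] is needed, and a match at the very start of the string is handled as well.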
bash> name=joshua
bash> echo $name | sed 's/\([oa]\)/X\1/g'
jXoshuXa
The question bash string substitution: reference matched subexpressions was marked a duplicate of this one, in spite of the requirement that
The code runs in a long loop, it should be a one-liner that does not
launch sub-processes.
So the answer is:
If you really cannot afford launching sed in a subprocess, do not use bash! Use perl instead: its read-update-output loop will be several times faster, and the difference in syntax is small. (Well, you must not forget semicolons.)
I switched to perl, and there was only one gotcha: Unicode support was not available on one of the computers, I had to reinstall packages.

Proper Perl syntax for complex substitution

I've got a large number of PHP files and lines that need to be altered from a standard
echo "string goes here"; syntax to:
custom_echo("string goes here");
This is the line I'm trying to punch into Perl to accomplish this:
perl -pi -e 's/echo \Q(.?*)\E;/custom_echo($1);/g' test.php
Unfortunately, I'm making some minor syntax error, and it's not altering "test.php" in the least. Can anyone tell me how to fix it?
Why not just do something like:
perl -pi -e 's|echo (\".*?\");|custom_echo($1);|g' file.php
I don't think \Q and \E are doing what you think they're doing. They're not beginning and end of quotes. They're in case you put in a special regex character (like .) -- if you surround it by \Q ... \E then the special regex character doesn't get interpreted.
In other words, your regular expression is trying to match the literal string (.?*), which you probably don't have, and thus substitutions don't get made.
You also had your ? and * backwards -- I assume you want to match non-greedily, in which case you need to put the ? as a non-greedy modifier to the .* characters.
Edit: I also strongly suggest doing:
perl -pi.bak -e ... file.php
This will create a "backup" file that the original file gets copied to. In my above example, it'll create a file named file.php.bak that contains the original, pre-substitution contents. This is incredibly useful during testing until you're certain that you've built your regex properly. Hell, disk is cheap, I'd suggest always using the -pi.bak command-line operator.
You put your grouping parentheses inside the metaquoting expression (\Q(pattern)\E) instead of outside ((\Qpattern\E)), so your parentheses also get escaped and your regex is not capturing anything.