Bash script - variable expansion within backtick/grep regex string - regex

I'm trying to expand a variable in my bash script inside of the backticks, inside of the regex search string.
I want the $VAR to be substituted.
The lines I am matching are like:
start....some characters.....id:.....some characters.....[variable im searching for]....some characters....end
var=`grep -E '^.*id:.*$VAR.*$' ./input_file.txt`
Is this a possibility?
It doesn't seem to work. I know I can normally expand a variable with "$VAR", but won't this just search directly for those characters inside the regex? I am not sure what takes precedence here.

Variables do expand in backticks but they don't expand in single quotes.
So you need to either use double quotes for your regex string (I'm not sure what your concern with that is) or combine both kinds of quotes.
So either
var=`grep -E "^.*id:.*$VAR.*$" ./input_file.txt`
or
var=`grep -E '^.*id:.*'"$VAR"'.*$' ./input_file.txt`
Also, you might want to use $(grep ...) instead of backticks. It is the more modern syntax, has cleaner quoting rules, and nests without extra escaping.

You need to have the expression in double quotes (and, then, escape anything which needs to be escaped) in order for the variable to be interpolated.
var=$(grep -E "^.*id:.*$VAR.*\$" ./input_file.txt)
(The backslash is not strictly necessary here, but I put it in to give you an idea. Your real expression is perhaps more complex.)
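The difference between the quoting styles can be checked with a small script. This is a minimal sketch: the input file and the value ABC123 are made up to mimic the question's "id: ... value ..." lines.

```shell
#!/usr/bin/env bash
# Hypothetical input in the question's "start ... id: ... value ... end" format.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
printf '%s\n' 'start foo id: bar ABC123 baz end' 'start foo id: bar XYZ999 baz end' > "$input"
VAR=ABC123

# Single quotes: grep receives the characters $VAR, not ABC123, so nothing matches.
single=$(grep -E '^.*id:.*$VAR.*$' "$input")

# Double quotes: the shell replaces $VAR with ABC123 before grep runs.
double=$(grep -E "^.*id:.*$VAR.*$" "$input")

# Mixed quoting: literal parts single-quoted, the variable double-quoted.
mixed=$(grep -E '^.*id:.*'"$VAR"'.*$' "$input")

echo "single: [$single]"   # -> single: []
echo "double: [$double]"   # -> double: [start foo id: bar ABC123 baz end]
echo "mixed:  [$mixed]"    # -> mixed:  [start foo id: bar ABC123 baz end]
```

Keep in mind that the expanded value becomes part of the regex, so if VAR can contain characters such as . or *, grep will treat them as regex metacharacters.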

Related

Expand environment variable inside Perl regex

I am having trouble with a short bash script. It seems like all forward slashes need to be escaped. How can such characters in expanded (environment) variables be escaped before perl reads them? Or is there some other method that perl understands?
This is what I am trying to do, but this will not work properly.
eval "perl -pi -e 's/$HOME\/_TV_rips\///g'" '*$videoID.info.json'
That is part of a longer script where videoID=$1. (And for some reason perl expands variables both within single and double quotes.)
This simple workaround, with no forward slash in the expanded environment variable $USER, works. But I would like to not have /Users/ hard-coded:
eval "perl -pi -e 's/\/Users\/$USER\/_TV_rips\///g'" '*$videoID.info.json'
This can probably be solved in a better way, perhaps by getting the home directory some other way. The goal is to remove the folder name from youtube-dl's JSON data.
I am using perl just because it can handle extended regex. But perl is not required. Any better substitute for extended regex on macOS is welcome.
You are building the following Perl program:
s//home/username\/_TV_rips\///g
That's quite wrong.
You shouldn't be attempting to build Perl code from the shell in the first place. There are a few ways you could pass values to the Perl code instead of generating Perl code. Since the value is conveniently in the environment, we can use
perl -i -pe's/\Q$ENV{HOME}\E\/_TV_rips\///' *"$videoID.info.json"
or better yet
perl -i -pe's{\Q$ENV{HOME}\E/_TV_rips/}{}' *"$videoID.info.json"
(Also note the lack of eval and the fixed quoting on the glob.)
Just assembling the ideas in the comments, this should achieve what you expected:
perl -pi -e 's{$ENV{HOME}/_TV_rips/}{}g' *$videoID.info.json
@ikegami thanks for your comment! It is indeed safer with \Q...\E, in case $HOME contains regex metacharacters such as . or $.
Any occurrence of the regex delimiter inside the pattern must of course be escaped.
But as Stefen stated, you can use other delimiters in Perl, like % or §. That makes it easier if you have many slashes in your string.
Special characters:
# - Perl comment character; don't use it as a delimiter.
?, [], {}, $, ^, . (among others) - regex control characters; they must be escaped in a regex when meant literally.
You should always write a comment to make clear you are using different delimiters, because they make your regex harder to read for inexperienced users.
Try out your RegEx here: https://regex101.com/r/cIWk1o/1

Using Variables with Regex that contain a space (\s) and sed

I'm trying to create a sort script in bash using literal string variables, regex, and sed. I cannot seem to find the literal strings with spaces when using variables, although I can find them when using the regex directly. So:
#!/bin/bash
group1="IRISHFHD"
group2="REGIONAL FHD"
sed -i '/group-title="'${group1}/',+1d' JWLINE.m3u
sed -i '/group-title="'${group2}/',+1d' JWLINE.m3u
I've tried adding \s into the group variable, but it doesn't work.
John
The problem has nothing to do with regex, it's all down to how the shell treats variables' values. When a variable is expanded without double-quotes around it (i.e. ${group2}), the shell will split it into "words" based on whitespace. It'll also try to expand any words that contain shell wildcards into lists of matching files, and several regex metacharacters look like shell wildcards, which can cause serious chaos.
In this example:
sed -i '/group-title="'${group2}/',+1d' JWLINE.m3u
It's a little more complicated, because the variable reference is in between two single-quoted sections. In this case, the part before the variable reference gets attached to the first "word" in the variable, and the part after gets attached to the last word. Essentially, it expands into the equivalent of this:
sed -i '/group-title="REGIONAL' 'FHD/,+1d' JWLINE.m3u
(The space between the two single-quoted strings makes them two separate arguments.)
Anyway, since it gets split on the whitespace, sed gets two partial arguments instead of one whole one, and it doesn't work at all.
Solution: as in almost all situations, you should have double-quotes around the variable reference to prevent weird effects like this. There are a few options for this. You could just add double-quotes around the variable part:
sed -i '/group-title="'"${group2}"/',+1d' JWLINE.m3u
...but IMO this is confusing; some of those quotes are syntactic (i.e. parsed by the shell), and one is literal (passed to sed as part of the regex), and it's not obvious which are which. I'd prefer to just use double-quotes around the whole thing, and escape the double-quote that's supposed to be literal:
sed -i "/group-title=\"${group2}/,+1d" JWLINE.m3u
(The backslash in \" makes that double quote a literal part of the argument.)
(In double-quotes, you'd also need to escape any dollar signs, backslashes, or backticks that were supposed to be literal parts of the argument. But in this case, there aren't any of those.)
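Both the word splitting and the fix can be seen in a short script. This is a sketch under assumptions: the playlist contents are invented, and sed -i with no suffix plus the ",+1" address are GNU sed features (as the question's own commands already assume).

```shell
#!/usr/bin/env bash
# Hypothetical playlist: each group-title line is followed by its URL line.
playlist=$(mktemp)
trap 'rm -f "$playlist"' EXIT
cat > "$playlist" <<'EOF'
#EXTINF:-1 group-title="REGIONAL FHD",Channel A
http://example.invalid/a
#EXTINF:-1 group-title="SPORTS",Channel B
http://example.invalid/b
EOF

group2="REGIONAL FHD"

# Unquoted expansion: the shell splits the value, so printf gets two arguments.
printf '[%s]\n' '/group-title="'${group2}/',+1d'
# -> [/group-title="REGIONAL]
#    [FHD/,+1d]

# Whole sed script in double quotes: one argument, with the literal " escaped.
# ",+1" (matching line plus the next one) is a GNU sed address extension.
sed -i "/group-title=\"${group2}/,+1d" "$playlist"
cat "$playlist"
# -> #EXTINF:-1 group-title="SPORTS",Channel B
#    http://example.invalid/b
```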

Can OR expressions be used in ${var//OLD/NEW} replacements?

I was testing some string manipulation stuff in a bash script and quickly realized it doesn't understand regular expressions (at least not with the syntax I'm using for string operations). Then I tried some glob expressions, and it seems to understand some of them but not others. To be specific:
FINAL_STRING=${FINAL_STRING//<title>/$(get_title)}
is the main operation I'm trying to use and the above line works, replacing all occurrences of <title> with $(get_title) on $FINAL_STRING... and
local field=${1/#*:::/}
works: field gets the value of $1 with everything from the beginning up to and including ::: removed. So # does what I'd expect ^ to do. Plus, when I tried to use the {,,} glob expression here:
FINAL_STRING=${FINAL_STRING//{<suffix>,<extension>}/${SUFFIX}}
to replace any occurrence of <suffix> OR <extension> with ${SUFFIX}, it doesn't work.
So I see it doesn't take regex, and it also doesn't take glob patterns... so what does it take? Is there an exhaustive listing of what symbols/expressions are understood by plain bash string operations (particularly substring replacement)? Or are *, ?, #, ##, % and %% the only valid ones?
(I'm trying to rely only on plain bash, without calling sed or grep to do what I want)
The gory details can be found in the bash manual, Shell Expansions section. The complete picture is surprisingly complex.
What you're doing is described in the Shell Parameter Expansion section. You'll see that the pattern in
${parameter/pattern/string}
uses the Filename Expansion rules, and those don't include Brace Expansion - that is done earlier when processing the command line arguments. Filename expansion "only" does ?, * and [...] matching (unless extglob is set).
But parameter expansion does a bit more than just filename expansion, notably the anchoring you noticed with # or %.
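With extglob enabled, the pattern in ${parameter//pattern/string} can express the OR the question asks about. This is a minimal sketch; the template string and the SUFFIX value are hypothetical.

```shell
#!/usr/bin/env bash
# extglob enables @(a|b) "one of these" patterns; it must be switched on
# before bash parses a line that uses them.
shopt -s extglob

SUFFIX=X                                # hypothetical replacement value
FINAL_STRING='name<suffix>.<extension>' # hypothetical template
# Replace every <suffix> OR <extension> with $SUFFIX in one expansion.
FINAL_STRING=${FINAL_STRING//@(<suffix>|<extension>)/${SUFFIX}}
echo "$FINAL_STRING"                    # -> nameX.X

# '#' anchors the pattern at the start; '*' takes the longest match,
# so this removes everything up to the LAST ':::'.
set -- 'a:::b:::c'
field=${1/#*:::/}
echo "$field"                           # -> c
```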
bash does in fact handle regex, specifically with the [[ string =~ regex ]] operator; the matched text and capture groups are then available in the magic array variable BASH_REMATCH. It's funky, but it works.
See: http://www.linuxjournal.com/content/bash-regular-expressions
Note this is a bash-only feature.
For code that works in shells besides bash as well, the old-school way of doing something like this is indeed to use #/##/%/%% along with a loop around a case statement (case patterns support *, ? and [...] globs, and | to separate alternatives).
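Both techniques can be sketched briefly. The input line, the regex, and the word being tested are hypothetical examples, not taken from the linked article.

```shell
#!/usr/bin/env bash
# Bash-only: [[ =~ ]] uses extended regular expressions. Keeping the regex in
# a variable and expanding it unquoted stops bash from matching it literally.
line='start id: 42 name: config.cfg end'   # hypothetical input
re='id: ([0-9]+)'
if [[ $line =~ $re ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"  # -> whole match: id: 42
    echo "group 1: ${BASH_REMATCH[1]}"      # -> group 1: 42
fi

# Portable (POSIX sh) alternative: case patterns accept globs and '|' for OR.
word='<extension>'
case $word in
    '<suffix>'|'<extension>') kind=placeholder ;;
    *)                        kind=other ;;
esac
echo "$kind"                                # -> placeholder
```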

Proper Perl syntax for complex substitution

I've got a large number of PHP files and lines that need to be altered from a standard
echo "string goes here"; syntax to:
custom_echo("string goes here");
This is the line I'm trying to punch into Perl to accomplish this:
perl -pi -e 's/echo \Q(.?*)\E;/custom_echo($1);/g' test.php
Unfortunately, I'm making some minor syntax error, and it's not altering "test.php" in the least. Can anyone tell me how to fix it?
Why not just do something like:
perl -pi -e 's|echo (\".*?\");|custom_echo($1);|g' file.php
I don't think \Q and \E are doing what you think they're doing. They don't mark the beginning and end of a quoted string. They're for when your pattern contains a special regex character (like .): if you surround it with \Q ... \E, the special regex character doesn't get interpreted.
In other words, your regular expression is trying to match the literal string (.?*), which you probably don't have, and thus substitutions don't get made.
You also had your ? and * backwards. I assume you want to match non-greedily, in which case the ? has to follow .* (as in .*?) to act as a non-greedy modifier.
Edit: I also strongly suggest doing:
perl -pi.bak -e ... file.php
This will create a "backup" file that the original file gets copied to. In my above example, it'll create a file named file.php.bak that contains the original, pre-substitution contents. This is incredibly useful during testing, until you're certain that you've built your regex properly. Hell, disk is cheap; I'd suggest always using the -pi.bak command-line option.
You put your grouping parentheses inside the metaquoting expression (\Q(pattern)\E) instead of outside ((\Qpattern\E)), so your parentheses also get escaped and your regex is not capturing anything.

What is the best way to do string manipulation in a shell script?

I have a path as a string in a shell-script, could be absolute or relative:
/usr/userName/config.cfg
or
../config.cfg
I want to extract the file name (part after the last /, so in this case: "config.cfg")
I figure the best way to do this is with some simple regex?
Is this correct? Or should I use sed or awk instead?
Shell scripting's string manipulation features seem pretty primitive by themselves, and appear very esoteric.
Any example solutions are also appreciated.
If you're okay with using bash, you can use bash string expansions:
FILE="/path/to/file.example"
FILE_BASENAME="${FILE##*/}"
It's a little cryptic, but the braces start the variable expansion, and the double hash does a greedy removal of the specified glob pattern from the beginning of the string.
Double %% does the same thing from the end of a string, and a single percent or hash does a non-greedy removal.
A simple replace construct is also available:
FILE=${FILE// /_}
would replace all spaces with underscores for instance.
A single slash replaces only the first match.
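The four removal operators and the two replacement forms can be compared side by side. The path and the name below are hypothetical examples chosen so that shortest and longest matches differ.

```shell
#!/usr/bin/env bash
FILE="/path/to/file.example.tar.gz"   # hypothetical path

echo "${FILE##*/}"   # longest  */ removed from the front -> file.example.tar.gz
echo "${FILE#*/}"    # shortest */ removed from the front -> path/to/file.example.tar.gz
echo "${FILE%%.*}"   # longest  .* removed from the end   -> /path/to/file
echo "${FILE%.*}"    # shortest .* removed from the end   -> /path/to/file.example.tar

NAME="my file name.cfg"               # hypothetical name with spaces
echo "${NAME// /_}"  # every space replaced -> my_file_name.cfg
echo "${NAME/ /_}"   # first space only     -> my_file name.cfg
```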
Instead of string manipulation I'd just use
file=`basename "$filename"`
Edit:
Thanks to unwind for some newer syntax for this (which assumes your filename is held in $filename):
file=$(basename "$filename")
Most environments have access to perl, and I'm more comfortable with that for most string manipulation.
But as mentioned, for something this simple, you can use basename.
I typically use sed with a simple regex, like this:
echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
result:
>echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
config.cfg
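The three approaches from these answers can be run against both example paths from the question. This is a sketch; note that ${path##*/} is also available in plain POSIX sh, not only bash.

```shell
#!/usr/bin/env bash
# The two example paths from the question.
for path in /usr/userName/config.cfg ../config.cfg; do
    a=${path##*/}                                   # parameter expansion
    b=$(basename -- "$path")                        # external basename command
    c=$(printf '%s\n' "$path" | sed -e 's+^.*/++')  # sed, with '+' as delimiter
    echo "$a $b $c"   # -> config.cfg config.cfg config.cfg
done
```

The approaches differ for a path ending in a slash: for dir/, basename prints dir, while the expansion and the sed command both produce an empty string.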