Expand environment variable inside Perl regex

I am having trouble with a short bash script. It seems like all forward slashes need to be escaped. How can the required characters in expanded (environment) variables be escaped before perl reads them? Or is there some other method that perl understands?
This is what I am trying to do, but this will not work properly.
eval "perl -pi -e 's/$HOME\/_TV_rips\///g'" '*$videoID.info.json'
That is part of a longer script where videoID=$1. (And for some reason perl expands variables both within single and double quotes.)
This simple workaround with no forward slash in the expanded environment variable $USER works. But I would like to not have /Users/ hard coded:
eval "perl -pi -e 's/\/Users\/$USER\/_TV_rips\///g'" '*$videoID.info.json'
This is probably solvable in some better way fetching home dir for files or something else. The goal is to remove the folder name in youtube-dl's json data.
I am using perl just because it can handle extended regex. But perl is not required. Any better substitute for extended regex on macOS is welcome.

You are building the following Perl program:
s//home/username\/_TV_rips\///g
That's quite wrong.
You shouldn't be attempting to build Perl code from the shell in the first place. There are a few ways you could pass values to the Perl code instead of generating Perl code. Since the value is conveniently in the environment, we can use
perl -i -pe's/\Q$ENV{HOME}\E\/_TV_rips\///' *"$videoID.info.json"
or better yet
perl -i -pe's{\Q$ENV{HOME}\E/_TV_rips/}{}' *"$videoID.info.json"
(Also note the lack of eval and the fixed quoting on the glob.)

Just assembling the ideas in comments, this should achieve what you expected:
perl -pi -e 's{$ENV{HOME}/_TV_rips/}{}g' *$videoID.info.json
@ikegami thanks for your comment! It is indeed safer with \Q...\E, in case $HOME contains characters like $.

All regex delimiters must of course be escaped in the input string.
But as Stefen stated, you can use other delimiters in Perl, like % or §. That makes it easier if you have many slashes in your string.
Special characters:
# is the Perl comment character; don't use it as a delimiter.
?, [], {}, $, ^ and . are regex control characters; they must be escaped in the regex.
You should always write a comment to make clear you are using different delimiters, because this makes your regex hard to read for inexperienced users.
Try out your RegEx here: https://regex101.com/r/cIWk1o/1

Related

What is the correct usage for this perl script?

Consider the following perl shell script;
perl -p -i.bak -e 's/index.php?pageid=/p//g' `grep -ril index.php?pageid= *`
I am trying to recursively go through all web directories in my site and change any strings of
index.php?pageid=
to
p/
This is intended to shorten my links from something like:
www.domain.com/index.php?pageid=page1
to
www.domain.com/p/page1
I already have the .htaccess file set up properly, however this shell script is not working for me and I believe it's because of the ? or the = symbol in the original string that is messing up the regular expression.
How might I go about fixing this? I am terrible with regex.
The dot . and question mark ? are characters of special meaning and need to be escaped. As well, you need to either escape the forward slash in your replacement or use a different delimiter to avoid escaping.
perl -i.bak -pe 's!index\.php\?pageid=!p/!g'

Enclosing strings with forward slashes using AWK

I have a php file in which the split() function was used extensively. I replaced it with preg_split using sed and find commands. The problem now is that preg_split requires the regex pattern to be enclosed in delimiters while split does not require it.
I have tried using SED to enclose the strings with delimiters, but as far as I know SED is unable to do it. I have come to know that AWK can solve this problem.
I want
preg_split('\r\n', $some_string);
to be modified as
preg_split('/\r\n/', $some_string);
where the forward slashes work as delimiters. How can this be done using AWK?
sed is perfectly capable of this.
sed "s:\(preg_split('\)\([^']*\)':\1/\2/':g" file.php
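Your sed dialect might want a different mix of backslashes; or use Perl (or, ugh, PHP). Inside double quotes the dollar signs must be escaped so the shell does not expand $1 and $2 itself:
perl -pi~ -e "s:(preg_split\(')([^']*)':\$1/\$2/':g" file.php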
(Notice the -i flag for in-place editing; perhaps your sed supports that, too?)
I'm imagining your problem was with quoting rather than with the actual sed regex. Getting single quotes properly quoted in the shell can be a challenge. (In the worst case, put your shell script in a file so the shell won't see it.) And of course, using a different delimiter instead of slash makes the expression simpler.
That should work as you expect:
sed "s#preg_split('\(.*\)'#preg_split('/\1/'#g"
As @Stephen P mentioned in a comment, you can use different delimiters with sed. If your delimiter is used in the regex or the replacement string, you have to escape it using \. It's always simpler to use a delimiter which does not appear in your regex or replacement string. Here, I used #.
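For anyone who wants to check the delimiter trick before running it over real files, here is a minimal sketch that feeds the one line from the question through sed on stdin. It uses : as the delimiter and a [^']* group that stops at the first closing quote, which is an assumption about what the PHP lines look like:

```shell
# The sample PHP line from the question, kept literal with a quoted heredoc.
line=$(cat <<'EOF'
preg_split('\r\n', $some_string);
EOF
)

# ':' as the delimiter, so the slashes in the replacement need no escaping.
result=$(printf '%s\n' "$line" |
    sed "s:\(preg_split('\)\([^']*\)':\1/\2/':g")

printf '%s\n' "$result"
```

The [^']* group keeps the match inside the first quoted argument, whereas a greedy .* would run to the last single quote on the line.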

tough string to remove from a bunch of php files using perl

I am getting more and more bald as I yank hair out over what should be a simple thing. I have a fragment of a hack attempt left in some PHP files (100s).
The string is:
<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>
And I thought that using a perl command line such as:
perl -pn -i.bak -e "s{<\?\*god_mode_on\*/eval\(base64_decode\(""\)\); /\*god_mode_off\*/ \?>}{}g;" `find . -name '*.php'`
Would neatly produce a backup and strip the string out but it seems to carefully avoid doing so. I think I may have perl blindness now as I have been looking at it for so long so hopefully someone might directly see the problem and let me know how slow I've been ;-)
Thanks!
Keeping track of everything that needs to be escaped is not simple. It looks like you have not escaped your double quotes inside a double quoted string, for example. Perl has the quotemeta function that helps you figure this out:
print quotemeta '<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>';
===> \<\?\*god_mode_on\*\/eval\(base64_decode\(\"\"\)\)\;\ \/\*god_mode_off\*\/\ \?\>
Within a regular expression, the \Q escape will invoke quotemeta on everything up to the next \E escape, so you can say:
perl -p -i.bak -e \
's{\Q<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>\E}{}g' \
`find . -name '*.php'`
Notice that I used single quotes instead of double quotes for the argument to the -e command-line switch. Otherwise, you would also have to worry about the shell interpolating your input and opening up a whole other can of worms.
(Also, the -pn switch is redundant -- it is sufficient to use -p)
Is it possible that whatever shell you are using is interpreting the backslashes? You may need to escape them (with another backslash) so they actually get passed to perl as backslashes.

Proper Perl syntax for complex substitution

I've got a large number of PHP files and lines that need to be altered from a standard
echo "string goes here"; syntax to:
custom_echo("string goes here");
This is the line I'm trying to punch into Perl to accomplish this:
perl -pi -e 's/echo \Q(.?*)\E;/custom_echo($1);/g' test.php
Unfortunately, I'm making some minor syntax error, and it's not altering "test.php" in the least. Can anyone tell me how to fix it?
Why not just do something like:
perl -pi -e 's|echo (\".*?\");|custom_echo($1);|g' file.php
I don't think \Q and \E are doing what you think they're doing. They're not beginning and end of quotes. They're in case you put in a special regex character (like .) -- if you surround it by \Q ... \E then the special regex character doesn't get interpreted.
In other words, your regular expression is trying to match the literal string (.?*), which you probably don't have, and thus substitutions don't get made.
You also had your ? and * backwards -- I assume you want to match non-greedily, in which case you need to put the ? as a non-greedy modifier to the .* characters.
Edit: I also strongly suggest doing:
perl -pi.bak -e ... file.php
This will create a "backup" file that the original file gets copied to. In my above example, it'll create a file named file.php.bak that contains the original, pre-substitution contents. This is incredibly useful during testing until you're certain that you've built your regex properly. Hell, disk is cheap, I'd suggest always using the -pi.bak command-line switch.
You put your grouping parentheses inside the metaquoting expression (\Q(pattern)\E) instead of outside ((\Qpattern\E)), so your parentheses also get escaped and your regex is not capturing anything.

What is the best way to do string manipulation in a shell script?

I have a path as a string in a shell-script, could be absolute or relative:
/usr/userName/config.cfg
or
../config.cfg
I want to extract the file name (part after the last /, so in this case: "config.cfg")
I figure the best way to do this is with some simple regex?
Is this correct? Or should I use sed or awk instead?
Shell-scripting's string manipulation features seem pretty primitive by themselves, and appear very esoteric.
Any example solutions are also appreciated.
If you're okay with using bash, you can use bash string expansions:
FILE="/path/to/file.example"
FILE_BASENAME="${FILE##*/}"
It's a little cryptic, but the braces start the variable expansion, and the double hash does a greedy removal of the specified glob pattern from the beginning of the string.
Double %% does the same thing from the end of a string, and a single percent or hash does a non-greedy removal.
Also, a simple replace construct is available too:
FILE=${FILE// /_}
would replace all spaces with underscores for instance.
A single slash, by contrast, replaces only the first match rather than all of them.
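To make the variants easier to compare, here is a sketch that applies each expansion to one made-up path. It needs bash, because the slash-based replacement forms are not available in plain POSIX sh:

```shell
# Requires bash: ${var//pattern/replacement} is not available in plain POSIX sh.
FILE="/path/to/my old file.tar.gz"   # hypothetical path

base="${FILE##*/}"       # longest  "*/" prefix removed -> my old file.tar.gz
rest="${FILE#*/}"        # shortest "*/" prefix removed -> path/to/my old file.tar.gz
stem="${FILE%%.*}"       # longest  ".*" suffix removed -> /path/to/my old file
no_ext="${FILE%.*}"      # shortest ".*" suffix removed -> /path/to/my old file.tar
all="${FILE// /_}"       # every space replaced         -> /path/to/my_old_file.tar.gz
first="${FILE/ /_}"      # first space only             -> /path/to/my_old file.tar.gz

printf '%s\n' "$base" "$rest" "$stem" "$no_ext" "$all" "$first"
```

The quotes around each expansion matter here because the path contains spaces.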
Instead of string manipulation I'd just use
file=`basename "$filename"`
Edit:
Thanks to unwind for some newer syntax for this (which assumes your filename is held in $filename):
file=$(basename "$filename")
Most environments have access to perl and I'm more comfortable with that for most string manipulation.
But as mentioned, for something this simple, you can use basename.
I typically use sed with a simple regex, like this:
echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
result:
>echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
config.cfg