tough string to remove from a bunch of php files using perl - regex

I am getting more and more bald as I yank hair out over what should be a simple thing. I have a fragment of a hack attempt left in some PHP files (100s).
The string is:
<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>
And I thought that using a perl command line such as:
perl -pn -i.bak -e "s{<\?\*god_mode_on\*/eval\(base64_decode\(""\)\); /\*god_mode_off\*/ \?>}{}g;" `find . -name '*.php'`
Would neatly produce a backup and strip the string out but it seems to carefully avoid doing so. I think I may have perl blindness now as I have been looking at it for so long so hopefully someone might directly see the problem and let me know how slow I've been ;-)
Thanks!

Keeping track of everything that needs to be escaped is not simple. It looks like you have not escaped your double quotes inside a double quoted string, for example. Perl has the quotemeta function that helps you figure this out:
print quotemeta '<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>';
===> \<\?\*god_mode_on\*\/eval\(base64_decode\(\"\"\)\)\;\ \/\*god_mode_off\*\/\ \?\>
Within a regular expression, the \Q escape will invoke quotemeta on everything up to the next \E escape, so you can say:
perl -p -i.bak -e \
's{\Q<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>\E}{}g' \
`find . -name '*.php'`
Notice that I used single quotes instead of double quotes for the argument to the -e command-line switch. Otherwise, you would also have to worry about the shell interpolating your input and opening up a whole other can of worms.
(Also, the -pn switch is redundant -- it is sufficient to use -p)

Is it possible that whatever shell you are using is interpreting the backslashes? You may need to escape them (with another backslash) so they actually get passed to perl as backslashes.

Expand environment variable inside Perl regex

I am having trouble with a short bash script. It seems like all forward slashes needs to be escaped. How can required characters in expanded (environment) variables be escaped before perl reads them? Or some other method that perl understands.
This is what I am trying to do, but this will not work properly.
eval "perl -pi -e 's/$HOME\/_TV_rips\///g'" '*$videoID.info.json'
That is part of a longer script where videoID=$1. (And for some reason perl expands variables both within single and double quotes.)
This simple workaround with no forward slash in the expanded environment variable $USER works. But I would like to not have /Users/ hard coded:
eval "perl -pi -e 's/\/Users\/$USER\/_TV_rips\///g'" '*$videoID.info.json'
This is probably solvable in some better way fetching home dir for files or something else. The goal is to remove the folder name in youtube-dl's json data.
I am using perl just because it can handle extended regex. But perl is not required. Any better substitute for extended regex on macOS is welcome.
You are building the following Perl program:
s//home/username\/_TV_rips\///g
That's quite wrong.
You shouldn't be attempting to build Perl code from the shell in the first place. There are a few ways you could pass values to the Perl code instead of generating Perl code. Since the value is conveniently in the environment, we can use
perl -i -pe's/\Q$ENV{HOME}\E\/_TV_rips\///' *"$videoID.info.json"
or better yet
perl -i -pe's{\Q$ENV{HOME}\E/_TV_rips/}{}' *"$videoID.info.json"
(Also note the lack of eval and the fixed quoting on the glob.)
Just assembling the ideas from the comments, this should achieve what you expected:
perl -pi -e 's{$ENV{HOME}/_TV_rips/}{}g' *$videoID.info.json
@ikegami thanks for your comment! It is indeed safer with \Q...\E, in case $HOME contains characters like $.
All regex delimiters must of course be escaped in the input string.
But as Stefen stated, you can use other delimiters in Perl, like % or §. That makes it easier if you have many slashes in your string.
Special characters:
# starts a Perl comment - don't use this as a delimiter
?, [], {}, $, ^, . are regex control characters - they must be escaped in the regex
You should always write a comment to make clear that you are using different delimiters, because this makes your regex harder to read for inexperienced users.
Try out your RegEx here: https://regex101.com/r/cIWk1o/1

Regex not working in Bash

I have this regex for now
It should catch something like this
org.package;version="[1.0.41, 1.0.51)" and "," optionally if it is not last element.
Also, after the package name I added .* because the package could be "org.package.util.something" up to ";version".
I tried it in an online regex tool and it works like this:
org.package(.*.*)?;version="[[0-9].[0-9].[0-9][0-9],\s[0-9].[0-9].[0-9][0-9])",?
but I don't know what I should change so that it works in bash:
package="org.package"
sed -i "s/"$$package.*;version="\[[0-9].[0-9].[0-9][0-9],[[:space:]][0-9].[0-9].[0-9][0-9]\)",?"//g" "$file"
Replace the double quotes around the sed command with single quotes. Where $package has to be expanded, close the single quotes and put double quotes around the variable. The pattern also needs extended regex mode (-E), so that \) is a literal parenthesis and ? makes the trailing comma optional:
package="org.package"
sed -i -E 's/'"$package"'.*;version="\[[0-9].[0-9].[0-9][0-9],[[:space:]][0-9].[0-9].[0-9][0-9]\)",?//g' "$file"
Before running the command with the -i option, check that the output is correct.
There is more than one problem:
$$ will be replaced by bash with its PID, which is probably not what you want.
Online regex evaluators usually use extended regex or Perl regex syntax. sed -r will enable extended regex mode (for grep there's -E and -P).
You use . when you want to match literal dots. You should be using \. instead, because . actually means "any character" in regular expressions.
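Putting those points together, a sketch of how the command might look. The input line and the pkg_re helper variable are made up for the example. Using [^;]* instead of .* is my own substitution, so that the match cannot run past the first ";version" on a line.

```shell
package='org.package'
# Hypothetical manifest-style line; the second entry must survive.
line='Import-Package: org.package.util;version="[1.0.41, 1.0.51)",org.other;version="[2.0.0, 3.0.0)"'

# Escape the dots in the package name so they only match literal dots.
pkg_re=$(printf '%s' "$package" | sed 's/\./\\./g')

# -E selects extended regex: \[ and \) are literal, ? makes the comma optional.
# No -i here, so the result can be inspected before editing files in place.
result=$(printf '%s\n' "$line" |
  sed -E 's/'"$pkg_re"'[^;]*;version="\[[0-9]\.[0-9]\.[0-9][0-9],[[:space:]][0-9]\.[0-9]\.[0-9][0-9]\)",?//g')

echo "$result"   # Import-Package: org.other;version="[2.0.0, 3.0.0)"
```

Once the output looks right, the same expression can be run with -i against "$file".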

sed regexp, number reformatting: how to escape for bash

I have a working (in macOS app Patterns) RegExp that reformats GeoJSON MultiPolygon coordinates, but don't know how to escape it for sed.
The file I'm working on is over 90 Mb in size, so bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
My attempts always complain of being unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find even after extensive searching.
Edit: 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS-points. One of the other example values picked from GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. The braces check in my original regex is there for this reason.
That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend adding an argument suffix to the -i option (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson
If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE enabled with -E (or -r in some other distributions) like this:
sed -i -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a POSIX BRE:
sed -i 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\+\)/\1.\2\3/g' large_file.geojson
Note that in POSIX BRE you need to escape { and } in interval (range) quantifiers and ( and ) in grouping constructs, otherwise they denote literal symbols; the \+ quantifier is a GNU extension to BRE. In POSIX ERE you do not need to escape these characters to make them special; this POSIX flavor is closer to modern regexes.
Also, you need to use \n notation inside the replacement pattern, not $n.
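If the file also holds dotted values outside coordinate pairs (such as the "35.642.1.001_001" property mentioned in the question's edit), the brackets can stay in the pattern. A sketch in ERE, not taken from any of the answers; the short input is a hypothetical excerpt built from the question's sample data.

```shell
# Hypothetical excerpt: two coordinate pairs plus a property that must not change.
input='[[[379017.735,6940036.7955],[379009.8431,6940042.5761]],"id":"35.642.1.001_001"'

# Anchor on the literal [ and ] around each pair; groups are referenced as \1..\6.
output=$(printf '%s\n' "$input" |
  sed -E 's/\[([0-9]{2})([0-9]+)\.([0-9]+),([0-9]{2})([0-9]+)\.([0-9]+)\]/[\1.\2\3,\4.\5\6]/g')

echo "$output"
# [[[37.9017735,69.400367955],[37.90098431,69.400425761]],"id":"35.642.1.001_001"
```

Because each match has to start at a [ and end at a ], the quoted property value is never touched.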
A simple sed will do it:
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

Comment out multiline log statements using perl one liner regex substitution

I have a perl one-liner that works when log statements are on a single line:
find -type f -iname "*java" | xargs -d'\n' -n 1 perl -i -pe 's{(log.*((info)|(debug)).*)}{//$1}gi'
But trying to modify this to work on multiple lines is tricky. I know that the s modifier will match newlines, but how do I get it to comment out subsequent lines (i.e. up to ; assuming the log string doesn't have it)?
I'm fine with a solution that makes multi-line log statements into single-line log statements too. I'll also accept C-style comments (though it would be nice to find a solution for C++ style comments).
(Don't tell me to turn off logging. Anyone who's actually tried that will realize how brutally complicated that is in non-trivial applications.)
Just a general idea (please adapt to your case...)
...perl -i -p0e 's{(log.*?((info)|(debug)).*?;)}{ $1 =~ s!^|\n!\n//!gr }gsei'
where:
.*? instead of .* to make the match non-greedy
-p0e to process the full text as a single record (the record separator becomes the NUL byte)
$1 =~ s!^|\n!\n//!gr to do the extra processing of internal newlines
Please test it before applying it...
You can use the range operator, start .. stop:
perl -i -pe 's!^!//! if /log.*(info|debug)/ .. /;/'

What is the correct usage for this perl script?

Consider the following perl shell script;
perl -p -i.bak -e 's/index.php?pageid=/p//g' `grep -ril index.php?pageid= *`
I am trying to recursively go through all web directories in my site and change any strings of
index.php?pageid=
to
p/
This is intended to shorten my links from something like:
www.domain.com/index.php?pageid=page1
to
www.domain.com/p/page1
I already have the .htaccess file set up properly, however this shell script is not working for me and I believe it's because of the ? or the = symbol in the original string that is messing up the regular expression.
How might I go about fixing this? I am terrible with regex.
The dot . and question mark ? are characters of special meaning and need to be escaped. As well, you need to either escape the forward slash in your replacement or use a different delimiter to avoid escaping.
perl -i.bak -pe 's!index\.php\?pageid=!p/!g'