Comment out multiline log statements using perl one liner regex substitution

I have a perl one-liner that works when log statements are on a single line:
find -type f -iname "*java" | xargs -d'\n' -n 1 perl -i -pe 's{(log.*((info)|(debug)).*)}{//$1}gi'
But trying to modify this to work on multiple lines is tricky. I know that the s modifier will match newlines, but how do I get it to comment out subsequent lines (i.e. up to ; assuming the log string doesn't have it)?
I'm fine with a solution that makes multi-line log statements into single-line log statements too. I'll also accept C-style comments (though it would be nice to find a solution for C++ style comments).
(Don't tell me to turn off logging. Anyone who's actually tried that will realize how brutally complicated that is in non-trivial applications.)

Just a general idea (please adapt to your case...)
...perl -i -p0e 's{(log.*?((info)|(debug)).*?;)}{ $1 =~ s!^|\n!\n//!gr }gsei'
where:
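.*? instead of .* makes the match non-greedy
-p0e processes the full text as a single record (-0 sets the record separator to the NUL character, which a text file normally does not contain)
$1 =~ s!^|\n!\n//!gr does the extra processing of internal newlines
Please test it before applying it...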

You can use the range operator, start .. stop:
perl -i -pe 's!^!//! if /log.*(info|debug)/ .. /;/'

Related

sed regexp, number reformatting: how to escape for bash

I have a working (in macOS app Patterns) RegExp that reformats GeoJSON MultiPolygon coordinates, but don't know how to escape it for sed.
The file I'm working on is over 90 Mb in size, so bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
My attempts always complain of being unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find even after extensive searching.
Edit: 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS-points. One of the other example values picked from GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. The braces check in my original regex is there for this reason.
That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend adding an argument suffix to the -i option (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson
If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE enabled with -E (or -r in some other distributions) like this:
sed -i -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a POSIX BRE:
sed -i 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\+\)/\1.\2\3/g' large_file.geojson
Note that in POSIX BRE you need to escape { and } in limiting / range quantifiers and ( and ) in grouping constructs, else they denote literal symbols. The \+ quantifier used above is a GNU extension to BRE; strictly POSIX BRE has no + and uses \{1,\} instead. In POSIX ERE, you do not need to escape the special chars to make them special, so this POSIX flavor is closer to modern regexes.
Also, you need to use \1, \2, ... backreferences inside the replacement pattern, not $1, $2, ....
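To make the ERE/BRE difference concrete, here is a small sketch that applies both spellings to a made-up two-point sample (the sample string and variable names are assumptions). The BRE variant uses \{1,\} rather than \+ so that it does not rely on the GNU extension:

```shell
# Hypothetical sample with two coordinate pairs from the question.
sample='[[379017.735,6940036.7955],[378917.21,6940158.7053]]'

# POSIX ERE: (), {} and + are special without backslashes.
ere=$(printf '%s\n' "$sample" |
  sed -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g')

# POSIX BRE: grouping and intervals need backslashes; \{1,\} stands in for +.
bre=$(printf '%s\n' "$sample" |
  sed 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\{1,\}\)/\1.\2\3/g')

printf '%s\n' "$ere" "$bre"
```

Both commands should print the same line. Keep in mind that this shorter pattern does not check for brackets, so it would also rewrite other dotted numbers in the file.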
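A simple sed can move the decimal point three places to the left (note that this yields 379.017735 rather than the 37.9017735 asked for):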
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

What is the correct usage for this perl script?

Consider the following perl shell script:
perl -p -i.bak -e 's/index.php?pageid=/p//g' `grep -ril index.php?pageid= *`
I am trying to recursively go through all web directories in my site and change any strings of
index.php?pageid=
to
p/
This is intended to shorten my links from something like:
www.domain.com/index.php?pageid=page1
to
www.domain.com/p/page1
I already have the .htaccess file set up properly, however this shell script is not working for me and I believe it's because of the ? or the = symbol in the original string that is messing up the regular expression.
How might I go about fixing this? I am terrible with regex.
The dot . and question mark ? are characters of special meaning and need to be escaped. As well, you need to either escape the forward slash in your replacement or use a different delimiter to avoid escaping.
perl -i.bak -pe 's!index\.php\?pageid=!p/!g'

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get the counter to increase. Any ideas?
From a more general perspective, my question is the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line. If the pattern is present, it increments or primes the counter in the hold space and then appends said counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further cases of the pattern and the procedure is repeated if necessary.
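You could read the file line by line using shell features, and use a separate sed command for each line. Something like
exec 0<input_file
while IFS= read -r line; do
# rebuild the suffix so that it picks up the current counter value
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
__images_counter=$((__images_counter + 1))
done
(This won't work if there are multiple matches in a line, though. Note also that the counter advances on every line, not only on lines that contain an image.)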
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.

tough string to remove from a bunch of php files using perl

I am getting more and more bald as I yank hair out over what should be a simple thing. I have a fragment of a hack attempt left in some PHP files (100s).
The string is:
<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>
And I thought that using a perl command line such as:
perl -pn -i.bak -e "s{<\?\*god_mode_on\*/eval\(base64_decode\(""\)\); /\*god_mode_off\*/ \?>}{}g;" `find . -name '*.php'`
Would neatly produce a backup and strip the string out but it seems to carefully avoid doing so. I think I may have perl blindness now as I have been looking at it for so long so hopefully someone might directly see the problem and let me know how slow I've been ;-)
Thanks!
Keeping track of everything that needs to be escaped is not simple. It looks like you have not escaped your double quotes inside a double quoted string, for example. Perl has the quotemeta function that helps you figure this out:
print quotemeta '<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>';
===> \<\?\*god_mode_on\*\/eval\(base64_decode\(\"\"\)\)\;\ \/\*god_mode_off\*\/\ \?\>
Within a regular expression, the \Q escape will invoke quotemeta on everything up to the next \E escape, so you can say:
perl -p -i.bak -e \
's{\Q<?*god_mode_on*/eval(base64_decode("")); /*god_mode_off*/ ?>\E}{}g' \
`find . -name '*.php'`
Notice that I used single quotes instead of double quotes for the argument to the -e command-line switch. Otherwise, you would also have to worry about the shell interpolating your input and opening up a whole other can of worms.
(Also, the -pn switch is redundant -- it is sufficient to use -p)
Is it possible that whatever shell you are using is interpreting the backslashes? You may need to escape them (with another backslash) so they actually get passed to perl as backslashes.

Strange behaviour with command-line perl

I have a file that I'm trying to modify using perl from the terminal in Ubuntu Linux (Natty).
The name of the file is vm.args and the first two lines are as follows:
## Name of the riak node
-name riak#127.0.0.1
I am trying to use perl to update the ip address. Below is my code:
riak_ip=`ifconfig eth1 | grep "inet addr" | cut -d ":" -f2 | cut -d " " -f1`
perl -0777 -i -pe "s/(\-name[\t ]*riak\#)[^\n]+/\1$riak_ip/g" vm.args
Let's assume the ip address I get is 10.181.106.32. The perl command gives me a result I can't understand. The resulting first two lines in the my file after I run the above in the terminal become:
## Name of the riak node
H.181.106.32
Which is the letter H and part of the ip address.
I can't seem to figure out what I'm doing wrong and will appreciate some assistance.
Thanks in advance.
This seems to work reliably:
perl -0777 -i -pe "s/(-name\\s*riak#).*/\${1}$riak_ip/g" vm.args
The "\\1$riak_ip" seems to cause some problems since perl was seeing it as "\1172.20.2.136" if $riak_ip was 172.20.2.136. My guess is that the back reference to "1172" was causing some weirdness. Anyway, switching to the ${1} form removes the possibility for misinterpretation (pun intended).
This really should all be done in Perl, which is much better at extracting data from text than shell script. Something like this should work, but I cannot test it at present.
perl -0777 -i -pe '($ip)=`ifconfig eth1`=~/inet addr:([\d.]+)/;s/-name\s+riak#\K[\d.]+/$ip/g;' vm.args
I would be grateful if someone could confirm whether this works OK. Beware that the \K construct in Perl regexes was added in Perl 5.10 and may not be available in older installations of Perl.
The problem is that \1 gets concatenated with the first IP octet: Perl sees \110, which is the octal escape for the letter H, rather than \1 followed by 10. To make it work despite concatenation, the ${1} syntax needs to be used and properly quoted. This works:
perl -0777 -i -pe "s/(\-name[\t ]*riak\#)[^\n]+/\${1}$riak_ip/g" vm.args
You might consider using single quotes for the regex parts, to remove one layer of quoting:
perl -0777 -i -pe 's/(-name[\t ]*riak#)[^\n]+/${1}'"$riak_ip"'/g' vm.args
(Edited/corrected according to comments, my previous suggestion was wrong.)
Sounds like a good use for the \K sequence (v5.10). And [^\n] is actually ., unless the /s modifier is used. No need for /g option unless you intend to replace the string several times.
perl -0777 -i -pe "s/\-name[\t ]*riak\#\K.+/$riak_ip/" vm.args
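This would be the correct regexp:
perl -0777 -i -pe "s/(-name\s*riak#)\S+/\${1}$riak_ip/g" vm.args
Result:
## Name of the riak node
-name riak#10.181.106.32
Use \s for space characters, and \S (non-space character) to match the whole IP address. In the replacement string, ${1} is used instead of \1; the backslash before $ keeps the shell from expanding it inside double quotes, and the braces keep the leading digits of the IP address from being read as part of the group number. - and # are not special, so there is no need to escape them, although escaping them does no harm.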