complex batch replacement linux - regex

I'm trying to do a batch replacement with a fairly complex pattern.
So far I find the files that contain the pattern with:
find '(' -name '*.php' -o -name '*.html' ')' -exec grep -i -n 'hello' {} +
The string I want to replace currently looks as follows:
<img src="/some/path/to/somewhere/hello" />
where the path for the image varies but always contains the sub-string 'hello' at the end.
I would like to grab the path and perform a replacement as follows:
<img src="<?php myfunction('(/some/path/to/somewhere/)'); ?>" />
What would be a good way to do this?
Any help will be appreciated.

Replace -exec grep ..... with an sed call that edits the files in place. A pipe or redirection cannot be placed inside -exec (the shell would apply it to find itself), so let sed -i do the rewriting:
-exec sed -i 's#<img src="\([^"]*/\)hello" />#<img src="<?php myfunction('\''\1'\''); ?>" />#g' {} +
Using # as the delimiter avoids escaping every / in the pattern; a literal single quote inside the single-quoted script is written as '\''.
For instance, this will extract /path/:
echo "<img src=\"/path/hello\"" | sed "s/<img src=\"\(\/path\/\)hello\"/\1/"
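A minimal sketch of the whole replacement, run end to end on a throwaway file. It assumes GNU sed (for -i without a suffix) and that every target path ends in /hello; the scratch directory and sample.php are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so no real files are touched (hypothetical sample data).
workdir=$(mktemp -d)
printf '%s\n' '<img src="/some/path/to/somewhere/hello" />' > "$workdir/sample.php"

# [^"]*/ captures everything up to the last slash before "hello";
# '\'' embeds a literal single quote inside the single-quoted sed script.
find "$workdir" '(' -name '*.php' -o -name '*.html' ')' -exec \
  sed -i 's#<img src="\([^"]*/\)hello" />#<img src="<?php myfunction('\''\1'\''); ?>" />#g' {} +

result=$(cat "$workdir/sample.php")
echo "$result"
rm -rf "$workdir"
```

Running it against a copy of the real tree first, and diffing the result, is a cheap way to check the pattern before touching the originals.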

Related

regular expressions in exec sed

Could you help me with regular expressions in exec sed?
Example code:
<?php echo "This code need to delete"; ?><? echo 'This code need to keep'; ?>
I need to delete:
<?php echo "This code need to delete"; ?>
In all files, and keep
<? echo 'This code need to keep'; ?>
I tried to do it like this:
find ./ -type f -name \*.php -exec sed -i -r 's/<\?php.*\?>//g' {} \;
But this doesn't work correctly: it deletes all of the code.
Use a negated character class instead of .*, because .* is greedy: it matches as many characters as possible, so the match runs to the last ?> on the line.
find ./ -type f -name \*.php -exec sed -i -r 's/<\?php[^>]*\?>//g' {} \;
You could use -name '*.php' instead of -name \*.php in the above.
Example:
$ echo '<?php echo "This code need to delete"; ?><? echo 'This code need to keep'; ?>' | sed -r 's/<\?php[^>]*\?>//g'
<? echo This code need to keep; ?>
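To see the difference side by side, here is a small sketch that runs both patterns on the line from the question. It uses sed -E, which GNU sed accepts as an equivalent of -r; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Sample line from the question, stored with a quoted here-doc so both quote styles survive.
line=$(cat <<'EOF'
<?php echo "This code need to delete"; ?><? echo 'This code need to keep'; ?>
EOF
)

# Greedy: .* runs to the LAST ?> on the line, so both blocks are removed.
greedy=$(printf '%s\n' "$line" | sed -E 's/<\?php.*\?>//g')

# Negated class: [^>]* cannot cross a ">", so the match ends at the first ?>.
# Caveat: a block whose PHP code itself contains ">" (for example "->") is not matched at all.
negated=$(printf '%s\n' "$line" | sed -E 's/<\?php[^>]*\?>//g')

printf 'greedy:  [%s]\n' "$greedy"
printf 'negated: [%s]\n' "$negated"
```

The greedy result is empty, while the negated-class result still contains the short-tag block.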
Using GNU awk, you can do this to get rid of the first <?php...?> block:
cat file
<?php echo "This code need to delete"; ?><? echo 'This code need to keep'; ?>
awk -v RS='\\?>' '!/<\?php /{printf "%s%s", $0, RT}' file
<? echo 'This code need to keep'; ?>

Regular expression search and replace across whole directory in Linux terminal

In my PHP project, I have a PHP function that does some language stuff and is being called as:
<?php echo __('STRING'); ?>
I would like to switch from consistently using uppercase string indexes to consistently using lowercase ones, so I would like to replace all occurrences of:
__('SOMETHING')
With:
__('something')
What would be the command to do this?
I have a command ready for easy search & replace functionality, but I don't know how to write the regex.
find . -name "*.php" -print | xargs sed -i 's/search/replace/g'
You can use -print0 with xargs -0:
find . -name "*.php" -print0 | xargs -0 -I {} sed -i.bak 's/search/replace/g' {}
Use the strtolower function.
Example:-
<?php
echo strtolower("Hello WORLD.");
?>
Result:-
hello world.
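For the regex part of the question, one possible sketch uses the \L ... \E case conversion that GNU sed offers in the replacement text (an extension, so this assumes GNU sed). The scratch directory and page.php are hypothetical sample data.

```shell
#!/usr/bin/env bash
# Scratch directory with a made-up sample file, so no real project files are touched.
workdir=$(mktemp -d)
printf '%s\n' "<?php echo __('SOME_STRING'); ?> <?php echo strtoupper('KEEP'); ?>" > "$workdir/page.php"

# \L ... \E (GNU sed extension) lowercases the enclosed part of the replacement.
# The double-quoted script lets the single quotes appear literally.
find "$workdir" -name '*.php' -print0 |
  xargs -0 sed -i -E "s/__\('([^']*)'\)/__('\L\1\E')/g"

result=$(cat "$workdir/page.php")
echo "$result"
rm -rf "$workdir"
```

Only text inside __('...') is lowercased; other function calls are left alone because the pattern is anchored on the __( prefix.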

Pass sed output to mv

I'm trying to batch rename text files according to a string they contain.
I used sed to isolate the pattern with \( and \) as I couldn't get this to work in grep.
sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt | mv *.txt $sed.txt
(The text I want to use as the filename is between the HTML title tags.)
Where I wrote $sed would be the output of sed.
Hope that's clear!
A simple loop in bash can accomplish this. If each file is valid HTML, meaning you have only one <title> tag in the file, you can rename them all this way:
for file in *.txt; do
  mv "$file" "$(sed -n 's/<title>\([^<]*\)<\/title>/\1/p' "$file" | sed -e 's/[ ][ ]*/_/g').txt"
done
So, if you have files 1.txt, 2.txt and 3.txt, each with cat, dog and my hippo in their TITLE tags, you'll end up with cat.txt, dog.txt and my_hippo.txt after the above loop.
EDIT: quoted initial $file in case there are spaces in filenames; and added a second sed to convert any spaces in the <title> tag to _'s in resulting filenames. NOTE the whitespace inside the []'s in the second sed command is a literal space and tab character.
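A slightly more defensive sketch of the same loop, run on made-up files in a scratch directory. It assumes the opening and closing title tags sit on one line; the empty-title check and mv -n are additions for safety, not part of the answer above.

```shell
#!/usr/bin/env bash
# Scratch directory with made-up sample files.
workdir=$(mktemp -d)
printf '<html><title>my hippo</title></html>\n' > "$workdir/3.txt"
printf '<html><title>cat</title></html>\n' > "$workdir/1.txt"

for file in "$workdir"/*.txt; do
  # Print only the title text of the first matching line, then turn runs of blanks into "_".
  title=$(sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$file" | head -n 1 | sed 's/[[:blank:]][[:blank:]]*/_/g')
  # Skip files without a title rather than renaming them to ".txt".
  [ -n "$title" ] || continue
  # -n refuses to overwrite an existing file.
  mv -n -- "$file" "$workdir/$title.txt"
done

listing=$(ls "$workdir")
echo "$listing"
rm -rf "$workdir"
```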
You can enclose an expression in backticks (`) to insert its output at the place you want. Try:
mv *.txt `sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt`.txt
It is not very flexible, but should work.
(I haven't used it in a while and cannot test it now, so I might be wrong).
Here is the command I would use:
for i in *.txt ; do
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" "$i"
done
The sed substitution searches for the pattern in each of your .txt files. For each file it builds the string mv 'file_name' 'found_pattern'.
With the e flag at the end of the s command (a GNU sed extension), the resulting string is executed as a shell command, which renames your files.
Some hints:
Note the use of = instead of / as the delimiter for the sed substitution: it's more readable because the pattern already contains a /, which then doesn't have to be escaped (you could use many other symbols if you don't like =).
The e flag makes sed execute the created string.
(I'm speaking of this one below:
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" "$i"
                                          ^
)
So use it with caution! I would recommend first running the line without the final e: it won't execute any mv command, but will just print what would be executed if you added the e.
What I read from your question is:
you have a number of text (html) files in a directory
each file contains at least the tag <title> ... </title>
you want to extract the content (elements.text) and use it as filename
last you want to rename that file to the extracted filename
Is this correct?
So you need to loop through the files, e.g. with xargs or find:
ls *.txt | xargs -i\{\} command "{}" ...
find -maxdepth 1 -type f -name '*.txt' -exec command "{}" ... \;
I always write the xargs placeholder as -i\{\} because the resulting command stays compatible with find, which uses {} as its own placeholder.
The -maxdepth option keeps find from descending into subdirectories; if there are none, you can leave it out.
command could be something very simple like echo "Testing File: {}" or a really small script if you use it with bash:
find . -name '*.txt' -exec bash -c 'CUR_FILE="{}"; echo "Working on: $CUR_FILE"; ls -l "$CUR_FILE";' \;
The big decision for your question is how to get the text from the title element.
A simple solution (suitable if the opening and closing tags are on the same line) is to use grep.
A more robust solution is to use an HTML parser and navigate the DOM.
The simple solution is based on two steps:
get the title line
remove everything before and after the title content
Putting it together:
ls *.txt | xargs -i\{\} bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"'
Same with usage of find:
find . -maxdepth 1 -type f -name '*.txt' -exec bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"' \;
Hopefully it is what you expected.
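One variation worth considering, sketched here under the same one-line-title assumption: pass the file name to bash -c as a positional argument instead of pasting {} into the script text. The _ placeholder fills $0, and the sample file is made up.

```shell
#!/usr/bin/env bash
# Scratch directory with a made-up sample file.
workdir=$(mktemp -d)
printf '<title>dog</title>\n' > "$workdir/2.txt"

# The file name arrives as "$1" instead of being pasted into the script text,
# so quotes or "$" in a name cannot change the command that runs.
find "$workdir" -maxdepth 1 -type f -name '*.txt' -exec bash -c '
  title=$(sed -n "s#.*<title>\([^<]*\)</title>.*#\1#p" "$1" | head -n 1)
  [ -n "$title" ] && mv -v -- "$1" "$(dirname -- "$1")/$title.txt"
' _ {} \;

listing=$(ls "$workdir")
echo "$listing"
rm -rf "$workdir"
```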

Escaping Regex in SED

I need to use sed to do a search and replace. I'm replacing /**# with define('WP_POST_REVISIONS', 3);\n\n/**#.
But I can't figure out the proper escaping. Even after escaping the (obviously needed) single quotes, I still get a bash: syntax error near unexpected token ')'
What is the proper escaping in this case?
Try replacing your:
find /start/path -name *.html -exec sed -ie 's|/**#|define(\'WP_POST_REVISIONS\', 3);|g' '{}' \;
with:
find /start/path -name '*.html' -print0 \
| xargs -0 -n 1 sed -i -e 's|/\*\*#|define('\''WP_POST_REVISIONS'\'', 3);\n/\*\*#|g'
and tell us what it gives you.
(I tried to guess you were looking for the actual string "/**#" in your file(s) ... please give us examples of what you are really looking for, if it isn't that actual string)
It is not sed escaping, but bash escaping.
A backslash does not escape anything within single quotes ('), so \' cannot be used there.
You can use double quotes (") if the parameter contains no characters that are special inside them, such as ", $, ` or \ (or escape them there if necessary):
find /start/path -name '*.html' -exec sed -i -e "s/abc/define('WP_POST_REVISIONS', 3);/g" '{}' \;
Or quote using $'...', which supports backslash escapes:
find /start/path -name '*.html' -exec sed -i -e $'s/abc/define(\'WP_POST_REVISIONS\', 3);/g' '{}' \;
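The quoting options can be compared directly on a one-line input. This is a small sketch that needs bash (or another shell with $'...' quoting); abc is the placeholder pattern from the commands above, and the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Three ways to get single quotes into a sed script.
input='abc'

# 1. Close the single-quoted string, add an escaped quote, reopen: '\''
a=$(printf '%s\n' "$input" | sed 's/abc/define('\''WP_POST_REVISIONS'\'', 3);/')

# 2. Double quotes: single quotes are literal, but $, `, \ and " need care.
b=$(printf '%s\n' "$input" | sed "s/abc/define('WP_POST_REVISIONS', 3);/")

# 3. ANSI-C quoting $'...': \' is an escaped single quote.
c=$(printf '%s\n' "$input" | sed $'s/abc/define(\'WP_POST_REVISIONS\', 3);/')

printf '%s\n' "$a" "$b" "$c"
```

All three variables end up holding the same text, so the choice is mostly about which form stays readable for a given script.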

search and replace files in linux(sed)

I'm trying to search and replace the following:
<?php
<!DOCTYPE HTML>
with
<!DOCTYPE HTML>
So far I have tried this:
find . \( -name "*.php" \) -exec grep -Hn "<?php <\!DOCTYPE HTML>" {} \; -exec sed -i 's/<?php <\!DOCTYPE HTML>/<\!DOCTYPE HTML>/g' {} \;
But it's not finding any instances of files with my needle string which exists on my server.
find . -name "*.php" -exec grep -lZz '^<?php[[:space:]]\+<!DOCTYPE HTML>' {} + |
xargs -r0 sed -i '/^<?php[[:space:]]*$/,1d'
Edit: The previous version didn't work due to the character \n in the pattern. The updated version avoids this character.
With GNU awk (for RS='\0' to read the whole file as one record) and assuming your file names don't contain newlines all you need is the clear, simple:
find . -name '*.php' -print |
while IFS= read -r file; do
gawk -v RS='\0' '{gsub(/<\?php\n<!DOCTYPE HTML>/,"<!DOCTYPE HTML>"); print}' "$file" > tmp &&
mv tmp "$file"
done
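Another option, sketched here on the assumption that GNU sed is available, is its -z option: input is split on NUL bytes instead of newlines, so an ordinary text file becomes a single record and \n in the pattern can match the line break. The scratch directory and index.php are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Scratch directory with a made-up sample file.
workdir=$(mktemp -d)
printf '<?php\n<!DOCTYPE HTML>\n<html></html>\n' > "$workdir/index.php"

# -z (GNU sed) reads NUL-separated records, so the whole file is one record here.
# In basic regex syntax "?" is a literal character, so <?php needs no escaping.
find "$workdir" -name '*.php' -exec sed -i -z 's/<?php\n<!DOCTYPE HTML>/<!DOCTYPE HTML>/' {} +

result=$(cat "$workdir/index.php")
echo "$result"
rm -rf "$workdir"
```

Without a g flag only the first occurrence in each file is replaced, which fits the case of a stray <?php at the top of the file.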