I've got a bunch of HTML files in which I need to replace the following text:
<div id="header">
plus all info between
<!-- end #header -->
with
<?php include ("header.php"); ?>
I thought I could run something like this but it isn't matching the text:
perl -p -i.bak -e 's/<div id="header">.*<!\-\- end #header \-\->/<\?php include \("header\.php"\); \?>/g' *.html
or
perl -p -i.bak -e 's/<div id="header">[\S\s\n]*<!\-\- end \#header \-\->/<\?php include \("header\.php"\); \?>/img' *.html
I don't know if it's not searching across multiple lines and I need a parameter or I'm not escaping characters right. Any help would be appreciated.
I would like to batch run this in a directory and change all the content within each file where appropriate.
EDIT: looking for a single command line version, not using multiple pl files if possible.
You can set the record separator with the -0 option. Like so:
perl -0pe 's/.../.../g' *.html
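This sets the record separator to the NUL character, so that the entire file is read at once rather than line by line (provided the file contains no NUL bytes; -0777 slurps the file unconditionally). For . to match across newlines, the substitution also needs the /s modifier.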
I'm trying to use grep to copy out lines in a text file that match a certain pattern, but I'm running into some issues. I would like to grab the values in the "title=" container.
Code:
get_tmax=`grep '[0-9][0-9]°C' K0G7_ec_tmp`
echo "${get_tmax}" > K0G7_ec_tmp2
Text File Contents:
<p class="one" title="19°C">19</p>
<p class="two" title="26°C">26</p>
You can use grep -P with match reset \K:
grep -ioP 'title="\K[^"]+' K0G7_ec_tmp
19°C
26°C
However, take care when parsing HTML files with shell utilities such as grep, awk, or sed. It is better to use a dedicated HTML parser for this job.
grep is shorthand for g/re/p, which is not exactly what you're trying to do, so I'd look at sed for this:
$ sed 's/.*title="\([^"]*\).*/\1/' file
19°C
26°C
That will work with any sed version on any OS.
I am having some problems with loading a PHP file and then replacing its content with something else.
My code looks like this:
$pattern="*random text*"
$rep=" "
$where=`ls *.php`
find -f $where -name "*.php" -exec sed -i 's/$pattern/$rep/g' {} \;
This won't load the entire line of text. Also, is there a limit on how many characters $pattern can hold?
Also, is there a way to make this .sh file execute every 15 minutes, for example?
I am using Mac OS X.
Thanks!
The syntax $var="value" is wrong. You need to say var="value".
If you just want to do something with the files matching *.php in a single directory, there is no need to use find. Just use a for loop:
pattern="*random text*"
rep=" "
for file in *.php
do
sed -i "s/$pattern/$rep/g" "$file"
done
See the usage of sed "s/$var/.../g" instead of sed 's/$var/.../g'. The double quotes expand the variables within the expression; otherwise, you would be looking for a literal $var.
Note that sed -i alone does not work in OS X, so you probably have to say sed -i ''.
Example of replacement:
Given a file:
$ cat a
hello
<?php eval(1234567890) regular php code ?>
bye
Let's remove everything from within eval():
$ sed -r 's/(eval\()[^)]*/\1X/' a
hello
<?php eval(X) regular php code ?>
bye
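The loop and the in-place caveat above can be combined into one sketch that should behave the same with GNU sed and with the BSD sed shipped on OS X: attaching a backup suffix directly to -i (no space) is accepted by both. The sample file, the search string, and the replacement are made up for illustration.

```shell
# Throwaway directory with one sample file.
workdir=$(mktemp -d)
printf 'hello random text bye\n' > "$workdir/a.php"

pattern='random text'   # hypothetical search string; sed treats it as a regex
rep='X'                 # hypothetical replacement

for file in "$workdir"/*.php
do
    # -i.bak edits in place and keeps the original as a.php.bak
    sed -i.bak "s/$pattern/$rep/g" "$file"
done

cat "$workdir/a.php"
```

If the pattern can contain a slash, pick another delimiter, for example "s|$pattern|$rep|g". As for running the script every 15 minutes, one common approach is a crontab entry such as `*/15 * * * * /path/to/script.sh` (the path is a placeholder), added with crontab -e.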
This is my one-line Perl command in a Bash script. How do I get the s to change across multiple lines?
#!/bin/bash
file=$1
echo "processing $file"
perl -0777pe 's/.*<script[ |>].*<\/script>/<script> \.\.\. <\/script>/g' "$file" >"${file}changed.txt"
I am feeding an XHTML file to this script. The Perl command works fine when the opening and closing script tags are on the same line, but it does not find them when they are on separate lines.
Is there a problem with <> in a regular expression?
You're using -0777, which slurps the entire file. Now all you need to do is add the /s switch to your regular expression so that . will also match newlines.
You probably also need to make your regex non-greedy with .*?, and the regex can be simplified by using assertions and a different delimiter:
#!/bin/bash
file=$1
echo "processing $file"
perl -0777 -pe 's{<script\s*>\K.*?(?=</script>)}{ ... }gs' "$file" > "${file}changed.txt"
Switches:
-0777: Slurp the entire file
-p: Creates a while(<>){...; print} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
Try this:
perl -0777pe 's:<script[ |>].*?</script>:<script> ... </script>:gs' "$file"
From perlre:
s
Treat string as single line. That is, change "." to match any
character whatsoever, even a newline, which normally it would not
match.
The non-greedy match .*? helps with multiple <script> </script> blocks. Without it (that is, with the greedy match .*), the input
<script> some </script> <script> some2 </script>
will give only one
<script> ... </script>
With the non-greedy match (.*?), the same input will give
<script> ... </script> <script> ... </script>
I have a whole bunch of files that all have blocks of text in them that look like this:
"Each file has different text between the opening double-quote
and the closing right-quote (or whatever it's called)”
Perhaps not relevant, but in the past I have used grep to do a search and replace like this:
grep -Rl 'search' ./path/to/files/ | xargs sed -i 's/search/replace/g'
Is there any way to do something similar, but use a regex to replace the opening plain old double-quote with a left-quote (“)? The only reliable way to replace the correct double-quote characters is to search on the right-quote, then backwards to the previous double-quote. I think. I'm just not sure if that's possible or how to do it.
I could just do it with a PHP script, but then I wouldn't get to see if it's possible from the command line.
You can use sed:
sed -i.bak 's/"\([^”]*”\)/“\1/' file
cat file
“Each file has different text between the opening double-quote and the closing right-quote (or whatever it's called)”
I need help with my sed script. I have an XML file where I have to remove everything except the text enclosed in these tags:
<TEXT>......</TEXT>
<HEADLINE>......</HEADLINE>
How do I write the sed code? I know how to remove everything except the text enclosed in ONE tag.
s/.*<TEXT>\(.*\)<\/TEXT>.*/\1/
But how do I write the sed code for many tags?
You can pass multiple commands to sed:
$ echo '<TEXT>Hello</TEXT>
<HEADLINE>there</HEADLINE>' | sed -n 's/.*<TEXT>\(.*\)<\/TEXT>.*/\1/gp; s/.*<HEADLINE>\(.*\)<\/HEADLINE>.*/\1/gp'
Hello
there
But you really should be careful when applying regex to XML-like files.
Assuming that you have valid XML (and a sed that accepts \|, which is not part of POSIX basic regular expressions):
sed '/.*<\(TEXT\|HEADLINE\)>\(.*\)<\/\(TEXT\|HEADLINE\)>.*/!d;s//\2/' yourfile.xml
If you want to use a sed script add this line:
/.*<\(TEXT\|HEADLINE\)>\(.*\)<\/\(TEXT\|HEADLINE\)>.*/!d;s//\2/
Then run:
sed -f yourscript.sed < yourfile.xml
This might work for you (GNU sed):
sed -r '/<(text|headline)>/I!d;s//&\n/;s/^[^\n]*\n//;:a;/<\//!{$!{N;ba}};s/\n/ /g;s/<\//\n&/;P;D' file
This removes all text except that which is between TEXT and HEADLINE tags, and in multi-line values it replaces newlines with spaces.