Batch process regular expression find and replace on folder and subfolders contents

I have a folder with subfolders that contain text documents (hundreds). The text documents all require a find and replace. The regular expression I am using to find the text is:
^([A-Z])[\r\n]+(\w+)\b
This is being replaced by:
$1$2
How can I batch process this find and replace on a folder with subfolders?
I'm using a Mac (OS X 10.6.8).

You could use sed for this as well:
cd /path/to/files # make sure you are in the right directory
find . -type f -exec sed -i.bak 's/^([A-Z])[\r\n]+(\w+)\b/$1$2/g' {} \;
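Edit: I just realized that the above is a TextMate search/replace string. In sed, backreferences are written \1 and \2, and unescaped parentheses only act as groups with extended regular expressions (-E), so you'll have to use:
find . -type f -exec sed -E -i.bak 's/^([A-Z])[\r\n]+(\w+)\b/\1\2/g' {} \;
This makes a backup of every file it edits (with a .bak suffix). Be aware that sed reads its input one line at a time, so [\r\n]+ cannot match across a line break as written, and \w and \b are not understood by every sed.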

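You could do this using find and perl. The -0777 switch makes perl read each file as one string, so the pattern can match across the line break, and the m flag lets ^ match at the start of every line:
find . -type f -exec perl -0777 -p -i -e 's/^([A-Z])[\r\n]+(\w+)\b/$1$2/gm' {} \;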
Warning: untested :)

Related

Find and rename files/folders using regex

I am trying to find the right regex for file names that start with I0[0-9][0-9]-, e.g. "I097-". I am not familiar with regex, but with the help of online tools I came up with [I][0][\d][\d][-]. I am sure this is not the best pattern for the string I have, but I tested it with online regex tools and it works. Now I want to use Linux 'find' to find all the files that match this regex and rename them by replacing the matching string with nothing.
From:
I071-PTEN-7
./I071-PTEN-7/I071-PTEN-7.txt
To:
PTEN-7
./PTEN-7/PTEN-7.txt
command used:
find . -name "I0*" -type f -o -name "I0*" -type d -exec rename -n "s/[I][0][\d][\d][-]/''/" {} \;
But it doesn't seem to do anything, not sure what is going on. Any help to find the issue or solution would be greatly appreciated. Thanks.
Use the -execdir option so that find runs the command from each entry's own directory and passes it only the file name. Also, there is no need to put a character class around every character in your regex.
find . -name 'I0*' -execdir rename -n 's/^I0\d\d-//' {} \;
If rename isn't working, you can try this instead, which renames the files first and then the directories:
find . -type f -name 'I0*' -execdir bash -c 'mv "$1" "${1/I0[0-9][0-9]-/}"' - {} \; &&
find . -name 'I0*' -execdir bash -c 'mv "$1" "${1/I0[0-9][0-9]-/}"' - {} \;
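Because both the files and their parent directories carry the prefix, the order of renaming matters. The following is a sketch, not the answerer's exact method: it assumes a find that supports -depth and -mindepth (GNU and BSD find both do) and uses plain -exec with the path split by hand, which also sidesteps the "./" that GNU find prepends to names passed to -execdir. The sample tree is hypothetical.

```shell
# Sample tree mirroring the question (hypothetical data in a temp directory).
demo=$(mktemp -d)
mkdir -p "$demo/I071-PTEN-7"
touch "$demo/I071-PTEN-7/I071-PTEN-7.txt"

# -depth visits a directory's contents before the directory itself, so each
# file is renamed before its parent directory changes name.
find "$demo" -depth -name 'I0[0-9][0-9]-*' -exec bash -c '
  for path do
    dir=${path%/*}       # parent directory
    base=${path##*/}     # last path component
    mv -- "$path" "$dir/${base#I0[0-9][0-9]-}"   # strip the I0nn- prefix
  done' _ {} +

renamed=$(cd "$demo" && find . -mindepth 1 | sort)
printf '%s\n' "$renamed"
```

Replacing mv with echo mv gives a dry run comparable to rename -n.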

How to rename all files in a folder removing everything after space character in linux?

Hello, I am not good with regular expressions and I have been searching the Internet all day.
I have a folder with many pictures:
50912000 Bicchiere.jpg
50913714 Sottobottiglia Bernini.jpg
I'm using Mac OS X, but I can also try on Ubuntu. I would like to write a bash script that removes all the characters after the first space (keeping the extension), to get a result like this:
50912000.jpg
50913714.jpg
For all the files in the folder.
Any help is appreciated.
Regards
Use pure BASH:
f='50912000 Bicchiere.jpg'
mv "$f" "${f/ *./.}"
Or use find to fix all the files at once:
find . -type f -name "* *" -exec bash -c 'f="$1"; s="${f/_ / }"; mv -- "$f" "${s/ *./.}"' _ '{}' \;
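For a single folder, the same idea can be written as a loop. This is a sketch under the assumption that every file has an extension and that the text before the first space is unique per file; the variable names and the temporary directory are made up for the example.

```shell
# Sample files from the question, created in a temp directory.
demo=$(mktemp -d)
touch "$demo/50912000 Bicchiere.jpg" "$demo/50913714 Sottobottiglia Bernini.jpg"

for f in "$demo"/*' '*.*; do
  [ -e "$f" ] || continue          # the glob matched nothing
  dir=${f%/*}                      # directory part
  base=${f##*/}                    # file name only
  new="${base%% *}.${base##*.}"    # text before the first space + extension
  mv -n -- "$f" "$dir/$new"        # -n: never overwrite an existing file
done

listing=$(ls "$demo")
printf '%s\n' "$listing"
```

Splitting off the directory first keeps a space in a parent folder's name from being mistaken for the space in the file name.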
Use sed,
sed 's/ .*\./\./g'
Notice the space before .*
You can use a combination of find and a small script.
prompt> find . -name "* *" -exec move_it {} \;
mv "./50912000 Bicchiere.jpg" ./50912000
mv "./50913714 Sottobottiglia Bernini.jpg" ./50913714
prompt> cat move_it
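#!/bin/sh
# keep only the first 10 characters of the path ("./" plus the 8-digit number);
# note that this also drops the extension
dst=$(echo "$1" | cut -c 1-10)
# remove the echo in the line below to actually rename the file
echo mv "\"$1\"" "$dst"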
With rename (the Perl version), replace everything from the first space up to the last dot with a single dot:
rename 's/ .*\././' *.jpg

How can i replace the all email address found in particular folder in linux

I have some scripts in a folder such as /var/www/sites.
I want to replace every email address hardcoded in the scripts, in all folders and subfolders, with my own email address.
How can I do that?
I can find using
grep -rn "abc#gmail.com" /var/www/sites/
But I don't know how to use a regex to do the replacement.
Try perl:
perl -p -i -e 's/abc#gmail.com/new#gmail.com/g' /var/www/sites/*
Or with perl/find:
find /var/www/sites/ -type f -exec perl -p -i -e 's/abc#gmail\.com/new#gmail.com/g' {} \;
Open a shell, then
if you have bash4 :
oldmail="abc#gmail.com"
newmail="myemail#provider.tld"
shopt -s globstar
sed -i "/$oldmail/s/$oldmail/$newmail/g" /var/www/sites/**/*
if not :
oldmail="abc#gmail.com"
newmail="myemail#provider.tld"
find /var/www/sites -type f -exec sed -i "/$oldmail/s/$oldmail/$newmail/g" {} +
The /pattern/ address restricts the substitution to lines that contain the old address. Note, however, that sed -i and perl -i -pe rewrite every file they are given, even files that do not contain the searched string, so those files get a new timestamp; to leave them untouched, select the files first with grep -l.
find /var/www/sites -type f | xargs sed --in-place 's/abc#gmail\.com/mynewemail#elsewhere.com/g'
Try sed.
grep -rl "abc#gmail.com" /var/www/sites/ | xargs sed -i 's/oldemail/newemail/g'
Edit:
Took feedback into account. Sorry about the previously wrong solution!
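Putting the grep and sed steps together, here is a sketch that only rewrites files which actually contain the old address. The sample tree, the file names and the .bak suffix are assumptions for the demo; the addresses are the placeholders used in the question.

```shell
# Sample tree in a temp directory (hypothetical files).
demo=$(mktemp -d)
list=$(mktemp)
mkdir -p "$demo/a/b"
printf 'mail: abc#gmail.com\n' > "$demo/a/b/contact.php"
printf 'no address here\n'     > "$demo/a/other.php"

old='abc#gmail.com'
new='new#gmail.com'
# Escape the dots so sed matches them literally instead of "any character".
old_re=$(printf '%s' "$old" | sed 's/\./\\./g')

# grep -F searches for a fixed string; -l prints only the names of matching
# files. The list is saved first so the .bak files are not picked up.
grep -rlF -- "$old" "$demo" > "$list"
while IFS= read -r file; do
  sed -i.bak "s/$old_re/$new/g" "$file"
done < "$list"

changed=$(cat "$demo/a/b/contact.php")
printf '%s\n' "$changed"
```

Files that never matched (other.php here) are not opened by sed at all, so their timestamps stay as they were.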

sed mass replace CSS styles via terminal

I want to replace all instances of font-family: ([A-Za-z ,"]+){1}; with font-family: Verdana using sed. In the past, the following command has worked for simple search & replace:
find ./ -type f -exec sed -i 's/needle/replace/' {} \;
However, I tried the following regex with no success:
find ./ -type f -exec sed -i 's/(font\-family:){1}([\"A-Za-z, ]+){1}(;){1}/font\-family: Verdana;/' {} \;
I'm on Red Hat Enterprise Linux Server release 5.6. Additionally, the first command seems to only work on the first instance in any given file, which means I have to rerun the command until every instance gets replaced... can I improve the command to work on all instances of all files?
First, an explanation of why yours doesn't work. In sed's default (basic) regular expression syntax you need to escape all of your parentheses, curly braces, and the +, so the following should work:
sed -i 's/\(font\-family:\)\{1\}\(["A-Za-z, ]\+\)\{1\}\(;\)\{1\}/font-family: Verdana;/'
Fortunately you can add the -r switch to prevent the need for all of that escaping, but you can also simplify your current expression quite a bit. You do not need to put every section into a capturing group, and adding {1} to every group is redundant (that is basically the default). So you could reduce it to:
sed -ri 's/font-family:["A-Za-z, ]+;/font-family: Verdana;/g'
Note the added g option for global replacement, since you want this for every occurrence.
All together:
find ./ -type f -exec sed -ri 's/font-family:["A-Za-z, ]+;/font-family: Verdana;/g' {} \;
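A quick way to check the simplified expression is to run it on a scratch file before pointing it at real stylesheets. This sketch assumes a sed that accepts -E (the portable spelling of -r on recent GNU and BSD sed; on an older sed, -r may be the only option) and uses a made-up CSS line.

```shell
# One-line sample stylesheet in a temp directory (hypothetical content).
demo=$(mktemp -d)
printf 'h1 { font-family: "Times New Roman", serif; } p { font-family: Arial; }\n' \
  > "$demo/site.css"

# -E enables extended regular expressions so + needs no backslash;
# the g flag replaces every occurrence on a line; -i.bak keeps a backup.
find "$demo" -type f -name '*.css' \
  -exec sed -E -i.bak 's/font-family:["A-Za-z, ]+;/font-family: Verdana;/g' {} \;

css=$(cat "$demo/site.css")
printf '%s\n' "$css"
```

Both declarations on the sample line are rewritten in a single pass, which is the effect of the g flag that the original command was missing.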
The problem is that you need -r in your sed, since you used +.
see the test below:
kent$ echo "oldstring_0000"|sed 's/[0]+/newstring/'
oldstring_0000
nothing happened.
now with -r:
kent$ echo "oldstring_0000"|sed -r 's/[0]+/newstring/'
oldstring_newstring
Also, if you want to replace all occurrences, you need the g flag, as in 's/a/b/g'.
I'm not sure I fully understand your font-family expression: font-family: ([A-Za-z ,"]+){1}; Are those matching parens and you're looking for {1} exactly one match?
Your regex is just complicated enough that I'd switch from sed to perl -pi:
find ./ -type f -exec perl -pi -e 's/font-family:[\"A-Za-z, ]+;/font-family: Verdana;/g' {} \;
Try something like this -
sed -i 's/\(font-family:\) \(.*[^;]\)\(;.*\)/\1 Verdana\3/g'

Recursive multiline sed - remove beginning of file until pattern match

I have nested subdirectories containing html files. For each of these html files I want to delete from the top of the file until the pattern <div id="left-
This is my attempt from osx's terminal:
find . -name "*.html" -exec sed "s/.*?<div id=\"left-col/<div id=\"left-col/g" '{}' \;
I get a lot of HTML output in the terminal, but no files contain the substitution or are written.
There are two problems with your command. The first problem is that you aren't selecting an output location for sed. The second is that your sed script is not doing what you want it to do: the script you posted will look at each line and delete everything ON THAT LINE before the <div>. Lines without the <div> will be unaffected. You may want to try:
find . -name "*.html" -exec sed -i.BAK -n "/<div id=\"left-col/,$ p" {} \;
This will also automatically back up your files by appending .BAK to the original versions. If this is undesirable, change -i.BAK to simply -i.
You're outputting the result of the sed regex to stdout, the console, when you want to be writing it to the file.
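To perform find and replace in place with sed, use the -i flag. On OS X, -i takes a backup suffix as its argument ('' for no backup), and sed has no non-greedy .*? operator, so use a plain .* instead:
find . -name "*.html" -exec sed -i '' "s/.*<div id=\"left-col/<div id=\"left-col/g" '{}' \;
Make sure you back up your files before running this command, if possible. Otherwise you risk data loss from a mistyped regex.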
You're not storing the output of sed anywhere; that's why it's spitting out the html.
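Since the goal is to drop whole lines from the top of each file, a line-range print is a closer fit than a substitution. Below is a minimal sketch on a scratch file; the sample HTML is invented, and the left-col id is taken from the command in the question.

```shell
# Sample page in a temp directory (hypothetical content).
demo=$(mktemp -d)
printf '<html>\n<body>\n<div id="left-col">\n<p>kept</p>\n' > "$demo/page.html"

# -n suppresses automatic printing; the range /pattern/,$ selects every line
# from the first match to the end of the file, and p prints only those lines.
# -i.bak edits in place and keeps the original as page.html.bak.
find "$demo" -type f -name '*.html' \
  -exec sed -i.bak -n '/<div id="left-col/,$p' {} \;

html=$(cat "$demo/page.html")
printf '%s\n' "$html"
```

Single quotes around the sed script keep the shell from touching the $ that marks the last line. A file with no matching line ends up empty, so the .bak copies are worth keeping until the results have been checked.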