Using awk to find and replace string in files found with find


I'm using find to grab a bunch of files in a directory, and then awk to try and find/replace a string in them. However, awk is just printing the entire contents of the files to stdout instead of overwriting the files. I'm a bit confused as to what syntax I should use here to accomplish my goal.
My command so far is:
find -iname '*_input.xml' -exec awk -P '/"prop_name" : "ID",/ , /}/ {print sub(/SKU/,"SKU+ProductId")} 1' {} +
This replaces the string SKU with SKU+ProductId, but it does not overwrite the file.
Does anyone have any ideas as to what I'm missing?
Cheers!
Update:
Example Content:
},
"prop_name" : "ID",
"rule" : "SKU"
},

awk writes to standard output, so you will need to redirect standard output. You can do that with find by executing an inner shell command:
find -iname '*_input.xml' \
-exec sh -c 'for file; do
./awk_script "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done' _ {} +
I would move the awk command to its own file to avoid problems with quoting. You may also want to consider saving the task of overwriting the original files until you've checked the results.
You could create an awk file like this, I believe. Don't forget to chmod +x it.
#!/usr/bin/awk -f
/"prop_name" : "ID",/ , /}/ {
sub(/SKU/,"SKU+ProductId")
} 1
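As a minimal sketch of the temporary-file approach above, the following builds a throwaway sample file and rewrites it through a separate awk script. The mktemp directory, the file name demo_input.xml and the script name fix.awk are assumptions made for illustration only. The awk action calls sub() on its own rather than printing its return value, so the only output comes from the final print rule.

```shell
#!/bin/sh
# Hypothetical sandbox so the sketch never touches real files.
workdir=$(mktemp -d)

cat > "$workdir/demo_input.xml" <<'EOF'
{
"prop_name" : "ID",
"rule" : "SKU"
},
{
"prop_name" : "Other",
"rule" : "SKU"
}
EOF

# The awk program lives in its own file, which avoids nested shell quoting.
cat > "$workdir/fix.awk" <<'EOF'
/"prop_name" : "ID",/, /}/ { sub(/SKU/, "SKU+ProductId") }
{ print }
EOF

# The first argument after "_" is the script path; the rest are the files found.
find "$workdir" -iname '*_input.xml' -exec sh -c '
script=$1
shift
for file; do
    awk -f "$script" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done' _ "$workdir/fix.awk" {} +

cat "$workdir/demo_input.xml"
```

Only the "rule" line inside the "ID" block should change; the second block is outside the range pattern and stays as it was.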

GNU sed has a -i for in place editing that makes this a lot easier:
find -iname '*_input.xml' \
-exec sed -i '/"prop_name" : "ID",/ , /}/ s/SKU/SKU+ProductId/' {} +

You need GNU awk 4.1.0 or later, which provides the -i inplace option, and you need to get rid of the print.

Related

append epoch date at the beginning of a file in bash

I have a list of 20 files; 10 of them already have 1970-01-01- at the beginning of the name and 10 do not (the remaining ones all start with a lowercase letter).
My task is to rename the files that do not have the epoch date at the beginning so that they have it too. Using bash, the code below works, but I could not solve it with a regular expression, for example using rename. I had to extract the basename and then mv. An elegant solution would use just one pipe instead of two.
Works
find ./ -regex './[a-z].*' | xargs -I {} basename {} | xargs -I {} mv {} 1970-01-01-{}
Hence looking for a solution with just one xargs or -exec?

You can just use a single rename command:
rename -n 's/^([a-z])/1970-01-01-$1/' *
This assumes you're operating on all the files in the current directory.
Note that the -n flag (dry run) only shows what rename would do; it does not actually rename any files.
If you want to combine with find then use:
find . -maxdepth 1 -type f -name '[a-z]*.txt' -execdir rename -n 's{([^/]+)$}{1970-01-01-$1}' {} +

I always prefer readable code over short code.
r() {
base=$(basename "$1")
dir=$(dirname "$1")
if [[ "$base" =~ ^1970-01-01- ]]
then
: "ignore, already has correct prefix"
else
echo mv "$1" "$dir/1970-01-01-$base"
fi
}
export -f r
find . -type f -exec bash -c 'r "$1"' _ {} \;
This also only prints what would be done (for testing). Remove the echo before the mv to do the real thing.
Mind that mv will overwrite existing files (for example, if both ./a/b/c and ./a/b/1970-01-01-c already exist). Use mv -i to be safe from this.
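For the "just one -exec" part of the question, here is a rough sketch that uses plain sh instead of an exported bash function. The sandbox directory and the sample file names are made up for the demonstration, and LC_ALL=C is an assumption so that [a-z] matches only lowercase ASCII letters.

```shell
#!/bin/sh
# Hypothetical sandbox with two unprefixed files and one already prefixed.
workdir=$(mktemp -d)
touch "$workdir/alpha.txt" "$workdir/beta file.txt" "$workdir/1970-01-01-gamma.txt"

# One find, one -exec: the inner shell loops over every matched path.
LC_ALL=C find "$workdir" -type f -name '[a-z]*' -exec sh -c '
for path; do
    dir=$(dirname "$path")
    base=$(basename "$path")
    target="$dir/1970-01-01-$base"
    if [ -e "$target" ]; then
        # Refuse to overwrite instead of relying on mv -i.
        echo "skipping $path: $target already exists" >&2
    else
        mv -- "$path" "$target"
    fi
done' _ {} +

ls "$workdir"
```

Files that already start with the date begin with a digit, so the -name test never selects them.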

Make changes in multiple files using sed and find

I've got to go through 100+ sites and add two lines to the same file in all of them (sendmail1.php).
The boss wants me to hand copy/paste this stuff, but there's GOT to be an easier way, so I'm trying to do it with find and sed, both of which apparently I'm not using well. I just want to run a script in the dir housing the directories that have all of the sites in them.
I have this:
#!/bin/bash
read -p "Which file, sir? : " file
# find . -depth 1 -type f -name 'sendmail1.php' -exec \
sed -i 's/require\ dirname\(__FILE__\).\'\/includes\/validate.php\';/ \
a require_once dirname\(__FILE__\).\'\/includes\/carriersoft_email.php\';' $file
sed -i 's/\else\ if\($_POST[\'email\']\ \&\&\ \($_POST\'work_email\'\]\ ==\ \"\"\)\){/ \
a\t$carriersoft_sent = carriersoft_email\(\);' $file
exit 0
At the moment, I have the find commented out while trying to sort out sed here and testing the script, but I'd like to solve both.
I think I'm not escaping something necessary in the sed bits, but I keep going over it and changing it and getting different errors (sometimes "unfinished s/ statement", other times other stuff).
The point is I have to do this:
Right below require dirname(__FILE__).'/includes/validate.php';, add this line:
require_once dirname(__FILE__).'/includes/carriersoft_email.php';
AND
Under else if($_POST['email'] && ($_POST['work_email'] == "")){, add this line:
$carriersoft_sent = carriersoft_email();
I'd like to turn this 4 hours copy/pasta nightmare into a 2 minutes lazy admin type script it and get it done job.
But my fu is not strong with the sed or the find...
As for the find, I get "paths must precede expression: 1".
I've found questions here addressing that error, which indicate that surrounding the filename with '' should resolve it, but that's not working.

Keep it simple and just use awk since awk can operate with strings, unlike sed which only works on REs with additional caveats:
find whatever |
while IFS= read -r file
do
awk '
{ print }
index($0,"require dirname(__FILE__).\047/includes/validate.php\047;") {
print "require_once dirname(__FILE__).\047/includes/carriersoft_email.php\047;"
}
index($0,"else if($_POST[\047email\047] && ($_POST[\047work_email\047] == \"\")){") {
print "$carriersoft_sent = carriersoft_email();"
}
' "$file" > /usr/tmp/tmp_$$ &&
mv /usr/tmp/tmp_$$ "$file"
done
With GNU awk you can use -i inplace to avoid manually specifying the tmp file name if you like, just like with sed -i.
The \047s are one way to specify a single quote inside a single-quote-delimited script.
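To tie the find half of the question to the awk approach above, here is a sketch run against a made-up site tree. The find error most likely comes from -depth 1: GNU find's -depth takes no argument, so the 1 is read as a path. Using -mindepth/-maxdepth instead is an assumption about what was intended (files exactly one directory down); both options are GNU/BSD extensions rather than POSIX. The directory layout and the sample PHP content are hypothetical.

```shell
#!/bin/sh
# Hypothetical layout: one directory per site under $workdir.
workdir=$(mktemp -d)
mkdir -p "$workdir/site1"
cat > "$workdir/site1/sendmail1.php" <<'EOF'
<?php
require dirname(__FILE__).'/includes/validate.php';
if($_POST['name'] == ""){
} else if($_POST['email'] && ($_POST['work_email'] == "")){
}
EOF

# Depth options go first; they select files exactly one directory below $workdir.
find "$workdir" -mindepth 2 -maxdepth 2 -type f -name 'sendmail1.php' |
while IFS= read -r file; do
    # index() does literal string matching, so no regex escaping is needed.
    awk '
    { print }
    index($0, "require dirname(__FILE__).\047/includes/validate.php\047;") {
        print "require_once dirname(__FILE__).\047/includes/carriersoft_email.php\047;"
    }
    index($0, "else if($_POST[\047email\047] && ($_POST[\047work_email\047] == \"\")){") {
        print "$carriersoft_sent = carriersoft_email();"
    }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done

cat "$workdir/site1/sendmail1.php"
```

Each new line is printed directly after the line it follows, without indentation; add leading whitespace to the print strings if the layout matters.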

Try this:
sed -e "s/require dirname(__FILE__).'\/includes\/validate.php';/&\nrequire_once dirname(__FILE__).'\/includes\/carriersoft_email.php'\;/" \
-e "s/else if(\$_POST\['email'\] && (\$_POST\['work_email'\] == \"\")){/&\n\$carriersoft_sent = carriersoft_email();/" \
file
Note: I haven't used the -i flag. Once you have confirmed that it works for you, you can add -i. I have also combined your two sed commands into one using the -e option.

I think it would be clearer if instead of s you used another fine command, a. To output one changed file, create a script (eg. script.sed) with the following content:
/require dirname(__FILE__)\.'\/includes\/validate.php';/a\
require_once dirname(__FILE__).'/includes/carriersoft_email.php';
/else if(\$_POST\['email'\] && (\$_POST\['work_email'\] == "")){/a\
$carriersoft_sent = carriersoft_email();
and run sed -f script.sed sendmail1.php.
To apply changes in all files, run:
find . -name 'sendmail1.php' -exec sed -i -f script.sed {} \;
(-i causes sed to change file in-place).
It is always advisable in such operations to do a backup and check out the exact changes after running the command. :)

Pass sed output to mv

I'm trying to batch rename text files according to a string they contain.
I used sed to isolate the pattern with \( and \) as I couldn't get this to work in grep.
sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt | mv *.txt $sed.txt
(The text I want to use as the filename is between HTML title tags.)
Where I wrote $sed would be the output of sed.
Hope that's clear!

A simple loop in bash can accomplish this. If each file is valid HTML, meaning you have only one <title> tag in the file, you can rename them all this way:
for file in *.txt; do
mv "$file" "$(sed -n 's/<title>\([^<]*\)<\/title>/\1/p' "$file" | sed -e 's/[[:space:]][[:space:]]*/_/g').txt"
done
So, if you have files 1.txt, 2.txt and 3.txt with cat, dog and my hippo in their respective <title> tags, you'll end up with cat.txt, dog.txt and my_hippo.txt after the above loop.
EDIT: quoted $file in case there are spaces in filenames, and added a second sed to convert any whitespace in the <title> text to _ in the resulting filenames. NOTE: [[:space:]] in the second sed command matches spaces and tabs.
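A slightly more defensive sketch of the same loop, run in a sandbox: it strips everything around the title, skips files that have no title, and refuses to overwrite an existing target. The sample files and their contents are invented for the demonstration, and titles containing a / are not handled.

```shell
#!/bin/sh
# Hypothetical sandbox: two files with titles, one without.
workdir=$(mktemp -d)
printf '<html><head><title>cat</title></head></html>\n' > "$workdir/1.txt"
printf '<html><head><title>my hippo</title></head></html>\n' > "$workdir/2.txt"
printf 'no title here\n' > "$workdir/3.txt"

for file in "$workdir"/*.txt; do
    # Keep only the title text, use the first match, turn whitespace runs into _.
    title=$(sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$file" |
        head -n 1 |
        sed 's/[[:space:]][[:space:]]*/_/g')
    # No title found: leave the file alone.
    [ -n "$title" ] || continue
    target="$workdir/$title.txt"
    # Do not clobber a file that already has that name.
    [ -e "$target" ] || mv -- "$file" "$target"
done

ls "$workdir"
```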

You can enclose expression in grave accent characters (`) to make it insert its output to the place you want. Try:
mv *.txt `sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt`.txt
It is rather not flexible, but should work.
(I haven't used it in a while and cannot test it now, so I might be wrong).

Here is the command I would use:
for i in *.txt ; do
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" "$i"
done
The sed substitution searches for the pattern in each of your .txt files. For each file it creates the string mv 'file_name' 'found_pattern'.
With the e flag at the end of the sed command (a GNU sed feature), the resulting string is executed as a shell command, so it renames your files.
Some hints:
Note the use of =s instead of /s as delimiters for the sed substitution: it's more readable as you already have /s in your pattern (you could use many other symbols if you don't like =), and this way you don't have to escape the / in your pattern.
The e flag for sed executes the created string.
(I mean the trailing e, just before the closing double quote, in:
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" "$i"
)
So use it with caution! I would recommend first running the line without the final e: it won't execute any mv command, but will just print what would be executed if you were to add the e.

What I read from your question is:
you have a number of text (html) files in a directory
each file contains at least the tag <title> ... </title>
you want to extract the content (elements.text) and use it as filename
last you want to rename that file to the extracted filename
Is this correct?
So you need to loop through the files, e.g. with xargs or find:
ls *.txt | xargs -i\{\} command "{}" ...
find -maxdepth 1 -type f -name '*.txt' -exec command "{}" ... \;
I always set the xargs placeholder with -i\{\} because the resulting command then also works with find, which uses {} as its placeholder.
The -maxdepth option keeps find from descending into subdirectories; if there are none, you can leave it out.
command could be something very simple like echo "Testing File: {}" or a really small script if you use it with bash:
find . -name '*.txt' -exec bash -c 'CUR_FILE="{}"; echo "Working on: $CUR_FILE"; ls -l "$CUR_FILE";' \;
The big decision for your question is: how to get the text from title element.
A simple solution (suitable if opening and closing tag is on same textline) would be by grep
A more solid solution is to use a HTML Parser and navigate by DOM operation
The simple solution is based on:
get the title line
remove everything before and after the title content
So, putting it together:
ls *.txt | xargs -i\{\} bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"'
Same with usage of find:
find . -maxdepth 1 -type f -name '*.txt' -exec bash -c 'TITLE=$(egrep "<title>[^<]*</title>" "{}"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "{}" "$NEW_FNAME.txt"' \;
Hopefully it is what you expected.

How to use find command with sed and awk to remove duplicate IP from files

Howdie do,
I'm writing a script that will remove duplicate IP's from two files. For example,
grep -rw "123.234.567" /home/test/ips/
/home/test/ips/codingte:123.234.567
/home/test/ips/codingt2:123.234.567
Ok, so that IP is in two different files and so I need to remove the IP from the second file.
The grep gives me the file path and the IP address. My thinking: store the file path in a variable with awk and then use find to go to that file and use sed to remove the duplicate IP, so I changed my grep statement to:
grep -rw "123.234.567" . | awk -F ':' '{print $1}'
which returns:
./codingte
./codingt2
I originally tried to use the full pathname in the find command, but that didn't work either:
find -name /var/cpanel/dips/codingte -exec sed '/123.234.567/d' {} \;
So, I just did a CD in the directory and changed the find command to:
find -name 'codingt2' -exec sed '/123.234.567/d' {} \;
Which runs, but doesn't delete the IP address:
cat codingt2
123.234.567
Now, I know the issue is with the dots in the IP address. They need to be escaped, but I'm not sure how to do this. I've been reading for hours on escaping the regex, but I'm not sure how to do this with sed
Any help would be appreciated. I'm just trying to learn more about regex and using them with other linux tools such as awk and find.
I haven't written the full script yet. I'm trying to break it into pieces and then bring it together in the script.
So you know what the output should look like:
codingte
123.234.567
codingt2
The second file would just have the IP removed

cat FILE1.txt | while read IP ; do sed -i "/^${IP}$/d" FILE2.txt ; done
The command does the following:
There are two files: FILE1.txt and FILE2.txt
It will remove in FILE2.txt lines (in your case, IP addresses) found in FILE1.txt
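On the escaping question: in a regular expression an unescaped dot matches any character, so /123.234.567/ would also match 123x234y567. You can either escape the dots (123\.234\.567) or avoid regular expressions altogether. The sketch below takes the second route with grep's fixed-string, whole-line matching; the file names follow the answer above, while the sandbox directory and sample contents are assumptions for the demonstration.

```shell
#!/bin/sh
# Hypothetical sandbox: FILE1 lists entries to remove from FILE2.
workdir=$(mktemp -d)
printf '123.234.567\n' > "$workdir/FILE1.txt"
printf '123.234.567\n123x234y567\n10.0.0.1\n' > "$workdir/FILE2.txt"

# -F: patterns are literal strings, -x: match whole lines only,
# -v: keep the lines that do NOT match, -f: read patterns from a file.
status=0
grep -vxF -f "$workdir/FILE1.txt" "$workdir/FILE2.txt" > "$workdir/FILE2.tmp" || status=$?

# grep exits 1 when no line survives, which is still a valid result here;
# anything above 1 is a real error, so keep the original file in that case.
if [ "$status" -le 1 ]; then
    mv "$workdir/FILE2.tmp" "$workdir/FILE2.txt"
else
    rm -f "$workdir/FILE2.tmp"
fi

cat "$workdir/FILE2.txt"
```

Because the dots are treated literally, the 123x234y567 line survives, which a regex-based delete with unescaped dots would have removed.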

You want grep -l, which prints only the names of the files containing a match:
grep -lrw "123.234.567" /home/test/ips/
would print
/home/test/ips/codingte
/home/test/ips/codingt2
So, to skip the first file and work on the rest:
grep -l ... | sed 1d | while IFS= read -r filename; do
whatever with "$filename"
done

I think you're just missing the -i argument to sed to edit the files in place.
echo foo > test
find -name test -exec sed -i 's/foo/bar/' {} \;
seems to do the trick.

Regex to rename all files recursively removing everything after the character "?" commandline

I have a series of files that I would like to clean up using commandline tools available on a *nix system. The existing files are named like so.
filecopy2.txt?filename=3
filecopy4.txt?filename=33
filecopy6.txt?filename=198
filecopy8.txt?filename=188
filecopy3.txt?filename=19
filecopy5.txt?filename=1
filecopy7.txt?filename=5555
I would like them to be renamed removing all characters after and including the "?".
filecopy2.txt
filecopy4.txt
filecopy6.txt
filecopy8.txt
filecopy3.txt
filecopy5.txt
filecopy7.txt
I believe the following regex will grab the bit I want to remove from the name,
\?(.*)
I just can't figure out how to accomplish this task beyond this.

A bash command:
for file in *; do
mv -- "$file" "${file%%\?filename=*}"
done

find . -depth -name '*[?]*' -exec sh -c 'for i do
mv "$i" "${i%[?]*}"; done' sh {} +
With zsh:
autoload zmv
zmv '(**/)(*)\?*' '$1$2'
Change it to:
zmv -Q '(**/)(*)\?*(D)' '$1$2'
if you want to rename dot files as well.
Note that if filenames may contain more than one ? character, both will only trim from the rightmost one.

If all files are in the same directory (ignoring .dotfiles):
$ rename -n 's/\?filename=\d+$//' -- *
If you want to rename files recursively in a directory hierarchy:
$ find . -type f -exec rename -n 's/\?filename=\d+$//' {} +
Remove -n option, to do the renaming.

In this case you can use the cut command:
echo 'filecopy2.txt?filename=3' | cut -d? -f1
example:
find . -type f -name "*\?*" -exec sh -c 'mv -- "$1" "$(echo "$1" | cut -d\? -f1)"' sh {} \;
You can use rename if you have it:
rename 's/\?.*$//' *

I use this after downloading a bunch of files where the URL included parameters and those parameters ended up in the file name.
This is a Bash script.
for file in *; do
mv -- "$file" "${file%%\?*}"
done
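Here is a sandboxed sketch of the parameter-expansion loop with quoting and a couple of guards added. The directory and the sample names are made up; the loop only visits names that actually contain a ?, and it skips a rename when the target already exists.

```shell
#!/bin/sh
# Hypothetical sandbox with two "dirty" names and one clean one.
workdir=$(mktemp -d)
touch "$workdir/filecopy2.txt?filename=3" \
      "$workdir/file copy4.txt?filename=33" \
      "$workdir/plain.txt"

(
    cd "$workdir" || exit 1
    for file in *\?*; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$file" ] || continue
        # %% removes the longest suffix matching "?*", i.e. from the first ?.
        target=${file%%\?*}
        if [ -e "$target" ]; then
            echo "skipping $file: $target already exists" >&2
        else
            mv -- "$file" "$target"
        fi
    done
)

ls "$workdir"
```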
