Sed command - order of option flags matters? (-ir vs -ri) - regex

Imagine the following data stored in file data.txt
1, StringString, AnotherString 545
I want to replace "StringString" with "Strung" with the following code
sed -ir 's/String+/Strung/g' data.txt
But it won't work. This works though:
sed -ri 's/String+/Strung/g' data.txt
I don't see any reason why the order of option flags would matter. Is it a bug or is there an explanation?
Please note that I'm not looking for a workaround but rather why the order of -ir and -ri matters.
Side notes: the switch -i "edits the file in place" while -r enables "extended regular expressions" (allowing the + operator). I'm running sed 4.2.1 (Dec. 2010) on Ubuntu 12.10.

When doing -ir you are specifying that "r" should be the suffix for the backup file: the optional suffix of -i has to be attached directly to the option, so the r is consumed as that suffix instead of being parsed as the -r flag.
You should be able to do -i -r if you need them in that order.
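For example, with the data.txt from the question (GNU sed assumed), the first command below runs in basic-regex mode, so the + is taken literally and nothing matches, and it leaves a backup file named data.txtr behind; the other two are equivalent and edit the file in place with extended regexes and no backup:
sed -ir 's/String+/Strung/g' data.txt
sed -i -r 's/String+/Strung/g' data.txt
sed -ri 's/String+/Strung/g' data.txt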

Did you check sed --help or man sed?
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied).
The default operation mode is to break symbolic and hard links.
This can be changed with --follow-symlinks and --copy.

Related

simple SED replace

I'm attempting to write a script to do a simple regex replace in php.ini; what I want to do is replace the line ;cgi.fix_pathinfo=1 with cgi.fix_pathinfo=0.
Ideally I want to avoid installing any additional packages, so sed seems a logical choice since it is bundled with FreeBSD. I have tried the following, but it doesn't seem to work:
sed 's/;cgi\.fix_pathinfo=1/cgi\.fix_pathinfo=0/' /usr/local/etc/php.ini
Your substitution is fine, but without -i sed only writes the result to standard output and leaves the file untouched. To change the content of a file in place with BSD sed, you can do this:
sed -i.bak -e 's/;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/;' /usr/local/etc/php.ini
That creates a copy of the old file with a .bak extension.
Or without creating a copy:
sed -i '' -e 's/;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/;' /usr/local/etc/php.ini
Note that in this case, a space and an empty string enclosed between quotes are mandatory. You can't simply write sed -i -e '... like with GNU sed.
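For comparison, a minimal GNU sed equivalent (where any backup suffix must be attached directly to -i, and omitting it means no backup at all):
sed -i 's/;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/' /usr/local/etc/php.ini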

How to scrub emails from all CSVs in a directory?

I have this regex that works well enough for my purposes for identifying emails in CSVs within a directory, using grep on Mac OS X:
grep --no-filename -E -o "\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
I've tried to get this working with sed so that I can replace the emails with foo#bar.baz:
sed -E -i '' -- 's/\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo#bar.baz/g' *
However, I can't seem to get it to work. Admittedly, sed and regex are not my strong points. Any ideas?
The sed in OSX is broken. Replace it with GNU sed, installed via Homebrew, and use it in place of the one bundled with OSX. Install it with
brew install gnu-sed
(depending on how Homebrew installs it, the command may end up named gsed rather than sed) and then use this for the substitution:
sed -E -i 's/\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo#bar.baz/g' *
You seem to assume that grep and sed support the same regex dialect, but that is not necessarily, or even usually, the case.
If you want a portable solution, you could easily use Perl for this, which however supports yet another regex dialect...
perl -i -p -e 's/\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo#bar.baz/g' *
For a bit of an overview of regex dialects, see https://stackoverflow.com/a/11857890/874188
Your regex kind of sucks, but I understand that is sort of beside the point here.
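If only the CSV files should be touched (rather than everything the * glob matches), the same perl command can be pointed at them specifically, e.g. (a sketch):
perl -i -p -e 's/\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo#bar.baz/g' *.csv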

sed command creating unwanted duplicates of file with -e extension

I am trying to do a recursive find and replace on java files in a directory using a shell script. It works, but it is hiding all the files, and creating duplicates with a -e extension
#!/bin/bash
for file in $(find . -type f -name "*.java")
do sed -i -e 's/foo/bar/g' $file
done
From my understanding, the -e is optional - but if I do not provide it I get the following error on every file it finds
sed: 1: "./DirectoryAdapter.java": invalid command code .
Any clue as to what is happening here? For reference, I am on Mac OS X running El Capitan.
Here is a before-and-after look at the directory after running the script. The replaced files still exist; are they just hidden?
On OSX, sed (BSD sed) requires an extension argument after the -i option. Since it finds -e afterwards, it treats -e as that backup extension and appends it to each input filename. By the way, you don't even need the -e option here.
You can pass an empty extension like this:
sed -i '' 's/foo/bar/g' $file
Or use .bak for an extension to save original file:
sed -i.bak 's/foo/bar/g' $file
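As an aside, a sketch that sidesteps the for loop over $(find ...) entirely, and with it the word splitting of filenames containing spaces (BSD sed syntax shown; on GNU sed drop the empty '' argument):
find . -type f -name "*.java" -exec sed -i '' 's/foo/bar/g' {} +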
The accepted answer works for OSX but causes issues if your code is run on both GNU and OSX systems, since they expect -i[SUFFIX] (suffix attached) and -i [SUFFIX] (suffix as a separate argument) respectively.
There are probably two reasonable solutions in this case:
Don't use -i (in-place). Instead, write to a temporary file and overwrite the original afterwards.
Use perl.
The easiest fix I found was to simply use perl. The syntax is almost identical:
sed -i -e 's/foo/bar/g' $file
->
perl -pi -e 's/foo/bar/g' $file
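If the same command has to run on both GNU and BSD systems, the perl form combines with find in the same way (a sketch reusing the substitution from above):
find . -type f -name "*.java" -exec perl -pi -e 's/foo/bar/g' {} +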

Find and replace string that includes quotes within files in multiple directories - unix aix

So here's the scenario. I'd like to change the following value from false to true in hundreds of files in an installation, but I can't figure out the command and have been working on this for a few days now. What I have is a simple script which looks for all instances of a file and stores the results in a file. I'm using this command to find the files I need to modify:
find /directory -type f \( -name 'filename' \) > file_instances.txt
Now what I'd like to do is run the following command, or a variation of it, to modify that value:
sed 's/directoryBrowsingEnabled="false"/directoryBrowsingEnabled="true"/g' $i > $i
When I tested the above command, it blanked out the file when it attempted to replace the string, but if I run the command against a single file, the change is made correctly.
Can someone please shed some light on this?
Thank you in advance
What has semi-worked for me is the following:
You can call sed with the -i option instead of doing > $i. You can even keep a backup of the old file, in case something goes wrong, by adding a suffix.
sed -e 'command' -i.backup myfile.txt
This will execute command inplace on myfile.txt and save the old file in myfile.txt.backup.
EDIT:
Not using -i may indeed result in blank files: the shell truncates the output file (the > $i redirection) before sed even starts reading it, so sed ends up reading an already-empty file.
You can convince yourself of this by some simple cat commands:
$ echo "This is a test" > test.txt
$ cat test.txt > test.txt # This returns an error, cat being smart enough to notice
$ cat <test.txt >test.txt # This blanks the file, cat not being that smart here
On AIX you might be missing the -i option of sed. Sad. You could make a script that moves each file to a tmp file and redirects (with sed) to the original file (a sketch of this follows below), or try using a here-construction with vi:
cat file_instances.txt | while read file; do
vi ${file}<<END >/dev/null 2>&1
:1,$ s/directoryBrowsingEnabled="false"/directoryBrowsingEnabled="true"/g
:wq
END
done
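And a sketch of the temp-file approach mentioned above, which needs neither -i nor vi (the .tmp suffix is just an arbitrary choice):
cat file_instances.txt | while read file; do
sed 's/directoryBrowsingEnabled="false"/directoryBrowsingEnabled="true"/g' "${file}" > "${file}.tmp" && mv "${file}.tmp" "${file}"
done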

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this: find . -name "*.gz*" to locate these files, and would like to either use -exec or pipe to xargs with some magic command to clean up this mess, so that I end up with filename.gz.
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
one way with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special chars (like spaces) in your filenames. If there were, you would need to quote the filename in the mv command.
I added $(pwd) to get the absolute path of each found name.
You can remove the trailing |sh to check the generated mv commands and verify they are correct.
If everything looks good, add the |sh back to execute the mv.
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
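These commands only print the transformed name; to actually rename, one way is to wrap the same substitution in a loop (a sketch, assuming filenames without embedded newlines):
for f in *.gz.gz; do
mv "$f" "$(printf '%s\n' "$f" | sed 's/\(\.gz\)\{1,\}$/.gz/')"
done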
ls *.gz | perl -ne '/((.*?.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files; it won't execute those commands, so it is safe. To execute them, you can save them to a file and run it, or simply pipe to a shell:
ls *.gz | ... | sh
sed is great for replacing text inside files, but for renaming the files themselves you can use bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed):
echo *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while read f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special characters such as embedded spaces correctly (with the exception of filenames with embedded newlines).
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). Mine is not perfect either. Note the use of double quotes, which protect against filenames with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
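As a quick illustration of that guard and of the bypass (a sketch):
gzip file.gz # gzip normally refuses, since the name already ends in .gz
gzip -c file.gz > file.gz.gz # ...whereas redirecting the -c output like this happily stacks another extension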
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When you're happy with the results, remove the -n (dry-run) option.