Case insensitive sed for pattern [duplicate] - regex

I'm trying to use SED to extract text from a log file. I can do a search-and-replace without too much trouble:
sed 's/foo/bar/' mylog.txt
However, I want to make the search case-insensitive. From what I've googled, it looks like appending i to the end of the command should work:
sed 's/foo/bar/i' mylog.txt
However, this gives me an error message:
sed: 1: "s/foo/bar/i": bad flag in substitute command: 'i'
What's going wrong here, and how do I fix it?

Update: Starting with macOS Big Sur (11.0), sed now does support the I flag for case-insensitive matching, so the command in the question should now work (BSD sed doesn't report its version, but you can go by the date at the bottom of the man page, which should be March 27, 2017 or more recent); a simple example:
# BSD sed on macOS Big Sur and above (and GNU sed, the default on Linux)
$ sed 's/ö/#/I' <<<'FÖO'
F#O # `I` matched the uppercase Ö correctly against its lowercase counterpart
Note: I (uppercase) is the documented form of the flag, but i works as well.
Similarly, starting with macOS Big Sur (11.0) awk now is locale-aware (awk --version should report 20200816 or more recent):
# BSD awk on macOS Big Sur and above (and GNU awk, the default on Linux)
$ awk '{ print tolower($0) }' <<<'FÖO'
föo # non-ASCII character Ö was properly lowercased
The following applies to macOS up to Catalina (10.15):
To be clear: On macOS, sed - which is the BSD implementation - does NOT support case-insensitive matching - hard to believe, but true. The formerly accepted answer, which itself shows a GNU sed command, gained that status because of the perl-based solution mentioned in the comments.
To make that Perl solution work with foreign characters as well, via UTF-8, use something like:
perl -C -Mutf8 -pe 's/öœ/oo/i' <<< "FÖŒ" # -> "Foo"
-C turns on UTF-8 support for streams and files, assuming the current locale is UTF-8-based.
-Mutf8 tells Perl to interpret the source code as UTF-8 (in this case, the string passed to -pe) - this is the shorter equivalent of the more verbose -e 'use utf8;'. (Thanks, Mark Reed.)
(Note that using awk is not an option either, as awk on macOS (i.e., BWK awk and BSD awk) appears to be completely unaware of locales altogether - its tolower() and toupper() functions ignore foreign characters (and sub() / gsub() don't have case-insensitivity flags to begin with).)
A note on the relationship of sed and awk to the POSIX standard:
BSD sed and awk limit their functionality mostly to what the POSIX sed and
POSIX awk specs mandate, whereas their GNU counterparts implement many more extensions.
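Because support for the I flag depends on which sed is installed, a script that has to run on both older and newer systems could probe for it first. The following is only a minimal sketch, not part of the answer above: the function names sed_supports_I and replace_foo are made up for illustration, and the fallback covers ASCII letters only.

```shell
# Hypothetical helper: succeeds if the local sed accepts the I flag on s///.
sed_supports_I() {
    [ "$(printf 'FOO\n' | sed 's/foo/bar/I' 2>/dev/null)" = "bar" ]
}

# Hypothetical helper: replace the first foo on each line, ignoring case.
# Reads stdin, writes stdout.
replace_foo() {
    if sed_supports_I; then
        sed 's/foo/bar/I'
    else
        # POSIX fallback: spell out both cases of every letter.
        sed 's/[Ff][Oo][Oo]/bar/'
    fi
}

result=$(printf 'a Foo b\n' | replace_foo)
printf '%s\n' "$result"   # a bar b
```

Either branch produces the same output for this input, so the caller does not need to know which sed is present.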

Editor's note: This solution doesn't work on macOS (out of the box), because it only applies to GNU sed, whereas macOS comes with BSD sed.
Capitalize the 'I'.
sed 's/foo/bar/I' file

Another work-around for sed on Mac OS X is to install gsed from MacPorts or Homebrew and then create the alias sed='gsed'.

If you are doing pattern matching first, e.g.,
/pattern/s/xx/yy/g
then you want to put the I after the pattern:
/pattern/Is/xx/yy/g
Example:
echo Fred | sed '/fred/Is//willma/g'
returns willma; without the I, it returns the string untouched (Fred).

The sed FAQ addresses the closely related case-insensitive search. It points out that a) many versions of sed support a flag for it and b) it's awkward to do in sed, you should rather use awk or Perl.
But to do it in POSIX sed, they suggest three options (adapted for substitution here):
Convert to uppercase and store the original line in hold space; this won't work for substitutions, though, as the original content will be restored before printing, so it's only good for inserting or appending lines based on a case-insensitive match.
Maybe the possibilities are limited to FOO, Foo and foo. These can be covered by
s/FOO/bar/;s/[Ff]oo/bar/
To search for all possible matches, one can use bracket expressions for each character:
s/[Ff][Oo][Oo]/bar/
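Writing the bracket expressions by hand gets tedious for longer words. As a rough sketch, the pattern could be generated instead; ci_pattern is a hypothetical helper name, and it assumes the word consists of ASCII letters and contains no regex metacharacters.

```shell
# Hypothetical helper: turn each ASCII letter of $1 into a two-letter
# bracket expression, e.g. foo -> [Ff][Oo][Oo]. Other characters pass through.
ci_pattern() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /^[A-Za-z]$/) {
                out = out "[" toupper(c) tolower(c) "]"
            } else {
                out = out c
            }
        }
        print out
    }'
}

pattern=$(ci_pattern foo)
result=$(printf 'FOO Foo fOo\n' | sed "s/$pattern/bar/g")
printf '%s\n' "$pattern" "$result"
# [Ff][Oo][Oo]
# bar bar bar
```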

The Mac version of sed seems a bit limited. One way to work around this is to use a Linux container (via Docker) that has a usable version of sed:
cat your_file.txt | docker run -i busybox /bin/sed -r 's/[0-9]{4}/****/Ig'

Use the following to replace all occurrences:
sed 's/foo/bar/gI' mylog.txt

I had a similar need and came up with the following.
This command simply finds all the files:
grep -i -l -r foo ./*
This one excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console so you can see what happened, and then runs sed on each file name found to replace the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because I didn't like having the timestamps changed on files that were not modified. Feeding sed from the grep result means only the files that contain the target text are touched (which likely improves speed as well).
Be sure to back up your files and test before using this. It may not work in some environments for files with embedded spaces. (?)
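The pipeline above relies on GNU options (sed -b, sed -i without an argument, the i flag). A more portable variant might look like the sketch below. It is an illustration under assumptions, not a drop-in replacement: replace_in_tree is a made-up name, file names containing newlines are not handled, and the ASCII bracket expressions stand in for the case-insensitivity flag.

```shell
# Hypothetical helper: replace foo (any case) with bar in every regular file
# under directory $1 that contains a match. Files without a match are never
# rewritten, so their timestamps stay untouched.
replace_in_tree() {
    tmp=$(mktemp) || return 1
    find "$1" -type f -exec grep -qi 'foo' {} \; -print |
    while IFS= read -r file; do
        # Write to a temp file first, then copy back over the original so
        # the original's permissions are kept.
        sed 's/[Ff][Oo][Oo]/bar/g' "$file" > "$tmp" && cat "$tmp" > "$file"
    done
    rm -f "$tmp"
}
```

It would be called as replace_in_tree ./some_dir, where ./some_dir is whatever directory should be searched. Reading names with IFS= read -r keeps embedded spaces intact.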

The following should be fine:
sed -i 's/foo/bar/gi' mylog.txt

Related

Cross platform regex substring match for git

Sorry for yet another pattern matching question, but I'm struggling to find a tool that will do a regex in a git hook. It needs to work on Windows, Mac and Linux.
This GNU grep works for Windows and Linux, but not Mac (because BSD)
echo "feature/EOPP-234-foo" | grep -Po -e '[A-Z]{4}-\d{1,5}'
This works for Mac and Linux, but not Windows (because <git>\usr\bin\egrep doesn't seem to work)
echo "feature/EOPP-234-foo" | egrep -o '[A-Z]{4}-\d{1,5}'
sed might be the most common tool, but stuffed if I can get it to match:
echo "feature/EOPP-234-foo" | sed -n 's/^.*\([A-Z]{4}\-\d{1,5}\).*$/\1/p'
I've even tried bash matching with no luck
[[ "feature/EOPP-234-foo" =~ ([A-Z]{4}-\d{1,5}) ]] && echo ${BASH_REMATCH[1]}
Any ideas?
When you need to make POSIX tools run on Windows, you need to remember to use double quotation marks around your commands, not single quotes.
Also, you can use a common POSIX ERE-compliant regex across all these environments. This means \d must be replaced with [0-9] or [[:digit:]], as \d is a PCRE-only construct.
You can use
grep -Eo "[A-Z]{4}-[0-9]{1,5}"
grep -Eo "[A-Z]{4}-[[:digit:]]{1,5}"
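To tie this back to the branch name from the question, here is a small sketch that captures the ticket ID into a variable. The variable names are illustrative. The sed variant is an addition, not part of the answer above, and shows how the asker's sed attempt could be written in plain POSIX BRE, where intervals are spelled \{m,n\} and \d becomes [0-9].

```shell
branch="feature/EOPP-234-foo"

# POSIX ERE via grep -E; -o prints only the matching part.
ticket=$(printf '%s\n' "$branch" | grep -Eo "[A-Z]{4}-[0-9]{1,5}")

# The same match with POSIX BRE in sed: groups are \( \) and intervals \{ \}.
ticket_sed=$(printf '%s\n' "$branch" | sed -n "s/.*\([A-Z]\{4\}-[0-9]\{1,5\}\).*/\1/p")

printf '%s\n' "$ticket" "$ticket_sed"
# EOPP-234
# EOPP-234
```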

Bash on macOS: How replace a path in a file with another string?

For integration tests, I have output that contains full file paths. I want to have my test script replace the user-specific start of the file path (e.g. /Users/uli/) with a generic word (USER_DIR) so that I can compare the files.
The problem, of course, is the slashes in the path. I tried the solutions given here and here, but they don't work for me:
#!/bin/bash
old_path="/Users/uli/"
new_path="USERDIR"
sed -i "s#$old_path#$new_path#g" /Users/uli/Desktop/replacetarget.txt
I get the error
sed: 1: "/Users/uli/Desktop/repl ...": invalid command code u
This is the version of sed that comes with macOS 10.14.6 (it has no --version option and is installed in /usr/bin/, so no idea what exact version).
Update:
I also tried
#!/bin/bash
old_path="/Users/uli/"
old_path=${old_path//\//\\\/}
new_path="USERDIR"
regex="s/$old_path/$new_path/g"
echo $old_path
echo $regex
sed -i $regex /Users/uli/Desktop/replacetarget.txt
But I get the same error. What am I doing wrong?
BSD sed requires an argument following -i (the empty string '' indicates no backup, similar to argumentless -i in GNU sed). As a result, your script is being treated as the backup-file extension, and your input file as the script.
old_path="/Users/uli/"
new_path="USERDIR"
sed -i '' "s#$old_path#$new_path#g" /Users/uli/Desktop/replacetarget.txt
However, sed is a stream editor, based on the file editor ed, so using -i is an indication you are using the wrong tool to begin with. Just use ed.
old_path="/Users/uli/"
new_path="USERDIR"
# The leading , addresses every line; a bare s would only act on the current (last) line.
printf ',s#%s#%s#g\nw\nq\n' "$old_path" "$new_path" | ed /Users/uli/Desktop/replacetarget.txt
Obligatory warning: neither editor is parameterized as such; you are simply generating the script dynamically, which means it's your responsibility to ensure that the resulting script is valid. (For example, if either parameter contains the delimiter #, it had better be escaped to prevent (s)ed from seeing it as the end of the pattern or replacement.)
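One way to take that responsibility is to escape both strings before building the script. The sketch below is a hedged illustration rather than the answer's method: escape_pattern and escape_replacement are hypothetical helpers, a temporary file stands in for the real target, and output is redirected instead of using -i because the -i syntax differs between GNU and BSD sed.

```shell
# Hypothetical helpers that make arbitrary strings safe to embed in s#...#...#.
escape_pattern() {      # escapes BRE metacharacters and the # delimiter
    printf '%s\n' "$1" | sed 's/[][\\.*^$#]/\\&/g'
}
escape_replacement() {  # escapes backslash, & and the # delimiter
    printf '%s\n' "$1" | sed 's/[\\&#]/\\&/g'
}

old_path="/Users/uli/"
new_path="USERDIR"

target=$(mktemp)        # stand-in for the real file
printf '%s\n' '/Users/uli/Desktop/replacetarget.txt' > "$target"

# Edit via a second temp file, then copy the result back.
out=$(mktemp)
sed "s#$(escape_pattern "$old_path")#$(escape_replacement "$new_path")#g" "$target" > "$out" &&
    cat "$out" > "$target"
rm -f "$out"

result=$(cat "$target")
rm -f "$target"
printf '%s\n' "$result"   # USERDIRDesktop/replacetarget.txt
```

Note that the trailing slash of old_path is part of the match, so it disappears from the output along with the rest of the prefix.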

grep command with a lookahead pattern does not select anything

I was trying to use the following grep command:
grep '(.*)(?=(png|html|jpg|js|css)(?:\s*))(png|html|jpg|js|css.*\s)' file
File contains the following:
http://manage.bostonglobe.com/GiftTheGlobe/LandingPage.html
https://manage.bostonglobe.com/cs/mc/login.aspx?p1=BGFooter
https://www.bostonglobe.com/bgcs
/newsletters?p1=BGFooter_Newsletters
https://bostonglobe.custhelp.com/app/home?p1=BGFooter
https://bostonglobe.custhelp.com/app/answers/list?p1=BGFooter
/tools/help/stafflist?p1=BGFooter
https://www.bostonglobemedia.com/
https://manage.bostonglobe.com/Order/newspaper/Newspaper.aspx
https://www.facebook.com/globe
https://twitter.com/#!/BostonGlobe
https://plus.google.com/108227564341535363126/about
https://epaper.bostonglobe.com/launch.aspx?pbid=2c60291d-c20c-4780-9829- b3d9a12687cf
http://nieonline.com/bostonglobe/
https://secure.pqarchiver.com/boston-sub/no_default.html?ss=1&url=%2Fboston-sub%2Fadvancedsearch.html
/tools/help/privacy?p1=BGFooter
/tools/help/terms-service?p1=BGFooter
/termsofpurchase?p1=BGFooter
https://www.bostonglobemedia.com/careers
/css/globe-print.css?v=19256I1935
//meter.bostonglobe.com/css/style.css
/css/globe-print.css?v=19256I1935
//cdn.blueconic.net/bostonglobemedia.js
/js/lib/rwd-images.js,lib/respond.min.js,lib/modernizr.custom.min.js,globe- define.js,globe-controller.js?v=19256I1935
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///ywAAAAAAQABAAACAUwAOw==
/js/lib/jquery.js,lib/lo-dash-custom-2.4.1.js,lib/a9.js,lib/pb.js,dist/ad- init.js,globe-newsletter.js,globe-profile-page.js,dist/globe-topic-nav.js,dist/rakuten.js?v=19256I1935
//dc8xl0ndzn2cb.cloudfront.net/js/bostonglobe/v0/keywee.min.js
For some reason it doesn't select anything from that file. I've tried different flags but can't seem to figure out what's wrong.
You are using a PCRE regex with the POSIX BRE engine, which is the default grep engine.
To make such patterns work, use the -P option (available in GNU grep):
grep -P 'YOUR_PCRE_PATTERN'
     ^^
To develop and test PCRE patterns, the well-known regex101.com is usually recommended.
Note that on macOS, you can install GNU grep via Homebrew.
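Where -P is not available, a script could probe for it and otherwise fall back to an ERE that needs no lookahead. The sketch below assumes the goal is to select lines that reference a .png, .html, .jpg, .js or .css file, which is only a guess at the asker's intent; has_pcre and assets are illustrative names, and the three sample lines are taken from the file shown in the question.

```shell
# Probe whether this grep understands -P (PCRE).
if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
    has_pcre=yes
else
    has_pcre=no
fi

# ERE fallback: one of the extensions, followed by ?, a comma, or end of line.
assets=$(printf '%s\n' \
    'https://www.facebook.com/globe' \
    '//meter.bostonglobe.com/css/style.css' \
    '/css/globe-print.css?v=19256I1935' |
    grep -E '\.(png|html|jpg|js|css)([?,]|$)')

printf 'PCRE available: %s\n' "$has_pcre"
printf '%s\n' "$assets"   # prints the two .css lines, not the facebook.com one
```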

SED command matches regex but does not substitute

I am working on building a .sed file to start scripting the setup of multiple Apache servers. I am trying to get sed to match the default webmaster email addresses in the .conf file, which works great with this egrep. However, when I use sed to try to do a search and replace, I get no errors back, but it also does not do any substituting. I test this by running the same egrep command again.
egrep -o '\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b' /home/test/httpd.conf
returns
admin@your-domain.com
root@localhost
webmaster@dummy-host.example.com
The sed command I'm trying to use is
sed -i '/ServerAdmin/ s/\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b/MY_ADMIN_ADDRESS@gmail.com/g' /home/test/httpd.conf
After running it, I try to verify the results by running the egrep again, and it returns the same three email addresses, indicating nothing was replaced.
Don't assume that any two tools use the same regular expression syntax. If you're going to be doing replacements with sed, use sed to test - not egrep. It's easy to use sed as if it were a grep command: sed -ne '/pattern/p'.
sed must be told to use extended regular expressions with the -r option (-E also works in current GNU and BSD sed). It has to be a separate option: -ir would be parsed as -i with the backup suffix r. The command then becomes:
sed -i -r '/ServerAdmin/ s/\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b/MY_ADMIN_ADDRESS@gmail.com/g' /home/test/httpd.conf
Many thanks to Kent for pointing out that the address it was missing wasn't following a ServerName.
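The advice to test with sed itself can be tried on a single line before touching a real config file. In this sketch the sample line and the replacement address are made up, and the pattern is deliberately simplified; it shows the same expression failing as a basic regex and matching as an extended one.

```shell
line='ServerAdmin root@localhost'   # made-up sample line

# Basic regex (the default): + is a literal plus sign, so nothing is printed.
bre=$(printf '%s\n' "$line" | sed -n '/[a-z]+@[a-z]+/p')

# Extended regex (-E, or -r in GNU sed): + means "one or more".
ere=$(printf '%s\n' "$line" | sed -E -n '/[a-z]+@[a-z]+/p')

# Once the pattern is known to match, switch from p to s///.
new=$(printf '%s\n' "$line" | sed -E '/ServerAdmin/ s/[a-z]+@[a-z]+/admin@example.org/')

printf 'BRE: [%s]\nERE: [%s]\nnew: [%s]\n' "$bre" "$ere" "$new"
# BRE: []
# ERE: [ServerAdmin root@localhost]
# new: [ServerAdmin admin@example.org]
```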

grep with regexp: whitespace doesn't match unless I add an assertion

GNU grep 2.5.4 on bash 4.1.5(1) on Ubuntu 10.04
This matches
$ echo "this is a line" | grep 'a[[:space:]]\+line'
this is a line
But this doesn't
$ echo "this is a line" | grep 'a\s\+line'
But this matches too
$ echo "this is a line" | grep 'a\s\+\bline'
this is a line
I don't understand why #2 does not match (whereas #1 does) and #3 also shows a match. What's the difference here?
Take a look at your grep manpage. Perl added a lot of regular expression extensions that weren't in the original specification. However, because they proved so useful, many programs adopted them.
Unfortunately, grep is sometimes stuck in the past because you want to make sure your grep command remains compatible with older versions of grep.
Some systems have egrep with some extensions. Others allow you to use grep -E to get them. Still others have a grep -P that allows you to use Perl extensions. I believe Linux systems' grep command can use the -P option, which is not available on most Unix systems unless someone has replaced grep with the GNU version. Older versions of Mac OS X, which shipped GNU grep, also supported the -P switch, but newer versions ship BSD grep, which does not.
By default grep doesn't support the Perl extensions to regular expressions, so try using -P to enable Perl-compatible regular expressions. With -P you don't need to escape the +, i.e.:
echo "this is a line" | grep -P 'a\s+line'
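If -P is not available, the POSIX character class from the first command in the question is the portable choice. A minimal sketch, using -E so the + needs no backslash (match is just an illustrative variable name):

```shell
# [[:space:]] is a POSIX character class, so it needs neither -P nor the \s extension.
match=$(echo "this is a line" | grep -E 'a[[:space:]]+line')
printf '%s\n' "$match"   # this is a line
```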