What's the difference between sed -E and sed -e?

I'm working on some old code and I found that I used to use
sed -E 's/findText/replaceWith/g' #findText would contain a regex
but I now try
sed -e 's/findText/replaceWith/g'
It seems to do the same thing, or does it?
I vaguely remember there being a reason I did it that way, but I can't recall it, and "man sed" doesn't help: it has nothing about -E, only -e, and that description doesn't make much sense to me either.
-e, --expression=script
Append the editing commands in script to the end of
the editing command script. script may contain more
than one newline separated command.
I thought -e meant it would match with a regex...
GNU sed version 4.2.1

According to the source code, -E is an undocumented option for compatibility with BSD sed.
/* Undocumented, for compatibility with BSD sed. */
case 'E':
case 'r':
if (extended_regexp_flags)
usage(4);
extended_regexp_flags = REG_EXTENDED;
break;
And according to its manual, BSD sed uses -E to enable extended regular expressions.

From sed's documentation:
-E
-r
--regexp-extended
Use extended regular expressions rather than basic regular expressions. Extended regexps are those that egrep accepts; they can be clearer because they usually have fewer backslashes. Historically this was a GNU extension, but the -E extension has since been added to the POSIX standard (http://austingroupbugs.net/view.php?id=528), so use -E for portability. GNU sed has accepted -E as an undocumented option for years, and *BSD seds have accepted -E for years as well, but scripts that use -E might not port to other older systems. See Extended regular expressions.
Therefore it seems that -E should be the preferred way to declare that you are going to use (E)xtended regular expressions, rather than -r.
By contrast, -e just specifies that what follows is the script that you want to execute with sed (something like 's/bla/abl/g').
Also from the documentation:
Without -e or -f options, sed uses the first non-option parameter as the script, and the following non-option parameters as input files.
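To see the distinction in practice, here is a small sketch (the sample input aaa+b is made up for illustration). Both commands pass the script with -e; only the second also asks for extended regular expressions with -E:

```shell
# -e only names the script; the pattern is still a basic regular expression,
# where an unescaped + is a literal plus sign.
bre_out=$(printf 'aaa+b\n' | sed -e 's/a+/X/')

# Adding -E switches to extended regular expressions, where + means "one or more".
ere_out=$(printf 'aaa+b\n' | sed -E -e 's/a+/X/')

printf '%s\n%s\n' "$bre_out" "$ere_out"
# prints:
# aaXb
# X+b
```

So the two flags are not alternatives: -e says where the script is, -E says which regex dialect the script is written in.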

Related

Remove new lines except when preceded by specific set of characters

How can I remove new lines using Perl and / or Sed at the bash command line but avoiding a specific set of characters?
The closest I came from this is:
perl -C -i -p -e 's/[^.:]\n//' ~/Desktop/bak2
The above code correctly avoids joining lines that end with a dot or a colon, but it fails because when it removes a newline it also erases the last character of the line. I also need the removed \n to be replaced by a space.
It would be great, if possible, to have this solution in Perl and also in sed.
I've searched for a similar solution in Perl or sed and haven't found one; sorry if it already exists.
Examples:
Existing content:
Violets are blue and
Buda has great teachings.
Programming can be easy because:
Stackoverflow exists,
and the community always helps
a lot.
Desired output:
Violets are blue and Buda has great teachings.
Programming can be easy because:
Stackoverflow exists, and the community always helps a lot.
With sed
sed -e ':A;/[^.:]$/{N;bA' -e '};y/\n/ /' ~/Desktop/bak2
or gnu sed
sed -z 's/\([^.:]\)\n/\1 /g' ~/Desktop/bak2
You can capture the character before the newline and put it back (the ^ alternative also handles empty lines):
perl -C -i -p -e 's/(^|[^.:])\n/$1/' ~/Desktop/bak2
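or use a positive lookbehind:
perl -C -i -p -e 's/(?<=[^.:])\n//' ~/Desktop/bak2
or use \K, which keeps the character matched before it and replaces only the newline, here with a space:
perl -i -pe 's/[^.:]\K\n/ /' ~/Desktop/bak2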
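As a quick check of the sed loop approach against the sample text from the question, here is a minimal sketch. The function name join_lines is made up for this example, and it reads standard input rather than ~/Desktop/bak2; each sed command is passed in its own -e so that the script does not rely on how a particular sed parses ; and } after a label.

```shell
# Join each line onto the next one until a line ends with "." or ":".
join_lines() {
  sed -e ':A' -e '/[^.:]$/{' -e 'N' -e 'bA' -e '}' -e 'y/\n/ /'
}

printf '%s\n' \
  'Violets are blue and' \
  'Buda has great teachings.' \
  'Programming can be easy because:' \
  'Stackoverflow exists,' \
  'and the community always helps' \
  'a lot.' | join_lines
```

The loop appends the next line with N while the pattern space does not end in a dot or colon, and the final y command turns the embedded newlines into spaces, so no character before the newline is lost.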

simple SED replace

Just attempting to write a script to do a simple regex replace in php.ini, what I want to do is replace the line ;cgi.fix_pathinfo=1 with cgi.fix_pathinfo=0.
Ideally I want to avoid installing any additional packages, so sed seems a logical choice since it is bundled with FreeBSD. I have tried the following, but it doesn't seem to work:
sed 's/;cgi\.fix_pathinfo=1/cgi\.fix_pathinfo=0/' /usr/local/etc/php.ini
To change the content of a file in place with BSD sed, you can do this:
sed -i.bak -e 's/;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/;' /usr/local/etc/php.ini
That creates a copy of the old file with a .bak extension.
Or without creating a copy:
sed -i '' -e 's/;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/;' /usr/local/etc/php.ini
Note that in this case, a space and an empty string enclosed between quotes are mandatory. You can't simply write sed -i -e '... like with GNU sed.
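A minimal sketch of the backup form, run against a throwaway file instead of the real /usr/local/etc/php.ini (the temporary file and the variable names are assumptions for the demo). The -i.bak spelling, with the suffix attached directly to -i, is accepted by both BSD sed and GNU sed, which makes it the safer choice for a script that may run on either:

```shell
# Stand-in for php.ini so the demo does not touch a real config file.
tmp=$(mktemp)
printf ';cgi.fix_pathinfo=1\n' > "$tmp"

# Edit in place, keeping the original as "$tmp.bak".
sed -i.bak -e 's/^;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/' "$tmp"

result=$(cat "$tmp")        # edited content
backup=$(cat "$tmp.bak")    # untouched original
printf 'edited: %s\nbackup: %s\n' "$result" "$backup"

rm -f "$tmp" "$tmp.bak"
```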

How to scrub emails from all CSVs in a directory?

I have this regex that works fine enough for my purposes for identifying emails in CSVs within a directory using grep on Mac OS X:
grep --no-filename -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
I've tried to get this working with sed so that I can replace the emails with foo@bar.baz:
sed -E -i '' -- 's/\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo@bar.baz/g' *
However, I can't seem to get it to work. Admittedly, sed and regex are not my strong points. Any ideas?
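The sed bundled with OS X is BSD sed, which (at least in older releases) does not support \b as a word boundary the way GNU sed does, so the pattern does not match as intended. One option is to install GNU sed with Homebrew. Run brew without sudo, and note that current Homebrew installs the binary as gsed unless you put its gnubin directory first on your PATH:
brew install gnu-sed
and use it for the substitution (GNU sed takes -i without the empty '' argument):
sed -E -i 's/\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo@bar.baz/g' *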
You seem to assume that grep and sed support the same regex dialect, but that is not necessarily, or even usually, the case.
If you want a portable solution, you could easily use Perl for this, which however supports yet another regex dialect...
perl -i -p -e 's/\b[a-zA-Z0-9.-]+\@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo\@bar.baz/g' *
For a bit of an overview of regex dialects, see https://stackoverflow.com/a/11857890/874188
Your regex kind of sucks, but I understand that is sort of beside the point here.
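If installing another sed or switching to Perl is not an option, one possible workaround is simply to drop \b. This is a sketch, not a tested fix for every CSV: it assumes addresses use the usual @ separator, and the function name scrub_emails is invented for the example. Because each character class is greedy, the match already extends to the first character outside the class, so the word-boundary anchors are not strictly needed here:

```shell
# Replace anything that looks like an email address with a placeholder.
# Uses only ERE features that both BSD sed and GNU sed accept with -E.
scrub_emails() {
  sed -E 's/[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+/foo@bar.baz/g'
}

printf 'id,email\n1,alice@example.com\n' | scrub_emails
# prints:
# id,email
# 1,foo@bar.baz
```

To apply it to files in place on OS X, the same expression can be passed to sed -E -i '' as in the question.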

sed command not working properly in solaris but working in linux

I have the following string, which is then used in a sed command. It works properly on Linux but NOT on Solaris.
-bash-3.00$ string="CREATESETTABLEDATABASE1.TABLE1(uid)CREATESETTABLEDATABASE1.TABLENAMEuid,cid,mid)DATABASE2.TABLENAME(hi,hello)"
On the Linux box, it outputs correctly, as below.
echo $string | sed -e 's/.*CREATESETTABLE[^)]\+TABLENAME\(.*\)/\1/g'
uid,cid,mid)DATABASE2.TABLENAME(hi,hello)
On Solaris, the sed search does not work: it returns the full string regardless of whether the search pattern matches.
echo $string | sed -e 's/.*CREATESETTABLE[^)]\+TABLENAME\(.*\)/\1/g'
CREATESETTABLEDATABASE1.TABLE1(uid)CREATESETTABLEDATABASE1.TABLENAMEuid,cid,mid)DATABASE2.TABLENAME(hi,hello)
I want the same output to be printed in solaris.
I believe \+ doesn't work in older sed versions; even BSD sed does not support it. Try this sed:
sed -e 's/.*CREATESETTABLE[^)]*TABLENAME\(.*\)/\1/g'
POSIX sed supports only BRE (basic regular expressions) in which + has no special meaning.
One important oddity (relatively speaking) about BREs is that () and {} require \-escaping in order to gain their special meaning. Those characters, and only those characters, require such escaping. The opposite is required in contemporary (ERE) expressions, \-escaping them is required to disable their special meaning.
The behaviour of an escaped (\) non-special character in a BRE is undefined by the specification.
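The escaping rule can be illustrated with a pair of equivalent substitutions (the input aab is an arbitrary example). The first uses BRE syntax, which any POSIX sed should accept; the second uses ERE syntax and needs -E (or -r on older GNU sed):

```shell
# BRE: grouping and intervals need backslashes to become special.
bre=$(printf 'aab\n' | sed -e 's/\(a\{2\}\)b/[\1]/')

# ERE: the same operators are special without backslashes.
ere=$(printf 'aab\n' | sed -E -e 's/(a{2})b/[\1]/')

printf '%s\n%s\n' "$bre" "$ere"
# prints:
# [aa]
# [aa]
```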
Your problem stems from the fact that \+ (along with \?, and \| within \(\)) is a GNU extension.
These BRE extensions preserve the convention of a \ prefix, but when GNU sed is given the option -r it will enable ERE (extended regular expressions) in which + has its modern meaning (equivalent to {1,}) and the requirement for the extra \ is removed. Similarly, standard BREs have no special meaning for ? (or \?, equivalent to {0,1}), this feature is also enabled with -r.
If you use GNU sed's --posix option, the various GNU extensions are disabled and your scripts should in general be more portable (though perhaps more convoluted). One caveat: prior to GNU sed 4.2 (April 2009), --posix did not disable all the BRE extensions, so make sure to use an up-to-date version so that non-POSIX features don't creep in.
The most portable way to achieve what you want is with \{1,\} (--posix is itself a GNU-only option, used here to check the script on Linux; leave it out on Solaris):
echo $string | sed --posix -e 's/.*CREATESETTABLE[^)]\{1,\}TABLENAME\(.*\)/\1/g'
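For reference, here is the interval form run as a self-contained snippet, without the GNU-only --posix flag so that it can be pasted on either system. The variable name extracted is just for the demo, and printf is used instead of echo to avoid differences in how echo treats its argument:

```shell
string='CREATESETTABLEDATABASE1.TABLE1(uid)CREATESETTABLEDATABASE1.TABLENAMEuid,cid,mid)DATABASE2.TABLENAME(hi,hello)'

# \{1,\} is the standard BRE spelling of "one or more".
extracted=$(printf '%s\n' "$string" |
  sed -e 's/.*CREATESETTABLE[^)]\{1,\}TABLENAME\(.*\)/\1/')

printf '%s\n' "$extracted"
# prints: uid,cid,mid)DATABASE2.TABLENAME(hi,hello)
```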

Why does find -regex not accept my regex?

I want to select some files that are matching a regular expression.
Files are for example:
4510-88aid-50048-INA.txt
4510-88nid-50048-INA.txt
xxxx-05xxx-xxxxx-INA.txt
I want all files that match this regex:
.*[\w]{4}-05(?!aid)[\w]{3}-[\w]{5}-INA\.txt
In my opinion this should match xxxx-05xxx-xxxxx-INA.txt in the case above.
Using a tool like RegexTester, everything works perfectly.
Using find -regex from bash doesn't seem to work for me.
My question is, why?
I can't figure it out, I am using:
find /some/path -regex ".*[\w]{4}-05(?!aid)[\w]{3}-[\w]{5}-INA\.txt" -exec echo {} \;
But nothing is printed... Any ideas?
$ uname -a
Linux debmu838 2.6.5-7.321-smp #1 SMP Mon Nov 9 14:29:56 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux
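With bash 4+ and Perl (put shopt -s globstar in your .profile so that ** recurses). Because ls prints the full path, the pattern is anchored at the start of the file name rather than at the start of the line:
ls /some/path/**/*.txt | perl -nle 'print if m#(?:^|/)[\w]{4}-05(?!aid)[\w]{3}-[\w]{5}-INA\.txt$#'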
According to the find man page, find's -regex uses Emacs regular expressions by default. And according to http://www.regular-expressions.info/refflavors.html, Emacs is GNU ERE, which does not support lookarounds.
You can try a different -regextype as @l0b0 suggested, but the POSIX flavours do not seem to support this feature either.
I pretty much echo the other answers: find's -regex switch can't emulate everything in Perl's regexes. However, here's something you can try...
Take a look at the find2perl command. That program can take a typical find statement, and give you a Perl program equivalent for it. I don't believe -regex is recognized by find2perl (It's not in the standard Unix find, but only in the GNU find), but you can simply use -name, and then see the program it generates. From there, you can modify the program to use the Perl expressions you want in your regex. In the end, you'll get a small Perl script that will do the file directory find you want.
Otherwise, try using -regextype posix-extended, which covers most of Perl's regular expression syntax. You can't use lookarounds, but you can probably find something that does work.
What you've got looks like a Perl regex. Try with a different -regextype, and tweak the regex accordingly:
Changes the regular expression syntax
understood by -regex and -iregex
tests which occur later on the command
line. Currently-implemented types are
emacs (this is the default),
posix-awk, posix-basic, posix-egrep
and posix-extended.
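One way to get the effect of the (?!aid) lookahead without lookaround support is to combine a positive -regex test with a negated one. The sketch below assumes GNU find (for -regextype); the function name find_ina and the exact character classes are choices made for this example, with [[:alnum:]_] standing in for \w:

```shell
# List files named like xxxx-05xxx-xxxxx-INA.txt, except those where
# the three characters after "05" are "aid".
find_ina() {
  find "$1" -type f -regextype posix-extended \
    -regex '.*/[[:alnum:]_]{4}-05[[:alnum:]_]{3}-[[:alnum:]_]{5}-INA\.txt' \
    ! -regex '.*/[^/]{4}-05aid-[^/]*'
}

# Usage: find_ina /some/path
```

Note that -regex matches against the whole path, which is why both patterns start with .*/ and that -regextype has to appear before the -regex tests it should affect.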
Try this:
ls ????-??aid-?????-INA.txt
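Try a simple script like this:
#!/bin/bash
# Print every *INA.txt file whose name prefix matches the dash-separated pattern.
for file in *INA.txt
do
match=$(echo "${file%INA.txt}" | sed -r 's/^\w{4}-\w{5}-\w{5}-$/found/')
# Quote the variable so an empty or multi-word value does not break the test.
[ "$match" = "found" ] && echo "$file"
done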