sed script extended regex - regex

I need to write multiple sed script files. I can't seem to find a way to enable extended regex from within the script. Is this possible? It isn't possible for me to use option flags because the scripts need to run on an external environment which isn't under my control.

You can try specifying the flag in the script shebang, say:
#!/bin/sed -rf
# script goes here
And then tell the admin to run the script as is (chmod a+x it first, then ./script.sed) so the shebang line is used for finding the right interpreter.
You may need to substitute /bin/sed with the right path for your environment. Unfortunately you probably won't be able to use /usr/bin/env sed -r for this (the extra -r is a problem).
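A minimal sketch of the shebang approach, assuming GNU sed (for -r) and a temporary directory that allows executing files. The script body and the sample input are made up for illustration; the sed path is looked up with command -v rather than hard-coded.

```shell
# Look up the sed path instead of assuming /bin/sed.
sed_path=$(command -v sed)

# Write a tiny sed script whose shebang enables extended regexes.
script=$(mktemp)
printf '#!%s -rf\ns/(a|b)+/X/g\n' "$sed_path" > "$script"
chmod +x "$script"

# Run the script directly so the kernel honours the shebang line.
result=$(printf 'aabbc\n' | "$script")
printf '%s\n' "$result"

rm -f "$script"
```

Because the kernel passes everything after the interpreter path as a single argument, -rf has to be written as one combined option.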

I think the answer to your question is "no", but, if this is GNU sed, then you probably don't really need extended regular expressions, because GNU sed's implementation of basic regular expressions actually supports the features of EREs that true POSIX BREs don't. Admittedly, the result is incredibly, painfully backslash-heavy — ERE's s/(a|b+|cd?)/e/g becomes BRE's s/\(a\|b\+\|cd\?\)/e/g — but it works.
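The equivalence can be checked with a short sketch, assuming GNU sed (the \|, \+ and \? operators in basic regexes are GNU extensions). The sample input is invented for the demonstration.

```shell
input='a bbb cd c x'

# Basic regex with GNU's backslashed operators.
bre=$(printf '%s\n' "$input" | sed 's/\(a\|b\+\|cd\?\)/e/g')

# The same expression written as an extended regex.
ere=$(printf '%s\n' "$input" | sed -E 's/(a|b+|cd?)/e/g')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

Both commands should print the same line, which shows that a script file can stay in BRE syntax and need no command-line flag.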

Related

How to scrub emails from all CSVs in a directory?

I have this regex that works fine enough for my purposes for identifying emails in CSVs within a directory using grep on Mac OS X:
grep --no-filename -E -o "\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
I've tried to get this working with sed so that I can replace the emails with foo#bar.baz:
sed -E -i '' -- 's/\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo#bar.baz/g' *
However, I can't seem to get it to work. Admittedly, sed and regex are not my strong points. Any ideas?
The sed bundled with OS X is BSD sed, which may not support GNU extensions such as \b. You can install GNU sed with Homebrew (Homebrew should not be run with sudo):
brew install gnu-sed
Homebrew installs it as gsed by default, so use this for the substitution:
gsed -E -i 's/\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo#bar.baz/g' *
You seem to assume that grep and sed support the same regex dialect, but that is not necessarily, or even usually, the case.
If you want a portable solution, you could easily use Perl for this, which however supports yet another regex dialect...
perl -i -p -e 's/\b[a-zA-Z0-9.-]+#[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo#bar.baz/g' *
For a bit of an overview of regex dialects, see https://stackoverflow.com/a/11857890/874188
Your regex kind of sucks, but I understand that is sort of beside the point here.

Script to remove characters from a file name for all files in a folder

So basically, I want to write a script that would be able to remove characters from a file name until it hits a letter. Ex. if I were to run it in a folder containing files:
13. abc
0 2 d ef
1.ghi3
It would rename the files to
abc
d ef
ghi3
Thanks
Try the following:
for f in *; do
echo mv "$f" "$(sed 's/^[^[:alpha:]]*//' <<<"$f")"
done
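For safety, the mv command is prefixed with echo; remove the echo to perform the actual renaming.
Note that the here-string (<<<) is a Bash feature (also available in ksh and zsh), not POSIX sh; the sed command itself uses only POSIX features.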
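If the loop has to run under a plain POSIX sh, one possible variant pipes the name into sed instead of using a here-string. This is a sketch only; the function names strip_leading_nonalpha and preview_renames are made up for illustration.

```shell
# Print the name with everything before the first letter removed.
strip_leading_nonalpha() {
    printf '%s\n' "$1" | sed 's/^[^[:alpha:]]*//'
}

# Dry run over the current directory: only prints the mv commands.
preview_renames() {
    for f in *; do
        new=$(strip_leading_nonalpha "$f")
        # Skip names that would not change or would become empty.
        if [ -n "$new" ] && [ "$new" != "$f" ]; then
            echo mv -- "$f" "$new"
        fi
    done
}
```

Calling preview_renames inside the target folder lists the planned renames; dropping the echo would perform them.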
Note that rename is NOT a POSIX utility, so you can:
neither rely on its presence,
nor rely on it to work the same across platforms.
An overview of popular platforms with respect to rename:
Debian-based platforms such as Ubuntu have a Perl-based rename utility:
It expects Perl statements, most notably s/// to perform substitutions based on regular expressions.
This is what Avinash Raj's answer relies on - a great option if available.
Dry-run support with -n
Fedora has an entirely different utility that comes from the util-linux package:
Supports replacement of literal substrings only.
NO dry-run support.
macOS has NO rename utility at all.
Via Homebrew you can install a Perl-based one (brew install rename) whose features are a superset of what the Perl-based implementation on Debian-based platforms offers.
You can use the (Perl-based) rename command:
rename 's/^[^a-zA-Z]+//' *

Copy and Rename Multiple Files with Regular Expressions in bash

I've got a file structure that looks like:
A/
2098765.1ext
2098765.2ext
2098765.3ext
2098765.4ext
12345.1ext
12345.2ext
12345.3ext
12345.4ext
B/
2056789.1ext
2056789.2ext
2056789.3ext
2056789.4ext
54321.1ext
54321.2ext
54321.3ext
54321.4ext
I need to rename all the files that begin with 20 to start with 10; i.e., I need to rename B/2022222.1ext to B/1022222.1ext
I've seen many of the other questions regarding renaming multiple files, but couldn't seem to make it work for my case. Just to see if I can figure out what I'm doing before I actually try to do the copy/renaming I've done:
for file in "*/20?????.*"; do
echo "{$file/20/10}";
done
but all I get is
{*/20?????.*/20/10}
Can someone show me how to do this?
You just have a little bit of incorrect syntax is all:
for file in */20?????.*; do mv "$file" "${file/20/10}"; done
Remove the quotes from the argument to in; otherwise, filename expansion does not occur.
The $ in the substitution should go before the opening brace.
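Note that ${file/20/10} replaces the first "20" anywhere in the path, so a directory name containing "20" would be changed instead of the file name. A more defensive sketch splits the path first and uses only POSIX parameter expansion; the function name rename_20_to_10 is hypothetical.

```shell
# Rename every */20* file under directory $1 so that it starts with 10.
# The body runs in a subshell so the caller's working directory is untouched.
rename_20_to_10() (
    cd "$1" || exit 1
    for file in */20*; do
        [ -e "$file" ] || continue      # glob matched nothing
        dir=${file%/*}                  # directory part, e.g. A
        base=${file##*/}                # file name, e.g. 2098765.1ext
        mv -- "$file" "$dir/10${base#20}"
    done
)
```

Replacing mv with echo mv gives a dry run that can be reviewed before anything is renamed.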
Here is a solution which use the find command:
find . -name '20*' | while IFS= read -r oldname; do echo mv "$oldname" "${oldname/20/10}"; done
This command does not actually do your bidding, it only prints out what should be done. Review the output and if you are happy, remove the echo command and run it for real.
Just wanna add to Explosion Pill's answer.
On OS X though, you must say
mv "${file}" "${file_expression}"
Or the mv command does not recognize it.
An expression like:
{*/20?????.*/20/10}
is not expanded inside quotes, and the $ has to come before the opening brace for the substitution to happen.
Instead, try this (with Perl rename):
rename 's|/20|/10|' */20*
You can do this using the Perl tool rename from the shell prompt. (There are other tools with the same name which may or may not be able to do this, so be careful.)
If you want to do a dry run to make sure you don't clobber any files, add the -n switch to the command.
note
If you run the following command (linux)
$ file $(readlink -f $(type -p rename))
and you have a result like
.../rename: Perl script, ASCII text executable
then this seems to be the right tool =)
This seems to be the default rename command on Ubuntu.
To make it the default on Debian and derivatives such as Ubuntu:
sudo update-alternatives --set rename /path/to/rename
The glob behavior of * is suppressed in double quotes. Try:
for file in */20?????.*; do
echo "${file/20/10}";
done

inotifywait - exclude regex pattern formatting

I am trying to use inotifywait to watch all .js files under my ~/js directory; how do I format my regex inside the following command?
$ inotifywait -m -r --exclude [REGEX HERE] ~/js
The regex, which according to the man page should be a POSIX extended regular expression, needs to match "all files except those that end in .js", so that these files can in turn be excluded by the --exclude option.
I've tried the (?!) lookaround thing, but it doesn't seem to work in this case. Any ideas or workarounds? Would much appreciate your help on this issue.
I've tried the (?!) thing
This thing is called negative lookahead and it is not supported by POSIX ERE.
So you have to do it the hard way, i.e. match everything that you want to exclude.
e.g.
\.(txt|xml) etc.
inotifywait has no include option and POSIX extended regular expressions don't support negation. (Answered by FailedDev)
You can patch the inotify tools to get an --include option. But you need to compile and maintain it yourself. (Answered by browndav)
A quicker workaround is using grep.
$ inotifywait -m -r ~/js | grep '\.js$'
But be aware of grep's buffering if you pipe the output to other commands. Add --line-buffered to make it work with while read. Here is an example:
$ inotifywait -m -r ~/js | grep '\.js$' --line-buffered |
while read path events file; do
echo "$events happened to $file in $path"
done
If you just want to watch already existing files, you can also use find to generate the list of files. It will not watch newly created files.
$ find ~/js -name '*.js' | xargs inotifywait -m
If all your files are in one directory, you can also use ostrokach's suggestion. In that case shell expansion is much easier than find and xargs. But again, it won't watch newly created files.
$ inotifywait -m ~/js/*.js
I posted a patch here that adds --include and --includei options that work like negations of --exclude and --excludei:
https://github.com/browndav/inotify-tools/commit/160bc09c7b8e78493e55fc9f071d0c5575496429
Obviously you'd have to rebuild inotifytools, and this is relatively untested, but hopefully it can make it in to mainline or is helpful to someone who comes across this post later.
Make sure you are quoting the regex command, if you are using shell-relevant characters (including ()).
While this is working:
inotifywait --exclude \.pyc .
this is not:
inotifywait --exclude (\.pyc|~) .
You have to quote the entire regular expression:
inotifywait --exclude '.*(\.pyc|~)' .
As of version 3.20.1, inotifywait does include the --include and --includei options.
To see them, run inotifywait --help. For some reason, they aren't documented in the manpages.
You could get most of this with --exclude '\.[^j][^s]' to ignore files unless they contain .js at some point in the filename or path. If you combine it with -r then it will work with arbitrary levels of nesting.
The drawbacks are that filenames like test.js.old will still be watched, as will all files inside a directory called example.js/, and that any extension starting with j or having s as its second character (for example .json or .css) also slips through.
You could probably extend this regex to fix this, but personally I don't think the drawbacks are a big enough deal to worry about.
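Since --exclude takes a POSIX extended regular expression, a candidate pattern can be tried against sample paths with grep -E before handing it to inotifywait. This is only an approximation of how inotifywait applies the pattern, and the helper name would_exclude and the sample paths are made up for illustration.

```shell
# Succeeds if ERE $1 matches path $2, i.e. the path would be excluded.
would_exclude() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

pattern='\.[^j][^s]'
for p in src/app.js src/notes.txt src/style.css src/data.json; do
    if would_exclude "$pattern" "$p"; then
        echo "excluded: $p"
    else
        echo "watched:  $p"
    fi
done
```

With this pattern only src/notes.txt is reported as excluded, so .css and .json files would still be watched alongside the .js files.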

What's the difference between sed -E and sed -e

I'm working on some old code and I found that I used to use
sed -E 's/findText/replaceWith/g' #findText would contain a regex
but I now try
sed -e 's/findText/replaceWith/g'
It seems to do the same thing, or does it?
I kind of remember there being a reason I did it, but I can't remember what it was, and "man sed" doesn't help: it has nothing about -E, only -e, and that description doesn't make much sense to me either.
-e, --expression=script
Append the editing commands in script to the end of
the editing command script. script may contain more
than one newline separated command.
I thought -e meant it would match with a regex...
GNU sed version 4.2.1
From source code, -E is an undocumented option for compatibility with BSD sed.
/* Undocumented, for compatibility with BSD sed. */
case 'E':
case 'r':
if (extended_regexp_flags)
usage(4);
extended_regexp_flags = REG_EXTENDED;
break;
And from manual, -E in BSD sed is used to support extended regular expressions.
From sed's documentation:
-E
-r
--regexp-extended
Use extended regular expressions rather than basic regular expressions. Extended regexps are those that egrep accepts; they can be clearer because they usually have fewer backslashes. Historically this was a GNU extension, but the -E extension has since been added to the POSIX standard (http://austingroupbugs.net/view.php?id=528), so use -E for portability. GNU sed has accepted -E as an undocumented option for years, and *BSD seds have accepted -E for years as well, but scripts that use -E might not port to other older systems. See Extended regular expressions.
Therefore it seems that -E should be the preferred way to declare that you are going to use (E)xtended regular expressions, rather than -r.
In contrast, -e just specifies that what follows is the script that you want to execute with sed (something like 's/bla/abl/g').
Also from the documentation:
Without -e or -f options, sed uses the first non-option parameter as the script, and the following non-option parameters as input files.
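A short sketch makes the difference visible: -e only adds a script fragment, while -E changes how the regex is interpreted. The sample inputs are invented for the demonstration.

```shell
# -e can be repeated; each fragment is appended to the script.
multi=$(printf 'foo bar\n' | sed -e 's/foo/X/' -e 's/bar/Y/')

# Without -E the + is a literal character in a basic regex.
bre=$(printf 'aaa+\n' | sed -e 's/a+/X/')

# With -E the + means "one or more of the preceding item".
ere=$(printf 'aaa+\n' | sed -E -e 's/a+/X/')

printf '%s\n%s\n%s\n' "$multi" "$bre" "$ere"
```

The three lines printed are X Y, aaX and X+, so the two options are independent and can be combined freely.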