How to use PCRE Regex in SED on Mac OSX


I have the RegEx (\[!\[|!\[)(.*) that works on http://regexr.com/, however when I attempt to use it with sed like this -e 's/(\[!\[|!\[)\(.*\)//g' it does not work.
I have found an answer that suggests using the -r option instead of -e; however, I am using Mac OSX El Capitan, and its sed does not support -r.
I have also found an answer that says to use -E instead of -r on Mac OSX, but this did not work, and a commenter said they are not the same thing. It was also suggested to use grep instead of sed, but I am adding this search and replace to several others that already use sed.
My code block looks like this and the search and replace in question is at the end of the sed... line:
# Transform the readme
if [ -f readme.md ]; then
mv readme.md readme.txt
if [ -f CHANGELOG.md ]; then
cat CHANGELOG.md >> readme.txt
rm CHANGELOG.md
fi
sed -i '' -e 's/^# \(.*\)$/=== \1 ===/' -e 's/ #* ===$/ ===/' -e 's/^## \(.*\)$/== \1 ==/' -e 's/ #* ==$/ ==/' -e 's/^### \(.*\)$/= \1 =/' -e 's/ #* =$/ =/' -e 's/\*\*//g' -e 's/(\[!\[|!\[)\(.*\)//g' readme.txt
fi
For example, I want to completely remove the second line, which starts with [![, from:
=== CMB2 Admin Extension ===
[![Build Status](https://travis-ci.org/twoelevenjay/CMB2-Admin-Extension.svg?branch=master)](https://travis-ci.org/twoelevenjay/CMB2-Admin-Extension)
Contributors: twoelevenjay
I also want it to remove lines that start with ![ like this:
![CMB2](https://plugins.trac.wordpress.org/export/HEAD/cmb2/assets/banner-1544x500.png)

I want to completely remove the second line that starts with [![:
On OSX sed this should work for you:
sed -E '/^(\[!\[|!\[)/d'
You don't need to use a substitution; just /d would suffice.
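A quick way to check this without touching readme.txt is to pipe a few sample lines through the same expression. This is only a sketch: the sample lines come from the question, and the variable names input and output are made up for the illustration.

```shell
# Sample input from the question: a heading, a badge line, an image line, a normal line.
input='=== CMB2 Admin Extension ===
[![Build Status](https://travis-ci.org/twoelevenjay/CMB2-Admin-Extension.svg?branch=master)](https://travis-ci.org/twoelevenjay/CMB2-Admin-Extension)
![CMB2](https://plugins.trac.wordpress.org/export/HEAD/cmb2/assets/banner-1544x500.png)
Contributors: twoelevenjay'

# -E enables extended regular expressions, so ( ) and | need no backslashes.
# /pattern/d deletes every line that matches the pattern.
output=$(printf '%s\n' "$input" | sed -E '/^(\[!\[|!\[)/d')

printf '%s\n' "$output"
```

Note that -E applies to every expression in the command, so the existing 's/^# \(.*\)$/.../' expressions would change meaning if -E were simply added to the long sed line. Assuming the rest of the command stays in basic-regex form, two plain addresses such as -e '/^\[!\[/d' -e '/^!\[/d' should do the same job without -E.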

Related

How to batch rename files based off a pattern in bash or linux command line [duplicate]

Objective
Change these filenames:
F00001-0708-RG-biasliuyda
F00001-0708-CS-akgdlaul
F00001-0708-VF-hioulgigl
to these filenames:
F0001-0708-RG-biasliuyda
F0001-0708-CS-akgdlaul
F0001-0708-VF-hioulgigl
Shell Code
To test:
ls F00001-0708-*|sed 's/\(.\).\(.*\)/mv & \1\2/'
To perform:
ls F00001-0708-*|sed 's/\(.\).\(.*\)/mv & \1\2/' | sh
My Question
I don't understand the sed code. I understand what the substitution
command
$ sed 's/something/mv'
means. And I understand regular expressions somewhat. But I don't
understand what's happening here:
\(.\).\(.*\)
or here:
& \1\2/
The former, to me, just looks like it means: "a single character,
followed by a single character, followed by any length sequence of a
single character"--but surely there's more to it than that. As far as
the latter part:
& \1\2/
I have no idea.

First, I should say that the easiest way to do this is to use the
prename or rename commands.
On Ubuntu, OSX (Homebrew package rename, MacPorts package p5-file-rename), or other systems with perl rename (prename):
rename s/0000/000/ F0000*
or on systems with rename from util-linux-ng, such as RHEL:
rename 0000 000 F0000*
That's a lot more understandable than the equivalent sed command.
But as for understanding the sed command, the sed manpage is helpful. If
you run man sed and search for & (using the / command to search),
you'll find it's a special character in s/foo/bar/ replacements.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If success‐
ful, replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
Therefore, \(.\) matches the first character, which can be referenced by \1.
Then . matches the next character, which is always 0.
Then \(.*\) matches the rest of the filename, which can be referenced by \2.
The replacement string puts it all together using & (the original
filename) and \1\2 which is every part of the filename except the 2nd
character, which was a 0.
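To see the pieces in action, one filename from the question can be run through the substitution on its own, without ls or sh. A small sketch; the variable names name and cmd are just for illustration.

```shell
# One of the filenames from the question.
name='F00001-0708-RG-biasliuyda'

# \(.\)   captures the first character as \1
# .       matches the second character, which is not captured and so gets dropped
# \(.*\)  captures everything else as \2
# &       in the replacement stands for the whole matched text (the original name)
cmd=$(printf '%s\n' "$name" | sed 's/\(.\).\(.*\)/mv & \1\2/')

printf '%s\n' "$cmd"
```

This prints the mv command that would otherwise be piped to sh, which makes it easy to inspect before anything is renamed.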
This is a pretty cryptic way to do this, IMHO. If for
some reason the rename command was not available and you wanted to use
sed to do the rename (or perhaps you were doing something too complex
for rename?), being more explicit in your regex would make it much
more readable. Perhaps something like:
ls F00001-0708-*|sed 's/F0000\(.*\)/mv & F000\1/' | sh
Being able to see what's actually changing in the
s/search/replacement/ makes it much more readable. Also it won't keep
sucking characters out of your filename if you accidentally run it
twice or something.

You've had your sed explanation; now you can use just the shell, with no need for external commands:
for file in F0000*
do
echo mv "$file" "${file/#F0000/F000}"
# ${file/#F0000/F000} means replace the pattern that starts at beginning of string
done

I wrote a short post with examples of batch renaming using sed a couple of years ago:
http://www.guyrutenberg.com/2009/01/12/batch-renaming-using-sed/
For example:
for i in *; do
mv "$i" "$(echo "$i" | sed "s/regex/replace_text/")";
done
If the regex contains groups (e.g. \(subregex\)), then you can use them in the replacement text as \1, \2, etc.

The easiest way would be:
for i in F00001*; do mv "$i" "${i/F00001/F0001}"; done
or, portably,
for i in F00001*; do mv "$i" "F0001${i#F00001}"; done
This replaces the F00001 prefix in the filenames with F0001.
credits to mahesh here: http://www.debian-administration.org/articles/150
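The portable form can be tried on a single name as a dry run first. A minimal sketch, with the variable names name, rest and new invented for the example:

```shell
name='F00001-0708-CS-akgdlaul'

# ${name#F00001} removes the shortest match of the pattern F00001 from the front.
# This is plain POSIX parameter expansion, so it does not need bash.
rest=${name#F00001}
new="F0001$rest"

# Dry run: print the command instead of executing it.
printf 'mv %s %s\n' "$name" "$new"
```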

The sed command
s/\(.\).\(.*\)/mv & \1\2/
means to replace:
\(.\).\(.*\)
with:
mv & \1\2
just like a regular sed command. However, the parentheses, & and \n markers change it a little.
The search string matches (and remembers as pattern 1) the single character at the start, followed by a single character, followed by the rest of the string (remembered as pattern 2).
In the replacement string, you can refer to these matched patterns to use them as part of the replacement. You can also refer to the whole matched portion as &.
So what that sed command is doing is creating a mv command based on the original file (for the source) and character 1 and 3 onwards, effectively removing character 2 (for the destination). It will give you a series of lines along the following format:
mv F00001-0708-RG-biasliuyda F0001-0708-RG-biasliuyda
mv abcdef acdef
and so on.

Using perl rename (a must have in the toolbox):
rename -n 's/0000/000/' F0000*
Remove -n switch when the output looks good to rename for real.
There are other tools with the same name which may or may not be able to do this, so be careful.
The rename command that is part of the util-linux package, won't.
If you run the following command (GNU)
$ rename
and you see perlexpr, then this seems to be the right tool.
If not, to make it the default (usually already the case) on Debian and derivatives like Ubuntu:
$ sudo apt install rename
$ sudo update-alternatives --set rename /usr/bin/file-rename
For archlinux:
pacman -S perl-rename
For RedHat-family distros:
yum install prename
The 'prename' package is in the EPEL repository.
For Gentoo:
emerge dev-perl/rename
For *BSD:
pkg install gprename
or p5-File-Rename
For Mac users:
brew install rename
If you don't have this command with another distro, search your package manager to install it or do it manually:
cpan -i File::Rename
Old standalone version can be found here
man rename
This tool was originally written by Larry Wall, the creator of Perl.

The backslash-paren stuff means, "while matching the pattern, hold on to the stuff that matches in here." Later, on the replacement text side, you can get those remembered fragments back with "\1" (first parenthesized block), "\2" (second block), and so on.

If all you're really doing is removing the second character, regardless of what it is, you can do this:
s/.//2
but your command is building a mv command and piping it to the shell for execution.
This is no more readable than your version:
find -type f | sed -n 'h;s/.//4;x;s/^/mv /;G;s/\n/ /g;p' | sh
The fourth character is removed because find is prepending each filename with "./".
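Both commands can be checked on a single name without running find or sh. The sketch below feeds in one path of the kind find would print; the variable names short and long are made up, and the comments spell out what each step of the hold-space version does.

```shell
# s/.//2 replaces the second match of "." (any character) with nothing,
# i.e. it deletes the second character of the line.
short=$(printf '%s\n' 'F00001-0708-RG-biasliuyda' | sed 's/.//2')

# The hold-space version, fed one find-style path:
#   h         copy the line to the hold space
#   s/.//4    delete the 4th character (the 2nd one after the leading "./")
#   x         swap: pattern space = original name, hold space = new name
#   s/^/mv /  prefix the original name with "mv "
#   G         append a newline plus the hold space (the new name)
#   s/\n/ /g  turn that newline into a space
#   p         print the finished command (-n suppresses all other output)
long=$(printf '%s\n' './F00001-0708-RG-biasliuyda' |
  sed -n 'h;s/.//4;x;s/^/mv /;G;s/\n/ /g;p')

printf '%s\n%s\n' "$short" "$long"
```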

Here's what I would do:
for file in *.[Jj][Pp][Gg] ;do
echo mv -vi \"$file\" `jhead $file|
grep Date|
cut -b 16-|
sed -e 's/:/-/g' -e 's/ /_/g' -e 's/$/.jpg/g'` ;
done
Then if that looks ok, add | sh to the end. So:
for file in *.[Jj][Pp][Gg] ;do
echo mv -vi \"$file\" `jhead $file|
grep Date|
cut -b 16-|
sed -e 's/:/-/g' -e 's/ /_/g' -e 's/$/.jpg/g'` ;
done | sh

for i in *; do mv -- "$i" "$(echo "$i" | sed 's/AAA/BBB/')"; done

The parentheses capture particular strings for use by the backslashed numbers.

ls F00001-0708-*|sed 's|^F0000\(.*\)|mv & F000\1|' | bash

Some examples that work for me:
$ tree -L 1 -F .
.
├── A.Show.2020.1400MB.txt
└── Some Show S01E01 the Loreming.txt
0 directories, 2 files
## remove "1400MB" (I: ignore case) ...
$ for f in *; do mv 2>/dev/null -v "$f" "`echo $f | sed -r 's/.[0-9]{1,}mb//I'`"; done;
renamed 'A.Show.2020.1400MB.txt' -> 'A.Show.2020.txt'
## change "S01E01 the" to "S01E01 The"
## \U& : change (here: regex-selected) text to uppercase;
## note also: no need here for `\1` in that regex expression
$ for f in *; do mv 2>/dev/null "$f" "`echo $f | sed -r "s/([0-9] [a-z])/\U&/"`"; done
$ tree -L 1 -F .
.
├── A.Show.2020.txt
└── Some Show S01E01 The Loreming.txt
0 directories, 2 files
$
2>/dev/null suppresses extraneous output (warnings ...)
reference [this thread]: https://stackoverflow.com/a/2372808/1904943
change case: https://www.networkworld.com/article/3529409/converting-between-uppercase-and-lowercase-on-the-linux-command-line.html

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'

It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.
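Since the question asks for both current_count and total_count on one line, the same idea can be stretched to two capture groups. This is a sketch under the assumption that current_count always comes before total_count on a line, as in the sample; the variable names line and counts are made up.

```shell
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# -n plus the p flag prints only lines where the substitution succeeded.
# -E turns on extended regexes so ( ) and + work unescaped;
# [0-9] stands in for \d, which sed does not support.
counts=$(printf '%s\n' "$line" |
  sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p")

printf '%s\n' "$counts"
```

If the two fields can appear in either order, two separate substitutions (or a real parser for the data) would be a safer choice.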

For me this looks like some sort of serialized Python data. Basically I would try to find out the origin of that data and parse it properly.
However, while being hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt

sed string replace is giving some kind of warning?

I am using sed with grep command to replace a string. Old string is in 8 files at home location and I want to replace all of these with new string. I am using this:
#! /bin/bash
read oldstring
read newstring
sed -i -e 's/'Soldstring'/'$newstring'/' grep "$oldstring" /home/*
Now this command works but I am getting a warning:
sed: can't read grep: No such file or directory
sed: can't read oldstring: No such file or directory
Any ideas?

You probably wanted
sed -i -e "s|$oldstring|$newstring|" $(grep -l "$oldstring" /home/*)
However, that form is unsafe, because file names containing spaces get split. Better to use xargs:
grep -l "$oldstring" /home/* | xargs sed -i -e "s|$oldstring|$newstring|"
Another option, if your shell supports it, is to store the file names in an array:
readarray -t files < <(exec grep -l "$oldstring" /home/*)
sed -i -e "s|$oldstring|$newstring|" "${files[@]}"

You are not executing grep, you are giving it as a parameter to sed.

are you missing backticks?
sed -i -e 's/'Soldstring'/'$newstring'/' `grep "$oldstring" /home/*`

sed -i -e "s/$oldstring/$newstring/g" `grep -l "$oldstring" /home/*`

Just in order to clearly point out the various typos in your code:
#! /bin/bash
# ^
# extra space here (not really an error I think -- but unusual)
read oldstring
read newstring
sed -i -e 's/'Soldstring'/'$newstring'/' grep "$oldstring" /home/*
#             ^                          ^                        ^
# `S` instead of `$` here                |                        |
#                                        here               and there
#                                            missing backticks (`)
As a side note, I suggest backticks above, but, since you are using bash, the syntax $(grep ....) is probably better than the classic Bourne Shell syntax `grep ....`. Finally, as suggested by konsolebox, "command nesting" might be unsafe, for example, in this case, if some file names contain spaces.
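To illustrate the point about spaces in file names, here is a sketch that reads the grep -l output line by line instead of nesting commands. Everything in it is hypothetical scaffolding: a temporary directory stands in for /home, the strings old and new stand in for whatever read would supply, and the temp-file-plus-mv step is used only to sidestep the different -i syntax of GNU and BSD sed. The old string is still treated as a regex, and a | inside either string would break the s|...|...| command.

```shell
# Hypothetical sandbox: a temp directory with one file whose name contains a space.
dir=$(mktemp -d)
printf 'alpha old beta\n' > "$dir/with space.txt"
printf 'nothing here\n'   > "$dir/other.txt"

oldstring='old'
newstring='new'

# grep -l lists matching files one per line; `while IFS= read -r` keeps spaces
# in each name intact (names containing newlines would still break this).
grep -l -- "$oldstring" "$dir"/* | while IFS= read -r f; do
  # Write to a temp file and move it back instead of relying on sed -i.
  sed "s|$oldstring|$newstring|g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat "$dir/with space.txt")
untouched=$(cat "$dir/other.txt")
printf '%s\n' "$result"
rm -r "$dir"
```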

Regex with sed to parse archive name

I'd like to parse different kinds of Java archive with the sed command line tool.
Archives can have the following extensions:
.jar, .war, .ear, .esb
What I'd like to get is the name without the extension, e.g. for Foobar.jar I'd like to get Foobar.
This seems fairly simple, but I cannot come up with a solution that works and is also robust.
I tried something along the lines of sed s/\.+(jar|war|ear|esb)$//, but could not make it work.

You were nearly there:
sed -E 's/\.+(jar|war|ear|esb)$//' file
Just needed to add the -E flag to sed to interpret the expression. And of course, respect the sed 's/something/new/' syntax.
Test
$ cat a
aaa.jar
bb.war
hello.ear
buuu.esb
hello.txt
$ sed -E 's/\.+(jar|war|ear|esb)$//' a
aaa
bb
hello
buuu
hello.txt
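The same expression can be wrapped in a small function for use in scripts. A sketch only: strip_archive_ext is a made-up helper name, and it uses \. (exactly one dot) rather than \.+, on the assumption that names have a single dot before the extension.

```shell
# strip_archive_ext is a hypothetical helper, not a standard command.
strip_archive_ext() {
  # \. matches one literal dot, the alternation lists the four extensions,
  # and $ anchors the match to the end of the name.
  printf '%s\n' "$1" | sed -E 's/\.(jar|war|ear|esb)$//'
}

# Names with other extensions, or with the extension in the middle, pass through unchanged.
for f in Foobar.jar app.war lib.esb notes.txt my.jar.bak; do
  printf '%s -> %s\n' "$f" "$(strip_archive_ext "$f")"
done
```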

Using sed:
s='Foobar.jar'
sed -r 's/\.(jar|war|ear|esb)$//' <<< "$s"
Foobar
OR better do it in BASH itself:
echo "${s/.[jwe]ar/}"
Foobar

You need to escape the | and the ( ) if you do not add an option like -r or -E, and you also need to quote the expression with ':
echo "test.jar" | sed 's/\.\(jar\|war\|ear\|esb\)$//'
test
The + is also not needed, since you normally have only one .

On traditional UNIX (tested with AIX/KSH):
File='Foobar.jar'
echo "${File%.*}"
From a list containing only your kinds of file:
YourList | sed 's/\....$//'
From a list of all kinds of file:
YourList | sed -n 's/\.[jew]ar$//p
t
s/\.esb$//p'

Renaming files with sed, escaping issues

I'm trying to write a bash script to remove spaces, underscores and dots and replace them with dashes. I also set to lowercase and remove brackets. That's the (long) second sed command, which seems to work.
The first sed call escapes the original names with spaces with '\ ' like when I tab complete, and this is the issue I think.
If I replace 'mv -i' with 'echo' I get what I think I want: the original filename escaped with backslashes and then the new name. If I paste this into the terminal it works, but with mv in the script the spaces cause problems. The escaping doesn't work.
#!/bin/bash
for a in "$@"; do
mv -i $(echo "$a" | sed -e 's/ /\\\ /g') $(echo "$a" | sed -e 's/\(.*\)/\L\1/' -e 's/_/-/g' -e 's/ /-/g' -e 's/---/--/g' -e 's/(//g' -e 's/)//g' -e 's/\[//g' -e 's/\]//g' -e 's/\./-/g' -e 's/-\([^-]*\)$/\.\1/')
done
The other solution is to put quotes around the names, but I can't work out how I would do this. I feel like I've got close, but I'm stumped.
I've also considered the 'rename' command, but you cannot do multiple operations like you can with sed.
Please point out any other issues; this is one of my first scripts. I'm not sure I got the "$@" or "$a" bits completely correct.
Cheers.
edit:
sample input filename
I am a Badly [named] (file) - PLEASE.rename_me.JPG
should become
i-am-a-badly-named-file--please-rename-me.jpg
edit2: my solution, tweaked from gniourf_gniourf's really helpful pure bash answer:
#!/bin/bash
for a in "$@"; do
b=${a,,} #lowercase
b=${b//[_[:space:]\.]/-} #subst dot,space,underscore with dash
b=${b//---/--} #remove triple dash
b=${b//[()\[\]]/} #remove brackets
if [ "${b%-*}" != "$b" ]; then #if there is a dash (prevents filename.filename)
b=${b%-*}.${b##*-} #replace final dash with a dot for extension
fi
if [ "$a" != "$b" ]; then #if there has been a change
echo '--->' "$b" #
#mv -i -- "$a" "$b" #rename
fi
done
This only fails if the file had spaces etc. and no extension (e.g. this BAD_filename becomes this-bad.filename). But these are media files and should have an extension, so I would have to sort them anyway.
Again, corrections and improvements are welcome. I'm new at this stuff.

Try doing this with rename:
rename -n 's/[_\s\.]/-/g' *files
from the shell prompt. It's very useful, and you can put some Perl code inside if needed.
You can remove the -n (dry-run mode switch) once the output of your tests looks right.
There are other tools with the same name which may or may not be able to do this, so be careful.
If you run the following command (linux)
$ file $(readlink -f $(type -p rename))
and you have a result like
.../rename: Perl script, ASCII text executable
then this seems to be the right tool =)
If not, to make it the default (usually already the case) on Debian and derivatives like Ubuntu:
$ sudo update-alternatives --set rename /path/to/rename
(replace /path/to/rename with the path of your Perl rename command).
Last but not least, this tool was originally written by Larry Wall, the creator of Perl.

Just for the records, look:
$ a='I am a Badly [named] (file) - PLEASE.rename_me.JPG'
$ # lowercase that
$ echo "${a,,}"
i am a badly [named] (file) - please.rename_me.jpg
$ # Cool! let's save that somewhere
$ b=${a,,}
$ # substitution 's/[_ ]/-/g':
$ echo "${b//[_ ]/-}"
i-am-a-badly-[named]-(file)---please.rename-me.jpg
$ # or better, yet:
$ echo "${b//[_[:space:]]/-}"
i-am-a-badly-[named]-(file)---please.rename-me.jpg
$ # Cool! let's save that somewhere
$ c=${b//[_[:space:]]/-}
$ # substitution 's/---/--/g' (??)
$ echo "${c//---/--}"
i-am-a-badly-[named]-(file)--please.rename-me.jpg
$ d=${c//---/--}
$ # substitution 's/()[]//g':
$ echo "${d//[()\[\]]/}"
i-am-a-badly-named-file--please.rename-me.jpg
$ e="${d//[()\[\]]/}"
$ # substitution 's/\./-/g':
$ echo "${e//\./-}"
i-am-a-badly-named-file--please-rename-me-jpg
$ f=${e//\./-}
$ # substitution 's/-\([^-]*\)$/\.\1/':
$ echo "${f%-*}.${f##*-}"
i-am-a-badly-named-file--please-rename-me.jpg
$ # Done!
Now, here's a 100% bash implementation of what you're trying to achieve:
#!/bin/bash
for a in "$@"; do
b=${a,,}
b=${b//[_[:space:]]/-}
b=${b//---/--}
b=${b//[()\[\]]/}
b=${b//\./-}
b=${b%-*}.${b##*-}
mv -i -- "$a" "$b"
done
yeah, done!
All of this is standard and is known as shell parameter expansion.
Remark. For a more robust script, you could check whether a has an extension (read: a period in its name), otherwise the last substitution of the algorithm fails a little bit. For this, put the following line just below the for statement:
[[ $a != *.* ]] && { echo "Oh no, file \`$a' has no extension..."; continue; }
(and isn't the *.* part of this line so cute?)
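For shells without ${var,,} (for example the older bash that ships with OS X), roughly the same steps can be written with tr and sed. This is a sketch rather than a drop-in replacement: slugify is a made-up helper name, the steps mirror the order of the bash version above, and the loop only prints the new names instead of calling mv.

```shell
# slugify is a hypothetical helper; it spawns tr and sed once per file name.
slugify() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    sed -e 's/[_[:space:]]/-/g' \
        -e 's/---/--/g' \
        -e 's/[][()]//g' \
        -e 's/\./-/g' \
        -e 's/-\([^-]*\)$/.\1/'
}

# Dry run over two sample names; names without a dot are skipped,
# which is the same check as the [[ ... != *.* ]] line above.
for a in 'I am a Badly [named] (file) - PLEASE.rename_me.JPG' 'no extension here'; do
  case $a in
    *.*) printf '%s\n' "$(slugify "$a")" ;;
    *)   printf 'skipping (no extension): %s\n' "$a" ;;
  esac
done
```

The steps are: lowercase, turn underscores and whitespace into dashes, collapse triple dashes, drop brackets, turn dots into dashes, then turn the last dash back into the extension dot.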
