OS X 'RE error: invalid repetition count(s)'

I'm trying to replace the following (in its simplest form)
{filedir_9}file.jpg
with
{filedir_7}file.jpg
Using
sed -i -e 's/(\{filedir_9\})([a-z\-\_0-9]+).jpg/\{filedir_7\}$2$3/g'
But I'm getting: RE error: invalid repetition count(s)

You may use
sed -i '' -e 's/{filedir_9}\([-a-z_0-9]\{1,\}\)\.jpg/{filedir_7}\1.jpg/g'
Note that \{ opens an interval (limiting) quantifier in a POSIX BRE pattern, and filedir_9 is not a valid repetition count, hence the error. To match a literal {, use a plain {. \{1,\} is the BRE spelling of +.
To create a capturing group in a POSIX BRE pattern, you need \(...\), not (...).
In POSIX patterns, escape sequences inside bracket expressions are not supported: put - at the start or end of the bracket expression, because escaping it does not work (the \ is treated as a literal \ char).
Also, to match a literal dot, escape the . char in the pattern; unescaped, it matches any char.
Inside the replacement string, use \1 rather than $1 (a Perl-like placeholder) to refer to the Group 1 value. Note that you are using placeholders for Groups 2 and 3, while your (\{filedir_9\})([a-z\-\_0-9]+).jpg pattern only attempts to capture 2 substrings, so there is no Group 3 and no point in using $3 or \3. Because .jpg is consumed by the match, it has to be written back in the replacement.
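A quick way to try a BRE pattern like this without touching a file is to pipe a sample line through sed. The following is only a sketch: the sample line and the variable names are made up for illustration, -i is dropped so nothing is edited in place, and .jpg is written back in the replacement so the extension survives.

```shell
# Sample input; the file name is hypothetical.
input='{filedir_9}file.jpg'

# POSIX BRE: { and } are literal, \(...\) captures, \{1,\} means "one or more".
result=$(printf '%s\n' "$input" |
  sed 's/{filedir_9}\([-a-z_0-9]\{1,\}\)\.jpg/{filedir_7}\1.jpg/g')

printf '%s\n' "$result"   # {filedir_7}file.jpg
```

Once the output looks right, the same script can be passed to sed -i '' -e '...' on macOS.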

Related

Why are there extra parenthesis in this regex substitution?

I have code with lines that look like this:
self.request.sendall(some_string)
I want to replace them to look like this:
self.request.sendall(bytes(some_string, 'utf-8'))
This is my current sed command:
sed -i "s/\.sendall\((.*)\)/\.sendall\(bytes\(\1, 'utf-8'\)\)/g" some_file.py
I'm close, but this is the result I'm getting:
self.request.sendall(bytes((some_string), 'utf-8'))
I can't figure out where the extra open and close parentheses are coming from in the group substitution. Does anyone see it? The escaped parentheses are literal and need to be there. The ones around .* are to form a group for later replacement, but it's like they are becoming part of the matched text.

You escaped the wrong set of parentheses. You need to use
sed -i "s/\.sendall(\(.*\))/.sendall(bytes(\1, 'utf-8'))/g" some_file.py
Note: the regex flavor you are using is POSIX BRE, thus:
Capturing groups are set with \(...\)
Literal parentheses are written as plain ( and ) chars with no escapes
Escaping parentheses and a dot in the RHS (the replacement) is redundant.
Pattern details:
\.sendall( - a .sendall( string
\(.*\) - Group 1 (\1): any zero or more chars
) - a ) char
.sendall(bytes(\1, 'utf-8')) - RHS, where \1 refers to the Group 1 value.
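To see the difference side by side, here is a small sketch that runs both versions on the sample line from the question. It reads from a pipe instead of using -i, and the variable names are only for illustration.

```shell
line='self.request.sendall(some_string)'

# Question's version: \( \) is the group and the bare ( ) are literals,
# so the literal parentheses end up inside Group 1.
wrong=$(printf '%s\n' "$line" |
  sed "s/\.sendall\((.*)\)/.sendall(bytes(\1, 'utf-8'))/g")

# Fixed version: bare ( ) match the literal parentheses, \( \) captures.
right=$(printf '%s\n' "$line" |
  sed "s/\.sendall(\(.*\))/.sendall(bytes(\1, 'utf-8'))/g")

printf '%s\n' "$wrong"   # self.request.sendall(bytes((some_string), 'utf-8'))
printf '%s\n' "$right"   # self.request.sendall(bytes(some_string, 'utf-8'))
```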

What's the best way to replace text in round brackets with the same text in square brackets?

I'm trying to do a global find/replace of strings like ('id') with ['id'] using sed on a Mac. I'm having trouble putting together the correct regex to correctly match the brackets without causing syntax errors. I'm also not necessarily interested in using sed, it just seemed like the best way to do it.
I've tried the following code:
sed -i "" "s/(['].*['])/[\1]/g" file.txt
and
sed -i "" "s/[(]['].*['][)]/[\1]/g" file.txt
How should I approach this?

Assuming there are no ' in between (' and ') you may use
sed "s/(\('[^']*'\))/[\1]/g"
The point is that the capturing groups in BRE POSIX regex patterns must be declared with \(...\), while ( and ) denote literal ( and ) symbols. [^']* matches zero or more symbols other than '.
POSIX BRE pattern details:
( - a literal ( symbol
\('[^']*'\) - a capturing group matching:
' - a single quote
[^']* - a negated bracket expression matching zero or more (*) chars other than ' and then
' - a single quote
) - a literal ) symbol.
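As a quick check of the pattern described above, this sketch feeds a made-up line with two quoted arguments through the same sed command (the input text is an assumption, not from the question):

```shell
# Hypothetical input with two ('...') occurrences on one line.
input="foo('id') bar('name')"

# ( and ) are literal in BRE; \( \) captures the quoted text including quotes.
result=$(printf '%s\n' "$input" | sed "s/(\('[^']*'\))/[\1]/g")

printf '%s\n' "$result"   # foo['id'] bar['name']
```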

How robust do you need the script to be? Are all of the examples a single set of parentheses, or are some nested? Nested parentheses may be workable in practice but are hard to handle robustly in sed. Should we account for parentheses inside strings and leave those alone? If so, you have quite a rabbit hole to go down, and it may be impossible.
Here's a reasonably simple one that assumes the simplest case:
sed 's/(\([^)]*\))/[\1]/g' test.tmp
Explanation:
sed 's/<find>/<replace>/g'
The sed substitute command searches for a regular expression within each line and replaces it as specified. The g option indicates a 'global' replacement, meaning it replaces all occurrences on a line, not just the first.
(\([^)]*\))
The outside parentheses match those you're hoping to replace. The inside escaped parentheses, \( and \), create a group around the text you want to keep. [^)] matches any character that is not a ), while the following * tells us to match 0+ such characters.
[\1]
The \1 represents the contents of the first (and only) group we formed earlier, and we then place the desired square brackets around it.
Any text not matched by the regular expression remains untouched.
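The sketch below shows the simple case working and the nested case going wrong, which is the limitation mentioned at the top of this answer. Both input strings are invented for illustration.

```shell
# Simple case: each (...) becomes [...].
simple=$(printf '%s\n' 'a(b) c(d)' | sed 's/(\([^)]*\))/[\1]/g')

# Nested case: the match runs from the first ( to the first ),
# so the brackets end up unbalanced.
nested=$(printf '%s\n' 'f((a))' | sed 's/(\([^)]*\))/[\1]/g')

printf '%s\n' "$simple"   # a[b] c[d]
printf '%s\n' "$nested"   # f[(a])
```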

Separate backreference followed by numeric literal in perl regex

I found this related question : In perl, backreference in replacement text followed by numerical literal
but it seems entirely different.
I have a regex like this one
s/([^0-9])([xy])/\1 1\2/g
                   ^
                   whitespace here
But that whitespace comes up in the substitution.
How do I not get the whitespace in the substituted string without having perl confuse the backreference to \11?
For example:
15+x+y changes to 15+ 1x+ 1y.
I want to get 15+1x+1y.

\1 is a regex atom that matches what the first capture captured. It makes no sense to use it in a replacement expression. You want $1.
$ perl -we'$_="abc"; s/(a)/\1/'
\1 better written as $1 at -e line 1.
In a string literal (including the replacement expression of a substitution), you can delimit $var using curlies: ${var}. That means you want the following:
s/([^0-9])([xy])/${1}1$2/g
The following is more efficient (although gives a different answer for xxx):
s/[^0-9]\K(?=[xy])/1/g

Just put braces around the number:
s/([^0-9])([xy])/${1}1${2}/g

regex, search and replace until a certain point

The Problem
I have a file full of lines like
convert.these.dots.to.forward.slashes/but.leave.these.alone/i.mean.it
I want to search and replace such that I get
convert/these/dots/to/forward/slashes/but.leave.these.alone/i.mean.it
The . characters are converted to / up until the first forward slash.
The Question
How do I write a regex search and replace to solve my problem?
Attempted solution
I tried using look behind with perl, but variable length look behinds are not implemented
$ echo "convert.these.dots.to.forward.slashes/but.leave.these.alone/i.mean.it" | perl -pe 's/(?<=[^\/]*)\./\//g'
Variable length lookbehind not implemented in regex m/(?<=[^/]*)\./ at -e line 1.
Workaround
Variable length look aheads are implemented, so you can use this dirty trick
$ echo "convert.these.dots.to.forward.slashes/but.leave.these.alone/i.mean.it" | rev | perl -pe 's/\.(?=[^\/]*$)/\//g' | rev
convert/these/dots/to/forward/slashes/but.leave.these.alone/i.mean.it
Is there a more direct solution to this problem?

s/\G([^\/.]*)\./$1\//g
\G is an assertion that matches the point at the end of the previous match. This ensures that each successive match immediately follows the last.
Matches:
\G # start matching where the last match ended
([^\/.]*) # capture until you encounter a "/" or a "."
\. # the dot
Replaces with:
$1 # that interstitial text you captured
\/ # a slash
Usage:
echo "convert.these.dots.to.forward.slashes/but.leave.these.alone/i.mean.it" | perl -pe 's/\G([^\/.]*)\./$1\//g'
# yields: convert/these/dots/to/forward/slashes/but.leave.these.alone/i.mean.it
Alternatively, if you're a purist and don't want to add the captured subpattern back in (avoiding that may be more efficient, but I'm not certain), you could make use of \K to restrict the "real" match solely to the ., then simply replace it with a /. \K essentially "forgets" what has been matched up to that point, so the final match ultimately returned is only what comes after the \K.
s/\G[^\/.]*\K\./\//g
Matches:
\G # start matching where the last match ended
[^\/.]* # consume chars until you encounter a "/" or a "."
\K # "forget" what has been consumed so far
\. # the dot
Thus, the entirety of the text matched for replacement is simply ".".
Replaces with:
\/ # a slash
Result is the same.

You can use substr as an lvalue and perform the substitution on it. Or transliteration, like I did below.
$ perl -pe 'substr($_,0,index($_,"/")) =~ tr#.#/#'
convert.these.dots.to.forward.slashes/but.leave.these.alone/i.mean.it
convert/these/dots/to/forward/slashes/but.leave.these.alone/i.mean.it
This finds the first instance of a slash, extracts the part of the string before it, and performs a transliteration on that part.

Vim regex backreference

I want to do this:
%s/shop_(*)/shop_\1 wp_\1/
Why doesn't shop_(*) match anything?

There are several issues here.
Parens in vim regexes are not for capturing by default; you need to use \( \) for captures.
* doesn't mean what you think. It means "0 or more of the previous", so your regex means "a string that contains shop_ followed by 0+ ( and then a literal )". You're looking for ., which in regex means "any character". Put together with a star as .* it means "0 or more of any character". You probably want at least one character, so use .\+ (\+ means "1 or more of the previous" in Vim).
Use this: %s/shop_\(.\+\)/shop_\1 wp_\1/.
Optionally end it with g after the final slash to replace all instances on one line rather than just the first.
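Vim itself is awkward to script here, so the sketch below uses sed as a stand-in. This is an assumption for illustration only: sed's POSIX BRE treats bare and escaped parentheses the same way as Vim's default "magic" mode, but it has no portable \+, so ..* is used for "one or more of any character". The sample strings are made up.

```shell
# The question's pattern: shop_, then zero or more "(", then a literal ")".
# It matches odd strings like shop_(() but not shop_abc.
odd=$(printf '%s\n' 'shop_(()' | sed 's/shop_(*)/MATCH/')
plain=$(printf '%s\n' 'shop_abc' | sed 's/shop_(*)/MATCH/')

# With an escaped group, the captured text is reused twice.
fixed=$(printf '%s\n' 'shop_abc' | sed 's/shop_\(..*\)/shop_\1 wp_\1/')

printf '%s\n' "$odd"     # MATCH
printf '%s\n' "$plain"   # shop_abc (unchanged)
printf '%s\n' "$fixed"   # shop_abc wp_abc
```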

If I understand correctly, you want %s/shop_\(.*\)/shop_\1 wp_\1/
Escape the capturing parenthesis and use .* to match any number of any character.
(Your search is searching for "shop_" followed by any number of opening parentheses followed by a closing parenthesis)

If you would like to avoid having to escape the capture parentheses and make the regex pattern syntax closer to other implementations (e.g. PCRE), add \v (very magic!) at the start of your pattern (see :help \magic for more info). You still need .* rather than a bare * inside the group:
:%s/\vshop_(.*)/shop_\1 wp_\1/

@Luc if you look here: regex-info, you'll see that vim is behaving correctly. Here's a parallel from sed:
echo "123abc456" | sed 's#^([0-9]*)([abc]*)([456]*)#\3\2\1#'
sed: -e expression #1, char 35: invalid reference \3 on 's' command's RHS
whereas with the "escaped" parentheses, it works:
echo "123abc456" | sed 's#^\([0-9]*\)\([abc]*\)\([456]*\)#\3\2\1#'
456abc123
I hate to see vim maligned - especially when it's behaving correctly.
PS I tried to add this as a comment, but just couldn't get the formatting right.
