Why are there extra parentheses in this regex substitution?

I have code with lines that look like this:
self.request.sendall(some_string)
I want to replace them to look like this:
self.request.sendall(bytes(some_string, 'utf-8'))
This is my current sed command:
sed -i "s/\.sendall\((.*)\)/\.sendall\(bytes\(\1, 'utf-8'\)\)/g" some_file.py
I'm close, but this is the result I'm getting:
self.request.sendall(bytes((some_string), 'utf-8'))
I can't figure out where the extra open and close parentheses are coming from in the group substitution. Does anyone see it? The escaped parentheses are literal and need to be there. The ones around .* are to form a group for later replacement, but it's like they are becoming part of the matched text.

You escaped the wrong set of parentheses. You need to use
sed -i "s/\.sendall(\(.*\))/.sendall(bytes(\1, 'utf-8'))/g" some_file.py
Note: the regex flavor you are using is POSIX BRE, so:
Capturing groups are set with \(...\)
Literal parentheses are written as plain ( and ) chars with no escapes
Escaping the parentheses and the dot in the RHS (the replacement) is redundant.
Pattern details:
\.sendall( - a .sendall( string
\(.*\) - Group 1 (\1): any zero or more chars
) - a ) char
.sendall(bytes(\1, 'utf-8')) - RHS, where \1 refers to the Group 1 value.
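To see the difference side by side, here is a minimal sketch that runs the asker's pattern and the corrected one on the sample line, without -i so that no file is modified. The variable names are only for illustration, and the -E variant assumes a sed that supports extended regular expressions (GNU and BSD sed both do).

```shell
line="self.request.sendall(some_string)"

# Asker's pattern under BRE: \( and \) open/close the group, so the plain
# ( and ) inside it are literal and get captured along with the argument.
buggy=$(printf '%s\n' "$line" | sed "s/\.sendall\((.*)\)/.sendall(bytes(\1, 'utf-8'))/")

# Corrected BRE: plain ( ) are literal, \( \) capture only the argument.
bre=$(printf '%s\n' "$line" | sed "s/\.sendall(\(.*\))/.sendall(bytes(\1, 'utf-8'))/")

# Same idea under ERE (-E), where the roles of ( and \( are swapped.
ere=$(printf '%s\n' "$line" | sed -E "s/\.sendall\((.*)\)/.sendall(bytes(\1, 'utf-8'))/")

printf '%s\n' "$buggy" "$bre" "$ere"
```

The first line printed reproduces the doubled parentheses from the question; the other two print the intended result.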

Related

OS X 'RE error: invalid repetition count(s)'

I'm trying to replace the following (in its simplest form)
{filedir_9}file.jpg
with
{filedir_7}file.jpg
Using
sed -i -e 's/(\{filedir_9\})([a-z\-\_0-9]+).jpg/\{filedir_7\}$2$3/g'
But I'm getting : RE error: invalid repetition count(s)

You may use
sed -i '' -e 's/{filedir_9}\([-a-z_0-9]\{1,\}\)\.jpg/{filedir_7}\1.jpg/g'
Note that \{ opens a limiting quantifier in a POSIX BRE pattern; to match a literal {, use a plain {.
To create a capturing group in a POSIX BRE pattern, you need \(...\), not (...), and inside the replacement you should use \1 (not a Perl-like $1) to refer to the Group 1 value.
In POSIX patterns, escape sequences inside bracket expressions are not supported. To match a literal -, put it at the start or end of the bracket expression; escaping it does not work (the \ is treated as a literal \ char).
Also, to match a literal dot, you need to escape the . char in the pattern; if it is unescaped, it matches any char.
Finally, your replacement refers to Groups 2 and 3, while your (\{filedir_9\})([a-z\-\_0-9]+).jpg pattern only attempts to capture 2 substrings, so there is no Group 3 to refer to.
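As a quick check of these points, the sketch below applies a BRE substitution to a made-up file name (the input string and the variable names are assumptions for illustration). It reads from a pipe instead of using -i, and it keeps .jpg in the replacement so the extension survives.

```shell
input='{filedir_9}my-file_01.jpg'

# BRE: plain { } are literal, \{1,\} is the interval "one or more",
# the leading - in the bracket expression is a literal hyphen,
# and \. matches a literal dot.
renamed=$(printf '%s\n' "$input" |
  sed 's/{filedir_9}\([-a-z_0-9]\{1,\}\)\.jpg/{filedir_7}\1.jpg/')

printf '%s\n' "$renamed"
```

This should print {filedir_7}my-file_01.jpg with both GNU and BSD sed, since it uses only POSIX BRE syntax.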

What's the best way to replace text in round brackets with the same text in square brackets?

I'm trying to do a global find/replace of strings like ('id') with ['id'] using sed on a Mac. I'm having trouble putting together the correct regex to correctly match the brackets without causing syntax errors. I'm also not necessarily interested in using sed, it just seemed like the best way to do it.
I've tried the following code:
sed -i "" "s/(['].*['])/[\1]/g" file.txt
and
sed -i "" "s/[(]['].*['][)]/[\1]/g" file.txt
How should I approach this?

Assuming there are no ' in between (' and ') you may use
sed "s/(\('[^']*'\))/[\1]/g"
The point is that the capturing groups in BRE POSIX regex patterns must be declared with \(...\), while ( and ) denote literal ( and ) symbols. [^']* matches zero or more symbols other than '.
POSIX BRE pattern details:
( - a literal ( symbol
\('[^']*'\) - a capturing group matching:
' - a single quote
[^']* - a negated bracket expression matching zero or more (*) chars other than ' and then
' - a single quote
) - a literal ) symbol.

How robust do you need the script to be? Are all of the examples a single set of parentheses, or are some nested? Nested cases may be possible to handle in practice, but they are hard to handle robustly in sed. Should we account for parentheses inside strings and leave those alone? If so, you have quite a rabbit hole to go down, and it may be impossible.
Here's a reasonably simple one that assumes the simplest case:
sed 's/(\([^)]*\))/[\1]/g' test.tmp
Explanation:
sed 's/<find>/<replace>/g'
The sed substitute command searches for a regular expression within each line and replaces it as specified. The g option indicates a 'global' replacement, meaning it replaces all occurrences on a line, not just the first.
(\([^)]*\))
The outside parentheses match those you're hoping to replace. The inside escaped parentheses, \( and \), create a group around the text you want to keep. [^)] matches any character that is not a ), while the following * tells us to match 0+ such characters.
[\1]
The \1 represents the contents of the first (and only) group we formed earlier, and we then place the desired square brackets around it.
Any text not matched by the regular expression remains untouched.
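The two answers differ in what they allow between the parentheses. A small sketch on a made-up input line (the sample text and variable names are assumptions) shows the effect:

```shell
input="foo('id') + bar(x)"

# Only parentheses that wrap a single-quoted string are rewritten.
quoted_only=$(printf '%s\n' "$input" | sed "s/(\('[^']*'\))/[\1]/g")

# Any non-nested pair of parentheses is rewritten.
any_pair=$(printf '%s\n' "$input" | sed 's/(\([^)]*\))/[\1]/g')

printf '%s\n' "$quoted_only" "$any_pair"
```

The first command leaves bar(x) alone, while the second turns it into bar[x], so the stricter pattern is the safer choice when only quoted keys should change.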

vi substitute in json

I have a JSON file which has a bunch of jpg references that I'm trying to replace with png. I want to match on a pattern where there is a double digit and period before the jpg, capture 1, and use it in the replacement. The issue is I only ever get pattern not found.
"plith":"img/01.jpg"},{"block_ha....
where the substitution code looks like the following
:%s/\(\d{2}\.\)+jpg/$1png/g

I tried this substitution command:
:%s/\v(\d{2}\.)jpg/\1png/g
And it replaced the line:
"plith":"img/01.jpg"},{"block_ha....
With:
"plith":"img/01.png"},{"block_ha....
If the 2 digits and the following dot can be repeated, you can apply the + quantifier to \d{2}\.:
:%s/\v(\d{2}\.)+jpg/\1png/g
In your original command:
:%s/(\d{2}.)+jpg/$1png/g
There seemed to be 3 problems:
You use non-escaped parentheses to capture the digits, but by default you need to escape them. If you don't want to, you can switch to very magic mode by adding the atom \v to your pattern.
You don't escape the ., which means that it will match any character (except a newline) instead of a literal dot.
In the replacement part, you use $1 to refer to the first capturing group, but it should be \1.

regex for first instance of a specific character that DOESN'T come immediately after another specific character

I have a function, translate(), that takes multiple parameters. The first param is the only required one and is a string that I always wrap in single quotes, like this:
translate('hello world');
The other params are optional, but could be included like this:
translate('hello world', true, 1, 'foobar', 'etc');
And the string itself could contain escaped single quotes, like this:
translate('hello\'s world');
To the point, I now want to search through all code files for all instances of this function call, and extract just the string. To do so I've come up with the following grep, which returns everything between translate(' and either ') or ',. Almost perfect:
grep -RoPh "(?<=translate\(').*?(?='\)|'\,)" .
The problem with this though, is that if the call is something like this:
translate('hello \'world\', you\'re great!');
My grep would only return this:
hello \'world\
So I'm looking to modify this so that the part that currently looks for ') or ', instead looks for the first occurrence of ' that hasn't been escaped, i.e. doesn't immediately follow a \.
Hopefully I'm making sense. Any suggestions please?

You can use this grep with PCRE regex:
grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" .
RegEx Breakup:
\b # word boundary
translate # match literal translate
\( # match a (
\s* # match 0 or more whitespace
\K # reset the matched information
' # match starting single quote
(?: # start non-capturing group
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
) # end non-capturing group
(?: # start non-capturing group
\\\\. # match a backslash followed by char that is "escaped"
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
)* # end non-capturing group
' # match ending single quote
Here is a version without \K using look-arounds:
grep -oPhR "(?<=\btranslate\(')(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*(?=')" .

I think the problem is the .*? part: the ? makes it a non-greedy pattern, meaning it'll take the shortest string that matches the pattern. In effect, you're saying, "give me the shortest string that's followed by quote+close-paren or quote+comma". In your example, "world\" is followed by a single quote and a comma, so it matches your pattern.
In these cases, I like to use something like the following reasoning:
A string is a quote, zero or more characters, and a quote: '.*'
A character is anything that isn't a quote (because a quote terminates the string): '[^']*'
Except that you can put a quote in a string by escaping it with a backslash, so a character is either "backslash followed by a quote" or, failing that, "not a quote": '(\\'|[^'])*'
Put it all together and you get
grep -RoPh "(?<=translate\(')(\\\\'|[^'])*(?='\)|'\,)" .
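The same "escaped character or non-quote" reasoning can also be tried without PCRE. The sketch below swaps grep -P for sed -E (POSIX ERE), which is not what either answer uses; it is shown only as an illustration for systems where grep lacks -P. It assumes one translate() call per line, because the leading .* is greedy, and the variable names are made up.

```shell
# Sample call from the question, read via a quoted here-document so the
# backslashes and single quotes reach sed unchanged.
line=$(cat <<'EOF'
translate('hello \'world\', you\'re great!');
EOF
)

# ERE seen by sed: translate\('(([^'\\]|\\.)*)'
#   [^'\\]  a character that is neither a quote nor a backslash
#   \\.     a backslash followed by the character it escapes
# Inside double quotes each literal backslash is doubled for the shell.
extracted=$(printf '%s\n' "$line" |
  sed -nE "s/.*translate\('(([^'\\\\]|\\\\.)*)'.*/\1/p")

printf '%s\n' "$extracted"
```

Because a backslash can only be consumed together with the character after it, an escaped quote never ends the string, and the match runs to the first unescaped quote.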

Vim regex backreference

I want to do this:
%s/shop_(*)/shop_\1 wp_\1/
Why doesn't shop_(*) match anything?

There are several issues here.
Parens in vim regexen are not for capturing by default -- you need to use \( \) for captures.
* doesn't mean what you think. It means "0 or more of the previous", so your regex means "a string that contains shop_ followed by 0+ ( and then a literal )". You're looking for ., which in regex means "any character". Put together with a star as .* it means "0 or more of any character". You probably want at least one character, so use .\+ (\+ means "1 or more of the previous").
Use this: %s/shop_\(.\+\)/shop_\1 wp_\1/.
Optionally end it with g after the final slash to replace for all instances on one line rather than just the first.

If I understand correctly, you want %s/shop_\(.*\)/shop_\1 wp_\1/
Escape the capturing parentheses and use .* to match any number of any character.
(Your search is looking for "shop_" followed by any number of opening parentheses followed by a closing parenthesis.)
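Vim itself is awkward to script in a short example, so the sketch below uses sed as a stand-in, on the assumption (which holds for these particular atoms) that sed's default BRE syntax treats (, * and \( \) the same way Vim's default magic mode does. The sample strings and variable names are made up.

```shell
# shop_(*) : "shop_", zero or more literal "(", then a literal ")".
no_match=$(printf 'shop_hats\n' | sed 's/shop_(*)/MATCH/')
odd_match=$(printf 'shop_((()\n' | sed 's/shop_(*)/MATCH/')

# shop_\(.*\) : "shop_" followed by a captured run of any characters.
fixed=$(printf 'shop_hats\n' | sed 's/shop_\(.*\)/shop_\1 wp_\1/')

printf '%s\n' "$no_match" "$odd_match" "$fixed"
```

The first line comes back unchanged, the second is replaced by MATCH, and the third becomes shop_hats wp_hats.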

If you would like to avoid having to escape the capture parentheses and make the regex pattern syntax closer to other implementations (e.g. PCRE), add \v (very magic!) at the start of your pattern (see :help \magic for more info):
:%s/\vshop_(.*)/shop_\1 wp_\1/

@Luc if you look here: regex-info, you'll see that vim is behaving correctly. Here's a parallel from sed:
echo "123abc456" | sed 's#^([0-9]*)([abc]*)([456]*)#\3\2\1#'
sed: -e expression #1, char 35: invalid reference \3 on 's' command's RHS
whereas with the "escaped" parentheses, it works:
echo "123abc456" | sed 's#^\([0-9]*\)\([abc]*\)\([456]*\)#\3\2\1#'
456abc123
I hate to see vim maligned - especially when it's behaving correctly.
PS I tried to add this as a comment, but just couldn't get the formatting right.

We Keep Coding
