How can I find and replace between ( ) characters using regex? - regex

I want to change strings as shown below, but I couldn't work out the exact regex pattern.
Strings like:
Stack Overflow (1234)
Stack exchange (12)
What I want is to end up with:
Stack Overflow
Stack exchange
I'm using Notepad++, UltraEdit, etc. It would also be very useful to have a sed command.
Thanks everybody

Try using this find:
\s+\([^)]+\)
And replace by nothing.
\s+ matches one or more whitespace characters.
\( matches an opening parenthesis.
[^)]+ matches one or more characters other than a closing parenthesis.
\) matches a closing parenthesis.
[(*)] will match any one of (, * or ) because they are in a character class.
You can otherwise use \s+\(.*?\) as well, but it's not as safe as the regex above. In regex, the dot is the wildcard and parentheses are used for capture; that's why I had to escape the parentheses with backslashes. You don't need to escape them inside a character class, so for instance you can use \s+[(].*?[)], though it's a bit longer!
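Since the question also asks about sed, here is a minimal sketch of the same pattern on the command line. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), and uses [[:space:]] in place of \s, which is not portable. The function name strip_parens is made up for illustration.

```shell
#!/bin/sh
# strip_parens: hypothetical helper that removes " (...)" groups from each input line.
#   [[:space:]]+  one or more whitespace characters
#   \(            a literal opening parenthesis (must be escaped in ERE)
#   [^)]+         one or more characters other than ")"
#   \)            a literal closing parenthesis
strip_parens() {
    sed -E 's/[[:space:]]+\([^)]+\)//g'
}

printf '%s\n' 'Stack Overflow (1234)' 'Stack exchange (12)' | strip_parens
```

Lines without a parenthesised group pass through unchanged.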

Don't know about Notepad++ but using sed it is a simple command:
sed -i.bak 's/ *(.*$//' file
-i is for in-place editing (it saves the converted file and keeps the original as file.bak)

Replace (Ctrl+H):
^(.+?)\s*\(\d+\)$
By:
$1
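This is how it works:
^(.+?) lazily captures the text at the start of the line into Group 1.
\s*\(\d+\)$ matches optional whitespace followed by digits in parentheses at the end of the line.
Everything in Group 1 is kept, the rest is dropped.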
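The same keep-group-1 idea can be sketched in sed. This is only an approximation of the Notepad++ pattern: sed has no lazy quantifier, so the sketch forces the group to end on a non-whitespace character instead of using .+?. It assumes a sed with -E, and keep_name is a hypothetical name.

```shell
#!/bin/sh
# keep_name: hypothetical helper that keeps only the text before a trailing "(digits)".
#   ^(.*[^[:space:]])  group 1: everything up to the last non-whitespace character
#   [[:space:]]*       optional whitespace before the parenthesis
#   \([0-9]+\)$        digits in literal parentheses at the end of the line
keep_name() {
    sed -E 's/^(.*[^[:space:]])[[:space:]]*\([0-9]+\)$/\1/'
}

printf '%s\n' 'Stack Overflow (1234)' 'no number here' | keep_name
```

Because the pattern is anchored with $ and requires digits, a line that does not end in a parenthesised number is left untouched.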

Using awk you can do this:
awk '{sub(/ \(.+\)/,x)}1' file

Related

What's the best way to replace text in round brackets with the same text in square brackets?

I'm trying to do a global find/replace of strings like ('id') with ['id'] using sed on a Mac. I'm having trouble putting together the correct regex to correctly match the brackets without causing syntax errors. I'm also not necessarily interested in using sed, it just seemed like the best way to do it.
I've tried the following code:
sed -i "" "s/(['].*['])/[\1]/g" file.txt
and
sed -i "" "s/[(]['].*['][)]/[\1]/g" file.txt
How should I approach this?
Assuming there are no ' in between (' and ') you may use
sed "s/(\('[^']*'\))/[\1]/g"
The point is that the capturing groups in BRE POSIX regex patterns must be declared with \(...\), while ( and ) denote literal ( and ) symbols. [^']* matches zero or more symbols other than '.
POSIX BRE pattern details:
( - a literal ( symbol
\('[^']*'\) - a capturing group matching:
' - a single quote
[^']* - a negated bracket expression matching zero or more (*) chars other than ' and then
' - a single quote
) - a literal ) symbol.
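To make the BRE rule concrete, here is a small sketch that runs the pattern above next to an ERE version (assuming a sed that accepts -E), where the roles of escaped and unescaped parentheses are swapped. The function names and the sample input are made up for illustration.

```shell
#!/bin/sh
# BRE (sed's default): \( \) capture, bare ( ) match literal parentheses.
to_square_bre() {
    sed "s/(\('[^']*'\))/[\1]/g"
}

# ERE (-E): the roles are swapped, ( ) capture and \( \) match literal parentheses.
to_square_ere() {
    sed -E "s/\(('[^']*')\)/[\1]/g"
}

printf '%s\n' "get('id') + get('name')" | to_square_bre
printf '%s\n' "get('id') + get('name')" | to_square_ere
```

Both commands should print the same line, with each ('...') turned into ['...'].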
How robust do you need the script to be? Do all of the examples have a single set of parentheses, or are some nested? Nested parentheses may be possible to handle in practice, but are provably hard to handle robustly in sed. Should we account for parentheses inside strings and leave those alone? If so, you have quite a rabbit hole to go down, and it may be impossible.
Here's a reasonably simple one that assumes the simplest case:
sed 's/(\([^)]*\))/[\1]/g' test.tmp
Explanation:
sed 's/<find>/<replace>/g'
The sed substitute command searches for a regular expression within each line and replaces it as specified. The g option indicates a 'global' replacement, meaning it replaces all occurrences on a line, not just the first.
(\([^)]*\))
The outside parentheses match those you're hoping to replace. The inside escaped parentheses, \( and \), create a group around the text you want to keep. [^)] matches any character that is not a ), while the following * tells us to match 0+ such characters.
[\1]
The \1 represents the contents of the first (and only) group we formed earlier, and we then place the desired square brackets around it.
Any text not matched by the regular expression remains untouched.
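A quick sketch of both the simple case and the nesting caveat mentioned earlier. The function name and the sample inputs are invented for illustration.

```shell
#!/bin/sh
# round_to_square: hypothetical wrapper around the substitution above.
round_to_square() {
    sed 's/(\([^)]*\))/[\1]/g'
}

# Simple case: each (...) pair on the line is rewritten.
printf '%s\n' 'a(b) c(d)' | round_to_square    # prints: a[b] c[d]

# Nested case: [^)]* happily swallows the inner "(", so the match
# stops at the first ")" and the brackets come out unbalanced.
printf '%s\n' 'f((a)b)' | round_to_square      # prints: f[(a]b)
```

The second result shows why this pattern should only be used when the input is known to contain no nested parentheses.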

Using sed to replace string matching regex with wildcards

I have a string I'm trying to manipulate with sed
js/plex.js?hash=f1c2b98&version=2.4.23"
Desired output is
js/plex.js"
This is what I'm currently trying
sed -i s'/js\/plex.js[\?.\+\"]/js\/plex.js"/'
But it is only matching the first ? and returns this output
js/plex.js"hash=f1c2b98&version=2.4.23"
I can't see why this isn't working after a few hours
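This works (the expression is quoted so the shell leaves it alone, and [^"]* stops before the closing quote so it is kept):
echo 'js/plex.js?hash=f1c2b98&version=2.4.23"' | sed 's:\.js?[^"]*:.js:'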
With the original Regex:
Firstly, I would suggest using a different delimiter (like :) in sed when the regex itself contains /. Secondly, [] is a bracket expression: it matches exactly one character out of the set inside the brackets, so the .\+ inside it is not a quantified wildcard and will not expand to the end of the line. To consume the rest of the query string you need a quantified expression outside the brackets, such as [^"]*.
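The single-character behaviour of the bracket expression is easy to reproduce. The sketch below runs the attempt from the question and then one possible fix; the function names are made up for illustration, and the fix assumes the query string never contains a double quote.

```shell
#!/bin/sh
line='js/plex.js?hash=f1c2b98&version=2.4.23"'

# The attempt from the question: the bracket expression matches one
# character from its set, so only the "?" is consumed and replaced by ".
bracket_attempt() {
    sed 's/js\/plex.js[\?.\+\"]/js\/plex.js"/'
}

# One possible fix: match the literal "?" (not special in BRE) and then
# every character up to, but not including, the closing quote.
strip_query() {
    sed 's:\(js/plex\.js\)?[^"]*:\1:'
}

printf '%s\n' "$line" | bracket_attempt
printf '%s\n' "$line" | strip_query
```

The first command reproduces the unwanted output shown in the question; the second leaves js/plex.js" with the closing quote intact.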
perhaps
sed 's#\(js/plex\.js\)?[^"]*#\1#'
# is used as the delimiter, so the / in the path needs no escaping.
\(js/plex\.js\)?[^"]* matches the file name, the literal ? and everything up to (but not including) the closing ", and the whole match is replaced with the marked pattern \1.
The marked pattern
In sed you can mark part of a pattern, or the whole pattern, by using \( \).
When part of a pattern is enclosed in parentheses escaped by backslashes, the text it matches is stored.
In my example this is the pattern without marking:
js/plex\.js?[^"]*
But I only want sed to remember js/plex.js and replace the whole match with just that piece. In sed the first marked pattern is known as \1, the second as \2 and so forth.
\(js/plex\.js\) ---> is marked as \1
Hence I replace the whole match with \1

How do I write a SED regex to extract a string delimited by another string?

I am using GNU sed version 4.2.1 and I am trying to write a non-greedy sed regex to extract a string that is delimited by two other strings. This is easy when the delimiting strings are single characters:
s:{\([^}]*\)}:\1:g
In that example the string is delimited by '{' on the left and '}' on the right.
If the delimiting strings are multiple characters, say '{{{' and '}}}' I can adjust the above expression like this:
s:{{{\([^}}}]*\)}}}:\1:g
so the centre expression matches anything not containing the '}}}' closing string. But this only works if the match string does not contain '}' at all. Something like:
{{{cannot match {this broken} example}}}
will not work but
{{{can match this example}}}
does work. Of course
s:{{{\(.*\)}}}:\1:g
always works but is greedy so isn't suitable where multiple patterns occur on the same line.
I understand [^a] to mean anything except a and [^ab] to mean anything except a or b so, despite it appearing to work, I don't think [^}}}] is the correct way to exclude that sequence of 3 consecutive characters.
So how do I write a regex for sed that matches a string delimited by two other strings?
You are correct that [^}}}] doesn't work. A negated character class matches anything that is not one of the characters inside it. Repeating characters doesn't change the logic. So what you wrote is the same as [^}]. (It is easy to see why this works when there are no braces inside the expression).
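The equivalence of [^}}}] and [^}] can be checked directly. In this sketch the two function names are invented; the sample strings are the ones from the question.

```shell
#!/bin/sh
# Same substitution twice: once with the repeated brace in the
# negated class, once with a single brace.
with_triple() { sed 's:{{{\([^}}}]*\)}}}:\1:g'; }
with_single() { sed 's:{{{\([^}]*\)}}}:\1:g'; }

for s in '{{{can match this example}}}' \
         '{{{cannot match {this broken} example}}}'; do
    printf '%s\n' "$s" | with_triple
    printf '%s\n' "$s" | with_single
done
```

Both functions strip the delimiters from the first string and leave the second string unchanged, because the inner } stops the negated class before the closing }}} is reached.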
In Perl and compatible regular expressions, you can use ? to make a * or + non-greedy:
s:{{{(.*?)}}}:$1:g
This will always match the first }}} after the opening {{{.
However, this is not possible in sed, which has no non-greedy quantifiers. In fact, I don't think there is any way of doing this match in a single sed regex. The only other way to do it is to use advanced features like look-ahead, which sed also does not have.
You can easily use Perl in a sed-like fashion with the -pe options, which cause it to take a single line of code from the command line (-e) and automatically loop over each line and print the result (-p).
perl -pe 's:{{{(.*?)}}}:$1:g'
The -i option for in-place editing of files is also useful, but make sure your regex is correct first!
For more information see perlrun.
With sed you could do something like:
sed -e :a -e 's/\(.*\){{{\(.*\)}}}/\1\2/ ; ta'
With:
{{{can match this example}}} {{{can match this 2nd example}}}
This gives:
can match this example can match this 2nd example
It is not lazy matching, but by replacing from right to left we can make use of sed's greediness.
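Here is the same loop wrapped in a function, as a sketch, with an input that contains a single } inside one of the delimited strings. The function name is hypothetical, and the approach assumes every {{{ on the line has a matching }}}.

```shell
#!/bin/sh
# strip_triple_braces: hypothetical wrapper around the loop above.
#   :a   defines a label
#   s    removes the LAST {{{...}}} pair on the line, because the leading
#        \(.*\) is greedy and pushes the match as far right as possible
#   ta   jumps back to :a if the substitution changed anything
strip_triple_braces() {
    sed -e ':a' -e 's/\(.*\){{{\(.*\)}}}/\1\2/' -e 'ta'
}

printf '%s\n' '{{{a {b} c}}} and {{{d}}}' | strip_triple_braces
```

On this input the loop runs twice, first unwrapping {{{d}}} and then {{{a {b} c}}}, so the single braces around b survive.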

Regular expression to match beginning and end of a line?

Could anyone tell me a regex that matches the beginning or end of a line? E.g. if I used sed 's/[regex]/"/g' file here, the output would be each line in quotes. I tried [\^$] and [\^\n] but neither of them seemed to work. I'm probably missing something obvious; I'm new to these.
Try:
sed -e 's/^/"/' -e 's/$/"/' file
To add quotes to the start and end of every line is simply:
sed 's/.*/"&"/g'
The RE you were trying to come up with to match the start or end of each line, though, is:
sed -r 's/^|$/"/g'
It's an ERE (enabled by -r), so it will work with GNU sed but not with older seds.
matthias's response is perfectly adequate, but you could also use a backreference to do this. if you're learning regular expressions, they are a handy thing to know.
here's how that would be done using a backreference:
sed 's/\(^.*$\)/"\1"/g' file
at the heart of that regex is ^.*$, which means match anything (.*) surrounded by the start of the line (^) and the end of the line ($), which effectively means that it will match the whole line every time.
putting that term inside parentheses creates a backreference that we can refer to later on (in the replace pattern). but for sed to realize that you mean to create a backreference instead of matching literal parentheses, you have to escape them with backslashes. thus, we end up with \(^.*$\) as our search pattern.
the replace pattern is simply a double quote followed by \1, which is our backreference (refers back to the first pattern match enclosed in parentheses, hence the 1). then add your last double quote to end up with "\1".
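For comparison, here is a sketch that puts three of the approaches from this thread side by side. The function names are invented, and the backreference version moves the anchors outside the group (^\(.*\)$), a slight variation on the pattern above that should behave the same way.

```shell
#!/bin/sh
# Three ways to wrap each input line in double quotes.
quote_anchors() { sed -e 's/^/"/' -e 's/$/"/'; }   # two substitutions on the anchors
quote_amp()     { sed 's/.*/"&"/'; }               # & stands for the whole match
quote_backref() { sed 's/^\(.*\)$/"\1"/'; }        # \1 stands for the captured group

for f in quote_anchors quote_amp quote_backref; do
    printf '%s\n' 'hello world' | "$f"
done
```

All three should print the line wrapped in double quotes; the & form is the shortest when the whole match is what you want to reuse.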

Vim regex backreference

I want to do this:
%s/shop_(*)/shop_\1 wp_\1/
Why doesn't shop_(*) match anything?
There are several issues here.
parens in vim regexen are not for capturing -- you need to use \( \) for captures.
* doesn't mean what you think. It means "0 or more of the previous", so your regex means "a string that contains shop_ followed by 0+ ( and then a literal )". You're looking for ., which in regex means "any character". Put together with a star as .* it means "0 or more of any character". You probably want at least one character, so use .\+ (+ means "1 or more of the previous")
Use this: %s/shop_\(.\+\)/shop_\1 wp_\1/.
Optionally end it with g after the final slash to replace for all instances on one line rather than just the first.
If I understand correctly, you want %s/shop_\(.*\)/shop_\1 wp_\1/
Escape the capturing parentheses and use .* to match any number of any character.
(Your search is searching for "shop_" followed by any number of opening parentheses followed by a closing parenthesis)
If you would like to avoid having to escape the capture parentheses and make the regex pattern syntax closer to other implementations (e.g. PCRE), add \v (very magic!) at the start of your pattern (see :help \magic for more info):
:%s/\vshop_(.*)/shop_\1 wp_\1/
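A rough command-line analogy, as a sketch: sed's default BRE syntax groups with \( \) much like Vim's default mode, while sed -E groups with bare ( ) much like Vim's \v. The function names and the sample input are made up, and the analogy covers only the grouping syntax, not every difference between the two regex flavours.

```shell
#!/bin/sh
# BRE: escaped parentheses capture (compare Vim's default "magic" mode).
dup_bre() { sed 's/shop_\(.*\)/shop_\1 wp_\1/'; }

# ERE: bare parentheses capture (compare Vim's \v "very magic" mode).
dup_ere() { sed -E 's/shop_(.*)/shop_\1 wp_\1/'; }

printf '%s\n' 'shop_items' | dup_bre
printf '%s\n' 'shop_items' | dup_ere
```

Both commands should print shop_items wp_items.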
@Luc if you look here: regex-info, you'll see that vim is behaving correctly. Here's a parallel from sed:
echo "123abc456" | sed 's#^([0-9]*)([abc]*)([456]*)#\3\2\1#'
sed: -e expression #1, char 35: invalid reference \3 on 's' command's RHS
whereas with the "escaped" parentheses, it works:
echo "123abc456" | sed 's#^\([0-9]*\)\([abc]*\)\([456]*\)#\3\2\1#'
456abc123
I hate to see vim maligned - especially when it's behaving correctly.
PS I tried to add this as a comment, but just couldn't get the formatting right.