Using a positive lookahead to remove the middle of a string - regex

I'm currently attempting to remove text in the middle of this string:
RenameMe_12345_12365_130706T234502.txt
using the following regex:
^[a-zA-Z]+(?=_[0-9]+_[0-9]+).+$
in an attempt to return:
RenameMe_130706T234502.txt
but the regex returns the entire string without excluding the middle:
RenameMe_12345_12365_130706T234502.txt
Am I using the positive lookahead incorrectly, or am I approaching the problem incorrectly? Can positive lookaheads not be used this way?

replace this regex:
_.*_
with
_
example with sed tool:
kent$ echo RenameMe_12345_12365_130706T234502.txt|sed 's/_.*_/_/'
RenameMe_130706T234502.txt
You could do it with your own tool/programming language.
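As a minimal sketch of the same substitution outside sed, this is roughly how it could look with Python's re module (the variable names are illustrative and not part of the question):

```python
import re

filename = "RenameMe_12345_12365_130706T234502.txt"

# Greedy .* stretches from the first underscore to the last one,
# so the whole middle section collapses into a single "_".
renamed = re.sub(r"_.*_", "_", filename)
print(renamed)  # RenameMe_130706T234502.txt
```

A lookahead only checks what follows without consuming it, which is why the .+ in the original pattern still matched the middle part.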
EDIT for OP's comment:
@CodingUnderDuress _.*_ is a single regex (BRE). It relies on the greedy .* quantifier, which stretches from the first underscore to the last one, to achieve your goal.
If you don't want to substitute and only want a regex that matches the parts you need, you could use:
(^[^_]*|_[^_]*$)
Test with grep (-E means ERE, -o prints only the matched parts):
kent$ echo "RenameMe_12345_12365_130706T234502.txt"|grep -Eo '(^[^_]*|_[^_]*$)'
RenameMe
_130706T234502.txt
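The match-only approach can be sketched in Python as well, assuming re.findall stands in for grep -o (names are illustrative):

```python
import re

filename = "RenameMe_12345_12365_130706T234502.txt"

# Same alternation as the grep example: the leading run without underscores,
# or the final underscore plus everything after it.
parts = re.findall(r"^[^_]*|_[^_]*$", filename)
print(parts)           # ['RenameMe', '_130706T234502.txt']
print("".join(parts))  # RenameMe_130706T234502.txt
```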
You can of course use look-behind/ahead if you really love them, but then you need PCRE. I don't see why look-around is needed for your requirement.

You can replace whatever this matches with an empty string:
_(\w+(?=_))*
Working
[1] Match the character `_`
[2] followed by a run of word characters
[3] A positive look-ahead `(?=_)` makes sure the last `_` is not consumed
[4] Match the above 0 or more times
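A quick way to try this pattern is the Python sketch below. One assumption to note: the replacement is limited to the first match (count=1), because a global replace would also match the remaining lone `_` (with the group repeated zero times) and remove it.

```python
import re

filename = "RenameMe_12345_12365_130706T234502.txt"
pattern = r"_(\w+(?=_))*"

# Replace only the first match; \w also matches "_", so the group
# swallows "12345_12365" and stops right before the last underscore.
first_only = re.sub(pattern, "", filename, count=1)
print(first_only)  # RenameMe_130706T234502.txt

# Without the limit, the leftover "_" matches too and disappears.
everywhere = re.sub(pattern, "", filename)
print(everywhere)  # RenameMe130706T234502.txt
```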

Use this
(?<=[^_])_\w+_(?=[^_]+)
to match the part you want to remove.
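A small Python sketch of this pattern follows. Note that the match includes both surrounding underscores, so this sketch assumes the replacement should be a single `_` rather than an empty string in order to get the name asked for in the question.

```python
import re

filename = "RenameMe_12345_12365_130706T234502.txt"
pattern = r"(?<=[^_])_\w+_(?=[^_]+)"

matched = re.search(pattern, filename).group(0)
print(matched)  # _12345_12365_

# Put one underscore back so the two remaining parts stay separated.
renamed = re.sub(pattern, "_", filename)
print(renamed)  # RenameMe_130706T234502.txt
```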

Related

How can I get the count of a capture group then replace the characters with a specific character?

How can I get the count of a capture group and replace it with the same number of characters that I specify?
For example here is a string...
123456789ABCD00001DDD
My regex with capture groups is as follows...
^([123456789]{9})([ABCD]{1,4})([0]{1,5})([0-9]{1,5})([D]*)$
When I use something like Notepad++ I want to find the above and replace it with something like...
\1\2 \4\5
Making the end results look like...
123456789ABCD 1DDD
Example located at https://regex101.com/r/fykEnn/1
You may use this regex with \G and a positive lookahead:
(?:^([1-9]{9}[ABCD]{1,4})(?=0{1,5}\d{1,5}D*$)|\G)0
\G asserts position at the end of the previous match or the start of the string for the first match.
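Python's built-in re module has no \G anchor, so the sketch below swaps in a different technique for the same goal: a replacement function that emits as many spaces as there were zeros in the third group. The function name pad_zeros and the simplified character classes are illustrative assumptions.

```python
import re

pattern = re.compile(r"^([1-9]{9})([ABCD]{1,4})(0{1,5})([0-9]{1,5})(D*)$")

def pad_zeros(match):
    # Swap the captured zeros for the same number of spaces.
    spaces = " " * len(match.group(3))
    return match.group(1) + match.group(2) + spaces + match.group(4) + match.group(5)

result = pattern.sub(pad_zeros, "123456789ABCD00001DDD")
print(repr(result))  # '123456789ABCD    1DDD'
```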
In bash, using sed:
$ echo 123456789ABCD00001DDD | sed -re 's/([123456789]{9})([ABCD]{1,4})([0]{1,5})([0-9]{1,5})([D]*)$/\1\2 \4\5 /g'
123456789ABCD 1DDD
I think you should use both a lookbehind and a lookahead assertion, with a single digit in the lookahead. For example:
Find what: (?<=[A-Z])\d{4}(?=\d[A-Z])
Replace with: a space
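The same find/replace can be tried in Python as a rough check. It assumes exactly four zeros precede the final digit, as in the sample string, and it inserts one space rather than one per zero.

```python
import re

text = "123456789ABCD00001DDD"

# Four digits preceded by a letter and followed by one digit plus a letter.
result = re.sub(r"(?<=[A-Z])\d{4}(?=\d[A-Z])", " ", text)
print(result)  # 123456789ABCD 1DDD
```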

Regex AND search inside a block enclosed by a delimiter

I want to do a regex AND search, (?=)(?=), inside a block enclosed by some delimiter such as #.
In the following sample, what I expect is that the pattern matches from cat to ugly inside the block that starts at # cat B and ends just before # cat C.
But the regex matches nothing.
regex
^#(?=[\s\S]*(cat))(?=[\s\S]*(ugly))^#
text
# cat A
the cat is
very cute.
# cat B
the cat is
very ugly.
# cat C
the cat is
very good.
#
You can test the regex on https://regexr.com/
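In your pattern ^#(?=[\s\S]*(cat))(?=[\s\S]*(ugly))^# you match a # at the start of the string with ^#, followed by 2 positive lookaheads, and then you try to match ^# again. Lookaheads consume nothing, so the second ^# has to match right after the first #, which is never the start of a line. That is why you don't get a match.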
To get a more exact match, you could start the pattern with ^# cat B
If you want to use lookaheads, you might use 2 capturing groups in the positive lookahead. If you want to search for cat and ugly as whole words you might use word boundaries \b.
(?s) is an inline modifier that lets the dot match a newline; you might also use /s as a flag instead.
(?s)(?=^# cat B.*?(cat).*?(ugly).*?^# cat C)
But it might be easier to not use the lookahead and match instead:
(?s)^# cat B.*?(cat).*?(ugly).*?^# cat C$
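Here is a sketch of the non-lookahead version in Python, using the sample text from the question. It assumes the multiline flag is also enabled (written inline as (?sm)), since ^ and $ otherwise only match at the very start and end of the whole string.

```python
import re

text = """# cat A
the cat is
very cute.
# cat B
the cat is
very ugly.
# cat C
the cat is
very good.
#
"""

# s: dot matches newlines; m: ^ and $ match at line boundaries.
pattern = re.compile(r"(?sm)^# cat B.*?(cat).*?(ugly).*?^# cat C$")

match = pattern.search(text)
print(match.groups())  # ('cat', 'ugly')
```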
This RegEx might help you to design/match your target words by bounding them using \n.
((.+)(cat)(.+))\n((.+)(ugly)(.+))
To keep it simple, it creates four groups for each target keyword, cat and ugly, so the keywords themselves can be referenced as $3 and $7.
You could additionally bound it with start ^ and end $, if you wish.
This expression only works when your target keywords are in the middle of both lines.

How to change this regex without lookbehind check

It should match the substring between zero or more leading and trailing spaces. C++11 does not have lookbehind. Is it possible to rewrite this regex? Or do I need to install Boost and use its full regex power?
The regex: ^\s*(.*(?<! ))\s*$
UPDATE: the match should end up in the capture group (backreference)!
You can make the inner * lazy by using .*? instead, which makes it match as few characters as possible while still giving you a match. This allows the last \s* to consume all the spaces:
>>> re.match(r'^\s*(.*?)\s*$', ' asdf asdf ').group(1)
'asdf asdf'

Regex to get all character to the right of first space?

I am trying to craft a regular expression that will match all characters after (but not including) the first space in a string.
Input text:
foo bar bacon
Desired match:
bar bacon
The closest thing I've found so far is:
\s(.*)
However, this matches the first space in addition to "bar bacon", which is undesirable. Any help is appreciated.
You can use a positive lookbehind:
(?<=\s).*
Although it looks like you've already put a capturing group around .* in your current regex, so you could just try grabbing that.
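Both options can be compared side by side in a short Python sketch (variable names are illustrative):

```python
import re

text = "foo bar bacon"

# Lookbehind: the space is checked but is not part of the match.
via_lookbehind = re.search(r"(?<=\s).*", text).group(0)

# Capturing group: the space is matched, but group 1 excludes it.
via_group = re.search(r"\s(.*)", text).group(1)

print(via_lookbehind)  # bar bacon
print(via_group)       # bar bacon
```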
I'd prefer to use [[:blank:]] for it, as it doesn't match newlines, just in case we're targeting multi-line input. It's also compatible with engines that don't support \s.
(?<=[[:blank:]]).*
You don't need look behind.
my $str = 'now is the time';
# Non-greedily match up to the first space, and then get everything after in a group.
$str =~ /^.*? +(.+)/;
my $right_of_space = $1; # Keep what is in the group in parens
print "[$right_of_space]\n";
You can also try this
(?s)(?<=\S*\s+).*
or
(?s)\S*\s+(.*)
(group 1 has your match)
With (?s), . also matches newlines.
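Whether the lookbehind form works depends on the engine, because its lookbehind is variable-width. As an illustration, Python's re module refuses to compile it, while the capture-group form works as-is. The sample input with a newline is an assumption added to show the effect of (?s).

```python
import re

text = "foo bar\nbacon"

# Capture-group form; (?s) lets "." cross the newline.
rest = re.search(r"(?s)\S*\s+(.*)", text).group(1)
print(repr(rest))  # 'bar\nbacon'

# The lookbehind form is variable-width, which Python's re rejects.
try:
    re.compile(r"(?s)(?<=\S*\s+).*")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)  # False
```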

Vim regex backreference

I want to do this:
%s/shop_(*)/shop_\1 wp_\1/
Why doesn't shop_(*) match anything?
There are several issues here.
Parens in Vim regexes do not capture by default. You need to use \( \) for captures.
* doesn't mean what you think. It means "0 or more of the previous", so your regex means "a string that contains shop_ followed by 0+ ( and then a literal )". You're looking for ., which in regex means "any character". Put together with a star as .* it means "0 or more of any character". You probably want at least one character, so use .\+ (+ means "1 or more of the previous").
Use this: %s/shop_\(.\+\)/shop_\1 wp_\1/.
Optionally end it with g after the final slash to replace for all instances on one line rather than just the first.
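For comparison, here is the same capture-and-reuse idea sketched in Python, where the syntax differs from Vim's default mode: plain parentheses capture and + needs no backslash. The sample input shop_orders is a made-up value.

```python
import re

line = "shop_orders"

# (.+) captures at least one character after "shop_";
# \1 reuses that capture twice in the replacement.
result = re.sub(r"shop_(.+)", r"shop_\1 wp_\1", line)
print(result)  # shop_orders wp_orders
```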
If I understand correctly, you want %s/shop_\(.*\)/shop_\1 wp_\1/
Escape the capturing parenthesis and use .* to match any number of any character.
(Your search is searching for "shop_" followed by any number of opening parentheses followed by a closing parenthesis)
If you would like to avoid having to escape the capture parentheses and make the regex pattern syntax closer to other implementations (e.g. PCRE), add \v (very magic!) at the start of your pattern (see :help \magic for more info):
:%s/\vshop_(.*)/shop_\1 wp_\1/
@Luc if you look here: regex-info, you'll see that vim is behaving correctly. Here's a parallel from sed:
echo "123abc456" | sed 's#^([0-9]*)([abc]*)([456]*)#\3\2\1#'
sed: -e expression #1, char 35: invalid reference \3 on 's' command's RHS
whereas with the "escaped" parentheses, it works:
echo "123abc456" | sed 's#^\([0-9]*\)\([abc]*\)\([456]*\)#\3\2\1#'
456abc123
I hate to see vim maligned - especially when it's behaving correctly.
PS I tried to add this as a comment, but just couldn't get the formatting right.