Extend string between strings - regex

startABCend
->
startABC123end
I seek to capture text between start and end, and extend it, as shown. I tried:
find = start.*end, replace = \1 123: will capture start and end and between, but replace them all
find = (?s)(?<=start).+?(?=end), replace = \1 123: will keep start and end but replace captured
How to accomplish this with regex in N++?
The exact use case is
func_name(a, b=1) -> func_name(a, b=1, c=2)
# can also be
func_name(g=5, k=7) -> func_name(g=5, k=7, c=2)
# so capture between `func_name(` and `)` and extend with `, c=2`

You could do this without capture groups, and match what you want to replace.
\bstart\K.*?(?=end\b)
The pattern matches:
\bstart Match start preceded by a word boundary
\K Forget what is matched until now
.*? Match as few characters as possible
(?=end\b) Positive lookahead, assert end to the right followed by a word boundary
In the replacement use the full match followed by 123
$&123
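For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch. It is an adaptation rather than the pattern above verbatim: Python's `re` module has no `\K`, so a lookbehind stands in for it. The function name `extend_between` is made up for this illustration.

```python
import re

# (?<=\bstart) replaces \bstart\K, which Python's re module does not support.
PATTERN = re.compile(r"(?<=\bstart).*?(?=end\b)")


def extend_between(text, addition="123"):
    """Append `addition` to whatever sits between 'start' and 'end'."""
    # m.group(0) is the full match, the counterpart of $& in Notepad++.
    return PATTERN.sub(lambda m: m.group(0) + addition, text)


print(extend_between("startABCend"))
```

Only the text between the two delimiters is matched, so `start` and `end` themselves are never rewritten.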
For the updated example data, you could match the format of key with an optional =value, and optionally repeat that asserting a ) to the right.
\bfunc_name\([^\s,=]+(?:=[^\s,=]+)?(?:,\h*[^\s,=]+(?:=[^\s,=]+)?)*(?=\))
And replace with
$&, c=2
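A hedged Python sketch of the same replacement, useful for checking the pattern against both sample calls. Two things differ from the Notepad++ version: `\h` does not exist in Python's `re`, so `[ \t]` is used as an approximation, and the helper name `add_argument` is hypothetical.

```python
import re

# One argument: a key with an optional =value part.
ARG = r"[^\s,=]+(?:=[^\s,=]+)?"

# [ \t] stands in for \h, which Python's re module does not support.
PATTERN = re.compile(r"\bfunc_name\(" + ARG + r"(?:,[ \t]*" + ARG + r")*(?=\))")


def add_argument(source, extra=", c=2"):
    """Append `extra` just before the closing parenthesis of func_name(...)."""
    # m.group(0) is the whole match, like $& in Notepad++.
    return PATTERN.sub(lambda m: m.group(0) + extra, source)


print(add_argument("func_name(a, b=1)"))
print(add_argument("func_name(g=5, k=7)"))
```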

Your example target does not include the whitespace that your replacement string has. To use the group and append digits directly after it, wrap the backreference in parentheses, which keeps it separate from the digits that follow.
Basically:
Find: (?<=start)(.+?)(?=end)
Replace: (\1)123
or just
Find: start(.+?)end
Replace: start(\1)123end
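The same "keep the group number apart from the digits" concern exists in other engines. As a small sketch in Python (not Notepad++ syntax), the replacement string uses `\g<1>` for that purpose, since a bare `\1` followed directly by more digits would not be read as group 1 plus the text 123.

```python
import re

# \g<1> names group 1 unambiguously, so the digits 123 can follow directly.
result = re.sub(r"start(.+?)end", r"start\g<1>123end", "startABCend")
print(result)
```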

Related

Using regex for repeating text in Notepad++

I have links like this:
https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/sub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/sub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052498-STXMLOPEN/sub_1.m3u8
How can I use a regex in Notepad++ to make them like this:
https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/6100052500-STXMLOPENsub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/6100052499-STXMLOPENsub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052498-STXMLOPEN/6100052498-STXMLOPENsub_1.m3u8
I want to repeat what is between net/ and /sub for each link.
I am assuming you want to repeat the characters before the last /.
You may try this regex:
Regex
([^/\n]+)/(?=[^/\n]+$)
Substitution
$1/$1
([^/\n]+) // any consecutive non-slash and non-linebreak characters, and capture them in group 1
/ // a slash
(?=[^/\n]+$) // lookahead, there must be non-slash and non-linebreak characters followed by the end of a line ahead
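A quick way to check this pattern outside Notepad++ is a short Python sketch. The only assumption added here is the `re.MULTILINE` flag, which is needed in Python so that `$` matches at the end of every line; `\1` is Python's spelling of `$1` in the replacement.

```python
import re

links = """https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/sub_1.m3u8
https://d2ynliea65eb6o.cloudfront.net/6100052499-STXMLOPEN/sub_1.m3u8"""

# Group 1 is the path segment before the last slash; the lookahead makes sure
# that only the last slash of each line qualifies.
repeated = re.sub(r"([^/\n]+)/(?=[^/\n]+$)", r"\1/\1", links, flags=re.MULTILINE)
print(repeated)
```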
If you actually want to search for the text between "net/" and "/sub" and repeat it, you can use:
(net/(.*?))/sub
replace with:
$1/$2sub
The second pair of parentheses, (.*?), creates group $2, which holds the variable text between net/ and /sub.
The first pair, which does NOT contain /sub, captures everything from net/ up to but not including /sub and puts it into $1. To include /sub in that group, move the closing ) to the right of /sub.
So $1/$2sub is $1, then a /, then $2, then sub; the rest of the line is left untouched.
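To make the two groups concrete, here is a small Python sketch that prints what each group holds for one of the sample links. Python writes `$1` and `$2` as `\1` and `\2` in the replacement; the variable names are only for illustration.

```python
import re

url = "https://d2ynliea65eb6o.cloudfront.net/6100052500-STXMLOPEN/sub_1.m3u8"
match = re.search(r"(net/(.*?))/sub", url)

group_1 = match.group(1)  # outer parentheses: "net/" plus the variable text
group_2 = match.group(2)  # inner parentheses: only the variable text
print(group_1)
print(group_2)

# Rebuild the matched part as: group 1, a slash, group 2, then "sub".
result = re.sub(r"(net/(.*?))/sub", r"\1/\2sub", url)
print(result)
```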

Regex notepad++ and groups

I have the following data in my file:
234xt_
yad42_
23ft3_
45gdw_
...
Where the _ means a space.
Using Notepad++ I want to rewrite it to be:
'234xt',
'yad42',
'23ft3',
'45gdw'
I am using the following regex in the "Find what" (^\w+)\s*\n
And in the "Replace with" field $0,
But it is not working as expected.
You may use
^(\w+) $
or
^(\w+)\h$
And replace with '$1',.
^ will match the start of a line, (\w+) will place one or more letters, digits or underscores into Group 1 (that you may access via $1 or \1 backreference in the replacement pattern), and then a space or \h will match a space or any horizontal whitespace, and then $ will assert the position at the end of the line.
If the (white)spaces can go missing add the appropriate quantifier after the space or \h: \h* will match 0 or more whitespaces and \h? will match 1 or 0.
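The behaviour described above can be tried in Python with a minimal sketch. Two adaptations are assumed: `[ \t]*` replaces `\h*` (Python's `re` has no `\h`), and `re.MULTILINE` is passed so that `^` and `$` anchor at every line.

```python
import re

# Sample data from the question; each line ends with one trailing space.
data = "234xt \nyad42 \n23ft3 \n45gdw "

# Group 1 holds the word; the optional trailing blanks are matched but dropped.
quoted = re.sub(r"^(\w+)[ \t]*$", r"'\1',", data, flags=re.MULTILINE)
print(quoted)
```

Note that this puts a comma after every line, including the last one, so the final comma still has to be removed by hand if it is not wanted.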
You should use \1 instead of $0, because $0 is the whole match, including the trailing whitespace and line break; see the example in the docs.

How to replace text without changing quoted string with regex

I want to replace
$this->input->post("product_name");
with
$post_data["product_name"];
I want to use a Notepad++ regex, but I couldn't find a proper solution.
In find --> $this->input->post("[\*w\]");
In replace --> $post_data["$1"];
but it's not working
The $this->input->post("[\*w\]"); pattern does not work because:
$ is a special char matching the end of a line, you need to use \$ to match it as a literal char
[\*w\] is a malformed pattern because there is no unescaped ] to close the character class opened by [. Also, w just matches the letter w; \w is what matches any letter, digit or underscore.
You may use
Find What: \$this->input->post\("(\w*)"\);
Replace With: $post_data["$1"];
If there can be any char inside double quotes use .*? instead of \w*:
Find What: \$this->input->post\("(.*?)"\);
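The escaping rules above can be verified with a short Python sketch. The sample PHP line is made up for the test. In a Python replacement string, `$` and `[` are ordinary characters and `\1` refers to group 1.

```python
import re

php = '$name = $this->input->post("product_name");'

# \$ matches a literal dollar sign; \( and \) match literal parentheses.
pattern = r'\$this->input->post\("(\w*)"\);'

# Group 1 is the key between the double quotes.
converted = re.sub(pattern, r'$post_data["\1"];', php)
print(converted)
```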
Use this pattern to match desired text \$this->input->post\(("[^"]+")\);
And replace it with pattern \$post_data\[\1\]
Explanation:
\$this->input->post - match $this->input->post literally
\(("[^"]+")\); - match ( literally, then match the double quotes and everything between them with "[^"]+" and store that in the first capturing group, then match ); literally
To replace
$this->input->post("product_name");
by
$post_data["product_name"];
do a replace, with regular expression mode enabled, of
this->input->post\("(.*)"\);
by
post_data\["\1"\];
\x, where x is a number, refers to the x-th group captured by parentheses. Here the group captures whatever sits between the quotes in this->input->post("XXXX");
Don't forget to escape special characters with \.
In your case the special characters were [ ] ( ).

Replace whitespaces between specific strings

I'm trying to replace whitespaces with underscores in certain parts of my html-document with Notepad++.
I can identify the area to search for the whitespaces in the following way:
-Begins with: src="video/
-Ends with: mp4
For example I might have a line like this:
<video class="play" src="video/my file name with empty spaces.mp4">
and I would like to change it to be like this:
<video class="play" src="video/my_file_name_with_empty_spaces.mp4">
Tested in N++
Search: (?:src="video|(?<!^)\G)(?:(?!mp4).)*?\K\s+
Replace: _
Explanation
(?:src="video|(?<!^)\G) matches the delimiter src="video, or \G the position following the previous match as long as it is not at the beginning of the string (?<!^) where \G can also match
(?:(?!mp4).) matches one character, as long as mp4 does not start at that position
*? lazily repeats that, matching as few such characters as needed, up to...
\s+ one or more whitespace characters (our match, which we replace with _)
before the space, the \K tells the engine to drop what was matched so far from the final match it returns
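Python's `re` module supports neither `\G` nor `\K`, so the pattern above cannot be ported directly. A different technique gives the same result for the sample line: match the whole `src="video/ ... mp4` span once and replace the whitespace inside it in a callback. This is a swapped-in approach, not the answer's regex, and the helper name `underscore_spaces` is hypothetical.

```python
import re

html = '<video class="play" src="video/my file name with empty spaces.mp4">'


def underscore_spaces(match):
    # Only the captured file name is changed; both delimiters are put back as-is.
    return match.group(1) + re.sub(r"\s+", "_", match.group(2)) + match.group(3)


fixed = re.sub(r'(src="video/)(.*?)(mp4)', underscore_spaces, html)
print(fixed)
```

The space between `class="play"` and `src=` lies outside the matched span, so it is left alone.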

Multiline selection of blocks with ID at the end of each block with regular expression

I have regular expression:
BEGIN\s+\[([\s\S]*?)END\s+ID=(.*)\]
which selects multiline text and an ID from the text below. I would like to select only IDs with the prefix X_, but if I change ID=(.*) to ID=(X_.*), the match starts at the BEGIN of the second block, not the third as I need. Could someone help me get the correct expression, please?
text example:
BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
text xxx
BEGIN [
text bbb
END ID=X_3]
It isn't the .* that's gobbling everything up as people keep saying, it's the [\s\S]*?. .* can't do it because (as the OP said) the dot doesn't match newlines.
When the END\s+ID=(X_.*)\] part of your regex fails to match the last line of the second block, you're expecting it to abandon that block and start over with the third one. That's what it would have to do to make the shortest match.
In reality, it backtracks to the beginning of the line and lets [\s\S]*? consume it instead. And it keeps on consuming until it finds a place where END\s+ID=(X_.*)\] can match, which happens to be the last line of the third block.
The following regex avoids that problem by matching line by line, checking each one to see if it starts with END. This effectively confines the match to one block at a time.
(?m)^BEGIN\s+\[[\r\n]+((?:(?!END).*[\r\n]+)*)END\s+ID=(X_.*)\]
Note that I used ^ to anchor each match to the beginning of a line, so I used (?m) to turn on multiline mode. But I did not--and you should not--turn on single-line/DOTALL mode.
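The difference between the two patterns can be reproduced with a Python sketch that runs both over the question's sample text. Python's `re` is assumed to behave like the original engine for these constructs (inline `(?m)`, negative lookahead, dot not matching newlines by default); the variable names are illustrative only.

```python
import re

text = """BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
text xxx
BEGIN [
text bbb
END ID=X_3]"""

naive = r"BEGIN\s+\[([\s\S]*?)END\s+ID=(X_.*)\]"
line_by_line = r"(?m)^BEGIN\s+\[[\r\n]+((?:(?!END).*[\r\n]+)*)END\s+ID=(X_.*)\]"

# findall returns one (body, id) tuple per match.
naive_matches = re.findall(naive, text)
fixed_matches = re.findall(line_by_line, text)

for body, block_id in fixed_matches:
    print(block_id, repr(body))
```

With the original pattern, the match that ends at X_2 starts at the BEGIN of the Y_1 block and so drags `text b` and `text c` along; the line-by-line pattern keeps each body inside its own block.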
Assuming there aren't any empty lines inside a block and the BEGIN/END statements are the first non-space of their line, I'd write the regex like this (Perl notation; change the delimiters and remove comments, whitespace and the /x modifier if you use a different engine)
m{
\n \s* BEGIN \s+ \[ # match the beginning
( (?!\n\s*\n) .)*? # match anything that isn't an empty line
# checking with a negative look-ahead (?!PATTERN)
\n \s* END \s+ ID=X_[^\]]* \] # the ID may not contain "]"
}sx # /x: use extended syntax, /s: "." matches newlines
If the content may be anything, it might be best to create a list of all blocks, and then grep through them. This regex matches any block:
m{ (
BEGIN \s+ \[
.*? # non-greedy matching is important here
END \s+ ID=[^\]]* \] # greedy matching is safe here
) }xs
(add newlines if wanted)
Then only keep those matches that match this regex:
/ID = X_[^\]]* \] $/x # anchor at end of line
If we don't do this, backtracking may prevent a correct match ([\s\S]*? can contain END ID=X_). Your regex would put anything inside the blocks until it sees a X_.*.
So using BEGIN\s+\[([\s\S]*?)END\s+ID=(X_.*?)\] (note the extra question mark), one match would be:
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
…instead of failing at the Y_. When the dot also matches newlines, a greedy match (your unchanged regex) should result in the whole file being matched: your (.*) eats up all characters (until the end of the file) and then goes back until it finds a ].
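The two-step approach described above (collect every block, then keep only the X_ ones) could look like this in Python. This is a sketch on a shortened version of the question's sample text; `re.DOTALL` plays the role of Perl's /s modifier, and the /x whitespace has been removed from both patterns.

```python
import re

text = """BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
BEGIN [
text d
text e
END ID=X_2]"""

# Step 1: collect every block, whatever its ID is.
blocks = re.findall(r"BEGIN\s+\[.*?END\s+ID=[^\]]*\]", text, flags=re.DOTALL)

# Step 2: keep only the blocks whose closing line has an ID starting with X_.
x_blocks = [b for b in blocks if re.search(r"ID=X_[^\]]*\]$", b)]

for block in x_blocks:
    print(block)
    print("---")
```

Because every block is consumed in step 1, a Y_ block can never be merged into the next X_ block.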
EDIT:
Should you be using Perl's regex engine, we can use the (*FAIL) verb:
/BEGIN\s+\[(.*?)END\s+ID=(X_[^\]]*|(*FAIL))\]/s
"Either have an ID starting with X_ or the match fails". However, this does not solve the problem with END ID=X_1]-like statements inside your data.
Change your .* to a [^\]]* (i.e. match non-]s), so that your matches can't spill over past an END block, giving you something like BEGIN\s+\[([^\]]*?)END\s+ID=(X_[^\]]*)\]
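A small Python sketch of this last suggestion, run on a shortened version of the question's sample text. It relies on the assumption that the block bodies never contain a ] themselves, since the negated character class stops at the first one.

```python
import re

text = """BEGIN [
text b
text c
END ID=Y_1]
text aaa
BEGIN [
text d
text e
END ID=X_2]"""

# [^\]]*? cannot cross a "]", so the body can never run past the end of a block.
pattern = r"BEGIN\s+\[([^\]]*?)END\s+ID=(X_[^\]]*)\]"
matches = re.findall(pattern, text)

for body, block_id in matches:
    print(block_id, repr(body))
```

The attempt that starts at the Y_1 block runs into that block's closing ] and fails, so only the X_2 block is returned.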