Replace whitespaces between specific strings

I'm trying to replace whitespace with underscores in certain parts of my HTML document with Notepad++.
I can identify the area to search for whitespace as follows:
- Begins with: src="video/
- Ends with: mp4
For example, I might have a line like this:
<video class="play" src="video/my file name with empty spaces.mp4">
and I would like to change it to be like this:
<video class="play" src="video/my_file_name_with_empty_spaces.mp4">

Tested in N++
Search: (?:src="video|(?<!^)\G)(?:(?!mp4).)*?\K\s+
Replace: _
On the demo, see the substitutions at the bottom.
Explanation
(?:src="video|(?<!^)\G) matches the delimiter src="video, or \G, the position right after the previous match. The lookbehind (?<!^) stops \G from matching at the start of the string, where \G also matches before any match has been made.
(?:(?!mp4).) matches one character that is not the start of mp4
*? lazily repeats such characters, up to...
\K, which tells the engine to drop everything matched so far from the match it returns, so only the whitespace that follows is replaced
\s+ one or more whitespace characters (our match, which is replaced with _; a run of several spaces becomes a single _)
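Python's built-in re module supports neither \G nor \K, so the Notepad++ pattern above cannot be reused there as-is. A minimal sketch of the same transformation, swapping in a different technique (match each src="video/...mp4 span, then replace the whitespace inside it with a callback), might look like this; the name underscore_video_names is hypothetical:

```python
import re

# Match from src="video/ up to the nearest "mp4", staying inside the quoted
# attribute value and on a single line.
SRC_SPAN = re.compile(r'src="video/[^"\n]*?mp4')


def underscore_video_names(html: str) -> str:
    """Replace each whitespace run inside src="video/...mp4 with a single underscore."""
    return SRC_SPAN.sub(lambda m: re.sub(r"\s+", "_", m.group(0)), html)


line = '<video class="play" src="video/my file name with empty spaces.mp4">'
print(underscore_video_names(line))
# <video class="play" src="video/my_file_name_with_empty_spaces.mp4">
```

As with \s+ in the Notepad++ pattern, several consecutive spaces collapse into one underscore; using m.group(0).replace(" ", "_") in the callback would replace each space individually instead.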

Related

Extend string between strings

startABCend
->
startABC123end
I want to capture the text between start and end and extend it, as shown. I tried:
find = start.*end, replace = \1 123: this matches start, end and everything between, but replaces all of it
find = (?s)(?<=start).+?(?=end), replace = \1 123: this keeps start and end but replaces the captured text
How can I accomplish this with a regex in N++?
The exact use case is:
func_name(a, b=1) -> func_name(a, b=1, c=2)
# can also be
func_name(g=5, k=7) -> func_name(g=5, k=7, c=2)
# so capture between `func_name(` and `)` and extend with `, c=2`

You could do this without capture groups by matching only the text you want to extend.
\bstart\K.*?(?=end\b)
The pattern matches:
\bstart Match start preceded by a word boundary
\K Forget what has been matched so far
.*? Match as few characters as possible
(?=end\b) Positive lookahead, assert end to the right followed by a word boundary
In the replacement, use the full match followed by 123:
$&123
For the updated example data, you could match the format of a key with an optional =value, optionally repeat that after each comma, and assert a ) to the right:
\bfunc_name\([^\s,=]+(?:=[^\s,=]+)?(?:,\h*[^\s,=]+(?:=[^\s,=]+)?)*(?=\))
Regex demo
And replace with
$&, c=2
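Both replacements can also be tried outside Notepad++. The sketch below is a Python translation, not part of the answer: Python's re has no \K, so the first pattern simply keeps start inside the match and writes it back through \g<0> (Python's spelling of $&), and \h is replaced with [ \t] because re does not support \h.

```python
import re

# start...end: match "start" plus the shortest run before "end", then append "123".
print(re.sub(r"\bstart.*?(?=end\b)", r"\g<0>123", "startABCend"))
# startABC123end

# func_name(...): match the argument list up to (not including) ")" and append ", c=2".
FUNC_ARGS = re.compile(
    r"\bfunc_name\([^\s,=]+(?:=[^\s,=]+)?"
    r"(?:,[ \t]*[^\s,=]+(?:=[^\s,=]+)?)*(?=\))"
)
for call in ["func_name(a, b=1)", "func_name(g=5, k=7)"]:
    print(FUNC_ARGS.sub(r"\g<0>, c=2", call))
# func_name(a, b=1, c=2)
# func_name(g=5, k=7, c=2)
```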

Your example target does not include the space you have in your replacement string. To use the group and also append the digits, you can wrap the backreference in parentheses.
Basically:
Find: (?<=start)(.+?)(?=end)
Replace: (\1)123
or just
Find: start(.+?)end
Replace: start(\1)123end

Regex - replace blank spaces in line (Notepad++)

I have a document containing various kinds of information. I want to build a Notepad++ regex replacement that finds the following lines in the document and replaces the spaces between the double quotes with an underscore (_).
Example:
The line is:
&LOG Part: "NAME TEST.zip"
The result should be:
&LOG Part: "NAME_TEST.zip"
The perfect solution would be a regex that finds the &LOG Part: "NAME TEST.zip" lines and replaces the space with an underscore.
What I have tried so far is this expression, which finds the text between the quotes:
\"[^"]*\"
That part works, but I don't know which expression to use to replace the spaces with underscores.
Could anyone help with a solution?
Thanks!

The \"[^"]*\" will only match whole substrings from " up to another closest " without matching individual spaces you want to replace.
Since Notepad++ does not support infinite-width lookbehind, the solution is a \G-based regex that sets the boundaries and matches repeatedly (this one replaces consecutive spaces with a single _):
(?:"|(?!^)\G)\K([^ "]*) +(?=[^"]*")
Or (if each space should be replaced with its own underscore):
(?:"|(?!^)\G)\K([^ "]*) (?=[^"]*")
And replace with $1_. If you need to restrict the replacement to &LOG Part lines only, just add that text to the beginning:
(?:&LOG Part:\s*"|(?!^)\G)\K([^ "]*) (?=[^"]*")
A human-readable explanation of the regex:
(?:"|(?!^)\G)\K - find a ", or, after each successful match, the position where the previous match ended, and omit all the text matched so far (thanks to \K)
([^ "]*) - (Group 1, accessed with $1 in the replacement pattern) zero or more characters other than a space and "
" +" (a space followed by +) - one or more literal spaces (replace the space with \h to match any horizontal whitespace, or with \s to match any whitespace)
(?=[^"]*") - check if there is a double quote ahead of the current position
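If the same edit is needed in a script rather than in Notepad++, Python's re module (which supports neither \G nor \K) can get there with a replacement callback instead. A minimal sketch, assuming the quoted name never contains a double quote and sits on one line; underscore_log_part is a hypothetical name:

```python
import re

# Group 1: the "&LOG Part:" prefix and opening quote; group 2: the quoted text;
# group 3: the closing quote. [^"\n] keeps the match on a single line.
LOG_PART = re.compile(r'(&LOG Part:[ \t]*")([^"\n]*)(")')


def underscore_log_part(text: str) -> str:
    """Replace every space between the quotes of &LOG Part: lines with an underscore."""
    return LOG_PART.sub(
        lambda m: m.group(1) + m.group(2).replace(" ", "_") + m.group(3), text
    )


print(underscore_log_part('&LOG Part: "NAME TEST.zip"'))
# &LOG Part: "NAME_TEST.zip"
```

Like the second Notepad++ pattern, this replaces each space individually, and quoted text on other lines is left alone.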

Regex to find all spaces in lines beginning with a specific string

I am looking for a regex that finds all the spaces in lines starting with a specific string (in an SVN dump file). Despite the "global" modifier, my regex returns only the first occurrence of the space character.
Part of the file I am working on:
...
pla bla bli
Node-path: branches/BU ILD/ml_cf/syst em/Translation/TranslationManager.class.php
Node-kind: file
Node-action: change
Text-delta: true
....
The regex :
/Node-path: \S*(\ )/g
finds only the first space (between U and I) but not the others on the line.

To find all the spaces on a line that starts with a particular text using a PCRE regex, use this:
/(?:^Node-path: |(?!^)\G)\S+\K\h+/gm
RegEx Demo
(?:^Node-path: |(?!^)\G) matches either Node-path: at the start of a line, OR the position at the end of the previous match.
\G asserts the position at the end of the previous match, but it also matches at the start of the string on the first attempt; the (?!^) guard prevents that, so spaces on a first line that is not a Node-path: line are left alone
\K resets the starting point of the reported match.
\h+ matches one or more horizontal whitespace characters (space or tab)
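Python's re does not support \G, \K or \h, so a script version has to take a different route: find each Node-path: line first, then search only its value for whitespace runs. A minimal sketch (node_path_spaces is a hypothetical helper) that reports the offsets of those runs; unlike the PCRE pattern, it would also report whitespace right at the start or end of the value:

```python
import re
from typing import List, Tuple

# A line that starts with "Node-path: "; group 1 is the rest of that line.
NODE_PATH_LINE = re.compile(r"^Node-path: (.*)$", re.MULTILINE)
# [ \t]+ stands in for PCRE's \h+, which Python's re does not support.
H_SPACE = re.compile(r"[ \t]+")


def node_path_spaces(dump: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets in dump of each whitespace run in a Node-path value."""
    spans = []
    for line in NODE_PATH_LINE.finditer(dump):
        value_start = line.start(1)
        for ws in H_SPACE.finditer(line.group(1)):
            spans.append((value_start + ws.start(), value_start + ws.end()))
    return spans


dump = (
    "pla bla bli\n"
    "Node-path: branches/BU ILD/ml_cf/syst em/Translation/TranslationManager.class.php\n"
    "Node-kind: file\n"
)
spans = node_path_spaces(dump)
print([dump[s - 2 : e + 2] for s, e in spans])
# ['BU IL', 'st em']
```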

Remove all characters after a certain match

I am using Notepad++ to remove some unwanted strings from the end of a pattern, and this one has me stumped.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to keep just myApp.[variable] and get rid of the rest, e.g. ,, );, + '...', etc.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But what I really need is to negate that regex, so that it matches everything at the end and I can replace it with nothing.

You don't need negation. Just put your regex inside a capturing group and append .*$. $ matches the end of a line, so the whole line is matched and then replaced by the text captured in group 1. Also, . matches any character, so you need to escape the dot to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
DEMO
OR
Alternatively, match only the trailing characters and replace them with an empty string:
\b[,); +]+.*$
DEMO

I think this works just as well:
^(myApp\.\w+).*$
Replacement string:
\1
From "Difference between \w and \b regular expression meta characters":
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
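The capture-and-discard idea is not specific to Notepad++. A minimal Python sketch of the ^(myApp\.\w+).*$ replacement, applied to the four lines from the question; re.MULTILINE is needed so that ^ and $ match at every line rather than only at the ends of the whole string:

```python
import re

# Group 1 keeps "myApp." plus the word characters that follow it;
# ".*$" swallows the rest of the line, which is then dropped.
TRAILING = re.compile(r"^(myApp\.\w+).*$", re.MULTILINE)

source = """myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';"""

print(TRAILING.sub(r"\1", source))
# myApp.ComboPlaceHolderLabel
# myApp.GridTitleLabel
# myApp.SummaryLabel
# myApp.NoneLabel
```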

(^.*?\.[a-zA-Z]+)(.*)$
Use this and replace with
$1
See the demo:
http://regex101.com/r/lU7jH1/5

Multiline selection of blocks with ID at the end of each block with regular expression

I have this regular expression:
BEGIN\s+\[([\s\S]*?)END\s+ID=(.*)\]
which selects the multiline text and the ID from the text below. I would like to select only blocks whose ID has the prefix X_, but if I change ID=(.*) to ID=(X_.*), the match begins at the second block instead of the third, as I need. Could someone help me get the correct expression, please?
Example text:
BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
text xxx
BEGIN [
text bbb
END ID=X_3]

It isn't the .* that's gobbling everything up, as people keep saying; it's the [\s\S]*?. The .* can't do it because (as the OP said) the dot doesn't match newlines.
When the END\s+ID=(X_.*)\] part of your regex fails to match the last line of the second block, you're expecting it to abandon that block and start over with the third one. That's what it would have to do to produce the shortest match.
In reality, it backtracks to the beginning of the line and lets [\s\S]*? consume it instead. And it keeps on consuming until it finds a place where END\s+ID=(X_.*)\] can match, which happens to be the last line of the third block.
The following regex avoids that problem by matching line by line, checking each one to see if it starts with END. This effectively confines the match to one block at a time.
(?m)^BEGIN\s+\[[\r\n]+((?:(?!END).*[\r\n]+)*)END\s+ID=(X_.*)\]
Note that I used ^ to anchor each match to the beginning of a line, so I used (?m) to turn on multiline mode. But I did not, and you should not, turn on single-line/DOTALL mode.
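To see the difference between the original pattern and the line-by-line one, here is a small check in Python (Python's re accepts both patterns unchanged) run on the example text from the question:

```python
import re

TEXT = """BEGIN [
text a
END ID=X_1]
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
text xxx
BEGIN [
text bbb
END ID=X_3]"""

# Original attempt: [\s\S]*? can run past a non-matching END line.
ORIGINAL = re.compile(r"BEGIN\s+\[([\s\S]*?)END\s+ID=(X_.*)\]")
# Line-by-line version from the answer: a body line may not start with END.
LINE_BY_LINE = re.compile(r"(?m)^BEGIN\s+\[[\r\n]+((?:(?!END).*[\r\n]+)*)END\s+ID=(X_.*)\]")

print([id_ for _, id_ in ORIGINAL.findall(TEXT)])  # ['X_1', 'X_2', 'X_3']
print("Y_1" in ORIGINAL.findall(TEXT)[1][0])  # True: the 2nd match swallowed the Y_1 block
print(LINE_BY_LINE.findall(TEXT))
# [('text a\n', 'X_1'), ('text d\ntext e\n', 'X_2'), ('text bbb\n', 'X_3')]
```

Both patterns report the same three IDs, but only the line-by-line version keeps the Y_1 block out of the second match.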

Assuming there aren't any empty lines inside a block and the BEGIN/END statements are the first non-space text on their lines, I'd write the regex like this (Perl notation; change the delimiters and remove the comments, the whitespace and the /x modifier if you use a different engine):
m{
\n \s* BEGIN \s+ \[ # match the beginning
( (?!\n\s*\n) .)*? # match anything that isn't an empty line
# checking with a negative look-ahead (?!PATTERN)
\n \s* END \s+ ID=X_[^\]]* \] # the ID may not contain "]"
}sx # /x: use extended syntax, /s: "." matches newlines
If the content may be anything, it might be best to create a list of all blocks, and then grep through them. This regex matches any block:
m{ (
BEGIN \s+ \[
.*? # non-greedy matching is important here
END \s+ ID=[^\]]* \] # greedy matching is safe here
) }xs
(add newlines if wanted)
Then only keep those matches that match this regex:
/ID = X_[^\]]* \] $/x # anchor at end of line
If we don't do this, backtracking may prevent a correct match ([\s\S]*? can swallow a whole END ID=... line). Your regex would include everything in the blocks until it sees an X_.*.
So with BEGIN\s+\[([\s\S]*?)END\s+ID=(X_.*?)\] (note the extra question mark), one match would be:
BEGIN [
text b
text c
END ID=Y_1]
text aaa
text bbb
BEGIN [
text d
text e
END ID=X_2]
…instead of failing at the Y_. If the dot also matches newlines (as with the /s modifier), a greedy match (your unchanged regex) would match the whole file: your (.*) eats up all characters to the end of the file and then backtracks until it finds a ].
EDIT:
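If you are using Perl's regex engine, you can use the (*SKIP)(*FAIL) verbs. A plain (*FAIL) would only trigger ordinary backtracking into (.*?); adding (*SKIP) makes the failure abandon the current match attempt:
/BEGIN\s+\[(.*?)END\s+ID=(X_[^\]]*|(*SKIP)(*FAIL))\]/s
"Either have an ID starting with X_, or give up on this block." However, this does not solve the problem of END ID=X_1]-like statements inside your data.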

Change your .* to [^\]]* (i.e. match characters other than ]), so that your matches can't spill over past an END block, giving you something like BEGIN\s+\[([^\]]*?)END\s+ID=(X_[^\]]*)\]
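A quick Python check of that pattern, assuming (as the answer does) that block bodies never contain a ]; the sample text is a shortened version of the question's example:

```python
import re

TEXT = "BEGIN [\ntext a\nEND ID=X_1]\nBEGIN [\ntext b\nEND ID=Y_1]\ntext aaa\nBEGIN [\ntext d\nEND ID=X_2]"

# [^\]]*? cannot cross a "]", so a block whose ID does not start with X_
# fails on its own instead of extending into the next block.
NO_SPILL = re.compile(r"BEGIN\s+\[([^\]]*?)END\s+ID=(X_[^\]]*)\]")
print(NO_SPILL.findall(TEXT))
# [('\ntext a\n', 'X_1'), ('\ntext d\n', 'X_2')]
```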
