Notepad++ Regex to find group of lines with condition

Given this example text:
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ABB</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
...lines...
...........
</abr:rules>
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ADE</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
...lines...
...........
</abr:rules> (end of group)
I would like to find and remove everything from <abr:rules> to </abr:rules> where subapplication is NOT "CM". Organization and application are always the same; <abr:code> can be any string.
What I have tried so far is:
<abr:rules>\n<abr:ruleTypeDefinition>\n<abr:code>[a-zA-Z0-9]{3,}<\/abr:code>\n<abr:ownership>\n<.*"(FM|PSD|SSC)"\/>\n(?s).*?\n<\/abr:rules>\n
which works but only because I know the other subapplication names.
Is there any way to do it with regex only, without listing the other names?

Try the following find and replace:
Find:
<abr:rules>((?!subapplication=).)*subapplication="(?!CM")[^"]+"((?!</abr:rules>).)*</abr:rules>
Replace:
(empty string)
Note: The above pattern only works if the ". matches newline" option is enabled in the Notepad++ Find dialog. If you don't want to do that, you may use [\S\s] instead of the dot.
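To sanity-check the pattern outside Notepad++, here is a minimal Python sketch. It assumes Python's re module treats this pattern the same way Notepad++ does, uses re.DOTALL in place of the ". matches newline" option, and drops the elided lines from the sample. The names PATTERN, SAMPLE and remove_non_cm are made up for illustration.

```python
import re

# Same pattern as in the answer; re.DOTALL lets the dot match newlines.
PATTERN = re.compile(
    r'<abr:rules>((?!subapplication=).)*subapplication="(?!CM")[^"]+"'
    r'((?!</abr:rules>).)*</abr:rules>',
    re.DOTALL,
)

# Shortened version of the example text from the question.
SAMPLE = """<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ABB</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
</abr:rules>
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ADE</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
</abr:rules>
"""


def remove_non_cm(text):
    """Delete every <abr:rules> block whose subapplication is not CM."""
    return PATTERN.sub("", text)


RESULT = remove_non_cm(SAMPLE)
print(RESULT)
```

The match ends at </abr:rules>, so the line break that followed a removed block stays in the text.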

You should not use regex to process XML; you can read why here:
https://stackoverflow.com/a/1732454/3763374
Instead, use an XML parser and select the elements with a query language such as XPath.
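As a rough sketch of the parser route, the following uses Python's xml.etree.ElementTree with its limited XPath support. Several things here are assumptions, not taken from the question: the namespace URI urn:example:abr, the wrapping abr:root element, and the closing tags that the original sample elides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the question never shows the real one.
NS = {"abr": "urn:example:abr"}

# Hypothetical well-formed document built from the question's sample.
DOC = """<abr:root xmlns:abr="urn:example:abr">
  <abr:rules>
    <abr:ruleTypeDefinition>
      <abr:code>ABB</abr:code>
      <abr:ownership>
        <abr:owner organization="NT" application="DCS" subapplication="FM"/>
      </abr:ownership>
    </abr:ruleTypeDefinition>
  </abr:rules>
  <abr:rules>
    <abr:ruleTypeDefinition>
      <abr:code>ADE</abr:code>
      <abr:ownership>
        <abr:owner organization="NT" application="DCS" subapplication="CM"/>
      </abr:ownership>
    </abr:ruleTypeDefinition>
  </abr:rules>
</abr:root>"""


def keep_only_cm(xml_text):
    """Parse the document and drop every rules block that is not owned by CM."""
    root = ET.fromstring(xml_text)
    # findall returns a list, so removing children while looping is safe.
    for rules in root.findall("abr:rules", NS):
        owner = rules.find(".//abr:owner", NS)
        if owner is None or owner.get("subapplication") != "CM":
            root.remove(rules)
    return root


ROOT = keep_only_cm(DOC)
CODES = [code.text for code in ROOT.findall(".//abr:code", NS)]
print(CODES)
```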

Related

Notepad++ replace text with RegEx search result

I would like to replace a standard string in a file with another string that is the result of a regular expression. The standard text looks like:
<xsl:variable name="ServiceCode" select="###"/>
I would like to replace ### with a servicecode, that I can find later in the same file, from this URL:
<a href="/Services/xyz" target="_self">
The regular expression (?<=\/Services\/)(.*)(?=\" )
returns the required service code "xyz".
So, I opened Notepad++, added "###" to the "Find what" and this RegEx to the "Replace with" section, and expected that the ### text will be replaced by xyz.
But I got this result:
<xsl:variable name="ServiceCode" select="?<=/Services/.*?=" "/>
I am new to RegEx, do I need to use different syntax in the replace section than I use to find a string? Can someone give me a hint how to achieve the required result? The goal is to standardize tons of files with similar structure as now all servicecodes are hardcoded in several places in the file. Thanks.
You could use a lookahead for capturing the part ahead.
Search for: (?s)###(?=.*/Services/([^"]+)") and replace with: $1
(?s) makes the dot also match newlines (there is also a checkbox available in np++)
[^"] matches a character that is not "
The replacement $1 corresponds to capture of first parenthesized subpattern.
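The same search and replace can be tried in Python as a quick check. This is a sketch that assumes Python's re module behaves like Notepad++ for this pattern; TEXT and fill_service_code are illustrative names.

```python
import re

# Same pattern as above; (?s) lets the dot cross line breaks.
PATTERN = re.compile(r'(?s)###(?=.*/Services/([^"]+)")')

TEXT = '''<xsl:variable name="ServiceCode" select="###"/>
<a href="/Services/xyz" target="_self">
'''


def fill_service_code(text):
    """Replace each ### with the service code captured inside the lookahead."""
    # Python writes the backreference as \1 where Notepad++ accepts $1.
    return PATTERN.sub(r"\1", text)


RESULT = fill_service_code(TEXT)
print(RESULT)
```

Because .* is greedy, the code is taken from the last /Services/ link that follows the ###.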
I am no expert at RegEx but I think I may be able to help. It looks like you might be going at this the wrong way. The regex search that you are using would normally work like this:
Parentheses () in a regex capture part of the match so that you can reuse it in the replacement.
The regex (?<=\/Services\/)(.*)(?=\" ) belongs in the "Find what" box in Notepad++, not in "Replace with". The replacement box takes literal text and backreferences, not a search pattern, which is why your pattern ended up in the output instead of xyz.
In the "Replace with" box, \1 then refers to what (.*) captured. The lookbehind (?<=\/Services\/) and the lookahead (?=\" ) are zero-width assertions and do not create numbered groups, so this pattern has no \2 or \3.
Depending on the structure of your files, you would need a regex that matches both lines of code (capturing the specific parts you need), then use a combination of \1, \2, \3 and so on to put everything back exactly as it was, except for the ###, which you replace with the group number associated with xyz.
See http://docs.notepad-plus-plus.org/index.php/Regular_Expressions for more info.
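As a quick check of how the groups in that pattern are numbered, here is a small Python sketch. It assumes Python's re module numbers groups the same way Notepad++ does; LINE is the link from the question.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r'(?<=\/Services\/)(.*)(?=\" )')
LINE = '<a href="/Services/xyz" target="_self">'

# Number of capturing groups: lookarounds are not counted.
GROUP_COUNT = PATTERN.groups

MATCH = PATTERN.search(LINE)
print(GROUP_COUNT)
print(MATCH.group(1))
```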

How to select text between greater than and less than with an additional slash

I'm trying to select the text between > and </. In the example below I want "text":
>text</
but I'm unable to do so.
I tried the following, but it doesn't like the slash at the end of the regex:
\>(.*?)\<\
I'm trying to do this in TextPad. How is this supposed to be done?
I'm ultimately wanting to delete all text between these two characters so all I'm left with is something like: <element></element>
RegEx wise, you can use 3 groupings and for the replace only use the first and 3rd group: \1\3.
Find: (>)(.*)(</)
Replace: \1\3
Try doing:
\>(.*?)\<\/
The regex you tried gives an error because it ends with a \ that has nothing after it to escape.
You are close. Use the following:
(>).*?(<\/)
And replace with \1\2
OR
You can use lookbehind and lookaheads:
(?<=>)(.*?)(?=<\/)
And replace with '' (empty string)
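Both variants can be compared side by side in Python. This is a sketch that assumes TextPad's engine agrees with Python's re module on these patterns; the third pattern with [^<]* is my own variation, not something from the answers.

```python
import re

LINE = "<element>text</element>"

# Two capture groups around the text; put only the delimiters back.
by_groups = re.sub(r"(>).*?(<\/)", r"\1\2", LINE)

# Lookarounds match only the text itself, so replace it with nothing.
by_lookaround = re.sub(r"(?<=>)(.*?)(?=<\/)", "", LINE)

# Assumed variation (not from the answers): [^<]* cannot run across
# another tag when several elements share one line.
MANY = "<a>x</a><b>y</b>"
tight = re.sub(r"(>)[^<]*(<\/)", r"\1\2", MANY)

print(by_groups)
print(by_lookaround)
print(tight)
```

With several elements on one line, the lazy .*? can still run from a closing tag across the next opening tag, which is why the tighter character class may be worth considering.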

Notepad++ and delimiters: automatically replace ``string'' by \command{string}

Within Notepad++, I want to replace many instances of the type ``string'' by \command{string} where string can be any string of characters. I am fairly close to what I want to achieve with:
Find: (?<=``)(.*?)(?='')
Replace: \\command{\1}
There is still a problem. With the regex code above, instead of \command{string} I get ``\command{string}'' and I am not sure why the `` and '' are not removed?
This happens because you are using lookaround assertions. Lookarounds are zero-width: they only assert that something can be matched at a position and do not "consume" any characters of the string, so the `` and '' are never part of the match and are left in place. You can use the regular expression below instead.
Find: ``([^']+)''
Replace: \\command{\1}
You need to match the delimiters themselves and wrap only the inner text in a capture group. Notepad++ does support lookahead and lookbehind, but they are zero-width and leave the delimiters untouched, and you don't need them for this specific case anyway:
``([^']+)'' -> \\command{\1}
Because [^'] cannot cross a quote, this also makes sure that a single match does not swallow two commands in something like:
run ``ls -l'' or ``ls -a''
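The difference between the two approaches can be reproduced in Python. This is a sketch that assumes Python's re module mirrors Notepad++ here; the variable names are illustrative.

```python
import re

TEXT = "run ``ls -l'' or ``ls -a''"

# The delimiters are part of the match, so they are replaced too.
# In the replacement, \\ is a literal backslash and \1 is the captured text.
consumed = re.sub(r"``([^']+)''", r"\\command{\1}", TEXT)

# Lookarounds leave the delimiters outside the match, so they survive.
kept = re.sub(r"(?<=``)(.*?)(?='')", r"\\command{\1}", "``string''")

print(consumed)
print(kept)
```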

Find and replace using regular expressions in Notepad++

I have to make changes to URLs in a couple of Notepad files. I was hoping this could be done using regular expressions.
The URLs have the following structure:
/web/20120730114452im_/hxxp://mysite1.com
/web/20120730114453im_/hxxp://mysite2.com
/web/20120730114454im_/hxxp://mysite3.com
/web/20120730114454im_/hxxp://mysite4.com
I have to remove the part before the hxxp so what remains after the search and replace is,
hxxp://mysite1.com
hxxp://mysite2.com
hxxp://mysite3.com
hxxp://mysite4.com
What is the regular expression I need to use to get the desired result ?
Thanks for your help.
Okay, as per your confirmation, a proper regex that won't match too much would be this:
/web/[0-9]+im_/
Where [0-9]+ matches one or more digits.
Don't forget to check the 'regular expression' checkbox in the Find/Replace dialog box.
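For a quick check outside the editor, the same replacement looks like this in Python. It is only a sketch: LINES, PREFIX and strip_prefix are illustrative names, and the replacement is assumed to be an empty string.

```python
import re

LINES = [
    "/web/20120730114452im_/hxxp://mysite1.com",
    "/web/20120730114453im_/hxxp://mysite2.com",
]

# /web/, one or more digits, then the literal im_/ suffix.
PREFIX = re.compile(r"/web/[0-9]+im_/")


def strip_prefix(line):
    """Remove the archive prefix and keep the hxxp URL."""
    return PREFIX.sub("", line)


CLEANED = [strip_prefix(line) for line in LINES]
print(CLEANED)
```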
Use this:
Find: [a-z0-9_/]+/hxxp
Replace: hxxp

RegEx: capture entire group content

I am writing a parser for some Oracle commands, like
LOAD DATA
INFILE /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
( some parameters... )
I already created a regex to match the entire command. I am now looking for a way to capture the name of the input file ("/DD/DATEN" for instance here).
My problem is that using the following regex will only return the last character of the first group ("N").
^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$
Any ideas?
Many thanks in advance
EDIT: following @HamZa's question, here is the entire regex to parse the Oracle LOAD DATA INFILE command (simplified, though):
^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*((?:TRUNCATE|PRESERVE BLANKS)\s*){0,2}\s*INTO TABLE\s*((?:\w|\.)+)\s*\(\s*((\w+)\s*POSITION\s*\(\s*\d+\s*\:\s*\d+\s*\)\s*((DATE\s*\(\s*(\d+)\s*\)\s*\"YYYY-MM-DD\")|(INTEGER EXTERNAL)|(CHAR\s*\(\s*(\d+)\s*\)))\s*\,{0,1}\s*)+\)\s*$
Let's point out the wrongdoer in your regex: (\w|\\|/)+. What happens here?
You're matching either a word character or a backslash/forward slash and putting it in group 1 with (\w|\\|/); after that, the + tells the regex engine to repeat the group one or more times, so the group only keeps the last repetition. What you actually want is to match those characters several times and capture the whole run. So you might use a non-capturing group (?:) inside a capturing one: ((?:\w|\\|/)+).
You might notice that you could just use a character class after all ([\w\\/]+). Hence, your regex could look like
^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$
On a side note: that end anchor $ will cause your regex to fail if you're not using multiline mode. Or is it that you intentionally didn't post the full regex :) ?
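Both points can be seen in a short Python sketch. It assumes the parser's engine agrees with Python's re module on these patterns; SHORT and FULL are test strings built from the command in the question.

```python
import re

SHORT = "LOAD DATA\nINFILE /DD/DATEN\n"

# The + sits outside the group, so group 1 keeps only the last repetition.
last_only = re.match(
    r"^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$", SHORT
).group(1)

# Repeating inside the group captures the whole path.
FIXED = r"^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$"
whole = re.match(FIXED, SHORT).group(1)

# With more lines after INFILE, $ needs multiline mode to match at a line end.
FULL = SHORT + "TRUNCATE\nPRESERVE BLANKS\nINTO TABLE aaa.bbb\n"
without_flag = re.match(FIXED, FULL)
with_flag = re.match(FIXED, FULL, re.MULTILINE)

print(last_only)
print(whole)
print(without_flag)
print(with_flag.group(1))
```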
Not tested but...
^\s*LOAD DATA\s*INFILE\s*(\S+)\s*$