Multiple regex query (Notepad++) - regex

I understand that I can run several search-and-replace operations in one pass like this:
Search: (á)|(é)|(í)|(ó)|(ú)
Replace: (?1Á)(?2É)(?3Í)(?4Ó)(?5Ú)
Example:
Before: árbol ácido
After: Árbol ácido
But how can I use this method if I need to add a precondition, such as line start and some punctuation before it?
I’ve tried:
Search: (^[—¿¡«]*?)[(á)|(é)|(í)|(ó)|(ú)] => Works!
Replace: \1(?2Á)(?3É)(?4Í)(?5Ó)(?6Ú) => Doesn’t work…
Example:
Before: —¿árbol ácido?
After: —¿rbol ácido?
Any help?
Regards.

In your second regex you use a character class (indicated by [...]), apparently for grouping, according to a comment. That's not how character classes work: inside [...], the parentheses and | are literal characters and no groups are created. Use a non-capturing group instead, e.g.
(^[—¿¡«]*?)(?:(á)|(é)|(í)|(ó)|(ú))
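The difference between the two patterns can be checked outside Notepad++. The following is only a sketch using Python's re module (an assumption; Notepad++ itself uses the Boost regex engine), and the variable names are made up for illustration. It counts the capture groups each pattern defines, which suggests why the original replacement had nothing to insert for groups 2 to 6.

```python
import re

# Asker's attempt: the square brackets form a character class, so the
# parentheses and pipes inside it are literal characters, not groups.
broken = re.compile(r"(^[—¿¡«]*?)[(á)|(é)|(í)|(ó)|(ú)]")

# Suggested fix: a non-capturing group wrapped around the alternation.
fixed = re.compile(r"(^[—¿¡«]*?)(?:(á)|(é)|(í)|(ó)|(ú))")

print(broken.groups)  # number of capture groups in the broken pattern
print(fixed.groups)   # number of capture groups in the fixed pattern

# With the fixed pattern, group 1 is the punctuation and group 2 the letter.
m = fixed.match("—¿árbol ácido?")
print(m.group(1), m.group(2))
```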

Instead of creating a group for each character, you can also have all the characters in one group and replace them with their uppercase variant.
search string
(^[—¿¡«]*)([áéíóú])
replacement
$1\U$2
The following can be found in the Notepad++ Regular Expressions documentation:
\U Causes next characters to be output in uppercase, until a \E is found.
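For comparison, here is a rough Python equivalent of the single-group approach. Python's re module has no \U in replacement strings, so this sketch uses a replacement function instead; the function name capitalize_first_accent is hypothetical.

```python
import re

# Optional leading punctuation at the start of a line, then one accented vowel.
LEADING = re.compile(r"^([—¿¡«]*)([áéíóú])", re.MULTILINE)

def capitalize_first_accent(text: str) -> str:
    """Uppercase an accented vowel that opens a line, keeping the punctuation."""
    return LEADING.sub(lambda m: m.group(1) + m.group(2).upper(), text)

print(capitalize_first_accent("—¿árbol ácido?"))
```

Only the vowel directly after the line start (and optional punctuation) is changed; later occurrences in the line are left alone.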

Related

Regex: matching up first occurrence before special characters (|,-,/...)

I have product IDs on a sheet, each in two parts separated by special characters.
I have several patterns and can't find a solution that works for all of them. I would like to keep only the text before the "-" or "|"; spaces can appear anywhere.
aaa23-rerez3
dfds12|gdflk 132
ds123 fdsf-123 gad
sa 123,fdsg 123
I found this regex :
.*\w
It works for some patterns but not for the pipe | and -.
Many thanks for your help.
To match only the text before the | or -, you can use the anchor ^ to assert the start of the string and a negated character class to match any character except those listed in the class.
^[^|-]+
If the spaces can be anywhere and you also want to match those along with only word characters:
^\s*(?:\w+\s*)+
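To see how the two patterns differ on the sample IDs, here is a small sketch in Python (an assumption, since the question does not name a regex flavour). The helper name prefix is made up for illustration.

```python
import re

ids = ["aaa23-rerez3", "dfds12|gdflk 132", "ds123 fdsf-123 gad", "sa 123,fdsg 123"]

# Everything from the start up to the first | or -.
BEFORE_SEPARATOR = re.compile(r"^[^|-]+")
# Only word characters and whitespace from the start.
WORDS_AND_SPACES = re.compile(r"^\s*(?:\w+\s*)+")

def prefix(pattern, text):
    """Return the leading part of text matched by pattern, or '' if none."""
    m = pattern.match(text)
    return m.group(0) if m else ""

for s in ids:
    print(repr(s), repr(prefix(BEFORE_SEPARATOR, s)), repr(prefix(WORDS_AND_SPACES, s)))
```

The two patterns disagree on the last sample: the first keeps the whole string because it contains neither | nor -, while the second stops at the comma.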
I hope the following regular expression works for you. I tested it and it worked for all your patterns.
^([^-\|\s]+)(?=[-\|\s].*$)
Allow spaces, but separate if special character found.
["aaa23-rerez3", "dfds12|gdflk 132", "ds123 fdsf-123 gad", "sa 123,fdsg 123"].forEach(x => console.log(x, x.split(/[^\d\w\s]/g)))
Separates space also.
["aaa23-rerez3", "dfds12|gdflk 132", "ds123 fdsf-123 gad", "sa 123,fdsg 123"].forEach(x => console.log(x, x.split(/\W/g)))
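The two JavaScript one-liners above translate fairly directly to Python's re.split. This is just a sketch of the same idea, with hypothetical function names.

```python
import re

def split_keep_spaces(text):
    """Split on any character that is neither a word character nor whitespace."""
    return re.split(r"[^\w\s]", text)

def split_on_non_word(text):
    """Split on every non-word character, spaces included."""
    return re.split(r"\W", text)

for s in ["aaa23-rerez3", "dfds12|gdflk 132", "ds123 fdsf-123 gad", "sa 123,fdsg 123"]:
    print(s, split_keep_spaces(s), split_on_non_word(s))
```

The first element of each list is the part before the first separator.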

Search and convert to lower case on vim

I have code containing object.attribute, where attribute can be an array element,
for example object.SIZE_OF_IMAGE[0], or a simple string. I want to find all occurrences of "object.attribute" and replace them with self.lowercase(attribute). I want a regular expression in Vim to do that.
I can use :%s/object.*/self./gc and fix each match manually, but that is very slow.
Here are some examples:
object.SIZE to self.size
object.SIZE_OF_IMAGE[0] to self.size_of_image[0]
You basically just need two things:
Capture groups :help /\( let you store what's matched in between \(...\) and then reference it (via \1, \2, etc.) in the replacement (or even afterwards in the pattern itself).
The :help s/\L special replacement action that makes everything following lowercase.
This gives you the following command:
:%substitute/\<object\.\(\w\+\)/self.\L\1/g
Notes:
I've established a keyword start assertion (\<) at the beginning to avoid matching schlobject as well.
\w\+ matches letters, digits, and underscores (so it fulfills your example); various alternatives are possible here.
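If the same rewrite has to happen outside Vim, the idea carries over. Below is a minimal Python sketch, assuming the attribute consists of word characters only; to_self is a hypothetical name, and \b stands in for Vim's \< assertion.

```python
import re

# \b plays the role of Vim's \< here: "object" must start at a word boundary.
ATTRIBUTE = re.compile(r"\bobject\.(\w+)")

def to_self(source: str) -> str:
    """Rewrite object.NAME as self.name, lowercasing only the attribute."""
    return ATTRIBUTE.sub(lambda m: "self." + m.group(1).lower(), source)

print(to_self("object.SIZE"))
print(to_self("x = object.SIZE_OF_IMAGE[0]"))
```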
sed -E 's/object\.([^ \(]*)(.*)/self.lowercase(\1)\2/g' file_name.txt
The above command assumes that your attribute is followed by a space or "(".
You can tweak this command to suit your needs.
Based on your comment above that the attribute part
"finishes by space or [ or (" you could match it with:
/object\.[^ [(]*
So, to replace it with self.attribute use a capturing
group and \L to make everything lowercase:
:%s/\vobject\.([^ [(]*)/self.\L\1/g
In command mode, try this:
:1,$ s/object.attribute/self.lowercase(attribute)/g

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches have what I can only describe as positioners. These are shown as a question mark followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which flavour of regex it is; they believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the patterns below has worked so far. I tested each one live (rather than only in a regex tester, in case it behaved differently), but they still didn't work.
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
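Assuming the tool does turn out to be Python and a replace step is available (the question leaves both open), the two operations could be sketched like this. The function name find_ids is hypothetical.

```python
import re

POSITIONER = re.compile(r"\?[0-9]{2}")  # a question mark followed by two digits
TARGET = re.compile(r"T[0-9]{7}")       # T followed by seven digits

def find_ids(text: str) -> list:
    """Strip the positioners first, then collect every T-number."""
    return TARGET.findall(POSITIONER.sub("", text))

print(find_ids("T123?214567\nT?211234567"))
```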
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
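The concatenation step might look like this in Python, again only as a sketch under the assumption that the tool exposes capture groups. The name join_groups is made up for illustration.

```python
import re

AROUND = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_groups(line: str) -> str:
    """Glue together the text before and after a single ?NN positioner."""
    m = AROUND.search(line)
    # Lines without a positioner are returned unchanged.
    return m.group(1) + m.group(2) if m else line

for line in ["T123?214567", "T?211234567", "T1234567"]:
    print(join_groups(line))
```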
First, match the ?21 and replace it with a distinctive character such as #:
\?21
Then try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (\1).
Finally, replace # with ?21 or anything else you like.
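A Python script might look like this:
import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
# Step 1 pattern: the ?21 positioner
rexpre = re.compile(r'\?21')
# Step 2 pattern: T plus 7 digits, or T plus 8 characters made of digits and #,
# followed by whitespace
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Same matches again, with # turned back into ?21
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))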
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then, in the replacement, use the two groups back to back: \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
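As a rough illustration, assuming Python and an available replace function (neither is confirmed in the question), the pattern and the \1\2 replacement could be applied like this. The name strip_positioner is hypothetical.

```python
import re

OPTIONAL_POSITIONER = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

def strip_positioner(text: str) -> str:
    """Keep groups 1 and 2, dropping the optional ?NN part between them."""
    return OPTIONAL_POSITIONER.sub(r"\1\2", text)

for line in ["T123?214567", "T?211234567", "T0000001"]:
    print(strip_positioner(line))
```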
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

Notepad++ Regex to find group of lines with condition

Given this example text:
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ABB</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
...lines...
...........
</abr:rules>
<abr:rules>
<abr:ruleTypeDefinition>
<abr:code>ADE</abr:code>
<abr:ownership>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
...lines...
...........
</abr:rules> (end of group)
I would like to find and remove everything from <abr:rules> to </abr:rules> where subapplication is NOT "CM". Organization and application are always the same; <abr:code> can be any string.
What I tried so far is
<abr:rules>\n<abr:ruleTypeDefinition>\n<abr:code>[a-zA-Z0-9]{3,}<\/abr:code>\n<abr:ownership>\n<.*"(FM|PSD|SSC)"\/>\n(?s).*?\n<\/abr:rules>\n
which works but only because I know the other subapplication names.
Is there any way to do it with Regex only ?
Try the following find and replace:
Find:
<abr:rules>((?!subapplication=).)*subapplication="(?!CM")[^"]+"((?!</abr:rules>).)*</abr:rules>
Replace:
(empty string)
Note: The above pattern will only work if you enable dot in Notepad++ to match newlines. If you don't want to do that, then you may use [\S\s] instead of dot.
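The pattern can also be tried outside Notepad++. Here is a sketch in Python, where re.DOTALL plays the role of the "dot matches newline" option; the sample text is a shortened, made-up version of the question's input, and the names are hypothetical.

```python
import re

# Same pattern as above; DOTALL lets the dot cross line breaks.
NOT_CM_BLOCK = re.compile(
    r'<abr:rules>((?!subapplication=).)*subapplication="(?!CM")[^"]+"'
    r'((?!</abr:rules>).)*</abr:rules>',
    re.DOTALL,
)

# Shortened sample, assumed for illustration only.
sample = """<abr:rules>
<abr:code>ABB</abr:code>
<abr:owner organization="NT" application="DCS" subapplication="FM"/>
</abr:rules>
<abr:rules>
<abr:code>ADE</abr:code>
<abr:owner organization="NT" application="DCS" subapplication="CM"/>
</abr:rules>
"""

cleaned = NOT_CM_BLOCK.sub("", sample)
print(cleaned)
```

Only the block whose subapplication is not "CM" is removed; the "CM" block stays in place.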
You should not use regex for XML; you can read why here:
https://stackoverflow.com/a/1732454/3763374
Instead, use an XML parser and select the nodes with something like XPath.
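A minimal sketch of the parser route, using Python's standard xml.etree.ElementTree with its limited XPath support. The root wrapper, the namespace URI urn:example:abr and the closing tags are assumptions made for illustration, since the question shows only a fragment.

```python
import xml.etree.ElementTree as ET

NS = {"abr": "urn:example:abr"}  # hypothetical namespace URI

# Assumed, completed version of the fragment from the question.
document = """<root xmlns:abr="urn:example:abr">
  <abr:rules>
    <abr:ruleTypeDefinition>
      <abr:code>ABB</abr:code>
      <abr:ownership>
        <abr:owner organization="NT" application="DCS" subapplication="FM"/>
      </abr:ownership>
    </abr:ruleTypeDefinition>
  </abr:rules>
  <abr:rules>
    <abr:ruleTypeDefinition>
      <abr:code>ADE</abr:code>
      <abr:ownership>
        <abr:owner organization="NT" application="DCS" subapplication="CM"/>
      </abr:ownership>
    </abr:ruleTypeDefinition>
  </abr:rules>
</root>"""

root = ET.fromstring(document)
# findall returns a list, so removing elements inside the loop is safe.
for rules in root.findall("abr:rules", NS):
    owner = rules.find(".//abr:owner", NS)
    if owner is not None and owner.get("subapplication") != "CM":
        root.remove(rules)

remaining_codes = [code.text for code in root.findall(".//abr:code", NS)]
print(remaining_codes)
```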

Notepad++ and delimiters: automatically replace ``string'' by \command{string}

Within Notepad++, I want to replace many instances of the type ``string'' by \command{string} where string can be any string of characters. I am fairly close to what I want to achieve with:
Find: (?<=``)(.*?)(?='')
Replace: \\command{\1}
There is still a problem. With the regex code above, instead of \command{string} I get ``\command{string}'' and I am not sure why the `` and '' are not removed?
It is because you are using lookaround assertions. Lookaround (zero-width) assertions only assert that a position can be matched and do not "consume" any characters on the string. You can use the below regular expression.
Find: ``([^']+)''
Replace: \\command{\1}
You need to match the delimiters themselves and wrap only the text between them in a capture group. Notepad++ does support lookahead/lookbehind, but they do not consume the delimiters, and you don't need them for this specific case anyway:
``([^']+)'' -> \\command{\1}
Because [^']+ cannot cross a quote, this will not match across two commands (a longest match) in something like:
run ``ls -l'' or ``ls -a''
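The difference between consuming the delimiters and merely asserting them can be reproduced in Python. This is only a sketch, since Notepad++ uses a different regex engine, and to_command is a hypothetical name.

```python
import re

# Delimiters are part of the match, so they are replaced as well.
QUOTED = re.compile(r"``([^']+)''")
# Lookarounds only assert the delimiters; they stay in the text.
LOOKAROUND = re.compile(r"(?<=``)(.*?)(?='')")

def to_command(text: str) -> str:
    """Turn ``string'' into \\command{string}."""
    return QUOTED.sub(r"\\command{\1}", text)

print(to_command("run ``ls -l'' or ``ls -a''"))
print(LOOKAROUND.sub(r"\\command{\1}", "``string''"))
```

The second call leaves the backticks and quotes in place, which matches the behaviour described in the question.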