Replace same character on one line multiple times Notepad++ - regex

I have many files, and each file contains a line like the following (amongst other things). The line is not always in the same position in the file, but it always starts at the beginning of a line. Its content is also different in every file.
slug: bláh-téxt-hello-write-sométhing-ábout-arrow
I'd like to replace each occurrence of the special characters (á and é with their corresponding characters a and e). Each of those characters can be on this line many times + also occur in the document in other places (which should not be replaced).
So the result is:
slug: blah-text-hello-write-something-about-arrow
I have this:
find: ^slug: (.)((é)|(á))(.)
replace: slug: $1(?2e)(?2a)$3
However, this seems to replace only one character at a time. How do I get it to run multiple times until there is no character to replace?
Thanks much for any insights.

You can use
Find What: (?:\G(?!^)|^slug:\h*)[^\r\náé]*\K(?:(á)|é)
Or
Find What: (?:\G(?!^)|^slug:\h*).*?\K(?:(á)|é)
Replace With: (?1a:e)
Details:
(?:\G(?!^)|^slug:\h*) - end of the previous match or slug: and then zero or more horizontal whitespaces at the start of a line
[^\r\náé]* - zero or more chars other than CR, LF, á and é
.*? - will match zero or more chars other than line break chars, as few as possible
\K - the operator that discards all text matched so far
(?:(á)|é) - either á (captured into Group 1) or é.
The replacement, (?1a:e), replaces the match with a if Group 1 matched; otherwise e is used.
See the regex demo.
Extra information about the use of conditional replacement is available in my "Swapping multiple values using conditional replacement patterns in Notepad++" YT video.
Extra information about the use of \G operator can be found in another YT video of mine, "\G anchor use cases".
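For readers who want to script the same edit outside Notepad++, here is a minimal sketch in Python. It swaps in a different technique: Python's built-in re module supports neither \G nor \K, so the sketch matches each whole slug: line and translates the characters inside a callback. The function name fix_slug_lines and the sample text are assumptions made for illustration.

```python
import re

# Accented characters and their plain replacements, as listed in the question.
ACCENT_MAP = str.maketrans({"á": "a", "é": "e"})


def fix_slug_lines(text: str) -> str:
    """Replace á/é only on lines that start with 'slug:' (hypothetical helper)."""
    # (?m) lets ^ match at every line start; [^\r\n]* stays on that one line.
    return re.sub(
        r"(?m)^slug:[^\r\n]*",
        lambda m: m.group(0).translate(ACCENT_MAP),
        text,
    )


# Hypothetical sample: accents outside the slug line must survive.
sample = "title: bláh\nslug: bláh-téxt-hello-write-sométhing-ábout-arrow\nbody: é stays\n"
result = fix_slug_lines(sample)
print(result)
```

Because the callback sees the whole line at once, there is no need to chain matches the way the \G-based pattern does.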


What is the regex to find lines WITHOUT a line break

I'm using SubtitleEdit and I'd like to locate all the lines that do not contain a line break.
Lines that contain a line break are bilingual, which is what I want.
But those that do not have line breaks are mono-lingual, and I'd like to quickly locate them all and delete them. TIA!
Alternatively, if there is a regex expression that can find lines which do not contain any English characters, that would also work.
The confusion here was caused by 2 facts:
What SubtitleEdit calls a line is actually a multiline, containing newlines.
The newline displayed is not the one used internally (so it would never match <br>).
Solution 1:
Now that we have found out it uses either \r\n or just \n, we can write a regex:
(?-m)^(?!.*\r?\n)[\s\S]*$
Explanation:
(?-m) - turn off the multiline option (which is otherwise enabled).
^ - match from start of text
(?!.*\r?\n) - negative lookahead for zero or more of any characters followed by newline character(s), i.e. the text must not contain a newline
[\s\S]*$ - match zero or more of ANY character (including newline) - will match the rest of text.
In short: If we don't find newline characters, match everything.
Now replace with an empty string.
Solution 2:
If you want to match lines that don't have any English characters, you can use this:
(?-m)^(?![\s\S]*[a-zA-Z])[\s\S]*$
Explanation:
(?-m) - turn off the multiline option (which is otherwise enabled).
^ - match from start of text
(?![\s\S]*[a-zA-Z]) - negative look ahead for ANY characters followed by an English character.
[\s\S]*$ - match zero or more of ANY character (including newline) - will match the rest of text.
In short: If we don't find an English character, match everything.
Now replace with an empty string.
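Both patterns can be tried outside SubtitleEdit. The following is a rough sketch in Python, assuming each subtitle entry is available as one string. Python's re has multiline mode off by default, so the (?-m) prefix is dropped; the sample entries are invented for illustration.

```python
import re

# Solution 1: entries that contain no newline at all.
NO_NEWLINE = re.compile(r"^(?!.*\r?\n)[\s\S]*$")
# Solution 2: entries that contain no English (ASCII) letters.
NO_ENGLISH = re.compile(r"^(?![\s\S]*[a-zA-Z])[\s\S]*$")

# Hypothetical subtitle entries; bilingual ones hold two lines.
entries = ["Hello\n你好", "只有中文", "English only", "Bonjour\r\nHello"]

mono_lingual = [e for e in entries if NO_NEWLINE.match(e)]
without_english = [e for e in entries if NO_ENGLISH.match(e)]

print(mono_lingual)
print(without_english)
```

The two lists differ: "English only" has no newline, so Solution 1 flags it, while Solution 2 keeps it because it contains English letters.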
You should use a regex assertion. Given these test lines:
something_1
some<br>thing_2
something_3<br>
<br>something_4
something_5
This is an expression that will match lines 1 and 5
^(?!.*<br>).*$
This regular expression uses the negative lookahead assertion (?!.*<br>), which rejects any line that contains <br> anywhere.
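The lookahead can be checked against the five test lines with a short Python sketch. Each line is tested separately here, which is an assumption about how the text would be fed in.

```python
import re

# The five test lines from the answer above.
lines = [
    "something_1",
    "some<br>thing_2",
    "something_3<br>",
    "<br>something_4",
    "something_5",
]

# Negative lookahead: fail at the start if <br> occurs anywhere later in the line.
pattern = re.compile(r"^(?!.*<br>).*$")

matching = [line for line in lines if pattern.match(line)]
print(matching)
```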

Regex to match lines starting with a \t or - but only capture - on

I cannot figure out this regex for the life of me
I have example input such as:
- Line 1
- Line 2
- Line 3
- Line 4
I am trying to match each line starting at the - and going through the end of the line. I am using the Workflow app on iOS which uses ICU regex parsing
The pattern I am using is
(?m)^\t*(-.*)
This pattern will match all the lines, but it captures the tabs. What am I doing wrong?
You ask why your regex captures the tabs. It does not: your regex matches the tabs, but captures only the - after those tabs together with the rest of the line. The point is that you are using a consuming pattern, one whose matched text becomes part of the returned match.
Non-consuming patterns (lookarounds) can be used to check for the presence or absence of some text without adding it to the returned match.
In the ICU regex flavor, lookbehinds must be of constrained width, so a limiting quantifier is allowed inside one. (The length of possible strings matched by the look-behind pattern must not be unbounded: no * or + operators.)
Thus, this will work provided there are at most 100 tabs at the line start:
(?m)(?<=^\t{0,100})-.*
Here,
(?m) - makes ^ match the start of a line
(?<=^\t{0,100}) - a positive lookbehind requiring the start of a line followed by 0 to 100 tabs immediately before the current position
-.* - hyphen and the rest of the line.
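The difference between what a pattern matches and what it captures can be seen in a small Python sketch. Python's built-in re only accepts fixed-width lookbehinds, so the \t{0,100} lookbehind is not used here; the sketch runs the original pattern and compares the whole match with Group 1. The tab-indented sample text is an assumption.

```python
import re

# Hypothetical input: some lines are indented with tabs.
text = "- Line 1\n\t- Line 2\n\t\t- Line 3\n- Line 4"

# The pattern from the question: the tabs are consumed, the rest is captured.
pattern = re.compile(r"(?m)^\t*(-.*)")

whole_matches = [m.group(0) for m in pattern.finditer(text)]  # includes the tabs
captures = [m.group(1) for m in pattern.finditer(text)]       # tabs left out

print(whole_matches)
print(captures)
```

If the tool only exposes the whole match, a lookbehind is needed as described above; if it exposes groups, reading Group 1 is enough.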
Try this:
(?m)^[ \t]*(-.*)
First, it appears that you have some spaces at the beginning of some of those lines, so \t will not match spaces. Replacing \t with [ \t] (or just \s) will fix this. Also, (-*) is going to match and capture any number of -, not including what's following. Put a . before your * to match any number of characters following the -, like this: (-.*)
If you don't require leading spaces, you can use
(?m)(-.*)
If you don't care about capturing the match, you don't need the parentheses, giving you
(?m)-.*
As mentioned in the comments

How can I replace only the first 2 matches per line, using regex in Notepad++

I'm trying to parse a list of filenames to a CSV file by converting the first 2 - characters per line into a |. The problem is that the filenames themselves also contain the character I'm searching for.
My raw data looks something like this:
12055371-1-Florence - BW Letter of Intent HB Comments 9-4-14-2.DOCX
12057668-2-EB-DUE-M- SBuxbaum FHA Benefit Plans-2.DOCX
12058210-1-Redline Letter of Intent-2.PDF
12058029-3-Florence Hospital--Order Establishing Bid Procedures-HB 9-23-14-2.DOCX
12058020-10-Florence - BW Letter of Intent 10,10,14 Revisions-2.DOCX
I'm using Notepad++ to replace on the fly, but I'm not sure what regex will work to identify and replace these items.
Don't match -, match the beginning of the lines up to the second - :
match ^(.*?)-(.*?)-
replace by \1|\2|
Explanation :
^ matches the beginning of the line (0-width match).
(.*?) matches any character in a non-greedy way : if the next token of the regex can match, it will let it do so. The result is grouped so it can be referenced later.
\1 and \2 are back-references and refer to the two (.*?) groups.
Note : for efficiency you could replace the non-greedy matches by the negated class [^\-], which means every character but -, the - being escaped because it's a special character in this context. The groups would then become ([^\-]*). Of course it really does not matter if it's a one-time operation.
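As a quick check of the negated-class variant, here is a minimal Python sketch. It adds \n to the class (an adjustment of mine, not part of the answer) so a line with fewer than two hyphens cannot borrow hyphens from the next line. The last sample line is hypothetical; the other two come from the question.

```python
import re

raw = (
    "12058210-1-Redline Letter of Intent-2.PDF\n"
    "12058020-10-Florence - BW Letter of Intent 10,10,14 Revisions-2.DOCX\n"
    "README.txt"  # hypothetical line without hyphens, left untouched
)

# Anchor at each line start and consume only up to the second hyphen.
converted = re.sub(r"(?m)^([^-\n]*)-([^-\n]*)-", r"\1|\2|", raw)
print(converted)
```

Hyphens later in each file name are untouched because the pattern stops consuming after the second one.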

How to combine lines in regular expressions?

I am new to regular expressions and I am learning them using a simple text editor only. I have the following file:
84544484N
32343545M
32334546E
34456434M
I am trying to combine each pair of lines into one tab delimited line
The result should be :
84544484N 32343545M
32334546E 34456434M
I wrote the following :
Search: (.*?)\n(.*?)
Replace: \1\t\2
This did not work. Can someone please explain why and give me the correct solution? Thank you!!
The (.*?)\n(.*?) pattern will never work well because the (.*?) at the end of the pattern will always return an empty string: *? is a lazy quantifier, and if it can match zero characters (and here it can), it will. Use greedy matching and adjust the pattern like:
(.+)\r?\n *(.*)
or - since SublimeText uses Boost regex - you can match any newline sequence with \R:
(.+)\R *(.*)
and replace with \1\t\2. Note I replaced *? with + in the first capturing group because you need to match non-empty lines.
Regex breakdown:
(.+) - one or more characters other than a newline (as many as possible) up to
\R - a newline sequence (\r\n, \r or just \n)
 * (a space followed by *) - a literal space, zero or more occurrences
(.*) - Group 2: zero or more characters other than a newline (as many as possible)
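The lazy-versus-greedy behaviour described above can be reproduced with a short Python sketch. As far as I know Python's re does not support \R, so the \r?\n form of the pattern is used; the variable names are mine.

```python
import re

text = "84544484N\n32343545M\n32334546E\n34456434M\n"

# Original attempt: the trailing lazy group matches nothing,
# so every newline is simply replaced by a tab.
lazy_result = re.sub(r"(.*?)\n(.*?)", r"\1\t\2", text)

# Fixed pattern: greedy groups consume one full line on each side of the newline.
greedy_result = re.sub(r"(.+)\r?\n *(.*)", r"\1\t\2", text)

print(repr(lazy_result))
print(repr(greedy_result))
```

With the greedy version each match swallows two lines, so the search resumes at the third line and pairs are joined correctly.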

Notepad++ - Add link html to beginning/end of every line using regular expressions

I'm not as comfortable with RegEx as I'd like to be. What I'm trying to do is prepend every line (of a list of URL's) with
for the prepend, I've been using Replace with regular expressions: ^ with <a href="
this works alright, however, there are certain blank lines that get <a href=" added to them. Is it possible to replace the beginning of each line only if there's more than 1 character in the line?
And as for doing the end of the line, I have no idea. Any help would be much appreciated--I have a very large amount of url's in different text files to go through to edit.
Search for ^(?=.) and (?<=.)$ instead. The period means "any character, excluding a line break". Combined with ^ and $, it gives the start of a line that is followed by a character and the end of a line that is preceded by one. This example uses a positive lookahead and lookbehind to ensure that you don't replace any of the original line but append/prepend instead.
You can use a negative lookahead (at least if you upgrade to Notepad++ 6).
Find what: ^(?!$)
And for line endings:
Find what: (?!^)$
Taking the first one as an example, it matches at the start of a line (^) but only if $ does not match at that position - i.e. if it is not a line ending at the same time.
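The two zero-width patterns can be tried in Python as a rough sketch. The prefix comes from the question; the suffix string and the sample URLs are hypothetical, since the intended closing markup is not shown here.

```python
import re

PREFIX = '<a href="'    # from the question
SUFFIX = '">link</a>'   # hypothetical closing markup, for illustration only

# Hypothetical list of URLs with a blank line in between.
text = "http://a.example\n\nhttp://b.example"

# Insert at line starts that are not also line ends (skips blank lines).
step1 = re.sub(r"(?m)^(?!$)", PREFIX, text)
# Insert at line ends that are not also line starts.
step2 = re.sub(r"(?m)(?!^)$", SUFFIX, step1)

print(step2)
```

Both patterns match an empty position, so nothing from the original line is consumed or replaced.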
An alternative approach does both replacements in one replacement (and the assertion as well):
Find what: ^.+$
Replacement:
In fact, you can even omit the anchors, due to the greediness of the +, the pattern will always consume whole lines (but only if there is at least one character):
Find what: .+
Replacement:
Note that any of these will wrap your anchor around lines that contain only spaces and tabs. The best way to avoid that is to modify the third pattern:
Find what: ^[ \t]*\S[^\r\n]*
Replacement:
Starting at the beginning of a line we consume all spaces and tabs (no line breaks). Then we require one non-space character (\S). And then we consume as many non-line-break characters as possible. Due to greediness, there is again no need for the $ anchor.
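A sketch of this last pattern in Python follows. The replacement template is an assumption made for illustration (the actual replacement string is not shown above); \g<0> is Python's syntax for the whole match.

```python
import re

# Hypothetical template: wraps the whole matched line in an anchor tag.
TEMPLATE = r'<a href="\g<0>">\g<0></a>'

# Hypothetical input: one empty line and one line holding only spaces.
urls = "http://example.com/a\n\n   \nhttp://example.com/b\n"

# Optional spaces/tabs, then one required non-space character, then the rest of the line.
wrapped = re.sub(r"(?m)^[ \t]*\S[^\r\n]*", TEMPLATE, urls)
print(wrapped)
```

The empty line and the whitespace-only line are skipped because \S cannot match a space, a tab, or a line break.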