What is the regex to find lines WITHOUT a line break

I'm using SubtitleEdit and I'd like to locate all the lines that do not contain a line break.
A line that contains a line break is bilingual, which is what I want.
Lines without a line break are monolingual, and I'd like to quickly locate them all and delete them. TIA!
Alternatively, a regex that finds lines which do not contain any English characters would also work.

The confusion here was caused by two facts:
What SubtitleEdit calls a line is actually a multiline entry that can contain newlines.
The newline displayed is not the one used internally (so it would never match <br>).
Solution 1:
Now that we have found out it uses either \r\n or just \n, we can write a regex:
(?-m)^(?!.*\r?\n)[\s\S]*$
Explanation:
(?-m) - turn off the multiline option (which is otherwise enabled).
^ - match from start of text
(?!.*\r?\n) - negative lookahead: fail if zero or more of any character are followed by newline character(s), i.e. if the text contains a newline.
[\s\S]*$ - match zero or more of ANY character (including newline) - will match the rest of text.
In short: If we don't find newline characters, match everything.
Now replace with an empty string.
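As a rough illustration outside SubtitleEdit, the same lookahead can be tried in Python. This is only a sketch: the sample entries are made up, is_single_line is a hypothetical helper name, and the (?-m) prefix is dropped because Python's re module does not accept it and keeps ^ and $ out of multiline mode by default.

```python
import re

# Same idea as Solution 1. (?-m) is omitted: Python's ^ and $ are not in
# multiline mode unless re.MULTILINE is passed.
NO_NEWLINE = re.compile(r"^(?!.*\r?\n)[\s\S]*$")

def is_single_line(entry: str) -> bool:
    """Return True when a subtitle entry contains no line break."""
    return NO_NEWLINE.match(entry) is not None

# Made-up sample entries, for illustration only.
entries = ["Hello there", "Hello there\r\n你好", "Good night\n晚安"]
single = [e for e in entries if is_single_line(e)]
print(single)  # ['Hello there']
```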
Solution 2:
If you want to match lines that don't have any English characters, you can use this:
(?-m)^(?![\s\S]*[a-zA-Z])[\s\S]*$
Explanation:
(?-m) - turn off the multiline option (which is otherwise enabled).
^ - match from start of text
(?![\s\S]*[a-zA-Z]) - negative look ahead for ANY characters followed by an English character.
[\s\S]*$ - match zero or more of ANY character (including newline) - will match the rest of text.
In short: If we don't find an English character, match everything.
Now replace with an empty string.
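A quick way to sanity-check Solution 2 is to run the same pattern in Python. The sketch below assumes made-up sample entries and a hypothetical helper name, has_no_english; (?-m) is left out because Python's re module does not accept it and is not in multiline mode by default.

```python
import re

# Solution 2 without the (?-m) prefix (not needed in Python).
NO_ENGLISH = re.compile(r"^(?![\s\S]*[a-zA-Z])[\s\S]*$")

def has_no_english(entry: str) -> bool:
    """Return True when the entry contains no ASCII letter a-z or A-Z."""
    return NO_ENGLISH.match(entry) is not None

# Made-up sample entries, for illustration only.
print(has_no_english("你好\n再见"))    # True
print(has_no_english("Hello\n你好"))  # False
```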

You should use a lookahead assertion. Given these test lines:
something_1
some<br>thing_2
something_3<br>
<br>something_4
something_5
This expression will match lines 1 and 5:
^(?!.*<br>).*$
In this regular expression, the negative lookahead assertion (?!.*<br>) rejects any line that contains <br>, so only lines without it match.
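The five test lines can be checked with a short Python sketch. It assumes the lines are joined with \n into one string and that multiline mode is enabled so ^ and $ work per line.

```python
import re

lines = [
    "something_1",
    "some<br>thing_2",
    "something_3<br>",
    "<br>something_4",
    "something_5",
]
text = "\n".join(lines)

# re.MULTILINE makes ^ and $ match at every line boundary.
without_br = re.findall(r"^(?!.*<br>).*$", text, flags=re.MULTILINE)
print(without_br)  # ['something_1', 'something_5']
```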

Related

Match consecutive lines that start with 1 or more spaces

Could anyone offer up assistance to make this work:
https://regex101.com/r/s1X84J/1
REGEX
^((?:(?:[ ]{1,}|\t).*(\R|$))+){1,}
It should match any consecutive lines that start with one or more spaces. In the example, I am able to get it to match the first block of text. I am trying to get it to match the next block of consecutive text starting with one or more spaces as Match 2.
Firstly, you need the global flag/option set (/g) to return more than one match.
Secondly, the following matches runs of consecutive lines starting with a space. It uses a lookbehind to ensure the match starts on a line boundary:
/(^|(?<=\n))( [^\n]*\r?\n)+/gm
The flags are on the right.
You need to use the g and m flags with the following pattern:
^\h.*(?:\R\h.*)*
If your real regex flavor does not support \h (horizontal whitespace), you can use either [^\S\r\n] or [\p{Zs}\t] instead.
Details:
^ - start of a line
\h - a horizontal whitespace character
.* - the rest of the line
(?:\R\h.*)* - zero or more occurrences of
  \R - any line break sequence
  \h - a horizontal whitespace character
  .* - the rest of the line.
It needs to be adjusted if the regex flavor is not PCRE / Onigmo / Java.
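As an example of such an adjustment, here is a sketch for Python, whose re module supports neither \h nor \R. It uses the [^\S\r\n] substitute mentioned above and, as an assumption, replaces \R with \r?\n (so only LF and CRLF line breaks are handled). The sample text is made up.

```python
import re

# [^\S\r\n] stands in for \h; \r?\n stands in for \R (LF or CRLF only).
INDENTED_BLOCK = re.compile(r"^[^\S\r\n].*(?:\r?\n[^\S\r\n].*)*", re.MULTILINE)

sample = "top\n  a\n  b\nmiddle\n c\n\td\nend"

# findall plays the role of the g flag: it returns every match.
blocks = INDENTED_BLOCK.findall(sample)
print(blocks)  # ['  a\n  b', ' c\n\td']
```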

Replace same character on one line multiple times Notepad++

I have many files, and in each file there is a line like the following (amongst other things). This line is not always in the same place, but it always starts at the beginning of a line. Its content is different in every file.
slug: bláh-téxt-hello-write-sométhing-ábout-arrow
I'd like to replace each occurrence of the special characters á and é with the corresponding plain characters a and e. Each of those characters can appear on this line many times, and can also occur elsewhere in the document (where it should not be replaced).
So the result is:
slug: blah-text-hello-write-something-about-arrow
I have this:
find: ^slug: (.)((é)|(á))(.)
replace: slug: $1(?2e)(?2a)$3
However, this seems to replace only one character at a time. How do I get it to run multiple times until there is no character to replace?
Thanks much for any insights.
You can use
Find What: (?:\G(?!^)|^slug:\h*)[^\r\náé]*\K(?:(á)|é)
Or
Find What: (?:\G(?!^)|^slug:\h*).*?\K(?:(á)|é)
Replace With: (?1a:e)
Details:
(?:\G(?!^)|^slug:\h*) - end of the previous match or slug: and then zero or more horizontal whitespaces at the start of a line
[^\r\náé]* - zero or more chars other than CR, LF, á and é
.*? - (in the second pattern) matches zero or more chars other than line break chars, as few as possible
\K - the operator that discards all text matched so far
(?:(á)|é) - either á (captured into Group 1) or é.
The replacement, (?1a:e), replaces the found match with a if Group 1 matched; otherwise e is used.
See the regex demo.
Extra information about the use of conditional replacement is available in my "Swapping multiple values using conditional replacement patterns in Notepad++" YT video.
Extra information about the use of \G operator can be found in another YT video of mine, "\G anchor use cases".
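For comparison outside Notepad++, the same goal can be sketched in Python. Note that this is a different technique: Python's re module has neither \G nor \K, so the sketch first finds each slug: line and then replaces the characters inside it. The names PLAIN and fix_slug_lines and the sample text are made up for illustration.

```python
import re

# Hypothetical mapping for the two characters from the question.
PLAIN = {"á": "a", "é": "e"}

def fix_slug_lines(text: str) -> str:
    """Replace á/é only on lines that start with 'slug:'."""
    def fix_line(line_match):
        # Second pass: swap every á/é inside the matched slug line.
        return re.sub(r"[áé]", lambda ch: PLAIN[ch.group()], line_match.group())
    # First pass: locate each whole line beginning with 'slug:'.
    return re.sub(r"^slug:.*$", fix_line, text, flags=re.MULTILINE)

doc = "title: bláh\nslug: bláh-téxt-hello\nbody: sométhing"
fixed = fix_slug_lines(doc)
print(fixed)
```

Lines that do not start with slug: are left untouched, which matches the requirement that other occurrences in the document must not be replaced.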

Regex to match lines starting with a \t or - but only capture - on

I cannot figure out this regex for the life of me
I have example input such as:
- Line 1
- Line 2
- Line 3
- Line 4
I am trying to match each line starting at the - and going through the end of the line. I am using the Workflow app on iOS, which uses ICU regex parsing.
The pattern I am using is
(?m)^\t*(-.*)
This pattern will match all the lines, but it captures the tabs. What am I doing wrong?
You ask why your regex captures the tabs. It does not: your regex matches the tabs, and captures the - after those tabs together with the rest of the line. The point is that you are using a consuming pattern, one that returns the matched/captured strings.
Non-consuming patterns (lookarounds) can be used to check for the presence or absence of some text without putting it into the text returned.
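The difference between the whole match and the captured group can be seen in a small Python sketch (Python rather than ICU, and the tab-indented sample text is made up): group 0 is the whole match including the tabs, group 1 is only what the parentheses captured.

```python
import re

# Made-up sample: lines indented with one, two, and zero tabs.
text = "\t- Line 1\n\t\t- Line 2\n- Line 3"

whole_matches = []
captures = []
for m in re.finditer(r"(?m)^\t*(-.*)", text):
    whole_matches.append(m.group(0))  # includes the leading tabs
    captures.append(m.group(1))       # starts at the hyphen

print(whole_matches)  # ['\t- Line 1', '\t\t- Line 2', '- Line 3']
print(captures)       # ['- Line 1', '- Line 2', '- Line 3']
```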
In the ICU regex flavor, lookbehinds are of constrained width: a limiting quantifier such as {0,100} is fine, but the length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
Thus, this will work as long as there are 100 or fewer tabs at the line start:
(?m)(?<=^\t{0,100})-.*
Here,
(?m) - makes ^ match the start of a line
(?<=^\t{0,100}) - a positive lookbehind requiring 0 to 100 tabs after the beginning of the line to appear before a
-.* - hyphen and the rest of the line.
Try this:
(?m)^[ \t]*(-.*)
First, it appears that you have some spaces at the beginning of some of those lines, so \t will not match spaces. Replacing \t with [ \t] (or just \s) will fix this. Also, (-*) is going to match and capture any number of -, not including what's following. Put a . before your * to match any number of characters following the -, like this: (-.*)
If you don't require leading spaces, you can use
(?m)(-.*)
If you don't care about capturing the match, you don't need the parentheses, giving you
(?m)-.*
As mentioned in the comments

How to match words and an empty string

Newbie of regex here! :D
I have to match the string "SOMETHING HERE" in this example:
DATA[SOMETHING HERE]
SOMETHINGHERE can be empty (DATA[]) and I have to match that too.
SOMETHINGHERE can be anything, carriage returns and line breaks included.
You might be looking for DATA\[(.*)\], where
\[ escapes the [ character, . is any character, and .* means zero or more of any character.
EDIT
I wasn't able to test it, and I was sure it would work until I noticed this:
The dot matches a single character, without caring what that character is. The only exception are line break characters. In all regex flavors discussed in this tutorial, the dot does not match line breaks by default.
This exception exists mostly because of historic reasons. The first tools that used regular expressions were line-based. They would read a file line by line, and apply the regular expression separately to each line. The effect is that with these tools, the string could never contain line breaks, so the dot could never match them.
So . matches almost all characters (excluding CR and LF). So you can use this:
DATA\[([^a]*[a]*)*\]
That is: match any character that is not 'a', or 'a' itself, which together cover every character, line breaks included (you can use any character in place of 'a').
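Here is a small Python sketch of the same idea. As an assumption, it swaps the ([^a]*[a]*)* construct for the shorter [\s\S]*, which also matches any character including line breaks. The helper name payload and the sample strings are made up.

```python
import re

# [\s\S] matches any character, line breaks included (a substitute for
# the [^a]*[a]* trick; not the exact pattern from the answer).
DATA = re.compile(r"DATA\[([\s\S]*)\]")

def payload(s: str):
    """Return the text between DATA[ and the last ], or None if absent."""
    m = DATA.search(s)
    return m.group(1) if m else None

print(repr(payload("DATA[]")))                   # ''
print(repr(payload("DATA[line 1\r\nline 2]")))   # 'line 1\r\nline 2'
```

Because the quantifier is greedy, the capture runs to the last ] in the input, so the sketch suits inputs that hold a single DATA[...] block.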

Vim RegEx: Match until blank line

I'm trying to write a RegEx that will match any line that contains ".wpd", and then match all lines after that until it reaches a blank line (including the blank line).
This is what I've tried:
/\v^.*.wpd\_.\{-}^\s*$
However, the non-greedy operator \{-} after the "all characters including new lines" character class \_. doesn't seem to work. If I use
/\v^.*.wpd\_.*
that will match the next line containing ".wpd" and then all lines after that. However, as soon as I change the * to \{-}, it doesn't match anything at all.
What am I doing wrong? Thanks!
This one seems to work:
/\v^.*\.wpd\_.{-}\n\s*\n
You cannot use the atom ^ (or $) in the middle of the regexp: it has its special meaning only at the front (or at the back, for $); elsewhere, it's taken as the literal char. Use \n to match a newline inside the regexp, as shown by perreal's answer.
(?s)[^\n\r]*\.wpd(.*?)\n{2}
(?s) - Turn on 'dot matches line breaks' to search across lines
[^\n\r]* - Starting at the beginning of a line, match anything that's not a line break
\.wpd - Match '.wpd'
(.*?) - Match anything, non-greedily, including line breaks ( because we turned on (?s) previously )
\n{2} - ... until you find two newlines in a row, which would be a blank line
:)
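This PCRE-style pattern also runs unchanged in Python, which makes it easy to try out. The sample text below is made up, and the sketch assumes LF line endings.

```python
import re

# (?s) lets . match line breaks, as described above.
WPD_BLOCK = re.compile(r"(?s)[^\n\r]*\.wpd(.*?)\n{2}")

# Made-up sample: one line mentioning a .wpd file, then text, then a blank line.
text = "intro\nfile.wpd line\nmore\n\nafter"

m = WPD_BLOCK.search(text)
block = m.group(0) if m else None
print(repr(block))  # 'file.wpd line\nmore\n\n'
```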
The following is a large supporting comment to @perreal's answer above as well as my own version of that answer which I find more intuitive.
Let's dissect the following regexp based on http://vimdoc.sourceforge.net/htmldoc/pattern.html#/magic
/\v^.*\.wpd\_.{-}\n\s*\n
\v (lowercase v): This is the 'very magic' operator, which signifies that in the pattern after it all ASCII characters except '0'-'9', 'a'-'z', 'A'-'Z' and '_' have a special meaning. Therefore, characters like *, ^, $ need not be escaped in the pattern, but for _ to have special meaning (such as modifying the behaviour of . to match newline), it needs to be escaped. Hence with \v set, you need \_ for the latter to have special meaning. To truly appreciate how much very magic simplifies the expression, compare it with the same expression using very nomagic (uppercase \V): /\V\^\.\*.wpd\_\.\{-}\n\s\*\n (very nomagic) vs /\v^.*\.wpd\_.{-}\n\s*\n (very magic)
^.*\.wpd: Greedily match anything (.*) from the beginning of a line (^) till .wpd
\_. : Matches a single character, which can be any character including the newline. Note that with \v set, the pattern must have an escaped underscore as noted above.
{-} : Is the non-greedy equivalent of * quantifier. So, where .*BLAH matches the most possible characters till BLAH, .{-}BLAH will match the least possible. To see this in action, take a look at this (in this case, I had to use ? instead of {-} since that regex is PCRE) :
\n\s*\n: Matches a blank line, which may contain zero or more whitespace characters such as spaces or tabs
\_.{-}\n\s*\n: combines the above two and means Match the least possible number of characters including newline (\_.) until a blank line (\n\s*\n)
\v^.*\.wpd\_.{-}\n\s*\n: Finally, putting it all together: set the very magic operator (to simplify the pattern by not needing to escape anything except an _ for special meaning), search for any line which contains .wpd, and match until the closest blank line.
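The greedy versus non-greedy difference described above can be reproduced in Python, where the non-greedy form of * is written *? rather than Vim's {-}. The sample string is made up.

```python
import re

text = "xxBLAHyyBLAHzz"

# Greedy: .* grabs as much as possible, so it stops at the last BLAH.
greedy = re.match(r".*BLAH", text).group()

# Non-greedy: .*? grabs as little as possible, so it stops at the first BLAH.
lazy = re.match(r".*?BLAH", text).group()

print(greedy)  # xxBLAHyyBLAH
print(lazy)    # xxBLAH
```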
My version using variants of end-of-line start-of-line characters
The only modification is to the expression used to signify a blank line. I find it useful to define a blank line in terms of the start-of-line ('^') and end-of-line ('$') characters, however as-is, they cannot be used anywhere in a regexp except the beginning and the end respectively.
For the above use-case, there are variants which can be used anywhere in a regex, namely \_^ and \_$ respectively. Therefore the blank line expression can be written as \_^\s*\_$ instead of \n\s*\n, thus making the complete expression:
\v^.*\.wpd\_.{-}\_^\s*\_$
This perhaps is closer to answering the OP's question about why they were unable to use the start-of-line character in their expression.
Phew!