Match consecutive lines that start with 1 or more spaces - regex

Could anyone offer up assistance to make this work:
https://regex101.com/r/s1X84J/1
REGEX
^((?:(?:[ ]{1,}|\t).*(\R|$))+){1,}
It should match any consecutive lines that start with one or more spaces. In the example, I am able to get it to match the first block of text. I am trying to get it to match the next block of consecutive text starting with one or more spaces as Match 2.

First, you need the global flag (/g) set to return more than one match.
Second, the following pattern matches runs of consecutive lines that start with a space. It uses a lookbehind to ensure the match starts at a line boundary:
/(^|(?<=\n))( [^\n]*\n\r?)+/gm
The flags are on the right.

You need to use the g and m flags with the following pattern:
^\h.*(?:\R\h.*)*
If your regex flavor does not support \h (horizontal whitespace), you can use either [^\S\r\n] or [\p{Zs}\t] instead.
Details:
^ - start of a line
\h - a horizontal whitespace character
.* - the rest of the line
(?:\R\h.*)* - zero or more occurrences of
\R - any line break sequence
\h - a horizontal whitespace character
.* - the rest of the line.
It needs to be adjusted if the regex flavor is not PCRE / Onigmo / Java.
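As a rough illustration of that adjustment, here is a minimal Python sketch. Python's re module supports neither \h nor \R, so this assumes [^\S\r\n] as the stand-in for \h and \r?\n as the stand-in for \R. The sample text and the name indented_blocks are made up for the example.

```python
import re

# [^\S\r\n] replaces \h and \r?\n replaces \R, since Python's re has neither.
# re.MULTILINE plays the role of the m flag; finditer plays the role of g.
BLOCK = re.compile(r"^[^\S\r\n].*(?:\r?\n[^\S\r\n].*)*", re.MULTILINE)

def indented_blocks(text):
    """Return each run of consecutive lines that start with a space or tab."""
    return [m.group(0) for m in BLOCK.finditer(text)]

# Hypothetical sample: two indented blocks separated by unindented lines.
sample = "head\n  a\n  b\nmid\n\tc\nend\n"
blocks = indented_blocks(sample)
print(blocks)
```

Each element of blocks is one match, so the second indented block comes back as its own item rather than being merged with the first.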

What is the regex to find lines WITHOUT a line break

I'm using SubtitleEdit and I'd like to locate all the lines that do not contain a line break.
Lines that contain a line break are bilingual, which is what I want.
Those without a line break are monolingual, and I'd like to quickly locate them all and delete them. TIA!
Alternatively, a regex that finds lines which do not contain any English characters would also work.
The confusion here was caused by two facts:
What SubtitleEdit calls a line is actually a multiline, containing newlines.
The newline displayed is not the one used internally (so it would never match <br>).
Solution 1:
Now that we have found out it uses either \r\n or just \n, we can write a regex:
(?-m)^(?!.*\r?\n)[\s\S]*$
Explanation:
(?-m) - turn off the multiline option (which is otherwise enabled).
^ - match from start of text
(?!.*\r?\n) - negative look ahead for zero or more of any characters followed by newline character(s) - (=Contains)
[\s\S]*$ - match zero or more of ANY character (including newline) - will match the rest of text.
In short: If we don't find newline characters, match everything.
Now replace with an empty string.
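The same check can be sketched outside SubtitleEdit. The Python snippet below is only an illustration: Python's re module does not enable multiline mode by default, so the (?-m) prefix is left out, and the list of subtitle entries is invented for the example.

```python
import re

# Without re.MULTILINE, ^ anchors at the start of the whole entry,
# which is what (?-m) achieves in SubtitleEdit.
NO_LINE_BREAK = re.compile(r"^(?!.*\r?\n)[\s\S]*$")

# Hypothetical subtitle entries: two bilingual (with a break), two not.
entries = ["Hello", "Hello\nBonjour", "Hi\r\nSalut", "Bye"]

# Keep only the entries the pattern matches, i.e. those without a newline.
monolingual = [e for e in entries if NO_LINE_BREAK.match(e)]
print(monolingual)
```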
Solution 2:
If you want to match lines that don't contain any English characters, you can use this:
(?-m)^(?![\s\S]*[a-zA-Z])[\s\S]*$
Explanation:
(?-m) - turn off the multiline option (which is otherwise enabled).
^ - match from start of text
(?![\s\S]*[a-zA-Z]) - negative look ahead for ANY characters followed by an English character.
[\s\S]*$ - match zero or more of ANY character (including newline) - will match the rest of text.
In short: If we don't find an English character, match everything.
Now replace with an empty string.
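A small Python sketch of Solution 2, under the same assumptions as before (no multiline flag, so (?-m) is omitted; the sample entries are made up):

```python
import re

# The lookahead scans the whole entry, newlines included, for an ASCII letter.
NO_ENGLISH = re.compile(r"^(?![\s\S]*[a-zA-Z])[\s\S]*$")

# Hypothetical entries: only the second one contains English letters.
entries = ["你好", "Hello\n你好", "123"]

# Entries that match contain no a-z or A-Z anywhere.
without_english = [e for e in entries if NO_ENGLISH.match(e)]
print(without_english)
```

Note that [a-zA-Z] only covers unaccented ASCII letters, so an entry consisting of digits or punctuation also counts as having no English characters.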
You should use a regex lookahead assertion. Given these test lines:
something_1
some<br>thing_2
something_3<br>
<br>something_4
something_5
This expression matches lines 1 and 5:
^(?!.*<br>).*$
The negative lookahead assertion (?!.*<br>) rejects any line that contains <br>, so only lines without it are matched.
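To try this on the test lines, here is a short Python sketch. It assumes the lines are joined with \n into one string and that multiline mode is on, so ^ and $ work per line.

```python
import re

# The five test lines from above, joined into a single string.
text = "\n".join([
    "something_1",
    "some<br>thing_2",
    "something_3<br>",
    "<br>something_4",
    "something_5",
])

# re.MULTILINE makes ^ and $ anchor at every line, not just the whole text.
lines_without_br = re.findall(r"^(?!.*<br>).*$", text, flags=re.MULTILINE)
print(lines_without_br)
```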

Regex: delete everything between String and replace with other

So I've been scratching my head over this one. I have over a thousand files that have different values between these tags:
<lodDistances content="float_array">
15.000000
25.000000
70.000000
140.000000
500.000000
500.000000
</lodDistances>
I need to replace those values with these
<lodDistances content="float_array">
120.000000
200.000000
300.000000
400.000000
500.000000
550.000000
</lodDistances>
I tried the following without any success
\ (?<=\<lodDistances content\=\"float_array\"\>)(.*)(?=\<\/lodDistances\>)
It seems to find a match in regexr, but when I use Find in Files in Sublime Text I constantly get 0 results. Any idea why this is happening?
There are a couple of things wrong in your pattern:
\< matches a leading word boundary position (same as \b(?=\w)) and \> matches a trailing word boundary position (same as \b(?<=\w)). You wanted to match literal < and > chars, so you must NOT escape them.
There is no need to match a space before the first <.
Since your text is multiline, use either the (?s) inline modifier or a (?s:...) modifier group to make . match across line breaks, or use a [\s\S] / [\w\W] / [\d\D] workaround.
Use a lazy dot pattern to stop matching at the first occurrence of the trailing delimiter.
You may use
(?s)(<lodDistances content="float_array">\s*).*?(?=\s*</lodDistances>)
And replace with ${1}<new values>. The curly braces are necessary as the new values are most likely numbers and without the braces, $1n (n stands for a digit here) will be parsed incorrectly (see this YT video for a demo of what it is fraught with).
Regex details:
(?s) - now, . matches line break chars, too
(<lodDistances content="float_array">\s*) - Group 1 capturing <lodDistances content="float_array"> text and then zero or more whitespaces
.*? - any zero or more chars, but as few as possible
(?=\s*</lodDistances>) - a positive lookahead that matches the location that is immediately followed with zero or more whitespaces and </lodDistances> text.
Note that / is not a special regex metacharacter, and since regex delimiter notation is not supported in Sublime Text, you do not have to ever escape it here.
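For batch processing outside the editor, the same pattern can be sketched in Python. This is an illustration only: the names NEW_VALUES and replace_lod_distances are hypothetical, and Python writes the group reference as \g<1> instead of ${1}, which guards against the same ambiguity when the replacement starts with a digit.

```python
import re

# Same pattern as above; (?s) lets . match line breaks.
LOD = re.compile(
    r'(?s)(<lodDistances content="float_array">\s*).*?(?=\s*</lodDistances>)'
)

# Hypothetical replacement values (shortened to two for the example).
NEW_VALUES = "120.000000\n200.000000"

def replace_lod_distances(xml_text):
    # \g<1> is unambiguous; a bare \1 followed by "120..." would be
    # read as a reference to a much higher group number.
    return LOD.sub(r"\g<1>" + NEW_VALUES, xml_text)

before = '<lodDistances content="float_array">\n15.000000\n25.000000\n</lodDistances>'
after = replace_lod_distances(before)
print(after)
```

The opening tag and the whitespace after it are kept through Group 1, and the closing tag is untouched because the lookahead does not consume it.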

vim regex - match any number of whitespace at end of line except 2

I want to write a fixer for ale to remove all whitespace at the end of a line except for a double whitespace - in markdown this is used to create a linebreak.
I need to match "at end of row, 1 or more white space AND not 2 white space"
kind of like \s\+$\&\s\{^2}$ except that ^ is not negation inside curly brackets. Some googling reveals that negating a count of a meta character seems to be a particularly niche problem.
You can use
:%s/\v(\s)@<!\s(\s{2,})?$//g
Details
% - search on all lines
s - substitute
\v - very magic mode
(\s)@<! - location not immediately preceded by a whitespace
\s - a whitespace
(\s{2,})? - an optional occurrence of two or more whitespaces
$ - end of line
g - all occurrences on the line.
This is how this regex works (translated into PCRE).
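To see which lines the substitution touches, here is a rough Python translation. It is a sketch, not the Vim command itself: the lookbehind is written as (?<!...), [ \t] is assumed in place of \s so that line breaks are never involved, and the sample lines are invented.

```python
import re

# One whitespace not preceded by whitespace, optionally followed by two or
# more further whitespaces, then end of line: runs of 1 or 3+ match, a run
# of exactly 2 does not.
TRAILING = re.compile(r"(?<![ \t])[ \t](?:[ \t]{2,})?$")

def strip_trailing(line):
    """Remove trailing whitespace unless it is exactly two characters long."""
    return TRAILING.sub("", line)

# Hypothetical lines with 0, 1, 2 and 6 trailing spaces.
samples = ["none", "one ", "two  ", "many      "]
cleaned = [strip_trailing(s) for s in samples]
print(cleaned)
```

With this pattern, a run of three or more trailing spaces is removed entirely rather than trimmed down to two.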
(This really should be a comment but formatting is important, here)
How do you want the following snippet to look after you have "fixed" it (spaces marked with _)?
First line, without trailing spaces
Second line, with one trailing space_
Third line, with two trailing spaces__
Fourth line, with more than two trailing spaces______

Regex to match lines starting with a \t or - but only capture - on

I cannot figure out this regex for the life of me
I have example input such as:
- Line 1
- Line 2
- Line 3
- Line 4
I am trying to match each line starting at the - and going through the end of the line. I am using the Workflow app on iOS which uses ICU regex parsing
The pattern I am using is
(?m)^\t*(-.*)
This pattern will match all the lines, but it captures the tabs. What am I doing wrong?
You ask why your regex captures the tabs. It does not: your regex matches the tabs, and captures the - after those tabs together with the rest of the line. The point is that you are using a consuming pattern, one whose matched text becomes part of the returned match.
Non-consuming patterns (lookarounds) can be used to check for the presence or absence of some text without putting it into the text returned.
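The difference between the whole match and the captured group can be shown with a short sketch. It uses Python's re module rather than ICU, and it assumes the sample lines are indented with tabs (the indentation is not visible in the question as quoted).

```python
import re

# Assumed input: the question's four lines, two of them tab-indented.
text = "- Line 1\n\t- Line 2\n\t\t- Line 3\n- Line 4"

pattern = re.compile(r"(?m)^\t*(-.*)")

# group(0) is everything the pattern consumed, tabs included.
whole_matches = [m.group(0) for m in pattern.finditer(text)]
# group(1) is only what the parentheses captured.
captured = [m.group(1) for m in pattern.finditer(text)]

print(whole_matches)
print(captured)
```

If the tool only exposes the whole match, the tabs show up in the result; reading Group 1 instead gives the lines without them.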
In the ICU regex flavor, lookbehinds are of constrained width, that is, it is OK to use a limiting quantifier inside them. (The length of possible strings matched by the look-behind pattern must not be unbounded: no * or + operators.)
Thus, this will work if there are 100 or fewer tabs at the line start:
(?m)(?<=^\t{0,100})-.*
Here,
(?m) - makes ^ match the start of a line
(?<=^\t{0,100}) - a positive lookbehind requiring 0 to 100 tabs after the beginning of the line to appear before a
-.* - hyphen and the rest of the line.
Try this:
(?m)^[ \t]*(-.*)
First, it appears that you have some spaces at the beginning of some of those lines, so \t will not match spaces. Replacing \t with [ \t] (or just \s) will fix this. Also, (-*) is going to match and capture any number of -, not including what's following. Put a . before your * to match any number of characters following the -, like this: (-.*)
If you don't require leading spaces, you can use
(?m)(-.*)
If you don't care about capturing the match, you don't need the parentheses, giving you
(?m)-.*
As mentioned in the comments

How to combine lines in regular expressions?

I am new to regular expressions and I am learning them using only a simple text editor. I have the following file:
84544484N
32343545M
32334546E
34456434M
I am trying to combine each pair of lines into one tab delimited line
The result should be :
84544484N 32343545M
32334546E 34456434M
I wrote the following :
Search: (.*?)\n(.*?)
Replace: \1\t\2
This did not work. Can someone please explain why and give me the correct solution? Thank you!
The (.*?)\n(.*?) pattern will never work well because the (.*?) at the end of the pattern will always return an empty string: *? is a lazy quantifier, and if it can match zero characters (and here it can), it will. Use greedy matching and adjust the pattern like this:
(.+)\r?\n *(.*)
or - since SublimeText uses Boost regex - you can match any newline sequence with \R:
(.+)\R *(.*)
and replace with \1\t\2. Note I replaced *? with + in the first capturing group because you need to match non-empty lines.
Regex breakdown:
(.+) - Group 1: one or more characters other than a newline (as many as possible) up to
\R - a newline sequence (\r\n, \r or just \n)
 * (a space followed by *) - a literal space, zero or more occurrences
(.*) - Group 2: zero or more characters other than a newline (as many as possible)
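Both behaviours can be checked with a quick Python sketch. Python's re has no \R, so the \r?\n variant is used; the variable names are made up for the example.

```python
import re

text = "84544484N\n32343545M\n32334546E\n34456434M\n"

# Greedy version: joins each pair of lines with a tab.
joined = re.sub(r"(.+)\r?\n *(.*)", r"\1\t\2", text)

# Original lazy version: the trailing (.*?) matches nothing, so every
# newline is simply turned into a tab.
lazy_result = re.sub(r"(.*?)\n(.*?)", r"\1\t\2", text)

print(repr(joined))
print(repr(lazy_result))
```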