RegEx for expression between 2 white space characters - regex

I have a file directory listing from an embedded target that looks like this:
Directory of D:\
D 0 19-Jan-15 16:12:16 FILE1
D 0 19-Jan-15 16:09:31 FILE2
D 0 21-Jan-15 14:10:33 FILE3
94951/218985 MB unused/total
And I am looking to get only the file names. The string in C# will look like this:
\r\nDirectory of D:\\\r\nD \t 0\t19-Jan-15 16:12:16\tFILE1\r\nD \t 0\t19-Jan-15 16:09:31\tFILE2\r\nD \t 0\t21-Jan-15 14:04:15\tFILE3\r\n94969/218985 MB unused/total\r\n
I noticed that the file names are always contained between a \t and a \r\n, so I thought the easiest way to approach it would be with \t(.*?)\r\n. But this will get the whole line. What is the best way to change the regex so that it omits the first 2 \t in the line?

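You can use this regex:
\t([^\t\r\n]*)\r\n
i.e. capture the run of characters that are neither tabs nor line-break characters between the last \t and the \r\n, which gives you the file name on each line. Excluding \r and \n from the class matters: a plain [^\t] also matches line breaks, so on the last entry it would run on into the summary line.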
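As a rough illustration outside C#, here is a minimal Python sketch of the capture-group approach applied to the string from the question. The helper name file_names is made up for this example, and the character class excludes \r and \n as well as \t so that a match cannot run across a line break.

```python
import re

# The directory listing as it appears in the question's C# string.
LISTING = (
    "\r\nDirectory of D:\\\r\n"
    "D \t 0\t19-Jan-15 16:12:16\tFILE1\r\n"
    "D \t 0\t19-Jan-15 16:09:31\tFILE2\r\n"
    "D \t 0\t21-Jan-15 14:04:15\tFILE3\r\n"
    "94969/218985 MB unused/total\r\n"
)

# A tab, then a run of characters that are neither tabs nor line breaks
# (captured), then CRLF. Only the last field of a line can satisfy this.
NAME_PATTERN = re.compile(r"\t([^\t\r\n]*)\r\n")


def file_names(listing):
    """Return the captured last field of every tab-separated line (hypothetical helper)."""
    return NAME_PATTERN.findall(listing)


print(file_names(LISTING))
```

The same pattern string should carry over to .NET's Regex class unchanged, since it only uses a character class, a group, and literal escapes.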

Because file names cannot include tab characters, you can replace the . in \t(.*?)\r\n with a negated character class. Exclude \r and \n as well as \t, because [^\t] alone can match across line breaks. You can also use lookarounds so that the \t at the start and the \r at the end are not part of the match, drop the unnecessary capturing group, and change *? to +:
(?<=\t)[^\t\r\n]+(?=\r)
This regex matches a sequence of characters that contains no tabs or line breaks, as long as the sequence sits between a tab (\t) and a carriage return (\r).
Note that to test this on regex101, I had to change the \r to a \n; you will most likely still need the \r in your regex.

You can do it with a capturing group or like this:
(?<=\t)[^\t\r\n]+(?=[\r\n])
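To see why the character class should also exclude line breaks, here is a small Python sketch (variable names are illustrative only). It runs a lookaround pattern with [^\t\r\n] next to one with plain [^\t] on two lines of the listing.

```python
import re

listing = (
    "D \t 0\t19-Jan-15 16:12:16\tFILE1\r\n"
    "D \t 0\t19-Jan-15 16:09:31\tFILE2\r\n"
)

# Class excludes tabs and line breaks: the match stops at the file name.
strict = re.findall(r"(?<=\t)[^\t\r\n]+(?=\r)", listing)

# Class excludes only tabs: the greedy run crosses the line break and then
# backtracks to the longest text followed by \r or \n, which still
# includes the carriage return.
loose = re.findall(r"(?<=\t)[^\t]+(?=[\r\n])", listing)

print(strict)
print(loose)
```

With the stricter class the matches are the bare names; with the looser one each match carries a trailing \r.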


Replace Certain Line Breaks with Equivalent of Pressing delete key on Keyboard NotePad++ Regex

I'm using Notepad++ Find and Replace, and I have a regex, [^|]\r, which finds the end of the line that starts with 8778.
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase
See Also 15990TT|
I want to basically merge that line with the one below it, so it becomes this:
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase See Also 15990TT|
I've tried replacing with a blank space, but it's grabbing the last character on that line (an e in this case) and replacing that with a space too, so the result is
8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBas
See Also 15990TT|
Is there any way to make it essentially merge the two lines?
\r only matches a carriage return. To match a line break, you need \R, which matches any line break sequence.
To keep a part of a pattern after replacement, capture that part with parentheses, and then use a backreference to that group.
So you may use
([^|\r])\R
Replace with $1, or with $1 followed by a space if you need a space between the joined parts.
Details
([^|\r]) - Capturing group 1 ($1 is the backreference that refers to the group value from the replacement pattern): any char other than | and CR
\R - any line break char sequence, LF, CR or CRLF.
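The capture-and-put-back idea can be sketched in Python as follows. This is an adaptation rather than the Notepad++ pattern itself: Python's re module has no \R, so the line-break alternatives are spelled out, and the class also excludes \n. The function name join_broken_lines is hypothetical.

```python
import re

# Stand-in for \R, which Python's re module does not support.
LINE_BREAK = r"(?:\r\n|\r|\n)"

# Group 1 holds the last character of a line that does not end in a pipe.
BROKEN_LINE_END = re.compile(r"([^|\r\n])" + LINE_BREAK)


def join_broken_lines(text):
    """Replace the break after a line not ending in '|' with a space,
    keeping the character that preceded the break."""
    return BROKEN_LINE_END.sub(r"\1 ", text)


sample = (
    "8778|44523|0||TENNESSEE|ADMINISTRATION||ROLL 169 BATCH 8|1947-09-22|0|OnBase\r\n"
    "See Also 15990TT|\r\n"
)
print(join_broken_lines(sample))
```

The break after the second line is left alone because that line ends in a pipe.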
The issue is you're using [^|] to match anything that's not a pipe character before the carriage return, which, on replacement, will remove that character (hence why you're losing an e).
If it's imperative that you match only carriage returns that follow non-pipe characters, capture the preceding character, as in ([^|])\r, and then put it back in the replacement using $1.
You're also missing a \n in your regex, which is why the replacement isn't concatenating the two lines. So your search should be ([^|])\r\n (with no trailing $, which cannot match at the start of the following non-empty line) and your replace should be $1.
Find
(\r\n)+
For "Replace" - don't put anything in (not even a space)

Notepad++ N text lines separated by blank lines?

I searched a bit, but didn't find a solution for this specific situation. I need to combine groups of non-blank lines into single lines, while preserving the blank lines. For example, the input:
Hi, My name is
Max
What are you
doing
Right now?
Hi
Hello
World
should be output as:
Hi, My name is Max
What are you doing Right now?
Hi
Hello World
Thanks in advance to all who respond.
You could try replacing
(?<![\n\r])[\n\r](?![\n\r])
with a space.
Explanation -
(?<![\n\r]) is a negative look-behind which tells the regex that anything to be matched must not be preceded by a newline or by a carriage return (just take it as a newline)
[\n\r] is the newline or carriage return which is matched (and later replaced with a space)
(?![\n\r]) is a negative look-ahead that tells the regex that any newline to be matched should not be followed by another newline or carriage return.
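In essence, this replaces each single line-break character that is neither preceded nor followed by another line-break character with a space. Because it expects a one-character line break, it does not match inside \r\n pairs.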
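A minimal Python sketch of this replacement, assuming the text uses single \n line endings and that the input in the question has blank lines between the groups (their positions here are inferred from the expected output):

```python
import re

# A lone line break: not preceded and not followed by another \n or \r.
SINGLE_BREAK = re.compile(r"(?<![\n\r])[\n\r](?![\n\r])")

text = (
    "Hi, My name is\nMax\n"
    "\n"
    "What are you\ndoing\nRight now?\n"
    "\n"
    "Hi\n"
    "\n"
    "Hello\nWorld"
)

joined = SINGLE_BREAK.sub(" ", text)
print(joined)
```

Breaks next to a blank line sit beside another \n, so both lookarounds leave them alone and the blank lines survive.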
You can try this too,
(?m)(?!^\s*$)(^[^\n]*)\n(?!^\s*$)
This matches every non-empty line that is not followed by an empty line, together with its newline character (\n); replace each match with $1 and a space to join the lines.
But in Notepad++ you must account for the carriage return (\r) before the newline (\n). Thus,
(?m)(?!^\s*$)(^[^\n]*)\r\n(?!^\s*$)
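Here is how the \r\n variant behaves in Python on text with Windows line endings. It is a sketch under the assumption that each match is replaced by group 1 plus a space; the function name is made up for the example.

```python
import re

# The pattern from above: a non-blank line, its CRLF, and a lookahead
# requiring that the next line is not blank either.
JOIN_CRLF = re.compile(r"(?m)(?!^\s*$)(^[^\n]*)\r\n(?!^\s*$)")


def join_paragraph_lines(text):
    """Join consecutive non-blank lines with a space, keeping blank lines."""
    return JOIN_CRLF.sub(r"\1 ", text)


crlf_text = (
    "Hi, My name is\r\nMax\r\n"
    "\r\n"
    "What are you\r\ndoing\r\nRight now?\r\n"
    "\r\n"
    "Hi\r\n"
    "\r\n"
    "Hello\r\nWorld"
)
print(join_paragraph_lines(crlf_text))
```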

Multi-line regular expressions in Visual Studio Code

I cannot figure out how to make a regular expression match continue past the end of a line, up to the end of the file, in VS Code. Is it a tool limitation, or is there some kind of pattern that I am not aware of?
It seems the CR is not matched with [\s\S]. Add \r to this character class:
[\s\S\r]+
will match any 1+ chars.
Other alternatives that proved to work are [^\r]+ and [\w\W]+.
If you want to make any character class match line breaks, be it a positive or negative character class, you need to add \r in it.
Examples:
Any text between the two closest a and b chars: a[^ab\r]*b
Any text between START and the closest STOP words:
START[\s\S\r]*?STOP
START[^\r]*?STOP
START[\w\W]*?STOP
Any text between the closest START and STOP words:
START(?:(?!START)[\s\S\r])*?STOP
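The difference between the last two START/STOP forms is easy to check outside the editor. The Python sketch below reuses the patterns as written; in Python, [\s\S] already matches \r, so the extra \r is redundant there and is kept only to mirror the VS Code patterns. The sample text is invented.

```python
import re

text = "START a\nSTART b\nSTOP c STOP"

# From a START to the closest following STOP (may contain another START).
to_closest_stop = re.findall(r"START[\s\S\r]*?STOP", text)

# Tempered version: the lookahead refuses to step over another START,
# so only the innermost START ... STOP pair matches.
closest_pair = re.findall(r"START(?:(?!START)[\s\S\r])*?STOP", text)

print(to_closest_stop)
print(closest_pair)
```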
To match a multi-line text block starting from aaa and ending with the first bbb (lazy quantifier):
aaa(.|\n)+?bbb
To find a multi-line text block starting from aaa and ending with the last bbb (greedy quantifier):
aaa(.|\n)+bbb
If you want to exclude certain characters from the "in between" text, you can do that too. This only finds blocks where the character "c" doesn't occur between "aaa" and "bbb":
aaa([^c]|\n)+?bbb
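A quick way to compare the three patterns is to run them on a small string. This is a Python sketch with invented sample text; the patterns are the ones listed above.

```python
import re

text = "aaa 1\nbbb 2\nbbb"

# Lazy: stops at the first bbb after aaa.
lazy = re.search(r"aaa(.|\n)+?bbb", text).group(0)

# Greedy: runs to the last bbb in the text.
greedy = re.search(r"aaa(.|\n)+bbb", text).group(0)

# Excluding "c": no match when a c sits between aaa and bbb.
blocked = re.search(r"aaa([^c]|\n)+?bbb", "aaa x c bbb")
allowed = re.search(r"aaa([^c]|\n)+?bbb", "aaa x\ny bbb")

print(repr(lazy), repr(greedy), blocked, allowed is not None)
```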

Regex replace one value between comma separated values

I have a bunch of comma-separated CSV files.
I would like to replace exactly one value, the one between the third and fourth comma. I would love to do this with the Notepad++ 'Find in Files' and Replace functionality, which can use RegEx.
Each line in the files look like this:
03/11/2016,07:44:09,327575757,1,5434543,...
The value I would like to replace in each line is always the number 1, and I want to change it to another number.
It can't be a simple regex for e.g. ,1, as this could be somewhere else in the line, so it must be the one after the third and before the fourth comma...
Could anyone help me with the RegEx?
Thanks in advance!
Two more rows as example:
01/25/2016,15:22:55,276575950,1,103116561,10.111.0.111,ngd.itemversions,0.401,0.058,W10,0.052,143783065,,...
01/25/2016,15:23:07,276581704,1,126731239,10.111.0.111,ll.browse,7.133,1.589,W272,3.191,113273232,,...
You can use
^(?:[^,\n]*,){2}[^,\n]*\K,1,
Replace with the new value wrapped in commas (for example ,2,), since the match itself is ,1, including both commas.
The pattern explanation:
^ - start of a line
(?:[^,\n]*,){2} - 2 sequences of
[^,\n]* - zero or more characters other than , and \n (matched with the negated character class [^,\n]) followed with
, - a literal comma
[^,\n]* - zero or more characters other than , and \n
\K - an operator that forces the regex engine to discard the whole text matched so far with the regex pattern
,1, - what we get in the match.
Note that \n inside the negated character classes will prevent overflowing to the next lines in the document.
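For scripting the same edit, note that Python's re module does not support \K, so the sketch below adapts the idea: it captures the first three fields in a group and puts them back. The function name and the second sample row are made up for illustration.

```python
import re

# Three comma-terminated fields at the start of a line (group 1),
# then a fourth field that is exactly "1" (the lookahead checks the comma).
FOURTH_FIELD_IS_1 = re.compile(r"^((?:[^,\n]*,){3})1(?=,)", re.MULTILINE)


def replace_fourth_field(text, new_value):
    """Swap a fourth field equal to 1 for new_value on every line."""
    return FOURTH_FIELD_IS_1.sub(lambda m: m.group(1) + new_value, text)


rows = (
    "03/11/2016,07:44:09,327575757,1,5434543,...\n"
    "a,1,b,11,1,x\n"  # hypothetical row: fourth field is 11, must stay as is
)
print(replace_fourth_field(rows, "7"))
```

Only the first row changes; the 1 values elsewhere in the second row are not in the fourth position, and its fourth field is 11 rather than 1.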
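You can replace the value between the third and fourth comma using the following regex.
Regex: ^([^,\r\n]+,[^,\r\n]+,[^,\r\n]+),([^,\r\n]+)
Replacement: \1,value (I used XX for the demo). The ^ anchor and the \r\n exclusions keep each match on the first four fields of a single line; without them, Replace All would also hit later fields on the same line.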

UltraEdit: Deleting all lines under a certain length with \n and or \r

I have a massive text file and want to remove all lines that are less than 6 characters long.
I tried the following search string (Regular expressions - Perl)
^.{0,5}\n\r$ -- string not found
^.{0,5}\n\r -- string not found
^.{0,5}$ -- leaves blank lines
^.{0,5}$\n\r -- string not found
^.{0,5}$\r -- leaves blank lines
^.{0,5}$\r\n -- **worked**
My question is: why should the last one work and the 4th one not work? And why should the 5th one leave blank lines?
Thanks.
Because ^.{0,5}$\n\r is not the same as ^.{0,5}$\r\n.
\n\r is a linefeed followed by carriage return.
\r\n is a carriage return followed by linefeed - a popular line ending combination of characters. Specifically \r\n is used by the MS-DOS and Windows family of operating systems, among others.
In multiline mode, ^ is a metacharacter that matches the beginning of the string and can also match after a newline.
Likewise, $ matches End of String and these too:
\r\n
^ ^
here ----+-or-+
or
\n
^ ^
here ----+-or-+
$ will try to match before newline if it can (depends on other parts of the regex).
You can use that to advantage like this regex
^.{0,5}$(\r?\n)* which will match end of string AND optional successive linebreaks.
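For comparison, here is a Python sketch of the same clean-up. It is not a port of the UltraEdit pattern: Python's re treats only \n as the line terminator for $ and lets . match \r, so the sketch spells out the line ending instead of relying on $. The function name and sample text are invented.

```python
import re

# A line of at most five characters (none of them line-break characters),
# together with its CRLF or LF terminator.
SHORT_LINE = re.compile(r"^[^\r\n]{0,5}\r?\n", re.MULTILINE)


def drop_short_lines(text):
    """Remove every terminated line that is shorter than six characters."""
    return SHORT_LINE.sub("", text)


sample = "short\r\nlong enough\r\nabc\r\n\r\nsixsix\r\n"
print(repr(drop_short_lines(sample)))
```

Because the terminator is part of the match, removing a short line leaves no blank line behind, which is the same reason the \r\n version worked in the question.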