How to search for character+line feed with regex?
For example to turn this:
line one
line two
line (three)
line four
line five
into this:
line one
line two
line (three)=line four
line five
That is, search for ) followed by \n, and replace the \n with something else only on lines that end in ).
Search for \)\r?\n, replace with \)=.
You need to escape special regex characters (such as parentheses) when you want them to match literally in your pattern. Here is a good read on that: http://www.regular-expressions.info/characters.html
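If you want to check the pattern outside the editor, here is a minimal sketch using Python's re module. It only mirrors the search and replace above; the helper name join_after_paren is made up for this example, and in Python the replacement string needs no backslash before the parenthesis.

```python
import re

def join_after_paren(text: str) -> str:
    # Hypothetical helper: replace the line break that directly follows ")" with "=".
    # \) matches a literal closing parenthesis; \r?\n matches LF or CRLF.
    return re.sub(r"\)\r?\n", ")=", text)

sample = "line one\nline two\nline (three)\nline four\nline five"
joined = join_after_paren(sample)
print(joined)
```

Only the line break after "line (three)" is replaced, because it is the only one preceded by a closing parenthesis.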
I'm using the following Regex to search for a string in each line of a document. Every line is encapsulated with þ.
^þ.*(SEARCHSTRING).*þ$
But I came across a discrepancy in my count. Running the regex over the below two example lines of data will only get one hit when I'd like to capture both. This is because of the Line Separator Character. My regex believes this to be a new line when in fact it is simply a new line indicator. Is there any way around this?
þ
SEARCHSTRINGþ
þ#SEARCHSTRINGþ
In Notepad++, . matches any character except Unicode line break characters, and the line separator character in your data counts as one of them.
To treat a line as a run of characters other than CR and LF only, use
^þ[^\r\n]*(SEARCHSTRING)[^\r\n]*þ$
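A rough way to try the character-class version outside Notepad++ is Python's re module. This is only a sketch under assumptions: the stray character in the sample is assumed to be U+2028 (LINE SEPARATOR), and Python's . and $ treat only \n as a line break, so Python does not reproduce the original miscount. It just shows the pattern above matching both records.

```python
import re

# Assumption: the stray character in the first record is U+2028 (LINE SEPARATOR).
data = "þ\u2028SEARCHSTRINGþ\nþ#SEARCHSTRINGþ"

# [^\r\n]* stops only at CR or LF, so U+2028 is treated as ordinary content.
pattern = re.compile(r"^þ[^\r\n]*(SEARCHSTRING)[^\r\n]*þ$", re.MULTILINE)

hits = pattern.findall(data)
print(len(hits))
```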
I am looking for a way to capture text and its paragraph title from a text document.
Text File:
paraTitle-1
--------
Lines and words
empty....
more lines
still part of paraTitle-1
paraTitle-2
--------
Lines and words
empty....
more lines
still part of paraTitle-2
I want to capture both the titles and the text below them.
array = [paraTitle-1: <text below paraTitle-1>,
paraTitle-2: <text below paraTitle-2>]
I made a few attempts with pattern (?<=(.*))\n----*\n(?=(.*)) to no avail. Any guidance would be awesome.
The following regex will do:
(?!--------\R)(.*)\R--------\R((?:\R?(?!.*\R--------\R).*)+)
See regex101.
The title separator line (--------) can also be specified as -{8}, which is easier to adjust to variable length if needed, e.g. instead of exactly 8 dashes, it could be 6 or more: -{6,}
Explanation:
Capture a line of text (paragraph title):
(.*)\R
The . doesn't match line break characters
\R matches line breaks, including the Windows CRLF pair. If your regex engine doesn't support \R, use \r?\n as a simple alternative.
Make sure the captured text is not the title separator line:
(?!--------\R)
Skip the mandatory title separator line:
--------\R
Capture the paragraph text, as a repeating group of lines:
((?:xxx)+)
A line has an optional leading line break (first line doesn't have one):
\R?.*
But make sure the line is not the title of the next paragraph, i.e. it's not a line followed by the title separator line.
(?!.*\R--------\R)
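The pieces above can be put together and tried in Python as a sketch. Python's re module does not support \R, so this version uses the \r?\n alternative mentioned in the explanation and writes the separator as -{8}. The variable names (pattern, text, sections) are just for this example.

```python
import re

pattern = re.compile(r"""
    (?!-{8}\r?\n)            # the title must not be the separator line itself
    (.*)\r?\n                # group 1: the title line
    -{8}\r?\n                # the mandatory separator line
    (                        # group 2: the paragraph text
      (?:
        (?:\r?\n)?           # optional leading line break, absent on the first line
        (?!.*\r?\n-{8}\r?\n) # stop when this line is the title of the next paragraph
        .*
      )+
    )
""", re.VERBOSE)

text = "\n".join([
    "paraTitle-1", "--------",
    "Lines and words", "empty....", "more lines", "still part of paraTitle-1",
    "paraTitle-2", "--------",
    "Lines and words", "empty....", "more lines", "still part of paraTitle-2",
])

# findall returns (title, body) pairs, which map naturally onto a dict.
sections = dict(pattern.findall(text))
for title, body in sections.items():
    print(title, "->", repr(body))
```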
I use the following expression in Notepad++ to delete duplicate lines:
^(.*)(\r?\n\1)+$
The problems are:
It only works for single-word lines; if a line contains a space, it won't work.
It is only for consecutive duplicate lines.
Is there a solution (preferably regular expression or macro) to delete duplicate lines in a text that contains space, and that are nonconsecutive?
Since no one is interested, I will post what I think you need.
delete duplicate lines in a text that contains space, and that are nonconsecutive
I assume your text contains duplicate lines such as My Line One and some text and My Line Two and more text:
My Line One and some text
My Line One and some text
My Line Two and more text
My Line One and some text
My Line Two and more text
These duplicate lines are not all consecutive (only the first two).
So, you can remove duplicate lines by running this search and replace:
^(.+)\r?\n(?=[\s\S]*?^\1$)
Replace with empty string.
Regex note: in Notepad++, ^ and $ are treated as line start/end anchors by default, so ^(.+) captures one whole non-empty line. Then \r?\n matches the line break that follows it (LF or CRLF). The look-ahead (?=...) checks that somewhere later in the text (skipped with [\s\S]*?) there is a line with exactly the same contents (^\1$, where \1 is a backreference to the captured line). Because a line is removed only when an identical line appears after it, the last occurrence of each line is the one that remains.
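To see the effect on the sample text, here is a small Python sketch of the same search and replace. It assumes re.MULTILINE as the equivalent of the editor's line-anchored ^ and $; the function name drop_earlier_duplicates is made up for illustration.

```python
import re

def drop_earlier_duplicates(text: str) -> str:
    # Remove a line (and its line break) when an identical full line appears later.
    # re.MULTILINE makes ^ and $ match at line boundaries.
    return re.sub(r"^(.+)\r?\n(?=[\s\S]*?^\1$)", "", text, flags=re.MULTILINE)

sample = "\n".join([
    "My Line One and some text",
    "My Line One and some text",
    "My Line Two and more text",
    "My Line One and some text",
    "My Line Two and more text",
])

deduped = drop_earlier_duplicates(sample)
print(deduped)
```

The look-ahead rescans the rest of the text for every line, so on very large files this can be slow.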
I exported some data in CSV format that has some line breaks within text fields. I can't get Excel to handle this correctly. I'd like to just edit the file to remove these line breaks.
A valid record begins with a number. I've tried putting \n^([^\d]) in the "Find what" box of Notepad++'s find/replace, to match any line beginning with a non-number and the preceding newline. It matches correctly. In the "replace with" box, I put a space followed by \1 to replace the newline with a space and leave the matched character. However, the replace isn't working at all, nothing gets changed.
What am I doing wrong?
Sample text:
123,0,1,"This is a single line comment","bob","jim"
124,0,1,"This is a multi line comment w/ newline.
This is the second line of the comment","ted","alfred"
125,0,1,"This is another single line comment","jim","bob"
I want to replace the newline just before "This is the second..." with a space so that the file looks like this:
123,0,1,"This is a single line comment","bob","jim"
124,0,1,"This is a multi line comment w/ newline. This is the second line of the comment","ted","alfred"
125,0,1,"This is another single line comment","jim","bob"
I figured it out. I used (\n)^(?!\d+,\d+) to match any newline followed by the beginning of a line that's not followed by at least one number, a comma, and at least one number. In "replace with" I just put a space.
\n works in Notepad++ if the file's line endings are set to Unix (LF).
If the file uses Windows line endings (CR LF), then \r\n should work, or you can convert the line endings to Unix (from the bottom bar).
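The same fix can be sketched in Python, using \r?\n so that both LF and CRLF files are handled. The function name join_broken_records is hypothetical, and the look-ahead is the one from the answer above (a valid record starts with digits, a comma, and more digits).

```python
import re

def join_broken_records(text: str) -> str:
    # Replace a line break with a space unless the next line starts like a record
    # (digits, comma, digits). \r?\n covers both LF and CRLF files.
    return re.sub(r"\r?\n(?!\d+,\d+)", " ", text)

csv_text = "\n".join([
    '123,0,1,"This is a single line comment","bob","jim"',
    '124,0,1,"This is a multi line comment w/ newline.',
    'This is the second line of the comment","ted","alfred"',
    '125,0,1,"This is another single line comment","jim","bob"',
])

fixed = join_broken_records(csv_text)
print(fixed)
```

Note that a line break at the very end of the file is not followed by a record either, so it would also become a space.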
I have a text file in which any line that starts with a single word and has no other characters after that should be enclosed inside caret characters.
For example, a line that contains only the following 6 characters (plus the newline):
France
should be replaced with a line that consists of only the following 8 characters (plus the newline):
^France^
Is there a regular expression I could use in the Find/Replace feature of my text editor (jEdit) to make these modifications to the file?
Regex to find lines with a single word:
^(\w+)$
replace with:
^$1^
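For comparison, here is a minimal Python sketch of the same replacement. It assumes re.MULTILINE so that ^ and $ match per line, and it writes the group reference as \1 where the replacement above uses $1; the function name wrap_single_words is made up for this example.

```python
import re

def wrap_single_words(text: str) -> str:
    # ^(\w+)$ with re.MULTILINE matches lines that consist of a single word only.
    # In the replacement string the carets are literal characters.
    return re.sub(r"^(\w+)$", r"^\1^", text, flags=re.MULTILINE)

sample = "France\nUnited States\nSpain"
wrapped = wrap_single_words(sample)
print(wrapped)
```

Lines that contain a space, such as "United States", are left unchanged because \w does not match whitespace.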