I am trying to delete only the first 2 lines of a text file.
I tried using \A.*, but this gets the first line and deletes the rest.
Is there a way to do the inverse?
It may not be the most convenient way, but it is possible with a regex:
^.*\n.*\n([\s\S]*)$
With default settings (neither the single-line nor the multi-line modifier), '.' matches every character except newline. Therefore, .*\n matches one line, including its newline character. Repeat it twice, and we are at the beginning of the third line. Now capture all characters, including newlines ([\s\S] is a handy workaround for '.' not matching them), up to the end of the file, $.
Then substitute with the first capturing group
\1
and you have everything but the first 2 lines.
How you specify the substitution string depends on your regex engine. Also, depending on the platform or the newline convention used in the file, you might need to replace the \n with \r\n or \r, or with the pattern that matches them all, (\r\n?|\n).
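As an illustration only, here is a minimal Python sketch of the same substitution using the standard re module. The function name drop_first_two_lines is hypothetical, and the second pattern is a variant not taken from the answer: it uses [^\r\n]* for a line and (?:\r\n?|\n) for the break, so CRLF and CR-only files also work.

```python
import re

# The answer's pattern: skip two lines, capture everything after them in group 1.
LF_PATTERN = re.compile(r"^.*\n.*\n([\s\S]*)$")

# Hypothetical variant for mixed line endings: a "line" is any run of
# non-CR/LF characters, followed by CRLF, CR, or LF.
ANY_EOL_PATTERN = re.compile(r"^[^\r\n]*(?:\r\n?|\n)[^\r\n]*(?:\r\n?|\n)([\s\S]*)$")


def drop_first_two_lines(text, pattern=LF_PATTERN):
    # Replace the whole match with group 1. If the text has fewer than two
    # line breaks, nothing matches and the text is returned unchanged.
    return pattern.sub(r"\1", text, count=1)


print(repr(drop_first_two_lines("line 1\nline 2\nline 3\nline 4\n")))  # 'line 3\nline 4\n'
print(repr(drop_first_two_lines("a\r\nb\r\nc\r\n", ANY_EOL_PATTERN)))  # 'c\r\n'
```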
I need a regular expression to find a specific line in a file that occurs somewhere after another line. For example, I may want to find the string "friend", but only when it occurs on a line after a line containing the string "hello". So, for example:
hello there
how are you
my friend
should pass, but
how are you
my friend
hello
or
hello friend
how are you
should not pass.
The only thing I've thought of is something like hello[.\s]*\n[.\s]*friend, which does not work.
EDIT: I'm using a customized program that has a lot of limitations. I don't have access to switches or custom modes. I need a single regular expression that works in the standard Python regex mode.
hello[.\s]*\n[.\s]*friend
First, note that a dot inside a character class matches a literal dot, not "any character", so you would really want alternation rather than a character class here. But also note that a "match all" dot will also match spaces, so you don't even need alternation.
So overall, you really just need this:
hello.*?friend
Now comes the problem of matching across newline characters. By default, the "match all" dot does not match newline characters. You can set a flag/modifier to make it match them, but how you do that depends on the language you are using. In PHP or Perl, you can use the s modifier, e.g.
PHP:
preg_match('~hello.*?friend~s',$content);
edit:
If you are trying to use regex in something like an editor (or otherwise can't add flags/modifiers), most editors have an option to flag it as such. If not, you can try alternation with newline chars like so:
hello(.|\r?\n)*friend
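In Python, which the question targets, the s modifier corresponds to the re.S (re.DOTALL) flag. The sketch below runs both patterns from this answer against the three example inputs from the question; the dictionary keys are just labels. Note that, as written, both patterns also accept the third example, where "friend" is on the same line as "hello".

```python
import re

examples = {
    "should pass": "hello there\nhow are you\nmy friend",
    "should fail (hello last)": "how are you\nmy friend\nhello",
    "should fail (same line)": "hello friend\nhow are you",
}

# Python equivalent of PHP's ~hello.*?friend~s: re.S lets "." match "\n" too.
dotall = re.compile(r"hello.*?friend", re.S)
# The flag-free alternation form: (.|\r?\n) also crosses line breaks.
alternation = re.compile(r"hello(.|\r?\n)*friend")

for name, text in examples.items():
    print(name, bool(dotall.search(text)), bool(alternation.search(text)))
```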
You need to include two newline characters.
hello(?:.*\n)+.*friend
This expects at least one newline character in between.
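A quick check of this pattern with Python's re module against the question's three examples. The helper name friend_after_hello is hypothetical. No flags are needed, because (?:.*\n)+ consumes whole lines and forces "friend" onto a later line than "hello". Since Python's "." matches "\r", CRLF input also works.

```python
import re

# One or more "rest of line + newline" chunks between hello and friend,
# so friend can never be on the same line as hello.
PATTERN = re.compile(r"hello(?:.*\n)+.*friend")


def friend_after_hello(text):
    return PATTERN.search(text) is not None


print(friend_after_hello("hello there\nhow are you\nmy friend"))  # True
print(friend_after_hello("how are you\nmy friend\nhello"))        # False
print(friend_after_hello("hello friend\nhow are you"))            # False
```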
I'm by no means a regex expert (particularly not in Python), but my RegexBuddy app thinks this will work:
(?s)hello.*\n+.*friend
The (?s) is apparently an inline way of specifying the "Dot matches newline" option, which is needed so that the .* parts can run across the intervening lines.
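To see what the inline (?s) changes, here is a small Python sketch using the standard re module on the question's examples. The last line shows that, without (?s), the pattern only finds "friend" on the line immediately following "hello" (or after blank lines).

```python
import re

# (?s) enables DOTALL inline, so .* may span lines; \n+ still requires at
# least one line break between "hello" and "friend".
DOTALL_PATTERN = re.compile(r"(?s)hello.*\n+.*friend")

for text in ("hello there\nhow are you\nmy friend",
             "how are you\nmy friend\nhello",
             "hello friend\nhow are you"):
    print(bool(DOTALL_PATTERN.search(text)))  # True, False, False

# Same pattern without (?s): "friend" two lines down is no longer found.
print(bool(re.search(r"hello.*\n+.*friend", "hello there\nhow are you\nmy friend")))  # False
```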
I thought that the dot . in regex will match any character, except the end-of-line character.
However, in R, I found that the dot can match anything, including the newline characters \n, \r or \r\n:
grep(c("\r","\n","\r\n"),pattern=".")
[1] 1 2 3
Can someone explain the contradiction?
The page here http://www.regular-expressions.info/dot.html explains how the rule that dot does not match the end-of-line character exists mostly for historic reasons:
The first tools that used regular expressions were line-based. They would read a file line by line, and apply the regular expression separately to each line. The effect is that with these tools, the string could never contain line breaks, so the dot could never match them.
However,
Modern tools and languages can apply regular expressions to very large strings or even entire files. Except for JavaScript and VBScript, all regex flavors discussed here have an option to make the dot match all characters, including line breaks.
Apparently, R is one such language where by default, dot will match every character. (I point you to Joshua's comment above, recommending you look at ?regex and the POSIX 1003.2 standard.)
The page I linked above also mentions Perl and suggests how under its default mode, dot will not match line breaks.
Notice how R's grep function has a perl option. If you turn it on, you do get a different output:
> grep(".", c("\r","\n","\r\n"), perl = TRUE)
[1] 1 3
This tells me that \n is the line break character but \r is not, which comparing cat("\r") and cat("\n") can confirm.
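For comparison, Python's re module behaves like R's perl = TRUE here. By default "." matches everything except "\n", and a lone "\r" is an ordinary character. The re.DOTALL flag makes "." match "\n" as well, which gives the same result as R's default grep(). This sketch reports 1-based indices to mirror R's output.

```python
import re

samples = ["\r", "\n", "\r\n"]

# Default: "." excludes only "\n", so "\r" and "\r\n" (via its "\r") match.
default = [i + 1 for i, s in enumerate(samples) if re.search(".", s)]
# DOTALL: "." matches every character, including "\n".
dotall = [i + 1 for i, s in enumerate(samples) if re.search(".", s, re.DOTALL)]

print(default)  # [1, 3]
print(dotall)   # [1, 2, 3]
```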
(I'm on a Mac OS if it makes any difference.)
I am sure this has been asked before, but I cannot find it.
Basically, assuming you are parsing a text file of unknown origin and want to replace line breaks with some other delimiter, is this the best regex, or is there another?
(\r\n)|(\n)|(\r)
Fletcher - this did get asked once before.
Here you go: Regular Expression to match cross platform newline characters
Spoiler Alert!
The regex I use when I want to be precise is "\r\n?|\n".
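Applied to the original question (replacing line breaks with another delimiter), a minimal Python sketch might look like this. The function name replace_line_breaks is hypothetical. Because \r\n? is tried first, a CRLF pair counts as one break, and a lone CR (classic Mac) or lone LF (Unix) is also handled.

```python
import re

# CRLF (or a lone CR) first, then a lone LF.
LINE_BREAK = re.compile(r"\r\n?|\n")


def replace_line_breaks(text, delimiter):
    return LINE_BREAK.sub(delimiter, text)


mixed = "windows\r\nunix\nold mac\rend"
print(replace_line_breaks(mixed, " | "))  # windows | unix | old mac | end
```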
Do check whether your regex engine supports the \R shorthand; if it does, you will not need to be concerned with the various Unicode newline/linefeed combinations. If implemented correctly, \R matches all the various ASCII or Unicode line endings transparently.
In Unicode you need to detect NEL (OS/390 line ending, \x85), LS (Line Separator, \x2028) and PS (Paragraph Separator, \x2029) if you want to be completely cross-platform these days.
It is debatable whether LS, NEL, and PS should be treated as line breaks, line endings, or white space. The XML 1.0 standard, for example, does not recognize NEL as a line break character. ECMAScript treats LS and PS as line breaks but NEL as whitespace. Perl Unicode regexes will treat VT, FF, CR, CRLF, NEL, LS and PS as line breaks for the purpose of the ^ and $ regex metacharacters.
The Unicode Implementation Guide (section 5.8 and table 5.3) is probably the best bet for a definitive treatment of what a "newline" is.
If you are only concerned with ASCII and the DOS/Windows/Unix/classic Mac variants, the regex equivalent to \R is (?>\r\n|[\r\n]).
In Unicode, the equivalent to \R is (?>\r\n|\n|\x0b|\f|\r|\x85|\x2028|\x2029). The \x0b in there is a vertical tab; once again, this may or may not fit your definition of what a line break is, but it does match the recommendation of the Unicode Implementation. (FF, or \x0C, is not included in the regex since a Form Feed is a new page, not a new line in the definition.)
The regex to find any Unicode line terminator should be (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}]) rather than as drewk wrote it, at least in Perl. It is taken directly from the Perl 5.10.0 documentation (it was removed in later versions). Note the braces after \x: U+2029 is \x{2029}, but \x2029 is an ASCII space (\x20, U+0020) followed by the digits 2 and 9. Also, \n outside a character class is not guaranteed to match \x{0a}.
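Python's built-in re module has no \R, so here is a sketch of how the Perl 5.10 pattern might be translated for it. Two assumptions to note: Python spells code points above 0xFF as \uXXXX rather than \x{XXXX}, and the atomic group (?>...) is dropped because re only supports it from Python 3.11 on. Used on its own like this, the greedy \x0A? still consumes both characters of a CRLF pair; inside a larger pattern, backtracking could in principle split it, which is what the atomic group guards against.

```python
import re

# CR optionally followed by LF, or any single LF, VT, FF, NEL, LS, PS.
UNICODE_EOL = re.compile(r"\x0D\x0A?|[\x0A-\x0C\x85\u2028\u2029]")

text = "a\r\nb\nc\x0bd\x0ce\x85f\u2028g\u2029h\ri"
print(UNICODE_EOL.split(text))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```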
If your platform does not support the \R class as suggested by @dawg above, you may still be able to make a pretty elegant and robust solution if your platform supports negative lookaround or character class subtraction (e.g. in Java, class subtraction uses the syntax [x&&[^y]]).
In most regular expression grammars, the dot character is defined to mean "any character except the newline character" (see, for example, for JavaScript, here). If you match something with both of the following characteristics, it must be a newline:
not (any character except the newline character) → the newline character; and
is whitespace
Since I'm currently working in JavaScript, which AFAIK doesn't have the \R shorthand or character class subtraction, I can still use negative lookahead to get what I want. The following regular expression matches all newlines:
/((?!.)\s)+/g
And the following JavaScript code, at least when run in Chrome 42.0.2311.90m on Windows 7, wipes out all the kinds of newlines that JavaScript (i.e. the "ECMAScript" mentioned in @dawg's third paragraph) recognizes:
var input = "hello\r\n\f\v\u2028\u2029 world";
var output = input.replace(/((?!.)\s)+/g, "");
// \r, \n, \u2028 and \u2029 are removed; \f, \v and the space match "." and are kept
console.log(JSON.stringify(output)); // "hello\f\u000b world"
Just replace /[\r\n]+/g with an empty string "".
It'll replace all \r and \n no matter what order they appear in the string.
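The same replacement in Python's re module, as a small sketch. Because [\r\n]+ consumes any run of CR and LF characters, blank lines disappear along with the line breaks, and a trailing break is removed too.

```python
import re

text = "first\r\n\r\nsecond\nthird\r"

# Any run of CR/LF characters, in any order, is replaced in one go.
print(re.sub(r"[\r\n]+", "", text))         # firstsecondthird
print(repr(re.sub(r"[\r\n]+", " ", text)))  # 'first second third '
```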
How can I find/replace all CR/LF characters in Notepad++?
I am looking for something equivalent to the ^p special character in Microsoft Word.
[\r\n]+ should work too
Update March 26th, 2012, release date of Notepad++ 6.0:
OMG, it actually does work now!!!
Original answer 2008 (Notepad++ 4.x) - 2009-2010-2011 (Notepad++ 5.x)
Actually no, it does not seem to work with regexp...
But if you have Notepad++ 5.x, you can use the 'extended' search mode and look for \r\n. That does find all your CRLF.
(I realize this is the same answer as the others, but again, 'extended mode' is only available with Notepad++ 4.9, 5.x and later.)
Since April 2009, you have a wiki article on the Notepad++ site on this topic:
"How To Replace Line Ends, thus changing the line layout".
(mentioned by georgiecasey in his/her answer below)
Some relevant extracts include the following search processes:
Simple search (Ctrl+F), Search Mode = Normal
You can select an EOL in the editing window.
Just move the cursor to the end of the line, and type Shift+Right Arrow.
or, to select EOL with the mouse, start just at the line end and drag to the start of the next line; dragging to the right of the EOL won't work.
You can manually copy the EOL and paste it into the field for Unix files (LF-only).
Simple search (Ctrl+F), Search Mode = Extended
The "Extended" option shows \n and \r as characters that could be matched.
As with the Normal search mode, Notepad++ is looking for the exact character.
Searching for \r in a UNIX-format file will not find anything, but searching for \n will. Similarly, a Macintosh-format file will contain \r but not \n.
Simple search (Ctrl+F), Search Mode = Regular expression
Regular expressions use the characters ^ and $ to anchor the match string to the beginning or end of the line. For instance, searching for return;$ will find occurrences of "return;" that occur with no subsequent text on that same line. The anchor characters work identically in all file formats.
The '.' dot metacharacter does not match line endings.
[Tested in Notepad++ 5.8.5]: a regular expression search with an explicit \r or \n does not work (contrary to the Scintilla documentation).
Neither does a search on an explicit (pasted) LF, or on the (invisible) EOL characters placed in the field when an EOL is selected.
Advanced search (Ctrl+R) without regexp
Ctrl+M will insert something that matches newlines. They will be replaced by the replace string.
I recommend this method as the most reliable, unless you really need to use regex.
As an example, to remove every second newline in a double spaced file, enter Ctrl+M twice in the search string box, and once in the replace string box.
Advanced search (Ctrl+R) with Regexp.
Neither Ctrl+M, $ nor \r\n are matched.
The same wiki also mentions the Hex editor alternative:
Type the new string at the beginning of the document.
Then select to view the document in Hex mode.
Select one of the new lines and hit Ctrl+H.
While you have the Replace dialog box up, select the new replacement string in the document behind it, copy it with Ctrl+C, and paste it into the Replace with text input.
Then Replace or Replace All as you wish.
Note: the character selected for new line usually appears as 0a.
It may have a different value if the file is in Windows Format. In that case you can always go to Edit -> EOL Conversion -> Convert to Unix Format, and after the replacement switch it back and Edit -> EOL Conversion -> Convert to Windows Format.
It appears that this is a FAQ, and the resolution offered is:
Simple search (Ctrl+H) without regexp
You can turn on View/Show End of Line or View/Show All, and select the now visible newline characters. Then when you start the command, some characters matching the newline character will be pasted into the search field. Matches will be replaced by the replace string, unlike in regex mode.
Note 1: If you select them with the mouse, start just before them and drag to the start of the next line. Dragging to the end of the line won't work.
Note 2: You can't copy and paste them into the field yourself.
Advanced search (Ctrl+R) without regexp
Ctrl+M will insert something that matches newlines. They will be replaced by the replace string.
On the Replace dialog, you want to set the search mode to "Extended". Normal or Regular Expression modes won't work.
Then just find "\r\n" (or just \n for unix files or just \r for mac format files), and set the replace to whatever you want.
I've not had much luck with \r\n regular expressions from the find/replace window.
However, this works in Notepad++ v4.1.2:
Use the "View | Show end of line" menu to enable display of end of line characters.
(Carriage return line feeds should show up as a single shaded CRLF 'character'.)
Select one of the CRLF 'characters' (put the cursor just in front of one, hold down the SHIFT key, and then press the RIGHT CURSOR key once).
Copy the CRLF character to the clipboard.
Make sure that you don't have the find or find/replace dialog open.
Open the find/replace dialog.
The 'Find what' field shows the contents of the clipboard: in this case the CRLF character - which shows up as 2 'box characters' (presumably it's an unprintable character?)
Ensure that the 'Regular expression' option is OFF.
Now you should be able to count, find, or replace as desired.
The way I found it to work is by using the Replace function, and using "\n", with the "Extended" mode. I'm using version 5.8.5.
In 2013, v6.13 or later, use:
Menu Edit → EOL Conversion → Windows Format.
To find any kind of a line break sequence use the following regex construct:
\R
To find and select consecutive line break sequences, add + after \R: \R+.
Make sure you turn on Regular expression mode.
It matches:
U+000D U+000A - CRLF sequence
U+000A - LINE FEED, LF
U+000B - LINE TABULATION, VT
U+000C - FORM FEED, FF
U+000D - CARRIAGE RETURN, CR
U+0085 - NEXT LINE, NEL
U+2028 - LINE SEPARATOR
U+2029 - PARAGRAPH SEPARATOR
Assuming it has a "regular expressions" search, look for \r\n. I prefer \r?\n, because some files don't use carriage returns.
EDIT: Thanks for the feedback, whoever voted this down. I have learned that... well, nothing, because you provided no feedback. Why is this wrong?
Use the advanced search option (Ctrl + R) and use the keyboard shortcut for CRLF (Ctrl + M) to insert a carriage return.
If you need to do a complex regexp replacement including \r\n, you can workaround the limitation by a three-step approach:
Replace all \r\n by a tag, let's say #GO# → Check 'Extended', replace \r\n by #GO#
Perform your regexp, for example removing multi-line ICON="*" attributes from an HTML bookmarks file → Check regexp, replace ICON=.[^"]+.> by >
Put back \r\n → Check 'Extended', replace #GO# by \r\n
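The three steps can be mirrored in Python as a sketch, assuming a hypothetical bookmark fragment whose ICON attribute spans a CRLF. In Python, [^"]+ would match CR and LF anyway, so the placeholder steps are only there to reproduce the Notepad++ procedure; the placeholder #GO# must not already occur in the text.

```python
import re

# Hypothetical bookmarks fragment with an ICON value broken across a CRLF.
html = '<A HREF="x" ICON="data:abc\r\ndef">Link</A>\r\n'

# Step 1 (Extended mode): protect every CRLF with a placeholder tag.
step1 = html.replace("\r\n", "#GO#")
# Step 2 (regexp mode): the attribute is now on a single line.
step2 = re.sub(r'ICON=.[^"]+.>', ">", step1)
# Step 3 (Extended mode): restore the CRLF pairs.
result = step2.replace("#GO#", "\r\n")

print(repr(result))  # '<A HREF="x" >Link</A>\r\n'
```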
Go to View --> Show Symbol --> Show All Characters
// It worked for me
Make this setting. Menu-> View-> Show Symbol-> uncheck Show End of the Line
I opened the file in Notepad++ and did a replacement in a few steps:
Replace all "\r\n" with " \r\n"
Replace all "; \r\n" with "\r\n"
Replace all " \r\n" with " "
This puts all the breaks where they should be and removes those that are breaking up the file.
It worked for me.
I was totally unable to do this in NP v6.9.
I found it easy enough on Msoft Word (2K).
Open the doc, go to edit->replace.
Then in the bottom of the search box, click "more" then find the "Special" button and they have several things for you. For Dos style, I used the "paragraph" one. This is a cr lf pair in windows land.
Just find \r and replace it with a blank replace field, so everything goes up onto one line. Then do a find on the delimiter (in my case a semicolon) and replace it with ;\n
:)
-T&C
To change a document of separate lines into a single line, with each line forming one entry in a comma separated list:
ctrl+f to open the search/replacer.
Click the "Replace" tab.
Fill the "Find what" entry with "\r\n".
Fill the "Replace with" entry with "," or ", " (depending on preference).
Un-check the "Match whole word" checkbox (the important bit that eludes logic).
Check the "Extended" radio button.
Click the "Replace all" button.
These steps turn e.g.
foo bar
bar baz
baz foo
into:
foo bar,bar baz,baz foo
or: (depending on preference)
foo bar, bar baz, baz foo
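For a scripted version of the same conversion, a small Python sketch: str.replace mirrors the Extended-mode Replace All of \r\n, while str.splitlines() also copes with LF-only or CR-only files.

```python
text = "foo bar\r\nbar baz\r\nbaz foo"

# Same effect as replacing \r\n with "," in Extended mode.
print(text.replace("\r\n", ","))     # foo bar,bar baz,baz foo
# splitlines() splits on CRLF, LF or CR, then join with ", ".
print(", ".join(text.splitlines()))  # foo bar, bar baz, baz foo
```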
Maybe you can use the TextFX plugin.
In TextFX, go to TextFX Edit → Delete Blank Lines.