I have multiple HTML files, and some of them contain runs of blank lines. I need a regex that collapses every run of two or more blank lines into a single blank line, and leaves alone lines that are already a single blank line or that contain text.
It also needs to treat lines that are not totally empty as blank, since some lines contain only spaces or tabs (characters that don't show). Those lines should be removed by the regex too, as long as they are part of a run of more than one blank line.
Search for
^([ \t]*)\r?\n\s+$
and replace with
\1
Explanation:
^ # Start of line
([ \t]*) # Match any number of spaces or tabs, capture them in group 1
\r?\n # Match one linebreak
\s+ # Match any following whitespace
$ # until the last possible end of line.
\1 will then contain the first line of whitespace characters, so when you use that as the replacement string, only the first line of whitespace will be preserved (excluding the linebreak at the end).
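For anyone who wants to check this outside the editor, here is a minimal Python sketch of the same search and replace. It assumes LF (\n) line endings and Python's re module with re.MULTILINE, which is not the engine Notepad++ uses, so treat it as an illustration rather than a guarantee of identical behaviour. The function name collapse_blank_lines is made up for the example.

```python
import re

# Same pattern as in the answer; MULTILINE makes ^ and $ work per line.
BLANK_RUN = re.compile(r"^([ \t]*)\r?\n\s+$", re.MULTILINE)

def collapse_blank_lines(text):
    # Keep only the whitespace of the first blank line (group 1); the line
    # break that follows the match is left in place, so one blank line survives.
    return BLANK_RUN.sub(r"\1", text)

if __name__ == "__main__":
    print(repr(collapse_blank_lines("a\n\n\n\nb\n")))
```

A single blank line between two lines of text is not matched, because \s+ needs at least one more whitespace character after the first line break.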
This worked for me in Notepad++ v6.5.1 (Unicode) on Windows 7:
Search for: ^[ \t]*\r\n
Replace with: nothing, leave blank
Search mode: Regular expression.
Search for (\r?\n(\t| )*){3,}, replace with \r\n\r\n, and check "Regular expression" and ". matches newline".
Tested with Notepad++ 6.2
This replaces successive blank lines, whether or not they contain whitespace, with one new line.
Search for
(\s*\r?\n){3,}
replace with
\r\n
Work out for yourself which replacement you need:
\n\n, \n\r\n, \r\n\r\n, etc. You can also adapt the regular expression ^([ \t]*)\r?\n\s+$ to your needs.
I tested all of the suggestions above, and each one deleted either too little or too much: either no blank line was left where there had been at least one before, or not enough was deleted (whitespace was left behind, etc.). Unfortunately I cannot write comments yet. I tested with 6.1.5, then updated to 6.2 and tested again. Depending on how many files there are, I would suggest using
Edit->Blank Operations->Trim trailing whitespace
Followed by Ctrl+A and
TextFX -> TextFX Edit -> Delete surplus blank lines
A macro I tried to record didn't work. There's even a built-in macro just for removing trailing whitespace (Alt+Shift+S, see Settings | Shortcut Mapper... | Macros). There's also
Edit->Blank Operations->Remove unnecessary EOL and whitespace
but that deletes every EOL and puts everything in a single line.
In Notepad++ v8.4.7 there are the options:
Edit > Line Operations > Remove Empty Lines (Containing Blank characters)
or
Edit > Line Operations > Remove Empty Lines
So there is no need to use a regular expression for this. But this only works for one file at a time.
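If the goal is to cover many files at once without the editor, a small script is one option. The following is a rough Python sketch, not something taken from the answers above: the helper names and the *.html glob are assumptions, it assumes the files are UTF-8, and it uses its own pattern (keep the first blank line of a run, drop the rest) rather than any of the Notepad++ expressions. Try it on a copy of the folder first.

```python
import re
from pathlib import Path

# A run of two or more blank (or whitespace-only) lines; group 1 is the first.
BLANK_RUN = re.compile(r"^([ \t]*\r?\n)(?:[ \t]*\r?\n)+", re.MULTILINE)

def collapse_blank_runs(text):
    # Keep the first blank line of each run exactly as it was, drop the rest.
    return BLANK_RUN.sub(r"\1", text)

def clean_html_files(folder):
    """Rewrite every *.html file in folder; return the names that changed."""
    changed = []
    for path in sorted(Path(folder).glob("*.html")):
        # Bytes in, bytes out, so existing line endings are not translated.
        original = path.read_bytes().decode("utf-8")
        cleaned = collapse_blank_runs(original)
        if cleaned != original:
            path.write_bytes(cleaned.encode("utf-8"))
            changed.append(path.name)
    return changed
```

Calling clean_html_files("some_folder") on a hypothetical folder name would touch only the files that actually contain a run of blank lines.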
I searched for ^\r\n and clicked "Replace All" with the "Replace with" text box left empty.
Related
I try to verify a CSV file where we had problems with line breaks.
I want to find all lines not starting with a ".
I am trying with /!^"/gim but the ! negation is not working.
How can I negate /^"/gim properly?
In regex, the ! does not mean negation; instead, you want to negate a character set with [^"]. The brackets, [], denote a character set and if it starts with a ^ that means "not this character set".
So, if you wanted to match things that are not double-quotes, you would use [^"]; if you don't want to match any quotes, you could use [^"'], etc.
With Notepad++, you should be able to search with the following to find lines that don't start with the " character:
^[^"]
If you want to highlight the full line, use:
^[^"].*
In Notepad++ you can use the very useful negative lookahead.
In your case you can try the following:
^(?!")
If you want to match whole lines, add .+ or .{1,7} or a similar quantifier. Note that, for example,
^(?!").*
will also match empty lines.
Explanation part
^ line start
(?!regexp) negative lookahead: if regexp matches at this position, the overall match fails there, so the line is not reported
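To see the practical difference between the character-class pattern ^[^"].* and the lookahead pattern ^(?!").*, here is a small Python sketch. It tests each line separately with re.match, so it sidesteps the question of whether [^"] can match a line break in a given engine. The sample CSV text and the helper name matching_lines are invented for the illustration.

```python
import re

CHAR_CLASS = re.compile(r'^[^"].*')   # needs at least one non-quote character
LOOKAHEAD = re.compile(r'^(?!").*')   # only requires that no quote comes first

def matching_lines(pattern, text):
    # Check one line at a time so a pattern can never run across a line break.
    return [line for line in text.splitlines() if pattern.match(line)]

if __name__ == "__main__":
    sample = '"id","name"\n1,"broken\n\ncontinued"\n"2","ok"'
    print(matching_lines(CHAR_CLASS, sample))
    print(matching_lines(LOOKAHEAD, sample))
```

The only difference on this sample is the empty line: the lookahead version reports it, the character-class version does not.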
Step 1 - Match lines. Find dialog > Mark tab, you can bookmark lines that match.
Step 2 - Remove lines bookmarked OR Remove lines not bookmarked. Search > Bookmark > Remove Unmarked Lines or Remove Bookmarked lines
I recently received a tab separated file that has 60 fields. Each field can have any character in it. The export I received also has linefeeds and carriage returns in some of the fields. This is causing the tab separated file to not import correctly. Is there a way to remove linebreaks and carriage returns if the line does not have 59 tabs on it? There may or may not be data between each tab.
Sample File
Line 3,4,5 is the issue I'm trying to fix.
Warning: I'm assuming that there are no tabs within a column's data. If there are, then you need something far more capable than what I have here.
The following works with the sample input provided:
First, replace all of the line breaks with a character that doesn't occur anywhere in your file. You can even use characters that you can't type with your keyboard.
Find what: (\r\n?|\n)
Replace with: \xB6
Then, match your 60-field rows and give them line-breaks (I'm going with Windows-style):
Find what: ^(([^\t]*\t){59}[^\t\xB6]*)\xB6
Replace with: $1\r\n
I'm making one huge assumption here: that column 60 never contains a line break. If this is false, then you're going to have some of column 60's data ending up in column 1 of the next record.
Now, if you don't like that paragraph symbol showing up in your data, you can either purge it or replace it with whatever you like:
Find what: \xB6
Replace with:
Explanation of matching patterns:
(\r\n?|\n) matches any of the three kinds of line breaks, which are a single \r, a single \n, or the Windows-style \r\n. Wikipedia has a whole article about this.
See http://regex101.com/r/iB6fK9 to explore the ^(([^\t]*\t){59}[^\t\xB6]*)\xB6 pattern.
I'm matching the beginning of the line with ^ at the start.
I have a group of zero or more characters that are not a tab, followed by a tab, that I match exactly 59 times with ([^\t]*\t){59}. That gets us the first 59 tab-separated columns. Only column 59 is captured by this group.
For column 60, I match zero or more characters that are neither a tab nor our special character with [^\t\xB6]*.
I capture the 60 columns with parentheses, but I leave our special character outside of the captured group so that it gets replaced with the \r\n that we insert with the $1\r\n replacement.
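The three steps can also be sketched in Python, under the same assumptions as above (no tabs inside a column, no line break inside the last column). The function name repair_tsv and its parameters are hypothetical. One deliberate difference: the ^ anchor is dropped, because Python's re.sub scans the flattened text once and there are no line starts left in it after step 1, so each match simply begins where the previous one ended.

```python
import re

def repair_tsv(text, fields=60, marker="\u00b6", newline="\r\n"):
    if marker in text:
        raise ValueError("marker already occurs in the data; pick another one")
    # Step 1: turn every kind of line break into the marker character.
    flat = re.sub(r"\r\n?|\n", marker, text)
    # Step 2: a full record is (fields - 1) tab-terminated columns plus a last
    # column without tabs or markers; the marker after it was a real line end.
    m = re.escape(marker)
    record = re.compile(r"((?:[^\t]*\t){%d}[^\t%s]*)%s" % (fields - 1, m, m))
    rebuilt = record.sub(lambda match: match.group(1) + newline, flat)
    # Step 3: drop the markers that are left inside fields.
    return rebuilt.replace(marker, "")

if __name__ == "__main__":
    broken = "a\tb\tc\nd\te1\ne2\tf\n"   # 3 columns; row 2 is split in column 2
    print(repr(repair_tsv(broken, fields=3)))
```

As in the editor version, step 3 deletes the stray breaks outright, so pass the result through a different replacement if the broken pieces should be joined with a space instead.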
What I understand from your question is that you want to remove the Windows \r\n from your file. To do this you can use the Replace dialog (Ctrl+H).
Under Search Mode select Extended (\n, \r, ...), then enter \r\n in "Find what" and leave "Replace with" empty (or replace it with whatever you want).
I'd do:
Find what: ^((?:[^\t]*\t[^\t]*){1,58})[\r\n]+
Replace with: $1
This will replace a line break with nothing if there are fewer than 59 occurrences of the \t character in a line.
My text file has more than ten thousand lines. Each line starts with a word or a phrase followed by a tab and the content, such as:
[line 1] This is the first line. [tab] Here is the content.[end of line]
I want to find character s in all the words between the beginning of each line and a tab (\t), and replace it by a pipe (|) so that the text will look like:
[line 1] Thi| i| the fir|t line. [tab] Here is the content.[end of line]
Here is what I have done:
Search: ^(.*)s+(.*)?\t
Replace: \1|\2\t
It works, but the problem is that it does not replace every s in one pass. I have to click Replace All several times before the s in all the words is replaced.
So it comes to my question: how can I replace all the occurrences of character s in just one search and replace?
Note that I'm working on TextWrangler but I'm OK with other editors.
Thanks a lot.
You are searching for lines containing an s and matching the whole line. Instead, you should search for the s directly and use a lookahead to ensure that it is followed by a tab.
Search: s(?=.*\t)
Replace: |
Note that this catches every s up to the last tab, which will be a problem if your main content can contain tabs.
To stop catching s's after the first tab you have to cheat, since variable-length negative lookbehind isn't supported, AFAIK, in most regexp dialects.
However if we can ensure that the last s catches the whole line...
Search: (?:(^[^s\t]*\t.*$)|s([^s\t]*(?:(?=s.*\t)|\t.*$)))
Replace: |\1\2
This will catch the whole line in the case where no s occurs before the first tab, and put a | in front of that line. I see no way around this.
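Both expressions can be tried in Python to see how they differ. This is only a sketch using Python's re module (TextWrangler's engine may differ in details), and the function names are made up. It relies on Python 3.5+ expanding an unmatched group to an empty string in the replacement.

```python
import re

# First suggestion: every s that still has a tab somewhere after it.
UP_TO_LAST_TAB = re.compile(r"s(?=.*\t)")

# Second suggestion: stop at the first tab by letting the last match
# swallow the rest of the line.
UP_TO_FIRST_TAB = re.compile(
    r"(?:(^[^s\t]*\t.*$)|s([^s\t]*(?:(?=s.*\t)|\t.*$)))", re.MULTILINE
)

def replace_up_to_last_tab(text):
    return UP_TO_LAST_TAB.sub("|", text)

def replace_up_to_first_tab(text):
    # Either group 1 or group 2 is set, never both; the other expands to "".
    return UP_TO_FIRST_TAB.sub(r"|\1\2", text)

if __name__ == "__main__":
    line = "This is the first line.\tHere is the content."
    print(replace_up_to_last_tab(line))
    print(replace_up_to_first_tab(line))
```

On a line with a second tab in the content, such as "as\tis\tso", the first function also changes the s between the tabs while the second leaves it alone. On a line with no s before the first tab, the second function prepends a | as described.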
I use Notepad++ and I need to delete all lines starting with, say "abc".
Note that I don't want to replace the lines starting with "abc" with an empty line; I need to delete these lines completely.
How do I proceed (using regex, I suppose)?
Try replace
^abc.*(\r?\n)?
with
nothing
The ^ indicates the start of a line.
The . means wild-card.
The .* means zero or more wild-cards.
x? means x is optional.
The \r?\n covers both \r\n (generally Windows) and \n (generally Unix), but must be optional to cover the last line.
Search for this regular expression
^abc.*\r\n
Replace with nothing.
Searching a little more on regex in Notepad++, I discovered that the new line sequence (on Windows) is not \n as I expected, but \r\n.
So, my regex replace expression should be:
Find: ^abc.*\r\n
Replace with: (nothing, empty field)
Try the regex \nabc.* in "Find and Replace" --> "Replace"
Leave "Replace With" field empty.
EDIT: This won't work for the first line (because '\n' means "new line", and nothing precedes the first line).
Press Ctrl+H to bring up the Replace window. Put
^abc.*(\r?\n)?
in the Find what field and leave Replace with empty. Select Regular expression and hit Replace All.
This regular expression handles all the edge cases:
When the first line of the file starts with abc
When the last line of the file starts with abc and there is no new line at the end of the file.
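The two edge cases can be checked with a short Python sketch of the same pattern. It uses Python's re module with re.MULTILINE rather than the Notepad++ engine, and the function name delete_abc_lines is invented for the example.

```python
import re

# ^abc at a line start, the rest of the line, then an optional line break.
STARTS_WITH_ABC = re.compile(r"^abc.*(\r?\n)?", re.MULTILINE)

def delete_abc_lines(text):
    # Removing the line break together with the line leaves no empty line.
    return STARTS_WITH_ABC.sub("", text)

if __name__ == "__main__":
    text = "abc first\nkeep\nabc mid\nkeep abc\nabc last"
    print(repr(delete_abc_lines(text)))
```

Here the first and last lines both start with abc, the last one has no trailing line break, and "keep abc" stays because abc is not at the start of that line.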
The following pattern matches an entire line. How do I search for and remove these lines? If I leave a space in the replace string, it leaves a blank in that particular line when I do this in Eclipse.
^[\t ]*<param name="logic" .*$
I don't know anything about Eclipse, but you may need to include the \n newline match in order to remove the line completely. Also, is it possible to replace with an empty string as opposed to a space?
^[\t ]*<param name="logic" .*\n
To delete the line, you must also remove the line-break, not only the contents of the line.
So your expression should end with \r\n instead of $.
Can't try it at the moment, so you will have to experiment yourself for the correct syntax.