I try to verify a CSV file where we had problems with line breaks.
I want to find all lines not starting with a ".
I am trying with /!^"/gim but the ! negation is not working.
How can I negate /^"/gim properly?
In regex, the ! does not mean negation; instead, you want to negate a character set with [^"]. The brackets, [], denote a character set and if it starts with a ^ that means "not this character set".
So, if you wanted to match things that are not double-quotes, you would use [^"]; if you don't want to match any quotes, you could use [^"'], etc.
With Notepad++, you should be able to search with the following to find lines that don't start with the " character:
^[^"]
If you want to highlight the full line, use:
^[^"].*
In Notepad++ you can use the very usefull negative lookahead
In your case you can try the following:
^(?!")
If you want to match wholes lines add .+ or .{1,7} or anything e.g.:
^(?!").*
will also match empty lines.
Explanation part
^ line start
(?!regexp) negative lookahead part: this means that if the regexp match, the result will not be shown
Step 1 - Match lines. Find dialog > Mark tab, you can bookmark lines that match.
Step 2 - Remove lines bookmarked OR Remove lines not bookmarked. Search > Bookmark > Remove Unmarked Lines or Remove Bookmarked lines
Related
So, I know from this question how to find all the lines that don't contain a specific string. But it leaves a lot of empty newlines when I use it, for example, in a text editor substitution (Notepad++, Sublime, etc).
Is there a way to also remove the empty lines left behind by the substitution in the same regex or, as it's mentioned on the accepted answer, "this is not something regex ... should do"?
Example, based on the example from that question:
Input:
aahoho
bbhihi
cchaha
sshede
ddhudu
wwhada
hede
eehidi
Desired output:
sshede
hede
[edit-1]
Let's try this again: what I want is a way to use regex replace to remove everything that does not contain hede on the text editor. If I try .*hede.* it will find all hede:
But it will not remove. On a short file, this is easy to do manually, but the idea here is to replace on a larger file, with over 1000+ lines, but that would contain anywhere between 20-50 lines with the desired string.
If I use ^((?!hede).)*$ and replace it with nothing, I end up with empty lines:
I thought it was a simple question, for people with a better understanding of regex than me: can a single regex replace also remove those empty lines left behind?
An alternative try
Find what: ^(?!.*hede).*\s?
Replace with: nothing
Explanation:
^ # start of a line
(?!) # a Negative Lookahead
. # matches any character (except for line terminators)
* # matches the previous token between zero and unlimited times,
hede # matches the characters hede literally
\s # matches any whitespace character (equivalent to [\r\n\t\f\v ])
? # matches the previous token between zero and one times,
Using Notepad++.
Ctrl+H
Find what: ^((?!hede).)*(?:\R|\z)
Replace with: LEAVE EMPTY
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
((?!hede).)* # tempered greedy token, make sure we haven't hede in the line
(?:\R|\z) # non capture group, any kind of line break OR end of file
Screenshot (before):
Screenshot (after):
Have you tried:
.*hede.*
I don't know why you are doing an inverse search for this.
You can use sed like:
sed -e '/.*hede.*/!d' input.txt
I try to verify a CSV file where we had problems with line breaks.
I want to find all lines not starting with a ".
I am trying with /!^"/gim but the ! negation is not working.
How can I negate /^"/gim properly?
In regex, the ! does not mean negation; instead, you want to negate a character set with [^"]. The brackets, [], denote a character set and if it starts with a ^ that means "not this character set".
So, if you wanted to match things that are not double-quotes, you would use [^"]; if you don't want to match any quotes, you could use [^"'], etc.
With Notepad++, you should be able to search with the following to find lines that don't start with the " character:
^[^"]
If you want to highlight the full line, use:
^[^"].*
In Notepad++ you can use the very usefull negative lookahead
In your case you can try the following:
^(?!")
If you want to match wholes lines add .+ or .{1,7} or anything e.g.:
^(?!").*
will also match empty lines.
Explanation part
^ line start
(?!regexp) negative lookahead part: this means that if the regexp match, the result will not be shown
Step 1 - Match lines. Find dialog > Mark tab, you can bookmark lines that match.
Step 2 - Remove lines bookmarked OR Remove lines not bookmarked. Search > Bookmark > Remove Unmarked Lines or Remove Bookmarked lines
My text file has more than ten thousand lines. Each line starts with a word or a phrase followed by a tab and the content, such as:
[line 1] This is the first line. [tab] Here is the content.[end of line]
I want to find character s in all the words between the beginning of each line and a tab (\t), and replace it by a pipe (|) so that the text will look like:
[line 1] Thi| i| the fir|t line. [tab] Here is the content.[end of line]
Here is what I have done:
Search: ^(.*)s+(.*)?\t
Replace: \1|\2\t
It works but the problem is it does not replace s in one replace. I have to click on Replace All for several times before s in all the words is replaced.
So it comes to my question: how can I replace all the occurrences of character s in just one search and replace?
Note that I'm working on TextWrangler but I'm OK with other editors.
Thanks a lot.
You are searching for lines containing an s and do the match. Instead you should be searching for the s directly, and use lookahead to ensure that it is followed by a tab.
Search: s(?=.*\t)
Replace: |
Note that this catches all s's up to the last tab. - This will be a problem if your main content can contain tabs.
To stop catching s's after the first tab you have to cheat. Since variable length negative lookbehind doesn't work in AFAIK any regexp dialect.
However if we can ensure that the last s catches the whole line...
Search: (?:(^[^s\t]*\t.*$)|s([^s\t]*(?:(?=s.*\t)|\t.*$)))
Replace: |\1\2
This will catch the whole line in the case where no s occurs before the first tab. And put a | in front of that line. I see no way around this.
I have multiple html files and some of them have some blank lines, I need a regex to remove all blank lines and leave only one blank line.. So it removes anything more than one blank line, and leave those that are just one or none (none like in having text in them).
I need it also to consider lines that are not totally blank, as some lines could have spaces or tabs (characters that doesn't show), so I need it to consider these lines with the regex to be removed as long as it is more than one line..
Search for
^([ \t]*)\r?\n\s+$
and replace with
\1
Explanation:
^ # Start of line
([ \t]*) # Match any number of spaces or tabs, capture them in group 1
\r?\n # Match one linebreak
\s+ # Match any following whitespace
$ # until the last possible end of line.
\1 will then contain the first line of whitespace characters, so when you use that as the replacement string, only the first line of whitespace will be preserved (excluding the linebreak at the end).
This worked for me on notepad++ v6.5.1. UNICODE windows 7
Search for: ^[ \t]*\r\n
Replace with: nothing, leave blank
Search mode: Regular expression.
search for (\r?\n(\t| )*){3,}, replace by \r\n\r\n, check "Regular expression" and ". matches newline".
Tested with Notepad++ 6.2
This will replace the successive blank lines containing white spaces (or not) and replace it with one new line.
Search for
(\s*\r?\n){3,}
replace with
\r\n
You can find it yourself what you need to replace with
\n\n OR \n\r\n or \r\n\r\n etc ... now you can even modify your regular expression ^([ \t]*)\r?\n\s+$ according to your need.
I tested any of the above suggestions, always was either too less or to much deleted. So that either you got no blank line where at least one was beforehand or deleted not enough (whitespaces was left, etc.). Unfortunately I cannot write comments yet. Tested both with 6.1.5 and updated to 6.2 and tested again. depending on how mayn files there are, I would suggest use
Edit->Blank Operations->Trim trailing whitespace
Followed by Ctrl+A and
TextFX -> TextFX Edit -> Delete surplus blank lines
A Macro I tried to record didn't work. Theres even a macro for just remove trailing whitespace (Alt+Shift+S, see Settings | Shortcut Mapper... | Macros). There's a
Edit->Blank Operations->Remove unnecessary EOL and whitespace
but that deletes every EOL and puts everything in a single line.
In notepad++ v8.4.7 there is the option:
Edit > Line Operations > Remove Empty Lines (Containing Blank characters)
or
Edit > Line Operations > Remove Empty Lines
So there is no need to use a regular expressions for this. But this only works for one file at a time.
I looked for ^\r\n and click "Replace All" with nothing (empty) in "Replace with" textbox.
I'd like to use Notepad++ to replace all leading spaces on a line with a like number of given characters. So for instance, I want to change:
zero
one
two
three
into:
zero
#one
##two
###three
I haven't been successful at getting this working. I did find Regex to replace html whitespace and leading whitespace in notepad++, but wasn't able to get the result I wanted.
Is this possible with Notepad++? I'd rather not have to write code to do this...
As Tim's answer indicates, this can't be done in a single search/replace, however here is how you can accomplish the same task fairly quickly using multiple replacements:
Find: ^( *)[ ]
Replace with: \1#
Now just spam the "Replace All" button until it indicates that there were no matches to replace. This will replace a single space at the beginning of each line on each click, so it will require the same number of clicks as your most-indented line.
Make sure "Regular expression" is selected as the search mode.
You would need variable-length lookbehind assertions to do this in a single regex, and Notepad++ doesn't support these.
For the record, in EditPadPro you can search for (?<=^ *)\s and replace with #.