Deleting every 2nd line from a file using Notepad++ - regex

I am looking for some regex help.
I have a textfile, nothing super important but I would like to delete every second line from it - I have tried following this guide: Delete every other line in notepad++
However, I just can't get it to work. Is the regex I am using OK? I am a noob with regex.
Find:
([^\n]*\n)[^\n]*\n
Replace with:
$1
No matter what I try (mouse position at the beginning, ctrl+a and Replace All) I just can't get it to work. I appreciate any help.
I've put the regex into here: http://regexpal.com/ and if I remove the final \n it highlights the individual rows.

Make sure you select regular expression for the search mode...
Also, you may want to make that final newline optional. In the case that there are an even number of lines and you do not have a trailing newline, it won't remove the last line.
([^\n]*\n)[^\n]*\n?
Update:
Windows ends lines with \r\n instead of just \n. Try updating the expression to take this into account:
([^\r\n]*[\r\n]+)[^\r\n]*[\r\n]*
Final Update:
Thanks to @zx81, I now know that N++ uses PCRE, so \R can be used to match Unicode newline sequences. However, [^\R] won't work (it looks for anything except a literal R), so you will need to keep [^\r\n]. This can be simplified as:
([^\r\n]*\R)[^\r\n]*\R?
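To check the pattern outside Notepad++, here is a minimal Python sketch of the same idea. Python's re module has no \R, so the group (?:\r\n|\r|\n) stands in for it. The name drop_every_second_line is made up for illustration.

```python
import re

# Stand-in for PCRE's \R, which Python's re module does not support.
NEWLINE = r"(?:\r\n|\r|\n)"

# Group 1 captures one line plus its line ending; the following line
# (and its optional line ending) is matched but not captured.
PAIR = re.compile(r"([^\r\n]*" + NEWLINE + r")[^\r\n]*" + NEWLINE + r"?")


def drop_every_second_line(text: str) -> str:
    """Keep lines 1, 3, 5, ... and delete lines 2, 4, 6, ..."""
    return PAIR.sub(r"\1", text)


if __name__ == "__main__":
    # Even number of lines, CRLF endings, no trailing newline.
    print(repr(drop_every_second_line("a\r\nb\r\nc\r\nd")))
```

Because the final newline is optional, the last line is still removed when the file has an even number of lines and no trailing newline.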

Related

Global find and replace with newline in Visual Studio Code

Suppose I want to remove all lines matching a regex in my project. Is there a way to do that?
Using the global find and replace function with regexes enabled I've tried:
Replace foo|bar with an empty string. This doesn't work because it leaves the line there with an empty string. I want the newline removed.
Replace (foo|bar)\n with an empty string. This doesn't actually match anything.
Replace (foo|bar)$ with an empty string. Again, doesn't match anything.
Any ideas?
Edit: It seems like some of my files have Windows line endings so (foo|bar)\r?\n does match. However when you replace it with an empty string it actually still leaves the line endings there.
Here's a test case:
a
foo
b
It should end up like this:
a
b
Not like this:
a

b
foo\n^ and (foo|bar)\n^ both work.
I just tested this in VS Code, leaving the replacement string blank.
Yes, it is possible to remove entire lines with the search-across-files feature.
I'm guessing the original problem was due to a bug or otherwise unwanted behavior in an older version of VSCode. With VSCode 1.37.1, so long as the \n is included in the regex, the line is removed. In particular, the regex (foo|bar)\n, described in the original question as not working, now works fine.
Before:
After pressing the "Replace All" button:
Related observations:
This same regex works even if I set the file line endings to CRLF.
Appending ^ makes no difference. That's a bit surprising, but perhaps "after newline" counts as "beginning of line".
Appending $ causes the regex to not match anything. That is quite surprising given the behavior of ^.
I looked through the search configuration settings, but nothing seemed like it could affect this.
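The difference between blanking a line and removing it can also be reproduced outside the editor. This is a rough Python sketch, not VS Code's own search engine. The helper remove_matching_lines is hypothetical, and the leading ^ is an added assumption so that only lines consisting solely of foo or bar are removed.

```python
import re

# re.MULTILINE lets ^ match at the start of every line.
# \r?\n consumes the line ending (LF or CRLF), so no empty line is left behind.
WHOLE_LINE = re.compile(r"^(?:foo|bar)\r?\n", re.MULTILINE)


def remove_matching_lines(text: str) -> str:
    """Delete lines that consist only of 'foo' or 'bar', ending included."""
    return WHOLE_LINE.sub("", text)


if __name__ == "__main__":
    sample = "a\nfoo\nb\n"
    # Without the newline in the pattern, an empty line remains.
    print(repr(re.sub(r"foo|bar", "", sample)))
    # With the newline in the pattern, the line is gone.
    print(repr(remove_matching_lines(sample)))
```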

Notepad++ Regex: Find all 1 and 2 letter words

I’m working with a text file with 200,000+ lines in Notepad++. Each line has only one word. I need to strip out and remove all words which contain only one letter (e.g.: I) and words which contain only two letters (e.g.: as).
I thought I could just paste in a regular regex like this [a-zA-Z]{1,2} but it does not recognize anything (I’m trying to Mark them).
I’ve done a manual search and I know that words of that length do exist, so it can only be my regex that’s wrong. Does anyone know how to do this in Notepad++?
Cheers,
- Mestika
If you want to remove only the words but leave the lines empty, this works:
^[a-zA-Z]{1,2}$
Replace this with an empty string. ^ and $ are anchors for the beginning and the end of a line (because Notepad++'s regexes work in multi-line mode).
If you want to remove the lines completely, search for this:
^[a-zA-Z]{1,2}\r\n
And replace with an empty string. However, this won't work before Notepad++ 6, so make sure yours is up-to-date.
Note that you will have to replace \r\n with the specific line-endings of your file!
As Tim Pietzker suggested, a platform independent solution that also removes empty lines would be:
^[a-zA-Z]{1,2}[\r\n]+
A platform-independent solution that does not remove empty lines but only those with one or two letters would be:
^[a-zA-Z]{1,2}(\r\n?|\n)
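For comparison, the last pattern can be exercised in Python. This is only a sketch under a few assumptions: remove_short_words is an invented name, and Python's ^ in MULTILINE mode matches only after \n, so a file with lone-\r line endings would not be handled line by line.

```python
import re

# One or two ASCII letters alone on a line, plus the line ending
# (CRLF, lone CR, or LF). Empty lines are left untouched.
SHORT_LINE = re.compile(r"^[a-zA-Z]{1,2}(?:\r\n?|\n)", re.MULTILINE)


def remove_short_words(text: str) -> str:
    """Delete every line that holds only a one- or two-letter word."""
    return SHORT_LINE.sub("", text)


if __name__ == "__main__":
    print(repr(remove_short_words("I\nhello\nas\nworld\n")))
```

A final short line with no line ending after it is not removed, because the pattern requires the line ending.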
I don't use Notepad++ but my guess is it could be because you have too many matches - try including word boundaries (your exp will match every set of 2 letters)
\b[a-zA-Z]{1,2}\b
The regex you specified should find 1-or-2 characters (even in Notepad++'s Find dialog), but not in the way you'd think. You want the regex to start at the beginning of the line and end at the end of it, using ^ and $, respectively:
^[a-zA-Z]{1,2}$
Notepad++ version 6.0 introduced the PCRE engine, so if this doesn't work in your current version try updating to the most recent.
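The effect of the anchors is easy to see in a small Python sketch (Python's re module is used here only as a stand-in for the Notepad++ engine, and the sample text is invented):

```python
import re

words = "hello\nas\nI"

# Unanchored: the pattern also matches pieces of longer words.
unanchored = re.findall(r"[a-zA-Z]{1,2}", words)

# Anchored per line: re.MULTILINE makes ^ and $ match at line boundaries.
anchored = re.findall(r"^[a-zA-Z]{1,2}$", words, flags=re.MULTILINE)

print(unanchored)  # ['he', 'll', 'o', 'as', 'I']
print(anchored)    # ['as', 'I']
```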
You seem to use the version of Notepad++ that doesn't support explicit quantifiers: that's why there's no match at all (as { and } are treated as literals, not special symbols).
The solution is to use their somewhat more lengthy replacement:
\w\w?
... but that's only part of the story, as this regex will match any one or two word characters, even inside longer words, and not just short words. To match only short words, you need something like this:
^\w\w?$

Notepad++ Find/Replace Regex Help

I am having issues doing a string replacement in Notepad++, and need some help.
My file:
LastName,(tab)FirstName[optional]MiddleName
Sometimes the data has a middle name, sometimes not.
Public,JohnQ.
Doe,John
Clinton,WilliamJefferson
would be:
Public(tab)John(tab)Q
Doe(tab)John
Clinton(tab)William(tab)Jefferson
I want to split it out into this:
LastName(tab)FirstName(tab)MiddleName
Thanks for adding the sample input. It helps immensely to have that around. Try this and see if it does what you want.
Find, making sure Match case is checked:
([A-Z][a-z]*),([A-Z][a-z]*)(.*)
Replace with:
\1(tab)\2(tab)\3
Of course, (tab) is actually a tab character that you have to place in the replacement string yourself.
An ugly regex like this works for me on the example you've provided:
(\w+),(\w+?)(([A-Z]\w*\.?)?)\n
replace with
\1\t\2\t\3\n
Note:
This only works if the middle name starts with a letter in the range A-Z. You might be able to replace [A-Z] with [[:upper:]] if Notepad++ supports it (I don't know).
I need that second bracket around the middle name part because I need to match at least an empty string when there is no middle name.
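As a way to try this regex outside Notepad++, here is a small Python sketch. The function name split_names is hypothetical, and Python's \w is broader than ASCII, so results on other data may differ.

```python
import re

# Last name, comma, lazily matched first name, then an optional middle
# name that starts with an uppercase letter and may end in a period.
NAME = re.compile(r"(\w+),(\w+?)(([A-Z]\w*\.?)?)\n")


def split_names(text: str) -> str:
    """Rewrite 'Last,FirstMiddle' lines as tab-separated fields."""
    return NAME.sub(r"\1\t\2\t\3\n", text)


if __name__ == "__main__":
    sample = "Public,JohnQ.\nDoe,John\nClinton,WilliamJefferson\n"
    print(split_names(sample), end="")
```

On the sample data the period after Q is kept, and a line without a middle name ends with a trailing tab. Every line must end in \n to be matched.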

How to read this command to remove all blanks at the end of a line

I happened across this page full of super useful and rather cryptic vim tips at http://rayninfo.co.uk/vimtips.html. I've tried a few of these and I understand what is happening well enough to parse them in my head, so that I can possibly recreate them later. The one I'm having a hard time wrapping my head around is the following pair of commands, which remove all spaces from the end of every line:
:%s=  *$== : delete end of line blanks
:%s= \+$== : Same thing
I'm interpreting %s as string replacement on every line in the file, but after that I am getting lost in what looks like some gnarly variation of :s and regex. I'm used to seeing and using :s/regex/replacement. But the above is super confusing.
What do those above commands mean in english, step by step?
The regex delimiters don't have to be slashes, they can be other characters as well. This is handy if your search or replacement strings contain slashes. In this case I don't know why they use equal signs instead of slashes, but you can pretend that the equals are slashes:
:%s/  *$//
:%s/ \+$//
Does that make sense? The first one searches for a space followed by zero or more spaces, and the second one searches for one or more spaces. Each one is anchored at the end of the line with $. And then the replacement string is empty, so the spaces are deleted.
I understand your confusion, actually. If you look at :help :s you have to scroll down a few pages before you find this note:
*E146*
Instead of the '/' which surrounds the pattern and replacement string, you
can use any other character, but not an alphanumeric character, '\', '"' or
'|'. This is useful if you want to include a '/' in the search pattern or
replacement string. Example:
:s+/+//+
I do not know vim syntax, but it looks to me like these are sed-style substitution operators. In sed, the / (in s/REGEX/REPLACEMENT/) can be uniformly replaced with any other single character. Here it appears to be =. So if you mentally replace = with /, you'll get
:%s/  *$//
:%s/ \+$//
which should make more sense to you.
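For readers more at home in a scripting language, the same substitution might look like this in Python. It is a sketch, not a translation of vim's regex dialect: vim writes "one or more" as \+ while Python writes it as +, and strip_trailing_spaces is an invented name.

```python
import re


def strip_trailing_spaces(text: str) -> str:
    """Delete runs of spaces at the end of every line."""
    # " +$" means one or more spaces followed by end of line;
    # re.MULTILINE makes $ match before each newline, like vim's per-line :s.
    return re.sub(r" +$", "", text, flags=re.MULTILINE)


if __name__ == "__main__":
    print(repr(strip_trailing_spaces("foo  \nbar\nbaz \n")))
```

Like the vim commands, this removes only space characters; trailing tabs are left in place.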

Regex: remove lines not starting with a digit

I have been fighting this problem with the help of a regex cheat sheet, trying to figure out how to do this, but I give up... I have this lengthy file open in Notepad++ and would like to remove all lines that do not start with a digit (0..9). I would use the Find/Replace functionality of N++. I mention this only because I am not sure which regex implementation N++ uses... Thank you
Example. From the following text:
1hello
foo
2world
bar
3!
I would like to extract
1hello
2world
3!
not:
1hello

2world

3!
by doing a find/replace on a regular expression.
You can clear those lines with ^[^0-9].* but it will leave blank lines.
Notepad++ uses Scintilla, including its regex engine, to match them.
\r and \n are never matched because in
Scintilla, regular expression searches
are made line per line (stripped of
end-of-line chars).
http://www.scintilla.org/SciTERegEx.html
To clear those blank lines, the only way is to choose extended mode and replace \n\n with \n. If your file uses Windows line endings, replace \r\n\r\n with \r\n instead.
[^0-9] is a regular expression that matches pretty much anything, except digits. If you say ^[^0-9] you "anchor" it to the start of the line, in most regular expression systems. If you want to include the rest of the line, use ^[^0-9].+.
^[^\d].* marks a whole line whose first character is not a digit. Check if there are really no whitespaces in front of the digits. Otherwise you'd have to use a different expression.
UPDATE:
You will have to do it in two steps. First empty the lines that do not start with a digit. Then remove the empty lines in extended mode.
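The two steps can be sketched in Python as follows. This assumes \n line endings, the function name is invented, and Python's re module stands in for the Notepad++ engine.

```python
import re


def keep_digit_lines_two_step(text: str) -> str:
    """Empty the non-digit lines first, then squeeze out the empty lines."""
    # Step 1: empty every line whose first character is not a digit.
    # \n is excluded from the class so an empty line cannot swallow the next one.
    emptied = re.sub(r"^[^0-9\n].*$", "", text, flags=re.MULTILINE)
    # Step 2: collapse runs of newlines, which removes the emptied lines.
    return re.sub(r"\n{2,}", "\n", emptied)


if __name__ == "__main__":
    print(keep_digit_lines_two_step("1hello\nfoo\n2world\nbar\n3!\n"), end="")
```

Step 2 also removes lines that were already empty in the input, and a non-digit first line leaves one leading newline behind.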
One could also use the technique of bookmarking in Notepad++. I started benefiting from this feature (long time present but only more recently made somewhat more visible in the UI) not very long ago.
Simply bring up the find dialogue, type regex for lines not starting with digit ^\D.*$ and select Mark All. This will place blue circles, like marbles, in the left gutter - these are line bookmarks. Then just select from main menu Search -> Bookmark -> Remove bookmarked lines.
Bookmarks are cool, you could extract these lines by simply selecting to copy bookmarked lines, opening new document and pasting lines there. I sometimes use this technique when reviewing log files.
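The mark-then-remove (or mark-then-copy) workflow has a close analogue in a short script. The sketch below is an illustration in Python, not anything Notepad++ runs, and partition_lines is a hypothetical helper.

```python
import re

# A line is "kept" when its first character is an ASCII digit.
STARTS_WITH_DIGIT = re.compile(r"[0-9]")


def partition_lines(text: str) -> tuple[str, str]:
    """Return (kept, removed): lines starting with a digit, and the rest."""
    kept, removed = [], []
    # keepends=True preserves each line's original line ending.
    for line in text.splitlines(keepends=True):
        if STARTS_WITH_DIGIT.match(line):
            kept.append(line)
        else:
            removed.append(line)
    return "".join(kept), "".join(removed)


if __name__ == "__main__":
    kept, removed = partition_lines("1hello\nfoo\n2world\nbar\n3!\n")
    print(kept, end="")
```

Returning both halves mirrors the two bookmark actions: the first string is what remains after removing the marked lines, and the second is what copying the marked lines would give.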
I'm not sure what you are asking, but the regex for finding the lines with a digit at the beginning would be
^\d.*
You can remove all the lines that match the above, or alternatively keep all the lines that match this expression:
^[^\d].*