How to delete every third line on Notepad++? - regex

I have text on new lines like so:
tom
tim
john
will
tod
hello
test
ttt
three
I want to delete every third line, so in the example above I want to remove john, hello, and three.
I know this calls for some regex, but I am not the best with it!
What I tried:
Search: ([^\n]*\n?){3} //3 in my head to remove every third
Replace: $1
The others I tried were just attempts with \n\r etc. Again, not the best with regex. The above attempt I thought was kinda close.

This will delete every third line that may contain more than one word.
Ctrl+H
Find what: (?:[^\r\n]+\R){2}\K[^\r\n]+(?:\R|\z)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
(?: # start non capture group
[^\r\n]+ # 1 or more non linebreak
\R # any kind of linebreak (i.e. \r, \n, \r\n)
){2} # end group, appears twice (i.e. 2 lines)
\K # forget all we have seen until this position
[^\r\n]+ # 1 or more non linebreak
(?: # start non capture group
\R # any kind of linebreak (i.e. \r, \n, \r\n)
| # OR
\z # end of file
) #end group
Result for given example:
tom
tim
will
tod
test
ttt
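For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It is an adaptation rather than the pattern above: Python's re module supports neither \K nor \R, so the two kept lines are captured and written back, and the line breaks are spelled out explicitly. The function name drop_every_third_line is made up for illustration.

```python
import re

# Assumption: this mirrors the Notepad++ pattern with a capture group,
# because Python's re module has no \K and no \R.
LINE_BREAK = r"(?:\r\n|\r|\n)"
PATTERN = re.compile(
    r"((?:[^\r\n]+" + LINE_BREAK + r"){2})"   # two lines to keep (group 1)
    r"[^\r\n]+(?:" + LINE_BREAK + r"|\Z)"      # third line, line break or end of text
)

def drop_every_third_line(text):
    # Write back only the two captured lines; the third is dropped.
    return PATTERN.sub(r"\1", text)

sample = "tom\ntim\njohn\nwill\ntod\nhello\ntest\nttt\nthree"
print(drop_every_third_line(sample))
```

The sample deliberately has no trailing newline, so the \Z branch is what removes the last line.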

gedit ubuntu
Search for: (.*?)\n(.*?)\n(.*)\n
Replace with: \1\n\2\n

Since the OP says Sahil's answer "worked like a charm" I'll assume the text in notepad++ ended with a newline character. Otherwise, Sahil's and Toto's answers will fail to match the final set of words.
Sahil's pattern: (.*?)\n(.*?)\n(.*)\n takes 79 steps if the text ends in \n; otherwise 112 steps and fails.
His replacement expression needlessly uses two capture group references.
Toto's pattern: ((?:[^\r\n]+\R){2})[^\r\n]+\R takes 39 steps if the text ends in \n; otherwise 173 steps and fails.
His replacement expression uses one capture group reference.
My suggested pattern will take only 25 steps and uses no capture groups.
Your text is a series of non-whitespace characters followed by whitespace characters, so the following is the shortest, most accurate pattern which provides maximum speed:
\S+\s+\S+\s+\K\S+\s*
This pattern should be paired with an empty replacement.
\S means "non-white-space character"
\s means "white-space character"
+ means one or more of the preceding match
* means zero or more of the preceding match
\K means "restart the match here": everything matched before it is dropped from the reported match, so only what follows is replaced
The * on the final \s allows the final 3 lines of text to conclude without a trailing newline character. When doing this kind of operation on a big batch of text, it is important to be sure that the replacement is working properly on the whole text and no undesired substrings remain.
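The trailing-newline point can be checked with a small Python sketch. This is an illustration under assumptions, not a benchmark: Python's re module has no \K, so the \S+\s+ pattern is rewritten with a capture group, and the step counts quoted above are not reproduced here. The variable names are made up for the example.

```python
import re

# Sample text WITHOUT a trailing newline after the last line.
text = "tom\ntim\njohn\nwill\ntod\nhello\ntest\nttt\nthree"

# Pattern that requires a newline after every third line.
strict = re.sub(r"(.*?)\n(.*?)\n(.*)\n", r"\1\n\2\n", text)

# Capture-group adaptation of \S+\s+\S+\s+\K\S+\s* (the final \s* is optional).
lenient = re.sub(r"(\S+\s+\S+\s+)\S+\s*", r"\1", text)

print(repr(strict))   # the last group of three lines is left untouched
print(repr(lenient))  # the last "third line" is removed as well
```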
While I'm sure you've long forgotten about this regex task, it is important that future readers benefit from learning the best way to achieve the desired result.

Alternatively, you can use the ConyEdit plugin to do this. Use the command line cc.dl 3.3 to delete the third line of each group of 3 lines.


Increase 1 space after period (dot) to 2 but not more in vim

I am working on a simple text file in vim where I want to end every sentence with 2 spaces after full stop (dot/period). However, I do not want those sentences which already have 2 spaces after full stop to have further increase in spaces. The test text could be:
This sentence has only 1 space after it. This one has two.  This line has again 1 space only. This is last line.
I tried:
%s/\. /\.  /g
but this increases all spaces by one. I tried following also but it does not work:
%s/\. \\([^ ]\\)/.  \\1/g
How can I achieve this in vim?
Replaces all periods followed by spaces with a period followed by 2 spaces
%s/\. \+/.  /g
Well, you shouldn't double the escapes; without the doubling it works:
:%s/\. \([^ ]\)/.  \1/g
Result:
This sentence has only 1 space after it.  This one has two.  This line has again 1 space only.  This is last line.
You may use
%s/\. \( \)\@!/.  /g
The \. \( \)\@! pattern matches a . and a space that is not followed by another space.
The \(...\)\@! is a negative lookahead construct in Vim. See Lookbehind / Lookahead Regex in Vim. In other common regex flavors, it is written as (?!pattern). You may learn more about how negative lookaheads work in this answer of mine.
To match any whitespace, replace the literal space with \s inside the pattern.
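To see the lookahead logic outside Vim, here is a minimal sketch in Python, which uses the (?!pattern) spelling mentioned above. The function name add_second_space and the sample sentence are made up for illustration.

```python
import re

def add_second_space(text):
    # Match a period and one space that is NOT followed by another space,
    # then rewrite it as a period followed by two spaces.
    return re.sub(r"\. (?! )", ".  ", text)

sample = "One. Two.  Three. Four."
print(repr(add_second_space(sample)))
```

A period that already has two spaces after it is skipped, because the lookahead sees the second space and rejects the match.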
Adds an extra space after periods followed by exactly one space
:%s/\. \zs\ze[^ ]/ /g
Here is another possibility:
:%s/\.  \?/.  /g
The \? makes the second space optional, so the pattern matches a period followed by one or two spaces and always rewrites it with two, effectively not changing anything if there are already two spaces.

What regular expression will select all lines that have more than one punctuation mark?

I have this regular expression:
\..*?\.
But it only selects between two periods, not every punctuation mark, and it also selects across multiple lines.
Would modifying this expression to only take in one line at a time work somehow, if there's also a way to group punctuation into where we have a period?
Just to make things simpler, at this time I only need the expression to recognize periods, exclamation points, and question marks. I don't need it to register commas.
Thanks to Nathan and Agumander below, I know to substitute [.!?] in place of \. now, but I'm still having trouble with the other half of my question.
Just to make sure I'm being more clear, using [.!?].*?[.!?]\s will highlight text between punctuation marks, but across multiple lines. So I can't use it to bookmark only the lines that have multiple punctuation marks.
Placing characters inside a pair of square brackets will match to any of the enclosed characters. In your case you'd want [.?!]
If you want to match any sentence that has two of these, then you'll be looking for a pair of [.!?] separated by zero or more of any character.
The regex that matches strings with more than one of the set [.?!] would then be [.!?].*[.!?]
To make . match newlines, you'd add the s modifier to your regex.
...so the full regex would be /[.!?].*[.!?]/s
Ok I figured it out. Thanks to Agumander and Nathan above I substituted [.!?] in for the two \. in my original regex:
\..*?\. became [.!?].*[.!?]
Putting \s at the end of the regex made it select the entire document in Notepad++.
The last issue I had was remembering to turn off "matches newline."
Agumander, I think you're asking for a regex that basically finds multiple punctuation marks on a single line. So here's one way to do it.
Here's the text I'm going to match. The regex will match the first line in its entirety, but will not match the second.
Here's a line with multiple punctuation. The entire line will match the regex!
This line does not have multiple punctuation.
Regex
^.*(?:[\.?!].*){2,}$
Explanation
^ -- Start matching at the beginning of a line
.* -- match any character 0 or more times
(?: -- start a new non-capturing group
[.?!] -- find a character matching a period, question mark, or exclamation point.
.* -- match any character 0 or more times
)
{2,} -- repeat the previous group 2 or more times. This is how we ensure there are at least two punctuation marks before considering it a match.
$ -- end of line anchor, basically stop matching at the end of a line
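The same pattern can be tried in Python as a quick check. This is a sketch under one assumption: the MULTILINE flag stands in for the editor's line-by-line behaviour, so ^ and $ anchor at each line. The names PATTERN, text, and matches are made up for the example.

```python
import re

# MULTILINE makes ^ and $ match at the start and end of every line.
# "." does not match a newline by default, so a match never spans two lines.
PATTERN = re.compile(r"^.*(?:[.?!].*){2,}$", re.MULTILINE)

text = (
    "Here's a line with multiple punctuation. The entire line will match the regex!\n"
    "This line does not have multiple punctuation."
)

matches = PATTERN.findall(text)
for line in matches:
    print(line)
```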

Keep only the strings in between quotes in Notepad++

In Notepad++, I use the expression (?<=").*(?=") to find all strings in between quotes. It would then seem rather trivial to be able to keep only those results. However, I cannot find an easy solution for this.
I think the problem is that Notepad++ is not able to make multiple selections. But there must be some kind of workaround, right? Perhaps I must invert the regex and then find/replace those results to end up with the strings I want.
For example:
blablabla "Important" blabla
blabla "Again important" blablabla
I want to keep:
Important
Again important
There is no great solution for this and depending on your use case I would recommend writing a quick script that actually uses your first expression and creates a new file with all of the matches (or something like this). However, if you just want something quick and dirty, this expression should get you started:
Find what: [^"]*(?:"([^"]*)")?
Replace with: \1\n
Explanation:
[^"]* # 0+ non-" characters
(?: # Start non-capturing group
" # " literally
( # Start capturing group
[^"]* # 0+ non-" characters
) # End capturing group
" # " literally
)? # End non-capturing group AND make it optional
The reason the optional non-capturing group is used is because the end of your file may very well not have a string in quotes, so this isn't a necessary match (we're more interested in the first [^"]* that we want to remove).
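The "quick script" route mentioned above could look roughly like the following Python sketch. It swaps in a simpler pattern, "([^"]*)", instead of the lookaround expression from the question, because the greedy .* in that expression would run from the first quote on a line to the last one. The function name quoted_strings is made up for illustration.

```python
import re

def quoted_strings(text):
    # Capture whatever sits between each pair of double quotes.
    return re.findall(r'"([^"]*)"', text)

sample = 'blablabla "Important" blabla\nblabla "Again important" blablabla'
print("\n".join(quoted_strings(sample)))
```

Writing the joined result to a new file instead of printing it gives the "new file with all of the matches" described above.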
Try something like this:
[^"\r\n]+"([^"]+)"[^"\r\n]+
And replace with $1. The above regex assumes there will be only 2 double quotes in each line.
[^"]+ matches non-quote characters.
[^"\r\n]+ matches non-quote, non newline characters.
Hard to be certain from your post, but I think you may want : SEE BELOW
<(?<=")(.*)(?=")
The part you keep will be captured as \2.
(?<=")(.*)(?=")
\1 \2 \3
Your original regex string uses parentheses to group characters for evaluation. Parentheses ALSO group characters for capturing. That is what I added.
Update:
The regex pattern you provided doesn't seem to work correctly. Won't this work?
\"(.*)\"
\1 now captures the content.

Notepad++ regular expressions

First of all, regular expressions are quite possibly the most confusing thing I have ever dealt with. That being said, I cannot believe how efficient they can make one's life.
So I am trying to understand the wildcard regex with no luck
Need to turn
f_firstname
f_lastname
f_dob
f_origincountry
f_landing
Into
':f_firstname'=>$f_firstname,
':f_lastname'=>$f_lastname,
':f_dob'=>$f_dob,
':f_origincountry'=>$f_origincountry,
':f_landing'=>$f_landing,
In the answer can you please briefly describe the regex you are using, I have been reading the tutorials but they boggle my mind. Thanks.
Edit: As Chris points out, you can improve the regex by cleaning up any white space there may be in the target string. I also replace the dot with \w as he did because it's better practice than using the .
Search: ^f_(\w+)\s*$
^ # start at the beginning of the line
f_ # look for f_
(\w+) # capture one or more word characters in a group
\s* # skip over (don't capture) any trailing whitespace
$ # end of the line
Replace: ':f_\1'=>$f_\1,
':f_ # beginning of replacement string
\1 # the group of characters captured above
'=>$f_ # some more characters for the replace
\1, # the capture group (again)
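The same search and replace can be sketched in Python to see the capture group at work. One deliberate deviation, labelled as an assumption: [ \t]* replaces \s* so that the pattern cannot swallow a line break when run over a multi-line string. The variable names are made up for the example.

```python
import re

fields = "f_firstname\nf_lastname\nf_dob"

# ^ and $ anchor at each line because of MULTILINE.
# \1 in the replacement is the text captured by (\w+); "$" is a literal
# dollar sign in a Python replacement string.
converted = re.sub(
    r"^f_(\w+)[ \t]*$",
    r"':f_\1'=>$f_\1,",
    fields,
    flags=re.MULTILINE,
)
print(converted)
```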
Find: (^.*)
Replace with: ':$1'=>$$1,
Find What:
(f_\w+)
Here we're matching f_ followed by one or more word characters, \w+ (the plus means one or more times). Wrapping the whole thing in parentheses means we can reference this group in the replace pattern
Replace With:
':\1'=>$\1,
This is simply your result phrase but instead of hardcoding the f words I've put \1 to reference the group in the search

regex optional word match

I'm trying to create a regex for extracting singers, lyricists. I was wondering how to make lyricists search optional.
Sample Multiline String:
Fireworks Singer: Katy Perry
Vogue Singers: Madonna, Karen Lyricist: Madonna
Regex: /Singers?:(.*)\s?Lyricists?:(.*)/
This matches the second line correctly and extracts Singers(Madonna, Karen) and Lyricists(Madonna)
But it does not work with the first line, when there are no Lyricists.
How do I make Lyricists search optional?
You can enclose the part you want to match in a non-capturing group: (?:). Then it can be treated as a single unit in the regex, and subsequently you can put a ? after it to make it optional. Example:
/Singers?:(.*)\s?(?:Lyricists?:(.*))?/
Note that here the \s? is useless since .* will greedily eat all characters, and no backtracking will be necessary. This also means that the (?:Lyricists?:(.*)) part will never be matched for the same reason. You can use the non-greedy version of .*, .*? along with the $ to fix this:
/Singers?:(.*?)\s*(?:Lyricists?:(.*))?$/
Some extra whitespace ends up captured; this can be removed also, giving a final regex of:
/Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$/
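Since the question does not name a language, here is a minimal sketch of the final regex in Python, applied one line at a time so that $ always means the end of that line. The function name parse_credits is made up for illustration.

```python
import re

# Final pattern from above: lazy singers group, optional lyricists group.
PATTERN = re.compile(r"Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$")

def parse_credits(line):
    # Returns (singers, lyricists); lyricists is None when that part is absent.
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)

for line in ["Fireworks Singer: Katy Perry",
             "Vogue Singers: Madonna, Karen Lyricist: Madonna"]:
    print(parse_credits(line))
```

When the optional group does not take part in the match, Python reports it as None, which makes the missing lyricist easy to detect.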
Just to add to Cameron's solution: if the source string has multiple lines each containing both Singers and Lyricists, you'll probably need to add the 'm' multi-line modifier so that the '$' will match ends-of-lines. (You didn't say what language you are using; you may want to add the 'i' modifier as well.)