NP++: Clear all lines that do not contain a certain string - regex

How can I find the lines in NP++ that do not contain a given string, for example:
marg%233!_
I tried
.*[^(marg%233!_)].*\r
But that seems wrong.

You want to use a negative lookahead, which will fail the whole regex if what's inside is matched:
^(?!.*marg%233!_).*\r?
and replace these matches with an empty string.
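The final ? makes the line break optional, so the last line of your file (which usually has no line break after it) is matched too. This assumes your line breaks are \r. If they are not, replace that \r with (\r\n|\n\r|[\n\r]) and keep the ? after it; the two-character sequences have to come first, otherwise [\n\r] would match only one character of a \r\n pair.
[^...] is a negative character class: it matches any single character (just as [...] matches a single character) that is not listed inside the class. So [^(marg%233!_)] matches one character other than (, m, a, r, g, %, 2, 3, !, _ and ); it does not exclude the string as a whole.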
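A minimal sketch of the lookahead approach using Python's `re` module. Python is not the Boost engine Notepad++ uses, so treat this as an illustration rather than a drop-in. The helper name `drop_lines_without` is hypothetical; `[^\r\n]*` stands in for `.*` so a match can never run past the end of its line, and line breaks are assumed to be `\n` or `\r\n`.

```python
import re


def drop_lines_without(text: str, needle: str) -> str:
    """Delete every line that does not contain `needle` (hypothetical helper)."""
    pattern = re.compile(
        r"^(?![^\r\n]*" + re.escape(needle) + r")"  # skip lines that contain the needle
        + r"[^\r\n]*"                                 # otherwise take the line's text...
        + r"(?:\r?\n)?",                              # ...and its line break, if it has one
        re.MULTILINE,
    )
    return pattern.sub("", text)


sample = "keep marg%233!_ one\ndrop this\nalso keep marg%233!_\nbye"
print(drop_lines_without(sample, "marg%233!_"))
```

Because the line break is part of the match, removed lines disappear completely instead of being left behind as empty lines.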

There is an easy way to achieve this in two steps.
Go to Search > Find... and select the "Mark" tab. Search for marg%233!_, check "Bookmark lines" and press "Mark All".
==> All rows you want to keep now have a bookmark.
Go to Search > Bookmark > Remove Unmarked Lines.
==> All lines without a bookmark are deleted.
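The two steps boil down to a line filter. A rough Python equivalent, assuming a plain substring search like the Mark tab's Normal search mode; `keep_marked_lines` is a hypothetical name:

```python
def keep_marked_lines(text: str, needle: str) -> str:
    """Hypothetical helper: Mark All + Remove Unmarked Lines for a plain-text search."""
    lines = text.splitlines(keepends=True)
    # "Mark All" with "Bookmark line" checked: bookmark each line containing the needle
    marked = [line for line in lines if needle in line]
    # "Remove Unmarked Lines": everything without a bookmark is dropped
    return "".join(marked)


print(keep_marked_lines("a marg%233!_\nb\nc marg%233!_", "marg%233!_"))
```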

Related

Notepad++ and regex (multiline)

I have been facing a challenge. I have a text file with the following pattern:
SOME RANDOM TITLE IN CAPS (nnnn)
text text text
more text
...
SOME OTHER RANDOM TITLE IN CAPS (nnnn)
What is certain is that the lines I want to extract contain a date in brackets, e.g. (2015) or (2008).
After the (nnnn) there is no more text: sometimes a space and then CR LF, sometimes just CR LF.
I would like to delete everything else and keep just the title lines with the brackets.
In the time I've spent I could have done it by hand (there are 100 lines), but I like the challenge :)
I thought I could figure it out, but I am stuck.
I have tried something along this line:
^.*\(\d\d\d\d\)(?s)(.*)(^.*\(\d\d\d\d\))
But I don't get what I want. I can't seem to stop the (?s)(.*) from going all the way to the end of the text instead of stopping at the next occurrence.
I suggest using the Search > Mark feature. Use a pattern like \(\d{4}\) and check the "Bookmark Line" option then click "Mark All". Then use Search > Bookmark > Remove Unmarked Lines. This will remove all lines except the ones that have matched your pattern.
Note: If it's possible to have parentheses with 4 digits within your other lines you could add $ to the end of the expression to ensure that the pattern only matches the end of the line. E.g. more text (1234) and other stuff would be matched by the pattern I gave above but if you use pattern \(\d{4}\)$ it will no longer match.
If you want to be even more specific and only match lines made up of uppercase letters and spaces followed by four digits in parentheses at the end of the line, you could use a pattern like this: ^[A-Z ]+\(\d{4}\)$ (without the leading ^, a line such as more text (1234) would still match, because [A-Z ]+ can match just the space before the parenthesis).
Sample input:
SOME RANDOM TITLE IN CAPS (2008)
text text text
more text
...
SOME OTHER RANDOM TITLE IN CAPS (2010)
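Here is how to mark the lines: enter one of the patterns above on the Mark tab, check "Bookmark line" and click "Mark All". The two title lines get a bookmark.
Now use Search > Bookmark > Remove Unmarked Lines, and you are left with:
SOME RANDOM TITLE IN CAPS (2008)
SOME OTHER RANDOM TITLE IN CAPS (2010)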
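To see what each pattern would bookmark, here is a small Python sketch that emulates Mark All + Remove Unmarked Lines on sample text. The helper `keep_matching_lines` and the two extra `more text (1234)` lines are made up for illustration. Note the `^` in the strict pattern: without it, `[A-Z ]+` can match just the space before the parenthesis, so `more text (1234)` would be kept too.

```python
import re

LOOSE = re.compile(r"\(\d{4}\)")            # (nnnn) anywhere on the line
AT_END = re.compile(r"\(\d{4}\)$")          # (nnnn) at the end of the line
STRICT = re.compile(r"^[A-Z ]+\(\d{4}\)$")  # only capitals and spaces, then (nnnn)


def keep_matching_lines(text: str, pattern: re.Pattern) -> list[str]:
    """Hypothetical helper: bookmark matching lines, then drop the unmarked ones."""
    return [line for line in text.splitlines() if pattern.search(line)]


sample = (
    "SOME RANDOM TITLE IN CAPS (2008)\n"
    "text text text\n"
    "more text (1234) and other stuff\n"
    "more text (1234)\n"
    "SOME OTHER RANDOM TITLE IN CAPS (2010)\n"
)

for name, pattern in (("LOOSE", LOOSE), ("AT_END", AT_END), ("STRICT", STRICT)):
    print(name, keep_matching_lines(sample, pattern))
```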
The following regex matches the two lines with brackets containing four digits:
.*?\(\d{4}\)\s*
It starts by matching any characters zero or more times (non-greedy), then an opening bracket, four digits and a closing bracket, and finally any trailing whitespace, including the line break.
If you want to remove all lines except the ones that end with four digits in brackets (optionally followed by spaces or tabs), you could try this:
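A quick way to check what this regex picks up is Python's `re.findall`. Python's `.` does not match `\n`, which is similar to Notepad++ with ". matches newline" unchecked. This sketch uses the sample input from the question, with a trailing space added after (2010):

```python
import re

text = (
    "SOME RANDOM TITLE IN CAPS (2008)\n"
    "text text text\n"
    "more text\n"
    "...\n"
    "SOME OTHER RANDOM TITLE IN CAPS (2010) \n"
)

# Lazy prefix, "(" + four digits + ")", then any trailing whitespace;
# \s also matches the line break, so each match takes its line break with it.
titles = re.findall(r".*?\(\d{4}\)\s*", text)
print("".join(titles), end="")
```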
^(?!.*\(\d{4}\)\h*$).*(?:\r?\n|\z)
Replace by: (nothing)
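Python's `re` has no `\h`, so a sketch of the same replacement there swaps in `[ \t]` and uses `\Z` (end of string) where Boost uses `\z`. The pattern name is hypothetical and line breaks are assumed to be `\n` or `\r\n`:

```python
import re

# [ \t] stands in for \h; \Z marks the end of the string.
REMOVE_NON_TITLES = re.compile(
    r"^(?![^\r\n]*\(\d{4}\)[ \t]*\r?$)"  # leave lines that end in (nnnn) plus optional blanks
    r"[^\r\n]*(?:\r?\n|\Z)",             # otherwise match the whole line and its line break
    re.MULTILINE,
)

text = (
    "SOME RANDOM TITLE IN CAPS (2008)\n"
    "text text text\n"
    "more text\n"
    "...\n"
    "SOME OTHER RANDOM TITLE IN CAPS (2010) \n"
)
print(REMOVE_NON_TITLES.sub("", text), end="")
```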

Regex - replace folder details with filename

Completely new to Regex so I was hoping I could find an answer here.
I'm using Notepad++, and I have a big bulk of file details from a folder in a text document, like so:
01/01/2015 08:00 1,000,000 filename.exe
01/02/2015 08:30 1,450,000 aDifferentFilename.exe
And I want to do a find and replace so that the whole thing is replaced by:
filename.exe
aDifferentFilename.exe
I could delete them manually, but there's over a thousand lines!
I've used ^(.*)% to find the lines one by one, but what would I put in the replace field to keep the filename, i.e. filename.exe?
Any help/explanation would be great!
In Notepad++'s find dialog, click on the tab for "replace" (probably obvious, but to be complete). Make sure the radio button for "Regular expression" is checked (again, probably obvious). In the "Find what:" text box enter:
^([^ ]+[ ]+){3}(.*)$
if your file consistently has four fields of information (including the file name) separated by spaces. Explanation: this finds three groups of one or more non-space characters followed by one or more spaces, and then everything else on the line. "Everything else on the line" is captured in group 2 (it is enclosed in the second set of parentheses in the expression); we use this below in the "Replace with:" string. Matching the rest of the line is necessary to advance the search position past the text we want to keep; otherwise, after a replacement, that text could match the expression and be replaced itself.
Alternatively, enter this:
^(.{34})(.*)$
if the consistent pattern in your file is that the file name always starts in the 35th column (both patterns could hold true, in which case you can use either). Explanation: this finds the first 34 characters of each line followed by everything else on the line. See the explanation above for why we want to match everything else on the line. Note that it is not necessary to put ".{34}" in parentheses; I only did this so that in both examples the "Replace with:" text would be group 2.
In the "Replace with:" text box enter \2
Explanation: This tells Notepad++ to replace what we matched with the group 2 subset of what we matched, in other words, "everything else on the line", which in this case is the file name.
Click "Replace"
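The same find/replace can be tried outside the editor. Here is a Python sketch using the first pattern and `\2` on the two sample lines from the question. One caveat, which applies to the Notepad++ pattern as well: `[^ ]` also matches line breaks, so on a line with fewer than four fields the match could run into the next line; `[^ \r\n]` avoids that.

```python
import re

listing = (
    "01/01/2015 08:00 1,000,000 filename.exe\n"
    "01/02/2015 08:30 1,450,000 aDifferentFilename.exe\n"
)

# Three space-terminated fields, then "everything else on the line" in group 2.
FIELDS = re.compile(r"^([^ ]+[ ]+){3}(.*)$", re.MULTILINE)

names_only = FIELDS.sub(r"\2", listing)
print(names_only, end="")
```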
Another option: if the text you want to keep always starts in column 35 (as required for the approach immediately above), you can select the columns of text you want to delete by holding down Ctrl+Alt+Shift while left-clicking and dragging with the mouse. Once the text is selected, press Delete.
You can try matching either three runs of spaces, or assume the comma position is always fixed. Here is something quick and dirty that matches up to the last comma (the .* is greedy) plus the four characters after it (three digits and the space); replace it with nothing:
^(.*,....)

How to perform negative search (or replace) in common text editors

Is there any way to replace all the words/lines that don't match my search query in text editors like Notepad++ or Sublime Text?
For example, I have a document with a few URLs in it. Can I do something that leaves only the URLs in my document? If I wanted to remove the URLs, I could search for them with a regex and replace them with an empty string. But can I do the same thing for the content that doesn't match the regex?
Example:
this is line which I want to remove and can also have special characters in it link % $ [] (0) and here is url: https://google.com one more line with some random garbagee and https://www.example.com
For above text, output should be:
https://google.com
https://www.example.com
In Sublime Text, you can search, hit "Find All", then copy and paste all the matches at once into a new document. This isn't exactly "negative search", but it does accomplish your goal.
With Notepad++ you can do it in two passes. The first pass isolates the wanted text. The second pass removes the unwanted pieces.
First, do a regular expression search for \b(https://[^\s]+)(\s|$) and replace with \r\n\1\r\n. This is a very crude and easy-to-fool URL detector, but it works on the examples you give. The search string looks for "https://" preceded by a word boundary (i.e. \b). That is followed by some non-whitespace characters that are considered part of the wanted text. The last part of the search text looks for either a whitespace character or the end of the line. The wanted part is kept in a capture group for the replace text.
Second, do a regular expression search for ^https:// using the "Mark" tab in the Find window. Select "Bookmark lines", then click "Mark all". (You might like to click "Clear all marks" before clicking "Mark all".) Finally use menu => Search => Bookmark => Remove unmarked lines.
(Checked in Notepad++ version 6.6.9)
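Both passes can be sketched in Python; this uses `\n` instead of `\r\n` and is just as crude a URL detector as the regex above. `extract_urls` is a hypothetical helper name:

```python
import re


def extract_urls(text: str) -> list[str]:
    """Hypothetical helper mirroring the two Notepad++ passes."""
    # Pass 1: put every URL on a line of its own (\n here instead of \r\n).
    isolated = re.sub(r"\b(https://[^\s]+)(\s|$)", r"\n\1\n", text)
    # Pass 2: keep only the lines starting with https:// (Mark + Remove Unmarked Lines).
    return [line for line in isolated.splitlines() if line.startswith("https://")]


sample = ("this is line which I want to remove and can also have special characters "
          "in it link % $ [] (0) and here is url: https://google.com one more line "
          "with some random garbagee and https://www.example.com")
print("\n".join(extract_urls(sample)))
```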
In SynWrite app:
call dialog "Search/ Extract strings"
enter Regex for URL, do "Find"
now press button to copy found URLs to new tab

How do I remove all non-ASCII characters with regex and Notepad++?

I searched a lot, but nowhere does it say how to remove non-ASCII characters in Notepad++.
I need to know what to enter in Find and Replace (with a picture it would be great).
I'd also like to know how to make a whitelist: bookmark all the ASCII words/lines so that the non-ASCII lines stay unmarked.
Or, if the file is so large that I can't select all the ASCII lines, how can I select just the lines containing non-ASCII characters?
This expression will search for non-ASCII values:
[^\x00-\x7F]+
Set 'Search Mode' to 'Regular expression' and click Find Next.
Source: Regex any ASCII character
In Notepad++, if you go to menu Search → Find characters in range → Non-ASCII Characters (128-255) you can then step through the document to each non-ASCII character.
Be sure to check "Wrap around" if you want to loop through the whole document for all non-ASCII characters.
In addition to the answer by ProGM: if you see characters in boxes, like NUL or ACK, and want to get rid of them, those are ASCII control characters (0 to 31). You can find them with the following expression and remove them:
[\x00-\x1F]+
To remove all non-ASCII characters AND ASCII control characters, remove all characters matching this regex (it keeps only printable ASCII, 0x20 to 0x7E, so tabs and line breaks are removed too):
[^\x20-\x7E]+
To remove all non-ASCII characters, replace [^\x00-\x7F]+ with nothing.
To highlight the characters, I recommend using the Mark function in the search window: it highlights the non-ASCII characters and puts a bookmark on each line containing one of them.
If you want to highlight and put a bookmark on the ASCII characters instead, you can use the regex [\x00-\x7F] to do so.
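A Python sketch of the removal and of which lines Mark All would bookmark. The helper names are hypothetical; Python works on decoded characters, so each non-ASCII character counts once regardless of how many bytes it takes in the file.

```python
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text: str) -> str:
    """Replace every run of non-ASCII characters with nothing."""
    return NON_ASCII.sub("", text)


def lines_with_non_ascii(text: str) -> list[int]:
    """1-based numbers of the lines that Mark All would bookmark."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if NON_ASCII.search(line)]


sample = "café\nplain ASCII\n日本語"
print(strip_non_ascii(sample))
print(lines_with_non_ascii(sample))
```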
To keep line breaks:
First pick a placeholder character for line breaks; I used #.
Set the search mode to Extended.
Find \n and replace it with #
Click Replace All
Next:
Set the search mode to Regular expression.
Find: [^\x20-\x7E]+
Leave Replace with empty
Click Replace All
Now set the search mode back to Extended and replace # with \n
:) Now you have a clean ASCII file ;)
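The placeholder trick, sketched in Python. It assumes `#` does not already occur in the text. Note also that `\r` and tabs fall outside `\x20-\x7E`, so they are removed in step 2 and Windows line breaks come out as plain `\n`.

```python
import re


def clean_keep_newlines(text: str, placeholder: str = "#") -> str:
    """Hypothetical helper mirroring the three Replace All steps."""
    # Step 1 (Extended mode): \n -> placeholder (assumes the placeholder is not in the text)
    text = text.replace("\n", placeholder)
    # Step 2 (Regular expression mode): delete everything outside printable ASCII, space to ~
    text = re.sub(r"[^\x20-\x7E]+", "", text)
    # Step 3 (Extended mode): placeholder -> \n
    return text.replace(placeholder, "\n")


print(repr(clean_keep_newlines("café\r\nline\ttwo\n")))
```

In Python itself you could skip the placeholder by excluding `\n` from the class, e.g. `[^\x20-\x7E\n]+`.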
Another good trick is to go into UTF8 mode in your editor so that you can actually see these funny characters and delete them yourself.
Another way...
Install the TextFX plugin if you don't have it already.
Go to the TextFX menu and choose "Zap all non printable characters to #". It will replace each invalid character with three # symbols.
Go to Find/Replace and look for ###. Replace it with a space.
This is nice if you can't remember the regex or don't care to look it up. But the regex mentioned by others is a nice solution as well.
Click View > Show Symbol > Show All Characters to show the [SOH] characters in the file.
Select one of the [SOH] symbols in the file.
Press Ctrl+H to bring up the Replace dialog.
Leave 'Find what:' as is (it contains the selected character).
Change 'Replace with:' to the character of your choice (comma, semicolon, other...).
Click 'Replace All'.
Done and done!
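[SOH] is the control character 0x01, so the same replacement in Python is a plain string replace. `replace_soh` is a hypothetical name and the comma is just the example replacement from above:

```python
SOH = "\x01"  # the control character Notepad++ shows as [SOH]


def replace_soh(text: str, replacement: str = ",") -> str:
    """Hypothetical helper: the equivalent of Replace All on the selected [SOH]."""
    return text.replace(SOH, replacement)


print(replace_soh("a\x01b\x01c"))
```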
In addition to Steffen Winkler's answer:
[\x00-\x08\x0B-\x0C\x0E-\x1F]+
This skips \r, \n AND \t (carriage return, line feed, tab), so line breaks and tabs are kept.
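The same class works in Python to strip control characters while keeping tabs and line breaks. A sketch; the sample string, including the ESC character `\x1b`, is made up:

```python
import re

# C0 control characters except \t (0x09), \n (0x0A) and \r (0x0D)
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B-\x0C\x0E-\x1F]+")


def strip_control_chars(text: str) -> str:
    return CONTROL_CHARS.sub("", text)


print(repr(strip_control_chars("a\x00b\tc\r\nd\x1b[0m")))
```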

Removing empty lines in Notepad++

How can I remove empty lines in Notepad++? I tried a find and replace with the empty lines in "Find what" and nothing in "Replace with", but it did not work; it probably needs a regex.
There is now a built-in way to do this as of version 6.5.2:
Edit -> Line Operations -> Remove Empty Lines or Remove Empty Lines (Containing Blank characters)
You need something like a regular expression, but Extended search mode is enough.
If you want all the lines to end up on a single line, use \r\n. If you simply want to remove empty lines, use \n\r, as Link originally suggested.
Replace either expression with nothing.
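Why `\n\r` works: in a Windows (CRLF) file an empty line appears as `\r\n\r\n`, and deleting the `\n\r` in the middle leaves a single `\r\n`. A Python sketch of the Extended-mode replace, assuming CRLF line endings and truly empty lines (no spaces or tabs); the helper name is hypothetical:

```python
def remove_empty_lines_crlf(text: str) -> str:
    """Extended-mode replace of \\n\\r with nothing (hypothetical helper, CRLF files only)."""
    # "a\r\n\r\nb": deleting the "\n\r" in the middle leaves "a\r\nb".
    return text.replace("\n\r", "")


print(repr(remove_empty_lines_crlf("a\r\n\r\n\r\nb\r\n")))
```

Replacing `\r\n` instead would remove every line break and put everything on one line, as described above.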
There is a plugin that adds a menu entitled TextFX. This menu, which houses a dizzying array of quick text editing options, gives a person the ability to make quick coding changes. In this menu, you can find selections such as Drop Quotes, Delete Blank Lines as well as Unwrap and Rewrap Text
Do the following:
TextFX > TextFX Edit > Delete Blank Lines
TextFX > TextFX Edit > Delete Surplus Blank Lines
In Notepad++:
Press Ctrl+H.
Select Regular expression.
Enter ^[ \t]*$\r?\n in "Find what" and leave "Replace with" empty. This matches every line that contains nothing but spaces or tabs (or nothing at all), together with its line break (\r\n in a Windows file).
Click the Find Next button to see for yourself how it matches only empty lines.
Press ctrl + h (Shortcut for replace).
In the Find what zone, type ^\R (for exactly empty lines) or ^\h*\R (for lines containing only blanks).
Leave the Replace with zone empty.
Check the Wrap around option.
Select the Regular expression search mode.
Click on the Replace All button.
Use the following settings:
Find what: ^\r\n
Replace with: keep this empty
Search Mode: Regular expression
Wrap around: selected
NOTE: for *nix files (LF line endings), find ^\n instead.
This worked for me:
Press ctrl + h (Shortcut for replace)
Write one of the following regex in find what box.
[\n\r]+$ or ^[\n\r]+
Leave Replace with box blank
In Search Mode, select Regex
Click on Replace All
Done!
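In Notepad++ press Ctrl+H, select the "Extended (\n, \r, \t ...)" search mode, type \r\n\r\n (a CR LF followed by an empty line) in the "Find what" box and \r\n in the "Replace with" box. (Replacing \r\n alone with nothing would join all lines into one.)
Finally hit Replace All; repeat it if there are several empty lines in a row.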
Well, I'm not sure about the regex or your situation.
How about Ctrl+A, then the TextFX menu > TextFX Edit > Delete Blank Lines, and voilà, all blank lines are gone.
A side note: this works if the blank lines really are empty, i.e. contain no spaces.
1) Ctrl+H (or Search > Replace...) to open the Replace window.
2) Select 'Search Mode' 'Regular expression'.
3) In 'Find what' type ^(\s*)(.*)(\s*)$ and in 'Replace with' type \2
^ - matches the start of a line
(\s*) - matches whitespace characters (note that \s also matches line breaks)
(.*) - matches any characters
(\s*) - matches whitespace characters
$ - matches the end of a line
\2 - denotes the matched content of the 2nd group
Refer https://www.rexegg.com/regex-quickstart.html for more on regex.
You can search for the following regex: ^(?:[\t ]*(?:\r?\n|\r))+ and replace it with empty field
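This pattern also works unchanged in Python's `re` with `MULTILINE`, which makes for an easy test. One difference to be aware of: Python's `^` only matches after `\n`, so files that use lone `\r` line breaks are not handled by this sketch. `remove_blank_lines` is a hypothetical name.

```python
import re

# One or more lines that contain only tabs/spaces, each with its line break.
BLANK_LINES = re.compile(r"^(?:[\t ]*(?:\r?\n|\r))+", re.MULTILINE)


def remove_blank_lines(text: str) -> str:
    return BLANK_LINES.sub("", text)


print(repr(remove_blank_lines("a\n\n \t\nb\r\n\r\nc\n")))
```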
Ctrl+H
Find: \r\r
Replace with: \r
This obviously does not work if the blank lines contain tabs or blanks. Many web pages (e.g. http://www.guardian.co.uk/) contain these white lines, as a result of a faulty HTML editor.
Remove white space using regular expression as follows:
change pattern: [\t ]+$
into nothing.
where [\t ] matches either tab or space. '+' matches one or more occurrences, and '$' marks the end of line.
Then use notepad++/textFX to remove single or extra empty lines.
Be sure that these blank lines are not significant in the given context.
Edit >> Blank Operations >> Trim Leading and Trailing Spaces (to remove the tabs and spaces in blank lines)
Press Ctrl+H to open the Replace window and replace the pattern ^\r\n with nothing (select Regular expression).
Note: step 1 will also remove your code indentation done with tabs and spaces.
Sometimes \n\r and the like don't work; this approach helps you figure out what your regular expression actually needs to be.
Advantage of this trick: if you want to replace in multiple files at once, you need this method; the approaches above will not work.
CTRL+A, Select the TextFX menu -> TextFX Edit -> Delete Blank Lines as suggested above works.
But if the lines contain some spaces, move the cursor to such a line and press Ctrl+H. The "Find what:" field will show the blank space; leave the "Replace with" field blank and replace all.
Now all the spaces are removed; then try Ctrl+A, TextFX menu > TextFX Edit > Delete Blank Lines.
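\n\r assumes a specific type of line break. To target any blank line you could also use:
^$
This matches any line that has nothing between its start and its end, so it is more of a catch-all for finding blank lines. Note, however, that the match is zero-width: replacing it with an empty string leaves the line break in place, so to delete the lines the line break has to be part of the match (e.g. ^\r\n or ^\n, as in other answers here).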
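A quick check in Python's `re`, used here only as a stand-in for Notepad++'s engine, shows why the line break matters: `^$` matches an empty line without consuming any characters, so replacing the match with nothing changes nothing.

```python
import re

text = "first\n\nsecond\n"

# ^$ only finds the empty line: the match is zero-width, so the text is unchanged.
unchanged = re.sub(r"^$", "", text, flags=re.MULTILINE)

# Including the line break in the match is what actually deletes the empty line.
cleaned = re.sub(r"^\n", "", text, flags=re.MULTILINE)

print(repr(unchanged))
print(repr(cleaned))
```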
I did not see the combined one as answer, so search for ^\s+$ and replace by {nothing}
^\s+$ means
^ start of line
\s+ matches at least one whitespace character (spaces, tabs, line breaks)
$ until end of line
This pattern is tested in Notepad++ v8.1.1
It replaces all spaces/tabs/blank lines before and after each row of text.
It shouldn't mess with anything in the middle of the text.
Find: ^(\s|\t)+|(\s|\t)+$
Replace: leave this blank
Before (\t, \s and \r\n stand for a tab, a space and a line break):
_____________________________________
\tWORD\r\n
\r\n
\tWORD\s\tWORD\s\t\r\n
\r\n
\r\n
WORD\s\s\tWORD\t\sWORD\s\r\n
\t\r\n
\s\s\s\r\n
WORD\s\sWORD\s\s\t\r\n
____________________________________
After:
_____________________________________
WORD\r\n
WORD\s\tWORD\r\n
WORD\s\s\tWORD\t\sWORD\r\n
WORD\s\sWORD
_____________________________________
A few of the above expressions and extended expressions did not work for me, but the regular expression "$\n$" did.
An easy alternative for removing white space from empty lines:
TextFX>TextFX Edit> Trim Trailing Spaces
This will remove all trailing spaces, including trailing spaces in blank lines.
Make sure, no trailing spaces are significant.
This works for me:
SEARCH:^\r
REPLACE: (empty)