Removing duplicates with different endings in notepad++

Hi, how do I remove duplicates that have different endings?
I have a big list like this:
1.2.3.4:12345
1.2.3.4:54321
1.2.3.4:41873
1.2.3.4:48138
I want to remove all of them except the first one 1.2.3.4:12345. Is this possible?

Here's a way to remove duplicate lines for all values preceding the ":" in the file.
Search for:
([^:]*)(:[0-9]+)\r\n(.*)^\1:\w+(\r\n|\Z)
Replace with:
\1\2\r\n\3
Make sure Search Mode is "Regular Expression" and ". matches newline" is checked.
A single pass does not remove every duplicate, so you will have to click "Replace All" until no matches are found. Alternatively, record one iteration of this as a macro and run it as many times as necessary.
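If the list can be processed outside Notepad++, the same result can be sketched in a few lines of Python. This is only an illustration, assuming that the text before the first ":" is what identifies a duplicate; the function name keep_first_per_prefix is made up for this example.

```python
def keep_first_per_prefix(lines, sep=":"):
    """Keep only the first line seen for each prefix before `sep`.

    Hypothetical helper: assumes the text before the first `sep`
    is what makes two lines duplicates of each other.
    """
    seen = set()
    kept = []
    for line in lines:
        prefix = line.split(sep, 1)[0]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(line)
    return kept


sample = [
    "1.2.3.4:12345",
    "1.2.3.4:54321",
    "1.2.3.4:41873",
    "1.2.3.4:48138",
]
print(keep_first_per_prefix(sample))  # ['1.2.3.4:12345']
```

Unlike the repeated Replace All, this needs only one pass, because the set remembers every prefix already seen.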

Related

Match all spaces after a particular string

I'm doing a find and replace in VS Code across a large number of files. I want to replace all spaces after a set of quotes, but only on a specific line.
I can very easily find all spaces using \s+, but I don't understand how to capture only the spaces after a specific string (on one specific line). I've tried positive lookbehinds, but they only match the first space, and I need to match all spaces on that line.
Example code:
variable = "01 - Testing this thing"
I need to find and replace all the spaces between the quotation marks with underscores, but I can't get any regex to match all the spaces between the quotes. I might want to replace the dash (-) as well, but the spaces are more important and I'm struggling to figure it out.

Here is a pretty good workflow.
Open a Search Editor (from the Command Palette or set a keybinding to it).
Use this regex:
(?<=variable = ")[^"]*
That will find all matches in all files in your workspace or whatever folders you designate in the file to include filter. I suggest setting the context lines option to 0.
Ctrl+Shift+L to select all your matches. The matches are the 01 - Testing this thing part.
Now do a regular find in that search editor tab - with the Find in Selection option enabled.
Simply doing a find of a single space and a Replace All with _ will make all those changes (in the Search Editor only).
To apply those changes to all the files in your initial search results, use the Apply Search Editor Changes... command from the search-editor-apply-changes extension.
Then you can check to see if the changes were as you expected and save all. It will open all affected files so you can inspect them.
It seems like a lot of steps, but notice that the first regex can be very simple, and after that you are doing a simple find/replace in just those selections.

Search for a string on the variable line that has a space between the quotes. Replace it with the text before and after the space, with the space itself turned into an underscore. You have to apply this as many times as the maximum number of spaces in a string; it can't be done in one regex search-and-replace.
In the Search Bar
Find Regex:
(variable = "[^" ]*) ([^"]*")
Replace:
$1_$2
Then click Replace All (button) and Refresh (button) repeatedly until no more matches are found.
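For comparison, a scripting language can do this in one pass by using a replacement function instead of a replacement string. The following is a rough Python sketch, not a VS Code feature; the function name underscore_quoted_value is invented here, and it assumes the line looks exactly like the example in the question.

```python
import re

# Group 1 is the `variable = "` prefix, group 2 the quoted value, group 3 the closing quote.
PATTERN = re.compile(r'(variable = ")([^"]*)(")')


def underscore_quoted_value(text):
    """Replace every space inside the quoted value with an underscore."""
    def fix(match):
        return match.group(1) + match.group(2).replace(" ", "_") + match.group(3)

    return PATTERN.sub(fix, text)


line = 'variable = "01 - Testing this thing"'
print(underscore_quoted_value(line))
# variable = "01_-_Testing_this_thing"
```

The regex only locates the quoted value; the plain string replace inside the callback handles any number of spaces, which is why no repetition is needed.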

Why doesn't my non-greedy match work in vim?

This is test
There are two tabs (\t) in this line. I want to get rid of the part from the beginning to the first tab key, which is "This ", and I used the following pattern:
:s/.\{-}\t//g
It says it can't find the pattern. If I use the following, both tabs are replaced, which isn't what I want. Why doesn't the first pattern work?
:s/.*\t//g

Your first attempt does not work because you are matching the fewest number of any character followed by a tab. The fewest number of any character is zero (0). So both of your tabs match without any other characters.
Based on the comments, the above explanation was incorrect.
Here is one possible solution.
:s/^[^\t]*\t//
This starts at the beginning of the line (^) and matches any number of non-tab characters ([^\t]*) until it reaches the first tab (\t).

Your pattern /.\{-}\t didn't work because of the g flag in the :s command. This flag enables global matching so it matches twice. Just remove the flag and it will work. In addition, when deleting something you can omit the replacement part in :s:
:s/.\{-}\t
The full :s/.\{-}\t// is fine as well. Note that in either case it should not say "pattern not found" as you described. If you see that message, there is something else different between your example and your actual text.
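The effect of the g flag can be reproduced in other regex engines too. Here is a small Python sketch of the three cases; Python's re module is not Vim's engine, so treat it as an analogy only. It assumes the sample line is "This", "is" and "test" separated by single tabs.

```python
import re

# Assumed sample: three words separated by single tab characters.
line = "This\tis\ttest"

# Non-greedy, first match only: comparable to :s/.\{-}\t// without the g flag.
first_only = re.sub(r".*?\t", "", line, count=1)

# Non-greedy, every match: comparable to adding the g flag.
every_match = re.sub(r".*?\t", "", line)

# Greedy: .* runs up to the last tab, so a single match removes both prefixes.
greedy = re.sub(r".*\t", "", line)

print(repr(first_only))   # 'is\ttest'
print(repr(every_match))  # 'test'
print(repr(greedy))       # 'test'
```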

Limit g flag in regex substitution to part of line?

I have a file like this:
"File_name_1.dat" "File_name_1.dat"
"File_name_2.dat" "File_name_2.dat"
"Some_other_thing.dat" "Some_other_thing.dat"
Is there a regex technique I can use to replace the underscores in only the second file name on each line, like this?
"File_name_1.dat" "File name 1.dat"
"File_name_2.dat" "File name 2.dat"
"Some_other_thing.dat" "Some other thing.dat"
I tried matching the column (\%XXc in Vim), but it seems to disable the g flag.
This only replaces the first underscore after column 25:
:%s/\%25c\([^_]*\)\zs_/ /g
This only replaces the last underscore in the line:
:%s/\%25c\(.*\)\zs_/ /g
I know I could repeat that command until they're gone, but I was wondering if there is a slicker way to do it.

Yes, there is an easy way to do this with visual selections. It's convenient that your data is laid out nicely, otherwise this wouldn't work.
Visually select all of the second filenames
Run this regex:
:'<,'>s/\%V_/ /g
The \%V will restrict your substitute to the inside of the current visual selection.

There are probably many ways to do this. Since the data is formatted nicely, I would probably visually select and delete the first column (with <c-v>), then run :%s/_/ /g, then paste back the first column.
If you really wanted to do this in a single regex, you would need to use a lookbehind:
:%s/\(\%25c.\{-}\)\@<=_/ /g
Where \@<= matches with zero width if the preceding atom matches just before what follows. See :help \@<=
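Outside Vim, the same restriction can be expressed by splitting each line into its two quoted names and editing only the second. Below is a hedged Python sketch; the helper name spaces_in_second_name is hypothetical, and it assumes every line holds exactly two double-quoted names separated by whitespace.

```python
import re

# Assumes each line holds exactly two double-quoted names separated by whitespace.
LINE = re.compile(r'^("[^"]*"\s+)("[^"]*")$')


def spaces_in_second_name(line):
    """Replace underscores with spaces in the second quoted name only."""
    match = LINE.match(line)
    if match is None:
        return line  # leave lines that do not fit the assumed layout untouched
    return match.group(1) + match.group(2).replace("_", " ")


rows = [
    '"File_name_1.dat" "File_name_1.dat"',
    '"Some_other_thing.dat" "Some_other_thing.dat"',
]
for row in rows:
    print(spaces_in_second_name(row))
# "File_name_1.dat" "File name 1.dat"
# "Some_other_thing.dat" "Some other thing.dat"
```

This does not depend on the second name starting at a fixed column, so it also works when the names have different lengths.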

Remove everything before and after variable=int

I'm terrible at regex and need to remove everything from a large portion of text except for a certain variable declaration that occurs numerous times. I'd like to remove everything except for instances of mc_gross=anyint.

Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (not sure about notepad++, and even then probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using notepad++, I'd recommend selecting your target, then inverting the selection.
This will select each instance, allowing for optional white space either side of the '=' sign.
mc_gross\s*=\s*\d+
The following answer over on super user explains how to use bookmarks in notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex they're using over there, with the one above.

You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained. In the search the ^.*\b matches from start-of-line to a word boundary before the wanted text; the \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13 then delete the \bs from the search.
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.

Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
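When the text can be handled by a script instead of the editor, extracting the matches directly avoids having to describe "everything else". A minimal Python sketch follows; the function name extract_mc_gross and the sample text are made up for illustration, and the pattern is the mc_gross\s*=\s*\d+ one from the answers above, with word boundaries added.

```python
import re

# Same pattern as in the answers above, with \b word boundaries on both sides.
TARGET = re.compile(r"\bmc_gross\s*=\s*\d+\b")


def extract_mc_gross(text):
    """Return every mc_gross=<int> occurrence, in order of appearance."""
    return TARGET.findall(text)


# Invented sample text, only to show the behaviour.
sample = "payer_id=ABC&mc_gross=13&tax=0\nfoo mc_gross = 250 bar"
print(extract_mc_gross(sample))  # ['mc_gross=13', 'mc_gross = 250']
```

Because of the word boundaries, text such as abcmc_gross=13def is not matched; drop the \b parts of the pattern if it should be.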

Regex: remove lines not starting with a digit

I have been fighting this problem with the help of a regex cheat sheet, trying to figure out how to do this, but I give up... I have a lengthy file open in Notepad++ and would like to remove all lines that do not start with a digit (0..9). I would use the Find/Replace functionality of N++. I am only mentioning this because I am not sure which regex implementation N++ is using... Thank you
Example. From the following text:
1hello
foo
2world
bar
3!
I would like to extract
1hello
2world
3!
not the same lines with empty lines left behind where the removed lines were:
1hello
2world
3!
by doing a find/replace on a regular expression.

You can clear those lines with ^[^0-9].* but it will leave blank lines.
Notepad++ uses Scintilla, and also uses its regex engine to match those. From the Scintilla documentation:
"\r and \n are never matched because in Scintilla, regular expression searches are made line per line (stripped of end-of-line chars)."
http://www.scintilla.org/SciTERegEx.html
To clear those blank lines, the only way is to choose Extended search mode and replace \n\n with \n. If the file uses Windows line endings, replace \r\n\r\n with \r\n.

[^0-9] is a regular expression that matches pretty much anything, except digits. If you say ^[^0-9] you "anchor" it to the start of the line, in most regular expression systems. If you want to include the rest of the line, use ^[^0-9].+.

^[^\d].* marks a whole line whose first character is not a digit. Check if there are really no whitespaces in front of the digits. Otherwise you'd have to use a different expression.
UPDATE:
You will have to do it in two steps. First empty the lines that do not start with a digit. Then remove the empty lines in extended mode.

One could also use the technique of bookmarking in Notepad++. I started benefiting from this feature (long time present but only more recently made somewhat more visible in the UI) not very long ago.
Simply bring up the find dialogue, type regex for lines not starting with digit ^\D.*$ and select Mark All. This will place blue circles, like marbles, in the left gutter - these are line bookmarks. Then just select from main menu Search -> Bookmark -> Remove bookmarked lines.
Bookmarks are cool, you could extract these lines by simply selecting to copy bookmarked lines, opening new document and pasting lines there. I sometimes use this technique when reviewing log files.

I'm not sure what you are asking, but the regexp for finding the lines with a digit at the beginning would be
^\d.*
You can keep all the lines that match the above, or alternatively remove all the lines that match this expression:
^[^\d].*
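If the file can be filtered outside the editor, keeping the wanted lines avoids the leftover blank lines altogether. Here is a short Python sketch under that assumption; the function name keep_digit_lines is hypothetical, and it treats only the ASCII digits 0 to 9 as digits.

```python
import re

# A line qualifies when its very first character is an ASCII digit.
STARTS_WITH_DIGIT = re.compile(r"^[0-9]")


def keep_digit_lines(text):
    """Return only the lines whose first character is a digit."""
    return "\n".join(
        line for line in text.splitlines() if STARTS_WITH_DIGIT.match(line)
    )


sample = "1hello\nfoo\n2world\nbar\n3!"
print(keep_digit_lines(sample))
# 1hello
# 2world
# 3!
```

Lines with leading whitespace before the digit are dropped by this version, which mirrors the caveat about whitespace mentioned in one of the answers above.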
