Regex to match what is not matched

I have a regex to search through just under 2 million product numbers: -([A-Za-z0-9]{1,5})$ to match the MFG code (the last few characters after the last dash). For example, G4F,XB-RJG4 SJG2G-TRMH would match -TRMH. This was supposed to match every string on my list; however, I am a couple thousand short. This probably means that some were formatted wrong.
What could I do to match a string that doesn't end in -XXXXX, -XXXX, -XXX, or -XX, or in other words, match what is not matched?

Just two steps:
1. In the search dialog, tick "Bookmark line".
2. After the search is done, click "Search -> Bookmark -> Inverse Bookmark".
Alternatively, in step 2: "Search -> Bookmark -> Remove bookmarked lines"; afterwards, only the lines that didn't match the regular expression remain.
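
Outside the editor, the same "keep what did not match" idea can be sketched in a few lines of Python. This is only an illustration: the function name and the sample product numbers other than the one from the question are made up, and the quantifier is written as {1,5}.

```python
import re

# The MFG-code pattern from the question: a dash, 1 to 5 alphanumerics, end of string.
MFG_SUFFIX = re.compile(r"-([A-Za-z0-9]{1,5})$")


def unmatched(lines):
    """Return the lines that do NOT end in a dash followed by 1-5 alphanumerics."""
    return [line for line in lines if not MFG_SUFFIX.search(line)]


if __name__ == "__main__":
    # Hypothetical sample data; only the first entry comes from the question.
    sample = ["G4F,XB-RJG4 SJG2G-TRMH", "ABC123", "X-TOOLONG", "Y-"]
    for line in unmatched(sample):
        print(line)
```

Filtering on "search failed" avoids having to write a negated regex at all, which is the same trick the inverse bookmark uses.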

Related

How to stop matching if the condition is satisfied?

The goal is to remove path segments (split by '/') that consist of a single letter, AND if one such segment appears, to remove everything to its right as well.
For example:
/modadisi/v/list -> /modadisi
/i/m/videos/tnt -> null
New examples:
/abcd/abcd/abcd/a/abcd -> /abcd/abcd/abcd
/abcd -> /abcd
/abcd/abcd/abcd -> /abcd/abcd/abcd
The current regex I use is
\/[a-zA-Z]{2,}
This matches every qualifying segment anywhere in the string, so /modadisi/v/list becomes /modadisi/list. Is it possible to modify the regex to scan from left to right, and stop as soon as the condition is met?
Based on your new examples, just anchor the pattern to the start of the string using ^, and put the pattern inside a group that repeats. The full pattern would be ^(\/[a-zA-Z]{2,})*.
For the inputs:
/modadisi/v/list
/i/m/videos/tnt
/abcd/abcd/abcd/a/abcd
/abcd
/abcd/abcd/abcd
it produces:
/modadisi
{nothing}
/abcd/abcd/abcd
/abcd
/abcd/abcd/abcd
If any of this isn't right, let me know and I will adjust the pattern.
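
As a quick check, here is a small Python sketch of the anchored pattern applied to the inputs listed above. The function name is hypothetical; the pattern is the one from the answer, with the slash left unescaped because Python does not need the escape.

```python
import re

# Anchored, repeating group: consume leading segments of 2+ letters and stop
# at the first segment that does not qualify.
PREFIX = re.compile(r"^(/[a-zA-Z]{2,})*")


def keep_leading_segments(path):
    """Return the longest prefix made only of segments with 2 or more letters."""
    return PREFIX.match(path).group(0)


if __name__ == "__main__":
    for p in ["/modadisi/v/list", "/i/m/videos/tnt", "/abcd/abcd/abcd/a/abcd"]:
        print(repr(keep_leading_segments(p)))
```

Because the group may repeat zero times, the pattern always matches, and an empty string stands for the "nothing" case.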

Remove duplicate lines based on a search in notepad++

I have a text file that contains thousands of lines of text as below.
aaaa "test "
aa "test "(version 2)
bbbb "test "(version 4)
bbbbb "test1 "(with heads)
abs "test1 "
absc "test3"
I would like to be able to remove all the duplicates based on a search and keep only the first line (in my case all lines with the same value between the quotation marks)
EDIT : More details about how I detect that a line is a duplicate of another :
I check the value between the quotation marks. On the first 3 lines the value between quotation marks is "test ", so I want to keep the first line with this value and remove the others. For lines 4 and 5 the value is "test1 ", so I keep only line 4 and remove the other.
So after cleaning my text file would have this form
aaaa "test "
bbbbb "test1 "(with heads)
absc "test3"
I tried to use this regular search in notepad++
(.\".*?")
But I don't know how to use it to find duplicates and remove the other lines with the same value. I already checked other users' cases but I couldn't find a solution.
I would solve it in several steps.
1. prepend line numbers
2. put the quoted text in front
3. sort; lines with the same quoted text now sit next to each other, and within each group in the original order thanks to the line numbers from step 1
4. remove "duplicates"
5. remove the inserted quoted text from step 2
6. sort by the line number from step 1
7. remove the line numbers from step 1
Now the detailed explanation:
1. Prepend line numbers: use Edit -> Column Editor in the first column, two times:
   first insert text (some delimiter that does not occur in the file, e.g. | or : )
   then insert numbers: start with 1, increment by 1, use leading zeros
   Now each line should start with a line number and a delimiter.
2. Prepend the quoted text: use a regexp replace
   Find what: ^([^"]*)("[^"]+")(.*)$
   Replace with: \2\1\2\3
   Now your lines should start with the quoted text.
3. Sort: use Edit -> Line Operations -> Sort ...
4. Remove duplicates: with a regexp replace
   Find what: ("[^"]+")(.*)\n\1.*
   Replace with: \1\2
   Use Replace All, and repeat it until nothing more is replaced: a single pass can leave duplicates behind when a value occurs more than twice.
5. Remove the texts from step 2: using a regexp replace
   Find what: ^"[^"]+"
   Replace with: nothing, i.e. leave empty
6. Sort by the original line numbers: use Edit -> Line Operations -> Sort ...
7. Remove the line numbers from step 1: using a regexp replace
   Find what: ^(.*\|) (use \| or whatever you used in step 1 as delimiter)
   Replace with: nothing, i.e. leave empty
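
If running a script is an option, the same result can be had without the sort-and-restore round trip. The following is a minimal Python sketch, not part of the Notepad++ procedure: the function name is made up, and it assumes the key is the first "..." group on each line and that lines without quotes should be kept untouched.

```python
import re

QUOTED = re.compile(r'"[^"]+"')


def dedupe_by_quoted(lines):
    """Keep only the first line for each distinct quoted value, preserving order."""
    seen = set()
    kept = []
    for line in lines:
        m = QUOTED.search(line)
        if m is None:
            kept.append(line)  # assumption: lines without a quoted value are kept
            continue
        key = m.group(0)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [
        'aaaa "test "',
        'aa "test "(version 2)',
        'bbbb "test "(version 4)',
        'bbbbb "test1 "(with heads)',
        'abs "test1 "',
        'absc "test3"',
    ]
    print("\n".join(dedupe_by_quoted(sample)))
```

A set of already-seen keys replaces the sorting steps, so the original line order never has to be reconstructed.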

Seeking regex in Notepad++ to search and replace CRLF between two quotation marks ["] only

I've got a CSV file with some 600 records where I need to replace some [CRLF] with a [space] but only when the [CRLF] is positioned between two ["] (quotation marks). When the second ["] is encountered then it should skip the rest of the line and go to the next line in the text.
I don't really have a starting point. Hope someone comes up with a suggestion.
Example:
John und Carol,,Smith,,,J.S.,,,,,,,,,,,,,+11 22 333 4444,,,,,"streetx 21[CRLF]
New York City[CRLF]
USA",streetx 21,,,,New York City,,,USA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Normal,,My Contacts,[CRLF]
In this case the two [CRLF] after the first ["] need to be replaced with a space [ ]. When the second ["] is encountered, skip the end of the line and go to next line.
Then again, now on the next line, after the first ["] is encountered replace all [CRLF] until the second ["] is encountered. The [CRLF]s vary in numbers.
In the CSV-file the amount of commas [,] before (23) and after (65) the 2 quotation marks ["] is constant.
So maybe a comma counter could be used. I don't know.
Thanks for feedback.
This will work using one regex only (tested in Notepad++):
Enter this regex in the Find what field:
((?:^|\r\n)[^"]*+"[^\r\n"]*+)\r\n([^"]*+")
Enter this string in the Replace with field:
$1 $2
Make sure the Wrap around check box (and Regular expression radio button) are selected.
Do a Replace All as many times as required (until the "0 occurrences were replaced" dialog pops up).
Explanation:
(
(?:^|\r\n) Begin at start of file or before the CRLF before the start of a record
[^"]*+ Consume all chars up to the opening "
" Consume the opening "
[^\r\n"]*+ Consume all chars up to either the first CRLF or the closing "
) Save as capturing group 1 (= everything in record before the target CRLF)
\r\n Consume the target CRLF without capturing it
(
[^"]*+ Consume all chars up to the closing "
" Consume the closing "
) Save as capturing group 2 (= the rest of the string after the target CRLF)
Note: The *+ is a possessive quantifier. Use them appropriately to speed up execution.
Update:
This more general version of the regex will work with any line break sequence (\r\n, \r or \n):
((?:^|[\r\n]+)[^"]*+"[^\r\n"]*+)[\r\n]+([^"]*+")
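
For comparison, here is a hedged Python sketch of the same clean-up done in a single pass. It is not the Notepad++ regex above: it simply finds every "..." span and collapses the line breaks inside it. The function name is hypothetical, and it assumes quotes in the file are balanced.

```python
import re

QUOTED_FIELD = re.compile(r'"[^"]*"')
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def join_quoted_linebreaks(text):
    """Replace each line break inside a double-quoted span with a single space."""
    return QUOTED_FIELD.sub(lambda m: LINE_BREAK.sub(" ", m.group(0)), text)


if __name__ == "__main__":
    # Shortened, made-up record in the shape of the example from the question.
    raw = 'John,"streetx 21\r\nNew York City\r\nUSA",Normal,\r\nJane,"x",Normal,\r\n'
    print(repr(join_quoted_linebreaks(raw)))
```

Line breaks outside the quotes are left alone, so the record separators survive.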
Maybe do it in three steps (assuming you have 88 fields in the CSV, because you said there are 23 commas before, and 65 after each second ")
Step 1: replace all CR/LF with some character not anywhere in the file, like ~
Search: \r\n Replace: ~
Step 2: replace all ~ after every 88th 'comma group' (or however many fields in CSV) with \r\n -- to reinsert the required CSV linebreaks:
Search: ((?:[^,]*?,){88})~ Replace: $1\r\n
Step 3: replace all remaining ~ with space
Search ~ Replace: <space>
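
The three steps can be tried out in Python before touching the real file. This is a sketch under assumptions: the function and parameter names are invented, every record is assumed to end with a comma right before its line break (as in the example record), and no field contains a comma.

```python
import re


def rejoin_records(text, commas_per_record, placeholder="~"):
    """Three-step clean-up: flatten, restore record breaks, blank the rest."""
    if placeholder in text:
        raise ValueError("placeholder must not occur in the input")
    # Step 1: replace every CRLF with the placeholder.
    flat = text.replace("\r\n", placeholder)
    # Step 2: after each group of N comma-terminated fields, turn the
    # placeholder back into a real line break.
    pattern = r"((?:[^,]*?,){%d})%s" % (commas_per_record, re.escape(placeholder))
    restored = re.sub(pattern, lambda m: m.group(1) + "\r\n", flat)
    # Step 3: any placeholder still left was inside a field; make it a space.
    return restored.replace(placeholder, " ")


if __name__ == "__main__":
    # Tiny made-up input with 3 commas per record instead of 88.
    print(repr(rejoin_records('a,"x\r\ny",b,\r\nc,"z",d,\r\n', 3)))
```

For the file in the question, commas_per_record would be 88.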
In this case the source data is generated by the export function in GMail for your contacts.
After the modification outlined below (without RegEx) the result can be used to tidy up your contacts database and re-import it to GMail or to MS Outlook.
Yes, I am standing on the shoulders of @alan and @robinCTS. Thank you both.
Instructions in 5 steps:
use Notepad++ / find replace / extended search mode / wrap around = on
-1- replace all [CRLF] with a unique set of characters or a string (I used [~~])
find: \r\n and replace with: ~~
The file contents are now on one line only.
-2- Now we need to separate the header line. For this, move to where the first record starts, exactly before the 88th comma (including the word after the 87th comma [,]), and enter the [CRLF] manually by hitting the return key. There are two lines now: header and records.
-3- now find all [,~~] and replace with [,\r\n] The result is one record per line.
-4- remove the remaining [~~] find: ~~ and replace with: [ ] a space.
The file is now clean of unwanted [CRLF]s.
-5- Save the file and use it as intended.

Regex taking whole line instead of exact words

I'm trying to match a list of different items in a text. I created a regex but it matches the whole string instead of each separate item.
This is my current regex:
\[[a-zA-Z]\](.*)\. {1}
My test text:
[step 1] test blahblah blah [A] test item 1. [B] test item 2.
The current regex matches:
[A] test item 1. [B] test item 2.
In 1 string instead of 2 matches.
I think you want to have non-greedy behaviour:
\[[a-zA-Z]\](.*?)\. {1}
Note the question mark (?). It says that the expression coming right before it should match as little as possible to fulfill the expression. Basically, it makes it stop before the first dot and not the last one.
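
The difference is easy to see by running both patterns side by side. A small Python sketch follows; the helper name is made up, and the test text has a trailing space added, because the pattern requires a space after each dot.

```python
import re

# Assumption: a trailing space is added so the last item also ends in ". ".
TEXT = "[step 1] test blahblah blah [A] test item 1. [B] test item 2. "

GREEDY = re.compile(r"\[[a-zA-Z]\](.*)\. {1}")
LAZY = re.compile(r"\[[a-zA-Z]\](.*?)\. {1}")


def all_matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(all_matches(GREEDY, TEXT))  # one long match spanning both items
    print(all_matches(LAZY, TEXT))    # one match per item
```

Note that "[step 1]" never matches, since the bracket part only accepts a single letter.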

Substitute the n-th occurrence of a word in vim

I saw other questions dealing with finding the n-th occurrence of a word/pattern, but I couldn't find how you would actually substitute the n-th occurrence of a pattern in vim. There's the obvious way of hard coding all the occurrences like
:s/.*\(word\).*\(word\).*\(word\).*/.*\1.*\2.*newWord.*/g
Is there a better way of doing this?
For information,
s/\%(\(pattern\).\{-}\)\{41}\zs\1/2/
also works to replace the 42nd occurrence. However, I prefer the solution given by John Kugelman, which is simpler -- even if it will not limit itself to the current line.
You can do this a little more simply by using multiple searches. The empty pattern in the :s/pattern/repl/ command means replace the most recent search result.
:/word//word//word/ s//newWord/
or
:/word//word/ s/word/newWord/
You could then repeat this multiple times by doing @:, or even 10@: to repeat the command 10 more times.
Alternatively, if I were doing this interactively I would do something like:
3/word
:s//newWord/r
That would find the third occurrence of word starting at the cursor and then perform a substitution.
Replace each Nth occurrence of PATTERN in a line with REPLACE.
:%s/\(\zsPATTERN.\{-}\)\{N}/REPLACE/
To replace the Nth occurrence of PATTERN in a line in vim: in addition to the above answer, I just wanted to explain how the pattern matching actually works, for easier understanding.
So I will be discussing the \(.\{-}\zsPATTERN\)\{N} solution.
The example I will be using is replacing the second run of one or more spaces in a sentence (string).
The pieces of the pattern are:
\zs - matches at any position and sets the start of the match there: the next character is the first character of the whole match. (This is the pattern atom \zs, not the normal-mode zs command that scrolls the text horizontally.)
.\{-} - 0 or more, as few as possible
Here . matches any character and \{} gives the number of times.
e.g. ab\{2,3}c matches where b comes either 2 or 3 times.
The greedy counterpart is .* (0 or more, as many as possible); being greedy, it can run past the earlier occurrences, so the non-greedy form is the one to use here.
According to the vim non-greedy docs, "\{-}" is the same as "*" but uses the shortest-match-first algorithm.
\{N} -> matches N of the preceding atom
/\<\d\{4}\> searches for exactly 4 digits, same as /\<\d\d\d\d\>
(Ignore the \< \> here; they are for whole-word searching, e.g. \<fred\> will only find fred, not alfred.)
\( \) groups the whole pattern.
PATTERN here is the pattern you are matching -> \s\{1,} (\s is whitespace, and \{1,} as explained just above means 1 or more)
"abc substring def"
:%s/\(.\{-}\zs\s\{1,}\)\{2}/,/
OUTPUT -> "abc substring,def"
# explanation: the first space is between abc and substring, and the second
# occurrence of the pattern is between substring and def, hence that one
# is replaced by the "," given in the replace command above.
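
To sanity-check what "replace the Nth occurrence" should produce, the same operation can be sketched outside vim. This Python helper is hypothetical and uses Python regex syntax, not vim's; the replacement is inserted literally.

```python
import re


def replace_nth(text, pattern, repl, n):
    """Replace only the n-th (1-based) match of pattern in text with repl."""
    for i, m in enumerate(re.finditer(pattern, text), start=1):
        if i == n:
            return text[:m.start()] + repl + text[m.end():]
    return text  # fewer than n matches: leave the text unchanged


if __name__ == "__main__":
    print(replace_nth("abc substring def", r"\s+", ",", 2))
```

Counting matches explicitly plays the role of the \{N} repetition, and slicing at the match boundaries plays the role of \zs.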
This answers your actual question, but not your intent.
You asked about replacing the nth occurrence of a word (but seemed to mean "within a line"). Here's an answer for the question as asked, in case someone finds it like I did =)
For weird tasks (like needing to replace every 12th occurrence of "dog" with "parrot"), I like to use recursive recordings.
First blank the recording in register q:
qqq
Now start a new recording in q:
qq
Next, manually do the thing you want to do (using the example above, replace the 12th occurrence of "dog" with "parrot"):
/dog
nnnnnnnnnnn
Delete "dog" and get into insert mode:
diwi
Type parrot, then press <esc> to leave insert mode:
parrot<esc>
Now play your currently empty @q recording:
@q
which does nothing.
Finally, stop recording:
q
Now your recording in q calls itself at the end. But because it calls the recording by name, it won't be empty anymore. So, call the recording:
@q
It will replay the recording, then at the end, as the last step, replay itself again. It will repeat this until the end of the file.
TL;DR:
qqq
qq
/dog
nnnnnnnnnnndiwiparrot<esc>
@q
q
@q
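
For reference, the effect of the recursive recording (every 12th "dog" becomes "parrot") can be written as a short Python sketch. The function name is made up, and unlike diw it replaces only the matched text rather than the whole word under the cursor.

```python
import re
from itertools import count


def replace_every_nth(text, pattern, repl, n):
    """Replace every n-th match of pattern (the n-th, 2n-th, ...) with repl."""
    counter = count(1)

    def _sub(match):
        # Keep the match as-is unless its running index is a multiple of n.
        return repl if next(counter) % n == 0 else match.group(0)

    return re.sub(pattern, _sub, text)


if __name__ == "__main__":
    print(replace_every_nth("dog " * 6, "dog", "parrot", 3))
```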
Well, if you do /gc then you can count the number of times it asks you for confirmation, and go ahead with the replacement when you get to the nth :D