Removing duplicate strings/words (not lines) using RegEx (Notepad++)

I would like to know a way to remove duplicate words or strings (not lines) in a text file using the Notepad++ regex Find tool.
I have only seen ways to remove duplicate lines using TextFX, and that is not what I am looking for.
Example:
123 / 789
123 / 321
Removing the duplicate 123 would result in
123 / 789
/ 321

I'm not familiar with Notepad++, but assuming it uses standard syntax, replace
\b(\w+)\b([\w\W]*)\b\1\b
with
$1$2
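The same idea can be tried outside Notepad++. Below is a minimal Python sketch, assuming Python's re module (the function name remove_duplicate_words is made up for illustration). It uses a lazy [\w\W]*? so that each match ends at the nearest repeat, and it loops because a single pass removes only one repeat per match.

```python
import re

# Same pattern as above, but with a lazy quantifier so a match stops
# at the nearest repeat of the captured word.
DUPLICATE = re.compile(r"\b(\w+)\b([\w\W]*?)\b\1\b")

def remove_duplicate_words(text):
    """Keep the first occurrence of each word and delete later repeats."""
    previous = None
    # One pass drops a single repeat per match, so repeat until stable.
    while previous != text:
        previous = text
        text = DUPLICATE.sub(r"\1\2", text)
    return text

print(remove_duplicate_words("123 / 789\n123 / 321"))
```

In Notepad++ the equivalent of the loop is to press Replace All again until no more replacements are reported.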

Related

How to pad zeroes into String using regex - using only regex101.com

I need to come up with "Regular expression" and a "Substitute" to pad any string that's shorter than 10 characters with zeros. It has to work on regex101.com, PHP flavor. This is all I need.
Example Input:
123
12345
1234567891
Expected output:
0000000123
0000012345
1234567891
I wish it were as simple as searching for ([0-9]{1,9}) and replacing it with 000000000$1, but obviously the string would then exceed 10 characters. So I am trying lookahead syntax, but with no luck.

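As you mentioned in the comments below your question, I provided a .NET method that uses a catalog to pad a string with regex, without a conditional replacement (see my answer here).
That answer can be adapted to PCRE by using a branch reset group (?|...), which, together with the J option (duplicate group names), lets every alternative capture into the same named group x.
See regex in use here. Each lookahead reads the zeros for a given length from a catalog line (the length, one or more tabs, then the zeros) that has to appear later in the input.
Options: gJsm. Substitution: ${x}$1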
^((?|[1-9](?=.*1\t+(?<x>0+))|[1-9]\d(?=.*2\t+(?<x>0+))|[1-9]\d{2}(?=.*3\t+(?<x>0+))|[1-9]\d{3}(?=.*4\t+(?<x>0+))|[1-9]\d{4}(?=.*5\t+(?<x>0+))|[1-9]\d{5}(?=.*6\t+(?<x>0+))|[1-9]\d{6}(?=.*7\t+(?<x>0+))|[1-9]\d{7}(?=.*8\t+(?<x>0+))))\b
The result:
1
12
123
12345678
123456789
Becomes...
000000001
000000012
000000123
012345678
123456789
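For comparison, where the substitution can be a function (which regex101 does not allow), no catalog is needed. This is a different technique from the branch reset approach above. A minimal Python sketch, assuming digit-only lines and a target width of 10 as in the question; pad_numbers is a hypothetical name:

```python
import re

def pad_numbers(text, width=10):
    """Left-pad every all-digit line shorter than `width` with zeros."""
    # Match whole lines of 1 to width-1 digits; longer lines are skipped.
    pattern = re.compile(r"^\d{1,%d}$" % (width - 1), re.MULTILINE)
    # The callback pads each matched line; str.zfill adds leading zeros.
    return pattern.sub(lambda m: m.group(0).zfill(width), text)

print(pad_numbers("123\n12345\n1234567891"))
```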

SSN issue in SIEBEL

I am working on Siebel CRM. I have issues with spaces in my regex.
I have SSN numbers in these formats
123 456 789
123-456-789
123 45 6789
I need to display my SSN like XXX-XX-4567. My regex looks like
([\s.:])(?!000)(?!666)(?!9[0-9][0-9])\d{3}[- ]?(?!00)\d{2}[- ]?(?!0000)\d{4})([\s.:]) |
([\s.:])(?!000)(?!666)(?!9[0-9][0-9])\d{3}[- ]?(?!00)\d{3}[- ]?(?!00)\d{3})([\s.:]).
How can I remove all the blank spaces matched by the above expression and display the SSN in the format I mentioned above?

It looks like there are syntax errors in your RegEx: it contains unmatched parentheses. For example, in the first alternative the closing parenthesis after (?!0000)\d{4} has no matching opening parenthesis.
I think I've managed to write the regex you're looking for, and it is a bit shorter than the one you were using:
([\s.:])((?!000)(?!666)(?!9[0-9]{2})\d{3})[- ]?((?!00)\d{2,3})[- ]?((?!00)\d{3,4})([\s.:])
This will match the following strings, provided each is preceded and followed by a whitespace character, period, or colon:
123-12-1234
123 456 789
123-456-789
123 45 6789
But will not match the following:
666-45-1234
abc-12-1232
123-00-1233
123-224-0011
123 224 0000
There are five capture groups here:
1. Matches one whitespace character, period, or colon (you may want to change this).
2. Matches the first, three-digit number.
3. Matches the second, two- or three-digit number.
4. Matches the third, three- or four-digit number.
5. Matches one whitespace character, period, or colon (you may want to change this).
You should be able to reconstruct the SSN in the format you need with the result of this RegEx.
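One way to do that reconstruction, sketched in Python under the assumption that the regex above is used unchanged and that only nine-digit matches should be masked (mask_ssn is a hypothetical helper, not a Siebel API):

```python
import re

# The regex from the answer above, unchanged.
SSN_PATTERN = re.compile(
    r"([\s.:])((?!000)(?!666)(?!9[0-9]{2})\d{3})[- ]?"
    r"((?!00)\d{2,3})[- ]?((?!00)\d{3,4})([\s.:])"
)

def mask_ssn(text):
    """Rewrite each matched SSN as XXX-XX-<last four digits>."""
    def replace(match):
        # Groups 2-4 hold the digits; spaces and hyphens are not captured.
        digits = match.group(2) + match.group(3) + match.group(4)
        if len(digits) != 9:
            # The pattern also admits 8 or 10 digits; leave those alone.
            return match.group(0)
        # Keep the surrounding delimiters (groups 1 and 5) as they were.
        return f"{match.group(1)}XXX-XX-{digits[-4:]}{match.group(5)}"
    return SSN_PATTERN.sub(replace, text)

print(mask_ssn(" 123 456 789 "))
```

Because the separators are never captured, the blank spaces disappear as soon as the digits are joined back together.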

Notepad ++ clone each line

I have a file containing lines like these:
111
112
113
I want to clone each line and add a separator between the two copies. The output should look like this:
111#111
112#112
113#113
How can I do this in Notepad++ using regex replace?

Find: (.+)
Replace: \1#\1
This works with the search mode set to Regular expression and ". matches newline" unchecked.
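The same find/replace pair can be tried outside Notepad++. A small Python sketch, assuming Python's re module, where . likewise stops at line breaks by default (clone_lines is a made-up name):

```python
import re

def clone_lines(text, separator="#"):
    """Duplicate the content of every non-empty line around a separator."""
    # (.+) captures one whole line because . does not match "\n" here.
    # A callback is used so the separator is inserted literally.
    return re.sub(r"(.+)", lambda m: m.group(1) + separator + m.group(1), text)

print(clone_lines("111\n112\n113"))
```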

Regex matches only first Pattern and not subsequent Patterns

In the sample below:
MARY 2.629 3,991,060 1
PATRICIA 1.073 1,628,911 2
LINDA 1.035 1,571,224 3
BARBARA 0.98 1,487,729 4
ELIZABETH 0.937 1,422,451 5
I want to select the characters other than the names and remove them. In Eclipse, using Find and Replace with Regex:
Find: ([0-9,\.\s\n]*)$
Replace: \n
It finds the matching characters only in the first line (2.629 3,991,060 1) and not in the other lines. What am I doing wrong?

Use (\d|\.|,|\s)+ in the find expression and \n in the replace expression of Eclipse to achieve what you want. This replaces each run of digits, periods, commas, and whitespace after a name (including the line break) with a single newline.
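To check the pattern outside Eclipse, here is a minimal Python sketch. It assumes Python's re module handles this pattern the same way for the sample data, and strip_after_names is a hypothetical name:

```python
import re

SAMPLE = (
    "MARY 2.629 3,991,060 1\n"
    "PATRICIA 1.073 1,628,911 2\n"
    "LINDA 1.035 1,571,224 3\n"
    "BARBARA 0.98 1,487,729 4\n"
    "ELIZABETH 0.937 1,422,451 5"
)

def strip_after_names(text):
    """Replace every run of digits, dots, commas and whitespace with "\\n"."""
    # \s also matches the line break, so each run covers the rest of a line.
    return re.sub(r"(\d|\.|,|\s)+", "\n", text)

print(strip_after_names(SAMPLE))
```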

Removing duplicate strings from text file using RegEx

I would like to remove strings from a file that have already appeared on an earlier line, using RegEx (Notepad++).
Example:
123 = 45,
789 = 321,
123 = 951
Should result in -
123 = 45,
789 = 321,
= 951

Well, this is a good example of how, although RegEx is very powerful, it is not always the right tool for the job. For instance, the following RegEx will probably do what you want (I don't have Notepad++ installed, but it works in my RegEx client):
Search: (\b\d+\b)(.+?)\1
Replace: \1\2 (or $1$2, depending on your setup)
This takes an instance of a number, searches until it finds another instance of it, then replaces the entire match with itself minus the second instance. Because the second instance is on a later line, . must be allowed to match line breaks.
However, aside from being pretty dirty, this type of thing would be much simpler using a quick script or even something like Excel.
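As an illustration of the quick-script route, here is a minimal Python sketch (Python is an assumption, since the answer names no language, and drop_repeated_numbers is a hypothetical name). It keeps the first occurrence of each number and deletes later ones in a single pass:

```python
import re

def drop_repeated_numbers(text):
    """Delete every number that has already appeared earlier in the text."""
    seen = set()

    def replace(match):
        number = match.group(0)
        if number in seen:
            return ""          # repeat: remove it
        seen.add(number)       # first occurrence: remember and keep it
        return number

    # \b\d+\b matches whole numbers only, never part of a longer one.
    return re.sub(r"\b\d+\b", replace, text)

print(drop_repeated_numbers("123 = 45,\n789 = 321,\n123 = 951"))
```

Unlike the search/replace pair, this handles any number of repeats without rerunning.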
