Replace duplicate items in a string using regex

I have a string which looks something like this
xyz 123;abc;xyz 123;efg;
I want to remove the duplicates and keep only one occurrence in the string. I want the output to be like this
xyz 123;abc;efg;
I tried using (?<=;|^)([^;]*);(\1)+(?=;|$) but couldn't figure out how to remove one of the duplicates. Any suggestions?

Brief
Since you didn't specify a language, I'll assume the tokens in your original regex are all working in whatever language you're using.
Code
(([^;]*;).*)\2
Replace with \1
Explanation
(([^;]*;).*) Capture the following into capture group 1
  ([^;]*;) Capture the following into capture group 2
    [^;]* Match any character except the semicolon ; any number of times
    ; Match the semicolon character literally
  .* Match any character any number of times
\2 Match the same text as most recently matched by the second capture group
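As a rough illustration, here is the same pattern and replacement applied once in Python to the sample string from the question. The helper name drop_one_duplicate is made up for this sketch, and the choice of Python is an assumption since the question names no language.

```python
import re

# Pattern and replacement taken from the answer above.
PATTERN = re.compile(r"(([^;]*;).*)\2")


def drop_one_duplicate(text: str) -> str:
    """Apply the substitution once (hypothetical helper name)."""
    return PATTERN.sub(r"\1", text)


sample = "xyz 123;abc;xyz 123;efg;"
result = drop_one_duplicate(sample)
print(result)  # xyz 123;abc;efg;
```

A single pass may leave duplicates behind when several different items repeat (for example, a;b;a;b; comes out as a;b;b;), so treat this as a starting point rather than a general de-duplicator.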

Thanks, all, for your suggestions. I finally got this working with this regex:
(?<=,|^)([^,]*)(?=.*\\b\\1\\b)(?=,|$)

The below is for Java.
For duplicate words (consecutive or not), you can use the regex
\b(\w+)\b(?=.*?\b\1\b)
For duplicate characters (consecutive or not) in a string, you can use
(.)(?=.*?\1)
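The answer targets Java, but the two lookahead patterns can be tried out in Python as well. The sketch below is a port, not the answerer's code, and the sample inputs are made up. Because the lookahead only succeeds when the captured text appears again later, replacing each match with an empty string keeps the last occurrence of every word or character.

```python
import re

# Patterns from the answer above; the lookahead only succeeds when the
# captured text occurs again later in the string.
DUP_WORD = re.compile(r"\b(\w+)\b(?=.*?\b\1\b)")
DUP_CHAR = re.compile(r"(.)(?=.*?\1)")

# Hypothetical sample inputs, chosen only for illustration.
words = DUP_WORD.sub("", "hello world hello")
chars = DUP_CHAR.sub("", "banana")

print(repr(words))  # ' world hello'
print(chars)        # bna
```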

Related

RegEx to match all sets of items that have part of specific value

I'm trying to use RegEx to filter all sets of items that have part of a specific value in a capture group that I have defined.
I have to check if the fifth capture group contains at least part of a specific text.
My string:
First Item;Second Item;Third Item;Fourth Item;First Word;Sixth
Item?First Item;Second Item;Third Item;Fourth Item;Second Word;Sixth
Item?First Item;Second Item;Third Item;Fourth Item;Can't Capture This
Set;Sixth Item
RegEx that works for exact word:
(?:^|\?)([^;]+);([^;]+);([^;]+);([^;]+);(Second Word);([^;\?$]+)
The problem is that I need this RegEx to work to capture only part of the word.
Not Working:
(?:^|\?)([^;]+);([^;]+);([^;]+);([^;]+);(.*Word.*);([^;\?$]+)
Thanks!
Use [^;]* instead of .* because you have semi-colons as field delimiters:
(?:^|\?)([^;]+);([^;]+);([^;]+);([^;]+);([^;]*Word[^;]*);([^;?]+)
([^;]*Word[^;]*) matches zero or more characters other than semicolons, then the literal text Word, then zero or more characters other than semicolons.
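A quick way to check the corrected pattern is to run it over the sample from the question. The sketch below uses Python and assumes the wrapped lines of the sample join with a single space ("Sixth Item"); the variable names are illustrative only.

```python
import re

# Pattern from the answer above; the fifth group must contain "Word".
PATTERN = re.compile(
    r"(?:^|\?)([^;]+);([^;]+);([^;]+);([^;]+);([^;]*Word[^;]*);([^;?]+)"
)

# Sample from the question, assuming the wrapped lines join with a space.
data = (
    "First Item;Second Item;Third Item;Fourth Item;First Word;Sixth Item"
    "?First Item;Second Item;Third Item;Fourth Item;Second Word;Sixth Item"
    "?First Item;Second Item;Third Item;Fourth Item;Can't Capture This Set;Sixth Item"
)

# Collect the fifth field of every set that matched.
fifth_fields = [m.group(5) for m in PATTERN.finditer(data)]
print(fifth_fields)  # ['First Word', 'Second Word']
```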

Regex to find if all the characters in a word are the same specific character

I have a set of words coming in one by one like aa, ##, ???, ~~~, ?~ etc
I need a regex to find whether any of these words contains only ? or only ~.
Of the above input examples, ??? and ~~~ should match but not the others.
I tried ^[\s?]*$ and ^[\s~]*$ separately and they work; now I am trying to combine them.
^[\s?||~]*$ doesn't work as it also recognizes ?~ as valid.
Any help?
You can use this regex, which looks for a string starting with a ~ or a ?, and then asserts that every other character in the string is the same as the first one using a backreference (\1):
^([~?])\1+$
Demo on regex101
You need to use a backreference to achieve the desired result.
If you want only ~ or ? use
^([~?])\1+$
If you want any repetitive pattern, use
^(.)\1+$
Explanation: (.) or ([~?]) captures the first character.
Then \1+ checks for that same character one or more times (a backreference).
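To see the backreference at work, the pattern can be run against the example words from the question. This is a minimal Python sketch; the language and the variable names are assumptions, since the question does not name an engine.

```python
import re

# First character must be ~ or ?, and every following character must
# repeat it exactly (backreference \1).
SAME_CHAR = re.compile(r"^([~?])\1+$")

words = ["aa", "##", "???", "~~~", "?~"]
matches = [w for w in words if SAME_CHAR.match(w)]
print(matches)  # ['???', '~~~']
```

Note that \1+ requires at least two characters in total, so a lone ? or ~ is rejected; \1* would accept a single character too.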
You want to match lines that both start and end with any number of either a tilde or a question mark. That would be ^\(~\|?\)*$. The parentheses that make a group and the vertical bar that does the 'or' need to be backslash-escaped.

Remove all matching words

I have this text:
"headword":"final"
"headword":"family name"
"headword":"penultimate"
I want to get only
final
family name
penultimate
I tried several regexes but had no luck getting any of them to work.
This one does the opposite:
(\W*(headword))\W*
I tried to negate it using [^], but that does not work.
Use the following regex pattern:
(?:"\w+":)"([^"]+)"
https://regex101.com/r/KLPP22/1
[^"]+ - matches all characters except "
The needed values are in the 1st Capturing Group
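For illustration, this is how the first capturing group could be collected in Python. It is a sketch under the assumption that the input is the three lines from the question; re.findall returns the group contents directly when the pattern has exactly one capturing group.

```python
import re

# Pattern from the answer above: skip the quoted key and colon,
# capture the quoted value.
PATTERN = re.compile(r'(?:"\w+":)"([^"]+)"')

text = '''"headword":"final"
"headword":"family name"
"headword":"penultimate"'''

values = PATTERN.findall(text)
print(values)  # ['final', 'family name', 'penultimate']
```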
This seems to work
.+"."(.+)"
https://regex101.com/r/BwFP0z/1
// str is the input text; each whole match is replaced with its first captured group.
str.replace(/(?:"headword":")([^"]+)(?:")/gmi, '$1');
http://codepen.io/asanhix/pen/XpGoKg?editors=0012

Regex for deleting characters before a certain character?

I'm very new at regex, and to be completely honest it confounds me. I need to grab the string after a certain character is reached in said string. I figured the easiest way to do this would be using regex, however like I said I'm very new to it. Can anyone help me with this or point me in the right direction?
For instance:
I need to check the string "23444:thisstring" and save "thisstring" to a new string.
If this is your string:
I'm very new at regex, and to be completely honest it confounds me
and you want to grab everything after the first "c", then this regular expression will work:
/c(.*)/s
It will return this match in the first matched group:
"ompletely honest it confounds me"
Explanation:
The c is the character you are looking for
.* (in combination with /s) matches everything that remains, including newlines
(.*) captures what .* matched, making it available in $1 and returned in list context.
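Applied to the string from the question, the same capture-group idea might look like this in Python. The sketch assumes ":" is the marker character and uses re.DOTALL as the counterpart of the /s flag; the str.partition line is a non-regex alternative that is not part of the answer.

```python
import re

s = "23444:thisstring"

# Same idea as /c(.*)/s, with ":" as the marker character.
m = re.search(r":(.*)", s, flags=re.DOTALL)
after = m.group(1) if m else ""
print(after)  # thisstring

# Non-regex alternative (not from the answer): split at the first ":".
_, _, tail = s.partition(":")
print(tail)  # thisstring
```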
Regex for deleting characters before a certain character!
You can use lookahead like this
.*(?=x)
where x is a particular character, word, or string. Characters such as ., $, ^, *, and + have special meaning in regex, so don't forget to escape them when they appear in x.
EDIT
for your sample string it would be
.*(?=thisstring)
.* matches zero or more characters up to thisstring
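A small Python sketch of the lookahead approach follows. The helper name strip_before is hypothetical, and re.escape is used to handle the special characters mentioned above. Because .* is greedy, everything up to the last occurrence of the marker is removed, and the marker itself is kept.

```python
import re


def strip_before(text: str, marker: str) -> str:
    """Delete everything before `marker` (hypothetical helper name)."""
    # re.escape protects characters such as . $ ^ * + inside the marker.
    pattern = ".*(?=" + re.escape(marker) + ")"
    return re.sub(pattern, "", text, count=1)


print(strip_before("23444:thisstring", "thisstring"))  # thisstring
print(strip_before("23444:thisstring", ":"))           # :thisstring
```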
Here is a one-line solution for matching everything after "before"
print $1."\n" if "beforeafter" =~ m/before(.*)/;
Edit:
While using lookbehind is possible, it's not required. Grouping provides an easier solution.
To get the string after : in your example, you have to use [^:][^:]*:\(.*\). Notice that you need at least one [^:], followed by any number of further [^:]s, followed by an actual :, the character you are searching for; the group then captures whatever comes after it.

remove repeated character between words

I am trying out the quiz from Regex 101
In Task 6, the question is
Oh no! It seems my friends spilled beer all over my keyboard last night and my keys are super sticky now. Some of the time when I press a key, I get two duplicates. Can you pppllleaaaseee help me fix this? Content in bold should be removed.
I have tried this regex
([a-z])(\1{2})
But couldn't get the solution.
The solution for the riddle on that website is:
/(.)\1{2}/g
Since any key on the keyboard can get stuck, we need to use . (any character) rather than [a-z].
\1 in the regex means match whatever the 1st capturing group (.) matches.
Replacement is $1 or \1.
The rest of your regex is correct, just that there are unnecessary capturing groups.
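The substitution can be tried outside the quiz site as well. Below is a minimal Python sketch using part of the quiz sentence as input; Python writes the replacement as \1, and re.sub replaces every match, which corresponds to the /g flag.

```python
import re

sticky = "Can you pppllleaaaseee help me fix this?"

# Replace every run of exactly three identical characters with one copy.
fixed = re.sub(r"(.)\1{2}", r"\1", sticky)
print(fixed)  # Can you please help me fix this?
```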
Your regex is correct if you want to match exactly three characters. If you want to match at least three, that is
([a-z])(\1{2,})
or
([a-z])(\1\1+)
Since you don't need to capture anything but the first occurrence, these are slightly better:
([a-z])\1{2} # your original regex (exactly three occurrences)
([a-z])\1{2,}
([a-z])\1\1+
Now, the replacement should be exactly one occurrence of the character, and nothing more:
\1
Replace:
(.)\1+
with:
\1
This of course requires that your regex engine supports backreferences. Also, depending on the regex engine, \1 in the replacement part may have to be written as $1.
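Note that (.)\1+ collapses every run of two or more identical characters, not only the triples produced by the sticky keys. The Python sketch below contrasts it with (.)\1{2}; the input is made up, with "hello" added to show the difference on a legitimate double letter.

```python
import re

# Hypothetical input: sticky-key triples plus a word with a real double letter.
text = "pppllleaaaseee hello"

any_run = re.sub(r"(.)\1+", r"\1", text)    # collapses every run of 2 or more
triples = re.sub(r"(.)\1{2}", r"\1", text)  # collapses only runs of exactly 3

print(any_run)  # please helo
print(triples)  # please hello
```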
I'd do it with (\w)(\1+)? but can't find out how to "remove" within the given site...
The best way would be to replace the contents of the second group with empty strings.