Remove all matching words - regex

I have this text:
"headword":"final"
"headword":"family name"
"headword":"penultimate"
I want to get only:
final
family name
penultimate
I tried several regexes but had no luck making them work. This one does the opposite:
(\W*(headword))\W*
I tried to negate it using [^], but that does not work.

Use the following regex pattern:
(?:"\w+":)"([^"]+)"
https://regex101.com/r/KLPP22/1
[^"]+ - matches all characters except "
The needed values are in the first capturing group.
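A minimal JavaScript sketch of collecting that first capturing group from every line, assuming the three lines are held in one string (the names `text`, `pattern` and `values` are hypothetical):

```javascript
// Sample input from the question, one key/value pair per line
const text = `"headword":"final"
"headword":"family name"
"headword":"penultimate"`;

// matchAll needs the g flag; m[1] is the first capturing group of each match
const pattern = /(?:"\w+":)"([^"]+)"/g;
const values = [...text.matchAll(pattern)].map((m) => m[1]);

console.log(values); // [ 'final', 'family name', 'penultimate' ]
```

Because `[^"]+` cannot cross a quote, each match ends at the closing quote of its own value, so "family name" keeps its inner space.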

This seems to work
.+"."(.+)"
https://regex101.com/r/BwFP0z/1

// str is the input text; each whole match is replaced with its first capture group.
const result = str.replace(/(?:"headword":")([^"]+)(?:")/gmi, '$1');
http://codepen.io/asanhix/pen/XpGoKg?editors=0012
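A self-contained version of the replace approach, assuming the input is the question's three lines joined by newlines (the sample string is an assumption). Note that `replace` returns a new string rather than modifying `str`:

```javascript
// Hypothetical input: the three lines from the question
const str = '"headword":"final"\n"headword":"family name"\n"headword":"penultimate"';

// Each whole match ("headword":"...") is replaced by capture group 1 ($1)
const result = str.replace(/(?:"headword":")([^"]+)(?:")/gmi, "$1");

console.log(result); // prints final, family name and penultimate on separate lines
```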

Related

Replace duplicate items in a string using regex

I have a string that looks something like this:
xyz 123;abc;xyz 123;efg;
I want to remove the duplicates and keep only one occurrence of each item. I want the output to look like this:
xyz 123;abc;efg;
I tried using (?<=;|^)([^;]*);(\1)+(?=;|$) but couldn't figure out how to remove one of the duplicates. Any suggestions?
Brief
Since you didn't specify a language, I'll assume the tokens in your original regex are all working in whatever language you're using.
Code
(([^;]*;).*)\2
Replace with \1
Explanation
(([^;]*;).*) Capture the following into capture group 1
  ([^;]*;) Capture the following into capture group 2
    [^;]* Match any character except the semicolon ; any number of times
    ; Match the semicolon character literally
  .* Match any character (except a newline) any number of times
\2 Match the same text as most recently matched by capture group 2
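A minimal sketch of applying this pattern with JavaScript's `String.prototype.replace`, where the replacement is written `$1` rather than `\1`. The helper name `dedupe` and the second test string are illustrative assumptions; it has only been traced against the question's sample and that one edge case:

```javascript
// Group 2 is one item (text up to and including ";"); group 1 is that item plus
// everything up to a later copy of it. Replacing the match with group 1 drops the copy.
const dedupe = (s) => s.replace(/(([^;]*;).*)\2/g, "$1");

console.log(dedupe("xyz 123;abc;xyz 123;efg;")); // xyz 123;abc;efg;

// Caveat: group 2 is not anchored to an item boundary, so the tail "bc;" of
// "abc;" is also found at the end of "xbc;" and removed from there.
console.log(dedupe("abc;xbc;")); // abc;x
```

The edge case is why the accepted regex in the next answer adds a lookbehind such as (?<=,|^) to pin each item to a separator.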
Thanks, all, for your suggestions. I finally got this working with this regex:
(?<=,|^)([^,]*)(?=.*\\b\\1\\b)(?=,|$)
The following is for Java.
For duplicate words (consecutive or not), you can use the regex string:
\b(\w+)\b(?=.*?\b\1\b)
For duplicate characters (consecutive or not) in a string, you can use:
(.)(?=.*?\1)
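A sketch of both patterns with `replace`, shown in JavaScript rather than Java (in a Java string literal each backslash would have to be doubled). Replacing with an empty string removes every occurrence that is repeated later, so the last occurrence is the one kept; the sample strings are assumptions:

```javascript
// Remove every word that appears again later in the string
const words = "the cat and the dog".replace(/\b(\w+)\b(?=.*?\b\1\b)/g, "");
console.log(JSON.stringify(words)); // " cat and the dog"

// Remove every character that appears again later in the string
const chars = "banana".replace(/(.)(?=.*?\1)/g, "");
console.log(chars); // bna
```

Note that removing a word leaves its neighbouring space behind.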

RegEx: How to exclude the first two characters from selection

If you look at this text:
FIRST TEXT (IF CAPS AND IF IT ENDS WITH A PERIOD) SHOULD BE EXCLUDED. Here comes all the text we want to grab. And the ONLY problem with our current regular expression is that it also includes the period and space in front of this text. Does anyone know how to fix it so we grab from "Here comes..." and not ". Here comes..."? Thank you.
My current regular expression looks like this: (?![A-ZÆØÅ!´'/0-9\s()]+[.])[^=]*
But I simply can't figure out how to exclude the first ". " from the selection. Can anyone please help? You can try it out here:
https://regex101.com/r/UpRlOV/3
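The dot and space are matched because nothing in your pattern consumes them: the negative lookahead only rules out starting positions inside the all-caps sentence, so the first position where a match can start is the dot itself. To make sure your match does not start with a dot followed by whitespace, consume them when they are present. An optional non-capturing group is quite handy in such situations: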
(?![A-ZÆØÅ!´'\/0-9\s()]+[.])(?:\.\s*)?\K[^=]+
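Here, (?:\.\s*)? optionally consumes the dot and any whitespace after it, and \K drops everything matched so far from the reported match.
Or, if your regex engine does not support the \K match reset operator, use a capturing group: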
(?![A-ZÆØÅ!´'\/0-9\s()]+[.])(?:\.\s*)?([^=]+)
The text you want is then in capturing group 1.
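JavaScript has no \K, so a sketch there would use the capturing-group variant. The sample text is the first two sentences of the question, which is an assumption about what the real input looks like:

```javascript
// Capturing-group variant of the pattern (no \K support in JavaScript)
const text = "FIRST TEXT (IF CAPS AND IF IT ENDS WITH A PERIOD) SHOULD BE EXCLUDED. Here comes all the text we want to grab.";
const re = /(?![A-ZÆØÅ!´'\/0-9\s()]+[.])(?:\.\s*)?([^=]+)/;

// The first position the lookahead allows is the period; (?:\.\s*)? then eats ". "
const m = re.exec(text);
console.log(m[1]); // Here comes all the text we want to grab.
```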
.+?\.(.+)
Is something like this what you're looking for?
This way you can just grab group 1 from the result.
https://regex101.com/r/1Eo38B/1
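A quick check of this pattern in JavaScript on the question's first two sentences (sample text assumed). Group 1 starts immediately after the first period, so it still begins with the space; the second line adds \s* after the dot, which is an adjustment not in the answer above:

```javascript
const text = "FIRST TEXT (IF CAPS AND IF IT ENDS WITH A PERIOD) SHOULD BE EXCLUDED. Here comes all the text we want to grab.";

// As posted: the lazy .+? stops at the first period; group 1 keeps the leading space
const asPosted = /.+?\.(.+)/.exec(text)[1];
console.log(JSON.stringify(asPosted)); // " Here comes all the text we want to grab."

// Assumed adjustment: \s* also consumes the whitespace after the period
const adjusted = /.+?\.\s*(.+)/.exec(text)[1];
console.log(adjusted); // Here comes all the text we want to grab.
```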

regex stop at char not working

I have the following string:
'"var1":"val1","var2":"val2","var3":"val3"'
I want to extract val2 via regex
/var2":"(.*)"/g
gets everything after var2":"
/var2":"(?=[^"])"/g
does not get any matches
Your second regex is incorrect and should be:
/var2":"([^"]*)"/g
Explanation:
(?=[^"])" means: "First make sure that the next character is anything but a ", then match a "." That's obviously a) impossible and b) not what you wanted :)
In contrast to that, ([^"]*) means: "Match any number (including zero) of characters that aren't "s, then capture that submatch in group 1."
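A small JavaScript comparison of the greedy pattern from the question and the negated character class, using the question's string. The g flag is dropped here because `exec` only needs the first match:

```javascript
const s = '"var1":"val1","var2":"val2","var3":"val3"';

// Greedy .* runs to the end, then backtracks only as far as the LAST quote
const greedy = /var2":"(.*)"/.exec(s)[1];
console.log(greedy); // val2","var3":"val3

// [^"]* cannot cross a quote, so it stops at the end of the value
const negated = /var2":"([^"]*)"/.exec(s)[1];
console.log(negated); // val2
```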
I think you're almost there. You need to replace the lookahead with a simple capturing group. You also need to add a * after the character class:
var2":"([^"]*)"
This will match var2":"val2" and the first group will contain val2.
I'd add a few optional spaces around the colon:
var2" *: *"([^"]*)"
There is a caveat, though: the value of var2 cannot contain quotes.
Seems like JavaScript? Please try /var2":"(.*?)"/g or /var2":"([^"]+?)"/g; the value is in group 1.
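The lazy quantifier works differently from the negated class: `.*?` grows one character at a time and stops as soon as the closing quote can match. A minimal check on the question's string:

```javascript
const s = '"var1":"val1","var2":"val2","var3":"val3"';

// .*? is lazy: it stops at the FIRST quote after the value
const lazy = /var2":"(.*?)"/.exec(s)[1];
console.log(lazy); // val2
```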

RegEx: Match everything up to the last space without including it

I'd like to match everything in a string up to the last space, but without including that space. For example, in the string below I would like to match only "RENATA T.":
RENATA T. GROCHAL
So far I have ^(.+\s)(.+), but it also matches the last space, and I don't want it to. The regex should also work for languages other than English, as mine does.
EDIT: I didn't mention that the second capturing group should not contain a space – it should be GROCHAL not GROCHAL with a space before it.
EDIT 2: Based on the two answers, my new regex is ^((.+)(?=\s))\s(.+), and the replacement string is \3, \1. It produces the expected result:
GROCHAL, RENATA T.
Any improvements would be desirable.
^(.+)\s(.+)
with the substitution string:
\2, \1
Update:
Another version that also collapses extra spaces between the two capturing groups:
^(.+?)\s+(\S+)$
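Both versions sketched in JavaScript, where the substitution string refers to groups as `$2, $1` instead of `\2, \1`. The extra-spaces input is an assumed example:

```javascript
const fullName = "RENATA T. GROCHAL";

// Greedy (.+) backtracks to the last whitespace, so group 2 is the final word
const swapped = fullName.replace(/^(.+)\s(.+)/, "$2, $1");
console.log(swapped); // GROCHAL, RENATA T.

// \s+ absorbs a run of spaces; (\S+)$ forces group 2 to be the last word
const collapsed = "RENATA T.   GROCHAL".replace(/^(.+?)\s+(\S+)$/, "$2, $1");
console.log(collapsed); // GROCHAL, RENATA T.
```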
Use a positive lookahead assertion:
^(.+)(?=\s)
Capturing group 1 will contain the match.
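A one-line check in JavaScript: the lookahead tests for whitespace without consuming it, and the greedy `.+` backtracks to the last position where that test succeeds:

```javascript
// (?=\s) is zero-width, so the trailing space is not part of the match
const beforeLastSpace = /^(.+)(?=\s)/.exec("RENATA T. GROCHAL")[1];
console.log(JSON.stringify(beforeLastSpace)); // "RENATA T."
```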
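I like using named capturing groups (this example is AutoHotkey v1):
rawName = RENATA T. GROCHAL
; The O) option makes RegExMatch store a match object in the variable match
RegexMatch(rawName, "O)^(?P<firstName>.+)\s(?P<lastName>.+)", match)
MsgBox, % match.lastName ", " match.firstName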
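For comparison, a sketch of the same named-group idea in JavaScript, which spells named groups as (?<name>...) rather than (?P<name>...) and exposes them on `match.groups`:

```javascript
const rawName = "RENATA T. GROCHAL";

// Named groups are read back from match.groups
const match = /^(?<firstName>.+)\s(?<lastName>.+)/.exec(rawName);
const swapped = match ? `${match.groups.lastName}, ${match.groups.firstName}` : null;

console.log(swapped); // GROCHAL, RENATA T.
```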

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture the text between the last two slashes in a file path:
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You need to use lookaround assertions.
(?<=\/)[^\/]*(?=\/[^\/]*$)
Or use the regex below and take the string you want from group 1:
\/([^\/]*)\/[^\/]*$
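Both patterns from this answer, sketched in JavaScript with the path from the question. The lookbehind (?<=...) needs an ES2018-capable engine, which is an assumption about the runtime:

```javascript
const path = "/1/2/test/";

// Lookarounds: only the text between the last two slashes is matched
const viaLookaround = path.match(/(?<=\/)[^\/]*(?=\/[^\/]*$)/)[0];
console.log(viaLookaround); // test

// Capturing group: the whole match includes both slashes, group 1 does not
const viaGroup = /\/([^\/]*)\/[^\/]*$/.exec(path)[1];
console.log(viaGroup); // test
```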
The easy way
Match:
(1) every character that is not a "/" (zero or more of them). Capture what is matched here by putting it inside parentheses, which creates a capturing group.
(2) followed by "/" and then the end of the string ($).
Code:
([^/]*)/$
Get the text in group 1.
Harder to read; use this only if you want to avoid groups
Match exactly the same as before, except that now the regex engine is told not to consume characters when matching (2). This is done with a lookahead: (?=...).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
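A sketch of both variants in JavaScript. In a JavaScript regex literal a "/" must be escaped outside a character class, so the pattern is written ([^/]*)\/$ here:

```javascript
const path = "/1/2/test/";

// Group version: capture the run of non-slash characters right before the final "/"
const viaGroup = /([^/]*)\/$/.exec(path)[1];
console.log(viaGroup); // test

// Lookahead version: (?=\/$) checks for the final "/" without consuming it
const viaLookahead = path.match(/[^/]*(?=\/$)/)[0];
console.log(viaLookahead); // test
```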
The issue with your code is that your opening and closing slashes are part of your capture group.
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results; for instance, older JavaScript engines do not support lookbehind assertions. The pattern above uses only a lookahead, so it should work as intended. In PHP, every / in the pattern must be escaped when / is the delimiter (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
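Both patterns from this answer in JavaScript, using the example path. `matchAll` with the g flag yields every segment; anchoring the closing slash with $ leaves only the last one:

```javascript
const text = "/1/2/test/";

// Every segment: the lazy group stops where the lookahead sees the next "/"
const all = [...text.matchAll(/\/([^\/]*?)(?=\/)/g)].map((m) => m[1]);
console.log(all); // [ '1', '2', 'test' ]

// Only the last segment: the closing "/" must be at the end of the string
const last = /\/([^\/]*?)\/$/.exec(text)[1];
console.log(last); // test
```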