Regex stop at char not working

I have the following string:
'"var1":"val1","var2":"val2","var3":"val3"'
I want to extract val2 with a regex.
/var2":"(.*)"/g
matches everything after var2":" up to the end of the string.
/var2":"(?=[^"])"/g
does not produce any matches.

Your second regex is incorrect and should be
/var2":"([^"]*)"/g
Explanation:
(?=[^"])" means: "First make sure that the next character is anything but a ". Then match a ". That's obviously a) impossible and b) not what you wanted :)
In contrast to that, ([^"]*) means: "Match any number (including zero) of characters that aren't "s, then capture that submatch in group 1."
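As a rough illustration of the difference, here is a minimal Python sketch. Using Python's re module is an assumption (the question appears to use JavaScript), but these three patterns should behave the same way in both flavors. It runs the greedy pattern, the lookahead attempt, and the negated character class against the string from the question.

```python
import re

s = '"var1":"val1","var2":"val2","var3":"val3"'

# Greedy .* runs to the last quote in the string, so it over-captures.
greedy = re.search(r'var2":"(.*)"', s).group(1)

# The lookahead demands a non-quote next, then a quote at the same position.
lookahead = re.search(r'var2":"(?=[^"])"', s)

# A negated character class stops at the first closing quote.
bounded = re.search(r'var2":"([^"]*)"', s).group(1)

print(greedy)     # val2","var3":"val3
print(lookahead)  # None
print(bounded)    # val2
```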

I think you're almost there. You need to replace the lookahead with a simple capturing group, and you also need to add a * after the character class:
var2":"([^"]*)"
This will match the whole "var2":"..." and the first group will contain val2.
I'd add a few optional spaces around the colon (demo):
var2" *: *"([^"]*)"
There is a caveat though: the value of var2 cannot contain quotes.
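To make the optional spaces and the caveat concrete, here is a small Python sketch. Both input strings are hypothetical and not taken from the question. The second one shows what happens when the value contains an escaped quote.

```python
import re

pattern = re.compile(r'var2" *: *"([^"]*)"')

# Hypothetical input with spaces around the colon.
spaced = pattern.search('"var1" : "val1", "var2" : "val2"').group(1)

# Hypothetical input whose value contains an escaped quote:
# the capture stops at the first quote, so the value is cut short.
truncated = pattern.search(r'"var2":"a\"b"').group(1)

print(spaced)     # val2
print(truncated)  # a\
```

If values may contain escaped quotes, a real JSON parser is probably a safer tool than a regex.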

This looks like JavaScript. Try /var2":"(.*?)"/g, or /var2":"([^"]+?)"/g. The value is then in capture group 1.

Related

Is there a RegEx to remove the first instance of "."?

I am trying to remove the first dot "." from a sequence of numbers such as 2500155978.06, so that I end up with 250015597806.
Typically, I try to only match what I need and substitute later, i.e. match all "." and then remove the first match. I have been trying with ^[^.]+ but I am only getting the digits up to the first "."
I thought about using a capture group with a positive lookahead, but it got me nowhere (I am still learning RegEx).
Thank you in advance for your time and assistance!
You can use
^(\d+)\.
and replace with $1, the placeholder pointing to the value stored in Group 1.
See the regex demo. Details:
^ - start of string
(\d+) - Group 1 (later referred to with $1 from the replacement pattern): one or more digits
\. - a dot.
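The same replacement can be sketched in Python, as an assumption about the target environment, since the question does not name one. Python's re.sub writes the group reference as \1 instead of $1. The count argument gives a second way to drop only the first dot.

```python
import re

s = "2500155978.06"

# Python writes the group reference as \1 (or \g<1>) rather than $1.
via_group = re.sub(r'^(\d+)\.', r'\1', s)

# Alternative: replace only the first literal dot, wherever it occurs.
via_count = re.sub(r'\.', '', s, count=1)

print(via_group)  # 250015597806
print(via_count)  # 250015597806
```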

Exclude everything between quotes

I've worked with regex for super simple stuff.
Now I've run into a situation where my knowledge isn't sufficient.
I need to get this info out of a lot of lines.
I need to copy everything after the first quote and before the # sign to a new file.
0: "mailname#…"
6: "mailname2#yahoo.com"
etc..
I first did the following
(?<=")\S\D[^"]+(?=")
But this takes everything between the quotes. It should instead exclude the rest and give me just the name before the # sign.
This is what I have so far for the part before the mail name. I'm stuck on removing the # and everything after it.
(\d{0,2})([:])\s(["(.+)"$])
First, make a copy of your file, then use this in Notepad++:
Find what: ^.*"(.+)#.+
Replace with: $1
If you want to find and match the parts, you could use
(?<=")[^"\s#]+(?=#[^\s"]+")
The pattern matches:
(?<=") Positive lookbehind to assert " to the left
[^"\s#]+ Match 1+ occurrences of any char except ", a whitespace char or #
(?=#[^\s"]+") Positive lookahead to assert, to the right, a # followed by 1+ chars other than a whitespace char or ", followed by a "
If using a quantifier in the lookbehind is supported, a bit more precise match asserting from the start of the string taking the digit and the colon into account:
(?<=^\d+:\s")[^"\s#]+(?=#[^\s"]+"$)
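A minimal Python sketch of both approaches follows. Python is an assumption here, since the answers target Notepad++. The second input line is hypothetical. Python's re module only accepts fixed-width lookbehinds, so the variant with a quantifier in the lookbehind is not shown.

```python
import re

lines = [
    '6: "mailname2#yahoo.com"',    # from the question
    '7: "othername#example.com"',  # hypothetical extra line
]

# Replacement approach: keep only what sits between the quote and '#'.
replaced = [re.sub(r'^.*"(.+)#.+', r'\1', line) for line in lines]

# Lookaround approach: match the name itself, nothing to strip afterwards.
matched = re.findall(r'(?<=")[^"\s#]+(?=#[^\s"]+")', "\n".join(lines))

print(replaced)  # ['mailname2', 'othername']
print(matched)   # ['mailname2', 'othername']
```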

Remove all matching words

I have this text:
"headword":"final"
"headword":"family name"
"headword":"penultimate"
I want to get only
final
family name
penultimate
I tried several regexes but had no luck getting it to work.
This one does the opposite:
(\W*(headword))\W*
I tried to negate it using [^], but that does not work.
Use the following regex pattern:
(?:"\w+":)"([^"]+)"
https://regex101.com/r/KLPP22/1
[^"]+ - matches all characters except "
The needed values are in the 1st Capturing Group
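For illustration, the pattern can be tried in Python (an assumed environment, as the question does not name one). With a single capturing group, re.findall returns the contents of group 1 for each match.

```python
import re

text = '''"headword":"final"
"headword":"family name"
"headword":"penultimate"'''

# findall returns group 1 for every match when the pattern has one group.
values = re.findall(r'(?:"\w+":)"([^"]+)"', text)

print(values)  # ['final', 'family name', 'penultimate']
```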
This seems to work
.+"."(.+)"
https://regex101.com/r/BwFP0z/1
// str is the text you want to process; each whole match is replaced with its first captured group.
str.replace(/(?:"headword":")([^"]+)(?:")/gmi, '$1');
http://codepen.io/asanhix/pen/XpGoKg?editors=0012

RegEx: Match everything up to the last space without including it

I'd like to match everything in a string up to the last space but without including it. For the sake of example, I would like to match characters I put in bold:
RENATA T. GROCHAL
So far I have ^(.+\s)(.+) However, it matches the last space and I don't want it to. RegEx should work also for other languages than English, as mine does.
EDIT: I didn't mention that the second capturing group should not contain a space – it should be GROCHAL not GROCHAL with a space before it.
EDIT 2: My new RegEx based on what the two answers have provided is: ^((.+)(?=\s))\s(.+) and the RegEx used to replace the matches is \3, \1. It produces the expected result:
GROCHAL, RENATA T.
Any improvements would be desirable.
^(.+)\s(.+)
with substitution string:
\2, \1
Update:
Another version that can collapse extra spaces between the 2 capturing groups:
^(.+?)\s+(\S+)$
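The difference between the two versions shows up when the name contains more than one space before the last word. A minimal Python sketch follows. Python is an assumption, and the input with extra spaces is hypothetical.

```python
import re

name = "RENATA T. GROCHAL"
swapped = re.sub(r'^(.+)\s(.+)', r'\2, \1', name)

spaced = "RENATA T.   GROCHAL"  # hypothetical input with extra spaces

# The first pattern leaves the surplus spaces at the end of group 1.
loose = re.sub(r'^(.+)\s(.+)', r'\2, \1', spaced)

# The second pattern lets \s+ absorb all of them.
tight = re.sub(r'^(.+?)\s+(\S+)$', r'\2, \1', spaced)

print(repr(swapped))  # 'GROCHAL, RENATA T.'
print(repr(loose))    # 'GROCHAL, RENATA T.  '
print(repr(tight))    # 'GROCHAL, RENATA T.'
```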
Use a positive lookahead assertion:
^(.+)(?=\s)
Capturing group 1 will contain the match.
I like using named capturing groups:
rawName = RENATA T. GROCHAL
RegexMatch(rawName, "O)^(?P<firstName>.+)\s(?P<lastName>.+)", match)
MsgBox, % match.lastName ", " match.firstName
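Python's re module accepts the same (?P<name>...) syntax, so a rough equivalent of the snippet above could look like this. It is a sketch in a different language, not a translation the answer itself provides.

```python
import re

raw_name = "RENATA T. GROCHAL"

# Named groups are read back by name instead of by position.
m = re.match(r'^(?P<firstName>.+)\s(?P<lastName>.+)', raw_name)
result = f"{m.group('lastName')}, {m.group('firstName')}"

print(result)  # GROCHAL, RENATA T.
```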

Regex to match one or two quotes but not three in a row

For the life of me I can't figure this one out.
I need to search the following text, matching only the quotes in bold:
Don't match: """This is a python docstring"""
Match: " This is a regular string "
Match: "" ← That is an empty string
How can I do this with a regular expression?
Here's what I've tried:
Doesn't work:
(?!"")"(?<!"")
Close, but doesn't match double quotes.
Doesn't work:
"(?<!""")|(?!"")"(?<!"")|(?!""")"
I naively thought that I could add the alternates that I don't want but the logic ends up reversed. This one matches everything because all quotes match at least one of the alternates.
(Please note: I'm not running the code, so solutions around using __doc__ won't help, I'm just trying to find and replace in my code editor.)
You can use /(?<!")"{1,2}(?!")/
Autopsy:
(?<!") a negative look-behind for the literal ". The match cannot have this character in front
"{1,2} the literal " matched once or twice
(?!") a negative look-ahead for the literal ". The match cannot have this character after
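Here is a quick check of the pattern against the three cases from the question, sketched in Python. That is an assumption, since the question targets a code editor's find and replace. Both look-arounds are fixed-width, so Python's re module accepts them.

```python
import re

pattern = re.compile(r'(?<!")"{1,2}(?!")')

docstring = pattern.findall('"""This is a python docstring"""')
regular = pattern.findall('" This is a regular string "')
empty = pattern.findall('""')

print(docstring)  # []
print(regular)    # ['"', '"']
print(empty)      # ['""']
```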
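Your first try probably failed because of where the look-arounds sit: (?!"") is a negative look-ahead placed before the quote, and (?<!"") is a negative look-behind placed after it. In those positions each assertion's window includes the quote being matched, so the pattern effectively checks only one neighbouring character on each side and accepts isolated quotes only.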
I realized that my original problem description was actually slightly wrong. I actually need to match only a single quote character at a time, unless it's part of a group of 3 quote characters.
The difference is that this is desirable for editing so that I can find and replace with '. If I match "one or two quotes" then I can't automatically replace with a single character.
I came up with this modification to h20000000's answer that satisfies that case:
(?<!"")(?<=(?!""").)"(?!"")
In the demo, you can see that the "" are matched individually, instead of as a group.
This works very similarly to the other answer, except:
it only matches a single ", and its outer look-arounds reject two adjacent quotes rather than one
that leaves us with matching everything we want, except that it still matches the middle quote of a """
finally, adding (?<=(?!""").) excludes that case specifically, by saying "look back one character, then fail the match if the next three characters are """"
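A minimal Python sketch of the find-and-replace this pattern is meant for follows. Python is an assumption, and all sample strings are hypothetical. The look-behind consumes one character, so a quote at the very start of the text is not matched, which the last case shows.

```python
import re

single = re.compile(r'(?<!"")(?<=(?!""").)"(?!"")')

# Each quote of an empty string is matched individually.
empty = single.sub("'", 'x = ""')

# Quotes around a regular string are replaced one by one.
regular = single.sub("'", 's = " hi "')

# Triple quotes are left untouched.
docstring = single.sub("'", 'd = """doc"""')

# A quote at position 0 has no preceding character to look back on.
at_start = single.sub("'", '"a"')

print(empty)      # x = ''
print(regular)    # s = ' hi '
print(docstring)  # d = """doc"""
print(at_start)   # "a'
```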
I decided not to change the question because I don't want to hijack the answer, but I think this can be a useful addition.