Regular expression that matches between quotes - regex

I want to write a regular expression that matches the string between quotes, ignoring the escaped quotes inside it.
For example:
My string:
"Good programm\",\"pls help"
I want to get:
Good programm\",\"pls help

Try (?<=").*(?=") check online: http://regexr.com?349d2

As long as you don't have nested structures you can try this:
(?<=")(?:[^"]|(?<=\\)")*(?=")
See it here on Regexr
(?<=") positive lookbehind assertion, ensures there is a " before the match (Try if it is working for you, in Regexr it is.)
(?:[^"]|(?<=\\)") Alternation: matches either a character that is not a ", or a " that is escaped (ensured by the lookbehind (?<=\\)).
* The character from the alternation is matched 0 or more times.
(?=") positive lookahead assertion, ensures there is a " after the match
But be careful: It matches across newlines and also between escaped ", when there are no non escaped quotes available.
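A minimal JavaScript sketch of the pattern above, assuming an engine with lookbehind support (a recent Node.js, for example). The helper name betweenQuotes is made up for illustration.

```javascript
// Hypothetical helper: returns the text between the outer quotes,
// keeping escaped quotes (\") inside the result.
function betweenQuotes(input) {
  const re = /(?<=")(?:[^"]|(?<=\\)")*(?=")/;
  const m = input.match(re);
  return m ? m[0] : null;
}

// String.raw keeps the backslashes exactly as typed.
const quotedSample = String.raw`"Good programm\",\"pls help"`;
console.log(betweenQuotes(quotedSample)); // Good programm\",\"pls help
```

Input without any quote yields null here, because the lookbehind never succeeds.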

Related

Regular expression to match strings for syntax highlighter

I'm looking for a regular expression that matches strings for a syntax highlighter used in a code editor. I've found
(")(?:(?!\1|\\).|\\.)*\1
from here regex-grabbing-values-between-quotation-marks (I've changed the beginning since I only need double quotes, no single quotes)
The above regular expression correctly matches the following example having escaped double quotes and escaped backslashes
"this is \" just a test\\"
Most code editors however also highlight open ended strings such as the following example
"this must \" match\\" this text must not be matched "this text must be matched as well
Is it possible to alter the above regular expression to also match the open ended string? Another possibility would be a second regular expression that just matches the open ended string such as
"[^"]*$ but match only if preceded by an even count of non-escaped quotes
You could use an alternation to match either a backreference to group 1 or assert the end of the string with your current pattern.
(")(?:(?!\1|\\).|\\.)*(?:\1|$)
But as you are only capturing a single character (") you can omit the capture group and instead of the backreference \1 just match "
Alternatively written pattern:
"[^"\\]*(?:\\.[^"\\]*)*(?:"|$)
See a regex demo.
If the match should not start with \" and a lookbehind is supported:
(?<!\\)"[^"\\]*(?:\\.[^"\\]*)*(?:"|$)
This pattern matches:
(?<!\\) Negative lookbehind, assert not \ directly to the left
" Match the double quote
[^"\\]* Optionally match any char except " or \
(?:\\.[^"\\]*)* Optionally repeat matching \ and any char followed by any char except " or \
(?:"|$) Match either " or assert the end of the string.
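As a rough check, here is the pattern without the lookbehind run in JavaScript against the example from the question. This is only a sketch that assumes the input is a single line; findStrings is a hypothetical helper name.

```javascript
// Hypothetical helper: finds closed strings and a trailing open-ended string.
function findStrings(line) {
  const re = /"[^"\\]*(?:\\.[^"\\]*)*(?:"|$)/g;
  return line.match(re) || [];
}

// String.raw keeps the backslashes exactly as typed.
const codeLine = String.raw`"this must \" match\\" this text must not be matched "this text must be matched as well`;
console.log(findStrings(codeLine));
// Two matches: the closed string "this must \" match\\"
// and the open-ended string "this text must be matched as well
```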

Regex: How to get all words, special characters and white spaces between quotation marks?

Currently I have a regex ([^\[\][\[^\[\][\n"]+) to match text between "", but it does not capture whitespace. For example, if I enter " hello ", it returns hello, without the spaces before and after the word.
Is there some expression I can use to just simply catch anything between two quotation marks?
Thank you.
Maybe this will help:
(?<!\\)(\"|')(.+?)(?:(?<!\\)\1)
And to get the text inside the quotes, get the second capture group.
Proof.
Explanation
(?<!\\) - Negative lookbehind. Asserts that the preceding character is not a literal backslash (\)
(\"|') - to test for the start of the "string"
(.+?) - . will match anything but newlines.
+? is lazy: it matches one or more characters, but only as many as needed for the rest of the pattern to match.
(?:(?<!\\)\1) - Non capturing group.
Used here so that the (?<!\\) described earlier applies only to the closing quote and not to the whole expression.
\1 is a backreference to the first capture group ((\"|')), so the closing quote must be the same character as the opening one.
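A small JavaScript sketch of this approach, assuming lookbehind support. The inner non-capturing group is dropped because it does not change what is matched, and quotedContents is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the contents (capture group 2) of every
// quoted section, for both single and double quotes.
function quotedContents(text) {
  const re = /(?<!\\)("|')(.+?)(?<!\\)\1/g;
  return Array.from(text.matchAll(re), (m) => m[2]);
}

// String.raw keeps the backslash in it\'s exactly as typed.
const quotedText = String.raw`say " hello " and 'it\'s fine'`;
const contents = quotedContents(quotedText);
// The first element keeps the spaces around hello;
// the second element keeps the escaped quote.
console.log(contents);
```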
You should use following regex:
\"\s*([^\"]+?)\s*\"
([^\"]+?) captures the text you want in group 1. The \s* on either side keeps the whitespace next to the quotes out of the group.
Demo & Explanation

Trying to match string A if string B is found anywhere before it

What I'm trying to do is, if a string consists of some substring that starts with "!" encapsulated in "[" and "]", to separate those brackets from the rest of the string via a space, e.g. "[!foo]" --> "[ !foo ]", "[!bar]" --> "[ !bar ]", etc. Since that substring can be variable length, I figured this had to be done with regex. My thought was to do this in two steps - first separate the first bracket, then separate the second bracket.
The first one isn't hard; the regex is just \[! and so I can just do str = str.replace(/\[!/g, "[ !"); in Javascript. It's the second part I can't get to work.
Because now, I need to match "]" if the string literal "[ !" is found anywhere before it. So a simple positive lookbehind doesn't match because it only looks directly behind: (?<=\Q[ !\E)\] doesn't match.
And I still don't understand why, but I'm not allowed to make the positive lookbehind non-fixed length; (?<=\Q[ !\E.*)\] throws the error Syntax Error: Invalid regular expression: missing / in the console, and this regex debugger yields a pattern error explaining "A quantifier inside a lookbehind makes it non-fixed width".
Putting a non-capturing group of non-fixed width between the lookbehind and the capturing group doesn't work; (?<=\Q[ !\E)(?:.*)\] doesn't match.
One thing that won't work is just trying to match "[ !" at the start of the string, because this whole "[!foo]" string is actually itself a substring of an even bigger string and isn't at the beginning.
What am I missing?
Using two positive lookarounds, you can first assert that what is on the left is an opening square bracket with (?<=\[)
Then match an exclamation mark followed by one or more characters other than [ or ] using ![^[\]]+ and assert that what is on the right is a closing square bracket using (?=])
Note that in Javascript the lookbehind is not yet widely supported.
(?<=\[)![^[\]]+(?=])
In the replacement use the matched substring $&
Regex demo
[
"[!foo]",
"[!bar]"
].forEach(s =>
console.log(s.replace(/(?<=\[)![^[\]]+(?=])/g, " $& "))
)
Or you could also use 3 capturing groups instead:
(\[)(![^\]]+)(\])
In the replacement use
$1 $2 $3
Regex demo
[
"[!foo]",
"[!bar]"
].forEach(s =>
console.log(s.replace(/(\[)(![^\]]+)(\])/g, "$1 $2 $3"))
)
You can use this regex: \[!([^]]+)\] with this substitution string: [ !\1 ]
Explanation:
The regex:
\[!: match begins with [!
([^]]+): capture in group 1 all the characters that are not ]
\]: match ]
The substitution: substitute the full match with [ !{contents of group 1} ].
Regex Demo
I hope it helps.
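Since the question uses JavaScript, here is a sketch of the same single-group idea adapted to it. Two things differ there: the ] inside the character class has to be escaped (in JavaScript [^] on its own matches any character), and the replacement refers to the group as $1 instead of \1. The function name padBangBrackets is made up for illustration.

```javascript
// Hypothetical helper: turns "[!foo]" into "[ !foo ]" anywhere in a string.
function padBangBrackets(s) {
  return s.replace(/\[!([^\]]+)\]/g, "[ !$1 ]");
}

console.log(padBangBrackets("x [!foo] y [!bar] [baz]"));
// x [ !foo ] y [ !bar ] [baz]
```

Brackets that do not start with ! are left alone, and the match can sit anywhere inside a larger string.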

Delete String Within Quotation Marks

I have an XML file with this data:
PONumber="HC01/1501/000001"
PONumber="HC01/1501/000002"
PONumber="HC01/1501/000003"
PONumber="HC01/1501/000004"
...
PONumber="HC01/1501/000100"
What I want is to delete the values 'HC01/1501/000001' through 'HC01/1501/000100'.
How can I use a regular expression to replace them with an empty string?
Thanks in advance
The regex below matches the characters enclosed in double quotes, together with the quotes themselves, so they can be replaced.
Regex:
"[^"]*"
" - matches double quotes.
[^"]* - negated character class which matches any character except a double quote, zero or more times.
" - matches the closing double quote.
So this matches a complete double-quoted block. Replacing each matched block with "" gives you the expected output.
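A short JavaScript sketch of that replacement, assuming the file content is already loaded into a string. The name clearQuoted is hypothetical.

```javascript
// Hypothetical helper: empties every double-quoted value but keeps the quotes.
function clearQuoted(text) {
  return text.replace(/"[^"]*"/g, '""');
}

const poLines = [
  'PONumber="HC01/1501/000001"',
  'PONumber="HC01/1501/000100"',
].join("\n");

console.log(clearQuoted(poLines));
// PONumber=""
// PONumber=""
```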
Replacement string:
""
(?<=").*?(?=")
You can use lookarounds here. See the demo. Replace with an empty string.
https://regex101.com/r/sJ9gM7/85
For each line, you can replace the match of the following regex
(?<==).*
With ''.
Demo
(?<=...) is a positive lookbehind, and (?<==).* matches everything after the = on that line.
If that is the only data you have, use the RegEx /".*"/ and replacement as "".
Demo & Explanation
Else, use this RegEx: /"HC01\/1501\/000(0[0-9][0-9]|100)"/g
and the replacement string as "".
Demo & Explanation
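To see the difference from the generic pattern, here is a sketch of the range-limited regex in JavaScript. The name clearPoRange is made up for illustration.

```javascript
// Hypothetical helper: empties only the values HC01/1501/000xxx that the
// range pattern accepts and leaves every other quoted value untouched.
function clearPoRange(text) {
  return text.replace(/"HC01\/1501\/000(0[0-9][0-9]|100)"/g, '""');
}

console.log(clearPoRange('PONumber="HC01/1501/000042"')); // PONumber=""
console.log(clearPoRange('PONumber="HC01/1501/000100"')); // PONumber=""
console.log(clearPoRange('PONumber="HC01/1501/000101"')); // unchanged
```

Note that 0[0-9][0-9] also accepts 000000, so the pattern covers 000000 through 000100.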

Regex to match a string ignoring \"

I currently have this regex
"[^"]*"
I am testing it against this string (I am using http://regexpal.com/ so it has not been string encoded yet!)
"This is a test \"Text File\"" "This is a test \"Text File\""
Currently it is matching
"This is a test \"
""
"This is a test \"
""
I would like it have the following matches
"This is a test \"Text File\""
"This is a test \"Text File\""
Basically I want it to match something that starts with " and ends with " but ignore anything in the middle that is \". What do I need to add to my regex to achieve this?
Thanks in advance
The best way of doing this depends on the matching capabilities of your regex engine (support for the various features differs between engines). For a bare-bones regex engine that does not support any kind of look-behind, this is what you want: "([^"]*\\")*[^"]*"
This will match a quote, followed by zero or more pairs of non-quote sequences and \" sequences, followed by an optional non-quote sequence, and finally a closing quote.
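A quick JavaScript sketch that runs this pattern against the test string from the question. The helper name quotedBlocks is hypothetical.

```javascript
// Hypothetical helper: returns every quoted block, treating \" as part of the block.
function quotedBlocks(text) {
  return text.match(/"([^"]*\\")*[^"]*"/g) || [];
}

// String.raw keeps the backslashes exactly as typed.
const testLine = String.raw`"This is a test \"Text File\"" "This is a test \"Text File\""`;
console.log(quotedBlocks(testLine));
// Two matches, each one: "This is a test \"Text File\""
```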
(\\"|[^"])+
will match \" as well as any character that is not "
Regex for DART:
RegExp exp = new RegExp(r'(".*?"")');
http://regex101.com/r/hM5pI7
EXPLANATION:
Match the regular expression below and capture its match into backreference number 1 «(".*?"")»
Match the character “"” literally «"»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “""” literally «""»