Regex to match a string ignoring \"

I currently have this regex:
"[^"]*"
I am testing it against this string (I am using http://regexpal.com/, so it has not been string-encoded yet!):
"This is a test \"Text File\"" "This is a test \"Text File\""
Currently it is matching
"This is a test \"
""
"This is a test \"
""
I would like it to produce the following matches:
"This is a test \"Text File\""
"This is a test \"Text File\""
Basically, I want it to match something that starts with " and ends with ", but ignore any \" in the middle. What do I need to add to my regex to achieve this?
Thanks in advance

The best way of doing this depends on the matching capabilities of your regex engine (support for the various features differs between engines). For a bare-bones regex engine that does not support any kind of look-behind, this is what you want: "([^"]*\\")*[^"]*"
This will match a quote, followed by zero or more pairs of a non-quote sequence and a \" sequence, followed by an optional non-quote sequence, and then the closing quote.
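As a quick check of the pattern above, here is a minimal Python sketch that runs it against the test string from the question. The capturing group is rewritten as a non-capturing group (an adjustment made here, not part of the answer) so that re.findall returns whole matches; the names QUOTED and sample are illustrative only.

```python
import re

# The answer's pattern, with (?:...) instead of (...) so findall returns whole matches.
QUOTED = re.compile(r'"(?:[^"]*\\")*[^"]*"')

# Raw string: each \" below is a literal backslash followed by a quote.
sample = r'"This is a test \"Text File\"" "This is a test \"Text File\""'

matches = QUOTED.findall(sample)
for m in matches:
    print(m)
```

Each printed match should be one complete quoted string, with the escaped quotes kept inside it.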

(\\"|[^"])+
will match \" as well as any character that is not "

Regex for DART:
RegExp exp = new RegExp(r'(".*?"")');
http://regex101.com/r/hM5pI7
EXPLANATION:
Match the regular expression below and capture its match into backreference number 1 «(".*?"")»
Match the character “"” literally «"»
Match any single character that is not a line break character «.»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “""” literally «""»

Related

How to match new-line character using regex

I'm trying to match the special characters and the breaks using a regex to cleanse the string so that I'm left with only the following string 'All release related activities' extracted from the following line:
{"Well":"\n\n\t\n\t\n\n\t\n\tAll release related activities\n\n\t"}
I've tried the regex ^{.Well":" and I'm able to match till the first colon appears. How do I match the \n characters in the string?

I am not quite sure about the "Well": prefix, so I am providing you with a basic regex:
^\{[^}]*?(?:\\[ntr])+([^}]+)\}
and replace by:
\1

Try:
/":"(?:\\[nt])*(.*)}"$/
":" Matches ":".
(?:\\[nt])* Matches 0 or more occurrences of either \n or \t.
(.*) Matches 0 or more characters in Capture Group 1 until:
}"$ Matches }" followed by the end of the string.
The string you are looking for is in Capture Group 1.
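The sample line in the question ends with "} and has trailing \n and \t escapes before the closing quote, so a sketch in Python may help show one way to adapt the idea. The pattern below is a hypothetical variant, not the answer's exact regex: it assumes the escapes are literal two-character sequences (backslash plus n or t) and that the line ends with "}.

```python
import re

# Raw string: \n and \t stay as two-character escape sequences, as in the question.
line = r'{"Well":"\n\n\t\n\t\n\n\t\n\tAll release related activities\n\n\t"}'

# Hypothetical variant of the answer's pattern: skip leading \n / \t escapes,
# capture lazily, then skip trailing escapes before the closing "} .
pattern = re.compile(r'":"(?:\\[nt])*(.*?)(?:\\[nt])*"\}$')

match = pattern.search(line)
extracted = match.group(1) if match else None
print(extracted)
```

The lazy (.*?) is what lets the trailing escapes fall outside Capture Group 1.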

Regex to catch 14 digits number that starts with 25 within a text string [duplicate]

I'm quite new to regular expressions. Could you help me create a pattern that matches whole words containing a specific part? For example, if I have the text string "Perform a regular expression match" and I search for express, it should give me expression; if I search for form, it should give me Perform, and so on. Got the idea?

preg_match('/\b(express\w+)\b/', $string, $matches); // matches expression
preg_match('/\b(\w*form\w*)\b/', $string, $matches); // matches perform,
// formation, unformatted
Where:
\b is a word boundary
\w+ is one or more "word" characters*
\w* is zero or more "word" characters
See the manual on escape sequences for PCRE.
* Note: although not really a "word character", the underscore _ is also included in the character class \w.

This matches 'Perform':
\b(\w*form\w*)\b
This matches 'expression':
\b(\w*express\w*)\b
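The same \b\w*...\w*\b idea can be sketched in Python for an arbitrary search fragment. The helper name words_containing is made up for this illustration, and re.escape is added here on the assumption that the fragment should be treated as literal text.

```python
import re

def words_containing(fragment, text):
    """Return every whole word in text that contains fragment."""
    # re.escape guards against regex metacharacters in the fragment.
    pattern = r'\b\w*' + re.escape(fragment) + r'\w*\b'
    return re.findall(pattern, text)

sentence = "Perform a regular expression match"
print(words_containing("express", sentence))
print(words_containing("form", sentence))
```

Because both sides use \w* rather than \w+, the fragment itself also counts as a match when it appears as a standalone word.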

Regex Match % but not \%

I'm struggling to find the right regex for the case where I want to match a '%' but only if it's not preceded by a '\' between quotations.
For example I want this to come back as a match
Test \" this % matches \" test
But not match
Test \" this \% doesn't match \" test
Would a regex master be willing to assist me with this?!
Ultimately my goal is to ensure every '%' is escaped when found within quotations.
Edit:
Here's what I have right now, but it definitely isn't correct:
\".[%][^\%].\"

([^\\]|\\[^%])*
This seems to work in my tests on https://regex101.com/.
The sections are ( [^\\] | \\[^%] )*
The ()* means 0 or more of the contained group.
The contained group is either [^\\] or \\[^%]. The first case is "any character that is not a backslash," which includes the percent sign. The second case is "a backslash followed by any character that is not a percent sign."
The [^ ] operator is "any character except these."
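To see how this pattern behaves, here is a small Python sketch that applies it to the whole string with re.fullmatch (an assumption made here, since an unanchored ( )* can also match the empty string). The second helper is a swapped-in technique, a negative lookbehind, aimed at the stated goal of escaping every unescaped %; it treats the entire input as if it were inside the quotation and does not handle an escaped backslash directly before a %. All names are illustrative.

```python
import re

# The answer's pattern, applied to the whole string with fullmatch.
NO_ESCAPED_PERCENT = re.compile(r'(?:[^\\]|\\[^%])*')

def accepted(text):
    """True when no backslash in text is directly followed by %."""
    return NO_ESCAPED_PERCENT.fullmatch(text) is not None

def escape_percents(text):
    """Swapped-in technique: a negative lookbehind finds each % that is not
    already preceded by a backslash and escapes it."""
    return re.sub(r'(?<!\\)%', r'\\%', text)

good = r'Test \" this % matches \" test'
bad = r'Test \" this \% does not match \" test'
print(accepted(good), accepted(bad))
print(escape_percents(good))
```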

Is there a way to "recall" a char sequence already matched in the regex itself?

The regex I'm searching has the following constraints:
it starts with "//"
then "[" a non number sequence (called delimiter in this list) and "]"
next line "\n"
"[" 0 or more number separated by the delimiter previously found "]".
For example the following text matches the regex:
//[*#*]
[1*#*34*#*64]
and the following text doesn't match the regex:
//[*#*]
[1#34#64]
because the delimiter is not the same matched in the first row
The regex I currently have is
^//\[(\D)+\]\n\[[(\d)+(\D)+]*(\d)+\]$|^//\[(\D)+\]\n\[\]$|^//\[(\D)+\]\n\[(\d)+\]$
but obviously this regex matches both of the previous examples.
Is there a way to "recall" a char sequence already matched in the regex itself?

You need something called a back-reference.
Use this regex in Python:
r'^//\[([^\]]+)\]\n\[\d+(\1\d+)*\]'
Sample run:
>>> import re
>>> string = """//[*#*]
... [1*#*34*#*64]"""
>>> print(re.search(r'^//\[([^\]]+)\]\n\[\d+(\1\d+)*\]', string).group(0))
//[*#*]
[1*#*34*#*64]
The pattern matches your string in Python.
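To confirm that the back-reference also rejects the second example from the question, here is a minimal sketch. It uses re.fullmatch instead of re.search and makes the inner group non-capturing; both are choices made for this illustration, and the names LIST_PATTERN and is_valid are hypothetical.

```python
import re

# Group 1 captures the delimiter; \1 requires that same text between numbers.
LIST_PATTERN = re.compile(r'//\[([^\]]+)\]\n\[\d+(?:\1\d+)*\]')

def is_valid(text):
    """True when the numbers are separated by the delimiter declared on line one."""
    return LIST_PATTERN.fullmatch(text) is not None

print(is_valid("//[*#*]\n[1*#*34*#*64]"))
print(is_valid("//[*#*]\n[1#34#64]"))
```

Note that this pattern requires at least one number between the brackets, so the empty list [] from the question's constraints would need an extra optional group.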

You need to use a back-reference. In most languages you can reference a capturing group using \n, where n is the group number.
This pattern will work:
//\[([^]]++)]\n\[(?>\d++\1?)+]
To break it down:
// just matches the literal
\[([^]]++)] matches some characters in square brackets
\n matches the newline
\[(?>\d++\1?)+] matches one or more runs of digits, each optionally followed by the text captured by the first group. This is an atomic group.

Regular Expression:- String can contain any characters but should not be empty

My requirement is
"A string should not be blank or empty"
E.g., a string can contain any number of characters, including special characters, but should never be empty. For example, a string can contain "a,b,c" or "xyz123abc" or "12!#$#%&*()9" or " aa bb cc ".
So, this is what I tried:
Regex for blank or space:-
^\s*$
^ is the beginning of string anchor
$ is the end of string anchor
\s is the whitespace character class
* is zero-or-more repetition of
I'm stuck on how to negate the regex ^\s*$ so that it accepts any string like "a,b,c" or "xyz" or "12!#$#%&*()9"
Any help is appreciated.

No need for a regex. In Groovy you have the isAllWhitespace method:
groovy:000> "".allWhitespace
===> true
groovy:000> " \t\n ".allWhitespace
===> true
groovy:000> "something".allWhitespace
===> false
So asking !yourString.allWhitespace should tell you if your string is something else than empty or blank :)

\S
\S matches any non-white space character
Each character class has its own anti-class defined, so for \w you have \W, for \s you have \S, for \d you have \D, etc.
http://www.regular-expressions.info/charclass.html
Your regex engine may not support \S. If this is the case, you can use [^ \t\v]. If you support Unicode (which you should), there are more space types that you should watch for.
If both your regex engine and you support unicode AND \S is not supported by your regex engine then you'll probably want to use (if you care about people entering different unicode space types):
[^ \r\f\t\v\u0085\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B\u2028\u2029\u202F\u205F\u3000\uFEFF]
http://www.cs.tut.fi/~jkorpela/chars/spaces.html
http://en.wikipedia.org/wiki/Whitespace_character#Unicode

To me, two simple ways to express it are (neither needs anchoring):
s.trim() =~ /.+/
or
s =~ /\S+/
the first assumes you know how trim() works, the second assumes the meaning of \S.
Of course
!s.allWhitespace
is perfect, again if you know it exists

The following regular expression will ensure that a string contains at least 1 non-whitespace character.
^(?!\s*$).+
Note: I am not familiar with Groovy, but I would imagine there are native functions (trim, empty, etc.) that test this more naturally than a regular expression.
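The two regex approaches from the answers, \S and the negative lookahead, can be compared side by side. This is a sketch in Python rather than Groovy, the function names are made up for illustration, and the re.DOTALL flag is an addition made here so that . can cross newlines in the lookahead version.

```python
import re

def not_blank_search(s):
    """True when s contains at least one non-whitespace character."""
    return re.search(r'\S', s) is not None

def not_blank_lookahead(s):
    """Same check with the lookahead pattern; DOTALL lets . cross newlines."""
    return re.match(r'^(?!\s*$).+', s, flags=re.DOTALL) is not None

for candidate in ["a,b,c", "xyz123abc", " aa bb cc ", "", " \t\n "]:
    print(repr(candidate), not_blank_search(candidate), not_blank_lookahead(candidate))
```

The \S version is the shorter of the two and needs no anchors, which is one reason it may be the easier one to maintain.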

Is this in a Grails domain class?
If so, just use the blank constraint.
