Regex - ignoring quotes - regex

I'm using the regex
(?:^|;)\s*([^=]*[^=\s])\s*=\s*([^;]*[^;\s])
on the following string
"""A"" = .B; ""C"" = .D; ""E"" = .F"
The second capture group ([^;]*[^;\s]) matches the text .B, .D and .F", whilst the first capture group matches the text """A"", ""C"" and ""E"".
How can I update this regex to match the text only, i.e., .B, .D and .F, and A, C and E?
I've tried adding the quote to the capture groups, e.g., ([^=\"]*[^=\s]), but this seems to have no effect.

You may match zero or more quotes before the key value and then restrict the [^=\s] character class to avoid matching " by adding it to the class and again match 0+ quotes right after:
(?:^|;)\s*"*([^=]*[^=\s"])"*\s*=\s*([^;]*[^;\s"])
          ^^           ^  ^^                  ^
See the regex demo. Note that [^;]* will still match double quotes inside the value, if there are any, because only the last character of each group is barred from being a quote.
Details
(?:^|;) - start of string or ;
\s* - 0+ whitespaces
"* - 0+ double quotes
([^=]*[^=\s"]) - Group 1:
[^=]* - 0+ chars other than =
[^=\s"] - a char other than =, whitespace and "
"* - 0+ double quotes
\s*=\s* - a = enclosed with 0+ whitespaces
([^;]*[^;\s"]) - Group 2:
[^;]* - 0+ chars other than ;
[^;\s"] - a char other than ;, whitespace and ".
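For a quick check outside a regex tester, the pattern above can be run with Python's built-in re module. This is only a sketch: the variable names are made up for illustration, and the input is the string from the question.

```python
import re

# Pattern from the answer: optional quotes are matched outside the groups,
# and the last character of each group may not be a quote.
PAIR_RE = re.compile(r'(?:^|;)\s*"*([^=]*[^=\s"])"*\s*=\s*([^;]*[^;\s"])')

# The sample string from the question, quotes included.
text = '"""A"" = .B; ""C"" = .D; ""E"" = .F"'

# findall returns one (key, value) tuple per match.
pairs = PAIR_RE.findall(text)
print(pairs)
```

This should print [('A', '.B'), ('C', '.D'), ('E', '.F')].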

Related

Regex to capture everything after optional token

I have fields which contain data in the following possible formats (each line is a different possibility):
AAA - Something Here
AAA - Something Here - D
Something Here
Note that the first group of letters (AAA) can be of varying lengths.
What I am trying to capture is the "Something Here" or "Something Here - D" (if it exists) using PCRE, but I can't get the Regex to work properly for all three cases. I have tried:
- (.*) which works fine for cases 1 and 2 but obviously not 3;
(?<= - )(.*) which also works fine for cases 1 and 2;
(?! - )(.+)| - (.+) works for cases 2 and 3 but not 1.
I feel like I'm on the verge of it but I can't seem to crack it.
Thanks in advance for your help.
Edit: I realized that I was unclear in my requirements. If there is a trailing " - D" (the letter in the data is arbitrary but should only be a single character), that needs to be captured as well.
About the patterns that you tried:
- (.*) This pattern matches the first occurrence of - and then the rest of the line. It matches too much for the second example, as the .* also matches the second occurrence of -
(?<= - )(.*) This pattern matches the same as the first one, but without the leading - , as it asserts that it should occur directly to the left
(?! - )(.+)| - (.+) This pattern uses a negative lookahead, (?! - ), which asserts that what is directly to the right is not - . As none of the examples starts with - , the whole line is matched right after the negative lookahead by .+, and the second part after the alternation | is never evaluated
If the first group of letters can be of varying length, you could make the match more specific with 1 or more uppercase characters, [A-Z]+, or 1+ word characters, \w+.
For a broader match, you could match 1 or more non-whitespace characters using \S+
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*
Explanation
^ Start of string
(?:\S+\h-\h)? Optionally match the first group of non whitespace chars followed by - between horizontal whitespace chars
\K Clear the match buffer (Forget what is currently matched)
\S+ Match 1+ non whitespace characters
(?: Non capture group
\h(?!-\h) Match a horizontal whitespace char and assert what is directly to the right is not - followed by another horizontal whitespace char
\S+ Match 1+ non whitespace chars
)* Close non capture group and repeat 0+ times to match more "words" separated by spaces
Regex demo
Edit
To match an optional hyphen and trailing single character, you could add an optional non capturing group (?:-\h\S\h*)?$ and assert the end of the string if the pattern should match the whole string:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*\h*(?:-\h\S\h*)?$
Regex demo
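Python's built-in re module has neither \K nor \h, so the pattern cannot be pasted in unchanged. A rough equivalent, assuming a capture group may stand in for \K and [ \t] for \h, might look like this (the function and variable names are hypothetical):

```python
import re

# Adaptation of the PCRE pattern above for Python's re module:
#   \h  -> [ \t]            (re has no \h)
#   \K  -> capture group 1  (re has no \K)
PAYLOAD_RE = re.compile(
    r'^(?:\S+[ \t]-[ \t])?'            # optional leading token, e.g. "AAA - "
    r'(\S+(?:[ \t](?!-[ \t])\S+)*'     # words separated by single blanks
    r'[ \t]*(?:-[ \t]\S[ \t]*)?)$'     # optional trailing "- D"
)

def extract_payload(line):
    """Return the text after the optional leading token, or None."""
    m = PAYLOAD_RE.match(line)
    return m.group(1) if m else None

for sample in ("AAA - Something Here",
               "AAA - Something Here - D",
               "Something Here"):
    print(repr(extract_payload(sample)))
```

For the three sample lines this should give 'Something Here', 'Something Here - D' and 'Something Here'.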
You may use
^(?:.*? - )?\K.*?(?= - | *$)
^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$)
See the regex demo
Details
^ - start of string
(?:.*? - )? - an optional non-capturing group matching any 0+ chars other than line break chars, as few as possible, up to and including the first space-hyphen-space
\K - match reset operator
.*? - any 0+ chars other than line break chars as few as possible
(?= - | *$) - space-hyphen-space or 0+ spaces up to the end of string should follow immediately on the right.
Note that \h matches any horizontal whitespace chars.
^(?:[A-Z]+ - \K)?.*\S
demo
Since "Something Here" can be anything, there's no reason to specially describe the eventual last letter in the pattern. You don't need something more complicated.
With this pattern I assume that you are not interested in the trailing spaces, which is why I ended it with \S. If you want to keep them, remove the \S and change the previous quantifier to +.
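As a rough illustration of this shorter pattern in Python, whose built-in re module lacks \K, the sketch below moves the wanted text into a capture group instead. The names are made up for the example.

```python
import re

# Same idea as the pattern above, with \K replaced by capture group 1.
# .*\S is greedy, so it runs up to the last non-space character of the line.
TAIL_RE = re.compile(r'^(?:[A-Z]+ - )?(.*\S)')

def after_token(line):
    """Return everything after the optional uppercase token, or None."""
    m = TAIL_RE.match(line)
    return m.group(1) if m else None

print(after_token("AAA - Something Here - D"))
print(repr(after_token("Something Here  ")))
```

The trailing " - D" is kept simply because .* does not stop at it, and trailing spaces are dropped by the final \S.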

Is there a regex to grab all spaces that separate key value pairs

Is there a regex to extract all spaces that separate key+value pairs while ignoring those inside double quotes?
sample:
key1=value1 key1=value1 spaces="some spaces in text" nested1="key2=value2 key2=value2 key2=value2" nested2="key2=value2, key2=value2, key2=value2" quoted="his name is \"no body\""
This is what I have come up with so far: (?<!,) (?=\w+=), but of course it doesn't work.
[^\s="]+\s*=\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^\s=]+)\K[ \t]+
PCRE demo
No need to write back; this just matches the space delimiters, which can then be replaced with a new delimiter.
([^\s="]+\s*=\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^\s=]+))([ \t]+)
Python demo
Can write back \1 or \2 if needed, or replace with a new delimiter.
Note: the part of the above expressions that matches the field info could benefit from being wrapped in an atomic group (?>...), but that is not strictly necessary as the field structure is fairly concise.
There are other options to guarantee integrity as well, like matching every character with the use of the \G anchor, if available.
Let me know if you need this approach; there are many ways to go here.
Here is another option:
".*?(?<!\\)"(*SKIP)(*F)| +
See the online demo
Please do let me know if it actually does what is required as I'm unsure. Anyways, here is a breakdown:
" - A literal double quote.
.*? - Anything but newline zero or more times but lazy.
(?<!\\) - A negative lookbehind for \.
" - A literal double quote.
(*SKIP)(*F) - Consume all characters of matches, force a failure and continue matching.
| - Alternation.
[space]+ - One or more space characters (a literal space followed by +).
If you are using Python, you'll need the regex module from PyPI, since the built-in re module does not support (*SKIP)(*F).
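If installing an extra module is not an option, a common workaround with the built-in re module is to match quoted strings first and only act on the spaces that are left over. The sketch below is an illustration, not the answer's own code: the function name and the | delimiter are made up, and the quoted-string sub-pattern is a standard escape-aware one rather than the lookbehind used above.

```python
import re

# Alternative 1 consumes a whole double-quoted string, including \" escapes.
# Alternative 2 (group 1) captures a run of spaces outside any quotes.
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|( +)')

def replace_delimiters(text, new_delim="|"):
    """Replace spaces outside double quotes; leave quoted text untouched."""
    def swap(m):
        # Group 1 is set only when the match is an unquoted run of spaces.
        return new_delim if m.group(1) is not None else m.group(0)
    return TOKEN_RE.sub(swap, text)

sample = r'key1=value1 key1=value1 spaces="some spaces in text" quoted="his name is \"no body\""'
print(replace_delimiters(sample))
```

Because the quoted alternative comes first, any space inside quotes is swallowed as part of that match and handed back unchanged, which plays the role that (*SKIP)(*F) has in the PCRE version.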
You could do that with the following PCRE-compatible regular expression.
\G[^" \n]*(?:(?<!\\)"(?:[^\n"]|(?<=\\)")*(?<!\\)"[^" \n]*)*\K +
Start your engine!
\G            : assert position at the end of the previous match
                or the start of the string for the first match
[^" \n]*      : match 0+ chars other than those in char class
(?:           : begin non-capture group
  (?<!\\)     : use negative lookbehind to assert next char is not
                preceded by a backslash
  "           : match double-quote
  (?:         : begin non-capture group
    [^"\n]    : match a char other than those in char class
    |         : or
    (?<=\\)   : use positive lookbehind to assert next char is
                preceded by a backslash
    "         : match double-quote
  )           : end non-capture group
  *           : match non-capture group 0+ times
  (?<!\\)     : use negative lookbehind to assert next char is not
                preceded by a backslash
  "           : match double-quote
  [^" \n]*    : match 0+ chars other than those in char class
)             : end non-capture group
*             : match non-capture group 0+ times
\K            : forget everything matched so far and reset start of match
\ +           : match 1+ spaces

Right regexp for detect changes in mysql config

I need to catch all redefined variables in my.cnf
In my case, they look like
#basedir = /usr/local/mysql
basedir = /usr
So I need to extract all redefined parameters.
The criterion for a redefined parameter is that the file contains lines starting with both #param and param.
Please advise on the correct regexp.
You may use
^\h*#\K([_$a-zA-Z0-9]+)(?=\s+=\s.+\R\h*\1\s)
See the regex demo
For the regex to work, use the m multiline modifier and read the file into memory as a single string (you can do it with -0777 options).
Pattern details
^ - start of a line
\h* - 0+ horizontal whitespaces
# - a # char
\K - match reset operator
([_$a-zA-Z0-9]+) - Group 1: any 1 or more ASCII letters, digits, _ and $
(?=\s+=\s.+\R\h*\1\s) - that is immediately followed with:
\s+ - 1+ whitespaces
= - a = char
\s - whitespace
.+ - 1+ chars other than line break chars
\R - a line break sequence
\h* - 0+ horizontal whitespaces
\1 - same value as in Group 1
\s - whitespace.
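The same idea can be tried in Python, with the caveat that the built-in re module has no \h, \K or \R. The sketch below substitutes [ \t], a capture group and \n respectively (so it assumes Unix line endings), and the config fragment beyond the two lines from the question is invented for the example.

```python
import re

# Adaptation for Python's re module:
#   \h -> [ \t],  \R -> \n,  \K -> capture group 1.
# re.MULTILINE makes ^ match at the start of every line.
REDEFINED_RE = re.compile(
    r'^[ \t]*#([_$a-zA-Z0-9]+)(?=\s+=\s.+\n[ \t]*\1\s)',
    re.MULTILINE,
)

# Hypothetical my.cnf fragment; the first two lines are from the question.
config = (
    "#basedir = /usr/local/mysql\n"
    "basedir = /usr\n"
    "#datadir = /var/lib/mysql\n"
    "port = 3306\n"
)

print(REDEFINED_RE.findall(config))
```

This should print ['basedir']. Like the original pattern, it only reports a parameter when the active line directly follows the commented-out one.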

Regex pattern for underscore or hyphen but not both

I have a regular expression that is allowing a string to be standalone, separated by hyphen and underscore.
I need help so the string only takes hyphen or underscore, but not both.
This is what I have so far.
^([a-z][a-z0-9]*)([-_]{1}[a-z0-9]+)*$
foo = passed
foo-bar = passed
foo_bar = passed
foo-bar-baz = passed
foo_bar_baz = passed
foo-bar_baz_qux = passed # but I don't want it to
foo_bar-baz-quz = passed # but I don't want it to
You may expand the pattern a bit and use a backreference to only match the same delimiter:
^[a-z][a-z0-9]*(?:([-_])[a-z0-9]+(?:\1[a-z0-9]+)*)?$
See the regex demo
Details:
^ - start of string
[a-z][a-z0-9]* - a lowercase letter followed by 0+ lowercase letters or digits
(?:([-_])[a-z0-9]+(?:\1[a-z0-9]+)*)? - an optional sequence of:
([-_]) - Capture group 1 matching either - or _
[a-z0-9]+ - 1+ lowercase letters or digits
(?:\1[a-z0-9]+)* - 0+ sequences of:
\1 - the same value as in Group 1
[a-z0-9]+ - 1 or more lowercase letters or digits
$ - end of string.
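To see the backreference at work, the pattern can be run against the names from the question. This is a minimal Python sketch; the helper name is made up.

```python
import re

# The backreference \1 forces every later delimiter to equal the first one.
NAME_RE = re.compile(r'^[a-z][a-z0-9]*(?:([-_])[a-z0-9]+(?:\1[a-z0-9]+)*)?$')

def is_valid(name):
    """True if the name uses only hyphens or only underscores as separators."""
    return NAME_RE.match(name) is not None

for name in ("foo", "foo-bar", "foo_bar", "foo-bar-baz",
             "foo_bar_baz", "foo-bar_baz_qux", "foo_bar-baz-quz"):
    print(name, "=", "passed" if is_valid(name) else "failed")
```

The first five names should pass and the two mixed ones should fail.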
Here's a nice clean solution:
^([a-zA-Z-]+|[a-zA-Z_]+)$
Break it down!
^ start at the beginning of the text
[a-zA-Z-]+ match anything a-z or A-Z or -
| OR operator
[a-zA-Z_]+ match anything a-z or A-Z or _
$ end at the end of the text
Here's an example on regexr!
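A quick check of this shorter pattern, sketched in Python with a made-up helper name, suggests it is not a drop-in replacement for the rule in the question: it never mixes - and _, but it also accepts uppercase letters and a leading delimiter, and it rejects digits.

```python
import re

# The two-alternative pattern from the answer above, unchanged.
LOOSE_RE = re.compile(r'^([a-zA-Z-]+|[a-zA-Z_]+)$')

def loose(name):
    """True if the whole name fits one of the two character classes."""
    return LOOSE_RE.match(name) is not None

for name in ("foo-bar", "foo_bar", "foo-bar_baz", "-foo", "Foo", "foo1"):
    print(name, loose(name))
```

Whether those differences matter depends on the naming rules you actually need.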

Regex : Everything in group except white space

I have this regex right here :
^(#include rem\(\s*(.*)),\s*(.*)\)
That matches this string :
#include rem( padding-top, $alert-padding );
I want the group that captures $alert-padding to ignore the white space at the end. I tried:
^(#include rem\(\s*(.*)),\s*(/S)\)
replacing the .* with /S, but it doesn't match.
You can play around with the regex here :
https://regex101.com/r/9rouVU/1/
You may use \S+ to match 1 or more non-whitespace characters:
^(#include rem\(\s*(\S+))\s*,\s*(\S+)\s*\)
See the regex demo
Details:
^ - start of string
(#include rem\(\s*(\S+)) - Group 1 capturing:
#include rem\( - a literal substring #include rem(
\s* - 0+ whitespaces
(\S+) - Group 2 capturing 1+ non-whitespace symbols
\s*,\s* - 0+ whitespaces, , and again 0+ whitespaces
(\S+) - Group 3 capturing 1+ non-whitespace symbols
\s* - 0+ whitespaces
\) - a literal ).
You can make the match in the second group lazy and then match for further optional whitespace:
^(#include rem\(\s*(.*)),\s*(.*?)\s*\)
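Both suggested patterns can be compared side by side on the line from the question. The following Python sketch uses made-up variable names and only the built-in re module.

```python
import re

line = "#include rem( padding-top, $alert-padding );"

# Pattern from the first answer: \S+ cannot contain whitespace.
strict = re.compile(r'^(#include rem\(\s*(\S+))\s*,\s*(\S+)\s*\)')
# Pattern from the second answer: lazy .*? followed by optional whitespace.
lazy = re.compile(r'^(#include rem\(\s*(.*)),\s*(.*?)\s*\)')

for pattern in (strict, lazy):
    m = pattern.match(line)
    # Groups 2 and 3 hold the two arguments without surrounding spaces.
    print(repr(m.group(2)), repr(m.group(3)))
```

On this input both should print 'padding-top' '$alert-padding'. They would differ only if an argument itself contained whitespace, which \S+ rejects and .*? accepts.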