Extracting days from a string with regex

I am trying to extract the days using regex groups in C# from the following string,
"RRULE:FREQ=MONTHLY;UNTIL=20211126T143000Z;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYSETPOS=-1"
I am new to regular expressions and looked at various websites to try to write an expression. The expression I have got so far is the following:
(?:BYDAY=)([A-Z,]*);
Which matches
MO,TU,WE,TH,FR;
as a whole. I can then split on ',' to achieve what I want, but I wanted to know if there is a way of doing this purely in regex.

If a quantifier in the lookbehind is supported, you might use:
(?<=BYDAY=[A-Z,]*)[A-Z]+
Explanation
(?<= Positive lookbehind, assert what is on the left is
BYDAY=[A-Z,]* match BYDAY= followed by 0 or more times A-Z or ,
) Close lookbehind
[A-Z]+ Match 1+ chars A-Z
.Net regex demo | C# demo by WiktorStribiżew
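As a rough sketch of how the lookbehind pattern behaves, the snippet below runs it in JavaScript rather than C#, assuming an engine that allows quantifiers inside a lookbehind (recent Node.js versions do). The variable names are illustrative only.

```javascript
// The input string from the question.
const rule =
  "RRULE:FREQ=MONTHLY;UNTIL=20211126T143000Z;INTERVAL=1;BYDAY=MO,TU,WE,TH,FR;BYSETPOS=-1";

// Each match is a run of A-Z that is preceded by "BYDAY=" plus
// any number of letters or commas, so every day is a separate match.
const days = rule.match(/(?<=BYDAY=[A-Z,]*)[A-Z]+/g) || [];

console.log(days); // [ 'MO', 'TU', 'WE', 'TH', 'FR' ]
```

The `;` after `FR` is not in `[A-Z,]`, so the lookbehind can no longer reach back to `BYDAY=` and `BYSETPOS` is not matched.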
Alternatively you can make use of the \G anchor to get iterative matches and capture the value in group 1
(?:\G(?!^)|BYDAY=)([A-Z]+),?
Regex demo

Related

Find match within a first match

I have the following string
abc123+InterestingValue+def456
I want to get the InterestingValue only. I am using this regex:
\+.*\+
but the output still includes the + characters.
Is there a way to search for a string between the + characters, then search again for anything that is not a + character?
Use lookarounds.
(?<=\+)[^+]*(?=\+)
DEMO
You can use a positive lookahead and a positive lookbehind. A positive lookbehind tells the engine "the text immediately before this position must match this pattern", and a positive lookahead tells the engine "the text immediately after this position must match this pattern". Neither of them consumes the text it checks, so that text does not become part of the match.
A positive lookbehind is a group beginning with ?<= and a positive lookahead is a group beginning with ?=. Adding these to your existing expression would look like this:
(?<=\+).*(?=\+)
regex101
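To make the difference between the two lookaround patterns above concrete, here is a small JavaScript sketch (assuming an engine with lookbehind support). The second input string, `a+b+c+d`, is a made-up example that is not in the question.

```javascript
const input = "abc123+InterestingValue+def456";

// Both patterns give the same result when there are exactly two + signs.
const found = input.match(/(?<=\+)[^+]*(?=\+)/);
const value = found ? found[0] : null;
console.log(value); // InterestingValue

// With more than two + signs, the greedy .* runs to the last +,
// while [^+]* stops at the next one.
const multi = "a+b+c+d";
const greedy = multi.match(/(?<=\+).*(?=\+)/)[0];
const negated = multi.match(/(?<=\+)[^+]*(?=\+)/)[0];
console.log(greedy);  // b+c
console.log(negated); // b
```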
If it should be the first match, you can use a capture group with an anchor:
^[^+]*\+([^+]+)\+
^ Start of string
[^+]* Optionally match any char except + using a negated character class
\+ Match a literal +
([^+]+) Capture group 1, match 1+ chars other than +
\+ Match a literal +
Regex demo
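A minimal JavaScript sketch of the capture-group approach, for engines where lookbehind is not available. The value is read from group 1 rather than from the whole match; the variable names are illustrative.

```javascript
const source = "abc123+InterestingValue+def456";

// m[0] is the whole match including both + signs; m[1] is only the part between them.
const m = source.match(/^[^+]*\+([^+]+)\+/);
const firstBetween = m ? m[1] : null;

console.log(firstBetween); // InterestingValue
```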

Regex: Match pattern unless preceded by pattern containing element from the matching character class

I am having a hard time coming up with a regex to match a specific case:
This can be matched:
any-dashed-strings
this-can-be-matched-even-though-its-big
This cannot be matched:
strings starting with elem- or asdf- or a single -
elem-this-cannot-be-matched
asdf-this-cannot-be-matched
-
So far what I came up with is:
/\b(?!elem-|asdf-)([\w\-]+)\b/
But I keep matching a single - and the whole -this-cannot-be-matched suffix. I cannot figure out how to conditionally ignore a character that is part of the matching character class, and how to match nothing at all when one of the excluded prefixes is found.
I am currently working with the Oniguruma engine (Ruby 1.9+/PHP multi-byte string module).
If possible, please elaborate on the solution. Thanks a lot!
If a lookbehind is supported, you can assert a whitespace boundary to the left, and make the alternation for both words without the hyphen optional.
(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b
Explanation
(?<!\S) Assert a whitespace boundary to the left
(?! Negative lookahead, assert that what is directly to the right is not
(?:elem|asdf)?- Optionally match elem or asdf followed by -
) Close the lookahead
[\w-]+ Match 1+ word chars or -
\b A word boundary
See a regex demo.
Or a version with a capture group and without a lookbehind:
(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b
See another regex demo.
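Both patterns can be tried against the sample strings from the question. The sketch below uses JavaScript instead of Oniguruma, on the assumption that the two engines agree for this ASCII-only input; the variable names are illustrative.

```javascript
// The five sample strings from the question, one per line.
const lines = [
  "any-dashed-strings",
  "this-can-be-matched-even-though-its-big",
  "elem-this-cannot-be-matched",
  "asdf-this-cannot-be-matched",
  "-",
].join("\n");

// Lookbehind version: the whole match is the wanted string.
const withLookbehind = lines.match(/(?<!\S)(?!(?:elem|asdf)?-)[\w-]+\b/g) || [];

// Capture-group version: the match may include a leading whitespace char,
// so the wanted string is read from group 1.
const withGroup = Array.from(
  lines.matchAll(/(?:\s|^)(?!(?:elem|asdf)?-)([\w-]+)\b/g),
  (match) => match[1]
);

console.log(withLookbehind);
console.log(withGroup);
// Both print:
// [ 'any-dashed-strings', 'this-can-be-matched-even-though-its-big' ]
```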

Regex exclude character from the group

I am trying to write this regex to match dots with a few rules
(\.+ *|([a-zA-ZÀ-ž]\.\d))(?=[^\d{1}(\.\d{1})])(?=[^.,])
But my regex is matching a few characters before and after the dot as well
For example:
č.1 > match č.1 (incorrect, match should be only .)
St.M > match . (correct)
2.0 > no match (correct)
Do you have any idea, how to "exclude" these other characters from the result and match only the dot?
Thanks for your help
You could shorten the pattern using a positive lookbehind (?<=) asserting the character class with the specific ranges to the left.
(?<=[a-zA-ZÀ-ž])\.
Regex demo
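A short JavaScript sketch that checks the lookbehind pattern against the three examples from the question. The range À-ž is written with escapes (`\u00C0-\u017E`) and č as `\u010D` only to keep the snippet independent of file encoding; the helper names are illustrative.

```javascript
// Same pattern as above: a dot preceded by a letter from the listed ranges.
const dotAfterLetter = /(?<=[a-zA-Z\u00C0-\u017E])\./;

const samples = ["\u010D.1", "St.M", "2.0"];

// For each sample, record the matched text and its position, or null if there is no match.
const dotResults = samples.map((s) => {
  const m = s.match(dotAfterLetter);
  return m ? { text: m[0], index: m.index } : null;
});

console.log(dotResults);
// [ { text: '.', index: 1 }, { text: '.', index: 2 }, null ]
```

Because the letter is only asserted by the lookbehind, it is never part of the match, so only the dot is returned.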
As per the comments, the pattern with the positive lookahead added:
(?<=[a-zA-ZÀ-ž])\.+ *(?=[^.,])
Regex demo

Get the first character using Regex

I'm using regex trying to get the first character of a specific word between (.*?)
About Sildenafil Citrate Phosphodiesterase-5 Enzyme Inhibitor
and the regex:
Citrate (.*?)Enzyme
So I get match Phosphodiesterase-5
But I need to get only the first character P
You could use a capturing group to capture a single non-whitespace char (\S) and use word boundaries \b:
\bCitrate (\S).*? Enzyme\b
Regex demo
Changing your regex to Citrate (.).*?Enzyme would be enough. This captures the first character after "Citrate ".
If your environment supports lookarounds, you can try this pattern:
(?<=\bCitrate ).(?=.*?Enzyme\b)
(?<=\bCitrate ) - Positive lookbehind, match must be preceded by \bCitrate
. - Match any character except a newline
(?=.*?Enzyme\b) - Positive lookahead, match must be followed by .*?Enzyme\b
Regex Demo
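The capture-group and lookaround variants can be compared side by side. This is a sketch in JavaScript, assuming lookbehind support; the question does not say which engine is in use, and the variable names are illustrative.

```javascript
const sentence = "About Sildenafil Citrate Phosphodiesterase-5 Enzyme Inhibitor";

// Capture-group variant: the first character is in group 1.
const groupMatch = sentence.match(/\bCitrate (\S).*? Enzyme\b/);
const viaGroup = groupMatch ? groupMatch[1] : null;

// Lookaround variant: the first character is the whole match.
const lookMatch = sentence.match(/(?<=\bCitrate ).(?=.*?Enzyme\b)/);
const viaLookaround = lookMatch ? lookMatch[0] : null;

console.log(viaGroup, viaLookaround); // P P
```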

JavaScript regex to match a character only if preceded with another character

I have the following regular expression:
[^0-9+-]|(?<=.)[+-]
This regex matches either a character that is not a digit, + or -, or a + or - that is preceded by something. However, positive lookbehind isn't supported in JavaScript regex. How can I make it work?
The (?<=.) lookbehind just makes sure the subsequent pattern is not located at the start of the string. In JS, it is easy to do with (?!^) lookahead:
[^0-9+-]|(?!^)[+-]
          ^^^^^
See the regex demo (cf. the original regex demo).
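As an illustration, the rewritten pattern could be used to strip unwanted characters from a numeric input while keeping a leading sign. This use case and the sample strings are assumptions, as the question does not say what the regex is for.

```javascript
// Matches any char that is not a digit, + or -, and any + or - that is not at the start.
const cleanup = /[^0-9+-]|(?!^)[+-]/g;

// The leading sign survives; the letter and the later signs are removed.
const cleaned = "+12a-3+4".replace(cleanup, "");
console.log(cleaned); // +1234

// The dot is not a digit or sign, so it is removed as well.
const negative = "-5.5".replace(cleanup, "");
console.log(negative); // -55
```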