Looking for help to construct a Regex for pattern matching

I'm looking for help writing a regex that matches some name patterns and rejects others.
Here's the list of cases I want to match / not match:
// Should Match :
_class
c-class
_class-like
_class--variation
_class__children
_class__children--variation
c-custon-button-test
_class__lol--test
c-my-button-super-style
_class--variation-like
// Should not Match :
class
c--class
_class---variation
_class----variation
_class__test__test
_class--variation__children
_like
c-like
noMargin
no-Margin
_no-Margin
no-margin
_class-like__children
_class-like--variation
So far I have come up with this regex:
^(c-|_)([a-z]+)(__|--|-)?([a-z]+)(-{0,2}[a-z]+)+(-?(([a-z]-?)+|(like))$)
It almost works, but it still matches some cases that shouldn't match, and I'm struggling to work out how to handle those last cases.
(Here's a link to regex101 with unit tests and the match cases: https://regex101.com/r/HNAUpd/1/)
Edit: I forgot to mention that the word "like" is a keyword in my pattern: it can only appear at the end of the string and cannot be the only word in the string.
Edit 2: The matching rules are as follows:
A string can start only with "_", "c-" or "js-".
The following word can be anything except the word "like" and must contain only lowercase letters in the range [a-z].
The word "like" can only be the last word of the string and must not be the only word in the string.
Words can be separated by "--" or "__".
If the string starts with "c-", words can also be separated by "-" in addition to the separators above.
The purpose of all this is a CSS class/id matcher for a linter.
If anyone can help me with this, it would be awesome :)

I think you're looking for something like this:
^(?!.*[\-_]like[\-_])(?:c-|js-|_)(?!like$)(?:[a-z]+(?:__|--?))?[a-z]+(?:--?[a-z]+)*$
Demo
Breakdown:
^ - Beginning of the string.
(?!.*[\-_]like[\-_]) - Doesn't contain the word "like" between two separators (only at the end of the string).
(?:c-|js-|_) - Either "c-", "js-", or "_" at the beginning of the string.
(?!like$) - Not immediately followed by the word "like".
(?:[a-z]+(?:__|--?))? - (optional) one or more a-z letters followed by two underscores or by one or two hyphens.
[a-z]+ - One or more a-z letters.
(?:--?[a-z]+)* - Match one or two hyphens followed by one or more a-z letters, and repeat zero or more times.
$ - End of string.
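To check the pattern against the cases listed in the question, a small harness along these lines could be used. This is only a sketch: the names CLASS_NAME, SHOULD_MATCH, SHOULD_NOT_MATCH and is_valid are made up for illustration, and Python's re module stands in for whatever regex engine the linter actually uses.

```python
import re

# Pattern from the answer above, unchanged (split over two lines for readability).
CLASS_NAME = re.compile(
    r"^(?!.*[\-_]like[\-_])(?:c-|js-|_)(?!like$)"
    r"(?:[a-z]+(?:__|--?))?[a-z]+(?:--?[a-z]+)*$"
)

# Cases copied from the question.
SHOULD_MATCH = [
    "_class", "c-class", "_class-like", "_class--variation",
    "_class__children", "_class__children--variation",
    "c-custon-button-test", "_class__lol--test",
    "c-my-button-super-style", "_class--variation-like",
]
SHOULD_NOT_MATCH = [
    "class", "c--class", "_class---variation", "_class----variation",
    "_class__test__test", "_class--variation__children", "_like", "c-like",
    "noMargin", "no-Margin", "_no-Margin", "no-margin",
    "_class-like__children", "_class-like--variation",
]


def is_valid(name: str) -> bool:
    """Return True when the whole name is accepted by the pattern."""
    return CLASS_NAME.match(name) is not None


# Print any case whose result differs from what the question expects.
for name in SHOULD_MATCH:
    if not is_valid(name):
        print("unexpected rejection:", name)
for name in SHOULD_NOT_MATCH:
    if is_valid(name):
        print("unexpected match:", name)
```

Running the list as a regression check like this makes it easier to spot which rule breaks when the pattern is adjusted later.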

Related

Find certain colons in string using Regex

I'm trying to find colons in a given string so that I can split the string at each colon for preprocessing, based on the following conditions:
1. Match if it is preceded or followed by a word, e.g. A Book: Chapter 1 or A Book :Chapter 1
2. Do not match if it is part of an emoticon, i.e. :( or ): or :/ or :-) etc.
3. Do not match if it is part of a time, e.g. 16:00
I've come up with this regex:
(\:)(?=\w)|(?<=\w)(\:)
which satisfies conditions 1 & 2 but fails on condition 3, as it matches the colon in the string representation of a time. How do I fix this?
edit: it has to be in a single regex statement if possible

You can use
(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)
See the regex demo. Details:
(:\b|\b:) - Group 1: a : that is either preceded or followed by a word char
(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) - the : must not be followed by one or two digits (and then a word boundary) if it is preceded by one or two digits (with a word boundary before them).
Note :\b is equal to :(?=\w) and \b: is equal to (?<=\w):.
If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)).
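As a rough illustration, the pattern can be tried in Python, whose re module accepts the fixed-width lookbehinds used here. The helper name colon_positions is hypothetical and only reports where a split would happen.

```python
import re

# Pattern from the answer above: a colon next to a word character,
# unless it sits between the digits of a time such as 16:00.
COLON = re.compile(r"(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)")


def colon_positions(text: str) -> list[int]:
    """Return the index of every colon the pattern accepts."""
    return [m.start() for m in COLON.finditer(text)]


print(colon_positions("A Book: Chapter 1"))   # [6]
print(colon_positions("A Book :Chapter 1"))   # [7]
print(colon_positions("16:00"))               # []
print(colon_positions(":( ): :/ :-)"))        # []
```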
More flexible solution
Note that excluding matches can also be done with a simpler pattern that captures what you need and merely matches (without capturing) what you do not need. This is known as the "best regex trick ever". So, you may use a regex like
8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)
that will match 8:, :P, :D, one or more digits and then one or more sequences of : and one or more digits, or will match and capture into Group 1 a : char that is either preceded or followed with a word char. All you need to do is to check if Group 1 matched, and implement required extraction/replacement logic in the code.
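The "check if Group 1 matched" step might look like the following in Python. This is a sketch under assumptions: split_at_colons is a made-up helper, and trimming whitespace around each piece is a choice made here, not something the answer specifies.

```python
import re

# Alternatives without a group consume text that must be skipped
# (emoticons, times); only the last alternative captures into Group 1.
PATTERN = re.compile(r"8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)")


def split_at_colons(text: str) -> list[str]:
    """Split text at every colon that was captured into Group 1."""
    parts = []
    last = 0
    for m in PATTERN.finditer(text):
        if m.group(1) is None:
            continue  # matched an emoticon or a time: ignore it
        parts.append(text[last:m.start(1)].strip())
        last = m.end(1)
    parts.append(text[last:].strip())
    return parts


print(split_at_colons("A Book: Chapter 1 at 16:00 :D"))
# ['A Book', 'Chapter 1 at 16:00 :D']
```

The pattern itself stays simple because the decision is moved into code: new things to skip only need another alternative in front of the capturing one.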

Word characters (\w) include digits: \w covers [a-zA-Z0-9_].
So just use [a-zA-Z] instead:
(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)
Test Here

How to group expressions to be matched as one?

What I am trying to match looks like this:
char-char-int-int-int
char-char-char-int-int-int
char-char-int-int-int-optionalValue (optionalValue being a "-" followed by letters)
My current regexp looks like this:
([A-Za-z]{1,2})([1-9]{3})("-"[\w])
In the end, the regexp should match any of these:
AB001
aB999
Hm000
en789
rv005-ab
These should be invalid:
ab (because only letters)
abcfr (because too many letters)
158 (because only numbers)
78532 (because too many numbers)
123ab (because all letters should come before the numbers, optionalValue excepted)
a1b23 (because letters and numbers are mixed)
What am I doing wrong? (Please be gentle, this is my first post ever on Stack Overflow.)

If you use [A-Za-z]{1,2}, then the second example would not match, as it has 3 letters (char-char-char).
Using \w would also match digits and an underscore. If you mean only letters a-zA-Z, you can put them in an optional group preceded by a hyphen: (?:-[a-zA-Z]+)?
You could use
^[a-zA-Z]{2,3}[0-9]{3}(?:-[a-zA-Z]+)?$
^ Start of string
[a-zA-Z]{2,3} Match 2 or 3 times a char A-Za-z
[0-9]{3} Match 3 digits
(?:-[a-zA-Z]+)? Optionally match a - and 1 or more chars A-Za-z
$ End of string
Regex demo
Or using word boundaries \b instead of anchors
\b[a-zA-Z]{2,3}[0-9]{3}(?:-[a-zA-Z]+)?\b
Regex demo
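A quick way to confirm the anchored pattern against the examples from the question is a sketch like this one in Python. The names CODE, VALID, INVALID and is_code are made up for illustration.

```python
import re

# Anchored pattern from the answer above.
CODE = re.compile(r"^[a-zA-Z]{2,3}[0-9]{3}(?:-[a-zA-Z]+)?$")

# Examples copied from the question.
VALID = ["AB001", "aB999", "Hm000", "en789", "rv005-ab"]
INVALID = ["ab", "abcfr", "158", "78532", "123ab", "a1b23"]


def is_code(value: str) -> bool:
    """Return True when the whole value has the expected shape."""
    return CODE.match(value) is not None


for value in VALID + INVALID:
    print(value, is_code(value))
```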

I have corrected your regex below. Please give it a try.
([A-Za-z]{1,2})([0-9]{3})(-\w*)?
Demo

How can I allow or ignore apostrophes?

I am looking for a regex that allows (or ignores) an apostrophe. I'm fairly new to regex, and I looked at other similar questions but didn't find the help I need.
I am using a textbox to search a RichTextBox (RTB) and match all words with a specific or common ending (i.e. the search term entered in the textbox). Then, I need to pass all matches to a second RTB.
I have tried many different expressions, including \b\w*[-']\w*\b, but the program either splits the word at the apostrophe, finds only words with an apostrophe, or lists all words as matches.
My sample list of words to search is:
mi'iria, mi'i, piraria, makuptiaria, netap, hap, kuap, uimikuaptiaria, uhyt, set, uipu'aptiaria, mu'ap, atat, hat, haria, yat. (commas are not in the original list)!
As you can see, some of the words that end in "ria" contain an apostrophe and some do not. I want to match all words that end with "ria", but I get results like mi as one match and iria as another, while piraria, makuptiaria, uimikuaptiaria and haria aren't matched.
I need an expression that will allow (or ignore) the apostrophe so that all words that end in "ria" are matched, whether or not they contain an apostrophe. Also, words which contain an apostrophe (i.e. similar to mi'iria) should not be split at the apostrophe. Can anyone help with this? I am very grateful for any help! Thanks!
Okay, I spent some time tinkering on https://regex101.com/r/X4oL0y/1 and came up with the following expression which matches all words that end with "ria" including those with and those without an apostrophe:
\b\w+\'?\w+ria\w*\b
However, the w+ria part of this regex represents literal characters. This limits the functionality to words that end with "ria." Is there a way to generically declare the search term the user enters in the textbox as the character(s) to match so that all whole words that end with the search term are matched?
This is my code so far:
'Set index:
Dim index As Integer = 0
'Find and highlight all search term occurrences:
While index < RichTextBox1.Text.LastIndexOf(TextBox1.Text)
    RichTextBox1.Find(TextBox1.Text, index, RichTextBox1.TextLength, RichTextBoxFinds.None)
    RichTextBox1.SelectionBackColor = ColorTranslator.FromOle(RGB(255, 255, 192))
    index = RichTextBox1.Text.IndexOf(TextBox1.Text, index) + 1
End While
' Input string.
Dim value As String = RichTextBox1.Text
' Call Regex.Matches method.
Dim matches As MatchCollection = Regex.Matches(value, "\b\w+\'?\w+ria\w*\b")
' Loop over matches.
For Each m As Match In matches
    ' Loop over captures.
    For Each c As Capture In m.Captures
        ' Display.
        RichTextBox2.Text += String.Format("Index={0}, Value={1}" & Chr(13), c.Index, c.Value)
    Next
Next

If you want the whole word to be matched, you could make the character class optional [-']? and add ria to the end right before the word boundary
\b\w*[-']?\w*ria\b
See a .NET regex demo
As per the comment of @ctwheels, using an optional non-capturing group is more efficient:
\b\w*(?:[-']\w*)?ria\b
\b Word boundary
\w* Match 0+ word chars
(?: Non capturing group
[-']\w* Match either - or ' and 0+ word chars
)? Close group and make it optional
ria Match literally
\b Word boundary
See another .NET regex demo
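The question also asks how to use whatever the user types as the ending instead of the literal "ria". One possible approach, sketched here in Python rather than VB.NET, is to escape the search term and splice it into the pattern. The helper name words_ending_with is an assumption for illustration; in .NET, Regex.Escape plays the role that re.escape plays here.

```python
import re


def words_ending_with(text: str, term: str) -> list[str]:
    """Return whole words (apostrophes and hyphens allowed) ending in term."""
    # re.escape keeps characters such as "." or "+" in the term literal.
    pattern = r"\b\w*(?:[-']\w*)?" + re.escape(term) + r"\b"
    return re.findall(pattern, text)


WORDS = ("mi'iria, mi'i, piraria, makuptiaria, netap, hap, kuap, "
         "uimikuaptiaria, uhyt, set, uipu'aptiaria, mu'ap, atat, hat, haria, yat")

print(words_ending_with(WORDS, "ria"))
# ["mi'iria", 'piraria', 'makuptiaria', 'uimikuaptiaria', "uipu'aptiaria", 'haria']
```

Escaping matters because the term comes from a textbox: without it, input containing regex metacharacters would change the meaning of the pattern.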

Assuming the list is in a file like this:
mi'iria
mi'i
piraria
makuptiaria
netap
hap
kuap
uimikuaptiaria
uhyt
set
uipu'aptiaria
mu'ap
atat
hat
haria
yat
Try this one
\b[a'-z]*.ria\b

How to get the first match in regexp?

I have three strings, as listed below:
Levofloxacin 500mg/100mL
Levofloxacin 500mg
Procaterol Hydrochloride …………… 25μg
For the first line, I want to get just 'mg', without 'mL', in my result.
For the second line, I want to get 'mg'.
For the third line, I want to get 'μg'.
I have tried a regexp pattern like:
(?!(.*[ ]{1}[0-9]+))[a-zA-Zμ]+
However, the first line always returns 'mg' together with 'mL'...
How can I get just 'mg' with a regexp?
Any suggestions will be appreciated.

As mentioned in the comment section, try this regex:
^\D*[\d.]+\K[a-zμ]+
Click for Demo
Explanation:
^ - asserts the start of the string
\D* - matches 0+ occurrences of any character that is not a digit
[\d.]+ - matches 1+ occurrences of a digit or a dot
\K - resets the start of the reported match, so everything matched so far is dropped from the result
[a-zμ]+ - this is what you want. This will contain the units like mg, ml appearing after the first number. If there are any other special characters like μ, you can add them too in this character list
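Not every engine supports \K (Python's built-in re module does not, for example). In such engines the same idea can be expressed with a capture group. The following is a sketch of that substitute technique, not the answer's exact pattern; first_unit and SAMPLES are names invented here.

```python
import re
from typing import Optional

# Same prefix as the pattern above, but the unit is captured in Group 1
# instead of relying on \K to discard the prefix.
UNIT = re.compile(r"^\D*[\d.]+([a-zμ]+)")

# Strings copied from the question.
SAMPLES = [
    "Levofloxacin 500mg/100mL",
    "Levofloxacin 500mg",
    "Procaterol Hydrochloride …………… 25μg",
]


def first_unit(text: str) -> Optional[str]:
    """Return the unit that follows the first number, or None."""
    m = UNIT.match(text)
    return m.group(1) if m else None


for sample in SAMPLES:
    print(sample, "->", first_unit(sample))
```

Because the pattern is anchored at the start and \D* cannot cross a digit, only the unit after the first number is returned, so 'mL' in the first string is never reached.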

REGEX to find the first one or two capitalized words in a string

I am looking for a REGEX to find the first one or two capitalized words in a string. If the first two words are capitalized, I want the first two words. A hyphen should be considered part of a word.
for Madonna has a new album I'm looking for Madonna
for Paul Young has no new album I'm looking for Paul Young
for Emmerson Lake-palmer is not here I'm looking for Emmerson Lake-palmer
I have been using ^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1} which does great on the first two, but for the 3rd example I get Emmerson Lake, instead of Emmerson Lake-palmer.
What REGEX can I use to find the first one or two capitalized words in the above examples?

You may use
^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?
See the regex demo
Basically, use a character class [-a-zA-Z]* instead of a dot matching pattern to only match letters and a hyphen.
Details
^ - start of string
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
(?:\s+[A-Z][-a-zA-Z]*)? - an optional (1 or 0 due to ? quantifier) sequence of:
\s+ - 1+ whitespace
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
A Unicode aware equivalent (for the regex flavors supporting Unicode property classes):
^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?
where \p{L} matches any letter and \p{Lu} matches any uppercase letter.
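The ASCII version can be checked against the three examples from the question with a short sketch like this. The names LEADING_NAME and leading_name are hypothetical; the Unicode variant is left out because Python's built-in re module does not support \p{...} classes.

```python
import re
from typing import Optional

# ASCII pattern from the answer above.
LEADING_NAME = re.compile(r"^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?")


def leading_name(text: str) -> Optional[str]:
    """Return the first one or two capitalized words, or None."""
    m = LEADING_NAME.match(text)
    return m.group(0) if m else None


print(leading_name("Madonna has a new album"))           # Madonna
print(leading_name("Paul Young has no new album"))       # Paul Young
print(leading_name("Emmerson Lake-palmer is not here"))  # Emmerson Lake-palmer
```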

This is probably simpler:
^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?
Replace + with * if you expect single-letter words.

If you need a full name only (two words, each starting with a capital letter), this is a simple example:
^([A-Z][a-z]*)(\s)([A-Z][a-z]+)$
Try it. Enjoy!
