Regular Expression to search substring

Let's say I have a string like: Michael is studying at the Faculty of Economics at the University
I need to check whether a given string contains the following expression: Facul* of Econom*
where the asterisk means that the word can have many different endings.
In general, my goal is to find similar expressions in tables in a ClickHouse database. If you can suggest other ways of solving this problem, I would be grateful.

If you want to match any lowercase letters following your two words use this:
\bFacul[a-z]* of Econom[a-z]*\b
If you want to match any letters, upper- or lowercase, following your two words use this:
\bFacul[A-Za-z]* of Econom[A-Za-z]*\b
Explanation:
\b - word boundary
Facul - literal text
[A-Za-z]* - 0 to multiple alpha chars
of - literal text
Econom - literal text
[A-Za-z]* - 0 to multiple alpha chars
\b - word boundary
If you want to be more forgiving with upper/lowercase and spaces use this:
\b[Ff]acul[A-Za-z]* +of +[Ee]conom[A-Za-z]*\b
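As a quick check of the difference between the stricter and the more forgiving pattern, here is a minimal sketch in Python. The variable names are made up for illustration, and Python's re module stands in for whatever engine you actually use:

```python
import re

text = "Michael is studying at the Faculty of Economics at the University"

# The two patterns from the answer above, unchanged.
strict = re.compile(r"\bFacul[A-Za-z]* of Econom[A-Za-z]*\b")
forgiving = re.compile(r"\b[Ff]acul[A-Za-z]* +of +[Ee]conom[A-Za-z]*\b")

# The strict pattern finds the phrase in the sample sentence.
found = strict.search(text)
print(found.group(0) if found else None)

# Lowercase initials and a double space defeat the strict pattern
# but are accepted by the forgiving one.
variant = "the faculty  of economy"
print(bool(strict.search(variant)), bool(forgiving.search(variant)))
```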

Use any number of "word" chars for word tails and "word boundary" at the front:
\bFacul\w* of Econom\w*
consider case insensitivity too:
(?i)\bfacul\w* of econom\w*
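If the search expressions arrive in the asker's star notation, they could be translated into this kind of pattern mechanically. The sketch below is only an illustration in Python: wildcard_to_regex is a hypothetical helper (not part of any library), and it assumes that * always means "any word tail".

```python
import re

def wildcard_to_regex(expr: str) -> re.Pattern:
    """Hypothetical helper: turn 'Facul* of Econom*' into a compiled regex."""
    # Escape the literal pieces so that only '*' is treated specially.
    parts = [re.escape(piece) for piece in expr.split("*")]
    # Each '*' becomes \w* (zero or more word characters).
    body = r"\w*".join(parts)
    # Word boundary at the front, case-insensitive matching.
    return re.compile(r"\b" + body, re.IGNORECASE)

pattern = wildcard_to_regex("Facul* of Econom*")
text = "Michael is studying at the Faculty of Economics at the University"
match = pattern.search(text)
print(match.group(0) if match else None)
```

Escaping the literal parts first keeps characters such as . or ( in a search expression from being read as regex syntax.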

Related

Postgresql - Regex to get all words in string without special characters except -

Input
Word-Word, Some other words and this is another word et another one
Expected output
Word-Word
Some
other
words
this
is
another
word
another
one
I have a table (t) with many strings like the one shown in the input.
I'm trying to get every word in the sentence except the commas (','), the words 'and', 'et', 'und', and of course any whitespace or sequence of whitespace that may be between words.
Regex that I'm using:
Don't match whitespace \\s+
Don't match whitespace as well as special characters ((\b[^\s]+\b)((?<=\.\w).)?) - doesn't work in Postgres for some reason
Don't match a particular word ^(?!et$|and$|und$) - doesn't work either
Query that I'm running
SELECT word FROM t,
unnest(regexp_split_to_array(t.word, E'Missing expression')) as word;

You can use an extracting approach here in the following way:
SELECT regexp_matches(
'Word-Word, Some other words and this is another word et another one ',
E'\\y(?!(?:et|[ua]nd)\\y)\\w+(?:-\\w+)*',
'g');
See the online demo. Regex details:
\y - a word boundary
(?!(?:et|[ua]nd)\y) - a negative lookahead that fails the match if there is et, und or and as whole words immediately to the right of the current location
\w+(?:-\w+)* - one or more word chars and then zero or more occurrences of - and one or more word char sequences
See the regex demo (converted to PCRE).
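To try the same extraction outside the database, here is a minimal sketch in Python. It assumes Python's re module, where the word boundary is written \b instead of PostgreSQL's \y; the rest of the pattern is unchanged:

```python
import re

# Same pattern as above, with \y replaced by \b for Python's re module.
WORDS = re.compile(r"\b(?!(?:et|[ua]nd)\b)\w+(?:-\w+)*")

text = "Word-Word, Some other words and this is another word et another one "

# findall returns whole matches because the pattern has no capturing groups.
words = WORDS.findall(text)
print(words)
```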

Looking for help to construct a Regex for pattern matching

I'm looking for help in making a regex to match and not match a series of name patterns if anyone can help with that.
Here's a list of cases I want to match/ not match :
// Should Match :
_class
c-class
_class-like
_class--variation
_class__children
_class__children--variation
c-custon-button-test
_class__lol--test
c-my-button-super-style
_class--variation-like
// Should not Match :
class
c--class
_class---variation
_class----variation
_class__test__test
_class--variation__children
_like
c-like
noMargin
no-Margin
_no-Margin
no-margin
_class-like__children
_class-like--variation
For now I have come up with this regex:
^(c-|_)([a-z]+)(__|--|-)?([a-z]+)(-{0,2}[a-z]+)+(-?(([a-z]-?)+|(like))$)
It almost works, but it still matches some cases that it shouldn't, and I'm struggling to work out how to handle those last cases.
(Here's a link to regex101 with unit tests and match cases: https://regex101.com/r/HNAUpd/1/)
Edit: I forgot to mention that the word "like" is a keyword in my pattern. It can only appear at the end of the string and cannot be the only word in the string.
Edit 2: The matching rules are as follows:
A string can start only with "_", "c-" or "js-".
The following word can be anything except the word "like", and must consist only of lowercase letters in the range [a-z].
The word "like" can only be the last word of the string and must not be the only word in the string.
Words can be separated by "--" or "__".
If the string starts with "c-", words can also be separated by "-", in addition to the previous separators.
The purpose of all this is a CSS class/id matcher for a linter.
If anyone can help me with this, it would be awesome :)

I think you're looking for something like this:
^(?!.*[\-_]like[\-_])(?:c-|js-|_)(?!like$)(?:[a-z]+(?:__|--?))?[a-z]+(?:--?[a-z]+)*$
Demo
Breakdown:
^ - Beginning of the string.
(?!.*[\-_]like[\-_]) - Doesn't contain the word "like" between two separators (only at the end of the string).
(?:c-|js-|_) - Either "c-", "js-", or "_" at the beginning of the string.
(?!like$) - Not immediately followed by the word "like".
(?:[a-z]+(?:__|--?))? - (optional) one or more a-z letters followed by two underscores or one or two hyphens.
[a-z]+ - One or more a-z letters.
(?:--?[a-z]+)* - Match one or two hyphens followed by one or more a-z letters, and repeat zero or more times.
$ - End of string.
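One way to check the pattern against the lists from the question is a small test harness. This is a sketch in Python (the linter in question may well use another engine), and the variable names are made up for illustration:

```python
import re

CLASS_NAME = re.compile(
    r"^(?!.*[\-_]like[\-_])(?:c-|js-|_)(?!like$)"
    r"(?:[a-z]+(?:__|--?))?[a-z]+(?:--?[a-z]+)*$"
)

should_match = [
    "_class", "c-class", "_class-like", "_class--variation",
    "_class__children", "_class__children--variation",
    "c-custon-button-test", "_class__lol--test",
    "c-my-button-super-style", "_class--variation-like",
]
should_not_match = [
    "class", "c--class", "_class---variation", "_class----variation",
    "_class__test__test", "_class--variation__children", "_like", "c-like",
    "noMargin", "no-Margin", "_no-Margin", "no-margin",
    "_class-like__children", "_class-like--variation",
]

# Collect every case that behaves differently from what the question lists.
failures = [s for s in should_match if not CLASS_NAME.match(s)]
failures += [s for s in should_not_match if CLASS_NAME.match(s)]
print(failures)  # an empty list means every case behaved as listed
```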

REGEX to find the first one or two capitalized words in a string

I am looking for a regex to find the first one or two capitalized words in a string. If the first two words are capitalized, I want both of them. A hyphen should be considered part of a word.
For Madonna has a new album I'm looking for Madonna
For Paul Young has no new album I'm looking for Paul Young
For Emmerson Lake-palmer is not here I'm looking for Emmerson Lake-palmer
I have been using ^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1} which works well on the first two, but for the third example I get Emmerson Lake instead of Emmerson Lake-palmer.
What regex can I use to find the first one or two capitalized words in the above examples?

You may use
^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?
See the regex demo
Basically, use a character class [-a-zA-Z]* instead of a dot matching pattern to only match letters and a hyphen.
Details
^ - start of string
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
(?:\s+[A-Z][-a-zA-Z]*)? - an optional (1 or 0 due to ? quantifier) sequence of:
  \s+ - 1+ whitespace
  [A-Z] - an uppercase ASCII letter
  [-a-zA-Z]* - zero or more ASCII letters / hyphens
A Unicode aware equivalent (for the regex flavors supporting Unicode property classes):
^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?
where \p{L} matches any letter and \p{Lu} matches any uppercase letter.
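The ASCII pattern can be checked against the three examples from the question with a short sketch in Python. The function name is made up for illustration. Note that Python's built-in re module does not support \p{Lu}, so only the ASCII variant is used here:

```python
import re

# ASCII pattern from the answer above, unchanged.
LEADING_NAME = re.compile(r"^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?")

def leading_capitalized(text):
    """Return the first one or two capitalized words, or None."""
    m = LEADING_NAME.match(text)
    return m.group(0) if m else None

for sample in (
    "Madonna has a new album",
    "Paul Young has no new album",
    "Emmerson Lake-palmer is not here",
):
    print(leading_capitalized(sample))
```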

This is probably simpler:
^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?
Replace + with * if you expect single-letter words.

If you need a full name only (exactly two words, each starting with a capital letter), here is a simple example:
^([A-Z][a-z]*)(\s)([A-Z][a-z]+)$
Try it. Enjoy!

Regex to pull uppercase words and timestamps?

I'm quite inexperienced with Regex and even though I would like to figure it out myself, I'm not sure how to get started.
I would like to develop a Ruby scan Regex that takes a string and returns an array of strings. The Regex should identify stock market ticker symbols, and also include short timestamps (inc. -1d, -1m, -1y) if they follow the ticker.
As an example:
How is AMZN-1d today and what about MSFT?
would return...
["AMZN-1d", "MSFT"]
Additionally, if this could be expanded on to the following Regex, which gets the ticker symbols, but not timestamps - that would be brilliant!
scan(/[\b\$]?[A-Z]{1,}\.[A-Z]+\b|[\b\$]?[A-Z]{2,}\b|\$[A-Z]{1,}\b|\b[A-Z]{1,}\$/)

You can use
/\b\p{Lu}{2,}(?:-\d\p{L}+\b)?/
See the regex demo
The pattern matches:
\b - word boundary
\p{Lu}{2,} - 2 or more uppercase letters
(?:-\d\p{L}+\b)? - 1 or zero sequences (due to the ? quantifier) of
  - - a hyphen
  \d - a digit (add a + quantifier to match 1 or more digits if more than 1 can occur)
  \p{L}+ - 1 or more letters
If you only need to match ASCII characters, replace \d with [0-9], \p{L} with [a-zA-Z] and \p{Lu} with [A-Z].
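For illustration, here is the ASCII variant described above applied to the example sentence. This sketch uses Python's re.findall as a stand-in for Ruby's scan. The substitutions are the ones named above, because Python's built-in re module has no \p{...} classes:

```python
import re

# ASCII variant: \p{Lu} -> [A-Z], \d -> [0-9], \p{L} -> [a-zA-Z].
TICKER = re.compile(r"\b[A-Z]{2,}(?:-[0-9][a-zA-Z]+\b)?")

sentence = "How is AMZN-1d today and what about MSFT?"

# findall returns whole matches because the only group is non-capturing.
tickers = TICKER.findall(sentence)
print(tickers)
```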

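Your specifications are incomplete, so it is not possible to give a completely valid answer.
You may try something like this:
/[A-Z]{2,}-\d[dmy]|[A-Z]{2,}/
Ruby's scan already returns every match, so no g flag is needed (Ruby regex literals do not have one), and without capture groups scan returns plain strings.
I'm assuming that ticker symbols have a minimum length of two characters.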

PCRE - perl regex

I am trying to write a regex in PCRE for string detection. The kind of strings I want to detect are abcdef001, zxyabc003: a word whose first 6 characters are a-zA-Z and whose last two or three are digits 0-9. This string could be anywhere in the whole text.
E.g - "User activity from server1, user id abcdef009, time 10.20am".
How do I go about this?

Try this:
/[a-zA-Z]{6}[0-9]{2,3}/
If you want to limit it to whole words, try:
/\b[a-zA-Z]{6}[0-9]{2,3}\b/
\b - word boundary
[a-zA-Z]{6} - six letters
[0-9]{2,3} - either 2 or 3 digits
\b - word boundary
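The difference between the two variants shows up when the ID is embedded in a longer token. Here is a minimal sketch using Python's re module as a stand-in for PCRE (both patterns use only syntax the two engines share). The second input string is made up for illustration:

```python
import re

UNBOUNDED = re.compile(r"[a-zA-Z]{6}[0-9]{2,3}")
WHOLE_WORD = re.compile(r"\b[a-zA-Z]{6}[0-9]{2,3}\b")

log_line = "User activity from server1, user id abcdef009, time 10.20am"
embedded = "token xyzabcdef0012 seen"  # hypothetical input with a longer token

# In the log line both patterns find the ID; "server1" has only one digit.
print(UNBOUNDED.findall(log_line), WHOLE_WORD.findall(log_line))

# Inside a longer token only the unbounded pattern matches a slice of it.
print(UNBOUNDED.findall(embedded), WHOLE_WORD.findall(embedded))
```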

Use regex pattern
/[a-z]{6}\d{2,3}/i

We Keep Coding
