Regex to detect preferred stock symbols

To start off, regex is probably my weakest programming skill. This is what I have so far:
\D{1,5}(PR)\D+$
\D{1,5} because common stock symbols are always a maximum of 5 letters
(PR) because that is part of the pattern that needs to be searched (more below in the background info)
\D+$ because I'm trying to match any single letter at the end of the string
A small tidbit of background
Preferred stock symbols are not standardized and so every platform, exchange, etc has their own way to display them. Having said that, most display a special character in their name, which makes those guys easy to detect. The characters are
[] {'.', '/', '-', ' ', '+'};
The trickier ones all have a similar pattern:
{symbol}PR{0}
{symbol}p{0}
{symbol}P{0}
Where 0 is just any single letter A-Z
Here is a sample data set for the trickier ones:
PSAPRZ
PSApA
PSApZ
PSAPA
PSAPZ
My regex seems to be working for the first one, since I'm specifically looking for (PR) and matching any single letter character at the end, but I can't for the life of me figure out how to also detect the patterns that end in p{0} or P{0} in the same regex. I completely gave up trying to incorporate finding the special symbols because I can easily just do a string.Contains on the target string for any of those chars. The more important part is figuring out these trickier ones.
How do I get my regex statement to also detect the p{0} and P{0} matches within the same regex statement?
Edit 1
If you're curious about the madness of different possibilities, including the "easy to detect" versions, grab some popcorn. Here you go :)
PSA.PA
PSA.PR.A
PSA/PA
PSAPRA
PSA-A
PSA PRA
PSA.PRA
PSA.PA
PSA+A
PSA/PRA
PSApA
PSAPA
PSA-PA

This should do it:
^[A-Z]{1,5}([Pp]|PR)[A-Z]$
Explanation:
^ - anchor at start
[A-Z]{1,5} - one to five uppercase letters
([Pp]|PR) - capture group used for: uppercase P or lowercase p or uppercase PR
[A-Z] - one uppercase letter
$ - anchor at end
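As a quick sanity check, here is a minimal Python sketch that runs the pattern above against the sample symbols from the question. The names SIMPLE and is_preferred are made up for this illustration, and Python is used only because it is easy to run; the pattern itself is unchanged.

```python
import re

# Pattern from the answer: 1-5 uppercase letters, then P, p, or PR,
# then exactly one uppercase series letter.
SIMPLE = re.compile(r"^[A-Z]{1,5}([Pp]|PR)[A-Z]$")

def is_preferred(symbol: str) -> bool:
    """Return True if the symbol has the shape {symbol}P{X}, {symbol}p{X} or {symbol}PR{X}."""
    return SIMPLE.match(symbol) is not None

# Sample data set from the question.
samples = ["PSAPRZ", "PSApA", "PSApZ", "PSAPA", "PSAPZ"]
results = {s: is_preferred(s) for s in samples}
print(results)  # every sample maps to True
```

One caveat: any all-uppercase string that happens to end in P plus one letter (a made-up example is ABPC) also passes, so the pattern alone cannot tell such a common-stock symbol from a preferred one.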
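UPDATE after EDIT 1 in question. To support the odd formats with ., /, whitespace, -, or + in front of the P or PR marker, use this (symbols with no P or PR marker at all, such as PSA-A and PSA+A, are still not matched):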
^[A-Z]{1,5}[.\/\s\+\-]?([Pp]|PR\.?)[A-Z]$
Explanation:
^ - anchor at start
[A-Z]{1,5} - one to five uppercase letters
[.\/\s\+\-]? - optional single character: ., /, whitespace, +, or -
([Pp]|PR\.?) - capture group used for: uppercase P, or lowercase p, or uppercase PR followed by optional .
[A-Z] - one uppercase letter
$ - anchor at end
Note on anchors: Use ^...$ anchors if you only have the stock symbol in the string. If you have text with a stock symbol anywhere within, use word boundaries \b...\b instead.
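The following Python sketch checks the extended pattern against the full list from Edit 1. It also tries a hypothetical variant, which is not part of the answer, that lets a bare - or + stand in for the P/PR marker. The names EXTENDED and VARIANT are assumptions made for this example.

```python
import re

# Extended pattern from the answer, unchanged.
EXTENDED = re.compile(r"^[A-Z]{1,5}[.\/\s\+\-]?([Pp]|PR\.?)[A-Z]$")

# Hypothetical variant (not from the answer): either an optional separator
# followed by P, p, or PR, or a bare "-" or "+" with no P/PR marker at all.
VARIANT = re.compile(r"^[A-Z]{1,5}(?:[.\/\s+\-]?(?:[Pp]|PR\.?)|[-+])[A-Z]$")

# Distinct symbols from Edit 1 of the question.
samples = ["PSA.PA", "PSA.PR.A", "PSA/PA", "PSAPRA", "PSA-A", "PSA PRA",
           "PSA.PRA", "PSA+A", "PSA/PRA", "PSApA", "PSAPA", "PSA-PA"]

missed_by_extended = [s for s in samples if not EXTENDED.match(s)]
missed_by_variant = [s for s in samples if not VARIANT.match(s)]

print(missed_by_extended)  # ['PSA-A', 'PSA+A']
print(missed_by_variant)   # []
```

The variant is looser, so it is worth testing against ordinary symbols from your own data before relying on it.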
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex

Related

Looking for help to construct a Regex for pattern matching

I'm looking for help in making a regex to match and not match a series of name patterns if anyone can help with that.
Here's a list of cases I want to match/ not match :
// Should Match :
_class
c-class
_class-like
_class--variation
_class__children
_class__children--variation
c-custon-button-test
_class__lol--test
c-my-button-super-style
_class--variation-like
// Should not Match :
class
c--class
_class---variation
_class----variation
_class__test__test
_class--variation__children
_like
c-like
noMargin
no-Margin
_no-Margin
no-margin
_class-like__children
_class-like--variation
For now I came up with this regex :
^(c-|_)([a-z]+)(__|--|-)?([a-z]+)(-{0,2}[a-z]+)+(-?(([a-z]-?)+|(like))$)
It almost works, but I still get matches on some cases that shouldn't match, and I'm struggling to figure out how to handle those last cases.
(Here's a link to regex101 with unit test and match case: https://regex101.com/r/HNAUpd/1/)
edit: I forgot to mention that the word "like" is a keyword in my pattern. It can only appear at the end of the string and cannot be the only word in the string.
edit 2: The matching rules are as follows:
A string can start only with "_", "c-" or "js-".
The following word can be anything except the word "like" and must consist only of lowercase letters in the range [a-z].
The word "like" can only be the last one of the string and must not be the only one in the string.
Words can be separated by "--" or "__".
If the string starts with "c-", words can also be separated by "-" in addition to the previous separators.
The purpose of all this is for a CSS class/id matcher for a linter.
If anyone can help me with this it would be awesome :)
I think you're looking for something like this:
^(?!.*[\-_]like[\-_])(?:c-|js-|_)(?!like$)(?:[a-z]+(?:__|--?))?[a-z]+(?:--?[a-z]+)*$
Breakdown:
^ - Beginning of the string.
(?!.*[\-_]like[\-_]) - Doesn't contain the word "like" between two separators (only at the end of the string).
(?:c-|js-|_) - Either "c-", "js-", or "_" at the beginning of the string.
(?!like$) - Not immediately followed by the word "like".
(?:[a-z]+(?:__|--?))? - (optional) one or more a-z letters followed by two underscores or by one or two hyphens.
[a-z]+ - One or more a-z letters.
(?:--?[a-z]+)* - Match one or two hyphens followed by one or more a-z letters, and repeat zero or more times.
$ - End of string.
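To check the pattern against the cases listed in the question, a small Python harness along these lines can help. The variable names are made up for this sketch; the pattern is copied from the answer as-is.

```python
import re

# Pattern from the answer, split across two string literals for readability.
PATTERN = re.compile(
    r"^(?!.*[\-_]like[\-_])(?:c-|js-|_)(?!like$)"
    r"(?:[a-z]+(?:__|--?))?[a-z]+(?:--?[a-z]+)*$"
)

should_match = [
    "_class", "c-class", "_class-like", "_class--variation",
    "_class__children", "_class__children--variation",
    "c-custon-button-test", "_class__lol--test",
    "c-my-button-super-style", "_class--variation-like",
]
should_not_match = [
    "class", "c--class", "_class---variation", "_class----variation",
    "_class__test__test", "_class--variation__children", "_like", "c-like",
    "noMargin", "no-Margin", "_no-Margin", "no-margin",
    "_class-like__children", "_class-like--variation",
]

# Names that were expected to match but did not, and vice versa.
false_negatives = [s for s in should_match if not PATTERN.match(s)]
false_positives = [s for s in should_not_match if PATTERN.match(s)]

print(false_negatives)  # []
print(false_positives)  # []
```

Keeping the two lists next to the pattern makes it easy to rerun the check whenever a naming rule changes.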

Regex to replace up to 4 digits before a word

I am using this extension for chrome (It's called Word Replacer II) and I'm trying to create a Regex find and replace.
Quick backstory: my partner is recovering from an eating disorder, and I want to find all mentions of kilojoules and kJs and replace them with placeholder text.
I am entirely new to Regex and after a few hours, I'm not much closer to getting a working expression.
I need it to remove up to 4 digits before the letters "kJs". E.g, 400kJs and 1000kJs. I'd like the "400kJs and 1000kJs" to be replaced with "[removed kJs] and [removed kJs]".
The code I have put together so far is:
\s+(a{1,4}<=\d)\s+(?=kJ)
Any help would be much appreciated!
You may use the following approach:
\d{1,4}\s*kJs\b
If you need to keep kJs, you may wrap the right part of the pattern with a lookahead, \d{1,4}(?=\s*kJs\b).
If you do not want to touch numbers with 5 or more digits, use either of
\b\d{1,4}\s*kJs\b
(?<!\d)\d{1,4}\s*kJs\b
That is, add a word boundary, \b, or a left-hand digit boundary, (?<!\d).
Pattern details
\d{1,4} - one to four digits
\s* - zero or more whitespace characters
kJs - a string of letters
\b - a word boundary (may not be necessary if there can be no word starting with kJs).
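To see the effect of the left-hand boundary, here is a minimal Python sketch of the replacement. The extension in the question uses its own regex engine, so this only illustrates how the patterns behave; the function name redact is an assumption for this example.

```python
import re

# Variant from the answer that refuses to start in the middle of a longer number.
BOUNDED = re.compile(r"(?<!\d)\d{1,4}\s*kJs\b")
# First pattern from the answer, without a left-hand boundary.
UNBOUNDED = re.compile(r"\d{1,4}\s*kJs\b")

def redact(text: str) -> str:
    """Replace up to four digits followed by 'kJs' with a placeholder."""
    return BOUNDED.sub("[removed kJs]", text)

print(redact("400kJs and 1000kJs"))  # [removed kJs] and [removed kJs]
print(redact("12345kJs"))            # 12345kJs (five digits, left untouched)

# Without the boundary, the last four digits of a longer number are replaced.
print(UNBOUNDED.sub("[removed kJs]", "12345kJs"))  # 1[removed kJs]
```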

.NET Regex to look ahead and eliminate strings in advance that don't contain certain characters

I am using the .NET flavor of regex.
Suppose I have the string 123456789AB
and I want to match AB (it could be any two capital letters) only if the numeric part of the string (123456789) contains both 5 and 8.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
What I am expecting is that it will start matching from the start of the string, as a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is: why is that so?
I know replacing it with .*[A-Z]{2} will make it move ahead, but then the match contains the entire string.
What is the solution in this case, other than putting the letter part ([A-Z]{2}) in a separate group and then reading only that group?
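Lookaheads check for a pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is immediately followed by any zero or more characters other than line breaks, as many as possible, and then 5; then, at the same position, a similar check is performed that requires 8 after any zero or more characters. Because lookaheads consume nothing, [A-Z]{2} is then attempted at that same position, so the whole pattern can only succeed where two uppercase letters begin and both 5 and 8 still lie to their right. In 123456789AB the digits come before the letters, so no such position exists.
You may use as many lookbehinds as there are required substrings before the two letters: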
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}
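The lookbehind patterns above rely on .NET's variable-length lookbehind. Python's built-in re module only accepts fixed-width lookbehind, so the sketch below does not reproduce them. It first shows why the asker's lookahead attempt finds nothing, then falls back to the capture-group workaround the question mentions. It assumes the input is a run of digits followed by exactly two capital letters; the names ORIGINAL and WORKAROUND are made up for this example.

```python
import re

text = "123456789AB"

# The asker's pattern: both lookaheads and [A-Z]{2} are tested at the SAME
# position, so the letters would need 5 and 8 to their right.
ORIGINAL = re.compile(r"(?=(.*5))(?=(.*8))[A-Z]{2}")
print(ORIGINAL.search(text))  # None

# Workaround for engines without variable-length lookbehind (assumption:
# the string is digits followed by two capitals). The lookaheads run at the
# start of the string, and only the letters are captured in group 1.
WORKAROUND = re.compile(r"^(?=\d*5)(?=\d*8)\d+([A-Z]{2})$")

match = WORKAROUND.match(text)
print(match.group(1))  # AB

# No 5 among the digits, so the lookahead fails and nothing matches.
print(WORKAROUND.match("12346789AB"))  # None
```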

REGEX to find the first one or two capitalized words in a string

I am looking for a REGEX to find the first one or two capitalized words in a string. If the first two words are capitalized, I want the first two words. A hyphen should be considered part of a word.
for Madonna has a new album I'm looking for Madonna
for Paul Young has no new album I'm looking for Paul Young
for Emmerson Lake-palmer is not here I'm looking for Emmerson Lake-palmer
I have been using ^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1} which does great on the first two, but for the 3rd example I get Emmerson Lake, instead of Emmerson Lake-palmer.
What REGEX can I use to find the first one or two capitalized words in the above examples?
You may use
^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?
Basically, use a character class [-a-zA-Z]* instead of a dot matching pattern to only match letters and a hyphen.
Details
^ - start of string
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
(?:\s+[A-Z][-a-zA-Z]*)? - an optional (1 or 0 due to ? quantifier) sequence of:
  \s+ - 1+ whitespace
  [A-Z] - an uppercase ASCII letter
  [-a-zA-Z]* - zero or more ASCII letters / hyphens
A Unicode aware equivalent (for the regex flavors supporting Unicode property classes):
^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?
where \p{L} matches any letter and \p{Lu} matches any uppercase letter.
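Here is a minimal Python sketch that applies the ASCII version of the pattern to the three examples from the question. Python's built-in re module does not support \p{...} classes, so the Unicode variant is not shown. The names FIRST_WORDS and leading_capitalized are made up for this illustration.

```python
import re
from typing import Optional

# ASCII pattern from the answer: one capitalized word, optionally followed
# by whitespace and a second capitalized word; hyphens count as word characters.
FIRST_WORDS = re.compile(r"^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?")

def leading_capitalized(text: str) -> Optional[str]:
    """Return the first one or two capitalized words, or None if there are none."""
    m = FIRST_WORDS.match(text)
    return m.group(0) if m else None

print(leading_capitalized("Madonna has a new album"))           # Madonna
print(leading_capitalized("Paul Young has no new album"))       # Paul Young
print(leading_capitalized("Emmerson Lake-palmer is not here"))  # Emmerson Lake-palmer
```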
This is probably simpler:
^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?
Replace + with * if you expect single-letter words.
If you need a full name only (exactly two words, each starting with a capital letter), here is a simple example:
^([A-Z][a-z]*)(\s)([A-Z][a-z]+)$
Try it. Enjoy!

Regex to pull uppercase words and timestamps?

I'm quite inexperienced with Regex and even though I would like to figure it out myself, I'm not sure how to get started.
I would like to develop a Ruby scan Regex that takes a string and returns an array of strings. The Regex should identify stock market ticker symbols, and also include short timestamps (inc. -1d, -1m, -1y) if they follow the ticker.
As an example:
How is AMZN-1d today and what about MSFT?
would return...
["AMZN-1d", "MSFT"]
Additionally, if this could be expanded on to the following Regex, which gets the ticker symbols, but not timestamps - that would be brilliant!
scan(/[\b\$]?[A-Z]{1,}\.[A-Z]+\b|[\b\$]?[A-Z]{2,}\b|\$[A-Z]{1,}\b|\b[A-Z]{1,}\$/)
You can use
/\b\p{Lu}{2,}(?:-\d\p{L}+\b)?/
The pattern matches:
\b - word boundary
\p{Lu}{2,} - 2 or more uppercase letters
(?:-\d\p{L}+\b)? - 1 or zero sequences (due to the ? quantifier) of
  - - a hyphen
  \d - a digit (add a + quantifier to match 1 or more digits if more than 1 can occur)
  \p{L}+ - 1 or more letters
  \b - a word boundary
If you only need to match ASCII characters, replace \d with [0-9], \p{L} with [a-zA-Z] and \p{Lu} with [A-Z].
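The question asks for Ruby's scan, but the ASCII form of the pattern behaves the same way in other engines. As a rough illustration, this Python sketch uses re.findall, which plays a similar role to scan here because the pattern has no capture groups. The name TICKER is an assumption made for this example.

```python
import re

# ASCII form of the pattern: \d -> [0-9], \p{L} -> [a-zA-Z], \p{Lu} -> [A-Z].
TICKER = re.compile(r"\b[A-Z]{2,}(?:-[0-9][a-zA-Z]+\b)?")

# With only a non-capturing group, findall returns the full matches.
tickers = TICKER.findall("How is AMZN-1d today and what about MSFT?")
print(tickers)  # ['AMZN-1d', 'MSFT']
```

"How" is skipped because the pattern needs at least two consecutive uppercase letters.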
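Your specifications are incomplete, so it is not possible to give a completely valid answer.
You may try something like this. Ruby has no /g flag (scan already returns every match), and capture groups would make scan return arrays of groups rather than strings, so the timestamp is an optional non-capturing group:
/[A-Z]{2,}(?:-\d[dmy])?/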
I'm assuming that ticker symbols will have a minimum length of two characters.