I'm quite inexperienced with Regex and even though I would like to figure it out myself, I'm not sure how to get started.
I would like to develop a Ruby scan Regex that takes a string and returns an array of strings. The Regex should identify stock market ticker symbols, and also include short timestamps (inc. -1d, -1m, -1y) if they follow the ticker.
As an example:
How is AMZN-1d today and what about MSFT?
would return...
["AMZN-1d", "MSFT"]
Additionally, if the answer could build on the following regex, which gets the ticker symbols but not the timestamps, that would be brilliant!
scan(/[\b\$]?[A-Z]{1,}\.[A-Z]+\b|[\b\$]?[A-Z]{2,}\b|\$[A-Z]{1,}\b|\b[A-Z]{1,}\$/)
You can use
/\b\p{Lu}{2,}(?:-\d\p{L}+\b)?/
The pattern matches:
\b - word boundary
\p{Lu}{2,} - 2 or more uppercase letters
(?:-\d\p{L}+\b)? - 1 or zero sequences (due to the ? quantifier) of
- - a hyphen
\d - a digit (add a + quantifier to match 1 or more digits if more than 1 can occur)
\p{L}+ - 1 or more letters
If you only need to match ASCII characters, replace \d with [0-9], \p{L} with [a-zA-Z] and \p{Lu} with [A-Z].
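A quick way to check the pattern against the question's example, sketched here in Python rather than Ruby. Python's built-in re module does not support \p{Lu} or \p{L}, so this uses the ASCII substitutions mentioned above; re.findall stands in for Ruby's scan, and the names TICKER_RE and find_tickers are made up for illustration.

```python
import re

# ASCII-only form of the pattern: \p{Lu} -> [A-Z], \d -> [0-9], \p{L} -> [a-zA-Z].
TICKER_RE = re.compile(r"\b[A-Z]{2,}(?:-[0-9][a-zA-Z]+\b)?")


def find_tickers(text):
    """Return every ticker, including an optional -<digit><letters> suffix."""
    return TICKER_RE.findall(text)


print(find_tickers("How is AMZN-1d today and what about MSFT?"))
# ['AMZN-1d', 'MSFT']
```

"How" is skipped because {2,} requires at least two consecutive uppercase letters.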
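Your specifications are incomplete, so it is not possible to give a completely valid answer.
You may try using something like this. Ruby has no g flag (scan already returns every match), and the group is non-capturing so that scan returns plain strings rather than arrays of captures.
/[A-Z]{2,}(?:-\d[dmy])?/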
I'm assuming that ticker symbols will have a minimum length of two characters.
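Whether the groups capture matters for scan-style calls. The sketch below shows the effect in Python, on the assumption that re.findall is a fair stand-in for Ruby's scan here: with capturing groups you get one entry per group for each match, and with a non-capturing group you get the matched strings themselves.

```python
import re

text = "How is AMZN-1d today and what about MSFT?"

# Non-capturing group: findall returns the whole matches as strings.
flat = re.findall(r"[A-Z]{2,}(?:-\d[dmy])?", text)

# Two capturing groups: findall returns one tuple per match instead,
# with an empty string for the group that did not participate.
grouped = re.findall(r"([A-Z]{2,}-\d[dmy])|([A-Z]{2,})", text)

print(flat)     # ['AMZN-1d', 'MSFT']
print(grouped)  # [('AMZN-1d', ''), ('', 'MSFT')]
```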
I need to build a regex that enforces the following:
Rules to be applied:
exactly 14 characters
only letters (latin characters) and numbers
at least 3 letters
Regex still confuses me, so I am struggling to get the correct output. I want to use it with Swift and SwiftUI in an app I am making.
(?=(.*[a-zA-Z]){3,}([0-9]){0,}){14,14}$
I tried this, but I know it is not right.
I would use a positive lookahead for the length requirement:
^(?=.{14}$)(?:[A-Za-z0-9]*[A-Za-z]){3}[A-Za-z0-9]*$
This pattern says to match:
^ from the start of the input
(?=.{14}$) assert exact length of 14
(?:
[A-Za-z0-9]*[A-Za-z] zero or more alphanumeric followed by one alpha
){3} repeated three times, which guarantees at least 3 letters
[A-Za-z0-9]* any alphanumeric zero or more times
$ end of the input
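To see the three rules interact, here is a small check of the pattern, written in Python for convenience (the question targets Swift, so treat this as a sketch of the regex behaviour only). The names CODE_RE and is_valid_code and the sample strings are invented for the example.

```python
import re

# Exactly 14 alphanumeric characters, at least 3 of them letters.
CODE_RE = re.compile(r"^(?=.{14}$)(?:[A-Za-z0-9]*[A-Za-z]){3}[A-Za-z0-9]*$")


def is_valid_code(s):
    return CODE_RE.search(s) is not None


samples = [
    "abc12345678901",  # 14 chars, 3 letters
    "12345678901abc",  # letters may appear anywhere
    "ab123456789012",  # 14 chars, only 2 letters
    "abc1234567890",   # 13 chars
    "abc1234567890!",  # 14 chars, but contains a symbol
]
for s in samples:
    print(s, is_valid_code(s))
```

Only the first two samples pass: the third has too few letters, the fourth is too short, and the fifth contains a character outside the allowed set.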
You need to use
^(?=(?:[0-9]*[a-zA-Z]){3})[a-zA-Z0-9]{14}$
Details
^ - start of string
(?=(?:[0-9]*[a-zA-Z]){3}) - a lookahead requiring at least three occurrences of a letter, each preceded by zero or more digits
[a-zA-Z0-9]{14} - fourteen letters/digits
$ - end of string.
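A short Python sketch of this variant, where the lookahead carries the letter count and the main body carries the length. HAS_3_LETTERS_RE, check, and the sample strings are hypothetical names and data chosen for illustration.

```python
import re

# Lookahead: three letters, each optionally preceded by digits.
# Body: exactly fourteen letters or digits.
HAS_3_LETTERS_RE = re.compile(r"^(?=(?:[0-9]*[a-zA-Z]){3})[a-zA-Z0-9]{14}$")


def check(s):
    return HAS_3_LETTERS_RE.search(s) is not None


print(check("1a2b3c45678901"))   # True: 14 chars, letters interleaved with digits
print(check("12345678901234"))   # False: 14 chars, but no letters at all
print(check("abcdefghijklmno"))  # False: 15 chars
```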
To start off, regex is probably the weakest tool in my programming belt. This is what I have so far:
\D{1,5}(PR)\D+$
\D{1,5} because common stock symbols are always a maximum of 5 letters
(PR) because that is part of the pattern that needs to be searched (more below in the background info)
\D+$ because I'm trying to match any single letter at the end of the string
A small tidbit of background
Preferred stock symbols are not standardized and so every platform, exchange, etc has their own way to display them. Having said that, most display a special character in their name, which makes those guys easy to detect. The characters are
[] {'.', '/', '-', ' ', '+'};
The trickier ones all have a similar pattern:
{symbol}PR{0}
{symbol}p{0}
{symbol}P{0}
Where 0 is just any single letter A-Z
Here is a sample data set for the trickier ones:
PSAPRZ
PSApA
PSApZ
PSAPA
PSAPZ
My regex seems to be working for the first one, since I'm specifically looking for (PR) and matching any single letter character at the end, but I can't for the life of me figure out how to also detect the patterns that end in p{0} or P{0} in the same regex. I completely gave up trying to incorporate finding the special symbols because I can easily just do a string.Contains on the target string for any of those chars. The more important part is figuring out these trickier ones.
How do I get my regex statement to also detect the p{0} and P{0} matches within the same regex statement?
Edit 1
If you're curious about the madness of different possibilities, including the "easy to detect" versions, grab some popcorn, here you go :)
PSA.PA
PSA.PR.A
PSA/PA
PSAPRA
PSA-A
PSA PRA
PSA.PRA
PSA.PA
PSA+A
PSA/PRA
PSApA
PSAPA
PSA-PA
This should do it:
^[A-Z]{1,5}([Pp]|PR)[A-Z]$
Explanation:
^ - anchor at start
[A-Z]{1,5} - one to five uppercase letters
([Pp]|PR) - capture group used for: uppercase P or lowercase p or uppercase PR
[A-Z] - one uppercase letter
$ - anchor at end
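The pattern can be tried against the question's sample set with a few lines of Python (a sketch only; the question appears to use a different language, and PREFERRED_RE and preferred_marker are made-up names). The capture group reports which marker was found.

```python
import re

PREFERRED_RE = re.compile(r"^[A-Z]{1,5}([Pp]|PR)[A-Z]$")


def preferred_marker(symbol):
    """Return the matched marker ('P', 'p' or 'PR'), or None if no match."""
    m = PREFERRED_RE.match(symbol)
    return m.group(1) if m else None


for s in ["PSAPRZ", "PSApA", "PSApZ", "PSAPA", "PSAPZ", "PSA"]:
    print(s, preferred_marker(s))
```

The first five symbols match (with markers PR, p, p, P, P), while the plain common symbol PSA does not.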
UPDATE after EDIT 1 in question. To support the odd formats with ., /, -, + use this:
^[A-Z]{1,5}[.\/\s\+\-]?([Pp]|PR\.?)[A-Z]$
Explanation:
^ - anchor at start
[A-Z]{1,5} - one to five uppercase letters
[.\/\s\+\-]? - optional single character: ., /, whitespace, +, or -
([Pp]|PR\.?) - capture group used for: uppercase P, or lowercase p, or uppercase PR followed by optional .
[A-Z] - one uppercase letter
$ - anchor at end
Note on anchors: Use ^...$ anchors if you only have the stock symbol in the string. If you have text with a stock symbol anywhere within, use word boundaries \b...\b instead.
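It is worth running the extended pattern over the full list from Edit 1. The Python sketch below does that (EXTENDED_RE and the variable names are illustrative). Note that the two formats with no P marker at all, PSA-A and PSA+A, are not matched by this pattern, so they would still need the separate special-character check mentioned in the question.

```python
import re

EXTENDED_RE = re.compile(r"^[A-Z]{1,5}[.\/\s\+\-]?([Pp]|PR\.?)[A-Z]$")

samples = [
    "PSA.PA", "PSA.PR.A", "PSA/PA", "PSAPRA", "PSA-A", "PSA PRA",
    "PSA.PRA", "PSA+A", "PSA/PRA", "PSApA", "PSAPA", "PSA-PA",
]

matched = [s for s in samples if EXTENDED_RE.match(s)]
unmatched = [s for s in samples if not EXTENDED_RE.match(s)]

print(matched)
print(unmatched)  # ['PSA-A', 'PSA+A']
```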
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
I am looking for a regex to find the first one or two capitalized words in a string. If the first two words are capitalized, I want both of them. A hyphen should be considered part of a word.
for Madonna has a new album I'm looking for Madonna
for Paul Young has no new album I'm looking for Paul Young
for Emmerson Lake-palmer is not here I'm looking for Emmerson Lake-palmer
I have been using ^[A-Z]+.*?\b( [A-Z]+.*?\b){0,1} which does great on the first two, but for the 3rd example I get Emmerson Lake, instead of Emmerson Lake-palmer.
What REGEX can I use to find the first one or two capitalized words in the above examples?
You may use
^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?
Basically, use a character class [-a-zA-Z]* instead of a dot matching pattern to only match letters and a hyphen.
Details
^ - start of string
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
(?:\s+[A-Z][-a-zA-Z]*)? - an optional (1 or 0 due to ? quantifier) sequence of:
\s+ - 1+ whitespace
[A-Z] - an uppercase ASCII letter
[-a-zA-Z]* - zero or more ASCII letters / hyphens
A Unicode aware equivalent (for the regex flavors supporting Unicode property classes):
^\p{Lu}[-\p{L}]*(?:\s+\p{Lu}[-\p{L}]*)?
where \p{L} matches any letter and \p{Lu} matches any uppercase letter.
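The ASCII version can be checked against the three examples from the question with a short Python sketch (NAME_RE and leading_caps are names invented here; the Unicode-aware variant is left out because Python's built-in re module does not support \p{...} classes).

```python
import re

NAME_RE = re.compile(r"^[A-Z][-a-zA-Z]*(?:\s+[A-Z][-a-zA-Z]*)?")


def leading_caps(text):
    """Return the first one or two capitalized words, or None."""
    m = NAME_RE.match(text)
    return m.group(0) if m else None


print(leading_caps("Madonna has a new album"))           # Madonna
print(leading_caps("Paul Young has no new album"))       # Paul Young
print(leading_caps("Emmerson Lake-palmer is not here"))  # Emmerson Lake-palmer
```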
This is probably simpler:
^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?
Replace + with * if you expect single-letter words.
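The difference between + and * only shows up with single-letter words. A minimal Python comparison, using an invented sample sentence and illustrative names:

```python
import re

PLUS_RE = re.compile(r"^([A-Z][-A-Za-z]+)(\s[A-Z][-A-Za-z]+)?")
STAR_RE = re.compile(r"^([A-Z][-A-Za-z]*)(\s[A-Z][-A-Za-z]*)?")


def first_words(pattern, text):
    m = pattern.match(text)
    return m.group(0) if m else None


# With +, every word needs at least two characters, so "A" cannot start a match.
print(first_words(PLUS_RE, "A Team won"))  # None
print(first_words(STAR_RE, "A Team won"))  # A Team

# For ordinary names both variants agree.
print(first_words(PLUS_RE, "Paul Young has no new album"))  # Paul Young
print(first_words(STAR_RE, "Paul Young has no new album"))  # Paul Young
```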
If you need a full name only (exactly two words, each starting with a capital letter), this is a simple example:
^([A-Z][a-z]*)(\s)([A-Z][a-z]+)$
Try it. Enjoy!
I want to filter out all .+[0-9]. (correct way?) patterns to avoid duplicate decimal points within a numeral: (e.g., .12345.); but allow non-numerals to include duplicate decimal points: (e.g. .12345*.) where * is any NON-NUMERAL.
How do I include a non-numeral negation value into the regexp pattern? Again,
.12345. <-- error: erroneous numeral.
'.12345(.' or '.12345*.' <-- Good.
I think you are looking for
^\d*(?:\.\d+)?(?:(?<=\d)[^.\d\n]+\.)?$
Remember to escape the regex properly in Swift:
let rx = "^\\d*(?:\\.\\d+)?(?:(?<=\\d)[^.\\d\\n]+\\.)?$"
REGEX EXPLANATION:
^ - Start of string
\d* - Match zero or more digits
(?:\.\d+)? - Match decimal part, 0 or 1 time (due to ?)
(?:(?<=\d)[^.\d\n]+\.)? - Optionally (due to ? at the end) matches 1 or more symbols preceded with a digit (due to (?<=\d) lookbehind) other than a digit ([^\d]), a full stop ([^.]) or a linebreak ([^\n]) (this one is more for demo purposes) and then followed by a full stop (\.).
$ - End of string
I am using non-capturing groups (?:...) for better performance and usability.
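The behaviour on the question's inputs can be confirmed with a few lines of Python (a sketch of the regex only; the target language is Swift, and VALID_RE and is_valid are hypothetical names). The extra inputs 12.5 and 1.2.3 are added here for illustration.

```python
import re

VALID_RE = re.compile(r"^\d*(?:\.\d+)?(?:(?<=\d)[^.\d\n]+\.)?$")


def is_valid(s):
    return VALID_RE.search(s) is not None


print(is_valid(".12345."))   # False: second dot directly after the digits
print(is_valid(".12345(."))  # True: a non-numeral separates the dots
print(is_valid(".12345*."))  # True
print(is_valid("12.5"))      # True: an ordinary decimal
print(is_valid("1.2.3"))     # False
```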
UPDATE:
If you prefer an opposite approach, that is, matching the invalid strings, you can use a much simpler regex:
\.[0-9]+\.
In Swift, let rx = "\\.[0-9]+\\.". It matches any substrings starting with a dot, then 1 or more digits from 0 to 9 range, and then again a dot.
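The inverse check is easy to try out as well. The Python sketch below searches for the invalid dot-digits-dot sequence anywhere in the input (INVALID_RE and has_duplicate_point are names invented for the example).

```python
import re

INVALID_RE = re.compile(r"\.[0-9]+\.")


def has_duplicate_point(s):
    """True if a dot, one or more digits, and another dot appear in a row."""
    return INVALID_RE.search(s) is not None


print(has_duplicate_point(".12345."))   # True
print(has_duplicate_point(".12345(."))  # False
print(has_duplicate_point(".12345*."))  # False
print(has_duplicate_point("1.2.3"))     # True
```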
The regex shorthand for a non-numeral (non-digit) character is \D. Conversely, if you're looking for only numerals, \d would work.
Without further context of what you're trying to achieve it's hard to suggest how to build a regex for it, though based on your example, (I think) this should work: .+\d+\D+
I am trying to write a regex in PCRE for string detection. The kind of strings I want to detect are abcdef001, zxyabc003: a word whose first 6 characters are a-zA-Z and whose last two or three are digits 0-9. This string could be anywhere in the whole text.
E.g - "User activity from server1, user id abcdef009, time 10.20am".
How do I go about this?
Try this:
/[a-zA-Z]{6}[0-9]{2,3}/
If you want to limit it to whole words, try:
/\b[a-zA-Z]{6}[0-9]{2,3}\b/
\b - word boundary
[a-zA-Z]{6} - six letters
[0-9]{2,3} - either 2 or 3 digits
\b - word boundary
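The effect of the word boundaries can be seen by running both patterns on the question's log line and on a longer token. This is a Python sketch rather than PCRE (the two flavours should agree for these simple patterns), and the token xabcdef0012 is invented for the comparison.

```python
import re

LOOSE_RE = re.compile(r"[a-zA-Z]{6}[0-9]{2,3}")
WHOLE_WORD_RE = re.compile(r"\b[a-zA-Z]{6}[0-9]{2,3}\b")

log = "User activity from server1, user id abcdef009, time 10.20am"

# server1 has six letters but only one digit, so it is skipped either way.
print(LOOSE_RE.findall(log))       # ['abcdef009']
print(WHOLE_WORD_RE.findall(log))  # ['abcdef009']

# Without \b the pattern also matches inside a longer token.
print(LOOSE_RE.findall("xabcdef0012"))       # ['abcdef001']
print(WHOLE_WORD_RE.findall("xabcdef0012"))  # []
```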
Use regex pattern
/[a-z]{6}\d{2,3}/i