I am trying to write a regex in PCRE for string detection. The kind of strings I want to detect are abcdef001 and zxyabc003: a word whose first six characters are letters (a-zA-Z) and whose last two or three characters are digits (0-9). The string could be anywhere in the text.
E.g. "User activity from server1, user id abcdef009, time 10.20am".
How do I go about this?
Try this:
/[a-zA-Z]{6}[0-9]{2,3}/
If you want to limit it to whole words, try:
/\b[a-zA-Z]{6}[0-9]{2,3}\b/
\b - word boundary
[a-zA-Z]{6} - six letters
[0-9]{2,3} - either 2 or 3 digits
\b - word boundary
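As a quick check of how the two patterns differ, here is a minimal sketch in Python (the question is about PCRE, but these particular patterns behave the same way in Python's re module). The token "xxabcdef0012" is a made-up example, not from the question.

```python
import re

# The two patterns from the answer above.
ANYWHERE = re.compile(r"[a-zA-Z]{6}[0-9]{2,3}")
WHOLE_WORD = re.compile(r"\b[a-zA-Z]{6}[0-9]{2,3}\b")

text = "User activity from server1, user id abcdef009, time 10.20am"
ids = WHOLE_WORD.findall(text)  # "server1" has only one digit, so it is skipped

# Hypothetical longer token: the unanchored pattern picks a piece out of it,
# while the word-boundary version rejects it entirely.
longer = "xxabcdef0012"
loose = ANYWHERE.findall(longer)
strict = WHOLE_WORD.findall(longer)

print(ids, loose, strict)
```

The unanchored pattern is the right choice only if IDs embedded in longer tokens should also count.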
Use the regex pattern
/[a-z]{6}\d{2,3}/i
I have the following regex pattern to find an email address in my code:
/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]{8,}/i
I want to make sure it does not match a certain string if it includes:
abc
xyz
Just to exclude the abc I have tried:
/(?!.*abc)[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]{8,}/i
But that is horribly slow.
You need to "anchor" the regex to a position that can be found by the regex engine in an optimal way. The best way is to "tie" it to a word boundary position, and that should work here since emails start with word chars:
/\b(?!\S*abc)[\w.-]+#[\w.-]{8,}/i
BTW, [_a-zA-Z0-9] is equal to \w in JavaScript regex. Details:
\b - a word boundary
(?!\S*abc) - a negative lookahead that fails the match if there are zero or more non-whitespace chars and then abc immediately to the right of the current location
[\w.-]+ - one or more word, . or - chars
# - a # char
[\w.-]{8,} - eight or more word, . or - chars.
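The anchored lookahead can be tried out with a short Python sketch. The sample strings and the helper name first_allowed are made up for illustration; re.ASCII is passed so that \w means [_a-zA-Z0-9], as the answer assumes.

```python
import re

# Pattern from the answer above: the word boundary gives the engine a cheap
# position to start from before the negative lookahead is evaluated.
EMAIL = re.compile(r"\b(?!\S*abc)[\w.-]+#[\w.-]{8,}", re.IGNORECASE | re.ASCII)

def first_allowed(text):
    """Return the first address-like match that passes the lookahead, else None."""
    m = EMAIL.search(text)
    return m.group(0) if m else None

print(first_allowed("contact john#example.com today"))
print(first_allowed("contact john#abcmail.com today"))
```

Because the lookahead scans non-whitespace characters only, the check is limited to the current token rather than the rest of the text.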
You can do it in two steps. Use a regular expression to find the email address, then check that it doesn't contain any of the prohibited strings.
if (preg_match('/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]{8,}/i', $text, $match) && !preg_match('/abc|xyz/i', $match[0])) {
    $email = $match[0];
}
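The same two-step idea, sketched in Python for comparison. The function name extract_email and the sample inputs are hypothetical. Like preg_match, re.search only looks at the first match.

```python
import re

# Step 1: find a candidate address. Step 2: reject it if it contains a banned string.
EMAIL = re.compile(r"[._a-zA-Z0-9-]+#[._a-zA-Z0-9-]{8,}")
BANNED = re.compile(r"abc|xyz", re.IGNORECASE)

def extract_email(text):
    """Return the first address-like match unless it contains a banned substring."""
    m = EMAIL.search(text)
    if m and not BANNED.search(m.group(0)):
        return m.group(0)
    return None

print(extract_email("mail john#example.com"))
print(extract_email("mail xyz#example.com"))
```

Keeping the exclusion list in its own pattern also makes it easy to extend without touching the address regex.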
Let's say I have a string like: Michael is studying at the Faculty of Economics at the University
I need to check whether a given string contains the following expression: Facul* of Econom*
where the star means that the word can have many different endings.
In general, my goal is to find similar expressions within tables in a ClickHouse database. If you can suggest other ways of solving this problem, I will be grateful.
If you want to match any lowercase letters following your two words use this:
\bFacul[a-z]* of Econom[a-z]*\b
If you want to match any letters, uppercase or lowercase, following your two words use this:
\bFacul[A-Za-z]* of Econom[A-Za-z]*\b
Explanation:
\b - word boundary
Facul - literal text
[A-Za-z]* - 0 to multiple alpha chars
of - literal text
Econom - literal text
[A-Za-z]* - 0 to multiple alpha chars
\b - word boundary
If you want to be more forgiving with upper/lowercase and spaces use this:
\b[Ff]acul[A-Za-z]* +of +[Ee]conom[A-Za-z]*\b
Use any number of "word" chars for word tails and "word boundary" at the front:
\bFacul\w* of Econom\w*
consider case insensitivity too:
(?i)\bfacul\w* of econom\w*
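A minimal Python sketch of the case-insensitive variant follows. The helper name find_phrase and the extra sample strings are made up, and the pattern is tested here with Python's re module, not with ClickHouse.

```python
import re

# Case-insensitive pattern from the answer above; \w* matches any word ending.
PATTERN = re.compile(r"\bfacul\w* of econom\w*", re.IGNORECASE)

def find_phrase(text):
    """Return the matched phrase, or None if the text does not contain it."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(find_phrase("Michael is studying at the Faculty of Economics at the University"))
```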
I am using this extension for chrome (It's called Word Replacer II) and I'm trying to create a Regex find and replace.
Quick backstory: my partner is recovering from an eating disorder and I want to find all mentions of Kilojoules and kJs and replace them with "[removed kJs]".
I am entirely new to Regex and after a few hours, I'm not much closer to getting a working expression.
I need it to remove up to 4 digits before the letters "kJs", e.g. 400kJs and 1000kJs. I'd like "400kJs and 1000kJs" to be replaced with "[removed kJs] and [removed kJs]".
The code I have put together so far is;
\s+(a{1,4}<=\d)\s+(?=kJ)
Any help would be much appreciated!
You may use the following approach:
\d{1,4}\s*kJs\b
If you need to keep kJs, you may wrap the right part of the pattern with a lookahead, \d{1,4}(?=\s*kJs\b).
If you do not want to touch 5 or more digit numbers, use
\b\d{1,4}\s*kJs\b
(?<!\d)\d{1,4}\s*kJs\b
That is, add a word boundary, \b, or a left-hand digit boundary, (?<!\d).
Pattern details
\d{1,4} - one to four digits
\s* - 0+ whitespaces
kJs - a string of letters
\b - a word boundary (may not be necessary if there can be no word starting with kJs).
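To see the effect of the left-hand digit boundary, here is a small Python sketch of the replacement. The extension itself is not modelled here; the function name redact and the five-digit input are made up for illustration.

```python
import re

# Patterns from the answer above, without and with the left-hand digit boundary.
LOOSE = re.compile(r"\d{1,4}\s*kJs\b")
GUARDED = re.compile(r"(?<!\d)\d{1,4}\s*kJs\b")

def redact(text, pattern=GUARDED):
    """Replace every energy value matched by the pattern with a placeholder."""
    return pattern.sub("[removed kJs]", text)

print(redact("400kJs and 1000kJs"))
print(redact("12345kJs"))          # left untouched by the guarded pattern
print(redact("12345kJs", LOOSE))   # the loose pattern eats the last four digits
```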
I'm currently having issues with a regex that I'm creating. The regex has to extract all the groups that say number #### between Hello and Regards. At the moment my regex only extracts one group and I need all the groups inside; in this case there are 2, but there may be more.
I'm using the web page https://regex101.com/
Flavor: PCRE (PHP)
Regex: Hello\s.*(number\s*[\d]*)\s.*Regards
Text:
This is my test text number 25120
Hello my name is testing
I'm 20 years old
Please help me with the regex number 1542
I have been trying to create the regex many times this is my number 5152
Regards
I'm still trying my attempt number 5150
Result:
My result is only the group number 5152, but there is another group inside, number 1542.
You may use
(?si)(?:\G(?!\A)|\bHello\b)(?:(?!\bHello\b).)*?\K\bnumber\s*\d+(?=.*?\bRegards\b)
Details
(?si) - s - DOTALL modifier making . match any chars, and i makes the pattern case insensitive
(?:\G(?!\A)|\bHello\b) - either the end of the previous match (\G(?!\A)) or (|) a whole word Hello (\bHello\b)
(?:(?!\bHello\b).)*? - any char, 0 or more times but as few as possible, that does not start a whole word Hello char sequence
\K - match reset operator that discards all text matched so far
\bnumber - a whole word number
\s* - 0+ whitespaces
\d+ - 1+ digits
(?=.*?\bRegards\b) - there must be a whole word Regards somewhere after any 0+ chars (as few as possible).
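The \G and \K operators used above are PCRE features that some engines lack (Python's built-in re module, for one). As a rough alternative, and not an exact equivalent of the single pattern, the same result can be reached in two steps: first cut out each Hello ... Regards block, then collect the numbers inside it. The function name numbers_between is made up for this sketch.

```python
import re

TEXT = """This is my test text number 25120
Hello my name is testing
I'm 20 years old
Please help me with the regex number 1542
I have been trying to create the regex many times this is my number 5152
Regards
I'm still trying my attempt number 5150"""

# Step 1: lazily capture everything from a whole word Hello to the next Regards.
BLOCK = re.compile(r"\bHello\b(.*?)\bRegards\b", re.DOTALL | re.IGNORECASE)
# Step 2: inside each block, pick every "number <digits>" occurrence.
NUMBER = re.compile(r"\bnumber\s*\d+", re.IGNORECASE)

def numbers_between(text):
    """Collect all 'number ####' strings found between Hello and Regards."""
    results = []
    for block in BLOCK.finditer(text):
        results.extend(NUMBER.findall(block.group(1)))
    return results

print(numbers_between(TEXT))
```

The numbers before Hello and after Regards are ignored because they fall outside the captured block.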
I'm quite inexperienced with Regex and even though I would like to figure it out myself, I'm not sure how to get started.
I would like to develop a Ruby scan Regex that takes a string and returns an array of strings. The Regex should identify stock market ticker symbols, and also include short timestamps (inc. -1d, -1m, -1y) if they follow the ticker.
As an example:
How is AMZN-1d today and what about MSFT?
would return...
["AMZN-1d", "MSFT"]
Additionally, if this could be expanded on to the following Regex, which gets the ticker symbols, but not timestamps - that would be brilliant!
scan(/[\b\$]?[A-Z]{1,}\.[A-Z]+\b|[\b\$]?[A-Z]{2,}\b|\$[A-Z]{1,}\b|\b[A-Z]{1,}\$/)
You can use
/\b\p{Lu}{2,}(?:-\d\p{L}+\b)?/
The pattern matches:
\b - word boundary
\p{Lu}{2,} - 2 or more uppercase letters
(?:-\d\p{L}+\b)? - 1 or zero sequences (due to the ? quantifier) of
- - a hyphen
\d - a digit (add a + quantifier to match 1 or more digits if more than 1 can occur)
\p{L}+ - 1 or more letters
If you only need to match ASCII characters, replace \d with [0-9], \p{L} with [a-zA-Z] and \p{Lu} with [A-Z].
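The ASCII-only variant described in the last sentence can be checked with a short Python sketch (Python's built-in re module does not support \p{...}, so only the ASCII form is shown). The function name tickers is made up, and this is Python's findall rather than Ruby's scan.

```python
import re

# ASCII form of the pattern above: 2+ uppercase letters, then an optional
# "-<digit><letters>" suffix that must end at a word boundary.
TICKER = re.compile(r"\b[A-Z]{2,}(?:-[0-9][a-zA-Z]+\b)?")

def tickers(text):
    """Return every ticker, with its timestamp suffix when one follows."""
    return TICKER.findall(text)

print(tickers("How is AMZN-1d today and what about MSFT?"))
```

The suffix group is non-capturing so that findall returns whole matches rather than group contents.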
Your specifications are incomplete, so it is not possible to give a completely valid answer.
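You may try something like this. Ruby's scan already returns every match, so no g flag is needed; avoid capture groups, because scan then returns the groups rather than the whole match.
/[A-Z]{2,}-\d[dmy]|[A-Z]{2,}/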
I'm assuming that ticker symbols will have a minimum length of two characters.