I need to divide a list in alphabetical order.
I am using:
regexp_match("via",'^[A-G]')
for one segment, and
regexp_match("via",'^[H-Z]')
However, I need to cut the list partway through the "G" words, so that "Galveston" falls in the first segment and "Geneve" in the second.
How can I do this?
You can use the following two regexps:
^([A-F]|G[a-d])
^([H-Z]|G[e-z])
See regex demo #1 and regex demo #2.
Details
^([A-F]|G[a-d]) - a letter from A to F, or G followed by a letter from a to d
^([H-Z]|G[e-z]) - a letter from H to Z, or G followed by a letter from e to z.
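As a minimal sketch of how the two patterns partition a list, here is a check in Python's standard re module. The question itself uses regexp_match, whose regex dialect may differ in details; the helper name segment_of and the sample names other than Galveston and Geneve are made up for illustration.

```python
import re

# The two patterns from the answer, unchanged.
FIRST_HALF = re.compile(r'^([A-F]|G[a-d])')
SECOND_HALF = re.compile(r'^([H-Z]|G[e-z])')

def segment_of(name):
    """Return 1 or 2 for the segment a name falls in, or None if neither pattern matches.

    segment_of is a hypothetical helper, not part of the original answer.
    """
    if FIRST_HALF.search(name):
        return 1
    if SECOND_HALF.search(name):
        return 2
    return None

names = ["Austin", "Galveston", "Geneve", "Zurich"]  # illustrative data
print({name: segment_of(name) for name in names})
```

Both patterns are case-sensitive, so a lowercase name such as "geneve" would land in neither segment.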
Thanks for your help in advance.
I want to check whether the text that follows a fixed prefix starts with a character in a given range.
For example, I have the following strings with the prefix 'abc_xyz$':
abc_xyz$Item Ledger_Entry_CT
abc_xyz$Purchase
To check whether the string after the prefix starts with G through R, I wrote the following regular expression:
abc_xyz\$^[G-Rg-r].*
Unfortunately, it does not work.
Here are the use cases:
abc_xyz$Item Ledger_Entry_CT --> should match, since the first char of 'Item' is within G through R
abc_xyz$Purchase --> should match, since the first char of 'Purchase' is within G through R
abc_xyz$Customer --> should NOT match, since the first char of 'Customer' is not within G through R
abc_xyz$Sales --> should NOT match, since the first char of 'Sales' is not within G through R
Any help?
You can use any of the following:
^abc_xyz\$[G-Rg-r].*
^abc_xyz\$(?i:[g-r]).*
^abc_xyz\$(?i)[g-r].*
See the regex demo.
The pattern matches
^ - start of string
abc_xyz\$ - the fixed string abc_xyz$
[G-Rg-r] - G to R or g to r letters
.* - the rest of the line.
Note that the (?i:[g-r]) inline modifier group makes only the [g-r] part of the pattern case insensitive.
The (?i) part makes everything to the right of it case insensitive.
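A small sketch in Python's re module, run against the four use cases from the question, may help confirm the first two variants. The third variant, with (?i) in the middle of the pattern, is left out because Python's re expects global inline flags at the very start of the expression; other engines accept it. The variable names are illustrative only.

```python
import re

# Variant 1: explicit upper- and lowercase ranges.
explicit = re.compile(r'^abc_xyz\$[G-Rg-r].*')
# Variant 2: a scoped inline modifier, case-insensitive only for [g-r].
scoped = re.compile(r'^abc_xyz\$(?i:[g-r]).*')

# The four use cases from the question, with the expected outcome.
cases = {
    "abc_xyz$Item Ledger_Entry_CT": True,
    "abc_xyz$Purchase": True,
    "abc_xyz$Customer": False,
    "abc_xyz$Sales": False,
}

for text, expected in cases.items():
    got_explicit = explicit.match(text) is not None
    got_scoped = scoped.match(text) is not None
    print(text, got_explicit, got_scoped, expected)
```

Because the prefix sits outside the (?i:...) group, a differently cased prefix such as ABC_XYZ$ would still be rejected by both variants.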
I am trying to figure out a regex that matches a pattern together with all the characters after it, but stops when another occurrence of the pattern begins, so that the matches do not overlap.
This is my current regex:
[a-zA-Z]{2}\d{1}\s?\w?
The pattern is always 2 letters followed by a number, like AE1 or BE3, but I also need all the characters following the pattern.
So AE1 A E F should be one match, but if another pattern occurs in the string, as in
AE1 A D BE1 A D C, the two must not overlap and should be two separate matches.
So to clarify:
AB3 D T B should be one match on the regex
ABC D A F DE3 D CD A
should have 2 matches, each with all the chars following it, because of the two-letter word and number.
How do I achieve this?
I'm not quite following the logic here, yet my guess would be that we might want something similar to this:
([A-Z]{2}\d\s([A-Z]+\s)+)|([A-Z]{3}\s([A-Z]+\s)+)
which allows two letters followed by a digit, or three letters, both followed by ([A-Z]+\s)+.
Demo
Consider where your pattern should start. What distinguishes AE1 A E F from BE1 A D C in AE1 A D BE1 A D C? You don't want to treat both the same way, so you have to separate them, and the only way to tell them apart is by which one sits at the start of the text.
So simply adding ^ to the start of your pattern, which anchors it to the beginning of the string, should solve the problem.
Your regex would then look like this:
^[a-zA-Z]{2}\d{1}\s?\w?
Demo
What you want is to split the string so that each match of your pattern becomes the start of an extracted substring.
You may use
(?!^)(?=[a-zA-Z]{2}\d)
to split the string. Details
(?!^) - not at the start of the string
(?=[a-zA-Z]{2}\d) - a location in the string that is immediately followed with 2 ASCII letters and any digit.
See the Scala demo:
val s = "ABC D A F DE3 D CD A"
val rx = """(?!^)(?=[a-zA-Z]{2}\d)"""
val results = s.split(rx).map(_.trim)
println(results.mkString(", "))
// => ABC D A F, DE3 D CD A
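The same split can be sketched in Python, assuming Python 3.7 or later, where re.split accepts patterns that can match the empty string. The helper name split_records is made up for this illustration.

```python
import re

# Zero-width boundary: not at the start of the string, and immediately
# followed by two ASCII letters and a digit.
BOUNDARY = re.compile(r'(?!^)(?=[a-zA-Z]{2}\d)')

def split_records(text):
    """Split text before each 'two letters + digit' token and trim the pieces."""
    return [part.strip() for part in BOUNDARY.split(text)]

print(split_records("ABC D A F DE3 D CD A"))  # ['ABC D A F', 'DE3 D CD A']
print(split_records("AE1 A D BE1 A D C"))     # ['AE1 A D', 'BE1 A D C']
```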
You can just use this regex:
(?i)\b[a-z]{2}\d\b(?:(?:(?!\b[a-z]{2}\d\b).)+\s?)?
Demo and explanations: https://regex101.com/r/DtFU8j/1/
It uses a negative lookahead (?!\b[a-z]{2}\d\b) before each character consumed after the initial pattern (?i)\b[a-z]{2}\d\b, so the match stops as soon as another occurrence of that pattern would begin.
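As a rough check of this pattern, here it is applied with Python's re module to the strings from the question. The helper name find_records is hypothetical, and trailing whitespace is stripped from each match for readability.

```python
import re

# The pattern from the answer, unchanged.
RECORD = re.compile(r'(?i)\b[a-z]{2}\d\b(?:(?:(?!\b[a-z]{2}\d\b).)+\s?)?')

def find_records(text):
    """Return every 'two letters + digit' token together with the text that follows it."""
    return [m.group(0).strip() for m in RECORD.finditer(text)]

print(find_records("AE1 A D BE1 A D C"))     # ['AE1 A D', 'BE1 A D C']
print(find_records("AB3 D T B"))             # ['AB3 D T B']
print(find_records("ABC D A F DE3 D CD A"))  # ['DE3 D CD A']
```

Unlike a split, this approach only returns text that begins with the two-letters-and-digit token, so the leading ABC D A F in the last example is not part of any match.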
I am trying to reject input that contains a name suffix, for example JR/SR, or another suffix made up of I, V, and X, using a regular expression. To accomplish this, I have implemented the following regex:
((^((?!((\b((I+))\b)|(\b(V+)\b)|(\b(X+)\b)|\b(IV)\b|(\b(V?I){1,2}\b)|(\b(IX)\b)|(\bX[I|IX]{1,2}\b)|(\bX|X+[V|VI]{1,2}\b)|(\b(JR)\b)|(\b(SR)\b))).)*$))
With this I am able to reject various possible combinations, e.g.,
'Last Name I',
'Last Name II',
'Last Name IJR',
'Last Name SRX' etc.
However, there are still a couple of combinations that get past this regex, e.g., 'Last Name IXV' or 'Last Name VXI'.
I have not been able to debug these two. Please suggest which part of the regex I should change to satisfy the requirement.
Thank you!
Try this pattern: .+\b(?:(?>[JS]R)|X|I|J|V)+$
Explanation:
.+ - match one or more of any characters
\b - word boundary
(?:...) - non-capturing group
(?>...) - atomic group
[JS]R - match either S or J followed by R
| - alternation: match what is on the left OR what's on the right
+ - quantifier: match the preceding pattern one or more times
$ - match end of the string
Demo
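A quick sketch in Python can exercise this pattern against the names from the question. One assumption to flag: the atomic group (?>[JS]R) is written here as plain [JS]R, because Python's re only supports atomic groups from version 3.11 on; since [JS]R can match in only one way, this should not change which strings match. The helper name has_suffix is made up.

```python
import re

# Same pattern as in the answer, with the atomic group replaced by plain [JS]R
# (an adaptation for Python versions before 3.11).
SUFFIX = re.compile(r'.+\b(?:[JS]R|X|I|J|V)+$')

def has_suffix(name):
    """Return True if the name ends in a run of JR, SR, X, I, J, or V tokens."""
    return SUFFIX.match(name) is not None

for name in ["Last Name I", "Last Name IJR", "Last Name IXV", "Last Name VXI", "Last Name"]:
    print(name, has_suffix(name))
```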
To solve this, I worked on the above regex a little more. Here is the final result, which targets Roman numerals up to thirty built from I, V, and X.
"(\b(?!(IIX|IIV|IVV|IXX|IXI))I[IVX]{0,3}\b|\b(V|X)\b|\bV[I]{1,2}\b|\b((?!XVV|XVX)X([IXV]{1,2}))\b|\b[S|J]R\b)|^$"
What I have done here is:
I considered inputs that stand alone, for example SR or XXV,
observed which patterns are invalid, and
prevented those from producing a match.
Standalone input is enforced using \b, the word boundary.
Word boundary: \b matches at a position where a word starts or ends,
that is, between a word character and a non-word character or the edge of the string.
The invalid patterns are excluded in the following way:
using the negative lookahead (?!(IIX|IIV|IVV|IXX|IXI))
Here is how I arrived at this solution:
I first looked closely at all the patterns from I to X, that is:
I
I I
I I I
I V
V
V I
V I I
V I I I (it is out of the range of 3 characters.)
I X
X
We have an I, V, or X in the first position, then another I, X, or V
in the second position, and after that again an I or V. I have
implemented this in the following part of the regex above:
\b(?!(IIX|IIV|IVV|IXX|IXI))I[IVX]{0,3}\b
It starts with I, then looks for zero to three of I, V, or X, and rejects the invalid numbers listed inside (?!(IIX|IIV|IVV|IXX|IXI)). I have handled the other combinations, given below, in a similar way.
Then for V and X : \b(V|X)\b
Then for the VI, VII: \bV[I]{1,2}\b
Then for the XI - XXX: \b((?!XVV|XVX)X([IXV]{1,2}))\b
To validate a name suffix, i.e. JR or SR, one can use the following regex: \b[S|J]R\b (note that | is a literal character inside a character class, so [SJ]R is the tighter form)
and the final ^$ matches a blank string, in other words, the case where nothing has been entered in the input box or text box.
Feel free to post any questions or suggestions.
Thanks!
PS: This regex is simply one way to validate Roman numerals from 1 to 30 using I, V, and X. I hope it helps regex newcomers learn a bit.
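To see how the combined regex behaves, here is a small sketch that feeds it a few standalone tokens using Python's re module. The pattern is copied as posted; the helper name matches and the choice of sample tokens are assumptions made for this illustration.

```python
import re

# The combined regex from the answer, copied unchanged and split over two lines.
ROMAN_OR_SUFFIX = re.compile(
    r'(\b(?!(IIX|IIV|IVV|IXX|IXI))I[IVX]{0,3}\b|\b(V|X)\b|\bV[I]{1,2}\b'
    r'|\b((?!XVV|XVX)X([IXV]{1,2}))\b|\b[S|J]R\b)|^$'
)

def matches(token):
    """Return True if the regex finds a match anywhere in the token."""
    return ROMAN_OR_SUFFIX.search(token) is not None

for token in ["II", "XV", "JR", "", "Smith", "VIII"]:
    print(repr(token), matches(token))
```

In this sketch II, XV, JR, and the empty string match, while Smith does not. VIII does not match either, which lines up with the remark above that it falls outside the three-character range.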
I solved this with a more explicit pattern:
(.+) (?:(?>JR$|SR$|I$|II$|III$|IV$|MD$|DO$|PHD$))|(.+)
I know I could do something like [JS]R but I like the way this reads:
(.+) match any characters and then a space
(?:(?>JR$|SR$|I$|II$|III$|IV$|MD$|DO$|PHD$)) atomically match, without capturing, one of the endings like JR etc.
|(.+) if you don't find the endings then match any characters
Feel free to add the endings you'd like to suit your needs.
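A minimal sketch of how this pattern could be used to strip a suffix, written for Python's re module. Two assumptions: the atomic group is dropped (Python supports it only from 3.11 on, and nothing follows the group here, so it should behave the same), and the helper name strip_suffix plus the sample names are invented for the example.

```python
import re

# The explicit pattern from the answer, without the atomic group wrapper.
NAME = re.compile(r'(.+) (?:JR$|SR$|I$|II$|III$|IV$|MD$|DO$|PHD$)|(.+)')

def strip_suffix(full_name):
    """Return the name without a trailing suffix, or unchanged if none is found."""
    m = NAME.match(full_name)
    if m is None:  # e.g. an empty string
        return full_name
    # Group 1 is set when a suffix was found, group 2 otherwise.
    return m.group(1) or m.group(2)

print(strip_suffix("John Smith JR"))  # John Smith
print(strip_suffix("Jane Doe III"))   # Jane Doe
print(strip_suffix("John Smith"))     # John Smith
```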
I'm making a tool to find open reading frames for amino acids as a personal project. I have many strings that have characters consisting of the 26 uppercase English alphabet letters (A through Z). They look like this:
GMGMGRZMQGGRZR
I want to find all possible matches that are between the letters M and Z, with some additional rules.
There should not be any Z's in between an M and a Z
Example: If EMAZAZ is the input string then MAZ should match, MAZAZ should not
There can be multiple M's between an M and a Z
Example: If the input string is GMGMGRZMQGGRZR then MGMGRZ should match, but MGRZ shouldn't since there are more M's before the first M in MGRZ that could be used to match.
For Example
With the above string (GMGMGRZMQGGRZR), only MGMGRZ and MQGGRZ should match. MGMGRZMQGGRZ, MGRZ, and MGRZAMQGGRZ should NOT match.
Does anyone know how to construct a regex like this? I consulted a few Java regex tutorials (I am using Java to write this program) but was unable to come up with a regex that followed all of the above rules.
The closest I have gotten is this regex:
M((?!(Z)))*Z
It shows that the substrings MGMGRZ, MQGGRZ, and MGRZ match. However, I do not want MGRZ to match.
What you want is:
(M[^Z]+Z)
DEMO
The regex works as follows: it matches an M, followed by one or more chars that are not a Z, up to a Z.
Each char is consumed only once, from left to right, so in
GMGMGRZMQGGRZR
 ^----^        1st match MGMGRZ
       ^----^  2nd match MQGGRZ
Consequently, it will still match MGRZ if you feed that string to the regex on its own.
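The left-to-right consumption can be checked with a short sketch. The question is about Java, but Python's re is used here as a stand-in, since the pattern uses only syntax common to both; the helper name find_frames is hypothetical.

```python
import re

# The pattern from the answer: an M, one or more non-Z characters, then a Z.
ORF = re.compile(r'(M[^Z]+Z)')

def find_frames(sequence):
    """Return all non-overlapping M...Z frames, scanning left to right."""
    return ORF.findall(sequence)

print(find_frames("GMGMGRZMQGGRZR"))  # ['MGMGRZ', 'MQGGRZ']
print(find_frames("EMAZAZ"))          # ['MAZ']
print(find_frames("MGRZ"))            # ['MGRZ']
```

MGRZ never shows up as a separate match in the first string because its characters have already been consumed by the earlier match that started at the first M.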
I want to find consonant clusters with regex. An example of a cluster is mpl in examples.
To start, I filtered out all the vowels and replaced them with spaces. With vowels filtered out, examples is x mpl s.
How can I filter out the x and the s too?
Seems like you want something like this,
(?:(?![aeiou])[a-z]){2,}
(?![aeiou])[a-z] matches any lowercase letter except a, e, i, o, or u
DEMO
(?![aeiou])[a-z] matches a lowercase consonant
(?:(?![aeiou])[a-z]){2,} matches two or more of them in a row.
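As a small illustration of the lookahead-based subtraction, here is the pattern applied to the word from the question using Python's re module. The helper name clusters and the second sample word are made up.

```python
import re

# A lowercase letter that is not a vowel, repeated two or more times.
CLUSTER = re.compile(r'(?:(?![aeiou])[a-z]){2,}')

def clusters(word):
    """Return all runs of two or more consecutive lowercase consonants."""
    return CLUSTER.findall(word)

print(clusters("examples"))  # ['mpl']
print(clusters("strength"))  # ['str', 'ngth']
```

The isolated x and s in examples are skipped because each run must contain at least two consonants.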
Since your working definition of "consonant cluster" is two or more consonants in succession, you can simply use the following pattern (case-insensitively if you want to handle capital consonants):
[bcdfghjklmnpqrstvwxyz]{2,}
[bcdfghjklmnpqrstvwxyz] – a simple whitelist character class for consonants (i.e. that will only match a consonant)
{2,} – two or more in succession
You can test the pattern against a couple input strings in a related regex fiddle.
Note that since vowels are "a, e, i, o, u, and sometimes y", I have included y in the whitelist character class for consonants above.
You could drop y and use...
[bcdfghjklmnpqrstvwxz]{2,}
...if you want to unconditionally treat y as a vowel rather than a consonant; but the rules for when y is a consonant are a bit more complicated than a simple regex will handle (basically requiring that you identify syllables first, then y's location within them).
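The effect of including or dropping y can be seen in a short sketch with Python's re module, using the case-insensitive flag mentioned above. The constant names and the sample word rhythm are chosen for illustration.

```python
import re

# Whitelist classes from the answer, compiled case-insensitively.
WITH_Y = re.compile(r'[bcdfghjklmnpqrstvwxyz]{2,}', re.IGNORECASE)
WITHOUT_Y = re.compile(r'[bcdfghjklmnpqrstvwxz]{2,}', re.IGNORECASE)

print(WITH_Y.findall("Examples"))   # ['mpl']
print(WITH_Y.findall("rhythm"))     # ['rhythm']
print(WITHOUT_Y.findall("rhythm"))  # ['rh', 'thm']
```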
Turning a comment into an answer…
Since you changed the vowels into whitespace: search for \b.\b (or \b\w\b to be more precise) and replace it with a blank to get rid of all isolated letters, leaving only sequences of at least two.
Like RegEx101.
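A minimal sketch of this two-step approach in Python, assuming the vowels have been replaced by spaces as the question describes. Both helper names, blank_vowels and clusters_from_blanked, are hypothetical.

```python
import re

def blank_vowels(word):
    """Replace every lowercase vowel with a space, as described in the question."""
    return re.sub(r'[aeiou]', ' ', word)

def clusters_from_blanked(text):
    """Blank out isolated single characters, then return the remaining runs."""
    cleaned = re.sub(r'\b\w\b', ' ', text)
    return cleaned.split()

print(clusters_from_blanked("x mpl s"))                 # ['mpl']
print(clusters_from_blanked(blank_vowels("examples")))  # ['mpl']
```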