I want to use regular expressions to find a number of strings in a text file that meet all of the following requirements.
Are of length 3
Are made of all capital letters
The first character is NOT 'A'
The second character is NOT 'J'
The third character is NOT 'K'
I started with this: /[A-Z]{3}/ but this matches lowercase 3 letter strings as well for some reason.
Is this possible? Any guidance is appreciated.
You need to anchor the regexp so it matches the entire line. Otherwise, it will match a string that's longer than 3, but contains 3 uppercase letters together anywhere in it.
You can use character sets for each character.
/^[B-Z][A-IK-Z][A-JL-Z]$/
^ matches the beginning of the line. [B-Z] matches any uppercase letter that isn't A, [A-IK-Z] matches any uppercase letter except J, and [A-JL-Z] matches any uppercase letter except K. $ matches the end of the line.
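As a rough illustration of the anchored character-class approach, here is a minimal Python sketch. The function name `find_codes` and the sample text are made up for the example, and `re.MULTILINE` is assumed so that ^ and $ apply to each line of the file rather than to the whole input.

```python
import re

# Anchored pattern from the answer; re.MULTILINE lets ^ and $ match per line.
PATTERN = re.compile(r"^[B-Z][A-IK-Z][A-JL-Z]$", re.MULTILINE)

def find_codes(text):
    """Return every line that consists of exactly three allowed capitals."""
    return PATTERN.findall(text)

# Hypothetical sample input, one candidate per line.
sample = "XYZ\nABC\nBJC\nBCK\nabc\nWXYZ\nBCD"
print(find_codes(sample))  # ['XYZ', 'BCD']
```

Lines starting with A, with J in the middle, or ending in K are rejected, as are lowercase and 4-letter lines.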
Another solution using lookahead:
^(?=[A-Z]{3}$)[^A][^J][^K]$
Try the following to return all the matches: /\b(?=[A-Z])[^A](?=[A-Z])[^J](?=[A-Z])[^K]\b/g
It uses lookaheads to require a capital letter at each position, and the \b word boundaries limit it to 3-letter matches. It is easy to adapt to excluded letters other than A, J and K.
Demo: https://regex101.com/r/5s2Gkj/1
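For strings embedded in running text rather than one per line, the word-boundary variant can be sketched in Python as below. This assumes a positive lookahead `(?=[A-Z])` before each of the three positions. The sample sentence is invented for the example, and `findall` plays the role of the /g flag.

```python
import re

# Each position: must be a capital letter (lookahead) and must not be
# the excluded letter (negated class). \b keeps the match to whole words.
WORDS = re.compile(r"\b(?=[A-Z])[^A](?=[A-Z])[^J](?=[A-Z])[^K]\b")

# Hypothetical sample text mixing valid and invalid candidates.
text = "XYZ abc AJK BCD, QJR BCK WXYZ (MNO)"
matches = WORDS.findall(text)
print(matches)  # ['XYZ', 'BCD', 'MNO']
```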
I am new to regular expressions and am trying to build an expression that checks whether the first three letters of a string are uppercase.
I have strings like "ALB.latin" or "CAT.Cyrillic". I just want to check that the three letters before the dot/period are capitals and that the letters after the dot/period are in title case.
I have tried this expression in an FME test filter: ^[A-Z]{3}\.[A-Za-z]$
You need to remove the $ anchor from the pattern, because it requires the end of the string to appear right after the single letter matched by [A-Za-z].
If you just need to check if the string starts with 3 uppercase ASCII letters, . and an ASCII letter, use
^[A-Z]{3}\.[A-Za-z]
Or, if you also need to make sure that only ASCII letters (1 or more) follow up to the end of the string, add + between [A-Za-z] and $ to match 1 or more characters from the [A-Za-z] character class:
^[A-Z]{3}\.[A-Za-z]+$
See the regex demo.
Hope this will give you the solution.
^[A-Z]{3}\.[A-Z][a-z]*$
With this, the part after the dot must be in title case: one uppercase letter followed by zero or more lowercase letters. At least one uppercase letter must follow the dot.
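To compare the three patterns suggested in this thread side by side, here is a small Python sketch. The names `PREFIX_ONLY`, `LETTERS_TO_END`, `TITLE_CASE` and `classify` are hypothetical, and the inputs other than "ALB.latin" and "CAT.Cyrillic" are invented for illustration.

```python
import re

PREFIX_ONLY = re.compile(r"^[A-Z]{3}\.[A-Za-z]")        # only checks the start
LETTERS_TO_END = re.compile(r"^[A-Z]{3}\.[A-Za-z]+$")   # letters only after the dot
TITLE_CASE = re.compile(r"^[A-Z]{3}\.[A-Z][a-z]*$")     # title case after the dot

def classify(s):
    """Return which of the three patterns accept the string, in order."""
    return (
        bool(PREFIX_ONLY.search(s)),
        bool(LETTERS_TO_END.search(s)),
        bool(TITLE_CASE.search(s)),
    )

for s in ["CAT.Cyrillic", "ALB.latin", "ALB.latin1", "alb.Latin"]:
    print(s, classify(s))
```

Only the last pattern rejects "ALB.latin", because it insists on an uppercase letter right after the dot.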
I need a regular expression that matches a String at the beginning of the input which satisfies following conditions:
start with a letter
end with a letter or a number
may contain letters, numbers and spaces
I have this expression so far:
^([a-zA-Z]+[a-zA-Z0-9 ]*[a-zA-Z0-9]+)|[a-zA-Z]
http://userguide.icu-project.org/strings/regexp
The OR statement in the expression is to allow a String that consists of one letter.
The problem is that the second part of the OR statement is always preferred, so when the input is query1, it matches only q.
How can I solve this problem?
Is there a way to simplify the expression? My way seems a little too complex for this relatively simple case.
^([a-zA-Z]+[a-zA-Z0-9 ]*[a-zA-Z0-9]+)$|^[a-zA-Z]$
You can use the ^ and $ anchors on the second alternative so that it only matches a string consisting of a single letter.
You can use this regex to satisfy all conditions:
^[a-zA-Z](?:[a-zA-Z0-9 ]*[a-zA-Z0-9])?$
^[a-zA-Z] matches a letter at start.
(?:...)? is optional part to allow single char input.
[a-zA-Z0-9] at the end of the group makes sure the last char is alphanumeric.
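The optional-group pattern can be checked quickly in Python. This is only a sketch: the helper name `is_valid` is made up, and `fullmatch` is used on the assumption that the whole string should be validated (it also avoids Python's `$` matching just before a trailing newline).

```python
import re

# One leading letter, then optionally: any run of letters/digits/spaces
# that ends in a letter or digit.
NAME = re.compile(r"^[a-zA-Z](?:[a-zA-Z0-9 ]*[a-zA-Z0-9])?$")

def is_valid(s):
    """True if s starts with a letter and ends with a letter or digit."""
    return NAME.fullmatch(s) is not None

for s in ["q", "query1", "my query 2", "query ", "1query", ""]:
    print(repr(s), is_valid(s))
```

The single-letter case is handled by the optional group, so no alternation is needed.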
Regex to match a string that starts with a letter, has letters, numbers or spaces in between, and ends with a letter or number:
^[a-zA-Z][a-zA-Z0-9 ]*[a-zA-Z0-9]$
Note that this requires at least two characters, so a single-letter string does not match.
I am trying to match same ('reference') letter only in a word. For example:
Makaraka
Wasagara
degenerescence
desilicification
odontonosology
There are 4 'a' in the first word and 6 'o' in the last one. How can I match all of them using a regex? I tried using a backreference, but I couldn't get it to work: the last "sample" letter was never matched. Is there a way to specify the number of occurrences for a capturing group? Thanks.
You can use this regex:
^.*?(\w)(?=(?:.*?\1){3}).*$
Explanation: This regex matches any word character in the input and captures it for the backreference \1 later. Then the lookahead part (?=(?:.*?\1){3}) ensures that there are at least 3 more occurrences of the captured word character.
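A small Python sketch of the backreference-plus-lookahead idea follows. The helper `repeated_letter` and its `min_count` parameter are hypothetical additions. The regex only identifies which character repeats, so the individual occurrences are collected with a second `findall` call.

```python
import re

def repeated_letter(word, min_count=4):
    """Return the first word character occurring at least min_count times."""
    # Capture a character, then require min_count - 1 more copies after it.
    pattern = r"^.*?(\w)(?=(?:.*?\1){%d}).*$" % (min_count - 1)
    m = re.match(pattern, word)
    return m.group(1) if m else None

for word in ["Makaraka", "Wasagara", "degenerescence",
             "desilicification", "odontonosology", "cat"]:
    letter = repeated_letter(word)
    # Once the letter is known, every occurrence can be matched directly.
    hits = re.findall(re.escape(letter), word) if letter else []
    print(word, letter, len(hits))
```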
How about:
(?:.*a){4,}
Just replace the a with the letter you're searching for.
I am trying to do the following match using regex.
The input should consist of 2 to 10 capital letters.
If it is 2 characters long, then neither the first nor the second character may be A, E, I, O or U.
I tried:
[B-DF-HJ-NP-TV-XZ]{2,10}
It works well, but I am not too sure if this is the right and most efficient way to do regex here.
All credit to Jerry, for his answer:
^(?:(?:(?![AEIOU])[A-Z]){2}|[A-Z]{3,10})$
Explanation:
^ = "start of string", and $ = "end of string". This is useful for preventing false matches (e.g. a 10-character match from an 11 character input, or "MR" matching in "AMRXYZ").
(?![AEIOU]) is a negative look-ahead for the characters A, E, I, O and U - i.e. the regex will not match if the next character is a vowel. It sits inside the group that is repeated twice, so it is checked before each of the two letters. This is only applied to the first half of the conditional "OR" (|) regex, so vowels are still allowed in longer matches.
The rest is fairly self-explanatory, given the understanding of regex you've already demonstrated in your question above.
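As a quick check of the rule "no vowels in 2-letter inputs, anything goes for 3 to 10 letters", here is a Python sketch. It assumes the negative lookahead is placed inside the repeated group so that it is tested before each of the two letters. The helper name `is_valid_code` and the test inputs are invented.

```python
import re

# First alternative: exactly two capitals, neither of them a vowel.
# Second alternative: three to ten capitals, vowels allowed.
CODE = re.compile(r"^(?:(?:(?![AEIOU])[A-Z]){2}|[A-Z]{3,10})$")

def is_valid_code(s):
    """True if s satisfies the 2-10 capital letters rule described above."""
    return CODE.fullmatch(s) is not None

for s in ["MR", "AM", "MA", "ABC", "AEIOU", "ABCDEFGHIJK", "mr", "M"]:
    print(s, is_valid_code(s))
```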
I have a list of words as follows:
cat
concatenate
matter
pattern
hat
rather
fathom
at
saturate
vat
I need a regular expression to match any words which are a single letter followed by the letters 'at'.
I currently have [A-Za-z]at but that includes the 'cat' and 'nat' in 'concatenate' and the 'rat' in 'saturate'.
How can I make it look for exactly one character before, and make sure that there is not more than 1 character before the 'at'. I tried using {1} but that still didn't work. Thanks for your help.
Use word boundary:
\b[A-Za-z]at\b
or, if the string contains just those 3 characters, you can use anchors:
^[A-Za-z]at$
You can use ^[A-Za-z]at$
[A-Za-z] checks for a single letter. The at that follows is matched literally.
The ^ and $ anchors force the match to span the whole string from start to end.
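Both suggestions from this thread can be tried against the word list from the question. The Python sketch below is illustrative only: the variable names are made up, and joining the words with spaces is an assumption used to simulate running text for the \b version.

```python
import re

words = ["cat", "concatenate", "matter", "pattern", "hat",
         "rather", "fathom", "at", "saturate", "vat"]

# Anchored version: test each word on its own.
ANCHORED = re.compile(r"^[A-Za-z]at$")
anchored_hits = [w for w in words if ANCHORED.fullmatch(w)]

# Word-boundary version: scan one block of text containing all the words.
BOUNDED = re.compile(r"\b[A-Za-z]at\b")
bounded_hits = BOUNDED.findall(" ".join(words))

print(anchored_hits)  # ['cat', 'hat', 'vat']
print(bounded_hits)   # ['cat', 'hat', 'vat']
```

Neither version matches "at" on its own, since exactly one letter is required before it.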