Regex: Match words that only contain certain letters [duplicate] - regex

This question already has answers here:
regex to match entire words containing only certain characters
(4 answers)
Closed 3 years ago.
I am using a Regex dictionary located here, and want to find words that contain ONLY the following letters: B, C, D, E, H, I, K, O. So, for example: cod, hoe, and hob.
I thought the simple way of doing this would be the regex query [bcdehiko]+, but it yields many words that contain at least one of the bracketed letters alongside other letters.

For that website, the easiest solution is to wrap the regex you started with in line-start (^) and line-end ($) anchors. This ensures that the word contains nothing but the characters you want. Here is the regex to use to get your results:
^[bcdehiko]+$
If you also want to allow hyphens inside words, you can use this instead:
^[bcdehiko]+(-[bcdehiko]+)*$
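To try both patterns outside the website, here is a small sketch assuming Python's `re` module and a made-up word list (not the dictionary site's data). `re.fullmatch` requires the whole word to match, which for single words has the same effect as wrapping the pattern in ^ and $:

```python
import re

# Only the letters from the question.
ONLY_LETTERS = re.compile(r"[bcdehiko]+")
# Same letters, plus single hyphens between groups of allowed letters.
WITH_HYPHENS = re.compile(r"[bcdehiko]+(-[bcdehiko]+)*")

# Hypothetical sample words.
words = ["cod", "hoe", "hob", "code", "coder", "bode-hook", "-cod"]

plain = [w for w in words if ONLY_LETTERS.fullmatch(w)]
hyphenated = [w for w in words if WITH_HYPHENS.fullmatch(w)]
print(plain)       # ['cod', 'hoe', 'hob', 'code']
print(hyphenated)  # ['cod', 'hoe', 'hob', 'code', 'bode-hook']
```

A leading hyphen, as in -cod, is rejected by both patterns because each must start with an allowed letter.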
Credit to @ctwheels for the improvement on the second regex.

Since you haven't specified a language (and I think others looking for such answers might find this useful), here is an answer to your question in Python without using regex.
l = 'bcdehiko'
d = ['cod', 'codz']
for w in d:
    print(all(x in l for x in w))
This method loops over the dictionary* d and checks that every character in each word exists in the string l.
* "Dictionary" in the OP's original question refers to a dictionary in the wordbook sense, not in the computing sense. In the script, the variable d is a list.
Alternatively, if you want to ensure that a word contains at least one character from the list of characters, you can replace all with any in the above script (you can test this by adding the word ran to the list d; it doesn't contain a single letter from the string l).
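The same two checks can also be written with Python sets, a swapped-in technique rather than the answer's own: a word uses only allowed letters when its set of letters is a subset of the allowed set, and uses at least one when the intersection is non-empty. The names `allowed`, `only_allowed` and `some_allowed` and the sample words are illustrative:

```python
allowed = set('bcdehiko')
d = ['cod', 'codz', 'ran', 'hob']

# Every letter of the word is allowed (equivalent to the all(...) version).
only_allowed = [w for w in d if set(w) <= allowed]
# At least one letter of the word is allowed (equivalent to the any(...) version).
some_allowed = [w for w in d if set(w) & allowed]

print(only_allowed)  # ['cod', 'hob']
print(some_allowed)  # ['cod', 'codz', 'hob']
```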

You are using this regex:
[bcdehiko]+
Which means: match one or more instances of the characters given in the square brackets.
However, this regex also matches runs of those characters inside longer words that contain other letters, since it uses no word boundaries.
You may want to wrap your regex with \b on either side so that the match must cover the whole word:
\b[bcdehiko]+\b
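A quick way to see the difference on running text, assuming Python's `re` module and a made-up sentence: with \b, only whole words made of the allowed letters come back, while the bare class also returns fragments of other words.

```python
import re

text = "hob and cod are fine but hobby is not"

# \b on both sides keeps the class from matching inside longer words.
bounded = re.findall(r"\b[bcdehiko]+\b", text)
# Without \b, pieces of other words are returned as well.
unbounded = re.findall(r"[bcdehiko]+", text)

print(bounded)    # ['hob', 'cod']
print(unbounded)  # also contains fragments such as 'd' and 'hobb'
```

Note that \b treats a hyphen as a boundary, so in a word like bode-hook the piece bode would still be matched on its own.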

Related

Regular Expression: Two words in any order but with a string between?

I want to use positive lookaheads so that RegEx will pick up two words from two different sets in any order, but with a string between them of length 1 to 20 that is always in the middle.
It is also already case-insensitive and allows any number of characters, including 0, before the first word found and likewise after the second word found; I am unsure whether it would be more correct to terminate it with $.
Without the any order matching I am so far as:
(?i:.*(new|launch|releas)+.{1,20}(product1|product2)+.*)
I have attempted to add any order matching with the following but it only picks up the first word:
(?i:.*(?=new|launch|releas)+.{1,20}(?=product1|product2)+.*)
I thought this might be because of the +.{1,20} in the middle, but I am unsure how it could work if I add this to both sets instead; for instance, this could cause a problem if the first word is the very first part of the source text being parsed, with no character before it.
I have seen examples where \b is used with lookaheads, but that also seems like it may cause a problem, as I want it to match both when the first word is at the start of the source text and when it is not.
How should I edit my RegEx here please?
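One possible approach, sketched in Python's `re` module rather than taken from any answer here, is to drop the lookaheads and spell out both orders as an alternation, keeping the 1 to 20 characters in between. The word sets come from the question; `FIRST`, `SECOND` and `mentions_launch` are hypothetical names. With `search`, the leading and trailing .* are unnecessary. As a side note, \b also matches at the very start of the text when the first character is a word character, so it would not block a match there.

```python
import re

# Word sets from the question; "releas" is a stem that also covers "release".
FIRST = r"(?:new|launch|releas)"
SECOND = r"(?:product1|product2)"

# Either order, with 1 to 20 characters of anything in between.
PATTERN = re.compile(
    rf"{FIRST}.{{1,20}}{SECOND}|{SECOND}.{{1,20}}{FIRST}",
    re.IGNORECASE,
)

def mentions_launch(text: str) -> bool:
    # search() scans the whole text, so no leading or trailing .* is needed.
    return PATTERN.search(text) is not None

print(mentions_launch("We will LAUNCH product2 soon"))  # True
print(mentions_launch("product1 is new"))               # True
print(mentions_launch("newproduct1"))                   # False, nothing in between
```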

Regex Meaning (Regex Golf) [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 7 years ago.
Regex newbie here, so I was trying this website for fun: https://regex.alf.nu
In particular, I'm concerned about the "Ranges" section here: https://regex.alf.nu/2
I was able to get as far as ^[a-f]+, and couldn't figure out the rest. By accident, I added a $ to get ^[a-f]+$ which was actually the answer.
Trying to wrap my mind around the meaning of this regex. Can someone give the plain English explanation of what's happening here?
It seems to say "a string that starts and ends with one or more of the letters a through f," but that doesn't quite make sense to me, for instance with the word "cajac", which seems to satisfy those conditions.
For those who can't see the URL, it's asking me to match these words:
abac
accede
adead
babe
bead
bebed
bedad
bedded
bedead
bedeaf
caba
caffa
dace
dade
daff
dead
deed
deface
faded
faff
feed
But NOT match these:
beam
buoy
canjac
chymia
corah
cupula
griece
hafter
idic
lucy
martyr
matron
messrs
mucose
relose
sonly
tegua
threap
towned
widish
yite
In English it means: match any words which contain only the letters a through f.
Your pattern, when broken down:
^ assert position at start of the string
[a-f]+ match a single character present in the list below:
  + between one and unlimited times, as many times as possible, giving back as needed
  a-f a single character in the range between a and f (case sensitive)
$ assert position at end of the string
You can also see a quick explanation of your patterns on the Regex101 webpage.
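To see why the $ matters, here is a small check in Python's `re` module (an assumption; the site runs its own engine) using a few words from the two lists above. Without $, the pattern only needs a run of a-f letters at the start of the word; with $, that run must reach the end, so any other letter anywhere makes the match fail.

```python
import re

must_match = ["abac", "accede", "faded", "feed"]
must_not_match = ["beam", "canjac", "idic", "buoy"]

def matches(pattern: str, word: str) -> bool:
    return re.search(pattern, word) is not None

# Start anchor only: words that merely begin with a-f letters slip through.
start_only = [w for w in must_not_match if matches(r"^[a-f]+", w)]
# Both anchors: the whole word must consist of a-f letters.
both = [w for w in must_not_match if matches(r"^[a-f]+$", w)]

print(start_only)  # ['beam', 'canjac', 'buoy']
print(both)        # []
print(all(matches(r"^[a-f]+$", w) for w in must_match))  # True
```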

issue in a regexp

I'm using the following expression:
/^[alopinme]{5}$/
This regexp gets me words, from a set of words, made only of the letters within the brackets.
Now I need to add more functionality to this expression, because the fetched words should be able to contain ONLY one more letter from another set of letters. Let's say I want words formed with letters from set A that may (if such words exist) contain one more letter from set B.
I'm trying to work out how to complete my regular expression, but I can't find the right way.
Could anyone help me?
Thanks.
EDIT:
Here is an example:
SELECT sin_acentos FROM Finder.palabras_esp WHERE sin_acentos REGEXP '^[tehsolm]{5}$'
This expression selects words like helms, moths, meths, homes and so on.
But I need to add a set B of letters and get words that may contain ONLY one letter from that set. Let's say I have another set of letters, [xzk], so the expression could find more words, but with at most one letter taken from set B.
The result could include words like mozes, hoxes, tozes and so on. If you check these words, you can see that most of the letters in each word are from set A, but only one is from set B.
If a character from the other set may appear at most once, you can use:
^(?=.{5}$)[alopinme]*(?:[XYZ][alopinme]*)?$
(?=.{5}$) - Check that the string is 5 characters long before matching anything else (this might not work in MySQL).
[alopinme]* - Characters from A
(?:[XYZ][alopinme]*)? - Optional - one character from B, and some more from A.
Working example: http://rubular.com/r/aw6l561Int
Or, if you want to allow up to 3 of them, for example:
^(?=.{5}$)[alopinme]*(?:[XYZ][alopinme]*){0,3}$
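Here is the first pattern tried in Python's `re` module (an assumption, since the question runs in MySQL), with the character sets swapped for the ones in the question's edit: set A is tehsolm and set B is xzk. The lookahead enforces the length, and the optional group allows at most one letter from set B.

```python
import re

# Set A = tehsolm, set B = xzk; five letters, at most one from set B.
PATTERN = re.compile(r"^(?=.{5}$)[tehsolm]*(?:[xzk][tehsolm]*)?$")

words = ["helms", "mozes", "hoxes", "tozes", "zozes", "holes", "mote"]
selected = [w for w in words if PATTERN.match(w)]
print(selected)  # ['helms', 'mozes', 'hoxes', 'tozes', 'holes']
```

zozes is rejected because it has two letters from set B, and mote because it is only four letters long.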
Since the words that you are looking for are all five-character long, I can think of a rather ugly expression that would do the trick: let's say [alopinme] is your base set, and [xyz] is your optional set. Then the expression
/^([alopinmexyz][alopinme]{4}|[alopinme][alopinmexyz][alopinme]{3}|[alopinme]{2}[alopinmexyz][alopinme]{2}|[alopinme]{3}[alopinmexyz][alopinme]|[alopinme]{4}[alopinmexyz])$/
should allow five-letter words of the structure that you are looking for.
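Writing that alternation by hand is error-prone, so here is a sketch that generates it, assuming Python; the helper names `repeat_class` and `build_pattern` are hypothetical. There is one branch per position where the optional letter may sit, and for the sets above it reproduces the expression exactly (without the surrounding slashes).

```python
import re

def repeat_class(char_class: str, n: int) -> str:
    # Render n copies of a class the way the answer writes them.
    if n == 0:
        return ""
    if n == 1:
        return char_class
    return f"{char_class}{{{n}}}"

def build_pattern(base: str, extra: str, length: int) -> str:
    plain = f"[{base}]"
    either = f"[{base}{extra}]"
    # Position i may hold a base or an extra letter; all others are base only.
    branches = [
        repeat_class(plain, i) + either + repeat_class(plain, length - i - 1)
        for i in range(length)
    ]
    return "^(" + "|".join(branches) + ")$"

PATTERN = build_pattern("alopinme", "xyz", 5)
print(PATTERN)
print([w for w in ["anime", "axion", "xylem"] if re.match(PATTERN, w)])  # ['anime', 'axion']
```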
In general, a need to count anything makes your regex hard to read. Problems like this one illustrate the point well: it is much easier to write the /^[alopinmexyz]{5}$/ expression and add an extra step in code to check that [xyz] appears in the text no more than once. You can even use a regexp for the additional check:
/^[^xyz]*[xyz]?[^xyz]*$/
The result in SQL would look as follows:
SELECT sin_acentos
FROM Finder.palabras_esp
WHERE sin_acentos REGEXP '^[tehsolmxyz]{5}$' -- Length == 5, all from tehsolm+xyz
AND sin_acentos REGEXP '^[^xyz]*[xyz]?[^xyz]*$' -- No more than one character from xyz
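If the filtering happens in application code instead of SQL, the same two-step idea might look like this Python sketch (an assumption; `SET_B` and `is_candidate` are hypothetical names). Step 1 is the first REGEXP, and step 2 is the "extra step in code": a plain count of set B letters.

```python
import re

SET_B = "xyz"

def is_candidate(word: str) -> bool:
    # Step 1, as in the first REGEXP: five letters from tehsolm plus xyz.
    if re.fullmatch(r"[tehsolmxyz]{5}", word) is None:
        return False
    # Step 2, in plain code: no more than one letter from set B.
    return sum(ch in SET_B for ch in word) <= 1

words = ["helms", "hoxes", "mozes", "hoxey", "zozes", "hotel"]
print([w for w in words if is_candidate(w)])  # ['helms', 'hoxes', 'mozes', 'hotel']
```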

Regex - How to search for singular or plural version of word [duplicate]

This question already has answers here:
Regex search and replace with optional plural
(4 answers)
Closed 6 years ago.
I'm trying to do what should be a simple Regular Expression, where all I want to do is match the singular portion of a word whether or not it has an s on the end. So if I have the following words
test
tests
EDIT: Further examples; I need this to work for many words, not just those two:
movie
movies
page
pages
time
times
For all of them I need to get the word without the s on the end but I can't find a regular expression that will always grab the first bit without the s on the end and work for both cases.
I've tried the following:
([a-zA-Z]+)([s\b]{0,}) - This returns the full word as the first match in both cases
([a-zA-Z]+?)([s\b]{0,}) - This returns 3 different matching groups for both words
([a-zA-Z]+)([s]?) - This returns the full word as the first match in both cases
([a-zA-Z]+)(s\b) - This works for tests but doesn't match test at all
([a-zA-Z]+)(s\b)? - This returns the full word as the first match in both cases
I've been using http://gskinner.com/RegExr/ to try out the different regexes.
EDIT: This is for a Sublime Text snippet. For those who don't know, a snippet in Sublime Text is a shortcut: I can type, say, the name of my database and hit "run snippet", and it will turn it into something like:
$movies= $this->ci->db->get_where("movies", "");
if ($movies->num_rows()) {
    foreach ($movies->result() AS $movie) {
    }
}
All I need is to turn "movies" into "movie" and automatically insert it into the foreach loop.
This means I can't just do a find-and-replace on the text, and I only need to handle 60 to 70 words (it's only running against my own tables, not every word in the English language).
Thanks!
- Tim
Ok I've found a solution:
([a-zA-Z]+?)(s\b|\b)
Works as desired, then you can simply use the first match as the unpluralized version of the word.
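Here is the accepted pattern checked in Python's `re` module (the snippet engine may differ; this is only a sketch). The lazy +? grows the word one letter at a time until either "s" followed by a word boundary, or a word boundary alone, can match, so group 1 ends up without the trailing s.

```python
import re

SINGULAR = re.compile(r"([a-zA-Z]+?)(s\b|\b)")

words = ["test", "tests", "movie", "movies", "page", "pages", "times"]
for word in words:
    print(word, "->", SINGULAR.match(word).group(1))
# test -> test, tests -> test, movie -> movie, movies -> movie, ...
```

Because it strips any final s, a word like bus becomes bu; that is fine for a fixed list of table names, but it is not general singularization.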
Thanks @Jahroy for helping me find it. I added this as an answer for future visitors who just want a solution, but please check out Jahroy's comment for more in-depth information.
For simple plurals, use this:
test(?=s| |$)
For more complex plurals, you're in trouble using regex. For example, this regex
part(y|i)(?=es | )
will return "party" or "parti", but I'm not sure what you would do with that.
Here's how you can do it with vi or sed:
s/\([A-Za-z]\)[sS]$/\1/
That replaces a letter followed by a final s or S with just that letter, so the word loses its trailing S.
NOTE:
The escape chars (backslashes before the parens) might be different in different contexts.
ALSO:
The \1 (which means the first captured group) may also vary depending on context.
ALSO:
This will only work if your word is the only word on the line.
If your table name is one of many words on the line, you could probably replace the $ (which stands for the end of the line) with a wildcard that represents whitespace or a word boundary (these differ based on context).

Multiple words in any order using regex [duplicate]

This question already has answers here:
Regex to match string containing two names in any order
(9 answers)
Closed 3 years ago.
As the title says, I need to find two specific words in a sentence, but they can be in any order and any casing. How do I go about doing this using regex?
For example, I need to extract the words test and long from the following sentence whether the word test comes first or long comes.
This is a very long sentence used as a test
UPDATE:
What I did not mention in the first part is that it needs to be case insensitive as well.
You can use
(?=.*test)(?=.*long)
Source: MySQL SELECT LIKE or REGEXP to match multiple words in one record
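Here is that lookahead pattern in Python's `re` module (an assumption, since no language was given), with the case-insensitive flag from the update. Each lookahead scans forward from the start independently, so the order of the two words does not matter. Note that it also accepts substrings such as testy or longer.

```python
import re

# Anchored at the start; each lookahead independently searches the rest of the line.
BOTH = re.compile(r"^(?=.*test)(?=.*long)", re.IGNORECASE)

print(bool(BOTH.search("This is a very long sentence used as a test")))  # True
print(bool(BOTH.search("A Test that is LONG")))                          # True
print(bool(BOTH.search("This is only a test")))                          # False
```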
Use a capturing group if you want to extract the matches: (test)|(long)
Then depending on the language in use you can refer to the matched group using $1 and $2, for example.
I assume (always dangerous) that you want to find whole words, so "test" would match but "testy" would not. Thus the pattern must search for word boundaries, so I use the "\b" word boundary pattern.
/(?i)(\btest\b.*\blong\b|\blong\b.*\btest\b)/
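The same pattern works unchanged in Python's `re` module (assumed here for illustration), where the inline (?i) flag is allowed because it sits at the very start of the pattern. The second sentence shows the word boundaries rejecting testy:

```python
import re

WHOLE_WORDS = re.compile(r"(?i)(\btest\b.*\blong\b|\blong\b.*\btest\b)")

print(bool(WHOLE_WORDS.search("This is a very LONG sentence used as a Test")))  # True
print(bool(WHOLE_WORDS.search("A long and testy remark")))                     # False
```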
Without knowing what language you're using:
/test.*long/
or
/long.*test/
or
/test/ && /long/
Try this:
/(?i)(?:test.*long|long.*test)/
That will match either test and then long, or long and then test. It will ignore case differences.
Vim has a branch operator \& that allows an even terser regex when searching for a line containing any number of words, in any order.
For example,
/.*test\&.*long
will match a line containing test and long, in any order.
See this answer for more information on usage. I am not aware of any other regex flavor that implements branching; the operator is not even documented in the Wikipedia entry on regular expressions.
I was using libpcre with C, where I could define callouts. They helped me to easily match not just words, but any subexpressions in any order. The regexp looks like:
(?C0)(expr1(?C1)|expr2(?C2)|...|exprn(?Cn)){n}
and the callout function ensures that every subexpression is matched exactly once, like:
#include <string.h>
#include <pcre.h>

int mycallout(pcre_callout_block *b){
    static int subexpr[255];
    if(b->callout_number == 0){
        // callout (?C0) - clear all counts to 0
        memset(subexpr, 0, sizeof(subexpr));
        return 0;
    }else{
        // the first visit returns 0 (continue); any later visit returns >0,
        // which makes the match fail at this point so PCRE backtracks
        return subexpr[b->callout_number-1]++;
    }
}
Something like that should be possible in Perl as well.
I don't think that you can do it with a single regex. You'll need to do a logical AND of two regexes, one searching for each word.
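A minimal sketch of that logical AND, assuming Python; the function name `contains_all` is hypothetical, and the \b word boundaries are an added choice so that testy does not count as test. It runs one case-insensitive search per word and requires all of them to succeed.

```python
import re

def contains_all(text: str, words: list[str]) -> bool:
    # One search per word; all() short-circuits on the first missing word.
    return all(
        re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) is not None
        for w in words
    )

sentence = "This is a very long sentence used as a test"
print(contains_all(sentence, ["test", "long"]))   # True
print(contains_all(sentence, ["test", "short"]))  # False
```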