Regex expression to catch variant of a word - regex

I am looking for a simple way to use a regex to catch variants of a word, in the simplest possible format.
For example, the 5 variants of the word below.
hike
hhike
hiike
hikke
hikkee
Using something similar to the format below...
[([a-zA-Z]){4,}]
Thanks

Are you looking for something like /h+i+k+e+/?
Meaning:
The literal h character repeated one or more times
The literal i character repeated one or more times
The literal k character repeated one or more times
The literal e character repeated one or more times
If each character may appear at most twice, you can use /h{1,2}i{1,2}k{1,2}e{1,2}/, meaning "present 1 or 2 times".
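A minimal Python sketch of the two patterns above, assuming the whole string has to be a variant (hence `fullmatch`). The names `LOOSE`, `STRICT`, `variants` and `is_variant` are made up for illustration.

```python
import re

# Each letter one or more times, in order.
LOOSE = re.compile(r"h+i+k+e+")
# Each letter once or twice, in order.
STRICT = re.compile(r"h{1,2}i{1,2}k{1,2}e{1,2}")

variants = ["hike", "hhike", "hiike", "hikke", "hikkee"]


def is_variant(pattern, word):
    """Return True if the entire word matches the pattern."""
    return pattern.fullmatch(word) is not None


for word in variants:
    print(word, is_variant(LOOSE, word), is_variant(STRICT, word))

# A tripled letter passes the loose pattern but not the strict one.
print("hiiike", is_variant(LOOSE, "hiiike"), is_variant(STRICT, "hiiike"))
```

Both patterns accept all five variants from the question; they only differ once a letter is repeated three or more times.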

You probably cannot solve this generically (i.e. for any word) under standard regex syntax.
For a given word, as others have pointed out, it is trivial.

This is more of a soundex kind of problem I think:
https://stackoverflow.com/a/392236/514463

Regex, allow characters and digits, but allow up to 7 digits only

I would very much appreciate a bit of help with the following regex riddle.
I need a regex that validates input against the following rules:
The input can contain letters, special characters and digits.
The input can't start with "0".
The input can have up to 7 digits.
Examples of valid input:
aa1234aa2. (less than 7 digits)
asd234566 (less than 7 digits)
Examples of invalid input:
0asdfd92 (starts with 0)
asd12312311 (more than 7 digits)
What I have tried so far:
^\D[0-9]{0,7}$
This validates d0000000, but the input may also look like d0d0dddd1234d.
The "can't start with 0" rule can be dropped from the requirements if it complicates things a lot. The most important part is "can have up to 7 digits".
Regards,
Oleg
This is what you need!
Attempt 1: ^[1-9]\d{0,6}$
Attempt 2: ^[^0][\d\w]{0,6}$
Attempt 3: ^[^0].{0,6}$
Attempt 4: ^([\D]*\d){0,7}[\D]*$
Attempt 5: ^([\D]*[1-9]){0,7}[\D]*$|^[^0]\d{0,6}$
Attempt 6: ^([\D]*[1-9]){1,7}[\D]*$|^[^0]\d{1,6}$ <- this should work
If I understand the requirements correctly, this will work:
^(?=[^0])(\D*\d){0,7}\D*$
That will allow any string that does not start with a zero and has 7 or fewer digits. Any other characters are allowed in any quantity.
Explanation
The first part (?=[^0]) is an assertion that checks to make sure the string does not start with zero. The rest matches any number of non-digits followed by a digit, up to 7 times. Then any number of non-digits before the end of the string.
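To see the lookahead version in action, here is a small Python sketch that runs it against the inputs from the question. The helper name `is_valid` and the sample list are illustrative only.

```python
import re

# Lookahead rejects a leading zero; the rest allows at most 7 digits overall.
PATTERN = re.compile(r"^(?=[^0])(\D*\d){0,7}\D*$")


def is_valid(text):
    """Return True if text does not start with 0 and has at most 7 digits."""
    return PATTERN.match(text) is not None


samples = ["aa1234aa2.", "asd234566", "d0d0dddd1234d", "0asdfd92", "asd12312311"]
for sample in samples:
    print(sample, is_valid(sample))
```

Note that the lookahead needs at least one character to look at, so an empty string is rejected as well.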
Assuming Perl (it looks like Perl regular expressions):
Check for leading zero: if (substr($pass, 0, 1) eq '0') { fail }
Check for no more than seven digits: if (($pass =~ tr /0-9/0-9/) > 7) { fail }
I'm generally against trying to cram everything into a single regular expression, especially when there are other tools available to do the job. In this case, the tr will not be executed if there is a leading zero, and a leading zero is easy to spot in the beginning of a string.
Doing it this way, it's easy to add further restrictions independently of the others. For example, "there may be more than 7 digits if they are all separated by other types of characters" (a regex for this one, probably).
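The same two-step idea, sketched in Python rather than Perl for comparison. This is only one way to write it; the function name `is_valid` is made up, and counting only ASCII digits is an assumption about what "digit" means here.

```python
def is_valid(password):
    """Two independent checks instead of one combined regex."""
    # Check 1: a leading zero is easy to spot on its own.
    if password.startswith("0"):
        return False
    # Check 2: count ASCII digits anywhere in the string.
    digit_count = sum(ch in "0123456789" for ch in password)
    return digit_count <= 7


for sample in ["aa1234aa2.", "d0d0dddd1234d", "0asdfd92", "asd12312311"]:
    print(sample, is_valid(sample))
```

As in the Perl version, the digit count is never computed when the string starts with a zero, and each rule can be changed without touching the other.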
You can use this regex:
^[^0](?:\D*\d){1,7}\D*$
This will perform the following validations:
Must start with non-zero
Has 1 to 7 digits after first char
Verbose, but does the trick.
(^[1-9][^\d]*([\d]?[^\d]*){0,6}$|^[^\d]+([\d]?[^\d]*){0,7}$)
I found it easier to split the RegEx into two cases: when the string starts with a digit, and when it doesn't.
^((?:\D+(?:\d?\D*){0,7})|(?:[1-9]\D*(?:\d?\D*){0,6}))$

Regular expression not containing 101

I came across the regular expression not containing 101 as follows:
0*1*0* + (1 + 00 + 000)* + (0⁺1⁺0⁺)*
I was unable to understand how the author came up with this regex. So I thought of a string that does not contain 101:
01000100
It seems that the above string will not be matched by the above regex, but I was unsure. So I tried translating it to an equivalent PCRE regex on regex101.com, but failed there too (as can be seen, my regex does not even match a string containing a single 1).
What's wrong with my translation? Is the above regex indeed correct? If not, what would the correct regex be?
Here is a bit shorter expression ^0*(1|00+)*0*$
https://www.regex101.com/r/gG3wP5/1
Explanation:
(1|00+)* we can mix zeroes and ones as long as zeroes occur in groups
^0*...0*$ we can have as many zeroes as we want in prefix/suffix
Direct translation of the original regexp would be like
^(0*1*0*|(1|00|000)*|(0+1+0+)*)$
Update
This seems like an artificially complicated version of the above regexp:
(1|00|000)* is the same as (1|00+)*
it is almost the solution, but it does not match strings 0, 01.., and ..10
0*1*0* doesn't match strings with 101 inside, but matches 0 and some of 01.., and ..10
we still need to match those of 01.., and ..10 which have 0 & 1 mixed inside, e.g. 01001.. or ..10010
(0+1+0+)* matches some of the remaining cases but there are still some valid strings unmatched
e.g. 10010 is one of the shortest strings that is not matched by any of the cases.
So, this solution is overly complicated and not complete.
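One way to check claims like these is brute force: enumerate all short binary strings and compare "the regex matches" with "the string does not contain 101". A rough Python sketch, where the helper names and the length limits are arbitrary choices:

```python
import re
from itertools import product

SHORT = re.compile(r"^0*(1|00+)*0*$")
DIRECT = re.compile(r"^(0*1*0*|(1|00|000)*|(0+1+0+)*)$")


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def mismatches(pattern, max_len):
    """Strings where 'pattern matches' disagrees with 'no 101 inside'."""
    return [
        s for s in binary_strings(max_len)
        if (pattern.match(s) is not None) != ("101" not in s)
    ]


print(mismatches(SHORT, 10))   # []
print(mismatches(DIRECT, 5))   # ['01001', '10010']
```

The shorter expression agrees with the definition on every string up to length 10, while the direct translation first goes wrong at length 5. This is a sanity check on short inputs, not a proof.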
Read the explanation in the right-hand tab on regex101; it tells you what your regex does. I think you misunderstood what the list operator does: inside a list operator ( [ ), other characters such as ( are no longer metacharacters, so the expression [(0*1*0*)[1(00)(000)] is equivalent to [01()*[], which matches any one of 0, 1, (, ), * or [
The correct translation of the regular expression 0*1*0* + (1 + 00 + 000)* + (0⁺1⁺0⁺)*
will be as follows:
^((?:0*1*0*)|(?:1|00|000)*|(?:0+1+0+)*)$
What your regex [(0*1*0*)[1(00)(000)]*(0+1+0+)*] does:
[(0*1*0*)[1(00)(000)]* -> matches any of the characters 0, 1, (, ), *, [ zero or more times, followed by
(0+1+0+)* -> matches the pattern 0+1+0+ zero or more times, followed by
] -> matches the character ]
So your expression is equivalent to
[01()*[]*(0+1+0+)*] which is not a regular expression for strings that do not contain 101
0* 1* ( (00+000)* 1*)* (ε+0)
I think this expression covers all cases, because:
any number apart from 1 can be broken into constituent 2's and 3's, i.e. any number n = 2*i + 3*j. So there can be any number of 0's between two consecutive 1's, apart from a single 0. Hence, 101 cannot be obtained.
ε+0 covers expressions ending in a single 0.
The RE for the language not containing 101 as a substring can also be written as (0*1*00)*.0*.1*.0*
This may be shorter than the one you are using. Try to make use of it.
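The two formal expressions above can be checked the same brute-force way. The translations to Python syntax below are my own reading of the notation (union `+` written as `|`, `(ε+0)` as `0?`, `.` as plain concatenation), so treat them as an assumption rather than the authors' exact intent.

```python
import re
from itertools import product

# 0* 1* ((00+000)* 1*)* (ε+0)
FIRST = re.compile(r"^0*1*(?:(?:00|000)*1*)*0?$")
# (0*1*00)*.0*.1*.0*
SECOND = re.compile(r"^(?:0*1*00)*0*1*0*$")


def agrees(pattern, max_len=10):
    """True if the pattern matches exactly the 101-free strings up to max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            text = "".join(bits)
            if (pattern.match(text) is not None) != ("101" not in text):
                return False
    return True


print(agrees(FIRST), agrees(SECOND))
```

Under that reading, both expressions agree with the definition on all binary strings up to length 10.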
Regular Expression I got (0+10)1. (looks simple :P)
I just considered all cases to make this.
If you take two consecutive 1's, you have to continue with 1's:
case 1: 11111111111111...
case 2: 0000000011111111111111... (once we take two 1's we can't accept 0's, so the only option is to continue with 1's)
If you take only a single 1 followed by a 0, there is no issue, and after that 1 we can have any number of 0's.
case 3: 00000000 10100100010000100000100000 1111111111
=>(0*+10*)1
final answer (0+10)1.
Thanks for your patience.

Regular Expression for a 0.25 interval

My aim is to write a regular expression for a decimal number where a valid number is one of
xx.0, xx.125, xx.25, xx.375, xx.5, xx.625, xx.75, xx.875 (i.e. measured in 1/8ths) The xx can be 0, 1 or 2 digits.
I have come up with the following regex:
^\d*\.?((25)|(50)|(5)|(75)|(0)|(00))?$
While this works for 0.25, 0.5 and 0.75, it won't work for 0.225, 0.675, etc.
I assumed that the '?' would work in a case where there is a preceding number as well.
Can someone point out my mistake?
Edit: the number is required to be a decimal!
Edit 2: I realized my mistake; I was confused about the '?'. Thank you.
I would add another \d* after the literal . check \.
^\d*\.?\d*((25)|(50)|(5)|(75)|(0)|(00))?$
I think it would probably just be easier to multiply the decimal part by 8, but you don't consider digits that lead the last two decimals in the regex.
^\d{0,2}\.(00?|(1|6)?25|(3|8)?75|50?)$
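A quick Python check of this pattern against multiples of 1/8 and a few values that should fail. The helper name `is_eighth` and the sample values are illustrative.

```python
import re

# Up to two integer digits, a mandatory dot, then one of the eighths.
EIGHTHS = re.compile(r"^\d{0,2}\.(00?|(1|6)?25|(3|8)?75|50?)$")


def is_eighth(text):
    """Return True if text is written as xx.<multiple of 1/8>."""
    return EIGHTHS.match(text) is not None


for text in ["1.0", "0.125", "12.875", ".5", "0.50", "0.225", "0.675", "123.25", "1"]:
    print(text, is_eighth(text))
```

Here 0.225 and 0.675 are rejected because they are not multiples of 1/8, 123.25 because it has three integer digits, and 1 because the decimal point is mandatory.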
Your mistake is: \.? indicates one optional \., not a digit (or anything else, in this case).
About the ? (question mark) operator: Makes the preceding item optional. Greedy, so the optional item is included in the match if possible. (source)
^\d{0,2}\.(0|(1|2|6)?25|(3|6|8)?75|5)$
Regular expressions are for matching patterns, not checking numeric values. Find a likely string with the regex, then check its numeric value in whatever your host language is (PHP, whatever).
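A sketch of that split in Python instead of PHP: the regex only checks the general shape, and the numeric test is done with exact decimal arithmetic. The shape chosen here (up to two integer digits, one to three decimals) and the name `is_eighth` are assumptions based on the question.

```python
import re
from decimal import Decimal

# Shape only: 0-2 integer digits, a dot, 1-3 decimal digits.
SHAPE = re.compile(r"[0-9]{0,2}\.[0-9]{1,3}")


def is_eighth(text):
    """Shape check by regex, value check by arithmetic."""
    if SHAPE.fullmatch(text) is None:
        return False
    # A multiple of 1/8 becomes a whole number when multiplied by 8.
    return (Decimal(text) * 8) % 1 == 0


for text in ["12.375", ".5", "1.0", "0.225", "123.5", "7"]:
    print(text, is_eighth(text))
```

`Decimal` is used instead of `float` so that the multiplication is exact for the decimal text as written.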

R regular expressions: unexpected behavior of "[:digit:]"

I'd like to extract elements beginning with digits from a character vector but there's something about POSIX regular expression syntax that I don't understand.
I would think that
vec <- c("012 foo", "305 bar", "other", "notIt 7")
grep(pattern="[:digit:]", x=vec)
would return 1 2 4 since they are the three elements that have digits somewhere in them. But in fact it returns 3 4.
Likewise grep(pattern="^0", x=vec) returns 1 as I would expect because element 1 starts with a zero. However grep(pattern="^[:digit:]", x=vec) returns integer(0) whereas I would expect it to return 1 2 since those are the elements that start with digits.
How am I misunderstanding the syntax?
Try
grep(pattern="[[:digit:]]", x=vec)
instead as the 'meta-patterns' between colons usually require double brackets.
Another solution
grep(pattern="\\d", x=vec)
man 7 regex
Within a bracket expression, the name of a character class enclosed in "[:" and ":]" stands for the list of all characters belonging to that class. Standard character class names are:
alnum digit punct
alpha graph space
blank lower upper
cntrl print xdigit
Therefore a character class that is the sole member of a bracket expression will look like double-brackets, such as [[:digit:]]. As another example, consider that [[:alnum:]] is equivalent to [[:alpha:][:digit:]].
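The result 3 4 makes sense once "[:digit:]" is read as a plain bracket list of the characters :, d, i, g and t. Python's `re` module has no POSIX classes at all, so it always reads the pattern that way, which makes it a handy analogy. The sketch below is Python, not R, and the `grep` helper is a made-up stand-in that returns 1-based indices like R's grep().

```python
import re

vec = ["012 foo", "305 bar", "other", "notIt 7"]


def grep(pattern, items):
    """Return 1-based indices of the items that contain a match."""
    return [i for i, item in enumerate(items, start=1) if re.search(pattern, item)]


# Read as a bracket list: any of ':', 'd', 'i', 'g', 't'.
print(grep(r"[:digit:]", vec))    # [3, 4]
print(grep(r"^[:digit:]", vec))   # []
# An explicit digit range gives what the question expected.
print(grep(r"[0-9]", vec))        # [1, 2, 4]
print(grep(r"^[0-9]", vec))       # [1, 2]
```

Elements 3 and 4 match only because they contain a "t", not because of any digit.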

Regular expression for numbers without leading zeros

I need a regular expression to match any number from 0 to 99. Leading zeros may not be included, which means that, for example, 05 is not allowed.
I know how to match 1-99, but do not get the 0 included.
My regular expression for 1-99 is
^[1-9][0-9]?$
There are plenty of ways to do it but here is an alternative to allow any number length without leading zeros
0-99:
^(0|[1-9][0-9]{0,1})$
0-999 (just increase {0,2}):
^(0|[1-9][0-9]{0,2})$
1-99:
^([1-9][0-9]{0,1})$
1-100:
^([1-9][0-9]{0,1}|100)$
Any number in the world
^(0|[1-9][0-9]*)$
12 to 999
^(1[2-9]|[2-9][0-9]{1}|[1-9][0-9]{2})$
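Range patterns like these are easy to get subtly wrong, so a brute-force check against real integers can help. A small Python sketch; the `RANGES` table and the `check` helper are made up for illustration, and the upper limit of 1200 is arbitrary.

```python
import re

# label -> (pattern from the list above, lowest allowed, highest allowed)
RANGES = {
    "0-99": (r"^(0|[1-9][0-9]{0,1})$", 0, 99),
    "0-999": (r"^(0|[1-9][0-9]{0,2})$", 0, 999),
    "1-99": (r"^([1-9][0-9]{0,1})$", 1, 99),
    "1-100": (r"^([1-9][0-9]{0,1}|100)$", 1, 100),
    "12-999": (r"^(1[2-9]|[2-9][0-9]{1}|[1-9][0-9]{2})$", 12, 999),
}


def check(pattern, low, high, limit=1200):
    """True if pattern accepts exactly low..high and rejects leading zeros."""
    regex = re.compile(pattern)
    for n in range(limit):
        accepted = regex.match(str(n)) is not None
        if accepted != (low <= n <= high):
            return False
    # Any value written with an extra leading zero must be rejected.
    return all(regex.match("0" + str(n)) is None for n in range(limit))


for label, (pattern, low, high) in RANGES.items():
    print(label, check(pattern, low, high))
```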
Updated:
^([0-9]|[1-9][0-9])$
Matches 0-99. Doesn't match values with leading zeros. Depending on your application you may need to escape the parentheses and the or symbol.
^(0|[1-9][0-9]?)$
Test here http://regexr.com?2uu31 (various samples included)
You have to add a 0|, but be aware that the "or" (|) in Regexes has the lowest precedence. ^0|[1-9][0-9]?$ in reality means (^0)|([1-9][0-9]?$) (we will ignore that now there are two capturing groups). So it means "the string begins with 0" OR "the string ends with [1-9][0-9]?". An alternative to using brackets is to repeat the ^$, like ^0$|^[1-9][0-9]?$.
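The precedence issue is easy to demonstrate. The Python sketch below compares the ungrouped pattern with the two fixes mentioned above; the constant names and sample inputs are illustrative.

```python
import re

UNGROUPED = re.compile(r"^0|[1-9][0-9]?$")    # (^0) OR ([1-9][0-9]?$)
GROUPED = re.compile(r"^(0|[1-9][0-9]?)$")    # anchors apply to both branches
REPEATED = re.compile(r"^0$|^[1-9][0-9]?$")   # anchors repeated per branch

for text in ["0", "42", "05", "0123", "abc7"]:
    print(
        text,
        bool(UNGROUPED.search(text)),
        bool(GROUPED.search(text)),
        bool(REPEATED.search(text)),
    )
```

The ungrouped pattern accepts 05 and 0123 (they begin with 0) and abc7 (it ends with a digit from 1 to 9), while the grouped and repeated-anchor versions accept only 0 and 42 from this list.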
[...] but do not get the 0 included.
Just add 0|... in front of the expression:
^(0|[1-9][0-9]?)$
  ^^
console.log(/^0(?! \d+$)/.test('0123')); // true
console.log(/^0(?! \d+$)/.test('10123')); // false
console.log(/^0(?! \d+$)/.test('00123')); // true
console.log(/^0(?! \d+$)/.test('088770123')); // true
How about this?
A simpler answer without using the or operator makes the leading digit optional:
^[1-9]?[0-9]$
Matches 0-99 disallowing leading zeros (01-09).
This should do the trick:
^(?:0|[1-9][0-9]?)$
Answer:
^([1-9])?(\d)$
Explanation:
^ // beginning of the string
([1-9])? // first group (optional) in range 1-9 (not zero here)
(\d) // second group matches any digit including 0
$ // end of the string
Same as (Not grouping):
^[1-9]?\d$
Test:
https://regex101.com/r/Tpe9Ia/1
Try this it will help you
^([0-9]|[1-9][0-9])$
([1-9][0-9]+).*
this will be simple and efficient
it will help with any range of whole numbers
([1-9][0-9\.]+).*
this expression will help with decimal numbers
You can use the following regex:
[1-9][0-9]\d|0
^(0{1,})?([1-9][0-9]{0,1})$
It includes:
1-99,
01-099,
00...1-