Specifying range style in which all the elements must be present - regex

I am practising my regex here
I have the following string
nabcdf
and I would like to select all of it. so I wrote the following regex
(n[abc]) -> n followed by a , b or c
With this, only n and a are highlighted. Based on this I have two questions:
1) Why aren't b and c also highlighted, since they are present as well?
2) [abc] specifies that either a or b or c is present. Is it possible to specify a range such as a->c in which all elements of the range must be present (i.e. so it ends up like abc)? I know regex has [a-c], however that means any one element between a and c must be present. What I want is that all elements of the range should be present. Is there an expression for that?

n[abc]
will match only n plus a single character from the character class. To match more, you need a quantifier like * or +.
So it will be
n[a-c]+ #will capture `n` and at least one of the character class
or
n[a-c]* #will capture `n` and `0` or more of character class
See demo.
or
If you want all of a, b and c to be present, you can use lookaheads:
(?=.*a)(?=.*b)(?=.*c)n[abc]+
See demo.
https://regex101.com/r/pT4tM5/13
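A small sketch of the patterns above, run with Python's re module (Python is an assumption here, since the question does not name a flavor). It shows why n[abc] stops after one class character, what the quantifier changes, and what the lookaheads add.

```python
import re

text = "nabcdf"

# A bare character class consumes exactly one character.
one = re.search(r"n[abc]", text).group()

# A quantifier lets the class repeat.
many = re.search(r"n[a-c]+", text).group()

# Each lookahead requires its letter to occur somewhere after the match start.
ALL_THREE = re.compile(r"(?=.*a)(?=.*b)(?=.*c)n[abc]+")
with_all = ALL_THREE.search(text).group()
missing_c = ALL_THREE.search("nabdf")  # no c anywhere, so there is no match

print(one, many, with_all, missing_c)
```

Note that the lookaheads only check that each letter appears somewhere to the right of the match start, not that it appears inside the n[abc]+ run itself.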

Related

Regex match an even number of occurrences of a specific character

I've been trying to make a regex that satisfies these conditions:
The word consists of the characters a and b
The number of b characters must be even (consecutive or not)
So for example:
abb -> accepted
abab -> accepted
aaaa -> rejected
baaab -> accepted
So far I have this: ([a]*)(((b){2}){1,})
As you can see, I know very little about the matter. This checks for pairs, but it still accepts words with an odd number of b's.
You could use this regex to check for some number of as with an even number of bs:
^(?:a*ba*ba*)+$
This looks for 1 or more occurrences of 2 bs, each surrounded by some number (which may be 0) of as.
Demo on regex101
Note this will match bb (or bbbb, bbbbbb etc.). If you don't want to do that, the easiest way is to add a positive lookahead for an a:
^(?=b*a)(?:a*ba*ba*)+$
Demo on regex101
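To see both patterns side by side, here is a quick check in Python (my choice of language, not something the question specifies). The extra words ab and bb are added only to exercise the odd-count case and the lookahead.

```python
import re

EVEN_BS = re.compile(r"^(?:a*ba*ba*)+$")
# Same pattern, plus a lookahead that demands at least one a.
EVEN_BS_WITH_A = re.compile(r"^(?=b*a)(?:a*ba*ba*)+$")

words = ["abb", "abab", "aaaa", "baaab", "ab", "bb"]

plain = {w: bool(EVEN_BS.match(w)) for w in words}
needs_a = {w: bool(EVEN_BS_WITH_A.match(w)) for w in words}

for w in words:
    print(f"{w!r}: plain={plain[w]} needs_a={needs_a[w]}")
```

The two patterns agree on every word here except bb, which only the first one accepts.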
Checking an Array of Characters Against Two Conditionals
While you could do this using regular expressions, it would be simpler to check your two rules with ordinary conditionals against an Array of characters created with String#chars. For example, using Ruby 3.1.2:
# Rule 1: string contains only the letters `a` and `b`
# Rule 2: the number of `b` characters in the word is even
#
# @return [Boolean] whether the word matches *both* rules
def word_matches_rules(word)
  char_array = word.chars
  char_array.uniq.sort == %w[a b] && char_array.count("b").even?
end
words = %w[abb abab aaaa baaab]
words.map { |word| [word, word_matches_rules(word)] }.to_h
#=> {"abb"=>true, "abab"=>true, "aaaa"=>false, "baaab"=>true}
Regular expressions are very useful, but string operations are generally faster and easier to conceptualize. This approach also allows you to add more rules or verify intermediate steps without adding a lot of complexity.
There are probably a number of ways this could be simplified further, such as using a Set or methods like Array#& or Array#-. However, my goal with this answer was to make the code (and the encoded rules you're trying to apply) easier to read, modify, and extend rather than to make the code as minimalist as possible.
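For comparison, the Set idea could look roughly like this in Python (a swapped-in language; the answer's own code is Ruby). This is only a sketch: it mirrors the Ruby check, so a word must contain both letters and nothing else.

```python
def word_matches_rules(word: str) -> bool:
    """True if word uses exactly the letters a and b and has an even number of b's."""
    # set(word) == {"a", "b"} means both letters appear and no other character does.
    return set(word) == {"a", "b"} and word.count("b") % 2 == 0


words = ["abb", "abab", "aaaa", "baaab"]
results = {w: word_matches_rules(w) for w in words}
print(results)
```

If words made only of b's (such as bb) should pass, the set comparison would have to become a subset test instead.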

Regex: This range OR that range

So I am trying match a certain postcode range:
CB1 *, CB2 *, CB3 *, CB4 *, CB5 *, CB21 *, CB22 *, CB23 *, CB24 *, CB25 *
So I am trying to use range 1-5 OR 21-25.
This is my current regex:
^[CBcb].([1-5]|[21-25]).+$
I want to make sure the post code parts contains the following
[CB OR cb],[1-5 OR 21-25] and [Any combination]
Have a tinker: https://regex101.com/r/aP9uG3/2
How do you specify two ranges?
Since the patterns are the same and it is just the 2 that may or may not occur, you can say something like:
CB2?[1-5] # add ^ and $ if required
If you want to specify two ranges, you can always group them with parentheses common_pattern(pattern1|pattern2).
Your Regex pattern:
^[CBcb].([1-5]|[21-25]).+$
is being interpreted as:
^[CBcb].([12345]|[2125]).+$
You need:
^CB2?[1-5].+$
Here ? means zero or one occurrence of the preceding token, 2 in this case.
Alternatively, use ^cb2?[1-5].+$ together with the i (case-insensitive) flag.
The first error was that you were only matching one character from the list [cbCB]. The second is that there's a stray . in the middle. And the third is that you are not specifying a range of numbers, but a range of characters. 21 is not a character, it is a sequence of characters. A pattern that matches any non-negative integer would be [0-9]+. What you want is an optional 2 followed by a character from the range [1-5].
You should read up on what lists and ranges are and mean in regular expressions, because you misused both of them! Everyone makes mistakes, obviously, but this is one of the basics you should get the hang of.
Having characters inside [] makes it a character class. This means that it matches any single character inside the brackets (unless it's negated). It doesn't understand numbers, only characters.
If you want to match CB or cb, you separate them by | like CB|cb. Or even better, make your regex case-insensitive. This is done in different ways in different regex flavors. In JavaScript, for example, attach the flag i to the regex: /cb/i.
As for the rest of the pattern, if 1-5 and 21-25 is literally what you want, matching 1-5 is done with a character class (which you now are familiar with ;) like [1-5], meaning match any character in the ASCII range between the characters 1 and 5 inclusive.
Make the preceding 2 optional, and your regex looks like this
CB2?[1-5]
It matches your postcode and without a terminating $, it allows for your [Any combination].
Hope this helps.
Regards
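A quick check of the suggested pattern in Python (an assumption; the thread mentions JavaScript only in passing). The sample postcodes are made up for illustration. One thing the check brings out: because .+ accepts anything, the optional 2 can be skipped and CB26 gets through as CB2 followed by 6. The \b variant below is my own tightening, not part of the answers.

```python
import re

# Pattern from the answers, with the i flag expressed as re.IGNORECASE.
LOOSE = re.compile(r"^CB2?[1-5].+$", re.IGNORECASE)
# Assumed tightening: \b requires the district number to end right after [1-5].
STRICT = re.compile(r"^CB2?[1-5]\b.+$", re.IGNORECASE)

samples = ["CB1 2AB", "cb5 8xy", "CB21 5DT", "CB25 9ZZ", "CB6 1AA", "CB26 3QQ"]

loose = [s for s in samples if LOOSE.match(s)]
strict = [s for s in samples if STRICT.match(s)]

print("loose :", loose)
print("strict:", strict)
```

If the postcode always has a space after the district, a literal space in place of \b would do the same job.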

ReGex - matching a logical implication

Given the alphabet {a, b, c}, how can I create a simple regular expression which matches exactly those words which meet the following criteria:
If the string "aa" occurs, then consequently "cc" must also occur (Note the logical implication).
The order of occurence doesn't matter ("cc" as well as "aa" can occur first).
Due to the former logical implication (if-then relationship), the string "cc" can occur even without "aa", but not vice versa.
I am looking for a way to implement this by using these syntax elements (., *, +, ?, |, ) as well as brackets.
Example what should be matched:
cc
abba
bccb
bccaa
ε (epsilon - empty string)
What shouldn't be matched:
aa
aacbcb
abaaba
baaa
caac
I have tried the following: a?b?(ba)*(ccaa)*(aacc)*c*b?
Not sure if I should answer this, since it's homework, but here goes...
First off, you identify the clauses P: "string 'aa' occurs" and Q: "string 'cc' occurs". Then you transform P -> Q into !P v Q. Then you translate this into a regular expression formed of two parts:
^(((b|c)*(a(b|c)+)*a?)|(.*cc.*))$
The first part rules out any 'aa' and goes like this: allow any number of bs and cs at the beginning (including none); then, if you find an a, force at least one different character after it. An a followed by non-as may occur any number of times. At the end we also have the a? to allow the string to end in an a, and we are sure it has no a before it because neither of the 2 preceding groups ends in a.
The second part is trivial.
Test it here: http://rubular.com/r/mAPHO8bulo
See it here: http://jex.im/regulex/...
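If the links are unavailable, the regex can be checked against the question's own examples with a few lines of Python (my choice of language; the answer used Rubular, which is Ruby). The list names are just for illustration.

```python
import re

# Left alternative: no "aa" anywhere. Right alternative: "cc" occurs somewhere.
IMPLICATION = re.compile(r"^(((b|c)*(a(b|c)+)*a?)|(.*cc.*))$")

should_match = ["cc", "abba", "bccb", "bccaa", ""]
should_not_match = ["aa", "aacbcb", "abaaba", "baaa", "caac"]

accepted = [w for w in should_match if IMPLICATION.match(w)]
rejected = [w for w in should_not_match if not IMPLICATION.match(w)]

print("accepted:", accepted)
print("rejected:", rejected)
```

Keep in mind that . matches any character, so the second alternative assumes the input has already been restricted to the alphabet {a, b, c}.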

Regular expression for formal languages

I'm trying to write a regular expression for a language consisting of:
Strings which contain any number of a’s followed by a single b and
Strings which contain any number of a’s followed by a single b followed by an even number of a's.
I thought (b | ((a^+)b)^* ) U (a | ( (b^+) a)* ) but it was wrong.
Does anyone know where I went wrong?
Assumption
I'll assume it should be "strings that consist of", not "strings which contain". The difference is that bbbbbaaabaabbbb would be a valid string if it's "contain" (since it contains aaabaa).
To make it "strings that contain", the only difference would be adding .*? to the start and .* to the end (or [ab]*? and [ab]* if you want to limit it to a and b).
Problem analysis
I believe you can simplify the problem to just "strings that consist of any number of a's followed by a single b followed by an even number of a's", since 0 is an even number.
I have no idea what ^ or U is doing in your regular expression. Is this language specific syntax (usually ^ indicates the start of the line / string)?
Solution
It should be as simple as:
a*b(aa)*
a* - any number of a's
b - a single b
(aa)* - an even number of a's
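A minimal sketch of how this could be tested, assuming Python's re module (the question is about formal languages and names no implementation). fullmatch is used so that the whole string must belong to the language, in line with the "consist of" reading; the helper name is hypothetical.

```python
import re

LANGUAGE = re.compile(r"a*b(aa)*")


def in_language(s: str) -> bool:
    # fullmatch enforces "consists of" rather than "contains".
    return LANGUAGE.fullmatch(s) is not None


samples = ["b", "aab", "abaa", "aabaaaa", "aba", "abb", "aaa", ""]
checks = {s: in_language(s) for s in samples}
print(checks)
```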
EDIT:
According to comments, it appears that you may want strings that consist of something like:
any number of a's
followed by any number of the following:
a single b
followed by an even number of a's (number != 0)
optionally followed by a b
The regex would be:
a*(b(aa)+)*b?
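The revised pattern can be probed the same way. Again this is only a sketch in Python (an assumed language), with sample strings picked to hit each part of the pattern; the helper name is hypothetical.

```python
import re

REVISED = re.compile(r"a*(b(aa)+)*b?")


def accepts(s: str) -> bool:
    # The whole string must be consumed by the pattern.
    return REVISED.fullmatch(s) is not None


samples = ["", "aaa", "b", "baa", "baab", "baabaaaa", "aabaab", "ba", "bb", "baaa"]
verdicts = {s: accepts(s) for s in samples}
print(verdicts)
```

Since every part of the pattern is optional, the empty string is accepted too, which may or may not be what the language is meant to include.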

Regular expression matching any subset of a given set?

Is it possible to write a regular expression which will match any subset of a given set of characters a1 ... an ?
I.e. it should match any string where any of these characters appears at most once, there are no other characters and the relative order of the characters doesn't matter.
Some approaches that arise at once:
1. [a1,...,an]* or (a1|a2|...|an)*- this allows multiple presence of characters
2. (a1?a2?...an?) - no multiple presence, but relative order is important - this matches any subsequence but not subset.
3. ($|a1|...|an|a1a2|a2a1|...|a1...an|...|an...a1), i.e. write all possible subsequences (just hardcode all matching strings :)) of course, not acceptable.
I also have a guess that it may be theoretically impossible, because during parsing the string we will need to remember which character we have already met before, and as far as I know regular expressions can check out only right-linear languages.
Any help will be appreciated. Thanks in advance.
This doesn't really qualify for the language-agnostic tag, but...
^(?:(?!\1)a1()|(?!\2)a2()|...|(?!\n)an())*$
see a demo on ideone.com
The first time an element is matched, it gets "checked off" by the capturing group following it. Because the group has now participated in the match, a negative lookahead for its corresponding backreference (e.g., (?!\1)) will never match again, even though the group only captured an empty string. This is an undocumented feature that is nevertheless supported in many flavors, including Java, .NET, Perl, Python, and Ruby.
This solution also requires support for forward references (i.e., a reference to a given capturing group (\1) appearing in the regex before the group itself). This seems to be a little less widely supported than the empty-groups gimmick.
Can't think how to do it with a single regex, but this is one way to do it with n regexes (I will use 1 2 ... m n etc. for your as):
^[23..n]*1?[23..n]*$
^[13..n]*2?[13..n]*$
...
^[12..m]*n?[12..m]*$
If all the above match, your string is a subset of 12..mn.
How this works: each line requires the string to consist exactly of:
any number of characters drawn from the set, except a particular one
perhaps that particular one
any number of characters drawn from the set, except a particular one
If this passes when every element in turn is considered as a particular one, we know:
there is nothing else in the string except the allowed elements
there is at most one of each of the allowed elements
as required.
For completeness, I should say that I would only do this if I were under orders to "use regex"; if not, I'd track which allowed elements have been seen, and iterate over the characters of the string doing the obvious thing.
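The n-regex idea can be generated rather than typed out. Below is a rough sketch in Python (an assumed language); the function name is hypothetical, and it assumes at least two allowed characters so that the "everything except one" class is never empty.

```python
import re


def is_subset_by_n_regexes(s: str, allowed: str) -> bool:
    """Run one regex per allowed character; all of them must match."""
    for ch in allowed:
        others = "".join(c for c in allowed if c != ch)
        other_class = "[" + re.escape(others) + "]"
        # others*, then at most one ch, then others* again
        pattern = other_class + "*" + re.escape(ch) + "?" + other_class + "*"
        if re.fullmatch(pattern, s) is None:
            return False
    return True


samples = ["", "a", "cba", "abac", "abcd", "cc"]
outcome = {s: is_subset_by_n_regexes(s, "abc") for s in samples}
print(outcome)
```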
Not sure you can get an extended regex to do that, but it's pretty easy to do with a simple traversal of your string.
You use a hash (or an array, or whatever) to store if any of your allowed characters has already been seen or not in the string. Then you simply iterate over the elements of your string. If you encounter an element not in your allowed set, you bail out. If it's allowed, but you've already seen it, you bail out too.
In pseudo-code:
foreach char a in {a1, ..., an}
    hit[a] = false
foreach char c in string
    if c not in {a1, ..., an} => fail
    if hit[c] => fail
    hit[c] = true
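The pseudo-code translates almost line for line into a real language. Here is one possible version in Python (my choice), where a set of seen characters plays the role of the hit table; the function name is hypothetical.

```python
def is_subset(s: str, allowed: str) -> bool:
    """True if every character of s is allowed and none appears twice."""
    allowed_set = set(allowed)
    seen = set()  # plays the role of the hit[] table
    for c in s:
        if c not in allowed_set:
            return False  # character outside the allowed set
        if c in seen:
            return False  # allowed, but already encountered
        seen.add(c)
    return True


samples = ["", "a", "cba", "abac", "abcd", "cc"]
answers = {s: is_subset(s, "abc") for s in samples}
print(answers)
```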
Similar to Alan Moore's, using only \1, and doesn't refer to a capturing group before it has been seen:
#!/usr/bin/perl
my $re = qr/^(?:([abc])(?!.*\1))*$/;
foreach (qw(ba pabc abac a cc cba abcd abbbbc), '') {
    print "'$_' ", ($_ =~ $re) ? "matches" : "does not match", " \$re \n";
}
We match any number of blocks (the outer (?:)), where each block must consist of "precisely one character from our preferred set, which is not followed by a string containing that character".
If the string might contain newlines or other funny stuff, it might be necessary to play with some flags to make ^, $ and . behave as intended, but this all depends on the particular RE flavor.
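The same pattern also appears to work unchanged in Python's re module, since the backreference only refers to a group that has already been seen. A small port of the Perl test, as a sketch:

```python
import re

# Each block: one character from the set that does not occur again later.
NO_REPEATS = re.compile(r"^(?:([abc])(?!.*\1))*$")

words = ["ba", "pabc", "abac", "a", "cc", "cba", "abcd", "abbbbc", ""]
matches = [w for w in words if NO_REPEATS.match(w)]

for w in words:
    verdict = "matches" if w in matches else "does not match"
    print(f"'{w}' {verdict}")
```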
Just for silliness, one can use a positive look-ahead assertion to effectively AND two regexps, so we can test for any permutation of abc by asserting that the above matches, followed by an ordinary check for 'is N characters long and consists of these characters':
my $re2 = qr/^(?=$re)[abc]{3}$/;
foreach (qw(ba pabc abac a cc abcd abbbbc abc acb bac bca cab cba), '') {
    print "'$_' ", ($_ =~ $re2) ? "matches" : "does not match", " \$re2 \n";
}
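For anyone not using Perl, the look-ahead AND trick can be reproduced in Python (an assumed port, not part of the original answer) by building the combined pattern from the no-repeats pattern as a string:

```python
import re

# The "no character repeats" pattern, kept as a string so it can be embedded.
NO_REPEATS = r"^(?:([abc])(?!.*\1))*$"
# AND it with "exactly three characters from the set" via a positive lookahead.
PERMUTATION = re.compile("^(?=" + NO_REPEATS + ")[abc]{3}$")

words = ["ba", "pabc", "abac", "a", "cc", "abcd", "abbbbc",
         "abc", "acb", "bac", "bca", "cab", "cba", ""]
permutations = [w for w in words if PERMUTATION.match(w)]
print(permutations)
```

Only the six orderings of abc get through, because shorter subsets fail the length check and repeated letters fail the lookahead.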