How do I account for all possible orders? - regex

I have 7 items...
a b c d e f and g
The seven items can be in any order. How do I check with regex that they are there (or not) but no other items are...
^(a)?(b)?(c)?(d)?(e)?(f)?(g)?$
That would check for the seven items with any combination of items missing, but only in that order. How do I make the regex check for any possible order of the 7 items?
Both of these would pass:
abcdefg
aceg
I need these to pass as well
bc
fabcd
bgef
I'm using single letters to simplify things. For example, (\stest)? would be one of the items and (\skey="([^"<>]+)?")? another. I would like to prevent duplicates as well.
These should not pass
abca
aa
gfdef

Something like this would work:
^(?!(.*(a|b|c|d|e|f|g).*(\2)))((a|b|c|d|e|f|g)+)$
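A quick way to try this pattern against the examples from the question is sketched below. Python is only an assumption here, since the question names no language; the pattern is copied unchanged, and the names PATTERN and is_valid are purely illustrative.

```python
import re

# The pattern from the answer above, unchanged: the negative lookahead rejects
# any string in which one of the items occurs twice (\2 refers to that item).
PATTERN = re.compile(r'^(?!(.*(a|b|c|d|e|f|g).*(\2)))((a|b|c|d|e|f|g)+)$')

def is_valid(text):
    """Return True if text consists only of the items a-g, each at most once."""
    return PATTERN.match(text) is not None

for sample in ["abcdefg", "aceg", "bc", "fabcd", "bgef", "abca", "aa", "gfdef"]:
    print(sample, is_valid(sample))
```

As written, the pattern requires at least one item; replacing the final + with * would also accept an empty string.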

If you are using PHP, call preg_split once for each of the seven items, using a, b, c, d, e, f and g in turn as the splitter expression.
After each split, concatenate the pieces and run the next split on the result.
If any split gives you more than 2 elements, that item is duplicated.
If the final string is anything other than '', the input contains invalid parts.
Here is the code:
// checks that every part present is valid, not repeated, and that there is no extra stuff
function checkAny($que) {
    // pay attention to the escaping of \
    $toCheck = array('a','b','c','\\skey="([^"<>]+)?"','d','/slashtest','f','g');
    foreach ($toCheck as $one) {
        // $t = preg_split('/'.preg_quote($one,'/').'/', $que); // preg_quote is not suitable for your purpose
        $t = preg_split('~'.$one.'~', $que); // so pick a suitable delimiter instead of ~ if ~ can occur in your items
        if (count($t) > 2) return 'fail: repeated component/part';
        $que = implode('', $t); // you can use trim() here if it is useful for you
        if ($que == '') return 'pass!!!'; // do not waste time doing any more tests
    }
    return 'fail: unknown component/part';
}
// test
echo checkAny('abcc'); // fail
echo checkAny('ab/slashtestc'); // fail because the a inside /slashtest counts as a repeated a; be careful with the order of the items to avoid this problem
echo checkAny('abcg'); // pass
echo checkAny('ab key="xx"c'); // pass
If PHP is not an option, preg_split is easy to substitute in any language that supports regex.
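As a rough illustration of that last point, the same idea might look like this in Python. This is a sketch, not a port of the exact code above: it uses re.subn (remove and count) instead of a split, and the ITEMS list and its ordering are assumptions made for the example.

```python
import re

# Hypothetical item list; the multi-character items come first so that the 'a'
# inside '/slashtest' is not mistaken for a second 'a' item.
ITEMS = [r'\skey="([^"<>]+)?"', r'/slashtest', 'a', 'b', 'c', 'd', 'f', 'g']

def check_any(text):
    """Remove each item in turn; fail on a repeat or on leftover text."""
    remaining = text
    for item in ITEMS:
        # subn returns the string with the item removed and the number of removals
        remaining, count = re.subn(item, '', remaining)
        if count > 1:
            return 'fail: repeated component/part'
        if remaining == '':
            return 'pass'
    return 'fail: unknown component/part'

for sample in ['abcc', 'ab/slashtestc', 'abcg', 'ab key="xx"c', 'abx']:
    print(sample, '->', check_any(sample))
```

With the longer items tested first, 'ab/slashtestc' passes here, which is the ordering issue the PHP test comment warns about.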

Related

Dart - First letter of each word in a List

List<String> pisah = ['saya','sedang','belajar','menjadi','programmer','yang','handal','dan','menyenangkan'];
How can I print just the first letter of every word in that list?
For example:
baris 1 : S S B M P Y H D M
That's all in a day's work for the method map.
final letters = pisah.map((s) => s[0]).toList();
print(letters);
The .toList() call may or may not be needed depending on what exactly you want to do with the result. For example, it can be omitted if you only want to iterate over it, but it's required if you need to access individual letters using letters[i].

Regex permutations without repetition [duplicate]

I need a RegEx to check whether an expression can be found in a string.
For the string "abc" I would like to match the first appearance of any of its permutations without repetition, in this case 6: abc, acb, bac, bca, cab, cba.
For example, in the string "adesfecabefgswaswabdcbaes" it would find a match at position 7.
I also need the same for the permutations of a string with a repeated character, such as "abbc". There are 12 cases: acbb, abcb, abbc, cabb, cbab, cbba, bacb, babc, bcab, bcba, bbac, bbca
For example, in the string "adbbcacssesfecabefgswaswabdcbaes" it would find a match at position 3.
I would also like to know how to handle similar cases.
EDIT
I'm not looking for the permutations themselves; I already have those. What I'm looking for is a way to check if any of those permutations is in a given string.
EDIT 2
I think this regex covers my first question:
([abc])(?!\1)([abc])(?!\2|\1)[abc]
It can find all 6 permutations of "abc" in any sequence of characters.
Now I need to do the same when I have a repeated character, like abbc (12 permutations).
([abc])(?!\1)([abc])(?!\2|\1)[abc]
You can use this without the g flag to get the position. See the demos. The position of the first group is what you want.
https://regex101.com/r/nS2lT4/41
https://regex101.com/r/nS2lT4/42
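Outside of regex101, the position can be read from the match object. The sketch below assumes Python (the question has no language tag); first_permutation_index is a made-up helper name, and the pattern is the one given above.

```python
import re

# The pattern from the answer: three characters from [abc], with each
# lookahead forbidding a repeat of the characters already captured.
PERMUTATION_OF_ABC = re.compile(r'([abc])(?!\1)([abc])(?!\2|\1)[abc]')

def first_permutation_index(text):
    """Return the 0-based index of the first permutation of "abc", or -1."""
    match = PERMUTATION_OF_ABC.search(text)
    return match.start() if match else -1

# Prints 6 (0-based), which is position 7 when counting from 1.
print(first_permutation_index("adesfecabefgswaswabdcbaes"))
```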
The only reason you might "need a regex" is if you are working with a library or tool which only permits specifying certain kinds of rules with a regex. For instance, some editors can be customized to color certain syntactic constructs in a particular way, and they only allow those constructs to be specified as regular expressions.
Otherwise, you don't "need a regex", you "need a program". Here's one:
// are two arrays equal?
function array_equal(a1, a2) {
  return a1.length === a2.length &&
    a1.every(function(chr, i) { return chr === a2[i]; });
}
// are two strings permutations of each other?
function is_permutation(s1, s2) {
  return array_equal(s1.split('').sort(), s2.split('').sort());
}
// make a function which finds permutations in a string
function make_permutation_finder(chars) {
  var len = chars.length;
  return function(str) {
    // <= so that a permutation at the very end of str is also checked
    for (var i = 0; i <= str.length - len; i++) {
      if (is_permutation(chars, str.slice(i, i + len))) return i;
    }
    return -1;
  };
}
> finder = make_permutation_finder("abc");
> console.log(finder("adesfecabefgswaswabdcbaes"));
< 6
Regexps are far from being powerful enough to do this kind of thing.
However, there is an alternative, which is to precompute the permutations and build a dynamic regexp to find them. You did not provide a language tag, but here's an example in JS. Assuming you have the permutations and don't have to worry about escaping special regexp characters, that's just
regexp = new RegExp(permutations.join('|'));
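For the repeated-character case ("abbc") the same precompute-and-join idea can be sketched as follows. This assumes Python rather than JS, and build_permutation_regex is a hypothetical helper; the set removes the duplicate permutations that the repeated b would otherwise produce.

```python
import re
from itertools import permutations

def build_permutation_regex(chars):
    """Compile an alternation of every distinct permutation of chars."""
    # The set drops the duplicates caused by repeated characters such as "bb".
    distinct = sorted({''.join(p) for p in permutations(chars)})
    # re.escape guards against characters that are special in a regex.
    pattern = re.compile('|'.join(re.escape(p) for p in distinct))
    return pattern, distinct

regex, distinct = build_permutation_regex("abbc")
match = regex.search("adbbcacssesfecabefgswaswabdcbaes")
print(len(distinct), match.start(), match.group())
```

The number of alternatives grows factorially with the length of the word, so this is only practical for short inputs.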

Sort a set of variables based on frequency from a Perl RegEx

I am attempting to use a table or array to list and sort the
following sequence of letters, or items, by what is 'eaten'
first (in the captured Perl RegEx). These four lists contain the exact same
items, just entered in a different order, in succession.
Input items: these letters represent an action or an input into the
client.
a b c d
b c d a
c d b a
d a b c
perl regex:
^(\w+) eats (a|an) (\w+)\.$
So matches[4] will be the item captured.
This RegEx will fire in the client for each letter (a, b, c, d) of each
set, entered separately. So there are four sets of a, b, c, d that
will be input in succession, but in a rotating order. The above
RegEx will fire 16x in total (once for each letter). I need to be able to
sort the items so that, if (a) is eaten first every time, it will have
priority at the top, going down. But it might not always be (a); it
could be any of the letters that holds priority.
I need this priority list to be displayed to a Geyser such as
PrioList= Geyser.MiniConsole:new({
name="PrioList",
x="70%", y="50%",
width="30%", height="50%",
})
I then need to be able to set each letter to a different priority list
or variable. Because each separate letter will indicate a different
action needed to be taken, so I will need to say
if (a == highestpriority) then
do action / function()
end
I am unsure of how to write the 'for' statement that will be able to
sort and list these items based off the 4 groups of letters. I figure
the list will have to be saved and reset, after each sequence then
somehow entered into a table or array, and compared to each other for
the highest priority. This is seriously beyond what I know how to
script, but I would definitely love to learn it.
If I'm understanding you correctly, one option is to use 1) a hash to tally the frequency of the first letter entered, and 2) a dispatch table to associate each letter with a subroutine:
use strict;
use warnings;
use List::Util qw/shuffle/;
my %seen;
my %dispatchTable = (
a => \&a_priority,
b => \&b_priority,
c => \&c_priority,
d => \&d_priority
);
for my $i ( 1 .. 4 ) {
    my @chars = shuffle qw/a b c d/;
    print "Round $i: @chars\n";
    $seen{ $chars[0] }++;
}
my $priority = ( sort { $seen{$b} <=> $seen{$a} } keys %seen )[0];
print "Priority: $priority\n";
$dispatchTable{$priority}->();
sub a_priority {
print "a priority sub called\n";
}
sub b_priority {
print "b priority sub called\n";
}
sub c_priority {
print "c priority sub called\n";
}
sub d_priority {
print "d priority sub called\n";
}
Sample run output:
Round 1: d c a b
Round 2: b a d c
Round 3: d b a c
Round 4: c d a b
Priority: d
d priority sub called
You said, "I need to be able to sort it so, if (a) is eaten first every time..." The above attempts to select the item with the highest frequency--not the item that was first all four times.
You'll need to decide what to do in cases where more than one letter shares the same frequency, but perhaps this will help provide some direction.
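One possible way to make a tie visible, instead of silently picking one letter, is sketched below. It is written in Python rather than Perl, the rounds are taken from the sample run above, and the actions table with its messages is a hypothetical stand-in for the real subroutines.

```python
from collections import Counter

# The four rounds from the sample run above.
rounds = [list("dcab"), list("badc"), list("dbac"), list("cdab")]

# Tally how often each letter came first.
first_counts = Counter(r[0] for r in rounds)

# Collect every letter that shares the highest count, so a tie is visible.
top = max(first_counts.values())
leaders = sorted(ch for ch, n in first_counts.items() if n == top)

# Hypothetical dispatch table: one action per letter.
actions = {ch: (lambda ch=ch: f"{ch} priority action called") for ch in "abcd"}

if len(leaders) == 1:
    print(actions[leaders[0]]())
else:
    print("tie between:", leaders)
```

What to do when leaders holds more than one letter is still a decision for the caller; the sketch only reports it.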

Using alternation or character class for single character matching?

(Note: The title doesn't seem too clear -- if someone can rephrase it, I'm all for it!)
Given this regex: (.*_e\.txt), which matches some filenames, I need to add some other single character suffixes in addition to the e. Should I choose a character class or should I use an alternation for this? (Or does it really matter??)
That is, which of the following two seems "better", and why:
a) (.*(e|f|x)\.txt), or
b) (.*[efx]\.txt)
Use [efx] - that's exactly what character classes are designed for: to match one of the included characters. Therefore it's also the most readable and shortest solution.
I don't know if it's faster, but I would be very much surprised if it wasn't. It definitely won't be slower.
My reasoning (without ever having written a regex engine, so this is pure conjecture):
The regex token [abc] will be applied in a single step of the regex engine: "Is the next character one of a, b, or c?"
(a|b|c) however tells the regex engine to
1. remember the current position in the string for backtracking, if necessary
2. check if it's possible to match a. If so, success. If not:
3. check if it's possible to match b. If so, success. If not:
4. check if it's possible to match c. If so, success. If not:
5. give up.
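Whatever the engine does internally, the two forms accept the same strings, which is easy to confirm. The check below is a sketch in Python (not the Perl engine discussed here), the file names are made up, and it says nothing about speed, which depends on the engine.

```python
import re

# The two candidate patterns from the question.
alternation = re.compile(r'(.*(e|f|x)\.txt)')
char_class = re.compile(r'(.*[efx]\.txt)')

# Hypothetical file names used only for this comparison.
names = ["report_e.txt", "report_f.txt", "report_x.txt",
         "report_z.txt", "notes.txt", "e.txt"]

# For each name, record whether each pattern matches the whole string.
results = [(name,
            alternation.fullmatch(name) is not None,
            char_class.fullmatch(name) is not None)
           for name in names]

for name, alt_ok, cls_ok in results:
    print(name, alt_ok, cls_ok)
```

The only visible difference is that the alternation form adds a second capture group for the suffix letter.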
Here is a benchmark:
(Updated according to tchrist's comment; the difference is now more significant.)
#!/usr/bin/perl
use strict;
use warnings;
use 5.10.1;
use Benchmark qw(:all);
my @l;
foreach(qw/b c d f g h j k l m n ñ p q r s t v w x z B C D F G H J K L M N ñ P Q R S T V W X Z/) {
    push @l, "abc$_.txt";
}
my $re1 = qr/^(.*(b|c|d|f|g|h|j|k|l|m|n|ñ|p|q|r|s|t|v|w|x|z)\.txt)$/;
my $re2 = qr/^(.*[bcdfghjklmnñpqrstvwxz]\.txt)$/;
my $cpt;
my $count = -3;
my $r = cmpthese($count, {
    'alternation' => sub {
        for(@l) {
            $cpt++ if $_ =~ $re1;
        }
    },
    'class' => sub {
        for(@l) {
            $cpt++ if $_ =~ $re2;
        }
    }
});
result:
              Rate alternation       class
alternation 2855/s          --        -50%
class       5677/s         99%          --
With a single character, it's going to have such a minimal difference that it won't matter. (unless you're doing LOTS of operations)
However, for readability (and a slight performance increase) you should be using the character class method.
For a bit further information - opening a round bracket ( causes Perl to start backtracking for that current position, which, as you don't have further matches to go against, you really don't need for your regex. A character class will not do this.

Regular Expression to find numbers with same digits in different order

I have been looking for a regular expression with Google for an hour or so now and can't seem to work this one out :(
If I have a number, say:
2345
and I want to find any other number with the same digits but in a different order, like this:
2345
For example, I match
3245 or 5432 (same digits but different order)
How would I write a regular expression for this?
There is an "elegant" way to do it with a single regex:
^(?:2()|3()|4()|5()){4}\1\2\3\4$
will match the digits 2, 3, 4 and 5 in any order. All four are required.
Explanation:
(?:2()|3()|4()|5()) matches one of the numbers 2, 3, 4, or 5. The trick is now that the capturing parentheses match an empty string after matching a number (which always succeeds).
{4} requires that this happens four times.
\1\2\3\4 then requires that all four backreferences have participated in the match - which they do if and only if each number has occurred once. Since \1\2\3\4 matches an empty string, it will always match as long as the previous condition is true.
For five digits, you'd need
^(?:2()|3()|4()|5()|6()){5}\1\2\3\4\5$
etc...
This will work in nearly any regex flavor except JavaScript.
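To see the trick in action, here is a small check. It assumes Python's re module, where a backreference to a group that never participated fails to match; the function name is illustrative only.

```python
import re

# Each empty group () is set only when its digit has been consumed; the
# backreferences \1\2\3\4 then fail unless all four groups were set.
SAME_DIGITS_AS_2345 = re.compile(r'^(?:2()|3()|4()|5()){4}\1\2\3\4$')

def is_permutation_of_2345(text):
    """Return True if text uses each of the digits 2, 3, 4, 5 exactly once."""
    return SAME_DIGITS_AS_2345.match(text) is not None

for candidate in ["3245", "5432", "5542", "1234", "12345"]:
    print(candidate, is_permutation_of_2345(candidate))
```

Only the first two candidates are accepted: 5542 never sets the group for 3, 1234 contains a digit outside the set, and 12345 is too long.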
I don't think a regex is appropriate. So here is an idea that is faster than a regex for this situation:
1. check string lengths, if they are different, return false
2. make a hash from the character (digits in your case) to integers for counting
3. loop through the characters of your first string:
    increment the counter for that character: hash[character]++
4. loop through the characters of the second string:
    decrement the counter for that character: hash[character]--
    break if any count is negative (or nonexistent)
5. loop through the entries, making sure each is 0:
    if all are 0, return true
    else return false
EDIT: Java Code (I'm using Character for this example, not exactly Unicode friendly, but it's the idea that matters now):
import java.util.*;
public class Test
{
    public boolean isSimilar(String first, String second)
    {
        if(first.length() != second.length())
            return false;
        HashMap<Character, Integer> hash = new HashMap<Character, Integer>();
        for(char c : first.toCharArray())
        {
            if(hash.get(c) != null)
            {
                int count = hash.get(c);
                count++;
                hash.put(c, count);
            }
            else
            {
                hash.put(c, 1);
            }
        }
        for(char c : second.toCharArray())
        {
            if(hash.get(c) != null)
            {
                int count = hash.get(c);
                count--;
                if(count < 0)
                    return false;
                hash.put(c, count);
            }
            else
            {
                return false;
            }
        }
        for(Integer i : hash.values())
        {
            if(i.intValue()!=0)
                return false;
        }
        return true;
    }
    public static void main(String ... args)
    {
        //tested to print false
        System.out.println(new Test().isSimilar("23445", "5432"));
        //tested to print true
        System.out.println(new Test().isSimilar("2345", "5432"));
    }
}
This will also work for comparing letters or other character sequences, like "god" and "dog".
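For comparison, the same counting idea fits in a few lines where a counting container is available. The sketch below assumes Python and swaps the explicit increment/decrement loops for collections.Counter, comparing the two tallies directly; is_similar is just a name borrowed from the Java example.

```python
from collections import Counter

def is_similar(first, second):
    """Return True if both strings contain the same characters the same number of times."""
    # Counter builds the character -> count hash; equal tallies mean that
    # one string is a rearrangement of the other.
    return Counter(first) == Counter(second)

print(is_similar("23445", "5432"))
print(is_similar("2345", "5432"))
print(is_similar("god", "dog"))
```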
Put the digits of each number in two arrays, sort the arrays, find out if they hold the same digits at the same indices.
RegExes are not the right tool for this task.
You could do something like this to ensure the right characters and length
[2345]{4}
Ensuring they only exist once is trickier and why this is not suited to regexes
(?=.*2.*)(?=.*3.*)(?=.*4.*)(?=.*5.*)[2345]{4}
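A sketch of how the lookahead version behaves, assuming Python and an anchored variant of the pattern (the ^ and $ are an addition, not part of the pattern above). Because exactly four characters are allowed and each of the four digits must appear, every digit ends up appearing exactly once.

```python
import re

# Anchored variant (an assumption): exactly four characters from [2345],
# with one lookahead per digit requiring that it appears somewhere.
PATTERN = re.compile(r'^(?=.*2)(?=.*3)(?=.*4)(?=.*5)[2345]{4}$')

def has_same_digits(text):
    """Return True if text is some ordering of the digits 2, 3, 4 and 5."""
    return PATTERN.match(text) is not None

for candidate in ["3245", "5432", "2234", "5542", "12345"]:
    print(candidate, has_same_digits(candidate))
```

This does not extend to numbers with repeated digits, which is part of why the approach scales poorly.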
The simplest regular expression is just all 24 permutations added up via the or operator:
/2345|3245|5432|.../;
That said, you don't want to solve this with a regex if you can get away with it. A single pass through the two numbers as strings is probably better:
1. Check the string length of both strings - if they're different you're done.
2. Build a hash of all the digits from the number you're matching against.
3. Run through the digits in the number you're checking. If you hit a match in the hash, mark it as used. Keep going until you don't get an unused match in the hash or run out of items.
I think it's very simple to achieve if you're OK with matching a number that doesn't use all of the digits, e.g. if you have the number 1234 and you accept the number 1111 as a match (returning TRUE).
Let me use PHP for an example as you haven't specified what language you use.
$my_num = 1245;
$my_pattern = '/[' . $my_num . ']{4}/'; // this resolves to the pattern /[1245]{4}/
$my_pattern2 = '/[' . $my_num . ']+/'; // as above, but the numbers can be of any length
$number1 = 4521;
$match = preg_match($my_pattern, $number1); // returns 1 (match)
$number2 = 2222444111;
$match2 = preg_match($my_pattern2, $number2); // returns 1 (match)
$number3 = 888;
$match3 = preg_match($my_pattern, $number3); // returns 0 (no match)
$match4 = preg_match($my_pattern2, $number3); // returns 0 (no match)
Something similar will work in Perl as well.
Regular expressions are not appropriate for this purpose. Here is a Perl script:
#!/usr/bin/perl
use strict;
use warnings;
my $src = '2345';
my @test = qw( 3245 5432 5542 1234 12345 );
my $canonical = canonicalize( $src );
for my $candidate ( @test ) {
    next unless $canonical eq canonicalize( $candidate );
    print "$src and $candidate consist of the same digits\n";
}
sub canonicalize { join '', sort split //, $_[0] }
Output:
C:\Temp> ks
2345 and 3245 consist of the same digits
2345 and 5432 consist of the same digits