Regex permutations without repetition [duplicate] - regex

I need a regex to check whether an expression occurs in a string.
For the string "abc" I would like to match the first appearance of any of its permutations without repetition, in this case 6: abc, acb, bac, bca, cab, cba.
For example, in the string "adesfecabefgswaswabdcbaes" it would find a match at position 7.
I also need the same for strings with a repeated character, like "abbc". There are 12 cases: acbb, abcb, abbc, cabb, cbab, cbba, bacb, babc, bcab, bcba, bbac, bbca
For example, in the string "adbbcacssesfecabefgswaswabdcbaes" it would find a match at position 3.
I would also like to know how to handle similar cases.
EDIT
I'm not looking for a way to generate the permutations; I already have those. What I'm looking for is a way to check whether any of those permutations occurs in a given string.
EDIT 2
I think this regex covers my first question:
([abc])(?!\1)([abc])(?!\2|\1)[abc]
It can find all 6 permutations of "abc" in any sequence of characters.
Now I need to do the same when there is a repeated character, as in abbc (12 permutations).

([abc])(?!\1)([abc])(?!\2|\1)[abc]
You can use this without the g flag to get the position. See the demo. The position of the first group is what you want.
https://regex101.com/r/nS2lT4/41
https://regex101.com/r/nS2lT4/42

The only reason you might "need a regex" is if you are working with a library or tool which only permits specifying certain kinds of rules with a regex. For instance, some editors can be customized to color certain syntactic constructs in a particular way, and they only allow those constructs to be specified as regular expressions.
Otherwise, you don't "need a regex", you "need a program". Here's one:
// are two arrays equal?
function array_equal(a1, a2) {
    return a1.length === a2.length &&
        a1.every(function(chr, i) { return chr === a2[i]; });
}
// are two strings permutations of each other?
function is_permutation(s1, s2) {
    return array_equal(s1.split('').sort(), s2.split('').sort());
}
// make a function which finds permutations in a string
function make_permutation_finder(chars) {
    var len = chars.length;
    return function(str) {
        // <= so that a permutation at the very end of str is also found
        for (var i = 0; i <= str.length - len; i++) {
            if (is_permutation(chars, str.slice(i, i + len))) return i;
        }
        return -1;
    };
}
> finder = make_permutation_finder("abc");
> console.log(finder("adesfecabefgswaswabdcbaes"));
< 6
Regexps are far from being powerful enough to do this kind of thing.
However, there is an alternative, which is to precompute the permutations and build a dynamic regexp to find them. You did not provide a language tag, but here's an example in JS. Assuming you have the permutations and don't have to worry about escaping special regexp characters, that's just
regexp = new RegExp(permutations.join('|'));
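For the repeated-character case ("abbc"), the precompute-and-join idea could be sketched roughly as follows. The helper names uniquePermutations, escapeRegExp and makePermutationRegExp are made up for this illustration, and the approach only stays practical for short inputs, since the number of alternatives grows factorially.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every distinct ordering of the characters in `chars`.
function uniquePermutations(chars) {
  const sorted = chars.split("").sort();
  const used = new Array(sorted.length).fill(false);
  const results = [];

  function build(prefix) {
    if (prefix.length === sorted.length) {
      results.push(prefix);
      return;
    }
    for (let i = 0; i < sorted.length; i++) {
      if (used[i]) continue;
      // Skip a repeated letter unless its previous copy is already part of
      // the prefix; this is what avoids generating duplicates.
      if (i > 0 && sorted[i] === sorted[i - 1] && !used[i - 1]) continue;
      used[i] = true;
      build(prefix + sorted[i]);
      used[i] = false;
    }
  }

  build("");
  return results;
}

// Join the permutations into one alternation: abbc|abcb|acbb|...
function makePermutationRegExp(chars) {
  return new RegExp(uniquePermutations(chars).map(escapeRegExp).join("|"));
}

console.log(uniquePermutations("abbc").length); // 12
console.log("adbbcacssesfecabefgswaswabdcbaes".search(makePermutationRegExp("abbc"))); // 2
```

Note that search returns a zero-based index, so the 2 printed here corresponds to position 3 in the question's one-based counting.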

Related

Get the number between two characters - Typescript [duplicate]

I am new to Typescript and trying to make a webhook in my Google Cloud Functions.
I have a string: C1234567890A460450P10TS1596575969702
I want to use regex to extract the number 1234567890 from that string.
The first character C is fixed and does not change; the character A after the number is variable and can be any other letter.
The regex that matches the number is (?<=C)(\d{10})(?=\w).
I want to know how to execute this regex in TypeScript so that I can get the number into a variable (e.g. const number = [the number extracted from the string] // value 1234567890).
Edit 1:
Based on the provided suggestions (which I had tried already before posting this question), here is the code I could make out of it:
const string = request.body.string;
let regxp = new RegExp('(?<=C)(\d{10})(?=\w)');
const number = regxp.exec(string);
response.send(number);
This gives a blank response.
There are two problems: you never parsed the returned string into a number with parseInt, and (?<=C) (positive lookbehind) is not supported everywhere.
Also, your regular expression can be simplified to ^C\d{10}, with a .slice(1) to remove the C.
const string: string = request.body.string;
const matches = string.match(/^C\d{10}/);
let number: number;
if (matches !== null) {
    number = parseInt(matches[0].slice(1), 10);
} else {
    response.status(400).end(); // Assuming this is Express
    return;
}
response.send(number); // 1234567890
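A further likely cause of the blank response in the question is the way the pattern was passed to new RegExp as a string: inside an ordinary string literal, \d and \w lose their backslash before the regex engine ever sees them. The snippet below is a small sketch of that effect in plain JavaScript (which is also valid TypeScript); it uses a capture group instead of the lookbehind, and the variable names are only illustrative.

```javascript
const input = "C1234567890A460450P10TS1596575969702";

// "\d" inside a string literal collapses to "d", so this pattern
// looks for ten literal "d" characters after the C.
const broken = new RegExp("^C(\d{10})");
console.log(broken.source);      // ^C(d{10})
console.log(broken.exec(input)); // null

// Either double the backslash in the string, or use a regex literal.
const fromString = new RegExp("^C(\\d{10})");
const fromLiteral = /^C(\d{10})/;

const match = fromLiteral.exec(input);
const extracted = match === null ? null : match[1];
console.log(extracted);                 // 1234567890
console.log(fromString.exec(input)[1]); // 1234567890
```

Reading the number from capture group 1 also removes the need for .slice(1).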

Regex to match all occurrences that begin with n characters in sequence

I'm not sure if it's even possible for a regular expression to do this. Let's say I have a list of the following strings:
ATJFH
ABHCNEK
BKDFJEE
NCK
ABH
ABHCNE
KDJEWRT
ABHCN
EGTI
And I want to match all strings that begin with any number of characters for the following string: ABHCNEK
The matches would be:
ABH
ABHCN
ABHCNE
ABHCNEK
I tried things like ^[A][B][H][C][N][E][K] and ^A[B[H[C[N[E[K]]]]]], but I can't seem to get it to work...
Can this be done in regex? If so, what would it be?
The simplest can be
^(?:ABHCNEK|ABHCNE|ABHCN|ABHC|ABH|AB|A)$
See demo.
https://regex101.com/r/eB8xU8/6
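Use this regular expression:
^[ABHCNEK]+$
Note that a character class ignores order, so this only checks that every character occurs somewhere in the target string; it also matches strings such as NCK.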
You haven't said how you want to use it, but one option doesn't require regex. Loop through the various strings and check whether your test string starts with each one:
var strings = ['ATJFH', 'ABHCNEK', 'BKDFJEE', 'NCK', 'ABH', 'ABHCNE', 'KDJEWRT', 'ABHCN', 'EGTI'];
var test = 'ABHCNEK';
for (var i = 0; i < strings.length; i++) {
    // keep only the strings that test starts with
    if (test.indexOf(strings[i]) === 0) {
        console.log(strings[i]);
    }
}
This returns:
ABHCNEK
ABH
ABHCNE
ABHCN
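If a regex is still preferred, the nested form the question was reaching for with ^A[B[H[C[N[E[K]]]]]] can be written with optional non-capturing groups and generated from the target string. This is only a sketch; makePrefixRegExp, escapeRegExp and the minLength parameter are names invented for the example.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern such as ^A(?:B(?:H(?:C(?:N(?:E(?:K)?)?)?)?)?)?$
// that accepts any prefix of `target` with at least `minLength` characters.
function makePrefixRegExp(target, minLength = 1) {
  const chars = Array.from(target).map(escapeRegExp);
  let optionalTail = "";
  // Wrap from the last character inwards so each letter is only
  // allowed when all the letters before it are present.
  for (let i = chars.length - 1; i >= minLength; i--) {
    optionalTail = "(?:" + chars[i] + optionalTail + ")?";
  }
  return new RegExp("^" + chars.slice(0, minLength).join("") + optionalTail + "$");
}

const strings = ["ATJFH", "ABHCNEK", "BKDFJEE", "NCK", "ABH", "ABHCNE", "KDJEWRT", "ABHCN", "EGTI"];
const prefixPattern = makePrefixRegExp("ABHCNEK");
console.log(prefixPattern.source); // ^A(?:B(?:H(?:C(?:N(?:E(?:K)?)?)?)?)?)?$
console.log(strings.filter((s) => prefixPattern.test(s)));
// [ 'ABHCNEK', 'ABH', 'ABHCNE', 'ABHCN' ]
```

Like the alternation shown earlier, this also accepts the shorter prefixes A and AB; raising minLength would exclude them.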

RegExp JS regarding sequential pattern matching

P.S.: I know there is an easy solution to my needs, and I could do it that way, but I am looking for a different solution for the sake of learning and challenge. So this is just about solving an algorithm in a less traditional way.
I am working on solving an algorithm and thought I had everything working, but one use case is failing. That is because I am building a regexp dynamically. My issue is this:
I need to match letters sequentially up until one doesn't match, and then keep only what did match sequentially.
So, let's say I was matching this:
"zaazizz"
with this: /\bz[a]?[z]?/
"zizzi".match(/\bz[z]?[i]?/)
Currently that matches "zi", but the match should only be "z".
The pattern zzi should only match "z" from the front of "zizzi", in that order. I know I am using [z]? etc., so each letter is optional, but what I really need is to match sequentially: I should only get a longer match if, from the front, the input matched zzi in order, per my regex. So I need some sort of lookahead or similar. I tried ?= and != with no luck.
I still think a non-regex approach is best here. Have a look at the following JS code:
var match = "abcdef";
var input = "abcxdef";
var mArray = match.split("");
var inArray = input.split("");
// compare no further than the end of the shorter string
var max = Math.min(mArray.length, inArray.length);
for (var i = 0; i < max; i++) {
    if (mArray[i] != inArray[i]) { break; }
}
var result = input.substring(0, i); // "abc"
Where match is the string to be partially matched, input is the input and input.substring(0, i) is the result of the matching part. And you can change match as often as you like.

How to accept numbers and specific words?

I am validating a clothes size field and want it to accept only numbers and specific "words" like S, M, XL, XXL etc., but I am unsure how to add the words to the pattern. For example, I want it to match something like "2, 5, 23, S, XXXL", which are valid sizes, but not random combinations of letters like "2X3, SLX".
OK, since people are not suggesting regexp solutions, I guess I should say that this is part of a larger validation method which uses regexp. For convenience and code consistency I want to do this with regexp.
Thanks
If they're a known set of values, I am not sure a regex is the best way to do it. But here is one regex that is basically a brute-force match of your values, each with a \b (word boundary) anchor
\b2\b|\b5\b|\b23\b|\bXXXL\b|\bXL\b|\bM\b|\bS\b
Sorry for not giving you a straight answer. regexp might be overkill in your case. A solution without it could, depending on your needs, be more maintainable.
I don't know which language you use so I will just pick one randomly. You could treat it as a piece of pseudo code.
In PHP:
function isValidSize($size) {
    $validSizeTags = array("S", "M", "XL", "XXL");
    $minimumSize = 2;
    $maximumSize = 23;
    if (ctype_digit(strval($size))) { // explanation for not using is_numeric/is_int below
        // numeric sizes must fall inside the allowed range
        return $size >= $minimumSize && $size <= $maximumSize;
    }
    // anything else must be one of the known size tags (strict comparison)
    return in_array($size, $validSizeTags, true);
}
$size = "XL";
$isValid = isValidSize($size); // true
$size = 23;
$isValid = isValidSize($size); // true
$size = "23";
$isValid = isValidSize($size); // true, is_int would return false here
$size = 50;
$isValid = isValidSize($size); // false
$size = 15.5;
$isValid = isValidSize($size); // false, is_numeric would return true here
$size = "diadjsa";
$isValid = isValidSize($size); // false
(The reason for using ctype_digit(strval($size)) instead of is_int or is_numeric is that is_int only returns true for real integers, not for strings like "15", while is_numeric returns true for all numeric values, not just integers. ctype_digit, in turn, returns true for strings consisting only of digits but is unreliable when handed an integer directly. So we convert the value to a string using strval before passing it to ctype_digit. Welcome to the world of PHP.)
With this logic in place you can easily inject validSizeTags, maximumSize and minimumSize from a configuration file or a database where you store all valid sizes for this specific product. That would get much messier using regular expressions.
Here is an example in JavaScript:
var patt = /^(?:\d{1,2}|X{0,3}[SML])$/i;
patt.test("2"); // true
patt.test("23"); // true
patt.test("XXXL"); // true
patt.test("S"); // true
patt.test("SLX"); // false
Use Array Membership Instead of Regular Expressions
Some problems are easier to deal with by using a different approach to representing your data. While regular expressions can be powerful, you might be better off with an array membership test if you are primarily interested in well-defined fixed values. For example, using Ruby:
sizes = %w[2 5 23 S XXXL].map(&:upcase)
size = 'XXXL'
sizes.include? size.to_s.upcase # => true
size = 'XL'
sizes.include? size.to_s.upcase # => false
Seeing as this is harder than I had thought, I am thinking of storing the individual matched values in an array and checking each one against the accepted values. I will use something like
[0-9]+|s|m|l|xl|xxl
and store the matches in the array.
Then I will check each array element against [0-9]+ and s|m|l|xl|xxl, and if it matches either of these, it's valid. Maybe there is a better way, but I can't dwell on this for too long.
Thanks for your help.
This will accept the alternatives one or more times, separated by whitespace or punctuation. It should be easy enough to expand the separator character class if you think you need to.
^([Xx]{0,3}[SsMmLl]|[0-9]+)([ ,:;-]+([Xx]{0,3}[SsMmLl]|[0-9]+))*$
If you can interpolate the accepted pattern into a string before using it as a regex, you can reduce the code duplication.
This is a regular egrep pattern. Regex dialects differ between languages, so you might need to tweak something in order to adapt it to your language of choice (PHP? It's good form to include this information in the question).
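The interpolation idea could look roughly like this in JavaScript. The variable names item, separator and sizeList are chosen for the example; the pattern itself is the one given above, with the item alternatives written once and reused.

```javascript
// One accepted item: a size tag such as S, XL, XXXL, or a plain number.
const item = "(?:[Xx]{0,3}[SsMmLl]|[0-9]+)";
// One or more separator characters between items.
const separator = "[ ,:;-]+";

// item, followed by any number of (separator item) repetitions.
const sizeList = new RegExp("^" + item + "(?:" + separator + item + ")*$");

console.log(sizeList.test("2, 5, 23, S, XXXL")); // true
console.log(sizeList.test("2X3, SLX"));          // false
```

Because the item pattern lives in one place, adding a new tag or tightening the numeric range only requires a single change.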

RegEx to find words with characters

I've found answers to many of my questions here but this time I'm stuck. I've looked at 100's of questions but haven't found an answer that solves my problem so I'm hoping for your help :D
Considering the following list of words:
iris
iridium
initialization
How can I use regex to find words in this list when I am looking using exactly the characters u, i, i? I'm expecting the regex to find "iridium" only because it is the only word in the list that has two i's and one u.
What I've tried
I've been searching both here and elsewhere but haven't come across any that helps me.
[i].*[i].*[u]
matches iridium, as expected, and not iris nor initialization. However, the characters i, i, u must be in that sequence in the word, which may or may not be the case. So trying with a different sequence
[u].*[i].*[i]
This does not match iridium (but I want it to, iridium contains u, i, i) and I'm stuck for what to do to make it match. Any ideas?
I know I could try all sequences (in the example above it would be iiu; iui; uii) but that gets messy when I'm looking for more characters (say 6, tnztii which would match initialization).
[t].*[n].*[z].*[t].*[i].*[i]
[t].*[z].*[n].*[t].*[i].*[i]
[t].*[z].*[n].*[i].*[t].*[i]
..... (long list until)
[i].*[n].*[i].*[t].*[z].*[t] (the first matching sequence)
Is there a way to use regex to find the word, irrespective of the sequence of the characters?
I don't think there's a way to solve this with regular expressions that does not end in a horribly convoluted expression. It might be possible with lookahead and lookbehind expressions, but I think it's probably faster and less messy if you simply solve this programmatically.
Chop the string up by its whitespaces and then iterate over all the words and count the instances your characters appear inside this word. To speed things up, discard all words with a length less than your character number requirement.
Is this an academic exercise, or can you use more than a single regular expression? Is there a language wrapped around this? The simplest way to do what you want is to have a regexp that matches just i or u, and examine (count) the matches. Using python, it could be a one-liner. What are you using?
The part you haven't gotten around to yet is that there might be additional i's or u's in the word. So instead of matching on .*, match on [^iu]*.
Here's what I would do:
Array.prototype.findItemsByChars = function(charGroup) {
    console.log('charGroup:',charGroup);
    charGroup = charGroup.toLowerCase().split('').sort().join('');
    charGroup = charGroup.match(/(.)\1*/g);
    for (var i = 0; i < charGroup.length; i++) {
        charGroup[i] = {char:charGroup[i].substr(0,1),count:charGroup[i].length};
        console.log('{char:'+charGroup[i].char+' ,count:'+charGroup[i].count+'}');
    }
    var matches = [];
    for (var i = 0; i < this.length; i++) {
        var charMatch = 0;
        //console.log('word:',this[i]);
        for (var j = 0; j < charGroup.length; j++) {
            try {
                var count = this[i].match(new RegExp(charGroup[j].char,'g')).length;
                //console.log('\tchar:',charGroup[j].char,'count:',count);
                if (count >= charGroup[j].count) {
                    if (++charMatch == charGroup.length) matches.push(this[i]);
                }
            } catch(e) { break };
        }
    }
    return matches.length ? matches : false;
};
var words = ['iris','iridium','initialization','ulisi'];
var matches = words.findItemsByChars('iui');
console.log('matches:',matches);
EDIT: Let me know if you need any explanation.
I know this is a really old post, but I found this topic really interesting and thought people might look for a similar answer some day.
So the goal is to match all words with a specific set of characters in any order. There is a simple way to do this using lookaheads:
\b(?=(?:[^i\W]*i){2})(?=[^u\W]*u)\w+\b
Here is how it works:
We use one lookahead (?=...) for each letter to be matched.
In it, we put [^x\W]*x, where x is the letter that must be present.
We then make this pattern occur n times, where n is the number of times that x must appear in the word, using (?:...){n}.
The resulting regex for a letter x that has to appear n times in the word is then (?=(?:[^x\W]*x){n}).
All you have to do then is add this pattern for each letter and add \w+ at the end to match the word!
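The recipe above can be turned into a small generator. What follows is a sketch rather than a definitive implementation: makeLetterCountRegExp is a made-up name, it assumes the requested letters are plain word characters (letters, digits or underscore), and, like the hand-written pattern, it requires each letter at least the given number of times rather than exactly.

```javascript
// Build \b(?=(?:[^i\W]*i){2})(?=(?:[^u\W]*u){1})\w+\b style patterns
// from a bag of required letters such as "uii".
function makeLetterCountRegExp(letters) {
  if (!/^\w+$/.test(letters)) {
    throw new TypeError("letters must consist of word characters only");
  }
  // Count how many times each letter is required.
  const counts = new Map();
  for (const ch of letters.toLowerCase()) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  // One lookahead per distinct letter, repeated n times.
  let lookaheads = "";
  for (const [ch, n] of counts) {
    lookaheads += "(?=(?:[^" + ch + "\\W]*" + ch + "){" + n + "})";
  }
  return new RegExp("\\b" + lookaheads + "\\w+\\b", "g");
}

const wordList = "iris iridium initialization";
console.log(wordList.match(makeLetterCountRegExp("uii")));    // [ 'iridium' ]
console.log(wordList.match(makeLetterCountRegExp("tnztii"))); // [ 'initialization' ]
```

The g flag is there so that match returns every matching word in the text instead of only the first.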