Permutation/Combination with license plates - combinations

Question: a California license plate has the form #LLL###, where L is a letter of the alphabet. I know the number of combinations is 10^4 * 26^3 for all possible plates. What if I exclude a certain word, such as "FSS", so that no license plate includes the word "FSS"?
How do I go about this? I can still use the letters, but the three can't appear together. It's throwing me for a loop. Do I use permutations to exclude the repeated word? Any help is appreciated.
EDIT: the # stands for a digit. So from 0-9 there are ten possibilities; sorry I didn't clarify.

There are only so many ways you can have FSS in a string of seven characters.
FSS####
#FSS###
##FSS##
###FSS#
####FSS
So there are five possible positions for the string FSS. If there is no constraint on the four remaining characters and each is a digit, that gives 10,000 different license plates (0000 through 9999) for each position of "FSS".
You would want to subtract 10,000 * 5 from your total to get the plates allowed.
Edit:
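So you want all combinations of 0-9 in the first, fifth, sixth, and seventh positions, and all combinations of A-Z in the second, third and fourth positions, except for the one triple with F in the second, S in the third and S in the fourth, right? If so, the letter block has 26^3 - 1 = 17,575 allowed values, and the count is 10 * (26^3 - 1) * 10^3 = 175,750,000. Note that 10*25*25*25*10*10*10 would be too small, because it also throws away every plate that merely has F in the second position, S in the third, or S in the fourth. Did I get your problem right?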
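A quick brute-force check of the count, as a sketch. It assumes the #LLL### format from the question and that the only forbidden letter block is exactly "FSS"; the variable names are made up for illustration.

```python
from itertools import product
from string import ascii_uppercase

# Every possible three-letter block (positions 2 to 4 of the plate).
letter_triples = ["".join(t) for t in product(ascii_uppercase, repeat=3)]

# Assumption: only the exact block "FSS" is banned.
allowed_triples = [t for t in letter_triples if t != "FSS"]

# One leading digit, the letter block, then three trailing digits.
total_plates = 10 * len(letter_triples) * 10**3
allowed_plates = 10 * len(allowed_triples) * 10**3

print(total_plates, allowed_plates)
```

Only the letter block is enumerated, since the four digits are unconstrained and just contribute a factor of 10^4.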

Related

Filtering 4 capital letters

I have a couple of cells that all vary in their content, but one thing that is the same across all is that they all contain a string of 4 letters which are all capital.
Is there some way for me to only show those 4 capital letters? (through a formula)?
I'm fairly flexible in the solution here, it could either be done in the cell itself or in another cell that references the cell in question.
try:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "[A-Z]{4}")))

Regex (Python) data extraction - overlapping or incomplete results

I'm trying to extract data from some WHO codebooks that I've converted from PDF to text with the Python slate library.
The text I want to match starts with 2 digits, a dash, and 2 digits, followed by some text, and ends with "Q" plus 1 or 2 digits and then again "Q" plus 1 or 2 digits:
17-17How old are you?Q1Q1
31-31During the past 30 days, how many times per day did you usually eat fruit, such as bananas, apples, oranges, dates, or any other fruits?Q7Q11
Sometimes those phrases end with a blank, and sometimes the next question starts immediately (here are three questions; observe Q4Q424-29 and Q5Q530-30):
20-23How tall are you without your shoes on? (Note: Data are in meters.)Q4Q424-29How much do you weigh without your shoes on? (Note: Data are in kilograms.)Q5Q530-30During the past 30 days, how often did you go hungry because there was not enough food in your home?Q6Q7
With
\d{2}-\d{2}[a-zA-Z0-9 .()?:,]+Q\d{1,2}Q\d(\d)*?
I get pretty close, but I'm missing the second digit when the second "Q" has two digits.
I've tried to add a negative lookahead
\d{2}-\d{2}[a-zA-Z0-9 .()?:,]+Q\d{1,2}Q\d((\d)(?!\d\d-))
to exclude the start of the pattern with two digits and a dash.
\d{2}-\d{2}[a-zA-Z0-9 .()?:,]+Q\d{1,2}Q\d{1,2}
includes the second digit of the "Q" but generates overlapping results e.g. at Q4Q424-29 where the first string ends with Q4Q42 and the second string starts with 4-29.
The regex with parts of the original sample text is here: https://regex101.com/r/d9Dlga/2/
Any suggestions on how to extract the correct strings, like:
17-17How old are you?Q1Q1
20-23How tall are you without your shoes on? (Note: Data are in meters.)Q4Q4
24-29How much do you weigh without your shoes on? (Note: Data are in kilograms.)Q5Q5
31-31During the past 30 days, how many times per day did you usually eat fruit, such as bananas, apples, oranges, dates, or any other fruits?Q7Q11
Thanks!
I see the problem now. New attempt that I think works:
\d{2}-\d{2}.+?Q\d{1,2}Q\d{1,2}(?!\d-\d{2})
I put a negative lookahead at the end to test if a new section has begun.
9 matches
Correctly grabs the full 2-digit endings
The following pattern should work:
\d{2}-\d{2}[a-zA-Z0-9 .()?:,]+Q\d{1,2}Q\d(\d(?!\d-))?
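To see how the two suggested patterns behave on the run-together sample from the question, here is a small sketch using Python's `re` module. The variable names are illustrative only, and the sample text is the three-question string quoted above.

```python
import re

# The three run-together questions from the question text.
text = (
    "20-23How tall are you without your shoes on? (Note: Data are in meters.)Q4Q4"
    "24-29How much do you weigh without your shoes on? (Note: Data are in kilograms.)Q5Q5"
    "30-30During the past 30 days, how often did you go hungry because there was "
    "not enough food in your home?Q6Q7"
)

# Pattern from the first answer: lazy body plus a negative lookahead that
# rejects a final digit belonging to the next "dd-dd" prefix.
lazy = re.compile(r"\d{2}-\d{2}.+?Q\d{1,2}Q\d{1,2}(?!\d-\d{2})")

# Pattern from the second answer: the optional second digit is only taken
# when it is not followed by "digit, dash".
strict = re.compile(r"\d{2}-\d{2}[a-zA-Z0-9 .()?:,]+Q\d{1,2}Q\d(\d(?!\d-))?")

# finditer + group(0) returns whole matches even when the pattern has groups.
lazy_matches = [m.group(0) for m in lazy.finditer(text)]
strict_matches = [m.group(0) for m in strict.finditer(text)]

for match in lazy_matches:
    print(match)
```

On this sample both patterns split the text into the same three records. A string such as Q7Q112-15 stays inherently ambiguous, because nothing in the text says whether it means Q11 followed by 2-15 or Q1 followed by 12-15.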

String Finding Alg w/ Lowest Freq Char

I have 3 text files. One with a set of text to be searched through
(ex. ABCDEAABBCCDDAABC)
One contains a number of patterns to search for in the text
(ex. AB, EA, CC)
And the last containing the frequency of each character
(ex.
A 4
B 4
C 4
D 3
E 1
)
I am trying to write an algorithm that finds the least frequently occurring character for each pattern, searches the text for occurrences of that character, and then checks the surrounding letters to see if the pattern matches. Currently, I have the characters and frequencies in their own vectors. (So i=0 in the two vectors would give A and 4, respectively.)
Is there a better way to do this? Maybe a faster data structure? Also, what are some efficient ways to check the pattern string against the piece of the text string once the least frequent letter is found?
You can run the Aho-Corasick algorithm. Its complexity (once the preprocessing - whose complexity is unrelated to the text - is done), is Θ(n + p), where
n is the length of the text
p is the total number of matches found
This is essentially optimal. There is no point in trying to skip over letters that appear to be frequent:
If the letter is not part of a match, the algorithm takes unit time.
If the letter is part of a match, then the match includes all letters, irrespective of their frequency in the text.
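For reference, a compact sketch of an Aho-Corasick style matcher in Python, run on the example text and patterns from the question. This is a minimal illustration, not a tuned implementation; the function names `build_automaton` and `find_all` are made up here.

```python
from collections import deque

def build_automaton(patterns):
    """Build trie transitions, failure links and per-state output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper ones are derived.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, ordered by end position."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

hits = find_all("ABCDEAABBCCDDAABC", ["AB", "EA", "CC"])
print(hits)
```

The text is scanned once from left to right, and the character frequency file is not needed at all.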
You could run a loop that keeps a count of each character's occurrences and checks whether a character has appeared more than a threshold percentage of the time, based on the number of distinct characters searched for and the total length of the string. For example, if you have 100 characters and 5 possible characters, any character that has appeared more than 20% of the time can be discounted, which saves work by passing over every occurrence of that character.

Calculate max even lengths of a string to be split

I know what I want but I have no idea if there's a technical name for this or how to go about calculating it.
Suppose I have a string:
ABCDEFGHI
This string can be split evenly into a "MAXIMUM" of 3 characters each sub-string.
Basically, I'm trying to find out how to split a string evenly into its maximum sub-lengths. A string of 25 characters can be evenly split into 5 parts consisting of 5 characters each. If I tried to split it into 4 or 6 pieces, I'd have an odd length string (a string with a size different from the others).
A string of 9 characters can be split into only 3 pieces of 3 characters each.
A string of 10 characters can be split into only 2 pieces of 5 characters each.
A string of 25 characters can be split into only 5 pieces of 5 characters each.
A string of 15 characters can be split into 3 pieces of 5 characters each OR 5 pieces of 3 characters each.
A string of 11 characters cannot be split because one string will always be larger than the other.
So how do I go about doing this? I've thought of using the square root but that doesn't work for a string of "10" characters. Works for 9 and 25 just fine.
Any ideas? Is there a technical name for such a thing? I thought of "Greatest Common Divisor", but I'm not so sure.
Well... let me see... I think that if I got it right, you want to verify if a certain number (the length of your string) is prime or not :)
A first idea would be this:
1) get length of string, make a loop where you divide the length of the string by all numbers from 2 up to (length of string/2) [you will need to check if this (length of string/2) is a whole number too, and adjust it if not ;)]
2) if at least ONE number divides it, bam. (You check this one by verifying if there is a remainder after the division)
3) if not, you got yourself a prime. Sorry, no even division.
Of course that approach would not be very fast for very long strings... just an idea.
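The loop described in the steps above can be sketched in Python as follows. The function name `even_splits` is made up for illustration, and "even split" is taken to mean at least two pieces of at least two characters each.

```python
def even_splits(length):
    """Return every (pieces, piece_length) pair that splits `length` evenly.

    Trial division from 2 up to length // 2, as in the steps above.
    Integer division already rounds down when length is odd.
    """
    return [
        (divisor, length // divisor)
        for divisor in range(2, length // 2 + 1)
        if length % divisor == 0  # no remainder means an even split
    ]

for n in (9, 11, 15, 25):
    splits = even_splits(n)
    print(n, splits if splits else "prime length, no even split")
```

An empty result means the length is prime. For a length of 15 the list contains both options mentioned in the question, 3 pieces of 5 and 5 pieces of 3.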
This is just a question of prime versus composite numbers, a basic math concept. The algorithm you need is a primality test.

Word lexical families

I am given a set of N words, and an integer K. 2 words are in the same group if they have exactly the first k letters and the last k letters identical. If they have more than k letters identical or less than k letters identical then the words are not in the same group. For example:
For k=3.
"abcdefg" and "abczefg" are in the same group
"abcddefg" and "abcdzefg" are not in the same group (the first k+1 letters are identical)
"abc" and "abc" are in the same group
A word can be in more than one group. For example (k=3):
"abczefg" and "abcefg" form a group
"abczaefg" and "abcefg" form a group
"abczaefg" and "abczefg" are not in the same group (the first k+1 letters are identical)
The problem asks me to find the number of groups which contain the maximum number of words.
I thought about using a Trie (or Prefix Tree) and I assume this is the right data structure for this problem, but I don't know how to adapt it, because the rule that two words with more than k identical letters are not in the same group confuses me. My idea has complexity O(N*N*K), and considering that N<=10,000 and K<=100 I don't think this idea is fast enough. I would like to explain my idea, but it is not yet clear even to me and I don't even know if it is correct, so I will skip this part.
My question is whether there is a way to solve this problem using a faster algorithm, and if there is such an algorithm, I kindly ask you to explain it a little. Thank you in advance, and I am sorry for the grammatical mistakes and if I didn't explain the problem clearly!
First group all the words that share the first k letters and last k letters. Your largest group must sit inside one of these groups, since there's no way two words that differ at their starts and ends can be in the same solution.
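This first grouping step could look roughly like the following Python sketch. The function name `group_by_ends` is made up, k is assumed to be at least 1, and words shorter than k letters are skipped on the assumption that they cannot share k letters with anything.

```python
from collections import defaultdict

def group_by_ends(words, k):
    """Bucket words by their (first k letters, last k letters) pair."""
    groups = defaultdict(list)
    for word in words:
        if len(word) >= k:  # assumption: shorter words belong to no group
            groups[(word[:k], word[-k:])].append(word)
    return dict(groups)

buckets = group_by_ends(["abcdefg", "abczefg", "abcefg", "xyzabc"], 3)
print(buckets)
```

Each bucket is then handled separately, which costs one pass over the words and O(K) work per word to build the key.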
So, within each of these groups (of words that share the same k letters at their start and end), you need to find a maximal set of words such that no two share the k+1'th letter, nor the k+1'th letter from the end.
Construct a graph where vertices are the pairs of letters that are (k+1) from each end (de-duping) from words in one of these groups, and edges occur between (a, b) and (c, d) if a=c or b=d.
You need to find a largest set of vertices in this graph with no edges between them. This reduced problem is an instance of the "maximum independent set" problem, which is NP-hard in general, so you'll need to solve it by using a search and hoping the set of words you're given isn't too nasty. Perhaps there's something about the graphs here to give a faster solution, but I don't see it.
The solution to the entire problem is the largest solution to one of the reduced problems described above.
Hope this helps!