Permutations with both repeating and non-repeating elements, partially figured out formula - combinations

I'm trying to find the formula to calculate the amount of permutations for x # of elements that can repeat, y # of elements that cannot repeat, and z # of spaces/slots. So far I figured out the parts for when only x elements are included: x^z, and when all elements are included but none act as repeating: ((x+y)!)/(((x+y)-z)!), which are basic derivations of known combinations and permutations formulas. These added together + the cases where repeatable and non-repeatable elements are both present, and repeatable elements are repeating at least once would be the total number of permutations, but I can't seem to narrow down the formula for that part. Any guidance would be apprecaited.

Related

finding the most repeated number in an array C++

Have to make program, where you input number, and program outputs most repeated digit in it, can't figure out have to do it. Tried some things, for me it works with static array, but I need dynamic, so I dont now what to do.
Can someone help me?
make an array in size of 10 (the number of digits)
run over your original array and extract from each number all its digit, increase the value of your digit array for each digit you find.
-find the maximal digit using the sight array.
You can upload a code if you want more help
I assume you are served in real time digits [0-9] and you need a function that at any given time returns you the most frequent digit seen so far. A simplest solution would be just to have a hash map of [0-9] keys that maintains the number of times every digit is seen. When you need the most frequent digit you iterate over the 10 keys and return the one with the biggest count.

Match all numbers present in two arrays efficiently [Python]

I want to match numbers that are present in two arrays (not of equal length) and output them to another array if there is a match. The numbers are floating point.
Currently I have a working program in Python but it is slow when I run it for large datasets. What I've done is two nested for loops.
The first nested for loop runs through array1 and checks if any numbers from array2 are in array 1. If there is a match, I write it to an array called arrayMatch1.
I then check array2 and see if there is a match with arrayMatch1. And output the final result to arrayFinal.
arrayFinal will have all numbers that exist within both array1, array2.
My problem:
Two nested for loops give me a complexity of O(n^2). This method works fine for data sets under an array length of 25000 but slows down significant if greater. How can I make it more efficient. The numbers are floating point and always are in this format ######.###
I want to speed up my program but keep using Python because of the simplicity. Are there better ways to find matches between two arrays?
Why not just find the interesection of two lists?
a = [1,2,3,4.3,5.7,9,11,15]
b = [4.3,5.7,6.3,7.9,8.1]
def intersect(a, b):
return list(set(a) & set(b))
print intersect(a, b)
Output:
[5.7, 4.3]
Gotten from this question.
So what you're basically trying to do is find intersection(logically correct term) of 2 list.
First you need to eliminate the duplicate form the list itself, set is great way to do that, then you can just & those lists and you will be good to go.
a = [23.3213,23.123,43.213,12.234] #List First
b = [12.234,23.345,34.224] #List Second
def intersect(a, b):
return list(set(a) & set(b))
print intersect(a, b)

Convert a regulation expression to DFA

I have been trying different ways to solve this problem for over an hour and am getting very frustrated.
The problem is: Give regular expressions and DFAs for each of the following languages over Sigma = {0,1}.
a). {w ∈ Σ* | w contains an even number of 0s or an odd number of 1s}
If anyone could provide hints or get me started on figuring this one out, it would be very appreciated!
I know it is something along the lines of this DFA but this one is for
{w ∈ Σ* | w contains an even number of 0s or exactly two 1's}
so it's a bit different but I can't figure it out.
You can see it as follows: you always have to remember two things:
whether the number of 0s is even or odd; and
whether the number of 1s is even or odd.
Now if we denote even with e and odd with o, we consider four states: ee (both even), eo (even number of 0s and odd number of 1s), oe and oo.
Now when we read a zero (0), we simply swap the first state token, so it means we introduce transitions from:
ee - 0 -> oe;
eo - 0 -> oo;
oe - 0 -> ee; and
oo - 0 -> eo.
The same for ones (1):
ee - 1 -> eo;
eo - 1 -> ee;
oe - 1 -> oo; and
oo - 1 -> oe.
Now we only need to determine the initial state and the accepting state(s). The intial state is ee, since at that moment we have considered no zeros and no ones.
Furthermore the accepting state can by determined by the condition:
w contains an even number of 0s or an odd number of 1s
So that means the accepting states are ee, eo and oo. A drawing of this DFA is shown below:
There exists an algorithmic way to convert a DFA into an equivalent regular expression as is stated here.
You can construct a regular expression by splitting the problem into two easier problems:
a regex that checks if the number of 0s is even; and
a regex that checks if the number of 1s is odd.
For the first, you can use the regex:
(1*01*0)*1*
Indeed: you first have a group (1*01*0). This group ensures that there are two zeros, and 1s can appear everywhere in between. We allow an arbitrary number of repetitions, since the number always remains even. The regex ends with 1* since it is still possible that there are additional ones in the string.
The second problem can be solved with the regex:
0*1(0*10*1)*0*
The solution is more or less the same. The expression between the brackets: (0*10*1) ensures that the ones occur evenly. By adding a 1 in front, we ensure the number of 1s is odd.
A regular expression that then solves the problem is:
(1*01*0)*1*|0*1(0*10*1)*0*
Since the "pipe" (|) means "or".
Think about what possible states you can ever be in.
A number contains either an even number of 0's or an odd number of 0's. (2 possible states)
A number contains either an even number of 1's or an odd number of 1's. (2 possible states)
Now let's look at what combinations are accepted by your language:
even 0's, even 1's: accept
even 0's, odd 1's: accept
odd 0's, even 1's: reject
odd 0's, odd 1's: accept
As a result, your DFA will need 4 states, of which 3 are accept states and 1 is a reject state. Every state will have 2 transitions leading to a different state. Since the empty string has an even number of 0's and an even number of 1's, the first state will be the initial state.
For making this into a regular expression: think about how you'd match an even number of 0's, then how you'd match an odd number of 1's. The language is just the union of these two.
Alternatively, as suggested by Willem, you can use an algorithm to convert any NFA to a regular expression. It has the advantage of being very general, but it's also more technical. Either way, it should lead to an equivalent regular expression.
What does a number with an even number of 0's look like? It might start with any number of 1's, but when we do find a 0 we better find another one! There can be any number of 1's in between, but we only care about the 0's. Thus, we come up with the following regular expression:
1*(01*01*)*
You should be able to apply a similar logic to match an odd number of 1's. Finally, OR the two expressions to get the requested regular expression.

Constructing palindrome from a list of words

Recently I was looking through some interview questions, and found some interesting one:
You are given a list of word. Find if two words can be joined to-gather to form a palindrome. eg Consider a list {bat, tab, cat} Then bat and tab can be joined to gather to form a palindrome.
Expecting a O(nk) solution where n = number of works and k is length
There can be multiple pairs, just return true if found one.
Also, in the comments one of the approaches was this:
1) Add the first word to the trie ( A B)
2) Take the second word (D E E D B A) and reverse it (A B D E E D)
3) See how many letters in the reversed word you can match in the trie (the first 2)
4) Take the rest of the string (D E E D) see if it is a palindrome if it is you are done return true
5) add the second word to the trie (D E E D B A)
6) go back to step 2 with the next word
7) when out of words return false
But in my opinion this is not an O(nk) solution.
Can anyone suggest a solution?? Or explain why the algorithm described above is O(nk)??
The algorithms is correct, or at least it gets quite close. There are minor technical issues. In step 4. one should save the proposition of a solution if it's better than the current one, and in step 7. return it, or say it was impossible to make a palindrome.
The main idea is to process words into cores and prefixes. If a core is a palindrome, then we need to match the prefix with other word. Trie serves as a "database" for processed strings, so with each new word, one can check all possible extensions. If words were kept separately one would need to compare prefixes of each word separately.
(Edit: I think there still is a small loophole, in case there are two words in a trie which starts the same, and the incoming one would make a palindrome with the shorter one, but not the longer, but I won't go into details. Handling it would complicate the algo but wouldn't affect complexity.)
It also is O(n*k). Adding and checking a prefix vs a trie takes number of steps proportional to the number of characters. So in this case this is bound by k. Just like tree operations are O(h) where h is the height of the tree. So in conclusion:
k steps.
takes k steps.
also takes at most k steps.
also takes less than k steps but we can bound it by k.
also takes k steps.
Steps 2 to 5 are done n-1 times.
Of course each step has a different dominant operation, so it is hard to specify the exact constant, but all of them are bound by k so the complexity is O(c*(n-1)*k) which essentially is O(n*k).
There's a really interesting discussion of this in an article from Dr. Dobbs, way back in 2004. The full explanation is a little long, but the general idea is:
Suppose you start with Lion, where the pivot is left of the actual word. I can calculate the center of the string, which is position two. The pivot is at zero, so the string is too heavy on the right, but at the moment, Lion qualifies as a partial palindrome. The "dot" at the pivot point matches the dot at the pivot point, so there is at least one correct character, albeit the same character. You now wish to prepend words that end with noil, attempting to convert the string to noil.Lion. I use to mean any string of characters. If you're successful, then you need to locate words starting with so that they can be appended to the string.
Note that he defines a partial palindrome as:
A string is a partial palindrome if, working from the pivot point outwards, either the left or right end of the string is encountered before a mismatch occurs.

Simple spell checking algorithm

I've been tasked with creating a simple spell checker for an assignment but have given next to no guidance so was wondering if anyone could help me out. I'm not after someone to do the assignment for me, but any direction or help with the algorithm would be awesome! If what I'm asking is not within the guildlines of the site then I'm sorry and I'll look elsewhere. :)
The project loads correctly spelled lower case words and then needs to make spelling suggestions based on two criteria:
One letter difference (either added or subtracted to get the word the same as a word in the dictionary). For example 'stack' would be a suggestion for 'staick' and 'cool' would be a suggestion for 'coo'.
One letter substitution. So for example, 'bad' would be a suggestion for 'bod'.
So, just to make sure I've explained properly.. I might load in the words [hello, goodbye, fantastic, good, god] and then the suggestions for the (incorrectly spelled) word 'godd' would be [good, god].
Speed is my main consideration here so while I think I know a way to get this work, I'm really not too sure about how efficient it'll be. The way I'm thinking of doing it is to create a map<string, vector<string>> and then for each correctly spelled word that's loaded in, add the correctly spelled work in as a key in the map and the populate the vector to be all the possible 'wrong' permutations of that word.
Then, when I want to look up a word, I'll look through every vector in the map to see if that word is a permutation of one of the correctly spelled word. If it is, I'll add the key as a spelling suggestion.
This seems like it would take up HEAPS of memory though, cause there would surely be thousands of permutations for each word? It also seems like it'd be very very slow if my initial dictionary of correctly spelled words was large?
I was thinking that maybe I could cut down time a bit by only looking in the keys that are similar to the word I'm looking at. But then again, if they're similar in some way then it probably means that the key will be a suggestion meaning I don't need all those permutations!
So yeah, I'm a bit stumped about which direction I should look in. I'd really appreciate any help as I really am not sure how to estimate the speed of the different ways of doing things (we haven't been taught this at all in class).
The simpler way to solve the problem is indeed a precomputed map [bad word] -> [suggestions].
The problem is that while the removal of a letter creates few "bad words", for the addition or substitution you have many candidates.
So I would suggest another solution ;)
Note: the edit distance you are describing is called the Levenshtein Distance
The solution is described in incremental step, normally the search speed should continuously improve at each idea and I have tried to organize them with the simpler ideas (in term of implementation) first. Feel free to stop whenever you're comfortable with the results.
0. Preliminary
Implement the Levenshtein Distance algorithm
Store the dictionnary in a sorted sequence (std::set for example, though a sorted std::deque or std::vector would be better performance wise)
Keys Points:
The Levenshtein Distance compututation uses an array, at each step the next row is computed solely with the previous row
The minimum distance in a row is always superior (or equal) to the minimum in the previous row
The latter property allow a short-circuit implementation: if you want to limit yourself to 2 errors (treshold), then whenever the minimum of the current row is superior to 2, you can abandon the computation. A simple strategy is to return the treshold + 1 as the distance.
1. First Tentative
Let's begin simple.
We'll implement a linear scan: for each word we compute the distance (short-circuited) and we list those words which achieved the smaller distance so far.
It works very well on smallish dictionaries.
2. Improving the data structure
The levenshtein distance is at least equal to the difference of length.
By using as a key the couple (length, word) instead of just word, you can restrict your search to the range of length [length - edit, length + edit] and greatly reduce the search space.
3. Prefixes and pruning
To improve on this, we can remark than when we build the distance matrix, row by row, one world is entirely scanned (the word we look for) but the other (the referent) is not: we only use one letter for each row.
This very important property means that for two referents that share the same initial sequence (prefix), then the first rows of the matrix will be identical.
Remember that I ask you to store the dictionnary sorted ? It means that words that share the same prefix are adjacent.
Suppose that you are checking your word against cartoon and at car you realize it does not work (the distance is already too long), then any word beginning by car won't work either, you can skip words as long as they begin by car.
The skip itself can be done either linearly or with a search (find the first word that has a higher prefix than car):
linear works best if the prefix is long (few words to skip)
binary search works best for short prefix (many words to skip)
How long is "long" depends on your dictionary and you'll have to measure. I would go with the binary search to begin with.
Note: the length partitioning works against the prefix partitioning, but it prunes much more of the search space
4. Prefixes and re-use
Now, we'll also try to re-use the computation as much as possible (and not just the "it does not work" result)
Suppose that you have two words:
cartoon
carwash
You first compute the matrix, row by row, for cartoon. Then when reading carwash you need to determine the length of the common prefix (here car) and you can keep the first 4 rows of the matrix (corresponding to void, c, a, r).
Therefore, when begin to computing carwash, you in fact begin iterating at w.
To do this, simply use an array allocated straight at the beginning of your search, and make it large enough to accommodate the larger reference (you should know what is the largest length in your dictionary).
5. Using a "better" data structure
To have an easier time working with prefixes, you could use a Trie or a Patricia Tree to store the dictionary. However it's not a STL data structure and you would need to augment it to store in each subtree the range of words length that are stored so you'll have to make your own implementation. It's not as easy as it seems because there are memory explosion issues which can kill locality.
This is a last resort option. It's costly to implement.
You should have a look at this explanation of Peter Norvig on how to write a spelling corrector .
How to write a spelling corrector
EveryThing is well explain in this article, as an example the python code for the spell checker looks like this :
import re, collections
def words(text): return re.findall('[a-z]+', text.lower())
def train(features):
model = collections.defaultdict(lambda: 1)
for f in features:
model[f] += 1
return model
NWORDS = train(words(file('big.txt').read()))
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def edits1(word):
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
deletes = [a + b[1:] for a, b in splits if b]
transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
inserts = [a + c + b for a, b in splits for c in alphabet]
return set(deletes + transposes + replaces + inserts)
def known_edits2(word):
return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)
def known(words): return set(w for w in words if w in NWORDS)
def correct(word):
candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
return max(candidates, key=NWORDS.get)
Hope you can find what you need on Peter Norvig website.
for spell checker many data structures would be useful for example BK-Tree. Check Damn Cool Algorithms, Part 1: BK-Trees I have done implementation for the same here
My Earlier code link may be misleading, this one is correct for spelling corrector.
off the top of my head, you could split up your suggestions based on length and build a tree structure where children are longer variations of the shorter parent.
should be quite fast but i'm not sure about the code itself, i'm not very well-versed in c++