Terminology: Is this a combination, permutation, or something else?

I'm writing an article and want to discuss how adding another dimension can greatly increase the number of possible values, and am looking for the correct term for something. For example, if t-shirts come in 7 sizes and 5 colors then there are 35 possible ?'s. And if we add collar style as a variable with 2 possible values, then we double the number of ?'s to 70. What is the correct term for this? I first thought of combinations and permutations, but their mathematical definitions at https://en.wikipedia.org/wiki/Combination and https://en.wikipedia.org/wiki/Permutation do not match this situation. Both refer to selections from a single pool of values; in my case I have separate pools of values for each position in the result. Is there a correct term for this?

The correct combinatorial term is the Cartesian product: the set of all tuples (ordered pairs, triples, etc.) formed by taking one value from each pool.

Related

Generating "unique" matrices

This may be more of a math question than a programming question, but since I am specifically working in C++, I figured maybe there was a library or something I didn't know about.
Anyway, I'm working on a game where I'm generating some X by X arrays of booleans and randomly assigning Y of them to be true. Think Tetris-block kind of stuff. What I need to know is whether there's a clever way to generate "unique" arrays without having to rotate the array 4 times and compare each time. To use Tetris as an example again: an "L" piece is an "L" piece no matter how it's rotated, but a "J" piece would be a different, unique piece. As a side question, is there a way to determine the maximum number of unique possible configurations for an X by X array with Y filled-in elements?
You could sum (x-X/2)^2 + (y-X/2)^2 for each (x,y) true grid element. This effectively gives the squared distances from the centre of your grid to each "true" cell. Two grids that are the same when rotated share the property that their "true" cells are all the same distances from the centre, so this sum will also be the same. If the grids all have unique sums of squares, they are unique under rotation.
Note that although unique sums guarantee no rotational duplicates, the converse isn't true; two non-matching grids can have the same sum of squares.
If your grids are quite small and you are struggling to maximize the number of different patterns, you'll probably want to test those with equal sums. Otherwise, if your generator spits out a grid with a sum of squares that matches a previously created grid, reject it.
What you can do is define a basic (canonical) form: somehow uniquely decide which of the 4 possible orientations is the basic one, and then compare grids via their basic forms only.
How do you decide which form is the basic one? It doesn't really matter, as long as it is consistent. Say, pick the highest one according to lexicographical comparison.
Edit:
About the number of unique shapes: roughly speaking, it is the binomial coefficient C(n^2, k) divided by 4. This doesn't account for symmetrical shapes that are preserved by a 180° rotation, but there are only a few such shapes in comparison (at least for large n, k).
Side note: you should also consider the case of shapes that differ only by a shift (translation).

Algorithm Design: Best Way to Represent a 2D Grid, with Boundary Digits, in C++?

I like working on algorithms in my spare time to improve my algorithm design skills, and I tackle Jane Street's monthly puzzles as my 'monthly challenge'. I previously developed algorithms to solve their October puzzle.
I solved their November puzzle (Hooks #6) by hand, but only because I'm not sure how to solve it (and future puzzles) computationally when they involve a grid with a numbered border. I'm not sure how I'd go about setting the foundation for this type of problem.
For instance, many of their problems involve a 2D grid with numbers on the border of the grid. Furthermore, a recurring theme is that whatever is in the grid must meet multiple conditions that involve looking at that number from different sides of the grid. For example, if I have the following 2 by 2 grid, with 4 numbers outside its boundaries,
_ _
5| | 45
5|_ _| 15
Place four numbers in the grid such that, when you
look at the grid from the left, at least one number
in that row is the border number.
In the case of the top left of the 2 by 2 grid,
looking at it from the left means the number 5 must be in either (0,0) or (0,1).
In addition, when looking at that row from the right, the product
of the numbers in the row must equal the boundary number on the right.
In the case of the top right of the 2 by 2 grid,
looking at it from the right means the number 9 must be in either (0,0)
or (0,1), as 9 * 5 = 45.
Hence, the first row in the 2 by 2 grid can either be 5 and 9, or 9 and 5.
One of the solutions for this problem, by hand, is
(0,0) = 5, (0,1) = 9, (1,0) = 5, (1,1) = 3
but how can I go about this computationally?
How can I go about translating these grid-like problems with differing conditions based on the position one "looks" at the grid into code?
Thanks!
I'm not convinced these puzzles were meant to be solved via code. They seem idiosyncratic and complicated enough that coding them would be time-consuming.
That said, the November puzzle in particular seems to have rather limited options for "fixing" a number placement. I would consider a backtracking algorithm that keeps a complete board state and has ready methods that evaluate if a particular row or column is not breaking a rule, as well as the "free square" rule.
Then try each possible placement of the numbers given by the black indicators, ordered -- there aren't that many, considering they concatenate squares -- and call the evaluation on the affected rows and columns. Given the constraints, I think wrong branches would likely terminate quickly.
It seems to me it's more or less the best we can do since there don't seem to be clear heuristics to indicate a branch is more likely to succeed.
If you are looking for a data structure to represent one filled grid, I would recommend a struct row containing numbers left and right and a std::vector of numbers. A grid would be a vector of rows. You can write methods that allow you to pass it functions that check conditions on the rows.
But solving these problems in a generic way seems complicated to me. Different rules can mean very different approaches to solving them. Of course, if the instances are always this small, one can probably just try all (reasonable) possible fillings of the grid, but this will become infeasible very fast.
You can maybe implement somewhat generic algorithms if there are rules that are similar to each other. For example, requiring a fixed value for the sum of all numbers in a row is a very similar problem to requiring a fixed value for the product.
But without constraining the possible rules and finding some similarities in them, you will have to write specific solver code for each and every rule.

Find the number of all possible combinations with conflicts

I am trying to solve an optimization problem, but first I have to find the number of all possible combinations of n elements but considering some conflicts. A possible example could be:
elements: {1,2,3,4}
conflicts: {1,2},{3,4}
The term "conflict" means that numbers belonging to the same conflict set must not appear together in the same combination. The conflict sets are not always disjoint, and each conflict set always contains exactly two elements.
Until now I have only found how to calculate the number of all possible combinations without conflicts, which is 2^n.
Thank you.
The conflict sets can be modeled as edges in a graph. You are asking for the number of independent vertex sets in that graph:
An independent vertex set of a graph G is a subset of the vertices such that no two vertices in the subset represent an edge of G
- http://mathworld.wolfram.com/IndependentVertexSet.html
The above link also refers to something called the independence polynomial which can be used to count such things -- though this is useful only if the conflict graph has a nice structure. The general problem of determining the number of independent sets is known to be #P-complete (see https://en.wikipedia.org/wiki/Sharp-P-complete for a definition of this complexity class) so there is little chance that your question has a simple answer. Markov-chain techniques have been applied to approximate this number in some cases. See http://www.researchgate.net/publication/221590282_Approximately_Counting_Up_To_Four_(Extended_Abstract)

Given 200 strings, what is a good way to key a LUT of relationship values

I've got 200 strings. Each string has a relationship (measured by a float between 0 and 1) with every other string. This relationship is two-way; that is, relationship A/B == relationship B/A. This yields n(n-1)/2 relationships, or 19,900.
What I want to do is store these relationships in a lookup table so that given any two words I can quickly find the relationship value.
I'm using C++, so I'd probably use a std::map to store the LUT. The question is: what's the best key to use for this purpose?
The key needs to be unique and needs to be able to be calculated quickly from both words.
My approach is going to be to create a unique identifier for each word pair. For example given the words "apple" and "orange" then I combine them together as "appleorange" (alphabetical order, smallest first) and use that as the key value.
Is this a good solution, or can someone suggest something cleverer? :)
Basically you are describing a function of two parameters with the added property that order of parameters is not significant.
Your approach will work as long as concatenation cannot produce ambiguous keys (I would suggest putting a comma or similar separator between the two words to remove possible ambiguities, so that, say, "ab"+"cde" and "abc"+"de" don't both produce "abcde"). Any 2D array would also work.
I would probably convert each keyword to some unique identifier (using a simple map) before trying to find the relationship value, but it does not change much from what you are proposing.
If Boost/TR1 is acceptable, I would go for an unordered_map with the pair of strings as the key. The main question would then be: what about the order of the strings? This could be handled by the hash function, which could always start with the lexicographically first string.
Remark: this is just a suggestion after reading the design-issue, not a study.
How "quickly" is quickly? Given you don't care about the order of the two words, you could try a map like this:
std::map<std::set<std::string>, double> lut;
Here the key is a set of the two words, so if you insert "apple" and "orange", then the order is the same as "orange" "apple", and given set supports the less than operator, it can function as a key in a map. NOTE: I intentionally did not use a pair for a key, given the order matters there...
I'd start with something fairly basic like this, profile and see how fast/slow the lookups etc. are before seeing if you need to do anything smarter...
If you create a sorted array with the 200 strings, then you can binary search it to find the matching indices of the two strings, then use those two indices in a 2D array to find the relationship value.
If your 200 strings are in an array, your 20,100 similarity values can be in a one dimensional array too. It's all down to how you index into that array. Say x and y are the indexes of the strings you want the similarity for. Swap x and y if necessary so that y>=x, then look at entry i= x + y(y+1)/2 in the large array.
(x,y) of (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),(0,3),(1,3)... will take you to entry 0,1,2,3,4,5,6,7...
So this uses space optimally and it gives faster look up than a map would. I'm assuming efficiency is at least mildly important to you since you are using C++!
[If you're not interested in self-similarity values where y=x, use i = x + y(y-1)/2 instead.]

How to compute multiple related Levenshtein distances?

Given two strings of equal length, Levenshtein distance allows you to find the minimum number of transformations necessary to turn the first string into the second. However, I'd like to find a way to adjust the algorithm for multiple pairs of strings, given that they were all generated in the same way.
Reading the comments, it appears that this is the problem:
You are given a set of pairs of strings, all the same length and each pair is the input to some function paired with the output from the function. So, for the pair A,B, we know that f(A)=B. The goal is to reverse engineer f() with a large set of A,B pairs.
Using Levenshtein distance on the entire set will, at most, tell you the maximum number of transformations that must take place.
A better start would be Hamming distance (modified to allow multiple characters) or Jaccard similarity, to identify which positions in the strings do not change at all across the pairs. Then you are left with only the positions that do change.
This will fail if the letters shift.
To detect shift, you want to use global alignment (Needleman-Wunsch). You will then see something like "ABCDE" => "xABCD", showing that between the input and the output the content shifted over by one position, with a new character prepended.
Overall, I feel that Levenshtein distance will do very little to help you get at the original algorithm.