Combination of elements of lists that meet some condition? - list

Given:
a = [5, 2, 8, 3, 9]
b = [3, 5, 7, 6, 8]
c = [8, 5, 7, 4, 9].
What is needed:
d = [(9, 8), (8, 7), ..., (5, 5, 5), (5, 6, 5), (5, 6, 7), ..., (8, 7, 7), (9, 8, 9), ...].
Description:
(1) In the above example, there are three lists a, b, c having integer elements and the output is another list d of tuples.
(2) The tuples in d have elements belonging to (a and b and c) or (a and b) or (b and c) such that the difference between elements within any tuple is not greater than 1.
(3) Problem: How do I find the complete list d, where we take any element from any input list and keep the combinations whose difference is less than or equal to 1? This should generalize to more than three input lists (a, b, c, d, e, ...), each having ~1000 elements. I also need to retrieve the indices, relative to the input lists/arrays, of the elements that form the tuples.
(4) Clarification: (a) All such tuples which contain entries not differing by more than 1 are allowed.
(b) Tuples must have elements that are close to at least one other element by not more than 1.
(c) Entries within a tuple must belong to different input arrays/ lists.
Let me know if there are further clarifications needed!

You can use sorting to find results faster than a naive brute force. That said, this assumes the number of output tuples is reasonably small; otherwise, no method can enumerate them in a reasonable time (it could take several months). As #mosway pointed out in the comments, the number of combinations can be insanely huge, since the complexity is O(M ** N) (i.e. exponential in the number of lists), where N is the number of lists and M is the length of each list.
The idea is to use np.unique on every list to get sorted arrays with unique items. Then, you can iterate over the first array and, for each number n in it, find the range of values in the second array that fall in [n-1;n+1] using np.searchsorted. You can then iterate over the filtered values of the second array and recursively do the same on the remaining arrays.
Note that, depending on which array is chosen first, the method can be significantly faster. Thus, a good heuristic could be to start with an array whose values are very distant from the others. Computing a distance matrix over all the values of all arrays and selecting the array with the biggest average distance should help.
Note also that using Numba should significantly speed up the recursive calls.
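The recursive idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard library (sorted(set(...)) and bisect) as a stand-in for np.unique and np.searchsorted, the name chained_tuples is hypothetical, it only yields full-length tuples (one value per list), it compares each value with the previous one in the tuple as the answer describes, and it does not track indices.

```python
from bisect import bisect_left, bisect_right

def chained_tuples(lists, tol=1):
    """Yield tuples with one value per list, each within `tol` of the previous value."""
    # Stand-in for np.unique: sorted arrays of distinct values.
    arrays = [sorted(set(lst)) for lst in lists]

    def extend(prefix, depth):
        if depth == len(arrays):
            yield tuple(prefix)
            return
        arr = arrays[depth]
        last = prefix[-1]
        # Stand-in for np.searchsorted: bounds of the values in [last - tol, last + tol].
        lo = bisect_left(arr, last - tol)
        hi = bisect_right(arr, last + tol)
        for value in arr[lo:hi]:
            yield from extend(prefix + [value], depth + 1)

    for first in arrays[0]:
        yield from extend([first], 1)

a = [5, 2, 8, 3, 9]
b = [3, 5, 7, 6, 8]
c = [8, 5, 7, 4, 9]
result = list(chained_tuples([a, b, c]))
print(result)
```

Pairs such as (a, b) or (b, c) could be obtained by calling the same function on just those two lists. The running time is still proportional to the number of tuples produced, so the caveat about exponential output size applies here as well.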

Related

How would I compare a list (or equivalent) to another list in c++

I am attempting to learn C++ from scratch and possess a medium amount of python knowledge.
Here is some of my python code which takes a number, turns it into a list and checks if it contains all digits 0-9. If so it returns True, if not it returns False.
def val_checker(n):
    values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    lst = []
    for i in range(len(str(n))):
        lst.append((n // 10 ** i) % 10)
    lst = lst[::-1]
    return all(i in lst for i in values)
How would I achieve a similar thing in C++?
You would use the standard library container std::set or, better yet, std::unordered_set.
This container holds at most one of each distinct element; duplicate insertions are ignored.
So you can run through the digits of your original number in a loop, adding each digit to the set, and consider it a success if s.size() == 10 (assuming your set is called s).
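To make the set-based check concrete, here is a minimal sketch in the question's own Python rather than C++. The name has_all_digits is hypothetical; a C++ version would presumably follow the same loop, inserting each digit into a std::unordered_set<int> and comparing s.size() with 10.

```python
def has_all_digits(n):
    """Return True if the decimal representation of n contains every digit 0-9."""
    digits = set()
    n = abs(n)
    while n > 0:
        digits.add(n % 10)  # inserting a digit that is already present changes nothing
        n //= 10
    return len(digits) == 10

print(has_all_digits(1234567890))
print(has_all_digits(1123456789))
```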

Find the same numbers between [a,b] intervals

Suppose I have 3 arrays of consecutive numbers
a = [1, 2, 3]
b = [2, 3, 4]
c = [3, 4]
Then the same number that appears in all 3 arrays is 3.
My algorithm is to use two nested for loops to find the numbers that appear in both arrays and push them into another array (let's call it d). Then
d = [2, 3] (d = a overlap b)
And use it again to check arrays d and c => The final result is 1, because there is only 1 number that appears in all 3 arrays.
e = [3] (e = c overlap d) => e.length = 1
Other than that, if there exists only 1 array, then the algo should return the length of the array, as all of its numbers appear in itself. But I think my said algo above would take too long because the numbers of array can go up to 10^5. So, any idea of a better algorithm?
Yes, since these are ranges, you basically want to calculate the intersection of the ranges. This means that you can calculate the maximum m of all the first elements of the lists, and the minimum n of all the last elements of the lists. All the numbers between m and n (both inclusive) are then members of all lists. If m>n, then no number is common to all the lists.
You do not need to calculate the overlap by enumerating over the first list, and check if these are members of the last list. Since these are consecutive numbers, we can easily find out what the overlap is.
In short, the overlap of [a, ..., b] and [c, ..., d] is [ max(a,c), ..., min(b,d) ], there is no need to check the elements in between.
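A minimal sketch of this range intersection, assuming every array is non-empty and holds consecutive increasing integers (the name common_count is hypothetical):

```python
def common_count(arrays):
    """Count the numbers shared by all arrays of consecutive integers."""
    m = max(arr[0] for arr in arrays)   # largest first element
    n = min(arr[-1] for arr in arrays)  # smallest last element
    return max(0, n - m + 1)            # empty overlap when m > n

print(common_count([[1, 2, 3], [2, 3, 4], [3, 4]]))
```

Only the first and last element of each array are read, so the work grows with the number of arrays and not with their lengths.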

Python3 how to create a list of partial products

I have a very long list (of big numbers), let's say for example:
a=[4,6,7,2,8,2]
I need to get this output:
b=[4,24,168,336,2688,5376]
where each b[i]=a[0]*a[1]...*a[i]
I'm trying to do this recursively in this way:
b=[4] + [ a[i-1]*a[i] for i in range(1,6)]
but the (wrong) result is: [4, 24, 42, 14, 16, 16]
I don't want to compute all the products each time, I need a efficient way (if possible), because the list is very long
At the moment this works for me:
b=[0]*6
b[0]=4
for i in range(1,6): b[i]=a[i]*b[i-1]
but it's too slow. Any ideas? Is it possible to avoid "for" or to speedup it in other way?
You can calculate the product step-by-step since every next calculation heavily depends on the previous one.
What I mean is:
1) Compute the product of the first i - 1 numbers
2) The i-th product is then equal to a[i] * the product of the first i - 1 numbers
This method is called dynamic programming.
Dynamic programming (also known as dynamic optimization) is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions
This is the implementation:
a = [4, 6, 7, 2, 8, 2]
b = []
product_so_far = 1
for i in range(len(a)):
    product_so_far *= a[i]
    b.append(product_so_far)
print(b)
This algorithm works in linear time (O(n)), which is the most efficient complexity you'll get for such a task
If you want a little optimization, you could generate the b list to the predefined length (b = [0] * len(a)) and, instead of appending, you would do this in a loop:
b[i] = product_so_far
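As an alternative to the explicit loop (not part of the answer above), the standard library's itertools.accumulate with operator.mul computes the same running products. A short sketch:

```python
from itertools import accumulate
from operator import mul

a = [4, 6, 7, 2, 8, 2]
# accumulate applies mul cumulatively: a[0], a[0]*a[1], a[0]*a[1]*a[2], ...
b = list(accumulate(a, mul))
print(b)
```

This is still O(n); it mainly moves the loop out of Python-level code, so any speed-up with very large integers is likely to be modest.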

How to find closest exceeding number between two lists?

I have two lists of numbers list A and list B
I want to map every number in list A to a number in list B. That number is the closest number that list A exceeds in list B.
So for example, if i have the number 5 in list A and there are the numbers 3 and 6 in list B, then I want the number 5 to map to 3.
I realize I could do this by taking the difference between each number in list A with each number in list B then indexing and such but my list A and list B are extremely long and was wondering if there was a more efficient way to go about this.
Thanks!
You say you are looking for something faster than computing every difference. If you look at this answer, which finds the closest value for a single item in O(n), mapping your whole list would take O(n^2), which is usually quick enough for moderately sized lists. Your solution would look like this:
>>> A = [100, 7, 9]
>>> B = [2, 5, 6, 8, 123, 12]
>>> [min(B, key=lambda x: 2**16 if x > y else abs(x-y)) for y in A]
[12, 6, 8]
The 2**16 is slightly dirty (it is a penalty value that only works while all real differences stay below 65536), but gets the job done.
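For very long lists, a different technique than the one above may be worth considering: sort B once and binary-search it for each element of A with the standard bisect module, which takes roughly O((len(A) + len(B)) log len(B)). This is only a sketch; the name closest_below is hypothetical, and it assumes, like the snippet above, that an equal value in B counts as a match and that None is acceptable when nothing in B qualifies.

```python
from bisect import bisect_right

def closest_below(A, B):
    """Map each y in A to the largest value in B that does not exceed y."""
    sorted_b = sorted(B)
    result = []
    for y in A:
        idx = bisect_right(sorted_b, y)  # number of elements of B that are <= y
        result.append(sorted_b[idx - 1] if idx else None)
    return result

A = [100, 7, 9]
B = [2, 5, 6, 8, 123, 12]
print(closest_below(A, B))
```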

How to use combinations of sets as test data

I would like to test a function with tuples drawn from a set of fringe cases and normal values. For example, while testing a function which returns true whenever it is given three lengths that form a valid triangle, I would have specific cases: negative / small / large numbers, values close to overflowing, etc. The main aim is to generate combinations of these values, with or without repetition, in order to get a set of test data.
(inf,0,-1), (5,10,1000), (10,5,5), (0,-1,5), (1000,inf,inf),
...
As a note: I actually know the answer to this, but it might be helpful for others, and a challenge for people here! --will post my answer later on.
Absolutely, especially dealing with lots of these permutations/combinations I can definitely see that the first pass would be an issue.
Interesting implementation in Python, though I wrote a nice one in C and OCaml based on "Algorithm 515" (see below). It was originally written in Fortran, as was common back then for all the "Algorithm XX" papers (well, that or assembly or C). I had to rewrite it and make some small improvements so that it works with arrays rather than ranges of numbers. This one does random access; I'm still working on getting some nice implementations of the ones mentioned in Knuth's Volume 4, Fascicle 2. I'll leave an explanation of how this works to the reader, though if someone is curious, I wouldn't object to writing something up.
/** [combination c n p x]
 * get the [x]th lexicographically ordered set of [p] elements in [n]
 * output is in [c], and should be sizeof(int)*[p]
 * choose(n, k) is not defined here; it is assumed to return the
 * binomial coefficient "n choose k" */
void combination(int* c, int n, int p, int x){
    int i, r, k = 0;
    for(i = 0; i < p-1; i++){
        c[i] = (i != 0) ? c[i-1] : 0;
        do {
            c[i]++;
            r = choose(n-c[i], p-(i+1));
            k = k + r;
        } while(k < x);
        k = k - r;
    }
    c[p-1] = c[p-2] + x - k;
}
~"Algorithm 515: Generation of a Vector from the Lexicographical Index"; Buckles, B. P., and Lybanon, M. ACM Transactions on Mathematical Software, Vol. 3, No. 2, June 1977.
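For readers who want to try the routine without a C toolchain, here is a rough Python port of the function above. It is a sketch under assumptions: choose is taken to be the binomial coefficient (math.comb, Python 3.8+), the index x is 1-based, the elements are the integers 1..n, and the name unrank_combination is hypothetical. The result is compared against itertools.combinations, which also produces lexicographic order.

```python
from itertools import combinations
from math import comb

def unrank_combination(n, p, x):
    """Return the x-th (1-based) lexicographic p-subset of {1, ..., n}."""
    c = [0] * p
    k = 0
    r = 0
    for i in range(p - 1):
        c[i] = c[i - 1] if i != 0 else 0
        while True:
            c[i] += 1
            r = comb(n - c[i], p - (i + 1))  # subsets that start with this prefix
            k += r
            if k >= x:
                break
        k -= r
    c[p - 1] = (c[p - 2] if p > 1 else 0) + x - k
    return c

ranked = [tuple(unrank_combination(5, 3, x)) for x in range(1, 11)]
expected = list(combinations(range(1, 6), 3))
print(ranked == expected)
```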
Since Python 2.6, the standard itertools module offers a solution: itertools.product returns the Cartesian product of iterables:
import itertools
print(list(itertools.product([1,2,3], [4,5,6])))
[(1, 4), (1, 5), (1, 6),
(2, 4), (2, 5), (2, 6),
(3, 4), (3, 5), (3, 6)]
You can provide a "repeat" argument to perform the product with an iterable and itself:
print(list(itertools.product([1,2], repeat=3)))
[(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2),
(2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2)]
You can also tweak something with combinations as well :
print(list(itertools.combinations('123', 2)))
[('1', '2'), ('1', '3'), ('2', '3')]
And if order matters, there are permutations :
print(list(itertools.permutations([1,2,3,4], 2)))
[(1, 2), (1, 3), (1, 4),
(2, 1), (2, 3), (2, 4),
(3, 1), (3, 2), (3, 4),
(4, 1), (4, 2), (4, 3)]
Of course, these functions don't all do exactly the same thing, but you can use them in one way or another to solve your problem.
Just remember that you can convert a tuple or a list to a set and vice versa using list(), tuple() and set().
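Applied to the triangle example from the question, this could look roughly like the sketch below. The function is_triangle is a hypothetical stand-in for the function under test, and the list of fringe values is only illustrative.

```python
import itertools
import math

def is_triangle(a, b, c):
    """Hypothetical function under test: True if the lengths form a valid triangle."""
    sides = sorted((a, b, c))
    if sides[0] <= 0 or math.isinf(sides[2]):
        return False
    return sides[0] + sides[1] > sides[2]

fringe = [-1, 0, 1, 5, 10, 1000, math.inf]
# Every ordered triple of fringe values, with repetition allowed.
cases = list(itertools.product(fringe, repeat=3))
valid = [case for case in cases if is_triangle(*case)]
print(len(cases), len(valid))
```

Using itertools.combinations_with_replacement instead of product would drop the reordered duplicates when the argument order is not part of what is being tested.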
Interesting question!
I would do this by picking combinations, something like the following in python. The hardest part is probably first pass verification, i.e. if f(1,2,3) returns true, is that a correct result? Once you have verified that, then this is a good basis for regression testing.
Probably it's a good idea to make a set of test cases that you know will be all true (e.g. 3,4,5 for this triangle case), and a set of test cases that you know will be all false (e.g. 0,1,inf). Then you can more easily verify the tests are correct.
# xpermutations from http://code.activestate.com/recipes/190465
from xpermutations import *
lengths=[-1,0,1,5,10,0,1000,'inf']
for c in xselections(lengths,3): # or xuniqueselections
    print(c)
(-1,-1,-1);
(-1,-1,0);
(-1,-1,1);
(-1,-1,5);
(-1,-1,10);
(-1,-1,0);
(-1,-1,1000);
(-1,-1,inf);
(-1,0,-1);
(-1,0,0);
...
I think you can do this with the Row Test Attribute (available in MbUnit and later versions of NUnit) where you could specify several sets to populate one unit test.
While it's possible to create lots of test data and see what happens, it's more efficient to try to minimize the data being used.
From a typical QA perspective, you would want to identify different classifications of inputs. Produce a set of input values for each classification and determine the appropriate outputs.
Here's a sample of classes of input values
valid triangles with large numbers such as (1 billion, 2 billion, 2 billion)
valid triangles with small numbers such as (0.00002, 0.00003, 0.00004)
valid obtuse triangles that are 'almost' flat such as (10, 10, 19.9999)
valid acute triangles that are 'almost' flat such as (10, 10, 0.0000001)
invalid triangles with at least one negative value
invalid triangles where the sum of two sides equals the third
invalid triangles where the sum of two sides is less than the third
input values that are non-numeric
...
Once you are satisfied with the list of input classifications for this function, then you can create the actual test data. Likely, it would be helpful to test all permutations of each item. (e.g. (2,3,4), (2,4,3), (3,2,4), (3,4,2), (4,2,3), (4,3,2)) Typically, you'll find there are some classifications you missed (such as the concept of inf as an input parameter).
Testing with random data for some period of time may be helpful as well, since it can find strange bugs in the code, but it is generally not productive.
More likely, this function is being used in some specific context where additional rules are applied (e.g. only integer values, or values that must be in 0.01 increments, etc.). These add to the list of classifications of input parameters.
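The classification approach described above could be sketched like this: one representative input per class, an expected result for each, and every ordering of the sides checked via itertools.permutations. The function is_triangle and the class names are hypothetical placeholders for the real function under test and its agreed input classes.

```python
from itertools import permutations

def is_triangle(a, b, c):
    """Hypothetical function under test."""
    x, y, z = sorted((a, b, c))
    return x > 0 and x + y > z

# One representative per input class, with the expected result.
classes = {
    "valid scalene": ((2, 3, 4), True),
    "almost flat": ((10, 10, 19.9999), True),
    "negative side": ((-1, 2, 2), False),
    "sum equals third": ((1, 2, 3), False),
    "sum less than third": ((1, 2, 10), False),
}

# Collect every (class, ordering) whose result differs from the expectation.
failures = [
    (name, perm)
    for name, (sides, expected) in classes.items()
    for perm in permutations(sides)
    if is_triangle(*perm) != expected
]
print(failures)
```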