How to do a set difference, except without eliminating repeated elements


I am trying to do the following in Matlab. Take two lists of numbers, possibly containing repeated elements, and subtract one set from the other set.
Ex: A=[1 1 2 4]; B=[1 2 4];
Desired result would be A-B=C=[1]
Or, another example, E=[3 3 5 5]; F=[3 3 5];
Desired result would be E-F=G=[5]
I wish I could do this using Matlab's set operations, but the setdiff function does not respect repeated elements in the matrices. I appreciate that this is correct from a strict set theory standpoint, but I would nevertheless like to tackle problems like: "I have 3 apples and 4 oranges, and you take 2 apples and 1 orange; how many of each do I have left?" My range of possible values in these sets is in the thousands, so building a large matrix for tallying elements and then subtracting matrices does not seem feasible for speed reasons. I will have to do thousands of these calculations with thousands of set elements during a GUI menu operation.
Example of what I would like to avoid for tackling the second example above:
E=[0 0 2 0 2]; F=[0 0 2 0 1];
G=E-F=[0 0 0 0 1];
Thanks for your help!

This can be done with the accumarray command.
A = [1 1 2 4]';
B = [1 2 4]'; % <-make these column vectors
X = accumarray(A, 1);
Y = accumarray(B, 1);
This will produce the output
X = [2 1 0 1]'
and
Y = [1 1 0 1]'
Here X(i) is the number of occurrences of the value i in vector A, and Y(i) is the number of occurrences of i in vector B.
Then you can just take X - Y, which gives the count of each value remaining after the subtraction.
One caveat: if the maximum values of A and B are different, the outputs from accumarray will have different lengths. In that case, you can assign each output into a vector of zeros that is the size of the larger one, or pass the common size as the third argument, e.g. accumarray(A, 1, [n 1]) with n = max([A; B]).
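For comparison, here is a minimal Python sketch of the same tallying idea that avoids a dense count vector, which may matter given the value range mentioned in the question. It is not the accumarray approach itself: it swaps in Python's collections.Counter, and the function name multiset_difference is only an illustrative choice.

```python
from collections import Counter

def multiset_difference(a, b):
    """Return the elements of a with one copy removed per matching element of b."""
    # Counter subtraction keeps only the values whose remaining count is positive.
    remaining = Counter(a) - Counter(b)
    # elements() repeats each value as many times as its remaining count.
    return sorted(remaining.elements())

print(multiset_difference([1, 1, 2, 4], [1, 2, 4]))   # [1]
print(multiset_difference([3, 3, 5, 5], [3, 3, 5]))   # [5]
```

Because only the values that actually occur are stored, the cost depends on the number of elements rather than on the largest value.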

I just want to improve on Prototoast's answer.
In order to avoid pitfalls involving non-positive numbers in A or B use hist:
A = [-10 0 1 1 2 4];
B = [1 2 4];
We need the minimum and maximum values in the union of A and B:
U = [A,B];
range_ = min(U):max(U);
So that we can use hist to give us same length vectors:
a = hist(A,range_)
b = hist(B,range_)
Now subtract the histograms:
r = a-b
If you want the set difference to be symmetric, use:
r = abs(a-b)
The following gives the distinct values that are in A \ B (\ here is your modified set difference). Each value is listed once; r holds how many copies of it remain, and a negative entry in r means B contained more copies than A:
C = range_(r > 0)
Hope this helps.
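The offset idea above (counting over min(U):max(U) so that zero and negative values get a slot) can be sketched in Python as follows. This is an illustration under assumptions, not a translation of hist: the inputs are assumed to be integer lists that are not both empty, and the name tally_difference is hypothetical. Unlike the logical-index step, it repeats each value according to its remaining count.

```python
def tally_difference(a, b):
    """Dense-tally multiset difference that also handles zero and negative integers."""
    lo = min(a + b)                      # smallest value in the union of a and b
    hi = max(a + b)                      # largest value in the union of a and b
    counts = [0] * (hi - lo + 1)         # one slot per value in lo..hi
    for v in a:
        counts[v - lo] += 1
    for v in b:
        counts[v - lo] -= 1
    result = []
    for offset, c in enumerate(counts):
        # Negative counts (b had more copies than a) contribute nothing.
        result.extend([lo + offset] * max(c, 0))
    return result

print(tally_difference([-10, 0, 1, 1, 2, 4], [1, 2, 4]))   # [-10, 0, 1]
```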

Related

how do I get all combinations of elements in a list?

Given a list of X elements, how can I get all pairs, triples, ... (combinations of size Y) of these elements?
Y is the size of the required combinations. Ex: if Y = 2, I need to get all of the possible pairs.
I must not produce the same combination twice (ex: [a, b] and [b, a] are the same combination).

Take a copy of the list.
If the list is empty, there are no combinations.
To get all combinations of size one, look at each element in turn.
To get all combinations of size n+1, first remove the first element. Then get all combinations of size n of the rest of the list, and add that first element to each of them. Then get all combinations of size n+1 of the rest of the list, without adding the first element.
And then you are done.
You can get fancy and merely pretend to copy/remove elements for optimization's sake.
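The recursion described above can be sketched in Python as below. The function name choose is an illustrative choice, and the sketch uses size zero (one empty combination) as its base case instead of size one, which is an assumption made to keep the code short.

```python
def choose(items, size):
    """Return every combination of `size` elements from items, in input order."""
    if size == 0:
        return [[]]                      # exactly one way to choose nothing
    if len(items) < size:
        return []                        # not enough elements left
    first, rest = items[0], items[1:]
    # Combinations that contain the first element.
    with_first = [[first] + combo for combo in choose(rest, size - 1)]
    # Combinations that skip the first element.
    without_first = choose(rest, size)
    return with_first + without_first

print(choose(["a", "b", "c"], 2))   # [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

Since each element is either taken or skipped exactly once along any path, [a, b] and [b, a] can never both appear.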

You can iterate t from 2 to Y, and create an array A of size X filled with X-t 0s at the front and t 1s at the back, then run the code below:
do{
//1s in array A now correspond to a valid combination
}while(std::next_permutation(A,A+X));
The loop stops when all combinations of size t have been iterated.
std::next_permutation is declared in the <algorithm> header. It reorders the array into the next lexicographically greater permutation, or returns false if the array is already the lexicographically greatest permutation. Its complexity is O(n); since you also need to iterate through the array once per combination, this is not a problem. The total complexity of the whole process is bounded by O(2^n*n).
Here is some example pseudocode:
D[X] = {1,2,3,4}, Y = 3 // the input
For t = 2,3,..,Y
    A[X] = {0,...,0,1,...,1} // X - t 0s and t 1s
    Do
        For j = 0,1,...,X-1
            if A[j] == 1
                output D[j]
            end if
        end for
        output newline
    While next_permutation(A,A+X)
end for
The output will look like:
3 4
2 4
2 3
1 4
1 3
1 2
2 3 4
1 3 4
1 2 4
1 2 3
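A runnable Python sketch of the mask technique follows. Python has no built-in counterpart of std::next_permutation, so the helper below is a hand-written stand-in for it, and combinations_by_mask is a hypothetical name. The sketch assumes max_size does not exceed the length of the data.

```python
def next_permutation(seq):
    """Rearrange seq in place into the next lexicographic permutation.

    Returns False (after resetting seq to ascending order) if seq was the last one.
    """
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1                           # find the rightmost ascent
    if i < 0:
        seq.reverse()
        return False
    j = len(seq) - 1
    while seq[j] <= seq[i]:
        j -= 1                           # rightmost element greater than seq[i]
    seq[i], seq[j] = seq[j], seq[i]
    seq[i + 1:] = reversed(seq[i + 1:])  # make the tail as small as possible
    return True

def combinations_by_mask(data, max_size):
    """Collect all combinations of sizes 2..max_size using a 0/1 mask."""
    n = len(data)
    result = []
    for t in range(2, max_size + 1):
        mask = [0] * (n - t) + [1] * t   # n - t zeros, then t ones
        while True:
            result.append([data[j] for j in range(n) if mask[j] == 1])
            if not next_permutation(mask):
                break
    return result

for combo in combinations_by_mask([1, 2, 3, 4], 3):
    print(*combo)
```

With the input from the pseudocode, this prints the same ten lines in the same order as the listing above.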

Algorithm to get best combination

I have items with IDs 1, 3, 4, 5, 6, 7, and data like the following.
Each row has an offerId. The array of Ids lists the item IDs the offer covers, and Discount is the value of that offer.
offerId : Array of Ids : Discount
o1 : [1] : 45
o2 : [1 3 4] : 100
o3 : [3 5] : 55
o4 : [5] : 40
o5 : [6] : 30
o6 : [6 7] : 20
Now I have to select the offerIds that give me the best combination of Ids, i.e. the maximum total discount.
For example, in the above case a possible result is:
[o2, o4, o5], for which the maximum discount is 170 (100 + 40 + 30).
Note: the selected offers must not repeat any Id. For example, for o2, o4, o5 the Ids are [1,3,4], [5], [6], which are all distinct.
Another combination is:
o1, o3, o6, for which the Ids are [1], [3,5], [6,7]. However, the total is 120 (45+55+20), which is less than the 170 of the previous case.
I need an algorithm/code which will help me identify the combination of offerIds that gives the maximum discount, given that the selected offers must have distinct Ids.
NOTE: I am writing my code in Go, but solutions/logic in any language will be helpful.
NOTE: I hope I have explained my requirement properly. Please comment if any extra information is required. Thanks.

Here is a dynamic programming solution which, for every possible subset of IDs, finds the combination of offers for which the discount is maximum possible.
This will be pseudocode.
Let our offers be structures with fields offerNumber, setOfItems and discount.
For the purposes of implementation, we first renumber the possible items with integers from zero to the number of different possible items (say k) minus one.
After that, we can represent setOfItems by a binary number of length k.
For example, if k = 6 and setOfItems = 101110 in binary, this set includes items 5, 3, 2 and 1 and excludes items 4 and 0, since bits 5, 3, 2 and 1 are ones and bits 4 and 0 are zeroes.
Now let f[s] be the best discount we can get using exactly set s of items.
Here, s can be any integer between 0 and 2^k - 1, representing one of the 2^k possible subsets.
Furthermore, let p[s] be the list of offers which together allow us to get discount f[s] for the set of items s.
The algorithm goes as follows.
initialize f[0] to zero, p[0] to empty list
initialize f[>0] to minus infinity
initialize bestF to 0, bestP to empty list
for each s from 0 to 2^k - 1:
    for each o in offers:
        if s & o.setOfItems == o.setOfItems: // o.setOfItems is a subset of s
            if f[s] < f[s - o.setOfItems] + o.discount: // minus is set subtraction
                f[s] = f[s - o.setOfItems] + o.discount
                p[s] = p[s - o.setOfItems] append o.offerNumber
    if bestF < f[s]:
        bestF = f[s]
        bestP = p[s]
After that, bestF is the best possible discount, and bestP is the list of offers which get us that discount.
The complexity is O(|offers| * 2^k), where k is the total number of items.
Here is another implementation which is asymptotically the same, but might be faster in practice when most subsets are unreachable.
It is "forward" instead of "backward" dynamic programming.
initialize f[0] to zero, p[0] to empty list
initialize f[>0] to -1
initialize bestF to 0, bestP to empty list
for each s from 0 to 2^k - 1:
    if f[s] >= 0: // only for reachable s
        if bestF < f[s]:
            bestF = f[s]
            bestP = p[s]
        for each o in offers:
            if s & o.setOfItems == 0: // s and o.setOfItems don't intersect
                if f[s + o.setOfItems] < f[s] + o.discount: // plus is set addition
                    f[s + o.setOfItems] = f[s] + o.discount
                    p[s + o.setOfItems] = p[s] append o.offerNumber
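As a rough illustration, the "forward" variant can be written in Python as below. The tuple layout (offer_id, item_ids, discount) and the name best_offers are assumptions made for this sketch; it also assumes discounts are non-negative, since -1 is used to mark unreachable subsets.

```python
def best_offers(offers):
    """offers: list of (offer_id, item_ids, discount). Returns (best_discount, offer_ids)."""
    # Renumber the item IDs to bit positions 0..k-1.
    all_items = sorted({item for _, items, _ in offers for item in items})
    bit = {item: i for i, item in enumerate(all_items)}

    def to_mask(items):
        mask = 0
        for item in items:
            mask |= 1 << bit[item]
        return mask

    masks = [to_mask(items) for _, items, _ in offers]
    size = 1 << len(all_items)
    f = [-1] * size                      # -1 marks an unreachable subset
    p = [None] * size
    f[0], p[0] = 0, []
    best_f, best_p = 0, []
    for s in range(size):
        if f[s] < 0:
            continue                     # only expand reachable subsets
        if f[s] > best_f:
            best_f, best_p = f[s], p[s]
        for (offer_id, _, discount), m in zip(offers, masks):
            # The offer must not share any item with the subset s.
            if (s & m) == 0 and f[s | m] < f[s] + discount:
                f[s | m] = f[s] + discount
                p[s | m] = p[s] + [offer_id]
    return best_f, best_p

offers = [
    ("o1", [1], 45), ("o2", [1, 3, 4], 100), ("o3", [3, 5], 55),
    ("o4", [5], 40), ("o5", [6], 30), ("o6", [6, 7], 20),
]
print(best_offers(offers))   # (170, ['o2', 'o4', 'o5'])
```

Because s | m is always larger than s when the two do not intersect, every subset is final by the time the outer loop reaches it.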

Issue when generate random vectors with limits on matlab

I have a problem: I want to generate a table with 1 row and 4 columns, containing integers in the range 0 to 9, without repeats and different on every run.
I arrived at the code below, but it always generates a 0 in the first element, and I don't know how to limit the values to 0-9.
Can anyone help me?
Code of Function:
function [ n ] = generar( )
n = [-1 -1 -1 -1];
for i = 1:4
    r=abs(i);
    dig=floor((r-floor(r))*randn);
    while find (n == dig)
        r=r+1;
        dig=dig+floor(r-randn);
    end
    n(i)=dig;
end
end
And the results:
generar()
ans =
0 3 9 6
generar()
ans =
0 2 4 8
I don't know if this post is a duplicate, but I need help with my specific problem.

Assuming you want MATLAB, since the code you supplied is MATLAB, you can simply do this:
randperm(10, 4) - 1
This will give you 4 unique random numbers from 0-9.

Another way of getting there is randsample(n, k): when n is an integer, a random sample of size k is drawn from the population 1:n (as a column vector). So for your case, you would get the result with:
randsample(10, 4)' - 1
It draws 4 random numbers from the population without replacement, all with the same weight. This might be slower than randperm(10, 4) - 1, as its real strength is the ability to pass in population vectors for more sophisticated examples.
Alternatively, one can call it as randsample(pop, k), where pop is the population vector from which you want to draw a random sample of size k. So for your case, one would do:
randsample(0:9, 4)
The result will have the same singleton dimension as the population vector, which in this case is a row vector.
This is just to offer another solution and introduce you to randsample().
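For readers working outside MATLAB, the same draw without replacement can be sketched in Python with the standard library's random.sample. This is only a comparison, not a MATLAB equivalent, and the function name and its default arguments are hypothetical.

```python
import random

def four_distinct_digits(count=4, low=0, high=9):
    """Draw `count` distinct integers from low..high in random order."""
    # random.sample picks without replacement, so no value can repeat.
    return random.sample(range(low, high + 1), count)

print(four_distinct_digits())
```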

To find the min and max after addition and subtraction from a range of numbers

I have an algorithm question in which the numbers from 1 to N are given, a number of operations are performed on them, and then the min/max has to be found among the results.
There are two operations: addition and subtraction.
Operations have the form a b c d, where a is the operation to be performed, b is the starting number, c is the ending number, and d is the number to be added/subtracted.
for example
suppose numbers are 1 to N
and
N =5
1 2 3 4 5
We perform the operations (1 = addition, 2 = subtraction, as the results below show):
1 2 4 5
2 1 3 4
1 4 5 6
By these operations we will have numbers from 1 to N as
1 7 8 9 5
-3 3 4 9 5
-3 3 4 15 11
So the maximum is 15 and min is -3
My Approach:
I stored the numbers between the lower and upper limits (1 and 5 in this case) in an array, applied the operations, and then found the minimum and maximum.
Could there be any better approach?

I will assume that all update (addition/subtraction) operations happen before finding the max/min. I don't have a good solution for the case where update and min/max operations are mixed together.
You can use a plain array, where the value at index i is the difference between the values at index i and index (i - 1) of the original array. This makes the sum from index 0 to index i of our array equal to the value at index i of the original array.
Subtraction is addition of the negated number, so the two can be treated the same way. When we need to add k to the original array from index i to index j, we add k to index i of our array and subtract k from index (j + 1) of our array. This takes O(1) time per update.
You can find the min/max of the original array by accumulating a running sum of the values and recording the max/min seen. This takes O(n) time per query. I assume this is done once for the whole array.
Pseudocode:
a[N]     // Original array
d[N + 1] // Difference array, with one spare slot so d[to_idx + 1] is always valid
// Initialization
d[0] = a[0]
d[N] = 0
for (i = 1 to N-1)
    d[i] = a[i] - a[i - 1]
// Addition (subtraction is similar)
add(from_idx, to_idx, amount) {
    d[from_idx] += amount
    d[to_idx + 1] -= amount
}
// Find max/min for the WHOLE array after add/subtract
current = max = min = d[0];
for (i = 1 to N - 1) {
    current += d[i]; // Sum from d[0] to d[i] is a[i]
    max = MAX(max, current);
    min = MIN(min, current);
}
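The difference-array technique can be checked against the example from the question with a short Python sketch. The function name and the tuple form (op, start, end, amount) are assumptions for illustration; op 1 is taken to mean addition and op 2 subtraction, with 1-based inclusive bounds, and n is assumed to be at least 1.

```python
def range_updates_min_max(n, operations):
    """Apply range updates to the numbers 1..n and return (min, max)."""
    values = list(range(1, n + 1))
    diff = [0] * (n + 1)                 # one spare slot so diff[end] is always valid
    diff[0] = values[0]
    for i in range(1, n):
        diff[i] = values[i] - values[i - 1]
    for op, start, end, amount in operations:
        delta = amount if op == 1 else -amount
        diff[start - 1] += delta         # the update begins at 1-based index start
        diff[end] -= delta               # and is cancelled just past 1-based index end
    current = lowest = highest = diff[0]
    for i in range(1, n):
        current += diff[i]               # running sum reconstructs the updated value
        lowest = min(lowest, current)
        highest = max(highest, current)
    return lowest, highest

ops = [(1, 2, 4, 5), (2, 1, 3, 4), (1, 4, 5, 6)]
print(range_updates_min_max(5, ops))   # (-3, 15)
```

Each update touches two slots regardless of the range length, and the single pass at the end costs O(n).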

Generally there is no "best way" to find the min/max from a performance point of view, because it depends on how the application will be used.
- Finding the max and min in a list takes O(n) time, so if you want to run many operations (many relative to the size of the input), your approach of finding the min/max after all the operations have taken place is fine.
- But if the list holds many elements and you don't want to run that many operations, you are better off checking after each operation whether it produced a new max/min and updating if necessary.

Logical Question

Consider a [4x8] matrix "A" and a [1x8] matrix "B". I need to check whether there exists a value "X" such that
[A]^T * [X] = [B]^T with X >= 0 { X is a [4x1] matrix, T = transpose }
Now here is the tricky/tedious part. The matrix A always has 1 on its diagonal: A11, A22, A33, A44 = 1. This matrix can be considered as two halves, the first half being the first 4 columns and the second half being the last 4 columns, like the example below:
     1 -1 -1 -1   1  0  0  1
A = -1  1 -1  0   0  1  0  0
    -1 -1  1  0   1  0  0  0
    -1 -1 -1  1   1  1  0  0
Each row in the first half can have either two or three -1's and if it has two -1's then that corresponding row in the second half should have one "1" or if any row has three -1's the second half of the matrix should have two 1's. The overall objective is to have the sum of each row to be 0.
Now B is a [1x8] matrix which can also be considered as two halves as follows:
B = -1 -1 0 0 0 0 1 1
Here there can be one, two, three or four -1's in the first half, and there should be an equal number of 1's in the second half. This should be done over all combinations. For example, if there are two -1's in the first half, they can be placed in 4 choose 2 = 6 ways, and for each of them there are 6 ways to place the 1's in the second half, for a total of 6*6 = 36 ways, i.e. 36 different values of B if there are two -1's in the first half. The placement of 1's in the matrix A should be done the same way. The way I could think of doing this is to use a valarray or something of that sort to build the matrices A and B, but I don't know what to do next.
Now for every A, I have to test it with every combination of B to see if there exists an X such that
[A]^T * [X] = [B]^T
I'm trying to prove a result that I got, so I need to know whether such an X exists or not. I'm very confused about implementing this. Any suggestions are welcome. This would come under the linear programming concept in math. I want it either in C++ or in Matlab. Any other languages are also acceptable, but I'm familiar with only these two. Thanks in advance.
Update:
Here is my answer for this problem :
clear;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%# Generating all possible values of vector B
%# permutations using dec2bin (start from 17 since it's the first solution)
vectorB = str2double(num2cell(dec2bin(17:255)));
%# changing the sign in the first half, then check that the total is zero
vectorB(:,1:4) = - vectorB(:,1:4);
vectorB = vectorB(sum(vectorB,2)==0,:);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%# generate all possible variation of first/second halves
z = -[0 1 1; 1 0 1; 1 1 0; 1 1 1]; n = -sum(z,2);
h1 = {
[ ones(4,1) z(:,1:3)] ;
[z(:,1:1) ones(4,1) z(:,2:3)] ;
[z(:,1:2) ones(4,1) z(:,3:3)] ;
[z(:,1:3) ones(4,1) ] ;
};
h2 = arrayfun(@(i) unique(perms([zeros(1,4-i) ones(1,i)]),'rows'), (1:2)', ...
    'UniformOutput',false);
%'# generate all possible variations of complete rows
rows = cell(4,1);
for r=1:4
    rows{r} = cell2mat( arrayfun( ...
        @(i) [ repmat(h1{r}(i,:),size(h2{n(i)-1},1),1) h2{n(i)-1} ], ...
        (1:size(h1{r},1))', 'UniformOutput',false) );
end
%'# generate all possible matrices (pick one row from each to form the matrix)
sz = cellfun(@(M)1:size(M,1), rows, 'UniformOutput',false);
[X1 X2 X3 X4] = ndgrid(sz{:});
matrices = cat(3, ...
rows{1}(X1(:),:), ...
rows{2}(X2(:),:), ...
rows{3}(X3(:),:), ...
rows{4}(X4(:),:) );
matrices = permute(matrices, [3 2 1]); %# 4-by-8-by-104976
A = matrices;
clear matrices X1 X2 X3 X4 rows h1 h2 sz z n r
options = optimset('LargeScale','off','Display','off');
for i = 1:size(A,3),
    for j = 1:size(vectorB,1),
        X = linprog([],[],[],A(:,:,i)',vectorB(j,:)');
        if(size(X,1)>0) %# To check that it's not an empty matrix
            if((size(find(X < 0),1)== 0)) %# to check the condition X>=0
                if (A(:,:,i)'* X == (vectorB(j,:)'))
                    X
                end
            end
        end
    end
end
I got it with the help of stackoverflow folks. The only problem is the linprog function throws a lot of exceptions in every iteration along with the answers produced. The exception is:
(1)Exiting due to infeasibility: an all-zero row in the constraint matrix does not have a zero in corresponding right-hand-side entry.
(2) Exiting: One or more of the residuals, duality gap, or total relative error has stalled: the primal appears to be infeasible (and the dual unbounded).(The dual residual < TolFun=1.00e-008.
What does this mean? How can I overcome this?

It is not clear from your question whether you are familiar with systems of linear equations and their solution, or whether that is what you are trying to "invent". See also here for a Matlab-specific explanation.
If you are familiar with that, you should be more clear in your question about what makes your problem different.
