Before I post my code: I built it on Thomas Draper's code, with thanks to Jarod42.
I will explain it with an example.
Let this be the data in a txt file, where each integer (item) is associated with a probability:
1 0.933 2 0.865 3 0.919 4 0.726
3 0.906 2 0.854 4 0.726
4 0.865 3 0.933 5 0.919
Let the user-input threshold be 1.5.
I want to apply next_combination to my data in a loop, from k = 1 until there are no more combinations.
When k = 1, the result will be:
First step: generate all sets of size 1, where the frequency of an item is the sum of its probabilities.
{1}= 0.933
{2}= 0.865 + 0.854= 1.719
{3}= 0.919 + 0.906 + 0.933 = 2.758
{4}= 0.726 + 0.726 + 0.865 = 2.317
{5}= 0.919
Second step: erase all sets of size 1 whose frequency is < threshold.
==> We erase sets {1} and {5}, and save the deleted items in another set (the erased set).
Repeat the steps when k = 2.
First step: generate all sets of size 2.
We check whether each generated set is a superset of a set in the erased set.
We know that {1} and {5} are already erased,
so there is no need to generate any superset that includes {1} or {5}.
The remaining generated sets are (for each line, multiply the probabilities of the set's items, then sum over the lines):
{2,3} = (0.865 * 0.919) + (0.906 * 0.854) = 1.568659
{2,4}= (0.865 * 0.726) + (0.854 * 0.726)= 1.247994
{3,4}= (0.919 * 0.726) + (0.906 * 0.726) + (0.865 *0.933)= 2.131995
Second step: erase all sets of size 2 whose frequency is < threshold.
==> We erase set {2,4} and add it to the erased set, which is now {1}, {5}, {2,4}.
Repeat the steps when k = 3.
From the previous step we only have {2,3} and {3,4}.
The newly generated set will be:
{2,3,4} = (0.865 * 0.919 * 0.726) + (0.906 * 0.854 * 0.726) = 1.138846434 < threshold
I implemented this before with a vector of vectors of integers; it gives me the correct answer but is very slow. (code)
Here is where I need help: I get a lot of errors that I can't deal with, because I don't know how to use next_combination with a struct. (code)
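The level-wise procedure described above (compute each set's frequency, erase the sets below the threshold, never build a superset of an erased set) is essentially Apriori on uncertain data, using expected support. Below is a minimal C++ sketch under stated assumptions: each line of the file is stored as a hypothetical `Transaction` (a `std::map<int, double>` from item to probability), and candidates are built by joining surviving sets rather than with Thomas Draper's `next_combination`, whose interface is not shown in the question. The names `expected_support` and `frequent_itemsets` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical representation: one line of the file maps item -> probability.
using Transaction = std::map<int, double>;
using Itemset = std::vector<int>;  // always kept sorted ascending

// Expected support ("frequency" in the question): for each line, multiply the
// probabilities of all items in the set (0 if any item is missing), then sum.
double expected_support(const std::vector<Transaction>& db, const Itemset& s) {
    double total = 0.0;
    for (const Transaction& t : db) {
        double p = 1.0;
        for (int item : s) {
            auto it = t.find(item);
            if (it == t.end()) { p = 0.0; break; }
            p *= it->second;
        }
        total += p;
    }
    return total;
}

// Level-wise search: keep sets with support >= threshold, build size-k candidates
// only from surviving size-(k-1) sets, and skip candidates with an erased subset.
std::map<Itemset, double> frequent_itemsets(const std::vector<Transaction>& db, double threshold) {
    assert(threshold > 0.0);
    std::set<int> items;
    for (const Transaction& t : db)
        for (const auto& kv : t) items.insert(kv.first);

    std::map<Itemset, double> result;
    std::vector<Itemset> level;  // candidates of the current size, in sorted order
    for (int item : items) level.push_back({item});

    while (!level.empty()) {
        std::vector<Itemset> kept;  // survivors of this size (still sorted)
        for (const Itemset& s : level) {
            double sup = expected_support(db, s);
            if (sup >= threshold) { result[s] = sup; kept.push_back(s); }
        }
        std::set<Itemset> kept_set(kept.begin(), kept.end());
        std::vector<Itemset> next;
        for (std::size_t i = 0; i < kept.size(); ++i) {
            for (std::size_t j = i + 1; j < kept.size(); ++j) {
                const Itemset& a = kept[i];
                const Itemset& b = kept[j];
                // Join step: both sets must share everything except their last item.
                if (!std::equal(a.begin(), a.end() - 1, b.begin())) continue;
                Itemset cand = a;
                cand.push_back(b.back());
                // Prune step: every subset with one item removed must have survived.
                bool ok = true;
                for (std::size_t r = 0; r < cand.size() && ok; ++r) {
                    Itemset sub = cand;
                    sub.erase(sub.begin() + static_cast<std::ptrdiff_t>(r));
                    ok = kept_set.count(sub) > 0;
                }
                if (ok) next.push_back(cand);
            }
        }
        level = next;
    }
    return result;
}
```

With the three sample lines and a threshold of 1.5, this keeps {2}, {3}, {4}, {2,3} and {3,4}. The set {2,3,4} is never built, because its subset {2,4} was already erased; the pruning is safe because adding an item multiplies each line's contribution by a probability of at most 1 (or by 0), so a superset can never be more frequent than its subsets.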
This is my attempt to count the contiguous subsequences of an array whose product mod 4 is not equal to 2:
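#include <iostream>
#include <vector>
using namespace std;
int main() {
    long long n, count = 0;
    cin >> n;
    vector<long long> arr(n);  // std::vector instead of a non-standard variable-length array
    for (long long i = 0; i < n; i++) {
        cin >> arr[i];
    }
    for (long long i = 0; i < n; i++) {
        long long s = 1;
        for (long long j = i; j < n; j++) {
            // Keep only the product mod 4; the full product would overflow long long.
            s = (s * (((arr[j] % 4) + 4) % 4)) % 4;
            if (s != 2) {
                count++;
            }
        }
    }
    cout << count;
    return 0;
}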
However, I want to reduce the execution time of my code, and I am looking for a way to do it. Any help/hint would be appreciated.
Thank you.
What does this definition of contiguous subsequences mean?
Listing all the subsequences
Suppose we have the sequence:
A B C D E F
First of all, we should recognize that there is one contiguous subsequence for every unique pair of start and end points. Let's use the notation C-F to mean all items from C through F, i.e. C D E F.
We can list all subsequences in a triangular arrangement like this:
A B C D E F
A-B B-C C-D D-E E-F
A-C B-D C-E D-F
A-D B-E C-F
A-E B-F
A-F
The first row lists all the subsequences of length 1.
The second row lists all the subsequences of length 2.
The third row lists all the subsequences of length 3. Etc.
The last row is the full sequence.
Modular arithmetic
Computing the product MOD 4 of a set of numbers
To figure out the product of a bunch of numbers MOD 4, we just need to look at each element of the set MOD 4. Intuitively, this is because when you multiply a bunch of numbers, the last digit of the result is determined entirely by the last digit of each factor. In this case "the last digit base 4" is the number mod 4.
The identity we are using is:
(A * B) MOD N == ((A MOD N) * (B MOD N)) MOD N
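Because of this identity, the reduction can be applied to every factor and to every partial product, which keeps all intermediate values in 0..3 and avoids the overflow that a running product of the raw numbers would eventually hit. A minimal sketch (the name `product_mod4` is illustrative, and mapping negative inputs into 0..3 is an added assumption):

```cpp
#include <cassert>
#include <vector>

// Product of all elements mod 4, reducing each factor and each partial product
// with (A * B) MOD N == ((A MOD N) * (B MOD N)) MOD N, so nothing can overflow.
int product_mod4(const std::vector<long long>& v) {
    int prod = 1;
    for (long long x : v) {
        int r = static_cast<int>(((x % 4) + 4) % 4);  // also maps negative x into 0..3
        prod = (prod * r) % 4;
    }
    return prod;
}
```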
The table of products
Now we also have to look at the matrix of possible multiplications that might happen. It's a fairly small table and the interesting entries are given here:
2 * 2 = 4 4 MOD 4 = 0
2 * 3 = 6 6 MOD 4 = 2
3 * 3 = 9 9 MOD 4 = 1
So the results of multiplying any 2 numbers MOD 4 are given by this table:
+--------+---+---+---+---+
| Factor | 0 | 1 | 2 | 3 |
+--------+---+---+---+---+
| 0 | 0 | / | / | / |
| 1 | 0 | 1 | / | / |
| 2 | 0 | 2 | 0 | / |
| 3 | 0 | 3 | 2 | 1 |
+--------+---+---+---+---+
The /'s are omitted because of the symmetry of multiplication (A * B = B * A)
An example sequence
Now for each subsequence, let's compute the product MOD 4 of its elements.
Consider the following list of numbers
242 496 681 685 410 795
The first thing we do is take all these numbers MOD 4 and list them as the first row of our triangle of all subsequences.
2 0 1 1 2 3
The second row is just the product of the pairs above it.
2 0 1 1 2 3
0 0 1 2 2
In general, the Nth element of each row is the product, MOD 4, of:
the number up and to its left in the row above, and
the element in the first row that is diagonally to its right
For example C = A * B
* * * * B *
* * * / *
* A / *
* C *
* *
*
Again,
A is immediately up and left of C
B is in the top row, reached by moving diagonally up and to the right from C
Now we can complete our triangle
2 0 1 1 2 3
0 0 1 2 2
0 0 2 2
0 0 2
0 0
0
This can be computed easily in O(n^2) time.
Optimization
These optimizations do not improve the time complexity of the algorithm in the worst case, but they can allow an early exit from the computation, so they are worth including if you need to reduce the running time on unknown input.
Contagious 0's
Furthermore, as a matter of optimization, notice how contagious the 0's are. Anything times 0 is 0, so you can skip computing products of cells below a 0. In your case, a sequence cannot be 2 MOD 4 once the product of one of its subsequences is known to be 0 MOD 4.
* * * 0 * * // <-- this zero infects all cells below it
* * 0 0 *
* 0 0 0
0 0 0
0 0
0
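Within a single start index, the zero rule gives a simple early exit: once the running product of a[i..j] is 0 MOD 4, every longer subsequence starting at i is also 0 (and therefore "not 2"), so all of them can be counted in one step. A sketch in C++ that scans by start index rather than row by row; `count_not_two_mod4_early_exit` is a hypothetical name, and the worst case is still O(n^2).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts contiguous subsequences whose product is not 2 (mod 4), stopping the
// scan from start index i as soon as the running product becomes 0 (mod 4).
long long count_not_two_mod4_early_exit(const std::vector<long long>& a) {
    const std::size_t n = a.size();
    long long count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        int prod = 1;  // product of a[i..j] mod 4
        for (std::size_t j = i; j < n; ++j) {
            prod = (prod * static_cast<int>(((a[j] % 4) + 4) % 4)) % 4;
            if (prod == 0) {
                // 0 is contagious: a[i..j], a[i..j+1], ..., a[i..n-1] are all 0 mod 4.
                count += static_cast<long long>(n - j);
                break;
            }
            if (prod != 2) ++count;
        }
    }
    return count;
}
```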
Need a 2 to make a 2.
Look back at the table of factors and products. Notice that the only way to get a product equal to 2 MOD 4 is to have one factor equal to 2 MOD 4 and the other odd. What that means is that a 2 can only appear below another 2 (either A or B must be 2). So we are only interested in computing entries in the table that are below a 2. Other entries in rows below can never become a 2.
You don't have to store the whole triangle.
You only need O(n) storage to implement this. Working line by line, you can compute the values in a row entirely from the values in the first row and values in the row above.
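A C++ sketch of this row-by-row computation, assuming the input is a `std::vector<long long>`: `first` holds the first row (each element MOD 4), and `row` is overwritten in place with the next row using C = A * B, so only two arrays of length n are ever stored. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the triangle one row at a time and counts entries that are not 2.
// row[i] holds the product (mod 4) of the window of length `len` starting at i.
long long count_not_two_mod4_rows(const std::vector<long long>& a) {
    const std::size_t n = a.size();
    std::vector<int> first(n);
    for (std::size_t i = 0; i < n; ++i)
        first[i] = static_cast<int>(((a[i] % 4) + 4) % 4);  // first row: each element mod 4
    std::vector<int> row = first;  // row of windows of length 1
    long long count = 0;
    for (std::size_t len = 1; len <= n; ++len) {
        if (len > 1) {
            // C = A * B: A is the same entry in the previous row, B is the
            // first-row element diagonally to the right, first[i + len - 1].
            // Each entry depends only on its own old value, so in place is safe.
            for (std::size_t i = 0; i + len <= n; ++i)
                row[i] = (row[i] * first[i + len - 1]) % 4;
        }
        for (std::size_t i = 0; i + len <= n; ++i)
            if (row[i] != 2) ++count;
    }
    return count;
}
```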
Reading the answers from the table
Now you can look at the rows of the triangle list as you generate them and read off which subsequences are to be included.
Entries with a 2 are to be excluded. All others are to be included.
2 0 1 1 2 3
0 0 1 2 2
0 0 2 2
0 0 2
0 0
0
The excluded subsequences for the example (which I will list only because there are fewer of them in my example) are:
A
E
D-E
E-F
C-E
D-F
C-F
Remember that, according to our convention, these refer to the elements:
A
E
D E
E F
C D E
D E F
C D E F
That is:
242
410
685 410
410 795
681 685 410
685 410 795
681 685 410 795
Hopefully it's obvious how to display the "included" sequences, rather than the "excluded" sequences, as I have shown above.
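As a sketch of reading the answers off programmatically, the function below collects the zero-based, inclusive (start, end) pairs of the excluded subsequences (product 2 MOD 4). Collecting the included ones instead only needs the test flipped, although the zero early exit would then have to record every remaining range. The name `excluded_ranges` is hypothetical; for input whose residues are 2 0 1 1 2 3, it returns seven ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (start, end) index pairs, zero-based and inclusive, of every contiguous
// subsequence whose product is 2 (mod 4), i.e. the "excluded" subsequences.
std::vector<std::pair<std::size_t, std::size_t>> excluded_ranges(const std::vector<long long>& a) {
    std::vector<std::pair<std::size_t, std::size_t>> out;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int prod = 1;  // product of a[i..j] mod 4
        for (std::size_t j = i; j < a.size(); ++j) {
            prod = (prod * static_cast<int>(((a[j] % 4) + 4) % 4)) % 4;
            if (prod == 0) break;  // zero is contagious: nothing longer from i can be 2
            if (prod == 2) out.emplace_back(i, j);
        }
    }
    return out;
}
```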
Displaying all the elements makes it take much longer.
Sadly, actually displaying all of the elements of such subsequences is still an O(N^3) operation in the worst case. (Imagine a sequence of all zeros.)
Summary
I feel that an average developer could take the magic-bullet observation made in the diagram below and write an implementation that runs in O(n^2) time.
C = A * B
* * * * B *
* * * / *
* A / *
* C *
* *
*
I have a question about a prime-number algorithm.
Why, in the following pseudocode, does i increase by 6 rather than by 2 on every iteration?
function is_prime(n)
if n ≤ 1
return false
else if n ≤ 3
return true
else if n mod 2 = 0 or n mod 3 = 0
return false
let i ← 5
while i * i ≤ n
if n mod i = 0 or n mod (i + 2) = 0
return false
i ← i + 6
return true
Thanks!
If it increased by 2, it would test almost every odd number twice (once as i and once as i + 2), which wouldn't make any sense. So I assume you mean: how can it get away with not testing every odd number?
This is because every prime p greater than 3 is of the form 6n±1. Proof:
Consider the remainder r = p mod 6. Obviously r must be odd, since p is odd and 6 is even. Notice also that r cannot be 3, because then p would be divisible by 3, making it not a prime. This leaves only the possibilities 1 and 5, which correspond to p being of the form 6n+1 or the form 6n-1 respectively.
The effect is that it avoids testing multiples of 3. Dividing by a multiple of 3 is redundant, because we already know that n is not a multiple of 3, so it cannot be a multiple of a multiple of 3 either.
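A direct C++ translation of the question's pseudocode may make the stride easier to see: each loop iteration tests one candidate divisor of the form 6k - 1 (`i`) and one of the form 6k + 1 (`i + 2`). This is a sketch; the `is_prime` signature and the use of `long long` are choices made here, and `i * i` can overflow for `n` close to the `long long` limit.

```cpp
#include <cassert>

// Trial division by 2, 3, and then only by numbers of the form 6k - 1 and 6k + 1.
bool is_prime(long long n) {
    if (n <= 1) return false;
    if (n <= 3) return true;                     // 2 and 3
    if (n % 2 == 0 || n % 3 == 0) return false;
    for (long long i = 5; i * i <= n; i += 6) {  // i = 5, 11, 17, ...  (6k - 1)
        if (n % i == 0 || n % (i + 2) == 0)      // i + 2 = 7, 13, 19, ...  (6k + 1)
            return false;
    }
    return true;
}
```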
The assignment in the loop body is i <- i + 6, not i <- i + 2. In the if statement the expression i + 2 just becomes a new value. There is no assignment operator in that expression.
The algorithm is based on the fact that every prime greater than 3 has the form 6k ± 1 (the converse does not hold: 25 = (6 * 4) + 1 is not prime), which is why 2 and 3 are handled separately.
For instance
(6 * 1) - 1 = 5
(6 * 2) - 1 = 11
(6 * 3) - 1 = 17
The list goes on and on.
Is there an efficient way to downscale the number of elements in an array by a non-integer (decimal) factor?
I want to downsize the elements of an array by a certain factor.
Example:
If I have 10 elements and need to scale down by a factor of 2:
1 2 3 4 5 6 7 8 9 10
scaled to
1.5 3.5 5.5 7.5 9.5
I group the elements 2 by 2 and take the arithmetic mean of each group.
My problem is: what if I need to downsize an array with 10 elements to 6 elements? In theory I should group 10/6 ≈ 1.67 elements and find their arithmetic mean, but how do I do that?
Before suggesting a solution, let's define "downsize" in a more formal way. I would suggest this definition:
Downsizing starts with an array a[N] and produces an array b[M] such that the following is true:
M <= N - otherwise it would be upsizing, not downsizing
SUM(b) = (M/N) * SUM(a) - The sum is reduced proportionally to the number of elements
Elements of a participate in computation of b in the order of their occurrence in a
Let's consider your example of downsizing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to six elements. The total for your array is 55, so the total for the new array would be (6/10)*55 = 33. We can achieve this total in two steps:
Walk the array a totaling its elements until we've reached the integer part of N/M fraction (it must be an improper fraction by rule 1 above)
Let's say that a[i] was the last element of a that we could take as a whole in the current iteration. Take the fraction of a[i+1] equal to the fractional part of N/M
Continue to the next number starting with the remaining fraction of a[i+1]
Once you are done, your array b will contain M numbers totaling SUM(a). Walk the array once more and divide each element by N/M (i.e. multiply it by M/N).
Here is how it works with your example:
b[0] = a[0] + (2/3)*a[1] = 2.33333
b[1] = (1/3)*a[1] + a[2] + (1/3)*a[3] = 5
b[2] = (2/3)*a[3] + a[4] = 7.66666
b[3] = a[5] + (2/3)*a[6] = 10.6666
b[4] = (1/3)*a[6] + a[7] + (1/3)*a[8] = 13.3333
b[5] = (2/3)*a[8] + a[9] = 16
--------
Total = 55
Scaling down by 6/10 produces the final result:
1.4 3 4.6 6.4 8 9.6 (Total = 33)
Here is a simple implementation in C++:
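#include <cstddef>
#include <vector>

// a has N elements; b must already hold M (0 < M <= N) zero-initialized elements.
void downsize(const std::vector<double>& a, std::vector<double>& b) {
    const double need = static_cast<double>(a.size()) / b.size(); // N/M inputs per output
    const double eps = 1e-9;   // absorbs floating-point round-off in the comparison below
    double have = 0;           // portion of the current b[pos] already filled
    std::size_t pos = 0;
    for (std::size_t i = 0; i != a.size(); i++) {
        if (need >= have + 1 - eps) {
            b[pos] += a[i];             // all of a[i] fits into b[pos]
            have++;
        } else {
            double frac = need - have;  // frac is less than 1 because of the "if" condition
            b[pos++] += frac * a[i];    // frac of a[i] goes to current element of b
            if (pos == b.size()) break; // b is full; without this, b[pos] would be out of range
            have = 1 - frac;
            b[pos] += have * a[i];      // (1-frac) of a[i] goes to the next position of b
        }
    }
    for (std::size_t i = 0; i != b.size(); i++) {
        b[i] /= need;                   // turn each weighted sum into a mean
    }
}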
You will need to resort to some form of interpolation, as the number of elements to average isn't an integer.
You can consider computing the prefix sum of the array, starting from 0, i.e.
0 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
yields by summation (the value at position k is the sum of the first k elements)
0 1 2 3 4 5 6 7 8 9 10
0 1 3 6 10 15 21 28 36 45 55
Then perform linear interpolation to get the intermediate values that you are lacking, at 0*, 10/6, 20/6, 30/6*, 40/6, 50/6, 60/6*. (Those with an asterisk are readily available.)
0 1 10/6 2 3 20/6 4 5 6 40/6 7 8 50/6 9 10
0 1 7/3 3 6 22/3 10 15 21 77/3 28 36 39 45 55
Now you get fractional sums by subtracting values in pairs and dividing by 10/6. The first average is
(7/3 - 0)/(10/6) = 7/5
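The prefix-sum approach can be sketched in C++ as follows. It assumes a prefix sum with S(0) = 0 and S(k) = a[0] + ... + a[k-1], interpolates it linearly at the boundaries j * N / M, and divides each difference by N / M; `downsample_prefix` is an illustrative name, not a library function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Downsamples a (N elements) to m averages via a linearly interpolated prefix sum.
std::vector<double> downsample_prefix(const std::vector<double>& a, std::size_t m) {
    const std::size_t n = a.size();
    assert(m > 0 && m <= n);
    std::vector<double> S(n + 1, 0.0);  // S[k] = a[0] + ... + a[k-1]
    for (std::size_t k = 0; k < n; ++k) S[k + 1] = S[k] + a[k];

    // Linear interpolation of S at a real position x in [0, n].
    auto S_at = [&](double x) {
        std::size_t k = static_cast<std::size_t>(std::floor(x));
        if (k >= n) return S[n];
        double t = x - static_cast<double>(k);
        return S[k] + t * (S[k + 1] - S[k]);  // S[k + 1] - S[k] is a[k]
    };

    const double width = static_cast<double>(n) / m;  // N/M input elements per output
    std::vector<double> out(m);
    for (std::size_t j = 0; j < m; ++j) {
        double lo = S_at(static_cast<double>(j) * n / m);
        double hi = S_at(static_cast<double>(j + 1) * n / m);
        out[j] = (hi - lo) / width;  // fractional sum divided by the window width
    }
    return out;
}
```

For 1..10 downsized to 6 elements it yields 1.4 3 4.6 6.4 8 9.6, the same values as the proportional-splitting answer above, and with m = 5 it reproduces the pairwise means 1.5 3.5 5.5 7.5 9.5.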
I can't think of anything in the C++ library that will crank out something like this, all fully cooked and ready to go.
So you'll have to, pretty much, roll up your sleeves and go to work. At this point, the question of what's the "efficient" way of doing it boils down to its very basics. Which means:
1) Calculate how big the output array should be. Based on the description of the issue, you should be able to make that calculation even before looking at the values in the input array. You know the input array's size(), you can calculate the size() of the destination array.
2) So, you resize() the destination array up front. Now, you no longer need to worry about the time wasted in growing the size of the dynamic output array, incrementally, as you go through the input array, making your calculations.
3) So what's left is the actual work: iterating over the input array, and calculating the downsized values.
auto b=input_array.begin();
auto e=input_array.end();
auto p=output_array.begin();
I don't see many other options here besides brute-force iteration and calculation. Iterate from b to e, getting your samples, calculating each downsized value, and saving the resulting value into *p++.
I want to split this data:
ID x y
1 2.5 3.5
1 85.1 74.1
2 2.6 3.4
2 86.0 69.8
3 25.8 32.9
3 84.4 68.2
4 2.8 3.2
4 24.1 31.8
4 83.2 67.4
I was able to match rows with their partners, like this:
ID x y ID x y
1 2.5 3.5 1 85.1 74.1
2 2.6 3.4 2 86.0 69.8
3 25.8 32.9
4 24.1 31.8
However, as you can see, some of the rows for ID 4 were placed wrongly, because they were just appended to the next few rows. I want to split them properly without the complex logic I am already using. Can someone give me an algorithm or an idea?
It should look like this:
ID  x     y       ID  x     y       ID  x     y
1   2.5   3.5     1   85.1  74.1    3   25.8  32.9
2   2.6   3.4     2   86.0  69.8    4   24.1  31.8
4   2.8   3.2     3   84.4  68.2
                  4   83.2  67.4
It seems that your question is really about clustering, and that the ID column has nothing to do with determining which points correspond to which.
A common algorithm to achieve that would be k-means clustering. However, your question implies that you don't know the number of clusters in advance. This complicates matters, and there have already been a lot of questions asked here on StackOverflow regarding this issue:
Kmeans without knowing the number of clusters?
compute clustersize automatically for kmeans
How do I determine k when using k-means clustering?
How to optimal K in K - Means Algorithm
K-Means Algorithm
Unfortunately, there is no "right" solution for this. Two clusters in one specific problem could be indeed considered as one cluster in another problem. This is why you'll have to decide that for yourself.
Nevertheless, if you're looking for something simple (and probably inaccurate), you can use Euclidean distance as a measure. Compute the distances between points (e.g. using pdist), and group points where the distance falls below a certain threshold.
Example
%// Sample input
A = [1, 2.5, 3.5;
1, 85.1, 74.1;
2, 2.6, 3.4;
2, 86.0, 69.8;
3, 25.8, 32.9;
3, 84.4, 68.2;
4, 2.8, 3.2;
4, 24.1, 31.8;
4, 83.2, 67.4];
%// Cluster points
pairs = nchoosek(1:size(A, 1), 2);                                  %// Rows of pairs
d = sqrt(sum((A(pairs(:, 1), 2:3) - A(pairs(:, 2), 2:3)) .^ 2, 2));  %// d = pdist(A(:, 2:3)); the ID column is ignored
thr = d < 10;                                                       %// Distances below threshold
kk = 1;
idx = 1:size(A, 1);
C = cell(size(idx));                                                %// Preallocate memory
while any(idx)
    s = find(idx, 1);                                               %// First point not yet assigned
    p = pairs(pairs(:, 1) == s & thr, :);                           %// Pairs linking s to its close neighbours
    x = unique([s; p(:)]);                                          %// Include s itself, so an isolated point cannot stall the loop
    C{kk} = A(x, :);
    idx(x) = 0;                                                     %// Remove indices from list
    kk = kk + 1;
end
C = C(~cellfun(@isempty, C));                                       %// Remove empty cells
The result is a cell array C, each cell representing a cluster:
C{1} =
1.0000 2.5000 3.5000
2.0000 2.6000 3.4000
4.0000 2.8000 3.2000
C{2} =
1.0000 85.1000 74.1000
2.0000 86.0000 69.8000
3.0000 84.4000 68.2000
4.0000 83.2000 67.4000
C{3} =
3.0000 25.8000 32.9000
4.0000 24.1000 31.8000
Note that this simple approach has the flaw of restricting the cluster radius to the threshold. However, you wanted a simple solution, so bear in mind that it gets complicated as you add more "clustering logic" to the algorithm.
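For readers working in C++, here is a sketch of the same distance-threshold idea. It differs from the MATLAB loop in one respect, stated plainly: it uses union-find to take the connected components of the "distance below threshold" graph, so points are linked transitively rather than only to the first unassigned point. The `Point` struct, the function names and the decision to ignore the ID column are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point { double x, y; };

// Union-find root lookup with path halving.
std::size_t find_root(std::vector<std::size_t>& parent, std::size_t i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Groups points whose Euclidean distance is below thr, transitively.
// Returns a cluster label per point: 0, 1, 2, ... in order of first appearance.
std::vector<int> threshold_clusters(const std::vector<Point>& pts, double thr) {
    assert(thr > 0);
    const std::size_t n = pts.size();
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});  // every point starts alone
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (std::hypot(pts[i].x - pts[j].x, pts[i].y - pts[j].y) < thr) {
                std::size_t ri = find_root(parent, i);
                std::size_t rj = find_root(parent, j);
                if (ri != rj) parent[ri] = rj;  // merge the two clusters
            }
    std::vector<int> label(n, -1), root_label(n, -1);
    int next = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t r = find_root(parent, i);
        if (root_label[r] < 0) root_label[r] = next++;
        label[i] = root_label[r];
    }
    return label;
}
```

The pairwise loop is O(n^2), which matches the MATLAB version; the threshold still decides what counts as "close", so the radius caveat above applies here too.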
How can I efficiently generate permutations of a number (or of the characters in a word) if I need certain characters/digits at specified positions?
For example: generate all numbers with the digit 3 in the second position from the start and the digit 1 in the second position from the end. Each digit in the number has to be unique, and you can only choose from the digits 1-5.
4 3 2 1 5
4 3 5 1 2
2 3 4 1 5
2 3 5 1 4
5 3 2 1 4
5 3 4 1 2
I know there's a next_permutation function, so I can prepare an array with the numbers {4, 2, 5} and pass it to this function in a loop, but how do I handle the fixed positions?
Generate all permutations of 2 4 5 and insert 3 and 1 in your output routine. Just remember the positions where they have to be:
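#include <algorithm>
#include <iostream>
#include <map>

int main() {
    int perm[3] = {2, 4, 5};   // must start sorted so next_permutation visits every order
    const int N = sizeof(perm) / sizeof(int);
    std::map<int,int> fixed;   // note: zero-indexed
    fixed[1] = 3;
    fixed[3] = 1;
    const int total = N + static_cast<int>(fixed.size());  // 5 output positions
    do {
        for (int i=0, j=0; i<total; i++)
            if (fixed.find(i) != fixed.end())
                std::cout << " " << fixed[i];
            else
                std::cout << " " << perm[j++];
        std::cout << std::endl;
    } while (std::next_permutation(perm, perm + N));
}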
outputs
2 3 4 1 5
2 3 5 1 4
4 3 2 1 5
4 3 5 1 2
5 3 2 1 4
5 3 4 1 2
I've read the other answers and I believe they are better than mine for your specific problem. However I'm answering in case someone needs a generalized solution to your problem.
I recently needed to generate all permutations of the 3 separate continuous ranges [first1, last1) + [first2, last2) + [first3, last3). This corresponds to your case with all three ranges being of length 1 and separated by only 1 element. In my case the only restriction is that distance(first3, last3) >= distance(first1, last1) + distance(first2, last2) (which I'm sure could be relaxed with more computational expense).
My application was to generate each unique permutation but not its reverse. The code is here:
http://howardhinnant.github.io/combinations.html
And the specific applicable function is combine_discontinuous3 (which creates combinations), and its use in reversible_permutation::operator() which creates the permutations.
This isn't a ready-made packaged solution to your problem. But it is a tool set that could be used to solve generalizations of your problem. Again, for your exact simple problem, I recommend the simpler solutions others have already offered.
Remember at which places you want your fixed numbers. Remove them from the array.
Generate permutations as usual. After every permutation, insert your fixed numbers to the spots where they should appear, and output.
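The remove, permute and reinsert recipe above can be wrapped in a reusable function. A C++ sketch, assuming the fixed positions are given as a zero-based `std::map` from position to value; the name `permutations_with_fixed` is hypothetical. The free digits are sorted first, so passing {4, 2, 5} in the question's order still yields all six arrangements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Returns every arrangement of free_digits into the non-fixed slots, with the
// values in `fixed` (zero-based position -> value) placed at their positions.
std::vector<std::vector<int>> permutations_with_fixed(std::vector<int> free_digits,
                                                      const std::map<std::size_t, int>& fixed) {
    const std::size_t total = free_digits.size() + fixed.size();
    for (const auto& kv : fixed) assert(kv.first < total);  // every fixed slot must exist
    std::sort(free_digits.begin(), free_digits.end());      // start from the smallest order
    std::vector<std::vector<int>> result;
    do {
        std::vector<int> out;
        out.reserve(total);
        std::size_t j = 0;  // next free digit to place
        for (std::size_t pos = 0; pos < total; ++pos) {
            auto it = fixed.find(pos);
            out.push_back(it != fixed.end() ? it->second : free_digits[j++]);
        }
        result.push_back(out);
    } while (std::next_permutation(free_digits.begin(), free_digits.end()));
    return result;
}
```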
If you have a set of digits {4,3,2,1,5} and you know that 3 and 1 will not be permuted, then you can take them out of the set and just generate a powerset for {4, 2, 5}. All you have to do after that is just insert 1 and 3 in their respective positions for each set in the power set.
I posted a similar question and in there you can see the code for a powerset.