How to calc percentage of coverage in an array of 1-100 using C++? - c++

This is for an assignment so I would appreciate no direct answers; rather, any logic help with my algorithms (or pointing out any logic flaws) would be incredibly helpful and appreciated!
I have a program that receives "n" number of elements from the user to put into a single-dimensional array.
The array uses random generated numbers.
IE: If the user inputs 88, a list of 88 random numbers (each between 1 and 100) is generated.
"n" has a max of 100.
I must write 2 functions.
Function #1:
Determine the percentage of numbers that appear in the array of "n" elements.
So any duplicates would decrease the percentage.
And any missing numbers would decrease the percentage.
Thus if n = 75, then you have a maximum possible %age of 0.75
(this max %age decreases if there are duplicates)
This function basically calls upon function #2.
FUNCTION HEADER(GIVEN) = "double coverage (int array[], int n)"
Function #2:
Using a linear search, search for the key (key being the current # in the list of 1 to 100, which should be from the loop in function #1), in the array.
Return the position if that key is found in the array
(IE: if this is the loops 40th run, it will be at the variable "39",
and will go through every instance of an element in the array
and if any element is equal to 39, all of those positions will be returned?
I believe that is what our prof is asking)
Return -1 if the key is not found.
Given notes = "Only function #1 calls function #2,
and does so to find out if a certain value (key) is found among the first n elements of the array."
FUNCTION HEADER(GIVEN) = "int search (int array[], int n, int key)"
What I really need help with is the logic for the algorithm.
I would appreciate any help with this as I would approach this problem completely differently than our professor wants us.
My first thoughts would be to loop through function #1 for all variable keys of 1 through 100.
And in that loop, go to the search function (function #2), in which a loop would go through every number in the array and add to a counter if a number was (1)a duplicate or (2) non-existent in the array. Then I would subtract that counter from 100. Thus if all numbers were included in the array except for the #40 and #41, and then #77 was a duplicate , the total percentage of coverage would be 100 - 3 = 97%.
Although as I type this I think that may in and of itself be flawed? ^ Because with a max of 100 elements in the array, if the only number missing was 99, then you would subtract 1 for having that number missing, and then if there was a duplicate you would subtract another 1, and thus your percentage of coverage would be (100-2) = 98, when clearly it ought to be 99.
And this ^ is exactly why I would REALLY appreciate any logic help. :)
I know I am having some problems approaching this logically.
I think I can figure out the coding with a relative amount of ease; what I am struggling with the most is the steps to take. So any pseudocode ideas would be amazing!
(I can post my entire program code so far if necessary for anyone, just ask, but it is rather long as of now as I have many other functions performing other tasks in the program)

I may be mistaken, but as I read it all you need to do is:
write a function that loops through the array of n elements to find a given number in it. It would return the index of the first occurrence, or a negative value in case the number cannot be found in the array.
write a loop to call the function for all numbers 1 to 100 and count the finds. Then divide the result by 100.

I'm not sure I understand the whole thing correctly, but you can do it in one function. If you don't care about speed, put the array into a vector, loop through 1..100 and use the boost find_nth function http://www.boost.org/doc/libs/1_41_0/doc/html/boost/algorithm/find_nth.html. There you can check whether the current value has a second occurrence in the vector: if it does, you decrease the percentage; if not, you don't. If you only want to know whether a number is in the array at all, use http://www.cplusplus.com/reference/algorithm/find/. I don't understand how the percentage is supposed to decrease, so that part is up to you, and I don't really understand the second function, but if it is a linear search, again use find.
P.S. Vector description http://www.cplusplus.com/reference/vector/vector/begin/.

You want to know how many numbers in the range [1, 100] appear in your given array. You can search for each number in turn:
size_t count_unique(int array[], size_t n)
{
size_t result = 0;
for (int i = 1; i <= 100; ++i)
{
if (contains(array, n, i))
{
++result;
}
}
return result;
}
All you still need is an implementation of the containment check contains(array, n, i), and to transform the unique count into a percentage (by using division).
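As a rough sketch of how the two pieces could fit together, here is one way to fill in the function headers given in the assignment. The bodies are only my reading of the task, not the professor's reference solution: search is a plain linear search that returns the first matching position, and coverage counts how many of the keys 1 to 100 are found and divides by 100.

```cpp
#include <cassert>

// Linear search: position of the first element equal to key among the
// first n elements, or -1 if key does not occur.
int search(int array[], int n, int key)
{
    for (int i = 0; i < n; ++i)
    {
        if (array[i] == key)
            return i;
    }
    return -1;
}

// Fraction of the values 1..100 that occur at least once in the array.
// Duplicates add nothing, so they lower the result compared to n / 100.
double coverage(int array[], int n)
{
    assert(n >= 0);
    int found = 0;
    for (int key = 1; key <= 100; ++key)
    {
        if (search(array, n, key) != -1)
            ++found;
    }
    return found / 100.0;
}
```

With the array {1, 2, 2, 3} and n = 4 this counts three distinct values, so the duplicate 2 costs one percentage point compared to the maximum of 0.04.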

Related

efficiently mask-out exactly 30% of array with 1M entries

My question's header is similar to this link, however that one wasn't answered to my expectations.
I have an array of integers (1 000 000 entries), and need to mask exactly 30% of elements.
My approach is to loop over elements and roll a dice for each one. Doing it in a non-interrupted manner is good for cache coherency.
As soon as I notice that exactly 300 000 of elements were indeed masked, I need to stop. However, I might reach the end of an array and have only 200 000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, and not being biased towards picking some elements?
Edit:
//I need to preserve the order of elements.
//For instance, I might have:
[12, 14, 1, 24, 5, 8]
//Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero
Just do a fisher-yates shuffle but stop at only 300000 iterations. The last 300000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
std::size_t r = std::rand() % size;
std::swap(array[r], array[size-1]);
--size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
std::size_t r = rand() % 1000000;
if(array[r] != 0)
{
array[r] = 0;
++i;
}
}
Which has no bias and does not reorder elements, but is inferior to fisher yates, especially for high percentages.
When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take a decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );
void mask_random_sublist( int *list, int list_count, int sub_count )
{
if (list_count > SOME_SMALL_THRESHOLD)
{
int list_count_lo = list_count / 2; // arbitrary
int list_count_hi = list_count - list_count_lo;
int sub_count_lo = random_split_sub_count_lo( list_count, sub_count, list_count_lo );
int sub_count_hi = sub_count - sub_count_lo; // the rest of the masked entries go to the upper part
mask_random_sublist( list, list_count_lo, sub_count_lo );
mask_random_sublist( list + list_count_lo, list_count_hi, sub_count_hi );
}
else
{
// insert here some simple/obvious/naive implementation that
// would be ludicrous to use on a massive list due to complexity,
// but which works great on very small lists. I'm assuming you
// can do this part yourself.
}
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.
Here's one suggestion. One million bits is only about 125 KB, which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct a collection of all the indices and, for each of the 700,000 you want removed (or 300,000 if, as mentioned, masking means you want to process the other ones):
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.
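The three steps above could be sketched roughly as follows. The function name mask_exactly and the use of std::mt19937 are my own assumptions for illustration; the indices are shuffled in a separate array, so the original array keeps its order and only the chosen entries are set to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: sets exactly k distinct, uniformly chosen
// entries of data to zero without moving any other element.
void mask_exactly(std::vector<int>& data, std::size_t k, std::mt19937& rng)
{
    assert(k <= data.size());
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0}); // 0, 1, 2, ...
    std::size_t remaining = idx.size();
    for (std::size_t picked = 0; picked < k; ++picked)
    {
        // pick one at random from the remaining set
        std::uniform_int_distribution<std::size_t> dist(0, remaining - 1);
        const std::size_t r = dist(rng);
        data[idx[r]] = 0;
        // copy the final element over the one selected
        idx[r] = idx[remaining - 1];
        // reduce the set size
        --remaining;
    }
}
```

The price is one extra array of indices, which for a million entries is several megabytes, so this trades memory for a guaranteed single pass of exactly k random draws.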
You want reservoir sampling. Sample code courtesy of Wikipedia:
(*
S has items to sample, R will contain the result
*)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
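For the masking problem the items being sampled would be array indices rather than values. A possible C++ translation of the pseudocode, using 0-based indices, might look like the sketch below; reservoir_indices is a made-up name and std::mt19937 is just one choice of generator. The caller would then set the array entry at each returned index to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns k distinct indices from 0..n-1,
// chosen by reservoir sampling.
std::vector<std::size_t> reservoir_indices(std::size_t n, std::size_t k,
                                           std::mt19937& rng)
{
    assert(k <= n);
    std::vector<std::size_t> reservoir(k);
    // fill the reservoir with the first k indices
    for (std::size_t i = 0; i < k; ++i)
        reservoir[i] = i;
    // replace entries with gradually decreasing probability
    for (std::size_t i = k; i < n; ++i)
    {
        std::uniform_int_distribution<std::size_t> dist(0, i); // inclusive range
        const std::size_t j = dist(rng);
        if (j < k)
            reservoir[j] = i;
    }
    return reservoir;
}
```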

Reaching from first index to last with minimum product without using Graphs?

Solving this problem on codechef:
After visiting a childhood friend, Chef wants to get back to his home.
Friend lives at the first street, and Chef himself lives at the N-th
(and the last) street. Their city is a bit special: you can move from
the X-th street to the Y-th street if and only if 1 <= Y - X <= K,
where K is the integer value that is given to you. Chef wants to get
to home in such a way that the product of all the visited streets'
special numbers is minimal (including the first and the N-th street).
Please, help him to find such a product.
Input
The first line of input consists of two integer numbers - N and K -
the number of streets and the value of K respectively. The second line
consists of N numbers - A1, A2, ..., AN respectively, where Ai equals
to the special number of the i-th street.
Output
Please output the value of the minimal possible product, modulo
1000000007.
Constraints
1 ≤ N ≤ 10^5
1 ≤ Ai ≤ 10^5
1 ≤ K ≤ N
Example
Input:
4 2
1 2 3 4
Output:
8
It could be solved using graphs based on this tutorial
I tried to solve it without using graphs and just using recursion and DP.
My approach:
Take an array and calculate the min product to reach every index and store it in the respective index.
This could be calculated using top down approach and recursively sending index (eligible) until starting index is reached.
Out of all calculated values store the minimum one.
If it is already calculated return it else calculate.
CODE:
#include<iostream>
#include<cstdio>
#define LI long int
#define MAX 100009
#define MOD 1000000007
using namespace std;
LI dp[MAX]={0};
LI ar[MAX],k,orig;
void cal(LI n)
{
if(n==0)
return;
if(dp[n]!=0)
return;
LI minn=MAX;
for(LI i=n-1;i>=0;i--)
{
if(ar[n]-ar[i]<=k && ar[n]-ar[i]>=1)
{
cal(i);
minn=(min(dp[i]*ar[n],minn))%MOD;
}
}
dp[n]=minn%MOD;
return;
}
int main()
{
LI n,i;
scanf("%ld %ld",&n,&k);
orig=n;
for(i=0;i<n;i++)
scanf("%ld",&ar[i]);
dp[0]=ar[0];
cal(n-1);
if(dp[n-1]==MAX)
printf("0");
else printf("%ld",dp[n-1]);
return 0;
}
It's been 2 days and I have checked every corner case and constraint, but it still gives Wrong Answer! What's wrong with the solution?
Need Help.
Analysis
There are many problems. Here is what I found:
You restrict the product to a value lower than 100009 without reason. The product can be much higher than that (this is indeed the reason why the problem only asks for the value modulo 1000000007)
You restrict your moves to streets whose special numbers differ by at most K, whereas the problem statement says that you can move between any two streets whose indices differ by at most K
In your dynamic programming function you compute the product and store the product modulo 1000000007. This can lead to a problem because a big number can have a smaller remainder than a smaller number. This may corrupt later computations.
The integral type you use, long int, is too short.
The complexity of your algorithm is too high.
Of all these problems, the last one is the most serious. I fixed it by changing the whole approach and using a better data structure.
1st Problem
In your main() function:
if(dp[n-1]==MAX)
printf("0");
In your cal() function:
LI minn=MAX;
You should replace this line with:
LI minn = std::numeric_limits<LI>::max();
Do not forget to:
#include <limits>
2nd Problem
for(LI i=n-1;i>=0;i--)
{
if(ar[n]-ar[i]<=k && ar[n]-ar[i]>=1)
{
. . .
}
}
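You should replace the for loop header with the following, which also stops at index 0 so that the loop never reads before the start of the array:
for(LI i=n-1;i>=0 && i>=n-k;i--)
And remove altogether the condition on the special numbers.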
3rd Problem
You are looking for the path whose product of special numbers is the lowest. In your current setting, you compare path's product after having taken the modulo of the product. This is wrong, as the modulo of a higher number may become very low (for instance a path whose product is 1000000008 will have a modulo of 1 and you will choose this path, even if there is a path whose product is only 2).
This means you should compare the real products, without taking their modulo. As these products can become very high you should take their logarithm. This will allow you to compare the products with a simple double. Remember that:
log(a*b) = log(a) + log(b)
4th Problem
Use unsigned long long.
5th Problem
I fixed all these issues and submitted on codechef CHRL4. I got all but one test case accepted. The testcase not accepted was because of a timeout. This is due to the fact that your algorithm has got a complexity of O(k*n).
You can achieve O(n) complexity using a bottom-up dynamic programming approach instead of top-down, together with a data structure that returns the minimum log value over the k previous streets. Look up the sliding window minimum algorithm to see how to do this.
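To make the last point concrete, here is a minimal sketch of a bottom-up DP combined with a sliding window minimum kept in a deque. It is not the submitted solution referenced below; the function name min_product_mod is hypothetical, and it assumes that comparing sums of double-precision logarithms is accurate enough to rank the paths (two products that are extremely close could in principle be misordered).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical helper: minimal product of special numbers on a path from
// street 0 to street n-1 with steps of 1..k, returned modulo 1000000007.
std::uint64_t min_product_mod(const std::vector<int>& a, std::size_t k)
{
    assert(!a.empty() && k >= 1);
    const std::uint64_t MOD = 1000000007ULL;
    const std::size_t n = a.size();
    std::vector<double> logp(n);        // log of the best real product
    std::vector<std::uint64_t> modp(n); // the same product, modulo MOD
    std::deque<std::size_t> window;     // candidate indices, logp increasing

    logp[0] = std::log(static_cast<double>(a[0]));
    modp[0] = static_cast<std::uint64_t>(a[0]) % MOD;
    window.push_back(0);
    for (std::size_t i = 1; i < n; ++i)
    {
        // drop candidates that are more than k streets behind i
        while (window.front() + k < i)
            window.pop_front();
        const std::size_t best = window.front(); // smallest log in the window
        logp[i] = logp[best] + std::log(static_cast<double>(a[i]));
        modp[i] = modp[best] * (static_cast<std::uint64_t>(a[i]) % MOD) % MOD;
        // keep the deque sorted: remove candidates no better than i
        while (!window.empty() && logp[window.back()] >= logp[i])
            window.pop_back();
        window.push_back(i);
    }
    return modp[n - 1];
}
```

Each index enters and leaves the deque at most once, which is where the O(n) bound comes from. The comparison is done on the logarithms, and the modulo value is only carried along for the output.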
References
numeric_limits::max()
my own codechef CHRL4 solution: bottom-up dp + sliding window minimum

2 player team knowing maximum moves

Given a list of N players who are to play a 2 player game. Each of them are either well versed in making a particular move or they are not. Find out the maximum number of moves a 2-player team can know.
And also find out how many teams can know that maximum number of moves?
Example: Let us have 4 players and 5 moves, where the ith player is versed in the jth move if a[i][j] is 1, and is not if it is 0.
10101
11100
11010
00101
Here the maximum number of moves a 2-player team can know is 5, and there are two teams that can know that maximum number of moves.
Explanation : (1, 3) and (3, 4) know all the 5 moves. So the maximal moves a 2-player team knows is 5, and only 2 teams can achieve this.
My approach : For each pair of players I check, for every move, whether either of the two players is versed in it, and for each player I maintain the number of pairs he can make with later players that reach his local maximum move count.
vector<int> pairmemo;
vector<int> maxmemo;
for(int i=0;i<n;i++){
int mymax=INT_MIN;
int countpairs=0;
for(int j=i+1;j<n;j++){
int count=0;
for(int k=0;k<m;k++){
if(arr[i][k]==1 || arr[j][k]==1)
{
count++;
}
}
if(mymax<count){
mymax=count;
countpairs=0;
}
if(mymax==count){
countpairs++;
}
}
pairmemo.push_back(countpairs);
maxmemo.push_back(mymax);
}
Overall maximum of all N players is answer and count is corresponding sum of the pairs being calculated.
int maxi=INT_MIN;
for(int i=0;i<n;i++){
if(maxi<maxmemo[i])
maxi=maxmemo[i];
}
int countmaxi=0;
for(int i=0;i<n;i++){
if(maxmemo[i]==maxi){
countmaxi+=pairmemo[i];
}
}
cout<<maxi<<"\n";
cout<<countmaxi<<"\n";
Time complexity : O((N^2)*M)
Code :
How can i improve it?
Constraints : N<= 3000 and M<=1000
If you represent each set of moves by a very large integer, the problem boils down to finding the pairs of players (I, J) which have the maximum number of bits set in MovesI OR MovesJ.
So, you can use bit-packing and compress all the information on moves into an array of 64-bit unsigned integers. According to the constraints it would take 16 of them per player (1000 bits / 64, rounded up). So, for each pair of players you OR the corresponding arrays and count the number of ones. This would take O(N^2 * 16), which would run pretty fast given the constraints.
Example:
Lets say given matrix is
11010
00011
and you used 4-bit integer for packing it.
It would look like:
1101-0000
0001-1000
that is,
13,0
1,8
After OR the moves array for 2 player team becomes 13,8, now count the bits which are one. You have to optimize the counting of bits also, for that read the accepted answer here, otherwise the factor M would appear in complexity. Just maintain one count variable and one maxNumberOfBitsSet variable as you process the pairs.
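A rough sketch of this idea in C++ could lean on std::bitset, which does the packing and whose count() member does the bit counting. The function name best_teams and the constant kMaxMoves (taken from the constraint M <= 1000) are my own choices for illustration, and the input is assumed to be one string of '0'/'1' characters per player.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t kMaxMoves = 1000; // from the constraint M <= 1000

// Hypothetical helper: returns {maximum moves known by a 2-player team,
// number of teams that reach that maximum}.
std::pair<std::size_t, std::size_t>
best_teams(const std::vector<std::string>& rows)
{
    std::vector<std::bitset<kMaxMoves>> moves;
    moves.reserve(rows.size());
    for (const std::string& row : rows)
    {
        assert(row.size() <= kMaxMoves);
        moves.emplace_back(row); // packs the '0'/'1' characters into bits
    }

    std::size_t maxBits = 0;
    std::size_t teams = 0;
    for (std::size_t i = 0; i < moves.size(); ++i)
    {
        for (std::size_t j = i + 1; j < moves.size(); ++j)
        {
            const std::size_t bits = (moves[i] | moves[j]).count();
            if (bits > maxBits)
            {
                maxBits = bits; // new maximum, restart the team count
                teams = 1;
            }
            else if (bits == maxBits)
            {
                ++teams;
            }
        }
    }
    return {maxBits, teams};
}
```

For the four players in the question this yields a maximum of 5 with 2 teams.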
What I'll do is:
1. Do a logical OR between all the possible pairs - O(N^2) - and store its SUM (the number of set bits) in a 2D array, ignoring the symmetric half below the diagonal. (That way we save half of the calculations - see example)
2. Find the max value in the 2D array (can be done while doing task 1) -> O(1)
3. Count how many cells in the 2D array are equal to the maximum value from task 2 - O(N^2)
sum: 2*O(N^2)+ O(1) => O(N^2)
Example (using the data in the question, with letter indexes):
A[10101] B[11100] C[11010] D[00101]
Task 1:
[A|B] = 11101 = SUM(4)
[A|C] = 11111 = SUM(5)
[A|D] = 10101 = SUM(3)
[B|C] = 11110 = SUM(4)
[B|D] = 11101 = SUM(4)
[C|D] = 11111 = SUM(5)
Task 2 (done while doing task 1):
Max = 5
Task 3:
Count = 2
By the way, O(N^2) is the minimum possible since you HAVE to check all the possible pairs.
Since you have to find all solutions, unless you find a way to find a count without actually finding the solutions themselves, you have to actually look at or eliminate all possible solutions. So the worst case will always be O(N^2*M), which I'll call O(n^3) as long as N and M are both big and similar size.
However, you can hope for much better performance on the average case by pruning.
Don't check every case. Find ways to eliminate combinations without checking them.
I would sum and store the total number of moves known to each player, and sort the array rows by that value. That should provide an easy check for exiting the loop early. Sorting at O(n log n) should be basically free in an O(n^3) algorithm.
Use Priyank's basic idea, except with bitsets, since you obviously can't use a fixed integer type with 3000 bits.
You may benefit from making a second array of bitsets for the columns, and use that as a mask for pruning players.

Find pair of elements in integer array such that abs(v[i]-v[j]) is minimized

Lets say we have int array with 5 elements: 1, 2, 3, 4, 5
What I need to do is to find minimum abs value of array's elements' subtraction:
We need to check like that
1-2 2-3 3-4 4-5
1-3 2-4 3-5
1-4 2-5
1-5
And find the minimum abs value of these subtractions. We can find it with 2 for loops. The question is, is there any algorithm for finding the value with only one for loop?
sort the list and subtract nearest two elements
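A minimal sketch of that idea, assuming std::sort is acceptable and that the vector has at least two elements (the name min_abs_diff is made up for this example). After sorting, the closest pair must be adjacent, so one pass over neighbouring elements is enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest |v[i] - v[j]| over all pairs i != j.
// Takes the vector by value so the caller's order is left untouched.
long long min_abs_diff(std::vector<int> v)
{
    assert(v.size() >= 2);
    std::sort(v.begin(), v.end());
    // long long avoids overflow when the values are far apart
    long long best = static_cast<long long>(v[1]) - v[0];
    for (std::size_t i = 2; i < v.size(); ++i)
        best = std::min(best, static_cast<long long>(v[i]) - v[i - 1]);
    return best;
}
```

This costs O(n log n) for the sort plus a single for loop for the scan.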
The provably best performing solution is asymptotically linear, O(n), up to constant factors.
This means that the time taken is proportional to the number of the elements in the array (which of course is the best we can do as we at least have to read every element of the array, which already takes O(n) time).
Here is one such O(n) solution (which also uses O(1) space if the list can be modified in-place):
int mindiff(vector<int>& v) // not const: the vector is sorted in place
{
IntRadixSort(v.begin(), v.end());
int best = INT_MAX; // from <climits>
for (size_t i = 0; i + 1 < v.size(); i++) // also safe for an empty vector
{
int diff = abs(v[i]-v[i+1]);
if (diff < best)
best = diff;
}
return best;
}
IntRadixSort is a linear time fixed-width integer sorting algorithm defined here:
http://en.wikipedia.org/wiki/Radix_sort
The concept is that you leverage the fixed-bitwidth nature of ints by partitioning them in a series of fixed passes on the bit positions, i.e. partition them on the high bit (32nd), then on the next highest (31st), then on the next (30th), and so on - which only takes linear time.
The problem is equivalent to sorting. Any sorting algorithm could be used, and at the end, return the difference between the nearest elements. A final pass over the data could be used to find that difference, or it could be maintained during the sort. Before the data is sorted the min difference between adjacent elements will be an upper bound.
So to do it without two loops, use a sorting algorithm that does not have two loops. In a way it feels like semantics, but recursive sorting algorithms will do it with only one loop. If this issue is the n(n+1)/2 subtractions required by the simple two loop case, you can use an O(n log n) algorithm.
No, unless you know the list is sorted, you need two
It's simple. Iterate in a for loop.
Keep 2 pairs of variables, "minpos" and "maxpos", and "minneg" and "maxneg".
Check the sign of each value you encounter and store the maximum positive number in "maxpos"
and the minimum positive number in "minpos"; do the same, in the other branch of the if, for numbers
less than zero. Now take the difference maxpos-minpos in one variable and
maxneg-minneg in another, and print the larger of the two. You will get
the desired result.
I believe you definitely know how to find the max and min in one for loop.
Correction: the above finds the maximum difference. For the minimum you need to
take the max and second max instead of the max and min :)
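This might help you:
// a[] holds n elements; m is the left index of the current pair, i the right one
int n=5;
int subtractmin=INT_MAX;
int m=0;
for(int i=1;m<n-1;i++){
if(abs(a[m]-a[i])<subtractmin)
subtractmin=abs(a[m]-a[i]);
if(i==n-1){ // all pairs starting at m are done, move on to the next m
m=m+1;
i=m; // the loop increment turns this into m+1
}
}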

USACO: Subsets (Inefficient)

I am trying to solve subsets from the USACO training gateway...
Problem Statement
For many sets of consecutive integers from 1 through N (1 <= N <= 39), one can partition the set into two sets whose sums are identical.
For example, if N=3, one can partition the set {1, 2, 3} in one way so that the sums of both subsets are identical:
{3} and {1,2}
This counts as a single partitioning (i.e., reversing the order counts as the same partitioning and thus does not increase the count of partitions).
If N=7, there are four ways to partition the set {1, 2, 3, ... 7} so that each partition has the same sum:
{1,6,7} and {2,3,4,5}
{2,5,7} and {1,3,4,6}
{3,4,7} and {1,2,5,6}
{1,2,4,7} and {3,5,6}
Given N, your program should print the number of ways a set containing the integers from 1 through N can be partitioned into two sets whose sums are identical. Print 0 if there are no such ways.
Your program must calculate the answer, not look it up from a table.
End
Before, I was running an O(N*2^N) algorithm by simply enumerating the subsets of the set and finding their sums.
Finding out how horribly inefficient that was, I moved on to mapping the sum sequences...
http://en.wikipedia.org/wiki/Composition_(number_theory)
After many coding problems to scrape out repetitions, still too slow, so I am back to square one :(.
Now that I look more closely at the problem, it looks like I should try to find a way to not find the sums, but actually go directly to the number of sums via some kind of formula.
If anyone can give me pointers on how to solve this problem, I'm all ears. I program in java, C++ and python.
Actually, there is a better and simpler solution. You should use Dynamic Programming instead. In your code, you would have an array of integers (whose size is the sum plus one), where the value at index i represents the number of ways to partition the numbers so that one of the parts has a sum of i. Here is what your code could look like in C++:
int values[N];
long long dp[sum+1]; //sum is the sum of the consecutive integers; long long because the counts overflow a 32-bit int for N close to 39
long long solve(){
if(sum%2==1)
return 0;
dp[0]=1;
for(int i=0; i<N; i++){
int val = values[i]; //values contains the consecutive integers
for(int j=sum-val; j>=0; j--){
dp[j+val]+=dp[j];
}
}
return dp[sum/2]/2;
}
This gives you an O(N^3) solution, which is by far fast enough for this problem.
I haven't tested this code, so there might be a syntax error or something, but you get the point. Let me know if you have any more questions.
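For reference, a self-contained variant of the same DP might look like the sketch below. The function name count_partitions is made up, the values 1..N are generated on the fly instead of being read from an array, and long long is used because the intermediate counts get too large for a 32-bit int when N is close to 39.

```cpp
#include <cassert>
#include <vector>

// Number of ways to split {1, ..., n} into two sets with equal sums.
long long count_partitions(int n)
{
    assert(n >= 1);
    const int sum = n * (n + 1) / 2;
    if (sum % 2 != 0)
        return 0; // an odd total can never be split evenly
    // dp[s] = number of subsets of the values seen so far with sum s
    std::vector<long long> dp(sum + 1, 0);
    dp[0] = 1;
    for (int val = 1; val <= n; ++val)
    {
        // go downwards so that each value is used at most once
        for (int j = sum - val; j >= 0; --j)
            dp[j + val] += dp[j];
    }
    // every partition is counted twice, once for each of its halves
    return dp[sum / 2] / 2;
}
```

For the two examples in the problem statement this gives 1 for N=3 and 4 for N=7.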
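This is the same thing as finding the coefficient of the x^0 term in the polynomial (x^1+1/x^1)(x^2+1/x^2)...(x^n+1/x^n) and halving it (each partition is counted twice, once per choice of which part gets the positive exponents), which should take about an upper bound of O(n^3).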