algorithm: find count of numbers within a given range - c++

Given an unsorted array of numbers that may contain duplicates, pre-process the array so that the count of numbers within a given range can be found in O(1) time.
For example, 7,2,3,2,4,1,4,6. The count of numbers both >= 2 and <= 5 is 5. (2,2,3,4,4).

Sort the array. For each element in the sorted array, insert that element into a hash table, with the value of the element as the key and its position in the array as the associated value. Any values that are skipped need to be inserted as well.
To find the number of items in a range, look up the positions of the values at each end of the range in the hash table, and subtract the lower from the upper to get the size of the range.
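A minimal sketch of that idea (names are illustrative): after sorting, pos[v] holds the number of elements strictly less than v, filled in for every integer from the minimum up to one past the maximum.

#include <algorithm>
#include <unordered_map>
#include <vector>

// pos[v] = number of elements strictly less than v, stored for every
// integer v in [min, max + 1], including values skipped in the input.
std::unordered_map<int, int> buildPositions(std::vector<int> a) {
    std::sort(a.begin(), a.end());
    std::unordered_map<int, int> pos;
    int i = 0;
    for (int v = a.front(); v <= a.back() + 1; ++v) {
        while (i < (int)a.size() && a[i] < v) ++i;
        pos[v] = i;
    }
    return pos;
}

// Count of elements in [lo, hi]; assumes both bounds fall inside the stored range.
int countInRange(const std::unordered_map<int, int>& pos, int lo, int hi) {
    return pos.at(hi + 1) - pos.at(lo);
}

For the sample array, countInRange(pos, 2, 5) gives pos.at(6) - pos.at(2) = 6 - 1 = 5.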

This sounds suspiciously like one of those clever interview questions some interviewers like to ask, which is usually associated with hints along the way to see how you think.
Regardless... one possible way of implementing this is to make a list of the counts of numbers equal to or less than the list index.
For example, from your list above, generate the list: 0, 1, 3, 4, 6, 6, 7, 8. Then you can count the numbers between 2 and 5 by subtracting list[1] from list[5].
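A minimal sketch of that list-building step, assuming values lie in [0, maxVal] (names are illustrative):

#include <vector>

// cnt[v] = how many elements are <= v; the count in [lo, hi] is then
// cnt[hi] - cnt[lo - 1], answered in O(1).
std::vector<int> buildCounts(const std::vector<int>& a, int maxVal) {
    std::vector<int> cnt(maxVal + 1, 0);
    for (int x : a) cnt[x]++;          // plain occurrence counts first
    for (int v = 1; v <= maxVal; ++v)
        cnt[v] += cnt[v - 1];          // turn them into running totals
    return cnt;
}

For the list above, buildCounts({7,2,3,2,4,1,4,6}, 7) yields 0, 1, 3, 4, 6, 6, 7, 8, and cnt[5] - cnt[1] = 5.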

Since we need O(1) access, the data structure needed will be memory-intensive.
With a hash table, access would take O(n) in the worst case.
My solution:
Build a 2D matrix a.
array = {2,3,2,4,1,4,6}. The range of numbers is 0 to 6, so n = 7 and we have to create an n×n matrix.
a[i][i] represents the total count of elements equal to i,
so a[4][4] = 2 (since 4 appears twice in the array),
a[5][5] = 0, and
a[5][2] = count of numbers both >= 2 and <= 5 = 5.
// preprocessing stage 1: populate a[v][v] with the total count of elements equal to v
int a[n][n] = {0};               // here n = 7 (values range over 0..6)
for (i = 0; i < len; i++)        // len = number of elements in the input array (7 here)
    a[array[i]][array[i]]++;
// stage 2
for (i = 1; i < n; i++)
    for (j = 0; j < i; j++)
        a[i][j] = a[i-1][j] + a[i][i];
// we are just adding the count of elements equal to i to each value in row i-1, which gives row i
Now the query (5,2) looks up a[5][2] and gives the answer in O(1).

#include <cstdio>
#include <cstring>

int main()
{
    int arr[8] = {7,2,3,2,4,1,4,6};
    int count[9];               // count[v] = occurrences of value v
    int total = 0;
    memset(count, 0, sizeof(count));
    for (int i = 0; i < 8; i++)
        count[arr[i]]++;
    for (int k = 0; k < 9; k++)
    {
        if (k >= 2 && k <= 5 && count[k] > 0)
        {
            total = total + count[k];
        }
    }
    printf("%d\n", total);
    return 0;
}

Related

minimum total move to balance array if we can increase/decrease a specific array element by 1

It is LeetCode 462.
I have an algorithm, but it fails some tests while passing others.
I tried to think it through, but I am not sure what corner case I overlooked.
We have one array of N elements. One move is defined as increasing OR decreasing one single element of the array by 1. We are trying to find the minimum number of moves to make all elements equal.
My idea is:
1. find the average
2. find the element closest to the average
3. sum together the difference between each element and the element closest to the average.
What am I missing? Please provide a counterexample.
class Solution {
public:
    int minMoves2(vector<int>& nums) {
        int sum = 0;
        for (int i = 0; i < nums.size(); i++) {
            sum += nums[i];
        }
        double avg = (double) sum / nums.size();
        int min = nums[0];
        int index = 0;
        for (int i = 0; i < nums.size(); i++) {
            if (abs(nums[i] - avg) <= abs(min - avg)) {
                min = nums[i];
                index = i;
            }
        }
        sum = 0;
        for (int i = 0; i < nums.size(); i++) {
            sum += abs(min - nums[i]);
        }
        return sum;
    }
};
Suppose the array is [1, 1, 10, 20, 100]. The average is 26.4, and the element closest to it is 20, so your solution would involve 19 + 19 + 10 + 0 + 80 = 128 moves. What if we target 10 instead? Then we have 9 + 9 + 0 + 10 + 90 = 118 moves. So this is a counterexample.
Suppose you decide to target changing all array elements to some value T. The question is: what's the right value for T? Given some value of T, we can ask whether increasing or decreasing T by 1 will improve or worsen our outcome. If we decrease T by 1, then all values greater than T need an extra move, and all those below need one move less. That means that if T is above the median, there are more values below it than above, and so we benefit from decreasing T. We can make the opposite argument if T is less than the median. From this we can conclude that the correct value of T is the median itself, which my example demonstrates (strictly speaking, when you have an even-sized array, T can be anywhere between the two middle elements).
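As a sketch of that fix (not the poster's code): replace the average/closest-element steps with std::nth_element, which places the median in the middle position in O(n) on average.

#include <algorithm>
#include <cstdlib>
#include <vector>

// Target the median instead of the element nearest the average.
int minMoves2(std::vector<int>& nums) {
    int mid = nums.size() / 2;
    std::nth_element(nums.begin(), nums.begin() + mid, nums.end());
    long long median = nums[mid], moves = 0;
    for (int x : nums)
        moves += std::abs(x - median);   // each element walks to the median
    return (int)moves;
}

On [1, 1, 10, 20, 100] this returns 118, matching the counterexample above.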

How to find unordered numbers (linear search)

A partially ordered list of n numbers is given, and I have to find the numbers that do not follow the order (just find them and count them).
There are no repeated numbers.
There are no negative numbers.
MAX = 100000 is the capacity of the list.
n, the number of elements in the list, is given by the user.
Example of two lists:
1 2 5 6 3
1 6 2 9 7 4 8 10 13
For the first list the output is 2, since 5 and 6 should both come after 3 and are out of order; for the second, the output is 3, since 6, 9 and 7 are out of order.
The most important condition in this problem: the search must be done in linear time, O(n); quadratic is the worst acceptable case.
Here is part of the code I developed (however, it is not valid since it does a quadratic search).
The unordered function compares each element of the array with the one returned by the minimal function; if it finds one bigger than that minimum, the element is out of order.
int unordered(int A[MAX], int n)
{
    int count = 0;
    for (int i = 0; i < n - 1; i++) {
        if (A[i] > minimal(A, n, i + 1)) {
            count++;
        }
    }
    return count;
}
"minimal" function takes the minimal of all the elements in the list between the one which is being compared in "unordered" function and the last of the list. i < elements <= n . Then, it is returned to be compared.
int minimal(int A[MAX], int n, int index)
{
    int minimal = 99999999;
    for (int i = index; i < n; i++) {
        if (A[i] <= minimal)
            minimal = A[i];
    }
    return minimal;
}
How can I do it more efficiently?
Start at the left of the list and compare the current number with the next one. Whenever the next is smaller than the current, remove the current number from the list and count one up. After removing a number at index n, set your current number to index n-1 and go on.
Because you remove at most n numbers from the list and compare the remaining ones in order, this algorithm is O(n).
I hope this helps. I must admit, though, that the task of finding numbers that are out of order isn't all that clear.
If O(n) space is no problem, you can first do a linear run (backwards) over the array and save the minimal value so far in another array. Instead of calling minimal you can then look up the minimum value in O(1) and your approach works in O(n).
Something like this:
int suffix_min[MAX]; // or: int *suffix_min = new int[n];
suffix_min[n-1] = A[n-1];
for (int i = n - 2; i >= 0; --i)
    suffix_min[i] = std::min(A[i], suffix_min[i+1]);
This can be done in O(1) space if you run the counting loop backwards as well, because then you only need to remember the current minimum.
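Putting the two pieces together, a sketch of the resulting linear version (the suffix-minimum array replaces the repeated calls to minimal):

#include <algorithm>

const int MAX = 100000;

int unordered(int A[MAX], int n) {
    static int suffix_min[MAX];        // suffix_min[i] = min of A[i..n-1]
    suffix_min[n - 1] = A[n - 1];
    for (int i = n - 2; i >= 0; --i)
        suffix_min[i] = std::min(A[i], suffix_min[i + 1]);

    int count = 0;
    for (int i = 0; i < n - 1; ++i)
        if (A[i] > suffix_min[i + 1])  // some later element is smaller
            ++count;
    return count;
}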
Others have suggested some great answers, but I have an extra way you can think of this problem. Using a stack.
Here's how it helps: push the leftmost element of the array onto the stack. Keep doing this as long as the element you are currently at (in the array) is greater than the top of the stack. While it is less, pop elements and increment your counter; once it is greater than the top of the stack, push it. In the end, when all array elements are processed, you'll have the count of those that are out of order.
Sample run: 1 5 6 3 7 4 10
Step 1: Stack => 1
Step 2: Stack => 1 5
Step 3: Stack => 1 5 6
Step 4: Now we see 3. While 3 is less than the top of the stack, pop and increment the counter. We get: Stack => 1 3 -- Count = 2
Step 5: Stack => 1 3 7
Step 6: We got 4 now. Repeat same logic. We get: Stack => 1 3 4 -- Count = 3
Step 7: Stack => 1 3 4 10 -- Count = 3. And we're done.
This should be O(N) for time and space. Correct me if I'm wrong.
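A short sketch of the stack approach described above:

#include <stack>
#include <vector>

// Pop (and count) every element that has a smaller element after it.
int countOutOfOrder(const std::vector<int>& a) {
    std::stack<int> st;
    int count = 0;
    for (int x : a) {
        while (!st.empty() && x < st.top()) {
            st.pop();                  // this element was out of order
            ++count;
        }
        st.push(x);
    }
    return count;
}

On the sample run 1 5 6 3 7 4 10 it returns 3, as traced above.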

Sum of difference of a number to an array of numbers

This is my problem.
Given an array of integers and another integer k, find the sum of differences of each element of the array and k.
For example if the array is 2, 4, 6, 8, 10 and k is 3
Sum of difference
= abs(2 - 3) + abs(4-3) + abs(6 - 3) + abs(8 - 3) + abs(10 - 3)
= 1 + 1 + 3 + 5 + 7
= 17
The array remains the same throughout and can contain up to 100000 elements and there will be 100000 different values of k to be tested. k may or may not be an element of the array. This has to be done within 1s or about 100M operations. How do I achieve this?
You can run multiple queries for sums of absolute differences in O(log N) if you add a preprocessing step which costs O(N * log N).
Sort the array, then for each item in the array store the sum of all numbers that are smaller than or equal to the corresponding item. This can be done in O(N * log N). Now you have a pair of arrays that look like this:
2 4 6 8 10 // <<== Original data
2 6 12 20 30 // <<== Partial sums
In addition, store the total T of all numbers in the array.
Now you can get sums of absolute differences by running a binary search for k on the original array and using the partial sums to compute the answer: take the count of numbers to the left of k times k and subtract the sum of those numbers; then take the sum of the numbers to the right of k and subtract the count of those numbers times k; finally, add the two results together. The partial sum of the numbers to the right can be computed by subtracting the partial sum on the left from the total T.
For k=3 binary search gets you to position 1.
Partial sum on the left is 2
Count of items on the left is 1
Partial sum on the right is (30-2)=28
Count of items on the right is 4
You compute (1*3-2) + (28-4*3) = 1 + 16 = 17
First sort the array, and then compute an array that stores the sums of the prefixes of the resulting sorted array. Let's denote this array p; you can compute p in linear time so that p[i] = a[0] + a[1] + ... + a[i]. Having this array, you can answer in constant time the question: what is the sum of elements a[x] + a[x+1] + ... + a[y] (i.e. with indices x to y)? To do that you simply compute p[y] - p[x-1] (take special care when x is 0).
Now, to answer a query of the type "what is the sum of absolute differences with k", we split the problem in two parts: the sum over the numbers greater than k, and the sum over the numbers smaller than k. To compute these, perform a binary search to find the position of k in the sorted a (denote it idx), and compute the sum of the values in a before idx (denote it s) and after idx (denote it S). The sum of absolute differences with k is then idx * k - s + S - (a.length - idx) * k. This of course is pseudocode, and by a.length I mean the number of elements in a.
After performing a linearithmic precomputation, you will be able to answer a query in O(log(n)). Note that this approach only makes sense if you plan to perform multiple queries; if you are only going to perform a single query, you cannot possibly go faster than O(n).
Just implementing dasblinkenlight's solution in "contest C++":
It does exactly as he says. It reads the values, sorts them, and stores the accumulated sums in V[i].second, where V[i].second is the accumulated sum up to i-1 (to simplify the algorithm). It also stores a sentinel in V[n] for cases when the query is greater than max(V).
Then, for each query, it binary searches for the value. Here V[a].second is the sum of values less than the query, and V[n].second - V[a].second is the sum of values greater than it.
#include <iostream>
#include <algorithm>
#define pii pair<int, int>
using namespace std;

pii V[100001];

int main() {
    int n;
    while (cin >> n) {
        for (int i = 0; i < n; i++)
            cin >> V[i].first;
        sort(V, V + n);
        V[0].second = 0;
        for (int i = 1; i <= n; i++)
            V[i].second = V[i-1].first + V[i-1].second;
        int k; cin >> k;
        for (int i = 0; i < k; i++) {
            int query; cin >> query;
            pii* res = upper_bound(V, V + n, pii(query, 0));
            int a = res - V, b = n - (res - V);
            int left = query * a - V[a].second;
            int right = V[n].second - V[a].second - query * b;
            cout << left + right << endl;
        }
    }
}
It assumes a file with a format like this:
5
10 2 8 4 6
2
3 5
Then, for each query, it answers like this:
17
13

Loop Explanation in Counting Sort

Could somebody please explain to me the purpose of the second loop in this implementation of counting sort?:
short c[RADIX_MAX] = {0};
int i;
for (i = 0; i < LEN_MAX; i++) {
    if (i == len)
        break;
    int ind = a.getElem(i);
    c[ind]++;
}
for (i = 1; i < RADIX_MAX; i++) {
    if (i == radix)
        break;
    c[i] += c[i - 1];
}
for (i = LEN_MAX - 1; i >= 0; i--) {
    int j = i - LEN_MAX + len;
    if (j < 0)
        break;
    int ind = a.getElem(j);
    short t = ind;
    ind = --c[ind];
    b.setElem(ind, t);
}
Counting sort works by calculating the target index of each element to be sorted from the value of the element itself. There are three passes involved:
In the first loop, each element is counted: for example, our array has six "A"s, two "B"s, five "C"s, and so on.
In the second loop, the index where each element goes is calculated. If there are six "A"s, then the first "B" needs to go at index 6 (in 0-based indexing). What the counting sort does is a bit more complicated, in order to make the code simpler and the sort stable. In the third loop it will traverse the original array in reverse order, so in the second loop it calculates the index not of the first instance of a given value, but of the last. In our example above, the last "A" needs to appear at index 5, but the last "B" needs to go at index 6 ("A"s) + 2 ("B"s) - 1 (zero-based) = index 7. So for each value it calculates the ending index of that value. It walks the count array forward, adding the previously calculated count to the current count. So in our count array, the value for "A" remains 6 (no previous element), the value for "B" is 6+2=8 (six "A"s + two "B"s), the value for "C" is now 6+2+5=13 (six "A"s + two "B"s + five "C"s), and so on.
In the last loop, the values are inserted in their position, decrementing the indexes as we go along. So the last of the "B"s is inserted at index 7, the one before that at index 6, and so on. This preserves the original order of equal elements, making the sort stable which is essential for Radix sort.
For each digit we compute the index where it starts in the sorted array.
Example:
array: 0 0 0 0 2 2 3 3 3 9 9
index: 0 1 2 3 4 5 6 7 8 9 10
Then c[0] = 0, c[1] = 4, c[2] = 4, c[3] = 6, c[4] = 9, ..., c[9] = 9.
The index in the sorted array where a digit starts depends on the index and the count of the previous digit. The second loop computes this.

C++: function creation using array

Write a function which has:
input: an array of N pairs (unique id and weight), and K <= N
output: K random unique ids (from the input array)
Note: when called many times, the frequency with which an id appears in the output should be greater the more weight it has.
Example: an id with weight 5 should appear in the output 5 times more often than an id with weight 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated.
My question is: how to solve this task?
EDIT
Thanks for the responses, everybody!
Currently I can't understand how the weight of a pair affects the frequency of its appearance in the output. Can you give me a clearer, "for dummies" explanation of how it works?
Assuming a good enough random number generator:
Sum the weights (total_weight)
Repeat K times:
Pick a number between 0 and total_weight (selection)
Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
Write the first part of the pair to the output
You need enough storage to store the total weight.
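A minimal sketch of a single pick along those lines, assuming integer weights and using rand() as a stand-in for a good generator (repeating K times, and any uniqueness handling, is layered on top):

#include <cstdlib>
#include <utility>
#include <vector>

// One weighted pick: an id with weight 5 comes out five times as often
// as an id with weight 1.
int pickWeighted(const std::vector<std::pair<int, int> >& items) { // (id, weight)
    int total_weight = 0;
    for (const auto& p : items) total_weight += p.second;

    int selection = rand() % total_weight;    // in [0, total_weight)
    int running = 0;
    for (const auto& p : items) {
        running += p.second;
        if (running > selection)              // first running sum past selection
            return p.first;
    }
    return items.back().first;                // unreachable with positive weights
}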
Ok so you are given input as follows:
(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)
And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:
3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5
Initially I created a temporary array, but this can be done without one as well: you can calculate the size of the list by summing up all the weights, X; in this example X = 17.
Pick a random number between [0, X-1] and calculate which id should be returned by looping through the list, doing a cumulative addition on the weights. Say I have the random number 8:
(3, 7) total = 7 which is < 8
(1, 2) total = 9 which is >= 8 **boom** 1 is your id!
Now, since you need K random unique ids, you can create a hash table from the initial array passed to you and work with that. Once you find an id, remove it from the hash and proceed with the algorithm. Edit: Note that you create the hashmap only once! Your algorithm will work on this instead of looking through the array. I did not put it at the top to keep the answer clear.
As long as your random calculation is not secretly using any extra memory, you will need to store the K random pickings, which are <= N, and a copy of the original array, so the max space requirement at runtime is O(2*N).
Asymptotic runtime is :
O(n) : create copy of original array into hashtable +
(
O(n) : calculate sum of weights +
O(1) : calculate random between range +
O(n) : cumulative totals
) * K random pickings
= O(n*k) overall
This is a good question :)
This solution works with non-integer weights and uses constant space (i.e. space complexity = O(1)). It does, however, modify the input array, but the only difference in the end is that the elements will be in a different order.
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot i where the (now summed) weight is greater than or equal to r.
Add input[i].id to output.
Subtract input[i-1].weight from input[i].weight (unless i == 0). Now subtract input[i].weight from the following (> i) input weights and also from sum_weights.
Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.
subtract 1 from n
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
Update
It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot i where the (now summed) weight is greater than or equal to r.
Add input[i].id to output.
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*log(N)).
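If the weights have already been turned into running totals, the binary-search step might look like this sketch (summed holds the prefix sums; rand() again stands in for a better generator):

#include <algorithm>
#include <cstdlib>
#include <vector>

// Returns the index of the first slot whose summed weight exceeds r.
int pickByBinarySearch(const std::vector<long long>& summed) {
    long long r = rand() % summed.back();     // r in [0, sum_weights)
    return std::upper_bound(summed.begin(), summed.end(), r) - summed.begin();
}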
My short answer: there is no way.
That is simply because the problem definition is contradictory, as Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the Uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e appear with same frequency), irrespective of their weight.
Anyway, when K is small relative to N, the calculated frequencies will be pretty close to the theoretical values.
The task may be split into two subtasks:
Generate random numbers with a given distribution (specified by weights)
Generate unique random numbers
Generate random numbers with a given distribution
Calculate sum of weights (sumOfWeights)
Generate random number from the range [1; sumOfWeights]
Find an array element where the sum of weights from the beginning of the array is greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <algorithm> // for std::swap

// 0 - id, 1 - weight
typedef unsigned Pair[2];

unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1]; // weight lives at slot 1 of the pair
    }
    const unsigned random = rand() % sumOfWeights + 1;
    sumOfWeights = 0;
    unsigned i = 0;
    for (; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
        if (sumOfWeights >= random)
        {
            break;
        }
    }
    return i;
}
Generate unique random numbers
The well-known Fisher-Yates shuffle (in Durstenfeld's variant) may be used to generate unique random numbers. See this great explanation.
It requires O(N) space for the deck, so if the value of N is defined at compile time, we are able to allocate the necessary space at compile time.
Now, we have to combine these two algorithms. We just need to use our own Random() function instead of standard rand() in unique numbers generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
    unsigned deck[N];
    for (unsigned i = 0; i < N; ++i)
    {
        deck[i] = i;
    }
    unsigned max = N - 1;
    for (unsigned i = 0; i < K; ++i)
    {
        const unsigned index = Random(i_set, deck, max + 1);
        std::swap(deck[max], deck[index]);
        o_res[i] = i_set[deck[max]][0];
        --max;
    }
}
Usage
int main()
{
    srand((unsigned)time(0));
    const unsigned c_N = 5; // N
    const unsigned c_K = 2; // K
    Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
    unsigned result[c_K] = {};
    const unsigned c_total = 1000000; // number of iterations
    unsigned counts[c_N] = {0}; // frequency counters
    for (unsigned i = 0; i < c_total; ++i)
    {
        Generate<c_N, c_K>(input, result);
        for (unsigned j = 0; j < c_K; ++j)
        {
            ++counts[result[j]];
        }
    }
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < c_N; ++i)
    {
        sumOfWeights += input[i][1];
    }
    for (unsigned i = 0; i < c_N; ++i)
    {
        std::cout << (double)counts[i]/c_K/c_total // empirical frequency
                  << " | "
                  << (double)input[i][1]/sumOfWeights // expected frequency
                  << std::endl;
    }
    return 0;
}
Output
N = 5, K = 2
Frequencies
Empirical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empirical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526
I assume that the ids in the output must be unique. This makes the problem a specific instance of random sampling problems.
The first approach I can think of solves this in O(N^2) time, using O(N) memory (the input array itself plus constant extra memory).
I assume that the weights are positive.
Let A be the array of pairs.
1) Set N to A.length
2) Calculate the sum of all weights, W.
3) Loop K times:
3.1) r = rand(0, W)
3.2) Loop over A and find the first index i such that A[1].w + ... + A[i].w <= r < A[1].w + ... + A[i+1].w
3.3) Add A[i].id to the output
3.4) W = W - A[i].w
3.5) A[i] = A[N-1] (or swap if the array contents should be preserved)
3.6) N = N - 1
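A sketch of that pseudocode in C++ (ids unique, weights positive, and K <= N assumed):

#include <cstdlib>
#include <utility>
#include <vector>

std::vector<int> sampleKUnique(std::vector<std::pair<int, int> > A, int K) { // (id, weight)
    std::vector<int> out;
    int N = A.size();
    long long W = 0;
    for (int i = 0; i < N; ++i) W += A[i].second;     // step 2
    while (K-- > 0) {
        long long r = rand() % W;                     // 3.1: r in [0, W)
        int i = 0;
        long long prefix = A[0].second;
        while (prefix <= r) prefix += A[++i].second;  // 3.2: first prefix sum > r
        out.push_back(A[i].first);                    // 3.3
        W -= A[i].second;                             // 3.4: shrink the total first
        A[i] = A[N - 1];                              // 3.5: drop the chosen pair
        --N;                                          // 3.6
    }
    return out;
}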