Greedily assigning scores to maximize the final result - C++

You are given a timeline of T days and a list of N scores. You have to assign each score to a day (from 1 to T) such that the total assigned score is maximized.
However, there are restrictions. Each score can be assigned to at most a limited number of days X, and only to days occurring on or after a particular day Y.
Input is in the following format:
T
N
Score X Y (150 4 1 means score 150 can be assigned to at most 4 days, on or after day 1)
For example:
T = 10
N = 5
150 4 1
120 4 3
200 2 7
100 10 5
50 5 1
Note: two scores can have the same value. Each day can be assigned at most one score.
The optimal result for the above example would be: 150 150 150 150 120 120 200 200 120 120.
What I tried:
I sorted the list by score and started assigning the highest scores first.
In the above example I would start with 200 and assign it to days 7 and 8.
Similarly I would assign the next highest score, 150, to days 1, 2, 3 and 4,
and so on ...
But this would take O(N * T) time: N for iterating over the list of scores and T for checking and assigning scores over the timeline (in the worst case).
The goal is to maximize and report the final score.
Is there a more elegant way to do this? Like without even assigning the scores, and thus doing away with the T part of O(N * T)?

I coded up a pretty straightforward implementation of your algorithm:
#include <vector>
#include <algorithm>
#include <array>

constexpr int T = 10;

struct Item {
    int score;
    int count;
    int min;
};

std::array<Item, 5> input = {{
    {150, 4, 1},
    {120, 4, 3},
    {200, 2, 7},
    {100, 10, 5},
    {50, 5, 1}
}};

std::array<bool, T> days{};

int main() {
    // preprocess input
    std::sort(input.begin(), input.end(),
              [](auto l, auto r) { return l.score > r.score; });

    int totalScore = 0;
    int lastFreeDay = T - 1;
    [&] {
        for (auto spec : input) {
            // scan forward to find open spots for the scores,
            // including the last free day itself
            for (int pos = spec.min - 1; spec.count && pos <= lastFreeDay; ++pos) {
                if (!days[pos]) {
                    days[pos] = true;
                    totalScore += spec.score;
                    spec.count--;
                }
            }
            // we weren't able to assign all scores of this entry,
            // so every day from spec.min onwards already has a score assigned to it.
            // let's scan backward and see where the last free one is
            if (spec.count > 0) {
                lastFreeDay = spec.min;
                while (days[lastFreeDay]) {
                    if (--lastFreeDay == -1) {
                        return;
                    }
                }
            }
        }
    }();
    return totalScore; // result returned as the exit status
}
I'm not sure what the exact algorithmic complexity is, but you can see two things:
At the beginning, there are very few collisions, so the inner loop doesn't actually depend on T; it behaves more like O(N*k) (where k is the average number of times you can assign a particular score).
Even if N grows very large, not all scores can actually be processed, because the algorithm can terminate early by comparing the latest free day against the earliest day a score can be assigned to.
Of course, you can create a worst-case input where you have T*(T+1)/2 passes of the inner loop (for N == T and k = 1 and min_i = 1), but my gut feeling is that on average it is much better than O(N*T), or at least has a very small constant (actually, the sort could be the dominant factor).
Long story short: I'm pretty confident that your algorithm is in fact applicable in practice and could probably be further improved by more intelligent data structures, as suggested by Prune.

If you maintain a table of available intervals, I believe that you can keep this to O(N + T). Don't range through the entire length T each time; just check your list of open intervals and begin at the first available interval that contains the input line's Y value. There will be no more than N/2 intervals in this "open" list, and either hashing or a binary search can keep the complexity under control.
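One way this might look (a sketch of the interval-table idea, my interpretation rather than an exact prescription): keep the free days as disjoint [start, end] intervals in a std::map, so each assignment only touches the intervals it consumes instead of scanning all T days. Each assignment creates at most two new intervals, so total work stays near O(N log N) after sorting.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <map>
#include <vector>

struct Item { int score, count, min; };

long long assign(std::vector<Item> items, int T) {
    std::sort(items.begin(), items.end(),
              [](const Item& l, const Item& r) { return l.score > r.score; });
    std::map<int, int> free;  // start day -> end day (inclusive), disjoint
    free[1] = T;
    long long total = 0;
    for (const Item& item : items) {
        int need = item.count;
        auto pos = free.lower_bound(item.min);
        // the interval just before pos may also contain day item.min
        if (pos != free.begin() && std::prev(pos)->second >= item.min)
            pos = std::prev(pos);
        while (need > 0 && pos != free.end()) {
            int from = std::max(pos->first, item.min);
            int to = std::min(pos->second, from + need - 1);
            total += (long long)item.score * (to - from + 1);
            need -= to - from + 1;
            int s = pos->first, e = pos->second;
            pos = free.erase(pos);             // carve [from, to] out
            if (s < from) free[s] = from - 1;  // left remainder
            if (to < e) pos = free.insert({to + 1, e}).first;  // right remainder
        }
    }
    return total;
}

int main() {
    std::vector<Item> items{{150, 4, 1}, {120, 4, 3}, {200, 2, 7},
                            {100, 10, 5}, {50, 5, 1}};
    std::cout << assign(items, 10) << '\n';  // expected: 1480
}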

Related

Using modulus to solve coin change question

I'm looking for a different way to solve the coin change problem, using modulus. Most solutions use dynamic programming to solve this.
Example:
You are given coins of different denominations and a total amount of
money amount. Write a function to compute the fewest number of coins
that you need to make up that amount. If that amount of money cannot be
made up by any combination of the coins, return -1.
Input: coins = [1, 2, 5], amount = 11
Output: 3
Explanation: 11 = 5 + 5 + 1
The goal is to create a solution using modulus instead.
Here is what I've tried so far. I'm wondering whether my variables should be initialized to something other than 0, or whether I'm updating them in the wrong part of the code block.
class Solution {
public:
    int coinChange(vector<int>& coins, int amount) {
        int pieces = 0;
        int remainder = 0;
        for (int i = coins.size() - 1; i >= 0; i--) {
            if (amount % coins[i] == 0)
            {
                pieces += amount / coins[i];
            } else {
                pieces += amount / coins[i];
                remainder = amount % coins[i];
                amount = remainder;
            }
        }
        return pieces;
    }
};
I'm expecting the output as above. I'm stuck and not sure what else to try to get this to work.
I understand what you're trying to do, but your code isn't actually going to accomplish what you think it will. Here's a breakdown of your code:
int coinChange(vector<int>& coins, int amount) {
    // Minimum number of coins to sum to 'amount'
    int pieces = 0;
    int remainder = 0;
    // Assuming 'coins' is a non-decreasing vector of ints,
    // iterate over all coins, starting from the larger ones,
    // ending with the smaller ones. This makes sense, as it
    // will use more coins of higher value, implying fewer
    // coins being used
    for (int i = coins.size() - 1; i >= 0; i--) {
        // If what's left of the original amount is
        // a multiple of the current coin, 'coins[i]',
        if (amount % coins[i] == 0)
        {
            // Increase the number of pieces by the number
            // of current coins that would satisfy it
            pieces += amount / coins[i];
            // ERROR: Why are you not updating the remaining amount?
        } else {
            // What's left of the original amount is NOT
            // a multiple of the current coin, so account
            // for as much as you can, and leave the remainder
            pieces += amount / coins[i];
            remainder = amount % coins[i];
            amount = remainder;
        }
    }
    // ERROR: What if amount != 0? Should return -1
    return pieces;
}
If you fixed the ERRORs I mentioned above, the function would work ASSUMING that all ints in coins satisfy the following:
1. If a coin, s, is smaller than another coin, l, then l must be a multiple of s.
2. Every coin has to be >= 1.
Proof of 1:
If a coin, s, is smaller than another coin, l, but l is not a multiple of s, using l as one of the coins in your solution might be a bad idea. Let's consider an example, where coins = [4, 7], and amount = 8. You will iterate over coins in non-increasing order, starting with 7. 7 fits into 8, so you will say that pieces = 1, and amount = 1 remains. Now, 4 doesn't fit into amount, so you don't add it. Now the for-loop is over, amount != 0, so you fail the function. However, a working solution would have been two coins of 4, so returning pieces = 2.
Proof of 2:
If a coin, c, is < 1, it can be 0 or less. If c is 0, you will divide by 0 and throw an error. Even more confusingly, if you changed your code, you could add an infinite number of coins valued 0.
If c is negative, you will divide by a negative, resulting in a negative amount and breaking your logic.
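For concreteness, here is a sketch of the function with both ERRORs fixed. It is only correct under assumptions 1 and 2 above; it is not a general coin-change solver.

#include <vector>
using namespace std;

// Greedy version; correct only when every larger coin is a multiple of
// every smaller one and all coins are >= 1 (assumptions 1 and 2 above).
int coinChangeGreedy(vector<int>& coins, int amount) {
    int pieces = 0;
    for (int i = coins.size() - 1; i >= 0; i--) {
        pieces += amount / coins[i];  // take as many of this coin as fit
        amount %= coins[i];           // always update the remaining amount
    }
    return amount == 0 ? pieces : -1; // leftover means no exact change
}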

Multiple Constraint Knapsack

I'm trying to solve the following problem:
INPUT:
An array of items, each item has 3 different weights (integers), a value and the amount available of this type of item.
A maximum for each type of weight
OUTPUT:
An array that tells how many of each item to take in order to achieve the maximum value. The sum of each of the weights over the chosen items must not exceed the corresponding maximum, and you may not take more of an item than is available.
Example output: {3,0,2,1} means 3 of item1, 0 of item2, 2 of item3, and 1 of item4.
Example scenario:
In case I wasn't very clear with the explanation, imagine it's about putting food in a backpack. Each type of food has a weight, a volume, a number of calories and a value, and there's a certain amount of each type of food available. The objective would be to maximize the value of the food in the backpack without exceeding a certain maximum amount of weight, volume and calories.
On this scenario the INPUT could be:
Array<Food>:
Burger (Weight 2, Volume 2, Calories 5, Value 5$, number of Burgers 3)
Pizza (Weight 3, Volume 7, Calories 6, Value 8$, number of Pizzas 2)
Hot Dog (Weight 1, Volume 1, Calories 3, Value 2$, number of Hot Dogs 6)
int MaxWeight = 10; int MaxVolume = 15; int MaxCalories = 10;
My Attempt
Since the data set is quite small (say, 7 types of items and no more than 15 pieces of each item available), I thought of a brute-force search:
Keep track of the best set found so far (most value without exceeding any limits); call this best set B.
Have a recursive function R(s) which takes a set (an array of how many of each item) as input. If the input is invalid, it returns. If the input is valid, it first updates B (in case s is better than B) and then calls R(s + p_i) for every product p_i.
The idea is to first call R(s) with s = the empty set (0 for every product); every possible branch will be created, while the branches that exceed the weights are ignored.
This obviously didn't work, because the number of branches that have to be checked is huge even for as few as 7 items.
Any help is much appreciated!
You have to consider each type of weight in your DP method. I'll write the implementation in C++:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

struct Food { int weight1, weight2, weight3, value, maxOfType; };
const int INF = 1000000000;
const int MAX_ITEM = 8, MAX_WEIGHT1 = 11, MAX_WEIGHT2 = 16, MAX_WEIGHT3 = 11;

vector<Food> Array;
int memo[MAX_ITEM][MAX_WEIGHT1][MAX_WEIGHT2][MAX_WEIGHT3]; // zero-initialized

int f(int ind, int weight1, int weight2, int weight3){
    if (weight1 < 0 || weight2 < 0 || weight3 < 0) return -INF;
    if (ind == (int)Array.size()) return 0;
    int &ret = memo[ind][weight1][weight2][weight3];
    if (ret > 0) return ret; // a stored 0 is simply recomputed, which is harmless
    int res = 0;
    for (int i = 0; i <= Array[ind].maxOfType; i++) // try i copies of item 'ind'
        res = max(res, i * Array[ind].value
                     + f(ind + 1, weight1 - i * Array[ind].weight1,
                                  weight2 - i * Array[ind].weight2,
                                  weight3 - i * Array[ind].weight3));
    return ret = res;
}
The DP function is recursive, and we use memoization to optimize it. It returns the maximum value we can get. You can call it with:
f(0, MaxWeight1, MaxWeight2, MaxWeight3);
After that we have to backtrack to see which item counts lead to the maximum value. The next method will print what you want:
void printResult(int ind, int weight1, int weight2, int weight3){
    if (ind == (int)Array.size()) return;
    int maxi = memo[ind][weight1][weight2][weight3];
    for (int i = 0; i <= Array[ind].maxOfType; i++){
        int cur = i * Array[ind].value
                + f(ind + 1, weight1 - i * Array[ind].weight1,
                             weight2 - i * Array[ind].weight2,
                             weight3 - i * Array[ind].weight3);
        if (cur == maxi){
            cout << i << ", ";
            printResult(ind + 1, weight1 - i * Array[ind].weight1,
                                 weight2 - i * Array[ind].weight2,
                                 weight3 - i * Array[ind].weight3);
            break;
        }
    }
}
The code is tested and works well.
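For reference, a small hypothetical driver for the example scenario above, assuming the Food field order used in the sketch (weight1 = weight, weight2 = volume, weight3 = calories):

int main() {
    // {weight, volume, calories, value, amount available}
    Array = { {2, 2, 5, 5, 3},    // Burger
              {3, 7, 6, 8, 2},    // Pizza
              {1, 1, 3, 2, 6} };  // Hot Dog
    cout << f(0, 10, 15, 10) << endl;  // MaxWeight, MaxVolume, MaxCalories
    printResult(0, 10, 15, 10);        // prints one count per item type
    return 0;
}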

How to calculate the minimum cost to convert all n numbers in an array to m?

I have been given the following assignment:
Given N integers in the form of A(i) where 1 ≤ i ≤ N, make each number
A(i) in the N numbers equal to M. Converting a number A(i) to M costs
|M − A(i)| units. Find the minimum total cost to convert all the N
numbers to M; you should choose the M that yields the minimum cost.
Given:
1 <= N <= 10^5
1 <= A(i) <= 10^9
My approach was to calculate the sum of all numbers, find avg = sum / n, and then sum the absolute differences |A(i) − avg| to get the minimum cost.
But this fails in many test cases. How can I find the optimal solution for this?
You should take the median of the numbers (or either of the two numbers nearest the middle if the list has even length), not the mean.
An example where the mean fails to minimize is: [1, 2, 3, 4, 100]. The mean is 110 / 5 = 22, and the total cost is 21 + 20 + 19 + 18 + 78 = 156. Choosing the median (3) gives total cost: 2 + 1 + 0 + 1 + 97 = 101.
An example where the median lies between two items in the list is [1, 2, 3, 4, 5, 100]. Here the median is 3.5, and it's ok to either use M=3 or M=4. For M=3, the total cost is 2 + 1 + 0 + 1 + 2 + 97 = 103. For M=4, the total cost is 3 + 2 + 1 + 0 + 1 + 96 = 103.
A formal proof of correctness can be found on Mathematics SE, although you may convince yourself of the result by noting that if you nudge M a small amount delta in one direction (but not past one of the data points), say the positive direction, the total cost increases by delta times the number of points to the left of M, minus delta times the number of points to the right of M. So the cost is minimized when the numbers of points to the left and to the right of M are equal; otherwise you could move M a small amount one way or the other to decrease the total cost.
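A minimal sketch of the median approach in C++ (assuming 64-bit sums, which the stated constraints fit comfortably):

#include <algorithm>
#include <iostream>
#include <vector>

long long minTotalCost(std::vector<long long> a) {
    // nth_element puts the median at the middle position in O(N);
    // for even length, either middle element gives the same cost
    std::nth_element(a.begin(), a.begin() + a.size() / 2, a.end());
    long long m = a[a.size() / 2];
    long long cost = 0;
    for (long long x : a) cost += x > m ? x - m : m - x;
    return cost;
}

int main() {
    std::cout << minTotalCost({1, 2, 3, 4, 100}) << '\n';  // prints 101
}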
@PaulHankin already provided a perfect answer. Anyway, when thinking about the problem, I didn't think of the median being the solution. But even if you don't know about the median, you can come up with a programming solution.
I made similar observations as @PaulHankin in the last paragraph of his answer. This made me realize that I have to eliminate outliers iteratively in order to find m. So I wrote a program that first sorts the input array (vector) A and then analyzes the minimum and maximum values.
The idea is to move the minimum values towards the second-smallest value and the maximum values towards the second-largest value. You always move either the minimum or the maximum values, depending on whether there are fewer minimum values than maximum values or not. If all array items end up being the same value, then you have found your m:
#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;

int getMinCount(vector<int>& A);
int getMaxCount(vector<int>& A);

int main()
{
    // Example as given by @PaulHankin
    vector<int> A;
    A.push_back(1);
    A.push_back(2);
    A.push_back(3);
    A.push_back(4);
    A.push_back(100);

    sort(A.begin(), A.end());
    int minCount = getMinCount(A);
    int maxCount = getMaxCount(A);
    while (minCount != (int)A.size() && maxCount != (int)A.size())
    {
        if (minCount <= maxCount)
        {
            for (int i = 0; i < minCount; i++)
                A[i] = A[minCount];
            // Recalculate the count of the minimum value, because we changed the minimum.
            minCount = getMinCount(A);
        }
        else
        {
            for (int i = 0; i < maxCount; i++)
                A[A.size() - 1 - i] = A[A.size() - 1 - maxCount];
            // Recalculate the count of the maximum value, because we changed the maximum.
            maxCount = getMaxCount(A);
        }
    }
    // Print out the one and only remaining value, which is m.
    cout << A[0] << endl;
    return 0;
}

int getMinCount(vector<int>& A)
{
    // Count how often the minimum value occurs.
    int minCount = 1;
    int pos = 1;
    while (pos < (int)A.size() && A[pos++] == A[0])
        minCount++;
    return minCount;
}

int getMaxCount(vector<int>& A)
{
    // Count how often the maximum value occurs.
    int maxCount = 1;
    int pos = (int)A.size() - 2;
    while (pos >= 0 && A[pos--] == A[A.size() - 1])
        maxCount++;
    return maxCount;
}
If you think about the algorithm, you will come to the conclusion that it actually calculates the median of the values in the array A. As example input I took the first example given by @PaulHankin. As expected, the code provides the correct result (3) for it.
I hope my approach helps you to understand how to tackle such kinds of problems even if you don't know the correct solution. This is especially helpful when you are in an interview, for example.

Majority element - parts of an array

I have an array filled with integers. My job is to find the majority element quickly for any part of the array, in O(log n) time rather than linear time; beforehand, I can take some time to prepare the array.
For example:
1 5 2 7 7 7 8 4 6
And queries:
[4, 7] returns 7
[4, 8] returns 7
[1, 2] returns 0 (no majority element), and so on...
I need to have an answer for each query, and it needs to execute fast.
For preparation, I can use O(n log n) time.
O(log n) queries and O(n log n) preprocessing/space can be achieved by finding and using majority intervals with the following properties:
For each value from the input array there may be one or several majority intervals (or there may be none if elements with this value are too sparse; we don't need majority intervals of length 1, because they are useful only for query intervals of size 1, which are better handled as a special case).
If the query interval lies completely inside one of these majority intervals, the corresponding value may be the majority element of this query interval.
If there is no majority interval completely containing the query interval, the corresponding value cannot be the majority element of this query interval.
Each element of the input array is covered by O(log n) majority intervals.
In other words, the only purpose of majority intervals is to provide O(log n) majority element candidates for any query interval.
This algorithm uses following data structures:
A list of positions for each value from the input array (map<Value, vector<Position>>). Alternatively, an unordered_map may be used here to improve performance (but then we'll need to extract all keys and sort them so that structure #3 is filled in the proper order).
A list of majority intervals for each value (vector<Interval>).
A data structure for handling queries (vector<small_map<Value, Data>>), where Data contains two indexes into the appropriate vector from structure #1, pointing to the next/previous positions of elements with the given value. Update: thanks to @justhalf, it is better to store in Data the cumulative frequencies of elements with the given value. small_map may be implemented as a sorted vector of pairs; preprocessing will append elements already in sorted order, and a query will use small_map only for linear search.
Preprocessing:
Scan the input array and push the current position onto the appropriate vector in structure #1.
Perform steps 3 and 4 for every vector in structure #1.
Transform each list of positions into a list of majority intervals. See details below.
For each index of the input array covered by one of the majority intervals, insert data into the appropriate element of structure #3: the value and the positions of the previous/next elements with this value (or the cumulative frequency of this value).
Query:
If the query interval length is 1, return the corresponding element of the source array.
For the starting point of the query interval, get the corresponding element of structure #3's vector. For each element of the map, perform step 3. Scan all elements of the map corresponding to the ending point of the query interval in parallel with this map, to allow O(1) complexity for step 3 (instead of O(log log n)).
If the map corresponding to the ending point of the query interval contains a matching value, compute s3[stop][value].prev - s3[start][value].next + 1. If it is greater than half of the query interval size, return the value. If cumulative frequencies are used instead of next/previous indexes, compute s3[stop+1][value].freq - s3[start][value].freq instead.
If nothing is found in step 3, return "Nothing".
Main part of the algorithm is getting majority intervals from list of positions:
Assign a weight to each position in the list: number_of_matching_values_to_the_left - number_of_nonmatching_values_to_the_left.
Filter only the weights in strictly decreasing order (greedily) into the "prefix" array: for (auto x: weights) if (prefix.empty() || x < prefix.back()) prefix.push_back(x);.
Filter only the weights in strictly increasing order (greedily, backwards) into the "suffix" array: reverse(weights); for (auto x: weights) if (suffix.empty() || x > suffix.back()) suffix.push_back(x);.
Scan the "prefix" and "suffix" arrays together and find intervals from every "prefix" element to the corresponding place in the "suffix" array, and from every "suffix" element to the corresponding place in the "prefix" array. (If all "suffix" elements' weights are less than that of the given "prefix" element, or their positions are not to the right of it, no interval is generated; if there is no "suffix" element with exactly the weight of the given "prefix" element, take the nearest "suffix" element with a larger weight and extend the interval by this weight difference to the right.)
Merge overlapping intervals.
Properties 1 .. 3 of majority intervals are guaranteed by this algorithm. As for property #4, the only way I could imagine covering some element with a maximum number of majority intervals is like this: 11111111222233455666677777777. Here the element 4 is covered by 2 * log n intervals, so this property seems to be satisfied. See a more formal proof of this property at the end of this post.
Example:
For input array "0 1 2 0 0 1 1 0" the following lists of positions would be generated:
value positions
0 0 3 4 7
1 1 5 6
2 2
Positions for value 0 will get the following properties:
weights: 0:1 3:0 4:1 7:0
prefix: 0:1 3:0 (strictly decreasing)
suffix: 4:1 7:0 (strictly increasing when scanning backwards)
intervals: 0->4 3->7 4->0 7->3
merged intervals: 0-7
Positions for value 1 will get the following properties:
weights: 1:0 5:-2 6:-1
prefix: 1:0 5:-2
suffix: 1:0 6:-1
intervals: 1->none 5->6+1 6->5-1 1->none
merged intervals: 4-7
Query data structure:
positions value next prev
0 0 0 x
1..2 0 1 0
3 0 1 1
4 0 2 2
4 1 1 x
5 0 3 2
...
Query [0,4]:
prev[4][0]-next[0][0]+1=2-0+1=3
query size=5
3>2.5, returned result 0
Query [2,5]:
prev[5][0]-next[2][0]+1=2-1+1=2
query size=4
2=2, returned result "none"
Note that there is no attempt to inspect element "1" because its majority interval does not include either of these intervals.
Proof of property #4:
Majority intervals are constructed in such a way that strictly more than 1/3 of all their elements have the corresponding value. This ratio is nearest to 1/3 for sub-arrays like any*(m-1) value*m any*m, for example, 01234444456789.
To make this proof more obvious, we can represent each interval as a point in 2D: every possible starting point on the horizontal axis and every possible ending point on the vertical axis (see the diagram below).
All valid intervals are located on or above the diagonal. The white rectangle represents all intervals covering some array element (represented as a unit-size interval at its lower right corner).
Let's cover this white rectangle with squares of size 1, 2, 4, 8, 16, ... sharing the same lower right corner. This divides the white area into O(log n) areas similar to the yellow one (plus a single square of size 1 containing a single interval of size 1, which is ignored by this algorithm).
Let's count how many majority intervals may be placed into the yellow area. One interval (located at the corner nearest to the diagonal) occupies 1/4 of the elements belonging to the interval at the corner farthest from the diagonal (and this largest interval contains all elements belonging to any interval in the yellow area). This means that the smallest interval contains strictly more than 1/12 of the values available for the whole yellow area. So if we try to place 12 intervals into the yellow area, we do not have enough elements for 12 different values. So the yellow area cannot contain more than 11 majority intervals, and the white rectangle cannot contain more than 11 * log n majority intervals. Proof completed.
11 * log n is an overestimate. As I said earlier, it's hard to imagine more than 2 * log n majority intervals covering some element, and even that value is much greater than the average number of covering majority intervals.
C++11 implementation. See it either at ideone or here:
#include <iostream>
#include <vector>
#include <map>
#include <algorithm>
#include <functional>
#include <random>
constexpr int SrcSize = 1000000;
constexpr int NQueries = 100000;
using src_vec_t = std::vector<int>;
using index_vec_t = std::vector<int>;
using weight_vec_t = std::vector<int>;
using pair_vec_t = std::vector<std::pair<int, int>>;
using index_map_t = std::map<int, index_vec_t>;
using interval_t = std::pair<int, int>;
using interval_vec_t = std::vector<interval_t>;
using small_map_t = std::vector<std::pair<int, int>>;
using query_vec_t = std::vector<small_map_t>;
constexpr int None = -1;
constexpr int Junk = -2;
src_vec_t generate_e()
{ // good query length = 3
src_vec_t src;
std::random_device rd;
std::default_random_engine eng{rd()};
auto exp = std::bind(std::exponential_distribution<>{0.4}, eng);
for (int i = 0; i < SrcSize; ++i)
{
int x = exp();
src.push_back(x);
//std::cout << x << ' ';
}
return src;
}
src_vec_t generate_ep()
{ // good query length = 500
src_vec_t src;
std::random_device rd;
std::default_random_engine eng{rd()};
auto exp = std::bind(std::exponential_distribution<>{0.4}, eng);
auto poisson = std::bind(std::poisson_distribution<int>{100}, eng);
while (int(src.size()) < SrcSize)
{
int x = exp();
int n = poisson();
for (int i = 0; i < n; ++i)
{
src.push_back(x);
//std::cout << x << ' ';
}
}
return src;
}
src_vec_t generate()
{
//return generate_e();
return generate_ep();
}
int trivial(const src_vec_t& src, interval_t qi)
{
int count = 0;
int majorityElement = 0; // will be assigned before use for valid args
for (int i = qi.first; i <= qi.second; ++i)
{
if (count == 0)
majorityElement = src[i];
if (src[i] == majorityElement)
++count;
else
--count;
}
count = 0;
for (int i = qi.first; i <= qi.second; ++i)
{
if (src[i] == majorityElement)
count++;
}
if (2 * count > qi.second + 1 - qi.first)
return majorityElement;
else
return None;
}
index_map_t sort_ind(const src_vec_t& src)
{
int ind = 0;
index_map_t im;
for (auto x: src)
im[x].push_back(ind++);
return im;
}
weight_vec_t get_weights(const index_vec_t& indexes)
{
weight_vec_t weights;
for (int i = 0; i != int(indexes.size()); ++i)
weights.push_back(2 * i - indexes[i]);
return weights;
}
pair_vec_t get_prefix(const index_vec_t& indexes, const weight_vec_t& weights)
{
pair_vec_t prefix;
for (int i = 0; i != int(indexes.size()); ++i)
if (prefix.empty() || weights[i] < prefix.back().second)
prefix.emplace_back(indexes[i], weights[i]);
return prefix;
}
pair_vec_t get_suffix(const index_vec_t& indexes, const weight_vec_t& weights)
{
pair_vec_t suffix;
for (int i = indexes.size() - 1; i >= 0; --i)
if (suffix.empty() || weights[i] > suffix.back().second)
suffix.emplace_back(indexes[i], weights[i]);
std::reverse(suffix.begin(), suffix.end());
return suffix;
}
interval_vec_t get_intervals(const pair_vec_t& prefix, const pair_vec_t& suffix)
{
interval_vec_t intervals;
int prev_suffix_index = 0; // will be assigned before use for correct args
int prev_suffix_weight = 0; // same assumptions
for (int ind_pref = 0, ind_suff = 0; ind_pref != int(prefix.size());)
{
auto i_pref = prefix[ind_pref].first;
auto w_pref = prefix[ind_pref].second;
if (ind_suff != int(suffix.size()))
{
auto i_suff = suffix[ind_suff].first;
auto w_suff = suffix[ind_suff].second;
if (w_pref <= w_suff)
{
auto beg = std::max(0, i_pref + w_pref - w_suff);
if (i_pref < i_suff)
intervals.emplace_back(beg, i_suff + 1);
if (w_pref == w_suff)
++ind_pref;
++ind_suff;
prev_suffix_index = i_suff;
prev_suffix_weight = w_suff;
continue;
}
}
// ind_suff out of bounds or w_pref > w_suff:
auto end = prev_suffix_index + prev_suffix_weight - w_pref + 1;
// end may be out-of-bounds; that's OK if overflow is not possible
intervals.emplace_back(i_pref, end);
++ind_pref;
}
return intervals;
}
interval_vec_t merge(const interval_vec_t& from)
{
using endpoints_t = std::vector<std::pair<int, bool>>;
endpoints_t ep(2 * from.size());
std::transform(from.begin(), from.end(), ep.begin(),
[](interval_t x){ return std::make_pair(x.first, true); });
std::transform(from.begin(), from.end(), ep.begin() + from.size(),
[](interval_t x){ return std::make_pair(x.second, false); });
std::sort(ep.begin(), ep.end());
interval_vec_t to;
int start; // will be assigned before use for correct args
int overlaps = 0;
for (auto& x: ep)
{
if (x.second) // begin
{
if (overlaps++ == 0)
start = x.first;
}
else // end
{
if (--overlaps == 0)
to.emplace_back(start, x.first);
}
}
return to;
}
interval_vec_t get_intervals(const index_vec_t& indexes)
{
auto weights = get_weights(indexes);
auto prefix = get_prefix(indexes, weights);
auto suffix = get_suffix(indexes, weights);
auto intervals = get_intervals(prefix, suffix);
return merge(intervals);
}
void update_qv(
query_vec_t& qv,
int value,
const interval_vec_t& intervals,
const index_vec_t& iv)
{
int iv_ind = 0;
int qv_ind = 0;
int accum = 0;
for (auto& interval: intervals)
{
int i_begin = interval.first;
int i_end = std::min<int>(interval.second, qv.size() - 1);
while (iv[iv_ind] < i_begin)
{
++accum;
++iv_ind;
}
qv_ind = std::max(qv_ind, i_begin);
while (qv_ind <= i_end)
{
qv[qv_ind].emplace_back(value, accum);
if (iv[iv_ind] == qv_ind)
{
++accum;
++iv_ind;
}
++qv_ind;
}
}
}
void print_preprocess_stat(const index_map_t& im, const query_vec_t& qv)
{
double sum_coverage = 0.;
int max_coverage = 0;
for (auto& x: qv)
{
sum_coverage += x.size();
max_coverage = std::max<int>(max_coverage, x.size());
}
std::cout << " size = " << qv.size() - 1 << '\n';
std::cout << " values = " << im.size() << '\n';
std::cout << " max coverage = " << max_coverage << '\n';
std::cout << " avg coverage = " << sum_coverage / qv.size() << '\n';
}
query_vec_t preprocess(const src_vec_t& src)
{
query_vec_t qv(src.size() + 1);
auto im = sort_ind(src);
for (auto& val: im)
{
auto intervals = get_intervals(val.second);
update_qv(qv, val.first, intervals, val.second);
}
print_preprocess_stat(im, qv);
return qv;
}
int do_query(const src_vec_t& src, const query_vec_t& qv, interval_t qi)
{
if (qi.first == qi.second)
return src[qi.first];
auto b = qv[qi.first].begin();
auto e = qv[qi.second + 1].begin();
while (b != qv[qi.first].end() && e != qv[qi.second + 1].end())
{
if (b->first < e->first)
{
++b;
}
else if (e->first < b->first)
{
++e;
}
else // if (e->first == b->first)
{
// hope this doesn't overflow
if (2 * (e->second - b->second) > qi.second + 1 - qi.first)
return b->first;
++b;
++e;
}
}
return None;
}
int main()
{
std::random_device rd;
std::default_random_engine eng{rd()};
auto poisson = std::bind(std::poisson_distribution<int>{500}, eng);
int majority = 0;
int nonzero = 0;
int failed = 0;
auto src = generate();
auto qv = preprocess(src);
for (int i = 0; i < NQueries; ++i)
{
int size = poisson();
auto ud = std::uniform_int_distribution<int>(0, src.size() - size - 1);
int start = ud(eng);
int stop = start + size;
auto res1 = do_query(src, qv, {start, stop});
auto res2 = trivial(src, {start, stop});
//std::cout << size << ": " << res1 << ' ' << res2 << '\n';
if (res2 != res1)
++failed;
if (res2 != None)
{
++majority;
if (res2 != 0)
++nonzero;
}
}
std::cout << "majority elements = " << 100. * majority / NQueries << "%\n";
std::cout << " nonzero elements = " << 100. * nonzero / NQueries << "%\n";
std::cout << " queries = " << NQueries << '\n';
std::cout << " failed = " << failed << '\n';
return 0;
}
Related work:
As pointed out in another answer to this question, there is other work where this problem is already solved: "Range majority in constant time and linear space" by S. Durocher, M. He, I. Munro, P. K. Nicholson, and M. Skala.
The algorithm presented in that paper has better asymptotic complexity for query time, O(1) instead of O(log n), and for space, O(n) instead of O(n log n).
The better space complexity allows that algorithm to process larger data sets (compared to the algorithm proposed in this answer). Less memory needed for the preprocessed data and a more regular data access pattern most likely allow that algorithm to preprocess data more quickly. But it is not so easy with query time...
Let's suppose we have input data most favorable to the algorithm from the paper: n = 1000000000 (it's hard to imagine a system with more than 10..30 gigabytes of memory, in year 2013).
The algorithm proposed in this answer needs to process up to 120 elements (2 query boundaries * 2 * log n) for each query. But it performs very simple operations, similar to linear search, and it sequentially accesses two contiguous memory areas, so it is cache-friendly.
The algorithm from the paper needs to perform up to 20 operations (2 query boundaries * 5 candidates * 2 wavelet tree levels) for each query. This is 6 times fewer, but each operation is more complex. Each query to the succinct representation of bit counters itself contains a linear search (which means 20 linear searches instead of one). Worst of all, each such operation has to access several independent memory areas (unless the query size, and therefore the quadruple size, is very small), so the query is cache-unfriendly. This means each query (while a constant-time operation) is pretty slow, probably slower than in the algorithm proposed here. The smaller the input array, the greater the chances that the algorithm proposed here is quicker.
A practical disadvantage of the algorithm in the paper is the wavelet tree and succinct bit counter implementation. Implementing them from scratch may be pretty time-consuming. Using a pre-existing implementation is not always convenient.
the trick
When looking for a majority element, you may discard intervals that do not have a majority element. See Find the majority element in array. This allows you to solve this quite simply.
preparation
At preparation time, recursively keep dividing the array into two halves and store these array intervals in a binary tree. For each node, count the occurrences of each element in its array interval. You need a data structure that offers O(1) inserts and reads; I suggest an unordered_multiset, which on average behaves as needed (though worst-case inserts are linear). Also check whether the interval has a majority element and store it if it does.
runtime
At runtime, when asked to compute the majority element for a range, dive into the tree to compute the set of intervals that covers the given range exactly. Use the trick to combine these intervals.
If we have array interval 7 5 5 7 7 7, with majority element 7, we can split off and discard 5 5 7 7 since it has no majority element. Effectively the fives have gobbled up two of the sevens. What's left is an array 7 7, or 2x7. Call this number 2 the majority count of the majority element 7:
The majority count of a majority element of an array interval is the
occurrence count of the majority element minus the combined occurrence
of all other elements.
Use the following rules to combine intervals to find the potential majority element:
Discard the intervals that have no majority element
Combining two arrays with the same majority element is easy, just add up the element's majority counts. 2x7 and 3x7 become 5x7
When combining two arrays with different majority elements, the higher majority count wins. Subtract the lower majority count from the higher to find the resulting majority count. 3x7 and 2x3 become 1x7.
If their majority elements are different but have equal majority counts, disregard both arrays. 3x7 and 3x5 cancel each other out.
When all intervals have been either discarded or combined, you are either left with nothing, in which case there is no majority element, or you have one combined interval containing a potential majority element. Look up and add this element's occurrence counts in all array intervals (also the previously discarded ones) to check whether it really is the majority element.
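As a sketch, the combine step can be written down directly, with each interval summarized as a (majority element, majority count) pair and count 0 meaning "no majority element":

struct Summary {
    int element;  // candidate majority element
    int count;    // its majority count; 0 means "no majority element"
};

// Combine two interval summaries by the rules above (Boyer-Moore style).
Summary combine(Summary a, Summary b) {
    if (a.count == 0) return b;      // discard interval without majority
    if (b.count == 0) return a;
    if (a.element == b.element)      // same element: add majority counts
        return {a.element, a.count + b.count};
    if (a.count > b.count)           // different elements: higher count wins
        return {a.element, a.count - b.count};
    if (b.count > a.count)
        return {b.element, b.count - a.count};
    return {0, 0};                   // equal counts cancel out
}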
example
For the array 1,1,1,2,2,3,3,2,2,2,3,2,2, you get the tree (majority count x majority element listed in brackets)
1,1,1,2,2,3,3,2,2,2,3,2,2
(1x2)
/ \
1,1,1,2,2,3,3 2,2,2,3,2,2
(4x2)
/ \ / \
1,1,1,2 2,3,3 2,2,2 3,2,2
(2x1) (1x3) (3x2) (1x2)
/ \ / \ / \ / \
1,1 1,2 2,3 3 2,2 2 3,2 2
(1x1) (1x3) (2x2) (1x2) (1x2)
/ \ / \ / \ / \ / \
1 1 1 2 2 3 2 2 3 2
(1x1) (1x1)(1x1)(1x2)(1x2)(1x3) (1x2)(1x2) (1x3) (1x2)
Range [5,10] (1-indexed) is covered by the set of intervals 2,3,3 (1x3) and 2,2,2 (3x2). They have different majority elements. Subtract their majority counts and you're left with 2x2, so 2 is the potential majority element. Look up and sum the actual occurrence counts of 2 in the arrays: 1 + 3 = 4 out of 6. 2 is the majority element.
Range [1,10] is covered by the set of intervals 1,1,1,2,2,3,3 (no majority element) and 2,2,2 (3x2). Disregard the first interval since it has no majority element, so 2 is the potential majority element. Sum the occurrence counts of 2 in all intervals: 2 + 3 = 5 out of 10. There is no majority element.
Actually, it can be done in constant time and linear space(!)
See https://cs.stackexchange.com/questions/16671/range-majority-queries-most-freqent-element-in-range and S. Durocher, M. He, I Munro, P.K. Nicholson, M. Skala, Range majority in constant time and linear space, Information and Computation 222 (2013) 169–179, Elsevier.
Their preparation time is O(n log n), the space needed is O(n) and queries are O(1). It is a theoretical paper and I don't claim to understand all of it but it seems far from impossible to implement. They're using wavelet trees.
For an implementation of wavelet trees, see https://github.com/fclaude/libcds
If you have unlimited memory and a limited data range (like short int), you can do it even in O(N) time.
Go through the array and count the number of 1s, 2s, 3s, etc. (the number of entries for each value you have in the array). You will need an additional array X with one element per possible value of your type for this.
Go through the array X and find the maximum.
In total, O(1) + O(N) operations.
You can also limit yourself to O(N) memory if you use a map instead of the array X.
But then you will need to look up an element on each iteration of stage 1, so you will need O(N * log(N)) time in total.
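A sketch of this counting idea, assuming the values fit in unsigned short (note that, like this answer as a whole, it computes the most frequent element of the whole array, not of a sub-range):

#include <iostream>
#include <vector>

int mostFrequent(const std::vector<unsigned short>& a) {
    std::vector<int> X(65536, 0);  // one counter per possible value
    for (unsigned short v : a) ++X[v];
    int best = 0;
    for (int v = 1; v < (int)X.size(); ++v)  // find value with maximum count
        if (X[v] > X[best]) best = v;
    return best;
}

int main() {
    std::cout << mostFrequent({1, 5, 2, 7, 7, 7, 8, 4, 6}) << '\n';  // 7
}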
You can use a MAX heap, with the frequency of a number as the deciding factor for keeping the max-heap property.
I mean, e.g. for the following input array
1 5 2 7 7 7 8 4 6 5
the heap would have all distinct elements with their frequencies associated with them:
Element = 1 Frequency = 1,
Element = 5 Frequency = 2,
Element = 2 Frequency = 1,
Element = 7 Frequency = 3,
Element = 8 Frequency = 1,
Element = 4 Frequency = 1,
Element = 6 Frequency = 1
As it's a MAX heap, element 7 with frequency 3 would be at the root level.
Just check whether the input range contains this element; if yes, then this is the answer, and if no, then go to the left or right subtree as per the input range and perform the same checks.
O(N) would be required only once, while creating the heap, but once it's created, searching will be efficient.
Edit: Sorry, I was solving a different problem.
Sort the array and build an ordered list of pairs (value, number_of_occurrences) - it's O(N log N). Starting with
1 5 2 7 7 7 8 4 6
it will be
(1,1) (2,1) (4,1) (5,1) (6,1) (7,3) (8,1)
On top of this array, build a binary tree with pairs (best_value_or_none, max_occurrences). It will look like:
(1,1) (2,1) (4,1) (5,1) (6,1) (7,3) (8,1)
\ / \ / \ / |
(0,1) (0,1) (7,3) (8,1)
\ / \ /
(0,1) (7,3)
\ /
(7,3)
This structure definitely has a fancy name, but I don't remember it :)
From here, it's O(log N) to fetch the mode of any interval. Any interval can be split into O(log N) precomputed intervals; for example:
[4, 7] = [4, 5] + [6, 7]
f([4,5]) = (0,1)
f([6,7]) = (7,3)
and the result is (7,3).

Algorithm to determine coin combinations

I was recently faced with a prompt for a programming algorithm that I had no idea what to do for. I've never really written an algorithm before, so I'm kind of a newb at this.
The problem said to write a program to determine all of the possible coin combinations for a cashier to give back as change, based on coin values and the number of coins. For example, there could be a currency with 4 coins: a 2-cent, a 6-cent, a 10-cent and a 15-cent coin. How many combinations of these equal 50 cents?
The language I'm using is C++, although that doesn't really matter too much.
edit: This is a more specific programming question, but how would I parse a string in C++ to get the coin values? They were given in a text document like
4 2 6 10 15 50
(where the numbers in this case correspond to the example I gave)
This problem is well known as the coin change problem. Please check this and this for details. Also, if you Google "coin change" or "dynamic programming coin change", you will find many other useful resources.
Here's a recursive solution in Java:
// Usage: int[] denoms = new int[] { 1, 2, 5, 10, 20, 50, 100, 200 };
//        System.out.println(ways(denoms, denoms.length, 200));
public static int ways(int denoms[], int index, int capacity) {
    if (capacity == 0) return 1;
    if (capacity < 0 || index <= 0) return 0;
    int withoutItem = ways(denoms, index - 1, capacity);
    int withItem = ways(denoms, index, capacity - denoms[index - 1]);
    return withoutItem + withItem;
}
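For comparison, a bottom-up formulation of the same count, sketched in C++ with the question's example denominations (2, 6, 10, 15) and target 50:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> coins{2, 6, 10, 15};
    const int target = 50;
    std::vector<long long> ways(target + 1, 0);
    ways[0] = 1;                           // one way to make 0: use no coins
    for (int c : coins)                    // coin-major order counts each
        for (int s = c; s <= target; ++s)  // combination exactly once
            ways[s] += ways[s - c];
    std::cout << ways[target] << '\n';
}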
This seems somewhat like a Partition, except that you don't use all integers in 1:50. It also seems similar to the bin packing problem, with slight differences:
Wikipedia: Partition (Number Theory)
Wikipedia: Bin packing problem
Wolfram MathWorld: Partition
Actually, after thinking about it, it's an ILP, and thus NP-hard.
I'd suggest a dynamic programming approach. Basically, you'd define a value "remainder" and set it to whatever your goal was (say, 50). Then, at every step, you'd do the following:
Figure out the largest coin that can fit within the remainder.
Consider what would happen if you (A) included that coin or (B) did not include that coin.
For each scenario, recurse.
So if remainder was 50 and the largest coins were worth 25 and 10, you'd branch into two scenarios:
1. Remainder = 25, Coinset = 1x25
2. Remainder = 50, Coinset = 0x25
The next step (for each branch) might look like:
1-1. Remainder = 0, Coinset = 2x25 <-- Note: Remainder=0 => Logged
1-2. Remainder = 25, Coinset = 1x25
2-1. Remainder = 40, Coinset = 0x25, 1x10
2-2. Remainder = 50, Coinset = 0x25, 0x10
Each branch would split into two branches unless:
the remainder was 0 (in which case you would log it)
the remainder was less than the smallest coin (in which case you would discard it)
there were no more coins left (in which case you would discard it since remainder != 0)
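Putting that together, a rough C++ sketch of the branching described above (coin values assumed sorted largest first; function and variable names are illustrative):

#include <iostream>
#include <vector>

// Count combinations by branching: either take one more of coin i
// (branch A) or move past coin i for good (branch B).
int countWays(const std::vector<int>& coins, int i, int remainder) {
    if (remainder == 0) return 1;                       // log a valid combination
    if (remainder < 0 || i == (int)coins.size()) return 0;  // discard
    return countWays(coins, i, remainder - coins[i])    // (A) include coin i
         + countWays(coins, i + 1, remainder);          // (B) exclude coin i
}

int main() {
    std::vector<int> coins{15, 10, 6, 2};  // largest first
    std::cout << countWays(coins, 0, 50) << '\n';
}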
If you have 15-, 10-, 6- and 2-cent coins and you need to find how many distinct ways there are to arrive at 50, you can:
count how many distinct ways you have to reach 50 using only 10, 6 and 2
count how many distinct ways you have to reach 50-15 using only 10, 6 and 2
count how many distinct ways you have to reach 50-15*2 using only 10, 6 and 2
count how many distinct ways you have to reach 50-15*3 using only 10, 6 and 2
Sum up all these results, which are of course distinct (in the first I used no 15c coins, in the second I used one, in the third two and in the fourth three).
So you basically split the problem into smaller problems (with a possibly smaller amount and fewer coins). When you have just one coin type the answer is of course trivial (either you cannot reach the prescribed amount exactly, or you can, in the only possible way).
Moreover, you can avoid repeating the same computation by using memoization. For example, the number of ways to reach 20 using only [6, 2] doesn't depend on whether the already-paid 30 was reached as 15+15 or 10+10+10, so the result of the smaller problem (20, [6, 2]) can be stored and reused.
In Python, the implementation of this idea is the following:
cache = {}

def howmany(amount, coins):
    prob = tuple([amount] + coins)  # Problem signature
    if prob in cache:
        return cache[prob]  # We computed this before
    if amount == 0:
        return 1  # It's always possible to give an exact change of 0 cents
    if len(coins) == 1:
        if amount % coins[0] == 0:
            return 1  # We can match the prescribed amount with this coin
        else:
            return 0  # It's impossible
    total = 0
    n = 0
    while n * coins[0] <= amount:
        total += howmany(amount - n * coins[0], coins[1:])
        n += 1
    cache[prob] = total  # Store in cache to avoid repeating this computation
    return total

print(howmany(50, [15, 10, 6, 2]))
As for the second part of your question, suppose you have that string in the file coins.txt:
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

int main() {
    std::ifstream coins_file("coins.txt");
    std::vector<int> coins;
    std::copy(std::istream_iterator<int>(coins_file),
              std::istream_iterator<int>(),
              std::back_inserter(coins));
}
Now the vector coins will contain the possible coin values.
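Given the format from the question ("4 2 6 10 15 50"), the first number is how many coin values follow and the last is the target amount, so you would then split the parsed numbers up. A possible continuation of the sketch (variable names are illustrative):

// Continuing inside main() from above:
int n = coins.front();      // number of coin values
int target = coins.back();  // amount to make change for
std::vector<int> denoms(coins.begin() + 1, coins.begin() + 1 + n);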
For such a small number of coins you can write a simple brute force solution.
Something like this:
#include <iostream>
#include <vector>
using namespace std;

vector<int> v;

// The example input "4 2 6 10 15 50" means: 4 coin values (2, 6, 10, 15)
// and a target amount of 50.
const int N_COINS = 4;
const int TARGET = 50;

int solve(int total, int *coins, int lastI)
{
    if (total == TARGET)
    {
        for (size_t i = 0; i < v.size(); i++)
        {
            cout << v[i] << ' ';
        }
        cout << "\n";
        return 1;
    }
    if (total > TARGET) return 0;
    int sum = 0;
    for (int i = lastI; i < N_COINS; i++)
    {
        v.push_back(coins[i]);
        sum += solve(total + coins[i], coins, i);
        v.pop_back();
    }
    return sum;
}

int main()
{
    int coins[N_COINS] = {2, 6, 10, 15};
    cout << solve(0, coins, 0) << endl;
}
A very dirty brute force solution that prints all possible combinations.
This is a very famous problem, so try reading about better solutions others have provided.
One rather dumb approach is the following. You build a mapping "coin with value X is used Y times" and then enumerate all possible combinations and select only those which total the desired sum. Obviously, for each value X you have to check Y ranging from 0 up to the desired sum. This will be rather slow, but it will solve your task.
It's very similar to the knapsack problem
You basically have to solve the following equation: 50 = a*2 + b*6 + c*10 + d*15, where the unknowns are a, b, c, d. You can compute, for instance, d = (50 - a*2 - b*6 - c*10)/15, and so on for each variable. Then you start giving d all its possible values (you should start with the variable that has the fewest possible values, here d): 0, 1, 2, 3, and then start giving c all its possible values depending on the current value of d, and so on.
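A direct enumeration sketch of that equation: iterate d, then c, then b, and check whether what remains for a is divisible by 2.

#include <iostream>

int main() {
    int count = 0;
    for (int d = 0; 15 * d <= 50; ++d)
        for (int c = 0; 15 * d + 10 * c <= 50; ++c)
            for (int b = 0; 15 * d + 10 * c + 6 * b <= 50; ++b)
                if ((50 - 15 * d - 10 * c - 6 * b) % 2 == 0)
                    ++count;  // a = remainder / 2 is forced
    std::cout << count << '\n';
}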
Sort the list backwards: [15 10 6 2].
Now a solution for 50 ct can either contain a 15 ct coin or not.
So the number of solutions is the number of solutions for 50 ct using [10 6 2] (no longer considering 15 ct coins) plus the number of solutions for 35 ct (= 50 ct - 15 ct) using [15 10 6 2]. Repeat the process for both sub-problems.
An algorithm is a procedure for solving a problem; it doesn't have to be in any particular language.
First work out the inputs:
typedef int CoinValue;
set<CoinValue> coinTypes;
int value;
and the outputs:
set< map<CoinValue, int> > results;
Solve for the simplest case you can think of first:
coinTypes = { 1 }; // only one type of coin worth 1 cent
value = 51;
the result should be:
results = { [1 : 51] }; // only one solution: 51 one-cent coins
How would you solve the above?
How about this:
coinTypes = { 2 };
value = 51;
results = { }; // there is no solution
what about this?
coinTypes = { 1, 2 };
value = 4;
results = { [2: 2], [2: 1, 1: 2], [1: 4] }; // the order I put the solutions in is a hint to how to do the algorithm.
A recursive solution in Scala, based on the algorithmist.com resource:
def countChange(money: Int, coins: List[Int]): Int = {
  if (money == 0) 1
  else if (money < 0 || coins.isEmpty) 0
  else countChange(money, coins.tail) + countChange(money - coins.head, coins)
}
Another Python version:
def change(coins, money):
    return (
        change(coins[:-1], money) +
        change(coins, money - coins[-1])
        if money > 0 and coins
        else money == 0
    )