How to code the optimal page replacement algorithm? - c++

I am sharing my logic. I need to know if it's correct.
I created an array which stores the total number of occurrences of each page.
For example, if the sequence of page requests is {1,2,3,1,2} (call it the "seq" array),
then the counts are {2,2,1} (call it the "count" array).
Now I iterate through seq, assigning each page a frame until all frames are exhausted (skipping pages that are already in memory). Each time, I push the page number and its remaining number of occurrences into a min priority queue.
for (int i = 1; i <= M; ++i)
{
    if (frameAssigned[seq[i]] != 0) // page already has a frame
    {
        count[seq[i]]--;
        PQ.push(ii(count[seq[i]], seq[i])); // ii = pair<int,int>
        continue;
    }
    if (freeFrames >= 1)
    {
        frameAssigned[seq[i]] = presentFrame++; // presentFrame = 0 initially
        freeFrames--;
        noOfReplacements++;
        count[seq[i]]--;
        PQ.push(ii(count[seq[i]], seq[i]));
        continue;
    }
    // Now, if all free frames are exhausted, I do the following: replace
    // the page which occurs the minimum number of remaining times.
    ii temp = PQ.top();
    PQ.pop();
    int victim = temp.second; // page to evict
    count[seq[i]]--;
    if (count[seq[i]] >= 0) PQ.push(ii(count[seq[i]], seq[i]));
    frameAssigned[seq[i]] = frameAssigned[victim];
    frameAssigned[victim] = 0;
    noOfReplacements++;
}
However, this algorithm seems to be incorrect, and I don't understand why. I found the correct algorithm here, but I can't see where mine goes wrong.

Let us look at the following sequence of page references:
1,2,3,2,3,2,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1
Let us assume that 2 pages can be held in memory. According to your algorithm, when 3 arrives for the first time, 2 will be replaced because the remaining number of occurrences of 1 is quite high; this is not optimal, since 2 is needed again immediately.
In the optimal page replacement algorithm, the criterion for replacement is the time after which the page will be referenced again: evict the page whose next reference lies farthest in the future.
I recommend going through the editorial of this problem http://www.codechef.com/AUG14/problems/CLETAB once it is out.
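To make the criterion concrete, here is a minimal sketch of that policy (often called Belady's algorithm). The function name and the fault counting are my own illustration, not code from the editorial:

// On a fault with full memory, evict the resident page whose next use
// is farthest away (or which is never used again).
#include <unordered_set>
#include <vector>

int countFaults(const std::vector<int>& seq, int frames)
{
    std::unordered_set<int> inMemory;
    int faults = 0;
    for (std::size_t i = 0; i < seq.size(); ++i)
    {
        if (inMemory.count(seq[i])) continue; // hit, nothing to do
        ++faults;
        if ((int)inMemory.size() < frames)
        {
            inMemory.insert(seq[i]);
            continue;
        }
        int victim = -1;
        std::size_t farthest = 0;
        for (int page : inMemory)
        {
            std::size_t next = i + 1; // scan forward for the next use of page
            while (next < seq.size() && seq[next] != page) ++next;
            if (next >= farthest) { farthest = next; victim = page; }
        }
        inMemory.erase(victim);
        inMemory.insert(seq[i]);
    }
    return faults;
}

On the sequence above with 2 frames, this evicts 1 (not 2) when 3 arrives, because 1's next use is farther away.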

Minimum number of elements required to make two bags of at least k weight?

Suppose you are given a number k and an array of objects, each with some weight. Your task is to find the minimum number of objects you can put into two bags such that each bag weighs at least k.
You can only take objects whole; no breaking is allowed. Also, if an object is put in one bag, it cannot be put into the other bag.
This problem seems simple to me. I have done similar problems where you need to fill just one bag. The idea I use is to visit each object and ask: what if I put it in the bag, and what if I don't? You do this recursively until the desired weight is reached or there are no more objects, taking the minimum over the recursive calls.
However, I am not able to understand how to keep track of all the objects used up in bag 1 so that I don't include them in bag 2.
A few test cases:
Desired weight (k) = 4
Number of objects (N) = 1
[10]
Output: -1 (Not possible)
Desired weight (k) = 2
Number of objects (N) = 3
[2,2,2]
Output: 2
I will focus on what you point out as your actual core problem: how to keep track of which objects you used in one bag, in the other bag, or not at all.
Make a list (array, vector, ... whatever container you prefer) and note for each of the objects where you used it - or not.
index | value | meaning
------|-------|------------------
0     | 0     | not used
1     | 0     | not used
2     | 0     | not used
3     | 1     | used in one bag
4     | 2     | used in other bag
From your question it is not clear to me whether all objects have the same weight or different weights given in the input. If the weights are different, then you most likely already have a container for keeping track of the weight of each object. Modifying that container or using a second, very similar one will help you to also store the "used where" information.
I am intentionally not going into detail, because of
How do I ask and answer homework questions?
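For illustration only, a minimal sketch of that bookkeeping inside the brute-force recursive search. All names here (solve, used, placed) are made up for the example, and this is the plain search, not an optimized solution:

// used[i] is 0 (not used), 1 (bag one) or 2 (bag two); reset on backtrack.
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

int solve(const std::vector<int>& w, std::vector<int>& used,
          std::size_t i, int bag1, int bag2, int placed, int k)
{
    if (bag1 >= k && bag2 >= k) return placed; // both bags heavy enough
    if (i == w.size()) return INT_MAX;         // dead end on this path
    used[i] = 1; // try object i in bag one
    int best = solve(w, used, i + 1, bag1 + w[i], bag2, placed + 1, k);
    used[i] = 2; // try object i in bag two
    best = std::min(best, solve(w, used, i + 1, bag1, bag2 + w[i], placed + 1, k));
    used[i] = 0; // leave object i out
    best = std::min(best, solve(w, used, i + 1, bag1, bag2, placed, k));
    return best;
}

Called with used initialized to all zeros: for {2,2,2} and k = 2 it returns 2, matching the second test case; for {10} and k = 4 it returns INT_MAX, which you would map to -1.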
I don't know if this answers your question or not, but still...
You can do one thing: initially make two empty arrays, say Bag_1 and Bag_2. As you recurse through the elements one by one, pop each element out of the array and append it to Bag_1 or Bag_2, whichever gives you the optimal solution. If the process is to be done multiple times, creating a copy of the original array might help, if the length of the array is reasonable.
Here is the pseudocode for the program, without dynamic programming.
sort(a, a + n); // sort the array of objects by weight
int sum = a[n-1], count = -1; // initialise running sum and answer
unordered_set<int> log; // achievable bag-one sums (a set skips repeated values, reducing work)
log.insert(a[n-1]); // start with the last (largest) element alone
for (int i = n - 2; i >= 0; i--) {
    sum += a[i]; // total weight of all objects considered so far
    unordered_set<int> temp; // new sums created by a[i]; merged into log afterwards
    temp.insert(a[i]); // a[i] on its own is a possible bag-one sum
    for (auto it = log.begin(); it != log.end(); ++it) { // loop over all sums seen so far
        temp.insert(*it + a[i]); // extend each existing sum with a[i]
        if ((a[i] + *it >= k) && (sum - a[i] - *it >= k)) { // both bags reach weight k?
            count = n - i; // number of objects used; this is the answer
            break;
        }
        if (a[i] >= k && sum - a[i] >= k) { // a[i] alone can fill one bag
            count = n - i;
            break;
        }
    }
    if (count != -1) { // answer found; stop (count stays -1 if impossible)
        break;
    }
    log.insert(temp.begin(), temp.end()); // merge the new sums into the main set
}
cout << count << endl; // print the answer

Why does the longest-prefix-which-is-also-suffix calculation in KMP have a time complexity of O(n) and not O(n^2)?

I was going through the code of KMP when I noticed the part that computes the longest prefix which is also a suffix (LPS). Here is how it goes:
void computeLPSArray(char* pat, int M, int* lps)
{
    int len = 0;
    lps[0] = 0;
    int i = 1;
    while (i < M) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }
        else {
            if (len != 0) {
                len = lps[len - 1]; // <---- I am referring to this part
            }
            else {
                lps[i] = 0;
                i++;
            }
        }
    }
}
The part that confused me is the one marked with a comment in the code above. Now, we do know that when code contains a loop like the following,
int a[m];
memset(a, 0, sizeof(a));
for (int i = 0; i < m; i++) {
    for (int j = i; j >= 0; j--) {
        a[j] = a[j] * 2; // this inner loop causes the same cells in the
                         // one-dimensional array to be visited more than once
    }
}
The complexity comes out to be O(m*m).
Similarly, if we write the above LPS computation in the following form,
while (i < M) {
    if { .... }
    else {
        if (len != 0) {
            // doesn't this part cause the code to again go back a few
            // elements in the LPS array, the same way the inner loop in
            // my nested for loop above does? Shouldn't that mean the same
            // cell in the array is visited more than once, so the
            // complexity should increase to O(M^2)?
        }
    }
}
It might be that the way I think complexities are calculated is wrong. So please clarify.
The if expressions do not take time that grows with len: len is an integer, reading it takes O(1) time, and array indexing is O(1).
Visiting something more than once does not, by itself, put you in a higher O class. It only does so if the total visit count grows faster than k*n for every constant k.
If you carefully analyze the algorithm for building the prefix table, you may notice that the total number of rolled-back positions is at most m, so the upper bound on the total number of iterations is 2*m, which yields O(m).
The value of len grows alongside the main iterator i, and whenever there is a mismatch, len drops back toward zero; but this drop cannot exceed the interval traversed by the main iterator i since the start of the match.
For example, let's say, the main iterator i started matching with len at position 5 and mismatched at position 20.
So,
LPS[5]=1
LPS[6]=2
...
LPS[19]=15
At the moment of mismatch, len has a value of 15. Hence it may roll back at most 15 positions, down to zero, which is equal to the interval passed by i while matching. In other words, on every mismatch, len travels back no more than i has traveled forward since the start of the match.
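As a quick sanity check (an illustration, not a proof), you can instrument the loop and count its iterations against the 2*M bound; the pattern below is arbitrary:

#include <cstdio>
#include <cstring>

int main()
{
    const char* pat = "aabaaabaaac"; // arbitrary test pattern
    int M = (int)std::strlen(pat);
    int lps[64] = {0};
    int len = 0, i = 1, iterations = 0;
    while (i < M) {
        ++iterations; // count every pass, including rollbacks
        if (pat[i] == pat[len]) { lps[i++] = ++len; }
        else if (len != 0)      { len = lps[len - 1]; }
        else                    { lps[i++] = 0; }
    }
    std::printf("M = %d, iterations = %d, bound 2*M = %d\n", M, iterations, 2 * M);
    return 0;
}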

Finding the sequence so that the event is finished at the earliest

This is a problem from an informatics olympiad that I have been trying to solve for some time. It is important to me because it contains an underlying fundamental pattern that I see in a lot of problems.
Given N citizens at an event, each citizen has to program on a single computer, then eat chocolates, then eat doughnuts, in that order. The time the ith citizen takes for each task is given as input. Any number of people can eat chocolates or doughnuts at the same time, but since there is only one computer, only one person can program at a time; once a person finishes programming, they move on to chocolates and the next person starts programming. The task is to find the order in which citizens should be sent to program so that the event ends at the earliest possible time, and to output that time.
I worked this problem using the approach:
If I start with the ith citizen, and t_{n-1} is the answer for the remaining n-1 citizens, then t_n = max(n_i[0] + n_i[1] + n_i[2], n_i[0] + t_{n-1}). E.g., for
18 7 6
23 10 27
20 9 14
starting with the first citizen gives finish times 18+7+6 = 31, 18+23+10+27 = 78 and 18+23+20+9+14 = 84, so the event ends at 84; but if you start with the citizen whose row is 23 10 27 (order 23, 20, 18), the event ends at 74, which is less.
I implemented this approach; the code is below. However, its complexity is O(n!). I can see repeated subproblems, so I could use a DP approach, but the problem is that I would need to store the time for every sublist from i to j beginning with each k between i and j, and that storage would itself be on the order of n!. How do I solve this problem, and similar problems?
Here is the program for my approach:
#include <iostream>
#include <vector>
#include <climits>

int min_time_sequence(std::vector<std::vector<int> > Info, int N)
{
    if (N == 0) return 0;
    if (N == 1)
        return Info[0][0] + Info[0][1] + Info[0][2];
    int mn = INT_MAX;
    for (int i = 0; i < N; ++i)
    {
        // prepare a fresh list without citizen i
        std::vector<std::vector<int> > tmp = Info;
        tmp.erase(tmp.begin() + i);
        int rest = min_time_sequence(tmp, N - 1);
        int v1 = Info[i][0] + rest;                    // the others finish after i programs
        int v2 = Info[i][0] + Info[i][1] + Info[i][2]; // citizen i finishes everything
        int larger = v1 > v2 ? v1 : v2;
        if (mn > larger) mn = larger;
    }
    return mn;
}

int main()
{
    int N;
    std::cin >> N;
    std::vector<std::vector<int> > Info(N, std::vector<int>(3));
    // input
    for (int i = 0; i < N; ++i)
    {
        std::cin >> Info[i][0];
        std::cin >> Info[i][1];
        std::cin >> Info[i][2];
    }
    int mx = 0;
    if (N > 0)
        mx = min_time_sequence(Info, N);
    std::cout << mx << std::endl;
    return 0;
}
Since you asked for general techniques, you might want to look at greedy algorithms, that is, algorithms that repeatedly make the locally optimal selection. In this case, that might be to always send next, among the remaining people, the person who will take the longest total time (the sum of the three times), so he or she will finish eating sooner, and no one who starts later will take more time.
If such an algorithm were optimal, the program could simply sort the list by the sum of times, in decreasing order, which takes O(N log N) time.
You would, however, be expected to prove that your solution is valid. One way to do that is known as “Greedy Stays Ahead.” That is an inductive proof where you show that the solution your greedy algorithm produces is at least as optimal (by some measure equivalent to optimality at the final step) at its first step, then that it is also as good at its second step, the step after that, and so on. Hint: you might try measuring what is the worst-case scenario for how much time the event could need after each person starts programming. At the final step, when the last person gets to start programming, this is equivalent to optimality.
Another method to prove an algorithm is optimal is “Proof by Exchange.” This is a form of proof by contradiction in which you hypothesize that some different solution is optimal, then you show that exchanging a part of that solution with a part of your solution could improve the supposedly-optimal solution. That contradicts the premise that it was ever optimal—which proves that no other solution is better than this. So: assume the optimal order is different, meaning the last person who finishes started after someone else who took less time. What happens if you switch the positions of those two people?
Greedy solutions are not always best, so in cases where they are not, you would want to look at other techniques, such as symmetry-breaking and pruning the search tree early.
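For concreteness, here is a sketch of that greedy suggestion, with the simulation of the finish time factored out. The struct and function names are made up for the example, and whether this sort key is actually optimal is exactly the proof obligation discussed above:

#include <algorithm>
#include <iostream>
#include <vector>

struct Citizen { int prog, choc, dough; };

int finishTime(const std::vector<Citizen>& order)
{
    int clock = 0, finish = 0;
    for (const Citizen& c : order) {
        clock += c.prog;                                     // the computer is sequential
        finish = std::max(finish, clock + c.choc + c.dough); // eating happens in parallel
    }
    return finish;
}

int main()
{
    std::vector<Citizen> v = {{18, 7, 6}, {23, 10, 27}, {20, 9, 14}};
    std::sort(v.begin(), v.end(), [](const Citizen& a, const Citizen& b) {
        return a.prog + a.choc + a.dough > b.prog + b.choc + b.dough;
    });
    std::cout << finishTime(v) << std::endl; // prints 74 for this example
    return 0;
}

On the asker's example this prints 74, matching the better ordering found by hand.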

find duplicate number in an array

I am debugging the problem below and posting the solution I am working on. The same or a similar solution is posted on a couple of forums, but I think it has a bug when nums[0] = 0, or in general when nums[x] = x. Am I correct? Please feel free to correct me if I am wrong.
Given an array nums containing n + 1 integers, where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assuming there is only one duplicate number, find it.
Note:
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be better than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.
int findDuplicate3(vector<int>& nums)
{
    if (nums.size() > 1)
    {
        int slow = nums[0];
        int fast = nums[nums[0]];
        while (slow != fast)
        {
            slow = nums[slow];
            fast = nums[nums[fast]];
        }
        fast = 0;
        while (fast != slow)
        {
            fast = nums[fast];
            slow = nums[slow];
        }
        return slow;
    }
    return -1;
}
Below is my code which uses Floyd's cycle-finding algorithm:
#include <iostream>
#include <vector>
using namespace std;

int findDup(vector<int>& arr)
{
    int len = arr.size();
    if (len > 1)
    {
        int slow = arr[0];
        int fast = arr[arr[0]];
        while (slow != fast)
        {
            slow = arr[slow];
            fast = arr[arr[fast]];
        }
        fast = 0;
        while (slow != fast)
        {
            slow = arr[slow];
            fast = arr[fast];
        }
        return slow;
    }
    return -1;
}

int main()
{
    vector<int> v = {1, 2, 2, 3, 4};
    cout << findDup(v) << endl;
    return 0;
}
Comment: This works because zeroes aren't allowed, so the first element of the array isn't part of a cycle, and therefore the first element of the first cycle we find is pointed to both from outside and from inside the cycle. If zeroes were allowed, this would fail whenever arr[0] was on a cycle, e.g. [0,1,1].
The sum of the integers from 1 to N is (N * (N + 1)) / 2. You can use this to find the duplicate: sum the integers in the array, then subtract the above formula from that sum; the difference is the duplicate. For example, for {1,2,2,3,4}: N = 4, the array sums to 12, and 12 - (4 * 5) / 2 = 2, the duplicate.
Update: the above solution is based on the (possibly invalid) assumption that the input array consists of exactly the values from 1 to N plus a single duplicate occurring once.
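A minimal sketch of that idea, valid only under the assumption stated in the update (the function name is illustrative):

#include <numeric>
#include <vector>

int findDupBySum(const std::vector<int>& nums)
{
    long long n = (long long)nums.size() - 1; // values are 1..n, plus one duplicate
    long long total = std::accumulate(nums.begin(), nums.end(), 0LL);
    return (int)(total - n * (n + 1) / 2);    // excess over the expected sum
}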
Start with two pointers to the first element: fast and slow.
Define a 'move' as incrementing fast by 2 steps (positions) and slow by 1.
After each move, check if slow and fast point to the same element.
If there is a loop, at some point they will. This is because, once they are both in the loop, fast is moving twice as quickly as slow and will eventually 'run into' it.
Say they meet after k moves. The element where they meet is NOT NECESSARILY the repeated element, since it might not be the first element of the loop reached from outside the loop.
Call this element X.
Notice that fast has stepped 2k times, and slow has stepped k times.
Move fast back to the start (position zero).
Repeatedly advance fast and slow by ONE STEP EACH, comparing after each step.
Notice that after another k steps, slow will have moved a total of 2k steps and fast a total of k steps from the start, so they will again both be pointing to X.
Notice that if the prior step is on the loop for both of them, they were both pointing to X-1. If the prior step was only on the loop for slow, then they were pointing to different elements.
Ditto for X-2, X-3, ...
So in going forward, the first time they are pointing to the same element is the first element of the cycle reached from outside the cycle, which is the repeated element you're looking for.
Since you cannot use any additional space, using another hash table is ruled out.
Coming to the approach of hashing on the existing array: it can be achieved if we are allowed to modify the array in place.
Algorithm:
1) Start with the first element.
2) Hash the first element and apply a transformation to the value at the hashed position. Let's say this transformation is making the value negative.
3) Proceed to the next element. Hash the element and, before applying the transformation, check whether a transformation has already been applied.
4) If yes, then the element is a duplicate.
Code:
for (int i = 0; i < size; i++)
{
    if (arr[abs(arr[i])] > 0)
        arr[abs(arr[i])] = -arr[abs(arr[i])]; // mark position arr[i] as seen
    else
        cout << abs(arr[i]) << endl;          // already marked: arr[i] is a duplicate
}
This transformation is required because, if we use the hashing approach, hashing the same key twice must produce a detectable collision.
I can't think of a way to use hashing without any additional space and without modifying the array.

Insertion Sort Optimization

I'm trying to practice writing some different sort functions, and the insertion sort I came up with is giving me some trouble. I can sort lists of fewer than 30K elements fairly quickly, but I have a list of 100K integers and it literally takes 15 minutes for the function to complete the sort. Everything is sorted correctly, but I don't believe it should take that long.
Am I missing something with my code that is making it take so long? Many thanks in advance.
void Sort::insertion_Sort(vector<int> v)
{
    int vecSize = v.size();
    // advance through the vector
    for (int i = 0; i < vecSize; i++)
    {
        // declare some variables
        int cursor = i;
        int inputCursor = i - 1;
        // check whether we are considering only a single element
        if (cursor > 0)
        {
            // if there is more than 1 element, test the following:
            // 1. is the cursor element less than the inputCursor element
            //    (the previous element)?
            // 2. is the input cursor greater than -1?
            while (inputCursor > -1 && v[cursor] < v[inputCursor])
            {
                // if so, swap the elements, then move the cursors back to
                // check the previous element and see if we need to swap again
                int temp = v[cursor];
                v[cursor] = v[inputCursor];
                v[inputCursor] = temp;
                inputCursor--;
                cursor--;
            }
        }
    }
}
Insertion sort is an O(n^2) algorithm. It's slow for large inputs. It's going to take roughly 11 times longer to process a list of 100k items than a list of 30k items. For inputs larger than 20 or so, you should use something like quicksort, which is O(n*log(n)).
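In C++ the practical stand-in is the standard library's sort, which runs in O(n log n); a minimal usage example:

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v = {42, 7, 19, 3, 88, 1};
    std::sort(v.begin(), v.end()); // O(n log n)
    for (int x : v) std::cout << x << ' ';
    std::cout << std::endl; // 1 3 7 19 42 88
    return 0;
}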
The O(n^2) vs O(n log n) problem, as pointed out in the other answer, is the heart of the issue. I would suggest a binary search for the insertion point, as it stays close to the insertion-sort structure and is simpler to implement than a full rewrite. It would look for the point of insertion by dividing the already-sorted part of the vector in half, checking whether the integer to be inserted is greater than the middle element, and then splitting the chosen half again, recursively.
I think this is the best approach short of starting from scratch.
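A minimal sketch of that suggestion using the standard library (upper_bound for the binary search, rotate for the shift). Note that the element shifts still make it O(n^2) overall; only the comparisons drop to O(log n) per insert:

#include <algorithm>
#include <iostream>
#include <vector>

void binaryInsertionSort(std::vector<int>& v)
{
    for (std::size_t i = 1; i < v.size(); ++i) {
        // position of the first element greater than v[i] in the sorted prefix
        auto pos = std::upper_bound(v.begin(), v.begin() + i, v[i]);
        // shift the tail of the prefix right by one and drop v[i] into place
        std::rotate(pos, v.begin() + i, v.begin() + i + 1);
    }
}

int main()
{
    std::vector<int> v = {5, 2, 9, 1, 5, 6};
    binaryInsertionSort(v);
    for (int x : v) std::cout << x << ' ';
    std::cout << std::endl; // 1 2 5 5 6 9
    return 0;
}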