MapReduce pseudocode explanation

I saw this pseudocode while studying how to design a MapReduce algorithm. The pseudocode is meant to compute the mean of the values associated with the same key. I need help understanding how the algorithm works:
class MAPPER
    method INITIALIZE
        S ← new AssociativeArray
        C ← new AssociativeArray
    method MAP(string t, integer r)
        S{t} ← S{t} + r
        C{t} ← C{t} + 1
    method CLOSE
        for all term t ∈ S do
            EMIT(term t, pair (S{t}, C{t}))
class REDUCER
    method REDUCE(string t, pairs [(s1, c1), (s2, c2) …])
        sum ← 0
        cnt ← 0
        for all pair (s, c) ∈ pairs [(s1, c1), (s2, c2) …] do
            sum ← sum + s
            cnt ← cnt + c
        ravg ← sum / cnt
        EMIT(string t, integer ravg)
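
To see how the pieces fit together, here is a minimal single-process Python sketch of the same idea. Each mapper keeps a running sum and count per key, emits one (sum, count) pair per key when it closes, and the reducer adds the partial pairs before dividing. The names Mapper, reduce_mean and run_job are made up for illustration, the "shuffle" is just a dictionary, and the mean is returned as a float rather than the integer the pseudocode emits.

```python
from collections import defaultdict


class Mapper:
    """One mapper instance handles one input split."""

    def __init__(self):
        self.sums = defaultdict(int)    # S in the pseudocode
        self.counts = defaultdict(int)  # C in the pseudocode

    def map(self, t, r):
        # Nothing is emitted here; values are only accumulated locally.
        self.sums[t] += r
        self.counts[t] += 1

    def close(self):
        # Emit one (sum, count) pair per key once the split is exhausted.
        for t in self.sums:
            yield t, (self.sums[t], self.counts[t])


def reduce_mean(t, pairs):
    """Combine partial (sum, count) pairs for key t into a mean."""
    total = 0
    cnt = 0
    for s, c in pairs:
        total += s
        cnt += c
    return t, total / cnt


def run_job(splits):
    """Simulate the job: one Mapper per split, then group by key, then reduce."""
    shuffled = defaultdict(list)
    for split in splits:
        mapper = Mapper()
        for t, r in split:
            mapper.map(t, r)
        for t, pair in mapper.close():
            shuffled[t].append(pair)
    return dict(reduce_mean(t, pairs) for t, pairs in shuffled.items())
```

Summing the partial sums and partial counts separately is what makes the result correct: averaging per-mapper averages would weight every mapper equally, regardless of how many values it saw.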

Related

How to find N points on an infinite axis so that sum of distances from M points to its nearest N is smallest?

Consider N houses on a single road and M light poles, with M < N. The distances between adjacent houses are all different. A light pole can be placed only at a house. I have to place all the light poles so that the sum of distances from each house to its nearest light pole is smallest. How can I code this problem?
After a little research I learned that I have to use dynamic programming for this problem, but I don't know how to apply it here.

Here's a naive dynamic program with search space O(n^2 * m). Perhaps others know of another speedup? The recurrence should be clear from the function f in the code.
JavaScript code:
// We can calculate these in O(1)
// by using our prefixes (ps) and
// the formula for a subarray, (j, i),
// reaching for a pole at i:
//
//   ps[i] - ps[j-1] - (A[i] - A[j-1]) * j
//
// Examples:
//   A:  [1,2,5,10]
//   ps: [0,1,7,22]
//   (2, 3) =>
//     22 - 1 - (10 - 2) * 2
//     = 5
//     = 10-5
//   (1, 3) =>
//     22 - 0 - (10 - 1) * 1
//     = 13
//     = 10-5 + 10-2
function sumParts(A, j, i, isAssigned){
  let result = 0
  for (let k=j; k<=i; k++){
    if (isAssigned)
      result += Math.min(A[k] - A[j], A[i] - A[k])
    else
      result += A[k] - A[j]
  }
  return result
}

function f(A, ps, i, m, isAssigned){
  if (m == 1 && isAssigned)
    return ps[i]
  const start = m - (isAssigned ? 2 : 1)
  const _m = m - (isAssigned ? 1 : 0)
  let result = Infinity
  for (let j=start; j<i; j++)
    result = Math.min(
      result,
      sumParts(A, j, i, isAssigned)
        + f(A, ps, j, _m, true)
    )
  return result
}

var A = [1, 2, 5, 10]
var m = 2
var ps = [0]
for (let i=1; i<A.length; i++)
  ps[i] = ps[i-1] + (A[i] - A[i-1]) * i

var result = Math.min(
  f(A, ps, A.length - 1, m, true),
  f(A, ps, A.length - 1, m, false))

console.log(`A: ${ JSON.stringify(A) }`)
console.log(`ps: ${ JSON.stringify(ps) }`)
console.log(`m: ${ m }`)
console.log(`Result: ${ result }`)

I got you covered bud. I will explain the dynamic programming algorithm first; if you are not able to code it, let me know.
A is the array containing the points, so that A[i]-A[i-1] is the distance between A[i] and A[i-1]. A[0] is the first point. When you do top-down memoization, you have to handle two cases: either you place a light pole at the current house, or you place it at a lower index. If you place it now, you recurse with one less light pole available and calculate the sum of distances to the previous houses. You handle the base case when you have no light poles left or you are done with all the houses.
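
A minimal Python sketch of one way to set up such a memoized DP follows. It is not necessarily the exact recurrence the answer has in mind: it uses the common reformulation in which the sorted houses are split into contiguous groups, each served by a single pole at the group's median house. The function names are hypothetical, and it assumes the number of poles does not exceed the number of houses.

```python
from functools import lru_cache


def min_total_distance(houses, poles):
    """Minimum sum of house-to-nearest-pole distances (poles sit on houses)."""
    a = sorted(houses)
    n = len(a)
    prefix = [0]  # prefix[t] = a[0] + ... + a[t-1]
    for x in a:
        prefix.append(prefix[-1] + x)

    def one_pole_cost(lo, hi):
        # Cost of serving houses a[lo..hi] with one pole at their median house.
        mid = (lo + hi) // 2
        left = a[mid] * (mid - lo) - (prefix[mid] - prefix[lo])
        right = (prefix[hi + 1] - prefix[mid + 1]) - a[mid] * (hi - mid)
        return left + right

    @lru_cache(maxsize=None)
    def best(count, k):
        # Minimum cost for the first `count` houses using exactly k poles.
        if count == 0:
            return 0 if k == 0 else float("inf")
        if k == 0 or k > count:
            return float("inf")
        # The last group is a[j..count-1]; the first j houses get k-1 poles.
        return min(
            best(j, k - 1) + one_pole_cost(j, count - 1)
            for j in range(k - 1, count)
        )

    return best(n, poles)
```

For the houses [1, 2, 5, 10] and 2 poles this gives 4 (poles at 2 and 10). There are O(n * m) states with O(n) transitions each, so the running time is O(n^2 * m), the same order as the naive dynamic program above.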

Efficient Algorithm to Answer Subarray Queries fast

The other day I encountered a problem related to queries, but I can't solve it.
Given an array with N integers and a positive integer M, you must answer Q queries. Each query is characterized as ( i , j ), where i and j are each indices of the array. In each query you must answer how many pairs ( r , s ) exist such that
i <= r <= s <= j
the sum of the array elements with indices in [ r , s ] is divisible by M.
Limits:
N <= 50,000
Q <= 50,000
M <= 100
I have a dynamic programming solution that preprocesses every query ( r , s ) in O( N^2 ), but that is not fast enough. Is there a more efficient solution? I have some ideas with Mo's algorithm, or with segment trees, but I can't get it.

Calculate the prefix sums Sum[i] of the original array (assuming it's 1-based) for every i = 1..N, with Sum[0] = 0.
If Sum[r] and Sum[s] are congruent modulo M for two indices r < s, then the sum of the array elements with indices in [r+1, s] is divisible by M (so we need to count such congruent pairs within an interval). The time complexity of this step is O(N).
Precalculate the array Count for every i = 0..N, j = 0..M-1:
Count[i][j] stores the number of indices len <= i for which Sum[len] mod M equals j. The time complexity of this step is O(N*M).
For every query (i, j) the answer is computed as follows:
For every possible remainder k, find D(k), the number of indices len in [i-1, j] for which Sum[len] mod M equals k; this is Count[j][k] - Count[i-2][k] (taking Count[-1][k] = 0). The lower bound is i-1 because a subarray starting at r uses the prefix Sum[r-1]. Then add to the result the number of possible pairs of interval boundaries, which is D(k)*(D(k)-1)/2. Time complexity: O(M) for every query.
Complexity: O(N) + O(N*M) + O(Q*M) = O((Q+N)*M), that would be ok for given constraints.
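
The steps above can be sketched in Python roughly as follows. The function names are hypothetical, queries are 1-based and inclusive, and the count table is shifted by one row (counts[t] covers prefix indices below t) so that no negative index is needed.

```python
def build_counts(arr, m):
    """counts[t][k] = number of prefix indices p < t with Sum[p] % m == k."""
    residues = [0]  # Sum[0] = 0
    for value in arr:
        residues.append((residues[-1] + value) % m)
    counts = [[0] * m]
    for r in residues:
        row = counts[-1][:]  # copy the previous row: O(M) per prefix index
        row[r] += 1
        counts.append(row)
    return counts


def answer_query(counts, i, j):
    """Number of pairs (r, s), i <= r <= s <= j, whose sum is divisible by m."""
    total = 0
    for k in range(len(counts[0])):
        # Prefix indices i-1 .. j are the valid interval boundaries.
        d = counts[j + 1][k] - counts[i - 1][k]
        total += d * (d - 1) // 2
    return total
```

For arr = [1, 2, 3] and M = 3, the query (1, 3) gives 3 (the subarrays [1, 2], [3] and [1, 2, 3]) and the query (2, 3) gives 1. The table holds about N*M integers, so for the stated limits a compact array type may be preferable to nested Python lists.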

First note that for any subarray (r, s) that sums to a multiple of M:
sum(r, s) == sum(i, s) - sum(i, r - 1)
== (qa * M + ra) - (qb * M + rb)
where ra and rb are both less than M and greater than or equal to 0 (i.e. the respective remainders after dividing by M).
Now sum(r, s) is divisible by M, so its remainder after dividing by M is 0. Therefore:
ra == rb
If we calculate the remainders after dividing the sums of the subarrays (i, i), (i, i + 1), ... , (i, j) by M, and store their counts in an array R of size M so that R[k] is the number of remainders equal to k, then:
R[0] == the number of subarrays starting at i that are divisible by M
and for every k >= 0 and k < M such that R[k] > 1 we can count R[k] choose 2:
(R[k] * (R[k] - 1)) / 2
subarrays not starting at i that are divisible by M.
Computing and summing all these values gives us the answer in O( N + M ) for each (i, j) query.
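
As a rough illustration of this per-query approach, here is a short Python sketch. The function name is made up and the query indices are taken to be 1-based and inclusive.

```python
def count_divisible_subarrays(arr, m, i, j):
    """Count pairs (r, s), i <= r <= s <= j, with sum(arr[r..s]) divisible by m."""
    remainder_counts = [0] * m  # R in the text above
    running = 0
    for idx in range(i - 1, j):  # 0-based positions of elements i..j
        running = (running + arr[idx]) % m
        remainder_counts[running] += 1
    total = remainder_counts[0]  # subarrays that start at i
    for c in remainder_counts:
        total += c * (c - 1) // 2  # subarrays that start after i
    return total
```

With up to 50,000 queries over up to 50,000 elements this per-query scan can be too slow on its own, but it is handy as a reference to cross-check a faster solution.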

Range Update - Range Query using Fenwick Tree

http://ayazdzulfikar.blogspot.in/2014/12/penggunaan-fenwick-tree-bit.html?showComment=1434865697025#c5391178275473818224
For example, suppose the value of the function f(i) at index i is i^k, for a fixed k >= 0. We are given queries like the following:
Add v to array[i] for all a <= i <= b.
Determine the total of array[i] * f(i) over all a <= i <= b (remember the definition of the function above).
To solve this, the prefix total can be written in the form Query(x) = m * g(x) - c, where g(x) is f(1) + f(2) + ... + f(x) and Query(x) is the total over 1 <= i <= x.
To accomplish this, we need to know the values of m and c, and for that we need 2 separate BITs. Consider each update of the form a b v. Calculating the value of m is virtually identical to Range Update - Point Query. For each value of i we get the following cases:
i < a, m = 0
a <= i <= b, m = v
b < i, m = 0
With this observation, it is clear that Range Update - Point Query can be used on one of the BITs. To calculate the value of c, we again consider the cases for each value of i:
i < a, then c = 0
a <= i <= b, then c = v * g(a - 1)
b < i, then c = -v * (g(b) - g(a - 1)), since the update contributes v * (g(b) - g(a - 1)) to Query(i) while m = 0
Again, we need Range Update - Point Query, but in a different BIT.
Oh, and as a little help, here are the values of g(x) for k <= 3:
k = 0 -> x
k = 1 -> x * (x + 1) / 2
k = 2 -> x * (x + 1) * (2x + 1) / 6
k = 3 -> (x * (x + 1) / 2) ^ 2
Now, an example problem: SPOJ - Horrible Queries. This problem is the same as the one described above, with k = 0. Note also that some problems are more extreme, where the function is not a single power k but a polynomial, e.g. LA - Alien Abduction Again. To solve such a problem, we make a separate BIT for the m of each power. The BITs for the c values can simply be combined into one.
How can we use this concept if:
Given an array of integers A1, A2, …, AN.
Given x, y: add 1×2 to Ax, add 2×3 to Ax+1, add 3×4 to Ax+2, add 4×5 to Ax+3, and so on until Ay.
Then return the sum of the range [Ax, Ay].
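
To make the quoted two-BIT technique concrete, here is a minimal Python sketch for a single weight function f with prefix sums g. It only illustrates the quoted idea (adding a constant v over a range), not the progressive update asked about. The class names are hypothetical, indices are 1-based, and g(0) is assumed to be 0. Under Query(x) = m * g(x) - c, an update adds v * (g(b) - g(a - 1)) to every prefix beyond b, so the c counter holds the negative of that amount there.

```python
class Fenwick:
    """Plain BIT: point add, prefix sum (1-based)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s


class WeightedRangeBIT:
    """Range add of v; range query of sum(array[i] * f(i)), with g = prefix sums of f."""

    def __init__(self, n, g):
        self.n = n
        self.g = g
        self.m_bit = Fenwick(n)  # point query gives m at position x
        self.c_bit = Fenwick(n)  # point query gives c at position x

    def range_add(self, a, b, v):
        self.m_bit.add(a, v)
        self.c_bit.add(a, v * self.g(a - 1))
        if b + 1 <= self.n:
            self.m_bit.add(b + 1, -v)
            self.c_bit.add(b + 1, -v * self.g(b))

    def prefix_query(self, x):
        # Query(x) = m * g(x) - c
        return self.m_bit.prefix(x) * self.g(x) - self.c_bit.prefix(x)

    def range_query(self, a, b):
        return self.prefix_query(b) - self.prefix_query(a - 1)
```

For example, with n = 5 and g(x) = x * (x + 1) // 2 (that is, k = 1), calling range_add(2, 4, 3) and then range_query(3, 4) returns 3*3 + 3*4 = 21.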

Efficient C/C++ algorithm on 2-dimensional max-sum window

I have a c[N][M] matrix where I apply a max-sum operation over a (K+1)² window. I am trying to reduce the complexity of the naive algorithm.
In particular, here's my code snippet in C++:
int N,M,K;
std::cin >> N >> M >> K;
std::pair< unsigned , unsigned > opt[N][M];
unsigned c[N][M];
// Read values for c[i][j]
// Initialize all opt[i][j] at (0,0).
for ( int i = 0; i < N; i ++ ) {
    for ( int j = 0; j < M ; j ++ ) {
        unsigned max = 0;
        int posX = i, posY = j;
        for ( int ii = i; (ii >= i - K) && (ii >= 0); ii -- ) {
            for ( int jj = j; (jj >= j - K) && (jj >= 0); jj -- ) {
                // Ignore the (i,j) position
                if (( ii == i ) && ( jj == j )) {
                    continue;
                }
                if ( opt[ii][jj].second > max ) {
                    max = opt[ii][jj].second;
                    posX = ii;
                    posY = jj;
                }
            }
        }
        opt[i][j].first = opt[posX][posY].second;
        opt[i][j].second = c[i][j] + opt[posX][posY].first;
    }
}
The goal of the algorithm is to compute opt[N-1][M-1].
Example: for N = 4, M = 4, K = 2 and:
c[N][M] = 4 1 1 2
          6 1 1 1
          1 2 5 8
          1 1 8 0
... the result should be opt[N-1][M-1] = {14, 11}.
The running complexity of this snippet is however O(N M K²). My goal is to reduce the running time complexity. I have already seen posts like this, but it appears that my "filter" is not separable, probably because of the sum operation.
More information (optional): this is essentially an algorithm which develops the optimal strategy in a "game" where:
Two players lead a single team in a N × M dungeon.
Each position of the dungeon has c[i][j] gold coins.
Starting position: (N-1,M-1) where c[N-1][M-1] = 0.
The active player chooses the next position to move the team to, from position (x,y).
The next position can be any of (x-i, y-j), i <= K, j <= K, i+j > 0. In other words, they can move only left and/or up, up to a step K per direction.
The player who just moved the team gets the coins in the new position.
The active player alternates each turn.
The game ends when the team reaches (0,0).
Optimal strategy for both players: maximize their own sum of gold coins, if they know that the opponent is following the same strategy.
Thus, opt[i][j].first represents the coins of the player who will now move from (i,j) to another position. opt[i][j].second represents the coins of the opponent.

Here is an O(N * M) solution.
Let's fix the lower row (r). If the maximum over all rows between r - K and r is known for every column, this problem can be reduced to the well-known sliding window maximum problem. So it is possible to compute the answer for a fixed row in O(M) time.
Let's iterate over all rows in increasing order. For each column, finding the maximum over all rows between r - K and r is a sliding window maximum problem, too. Processing each column takes O(N) time for all rows.
The total time complexity is O(N * M).
However, there is one issue with this solution: it does not exclude the (i, j) element. It is possible to fix this by running the algorithm described above twice (with K * (K + 1) and (K + 1) * K windows) and then merging the results (a (K + 1) * (K + 1) square without a corner is the union of two rectangles of size K * (K + 1) and (K + 1) * K).
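
The sliding window maximum building block can be sketched in Python as follows. This is only an illustration on a static matrix, with hypothetical function names: it computes the maximum over the trailing (K+1) x (K+1) window ending at each cell, and it neither excludes the (i, j) element nor interleaves the computation with the opt recurrence, which the full solution would have to do.

```python
from collections import deque


def sliding_max(values, k):
    """out[i] = max(values[max(0, i - k) : i + 1]), computed in O(len(values))."""
    out = []
    dq = deque()  # indices whose values are strictly decreasing
    for i, v in enumerate(values):
        while dq and values[dq[-1]] <= v:
            dq.pop()  # dominated by the newer, larger-or-equal value
        dq.append(i)
        if dq[0] < i - k:
            dq.popleft()  # the front index fell out of the window
        out.append(values[dq[0]])
    return out


def window_max_2d(grid, k):
    """Max over rows i-k..i and columns j-k..j for every cell (i, j)."""
    by_row = [sliding_max(row, k) for row in grid]
    by_col = [sliding_max(list(col), k) for col in zip(*by_row)]
    return [list(row) for row in zip(*by_col)]
```

Each element enters and leaves the deque at most once, so one pass over the rows plus one pass over the columns costs O(N * M) in total, independent of K.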

Calculate n where a^n mod m = 1?

What is the fastest way to calculate the smallest n satisfying the equation
a^n mod m = 1
Here a, n and m can be prime or composite, and
mod is the modulus operator.

What is wrong with the direct way:
int mod_order(int m, int a) {
    long long an = a % m;  // a^n mod m; 64-bit so that an * a cannot overflow
    for (int n = 1; n != m; n++, an = an * a % m)
        if (an == 1) return n;
    return -1;
}

If gcd(a,m)>1, then there is no such n. (Obvious)
Otherwise, if m is prime, n=m-1. (Proof)
Otherwise (and as more general case), n=ф(m), where ф is Euler's totient function. (Proof)
As you can see, computing ф(m) is essentially the same as factoring m. This can be done in sqrt(m) time or faster, depending on how sophisticated an algorithm you use. A simple one:
int phi(int m){
    int result = m;
    for(int d=2; (long long)d*d<=m; ++d){
        if(m%d != 0) continue;
        while(m%d == 0) m /= d;   // strip every factor of the prime d
        result -= result/d;       // multiply result by (1 - 1/d)
    }
    if(m > 1) result -= result/m; // one prime factor above sqrt(m) may remain
    return result;
}
Upd: My bad. a^(ф(m)) = 1 (mod m), but there can be a smaller value of n (for a=1, n=1, no matter what m is; for a=14, m=15, n=2). n is a divisor of ф(m), but efficiently computing the least possible n seems to be tricky. The task can be divided by using this theorem (the minimal n is the least common multiple of the orders modulo the respective factors of m). But when m is prime or has a big enough prime divisor, and there is only one a (as opposed to computing n for many different a with the same m), we're kind of out of options. You may want to look at 1, 2.
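
Since the smallest n divides ф(m), one common approach is to start from ф(m) and keep dividing out prime factors for as long as the power still equals 1. The Python sketch below assumes trial-division factoring is fast enough for the inputs at hand; the function names are made up for illustration, and it returns None when no such n exists.

```python
from math import gcd


def factorize(n):
    """Prime factorization by trial division: {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def euler_phi(m):
    result = m
    for p in factorize(m):
        result -= result // p  # multiply by (1 - 1/p)
    return result


def multiplicative_order(a, m):
    """Smallest n >= 1 with a^n mod m == 1, or None if there is none."""
    if m <= 1 or gcd(a, m) != 1:
        return None
    order = euler_phi(m)  # a^phi(m) == 1 (mod m), so the answer divides this
    for p in factorize(order):
        # Drop a factor p while the smaller exponent still yields 1.
        while order % p == 0 and pow(a, order // p, m) == 1:
            order //= p
    return order
```

For a = 14 and m = 15 this starts from ф(15) = 8 and reduces it to 2, matching the example in the update. The cost is dominated by factoring m and ф(m); each three-argument pow call is a fast modular exponentiation.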
