Why do CouchDB reduce functions receive 'keys' as an argument - mapreduce

With a CouchDB reduce function:
function(keys, values, rereduce) {
// ...
}
That gets called like this:
reduce( [[key1,id1], [key2,id2], [key3,id3]], [value1,value2,value3], false )
Question 1
What is the reason for passing keys to the reduce function? I have only written relatively simple CouchDB views with reduce functions and would like to know what the use case is for receiving a list of [key1, docid], [key2, docid], etc.
Also, is there ever a time when key1 != key2 != keyX when a reduce function executes?
Question 2
CouchDB's implementation of MapReduce allows for rereduce=true, in which case the reduce function is called like this:
reduce(null, [intermediate1,intermediate2,intermediate3], true)
Where the keys argument is null (unlike when rereduce=false). Why would there not be a use case for a keys argument in this case if there was a use for when rereduce=false?

What is the use case of keys argument when rereduce = true?
There isn't one. That's why the keys argument is null in this case.
From the documentation (emphasis added):
Reduce and Rereduce Functions
redfun(keys, values[, rereduce])
Arguments:
keys – Array of pairs of key-docid for related map function results. Always null if rereduce is running (has true value).
values – Array of map function result values.
rereduce – Boolean flag to indicate a rereduce run.
Perhaps what you're meaning to ask is: Why is the same function used for both reduce and rereduce? I expect there's some history involved, but I can also imagine that it's because it's quite common that the same logic can be used for both functions, and by not having separate function definitions duplication can be reduced. Suppose a simple sum reduce function:
function(keys, values) {
return sum(values);
}
Here both keys and rereduce can be ignored entirely. Many other (re)reduce functions follow the same pattern. If two functions had to be used, then this identical function would have to be specified twice.
In response to the additional question in comments:
what use cases exist for the keys argument when rereduce=false?
Remember, keys and values can be anything, based on the map function. A common pattern is to emit([foo,bar,baz],null). That is to say, the value may be null, if all the data you care about is already present in the key. In such a case, any reduce function more complex than a simple sum would require use of the keys.
Further, for grouping operations, using the keys makes sense. Consider a map function with emit(doc.countryCode, ... ), a possible (incomplete) reduce function:
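function(keys, values, rereduce) {
  const sums = {};
  if (!rereduce) {
    // keys is an array of [key, docid] pairs; count each emitted key,
    // starting from 0 so the first increment does not produce NaN
    keys.forEach(([key]) => { sums[key] = (sums[key] || 0) + 1; });
  }
  // the rereduce case is not handled here
  return sums;
}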
Then given documents:
{"countryCode": "us", ...}
{"countryCode": "us", ...}
{"countryCode": "br", ...}
You'd get emitted values (from the map function) of:
["us", ...]
["us", ...]
["br", ...]
You'd get a reduced result of:
{"us": 2, "br": 1}

Related

Drools iterate an object list and for all objects in the list, sum up the value of a field of the object

I am trying to build a drools rule where the fact supplied has a user supplied int value. Fact also has a list of objects that have an expected int value. I need to check if the user supplied value is less than sum of the individual objects expected value.
Here is my sample fact class -
public class Order {
private int orderId;
private List<OrderItem> orderItems;
private int usersIntegrationFee;
}
The OrderItem class contained in the fact class is -
public class OrderItem {
private int orderItemId;
private Product prod;
}
And the Product class is the one that has productIntegrationFee field-
public class Product {
private String mktProdId;
private String prodName;
private int productIntegrationFee;
}
I wish to check if Order.usersIntegrationFee is less than sum of all Order.OrderItem.Product.productIntegrationFee.
Is there a way to do this in drools? Any help would be much appreciated.
What you need to use is a drools built-in utility called accumulate. This utility allows you to iterate over a collection of objects on the right hand side while "accumulating" some sort of variable value. Some common use cases involve sums, means, counts, and so on. Basically the construct loops over each item in a collection and performs some defined action on each item.
While you can define your own custom accumulate function, Drools supports several built-in accumulate functions -- one of which is sum. This built-in function is what you need for your particular use case.
The general format of accumulate is this: accumulate( <source pattern>; <functions> [;<constraints>] ). Your rule would look something like this (pseudo-code, since I don't have Drools on this computer; the syntax should be close or exact, but typos may exist):
rule "Integration fee is less than sum of product fees"
when
  // extract the variables from the Order
  Order( $userFee: usersIntegrationFee,
         $items: orderItems )
  // use the accumulate function to sum up the product fees
  $productFees: Integer( this > $userFee ) from
    accumulate( OrderItem( $product: prod ) from $items,
                Product( $fee: productIntegrationFee ) from $product;
                sum($fee) )
then
  // do something
end
Some things to note. In a single accumulate, you can do multiple things -- sum, average, and so on. But if you're only invoking one accumulate function like here, you can use the syntax I've shown (Integer( conditions ) from accumulate( ... ).) If you had multiple functions, you'd have to assign the outputs directly (eg. $sum: sum($fee), etc.)
Finally there's a third optional parameter to accumulate which I've omitted since your use case is quite simple and doesn't need it. The third parameter applies filtering (called 'constraints') so that the accumulate functions skip over items that don't meet this criteria. For example, you could add a constraint to ignore productIntegrationFee values that are negative like this: $fee > 0.
A final note about the syntax I chose in this rule. Since the use case was "trigger the rule if the usersIntegrationFee is less than the sum," I put the comparison directly in the Integer( ... ) from accumulate. You could, of course, do a comparison separately, for example like Integer( $productFees > this ) from $userFee or whatever other format of comparison you like. This way just seemed simplest.
The Drools documentation has more information about this utility. I've linked to the section that discusses elements in the 'when' clause; scroll down a bit to see the documentation for accumulate directly.

is there a hashing function that satisfies the following

Is there a hashing algorithm that satisfies the following?
Let "hash_funct" be a hashing function that takes two arguments and returns a hash value, such that all of the following hold:
Hash1 = hash_funct(arg1, arg2) <=> hash_funct(Hash1, arg1) = hash_funct(Hash1, arg2) = Hash1;
Can anyone point me to such an algorithm? Or, if it doesn't exist, can anyone collaborate with me to invent it?
More explanation:
Imagine a set S={A,B,C,D} and the hashing function above.
If we can compute Hash1 = hash_funct(A,B,C,D), then we can check whether an element X is in the set by evaluating hash_funct(Hash1,X) == Hash1 ? belongs to the set : doesn't belong
With this property, checking the existence of an element in a set becomes O(1) instead of O(NlogN).
I suppose the highest common factor (HCF) fits here. Let a and b be two numbers with x as their highest common factor.
hcf(a,b) = x.
This means a = x*m and b = x*n. This clearly means that:
hcf(x,x*m) = hcf(x,x*n) = hcf(x*n,x*m) = x
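The identities above can be checked with a few lines of code. This is only a sketch with arbitrarily chosen numbers; it also shows that the check holds in one direction only, since any multiple of x passes whether or not it was one of the original inputs.

```javascript
// Euclid's algorithm for the highest common factor of two positive integers.
function hcf(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

const a = 12, b = 18;          // arbitrary example inputs
const x = hcf(a, b);           // 6
console.log(hcf(x, a) === x);  // true
console.log(hcf(x, b) === x);  // true

// False positive: 24 was never one of the inputs, yet it passes the same test.
console.log(hcf(x, 24) === x); // true
```

So the HCF satisfies the forward direction of the requested property, but it cannot serve as an exact membership test.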
What you are looking for is a cryptographic accumulator. Accumulators are currently popular in digital currencies.
From Wikipedia:
A cryptographic accumulator is a one-way membership function. It answers a query as to whether a potential candidate is a member of a set without revealing the individual members of the set.
For example, this paper:
We show how to use the RSA one-way accumulator to realize an efficient and dynamic authenticated dictionary, where untrusted directories provide cryptographically verifiable answers to membership queries on a set maintained by a trusted source
With a straightforward accumulator-based scheme:
Query: ask for a proof of membership.
Verification: check the validity of the answer.
Updates: insertions and deletions are available.

Sort List with two parameters but in different lines of code

I have a method which sorts a List of Clients.
There are two parameters I want to sort by.
I am well aware of the
List.OrderBy(x => x. [...]).ThenBy...
Method but my situation is a bit different:
switch(mySortType)
{
case SortType.mostVisits:
clients = clients.OrderBy(x => x. and so on...)
break;
More cases with different sortTypes
}
After that I would like to sort by a Boolean parameter as well, but only if the settings tell the method to do so via another bool.
Is there a way to say something like
clients = clients.OrderBy(Current).ThenBy(x => x.param).ToList()?
I could of course have an if() in every case
case SortType.MostVisits:
if(settingsTellsMeToDoSo)
clients = clients.OrderBy(x=>x.numberOfVisits).ThenBy(x => x.param).ToList();
else clients = clients.OrderBy(x => x.numberOfVisits).ToList();
and then use the simple ThenBy method, but I thought that there has to be an easier way to do it.
I hope you got my question right. I am not sure whether or not I was able to explain well enough...
regards,
Eric
Since the problem is to decide which lambda to put into OrderBy (as opposed to the rest of the lambdas, which go into ThenBy) you can use a little trick, and put all your lambdas (or no lambdas if no sorting is to be performed) into ThenBy. Since ThenBy is an extension method on IOrderedEnumerable<T>, you need a way to make a regular enumerable into an ordered enumerable without disturbing the original order.
You can make an IOrderedEnumerable<T> from your IEnumerable<T> (or IOrderedQueryable<T> from IQueryable<T>, depending on your situation) by applying a dummy sort on a constant, like this:
IOrderedEnumerable<Client> orderedClients = list.OrderBy(c => 1);
Now you can apply ThenBy repeatedly as needed. This approach is similar in nature to making a dummy search condition when dealing with a chain of AND operations which may contain zero elements.
I am not sure what you are trying to simplify. You can't remove the decision making from your code, but you could arrange the sort statements better:
IOrderedEnumerable<Client> orderedClients;
case SortType.MostVisits:
orderedClients = clients.OrderBy(x => x.numberOfVisits);
if(settingsTellsMeToDoSo)
orderedClients = orderedClients.ThenBy(x => x.param);
break;
then after the switch is ended:
clients = orderedClients.ToList();

Lazy evaluation of expression in Elixir

I'm trying to figure out if there is a macro similar to delay in clojure to get a lazy expression/ variable that can be evaluated later.
The use case is a default value for Map.get/3, since the default value comes from a database call, I'd prefer it to be called only when it's needed.
Elixir's macros could be used to write a simple wrapper for conditional evaluation. I've put one in the following gist, though there may be a better or smarter way.
https://gist.github.com/parroty/98a68f2e8a735434bd60
"Generic" laziness is a bit of a tough nut to crack because it's a fairly broad question. Streams allow laziness for enumerables but I'm not sure what laziness for an expression would mean. For example what would a lazy form of x = 1 + 2 be? When would it be evaluated?
The thought that comes to mind for a lazy form of an expression is a procedure expression:
def x, do: 1 + 2
Because the value of x wouldn't be calculated until the expression is actually invoked (as far as I know). I'm sure others will correct me if I'm wrong on that point. But I don't think that's what you want.
Maybe you want to rephrase your question--leaving out streams and lazy evaluation of enumerated values.
One way to do this would be using processes. For example, the map could be wrapped in a process like a GenServer or an Agent, where the default value is evaluated lazily.
The default value can be a function which makes the expensive call. If Map.get/3 isn't being used to return functions you can check if the value is a function and invoke it if it is returned. Like so:
def default_value() do
  expensive_db_call()
end

def get_something(dict, key) do
  # pass the function itself as the default, not the result of calling it
  case Map.get(dict, key, &default_value/0) do
    value when is_function(value, 0) ->
      value.() # invoke the default function and return the result of the call
    value ->
      value # key must have existed, return value
  end
end
Of course if the map contains functions this type of solution probably won't work.
Also check Elixir's Stream module. While I don't know that it would help solve your particular problem it does allow for lazy evaluation. From the documentation:
Streams are composable, lazy enumerables. Any enumerable that generates items one by one during enumeration is called a stream. For example, Elixir’s Range is a stream:
More information is available in the Stream documentation.
Map.get_lazy and Keyword.get_lazy hold off on generating the default until it is needed. Links to the documentation are below:
https://hexdocs.pm/elixir/Map.html#get_lazy/3
https://hexdocs.pm/elixir/Keyword.html#get_lazy/3
You can wrap it in an anonymous function, then it will be evaluated when the function is called:
iex()> lazy = fn -> :os.list_env_vars() end
#Function<45.79398840/0 in :erl_eval.expr/5>
iex()> lazy.()

Compare two bson_t in C / C++

I need to compare two bson_t documents. I found that two bson_t documents may hold their key-value pairs in a different order, for example {"key1": "val1", "key2" : "val2"} and {"key2": "val2", "key1" : "val1"}. In my project these count as the same, but bson_compare() and bson_equal() report them as different. How can I solve this in C/C++?
By the way, how to sort these key-value pairs in C or C++?
Thanks
bson_compare and bson_equal compare the raw content buffers of the two documents, not their logical values. They use memcmp internally, so two documents that are logically equal (x == y) do not necessarily satisfy memcmp(x, y) == 0.
Two methods:
(1) It is easy to do this in Python. Write a Python function and call it from the C++ program.
(2) Use bson_iter_t to iterate over each key-value pair in the bson_t and compare recursively.
The second method seems more complex, but I decided to use it. I have already finished part of it.
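The recursive logic of method (2) can be sketched independently of libbson. The example below works on plain JavaScript objects rather than bson_t, so it only illustrates the comparison order; documentsEqual is a hypothetical name, and a C version would walk both documents with bson_iter_t instead. It assumes that key order in documents is irrelevant while element order in arrays still matters.

```javascript
// Sketch: order-insensitive comparison of nested documents.
function documentsEqual(x, y) {
  if (x === y) return true; // identical scalars or the same reference
  if (typeof x !== "object" || typeof y !== "object" || x === null || y === null) {
    return false; // differing scalars, or a scalar compared with a document
  }
  if (Array.isArray(x) !== Array.isArray(y)) return false;
  if (Array.isArray(x)) {
    // Arrays: same length and pairwise equal elements, in order.
    return x.length === y.length && x.every((item, i) => documentsEqual(item, y[i]));
  }
  // Documents: same set of keys, and equal values under each key.
  const keysX = Object.keys(x);
  const keysY = Object.keys(y);
  if (keysX.length !== keysY.length) return false;
  return keysX.every(
    (k) => Object.prototype.hasOwnProperty.call(y, k) && documentsEqual(x[k], y[k])
  );
}

const first = { key1: "val1", key2: { a: 1, b: [1, 2] } };
const second = { key2: { b: [1, 2], a: 1 }, key1: "val1" };
console.log(documentsEqual(first, second)); // true
```

Sorting the key-value pairs of both documents first and then comparing them in sequence would be an alternative to the per-key lookup used here.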