Is there a hashing algorithm that satisfies the following?
Let hash_funct be a hashing function that takes two arguments and returns a hash value, so that all of the following holds:
Hash1 = hash_funct(arg1, arg2) <=> hash_funct(Hash1, arg1) = hash_funct(Hash1, arg2) = Hash1;
Can anyone point me to this algorithm? Or, if it doesn't exist, would anyone collaborate with me to invent it?
More explanation:
Imagine a set S = {A, B, C, D} and the hashing function above.
If we can compute Hash1 = hash_funct(A, B, C, D), then we can check whether an element X is in the set by evaluating hash_funct(Hash1, X) == Hash1 ? belongs to the set : doesn't belong.
With this property, checking the existence of an element in a set becomes O(1) instead of O(N) (or O(log N) for a sorted set).
I suppose the highest common factor (HCF, i.e. the GCD) will fit right here. Let a and b be two numbers with x as their highest common factor.
hcf(a, b) = x.
This means a = x*m and b = x*n, with m and n coprime. This clearly means that:
hcf(x, x*m) = hcf(x, x*n) = hcf(x*n, x*m) = x
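To make this concrete, here is a minimal Python sketch of the HCF/GCD idea (the set values are made up for illustration). One caveat: the membership check it yields is only approximate, because any multiple of the accumulated GCD also passes the test.

from functools import reduce
from math import gcd

def hash_funct(a, b):
    # gcd satisfies the requested property: gcd(gcd(a, b), a) == gcd(a, b)
    return gcd(a, b)

S = [12, 18, 30]               # example set
hash1 = reduce(hash_funct, S)  # "hash" of the whole set: gcd(12, 18, 30) == 6

print(hash_funct(hash1, 12) == hash1)  # True: 12 is in the set
print(hash_funct(hash1, 7) == hash1)   # False: 7 is rejected
print(hash_funct(hash1, 24) == hash1)  # True, although 24 is NOT in the set (false positive)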
What you are looking for is a cryptographic accumulator. Currently, accumulators are very popular with digital coins.
From Wikipedia:
A cryptographic accumulator is a one-way membership function. It answers a query as to whether a potential candidate is a member of a set without revealing the individual members of the set.
For example, this paper:
We show how to use the RSA one-way accumulator to realize an efficient and dynamic authenticated dictionary, where untrusted directories provide cryptographically verifiable answers
to membership queries on a set maintained by a trusted source.
With a Straightforward Accumulator-Based Scheme:
Query: ask for a proof of membership.
Verification: check the validity of the answer.
Updates: insertions and deletions are available.
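To give a flavor of how this works, here is a toy Python sketch of the RSA-style accumulator. The modulus, base, and prime-encoded elements are made-up toy values (a real accumulator needs a properly generated RSA modulus and a hash-to-prime mapping), so treat this purely as an illustration.

N = 3233   # toy RSA modulus (61 * 53); far too small for real use
g = 2      # public base

members = [3, 5, 7]            # set elements, already encoded as primes

# Accumulate the whole set: acc = g^(3*5*7) mod N
acc = pow(g, 3 * 5 * 7, N)

def witness(x, elems):
    # The witness for x is the accumulator of all the *other* elements.
    exp = 1
    for m in elems:
        if m != x:
            exp *= m
    return pow(g, exp, N)

def verify(accumulator, x, w):
    # x is a member iff raising its witness to the power x reproduces the accumulator.
    return pow(w, x, N) == accumulator

w5 = witness(5, members)
print(verify(acc, 5, w5))      # True: 5 is in the set
print(verify(acc, 11, w5))     # False: 11 has no valid membership witness

The query/verification split above maps directly onto this: the trusted source publishes acc, a query is answered with a witness such as w5, and verification is the single modular exponentiation in verify.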
A nice feature in Chapel is that it distinguishes between the domain of an array and its distribution. What is the best way to check that two arrays have the same domain and distribution (which one often wants)?
The best I can see is to check D1==D2 and D1.dist==D2.dist, if D1 and D2 are both domains.
In particular, consider the following code:
use BlockDist;  // provides newBlockDom
const Dom = {1..5, 1..5};
const BDom = newBlockDom(Dom);
var x : [Dom] int;
var y : [BDom] int;
test(x,y);
proc test(a : [?Dom1] int, b : [Dom1] int) {
}
This compiles and runs just fine, which makes sense if the query syntax in the function declaration just tests for domain equality, but not for distribution equality (even though Dom1 also knows how a is distributed). Is the only way to check for distribution equality in this case to do a.domain.dist == b.domain.dist?
To check whether two domains describe the same distributed index set in Chapel, you're correct that you'd use D1 == D2 and D1.dist == D2.dist. Domain equality in Chapel checks whether two domains describe the same index set, so is independent of the domain maps / distributions. Similarly, an equality check between two domain maps / distributions checks whether they distribute indices identically.
Note that in Chapel, both domains and distributions have a notion of identity, so if you created two distributed domains as follows:
var BDom1 = newBlockDom(Dom),
    BDom2 = newBlockDom(Dom);
they would pass the above equality checks, yet be distinct domain values. In some cases, it might be reasonable to wonder whether two domain expressions refer to the identical domain instance, but I believe there is no official user-facing way to do this in Chapel today. If this is of interest, it would be worth filing a feature request on our GitHub issues page.
With respect to your code example:
const Dom = {1..5, 1..5};
const BDom = newBlockDom(Dom);
var x : [Dom] int;
var y : [BDom] int;
test(x,y);
proc test(a : [?Dom1] int, b : [Dom1] int) {
}
there is a subtlety going on here that requires some explanation. First, note that if you reverse the arguments to your test() routine, it will not compile, behaving perhaps more like what you were expecting (TIO):
test(y,x);
The reason for this is that domains which don't have an explicit domain map are treated specially in formal array arguments. Specifically, in defining Chapel, we didn't want to have a formal argument that was declared like X here:
proc foo(X: [1..n] real) { ... }
require that the actual array argument be non-distributed / have the default domain map. In other words, we wanted the user to be able to pass in a Block- or Cyclic-distributed array indexed from 1..n so that the formal was constraining the array's index set but not its distribution. Conversely, if a formal argument's domain is defined in terms of an explicit domain map, like:
proc bar(X: [BDom] int) { ... }
(using your Block-distributed definition of BDom above), it requires the actual array argument to match that domain.
An effect of this is that in your example, since Dom1 was matched to a domain with the default domain map, b is similarly loosely constrained to have the same index set but with any distribution, whereas when the first actual argument is block-distributed (as in my call), Dom1 encodes that distribution and applies the constraint to b.
If your reaction to this is that it feels confusing / asymmetric, I'm inclined to agree. I believe we have discussed treating declared/named domains differently from anonymous ones in this regard (since it was the anonymity of the domain in X: [1..n] that we were focused on when adopting this rule, and its application to queried domains like Dom1 in cases like this is something of a side effect of the current implementation). Again, a GitHub issue would be completely fair game for questioning / challenging this behavior.
I have a function which will fail if there has been any change to the term/list it is using since the generation of that term/list. I would like to avoid checking that each parameter is still the same. So I thought that each time I generate the term/list I could compute a CRC or something similar. Before making use of it I would compute the CRC again, so I can be 99.9999% sure the term/list is still the same.
Getting to a specific answer: I am programming in Erlang, and I am thinking of using a function of the following type:
-spec(list_crc32(List :: [term()]) -> CRC32 :: integer()).
I use term() because it is a list of terms (Erlang already has fast default CRC libraries, but only for binary values). I have considered using "erlang:crc32(term_to_binary(Term))", but I am not sure if there could be a better approach.
What do you think?
Regards, Borja.
Without more context it is a little bit difficult to understand why you would have this problem, particularly since Erlang terms are immutable -- once assigned no other operation can change the value of a variable, not even in the same function.
So if your question is "How do I quickly assert that true = A == A?" then consider this code:
A = generate_list(),
% other things in this function happen
A = A.
The above snippet will always assert that A is still A, because it is not possible to change A like you might do in, say, Python.
If your question is "How do I assert that the value of a new list generated exactly the same value as a different known list?" then using either matching or an actual assertion is the fastest way:
start() ->
    A = generate_list(),
    assert_loop(A).

assert_loop(A) ->
    ok = do_stuff(),
    A = generate_list(),
    assert_loop(A).
The assert_loop/1 function above is forcing an assertion that the output of generate_list/0 is still exactly A. There is no telling what other things in the system might be happening which may have affected the result of that function, but the line A = generate_list() will crash if the list returned is not exactly the same value as A.
In fact, there is no way to change the A in this example, no matter how many times we execute assert_loop/1 above.
Now consider a different style:
compare_loop(A) ->
    ok = do_stuff(),
    case A =:= generate_list() of
        true  -> compare_loop(A);
        false -> terminate_gracefully()
    end.
Here we have given ourselves the option to do something other than crash, but the effect is ultimately the same, as =:= is not merely a test of equality, it is a match test: it requires not just that the two sides evaluate to equal values, but that they actually match.
Consider:
1> 1 == 1.0.
true
2> 1 =:= 1.0.
false
The fastest way to compare two terms will depend partly on the sizes of the lists involved but especially on whether or not you expect the assertion to pass or fail more often.
If the check is expected to fail more often, then the fastest check is to use an assertion with =, an equivalence test with ==, or a match test with =:= instead of using erlang:phash2/1. Why? Because these tests can return false as soon as a non-matching element is encountered -- and if this mismatch occurs near the beginning of the list, then a full traversal of both lists is avoided entirely.
If the check is expected to pass more often, then something like erlang:phash2/1 will be faster, but only if the lists are long, because only one list has to be fully traversed on each iteration (the hash of the original list is already stored). It is possible, though, that on a short list a simple comparison will still be faster than computing a hash, storing it, computing another hash, and then comparing the hashes (obviously). So, as always, benchmark.
A phash2 version could look like:
start() ->
    A = generate_list(),
    Hash = erlang:phash2(A),
    assert_loop(Hash).

assert_loop(Hash) ->
    ok = do_stuff(),
    Hash = erlang:phash2(generate_list()),
    assert_loop(Hash).
Again, this is an assertive loop that will crash instead of exit cleanly, so it would need to be adapted to your needs.
The basic mystery still remains, though: in a language with immutable variables why is it that you don't know whether something will have changed? This is almost certainly a symptom of an underlying architectural problem elsewhere in the program -- either that or simply a misunderstanding of immutability in Erlang.
With a CouchDB reduce function:
function(keys, values, rereduce) {
// ...
}
That gets called like this:
reduce( [[key1,id1], [key2,id2], [key3,id3]], [value1,value2,value3], false )
Question 1
What is the reason for passing keys to the reduce function? I have only written relatively simple CouchDB views with reduce functions and would like to know what the use case is for receiving a list of [key1, docid], [key2, docid], etc.
Also, is there ever a time when key1 != key2 != keyX when a reduce function executes?
Question 2
CouchDB's implementation of MapReduce allows for rereduce=true, in which case the reduce function is called like this:
reduce(null, [intermediate1,intermediate2,intermediate3], true)
Where the keys argument is null (unlike when rereduce=false). Why would there not be a use case for a keys argument in this case if there is a use for it when rereduce=false?
What is the use case of keys argument when rereduce = true?
There isn't one. That's why the keys argument is null in this case.
From the documentation (emphasis added):
Reduce and Rereduce Functions
redfun(keys, values[, rereduce])
Arguments:
keys – Array of pairs of key-docid for related map function results. Always null if rereduce is running (has true value).
values – Array of map function result values.
rereduce – Boolean flag to indicate a rereduce run.
Perhaps what you're meaning to ask is: Why is the same function used for both reduce and rereduce? I expect there's some history involved, but I can also imagine that it's because it's quite common that the same logic can be used for both functions, and by not having separate function definitions duplication can be reduced. Suppose a simple sum reduce function:
function(keys, values) {
    return sum(values);
}
Here both keys and rereduce can be ignored entirely. Many other (re)reduce functions follow the same pattern. If two functions had to be used, then this identical function would have to be specified twice.
In response to the additional question in comments:
what use cases exist for the keys argument when rereduce=false?
Remember, keys and values can be anything, based on the map function. A common pattern is to emit([foo,bar,baz],null). That is to say, the value may be null, if all the data you care about is already present in the key. In such a case, any reduce function more complex than a simple sum would require use of the keys.
Further, for grouping operations, using the keys makes sense. Consider a map function with emit(doc.countryCode, ...), and a possible (incomplete) reduce function:
function(keys, values, rereduce) {
    const sums = {};
    if (!rereduce) {
        // each entry in keys is a [key, docid] pair
        keys.forEach(([key, id]) => {
            sums[key] = (sums[key] || 0) + 1;
        });
    }
    return sums;
}
Then given documents:
{"countryCode": "us", ...}
{"countryCode": "us", ...}
{"countryCode": "br", ...}
You'd get emitted values (from the map function) of:
["us", ...]
["us", ...]
["br", ...]
You'd get a reduced result of:
{"us": 2, "br": 1}
I'd like to know what property testing is aiming for, what its sweet spot is, and where it should be used. Let's have an example function that I want to test:
f :: [Integer] -> [Integer]
This function, f, takes a list of numbers and will square the odd numbers and filter out the even numbers. I can state some properties about the function, like
Given a list of even numbers, return empty list.
Given a list of odd numbers, the result list will have the same size as input.
Given that I have a list of even numbers and a list of odd numbers, when I join them, shuffle and pass to the function, the length of the result will be the length of the list of odd numbers.
Given I provide a list of positive odd numbers, then each element in the result list at the same index will be greater than in the original list
Given I provide a list of odd numbers and even numbers, join and shuffle them, then I will get a list, where each number is odd
etc.
None of the properties test that the function works for the simplest case; e.g. I can write a simple incorrect implementation of f that will still pass these properties:
f = fmap (+2) . filter odd
So, if I want to cover some simple cases, it looks like I either need to repeat a fundamental part of the algorithm in the property specification, or I need to use value-based testing. The first option, repeating the algorithm, may be useful if I plan to improve the algorithm or change its implementation, for speed for example. In this way, I have a reference implementation that I can test against.
If I want to check that the algorithm doesn't fail for some trivial cases, and I don't want to repeat the algorithm in the specification, it looks like I need some unit testing. I would write, for example, these checks:
f [2,5] == [25]
f [-8,-3,11,1] == [9,121,1]
Now I have a lot more confidence in the algorithm.
My question is: is property-based testing meant to replace unit testing, or is it complementary? Is there some general idea of how to write the properties so they are useful, or does it totally depend on understanding the logic of the function? I mean, can one say that writing the properties in some particular way is especially beneficial?
Also, should one strive to make the properties test every part of the algorithm? I could move the squaring out of the algorithm and test it elsewhere, letting the properties test just the filtering part, which they seem to cover well.
f :: (Integer -> Integer) -> [Integer] -> [Integer]
f g = fmap g . filter odd
And then I can pass just Prelude.id and test the g elsewhere using unit testing.
How about the following properties:
For every odd number in the source list, its square is an element of the result list.
For all numbers in the result list, there is a number in the source list whose square it is.
By the way, odd is easier to read than \x -> x `mod` 2 == 1.
Reference algorithm
It's very common to have a (possibly inefficient) reference implementation and test against that. In fact, that's one of the most common QuickCheck strategies when implementing numeric algorithms. But not every part of the algorithm needs one. Sometimes there are some properties that characterize the algorithm completely.
Ingo's comment is spot on in that regard: These properties determine the results of your algorithm (up to order and duplicates). To recover order and duplicates you can modify the properties to include "in the resulting list truncated after the position of the source element" and vice versa in the other property.
Granularity of tests
Of course, given Haskell's composability it's nice to test each reasonable small part of an algorithm by itself. I trust e.g. \x -> x*x and filter odd as reference without looking twice.
Whether there should be properties for each part is not as clear, since you might inline that part of the algorithm later and thus make the properties moot. Due to Haskell's laziness that's not a common thing to do, but it happens.
I noticed from some paper that Z3 can do AllSMT. In my project, I have to search for deterministic variables in an SMT formula. By deterministic I mean that the variable can take only one integer value for the formula to be satisfiable. Is there a C++/C API function which can do this task?
The best I can do so far is to call the solver.check() function many times with the negation of each variable I am interested in. Is there a faster way to do this using the API?
Basically, I want to do allsmt and predicate abstraction/projection.
There is no specialized API for checking whether all models have to agree on the value of a given variable. You can implement more or less efficient algorithms on top of Z3 to answer this question.
Here is a possible algorithm:
1. Get a model M from Z3.
2. For the variables you are interested in, assert: Not(And([M.eval(x) == x for x in Vars]))
3. Recheck satisfiability. If the new state is unsatisfiable, then the remaining variables in Vars must take the same value in every model. Otherwise, remove from Vars the variables that evaluate to a new value different from the old M.eval(x), and repeat step 2 until Vars is either empty or the context is unsatisfiable.
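For illustration, here is how that loop might look with Z3's Python API (z3py). The formula in the usage example and the helper name fixed_variables are made up for this sketch; it is only a rough implementation of the algorithm above, not an official API.

from z3 import Solver, Int, Not, And, sat

def fixed_variables(solver, candidates):
    # Return the candidate variables that take the same value in every model
    # of the solver's current assertions.
    if solver.check() != sat:
        return []                           # formula unsatisfiable: nothing to report
    model = solver.model()
    # Pair each candidate with its value in the first model (step 1).
    pairs = [(x, model.eval(x, model_completion=True)) for x in candidates]
    while pairs:
        solver.push()
        # Step 2: ask for a model in which at least one remaining candidate changes value.
        solver.add(Not(And([x == v for (x, v) in pairs])))
        result = solver.check()
        new_model = solver.model() if result == sat else None
        solver.pop()
        if result != sat:
            # Step 3: no such model exists, so the remaining variables are fixed.
            return [x for (x, _) in pairs]
        # Otherwise drop every candidate that just took a different value, and repeat.
        pairs = [(x, v) for (x, v) in pairs
                 if new_model.eval(x, model_completion=True).eq(v)]
    return []

# Hypothetical usage:
x, y = Int('x'), Int('y')
s = Solver()
s.add(x == 3, y >= 0, y <= 5)
print(fixed_variables(s, [x, y]))           # expected output: [x]  (y is not fixed)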