Compare elements of 2 maps - Am I doing this right? - concurrency

Just started working with Go over the weekend and I'm unsure whether I used Go's peculiarities correctly, or whether I haven't done this in a "Go-like" way at all.
The code is supposed to iterate over the elements of a map called non_placed_alleles and compare each of them with all elements in placed_alleles, which are stored in a map as well. I'm trying to use one goroutine for each of the elements in non_placed_alleles, as the comparison is quite costly and takes forever.
Here's a bit from the main-function:
runtime.GOMAXPROCS(8) // For 8 concurrent routines at once? Got 10 CPUs
c := make(chan string)
for name, alleles := range non_placed_alleles {
    go get_best_places(name, alleles, &placed_alleles, c)
    // pointer to placed_alleles as we only read, not write - should be safe?
}
for channel_item := range c {
    fmt.Println("This came back ", channel_item)
}
// This also crashes with "all goroutines are sleeping",
// but all results are printed
And here's the called function:
func get_best_places(name string, alleles []string, placed_alleles *map[string][]string, c chan string) {
    var best_partner string
    // Iterate over all elements of placed_alleles, find best "partner"
    for other_key, other_value := range *placed_alleles {
        best_partner = compare_magic() // omitted because boring
    }
    c <- best_partner
}
Is there any way to make this "better"? Faster? Have I used the pointer-magic and goroutines correctly?

Some observations:
You probably want to use a buffered channel:
c := make(chan string, someNumber) // someNumber >= number of goroutines, approximately
The deadlock stems from ranging over a channel which no one closes (a close is how the range statement terminates). The machinery for coordinating goroutines in Go varies per task. Some would perhaps use a sync.WaitGroup in this case. Please check the example code below the preceding link for hints on the proper use of WaitGroup. An alternative is some other way of counting started vs. finished workers (which WaitGroup encapsulates, but I tend to see it as overkill).
No need to pass pointers to map in this case. Maps in Go have full reference semantics (it's just a pointer under the hood anyway).

Related

Does Golang have something like C++'s decltype?

C++ has decltype(expr). You can declare an object of type of some other expression. For example:
decltype('c') a[4] will declare an array of 4 chars. This is a toy example, but this feature can be useful. Here is some Go code for a UDP server:
conn, err := net.ListenUDP("udp", udp_addr)
...
defer conn.Close()
...
_, err = conn.WriteToUDP(data, addr)
The important thing is that I know what I can do with the (type of the) result of a function (in this case, with a connection, the result of ListenUDP), but I don't know what this type is. Here, because of Go's type inference, I don't need to know. But if I want to create 5 connections, then I'd like an array of 5 "results of ListenUDP". I am not able to do that. The closest I've got is:
ret_type := reflect.TypeOf(net.DialUDP)
first_param_type := reflect.TypeOf(ret_type.Out(0))
my_arr := reflect.ArrayOf(4, first_param_type)
my_arr[0] = nil
But the last line doesn't work. Is there a way to do it in Go?
Go does not have a compile-time equivalent to C++'s decltype.
But Go is a statically typed language: even though there is type inference in the case of a short variable declaration, the types are known at compile time. The result types of net.ListenUDP() are not visible in your source code, but you can look them up just as easily - e.g., it takes 2 seconds to hover over it with your mouse and your IDE will display the signature. Or check online. Or run go doc net.ListenUDP in a terminal.
Signature of net.ListenUDP() is:
func ListenUDP(network string, laddr *UDPAddr) (*UDPConn, error)
So the array type to hold 5 of the returned connections is [5]*net.UDPConn. Also note that it's better and easier to use slices instead of arrays in Go.
So instead I suggest to use a slice type []*net.UDPConn. If you need a slice to hold 5 connections, you can make it using the builtin make() like this: make([]*net.UDPConn, 5).
If you really need to do this dynamically, at runtime, then yes, reflection can do it. This is how it could look:
funcType := reflect.TypeOf(net.ListenUDP)
resultType := funcType.Out(0)
arrType := reflect.ArrayOf(4, resultType)
arrValue := reflect.New(arrType).Elem()
conn := &net.UDPConn{}
arrValue.Index(0).Set(reflect.ValueOf(conn))
fmt.Println(arrValue)
This will output (try it on the Go Playground):
[0xc00000e058 <nil> <nil> <nil>]
See related: Declare multiple variables on the same line with types in Go

Duplicate values in Julia with Function

I need to write a function which takes as input
a = [12,39,48,36]
and produces as output
b=[4,4,4,13,13,13,16,16,16,12,12,12]
where the idea is to repeat each element three times or two times (this should be variable), divided by 2 or 3.
I tried doing this:
c=[12,39,48,36]
a=size(c)
for i in a
    repeat(c[i]/3,3)
end
You need to vectorize the division operator with a dot (.).
Additionally, I understand that you want the results to be Int - you can vectorize the conversion to Int too:
repeat(Int.(a./3), inner=3)
Przemyslaw's answer, repeat(Int.(a./3), inner=3), is excellent and is how you should write your code for conciseness and clarity. Let me in this answer analyze your attempted solution and offer a revised solution which preserves your intent. (I find that this is often useful for educational purposes).
Your code is:
c = [12,39,48,36]
a = size(c)
for i in a
    repeat(c[i]/3, 3)
end
The immediate fix is:
c = [12,39,48,36]
output = Int[]
for x in c
    append!(output, fill(x/3, 3))
end
Here are the changes I made:
You need an array to actually store the output. The repeat function, which you use in your loop, would produce a result, but this result would be thrown away! Instead, we define an initially empty output = Int[] and then append! each repeated block.
Your for loop is iterating over a size tuple (4,), which yields just the single number 4. (You may have misunderstood the purpose of the size function: it is primarily useful for multidimensional arrays.) To fix it, you could use a = 1:length(c) instead of a = size(c). But you don't actually need the index i; you only need the elements x of c directly, so we can simplify the loop to just for x in c.
Finally, repeat is designed for arrays. It does not work on a single scalar (this is probably the error you are seeing); use the more appropriate fill(scalar, n) to get [scalar, ..., scalar].

How to detect list changes without comparing the complete list

I have a function which will fail if there has been any change to the term/list it uses since that term/list was generated. I would like to avoid checking that each parameter is still the same. So I thought that, each time I generate the term/list, I could compute a CRC or something similar. Before making use of it I would compute the CRC again, so I can be 99.9999% sure the term/list is still the same.
Going to a specific answer: I am programming in Erlang, and I am thinking of using a function of the following type:
-spec(list_crc32(List :: [term()]) -> CRC32 :: integer()).
I use term() because it is a list of terms. (Erlang already has fast built-in CRC functions, but only for binary values.) I have considered using erlang:crc32(term_to_binary(Term)), but I am not sure whether there could be a better approach.
What do you think?
Regards, Borja.
Without more context it is a little bit difficult to understand why you would have this problem, particularly since Erlang terms are immutable -- once assigned no other operation can change the value of a variable, not even in the same function.
So if your question is "How do I quickly assert that true = A == A?" then consider this code:
A = generate_list(),
% other things in this function happen
A = A.
The above snippet will always assert that A is still A, because it is not possible to change A like you might do in, say, Python.
If your question is "How do I assert that a newly generated list has exactly the same value as a different, known list?" then using either matching or an actual assertion is the fastest way:
start() ->
    A = generate_list(),
    assert_loop(A).

assert_loop(A) ->
    ok = do_stuff(),
    A = generate_list(),
    assert_loop(A).
The assert_loop/1 function above is forcing an assertion that the output of generate_list/0 is still exactly A. There is no telling what other things in the system might be happening which may have affected the result of that function, but the line A = generate_list() will crash if the list returned is not exactly the same value as A.
In fact, there is no way to change the A in this example, no matter how many times we execute assert_loop/1 above.
Now consider a different style:
compare_loop(A) ->
    ok = do_stuff(),
    case A =:= generate_list() of
        true  -> compare_loop(A);
        false -> terminate_gracefully()
    end.
Here we have given ourselves the option to do something other than crash, but the effect is ultimately the same, since =:= is not merely a test of equality but a match test: it checks not just that the two evaluate to equal values, but that they actually match.
Consider:
1> 1 == 1.0.
true
2> 1 =:= 1.0.
false
The fastest way to compare two terms will depend partly on the sizes of the lists involved but especially on whether or not you expect the assertion to pass or fail more often.
If the check is expected to fail more often then the fastest check is to use an assertion with =, an equivalence test with == or a match test with =:= instead of using erlang:phash2/1. Why? Because these tests can return false as soon as a non-matching element is encountered -- and if this non-match occurs near the beginning of the list then a full traverse of both lists is avoided entirely.
If the check is expected to pass more often then something like erlang:phash2/1 will be faster, but only if the lists are long, because only one list will be fully traversed each iteration (the hash of the original list is already stored). It is possible, though, on a short list that a simple comparison will still be faster than computing a hash, storing it, computing another hash, and then comparing the hashes (obviously). So, as always, benchmark.
A phash2 version could look like:
start() ->
    A = generate_list(),
    Hash = erlang:phash2(A),
    assert_loop(Hash).

assert_loop(Hash) ->
    ok = do_stuff(),
    Hash = erlang:phash2(generate_list()),
    assert_loop(Hash).
Again, this is an assertive loop that will crash instead of exit cleanly, so it would need to be adapted to your needs.
The basic mystery still remains, though: in a language with immutable variables why is it that you don't know whether something will have changed? This is almost certainly a symptom of an underlying architectural problem elsewhere in the program -- either that or simply a misunderstanding of immutability in Erlang.

unit testing accuracy of function composition

I'm writing tests for an object that takes in an input, composes some functions together, runs the input through the composed function, and returns the result.
Here's a greatly-simplified set of objects and functions that mirrors my design:
type Result =
    | Success of string

let internal add5 x = x + 5

let internal mapResult number =
    Success (number.ToString())

type public InteropGuy internal (add, map) =
    member this.Add5AndMap number =
        number |> (add >> map)

type InteropGuyFactory() =
    member this.CreateInteropGuy () =
        new InteropGuy(add5, mapResult)
The class is designed to be used for C# interop which explains the structure, but this problem still can apply to any function under test that composes function parameters.
I'm having trouble finding an elegant way to keep the implementation details of the internal functions from creeping into the test conditions when testing the composing function - in other words, isolating one link in the chain instead of inspecting the output once the input is piped entirely through. If I simply inspect the output, then the tests for each function depend on the downstream functions working properly, and if the one at the end of the chain stops working, all of the tests will fail. The best I've been able to do is stub out a function to return a certain value, then stub out its downstream function, storing the input of the downstream function, and then asserting that the stored value is equal to the output of the stubbed function:
[<TestClass>]
type InteropGuyTests() =

    [<TestMethod>]
    member this.``Add5AndMap passes add5 result into map function``() =
        let add5 _ = 13
        let tempResult = ref 0
        let mapResult result =
            tempResult := result
            Success "unused result"
        let guy = new InteropGuy(add5, mapResult)
        guy.Add5AndMap 8 |> ignore
        Assert.AreEqual(13, !tempResult)
Is there a better way to do this or is this generally how to test composition in isolation? Design comments also appreciated.
The first question we should ask when encountering something like this is: why do we want to test this piece of code?
When the potential System Under Test (SUT) is literally a single statement, then which value does the test add?
AFAICT, there are only two ways to test a one-liner:
Triangulation
Duplication of the implementation
Both are possible, but both come with drawbacks, so I think it's worth asking whether such a method/function should be tested at all.
Still, assuming that you want to test the function, e.g. to prevent regressions, you can use either of these options.
Triangulation
With triangulation, you simply throw enough example values at the SUT to demonstrate that it works as the black box it's supposed to be:
open Xunit
open Swensen.Unquote
[<Theory>]
[<InlineData(0, "5")>]
[<InlineData(1, "6")>]
[<InlineData(42, "47")>]
[<InlineData(1337, "1342")>]
let ``Add5AndMap returns expected result`` (number : int, expected : string) =
    let actual = InteropGuyFactory().CreateInteropGuy().Add5AndMap number
    Success expected =! actual
The advantage of this example is that it treats the SUT as a black box, but the disadvantage is that it doesn't demonstrate that the SUT is a result of any particular composition.
Duplication of implementation
You can use Property-Based Testing to demonstrate (or, at least make very likely) that the SUT is composed of the desired functions, but it requires duplicating the implementation.
Since the functions are assumed to be referentially transparent, you can simply throw enough example values at both the composition and the SUT, and verify that they return the same value:
open FsCheck.Xunit
open Swensen.Unquote
[<Property>]
let ``Add5AndMap returns composed result`` (number : int) =
    let actual = InteropGuyFactory().CreateInteropGuy().Add5AndMap number
    let expected = number |> add5 |> mapResult
    expected =! actual
Is it ever interesting to duplicate the implementation in the test?
Often, it's not, but if the purpose of the test is to prevent regressions, it may be worthwhile as a sort of double-entry bookkeeping.

Algorithm for Deductions

Here is the problem.
If we have the two statements
p=>q and q=>r, then it also follows that p=>r.
Given a set of statements I need to find whether a given statement is true or false or cannot be concluded from the given statements.
Example:
Given statements p=>q, p=>r, q=>s
if the input is p=>s I should get the output true
if the input is p=>t I should get the output Cannot be concluded
if the input is p=> ~p I should get the output false
Here my question is what is the best data structure to implement this and what is the algorithm to use.
Thanks.
So, I'm still not entirely clear on what you're trying to do. At the risk of being down-voted, I'm going to kick this out there and see what people think.
I might start by building a graph. Each entity (p, q, etc.) has its own node. "Implies" mean you draw a line between two nodes. Any input, then, is just a matter of seeing if you can find a way to traverse the graph--so in your example, a => b, b => c, the graph has three nodes, a connected to b, b connected to c. The fact that a path exists between a and c means that a implies c.
I haven't vetted this idea any further, but it seems like an interesting prospect. In particular because graph theory is cool, and lots of people are interested in it (e.g., Facebook execs). AND there are good modules in Python for analyzing graphs. (I assume the same is also true for C++. And you can always sketch it out by hand using Gephi: https://gephi.org/)
A lot of people have studied this problem for many years. What you need is a SAT solver. Look up Chaff or zChaff or any other commonly used SAT solver. You want to take your clauses, like (p->q && q->r) -> (p->r), negate them, and determine whether the negation is satisfiable. If the negation is not satisfiable, then you have a theorem, something that is always true. If the original clauses are satisfiable and the negation of the clauses is also satisfiable, then you should return "cannot be concluded". And if the original clauses are not satisfiable, then you have something that is false.
This is actually a well-studied problem. There are good algorithms, but there is a hard limit on how many propositional variables you can handle. SAT is at the heart of the NP-hard problems, a class of problems for which efficient algorithms are not known.
I think that, given the simplicity of your problem, you could get away with using a simple map. The main advantage over a vector is the much faster look-up.
#include <string>
#include <unordered_map>
#include <unordered_set>
// For "p": { _name: "p", _positive: true };  for "~q": { _name: "q", _positive: false }
struct Predicate {
    std::string _name;
    bool _positive;
    bool operator==(const Predicate& o) const { return _name == o._name && _positive == o._positive; }
};
// The unordered containers need a hash for the key type:
template <> struct std::hash<Predicate> {
    std::size_t operator()(const Predicate& p) const { return std::hash<std::string>{}(p._name) ^ p._positive; }
};
using PredicateSetType = std::unordered_set<Predicate>;
using PredicateMapType = std::unordered_map<Predicate, PredicateSetType>;
You use the map in the following manner: when given p => q, you insert { "q", true } into the set of predicates associated with { "p", true }.
Note that this actually encodes a directed graph, so the typical methods of exploring a graph apply when it comes to proving a statement.