Mocking consecutive function calls in Elixir with Mock or Mox - unit-testing

I'm trying to mock multiple calls to a function so that it returns a specific, different value on each call. I am not that familiar with Elixir and functional concepts.
defmodule Roller do
  def roll do
    1..10
    |> Enum.random()
  end
end
Roller.roll/0 returns a random number between 1 and 10 on every call.
defmodule VolunteerFinder do
  import Roller
  def find(list) do
    find(list, []);
  end
  defp find([] = _list, result) do
    result
  end
  defp find([head | tail] = _list, result) do
    result = [%{name: head, score: Roller.roll()} | result]
    find(tail, result)
  end
end
So, assuming the list contains more than one element, Roller.roll/0 is called more than once (once per element, so twice for a two-element list). In my test, I need to control what it returns each time.
I tried it with Mock. I would like to do something like the following, in the simplest possible way. It would be great not to have to store state anywhere or run a separate process for every call. I know Elixir thinking might be a little different from the object-oriented mindset I have. What's the most idiomatic Elixir way of testing the VolunteerFinder module?
defmodule VolunteerFinderTest do
  use ExUnit.Case
  import Mock
  import Roller
  test_with_mock(
    "Find volunteer for list with one element",
    Roller,
    [roll: fn() -> 5 end]
  ) do
    assert VolunteerFinder.find(["John"]) == [%{name: "John", score: 5}]
  end
  test_with_mock(
    "Find volunteer for list with two elements",
    Roller,
    [roll: fn
      () -> 2
      () -> 5
    end]
  ) do
    assert VolunteerFinder.find(["John", "Andrew"])
           == [%{name: "Andrew", score: 5}, %{name: "John", score: 2}]
  end
end

I found a working solution, but I am not sure if I'm satisfied with it:
test_with_mock(
  "Find volunteer for list with two elements",
  Roller,
  # mock_roll() pops the next queued value from the test process's mailbox
  [roll: fn () -> mock_roll() end]
) do
  send self(), {:mock_return, 1}
  send self(), {:mock_return, 5}
  assert VolunteerFinder.find(["Sergey", "Borys"])
         == [%{name: "Borys", score: 5}, %{name: "Sergey", score: 1}]
end
def mock_roll do
  receive do
    {:mock_return, value} ->
      value
  after
    0 ->
      raise "No mocked returns received"
  end
end
Is there any more elegant way of solving the problem?

Related

Find maximum w.r.t. substring within each group of formatted strings

I am struggling to find a solution for a scenario. I have a few files in a directory, let's say:
vbBaselIIIData_201802_3_d.data.20180405.txt.gz
vbBaselIIIData_201802_4_d.data.20180405.txt.gz
vbBaselIIIData_201803_4_d.data.20180405.txt.gz
vbBaselIIIData_201803_5_d.data.20180405.txt.gz
Suppose the single-digit number after the second underscore is called the runnumber. For each group (the number after the first underscore, e.g. 201802), I have to pick only the file with the latest runnumber. So in this case I need to pick only two of the four files and put them in a mutable Scala list. The ListBuffer should contain:
vbBaselIIIData_201802_4_d.data.20180405.txt.gz
vbBaselIIIData_201803_5_d.data.20180405.txt.gz
Can anybody suggest how to implement this? I am using Scala, but just an algorithm would also be appreciated. What would be the right data structures to use? What functions do we need to implement? Any suggestions are welcome.
Here is a hopefully somewhat inspiring proposal that demonstrates a whole bunch of different language features and useful methods on collections:
val list = List(
"vbBaselIIIData_201802_3_d.data.20180405.txt.gz",
"vbBaselIIIData_201802_4_d.data.20180405.txt.gz",
"vbBaselIIIData_201803_4_d.data.20180405.txt.gz",
"vbBaselIIIData_201803_5_d.data.20180405.txt.gz"
)
val P = """[^_]+_(\d+)_(\d+)_.*""".r
val latest = list
.map { str => {val P(id, run) = str; (str, id, run.toInt) }}
.groupBy(_._2) // group by id
.mapValues(_.maxBy(_._3)._1) // find the last run for each id
.values // throw away the id
.toList
.sorted // restore ordering, mostly for cosmetic purposes
latest foreach println
Brief explanation of the not-entirely-trivial parts that you might have missed when reading an introduction to Scala:
"regex pattern".r converts a string into a compiled regex pattern
A block { stmt1 ; stmt2 ; stmt3 ; ... ; stmtN; result } evaluates to the last expression result
Extractor syntax can be used for compiled regex patterns
val P(id, run) = str matches the second and third _-separated values
_.maxBy(_._3)._1 finds the triple with highest run number, then extracts the first component str again
Output:
vbBaselIIIData_201802_4_d.data.20180405.txt.gz
vbBaselIIIData_201803_5_d.data.20180405.txt.gz
It's not clear what performance needs you have, even though you're mentioning an 'algorithm'.
Provided you don't have more specific needs, something like this is easy to do with Scala's Collection API. Even if you were dealing with huge directories, you could probably achieve some good performance characteristics by moving to Streams (at least in memory usage).
So assuming you have a function like def getFilesFromDir(path: String): List[String] where the List[String] is a list of filenames, you need to do the following:
Group files by date (List[String] => Map[String, List[String]])
Extract the Runnumbers, preserving the original filename (List[String] => List[(String, Int)])
Select the max Runnumber (List[(String, Int)] => (String, Int))
Map to just the filename ((String, Int) => String)
Select just the values of the resulting Map (Map[String, String] => List[String])
(Note: if you want to go the pure functional route, you'll want a function something like def getFilesFromDir(path: String): IO[List[String]])
With Scala's Collections API you can achieve the above with something like this:
def extractDate(fileName: String): String = ???
def extractRunnumber(fileName: String): Int = ??? // Int, so run numbers compare numerically
def getLatestRunnumbersFromDir(path: String): List[String] =
  getFilesFromDir(path)
    .groupBy(extractDate) // List[String] => Map[String, List[String]]
    .mapValues(selectMaxRunnumber) // Map[String, List[String]] => Map[String, String]
    .values // Map[String, String] => Iterable[String]
    .toList // Iterable[String] => List[String]
def selectMaxRunnumber(fileNames: List[String]): String =
  fileNames.map(f => f -> extractRunnumber(f))
    .maxBy(p => p._2)
    ._1
I've left the extractDate and extractRunnumber implementations blank. These can be done using simple regular expressions; let me know if you're having trouble with that.
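Since the question mentions that an algorithm alone would also help, here is the same idea sketched in Python 3 (re.fullmatch needs 3.4+), with the regular-expression extraction filled in. Rather than grouping first and then taking the maximum, this variant makes a single pass and keeps, for each id, the best (run number, file name) pair seen so far; it is a swapped-in approach, not either answer's exact code. The pattern and the name latest_runs are assumptions based on the example file names.

```python
import re

# Assumed file-name layout (from the example names): <prefix>_<id>_<run>_<rest>,
# e.g. "vbBaselIIIData_201802_3_d.data.20180405.txt.gz".
PATTERN = re.compile(r"[^_]+_(\d+)_(\d+)_.*")


def latest_runs(file_names):
    """Keep only the file with the highest run number for each id."""
    best = {}  # id -> (run number, file name)
    for name in file_names:
        m = PATTERN.fullmatch(name)
        if m is None:
            continue  # skip names that do not follow the assumed layout
        file_id, run = m.group(1), int(m.group(2))  # int: compare runs numerically
        if file_id not in best or run > best[file_id][0]:
            best[file_id] = (run, name)
    # Sort for a stable, readable result (cosmetic, as in the Scala version).
    return sorted(name for _, name in best.values())


files = [
    "vbBaselIIIData_201802_3_d.data.20180405.txt.gz",
    "vbBaselIIIData_201802_4_d.data.20180405.txt.gz",
    "vbBaselIIIData_201803_4_d.data.20180405.txt.gz",
    "vbBaselIIIData_201803_5_d.data.20180405.txt.gz",
]
for name in latest_runs(files):
    print(name)
```

The dictionary plays the role of the Map in the Scala answers; in Scala the result could be copied into a ListBuffer if a mutable list is really required.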
If you have the file-names as a list, like:
val list = List("vbBaselIIIData_201802_3_d.data.20180405.txt.gz"
, "vbBaselIIIData_201802_4_d.data.20180405.txt.gz"
, "vbBaselIIIData_201803_4_d.data.20180405.txt.gz"
, "vbBaselIIIData_201803_5_d.data.20180405.txt.gz")
Then you can do:
list.map { f =>
  val s = f.split("_").toList
  (s(1), f)
}.groupBy(_._1)
  .map(_._2.max)
  .values
This returns:
MapLike.DefaultValuesIterable(vbBaselIIIData_201803_5_d.data.20180405.txt.gz, vbBaselIIIData_201802_4_d.data.20180405.txt.gz)
as you wanted.
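A caveat on this split-based version: .map(_._2.max) takes the maximum of (id, fileName) pairs, so within a group the file names are compared as strings. That picks the right file for the single-digit run numbers in the question, but a run 10 would compare as smaller than a run 9. A small Python sketch of the difference, using two hypothetical file names:

```python
names = [
    "vbBaselIIIData_201802_9_d.data.20180405.txt.gz",   # hypothetical run 9
    "vbBaselIIIData_201802_10_d.data.20180405.txt.gz",  # hypothetical run 10
]

# Comparing whole file names as strings: '9' > '1', so run 9 "wins".
by_string = max(names)

# Comparing the third "_"-separated field as an integer: run 10 wins.
by_number = max(names, key=lambda n: int(n.split("_")[2]))

print(by_string)
print(by_number)
```

Parsing the run number to an integer before taking the maximum, as the first answer does with run.toInt, avoids the problem.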

Outputting a list of lists in Haskell?

I am a complete beginner to Haskell but I'm being asked to create a sudoku solver. I've been making some steady progress with it but one of the things it is asking me to do is print a valid representation of a sudoku puzzle s. The Puzzle data type is defined as a list of lists, so [[Maybe Int]] and this is composed of Block values ([Maybe Int], representing a row).
Function signature is this:
printPuzzle :: Puzzle -> IO ()
How do I output this? I know this may be a simple question and I'm missing the point, but I'm still not at the stage where I've got my head around the syntax yet. Any help would be much appreciated!
Simple pretty-printing of this can be done really succinctly with something like the following:
import Data.Char (intToDigit)
showRow :: [Maybe Int] -> String
showRow = map (maybe ' ' intToDigit)
showPuzzle :: [[Maybe Int]] -> [String]
showPuzzle = map showRow
printPuzzle :: [[Maybe Int]] -> IO ()
printPuzzle = mapM_ putStrLn . showPuzzle
showRow takes a single row from your grid and renders it as a String. Using the maybe function (available in the Prelude and also exported by Data.Maybe), we can write this as a quick map from each Maybe Int value to either a default "blank space" value or the character representing the number (using intToDigit).
showPuzzle simply maps showRow over the outer list.
printPuzzle just uses the previous pure definitions to give the impure action which prints a grid, by putStrLn'ing the pretty-print of each row.
A quick demo:
> printPuzzle [[Just 1, Nothing, Just 3], [Nothing, Just 3, Just 6], [Just 2, Just 4, Just 5]]
1 3
 36
245
You can easily modify the above code to print something more explicit, for instance by using 'X' instead of ' ' as the default value in showRow:
1X3
X36
245

How to write recursive procedures in python 2.7 correctly?

I have the following code in python 2.7:
net = {'Freda': [['Olive', 'John', 'Debra'], ['Starfleet Commander', ' Ninja Hamsters', ' Seahorse Adventures']], 'Ollie': [['Mercedes', 'Freda', 'Bryant'], ['Call of Arms', ' Dwarves and Swords', ' The Movie: The Game']], 'Debra': [['Walter', 'Levi', 'Jennie', 'Robin'], ['Seven Schemers', ' Pirates in Java Island', ' Dwarves and Swords']]}
def get_secondary_connections(network, person):
    if person in network:
        for person in network:
            connections = network[person][0]
            result = connections
            for connection in connections:
                result = result + get_secondary_connections(network, connection)
            return result
    return None
print get_secondary_connections(net, 'Fred')
When I execute it gives the following error:
result = result + get_secondary_connections(network, connection)
RuntimeError: maximum recursion depth exceeded
Please tell me where I went wrong.
First, pay attention to the semantics: use lists [x, y, z] for a collection which you're intending to loop through; use tuples (x, y, z) for a fixed-length collection which you're intending to index into, which is not big enough to become its own class. So you should have
net = {
'Freda': (['Olive', 'John', 'Debra'],
['Starfleet Commander', ' Ninja Hamsters', ' Seahorse Adventures']),
'Ollie': (['Mercedes', 'Freda', 'Bryant'],
['Call of Arms', ' Dwarves and Swords', ' The Movie: The Game']),
'Debra': (['Walter', 'Levi', 'Jennie', 'Robin'],
['Seven Schemers', ' Pirates in Java Island', ' Dwarves and Swords'])
}
Second, when doing a recursion problem, step through what you want to happen with some examples first, as if you were the machine. Your first task when processing Freda is to load her connections Olive, John, Debra. What do you want to do with each of these? Well, you're going to try to load Olive's connections, fail, try to load John's connections, fail, then try to load Debra's connections, and then you'll have Walter, Levi, Jennie, Robin. What do you want to do with this list? Return it? Concatenate it with anyone else's friends? There's nothing "recursive" about the secondary connections, at least not as one would normally think. In other words, the function you want seems to be exactly what its name says:
def get_secondary_connections(network, person):
    primary_connections, movies = network.get(person, ([], []))
    out = []
    for friend in primary_connections:
        # look up each friend's connections (not the original person's)
        secondary_connections, movies = network.get(friend, ([], []))
        out.extend(secondary_connections)
    return out
No recursion needed. When do we need recursion? Well, if we wanted to find everyone who this person is connected to by a friend or a friend-of-a-friend or a friend-of-a-friend-of-a-friend, then we need to explore the whole graph and recursion might be helpful. Of course we might also want this thing to not contain duplicates, so we might want to use a set rather than a list.
A first stab might be:
def get_all_connections(network, person):
    primary_connections, movies = network.get(person, ([], []))
    out = set(primary_connections)  # copy this list into an output set
    for friend in primary_connections:
        out = out.union(get_all_connections(network, friend))
    return out
And now you will discover that indeed, in the wrong sort of network, this thing will easily exceed a maximum recursion depth. A simple network that does this:
net = {
'Alice': (['Bob'], []),
'Bob': (['Alice'], [])
}
Why does this happen? Because to find all of Alice's friends you need to find all of Bob's friends, but to find all of Bob's friends you need to find all of Alice's friends. So how do we get past this?
We can ask for all of Bob's friends excluding those that come through Alice. This will need to be the full list of people already being handled; otherwise we will just trip up on other cyclic cases, like:
net = {
'Alice': (['Bob'], []),
'Bob': (['Carol', 'Dylan'], []),
'Carol': (['Alice'], []),
'Dylan': (['Bob'], [])
}
Note that when we ask for Bob's friends we will recurse on both Carol and Dylan, and we need to tell both of them to exclude both Alice and Bob, as they are already being handled. So that leads to:
def get_all_connections(network, person, excluding=()):
    if person in excluding:
        return set()  # we exclude by immediately returning the empty set.
    excluding_us_too = excluding + (person,)
    primary_connections, movies = network.get(person, ([], []))
    out = set(primary_connections)
    for friend in primary_connections:
        out = out.union(get_all_connections(network, friend, excluding_us_too))
    return out
This cycle-detection strategy is known by a few different names, but usually the tuple here is called a "path", since when we process any of Dylan's friends it says ('Alice', 'Bob', 'Dylan'), meaning that we got to Dylan's friends by visiting first Alice, then Bob, then Dylan.
Notice that 'Carol' is nowhere in Dylan's path; this can sometimes be a good thing and sometimes not: if Dylan connects to Carol and get_all_connections applied to Carol produces a million results even when excluding Alice and Bob and Dylan, then we can expect both of these to produce the same million results for Dylan, and then when we get to Bob we have to union these two million results into a set of only one million -- that's a lot of unnecessary work!
So another stab would then be to keep a queue of people-to-handle and people-we've-handled. At this point you would not want so much to use recursion, you'd rather use normal looping constructs.
def get_all_connections(network, person):
    visited, to_visit, out = set(), set([person]), set()
    while len(to_visit) > 0:
        visiting = to_visit.pop()
        visited.add(visiting)
        connections, movies = network.get(visiting, ([], []))
        for friend in connections:
            out.add(friend)
            if friend not in visited:
                to_visit.add(friend)
    return out
A loop like this can be converted into a tail-recursive function, but Python does not perform tail-call elimination, so each call still consumes a stack frame, making the exercise academic rather than practical.
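The duplicated-work problem can also be addressed while keeping recursion: instead of a per-path tuple, pass one visited set that all recursive calls share, so each person's friend list is expanded at most once (a plain depth-first search). A minimal sketch, assuming the same (friends, movies) network layout; the name get_all_connections_rec is hypothetical. Recursion depth still grows with the longest chain of not-yet-visited friends, so for very large networks the loop version remains the safer choice.

```python
def get_all_connections_rec(network, person, visited=None, out=None):
    # `visited` holds everyone whose friend list has already been expanded.
    # Unlike the per-path `excluding` tuple, one set is shared by all calls,
    # so each person is expanded at most once.
    if visited is None:
        visited = set()
    if out is None:
        out = set()
    visited.add(person)
    connections, movies = network.get(person, ([], []))
    for friend in connections:
        out.add(friend)
        if friend not in visited:
            get_all_connections_rec(network, friend, visited, out)
    return out


net = {
    'Alice': (['Bob'], []),
    'Bob': (['Carol', 'Dylan'], []),
    'Carol': (['Alice'], []),
    'Dylan': (['Bob'], []),
}
print(sorted(get_all_connections_rec(net, 'Alice')))  # ['Alice', 'Bob', 'Carol', 'Dylan']
```

Using None as the default and creating the sets inside the function avoids Python's shared-mutable-default pitfall between separate top-level calls.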

Few questions about sml zip function

I am new to SML, and I am trying to define a zip function which takes two lists as a tuple.
Here is the code.
I got it working, but I have a few questions:
exception Mismatch;
fun zip ([],[]) = []
| zip ((x::xs),(y::ys)) = (x, y)::zip (xs, ys)
| zip (_, _) = raise Mismatch;
Can I define the exception inside the zip function, using let ... in ... end? I tried, but I always get an error.
Another question is about the second pattern. I wrote
zip ([x::xs],[y::ys]) = (x, y)::zip (xs, ys)
which also gave me an error.
zip takes a tuple, but each of its elements is a list, so why can't I use [x::xs] just like any other list?
Last question: in pattern matching, does the order matter? I think it does, because I changed the order and got an error, but I just want to make sure.
Thanks
You should never define an exception inside of a let ... in ... end*. It makes it impossible to catch it by name outside of the let-expression.
*: It's okay if you don't plan on letting it escape from the let-expression, but you do plan on doing that here.
As for your other question:
When you write [...], the SML compiler understands it as "The list containing ...."
E.g. [1] is the list containing 1, [4, 6, 2] is the list containing 4, 6 and 2, and so on.
When you write x :: xs, the SML compiler understands it as "The list starting with x, followed by the list xs."
E.g. 1 :: [] is the list starting with a 1, followed by the empty list, and 4 :: [6, 2] is the list starting with a 4, followed by 6 and 2, and so on.
Now, when you write [x :: xs], you're doing a combination of the two, and SML understands it as: "The list containing the list starting with x, followed by xs."
So, by writing [...] rather than (...), you're asking for a list within another list. This is not what you want.
For your final question: Yes, order matters. Patterns are checked in top-down order. Therefore,
fun foo _ = 4
| foo 4 = 5
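would always return 4 (SML/NJ actually rejects it with a "match redundant" error, because the second clause can never be reached; this may be the error you saw when you reordered your clauses), whereas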
fun foo 4 = 5
| foo _ = 4
will return 5 when given a 4.

Spark FlatMap function for huge lists

I have a very basic question. Spark's flatMap function allows you to emit 0, 1 or more outputs per input. So the (lambda) function you feed to flatMap should return a list.
My question is: what happens if this list is too large for your memory to handle!?
I haven't implemented this yet; the question should be resolved before I rewrite my MapReduce software, which could easily deal with this by putting context.write() anywhere in my algorithm I wanted. (The output of a single mapper could easily be many gigabytes.)
In case you're interested: a mapper does a kind of word count, but in fact it generates all possible substrings, together with a wide range of regular expressions matching the text (a bioinformatics use case).
So the (lambda) function you feed to flatMap should return a list.
No, it doesn't have to return a list. In practice you can easily use a lazy sequence. This is probably easier to see if you look at the Scala RDD.flatMap signature:
flatMap[U](f: (T) ⇒ TraversableOnce[U])
Since subclasses of TraversableOnce include SeqView or Stream you can use a lazy sequence instead of a List. For example:
val rdd = sc.parallelize("foo" :: "bar" :: Nil)
rdd.flatMap {x => (1 to 1000000000).view.map {
_ => (x, scala.util.Random.nextLong)
}}
Since you've mentioned a lambda function, I assume you're using PySpark. The simplest thing you can do is to return a generator instead of a list:
import numpy as np
rdd = sc.parallelize(["foo", "bar"])
rdd.flatMap(lambda x: ((x, np.random.randint(1000)) for _ in xrange(100000000)))
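When the per-record logic is more than a one-liner (as with the substring/regex mapper described in the question), a generator function using yield can play the same role as the generator expression. The sketch below is hedged: emit_pairs is a hypothetical helper, and it runs without Spark by using itertools.chain.from_iterable as a local stand-in for flatMap (Python 3, where range is lazy). With PySpark you would pass the same function as rdd.flatMap(lambda x: emit_pairs(x, n)).

```python
import itertools


def emit_pairs(x, n):
    # Yields (x, i) pairs one at a time; nothing is collected into a list,
    # so memory use does not grow with n.
    for i in range(n):
        yield (x, i)


def local_flat_map(f, items):
    # Local stand-in for RDD.flatMap (assumption: no Spark here).
    # Like flatMap, it concatenates the iterables that f returns, lazily.
    return itertools.chain.from_iterable(f(x) for x in items)


# Even with a huge n, only the pairs that are actually consumed get produced.
stream = local_flat_map(lambda x: emit_pairs(x, 10**12), ["foo", "bar"])
first = list(itertools.islice(stream, 3))
print(first)  # [('foo', 0), ('foo', 1), ('foo', 2)]
```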
Since RDDs are lazily evaluated it is even possible to return an infinite sequence from the flatMap. Using a little bit of toolz power:
from toolz.itertoolz import iterate
def inc(x):
    return x + 1
rdd.flatMap(lambda x: ((i, x) for i in iterate(inc, 0))).take(1)
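If you would rather not depend on toolz, the standard library's itertools.count produces the same 0, 1, 2, ... sequence as iterate(inc, 0). A minimal local sketch that runs without Spark; pairs_forever is a hypothetical name, and with PySpark you would pass it as rdd.flatMap(pairs_forever).take(1).

```python
import itertools


def pairs_forever(x):
    # itertools.count(0) yields 0, 1, 2, ... without end, the same sequence
    # as toolz's iterate(inc, 0), but from the standard library.
    return ((i, x) for i in itertools.count(0))


# Local stand-in for rdd.flatMap(pairs_forever).take(1) (assumption: no Spark):
# chain.from_iterable flattens lazily and islice stops after one element,
# so the infinite generator is never exhausted.
flattened = itertools.chain.from_iterable(pairs_forever(x) for x in ["foo", "bar"])
first_item = list(itertools.islice(flattened, 1))
print(first_item)  # [(0, 'foo')]
```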