Erlang : nested cases - if-statement

I'm very new to Erlang. I tried to find out if a list index is out of bounds (before trying it), so I wanted to write an if clause with something like
if lists:flatlength(A) < DestinationIndex ....
I discovered that function results like these cannot be used in if guards, so I used case instead. This results in a nested case statement:
case Destination < 1 of
    true -> {ok,NumberOfJumps+1};
    false ->
        case lists:flatlength(A) < Destination of
            true ->
                doSomething;
            false ->
                case lists:member(Destination,VisitedIndices) of
                    true -> doSomething;
                    false ->
                        doSomethingElse
                end
        end
end.
I found this bad in terms of readability and code style. Is this how you do things like that in Erlang, or is there a more elegant way to do this?
Thanks in advance

Before you take the following as some magical gospel, please note that the way this function is entered is almost certainly unidiomatic. You should seek to limit cases way before you get to this point -- the need for nested cases is itself usually a code smell. Sometimes it is genuinely unavoidable, but I strongly suspect that some aspects of this can be simplified way earlier in the code (especially with some more thought given to the data structures that are being passed around, and what they mean).
Without seeing where the variable A is coming from, I'm making one up as a parameter here. Also, without seeing how this function is entered I'm making up a function head, because without the rest of the function to go by it's pretty hard to say anything for sure.
With all that said, let's refactor this a bit:
First up, we want to get rid of the one thing we know can go into a guard, and that is your first case that checks whether Destination < 1. Instead of using a case, let's consider that we really want to call two different clauses of a common function:
foo(Destination, NumberOfJumps, _, _) when Destination < 1 ->
    {ok, NumberOfJumps + 1};
foo(Destination, _, VisitedIndices, A) ->
    case lists:flatlength(A) < Destination of
        true -> doSomething;
        false ->
            case lists:member(Destination,VisitedIndices) of
                true -> doSomething;
                false -> doSomethingElse
            end
    end.
Not too weird. But those nested cases that remain... something is annoying about them. This is where I suspect something can be done elsewhere to alleviate the choice of paths being taken here much earlier in the code. But let's pretend that you have no control over that stuff. In this situation assignment of booleans and an if can be a readability enhancer:
foo(Destination, NumberOfJumps, _, _) when Destination < 1 ->
    {ok, NumberOfJumps + 1};
foo(Destination, _, VisitedIndices, A) ->
    ALength = lists:flatlength(A) < Destination,
    AMember = lists:member(Destination, VisitedIndices),
    NextOp =
        if
            ALength     -> fun doSomething/0;
            AMember     -> fun doSomething/0;
            not AMember -> fun doSomethingElse/0
        end,
    NextOp().
Here I have just cut to the chase and made sure we only execute each potentially expensive operation once by assigning the result to a variable -- but this makes me very uncomfortable because I shouldn't be in this situation to begin with.
In any case, something like this should test the same as the previous code, and in the interim may be more readable. But you should be looking for other places to simplify. In particular, this VisitedIndices business feels fishy (why don't we already know if Destination is a member?), the variable A needing to be flattened after we've arrived in this function is odd (why is it not already flattened? why is there so much of it?), and NumberOfJumps feels something like an accumulator, but its presence is mysterious.
What makes me feel weird about these variables, you might ask? The only one that is consistently used is Destination -- the others are only used either in one clause of foo/4 or the other, but not both. That makes me think this should be different paths of execution somewhere further up the chain of execution, instead of all winding up down here in a super-decision-o-matic type function.
EDIT
With a fuller description of the problem in hand (reference the discussion in comments below), consider how this works out:
-module(jump_calc).
-export([start/1]).
start(A) ->
    Value = jump_calc(A, length(A), 1, 0, []),
    io:format("Jumps: ~p~n", [Value]).
jump_calc(_, Length, Index, Count, _) when Index < 1; Index > Length ->
    Count;
jump_calc(Path, Length, Index, Count, Visited) ->
    NewIndex = Index + lists:nth(Index, Path),
    NewVisited = [Index | Visited],
    NewCount = Count + 1,
    case lists:member(NewIndex, NewVisited) of
        true -> NewCount;
        false -> jump_calc(Path, Length, NewIndex, NewCount, NewVisited)
    end.
Always try to front-load as much processing as possible instead of performing the same calculation over and over. Consider how readily we can barrier each iteration behind guards, and how much conditional stuff we don't even have to write because of this. Function matching is a powerful tool -- once you get the hang of it you will really start to enjoy Erlang.
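The guard-first shape is not specific to Erlang. For comparison only, here is a rough Haskell sketch of the same jump counter; the name jumpCalc and the use of plain lists are my assumptions, not part of the original answer. The length is computed once up front, and the bounds check sits in the first guard so the lookup is never attempted with a bad index:

```haskell
-- Hypothetical Haskell rendering of jump_calc/5, for comparison only.
-- Counts jumps through a list of relative offsets, starting at index 1.
jumpCalc :: [Int] -> Int
jumpCalc path = go 1 0 []
  where
    len = length path            -- front-loaded: computed once, not per step
    go :: Int -> Int -> [Int] -> Int
    go index count visited
      | index < 1 || index > len   = count      -- left the list: stop
      | newIndex `elem` newVisited = newCount   -- about to loop: stop
      | otherwise                  = go newIndex newCount newVisited
      where
        -- Only evaluated if the first guard failed, so the index is valid.
        newIndex   = index + path !! (index - 1)
        newVisited = index : visited
        newCount   = count + 1

main :: IO ()
main = do
  print (jumpCalc [1, 1, 1])   -- walks off the end after three jumps
  print (jumpCalc [1, -1])     -- second jump lands on a visited index
```

As in the Erlang version, everything conditional that remains is the one decision that genuinely depends on freshly computed data.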

Related

Is explicit caching required for List members of a type in F#

My question is probably digging a bit into the question on how smart the F# compiler really is.
I have a type module that scans a configuration file and should then provide a range of IP addresses between a start and an end address.
type IpRange (config: string) =
    // Parse the config
    member __.StartIp = new MyIp(startIp)
    member __.EndIp = new MyIp(endIp)
Now I wanted to add the actual range giving me all IPs so I added
member __.Range =
    let result = new List<MyIp>()
    let mutable ipRunner = __.StartIp
    while ipRunner <= __.EndIp do
        result.Add(new MyIp(ipRunner))
        ipRunner <- (ipRunner + 1)
    result
which works but is not really idiomatic F#.
I then dug into the issue and came up with the following two alternatives
let rec GetIpRangeRec (startIp: MyIp) (endIp: MyIp) (ipList: MyIp list) =
    if startIp <= endIp then
        GetIpRangeRec (startIp + 1) endIp (ipList @ [startIp])
    else
        ipList
and
let GetIpRangeUnfold (startIp: MyIp) (endIp: MyIp) =
    startIp |> Seq.unfold(fun currentIp ->
        if (currentIp <= endIp) then
            Some(currentIp, currentIp + 1)
        else
            None)
As far as I have understood from reading up on lists and sequences, neither is cached. So all three solutions would re-evaluate the code to create a list whenever I try to access an item or enumerate the list.
I could solve this problem by using Seq.cache (and a previous cast to sequence where required) resulting in something like
member __.Range =
    GetIpRangeRec startIp endIp []
    |> List.toSeq
    |> Seq.cache
but is this really necessary?
I have the feeling that I missed something and the F# compiler actually does cache the result without explicitly being told to.
Lists are (normally at least, I suppose there might be some weird edge case I don't know about) stored directly as their values. Thus, your recursive function would specifically produce a list of MyIps - these would only be re-evaluated if you have done some weird thing where a MyIp is re-evaluated each time it is accessed. As in, when the function returns you'll have a fully evaluated list of MyIps.
There is one slight issue, however, in that your function as implemented isn't particularly efficient. Instead, I would recommend doing it in this slightly alternative way:
let rec GetIpRangeRec (startIp: MyIp) (endIp: MyIp) (ipList: MyIp list) =
    if startIp <= endIp then
        GetIpRangeRec (startIp + 1) endIp (startIp :: ipList)
    else
        List.rev ipList
Basically, the issue is that every time you use the @ operator to append to the end of a list, the runtime has to walk to the end of the list to do the append. This means that you'll end up iterating over the list a whole bunch of times. Instead, it is better simply to prepend (i.e. append, but to the front), and then reverse the list just before you return it. This means that you only have to walk the list once, as prepending is always a constant-time operation (you just create a new list entry with a pointer to the previous front of the list).
Actually, since you probably don't want to use a pre-supplied list outside of your function, I would recommend doing it this way instead:
let GetIpRange startIp endIp =
    // "end" is a reserved word in F#, so the inner parameters are named first/last
    let rec GetIpRangeRec (first: MyIp) (last: MyIp) (ipList: MyIp list) =
        if first <= last then
            GetIpRangeRec (first + 1) last (first :: ipList)
        else
            List.rev ipList
    GetIpRangeRec startIp endIp List.empty
(note that I haven't tested this, so it may not work totally perfectly). If you do want to be able to pre-supply a starting list, then you can just stick with the first one.
Also, bear in mind that while lists are usually fine for sequential access, they're terrible for random accesses. If you need to be doing random lookups into the list, then I would recommend using a call to List.toArray once you get the complete list back. Probably no need to bother if you'll just be iterating over it sequentially though.
I'll make one more point though: From a total functional programming 'purist's' perspective your first implementation may not be totally 'functional', but the only mutability involved is all hidden away inside the function. That is, you're not mutating anything that is passed in to the function. This is perfectly fine from a functional purity perspective and might be good for performance. Remember that F# is functional-first, not zealously functional-only ;)
EDIT: Just thought of one more thing I would like to add: I don't know exactly how your MyIp types are constructed, but if you can build them out of numbers, it might be worth looking at using a sequence comprehension like seq {1 .. 100} and then piping that to a map to create the MyIps, e.g. seq {1 .. 100} |> Seq.map makeIp |> Seq.toList. This would be the easiest way, but would only work if you can simply specify a simple number range.
Seq is lazy in F#, i.e. there are occasionally benefits to caching the results. F# List is not lazy; it's an immutable singly linked list that won't get any benefit from caching.

How to detect list changes without comparing the complete list

I have a function which will fail if there has been any change to the term/list it is using since that term/list was generated. I would like to avoid checking that each parameter is still the same. So I thought about computing a CRC or something similar each time I generate the term/list. Before making use of it I would compute the CRC again, so I can be 99.9999% sure the term/list is still the same.
Going to a specific answer: I am programming in Erlang, and I am thinking of using a function of the following type:
-spec(list_crc32(List :: [term()]) -> CRC32 :: integer()).
I use term because it is a list of terms (Erlang already has fast default CRC libraries, but only for binary values). I have considered using "erlang:crc32(term_to_binary(Term))", but I am not sure whether there could be a better approach.
What do you think?
Regards, Borja.
Without more context it is a little bit difficult to understand why you would have this problem, particularly since Erlang terms are immutable -- once assigned no other operation can change the value of a variable, not even in the same function.
So if your question is "How do I quickly assert that true = A == A?" then consider this code:
A = generate_list(),
% other things in this function happen
A = A.
The above snippet will always assert that A is still A, because it is not possible to change A like you might do in, say, Python.
If your question is "How do I assert that the value of a new list generated exactly the same value as a different known list?" then using either matching or an actual assertion is the fastest way:
start() ->
    A = generate_list(),
    assert_loop(A).
assert_loop(A) ->
    ok = do_stuff(),
    A = generate_list(),
    assert_loop(A).
The assert_loop/1 function above is forcing an assertion that the output of generate_list/0 is still exactly A. There is no telling what other things in the system might be happening which may have affected the result of that function, but the line A = generate_list() will crash if the list returned is not exactly the same value as A.
In fact, there is no way to change the A in this example, no matter how many times we execute assert_loop/1 above.
Now consider a different style:
compare_loop(A) ->
    ok = do_stuff(),
    case A =:= generate_list() of
        true -> compare_loop(A);
        false -> terminate_gracefully()
    end.
Here we have given ourselves the option to do something other than crash, but the effect is ultimately the same, because =:= is not merely a test of equality: it is a match test, meaning not just that the two sides evaluate to equal values, but that they actually match.
Consider:
1> 1 == 1.0.
true
2> 1 =:= 1.0.
false
The fastest way to compare two terms will depend partly on the sizes of the lists involved but especially on whether or not you expect the assertion to pass or fail more often.
If the check is expected to fail more often then the fastest check is to use an assertion with =, an equivalence test with == or a match test with =:= instead of using erlang:phash2/1. Why? Because these tests can return false as soon as a non-matching element is encountered -- and if this non-match occurs near the beginning of the list then a full traverse of both lists is avoided entirely.
If the check is expected to pass more often then something like erlang:phash2/1 will be faster, but only if the lists are long, because only one list will be fully traversed each iteration (the hash of the original list is already stored). It is possible, though, on a short list that a simple comparison will still be faster than computing a hash, storing it, computing another hash, and then comparing the hashes (obviously). So, as always, benchmark.
A phash2 version could look like:
start() ->
    A = generate_list(),
    Hash = erlang:phash2(A),
    assert_loop(Hash).
assert_loop(Hash) ->
    ok = do_stuff(),
    Hash = erlang:phash2(generate_list()),
    assert_loop(Hash).
Again, this is an assertive loop that will crash instead of exiting cleanly, so it would need to be adapted to your needs.
The basic mystery still remains, though: in a language with immutable variables why is it that you don't know whether something will have changed? This is almost certainly a symptom of an underlying architectural problem elsewhere in the program -- either that or simply a misunderstanding of immutability in Erlang.

A list check before and after concatenation with different results

I have an assignment and the code is really simple to understand, but I can't find a possible solution. That's the code:
lucky:: [Integer] -> Bool
lucky (xs) = all (/=13) xs
catenate as [] = as
catenate as (b:bs) = b : (catenate as bs)
test_luck1 as bs = lucky as && lucky bs
test_luck2 as bs = lucky (catenate as bs)
So the question is: for which input (the same for both functions) are the boolean values of both functions different, for example the first one true and the second one false or vice versa? So the first function tests both lists individually and the second tests the concatenation of the lists. I was thinking about it all day yesterday and have absolutely no idea. Could you guys help me with finding the trick that should be used to solve the question?
For infinite "lucky" bs and "unlucky" as, test_luck1 will terminate, while test_luck2 will not.
The functions test the values in different order, due to the (somewhat weird) implementation of catenate, which prepends bs to as. Thus, test_luck1 tests as first, then bs, whereas test_luck2 tests bs first, and then as.
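To make the ordering concrete, here is a small sketch that reuses the definitions from the question unchanged (only the type signatures and main are added by me). It shows that catenate puts bs in front, and that test_luck1 can answer on an infinite bs because && never looks at its second argument once lucky as is False:

```haskell
lucky :: [Integer] -> Bool
lucky xs = all (/= 13) xs

-- Same definition as in the question: the second list ends up in front.
catenate :: [a] -> [a] -> [a]
catenate as []     = as
catenate as (b:bs) = b : catenate as bs

test_luck1, test_luck2 :: [Integer] -> [Integer] -> Bool
test_luck1 as bs = lucky as && lucky bs
test_luck2 as bs = lucky (catenate as bs)

main :: IO ()
main = do
  print (catenate [13] [1, 2, 3])     -- prints [1,2,3,13]: bs first, then as
  print (test_luck1 [13] (repeat 1))  -- prints False: bs is never inspected
  -- test_luck2 [13] (repeat 1) is deliberately not run: catenate never gets
  -- past the infinite bs, so the 13 is never reached and the call diverges.
```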
P.S. This can be seen as a boundary case, as per @Mark Seemann's remark -- sorry for the spoiler ;)
Looking at this some more, I think I was too quick on the trigger with my comments. Apart from shinobi's answer, I don't see any way in which the two functions would return different results.
Not that this proves anything, but I wrote a QuickCheck property to verify the hypothesis that test_luck1 will always return the same result as test_luck2:
prop :: [Integer] -> [Integer] -> Bool
prop as bs =
test_luck1 as bs == test_luck2 as bs
I've been running this property with 1,000,000 tests, and they all pass, so I don't think that there are any 'normal' values of as and bs that will cause the output of test_luck1 to be different from the output of test_luck2.
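If QuickCheck is not at hand, the same hypothesis can be checked exhaustively over a small finite domain using only base. This is a sketch under my own assumptions: the helper listsUpTo and the choice of alphabet [12, 13, 14] are made up for illustration, and like the property above it proves nothing beyond the inputs it enumerates.

```haskell
import Control.Monad (replicateM)

lucky :: [Integer] -> Bool
lucky xs = all (/= 13) xs

catenate :: [a] -> [a] -> [a]
catenate as []     = as
catenate as (b:bs) = b : catenate as bs

test_luck1, test_luck2 :: [Integer] -> [Integer] -> Bool
test_luck1 as bs = lucky as && lucky bs
test_luck2 as bs = lucky (catenate as bs)

-- Hypothetical helper: every list of length 0..n over the given alphabet.
listsUpTo :: Int -> [a] -> [[a]]
listsUpTo n alphabet = concatMap (\k -> replicateM k alphabet) [0 .. n]

main :: IO ()
main = do
  let candidates = listsUpTo 3 [12, 13, 14] :: [[Integer]]
  -- Compare both functions on every pair of candidate lists.
  print (and [ test_luck1 as bs == test_luck2 as bs
             | as <- candidates, bs <- candidates ])
```

All pairs agree here, which is consistent with the conclusion that only non-terminating (infinite) inputs separate the two functions.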

Programming without if-statements? [closed]

I remember some time (years, probably) ago I read on Stackoverflow about the charms of programming with as few if-tests as possible. This question is somewhat relevant but I think the stress was on using many small functions that returned values determined by tests depending on the parameter they receive. A very simple example would be using this:
int i = 5;
bool iIsSmall = isSmall(i);
with isSmall() looking like this:
private bool isSmall(int number)
{
    return (number < 10);
}
instead of just doing this:
int i = 5;
bool isSmall;
if (i < 10) {
    isSmall = true;
} else {
    isSmall = false;
}
(Logically this code is just sample code. It is not part of a program I am making.)
The reason for doing this, I believe, was because it looks nicer and makes a programmer less prone to logical errors. If this coding convention is applied correctly, you would see virtually no if-tests anywhere, except in functions whose only purpose is to do that test.
Now, my question is: is there any documentation about this convention? Is there anyplace where you can see wild arguments between supporters and opposers of this style? I tried searching for the Stackoverflow post that introduced me to this, but I can't find it anymore.
Lastly, I hope this question doesn't get shot down because I am not asking for a solution to a problem. I am simply hoping to hear more about this coding style and maybe increase the quality of all coding I will do in the future.
This whole "if" vs "no if" thing makes me think of the Expression Problem [1]. Basically, it's an observation that programming with if statements or without if statements is a matter of encapsulation and extensibility and that sometimes it's better to use if statements [2] and sometimes it's better to use dynamic dispatching with methods / function pointers.
When we want to model something, there are two axes to worry about:
The different cases (or types) of the inputs we need to deal with.
The different operations we want to perform over these inputs.
One way to implement this sort of thing is with if statements / pattern matching / the visitor pattern:
data List = Nil | Cons Int List
length xs = case xs of
    Nil -> 0
    Cons a as -> 1 + length as
concat xs ys = case xs of
    Nil -> ys
    Cons a as -> Cons a (concat as ys)
The other way is to use object orientation:
data List = List {
    length :: Int,
    concat :: (List -> List)
}
nil = List {
    length = 0,
    concat = (\ys -> ys)
}
cons x xs = List {
    length = 1 + length xs,
    concat = (\ys -> cons x (concat xs ys))
}
It's not hard to see that the first version using if statements makes it easy to add new operations on our data type: just create a new function and do a case analysis inside it. On the other hand, this makes it hard to add new cases to our data type since that would mean going back through the program and modifying all the branching statements.
The second version is kind of the opposite. It's very easy to add new cases to the datatype: just create a new "class" and tell what to do for each of the methods we need to implement. However, it's now hard to add new operations to the interface since this means adding a new method for all the old classes that implemented the interface.
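The two directions of extension can be sketched side by side. This is only an illustration under my own naming: the A/B suffixes avoid clashing with the Prelude's length and concat, and sumA and singletonB are hypothetical additions that are not in the answer above.

```haskell
-- Style 1: closed data type, operations written by case analysis.
data ListA = Nil | Cons Int ListA

lengthA :: ListA -> Int
lengthA Nil         = 0
lengthA (Cons _ xs) = 1 + lengthA xs

-- Adding an operation is one new function; nothing above is touched.
sumA :: ListA -> Int
sumA Nil         = 0
sumA (Cons x xs) = x + sumA xs

-- Style 2: a record of "methods"; each constructor function is a "class".
data ListB = ListB
  { lengthB :: Int
  , concatB :: ListB -> ListB
  }

nilB :: ListB
nilB = ListB { lengthB = 0, concatB = \ys -> ys }

consB :: Int -> ListB -> ListB
consB x xs = ListB
  { lengthB = 1 + lengthB xs
  , concatB = \ys -> consB x (concatB xs ys)
  }

-- Adding a case is one new "class"; nothing above is touched.
-- Adding a sum method instead would mean editing nilB, consB and singletonB.
singletonB :: Int -> ListB
singletonB x = ListB { lengthB = 1, concatB = \ys -> consB x ys }

main :: IO ()
main = do
  print (lengthA (Cons 1 (Cons 2 Nil)))                   -- prints 2
  print (sumA (Cons 1 (Cons 2 Nil)))                      -- prints 3
  print (lengthB (concatB (singletonB 7) (consB 1 nilB))) -- prints 2
```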
There are many different approaches that languages use to try to solve the Expression Problem and make it easy to add both new cases and new operations to a model. However, there are pros and cons to these solutions [3], so in general I think it's a good rule of thumb to choose between OO and if statements depending on what axis you want to make it easier to extend stuff.
Anyway, going back to your question there are a couple of things I would like to point out:
The first one is that I think the OO "mantra" of getting rid of all if statements and replacing them with method dispatching has more to do with how most OO languages don't have typesafe Algebraic Data Types than it has to do with "if statements" being bad for encapsulation. Since the only way to be type safe is to use method calls you are encouraged to convert programs using if statements into programs using the Visitor Pattern [4] or worse: convert programs that should be using the visitor pattern into programs using simple method dispatch, therefore making extensibility easy in the wrong direction.
The second thing is that I'm not a big fan of breaking things into functions just because you can. In particular, I find that style where all the functions have just 5 lines and call tons of other functions is pretty hard to read.
Finally, I think your example doesn't really get rid of if statements. Essentially, what you are doing is having a function from Integers to a new datatype (with two cases, one for Big and one for Small) and then you still need to use if statements when working with the datatype:
data Size = Big | Small
toSize :: Int -> Size
toSize n = if n < 10 then Small else Big
someOp :: Size -> String
someOp Small = "Wow, its small"
someOp Big = "Wow, its big"
Going back to the expression problem point of view, the advantage of defining our toSize / isSmall function is that we put the logic of choosing what case our number fits in a single place and that our functions can only operate on the case after that. However, this does not mean that we have removed if statements from our code! If we have the toSize being a factory function and we have Big and Small be classes sharing an interface then yes, we will have removed if statements from our code. However, if our isSmall just returns a boolean or enum then there will be just as many if statements as there were before. (and you should choose what implementation to use depending if you want to make it easier to add new methods or new cases - say Medium - in the future)
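For contrast, here is roughly what the factory variant could look like, where the branch really does live in one place. It is a sketch under assumptions: the record fields describe and scale are invented for illustration, and a record of functions stands in for "classes sharing an interface".

```haskell
-- The shared "interface": a record of operations (hypothetical fields).
data Size = Size
  { describe :: String
  , scale    :: Int -> Int
  }

-- Two "classes" implementing the interface.
small, big :: Size
small = Size { describe = "Wow, it's small", scale = (* 2) }
big   = Size { describe = "Wow, it's big",   scale = id }

-- The factory: the only place where the number is tested.
toSize :: Int -> Size
toSize n = if n < 10 then small else big

main :: IO ()
main = do
  putStrLn (describe (toSize 5))   -- prints: Wow, it's small
  print (scale (toSize 5) 5)       -- prints: 10
  putStrLn (describe (toSize 50))  -- prints: Wow, it's big
```

Callers just invoke describe or scale and never branch on the size again, at the price that adding a third operation means revisiting both small and big.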
1 - The name of the problem comes from the problem where you have an "expression" datatype (numbers, variables, addition/multiplication of subexpressions, etc) and want to implement things like evaluation functions and other things.
2 - Or pattern matching over Algebraic Data Types, if you want to be more type safe...
3 - For example, you might have to define all multimethods on the "top level" where the "dispatcher" can see them. This is a limitation compared to the general case since you can use if statements (and lambdas) nested deeply inside other code.
4 - Essentially a "church encoding" of an algebraic data type
I've never heard of such a convention. I don't see how it works, anyway. Surely the only point of having an iIsSmall is to later branch on it (possibly in combination with other values)?
What I have heard of is an argument to avoid having variables like iIsSmall at all. iIsSmall is just storing the result of a test you made, so that you can later use that result to make some decision. So why not just test the value of i at the point where you need to make the decision? i.e., instead of:
int i = 5;
bool iIsSmall = isSmall(i);
...
<code>
...
if (iIsSmall) {
    <do something because i is small>
} else {
    <do something different because i is not small>
}
just write:
int i = 5;
...
<code>
...
if (isSmall(i)) {
    <do something because i is small>
} else {
    <do something different because i is not small>
}
That way you can tell at the branch point what you're actually branching on because it's right there. That's not hard in this example anyway, but if the test was complicated you're probably not going to be able to encode the whole thing in the variable name.
It's also safer. There's no danger that the name iIsSmall is misleading because you changed the code so that it was testing something else, or because i was actually altered after you called isSmall so that it is not necessarily small anymore, or because someone just picked a dumb variable name, etc, etc.
Obviously this doesn't always work. If the isSmall test is expensive and you need to branch on its result many times, you don't want to execute it many times. You also might not want to duplicate the code of that call many times, unless it's trivial. Or you might want to return the flag to be used by a caller who doesn't know about i (though then you could just return isSmall(i), rather than store it in a variable and then return the variable).
Btw, the separate function saves nothing in your example. You can include (i < 10) in an assignment to a bool variable just as easily as in a return statement in a bool function. i.e. you could just as easily write bool isSmall = i < 10; - it's this that avoids the if statement, not the separate function. Code of the form if (test) { x = true; } else { x = false; } or if (test) { return true; } else { return false; } is always silly; just use x = test or return test.
Is it really a convention? Should one just kill minimal if-constructs just because there could be frustration over it?
OK, if statements tend to grow out of control, especially if many special cases are added over time. Branch after branch is added and at the end no one is able to comprehend what everything does without spending hours of time and some cups of coffee into this grown instance of spaghetti-code.
But is it really a good idea to put everything in separate functions? Code should be reusable. Code should be readable. But a function call just creates the need to look it up further up in the source file. If all ifs are put away in this way, you just skip around in the source file all the time. Does this support readability?
Or consider an if-statement which is not reused anywhere. Should it really go into a separate function, just for the sake of convention? There is some overhead involved here, too. Performance issues could be relevant in this context, too.
What I am trying to say: following coding conventions is good. Style is important. But there are exceptions. Just try to write good code that fits into your project and keep the future in mind. In the end, coding conventions are just guidelines which try to help us to produce good code without enforcing anything on us.

Does ghc transform a list only used once into a generator for efficiency reasons?

If so, is this a part of the standard or a ghc specific optimisation we can depend on? Or just an optimisation which we can't necessarily depend on.
P.S.:
When I tried a test sample, it seemed to indicate that it was taking place.
Prelude> let isOdd x = x `mod` 2 == 1
Prelude> let isEven x = x `mod` 2 == 0
Prelude> ((filter isOdd).(filter isEven)) [1..]
Chews up CPU but doesn't consume much memory.
Depends on what you mean by generator. The list is lazily generated, and since nothing else references it, the consumed parts are garbage collected almost immediately. Since the result of the above computation doesn't grow, the entire computation runs in constant space. That is not mandated by the standard, but as it is harder to implement nonstrict semantics with different space behaviour for that example (and lots of vaguely similar), in practice you can rely on it.
But normally, the list is still generated as a list, so there's a lot of garbage produced. Under favourable circumstances, ghc eliminates the list [1 .. ] and produces a non-allocating loop:
result :: [Int]
result = filter odd . filter even $ [1 .. ]
(using the Prelude functions out of laziness), compiled with -O2 generates the core
List.result_go =
  \ (x_ayH :: GHC.Prim.Int#) ->
    case GHC.Prim.remInt# x_ayH 2 of _ {
      __DEFAULT ->
        case x_ayH of wild_Xa {
          __DEFAULT -> List.result_go (GHC.Prim.+# wild_Xa 1);
          9223372036854775807 -> GHC.Types.[] @ GHC.Types.Int
        };
      0 ->
        case x_ayH of wild_Xa {
          __DEFAULT -> List.result_go (GHC.Prim.+# wild_Xa 1);
          9223372036854775807 -> GHC.Types.[] @ GHC.Types.Int
        }
    }
A plain loop, running from 1 to maxBound :: Int, producing nothing on the way and [] at the end.
It's almost smart enough to plain return []. Note that there's only one division by 2, GHC knows that if an Int is even, it can't be odd, so that check has been eliminated, and in no branch a non-empty list is created (i.e., the unreachable branches have been eliminated by the compiler).
Strictly speaking, Haskell does not specify any particular evaluation model, so implementations are free to implement the language's semantics how they want. However, in any sane implementation, including GHC, you can rely on this running in constant space.
In GHC, computations like these result in a singly-linked list ending in a thunk representing the remainder of the list which has not yet been evaluated. As you evaluate this list, more of the list will be generated on demand, but since the beginning of the list is not referred to anywhere else, the earlier parts are immediately eligible for garbage collection, so you get constant space behavior.
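A small sketch of that on-demand behaviour, using only Prelude functions (the concrete numbers are arbitrary choices of mine). The first line takes from an infinite list, which only works because cells are produced as they are demanded. The second walks a long list whose head nothing else refers to, so under GHC I would expect it to run in small, roughly constant memory, although I have not measured it:

```haskell
main :: IO ()
main = do
  -- Only as much of [1 ..] is generated as take demands.
  print (take 5 (filter odd [1 ..]))                    -- prints [1,3,5,7,9]
  -- Cells already consumed by length are unreachable and can be collected.
  print (length (filter even [1 .. 1000000 :: Int]))    -- prints 500000
```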
With optimizations enabled, GHC is very likely to perform deforestation here, optimizing away the need for having a list at all, and the result will be a simple loop with no allocation performed.