What's the real purpose of the `ignore` function in OCaml?

There is an ignore function in OCaml.
val ignore : 'a -> unit
Discard the value of its argument and return (). For instance,
ignore(f x) discards the result of the side-effecting function f. It
is equivalent to f x; (), except that the latter may generate a
compiler warning; writing ignore(f x) instead avoids the warning.
I know what this function will do, but don't get the point of using it.
Can anyone explain, or give an example of when we have to use it?

You basically answered your own question. You don't ever have to use it. The point is precisely to avoid the warning. If you write f x; (), the compiler assumes you probably did something wrong. Probably you thought f x returns unit because you rarely want to ignore non-unit values.
However, sometimes that's not true, and you really want to ignore even non-unit values. Writing ignore (f x) documents the fact that you know f x returns something, but you are deliberately ignoring it.
Note that in real code f x might be something more complex, so the chances of you being wrong about the return type of f x are reasonably high. One example is partial application. Consider f : int -> int -> unit. You might accidentally write f 1, forgetting the second argument, and the warning will help you. Another example is if you do open Async, then many functions from the Standard Library change from returning unit to returning unit Deferred.t. Especially when first starting to use Async, it is quite likely that you'll accidentally think the semicolon operator is appropriate in places that you really need to use monadic bind.

As a complement to Ashish Agarwal's answer (because judging from your comment you don't seem very convinced) :
Imagine that I have a function that has side effects, and returns a value indicating something about the computation. Then, if I'm interested in how the computation went, I will need its return value. However, if I don't care about this and simply want the side effects to take place, I would use ignore.
Dumb example : let's say you have a function which sorts an array and returns Was_already_sorted or Was_not_sorted depending on the initial state of the array. Then if for some reason I'm interested in knowing how often my array was sorted, I might need the return value of this function. If not, I will ignore it.
I agree that this is a dumb example. And probably that in many cases there would be better ways to deal with the problem than using ignore (I've just noticed that I never use ignore). If you're really passionate about this, you could try to find examples of use of this function in real-life code (maybe in the source-code of software such as Unison?).
Also, note that you can use let _ = f x to the same end.

Related

Haskell - Why is Alternative implemented for List

I have read some of this post Meaning of Alternative (it's long)
What led me to that post was learning about Alternative in general. The post gives a good answer to why it is implemented the way it is for List.
My question is:
Why is Alternative implemented for List at all?
Is there perhaps an algorithm that uses Alternative and a List might be passed to it so define it to hold generality?
I thought that, because Alternative by default defines some and many, that may be part of it, but "What are some and many useful for" contains the comment:
To clarify, the definitions of some and many for the most basic types such as [] and Maybe just loop. So although the definition of some and many for them is valid, it has no meaning.
In the "What are some and many useful for" link above, Will gives an answer to the OP that may contain the answer to my question, but at this point in my Haskelling, the forest is a bit thick to see the trees.
Thanks
There's something of a convention in the Haskell library ecology that if a thing can be an instance of a class, then it should be an instance of the class. I suspect the honest answer to "why is [] an Alternative?" is "because it can be".
...okay, but why does that convention exist? The short answer there is that instances are sort of the one part of Haskell that succumbs only to whole-program analysis. They are global, and if there are two parts of the program that both try to make a particular class/type pairing, that conflict prevents the program from working right. To deal with that, there's a rule of thumb that any instance you write should live in the same module either as the class it's associated with or as the type it's associated with.
Since instances are expected to live in specific modules, it's polite to define those instances whenever you can -- since it's not really reasonable for another library to try to fix up the fact that you haven't provided the instance.
Alternative is useful when viewing [] as the nondeterminism-monad. In that case, <|> represents a choice between two programs and empty represents "no valid choice". This is the same interpretation as for e.g. parsers.
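To make the nondeterminism reading concrete, here is a small sketch. The names smallDigits, largeDigits, anyDigit, noChoice and pairsSummingTo10 are made up for illustration; only (<|>) and empty come from Control.Applicative.

```haskell
import Control.Applicative (Alternative (..))

-- Hypothetical example data: two pools of candidate digits.
smallDigits, largeDigits :: [Int]
smallDigits = [1, 2]
largeDigits = [8, 9]

-- (<|>) on lists offers every choice from either side, left side first.
anyDigit :: [Int]
anyDigit = smallDigits <|> largeDigits

-- empty is the computation with no valid choice.
noChoice :: [Int]
noChoice = empty

-- In the list monad, a branch that ends in empty contributes nothing,
-- so only the combinations that pass the test survive.
pairsSummingTo10 :: [(Int, Int)]
pairsSummingTo10 = do
  x <- anyDigit
  y <- anyDigit
  if x + y == 10 then pure (x, y) else empty
```

Here anyDigit is [1,2,8,9] and pairsSummingTo10 is [(1,9),(2,8),(8,2),(9,1)]: each rejected pair is just a branch that chose empty.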
some and many do indeed not make sense for lists, since they try iterating through all possible lists of elements from the given options greedily, starting from the infinite list of just the first option. The list monad isn't lazy enough to do even that, since it might always need to abort if it was given an empty list. There is, however, one case where both terminate: when given an empty list.
Prelude Control.Applicative> many []
[[]]
Prelude Control.Applicative> some []
[]
If some and many were defined as lazy (in the regex sense), meaning they prefer short lists, you would get out results, but not very useful, since it starts by generating all the infinite number of lists with just the first option:
Prelude Control.Applicative> some' v = liftA2 (:) v (many' v); many' v = pure [] <|> some' v
Prelude Control.Applicative> take 100 . show $ (some' [1,2])
"[[1],[1,1],[1,1,1],[1,1,1,1],[1,1,1,1,1],[1,1,1,1,1,1],[1,1,1,1,1,1,1],[1,1,1,1,1,1,1,1],[1,1,1,1,1,"
Edit: I believe the some and many functions correspond to a star-semiring, while <|> and empty correspond to plus and zero in a semiring. So mathematically (I think), it would make sense to split those operations out into a separate typeclass, but it would also be kind of silly, since they can be implemented in terms of the other operators in Alternative.
Consider a function like this:
fallback :: Alternative f => a -> (a -> f b) -> (a -> f e) -> f (Either e b)
fallback x f g = (Right <$> f x) <|> (Left <$> g x)
Not spectacularly meaningful, but you can imagine it being used in, say, a parser: try one thing, falling back to another if that doesn't work.
Does this function have a meaning when f ~ []? Sure, why not. If you think of a list's "effects" as being a search through some space, this function seems to represent some kind of biased choice, where you prefer the first option to the second, and while you're willing to try either, you also tag which way you went.
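As a rough illustration of how the same fallback behaves at two different instances, here is a sketch. halve, label, viaMaybe and viaList are hypothetical helpers invented for this example; fallback is repeated from above so the snippet stands alone.

```haskell
import Control.Applicative (Alternative (..))

-- fallback, as defined above.
fallback :: Alternative f => a -> (a -> f b) -> (a -> f e) -> f (Either e b)
fallback x f g = (Right <$> f x) <|> (Left <$> g x)

-- Hypothetical first option: succeeds only for even numbers.
halve :: Alternative f => Int -> f Int
halve n = if even n then pure (n `div` 2) else empty

-- Hypothetical second option: always succeeds with a description.
label :: Alternative f => Int -> f String
label n = pure ("input " ++ show n)

-- At Maybe, the first success wins and the second option is dropped.
viaMaybe :: Int -> Maybe (Either String Int)
viaMaybe n = fallback n halve label

-- At [], both options are kept, in order, each tagged with its origin.
viaList :: Int -> [Either String Int]
viaList n = fallback n halve label
```

With these definitions viaMaybe 4 is Just (Right 2), while viaList 4 is [Right 2, Left "input 4"]. For an odd input such as 3, both fall through to the Left branch only.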
Could a function like this be part of some algorithm which is polymorphic in the Alternative it computes in? Again I don't see why not. It doesn't seem unreasonable for [] to have an Alternative instance, since there is an implementation that satisfies the Alternative laws.
As to the answer linked to by Will Ness that you pointed out: it covers that some and many don't "just loop" for lists. They loop for non-empty lists. For empty lists, they immediately return a value. How useful is this? Probably not very, I must admit. But that functionality comes along with (<|>) and empty, which can be useful.

Assign variable in list

I've got a problem with Prolog lists.
Let's say I've got this predicate:
array(p, [A,B,C]).
When I do:
array(p,X).
I got: X = [_,_,_]
Now, considering I've got this predicate:
p1(1) :- array(p1, [1,B1,C1]).
I expected to get:
X = [1,_,_]
but instead, the result is the same as before. Is such a thing even possible in Prolog? Another question: if we can somehow set these values, could we overwrite them in the same way? I understand that in Prolog variables are assigned only once, but I would like to somehow get a dynamic list.
I'm not sure what you mean by "paradigm," and I'm very unclear on what you're trying to do with this code. If you have this at the toplevel:
array(p, [A,B,C]).
you are defining a fact array/2, which associates p with a list of three uninstantiated variables. Your first query amounts to retrieving this fact.
Your second "paradigm" is really the definition of a rule or predicate p1/1, which takes a single argument, which must be 1 for the rule to fire. The body of this second predicate is a call to the predicate array/2 which is definitely going to fail. I don't see how you could possibly get the same result as before, because you defined array(p, ...) before and now you are looking for array(p1, ...). Furthermore, there is no X in your second query, so there is no reason for X to appear in the result, and it definitely would not, even if you had called array(p, ...) instead of array(p1, ...).
I think what you're trying to do here is probably set up some kind of set of three variables and then unify each of them in turn as you proceed along some calculation. To do something like that is possible and easy in Prolog, but the fact database is not going to participate in this process really. You're going to have to write predicates that pass your variables along to other predicates that will unify them and return them bound. None of this is very hard, but it looks like you're going to have to go back and understand the fundamentals here a little better. You're far enough off track here that I don't think anyone can really answer your question as stated, because there's too much confusion in it.

How to detect list changes without comparing the complete list

I have a function which will fail if there has been any change to the term/list it is using since that term/list was generated. I would like to avoid checking that each parameter is still the same. So I thought about computing a CRC or something similar each time I generate the term/list. Before making use of it I would compute the CRC again, so I can be 99.9999% sure the term/list is still the same.
To be specific: I am programming in Erlang, and I am thinking of using a function of the following type:
-spec(list_crc32(List :: [term()]) -> CRC32 :: integer()).
I use term() because it is a list of terms (Erlang already has fast built-in CRC functions, but only for binary values). I have considered using "erlang:crc32(term_to_binary(Term))", but I'm not sure whether there is a better approach.
What do you think?
Regards, Borja.
Without more context it is a little bit difficult to understand why you would have this problem, particularly since Erlang terms are immutable -- once assigned no other operation can change the value of a variable, not even in the same function.
So if your question is "How do I quickly assert that true = A == A?" then consider this code:
A = generate_list(),
% other things in this function happen
A = A.
The above snippet will always assert that A is still A, because it is not possible to change A like you might do in, say, Python.
If your question is "How do I assert that the value of a new list generated exactly the same value as a different known list?" then using either matching or an actual assertion is the fastest way:
start() ->
    A = generate_list(),
    assert_loop(A).
assert_loop(A) ->
    ok = do_stuff(),
    A = generate_list(),
    assert_loop(A).
The assert_loop/1 function above is forcing an assertion that the output of generate_list/0 is still exactly A. There is no telling what other things in the system might be happening which may have affected the result of that function, but the line A = generate_list() will crash if the list returned is not exactly the same value as A.
In fact, there is no way to change the A in this example, no matter how many times we execute assert_loop/1 above.
Now consider a different style:
compare_loop(A) ->
    ok = do_stuff(),
    case A =:= generate_list() of
        true -> compare_loop(A);
        false -> terminate_gracefully()
    end.
Here we have given ourselves the option to do something other than crash, but the effect is ultimately the same, because =:= is not merely a test of equality: it is a match test, meaning not just that the two sides evaluate to equal values, but that they actually match.
Consider:
1> 1 == 1.0.
true
2> 1 =:= 1.0.
false
The fastest way to compare two terms will depend partly on the sizes of the lists involved but especially on whether or not you expect the assertion to pass or fail more often.
If the check is expected to fail more often then the fastest check is to use an assertion with =, an equivalence test with == or a match test with =:= instead of using erlang:phash2/1. Why? Because these tests can return false as soon as a non-matching element is encountered -- and if this non-match occurs near the beginning of the list then a full traverse of both lists is avoided entirely.
If the check is expected to pass more often then something like erlang:phash2/1 will be faster, but only if the lists are long, because only one list will be fully traversed each iteration (the hash of the original list is already stored). It is possible, though, on a short list that a simple comparison will still be faster than computing a hash, storing it, computing another hash, and then comparing the hashes (obviously). So, as always, benchmark.
A phash2 version could look like:
start() ->
    A = generate_list(),
    Hash = erlang:phash2(A),
    assert_loop(Hash).
assert_loop(Hash) ->
    ok = do_stuff(),
    % Crashes with badmatch if the freshly generated list hashes differently.
    Hash = erlang:phash2(generate_list()),
    assert_loop(Hash).
Again, this is an assertive loop that will crash instead of exit cleanly, so it would need to be adapted to your needs.
The basic mystery still remains, though: in a language with immutable variables why is it that you don't know whether something will have changed? This is almost certainly a symptom of an underlying architectural problem elsewhere in the program -- either that or simply a misunderstanding of immutability in Erlang.

Why would I ever want to use Maybe instead of a List?

Seeing as the Maybe type is isomorphic to the set of null and singleton lists, why would anyone ever want to use the Maybe type when I could just use lists to accommodate absence?
Because if you match a list against the patterns [] and [x] that's not an exhaustive match and you'll get a warning about that, forcing you to either add another case that'll never get called or to ignore the warning.
Matching a Maybe against Nothing and Just x however is exhaustive. So you'll only get a warning if you fail to match one of those cases.
If you choose your types such that they can only represent values that you may actually produce, you can rely on non-exhaustiveness warnings to tell you about bugs in your code where you forget to check for a given case. If you choose more "permissive" types, you'll always have to think about whether a warning represents an actual bug or just an impossible case.
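A small sketch of the difference, assuming GHC with the -Wincomplete-patterns warning enabled. describeList and describeMaybe are made-up names for illustration.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- With a list, matching only [] and [x] leaves longer lists uncovered,
-- so GHC would warn unless a catch-all "impossible" case is added.
describeList :: [Int] -> String
describeList []  = "nothing"
describeList [x] = "one value: " ++ show x
describeList _   = error "more than one value: should be impossible"

-- With Maybe, Nothing and Just x are the only constructors, so these two
-- equations are already exhaustive and no extra case is needed.
describeMaybe :: Maybe Int -> String
describeMaybe Nothing  = "nothing"
describeMaybe (Just x) = "one value: " ++ show x
```

Deleting the last equation of describeList brings the warning back, and then you have to decide whether it points at a real bug or at a case that can never happen.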
You should strive to have accurate types. Maybe expresses that there is exactly one value or that there is none. Many imperative languages represent the "none" case by the value null.
If you chose a list instead of Maybe, all your functions would be faced with the possibility that they get a list with more than one member. Probably many of them would only be defined for one value, and would have to fail on a pattern match. By using Maybe, you avoid a class of runtime errors entirely.
Building on existing (and correct) answers, I'll mention a typeclass based answer.
Different types convey different intentions - returning a Maybe a represents a computation with the possibility of failing while [a] could represent non-determinism (or, in simpler terms, multiple possible return values).
This plays into the fact that different types have different instances for typeclasses - and these instances cater to the underlying essence the type conveys. Take Alternative and its operator (<|>) which represents what it means to combine (or choose) between arguments given.
Maybe a: Combining computations that can fail just means taking the first that is not Nothing.
[a]: Combining two computations that each had multiple return values just means concatenating together all possible values.
Then, depending on which types your functions use, (<|>) would behave differently. Of course, you could argue that you don't need (<|>) or anything like that, but then you are missing out on one of Haskell's main strengths: its many high-level combinator libraries.
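For instance, the same generic code gives the two behaviours described above depending only on the type it is used at. This is just a sketch: eitherSource, asMaybe and asList are names invented here.

```haskell
import Control.Applicative (Alternative, (<|>))

-- Hypothetical helper: combine two sources, generic in the Alternative used.
eitherSource :: Alternative f => f a -> f a -> f a
eitherSource primary backup = primary <|> backup

-- At Maybe: the first non-Nothing value is taken.
asMaybe :: Maybe Int
asMaybe = eitherSource (Just 1) (Just 2)

-- At []: all possible values are concatenated.
asList :: [Int]
asList = eitherSource [1] [2]
```

asMaybe evaluates to Just 1, whereas asList evaluates to [1,2].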
As a general rule, we like our types to be as snug fitting and intuitive as possible. That way, we are not fighting the standard libraries and our code is more readable.
Lisp, Scheme, Python, Ruby, JavaScript, etc., manage to get along with just one type each, which you could represent in Haskell with a big sum type. Every function handling a JavaScript (or whatever) value must be prepared to receive a number, a string, a function, a piece of the document object model, etc., and throw an exception if it gets something unexpected. People who program in typed languages like Haskell prefer to limit the number of unexpected things that can occur. They also like to express ideas using types, making types useful (and machine-checked) documentation. The closer the types come to representing the intended meaning, the more useful they are.
Because there are an infinite number of possible lists, and a finite number of possible values for the Maybe type. It perfectly represents one thing or the absence of something without any other possibility.
Several answers have mentioned exhaustiveness as a factor here. I think it is a factor, but not the biggest one, because there is a way to consistently treat lists as if they were Maybes, which the listToMaybe function illustrates:
listToMaybe :: [a] -> Maybe a
listToMaybe [] = Nothing
listToMaybe (a:_) = Just a
That's an exhaustive pattern match, which rules out any straightforward errors.
The factor I'd highlight as bigger is that by using the type that more precisely models the behavior of your code, you eliminate potential behaviors that would be possible if you used a more general alternative. Say for example you have some context in your code where you use a type of the form a -> [b], though the only correct alternatives (given your program's specification) are empty or singleton lists. Try as hard as you may to enforce the convention that this context should obey that rule, it's still possible that you'll mess up and:
Somehow a function used in that context will produce a list of two or more items;
And somehow a function that uses the results produced in that context will observe whether the lists have two or more items, and behave incorrectly in that case.
Example: some code that expects there to be no more than one value will blindly print the contents of the list and thus print multiple items when only one was supposed to be.
But if you use Maybe, then there really must be either one value or none, and the compiler enforces this.
Even though the two are isomorphic, the list version has a larger search space, so a tool such as QuickCheck will run slower on it.

Programming without if-statements? [closed]

I remember some time (years, probably) ago I read on Stackoverflow about the charms of programming with as few if-tests as possible. This question is somewhat relevant but I think the stress was on using many small functions that returned values determined by tests depending on the parameter they receive. A very simple example would be using this:
int i = 5;
bool iIsSmall = isSmall(i);
with isSmall() looking like this:
private bool isSmall(int number)
{
    return (number < 10);
}
instead of just doing this:
int i = 5;
bool isSmall;
if (i < 10) {
isSmall = true;
} else {
isSmall = false;
}
(Logically this code is just sample code. It is not part of a program I am making.)
The reason for doing this, I believe, was because it looks nicer and makes a programmer less prone to logical errors. If this coding convention is applied correctly, you would see virtually no if-tests anywhere, except in functions whose only purpose is to do that test.
Now, my question is: is there any documentation about this convention? Is there anyplace where you can see wild arguments between supporters and opponents of this style? I tried searching for the Stackoverflow post that introduced me to this, but I can't find it anymore.
Lastly, I hope this question doesn't get shot down because I am not asking for a solution to a problem. I am simply hoping to hear more about this coding style and maybe increase the quality of all coding I will do in the future.
This whole "if" vs "no if" thing makes me think of the Expression Problem [1]. Basically, it's an observation that programming with if statements or without if statements is a matter of encapsulation and extensibility and that sometimes it's better to use if statements [2] and sometimes it's better to use dynamic dispatching with methods / function pointers.
When we want to model something, there are two axes to worry about:
The different cases (or types) of the inputs we need to deal with.
The different operations we want to perform over these inputs.
One way to implement this sort of thing is with if statements / pattern matching / the visitor pattern:
import Prelude hiding (length, concat)
data List = Nil | Cons Int List
length xs = case xs of
  Nil -> 0
  Cons a as -> 1 + length as
concat xs ys = case xs of
  Nil -> ys
  Cons a as -> Cons a (concat as ys)
The other way is to use object orientation:
import Prelude hiding (length, concat)
data List = List {
  length :: Int,
  concat :: List -> List
}
nil = List {
  length = 0,
  concat = (\ys -> ys)
}
cons x xs = List {
  length = 1 + length xs,
  concat = (\ys -> cons x (concat xs ys))
}
It's not hard to see that the first version using if statements makes it easy to add new operations on our data type: just create a new function and do a case analysis inside it. On the other hand, this makes it hard to add new cases to our data type since that would mean going back through the program and modifying all the branching statements.
The second version is kind of the opposite. It's very easy to add new cases to the datatype: just create a new "class" and tell what to do for each of the methods we need to implement. However, it's now hard to add new operations to the interface since this means adding a new method for all the old classes that implemented the interface.
There are many different approaches that languages use to try to solve the Expression Problem and make it easy to add both new cases and new operations to a model. However, there are pros and cons to these solutions [3], so in general I think it's a good rule of thumb to choose between OO and if statements depending on what axis you want to make it easier to extend stuff.
Anyway, going back to your question there are couple of things I would like to point out:
The first one is that I think the OO "mantra" of getting rid of all if statements and replacing them with method dispatching has more to do with how most OO languages don't have typesafe Algebraic Data Types than it has to do with "if statements" being bad for encapsulation. Since the only way to be type safe is to use method calls you are encouraged to convert programs using if statements into programs using the Visitor Pattern [4] or worse: convert programs that should be using the visitor pattern into programs using simple method dispatch, therefore making extensibility easy in the wrong direction.
The second thing is that I'm not a big fan of breaking things into functions just because you can. In particular, I find that style where all the functions have just 5 lines and call tons of other functions is pretty hard to read.
Finally, I think your example doesn't really get rid of if statements. Essentially, what you are doing is having a function from Integers to a new datatype (with two cases, one for Big and one for Small) and then you still need to use if statements when working with the datatype:
data Size = Big | Small
toSize :: Int -> Size
toSize n = if n < 10 then Small else Big
someOp :: Size -> String
someOp Small = "Wow, its small"
someOp Big = "Wow, its big"
Going back to the expression problem point of view, the advantage of defining our toSize / isSmall function is that we put the logic of choosing what case our number fits in a single place and that our functions can only operate on the case after that. However, this does not mean that we have removed if statements from our code! If we have the toSize being a factory function and we have Big and Small be classes sharing an interface then yes, we will have removed if statements from our code. However, if our isSmall just returns a boolean or enum then there will be just as many if statements as there were before. (and you should choose what implementation to use depending if you want to make it easier to add new methods or new cases - say Medium - in the future)
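Here is a rough sketch of the "factory function" variant, with a record of values standing in for classes that share an interface. The field sizeName, the values small and big, and the function report are all made up for this illustration.

```haskell
-- A hypothetical "interface": each size carries its own behaviour.
data Size = Size
  { sizeName :: String
  , someOp   :: String
  }

small, big :: Size
small = Size { sizeName = "Small", someOp = "Wow, it's small" }
big   = Size { sizeName = "Big",   someOp = "Wow, it's big" }

-- The only branch in the program lives in the factory function.
toSize :: Int -> Size
toSize n = if n < 10 then small else big

-- Callers dispatch through the record and never branch themselves.
report :: Int -> String
report n = someOp (toSize n)
```

Adding a Medium case here means writing one new value and touching toSize, while report stays unchanged. Adding a new operation, on the other hand, means adding a field and updating every existing value.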
1 - The name of the problem comes from the problem where you have an "expression" datatype (numbers, variables, addition/multiplication of subexpressions, etc) and want to implement things like evaluation functions and other things.
2 - Or pattern matching over Algebraic Data Types, if you want to be more type safe...
3 - For example, you might have to define all multimethods on the "top level" where the "dispatcher" can see them. This is a limitation compared to the general case since you can use if statements (and lambdas) nested deeply inside other code.
4 - Essentially a "church encoding" of an algebraic data type
I've never heard of such a convention. I don't see how it works, anyway. Surely the only point of having an iIsSmall is to later branch on it (possibly in combination with other values)?
What I have heard of is an argument to avoid having variables like iIsSmall at all. iIsSmall is just storing the result of a test you made, so that you can later use that result to make some decision. So why not just test the value of i at the point where you need to make the decision? i.e., instead of:
int i = 5;
bool iIsSmall = isSmall(i);
...
<code>
...
if (iIsSmall) {
<do something because i is small>
} else {
<do something different because i is not small>
}
just write:
int i = 5;
...
<code>
...
if (isSmall(i)) {
<do something because i is small>
} else {
<do something different because i is not small>
}
That way you can tell at the branch point what you're actually branching on because it's right there. That's not hard in this example anyway, but if the test was complicated you're probably not going to be able to encode the whole thing in the variable name.
It's also safer. There's no danger that the name iIsSmall is misleading because you changed the code so that it was testing something else, or because i was actually altered after you called isSmall so that it is not necessarily small anymore, or because someone just picked a dumb variable name, etc, etc.
Obviously this doesn't always work. If the isSmall test is expensive and you need to branch on its result many times, you don't want to execute it many times. You also might not want to duplicate the code of that call many times, unless it's trivial. Or you might want to return the flag to be used by a caller who doesn't know about i (though then you could just return isSmall(i), rather than store it in a variable and then return the variable).
Btw, the separate function saves nothing in your example. You can include (i < 10) in an assignment to a bool variable just as easily as in a return statement in a bool function. i.e. you could just as easily write bool isSmall = i < 10; - it's this that avoids the if statement, not the separate function. Code of the form if (test) { x = true; } else { x = false; } or if (test) { return true; } else { return false; } is always silly; just use x = test or return test.
Is it really a convention? Should one just kill minimal if-constructs just because there could be frustration over it?
OK, if statements tend to grow out of control, especially if many special cases are added over time. Branch after branch is added and at the end no one is able to comprehend what everything does without spending hours of time and some cups of coffee into this grown instance of spaghetti-code.
But is it really a good idea to put everything in separate functions? Code should be reusable. Code should be readable. But a function call just creates the need to look it up further up in the source file. If all ifs are put away in this way, you just skip around in the source file all the time. Does this support readability?
Or consider an if-statement which is not reused anywhere. Should it really go into a separate function, just for the sake of convention? There is some overhead involved here, too. Performance issues could be relevant in this context, too.
What I am trying to say: following coding conventions is good. Style is important. But there are exceptions. Just try to write good code that fits into your project and keep the future in mind. In the end, coding conventions are just guidelines which try to help us to produce good code without enforcing anything on us.