Seeing as the Maybe type is isomorphic to the set of empty and singleton lists, why would anyone ever want to use the Maybe type when I could just use lists to accommodate absence?
Because if you match a list against the patterns [] and [x] that's not an exhaustive match and you'll get a warning about that, forcing you to either add another case that'll never get called or to ignore the warning.
Matching a Maybe against Nothing and Just x however is exhaustive. So you'll only get a warning if you fail to match one of those cases.
If you choose your types such that they can only represent values that you may actually produce, you can rely on non-exhaustiveness warnings to tell you about bugs in your code where you forget to check for a given case. If you choose more "permissive" types, you'll always have to think about whether a warning represents an actual bug or just an impossible case.
You should strive to have accurate types. Maybe expresses that there is exactly one value or that there is none. Many imperative languages represent the "none" case by the value null.
If you chose a list instead of Maybe, all your functions would be faced with the possibility that they get a list with more than one member. Probably many of them would only be defined for one value, and would have to fail on a pattern match. By using Maybe, you avoid a class of runtime errors entirely.
Building on existing (and correct) answers, I'll mention a typeclass based answer.
Different types convey different intentions - returning a Maybe a represents a computation with the possibility of failing, while [a] could represent non-determinism (or, in simpler terms, multiple possible return values).
This plays into the fact that different types have different instances for typeclasses - and these instances cater to the underlying essence the type conveys. Take Alternative and its operator (<|>), which captures what it means to combine (or choose between) the arguments given:
Maybe a: combining computations that can fail just means taking the first one that is not Nothing.
[a]: combining two computations that each have multiple possible return values just means concatenating together all possible values.
Then, depending on which types your functions use, (<|>) would behave differently. Of course, you could argue that you don't need (<|>) or anything like that, but then you are missing out on one of Haskell's main strengths: its many high-level combinator libraries.
As a general rule, we like our types to be as snug fitting and intuitive as possible. That way, we are not fighting the standard libraries and our code is more readable.
Lisp, Scheme, Python, Ruby, JavaScript, etc., manage to get along with just one type each, which you could represent in Haskell with a big sum type. Every function handling a JavaScript (or whatever) value must be prepared to receive a number, a string, a function, a piece of the document object model, etc., and throw an exception if it gets something unexpected. People who program in typed languages like Haskell prefer to limit the number of unexpected things that can occur. They also like to express ideas using types, making types useful (and machine-checked) documentation. The closer the types come to representing the intended meaning, the more useful they are.
Because there are infinitely many possible lists, but a Maybe value has only two possible shapes: Nothing or Just x. It perfectly represents one thing or the absence of something without any other possibility.
Several answers have mentioned exhaustiveness as a factor here. I think it is a factor, but not the biggest one, because there is a way to consistently treat lists as if they were Maybes, which the listToMaybe function illustrates:
listToMaybe :: [a] -> Maybe a
listToMaybe [] = Nothing
listToMaybe (a:_) = Just a
That's an exhaustive pattern match, which rules out any straightforward errors.
The factor I'd highlight as bigger is that by using the type that more precisely models the behavior of your code, you eliminate potential behaviors that would be possible if you used a more general alternative. Say, for example, you have some context in your code where you use a type of the form a -> [b], though the only correct alternatives (given your program's specification) are empty or singleton lists. Try as hard as you may to enforce the convention that this context should obey that rule, it's still possible that you'll mess up and:
Somehow a function used in that context will produce a list of two or more items;
And somehow a function that uses the results produced in that context will observe whether the lists have two or more items, and behave incorrectly in that case.
Example: some code that expects there to be no more than one value will blindly print the contents of the list, and thus print multiple items when only one was supposed to be printed.
But if you use Maybe, then there really must be either one value or none, and the compiler enforces this.
Even though the types are isomorphic, tools like QuickCheck, for example, will run slower with lists because of the increase in search space.
C++ also lets you write the boolean operators as the keywords and, or, and not instead of &&, ||, and !. Is anybody actually using them, or are we all sticking to our taught "&&, ||, !" way?
Any thoughts on why we should use one or the other?
I'm just wondering because several answers state that code should be as natural as possible, but I haven't seen a lot of code with "and, or, not" even though this reads more naturally.
I like the idea of the not operator because it is more visible than the ! operator. For example:
if (!foo.bar()) { ... }
if (not foo.bar()) { ... }
I suggest that the second one is more visible and readable. I don't think the same argument necessarily applies to the and and or forms, though.
"What's in a name? That which we call &&, || or !
By any other name would smell as sweet."
In other words, natural depends on what you are used to.
Those were not supported in the old days, and even now you need to give some compilers a special switch to enable these keywords. That's probably because old code bases may have had functions or variables named "and", "or", or "not".
One problem with using them (for me anyway) is that in MSVC you have to include iso646.h or use the (mostly unusable) /Za switch.
The main problem I have with them is the Catch-22 that they're not commonly used, so they require my brain to actively process the meaning, where the old-fashioned operators are more or less ingrained (kind of like the difference between reading a learned language vs. your native language).
Though I'm sure I'd overcome that issue if their use became more universal. If that happened, then I'd have the problem that some boolean operators have keywords while others don't, so if alternate keywords were used, you might see expressions like:
if ((x not_eq y) and (y == z) or (z <= something)) {...}
when it seems to me they should have alternate tokens for all the (at least comparison) operators:
if ((x not_eq y) and (y eq z) or (z lt_eq something)) {...}
The reason the alternate keywords (and digraphs and trigraphs) were provided was not to make the expressions more readable - it was because historically there have been (and maybe still are) keyboards and/or codepages in some localities that do not have certain punctuation characters. For example, the invariant part of the ISO 646 character set (surprise) is missing the '|', '^' and '~' characters, among others.
Although I've been programming C++ for quite some time, I did not know that the keywords "and", "or" and "not" were allowed, and I've never seen them used.
I searched through my C++ book, and I found a small section mentioning alternative representations for the normal operators "&&", "||" and "!", where it explains that those are available for people with non-standard keyboards that do not have the "&!|" symbols.
A bit like trigraphs in C.
Basically, I would be confused by their use, and I think I would not be the only one.
Using a representation that is unconventional should really require a good reason.
And if used, it should be used consistently in the code, and described in the coding standard.
The digraph and trigraph operators were actually designed more for systems that didn't carry the standard ASCII character set - such as IBM mainframes (which use EBCDIC). In the olden days of mechanical printers, there was this thing called a "48-character print chain" which, as its name implied, only carried 48 characters. A-Z (uppercase), 0-9 and a handful of symbols. Since one of the missing symbols was an underscore (which rendered as a space), this could make working with languages like C and PL/1 a real fun activity (is this 2 words or one word with an underscore???).
Conventional C/C++ is coded with the symbols and not the digraphs. Although I have been known to #define "NOT", since it makes the meaning of a boolean expression more obvious, and it's visually harder to miss than a skinny little "!".
I wish I could use || and && in normal speech. People try very hard to misunderstand when I say "and" or "or"...
I personally like operators to look like operators. It's all maths, and unless you start using "add" and "subtract" operators too it starts to look a little inconsistent.
I think some languages suit the word-style and some suit the symbols if only because it's what people are used to and it works. If it ain't broke, don't fix it.
There is also the question of precedence, which seems to be one of the reasons for introducing the new operators, but who can be bothered to learn more rules than they need to?
In cases where I program with names directly mapped to the real world, I tend to use 'and' and 'or', for example:
if(isMale or isBoy and age < 40){}
It's nice to use 'em in Eclipse+gcc, as they are highlighted. But then, the code doesn't compile with some compilers :-(
Using these operators is harmful. Notice that and and or are logical operators, whereas the similar-looking xor is a bitwise operator. The logical operators treat any non-zero operand as true, whereas xor combines the individual bits of its operands.
Imagine something like
int p, q;            // Flags, set somehow
if(p and q) { ... }  // Both non-zero
if(p or q) { ... }   // At least one non-zero
if(p xor q) { ... }  // Exactly one non-zero?
Bzzzt, you have a bug. In the last case you're testing whether at least one of the bits in the two values is different, which probably isn't what you thought you were doing, because then you would have written p != q. With p = 1 and q = 2, for example, both flags are set and yet the xor condition is true.
This example is not hypothetical. I was working together with a student one time and he was fond of these literate operators. His code failed every now and then for reasons that he couldn't explain. When he asked me, I could zero in on the problem because I knew that C++ doesn't have a logical xor operator, and that line struck me as very odd.
BTW the way to write a logical xor in C++ is
!a != !b
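To make the difference concrete, here is a small self-contained sketch (the values are made up for illustration; as noted elsewhere, MSVC may need iso646.h or a special switch before the keyword forms compile):

#include <iostream>

int main() {
    int a = 1, b = 2;                  // two "true" flags with different bit patterns

    std::cout << (a and b)  << '\n';   // 1: both are non-zero
    std::cout << (a or b)   << '\n';   // 1: at least one is non-zero
    std::cout << (a xor b)  << '\n';   // 3: bitwise ^, not a logical exclusive-or
    std::cout << (!a != !b) << '\n';   // 0: a genuine logical xor - both flags are "true"
}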
I like the idea, but don't use them. I'm so used to the old way that it provides no advantage to me doing it either way. Same holds true for the rest of our group, however, I do have concerns that we might want to switch to help avoid future programmers from stumbling over the old symbols.
So to summarize: it's not used a lot because of the following combination:
old code where it was not used
habit (more standard)
taste (more math-like)
Thanks for your thoughts
Lately, I have seen a lot of questions being asked about the output of some crazy yet syntactically allowed code statements like i = ++i + 1 and i = (i, i++, i) + 1;.
Frankly, hardly anyone realistically writes such code in actual programming; I have never encountered any such code in my professional experience. So I usually end up skipping such questions here on SO. But lately the sheer volume of such Q's being asked makes me think I might be missing out on some important theory by skipping them. I gather that such Q's revolve around sequence points. I hardly know anything about sequence points, and I am just wondering whether not knowing about them is a handicap in some way. So can someone please explain the theory/concept of sequence points, or if possible point to a resource that explains the concept? Also, is it worth investing time in learning about this concept/theory?
The simplest answer I can think of is:
C++ is defined in terms of an abstract machine. The output of a program executed on the abstract machine is defined ONLY in terms of the order in which "side effects" are performed, and side effects are defined as calls into I/O library functions and accesses to variables marked volatile.
C++ compilers are allowed to do whatever they want internally to optimize code, but they cannot change the order of writes to volatile variables and I/O calls.
Sequence points define the C/C++ program's heartbeat: side effects before a sequence point are "complete" and side effects after it have not yet taken place. But side effects (or code that can produce a side effect indirectly) between sequence points can be reordered.
Which is why understanding them is important. Without that understanding, your fundamental understanding of what a C++ program is (and how it might be optimized by an aggressive compiler) is flawed.
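A small illustration of that rule (a hypothetical snippet; exactly which non-volatile writes get merged depends on the compiler):

volatile int status;   // writes to this are observable side effects
int scratch;           // writes to this are not

void update() {
    scratch = 1;       // the compiler is free to drop or merge these two writes,
    scratch = 2;       // since only the final value could ever be observed
    status = 1;        // but it must perform both of these volatile writes,
    status = 2;        // and in this order
}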
See http://en.wikipedia.org/wiki/Sequence_point.
It's a quite simple concept, so you don't need to invest much time :)
The exact technical details of sequence points can get hairy, yes. But following these guidelines solves almost all the practical issues:
If an expression modifies a value, there must be a sequence point between the modification and any other use of that value.
If you're not sure whether two uses of a value are separated by a sequence point or not, break up your code into more statements.
Here "modification" includes assignment operations on the left-hand value in =, +=, etc., and also the ++x, x++, --x, and x-- syntaxes. (It's usually these increment/decrement expressions where some people try to be clever and end up getting into trouble.)
Luckily, there are sequence points in most of the "expected" places:
At the end of every statement or declaration.
At the beginning and end of every function call.
At the built-in && and || operators.
At the ? in a ternary expression.
At the built-in , comma operator. (Most commonly seen in for conditions, e.g. for (a=0, b=0; a<m && b<n; ++a, ++b).) A comma which separates function arguments is not the comma operator and is not a sequence point.
Overloaded operator&&, operator||, and operator, do not cause sequence points. The potential for surprises from that fact is one reason overloading them is usually discouraged.
It is worth knowing that sequence points exist because, if you don't know about them, you can easily write code that seems to run fine in testing but is actually undefined and might fail when you run it on another computer or with different compile options. In particular, if you write, for example, x++ as part of a larger expression that also uses x, you can easily run into problems.
I don't think it is necessary to learn all the rules fully - but you need to know when to check the specification, or perhaps better, when to rewrite your code so that you aren't relying on sequence-point rules if a simpler design would work too. For example, this:
int n, n_squared;
for (n = n_squared = 0; n < 100; n_squared += n + ++n)
    printf("%i squared might or might not be %i\n", n, n_squared);
... doesn't always do what you think it will do. This can make debugging painful.
The reason is that ++n reads, modifies, and stores n, and there is no sequence point separating that modification from the other read of n in the same expression, so the modification can happen before or after the other n is read. As a result, the value of n_squared isn't well defined after even the first iteration. A sequence point between the two accesses is what would have guaranteed an order.
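For comparison, here is a sketch of the same loop rewritten so that it doesn't depend on sequence-point subtleties (assuming the intent was for n_squared to track n*n by summing consecutive odd numbers):

#include <cstdio>

int main() {
    int n_squared = 0;
    for (int n = 0; n < 100; ) {
        n_squared += 2 * n + 1;   // add the next odd number: n_squared becomes (n+1)^2
        ++n;                      // n is modified in its own statement, separated from
                                  // every other use of n by a sequence point
        std::printf("%i squared is %i\n", n, n_squared);
    }
}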
What are the best practices for defining constants in Clojure, in terms of style, conventions, efficiency, etc.?
For example, is this right?
(def *PI* 3.14)
Questions:
Should constants be capitalized in Clojure?
Stylistically, should they have the asterisk (*) character on one or both sides?
Any computational efficiency considerations I should be aware of?
I don't think there are any hard and fast rules. I usually don't give them any special treatment at all. In a functional language, there is less of a distinction between a constant and any other value, because things are more often pure.
The asterisks on both sides are called "ear muffs" in Clojure. They are usually used to indicate a "special" var, or a var that will be dynamically rebound using binding later. Vars like *out* and *in*, which are occasionally rebound to different streams by users, are examples.
Personally, I would just name it pi. I don't think I've ever seen people give constants special names in Clojure.
EDIT: Mister Carper just pointed out that he himself capitalizes constants in his code because it's a convention in other languages. I guess this goes to show that there are at least some people who do that.
I did a quick glance through the coding standards but didn't find anything about it. This leads me to conclude that it's really up to you whether or not you capitalize them. I don't think anyone will slap you for it in the long run.
On the computational efficiency front you should know there is no such thing as a global constant in Clojure. What you have above is a var, and every time you reference it, it does a lookup. Even if you don't put earmuffs on it, vars can always be rebound, so the value could always change, so they are always looked up in a table. For performance critical loops this is most decidedly non-optimal.
There are some options, like putting a let block around your critical loops and let-binding the value of any "constant" vars so that they are not looked up each time. Or creating a no-arg macro so that the constant value is compiled into the code. Or you could create a Java class with a static member.
See this post, and the following discussion about constants for more info:
http://groups.google.com/group/clojure/msg/78abddaee41c1227
The earmuffs are a way of denoting that a given symbol will have its own thread-local binding at some point. As such, it does not make sense to apply the earmuffs to your Pi constant.
*clojure-version* is an example of a constant in Clojure, and it's entirely in lower-case.
Don't use a special notation for constants; everything is assumed to be a constant unless specified otherwise.
See http://dev.clojure.org/display/community/Library+Coding+Standards
Clojure has a variety of literals such as:
3.14159
:point
{:x 0
:y 1}
[1 2 3 4]
#{:a :b :c}
The literals are constant. As far as I know, there is no way to define new literals. If you want to use a new constant, you can effectively generate a literal in the code at compile-time:
(defmacro *PI* [] 3.14159265358979323)
(prn (*PI*))
In Common Lisp, there's a convention of naming constants with plus signs (+my-constant+), and in Scheme, by prefixing with a dollar sign ($my-constant); see this page. Any such convention conflicts with the official Clojure coding standards, linked in other answers, but maybe it would be reasonable to want to distinguish regular vars from those defined with the :const attribute.
I think there's an advantage to giving non-function variables of any kind some sort of distinguishing feature. Suppose that aside from variables defined to hold functions, you typically only use local names defined by function parameters, let, etc. If you nevertheless occasionally define a non-function variable using def, then when its name appears in a function definition in the same file, it looks to the eye like a local variable. If the function is complex, you may spend several seconds looking for the name definition within the function. Adding a distinguishing feature like earmuffs or plus signs or all uppercase, as appropriate to the variable's use, makes it obvious that the variable's definition is somewhere else.
In addition, there are good reasons to give special constants like pi a special name, so no one has to wonder whether pi means, say, "print-index", or the i-th pizza, or "preserved interface". Of course I think those variables should have more informative names, but lots of people use cryptic, short variable names, and I end up reading their code. I shouldn't have to wonder whether pi means pi, so something like PI might make sense. No one would think that's a run-of-the-mill variable in Clojure.
According to the "Practical Clojure" book, it should be named *pi*
I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"
I couldn't come up with an answer except that some numeric literals can have text in them (123456L, 123456U), and that wouldn't be possible if the compiler treated everything containing some alphabetic characters as a variable name.
Was that the right answer? Are there any more reasons?
string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
Because then a string of digits would be a valid identifier as well as a valid number.
int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";
Well think about this:
int 2d = 42;
double a = 2d;
What is a? 2.0? or 42?
Hint: if you don't get it, the d after a number means the number before it is a double literal.
It's a convention now, but it started out as a technical requirement.
In the old days, parsers of languages such as FORTRAN or BASIC did not require the uses of spaces. So, basically, the following are identical:
10 V1=100
20 PRINT V1
and
10V1=100
20PRINTV1
Now suppose that identifiers beginning with digits were allowed. How would you interpret this?
101V=100
as
10 1V = 100
or as
101 V = 100
or as
1 01V = 100
So, this was made illegal.
Because it lets the lexer avoid backtracking during lexical analysis. With a variable like:
Apple;
the compiler knows it's an identifier as soon as it meets the letter 'A'.
However, with a variable like:
123apple;
the compiler can't decide whether it's a number or an identifier until it hits the 'a', and it would need backtracking as a result.
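A minimal sketch of that idea - a toy lexer (all names hypothetical) that picks what to scan from the first character alone, which works precisely because identifiers may not begin with a digit:

#include <cctype>
#include <iostream>
#include <string>

enum class TokenKind { Number, Identifier };
struct Token { TokenKind kind; std::string text; };

// Scan one token starting at src[i]. A single look at the first character
// chooses the branch; no backtracking is ever needed.
Token lex_one(const std::string& src, std::size_t& i) {
    std::size_t start = i;
    if (std::isdigit(static_cast<unsigned char>(src[i]))) {
        while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
        return {TokenKind::Number, src.substr(start, i - start)};
    }
    while (i < src.size() &&
           (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
    return {TokenKind::Identifier, src.substr(start, i - start)};
}

int main() {
    std::string src = "apple 123 count2";
    for (std::size_t i = 0; i < src.size(); ) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) { ++i; continue; }
        Token t = lex_one(src, i);
        std::cout << (t.kind == TokenKind::Number ? "number: " : "identifier: ")
                  << t.text << '\n';
    }
}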
Compilers/parsers/lexical analyzers were a long, long time ago for me, but I think I remember there being difficulty in unambiguously determining whether a numeric character in the compilation unit represented a literal or an identifier.
Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.
This goes way back - before special notations to denote storage or numeric base.
I agree it would be handy to allow identifiers to begin with a digit. One or two people have mentioned that you can get around this restriction by prepending an underscore to your identifier, but that's really ugly.
I think part of the problem comes from number literals such as 0xdeadbeef, which make it hard to come up with easy-to-remember rules for identifiers that can start with a digit. One way to do it might be to allow anything matching [A-Za-z0-9_]+ that is NOT a keyword or a number literal. The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P.
When I was first learning C, I remember feeling the rules for variable names were arbitrary and restrictive. Worst of all, they were hard to remember, so I gave up trying to learn them. I just did what felt right, and it worked pretty well. Now that I've learned a lot more, it doesn't seem so bad, and I finally got around to learning it right.
It's likely a decision that came about for a few reasons. When you're parsing a token, you only have to look at the first character to determine whether it's an identifier or a literal and then send it to the correct function for processing, so that's a performance optimization.
The other option would be to check if it's not a literal and leave the domain of identifiers to be the universe minus the literals. But to do this you would have to examine every character of every token to know how to classify it.
There are also stylistic implications: identifiers are supposed to be mnemonics, so words are much easier to remember than numbers. When a lot of the original languages were being written, setting the styles for the next few decades, their designers weren't thinking about substituting "2" for "to".
Variable names cannot start with a digit, because that could cause some problems like the ones below:
int a = 2;
int 2 = 5;
int c = 2 * a;
What is the value of c? Is it 4, or is it 10?
Another example:
float 5 = 25;
float b = 5.5;
Is the first 5 in 5.5 a number, or is it an object being accessed with the . operator?
There is a similar problem with the second 5.
Maybe there are some other reasons too. So we shouldn't use a digit at the beginning of a variable name.
The restriction is arbitrary. Various Lisps permit symbol names to begin with numerals.
COBOL allows variables to begin with a digit.
Use of a digit to begin a variable name makes error checking during compilation or interpretation a lot more complicated.
Allowing variable names that begin like a number would probably cause huge problems for language designers. During source code parsing, whenever a compiler/interpreter encountered a token beginning with a digit where a variable name was expected, it would have to search through a huge, complicated set of rules to determine whether the token was really a variable or an error. The complexity added to the language parser may not justify this feature.
As far back as I can remember (about 40 years), I don't think that I have ever used a language that allowed use of a digit to begin variable names. I'm sure that this was done at least once. Maybe, someone here has actually seen this somewhere.
As several people have noticed, there is a lot of historical baggage about valid formats for variable names. And language designers are always influenced by what they know when they create new languages.
That said, pretty much all of the time, a language doesn't allow variable names to begin with numbers because those are the rules of the language design. Often it is because such a simple rule makes the parsing and lexing of the language vastly easier. Not all language designers know this is the real reason, though. Modern lexing tools help, because if you tried to define it as permissible, they would give you parsing conflicts.
OTOH, if your language has a uniquely identifiable character to herald variable names, it is possible to set it up for them to begin with a number. Similar rule variations can also be used to allow spaces in variable names. But the resulting language is likely not to resemble any popular conventional language very much, if at all.
For an example of a fairly simple HTML templating language that does permit variables to begin with numbers and have embedded spaces, look at Qompose.
Because if you allowed keywords and identifiers to begin with numeric characters, the lexer (part of the compiler) couldn't readily differentiate between the start of a numeric literal and a keyword without getting a whole lot more complicated (and slower).
C++ can't have it because the language designers made it a rule. If you were to create your own language, you could certainly allow it, but you would probably run into the same problems they did and decide not to allow it. Examples of variable names that would cause problems:
0x, 2d, 5555
One of the key problems about relaxing syntactic conventions is that it introduces cognitive dissonance into the coding process. How you think about your code could be deeply influenced by the lack of clarity this would introduce.
Wasn't it Dijkstra who said that the "most important aspect of any tool is its effect on its user"?
The compiler has 7 phases, as follows:
Lexical analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Symbol Table
Backtracking is avoided in the lexical analysis phase while compiling the piece of code. With a variable like Apple, the compiler will know it's an identifier right away when it meets the letter 'A' in the lexical analysis phase. However, with a variable like 123apple, the compiler won't be able to decide whether it's a number or an identifier until it hits the 'a', and it would need to backtrack within the lexical analysis phase to identify that it is a variable - which is not something the lexer supports.
When you're parsing a token you only have to look at the first character to determine whether it's an identifier or a literal and then send it to the correct function for processing. So that's a performance optimization.
Probably because it makes it easier for the human to tell whether it's a number or an identifier, and because of tradition. Having identifiers that could begin with a digit wouldn't complicate the lexical scans all that much.
Not all languages have forbidden identifiers beginning with a digit. In Forth, they could be numbers, and small integers were normally defined as Forth words (essentially identifiers), since it was faster to read "2" as a routine to push a 2 onto the stack than to recognize "2" as a number whose value was 2. (In processing input from the programmer or the disk block, the Forth system would split up the input according to spaces. It would try to look the token up in the dictionary to see if it was a defined word, and if not would attempt to translate it into a number, and if not would flag an error.)
Suppose you did allow symbol names to begin with numbers. Now suppose you want to name a variable 12345foobar. How would you differentiate this from 12345? It's actually not terribly difficult to do with a regular expression. The problem is one of performance. I can't really explain this in great detail, but it essentially boils down to the fact that differentiating 12345foobar from 12345 requires backtracking, which makes the regular expression non-deterministic.
There's a much better explanation of this here.
It is easier for a compiler to identify a variable by its ASCII characters in memory than by a number.
I think the simple answer is that it can; the restriction is language-based. In C++ and many others it can't, because the language doesn't support it. It's not built into the rules to allow that.
The question is akin to asking why can't the King move four spaces at a time in Chess? It's because in Chess that is an illegal move. Can it in another game sure. It just depends on the rules being played by.
Originally it was simply because variable names as strings are easier to remember (you can give them more meaning) than numbers, although numbers can be included within the string to enhance its meaning, or to allow the same variable name to be used with a separate but closely related meaning or context. For example, loop1, loop2, etc. would always let you know that you were in a loop and/or that loop2 was a loop within loop1.
Which would you prefer (has more meaning) as a variable: address or 1121298? Which is easier to remember?
However, if the language uses something to denote that it is not just text or numbers (such as the $ in $address), it really shouldn't make a difference, as that would tell the compiler that what follows is to be treated as a variable (in this case).
In any case it comes down to what the language designers want to use as the rules for their language.
The variable might also be treated as a value by the compiler at compile time, so the value could end up referring to the value again and again, recursively.
There might be nothing wrong with it when it comes to declaring a variable, but there is some ambiguity when you try to use that variable somewhere else, like this:
let 1 = "Hello world!"
print(1)
print(1)
print is a generic function that accepts all types of values, so in that situation the compiler does not know which 1 the programmer refers to: the 1 with the integer value, or the 1 that stores a string value.
Maybe it would be better for the compiler in this situation to allow defining something like that, but then, when the ambiguous name is actually used, to report an error together with a suggested correction for how to fix it and clear up the ambiguity.