Pithy summary for comonad (Where a monad is a 'type for impure computation')

In terms of pithy summaries - this description of Monads seems to win - describing them as a 'type for impure computation'.
What is an equivalent pithy (one-sentence) description of a comonad?

"A type for context-dependent computation"
Alternatively, a better "pithy description" for monads might be a 'type for output impurity', in which case the pithy description for comonads is a 'type for input impurity'.
(If you are interested in comonads, some more introduction is given in some talks slides of mine: http://www.cl.cam.ac.uk/~dao29/talks/comonads-and-codo-talk-dorchard-2011.pdf)
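To make "context-dependent computation" concrete, here is a minimal sketch of the stream comonad in Haskell (my own illustration, not taken from the slides): each position in an infinite stream is a context, extract reads the value at the current context, and extend runs a context-dependent computation at every position.

    data Stream a = Cons a (Stream a)

    instance Functor Stream where
      fmap f (Cons x xs) = Cons (f x) (fmap f xs)

    -- extract reads the value at the current position (the "context").
    extract :: Stream a -> a
    extract (Cons x _) = x

    -- duplicate produces the stream of all suffixes: a value at every context.
    duplicate :: Stream a -> Stream (Stream a)
    duplicate s@(Cons _ xs) = Cons s (duplicate xs)

    -- extend runs a context-dependent computation at every position.
    extend :: (Stream a -> b) -> Stream a -> Stream b
    extend f = fmap f . duplicate

    -- Example of a context-dependent computation: average the current
    -- element with its successor.
    smooth :: Stream Double -> Double
    smooth (Cons x (Cons y _)) = (x + y) / 2

    takeS :: Int -> Stream a -> [a]
    takeS n (Cons x xs) | n <= 0    = []
                        | otherwise = x : takeS (n - 1) xs

    nats :: Stream Double
    nats = go 0 where go n = Cons n (go (n + 1))

    main :: IO ()
    main = print (takeS 5 (extend smooth nats))  -- [0.5,1.5,2.5,3.5,4.5]

Note that smooth reads from its context (the current element and its neighbour) but writes nothing back, which is exactly the "input impurity" reading above.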


Why is it the case that in "the technical sciences" propositional logic it is accepted that if P = FALSE, then "if P then Q" = TRUE?

P = False
Q = True
Every calculator I use states: "if P then Q" = True. This is commented on in the sources I can find: http://www.math.hawaii.edu/~ramsey/Logic/IfThen.html
Most of the question is in the title. I'm wondering not why this is the case, exactly, but why it is accepted that this is the case. What is the underlying decision that resulted in this being considered the norm?
First, the logical statement is true in classical propositional logic. But there are many logics that treat the logical connectives differently. Please note that in the "technical sciences", such as Computer Science, many non-classical logics are applied. Famous examples are Prolog, Stable Model Semantics, and Answer Set Programming. In such logics, statements of the form "if p, then q" are completed, in the sense that the "if then" is translated into an "if and only if". This means that "if p then q" is true if and only if both are false or both are true.
Consequently, we use logics with different properties in different contexts. Classical logic is often appropriate when talking about mathematical statements; please note that interactive theorem provers, such as Coq, that are used to prove theorems, do not rely on classical logic. Classical as well as other logics are also studied in "non-technical fields" such as Philosophy and Psychology.
To make the answer short: Classical logic goes back to Aristotle, and is a reasonable understanding of "if then". If it did not exist, we would invent a new logical connective, or we would just use negation and disjunction. However, this definition of implication is criticized, as it does not meet the expectation of non-logically trained humans (see my paper).
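For the classical reading, implication is definable from negation and disjunction alone; here is a minimal Haskell sketch (my own names, purely illustrative) that prints the full truth table:

    import Control.Monad (forM_)

    -- Classical material implication, defined via negation and disjunction:
    -- "if p then q" is read as "(not p) or q".
    implies :: Bool -> Bool -> Bool
    implies p q = not p || q

    main :: IO ()
    main = forM_ [(p, q) | p <- [False, True], q <- [False, True]] $ \(p, q) ->
      putStrLn (show p ++ " -> " ++ show q ++ " = " ++ show (implies p q))

Both rows with p = False come out True, which is exactly the behaviour the question asks about.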
In answer to this question, I found that I was essentially guilty of projecting meaning onto a system that made no claims to be modelling reality. Remember, this is an arbitrary model in classical logic. If it is defined that this is how this logic works, that is how it works. It needs no further justification than that.
The further discussion is 'does this model reality well?'. This is apparently a recognized field of debate. To further explore the debate around the use of this form of classical logic, I would point you towards Wittgenstein's "Meaning is Use" sections in his "Philosophical Investigations" book and the arguments presented in "Wittgenstein on Meaning".

Is it correct to say that there is no implied ordering in the presentation of grammar options in the C++ Standard?

I'll try to explain my question with an example. Consider the following grammar production in the C++ standard:
literal:
   integer-literal
   character-literal
   floating-point-literal
   string-literal
   boolean-literal
   pointer-literal
   user-defined-literal
I always thought that once the parser identifies a literal as an integer-literal, it would just stop there. But I was told that this is not true: the parser will continue parsing, to verify whether the literal could also be matched as a user-defined-literal, for example.
Is this correct?
Edit
I decided to include this edit as my interpretation of the Standard, in response to @rici's excellent answer below, although with a result that is the opposite of the one advocated by the OP.
One can read the following in [stmt.ambig]/1 and /3 (emphases are mine):
[stmt.ambig]/1
There is an ambiguity in the grammar involving
expression-statements and declarations: An expression-statement with a
function-style explicit type conversion as its leftmost subexpression
can be indistinguishable from a declaration where the first declarator
starts with a (. In those cases the statement is a declaration.
That is, this paragraph states how ambiguities in the grammar should be treated. There are several other ambiguities mentioned in the C++ Standard, but only three that I know of are ambiguities related to the grammar: [stmt.ambig], [dcl.ambig.res]/1 (a direct consequence of [stmt.ambig]), and [expr.unary.op]/10, which explicitly uses the term ambiguity in the grammar.
[stmt.ambig]/3:
The disambiguation is purely syntactic; that is, the meaning of the
names occurring in such a statement, beyond whether they are
type-names or not, is not generally used in or changed by the
disambiguation. Class templates are instantiated as necessary to
determine if a qualified name is a type-name. Disambiguation
precedes parsing, and a statement disambiguated as a declaration may
be an ill-formed declaration. If, during parsing, a name in a template
parameter is bound differently than it would be bound during a trial
parse, the program is ill-formed. No diagnostic is required. [ Note:
This can occur only when the name is declared earlier in the
declaration. — end note ]
Well, if disambiguation precedes parsing, there is nothing that could prevent a decent compiler from optimizing parsing by simply treating the alternatives present in each grammar production as ordered. With that in mind, the first sentence in [lex.ext]/1 below could be eliminated.
[lex.ext]/1:
If a token matches both user-defined-literal and another literal kind,
it is treated as the latter. [ Example: 123_km is a
user-defined-literal, but 12LL is an integer-literal. — end example ]
The syntactic non-terminal preceding the ud-suffix in a
user-defined-literal is taken to be the longest sequence of characters
that could match that non-terminal.
Note also that this paragraph doesn't mention ambiguity in the grammar, which, for me at least, is an indication that the ambiguity doesn't exist.
There is no implicit ordering of productions in the C++ presentation grammar.
There are ambiguities in that grammar, which are dealt with on a case-by-case basis by text in the standard. Note that the text of the standard is normative; the grammar does not stand alone, and it does not override the text. The two need to be read together.
The standard itself points out that the grammar as summarized in Appendix A:
… is not an exact statement of the language. In particular, the grammar described here accepts a superset of valid C++ constructs. Disambiguation rules (8.9, 9.2, 11.8) must be applied to distinguish expressions from declarations. Further, access control, ambiguity, and type rules must be used to weed out syntactically valid but meaningless constructs. (Appendix A, paragraph 1)
That's not a complete list of the ambiguities resolved in the text of the standard, because there are also rules about lexical ambiguities. (See below.)
Almost all of these ambiguity resolution clauses are of the form "if both P and Q apply, choose Q", and thus would be unnecessary were there an implicit ordering of grammar alternatives, since the correct parse could be guaranteed simply by putting the alternatives in the correct order. So the fact that the standard feels the need to dedicate a number of clauses to ambiguity resolution is prima facie evidence that alternatives are not implicitly ordered. [Note 1]
The C++ standard does not explicitly name the grammar formalism being used, but it does credit the antecedents, which allows us to construct a historical argument. The formalism used by the C++ standard was inherited from the C standard and the description in Kernighan & Ritchie's original book on the (then newly-minted) C language. K&R wrote their grammar using the Yacc parser generator, and the original C grammar is basically a Yacc grammar file. Yacc uses the LALR(1) algorithm to construct a parser from a context-free grammar (CFG), and its grammar files are a concrete representation of that grammar written in what has come to be known as BNF (although there is some historical ambiguity about what the letters in BNF actually stand for). BNF does not have any implicit ordering of rules, and the formalism does not allow any way to write an explicit ordering or any other disambiguation rule. (A grammar must be unambiguous for this approach to work at all; if it is ambiguous, the LALR(1) construction will report conflicts and fail to generate a parser.)
Yacc does go a bit outside of the box. It has some automatic disambiguation rules, and one mechanism to provide explicit disambiguation (operator precedence). But Yacc's disambiguation has nothing to do with the ordering of alternatives either.
In short, ordered alternatives were not really a feature of any grammar formalism until 2002 when Bryan Ford proposed packrat parsing, and subsequently formalised a class of grammars which he called "Parsing Expression Grammars" (PEGs). The PEG algorithm does implicitly order alternatives, by insisting that the right-hand alternative in an alternation only be attempted if the left-hand alternative failed to match. For this reason, the PEG alternation operator (or "ordered alternation" operator) is generally written as / instead of |, avoiding confusion with the traditional unordered alternation syntax.
A key feature of the PEG algorithm is that it is always deterministic. Every PEG grammar can be deterministically applied to a source text without ambiguity. (That doesn't mean that the grammar will give you the parse you wanted, of course. It just means that it will never give you a list of parses and let you select the one you want.) So grammars written in PEG cannot be accompanied by textual rules which disambiguate, because there are no ambiguities.
I mention this because the existence and popularity of PEG have to some extent altered the perception of the meaning of the alternation operator. Before PEG, we probably wouldn't be having this kind of discussion at all. But using PEG as a guide to interpreting the C++ grammar formalism is ahistoric and unjustifiable; the roots of the C++ grammar go back to at least 1978, at least a quarter of a century before PEG.
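To see concretely what ordered alternation means, here is a tiny sketch of PEG-style choice as Haskell parser combinators (toy code of my own, not any particular library's API):

    -- A toy parser: consume a prefix of the input or fail.
    newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

    -- PEG's '/' operator: try the left alternative first; the right
    -- alternative is attempted only if the left one fails.
    (</>) :: Parser a -> Parser a -> Parser a
    p </> q = Parser $ \s -> case runParser p s of
      Just r  -> Just r          -- left succeeded: the right is ignored
      Nothing -> runParser q s

    -- Match a literal string.
    lit :: String -> Parser String
    lit t = Parser $ \s ->
      if t == take (length t) s then Just (t, drop (length t) s) else Nothing

    main :: IO ()
    main = do
      print (runParser (lit "ab" </> lit "a") "abc")  -- Just ("ab","c")
      print (runParser (lit "a" </> lit "ab") "abc")  -- Just ("a","bc")

Swapping the alternatives changes the result, which is the sense in which ordered choice is not commutative, unlike the | of a CFG.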
Lexical ambiguities, and the clauses which resolve them
[lex.pptoken] (§5.4) paragraph 3 lays down the fundamental rules for token recognition, which are a little more complicated than the traditional "maximal munch" principle of always recognising the longest possible token starting immediately after the previously recognised token (plain maximal munch is sketched in code after the two exceptions below). It includes two exceptions:
The sequence <:: is treated as starting with the token < rather than the longer token <: unless it is the start of <::> (treated as <:, :>) or <::: (treated as <:, ::). That might all make more sense if you mentally replace <: with [ and :> with ], which is the intended syntactic equivalence.
A raw string literal is terminated by the first matching delimiter sequence. This rule could in theory be written in a context-free grammar only because there is an explicit limit on the length of termination sequences, which means that the theoretical CFG would have on the order of 88^16 rules, one for each possible delimiter sequence. In practice, this rule cannot be written as such, and it is described textually, along with the 16-character limit on the length of the d-char-sequence.
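Setting those two exceptions aside, plain maximal munch itself is easy to sketch; here is a toy Haskell version over a made-up token set (illustration only, nothing like a real C++ lexer):

    import Data.List (isPrefixOf, sortOn)
    import Data.Ord (Down(..))

    -- A made-up token set: under maximal munch, "+++=" lexes as
    -- ["++", "+="], never as ["+", "+", "+="].
    toks :: [String]
    toks = ["+", "++", "+="]

    lexAll :: String -> [String]
    lexAll [] = []
    lexAll s = case [t | t <- sortOn (Down . length) toks, t `isPrefixOf` s] of
      (t:_) -> t : lexAll (drop (length t) s)   -- longest match wins
      []    -> error ("no token matches: " ++ s)

    main :: IO ()
    main = print (lexAll "+++=")  -- ["++","+="]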
[lex.header] (§5.8) avoids the ambiguity between header-names and string-literals (as well as certain token sequences starting with <) by requiring header-name to only be recognised in certain contexts, including an #include preprocessing directive. (The section does not actually say that the string-literal should not be recognised, but I think that the implication is clear.)
[lex.ext] (§5.13.8) paragraph 1 resolves the ambiguities involved with user-defined-literals, by requiring that:
the user-defined-literal rule is only recognised if the token cannot be recognised as some other kind of literal, and
the decomposition of the user-defined-literal into a literal followed by a ud-suffix follows the longest-token rule, described above.
Note that this rule is not really a tokenisation rule, because it is applied after the source text has been divided into tokens. Tokenisation is done in translation phase 3, after which the tokens are passed through the preprocessing directives (phase 4), rewriting of escape sequences and UCNs (phase 5), and concatenation of string literals (phase 6). Each token which emerges from phase 6 must then be reinterpreted as a token in the syntactic grammar, and it is at that point that literal tokens will be classified. So it's not necessary that §5.13.8 clarify what the extent of the token being categorised is; the extent is already known and the converted token must use exactly all of the characters in the preprocessing token. Thus it's quite different from the other ambiguities in this list, but I left it here because it is so present in the original question and in the various comment threads.
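To make the first of those two [lex.ext] rules concrete, here is a toy Haskell sketch (my own predicates; real literal lexing is far more involved): a token is classified as a user-defined-literal only if no other literal kind matches.

    import Data.Char (isDigit)

    data LitKind = IntegerLiteral | UserDefinedLiteral deriving Show

    -- Toy predicate: digits followed by one of the standard integer
    -- suffixes. (The real rules, with bases, digit separators and all
    -- suffix spellings, are much richer.)
    isIntegerLiteral :: String -> Bool
    isIntegerLiteral s =
      let (ds, suf) = span isDigit s
      in  not (null ds) && suf `elem` ["", "L", "LL", "U", "UL", "ULL"]

    classify :: String -> LitKind
    classify s
      | isIntegerLiteral s = IntegerLiteral   -- the other literal kind wins
      | otherwise          = UserDefinedLiteral

    main :: IO ()
    main = mapM_ (print . classify) ["12LL", "123_km"]
    -- IntegerLiteral, then UserDefinedLiteral, matching [lex.ext]/1's example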
Notes:
Curiously, in almost all of the ambiguity resolution clauses, the preferred alternative is the one which appears later in the list of alternatives. For example, §8.9 explicitly prefers declarations to expressions, but the grammar for statement lists expression-statement long before declaration-statement. Having said that, correctly parsing C++ requires a more sophisticated algorithm than just "try to parse a declaration and if that fails, then try to parse as an expression," because there are programs which must be parsed as a declaration with a syntax error (see the example at [stmt.ambig]/3).
No ordering is either implied or necessary.
All seven kinds of literal are distinct. No token that meets the definition of any of them can meet the definition of any other. For example, 42 is an integer-literal and cannot be a floating-point-literal.
How a compiler determines what a token is is an implementation detail that the standard doesn't address, and doesn't need to.
If there were an ambiguity, so that for example the same token could be either an integer-literal or a user-defined-literal, either the language would have to have a rule to disambiguate it, or it would be a bug in the grammar.
UPDATE: There is in fact such an ambiguity. As discussed in comments, 42ULL satisfies the syntax of either an integer-literal or a user-defined-literal. This ambiguity is resolved, not by the ordering of the grammar productions, but by an explicit statement:
If a token matches both user-defined-literal and another literal kind, it is treated as the latter.
The section on syntactic notation in the standard only says this about what it means:
In the syntax notation used in this document, syntactic categories are indicated by italic type, and literal words and characters in constant width type. Alternatives are listed on separate lines except in a few cases where a long set of alternatives is marked by the phrase “one of”. If the text of an alternative is too long to fit on a line, the text is continued on subsequent lines indented from the first one. An optional terminal or non-terminal symbol is indicated by the subscript “opt”, so
{ expression_opt }
indicates an optional expression enclosed in braces.
Note that the statement considers the terms in grammars to be "alternatives", rather than a list or even an ordered list. There is no statement about ordering of the "alternatives" at all.
As such, this strongly suggests that there is no ordering at all.
Indeed, the presence throughout the standard of specific rules to disambiguate cases where multiple terms match also suggests that the alternatives are not written as a prioritized list. If the alternatives were some kind of ordered list, this statement would be redundant:
If a token matches both user-defined-literal and another literal kind, it is treated as the latter.

Pithy summary for codata (Where a comonad is a 'type for input impurity')

In terms of pithy summaries - this description of Comonads seems to win - describing them as a 'type for input impurity'.
What is an equivalent pithy (one-sentence) description for codata?
"Codata are types inhabited by values that may be infinite"
This contrasts with "data" which is inhabited only by finite values. For example, if we take the "data" definition of lists, it is inhabited by lists of finite length (as in ML), but if we take the "codata" definition it is inhabited also by infinite length lists (as in Haskell, e.g. x = 1 : x).
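A quick Haskell illustration of the contrast (the helper name is mine):

    -- Haskell lists behave like codata: an infinite value is a perfectly
    -- good inhabitant of the type, as long as consumers only demand a
    -- finite prefix of it.
    ones :: [Int]
    ones = 1 : ones             -- the x = 1 : x example from above

    main :: IO ()
    main = print (take 5 ones)  -- [1,1,1,1,1]; length ones would diverge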
Comonads and codata are not necessarily related (although perhaps some might think so due to Kieburtz' paper Comonads and codata in Haskell).

Why do miniKanren names always end with `o`?

All miniKanren relations end with the letter o. What is the motivation for this?
I see that the Clojure core.logic library also does this.
In the Preface of The Reasoned Schemer, they explain it thus:
A relation, a function that returns a goal as its value, ends its name with a superscript 'o' (e.g., caro and nullo).
So, it's a notation to denote a relation.
It's because the authors of The Reasoned Schemer wanted the notation of miniKanren relations to be evocative of ordinary Scheme predicates which end in ? (e.g., null?, pair?) by convention. The superscript o, if you squint enough (and have heard this story before), looks like a modified ?.

What are the differences between PEGs and CFGs?

From this Wikipedia page:
The fundamental difference between
context-free grammars and parsing
expression grammars is that the PEG's
choice operator is ordered. If the
first alternative succeeds, the second
alternative is ignored. Thus ordered
choice is not commutative, unlike
unordered choice as in context-free
grammars and regular expressions.
Ordered choice is analogous to soft
cut operators available in some logic
programming languages.
Why does PEG's choice operator short-circuit the matching? Is it to minimize memory usage (due to memoization)?
I'm not sure what the choice operator is in regular expressions, but let's suppose it is this: /[aeiou]/ to match a vowel. So this regex is commutative because I could have written it in any of the 5! (five factorial) permutations of the vowel characters? I.e., /[aeiou]/ behaves the same as /[eiaou]/. What is the advantage of it being commutative? (cf. PEG's non-commutativity)
The consequence is that if a CFG is
transliterated directly to a PEG, any
ambiguity in the former is resolved by
deterministically picking one parse
tree from the possible parses. By
carefully choosing the order in which
the grammar alternatives are
specified, a programmer has a great
deal of control over which parse tree
is selected.
Is this saying that PEG grammars are superior to CFGs?
A CFG grammar is non-deterministic, meaning that some input could result in two or more possible parse trees, though most CFG-based parser generators place restrictions on the determinability of the grammar and will give a warning or error if there are two or more choices.
A PEG grammar is deterministic, meaning that any input can only be parsed one way.
To take a classic example: the grammar
if_statement := "if" "(" expr ")" statement "else" statement
| "if" "(" expr ")" statement;
applied to the input
if (x1) if (x2) y1 else y2
could either be parsed as
if_statement(x1, if_statement(x2, y1, y2))
or
if_statement(x1, if_statement(x2, y1), y2)
A CFG-based parser generator would report a shift/reduce conflict, since it can't decide whether it should shift (read another token) or reduce (complete the node) when reaching the "else" keyword. Of course, there are ways to get around this problem.
A PEG-parser would always pick the first choice.
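To illustrate, here is a self-contained Haskell sketch (toy recursive descent of my own, not a PEG library) of the grammar above, with the long if/else form tried first; ordered choice makes the else bind to the innermost if, i.e. the first parse shown above:

    data Stmt = If String Stmt (Maybe Stmt) | Atom String deriving Show

    type Toks = [String]

    -- Ordered choice: the first alternative that succeeds wins.
    orElse :: Maybe a -> Maybe a -> Maybe a
    orElse (Just r) _ = Just r
    orElse Nothing  q = q

    stmt :: Toks -> Maybe (Stmt, Toks)
    stmt ts = ifElse ts `orElse` ifOnly ts `orElse` atom ts

    ifElse, ifOnly, atom :: Toks -> Maybe (Stmt, Toks)
    ifElse ("if" : "(" : c : ")" : rest) = do
      (th, rest1) <- stmt rest
      case rest1 of
        ("else" : rest2) -> do
          (el, rest3) <- stmt rest2
          Just (If c th (Just el), rest3)
        _ -> Nothing                -- no else left: this alternative fails
    ifElse _ = Nothing

    ifOnly ("if" : "(" : c : ")" : rest) = do
      (th, rest1) <- stmt rest
      Just (If c th Nothing, rest1)
    ifOnly _ = Nothing

    atom (t : rest) | t `notElem` ["if", "(", ")", "else"] = Just (Atom t, rest)
    atom _ = Nothing

    main :: IO ()
    main = print (stmt (words "if ( x1 ) if ( x2 ) y1 else y2"))
    -- Just (If "x1" (If "x2" (Atom "y1") (Just (Atom "y2"))) Nothing,[])

Note how the inner stmt consumes the else greedily, so the outer ifElse alternative fails and ifOnly succeeds: the else ends up attached to the inner if.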
Which one is better is for you to decide. My opinion is that PEG grammars are often easier to write, and CFG grammars easier to analyze.
I think you're confusing CFG with LR and with ambiguity. Grammars are not deterministic/nondeterministic, though their parsers may be. An ambiguous grammar is still a CFG if it complies with the definition, and a deterministic parser that does what a PEG does can be built for it.
PEGs and CFGs are two different ways of specifying a language. If you write a parser by hand, chances are very good that you will write a so-called recursive descent parser. A recursive descent parser will automatically resolve any ambiguities in your grammar, but does so silently and likely not in the way you would have wanted. The problem with this is that you never find out that there were ambiguities that got automatically resolved, unless you thoroughly test your parser. PEGs are basically a formalization of recursive descent parsers, and so come with this problem. For examples of this problem see How does backtracking affect the language recognized by a parser?, and https://cs.stackexchange.com/questions/143480/dragon-book-4-4-5-exercise/143975.
CFGs have a lot of theory to back them up, but PEGs not so much. The sets of languages that can be encoded by CFGs and by PEGs partially overlap, but neither encompasses the other.
For a more thorough review of this I recommend the excellent essay Which Parsing Approach?