Clojure printing functions: pr vs print

What is the difference between pr/prn and print/println?
When would one be used over the other?

They differ in the following ways.
print/println produce output intended for human consumption
pr/prn produce output that may be read by the reader
So use the former functions when producing output for humans, and the latter when producing output for other Clojure programs to consume.
In the case of pr/prn, strings are quoted and special characters inside them are escaped. Characters outside strings are printed in reader syntax too, so \! prints as \! rather than !.
For example:
=> (println "Hello\nworld" \!)
Hello
world !
=> (prn "Hello\nworld" \!)
"Hello\nworld" \!
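Many languages draw the same line between "for humans" and "for the reader". As a rough analogue (an assumption, not Clojure's implementation), here is a Python sketch of two hypothetical helpers, `pr_str` and `print_str`. The first quotes a string and escapes the characters Clojure's printer escapes inside string literals. The second returns the text unchanged, as print would.

```python
# Hypothetical helpers mimicking Clojure's pr and print, for strings only.
# The table mirrors the characters Clojure escapes inside string literals:
# newline, tab, return, formfeed, backspace, double quote and backslash.
CHAR_ESCAPES = {
    "\n": "\\n",
    "\t": "\\t",
    "\r": "\\r",
    "\f": "\\f",
    "\b": "\\b",
    '"': '\\"',
    "\\": "\\\\",
}


def pr_str(s: str) -> str:
    """Reader-friendly form: wrap in double quotes and escape special characters."""
    return '"' + "".join(CHAR_ESCAPES.get(c, c) for c in s) + '"'


def print_str(s: str) -> str:
    """Human-friendly form: the raw characters, with no quotes and no escapes."""
    return s
```

Printing `print_str("Hello\nworld")` shows two lines, while `pr_str("Hello\nworld")` yields the single line `"Hello\nworld"`, which a reader can turn back into the original string. Python makes the same split with `str` and `repr`, although its `repr` prefers single quotes.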

Related

Parsing quotes within a string literal

Why do strings in almost all languages require that you escape quotation marks?
For instance, if you have a string such as
"hello world""
why do languages want you to write it as
"hello world\""
Isn't it enough to require that the string starts and ends with a quotation mark?
You could treat the last quote as the terminating quote for the string, and report an error if there is no end quote. You can also assume that a string starts and ends on a single line and does not span multiple lines.
Suppose I want to put ", " into a string literal (so the literal contains quotes).
If I did that without escaping, I’d write "", "". This looks like two empty string literals separated by a comma. If I want to, for example, call a function with this string literal, I would write f("", ""). This looks to the compiler like I am passing two arguments, both empty strings. How can it know the difference?
The answer is, it can’t. Perhaps in simple cases like "hello world"", it might be able to figure it out, for at least some languages. But the set of strings which were unambiguous and didn’t need escaping would be different for different languages and it would be hard to keep track of which was which, and for any language there would be some ambiguous case which would need escaping anyway. It is much easier for the compiler writer to skip all those edge cases and just always require you to escape quotation marks, and it is probably also easier for the programmer.
Otherwise, the compiler would see the second quotation mark as the end of your string, followed by a stray quotation mark, causing an error.
"The use of the word "escape" really means to temporarily escape out of parsing the text and into a another mode where the subsequent character is treated differently." Source: https://softwareengineering.stackexchange.com/questions/112731/what-does-backslash-escape-character-really-escape
How would the compiler know which quote ended the string?
UPDATE:
In C and C++, this is perfectly fine, because adjacent string literals are concatenated:
printf("Hel" "lo" "," "Wor""ld" "!");
It prints Hello,World! (with no space, because none of the literals contains one).
Or how about this in C#:
Console.WriteLine("Hello, "+"World!");
Should that print Hello, World! or Hello, "+"World!?
The reason you have to escape the second quotation mark is so that the compiler knows the quotation mark is part of the string and not a terminator. If you didn't escape it, the compiler would only pick up hello world rather than hello world"
Let's look at a practical example.
How should this be translated?
"Hello"+"World"
'HelloWorld' or 'Hello"+"World'
vs
"Hello\"+\"World"
By escaping the quote characters, you remove the ambiguity, and code should be completely unambiguous to the compiler: every compiler should interpret the same source the same way. Escaping is basically a way of telling the compiler "I know this looks weird, but I really mean that this is how it should look".
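The argument above comes down to how a tokenizer finds the end of a literal. A minimal Python sketch, assuming C-like rules (double quotes, a small set of backslash escapes, no multi-line literals), shows why escaping makes the end unambiguous: the scanner always stops at the first unescaped quote. The function name `read_string_literal` and its escape table are illustrative assumptions, not any particular compiler's code.

```python
from typing import Tuple

# Escape sequences this sketch understands; anything else is rejected.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def read_string_literal(src: str, start: int = 0) -> Tuple[str, int]:
    """Read one double-quoted literal starting at src[start].

    Returns (decoded value, index just past the closing quote).
    """
    if start >= len(src) or src[start] != '"':
        raise ValueError("expected an opening quote")
    chars = []
    i = start + 1
    while i < len(src):
        c = src[i]
        if c == "\\":
            # A backslash switches into "escape mode" for exactly one character.
            if i + 1 >= len(src) or src[i + 1] not in ESCAPES:
                raise ValueError(f"bad escape at index {i}")
            chars.append(ESCAPES[src[i + 1]])
            i += 2
        elif c == '"':
            # The first unescaped quote always ends the literal.
            return "".join(chars), i + 1
        elif c == "\n":
            raise ValueError("string literal may not span lines")
        else:
            chars.append(c)
            i += 1
    raise ValueError("unterminated string literal")
```

With this rule `"hello world\""` decodes to `hello world"`, while `"hello world""` ends after `world` and leaves a stray quote behind. Scanning `"", ""` yields an empty literal ending at index 2, so the `, ""` that follows is a separate token, which is exactly the `f("", "")` ambiguity described above.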

Is this because Clojure is limited by the JVM, so this code can't be evaluated?

Code that includes something like '(1+2) in Clojure causes a java.lang.RuntimeException with the error message "Unmatched delimiter: )".
But in every other Lisp dialect I've used, such as Emacs Lisp or Racket, '(1+2) just returns a list, which is what I'd expect: with the special form quote, nothing in the list should be evaluated.
So I wonder: is this a limitation of the JVM that keeps this code from behaving the way it does in other dialects? Is it a bug in Clojure? Or is there something different about the definition of quote in Clojure compared with other Lisp dialects?
These are artifacts of the way tokenizers work in different languages. In Clojure, if a token starts with a digit, it is consumed until the next reader macro character (which includes parentheses, among other things), whitespace, or end of file (commas count as whitespace). What's consumed must be a valid number, which includes integer, float and rational. So when you feed '(1+2) to the reader, it consumes 1+2 as one token, which then fails to match the integer, float and rational number patterns. After that, the reader tries to recover, which resets its state, and in that state the remaining ) is unmatched. Nothing is evaluated at all: the failure happens while reading, before quote ever gets a chance to apply.
Try entering '(1 + 2) instead (note the spaces around +) and you will see exactly what you expect.
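To make the tokenizing rule concrete, here is a simplified Python sketch of a reader-style tokenizer. The delimiter set and the number pattern are assumptions that only approximate Clojure's reader (no radix or hex literals, no `N`/`M` suffixes), and `tokenize` is a hypothetical name.

```python
import re
from typing import List

# Simplified stand-ins for Clojure's integer, float and ratio syntax.
NUMBER_RE = re.compile(r"[-+]?(?:\d+(?:\.\d*)?(?:[eE][-+]?\d+)?|\d+/\d+)")
# Characters that end a token and form tokens of their own (simplified set).
DELIMITERS = set("()[]{}'")


def is_separator(c: str) -> bool:
    # Commas count as whitespace, as in Clojure.
    return c.isspace() or c == ","


def tokenize(src: str) -> List[str]:
    tokens = []
    i = 0
    while i < len(src):
        c = src[i]
        if is_separator(c):
            i += 1
        elif c in DELIMITERS:
            tokens.append(c)
            i += 1
        else:
            # Consume everything up to the next separator or delimiter.
            j = i
            while j < len(src) and not is_separator(src[j]) and src[j] not in DELIMITERS:
                j += 1
            token = src[i:j]
            # A token that starts with a digit must be a valid number as a whole.
            if token[0].isdigit() and not NUMBER_RE.fullmatch(token):
                raise ValueError(f"Invalid number: {token}")
            tokens.append(token)
            i = j
    return tokens
```

`tokenize("'(1 + 2)")` produces the six tokens you would expect, whereas `tokenize("'(1+2)")` raises `ValueError("Invalid number: 1+2")` because `+` is not a delimiter, so `1+2` is swallowed as a single token. In the real REPL that failure is followed by the "Unmatched delimiter" error for the leftover `)`.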

What is the purpose of * in Fortran input/output

I am learning Fortran because, well, yeah, I thought I'd learn it. However, I've found utterly no information on the purpose of * in print, read, etc.:
program main
print *, "Hello, world!"
end program main
What is the purpose of this *? I've done some research, but I don't think I understand it properly, and I don't want to carry on learning until I actually know the point of *.
From what I've managed to find, I think it's some sort of format specifier, but I don't understand what that means and I have no idea whether I'm even correct. All the tutorials I've found just tell me what to write to print to the console, not what * actually means.
You are correct in that it is a format specifier.
There's a page on Wikibooks about I/O, and it states:
The second * specifies the format the user wants the number read with
when talking about the read statement. This applies to the write and print statements too.
For fixed-point reals: Fw.d, where w is the total number of characters allotted for the number and d is the number of decimal places.
And the example they give is
REAL :: A
READ(*,'(F5.2)') A
which reads a real number from a field five characters wide with two digits after the decimal point (for example 12.34). So if you were printing a real number you'd use:
PRINT '(F5.2)', A
to get the rounded output.
In your example you're just printing text, so there's no special formatting to do. Also, if you leave the format specifier as *, default formatting is applied to reals and other types.
The print statement does formatted output to a special place (which we assume is the screen). There are three available ways to specify the format used in this output. One of these, as in the question, is using *.
The * format specifier means that the formatted output will be so-called list-directed output. Likewise, for a read statement, it will be list-directed input. Now that the term has been introduced you will be able to find many other questions here: list-directed input/output has many potentially surprising consequences.
Generally, a format specification item says how to format an item in the input/output list. If we are writing out a real number we'd use a format item which may, say, state the precision of the output or whether it uses scientific notation.
As in the question, when writing out a character variable we may want to specify the width of the output field (so that there is blank padding, or partial output) or not. We could have
print '(A20)', "Hello, world!"
print '(A5)', "Hello, world!"
print '(A)', "Hello, world!"
which offers a selection of effects. [Using a literal character constant like '(A)' is one of the other ways of giving the format.]
But we couldn't have
print '(F12.5)', "Hello, world!" ! A numeric format for a character variable
as our format doesn't make sense with the output item.
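As a rough illustration of what the Fw.d and Aw edit descriptors do on output, here is a Python sketch that emulates them. The helper names `edit_f` and `edit_a` are hypothetical. Details such as rounding, and whether 0.5 prints as `0.50` or `.50`, are processor-dependent in Fortran; this sketch always prints the leading zero and uses Python's rounding.

```python
from typing import Optional


def edit_f(x: float, w: int, d: int) -> str:
    """Approximate Fortran Fw.d output: right-justified in w characters, d decimals."""
    text = f"{x:.{d}f}"
    if len(text) > w:
        # Fortran fills the whole field with asterisks when the value does not fit.
        return "*" * w
    return text.rjust(w)


def edit_a(s: str, w: Optional[int] = None) -> str:
    """Approximate Fortran A / Aw output for a character value."""
    if w is None:
        return s  # plain A: the field is exactly as wide as the value
    if w >= len(s):
        return s.rjust(w)  # Aw wider than the value: blanks are added on the left
    return s[:w]  # Aw narrower than the value: only the leftmost w characters appear
```

With these rules `'(A20)'` gives seven blanks followed by `Hello, world!`, `'(A5)'` gives just `Hello`, and `'(A)'` gives the text unchanged. `edit_f(3.14159, 5, 2)` gives ` 3.14`, while `edit_f(123.456, 5, 2)` needs six characters and overflows to `*****`.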
List-directed output is different from the above cases. Quoting here and later from the Fortran 2008 draft standard, it
allows data editing according to the type of the list item instead of by a format specification. It also allows data to be free-field, that is, separated by commas (or semicolons) or blanks.
The main effect, and why list-directed is such a convenience, is that no effort is required on the part of the programmer to craft a suitable format for the things to be sent out.
When an output list is processed under list-directed output and a real value appears it is as though the programmer gave a suitable format for that value. And the same for other types.
But the crucial thing here is "a suitable format". The programmer doesn't get to say A5 or F12.5. Instead, the compiler gets to choose "reasonable processor-dependent values" for widths, precisions and so on. [Leading to questions like "why so much white space?".] The constraints on the output are given in, for example, Fortran 2008 10.10.4.
So, in summary, * says: print out the following things in a reasonable way. I won't complain about how you choose to do this, even if you give me output like
5*0
1 2
rather than
0 0 0 0 0 1 2
and I certainly won't complain about having a leading space, as in
 Hello, world!
rather than
Hello, world!
For, yes, with list-directed output you will (except in cases beyond the scope of this answer) get an initial blank on each record:
Except for new records created by [a special case] or by continuation of delimited character sequences, each output record begins with a blank character.
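A Python sketch of two equally valid renderings a processor might choose for a list-directed output record: the record starts with a blank, and runs of equal values may optionally be written as repeat counts of the form r*c. The function `list_directed` is hypothetical; real compilers pick their own field widths and spacing, so this only mimics the shape of the output shown above.

```python
from itertools import groupby
from typing import Iterable


def list_directed(values: Iterable[object], use_repeat: bool = False) -> str:
    """Render one output record in a list-directed style (illustrative only)."""
    items = []
    for value, run in groupby(values):
        count = len(list(run))
        if use_repeat and count > 1:
            items.append(f"{count}*{value}")  # repeated constant, e.g. 5*0
        else:
            items.extend([str(value)] * count)
    # Each list-directed output record begins with a blank character.
    return " " + " ".join(items)
```

Both `list_directed([0, 0, 0, 0, 0, 1, 2])` and its `use_repeat=True` variant describe the same seven values, and `list_directed(["Hello, world!"])` shows the leading blank in front of the text.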

String matching in Emacs Lisp: matching an arbitrary string

In Emacs Lisp I only know the functions string-match and string-match-p, but I know of no way to match a literal string against a string.
E.g. assume that I have a string generated by some function and want to know whether another string contains it. In many cases string-match-p will work fine, but when the generated string contains regexp syntax, it will result in unexpected behaviour, or even an error if the regexp syntax it contains is invalid (e.g. unbalanced quoted parentheses \(, \)).
Is there some function in Emacs Lisp that is similar to string-match-p but doesn't interpret regular expression syntax?
As regexp matching is implemented in C, I assume that matching the correct regexp is faster than some substring/string= loop. Is there some way to escape an arbitrary string into a regular expression that matches that string and only that string?
Are you looking for regexp-quote?
The docs say:
(regexp-quote STRING)
Return a regexp string which matches exactly STRING and nothing else.
And I'm not sure your assumption in #2 is correct; string= should be faster...
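The same idea exists outside Emacs. As an analogue in Python (not Emacs Lisp), `re.escape` plays the role of `regexp-quote`. The hypothetical `find_literal` helper below escapes the needle before searching, so any regex metacharacters in it are matched literally.

```python
import re


def find_literal(needle: str, haystack: str) -> int:
    """Index of the first literal occurrence of needle in haystack, or -1."""
    # re.escape makes every metacharacter in needle match itself,
    # much like (string-match (regexp-quote needle) haystack) in Emacs Lisp.
    match = re.search(re.escape(needle), haystack)
    return match.start() if match else -1
```

Used raw, `foo(bar` is not even a valid Python pattern (the parenthesis is unbalanced), but `find_literal("foo(bar", "---foo(bar")` returns 3. For a fixed string, a plain substring search such as `"---foo(bar".find("foo(bar")` gives the same answer without building a regex at all, in line with the point that string comparison need not be slower.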
Either use regexp-quote as recommended by @trey-jackson, or don't use strings at all.
Emacs is not optimized for string handling; it is optimized for buffers. So, if you manipulate text, you might find it faster to create a temporary buffer, insert your text there, and then use search-forward to find your fixed string (non-regexp) in that buffer.
Perhaps cl-mismatch, an analogue of the Common Lisp mismatch function? Example usage below:
(require 'cl-lib)
(cl-mismatch "abcd" "abcde")
;; 4
(cl-mismatch "abcd" "aabcd" :from-end t)
;; -1
(cl-mismatch "abcd" "aabcd" :start2 1)
;; nil
Ah, sorry, I didn't understand the question the first time. If you want to know whether the string is a substring of another string (starting at any index in the searched string), then you could use cl-search, again an analogue of the Common Lisp search function.
(cl-search "foo\\(bar" "---foo\\(bar")
;; 3

Derive RegExp from set of strings

Imagine there is an arbitrary set of strings. Suppose that they are all equal except for a few consecutive characters (if this assumption does not hold, I'm fine with returning an error). I now want to derive a regular expression that identifies the portion of the strings that differs.
Input:
"Hello Alice, I'm Bob.", "Hello John, I'm Bob.", "Hello Josh, I'm Bob."
Output:
"Hello (.+), I'm Bob."
Input:
"Monday", "Tree", "Dog"
Output:
Error
Maybe finding the longest common substrings or the Levenshtein distance could help? I'm not sure yet if one of them really applies to my problem or how to use them to solve it.
You had a problem and decided to use regexp to solve it -- now you have two problems. :-)
All kidding aside, you can break this down into two steps:
(1) Identify differences between strings.
(2) Look at all the differences and figure out a regexp to match them.
For (1), it's a matter of using a diff-computing library in your language (like difflib in Python) to find a list of identical regions between two strings. If all strings have common segments, then compare string-1 to each of string-[2..N] to analyze the resulting identical blocks (you have to be smart about comparing both the contents of each block and its position relative to other identical blocks). Extract and record text between the identical blocks too.
For your example, you'd get two identical blocks every time you compare: "Hello " and ", I'm Bob.".
The text between the identical blocks will be these strings: "Alice", "John", "Josh".
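Step (1) with Python's difflib, as a minimal sketch: the hypothetical `identical_blocks` helper returns the identical blocks of two strings and, for each pair of neighbouring blocks, the text that sits between them in each string. It ignores text before the first block or after the last one, which is enough for this example but not in general.

```python
import difflib
from typing import List, Tuple


def identical_blocks(a: str, b: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return the identical blocks of a and b, plus the text between neighbouring blocks."""
    # autojunk=False stops difflib from ignoring frequent characters in long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel; drop it.
    blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]
    same = [a[m.a:m.a + m.size] for m in blocks]
    gaps = []
    for left, right in zip(blocks, blocks[1:]):
        # Text after one identical block and before the next, in each string.
        gaps.append((a[left.a + left.size:right.a], b[left.b + left.size:right.b]))
    return same, gaps
```

Comparing the first greeting with the second gives the blocks `"Hello "` and `", I'm Bob."` with the gap `("Alice", "John")`; comparing it with the third gives the gap `("Alice", "Josh")`. On messier inputs difflib can also report short coincidental matches inside the differing text, which is where you have to be smart about block contents and positions.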
For (2), the most trivial solution is to combine your findings into a quite literal regexp composed of:
Hello + (Alice|John|Josh) + , I'm Bob.
Or, replace any segment between the same identical blocks found in all strings with .*. Consider making that a non-greedy match -- .*?.
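A Python sketch of step (2), under a stated simplification: instead of a general diff, the hypothetical `derive_regex` relies on the question's assumption that only one contiguous run differs, so the identical blocks are just the common prefix and common suffix. When both are empty it raises an error, as the question asks. It can emit either the literal alternation or the non-greedy `(.*?)` form, and it runs the fixed text through `re.escape` so characters such as the final `.` are matched literally.

```python
import re
from typing import List


def common_prefix(strings: List[str]) -> str:
    """Longest prefix shared by all strings."""
    prefix = strings[0]
    for s in strings[1:]:
        n = 0
        while n < min(len(prefix), len(s)) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def derive_regex(strings: List[str], literal: bool = False) -> str:
    if len(strings) < 2:
        raise ValueError("need at least two strings")
    prefix = common_prefix(strings)
    # Cap the suffix so it never overlaps the prefix in the shortest string.
    room = min(len(s) for s in strings) - len(prefix)
    suffix = common_prefix([s[::-1] for s in strings])[:room][::-1]
    if not prefix and not suffix:
        raise ValueError("strings have no common prefix or suffix")
    middles = [s[len(prefix):len(s) - len(suffix)] for s in strings]
    if literal:
        # dict.fromkeys drops duplicate alternatives while keeping their order.
        group = "(" + "|".join(re.escape(m) for m in dict.fromkeys(middles)) + ")"
    else:
        group = "(.*?)"
    return re.escape(prefix) + group + re.escape(suffix)
```

For the three greetings, the pattern captures `Alice`, `John` and `Josh` in group 1, while `["Monday", "Tree", "Dog"]` share neither a prefix nor a suffix and raise `ValueError`. Strings that differ in more than one place need the real diff-based step instead.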
I don't know automata theory and can't help you with DFA/NFA, but that's a solid direction to go if you need more precision.