In SML, can you convert ".3" to the real "0.3"?

I'm pretty new to SML and I've been using SML/NJ.
Let's say I have the following simple function:
fun test(x) = x / 2.0;
test(0.3); returns 0.15.
I'd like for it to also work with test(.3);
Right now I'm getting the following error:
- test(.3);
stdIn:23.6-23.9 Error: syntax error: deleting DOT INT RPAREN
Of course, I'd like it to work with any real of the form 0.X.
Is this doable? Thank you!

"A real constant is an integer constant, possibly followed by a point (.) and one or more digits, possibly followed by an exponent symbol E and an integer constant; at least one of the optional parts must occur, hence no integer constant is a real constant. Examples: 0.7, +3.32E5, 3E~7 . Non-examples: 23, .3, 4.E5, 1E2.0 ."
from: Definition of Standard ML Version 2 [Robert Harper, Robin Milner, Mads Tofte] 1988
Update:
The Definition of Standard ML (Revised) 1997 modifies the passage to:
an exponent symbol (E or e) and an integer constant in decimal notation;

It appears that real literals must have at least one digit before the decimal point, even if it's a zero, at least in the implementation of SML that you're using.
I can't find anything about this in the libraries or the specification, so it could be specific to the implementation. But all of the examples in both of those places do put a zero before the decimal point, so it may, in fact, be a requirement of the language itself; the passage quoted above from the Definition confirms this, listing .3 explicitly as a non-example.

Related

C++ single quotes in the middle of a number [duplicate]

As of C++14, thanks to n3781 (which in itself does not answer this question) we may write code like the following:
const int x = 1'234; // one thousand two hundred and thirty four
The aim is to improve on code like this:
const int y = 100000000;
and make it more readable.
The underscore (_) character was already taken in C++11 by user-defined literals, and the comma (,) has localisation problems — many European countries bafflingly† use this as the decimal separator — and conflicts with the comma operator, though I do wonder what real-world code could possibly have been broken by allowing e.g. 1,234,567.
Anyway, a better solution would seem to be the space character:
const int z = 1 000 000;
These adjacent numeric literal tokens could be concatenated by the preprocessor just as are string literals:
const char x[5] = "a" "bc" "d";
Instead, we get the apostrophe ('), not used by any writing system I'm aware of as a digit separator.
Is there a reason that the apostrophe was chosen instead of a simple space?
† It's baffling because all of those languages, within text, maintain the notion of a comma "breaking apart" an otherwise atomic sentence, with a period functioning to "terminate" the sentence — to me, at least, this is quite analogous to a comma "breaking apart" the integral part of a number and a period "terminating" it ready for the fractional input.
There is a previous paper, n3499, which tells us that although Bjarne himself suggested spaces as separators:
While this approach is consistent with one common typographic style, it suffers from some compatibility problems.
It does not match the syntax for a pp-number, and would minimally require extending that syntax.
More importantly, there would be some syntactic ambiguity when a hexadecimal digit in the range [a-f] follows a space. The preprocessor would not know whether to perform symbol substitution starting after the space.
It would likely make editing tools that grab "words" less reliable.
I guess the following example is the main problem noted:
const int x = 0x123 a;
though in my opinion this rationale is fairly weak. I still can't think of a real-world example to break it.
The "editing tools" rationale is even worse, since 1'234 breaks basically every syntax highlighter known to mankind (e.g. that used by Markdown in the above question itself!) and makes updated versions of said highlighters much harder to implement.
Still, for better or worse, this is the rationale that led to the adoption of apostrophes instead.
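For reference, here is a minimal C++14 sketch of the syntax as adopted (the variable names are mine; any C++14 compiler accepts this):

#include <cassert>

int main() {
    int decimal = 1'234'567;   // separators are ignored by the compiler
    int hex     = 0xFF'A6'3B;  // they work in any base...
    int binary  = 0b1111'0110; // ...including C++14 binary literals
    assert(decimal == 1234567 && hex == 0xFFA63B && binary == 0xF6);
}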
The obvious reason for not using white space is that a new line is also
white space, and that C++ treats all white space identically. And off
hand, I don't know of any language which accepts arbitrary white space
as a separator.
Presumably, Unicode 0xA0 (non-breaking space) could be used—it is
the most widely used solution when typesetting. I see two problems with
that, however: first, it's not in the basic character set, and second,
it's not visually distinctive; you can't see that it isn't a space by
just looking at the text in a normal editor.
Beyond that, there aren't many choices. You can't use the comma, since
that is already a legal token: something like 1,234 is currently legal
C++, with the meaning 234, and it could occur in legal code, e.g.
a[1,234]. While I can't quite imagine any real code actually using
this, there is a basic rule that no legal program, regardless how
absurd, should silently change semantics.
Similar considerations mean that _ can't be used either; if there is a
#define _234 * 2, then a[1_234] would silently change the meaning of
the code.
I can't say that I'm particularly pleased with the choice of ', but it
does have the advantage of being used in continental Europe, at least in
some types of texts. (I seem to remember having seen it in German, for
example, although in typical running text, German, like most other
languages, will use a point or a non breaking space. But maybe it was
Swiss German.) The problem with ' is parsing; the sequence '1' is
already legal, as is '123'. So something like 1'234 could be a 1,
followed by the start of a character constant; I'm not sure how far you
have to look-ahead to make the decision. There is no sequence of legal
C++ in which an integral constant can be followed by a character
constant, so there's no problem with breaking legal code, but it means
that lexical scanning suddenly becomes very context dependent.
(With regards to your comment: there is no logic in the choice of a
decimal or a thousands separator. A decimal separator, for example, is
certainly not a full stop. They are just arbitrary conventions.)
From wiki, we have a nice example:
auto floating_point_literal = 0.000'015'3;
Here, we already have the . and then, if another separator is to be met, my eyes would expect something visible, like a comma or an apostrophe, not a whitespace.
So an apostrophe does much better here than a whitespace would do.
With whitespaces it would be
auto floating_point_literal = 0.000 015 3;
which doesn't feel as right as the case with the apostrophes.
In the same spirit as Albert Renshaw's answer, I think that the apostrophe is clearer than the space that Lightness Races in Orbit proposes.
type a = 1'000'000'000'000'000'544'445'555;
type a = 1 000 000 000 000 000 544 445 555;
Space is used for many things, like the string-literal concatenation the OP mentions, unlike the apostrophe, which in this context clearly reads as a digit separator.
When the lines of code become many, I think this will improve readability, but I doubt that is the reason they chose it.
About the spaces, it might be worth taking a look at this C question, which says:
The language doesn't allow int i = 10 000; (an integer literal is one token, the intervening whitespace splits it into two tokens) but there's typically little to no expense incurred by expressing the initializer as an expression that is a calculation of literals:
int i = 10 * 1000; /* ten thousand */
It is true that I see no practical meaning in:
if (a == 1 1 1 1 1) ...
so digit groups might be merged without real ambiguity,
but what about a hexadecimal number?
0 x 1 a B 2 3
There is no way to tell that apart from a typo (normally we should see an error).
I would assume it's because, while writing code, if you reach the end of a "line" (the width of your screen) an automatic line-break (or "word wrap") occurs. This would cause your int to get split in half, one half of it would be on the first line, the second half on the second... this way it all stays together in the event of a word-wrap.
float floating_point_literal = 0.0000153; /* C, C++*/
auto floating_point_literal = 0.0000153; // C++11
auto floating_point_literal = 0.000'015'3; // C++14
Commenting does not hurt:
/* 0. 0000 1530 */
float floating_point_literal = 0.00001530;
Binary strings can be hard to parse:
long bytecode = 0b1111011010011001; /* gcc , clang */
long bytecode = 0b1111'0110'1001'1001; //C++14
// 0b 1111 0110 1001 1001 would be better, really.
// It is how humans think.
A macro for consideration:
#define B(W,X,Y,Z) (0b##W##X##Y##Z)
#define HEX(W,X,Y,Z) (0x##W##X##Y##Z)
#define OCT(O) (0##O)
long z = B(1001, 1001, 1010, 1011);
// result : long z = (0b1001100110101011);
long h = OCT( 35);
// result : long h = (035); // 35_oct => 29_dec
long w = HEX( FF, A6, 3B, D0 );
// result : long w = (0xFFA63BD0);
It has to do with how the language is parsed. It would have been difficult for the compiler authors to rewrite their products to accept space-delimited literals.
Also, I don't think separating digits with spaces is very common. From what I've seen, it's always non-whitespace characters, even in different countries.

How are the rules for calculating a value named in C++

Ok, it is hard (for me) to explain exactly what I'm asking for, but I'll try anyway...
I'm trying to explain to a person, who is learning C++, how an expression is calculated.
More specifically, why this:
5 / 2
gives 2 and why that:
5.0 / 2.0
gives an expected 2.5.
My explanation says it is because Integer value / Integer value = Integer value. And this is the crux of my question: what is that rule called? I always thought it was "Type Algebra", but putting that into Google shows this term is rather Haskell-related.
So, is there a name for the rule describing how the types of operations and expressions in C++ depend on the types of the values/variables involved? And an extra question: is it specific to C++ (I mean: is this term used only in C++-related material)?
You are looking for topics like:
Promotion rules
Implicit conversions
Arithmetic conversions
Other stuff like operator precedence may also apply depending on the expression.
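For instance, here is a minimal sketch of the behaviour the OP describes; the usual arithmetic conversions decide each result:

#include <iostream>

int main() {
    std::cout << 5 / 2 << '\n';     // 2: both operands are int, so integer division
    std::cout << 5.0 / 2.0 << '\n'; // 2.5: both operands are double
    std::cout << 5 / 2.0 << '\n';   // 2.5: the int 5 is converted to double first
}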

tokenizing ints vs floats in lex/flex

I'm teaching myself a little flex/bison for fun. I'm writing an interpreter for the 1975 version of MS Extended BASIC (Extended as in "has strings"). I'm slightly stumped by one issue though.
Floats can be identified by looking for a . or an E (etc.), falling back to an int otherwise. So I did this...
[0-9]*[0-9.][0-9]*([Ee][-+]?[0-9]+)? {
yylval.d = atof(yytext);
return FLOAT;
}
[0-9]+ {
yylval.i = atoi(yytext);
return INT;
}
sub-fields in the yylval union are .d for double, .i for int and .s for string.
But it is also possible that you need to use a float because the number is too large to store in an int - which in this case is a 16-bit signed.
Is there a way to do this in the regex? Or do I have to do this in the associated c-side code with an if?
If you want integer to take priority over float (so that a literal which looks like an integer is an integer), then you need to put the integer pattern first. (The pattern with the longest match always wins, but if two patterns both match the same longest prefix, the first one wins.) So your basic outline is:
integer-pattern { /* integer rule */ }
float-pattern { /* float rule */ }
Your float rule looks reasonable, but note that it will match a single ., possibly followed by an exponent. Very few languages consider a lone . as a floating point constant (that literal is conventionally written as 0 :-) ) So you might want to change it to something like
[0-9]*([0-9]\.?|\.[0-9])[0-9]*([Ee][-+]?[0-9]+)?
To use a regex to match a non-negative integer which fits into a 16-bit signed int, you can use the following ugly pattern:
0*([12]?[0-9]{1,4}|3(2(7(6[0-7]|[0-5][0-9])|[0-6][0-9]{2})|[0-1][0-9]{3}))
(F)lex will produce efficient code to implement this regex, but that doesn't necessarily make it a good idea. (A quick harness for sanity-checking the pattern follows the notes below.)
Notes:
The pattern recognises integers with redundant leading zeros, like 09. Some languages (like C) consider that to be an invalid octal literal, but I don't think Basic has that restriction.
The pattern does not recognise 32768, since that's too big to be a positive integer. However, it is not too big to be a negative integer; -32768 would be perfectly fine. This is always a corner case in parsing integer literals. If you were just lexing integer literals, you could easily handle the difference between positive and negative limits by having a separate pattern for literals starting with a -, but including the sign in the integer literal is not appropriate for expression parsers, since it produces an incorrect lexical analysis of a-1. (It would also be a bit weird for -32768 to be a valid integer literal, while - 32768 is analysed as a floating point expression which evaluates to -32768.0.) There's really no good solution here, unless your language includes unsigned integer literals (like C), in which case you could analyse literals from 0 to 32767 as signed integers; from 32768 to 65535 as unsigned integers; and from 65536 and above as floating point.
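If you want to sanity-check that pattern outside of (f)lex, here is a small C++ harness; std::regex happens to accept the same pattern syntax here, and the test values are my own:

#include <cassert>
#include <regex>

int main() {
    const std::regex int16(
        "0*([12]?[0-9]{1,4}|3(2(7(6[0-7]|[0-5][0-9])|[0-6][0-9]{2})|[0-1][0-9]{3}))");
    assert(std::regex_match("0", int16));
    assert(std::regex_match("09", int16));     // redundant leading zero is accepted
    assert(std::regex_match("32767", int16));  // largest 16-bit signed value
    assert(!std::regex_match("32768", int16)); // one too big, as the second note explains
}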
The literals for integer and floating-point numbers are the same for many programming languages. For example, the Java Language Specification (and several others) contains the grammar rules for integer and floating-point literals. In these rules, a bare 0 does not validate as a floating-point literal, yet your float pattern (listed first) matches it. That's the main problem I see with your current approach.
When parsing, you should not use atoi or atof since they don't check for errors. Use strtoul and strtod instead.
The action for integer numbers should then be something like (errno and the conversion functions come from <errno.h> and <stdlib.h>):
[0-9]+ {
    errno = 0;
    unsigned long n = strtoul(yytext, NULL, 10);
    if (errno == 0 && n < 0x8000UL) {  /* fits in a 16-bit signed int */
        yylval.i = (int)n;
        return INT;
    }
    yylval.d = strtod(yytext, NULL);   /* too big: fall back to a float */
    return FLOAT;
}

Z3 optimization: detect unboundedness via API

I am experiencing difficulties in detecting unboundedness of an optimization problem. As stated in the examples and in some answers here, the printed result of an unbounded optimization problem is something like "oo", which has to be interpreted (via string comparison?).
My question is: Is there any way to use the API to detect this?
I've searched for some time now, and the only function which might do what I want is Z3_mk_fpa_is_infinite(Z3_context c, Z3_ast t), which returns some Z3_ast object. The problem is: is this the right approach, and how do I get the unbounded property out of that Z3_ast object?
There is currently no built-in way to extract unbounded values or infinitesimals.
The optimization engine uses ad-hoc constants called "epsilon" (of type Real) and "oo" (of type Real or Integer) when representing maximal/minimal values that are unbounded or at a strict bound. There is no built-in recognizer for these constants, and formally they don't belong to the domain of Reals; they belong to an extension field. So formally, I would have to return an expression over a different number field, or return what amounts to a triple of numerals (epsilon, standard numeral, infinite). For example, a standard numeral 5.6 would be represented as (0, 5.6, 0), a numeral that comes just below 5.6 is represented as (-1, 5.6, 0), and a numeral that is +infinity is (0, 0, 1). Returning three values instead of one seemed no more satisfactory a solution to me than returning an expression. I am leaving it to users to post-process the expressions that are returned, and indeed to match for the symbols "oo" and "epsilon" to decompose the values if they need to.
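If you do post-process the returned expressions, a hedged sketch with the C++ API (z3++.h) might look as follows; the traversal and the exact symbol name "oo" are assumptions based on the description above, not a documented interface:

#include "z3++.h"

// Recursively check whether a bound expression returned by
// optimize::upper()/optimize::lower() mentions a constant named "oo".
static bool mentions_oo(const z3::expr &e) {
    if (e.is_app() && e.num_args() == 0)
        return e.decl().name().str() == "oo";
    for (unsigned i = 0; i < e.num_args(); ++i)
        if (mentions_oo(e.arg(i)))
            return true;
    return false;
}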

regular expression for matching a specific decimal number

I need to verify if the answer entered by a user is correct for an online quiz.
The answer is supposed to be a decimal number that may be entered in a number of different ways. For example,
0.666666...
could match
.66
.67
.66666
.667
0.6667
etc.
Basically, I want to ignore rounding, precision, and preceding zeros. The examples I found are for matching any decimal numbers.
thanks,
RT
edits -
I am writing a quiz for WebCT. It allows three options for matching the correct answer: "equals", "contains" and "regular expression". I believe WebCT is Java-based, but I can't be sure what flavor of regular expression it uses. I can ask users to provide the correct answer up to three decimal places, in which case correct answers could be one of the following four:
0.666
0.667
.666
.667
For a decimal representation of a fraction, you have a nice normal form: it will consist of a finite decimal, followed by a sequence of decimal digits that repeat infinitely. So, e.g., 1/7 is "0." followed by infinitely many repetitions of "142857". There's a sense (see below) in which these fractional decimal representations are the only ones that can be represented by a regular expression.
The technique is to represent the number with a nested expression: the base part is a series of optional bracketed expressions, with alternatives to express rounding up (so 8.19 would be given by 8(.(2|19?)?)?), then the repeating part starred, and then a section giving all the ways that the repeating part may be rounded.
E.g.: 1/7 is given by 0?.(142857)*(1(4(3|2(8(6|57?)?)?)?)?)?.
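As a quick check of that 1/7 expression, here is a small C++ sketch using std::regex (I escape the dot, which the informal notation above leaves bare; the test values are my own):

#include <cassert>
#include <regex>

int main() {
    // Prefixes of 1/7 = 0.142857142857..., allowing the final digit to round up.
    const std::regex sevenths("0?\\.(142857)*(1(4(3|2(8(6|57?)?)?)?)?)?");
    assert(std::regex_match("0.142857", sevenths));
    assert(std::regex_match(".14286", sevenths));  // rounded after five digits
    assert(std::regex_match("0.143", sevenths));   // rounded after three digits
    assert(!std::regex_match("0.144", sevenths));  // not a rounding of 1/7
}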
Aside
The sense in which regular expressions can express only fractions is the following. Say that a formal expression describes a decimal representation if its language is all the finite decimal prefixes of the number, so an expression for 2/3 must have as its language 0, 0.6, 0.66, 0.666, &c. A finite state machine that accepts exactly these prefixes must eventually revisit a state, so the digits it accepts must eventually repeat, and an eventually repeating decimal expansion is precisely that of a fraction.
So no regular expression accepts, say, sqrt(2) exactly.
You may be better off simply converting the user-entered string into a number (using whatever language facilities you have, like C's atof), then just ensuring it's close enough (within a set margin of error), or even specifying a minimum and a maximum (say 0.6 and 0.67).
But, if you really want a regex:
^0*\.6+7?$
should do the trick for that particular number.
That's zero or more of 0 followed by ., then one or more 6 characters and an optional 7.
To enforce at least two decimal places as requested in a comment, you could use:
^0*\.6+[67]$
That forces it to end in a 6 or 7 and have one or more 6 characters preceding that.
/^0*\.6+7?\.*$/
Broken down:
^0* <- optional zeroes
\. <- escaped decimal
6+ <- at least one 6, or more
7? <- optional 7
\.*$ <- zero or more trailing dots (to allow an ellipsis)
It doesn't sound like you want regular expressions for this. They are purely textual and have no notion of "decimal numbers that can be expressed in more than one way."
I think what you should do is convert the user's answer to a floating-point data type in whatever programming language you're working in, and subtract it from the correct answer. The absolute value of the result is the error in the student's answer. If that error is below some threshold, say 0.07, then consider the answer correct. That way, 0.6, 0.66, 0.667, 0.6667, etc., will all be considered correct.
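A hedged sketch of that approach in C++ (the function name and the 0.07 tolerance are mine; pick a tolerance that matches the precision you ask for):

#include <cmath>
#include <cstdlib>

// Accept the answer if it parses as a number within tol of the expected value.
bool is_correct(const char *entered, double expected, double tol) {
    char *end = nullptr;
    double value = std::strtod(entered, &end);
    if (end == entered)
        return false;  // the input was not a number at all
    return std::fabs(value - expected) < tol;
}

// e.g. is_correct(".67", 2.0/3.0, 0.07) and is_correct("0.666", 2.0/3.0, 0.07)
// both hold, while is_correct("0.5", 2.0/3.0, 0.07) does not.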