How to disable non-standard features in SML/NJ - sml

SML/NJ provides a series of non-standard features, such as higher-order modules, vector literal syntax, etc.
Is there a way to disable these non-standard features in SML/NJ, through some command-line param maybe, or, ideally, using a CM directive?

Just by looking at the grammar used by the parser, I'm going to say that there is not a way to do this. From "admin/base/compiler/Parse/parse/ml.grm":
apat' : OP ident (VarPat [varSymbol ident])
| ID DOT qid (VarPat (strSymbol ID :: qid varSymbol))
| int (IntPat int)
| WORD (WordPat WORD)
| STRING (StringPat STRING)
| CHAR (CharPat CHAR)
| WILD (WildPat)
| LBRACKET RBRACKET (ListPat nil)
| LBRACKET pat_list RBRACKET (ListPat pat_list)
| VECTORSTART RBRACKET (VectorPat nil)
| VECTORSTART pat_list RBRACKET (VectorPat pat_list)
| LBRACE RBRACE (unitPat)
| LBRACE plabels RBRACE (let val (d,f) = plabels
in RecordPat{def=d,flexibility=f}
end)
The VectorPat stuff is fully mixed in with the rest of the patterns. A recursive grep for VectorPat also will show that there aren't any options to turn this off anywhere else.

c++ find any string from a list in another string

What options do I have to find any string from a list in another string?
With s being an std::string, I tried
s.find("CAT" || "DOG" || "COW" || "MOUSE", 0);
I want to find the first one of these strings and get its position in the string; so if s was "My cat is sleeping\n" I'd get 3 as the return value.
boost::to_upper(s);
was applied (for those wondering).
You can do this with a regex.
After a successful regex_search, the match object reports where the match starts through m.position(), so there is no need to search the string a second time. Like this:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main() {
    string s = "My cat is sleeping\n";
    smatch m;
    regex animal("cat|dog|cow|mouse");
    if (regex_search(s, m, animal)) {
        cout << "Match found: " << m.str() << endl;
        // position() is the offset of the match from the start of s
        cout << "First animal found at: " << m.position() << endl;
    }
    return 0;
}
You may convert your search cases to a DFA. Once built, it reads each input character only once, which makes it an efficient way of doing it.
states:
nil, c, ca, cat., d, do, dog., co, cow., m, mo, mou, mous, mouse.
transition table:
state | on | goto
nil | c | c
nil | d | d
nil | m | m
c | a | ca
c | o | co
d | o | do
m | o | mo
ca | t | cat.
co | w | cow.
do | g | dog.
mo | u | mou
mou | s | mous
mous | e | mouse.
* | * | nil
You may express this using a lot of intermediary functions, using a lot of switches, or using an enum to represent the states and a mapping to represent the transitions.
If your list of search strings is dynamic or grows too big, then manually hardcoding the states will not suffice. However, as you can see, the rule for generating the states and the transitions is very simple.
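The states in the table above are simply the prefixes of the search words, so they can be generated as a trie instead of being hardcoded. Below is a minimal Python sketch of that idea (the names build_trie and find_first are made up for illustration). Note one deliberate difference: it restarts the walk from every position of the text instead of applying the `* | * | nil` reset row, because as written that row appears to drop the character that caused the reset, so an input such as "CCAT" would be missed.

```python
def build_trie(words):
    """Build the state graph: each node is a dict from character to next node."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word  # the None key marks an accepting state
    return trie


def find_first(text, words):
    """Return (position, word) of the earliest match, or (-1, None)."""
    trie = build_trie(words)
    for start in range(len(text)):
        node = trie
        for ch in text[start:]:
            node = node.get(ch)
            if node is None:      # no transition: give up on this start
                break
            if None in node:      # reached an accepting state
                return start, node[None]
    return -1, None


animals = ["CAT", "DOG", "COW", "MOUSE"]
print(find_first("MY CAT IS SLEEPING\n", animals))
print(find_first("CCAT", animals))
```

This is not a single-pass automaton; getting that would need failure links in the style of Aho-Corasick, which is left out here to keep the sketch short.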

Semantics of identifier line in Python

What are the semantics of a Python 2.7 line containing ONLY an identifier, i.e. simply
a
or
something
?
If you know the exact place in the Reference, I'd be very pleased.
Tnx.
An identifier by itself is a valid expression. An expression by itself on a line is a valid statement.
The full semantic chain is a little more involved. In order to have nice operator precedence, we classify things like "a and b" as technically both an and_test and an or_test. As a result, a simple identifier technically qualifies as over a dozen grammar items:
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*)
testlist_star_expr: (test|star_expr) (',' (test|star_expr))* [',']
test: or_test ['if' or_test 'else' test] | lambdef
or_test: and_test ('or' and_test)*
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
expr: xor_expr ('|' xor_expr)*
xor_expr: and_expr ('^' and_expr)*
and_expr: shift_expr ('&' shift_expr)*
shift_expr: arith_expr (('<<'|'>>') arith_expr)*
arith_expr: term (('+'|'-') term)*
term: factor (('*'|'/'|'%'|'//') factor)*
factor: ('+'|'-'|'~') factor | power
power: atom trailer* ['**' factor]
atom: ('(' [yield_expr|testlist_comp] ')' |
'[' [testlist_comp] ']' |
'{' [dictorsetmaker] '}' |
NAME | NUMBER | STRING+ | '...' | 'None' | 'True' | 'False')
A stmt can be composed of a single simple_stmt, which can be composed of a single small_stmt, which can be composed of a single expr_stmt, and so on, down through testlist_star_expr, test, or_test, and_test, not_test, comparison, expr, xor_expr, and_expr, shift_expr, arith_expr, term, factor, power, atom, and finally NAME.
It's a simple expression statement: https://docs.python.org/2/reference/simple_stmts.html
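To see this in practice, the standard ast module can show how a bare identifier is parsed. The sketch below runs on Python 3 (the question is about 2.7, but I would expect the same result there): the line becomes an expression statement wrapping a name, and executing it looks the name up and throws the value away.

```python
import ast

# A line containing only an identifier parses as an expression statement.
tree = ast.parse("a\n")
stmt = tree.body[0]
print(type(stmt).__name__)        # Expr
print(type(stmt.value).__name__)  # Name

# Executing it evaluates the name; the value is discarded.
exec("a", {"a": 42})              # prints nothing

# If the name is unbound, the lookup fails with NameError.
try:
    exec("a", {})
    raised = False
except NameError:
    raised = True
print(raised)                     # True
```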

Regex: How to Implement Negative Lookbehind in PL/SQL

How do I match all the strings that begin with lookup. and end with _id but are not prefixed by msg? Here are some examples:
lookup.asset_id -> should match
lookup.msg_id -> shouldn't match
lookup.whateverelse_id -> should match
I know Oracle does not support negative lookbehind (i.e. (?<!))... so I've tried to explicitly enumerate the possibilities using alternation:
if regexp_count('i_asset := lookup.asset_id;', 'lookup\.[^\(]+([^m]|m[^s]|ms[^g])_id') <> 0 then
dbms_output.put_line('match'); -- this matches as expected
end if;
if regexp_count('i_msg := lookup.msg_id;', 'lookup\.[^\(]+([^m]|m[^s]|ms[^g])_id') <> 0 then
dbms_output.put_line('match'); -- this shouldn't match
-- but it does like the previous example... why?
end if;
The second regexp_count expression shouldn't match... but it does, like the first one. Am I missing something?
EDIT
In the real use case, I have a string that contains PL/SQL code that might contain more than one lookup.xxx_id instance:
declare
l_source_code varchar2(2048) := '
...
curry := lookup.curry_id(key_val => ''CHF'', key_type => ''asset_iso'');
asset : = lookup.asset_id(key_val => ''UBSN''); -- this is wrong since it does
-- not specify key_type
...
msg := lookup.msg_id(key_val => ''hello''); -- this is fine since msg_id does
-- not require key_type
';
...
end;
I need to determine whether there is at least one wrong lookup, i.e. all occurrences, except lookup.msg_id, must also specify the key_type parameter.
With lookup\.[^\(]+([^m]|m[^s]|ms[^g])_id, you are basically asking to check for a string
starting with lookup. denoted by lookup\.,
followed by at least one character different from ( denoted by [^\(]+,
followed by either -- ( | | )
one character different from m -- [^m], or
two characters: m plus no s -- m[^s], or
three characters: ms and no g -- ms[^g], and
ending in _id denoted by _id.
So, for lookup.msg_id, the first part matches obviously, the second consumes ms, and leaves the g for the first alternative of the third.
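The backtracking described above can be made visible by capturing the two variable parts of the pattern. This is a small sketch using Python's re module rather than Oracle, on the assumption that both engines behave the same for these basic constructs; the extra capture group around [^(]+ is added only for inspection.

```python
import re

# Same pattern as in the question, with the greedy middle part captured too.
pattern = re.compile(r"lookup\.([^(]+)([^m]|m[^s]|ms[^g])_id")

for text in ("lookup.asset_id", "lookup.msg_id"):
    m = pattern.search(text)
    # group(1): what [^(]+ ended up with; group(2): what the alternation took
    print(text, "->", m.group(1), "|", m.group(2))

msg_match = pattern.search("lookup.msg_id")
```

For lookup.msg_id the greedy part gives characters back until it holds only "ms", and the single "g" then satisfies the [^m] alternative.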
This could be fixed by patching up the third part to be always three characters long, like lookup\.[^\(]+([^m]..|m[^s].|ms[^g])_id. This, however, would fail for every name where the part between lookup. and _id is not at least four characters long:
WITH
Input (s, r) AS (
SELECT 'lookup.asset_id', 'should match' FROM DUAL UNION ALL
SELECT 'lookup.msg_id', 'shouldn''t match' FROM DUAL UNION ALL
SELECT 'lookup.whateverelse_id', 'should match' FROM DUAL UNION ALL
SELECT 'lookup.a_id', 'should match' FROM DUAL UNION ALL
SELECT 'lookup.ab_id', 'should match' FROM DUAL UNION ALL
SELECT 'lookup.abc_id', 'should match' FROM DUAL
)
SELECT
r, s, INSTR(s, 'lookup.msg_id') has_msg, REGEXP_COUNT(s , 'lookup\.[^\(]+([^m]..|m[^s].|ms[^g])_id') matched FROM Input
;
| R | S | HAS_MSG | MATCHED |
|-----------------|------------------------|---------|---------|
| should match | lookup.asset_id | 0 | 1 |
| shouldn't match | lookup.msg_id | 1 | 0 |
| should match | lookup.whateverelse_id | 0 | 1 |
| should match | lookup.a_id | 0 | 0 |
| should match | lookup.ab_id | 0 | 0 |
| should match | lookup.abc_id | 0 | 0 |
If you only need to make sure there is no msg in the position in question, you might want to go for
(INSTR(s, 'lookup.msg_id') = 0) AND REGEXP_COUNT(s, 'lookup\.[^\(]+_id') <> 0
For code clarity REGEXP_INSTR(s, 'lookup\.[^\(]+_id') > 0 might be preferable…
@j3d Just comment if further detail is required.
With the requirements still being kind of vague…
Split the string at the semicolon.
Check each substring s to comply:
WITH Input (s) AS (
SELECT ' curry := lookup.curry_id(key_val => ''CHF'', key_type => ''asset_iso'');' FROM DUAL UNION ALL
SELECT 'curry := lookup.curry_id(key_val => ''CHF'', key_type => ''asset_iso'');' FROM DUAL UNION ALL
SELECT 'asset := lookup.asset_id(key_val => ''UBSN'');' FROM DUAL UNION ALL
SELECT 'msg := lookup.msg_id(key_val => ''hello'');' FROM DUAL
)
SELECT
s
FROM Input
WHERE REGEXP_LIKE(s, '^\s*[a-z]+\s+:=\s+lookup\.msg_id\(key_val => ''[a-zA-Z0-9]+''\);$')
OR
((REGEXP_INSTR(s, '^\s*[a-z]+\s+:=\s+lookup\.msg_id') = 0)
AND (REGEXP_INSTR(s, '[(,]\s*key_type') > 0)
AND (REGEXP_INSTR(s,
'^\s*[a-z]+\s+:=\s+lookup\.[a-z]+_id\(( ?key_[a-z]+ => ''[a-zA-Z_]+?'',?)+\);$') > 0))
;
| S |
|--------------------------------------------------------------------------|
|[tab] curry := lookup.curry_id(key_val => 'CHF', key_type => 'asset_iso');|
| curry := lookup.curry_id(key_val => 'CHF', key_type => 'asset_iso');|
| msg := lookup.msg_id(key_val => 'hello');|
This would tolerate a superfluous comma right before the closing parenthesis. But if the input is syntactically correct, such a comma won't exist.
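For comparison, here is a rough sketch of the same check outside the database, written in Python. It is a variation on the approach above rather than a translation: instead of splitting at semicolons it scans the source call by call, and it assumes that argument lists contain no nested parentheses. The function name wrong_lookups is made up for illustration.

```python
import re

# Captures the name between "lookup." and "_id", plus the raw argument list.
LOOKUP_CALL = re.compile(r"lookup\.(\w+)_id\s*\(([^)]*)\)")


def wrong_lookups(source):
    """Return the lookup calls (other than msg_id) that lack key_type."""
    wrong = []
    for name, args in LOOKUP_CALL.findall(source):
        if name != "msg" and not re.search(r"\bkey_type\b", args):
            wrong.append("lookup.%s_id" % name)
    return wrong


source = """
curry := lookup.curry_id(key_val => 'CHF', key_type => 'asset_iso');
asset := lookup.asset_id(key_val => 'UBSN');
msg := lookup.msg_id(key_val => 'hello');
"""
print(wrong_lookups(source))
```

Because the exclusion of msg happens in ordinary code after the match, no negative lookbehind is needed.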

Writing an interpreter for ANTLR grammar

I've made a grammar for APL subset.
grammar APL;
program: (statement NEWLINE)*;
statement: thing;
assignment: variable LARR thing;
thing: simpleThing
| complexThing;
escapedThing: simpleThing
| '(' complexThing ')';
simpleThing: variable # ThingVariable
| number # ThingNumber
;
complexThing: unary # ThingUOp
| binary # ThingBOp
| assignment # ThingAssignment
;
variable: CAPITAL;
number: DIGITS;
unary: iota # UOpIota
| negate # UOpNegate
;
iota: SMALL_IOTA number;
negate: TILDA thing;
binary: drop # BOpDrop
| select # BOpSelect
| outerProduct # BOpOuterProduct
| setInclusion # BOpSetInclusion
;
drop: left=number SPIKE right=thing;
select: left=escapedThing SLASH right=thing;
outerProduct: left=escapedThing OUTER_PRODUCT_OP right=thing;
setInclusion: left=escapedThing '∊' right=thing;
NEWLINE: [\r\n]+;
CAPITAL: [A-Z];
CAPITALS: (CAPITAL)+;
DIGITS: [0-9]+;
TILDA: '~';
SLASH: '/';
// greek
SMALL_IOTA: 'ι' | '#i';
// arrows
LARR: '←' | '#<-';
SPIKE: '↓' | '#Iv';
OUTER_PRODUCT_OP: '∘.×' | '#o.#x';
Now I'd like to create an interpreter for it. I'm trying to use clj-antlr with Clojure. How do I do that?
As Jared314 pointed out, take a look at instaparse:
This is how you create a grammar:
(def as-and-bs
(insta/parser
"S = AB*
AB = A B
A = 'a'+
B = 'b'+"))
This is how you call it:
(as-and-bs "aaaaabbbaaaabb")
And here is the result with default formatting:
[:S
[:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
[:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]
While ANTLR is definitely doing a great job, in the Clojure world you can remove all the surrounding glue by using instaparse.
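Whichever parser produces the tree, the interpreter itself is usually a recursive walk that dispatches on the node tag. The sketch below shows that shape in Python rather than Clojure, with the result above transcribed by hand as nested lists. The "meaning" given to each node (counting letters) is invented purely for illustration and has nothing to do with APL.

```python
# The parse result shown above, transcribed as nested lists: [tag, child, ...]
tree = ["S",
        ["AB", ["A", "a", "a", "a", "a", "a"], ["B", "b", "b", "b"]],
        ["AB", ["A", "a", "a", "a", "a"], ["B", "b", "b"]]]


def evaluate(node):
    """Walk the tree, dispatching on the tag in the first position."""
    tag, *children = node
    if tag in ("A", "B"):
        return len(children)                    # leaf rule: count the letters
    if tag == "AB":
        a_count, b_count = (evaluate(c) for c in children)
        return (a_count, b_count)
    if tag == "S":
        return [evaluate(c) for c in children]  # one result per AB pair
    raise ValueError("unknown node: %r" % (tag,))


result = evaluate(tree)
print(result)
```

An interpreter for the grammar in the question would follow the same pattern, with one branch per rule (iota, negate, drop and so on).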

isnumeric() with PostgreSQL

I need to determine whether a given string can be interpreted as a number (integer or floating point) in an SQL statement. As in the following:
SELECT AVG(CASE WHEN x ~ '^[0-9]*.?[0-9]*$' THEN x::float ELSE NULL END) FROM test
I found that Postgres' pattern matching could be used for this. And so I adapted the statement given in this place to incorporate floating point numbers. This is my code:
WITH test(x) AS (
VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
('123.456'), ('abc'), ('1..2'), ('1.2.3.4'))
SELECT x
, x ~ '^[0-9]*.?[0-9]*$' AS isnumeric
FROM test;
The output:
x | isnumeric
---------+-----------
| t
. | t
.0 | t
0. | t
0 | t
1 | t
123 | t
123.456 | t
abc | f
1..2 | f
1.2.3.4 | f
(11 rows)
As you can see, the first two items (the empty string '' and the sole period '.') are misclassified as being a numeric type (which they are not). I can't get any closer to this at the moment. Any help appreciated!
Update: based on this answer (and its comments), I adapted the pattern to:
WITH test(x) AS (
VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
('123.456'), ('abc'), ('1..2'), ('1.2.3.4'), ('1x234'), ('1.234e-5'))
SELECT x
, x ~ '^([0-9]+[.]?[0-9]*|[.][0-9]+)$' AS isnumeric
FROM test;
Which gives:
x | isnumeric
----------+-----------
| f
. | f
.0 | t
0. | t
0 | t
1 | t
123 | t
123.456 | t
abc | f
1..2 | f
1.2.3.4 | f
1x234 | f
1.234e-5 | f
(13 rows)
There are still some issues with the scientific notation and with negative numbers, as I see now.
As you may have noticed, a regex-based method is almost impossible to get right. For example, your test says that 1.234e-5 is not a valid number, when it really is. You also missed negative numbers. And what if something looks like a number, but storing it would cause an overflow?
Instead, I would recommend creating a function that actually tries to cast to NUMERIC (or FLOAT if your task requires it) and returns TRUE or FALSE depending on whether the cast succeeded.
This code will fully simulate an ISNUMERIC() function:
CREATE OR REPLACE FUNCTION isnumeric(text) RETURNS BOOLEAN AS $$
DECLARE x NUMERIC;
BEGIN
x = $1::NUMERIC;
RETURN TRUE;
EXCEPTION WHEN others THEN
RETURN FALSE;
END;
$$
STRICT
LANGUAGE plpgsql IMMUTABLE;
Calling this function on your data gives the following results:
WITH test(x) AS ( VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
('123.456'), ('abc'), ('1..2'), ('1.2.3.4'), ('1x234'), ('1.234e-5'))
SELECT x, isnumeric(x) FROM test;
x | isnumeric
----------+-----------
| f
. | f
.0 | t
0. | t
0 | t
1 | t
123 | t
123.456 | t
abc | f
1..2 | f
1.2.3.4 | f
1x234 | f
1.234e-5 | t
(13 rows)
Not only is it more correct and easier to read, it will also work faster if the data actually is a number.
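The same "try the conversion and catch the failure" idea can be sketched outside the database. The Python version below uses decimal.Decimal as a stand-in for NUMERIC; the two parsers are not guaranteed to accept exactly the same inputs, so treat this as an illustration of the technique rather than a replacement for the function above.

```python
from decimal import Decimal, InvalidOperation


def isnumeric(text):
    """Return True if the text can be converted, False if conversion fails."""
    try:
        Decimal(text)
        return True
    except InvalidOperation:
        return False


samples = ["", ".", ".0", "0.", "0", "1", "123", "123.456",
           "abc", "1..2", "1.2.3.4", "1x234", "1.234e-5"]
results = {s: isnumeric(s) for s in samples}
for s in samples:
    print(repr(s), results[s])
```

Only InvalidOperation is caught here, so unrelated errors are not silently swallowed.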
Your problem is the two "0 or more" [0-9] elements on each side of the decimal point. You need to use a logical OR | in the number identification line:
~'^([0-9]+\.?[0-9]*|\.[0-9]+)$'
This will exclude a decimal point alone as a valid number.
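A quick way to check that pattern against the sample values from the question is to run it through another regex engine. The sketch below uses Python's re module, on the assumption that it treats these basic constructs the same way PostgreSQL does.

```python
import re

# The pattern from the answer above, copied verbatim.
PATTERN = re.compile(r"^([0-9]+\.?[0-9]*|\.[0-9]+)$")

samples = ["", ".", ".0", "0.", "0", "1", "123", "123.456",
           "abc", "1..2", "1.2.3.4", "1x234", "1.234e-5"]

# fullmatch avoids Python's habit of letting $ match before a trailing newline.
accepted = [s for s in samples if PATTERN.fullmatch(s)]
print(accepted)
```

The empty string and the lone period are rejected, and so is 1.234e-5, since the pattern has no exponent part.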
I suppose one could have that opinion (that it's not a misuse of exception handling), but generally I think that an exception handling mechanism should be used just for that. Testing whether a string contains a number is part of normal processing, and isn't "exceptional".
But you're right about not handling exponents. Here's a second stab at the regular expression (below). The reason I had to pursue a solution that uses a regular expression was that the solution offered as the "correct" solution here will fail when the directive is given to exit when an error is encountered:
SET exit_on_error = true;
We use this often when groups of SQL scripts are run, and when we want to stop immediately if there is any issue/error. When this session directive is given, calling the "correct" version of isnumeric will cause the script to exit immediately, even though there's no "real" exception encountered.
create or replace function isnumeric(text) returns boolean
immutable
language plpgsql
as $$
begin
if $1 is null or rtrim($1)='' then
return false;
else
return (select $1 ~ '^ *[-+]?[0-9]*([.][0-9]+)?[0-9]*(([eE][-+]?)[0-9]+)? *$');
end if;
end;
$$;
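Before relying on a pattern like this, it is worth probing a few edge cases. The sketch below reproduces the function's logic in Python (again assuming the two regex engines agree on these constructs) and tries some inputs that were not in the original test data.

```python
import re

# The pattern from the function above, copied verbatim.
PATTERN = re.compile(
    r"^ *[-+]?[0-9]*([.][0-9]+)?[0-9]*(([eE][-+]?)[0-9]+)? *$")


def isnumeric(text):
    """Mirror of the PL/pgSQL function: NULL and blank strings are rejected."""
    if text is None or text.rstrip(" ") == "":
        return False
    return PATTERN.fullmatch(text) is not None


probes = ["1.234e-5", "-12", " 1.5 ", "0.", "+", "e5", "abc", "1..2"]
outcome = {p: isnumeric(p) for p in probes}
for p in probes:
    print(repr(p), outcome[p])
```

In this run a lone "+" and a bare "e5" are accepted while "0." is rejected, because every digit group in the pattern is optional and a trailing period requires digits after it.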
Since PostgreSQL 9.5 (2016) you can just ask the type of a json field:
jsonb_typeof(field)
From the PostgreSQL documentation:
json_typeof(json)
jsonb_typeof(jsonb)
Returns the type of the outermost JSON value as a text string. Possible types are object, array, string, number, boolean, and null.
Example
When aggregating numbers and wanting to ignore strings:
SELECT m.title, SUM(m.body::numeric)
FROM messages as m
WHERE jsonb_typeof(m.body) = 'number'
GROUP BY m.title;
Without WHERE the ::numeric part would crash.
The obvious problem with the accepted solution is that it is an abuse of exception handling. If there's another problem encountered, you'll never know it because you've tossed away the exceptions. Very bad form. A regular expression would be the better way to do this. The regex below seems to behave well.
create function isnumeric(text) returns boolean
immutable
language plpgsql
as $$
begin
if $1 is not null then
return (select $1 ~ '^(([-+]?[0-9]+(\.[0-9]+)?)|([-+]?\.[0-9]+))$');
else
return false;
end if;
end;
$$
;