Formatting numbers with thousand separators - ocaml

Is there anything in the standard library or in Core that I can use to format integers with thousand separators?

Unfortunately there is nothing built in, except that you can use the %a format specifier and provide your own pretty-printer.
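As a rough sketch of the %a approach, assuming the plain standard library: the names group_thousands and pp_thousands below are made up for illustration and are not part of the standard library or Core. With Printf.sprintf, a %a printer is a function of type unit -> 'a -> string.

```ocaml
(* Hypothetical helper: insert [sep] between groups of three digits. *)
let group_thousands ?(sep = ',') (n : int) : string =
  let s = string_of_int n in
  (* Work on the decimal string so that min_int is handled too. *)
  let negative = String.length s > 0 && s.[0] = '-' in
  let digits = if negative then String.sub s 1 (String.length s - 1) else s in
  let len = String.length digits in
  let buf = Buffer.create (len + (len / 3) + 1) in
  if negative then Buffer.add_char buf '-';
  String.iteri
    (fun i c ->
      (* A separator goes before every digit whose distance to the end
         is a multiple of three, except before the first digit. *)
      if i > 0 && (len - i) mod 3 = 0 then Buffer.add_char buf sep;
      Buffer.add_char buf c)
    digits;
  Buffer.contents buf

(* Printer usable with %a in Printf.sprintf. *)
let pp_thousands () (n : int) : string = group_thousands n

let () = print_endline (Printf.sprintf "=> %a" pp_thousands 1000000)
(* prints "=> 1,000,000" *)
```

The optional ~sep argument lets the same helper produce other groupings, for example group_thousands ~sep:'.' 1000000.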

You could use a %#d format to print an integer using underscores as separators (following OCaml lexical conventions):
# Printf.sprintf "=> %#d" 1000000;;
- : string = "=> 1_000_000"
And then replace underscores with commas:
# Printf.sprintf "=> %#d" 1000000 |> String.map (function '_' -> ',' | char -> char);;
- : string = "=> 1,000,000"
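One caveat with mapping over the whole formatted string is that any underscore in the surrounding text gets replaced as well. A minimal sketch that avoids this, assuming a compiler whose Printf supports %#d as shown above, formats the number on its own first (with_commas is a made-up name):

```ocaml
(* Hypothetical helper: format only the integer, then swap the separators. *)
let with_commas (n : int) : string =
  Printf.sprintf "%#d" n |> String.map (function '_' -> ',' | c -> c)

(* The underscore in the label is left untouched. *)
let () = Printf.printf "total_count => %s\n" (with_commas 1000000)
```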

Related

perl6 regex: match all punctuations except . and "

I read some threads on matching "X except Y", but none specific to perl6. I am trying to match and replace all punctuation except . and "
> my $a = ';# -+$12,678,93.45 "foo" *&';
;# -+$12,678,93.45 "foo" *&
> my $b = $a.subst(/<punct - [\.\"]>/, " ", :g);
===SORRY!===
Unrecognized regex metacharacter - (must be quoted to match literally)
------> my $b = $a.subst(/<punct⏏ - [\.\"]>/, " ", :g);
Unrecognized regex metacharacter (must be quoted to match literally)
------> my $b = $a.subst(/<punct -⏏ [\.\"]>/, " ", :g);
Unable to parse expression in metachar:sym<assert>; couldn't find final '>' (corresponding starter was at line 1)
------> my $b = $a.subst(/<punct - ⏏[\.\"]>/, " ", :g);
> my $b = $a.subst(/<punct-[\.\"]>/, " ", :g);
===SORRY!=== Error while compiling:
Unable to parse expression in metachar:sym<assert>; couldn't find final '>' (corresponding starter was at line 1)
------> my $b = $a.subst(/<punct⏏-[\.\"]>/, " ", :g);
expecting any of:
argument list
term
> my $b = $a.subst(/<punct>-<[\.\"]>/, " ", :g);
===SORRY!===
Unrecognized regex metacharacter - (must be quoted to match literally)
------> my $b = $a.subst(/<punct>⏏-<[\.\"]>/, " ", :g);
Unable to parse regex; couldn't find final '/'
------> my $b = $a.subst(/<punct>-⏏<[\.\"]>/, " ", :g);
> my $b = $a.subst(/<- [\.\"] + punct>/, " ", :g); # $b is blank space, not what I want
> my $b = $a.subst(/<[\W] - [\.\"]>/, " ", :g);
12 678 93.45 "foo"
# this works, but it is clumsy; I want to
# elegantly say: punctuation except \. and \"
# using the predefined class <punct>;
What is the best approach?
I think the most natural solution is to use a "character class arithmetic expression". This entails using + and - prefixes on any number of either Unicode properties or [...] character classes:
#;# -+$12,678,93.45 "foo" *&
<+:punct -[."]> # +$12 678 93.45 "foo"
This can be read as "the class of characters that have the Unicode property punct minus the . and " characters".
Your input string includes + and $. These are not considered "punctuation" characters. You could explicitly add them to the set of characters being replaced by spaces:
<:punct +[+$] -[."] > # 12 678 93.45 "foo"
(I've dropped the initial + before :punct. If you don't write a + or - for the first item in a character class arithmetic expression then + is assumed.)
There's a Unicode property that covers all "symbols" including + and $ so you could use that instead:
<:punct +:symbol -[."] > # 12 678 93.45 "foo"
To recap, you can combine any number of:
Unicode properties like :punct that start with a : and correspond to some character property specified by Unicode; or
[...] character classes that enumerate specific characters, backslash character classes (like \d), or character ranges (eg a..z).
If an overall <...> assertion is to be a character class arithmetic expression then the first character after the opening < must be one of four characters:
: introducing a Unicode property (eg <:punct ...>);
[ introducing a [...] character class (eg <[abc ...>);
+ or -. This may be followed by spaces. It must then be followed by either a Unicode property (:foo) or a [...] character class (eg <+ :punct ...>).
Thereafter each additional property or character class in the same overall character class arithmetic expression must be preceded by a + or - with or without additional spaces (eg <:punct - [."] ...>).
You can group sub-expressions in parentheses.
I'm not sure what the precise semantics of + and - are. I note this surprising result:
say $a.subst(/<-[."] +:punct>/, " ", :g); # substitutes ALL characters!?!
Built ins of the form <...> are not accepted in character class arithmetic expressions.
This is true even if they're called "character classes" in the doc. This includes ones that are nothing like a character class (eg <ident> is called a character class in the doc even though it matches a multi-character string that follows a particular pattern!) but also ones that look like character classes, such as <punct> or <digit>. (Many of the latter correspond directly to Unicode properties, so you can just use those instead.)
To use a backslash "character class" like \d in a character class arithmetic expression using + and - arithmetic you must list it within a [...] character class.
Combining assertions
While <punct> can't be combined with other assertions using character class arithmetic it can be combined with other regex constructs using the & regex conjunction operator:
<punct> & <-[."]> # +$12 678 93.45 "foo"
Depending on the state of compiler optimization (and as of 2019 there's been almost no effort applied to the regex engine), this will be slower in general than using real character classes.

regex remove punct removes non-punctuation characters in R

While filtering and cleaning text in Hebrew, I found that
gsub("[[:punct:]]", "", txt)
actually removes a relevant character. The character is "ק" and it is located in the "E" spot on the keyboard. Interestingly, the gsub function in R removes the "ק" character and then all words get messed up. Does anyone have an idea why?
According to Regular Expressions as used in R:
Certain named classes of characters are predefined. Their
interpretation depends on the locale (see locales); the interpretation
below is that of the POSIX locale.
According to the POSIX locale, [[:punct:]] should capture ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~. So, you might need to adjust your regex to remove only the characters you want:
txt <- "!\"#$%&'()*+,\\-./:;<=>?@[\\\\^\\]_`{|}~"
gsub("[\\\\!\"#$%&'()*+,./:;<=>?@[\\^\\]_`{|}~-]", "", txt, perl = TRUE)
Sample program output:
[1] ""

Is there a way to use whitespace in BNFC?

How do you use whitespace in a BNFC definition?
For example, suppose I want to produce a parser for the lambda calculus where I allow a list of variables to be abstracted:
\x y z.x z (y z)
The "obvious" thing to do is use a labeled rule like:
ListAbs . Exp ::= "\\" [Ident] "." Exp ;
separator Ident " "
However, BNFC defaults to stripping whitespace, so that does not work. What does work is using a comma separator. A bit uglier, but I could live with it... Still it would be nice to be able to separate by space.
Is there a whitespace character class in BNFC?
You can declare the empty string as separator:
separator Ident ""
In practice this lets you use whitespace (any whitespace character) as the separator:
$ cat test.cf
A . A ::= [Ident] ;
separator Ident ""
$ bnfc -haskell -m test.cf
$ make
$ echo 'x y z' | ./Testtest
Parse Successful!
[Abstract Syntax]
A [Ident "x",Ident "y",Ident "z"]
[Linearized tree]
x y z

Convert punctuation to space

I have a bunch of strings with punctuation in them that I'd like to convert to spaces:
"This is a string. In addition, this is a string (with one more)."
would become:
"This is a string In addition this is a string with one more "
I can go through and do this manually with the stringr package (str_replace_all()) one punctuation symbol at a time (, / . / ! / ( / ) / etc. ), but I'm curious whether there's a faster way, presumably using regexes.
Any suggestions?
x <- "This is a string. In addition, this is a string (with one more)."
gsub("[[:punct:]]", " ", x)
[1] "This is a string In addition this is a string with one more "
See ?gsub for doing quick substitutions like this, and ?regex for details on the [[:punct:]] class, i.e.
‘[:punct:]’ Punctuation characters:
‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { |
} ~’.
Have a look at ?regex:
library(stringr)
str_replace_all(x, '[[:punct:]]',' ')
"This is a string In addition this is a string with one more "

Concatenate strings in ocaml with newline between them

I'd like to do something like this
String.concat '\n' [str1; str2 ... strn]
so I can print in a file. But ocaml doesn't allow me to do that. What can I do?
String.concat "\n" [str1; str2 ... strn]
works fine. The problem is that you used '\n', which is a character literal, not a string. Example:
# String.concat '\n' ["abc"; "123"];;
Error: This expression has type char but an expression was expected of type
string
# String.concat "\n" ["abc"; "123"];;
- : string = "abc\n123"
If you're using Jane Street's Base library as your standard library, you'll have to do it like so:
# #require "base";;
# open! Base;;
# String.concat ~sep:"\n" ["abc"; "123"];;
- : string = "abc\n123"
Jane Street really likes to take advantage of named arguments.
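Since the goal was to print the result to a file, here is a minimal sketch of that step. It assumes the plain standard library (not Base), and write_lines is a made-up helper name:

```ocaml
(* Hypothetical helper: write each string of [lines] on its own line
   of the file at [path]. *)
let write_lines (path : string) (lines : string list) : unit =
  let oc = open_out path in
  (* String.concat takes the separator as a string, hence "\n" and not '\n'. *)
  output_string oc (String.concat "\n" lines);
  (* Terminate the last line as well. *)
  output_char oc '\n';
  close_out oc

let () =
  (* A temporary file is used here only to keep the example self-contained. *)
  let path = Filename.temp_file "lines" ".txt" in
  write_lines path ["abc"; "123"];
  print_endline ("wrote " ^ path)
```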