Given a basic record
type t = {a:string;b:string;c:string}
why does this code compile
let f t = match t with
{a;b;_} -> a
but this
let f t = match t with
{_;b;c} -> b
and
let f t = match t with
{a;_;c} -> c
does not? I'm asking this out of curiosity, hence the obviously useless code examples.
The optional _ field must be the last field. This is documented as a language extension in Section 7.2
Here's the production for reference:
pattern ::= ...
∣ '{' field ['=' pattern] { ';' field ['=' pattern] } [';' '_' ] [';'] '}'
Because the latter two examples are syntactically incorrect. The syntax only allows you to terminate your list of field patterns with the underscore, to notify the compiler that you are aware that there are more fields than you are trying to match. It is used to suppress a warning (that is disabled by default). Here is what the OCaml manual says about it:
Optionally, a record pattern can be terminated by ; _ to convey the fact that not all fields of the record type are listed in the record pattern and that it is intentional. By default, the compiler ignores the ; _ annotation. If warning 9 is turned on, the compiler will warn when a record pattern fails to list all fields of the corresponding record type and is not terminated by ; _.
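To see the warning in action, here is a minimal self-contained sketch (the record type is repeated so the snippet compiles on its own; run with warning 9 enabled, e.g. ocaml -w +9):

```ocaml
type t = {a:string; b:string; c:string}

(* With warning 9 enabled, this pattern warns that field c
   is not bound and the pattern is not terminated by ; _ *)
let f = function {a; b} -> a ^ b

(* Terminating the pattern with ; _ silences the warning *)
let g = function {a; b; _} -> a ^ b

let () = assert (f {a = "x"; b = "y"; c = "z"} = "xy")
let () = assert (g {a = "x"; b = "y"; c = "z"} = "xy")
```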
If you want to match to a name without binding it to a variable, then you should use the following syntax:
{a=_; b; c}
E.g.,
let {a=_; b; c} = {a="hello"; c="cruel"; b="world"};;
val b : string = "world"
val c : string = "cruel"
To add to the answers by Jeffrey Scofield and ivg, what the erroneous examples are trying to achieve can in fact be achieved by using a different order of fields. Like so:
let f t = match t with
{b;c;_} -> b
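Put together as a complete, compilable sketch (the record type is repeated so the snippet stands alone):

```ocaml
type t = {a:string; b:string; c:string}

(* b and c are bound; the omitted field a is covered by the trailing _ *)
let f t = match t with
  | {b; c; _} -> b ^ c

let () = assert (f {a = "x"; b = "y"; c = "z"} = "yz")
```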
This is a cross-post from TeX, but it did not get any answers there. And since I assume the problem has more to do with my understanding of regular expressions (or better, lack thereof) than with LaTeX itself, StackOverflow may have been the better place to ask to begin with.
I would like to use BibTool (which was written in C, if this is of any consequence here) to enclose some strings in a bib-file in curly braces. The test bib entry looks like this:
@Article{Cite1,
author = {Adelbert, A.},
date = {2020},
journaltitle = {A Journal},
title = {A title with just \textit{Test} structure and some chemistry \ce{CO2}},
number = {2},
pages = {1--4},
volume = {1},
}
I have created the following BibTool resource file:
resource {biblatex}
preserve.keys = on
preserve.key.case = on
rewrite.rule = {"\\\(.*{.*}\)" "{{\1}}"}
The rewrite.rule is supposed to be the following:
Find all strings within any field that start with \, like \ce{}, \textit{}, etc. This is done by the \\ at the beginning of the regular expression.
When this string is found, save the following in a group, denoted by \(\): an arbitrary string at the beginning, followed by {, an arbitrary string, and }; i.e. the string textit{Test}.
Write this string back into the same position, but enclose it in a double-set of curly braces "{{\1}}".
What it manages so far:
It apparently finds all commands starting with \.
It saves the strings and writes them back into the file.
So far, the code returns the following
@Article{Cite1,
Author = {Adelbert, A.},
Date = {2020},
JournalTitle = {A Journal},
Title = {A title with just {{textit{Test} structure and some chemistry {{ce{CO2}}}}}},
Number = {2},
Pages = {1--4},
Volume = {1},
}
You see it finds the strings and puts {{ at the beginning of each string. Unfortunately, it puts }} at the end of the field, not the string, so I now have 6 curly braces at the end of the title field. The braces do match, but two of them should be after {{textit{Test}, not at the very end. I tried various constructions like rewrite.rule = {"\\\(.*{.*}\)$" "{{\1}}"}, rewrite.rule = {"\\\(.*{.*}\) ?$" "{{\1}}"}, and rewrite.rule = {"\\\(.*{.*}\)*$" "{{\1}}"}, but none of them worked.
When trying to get the \ back at the beginning of the string, using rewrite.rule = {"\\\(.*{.*}\)" "{{\\\1}}"} I get the \ back, but also thousands of {} until I get a Rewrite limit exceeded error.
I am not very good with regular expressions and would be happy for any comments.
My approach would use two phases. In the first phase I would process the macro with one argument and replace the \ in the result with a placeholder representation (here ##). In the second phase I simply replace ## with \.
In BibTool this looks as follows:
rewrite.rule {"\\\(\([a-zA-Z]+\|.\){[^{}]*}\)" "{##\1}"}
rewrite.rule {"##" "\\"}
Note that, in general, the task depicted cannot be solved with regular expressions alone, since TeX groups can nest arbitrarily deeply and regular expressions cannot match balanced braces.
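The same two-phase idea can be sketched outside BibTool, for example with OCaml's Str library (link with str; the regex is adapted to Str's syntax, and ## is the same hypothetical placeholder). Note that Str.global_replace does not rescan its own replacements, whereas BibTool keeps re-applying rewrite rules, which is exactly why the placeholder trick is needed there:

```ocaml
(* Phase 1: wrap \macro{arg} in braces, hiding the backslash as ## *)
let phase1 s =
  Str.global_replace
    (Str.regexp {|\\\(\([a-zA-Z]+\|.\){[^{}]*}\)|})
    {|{##\1}|} s

(* Phase 2: restore the backslash ("\\\\" is a literal backslash
   in a Str replacement template) *)
let phase2 s = Str.global_replace (Str.regexp_string "##") {|\\|} s

let () =
  let title = {|A title with just \textit{Test} and \ce{CO2}|} in
  assert (phase2 (phase1 title)
          = {|A title with just {\textit{Test}} and {\ce{CO2}}|})
```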
By default, .* matches as many characters as possible. This is called 'greedy matching' in regex terms.
On hitting the first \, your pattern is likely matching the following:
\textit{Test} structure and some chemistry \ce{CO2}}
and replacing it with:
{{textit{Test} structure and some chemistry \ce{CO2}}}}
It then finds the next \ and replaces again:
\ce{CO2}}}} becomes {{ce{CO2}}}}}}
Total effect:
{A title with just \textit{Test} structure and some chemistry \ce{CO2}}
{A title with just {{textit{Test} structure and some chemistry {{ce{CO2}}}}}}
To change the behaviour, in most regex flavours you can put a ? after the quantifier (.*?) to make it 'lazy', that is, match the least amount of characters.
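Greediness is easy to observe. Here is a sketch using OCaml's Str library (link with str), which, like BibTool's engine, has no lazy quantifiers; in such engines the fix is to avoid .* in favour of something that cannot overrun, e.g. a negated character class like [^{}]*:

```ocaml
let () =
  let s = {|\textit{Test} structure and some chemistry \ce{CO2}|} in
  (* the asker's pattern, in Str syntax *)
  let r = Str.regexp {|\\\(.*{.*}\)|} in
  ignore (Str.search_forward r s 0);
  (* greedy .* runs to the last { and the last }, so everything after
     the first backslash is captured, not just textit{Test} *)
  assert (Str.matched_group 1 s
          = {|textit{Test} structure and some chemistry \ce{CO2}|})
```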
I have just started to learn OCaml and I find it difficult to extract a small list of chars from a bigger list of chars.
Let's say I have:
let list_of_chars = ['#' ; 'a' ; 'b' ; 'c'; ... ; '!' ; '3' ; '4' ; '5' ];;
I have the following knowledge: in the list above there is a '#' followed by a '!' somewhere further in the list.
I want to extract the lists ['a'; 'b'; 'c'; ...] and ['3'; '4'; '5'] and do something with them,
so I do the following:
let variable = match list_of_chars with
| '#'::l1@['!']@l2 -> (*[code to do something with l1 and l2]*)
| _ -> raise Exception ;;
This code doesn't work for me, it's throwing errors. Is there a simple way of doing this?
(specifically for using match)
As another answer points out, you can't use pattern matching for this because pattern matching only lets you use constructors, and @ is not a constructor.
Here is how you might solve your problem:
let split ~equal ~on list =
  let rec go acc = function
    | [] -> None
    | x::xs -> if equal x on then Some (List.rev acc, xs) else go (x::acc) xs
  in
  go [] list
let variable = match list_of_chars with
  | '#'::rest ->
    (match split rest ~on:'!' ~equal:Char.equal with
     | None -> raise Exception
     | Some (left, right) ->
       ... (* your code here *))
  | _ -> raise Exception
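A self-contained version of this approach, with the helper specialized to chars so it runs as-is (split_on_char_list is a hypothetical name):

```ocaml
(* Walk the list, accumulating elements until the separator is hit;
   return the prefix (reversed back into order) and the remainder *)
let split_on_char_list on list =
  let rec go acc = function
    | [] -> None
    | x :: xs ->
      if Char.equal x on then Some (List.rev acc, xs)
      else go (x :: acc) xs
  in
  go [] list

let () =
  match ['#'; 'a'; 'b'; 'c'; '!'; '3'; '4'; '5'] with
  | '#' :: rest ->
    (match split_on_char_list '!' rest with
     | Some (l1, l2) -> assert (l1 = ['a'; 'b'; 'c'] && l2 = ['3'; '4'; '5'])
     | None -> assert false)
  | _ -> assert false
```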
I’m now going to hypothesise that you are trying to do some kind of parsing or lexing. I recommend that you do not do it with a list of chars. Indeed, I think there is almost never a reason to have a list of chars in OCaml: a string is better for a string (a char list has a memory overhead of around 23x), and while one might use chars as a kind of mnemonic enum in C, OCaml has actual enums (aka variant types or sum types), so those should usually be used instead. I guess you might end up with a char list if you are doing something with a trie.
If you are interested in parsing or lexing, you may want to look into:
Ocamllex and ocamlyacc
Sedlex
Angstrom or another parser combinator library like it
One of the regular expression libraries (e.g. Re, Re2, Pcre; note that Re and Re2 are mostly unrelated)
Using strings and functions like lsplit2
@ is an operator, not a valid pattern. Patterns need to be static and can't match a varying number of elements in the middle of a list. But since you know the position of !, it doesn't need to be dynamic. You can accomplish it using just :::
let variable = match list_of_chars with
| '#'::a::b::c::'!'::l2 -> let l1 = [a;b;c] in ...
| _ -> raise Exception ;;
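For completeness, here is that fixed-position match run on a concrete list. This only works because exactly three elements sit between '#' and '!':

```ocaml
let variable =
  match ['#'; 'a'; 'b'; 'c'; '!'; '3'; '4'; '5'] with
  | '#' :: a :: b :: c :: '!' :: l2 -> ([a; b; c], l2)
  | _ -> failwith "unexpected shape"

let () = assert (variable = (['a'; 'b'; 'c'], ['3'; '4'; '5']))
```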
I have following regular expression: ((abc)+d)|(ef*g?)
I have created a DFA (I hope it is correct) which you can see here
http://www.informatikerboard.de/board/attachment.php?attachmentid=495&sid=f4a1d32722d755bdacf04614424330d2
The task is to create a regular grammar (Chomsky hierarchy Type 3) and I don't get it. But I created a regular grammar, which looks like this:
S → aT
T → b
T → c
T → dS
S → eT
S → eS
T → ε
T → f
T → fS
T → gS
Best Regards
Patrick
Type 3 grammars in the Chomsky hierarchy are the class of regular grammars, restricted to rules of the following forms:
X -> aY
X -> a
in which X is an arbitrary non-terminal and a an arbitrary terminal. The rule A -> ε is only allowed if A does not appear in any right-hand side.
Construction
We notice the regular expression consists of two alternatives, either (abc)+d or ef*g?; our first rules will therefore be S -> aT and S -> eP. These rules let us start building one of the two alternatives. Note that the two non-terminals are necessarily different: they correspond to disjoint paths in the corresponding automaton. Next we continue with both regexes separately:
(abc)+
We need at least one occurrence of the sequence abc, possibly repeated, and finally a d; it's not hard to see we can model this like this:
S -> aT
T -> bU
U -> cV
V -> aT # repeat pattern
V -> d # finish word
ef*g?
Here we have an e followed by zero or more f characters and an optional g. Since we already have the first character (one of the first two rules gave us that), we continue like this:
S -> eP
S -> e # from the starting state we can simply add an 'e' and be done with it,
# this is an accepted word!
P -> fP # keep adding f chars to the word
P -> f # add f and stop, if optional g doesn't occur
P -> g # stop and add a 'g'
Conclusion
Put these together and they will form a grammar for the language. I tried to write down the train of thought so you could understand it.
As an exercise, try this regex: (a+b*)?bc(a|b|c)*
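The construction above can be sanity-checked mechanically: each production X -> aY becomes "consume a, continue as Y", and X -> a becomes "consume a and stop". A sketch of that acceptor in OCaml (function names are mine):

```ocaml
(* Accepts exactly the language of ((abc)+d)|(ef*g?),
   following the productions S, T, U, V, P above. *)
let accepts s =
  let n = String.length s in
  let rec state_t i = i < n && s.[i] = 'b' && state_u (i + 1)  (* T -> bU *)
  and state_u i = i < n && s.[i] = 'c' && state_v (i + 1)      (* U -> cV *)
  and state_v i =                                              (* V -> aT | d *)
    i < n && ((s.[i] = 'a' && state_t (i + 1)) || (s.[i] = 'd' && i + 1 = n))
  in
  let rec state_p i =                                          (* P -> fP | f | g *)
    i < n && ((s.[i] = 'f' && (i + 1 = n || state_p (i + 1)))
              || (s.[i] = 'g' && i + 1 = n))
  in
  n > 0 && ((s.[0] = 'a' && state_t 1)                 (* S -> aT     *)
            || (s.[0] = 'e' && (n = 1 || state_p 1)))  (* S -> eP | e *)

let () =
  assert (List.for_all accepts ["abcd"; "abcabcd"; "e"; "ef"; "effg"; "eg"]);
  assert (not (List.exists accepts ["abc"; "d"; "efgf"; "abce"; ""]))
```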
I have the following lexer rules:
let ws = [' ' '\t' '\n']+
...
| ws {Printf.printf "%s" (Lexing.lexeme lexbuf); WS(Lexing.lexeme lexbuf)}
And the following parser rules:
%token <string> WORD WS
cs : LSQRB wsornon choices wsornon RSQRB {$2}
;
wsornon : /* nothing */
| WS {$1}
;
choices : choice {$1}
| choices choice {$2}
;
choice : CHOICE LCURLYB mainbody RCURLYB {$3}
;
I basically want to get wsornon to match with whitespace or nothing. But cs gives syntax errors for the case without whitespace (which corresponds to the empty rule).
Am I missing something?
Even if you parse the empty stream, you should have a production rule:
wsornon:
| { something for nothing }
| WS { something for whitespace }
Note that menhir has an option parametrized rule in its standard library that is just right for this kind of thing, so you don't have to write another rule for it. option(foo) returns a value of type bar option if rule foo returns something of type bar; since you are going to ignore the whitespace anyway, that wrapper makes little difference here.
If you want to ignore whitespace, why don't you drop it altogether at the lexing step? Is it useful somewhere else in your grammar? I'd rather hack the lexer a bit to emit a whitespace token only after those tokens where I know whitespace is important, than have it pollute my whole grammar. Of course, menhir allows you to define parametrized rules that could help with that (example below untested):
ws(rule):
| list(WS) result = rule list(WS) { result }
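For instance, with menhir's standard library the wsornon rule could be dropped entirely (an untested sketch of the cs rule from the question):

```
cs : LSQRB option(WS) choices option(WS) RSQRB { $3 }
   ;
```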
In my tokenizer (.lex) file I want to match the following pattern :
AaBC12/awD41/dfs21 etc...
I've written this rule
[A-Za-z]+[A-Za-z0-9]*[[/]+[A-Za-z][A-Za-z0-9]*]*
{lline = cpflineno;cpflval.str = strdup(cpftext);return K_IDENTIFIER;}
This rule seems correct to me but if i have an input like this :
TOP/MD1
TOP/MD2
TOP/MD2/D/E
My output is
TOP/MD1
TOP/MD2
TOP/MD2
/D/E
instead of
TOP/MD1
TOP/MD2
TOP/MD2/D/E
Could you tell me where my rule fails ?
What about this:
[A-Za-z]+[A-Za-z0-9]*([/]+[A-Za-z][A-Za-z0-9]*)*
Replaced [] with () where you meant a group.
Note that it will also match foo////bar; if you don't want that, remove the second + (and the first + for that matter too, since it's useless in this case).
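The fix can be checked quickly outside of lex, for example with OCaml's Str library (link with str), where groups are written \( \) but the pattern is otherwise unchanged:

```ocaml
let () =
  let r = Str.regexp "[A-Za-z]+[A-Za-z0-9]*\\([/]+[A-Za-z][A-Za-z0-9]*\\)*" in
  (* with () semantics for the group, the pattern now
     consumes the whole slash-separated identifier *)
  ignore (Str.search_forward r "TOP/MD2/D/E" 0);
  assert (Str.matched_string "TOP/MD2/D/E" = "TOP/MD2/D/E")
```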