Pattern matching a 2D array with tuples - OCaml

I have a 2D list of type (string * int) list list that looks as follows:
[ [("size1", 1);("size2",2)] ; [("size3",3);("size4",4)] ]
When given a string name, I want to return the int associated with that name
"size1" -> 1
"size2" -> 2
"size4" -> 4
I have an idea of how I would do this using the List module, but how would I do something like this using pattern matching?

An OCaml pattern essentially matches a single constructor of a type, with subpatterns in the places of the constructed values. For a list, this means you need to pick the number of :: constructors you want to have in your pattern. Hence there's no OCaml pattern that matches at an arbitrary place in a list. So there's no way to solve your problem with just pattern matching (if I understand what you're trying to do).
If you know the maximum length of your lists you can enumerate all possible lengths, but somehow I doubt this is what you want to do.
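To make that concrete, here is a sketch of the usual idiom in OCaml: pattern matching takes the list apart one :: at a time, while recursion supplies the "search at an arbitrary position" part. The function name find_size is just illustrative.

let rec find_size name rows =
  match rows with
  | [] -> None                          (* nothing left to search *)
  | [] :: rest -> find_size name rest   (* this row is exhausted *)
  | ((k, v) :: row) :: rest ->
      if k = name then Some v
      else find_size name (row :: rest)

(* find_size "size2" [ [("size1", 1); ("size2", 2)]; [("size3", 3); ("size4", 4)] ]
   evaluates to Some 2. *)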
My way of looking at it is that OCaml patterns have a very clean and disciplined structure, which allows them to be implemented very efficiently. Other languages (like Perl, say) have very powerful patterns that can't really be implemented efficiently; they can require more or less arbitrary amounts of retrying of different subpatterns. Naturally, being an OCaml guy, I much prefer the OCaml approach.

What is the name of the data structure for and-or-lists (or and-or-trees) and where can I read about it?

I recently needed to make a data structure which was a nested list of and/or questions. Since almost every interesting thing has been discovered by someone else previously, I'm looking for the name of this data structure. It looks something like this:
‘((a b c) (b d e) (c (a b) (f a)))
The interpretation is that I want to find abc or bde or caf or caa or cbf or cba, and the list encapsulates that. At the top level each item is or'ed together, sub-lists of the top level are and'ed together, sub-lists of sub-lists are or'ed again, sub-lists of those are and'ed, and sub-lists of those or'ed, ad infinitum. Note that in my example all the lists are the same length; in my real application the lists vary in length.
The code to walk such a “tree” is relatively simple, but I’m assuming that there is a name for that type of tree and there is stuff I can read about it.
These lists are equivalent to fixed-length regular expressions (which I've seen referred to as "network expressions"), but I am particularly interested in this data structure and its representation.
In general (at a very high level of abstraction) it is a context-free grammar (Wiki).
If you allow it to be nested infinitely, then it is not a regular expression, because of the presence of parentheses (left and right must match).
If you consider the expressions inside parentheses to be ordered, i.e. that a and b and c is equivalent to (a and b) and c, then you get a binary expression tree (Wiki).
But for your particular case, it is probably disjunctive normal form (Wiki).
I am not sure, but my intuition says that it is a regular expression again, because you have only two levels of nesting (the first for the or'ed parts and the second for the and'ed parts).
The trees are also a subset of DAWGs (directed acyclic word graphs), and one could construct them the same way.
In my case, I have a very small set that I have built by hand, and I don't worry about getting the minimal set; I just want something that I can easily write down but that deals with the kinds of simple variations I see. Basically, I have different ways of finding where I keep my .el files based upon the different directory structures of the various OSes I use. (E.g. when I was working at Google, the /usr/local/emacs/site-lisp directory was actually more like /usr/local/Google/emacs/site-lisp.)
I don't need a full regex, but there are about a dozen variations, some having quite long lists of nested sub-directories (c:\users\cfclark\appData\roaming\emacs.emacs.d or some other awful thing) that I wanted to write down (and then have Emacs do an automated search to find the one that is appropriate to this particular installation). And every time I go to a new job, I can simply add to the list a description of where they are in that setup.
Anyway, as that code has evolved, I found that I was doing nested or's and and's, and realized that the structure generalized to the alternating or/and/or/and/... case. So my assumption is that someone must have discovered this before; I had hints of it myself several years ago but didn't sit down to implement it. The Disjunctive Normal Form link mpasko256 gave is also particularly relevant. I don't normalize to that level, I still keep nested and's and or's rather than flattening to two levels, but I do have a distinct structure: or's at the top, then and's, then or's....
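For what it's worth, the alternating or/and structure can also be written down directly as a recursive type. Below is a minimal OCaml sketch (the type and function names are invented for illustration); expanding a tree into its flat alternatives is essentially the disjunctive-normal-form reading mentioned above.

(* An Or node offers alternatives, an And node is a sequence, and the two
   alternate as you descend. *)
type 'a and_or =
  | Leaf of 'a
  | Or of 'a and_or list    (* pick exactly one child *)
  | And of 'a and_or list   (* take every child, in order *)

(* Expand a tree into the flat sequences it denotes. *)
let rec expand = function
  | Leaf x -> [ [ x ] ]
  | Or branches -> List.concat_map expand branches
  | And parts ->
      List.fold_left
        (fun acc part ->
          List.concat_map
            (fun prefix -> List.map (fun suffix -> prefix @ suffix) (expand part))
            acc)
        [ [] ] parts

(* The example '((a b c) (b d e) (c (a b) (f a))) would be encoded as
     Or [ And [Leaf "a"; Leaf "b"; Leaf "c"];
          And [Leaf "b"; Leaf "d"; Leaf "e"];
          And [Leaf "c"; Or [Leaf "a"; Leaf "b"]; Or [Leaf "f"; Leaf "a"]] ]
   and expand returns the sequences abc, bde, caf, caa, cbf, cba. *)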

Erlang: Printing a List with a name always in front of it

I just started learning Erlang, so please bear with me if this question seems a little simple. I've been thinking about this for a while, but nothing I come up with seems to work.
I am writing an Erlang function that is supposed to take a list as an argument then print the list with my name in front of it. For the purposes of this question, let's say my name is "James".
If I type in testmodule:NameInFront("Legible", "Hey", "Think").
Erlang should return ["James", "Legible", "Hey", "Think"]
This is the code I have so far:
-module(testmodule).
-export([NameInFront/1]).
NameInFront(List)-> ["James"]++[List].
It works just fine when I type in just one word, which I guess is the fault of the NameInFront/1 part, but I want it to be able to handle any number of words I type in. Does anyone know how I can get my function to handle multiple inputs? Thank you very much.
I'm not quite sure what you mean: whether you want your function to be variadic (take a flexible number of arguments), or you are having trouble getting your lists to join together properly.
Variadic functions are not the way Erlang works. FunctionName/Arity defines the concrete identity of a function in Erlang (discussed here). So our way of having a function take multiple arguments is to make one (or more) of the arguments a list:
print_terms(Terms) -> io:format("~tp~n", [Terms]).
The io:format/2 function itself actually takes a list as its second argument, which is how it deals with a variable number of arguments:
print_two_things(ThingOne, ThingTwo) ->
io:format("~tp~n~tp~n", [ThingOne, ThingTwo]).
In your case you want to accept a list of things, add your name to it, and print it out. This is one way to do it:
name_in_front(ListOfStrings) ->
NewList = ["James" | ListOfStrings],
io:format("~p~n", [NewList]).
Using the ++ operator is another (it is actually different syntax for a recursive operation that expands to the exact same thing):
name_in_front(ListOfStrings) ->
NewList = ["James"] ++ ListOfStrings,
io:format("~tp~n", [NewList]).
But that's a little silly, because ++ is intended to join two lists together in a simple way, and in this case it makes the syntax look weird.
Yet another way would be to simply write a function that takes two arguments and accomplishes the same thing:
any_name_in_front(Name, ListOfThings) ->
io:format("~tp~n", [[Name | ListOfThings]]).
The double [[]] is because io:format/2 takes a list as its second argument, and you want to pass a list of one thing (itself a list) into a single format substitution slot (the "~tp" part).
One thing to note is that capitalization matters in Erlang; it has a meaning. Module and function names are atoms, which are not the same thing as variables. For this reason they must be lowercase, and because they must be lowercase to start with, the convention is to use underscores between words instead of usingCamelCase. Because, well, erlangIsNotCpp.
Play around in the shell a bit with the simple elements of the function you want, and once you have them ironed out write it into a source file and give it a try.

Splitting a mixed string/number argument list of a Lua function call in C++/Qt

I want to parse the argument list of a Lua function call in C++ using Qt (4.8), in order to avoid a dependency on the Lua interpreter. The comma-separated argument list can be assumed to consist only of string literals and numbers. Eventually the result should be available as a QStringList. The tricky part is coping with commas that are part of string arguments, as well as with the fact that string arguments may use single or double quotes. Until I come up with a solution (using regular expressions) myself, perhaps somebody has already dealt with this or a similar problem.
Example:
The argument list string
"Foo", "not 'bar'", 'a, b ,c', 42, 1e-8
should be transformed to a string list containing the items
Foo, not 'bar', a, b, c, 42 and 1e-8
(omitting the quotes per item to avoid confusion)
I'm not familiar with all the possible forms of your arguments, but the examples you mentioned get correctly matched with this: (?<=")[\w',-]*?(?=")|(?<=^'|\s').*(?='(?:,|$))|[\w-]+, as seen here: https://regex101.com/r/rX7fX7/3
The idea is that you write the "difficult" situations as alternations, preferably to the left, and the less difficult ones to the right. This way, the engine will first check whether a problem situation is present before trying to match whole words.
The current regex doesn't work correctly if quotes/double quotes appear in the middle of the arguments, but your examples didn't include such cases.
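If the regex becomes hard to maintain, another option (not what was asked for, just an alternative sketched here) is a small hand-written splitter that walks the string once, tracks whether it is inside a single- or double-quoted literal, and splits on commas only at the top level. The sketch below is in OCaml rather than C++/Qt purely to show the state machine; the names are illustrative and escape sequences inside strings are not handled.

let split_args (s : string) : string list =
  let buf = Buffer.create 16 in
  let emit acc =
    let a = String.trim (Buffer.contents buf) in
    Buffer.clear buf;
    if a = "" then acc else a :: acc
  in
  let rec go i quote acc =
    if i >= String.length s then List.rev (emit acc)
    else
      let c = s.[i] in
      match quote with
      | Some q when c = q -> go (i + 1) None acc            (* closing quote *)
      | Some _ -> Buffer.add_char buf c; go (i + 1) quote acc
      | None ->
          (match c with
           | '\'' | '"' -> go (i + 1) (Some c) acc          (* opening quote *)
           | ',' -> go (i + 1) None (emit acc)              (* top-level comma *)
           | _ -> Buffer.add_char buf c; go (i + 1) None acc)
  in
  go 0 None []

(* split_args {|"Foo", "not 'bar'", 'a, b ,c', 42, 1e-8|}
   gives ["Foo"; "not 'bar'"; "a, b ,c"; "42"; "1e-8"] *)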

How to split a simple Lisp-like code to tokens in C++?

Basically, the language has 3 list types and 3 fixed-length types, one of which is string.
Detecting the type of a token is simple using regular expressions, but splitting the input into tokens is not that trivial.
A string is written with double quotes, and a double quote is escaped with a backslash.
EDIT:
Some example code
{
print (sum (1 2 3 4))
if [( 2 + 3 ) < 6] : {print ("Smaller")}
}
Lists:
() are argument lists that are only evaluated when necessary.
[] are special lists to express two-operand operations in a prettier way.
{} are lists that are always evaluated. The first element is a function name, the second is a list of arguments, and this repeats.
anything : anything [ : anything [: ...]] translates to an argument list that has the elements joined by the :s. This is only for making loops and conditionals look better.
All functions take a single argument. Argument lists can be used for functions that need more. You can force an argument list to evaluate by using different types of eval functions. (There would be an eval function for each list model.)
So, if you understand this, it works very much like Lisp does; it only has different list types for prettifying the code.
EDIT:
#rici
[[2 + 3] < 6] is OK too. As I mentioned, argument lists are evaluated only when necessary. Since < is a function that requires an argument list of length 2, (2 + 3) must be evaluated somehow; otherwise [(2 + 3) < 6] would translate to < (2 + 3) : 6, which equals < (2 + 3 6), which is an invalid argument list for <. But I see your point: it's not trivial how automatic parsing should work in this case. The version I described above is that [...] evaluates its argument list with a function like eval_as_oplist (...). But I guess you are right, because this way you couldn't use an argument list in the regular way inside a [...], which is problematic even if you have no reason to do so, because it doesn't lead to better code. So [[. . .] . .] is better code, I agree.
Rather than inventing your own "Lisp-like, but simpler" language, you should consider using an existing Lisp (or Scheme) implementation and embedding it in your C++ application.
Although designing your own language and then writing your own parser and interpreter for it is surely good fun, you will have a hard time coming up with something better designed, more powerful, and more efficiently and robustly implemented than, say, Scheme and its numerous implementations.
Chibi Scheme: http://code.google.com/p/chibi-scheme/ is particularly well suited for embedding in C/C++ code, it's very small and fast.
I would suggest using Flex (possibly with Bison) or ANTLR, which has a C++ output target.
Since Google is simpler than finding stuff on my own file server, here is someone else's example:
http://ragnermagalhaes.blogspot.com/2007/08/bison-lisp-grammar.html
This example has formatting problems (which can be resolved by viewing the HTML in a text editor) and only supports one type of list, but it should help you get started and certainly shows how to split the items into tokens.
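In case it helps to see the shape of a hand-written tokenizer before reaching for a generator, here is a rough sketch of just the splitting step. It is written in OCaml rather than C++ purely for brevity (the character-level state machine is the same in either language), and the type and function names are made up for illustration. It handles the three bracket pairs, the : operator, double-quoted strings with backslash escapes, and bare atoms.

type token =
  | LParen | RParen          (* ( )  argument lists *)
  | LBrack | RBrack          (* [ ]  infix-style lists *)
  | LBrace | RBrace          (* { }  always-evaluated lists *)
  | Colon                    (* the : joining operator *)
  | Str of string            (* "..." with backslash escapes *)
  | Atom of string           (* numbers, names, operators *)

let tokenize (src : string) : token list =
  let n = String.length src in
  let buf = Buffer.create 16 in
  let rec go i acc =
    if i >= n then List.rev acc
    else match src.[i] with
      | ' ' | '\t' | '\n' | '\r' -> go (i + 1) acc
      | '(' -> go (i + 1) (LParen :: acc)
      | ')' -> go (i + 1) (RParen :: acc)
      | '[' -> go (i + 1) (LBrack :: acc)
      | ']' -> go (i + 1) (RBrack :: acc)
      | '{' -> go (i + 1) (LBrace :: acc)
      | '}' -> go (i + 1) (RBrace :: acc)
      | ':' -> go (i + 1) (Colon :: acc)
      | '"' -> string_lit (i + 1) acc
      | _ -> atom i acc
  and string_lit i acc =
    (* inside a string literal: a backslash escapes the next character *)
    if i >= n then failwith "unterminated string"
    else match src.[i] with
      | '\\' when i + 1 < n ->
          Buffer.add_char buf src.[i + 1]; string_lit (i + 2) acc
      | '"' ->
          let s = Buffer.contents buf in
          Buffer.clear buf; go (i + 1) (Str s :: acc)
      | c -> Buffer.add_char buf c; string_lit (i + 1) acc
  and atom i acc =
    (* an atom runs until whitespace, a bracket, a colon or a quote *)
    if i < n && not (String.contains " \t\n\r()[]{}:\"" src.[i]) then begin
      Buffer.add_char buf src.[i]; atom (i + 1) acc
    end else begin
      let s = Buffer.contents buf in
      Buffer.clear buf; go i (Atom s :: acc)
    end
  in
  go 0 []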
I believe Boost.Spirit would be suitable for this task, provided you could construct a PEG-compatible grammar for the language you're proposing. It's not obvious from the examples whether this is the case.
More specifically, Spirit has a generalized AST called utree, and there is example code for parsing symbolic expressions (i.e. Lisp syntax) into utree.
You don't have to use utree in order to take advantage of Spirit's parsing and lexing capabilities, but you would have to have your own AST representation. Maybe that's what you want?

Automatic regex builder

I have N strings.
Also, there are K regular expressions, unknown to me. Each string is either matching one of the regular expressions, or it is garbage. There are total of L garbage strings in the set. Both K and L are unknown.
I'd like to deduce the regular expressions. Obviously, this problem has an infinite number of solutions. I need to find a "reasonably good solution", which
1) minimizes K
2) minimizes L
3) maximizes the "specificity" of the regular expressions. I don't know what the right term for this quality is. For example, the string "ab123" can be described as /ab\d+/ or /\w+.+/, but the first regex is more "specific".
All 3 requirements need to be taken as one compound criteria, with certain reasonable weights.
A solution for one particular case: If L = 0 and K = 1 (just one regex, and no garbage), then we can just find LCS (longest common subsequence) for the strings and come up with a corresponding regex from there. However, when we have "noise" (L > 0), this approach doesn't work.
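To make that special case concrete, here is a minimal OCaml sketch, assuming the "corresponding regex" simply keeps the common subsequence literal and allows a wildcard in each gap. The names are illustrative, and folding lcs pairwise yields a common subsequence of every string but not necessarily the longest one (the exact multi-string LCS is much harder).

(* Classic dynamic-programming LCS of two strings. *)
let lcs (a : string) (b : string) : string =
  let m = String.length a and n = String.length b in
  let t = Array.make_matrix (m + 1) (n + 1) "" in
  for i = 1 to m do
    for j = 1 to n do
      t.(i).(j) <-
        if a.[i - 1] = b.[j - 1] then
          t.(i - 1).(j - 1) ^ String.make 1 a.[i - 1]
        else if String.length t.(i - 1).(j) >= String.length t.(i).(j - 1) then
          t.(i - 1).(j)
        else t.(i).(j - 1)
    done
  done;
  t.(m).(n)

(* A crude regex for the whole set: the LCS characters kept literal, with
   ".*" allowed between them (escaping of metacharacters is omitted). *)
let regex_of_strings = function
  | [] -> ".*"
  | s :: rest ->
      let common = List.fold_left lcs s rest in
      let parts =
        List.init (String.length common) (fun i -> String.make 1 common.[i])
      in
      ".*" ^ String.concat ".*" parts ^ ".*"

(* regex_of_strings ["ab123"; "ab456"; "xab79"] gives ".*a.*b.*";
   refining the wildcards into classes such as \d+ is the "specificity" step. *)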
Any ideas (or pointers to existing work) are greatly appreciated.
What you are trying to do is language learning or language inference with a twist: instead of generalising over a set of given examples (and possibly counter-examples), you wish to infer a language with a small yet specific grammar.
I'm not sure how much research is being done on that. However, if you are also interested in finding the minimal (= general) regular expression that accepts all n strings, search for papers on MDL (Minimum Description Length) and FSMs (Finite State Machines).
Two interesting queries at Google Scholar:
"minimum description length" automata
"language inference" automata
The key words in academia are "grammatical inference". Unfortunately, there aren't any efficient, general algorithms to do the sort of thing you're proposing. What's your real problem?
Edit: it sounds like you might be interested in Data Description Languages. PADS (http://www.padsproj.org/) is a typical example.
Nothing clever here; perhaps I don't fully understand the problem?
Why not just always reduce L to 0? Check each string against each regex; if a string doesn't match any of the regexes, it's garbage. If it does match, remember the regex/string(s) that matched and do LCS on each group (as in the L = 0, K = 1 case) to deduce each regex's definition.