List to String in Racket - regex

I've got a list defined like this:
(define testlist '((Dog <=> Cat)
                   (Anne <=> Dodd)))
Is there any way to turn (car testlist) into a string so I can use regexp on it to search for "<=>"?

Let me start with this extremely relevant Jamie Zawinski quote:
Some people, when confronted with a problem, think, “I know, I'll use regular expressions.” Now they have two problems.
You really really don't want to use regular expressions here. For one thing, a regexp-based solution will break when you have identifiers with <=> in the middle of them.
For another, it's really easy to solve this problem without using regular expressions.
There are a whole bunch of "right answers" here, depending on what exactly you're trying to do, but let me start by pointing out that you can use the "member" function to see whether a list contains the symbol '<=> :
#lang racket
(define testlist '((Dog <=> Cat)
                   (Anne <=> Dodd)))
(cond [(member '<=> (car testlist)) "yep"]
      [else "nope"])
I suspect that you're trying to parse these as logical equivalences, in which case you'll need to define the possible structures of the statements, and go from there, but let's just start by NOT USING REGULAR EXPRESSIONS :).

Related

Regex to extract S expression?

I'm wondering if it's possible to do a pass on parsing of a define expression in lisp with a single regular expression, for example with the following input:
#lang sicp
(define (square x) (* x x))
(define (average x y) (/ (+ x y) 2))
; using block scope
(define (sqrt x)
  ; x is always the same -- "4" or whatever we pass to it, so we don't need that
  ; in every single function that we define, we can just inherit it from above.
  (define (improve guess) (average guess (/ x guess)))
  (define (good-enough? guess) (< (abs (- (square guess) x)) 0.001 ))
  (define (sqrt-iter guess) (if (good-enough? guess) guess (sqrt-iter (improve guess))))
  (sqrt-iter 1.0)
)
(sqrt 4)
I would want to highlight the three top-level procedures above that start with define (none of the function-scoped procedures). The process I was thinking of (if I were to do it iteratively) would be:
1. Remove comments.
2. Grab the start of the define with \(\s*define
3. Consume balanced parentheses up until the unbalanced ) that finishes our procedure. For a regex, something like: (?:\([^)]*\))*, though I'm sure it gets much more complex with the greediness of the *'s.
And this wouldn't even take into account that I could have a string "( define )" that we'd also want to ignore.
Would it be possible to build a regex for this, or too complicated? Here is my starting point, which is a long way from complete: https://regex101.com/r/MlPmOd/1.
As a preface, there is a famous quote due to Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
The one word answer to your question is 'no'. Regular languages – the languages that regular expressions can recognise – are a proper subset of context-free languages, and the written form of s-expressions is context-free but not regular. So no regular expression can recognise the written form of an s-expression.
To see this consider a very tiny subset of s-expressions:
n = () | (n)
So n consists of the set {(), (()), ((())), ...}, where the number of left parens and right parens in each string are equal. Such a language can't be recognised by a regular expression because you need to count parens.
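To make the "you need to count parens" point concrete, here is a minimal sketch of a recogniser for the language n above. It is written in Python purely for illustration, and the name in_language_n is made up. The two counters are exactly the unbounded state that a finite automaton, and hence a true regular expression, does not have.

```python
def in_language_n(s):
    """Recognise n = () | (n): k left parens followed by k right parens, k >= 1."""
    i = 0
    opens = 0
    # Count the leading run of left parens.
    while i < len(s) and s[i] == "(":
        opens += 1
        i += 1
    closes = 0
    # Count the run of right parens that follows.
    while i < len(s) and s[i] == ")":
        closes += 1
        i += 1
    # The whole string must be consumed and the two counts must agree.
    return i == len(s) and opens >= 1 and opens == closes
```

Any fixed regular expression of the form \(\(\(...\)\)\) only handles nesting up to the depth that was written into it; the counters above have no such limit.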
Notes
Some instances of what are called 'regular expressions' in various programming languages are in fact more powerful than regular expressions and can therefore recognise classes of languages larger than regular languages. jwz's quote still applies: just because, perhaps, you can does not mean you should.
All programmers should in my opinion learn enough formal language theory to be dangerous. I don't know what a good modern reference is, but I learnt it from the Cinderella book: Hopcroft & Ullman, Introduction to Automata Theory, Languages, and Computation.
All Lisp programmers should in my opinion write a toy reader for s-expressions, as this is a good way of learning about how the real reader works, and doesn't take long.
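As a rough idea of how small such a toy reader can be, here is a sketch (in Python, for illustration only) that does just enough reading to pick out top-level forms: it tracks paren depth and skips ; line comments and string literals. The names top_level_forms and is_define are hypothetical, and this is an assumption-laden simplification: it does not handle #| |# block comments or character literals such as #\( at all.

```python
def top_level_forms(src):
    """Return the text of each top-level parenthesised form in src."""
    forms = []
    depth = 0
    start = None
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == ";":
            # Line comment: skip to the end of the line.
            while i < len(src) and src[i] != "\n":
                i += 1
            continue
        if ch == '"':
            # String literal: skip to the closing quote, honouring backslashes.
            i += 1
            while i < len(src) and src[i] != '"':
                i += 2 if src[i] == "\\" else 1
            i += 1  # step past the closing quote
            continue
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                forms.append(src[start:i + 1])
        i += 1
    return forms


def is_define(form):
    """True if the first symbol after the opening paren is define."""
    head = form[1:].lstrip()
    return head.startswith("define") and head[6:7] in ("", " ", "\t", "\n", "(", ")")
```

Filtering the result with is_define gives the top-level definitions only; nested defines and a string such as "( define )" never start a form of their own because they are seen at depth greater than zero or skipped outright.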

Subset / Subsequence Recursive Procedure in Simply Scheme Lisp

I am working my way through Simply Scheme in combination with the Summer 2011 CS3 Course from Berkeley. I am struggling with my understanding of the subset / subsequence procedures. I understand the basic mechanics once I'm presented with the solution code, but am struggling to grasp the concepts enough to come up with the solution on my own.
Could anyone point me in the direction of something that might help me understand it a little bit better? Or maybe explain it differently themselves?
This is the basis of what I understand so far:
So, in the following procedure, the subsequences recursive call that is an argument to prepend, is breaking down the word to its basest element, and prepend is adding the first of the word to each of those elements.
; using words and sentences
(define (subsequences wd)
  (if (empty? wd)
      (se "")
      (se (subsequences (bf wd))
          (prepend (first wd)
                   (subsequences (bf wd))))))
(define (prepend l wd)
  (every (lambda (w) (word l w))
         wd))
; using lists
(define (subsequences ls)
  (if (null? ls)
      (list '())
      (let ((next (subsequences (cdr ls))))
        (append (map (lambda (x) (cons (car ls) x))
                     next)
                next))))
So the first one, when (subsequences 'word) is entered, would return:
("" d r rd o od or ord w wd wr wrd wo wod wor word)
The second one, when (subsequences '(1 2 3)) is entered, would return:
((1 2 3) (1 2) (1 3) (1) (2 3) (2) (3) ())
So, as I said, this code works. I understand each of the parts of the code individually and, for the most part, how they work with each other. The nested recursive call is what is giving me the trouble. I just don't completely understand it well enough to write such code on my own. Anything that might be able to help me understand it would be greatly appreciated. I think I just need a new perspective to wrap my head around it.
Thanks in advance for anyone willing to point me in the right direction.
EDIT:
So the first comment asked me to try and explain a little more about what I understand so far. Here it goes:
For the words / sentence procedure, I think that it's breaking the variable down to its "basest" case (so to speak) via the recursive call that appears second.
Then it's essentially building on the basest case, by prepending.
I don't really understand why the recursive call that appears first needs to be there then.
In the lists one, when I was writing it on my own I got this:
(define (subseq lst)
  (if (null? lst)
      '()
      (append (subseq (cdr lst))
              (prepend (car lst)
                       (subseq (cdr lst))))))
(define (prepend i lst)
  (map (lambda (itm) (cons i itm))
       lst))
With the correct solution it looks to me like the car of the list would just drop off and not be accounted for, but obviously that's not the case. I'm not grasping how the two recursive calls are working together.
Your alternate solution is mostly good, but you've made the same mistake many people make when implementing this (power-set of a list) function for the first time: your base case is wrong.
How many ways are there to choose a subset of 0 or more items from a 0-element list? "0" may feel obvious, but in fact there is one way: choose none of the items. So instead of returning the empty list (meaning "there are no ways it can be done"), you should return (list '()) (meaning, "a list of one way to do it, which is to choose no elements"). Equivalently you could return '(()), which is the same as (list '()) - I don't know good Scheme style, so I'll leave that to you.
Once you've made that change, your solution works, demonstrating that you do in fact understand the recursion after all!
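To see how much hangs on that base case, here is a small sketch of the same recursion in Python (for illustration only; the base parameter is an artificial device so both choices can be tried side by side, not part of either Scheme solution).

```python
def subsequences(items, base):
    """Power set of items; base is what the empty list maps to."""
    if not items:
        return base
    rest = subsequences(items[1:], base)
    # Prepend the first item to every smaller subsequence, then keep the
    # smaller subsequences themselves as well.
    return [[items[0]] + sub for sub in rest] + rest


wrong = subsequences([1, 2, 3], [])    # "no ways to choose from nothing"
right = subsequences([1, 2, 3], [[]])  # "one way: choose nothing"
```

With the empty base there is never anything to prepend to, so wrong stays empty at every level. With [[]] as the base, right comes out as the eight sublists in the same order as the list version quoted in the question.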
As to explaining the solution that was provided to you, I don't quite see what you think would happen to the car of the list. It's actually very nearly the exact same algorithm as the one you wrote yourself: to see how close it is, inline your definition of prepend (that is, substitute its body into your subsequences function). Then expand the let binding from the provided solution, substituting its body in the two places it appears. Finally, if you want, you can swap the order of the arguments to append - or not; it doesn't matter much. At this point, it's the same function you wrote.
Recursion is a tool which is there to help us, to make programming easier.
A recursive approach doesn't try to solve the whole problem at once. It says, what if we already had the solution code? Then we could apply it to any similar smaller part of the original problem, and get the solution for it back. Then all we'd have to do is re-combine the leftover "shell" which contained that smaller self-similar part, with the result for that smaller part; and that way we'd get our full solution for the full problem!
So if we can identify that recursive structure in our data; if we can take it apart like a Russian "matryoshka" doll which contains its own smaller copy inside its shell (which too contains the smaller copies of self inside itself, all the way down) and can put it back; then all we need to do to transform the whole "doll" is to transform the nested "matryoshka" doll contained in it (with all the nested dolls inside -- we don't care how many levels deep!) by applying to it that recursive procedure which we are seeking to create, and simply put back the result:
solution( shell <+> core ) == shell {+} solution( core )
;;        --------------                          ----
The two +s on the two sides of the equation are different, because the transformed doll might not be a doll at all! (also, the <+> on the left is deconstructing a given datum, while {+} on the right constructs the overall result.)
This is the recursion scheme used in your functions.
Some problems are better fit for other recursion schemes, e.g. various sorts, Voronoi diagrams, etc. are better done with divide-and-conquer:
solution( part1 <+> part2 ) == solution( part1 ) {+} solution( part2 )
;;        ---------------                -----                 -----
As for the two -- or one -- recursive calls, since this is mathematically a function, the result of calling it with the same argument is always the same. There's no semantic difference, only an operational one.
Sometimes we prefer to calculate a result and keep it in memory for further reuse; sometimes we prefer to recalculate it every time it's needed. It does not matter as far as the end result is concerned -- the same result will be calculated, the only difference being the consumed memory and / or the time it will take to produce that result.
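A small sketch of that operational difference, again in Python for illustration (the function names and the calls counter are made up for this example): one version writes the recursive call out twice, the other computes it once and reuses the result.

```python
calls = {"twice": 0, "once": 0}


def subsequences_twice(items):
    calls["twice"] += 1
    if not items:
        return [[]]
    # The same recursive call is written out, and evaluated, two times.
    return ([[items[0]] + sub for sub in subsequences_twice(items[1:])]
            + subsequences_twice(items[1:]))


def subsequences_once(items):
    calls["once"] += 1
    if not items:
        return [[]]
    rest = subsequences_once(items[1:])  # evaluated once, reused twice
    return [[items[0]] + sub for sub in rest] + rest
```

For a three-element list both return the same eight subsequences, but the first version makes 15 calls and the second makes 4.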

Remove a specific item in a list?

I want to preface this by saying that yes, this is a homework problem I'm working on and I don't want the actual answer, just maybe a nudge in the right direction. Anyhoo, I'm taking a class on programming languages' structures, and one of our projects is to write a variety of small programs in lisp. This one requires the user to input a list and an atom, then remove all instances of the atom from the list. I've scoured the internet and haven't found all that many good lisp resources, so I'm turning to you all.
Anyways, our professor has given us very little by way of stuff to work off of, and by very little I mean practically nothing.
This is what I have so far, and it doesn't work.
(defun removeIt (a lis)
  (if (null lis) 0
      (if (= a (car lis))
          (delete (car lis))
          (removeIt (cdr lis)))))
And when I type
(removeIt 'u '(u e u e))
as the input, it gives me an error stating it got 1 argument when it wanted 2. What errors am I making?
First, a few cosmetic changes:
(defun remove-it (it list)
  (if (null list) 0
      (if (= it (car list))
          (delete (car list))
          (remove-it (cdr list)))))
Descriptive and natural sounding identifier names are preferred in the CL community. Don't be shy to use names like list – CL has multiple namespaces, so you don't have to worry about clashes too much. Use hyphens instead of camel case or underscores. Also, read a short style guide.
You said you didn't want the answer but helpful tips, so here we go:
Check your base case – your result will be a list, so why do you return a number?
Use the appropriate comparison function – = is for numbers only.
You are building a new result list, so no need to delete anything – just don't add to it what you don't want.
But remember to add what you want – build your result list by consing what you want to keep to the result of applying your function to the rest of the list.
If you don't want to keep an element, just go on applying your function to the rest of the list.
You defined your function to take two arguments, but you're calling it with (cdr list) only. Provide the missing argument.
"I've scoured the internet and haven't found all that many good lisp resources,"
Oh, come on.
Anyhow, I recommend Touretzky.
By the way, the function you're trying to implement is built-in, but your professor probably won't accept it as a solution, and doing it yourself is a good exercise. (For extra credit, try solving it for nested lists.)
This is a good case for a recursive function. Suppose there exists already a function called my-remove which takes an atom and a list as arguments and returns the list without the given atom. So (my-remove 'Y '(X Y Z)) => '(X Z)
Now, how would you use this function when instead of the list '(X Y Z) you have another list which is (A X Y Z), i.e. with an element A in front?
You would compare A to your atom and then, depending on whether the element A matches your atom, you would add this element A or not to the result of applying my-remove to the rest of the list.
With this recursion the function my-remove will be called successively with shorter lists. Now you only have to think about the base case, i.e. what does the function my-remove have to return when the list is empty.
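For readers who do want to see the shape of that recursion, here it is sketched in Python rather than Lisp, so the homework itself stays an exercise. This is only an illustration of the idea described above, with plain equality standing in for whichever Lisp comparison function is appropriate.

```python
def my_remove(atom, items):
    """Return a new list with every occurrence of atom left out."""
    if not items:
        # Base case: nothing to remove from an empty list.
        return []
    rest = my_remove(atom, items[1:])
    if items[0] == atom:
        # Matching element: simply do not add it to the result.
        return rest
    # Non-matching element: put it in front of the cleaned-up rest.
    return [items[0]] + rest
```

Note that both arguments are passed along on every recursive call, and that the base case returns a list, not a number.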
This is an answer for other people looking specifically for elisp. A built-in function called delq exists for this purpose. It modifies the list destructively, so use its return value.
Example
(setq my-list '(0 40 80 40 90)) ;; test list
(delq 40 my-list) ;; (0 80 90)
If you installed emacs from source you can check out how it is implemented by doing M-x find-function delq

Combining multiple functions into a single function

I have written several functions that input strings and use varying regular expressions to search for patterns within the strings. All of the functions work on the same input [string]. What is the optimal way to combine all such functions into a single function?
I had tried combining all of the regular expressions into a single regex, but ran into issues of degeneracy, whereby a pattern fit multiple regular expressions and the output was incorrect. Next, I tried using the threading arrows -> and ->>, but was unable to get those to work. I believe this might be the right option, but I could not get the functions to run properly, so I am unable to test my hypothesis.
As an example of two functions to combine consider the following:
(defn fooip [string]
  (re-seq #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" string))
and
(defn foophone [string]
  (re-seq #"[0-9]{3}-?[0-9]{3}-?[0-9]{4}" string))
If you have multiple functions that you want to combine into a function that will return the result of applying each function to the same input, that is exactly the purpose of juxt.
(def foo (juxt foophone fooip))
(foo "555-555-5555 222.222.222.222 888-888-8888")
;=> [("555-555-5555" "888-888-8888") ("222.222.222.222")]
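For readers who don't have Clojure at hand, the idea behind juxt is easy to sketch in Python. Everything here is a hand-rolled illustration: the juxt helper is written out because it is not a Python built-in, and fooip / foophone simply mirror the two functions from the question.

```python
import re


def juxt(*fns):
    """Return a function that applies every fn to the same arguments."""
    return lambda *args: [fn(*args) for fn in fns]


IP = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
)
PHONE = re.compile(r"[0-9]{3}-?[0-9]{3}-?[0-9]{4}")


def fooip(s):
    return IP.findall(s)


def foophone(s):
    return PHONE.findall(s)


foo = juxt(foophone, fooip)
```

Calling foo on the sample string from above gives one list of matches per function, in the order the functions were passed to juxt.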
Your question is a little vague, but the threading arrows' purpose is to apply multiple functions sequentially to the output of each other: (-> 1 inc inc inc), for example, is equivalent to (inc (inc (inc 1))).
From your code samples, it looks like you have multiple regexes you want to match against a single input string. The simple way to do that is to use for:
(for [r [#"foo" #"bar" #"baz"]] (re-seq r s))
To check for both patterns you can use or:
(defn phone-or-ip [s]
(or (matchphone s) (matchip s)))
There isn't one proper way to combine functions. It depends what you want to do.
P.S. There are ways to combine the regexps themselves. The naïve way is to just use | and parentheses to combine the two. I think there are optimizers, which can improve such patterns.
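If you do combine the patterns with |, one way to keep track of which alternative matched is to give each one a named group. Here is a sketch in Python's re module (the group names ip and phone and the function classify are arbitrary). Whether this avoids the "degeneracy" from the question is an assumption, since the overlapping patterns were not shown; when two alternatives can match at the same position, the one listed first wins.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
COMBINED = re.compile(
    r"(?P<ip>\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b)"
    + r"|(?P<phone>[0-9]{3}-?[0-9]{3}-?[0-9]{4})"
)


def classify(s):
    """Return (kind, text) pairs, where kind names the alternative that matched."""
    return [(m.lastgroup, m.group()) for m in COMBINED.finditer(s)]
```

Each match is reported exactly once, labelled with the name of the group that produced it, so the two kinds of result are no longer mixed together.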

Peter Norvig's regular expression compiler from Udacity course rewritten in Racket

I have recently taken Peter Norvig's Udacity course CS212: DESIGN OF COMPUTER PROGRAMS.
Unfortunately, the course is all in Python so, for the sake of learning, I wrote the equivalent code in Racket for regex compiler given in unit 3 of that course.
You can see my code here: http://codepad.org/8x0rMXOi
Now, what's bothering me is that Mr. Norvig's original Python code is somewhat shorter than mine. :( But OK, I'm just a beginner in Racket, so that is expected. Still, I wonder whether some Racket expert could shorten my code so that the Racket version becomes shorter than Norvig's original Python code?
Here are a few tips.
The ormap-expression in:
(define (in-chars? c chars)
  (ormap (lambda (ch) (equal? c ch)) (string->list chars)))
can be written as
(memv c (string->list chars))
The if-expression in
(define (match pattern text)
  (define remainders (pattern text))
  (if (not (set-empty? remainders))
      (substring text 0 (- (string-length text)
                           (string-length (argmin string-length
                                                  (set->list remainders)))))
      #f))
can be written as
(and (not (set-empty? remainders))
     (substring ...))
However, your functions are small and to the point, so I wouldn't change much.
A more convenient syntax for manipulating strings would make it easier to read and write string manipulation programs. Some years ago I made an attempt and wrote a concat macro.
I used it to implement Norvig's spelling checker (his original article might interest you). The resulting spelling checker and the concat macro are explained here:
http://blog.scheme.dk/2007/04/writing-spelling-corrector-in-plt.html
Update: I have written an updated version of the spell checker.
The concat macro makes simple string manipulation shorter.
https://github.com/soegaard/this-and-that/blob/master/spell-checker.rkt