How to speed up this Clojure code? - regex

I am a beginner in clojure. I am trying to solve this simple problem on codechef using clojure. Below is my clojure code but this code is taking too long to run and gives TimeoutException. Can someone please help me to optimize this code and make it run faster.
(defn checkCase [str]
(let [len (count str)]
(and (> len 1) (re-matches #"[A-Z]+" str))))
(println (count (filter checkCase (.split (read-line) " "))))
Note: My program is not getting timedout due to input error. On codechef input is handled automatically (probably through input redirection. Please read the question for more details)
Thank you!

Most text finding exercises are exercizes in regexps, this one no different. It's usually pretty hard to find a more efficient way in whatever programming language that will outpace good regexp implementations.
In this case re-seq, look around regexps, repetition limiting and the multiline regexp flag (?m) are your friends
(defn find-acronyms
[s]
(re-seq #"(?m)(?<=\W|^)[A-Z]+(?=\W|$)" s))
(find-acronyms "I like coding and will participate in IOI Then there is ICPC")
=> ("IOI" "ICPC")
Let's dissect the regex:
(?m) The multiline flag: lets you match your regex over multiple lines, so no need to split into multiple strings
(?<=\W|^) The match should follow a non-word character or the beginning of the (multiline) string
[A-Z]{2,} Match concurrent capital letters, a minimum of 2
(?=\W|$) The match should be followed by a non-word character or the end of the (multiline) string

I can only guess that wherever you run this snippet of code, it doesn't feed anything to your read-line invocation. Or maybe it does, but doesn't send a newline as the last thing. So it hangs waiting.

(defn checkCase [str]
(let [len (count str)]
(and (> len 1) (re-matches #"[A-Z]+" str))))
(defn answer [str]
(println (count (filter checkCase (.split str " ")))))
So at the REPL:
=> (answer "GGG fff TTT")
;-> 2
;-> nil
The answer is being printed to the screen. But probably best to have your function return the answer rather than print it out:
(defn answer [str]
(count (filter checkCase (.split str " "))))
All I have done is replaced your (read-line) with an argument. (read-line) is expecting input from stdin and waiting for it forever - or until a timeout happens in your case.

I am not sure if this is the slow part of your code, but if it is your could try to split up the execution and safe gard the very slow regexp part by executing it when it is necessary. I think the current version with AND already does that. If it does not you can try to do something else, like this:
(defn checkCase [^String str]
(cond
(< (.length str) 2)
false
(re-matches #"[A-Z]+" str)
true
:else
false))

maybe you could try using re-seq instead of spltting the string and checking every item? So you will lose the filter, .split, and additional function call. Something like this:
(println (count (re-seq #"\b[A-Z]{2,}?\b" (read-line))))

You need to submit a Java program. You can test it on the command line before you submit it. You can but don't need to use redirection symbols (<,>). Just type the input and see that every time you do it returns the count after you have typed enter.
You will need aot compilation (Ahead Of Time, which means that .class files are included) and a main that is exported. Only then will it become a Java program.
Actually when they ask for a Java program they probably mean a .class file. You can run a .class file with the java program (which I imagine is what their test-runner does). Put it in a shell or batch file when testing, but just submit the .class file.

Related

Clojure and Leiningen: Why is `doall` needed in this example?

I'm currently using Leiningen to learn Clojure and I'm confused about the requirement of doall for running this function:
;; take a function (for my purposes, println for output formatting) and repeat it n times
(defn repeat [f n] (for [i (range n)] (f)))
(repeat println 2) works just fine in an REPL session, but not when running with lein run unless I include the doall wrapper. (doall (repeat println 2)) works and I'm curious why. Without it, lein run doesn't show the two blank lines in the output.
I also have:
(defmacro newPrint1 [& args] `'(println ~args))
(defmacro newPrint2 [& args] `'(println ~#args))
The first function I thought of myself. The next two macros are examples from a tutorial video I'm following on Udemy. Even if I wrap the macro calls with doall, such as (doall (newPrint1 1 2 3)), lein run produces no output, but (newPrint1 1 2 3) in a terminal REPL session produces the desired output of (clojure.core/println (1 2 3)) as it does in the video tutorial. Why doesn't doall work here?
for creates a lazy sequence. This lazy sequence is returned. The P in REPL (read eval Print loop) prints the sequence, thus realizing it. For realizing, the code to produce each element is run.
If you do not use the sequence, it is not realized, so the code is never run. In non-interactive use, this is likely the case. As noted, doall forces realization.
If you want to do side-effects, doseq is often better suited than for.

Is there an idiomatic way to avoid long Clojure string literals?

Various Clojure style guides recommend avoiding lines longer than 80 characters. I am wondering if there is an idiomatic way to avoid long String literals.
While it's common these days to have wide screens, I still agree that long lines should be avoided.
Here are some examples (I'm tempted to follow the first):
;; break the String literal with `str`
(println (str
"The quick brown fox "
"jumps over the lazy dog"))
;; break the String literal with `join`
(println (join " " [
"The quick brown fox"
"jumps over the lazy dog"]))
I am aware that Clojure supports multi-line String literals, but using this approach has the undesired effect of the newline characters being interpreted, e.g. using the repl:
user=> (println "The quick brown fox
#_=> jumps over the lazy dog")
The quick brown fox
jumps over the lazy dog
You should probably store the string inside an external text file, and read the file from your code. If you still feel the need to store the string in your code, go ahead and use str.
EDIT:
As requested, I will demonstrate how you can read long strings at compile time.
(defmacro compile-time-slurp [file]
(slurp file))
Use it like this:
(def long-string (compile-time-slurp "longString.txt"))
You may invent similar macros to handle Java Properties files, XML/JSON configuration, SQL queries, HTML, or whatever else you need.
I find it convenient to use str to create strings and use character literals such as \newline or \tab instead of "\n" to break them.
I rarely violate the 80-column rule this way.
The most idiomatic ways I know of are the following:
1) Use (str) to split the string over multiple lines.
(str "User " (:user context)
" is now logged in.")
This is probably the most idiomatic usage. I've seen this done in multiple libraries and projects. It is fast, since (str) uses a StringBuilder under the hood. It also allows you to mix code in transparently, as I've done in the example.
2) Allow strings to break the 80 char limit by themselves, when it makes sense.
(format
"User %s is now logged in."
(:user context))
Basically, it's ok to break the 80 char limit for strings. Chances are it's less likely you care about reading the string when you work with the code, and on the off chance you need to, exceptionally, you'll need to scroll horizontally.
I've wrapped the string in a (format) here to be able to inject code similarly to my previous example. You don't need to.
Less idiomatic ways would be:
3) Put your strings in files and load them from there.
(slurp "/path/to/userLoggedIn.txt")
With a file: /path/to/userLoggedIn.txt containing:
User logged in.
I advise against this because:
It introduces IO side effects
It has the potential to fail, say the path is wrong, the resource is missing or corrupted, the disk errors, etc.
It has performance implications, disk reads are slow.
Its hard to inject content from code if you need too.
I would say do this only if your text is really big. Or if the content of the string needs to be changed by non devs. Or if the content is obtained externally.
4) Have a namespace where you def all your strings in, and load them from there.
(ns msgs)
(defn logged-in-msg [user]
(format
"User %s is now logged in."
user))
Which you then use like this:
(msgs/logged-in-msg (:user context))
I prefer this over #3. You still need to allow to use #2 here, where it's ok to have strings break the 80 char limit. In fact, here you put strings by themselves on a line, so they are easy to format. If you use code analysis like checkstyle, you can exclude this file from the rule. It also does not suffer from the issues of #3.
If you are going with #3 or #4, you probably have a special use case for your strings, like internationalization, or having business edit them, etc. In those cases, you might be better served building a more robust solution, that could be inspired from the above methods, or using a library that specializes in those use cases.
(defmacro strs
([]
"")
([a]
(if (string? a) `~a `(str ~a)))
([a & more]
`(str
~#(->> (cons a more)
(partition-by string?)
(mapcat #(if (string? (first %)) (cons (apply str %) nil) %))))))
Usage:
(strs "one "
"two "
3
" four" " five")
; => "one two 3 four five"
Neighbouring literal strings will be concatenated at compile time.

Subtraction not working - Clojure

I am trying to make a very simple Nim game that probably isn't even considered a proper implementation of Nim, but I am only starting Clojure. Not sure why this subtraction on line four doesn't work...
1. (def nimBoard 10)
2. (println "There are" nimBoard "objects left")
3. (def in (read-line))
4. (- nimBoard in)
I can't seem to come up with a solid algorithm for asking the user if they want to remove one or two "objects" from the board until it is empty. I am coming from Java, but loops in this language just confuse me a lot. I know what I am trying to make isn't exactly the Game of Nim, but it is for practice.
I would appreciate any help:)
Since in is a string you read from standard in, you need to convert in to a number first before the subtraction. Try this:
(defn parse-int [s]
(Integer. (re-find #"\d+" s )))
(- nimBoard (parse-int in))

Clojure re-find reg-ex OR

I've been trying to get a simple reg-ex working in Clojure to test a string for some SQL reserved words (select, from, where etc.) but just can't get it to work:
(defn areserved? [c]
(re-find #"select|from|where|order by|group by" c))
(I split a string by spaces then go over all the words)
Help would be greatly appreciated,
Thanks!
EDIT: My first goal (after only reading some examples and basic Clojure materials) is to parse a string and return for each part of it (i.e. words) what "job" they have in the statement (a reserved word, a string etc.).
What I have so far:
(use '[clojure.string :only (join split)])
(defn isdigit? [c]
(re-find #"[0-9]" c))
(defn isletter? [c]
(re-find #"[a-zA-Z]" c))
(defn issymbol? [c]
(re-find #"[\(\)\[\]!\.+-><=\?*]" c))
(defn isstring? [c]
(re-find #"[\"']" c))
(defn areserved? [c]
(if (re-find #"select|from|where|order by|group by" c)
true
false))
(defn what-is [token]
(let [c (subs token 0 1)]
(cond
(isletter? c) :word
(areserved? c) :reserved
(isdigit? c) :number
(issymbol? c) :symbol
(isstring? c) :string)))
(defn checkr [token]
{:token token
:type (what-is token)})
(defn goparse [sql-str]
(map checkr (.split sql-str " ")))
Thanks for all the help guys! it's great to see so much support for such a relatively new language (at least for me :) )
I'm not entirely sure what you want exactly, but here's a couple of variations to coerce your first regex match to a boolean:
(defn areserved? [c]
(string?
(re-find #"select|from|where|order by|group by"c)))
(defn areserved? [c]
(if (re-find #"select|from|where|order by|group by"c)
true
false))
UPDATE in response to question edit:
Thanks for posting more code. Unfortunately there are a number of issues here that we could
try to address by patching your existing code in a simplistic and naïve fashion, but it will
only get you so far, before you hit the next problem with this single iteration approach.
#alex is correct, that your areserved? method will fail to match order by if you have already
split your string by white space. That said, a simple fix is to treat order and by as separate keywords (which they are, even though they always appear together).
The next issue is that the areserved? function will match keywords in a string, but you are dispatching it against a character in the what-is function. You nearly always get a match in your cond for isletter?, so you will everything is marked as a 'word'.
All in all, it looks like you are trying to do too much work in a single application of map.
I'm not sure if you are just doing this for fun to play with Clojure (which is admirable - keep going!), in which case, maybe it doesn't matter if you press on with this simple parsing approach... you'll definitely learn something; but if you would like to take it further and parse SQL more successfully, then I would suggest that you may find it helpful to to read a little on Lexing, Parsing and building Abstract Syntax Trees (AST).
Brian Carper has written about using the Java parser generator "ANTLR" from Clojure - it's a few years old, but might be worth looking at.
You also might be able to get some transferrable ideas from this chapter from the F# programming book on lexing and parsing SQL.

How do I beautify lisp source code?

My code is a mess many long lines in this language like the following
(defn check-if-installed[x] (:exit(sh "sh" "-c" (str "command -v " x " >/dev/null 2>&1 || { echo >&2 \"\"; exit 1; }"))))
or
(def Open-Action (action :handler (fn [e] (choose-file :type :open :selection-mode :files-only :dir ListDir :success-fn (fn [fc file](setup-list file)))) :name "Open" :key "menu O" :tip "Open spelling list"))
which is terrible. I would like to format it like so
(if (= a something)
(if (= b otherthing)
(foo)))
How can I beautify the source code in a better way?
The real answer hinges on whether you're willing to insert the newlines yourself. Many systems
can indent the lines for you in an idiomatic way, once you've broken it up into lines.
If you don't want to insert them manually, Racket provides a "pretty-print" that does some of what you want:
#lang racket
(require racket/pretty)
(parameterize ([pretty-print-columns 20])
(pretty-print '(b aosentuh onethunoteh (nte huna) oehnatoe unathoe)))
==>
'(b
aosentuh
onethunoteh
(nte huna)
oehnatoe
unathoe)
... but I'd be the first to admit that inserting newlines in the right places is hard, because
the choice of line breaks has a lot to do with how you want people to read your code.
I use Clojure.pprint often for making generated code more palatable to humans.
it works well for reporting thought it is targeted at producing text. The formatting built into the clojure-mode emacs package produces very nicely formatted Clojure if you put the newlines in your self.
Now you can do it with Srefactor package.
Some demos:
Formatting whole buffer demo in Emacs Lisp (applicable in Common Lisp as well).
Transform between one line <--> Multiline demo
Available Commands:
srefactor-lisp-format-buffer: format whole buffer
srefactor-lisp-format-defun: format current defun cursor is in
srefactor-lisp-format-sexp: format the current sexp cursor is in.
srefactor-lisp-one-line: turn the current sexp of the same level into one line; with prefix argument, recursively turn all inner sexps into one line.
Scheme variants are not as polished as Emacs Lisp and Common Lisp yet but work for simple and small sexp. If there is any problem, please submit an issue report and I will be happy to fix it.