(Clojure) Count how many times any character appears in a string

I am trying to write a function (char-count) which takes a pattern and a string, then returns a number (count) which represents how many times any of the characters in the pattern appear in the string.
For example:
(char-count "Bb" "Best buy")
would return 2 since there is 1 match for B and 1 match for b, so added together we get 2
(char-count "AaR" "A Tale of Recursion")
would return 3 and so on
I tried using re-seq in my function, but it seems to work only for contiguous strings. That is, (re-seq #"Bb" "Best Buy") only looks for the pattern Bb as a whole, not for each individual character.
This is what my function looks like so far:
(defn char-count [pattern text]
(count (re-seq (#(pattern)) text)))
But it does not do what I want. Can anybody help?
P.s. Very new to clojure (and functional programming in general).

You don't need anything nearly as powerful as a regular expression here, so just use the simple tools your programming language comes with: sets and functions. Build a set of the characters you want to find, and count how many characters from the input string are in the set.
(defn char-count [chars s]
  (count (filter (set chars) s)))

Try wrapping the characters in [...] within the RegEx:
(count (re-seq #"[Bb]" "Best buy"))
Or, since you need that pattern to be dynamic:
(count (re-seq (re-pattern (str "[" pattern "]")) text))
But note that the solution might not work properly if the pattern contains special RegEx characters such as [, ], \, -, ^ - you'd have to escape them by prepending \\ in front of each one.
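The escaping step can be sketched as below. This is a minimal sketch, not a definitive implementation: the helper name escape-for-class is hypothetical, ASCII input is assumed, and instead of listing the special characters one by one it backslash-escapes every non-alphanumeric character, which Java regular expressions permit. An empty pattern is not handled.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: put a backslash in front of every character that
;; is not an ASCII letter or digit, so it is read literally inside [...].
(defn escape-for-class [chars]
  (str/replace chars #"[^A-Za-z0-9]" "\\\\$0"))

(defn char-count [pattern text]
  (count (re-seq (re-pattern (str "[" (escape-for-class pattern) "]"))
                 text)))

(char-count "Bb" "Best buy")   ; => 2
(char-count "]-^" "a-b^c]")    ; => 3
```

Without the escaping, the second call would build the pattern []-^], which the regex engine does not read as a class of three literal characters.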

Related

Mapping a string using a map

(def conversions {"G" "C"
"C" "G"
"T" "A"
"A" "U"})
(defn to-rna [dna]
(map conversions dna)
)
(conversions "G") ;; Gives "C"
(to-rna "GC") ;; Gives (nil nil)
I'm attempting to do an exercise where I convert letters. I have a working solution, but I don't like it. I feel like the above ought to work, but evidently I'm wrong, because it doesn't.
Could someone explain to me why this is, and how I might properly achieve this?
When mapping over a string, it will treat the string as a sequence of characters. So, your code ends up looking for a \G and a \C entry in the map, which both return nil.
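A short REPL sketch of that explanation, reusing the string-keyed map from the question. The last line is one possible workaround (converting each character back to a string before the lookup); it is shown only for illustration and is not the only way to fix it.

```clojure
(def conversions {"G" "C", "C" "G", "T" "A", "A" "U"})

(seq "GC")              ; => (\G \C), characters rather than strings
(conversions \G)        ; => nil, because the keys are strings
(conversions (str \G))  ; => "C"

;; Turn each character into a string first, then look it up.
(apply str (map (comp conversions str) "GC"))  ; => "CG"
```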
As dpassen says, you need to put a java.lang.Character in the map, not a length-1 string. Try this:
(def conversions { \G \C
\C \G
\T \A
\A \U })
I'm just starting learning Clojure myself so please take this answer with caution.
In addition to what's been already suggested, I would put the conversions map into a let form to keep your function "isolated". (As is, your function relies on a conversions being defined outside of its scope).
I also read (can't remember where exactly) that a common naming convention for functions that "convert" X to Y is to name them x->y.
Finally I'd use a threading macro for improved readability.
(require '[clojure.string :as string])
(defn dna->rna [dna]
  (let [conversions {\G \C
                     \C \G
                     \T \A
                     \A \U}]
    (->> dna
         (map conversions)
         (string/join ""))))
(dna->rna "GC")
;; "CG"
FYI, Clojure has clojure.string/escape and clojure.string/replace that you might want to look at. escape is probably most similar to what you are doing.
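As a rough illustration of the escape suggestion, the conversion could look like this. It is a sketch only: the function name dna->rna is borrowed from the answer above, and the map values are written as strings because escape expects a mapping from character to replacement string.

```clojure
(require '[clojure.string :as str])

;; str/escape looks up each character of the input in the map and
;; substitutes the replacement; characters without an entry are kept.
(defn dna->rna [dna]
  (str/escape dna {\G "C", \C "G", \T "A", \A "U"}))

(dna->rna "GC")    ; => "CG"
(dna->rna "GCTA")  ; => "CGAU"
```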

How to compare two @ characters in Clojure

Please, how can I write this in Clojure? I have an @ character and I want to compare it to "@".
e.g. (= @ "@")
gives the error (ClassCastException java.lang.String cannot be cast to java.util.concurrent.Future clojure.core/deref-future (core.clj:2206))
There's an inherent conflict in this comparison, in that "@" is a string (a sequence of characters) and \@ is an individual character. If you want to confirm that the string consists of a single character matching \@, something like the following would work:
(let [s "@"]
  (and (= \@ (first s)) (= (count s) 1)))
However, if you want to detect whether the string contains any \@ characters, or whether it just starts with an \@ character, that requires different code. This is the problem with comparing strings and characters: it's not inherently obvious what you need from the comparison.
Why do you get the error? The reader translates your example into ...
(= (deref "@"))
The deref function tests whether the argument is deref-able (implementing IDeref). If not, it treats the argument as a future. A string isn't a future, so it throws the confusing exception. This behaviour is a defect, albeit a minor one.
By the way, (= x) returns true for any x, if it returns at all.
Tim Clemons' answer shows what you can do about this.
You can write your @ as \@ so that Clojure reads it as a character literal. On its own, @ is a reader macro that expands to a deref form, which makes many other things in Clojure less verbose. Anyway:
(= \@ \@)
true
If you want to check that the first character of a string is an @, then:
(= \@ (first "@"))
true
The following only works in ClojureScript, as it doesn't have a character type and just uses strings of length one.
(= \@ "@")
true ;; in cljs only
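The three readings of "compare a character with a string" mentioned in the answers (the string is exactly that character, starts with it, or contains it) can be sketched for an arbitrary character. The helper names are hypothetical, and \x is used only as a neutral example character.

```clojure
;; Hypothetical helpers; c is a character and s is a string.
(defn only-char? [c s]
  (= s (str c)))               ; s is exactly the one-character string

(defn starts-with-char? [c s]
  (= c (first s)))             ; (first "") is nil, so "" gives false

(defn contains-char? [c s]
  (boolean (some #{c} s)))     ; a set used as a membership predicate

(only-char? \x "x")          ; => true
(only-char? \x "xy")         ; => false
(starts-with-char? \x "xy")  ; => true
(contains-char? \x "ayxb")   ; => true
```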

test.check generate strings of a certain length

In using test.check I need a generator for strings of a certain length. Phone numbers, postal codes, social security numbers are all examples of this type of data. Although the examples appear to be only numbers, my question is for strings in general.
Given length the generator below generates random strings:
(gen/fmap #(apply str %)
(gen/vector gen/char-alpha length))
(gen/vector gen/char-alpha length) generates sequences of characters and the fmap converts them into strings:
(apply str [\a \b]) ;; => "ab"
If a custom alphabet (say [\a \b \c]) is needed gen/char-alpha can be substituted with something like:
(gen/elements alphabet)
For more complex generators, like formatted phone numbers, test.chuck's string-from-regex might be a better choice than manually combining official generators.
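For comparison, manually combining the built-in generators for a formatted value might look like the sketch below. The NNN-NNN-NNNN layout and the names digits and phone-number are assumptions made for illustration; test.check must be on the classpath.

```clojure
(require '[clojure.test.check.generators :as gen])

;; Generator for a string of exactly n decimal digits.
(defn digits [n]
  (gen/fmap #(apply str %)
            (gen/vector (gen/elements (vec "0123456789")) n)))

;; Assumed format, for illustration only: NNN-NNN-NNNN.
(def phone-number
  (gen/fmap (fn [[a b c]] (str a "-" b "-" c))
            (gen/tuple (digits 3) (digits 3) (digits 4))))

;; Returns a sequence of three random strings in that layout.
(gen/sample phone-number 3)
```

Each piece has a fixed length, so the combined string always has the same length as well.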
This function will generate a string of a given length with characters from a given alphabet (optional). If you don't pass any alphabet as an argument, a default will be used, which you can of course change.
(defn generate-string
([length]
(generate-string length
(map char (range 49 127))))
([length alphabet]
(apply str (take length (repeatedly #(rand-nth alphabet))))))
Examples:
(generate-string 7 [\a \b \c])
"bacacbb"
(generate-string 10)
"mxqE<OKH3L"
You can use the more primitive generators to quickly build one that does just that:
For alphanumeric strings between min and max:
(sgen/fmap str/join (sgen/vector (sgen/char-alphanumeric) min max))
For alphanumeric strings of exactly a given length
(sgen/fmap str/join (sgen/vector (sgen/char-alphanumeric) length))
And you can modify (sgen/char-alphanumeric) accordingly to whatever your character range needs to be, such as a string of min/max with alphanumeric and underscore and dash character as well, with different frequencies of each character showing up:
(sgen/fmap str/join
(sgen/vector
(sgen/frequency [[99 (sgen/char-alphanumeric)]
[1 (sgen/elements #{"_" "-"})]])
min max))

Open file in racket and use regex on said file to print matches

I have been trying to use regular expressions in Racket on a text file full of random words separated by the end-of-line character \n. I'm trying to read the file in as a string or a list (whichever is easiest and most intuitive) and use a regex to print all the words in the file that have length 6 and do not contain a certain letter (in this case the letter t). Below you can see how I read in the file, but I am not sure how to use the resulting list because I have not bound it to a variable. You can also see a test with a regex whose actual result is #f, when what I want is for the words grumpy and foobar to be returned, excluding stumpy.
#lang racket
(require 2htdp/batch-io)
(require racket/match)
;(file->string "words.txt");;reads in a file to a string
;(file->list "words.txt");; reads in a file to a list
(define (listMatches)
(regexp-match #rx"\b[^<t> | ^<T> | ^<\n>]{<6>}\b" "grumpy\nstumpy\nfoobar" )
)
I am very new to Racket and would love some input, useful links, and any other help.
I would not use a regex at all, but rather use for/list in combination with string-length and string-contains? to solve the problem. The overall solution looks something like this:
(call-with-input-file* "words.txt"
(lambda (f)
(for/list ([i (in-lines f)]
#:when (and (= (string-length i) 6)
(not (string-contains? i "t"))))
i)))
call-with-input-file* takes a procedure and calls it with the open file port, here bound to f. The port is closed automatically once the procedure is done, so we do not need to close the file ourselves.
Finally, string-contains? was added relatively recently to Racket. If you need to support older versions of Racket, you can use regexp-match to search for just "t" instead, which is much easier than matching the whole word with one regex.
One of the things Racket regular expressions can be matched against is an input port. This means you can look for matches in a file without having to first read from it; the matching code will do that part for you. Combine that with multi-line mode, so that ^ and $ match after and before newlines as well as at the very beginning and end of the input, and you get a simple approach using regexp-match* and a RE that matches exactly 6 characters, none of them t or a newline, on a line by themselves:
#lang racket/base
(require racket/port)
;;; Using a string port to demonstrate
(define input "grumpy\nstumpy\nfoobar")
(define (list-matches inp)
  ;; \n is excluded from the class so a match cannot span two short lines
  (map bytes->string/utf-8 (regexp-match* #px"(?m:^[^t\n]{6}$)" inp)))
(println (call-with-input-string input list-matches)) ; '("grumpy" "foobar")
The big thing to remember about using an input port is that what it returns are byte strings; you have to convert them to strings yourself.

regexp for elisp

In Emacs I would like to write some regexp that does the following:
First, return a list of all dictionary words that can be formed within "hex space". By this I mean:
#000000 - #ffffff
so #00baba would be a word (that can be looked up in the dictionary)
so would #baba00
and #abba00
and #0faded
...where trailing and leading 0's are considered irrelevant. How would I write this? Is my question clear enough?
Second, I would like to generate a list of words that can be made using numbers as letters:
0 = o
1 = i
3 = e
4 = a
...and so on. How would I write this?
First, load your dictionary. I'll assume that you're using /usr/share/dict/words, which is nearly always installed by default when you're running Linux. It lists one word per line, which is a very handy format for this sort of thing.
Next run M-x keep-lines. It'll ask you for a regular expression and then delete any line that doesn't match it. Use the regex ^[a-f]\{,6\}$ and it will filter out anything that can't be part of a color.
Specifically, the ^ anchors the regex at the beginning of the line, the [a-f] matches any one character between a and f (inclusive), the \{,6\} lets it match between 0 and 6 instances of the previous item (in this case the character class [a-f]), and finally the $ tells it that the next thing must be the end of the line.
This will return a list of all instances of #000000 - #ffffff in the buffer, although this pattern may not be restrictive enough for your purposes.
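(let ((my-colour-list nil))
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "#[0-9a-fA-F]\\{6\\}" nil t)
      ;; push instead of add-to-list, which is not meant for let-bound variables
      (let ((colour (match-string-no-properties 0)))
        (unless (member colour my-colour-list)
          (push colour my-colour-list)))))
  my-colour-list)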
I'm not actually certain that this is what you were asking for. What do you mean by "dictionary"?
A form that will return you a hash table with all the elements you specify in it could be this:
(let ((hash-table (make-hash-table :test 'equal)))
  (dotimes (i (expt 256 3))
(puthash (concat "#" (format "%06x" i)) t hash-table))
hash-table)
I'm not sure how Emacs will manage that number of elements (16 million). Since you don't want the zeros, you can generate the space without the zero-padded format and strip the leading and trailing 0's. I don't know what you want to do with the rest of the digits. You can then write the function step by step like this:
(defun find-hex-space ()
(let (return-list)
    (dotimes (i (expt 256 3))
      (let* ((hex-number (strip-zeros (format "%x" i)))
             (found-word (gethash hex-number *dictionary*)))
        (when found-word (push found-word return-list))))
return-list))
Function strip-zeros is easy to write, and here I suppose your words are in a hash called *dictionary*. strip-zeros could be something like this:
(defun strip-zeros (string)
(let ((sm (string-match "^0*\\(.*?\\)0*$" string)))
(if sm (match-string 1 string) string)))
I don't quite understand your second question. Would those words also come from the hex space? And would you consider only words formed entirely from digits, or also words that mix in letters?