test.check generate strings of a certain length - clojure

In using test.check I need a generator for strings of a certain length. Phone numbers, postal codes, social security numbers are all examples of this type of data. Although the examples appear to be only numbers, my question is for strings in general.

Given length the generator below generates random strings:
(gen/fmap #(apply str %)
          (gen/vector gen/char-alpha length))
(gen/vector gen/char-alpha length) generates sequences of characters and the fmap converts them into strings:
(apply str [\a \b]) ;; => "ab"
If a custom alphabet (say [\a \b \c]) is needed gen/char-alpha can be substituted with something like:
(gen/elements alphabet)
For more complex generators, like formatted phone numbers, test.chuck's string-from-regex might be a better choice than manually combining official generators.
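As a minimal sketch of how the pieces above combine, the generator can be wrapped in a function that takes the length and, optionally, a character generator. The names string-of-length, digit and phone-like are hypothetical and not part of test.check; only the gen/ functions come from the library.

```clojure
(require '[clojure.string :as str]
         '[clojure.test.check.generators :as gen])

;; Hypothetical helper (not part of test.check): a generator of strings with
;; exactly `length` characters, each drawn from `char-gen`.
(defn string-of-length
  ([length] (string-of-length length gen/char-alpha))
  ([length char-gen]
   (gen/fmap str/join (gen/vector char-gen length))))

;; Custom alphabet: digits only.
(def digit (gen/elements (seq "0123456789")))

;; A phone-number-like string such as "555-555-5555", built from three
;; fixed-length digit strings joined with dashes.
(def phone-like
  (gen/fmap #(str/join "-" %)
            (gen/tuple (string-of-length 3 digit)
                       (string-of-length 3 digit)
                       (string-of-length 4 digit))))

(gen/sample (string-of-length 5) 3) ; three random 5-letter strings
(gen/generate phone-like)           ; one random value, handy at the REPL
```

Building the formatted value from fixed-length parts keeps each piece shrinkable on its own, which is one reason to prefer it over generating one long string and inserting separators afterwards.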

This function will generate a string of a given length with characters from a given alphabet (optional). If you don't pass any alphabet as an argument, a default will be used, which you can of course change.
(defn generate-string
  ([length]
   (generate-string length
                    (map char (range 49 127))))
  ([length alphabet]
   (apply str (take length (repeatedly #(rand-nth alphabet))))))
Examples:
(generate-string 7 [\a \b \c])
"bacacbb"
(generate-string 10)
"mxqE<OKH3L"

You can use the more primitive generators to quickly build one that does just that:
For alphanumeric strings with a length between min and max:
(sgen/fmap str/join (sgen/vector (sgen/char-alphanumeric) min max))
For alphanumeric strings of exactly a given length:
(sgen/fmap str/join (sgen/vector (sgen/char-alphanumeric) length))
You can swap (sgen/char-alphanumeric) for whatever character range you need. For example, here is a string of length min to max made of alphanumeric characters plus underscore and dash, with each kind of character showing up at a different frequency:
(sgen/fmap str/join
           (sgen/vector
            (sgen/frequency [[99 (sgen/char-alphanumeric)]
                             [1 (sgen/elements #{"_" "-"})]])
            min max))
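For completeness, here is a self-contained sketch of the last generator. It assumes that sgen is an alias for clojure.spec.gen.alpha and str for clojure.string (the snippets above do not show their requires), and the name slug-gen is hypothetical.

```clojure
(require '[clojure.spec.gen.alpha :as sgen]
         '[clojure.string :as str])

;; Hypothetical helper: strings of length min-len..max-len, mostly
;; alphanumeric, with an occasional underscore or dash.
(defn slug-gen [min-len max-len]
  (sgen/fmap str/join
             (sgen/vector
              (sgen/frequency [[99 (sgen/char-alphanumeric)]
                               [1 (sgen/elements [\_ \-])]])
              min-len max-len)))

(sgen/sample (slug-gen 3 8) 5) ; five random strings, each 3 to 8 characters long
```

Note that clojure.spec.gen.alpha loads test.check lazily, so test.check still has to be on the classpath for this to run.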

Related

(Clojure) Count how many times any character appear in a string

I am trying to write a function (char-count) which takes a pattern and a string, then returns a number (count) which represents how many times any of the characters in the pattern appear in the string.
For example:
(char-count "Bb" "Best buy")
would return 2 since there is 1 match for B and 1 match for b, so added together we get 2
(char-count "AaR" "A Tale of Recursion")
would return 3 and so on
I tried using re-seq in my function, but it seems to work only for continuous strings. As in, (re-seq #"Bb" "Best Buy") only looks for the pattern Bb, not for each individual character.
This is what my function looks like so far:
(defn char-count [pattern text]
(count (re-seq (#(pattern)) text)))
But it does not do what I want. Can anybody help?
P.s. Very new to clojure (and functional programming in general).
You don't need anything nearly as powerful as a regular expression here, so just use the simple tools your programming language comes with: sets and functions. Build a set of the characters you want to find, and count how many characters from the input string are in the set.
(defn char-count [chars s]
  (count (filter (set chars) s)))
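The reason this works is that a Clojure set can be called as a function: it returns its argument when the set contains it and nil otherwise, so it can be passed to filter directly as a predicate. A short sketch using the examples from the question:

```clojure
;; A set used as a function acts as a membership test.
((set "Bb") \B) ; => \B
((set "Bb") \e) ; => nil

(defn char-count [chars s]
  (count (filter (set chars) s)))

(char-count "Bb" "Best buy")             ; => 2
(char-count "AaR" "A Tale of Recursion") ; => 3
```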
Try wrapping the characters in [...] within the RegEx:
(count (re-seq #"[Bb]" "Best buy"))
Or, since you need that pattern to be dynamic:
(count (re-seq (re-pattern (str "[" pattern "]")) text))
But note that the solution might not work properly if the pattern contains special RegEx characters such as [, ], \, -, ^ - you'd have to escape them by prepending \\ in front of each one.
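One way to handle that escaping is sketched below, using clojure.string/escape to put a backslash in front of the characters that are special inside a character class. The names char-class-pattern and char-count-re are hypothetical, and the list of escaped characters is an assumption that covers the ones mentioned above. An empty pattern would still produce the invalid regex [], so it would need separate handling.

```clojure
(require '[clojure.string :as str])

;; Characters that have a special meaning inside [...] mapped to their escaped form.
(def char-class-escapes
  {\[ "\\[", \] "\\]", \\ "\\\\", \- "\\-", \^ "\\^"})

;; Hypothetical helper: builds a regex matching any single character of `chars`.
(defn char-class-pattern [chars]
  (re-pattern (str "[" (str/escape chars char-class-escapes) "]")))

(defn char-count-re [pattern text]
  (count (re-seq (char-class-pattern pattern) text)))

(char-count-re "Bb" "Best buy") ; => 2
(char-count-re "a-c" "a-b-c")   ; => 4, the dash is matched literally, not as a range
```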

How to compare two @ character in clojure

Please, how can I write this in clojure? I have an @ character and I want to compare it to "@".
eg (= @ "@")
gives error (ClassCastException java.lang.String cannot be cast to java.util.concurrent.Future clojure.core/deref-future (core.clj:2206))
There's an inherent conflict in this comparison, in that "@" is a string (a sequence of characters) and \@ is an individual character. If you wanted to confirm that the string consisted of a single character which matched the \@ character, something like the following would work:
(let [s "@"]
  (and (= \@ (first s)) (= (count s) 1)))
However, if you want to detect whether the string contains any \@ characters, or whether it just starts with an \@ character, that requires different code. This is the problem with comparing strings and characters: it's not inherently obvious what you need from the comparison.
Why do you get the error? The reader translates your example into ...
(= (deref "@"))
The deref function tests whether the argument is deref-able (implementing IDeref). If not, it treats the argument as a future. It isn't, so it throws the confusing exception. This behaviour is a defect, albeit a minor one.
By the way, (= x) returns true for any x, if it returns at all.
Tim Clemons' answer shows what you can do about this.
You can write your @ as \@ for Clojure to interpret it as a character literal. By default @ is a reader macro for a deref form, which makes many other things in Clojure less verbose. Anyway:
(= \@ \@)
true
If you want to check that the first character of a string is an @ then:
(= \@ (first "@"))
true
The following only works in ClojureScript, as it doesn't have a character type and just uses strings of length one.
(= \@ "@")
true ;; in cljs only

Clojure, replace each instance of a character with a different value from a collection

First of all, I am not sure how to easily word the title.
The problem I have is: given a string such as "insert value here ?", I want to be able to swap the ? with a value of my choice. I can do this using clojure.string/replace.
Now, the use case I require is slightly more complex, given a string like:
these are the specified values: ?, ?, ?, ?
I want to replace the values of the ? with values from a collection which could look like:
[2 389 90 13]
so in this example the string would now read:
these are the specified values: 2, 389, 90, 13
so ? x maps to collection x (e.g. ? 0 maps to collection 0)
The number of ? will not always be 4 or a specific n, however the length of the collection will always be the same as the number of ?.
I tried doing the following:
(mapv #(clojure.string/replace-first statement "?" %) [1 2 3 4])
But this doesn't produce the desired result: it produces a vector of size 4 in which only the first ? of each string is replaced by the value.
I am lost due to the inability to modify variables in clojure, and I don't want to have a global string which is redefined and passed to a function n times.
While I agree that DaoWen's answer is likely the most practical, the end of your question seems worth discussing a little as well, as a matter of learning functional approaches. You're essentially looking for a way to
1. Take the initial string and the first value and use replace-first to make another string from them.
2. Take the result of that and the next value from the sequence and use replace-first on them.
3. Repeat 2 until you've gone through the entire sequence.
This is actually a classic pattern in mathematics and functional programming, called a "left fold" or "reduction" over the value sequence. Functional languages usually build it into their standard libraries as a higher order function. In Clojure it's called reduce. Implementing your attempted strategy with it looks something like
;; replace-first expects a string replacement, so each number is converted with str
(reduce #(clojure.string/replace-first %1 "?" (str %2))
        "these are the specified values: ?, ?, ?, ?"
        [2 389 90 13])
; => "these are the specified values: 2, 389, 90, 13"
Note that unlike your similar function literal, this takes two arguments, so that the statement can be rebound as we proceed through the reduction.
If you want to see what happens along the way with a reduce, you can swap it out with reductions. Here you get
("these are the specified values: ?, ?, ?, ?"      ; after 0 applications of our replace-first fn
 "these are the specified values: 2, ?, ?, ?"      ; intermediate value after 1 application
 "these are the specified values: 2, 389, ?, ?"    ; after 2...
 "these are the specified values: 2, 389, 90, ?"
 "these are the specified values: 2, 389, 90, 13") ; final value, the one returned by reduce
There might be other considerations that aren't covered by your question—but as written, it seems like you should just be using the string format function:
(apply format
       "these are the specified values: %s, %s, %s, %s"
       [2 389 90 13])
; => "these are the specified values: 2, 389, 90, 13"
DaoWen has given the pragmatic answer for strings but you do have to replace your "?" with "%s" first.
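That conversion step can be folded into a small helper. The sketch below uses a hypothetical name, fill-placeholders, and assumes every ? in the template is a placeholder. It also doubles any literal % first so that format does not treat it as a directive.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: turns "?" placeholders into "%s" and applies format.
(defn fill-placeholders [template values]
  (let [fmt (-> template
                (str/replace "%" "%%")   ; keep literal percent signs intact
                (str/replace "?" "%s"))] ; each ? becomes a format directive
    (apply format fmt values)))

(fill-placeholders "these are the specified values: ?, ?, ?, ?" [2 389 90 13])
;; => "these are the specified values: 2, 389, 90, 13"
```

Unlike the reduce approach, format throws if there are fewer values than placeholders, which may or may not be what you want.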
But suppose this wasn't a string. Clojure has a great collection library that often means you can avoid recursion or reducers but I couldn't think of a way to do that here without getting really inefficient and messy. So here's a reduce solution applicable to non-strings, with lots of comments.
(let [original "these are the specified values: ?, ?, ?, ?"
      replacements [2 389 90 13]
      starting-accumulator [[] replacements] ; Start with an empty vector and all of the replacements.
      reducing-fn (fn [[acc [r & rs :as all-r]] o] ; Apply destructuring liberally.
                    (if (= o \?) ; If the original character is \?,
                      [(conj acc r) rs] ; then add the replacement value and keep the rest of the replacements.
                      [(conj acc o) all-r])) ; Else add the original character and keep all the remaining replacements.
      reduced (reduce reducing-fn starting-accumulator original) ; Run the reduce.
      result (first reduced) ; Get the resulting seq (discard the second item: the remaining empty seq of replacements).
      string-joined (apply str result)] ; The string was turned into a seq of chars by `reduce`. Turn it back into a string.
  string-joined)

Phone number regular expressions

Included is some code that can find a single number such as "555-555-5555" in a string. But I'm not quite sure how to extend the code to find all phone numbers within a string. The code stops after it has found the first number...
(defn foo [x]
  (re-find (re-matcher #"((\d+)-(\d+)-(\d+))" x)))
Is there a way to extend this code to find all numbers within a string?
re-seq returns a sequence of all the matches to a regex in a string:
user> (defn foo [x] (re-seq #"\d+-\d+-\d+" x))
#'user/foo
user> (foo "111-222-3333 555-666-7777")
("111-222-3333" "555-666-7777")
user> (foo "phone 1: 111-222-3333 phone 2: 555-666-7777")
("111-222-3333" "555-666-7777")
So it will keep going until it finds all the phone numbers in the string.
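If you keep the capture groups from the original pattern, re-seq returns one vector per match instead of a plain string: the whole match first, followed by each group. A short sketch, with the hypothetical name phone-parts:

```clojure
;; With capture groups, each match is a vector: [whole-match group-1 group-2 group-3].
(defn phone-parts [s]
  (re-seq #"(\d+)-(\d+)-(\d+)" s))

(phone-parts "111-222-3333 555-666-7777")
;; => (["111-222-3333" "111" "222" "3333"] ["555-666-7777" "555" "666" "7777"])

;; Keep only the whole matches.
(map first (phone-parts "111-222-3333 555-666-7777"))
;; => ("111-222-3333" "555-666-7777")
```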
If you are interested in searching for all possible phone numbers, depending on the region / country code and other parameters, check the phone-number library:
https://github.com/randomseed-io/phone-number
There is a function find-numbers for that purpose:
https://randomseed.io/software/phone-number/phone-number.core#var-find-numbers

regexp for elisp

In Emacs I would like to write some regexp that does the following:
First, return a list of all dictionary words that can be formed within "hex space". By this I mean:
#000000 - #ffffff
so #00baba would be a word (that can be looked up in the dictionary)
so would #baba00
and #abba00
and #0faded
...where trailing and leading 0's are considered irrelevant. How would I write this? Is my question clear enough?
Second, I would like to generate a list of words that can be made using numbers as letters:
0 = o
1 = i
3 = e
4 = a
...and so on. How would I write this?
First, load your dictionary. I'll assume that you're using /usr/share/dict/words, which is nearly always installed by default when you're running Linux. It lists one word per line, which is a very handy format for this sort of thing.
Next run M-x keep-lines. It'll ask you for a regular expression and then delete any line that doesn't match it. Use the regex ^[a-f]\{,6\}$ and it will filter out anything that can't be part of a color.
Specifically, the ^ makes the regex start at the beginning of the line, the [a-f] matches any one character that is between a and f (inclusive), the \{,6\} lets it match between 0 and 6 instances of the previous item (in this case the character class [a-f]), and finally the $ tells it that the next thing must be the end of the line.
This will return a list of all instances of #000000 - #ffffff in the buffer, although this pattern may not be restrictive enough for your purposes.
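(let ((my-colour-list nil))
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "#[0-9a-fA-F]\\{6\\}" nil t)
      ;; `add-to-list' is unreliable on let-bound variables under lexical
      ;; binding, so check for duplicates and `push' instead.
      (let ((colour (match-string-no-properties 0)))
        (unless (member colour my-colour-list)
          (push colour my-colour-list))))
    my-colour-list))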
I'm not actually certain that this is what you were asking for. What do you mean by "dictionary"?
A form that will return you a hash table with all the elements you specify in it could be this:
(let ((hash-table (make-hash-table :test 'equal)))
  ;; `expt' is exponentiation; `exp' takes a single argument
  (dotimes (i (expt 256 3))
    (puthash (concat "#" (format "%06x" i)) t hash-table))
  hash-table)
I'm not sure how Emacs will cope with that many elements (about 16.7 million). Since you don't want the 0's, you can generate the space without the zero-padded format and strip the leading and trailing 0's. I don't know what you want to do with the rest of the numbers. You can then write the function step by step like this:
(defun find-hex-space ()
  (let (return-list)
    (dotimes (i (expt 256 3))
      (let* ((hex-number (strip-zeros (format "%x" i)))
             (found-word (gethash hex-number *dictionary*)))
        (if found-word (push found-word return-list))))
    return-list))
Function strip-zeros is easy to write, and here I suppose your words are in a hash called *dictionary*. strip-zeros could be something like this:
(defun strip-zeros (string)
  (let ((sm (string-match "^0*\\(.*?\\)0*$" string)))
    (if sm (match-string 1 string) string)))
I don't quite understand your second question. Would the words also be restricted to the hex space? Would you then consider only the words formed from digits, or would you also include the letters in the word?