Clojure split lines with all empty lines kept


I'd like to split a string by lines, keeping all empty lines, including trailing ones. The basic functions I found keep the empty lines in the middle but drop the trailing ones:
user=> (require 'clojure.string)
nil
user=> (clojure.string/split-lines "a\n\nb\n")
["a" "" "b"]
user=> (clojure.string/split "a\n\nb\n" #"\n")
["a" "" "b"]
I'd like the last empty line(s) kept, as in this Python example:
>>> 'a\n\nb\n'.split('\n')
['a', '', 'b', '']
What is the right way to obtain that in Clojure?

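Try passing a negative limit as the third argument:
user=> (clojure.string/split "a\n\nb\n" #"\n" -1)
["a" "" "b" ""]
With a negative limit, the underlying java.util.regex.Pattern#split applies the pattern as many times as possible and keeps trailing empty strings instead of discarding them. See https://clojuredocs.org/clojure.string/split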
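If you need this in several places, the call can be wrapped in a helper. A minimal sketch, assuming you also want Windows line endings handled the same way clojure.string/split-lines handles them (it splits on #"\r?\n"); the name split-lines-keep-empty is hypothetical:

```clojure
(require '[clojure.string :as str])

(defn split-lines-keep-empty
  "Hypothetical helper: splits s on \n or \r\n and keeps every empty line,
  including trailing ones. The negative limit stops Pattern#split from
  discarding trailing empty strings."
  [s]
  (str/split s #"\r?\n" -1))

(split-lines-keep-empty "a\n\nb\n") ;; => ["a" "" "b" ""]
(split-lines-keep-empty "a\r\nb")   ;; => ["a" "b"]
```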

Related

How to correctly check if a string is equal to another string in Clojure?

I am looking for better ways to check if two strings are equal in Clojure!
Given a map report like
{:Result Pass}
when I evaluate
(type (:Result report))
I get java.lang.String.
To write a check for the value of :Result, I first tried
(if (= (:Result report) "Pass") (println "Pass"))
But the check fails.
So I used compare, which worked:
(if (= 0 (compare (:Result report) "Pass")) (println "Pass"))
However, I was wondering whether there is anything equivalent to Java's .equals() method in Clojure, or a better way to do the same.

= is the correct way to check strings for equality; for strings it delegates to Java's .equals, so it is the direct equivalent. If it's giving you unexpected results, you most likely have extra whitespace in the string, such as a trailing newline.
You can easily check for whitespace by calling vec on the string:
user=> (vec " Pass\n")
[\space \P \a \s \s \newline]
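A small REPL sketch of that diagnosis; the value " Pass\n" is a made-up stand-in for data read from a file or a service. It shows that = and .equals agree on strings, that prn makes hidden whitespace visible, and that trimming restores the expected comparison:

```clojure
(require '[clojure.string :as str])

;; Hypothetical value with stray whitespace, e.g. read from a file.
(def result " Pass\n")

(= result "Pass")            ;; => false, the whitespace makes the strings differ
(.equals result "Pass")      ;; => false, the same answer as =
(prn result)                 ;; prints " Pass\n" with the newline escaped, so it is visible
(= (str/trim result) "Pass") ;; => true once the whitespace is trimmed
```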

As @Carcigenicate wrote, use = to compare strings.
(= "hello" "hello")
;; => true
If you want to be less strict, consider normalizing your strings before you compare them. For example, with a leading space the strings aren't equal:
(= " hello" "hello")
;; => false
We can then define a normalize function that works for us. In this case, it ignores leading and trailing whitespace and capitalization.
(require '[clojure.string :as string])
(defn normalize [s]
  (string/trim
   (string/lower-case s)))
(= (normalize " hellO")
   (normalize "Hello\t"))
;; => true
Hope that helps!
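An alternative sketch for the same "ignore case and surrounding whitespace" comparison: Java's String.equalsIgnoreCase, called through interop, compares case-insensitively character by character without building lower-cased copies of both strings. The helper name is hypothetical:

```clojure
(require '[clojure.string :as string])

(defn equal-ignoring-case-and-space?
  "Hypothetical helper: true when a and b are equal after trimming leading and
  trailing whitespace, ignoring case, via String.equalsIgnoreCase."
  [^String a ^String b]
  (.equalsIgnoreCase (string/trim a) (string/trim b)))

(equal-ignoring-case-and-space? " hellO" "Hello\t") ;; => true
(equal-ignoring-case-and-space? "Pass" "Fail")      ;; => false
```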

replace multiple bad characters in clojure

I am trying to replace bad characters in an input string.
Characters should be valid UTF-8 characters (tabs, line breaks, etc. are OK).
However, I was unable to figure out how to replace all of the bad characters found.
My solution works only for the first bad character.
Usually there are no bad characters; in about 1 of 50 cases there is one. I just want to make my solution foolproof.
(require 'clojure.string)
(defn filter-to-utf-8-string
  "Return only good utf-8 characters from the input."
  [input]
  (let [bad-characters (set (re-seq #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" input))
        filtered-string (clojure.string/replace input (apply str (first bad-characters)) "")]
    filtered-string))
How can I make replace work for all values in the sequence, not just for the first one?
A friend of mine helped me find a workaround for this problem:
I build a character class for replace using re-pattern.
Within the let, the code is currently:
bad-pattern (if (seq bad-characters)
              (re-pattern (str "[" (clojure.string/join bad-characters) "]"))
              #"")
filtered-string (clojure.string/replace input bad-pattern "")
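Note that clojure.string/replace already replaces every match when it is given a regex pattern, so collecting the bad strings into a set and building a second character class is not strictly needed. A minimal sketch that passes the question's own character class straight to replace (the function name is reused from the question; the test strings are made up):

```clojure
(require '[clojure.string :as str])

(defn filter-to-utf-8-string
  "Removes every run of characters that are not letters, digits, whitespace,
  punctuation, currency symbols or '+'."
  [input]
  ;; With a regex as the match argument, str/replace replaces ALL matches,
  ;; not just the first one.
  (str/replace input #"[^\p{L}\p{N}\s\p{P}\p{Sc}\+]+" ""))

(filter-to-utf-8-string "a\u2603b\u0001") ;; => "ab"
```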

Here is a simple version:
(ns xxxxx
  (:require
   [clojure.string :as str]))
(def all-chars (str/join (map char (range 32 80))))
(println all-chars)
(def char-L (str/join (re-seq #"[\p{L}]" all-chars)))
(println char-L)
(def char-N (str/join (re-seq #"[\p{N}]" all-chars)))
(println char-N)
(def char-LN (str/join (re-seq #"[\p{L}\p{N}]" all-chars)))
(println char-LN)
all-chars => " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNO"
char-L => "ABCDEFGHIJKLMNO"
char-N => "0123456789"
char-LN => "0123456789ABCDEFGHIJKLMNO"
So we start off with all ASCII characters with codes 32 through 79 (range excludes its upper bound of 80). We first print only the letters, then only the digits, then either letters or digits. This approach should work for your problem, although instead of rejecting characters outside the desired set, we keep only the members of the desired set.
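Applied to the question, the "keep the members" approach could look like the sketch below. It assumes the allowed set is the complement of the question's bad-character class (letters, digits, whitespace, punctuation, currency symbols and '+'); the function name is hypothetical:

```clojure
(require '[clojure.string :as str])

(defn keep-allowed-chars
  "Hypothetical helper: keeps only letters, digits, whitespace, punctuation,
  currency symbols and '+', dropping everything else."
  [input]
  ;; re-seq returns each allowed character as a match (or nil if none);
  ;; str/join turns that sequence back into a string (nil becomes "").
  (str/join (re-seq #"[\p{L}\p{N}\s\p{P}\p{Sc}+]" input)))

(keep-allowed-chars "a\u0000b\u2603c") ;; => "abc"
```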

What is idiomatic clojure to validate that a string has only alphanumerics and hyphen?

I need to ensure that a certain input only contains lowercase letters and hyphens. What's the most idiomatic Clojure way to accomplish that?
In JavaScript I would do something like this:
if (str.match(/^[a-z\-]+$/)) { ... }
What's a more idiomatic way in Clojure, or if this is it, what's the syntax for regex matching?

user> (re-matches #"^[a-z\-]+$" "abc-def")
"abc-def"
user> (re-matches #"^[a-z\-]+$" "abc-def!!!!")
nil
user> (if (re-find #"^[a-z\-]+$" "abc-def")
        :found)
:found
user> (re-find #"^[a-zA-Z]+" "abc.!#####123")
"abc"
user> (re-seq #"^[a-zA-Z]+" "abc.!#####123")
("abc")
user> (re-find #"\w+" "0123!#####ABCD")
"0123"
user> (re-seq #"\w+" "0123!#####ABCD")
("0123" "ABCD")

Using a regular expression is fine here. To match a string against a regex in Clojure, you can use the built-in re-find function.
So, your example in Clojure would look like:
(if (re-find #"^[a-z\-]+$" s)
  :true
  :false)
Note that your regex matches only lowercase Latin letters a-z and the hyphen -.

While re-find surely is an option, re-matches is what you'd want for matching a whole string without having to provide ^...$ wrappers:
(re-matches #"[-a-z]+" "hello-there")
;; => "hello-there"
(re-matches #"[-a-z]+" "hello there")
;; => nil
So, your if-construct could look like this:
(if (re-matches #"[-a-z]+" s)
  (do-something-with s)
  (do-something-else-with s))
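If the check is needed in several places, it can be wrapped in a predicate. A minimal sketch; the name slug? is hypothetical, and the string? guard is an added assumption so that nil input returns false instead of throwing a NullPointerException from re-matches:

```clojure
(defn slug?
  "Hypothetical predicate: true when s is a non-empty string made only of a-z and '-'."
  [s]
  (and (string? s)
       (some? (re-matches #"[-a-z]+" s))))

(slug? "abc-def") ;; => true
(slug? "abc def") ;; => false
(slug? "")        ;; => false, + requires at least one character
(slug? nil)       ;; => false
```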

How to compare two regexps in Clojure?

I’m unit-testing a function which builds a regexp, but using = doesn’t work. How can I test that it returns the correct regexp?
Here is what I tried for an empty regexp:
(= #"" #"") ; false
(== #"" #"") ; ClassCastException java.util.regex.Pattern cannot be cast to java.lang.Number
(identical? #"" #"") ; false
(.equals #"" #"") ; false
Is there a Clojure-ish way to do that, or do I have to convert both regexps to strings then compare them?

Unfortunately there is no better way; you have to compare the strings:
user> (= (str #"foo") (str #"foo"))
true
user> (= (str #"foo") (str #"fooo"))
false
Even this is not perfect, because it doesn't catch regular expressions that look different but match the same strings.
user> (re-seq #"[a]" "aaaa")
("a" "a" "a" "a")
user> (re-seq #"a" "aaaa")
("a" "a" "a" "a")
user> (= (str #"a") (str #"[a]"))
false
This is the same reason that you can't compare functions for equality either. I suspect that Clojure does not implement equality for regexes because it would be impractical to determine whether two regexes match the same strings (or to settle on some other notion of equality).

This is because Clojure regex literals are java.util.regex.Pattern objects, and Pattern does not override equals, so two separately compiled patterns are equal only if they are the same object.
If you write a Java program that compares two Pattern objects like this, it will also return false.
The only way to do it is to compare the patterns' source strings.
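For unit tests, a small helper can make that string comparison explicit. A sketch with a hypothetical name; it also compares the flags, since a Pattern built with Pattern/compile and flags such as CASE_INSENSITIVE has the same source string as one without them. Like the string comparison above, it does not detect differently written patterns that match the same strings:

```clojure
(import 'java.util.regex.Pattern)

(defn same-pattern?
  "Hypothetical test helper: true when two Patterns have the same source
  string and the same compile flags."
  [^Pattern a ^Pattern b]
  (and (= (.pattern a) (.pattern b))
       (= (.flags a) (.flags b))))

(same-pattern? #"" #"")                 ;; => true
(same-pattern? #"foo" (re-pattern "foo")) ;; => true
(same-pattern? #"a" #"[a]")             ;; => false
```

In a clojure.test suite this could be used as (is (same-pattern? expected actual)).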

Iterating through a map with doseq

I'm new to Clojure and I'm working through some basic exercises from labrepl. Now I want to write a function that replaces certain letters with other letters, for example: elosska → elößkä.
I wrote this:
(ns student.dialect (:require [clojure.string :as str]))
(defn germanize
  [sentence]
  (def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
  (doseq [[original-letter new-letter] german-letters]
    (str/replace sentence original-letter new-letter)))
but it doesn't work as I expect. Could you help me, please?

Here is my take:
(require '[clojure.string :as str])
(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
(defn germanize [s]
  (reduce (fn [sentence [match replacement]]
            (str/replace sentence match replacement))
          s
          german-letters))
(germanize "elosska")
;; => "elößkä"

There are two problems here:
doseq is meant for side effects and always returns nil, so the values computed inside it are discarded and you get no result.
Each str/replace call works on the original sentence rather than on the previous result, producing four separate strings; you can check this by replacing doseq with for, which returns a list of four entries.
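Both points can be checked at the REPL. A sketch; the name separate-results is made up for illustration, and the order of the results follows the map's key order, which for a small literal map like this one is the insertion order:

```clojure
(require '[clojure.string :as str])

(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})

;; doseq runs its body only for side effects and returns nil.
(doseq [[original-letter new-letter] german-letters]
  (str/replace "elosska" original-letter new-letter))
;; => nil

;; Each str/replace starts again from the original "elosska",
;; so every result contains only one of the four replacements.
(def separate-results
  (for [[original-letter new-letter] german-letters]
    (str/replace "elosska" original-letter new-letter)))
;; separate-results => ("elosskä" "elosska" "elösska" "eloßka")
```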
Your code could be rewritten the following way:
(def german-letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"})
(defn germanize [sentence]
  (loop [text sentence
         letters german-letters]
    (if (empty? letters)
      text
      (let [[original-letter new-letter] (first letters)]
        (recur (str/replace text original-letter new-letter)
               (rest letters))))))
Here each intermediate result is passed on to the next iteration, so all replacements are applied to the same string, producing the correct result:
user> (germanize "elosska")
"elößkä"
P.S. It's also not recommended to use def inside a function, because def always creates or redefines a namespace-level var; use def for top-level forms and let for local bindings.

Alex has of course already correctly answered the question with respect to the original issue using doseq... but I found the question interesting and wanted to see what a more "functional" solution would look like. And by that I mean without using a loop.
I came up with this:
(ns student.dialect (:require [clojure.string :as str]))
(defn germanize [sentence]
  (let [letters {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"}
        regex (re-pattern (apply str (interpose \| (keys letters))))]
    (str/replace sentence regex letters)))
Which yields the same result:
student.dialect=> (germanize "elosska")
"elößkä"
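The (re-pattern ...) line simply builds an alternation of the map's keys, such as #"ss|a|o|u" (the exact order follows the map's key order, which does not matter here because no key is a prefix of another). Entering that regex as an explicit literal would have been cleaner and simpler to read, but I thought it best to have only one definition of the German letters.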
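The same single-pass idea can be generalized to arbitrary replacement maps. A sketch with a hypothetical name, replace-all: keys are wrapped with java.util.regex.Pattern/quote so characters such as . or * are matched literally, and longer keys are placed first in the alternation so that a key like "ss" wins over an overlapping shorter key "s" (a case the German map above does not have). An empty map is returned unchanged, because an empty alternation would match everywhere:

```clojure
(require '[clojure.string :as str])

(defn replace-all
  "Hypothetical helper: applies every key -> value replacement in one pass over s."
  [s replacements]
  (if (empty? replacements)
    s
    (let [alternatives (->> (keys replacements)
                            (sort-by count >) ; longest keys first
                            (map #(java.util.regex.Pattern/quote %)))
          pattern      (re-pattern (str/join "|" alternatives))]
      ;; str/replace accepts a function of the match; a map works as that function.
      (str/replace s pattern replacements))))

(replace-all "elosska" {"a" "ä" "u" "ü" "o" "ö" "ss" "ß"}) ;; => "elößkä"
(replace-all "a.b" {"." "-"})                            ;; => "a-b"
```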
