Early in the morning playing with Erlang I got a curious result:
-module(bucle01).
-compile(export_all).

for(N) when N >= 0 ->
    lists:seq(1, N).

for(L, N) when L =< N ->
    lists:seq(L, N);
for(L, N) when L > N ->
    lists:reverse(for(N, L)).
When I run the program I see this:
> bucle01:for(1,10).
[1,2,3,4,5,6,7,8,9,10]
> bucle01:for(10,1).
[10,9,8,7,6,5,4,3,2,1]
> bucle01:for(7,10).
[7,8,9,10]
> bucle01:for(8,10).
"\b\t\n" %% What's that !?!
> bucle01:for(10,8).
"\n\t\b" %% After all it has some logic !
Any "Kool-Aid" to "Don't drink too much" please ?
Strings in Erlang are just lists of integers (character codes). The Erlang shell has no metadata telling it whether a list is meant as numbers or as text, so it guesses: if every element looks like a printable character, it prints the list as a string.
\b (backspace, 8), \t (tab, 9) and \n (newline, 10) are all fairly common ASCII characters, so the shell counts them as printable and shows you the string instead of the numbers. The list [7,8,9,10] is still printed as numbers because 7 (bell) is not one of them. The internal structure of the list is exactly the same either way.
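The guess described above can be modelled roughly as follows. This is a sketch in Haskell rather than Erlang, and it is not the shell's actual implementation: the names looksLikeString and render are hypothetical, and the exact set of characters treated as printable is an assumption chosen to match the outputs in the question.

```haskell
import Data.Char (chr)

-- Hypothetical model of the shell's guess: a non-empty list counts as a
-- string when every element is printable ASCII or a common control
-- character. The exact set below is an assumption, not Erlang's real rule.
looksLikeString :: [Int] -> Bool
looksLikeString xs = not (null xs) && all printable xs
  where
    printable n = (n >= 32 && n <= 126) || n `elem` [8, 9, 10, 13]

-- Render a list of integers the way such a shell would print it.
render :: [Int] -> String
render xs
  | looksLikeString xs = show (map chr xs)  -- printed as a quoted string
  | otherwise          = show xs            -- printed as a list of numbers
```

Under this model, render [8,9,10] is printed as a string while render [7,8,9,10] stays a list of numbers, because 7 is outside the assumed printable set.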
This is also covered by the Erlang FAQ:
Why do lists of numbers get printed incorrectly?
And here are a few ideas to prevent this magic: Can I disable printing lists of small integers as strings in Erlang shell?
I'm quite new to Haskell and I'm trying to solve the following problem:
I have a function, that produces an infinite list of strings with different lengths. But the number of strings of a certain length is restricted.
Now I want to extract all strings of a certain length n from the list. I did a lot of research and tried a lot of things, but nothing worked for me.
I know that filter won't work on its own, as it checks every element of the list and so never terminates.
This is my function that generates the infinite list:
allStrings = [ c : s | s <- "" : allStrings, c <- ['R', 'T', 'P']]
I've already tried this:
allStrings = [x | x <- [ c : s | s <- "" : allStrings,
                                 c <- ['R', 'T', 'P']], length x == 4]
which didn't terminate.
Thanks for your help!
This
allStrings4 = takeWhile ((== 4) . length) .
              dropWhile ((< 4) . length) $ allStrings
does the trick.
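It works because your (first) allStrings definition cleverly generates all strings made of the letters 'R', 'T', and 'P' in a productive manner, in non-decreasing order of length. Once dropWhile has skipped the shorter strings, takeWhile can stop at the first string that is longer than 4, so the result is finite.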
Instead of trying to cram it all into one definition, separate your concerns! Build a solution to the more general problem first (this is your allStrings definition), then use it to solve the more restricted problem. This will often be much simpler, especially with the lazy evaluation of Haskell.
We just need to take care that our streams are always productive, never stuck.
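The same idea generalises to any length. The sketch below reuses the allStrings definition from the question; the name stringsOfLength is hypothetical, and the function assumes n is at least 1, since allStrings never contains the empty string.

```haskell
-- All non-empty strings over "RTP", in non-decreasing order of length
-- (the definition from the question).
allStrings :: [String]
allStrings = [c : s | s <- "" : allStrings, c <- "RTP"]

-- Strings of exactly length n (n >= 1). Skip the shorter strings, then
-- stop at the first longer one, so the result is a finite list.
stringsOfLength :: Int -> [String]
stringsOfLength n =
  takeWhile ((== n) . length) . dropWhile ((< n) . length) $ allStrings
```

For example, length (stringsOfLength 4) evaluates to 81, which is 3^4, and the computation terminates because takeWhile never has to look past the first string of length 5.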
The problem is that your filter makes it impossible to generate any solutions. In order to generate a string of length 4, you first need to generate a string of length 3, since each new string is built by prepending one character to an existing one. In order to generate a string of length 3, you in turn need strings of length 2, and so on, down to the base case: the empty string.
The filter itself is not the main problem. The problem is that you filter the very list the strings are built from, so the shorter strings are thrown away and no value can ever be emitted.
We can fix this by using a different list that will build strings, and filter that list like:
allStrings = filter ((==) 4 . length) vals
    where vals = [x | x <- [ c : s | s <- "" : vals, c <- "RTP"]]
This will emit all strings of length 4, and then get stuck in an infinite loop, since filter will keep searching the longer strings for more matches and never find any.
We can however do better, for example by using replicateM :: Monad m => Int -> m a -> m [a] here:
Prelude Control.Monad> replicateM 4 "RTP"
["RRRR","RRRT","RRRP","RRTR","RRTT","RRTP","RRPR","RRPT","RRPP","RTRR","RTRT","RTRP","RTTR","RTTT","RTTP","RTPR","RTPT","RTPP","RPRR","RPRT","RPRP","RPTR","RPTT","RPTP","RPPR","RPPT","RPPP","TRRR","TRRT","TRRP","TRTR","TRTT","TRTP","TRPR","TRPT","TRPP","TTRR","TTRT","TTRP","TTTR","TTTT","TTTP","TTPR","TTPT","TTPP","TPRR","TPRT","TPRP","TPTR","TPTT","TPTP","TPPR","TPPT","TPPP","PRRR","PRRT","PRRP","PRTR","PRTT","PRTP","PRPR","PRPT","PRPP","PTRR","PTRT","PTRP","PTTR","PTTT","PTTP","PTPR","PTPT","PTPP","PPRR","PPRT","PPRP","PPTR","PPTT","PPTP","PPPR","PPPT","PPPP"]
Note that here the last character is the one that changes first when we move to the next string. I leave it as an exercise to obtain the reversed result.
I made a Mastodon / Twitter <--> IRC bot a while back. It's been working great, but someone complained that when people use emojis on Mastodon (which seems to happen a lot in some usernames...) it breaks his terminal.
I was wondering if there is a way to remove those from the ByteStrings before sending them to IRC (or at least provide an option to do so). Googling a bit, I found this: removing emojis from a string in Python
Looks like \U0001F600-\U0001F64F should be the emoji range if I understand it correctly, but I've never been big on regex. Is there an easy-ish way to translate that to Haskell? I've tried reading up a bit on regex, but I only get "lexical error in string/character literal at character 'U'" when I try. I assume that syntax must be a Python thing.
Thanks
In Haskell string and character literals, a Unicode character is written as a single backslash followed by its code point: prefixed with x for hexadecimal, with o for octal, or with no prefix for decimal [0]:
putStrLn "\x1f600" -- 😀
Here, \x is a prefix for the hexadecimal representation of the first emoji character in Unicode.
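You can now remove the emojis using a regular expression, or you could simply do:
emojis = concat [['\x1f600'..'\x1f64f'],  -- Emoticons
                 ['\x1f300'..'\x1f5ff'],  -- Miscellaneous Symbols and Pictographs
                 ['\x1f680'..'\x1f6ff'],  -- Transport and Map Symbols
                 ['\x1f1e0'..'\x1f1ff']]  -- includes the regional indicator (flag) letters
someString = "hello 🙋"
removeEmojis = filter (`notElem` emojis)
main = putStrLn . removeEmojis $ someString -- prints "hello "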
[0] Haskell Language 2010: Lexical Structure#Character and String Literals
Not an emoji or Unicode expert, but this seems to work:
isEmoji :: Char -> Bool
isEmoji c = let uc = fromEnum c
            in uc >= 0x1F600 && uc <= 0x1F64F
str = "😁wew😁"
As Daniel Wagner points out, this can be made even better:
isEmoji :: Char -> Bool
isEmoji c = c >= '\x1F600' && c <= '\x1F64F'
Demo in ghci:
λ> str
"\128513wew\128513"
λ> filter isEmoji str
"\128513\128513"
λ> filter (not . isEmoji) str
"wew"
Explanation: fromEnum converts a character to its Unicode code point as an Int. The function then checks whether that code point lies in the range U+1F600 to U+1F64F (the Emoticons block) to decide whether the character is an emoji.
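The two answers can be combined by checking each character against a table of ranges instead of building a list of every emoji. This is a minimal sketch: the names emojiRanges and stripEmoji are hypothetical, the ranges are the ones quoted in the answers above and do not cover every emoji, and it assumes the bot's ByteStrings have already been decoded to String.

```haskell
-- Code point ranges quoted in the answers above. Not an exhaustive
-- list of emoji; extend the table as needed.
emojiRanges :: [(Char, Char)]
emojiRanges =
  [ ('\x1F600', '\x1F64F')
  , ('\x1F300', '\x1F5FF')
  , ('\x1F680', '\x1F6FF')
  , ('\x1F1E0', '\x1F1FF')
  ]

-- True when the character falls inside any of the ranges.
isEmoji :: Char -> Bool
isEmoji c = any (\(lo, hi) -> c >= lo && c <= hi) emojiRanges

-- Drop every character that isEmoji recognises.
stripEmoji :: String -> String
stripEmoji = filter (not . isEmoji)
```

Comparing against four pairs of bounds avoids walking a list of well over a thousand characters for every character in the input, which is what notElem on the concatenated ranges does.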
I'm writing a small function in R as follows:
tags.out <- as.character(tags.out)
tags.out.unique <- unique(tags.out)
z <- NROW(tags.out.unique)
for (i in 1:10) {
  l <- length(grep(tags.out.unique[i], x = tags.out))
  tags.count <- append(x = tags.count, values = l)
}
Basically I'm looking to take each element of the unique character vector (tags.out.unique) and count its occurrences in the original vector (tags.out), before unique was applied.
This above section of code works correctly, however, when I replace for (i in 1:10) with for (i in 1:z) or even some number larger than 10 (18000 for example) I get the following error:
Error in grep(tags.out.unique[i], x = tags.out) :
invalid regular expression 'c++', reason 'Invalid use of repetition operators
I would be extremely grateful if anyone were able to help me understand what's going on here.
Many thanks.
The "+" in "c++" (which you're passing to grep as a pattern string) has a special meaning. However, you want the "+" to be interpreted literally as the character "+", so instead of
grep(pattern="c++", x="this string contains c++")
you should do
grep(pattern="c++", x="this string contains c++", fixed=TRUE)
If you google [regex special characters] or something similar, you'll see that "+", "*" and many others have a special meaning. In your case you want them to be interpreted literally -- see ?grep.
It would appear that one of the elements of tags.out.unique is c++, which is (as the error message plainly states) an invalid regular expression.
Your loop is also inefficient, because it grows tags.count on every iteration. The R Inferno is worth a read, noting especially that growing objects is generally bad form -- it can be extremely inefficient in some cases. If you are going to have a blanket rule, then "not growing objects" is a better one than "avoid loops".
Given that you are simply trying to count the number of times each value occurs, there is no need for the loop or the regex:
counts <- table(tags.out)
# the unique values
names(counts)
should give you the results you want.
In Haskell code from Real World Haskell, Chapter 24, an example of using MapReduce to count the number of LINES in a file is implemented as follows:
import qualified Data.ByteString.Lazy.Char8 as LB
lineCount :: [LB.ByteString] -> Int64
lineCount = mapReduce rdeepseq (LB.count '\n')
                      rdeepseq sum
It's clear to me that this is counting the number of newline characters. If I wanted to count the number of a's, I do:
import qualified Data.ByteString.Lazy.Char8 as LB
lineCount :: [LB.ByteString] -> Int64
lineCount = mapReduce rdeepseq (LB.count 'a')
                      rdeepseq sum
I've tried this, and it works. How do I modify this code to count the number of characters (i.e. the total number of characters present)? Is there some sort of regular expression framework I can use?
It's clear to me that this is counting the number of newline characters.
Well, not really. A ByteString is a string of bytes. (If you want a string of characters, you should use Text from either Data.Text or Data.Text.Lazy, in the text package.)
Data.ByteString.Lazy.Char8 exports an interface that lets you pretend you're working with characters, but it assumes one character = one byte, à la ISO-8859-1 or ASCII. Unicode it ain't.
How do I modify this code to count the number of characters (i.e. the total number of characters present)?
LB.count :: Char -> ByteString -> Int64, so LB.count '\n' has type ByteString -> Int64. To count every character instead, we're looking for another function of type ByteString -> Int64. That function is LB.length.
lineCount = mapReduce rdeepseq LB.length
                      rdeepseq sum
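To try this without the book's mapReduce, here is a sequential sketch. The helper countWith and the name byteCount are hypothetical; countWith simply maps over the chunks and sums the results, standing in for the parallel version. As noted above, LB.length counts bytes, which only equals the number of characters for single-byte encodings.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LB
import Data.Int (Int64)

-- Sequential stand-in for the book's mapReduce (an assumption made for
-- illustration): apply f to every chunk, then sum the per-chunk results.
countWith :: (LB.ByteString -> Int64) -> [LB.ByteString] -> Int64
countWith f = sum . map f

-- Number of newline bytes across all chunks.
lineCount :: [LB.ByteString] -> Int64
lineCount = countWith (LB.count '\n')

-- Total number of bytes across all chunks.
byteCount :: [LB.ByteString] -> Int64
byteCount = countWith LB.length
```

In GHCi, byteCount [LB.pack "ab\ncd\n", LB.pack "xyz"] gives 9 and lineCount on the same input gives 2.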
Is there some sort of regular expression framework I can use?
It's easy enough to use full-blown parsers in Haskell that we (well, I at least) use parsers instead of regular expressions. If your data is in the form of a ByteString (or a Text, for that matter) I'd recommend using attoparsec.
Let's say I want to consider input of the form
[int_1, int_2, ..., int_n]
[int_1, int_2, ..., int_m]
...
where the input is read in from a text file. My goal is to obtain the maximum size of these lists. Currently I have a regular expression that recognizes this pattern:
let input = "[1,2,3] [1,2,3,4,5]"
let p = input =~ "(\\[([0-9],)*[0-9]\\])" :: [[String]]
Output:
[["[1,2,3]","[1,2,3]","2,"],["[1,2,3,4,5]","[1,2,3,4,5]","4,"]]
So what I'm after is the max of the third index + 1. However, where I'm stuck is trying to consider this index as an int. For instance I can refer to the element just fine:
(p !! 0) !! 2
> "2,"
But I can't convert this to an int. I've tried
read( (p !! 0) !! 2)
However, this does not work, even though
:t (p !! 0) !! 2
> (p !! 0) !! 2 :: String
shows that it is a String. Any advice as to why I can't read this as an int would be greatly appreciated.
Thanks again.
I'm not entirely sure that your approach is one I'd recommend, but I'm struggling to wrap my head around the goal, so I'll just answer the question.
The problem is that read "2," can't just produce an Int, because there's a leftover comma. You can use reads to get around this. reads produces a list of possible parses and the strings left over, so:
Prelude> (reads "2,") :: [(Int,String)]
[(2,",")]
In this case it's unambiguous, so you get one parse from which you can then pull out the int, although regard for your future self-respect suggests being defensive and not assuming that there will always be a valid parse (the Safe module is good for that sort of thing).
Alternatively, you could modify your regex to not include the comma in the matched group.
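A defensive version of the reads approach might look like the sketch below. The name leadingInt is hypothetical; it returns Nothing rather than crashing when the string does not start with an integer, and it deliberately ignores whatever follows the number (such as the trailing comma).

```haskell
import Data.Maybe (mapMaybe)

-- Parse an Int from the front of a string and ignore the leftover text.
-- Returns Nothing when there is no unambiguous parse.
leadingInt :: String -> Maybe Int
leadingInt s = case reads s of
  [(n, _)] -> Just n
  _        -> Nothing

-- Largest leading Int among the captured groups, if any of them parse.
maxLeadingInt :: [String] -> Maybe Int
maxLeadingInt groups = case mapMaybe leadingInt groups of
  [] -> Nothing
  ns -> Just (maximum ns)
```

With the captured groups from the question, maxLeadingInt ["2,", "4,"] yields Just 4, and a group that fails to parse is skipped instead of raising an exception.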