Getting all substrings with length 4 out of an infinite list

I'm quite new to Haskell and I'm trying to solve the following problem:
I have a function that produces an infinite list of strings of different lengths, but the number of strings of any given length is finite.
Now I want to extract all strings in the list that have a certain length n. I did a lot of research and tried a lot of things, but nothing worked for me.
I know that filter won't work on its own, as it keeps checking every element of the list and results in an infinite loop.
This is my function that generates the infinite list:
allStrings = [ c : s | s <- "" : allStrings, c <- ['R', 'T', 'P']]
I've already tried this:
allStrings = [x | x <- [ c : s | s <- "" : allStrings,
                         c <- ['R', 'T', 'P']], length x == 4]
which didn't terminate.
Thanks for your help!

This
allStrings4 = takeWhile ((== 4) . length) .
              dropWhile ((< 4) . length) $ allStrings
does the trick.
It works because your (first) allStrings definition cleverly generates all strings containing the letters 'R', 'T', and 'P' in a productive manner, in non-decreasing order of length.
Instead of trying to cram it all into one definition, separate your concerns! Build a solution to the more general problem first (this is your allStrings definition), then use it to solve the more restricted problem. This will often be much simpler, especially with the lazy evaluation of Haskell.
We just need to take care that our streams are always productive, never stuck.
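The same idea generalises from 4 to any length n. A minimal sketch, assuming the question's allStrings definition; the name stringsOfLength is hypothetical and not part of the original code:

```haskell
-- All non-empty strings over 'R', 'T', 'P', in non-decreasing order of length
-- (the definition from the question).
allStrings :: [String]
allStrings = [c : s | s <- "" : allStrings, c <- "RTP"]

-- Hypothetical helper: skip the shorter strings, then keep strings only
-- while they still have length n. Relies on the length ordering above.
stringsOfLength :: Int -> [String]
stringsOfLength n =
  takeWhile ((== n) . length) . dropWhile ((< n) . length) $ allStrings
```

In GHCi, length (stringsOfLength 4) should give 81, i.e. 3^4. Because takeWhile stops at the first longer string, the result is a finite list, unlike the result of a plain filter.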

The problem is that your filter makes it impossible to generate any solutions. To generate a string of length 4, you first need to generate a string of length 3, since each string is built by prepending one character to an earlier one. To generate a string of length 3, you in turn need strings of length 2, and so on, down to the base case: the empty string. Because the filtered list feeds on itself, the shorter strings are discarded before they can be used to build longer ones.
It is not the filter itself that is the main problem; the problem is that you filter in such a way that emitting values becomes impossible.
We can fix this by building the strings in a separate list, and filtering that list:
allStrings = filter ((==) 4 . length) vals
  where vals = [x | x <- [ c : s | s <- "" : vals, c <- "RTP"]]
This will emit all lists of length 4, and then get stuck in an infinite loop, since filter will keep searching for more strings, and fail to find these.
We can however do better, for example by using replicateM :: Monad m => Int -> m a -> m [a] here:
Prelude Control.Monad> replicateM 4 "RTP"
["RRRR","RRRT","RRRP","RRTR","RRTT","RRTP","RRPR","RRPT","RRPP","RTRR","RTRT","RTRP","RTTR","RTTT","RTTP","RTPR","RTPT","RTPP","RPRR","RPRT","RPRP","RPTR","RPTT","RPTP","RPPR","RPPT","RPPP","TRRR","TRRT","TRRP","TRTR","TRTT","TRTP","TRPR","TRPT","TRPP","TTRR","TTRT","TTRP","TTTR","TTTT","TTTP","TTPR","TTPT","TTPP","TPRR","TPRT","TPRP","TPTR","TPTT","TPTP","TPPR","TPPT","TPPP","PRRR","PRRT","PRRP","PRTR","PRTT","PRTP","PRPR","PRPT","PRPP","PTRR","PTRT","PTRP","PTTR","PTTT","PTTP","PTPR","PTPT","PTPP","PPRR","PPRT","PPRP","PPTR","PPTT","PPTP","PPPR","PPPT","PPPP"]
Note that here the last character is the one that changes first from one string to the next. I leave it as an exercise to obtain the reversed result.
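One possible way to get the other ordering, offered as a sketch rather than as the intended solution to the exercise: reversing every string makes the first character the fastest-changing one. The names lastFastest and firstFastest are made up for illustration.

```haskell
import Control.Monad (replicateM)

-- In the list monad, replicateM n picks one letter for each of the n
-- positions, in every possible way; the last position changes fastest.
lastFastest :: Int -> [String]
lastFastest n = replicateM n "RTP"

-- Reversing each string flips the roles: the first position changes fastest.
firstFastest :: Int -> [String]
firstFastest n = map reverse (lastFastest n)
```

For n = 2 this turns RR, RT, RP, TR, ... into RR, TR, PR, RT, ..., and both lists are finite, so no takeWhile is needed.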

Related

(Ocaml) Using 'match' to extract list of chars from a list of chars

I have just started to learn OCaml and I find it difficult to extract small lists of chars from a bigger list of chars.
Let's say I have:
let list_of_chars = ['#' ; 'a' ; 'b' ; 'c'; ... ; '!' ; '3' ; '4' ; '5' ];;
I know the following: in the list above there is a '#' followed by a '!' at some location further along in the list.
I want to extract the lists ['a' ;'b' ;'c' ; ...] and ['3' ; '4' ; '5'] and do something with them,
so I do the following:
let variable = match list_of_chars with
| '#'::l1@['!']@l2 -> (*[code to do something with l1 and l2]*)
| _ -> raise Exception ;;
This code doesn't work for me; it throws errors. Is there a simple way of doing this
(specifically using match)?
As another answer points out, you can't use pattern matching for this, because patterns can only be built from constructors, and the list-append operator @ is not a constructor.
Here is how you might solve your problem:
(* Split list at the first element equal to on; None if there is none. *)
let split ~equal ~on list =
  let rec go acc = function
    | [] -> None
    | x :: xs -> if equal x on then Some (List.rev acc, xs) else go (x :: acc) xs
  in
  go [] list

let variable = match list_of_chars with
  | '#' :: rest ->
    (match split rest ~on:'!' ~equal:Char.equal with
     | None -> failwith "no '!' found"
     | Some (left, right) ->
       ... (* your code here *))
  | _ -> failwith "list does not start with '#'"
I’m now going to hypothesise that you are trying to do some kind of parsing or lexing. I recommend that you do not do it with a list of chars. Indeed, I think there is almost never a reason to have a list of chars in OCaml: a string is better for a string (a char list has an overhead of 23x in memory usage), and while one might use chars as a kind of mnemonic enum in C, OCaml has actual enums (aka variant types or sum types), so those should usually be used instead. I guess you might end up with a char list if you are doing something with a trie.
If you are interested in parsing or lexing, you may want to look into:
Ocamllex and ocamlyacc
Sedlex
Angstrom or another parser combinator library like it
One of the regular expression libraries (e.g. Re, Re2, Pcre; note that Re and Re2 are mostly unrelated)
Using strings and functions like lsplit2
The list-append operator @ is an operator, not a valid pattern. Patterns need to be static and can't match a varying number of elements in the middle of a list. But since you know the position of '!', it doesn't need to be dynamic. You can accomplish it using only the :: constructor:
let variable = match list_of_chars with
| '#'::a::b::c::'!'::l2 -> let l1 = [a;b;c] in ...
| _ -> raise Exception ;;

Match warning and pattern-matching in SML

I was wondering what would be a good strategy for working out whether a pattern match in SML will produce the Match warning.
Consider the following function:
fun f 7 (x,y) = x * 5.1 | f x (y,#"a") = y;
At first glance, it looks like it should not produce the Match warning. But if I run it, it does.
From my point of view, we handle all of the cases. Which case don't we handle? Even for f 7 (x,#"a") we know which case applies (the first one).
My question is: how can I decide whether a function will produce that warning?
Also, I would be glad for an answer why the following function is invalid:
fun f (x::xs) (y::ys) (z::zs) = y::xs::ys::zs;
Without zs it's valid. How does zs change it?
My question is: how can I decide whether a function will produce that warning?
The compiler has an algorithm that decides this.
Either use the compiler and have it warn you, or use a similar heuristic in your head.
See Warnings for pattern matching by Luc Maranget (2007).
It covers the problem, algorithm and implementation of finding missing and duplicate patterns.
A useful heuristic: Line patterns up, e.g. like:
fun fact 0 = 1
| fact n = n * fact (n - 1)
and ask yourself: Is there any combination of values that is not addressed by exactly one case of the function? Each function case should address some specific, logical category of the input. Your example isn't a practical one, so this approach is hard to apply to it: there are no logical categories over its input.
And fact is a bit simple, since it's very easy to decide if it belongs to the categories 0 or n.
And yet, is the value ~1 correctly placed in one of these categories?
Here is a practical example of a function with problematic patterns:
fun hammingDistance [] [] = SOME 0
  | hammingDistance (x::xs) (y::ys) =
      if length xs <> length ys then NONE else
      if x = y
      then hammingDistance xs ys
      else Option.map (fn d => d + 1) (hammingDistance xs ys)
It may seem that there are two logical cases: either the lists are empty, or they're not:
The input lists are empty, in which case the first body is activated.
The input lists are not empty, in which case they have different or equal lengths.
    If they have different lengths, NONE.
    If they have equal lengths, compute the distance.
There's a subtle bug, of course: the first list can be empty while the second one isn't, and vice versa. In those cases neither pattern matches, so the second body is never reached and the distinction between different and equal lengths is never made. This happens because the task of categorizing is split between pattern matching and if-then-else, with precedence given to pattern matching.
What I do personally to catch problems like these preemptively is to think like this:
When I'm pattern matching on a list (just for example), I have to cover two constructors (1. [], 2. ::), and when I'm pattern matching on two lists, I have to cover the Cartesian product of its constructors (1. [], [], 2. [], ::, 3. ::, [], and 4. ::, ::).
I can count only two patterns/bodies, and none of them aim to cover more than one of my four cases, so I know that I'm missing some.
If there had been a case with variables, I have to ask how many of my common cases it covers, e.g.
fun hammingDistance (x::xs) (y::ys) =
if x = y
then hammingDistance xs ys
else Option.map (fn d => d + 1) (hammingDistance xs ys)
| hammingDistance [] [] = SOME 0
| hammingDistance _xs _ys = NONE
Here there are only three patterns/bodies, but the last one is a catch-all; _xs and _ys match all possible lists, empty or non-empty, except if they're matched by one of the previous patterns first. So this third case accounts for both of 2. [], :: and 3. ::, [].
So I can't simply count each pattern/body once. Some may account for more than one class of input if they contain very general patterns via pattern variables. And some may account for less of the total input space if they contain overly specific patterns via multiple constructors. E.g.
fun pairs (x::y::rest) = (x, y) :: pairs rest
| pairs [] = []
Here x::y::rest is so specific that I'm not covering the case of exactly one element.

Comparing strings as list objects

I was working in Python 2.7 on a program that prints the longest word of a sentence. I split the words into a list using the string functions. Is it possible to compare these list objects without using any built-in functions?
For example
Input : a aa aaaa aaa
Output: aaaa
I'm a beginner, so it would be cool if someone could post some good tutorials I can turn to along with the answer.
So you want to find the longest string in a list without using built-in functions? Try the following:
l = ["a", "aa", "aaaa", "aaa"]
longest = None
for x in l:
    if longest is None or len(x) > len(longest):
        longest = x
print(longest)
A more concise approach is to take the maximum of the list, keyed by the length of each element:
seq = ['a', 'aa', 'aaaa', 'aaa']
assert max(seq, key=len) == 'aaaa'

Pattern match failure on a list in Haskell

I just started working with Haskell and stumbled on a problem.
According to Haskell, I have a pattern match failure, but I fail to see how.
This is the code I try to execute:
statistics :: [Int] -> (Int, Int, Int)
statistics [gradelist] = ( amountParticipants, average, amountInsufficient)
  where
    amountParticipants= length [gradelist]
    average= sum[gradelist] `div` amountParticipants
    amountInsufficient= length [number| number<- [gradelist], number<6]
I call 'statistics' with:
statistics[4,6,4,6]
this causes a pattern match failure, while I expect to see : (4, 5, 2)
statistics[6]
gives the answer : ( 1, 6, 0 ) (which is correct).
Can someone tell me why my first call causes this pattern match failure? I'm pretty sure I am passing a list as the argument.
If you write statistics [gradelist] = ... you are pattern matching against a singleton list containing a sole element referred to as gradelist. Hence, your function is only defined for lists of length exactly 1 (such as [6]); it is undefined for the empty list ([]) or lists with two or more elements (such as [4,6,4,6]).
A correct version of your function would read
statistics :: [Int] -> (Int, Int, Int)
statistics gradelist = (amountParticipants, average, amountInsufficient)
  where
    amountParticipants = length gradelist
    average = sum gradelist `div` amountParticipants
    amountInsufficient = length [number| number <- gradelist, number < 6]
As @thoferon remarked, you will also need to make special arrangements for the case in which gradelist is empty, in order to avoid dividing by zero when computing average.
Just replace each [gradelist] with gradelist, as said before. Also, you might want to match against the empty list with [], in order to avoid dividing by zero in average, like:
statistics [] = (0,0,0)
The list syntax [ ] in a pattern deconstructs a list. The pattern [gradelist] matches a list holding exactly one value, and it names the value in the list gradelist. You get a pattern match failure if you try to call the function with a list holding four values.
To match a value without deconstructing it, use a variable as the pattern.
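Putting the answers together, here is a minimal sketch that uses a variable pattern plus a separate equation for the empty list. Returning (0, 0, 0) for an empty list follows the suggestion above and is a design choice, not the only option; the shortened names are only for illustration.

```haskell
statistics :: [Int] -> (Int, Int, Int)
-- The pattern [] matches only the empty list; handling it first avoids
-- dividing by zero below.
statistics [] = (0, 0, 0)
-- A plain variable matches any list without deconstructing it.
statistics grades = (participants, average, insufficient)
  where
    participants = length grades
    average      = sum grades `div` participants   -- integer division
    insufficient = length (filter (< 6) grades)    -- grades below 6
```

With this definition, statistics [4,6,4,6] gives (4, 5, 2) and statistics [6] gives (1, 6, 0), matching the values expected in the question.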

Bison: Shift Reduce Conflict

I believe I am having trouble understanding how shift reduce conflicts work. I understand that bison can look ahead by one, so I don't understand why I am having the issue.
In my language a List is defined as a set of numbers or lists between [ ].
For example [] [1] [1 2] [1 [2] 3] are all valid lists.
Here are the definitions that are causing problems
value: num
| stringValue
| list
;
list: LEFTBRACE RIGHTBRACE
| LEFTBRACE list RIGHTBRACE
| num list
| RIGHTBRACE
;
The conflict arises at the number: the parser doesn't know whether to shift by the list rule or reduce by the value rule. I am confused, because can't it check whether a list follows the number?
Any insight on how I should proceed would be greatly appreciated.
I think I'd define things differently, in a way that avoids the problem to start with, something like:
value: num
     | stringValue
     | list
     ;
items: /* empty */
     | items value
     ;
list: LEFTBRACE items RIGHTBRACE;
Edit: Separating lists of numbers from lists of strings can't be done cleanly unless you eliminate empty lists. The problem that arises is that you want to allow an empty list to be included in a list of numbers or a list of strings, but looking at the empty list itself doesn't let the parser decide which. For example:
[ [][][][][][][][] 1 ]
To figure out what kind of list this is, the parser would have to look ahead all the way to the 1, but an LALR(N) parser can only look ahead N symbols to make that decision. Yacc (and Byacc, Bison, etc.) only do LALR(1), so they can only look ahead one symbol. That leaves a few possibilities:
Eliminate the possibility of empty lists entirely
Have the lexer treat an arbitrary number of consecutive empty lists as a single token
Use a parser generator that isn't limited to LALR(1) grammars
Inside of a yacc grammar, however, I don't think there's much you can do -- your grammar simply doesn't fit yacc's limitations.
With a bottom-up parser it is generally a good idea to avoid right recursion, which is what you have in the starred line of the grammar below.
list: LEFTBRACE RIGHTBRACE
| LEFTBRACE list RIGHTBRACE
**| num list**
| RIGHTBRACE
Instead have you thought about something like this?
value:value num
| value string
| value list
| num
| string
| list
list: LEFTBRACE RIGHTBRACE
| LEFTBRACE value RIGHTBRACE
This way you have no right recursion and the nesting logic of the grammar is more simply expressed.