I'm quite new to Haskell and I'm trying to solve the following problem:
I have a function that produces an infinite list of strings of different lengths, but the number of strings of any given length is finite.
Now I want to extract all strings of a certain length n from that list. I did a lot of research and tried a lot of things, but nothing worked for me.
I know that filter alone won't work, as it keeps checking every element of the list and therefore never terminates.
This is my function that generates the infinite list:
allStrings = [ c : s | s <- "" : allStrings, c <- ['R', 'T', 'P']]
I've already tried this:
allStrings = [x | x <- [ c : s | s <- "" : allStrings,
                                 c <- ['R', 'T', 'P']], length x == 4]
which didn't terminate.
Thanks for your help!
This
allStrings4 = takeWhile ((== 4) . length) .
                dropWhile ((< 4) . length) $ allStrings
does the trick.
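It works because your (first) allStrings definition cleverly generates all strings made of the letters 'R', 'T', and 'P' in a productive manner, in non-decreasing order of length. So dropWhile skips the strings that are too short, and takeWhile stops at the first string that is too long.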
Instead of trying to cram it all into one definition, separate your concerns! Build a solution to the more general problem first (this is your allStrings definition), then use it to solve the more restricted problem. This will often be much simpler, especially with the lazy evaluation of Haskell.
We just need to take care that our streams are always productive, never stuck.
The problem is that your filter makes it impossible to generate any solutions. To generate a string of length 4, you first need a string of length 3, since each step prepends one character to an existing string. To generate a string of length 3, you in turn need strings of length 2, and so on down to the base case: the empty string. But the shorter strings are filtered out of the very list they would have to be drawn from.
The filter itself is not the main problem; the problem is that you filter in such a way that emitting values becomes impossible.
We can fix this by using a different list that will build strings, and filter that list like:
allStrings = filter ((==) 4 . length) vals
    where vals = [x | x <- [ c : s | s <- "" : vals, c <- "RTP"]]
This will emit all strings of length 4 and then get stuck in an infinite loop, since filter keeps searching for more strings and never finds any.
We can however do better, for example by using replicateM :: Monad m => Int -> m a -> m [a] here:
Prelude Control.Monad> replicateM 4 "RTP"
["RRRR","RRRT","RRRP","RRTR","RRTT","RRTP","RRPR","RRPT","RRPP","RTRR","RTRT","RTRP","RTTR","RTTT","RTTP","RTPR","RTPT","RTPP","RPRR","RPRT","RPRP","RPTR","RPTT","RPTP","RPPR","RPPT","RPPP","TRRR","TRRT","TRRP","TRTR","TRTT","TRTP","TRPR","TRPT","TRPP","TTRR","TTRT","TTRP","TTTR","TTTT","TTTP","TTPR","TTPT","TTPP","TPRR","TPRT","TPRP","TPTR","TPTT","TPTP","TPPR","TPPT","TPPP","PRRR","PRRT","PRRP","PRTR","PRTT","PRTP","PRPR","PRPT","PRPP","PTRR","PTRT","PTRP","PTTR","PTTT","PTTP","PTPR","PTPT","PTPP","PPRR","PPRT","PPRP","PPTR","PPTT","PPTP","PPPR","PPPT","PPPP"]
Note that here the last character is the one that changes first from one string to the next. I leave it as an exercise to obtain the reversed result.
Assume there is a string like s = 'add/10/20/30/4/3/9/' or s = 'add/10/20/30/', which starts with 'add' and is followed by several numbers (I don't know how many, only that there are at least three).
I want to get them as a list: ['10', '20', ...]
I tried to use re: r = re.compile(r"add/(?:(\d+)/){3,}")
However, only the last number is matched and returned.
>>> r.findall(s)
['9']
So what's the problem and how to fix that? Thanks in advance.
Is regex a must? The string split method should be faster here if you have such simple patterns:
s = "add/10/20/30/4/3/9/"
nums = [num for num in s.split('/')[1:] if num]
The regex pattern would simply be:
re.findall(r'\d+', s)
This returns every run of digits in the string s.
re.findall(r"[0-9]+", s)
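As for why the original pattern returned only ['9']: a capturing group that sits inside a repeated group keeps only its last repetition, so findall reports just the final number. Below is a minimal sketch of one way to keep the 'add' prefix check and still collect every number. It assumes the whole input has the add/<digits>/... shape, and the names SHAPE and extract_numbers are made up for illustration:

```python
import re

# Validate the overall shape first: "add/" followed by at least three "<digits>/" parts.
# The group is non-capturing because we only use it to check the structure.
SHAPE = re.compile(r"add/(?:\d+/){3,}")

def extract_numbers(s):
    """Return all numbers after the 'add' prefix, or [] if the shape does not match."""
    if SHAPE.fullmatch(s) is None:
        return []
    # A repeated capture group would keep only its last match,
    # so the numbers are collected with a separate findall instead.
    return re.findall(r"\d+", s)

print(extract_numbers("add/10/20/30/4/3/9/"))
print(extract_numbers("add/10/20/"))  # only two numbers, so the shape check fails
```

If the prefix check does not matter, the plain findall calls above are enough on their own.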
How do I write a function that counts the number of words in a list that do not contain a specific letter? In this case the letter is 'e'. For example:
word_list=['eaa','eaa','eaa','aaa','aaa','aaa']
letter='e'
I want to get the result
3
and put it in a function (def).
Well, here is an approach you can work your way through:
1) Split the array into its elements (wherever a , occurs).
2) Iterate over those elements, check whether e occurs in each one, and update a counter variable accordingly.
3) When the loop terminates, print the counter.
The time complexity would be O(n^2).
An O(n) solution would be:
len([x for x in word_list if letter not in x])
I would go with filter:
word_list=['eaa','eaa','eaa','aaa','aaa','aaa']
print(list(filter(lambda x:'e' not in x,word_list)))
output:
['aaa', 'aaa', 'aaa']
If you want the count, just use:
print(len(list(filter(lambda x:'e' not in x,word_list))))
output:
3
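Since the question asks for a function, here is a minimal sketch that wraps the same idea in a def. The name count_words_without is made up for illustration, and the check is assumed to be case-sensitive ('E' and 'e' are treated as different letters):

```python
def count_words_without(word_list, letter):
    """Count the words in word_list that do not contain letter (case-sensitive)."""
    # Summing over a generator avoids building an intermediate list.
    return sum(1 for word in word_list if letter not in word)

word_list = ['eaa', 'eaa', 'eaa', 'aaa', 'aaa', 'aaa']
print(count_words_without(word_list, 'e'))  # prints 3
```

To ignore case, one option is to compare letter.lower() against word.lower() instead.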
I'm using Ruby 2.1. I have this logic that looks for consecutive pairs of strings in a bigger string
results = line.scan(/\b((\S+?)\b.*?\b(\S+?))\b/)
My question is: how do I iterate over the list of results and print out whether there are three or more characters between the two strings? For instance, if my string were
"abc def"
The above would produce
[["abc def", "abc", "def"]]
and I'd like to know whether there are three or more characters between "abc" and "def."
Use a quantifier for the spaces in between: \b((\S+?)\b\s{3,}\b(\S+?))\b
Also, the inner boundaries are not really needed:
\b((\S+?)\s{3,}(\S+?))\b
A straightforward way to check this is by running a separate regex on the full match, which is the first element of each result:
results.each { |x| p x[0][/\S+?\b(.*?)\b\S+?/, 1].size }
This prints the size of the gap for each result.
Another way is to take the size of the whole match and subtract the sizes of the two captured groups:
results = []
line.scan(/\b((\S+?)\b.*?\b(\S+?))\b/) do |s, group1, group2|
  results << $~ if s.size - group1.size - group2.size >= 3
end
Let's say I have this list:
sentences = ['the cat slept', 'the dog jumped', 'the bird flew']
I want to filter out any sentences that contain terms from the following list:
terms = ['clock', 'dog']
I should get:
['the cat slept', 'the bird flew']
I tried this solution, but it doesn't work:
empty = []
if any(x not in terms for x in sentences):
    empty.append(x)
What's the best way to tackle this?
I'd go with a solution like this for readability, rather than reducing it to a one-liner:
empty = []
for sentence in sentences:
    if all(term not in sentence for term in terms):
        empty.append(sentence)
Simple brute-force O(m*n) approach using a list comprehension:
For each sentence, check whether any of the disallowed terms occurs in it, and keep the sentence only if there was no match.
[s for s in sentences if not any(t in s for t in terms)]
# ['the cat slept', 'the bird flew']
Obviously, you can also invert the condition and do something like:
[s for s in sentences if all(t not in s for t in terms)]
Similar to the above two answers but using filter, perhaps being closer to the problem specification (in Python 3, filter returns an iterator, hence the list call):
list(filter(lambda x: all(el not in terms for el in x.split(' ')), sentences))
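Note that the comprehension-based answers test the terms as substrings, whereas the filter version compares whole words. A small sketch of the difference, assuming words are separated by whitespace; the helper names and the extra sample sentence 'the dogma held' are made up for illustration:

```python
def without_terms_substring(sentences, terms):
    """Drop sentences in which any term occurs anywhere, even inside a longer word."""
    return [s for s in sentences if not any(t in s for t in terms)]

def without_terms_words(sentences, terms):
    """Drop sentences that contain any term as a whole whitespace-separated word."""
    banned = set(terms)  # a set gives fast membership tests
    return [s for s in sentences if banned.isdisjoint(s.split())]

sentences = ['the cat slept', 'the dog jumped', 'the bird flew', 'the dogma held']
terms = ['clock', 'dog']

# 'the dogma held' is dropped here because 'dog' is a substring of 'dogma'.
print(without_terms_substring(sentences, terms))
# Here it is kept, because 'dogma' is not the word 'dog'.
print(without_terms_words(sentences, terms))
```

Which behaviour is right depends on whether a term should also match inside longer words.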
Binary search is better suited to very long sentences and term lists.
from bisect import bisect

def binary_search(a, x, lo=0, hi=None):
    # bisect returns the insertion point to the right of any existing x,
    # so x is present exactly when the element just before that point equals x
    if hi is None:
        hi = len(a)
    i = bisect(a, x, lo, hi)
    if i == 0:
        return -1
    elif a[i-1] == x:
        return i-1
    else:
        return -1

sentences = ['the cat slept', 'the dog jumped', 'the bird flew', 'the a']
terms = ['clock', 'dog']
sentences_with_sorted = [(sentence, sorted(sentence.split()))
                         for sentence in sentences]  # sort the words for binary search
valid_sentences = []
for sentence in sentences_with_sorted:
    list_of_word = sentence[1]  # get the sorted word list
    if all(binary_search(list_of_word, word) < 0
           for word in terms):  # none of the terms was found
        valid_sentences.append(sentence[0])  # keep the original sentence
print(valid_sentences)