How to iterate through lists inside of a list?

I want to iterate through the list and through every list inside of it, subtract the first item from the second item, and at the end add the results together. Here's what I have:
def number(bus_stops):
    for i in range(len(bus_stops)):
        return sum(bus_stops[i][0] - bus_stops[i][1])
print(number([[10,0], [4,5], [3,2]]))
This makes sense to me, but it doesn't work. Any help would be appreciated (also, if you can't tell, I'm a beginner).

You can use this example to see how to iterate over the list, unpack each pair, and compute the total sum:
def number(bus_stops):
    total = 0
    for a, b in bus_stops:
        total += b - a  # <-- subtract first item from second, add to total
    return total
print(number([[10, 0], [4, 5], [3, 2]]))
Prints:
-10
Or using sum():
def number(bus_stops):
    return sum(b - a for a, b in bus_stops)
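For completeness, here is a small sketch of why the attempt in the question fails. The return inside the loop exits on the first iteration, and sum() expects an iterable, so passing it a single integer raises a TypeError. The name number_original is made up for this illustration.

```python
def number_original(bus_stops):
    # Same logic as the question's attempt, kept only to show the failure.
    for i in range(len(bus_stops)):
        return sum(bus_stops[i][0] - bus_stops[i][1])


def number(bus_stops):
    # Subtract the first item from the second in each pair and total the results.
    return sum(second - first for first, second in bus_stops)


error_message = None
try:
    number_original([[10, 0], [4, 5], [3, 2]])
except TypeError as exc:
    # sum() was handed a plain int (10 - 0), which is not iterable.
    error_message = str(exc)

result = number([[10, 0], [4, 5], [3, 2]])
print(error_message)
print(result)  # -10
```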

Related

How to find 2nd largest number in a list in python

How can I find the 2nd highest number in a list? Elements in the list can repeat.
When all elements in the list are the same, it should report that the element is not present.
Create a function taking a list as argument:
def find_second(l):
    # Take a set to remove duplicates and check the length
    unique = set(l)
    if len(unique) <= 1:
        return "Not present"
    else:
        # Second-largest distinct value
        return sorted(unique)[-2]
Run some tests:
l1 = [1]
r1 = find_second(l1)
# Prints Not present
print(r1)
l2 = [1, 3, 2]
r2 = find_second(l2)
# Prints 2
print(r2)
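As a variation, here is a minimal sketch that avoids sorting by removing the maximum from the set of distinct values and taking the maximum of what remains. The function name second_largest and the choice to return None (rather than a message string) are assumptions made for this example.

```python
def second_largest(values):
    # Work on the distinct values so repeated elements do not count twice.
    distinct = set(values)
    if len(distinct) < 2:
        return None  # no second-largest value exists
    distinct.remove(max(distinct))
    return max(distinct)


print(second_largest([1, 3, 2]))     # 2
print(second_largest([4, 9, 9, 7]))  # 7
print(second_largest([5, 5, 5]))     # None
```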

How to filter on pandas dataframe when column data type is a list

I am having some trouble filtering a pandas dataframe on a column (let's call it column_1) whose data type is a list. Specifically, I want to return only the rows where the intersection of column_1 and another predetermined list is not empty. However, when I try to put the logic inside the arguments of the .where function, I always get errors. Below are my attempts, with the errors returned.
Attempting to test whether or not a single element is inside the list:
table[element in table['column_1']]
returns the error ...
KeyError: False
Trying to compare a list to all of the lists in the rows of the dataframe:
table[[349569] == table.column_1]
returns the error
Arrays were different lengths: 23041 vs 1
I'm trying to get these two intermediate steps down before I test the intersection of the two lists.
Thanks for taking the time to read over my problem!
Consider the pd.Series s:
s = pd.Series([[1, 2, 3], list('abcd'), [9, 8, 3], ['a', 4]])
print(s)
0 [1, 2, 3]
1 [a, b, c, d]
2 [9, 8, 3]
3 [a, 4]
dtype: object
And a testing list test
test = ['b', 3, 4]
Apply a lambda function that converts each element of s to a set and intersects it with test:
print(s.apply(lambda x: list(set(x).intersection(test))))
0 [3]
1 [b]
2 [3]
3 [4]
dtype: object
To use it as a mask, use bool instead of list:
s.apply(lambda x: bool(set(x).intersection(test)))
0 True
1 True
2 True
3 True
dtype: bool
For long-term use, you can wrap the whole workflow in functions and apply them where you need. As you did not provide an example dataset, I am using one of my own. Suppose I have a text database. First I extract the #tags from each message into a list, then I search for only the #tags I want and filter the data.
import re
import pandas as pd
# find all the tags in the message
def find_hashtags(post_msg):
    combo = r'#\w+'
    rx = re.compile(combo)
    hash_tags = rx.findall(post_msg)
    return hash_tags
# find the required match according to a tag list and return True or False
def match_tags(tag_list, htag_list):
    matched_items = bool(set(tag_list).intersection(htag_list))
    return matched_items
test_data = [{'text': 'Head nipid mõnusateks sõitudeks kitsastel tänavatel. #TipStop'},
{'text': 'Homses Rooli Võimus uus #Peugeot208!\nVaata kindlasti.'},
{'text': 'Soovitame ennast tulevikuks ette valmistada, electric car sest uus #PeugeotE208 on peagi kohal! ⚡️⚡️\n#UnboringTheFuture'},
{'text': "Aeg on täiesti uueks roadtrip'i kogemuseks! \nLase ennast üllatada - #Peugeot5008!"},
{'text': 'Tõeline ikoon, mille stiil avaldab muljet läbi eco car, electric cars generatsioonide #Peugeot504!'}
]
test_df = pd.DataFrame(test_data)
# find all the hashtags
test_df["hashtags"] = test_df["text"].apply(lambda x: find_hashtags(x))
# the only hashtags we are interested in
tag_search = ["#TipStop", "#Peugeot208"]
# match the tags in our list
test_df["tag_exist"] = test_df["hashtags"].apply(lambda x: match_tags(x, tag_search))
# filter the data
main_df = test_df[test_df.tag_exist]

Find top 5 word lengths in a text

I'm trying to write a program with two functions:
count_word_lengths, which takes the argument text, a string of text, and returns a default dictionary that records the count for each word length.
top5_lengths, which takes the same argument text and returns a list of the top 5 word lengths.
Note: in the event that two lengths have the same frequency, they should be sorted in descending order. Also, if there are fewer than 5 word lengths, it should return a shorter list of the sorted word lengths.
Example calls to count_word_lengths:
count_word_lengths("one one was a racehorse two two was one too")
defaultdict(<class 'int'>, {1: 1, 3: 8, 9: 1})
Example calls to top5_lengths:
top5_lengths("one one was a racehorse two two was one too")
[3, 9, 1]
top5_lengths("feather feather feather chicken feather")
[7]
top5_lengths("the swift green fox jumped over a cool cat")
[3, 5, 4, 6, 1]
My current code is below. It seems to produce the expected output for all these calls, but it is failing a hidden test. What type of input am I not considering? Is my code actually behaving correctly? If not, how could I fix this?
from collections import defaultdict
length_tally = defaultdict(int)
final_list = []
def count_word_lengths(text):
    words = text.split(' ')
    for word in words:
        length_tally[len(word)] += 1
    return length_tally
def top5_word_lengths(text):
    frequencies = count_word_lengths(text)
    list_of_frequencies = frequencies.items()
    flipped = [(t[1], t[0]) for t in list_of_frequencies]
    sorted_flipped = sorted(flipped)
    reversed_sorted_flipped = sorted_flipped[::-1]
    for item in reversed_sorted_flipped:
        final_list.append(item[1])
    return final_list
One thing to note is that you do not account for an empty string: ''.split(' ') returns [''], so a word of length 0 gets counted. Also, you can use items() (iteritems() in Python 2) to get the key and value from a dict, as in for k, v in d.items():
I'm not a Python guy, but I can see a few things that might cause issues.
You keep referring to top5_lengths, but your code has a function called top5_word_lengths.
You use a function called count_lengths that isn't defined anywhere.
Fix these and see what happens!
Edit:
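It's not great practice for your functions to update variables outside their scope. Because length_tally and final_list are defined at module level, their contents carry over from one call to the next, which can change the result when the functions are called more than once. You probably want to move the variable assignments at the top into the functions where they're used.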
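A minimal sketch of what the suggestions in this thread could look like when combined, assuming the hidden test calls the functions more than once, passes an empty string, or uses text with more than five distinct word lengths (none of which is confirmed by the question). The tally is created inside the function, split() without arguments ignores repeated spaces and empty input, and the result is cut to five entries.

```python
from collections import defaultdict


def count_word_lengths(text):
    length_tally = defaultdict(int)  # fresh tally on every call
    for word in text.split():        # no argument: skips extra spaces and empty strings
        length_tally[len(word)] += 1
    return length_tally


def top5_lengths(text):
    tally = count_word_lengths(text)
    # Sort by frequency first, then by length, both in descending order.
    ranked = sorted(tally.items(), key=lambda item: (item[1], item[0]), reverse=True)
    return [length for length, _ in ranked[:5]]


print(top5_lengths("one one was a racehorse two two was one too"))  # [3, 9, 1]
print(top5_lengths("the swift green fox jumped over a cool cat"))   # [3, 5, 4, 6, 1]
```

Calling top5_lengths twice with the same text returns the same list both times, because nothing is stored between calls.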
Not really an answer, but an alternative way of tracking words instead of just lengths:
from collections import defaultdict
def count_words_by_length(text):
    words = [(len(word), word) for word in text.split(" ")]
    d = defaultdict(list)
    for k, v in words:
        d[k].append(v)
    return d
def top_words(d, how_many):
    # list() makes the slice work on Python 3 as well, where items() is a view
    return [{"word_length": length, "num_words": len(words)} for length, words in list(d.items())[-how_many:]]
Use as follows:
my_dict = count_words_by_length('hello sir this is a beautiful day right')
my_top_words = top_words(my_dict, 5)
print(my_top_words)
print(my_dict)
Output:
[{'word_length': 9, 'num_words': 1}]
defaultdict(<type 'list'>, {1: ['a'], 2: ['is'], 3: ['sir', 'day'], 4: ['this'], 5: ['hello', 'right'], 9: ['beautiful']})

Python: simple list modify task

I need to remove the unique elements of the list. My first thought is:
def cut_uniq(data):
    for x in data:
        if data.count(x) == 1:
            data.remove(x)
    print(data)
cut_uniq([1, 2, 3, 4, 5,])
prints
[2, 4]
Please tell me why?
Look at each iteration:
i  x  data
0  1  [1,2,3,4,5]
1  3  [2,3,4,5]
2  5  [2,4,5]
      [2,4]
You can iterate over a different list than you are modifying. This returns a copy of the list:
def cut_uniq(data):
    return [x for x in data if data.count(x) > 1]
or more efficiently
from collections import Counter
def cut_uniq(data):
    counts = Counter(data)  # count every element once instead of calling data.count() repeatedly
    return [x for x in data if counts[x] > 1]
If you really do want to modify the original list, and not return a copy
def cut_uniq(data):
    i = 0
    while i < len(data):
        if data.count(data[i]) == 1:
            del data[i]
        else:
            i += 1
or
from collections import Counter
def cut_uniq(data):
    for x, count in Counter(data).items():
        if count == 1:
            data.remove(x)
95% of the time that you modify the same list as you're iterating over, you'll have problems.
When you use
for x in data:
it roughly translates to
for i in [0,1,2,3,4]:
    x = data[i]
So in the first iteration, i = 0 and data[i] = 1. You remove 1 from data, so data is now [2,3,4,5].
In the second iteration, i = 1, and because data is now [2,3,4,5], data[i] = 3. So 2 is left in the list and is never visited.
The same happens with the number 4.
So when the loop finishes, [2,4] is left in the list.
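The skipping described above can be observed directly. The sketch below records which elements the loop actually visits, then shows that iterating over a snapshot of the list avoids the problem. The names cut_uniq_traced and cut_uniq_snapshot are made up for this illustration.

```python
def cut_uniq_traced(data):
    visited = []
    for x in data:               # iterates over the same list that is modified below
        visited.append(x)
        if data.count(x) == 1:
            data.remove(x)
    return visited


def cut_uniq_snapshot(data):
    for x in list(data):         # list(data) is a copy, so removals do not shift the loop
        if data.count(x) == 1:
            data.remove(x)


items = [1, 2, 3, 4, 5]
visited = cut_uniq_traced(items)
print(visited)  # [1, 3, 5]  (2 and 4 were never visited)
print(items)    # [2, 4]

fixed = [1, 2, 2, 3]
cut_uniq_snapshot(fixed)
print(fixed)    # [2, 2]
```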

Pythonic way to convert a list of integers into a string of comma-separated ranges

I have a list of integers which I need to parse into a string of ranges.
For example:
[0, 1, 2, 3] -> "0-3"
[0, 1, 2, 4, 8] -> "0-2,4,8"
And so on.
I'm still learning more pythonic ways of handling lists, and this one is a bit difficult for me. My latest thought was to create a list of lists which keeps track of paired numbers:
[ [0, 3], [4, 4], [5, 9], [20, 20] ]
I could then iterate across this structure, printing each sub-list as either a range, or a single value.
I don't like doing this in two iterations, but I can't seem to keep track of each number within each iteration. My thought would be to do something like this:
Here's my most recent attempt. It works, but I'm not fully satisfied; I keep thinking there's a more elegant solution which completely escapes me. The string-handling iteration isn't the nicest, I know -- it's pretty early in the morning for me :)
def createRangeString(zones):
    rangeIdx = 0
    ranges = [[zones[0], zones[0]]]
    for zone in list(zones):
        if ranges[rangeIdx][1] in (zone, zone-1):
            ranges[rangeIdx][1] = zone
        else:
            ranges.append([zone, zone])
            rangeIdx += 1
    rangeStr = ""
    for range in ranges:
        if range[0] != range[1]:
            rangeStr = "%s,%d-%d" % (rangeStr, range[0], range[1])
        else:
            rangeStr = "%s,%d" % (rangeStr, range[0])
    return rangeStr[1:]
Is there a straightforward way I can merge this into a single iteration? What else could I do to make it more Pythonic?
>>> from itertools import count, groupby
>>> L=[1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 19, 20, 22, 23, 40, 44]
>>> G=(list(x) for _,x in groupby(L, lambda x,c=count(): next(c)-x))
>>> print(",".join("-".join(map(str,(g[0],g[-1])[:len(g)])) for g in G))
1-4,6-9,12-13,19-20,22-23,40,44
The idea here is to pair each element with count(). Then the difference between the value and count() is constant for consecutive values. groupby() does the rest of the work
As Jeff suggests, an alternative to count() is to use enumerate(). This adds some extra cruft that needs to be stripped out in the print statement
G=(list(x) for _,x in groupby(enumerate(L), lambda t: t[0]-t[1]))  # t is an (index, value) pair
print(",".join("-".join(map(str,(g[0][1],g[-1][1])[:len(g)])) for g in G))
Update: for the sample list given here, the version with enumerate runs about 5% slower than the version using count() on my computer
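To make the grouping idea easier to follow, here is a more spelled-out sketch of the enumerate() variant. It assumes the input is sorted and contains no duplicates; the function name format_ranges is made up for this example. Within a run of consecutive integers, index minus value stays constant, so groupby() puts the whole run into one group.

```python
from itertools import groupby


def format_ranges(numbers):
    parts = []
    # index - value is the same for every member of a consecutive run.
    for _, group in groupby(enumerate(numbers), key=lambda pair: pair[0] - pair[1]):
        run = [value for _, value in group]
        if len(run) == 1:
            parts.append(str(run[0]))
        else:
            parts.append("%d-%d" % (run[0], run[-1]))
    return ",".join(parts)


print(format_ranges([0, 1, 2, 4, 8]))  # 0-2,4,8
print(format_ranges([1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 19, 20, 22, 23, 40, 44]))
# 1-4,6-9,12-13,19-20,22-23,40,44
```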
Whether this is pythonic is up for debate. But it is very compact. The real meat is in the Rangify() function. There's still room for improvement if you want efficiency or Pythonism.
from functools import reduce
def CreateRangeString(zones):
    # assuming sorted and distinct
    deltas = [a-b for a, b in zip(zones[1:], zones[:-1])]
    deltas.append(-1)
    def Rangify(acc, item):
        # tuple parameters are not allowed in Python 3, so unpack explicitly
        b, p = acc
        z, d = item
        if p is not None:
            if d == 1: return (b, p)
            b.append('%d-%d'%(p,z))
            return (b, None)
        else:
            if d == 1: return (b, z)
            b.append(str(z))
            return (b, None)
    return ','.join(reduce(Rangify, zip(zones, deltas), ([], None))[0])
To describe the parameters:
deltas is the distance to the next value (inspired from an answer here on SO)
Rangify() does the reduction on these parameters
b - base or accumulator
p - previous start range
z - zone number
d - delta
To concatenate strings you should use ','.join. This removes the 2nd loop.
def createRangeString(zones):
    rangeIdx = 0
    ranges = [[zones[0], zones[0]]]
    for zone in list(zones):
        if ranges[rangeIdx][1] in (zone, zone-1):
            ranges[rangeIdx][1] = zone
        else:
            ranges.append([zone, zone])
            rangeIdx += 1
    return ','.join(
        map(
            lambda p: '%s-%s'%tuple(p) if p[0] != p[1] else str(p[0]),
            ranges
        )
    )
Although I prefer a more generic approach:
from itertools import groupby
# auxiliary functor to allow groupby to compare by adjacent elements.
class cmp_to_groupby_key(object):
    def __init__(self, f):
        self.f = f
        self.uninitialized = True
    def __call__(self, newv):
        if self.uninitialized or not self.f(self.oldv, newv):
            self.curkey = newv
            self.uninitialized = False
        self.oldv = newv
        return self.curkey
# returns the first and last element of an iterable with O(1) memory.
def first_and_last(iterable):
    first = next(iterable)
    last = first
    for i in iterable:
        last = i
    return (first, last)
# convert groups into list of range strings
def create_range_string_from_groups(groups):
    for _, g in groups:
        first, last = first_and_last(g)
        if first != last:
            yield "{0}-{1}".format(first, last)
        else:
            yield str(first)
def create_range_string(zones):
    groups = groupby(zones, cmp_to_groupby_key(lambda a,b: b-a<=1))
    return ','.join(create_range_string_from_groups(groups))
assert create_range_string([0,1,2,3]) == '0-3'
assert create_range_string([0, 1, 2, 4, 8]) == '0-2,4,8'
assert create_range_string([1,2,3,4,6,7,8,9,12,13,19,20,22,22,22,23,40,44]) == '1-4,6-9,12-13,19-20,22-23,40,44'
This is more verbose, mainly because I have used generic functions that I have and that are minor variations of itertools functions and recipes:
from itertools import tee, zip_longest  # named izip_longest in Python 2
def pairwise_longest(iterable):
    "variation of pairwise in http://docs.python.org/library/itertools.html#recipes"
    a, b = tee(iterable)
    next(b, None)
    return zip_longest(a, b)
def takeuntil(predicate, iterable):
    """returns all elements before and including the one for which the predicate is true
    variation of http://docs.python.org/library/itertools.html#itertools.takewhile"""
    for x in iterable:
        yield x
        if predicate(x):
            break
def get_range(it):
    "gets a range from a pairwise iterator"
    # each item is a pair (a, b); b is None for the last element
    rng = list(takeuntil(lambda p: (p[1] is None) or (p[1]-p[0]>1), it))
    if rng:
        b, e = rng[0][0], rng[-1][0]
        return "%d-%d" % (b,e) if b != e else "%d" % b
def create_ranges(zones):
    it = pairwise_longest(zones)
    return ",".join(iter(lambda:get_range(it),None))
k=[0,1,2,4,5,7,9,12,13,14,15]
print(create_ranges(k)) #0-2,4-5,7,9,12-15
def createRangeString(zones):
    """Create a string with integer ranges in the format of '%d-%d'
    >>> createRangeString([0, 1, 2, 4, 8])
    '0-2,4,8'
    >>> createRangeString([1,2,3,4,6,7,8,9,12,13,19,20,22,22,22,23,40,44])
    '1-4,6-9,12-13,19-20,22-23,40,44'
    """
    buffer = []
    try:
        st = ed = zones[0]
        for i in zones[1:]:
            delta = i - ed
            if delta == 1: ed = i
            elif not (delta == 0):
                buffer.append((st, ed))
                st = ed = i
        else: buffer.append((st, ed))
    except IndexError:
        pass
    return ','.join(
        "%d" % st if st==ed else "%d-%d" % (st, ed)
        for st, ed in buffer)
Here is my solution. You need to keep track of various pieces of information while you iterate through the list and create the result - this screams generator to me. So here goes:
def rangeStr(start, end):
    '''convert two integers into a range start-end, or a single value if they are the same'''
    return str(start) if start == end else "%s-%s" %(start, end)
def makeRange(seq):
    '''take a sequence of ints and return a sequence
    of strings with the ranges
    '''
    # make sure that seq is an iterator
    seq = iter(seq)
    try:
        start = next(seq)
    except StopIteration:
        # empty input: yield nothing (a bare StopIteration inside a
        # generator becomes a RuntimeError on Python 3.7+)
        return
    current = start
    for val in seq:
        current += 1
        if val != current:
            yield rangeStr(start, current-1)
            start = current = val
    # make sure the last range is included in the output
    yield rangeStr(start, current)
def stringifyRanges(seq):
    return ','.join(makeRange(seq))
>>> l = [1,2,3, 7,8,9, 11, 20,21,22,23]
>>> l2 = [1,2,3, 7,8,9, 11, 20,21,22,23, 30]
>>> stringifyRanges(l)
'1-3,7-9,11,20-23'
>>> stringifyRanges(l2)
'1-3,7-9,11,20-23,30'
My version will work correctly if given an empty list, which I think some of the others will not.
>>> stringifyRanges( [] )
''
makeRange will work on any iterator that returns integers, and it lazily returns a sequence of strings, so it can be used on infinite sequences.
edit: I have updated the code to handle single numbers that are not part of a range.
edit2: refactored out rangeStr to remove duplication.
How about this mess...
def rangefy(mylist):
    mylist = mylist + [None]
    # start from the first element (starting at 0 mislabels lists that do not begin with 0)
    mystr, start = "", mylist[0]
    for i, v in enumerate(mylist[:-1]):
        if mylist[i+1] != v + 1:
            mystr += ["%d,"%v,"%d-%d,"%(start,v)][start!=v]
            start = mylist[i+1]
    return mystr[:-1]