I would like to create a list with a variable part:
mylist = ['a1','a2','a3','a4','a5']
I am trying:
i = range(1,5)
ii = [str(x) for x in i]
which works, and then I would like to do:
mylist = list('a' + x for x in ii)
but that doesn't work.
Ranges are half-open in Python (the stop value is excluded), so you must use range(1, 6) to get 1 through 5. In Python 2, xrange(1, 6) does the same without building a list (see xrange vs range). Concatenation between two strings is fine, but you could also use str.format. You can convert each number and prepend 'a' in one list comprehension, so you don't need to create two temporary lists as you did:
>>> [str.format('a{0}', x) for x in range(1, 6)]
['a1', 'a2', 'a3', 'a4', 'a5']
>>> ['a' + str(x) for x in range(1, 6)]
['a1', 'a2', 'a3', 'a4', 'a5']
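For comparison, a minimal Python 3 sketch of the question's original two-step approach with only the stop value corrected, plus an f-string variant (f-strings assume Python 3.6 or later):

```python
# range(1, 6) yields 1, 2, 3, 4, 5: the stop value 6 is excluded
numbers = range(1, 6)

# Two-step version from the question, now with the corrected range
ii = [str(x) for x in numbers]
mylist = ['a' + x for x in ii]

# One-step f-string version (Python 3.6+); a range object can be iterated again
mylist_f = [f'a{x}' for x in numbers]

print(mylist)    # ['a1', 'a2', 'a3', 'a4', 'a5']
print(mylist_f)  # ['a1', 'a2', 'a3', 'a4', 'a5']
```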
I have a list of substrings, and I am checking whether any of them is found in another string. any() returns a boolean:
>>> list=['oh' , 'mn' , 'nz' , 'ne']
>>> name='hstntxne'
>>> any(x in name for x in list)
True
>>> name='hstnzne'
>>> any(x in name for x in list)
True
I want to return the indices instead. For example, the first time it should be 3, and the second time it should be 2 and 3.
Firstly, do not call your list list. list is a built-in Python type, and you do not want to shadow that name unless you have a specific reason for doing so.
You can easily achieve this with a list comprehension in one line.
substrings = ['oh' , 'mn' , 'nz' , 'ne']
name1='hstntxne'
name2='hstnzne'
[substrings.index(x) for x in substrings if x in name1]
This returns [3]
[substrings.index(x) for x in substrings if x in name2]
This returns [2, 3]
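To make this work with any list of substrings and any name, put it in a function (note that subs.index(x) returns the first position of x, so a substring listed twice would be reported at its first position both times):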
def getIndex(subs, name):
    return [subs.index(x) for x in subs if x in name]

getIndex(substrings, name2)  # example call
You can use the built-in enumerate() function:
def get_index(name, lis=['oh' , 'mn' , 'nz' , 'ne']):
    indx = []
    for index, element in enumerate(lis):
        if element in name:
            indx.append(index)
    return indx
Now get_index(name='hstnzne') will give [2, 3]
and get_index(name='hstntxne') will give [3]
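Both ideas can be combined into a single comprehension. In this sketch (the function name find_indices is hypothetical), enumerate supplies the positions, so every matching position is reported even if the same substring appears more than once in the list, which substrings.index(x) would not do:

```python
def find_indices(name, substrings):
    """Return the positions of all substrings that occur in name."""
    return [i for i, sub in enumerate(substrings) if sub in name]

substrings = ['oh', 'mn', 'nz', 'ne']
print(find_indices('hstntxne', substrings))  # [3]
print(find_indices('hstnzne', substrings))   # [2, 3]

# A duplicated substring is reported at both of its positions
print(find_indices('xne', ['ne', 'oh', 'ne']))  # [0, 2]
```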
import re
# Use a regex split to see if each substring occurs in name
l = ['oh', 'mn', 'nz', 'ne']
name = 'hstnzne'
match_indx = []
for i, sub_str in enumerate(l):
    # re.escape makes the substring match literally, even if it contains regex metacharacters
    result = re.split(re.escape(sub_str), name)
    if len(result) > 1:
        # We could split our string due to a match, so add the index of the substring
        match_indx.append(i)
print(match_indx)
If I have a nested list like:
l = [['AB','BCD','TGH'], ['UTY','AB','WEQ'],['XZY','LIY']]
In this example, 'AB' is common to the first two nested lists. How can I remove 'AB' from both lists while keeping the other elements as they are? In general, how can I remove an element from every nested list when it occurs in two or more nested lists, so that each nested list is unique? The desired result is:
l = [['BCD','TGH'],['UTY','WEQ'],['XZY','LIY']]
Is it possible to do this with a for loop?
Thanks
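from collections import Counter
from itertools import chain
ls = [['AB','BCD','TGH'], ['UTY','AB','WEQ'],['XZY','LIY']]
counts = Counter(chain(*ls))  # count each element across all sublists
result = [[e for e in sub if counts[e] == 1] for sub in ls]  # keep only unique elements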
One option is to do something like this:
from collections import Counter
counts = Counter([b for a in l for b in a])
for a in l:
    for b in a[:]:  # iterate over a copy so that removing from a does not skip elements
        if counts[b] > 1:
            a.remove(b)
Edit: If you want to avoid the (very useful, standard-library) collections module, you could replace counts above with the following hand-written counter:
counts = {}
for a in l:
    for b in a:
        if b in counts:
            counts[b] += 1
        else:
            counts[b] = 1
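A fairly short solution without imports is to first create a flattened version of the original list, then iterate through the original list and remove the elements that occur more than once in the flattened list (reduced_lst.count rescans the whole flattened list for every element, so this is quadratic in the total number of elements):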
lst = [['AB','BCD','TGH'], ['UTY','AB','WEQ'],['XZY','LIY']]
reduced_lst = [y for x in lst for y in x]  # flatten the nested list
output_lst = []
for chunk in lst:
    chunk_copy = chunk[:]  # remove from a copy so the loop over chunk is not disturbed
    for elm in chunk:
        if reduced_lst.count(elm) > 1:
            chunk_copy.remove(elm)
    output_lst.append(chunk_copy)
print(output_lst)
Should print:
[['BCD', 'TGH'], ['UTY', 'WEQ'], ['XZY', 'LIY']]
I hope this proves useful.
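Since the question asks whether this can be done with a plain for loop: a sketch that counts occurrences with a dict in a single pass and then builds new sublists, which avoids both imports and the repeated reduced_lst.count scans. The names occurrences and unique_lists are illustrative only:

```python
lst = [['AB', 'BCD', 'TGH'], ['UTY', 'AB', 'WEQ'], ['XZY', 'LIY']]

# First pass: count how many times each element appears across all sublists
occurrences = {}
for chunk in lst:
    for elm in chunk:
        occurrences[elm] = occurrences.get(elm, 0) + 1

# Second pass: build new sublists that keep only elements seen exactly once
unique_lists = []
for chunk in lst:
    kept = []
    for elm in chunk:
        if occurrences[elm] == 1:
            kept.append(elm)
    unique_lists.append(kept)

print(unique_lists)  # [['BCD', 'TGH'], ['UTY', 'WEQ'], ['XZY', 'LIY']]
```

The original lst is left untouched; each pass is linear in the total number of elements.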
I have a set of lists:
A = [[A1,A2,A3],[A4,A5,A6]...,[A(n-2),A(n-1),A(n)]] #A has length n
B = [[B1,B2,B3],[B4,B5,B6]...,[B(n-2),B(n-1),B(n)]] #B has length n
C = [[C1,C2,C3],[C4,C5,C6]...,[C(n-2),C(n-1),C(n)]] #C has length n
and I want to combine them into the following format:
f = [(A1,A2,A3,B1,B2,B3,C1,C2,C3),(A4,A5,A6,B4,B5,B6,C4,C5,C6),...,(A(n-2),A(n-1),A(n),B(n-2),B(n-1),B(n),C(n-2),C(n-1),C(n))]
I'm pretty new to Python and I can't think of a way to do this.
Any input will be greatly appreciated.
I've started by using:
for item in range(len(A)):
    f[item][0] = A[item][0]
    f[item][1] = A[item][1]
    f[item][2] = A[item][2]
for item in range(len(B)):
    f[item][3] = B[item][0]
    f[item][4] = B[item][1]
    f[item][5] = B[item][2]
for item in range(len(C)):
    f[item][6] = C[item][0]
    f[item][7] = C[item][1]
    f[item][8] = C[item][2]
But this just sets all items in the list f to be equal to the last item in f for some reason.
Interleave the sublists using zip, and flatten the resulting groups with itertools.chain in a list comprehension, which gives a nice one-liner:
import itertools
A = [["A1","A2","A3"],["A4","A5","A6"]] #A has length n
B = [["B1","B2","B3"],["B4","B5","B6"]] #B has length n
C = [["C1","C2","C3"],["C4","C5","C6"]] #C has length n
print([tuple(itertools.chain(*l)) for l in zip(A,B,C)])
result:
[('A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3'), ('A4', 'A5', 'A6', 'B4', 'B5', 'B6', 'C4', 'C5', 'C6')]
General case, if you have a variable number of lists stored in a list of lists:
list_of_lists = [A,B,C]
print([tuple(itertools.chain(*l)) for l in zip(*list_of_lists)])
(the * operator unpacks the items of list_of_lists into separate arguments for zip)
Note: this also works if the sublists have different lengths, as long as each list contains the same number of sublists (otherwise zip stops at the shortest list and drops the extra sublists):
A = [["A1","A2","A3"],["A4","A5","A6","A7"],["I will be discarded"]] #A has length n+1, last element will be lost
B = [["B1","B2","B3","B3bis"],["B4","B5","B6"]] #B has length n
C = [["C0","C1","C2","C3"],["C4","C5","C6"]] #C has length n
yields:
[('A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'B3bis', 'C0', 'C1', 'C2', 'C3'), ('A4', 'A5', 'A6', 'A7', 'B4', 'B5', 'B6', 'C4', 'C5', 'C6')]
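If the lists may contain different numbers of sublists and you would rather keep the extras than lose them, itertools.zip_longest can pad the shorter lists instead. A sketch, assuming an empty tuple is an acceptable filler (it contributes nothing when chained):

```python
import itertools

A = [["A1", "A2", "A3"], ["A4", "A5", "A6"], ["A7", "A8", "A9"]]  # one extra sublist
B = [["B1", "B2", "B3"], ["B4", "B5", "B6"]]
C = [["C1", "C2", "C3"], ["C4", "C5", "C6"]]

# zip_longest pads the shorter lists with fillvalue instead of stopping early
f = [tuple(itertools.chain(*group))
     for group in itertools.zip_longest(A, B, C, fillvalue=())]
print(f)
```

The last tuple is ('A7', 'A8', 'A9'), because B and C contribute only empty fillers at that position.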
I have this code in PySpark to .
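wordsList = ['cat', 'elephant', 'rat', 'rat', 'cat']
wordsRDD = sc.parallelize(wordsList, 4)  # sc is the SparkContext
wordPairs = wordsRDD.map(lambda word: (word, 1))  # assumed missing step: build (word, 1) pairs
wordCounts = wordPairs.reduceByKey(lambda x, y: x + y)
print(wordCounts.collect())
#PRINTS--> [('rat', 2), ('elephant', 1), ('cat', 2)]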
from operator import add
totalCount = (wordCounts
.map(<< FILL IN >>)
.reduce(<< FILL IN >>))
#SHOULD PRINT 5
#(wordCounts.values().sum()) does the trick, but I want to do this with map() and reduce()
I need to use a reduce() action to sum the counts in wordCounts and then divide by the number of unique words.
* But first I need to map() the pair RDD wordCounts, which consists of (key, value) pairs, to an RDD of values.
This is where I am stuck. I tried something like this below, but none of them work:
.map(lambda x:x.values())
.reduce(lambda x:sum(x)))
AND,
.map(lambda d:d[k] for k in d)
.reduce(lambda x:sum(x)))
Any help in this would be highly appreciated!
Finally I got the answer; it's like this:
totalCount = (wordCounts
              .map(lambda x: x[1])
              .reduce(lambda x, y: x + y))
Yes, your lambda function in .map takes a tuple x as its argument and returns the second element, x[1] (index 1 of the tuple). You could also unpack the tuple directly in the lambda's parameter list and return the second element, although this syntax works only in Python 2 (tuple parameter unpacking was removed in Python 3):
.map(lambda (x,y) : y)
Mr. Tompsett, I got this to work also:
from operator import add
x = (wordCounts
.map(lambda x: x[1])
.reduce(add))
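As an alternative to map and reduce, you can also use aggregate, which takes a zero value, a function that folds each element into a per-partition accumulator, and a function that merges the accumulators: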
In [7]: x = sc.parallelize([('rat', 2), ('elephant', 1), ('cat', 2)])
In [8]: x.aggregate(0, lambda acc, value: acc + value[1], lambda acc1, acc2: acc1 + acc2)
Out[8]: 5
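The map-then-reduce logic itself does not depend on Spark. As a local sketch in plain Python (no SparkContext involved), here is the same pipeline applied to the pairs that wordCounts.collect() printed, including the division by the number of unique words that the question mentions:

```python
from functools import reduce
from operator import add

# The (word, count) pairs shown by wordCounts.collect() above
word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# map: keep only the count from each pair; reduce: add the counts together
total_count = reduce(add, map(lambda pair: pair[1], word_counts))
print(total_count)  # 5

# Average occurrences per unique word (true division in Python 3)
average = total_count / len(word_counts)
print(average)
```

On an RDD the same two lambdas are passed to .map and .reduce; Spark then applies them across partitions.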
I have a list of strings stringlist = ["elementOne" , "elementTwo" , "elementThree"] and I would like to search for elements that contain the "Two" string and delete that from the list so my list will become stringlist = ["elementOne" , "elementThree"]
I managed to print them, but I don't know how to delete them from the list: I can't use del because I don't know the index, and I can't use stringlist.remove("elementTwo") because I don't know the exact string of the element containing "Two".
My code so far:
for x in stringlist:
    if "Two" in x:
        print(x)
Normally, when we use a list comprehension, we build a new list and assign it to the same name as the old list. Although this gives the desired result, it does not modify the old list in place.
To make sure the reference remains the same, you must use this:
>>> stringlist[:] = [x for x in stringlist if "Two" not in x]
>>> stringlist
['elementOne', 'elementThree']
Advantages:
Since it assigns to a list slice, it replaces the contents of the same Python list object, so the reference remains the same, which prevents bugs if the list is referenced elsewhere.
If you do this below, you will lose the reference to the original list.
>>> stringlist = [x for x in stringlist if "Two" not in x]
>>> stringlist
['elementOne', 'elementThree']
So to preserve the reference, you build the new list and assign it to the list slice.
To understand the subtle difference:
Let us take a list a1 containing some elements and assign list a2 equal to a1.
>>> a1 = [1,2,3,4]
>>> a2 = a1
Approach-1:
>>> a1 = [x for x in a1 if x<2]
>>> a1
[1]
>>> a2
[1, 2, 3, 4]
Approach-2:
>>> a1[:] = [x for x in a1 if x<2]
>>> a1
[1]
>>> a2
[1]
Approach-2 actually replaces the contents of the original a1 list whereas Approach-1 does not.
You can use enumerate to get the index while you iterate over your list (but note that modifying a list while iterating over it is neither Pythonic nor safe: after a deletion, the next element shifts into the current index and is skipped):
>>> for i,x in enumerate(stringlist):
...     if "Two" in x:
...         print(x)
...         del stringlist[i]
...
elementTwo
>>> stringlist
['elementOne', 'elementThree']
But a more elegant and Pythonic way is to use a list comprehension that keeps the elements that don't contain "Two":
>>> stringlist = [i for i in stringlist if "Two" not in i]
>>> stringlist
['elementOne', 'elementThree']
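To see why the enumerate/del loop is unsafe, here is a small sketch with two matching elements next to each other, compared with the slice-assignment comprehension from the first answer:

```python
unsafe = ["aTwo", "bTwo", "c"]
for i, x in enumerate(unsafe):
    if "Two" in x:
        del unsafe[i]
# "bTwo" shifted into index 0 after the first deletion, but the loop
# had already moved on to index 1, so "bTwo" was never checked
print(unsafe)  # ['bTwo', 'c']

safe = ["aTwo", "bTwo", "c"]
# Build the filtered list first, then replace the contents in place
safe[:] = [x for x in safe if "Two" not in x]
print(safe)  # ['c']
```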
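Either of these will work. The first deletes in place, walking the indices backwards so that a deletion never shifts an element the loop has yet to visit:
for i in reversed(range(len(stringlist))):
    if "Two" in stringlist[i]:
        del stringlist[i]
or, to build a new list:
newList = []
for x in stringlist:
    if "Two" in x:
        continue
    else:
        newList.append(x)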
Using a regex (this collects the elements in which "Two" is found):
import re
txt = ["SpainTwo", "StringOne"]
temp_list = []  # elements in which the pattern is found
for i in txt:
    x = re.search(r"Two", i)
    if x:
        temp_list.append(x.string)
print(temp_list)
gives
['SpainTwo']
print(list(filter(lambda x: "Two" not in x, ["elementOne" , "elementTwo" , "elementThree", "elementTwo"])))
Using filter with a lambda, if you only need to print the result.
If you want to check for multiple substrings and delete every string in which any of them is found, use the following method:
List_of_string = [ "easyapplyone", "appliedtwotime", "approachednone", "seenthreetime", "oneseen", "twoapproached"]
q = ["one","three"]
List_of_string[:] = [x for x in List_of_string if not any(xs in x for xs in q)]
print(List_of_string)
output: ['appliedtwotime', 'twoapproached']
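When there are many substrings to test, another option (a sketch, not taken from the answer above) is to combine them into a single regular expression. re.escape protects any regex metacharacters in the substrings, and | matches whichever one occurs:

```python
import re

List_of_string = ["easyapplyone", "appliedtwotime", "approachednone",
                  "seenthreetime", "oneseen", "twoapproached"]
q = ["one", "three"]

# Build one pattern such as "one|three" from the escaped substrings
pattern = re.compile("|".join(re.escape(s) for s in q))

# Keep only strings where no substring is found; slice assignment keeps the same list object
List_of_string[:] = [x for x in List_of_string if not pattern.search(x)]
print(List_of_string)  # ['appliedtwotime', 'twoapproached']
```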
Well, this was pretty simple; sorry for all the trouble:
for x in stringlist[:]:  # iterate over a copy so removals don't skip the next element
    if "Two" in x:
        stringlist.remove(x)