Assume foo is a list or some other iterable. I want something that lets me do this (pseudo-code):
for i in foo
    for j in foo - [i]
        for k in foo - [i, j]
            ...
                for some_var in foo - [i, j, k, ...]  // only one value left in foo
                    do_something(some_args)
Is there some way to do this in Python? Can I do this in a loop, would I have to use recursion, or would I have to (only if there is no other way) build a code object?
Your question has to do with combinatorics, specifically Cartesian products.
Without recursion you need to know how many nestings of loops you are going to run. However you don't need to know this information ahead of time. As long as you can get it dynamically it is ok.
Consider this code taken from one of my repos: https://github.com/Erotemic/utool/blob/next/utool/util_dict.py
from itertools import product
import six

varied_dict = {
    'logdist_weight': [0.0, 1.0],
    'pipeline_root': ['vsmany'],
    'sv_on': [True, False, None]
}

def all_dict_combinations(varied_dict):
    tups_list = [[(key, val) for val in val_list]
                 for (key, val_list) in six.iteritems(varied_dict)]
    dict_list = [dict(tups) for tups in product(*tups_list)]
    return dict_list

dict_list = all_dict_combinations(varied_dict)
Running this code will result in dict_list being:
[
{'pipeline_root': 'vsmany', 'sv_on': True, 'logdist_weight': 0.0},
{'pipeline_root': 'vsmany', 'sv_on': True, 'logdist_weight': 1.0},
{'pipeline_root': 'vsmany', 'sv_on': False, 'logdist_weight': 0.0},
{'pipeline_root': 'vsmany', 'sv_on': False, 'logdist_weight': 1.0},
{'pipeline_root': 'vsmany', 'sv_on': None, 'logdist_weight': 0.0},
{'pipeline_root': 'vsmany', 'sv_on': None, 'logdist_weight': 1.0},
]
and then you could write code like:
for some_vars in dict_list:
    do_something(some_vars)
To relate it back to your example: if you list the set of values foo can take at each nesting level in what I call varied_dict, you get a solution to your question. Also note that varied_dict can be built dynamically, and it doesn't really have to be a dict. If you modified my code, you could easily specify the values using a list or some other structure.
The magic in the above code comes down to the use of the itertools.product function. I suggest you take a look at that. https://docs.python.org/2/library/itertools.html#itertools.product
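As a side note, here is a minimal sketch of how the number of nesting levels can be chosen at run time. It assumes foo is a small list of distinct values, and do_something is a hypothetical stand-in for your function. The pseudo-code in the question, where each level skips the values already picked by the outer loops, corresponds to itertools.permutations rather than itertools.product, so both are shown.

```python
from itertools import permutations, product

def do_something(*args):
    # Hypothetical stand-in for the asker's function.
    return args

foo = [1, 2, 3]

# Cartesian product: one list of candidate values per nesting level.
# The number of levels (2 here) can be decided at run time.
levels = [foo] * 2
product_results = [do_something(*combo) for combo in product(*levels)]

# Permutations: each level skips the values chosen by the outer levels,
# like `foo - [i, j, ...]` in the pseudo-code (assuming distinct values).
permutation_results = [do_something(*perm) for perm in permutations(foo)]

print(len(product_results))      # 9
print(len(permutation_results))  # 6
```

Both functions return lazy iterators, so for a large foo you may prefer to loop over them directly instead of building lists.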
I have 2 lists that I want to convert into a dict of keys and values. I managed to do so, but there are too many steps, so I would like to know if there's a simpler way of achieving this. Basically, I would like to create the dict directly in the loop without the extra steps below. I just started working with Python and I don't quite understand all the data types it provides.
The jName form can be modified if needed.
jName = ["Nose", "Neck", "RShoulder", "RElbow", "RWrist", "LShoulder", "LElbow", "LWrist", "RHip",
         "RKnee", "RAnkle", "LHip", "LKnee", "LAnkle", "REye", "LEye", "REar", "LEar"]

def get_joints(subset, candidate):
    joints_per_skeleton = [[] for i in range(len(subset))]
    # for each detected skeleton
    for n in range(len(subset)):
        # for each joint
        for i in range(18):
            cidx = subset[n][i]
            if cidx != -1:
                y = candidate[cidx.astype(int), 0]
                x = candidate[cidx.astype(int), 1]
                joints_per_skeleton[n].append((y, x))
            else:
                joints_per_skeleton[n].append(None)
    return joints_per_skeleton

joints = get_joints(subset, candidate)
print joints
Here is the joints output, a list of lists:
[[None, (48.0, 52.0), (72.0, 50.0), None, None, (24.0, 55.0), (5.0, 105.0), None, (63.0, 159.0), (57.0, 221.0), (55.0, 281.0), (28.0, 154.0), (23.0, 219.0), (23.0, 285.0), None, (25.0, 17.0), (55.0, 18.0), (30.0, 21.0)]]
Here I defined a function to create the dictionary from the 2 lists:
def create_dict(keys, values):
    return dict(zip(keys, values))

my_dict = create_dict(jName, joints[0])
Here is the result:
{'LAnkle': (23.0, 285.0),
'LEar': (30.0, 21.0),
'LElbow': (5.0, 105.0),
'LEye': (25.0, 17.0),
'LHip': (28.0, 154.0),
'LKnee': (23.0, 219.0),
'LShoulder': (24.0, 55.0),
'LWrist': None,
'Neck': (48.0, 52.0),
'Nose': None,
'RAnkle': (55.0, 281.0),
'REar': (55.0, 18.0),
'RElbow': None,
'REye': None,
'RHip': (63.0, 159.0),
'RKnee': (57.0, 221.0),
'RShoulder': (72.0, 50.0),
'RWrist': None}
I think defaultdict could help you. I made my own example to show that you could predefine the keys and then go through a double for loop and have the values of the dict be lists of potentially different sizes. Please let me know if this answers your question:
from collections import defaultdict
import random

joint_names = ['hip', 'knee', 'wrist']
num_skeletons = 10
d = defaultdict(list)
for skeleton in range(num_skeletons):
    for joint_name in joint_names:
        r1 = random.randint(0, 10)
        r2 = random.randint(0, 10)
        if r1 > 4:
            d[joint_name].append(r1 * r2)
print d
Output:
defaultdict(<type 'list'>, {'hip': [0, 5, 30, 36, 56], 'knee': [35, 50, 10], 'wrist': [27, 5, 15, 64, 30]})
As a note I found it very difficult to read through your code since there were some variables that were defined before the snippet you posted.
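For the original goal of creating the dict directly in the loop, a minimal sketch could look like the one below. It is written under assumptions: the joint list is shortened to three names, the sample subset and candidate values are made up, and plain lists stand in for the arrays that are defined outside the posted snippet. The function name get_joint_dicts is hypothetical.

```python
J_NAME = ["Nose", "Neck", "RShoulder"]  # shortened for the example

def get_joint_dicts(subset, candidate, names=J_NAME):
    """Return one {joint name: (y, x) or None} dict per skeleton."""
    skeletons = []
    for row in subset:
        joints = {}
        # zip pairs each joint name with its candidate index
        for name, cidx in zip(names, row):
            if cidx == -1:
                joints[name] = None  # joint was not detected
            else:
                y, x = candidate[int(cidx)][:2]
                joints[name] = (y, x)
        skeletons.append(joints)
    return skeletons

# Made-up sample data; plain lists stand in for the real arrays.
candidate = [[48.0, 52.0], [72.0, 50.0]]
subset = [[-1, 0, 1]]
print(get_joint_dicts(subset, candidate))
# [{'Nose': None, 'Neck': (48.0, 52.0), 'RShoulder': (72.0, 50.0)}]
```

This fills the dict while looping, so the separate zip step afterwards is not needed.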
[('a',), ('b',), ('a',)]
produces
{'a': (), 'b': ()}
[('a', 1.0), ('b', 2.0), ('a', 3.0)]
produces
{'a': ([1.0, 3.0],), 'b': ([2.0],)}
[('a', 1.0, 0.1), ('b', 2.0, 0.2), ('a', 1.0, 0.3)]
produces
{'a': ([1.0, 1.0], [0.1, 0.3]), 'b': ([2.0], [0.2])}
[('a', 1.0, 0.1, 7), ('b', 2.0, 0.2, 8), ('a', 1.0, 0.3, 9)]
produces
{'a': ([1.0, 1.0], [0.1, 0.3], [7, 9]), 'b': ([2.0], [0.2], [8])}
I am new to Python - this is what I came up with.
from collections import defaultdict

def Collate(list_of_tuples):
    if len(list_of_tuples)==0 or len(list_of_tuples[0])==0:
        return defaultdict(tuple)
    d = defaultdict(lambda: tuple([] for i in range(len(list_of_tuples[0])-1)))
    for t in list_of_tuples:
        d[t[0]]  # touch the key so it exists even with no measurements
        for i,v in enumerate(t):
            if i>0:
                d[t[0]][i-1].append(v)
    return d
In case you want to know my context, the list of tuples represents measurements. The first item in each tuple is an identification of a thing being measured.
Subsequent items are different types of measurements of that thing. The things are measured in random order, each an unknown number of times.
The function collates each thing's measurements together for further processing.
As the application evolves, different types of measurements will be added.
When the number of types of measurements in the client code changes, I want this Collate function to not have to change.
You can use itertools.groupby to group items first using the letters, and then collect all measurements belonging to the same id using zip(*...) before adding them to the corresponding dictionary key:
from itertools import groupby, islice
import operator

def collate(lst, f=operator.itemgetter(0)):
    d = {}
    for k, g in groupby(sorted(lst, key=f), f):
        d[k] = ()
        for v in islice(zip(*g), 1, None):
            d[k] += (list(v),)
    return d
Tests:
lst = [('a',), ('b',), ('a',)]
print(collate(lst))
# {'a': (), 'b': ()}
lst = [('a', 1.0), ('b', 2.0), ('a', 3.0)]
print(collate(lst))
# {'a': ([1.0, 3.0],), 'b': ([2.0],)}
lst = [('a', 1.0, 0.1, 7), ('b', 2.0, 0.2, 8), ('a', 1.0, 0.3, 9)]
print(collate(lst))
# {'a': ([1.0, 1.0], [0.1, 0.3], [7, 9]), 'b': ([2.0], [0.2], [8])}
I have avoided using defaultdict since, in the case of zero measurements (i.e. [('a',), ('b',), ('a',)]), you still need to set the key's value explicitly, which defeats the purpose of that collection.
In case you need to handle missing measurements, replace zip with itertools.zip_longest, and pass an explicit fillvalue to replace the default None.
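A minimal sketch of that zip_longest variant, assuming Python 3 and assuming that a missing measurement simply means a shorter tuple. The name collate_padded and the sample data are made up for illustration.

```python
from itertools import groupby, islice, zip_longest
import operator

def collate_padded(lst, fillvalue=None, f=operator.itemgetter(0)):
    d = {}
    for k, g in groupby(sorted(lst, key=f), f):
        # zip_longest pads the shorter tuples instead of truncating
        columns = zip_longest(*g, fillvalue=fillvalue)
        # skip the first column, which only repeats the id
        d[k] = tuple(list(v) for v in islice(columns, 1, None))
    return d

lst = [('a', 1.0, 0.1), ('b', 2.0), ('a', 3.0)]
print(collate_padded(lst, fillvalue=0.0))
# {'a': ([1.0, 3.0], [0.1, 0.0]), 'b': ([2.0],)}
```

Note that the padding happens per id, so 'b' still has a single measurement list here because none of its tuples carries a second value.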
I created a dictionary to match the feature importances of a Decision Tree in sklearn with the corresponding feature names in my df. Here is the code:
importances = clf.feature_importances_
feature_names = ['age','BP','chol','maxh',
'oldpeak','slope','vessels',
'sex_0.0','sex_1.0',
'pain_1.0','pain_2.0','pain_3.0','pain_4.0',
'bs_0.0','bs_1.0',
'ecg_0.0','ecg_1.0','ecg_2.0',
'ang_0.0','ang_1.0',
'thal_3.0','thal_6.0','thal_7.0']
CLF_sorted = dict(zip(feature_names, importances))
in output I obtained this:
{'BP': 0.053673644739136502,
'age': 0.014904980747733202,
'ang_0.0': 0.0,
'ang_1.0': 0.0,
'bs_0.0': 0.0,
'bs_1.0': 0.0,
'chol': 0.11125922817930389, ...}
as I expected. I have two questions for you:
How could I create a bar plot where the x-axis represents the feature_names and the y-axis the corresponding importances?
If it is possible, how could I sort the bars in descending order?
Try this:
import pandas as pd
df = pd.DataFrame({'feature': feature_names , 'importance': importances})
df.sort_values('importance', ascending=False).set_index('feature').plot.bar(rot=0)
demo:
d ={'BP': 0.053673644739136502,
'age': 0.014904980747733202,
'ang_0.0': 0.0,
'ang_1.0': 0.0,
'bs_0.0': 0.0,
'bs_1.0': 0.0,
'chol': 0.11125922817930389}
df = pd.DataFrame({'feature': [x for x in d.keys()], 'importance': [x for x in d.values()]})
In [63]: import matplotlib as mpl
In [64]: mpl.style.use('ggplot')
In [65]: df.sort_values('importance', ascending=False).set_index('feature').plot.bar(rot=0)
Out[65]: <matplotlib.axes._subplots.AxesSubplot at 0x8c83748>
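If you would rather not go through pandas, the descending order can also be computed with the built-in sorted before plotting. This is only a sketch of the sorting step, using a shortened copy of the dictionary from the question.

```python
d = {'BP': 0.053673644739136502,
     'age': 0.014904980747733202,
     'ang_0.0': 0.0,
     'chol': 0.11125922817930389}

# Sort the (name, importance) pairs by importance, largest first
ranked = sorted(d.items(), key=lambda item: item[1], reverse=True)
names = [name for name, _ in ranked]
values = [value for _, value in ranked]

print(names)  # ['chol', 'BP', 'age', 'ang_0.0']
```

The two lists can then be handed to a plotting call, for example matplotlib's plt.bar(names, values).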
I am using DictVectorizer to convert my features similar to example code:
from sklearn.feature_extraction import DictVectorizer
v = DictVectorizer(sparse=False)
D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
X = v.fit_transform(D)
X
array([[ 2., 0., 1.],
[ 0., 1., 3.]])
My question is, if I run this code repeatedly, is the order guaranteed? I.e., will 'bar' always occur in the first column, 'baz' in the second column, and 'foo' in the third column?
If order is not guaranteed, do you know of an option to force this? This is important, as new unseen data to be passed into a model trained on this format will obviously need the features occurring in same columns. Perhaps something could be done with the 'vocabulary_' attribute of DictVectorizer.
Cheers,
Steven
There is no problem if you use the fit and transform methods in the correct manner. First you fit the DictVectorizer to your data, and then you transform the dataset to a feature matrix (a dense array here, because sparse=False was passed; a sparse matrix by default). Both steps are done by the fit_transform() method you have called. If you have new, unseen data, you can just transform it using the transform() method. This will project the new data into the same column layout as before.
This is illustrated by the example code you have linked to:
>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
>>> X = v.fit_transform(D)
>>> X
array([[ 2., 0., 1.],
[ 0., 1., 3.]])
>>> v.inverse_transform(X) == [{'bar': 2.0, 'foo': 1.0}, {'baz': 1.0, 'foo': 3.0}]
True
>>> v.transform({'foo': 4, 'unseen_feature': 3})
array([[ 0., 0., 4.]])
The final transform() call takes new, unseen data, with two features. One of these is known by the DictVectorizer (because it was previously fitted to data that also had this feature), the other one is not. As the output shows, the values for the known feature foo end up in the correct column of the matrix, whereas the unknown feature is simply ignored.
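To make the mechanism concrete, here is a toy pure-Python sketch of the fit/transform idea. It is not scikit-learn's implementation: the class TinyDictVectorizer is hypothetical and only mimics the behaviour shown above, where fitting fixes a name-to-column mapping (sorted by name, as DictVectorizer does with its default sort=True) and transforming reuses that mapping.

```python
class TinyDictVectorizer:
    """Toy stand-in for the fit/transform idea; not scikit-learn's code."""

    def fit(self, rows):
        # Collect every feature name once and fix its column index
        names = sorted({name for row in rows for name in row})
        self.vocabulary_ = {name: col for col, name in enumerate(names)}
        return self

    def transform(self, rows):
        matrix = []
        for row in rows:
            values = [0.0] * len(self.vocabulary_)
            for name, value in row.items():
                col = self.vocabulary_.get(name)
                if col is not None:  # unseen features are dropped
                    values[col] = float(value)
            matrix.append(values)
        return matrix

v = TinyDictVectorizer().fit([{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}])
print(v.vocabulary_)  # {'bar': 0, 'baz': 1, 'foo': 2}
print(v.transform([{'foo': 4, 'unseen_feature': 3}]))  # [[0.0, 0.0, 4.0]]
```

The column order depends only on the fitted mapping, so as long as you keep the fitted object around and call transform() on new data, the columns line up.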
I want to print the values of a group of objects returned from the database.
I have tried the following:
Products = productBll.listProduct(params)
print Products.__dict__
It displays the following:
{'_result_cache': [Product: Product object, Product: Product object]}
But when I do this:
for prd in Products:
    print prd.__dict__
it shows all the contents of the Product objects:
{'product_price': 0.0, 'right_side_min_depth': 0.0, 'short_description': u'', 'left_side_min_depth': 0.0, 'max_depth': 0.0, 'height_scale': 2.0, 'left_side_max_depth': 0.0, 'is_hinges': u'No', 'max_height': 1.04}
{'product_price': 0.0, 'right_side_min_depth': 0.0, 'short_description': u'', 'left_side_min_depth': 0.0, 'max_depth': 1000.0, 'height_scale': 1000.0, 'left_side_max_depth': 0.0, 'is_hinges': u'No', 'max_height': 1000.0}
But I want the above result without using the for loop.
Is there any way to do it in one line of code?
If all you're looking for is a one-liner, here it is:
Products = productBll.listProduct(params)
print [prd.__dict__ for prd in Products]
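The same one-liner works on any iterable of objects. Here is a small sketch with a plain stand-in class instead of a real model (the class and its two fields are made up for the example). It also filters out attributes starting with an underscore, since Django model instances carry an internal _state entry in __dict__ that you probably do not want printed.

```python
class Product(object):
    # Hypothetical stand-in for a model instance
    def __init__(self, product_price, max_height):
        self.product_price = product_price
        self.max_height = max_height
        self._state = object()  # mimics Django's internal bookkeeping

products = [Product(0.0, 1.04), Product(0.0, 1000.0)]

# One dict per object, skipping private attributes
rows = [{k: v for k, v in vars(p).items() if not k.startswith('_')}
        for p in products]
print(rows)
# [{'product_price': 0.0, 'max_height': 1.04}, {'product_price': 0.0, 'max_height': 1000.0}]
```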
You can also try using values(). Assuming your model is Product, you can do
Product.objects.filter(your_filter_criteria).values()
This gives you one dict per selected item.