Looping through a list of dictionaries in Python

Given a list of dictionaries in Python:

my_list = [{'id': 0, 'name': 'cube0_cluster0', 'member_ids': [429, 432, 435]},
           {'id': 1, 'name': 'cube0_cluster1', 'member_ids': [0, 4, 5]},
           {'id': 0, 'name': 'cube1_cluster1', 'member_ids': [4, 706, 800]}]
I want to print all member_ids for every cube{}_cluster1 entry. My expected output is [0, 4, 5, 706, 800]. Any help would be highly appreciated.
Here is what I have tried:
for k in my_list:
    for j in range(len(my_list)):
        if k['name'] == 'cube{}_cluster1'.format(j):
            print(k['member_ids'])
But I am getting two separate outputs, [0, 4, 5] and [4, 706, 800], instead of one combined list.

Try this one.
import re

member_ids = []
for di in my_list:
    if re.match(r'cube\d_cluster1', di['name']):
        member_ids += di['member_ids']
print(member_ids)
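Running this against my_list above prints [0, 4, 5, 4, 706, 800]: the 4 appears twice because it belongs to both clusters. If you want each id only once while keeping order (matching the expected [0, 4, 5, 706, 800]), a minimal follow-up sketch, relying on dict insertion order (guaranteed in Python 3.7+):

unique_ids = list(dict.fromkeys(member_ids))  # drops later duplicates, keeps first-seen order
print(unique_ids)  # [0, 4, 5, 706, 800]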

You can also use a list comprehension.

my_list = [{'id': 0, 'name': 'cube0_cluster0', 'member_ids': [429, 432, 435]},
           {'id': 1, 'name': 'cube0_cluster1', 'member_ids': [0, 4, 5]},
           {'id': 0, 'name': 'cube1_cluster1', 'member_ids': [4, 706, 800]}]

res = [j for i in my_list for j in i['member_ids'] if "cluster1" in i["name"]]
print(res)       # full list
print(set(res))  # distinct values

# Result
# [0, 4, 5, 4, 706, 800]
# {0, 800, 706, 4, 5}
I hope this helps!

match index from pyspark dataframe in pandas

I have the following PySpark DataFrame (testDF = ldamodel.describeTopics().select("termIndices").toPandas()):

+-----+-------------+--------------------+
|topic| termIndices |         termWeights|
+-----+-------------+--------------------+
|    0|  [6, 118, 5]|[0.01205522104545...|
|    1| [0, 55, 100]|[0.00125521761966...|
and I have the following word list:
['one',
'peopl',
'govern',
'think',
'econom',
'rate',
'tax',
'polici',
'year',
'like',
........]
I am trying to match the entries of vocablist to termIndices and termWeights.
So far I have the following:
for i in testDF.items():
    for j in i[1]:
        for m in j:
            t = vocablist[m], m
            print(t)
which results in:
('tax', 6)
('insur', 118)
('rate', 5)
('peopl', 1)
('health', 84)
('incom', 38)
('think', 3)
('one', 0)
('social', 162)
.......
But I wanted something like
('tax', 6, 0.012055221045453202)
('insur', 118, 0.001255217619666775)
('rate', 5, 0.0032220995010401187)
('peopl', 1, 0.008342115226031033)
('health', 84, 0.0008332053105123403)
('incom', 38, ......)
Any help will be appreciated.
I would recommend spreading the lists in the termIndices and termWeights columns downward, one element per row. Once you've done that, you can map each index to its term name while the term weights stay aligned with the term indices. The following is an illustration:
import pandas as pd

df = pd.DataFrame(data={'topic': [0, 1],
                        'termIndices': [[6, 118, 5],
                                        [0, 55, 100]],
                        'termWeights': [[0.012055221045453202, 0.012055221045453202, 0.012055221045453202],
                                        [0.00125521761966, 0.00125521761966, 0.00125521761966]]})

# Spread each list column downward: one row per list element
dff = df.apply(lambda s: s.apply(pd.Series).stack().reset_index(drop=True, level=1))

vocablist = ['one', 'peopl', 'govern', 'think', 'econom', 'rate', 'tax', 'polici', 'year', 'like'] * 50

# Map each term index to its term name
dff['termNames'] = dff.termIndices.map(vocablist.__getitem__)
dff[['termNames', 'termIndices', 'termWeights']].values.tolist()
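As a side note, newer pandas (1.3+) lets DataFrame.explode take a list of columns and spread both list columns in lockstep, replacing the apply/stack step above. A minimal sketch, assuming the same df and vocablist:

# pandas >= 1.3: explode several list columns at once, keeping rows aligned
dff = df.explode(['termIndices', 'termWeights'], ignore_index=True)
dff['termNames'] = dff['termIndices'].map(lambda i: vocablist[int(i)])
print(dff[['termNames', 'termIndices', 'termWeights']].values.tolist())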
I hope this helps.

Python 2.7 current row index on 2d array iteration

When iterating on a 2d array, how can I get the current row index? For example:
x = [[ 1.  2.  3.  4.]
     [ 5.  6.  7.  8.]
     [ 9.  0.  3.  6.]]
Something like:
for rows in x:
    print x current index  # (for example, when iterating on [ 5.  6.  7.  8.], return 1)
enumerate is a built-in function of Python. Its usefulness cannot be summarized in a single line, yet most newcomers and even some advanced programmers are unaware of it. It allows us to loop over something and have an automatic counter. Here is an example:
for counter, value in enumerate(some_list):
    print(counter, value)
And there is more! enumerate also accepts an optional argument which makes it even more useful.
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):
    print(c, value)
# Output:
# 1 apple
# 2 banana
# 3 grapes
# 4 pear
The optional argument tells enumerate where to start counting. You can also create tuples containing the index and list item by passing the enumerate object to list. Here is an example:
my_list = ['apple', 'banana', 'grapes', 'pear']
counter_list = list(enumerate(my_list, 1))
print(counter_list)
# Output: [(1, 'apple'), (2, 'banana'), (3, 'grapes'), (4, 'pear')]
enumerate:
In [42]: x = [[ 1, 2, 3, 4],
    ...:      [ 5, 6, 7, 8],
    ...:      [ 9, 0, 3, 6]]

In [43]: for index, rows in enumerate(x):
    ...:     print('current index {}'.format(index))
    ...:     print('current row {}'.format(rows))
    ...:
current index 0
current row [1, 2, 3, 4]
current index 1
current row [5, 6, 7, 8]
current index 2
current row [9, 0, 3, 6]
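If x is actually a NumPy array (the floating-point display in the question suggests it was printed from one), enumerate over it works the same way, row by row. A small sketch, assuming NumPy is installed:

import numpy as np

x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [9., 0., 3., 6.]])

for index, row in enumerate(x):
    print(index, row)  # index 1 prints: 1 [5. 6. 7. 8.]

# np.ndenumerate instead yields a (row, column) index per element:
for (i, j), value in np.ndenumerate(x):
    print(i, j, value)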

ValueError: Tensor A must be from the same graph as Tensor B

I'm doing text matching in TensorFlow. Before I call tf.nn.embedding_lookup(word_embedding_matrix, combine_result), I have to combine some words from two sentences (get m words from sentence S1 and m words from sentence S2, then combine them together as "combine_result"), but when the code gets to tf.nn.embedding_lookup(word_embedding_matrix, combine_result) it gives me the error:
ValueError: Tensor("Reshape_7:0", shape=(1, 6), dtype=int32) must be
from the same graph as Tensor("word_embedding_matrix:0", shape=(26320,
50), dtype=float32_ref).
The code is as below:
import tensorflow as tf
import numpy as np
import os
import time
import datetime
import data_helpers

NUM_CLASS = 2
SEQUENCE_LENGTH = 47

# Placeholders for input, output and dropout
input_x = tf.placeholder(tf.int32, [None, 2, SEQUENCE_LENGTH], name="input_x")
input_y = tf.placeholder(tf.float32, [None, NUM_CLASS], name="input_y")
dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

def n_grams(text, window_size):
    text_left_window = []
    # text_left_window = tf.convert_to_tensor(text_left_window, dtype=tf.int32)
    for z in range(SEQUENCE_LENGTH - 2):
        text_left = tf.slice(text, [z], [window_size])
        text_left_window = tf.concat(0, [text_left_window, text_left])
    text_left_window = tf.reshape(text_left_window, [-1, window_size])
    return text_left_window

def inference(vocab_size, embedding_size, batch_size, slide_window_size, conv_window_size):
    # Embedding layer
    word_embedding_matrix = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
                                        name="word_embedding_matrix")
    # convo_unit = tf.Variable(tf.random_uniform([slide_window_size*2, ], -1.0, 1.0), name="convo_unit")
    text_comp_result = []
    for x in range(batch_size):
        # input_x_slice_reshape = [[1 1 1...]
        #                          [2 2 2...]]
        input_x_slice = tf.slice(input_x, [x, 0, 0], [1, 2, SEQUENCE_LENGTH])
        input_x_slice_reshape = tf.reshape(input_x_slice, [2, SEQUENCE_LENGTH])
        # text_left_flat: [294, 6, 2, 6, 2, 57, 2, 57, 147, 57, 147, 5, 147, 5, 2, ...], length = SEQUENCE_LENGTH
        # text_right_flat: [17, 2, 2325, 2, 2325, 5366, 2325, 5366, 81, 5366, 81, 1238, ...]
        text_left = tf.slice(input_x_slice_reshape, [0, 0], [1, SEQUENCE_LENGTH])
        text_left_flat = tf.reshape(text_left, [-1])
        text_right = tf.slice(input_x_slice_reshape, [1, 0], [1, SEQUENCE_LENGTH])
        text_right_flat = tf.reshape(text_right, [-1])
        # extract n-grams from both texts.
        # text_left_window: [[294, 6, 2], [6, 2, 57], [2, 57, 147], [57, 147, 5], [147, 5, 2], ...]
        # text_right_window: [[17, 2, 2325], [2, 2325, 5366], [2325, 5366, 81], [5366, 81, 1238], ...]
        text_left_window = n_grams(text_left_flat, slide_window_size)
        text_right_window = n_grams(text_right_flat, slide_window_size)
        text_left_window_sha = text_left_window.get_shape()
        print 'text_left_window_sha:', text_left_window_sha
        # composite the slices
        text_comp_list = []
        # text_comp_list = tf.convert_to_tensor(text_comp_list, dtype=tf.float32)
        for l in range(SEQUENCE_LENGTH - slide_window_size + 1):
            text_left_slice = tf.slice(text_left_window, [l, 0], [1, slide_window_size])
            text_left_slice_flat = tf.reshape(text_left_slice, [-1])
            for r in range(SEQUENCE_LENGTH - slide_window_size + 1):
                text_right_slice = tf.slice(text_right_window, [r, 0], [1, slide_window_size])
                text_right_slice_flat = tf.reshape(text_right_slice, [-1])
                # convo_unit = [294, 6, 2, 17, 2, 2325]
                convo_unit = tf.concat(0, [text_left_slice_flat, text_right_slice_flat])
                convo_unit_reshape = tf.reshape(convo_unit, [-1, slide_window_size * 2])
                # convo_unit_shape_val = convo_unit_reshape.get_shape()
                # print 'convo_unit_shape_val:', convo_unit_shape_val
                embedded_chars = tf.nn.embedding_lookup(word_embedding_matrix, convo_unit_reshape)
                embedded_chars_expanded = tf.expand_dims(embedded_chars, -1)
                ...
Could someone please help me? Thank you very much!
Yaroslav answered in a comment above; moving it to an answer:
This error happens when you create a new default graph. Try doing tf.reset_default_graph() before the computation and do not create any more graphs (i.e., no extra calls to tf.Graph).
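A minimal sketch of the pattern, using the same TF 1.x-style API as the question (shapes taken from the error message; the names are illustrative):

import tensorflow as tf

tf.reset_default_graph()  # start from a single clean default graph

# Build *all* ops in this one default graph
word_embedding_matrix = tf.Variable(
    tf.random_uniform([26320, 50], -1.0, 1.0), name="word_embedding_matrix")
combine_result = tf.placeholder(tf.int32, [1, 6], name="combine_result")
embedded = tf.nn.embedding_lookup(word_embedding_matrix, combine_result)

# Mixing graphs is what triggers the ValueError, e.g.:
# g = tf.Graph()
# with g.as_default():
#     other = tf.placeholder(tf.int32, [1, 6])  # lives in g, not the default graph
# tf.nn.embedding_lookup(word_embedding_matrix, other)  # -> "must be from the same graph"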

slice a dictionary on elements contained within item arrays

Say I have a dict of country -> [cities] (potentially an ordered dict):
{'UK': ['Bristol', 'Manchester', 'London', 'Glasgow'],
 'France': ['Paris', 'Calais', 'Nice', 'Cannes'],
 'Germany': ['Munich', 'Berlin', 'Cologne']
}
The number of keys (countries) is variable, and the number of cities in each list is also variable. The result set comes from a 'search' on city name, so, for example, a search on "San%" could potentially return 50k results in a worldwide search.
The data is to be used to populate a select2 widget --- and I'd like to use its paging functionality...
Is there a smart way to slice this such that [3:8] would yield:
{'UK': ['Glasgow'],
 'France': ['Paris', 'Calais', 'Nice', 'Cannes'],
 'Germany': ['Munich']
}
(apologies for the way this question was posed earlier -- I wasn't sure that the real usage would clarify the issue...)
If I understand your problem correctly, as discussed in the comments, this should do it:
from pprint import pprint

def slice_dict(d, a, b):
    big_list = []
    ret_dict = {}
    # Make one big list of all values, tagging each value with the key
    # of the dict it came from.
    for k, v in d.iteritems():
        for n in v:
            big_list.append({k: n})
    # Slice it
    sliced = big_list[a:b]
    # Put everything back in order
    for k, v in d.iteritems():
        for subd in sliced:
            for subk, subv in subd.iteritems():
                if k == subk:
                    if k in ret_dict:
                        ret_dict[k].append(subv)
                    else:
                        ret_dict[k] = [subv]
    return ret_dict

d = {
    'a': [1, 2, 3, 4],
    'b': [5, 6, 7, 8, 9],
    'c': [10, 11, 12, 13, 14]
}

x = slice_dict(d, 3, 11)
pprint(x)
$ python slice.py
{'a': [4], 'b': [5, 6], 'c': [10, 11, 12, 13, 14]}
The output is a little different from your example output, but that's because the dict was not ordered when it was passed to the function: it was iterated in the order a-c-b, which is why b is cut off at 6 and c is not cut off.
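For comparison, a more compact sketch of the same idea using itertools (written for Python 3, hence items() instead of iteritems(); with an OrderedDict or a Python 3.7+ dict, key order is preserved). Note that reproducing the exact example output takes the half-open slice [3:9], since it contains six cities:

from itertools import islice

def slice_dict(d, a, b):
    # Flatten to (country, city) pairs, slice lazily, then regroup
    pairs = ((k, city) for k, cities in d.items() for city in cities)
    out = {}
    for k, city in islice(pairs, a, b):
        out.setdefault(k, []).append(city)
    return out

d = {'UK': ['Bristol', 'Manchester', 'London', 'Glasgow'],
     'France': ['Paris', 'Calais', 'Nice', 'Cannes'],
     'Germany': ['Munich', 'Berlin', 'Cologne']}

print(slice_dict(d, 3, 9))
# {'UK': ['Glasgow'], 'France': ['Paris', 'Calais', 'Nice', 'Cannes'], 'Germany': ['Munich']}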

writing a list with multiple data to a csv file in separate columns in python

import csv
from itertools import izip

if l > 0:
    for i in range(0, l):
        combined.append(str(questionList[i]).encode('utf-8') + str(viewList[i]).encode('utf-8'))
        # viewcsv.append(str(viewList[i]).encode('utf-8'))
        # quescsv.append(str(questionList[i]).encode('utf-8'))
    with open('collect.csv', 'a') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter='\n')
        spamwriter.writerow(combined)
        # spamwriter.writerows(izip(quescsv, viewcsv))
    return 1
else:
    return 0
I need to generate a CSV file and fill it with data from two or more lists, written to separate columns rather than a single column. Currently I'm trying to combine the two lists into one list (combined) and use that as the input for writing, but I haven't got the desired output. I have tried many things, including the fieldnames approach and the izip approach, all in vain.
Eg:

questionList    viewList
4               3 views
5               0 views

The numbers used are just for example.
Probably, you need something like this:
import csv

X = [1, 2, 3, 4, 5]
Y = [2, 3, 5, 7, 11]
Z = ['two', 'three', 'five', 'seven', 'eleven']

with open('collect.csv', 'w') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for row in zip(X, Y, Z):
        writer.writerow(row)
If you instead want each list written as a row rather than a column:

import csv

X = [1, 2, 3, 4, 5]
Y = [2, 3, 5, 7, 11]
Z = ['two', 'three', 'five', 'seven', 'eleven']

with open('collect.csv', 'w') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(X)
    writer.writerow(Y)
    writer.writerow(Z)
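One caveat: zip truncates to the shortest list, so trailing values are silently dropped when the lists have unequal lengths. itertools.zip_longest (izip_longest on Python 2) pads instead. A small sketch with illustrative data:

import csv
from itertools import zip_longest  # izip_longest on Python 2

questionList = [4, 5, 7]
viewList = ['3 views', '0 views']  # one item shorter

with open('collect.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['questionList', 'viewList'])
    # fillvalue pads the shorter list instead of dropping rows
    for row in zip_longest(questionList, viewList, fillvalue=''):
        writer.writerow(row)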