Replacing the values of `edgelist` with those of a `labels` dictionary - list

I am new to both Python and NetworkX. I have a square, regular graph G with N×N nodes (a lattice). The nodes are labelled by means of a dict (see code below). Now I want edgelist to report the start and end point of each edge not by the node coordinates but by the label each node has been given.
Example:
import networkx as nx

N = 3
G = nx.grid_2d_graph(N, N)
labels = dict(((i, j), i + (N - 1 - j) * N) for i, j in G.nodes())
# This gives nodes an attribute 'ID' that is identical to their labels
for (i, j) in labels:
    G.node[(i, j)]['ID'] = labels[(i, j)]
edgelist = G.edges()  # This gives the list of all edges in the format (Start XY, End XY)
If I run it with N=3 I get:
In [14]: labels
Out[14]: {(0, 0): 6, (0, 1): 3, (0, 2): 0, (1, 0): 7, (1, 1): 4, (1, 2): 1, (2, 0): 8, (2, 1): 5, (2, 2): 2}
This scheme labels the upper-left node as 0, with node N^2 - 1 placed in the lower-right corner. And this is what I want. Now the problem with edgelist:
In [15]: edgelist
Out [15]: [((0, 1), (0, 0)), ((0, 1), (1, 1)), ((0, 1), (0, 2)), ((1, 2), (1, 1)), ((1, 2), (0, 2)), ((1, 2), (2, 2)), ((0, 0), (1, 0)), ((2, 1), (2, 0)), ((2, 1), (1, 1)), ((2, 1), (2, 2)), ((1, 1), (1, 0)), ((2, 0), (1, 0))]
I tried to solve the problem with these lines (inspiration from here: Replace items in a list using a dictionary):
allKeys = {}
for subdict in (labels):
    allKeys.update(subdict)
new_edgelist = [allKeys[edge] for edge in edgelist]
but I get this wonderful thing, which enlightens my Monday:
TypeError: cannot convert dictionary update sequence element #0 to a sequence
To sum up, I want to be able to replace the elements of the edgelist list with the values of the labels dictionary so that, say, the edge ((2, 0), (1, 0)) (which corresponds to nodes 8 and 7) is returned as (8, 7). Endless thanks!
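(A minimal sketch of exactly that mapping, assuming the labels and edgelist variables defined above: each endpoint is simply looked up in the dict.)
new_edgelist = [(labels[u], labels[v]) for (u, v) in edgelist]
# e.g. ((2, 0), (1, 0)) becomes (8, 7)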

I believe what you are looking for is simply `nx.relabel_nodes(G, labels, False)`; here is the documentation.
Here is the output when I printed the nodes of G before and after calling the relabel nodes function.
# Before relabel_nodes
[(0, 1), (1, 0), (0, 0), (1, 1)]
# After relabel_nodes
[0, 1, 2, 3]
After doing this, the edge labels automatically become what you expect.
# Edges before relabelling nodes
[((0, 1), (0, 0)), ((0, 1), (1, 1)), ((1, 0), (0, 0)), ((1, 0), (1, 1))]
# Edges after relabelling nodes
[(0, 1), (0, 2), (1, 3), (2, 3)]
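A minimal end-to-end sketch with the original N = 3 grid (assuming NetworkX imported as nx; copy=False relabels the graph in place, as in the call above):
import networkx as nx

N = 3
G = nx.grid_2d_graph(N, N)
labels = {(i, j): i + (N - 1 - j) * N for (i, j) in G.nodes()}

# Relabel in place: node (i, j) becomes labels[(i, j)]
nx.relabel_nodes(G, labels, copy=False)

# Edges are now reported with the integer labels, e.g. the edge
# between (2, 0) and (1, 0) shows up as (8, 7)
print(list(G.edges()))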

Related

Django: get a list of related ids linked to each parent record id in a queryset?

I have a relationship client has many projects.
I want to create a dictionary of the form:
{
    'client_id': ['project_id1', 'project_id2'],
    'client_id2': ['project_id7', 'project_id8'],
}
What I tried was:
clients_projects = Client.objects.values_list('id', 'project__id')
which gave me:
<QuerySet [(3, 4), (3, 5), (3, 11), (3, 12), (2, 3), (2, 13), (4, 7), (4, 8), (4, 9), (1, 1), (1, 2), (1, 6), (1, 10)]>
which I can cast to a list with list(clients_projects):
[(3, 4),
(3, 5),
(3, 11),
(3, 12),
(2, 3),
(2, 13),
(4, 7),
(4, 8),
(4, 9),
(1, 1),
(1, 2),
(1, 6),
(1, 10)]
Assuming that projects is the ReverseManyToOneDescriptor attached to the Client model, you can write:
clients_projects = { c.id : c.projects.values_list('id', flat=True) for c in Client.objects.all() }
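Note that values_list(..., flat=True) returns a lazy QuerySet rather than a plain list, so the dictionary values above are QuerySets. If you want actual lists, as in the desired output, a small variation (a sketch, still assuming the projects reverse accessor from the answer):
clients_projects = {
    c.id: list(c.projects.values_list('id', flat=True))
    for c in Client.objects.all()
}
Keep in mind this still issues one query per client.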
This problem is quite similar to this one: Django GROUP BY field value.
Since Django doesn't provide a group_by (yet) you need to manually replicate the behavior:
result = {}
for client in Client.objects.all().distinct():
    result[client.id] = (Client.objects.filter(id=client.id)
                                       .values_list('project__id', flat=True))
Breakdown:
Get a set of distinct clients from your Client model and iterate through them. (You can also order that set if you wish, by adding .order_by('id') for example.)
Because you only need the project__id values as a list, you can utilize values_list()'s flat=True argument, which returns a flat list of values.
Finally, result will look like this:
{
    'client_1_id': [1, 10, ...],
    'client_5_id': [2, 5, ...],
    ...
}
There is a module that claims to add GROUP BY functionality to Django: https://github.com/kako-nawao/django-group-by, but I haven't used it, so I'm only listing it here rather than recommending it.
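As a further alternative, a sketch that groups the (client_id, project_id) pairs from the values_list queryset in the question in plain Python, which avoids issuing one query per client (assuming the same Client/Project models):
from collections import defaultdict

clients_projects = Client.objects.values_list('id', 'project__id')

grouped = defaultdict(list)
for client_id, project_id in clients_projects:
    if project_id is not None:  # clients with no projects yield (id, None)
        grouped[client_id].append(project_id)

# grouped now looks like {3: [4, 5, 11, 12], 2: [3, 13], 4: [7, 8, 9], 1: [1, 2, 6, 10]}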

How to sort a list in Python which has two numbers per index value?

My code
b = [((1, 1)), ((1, 2)), ((2, 1)), ((2, 2)), ((1, 3))]
for i in range(len(b)):
    print b[i]
Obtained output:
(1, 1)
(1, 2)
(2, 1)
(2, 2)
(1, 3)
How do I sort this list by the first and/or the second element of each tuple to get the output as:
(1, 1)
(1, 2)
(1, 3)
(2, 1)
(2, 2)
OR
(1, 1)
(2, 1)
(1, 2)
(2, 2)
(1, 3)
It would be nice if both columns were sorted as shown in the desired output; however, if either of the columns is sorted, that will suffice.
Try this: b = sorted(b, key = lambda i: (i[0], i[1]))
The sorted builtin does this.
>>> sorted (b)
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
This sorts primarily by the first element; to sort on the second element instead:
>>> sorted(b, key=lambda i: i[1])
[(1, 1), (2, 1), (1, 2), (2, 2), (1, 3)]
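A common equivalent idiom uses operator.itemgetter for the key, sketched here with the same data; sorted() is stable, so when sorting on the second element ties keep their input order:
from operator import itemgetter

b = [(1, 1), (1, 2), (2, 1), (2, 2), (1, 3)]

print(sorted(b, key=itemgetter(0, 1)))  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
print(sorted(b, key=itemgetter(1)))     # [(1, 1), (2, 1), (1, 2), (2, 2), (1, 3)]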
Also notice that the doubled parentheses don't create nested tuples; Python reduces ((1, 1)) to just (1, 1).
>>> b=[((1,1)),((1,2)),((2,1)),((2,2)),((1,3))]
>>> b
[(1, 1), (1, 2), (2, 1), (2, 2), (1, 3)]

Convert a list of x and y coordinates into a multistring

I have a set of x and y coordinates as follows:
x = (1,1,2,2,3,4)
y = (0, 1, 2, 3, 4, 5)
What is the best way of going about transforming this list into a multiline string format, e.g.:
x_y = [((1,0)(1,1)),((1,1)(2,2)),((2,2)(2,3)),((2,3)(3,4)),((3,4)(4,5))]
You can pair up the elements of x and y with zip():
>>> x = (1,1,2,2,3,4)
>>> y = (0,1,2,3,4,5)
>>> xy = zip(x, y)
>>> xy
[(1, 0), (1, 1), (2, 2), (2, 3), (3, 4), (4, 5)]
Then you can rearrange this into the kind of list in your example with a list comprehension:
>>> x_y = [(xy[i], xy[i+1]) for i in xrange(len(xy)-1)]
>>> x_y
[((1, 0), (1, 1)), ((1, 1), (2, 2)), ((2, 2), (2, 3)), ((2, 3), (3, 4)), ((3, 4), (4, 5))]
If you don't care about efficiency, the second part could also be written as:
>>> x_y = zip(xy, xy[1:])
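The transcript above is Python 2 (zip returns a list and xrange exists). A sketch of the same idea on Python 3, where zip is lazy and has to be materialised before slicing:
x = (1, 1, 2, 2, 3, 4)
y = (0, 1, 2, 3, 4, 5)

xy = list(zip(x, y))         # [(1, 0), (1, 1), (2, 2), (2, 3), (3, 4), (4, 5)]
x_y = list(zip(xy, xy[1:]))  # consecutive pairs of points
# [((1, 0), (1, 1)), ((1, 1), (2, 2)), ((2, 2), (2, 3)), ((2, 3), (3, 4)), ((3, 4), (4, 5))]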

Word Labels for Document Matrix in Gensim

My ultimate goal is to produce a *.csv file containing labeled binary term vectors for each document. In essence, a term document matrix.
Using gensim, I can produce a file with an unlabeled term matrix.
I do this by essentially copying and pasting code from here: http://radimrehurek.com/gensim/tut1.html
Given a list of documents called "texts".
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)
[(0, 1), (1, 1), (2, 1)]
[(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]
[(2, 1), (5, 1), (7, 1), (8, 1)]
[(1, 1), (5, 2), (8, 1)]
[(3, 1), (6, 1), (7, 1)]
[(9, 1)]
[(9, 1), (10, 1)]
[(9, 1), (10, 1), (11, 1)]
[(4, 1), (10, 1), (11, 1)]
To convert the above vectors into a numpy matrix, I use:
scipy_csc_matrix = gensim.matutils.corpus2csc(corpus)
I then convert the sparse numpy matrix to a full array:
full_matrix = csc_matrix(scipy_csc_matrix).toarray()
Finally, I output this to a file:
with open('file.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(full_matrix)
This produces a matrix of binomial vectors, but I do not know which vector represents which word. Is there an accurate way of matching words to vectors?
I've tried parsing the dictionary to create a list of words which I would glue to the above full_matrix.
# Retrieve dictionary
tokenIDs = dictionary.token2id
# Retrieve keys from the dictionary and concatenate those to full_matrix
dictlist = []
for key, value in tokenIDs.iteritems():
    temp1 = unicodedata.normalize('NFKD', key).encode('ascii', 'ignore')
    temp = [temp1]
    dictlist.append(temp)
Keys = np.asarray(dictlist)
# Combine Keys and Matrix
labeled_full_matrix = np.concatenate((Keys, full_matrix), axis=1)
However, this does not work. The word ids (Keys) are not matched to the appropriate vectors.
I am under the assumption a much simpler and more elegant approach is possible. But after some time, I haven't been able to find it. Maybe someone here can help, or point me to something fundamental I've missed.
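(For reference, one minimal sketch of the labelling asked for here, assuming the dictionary and corpus objects above: a word's column is simply its integer id, so sorting token2id by id gives a header row in the right order. corpus2csc puts terms on the rows and documents on the columns, hence the transpose.)
import csv
import gensim

# documents as rows, terms as columns
full_matrix = gensim.matutils.corpus2csc(corpus).toarray().T

# order the words by their integer id so that column k holds the word with id k
header = [word for word, word_id in sorted(dictionary.token2id.items(),
                                           key=lambda item: item[1])]

with open('file.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(full_matrix)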
Is this what you want?
%time lda1 = models.LdaModel(corpus1, num_topics=20, id2word=dictionary1, update_every=5, chunksize=10000, passes=100)
import pandas
mixture = [dict(lda1[x]) for x in corpus1]
pandas.DataFrame(mixture).to_csv("output.csv")

What's the most Pythonic way to identify consecutive duplicates in a list?

I've got a list of integers and I want to be able to identify contiguous blocks of duplicates: that is, I want to produce an order-preserving list of pairs, where each pair contains (int_in_question, number_of_occurrences).
For example, if I have a list like:
[0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
I want the result to be:
[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]
I have a fairly simple way of doing this with a for-loop, a temp, and a counter:
result_list = []
current = source_list[0]
count = 0
for value in source_list:
    if value == current:
        count += 1
    else:
        result_list.append((current, count))
        current = value
        count = 1
result_list.append((current, count))
But I really like Python's functional programming idioms, and I'd like to be able to do this with a simple generator expression. However, I find it difficult to keep sub-counts when working with generators. I have a feeling a two-step process might get me there, but for now I'm stumped.
Is there a particularly elegant/pythonic way to do this, especially with generators?
>>> from itertools import groupby
>>> L = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]
>>> grouped_L = [(k, sum(1 for i in g)) for k,g in groupby(L)]
>>> # Or (k, len(list(g))), but that creates an intermediate list
>>> grouped_L
[(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]
Batteries included, as they say.
The suggestion to use sum with a generator expression comes from JBernardo; see the comments.
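If you want the lazy, generator-only flavour the question asks about, the same groupby idea also works as a generator expression (a sketch):
from itertools import groupby

L = [0, 0, 0, 3, 3, 2, 5, 2, 6, 6]

grouped = ((k, sum(1 for _ in g)) for k, g in groupby(L))  # lazy generator
print(list(grouped))  # [(0, 3), (3, 2), (2, 1), (5, 1), (2, 1), (6, 2)]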