add_edges_from three tuples networkx - python-2.7

I am trying to use networkx to create a DiGraph. I want to use add_edges_from(), and I want the edges and their data to be generated from 3-tuples.
I am importing the data from a CSV file. I have three columns: one for ids (the first set of nodes), one for names (the second set of nodes), and one for capacities (there are no headers in the file). So, I created a dictionary from the ids and capacities:
dictionary = dict(zip(id, capacity))
Then I zipped the tuples containing the edge data:
List = zip(id, name, capacity)
But when I execute the next line, it gives me an assertion error:
G.add_edges_from(List, 'weight': 1)
Can someone help me with this problem? I have been trying for a week with no luck.
P.S. I'm a newbie in programming.
EDIT:
So, I found the following solution. I am honestly not sure how it works, but it did the job!
Here is the code:
import networkx as nx
import csv
G = nx.DiGraph()
capacity_dict = dict(zip(zip(id, name), capacity))
List = zip(id, name, capacity)
# iterating over the dict yields its (id, name) keys, i.e. the edge pairs
G.add_edges_from(capacity_dict, weight=1)
for u, v, d in List:
    G[u][v]['capacity'] = d
Now when I run:
G.edges(data=True)
The result will be:
[(2.0, 'First', {'capacity': 1.0, 'weight': 1}), (3.0, 'Second', {'capacity': 2.0, 'weight': 1})]
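(The reason the original attempt failed: add_edges_from() expects the third element of a 3-tuple to be a dict of edge attributes, and a bare float trips an internal assertion in older NetworkX releases. An equivalent one-liner that sets both attributes in a single pass, assuming the same id, name and capacity lists:)
G.add_edges_from((u, v, {'capacity': c, 'weight': 1})
                 for u, v, c in zip(id, name, capacity))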
I am using the network simplex. Now I am trying to find a way to make the output of the flowDict more understandable, because it only shows the ids of the flow. (Maybe I'll try to put them in a database and return the whole row of data instead of using the ids only.)
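On making the flowDict readable: one lightweight option is to relabel the ids after solving, using a lookup built from the same CSV columns. A sketch (assuming flow_dict is the nested dict returned by nx.network_simplex, and id and name are the lists read from the file):
# hypothetical id -> name lookup from the CSV columns
labels = dict(zip(id, name))
# swap ids for names at both levels of the nested flow dict
readable = {labels.get(u, u): {labels.get(v, v): f for v, f in flows.items()}
            for u, flows in flow_dict.items()}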

A few improvements on your version. (1) NetworkX algorithms assume that weight is 1 unless you specifically set it differently. Hence there is no need to set it explicitly in your case. (2) Using the generator allows the capacity attribute to be set explicitly and other attributes to also be set once per record. (3) The use of a generator to process each record as it comes through saves you having to iterate through the whole list twice. The performance improvement is probably negligible on small datasets but still it feels more elegant. Having said that -- your method clearly works!
import networkx as nx
import csv
# simulate a csv file.
# This makes a multi-line string behave as a file.
from StringIO import StringIO
filehandle = StringIO('''a,b,30
b,c,40
d,a,20
''')
# process each row in the file
# and generate an edge from each
def edge_generator(fh):
    reader = csv.reader(fh)
    for row in reader:
        row[-1] = float(row[-1])  # convert capacity to float
        # add other attributes to the dict() below as needed...
        # e.g. you might add weights here as well.
        yield (row[0],
               row[1],
               dict(capacity=row[2]))
# create the graph
G = nx.DiGraph()
G.add_edges_from(edge_generator(filehandle))
print G.edges(data=True)
Returns this:
[('a', 'b', {'capacity': 30.0}),
('b', 'c', {'capacity': 40.0}),
('d', 'a', {'capacity': 20.0})]

Viz LDA model with Bokeh and t-SNE

I have tried to follow this tutorial (https://shuaiw.github.io/2016/12/22/topic-modeling-and-tsne-visualzation.html) on visualizing LDA with t-SNE and Bokeh.
But I ran into a bit of a problem when I tried to run the following code:
plot_lda.scatter(x=tsne_lda[:, 0], y=tsne_lda[:, 1],
                 color=colormap[_lda_keys][:num_example],
                 source=bp.ColumnDataSource({
                     "content": text[:num_example],
                     "topic_key": _lda_keys[:num_example]
                 }))
NB: in the tutorial the content is called news; in mine it is called text.
I get this error:
Supplying a user-defined data source AND iterable values to glyph methods is
not possibe. Either:
Pass all data directly as literals:
p.circe(x=a_list, y=an_array, ...)
Or, put all data in a ColumnDataSource and pass column names:
source = ColumnDataSource(data=dict(x=a_list, y=an_array))
p.circe(x='x', y='x', source=source, ...)
To me this does not make much sense, and I have not succeeded in finding an answer to it either here, on GitHub, or elsewhere. I hope someone can help. Best, Niels
I've also been battling with that piece of code, and I've found two problems with it.
First, when you pass a source to the scatter function, like the error states, you must include all data in the dictionary, i.e., x and y axes, colors, labels, and any other information that you want to include in the tooltip.
Second, the x and y axes have a different shape than the information passed to the tooltip, so you also have to slice both arrays in the axes with the num_example variable.
The following code got me running:
import pandas as pd
import bokeh.plotting as bp

# create the dictionary with all the information
plot_dict = {
    'x': tsne_lda[:num_example, 0],
    'y': tsne_lda[:num_example, 1],
    'colors': colormap[_lda_keys][:num_example],
    'content': text[:num_example],
    'topic_key': _lda_keys[:num_example]
}
# create the dataframe from the dictionary
plot_df = pd.DataFrame.from_dict(plot_dict)
# declare the source
source = bp.ColumnDataSource(data=plot_df)
title = 'LDA viz'
# initialize bokeh plot
plot_lda = bp.figure(plot_width=1400, plot_height=1100,
                     title=title,
                     tools="pan,wheel_zoom,box_zoom,reset,hover,previewsave",
                     x_axis_type=None, y_axis_type=None, min_border=1)
# build scatter function from the columns of the dataframe
plot_lda.scatter('x', 'y', color='colors', source=source)
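With everything in one ColumnDataSource, the hover tool can then reference the extra columns by name. A short sketch of the tooltip wiring (this follows the tutorial's pattern; the exact labels are up to you):
from bokeh.models import HoverTool

# grab the hover tool requested in the tools string and point it at the columns
hover = plot_lda.select(dict(type=HoverTool))
hover.tooltips = {"content": "@content", "topic": "@topic_key"}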

use python to write to a specific column in a .csv file

I have a .csv file where I need to overwrite a certain column with new values from a list.
Let's say I have the list L1 = ['La', 'Lb', 'Lc'] that I want to write in column no. 5 of the .csv file.
If I run:
L1 = ['La', 'Lb', 'Lc']
import csv
with open(r'C:\LIST.csv', 'wb') as f:
    w = csv.writer(f)
    for i in L1:
        w.writerow(i)
This writes each string one character per column: the first column gets 'L', 'L', 'L' and the second column 'a', 'b', 'c', because writerow() treats a string as a sequence of one-character fields.
I could not find the syntax for writing each element of the list to a specific column. (This is in Python 2.7.) Thank you for your help!
(For this script I must use IronPython, and just the built-in libraries that come with IronPython.)
Although you could certainly use Python's built-in csv module to read the data, modify it, and write it out, I'd recommend the excellent tablib module:
from tablib import Dataset
csv = '''Col1,Col2,Col3,Col4,Col5,Col6,Col7
a1,b1,c1,d1,e1,f1,g1
a2,b2,c2,d2,e2,f2,g2
a3,b3,c3,d3,e3,f3,g3
'''
# Read a hard-coded string just for test purposes.
# In your code, you would use open('...', 'rt').read() to read from a file.
imported_data = Dataset().load(csv, format='csv')
L1 = ['La', 'Lb', 'Lc']
for i in range(len(L1)):
    # Each row is a tuple, and tuples don't support assignment.
    # Convert to a list first so we can modify it.
    row = list(imported_data[i])
    # Put our value in the 5th column (index 4).
    row[4] = L1[i]
    # Store the row back into the Dataset.
    imported_data[i] = row
# Export to CSV. (Of course, you could write this to a file instead.)
print imported_data.export('csv')
# Output:
# Col1,Col2,Col3,Col4,Col5,Col6,Col7
# a1,b1,c1,d1,La,f1,g1
# a2,b2,c2,d2,Lb,f2,g2
# a3,b3,c3,d3,Lc,f3,g3
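That said, since the question notes that only IronPython's built-in libraries are available, here is a stdlib-only sketch using just the csv module (assuming the file fits in memory, every row has at least five columns, and there is no header row; skip the first row in the zip if there is one):
import csv

L1 = ['La', 'Lb', 'Lc']
# read the whole file into a list of rows
with open(r'C:\LIST.csv', 'rb') as f:
    rows = list(csv.reader(f))
# overwrite column 5 (index 4) with the corresponding list element
for row, value in zip(rows, L1):
    row[4] = value
# write everything back out
with open(r'C:\LIST.csv', 'wb') as f:
    csv.writer(f).writerows(rows)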

Using python groupby or defaultdict effectively?

I have a CSV with name, role, and years of experience. I want to create a list of tuples that aggregates (name, role1, total_exp_inthisRole) for all the employees.
So far I am able to use defaultdict to do the below:
import csv, urllib2
from collections import defaultdict

response = urllib2.urlopen(url)
cr = csv.reader(response)
parsed = ((row[0], row[1], int(row[2])) for row in cr)
employees = []
for item in parsed:
    employees.append(tuple(item))

employeeExp = defaultdict(int)
for x, y, z in employees:  # variable unpacking
    employeeExp[x] += z
employeeExp.items()
output: [('Ken', 15), ('Buckky', 5), ('Tina', 10)]
But how do I also use the second column to achieve the result I want? Should I try grouping by multiple keys, or is there a simpler way? Thanks all in advance.
You can simply pass a tuple of name and role to your defaultdict, instead of only one item:
for x, y, z in employees:
    employeeExp[(x, y)] += z
For your second expected output ([('Ken', ('engineer', 5),('sr. engineer', 6)), ...])
You need to aggregate the result of aforementioned snippet one more time, but this time you need to use a defaultdict with a list:
d = defaultdict(list)
for (name, rol), total_exp_inthisRole in employeeExp.items():
    d[name].append((rol, total_exp_inthisRole))  # append a single (role, total) tuple
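Putting both passes together, a minimal self-contained sketch (with inline sample data standing in for the CSV rows) looks like this:
from collections import defaultdict

employees = [('Ken', 'engineer', 5), ('Ken', 'sr. engineer', 6),
             ('Buckky', 'analyst', 5), ('Tina', 'manager', 10)]

# first pass: total experience per (name, role) pair
employeeExp = defaultdict(int)
for name, role, years in employees:
    employeeExp[(name, role)] += years

# second pass: group the (role, total) pairs under each name
byName = defaultdict(list)
for (name, role), total in employeeExp.items():
    byName[name].append((role, total))

print byName.items()
# e.g. [('Ken', [('engineer', 5), ('sr. engineer', 6)]), ...]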

Getting indices while using train test split in scikit

In order to split my data into train and test sets, I'm using the sklearn.cross_validation.train_test_split function.
When I supply my data and labels as list of lists to this function, it returns train and test data in two separate lists.
I want to get the indices of the train and test data elements from the original data list.
Can anyone help me out with this?
Thanks in advance
You can supply the index vector as an additional argument. Using the example from sklearn:
import numpy as np
from sklearn.cross_validation import train_test_split

X, y, indices = (0.1 * np.arange(10)).reshape((5, 2)), range(10, 15), range(5)
X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(
    X, y, indices, test_size=0.33, random_state=42)
indices_train, indices_test
# ([2, 0, 3], [1, 4])
Try the below solutions (depending on whether you have imbalance):
NUM_ROWS = train.shape[0]
TEST_SIZE = 0.3
indices = np.arange(NUM_ROWS)
# usual train-val split
train_idx, val_idx = train_test_split(indices, test_size=TEST_SIZE, train_size=None)
# stratified train-val split as per Response's proportion (if imbalance)
strat_train_idx, strat_val_idx = train_test_split(indices, test_size=TEST_SIZE, stratify=y)
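Either way, the returned index arrays can then be used to slice the original data (a short sketch, assuming X and y are NumPy arrays aligned with indices):
X_train, X_val = X[train_idx], X[val_idx]
y_train, y_val = y[train_idx], y[val_idx]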

using weka Filter in java code

I have a problem using the Weka API in Java. There are 41 features (attributes) in my training and testing datasets. I want to keep only 25 attributes (e.g. 1, 3, 5, 7, 8, 10, ...) and remove the others during training and testing of the classifier. I have read Weka's filter manual at http://weka.wikispaces.com/Use+WEKA+in+your+Java+code#Filter and http://grepcode.com/file/repo1.maven.org/maven2/nz.ac.waikato.cms.weka/weka-stable/3.6.6/weka/filters/unsupervised/attribute/Remove.java, but I could not understand how to use a filter for my problem. Could you please help me write code for this situation? Your suggestions/help will be highly appreciated.
My code is like this:
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
Instances train = ...
Instances test = ...
Here I want to take only 25 attributes (i.e. column values) out of 41:
Classifier cls = new J48();
cls.buildClassifier(train);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(train);
eval.evaluateModel(cls, test);
.....
.....
Assuming you have this, as you said:
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
Instances train = ...
Instances test = ...
Then set up the array of column indices you want. I'm assuming you're doing this in a for loop or something, but I've just put 6 indices in manually so you get the idea.
int[] indicesOfColumnsToUse = {1, 3, 5, 7, 8, 10};
Then initialize and set up your removal filter (initialize it, set the column indices, invert the selection so that you remove the ones you don't want, and set the "input format" based on your training data):
Remove remove = new Remove();
// setAttributeIndicesArray takes an int[] of 0-based indices;
// the string-based setAttributeIndices takes 1-based ranges like "1,3,5-8"
remove.setAttributeIndicesArray(indicesOfColumnsToUse);
remove.setInvertSelection(true);
remove.setInputFormat(train);
Then apply the removal to your training set
Instances trainingSubset = Filter.useFilter(train, remove);
And then go on as you said, except train the classifier on the subset you just created. Note that the test set has to be passed through the same filter before evaluation, or the attribute counts won't match (alternatively, wrap the filter and classifier in the FilteredClassifier you already imported, which applies the filter to test instances automatically):
Classifier cls = new J48();
cls.buildClassifier(trainingSubset);
// run the test set through the same, already-initialized filter
Instances testSubset = Filter.useFilter(test, remove);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(trainingSubset);
eval.evaluateModel(cls, testSubset);