Python Networkx and Pandas library not working - python-2.7
My code below is supposed to print a graph/network using Networkx, Pandas and data from a CSV file. The code is (networkx3.py) -
import csv
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
g = nx.Graph()
csv_dict = pd.read_csv('Book1.csv', index_col=[0])
csv_1 = csv_dict.values.tolist()
ini = 0
for row in csv_1:
    for i in row:
        if type(row[i]) is str:
            g.add_edge(ini, int(i), conn_prob=(float(row[i])))
    max_wg_ngs = sorted(g[ini].items(), key=lambda e: e[1]["conn_prob"], reverse=True)[:2]
    sarr = [str(a) for a in max_wg_ngs]
    print "Neighbours of Node %d are:" % ini
    #print(max_wg_ngs)
    for item in sarr:
        print ''.join(str(item))[1:-1]
    ini += 1
pos = nx.spring_layout(g, scale=100.)
nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g, pos)
nx.draw_networkx_labels(g, pos)
#plt.axis('off')
plt.show()
The data in the CSV file is (Book1.csv) -
,1,2,3,4,5,6,7,8,9,10
1,0,0.257905291,0.775104118,0.239086843,0.002313744,0.416936603,0.194817214,0.163350301,0.252043807,0.251272559
2,0.346100279,0,0.438892758,0.598885794,0.002263231,0.406685237,0.523850975,0.257660167,0.206302228,0.161385794
3,0.753358102,0.222349243,0,0.407830809,0.001714776,0.507573592,0.169905687,0.139611318,0.187910832,0.326950557
4,0.185342928,0.571302688,0.51784403,0,0.003231018,0.295197533,0.216184462,0.153032751,0.216331326,0.317961522
5,0,0,0,0,0,0,0,0,0,0
6,0.478164621,0.418192795,0.646810223,0.410746629,0.002414973,0,0.609176897,0.203461461,0.157576977,0.636747837
7,0.24894327,0.522914349,0.33948832,0.316240267,0.002335929,0.639377086,0,0.410011123,0.540266963,0.587764182
8,0.234017887,0.320967208,0.285193773,0.258198079,0.003146737,0.224412057,0.411725737,0,0.487081815,0.469526333
9,0.302955306,0.080506624,0.261610132,0.22856311,0.001746979,0.014994905,0.63386228,0.486096957,0,0.664434415
10,0.232675407,0.121596312,0.457715027,0.310618067,0.001872929,0.57556548,0.473562887,0.32185564,0.482351246,0
The code however doesn't work. I don't understand where I'm going wrong. The error is -
Traceback (most recent call last):
File "networkx3.py", line 13, in <module>
if type(row[i]) is str:
TypeError: list indices must be integers, not float
I do not want to modify the CSV file or its data. The index column and header are supposed to be ignored.
I have previously asked this question but I did not get satisfactory answers. Can anybody help?
Thanks a lot in advance :) (Using Ubuntu 14.04 32-bit VM. Credits to #Adonis for helping in creating the original code)
A little late in answering my own question, but with some valuable help from #Joel and #Adonis, I finally figured out where I was going wrong.
The problem was in the second for loop: "for i in row" yields the cell values themselves (floats), so row[i] tried to index a list with a float, which is exactly what the TypeError reports. Other minor changes produced output, but with only nodes and no edges.
Finally, after using enumerate to get each cell's position (its index) as the connecting node, I got the required output. The only changes needed are in the second for loop and the if condition:
for row in csv_1:
    for idx, i in enumerate(row):
        if type(row[idx]) is float:
            g.add_edge(ini, idx, conn_prob=(float(row[idx])))
Thanks to all the selfless guys at SOF for the help, couldn't have done it without you :)
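The difference between looping over the values and looping with enumerate can be reproduced without pandas or networkx. This is only a minimal sketch: it assumes plain Python lists stand in for csv_dict.values.tolist() and a dict of dicts stands in for the graph. The names rows, edges and top_two are made up for illustration, and the numbers are the top-left corner of Book1.csv.

```python
# Three rows shaped like the output of csv_dict.values.tolist()
rows = [
    [0.0, 0.257905291, 0.775104118],
    [0.346100279, 0.0, 0.438892758],
    [0.753358102, 0.222349243, 0.0],
]

original_error = None
try:
    for value in rows[0]:
        rows[0][value]           # indexing a list with a float, as in the question
except TypeError as exc:
    original_error = exc
    print("original loop fails:", exc)

edges = {}                       # edges[source][target] = conn_prob (stand-in for nx.Graph)
for ini, row in enumerate(rows):
    for idx, value in enumerate(row):   # idx is the column position, value is the weight
        if isinstance(value, float):
            edges.setdefault(ini, {})[idx] = value

# Two strongest neighbours of node 0, highest conn_prob first
top_two = sorted(edges[0].items(), key=lambda e: e[1], reverse=True)[:2]
print(top_two)
```

With the real graph, the inner assignment would be the g.add_edge(ini, idx, conn_prob=...) call from the answer above.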
Related
Python - AttributeError: 'DataFrame' object has no attribute
I have a CSV file with various columns, and everything worked perfectly for the past few months until I updated the file with new information. Now one column does not appear to be picked up by Python. I am using Python 2.7 and have made sure I have the latest version of pandas.
When I downloaded the CSV file from Yahoo Finance, I opened it in Excel and changed the format of the columns to make it more readable, as all the information was in one cell. I used the "Text to Column" feature and split up the data based on where the commas were. Then I made sure there was no white space at the beginning of any cell by using the Trim function in Excel and left-aligning the data.
I tried the following and still get the same or a similar error:
After df = pd.read_csv("KIO.csv"), I tried to check whether the data could be read at all by using df.head(), but still got the same error.
I tried renaming the problematic column, as suggested in a similar post, using df = df.rename(columns={"Close": "Closing"}); here I got the same error again.
"print df.columns" also led to the same issue.
"df[1]" gave a long error with "KeyError: 1" at the end. I can print the entire thing if it will assist.
Adding "skipinitialspace=True" made no difference.
I thought the problem might be within the actual CSV file, so I deleted all the columns and entered my own information, and I still got the same error.
Below is a portion of my code, as the total code is very long:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as pltdate
import datetime
import matplotlib.animation as animation
import numpy as np
df = pd.read_csv("KIO.csv", skipinitialspace=True)
#df.head()
#Close = df.columns[0]
#df= df.rename(columns={"Close": "Closing"})
df1 = pd.read_csv("USD-ZAR.csv")
kio_close = pd.DataFrame(df.Close)
exchange = pd.DataFrame(df1.Value)
dates = df["Date"]
dates1 = df1["Date"]
The above variables are used throughout the remaining code, so if this issue can be solved here the remaining code will be right. This is a copy/paste of the error:
Traceback (most recent call last):
File "C:/Users/User/Documents/PycharmProjects/Trading_GUI/GUI_testing.py", line 33, in <module>
kio_close = pd.DataFrame(df.Close)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 4372, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Close'
Thank you so much in advance.
#Rip_027 This is in regard to your last comment. I used to have the same issue whenever I opened a CSV file by simply double-clicking the file icon. You need to launch Excel first, then get external data. The link below has more details and can serve as a guideline. Hope this helps. https://www.hesa.ac.uk/support/user-guides/import-csv
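Before touching the file again, it may help to look at the header exactly as a parser receives it. This is a rough sketch under an assumption the question does not confirm: that re-saving from Excel changed the header, for example by switching the delimiter to a semicolon. The helper show_header and the sample text are hypothetical.

```python
import csv
import io

def show_header(text, delimiter=","):
    """Return the header cells of CSV text exactly as a parser would see them."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)

# Hypothetical sample: the file was re-saved with ';' as the delimiter
sample = "Date;Open;Close\n2018-01-02;1.0;2.0\n"

header = show_header(sample)
print([repr(name) for name in header])   # a single fused column, so no 'Close'

fixed = show_header(sample, delimiter=";")
print([repr(name) for name in fixed])    # 'Close' is now its own column
```

In pandas the same check is list(df.columns), and if the delimiter turns out to be the culprit, pd.read_csv accepts a sep argument.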
Python24: Matplotlib Animation connecting the first and the last point
I am new to matplotlib and I was playing with this library to plot data from a csv file. Without the animation function the graph looks correct, but when I tried to use the animation, the graph connected the first and the last point. I looked this up, but I can't figure out how to solve it. Does anyone know how to solve this issue? Below is my code. Thanks in advance!
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import csv
x = []
y = []
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
def animate(i):
    with open("example.txt", "r") as csvfile:
        plots = csv.reader(csvfile, delimiter=',')
        for row in plots:
            x.append(int(row[0]))
            y.append(int(row[1]))
    ax1.clear()
    ax1.plot(x,y)
ani = animation.FuncAnimation(fig, animate, interval=1000)
plt.show()
You append the same points over and over again to the lists you plot. Say the csv file contains the numbers 1,2,3: you read them in, append them to the list, plot them, then read them in again and append them again, and so on. So x contains in
Step 1: 1,2,3
Step 2: 1,2,3,1,2,3
Step 3: 1,2,3,1,2,3,1,2,3
Hence from step 2 on there will be a line connecting 3 and 1. I don't know what the purpose of this animation is, since animating the same points every time is not very useful. So there is no straightforward solution, apart from not animating at all.
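If the file is expected to change between frames (an assumption, since the question does not say), one way to avoid the duplicates is to build fresh lists on every call instead of appending to module-level ones. A minimal sketch without matplotlib; read_points is a hypothetical helper and the temporary file stands in for example.txt:

```python
import csv
import os
import tempfile

def read_points(path):
    """Read integer x,y pairs from a CSV file into two fresh lists."""
    xs, ys = [], []              # new lists on every call, nothing carried over
    with open(path, "r") as csvfile:
        for row in csv.reader(csvfile, delimiter=","):
            xs.append(int(row[0]))
            ys.append(int(row[1]))
    return xs, ys

# Hypothetical data file standing in for example.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1,1\n2,4\n3,9\n")
    path = tmp.name

first = read_points(path)
second = read_points(path)       # simulates the next animation frame
os.remove(path)
print(first == second)           # both frames hold the same three points
```

Inside animate(i) this would become x, y = read_points("example.txt") followed by ax1.clear() and ax1.plot(x, y), so the last point is never joined back to the first.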
How can I save histogram plot in python?
I have the following code that generates a histogram. How can I save the histogram automatically from the code? I tried what we do for other plot types, but that did not work for the histogram. a is a 'numpy.ndarray'.
a = [-0.86906864 -0.72122614 -0.18074998 -0.57190212 -0.25689268 -1. 0.68713553 0.29597819 0.45022949 0.37550592 0.86906864 0.17437203 0.48704826 0.2235648 0.72122614 0.14387731 0.94194514 ]
fig = pl.hist(a,normed=0)
pl.title('Mean')
pl.xlabel("value")
pl.ylabel("Frequency")
pl.savefig("abc.png")
This works for me:
import matplotlib.pyplot as pl
import numpy as np
a = np.array([-0.86906864, -0.72122614, -0.18074998, -0.57190212, -0.25689268, -1., 0.68713553, 0.29597819, 0.45022949, 0.37550592, 0.86906864, 0.17437203, 0.48704826, 0.2235648, 0.72122614, 0.14387731, 0.94194514])
fig = pl.hist(a,normed=0)
pl.title('Mean')
pl.xlabel("value")
pl.ylabel("Frequency")
pl.savefig("abc.png")
a in the OP is not a numpy array as written, and its format also needs to be modified (it needs commas, not spaces, as delimiters). This program successfully saves the histogram in the working directory. (On recent matplotlib versions the normed argument has been replaced by density.) If it still does not work, supply a full path to the location where you want to save it, like this:
pl.savefig("/Users/atru/abc.png")
The pl.show() statement should not be placed before savefig(), as it creates a new figure, which makes savefig() save a blank figure instead of the desired one, as explained in this post.
else statement does not return to loop
I have code that opens a file, calculates the median value and writes that value to a separate file. Some of the files may be empty, so I wrote the following loop to check whether the file is empty and, if so, skip it, increment the count and go back to the loop. It does what is expected for the first empty file it finds, but not the second. The loop is below:
t = 15.2
while t>=11.4:
    if os.stat(r'C:\Users\Khary\Documents\bin%.2f.txt'%t ).st_size > 0:
        print("All good")
        F= r'C:\Users\Documents\bin%.2f.txt'%t
        print(t)
        F= np.loadtxt(F,skiprows=0)
        LogMass = F[:,0]
        LogRed = F[:,1]
        value = np.median(LogMass)
        filesave(*find_nearest(LogMass,LogRed))
        t -=0.2
    else:
        t -=0.2
        print("empty file")
The output is as follows:
All good
15.2
All good
15.0
All good
14.8
All good
14.600000000000001
All good
14.400000000000002
All good
14.200000000000003
All good
14.000000000000004
All good
13.800000000000004
All good
13.600000000000005
All good
13.400000000000006
empty file
All good
13.000000000000007
Traceback (most recent call last):
File "C:\Users\Documents\Codes\Calculate Bin Median.py", line 35, in <module>
LogMass = F[:,0]
IndexError: too many indices
A second issue is that t somehow goes from one decimal place to 15, and the last digit seems to be incrementing. What's with that? Thanks for any and all help.
EDIT: The error IndexError: too many indices only seems to apply to files with only one line, for example:
12.9982324 0.004321374
If I add a second line I no longer get the error. Can someone explain why this is? Thanks.
EDIT: I tried a little experiment, and it seems numpy does not like extracting a column if the array only has one row.
In [8]: x = np.array([1,3])
In [9]: y=x[:,0]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-50e27cf81d21> in <module>()
----> 1 y=x[:,0]
IndexError: too many indices
In [10]: y=x[:,0].shape
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-e8108cf30e9a> in <module>()
----> 1 y=x[:,0].shape
IndexError: too many indices
In [11]:
You should be using try/except blocks. Something like:
t = 15.2
while t >= 11.4:
    fname = r'C:\Users\Documents\bin%.2f.txt'%t
    try:
        F = np.loadtxt(fname,skiprows=0)
        LogMass = F[:,0]
        LogRed = F[:,1]
        value = np.median(LogMass)
        filesave(*find_nearest(LogMass,LogRed))
    except IndexError:
        # fname still holds the path here, so the message names the file
        print("bad file: {}".format(fname))
    else:
        print("file worked!")
    finally:
        t -=0.2
Please refer to the official tutorial for more details about exception handling.
The issue with the last digit is due to how floats work: they are stored in binary and cannot represent most base-10 fractions exactly. This can lead to fun things like:
In [13]: .3 * 3 - .9
Out[13]: -1.1102230246251565e-16
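The drift in t comes from subtracting 0.2 repeatedly, so each step carries the rounding error of the previous one. One way around it, sketched here under the assumption that the bins are always 0.2 apart between 15.2 and 11.4, is to count in integer tenths and divide once per value. The generator name bin_values is made up for illustration.

```python
def bin_values(start_tenths=152, stop_tenths=114, step_tenths=2):
    """Yield t from 15.2 down to 11.4 in steps of 0.2 without accumulating error."""
    for tenths in range(start_tenths, stop_tenths - 1, -step_tenths):
        yield tenths / 10.0      # one division per value, so errors cannot build up

values = list(bin_values())
names = ["bin%.2f.txt" % t for t in values]   # same naming pattern as in the question
print(values[:4])
print(names[0], names[-1])
```

The while loop then becomes a plain "for t in bin_values():", with no "t -= 0.2" needed in either branch.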
To deal with the one-line file case, add the ndmin parameter to np.loadtxt (see its documentation):
np.loadtxt('test.npy',ndmin=2)
# array([[ 1.,  2.]])
With the help of a user named ajcr, I found that ndmin=2 should have been passed to numpy.loadtxt() to ensure that the array always has 2 dimensions.
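The effect of ndmin is easy to check on the single-line example from the question. A small sketch, assuming an in-memory io.StringIO buffer in place of the real bin file:

```python
import io
import numpy as np

one_line = "12.9982324 0.004321374\n"   # the single-row file from the question

flat = np.loadtxt(io.StringIO(one_line))             # default: collapses to 1-D
table = np.loadtxt(io.StringIO(one_line), ndmin=2)   # always at least 2-D

print(flat.shape)    # (2,)
print(table.shape)   # (1, 2)

log_mass = table[:, 0]   # column indexing now works even with a single row
log_red = table[:, 1]
print(log_mass, log_red)
```

With the default call, the one-row file becomes a 1-D array of two numbers, which is why F[:,0] raised IndexError only for those files.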
Python uses indentation to define if, while, and for blocks. It doesn't look like your if/else statement is fully indented under the while. I usually use a full Tab key press to indent instead of spaces.
Python TypeError: list indices must be integers, not tuple
Using the python 2.7 shell on osx lion. The .csv file has 12 columns by 892 rows.
import csv as csv
import numpy as np
# Open up csv file into a Python object
csv_file_object = csv.reader(open('/Users/scdavis6/Documents/Kaggle/train.csv', 'rb'))
header = csv_file_object.next()
data=[]
for row in csv_file_object:
    data.append(row)
data = np.array(data)
# Convert to float for numerical calculations
number_passengers = np.size(data[0::,0].astype(np.float))
And this is the error I get:
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
number_passengers = np.size(data[0::,0].astype(np.float))
TypeError: list indices must be integers, not tuple
What am I doing wrong?
Don't use csv to read the data into a NumPy array. Use numpy.genfromtxt; using dtype=None will cause genfromtxt to make an intelligent guess at the dtypes for you. This way you won't have to manually convert strings to floats.
data[0::, 0] just gives you the first column of data. data[:, 0] would give you the same result.
The error message TypeError: list indices must be integers, not tuple suggests that for some reason your data variable might be holding a list rather than an ndarray. For example, the same exception can be produced like this:
In [73]: data = [1,2,3]
In [74]: data[1,2]
TypeError: list indices must be integers, not tuple
I don't know why that is happening, but if you post a sample of your CSV we should be able to help fix that.
Using np.genfromtxt, your current code could be simplified to:
import numpy as np
filename = '/Users/scdavis6/Documents/Kaggle/train.csv'
data = np.genfromtxt(filename, delimiter=',', skip_header=1, dtype=None)
number_passengers = np.size(data, axis=0)
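To see the list-versus-ndarray difference side by side, here is a short sketch. The rows are invented stand-ins for what csv.reader yields (lists of strings), not the real train.csv, and the exact wording of the error message varies between Python versions.

```python
import numpy as np

rows = [["1", "0", "3"], ["2", "1", "1"]]   # hypothetical rows as csv.reader yields them

list_error = None
try:
    rows[0::, 0]                 # a comma inside [] hands the list a tuple index
except TypeError as exc:
    list_error = exc
    print("list:", exc)

data = np.array(rows)            # an ndarray accepts (row_slice, column) indexing
first_col = data[:, 0].astype(float)
print(first_col, np.size(first_col))
```

So if the traceback mentions list indices, the np.array(data) conversion most likely never ran in that shell session, and data was still the plain list built by the loop.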