Python - AttributeError: 'DataFrame' object has no attribute - python-2.7

I have a CSV file with various columns, and everything worked perfectly for the past few months until I updated the file with new information; now one column does not appear to be picked up by Python. I am using Python 2.7 and have made sure I have the latest version of pandas.
When I downloaded the CSV file from Yahoo Finance, I opened it in Excel and changed the format of the columns to make it more readable, as all the information was in one cell. I used the "Text to Columns" feature and split up the data based on where the commas were.
Then I made sure that no cell in any column had white space at the beginning, using the Trim function in Excel and left-aligning the data.
I tried the following and still get the same or a similar error:
After df = pd.read_csv("KIO.csv") I tried to read the first few rows using df.head() - but still got the same error.
I tried renaming the problematic column, as suggested in a similar post, using:
df = df.rename(columns={"Close": "Closing"}) - here I got the same error again. "print df.columns" also led to the same issue.
"df[1]" - gave a long error with "KeyError: 1" at the end - I can print the entire thing if it will help.
Adding "skipinitialspace=True" - no difference.
I thought the problem might be within the actual CSV data itself, so I deleted all the columns, entered my own data, and still got the same error.
Below is a portion of my code as the total code is very long:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as pltdate
import datetime
import matplotlib.animation as animation
import numpy as np
df = pd.read_csv("KIO.csv", skipinitialspace=True)
#df.head()
#Close = df.columns[0]
#df= df.rename(columns={"Close": "Closing"})
df1 = pd.read_csv("USD-ZAR.csv")
kio_close = pd.DataFrame(df.Close)
exchange = pd.DataFrame(df1.Value)
dates = df["Date"]
dates1 = df1["Date"]
The above variables are used throughout the remaining code, so if this issue can be solved here the rest of the code will be right.
This is a copy/paste of the error:
Traceback (most recent call last):
  File "C:/Users/User/Documents/PycharmProjects/Trading_GUI/GUI_testing.py", line 33, in <module>
    kio_close = pd.DataFrame(df.Close)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 4372, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Close'
Thank you so much in advance.

@Rip_027 This is in regard to your last comment. I used to have the same issue whenever I opened a CSV file by simply double clicking the file icon. You need to launch Excel first, then get the external data. The link below has more details, which will serve as a guideline. Hope this helps.
https://www.hesa.ac.uk/support/user-guides/import-csv
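For what it's worth, a quick way to narrow this down is to print the column names exactly as pandas parsed them and strip any stray whitespace before touching df.Close. This is only a minimal diagnostic sketch built from the file and column names in the question; it has not been run against the actual KIO.csv.
import pandas as pd

df = pd.read_csv("KIO.csv", skipinitialspace=True)

# Show the header exactly as pandas sees it; hidden spaces, a BOM, or a
# renamed column (e.g. "Adj Close") will be visible in the repr output.
print([repr(c) for c in df.columns])

# Strip surrounding whitespace from every column name, then use bracket
# access, which raises a clearer KeyError than attribute access if the
# column really is missing.
df.columns = df.columns.str.strip()
kio_close = pd.DataFrame(df["Close"])
If the repr output shows something other than 'Close' (for example 'Close ' or 'Adj Close'), renaming or referencing that exact string should be all that is needed.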

Related

How to create a bag of words from csv file in python?

I am new to Python. I have a CSV file which has cleaned tweets. I want to create a bag of words from these tweets.
I have the following code but it's not working correctly.
import pandas as pd
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer
data = pd.read_csv(open("Twidb11.csv"), sep=' ')
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(data.Text)
count_vect.vocabulary_
Error:
.ParserError: Error tokenizing data. C error: Expected 19 fields in
line 5, saw 22
I think it's a duplicate; you can see the answer here. There are a lot of answers and comments.
So the solution can be:
data = pd.read_csv('Twidb11.csv', error_bad_lines=False)
Or:
df = pandas.read_csv(fileName, sep='delimiter', header=None)
"In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: "If file contains no header row, then you should explicitly pass header=None". In this instance, pandas automatically creates whole-number indeces for each field {0,1,2,...}."

How can I save histogram plot in python?

I have the following code that generates a histogram. How can I save the histogram automatically from the code? I tried what we do for other plot types but that did not work for the histogram. a is a 'numpy.ndarray'.
a = [-0.86906864 -0.72122614 -0.18074998 -0.57190212 -0.25689268 -1.
0.68713553 0.29597819 0.45022949 0.37550592 0.86906864 0.17437203
0.48704826 0.2235648 0.72122614 0.14387731 0.94194514 ]
fig = pl.hist(a,normed=0)
pl.title('Mean')
pl.xlabel("value")
pl.ylabel("Frequency")
pl.savefig("abc.png")
This works for me:
import matplotlib.pyplot as pl
import numpy as np
a = np.array([-0.86906864, -0.72122614, -0.18074998, -0.57190212, -0.25689268 ,-1. ,0.68713553 ,0.29597819, 0.45022949, 0.37550592, 0.86906864, 0.17437203, 0.48704826, 0.2235648, 0.72122614, 0.14387731, 0.94194514])
fig = pl.hist(a,normed=0)
pl.title('Mean')
pl.xlabel("value")
pl.ylabel("Frequency")
pl.savefig("abc.png")
a in the OP is not a numpy array, and its format also needs to be modified (it needs commas, not spaces, as delimiters). This program successfully saves the histogram in the working directory. If it still does not work, supply it with a full path to the location where you want to save it, like this:
pl.savefig("/Users/atru/abc.png")
The pl.show() statement should not be placed before savefig(), as it creates a new figure, which makes savefig() save a blank figure instead of the desired one, as explained in this post.
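To illustrate the call order, here is a minimal sketch (the sample values are made up and shortened): save the figure first, then show it.
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample values just to demonstrate the order of calls.
a = np.array([-0.87, -0.72, -0.18, 0.29, 0.45, 0.37, 0.86, 0.17])

plt.hist(a)
plt.title('Mean')
plt.xlabel("value")
plt.ylabel("Frequency")
plt.savefig("abc.png")   # save while the figure still holds the histogram
plt.show()               # showing afterwards does not clear what was saved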

Python Networkx and Pandas library not working

My code below is supposed to print a graph/network using NetworkX, pandas and data from a CSV file. The code is (networkx3.py) -
import csv
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
g = nx.Graph()
csv_dict = pd.read_csv('Book1.csv', index_col=[0])
csv_1 = csv_dict.values.tolist()
ini = 0
for row in csv_1:
    for i in row:
        if type(row[i]) is str:
            g.add_edge(ini, int(i), conn_prob=(float(row[i])))
    max_wg_ngs = sorted(g[ini].items(), key=lambda e: e[1]["conn_prob"], reverse=True)[:2]
    sarr = [str(a) for a in max_wg_ngs]
    print "Neighbours of Node %d are:" % ini
    #print(max_wg_ngs)
    for item in sarr:
        print ''.join(str(item))[1:-1]
    ini += 1
pos = nx.spring_layout(g, scale=100.)
nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g, pos)
nx.draw_networkx_labels(g, pos)
#plt.axis('off')
plt.show()
The data in the CSV file is (Book1.csv) -
,1,2,3,4,5,6,7,8,9,10
1,0,0.257905291,0.775104118,0.239086843,0.002313744,0.416936603,0.194817214,0.163350301,0.252043807,0.251272559
2,0.346100279,0,0.438892758,0.598885794,0.002263231,0.406685237,0.523850975,0.257660167,0.206302228,0.161385794
3,0.753358102,0.222349243,0,0.407830809,0.001714776,0.507573592,0.169905687,0.139611318,0.187910832,0.326950557
4,0.185342928,0.571302688,0.51784403,0,0.003231018,0.295197533,0.216184462,0.153032751,0.216331326,0.317961522
5,0,0,0,0,0,0,0,0,0,0
6,0.478164621,0.418192795,0.646810223,0.410746629,0.002414973,0,0.609176897,0.203461461,0.157576977,0.636747837
7,0.24894327,0.522914349,0.33948832,0.316240267,0.002335929,0.639377086,0,0.410011123,0.540266963,0.587764182
8,0.234017887,0.320967208,0.285193773,0.258198079,0.003146737,0.224412057,0.411725737,0,0.487081815,0.469526333
9,0.302955306,0.080506624,0.261610132,0.22856311,0.001746979,0.014994905,0.63386228,0.486096957,0,0.664434415
10,0.232675407,0.121596312,0.457715027,0.310618067,0.001872929,0.57556548,0.473562887,0.32185564,0.482351246,0
The code, however, doesn't work, and I don't understand where I'm going wrong. The error is -
Traceback (most recent call last):
File "networkx3.py", line 13, in <module>
if type(row[i]) is str:
TypeError: list indices must be integers, not float
I do not want to modify the CSV file or its data. The index column and header are supposed to be ignored.
I have previously asked this question but I did not get satisfactory answers. Can anybody help?
Thanks a lot in advance :) (Using an Ubuntu 14.04 32-bit VM. Credits to @Adonis for helping create the original code.)
A little late in answering my own question, but with some valuable help from @Joel and @Adonis, I finally figured out where I was going wrong.
The problem was in the 2nd for loop, where I tried to use a float value as a list index, which gave me an error. Other minor changes would produce an output, but without any edges, just nodes.
Finally, after using the enumerate function to define the connecting node (via the integer index it yields), I got the required output. The only changes to be made are in the 2nd for loop and the if condition:
for row in csv_1:
    for idx, i in enumerate(row):
        if type(row[idx]) is float:
            g.add_edge(ini, idx, conn_prob=(float(row[idx])))
Thanks to all the selfless guys at SOF for the help, couldn't have done it without you :)
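For reference, here is how I read that fix dropped back into the original script — a minimal sketch only, keeping the Python 2 print statements and the Book1.csv layout from the question, and not re-tested on the actual data:
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx

g = nx.Graph()
csv_dict = pd.read_csv('Book1.csv', index_col=[0])
csv_1 = csv_dict.values.tolist()

ini = 0
for row in csv_1:
    # enumerate yields an integer position for every probability in the row,
    # so the list is indexed with an int instead of the float value itself
    for idx, prob in enumerate(row):
        if type(prob) is float:
            g.add_edge(ini, idx, conn_prob=float(prob))
    # the two neighbours with the highest connection probability
    max_wg_ngs = sorted(g[ini].items(), key=lambda e: e[1]["conn_prob"], reverse=True)[:2]
    print "Neighbours of Node %d are:" % ini
    for item in max_wg_ngs:
        print str(item)[1:-1]
    ini += 1

pos = nx.spring_layout(g, scale=100.)
nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g, pos)
nx.draw_networkx_labels(g, pos)
plt.show()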

Changing column width from excel files

I have a file like this that I am reading from Excel:
Year  Month  Day
1     2      1
2     1      2
I want to specify the column width that Excel recognizes. I would like to do it in pandas but I don't see an option. I have tried to do it with the StyleFrame module.
This is my code:
from StyleFrame import StyleFrame
import pandas as pd
df=pd.read_excel(r'P:\File.xlsx')
excel_writer = StyleFrame.ExcelWriter(r'P:\File.xlsx')
sf=StyleFrame(df)
sf=sf.set_column_width(columns=['Year', 'Month'], width=4.0)
sf=sf.set_column_width(columns=['Day'], width=6.00)
sf=sf.to_excel(excel_writer=excel_writer)
excel_writer.save()
but the formatting isn't saved when I open the new file.
Is there a way to do it in pandas? I would even take a pure Python solution to this, pretty much anything that works.
As for your question on how to remove the headers, you can simply pass header=False to to_excel:
sf.to_excel(excel_writer=excel_writer, header=False).
Note that this will still result in the first line of the table being bold.
If you don't want that behavior you can update to 0.1.6, which I just released.
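Putting the width settings and the header option together, a minimal sketch using the same StyleFrame calls as above; the separate output path is my assumption (to avoid writing back over the source workbook), and it is untested against a specific StyleFrame version:
from StyleFrame import StyleFrame
import pandas as pd

df = pd.read_excel(r'P:\File.xlsx')

# Write to a separate output file (assumed name) so the source stays untouched.
excel_writer = StyleFrame.ExcelWriter(r'P:\File_formatted.xlsx')

sf = StyleFrame(df)
sf.set_column_width(columns=['Year', 'Month'], width=4.0)
sf.set_column_width(columns=['Day'], width=6.0)

# header=False drops the column titles, as described in the answer above.
sf.to_excel(excel_writer=excel_writer, header=False)
excel_writer.save()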

Python/Pandas: How do I convert from datetime64[ns] to datetime

I have a script that processes an Excel file. The department that sends it has a system that generates it, and my script stopped working.
I suddenly got the error "Can only use .str accessor with string values, which use np.object_ dtype in pandas" for the following line of code:
df['DATE'] = df['Date'].str.replace(r'[^a-zA-Z0-9\._/-]', '')
I checked the type of the date columns in the file from the old system (dtype: object) vs the file from the new system (dtype: datetime64[ns]).
How do I change the date format to something my script will understand?
I saw this answer but my knowledge about date formats isn't this granular.
You can use the apply function on the DataFrame column to convert the necessary column to a string. For example:
df['DATE'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))
Make sure to import datetime module.
apply() will take each cell at a time for evaluation and apply the formatting as specified in the lambda function.
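As a self-contained sketch of that approach (the two-row frame below is made-up sample data, and the column names follow the question):
import pandas as pd

# Made-up sample: a datetime64[ns] column like the one the new system produces.
df = pd.DataFrame({'Date': pd.to_datetime(['2018-01-05', '2018-02-10'])})

# Format each Timestamp back into a plain string so .str methods work again.
df['DATE'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))

print(df['DATE'])   # strings, object dtype
print(df.dtypes)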
pd.to_datetime returns a Series of datetime64 dtype, as described here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
df['DATE'] = df['Date'].dt.date
or this:
df['Date'].map(datetime.datetime.date)
You can use pd.to_datetime:
df['DATE'] = pd.to_datetime(df['DATE'])
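A minimal sketch of that direction (made-up sample values): parse the column with pd.to_datetime, then take the date part through the .dt accessor.
import pandas as pd

# Made-up sample data standing in for the incoming column.
df = pd.DataFrame({'DATE': ['2018-03-01 00:00:00', '2018-03-02 00:00:00']})

df['DATE'] = pd.to_datetime(df['DATE'])   # -> datetime64[ns]
df['DATE'] = df['DATE'].dt.date           # -> plain datetime.date objects (object dtype)

print(df['DATE'])
print(df.dtypes)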