Python 2.7.10 memory error

Python 2.7.10 memory error - list

I defined the function to import five (each has 16 variables and with size around 120MB) csv.files together in Python 2.7.10, it works and then I selected four time variables to be formatted as Date time, the first three variables were transformed successfully but the last one failed by Memory Error. The function I defined was shown:
def reddat(filename,year1,year2):
bigdata=defaultdict(list)
for i in range(year1,year2):
string=filename+str(i)+".csv"
with open(string,'rb') as f:
reader=csv.reader(f)
headers=reader.next()
data1 = {h:[] for h in headers}
for row in reader:
for h, v in zip(headers, row):
data1[h].append(v)
for h in headers:
bigdata[h].append(data1[h])
return bigdata
dataall=reddat("Calls_for_Service_",2011,2016)
##This function works to import five years data and combined as one dictionary as dataall##
Then I selected four variables from dataall,
TimeCreate=[]
TimeDispatch=[]
TimeArrive=[]
TimeClosed=[]
for i in range(0,len(dataall['TimeCreate'])):
TimeCreate+=dataall['TimeCreate'][i]
TimeDispatch+=dataall['TimeDispatch'][i]
TimeArrive+=dataall['TimeArrive'][i]
TimeClosed+=dataall['TimeClosed'][i]
Now, four variables were selected from dataall as lists,these four lists contained string,I wanted to change them into date time format.I defined another function as follow:
def func(x):
try:
return dt.datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p")
except:
return pd.NaT
I changed four string lists as date time lists:
TimeCreatenew=[func(d) for d in TimeCreate]
TimeDispatchnew=[func(d) for d in TimeDispatch]
TimeArrivenew=[func(d) for d in TimeArrive]
TimeClosednew=[func(d) for d in TimeClosed]
However, "TimeCreatnew", "TimeDispatchnew", and "TimeArrivenew" worked well, but when "TimeClosednew" changed format, Python said
Traceback (most recent call last):
File "C:\Users\....\DataScience\scriptnew.py" line 65, in <module>
TimeClosednew=[func(d) for d in TimeClosed]
MemoryError
My python 2.7.10 is 32bit, how could I address this problem? Or if my function "reddat" is not effective? THANKS VERY MUCH
Found out one solution for using Python 3.5
I employed Python 3.5 by Anaconda3 (64-bit), which addressed the problem without memory error. I think Python 2.7.10 may not handle with such large size data. If someone has some idea about this problem which can be solved under Python 2.7.10. Please share idea. Thanks very much

Related

Python - AttributeError: 'DataFrame' object has no attribute

I have a CSV file with various columns and everything worked perfectly for the past few months until I updated the file and got new information and now the one column does not appear to be picked up by Python. I am using Python 2.7 and have made sure I have the latest version of pandas.
When I downloaded the csv file from Yahoo Finance, I opened it in Excel and made changes to the format of the columns in order to make it more readable as all information was in one cell. I used the "Text to Column" feature and split up the data based on where the commas were.
Then I made sure that in each column there were no white spaces in the beginning of the cell using the Trim function in excel and left-aligning the data.
I tried the following and still get the same or similiar:
After the df = pd.read_csv("KIO.csv") I tried to read whether I can read the first few columns by using df.head() - but still got the same error.
I tried renaming the problematic column as suggested in a similiar post using:
df = df.rename(columns={"Close": "Closing"}) - here I got the same error again. "print df.columns" also led to the same issue.
"df[1]" - gave a long error with "KeyError: 1" at the end - I can print the entire thing if it it will assist.
Adding the "skipinitialspace=True" - no difference.
I thought the problem might be within the actual csv file information so I deleted all the columns and made my own information and I still got the same error.
Below is a portion of my code as the total code is very long:
enter code here
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as pltdate
import datetime
import matplotlib.animation as animation
import numpy as np
df = pd.read_csv("KIO.csv", skipinitialspace=True)
#df.head()
#Close = df.columns[0]
#df= df.rename(columns={"Close": "Closing"})
df1 = pd.read_csv("USD-ZAR.csv")
kio_close = pd.DataFrame(df.Close)
exchange = pd.DataFrame(df1.Value)
dates = df["Date"]
dates1 = df1["Date"]
The above variables have been used throughout the remaining code though so if this issue can be solved here the remaining code will be right.
This is copy/paste of the error:
Blockquote
Traceback (most recent call last):
File "C:/Users/User/Documents/PycharmProjects/Trading_GUI/GUI_testing.py", line 33, in
kio_close = pd.DataFrame(df.Close)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 4372, in getattr
return object.getattribute(self, name)
AttributeError: 'DataFrame' object has no attribute 'Close'
Thank you so much in advance.

#Rip_027 This is in regards to your last comment. I used to have the same issue whenever I open a csv file by simply double clicking the file icon. You need to launch Excel first, then get external data. Link below has more details,which will serve as a guideline. Hope this helps.
https://www.hesa.ac.uk/support/user-guides/import-csv

Python Networkx and Pandas library not working

My code below is supposed to print a graph/network using Networkx, Pandas and data from a CSV file. The code is (networkx3.py) -
import csv
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
g = nx.Graph()
csv_dict = pd.read_csv('Book1.csv', index_col=[0])
csv_1 = csv_dict.values.tolist()
ini = 0
for row in csv_1:
for i in row:
if type(row[i]) is str:
g.add_edge(ini, int(i), conn_prob=(float(row[i])))
max_wg_ngs = sorted(g[ini].items(), key=lambda e: e[1]["conn_prob"], reverse=True)[:2]
sarr = [str(a) for a in max_wg_ngs]
print "Neighbours of Node %d are:" % ini
#print(max_wg_ngs)
for item in sarr:
print ''.join(str(item))[1:-1]
ini += 1
pos = nx.spring_layout(g, scale=100.)
nx.draw_networkx_nodes(g, pos)
nx.draw_networkx_edges(g, pos)
nx.draw_networkx_labels(g, pos)
#plt.axis('off')
plt.show()
The data in the CSV file is (Book1.csv) -
,1,2,3,4,5,6,7,8,9,10
1,0,0.257905291,0.775104118,0.239086843,0.002313744,0.416936603,0.194817214,0.163350301,0.252043807,0.251272559
2,0.346100279,0,0.438892758,0.598885794,0.002263231,0.406685237,0.523850975,0.257660167,0.206302228,0.161385794
3,0.753358102,0.222349243,0,0.407830809,0.001714776,0.507573592,0.169905687,0.139611318,0.187910832,0.326950557
4,0.185342928,0.571302688,0.51784403,0,0.003231018,0.295197533,0.216184462,0.153032751,0.216331326,0.317961522
5,0,0,0,0,0,0,0,0,0,0
6,0.478164621,0.418192795,0.646810223,0.410746629,0.002414973,0,0.609176897,0.203461461,0.157576977,0.636747837
7,0.24894327,0.522914349,0.33948832,0.316240267,0.002335929,0.639377086,0,0.410011123,0.540266963,0.587764182
8,0.234017887,0.320967208,0.285193773,0.258198079,0.003146737,0.224412057,0.411725737,0,0.487081815,0.469526333
9,0.302955306,0.080506624,0.261610132,0.22856311,0.001746979,0.014994905,0.63386228,0.486096957,0,0.664434415
10,0.232675407,0.121596312,0.457715027,0.310618067,0.001872929,0.57556548,0.473562887,0.32185564,0.482351246,0
The code however doesn't work. I don't understand where I'm going wrong. The error is -
Traceback (most recent call last):
File "networkx3.py", line 13, in <module>
if type(row[i]) is str:
TypeError: list indices must be integers, not float
I do not want to modify the CSV file or its data. The index column and header are supposed to be ignored.
I have previously asked this question but I did not get satisfactory answers. Can anybody help?
Thanks a lot in advance :) (Using Ubuntu 14.04 32-bit VM. Credits to #Adonis for helping in creating the original code)

A little late in answering my own question, but with some valuable help from #Joel and #Adonis, I finally figured out where I was going wrong.
The problem was in the 2nd for loop where I tried to pass a float value as a string into the Graph which gave me an error. Other minor changes would result in an output but without any edges, just nodes.
Finally, after using an enumerate function to define a connecting node (using its index giving power), I got the required output. The only changes to be made are in the 2nd for loop and the if condition:
for row in csv_1:
for idx, i in enumerate(row):
if type(row[idx]) is float:
g.add_edge(ini, idx, conn_prob=(float(row[idx])))
Thanks to all the selfless guys at SOF for the help, couldn't have done it without you :)

Python Pandas to_csv Output Returns Single Character for String/Object Values

I'm attempting to output the result into a pandas data frame. When I print the data frame, the object values appear correct, but when I use the to_csv function on the data frame, my csv output has only the first character for every string/object value.
df = pandas.DataFrame({'a':[u'u\x00s\x00']})
df.to_csv('test.csv')
I've also tried the following addition to the to_csv function:
df.to_csv('test_encoded.csv', encoding= 'utf-8')
But am getting the same results:
>>> print df
a
0 us
(output in csv file)
u
For reference, I'm connecting to a Vertica database and using the following setup:
OS: Mac OS X Yosemite (10.10.5)
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, Sep 15 2015,
14:29:08)
pyodbc 3.0.10
pandas 0.16.2
ODBC: Vertica ODBC 6.1.3
Any help figuring out how to pass the entire object string using the to_csv function in pandas would be greatly appreciated.

I was facing the same problem and found this post UTF-32 in Python
To fix your problem, I believe that you need to replace all '\x00' by empty. I managed to write the correct CSV with the code below
fixer = dict.fromkeys([0x00], u'')
df['a'] = df['a'].map(lambda x: x.translate(fixer))
df.to_csv('test.csv')
To solve my problem with Vertica I had to change the encoding to UTF-16 in the file /Library/Vertica/ODBC/lib/vertica.ini with the configuration below
[Driver]
ErrorMessagesPath=/Library/Vertica/ODBC/messages/
ODBCInstLib=/usr/lib/libiodbcinst.dylib
DriverManagerEncoding=UTF-16
Best regards,
Anderson Neves

else statement does not return to loop

I have a code that opens a file, calculates the median value and writes that value to a separate file. Some of the files maybe empty so I wrote the following loop to check it the file is empty and if so skip it, increment the count and go back to the loop. It does what is expected for the first empty file it finds ,but not the second. The loop is below
t = 15.2
while t>=11.4:
if os.stat(r'C:\Users\Khary\Documents\bin%.2f.txt'%t ).st_size > 0:
print("All good")
F= r'C:\Users\Documents\bin%.2f.txt'%t
print(t)
F= np.loadtxt(F,skiprows=0)
LogMass = F[:,0]
LogRed = F[:,1]
value = np.median(LogMass)
filesave(*find_nearest(LogMass,LogRed))
t -=0.2
else:
t -=0.2
print("empty file")
The output is as follows
All good
15.2
All good
15.0
All good
14.8
All good
14.600000000000001
All good
14.400000000000002
All good
14.200000000000003
All good
14.000000000000004
All good
13.800000000000004
All good
13.600000000000005
All good
13.400000000000006
empty file
All good
13.000000000000007
Traceback (most recent call last):
File "C:\Users\Documents\Codes\Calculate Bin Median.py", line 35, in <module>
LogMass = F[:,0]
IndexError: too many indices
A second issue is that t somehow goes from one decimal place to 15 and the last place seems to incrementing whats with that?
Thanks for any and all help
EDIT
The error IndexError: too many indices only seems to apply to files with only one line example...
12.9982324 0.004321374
If I add a second line I no longer get the error can someone explain why this is? Thanks
EDIT
I tried a little experiment and it seems numpy does not like extracting a column if the array only has one row.
In [8]: x = np.array([1,3])
In [9]: y=x[:,0]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-9-50e27cf81d21> in <module>()
----> 1 y=x[:,0]
IndexError: too many indices
In [10]: y=x[:,0].shape
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-10-e8108cf30e9a> in <module>()
----> 1 y=x[:,0].shape
IndexError: too many indices
In [11]:

You should be using try/except blocks. Something like:
t = 15.2
while t >= 11.4:
F= r'C:\Users\Documents\bin%.2f.txt'%t
try:
F = np.loadtxt(F,skiprows=0)
LogMass = F[:,0]
LogRed = F[:,1]
value = np.median(LogMass)
filesave(*find_nearest(LogMass,LogRed))
except IndexError:
print("bad file: {}".format(F))
else:
print("file worked!")
finally:
t -=0.2
Please refer to the official tutorial for more details about exception handling.
The issue with the last digit is due to how floats work they can not represent base10 numbers exactly. This can lead to fun things like:
In [13]: .3 * 3 - .9
Out[13]: -1.1102230246251565e-16

To deal with the one line file case, add the ndmin parameter to np.loadtxt (review its doc):
np.loadtxt('test.npy',ndmin=2)
# array([[ 1., 2.]])

With the help of a user named ajcr, found the problem was that ndim=2 should have been used in numpy.loadtxt() to insure that the array always 2 has dimensions.

Python uses indentation to define if while and for blocks.
It doesn't look like your if else statement is fully indented from the while.
I usually use a full 'tab' keyboard key to indent instead of 'spaces'

Xively read data in Python

I have written a python 2.7 script to retrieve all my historical data from Xively.
Originally I wrote it in C#, and it works perfectly.
I am limiting the request to 6 hour blocks, to retrieve all stored data.
My version in Python is as follows:
requestString = 'http://api.xively.com/v2/feeds/41189/datastreams/0001.csv?key=YcfzZVxtXxxxxxxxxxxORnVu_dMQ&start=' + requestDate + '&duration=6hours&interval=0&per_page=1000' response = urllib2.urlopen(requestString).read()
The request date is in the correct format, I compared the full c# requestString version and the python one.
Using the above request, I only get 101 lines of data, which equates to a few minutes of results.
My suspicion is that it is the .read() function, it returns about 34k of characters which is far less than the c# version. I tried adding 100000 as an argument to the ad function, but no change in result.

Left another solution wrote in Python 2.7 too.
In my case, got data each 30 minutes because many sensors sent values every minute and Xively API has limited half hour of data to this sent frequency.
It's general module:
for day in datespan(start_datetime, end_datetime, deltatime): # loop increasing deltatime to star_datetime until finish
while(True): # assurance correct retrieval data
try:
response = urllib2.urlopen('https://api.xively.com/v2/feeds/'+str(feed)+'.csv?key='+apikey_xively+'&start='+ day.strftime("%Y-%m-%dT%H:%M:%SZ")+'&interval='+str(interval)+'&duration='+duration) # get data
break
except:
time.sleep(0.3)
raise # try again
cr = csv.reader(response) # return data in columns
print '.'
for row in cr:
if row[0] in id: # choose desired data
f.write(row[0]+","+row[1]+","+row[2]+"\n") # write "id,timestamp,value"
The full script you can find it here: https://github.com/CarlosRufo/scripts/blob/master/python/retrievalDataXively.py
Hope you might help, delighted to answer any questions :)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Python 2.7.10 memory error - list

Related

Python - AttributeError: 'DataFrame' object has no attribute

Python Networkx and Pandas library not working

Python Pandas to_csv Output Returns Single Character for String/Object Values

else statement does not return to loop

Xively read data in Python

Categories

Resources