Trying to search excel file for a date - tuples

I'm searching for a date in an Excel file using a string that I've converted to a Python date. I get an error when trying to convert the Excel cell values to dates with the following code:
```python
from dateutil import parser
import xlrd

d = '4/8/2019'
dt_obj = parser.parse(d)
wbpath = 'XLSX FILE'
wb = xlrd.open_workbook(wbpath)
ws = wb.sheet_by_index(1)
for rowidx in range(ws.nrows):
    row = ws.row(rowidx)
    for colidx, cell in enumerate(row):
        if xlrd.xldate_as_tuple(cell.value, wb.datemode) == dt_obj:
            print(ws.name)
            print(colidx)
            print(rowidx)
```
The error I get:

```
Traceback (most recent call last):
  File "C:/Users/DKisialeu/PycharmProjects/new/YIM.py", line 12, in <module>
    if xlrd.xldate_as_tuple(cell.value, wb.datemode) == dt_obj:
  File "C:\Users\DKisialeu\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\xldate.py", line 95, in xldate_as_tuple
    if xldate < 0.00:
TypeError: '<' not supported between instances of 'str' and 'float'
```

Make sure that the dates in your Excel spreadsheet are formatted as dates and not as text. I get the same error when running your code on a spreadsheet that contains any text-formatted cells at all: xldate_as_tuple expects the numeric serial value that Excel stores for dates, so handing it a string raises exactly this TypeError.
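A robust version of the loop checks each cell's type before converting. The sketch below is self-contained, so it uses a hand-rolled serial-to-datetime helper and a made-up list of cell values instead of a real workbook; in actual xlrd code the equivalent guard is `cell.ctype == xlrd.XL_CELL_DATE`, and the conversion is `xlrd.xldate.xldate_as_datetime(cell.value, wb.datemode)`:

```python
from datetime import datetime, timedelta

def excel_serial_to_datetime(serial, datemode=0):
    """Convert an Excel serial date to a datetime.
    datemode 0 = 1900 system (epoch 1899-12-30, reflecting Excel's
    historical leap-year quirk); datemode 1 = 1904 system."""
    epoch = datetime(1904, 1, 1) if datemode else datetime(1899, 12, 30)
    return epoch + timedelta(days=serial)

target = datetime(2019, 4, 8)
cells = [43563.0, 'header text', 42000.0]   # made-up cell values, mixing dates and text
for value in cells:
    # only attempt the conversion for numeric (date-like) values;
    # strings fall through instead of raising a TypeError
    if isinstance(value, float) and excel_serial_to_datetime(value) == target:
        print('found', value)
```

The same guard-then-convert shape drops straight into the original nested loop once the isinstance check is replaced by the ctype check.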

How to resolve IndexError: list index out of range

I'm new to Python and I have a CSV file which contains 35 columns. To print the last column of each row, I'm executing the script below:

```python
import csv

filename = '/home/cloudera/PMGE/Bfr.csv'
with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        print(row[34])
```
It prints the expected result, but at the end I get this error:
```
Traceback (most recent call last):
File "barplot.py", line 15, in <module>
print(row[34])
IndexError: list index out of range
```
Can anyone help me to understand this?
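The usual culprit is a blank line, often at the very end of the file: csv.reader yields it as an empty list, and `row[34]` on an empty list raises exactly this IndexError. A self-contained sketch (with made-up column names) that guards against short rows:

```python
import csv
import io

# two rows of 35 columns each, plus a trailing blank line
header = ",".join("c%d" % i for i in range(35))
values = ",".join("v%d" % i for i in range(35))
data = header + "\n" + values + "\n\n"

last_col = []
for row in csv.reader(io.StringIO(data)):
    if len(row) > 34:            # skip blank/short rows instead of crashing
        last_col.append(row[34])
print(last_col)                  # ['c34', 'v34']
```

The same `len(row) > 34` guard dropped into the original loop lets it run to the end of the file without the IndexError.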

Series regex extract producing a dataframe

I am working through a regex task on Dataquest. The following code snippet runs correctly
inside of the Dataquest IDE:
```python
titles = hn["title"]
pattern = r'\[(\w+)\]'
tag_matches = titles.str.extract(pattern)
tag_freq = tag_matches.value_counts()
print(tag_freq, '\n')
```
However, on my PC running pandas 0.25.3 this exact same code block yields an error:

```
Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 63, in <module>
    tag_freq = tag_matches.value_counts()
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'value_counts'
```
Why is tag_matches coming back as a dataframe? I am running an extract against the series 'titles'.
From the docs for pandas.Series.str.extract:

A pattern with one group will return a Series if expand=False.

```python
>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object
```
So perhaps you must be explicit and set expand=False to get a series object?
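Yes - in newer pandas versions extract defaults to expand=True, so even a single capture group comes back as a one-column DataFrame unless you pass expand=False explicitly. A small sketch with made-up titles:

```python
import pandas as pd

titles = pd.Series(['Show HN: my tool [video]',
                    'Ask HN: how do I learn regex?',
                    'Slides from my talk [pdf]'])
pattern = r'\[(\w+)\]'

as_df = titles.str.extract(pattern)                    # default expand=True -> DataFrame
as_series = titles.str.extract(pattern, expand=False)  # -> Series

print(type(as_df).__name__)        # DataFrame
print(type(as_series).__name__)    # Series
print(as_series.value_counts())    # counts per tag; rows with no match are NaN and ignored
```

With expand=False the result is a Series, so .value_counts() is available and the Dataquest snippet runs as written.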

Python - AttributeError: 'DataFrame' object has no attribute

I have a CSV file with various columns. Everything worked perfectly for the past few months, but after I updated the file with new information, one of the columns no longer appears to be picked up by Python. I am using Python 2.7 and have made sure I have the latest version of pandas.
When I downloaded the csv file from Yahoo Finance, I opened it in Excel and made changes to the format of the columns in order to make it more readable, as all the information was in one cell. I used the "Text to Column" feature and split up the data based on where the commas were.
Then I made sure that in each column there were no white spaces at the beginning of the cells, using the Trim function in Excel and left-aligning the data.
I tried the following and still get the same or a similar error:
After the df = pd.read_csv("KIO.csv") I tried to check whether I can read the first few rows by using df.head() - but still got the same error.
I tried renaming the problematic column, as suggested in a similar post, using:
df = df.rename(columns={"Close": "Closing"}) - here I got the same error again. "print df.columns" also led to the same issue.
"df[1]" - gave a long error with "KeyError: 1" at the end - I can print the entire thing if it will help.
Adding the "skipinitialspace=True" - no difference.
I thought the problem might be within the actual csv file information so I deleted all the columns and made my own information and I still got the same error.
Below is a portion of my code as the total code is very long:
```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as pltdate
import datetime
import matplotlib.animation as animation
import numpy as np

df = pd.read_csv("KIO.csv", skipinitialspace=True)
#df.head()
#Close = df.columns[0]
#df= df.rename(columns={"Close": "Closing"})
df1 = pd.read_csv("USD-ZAR.csv")
kio_close = pd.DataFrame(df.Close)
exchange = pd.DataFrame(df1.Value)
dates = df["Date"]
dates1 = df1["Date"]
```
The above variables are used throughout the remaining code, so if this issue can be solved here, the rest of the code will be right.
This is a copy/paste of the error:

```
Traceback (most recent call last):
  File "C:/Users/User/Documents/PycharmProjects/Trading_GUI/GUI_testing.py", line 33, in <module>
    kio_close = pd.DataFrame(df.Close)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 4372, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'Close'
```
Thank you so much in advance.
#Rip_027 This is in regard to your last comment. I used to have the same issue whenever I opened a csv file by simply double-clicking the file icon. You need to launch Excel first, then get external data. The link below has more details, which will serve as a guideline. Hope this helps.
https://www.hesa.ac.uk/support/user-guides/import-csv
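For what it's worth, the most common cause of this exact AttributeError is a stray space (or other invisible character) in the header row, so the column ends up named ' Close' rather than 'Close' and attribute access fails. A minimal self-contained sketch with made-up data:

```python
import io
import pandas as pd

# hypothetical CSV whose header has a stray space before "Close"
raw = "Date, Close\n2019-04-08,100.5\n"
df = pd.read_csv(io.StringIO(raw))

print(list(df.columns))               # ['Date', ' Close'] -> df.Close would raise AttributeError
df.columns = df.columns.str.strip()   # normalise the header names
print(df["Close"].iloc[0])            # 100.5
```

Printing `list(df.columns)` (rather than just `df.columns`) makes any leading or trailing whitespace visible, which is a quick way to diagnose this after an Excel round-trip.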

Reading date from file

I've got the following code, which errors out, and I'm not sure why.

```python
from datetime import datetime
import os

Right_now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print('Right now is ' + Right_now)
filename = 'sysdate.txt'
with open(filename, "r") as fin:
    last_date = fin.read()
    my_date = datetime.strptime(str(last_date), '%Y-%m-%d %H:%M:%S')
fin.close()
```
The file contains the following date format:

```
2018-01-18 11:01:54
```

However, I'm getting the following error message:

```
Right now is 2018-01-18 11:16:13
Traceback (most recent call last):
  File "test1.py", line 11, in <module>
    my_date = datetime.strptime(str(last_date), '%Y-%m-%d %H:%M:%S')
  File "/usr/lib64/python2.7/_strptime.py", line 328, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains:
```
Python version is 2.7.5
Assuming you are interested only in the last date, if there is more than one, you need to modify your program to account for the empty string at the very end:

```python
with open(filename, "r") as fin:
    last_date = fin.read().split('\n')[-2]
my_date = datetime.strptime(last_date, '%Y-%m-%d %H:%M:%S')
```

Using .read() reads the whole file at once. Because the file ends with a newline, splitting on '\n' leaves an empty string as the final element, which is why the target date is the second-to-last element rather than the last. You can read more about it in this post and in the documentation.
Your corrected program reads the file, splits it into lines on the newline character, and selects the second-to-last element, which is your target date. The strptime call then runs successfully.
You don't need the .close() at the end - with open closes the file automatically:
As of Python 2.5, you can avoid having to call this method explicitly
if you use the with statement. For example, the following code will
automatically close f when the with block is exited:
as written in the documentation.
Your problem is the newline character.
I had written a very similar program, reading one date from each row of a text file, and got the same error message.
I solved it by using rstrip() to remove the newline in a similar fashion:

```python
with open(filename, "r") as fin:
    last_date = fin.read()
my_date = datetime.strptime(str.rstrip(last_date), '%Y-%m-%d %H:%M:%S')
```

Even though the manual does not state it explicitly, rstrip() also removes newline characters.
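To make the effect concrete, here is a tiny self-contained check, where the literal string stands in for what fin.read() returns (trailing newline included):

```python
from datetime import datetime

last_date = '2018-01-18 11:01:54\n'   # .read() keeps the trailing newline
my_date = datetime.strptime(last_date.rstrip(), '%Y-%m-%d %H:%M:%S')
print(my_date)                        # 2018-01-18 11:01:54
```

Without the rstrip(), the leftover '\n' after the seconds field is what triggers "ValueError: unconverted data remains:".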

pandas .drop() memory error large file

For reference, this is all on a 64-bit Windows 7 machine in PyCharm Educational Edition 1.0.1, with Python 3.4.2 and pandas 0.16.1.
I have an ~791MB .csv file with ~3.04 million rows x 24 columns. The file contains liquor sales data for the state of Iowa from January 2014 to February 2015. If you are interested, the file can be found here: https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy.
One of the columns, titled "STORE LOCATION", holds the address including latitude and longitude. The purpose of the program below is to take the latitude and longitude out of the "STORE LOCATION" cell and place each in its own cell. When the file is cut down to ~1.04 million rows, my program works properly.
```
 1 import pandas as pd
 2
 3 #import the original file
 4 sales = pd.read_csv('Iowa_Liquor_Sales.csv', header=0)
 5
 6 #transfer the copies into lists
 7 lat = sales['STORE LOCATION']
 8 lon = sales['STORE LOCATION']
 9
10 #separate the latitude and longitude from each cell into their own list
11 hold = [i.split('(', 1)[1] for i in lat]
12 lat2 = [i.split(',', 1)[0] for i in hold]
13 lon2 = [i.split(',', 1)[1] for i in hold]
14 lon2 = [i.split(')', 1)[0] for i in lon2]
15
16 #put the now separate latitude and longitude back into their own columns
17 sales['LATITUDE'] = lat2
18 sales['LONGITUDE'] = lon2
19
20 #drop the store location column
21 sales = sales.drop(['STORE LOCATION'], axis=1)
22
23 #export the new panda data frame into a new file
24 sales.to_csv('liquor_data2.csv')
```
However, when I try to run the code with the full 3.04 million line file, it gives me this error:
```
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1595, in drop
    dropped = self.reindex(**{axis_name: new_axis})
  File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 2505, in reindex
    **kwargs)
  File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1751, in reindex
    self._consolidate_inplace()
  File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 2132, in _consolidate_inplace
    self._data = self._protect_consolidate(f)
  File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 2125, in _protect_consolidate
    result = f()
  File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 2131, in <lambda>
    f = lambda: self._data.consolidate()
  File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 2833, in consolidate
    bm._consolidate_inplace()
  File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 2838, in _consolidate_inplace
    self.blocks = tuple(_consolidate(self.blocks))
  File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 3817, in _consolidate
    _can_consolidate=_can_consolidate)
  File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 3840, in _merge_blocks
    new_values = _vstack([b.values for b in blocks], dtype)
  File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 3870, in _vstack
    return np.vstack(to_stack)
  File "C:\Python34\lib\site-packages\numpy\core\shape_base.py", line 228, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
MemoryError
```
I tried running the code line-by-line in the python console and found that the error occurs after the program runs the sales = sales.drop(['STORE LOCATION'], axis=1) line.
I have searched for similar issues elsewhere and the only answer I have come up with is chunking the file as it is read by the program, like this:
```python
#import the original file
df = pd.read_csv('Iowa_Liquor_Sales7.csv', header=0, chunksize=chunksize)
sales = pd.concat(df, ignore_index=True)
```
My only problem with that is that I then get this error:

```
Traceback (most recent call last):
  File "C:/Users/Aaron/PycharmProjects/DATA/Liquor_Reasign_Pd.py", line 14, in <module>
    lat = sales['STORE LOCATION']
TypeError: 'TextFileReader' object is not subscriptable
```
My google-foo is all foo'd out. Anyone know what to do?
UPDATE
I should specify that with the chunking method, the error comes about when the program tries to duplicate the store location column.
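The TypeError happens because with chunksize= set, read_csv returns a TextFileReader, which cannot be indexed like a DataFrame - you have to iterate over it (or concat it) first. A memory-friendlier variant is to do the latitude/longitude split per chunk and only combine at the end; a minimal sketch with made-up data standing in for the real file:

```python
import io
import pandas as pd

# made-up stand-in for the Iowa liquor sales file
raw = ('STORE LOCATION\n'
       '"123 Main St (41.60, -93.60)"\n'
       '"456 Oak Ave (42.00, -93.70)"\n')

pieces = []
for chunk in pd.read_csv(io.StringIO(raw), chunksize=1):
    # pull "lat, lon" out of the parentheses in each chunk
    loc = chunk['STORE LOCATION'].str.extract(r'\(([^,]+),\s*([^)]+)\)')
    chunk['LATITUDE'] = loc[0]
    chunk['LONGITUDE'] = loc[1]
    pieces.append(chunk.drop(columns=['STORE LOCATION']))

sales = pd.concat(pieces, ignore_index=True)
print(sales)
```

Because each chunk is transformed and the wide string column dropped before concatenation, the peak memory footprint stays close to one chunk plus the accumulated output, instead of the full 3-million-row frame being consolidated at once.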
So I found an answer to my issue. I ran the program in Python 2.7 instead of Python 3.4. The only change I made was deleting line 8, as it is unused. I don't know if 2.7 just handles the memory issue differently, or if I had improperly installed the pandas package in 3.4. I will reinstall pandas in 3.4 to see if that was the problem, but if anyone else has a similar issue, try your program in 2.7.
UPDATE: I realized that I was running 32-bit Python on a 64-bit machine. I upgraded my version of Python and it now runs without memory errors.