I'm new to Python and I have a CSV file which contains 35 columns. To print the last column of each row, I'm executing the script below:
```
import csv

filename = '/home/cloudera/PMGE/Bfr.csv'
with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        print(row[34])
```
It prints the expected result, but at the end I get this error:
```
Traceback (most recent call last):
  File "barplot.py", line 15, in <module>
    print(row[34])
IndexError: list index out of range
```
Can anyone help me to understand this?
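The usual culprit (an assumption here, since the file itself isn't shown) is a trailing blank line in the CSV: `csv.reader` yields an empty list for it, so indexing any column fails. A minimal sketch of a guard, using an in-memory two-column example in place of the real 35-column file:

```python
import csv
import io

# Simulated CSV contents; note the trailing blank line, which csv.reader
# returns as an empty list, making row[34] (or any index) fail
data = "a,b\n1,2\n\n"

for row in csv.reader(io.StringIO(data)):
    if row:                # guard: skip empty rows produced by blank lines
        print(row[-1])     # last column, whatever the row width
```

With the guard in place, the loop prints the last column of each real row and silently skips the empty one.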
I am working through a regex task on Dataquest. The following code snippet runs correctly
inside of the Dataquest IDE:
```
titles = hn["title"]
pattern = r'\[(\w+)\]'
tag_matches = titles.str.extract(pattern)
tag_freq = tag_matches.value_counts()
print(tag_freq, '\n')
```
However, on my PC running pandas 0.25.3, this exact same code block yields an error:
```
Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 63, in <module>
    tag_freq = tag_matches.value_counts()
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'value_counts'
```
Why is tag_matches coming back as a DataFrame? I am running an extract against the Series titles.
From the docs for pandas.Series.str.extract:

A pattern with one group will return a Series if expand=False.

```
>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object
```
So it seems you must be explicit and set expand=False to get a Series object; with the default expand=True, even a one-group pattern returns a one-column DataFrame.
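A short sketch confirming this, with a hypothetical hn-style Series standing in for the Dataquest data (which isn't shown): passing expand=False makes extract return a Series, which does have .value_counts():

```python
import pandas as pd

# Hypothetical titles standing in for the Dataquest dataset
titles = pd.Series(["[Ask] How to learn?", "[Show] My project", "[Ask] Best books?"])
pattern = r'\[(\w+)\]'

# With expand=False, a single capture group yields a Series, not a DataFrame
tag_matches = titles.str.extract(pattern, expand=False)
tag_freq = tag_matches.value_counts()
print(tag_freq)
```

Here tag_freq counts two "Ask" tags and one "Show" tag, with no AttributeError.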
I'm searching for a date in Excel using a string that I've converted to a Python date. I get an error trying to convert Excel values to dates using the following code:
```
from dateutil import parser
import xlrd

d = '4/8/2019'
dt_obj = parser.parse(d)

wbpath = 'XLSX FILE'
wb = xlrd.open_workbook(wbpath)
ws = wb.sheet_by_index(1)
for rowidx in range(ws.nrows):
    row = ws.row(rowidx)
    for colidx, cell in enumerate(row):
        if xlrd.xldate_as_tuple(cell.value, wb.datemode) == dt_obj:
            print(ws.name)
            print(colidx)
            print(rowidx)
```
The error I get:

```
Traceback (most recent call last):
  File "C:/Users/DKisialeu/PycharmProjects/new/YIM.py", line 12, in <module>
    if xlrd.xldate_as_tuple(cell.value, wb.datemode) == dt_obj:
  File "C:\Users\DKisialeu\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\xldate.py", line 95, in xldate_as_tuple
    if xldate < 0.00:
TypeError: '<' not supported between instances of 'str' and 'float'
```
Make sure that the dates in your Excel spreadsheet are formatted as dates and not as text. I get the same error when running your code on a spreadsheet containing any text-formatted cells at all.
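Two things are worth noting: xldate_as_tuple chokes on the strings held by text-formatted cells, and even for genuine date cells it returns a tuple, which will never compare equal to a datetime. A library-free sketch of the underlying conversion (assuming the 1900 date system, i.e. wb.datemode == 0), with a type guard so string values are skipped rather than crashing:

```python
from datetime import datetime, timedelta

def excel_serial_to_datetime(value):
    """Convert an Excel serial date to a datetime; return None for text cells."""
    if not isinstance(value, float):      # text-formatted cell -> skip it
        return None
    # Day 0 of the 1900 date system is 1899-12-30 (the offset accounts for
    # Excel's fictitious 1900-02-29)
    return datetime(1899, 12, 30) + timedelta(days=value)

print(excel_serial_to_datetime(43563.0))      # serial for 2019-04-08
print(excel_serial_to_datetime('4/8/2019'))   # None: the string is skipped
```

In the original loop, the equivalent guard is checking cell.ctype before calling xlrd.xldate_as_tuple, and building a datetime from the returned tuple before comparing against dt_obj.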
I've got the following code which errors and I'm not sure why.
```
from datetime import datetime
import os

Right_now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print('Right now is ' + Right_now)

filename = 'sysdate.txt'
with open(filename, "r") as fin:
    last_date = fin.read()
    my_date = datetime.strptime(str(last_date), '%Y-%m-%d %H:%M:%S')
fin.close()
```
The file contains a date in the following format:

```
2018-01-18 11:01:54
```
However, I'm getting the following error message:
```
Right now is 2018-01-18 11:16:13
Traceback (most recent call last):
  File "test1.py", line 11, in <module>
    my_date = datetime.strptime(str(last_date), '%Y-%m-%d %H:%M:%S')
  File "/usr/lib64/python2.7/_strptime.py", line 328, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains:
```
Python version is 2.7.5
Assuming you are interested only in the last date, if there is more than one, you need to modify your program to account for the empty string produced at the very end of the split:
```
with open(filename, "r") as fin:
    last_date = fin.read().split('\n')[-2]
    my_date = datetime.strptime(last_date, '%Y-%m-%d %H:%M:%S')
```
Using .read() reads the whole file at once. Because the file ends with a newline, splitting on '\n' leaves an empty string as the last element, so [-2] selects the last actual line. You can read more about it in this post and in the documentation.
Your corrected program reads the file, splits it into lines on the newline character, and selects the second-to-last element, which is your target date. The strptime call then runs successfully.
You don't need the .close() at the end: with open closes the file automatically, as written in the documentation:

As of Python 2.5, you can avoid having to call this method explicitly
if you use the with statement. For example, the following code will
automatically close f when the with block is exited.
Your problem is the new line character.
I had written a very similar program, reading one date from each row in a text file, and got the same error message.
I solved it by using rstrip() to remove the newline in a similar fashion:

```
with open(filename, "r") as fin:
    last_date = fin.read()
    my_date = datetime.strptime(last_date.rstrip(), '%Y-%m-%d %H:%M:%S')
```
Although the manual's example only shows spaces, rstrip() with no argument strips all trailing whitespace, which includes newline characters.
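The failure and the fix can be reproduced without any file, since the trailing newline alone is what strptime complains about:

```python
from datetime import datetime

# Simulated file contents: read() keeps the trailing newline
last_date = '2018-01-18 11:01:54\n'

try:
    datetime.strptime(last_date, '%Y-%m-%d %H:%M:%S')
except ValueError as e:
    print(e)               # unconverted data remains: (the newline)

# Stripping trailing whitespace first makes the parse succeed
my_date = datetime.strptime(last_date.strip(), '%Y-%m-%d %H:%M:%S')
print(my_date)
```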
I am trying to merge multiple csv files in a folder.
They look like this (there are more than two df's in actuality):
df1

```
LCC  acres
2    10
3    20
4    40
5    5
```

df2

```
LCC  acres_2
2    4
3    2
4    40
5    6
6    7
```
I want to put all the dataframes into one list, and then merge them with reduce. To do this they need to have the same index.
I am trying this code:
```
combined = []
reindex = [2,3,4,5,6]
folder = r'C:\path_to_files'
for f in os.listdir(folder):
    # read each file
    df = pd.read_csv(os.path.join(folder, f))
    # check for duplicates - returns empty lists
    print df[df.index.duplicated()]
    # reindex
    df.set_index([df.columns[0]], inplace=True)
    df = df.reindex(reindex, fill_value=0)
    # append
    combined.append(df)

# merge on 'LCC' column
final = reduce(lambda left, right: pd.merge(left, right, on=['LCC'], how='outer'), combined)
```
but this still returns:
```
Traceback (most recent call last):
  File "<ipython-input-31-45f925f6d48d>", line 9, in <module>
    df=df.reindex(reindex, fill_value=0)
  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2741, in reindex
    **kwargs)
  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\generic.py", line 2229, in reindex
    fill_value, copy).__finalize__(self)
  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2687, in _reindex_axes
    fill_value, limit, tolerance)
  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2698, in _reindex_index
    allow_dups=False)
  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\generic.py", line 2341, in _reindex_with_indexers
    copy=copy)
  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\internals.py", line 3586, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\indexes\base.py", line 2293, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
```
The problem is that you need to check for duplicates in the index after setting the first column as the index, not before:
```
# set index by first column
df.set_index([df.columns[0]], inplace=True)
# check for duplicates - now returns non-empty output
print df[df.index.duplicated()]
# reindex
df = df.reindex(reindex, fill_value=0)
```
Or check for duplicates in the first column instead of the index; the parameter keep=False returns all duplicates (if necessary):
```
# check duplicates in first column
print df[df.iloc[:, 0].duplicated(keep=False)]
# set index + reindex
df.set_index([df.columns[0]], inplace=True)
df = df.reindex(reindex, fill_value=0)
```
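A small self-contained reproduction (with made-up numbers) showing why the duplicate check only fires after set_index, and one way to clear the duplicates before reindexing:

```python
import pandas as pd

# Hypothetical frame whose first column contains a duplicate key
df = pd.DataFrame({'LCC': [2, 3, 3, 5], 'acres': [10, 20, 25, 5]})

# Before set_index, the default RangeIndex has no duplicates...
print(df[df.index.duplicated()])          # empty

# ...but after setting 'LCC' as the index, the duplicate surfaces
df = df.set_index('LCC')
print(df[df.index.duplicated()])          # the second row with LCC == 3

# reindex on a duplicated axis raises; dropping duplicates first avoids it
df = df[~df.index.duplicated(keep='first')]
df = df.reindex([2, 3, 4, 5, 6], fill_value=0)
print(df)
```

Whether keeping the first duplicate is appropriate, or the rows should be aggregated instead, depends on what the duplicates mean in your data.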
I am looping through many smaller dataframes and concatenating them into a single dataframe using pandas.concat(). In the middle of the looping an exception is raised with message ValueError: Plan shapes are not aligned.
The failed dataframe contains a single row (like all the previous dataframes) and the columns are a subset of the other dataframe. A sample snippet of the code is below.
```
import os
import pandas as pd

df, failed = pd.DataFrame(), pd.DataFrame()
for _file in os.listdir(file_dir):
    _tmp = pd.read_csv(file_dir + _file)
    try:
        df = pd.concat([df, _tmp])
    except ValueError as e:
        if 'Plan shapes are not aligned' in str(e):
            failed = pd.concat([failed, _tmp])

print [x for x in failed.columns if x not in df.columns]
print len(df), len(failed)
```
And I end up with the result:

```
Out[10]: []
118 1
```
Checking the failures it is always the same dataframe, so the dataframe must be the problem. Printing out the dataframe I get
```
0             timestamp    actual  average_estimate  median_estimate  \
0   1996-11-14 01:30:00  2.300000          2.380000         2.400000

0   estimate1  estimate2  estimate3  estimate4  \
0    2.400000   2.200000   2.500000   2.600000

0   estimate5
0    2.200000
```
Which has a similar format to the other concatenated dataframes and the df dataframe. Is there something that I'm missing?
Extra info: I am using pandas 0.16.0
Edit: full stack trace below with modifications for anonymity
```
Traceback (most recent call last):
  File "C:\Users\<user>\Documents\GitHub\<environment>\lib\site-packages\IPython\core\interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-48539cb93d64>", line 37, in <module>
    df = pd.concat([df, _tmp])
  File "C:\Users\<user>\Documents\GitHub\<environment>\lib\site-packages\pandas\tools\merge.py", line 755, in concat
    return op.get_result()
  File "C:\Users\<user>\Documents\GitHub\<environment>\lib\site-packages\pandas\tools\merge.py", line 926, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)
  File "C:\Users\<user>\Documents\GitHub\<environment>\lib\site-packages\pandas\core\internals.py", line 4040, in concatenate_block_managers
    for placement, join_units in concat_plan]
  File "C:\Users\<user>\Documents\GitHub\<environment>\lib\site-packages\pandas\core\internals.py", line 4258, in combine_concat_plans
    raise ValueError("Plan shapes are not aligned")
ValueError: Plan shapes are not aligned
```
Edit 2: Tried with 0.17.1 and 0.18.0 and still have the same error.
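No answer is recorded here, but one cause worth ruling out (an educated guess, not confirmed by the output above) is duplicate column names in the offending frame, which old pandas versions could trip over while planning the concat. A quick diagnostic on a hypothetical single-row frame:

```python
import pandas as pd

# Hypothetical single-row frame with a repeated column name
_tmp = pd.DataFrame([[1, 2]], columns=['estimate1', 'estimate1'])

# List any duplicated column labels before concatenating
dupes = _tmp.columns[_tmp.columns.duplicated()].tolist()
print(dupes)
```

If this prints a non-empty list for the failing file, renaming or dropping the duplicate columns before pd.concat is a reasonable next step.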