I used two CSV files to create a dataframe:
import matplotlib.pyplot as plt
import pandas as pd
#import data
# raw strings keep the backslashes in Windows paths from being read as escapes
df1= pd.read_csv(r"C:\Users\Meiji\Desktop\CNST 6308-python\Hourly_TTI.csv")
df2= pd.read_csv(r"C:\Users\Meiji\Desktop\CNST 6308-python\Weather.csv")
#standardize date format
df1['new_date']= pd.to_datetime(df1['Date'])
df2['new_date']= pd.to_datetime(df2['EST'])
#merge TTI and weather dataframe
df=pd.merge(df1,df2,on=['new_date'])
#plot
df[df["Events"]=='Rain-Hail-Thunderstorm'].groupby('Hour').mean()['TTI'].plot()
df[df["Events"]!='Rain-Hail-Thunderstorm'].groupby('Hour').mean()['TTI'].plot()
plt.ylabel('TTI')
plt.legend(['Rain-Hail-Thunderstorm','Others'])
plt.show()
Here is the error I'm getting:
Process started (PID=27504) >>>
Traceback (most recent call last):
File "C:\Users\Meiji\Desktop\CNST 6308-python\HW4\new 2.py", line 12, in <module>
df[df["Events"]=='Rain-Hail-Thunderstorm'].groupby('Hour').mean()['TTI'].plot()
File "c:\python27\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "c:\python27\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Events'
<<< Process finished (PID=27504). (Exit code 1)
What am I missing? My friend has the same code, and it runs perfectly for him.
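A KeyError: 'Events' means the merged frame has no column with exactly that name. Two common causes are a header in the CSV that differs slightly (for example a leading space), or a column that exists in both inputs and gets _x/_y suffixes from the merge. The post does not say which applies here, so the sketch below uses made-up data (the leading space in ' Events' is an assumption) to show one way to inspect and normalize the headers before plotting.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files. The leading space in ' Events'
# is an assumption used to reproduce the KeyError, not a fact about Weather.csv.
df1 = pd.DataFrame({'Date': ['2015-01-01', '2015-01-02'],
                    'Hour': [8, 8],
                    'TTI': [1.2, 1.5]})
df2 = pd.DataFrame({'EST': ['2015-01-01', '2015-01-02'],
                    ' Events': ['Rain', 'Rain-Hail-Thunderstorm']})

# Printing the header list shows the exact strings, including stray spaces.
print(list(df2.columns))

# Strip surrounding whitespace from every column name.
df1.columns = df1.columns.str.strip()
df2.columns = df2.columns.str.strip()

df1['new_date'] = pd.to_datetime(df1['Date'])
df2['new_date'] = pd.to_datetime(df2['EST'])
df = pd.merge(df1, df2, on=['new_date'])

# After normalizing, the lookup that raised KeyError succeeds.
print(df[df['Events'] == 'Rain-Hail-Thunderstorm'])
```

If the printed header list already contains 'Events' exactly, it is also worth printing list(df.columns) after the merge to look for suffixed names such as Events_x.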
I am new to Python and was trying the Pandas library. Here is the code to read a CSV file without headers:
import pandas as pnd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
pnd.set_option('max_columns', 50)
mpl.rcParams['lines.linewidth'] = 2
headers = ['OrderId', 'OrderDate', 'UserId', 'TotalCharges']
dtypes = {'OrderId': 'int', 'OrderDate': 'str', 'UserId': 'int', 'TotalCharges':'float'}
parse_dates = ['OrderDate']
df = pnd.read_csv('Raw_flight_data.csv', sep='\t', header=None,
names=headers,converters=dtypes,parse_dates=parse_dates)
This code gives me an error:
runfile('C:/Users/rohan.arora/Desktop/Python/example.py', wdir='C:/Users/rohan.arora/Desktop/Python')
Traceback (most recent call last):
File "<ipython-input-47-43fc22883149>", line 1, in <module>
runfile('C:/Users/rohan.arora/Desktop/Python/example.py', wdir='C:/Users/rohan.arora/Desktop/Python')
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/rohan.arora/Desktop/Python/example.py", line 13, in <module>
names=headers,converters=dtypes,parse_dates=parse_dates)
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 401, in _read
data = parser.read()
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 939, in read
ret = self._engine.read(nrows)
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1508, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
File "pandas\parser.pyx", line 947, in pandas.parser.TextReader._read_rows (pandas\parser.c:11728)
File "pandas\parser.pyx", line 1044, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:13129)
File "pandas\parser.pyx", line 2115, in pandas.parser._apply_converter (pandas\parser.c:28771)
TypeError: 'str' object is not callable
I am using Anaconda Spyder 3.1.2 and running Python 2.7.13.
I think you need to remove the quotes so that the values are the types themselves, not string representations of the types:
dtypes = {'OrderId': 'int', 'OrderDate': 'str', 'UserId': 'int', 'TotalCharges':'float'}
to:
dtypes = {'OrderId': int, 'OrderDate': str, 'UserId': int, 'TotalCharges': float}
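As a rough illustration of the difference: converters expects a callable per column, which is why a string such as 'int' fails with "'str' object is not callable", while the separate dtype parameter does accept type names given as strings. The sketch below uses a made-up two-row sample in place of Raw_flight_data.csv, whose contents are not shown in the question.

```python
import io
import pandas as pd

# Made-up tab-separated sample standing in for Raw_flight_data.csv.
sample = u"1\t2017-01-05\t10\t99.5\n2\t2017-01-06\t11\t15.0\n"
headers = ['OrderId', 'OrderDate', 'UserId', 'TotalCharges']

# converters maps a column name to a function applied to each raw string value.
converters = {'OrderId': int, 'UserId': int, 'TotalCharges': float}
df = pd.read_csv(io.StringIO(sample), sep='\t', header=None, names=headers,
                 converters=converters, parse_dates=['OrderDate'])
print(df.dtypes)

# dtype maps a column name to a type; string names such as 'float' are accepted here.
dtypes = {'OrderId': 'int', 'UserId': 'int', 'TotalCharges': 'float'}
df2 = pd.read_csv(io.StringIO(sample), sep='\t', header=None, names=headers,
                  dtype=dtypes, parse_dates=['OrderDate'])
print(df2.dtypes)
```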
I am trying to use sklearn's TfidfVectorizer. I am having trouble, probably because my input does not match what TfidfVectorizer expects. I loaded a bunch of JSON documents and appended them to a list, and I now want that list to be the corpus for TfidfVectorizer.
The code:
import json
import pandas
from sklearn.feature_extraction.text import TfidfVectorizer
train=pandas.read_csv("train.tsv", sep='\t')
documents=[]
for i,row in train.iterrows():
    data = json.loads(row['boilerplate'].lower())
    documents.append(data['body'])
vectorizer=TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(documents)
idf = vectorizer.idf_
print dict(zip(vectorizer.get_feature_names(), idf))
I am getting the following error:
Traceback (most recent call last):
File "<ipython-input-56-94a6b95b0745>", line 1, in <module>
runfile('C:/Users/Guinea Pig/Downloads/try.py', wdir='C:/Users/Guinea Pig/Downloads')
File "D:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 585, in runfile
execfile(filename, namespace)
File "C:/Users/Guinea Pig/Downloads/try.py", line 19, in <module>
X = vectorizer.fit_transform(documents)
File "D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 1219, in fit_transform
X = super(TfidfVectorizer, self).fit_transform(raw_documents)
File "D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 780, in fit_transform
vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
File "D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 715, in _count_vocab
for feature in analyze(doc):
File "D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 229, in <lambda>
tokenize(preprocess(self.decode(doc))), stop_words)
File "D:\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 195, in <lambda>
return lambda x: strip_accents(x.lower())
AttributeError: 'NoneType' object has no attribute 'lower'
I understand that the documents list consists of Unicode objects rather than string objects, but I can't seem to solve this issue. Any ideas?
Eventually I used:
str_docs=[]
for item in documents:
    # encode each item itself; the original indexed documents with a stale i
    str_docs.append(item.encode('utf-8'))
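Note that the traceback reports 'NoneType' object has no attribute 'lower', which points to at least one entry in documents being None (for example a record whose body is null or missing) rather than to a Unicode/str mismatch. A minimal sketch of guarding against that, using made-up records in place of the 'boilerplate' column:

```python
import json

# Made-up records standing in for the 'boilerplate' column.
# The second has a null body and the third has no body key at all.
rows = [u'{"title": "a", "body": "First page text"}',
        u'{"title": "b", "body": null}',
        u'{"title": "c"}']

documents = []
for raw in rows:
    data = json.loads(raw.lower())
    # data.get returns None for a missing key; 'or' also replaces an explicit null.
    documents.append(data.get('body') or u'')

print(documents)
```

The cleaned list can then be passed to vectorizer.fit_transform(documents) as in the question. Whether to keep empty strings or drop those rows depends on whether the matrix has to stay aligned with the rows of train.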
I'm having problems reading an OpenStreetMap buildings (IMPOSM GeoJSON) file into a geopandas data frame object (Python 2.7). This is on Mac OS X 10.11.3. Here are the messages I'm getting:
>>> import geopandas as gpd
>>> df=gpd.read_file('san-francisco-bay_california_buildings.geojson')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ewang/anaconda/lib/python2.7/site-packages/geopandas/io/file.py", line 28, in read_file
gdf = GeoDataFrame.from_features(f, crs=crs)
File "/Users/ewang/anaconda/lib/python2.7/site-packages/geopandas/geodataframe.py", line 193, in from_features
d = {'geometry': shape(f['geometry']) if f['geometry'] else None}
File "/Users/ewang/anaconda/lib/python2.7/site-packages/shapely/geometry/geo.py", line 34, in shape
return Polygon(ob["coordinates"][0], ob["coordinates"][1:])
File "/Users/ewang/anaconda/lib/python2.7/site-packages/shapely/geometry/polygon.py", line 229, in __init__
self._geom, self._ndim = geos_polygon_from_py(shell, holes)
File "/Users/ewang/anaconda/lib/python2.7/site-packages/shapely/geometry/polygon.py", line 508, in geos_polygon_from_py
geos_shell, ndim = geos_linearring_from_py(shell)
File "/Users/ewang/anaconda/lib/python2.7/site-packages/shapely/geometry/polygon.py", line 450, in geos_linearring_from_py
n = len(ob[0])
IndexError: list index out of range
The odd thing is that I can load OSM roads data IMPOSM GEOJSON files with geopandas. Am I missing something obvious here? Thanks very much.
EDIT - link to the data below:
OSM data from mapzen
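The last frames of the traceback show shapely building a Polygon from ob["coordinates"][0] and then failing on len(ob[0]), which is what happens when a polygon's outer ring is an empty list. That suggests, although the post does not confirm it, that the buildings file contains features with empty coordinates. A possible way to check with only the json module is sketched below; the features are made up and has_empty_ring is a hypothetical helper, not part of geopandas or shapely.

```python
import json

def has_empty_ring(geometry):
    """Return True if a Polygon geometry has no rings or an empty outer ring."""
    if not geometry or geometry.get('type') != 'Polygon':
        return False
    coords = geometry.get('coordinates') or []
    return len(coords) == 0 or len(coords[0]) == 0

# Made-up FeatureCollection; the second feature has an empty outer ring.
collection = json.loads(u'''
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"id": 1},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
  {"type": "Feature", "properties": {"id": 2},
   "geometry": {"type": "Polygon", "coordinates": [[]]}}
]}
''')

# Ids of the features that would trip the IndexError, and the remaining features.
bad = [f['properties']['id'] for f in collection['features']
       if has_empty_ring(f['geometry'])]
good = [f for f in collection['features'] if not has_empty_ring(f['geometry'])]
print(bad)
```

With the real file, the same check could be run on json.load(open(...)) and the surviving features passed to GeoDataFrame.from_features, the call that appears in the traceback.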
I am a beginner in programming and started with Python. I am learning from Dr. Charles Severance's materials. His book has this example:
import urllib
fhand = urllib.urlopen('http://www.py4inf.com/code/rom...
for line in fhand:
print line.strip()
When I copy and paste it into Python 2 (I use PyCharm 5.0.4), I get:
Traceback (most recent call last):
File "D:/Python4yk/temprehg111.py", line 2, in <module>
fhand = urllib.urlopen('http://www.py4inf.com/code/rom...
File "C:\Python27\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "C:\Python27\lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Python27\lib\urllib.py", line 292, in open_http
import httplib
File "C:\Python27\lib\httplib.py", line 79, in <module>
import mimetools
File "C:\Python27\lib\mimetools.py", line 6, in <module>
import tempfile
File "C:\Python27\lib\tempfile.py", line 35, in <module>
from random import Random as _Random
File "random.py", line 3, in <module>
integers
NameError: name 'line' is not defined
When I type in another example, I get an error as well. What is wrong? I didn't even write a program; I just copied and pasted an example. I asked Dr. Chuck but have had no answer yet.
Try this:
import urllib
fhand = urllib.urlopen('http://www.py4inf.com')
for line in fhand:
    print line.strip() # notice the indentation
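Separately from indentation, the traceback in the question ends in File "random.py", line 3, shown with a relative path rather than under C:\Python27\lib. That suggests a file named random.py in the project folder may be shadowing the standard library module that tempfile imports. A small sketch for checking where a module was loaded from (module_dir is a hypothetical helper):

```python
import os
import random

def module_dir(module):
    """Return the absolute directory a module was loaded from."""
    return os.path.dirname(os.path.abspath(module.__file__))

# Directory of the standard library, taken from the os module's location.
stdlib_dir = module_dir(os)

print(module_dir(random))
# False here would mean 'random' was imported from somewhere else,
# for example a random.py sitting next to the script.
print(module_dir(random) == stdlib_dir)
```

If the check prints False, renaming the local random.py (and deleting any leftover random.pyc) would be the thing to try.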
I have the following code in Python:
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators = 100)
forest = forest.fit( train_data_features, train["sentiment"] )
but I get a KeyError for "sentiment" and I don't know why. The data is loaded with:
train = pd.read_csv("labeledTrainData.tsv", header=0, delimiter="\t", quoting=3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 1780, in __getitem__
return self._getitem_column(key)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 1787, in _getitem_column
return self._get_item_cache(key)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 1068, in _get_item_cache
values = self._data.get(item)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 2849, in get
loc = self.items.get_loc(item)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.py", line 1402, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3807)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3687)
File "pandas/hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12310)
File "pandas/hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12261)
KeyError: 'sentiment'
Are you doing the Kaggle competition? https://www.kaggle.com/c/word2vec-nlp-tutorial/data
Are you sure the file was downloaded and decompressed correctly? The first part of the file reads:
id sentiment review
"5814_8" 1 "With all this stuff go
This works for me:
>>> train = pd.read_csv("labeledTrainData.tsv", delimiter="\t")
>>> train.columns
Index([u'id', u'sentiment', u'review'], dtype='object')
>>> train.head(3)
id sentiment review
0 5814_8 1 With all this stuff going down at the moment w...
1 2381_9 1 \The Classic War of the Worlds\" by Timothy Hi...
2 7759_3 0 The film starts with a manager (Nicholas Bell)...
You should check that the columns are set up correctly in the train variable. You should have a sentiment column. That column seems to be missing in your dataframe.
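One way to check the header before involving pandas is to read only the first row with the csv module. The sketch below uses an in-memory sample that mimics the first lines of the file quoted above; with the real data, the opened labeledTrainData.tsv would take the place of the StringIO object.

```python
import csv
import io

# In-memory stand-in mimicking the first lines of the file quoted above.
sample = u'id\tsentiment\treview\n"5814_8"\t1\t"With all this stuff"\n'

reader = csv.reader(io.StringIO(sample), delimiter='\t')
header = next(reader)
print(header)

# List any expected column that the header row does not contain.
required = ['id', 'sentiment', 'review']
missing = [name for name in required if name not in header]
print(missing)
```

A non-empty missing list, or a header that comes back as a single long string, would point to a damaged download or a wrong delimiter rather than to the classifier code.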