Unable to read csv in pandas with sys.argv[] - python-2.7

I read a csv file in pandas by specifying the name directly. It works without trouble. But when I try to do the same with sys.argv[] it throws an error. This is what I tried.
import sys
import pandas as pd
filename = sys.argv[-1]
data = pd.read_csv(filename, delimiter=',')
print data
print data[data['logFC'] >= 2]
The error it throws back is:
Traceback (most recent call last): File "/Users/filter_cutoff_genexdata.py", line 7, in <module>
data = pd.read_csv(filename, delimiter=',') File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
return _read(filepath_or_buffer, kwds) File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 285, in
_read
return parser.read() File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 747, in read
ret = self._engine.read(nrows) File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1197, in read
data = self._reader.read(nrows) File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988) File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244) File "pandas/parser.pyx", line 842, in pandas.parser.TextReader._read_rows (pandas/parser.c:8970) File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838) File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649) pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 3
Can anyone please tell why it doesn't work ?

Related

How to fix 'ORA-19011' error in Python cx_Oracle

HI~ I want to query xml data from Oracle db with cx_Oracle, but it doesn't work with Ora-19011 error message. I think size of query data is larger than string buffer, but i don't know how to solve this problem
Oracle DB is an external DB and it's not my own DB, So i can't access directly. Therefore, I want to fix my problem on my code and print query data on python terminal.
(my software version)
windows 10 64bit
python 2.7 64bit
oracle-instant client 19.3 64bit
cx_oracle 7.2.2
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import cx_Oracle
import sys
import csv
import codecs
printHeader = True
conn = cx_Oracle.connect('id/passwd#ip:port/orcl')
print(conn.version)
curs = conn.cursor()
curs.execute('SELECT * FROM tablename')
for record in curs:
print(record)
Error occured at line 18(for record in curs) and here are error messages.
11.2.0.4.0
We've got an error while stopping in unhandled exception: <class 'cx_Oracle.DatabaseError'>.
Traceback (most recent call last):
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1740, in do_stop_on_unhandled_exception
self.do_wait_suspend(thread, frame, 'exception', arg, is_unhandled_exception=True)
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1615, in do_wait_suspend
with self._threads_suspended_single_notification.notify_thread_suspended(thread_id, stop_reason):
File "C:\Python27\lib\contextlib.py", line 17, in __enter__
return self.gen.next()
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 360, in notify_thread_suspended
with AbstractSingleNotificationBehavior.notify_thread_suspended(self, thread_id, stop_reason):
File "C:\Python27\lib\contextlib.py", line 17, in __enter__
return self.gen.next()
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 308, in notify_thread_suspended
self.send_suspend_notification(thread_id, stop_reason)
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 354, in send_suspend_notification
py_db.writer.add_command(py_db.cmd_factory.make_thread_suspend_single_notification(py_db, thread_id, stop_reason))
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\_pydevd_bundle\pydevd_net_command_factory_json.py", line 309, in make_thread_suspend_single_notification
return NetCommand(CMD_THREAD_SUSPEND_SINGLE_NOTIFICATION, 0, event, is_json=True)
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\_pydevd_bundle\pydevd_net_command.py", line 57, in __init__
text = json.dumps(as_dict)
File "C:\Python27\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 11: invalid start byte
Traceback (most recent call last):
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\ptvsd_launcher.py", line 43, in <module>
main(ptvsdArgs)
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\__main__.py", line 432, in main
run()
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\__main__.py", line 316, in run_file
runpy.run_path(target, run_name='__main__')
File "C:\Python27\lib\runpy.py", line 252, in run_path
return _run_module_code(code, init_globals, run_name, path_name)
File "C:\Python27\lib\runpy.py", line 82, in _run_module_code
mod_name, mod_fname, mod_loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "c:\PythonWorkspace\oraclePrc\test1.py", line 18, in <module>
for record in curs:
cx_Oracle.DatabaseError: ORA-19011: Character string buffer too small
When you connect to the database, try using this code instead:
conn = cx_Oracle.connect('id/passwd#ip:port/orcl', encoding="UTF-8", nencoding="UTF-8")
This will ensure that you are using a universal encoding -- which may eliminate the first error, and possibly the second as well. If not, adjust the code sample and error messages noted above.

Python Pandas 'str' object is not callable

I am new to Python and was trying the Pandas library. Here is the code to read a CSV file without headers:
import pandas as pnd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
pnd.set_option('max_columns', 50)
mpl.rcParams['lines.linewidth'] = 2
headers = ['OrderId', 'OrderDate', 'UserId', 'TotalCharges']
dtypes = {'OrderId': 'int', 'OrderDate': 'str', 'UserId': 'int', 'TotalCharges':'float'}
parse_dates = ['OrderDate']
df = pnd.read_csv('Raw_flight_data.csv', sep='\t', header=None,
names=headers,converters=dtypes,parse_dates=parse_dates)
This code gives me an error :-
runfile('C:/Users/rohan.arora/Desktop/Python/example.py', wdir='C:/Users/rohan.arora/Desktop/Python')
Traceback (most recent call last):
File "<ipython-input-47-43fc22883149>", line 1, in <module>
runfile('C:/Users/rohan.arora/Desktop/Python/example.py', wdir='C:/Users/rohan.arora/Desktop/Python')
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/rohan.arora/Desktop/Python/example.py", line 13, in <module>
names=headers,converters=dtypes,parse_dates=parse_dates)
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 401, in _read
data = parser.read()
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 939, in read
ret = self._engine.read(nrows)
File "C:\Users\rohan.arora\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1508, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
File "pandas\parser.pyx", line 947, in pandas.parser.TextReader._read_rows (pandas\parser.c:11728)
File "pandas\parser.pyx", line 1044, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:13129)
File "pandas\parser.pyx", line 2115, in pandas.parser._apply_converter (pandas\parser.c:28771)
TypeError: 'str' object is not callable
I am using Anaconda Spyder 3.1.2 and running Python 2.7.13.
I think you need remove ' for types, not string representation of types:
dtypes = {'OrderId': 'int', 'OrderDate': 'str', 'UserId': 'int', 'TotalCharges':'float'}
to:
dtypes = {'OrderId': int, 'OrderDate': str, 'UserId': int, 'TotalCharges': float}

reading csv file in python?

i am working on a machine learning project where i am supposed to read a csv file to build a linear regression model and here is i read the csv file
data_test = pd.read_csv("/media/halawa/93B77F681EC1B4D2/GUC/Semster 8/CSEN 1022 Machine Learning/2/test.csv",delimiter=",", header=0)
but when i run i got this error
/usr/bin/python2.7 /home/halawa/PycharmProjects/ML/evergreen.py
Traceback (most recent call last):
File "/home/halawa/PycharmProjects/ML/evergreen.py", line 24, in <module>
data_test = pd.read_csv("/media/halawa/93B77F681EC1B4D2/GUC/Semster 8/CSEN 1022 Machine Learning/2/test.csv",delimiter=",", header=0)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 470, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 256, in _read
return parser.read()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 715, in read
ret = self._engine.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1164, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 758, in pandas.parser.TextReader.read (pandas/parser.c:7411)
File "pandas/parser.pyx", line 780, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7651)
File "pandas/parser.pyx", line 833, in pandas.parser.TextReader._read_rows (pandas/parser.c:8268)
File "pandas/parser.pyx", line 820, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8142)
File "pandas/parser.pyx", line 1758, in pandas.parser.raise_parser_error (pandas/parser.c:20728)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8
Process finished with exit code 1
Your issue is that your CSV doesn't have a consistent number of fields on each line. For example, it appears that the first line has 3 fields
x,y,z
While the third line has 8
x,y,z,a,b,c,d,e
You will need to fix your source CSV file to avoid this error.
Alternatively, if you know that you have 8 fields max, and are ok with some lines missing fields you can use names:
data_test = pd.read_csv("/media/halawa/93B77F681EC1B4D2/GUC/Semster 8/CSEN 1022 Machine Learning/2/test.csv",delimiter=",", header=0, names=list('abcdefgh'))
This parameter tells the CSV reader how many fields to expect, and the rest are filled in with a default value.
EDIT:
If your null columns are marked with a ? then you should set the pandas na_values parameter like so:
data_test = pd.read_csv("/media/halawa/93B77F681EC1B4D2/GUC/Semster 8/CSEN 1022 Machine Learning/2/test.csv",delimiter=",", header=0, na_values=['?'])

use random forest to classifier review, but hat key error?

I have follow code in python:
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier(n_estimators = 100)
forest = forest.fit( train_data_features, train["sentiment"] )
but have key error for "sentiment", I don't know why,
train = pd.read_csv("labeledTrainData.tsv", header=0, delimiter="\t", quoting=3)
-Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site--packages/pandas/core/frame.py", line 1780, in __getitem__
return self._getitem_column(key)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 1787, in _getitem_column
return self._get_item_cache(key)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 1068, in _get_item_cache
values = self._data.get(item)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 2849, in get
loc = self.items.get_loc(item)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.py", line 1402, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3807)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3687)
File "pandas/hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12310)
File "pandas/hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12261)
KeyError: 'sentiment'
Are you doing the Kaggle competition? https://www.kaggle.com/c/word2vec-nlp-tutorial/data
Are you sure you have downloaded and decompressed the file ok? The first part of the file reads:
id sentiment review
"5814_8" 1 "With all this stuff go
This works for me:
>>> train = pd.read_csv("labeledTrainData.tsv", delimiter="\t")
>>> train.columns
Index([u'id', u'sentiment', u'review'], dtype='object')
>>> train.head(3)
id sentiment review
0 5814_8 1 With all this stuff going down at the moment w...
1 2381_9 1 \The Classic War of the Worlds\" by Timothy Hi...
2 7759_3 0 The film starts with a manager (Nicholas Bell)...
You should check the columns are setup correctly in the train variable. You should have a sentiment column. That column seems to be missing in your dataframe.

MemoryError in Django Sphinx!

The whole error is:
Traceback (most recent call last):
File "C:/Documents and Settings/aaa/trysphinx/views.py", line 30, in <module>
print list(results)
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\models.py", line 243, in __iter__
return iter(self._get_data())
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\models.py", line 422, in _get_data
self._result_cache = list(self._get_results())
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\models.py", line 557, in _get_results
results = self._get_sphinx_results()
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\models.py", line 529, in _get_sphinx_results
results = client.Query(self._query, self._index)
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\apis\api263\__init__.py", line 388, in Query
response = self._GetResponse(sock, VER_COMMAND_SEARCH)
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\apis\api263\__init__.py", line 144, in _GetResponse
chunk = sock.recv(left)
MemoryError
Please help me.
You're loading more data into memory than your computer can handle. Perhaps rather than:
print list(results)
…do:
for r in results:
print r