pandas get_group memory error - python-2.7

I am using pandas v0.14.1 with python 2.7
I have a groupby object and I am trying to pull out a group identified by particular key. The key is in fact in the group:
>>> key in key_groups.groups.keys()
True
but when I try to make the get_group call it fails with a memory error:
>>>> key_groups.get_group(key)
*** MemoryError:
The full stacktrace is:
Traceback (most recent call last):
File "main.py", line 141, in <module>
main(num_days=arguments.days, num_variants=arguments.variants)
File "main.py", line 76, in main
problem, solution = Solver.Solve(request, num_variants)
File "/srv/compunctuator/src/Solver.py", line 49, in Solve
solution = attempt_minimization(t)
File "/srv/compunctuator/src/Solver.py", line 41, in attempt_minimization
t.scruple()
File "/srv/compunctuator/src/Compunctuator.py", line 136, in scruple
self.__iterate__()
File "/srv/compunctuator/src/Compunctuator.py", line 95, in __iterate__
self.__maximize_impressions__()
File "/srv/compunctuator/src/Compunctuator.py", line 583, in __maximize_impressions__
df = key_groups.get_group(key)
File "/srv/compunctuator/.virtualenvs/compunctuator/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 573, in get_group
inds = self._get_index(name)
File "/srv/compunctuator/.virtualenvs/compunctuator/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 429, in _get_index
sample = next(iter(self.indices))
File "/srv/compunctuator/.virtualenvs/compunctuator/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 414, in indices
return self.grouper.indices
File "properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:36380)
File "/srv/compunctuator/.virtualenvs/compunctuator/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 1253, in indices
return _get_indices_dict(label_list, keys)
File "/srv/compunctuator/.virtualenvs/compunctuator/local/lib/python2.7/site-packages/pandas/core/groupby.py", line 3474, in _get_indices_dict
np.prod(shape))
File "algos.pyx", line 1997, in pandas.algos.groupsort_indexer (pandas/algos.c:37521) MemoryError
If I actually use the dictionary lookup I can get the indices out:
>>>> key_groups.groups[key]
[0, 2]
It seems like everything should work here.
I realize a similar question was asked here pandas get_group causes memory error
but it was never resolved and I thought I could give more details if necessary.

Related

Compatibility with GCP aiplatform, bigquery and cloud-storage on hyperparameter tuning docker image

I am doing hyperparameter tuning on GCP using this scikit docker image. When I add the aiplatform package as a dependency, things break. The error comes from the bigquery import.
from google.cloud import bigquery
The error message is below.
The replica workerpool0-0 exited with a non-zero status of 1.
Traceback (most recent call last):
[...]
File "/root/.local/lib/python3.7/site-packages/trainer/task.py", line 7, in
from google.cloud import storage, bigquery
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/__init__.py", line 35, in
from google.cloud.bigquery.client import Client
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/client.py", line 60, in
from google.cloud.bigquery import _pandas_helpers
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/_pandas_helpers.py", line 40, in
from google.cloud.bigquery import schema
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/schema.py", line 19, in
from google.cloud.bigquery_v2 import types
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery_v2/__init__.py", line 23, in
from google.cloud.bigquery_v2 import types
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery_v2/types.py", line 23, in
from google.cloud.bigquery_v2.proto import encryption_config_pb2
File "/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery_v2/proto/encryption_config_pb2.py", line 64, in
file=DESCRIPTOR,
File "/root/.local/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 560, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
From the logs, I can see the system is downloading google-cloud-aiplatform v1.17.0. According to the scikit docker image, google-cloud-storage v1.35.0 is installed, but google-cloud-aiplatform drags in v2.5.0.
I am thinking I need to downgrade google-cloud-aiplatform to a specific version. Anyone know which version or how to resolve this problem?
UPDATE: FWIW, if I downgrade google-cloud-aiplatform==1.15.1 then the problem above goes away. However, this problem below shows.
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.7/site-packages/trainer/hpt.py", line 170, in
staging_bucket=f'{args.bucket_uri}'
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/initializer.py", line 138, in init
backing_tensorboard=experiment_tensorboard,
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/metadata/metadata.py", line 235, in set_experiment
experiment_name=experiment, description=description
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/metadata/experiment_resources.py", line 247, in get_or_create
project=project, location=location, credentials=credentials
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/metadata/metadata_store.py", line 283, in ensure_default_metadata_store_exists
encryption_spec_key_name=encryption_key_spec_name,
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/metadata/metadata_store.py", line 123, in get_or_create
credentials=credentials,
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/metadata/metadata_store.py", line 241, in _get
credentials=credentials,
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/metadata/metadata_store.py", line 73, in __init__
self._gca_resource = self._get_gca_resource(resource_name=metadata_store_name)
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/base.py", line 617, in _get_gca_resource
return getattr(self.api_client, self._getter_method)(
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/utils/__init__.py", line 425, in __getattr__
return getattr(self._clients[self._default_version], name)
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform/utils/__init__.py", line 359, in __getattr__
client_info=self._client_info,
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1/services/metadata_service/client.py", line 547, in __init__
api_audience=client_options.api_audience,
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1/services/metadata_service/transports/grpc.py", line 190, in __init__
("grpc.max_receive_message_length", -1),
File "/root/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1/services/metadata_service/transports/grpc.py", line 241, in create_channel
**kwargs,
File "/root/.local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 318, in create_channel
default_host=default_host,
File "/root/.local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 239, in _create_composite_credentials
credentials, scopes=scopes, default_scopes=default_scopes
TypeError: with_scopes_if_required() got an unexpected keyword argument 'default_scopes'

How to fix 'ORA-19011' error in Python cx_Oracle

HI~ I want to query xml data from Oracle db with cx_Oracle, but it doesn't work with Ora-19011 error message. I think size of query data is larger than string buffer, but i don't know how to solve this problem
Oracle DB is an external DB and it's not my own DB, So i can't access directly. Therefore, I want to fix my problem on my code and print query data on python terminal.
(my software version)
windows 10 64bit
python 2.7 64bit
oracle-instant client 19.3 64bit
cx_oracle 7.2.2
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import cx_Oracle
import sys
import csv
import codecs
printHeader = True
conn = cx_Oracle.connect('id/passwd#ip:port/orcl')
print(conn.version)
curs = conn.cursor()
curs.execute('SELECT * FROM tablename')
for record in curs:
print(record)
Error occured at line 18(for record in curs) and here are error messages.
11.2.0.4.0
We've got an error while stopping in unhandled exception: <class 'cx_Oracle.DatabaseError'>.
Traceback (most recent call last):
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1740, in do_stop_on_unhandled_exception
self.do_wait_suspend(thread, frame, 'exception', arg, is_unhandled_exception=True)
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1615, in do_wait_suspend
with self._threads_suspended_single_notification.notify_thread_suspended(thread_id, stop_reason):
File "C:\Python27\lib\contextlib.py", line 17, in __enter__
return self.gen.next()
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 360, in notify_thread_suspended
with AbstractSingleNotificationBehavior.notify_thread_suspended(self, thread_id, stop_reason):
File "C:\Python27\lib\contextlib.py", line 17, in __enter__
return self.gen.next()
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 308, in notify_thread_suspended
self.send_suspend_notification(thread_id, stop_reason)
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 354, in send_suspend_notification
py_db.writer.add_command(py_db.cmd_factory.make_thread_suspend_single_notification(py_db, thread_id, stop_reason))
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\_pydevd_bundle\pydevd_net_command_factory_json.py", line 309, in make_thread_suspend_single_notification
return NetCommand(CMD_THREAD_SUSPEND_SINGLE_NOTIFICATION, 0, event, is_json=True)
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\_vendored\pydevd\_pydevd_bundle\pydevd_net_command.py", line 57, in __init__
text = json.dumps(as_dict)
File "C:\Python27\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 11: invalid start byte
Traceback (most recent call last):
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\ptvsd_launcher.py", line 43, in <module>
main(ptvsdArgs)
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\__main__.py", line 432, in main
run()
File "c:\Users\goo41\.vscode\extensions\ms-python.python-2019.8.30787\pythonFiles\lib\python\ptvsd\__main__.py", line 316, in run_file
runpy.run_path(target, run_name='__main__')
File "C:\Python27\lib\runpy.py", line 252, in run_path
return _run_module_code(code, init_globals, run_name, path_name)
File "C:\Python27\lib\runpy.py", line 82, in _run_module_code
mod_name, mod_fname, mod_loader, pkg_name)
File "C:\Python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "c:\PythonWorkspace\oraclePrc\test1.py", line 18, in <module>
for record in curs:
cx_Oracle.DatabaseError: ORA-19011: Character string buffer too small
When you connect to the database, try using this code instead:
conn = cx_Oracle.connect('id/passwd#ip:port/orcl', encoding="UTF-8", nencoding="UTF-8")
This will ensure that you are using a universal encoding -- which may eliminate the first error, and possibly the second as well. If not, adjust the code sample and error messages noted above.

Error while using cql connect

con = cql.connect(host="127.0.0.1",port=9160,keyspace="my_keyspace",cql_version='3.0.0')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.linux-x86_64/egg/cql/connection.py", line 143, in connect
File "build/bdist.linux-x86_64/egg/cql/connection.py", line 63, in __init__
File "build/bdist.linux-x86_64/egg/cql/thrifteries.py", line 177, in set_initial_keyspace
File "build/bdist.linux-x86_64/egg/cql/cursor.py", line 80, in execute
File "build/bdist.linux-x86_64/egg/cql/thrifteries.py", line 77, in get_response
File "build/bdist.linux-x86_64/egg/cql/thrifteries.py", line 98, in handle_cql_execution_errors
cql.apivalues.ProgrammingError: Bad Request: You have not logged in
I am very new to cassandra,please let me know what wrong am i doing here.
Assuming you are using https://pypi.python.org/pypi/cql, if you go to the home page, it says this is deprecated https://code.google.com/archive/a/apache-extras.org/p/cassandra-dbapi2
I would recommend using cassandra-driver
https://datastax.github.io/python-driver
Go to \var\log\cassandra there you must find system.log file. There you should see an error

Attempting to debug terminal applications made with Python+Blessed using ipdb breaks IPython?

I am using the Blessed library to build a simple terminal application.
My application builds upon the following simple example for a dumb editor: https://github.com/jquast/blessed/blob/master/bin/editor.py
Warning: the following steps will break your IPython, and I don't know how to fix it!
For the purposes of this question, I'll just use editor.py. Let's make a couple of changes to allow debugging:
1) import ipdb
2) put in ipdb.set_trace() on line 224
Run editor.py now: python editor.py. The following error should be produced:
Traceback (most recent call last):
File "editor.py", line 14, in <module>
from manager import Manager
File "/home/abcd/python_scripts/editor.py", line 25, in <module>
import ipdb
File "/usr/local/lib/python2.7/dist-packages/ipdb/__init__.py", line 7, in <module>
from ipdb.__main__ import set_trace, post_mortem, pm, run, runcall, runeval, launch_ipdb_on_exception
File "/usr/local/lib/python2.7/dist-packages/ipdb/__main__.py", line 47, in <module>
ipapp.initialize([])
File "<decorator-gen-110>", line 2, in initialize
File "/usr/lib/python2.7/dist-packages/IPython/config/application.py", line 92, in catch_config_error
return method(app, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/terminal/ipapp.py", line 332, in initialize
self.init_shell()
File "/usr/lib/python2.7/dist-packages/IPython/terminal/ipapp.py", line 348, in init_shell
ipython_dir=self.ipython_dir, user_ns=self.user_ns)
File "/usr/lib/python2.7/dist-packages/IPython/config/configurable.py", line 354, in instance
inst = cls(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/terminal/interactiveshell.py", line 328, in __init__
**kwargs
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 483, in __init__
self.init_readline()
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 1843, in init_readline
self.readline_startup_hook = readline.set_startup_hook
AttributeError: 'module' object has no attribute 'set_startup_hook'
If you suspect this is an IPython bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev#scipy.org
You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.
Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
c.Application.verbose_crash=True
Now, whenever one runs IPython by executing the ipython command, this error will be produced:
Traceback (most recent call last):
File "/usr/bin/ipython", line 5, in <module>
start_ipython()
File "/usr/lib/python2.7/dist-packages/IPython/__init__.py", line 120, in start_ipython
return launch_new_instance(argv=argv, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/config/application.py", line 564, in launch_instance
app.initialize(argv)
File "<decorator-gen-110>", line 2, in initialize
File "/usr/lib/python2.7/dist-packages/IPython/config/application.py", line 92, in catch_config_error
return method(app, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/terminal/ipapp.py", line 332, in initialize
self.init_shell()
File "/usr/lib/python2.7/dist-packages/IPython/terminal/ipapp.py", line 348, in init_shell
ipython_dir=self.ipython_dir, user_ns=self.user_ns)
File "/usr/lib/python2.7/dist-packages/IPython/config/configurable.py", line 354, in instance
inst = cls(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/terminal/interactiveshell.py", line 328, in __init__
**kwargs
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 483, in __init__
self.init_readline()
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 1843, in init_readline
self.readline_startup_hook = readline.set_startup_hook
AttributeError: 'module' object has no attribute 'set_startup_hook'
If you suspect this is an IPython bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev#scipy.org
You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.
Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
c.Application.verbose_crash=True
So, IPython seems to be globally broken. I have gotten this issue on both Cygwin and Ubuntu.
What's going wrong?

MemoryError in Django Sphinx!

The whole error is:
Traceback (most recent call last):
File "C:/Documents and Settings/aaa/trysphinx/views.py", line 30, in <module>
print list(results)
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\models.py", line 243, in __iter__
return iter(self._get_data())
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\models.py", line 422, in _get_data
self._result_cache = list(self._get_results())
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\models.py", line 557, in _get_results
results = self._get_sphinx_results()
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\models.py", line 529, in _get_sphinx_results
results = client.Query(self._query, self._index)
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\apis\api263\__init__.py", line 388, in Query
response = self._GetResponse(sock, VER_COMMAND_SEARCH)
File "C:\Python27\lib\site-packages\django_sphinx-2.2.4-py2.7.egg\djangosphinx\apis\api263\__init__.py", line 144, in _GetResponse
chunk = sock.recv(left)
MemoryError
Please help me.
You're loading more data into memory than your computer can handle. Perhaps rather than:
print list(results)
…do:
for r in results:
print r