How to fix 'MemoryError' while fitting keras model with fit_generator? - python-2.7

I'm training a U-Net on a Google Cloud TensorFlow VM instance. When I run fit_generator, I get a MemoryError.
When I run the same code locally with TensorFlow (CPU version), this does not happen. I have tried increasing the RAM on the VM instance to 13 GB (more than my local machine has).
model = unet()
model_checkpoint = ModelCheckpoint('unet_membrane.hdf5', monitor='loss',verbose=1, save_best_only=True)
model.fit_generator(myGene,steps_per_epoch=300,epochs=1,callbacks=[model_checkpoint])
I expect the model to train, but instead I get a MemoryError with the following traceback:
Epoch 1/1
Found 30 images belonging to 1 classes.
Found 30 images belonging to 1 classes.
Traceback (most recent call last):
File "main.py", line 18, in <module>
model.fit_generator(myGene,steps_per_epoch=300,epochs=1,callbacks=[model_checkpoint])
File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_generator.py", line 181, in fit_generator
generator_output = next(output_generator)
File "/usr/local/lib/python2.7/dist-packages/keras/utils/data_utils.py", line 709, in get
six.reraise(*sys.exc_info())
File "/usr/local/lib/python2.7/dist-packages/keras/utils/data_utils.py", line 685, in get
inputs = self.queue.get(block=True).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
MemoryError

It seems like your machine has run out of memory and can't train while storing all the arrays at the same time. Try to optimize your code so that data arrays are saved to disk and loaded only when needed, instead of keeping them all in RAM.
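As a rough sketch of that idea (the directory layout, file names, and .npy format below are assumptions, not your actual pipeline), a generator that loads only one batch from disk at a time keeps memory usage roughly constant:

import os
import numpy as np

def lazy_batch_generator(image_dir, mask_dir, batch_size=2):
    # Yields (images, masks) one batch at a time, so only batch_size samples
    # are ever held in RAM at once. Directory layout and .npy files are assumed.
    names = sorted(os.listdir(image_dir))
    while True:  # Keras expects generators passed to fit_generator to loop forever
        for start in range(0, len(names), batch_size):
            batch = names[start:start + batch_size]
            images = np.stack([np.load(os.path.join(image_dir, n)) for n in batch])
            masks = np.stack([np.load(os.path.join(mask_dir, n)) for n in batch])
            yield images, masks

# model.fit_generator(lazy_batch_generator('train/images', 'train/masks'),
#                     steps_per_epoch=300, epochs=1, callbacks=[model_checkpoint])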

Related

ValueError: Target size (torch.Size([64, 3])) must be the same as input size (torch.Size([32, 3]))

Reference github: fast-bert
I run the following notebook to do multi-label classification with a BERT model on CPU only, which means I don't need a GPU driver and can use CPU memory instead.
Here is the reference Jupyter notebook: multilabel
It's not a memory issue, so how do I resolve this error?
While increasing the number of vCPUs and the RAM size, the target size in the error keeps growing as well.
I chose the n1-standard-4 (6 vCPUs, 26 GB memory) machine type.
Sample code:
I have removed this piece of code and use 'cpu' instead of 'cuda':
device = torch.device('cuda')
if torch.cuda.device_count() > 1:
    args.multi_gpu = True
else:
    args.multi_gpu = False
to
device = torch.device('cpu')
Error Logs:
Traceback (most recent call last):
File "bert/run.py", line 146, in <module>
learner.fit(args.num_train_epochs, args.learning_rate, validate=True)
File "/home/pt4_gcp/.local/lib/python3.7/site-packages/fast_bert/learner_cls.py", line 397, in fit
outputs = self.model(**inputs)
File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/pt4_gcp/.local/lib/python3.7/site-packages/fast_bert/modeling.py", line 205, in forward
logits.view(-1, self.num_labels), labels.view(-1, self.num_labels)
File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 617, in forward
reduction=self.reduction)
File "/home/pt4_gcp/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 2433, in binary_cross_entropy_with_logits
raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([64, 3])) must be the same as input size (torch.Size([32, 3]))
It looks like you have a different number of instances for the input (32) and the target (64). Make sure the target tensor has the same batch size and shape as the model's output before it reaches the loss.
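For illustration only (the shapes below are hypothetical, not taken from your run), this is the constraint enforced by BCEWithLogitsLoss, the loss fast-bert reaches in the traceback, and what matching shapes look like:

import torch
import torch.nn as nn

# Hypothetical shapes reproducing the failure: 32 logits vs 64 targets, 3 labels each.
logits = torch.randn(32, 3)   # model output: (batch_size, num_labels)
labels = torch.randn(64, 3)   # targets coming out of the dataloader

loss_fn = nn.BCEWithLogitsLoss()
# loss_fn(logits, labels)     # ValueError: Target size (torch.Size([64, 3])) must be
#                             # the same as input size (torch.Size([32, 3]))

labels = labels[:32]            # once the batch sizes match...
loss = loss_fn(logits, labels)  # ...the loss computes normally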

NotSupportedError when trying to build primary index in N1QL in Couchbase Python SDK

I'm trying to get into the new N1QL Queries for Couchbase in Python.
I got my database set up in Couchbase 4.0.0.
My initial try was to retrieve all documents like this:
from couchbase.bucket import Bucket
bucket = Bucket('couchbase://localhost/default')
rv = bucket.n1ql_query('CREATE PRIMARY INDEX ON default').execute()
for row in bucket.n1ql_query('SELECT * FROM default'):
    print row
But this produces a NotSupportedError:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 2357, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1777, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Users/my_user/python_tests/test_n1ql.py", line 9, in <module>
rv = bucket.n1ql_query('CREATE PRIMARY INDEX ON default').execute()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 215, in execute
for _ in self:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 235, in __iter__
self._start()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/couchbase/n1ql.py", line 180, in _start
self._mres = self._parent._n1ql_query(self._params.encoded)
couchbase.exceptions.NotSupportedError: <RC=0x13[Operation not supported], Couldn't schedule n1ql query, C Source=(src/n1ql.c,82)>
Here the version numbers of everything I use:
Couchbase Server: 4.0.0
couchbase python library: 2.0.2
cbc: 2.5.1
python: 2.7.8
gcc: 4.2.1
Does anyone have an idea of what might have gone wrong here? I could not find any solution to this problem so far.
There was another ticket for Node.js where the same issue happened. There was a proposal to enable N1QL for the specific bucket first. Is this also needed in Python?
It would seem you didn't configure any cluster nodes with the Query or Index services. As such, the error returned is one that indicates no nodes are available.
I also got a similar error while trying to create a primary index.
Create a primary index...
Traceback (most recent call last):
File "post-upgrade-test.py", line 45, in <module>
mgr.n1ql_index_create_primary(ignore_exists=True)
File "/usr/local/lib/python2.7/dist-packages/couchbase/bucketmanager.py", line 428, in n1ql_index_create_primary
'', defer=defer, primary=True, ignore_exists=ignore_exists)
File "/usr/local/lib/python2.7/dist-packages/couchbase/bucketmanager.py", line 412, in n1ql_index_create
return IxmgmtRequest(self._cb, 'create', info, **options).execute()
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 160, in execute
return [x for x in self]
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 144, in __iter__
self._start()
File "/usr/local/lib/python2.7/dist-packages/couchbase/_ixmgmt.py", line 132, in _start
self._cmd, index_to_rawjson(self._index), **self._options)
couchbase.exceptions.NotSupportedError: <RC=0x13[Operation not supported], Couldn't schedule ixmgmt operation, C Source=(src/ixmgmt.c,98)>
Adding a Query and Index node to the cluster solved the issue.
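For completeness, here is a minimal sketch of the flow once a node running the Query and Index services is part of the cluster (the bucket name and localhost address are taken from the question; the index-creation call is the one appearing in the second traceback):

from couchbase.bucket import Bucket

bucket = Bucket('couchbase://localhost/default')

# With Query and Index services available, the primary index can be created
# (skipping it if it already exists) and N1QL queries get scheduled normally.
bucket.bucket_manager().n1ql_index_create_primary(ignore_exists=True)
for row in bucket.n1ql_query('SELECT * FROM default'):
    print row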

Elastic search memoryerror on bulk insert

I'm inserting 5000 records at once into Elasticsearch.
The total size of these records is 33936 (I got this using sys.getsizeof()).
Elastic Search version: 1.5.0
Python 2.7
Ubuntu
Here is the error:
Traceback (most recent call last):
File "run_indexing.py", line 67, in <module>
index_policy_content(datatable, source, policyids)
File "run_indexing.py", line 60, in index_policy_content
bulk(elasticsearch_instance, actions)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 148, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 107, in streaming_bulk
resp = client.bulk(bulk_actions, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 568, in bulk
params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 259, in perform_request
body = body.encode('utf-8')
MemoryError
Please help me resolve the issue.
Thanks & Regards,
Afroze
If I had to guess, I'd say this memory error is happening within python as it loads and serializes its data. Try cutting way back on the batch sizes until you get something that works, and then binary search upward until it fails again. That should help you figure out a safe batch size to use.
(Other useful information you might want to include: amount of memory in the server you're running your python process on, amount of memory for your elasticsearch server node(s), amount of heap allocated to Java.)
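As a starting point for that search, the bulk helpers can do the chunking for you; a minimal sketch (the chunk_size of 500 is just an assumed value to tune up or down):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch()

def index_in_chunks(actions, chunk_size=500):
    # streaming_bulk splits the actions into requests of chunk_size documents,
    # so only one small chunk is serialized in memory at a time.
    for ok, result in streaming_bulk(es, actions, chunk_size=chunk_size):
        if not ok:
            print result  # Python 2 print statement, matching the question's environment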

python-2.7 bug in inspect()?

I want to parallelize execution of a for loop on the quad-core processor of my computer's CPU. I am using pp (Parallel Python) rather than joblib.Parallel, for reasons considered here.
But I am getting an error:
Traceback (most recent call last):
File "batching.py", line 60, in cleave_out_bad_data
job1 = job_server.submit(cleave_out, (data_dir,dirlist,), (endswithdat,))
File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 459, in submit
sfunc = self.__dumpsfunc((func, ) + depfuncs, modules)
File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 637, in __dumpsfunc
sources = [self.__get_source(func) for func in funcs]
File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 704, in __get_source
sourcelines = inspect.getsourcelines(func)[0]
File "/usr/lib/python2.7/inspect.py", line 690, in getsourcelines
lines, lnum = findsource(object)
File "/usr/lib/python2.7/inspect.py", line 529, in findsource
raise IOError('source code not available')
IOError: source code not available
It looks like the reason is a python-2.7 bug.
Has anyone come across this and solved it?
Here is my code:
def clean_dir(data_dir, dirlist):
    job_server = pp.Server()
    job1 = job_server.submit(clean, (data_dir,dirlist,), (endswith,))

def clean(data_dir, dirlist):
    [good_or_bad(file, data_dir) for file in dirlist if endswith(file)]
Inspired by here, the way I fixed a similar problem was to save the code and functions in one file, like "test.py", and run that file with python, rather than entering the functions line by line in the Python shell. It works for me.
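A minimal sketch of that workaround (the file name, data paths, and helper bodies below are hypothetical): both the submitted function and its dependency are defined at module level in a real .py file, so inspect.getsourcelines can locate their source.

# clean_files.py -- a hypothetical stand-alone script; run it with "python clean_files.py"
import pp

def endswith(filename):
    return filename.endswith('.dat')

def clean(data_dir, dirlist):
    return [f for f in dirlist if endswith(f)]

if __name__ == '__main__':
    job_server = pp.Server()
    # pp needs the source of clean and its dependency endswith; inspect can only
    # find that source when the functions live in a real file, not the interactive shell.
    job = job_server.submit(clean, ('/tmp/data', ['a.dat', 'b.txt']), (endswith,))
    print job()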

AmbiguousTimeError Celery|Django

So I have a Django site that is giving me this AmbiguousTimeError. I have a job that activates when a product is saved and is given a brief countdown before updating my search index. It looks like an update was made in the Daylight Saving Time hour, and pytz cannot figure out what to do with it.
How can I prevent this from happening the next time the hour shifts for DST?
[2012-11-06 14:22:52,115: ERROR/MainProcess] Unrecoverable error: AmbiguousTimeError(datetime.datetime(2012, 11, 4, 1, 11, 4, 335637),)
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/celery/worker/__init__.py", line 353, in start
component.start()
File "/usr/local/lib/python2.6/dist-packages/celery/worker/consumer.py", line 369, in start
self.consume_messages()
File "/usr/local/lib/python2.6/dist-packages/celery/worker/consumer.py", line 842, in consume_messages
self.connection.drain_events(timeout=10.0)
File "/usr/local/lib/python2.6/dist-packages/kombu/connection.py", line 191, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/kombu/transport/virtual/__init__.py", line 760, in drain_events
self._callbacks[queue](message)
File "/usr/local/lib/python2.6/dist-packages/kombu/transport/virtual/__init__.py", line 465, in _callback
return callback(message)
File "/usr/local/lib/python2.6/dist-packages/kombu/messaging.py", line 485, in _receive_callback
self.receive(decoded, message)
File "/usr/local/lib/python2.6/dist-packages/kombu/messaging.py", line 457, in receive
[callback(body, message) for callback in callbacks]
File "/usr/local/lib/python2.6/dist-packages/celery/worker/consumer.py", line 560, in receive_message
self.strategies[name](message, body, message.ack_log_error)
File "/usr/local/lib/python2.6/dist-packages/celery/worker/strategy.py", line 25, in task_message_handler
delivery_info=message.delivery_info))
File "/usr/local/lib/python2.6/dist-packages/celery/worker/job.py", line 120, in __init__
self.eta = tz_to_local(maybe_iso8601(eta), self.tzlocal, tz)
File "/usr/local/lib/python2.6/dist-packages/celery/utils/timeutils.py", line 52, in to_local
dt = make_aware(dt, orig or self.utc)
File "/usr/local/lib/python2.6/dist-packages/celery/utils/timeutils.py", line 211, in make_aware
return localize(dt, is_dst=None)
File "/usr/local/lib/python2.6/dist-packages/pytz/tzinfo.py", line 349, in localize
raise AmbiguousTimeError(dt)
AmbiguousTimeError: 2012-11-04 01:11:04.335637
EDIT: I fixed it temporarily with this code in celery:
celery/worker/job.py # line 120
try:
    self.eta = tz_to_local(maybe_iso8601(eta), self.tzlocal, tz)
except:
    self.eta = None
I don't want to make changes in a pip-installed app, so I need to fix what I can in my own code.
This runs when I save a product:
self.task_cls.apply_async(
    args=[action, get_identifier(instance)],
    countdown=15
)
I'm assuming that I need to somehow detect whether I'm in the ambiguous time and adjust the countdown.
I think I'm going to have to clear the tasks to fix this, but how can I prevent it from happening the next time the hour shifts for DST?
It's not clear what you're doing (you haven't shown any code), but basically you need to account for the way the world works. You can't avoid having ambiguous times when you convert from local time to UTC (or to a different zone's local time) when the clocks go back an hour.
Likewise you ought to be aware that there are "gap" or "impossible" times, where a reasonable-sounding local time simply doesn't occur.
I don't know what options Python gives you, but ideally an API should let you resolve ambiguous times however you want - whether that's throwing an error, giving you the earlier occurrence, the later occurrence, or something else.
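In pytz specifically, which is what raises the error in the traceback above, localize() takes an is_dst argument that decides how an ambiguous time is resolved; a minimal sketch, assuming the America/Chicago zone (any zone observing DST behaves the same way):

from datetime import datetime
import pytz

tz = pytz.timezone('America/Chicago')   # assumed zone for illustration
dt = datetime(2012, 11, 4, 1, 11, 4)    # this wall-clock time occurred twice that night

try:
    tz.localize(dt, is_dst=None)        # what Celery does in the traceback: raise on ambiguity
except pytz.exceptions.AmbiguousTimeError:
    pass

earlier = tz.localize(dt, is_dst=True)  # pick the first occurrence (still on DST)
later = tz.localize(dt, is_dst=False)   # pick the second occurrence (standard time)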
Apparently, Celery solved this issue:
https://github.com/celery/celery/issues/1061