AmbiguousTimeError Celery|Django

So I have a Django site that is giving me this AmbiguousTimeError. I have a job that activates when a product is saved and is given a brief countdown before updating my search index. It looks like an update was made during the repeated hour when Daylight Saving Time ends, and pytz cannot figure out what to do with it.
How can I prevent this from happening the next time the hour shifts for DST?
[2012-11-06 14:22:52,115: ERROR/MainProcess] Unrecoverable error: AmbiguousTimeError(datetime.datetime(2012, 11, 4, 1, 11, 4, 335637),)
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/celery/worker/__init__.py", line 353, in start
component.start()
File "/usr/local/lib/python2.6/dist-packages/celery/worker/consumer.py", line 369, in start
self.consume_messages()
File "/usr/local/lib/python2.6/dist-packages/celery/worker/consumer.py", line 842, in consume_messages
self.connection.drain_events(timeout=10.0)
File "/usr/local/lib/python2.6/dist-packages/kombu/connection.py", line 191, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/kombu/transport/virtual/__init__.py", line 760, in drain_events
self._callbacks[queue](message)
File "/usr/local/lib/python2.6/dist-packages/kombu/transport/virtual/__init__.py", line 465, in _callback
return callback(message)
File "/usr/local/lib/python2.6/dist-packages/kombu/messaging.py", line 485, in _receive_callback
self.receive(decoded, message)
File "/usr/local/lib/python2.6/dist-packages/kombu/messaging.py", line 457, in receive
[callback(body, message) for callback in callbacks]
File "/usr/local/lib/python2.6/dist-packages/celery/worker/consumer.py", line 560, in receive_message
self.strategies[name](message, body, message.ack_log_error)
File "/usr/local/lib/python2.6/dist-packages/celery/worker/strategy.py", line 25, in task_message_handler
delivery_info=message.delivery_info))
File "/usr/local/lib/python2.6/dist-packages/celery/worker/job.py", line 120, in __init__
self.eta = tz_to_local(maybe_iso8601(eta), self.tzlocal, tz)
File "/usr/local/lib/python2.6/dist-packages/celery/utils/timeutils.py", line 52, in to_local
dt = make_aware(dt, orig or self.utc)
File "/usr/local/lib/python2.6/dist-packages/celery/utils/timeutils.py", line 211, in make_aware
return localize(dt, is_dst=None)
File "/usr/local/lib/python2.6/dist-packages/pytz/tzinfo.py", line 349, in localize
raise AmbiguousTimeError(dt)
AmbiguousTimeError: 2012-11-04 01:11:04.335637
EDIT: I fixed it temporarily with this code in celery:
# celery/worker/job.py, line 120
try:
    self.eta = tz_to_local(maybe_iso8601(eta), self.tzlocal, tz)
except:
    self.eta = None
I don't want to carry local changes in a pip-installed package, so I need to fix what I can in my own code.
This is what runs when a product is saved in my app:
self.task_cls.apply_async(
    args=[action, get_identifier(instance)],
    countdown=15
)
I'm assuming that I need to somehow detect whether I'm in the ambiguous hour and adjust the countdown accordingly.
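One way to do that (a minimal sketch, not something from Celery itself): compute the ETA yourself, ask pytz whether the local wall-clock time is ambiguous, and pass an explicit timezone-aware eta instead of a countdown, so the worker never has to localize an ambiguous time. The timezone name and the helper are hypothetical; adjust to your TIME_ZONE setting.

from datetime import datetime, timedelta
import pytz

def safe_eta(countdown_seconds, tz_name='America/Chicago'):  # tz_name is a placeholder
    """Return a timezone-aware eta, resolving the ambiguous DST roll-back hour."""
    local_tz = pytz.timezone(tz_name)
    naive = datetime.now() + timedelta(seconds=countdown_seconds)
    try:
        aware = local_tz.localize(naive, is_dst=None)  # strict: raises on ambiguous/missing times
    except (pytz.AmbiguousTimeError, pytz.NonExistentTimeError):
        aware = local_tz.localize(naive, is_dst=False)  # settle on the post-transition offset
    return aware.astimezone(pytz.utc)

You would then call apply_async(args=[action, get_identifier(instance)], eta=safe_eta(15)) in place of countdown=15.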

I think I'm going to have to clear the queued tasks to fix the immediate problem, but how can I prevent this from happening the next time the hour shifts for DST?
It's not clear what you're doing (you haven't shown any code), but basically you need to take account of the way the world works. You can't avoid having ambiguous times when you convert from local time to UTC (or to a different zone's local time) across the hour when the clocks go back.
Likewise you ought to be aware that there are "gap" or "impossible" times, where a reasonable-sounding local time simply doesn't occur.
I don't know what options Python gives you, but ideally an API should let you resolve ambiguous times however you want - whether that's throwing an error, giving you the earlier occurrence, the later occurrence, or something else.
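For what it's worth, pytz does give you exactly that choice through the is_dst argument of localize(); Celery's make_aware passes is_dst=None, which is the strict mode that produced the error above. A small illustration (the US/Eastern zone and the date are just an example):

from datetime import datetime
import pytz

eastern = pytz.timezone('US/Eastern')
ambiguous = datetime(2012, 11, 4, 1, 30)  # this wall-clock time occurs twice on the fall-back day

try:
    eastern.localize(ambiguous, is_dst=None)       # strict mode: refuse to guess
except pytz.AmbiguousTimeError:
    pass                                           # this is what Celery hit

earlier = eastern.localize(ambiguous, is_dst=True)   # first occurrence (EDT, UTC-4)
later = eastern.localize(ambiguous, is_dst=False)    # second occurrence (EST, UTC-5)
print(earlier.isoformat())
print(later.isoformat())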

Apparently, Celery solved this issue:
https://github.com/celery/celery/issues/1061

Related

Elastic search memoryerror on bulk insert

I'm inserting 5,000 records at once into Elasticsearch.
The total size of these records is 33,936 bytes (obtained with sys.getsizeof()).
Elastic Search version: 1.5.0
Python 2.7
Ubuntu
Here is the error:
Traceback (most recent call last):
File "run_indexing.py", line 67, in <module>
index_policy_content(datatable, source, policyids)
File "run_indexing.py", line 60, in index_policy_content
bulk(elasticsearch_instance, actions)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 148, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers.py", line 107, in streaming_bulk
resp = client.bulk(bulk_actions, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 568, in bulk
params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 259, in perform_request
body = body.encode('utf-8')
MemoryError
Please help me resolve the issue.
Thanks & Regards,
Afroze
If I had to guess, I'd say this memory error is happening within Python as it loads and serializes its data. Try cutting way back on the batch sizes until you get something that works, and then binary-search upward until it fails again. That should help you figure out a safe batch size to use.
(Other useful information you might want to include: the amount of memory on the server you're running your Python process on, the amount of memory on your Elasticsearch server node(s), and the amount of heap allocated to Java.)
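A rough sketch of that idea, assuming your actions are already built as a list (the client setup, helper name, and the 500-document starting point are placeholders):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()  # adjust hosts/auth for your cluster

def index_in_batches(actions, batch_size=500):
    """Send the actions in fixed-size slices so no single bulk request
    has to be serialized in memory all at once."""
    for start in range(0, len(actions), batch_size):
        bulk(es, actions[start:start + batch_size])

If 500 works, try 1,000, then 2,000, and so on until it fails, then back off to the largest size that succeeded.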

python-2.7 bug in inspect()?

I want to parallelize the execution of a for loop on the quad-core processor of my computer's CPU. I am using pp (Parallel Python) rather than joblib.Parallel, for reasons considered here.
But I am getting an error:
Traceback (most recent call last):
File "batching.py", line 60, in cleave_out_bad_data
job1 = job_server.submit(cleave_out, (data_dir,dirlist,), (endswithdat,))
File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 459, in submit
sfunc = self.__dumpsfunc((func, ) + depfuncs, modules)
File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 637, in __dumpsfunc
sources = [self.__get_source(func) for func in funcs]
File "/homes/ad6813/.local/lib/python2.7/site-packages/pp.py", line 704, in __get_source
sourcelines = inspect.getsourcelines(func)[0]
File "/usr/lib/python2.7/inspect.py", line 690, in getsourcelines
lines, lnum = findsource(object)
File "/usr/lib/python2.7/inspect.py", line 529, in findsource
raise IOError('source code not available')
IOError: source code not available
It looks like the cause is a Python 2.7 bug.
Has anyone come across this and solved it?
Here is my code:
def clean_dir(data_dir, dirlist):
    job_server = pp.Server()
    job1 = job_server.submit(clean, (data_dir, dirlist,), (endswith,))

def clean(data_dir, dirlist):
    [good_or_bad(file, data_dir) for file in dirlist if endswith(file)]
Inspired by here, the way I fixed a similar problem was to save the code and the function together in one file, such as "test.py", and run that file with python rather than entering the function line by line in the Python shell. It works for me.
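For example, putting the submitted function and its dependency in a real module lets inspect.getsourcelines() find their source when pp serializes them. Only the names below come from the traceback; the file name, paths, and function bodies are made up for illustration:

# test.py -- run with "python test.py", not pasted into the interactive shell
import pp

def endswithdat(filename):
    return filename.endswith('.dat')

def cleave_out(data_dir, dirlist):
    return [f for f in dirlist if endswithdat(f)]

if __name__ == '__main__':
    job_server = pp.Server()
    job = job_server.submit(cleave_out, ('/tmp/data', ['a.dat', 'b.txt']), (endswithdat,))
    print(job())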

DataNitro - limit on how many CellRanges you can add?

I'm trying to accumulate CellRanges that meet a certain condition so that I can set a property on the entire range in one go:
rng = None
for c in Cell(1,1).vertical_range:
    if c.value and c.value.endswith(' Total'):
        rng = rng + c.horizontal_range if rng is not None else c.horizontal_range
rng.font.bold = True
I'm getting the error below once the range accumulates more than 30 areas or so. It's not always the same number of areas that causes the error, so I can't put my finger on a specific limit. I do use a workaround whereby I set the desired property on the range after, say, 20 areas have been accumulated and then reset the CellRange, but it would be good to be able to accumulate all the areas I need, subject to whatever constraints Excel places on the number of areas.
Traceback (most recent call last):
File "27/scriptStarter.py", line 128, in <module>
File "C:\Users\xxxx\rangebug.py", line 3, in <module>
rng=rng+c.horizontal_range if rng is not None else c.horizontal_range
File "27/basic_io.py", line 546, in __add__
File "27/basic_io.py", line 465, in __init__
File "27/basic_io.py", line 1022, in _cell_name_parser
File "27/basic_io.py", line 1136, in _named_range_parser
File "27/iron.py", line 139, in getNamedRange
File "27/dnparser.py", line 95, in checkForErrors
dntypes.NitroException: Exception from HRESULT: 0x800A03EC
This is a bug in our software - we'll look into it and release a fix.
In the meantime, you could do this:
for c in Cell(1,1).vertical_range:
    if c.value and c.value.endswith(' Total'):
        c.horizontal_range.font.bold = True
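For reference, the flush-and-reset workaround the question describes might look something like the sketch below, assuming the same DataNitro Cell/CellRange API as above; the 20-area threshold is the figure the question suggests.

rng = None
areas = 0
for c in Cell(1, 1).vertical_range:
    if c.value and c.value.endswith(' Total'):
        rng = rng + c.horizontal_range if rng is not None else c.horizontal_range
        areas += 1
        if areas >= 20:           # flush before the area count triggers the bug
            rng.font.bold = True
            rng, areas = None, 0
if rng is not None:               # flush whatever is left over
    rng.font.bold = True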

django celerybeat schedule month_of_year IndexError Index out of range

I cannot seem to set month_of_year for a Django celerybeat schedule; it keeps throwing an IndexError (list index out of range).
Here is my schedule:
# Annual task to permanently delete all transactions that are older
# than 2 years old.
'annual-transaction-deletion': {
    'task': 'project.tasks.annual_transactions_deletion',
    'schedule': crontab(hour='2', minute=0, day_of_month=1, month_of_year=1)
}
I have also tried setting month_of_year as month_of_year=[1] and as month_of_year='1'.
The following stack trace is printed in the celerybeat log:
[2012-12-19 09:50:06,403: CRITICAL/MainProcess] celerybeat raised exception <type 'exceptions.IndexError'>: IndexError('list index out of range',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/apps/beat.py", line 100, in start_scheduler
beat.start()
File "/usr/local/lib/python2.7/dist-packages/celery/beat.py", line 422, in start
interval = self.scheduler.tick()
File "/usr/local/lib/python2.7/dist-packages/celery/beat.py", line 194, in tick
next_time_to_run = self.maybe_due(entry, self.publisher)
File "/usr/local/lib/python2.7/dist-packages/celery/beat.py", line 172, in maybe_due
is_due, next_time_to_run = entry.is_due()
File "/usr/local/lib/python2.7/dist-packages/djcelery/schedulers.py", line 65, in is_due
return self.schedule.is_due(self.last_run_at)
File "/usr/local/lib/python2.7/dist-packages/celery/schedules.py", line 502, in is_due
rem_delta = self.remaining_estimate(last_run_at)
File "/usr/local/lib/python2.7/dist-packages/celery/schedules.py", line 489, in remaining_estimate
next_hour, next_minute)
File "/usr/local/lib/python2.7/dist-packages/celery/schedules.py", line 389, in _delta_to_next
roll_over()
File "/usr/local/lib/python2.7/dist-packages/celery/schedules.py", line 372, in roll_over
months_of_year[datedata.moy],
IndexError: list index out of range
I have many other schedules set up that work, but they do not include month_of_year. The above schedule only needs to run once a year. I cannot help but feel this is a bug in the Celery library, but I'm keen for someone to prove me wrong. Obviously, if it is a bug, I do not wish to resort to modifying the library files to fix it. Any help is much appreciated.

OperationFailure: database error when threading in MongoEngine/PyMongo

I have a function that reads data from a website, processes it, and then loads it into MongoDB. When I run it without threading it works fine, but as soon as I set up Celery tasks that just call this one function, I frequently get the following error: "OperationFailure: database error: unauthorized db:dbname lock type:-1".
It's somewhat odd, because if I run the non-Celery version in multiple terminals I do not get this error at all.
I suspect it has something to do with there not being an open connection to Mongo, although in my code I'm opening one right before every Mongo call.
The exact exception is below:
Task twitter[a974bfcc-d6ca-4baf-b36f-cae9143ce2d9] raised exception: OperationFailure(u'database error: unauthorized db:data lock type:-1 client:68.193.49.9',)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/execute/trace.py", line 36, in trace
return cls(states.SUCCESS, retval=fun(*args, **kwargs))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/task/__init__.py", line 232, in __call__
return self.run(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/celery/app/__init__.py", line 172, in run
return fun(*args, **kwargs)
File "/djangoblog/network/tasks.py", line 40, in twitter
n_twitter.GetTweetsTwitter(user)
File "/djangoblog/network/twitter.py", line 255, in GetTweetsTwitter
id = SaveTweet(user, network, tweet)
File "/djangoblog/network/twitter.py", line 150, in SaveTweet
if mmo.Moment.objects(user=user.id,source_id=id,network=network.id).count() == 0:
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 933, in count
return self._cursor.count(with_limit_and_skip=True)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 563, in _cursor
self._cursor_obj = self._collection.find(self._query,
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/mongoengine/queryset.py", line 493, in _collection
if self._collection_obj.name not in db.collection_names():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/database.py", line 361, in collection_names
names = [r["name"] for r in results]
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 703, in next
if len(self.__data) or self._refresh():
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 666, in _refresh
self.__uuid_subtype))
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/cursor.py", line 628, in __send_message self.__tz_aware)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pymongo/helpers.py", line 101, in _unpack_response error_object["$err"])
OperationFailure: database error: unauthorized db:data lock type:-1 client:68.193.49.9
Sorry for the formatting, but if you look at the line that starts with mmo.Moment, there's a connection being opened right before that call.
Doing a bit of research, it looks as if it has something to do with the way threading is handled in PyMongo - http://api.mongodb.org/python/1.5.1/faq.html#how-does-connection-pooling-work-in-pymongo - I may need to start closing the connections, but I'd expect MongoEngine to be doing that.
This is likely due to the fact that you are not calling db.authenticate() when you start the new connection and are using auth on MongoDB.
Regarding the closing of threads, I would recommend making sure you are using connection pooling and letting the driver manage the pools (calling close() or similar manually can lead to a lot of pain).
For more info see the note in the pymongo documentation about using authenticate() in a multi-threaded environment.
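For example, with the pymongo 1.x/2.x API that appears in the traceback, that could look something like this (the host, database name, and credentials are placeholders):

from pymongo import Connection

connection = Connection('localhost', 27017)   # one pooled connection per worker process
db = connection['data']
db.authenticate('username', 'password')       # authenticate before the first query on this db

If you connect through MongoEngine, its connect() also accepts username and password arguments, which amounts to the same thing.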