How to recover from Django Channels connection limit?

I'm running a server on Heroku where I have intermittent channel-layer communication between instances of AsyncConsumer. Running the app locally, I have no problem with Redis channel connections since I can have as many of them as I want, but once I deploy to Heroku I'm limited to 20 Redis connections. If I use more than that, my server errors out, so I tried lowering the expiry time so that inactive Redis connections would close. BUT, if a Redis connection expires and I then try to use the channel name with self.channel_layer.send(), I get the error below. How do I recover from this error without needing external calls to create another AsyncConsumer instance?
ERROR Exception inside application: Reader at end of file
File "/app/.heroku/python/lib/python3.6/site-packages/channels/consumer.py", line 59, in __call__
[receive, self.channel_receive], self.dispatch
File "/app/.heroku/python/lib/python3.6/site-packages/channels/utils.py", line 51, in await_many_dispatch
await dispatch(result)
File "/app/.heroku/python/lib/python3.6/site-packages/channels/consumer.py", line 73, in dispatch
await handler(message)
File "./myapp/webhook.py", line 133, in http_request
await self.channel_layer.send(userDB.backEndChannelName,{"type": "device.query"})
File "/app/.heroku/python/lib/python3.6/site-packages/channels_redis/core.py", line 296, in send
if await connection.llen(channel_key) >= self.get_capacity(channel):
File "/app/.heroku/python/lib/python3.6/site-packages/aioredis/commands/list.py", line 70, in llen
return self.execute(b'LLEN', key)
File "/app/.heroku/python/lib/python3.6/site-packages/aioredis/commands/__init__.py", line 51, in execute
return self._pool_or_conn.execute(command, *args, **kwargs)
File "/app/.heroku/python/lib/python3.6/site-packages/aioredis/connection.py", line 322, in execute
raise ConnectionClosedError(msg)
Reader at end of file
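To make the question concrete: the only way I can see to even detect this is to catch the exception at the call site. A rough sketch of what I mean (ConnectionClosedError is the exception from the traceback above; userDB.backEndChannelName and device.query are from my code):

from aioredis.errors import ConnectionClosedError  # the exception raised in the traceback

async def http_request(self, message):
    try:
        await self.channel_layer.send(
            userDB.backEndChannelName, {"type": "device.query"}
        )
    except ConnectionClosedError:
        # The saved channel name now points at a consumer whose Redis
        # connection has expired, and I don't know how to re-establish
        # contact from here without something external creating a new
        # AsyncConsumer instance.
        pass

Ideally, whatever goes in that except block would recover the connection instead of giving up.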

Related

Glue Boto Client -- NoCredentialsError

I've been running my Glue jobs on a schedule for a few months. Last night my Glue job failed due to botocore.exceptions.NoCredentialsError: Unable to locate credentials after calling bucket.objects.filter(Prefix=productionDirectory).
I am under the impression this is a result of not having defined a credentials file, but AWS Glue has always pulled credentials without issue. I just re-ran my job and everything worked perfectly. For reference, I define my Glue client via glue = boto3.client('glue'). Has anyone ever experienced this before? Is this just an edge case?
Full Logs:
Traceback (most recent call last):
File "/tmp/data-deployment", line 67, in <module>
for obj in bucket.objects.filter(Prefix=productionDirectory):
File "/home/spark/.local/lib/python3.7/site-packages/boto3/resources/collection.py", line 83, in __iter__
for page in self.pages():
File "/home/spark/.local/lib/python3.7/site-packages/boto3/resources/collection.py", line 166, in pages
for page in pages:
File "/home/spark/.local/lib/python3.7/site-packages/botocore/paginate.py", line 255, in __iter__
response = self._make_request(current_kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/paginate.py", line 332, in _make_request
return self._method(**current_kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 316, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 613, in _make_api_call
operation_model, request_dict, request_context)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/client.py", line 632, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/endpoint.py", line 116, in create_request
operation_name=operation_model.name)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/home/spark/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Edit/Update: This is a known bug. I've posted the mitigation strategy provided by AWS as an answer below.
Update: I reached out to AWS Support and they responded. Apparently this is a known issue. While they do not have a solution or an ETA for one, they do have a way to mitigate it. Information below:
Thank you for reporting your issue to us and product team is aware of this intermittent issue.
They are working on resolution however, I do not have an ETA.
To mitigate this issue, increase the timeout / attempts to meta service request in your code:
# Increase the metadata service timeout and number of attempts
import os

os.environ['AWS_METADATA_SERVICE_NUM_ATTEMPTS'] = "5"
os.environ['AWS_METADATA_SERVICE_TIMEOUT'] = "30"
I faced a similar issue with Glue, but not exactly the same.
We used external tables with SparkSQL and S3, and sometimes an exception was raised out of nowhere, e.g. Table not found. The issue was never reproduced in testing and occurred only rarely. Since our jobs ran perfectly fine on retries, we enabled the retry mechanism to work around it.
It has something to do with the internal workings of Glue and its serverless environment.
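For anyone looking for where that retry knob lives: Glue jobs have a MaxRetries property, which can be set in the console or when the job is defined. A rough boto3 sketch (job name, role and script location are placeholders):

import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="my-nightly-job",            # placeholder job name
    Role="MyGlueServiceRole",         # placeholder IAM role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/my-nightly-job.py",
    },
    MaxRetries=2,  # Glue re-runs the job automatically on intermittent failures
)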

Zappa/AWS - Emails won't send and just timeout

Currently, I have tried both plain old Django SMTP and a few different API-based Django libraries for my transactional email provider (Postmark).
When I run my development server, everything works perfectly. Emails send via the Postmark API with no problem.
When I deploy to AWS with Zappa, visit my website, and do something that is supposed to send an email (e.g. resetting a user's password), the page keeps loading until it says Endpoint request timed out.
I have tried setting the timeout of my AWS Lambda function to a longer duration in case Django decides to throw an error.
Here is the error that was thrown. Keep in mind this error only happens in production. I created a custom management command in order to retrieve this error.
HTTPSConnectionPool(host='api.postmarkapp.com', port=443): Max retries exceeded with url: /email/batch (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6cfbd5dd30>: Failed to establish a new connection: [Errno 110] Connection timed out',)): ConnectionError
Traceback (most recent call last):
File "/var/task/handler.py", line 509, in lambda_handler
return LambdaHandler.lambda_handler(event, context)
File "/var/task/handler.py", line 240, in lambda_handler
return handler.handler(event, context)
File "/var/task/handler.py", line 376, in handler
management.call_command(*event['manage'].split(' '))
File "/var/task/django/core/management/__init__.py", line 131, in call_command
return command.execute(*args, **defaults)
File "/var/task/django/core/management/base.py", line 330, in execute
output = self.handle(*args, **options)
File "/var/task/users/management/commands/sendemail.py", line 13, in handle
fail_silently=False,
File "/var/task/django/core/mail/__init__.py", line 62, in send_mail
return mail.send()
File "/var/task/django/core/mail/message.py", line 348, in send
return self.get_connection(fail_silently).send_messages([self])
File "/var/task/postmarker/django/backend.py", line 66, in send_messages
responses = self.client.emails.send_batch(*prepared_messages, TrackOpens=self.get_option('TRACK_OPENS'))
File "/var/task/postmarker/models/emails.py", line 332, in send_batch
return self.EmailBatch(*emails).send(**extra)
File "/var/task/postmarker/models/emails.py", line 247, in send
responses = [self._manager._send_batch(*batch) for batch in chunks(emails, self.MAX_SIZE)]
File "/var/task/postmarker/models/emails.py", line 247, in <listcomp>
responses = [self._manager._send_batch(*batch) for batch in chunks(emails, self.MAX_SIZE)]
File "/var/task/postmarker/models/emails.py", line 276, in _send_batch
return self.call('POST', '/email/batch', data=emails)
File "/var/task/postmarker/models/base.py", line 72, in call
return self.client.call(*args, **kwargs)
File "/var/task/postmarker/core.py", line 106, in call
**kwargs
File "/var/task/postmarker/core.py", line 129, in _call
method, url, json=data, params=kwargs, headers=default_headers, timeout=self.timeout
File "/var/task/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/var/task/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/var/task/requests/adapters.py", line 508, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.postmarkapp.com', port=443): Max retries exceeded with url: /email/batch (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6cfbd5dd30>: Failed to establish a new connection: [Errno 110] Connection timed out',))
I have allowed all incoming and outgoing traffic to my AWS security group in an attempt to fix this. Still to no avail.
Any help would be greatly, greatly appreciated. Cheers.
The explanation is simple: A Lambda instance running in a VPC cannot access the internet:
When you add VPC configuration to a Lambda function, it can only access resources in that VPC. If a Lambda function needs to access both VPC resources and the public Internet, the VPC needs to have a Network Address Translation (NAT) instance inside the VPC.
The solution is also simple, if annoying: run a NAT Instance or NAT Gateway in the VPC. (An alternate solution is to take your Lambda out of the VPC, but that is a much bigger change.)
I am running Django / Zappa in Lambda with a NAT instance for connecting to Amazon Simple Email Service and it works fine.
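If it helps to verify the routing, here is a rough boto3 check (the subnet ID is a placeholder; use the subnets from the Lambda's VPC configuration) that prints where 0.0.0.0/0 traffic goes for those subnets. A nat-... ID means a NAT gateway is in place; an igw-... gateway on a Lambda subnet will not give the function internet access by itself:

import boto3

ec2 = boto3.client("ec2")

# Placeholder: the subnet IDs attached to the Lambda function.
subnet_ids = ["subnet-0123456789abcdef0"]

route_tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": subnet_ids}]
)
for table in route_tables["RouteTables"]:
    for route in table.get("Routes", []):
        if route.get("DestinationCidrBlock") == "0.0.0.0/0":
            # Print the target of the default route for each route table.
            print(route.get("NatGatewayId") or route.get("GatewayId"))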

SSL SYSCALL error: Bad file descriptor on Heroku with postgres and Celery

I've been using Celery successfully with a Django site on Heroku but it's just started generating the error below, which stops it running. It looks like it's having trouble with postgres, but I'm stumped as to how to fix it, given it's Celery rather than my code that's having the problem (I assume...).
I'm using CloudAMQP as a broker, and my Django settings include:
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
Here's the traceback from the Heroku logs:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.5/site-packages/kombu/utils/__init__.py", line 323, in __get__
return obj.__dict__[self.__name__]
KeyError: 'scheduler'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
psycopg2.OperationalError: SSL SYSCALL error: Bad file descriptor
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.5/site-packages/billiard/process.py", line 292, in _bootstrap
self.run()
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 553, in run
self.service.start(embedded_process=True)
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 470, in start
humanize_seconds(self.scheduler.max_interval))
File "/app/.heroku/python/lib/python3.5/site-packages/kombu/utils/__init__.py", line 325, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 512, in scheduler
return self.get_scheduler()
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 507, in get_scheduler
lazy=lazy)
File "/app/.heroku/python/lib/python3.5/site-packages/celery/utils/imports.py", line 53, in instantiate
return symbol_by_name(name)(*args, **kwargs)
File "/app/.heroku/python/lib/python3.5/site-packages/djcelery/schedulers.py", line 151, in __init__
Scheduler.__init__(self, *args, **kwargs)
File "/app/.heroku/python/lib/python3.5/site-packages/celery/beat.py", line 185, in __init__
self.setup_schedule()
File "/app/.heroku/python/lib/python3.5/site-packages/djcelery/schedulers.py", line 158, in setup_schedule
self.install_default_entries(self.schedule)
File "/app/.heroku/python/lib/python3.5/site-packages/djcelery/schedulers.py", line 251, in schedule
self._schedule = self.all_as_schedule()
File "/app/.heroku/python/lib/python3.5/site-packages/djcelery/schedulers.py", line 164, in all_as_schedule
for model in self.Model.objects.enabled():
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/query.py", line 258, in __iter__
self._fetch_all()
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/query.py", line 1074, in _fetch_all
self._result_cache = list(self.iterator())
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/query.py", line 52, in __iter__
results = compiler.execute_sql()
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/sql/compiler.py", line 848, in execute_sql
cursor.execute(sql, params)
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/utils.py", line 95, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/app/.heroku/python/lib/python3.5/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/app/.heroku/python/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
django.db.utils.OperationalError: SSL SYSCALL error: Bad file descriptor
I've resolved the issue now... there was a line of my Django code which had caused an Internal Server Error in the past -- I think, early on in Django starting up, it was trying to access a model before the migrations that created the model had run.
I'd resolved that but noticed these "SSL SYSCALL error"s started about the same time. So I removed that line of code, and Celery has started up again.
It could be coincidence. And I don't understand why this fixed things.
Ideally I'd still like to understand what the error above actually means so I'd have a better chance of fixing such a thing in the future.
I had the same error and I solved it.
In my case the problem was a custom AppConfig in one of my Django apps.
My folder structure:
my_django_app/
    __init__.py
    apps.py
    ...
I had a file my_django_app/__init__.py like this:
default_app_config="my_django_app.apps.MyAppConfig"
and the file my_django_app/apps.py like this:
class ChallengeConfig(AppConfig):
    name = 'appConfig'
    verbose_name = "djangoAppConfig"

    def ready(self):
        ...  # custom logic
After removing the line default_app_config="my_django_app.apps.MyAppConfig", Celery worked again.
If you have something like this, remove your custom AppConfig and use a Celery periodic task instead of a custom AppConfig. django_celery_beat does something strange with AppConfig, so a custom AppConfig can trigger this problem.
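A minimal sketch of that replacement (the task name and schedule are illustrative; the body is whatever MyAppConfig.ready() used to do):

# my_django_app/tasks.py
from celery import shared_task

@shared_task
def run_startup_logic():
    ...  # the logic previously triggered from MyAppConfig.ready()

# settings.py -- or create the equivalent PeriodicTask in the django_celery_beat admin
CELERY_BEAT_SCHEDULE = {
    "run-startup-logic": {
        "task": "my_django_app.tasks.run_startup_logic",
        "schedule": 300.0,  # every 5 minutes
    },
}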
My Python requirements:
Django==1.11.29
celery==4.4.7
django_celery_beat==1.6.0
I know this was asked years ago, but this was about the only question out there which related to my problem.
In my case, it was down to CONN_MAX_AGE being set at 600. I reduced it to 0, so that the database connection is closed at the end of each request instead of being kept open.
From the docs:
Persistent connections avoid the overhead of re-establishing a connection to the database in each request. They’re controlled by the CONN_MAX_AGE parameter which defines the maximum lifetime of a connection. It can be set independently for each database.
The default value is 0, preserving the historical behavior of closing the database connection at the end of each request. To enable persistent connections, set CONN_MAX_AGE to a positive integer of seconds. For unlimited persistent connections, set it to None.
Django opens a connection to the database when it first makes a database query. It keeps this connection open and reuses it in subsequent requests. Django closes the connection once it exceeds the maximum age defined by CONN_MAX_AGE or when it isn’t usable any longer.
In detail, Django automatically opens a connection to the database whenever it needs one and doesn’t have one already — either because this is the first connection, or because the previous connection was closed.
At the beginning of each request, Django closes the connection if it has reached its maximum age. If your database terminates idle connections after some time, you should set CONN_MAX_AGE to a lower value, so that Django doesn’t attempt to use a connection that has been terminated by the database server. (This problem may only affect very low traffic sites.)
At the end of each request, Django closes the connection if it has reached its maximum age or if it is in an unrecoverable error state. If any database errors have occurred while processing the requests, Django checks whether the connection still works, and closes it if it doesn’t. Thus, database errors affect at most one request; if the connection becomes unusable, the next request gets a fresh connection.
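For reference, the setting lives in the DATABASES configuration. A minimal sketch (the engine and other keys are whatever your project already uses):

# settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # NAME, USER, PASSWORD, HOST, PORT as before...
        "CONN_MAX_AGE": 0,  # close the connection at the end of each request
    }
}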

GAE and Highwinds FTP Server

I'm running a GAE Python application with 3 modules; one module runs every half an hour and is in charge of fetching a file from an FTP server, processing it and saving it to Cloud Storage.
Recently the communication between my application and the FTP server started failing more and more often, until it became unusable.
The FTP server is Highwinds. They made some changes recently and asked me for things like ping and traceroute output from the instance running the code to their server, but I'm not able to provide that given the GAE restrictions.
I was able to overcome some errors by just repeating the same operation multiple times.
I'm including some error traces at the end of this message.
I would appreciate some help troubleshooting this issue.
files = ftp_service.nlst()
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 509, in nlst
self.retrlines(cmd, files.append)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 432, in retrlines
conn = self.transfercmd(cmd)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 371, in transfercmd
return self.ntransfercmd(cmd, rest)[0]
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 330, in ntransfercmd
conn = socket.create_connection((host, port), self.timeout)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/socket.py", line 569, in create_connection
raise err
error: [Errno 110] Connection timed out
ftp_service.retrbinary("RETR %s" % file_path, callback=handle_binary)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 409, in retrbinary
conn = self.transfercmd(cmd, rest)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 371, in transfercmd
return self.ntransfercmd(cmd, rest)[0]
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 334, in ntransfercmd
resp = self.sendcmd(cmd)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 244, in sendcmd
return self.getresp()
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ftplib.py", line 217, in getresp
raise error_temp, resp
error_temp: 425 Rejected data connection from foreign address 74.125.183.23:53274.
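For completeness, the "repeating the same operation" workaround mentioned above is just a small wrapper along these lines (the retry count and delay are arbitrary):

import time
from ftplib import error_temp, error_proto

def with_retries(operation, attempts=3, delay=5):
    # Retry a flaky FTP operation a few times before giving up.
    # socket.error (e.g. Errno 110) is an IOError subclass on Python 2.7,
    # so it is covered by the except clause below.
    for attempt in range(attempts):
        try:
            return operation()
        except (IOError, error_temp, error_proto):
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# e.g. files = with_retries(lambda: ftp_service.nlst())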

app_identity_service.GetAccessToken() required more quota than is available

I am using App Engine and BigQuery as the backend for my website. Whenever the user clicks something, I log it to BigQuery to do analytics later in the day. I get close to 75k clicks a day. It was working fine till last week. This is the code I use:
body = {"rows": [bodyFields]}
credentials = appengine.AppAssertionCredentials(scope=BIGQUERY_SCOPE)
http = credentials.authorize(httplib2.Http())
bigquery = discovery.build('bigquery', 'v2', http=http)
response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=BIGQUERY_DATASETID,
    tableId=BIGQUERY_TABLEID,
    body=body).execute()
Now, all of a sudden, I am getting an over-quota exception. My application is a paid App Engine instance. Below is the stack trace of my exception:
Attempting refresh to obtain initial access_token
The API call app_identity_service.GetAccessToken() required more quota than is available.
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/filename.py", line 1611, in post
bigquery = discovery.build('bigquery', 'v2', http=http)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/util.py", line 129, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/apiclient/discovery.py", line 198, in build
resp, content = http.request(requested_url)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/util.py", line 129, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/client.py", line 516, in new_request
self._refresh(request_orig)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/appengine.py", line 194, in _refresh
scopes, service_account_id=self.service_account_id)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 589, in get_access_token
scopes, service_account_id=service_account_id)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 547, in get_access_token_uncached
return rpc.get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 519, in get_access_token_result
rpc.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
raise self.exception
OverQuotaError: The API call app_identity_service.GetAccessToken() required more quota than is available.
My traffic hasn't gone up by much, and the number of times the handler is hit is almost the same as over the past 2 months. So why am I getting this error?
In order to determine why you're hitting the quota error, you'll need to share more detail about your usage. The quota should reset every 24 hours. Do you know how long it takes for the error to appear and how much traffic you've successfully served to that point in time?
You mentioned that you "do analytics later in the day", which suggests that you might be using the TaskQueue API or Deferred Tasks. It's possible that those tasks are failing for other reasons and retrying, which could quickly eat up your quota. If you are using TaskQueues, you might try tuning the queue configuration and retry options.
Another way you might be able to conserve your quota would be to save the bigquery discovery service that you're building to something like the Memcache API, so that it can be reused for multiple requests to the BigQuery service.
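A rough sketch of that last idea, assuming the apiclient/oauth2client stack from the traceback and that your apiclient version provides build_from_document (the cache key, lifetime and discovery URL are illustrative):

import httplib2
from apiclient import discovery
from google.appengine.api import memcache

_CACHE_KEY = "bigquery-v2-discovery-doc"
_DISCOVERY_URL = "https://www.googleapis.com/discovery/v1/apis/bigquery/v2/rest"

def build_bigquery(authorized_http):
    # Reuse a cached discovery document so discovery.build doesn't have to
    # fetch it (through the credential-refreshing http object) on every request.
    doc = memcache.get(_CACHE_KEY)
    if doc is None:
        _, doc = httplib2.Http().request(_DISCOVERY_URL)
        memcache.set(_CACHE_KEY, doc, time=3600)
    return discovery.build_from_document(doc, http=authorized_http)

With this, the token refresh should only happen when the insertAll call itself executes, rather than during service construction as well.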