Error "No worker is available to serve request: model" when calling SageMaker endpoint during increaesd load - flask

I have a custom container that takes a request, does some feature extraction, and then passes the enhanced request on to a classifier endpoint. During feature extraction, another endpoint is called to generate text embeddings. I am using the HuggingFace estimator for my embedding model.
It had been working fine, but there was an increase in requests, and it looks like the embedding endpoint timed out somehow.
I am looking at adding automatic scaling to the endpoint, but I want to make sure I understand what is happening and that it properly addresses the issue. Unfortunately, searching for this error message does not pull up much. The instance metrics are not showing the endpoint to be overloaded; CPU utilization peaked at around 30%. Would auto scaling address the "no worker" issue, or is this something different? I was receiving a few hundred requests per minute at the time.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/opt/program/predictor.py", line 56, in transformation
result = preprocessor.transform(data)
File "/opt/program/preprocessor.py", line 189, in transform
response = embed_predictor.predict(data=json.dumps(payload))
File "/usr/local/lib/python3.7/site-packages/sagemaker/predictor.py", line 136, in predict
response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 705, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (503) from primary with message "{
"code": 503,
"type": "ServiceUnavailableException",
"message": "No worker is available to serve request: model"
}

I would suggest confirming that MemoryUtilization is not being overwhelmed and that there is no more specific error in the CloudWatch Logs as well.
If MemoryUtilization is overwhelmed, you can try configuring Auto Scaling to distribute the request load across multiple instances. That being said, while I am not sure of the details of your custom container, I also recommend confirming that the container itself can handle multiple concurrent requests (i.e., that it has multiple workers available to serve requests); see the sketch below.
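If auto scaling does turn out to be the right lever, here is a minimal boto3 sketch that registers the embedding variant as a scalable target and attaches a target-tracking policy; the endpoint name ("embedding-endpoint"), variant name ("AllTraffic"), capacities, and target value are placeholders to tune for your workload:

import boto3

autoscaling = boto3.client("application-autoscaling")

# Format: "endpoint/<endpoint-name>/variant/<variant-name>" (placeholders).
resource_id = "endpoint/embedding-endpoint/variant/AllTraffic"

# Let SageMaker scale this variant between 1 and 4 instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Scale out when invocations per instance per minute exceed the target.
autoscaling.put_scaling_policy(
    PolicyName="embedding-invocations-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # tune to the variant's per-instance throughput
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)

That said, since CPU peaked at only ~30%, the bottleneck may be the number of model server workers rather than the hardware. If the embedding endpoint was deployed with the HuggingFace estimator, its container is built on the SageMaker inference toolkit, where (as far as I know) the SAGEMAKER_MODEL_SERVER_WORKERS environment variable controls how many worker processes serve the model; raising it may address the "No worker is available" error without scaling out at all.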

Related

Google Cloud 403 Error "The billing account for the owning project is disabled in state absent"

I went on Google Cloud and enabled a project, billing, and the Cloud Speech-to-Text API, then downloaded a .json credentials file. Then I tried to execute this basic code in PyCharm.
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "instant-medium-282.json"
from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
client = speech_v1.SpeechClient()
encoding = enums.RecognitionConfig.AudioEncoding.FLAC
sample_rate_hertz = 44100
language_code = 'en-US'
config = {'encoding': encoding, 'sample_rate_hertz': sample_rate_hertz, 'language_code': language_code}
uri = 'gs://bucket_name/file_name.flac'
audio = {'uri': uri}
response = client.recognize(config, audio)
However, I keep getting this error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/grpc/_channel.py", line 826, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "The billing account for the owning project is disabled in state absent"
debug_error_string = "{"created":"#1593884707.640503000","description":"Error received from peer ipv6:[2607:f8b0:4009:813::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"The billing account for the owning project is disabled in state absent","grpc_status":7}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/sal.py", line 16, in <module>
response = client.recognize(config, audio)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/cloud/speech_v1/gapic/speech_client.py", line 255, in recognize
return self._inner_api_calls["recognize"](
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
return wrapped_func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
return retry_target(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
return func(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.PermissionDenied: 403 The billing account for the owning project is disabled in state absent
I also confirmed with Google Cloud customer support that billing is enabled as it should be. Any suggestions on how to fix this error?
I ran into an issue while uploading a file using the CLI where I was trying to copy into a bucket that did not actually exist. It threw this same error: "AccessDeniedException: 403 The billing account for the owning project is disabled in state closed".
If all was well with billing, perhaps the bucket you were pointing to did not exist?
Just wanted to flag that there seem to be non-billing reasons such an error can be thrown.
The "The billing account for the owning project is disabled in state absent" error message,may happen when the API has just been enabled recently. So it is recommended to try again.
In my case,
"AccessDeniedException: 403 The billing account for the owning project is disabled in state closed"
happened because I hadn't added a 'Billing Account' to the project I'd created.
You can link your project to a Billing Account in the Billing Projects tab.
Once you do so, wait some time and the error should be gone.

Google Cloud Function pulling from Pub/Sub subscription throws exception - Deadline Exceeded

I have a Google Cloud Function in Python 3.7 reading from a Pub/Sub subscription in synchronous pull mode.
After running fine once an hour for 24 hours, it threw this exception stack trace:
Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/env/local/lib/python3.7/site-packages/grpc/_channel.py", line 824, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/env/local/lib/python3.7/site-packages/grpc/_channel.py", line 726, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.DEADLINE_EXCEEDED
    details = "Deadline Exceeded"
    debug_error_string = "{"created":"@1580454091.145703535","description":"Error received from peer ipv4:74.125.202.95:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 346, in run_http_function
    result = _function_handler.invoke_user_function(flask.request)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function
    return call_user_function(request_or_event)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 210, in call_user_function
    return self._user_function(request_or_event)
  File "/user_code/main.py", line 39, in iteration
    response = sub.pull(sub_path, MAX_MESSAGES)
  File "/env/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/_gapic.py", line 40, in <lambda>
    fx = lambda self, *a, **kw: wrapped_fx(self.api, *a, **kw)  # noqa
  File "/env/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/gapic/subscriber_client.py", line 1005, in pull
    request, retry=retry, timeout=timeout, metadata=metadata
  File "/env/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
    return wrapped_func(*args, **kwargs)
  File "/env/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/env/local/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/env/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
    return func(*args, **kwargs)
  File "/env/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded
What is this about? Is it to be expected, or the result of some configuration problem? If it is to be expected, how should it be handled?
The documentation on pull (https://googleapis.dev/python/pubsub/latest/subscriber/api/client.html) says nothing about this being a possible exception.
I ack the messages immediately after the pull completes. I only permit one function execution at a time. I have a 600-second acknowledgement deadline. A block of messages pulled at one time seems to be fewer than 100. If this is about failing to ack a message, it seems like the error message could be much better.
This exception is raised by the client when there are no messages to read in the subscription. It is a known issue in PubSub library versions >= 1.0.0. If necessary, you can downgrade to version 0.45.0, where this issue was not present.
However, as a workaround you can catch the DeadlineExceeded exception and retry the operation (a sketch follows the monkeypatch below). Also, based on the comment of Hemang, here's a small monkeypatch that you can add to your running code, which might help to get the same behavior as in version 0.45.0.
from google.cloud.pubsub_v1.gapic import subscriber_client_config as sub_config

# Raise the initial RPC timeout for the subscriber's "messaging" retry
# params, so a synchronous pull waits longer before raising DeadlineExceeded.
sub_config.config['interfaces']['google.pubsub.v1.Subscriber']['retry_params']['messaging']['initial_rpc_timeout_millis'] = 25000
Finally, keep in mind that when using synchronous pull, having many outstanding pull requests helps lower the delivery latency, which in turn might result in individual pull requests taking longer (and DeadlineExceeded errors). If latency is crucial for the application, you could consider using StreamingPull instead.
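As a concrete illustration, here is a minimal sketch of the catch-and-continue workaround, written against the pre-2.0 positional-argument API that the question's code uses; pull_batch is a hypothetical helper, and sub_path / max_messages mirror the question's sub_path and MAX_MESSAGES:

from google.api_core.exceptions import DeadlineExceeded
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()

def pull_batch(sub_path, max_messages):
    try:
        response = subscriber.pull(sub_path, max_messages)
    except DeadlineExceeded:
        # Raised when no messages arrive before the RPC timeout elapses;
        # treat it as an empty batch rather than a failure.
        return []
    messages = response.received_messages
    if messages:
        # Ack immediately after the pull completes, as in the question.
        subscriber.acknowledge(sub_path, [m.ack_id for m in messages])
    return messages

Treating DeadlineExceeded as an empty batch keeps the hourly function from crashing whenever the subscription happens to be drained.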

Flask Mail not working with Gmail account and AWS instance

So I built a Flask app and it worked great when I ran it locally. Then I pushed it to my AWS EC2 instance. Once I was running it from my instance, it was no longer sending emails. It would add to my database, and then the next line was to send the email; that is where it was failing. Google was blocking the device. I was then able to allow that device, and it was all working fine.
Fast forward: I added an Elastic IP and linked it to my domain, and now it is not working again. It is still adding to my database, so I think the issue is that Google is not letting the application work. I am not sure how to solve this, but I have been working on it for some time. This is the error code that I get on my instance:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "FlaskApp.py", line 130, in register
mail.send(msg)
File "/usr/local/lib/python2.7/site-packages/flask_mail.py", line 492, in send
message.send(connection)
File "/usr/local/lib/python2.7/site-packages/flask_mail.py", line 427, in send
connection.send(self)
File "/usr/local/lib/python2.7/site-packages/flask_mail.py", line 178, in send
"The message does not specify a sender and a default sender "
AssertionError: The message does not specify a sender and a default sender has not been configured
And on my website I get a tab that says "500 Internal Server Error" and the page displays the following.
Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Can someone help me figure this out? I would like to be able to send emails again using my Flask app. I am new to this, so my logic says it is all because I am being denied access to the Google account. But I cannot find a link or anything that will let me allow access for the EC2 instance. I have even tried using https://accounts.google.com/DisplayUnlockCaptcha, but that doesn't work either.
UPDATE:
It was working before and I did not change anything in my source code, see below:
FROM_EMAIL = os.environ.get('EMAIL_USERNAME')
app.config.update(
DEBUG = False,
MAIL_SERVER = 'smtp.gmail.com',
MAIL_PORT = 465,
MAIL_USE_SSL = True,
MAIL_USERNAME = FROM_EMAIL,
MAIL_PASSWORD = os.environ.get('EMAIL_PASSWORD_TOKEN'),
)
mail = Mail(app)
and then when I call it I use:
msg = Message("Welcome",
sender = FROM_EMAIL,
recipients = [request.form["email"]])
msg.body = "Welcome! \n\n Congratulations on your sucessful registration."
mail.send(msg)
UPDATE #2:
I am not sure why, but I had to add a default sender, and that fixed the assertion error; however, I am still unable to send emails. I now get the following error:
[2019-01-22 01:40:44,973] ERROR in app: Exception on /register/ [POST]
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "FlaskApp.py", line 131, in register
mail.send(msg)
File "/usr/local/lib/python2.7/site-packages/flask_mail.py", line 492, in send
message.send(connection)
File "/usr/local/lib/python2.7/site-packages/flask_mail.py", line 427, in send
connection.send(self)
File "/usr/local/lib/python2.7/site-packages/flask_mail.py", line 192, in send
message.rcpt_options)
File "/usr/lib64/python2.7/smtplib.py", line 737, in sendmail
raise SMTPSenderRefused(code, resp, from_addr)
SMTPSenderRefused: (530, '5.5.1 Authentication Required. Learn more at\n5.5.1 https://support.google.com/mail/?p=WantAuthError d21sm19653189pfo.162 - gsmtp', u'myemail@gmail.com')
199.169.1.69 - - [22/Jan/2019 01:40:44] "POST /register/ HTTP/1.1" 500 -
Not sure if it is helpful to know, but I am able to log into Gmail using SMTP commands from the instance, and I still get the error. I followed the directions found here: How to send email using simple SMTP commands via Gmail?
If you are getting the assertion error about specifying a sender, you are definitely not providing a sender address. Here's the relevant code in Flask-Mail: https://github.com/mattupstate/flask-mail/blob/e195fca6de1077cabb711426e6378f51dc39d598/flask_mail.py#L177. As you can see, at that point in the code it hasn't even tried sending the email yet, so the cause of that error cannot be Google blocking your device.
From the code you've provided, I would check that your environment variable EMAIL_USERNAME is actually being set. You could try hard-coding it to remove the environment variable from the equation and verify that the code is actually able to send email via Gmail; a sketch follows.
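For instance, a minimal sketch assuming the config from the question: failing fast on a missing EMAIL_USERNAME makes the root cause obvious, and MAIL_DEFAULT_SENDER gives Flask-Mail a fallback when a Message has no explicit sender.

import os
from flask import Flask
from flask_mail import Mail, Message

app = Flask(__name__)

FROM_EMAIL = os.environ.get('EMAIL_USERNAME')
if not FROM_EMAIL:
    # If this raises on EC2, the environment variable was never exported
    # to the process running the app (a common difference from local runs).
    raise RuntimeError('EMAIL_USERNAME is not set in this environment')

app.config.update(
    MAIL_SERVER='smtp.gmail.com',
    MAIL_PORT=465,
    MAIL_USE_SSL=True,
    MAIL_USERNAME=FROM_EMAIL,
    MAIL_PASSWORD=os.environ.get('EMAIL_PASSWORD_TOKEN'),
    MAIL_DEFAULT_SENDER=FROM_EMAIL,  # fallback sender for every Message
)
mail = Mail(app)

The SMTPSenderRefused 530 in your second traceback points the same way: if MAIL_USERNAME or MAIL_PASSWORD resolves to None, Flask-Mail skips the SMTP login step entirely, which Gmail then rejects with "Authentication Required".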

Zappa/AWS - Emails won't send and just timeout

Currently, I have tried both plain-old Django SMTP and a few different api-based Django libraries for my transactional email provider (Postmark).
When I run my development server, everything works perfectly. Emails send via the Postmark API with no problem.
When I deploy to AWS with Zappa, visit my website, and do a task that is supposed to send an email (e.g. resetting a user's password), the page continually loads until it says "Endpoint request timed out".
I have tried setting the timeout of my AWS Lambda function to a longer duration in case Django decides to throw an error.
Here is the error that was thrown. Keep in mind this error only happens in production; I created a custom management command in order to retrieve it.
HTTPSConnectionPool(host='api.postmarkapp.com', port=443): Max retries exceeded with url: /email/batch (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6cfbd5dd30>: Failed to establish a new connection: [Errno 110] Connection timed out',)): ConnectionError
Traceback (most recent call last):
File "/var/task/handler.py", line 509, in lambda_handler
return LambdaHandler.lambda_handler(event, context)
File "/var/task/handler.py", line 240, in lambda_handler
return handler.handler(event, context)
File "/var/task/handler.py", line 376, in handler
management.call_command(*event['manage'].split(' '))
File "/var/task/django/core/management/__init__.py", line 131, in call_command
return command.execute(*args, **defaults)
File "/var/task/django/core/management/base.py", line 330, in execute
output = self.handle(*args, **options)
File "/var/task/users/management/commands/sendemail.py", line 13, in handle
fail_silently=False,
File "/var/task/django/core/mail/__init__.py", line 62, in send_mail
return mail.send()
File "/var/task/django/core/mail/message.py", line 348, in send
return self.get_connection(fail_silently).send_messages([self])
File "/var/task/postmarker/django/backend.py", line 66, in send_messages
responses = self.client.emails.send_batch(*prepared_messages, TrackOpens=self.get_option('TRACK_OPENS'))
File "/var/task/postmarker/models/emails.py", line 332, in send_batch
return self.EmailBatch(*emails).send(**extra)
File "/var/task/postmarker/models/emails.py", line 247, in send
responses = [self._manager._send_batch(*batch) for batch in chunks(emails, self.MAX_SIZE)]
File "/var/task/postmarker/models/emails.py", line 247, in <listcomp>
responses = [self._manager._send_batch(*batch) for batch in chunks(emails, self.MAX_SIZE)]
File "/var/task/postmarker/models/emails.py", line 276, in _send_batch
return self.call('POST', '/email/batch', data=emails)
File "/var/task/postmarker/models/base.py", line 72, in call
return self.client.call(*args, **kwargs)
File "/var/task/postmarker/core.py", line 106, in call
**kwargs
File "/var/task/postmarker/core.py", line 129, in _call
method, url, json=data, params=kwargs, headers=default_headers, timeout=self.timeout
File "/var/task/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/var/task/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/var/task/requests/adapters.py", line 508, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.postmarkapp.com', port=443): Max retries exceeded with url: /email/batch (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6cfbd5dd30>: Failed to establish a new connection: [Errno 110] Connection timed out',))
I have allowed all incoming and outgoing traffic to my AWS security group in an attempt to fix this. Still to no avail.
Any help would be greatly, greatly appreciated. Cheers.
The explanation is simple: A Lambda instance running in a VPC cannot access the internet:
When you add VPC configuration to a Lambda function, it can only access resources in that VPC. If a Lambda function needs to access both VPC resources and the public Internet, the VPC needs to have a Network Address Translation (NAT) instance inside the VPC.
The solution is also simple, if annoying: run a NAT Instance or NAT Gateway in the VPC. (An alternate solution is to take your Lambda out of the VPC, but that is a much bigger change.)
I am running Django / Zappa in Lambda with a NAT instance for connecting to Amazon Simple Email Service and it works fine.
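For reference, a hedged sketch of what the relevant section of zappa_settings.json might look like once the private subnets and NAT gateway exist; the stage name, subnet IDs, and security group ID below are placeholders:

{
    "production": {
        "vpc_config": {
            "SubnetIds": ["subnet-0aaa1111", "subnet-0bbb2222"],
            "SecurityGroupIds": ["sg-0ccc3333"]
        }
    }
}

The subnets listed must be private subnets whose route tables send 0.0.0.0/0 to the NAT gateway; placing the Lambda in the public subnet alongside the NAT does not work, because Lambda network interfaces never receive public IPs.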

app_identity_service.GetAccessToken() required more quota than is available

I am using App Engine and BigQuery as the backend for my website. Whenever the user clicks something, I log it into BigQuery to do analytics later in the day. I get close to 75k clicks a day. It was working fine until last week. This is the code I use.
body = {"rows":[bodyFields]}
credentials = appengine.AppAssertionCredentials(scope=BIGQUERY_SCOPE)
http = credentials.authorize(httplib2.Http())
bigquery = discovery.build('bigquery', 'v2', http=http)
response = bigquery.tabledata().insertAll(
projectId=PROJECT_ID,
datasetId=BIGQUERY_DATASETID,
tableId=BIGQUERY_TABLEID,
body=body).execute()
Now all of a sudden I am getting an over-quota exception. My application is a paid App Engine instance. Below is the stack trace of my exception:
Attempting refresh to obtain initial access_token
The API call app_identity_service.GetAccessToken() required more quota than is available.
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/filename.py", line 1611, in post
bigquery = discovery.build('bigquery', 'v2', http=http)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/util.py", line 129, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/apiclient/discovery.py", line 198, in build
resp, content = http.request(requested_url)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/util.py", line 129, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/client.py", line 516, in new_request
self._refresh(request_orig)
File "/base/data/home/apps/s~projectname/bigqueryapi.387952303347375306/oauth2client/appengine.py", line 194, in _refresh
scopes, service_account_id=self.service_account_id)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 589, in get_access_token
scopes, service_account_id=service_account_id)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 547, in get_access_token_uncached
return rpc.get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_identity/app_identity.py", line 519, in get_access_token_result
rpc.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
raise self.exception
OverQuotaError: The API call app_identity_service.GetAccessToken() required more quota than is available.
My traffic hasn't gone up by much, and the number of times the handler is hit is almost the same as the past two months' data. So why am I getting this error?
In order to determine why you're hitting the quota error, you'll need to share more detail about your usage. The quota should reset every 24 hours. Do you know how long it takes for the error to appear and how much traffic you've successfully served to that point in time?
You mentioned that you "do analytics later in the day", which suggests that you might be using the TaskQueue API or Deferred Tasks. It's possible that those tasks are failing for other reasons and retrying, which could quickly eat up your quota. If you are using TaskQueues, you might try tuning the queue configuration and retry options.
Another way you might be able to conserve your quota would be to save the BigQuery discovery service that you're building to something like the Memcache API, so that it can be reused for multiple requests to the BigQuery service; a sketch of a related approach follows.
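On that last point, the built discovery client is not straightforward to serialize into Memcache, so a simpler variant of the same idea is a module-level cache reused by every request served by the same App Engine instance; this sketch assumes the question's BIGQUERY_SCOPE constant:

import httplib2
from apiclient import discovery
from oauth2client import appengine

_bigquery = None

def get_bigquery():
    # Module-level globals persist across requests handled by the same
    # instance, so the discovery fetch and token refresh that trigger
    # app_identity_service.GetAccessToken() run far less often.
    global _bigquery
    if _bigquery is None:
        credentials = appengine.AppAssertionCredentials(scope=BIGQUERY_SCOPE)
        http = credentials.authorize(httplib2.Http())
        _bigquery = discovery.build('bigquery', 'v2', http=http)
    return _bigquery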