Is possible to avoid the 60 seconds limit in urllib2.urlopen with GAE? - python-2.7

I'm requesting a file with a size around 14MB from a slow server with urllib2.urlopen, and it spend more than 60 seconds to get the data, and I'm getting the error:
Deadline exceeded while waiting for HTTP response from URL:
http://bigfile.zip?type=CSV
Here my code:
class CronChargeBT(webapp2.RequestHandler):
def get(self):
taskqueue.add(queue_name = 'optimized-queue', url='/cronChargeBTB')
class CronChargeBTB(webapp2.RequestHandler):
def post(self):
url = "http://bigfile.zip?type=CSV"
url_request = urllib2.Request(url)
url_request.add_header('Accept-encoding', 'gzip')
urlfetch.set_default_fetch_deadline(300)
response = urllib2.urlopen(url_request, timeout=300)
buf = StringIO(response.read())
f = gzip.GzipFile(fileobj=buf)
...work with the data insiste the file...
I create a cron task who calls CronChargeBT. Here the cron.yaml:
- description: cargar BlueTomato
url: /cronChargeBT
target: charge
schedule: every wed,sun 01:00
and it create a new task and insert into a queue, here the queue configuration:
- name: optimized-queue
rate: 40/s
bucket_size: 60
max_concurrent_requests: 10
retry_parameters:
task_retry_limit: 1
min_backoff_seconds: 10
max_backoff_seconds: 200
Of coursethat the timeout=300 isn't working because the 60seconds limit in GAE but I think yhat I can avoid it using a task... anyone knows how I can get the data in the file avoiding this timeout.
Thanks a lot!!!

Cron jobs are limited to 10 minutes deadline, not 60 seconds. If your download fails, perhaps just retry? Does the download work if you download it from your computer? There's nothing you can do on GAE if the server you are downloading from is too slow or unstable.
Edit: According to https://cloud.google.com/appengine/docs/java/outbound-requests#request_timeouts, there is a maximum deadline of 60 seconds for cron job requests. Therefore, you can't get around it.

Related

Boto3 and AWS RDS: properly wait for database creation from snapshot

I have a following code in my Lambda (Python and Boto3):
rds.restore_db_instance_from_db_snapshot(
DBSnapshotIdentifier=snapshot_name,
DBInstanceIdentifier=db_id,
DBInstanceClass=rds_instance_class,
VpcSecurityGroupIds=secgroup,
DBSubnetGroupName=rds_subnet_groupname,
MultiAZ=False,
PubliclyAccessible=False,
CopyTagsToSnapshot=True
)
waiter = rds.get_waiter('db_instance_available')
waiter.wait(DBInstanceIdentifier=db_id)
# some other operation that expects that DB is up and running.
The waiter was added as an attempt to properly wait for DB. However, it looks like the waiter times out.
What would be the correct waiter to use in this case?
try setting waiter.config.delay and/or waiter.config.max_attempts.
waiter = rds.get_waiter('db_instance_available')
waiter.config.delay = 123 # this is in seconds
waiter.config.max_attempts = 123
waiter.wait(DBInstanceIdentifier=db_id)
OR
waiter = rds.get_waiter('db_instance_available')
waiter.wait(
DBClusterIdentifier=db_id
WaiterConfig={
'Delay': 123,
'MaxAttempts': 123
}
)
WaiterConfig (dict) A dictionary that provides parameters to control
waiting behavior.
Delay (integer) The amount of time in seconds to wait between
attempts. Default: 30
MaxAttempts (integer) The maximum number of attempts to be made.
Default: 60
Could it be that your waiter is actually checking the existing db and seeing that it's available before the status can update on the previous command to restore the snapshot?

AWS Lambda function fails while query Athena

I am attempting to write a simple Lambda function to query a table in Athena. But after a few seconds I see "Status: FAILED" in the Cloudwatch logs.
There is no descriptive error message on the cause of failure.
My test code is below:
import json
import time
import boto3
# athena constant
DATABASE = 'default'
TABLE = 'test'
# S3 constant
S3_OUTPUT = 's3://test-output/'
# number of retries
RETRY_COUNT = 1000
def lambda_handler(event, context):
# created query
query = "SELECT * FROM default.test limit 2"
# % (DATABASE, TABLE)
# athena client
client = boto3.client('athena')
# Execution
response = client.start_query_execution(
QueryString=query,
QueryExecutionContext={
'Database': DATABASE
},
ResultConfiguration={
'OutputLocation': S3_OUTPUT,
}
)
# get query execution id
query_execution_id = response['QueryExecutionId']
print(query_execution_id)
# get execution status
for i in range(1, 1 + RETRY_COUNT):
# get query execution
query_status = client.get_query_execution(QueryExecutionId=query_execution_id)
query_execution_status = query_status['QueryExecution']['Status']['State']
if query_execution_status == 'SUCCEEDED':
print("STATUS:" + query_execution_status)
break
if query_execution_status == 'FAILED':
#raise Exception("STATUS:" + query_execution_status)
print("STATUS:" + query_execution_status)
else:
print("STATUS:" + query_execution_status)
time.sleep(i)
else:
# Did not encounter a break event. Need to kill the query
client.stop_query_execution(QueryExecutionId=query_execution_id)
raise Exception('TIME OVER')
# get query results
result = client.get_query_results(QueryExecutionId=query_execution_id)
print(result)
return
The logs show the following:
2020-08-31T10:52:12.443-04:00
START RequestId: e5434651-d36e-48f0-8f27-0290 Version: $LATEST
2020-08-31T10:52:13.481-04:00
88162f38-bfcb-40ae-b4a3-0b5a21846e28
2020-08-31T10:52:13.500-04:00
STATUS:QUEUED
2020-08-31T10:52:14.519-04:00
STATUS:RUNNING
2020-08-31T10:52:16.540-04:00
STATUS:RUNNING
2020-08-31T10:52:19.556-04:00
STATUS:RUNNING
2020-08-31T10:52:23.574-04:00
STATUS:RUNNING
2020-08-31T10:52:28.594-04:00
STATUS:FAILED
2020-08-31T10:52:28.640-04:00
....more status: FAILED
....
END RequestId: e5434651-d36e-48f0-8f27-0290
REPORT RequestId: e5434651-d36e-48f0-8f27-0290 Duration: 30030.22 ms Billed Duration: 30000 ms Memory Size: 128 MB Max Memory Used: 72 MB Init Duration: 307.49 ms
2020-08-31T14:52:42.473Z e5434651-d36e-48f0-8f27-0290 Task timed out after 30.03 seconds
I think I have the right permissions for S3 bucket access given to the role (if not, I would have seen the error message). There are no files created in the bucket either. I am not sure what is going wrong here. What am I missing?
Thanks
The last line in your log shows
2020-08-31T14:52:42.473Z e5434651-d36e-48f0-8f27-0290 Task timed out after 30.03 seconds
To me this looks like the timeout of the Lambda Function is set to 30 seconds. Try increasing it to more than the time the Athena query needs (the maximum is 15 minutes).

ExportDicomData request of Google Cloud Healthcare API on GitHub tutorials never finish

I'm trying AutoML Vision of ML Codelabs on Cloud Healthcare API GitHub tutorials.
https://github.com/GoogleCloudPlatform/healthcare/blob/master/imaging/ml_codelab/breast_density_auto_ml.ipynb
I run the Export DICOM data cell code of Convert DICOM to JPEG section and the request as well as all the premise cell code succeeded.
But waiting for operation completion is timed out and never finish.
(ExportDicomData request status on Dataset page stays "Running" over the day. I did many times but all the requests were stacked staying "Running". A few times I tried to do from scratch and the results were same.)
I did so far:
1) Remove "output_config" since INVALID ARGUMENT error occurs.
https://github.com/GoogleCloudPlatform/healthcare/issues/133
2) Enable Cloud Resource Manager API since it is needed.
This is the cell code.
# Path to export DICOM data.
dicom_store_url = os.path.join(HEALTHCARE_API_URL, 'projects', project_id, 'locations', location, 'datasets', dataset_id, 'dicomStores', dicom_store_id)
path = dicom_store_url + ":export"
# Headers (send request in JSON format).
headers = {'Content-Type': 'application/json'}
# Body (encoded in JSON format).
# output_config = {'output_config': {'gcs_destination': {'uri_prefix': jpeg_folder, 'mime_type': 'image/jpeg; transfer-syntax=1.2.840.10008.1.2.4.50'}}}
output_config = {'gcs_destination': {'uri_prefix': jpeg_folder, 'mime_type': 'image/jpeg; transfer-syntax=1.2.840.10008.1.2.4.50'}}
body = json.dumps(output_config)
resp, content = http.request(path, method='POST', headers=headers, body=body)
assert resp.status == 200, 'error exporting to JPEG, code: {0}, response: {1}'.format(resp.status, content)
print('Full response:\n{0}'.format(content))
# Record operation_name so we can poll for it later.
response = json.loads(content)
operation_name = response['name']
This is the result of waiting.
Waiting for operation completion...
Full response:
{
"name": "projects/my-datalab-tutorials/locations/us-central1/datasets/sample-dataset/operations/18300485449992372225",
"metadata": {
"#type": "type.googleapis.com/google.cloud.healthcare.v1beta1.OperationMetadata",
"apiMethodName": "google.cloud.healthcare.v1beta1.dicom.DicomService.ExportDicomData",
"createTime": "2019-08-18T10:37:49.809136Z"
}
}
AssertionErrorTraceback (most recent call last)
<ipython-input-18-1a57fd38ea96> in <module>()
21 timeout = time.time() + 10*60 # Wait up to 10 minutes.
22 path = os.path.join(HEALTHCARE_API_URL, operation_name)
---> 23 _ = wait_for_operation_completion(path, timeout)
<ipython-input-18-1a57fd38ea96> in wait_for_operation_completion(path, timeout)
15
16 print('Full response:\n{0}'.format(content))
---> 17 assert success, "operation did not complete successfully in time limit"
18 print('Success!')
19 return response
AssertionError: operation did not complete successfully in time limit
API Version is v1beta1.
I was wondering if somebody has any suggestion.
Thank you.
After several times kept trying and stayed running one night, it finally succeeded. I don't know why.
There was a recent update to the codelabs. The error message is due to the timeout in the codelab and not the actual operation. This has been addressed in the update. Please let me know if you are still running into any issues!

zeit now with flask + telegram

I am using https://zeit.co (free) and was thinking setup a webhook for telegram chat bot.
I sent a message from the telegram app on the phone and it supposes to post a json to the webhook url. It does post the data but i cannot get the json. It seems zeit.co cannot handle the json?
It is like something stuck whenever i tried to call request.json
#app.route("/new_message", methods=["POST", "GET"])
def telegram_webhook_handler():
try:
print(request.json)
if request.method == 'POST':
r = request.get_json()
chat_id = r['message']['chat']['id']
text = "how are you?"
send_message(CHAT_ID, text)
else:
send_message(CHAT_ID, "This is a get")
except Exception as e:
print(e)
pass
return jsonify({"ping": "pong"})
The error message from zeit.co
12/27 01:42 PM (40s)
REPORT RequestId: 3462880b-09d4-11e9-b07e-77492ad19973 Duration: 300021.80 ms Billed Duration: 300000 ms Memory Size: 1024 MB Max Memory Used: 42 MB
12/27 01:42 PM (40s)
2018-12-27T12:42:42.838Z 3462880b-09d4-11e9-b07e-77492ad19973 Task timed out after 300.02 seconds
Any idea how i can get the webhook data?
Cheers
Your code has exceeded its duration limit.
Duration: 300021.80 ms Billed Duration: 300000 ms
If you want to increase the duration limit, you will have to upgrade your Zeit account.

AWS Worker tier cron - server error #500 - "post http 1.1 500 AWS aws-sqsd/2.0"

I'm trying to set up a cronjob at Elastic Beanstalk. The task is being scheduled. For testing purposes it should run every minute... However it is not working. It is a Django app. The app is running in two Environments, one is the worker and the other one is "hosting" the application.
This part is working. The command is running but it's not being executed (the files are not being deleted).
Here is views.py:
#login_required
def delete_expired_files(request):
users = DemoUser.objects.all()
for user in users:
documents = Document.objects.filter(owner=user.id)
if documents:
for doc in documents:
now = timezone.now()
if now >= doc.date_published + timedelta(days = doc.owner.group.valid_time):
doc.delete()
return redirect("user_home")
cron.yml:
version: 1
cron:
- name: "delete_expired_files"
url: "http://networksapp.elasticbeanstalk.com/networks_app/delete_expired_files"
schedule: "* * * * *"
However, it prints this on the log file at the access_log part :
"POST /myapp/management/commands/delete_expired_files HTTP/1.1" 500 124709 "-" "aws-sqsd/2.0"
This is the log file I am accessing so far:
Log file content
Why is it? How can I fix it?
Thank you so much.