pub/sub synchronous pull w deadline - google-cloud-platform

from google.api_core import retry
from google.cloud import pubsub_v1

NUM_MESSAGES = 1

with subscriber:
    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": NUM_MESSAGES},
        retry=retry.Retry(deadline=120),
    )
The requirement is to wait up to X (120) seconds for a message to arrive in the subscription, read it, and validate some fields in it. Somehow, the above code snippet is not honoring the deadline: it does not wait for 120 seconds and times out much earlier. What am I missing?

As Guillaume mentioned, the timeout argument is what controls how long the Pull request stays open. The default timeout appears to be 60 seconds.
So, you could write:
response = subscriber.pull(request={"subscription": subscription_path, "max_messages": NUM_MESSAGES}, timeout=120.0)
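If you also want retries on transient errors, it should be possible to pass both a retry policy and a timeout. This is only a sketch, reusing subscriber, subscription_path and NUM_MESSAGES from the question; as I understand it, the retry deadline caps the total time spent across retries, while timeout caps how long an individual pull request stays open, but check the docs for your client library version:
from google.api_core import retry

# Sketch only: combine a retry policy (for transient errors) with the pull timeout.
response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": NUM_MESSAGES},
    retry=retry.Retry(deadline=120),
    timeout=120.0,
)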

Related

AWS Error "Calling the invoke API action failed with this message: Rate Exceeded" when I use s3.get_paginator('list_objects_v2')

A third-party application uploads around 10,000 objects a day to my bucket+prefix. My requirement is to fetch all objects that were uploaded to my bucket+prefix in the last 24 hours.
There are a lot of files in my bucket+prefix.
So I assume that when I call
response = s3_paginator.paginate(Bucket=bucket, Prefix='inside-bucket-level-1/', PaginationConfig={"PageSize": 1000})
it may make multiple calls to the S3 API, and maybe that's why it shows the Rate Exceeded error.
Below is my Python Lambda function.
import json
import boto3
import time
from datetime import datetime, timedelta


def lambda_handler(event, context):
    s3 = boto3.client("s3")
    from_date = datetime.today() - timedelta(days=1)
    string_from_date = from_date.strftime("%Y-%m-%d, %H:%M:%S")
    print("Date :", string_from_date)
    s3_paginator = s3.get_paginator('list_objects_v2')
    list_of_buckets = ['kush-dragon-data']
    bucket_wise_list = {}
    for bucket in list_of_buckets:
        response = s3_paginator.paginate(Bucket=bucket, Prefix='inside-bucket-level-1/', PaginationConfig={"PageSize": 1000})
        filtered_iterator = response.search(
            "Contents[?to_string(LastModified)>='\"" + string_from_date + "\"'].Key")
        keylist = []
        for key_data in filtered_iterator:
            if "/" in key_data:
                splitted_array = key_data.split("/")
                if len(splitted_array) > 1:
                    if splitted_array[-1]:
                        keylist.append(splitted_array[-1])
            else:
                keylist.append(key_data)
        bucket_wise_list.update({bucket: keylist})
    print("Total Number Of Object = ", bucket_wise_list)
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps(bucket_wise_list)
    }
When we execute the above Lambda function, it shows the error below:
"Calling the invoke API action failed with this message: Rate Exceeded."
Can anyone help resolve this error and achieve my requirement?
This is probably due to your account restrictions; you should add a retry with a few seconds between retries, or increase the page size.
This is most likely due to you reaching your quota limit for AWS S3 API calls. The "bigger hammer" solution is to request a quota increase, but if you don't want to do that, there is another way: using the built-in retries of botocore.Config, for example:
import json
import time
from datetime import datetime, timedelta

from boto3 import client
from botocore.config import Config

config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)


def lambda_handler(event, context):
    s3 = client('s3', config=config)
    ### ALL OF YOUR CURRENT PYTHON CODE EXACTLY THE WAY IT IS ###
This config will use an exponentially increasing sleep timer, up to a maximum number of retries. From the docs:
Any retry attempt will include an exponential backoff by a base factor of 2 for a maximum backoff time of 20 seconds.
There is also an adaptive mode which is still experimental. For more info, see the docs on botocore.Config retries
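If you want to try the adaptive mode, the only change needed (as far as I can tell) is the mode value; a sketch:
from botocore.config import Config

# 'adaptive' adds client-side rate limiting on top of the standard retry behavior;
# it is documented as experimental, so treat this config as a sketch.
adaptive_config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    }
)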
Another (much less robust, IMO) option would be to write your own paginator with a sleep programmed in, though you'd probably just want to use the built-in backoff in 99.99% of cases (even if you do have to write your own paginator). Note that this code is untested and isn't asynchronous, so the sleep will be in addition to the wait time for a page response; to make the "sleep time" exactly sleep_secs, you would need to use concurrent.futures or asyncio (the AWS built-in paginators mostly use concurrent.futures):
from typing import Generator
from time import sleep

from boto3 import client


def get_pages(bucket: str, prefix: str, page_size: int, sleep_secs: float) -> Generator:
    s3 = client('s3')
    # First page of results
    page: dict = s3.list_objects_v2(
        Bucket=bucket,
        MaxKeys=page_size,
        Prefix=prefix
    )
    next_token: str = page.get('NextContinuationToken')
    yield page
    # Fetch subsequent pages, sleeping between requests
    while next_token:
        sleep(sleep_secs)
        page = s3.list_objects_v2(
            Bucket=bucket,
            MaxKeys=page_size,
            Prefix=prefix,
            ContinuationToken=next_token
        )
        next_token = page.get('NextContinuationToken')
        yield page
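For reference, a hypothetical usage of the generator above; the bucket name, prefix, page size and sleep interval are just illustrative values:
# Hypothetical usage of get_pages; bucket, prefix and sleep_secs are placeholders.
for page in get_pages(bucket='kush-dragon-data',
                      prefix='inside-bucket-level-1/',
                      page_size=1000,
                      sleep_secs=2.0):
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['LastModified'])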

"Rate of traffic exceeds capacity" error on Google Cloud VertexAI but only sending a single prediction request

As in the title. Exact response:
{
  "error": {
    "code": 429,
    "message": "Rate of traffic exceeds capacity. Ramp your traffic up more slowly. endpoint_id: <My Endpoint>, deployed_model_id: <My model>.",
    "status": "RESOURCE_EXHAUSTED"
  }
}
I send a single prediction request which consists of an instance of one string. The model is a pipeline of a custom tfidf vectorizer and logistic regression. I timed the loading time (~0.5 s) and the prediction time (<0.01 s).
I can confirm through the logs that the prediction is executed successfully, but for some reason this is the response I get. Any ideas?
A few things to consider:
Allow your prediction service to serve using multiple workers.
Increase the number of replicas in Vertex, or set your machine types to stronger types, as long as you gain an improvement (see the sketch after this list).
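For the replica suggestion, a minimal sketch assuming the google-cloud-aiplatform SDK; the project, region, model resource name, machine type and replica counts below are all placeholders:
from google.cloud import aiplatform

# Placeholders throughout: project, region, model resource name, machine type.
aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model("projects/my-project/locations/us-central1/models/MODEL_ID")

# More than one replica (and/or a stronger machine type) gives the endpoint
# some headroom to absorb short traffic bursts.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=2,
    max_replica_count=4,
)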
However, there's something worth doing first on the client side, assuming most of your prediction calls go through successfully and it is not that frequent that the service is unavailable:
Configure your prediction client to use Retry (exponential backoff):
from typing import Dict

import requests.exceptions
from google.api_core import exceptions
from google.api_core.retry import Retry, if_exception_type
from google.auth import exceptions as auth_exceptions

if_error_retriable = if_exception_type(
    exceptions.GatewayTimeout,
    exceptions.TooManyRequests,
    exceptions.ResourceExhausted,
    exceptions.ServiceUnavailable,
    exceptions.DeadlineExceeded,
    requests.exceptions.ConnectionError,  # The last three might be overkill
    requests.exceptions.ChunkedEncodingError,
    auth_exceptions.TransportError,
)


def _get_retry_arg():
    return Retry(
        predicate=if_error_retriable,
        initial=1.0,     # Initial delay
        maximum=4.0,     # Maximum delay
        multiplier=2.0,  # Delay multiplier
        deadline=9.0,    # After 9 seconds it won't retry again and will raise an exception
    )


async def predict_custom_trained_model_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    ...
    response = await client.predict(
        endpoint=endpoint,
        instances=instances,
        parameters=parameters,
        timeout=SOME_VALUE_IN_SEC,
        retry=_get_retry_arg(),
    )

How to run BigQuery after Dataflow job completed successfully

I am trying to run a query in BigQuery right after a dataflow job completes successfully. I have defined 3 different functions in main.py.
The first one runs the Dataflow job, the second one checks the Dataflow job's status, and the last one runs the query in BigQuery.
The trouble is that the second function checks the Dataflow job status multiple times over a period of time, and even after the Dataflow job completes successfully, it does not stop checking the status.
The function deployment then fails due to a 'function load attempt timed out' error.
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
import os
import re
import config
from google.cloud import bigquery
import time

global flag


def trigger_job(gcs_path, body):
    credentials = GoogleCredentials.get_application_default()
    service = build('dataflow', 'v1b3', credentials=credentials, cache_discovery=False)
    request = service.projects().templates().launch(projectId=config.project_id, gcsPath=gcs_path, body=body)
    response = request.execute()


def get_job_status(location, flag):
    credentials = GoogleCredentials.get_application_default()
    dataflow = build('dataflow', 'v1b3', credentials=credentials, cache_discovery=False)
    result = dataflow.projects().jobs().list(projectId=config.project_id, location=location).execute()
    for job in result['jobs']:
        if re.findall(r'' + re.escape(config.job_name) + '', job['name']):
            while flag == 0:
                if job['currentState'] != "JOB_STATE_DONE":
                    print('NOT DONE')
                else:
                    flag = 1
                    print('DONE')
                    break


def bq(sql):
    client = bigquery.Client()
    query_job = client.query(sql, location='US')


gcs_path = config.gcs_path
body = config.body
trigger_job(gcs_path, body)
flag = 0
location = 'us-central1'
get_job_status(location, flag)
sql = """CREATE OR REPLACE TABLE 'table' AS SELECT * FROM 'table'"""
bq(sql)
Cloud Function timeout is set to 540 seconds but deployment fails in 3-4 minutes.
Any help is much appreciated.
It appears from the code snippet provided that your HTTP-triggered Cloud Function is not returning an HTTP response.
All HTTP-based Cloud Functions must return an HTTP response for proper termination. From the Google documentation, Ensure HTTP functions send an HTTP response (emphasis mine):
If your function is HTTP-triggered, remember to send an HTTP response,
as shown below. Failing to do so can result in your function executing
until timeout. If this occurs, you will be charged for the entire
timeout time. Timeouts may also cause unpredictable behavior or cold
starts on subsequent invocations, resulting in unpredictable behavior
or additional latency.
Thus, you must have a function in your main.py that returns some sort of value, ideally a value that can be coerced into a Flask HTTP response.
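A minimal sketch of such an entry point, assuming the trigger_job, get_job_status and bq helpers from the question's main.py; the entry-point name main and the returned message are placeholders:
# Hypothetical HTTP entry point; assumes config, trigger_job, get_job_status and bq
# are defined as in the question's main.py.
def main(request):
    trigger_job(config.gcs_path, config.body)
    get_job_status('us-central1', 0)
    bq("""CREATE OR REPLACE TABLE 'table' AS SELECT * FROM 'table'""")
    # Returning a value lets the HTTP-triggered function terminate cleanly.
    return ('Dataflow job checked and BigQuery query submitted', 200)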

creating export task in aws cloudwatch log

I am trying to create an export task to move CloudWatch logs older than 30 days to an S3 bucket. I am currently following this AWS article: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3ExportTasks.html. I am stuck at the creation of the export task. When I ran the command below
aws logs create-export-task --profile CWLExportUser --task-name "my-log-group-$(date +%Y-%m-%d)" --log-group-name "/aws/lambda/AMICreate" --from 60 --to 155520000 --destination "diego-cw-logs" --destination-prefix "export-task-output"
I got the error below:
An error occurred (InvalidParameterException) when calling the CreateExportTask operation: Specified time range of '60' and '155520000' is not valid. Please make sure the values are within the retention period of the log groups and from value is lesser than the to values
I am missing something; it would be great if someone could lend a hand to fix the issue.
Warmly,
Muneesh
Use https://currentmillis.com/ to convert times to milliseconds. Below is Python code; use it with EventBridge to transfer logs as per your need.
import json
import boto3
import botocore
from datetime import datetime, timedelta


def hello(event, context):
    # Dates calculated in milliseconds
    time_seven_days_ago = (datetime.now() + timedelta(days=-103)).timestamp()
    time_six_days_ago = (datetime.now() + timedelta(days=-102)).timestamp()
    print(int(time_seven_days_ago) * 1000)  # printing to CloudWatch
    print(int(time_six_days_ago) * 1000)  # printing to CloudWatch

    # Boto3 clients for SNS and CloudWatch Logs
    client = boto3.client('sns')
    cli = boto3.client('logs')
    try:
        response = cli.create_export_task(
            taskName='task-name',
            logGroupName='cloudwatch-log-group-name',
            fromTime=int(time_seven_days_ago) * 1000,
            to=int(time_six_days_ago) * 1000,
            destination='s3-bucket-name',
            destinationPrefix='its-your-wish')
        print(response)
        client.publish(
            TopicArn='topic--arn',
            # Sending success notification by email
            Message='Task completed',
            Subject='Create Task Job Status',
        )
    except BaseException as error:
        print('Error From Lambda Function', error)  # Printing error to CloudWatch
        try:
            client.publish(
                TopicArn='topic-arn',
                # Sending error in email
                Message=str(error),
                Subject='Create Task Job Failed',
            )
        except BaseException as fail:
            print('Error while sending email', fail)
--to and --from should be:
The start time of the range for the request, expressed as the number of milliseconds after Jan 1, 1970 00:00:00 UTC
In your case you are using --from 60 --to 155520000, which means you want to export logs from 60 milliseconds after Jan 1, 1970 00:00:00 UTC to 155520000 milliseconds (roughly two days) after that. Obviously this does not make sense.
So basically you have to provide correct timestamps in milliseconds for the range you want to use.
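For example, a small sketch of computing a valid last-24-hours range in epoch milliseconds (the other create-export-task arguments would stay as in the question):
import time
from datetime import timedelta

# Epoch milliseconds for "now" and "24 hours ago".
to_ms = int(time.time() * 1000)
from_ms = to_ms - int(timedelta(days=1).total_seconds() * 1000)

print(f"--from {from_ms} --to {to_ms}")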

Create AWS sagemaker endpoint and delete the same using AWS lambda

Is there a way to create a SageMaker endpoint using AWS Lambda?
The maximum timeout limit for Lambda is 300 seconds, while my existing model takes 5-6 minutes to host.
One way is to combine Lambda and Step Functions, with a wait state, to create the SageMaker endpoint.
In the Step Function, have tasks to:
1 . Launch AWS Lambda to CreateEndpoint
import time
import boto3

client = boto3.client('sagemaker')
endpoint_name = 'DEMO-imageclassification-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
endpoint_config_name = 'DEMO-imageclassification-epc--2018-06-18-17-02-44'
print(endpoint_name)


def lambda_handler(event, context):
    create_endpoint_response = client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name)
    print(create_endpoint_response['EndpointArn'])
    print('EndpointArn = {}'.format(create_endpoint_response['EndpointArn']))

    # get the status of the endpoint
    response = client.describe_endpoint(EndpointName=endpoint_name)
    status = response['EndpointStatus']
    print('EndpointStatus = {}'.format(status))
    return status
2 . Wait task to wait for X minutes
3 . Another Lambda task to check the EndpointStatus and, depending on the EndpointStatus (OutOfService | Creating | Updating | RollingBack | InService | Deleting | Failed), either stop the job or continue polling
import time
import boto3

client = boto3.client('sagemaker')
endpoint_name = 'DEMO-imageclassification-2018-07-20-18-52-30'
endpoint_config_name = 'DEMO-imageclassification-epc--2018-06-18-17-02-44'
print(endpoint_name)


def lambda_handler(event, context):
    # print the current status of the endpoint
    endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
    status = endpoint_response['EndpointStatus']
    print('EndpointStatus = {}'.format(status))

    # wait until the status has changed
    client.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)

    # print the status of the endpoint after waiting
    endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
    status = endpoint_response['EndpointStatus']
    print('Endpoint creation ended with EndpointStatus = {}'.format(status))
    if status != 'InService':
        raise Exception('Endpoint creation failed.')
    return status
Another approach is a combination of AWS Lambda functions and CloudWatch rules, which I think would be clumsy.
While rajesh's answer is closer to what the question asks for, I'd like to add that SageMaker now has a batch transform job.
Instead of continuously hosting a machine, this job can handle predicting large batches at once without caring about latency. So if the intention behind the question is to deploy the model for a short time to predict on a fixed amount of batches, this might be the better approach.
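A minimal sketch of starting such a batch transform job with boto3; the job name, model name, S3 URIs, content type and instance type below are all placeholders:
import boto3

sagemaker = boto3.client('sagemaker')

# All names and S3 locations below are placeholders.
sagemaker.create_transform_job(
    TransformJobName='DEMO-imageclassification-batch',
    ModelName='DEMO-imageclassification-model',
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://my-bucket/batch-input/'
            }
        },
        'ContentType': 'application/x-image'
    },
    TransformOutput={'S3OutputPath': 's3://my-bucket/batch-output/'},
    TransformResources={'InstanceType': 'ml.m5.xlarge', 'InstanceCount': 1}
)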