Creating an export task in AWS CloudWatch Logs

I am trying to create an export task to move CloudWatch logs older than 30 days to an S3 bucket. I am currently following this AWS article: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3ExportTasks.html. I am stuck at the creation of the export task. When I run the command below
aws logs create-export-task --profile CWLExportUser --task-name "my-log-group-$(date +%Y-%m-%d)" --log-group-name "/aws/lambda/AMICreate" --from 60 --to 155520000 --destination "diego-cw-logs" --destination-prefix "export-task-output"
I get the following error:
An error occurred (InvalidParameterException) when calling the CreateExportTask operation: Specified time range of '60' and '155520000' is not valid. Please make sure the values are within the retention period of the log groups and from value is lesser than the to values
I am missing something. It would be great if someone could lend a hand to fix the issue.
Warmly,
Muneesh

Use https://currentmillis.com/ to convert times to milliseconds. Below is Python code; use it with EventBridge to transfer logs on whatever schedule you need.
import json
import boto3
import botocore
from datetime import datetime, timedelta

def hello(event, context):
    # Dates calculated in milliseconds
    time_seven_days_ago = (datetime.now() + timedelta(days=-103)).timestamp()
    time_six_days_ago = (datetime.now() + timedelta(days=-102)).timestamp()
    print(int(time_seven_days_ago) * 1000)  # printing to CloudWatch
    print(int(time_six_days_ago) * 1000)    # printing to CloudWatch

    # Boto3 clients for SNS and CloudWatch Logs
    client = boto3.client('sns')
    cli = boto3.client('logs')
    try:
        response = cli.create_export_task(
            taskName='task-name',
            logGroupName='cloudwatch-log-group-name',
            fromTime=int(time_seven_days_ago) * 1000,
            to=int(time_six_days_ago) * 1000,
            destination='s3-bucket-name',
            destinationPrefix='its-your-wish')
        print(response)
        client.publish(
            TopicArn='topic-arn',
            # Sending success notification by email
            Message='Task completed',
            Subject='Create Task Job Status'
        )
    except BaseException as error:
        print('Error From Lambda Function', error)  # printing the error to CloudWatch
        try:
            client.publish(
                TopicArn='topic-arn',
                # Sending the error by email; the SNS Message must be a string
                Message=str(error),
                Subject='Create Task Job Failed'
            )
        except BaseException as fail:
            print('Error while sending email', fail)
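To run this on a schedule, as the EventBridge suggestion implies, a minimal sketch of wiring a daily rule to the Lambda above; the rule name and function ARN below are placeholders, not values from the original:
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Hypothetical names; replace with your own function ARN and rule name.
FUNCTION_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:export-task-lambda'
RULE_NAME = 'daily-log-export'

# Run the export Lambda once a day.
rule = events.put_rule(Name=RULE_NAME, ScheduleExpression='rate(1 day)')
events.put_targets(Rule=RULE_NAME, Targets=[{'Id': 'export-lambda', 'Arn': FUNCTION_ARN}])

# Allow EventBridge to invoke the function.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId='allow-eventbridge-daily-log-export',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'],
)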

--to and --from should be:
The start time of the range for the request, expressed as the number of milliseconds after Jan 1, 1970 00:00:00 UTC
In your case you are using --from 60 --to 155520000, which means you want to export values from 60 milliseconds after Jan 1, 1970 00:00:00 UTC to 155520000 milliseconds (roughly two days) after it. Obviously this does not make sense.
So you have to provide correct timestamps, in milliseconds, for the range you want to export.
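For example, a minimal sketch of computing the two values in Python, here for the last 30 days (the CLI call in the comment reuses the names from the question):
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
from_ms = int((now - timedelta(days=30)).timestamp() * 1000)
to_ms = int(now.timestamp() * 1000)

# Plug these into the CLI call, e.g.
# aws logs create-export-task --task-name my-export \
#     --log-group-name /aws/lambda/AMICreate \
#     --from <from_ms> --to <to_ms> \
#     --destination diego-cw-logs --destination-prefix export-task-output
print(from_ms, to_ms)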

Related

AWS Error "Calling the invoke API action failed with this message: Rate Exceeded" when I use s3.get_paginator('list_objects_v2')

A third-party application uploads around 10,000 objects a day to my bucket+prefix. My requirement is to fetch all objects that were uploaded to my bucket+prefix in the last 24 hours.
There are a lot of files in my bucket+prefix.
So I assume that when I call
response = s3_paginator.paginate(Bucket=bucket,Prefix='inside-bucket-level-1/', PaginationConfig={"PageSize": 1000})
it makes multiple calls to the S3 API, and maybe that is why it shows the Rate Exceeded error.
Below is my Python Lambda Function.
import json
import boto3
import time
from datetime import datetime, timedelta

def lambda_handler(event, context):
    s3 = boto3.client("s3")
    from_date = datetime.today() - timedelta(days=1)
    string_from_date = from_date.strftime("%Y-%m-%d, %H:%M:%S")
    print("Date :", string_from_date)
    s3_paginator = s3.get_paginator('list_objects_v2')
    list_of_buckets = ['kush-dragon-data']
    bucket_wise_list = {}
    for bucket in list_of_buckets:
        response = s3_paginator.paginate(Bucket=bucket, Prefix='inside-bucket-level-1/', PaginationConfig={"PageSize": 1000})
        filtered_iterator = response.search(
            "Contents[?to_string(LastModified)>='\"" + string_from_date + "\"'].Key")
        keylist = []
        for key_data in filtered_iterator:
            if "/" in key_data:
                splitted_array = key_data.split("/")
                if len(splitted_array) > 1:
                    if splitted_array[-1]:
                        keylist.append(splitted_array[-1])
            else:
                keylist.append(key_data)
        bucket_wise_list.update({bucket: keylist})
    print("Total Number Of Object = ", bucket_wise_list)
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps(bucket_wise_list)
    }
When we execute the above Lambda function, it shows the error below.
"Calling the invoke API action failed with this message: Rate Exceeded."
Can anyone help resolve this error and achieve my requirement?
This is probably due to your account's API rate restrictions; you should add retries with a few seconds between attempts, or increase the page size.
This is most likely due to you reaching your quota limit for AWS S3 API calls. The "bigger hammer" solution is to request a quota increase, but if you don't want to do that, there is another way, using botocore.Config's built-in retries, for example:
import json
import time
from datetime import datetime, timedelta
from boto3 import client
from botocore.config import Config

config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)

def lambda_handler(event, context):
    s3 = client('s3', config=config)
    ### ALL OF YOUR CURRENT PYTHON CODE EXACTLY THE WAY IT IS ###
This config will use an exponentially increasing sleep timer up to a maximum number of retries. From the docs:
Any retry attempt will include an exponential backoff by a base factor of 2 for a maximum backoff time of 20 seconds.
There is also an adaptive mode, which is still experimental. For more info, see the botocore docs on Config retries.
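For reference, a sketch of switching the same config to the experimental adaptive mode (everything else stays as above):
from botocore.config import Config

# 'adaptive' adds client-side rate limiting on top of the standard retry behaviour.
adaptive_config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    }
)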
Another (much less robust, IMO) option would be to write your own paginator with a sleep programmed in, though you'd probably just want to use the built-in backoff in 99.99% of cases (even if you do have to write your own paginator). (This code is untested and isn't even asynchronous, so the sleep will be in addition to the wait time for a page response. To make the "sleep time" exactly sleep_secs, you'd need to use concurrent.futures or asyncio; AWS's built-in paginators mostly use concurrent.futures.)
from boto3 import client
from typing import Generator
from time import sleep

def get_pages(bucket: str, prefix: str, page_size: int, sleep_secs: float) -> Generator:
    s3 = client('s3')
    page: dict = s3.list_objects_v2(
        Bucket=bucket,
        MaxKeys=page_size,
        Prefix=prefix
    )
    next_token: str = page.get('NextContinuationToken')
    yield page
    while next_token:
        sleep(sleep_secs)
        page = s3.list_objects_v2(
            Bucket=bucket,
            MaxKeys=page_size,
            Prefix=prefix,
            ContinuationToken=next_token
        )
        next_token = page.get('NextContinuationToken')
        yield page
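Usage would then look something like this (the bucket, prefix and pacing values are placeholders taken from the question):
for page in get_pages('kush-dragon-data', 'inside-bucket-level-1/', page_size=1000, sleep_secs=2.0):
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['LastModified'])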

How to download new uploaded files from s3 to ec2 everytime

I have an S3 bucket which will receive new files throughout the day. I want to download these to my EC2 instance every time a new file is uploaded to the bucket.
I have read that it's possible using SQS, SNS or Lambda. Which is the easiest of them all? I need the file to be downloaded as early as possible once it is uploaded to the bucket.
EDIT
I will basically be getting PNG images in the bucket every few seconds or minutes. Every time a new image is uploaded, I want to download it to the instance, which is already running, and do some AI processing on it. As the images keep coming into the bucket, I want to constantly keep downloading them to the EC2 instance and process them as soon as possible.
This is my code in the Lambda function so far.
import boto3
import json
import time

def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    #print(event)
    s3 = boto3.client("s3")
    client = boto3.client("ec2")
    ssm = boto3.client("ssm")
    instanceid = "******"
    if event:
        file_obj = event["Records"][0]
        #print(file_obj)
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        response = ssm.send_command(
            InstanceIds=[instanceid],
            DocumentName="AWS-RunShellScript",
            Parameters={
                "commands": [f"aws s3 cp {filename} ."]
            },  # replace command_to_be_executed with command
        )
        # fetching command id for the output
        command_id = response["Command"]["CommandId"]
        time.sleep(3)
        # fetching command output
        output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
        print(output)
    return
However, I am getting the following error:
Test Event Name
test
Response
{
"errorMessage": "2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds"
}
Function Logs
START RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Version: $LATEST
END RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936
REPORT RequestId: 88dbe51b-53d6-4c06-8c16-207698b3a936 Duration: 3003.58 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 87 MB Init Duration: 314.81 ms
2021-12-01T14:11:30.781Z 88dbe51b-53d6-4c06-8c16-207698b3a936 Task timed out after 3.00 seconds
Request ID
88dbe51b-53d6-4c06-8c16-207698b3a936
When I remove all the lines related to SSM, it works fine. Is there a permission issue, or is there a problem with the code?
EDIT2
My code is working, but I don't see any output or change on my EC2 instance. I should be seeing an empty text file in the home directory, but I don't see anything.
Code
import boto3
import json
import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """Read file from s3 on trigger."""
    #print(event)
    s3 = boto3.client("s3")
    client = boto3.client("ec2")
    ssm = boto3.client("ssm")
    instanceid = "******"
    print("HI")
    if event:
        file_obj = event["Records"][0]
        #print(file_obj)
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        print(bucketname)
        filename = str(file_obj["s3"]["object"]["key"])
        print(filename)
        print("sending")
        try:
            response = ssm.send_command(
                InstanceIds=[instanceid],
                DocumentName="AWS-RunShellScript",
                Parameters={
                    "commands": ["touch hi.txt"]
                },  # replace command_to_be_executed with command
            )
            # fetching command id for the output
            command_id = response["Command"]["CommandId"]
            time.sleep(3)
            # fetching command output
            output = ssm.get_command_invocation(CommandId=command_id, InstanceId=instanceid)
            print(output)
        except Exception as e:
            logger.error(e)
            raise e
There are several ways. One would be to set up S3 notifications to invoke a Lambda function. The Lambda function would then use SSM Run Command to execute an AWS CLI S3 command on your instance to download the file from S3.
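A minimal sketch of that SSM call from the Lambda side; the instance ID, bucket and key are whatever your handler extracts from the event, and note that the s3 cp command needs the full s3:// URI built from the bucket and key:
import boto3

ssm = boto3.client("ssm")

def download_on_instance(instance_id, bucket, key):
    # Runs an AWS CLI copy on the EC2 instance via SSM Run Command.
    # The instance needs the SSM agent, an instance profile that allows SSM,
    # and credentials that can read the bucket.
    return ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [f"aws s3 cp s3://{bucket}/{key} /home/ec2-user/"]},
    )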
I don't know why Lambda is being recommended here. What you need is simple: an S3 object-created event notification -> SQS, and a job on your EC2 instance watching the queue with long polling.
Here is an example of such a Python script. You need to sort out how the object key is encoded in the event, but it will be there. I haven't tested this, but it should be pretty close.
import boto3

def main() -> None:
    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    while True:
        res = sqs.receive_message(
            QueueUrl="yourQueue",
            WaitTimeSeconds=20,
        )
        for msg in res.get("Messages", []):
            # The object key has to be extracted from the S3 event JSON in the
            # message body; "key" here is a stand-in for that lookup.
            s3.download_file("yourBucket", msg["key"], "local/file/path")
            # Delete the message so it is not delivered again.
            sqs.delete_message(QueueUrl="yourQueue", ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    main()
You can use S3 Event Notifications, which react to a new file coming into the S3 bucket.
The destinations supported by S3 events are SNS, SQS and AWS Lambda.
You can directly use Lambda as the destination, as described by @Marcin.
You can use SQS as a queue with a Lambda behind it pulling from the queue. It gives you extra capabilities such as a dead-letter queue. You can then pull messages from the queue using different methods:
AWS CLI
AWS SDK
You can use SNS with different destinations behind it (you can have many of these destinations at once, which is the fan-out pattern):
a SQS queue to manage the files
an email to notify
a lambda function
...
You can find more explanation in this article: https://aws.plainenglish.io/system-design-s3-events-to-lambda-vs-s3-events-to-sqs-sns-to-lambda-2d41477d1cc9
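For the SQS route, a sketch of wiring the bucket notification to a queue with boto3; the bucket name, queue ARN and prefix are placeholders, and the queue's access policy must also allow s3.amazonaws.com to send messages:
import boto3

s3 = boto3.client("s3")

# Send an event to the queue for every new .png object under the prefix.
s3.put_bucket_notification_configuration(
    Bucket="your-image-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:new-image-queue",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "incoming/"},
                            {"Name": "suffix", "Value": ".png"},
                        ]
                    }
                },
            }
        ]
    },
)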

How to run BigQuery after Dataflow job completed successfully

I am trying to run a query in BigQuery right after a dataflow job completes successfully. I have defined 3 different functions in main.py.
The first one runs the Dataflow job. The second one checks the Dataflow job's status. And the last one runs the query in BigQuery.
The trouble is that the second function checks the Dataflow job status repeatedly, and even after the Dataflow job completes successfully, it does not stop checking the status.
The function deployment then fails with a 'function load attempt timed out' error.
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
import os
import re
import config
from google.cloud import bigquery
import time

global flag

def trigger_job(gcs_path, body):
    credentials = GoogleCredentials.get_application_default()
    service = build('dataflow', 'v1b3', credentials=credentials, cache_discovery=False)
    request = service.projects().templates().launch(projectId=config.project_id, gcsPath=gcs_path, body=body)
    response = request.execute()

def get_job_status(location, flag):
    credentials = GoogleCredentials.get_application_default()
    dataflow = build('dataflow', 'v1b3', credentials=credentials, cache_discovery=False)
    result = dataflow.projects().jobs().list(projectId=config.project_id, location=location).execute()
    for job in result['jobs']:
        if re.findall(r'' + re.escape(config.job_name) + '', job['name']):
            while flag == 0:
                if job['currentState'] != "JOB_STATE_DONE":
                    print('NOT DONE')
                else:
                    flag = 1
                    print('DONE')
                    break

def bq(sql):
    client = bigquery.Client()
    query_job = client.query(sql, location='US')

gcs_path = config.gcs_path
body = config.body
trigger_job(gcs_path, body)
flag = 0
location = 'us-central1'
get_job_status(location, flag)
sql = """CREATE OR REPLACE TABLE 'table' AS SELECT * FROM 'table'"""
bq(SQL)
The Cloud Function timeout is set to 540 seconds, but the deployment fails in 3-4 minutes.
Any help is much appreciated.
It appears from the code snippet provided that your HTTP-triggered Cloud Function is not returning an HTTP response.
All HTTP-based Cloud Functions must return an HTTP response for proper termination. From the Google documentation, Ensure HTTP functions send an HTTP response (emphasis mine):
If your function is HTTP-triggered, remember to send an HTTP response,
as shown below. Failing to do so can result in your function executing
until timeout. If this occurs, you will be charged for the entire
timeout time. Timeouts may also cause unpredictable behavior or cold
starts on subsequent invocations, resulting in unpredictable behavior
or additional latency.
Thus, you must have a function in your main.py that returns some sort of value, ideally a value that can be coerced into a Flask HTTP response.
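A minimal sketch of what that could look like for an HTTP-triggered entry point here, reusing the names from the question; job_is_done is a hypothetical helper that re-fetches the job's currentState (unlike the original loop, which never refreshes it), and the polling is bounded so the function always returns:
def main(request):
    """HTTP-triggered entry point: launch the job, poll with a bound, then query."""
    trigger_job(config.gcs_path, config.body)

    # Poll the job state a limited number of times instead of looping forever.
    for _ in range(30):
        if job_is_done('us-central1'):   # hypothetical helper that re-fetches currentState
            bq(sql)
            return 'Dataflow job done, BigQuery query submitted', 200
        time.sleep(10)

    return 'Timed out waiting for the Dataflow job', 504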

How to troubleshoot and solve lambda function issue?

import sys
import botocore
import boto3
from botocore.exceptions import ClientError

def lambda_handler(event, context):
    # TODO implement
    rds = boto3.client('rds')
    lambdaFunc = boto3.client('lambda')
    print 'Trying to get Environment variable'
    try:
        funcResponse = lambdaFunc.get_function_configuration(
            FunctionName='RDSInstanceStart'
        )
        #print (funcResponse)
        DBinstance = funcResponse['Environment']['Variables']['DBInstanceName']
        print 'Starting RDS service for DBInstance : ' + DBinstance
    except ClientError as e:
        print(e)
    try:
        response = rds.start_db_instance(
            DBInstanceIdentifier=DBinstance
        )
        print 'Success :: '
        return response
    except ClientError as e:
        print(e)
    return {
        'message': "Script execution completed. See Cloudwatch logs for complete output"
    }
I have a running RDS instance. Every day I start and stop my AWS RDS instance (db.t2.micro, MSSQL Server) using a Lambda function. It was working fine previously, but today I unexpectedly faced an issue.
My RDS instance is not being started automatically by the Lambda function. I looked at the error log, but there is no issue there; it looks like the usual daily log. I am unable to troubleshoot and solve the issue. Can anyone tell me what is going on?
FYI, a shortened version would be:
import boto3
import os

def lambda_handler(event, context):
    rds_client = boto3.client('rds')
    response = rds_client.start_db_instance(DBInstanceIdentifier=os.environ['DBInstanceName'])
    print(response)
You can see the logs of each Lambda call in CloudWatch, or via AWS Lambda -> Monitoring -> View logs in CloudWatch. This will open a page with the logs of each Lambda call.
If there are no logs, it means that the Lambda is not being invoked.
You can check whether the roles and policies assigned to the Lambda are correct.
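For example, a sketch of listing the policies attached to the function's execution role with boto3 (the function name follows the question; adjust for your setup):
import boto3

lambda_client = boto3.client('lambda')
iam = boto3.client('iam')

# Find the execution role of the function, then list its attached policies.
role_arn = lambda_client.get_function_configuration(FunctionName='RDSInstanceStart')['Role']
role_name = role_arn.split('/')[-1]
for policy in iam.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']:
    print(policy['PolicyName'], policy['PolicyArn'])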
You should print the response of the API you use to start the DB (e.g. start-db-instance). The response will be printed to the CloudWatch log.
https://docs.aws.amazon.com/cli/latest/reference/rds/start-db-instance.html
For later automation, you might want to create a metric filter on the Lambda's CloudWatch Logs for a certain keyword, such as:
"\"DBInstanceStatus\": \"starting\""
An alarm can be created on that metric with, say, a threshold of < 1. If the keyword is not found in a log, the metric pushes no value (you can customize this under the Advanced options), the alarm goes into INSUFFICIENT_DATA, and you can set up an SNS notification for INSUFFICIENT_DATA.
You can also tweak the alarm to treat missing data as bad, so the alarm transitions to the ALARM state when the metric filter does not match the incoming logs.
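A sketch of setting that up with boto3; the log group, metric namespace and SNS topic ARN are placeholders:
import boto3

logs = boto3.client('logs')
cloudwatch = boto3.client('cloudwatch')

# Count log lines showing the instance entering the 'starting' state.
logs.put_metric_filter(
    logGroupName='/aws/lambda/RDSInstanceStart',
    filterName='rds-starting',
    filterPattern='"\\"DBInstanceStatus\\": \\"starting\\""',
    metricTransformations=[{
        'metricName': 'RdsStartDetected',
        'metricNamespace': 'Custom/RDS',
        'metricValue': '1',
    }],
)

# Alarm when the keyword is missing; missing data is treated as breaching.
cloudwatch.put_metric_alarm(
    AlarmName='rds-start-missing',
    Namespace='Custom/RDS',
    MetricName='RdsStartDetected',
    Statistic='Sum',
    Period=86400,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='LessThanThreshold',
    TreatMissingData='breaching',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:rds-start-alerts'],
)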

Dataflow stops streaming to BigQuery without errors

We started using Dataflow to read from Pub/Sub and stream to BigQuery.
Dataflow should work 24/7, because Pub/Sub is constantly updated with analytics data from multiple websites around the world.
The code looks like this:
from __future__ import absolute_import
import argparse
import json
import logging

import apache_beam as beam
from apache_beam.io import ReadFromPubSub, WriteToBigQuery
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions

logger = logging.getLogger()

TABLE_IDS = {
    'table_1': 0,
    'table_2': 1,
    'table_3': 2,
    'table_4': 3,
    'table_5': 4,
    'table_6': 5,
    'table_7': 6,
    'table_8': 7,
    'table_9': 8,
    'table_10': 9,
    'table_11': 10,
    'table_12': 11,
    'table_13': 12
}

def separate_by_table(element, num):
    return TABLE_IDS[element.get('meta_type')]

class ExtractingDoFn(beam.DoFn):
    def process(self, element):
        yield json.loads(element)

def run(argv=None):
    """Main entry point; defines and runs the wordcount pipeline."""
    logger.info('STARTED!')
    parser = argparse.ArgumentParser()
    parser.add_argument('--topic',
                        dest='topic',
                        default='projects/PROJECT_NAME/topics/TOPICNAME',
                        help='Cloud topic in the form "projects/<project>/topics/<topic>"')
    parser.add_argument('--table',
                        dest='table',
                        default='PROJECTNAME:DATASET_NAME.event_%s',
                        help='BigQuery table in the form "PROJECT:DATASET.TABLE"')
    known_args, pipeline_args = parser.parse_known_args(argv)

    # We use the save_main_session option because one or more DoFn's in this
    # workflow rely on global context (e.g., a module imported at module level).
    pipeline_options = PipelineOptions(pipeline_args)
    pipeline_options.view_as(SetupOptions).save_main_session = True
    p = beam.Pipeline(options=pipeline_options)

    lines = p | ReadFromPubSub(known_args.topic)
    datas = lines | beam.ParDo(ExtractingDoFn())
    by_table = datas | beam.Partition(separate_by_table, 13)

    # Create a stream for each table
    for table, id in TABLE_IDS.items():
        by_table[id] | 'write to %s' % table >> WriteToBigQuery(known_args.table % table)

    result = p.run()
    result.wait_until_finish()

if __name__ == '__main__':
    logger.setLevel(logging.INFO)
    run()
It works fine, but after some time (2-3 days) it stops streaming for some reason.
When I check the job status, there are no errors in the logs section (you know, the ones marked with a red "!" in Dataflow's job details). If I cancel the job and run it again, it starts working again as usual.
If I check Stackdriver for additional logs, here are all the errors that happened:
Here are some warnings that occur periodically while the job executes:
Details of one of them:
{
  insertId: "397122810208336921:865794:0:479132535"
  jsonPayload: {
    exception: "java.lang.IllegalStateException: Cannot be called on unstarted operation.
      at com.google.cloud.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.getElementsSent(RemoteGrpcPortWriteOperation.java:111)
      at com.google.cloud.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.updateProgress(BeamFnMapTaskExecutor.java:293)
      at com.google.cloud.dataflow.worker.fn.control.BeamFnMapTaskExecutor$SingularProcessBundleProgressTracker.periodicProgressUpdate(BeamFnMapTaskExecutor.java:280)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      "
    job: "2018-11-30_10_35_19-13557985235326353911"
    logger: "com.google.cloud.dataflow.worker.fn.control.BeamFnMapTaskExecutor"
    message: "Progress updating failed 4 times. Following exception safely handled."
    stage: "S0"
    thread: "62"
    work: "c-8756541438010208464"
    worker: "beamapp-vitar-1130183512--11301035-mdna-harness-lft7"
  }
  labels: {
    compute.googleapis.com/resource_id: "397122810208336921"
    compute.googleapis.com/resource_name: "beamapp-vitar-1130183512--11301035-mdna-harness-lft7"
    compute.googleapis.com/resource_type: "instance"
    dataflow.googleapis.com/job_id: "2018-11-30_10_35_19-13557985235326353911"
    dataflow.googleapis.com/job_name: "beamapp-vitar-1130183512-742054"
    dataflow.googleapis.com/region: "europe-west1"
  }
  logName: "projects/PROJECTNAME/logs/dataflow.googleapis.com%2Fharness"
  receiveTimestamp: "2018-12-03T20:33:00.444208704Z"
  resource: {
    labels: {
      job_id: "2018-11-30_10_35_19-13557985235326353911"
      job_name: "beamapp-vitar-1130183512-742054"
      project_id: PROJECTNAME
      region: "europe-west1"
    }
    type: "dataflow_step"
  }
  severity: "WARNING"
  timestamp: "2018-12-03T20:32:59.442Z"
}
Here's the moment when it seems to start having problems:
Additional info messages that may help:
According to these messages, we don't run out of memory, processing power, etc. The job is run with these parameters:
python -m start --streaming True --runner DataflowRunner --project PROJECTNAME --temp_location gs://BUCKETNAME/tmp/ --region europe-west1 --disk_size_gb 30 --machine_type n1-standard-1 --use_public_ips false --num_workers 1 --max_num_workers 1 --autoscaling_algorithm NONE
What could be the problem here?
This isn't really an answer, more a help in identifying the cause: so far, all streaming Dataflow jobs I've launched using the Python SDK have stopped that way after some days, whether they use BigQuery as a sink or not. So the cause seems to be the general fact that streaming jobs with the Python SDK are still in beta.
My personal solution: use the Google-provided Dataflow templates to stream from Pub/Sub to BigQuery (thus avoiding the Python SDK), then schedule queries in BigQuery to periodically process the data. Unfortunately, that might not be appropriate for your use case.
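A sketch of launching such a template from Python, reusing the googleapiclient pattern from the earlier question; the template path and parameter names (inputTopic, outputTableSpec) are my assumption about the Google-provided Pub/Sub-to-BigQuery template, so check the template's documentation before relying on them:
from googleapiclient.discovery import build

def launch_pubsub_to_bq(project, region, topic, table, temp_bucket):
    # Launches the (assumed) Google-provided streaming template.
    dataflow = build('dataflow', 'v1b3', cache_discovery=False)
    body = {
        'jobName': 'pubsub-to-bq',
        'parameters': {
            'inputTopic': topic,        # projects/<project>/topics/<topic>
            'outputTableSpec': table,   # <project>:<dataset>.<table>
        },
        'environment': {'tempLocation': f'gs://{temp_bucket}/tmp'},
    }
    return dataflow.projects().locations().templates().launch(
        projectId=project,
        location=region,
        gcsPath='gs://dataflow-templates/latest/PubSub_to_BigQuery',
        body=body,
    ).execute()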
In my company we are experiencing the exact same problem described by the OP, with a similar use case.
Unfortunately the problem is real and concrete, and it apparently occurs at random.
As a workaround, we are considering rewriting our pipeline using the Java SDK.
I had a similar issue and found that the warning logs contained a Python stack trace hidden in the Java logs, advising of errors.
These errors were continually retried by the workers, causing them to crash and completely freeze the pipeline. I initially thought the number of workers was too low, so I scaled them up, but the pipeline just took longer to freeze.
I ran the pipeline locally, exported the Pub/Sub messages as text, and identified that they contained dirty data (messages that did not match the BQ table schema). As I had no exception handling, that seemed to be the cause of the pipeline freezing.
Adding a function that only accepts records whose first key matches the expected column of the BQ schema fixed my issue, and the Dataflow job has been running with no issues since:
def bad_records(row):
    if 'key1' in row:
        yield row
    else:
        print('bad row', row)

# then, in the pipeline, before the BigQuery write:
| 'exclude bad records' >> beam.ParDo(bad_records)