I am attempting to write a simple Lambda function to query a table in Athena, but after a few seconds I see "Status: FAILED" in the CloudWatch logs.
There is no descriptive error message explaining the cause of the failure.
My test code is below:
import json
import time
import boto3

# athena constant
DATABASE = 'default'
TABLE = 'test'
# S3 constant
S3_OUTPUT = 's3://test-output/'
# number of retries
RETRY_COUNT = 1000

def lambda_handler(event, context):
    # created query
    query = "SELECT * FROM default.test limit 2"
    # % (DATABASE, TABLE)
    # athena client
    client = boto3.client('athena')
    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': S3_OUTPUT,
        }
    )
    # get query execution id
    query_execution_id = response['QueryExecutionId']
    print(query_execution_id)
    # get execution status
    for i in range(1, 1 + RETRY_COUNT):
        # get query execution
        query_status = client.get_query_execution(QueryExecutionId=query_execution_id)
        query_execution_status = query_status['QueryExecution']['Status']['State']
        if query_execution_status == 'SUCCEEDED':
            print("STATUS:" + query_execution_status)
            break
        if query_execution_status == 'FAILED':
            #raise Exception("STATUS:" + query_execution_status)
            print("STATUS:" + query_execution_status)
        else:
            print("STATUS:" + query_execution_status)
            time.sleep(i)
    else:
        # Did not encounter a break event. Need to kill the query
        client.stop_query_execution(QueryExecutionId=query_execution_id)
        raise Exception('TIME OVER')
    # get query results
    result = client.get_query_results(QueryExecutionId=query_execution_id)
    print(result)
    return
The logs show the following:
2020-08-31T10:52:12.443-04:00
START RequestId: e5434651-d36e-48f0-8f27-0290 Version: $LATEST
2020-08-31T10:52:13.481-04:00
88162f38-bfcb-40ae-b4a3-0b5a21846e28
2020-08-31T10:52:13.500-04:00
STATUS:QUEUED
2020-08-31T10:52:14.519-04:00
STATUS:RUNNING
2020-08-31T10:52:16.540-04:00
STATUS:RUNNING
2020-08-31T10:52:19.556-04:00
STATUS:RUNNING
2020-08-31T10:52:23.574-04:00
STATUS:RUNNING
2020-08-31T10:52:28.594-04:00
STATUS:FAILED
2020-08-31T10:52:28.640-04:00
....more status: FAILED
....
END RequestId: e5434651-d36e-48f0-8f27-0290
REPORT RequestId: e5434651-d36e-48f0-8f27-0290 Duration: 30030.22 ms Billed Duration: 30000 ms Memory Size: 128 MB Max Memory Used: 72 MB Init Duration: 307.49 ms
2020-08-31T14:52:42.473Z e5434651-d36e-48f0-8f27-0290 Task timed out after 30.03 seconds
I think I have the right permissions for S3 bucket access given to the role (if not, I would have seen the error message). There are no files created in the bucket either. I am not sure what is going wrong here. What am I missing?
Thanks
The last line in your log shows
2020-08-31T14:52:42.473Z e5434651-d36e-48f0-8f27-0290 Task timed out after 30.03 seconds
To me this looks like the timeout of the Lambda Function is set to 30 seconds. Try increasing it to more than the time the Athena query needs (the maximum is 15 minutes).
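Separately from the timeout, the log shows the query itself ending in FAILED, and the loop in the question keeps polling after that state instead of stopping. A minimal sketch of a polling helper that stops on any terminal state and surfaces the StateChangeReason field that get_query_execution returns in its Status block is shown below. The function name wait_for_query and its parameters are made up for illustration; the client is assumed to be a boto3 Athena client created elsewhere.

```python
import time


def wait_for_query(client, query_execution_id, max_polls=20, delay_seconds=1.0):
    """Poll Athena until the query reaches a terminal state.

    Returns the final Status dict on SUCCEEDED, raises RuntimeError on
    FAILED or CANCELLED, and raises TimeoutError if max_polls is exhausted.
    """
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_execution_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return status
        if state in ("FAILED", "CANCELLED"):
            # StateChangeReason carries Athena's explanation when it provides one.
            reason = status.get("StateChangeReason", "no reason reported")
            raise RuntimeError("Athena query %s: %s" % (state, reason))
        # QUEUED or RUNNING: wait before asking again.
        time.sleep(delay_seconds)
    # Still not finished: cancel the query so it does not keep running.
    client.stop_query_execution(QueryExecutionId=query_execution_id)
    raise TimeoutError("Athena query did not finish after %d polls" % max_polls)
```

Keeping max_polls * delay_seconds below the Lambda timeout lets the function report the failure reason itself rather than being cut off with "Task timed out".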
I am working in AWS GovCloud and have the following configuration in AWS Lambda:
A Lambda function which decodes a payload
A Kinesis Stream set as a trigger for the aforementioned function
A Lambda Destination (we have tried Lambda functions as well as SQS, SNS)
No matter the configuration, I cannot seem to get Lambda to trigger the destination function (or queue in the event of SQS).
Here is the current Lambda function. I have tried many permutations of the result/return payload, to no avail.
import base64
import json

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    for record in event['Records']:
        payload = base64.b64decode(record['kinesis']['data']).decode('utf-8', 'ignore')
        print("Success")
        result = {
            "statusCode": 202,
            "headers": {
                #'Content-Type': 'application/json',
            },
            "body": '{payload}'
        }
        return json.dumps(result)
I then send a message to Kinesis with the AWS CLI (I have noted that "Test" in the console does not observe destinations, as per Jared Short).
Every 0.1s: aws kinesis put-records --stream-name test-stream --records Data=SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IGZyb20gdGhlIEFXUyBDTEkh,PartitionKey=partitionkey1 Thu Jul 8 19:03:54 2021
{
"FailedRecordCount": 0,
"Records": [
{
"SequenceNumber": "49619938447946944252072058244333476686328287240252293122",
"ShardId": "shardId-000000000000"
}
]
}
Using CloudWatch metrics and logs, I am able to observe the function being triggered by the messages sent to Kinesis every 0.1 seconds.
The metrics charts indicate a success (as I expect).
Here is an example log from CloudWatch:
START RequestId: 0cf3fb87-06e6-4e35-9de8-b30147e7be9d Version: $LATEST
Loading function
Success
END RequestId: 0cf3fb87-06e6-4e35-9de8-b30147e7be9d
REPORT RequestId: 0cf3fb87-06e6-4e35-9de8-b30147e7be9d Duration: 1.27 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 51 MB Init Duration: 113.64 ms
START RequestId: e663fa4a-2d0b-42d6-9e38-599712b71101 Version: $LATEST
Success
END RequestId: e663fa4a-2d0b-42d6-9e38-599712b71101
REPORT RequestId: e663fa4a-2d0b-42d6-9e38-599712b71101 Duration: 1.04 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 51 MB
START RequestId: b1373bbe-d2c6-49fb-a71f-dcedaf9210eb Version: $LATEST
Success
END RequestId: b1373bbe-d2c6-49fb-a71f-dcedaf9210eb
REPORT RequestId: b1373bbe-d2c6-49fb-a71f-dcedaf9210eb Duration: 0.98 ms Billed Duration: 1 ms Memory Size: 128 MB Max Memory Used: 51 MB
START RequestId: e0382653-9c33-44d6-82a7-a82f0f416297 Version: $LATEST
Success
END RequestId: e0382653-9c33-44d6-82a7-a82f0f416297
REPORT RequestId: e0382653-9c33-44d6-82a7-a82f0f416297 Duration: 1.05 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 51 MB
START RequestId: f9600ef5-419f-4271-9680-7368ccc5512d Version: $LATEST
Success
However, viewing the CloudWatch logs/metrics for the destination Lambda function or SQS queue clearly shows that the destination is not being triggered.
Over the course of troubleshooting, I have over-provisioned IAM policies to the Lambda function execution role so I am fairly confident that it is not an IAM related issue. Additionally, both functions are sharing the same execution role.
One thing I am not clear on after reviewing AWS documentation and third-party information is the criteria by which AWS determines success or failure for a given function. I am currently researching the invocation docs in search of what might be wrong here, but my interpretation is that AWS knows our function is successful based on the above CloudWatch metrics showing a 100% success rate.
Does anyone know what I am doing wrong or how to try to troubleshoot the destination trigger for lambda?
Edit: As pointed out, the code is not correct for multiple-record events. This is a result of aimless troubleshooting changes to the code while trying to get the destination to trigger. Even something as simple as this does not invoke the destination.
import base64
import json

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    # for record in event['Records']:
    #     payload = base64.b64decode(record['kinesis']['data']).decode('utf-8', 'ignore')
    #     print("Success")
    #     result = {
    #         "statusCode": 202,
    #         "headers": {
    #             'Content-Type': 'application/json',
    #         },
    #         "body": '{"Success":True, payload}'
    #     }
    return { "result": "OK" }
So, the question: can someone demonstrate that it is possible to have a Kinesis stream event source trigger a Lambda function which successfully triggers a Lambda destination in AWS GovCloud?
I need help: my Lambda function is not writing the Athena result file to the S3 bucket.
What I did:
import time
import boto3

query = 'SELECT * FROM db_lambda.tb_inicial limit 10'
DATABASE = 'db_lambda'
output = 's3://bucket-lambda-test1/result/'

def lambda_handler(event, context):
    client = boto3.client('athena')
    # Execution
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            # The key must be the string 'Database'
            'Database': DATABASE
        },
        ResultConfiguration={
            'OutputLocation': output,
        }
    )
    return response
    return
IAM role created with:
AmazonS3FullAccess
AmazonAthenaFullAccess
CloudWatchLogsFullAccess
AmazonVPCFullAccess
AWSLambda_FullAccess
Output when running the Lambda function:
Response:
{
  "statusCode": 200,
  "body": "\"Hello from Lambda!\""
}
Request ID:
"f2dd5cd2-070c-41ea-939f-d4909ce39fd0"
Function logs:
START RequestId: f2dd5cd2-070c-41ea-939f-d4909ce39fd0 Version: $LATEST
END RequestId: f2dd5cd2-070c-41ea-939f-d4909ce39fd0
REPORT RequestId: f2dd5cd2-070c-41ea-939f-d4909ce39fd0 Duration: 0.84 ms Billed Duration: 1 ms Memory Size: 128 MB Max Memory Used: 52 MB
How I did the test:
Configure test event
A function can have a maximum of 10 test events. The events are maintained, so that you can change your computer or web browser and test the function with the same events.
Create new test event
Edit saved test events
Test event saved
{
}
The "Hello from Lambda" message is the default code in a Lambda function. It would appear that you did not click 'Deploy' before testing the function. Clicking Deploy will save the Lambda code.
Also, once you get it running, please note that start_query_execution() will simply start the Athena query. You will need to use get_query_results() to obtain the results.
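As a rough illustration of that last step, the sketch below turns the response of get_query_results() into a list of dictionaries. It assumes the query has already reached the SUCCEEDED state and that the response has the usual ResultSet layout (Rows of Data cells holding VarCharValue, plus ResultSetMetadata with ColumnInfo). The helper name rows_from_results is made up for this example.

```python
def rows_from_results(result):
    """Convert a get_query_results response into a list of dicts keyed by column name."""
    result_set = result["ResultSet"]
    columns = [col["Name"] for col in result_set["ResultSetMetadata"]["ColumnInfo"]]
    rows = []
    for row in result_set["Rows"]:
        # A NULL cell has no VarCharValue key, so .get() yields None for it.
        values = [cell.get("VarCharValue") for cell in row["Data"]]
        rows.append(dict(zip(columns, values)))
    # For SELECT queries the first row of the first page repeats the column names.
    if rows and list(rows[0].values()) == columns:
        rows = rows[1:]
    return rows
```

Inside the handler this would be called as rows_from_results(client.get_query_results(QueryExecutionId=query_execution_id)), where query_execution_id comes from the start_query_execution() response. Larger result sets are paginated, which this sketch does not handle.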
Is there a way to delete all AWS Log Groups that haven't had any writes to them in the past 30 days?
Or conversely, get the list of log groups that haven't had anything written to them in the past 30 days?
Here is a quick script I wrote:
#!/usr/bin/python3
# describe log groups
# describe log streams
# get log groups with the lastEventTimestamp after some time
# delete those log groups
# have a dry run option
# support profile
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html#CloudWatchLogs.Client.describe_log_streams
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html#CloudWatchLogs.Client.describe_log_groups
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/logs.html#CloudWatchLogs.Client.delete_log_group
import boto3
import time

millis = int(round(time.time() * 1000))
delete = False
debug = False
log_group_prefix='/' # NEED TO CHANGE THESE
days = 30

# Create CloudWatchLogs client
cloudwatch_logs = boto3.client('logs')
log_groups=[]
# List log groups through the pagination interface
paginator = cloudwatch_logs.get_paginator('describe_log_groups')
for response in paginator.paginate(logGroupNamePrefix=log_group_prefix):
    for log_group in response['logGroups']:
        log_groups.append(log_group['logGroupName'])
if debug:
    print(log_groups)

old_log_groups=[]
empty_log_groups=[]
for log_group in log_groups:
    response = cloudwatch_logs.describe_log_streams(
        logGroupName=log_group, #logStreamNamePrefix='',
        orderBy='LastEventTime',
        descending=True,
        limit=1
    )
    # The time of the most recent log event in the log stream in CloudWatch Logs. This number is expressed as the number of milliseconds after Jan 1, 1970 00:00:00 UTC.
    if len(response['logStreams']) > 0:
        if debug:
            print("full response is:")
            print(response)
            print("Last event is:")
            print(response['logStreams'][0]['lastEventTimestamp'])
            print("current millis is:")
            print(millis)
        if response['logStreams'][0]['lastEventTimestamp'] < millis - (days * 24 * 60 * 60 * 1000):
            old_log_groups.append(log_group)
    else:
        empty_log_groups.append(log_group)

# delete log group
if delete:
    for log_group in old_log_groups:
        response = cloudwatch_logs.delete_log_group(logGroupName=log_group)
    #for log_group in empty_log_groups:
    #    response = cloudwatch_logs.delete_log_group(logGroupName=log_group)
else:
    print("old log groups are:")
    print(old_log_groups)
    print("Number of log groups:")
    print(len(old_log_groups))
    print("empty log groups are:")
    print(empty_log_groups)
I have been using aws-cloudwatch-log-clean and I can say it works quite well.
You need boto3 installed, and then you run:
./sweep_log_streams.py [log_group_name]
It has a --dry-run option so you can check that it does what you expect first.
A note of caution: if you have a long-running process in ECS that logs rarely, and its log has been truncated to empty in CloudWatch by the log retention period, deleting its empty log stream can break and hang the service, as it has nowhere to post its logs to.
I'm not aware of a simple way to do this, but you could use the awscli (or preferably python/boto3) to describe-log-groups, then for each log group invoke describe-log-streams, then for each log group/stream pair, invoke get-log-events with a --start-time of 30 days ago. If the union of all the events arrays for the log group is empty then you know you can delete the log group.
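A minimal boto3-style sketch of that approach follows. It assumes logs is a CloudWatch Logs client (boto3.client('logs')) created by the caller; the function names are made up for this example. Because get_log_events can return an empty page while more events remain, the helper follows nextForwardToken until the token stops changing, which is how the API signals the end of a stream. Note that a log group with no streams at all is also reported as idle here.

```python
import time

DAY_MS = 24 * 60 * 60 * 1000


def stream_has_events_since(logs, group, stream, start_ms):
    """Return True if the stream holds at least one event at or after start_ms."""
    token = None
    while True:
        kwargs = {
            "logGroupName": group,
            "logStreamName": stream,
            "startTime": start_ms,
            "startFromHead": True,
            "limit": 1,
        }
        if token is not None:
            kwargs["nextToken"] = token
        page = logs.get_log_events(**kwargs)
        if page["events"]:
            return True
        next_token = page.get("nextForwardToken")
        # An unchanged token means the end of the stream was reached.
        if next_token is None or next_token == token:
            return False
        token = next_token


def idle_log_groups(logs, days=30, now_ms=None):
    """List log groups with no events in any stream during the last `days` days."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    start_ms = now_ms - days * DAY_MS
    idle = []
    for group_page in logs.get_paginator("describe_log_groups").paginate():
        for group in group_page["logGroups"]:
            name = group["logGroupName"]
            stream_pages = logs.get_paginator("describe_log_streams").paginate(
                logGroupName=name
            )
            active = any(
                stream_has_events_since(logs, name, s["logStreamName"], start_ms)
                for stream_page in stream_pages
                for s in stream_page["logStreams"]
            )
            if not active:
                idle.append(name)
    return idle
```

The returned names could then be printed for review before anything is passed to delete_log_group. This makes one or more API calls per stream, so it can be slow and may hit rate limits on accounts with many streams.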
I have the same setup. In my case I want to delete log streams older than X days from a CloudWatch log group.
remove.py
import optparse
import sys
import os
import json
import time

def deletefunc(loggroupname,days):
    os.system("aws logs describe-log-streams --log-group-name {} --order-by LastEventTime > test.json".format(loggroupname))
    oldstream=[]
    milli= days * 24 * 60 * 60 * 1000
    # Current time in milliseconds since the epoch, the unit used by creationTime
    now=int(time.time() * 1000)
    with open('test.json') as json_file:
        data = json.load(json_file)
        for p in data['logStreams']:
            name=p['logStreamName']
            # A stream is old once its creation time plus the given number of days lies in the past
            if p['creationTime'] + milli < now:
                oldstream.append(name)
    for i in oldstream:
        os.system("aws logs delete-log-stream --log-group-name {} --log-stream-name {}".format(loggroupname,i))

parser = optparse.OptionParser()
parser.add_option('--log-group-name', action="store", dest="loggroupname", help="LogGroupName For Eg: testing-vpc-flow-logs",type="string")
parser.add_option('-d','--days', action="store", dest="days", help="Type No.of Days in Integer to Delete Logs other than days provided",type="int")
options, args = parser.parse_args()

if options.loggroupname is None:
    print ("Provide log group name to continue.\n")
    cmd = 'python3 ' + sys.argv[0] + ' -h'
    os.system(cmd)
    sys.exit(0)
if options.days is None:
    print ("Provide date to continue.\n")
    cmd = 'python3 ' + sys.argv[0] + ' -h'
    os.system(cmd)
    sys.exit(0)
elif options.days and options.loggroupname:
    loggroupname = options.loggroupname
    days = options.days
    deletefunc(loggroupname,days)
You can run this file using the command:
python3 remove.py --log-group-name=testing-vpc-flow-logs --days=7
I started migrating my code to boto3, and one nice addition I noticed is the waiters.
I want to create a snapshot from a DB instance and check for its availability before I resume with my code.
My approach is the following:
# Notice: Step : Check snapshot availability [1st account - Oregon]
print "--- Check snapshot availability [1st account - Oregon] ---"
new_snap = client1.describe_db_snapshots(DBSnapshotIdentifier=new_snapshot_name)['DBSnapshots'][0]
# print pprint.pprint(new_snap) #debug
waiter = client1.get_waiter('db_snapshot_completed')
print "Manual snapshot is -pending-"
sleep(60)
waiter.wait(
    DBSnapshotIdentifier = new_snapshot_name,
    IncludeShared = True,
    IncludePublic = False
)
print "OK. Manual snapshot is -available-"
However, the documentation says that it polls the status every 15 seconds, up to 40 times. That is 10 minutes, yet a rather big DB will need more than that.
How can I configure the waiter to allow for that?
Waiters have the configuration parameters 'delay' and 'max_attempts',
like this:
waiter = rds_client.get_waiter('db_instance_available')
print( "waiter delay: " + str(waiter.config.delay) )
waiter.py on github
You could do it without the waiter if you like.
From the documentation for that waiter:
Polls RDS.Client.describe_db_snapshots() every 15 seconds until a successful state is reached. An error is returned after 40 failed checks.
Basically that means it does the following:
RDS = boto3.client('rds')
RDS.describe_db_snapshots()
You can just run that but filter to your snapshot id. Here is the syntax: http://boto3.readthedocs.io/en/latest/reference/services/rds.html#RDS.Client.describe_db_snapshots
response = client.describe_db_snapshots(
    DBInstanceIdentifier='string',
    DBSnapshotIdentifier='string',
    SnapshotType='string',
    Filters=[
        {
            'Name': 'string',
            'Values': [
                'string',
            ]
        },
    ],
    MaxRecords=123,
    Marker='string',
    IncludeShared=True|False,
    IncludePublic=True|False
)
This will end up looking something like this:
snapshot_description = RDS.describe_db_snapshots(DBSnapshotIdentifier='YOURIDHERE')
Then you can just loop until that returns a snapshot which is available. So here is a very rough idea.
import boto3
import time

RDS = boto3.client('rds')
RDS.describe_db_snapshots()
snapshot_description = RDS.describe_db_snapshots(DBSnapshotIdentifier='YOURIDHERE')
while snapshot_description['DBSnapshots'][0]['Status'] != 'available' :
    print("still waiting")
    time.sleep(15)
    snapshot_description = RDS.describe_db_snapshots(DBSnapshotIdentifier='YOURIDHERE')
I think the other answer alluded to this solution, but here it is explicitly.
[snip]
...
# Create your waiter
waiter_db_snapshot = client1.get_waiter('db_snapshot_completed')
# Increase the max number of tries as appropriate
waiter_db_snapshot.config.max_attempts = 120
# Add a 60 second delay between attempts
waiter_db_snapshot.config.delay = 60
print "Manual snapshot is -pending-"
....
[snip]
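As an alternative to mutating waiter.config, boto3 waiters also accept a WaiterConfig argument on wait() with Delay and MaxAttempts keys. The sketch below only does the arithmetic for that dictionary; the helper name waiter_config and the two-hour budget in the usage note are assumptions for illustration, not values from the question.

```python
import math


def waiter_config(max_wait_seconds, delay_seconds=60):
    """Build a WaiterConfig dict whose total polling time covers max_wait_seconds."""
    # Round up so that delay * attempts is never shorter than the requested budget.
    attempts = max(1, math.ceil(max_wait_seconds / delay_seconds))
    return {"Delay": delay_seconds, "MaxAttempts": attempts}
```

For a two-hour budget the call would look like waiter.wait(DBSnapshotIdentifier=new_snapshot_name, WaiterConfig=waiter_config(2 * 60 * 60)), which polls once a minute for up to 120 attempts. With delay_seconds=15 and a 600-second budget it reproduces the documented default of 40 attempts.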
I am currently working with a log collection product and want to be able to pull in my CloudTrail logs from AWS. I started using the boto3 client to look up the events in CloudTrail. The script works when I run it directly from the command line, but as soon as I put it in cron to pull the logs automatically over time, it stopped collecting the logs!
Here's a sample of the basics of what's in the script to pull the logs:
#!/usr/bin/python
import boto3
import datetime
import json
import time
import sys
import os

def initialize_log():
    try:
        log = open('/var/log/aws-cloudtrail.log', 'ab')
    except IOError as e:
        print " [!] ERROR: Cannot open /var/log/aws-cloudtrail.log (%s)" % (e.strerror)
        sys.exit(1)
    return log

def date_handler(obj):
    return obj.isoformat() if hasattr(obj, 'isoformat') else obj

def read_logs(log):
    print "[+] START: Connecting to CloudTrail Logs"
    cloudTrail = boto3.client('cloudtrail')
    starttime = ""
    endtime = ""
    if os.path.isfile('/var/log/aws-cloudtrail.bookmark'):
        try:
            with open('/var/log/aws-cloudtrail.bookmark', 'r') as myfile:
                strdate=myfile.read().replace('\n', '')
            starttime = datetime.datetime.strptime( strdate, "%Y-%m-%dT%H:%M:%S.%f" )
            print " [-] INFO: Found bookmark! Querying with a start time of " + str(starttime)
        except IOError as e:
            print " [!] ERROR: Cannot open /var/log/aws-cloudtrail.log (%s)" % (e.strerror)
    else:
        starttime = datetime.datetime.now() - datetime.timedelta(minutes=15)
        print " [-] INFO: Cannot find bookmark...Querying with start time of" + str(starttime)
    endtime = datetime.datetime.now()
    print " [-] INFO: Querying for CloudTrail Logs"
    response = cloudTrail.lookup_events(StartTime=starttime, EndTime=endtime, MaxResults=50)
    for event in response['Events']:
        log.write(json.dumps(event, default=date_handler))
        log.write("\n")
        print json.dumps(event, default=date_handler)
        print "------------------------------------------------------------"
    if 'NextToken' in response.keys():
        while 'NextToken' in response.keys():
            time.sleep(1)
            response = cloudTrail.lookup_events(StartTime=starttime, EndTime=endtime, MaxResults=50, NextToken=str(response['NextToken']))
            for event in response['Events']:
                log.write(json.dumps(event, default=date_handler))
                log.write("\n")
                print json.dumps(event, default=date_handler)
                print "------------------------------------------------------------"
    # log.write("\n TESTING 1,2,3 \n")
    log.close()
    try:
        bookmark_file = open('/var/log/aws-cloudtrail.bookmark','w')
        bookmark_file.write(str(endtime.isoformat()))
        bookmark_file.close()
    except IOError as e:
        print " [!] ERROR: Cannot set bookmark for last pull time in /var/log/aws-cloudtrail.bookmark (%s)" % (e.strerror)
        sys.exit(1)
    return True

log = initialize_log()
success = read_logs(log)
if success:
    print "[+] DONE: All results printed"
else:
    print "[+] ERROR: CloudTrail results were not able to be pulled"
I looked into it more and did some testing to confirm that permissions were right on the destination files and that the script could write to them when run from root's crontab, but I still wasn't getting the logs returned from the boto cloudtrail client unless I ran it manually.
I also checked to make sure that the default region was getting read correctly from /root/.aws/config and it looks like it is, because if I move it I see the cron email show a stack trace instead of the success messages I have built in.
I am hoping someone has already run into this and it's a quick simple answer!
EDIT: Permission to read the CloudTrail logs is granted via the instance's IAM role, and yes, the task is scheduled under root's crontab.
Here's the email output:
From root@system Mon Mar 28 23:00:02 2016
X-Original-To: root
From: root@system (Cron Daemon)
To: root@system
Subject: Cron <root@system> /usr/bin/python /root/scripts/get-cloudtrail.py
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/root>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=root>
Date: Mon, 28 Mar 2016 19:00:02 -0400 (EDT)
[+] START: Connecting to CloudTrail Logs
[-] INFO: Found bookmark! Querying with a start time of 2016-03-28 22:55:01.395001
[-] INFO: Querying for CloudTrail Logs
[+] DONE: All results printed