list automated RDS snapshots created today and copy to other region using boto3 - amazon-web-services

We are building an automated DR cold site on other region, currently are working on retrieving a list of RDS automated snapshots created today, and passed them to another function to copy them to another AWS region.
The issue is with RDS boto3 client where it returned a unique format of date, making filtering on creation date more difficult.
today = (datetime.today()).date()
rds_client = boto3.client('rds')
snapshots = rds_client.describe_db_snapshots(SnapshotType='automated')
harini = "datetime("+ today.strftime('%Y,%m,%d') + ")"
print harini
print snapshots
for i in snapshots['DBSnapshots']:
if i['SnapshotCreateTime'].date() == harini:
print(i['DBSnapshotIdentifier'])
print (today)
despite already converted the date "harini" to the format 'SnapshotCreateTime': datetime(2015, 1, 1), the Lambda function still unable to list out the snapshots.

The better method is to copy the files as they are created by invoking a lambda function using a cloud watch event.
See step by step instruction:
https://geektopia.tech/post.php?blogpost=Automating_The_Cross_Region_Copy_Of_RDS_Snapshots
Alternatively, you can issue a copy for each snapshot regardless of the date. The client will raise an exception and you can trap it like this
# Written By GeekTopia
#
# Copy All Snapshots for an RDS Instance To a new region
# --Free to use under all conditions
# --Script is provied as is. No Warranty, Express or Implied
import json
import boto3
from botocore.exceptions import ClientError
import time
destinationRegion = "us-east-1"
sourceRegion = 'us-west-2'
rdsInstanceName = 'needbackups'
def lambda_handler(event, context):
#We need two clients
# rdsDestinationClient -- Used to start the copy processes. All cross region
copies must be started from the destination and reference the source
# rdsSourceClient -- Used to list the snapshots that need to be copied.
rdsDestinationClient = boto3.client('rds',region_name=destinationRegion)
rdsSourceClient=boto3.client('rds',region_name=sourceRegion)
#List All Automated for A Single Instance
snapshots = rdsSourceClient.describe_db_snapshots(DBInstanceIdentifier=rdsInstanceName,SnapshotType='automated')
for snapshot in snapshots['DBSnapshots']:
#Check the the snapshot is NOT in the process of being created
if snapshot['Status'] == 'available':
#Get the Source Snapshot ARN. - Always use the ARN when copying snapshots across region
sourceSnapshotARN = snapshot['DBSnapshotArn']
#build a new snapshot name
sourceSnapshotIdentifer = snapshot['DBSnapshotIdentifier']
targetSnapshotIdentifer ="{0}-ManualCopy".format(sourceSnapshotIdentifer)
targetSnapshotIdentifer = targetSnapshotIdentifer.replace(":","-")
#Adding a delay to stop from reaching the api rate limit when there are large amount of snapshots -
#This should never occur in this use-case, but may if the script is modified to copy more than one instance.
time.sleep(.2)
#Execute copy
try:
copy = rdsDestinationClient.copy_db_snapshot(SourceDBSnapshotIdentifier=sourceSnapshotARN,TargetDBSnapshotIdentifier=targetSnapshotIdentifer,SourceRegion=sourceRegion)
print("Started Copy of Snapshot {0} in {2} to {1} in {3} ".format(sourceSnapshotIdentifer,targetSnapshotIdentifer,sourceRegion,destinationRegion))
except ClientError as ex:
if ex.response['Error']['Code'] == 'DBSnapshotAlreadyExists':
print("Snapshot {0} already exist".format(targetSnapshotIdentifer))
else:
print("ERROR: {0}".format(ex.response['Error']['Code']))
return {
'statusCode': 200,
'body': json.dumps('Opearation Complete')
}

The code below will take automated snapshots created today.
import boto3
from datetime import date, datetime
region_src = 'us-east-1'
client_src = boto3.client('rds', region_name=region_src)
date_today = datetime.today().strftime('%Y-%m-%d')
def get_db_snapshots_src():
response = client_src.describe_db_snapshots(
SnapshotType = 'automated',
IncludeShared=False,
IncludePublic=False
)
snapshotsInDay = []
for i in response["DBSnapshots"]:
if i["SnapshotCreateTime"].strftime('%Y-%m-%d') == date.isoformat(date.today()):
snapshotsInDay.append(i)
return snapshotsInDay

Related

AWS Lambda failing to fetch EC2 AZ details

I am trying to create lambda script using Python3.9 which will return total ec2 servers in AWS account, their status & details. Some of my code snippet is -
def lambda_handler(event, context):
client = boto3.client("ec2")
#s3 = boto3.client("s3")
# fetch information about all the instances
status = client.describe_instances()
for i in status["Reservations"]:
instance_details = i["Instances"][0]
if instance_details["State"]["Name"].lower() in ["shutting-down","stopped","stopping","terminated",]:
print("AvailabilityZone: ", instance_details['AvailabilityZone'])
print("\nInstanceId: ", instance_details["InstanceId"])
print("\nInstanceType: ",instance_details['InstanceType'])
On ruunning this code i get error -
If I comment AZ details, code works fine.If I create a new function with only AZ parameter in it, all AZs are returned. Not getting why it fails in above mentioned code.
In python, its always a best practice to use get method to fetch value from list or dict to handle exception.
AvailibilityZone is actually present in Placement dict and not under instance details. You can check the entire response structure from below boto 3 documentation
Reference
def lambda_handler(event, context):
client = boto3.client("ec2")
#s3 = boto3.client("s3")
# fetch information about all the instances
status = client.describe_instances()
for i in status["Reservations"]:
instance_details = i["Instances"][0]
if instance_details["State"]["Name"].lower() in ["shutting-down","stopped","stopping","terminated",]:
print(f"AvailabilityZone: {instance_details.get('Placement', dict()).get('AvailabilityZone')}")
print(f"\nInstanceId: {instance_details.get('InstanceId')}")
print(f"\nInstanceType: {instance_details.get('InstanceType')}")
The problem is that in response of describe_instances availability zone is not in first level of instance dictionary (in your case instance_details). Availability zone is under Placement dictionary, so what you need is
print(f"AvailabilityZone: {instance_details.get('Placement', dict()).get('AvailabilityZone')}")

AWS lambda function not giving correct output

I am new to python using boto3 for AWS. I am creating a lambda function that will return orphaned snapshots list. Code is -
def lambda_handler(event, context):
ec2_resource = boto3.resource('ec2')
# Make a list of existing volumes
all_volumes = ec2_resource.volumes.all()
volumes = [volume.volume_id for volume in all_volumes]
# Find snapshots without existing volume
snapshots = ec2_resource.snapshots.filter(OwnerIds=['self'])
# Create list of all snapshots
osl =[]
for snapshot in snapshots:
if snapshot.volume_id not in volumes:
osl.append(snapshot)
print('\n Snapshot ID is :- '+str(snapshot))
#snapshot.delete()
continue
for tag in snapshot.tags:
if tag['Key'] == 'Name':
value=tag['Value']
print('\n Snapshot Tags are:- '+str(tag))
break
print('Total orphaned snapshots are:- '+str(len(osl)))
This returns list of snapshots & tags too in incorrect format.
Surprisingly, when I run same code in another account, it gives lambda function error -
I have created same permissions IAM role. But different results in different accounts is something i am not gettting.
You are printing snapshot object but you want to print id.
Second problem is all snapshot do not need to have tags, you need to check if snapshot have tags or not.
def lambda_handler(event, context):
ec2_resource = boto3.resource('ec2')
# Make a list of existing volumes
all_volumes = ec2_resource.volumes.all()
volumes = [volume.volume_id for volume in all_volumes]
# Find snapshots without existing volume
snapshots = ec2_resource.snapshots.filter(OwnerIds=['self'])
# Create list of all snapshots
osl =[]
for snapshot in snapshots:
if snapshot.volume_id not in volumes:
osl.append(snapshot.id)
print('\n Snapshot ID is :- ' + snapshot.id)
for snapshot in snapshots:
if snapshot.volume_id not in volumes:
osl.append(snapshot.id)
print('\n Snapshot ID is :- ' + snapshot.id)
if snapshot.tags:
for tag in snapshot.tags:
if tag['Key'] == 'Name':
value = tag['Value']
print('\n Snapshot Tags are '+ tag['Key'] + ", value = " + value)
break
print('Total orphaned snapshots are:- '+str(len(osl)))

How start an EC2 instance through Apache Guacamole?

In my project, some EC2 instances will be shut down. These instances will only be connected when the user needs to work.
Users will access the instances using a clientless remote desktop gateway called Apache Guacamole.
If the instance is stopped, how start an EC2 instance through Apache Guacamole?
Home Screen
Guacamole is, essentially, an RDP/VNC/SSH client and I don't think you can get the instances to startup by themselves since there is no possibility for a wake-on-LAN feature or something like it out-of-the-box.
I used to have a similar issue and we always had one instance up and running and used it to run the AWS CLI to startup the instances we wanted.
Alternatively you could modify the calls from Guacamole to invoke a Lambda function to check if the instance you wish to connect to is running and start it up if not; but then you'd have to deal with the timeout for starting a session from Guacamole (not sure if this is a configurable value from the web admin console, or files), or set up another way of getting feedback for when your instance becomes available.
There was a discussion in the Guacamole mailing list regarding Wake-on-LAN feature and one approach was proposed. It is based on the script that monitors connection attempts and launches instances when needed.
Although it is more a workaround, maybe it will be helpful for you. For the proper solution, it is possible to develop an extension.
You may find the discussion and a link to the script here:
http://apache-guacamole-general-user-mailing-list.2363388.n4.nabble.com/guacamole-and-wake-on-LAN-td7526.html
http://apache-guacamole-general-user-mailing-list.2363388.n4.nabble.com/Wake-on-lan-function-working-td2832.html
There is unfortunately not a very simple solution. The Lambda approach is the way we solved it.
Guacamole has a feature that logs accesses to Cloudwatch Logs.
So next we need the the information of the connection_id and the username/id as a tag on the instance. We are automatically assigning theses tags with our back-end tool when starting the instances.
Now when a user connects to a machine, a log is written to Cloudwatch Logs.
A filter is applied to only get login attempts and trigger Lambda.
The triggered Lambda script checks if there is an instance with such tags corresponding to the current connection attempt and if the instance is stopped, plus other constraints, like if an instance is expired for example.
If yes, then the instance gets started, and in roughly 40 seconds the user is able to connect.
The lambda scripts looks like this:
#receive information from cloudwatch event, parse it call function to start instances
import re
import boto3
import datetime
from conn_inc import *
from start_instance import *
def lambda_handler(event, context):
# Variables
region = "eu-central-1"
cw_region = "eu-central-1"
# Clients
ec2Client = boto3.client('ec2')
# Session
session = boto3.Session(region_name=region)
# Resource
ec2 = session.resource('ec2', region)
print(event)
#print ("awsdata: ", event['awslogs']['data'])
userdata ={}
userdata = get_userdata(event['awslogs']['data'])
print ("logDataUserName: ", userdata["logDataUserName"], "connection_ids: ", userdata["startConnectionId"])
start_instance(ec2,ec2Client, userdata["logDataUserName"],userdata["startConnectionId"])
import boto3
import datetime
from datetime import date
import gzip
import json
import base64
from start_related_instances import *
def start_instance(ec2,ec2Client,logDataUserName,startConnectionId):
# Boto 3
# Use the filter() method of the instances collection to retrieve
# all stopped EC2 instances which have the tag connection_ids.
instances = ec2.instances.filter(
Filters=[
{
'Name': 'instance-state-name',
'Values': ['stopped'],
},
{
'Name': 'tag:connection_ids',
'Values': [f"*{startConnectionId}*"],
}
]
)
# print ("instances: ", list(instances))
#check if instances are found
if len(list(instances)) == 0:
print("No instances with connectionId ", startConnectionId, " found that is stopped.")
else:
for instance in instances:
print(instance.id, instance.instance_type)
expire = ""
connectionName = ""
for tag in instance.tags:
if tag["Key"] == 'expire': #get expiration date
expire = tag["Value"]
if (expire == ""):
print ("Start instance: ", instance.id, ", no expire found")
ec2Client.start_instances(
InstanceIds=[instance.id]
)
else:
print("Check if instance already expired.")
splitDate = expire.split(".")
expire = datetime.datetime(int(splitDate[2]) , int(splitDate[1]) , int(splitDate[0]) )
args = date.today().timetuple()[:6]
today = datetime.datetime(*args)
if (expire >= today):
print("Instance is not yet expired.")
print ("Start instance: ", instance.id, "expire: ", expire, ", today: ", today)
ec2Client.start_instances(
InstanceIds=[instance.id]
)
else:
print ("Instance not started, because it already expired: ", instance.id,"expiration: ", f"{expire}", "today:", f"{today}")
def get_userdata(cw_data):
compressed_payload = base64.b64decode(cw_data)
uncompressed_payload = gzip.decompress(compressed_payload)
payload = json.loads(uncompressed_payload)
message = ""
log_events = payload['logEvents']
for log_event in log_events:
message = log_event['message']
# print(f'LogEvent: {log_event}')
#regex = r"\'.*?\'"
#m = re.search(str(regex), str(message), re.DOTALL)
logDataUserName = message.split('"')[1] #get the username from the user logged into guacamole "Adm_EKoester_1134faD"
startConnectionId = message.split('"')[3] #get the connection Id of the connection which should be started
# create dict
dict={}
dict["connected"] = False
dict["disconnected"] = False
dict["error"] = True
dict["guacamole"] = payload["logStream"]
dict["logDataUserName"] = logDataUserName
dict["startConnectionId"] = startConnectionId
# check for connected or disconnected
ind_connected = message.find("connected to connection")
ind_disconnected = message.find("disconnected from connection")
# print ("ind_connected: ", ind_connected)
# print ("ind_disconnected: ", ind_disconnected)
if ind_connected > 0 and not ind_disconnected > 0:
dict["connected"] = True
dict["error"] = False
elif ind_disconnected > 0 and not ind_connected > 0:
dict["disconnected"] = True
dict["error"] = False
return dict
The cloudwatch logs trigger for lambda like that:

Move S3 files older than 100 days to another bucket

Is there a way to find all files that are older than 100 days in one S3 bucket and move them to a different bucket? Solutions using AWS CLI or SDK both welcome.
In the src bucket, the files are organized like bucket/type/year/month/day/hour/file
S3://my-logs-bucket/logtype/2020/04/30/16/logfile.csv
For instance, on 2020/04/30, log files on or before 2020/01/21 will have to be moved.
Here's some Python code that will:
Move files from Bucket-A to Bucket-B if they are older than a given period
Full names and paths will be retained
import boto3
from datetime import datetime, timedelta
SOURCE_BUCKET = 'bucket-a'
DESTINATION_BUCKET = 'bucket-b'
s3_client = boto3.client('s3')
# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')
# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET)
# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
for object in page['Contents']:
if object['LastModified'] < datetime.now().astimezone() - timedelta(days=2): # <-- Change time period here
print(f"Moving {object['Key']}")
# Copy object
s3_client.copy_object(
Bucket=DESTINATION_BUCKET,
Key=object['Key'],
CopySource={'Bucket':SOURCE_BUCKET, 'Key':object['Key']}
)
# Delete original object
s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])
It worked for me, but please test it on less-important data before deploying in production since it deletes objects!
The code uses a paginator in case there are over 1000 objects in the bucket.
You can change the time period as desired.
(In addition to the license granted under the terms of service of this site the contents of this post are licensed under MIT-0.)
As mentioned in my comments you can create a lifecycle policy for an S3 bucket. Here is steps to do it https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
It's optional to delete\expire an object using Lifecycle policy rules, you define the actions you want on the objects in your S3 bucket.
Lifecycle policies uses different storage classes to transition your objects. Before configuring Lifecycle policies I suggest reading up on the different storage classes as each have their own associated cost: Standard-IA, One Zone-IA, Glacier, and Deep Archive storage classes
Your use case of 100 days, I recommend transitioning your logs to a archive storage class such as S3 Glacier. This might prove to be more cost effective.
Adding on from John's answer, if the objects are not in the root directory of the bucket then a few adjustments to the script need to be made. If they are in the root directory, use John's answer, this script will only work if the objects are in a sub-directory. This script moves objects from bucket/path/to/objects/ to bucket2/path/to/objects/ assuming you have access to each bucket from same set of aws cli credentials.
import boto3
from datetime import datetime, timedelta
SOURCE_BUCKET = 'bucket-a'
SOURCE_PATH = 'path/to/objects/'
DESTINATION_BUCKET = 'bucket-b'
DESTINATION_PATH = 'path/to/send/objects/' #<- you may need to add a prefix of the filenames to the end so that paginator doesn't look at the 'objects' directory
s3_client = boto3.client('s3')
# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')
# Create a PageIterator from the Paginator and include Prefix argument and optional PaginationConfig argument to control the number of objects you want to iterate over (incase you have a lot)
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=SOURCE_PATH, PaginationConfig={'MaxItems':10000})
# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
for object in page.get("Contents", []):
if object['LastModified'] < datetime.now().astimezone() - timedelta(days=100): # <-- Change time period here
# grab filename from path/to/filename
FILENAME = object['Key'].rsplit('/',1)[1]
# Copy object
s3_client.copy_object(
Bucket=DESTINATION_BUCKET,
Key=DESTINATION_PATH+FILENAME,
CopySource={'Bucket':SOURCE_BUCKET, 'Key':object['Key']}
)
# Delete original object
s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])

Handling EC2 Description Rate Limiting In Boto3 Lambda?

I'm creating a Lambda function with the intent of backing up my EC2 instances with their snapshots. However, I noticed reading the boto documentation the call to ec2.describe_instances is rate limited with MaxResults/NextToken. How can I combine the two of these to safely iterate through the list 50 at a time? Below is my work in progress:
import boto3
import datetime
import time
ec2 = boto3.client('ec2')
def lambda_handler(event, context):
try:
print("Creating snapshots on " + str(datetime.datetime.today()) + ".")
maxResults = 50
schedulers = ec2.describe_instances(Filters=[{'Name':'tag:GL-sub-purpose', 'Values':[Schedule]}], MaxResults=maxResults)
nextToken = schedulers['NextToken']
totalSchedulers = len(schedulers)
while totalSchedulers == maxResults:
schedulers = ec2.describe_instances(Filters=[{'Name':'tag:GL-sub-purpose', 'Values':[Schedule]}], MaxResults=maxResults, NextToken=nextToken)
nextToken = result['NextToken']
totalSchedulers = len(schedulers)
print("Performing backup on " + str(len(schedulers)) + " schedules.")
successful = []
failed = []
for s in schedulers:
#[...] More operations here, done 50 at a time.
I'm not really sure if I'm using the MaxResults/NextToken parameters correctly or efficiently here. Is this the best way to achieve my desired result/am I on the right track?
Just iterate through until NextToken is not returned. Here is a sample code to iterate through a batch of instances. Change it to suit your needs.
import boto3
ec2 = boto3.client('ec2')
insts = ec2.describe_instances(MaxResults=50)
while True:
#
# Process Instances (insts)
#
if 'NextToken' not in insts: break
next_token = insts['NextToken']
insts = ec2.describe_instances(MaxResults=50, NextToken=next_token)