Does boto3 have a credential cache comparable to awscli?

With awscli there's a credential cache in ~/.aws/cli/cache which allows me to cache credentials for a while. This is very helpful when using MFA. Does boto3 have a similar capability or do I have to explicitly cache my credentials returned from session = boto3.session.Session(profile_name='CTO:Admin')?

It is already there.
http://boto3.readthedocs.org/en/latest/guide/configuration.html#assume-role-provider
When you specify a profile that has IAM role configuration, boto3 makes an AssumeRole call to retrieve temporary credentials. Subsequent boto3 API calls use the cached temporary credentials until they expire, at which point boto3 automatically refreshes them. boto3 does not write these temporary credentials to disk, which means the temporary credentials from the AssumeRole call are only cached in memory within a single Session. All clients created from that session share the same temporary credentials.
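For example (a minimal sketch; 'CTO:Admin' is assumed to be a profile with an IAM role configured in ~/.aws/config), every client created from the same Session reuses the cached temporary credentials:
import boto3
# 'CTO:Admin' is assumed to be a profile with role_arn/source_profile (and
# optionally mfa_serial) configured in ~/.aws/config.
session = boto3.session.Session(profile_name='CTO:Admin')
s3 = session.client('s3')
ec2 = session.client('ec2')  # both clients reuse the session's cached temporary credentials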

I created a Python library that provides this for you - see https://github.com/mixja/boto3-session-cache
Example:
import boto3_session_cache
# This returns a regular boto3 client object with the underlying session configured with local credential cache
client = boto3_session_cache.client('ecs')
ecs_clusters = client.list_clusters()

To summarise the points above, here is a working example:
from os import path
import os
import sys
import json
import datetime
from distutils.spawn import find_executable
from botocore.exceptions import ProfileNotFound
import boto3
import botocore
def json_encoder(obj):
    """JSON encoder that formats datetimes as ISO 8601."""
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    else:
        return obj


class JSONFileCache(object):
    """JSON file cache.

    This provides a dict-like interface that stores JSON-serializable
    objects. The objects are serialized to JSON and stored in a file, so
    the values can be retrieved at a later time.
    """

    CACHE_DIR = path.expanduser(path.join('~', '.aws', 'ansible-ec2', 'cache'))

    def __init__(self, working_dir=CACHE_DIR):
        self._working_dir = working_dir

    def __contains__(self, cache_key):
        actual_key = self._convert_cache_key(cache_key)
        return path.isfile(actual_key)

    def __getitem__(self, cache_key):
        """Retrieve value from a cache key."""
        actual_key = self._convert_cache_key(cache_key)
        try:
            with open(actual_key) as f:
                return json.load(f)
        except (OSError, ValueError, IOError):
            raise KeyError(cache_key)

    def __setitem__(self, cache_key, value):
        full_key = self._convert_cache_key(cache_key)
        try:
            file_content = json.dumps(value, default=json_encoder)
        except (TypeError, ValueError):
            raise ValueError("Value cannot be cached, must be "
                             "JSON serializable: %s" % value)
        if not path.isdir(self._working_dir):
            os.makedirs(self._working_dir)
        with os.fdopen(os.open(full_key,
                               os.O_WRONLY | os.O_CREAT, 0o600), 'w') as f:
            f.truncate()
            f.write(file_content)

    def _convert_cache_key(self, cache_key):
        full_path = path.join(self._working_dir, cache_key + '.json')
        return full_path


session = boto3.session.Session()
try:
    cred_chain = session._session.get_component('credential_provider')
except ProfileNotFound:
    print("Invalid profile")
    sys.exit(1)

provider = cred_chain.get_provider('assume-role')
provider.cache = JSONFileCache()

# Do something with the session...
ec2 = session.resource('ec2')

Originally, the credential caching and automatic renewal of temporary credentials was part of the AWS CLI, but this commit (and some subsequent ones) moved that functionality into botocore, which means it is now available in boto3 as well.
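If you would rather share the AWS CLI's on-disk cache than roll your own, recent botocore releases ship the same JSONFileCache the CLI uses. A minimal sketch (assumes botocore.utils.JSONFileCache exists in your botocore version, and relies on the private _session attribute, as above):
import os
import boto3
from botocore.utils import JSONFileCache  # not available in very old botocore releases
cli_cache_dir = os.path.join(os.path.expanduser('~'), '.aws', 'cli', 'cache')
session = boto3.session.Session(profile_name='CTO:Admin')
# Point the assume-role provider at the CLI's cache directory so both tools
# read and write the same temporary credentials.
session._session.get_component('credential_provider') \
    .get_provider('assume-role').cache = JSONFileCache(working_dir=cli_cache_dir)
ec2 = session.resource('ec2')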

Related

Lambda task timeout but no application log

I have a Python Lambda that is triggered by S3 uploads to a specific folder. The Lambda function processes the uploaded file and writes the output to another folder in the same S3 bucket.
The issue is that when I do a bulk upload using the AWS console, some files do not get processed. I ended up setting up a dead-letter queue to catch these invocations. While inspecting a message in the queue, I found a request ID, which I tried to find in the Lambda logs.
These are the logs for the request ID:
Now the odd part: the first line after the imports in the Python code is print('Loading function'), yet it does not show up in the Lambda log.
I've added the Python code here. It should at least print "Processing file name: " + key, which is inside the handler, right?
import urllib.parse
from datetime import datetime

import boto3

from constants import CONTENT_TYPE, XML_EXTENSION, VALIDATING
from xml_process import *
from s3Integration import download_file

print('Loading function')

s3 = boto3.client('s3')


def lambda_handler(event, context):
    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    print("Processing file name: " + key)
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        xml_content = response["Body"].read()
        content_type = response["ContentType"]
        tree = ET.fromstring(xml_content)
        key_file_name = key.split("/")[1]
        # Creating a temporary copy by downloading file to get the namespaces
        temp_file_name = "/tmp/" + key_file_name
        download_file(key, temp_file_name)
        namespaces = {node[0]: node[1] for _, node in ET.iterparse(temp_file_name, events=['start-ns'])}
        for name, value in namespaces.items():
            ET.register_namespace(name, value)
        # Preparing path for file processing
        processed_file = key_file_name.split(".")[0] + "_processed." + key_file_name.split(".")[1]
        print(processed_file, "processed")
        db_record = XMLMapping(file_path=key,
                               processed_file_path=processed_file,
                               uploaded_by="lambda",
                               status=VALIDATING, uploaded_date=datetime.now(), is_active=True)
        session.add(db_record)
        session.commit()
        if key_file_name.split(".")[1] == XML_EXTENSION:
            if content_type in CONTENT_TYPE:
                xml_parse(tree, db_record, processed_file, True)
            else:
                print("Content Type is not valid. Provided value: ", content_type)
        else:
            print("File extension is not valid. Provided extension: ", key_file_name.split(".")[1])
        return "success"
    except Exception as e:
        print(e)
        raise e
I don't think it's a permission issue, as other files uploaded in the same batch were processed successfully.
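As a side note on finding a request ID across all of the function's log streams, a minimal sketch using the CloudWatch Logs filter API (the log group name and request ID below are placeholders):
import boto3
logs = boto3.client('logs')
LOG_GROUP = '/aws/lambda/my-xml-processor'  # placeholder: your function's log group
REQUEST_ID = '00000000-0000-0000-0000-000000000000'  # placeholder: the request ID from the DLQ message
paginator = logs.get_paginator('filter_log_events')
for page in paginator.paginate(logGroupName=LOG_GROUP, filterPattern='"{}"'.format(REQUEST_ID)):
    for log_event in page['events']:
        print(log_event['logStreamName'], log_event['message'])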

Copying S3 objects from one account to other using Lambda python

I'm using boto3 to copy files from an S3 bucket in one account to another. I need functionality similar to aws s3 sync. Please see my code. My company has decided to 'PULL' from the other S3 bucket (the source account). Please don't suggest replication, S3 Batch, an S3-triggered Lambda, etc.; we have gone through all these options and my management does not want any configuration on the source side. Can you please review this code and let me know whether it will work for thousands of objects? The source bucket has nearly 10,000 objects. We will create this Lambda function in the destination account and create a CloudWatch event to trigger the Lambda once a day.
I am checking the ETag so that modified files are copied across when this function is triggered.
Edit: I simplified my code just to see whether pagination works. It works if I don't call client.copy(). If I add that line inside the for loop, then after reading 3-4 objects it throws "errorMessage": "2021-08-07T15:29:07.827Z 82757747-7b72-4f29-ae9f-22e95f969d6c Task timed out after 3.00 seconds". Please advise. Please note that the 'test/' folder in my source bucket has around 1,100 objects.
import os
import logging

import boto3
import botocore

logger = logging.getLogger()
logger.setLevel(os.getenv('debug_level', 'INFO'))
client = boto3.client('s3')


def handler(event, context):
    main(event, logger)


def main(event, logger):
    try:
        SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
        DEST_BUCKET = os.environ.get('DST_BUCKET')
        REGION = os.environ.get('REGION')
        prefix = 'test/'
        # Create a reusable Paginator
        paginator = client.get_paginator('list_objects_v2')
        print('after paginator')
        # Create a PageIterator from the Paginator
        page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=prefix)
        print('after page iterator')
        index = 0
        for page in page_iterator:
            for obj in page['Contents']:
                index += 1
                print("I am looking for {} in the source bucket".format(obj['ETag']))
                copy_source = {'Bucket': SOURCE_BUCKET, 'Key': obj['Key']}
                client.copy(copy_source, DEST_BUCKET, obj['Key'])
        logger.info("number of objects copied {}:".format(index))
    except botocore.exceptions.ClientError as e:
        raise
This version works fine if I increase the Lambda timeout to 15 min and the memory to 512 MB. It checks whether the source object already exists in the destination before copying.
import boto3
import os
import logging
import botocore
from botocore.client import Config

logger = logging.getLogger()
logger.setLevel(os.getenv('debug_level', 'INFO'))
config = Config(connect_timeout=5, retries={'max_attempts': 0})
client = boto3.client('s3', config=config)
# client = boto3.client('s3')


def handler(event, context):
    main(event, logger)


def main(event, logger):
    try:
        DEST_BUCKET = os.environ.get('DST_BUCKET')
        SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
        REGION = os.environ.get('REGION')
        prefix = ''
        # Create a reusable Paginator
        paginator = client.get_paginator('list_objects_v2')
        print('after paginator')
        # Create a PageIterator from the Paginator
        page_iterator_src = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=prefix)
        page_iterator_dest = paginator.paginate(Bucket=DEST_BUCKET, Prefix=prefix)
        print('after page iterator')
        index = 0
        for page_source in page_iterator_src:
            for obj_src in page_source['Contents']:
                flag = "FALSE"
                for page_dest in page_iterator_dest:
                    for obj_dest in page_dest['Contents']:
                        # checks if source ETag already exists in destination
                        if obj_src['ETag'] in obj_dest['ETag']:
                            flag = "TRUE"
                            break
                    if flag == "TRUE":
                        break
                if flag != "TRUE":
                    index += 1
                    client.copy_object(Bucket=DEST_BUCKET, CopySource={'Bucket': SOURCE_BUCKET, 'Key': obj_src['Key']}, Key=obj_src['Key'])
                    print("source ETag {} and destination ETag {}".format(obj_src['ETag'], obj_dest['ETag']))
                    print("source Key {} and destination Key {}".format(obj_src['Key'], obj_dest['Key']))
                    print("Number of objects copied {}".format(index))
        logger.info("number of objects copied {}:".format(index))
    except botocore.exceptions.ClientError as e:
        raise
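As a sketch of one way to make the existence check scale better than the nested pagination above (not taken from the question, and assuming the same SRC_BUCKET/DST_BUCKET environment variables): list the destination once, remember its ETags in a set, then walk the source.
import os
import boto3
client = boto3.client('s3')
SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
DEST_BUCKET = os.environ.get('DST_BUCKET')
paginator = client.get_paginator('list_objects_v2')
# List the destination once and remember its ETags for O(1) lookups.
dest_etags = set()
for page in paginator.paginate(Bucket=DEST_BUCKET):
    for obj in page.get('Contents', []):
        dest_etags.add(obj['ETag'])
copied = 0
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get('Contents', []):
        if obj['ETag'] not in dest_etags:
            client.copy({'Bucket': SOURCE_BUCKET, 'Key': obj['Key']}, DEST_BUCKET, obj['Key'])
            copied += 1
print("number of objects copied {}".format(copied))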

AWS > Lambda > Canary Function

I am looking to use a Lambda function to monitor multiple websites; how would you alter the code below (from the blueprint)?
Multiple environment variables can be set up (e.g. site2, expected2); I just need help making them work inside the function.
Thank you.
import os
from datetime import datetime
from urllib.request import Request, urlopen

SITE = os.environ['site']  # URL of the site to check, stored in the site environment variable
EXPECTED = os.environ['expected']  # String expected to be on the page, stored in the expected environment variable


def validate(res):
    '''Return False to trigger the canary

    Currently this simply checks whether the EXPECTED string is present.
    However, you could modify this to perform any number of arbitrary
    checks on the contents of SITE.
    '''
    return EXPECTED in res


def lambda_handler(event, context):
    print('Checking {} at {}...'.format(SITE, event['time']))
    try:
        req = Request(SITE, headers={'User-Agent': 'AWS Lambda'})
        if not validate(str(urlopen(req).read())):
            raise Exception('Validation failed')
    except:
        print('Check failed!')
        raise
    else:
        print('Check passed!')
        return event['time']
    finally:
        print('Check complete at {}'.format(str(datetime.now())))
There are several ways to go here, but rather than adding a new environment variable for each site and expected value, you could just use two environment variables that hold comma-separated lists. Then, in the Lambda function, split the lists, zip them together into a dictionary, and iterate over the dict items, using each key and value to call your modified validate() function.
Something like:
import os
from datetime import datetime
from urllib.request import Request, urlopen

SITES = os.environ['sites']  # comma-separated URLs of the sites to check, stored in the sites environment variable
EXPECTED_VALUES = os.environ['expected_values']  # comma-separated strings expected to be on each site's page, stored in the expected_values environment variable

sites = SITES.split(',')
expected_values = EXPECTED_VALUES.split(',')
sites_to_check = dict(zip(sites, expected_values))


def validate(page_content, expected_value):
    '''Return False to trigger the canary

    Currently this simply checks whether the expected string is present.
    However, you could modify this to perform any number of arbitrary
    checks on the contents of the site.
    '''
    return expected_value in page_content


def lambda_handler(event, context):
    for url, expected_val in sites_to_check.items():
        print('Checking {} at {}...'.format(url, event['time']))
        try:
            req = Request(url, headers={'User-Agent': 'AWS Lambda'})
            if not validate(str(urlopen(req).read()), expected_val):
                raise Exception('Validation failed')
        except:
            print('Check failed!')
            raise
        else:
            print('Check passed!')
            # keep on looping until we get through them all
        finally:
            print('Check complete for {} at {}'.format(url, str(datetime.now())))
    # if we got here, we checked all the sites and they passed
    print('Checked all sites and they all passed at {}'.format(str(datetime.now())))
I have not run or tested this code, but you get the idea - make it a loop!
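For illustration, the two environment variables might then be set to hypothetical values like these (kept free of commas so the split works):
sites = https://example.com,https://example.org
expected_values = Example Domain,Example Domain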

How to store Google Auth Credentials object in Django?

I'm trying to integrate Google Tag Manager into my project. In the official documentation, Google suggests the oauth2client library. Unfortunately, that library is deprecated, so I used google_auth_oauthlib instead. I can get a token and send requests to the Google Tag Manager API with it, but I can't work out how to store the credentials object in Django. The oauth2client lib provided a CredentialsField for models, and a Credentials object could be stored in that field using DjangoORMStorage, but I can't use this deprecated library. Are there any alternative ways?
The Google Tag Manager documentation is here.
My code is here:
from django.conf import settings
from google_auth_oauthlib.flow import Flow
from googleapiclient.discovery import build
from rest_framework.response import Response
from rest_framework.views import APIView

FLOW = Flow.from_client_secrets_file(
    settings.GOOGLE_OAUTH2_CLIENT_SECRETS_JSON,
    scopes=[settings.GOOGLE_OAUTH2_SCOPE])


class GTMAuthenticateView(APIView):
    def get(self, request, **kwargs):
        FLOW.redirect_uri = settings.GOOGLE_OAUTH2_REDIRECT_URI
        authorization_url, state = FLOW.authorization_url(access_type='offline',
                                                          include_granted_scopes='true')
        return Response({'authorization_url': authorization_url, 'state': state})

    def post(self, request, **kwargs):
        FLOW.fetch_token(code=request.data['code'])
        credentials = FLOW.credentials
        return Response({'token': credentials.token})


class GTMContainerView(APIView):
    def get(self, request, **kwargs):
        service = get_service()
        containers = self.get_containers(service)
        return Response({'containers': str(containers), 'credentials': str(FLOW.credentials)})

    @staticmethod
    def get_containers(service):
        account_list = service.accounts().list().execute()
        container_list = []
        for account in account_list['account']:
            containers = service.accounts().containers().list(parent=account["path"]).execute()
            for container in containers["container"]:
                container["usageContext"] = container["usageContext"][0].replace("['", "").replace("']", "")
                container_list.append(container)
        return container_list


def get_service():
    try:
        credentials = FLOW.credentials
        service = build('tagmanager', 'v2', credentials=credentials)
        return service
    except Exception as ex:
        print(ex)
Just switch over to Django 2:
pip install Django==2.2.12
I did this and it works fine.
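As a sketch of one alternative way to persist the credentials (not from the answer above): the google-auth Credentials object can be round-tripped through JSON, so it can be stored in an ordinary Django field. The model and helper names below are hypothetical.
import json

from django.db import models
from google.oauth2.credentials import Credentials


class GoogleCredentials(models.Model):
    # Hypothetical model: one row per authorized user/account.
    user_id = models.CharField(max_length=255, unique=True)
    credentials_json = models.TextField()


def save_credentials(user_id, credentials):
    # Credentials.to_json() keeps the token, refresh token, client id/secret and scopes.
    GoogleCredentials.objects.update_or_create(
        user_id=user_id, defaults={'credentials_json': credentials.to_json()})


def load_credentials(user_id):
    row = GoogleCredentials.objects.get(user_id=user_id)
    return Credentials.from_authorized_user_info(json.loads(row.credentials_json))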

Are there any Django packages to create signed urls for Google Cloud Storage resources?

I'm writing a fairly simple photo app using django-rest-framework for the API and django-storages for the storage engine. The front end is being written in Vue.js. I have the uploading part working, and now I'm trying to serve up the photos. As now seems obvious, when the browser tries to load the images from GCS, I just get a bunch of 403 Forbidden errors. I did some reading up on this, and it seems the best practice in my case would be to generate signed URLs that expire after some amount of time. I haven't been able to find a package for this, which is what I was hoping for. Short of that, it's not clear to me precisely how to do this in Django.
This is working code in Django 1.11 with Python 3.5.
import os

from google.oauth2 import service_account
from google.cloud import storage


class CloudStorageURLSigner(object):
    @staticmethod
    def get_video_signed_url(bucket_name, file_path):
        creds = service_account.Credentials.from_service_account_file(
            os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
        )
        bucket = storage.Client().get_bucket(bucket_name)
        blob = bucket.blob(file_path)
        signed_url = blob.generate_signed_url(
            method='PUT',
            expiration=1545367030,  # epoch time
            content_type='audio/mpeg',  # change accordingly
            credentials=creds
        )
        return signed_url
Yes, take a look at google-cloud-storage
Installation:
pip install google-cloud-storage
Also, make sure to refer to the API documentation if you need more.
Hope it helps!
I ended up solving this problem by using to_representation in serializers.py:
import datetime

from google.cloud import storage
from google.cloud.storage import Blob

client = storage.Client()
bucket = client.get_bucket('myBucket')


# Inside the serializer (or a custom serializer field) class:
def to_representation(self, value):
    try:
        blob = Blob(name=value.name, bucket=bucket)
        signed_url = blob.generate_signed_url(expiration=datetime.timedelta(minutes=5))
        return signed_url
    except ValueError as e:
        print(e)
        return value
Extending @Evan Zamir's answer, instead of reassigning client and bucket you can get them from Django's default_storage (this will save time since these are already available).
This is in settings.py
from datetime import timedelta
from google.oauth2 import service_account
GS_CREDENTIALS = service_account.Credentials.from_service_account_file('credentials.json')
DEFAULT_FILE_STORAGE = "storages.backends.gcloud.GoogleCloudStorage"
GS_BUCKET_NAME = "my-bucket"
GS_EXPIRATION = timedelta(seconds=60)
In serializers.py
from django.core.files.storage import default_storage
from google.cloud.storage import Blob
from rest_framework import serializers


class SignedURLField(serializers.FileField):
    def to_representation(self, value):
        try:
            blob = Blob(name=value.name, bucket=default_storage.bucket)
            signed_url = blob.generate_signed_url(expiration=default_storage.expiration)
            return signed_url
        except ValueError as e:
            print(e)
            return value
You can use this class in your serializer like this,
class MyModelSerializer(serializers.ModelSerializer):
    file = SignedURLField()
Note: do not set GS_DEFAULT_ACL = 'publicRead' if you want signed URLs, as it creates public URLs (which do not expire).