Botocore Stubber - Unable to locate credentials - unit-testing

I'm working on unit tests for my Lambda, which gets some files from S3, processes them, and loads their data into DynamoDB. I created botocore stubbers to use during the tests, but I got botocore.exceptions.NoCredentialsError: Unable to locate credentials.
My Lambda handler code:
s3_client = boto3.client('s3')
ddb_client = boto3.resource('dynamodb', region_name='eu-west-1')

def lambda_handler(event, context):
    for record in event['Records']:
        s3_event = record.get('s3')
        bucket = s3_event.get('bucket', {}).get('name', '')
        file_key = s3_event.get('object', {}).get('key', '')
        file = s3_client.get_object(Bucket=bucket, Key=file_key)
and my test file:
class TestLambda(unittest.TestCase):
    def setUp(self) -> None:
        self.session = botocore.session.get_session()
        # S3 Stubber Set Up
        self.s3_client = self.session.create_client('s3', region_name='eu-west-1')
        self.s3_stubber = Stubber(self.s3_client)
        # DDB Stubber Set Up
        self.ddb_resource = boto3.resource('dynamodb', region_name='eu-west-1')
        self.ddb_stubber = Stubber(self.ddb_resource.meta.client)

    def test_s3_to_ddb_handler(self) -> None:
        event = {}
        with self.s3_stubber:
            with self.ddb_stubber:
                response = s3_to_ddb_handler.s3_to_ddb_handler(event, ANY)
The issue seems to be that actual calls to AWS are made instead of the stubber being used. How can I force the handler to use the stubbed clients?

You need to call .activate() on your Stubber instances: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/stubber.html#botocore.stub.Stubber.activate
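For example, a minimal sketch of that pattern (assuming the handler module can be pointed at these stubbed clients, and that the expected add_response entries are queued before the handler is invoked):

import unittest
import boto3
import botocore.session
from botocore.stub import Stubber

class TestLambda(unittest.TestCase):
    def setUp(self) -> None:
        session = botocore.session.get_session()
        # S3 stubber: activate right away and restore the client after each test
        self.s3_client = session.create_client('s3', region_name='eu-west-1')
        self.s3_stubber = Stubber(self.s3_client)
        self.s3_stubber.activate()
        self.addCleanup(self.s3_stubber.deactivate)
        # DynamoDB stubber: same pattern on the resource's underlying client
        self.ddb_resource = boto3.resource('dynamodb', region_name='eu-west-1')
        self.ddb_stubber = Stubber(self.ddb_resource.meta.client)
        self.ddb_stubber.activate()
        self.addCleanup(self.ddb_stubber.deactivate)
        # The stubs only take effect if the handler uses these exact clients,
        # e.g. (hypothetically) s3_to_ddb_handler.s3_client = self.s3_client

With the stubbers active, any call that has no queued response fails fast instead of trying to reach AWS.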

Related

Copying files between S3 buckets -- only 2 files copy

I am trying to copy multiple files from one S3 bucket to another using a Lambda function, but it only copies 2 files into the destination bucket.
Here is my code:
# using python and boto3
import json
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    source_bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_name = event['Records'][0]['s3']['object']['key']
    destination_bucket_name = 'nishantnkd'
    copy_object = {'Bucket': source_bucket_name, 'Key': file_name}
    s3_client.copy_object(CopySource=copy_object,
                          Bucket=destination_bucket_name, Key=file_name)
    return {'statusCode': 3000,
            'body': json.dumps('File has been Successfully Copied')}
I presume that the Amazon S3 bucket is configured to trigger the AWS Lambda function when a new object is created.
When the Lambda function is triggered, it is possible that multiple event records are sent to the function. Therefore, it should loop through the event records like this:
# using python and boto3
import json
import urllib.parse
import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:  # This loop added
        source_bucket_name = record['s3']['bucket']['name']
        file_name = urllib.parse.unquote_plus(record['s3']['object']['key'])  # Note this change too
        destination_bucket_name = 'nishantnkd'
        copy_object = {'Bucket': source_bucket_name, 'Key': file_name}
        s3_client.copy_object(CopySource=copy_object, Bucket=destination_bucket_name, Key=file_name)
    return {'statusCode': 3000,
            'body': json.dumps('File has been Successfully Copied')}
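For illustration, here is a minimal local invocation with a hypothetical two-record event (real S3 notification records carry more fields than shown):

# Hypothetical, trimmed-down S3 notification event with two records.
event = {
    'Records': [
        {'s3': {'bucket': {'name': 'source-bucket'},
                'object': {'key': 'reports/file+one.csv'}}},
        {'s3': {'bucket': {'name': 'source-bucket'},
                'object': {'key': 'reports/file+two.csv'}}},
    ]
}

# Without the loop only Records[0] would be copied; with it, both keys are
# handled, and unquote_plus() turns 'file+one.csv' back into 'file one.csv'.
lambda_handler(event, None)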

Copying S3 objects from one account to other using Lambda python

I'm using boto3 to copy files from an S3 bucket in one account to another. I need functionality similar to aws s3 sync. Please see my code. My company has decided to 'PULL' from the other S3 bucket (the source account). Please don't suggest replication, S3 Batch, S3-triggered Lambda, etc. We have gone through all these options and my management does not want any configuration on the source side. Can you please review this code and let me know if it works for thousands of objects? The source bucket has nearly 10,000 objects. We will create this Lambda function in the destination account and create a CloudWatch event to trigger it once a day.
I am checking the ETag so that modified files are copied across when this function is triggered.
Edit: I simplified my code just to check that pagination works. It works if I don't add client.copy(). If I add that line inside the for loop, after reading 3-4 objects it throws "errorMessage": "2021-08-07T15:29:07.827Z 82757747-7b72-4f29-ae9f-22e95f969d6c Task timed out after 3.00 seconds". Please advise. Note that the 'test/' folder in my source bucket has around 1100 objects.
import os
import logging
import boto3
import botocore

logger = logging.getLogger()
logger.setLevel(os.getenv('debug_level', 'INFO'))
client = boto3.client('s3')

def handler(event, context):
    main(event, logger)

def main(event, logger):
    try:
        SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
        DEST_BUCKET = os.environ.get('DST_BUCKET')
        REGION = os.environ.get('REGION')
        prefix = 'test/'
        # Create a reusable Paginator
        paginator = client.get_paginator('list_objects_v2')
        print('after paginator')
        # Create a PageIterator from the Paginator
        page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=prefix)
        print('after page iterator')
        index = 0
        for page in page_iterator:
            for obj in page['Contents']:
                index += 1
                print("I am looking for {} in the source bucket".format(obj['ETag']))
                copy_source = {'Bucket': SOURCE_BUCKET, 'Key': obj['Key']}
                client.copy(copy_source, DEST_BUCKET, obj['Key'])
        logger.info("number of objects copied {}:".format(index))
    except botocore.exceptions.ClientError as e:
        raise
This version works fine if I increase the Lambda timeout to 15 minutes and memory to 512 MB. It checks whether the source object already exists in the destination before copying.
import boto3
import os
import logging
import botocore
from botocore.client import Config

logger = logging.getLogger()
logger.setLevel(os.getenv('debug_level', 'INFO'))
config = Config(connect_timeout=5, retries={'max_attempts': 0})
client = boto3.client('s3', config=config)
# client = boto3.client('s3')

def handler(event, context):
    main(event, logger)

def main(event, logger):
    try:
        DEST_BUCKET = os.environ.get('DST_BUCKET')
        SOURCE_BUCKET = os.environ.get('SRC_BUCKET')
        REGION = os.environ.get('REGION')
        prefix = ''
        # Create a reusable Paginator
        paginator = client.get_paginator('list_objects_v2')
        print('after paginator')
        # Create a PageIterator from the Paginator
        page_iterator_src = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=prefix)
        page_iterator_dest = paginator.paginate(Bucket=DEST_BUCKET, Prefix=prefix)
        print('after page iterator')
        index = 0
        for page_source in page_iterator_src:
            for obj_src in page_source['Contents']:
                flag = "FALSE"
                for page_dest in page_iterator_dest:
                    for obj_dest in page_dest['Contents']:
                        # checks if source ETag already exists in destination
                        if obj_src['ETag'] in obj_dest['ETag']:
                            flag = "TRUE"
                            break
                    if flag == "TRUE":
                        break
                if flag != "TRUE":
                    index += 1
                    client.copy_object(Bucket=DEST_BUCKET,
                                       CopySource={'Bucket': SOURCE_BUCKET, 'Key': obj_src['Key']},
                                       Key=obj_src['Key'])
                print("source ETag {} and destination ETag {}".format(obj_src['ETag'], obj_dest['ETag']))
                print("source Key {} and destination Key {}".format(obj_src['Key'], obj_dest['Key']))
        print("Number of objects copied {}".format(index))
        logger.info("number of objects copied {}:".format(index))
    except botocore.exceptions.ClientError as e:
        raise
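One possible refinement (a sketch, not the poster's code): list the destination once and keep its ETags in a set, so each source object needs only a set lookup instead of a fresh pagination pass over the destination bucket.

import boto3

client = boto3.client('s3')

def copy_missing_objects(source_bucket, dest_bucket, prefix=''):
    """Copy objects whose ETag is not already present in the destination."""
    paginator = client.get_paginator('list_objects_v2')

    # Build the destination ETag set once, up front.
    dest_etags = set()
    for page in paginator.paginate(Bucket=dest_bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            dest_etags.add(obj['ETag'])

    copied = 0
    for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            if obj['ETag'] not in dest_etags:
                client.copy({'Bucket': source_bucket, 'Key': obj['Key']},
                            dest_bucket, obj['Key'])
                copied += 1
    return copied

Note that multipart uploads produce ETags that are not plain MD5 checksums, so a key-based (or key plus size) comparison may be more reliable; the ETag check above is carried over from the original code as an assumption.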

Dump lambda output to csv and have it email as an attachment

I have a Lambda function that generates a list of untagged buckets in our AWS environment. Currently I send the output directly to a Slack channel. Instead, I would like the Lambda to dump the output to a CSV file and send it as a report. Here is the code for it; let me know if you need any other details.
import boto3
from botocore.exceptions import ClientError
import urllib3
import json

http = urllib3.PoolManager()

def lambda_handler(event, context):
    # Printing the S3 buckets with no tags
    s3 = boto3.client('s3')
    s3_re = boto3.resource('s3')
    buckets = []
    print('Printing buckets with no tags..')
    for bucket in s3_re.buckets.all():
        s3_bucket = bucket
        s3_bucket_name = s3_bucket.name
        try:
            response = s3.get_bucket_tagging(Bucket=s3_bucket_name)
        except ClientError:
            buckets.append(bucket)
            print(bucket)
    for bucket in buckets:
        data = {"text": "%s bucket has no tags" % (bucket)}
        r = http.request("POST", "https://hooks.slack.com/services/~/~/~",
                         body=json.dumps(data),
                         headers={"Content-Type": "application/json"})
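One possible way to produce the report (a sketch under assumptions: an SES-verified sender and recipient, hypothetical addresses, and the bucket names collected by the loop above) is to write the names to a CSV file in Lambda's /tmp space and send it as a raw SES message with an attachment:

import csv
import boto3
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication

def send_untagged_report(bucket_names, sender='reports@example.com', recipient='team@example.com'):
    # Write the report to Lambda's writable /tmp directory.
    report_path = '/tmp/untagged_buckets.csv'
    with open(report_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['bucket_name'])
        for name in bucket_names:
            writer.writerow([name])

    # Build a MIME message with the CSV attached.
    msg = MIMEMultipart()
    msg['Subject'] = 'Untagged S3 buckets report'
    msg['From'] = sender
    msg['To'] = recipient
    msg.attach(MIMEText('Attached: buckets with no tags.'))
    with open(report_path, 'rb') as f:
        part = MIMEApplication(f.read())
    part.add_header('Content-Disposition', 'attachment', filename='untagged_buckets.csv')
    msg.attach(part)

    # send_raw_email supports attachments; both addresses must be usable in your SES setup.
    ses = boto3.client('ses')
    ses.send_raw_email(Source=sender, Destinations=[recipient], RawMessage={'Data': msg.as_string()})

From the loop above this could be called as send_untagged_report([b.name for b in buckets]).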

Sending EMR Logs to CloudWatch

Is there a way to send EMR logs to CloudWatch instead of S3? We would like to have all our service logs in one location. It seems like the only thing you can do is set up alarms for monitoring, but that doesn't cover logging.
https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html
Would I have to install the CloudWatch agent on the nodes in the cluster? https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
You can install the CloudWatch agent via EMR's bootstrap configuration and configure it to watch log directories. It then starts pushing logs to Amazon CloudWatch Logs.
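For example, a hedged sketch of wiring such a bootstrap action when launching a cluster with boto3 (the script path and release label are placeholders, the actual agent installation lives in that script, and most run_job_flow arguments are omitted):

import boto3

emr = boto3.client('emr')

# Hypothetical bootstrap script hosted in S3 that installs and starts the
# CloudWatch agent on every node as the cluster comes up.
bootstrap_actions = [{
    'Name': 'install-cloudwatch-agent',
    'ScriptBootstrapAction': {
        'Path': 's3://my-bootstrap-bucket/install_cloudwatch_agent.sh'
    }
}]

response = emr.run_job_flow(
    Name='cluster-with-cw-agent',
    ReleaseLabel='emr-6.x.x',  # placeholder release label
    Instances={
        'MasterInstanceType': 'm5.xlarge',
        'SlaveInstanceType': 'm5.xlarge',
        'InstanceCount': 3,
        'KeepJobFlowAliveWhenNoSteps': False,
    },
    BootstrapActions=bootstrap_actions,
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
)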
You can read the logs from S3 and push them to CloudWatch using boto3, and delete them from S3 if you no longer need them. In some use cases, the stdout.gz log needs to be in CloudWatch for monitoring purposes.
boto3 documentation on put_log_events
import boto3
import botocore.session
import logging
import time
import datetime
import gzip

logger = logging.getLogger()  # the functions below use this module-level logger
def get_session(service_name):
    session = botocore.session.get_session()
    aws_access_key_id = session.get_credentials().access_key
    aws_secret_access_key = session.get_credentials().secret_key
    aws_session_token = session.get_credentials().token
    region = session.get_config_variable('region')
    return boto3.client(
        service_name=service_name,
        region_name=region,
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        aws_session_token=aws_session_token
    )

def get_log_file(s3, bucket, key):
    log_file = None
    try:
        obj = s3.get_object(Bucket=bucket, Key=key)
        compressed_body = obj['Body'].read()
        log_file = gzip.decompress(compressed_body)
    except Exception as e:
        logger.error(f"Error reading from bucket : {e}")
        raise
    return log_file

def create_log_events(logs, batch_size):
    log_event_batch = []
    log_event_batch_collection = []
    try:
        for line in logs.splitlines():
            log_event = {'timestamp': int(round(time.time() * 1000)), 'message': line.decode('utf-8')}
            if len(log_event_batch) < batch_size:
                log_event_batch.append(log_event)
            else:
                log_event_batch_collection.append(log_event_batch)
                log_event_batch = []
                log_event_batch.append(log_event)
    except Exception as e:
        logger.error(f"Error creating log events : {e}")
        raise
    log_event_batch_collection.append(log_event_batch)
    return log_event_batch_collection

def create_log_stream_and_push_log_events(logs, log_group, log_stream, log_event_batch_collection, delay):
    response = logs.create_log_stream(logGroupName=log_group, logStreamName=log_stream)
    seq_token = None
    try:
        for log_event_batch in log_event_batch_collection:
            log_event = {
                'logGroupName': log_group,
                'logStreamName': log_stream,
                'logEvents': log_event_batch
            }
            if seq_token:
                log_event['sequenceToken'] = seq_token
            response = logs.put_log_events(**log_event)
            seq_token = response['nextSequenceToken']
            time.sleep(delay)
    except Exception as e:
        logger.error(f"Error pushing log events : {e}")
        raise
The caller function
def main():
    s3 = get_session('s3')
    logs = get_session('logs')

    BUCKET_NAME = 'Your_Bucket_Name'
    KEY = 'logs/emr/Path_To_Log/stdout.gz'
    BATCH_SIZE = 10000  # According to boto3 docs
    PUSH_DELAY = 0.2    # According to boto3 docs
    LOG_GROUP = 'test_log_group'  # Destination log group
    LOG_STREAM = '{}-{}'.format(time.strftime('%Y-%m-%d'), 'logstream.log')

    log_file = get_log_file(s3, BUCKET_NAME, KEY)
    log_event_batch_collection = create_log_events(log_file, BATCH_SIZE)
    create_log_stream_and_push_log_events(logs, LOG_GROUP, LOG_STREAM, log_event_batch_collection, PUSH_DELAY)
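One thing this answer assumes is that the destination log group already exists; if it might not, a small hedged addition (not part of the original answer) is to create it first and ignore the already-exists error before calling create_log_stream_and_push_log_events:

def ensure_log_group(logs, log_group):
    # create_log_group raises ResourceAlreadyExistsException when the group
    # is already there, which is safe to ignore in this flow.
    try:
        logs.create_log_group(logGroupName=log_group)
    except logs.exceptions.ResourceAlreadyExistsException:
        pass

For example, main() could call ensure_log_group(logs, LOG_GROUP) right before pushing the batches.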

Invoke a Lambda function with S3 payload from boto3

I need to invoke a Lambda function that accepts an S3 path. Below is sample code for the Lambda function.
def lambda_handler(event, context):
    bucket = "mybucket"
    key = "mykey/output/model.tar.gz"
    model = load_model(bucket, key)
    somecalc = some_func(model)
    result = {'mycalc': json.dumps(somecalc)}
    return result
I need to invoke this handler from my client code using boto3. I know I can make a request like the one below:
lambda_client = boto3.client('lambda')
response = lambda_client.invoke(
    FunctionName='mylambda_function',
    InvocationType='RequestResponse',
    LogType='Tail',
    ClientContext='myContext',
    Payload=b'bytes'|file,
    Qualifier='1'
)
But I am not sure how to specify an S3 path in the payload. It looks like it is expecting JSON.
Any suggestions?
You can specify a payload like so:
payload = json.dumps({'bucket': 'myS3Bucket'})

lambda_client = boto3.client('lambda')
response = lambda_client.invoke(
    FunctionName='mylambda_function',
    InvocationType='RequestResponse',
    LogType='Tail',
    ClientContext='myContext',
    Payload=payload,
    Qualifier='1'
)
And access the payload properties in your lambda handler like so:
def lambda_handler(event, context):
    bucket = event['bucket']  # pull from 'event' argument
    key = "mykey/output/model.tar.gz"
    model = load_model(bucket, key)
    somecalc = some_func(model)
    result = {'mycalc': json.dumps(somecalc)}
    return result
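On the caller side, the synchronous invoke response carries the handler's return value in a streaming Payload field; a small follow-up sketch (reusing the response object from the invoke call above):

import json

# response['Payload'] is a streaming body; reading and decoding it yields the
# dict returned by the handler, e.g. {'mycalc': ...}.
result = json.loads(response['Payload'].read())
print(result['mycalc'])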