Extract and save attachment from email (via SES) into AWS S3

I want to extract the attachment from an email and save it into a new S3 bucket. So far, I have configured AWS Simple Email Service (SES) to intercept incoming emails and store them in S3, and I have an AWS Lambda Python function that gets triggered on the S3 Put.
Up to that point everything works, but my Lambda fails with: "[Errno 2] No such file or directory: 'abc.docx': OSError". I can see that an attachment named abc.docx is present in the raw email stored in S3.
I assume the problem is in my upload_file call. Could you please help me here?
Below are the relevant parts of my code.
s3 = boto3.client('s3')
s3resource = boto3.resource('s3')
waiterFlg = s3.get_waiter('object_exists')
waiterFlg.wait(Bucket=bucket, Key=key)

response = s3resource.Bucket(bucket).Object(key)
message = email.message_from_string(response.get()["Body"].read())
if len(message.get_payload()) == 2:
    attachment = message.get_payload()[1]
    s3resource.meta.client.upload_file(attachment.get_filename(), outputBucket, attachment.get_filename())
else:
    print("Could not see file/attachment.")

You can write the attachment to the /tmp directory in Lambda and then upload it to S3.

The following code solved the issue:
with open('/tmp/newFile.docx', 'wb') as f:
    f.write(attachment.get_payload(decode=True))
s3resource.meta.client.upload_file('/tmp/newFile.docx', outputBucket, attachment.get_filename())
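For reference, a minimal end-to-end sketch of the whole Lambda handler under the same setup (SES writing the raw email to S3, the function triggered by the S3 Put). The output bucket name and the MIME-walking logic are my assumptions for illustration, not the original code:
import email
import os
import boto3

s3 = boto3.client('s3')
OUTPUT_BUCKET = 'my-attachment-bucket'  # assumed name, adjust to your bucket

def lambda_handler(event, context):
    # The S3 Put event carries the bucket and key of the raw email written by SES.
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    raw = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    message = email.message_from_bytes(raw)

    # Walk every MIME part and upload anything that carries a filename.
    for part in message.walk():
        filename = part.get_filename()
        if not filename:
            continue
        local_path = os.path.join('/tmp', os.path.basename(filename))
        with open(local_path, 'wb') as f:
            f.write(part.get_payload(decode=True))
        s3.upload_file(local_path, OUTPUT_BUCKET, filename)
Walking the MIME tree instead of indexing get_payload()[1] also copes with emails that have more or fewer parts.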

Related

Failed to download files from AWS S3

Scenario:
Submit an Athena query with boto3 and write the output to S3.
Download the result from S3.
Error: An error occurred (404) when calling the HeadObject operation: Not Found
The strange thing is that the file exists in S3 and I can copy it down with the aws s3 cp command, but I cannot download it with boto3, and head-object fails as well.
aws s3api head-object --bucket dsp-smaato-sink-prod --key /athena_query_results/c96bdc09-d545-4ee3-bc66-be3be928e3f2.csv
That command does work. I've checked the account policies and the account has the admin policy attached.
# snippets
import os
import boto3
from urllib.parse import urlparse

def s3_download(url, target=None):
    # s3 = boto3.resource('s3')
    # client = s3.meta.client
    client = boto3.client("s3", region_name=constant.AWS_REGION, endpoint_url='https://s3.ap-southeast-1.amazonaws.com')
    s3_file = urlparse(url)
    if target:
        target = os.path.abspath(target)
    else:
        target = os.path.abspath(os.path.basename(s3_file.path))
    logger.info(f"download {url} to {target}...")
    client.download_file(s3_file.netloc, s3_file.path, target)
    logger.info(f"download {url} to {target} done!")
Take a look at the value of s3_file.path -- does it start with a slash? If so, it needs to change because Amazon S3 keys do not start with a slash.
I suggest that you print the content of netloc, path and target to see what values it is actually passing.
It's a bit strange to use os.path with an S3 URL, so it might need some tweaking.
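A sketch of that suggested fix (the lstrip('/') call is my assumption about what the corrected code would look like, not code from the question):
import os
import boto3
from urllib.parse import urlparse

def s3_download(url, target=None):
    client = boto3.client('s3')
    s3_file = urlparse(url)
    # urlparse('s3://bucket/some/key.csv').path is '/some/key.csv';
    # strip the leading slash, because S3 keys do not start with one.
    key = s3_file.path.lstrip('/')
    target = os.path.abspath(target or os.path.basename(key))
    client.download_file(s3_file.netloc, key, target)
    return target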

How to access response from boto3 bucket.put_object?

Looking at the boto3 docs, I see that client.put_object has a response shown, but I don't see a way to get the response from bucket.put_object.
Sample snippet:
s3 = boto3.resource(
    's3',
    aws_access_key_id=redacted,
    aws_secret_access_key=redacted,
)
s3.Bucket(bucketName).put_object(Key="bucket-path/" + fileName, Body=blob, ContentMD5=md5Checksum)
logging.info("Uploaded to S3 successfully")
How is this accomplished?
put_object returns S3.Object, which in turn has the wait_until_exists method.
Therefore, something along these lines should be sufficient (my verification code is below):
import boto3

s3 = boto3.resource('s3')

with open('test.img', 'rb') as f:
    obj = s3.Bucket('test-ssss4444').put_object(
        Key='fileName',
        Body=f)
    obj.wait_until_exists()  # optional

print("Uploaded to S3 successfully")
put_object is a blocking operation, so it will block your program until the file is uploaded; wait_until_exists is therefore not strictly needed. But if you want to make sure that the upload actually went through and the object really is in S3, you can use it.
You have to use boto3.client instead of boto3.resource to get response information such as the ETag. It has slightly different syntax.
import boto3

s3 = boto3.client('s3')
response = s3.put_object(Bucket='bucket-name', Key='fileName', Body=body)
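The returned value is a plain dict; for example (a sketch, the exact keys depend on the bucket configuration, e.g. versioning):
# Inspect the response returned by the client-level put_object call.
print(response['ETag'])                                # entity tag of the stored object
print(response['ResponseMetadata']['HTTPStatusCode'])  # 200 on success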

boto3 generate_presigned_url with SSE encryption

I am looking for examples of generating a presigned URL using boto3 with SSE encryption.
Here is my code so far:
s3_client = boto3.client('s3',
                         region_name='ap-south-1',
                         endpoint_url='http://s3.ap-south-1.amazonaws.com',
                         config=boto3.session.Config(signature_version='s3v4'),
                         )
try:
    response = s3_client.generate_presigned_url('put_object',
                                                Params={'Bucket': bucket_name,
                                                        'Key': object_name},
                                                ExpiresIn=expiration)
except ClientError as e:
    logging.error("In client error exception code")
    logging.error(e)
    return None
I am struggling to find the right parameters for SSE encryption.
I am able to use a PUT call to upload a file. I would also like to know which headers to use on the client side to comply with SSE encryption.
import boto3

access_key = "..."
secret_key = "..."
bucket = "..."

s3 = boto3.client('s3',
                  aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key)

return (s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={
        'Bucket': bucket,
        'Key': filename,
        'SSECustomerAlgorithm': 'AES256',
    }
))
Also add the header
'x-amz-server-side-encryption': 'AES256'
in the front-end code when calling the presigned URL.
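If the goal is SSE-S3 (S3-managed keys), one variant that works is to include ServerSideEncryption in the presigned put_object Params and send the matching header with the upload. A sketch under that assumption, with placeholder bucket/key names and the requests library for the client-side PUT:
import boto3
import requests

s3_client = boto3.client('s3', config=boto3.session.Config(signature_version='s3v4'))

# Request SSE-S3 at signing time; the same header must be sent with the PUT,
# otherwise the signature will not match and S3 rejects the upload.
url = s3_client.generate_presigned_url(
    'put_object',
    Params={
        'Bucket': 'my-bucket',      # placeholder
        'Key': 'report.docx',       # placeholder
        'ServerSideEncryption': 'AES256',
    },
    ExpiresIn=3600,
)

with open('report.docx', 'rb') as f:
    resp = requests.put(url, data=f, headers={'x-amz-server-side-encryption': 'AES256'})
print(resp.status_code)  # 200 on success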
You can add Conditions to the pre-signed URL that must be met for the upload to be valid. This could probably include x-amz-server-side-encryption.
See: Creating a POST Policy - Amazon S3
Alternatively, you could add a bucket policy that denies any request that is not encrypted.
See: How to Prevent Uploads of Unencrypted Objects to Amazon S3 | AWS Security Blog
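A sketch of that bucket-policy approach, modeled on the referenced AWS blog post (bucket name is a placeholder), applied with put_bucket_policy:
import json
import boto3

bucket = 'my-bucket'  # placeholder
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Deny uploads that set the wrong encryption header.
            "Sid": "DenyIncorrectEncryptionHeader",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}}
        },
        {   # Deny uploads that omit the encryption header entirely.
            "Sid": "DenyUnencryptedObjectUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}}
        }
    ]
}

boto3.client('s3').put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))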

Error when using continuation token on S3 download

I'm trying to download a large number of small files from an S3 bucket. I'm doing this with the following:
s3 = boto3.client('s3')
kwargs = {'Bucket': bucket}

with open('/Users/hr/Desktop/s3_backup/files.csv', 'w') as file:
    while True:
        # The S3 API response is a large blob of metadata.
        # 'Contents' contains information about the listed objects.
        resp = s3.list_objects_v2(**kwargs)
        try:
            contents = resp['Contents']
        except KeyError:
            return
        for obj in contents:
            key = obj['Key']
            file.write(key)
            file.write('\n')
        # The S3 API is paginated, returning up to 1000 keys at a time.
        # Pass the continuation token into the next request, until we
        # reach the final page (when this field is missing).
        try:
            kwargs['ContinuationToken'] = resp['NextContinuationToken']
        except KeyError:
            break
However, after a certain amount of time I received this error message: 'EndpointConnectionError: Could not connect to the endpoint URL'.
I know that there are still considerably more files in the S3 bucket. I have three questions:
Why is this error occurring when I haven't downloaded all the files in the bucket?
Is there a way to resume my code from the last file I downloaded from the S3 bucket (I don't want to re-download the file names I've already written)?
Is there a default ordering of keys in an S3 bucket? Is it alphabetical?
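On the second question, one common pattern (a sketch, not from the original post) is to restart the listing with the StartAfter parameter of list_objects_v2, seeded with the last key already written to the CSV. S3 returns keys in ascending UTF-8 binary order (effectively alphabetical for ASCII names), which is what makes this resumable:
import boto3

s3 = boto3.client('s3')

def list_keys_after(bucket, last_key_written, out_path):
    # StartAfter makes list_objects_v2 begin strictly after the given key,
    # so keys that are already in the CSV are not fetched again.
    kwargs = {'Bucket': bucket, 'StartAfter': last_key_written}
    with open(out_path, 'a') as out:
        while True:
            resp = s3.list_objects_v2(**kwargs)
            for obj in resp.get('Contents', []):
                out.write(obj['Key'] + '\n')
            token = resp.get('NextContinuationToken')
            if not token:
                break
            kwargs['ContinuationToken'] = token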

AWS S3 - Able to Upload File from Local but not from Deployed (Access Denied)

My problem is that I cannot upload a file from my deployed project to an S3 bucket, even though I am able to upload it from local host. Except for the URL, everything remains the same (headers, body, etc.) when I call the method.
I am using boto3 to interact with S3, with the credentials of an IAM user I created. For deployment, I am using AWS Elastic Beanstalk.
Below is the code I am using for uploading:
def put(self, bytes, data, folder, file_name):
    self.ext = file_name.split(".")[-1]
    if self.__is_audio_ext(self.ext):
        if folder == self.__voice_record:
            self.__create_voice_record(data, folder, file_name)
        elif folder == self.__voice_message:
            self.__create_voice_message(data, folder, file_name)
        else:
            return "Response cannot be constructed."
        self.s3_client.put_object(Body=bytes, Bucket=self.bucket_name, Key=folder + "/" + file_name)
        return "Successfully created at URL " \
               + self.bucket_url + self.bucket_name + "/" + folder + "/" + file_name
    else:
        return "Invalid file type"
Also, below is how I set up boto3:
def __init__(self):
    self.ext = ""
    self.env = {
        "aws_access_key_id": settings.AWS_ACCESS_KEY_ID,
        "aws_secret_access_key": settings.AWS_SECRET_ACCESS_KEY,
        "region_name": 'eu-central-1'
    }
    self.bucket_name = "********"
    self.session = session.Session(region_name='eu-central-1')
    self.s3_client = self.session.client('s3', config=boto3.session.Config(signature_version='s3v4'))
    self.bucket_url = "http://s3-eu-central-1.amazonaws.com/"
When I make my PUT request to my server, this is the error I get:
An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
Note that I created an IAM user, gave it full S3 permissions, and I am sure that I am using the right credentials; the fact that I can upload the file from local host shows that.
This is why I believe the problem is somewhere between the file in my request and the deployed project, but that does not feel quite right either. Anyway, I am pretty confused here.
Please do not hesitate to ask about anything that is unclear; I may have skipped some details.
I have been working on this for hours and could not come up with a proper solution, so I would be really glad for any help!
Thanks!
This is late, but hopefully helpful to other new users. You should attach an instance profile to the EC2 instance with the right permissions for the S3 bucket, and make sure the bucket policy allows the role attached to the instance.
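As a quick way to check which identity the deployed code is actually running as (a debugging sketch, not part of the original answer), you can call STS from inside the Beanstalk environment:
import boto3

# With no explicit keys passed, boto3 falls back to the EC2/Beanstalk instance
# profile; the ARN tells you which role (if any) the requests are signed with.
print(boto3.client('sts').get_caller_identity()['Arn'])
If the ARN is an instance-profile role rather than the IAM user you tested with locally, that role (or the bucket policy) is what needs s3:PutObject on the bucket.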