I have configured AWS SES for sending and receiving emails. I have verified my domain and created a receipt rule set so that all incoming emails are now stored in an S3 bucket with the object key prefix "email". I found the following code for reading files from an S3 bucket:
http://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html
I am trying to read the emails. My SES rule stores all incoming emails in my specified S3 bucket, and I am adding code that reads the bucket and gets the emails. The next time I read the bucket, how can I tell which emails have already been read and which still need to be read? Is there any way I can read the emails from the bucket and mark them as read, so that I don't have to process them again?
S3 is just storage. It has no sense of "read" vs "unread," and if you're discovering messages by listing objects in the bucket, your best solution would be something like this:
After processing each message, move it somewhere else. This could be another bucket, or a different prefix in the same bucket.
S3 doesn't have a "move" operation, but it does have copy and it does have delete... so, for each message you process, modify the object key (the path+filename).
If your emails are being stored with a prefix, like "incoming/" so that an individual message has a key that looks like (e.g.) "incoming/jozxyqkblahexample," change that string to "processed/jozxyqkblahexample." Then tell S3 to copy from the old to the new. When that succeeds, tell S3 to delete the original.
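A minimal sketch of that copy-then-delete "move" using boto3, assuming the "incoming/" and "processed/" prefixes above (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

def mark_processed(bucket, key):
    """Move an object from incoming/ to processed/ via copy + delete."""
    new_key = key.replace("incoming/", "processed/", 1)
    # Copy the object to its new key...
    s3.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    # ...and only delete the original once the copy has succeeded.
    s3.delete_object(Bucket=bucket, Key=key)

# Example: process everything currently under incoming/
bucket = "my-ses-mail-bucket"  # placeholder bucket name
resp = s3.list_objects_v2(Bucket=bucket, Prefix="incoming/")
for obj in resp.get("Contents", []):
    # ... read and process the message here ...
    mark_processed(bucket, obj["Key"])
```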
This (mostly) solves your problem, because if you only list objects with the prefix "incoming/", you won't see the processed messages the next time -- they're now out of the way.
But, there's one potential problem with this solution... specifically, you may run afoul of the S3 consistency model. S3 does not guarantee that fetching a list of objects will immediately give you a response that reflects all of your recently-completed activity against the bucket... it's possible for objects to linger for a brief time in the object listing after being deleted... so it's still possible to see a message in the listing after you've deleted it. The chances are reasonably low, but you need to be aware of the possibility.
When SES drops a message into your bucket, it's also possible to configure it to notify you that it just did that.
Typically, a better solution than polling the bucket for mail is for SES to send you an SNS notification that the message was received. The notification will include information about the message, including the key where it was stored in the bucket. You then fetch exactly that message from the bucket, and process it, so no bucket object listing is needed.
Note that SES has two different notification types -- for small emails, SES can actually include the mail itself in the SNS notification, but that's not the notification type referred to above. Above, I'm suggesting that you investigate having SES send a notification through SNS to tell you about each email as it is dropped into S3.
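As a rough illustration of that approach, here's a sketch of a Lambda handler subscribed to the SNS topic. It assumes the SES "S3" receipt action, whose notification includes the bucket name and object key under receipt.action; treat those field names as an assumption to verify against your own payloads:

```python
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        # The SNS message body is the SES notification, delivered as a JSON string.
        notification = json.loads(record["Sns"]["Message"])
        action = notification["receipt"]["action"]  # assumed S3 receipt action
        bucket = action["bucketName"]
        key = action["objectKey"]

        # Fetch exactly the message that was just stored -- no bucket listing needed.
        obj = s3.get_object(Bucket=bucket, Key=key)
        raw_email = obj["Body"].read()
        # ... parse/process the raw RFC 822 message here ...
```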
Related
I'd like to be able to detect when all parts of an S3 multipart upload have been uploaded.
Context
I'm working on a Backend application that sits between a Frontend and an S3 bucket.
When the Frontend needs to upload a large file, it makes a call to the Backend (step 1). The latter initiates a multipart upload in S3, generates a bunch of presigned URLs, and hands them to the Frontend (steps 2 - 5). The Frontend uploads segment data directly to S3 (steps 6, 10).
S3 multipart uploads need to be explicitly completed. One obvious way to do this would be to make another call from the Frontend to the Backend to signal that all parts have been uploaded. But if possible I'd like to avoid that extra call.
A possible solution: S3 Event Notifications
I have S3 Event Notifications enabled on the S3 bucket so whenever something happens, it notifies an SNS topic which in turn calls the Backend.
If the bucket sent S3 notifications after each part is done uploading, I could use those in the Backend to see if it's time to complete the upload (steps 7 - 9, 11 - 14).
But although some folks claim (one, two) that it's the case, I wasn't able to reproduce it.
For proof of concept, I used this guide from Amazon to upload a file using aws s3api create-multipart-upload, several aws s3api upload-part, and aws s3api complete-multipart-upload. I would expect to get a notification after each upload-part, but I only got a single "s3:ObjectCreated:CompleteMultipartUpload" after, well, complete-multipart-upload.
My bucket is configured to send notification for all object creation events: "s3:ObjectCreated:*".
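For reference, here's roughly the same proof of concept in boto3 rather than the s3api CLI (bucket and file names are placeholders); only the completion call at the end produced an event in my testing:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-test-bucket", "big-file.bin"  # placeholders

def read_chunks(path, chunk_size=8 * 1024 * 1024):
    """Yield the file in 8 MiB chunks (every part except the last must be >= 5 MiB)."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

# 1. Start the multipart upload.
upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = upload["UploadId"]

# 2. Upload the parts.
parts = []
for part_number, chunk in enumerate(read_chunks("big-file.bin"), start=1):
    resp = s3.upload_part(
        Bucket=bucket, Key=key, UploadId=upload_id,
        PartNumber=part_number, Body=chunk,
    )
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
    # No S3 event notification is emitted here...

# 3. ...only this completion call triggers s3:ObjectCreated:CompleteMultipartUpload.
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)
```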
Questions
Is it possible to somehow instruct S3 to send notifications upon upload of each part?
Are there any other mechanisms to find out in the Backend that all parts have been uploaded?
Maybe what I want is complete nonsense and even if there was a way to implement it, it would bring significant drawbacks?
I used Terraform to create a new S3 bucket that automatically receives logs from three existing S3 buckets. As a next step I want to make the most of these logs by getting various notifications, e.g. if someone creates/deletes/modifies an S3 bucket, the relevant users get notified about the event. At the moment the logs are a mess and the log filenames are all meaningless. I've been messing around with CloudWatch and CloudTrail, but I'm not sure what the right way to do this is. Can someone help me? Many thanks.
We are planning to use S3 bucket event notifications for further processing. Our requirement is:
When an object is PUT / POST / COPY to an S3 bucket, an event notification is generated.
The destination for this generated event notification is SQS.
We have tested the 1st and 2nd parts. But we have no idea how to customize the default notification content to suit our processing.
We went through the AWS dev guide, but we could not find the expected solution.
The S3 event notification does not contain anything like a subject or message, so I don't think you can change much of the generated JSON (see http://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html)
Each notification is delivered as a JSON object with the following fields:
Region
Timestamp
Event Type (PUT/COPY ...)
Request Actor Principal ID
Source IP of the request
Request ID
Host ID
Notification Configuration Destination ID
Bucket Name
Bucket ARN
Bucket Owner Principal ID
Object Key
Object Size
Object ETag
Object Version ID (if versioning is enabled on the bucket)
You might have a better chance of sending a custom notification by running a Lambda function (http://docs.aws.amazon.com/lambda/latest/dg/with-s3.html)
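As a rough sketch of that idea, a Lambda function triggered by the S3 event could reshape the notification however you like and push it to your own queue; the queue URL and custom fields below are placeholders, not anything S3 provides:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-custom-queue"  # placeholder

def handler(event, context):
    for record in event["Records"]:
        # Build whatever custom payload your downstream processing expects.
        custom = {
            "subject": "New object in S3",              # your own field, not part of the S3 event
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
            "event": record["eventName"],
        }
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(custom))
```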
We are currently publishing data to an S3 bucket. We now have multiple clients that consume the data we store in that bucket. Each client wants to have their own bucket, so the ask is to publish the data to each bucket.
Option 1: Have our publisher publish to each S3 bucket.
cons: More logic in our publishing application, which has to handle failures/retries per client.
Option 2: Use S3's Cross-region replication
reason against it: Even though we can transfer objects to other accounts, only one destination can be specified. If the source bucket has server-side encryption, we cannot replicate.
Option 3: AWS Lambda. Have S3 invoke Lambda and have Lambda publish to multiple buckets.
confused: Not sure how different this is from option 1.
Option 4: Restrict access to our S3 bucket to read-only and have clients read from it. But then how would clients know whether an object has already been read? I'd rather not use time-based folders; we have multiple publishers to this S3 bucket, and clients can't know for sure that a folder is actually complete.
Is there any good option to solve the above problem?
I would go with option 3, Lambda. Your Lambda function could be triggered by S3 events so you wouldn't have to add any manual steps or change your current publishing process at all.
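A minimal sketch of what that Lambda might look like, copying each newly created object into the client buckets (bucket names are placeholders, and this assumes the Lambda role has read access to the source bucket and write access to the destinations):

```python
import boto3

s3 = boto3.client("s3")
CLIENT_BUCKETS = ["client-a-bucket", "client-b-bucket"]  # placeholder names

def handler(event, context):
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        for dest_bucket in CLIENT_BUCKETS:
            # Server-side copy; the object data never passes through the Lambda.
            s3.copy_object(
                Bucket=dest_bucket,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )
```

Note that a single copy_object call is limited to objects up to 5 GB; larger objects would need a multipart copy.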
Based on: http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html
Is there a way to limit a browser based upload to Amazon S3 such that it is rejected if it does not originate from my secure URL (i.e. https://www.someurl.com)?
Thanks!
I want to absolutely guarantee the post is coming from my website
That is impossible.
The web is stateless and a POST coming "from" a specific domain is just not a valid concept, because the Referer: header is trivial to spoof, and a malicious user most likely knows this. Running through an EC2 server will gain you nothing, because it will tell you nothing new and meaningful.
The post policy document not only expires, it also can constrain the object key to a prefix or an exact match. How is a malicious user going to defeat this? They can't.
in your client form you have encrypted/hashed versions of your credentials.
No, you do not.
What you have is a signature that attests to your authorization for S3 to honor the form post. It can't feasibly be reverse-engineered such that the policy can be modified, and that's the point. The form has to match the policy, which can't be edited and still remain valid.
You generate this signature using information known only to you and AWS; specifically, the secret that accompanies your access key.
When S3 receives the request, it computes what the signature should have been. If it's a match, then the privileges of the specific user owning that key are checked to see whether the request is authorized.
By constraining the object key in the policy, you prevent the user from uploading (or overwriting) any object other than the specific one authorized by the policy. Or the specific object key prefix, in which case you prevent the user from harming anything not under that prefix.
If you are handing over a policy that allows any object key to be overwritten in the entire bucket, then you're solving the wrong problem by trying to constrain posts as coming "from" your website.
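To make the mechanism concrete, here's a simplified sketch of how a Signature Version 4 POST policy signature is derived from the secret key; the policy content, dates, and credentials are placeholders, and a real policy needs a few more conditions (algorithm, credential, date) than shown here:

```python
import base64
import hashlib
import hmac
import json

def hmac_sha256(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sign_policy(secret_key, date_stamp, region, policy_document):
    # The string to sign for a POST upload is the base64-encoded policy document.
    string_to_sign = base64.b64encode(
        json.dumps(policy_document).encode("utf-8")
    ).decode("utf-8")

    # Derive the signing key from the secret -- the secret itself never leaves your server.
    k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, "s3")
    k_signing = hmac_sha256(k_service, "aws4_request")

    signature = hmac.new(
        k_signing, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return string_to_sign, signature

# Example policy constraining the key prefix and expiring quickly (placeholder values).
policy = {
    "expiration": "2015-12-30T12:00:00.000Z",
    "conditions": [
        {"bucket": "my-bucket"},
        ["starts-with", "$key", "uploads/user-123/"],
    ],
}
encoded_policy, signature = sign_policy("MY_SECRET_KEY", "20151229", "us-east-1", policy)
```

Changing anything in the form means the base64 policy no longer matches, and without the secret the signature can't be regenerated -- which is exactly why the policy can't be edited and remain valid.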
I think you've misunderstood how the S3 service authenticates.
Your server would have a credentials file holding your access id and key, and your server signs the upload request as the file is uploaded to your S3 bucket.
Amazon's S3 servers then check that the upload request has been signed with your access id and key.
This credentials file should never be publicly exposed anywhere and there's no way to get the keys off the wire.
In the case of browser based uploads your form should contain a signature that is passed to Amazon's S3 servers and authenticated against. This signature is generated from a combination of the upload policy, your access id and key but it is hashed so you shouldn't be able to get back to the secret key.
As you mentioned, this could mean that someone would be able to upload to your bucket from outside the confines of your app by simply reusing the signature in the X-Amz-Signature header.
This is what the policy's expiration element is for, as it allows you to set a reasonably short expiration period on the form to prevent misuse.
So when a user goes to your upload page your server should generate a policy with a short expiration date (for example, 5 minutes after generation time). It should then create a signature from this policy and your Amazon credentials. From here you can now create a form that will post any data to your S3 bucket with the relevant policy and signature.
If a malicious user was to then attempt to copy the policy and signature and use that directly elsewhere then it would still expire 5 minutes after they originally landed on your upload page.
You can also use the policy to restrict other things such as the name of the file or mime types.
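That server-side step might look something like this sketch using boto3's generate_presigned_post; the bucket name, key prefix, size limit, and expiry are placeholders, and the frontend then posts the returned fields plus the file to the returned URL:

```python
import boto3

s3 = boto3.client("s3")

def create_upload_form(user_id):
    """Generate a short-lived presigned POST restricted to one key prefix."""
    key_prefix = f"uploads/{user_id}/"  # placeholder key layout
    return s3.generate_presigned_post(
        Bucket="my-upload-bucket",                          # placeholder bucket
        Key=key_prefix + "${filename}",
        Conditions=[
            ["starts-with", "$key", key_prefix],            # lock uploads under this prefix
            ["content-length-range", 1, 10 * 1024 * 1024],  # cap file size at 10 MB
        ],
        ExpiresIn=300,  # policy (and therefore signature) expires after 5 minutes
    )
```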
More detailed information is available in the AWS docs about browser based uploads to S3 and how S3 authenticates requests.
To further restrict where requests can come from you should look into enabling Cross-Origin Resource Sharing (CORS) permissions on your S3 bucket.
This allows you to specify which domain(s) each type of request may originate from.
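For example, a CORS rule limiting POST uploads to your origin could be applied roughly like this with boto3 (the origin and settings are placeholders; keep in mind CORS is enforced by browsers, not a server-side guarantee):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="my-upload-bucket",  # placeholder bucket
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["https://www.someurl.com"],
                "AllowedMethods": ["POST"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
```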
Instead of trying to barricade the door, remove the door.
A better solution IMHO would be to prevent any uploads at all directly to s3.
Meaning delete your s3 upload policy that allows strangers to upload.
Make them upload to one of your servers.
Validate the upload however you like.
If it is acceptable then your server could move the file to s3.
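A bare-bones sketch of that server-side flow, assuming boto3 and a hypothetical validate() check of your own:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-private-bucket"  # placeholder

def validate(local_path):
    """Hypothetical check: file type, size, virus scan, ownership, etc."""
    return True

def handle_upload(local_path, key):
    # The client uploaded to your server; nothing has touched S3 yet.
    if not validate(local_path):
        raise ValueError("upload rejected")
    # Only validated files are moved into the bucket, using your server's credentials.
    s3.upload_file(local_path, BUCKET, key)
```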