What happens if you’re in the middle of a process when AWSAssumeRole times out? - amazon-web-services

I’m currently working with a role that I need to assume to access certain buckets on S3.
I was wondering: if the duration given to an STSAssumeRoleSessionCredentialsProvider is 1 hour and you're doing something like downloading a file that takes 1.5 hours, does the process finish, or does it stop in the middle because the duration ended?

The validity of the credentials is verified when the request is initiated. Once initiated successfully, the response will be sent completely. In your download example case, if the credentials were valid when the download request was initiated, that is sufficient for the file to be downloaded completely.
STS credential expiry becomes a problem when a long-running program makes repeated connections to AWS but reads the credentials once at startup and caches them. It is generally good practice to decouple the process that acquires the STS credentials from the code that uses them, and the users of the credentials should re-read them whenever the underlying source (typically a file) is modified.
These aspects are handled automatically by the AWS Java SDK's ProfileCredentialsProvider class. I am not sure whether a similar module exists in the other language bindings.

Credentials are validated when presented on an API call. If you make your API call(s) before the credentials expire then you are fine.
If, however, you need to make multiple API calls and one of them is made after the credentials have expired, then that call will fail.
This is particularly relevant to S3 multi-part uploads: each part is a distinct API call and presents the credentials each time. The solution is generally one of:
Get credentials that are valid for long enough to complete the operation.
Refresh the credentials when you are close to expiration and use the new credentials for subsequent part uploads.
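The second option can be sketched as a small wrapper that hands out cached credentials and fetches new ones shortly before they expire. This is only an illustration, not an SDK API: `Credentials`, `RefreshingCredentials`, the `fetch` callable (which would wrap your own AssumeRole call) and the five-minute margin are all hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    access_key: str
    secret_key: str
    session_token: str
    expires_at: float  # Unix timestamp in seconds


class RefreshingCredentials:
    """Hands out credentials, fetching new ones shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],    # hypothetical: wraps your AssumeRole call
        margin_seconds: float = 300.0,       # assumed safety margin before expiry
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._current: Optional[Credentials] = None

    def get(self) -> Credentials:
        """Return cached credentials, refreshing if they are about to expire."""
        now = self._clock()
        if self._current is None or self._current.expires_at - now <= self._margin:
            self._current = self._fetch()
        return self._current
```

Calling `get()` before each part upload, rather than once at the start, means a part that begins after the original session would have expired is signed with fresh credentials.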

Related

How to add custom authentication to aws s3 download of large files

I'm trying to figure out how to implement these requirements for S3 downloads:
Signed URL (links should become invalid after some amount of time).
Download only 1 time - any other requests to the same URL should fail.
Need to restrict downloads to the user/browser who made the request to generate the signed URL - no other user should be able to download.
Be able to deal with large files (ideally, streaming, just like when someone downloads directly from a standard S3 access point).
Things that I've tried:
S3 Object Lambda + Access Point
Generate pre-signed URL to lambda access point, this works well.
Make use of S3 object metadata to store download state / restrict downloads to just 1 time. This works well.
No way to access user-agent or requestor's IP.
Large files are a problem. Timeout has been configured to 15 minutes (the max), but request still times out much earlier. This was done with NodeJS.
Lambda + Lambda URL
Pre-signed URL is generated and passed to lambda URL as encoded param - the lambda makes the request if auth/validation passes. This approach seems to work fine.
Can use same approach of leveraging S3 object metadata to limit downloads to just 1 time.
User-agent and requestor IP is available, this is great.
Large files are a problem. I've tried NodeJS and it behaves the same as the S3 Object Lambda (it eventually times out, even earlier than the configured time). I also implemented the Java streaming handler, but it dies with an "out of memory" error even when I bump the memory up to 3GB (the file is only 1GB, and I thought streaming would get around the memory problem anyway). I've tried several ways to stream (Java 11), but it really seems like the streaming handler is not actually streaming and is instead buffering somewhere outside of the lambda.
I'm now unsure if AWS lambda will be able to handle all of these requirements, but I would really like to know if others might have ideas, or if I'm missing something.

Why does aws s3 getObject executes slowly even with small files?

I am relatively new to amazon web services. There is problem that came up while I was coding my new web app. I am currently storing profile pictures in an s3 bucket.
I don’t want these profile pictures to be seen by the public, only authorized members. So I have a php file like this:
This php file executes getObject and sends out a header to show the picture, but only if the user is allowed to see the picture. I query the database and also check the session to make sure that the currently logged-in user has access to the picture. All is working fine, but it takes around 500 milliseconds for the GET request to execute, even on small files (40kb). It takes even longer on bigger files, and also when I embed the php file in an img tag multiple times with different query string values.
I need to mention that I’m testing this in a localhost environment with apache webserver.
Could the problem be that getObject is optimized to be run from an ec2 instance, and that if I tested this on an ec2 the response time would be much better?
My s3 is based in London, and I’m testing it in Hungary with a good internet connection so I’m not sure if this response time is what I should get here.
I read that other people had similar issues, but from my understanding the time it takes from s3 to transfer the files to an ec2 should be minimal as they are all in the cloud and the latency between these services and all the other aws services should be minimal (At least if they are in the same region).
Please don’t tell me in comments that I should just make my bucket public and embed the direct link to the file as it is not a viable option for obvious reasons. I also don’t want to generate pre-signed urls for various reasons.
I also tested this without querying the database and essentially the only logic in my code is to get the object and show it to the user. Even with this I get 400+ milliseconds response time.
I also tried using doesObjectExist() and I still need to wait around 300-400 milliseconds for that to give me a response.
[Screenshot: multiple GET requests to the same php file as image source]
UPDATE
I tested it on my ec2 instance and I got a much better response time. I tested it with multiple files and all is fine. It seems that if you use getObject on localhost, the time it takes to connect to s3 and fetch the data multiplies.
Thank you for the answers!

Transfer file from AWS S3 to OneDrive with AWS Lambda

A client of ours requested that we have copies of their files on both AWS S3 and OneDrive.
The usual MO: File is sent from an iOS application to an AWS S3 bucket. This triggers an AWS Lambda Function which attaches the file to an email and sends a copy to the client, which they again store on OneDrive. Now, we want to skip the email part and transfer the file directly to OneDrive.
All my research so far points to Zapier, CloudRail or the MS Graph REST API. The problem I'm having is that we want to transfer the file with an AWS Lambda function (Java8), automagically. Almost all the tutorials and examples on MS Graph need a client to log in manually, and they are mostly client-side logic. The other methods have more overhead, and we don't want to make our stack (unnecessarily) more complicated than it already is.
I realize this is a very specific case. We are systematically replacing the client's file management system, without disrupting their day-to-day operations too much.
Any conclusive pointers/examples/tutorials to get this done server side would be greatly appreciated.
I'm not sure how well S3 aligns with OneDrive; they are quite different models. OneDrive is provisioned per user, which begs the question: which user would you want to copy this file to? I would think Azure Storage would be a far better fit, as it uses a similar model to S3.
You can use Microsoft Graph API to upload the file to a user's OneDrive. You would need to authenticate the user in order to obtain an Access and Refresh Token. Once this process is done, you can store that Refresh Token and retrieve an updated Access Token as needed.
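The "retrieve an updated Access Token as needed" step can be sketched as a server-side refresh-token exchange. This is a sketch under assumptions, not a verified integration: the token endpoint and form parameters are what I recall from the Microsoft identity platform OAuth 2.0 documentation and should be checked against it, and the function name, the `tenant` value and the default scope are hypothetical choices for this example. The function only builds the request; sending it and storing the returned tokens is left to the caller.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_refresh_request(
    tenant: str,
    client_id: str,
    client_secret: str,
    refresh_token: str,
    scope: str = "Files.ReadWrite offline_access",  # assumed scope for OneDrive uploads
) -> Request:
    """Build (but do not send) a POST that trades a refresh token for an access token."""
    # Assumed endpoint of the Microsoft identity platform (v2.0); verify in the docs.
    url = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
    form = urlencode(
        {
            "grant_type": "refresh_token",
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
            "scope": scope,
        }
    ).encode("ascii")
    return Request(
        url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

In a Lambda function the stored refresh token would be read from wherever you keep secrets, the request sent with `urllib.request.urlopen`, and the JSON response parsed for the new access token. The response may also contain a new refresh token, which should replace the stored one.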
Also with CloudRail it's necessary to authenticate the user, but there are methods to store and use an access token.
The services have two methods, loadAsString and saveAsString, which are used to load and store credentials. You could call loadAsString with your access token. The string can differ from service to service, but will look something like this: [{"access_token": "YOUR ACCESS TOKEN"}]
To add to this, Microsoft now has a cloud migration tool www.mover.io that allows you to sync files & folders from most clouds into Azure blob, Sharepoint or OneDrive directly, so without download/upload to a client machine.
Personally used it only for a one-time sync, but leaving it here for posterity.
The client only has to log in once, so if you already have the client and secret keys, you can do the manual flow once and then save the generated token file together with your code files in AWS. The next time the code is run, it uses the refresh token. The last time I did this I was able to set the refresh token to never expire, but I think Microsoft has since removed that option, and now the token can only last something like 2 or 3 years at most.

Google Drive API to update permissions using File's Patch Endpoint

We are using the Google Drive API to upload files and update permissions in our application. The requirement is to update the permissions for ~60 users/groups.
There are three ways by which we can update permissions on a file :
Use File's Patch Endpoint
Use File's Update Endpoint
Use Permissions's Insert Endpoint
If we go with #3, we have to make ~60 calls for each permission change, which is not good: it means that many HTTP calls, and it affects quota usage.
So we tried #1 and provided the necessary input in the "permissions" key. It returns 200, but the file is not shared as per the given input.
Is there anything that I am missing?
Permissions.Insert is the only way to add permissions to a file; it's not feasible via operations on the Files API.
The Google Drive API does however support batching, which means that instead of sending 60 separate HTTP requests, you can send a single batch that contains 60 requests. This won't help with quota, but will likely perform better. More information here:
https://developers.google.com/drive/v3/web/batch

Log delay in Amazon S3

I have recently started hosting on Amazon S3, and I need the log files to calculate statistics for the "get", "put" and "list" operations on the objects.
I've observed that the log files are organized weirdly. I don't know when a log will appear (not immediately, at least 20 minutes after the operation) or how many lines of logs will be contained in one log file.
After that, I need to download these log files and analyse them. But I can't figure out how often I should do this.
Can somebody help? Thanks.
What you describe (log files being made available with delays and in unpredictable order) is exactly the behaviour AWS declares as expected. This follows from the distributed nature of the system AWS uses to provide S3: the same request may be served by a different server each time. I have seen 5 different IP addresses being provided for publishing.
So the only solution is to accept the delay: measure the delay you actually experience, add some extra time, and learn to live with the total (I would expect something like 30 to 60 minutes, but statistics could tell more).
If you need the log records ordered, you either have to sort them yourself or look for a log processing solution. I have seen some applications offered for exactly this purpose.
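Sorting the records yourself, and counting operations as the question asks, might look like the sketch below. It assumes the space-delimited S3 server access log layout, in which a bracketed timestamp such as `[06/Feb/2019:00:00:38 +0000]` is followed by remote IP, requester, request ID and then the operation (for example `REST.GET.OBJECT`). The function names are hypothetical and the field positions should be checked against real log files.

```python
import re
from collections import Counter
from datetime import datetime

# The timestamp is the first bracketed field on each line.
TIME_RE = re.compile(r"\[([^\]]+)\]")


def parse_line(line: str):
    """Return (timestamp, operation) for one access log line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in line: {line!r}")
    when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    # Assumed order after the timestamp: remote IP, requester, request ID, operation.
    fields = line[match.end():].split()
    return when, fields[3]


def summarize(lines):
    """Merge lines from any number of log files, sort by time, count operations."""
    records = sorted(parse_line(line) for line in lines if line.strip())
    counts = Counter(operation for _, operation in records)
    return records, counts
```

Because every record carries its own timestamp, the order in which log files arrive stops mattering: lines from all downloaded files can be fed to `summarize` together.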
If you really need your logs with a very short delay, you have to produce them yourself. This means writing and running a frontend that gives access to your files on S3 and at the same time does the logging you need.
I run such a solution: users get a user name, a password and the URL of my frontend. When they send a request, I check whether they provide proper credentials and whether they are allowed to see the given resource. If so, I create a temporary URL for that resource, valid for a few minutes, and redirect the request to it.
But such a frontend costs money (you have to run it somewhere) and is less robust than accessing AWS S3 directly.
Good luck, Lulu.
A lot has changed since the question was originally posted. The delay is still there, but one of the OP's concerns was when to download the logs to analyze them.
One option right now would be to leverage Event Notifications: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/setup-event-notification-destination.html
This way, whenever an object is created in the access logs bucket, you can trigger a notification to SNS, SQS or Lambda, and based on that download and analyze the log files.
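For the Lambda case, a minimal handler sketch could pull the bucket and key of each new log object out of the notification. It assumes the function is invoked directly by S3 (SNS and SQS wrap the same payload in their own envelope) and relies on object keys arriving URL-encoded in the event. `log_objects_from_event` and `handler` are hypothetical names, and the actual download and analysis step is left out.

```python
from urllib.parse import unquote_plus


def log_objects_from_event(event: dict):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the notification, with spaces sent as '+'.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point: list the new log objects that should be analyzed."""
    objects = log_objects_from_event(event)
    for bucket, key in objects:
        # Placeholder for the real work: download the object and analyze it.
        print(f"new log object: s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

This replaces polling on a guessed schedule: each log file is processed once, as soon as S3 delivers it.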