Make static file accessible only to my app hosted on either AWS or Google Cloud

I have a static site (purely HTML, JS & CSS) I will be hosting on either AWS or Google Cloud.
The site pulls data from a CSV that can either be located as a local file on the site, or preferably, on another endpoint.
My issue is that the client does not want the CSV file to be publicly accessible (people shouldn't be able to navigate to it directly and download it).
The file needs to live on AWS S3 or Google Cloud Storage, as the client will be updating it periodically.
However, I can't work out how to make it visible to my app but not to someone who visits the file directly. I can either make it public, so my app can see it, but then so can everyone else; or make it private, so it can't be downloaded, but then my app can't see it either.
My question is - is what I'm trying to achieve even possible? Or does the CSV have to either be public or not?
My ideal option would be two separate buckets: one holding my static site, the other holding the CSV files.
Any suggestions would be most welcome.

What you're asking for isn't really possible in the way you've described it. If the CSV is to be consumed directly by the web page, then the web page needs to be able to get it, and if the web page can get it, then so can anyone who can view the page. The same is true of any data on a web page. Is there a particular reason why you don't want the CSV to be accessed directly? It isn't something people would stumble on; they would have to know the URI (which is easy to find, but most users wouldn't bother). Are there things in the data that shouldn't be exposed? If so, you need to rethink your approach entirely.

AWS recently released new S3 features that may help here. One is Amazon S3 Access Points (https://aws.amazon.com/s3/features/access-points). The other is Amazon S3 Object Lambda (https://aws.amazon.com/es/blogs/aws/introducing-amazon-s3-object-lambda-use-your-code-to-process-data-as-it-is-being-retrieved-from-s3/).
It looks like you can put Lambda functions in front of an S3 bucket to process and transform requests for its files (also known as "objects"). Unfortunately, I don't have a precise answer to your question, as I have never implemented this solution, but I believe you might be able to store the CSV file in S3 and grant access only to your static site through an S3 Access Point.
Alternatively, you might be able to use CloudFront with Lambda function associations (Lambda@Edge) to restrict access to your CSV files to specific origins.
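A minimal sketch of the CloudFront idea, written as a Python Lambda@Edge viewer-request handler. It assumes the site is served from the hypothetical origin https://www.example.com and only checks the Referer header that browsers attach when the page fetches the CSV. Anyone can set that header by hand (for example with curl), so this only keeps casual visitors from opening the file; it does not make the data secret, which is the point of the first answer.

```python
# Hypothetical site origin; replace with the origin that serves your static site.
# Cross-origin fetches usually send only the origin plus "/", so match on that prefix.
ALLOWED_PREFIXES = ("https://www.example.com/",)


def _header(request, name):
    """Return the first value of a CloudFront header (keys are lowercase), or ""."""
    values = request["headers"].get(name.lower(), [])
    return values[0]["value"] if values else ""


def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    # Only guard CSV objects; everything else passes through unchanged.
    if not request["uri"].endswith(".csv"):
        return request
    # Typing the URL into the address bar sends no Referer, so it is rejected.
    if _header(request, "Referer").startswith(ALLOWED_PREFIXES):
        return request  # returning the request forwards it to the S3 origin
    # Returning a response object makes CloudFront answer directly.
    return {
        "status": "403",
        "statusDescription": "Forbidden",
        "headers": {"content-type": [{"key": "Content-Type", "value": "text/plain"}]},
        "body": "Forbidden",
    }
```

The same check could be written in Node.js; Python is used here only to keep all examples in one language.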

Related

Zip images on web server and return the url

I am currently looking for a way to improve the traffic flow of an app.
Currently the user uploads his data via the app, using Google Cloud Platform as storage provider. Other users can then download this data again.
This works well so far, but since the download traffic at GCP is relatively expensive I had the idea to outsource this to a cheap web server.
The idea is that the user requests the file(s) from GCP. GCP then checks whether the file(s) are already on the web server; if not, they are uploaded to the server.
On the server the files are zipped, and the link is sent back to GCP, which emails it to the user.
TL;DR: How can I zip a specific selection of files on a web server without Node.js etc., and send the link to the generated file back to GCP?
I'm open to other ideas as well.
This particular case is covered by Google Cloud's CDN (Content Delivery Network) service, Cloud CDN.
As you can read here, there is already a way to connect the CDN to a Storage bucket, and it does exactly what you planned to do with your own web server. The only difference is that it's already production-ready: it handles cache misses, cache hits and so on.
You can compare the prices: here you can find CDN prices, and here you can find Storage prices. The important difference is that Storage egress pricing is tiered per TB, whereas CDN egress pricing is tiered per 10 TB, and the CDN price is still lower.
Of course, you can still stick to your idea. I would implement it as a REST API with just one endpoint, which serves the file if it is present on the web server. If it is not present, the endpoint will:
perform a redirect to the direct link for the file hosted in Storage;
start fetching the file from Storage and put it in the cache.
You would still need to handle cache invalidation: what happens when somebody changes a file? That depends on how your app works with those files, so it strictly depends on your app's functional domain; in any case, Cloud CDN would solve it without any further development.

Why does AWS S3 getObject execute slowly even with small files?

I am relatively new to Amazon Web Services, and a problem came up while I was coding my new web app. I am currently storing profile pictures in an S3 bucket.
I don't want these profile pictures to be seen by the public, only by authorized members. So I have a PHP file like this:
This PHP file calls getObject and sends a header to display the picture, but only if the user is allowed to see it. I query the database and check the session to make sure the currently logged-in user has access to the picture. Everything works, but the GET request takes around 500 milliseconds to execute, even for small files (40 KB). It takes even longer for bigger files, and when I embed the PHP file in several img tags with different query-string values.
I should mention that I'm testing this in a localhost environment with an Apache web server.
Could the problem be that getObject is optimized to run from an EC2 instance, and that the response time would be much better if I tested this on EC2?
My S3 bucket is in London, and I'm testing from Hungary with a good internet connection, so I'm not sure whether this response time is what I should expect.
I read that other people had similar issues, but from my understanding the time it takes to transfer files from S3 to EC2 should be minimal, since the latency between these services and the other AWS services should be minimal (at least within the same region).
Please don't tell me in the comments to just make my bucket public and embed the direct link to the file; that is not a viable option, for obvious reasons. I also don't want to generate pre-signed URLs, for various reasons.
I also tested this without querying the database and essentially the only logic in my code is to get the object and show it to the user. Even with this I get 400+ milliseconds response time.
I also tried using doesObjectExist() and I still need to wait around 300-400 milliseconds for that to give me a response.
(Screenshot: multiple GET requests to the same PHP file used as an image source)
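One way to narrow down where the 400+ ms goes is to time the call in isolation and separate the first call from repeated calls. A minimal Python sketch, assuming `call` is any zero-argument function, for example one wrapping a getObject request on an SDK client created once; measure_latency_ms is a hypothetical helper, not part of any SDK.

```python
import statistics
import time


def measure_latency_ms(call, runs=10):
    """Time a zero-argument callable; return (first_call_ms, median_of_rest_ms).

    A first call much slower than the rest usually points at one-off connection
    setup (DNS lookup, TCP and TLS handshakes) rather than at the transfer itself.
    """
    def timed():
        start = time.perf_counter()
        call()
        return (time.perf_counter() - start) * 1000.0

    first = timed()
    rest = [timed() for _ in range(runs)]
    return first, statistics.median(rest)
```

If both numbers stay high from localhost but drop on an EC2 instance in the same region, network distance to the bucket is the likely cause rather than the code.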
UPDATE
I tested it on my EC2 instance and got a much better response time. I tested it with multiple files and everything is fine. It seems that when getObject is called from localhost, the time it takes to connect to S3 and fetch the data is multiplied.
Thank you for the answers!

How do I transfer images from a public database to a Google Cloud Storage bucket without downloading them locally?

I have a CSV file with over 10,000 URLs pointing to images on the internet. I want to perform a machine learning task on them, using Google Cloud Platform infrastructure. My first task is to transfer all these images from the URLs to a GCP bucket, so that I can access them later from Docker containers.
I don't want to download them locally first and then upload them, as that is too much work; instead, I want to transfer them directly to the bucket. I have looked at Storage Transfer Service, and for my specific case I think I will use a URL list. Can anyone help me figure out how to proceed? Is this even possible?
If yes, how do I generate the MD5 hash mentioned here for each URL in my list, and how do I get the size in bytes of each image?
As you noted, Storage Transfer Service requires that you provide it with the MD5 of each file. Fortunately, many HTTP servers may provide you with the MD5 of an object without requiring that you download it. Issuing an HTTP HEAD request may result in the server providing you with a Content-MD5 header in its response (and usually a Content-Length header with the size in bytes). The MD5 may not be in the form that Storage Transfer Service requires, but it can be converted into that form.
The downside here is that web servers are not necessarily going to provide you with that information. There's no way of knowing without checking.
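A minimal Python sketch of that approach using only the standard library. head_metadata issues the HEAD request and reads Content-Length and Content-MD5 when the server sends them; normalize_md5 assumes that a 32-character value is a hex digest and converts it to base64, the encoding the URL list expects. The TsvHttpData-1.0 first line and the URL, size, MD5 columns follow the Storage Transfer Service URL-list format; the function names are hypothetical.

```python
import base64
import string
import urllib.request


def hex_md5_to_base64(hex_digest):
    """Convert a 32-character hex MD5 digest to its base64 form."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


def normalize_md5(value):
    """Return a base64 MD5, converting from hex if the server sent hex instead."""
    if len(value) == 32 and all(c in string.hexdigits for c in value):
        return hex_md5_to_base64(value)
    return value  # assume it is already base64, as RFC 1864 specifies


def head_metadata(url):
    """HEAD the URL; return (size_in_bytes or None, base64_md5 or None)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        length = resp.headers.get("Content-Length")
        md5 = resp.headers.get("Content-MD5")
    return (int(length) if length else None), (normalize_md5(md5) if md5 else None)


def url_list_tsv(rows):
    """Build a URL list: TsvHttpData-1.0, then URL<TAB>size<TAB>MD5 per line."""
    lines = ["TsvHttpData-1.0"]
    lines += [f"{url}\t{size}\t{md5}" for url, size, md5 in rows]
    return "\n".join(lines) + "\n"
```

URLs for which head_metadata returns None for either value cannot go into the list this way and need another route.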
Another option worth considering is to set up one or more GCE instances and run a script from there to download the objects to your GCE instance and from there upload them into GCS. This still involves downloading them "locally," but locally no longer means a place off of Google Cloud, which should speed things up substantially. You can also divide up the work by splitting your CSV file into, say, 10 files with 1000 objects each in them, and setting up 10 GCE instances to do the work.
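To divide the work across instances as described, the URL list can be split into one file per instance. A minimal sketch, assuming the CSV holds one URL per row with no header row; split_csv is a hypothetical helper name.

```python
import csv


def split_csv(path, parts, out_prefix):
    """Split a CSV into at most `parts` files of up to ceil(rows / parts) rows each.

    Returns the list of files written: <out_prefix>0.csv, <out_prefix>1.csv, ...
    """
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    size = -(-len(rows) // parts)  # ceiling division
    written = []
    for i in range(parts):
        chunk = rows[i * size:(i + 1) * size]
        if not chunk:
            break  # fewer rows than parts * size; no more files needed
        out_path = f"{out_prefix}{i}.csv"
        with open(out_path, "w", newline="") as out:
            csv.writer(out).writerows(chunk)
        written.append(out_path)
    return written
```

With 10,000 URLs and parts=10 this produces ten files of 1,000 rows, one per GCE instance.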

Transfer file from AWS S3 to OneDrive with AWS Lambda

A client of ours requested that we have copies of their files on both AWS S3 and OneDrive.
The usual MO: a file is sent from an iOS application to an AWS S3 bucket. This triggers an AWS Lambda function, which attaches the file to an email and sends a copy to the client, who then stores it on OneDrive. Now we want to skip the email step and transfer the file directly to OneDrive.
All my research so far points to Zapier, CloudRail, or the MS Graph REST API. The problem is that we want to transfer the file automatically from an AWS Lambda function (Java 8). Almost all the tutorials and examples for MS Graph require a user to log in manually, and most of them are client-side logic. The other methods add overhead, and we don't want to make our stack more complicated than it already is.
I realize this is a very specific case. We are systematically replacing the client's file management system, without disrupting their day-to-day operations too much.
Any conclusive pointers/examples/tutorials to get this done server side would be greatly appreciated.
I'm not sure how well S3 aligns with OneDrive; they are quite different models. OneDrive is provisioned per user, which raises the question: which user's OneDrive would you copy this file to? I would think Azure Storage would be a far better fit, as it uses a model similar to S3's.
You can use Microsoft Graph API to upload the file to a user's OneDrive. You would need to authenticate the user in order to obtain an Access and Refresh Token. Once this process is done, you can store that Refresh Token and retrieve an updated Access Token as needed.
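A minimal Python sketch of the refresh-token step and a simple upload, using only the standard library (the original Lambda is Java 8, but the HTTP calls are the same). It assumes an app registration with the delegated Files.ReadWrite and offline_access permissions and a refresh token obtained from one interactive sign-in; the helper names are hypothetical. The URLs are the Microsoft identity platform v2.0 token endpoint and Graph's simple-upload route, which is meant for small files; larger files need an upload session.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
UPLOAD_URL = "https://graph.microsoft.com/v1.0/me/drive/root:/{path}:/content"


def refresh_token_body(client_id, client_secret, refresh_token,
                       scope="offline_access Files.ReadWrite"):
    """Form body that exchanges a stored refresh token for a new access token."""
    return urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "scope": scope,
    }).encode("ascii")


def refresh_access_token(tenant, client_id, client_secret, refresh_token):
    """Return (access_token, refresh_token_to_store_for_next_time)."""
    req = urllib.request.Request(
        TOKEN_URL.format(tenant=tenant),
        data=refresh_token_body(client_id, client_secret, refresh_token),
        method="POST",
    )
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    with urllib.request.urlopen(req) as resp:
        tokens = json.load(resp)
    # The response may carry a new refresh token; keep the old one otherwise.
    return tokens["access_token"], tokens.get("refresh_token", refresh_token)


def upload_small_file(access_token, onedrive_path, data):
    """PUT bytes to the signed-in user's OneDrive at onedrive_path."""
    url = UPLOAD_URL.format(path=urllib.parse.quote(onedrive_path))
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Authorization", f"Bearer {access_token}")
    req.add_header("Content-Type", "application/octet-stream")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # metadata of the created drive item
```

In a Lambda, the returned refresh token would be persisted (for example in a secret store) so the next invocation can use it.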
CloudRail also requires authenticating the user, but it provides methods to store and reuse an access token.
The services have two methods, loadAsString and saveAsString, which load and store credentials. You could call loadAsString with your access token; the string can differ from service to service, but will look something like this: [{"access_token": "YOUR ACCESS TOKEN"}]
To add to this, Microsoft now has a cloud migration tool www.mover.io that allows you to sync files & folders from most clouds into Azure blob, Sharepoint or OneDrive directly, so without download/upload to a client machine.
Personally used it only for a one-time sync, but leaving it here for posterity.
The user only has to log in once, so if you already have the client ID and secret, you can do the manual flow once, then save the generated token file together with your code in AWS. The next time the code runs, it uses the refresh token. The last time I did this, I was able to set the refresh token to never expire, but I think Microsoft has since removed that option, and now the token can only last something like 2 or 3 years at most.

Service on amazon to download temporary files

I have a decision to make and would be very glad if someone could help me with it.
We have an infrastructure on Amazon. Back-end processes write multiple files to S3. Then, when a customer requests a report, we launch an EMR job and create a result. So the question is: how do we give this report file back to the customer?
What I would like to have is some temporary storage, that will give a unique url, that customer can download it from.
I was also thinking about storing result file on S3, but don't know if it's a good idea.
Is there some kind of a service on amazon that can help me with that?
You can create a signed URL (that you can later shorten if needed) to download the results file from S3: http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html
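The linked page describes S3's original (Signature Version 2) query-string authentication; current regions use Signature Version 4. In practice an SDK call such as boto3's generate_presigned_url does this for you, but the standard-library sketch below shows what a SigV4 presigned GET URL contains. It assumes virtual-hosted-style URLs and a bucket name without dots; presign_s3_get is a hypothetical helper name.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_get(bucket, key, region, access_key, secret_key, expires=3600, now=None):
    """Return a SigV4 presigned GET URL for s3://bucket/key."""
    if not 1 <= expires <= 604800:  # SigV4 presigned URLs last at most 7 days
        raise ValueError("expires must be between 1 and 604800 seconds")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    canonical_uri = "/" + quote(key, safe="/")
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters are URI-encoded and sorted by name.
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET",
        canonical_uri,
        canonical_query,
        f"host:{host}\n",    # canonical headers, each terminated by a newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # payload hash placeholder used for presigned URLs
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    k = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        k = _hmac_sha256(k, part)
    signature = hmac.new(k, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

For example, presign_s3_get("my-reports", "reports/result.csv", "us-east-1", ACCESS_KEY, SECRET_KEY, expires=3600) would give a link the customer can open for one hour, where the bucket name and key variables are placeholders for your own values.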