Image Storage in S3 and serving securely - django

I am building a photo site where users can upload photos and view them later. These photos are private, not public. I am storing the photos and thumbnails in S3. The current implementation is that when a user comes to the page I serve signed URLs for the thumbnails, which are then loaded directly from S3 (though I am also considering signed URLs from CloudFront).
The issues now are:
On each request a different URL is served for each thumbnail, so the browser cache can't be used: the browser loads every image again when the user refreshes the site, which makes our page slow.
This also creates another problem: if someone looks at the page source, they can find the signed URLs of the photos and distribute them to others for viewing (though a signed URL is only valid for 10 minutes). What I would prefer is that the URLs be served by my application, so that I can decide whether the user should be allowed to see a photo or not.
Please help me with what approach I should take; I want the page loading time to be fast while also addressing the security concern. I would also like to know whether serving from CloudFront will be faster than the browser cache (I have read that somewhere), even with a different signed URL every time.
Feel free to be descriptive in your answer.

I don't think there is a perfect answer to what you want. Some random ideas/tradeoffs:
1) switch to HTTPS. That way you can ignore people sniffing URLs. But HTTPS items cannot be cached in the browser for very long.
2) If you are giving out signed URLs, don't set expires = "time + 10m", but do "time + 20m and round to nearest 10m" (see the first sketch at the end of this answer). This way, the URLs will be constant for at least 10m, and the browser can cache them. (Be sure to also set the Expires headers on the files in S3 so the browser knows they can be cached.)
3) You could proxy all the URLs (see the second sketch at the end of this answer). Have the browser request the photo from your server, then write a web proxy to proxy the request to the photo in S3. Along the way, you can check the user auth, generate a signed URL for S3, and even cache the photo locally. This seems "less efficient" for you, but it lets the browser cache your URLs for as long as they want. It's also convenient for your users, since they can bookmark a photo URL and it always works. Even if they move to a different computer, they hit your server, which can ask them to sign in before showing the photo.
Make sure to use an "evented" server like Python's Twisted or Node.js. That way, you can be proxying thousands of photos at the same time without using a lot of memory/CPU on your server. (You will use a lot of bandwidth, since all data goes through your server. But you can "scale out" easily by running multiple servers.)
4) Cloudfront is a cache. It will be slower (by a few hundred ms) the first time a resource is requested from a CF server. But don't expect the second request to be cached! Each CF location has ~20 different servers, and you'll hit a random one each time. So requesting a photo 10 times will likely generate 10 cache misses, and you still only have a 50% chance of getting a cache hit on the next request. CF is only useful for popular content that is going to be requested hundreds of times. CF is somewhat useful for foreign users because the private CF-to-S3 connection can be better than the normal internet.
I'm not sure exactly how you would have CF do your security checking for you. But if you pass through the S3 auth (not the default), then you could use the "mod 10 minutes" trick to make URLs that can be cached for 10 minutes.
It is impossible for CF to be "faster than a browser cache". But if you are NOT using your browser cache, CF can be faster than S3, but mostly in foreign locations.
Take a look at what other people do (e.g. SmugMug uses S3, I think).
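Here is a minimal sketch of the expiry-rounding idea from point 2, using boto3 with Django's cache (bucket and key names are placeholders). Because modern SigV4 URLs also embed the signing time, the simplest way to keep a URL byte-identical for a whole window is to generate it once per 10-minute window and reuse it:

```python
# Assumption: a Django project with boto3 installed; bucket/key names are made up.
import time
import boto3
from django.core.cache import cache  # any cache shared by your web processes works

WINDOW = 10 * 60                      # 10-minute windows
s3 = boto3.client("s3")

def cached_signed_url(key: str, bucket: str = "my-photo-bucket") -> str:
    """Return the same signed URL for every request inside one 10-minute window."""
    now = int(time.time())
    window_start = now // WINDOW * WINDOW
    expiry = window_start + 2 * WINDOW            # "time + 20m, rounded to 10m"
    cache_key = f"signed-url:{bucket}:{key}:{window_start}"
    url = cache.get(cache_key)
    if url is None:
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expiry - now,               # expires on a 10-minute boundary
        )
        cache.set(cache_key, url, timeout=expiry - now)
    return url
```

And a rough sketch of the proxying idea from point 3 as a plain, synchronous Django view; a production version would stream the body and/or run on an evented server as suggested above. The key layout and bucket name are assumptions:

```python
# Assumption: photos live under photos/<user id>/<photo id>.jpg in one private bucket.
import boto3
from botocore.exceptions import ClientError
from django.contrib.auth.decorators import login_required
from django.http import HttpResponse, Http404

s3 = boto3.client("s3")
BUCKET = "my-photo-bucket"

@login_required
def photo(request, photo_id):
    # Authorization is implicit here: users can only reach keys under their own id.
    key = f"photos/{request.user.id}/{photo_id}.jpg"
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=key)
    except ClientError:
        raise Http404("photo not found")
    response = HttpResponse(obj["Body"].read(), content_type=obj["ContentType"])
    # The URL never changes and never exposes S3 credentials, so the browser's
    # private cache can hold it for a while.
    response["Cache-Control"] = "private, max-age=3600"
    return response
```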

Serve private, user-uploaded media from Google Cloud Storage

I'm evaluating GCP for my new project; however, I'm still trying to figure out how to implement the following feature and what kind of costs it will involve.
TL;DR
What's the best strategy to serve user-uploaded media from GCP while giving users full control over who will be able to access them?
Feature Description
As a user, I want to upload some kind of media (e.g. images, videos, etc.) in a private and secure way.
The media must be visible to me and to a specific subgroup of users to whom I've granted access.
Nobody else must be able to access the media, even if they obtain the URL.
The media content would then be displayed on the website.
Dilemma
I would like to use Cloud Storage to store all the media, however, I'm struggling to find a suitable solution for the authorization part.
As far as I can tell, features related to "Access Control" are mostly tailored at Project and Organisational level.
The closest feature so far is Signed URLs, but they don't satisfy the requirement that the media be inaccessible even to someone who has the URL; on the other hand, since they expire soon after being issued, perhaps they could be a good compromise.
Another problem with this approach is that the media cannot be cached at the browser level, and caching could save quite some bandwidth in the long run...
Expensive Solution?
One solution that came to mind is that I could serve the media through a GCE instance by putting an app there that validates the user, probably through a JWT, and then streams the media back with the appropriate cache headers.
This should satisfy all requirements, but I'm afraid about egress costs skyrocketing :(
Thank you to whoever will help!
Signed URLs are the solution you want.
Create a service account that represents your application. When a user of your application wants to upload an object, vend them a signed URL for performing the upload. The new object will be readable only by your service account (and other members of your project).
When a user wants to view an object, perform whatever checks you like and then vend them a signed URL for reading the object. Set a short expiration time if you are worried about the URLs being shared.
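As a minimal sketch of both directions with the google-cloud-storage Python client (the bucket name is a placeholder, and the client is assumed to run with the service account's credentials):

```python
# Assumption: GOOGLE_APPLICATION_CREDENTIALS points at the service account key.
from datetime import timedelta
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-private-media")  # placeholder bucket name

def make_upload_url(object_name: str) -> str:
    """Signed URL the browser can PUT the file to; the object stays private."""
    return bucket.blob(object_name).generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),
        method="PUT",
        content_type="application/octet-stream",
    )

def make_read_url(object_name: str) -> str:
    """Short-lived signed URL for viewing, vended only after your own access check."""
    return bucket.blob(object_name).generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=10),
        method="GET",
    )
```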
I would not advise the GCE-based approach unless you get some additional benefit out of it. I don't see how it adds any additional security to serve the data directly instead of via a signed URL.

Parallel website running to my original website

We have been working on a gaming website. Recently, while making note of the major traffic sources, I noticed a website that I found to be a carbon copy of our website. It uses our logo and everything is the same as ours, but with a different domain name. Could it be that their domain name is simply pointing to ours? In several places the links look like ccwebsite/our-links, and that website even has links to some images as ccwebsite/our-images.
What has happened? How could they have done that? What can I do to stop this?
There are a number of things they might have done to copy your site, including but not limited to:
Using a tool to scrape a complete copy of your site and place it on their server
Using their DNS name to point to your site
Manually re-creating your site as their own
Responding to requests to their site by scraping yours in real time and returning that as the response
etc.
What can I do to stop this?
Not a whole lot. You can try to prevent direct linking to your content by requiring referrer headers for your images and other resources so that requests need to come from pages you serve, but 1) those can be faked and 2) not all browsers will send those so you'd break a small percentage of legitimate users. This also won't stop anybody from copying content, just from "deep linking" to it.
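If you wanted to try the referrer check anyway, here is one possible sketch as Django middleware (your actual stack is unknown, and this is often done in the web server config instead; the host names and path prefixes are placeholders):

```python
# Placeholder domains and paths. Empty referrers are allowed so browsers that
# strip the header still work — that's the trade-off described above.
from urllib.parse import urlparse
from django.http import HttpResponseForbidden

ALLOWED_REFERER_HOSTS = {"example.com", "www.example.com"}
PROTECTED_PREFIXES = ("/images/", "/static/")

class RefererCheckMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.path.startswith(PROTECTED_PREFIXES):
            referer = request.META.get("HTTP_REFERER", "")
            if referer and urlparse(referer).hostname not in ALLOWED_REFERER_HOSTS:
                return HttpResponseForbidden("hotlinking not allowed")
        return self.get_response(request)
```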
Ultimately, by having a website you are exposing that information to the internet. On a technical level anybody can get that information. If some information should be private you can secure that information behind a login or other authorization measures. But if the information is publicly available then anybody can copy it.
"Stopping this" is more of a legal/jurisdictional/interpersonal concern than a technical one I'm afraid. And Stack Overflow isn't in a position to offer that sort of advice.
You could run your site with some lightweight authentication. Just issue a cookie passively when they pull a page, and require the cookie to get access to resources. If a user visits your site and then the parallel site, they'll still be able to get in, but if a user only knows about the parallel site and has never visited the real site, they will just see a crap ton of broken links and images. This could be enough to discourage your doppelganger from keeping his site up.
Another (similar but more complex) option is to implement a CSRF mitigation. Even though this isn't a CSRF situation, the same mitigation will work. Essentially you'd issue a cookie as described above, but in addition insert the cookie value in the URLs for everything and require them to match. This requires a bit more work (you'll need a filter or module inserted into the pipeline) but will keep out everybody except your own users.
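A hedged sketch of the passive-cookie idea as Django middleware (the cookie name and path prefixes are made up); the CSRF-style variant described above would additionally embed the same value in resource URLs and compare the two:

```python
# Pages issue the cookie passively; resource requests without it are refused.
import secrets
from django.http import HttpResponseForbidden

COOKIE_NAME = "site_token"                    # placeholder name
PROTECTED_PREFIXES = ("/media/", "/images/")  # placeholder resource paths

class PassiveCookieGate:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        is_resource = request.path.startswith(PROTECTED_PREFIXES)
        if is_resource and COOKIE_NAME not in request.COOKIES:
            # Visitors who only know the copied site never received the cookie,
            # so its embedded images and downloads break for them.
            return HttpResponseForbidden("missing token")
        response = self.get_response(request)
        if not is_resource and COOKIE_NAME not in request.COOKIES:
            # Issue the cookie passively the first time someone loads a real page.
            response.set_cookie(COOKIE_NAME, secrets.token_urlsafe(16), httponly=True)
        return response
```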

Django Cache Implementation

Well, I'm designing a web application using Django. The application allows users to select photos from their computer and keep adding them to their timeline. The timeline view has a list/grid of all the photos the user has uploaded, sorted chronologically, showing 50 photos and then a pull-to-refresh to fetch the next 50 photos on the timeline. The implementation works for multiple users.
Now, for a fast user experience I'm considering caching. Like most sites, I want to store each user's timeline in a cache so that whenever the user logs in, the cache is the first place checked for information: the request is served out of the cache, and only if the data is not available there do we go to the DB to query for it.
In one line: for now I'm trying to cache all the timelines of the different users.
I'm done building the webapp except for the cache part. So my question is: how do I cache all the timelines of the different users?
There is a big difference between public caching and the caching of private data. Your data sounds private, and thus needs a different strategy. There is a nice overview of the different ways to implement caching and, more importantly, the different things you need to take into account, in the talk The Server Side (Tom Eastman). It has a part on speed and caching (16:20 onward) that explains how to use etag and last_modified headers with Django.
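As a minimal sketch of the last_modified approach with Django's condition decorator, assuming a hypothetical Photo model with an owner foreign key and an uploaded_at timestamp:

```python
# Photo and its fields are placeholders for your own model.
from django.shortcuts import render
from django.views.decorators.http import condition
from myapp.models import Photo  # placeholder import

def timeline_last_modified(request):
    # Newest upload time for this user; None tells Django to always regenerate.
    return (Photo.objects.filter(owner=request.user)
            .order_by("-uploaded_at")
            .values_list("uploaded_at", flat=True)
            .first())

@condition(last_modified_func=timeline_last_modified)
def timeline(request):
    photos = Photo.objects.filter(owner=request.user).order_by("-uploaded_at")[:50]
    return render(request, "timeline.html", {"photos": photos})
```

With this in place, refreshing an unchanged timeline returns 304 Not Modified and the browser reuses what it already has; you can still put a shared cache (e.g. Django's cache framework keyed by user id) underneath if building the page itself is expensive.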

Bypass specific URL from Akamai if certain cookie exist

I would like Akamai not to cache certain URLs if a specified cookie exists, i.e. if the user is logged in on specific pages. Is there any way we can do this with Akamai?
The good news is that I have done exactly this in the past for the Top Gear site (www.topgear.com/uk). The logic goes that if a cookie is present (in this case "TGCACHEKEY") then the Akamai cache is to be bypassed for certain URL paths. This basically turns off Akamai caching of HTML pages when logged in.
The bad news is that you require an Akamai consultant to make this change for you.
If this isn't an option for you, then Peter's suggestions are all good ones. I considered all of these before implementing the cookie based approach for Top Gear, but in the end none were feasible.
Remember also that Akamai strips cookies for cached resources by default. That may or may not affect you in your situation.
The Edge Server doesn't check for a cookie before it makes the request to your origin server, and I have never seen anything like that in any of their menus, config screens or documentation.
However, there are a few ways I can think of that you can get the effect that I think you're looking for.
You can specify in the configuration settings for the respective digital property what path(s) or URL(s) you don't want it to cache. If you're talking about a logged-in user, you might have a path that only they would get to, or you could set up such a thing server side. E.g. for an online course you would have www.course.com/php.html that anybody could get to, whereas you might use www.course.com/student/php-lesson-1.html for the actual logged-in lesson content. Specifying that /student/* should not be cached would solve that.
If you are serving the same pages to both logged-in and not-logged-in users and can't do it that way, you could check server-side whether they're logged in and, if so, automatically add a cache-breaker to the links they follow. You could also do this client side if you want, but it would be more secure and faster to do it server-side. As a note, the cache-breaker could be userid-random#; combined with the page, that would keep it unique enough that nobody else would request it and get the earlier 'cache-broken' page.
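A small sketch of that server-side cache-breaker as a Python helper you might call from a template tag or context processor (the cb parameter name and the userid-random# format are just illustrations):

```python
# Appends ?cb=<userid>-<random> so the edge treats the URL as unique per user/visit.
import secrets
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def cache_broken(url: str, user_id: int) -> str:
    parts = urlparse(url)
    query = parse_qsl(parts.query)
    query.append(("cb", f"{user_id}-{secrets.token_hex(4)}"))
    return urlunparse(parts._replace(query=urlencode(query)))

# Example (random part will differ): cache_broken("/student/php-lesson-1.html", 42)
# -> "/student/php-lesson-1.html?cb=42-9f1a3b7c"
```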
If neither of the above is workable, there is one other way I can think of, which is a bit unconventional to say the least, but it would work. Create a symbolically linked directory in your document root with another name so that you can apply the first option and exempt it from caching. Then you check if the user is logged in and, if so, prepend the extra directory to the links. From Akamai's point of view, www.mysite.com/logged-on/page.html can be exempt from cache while www.mysite.com/content/page.html is cached. On your server, if /logged-on/ symbolically links over to /content/ then you're all set.
When they log in you could send them to a subdomain which is set up as a ServerAlias, so on your side it's the same, but Akamai has different cache handling rules for it.
Following the same approach as @llevera's answer, you can use cookies on CloudFlare without needing engineers to make the change for you.
Using that sort of cookie to bypass the cache is a technique that is becoming more popular over time, and even big companies like Magento are using it for the Magento 2 platform.
But the solutions above are still valid. Maybe Akamai supports this already by now; we are in 2017!

Hotlinking Twitter avatar images?

The Twitter API returns this value for the Twitter account 'image_url':
http://a1.twimg.com/profile_images/75075164/twitter_bird_profile_bigger.png
In my Twitter client webapp, I am considering hotlinking the HTTPS version of avatars which is hosted on Amazon S3 : https://s3.amazonaws.com/twitter_production/profile_images/75075164/twitter_bird_profile_bigger.png
Are there any best practices which would discourage me from doing this? Do 3rd-party Twitter client applications typically host their own copies of avatars?
EDIT: To clarify, I need to use HTTPS for images because my webapp will use an HTTPS connection and I don't want my users to get security warnings from their browser about the page containing content that is not served securely. For example, Firefox is known to complain about mixed HTTP/HTTPS content.
My problem is to figure out whether or not hotlinking the https URLs is forbidden by Twitter, since these URLs are not "public" from their API. I got them by analyzing their web client HTML source when connected to my Twitter account in HTTPS.
Are you thinking of storing the image URL in your application or retrieving it for the user as it is required?
If it's the latter option then I don't see an issue with hot-linking the images. If you are storing the location of the image URL in your own system then I see you having broken links whenever the images change (I'm sure they will change the URLs at some point in the future).
Edit
Ok, now I see your dilemma. I've looked through the API docs and there doesn't seem to be too much in terms of being able to get images served over HTTPS or getting the URL of the Amazon S3 image. You could possibly write a handler on your own server that would essentially cache and re-serve the HTTP image as HTTPS; however, that's a bit of unnecessary load on your servers. Short of that I haven't come across a better solution. GL
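If you did go the handler route, a rough sketch with Django and requests might look like this (the allow-listed prefix and the cache policy are assumptions):

```python
# Fetches an avatar over HTTP once, caches it, and re-serves it over your HTTPS site.
import requests
from django.core.cache import cache
from django.http import HttpResponse, HttpResponseBadRequest

ALLOWED_PREFIX = "http://a1.twimg.com/profile_images/"  # only proxy Twitter avatars

def avatar_proxy(request):
    url = request.GET.get("url", "")
    if not url.startswith(ALLOWED_PREFIX):
        return HttpResponseBadRequest("unsupported image URL")
    cached = cache.get(url)
    if cached is None:
        upstream = requests.get(url, timeout=5)
        upstream.raise_for_status()
        cached = (upstream.headers.get("Content-Type", "image/png"), upstream.content)
        cache.set(url, cached, 60 * 60)  # keep avatars for an hour
    content_type, body = cached
    response = HttpResponse(body, content_type=content_type)
    response["Cache-Control"] = "public, max-age=3600"
    return response
```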
Things seem to have been updated since then.
Please check: https://dev.twitter.com/docs/user-profile-images-and-banners
The SSL-enabled path template for a profile image is indicated in the profile_image_url_https field. The table on that page demonstrates how to apply the same variant selection techniques to SSL-based images.
Why would you want to copy the image to your own webspace? This will increase your bandwidth cost and you get cache consistency issues.
Use the URL that the API gives you.
I can see that you may want to cache the URL that the API returns for some time in order to reduce the amount of API calls.
If you are writing something like an iPhone app, it makes sense to cache the image locally (on the phone), in order to avoid web traffic altogether, but replacing one URL with another URL should not make a difference (assuming that the Twitter image server works reliably).
Why do you want HTTPS?