Hotlinking Twitter avatar images? - web-services

The Twitter API returns this value for the Twitter account 'image_url':
http://a1.twimg.com/profile_images/75075164/twitter_bird_profile_bigger.png
In my Twitter client webapp, I am considering hotlinking the HTTPS version of avatars, which is hosted on Amazon S3: https://s3.amazonaws.com/twitter_production/profile_images/75075164/twitter_bird_profile_bigger.png
Are there any best practices that would discourage me from doing this? Do third-party Twitter client applications typically host their own copies of avatars?
EDIT: To clarify, I need to use HTTPS for images because my webapp will use an HTTPS connection and I don't want my users to get security warnings from their browser about the page containing content that is not served securely. For example, Firefox is known to complain about mixed http/https content.
My problem is to figure out whether or not hotlinking the HTTPS URLs is forbidden by Twitter, since these URLs are not exposed "publicly" by their API. I got them by analyzing the HTML source of their web client while connected to my Twitter account over HTTPS.

Are you thinking of storing the image URL in your application, or retrieving it for the user as it is required?
If it's the latter, then I don't see an issue with hotlinking the images. If you are storing the image URL in your own system, then I can see you ending up with broken links whenever the images change (I'm sure Twitter will change the URLs at some point in the future).
Edit
OK, now I see your dilemma. I've looked through the API docs and there doesn't seem to be much in terms of getting images served over HTTPS or getting the URL of the Amazon S3 image. You could possibly write a handler on your own server that would essentially cache and re-serve the HTTP image over HTTPS; however, that's a bit of unnecessary load on your servers. Short of that I haven't come across a better solution. GL
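For example, something along these lines could do the caching and re-serving (a rough Flask sketch, not anything Twitter-specific; the route name and the naive in-memory cache are just illustrative assumptions):

    # Rough sketch: fetch the HTTP avatar once, cache it, and re-serve it from our
    # own HTTPS origin. Flask + requests assumed.
    import requests
    from flask import Flask, Response, abort

    app = Flask(__name__)
    _cache = {}  # image path -> (content type, bytes)

    @app.route("/avatar/<path:image_path>")
    def avatar(image_path):
        if image_path not in _cache:
            upstream = requests.get("http://a1.twimg.com/" + image_path, timeout=5)
            if upstream.status_code != 200:
                abort(upstream.status_code)
            _cache[image_path] = (upstream.headers.get("Content-Type", "image/png"),
                                  upstream.content)
        content_type, body = _cache[image_path]
        return Response(body, mimetype=content_type,
                        headers={"Cache-Control": "public, max-age=3600"})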

Things seem to have been updated since then.
Please check: https://dev.twitter.com/docs/user-profile-images-and-banners
The SSL-enabled path template for a profile image is indicated in the profile_image_url_https field. The table in those docs demonstrates how to apply the same variant-selection techniques to SSL-based images.
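In other words, once you have a user object from the API, the HTTPS avatar URL is right there, and the usual suffix swap selects the size variant (a tiny sketch; the user dict below is a made-up example, not real API output):

    # user is a dict parsed from a Twitter API user object (made-up values here).
    user = {
        "screen_name": "twitter",
        "profile_image_url_https":
            "https://example.invalid/profile_images/75075164/twitter_bird_profile_normal.png",
    }

    https_avatar = user["profile_image_url_https"]               # "_normal" variant
    bigger_avatar = https_avatar.replace("_normal", "_bigger")   # same variant rules apply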

Why would you want to copy the image to your own webspace? This will increase your bandwidth costs and create cache-consistency issues.
Use the URL that the API gives you.
I can see that you may want to cache the URL that the API returns for some time in order to reduce the amount of API calls.
If you are writing something like an iPhone app, it makes sense to cache the image locally (on the phone), in order to avoid web traffic altogether, but replacing one URL with another URL should not make a difference (assuming that the Twitter image server works reliably).
Why do you want HTTPS?

Related

Parallel website running to my original website

We have been working on a gaming website. Recently, while making note of the major traffic sources, I noticed a website that is a carbon copy of ours. It uses our logo; everything is the same as ours, just with a different domain name. It cannot simply be that their domain name points at ours, because in several places the links look like ccwebsite/our-links. That website even links to some images as ccwebsite/our-images.
What has happened? How could they have done that? What can I do to stop this?
There are a number of things they might have done to copy your site, including but not limited to:
Using a tool to scrape a complete copy of your site and placing it on their server
Pointing their DNS name at your site
Manually re-creating your site as their own
Responding to requests to their site by scraping yours in real time and returning that as the response
etc.
What can I do to stop this?
Not a whole lot. You can try to prevent direct linking to your content by requiring referrer headers for your images and other resources so that requests need to come from pages you serve, but 1) those can be faked and 2) not all browsers will send those so you'd break a small percentage of legitimate users. This also won't stop anybody from copying content, just from "deep linking" to it.
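For what it's worth, the referrer check itself is only a few lines (a Flask sketch; the allowed hosts are placeholders, and as said above the header can be faked or missing):

    from urllib.parse import urlparse
    from flask import Flask, request, send_from_directory, abort

    app = Flask(__name__)
    ALLOWED_HOSTS = {"example.com", "www.example.com"}  # your own domains

    @app.route("/images/<path:filename>")
    def protected_image(filename):
        referrer_host = urlparse(request.headers.get("Referer", "")).hostname or ""
        if referrer_host not in ALLOWED_HOSTS:
            abort(403)  # not linked from one of our pages (or the browser sent no Referer)
        return send_from_directory("images", filename)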
Ultimately, by having a website you are exposing that information to the internet. On a technical level anybody can get that information. If some information should be private you can secure that information behind a login or other authorization measures. But if the information is publicly available then anybody can copy it.
"Stopping this" is more of a legal/jurisdictional/interpersonal concern than a technical one I'm afraid. And Stack Overflow isn't in a position to offer that sort of advice.
You could run your site with some lightweight authentication. Just issue a cookie passively when they pull a page, and require the cookie to get access to resources. If a user visits your site and then the parallel site, they'll still be able to get in, but if a user only knows about the parallel site and has never visited the real site, they will just see a crap ton of broken links and images. This could be enough to discourage your doppelganger from keeping his site up.
Another (similar but more complex) option is to implement a CSRF mitigation. Even though this isn't a CSRF situation, the same mitigation will work. Essentially you'd issue a cookie as described above, but in addition insert the cookie value in the URLs for everything and require them to match. This requires a bit more work (you'll need a filter or module inserted into the pipeline) but will keep out everybody except your own users.
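A rough sketch of both ideas together (Flask assumed; the cookie name and token scheme are made up for illustration):

    import secrets
    from flask import Flask, request, abort, make_response, send_from_directory

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Passively issue a cookie and embed its value in every resource URL.
        token = request.cookies.get("site_token") or secrets.token_urlsafe(16)
        html = '<img src="/assets/logo.png?t={}">'.format(token)
        resp = make_response(html)
        resp.set_cookie("site_token", token, httponly=True)
        return resp

    @app.route("/assets/<path:filename>")
    def assets(filename):
        cookie_token = request.cookies.get("site_token")
        url_token = request.args.get("t")
        # Require the cookie, and (the stricter variant) require the URL token to match.
        if not cookie_token or url_token != cookie_token:
            abort(403)  # visitors who only know the copied site see broken resources
        return send_from_directory("assets", filename)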

Restrict S3 permissions to just website

I have people uploading video content and I'd like to restrict the video content to ONLY be streamed from my site. Since the video URLs in the video tag are easily accessible through the HTML source, I want to stop people from copying the direct S3 URL and putting it in another tab.
I was looking over the docs here: http://docs.aws.amazon.com/IAM/latest/UserGuide/AccessPolicyLanguage_ElementDescriptions.html#Condition
But it wasn't immediately obvious to me.
Thanks for your help!
You need to make this bucket private and use signed URLs to give access only to the users on your website. Signed URLs have a short life (with the required policy baked into them) when you generate them. This will prevent misuse even if somebody steals the URLs (or sends you faked referrer headers, etc.).
You can create these URLs manually (difficult to manage) or programmatically (some coding work required). In the second case, once a user of your website contacts your server, generate the auto-expiring URL and serve it. Then use this URL on your website.
Overview of Signed URLs - Amazon CloudFront.
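If you go the programmatic route, generating the signed URL is only a few lines with boto3 (a sketch; the bucket and key names are placeholders, and credentials come from the usual boto3 configuration):

    import boto3

    s3 = boto3.client("s3")

    def signed_video_url(key, expires_in=300):
        # URL is valid for `expires_in` seconds; hand it only to logged-in users.
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "my-private-video-bucket", "Key": key},
            ExpiresIn=expires_in,
        )

    # e.g. embed the result in the <video> tag served to an authenticated user
    print(signed_video_url("uploads/lesson-1.mp4"))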

Image Storage in S3 and serving securely

I am building a photo site where users can upload photos and view them later. These photos are private, not public. I am storing the photos and the thumbnails in S3. Currently the implementation I am following is that when a user comes to the page, I serve signed URLs of the thumbnails, which are then loaded from S3 (though I am also thinking about using signed URLs from CloudFront).
The issues now are:
On each request a different URL is served for each thumbnail, so the browser cache can't be used. This makes the browser load each image again when the user refreshes the site, and it makes the page slow.
It also creates another problem: if someone snoops into the page source, they can find the signed URL of a photo and distribute it to others for viewing (though the signed URL is only valid for 10 minutes). What I would prefer is that the URL be served by my application so that I can decide whether the user should be allowed or not.
Please help me with what approach I should take; I want the page loading time to be fast and also want to address the security concern. I would also like to know whether serving from CloudFront will be faster than the browser cache (I have read that somewhere), even with a different signed URL every time.
Feel free to be descriptive in your answer.
I don't think there is a perfect answer to what you want. Some random ideas/tradeoffs:
1) switch to HTTPS. That way you can ignore people sniffing URLs. But HTTPS items cannot be cached in the browser for very long.
2) If you are giving out signed URLs, don't set expires = "time + 10m", but "time + 20m, rounded to the nearest 10m". This way the URLs will be constant for at least 10 minutes, and the browser can cache them. (Be sure to also set the Expires headers on the files in S3 so the browser knows they can be cached.)
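A sketch of that rounding trick, using the legacy S3 query-string (SigV2) signing scheme, where the signature covers the expiry time rather than the signing time, so the URL stays identical for the whole window (keys and bucket are placeholders; newer SigV4 URLs also embed the signing time, which would need the same pinning):

    import base64, hashlib, hmac, time
    from urllib.parse import quote

    ACCESS_KEY = "AKIA-PLACEHOLDER"
    SECRET_KEY = "placeholder-secret"
    BUCKET = "my-photo-bucket"
    WINDOW = 10 * 60  # 10 minutes

    def cacheable_signed_url(key):
        now = int(time.time())
        window_start = now - (now % WINDOW)
        expires = window_start + 2 * WINDOW  # "time + 20m, rounded to 10m"
        string_to_sign = "GET\n\n\n{}\n/{}/{}".format(expires, BUCKET, key)
        sig = base64.b64encode(
            hmac.new(SECRET_KEY.encode(), string_to_sign.encode(), hashlib.sha1).digest()
        ).decode()
        return ("https://{}.s3.amazonaws.com/{}?AWSAccessKeyId={}&Expires={}&Signature={}"
                .format(BUCKET, quote(key), ACCESS_KEY, expires, quote(sig, safe="")))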
3) You could proxy all the URLs. Have the browser request the photo from your server, then write a web proxy to forward the request to the photo in S3. Along the way, you can check the user's auth, generate a signed URL for S3, and even cache the photo locally. This seems "less efficient" for you, but it lets the browser cache your URLs for as long as it wants. It's also convenient for your users, since they can bookmark a photo URL and it will always work. Even if they move to a different computer, they hit your server, which can ask them to sign in before showing the photo.
Make sure to use an "evented" server like Python Twisted or Node.js. That way, you can be proxying thousands of photos at the same time without using a lot of memory/CPU on your server. (You will use a lot of bandwidth, since all data goes through your server. But you can "scale out" easily by running multiple servers.)
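A minimal sketch of that proxy with an evented (asyncio) server, aiohttp here playing the role of Twisted/Node.js (the auth check and URL signing are reduced to placeholders):

    import aiohttp
    from aiohttp import web

    S3_BASE = "https://my-photo-bucket.s3.amazonaws.com"  # placeholder bucket

    async def photo(request):
        if "session" not in request.cookies:      # stand-in for a real auth check
            raise web.HTTPFound("/login")
        key = request.match_info["key"]
        signed = S3_BASE + "/" + key              # generate a real signed URL here
        async with aiohttp.ClientSession() as session:
            async with session.get(signed) as upstream:
                body = await upstream.read()
                return web.Response(body=body,
                                    content_type=upstream.content_type,
                                    headers={"Cache-Control": "private, max-age=86400"})

    app = web.Application()
    app.add_routes([web.get("/photos/{key:.*}", photo)])
    # web.run_app(app)  # uncomment to serve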
4) CloudFront is a cache. It will be slower (by a few hundred ms) the first time a resource is requested from a CF server. But don't expect the second request to be cached! Each CF location has ~20 different servers, and you'll hit a random one each time. So requesting a photo 10 times will likely generate 10 cache misses, and you still only have a 50% chance of getting a cache hit on the next request. CF is only useful for popular content that is going to be requested hundreds of times. CF is somewhat useful for foreign users, because the private CF-to-S3 connection can be better than the normal internet.
I'm not sure exactly how you would have CF do your security checking for you. But if you pass through the S3 auth (not the default), then you could use the "mod 10 minutes" trick to make URLs that can be cached for 10 minutes.
It is impossible for CF to be "faster than a browser cache". But if you are NOT using your browser cache, CF can be faster than S3, but mostly in foreign locations.
Take a look at what other people do (SmugMug uses S3, I think).

iPhone App: Making a webpage accessible only to people using a specific app

I was just wondering if it is possible, and if so what the best way would be, to create a web page that is only accessible from a custom iPhone application? For example, if you tried to access the web page from the iPhone's built-in browser, or any other browser, it would display an error page, but when accessed from a custom-built application it would be fully functional.
One idea that has come up is to change the User-Agent string in the embedded browser inside the application to something custom. I'm not sure if this is viable though.
I hope this makes sense.
Thanks in advance.
-Ben
Any and all request headers can and will be spoofed. Authentication is the only plausible solution.
Changing the User-Agent string is a good method. I haven't tried it personally, but you should be able to use an NSMutableURLRequest and set the User-Agent header before the request is made.
You could also use other custom data in the HTTP request to allow/block visits. You could add a query string to the URL or include some unique POST data.
Note this isn't a real security measure as anyone could fake any part of the HTTP request to gain access. Someone could easily read the HTTP traffic generated from your app and use that to figure out how to access the site with any browser.
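If you do go that route, the server-side check is trivial (a Flask sketch; the header name and values are made up for illustration, and as noted this is obscurity rather than security):

    from flask import Flask, request, abort

    app = Flask(__name__)
    EXPECTED_UA_TOKEN = "MyCompanyApp/1.0"   # custom User-Agent set by the app
    APP_KEY_HEADER = "X-My-App-Key"          # or a custom header with a shared value

    @app.before_request
    def require_app_client():
        ua = request.headers.get("User-Agent", "")
        app_key = request.headers.get(APP_KEY_HEADER)
        if EXPECTED_UA_TOKEN not in ua and app_key != "shared-secret-value":
            abort(403)  # ordinary browsers get the error page

    @app.route("/")
    def index():
        return "Only the custom app should normally reach this."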

Cross Domain User Tracking

We have several websites on different domains and I'd like to be able to track users' movements on these sites.
Obviously cookies are not feasible, because they don't cross domain borders.
I could look at a combination of IP address and User Agent, but there are some cases where that does not work.
I don't want to use flash or other plugins.
Any ideas? Or am I doomed to rely on the IP/User_Agent combination?
You can designate one domain or subdomain for tracking and have it serve a 1x1-pixel image which you include in all pages you would like to track. Serve a cookie with the image, look at the tracking domain's server logs, voilà.
This solution requires no JavaScript; note, though, that the tracking cookie is a third-party cookie from the browser's point of view, so it will not be set if the user blocks third-party cookies.
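A minimal sketch of the pixel endpoint (Flask assumed; the cookie name, logging, and tracking hostname are illustrative), embedded on every tracked page as <img src="https://track.example.com/pixel.gif">:

    import base64, uuid
    from flask import Flask, request, make_response

    app = Flask(__name__)
    # 1x1 transparent GIF, base64-encoded.
    PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

    @app.route("/pixel.gif")
    def pixel():
        visitor_id = request.cookies.get("visitor_id") or str(uuid.uuid4())
        app.logger.info("visit %s from page %s",
                        visitor_id, request.headers.get("Referer"))
        resp = make_response(PIXEL)
        resp.headers["Content-Type"] = "image/gif"
        resp.headers["Cache-Control"] = "no-store"  # every page view should hit us
        resp.set_cookie("visitor_id", visitor_id, max_age=365 * 24 * 3600)
        return resp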
First, let's make sure the user agent is sending cookies:
If getCookie("c") == null then setCookie("c", "anyValue")
Then let the request finish (i.e., wait for the next request).
Let's call our tracker cookie uaid.
If GET http://child.com/any-page and getCookie("c") is not null and getCookie("uaid") is null...
Redirect to http://parent.com/give-me-a-uaid?returnTo=http://child.com/any-page
On http://parent.com/give-me-a-uaid, check for cookie uaid
If it doesn't exist, create it and add it to the response; if it does, get its value.
Redirect to http://child.com/any-page?uaid=valueOfParentsUAIDCookie
Child.com sets cookie uaid with valueOfParentsUAIDCookie
Redirect to http://child.com/any-page
And of course, you are validating input, and white-listing your redirect URLs :)
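The parent.com endpoint from the flow above might look roughly like this (Flask assumed; cookie names follow the description, and returnTo validation is reduced to a host whitelist):

    import uuid
    from urllib.parse import urlparse, urlencode
    from flask import Flask, request, redirect

    app = Flask(__name__)
    ALLOWED_RETURN_HOSTS = {"child.com", "www.child.com"}  # white-listed redirect targets

    @app.route("/give-me-a-uaid")
    def give_me_a_uaid():
        return_to = request.args.get("returnTo", "")
        if urlparse(return_to).hostname not in ALLOWED_RETURN_HOSTS:
            return "invalid returnTo", 400
        uaid = request.cookies.get("uaid") or str(uuid.uuid4())  # create if missing
        sep = "&" if "?" in return_to else "?"
        resp = redirect(return_to + sep + urlencode({"uaid": uaid}))
        resp.set_cookie("uaid", uaid, max_age=365 * 24 * 3600)
        return resp

child.com then reads the uaid query parameter, sets its own uaid cookie, and redirects back to the clean URL, as in the steps above.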
This question is closely related to the question "Accessing Domain Cookies within an iFrame on Internet Explorer".
For Internet Explorer I need to take P3P policies into account and set an additional P3P HTTP header to allow images to set cookies across domain borders. Then I can use simon's suggestion.
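Setting that header is a one-liner per response; for example with Flask (the compact-policy value below is just a commonly used example, so check it against your actual privacy policy):

    from flask import Flask

    app = Flask(__name__)

    @app.after_request
    def add_p3p_header(response):
        # Compact P3P policy so old IE accepts the third-party cookie from the pixel.
        response.headers["P3P"] = ('CP="IDC DSP COR ADM DEVi TAIi PSA PSD '
                                   'IVAi IVDi CONi HIS OUR IND CNT"')
        return response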
You can follow the same concept used by Google Analytics: inject JavaScript into the pages you want to track.
You do not give any context for your situation, just the basic problem, so it is difficult to give an answer that clearly fits. However, here are some techniques/mechanisms for passing information from one page to another, regardless of what domain is involved:
include a hyperlink to a 1x1-pixel transparent GIF image (sometimes called a "beacon")
rely on referrer information in HTTP request headers to identify the page a hyperlink is on
include extra parameters in hyperlinks to the other site, assuming you run both sites
buy the services of a company like Akamai to do user tracking for you
possibly use a cross-domain cookie mechanism in the future, if the standard is ever approved
Which techniques apply really comes down to whether or not you can place software on all of the sites (servers) of interest that the user will visit.