We have been working on a gaming website. Recently while making note of the major traffic sources I noticed a website that I found to be a carbon-copy of our website. It uses our logo,everything same as ours but a different domain name. It cannot be, that domain name is pointing to our domain name. This is because at several places links are like ccwebsite/our-links. That website even has links to some images as ccwebsite/our-images.
What has happened ? How could have they done that ? What can I do to stop this ?
There are a number of things they might have done to copy your site, including but not limited to:
Using a tool to scrape a complete copy of your site and place it on their server
Use their DNS name to point to your site
Manually re-create your site as their own
Respond to requests to their site by scraping yours real-time and returning that as the response
etc.
What can I do to stop this?
Not a whole lot. You can try to prevent direct linking to your content by requiring referrer headers for your images and other resources so that requests need to come from pages you serve, but 1) those can be faked and 2) not all browsers will send those so you'd break a small percentage of legitimate users. This also won't stop anybody from copying content, just from "deep linking" to it.
Ultimately, by having a website you are exposing that information to the internet. On a technical level anybody can get that information. If some information should be private you can secure that information behind a login or other authorization measures. But if the information is publicly available then anybody can copy it.
"Stopping this" is more of a legal/jurisdictional/interpersonal concern than a technical one I'm afraid. And Stack Overflow isn't in a position to offer that sort of advice.
You could run your site with some lightweight authentication. Just issue a cookie passively when they pull a page, and require the cookie to get access to resources. If a user visits your site and then the parallel site, they'll still be able to get in, but if a user only knows about the parallel site and has never visited the real site, they will just see a crap ton of broken links and images. This could be enough to discourage your doppelganger from keeping his site up.
Another (similar but more complex) option is to implement a CSRF mitigation. Even though this isn't a CSRF situation, the same mitigation will work. Essentially you'd issue a cookie as described above, but in addition insert the cookie value in the URLs for everything and require them to match. This requires a bit more work (you'll need a filter or module inserted into the pipeline) but will keep out everybody except your own users.
Related
I am very confused as to how Safari ITP 2.3 works in certain respects, and why sites can’t easily circumvent it. I don’t understand under what circumstances limits are applied, what the exact limits are, to what they are applied, and for how long.
To clarify my question I broke it down into several cases. I will be referring to Apple’s official blog post about ITP 2.3 [1] which you can quote from, but feel free to link to any other authoritative or factually correct sources in your answer.
For third-party sites loaded in iframes:
Why can’t they just use localStorage to store the values of cookies, and send this data back and forth not as actual browser cookie headers 🍪, but as data in the body of the request or a header like Set-AuxCookie? Similarly, they can parse the response to updaye localStorage. What limits does ITP actually place on localStorage in third party iframes?
If the localStorage is frequently purged (see question 1), why can’t they simply use postMessage to tell a script on the enclosing website to store some information (perhaps encrypted) and then spit it back whenever it loads an iframe?
For sites that use link decoration
I still don’t understand what the limits on localStorage are in third party sites in iframes, which did NOT get classified as link decorator sites. But let’s say they are link decorator sites. According to [1] Apple only start limiting stuff further if there is a querystring or fragment. But can’t a website rather trivially store this information in the URL path before the querystring, ie /in/here without ?in=here … certainly large companies like Google can trivially choose to do that?
In the case a site has been labeled as a tracking site, does that mean all its non-cookie data is limited to 7 days? What about cookies set by the server, aren’t they exempted? So then simply make a request to your server to set the cookie instead of using Javascript. After all, the operator of the site is very likely to also have access to its HTTP server and app code.
For all sites
Why can’t a service like Google Analytics or Facebook’s widgets simply convince a site to additional add a CNAME to their DNS and get Google’s and Facebook’s servers under a subdomain like gmail.mysite.com or analytics.mysite.com ? And then boom, they can read and set cookies again, in some cases even on the top-level domain for website owners who don’t know better. Doesn’t this completely defeat the goals of Apple’s ITP, since Google and Facebook have now become a “second party” in some sense?
Here on StackOverflow, when we log out on iOS Safari the StackOverflow network is able to log out of multiple sites at once … how is that even accomplished if no one can track users across websites? I have heard it said that “second party cookies” still can be stored but what exactly makes a second party cookie different from a third party?
My question is broken down into 6 cases but the overall theme is, in each case: how does Apple’s latest ITP work in that case, and how does it actually block all cases of potentially malicious tracking (to the point where a well-funded company can’t just do the workarounds above) while at the same time allowing legitimate use cases?
[1] https://webkit.org/blog/9521/intelligent-tracking-prevention-2-3/
I am not sure if the below answers are correct, please comment if they are not:
It seems applications can use localStorage with no problem, up to 7 days. But it won’t be persisted across multiple enclosing domains. I would even recommend using sessionStorage, since the goal is just to have nothing more than a seamless session. You can then roll your own cookie mechanism using a different set of headers, the only thing you can’t implement is http-only cookies.
They can, but ITP won’t let the JavaScript on the enclosing page store cookies (at least, not if your third party domain was flagged as a tracker by Safari).
Yeah, the description of “link decoration” technically doesn’t mention this workaround, but probably Apple has or will update its classifier to handle this workaround.
Yes, if a first-party webpage will send a request to the server and it sets a cookie in the response headers, then these aren’t blocked by ITP, even if it has an iframe to a tracking site. They say that’s not their goal.
Yes, in fact your first-party site can just let your site redirect to google.com and back quickly (like with oAuth) and thereby inform Google of whatever you wanted, without cookies. Google’s JavaScript can do this as well, if you allow it. Then the JavaScript can just load your google-hosted subdomain in an iframe and set a cookie that persists for years, tracking the user. However, ITP 2.3 seems to have also added mitigation to this, so you might use A records instead? https://cookiesaver.io/archives/analytics-guides/cname-cloaking-mitigation-eliminates-safari-itp-workarounds/
Probably the StackExchange network uses a version of #5
I use Safari and Firefox. Since Safari doesn't offer the cookie handling I prefer, I frequently invoke a separate tool to clear out most of the cookies. Neither Kayak nor Google are on my exceptions list. A couple of times recently, I cleaned out cache, local storage, flash, etc. to get rid of the so-called zombie cookies (while no webkit clients were running). I was in USA when I did this.
In spite of this, every time I access a Google site, it redirects to google.es and every time I access Kayak.com, it redirects to kayak.uk
I understand about browser fingerprint, but I have occasionally changed a few things that affect that. And even if that weren't enough, there is nothin to hint that they have identified me specifically. I would expect that in the absence of any completely unique identifier, they would assume the country of the IP address(es), which for over four weeks have been Comcast and ATT. (And various public WiFi sites.)
It's not log in—I don't have a login for Kayak and I never log in to Google unless absolutely necessary.
With kayak, I changed it (menu lower right corner of search page) to USA/dollars, but as soon as I go to another site without deleting cookies, when I come back, it's on UK/pounds again.
What might be the cause of this? There are a few other sites behaving similarly.
I think I understand it now. Someone correct me if I missed something.
It was not cookies or tracking or HTML redirect.
It was the browser "remembering" that I had visited kayak.co.uk and google.es while over there and "fixing" the URL for me.
I would like Akamai not to cache certain URLs if a specified cookie exist (i.e) If user logged in on specific pages. Is there anyway we can do with Akamai?
The good news, is that I have done exactly this in the past for the Top Gear site (www.topgear.com/uk). The logic goes that if a cookie is present (in this case "TGCACHEKEY") then the Akamai cache is to be bypassed for certain url paths. This basically turns off Akamai caching of html pages when logged in.
The bad news is that you require an Akamai consultant to make this change for you.
If this isn't an option for you, then Peter's suggestions are all good ones. I considered all of these before implementing the cookie based approach for Top Gear, but in the end none were feasible.
Remember also that Akamai strips cookies for cached resources by default. That may or may not effect you in your situation.
The Edge Server doesn't check for a cookie before it does the request to your origin server and I have never seen anything like that in any of their menus, conf screens or documentation.
However, there are a few ways I can think of that you can get the effect that I think you're looking for.
You can specify in the configuration settings for the respective digital property what path(s) or URL(s) you don't want it to cache. If you're talking about a logged on user, you might have a path that only they would get to or you could set up such a thing server side. E.g. for an online course you would have www.course.com/php.html that anybody could get to whereas you might use www.course.com/student/php-lesson-1.html for the actual logged on lessons content. Specifying that /student/* would not be cached would solve that.
If you are serving the same pages to both logged on and not-logged on users and can't do it that way, you could check server-side if they're logged on and if so add a cache-breaker to the links so when they follow a link a cache-breaker is automatically added. You could also do this client side if you want, but it would be more secure and faster to do it server-side. As a note on this, this could be userid-random#. That would keep it unique enough when combined with the page that nobody else would request it and get the earlier 'cache-broken' page.
If neither of the above are workable, there is one other way I can think of, which is a bit unconventional to say the least, but it would work. Create symbolically linked directory in your document root with another name so that you can apply the first option and exempt it from cacheing. Then you check if the guy is logged on and if so prepend the extra directory to the links. From akamai's point of view www.mysite.com/logged-on/page.html can be exempt from cache where www.mysite.com/content/page.html is cached. On your server if /logged-on/ symbolically links over to /content/ then you're all set.
When they login you could send them to a subdomain which is set up as a ServerAlias, so on your side it's the same, but on Akamai has differnt cache handling rules.
Following the same answer than #llevera, you can use the cookies on CloudFlare without intervention of engineers to make the change for you.
Having that sort of cookies to bypass content is a technique that its becoming more popular with the time, and even bug companies like Magento are using it for Magento 2 platform.
But solutions from above still valid, Maybe Akamai supports that that already now, we are in 2017!
Is it dangerous to have your admin interface in a Django app accessible by using just a plain old admin url? For security should it be hidden under an obfuscated url that is like a 64 bit unique uuid?
Also, if you create such an obfuscated link to your admin interface, how can you avoid having anyone find out where it is? Does the google-bot know how to find that url if there is no link to that url anywhere on your site or the internet?
You might want to watch out for dictionary attacks. The safest thing to do is IP restrict access to that URL using your web server configuration. You could also rate limit access to that URL - I posted an article about this last week.
If a URL is nowhere on the internet "the googlebot" can't know about it ... unless somebody tells it about it. Unfortunately many users have toolbars installed in their browser, which submit all URLs visited by the browser to various Servers (e.g. Alexa, Google).
So keeping an URL secret will not work in the long run.
Also an uuid is hard to remember and to type - leading to additional support ("What was the URL again?").
But I still strongly suggest to change the URL (e.g. to /myadmin/). This will foil automatic scanning and attack tools. So If one day an "great Django worm" hits the Internet, you have a much lower chance of being hit.
People using PHPmyAdmin had this experience for the last few years: changing the default URL avoids most attacks.
Whilst there is no harm in adding an extra layer of protection (an obfuscated url) enforcing good password choice (checking password strength and checking it's not in a large list of common passwords) would be a much better use of your time.
Assuming you've picked a good password, no, it's not dangerous. People may see the page, but they won't be able to get in anyway.
If you don't want Google to index a directory, you can use a robots.txt file to control that.
We have several websites on different domains and I'd like to be able to track users' movements on these sites.
Obviously cookies are not feasable, because they don't cross domain borders.
I could look at a combination of IP address and User Agent, but there are some cases where that does not work.
I don't want to use flash or other plugins.
Any ideas? Or am I doomed to rely on the IP/User_Agent combination?
You can designate one domain or subdomain to tracking and have it serve a 1x1 pixel image which you include in all pages you would like to track. Serve a cookie with the image, look at the tracking domain's server logs, voilà.
This solution requires no JavaScript, and works even if the user disables third-party cookies.
First, let's make sure the user agent is sending cookies:
If getCookie("c") == null then setCookie("c", "anyValue")
Then let the request finish (aka wait for next request)
Let's call our tracker cookie uaid.
If GET http://child.com/any-page and getCookie("c") is not null and getCookie("uaid") is null...
Redirect to http://parent.com/give-me-a-uaid?returnTo=http://child.com/any-page
On http://parent.com/give-me-a-uaid, check for cookie uaid
If not exists, create it and add it to response. If it exists, get its value.
Redirect to http://child.com/any-page?uaid=valueOfParentsUAIDCookie
Child.com sets cookie uaid with valueOfParentsUAIDCookie
Redirect to http://child.com/any-page
And of course, you are validating input, and white-listing your redirect URLs :)
Flows:
This question is closely related to the Question Accessing Domain Cookies within an iFrame on Internet Explorer.
For Internet Explorer I need to take P3P Policies into account and set an additional P3P HTTP-Header to allow images to set cookies across domain borders. Then I can use simon's suggestion.
You can follow the same concept used in Google Analytics. Injecting javascript in the pages you want to track.
You do not give any context to your situation -just the basic problem. So it is difficult to give an answer that clearly fits. However, here are some techniques/mechanisms for passing information from one page to another, regardless of what domain is involved.
include hyperlink to a 1x1 pixel transparent gif image (sometimes called a "beacon")
rely on referrer information in HTTP request headers to identify page hyperlink is on
include extra parameters in hyperlinks to other site - assuming you run both sites
buy services of a company like Akamai to do user tracking for you
possibly use cross domain cookie mechanism in the future if standard is ever approved
Which techniques really come down to whether you can place software on all of the sites (servers) that the user will visit where you have interest - or you cannot place your software on all of them.