Trying to understand an apparent web tracking malfunction

Trying to understand an apparent web tracking malfunction - cookies

I use Safari and Firefox. Since Safari doesn't offer the cookie handling I prefer, I frequently invoke a separate tool to clear out most of the cookies. Neither Kayak nor Google are on my exceptions list. A couple of times recently, I cleaned out cache, local storage, flash, etc. to get rid of the so-called zombie cookies (while no webkit clients were running). I was in USA when I did this.
In spite of this, every time I access a Google site, it redirects to google.es and every time I access Kayak.com, it redirects to kayak.uk
I understand about browser fingerprint, but I have occasionally changed a few things that affect that. And even if that weren't enough, there is nothin to hint that they have identified me specifically. I would expect that in the absence of any completely unique identifier, they would assume the country of the IP address(es), which for over four weeks have been Comcast and ATT. (And various public WiFi sites.)
It's not log in—I don't have a login for Kayak and I never log in to Google unless absolutely necessary.
With kayak, I changed it (menu lower right corner of search page) to USA/dollars, but as soon as I go to another site without deleting cookies, when I come back, it's on UK/pounds again.
What might be the cause of this? There are a few other sites behaving similarly.

I think I understand it now. Someone correct me if I missed something.
It was not cookies or tracking or HTML redirect.
It was the browser "remembering" that I had visited kayak.co.uk and google.es while over there and "fixing" the URL for me.

Related

What exactly does Safari ITP do?

I am very confused as to how Safari ITP 2.3 works in certain respects, and why sites can’t easily circumvent it. I don’t understand under what circumstances limits are applied, what the exact limits are, to what they are applied, and for how long.
To clarify my question I broke it down into several cases. I will be referring to Apple’s official blog post about ITP 2.3 [1] which you can quote from, but feel free to link to any other authoritative or factually correct sources in your answer.
For third-party sites loaded in iframes:
Why can’t they just use localStorage to store the values of cookies, and send this data back and forth not as actual browser cookie headers 🍪, but as data in the body of the request or a header like Set-AuxCookie? Similarly, they can parse the response to updaye localStorage. What limits does ITP actually place on localStorage in third party iframes?
If the localStorage is frequently purged (see question 1), why can’t they simply use postMessage to tell a script on the enclosing website to store some information (perhaps encrypted) and then spit it back whenever it loads an iframe?
For sites that use link decoration
I still don’t understand what the limits on localStorage are in third party sites in iframes, which did NOT get classified as link decorator sites. But let’s say they are link decorator sites. According to [1] Apple only start limiting stuff further if there is a querystring or fragment. But can’t a website rather trivially store this information in the URL path before the querystring, ie /in/here without ?in=here … certainly large companies like Google can trivially choose to do that?
In the case a site has been labeled as a tracking site, does that mean all its non-cookie data is limited to 7 days? What about cookies set by the server, aren’t they exempted? So then simply make a request to your server to set the cookie instead of using Javascript. After all, the operator of the site is very likely to also have access to its HTTP server and app code.
For all sites
Why can’t a service like Google Analytics or Facebook’s widgets simply convince a site to additional add a CNAME to their DNS and get Google’s and Facebook’s servers under a subdomain like gmail.mysite.com or analytics.mysite.com ? And then boom, they can read and set cookies again, in some cases even on the top-level domain for website owners who don’t know better. Doesn’t this completely defeat the goals of Apple’s ITP, since Google and Facebook have now become a “second party” in some sense?
Here on StackOverflow, when we log out on iOS Safari the StackOverflow network is able to log out of multiple sites at once … how is that even accomplished if no one can track users across websites? I have heard it said that “second party cookies” still can be stored but what exactly makes a second party cookie different from a third party?
My question is broken down into 6 cases but the overall theme is, in each case: how does Apple’s latest ITP work in that case, and how does it actually block all cases of potentially malicious tracking (to the point where a well-funded company can’t just do the workarounds above) while at the same time allowing legitimate use cases?
[1] https://webkit.org/blog/9521/intelligent-tracking-prevention-2-3/

I am not sure if the below answers are correct, please comment if they are not:
It seems applications can use localStorage with no problem, up to 7 days. But it won’t be persisted across multiple enclosing domains. I would even recommend using sessionStorage, since the goal is just to have nothing more than a seamless session. You can then roll your own cookie mechanism using a different set of headers, the only thing you can’t implement is http-only cookies.
They can, but ITP won’t let the JavaScript on the enclosing page store cookies (at least, not if your third party domain was flagged as a tracker by Safari).
Yeah, the description of “link decoration” technically doesn’t mention this workaround, but probably Apple has or will update its classifier to handle this workaround.
Yes, if a first-party webpage will send a request to the server and it sets a cookie in the response headers, then these aren’t blocked by ITP, even if it has an iframe to a tracking site. They say that’s not their goal.
Yes, in fact your first-party site can just let your site redirect to google.com and back quickly (like with oAuth) and thereby inform Google of whatever you wanted, without cookies. Google’s JavaScript can do this as well, if you allow it. Then the JavaScript can just load your google-hosted subdomain in an iframe and set a cookie that persists for years, tracking the user. However, ITP 2.3 seems to have also added mitigation to this, so you might use A records instead? https://cookiesaver.io/archives/analytics-guides/cname-cloaking-mitigation-eliminates-safari-itp-workarounds/
Probably the StackExchange network uses a version of #5

Remove 400 error (cookies) caused by and update of the Consent Management Platform

The site of the company I work at is using a Consent Management Platform which was functioning ok. Lately, we had to make some modifications in it and had to reimplement it. The implementation went ok, even the engineers who offer support for the CMP I'm using confirmed that everything I did was fine.
And now the problem: some users are still having the old cookie on their devices. So now when they are entering the site they receive a 400 error and can not access the site anymore. The fix would be so that every user manually deletes the cookie on their device but this is impossible to do as our visitors are not very technical and we can't reach all of them.
So, is there anyway to somehow make any kind of change/implementation, from our side, from the server-side, in order to refresh the users session and make their 400 error disappear without them having to do it manually?
I'm really in a pinch right now and am in need of real advice.

Parallel website running to my original website

We have been working on a gaming website. Recently while making note of the major traffic sources I noticed a website that I found to be a carbon-copy of our website. It uses our logo,everything same as ours but a different domain name. It cannot be, that domain name is pointing to our domain name. This is because at several places links are like ccwebsite/our-links. That website even has links to some images as ccwebsite/our-images.
What has happened ? How could have they done that ? What can I do to stop this ?

There are a number of things they might have done to copy your site, including but not limited to:
Using a tool to scrape a complete copy of your site and place it on their server
Use their DNS name to point to your site
Manually re-create your site as their own
Respond to requests to their site by scraping yours real-time and returning that as the response
etc.
What can I do to stop this?
Not a whole lot. You can try to prevent direct linking to your content by requiring referrer headers for your images and other resources so that requests need to come from pages you serve, but 1) those can be faked and 2) not all browsers will send those so you'd break a small percentage of legitimate users. This also won't stop anybody from copying content, just from "deep linking" to it.
Ultimately, by having a website you are exposing that information to the internet. On a technical level anybody can get that information. If some information should be private you can secure that information behind a login or other authorization measures. But if the information is publicly available then anybody can copy it.
"Stopping this" is more of a legal/jurisdictional/interpersonal concern than a technical one I'm afraid. And Stack Overflow isn't in a position to offer that sort of advice.

You could run your site with some lightweight authentication. Just issue a cookie passively when they pull a page, and require the cookie to get access to resources. If a user visits your site and then the parallel site, they'll still be able to get in, but if a user only knows about the parallel site and has never visited the real site, they will just see a crap ton of broken links and images. This could be enough to discourage your doppelganger from keeping his site up.
Another (similar but more complex) option is to implement a CSRF mitigation. Even though this isn't a CSRF situation, the same mitigation will work. Essentially you'd issue a cookie as described above, but in addition insert the cookie value in the URLs for everything and require them to match. This requires a bit more work (you'll need a filter or module inserted into the pipeline) but will keep out everybody except your own users.

Do we still need to worry about users turning off cookies?

I've noticed that a lot of sites don't bother anymore with work-arounds so users who have turned their cookies off can still get the same experience on the site. Has that problem just gone away in modern web development? Have we gotten to a point where nobody does it, so we don't need to bother?

I think I put this in the same category as JavaScript. Most people will have cookies enabled, but there will be a few people who have them turned off. There isn't the scare like there was in the mid 90s about evil corporations tracking you all over the net etc. People have become more accepting about how the web works and what is required to have the convenience of web sites remembering who you are etc.

Some people still turn off cookies every once in a while. Usually because they wanted to test something and then forget to leave them off. Nowadays most web apps require cookies on so I think it's perfectly acceptable that instead of complex workarounds to provide the same user experience with or without cookies you can live with just a simple check and a message stating that without cookies the user won't be able to use the site.
There are lots of major websites that behave this way.

My 2c: cookies are good by default and Javascript is evil by default.
As to what general user sentiment is... I'd do cookie detection still so that you can display a meaningful error rather than simply not working if your users are blocking cookies for whatever reason. Don't bother trying to work around it though.

I'm going to guess that it'd be worthwhile running a test to see specifically if your visitors have cookies turned off, because different groups would have different issues. (if it's paranoia, security restrictions, etc.) A website catering to government employees might see higher percentages of cookie non-acceptance than other sites.
As some browsers (or plugins) allow customizing your acceptance of cookies by server or domain, it's possible that even two sites with identical user populations might have different levels of 'trust', if the users believe that one site seems shady.

Is there much of an anti-cookie movement anymore?

I'm not sure whether this belongs on StackOverflow or on ServerFault, so I've picked SO for as first go.
A number of years ago, there was a highly visible discussion about mis-use of HTTP cookies, leading to various cookie filtering proxys and eventually to active cookie filtering in browsers like Firefox and Opera. Even now, Google will admit that currently about 7% of end-users will reject their tracking cookies, which is quite a lot, actually.
I still vett all cookies that get set in my browser. I have for years. I personally do not know anyone else who does this, but it has given me a few interesting insights into web tracking. For instance, there are many many more sites using Google Analytics than there were even two years ago. And there are still sites (extremely few, fortunately) which malfunction hideously if you don't let them set cookies. But advertisers in particular are still setting cookies to track your way across the web.
So is there much of an anti-cookie movement anymore? Has anyone tried to take Google to task for setting so many with Analytics? Is anyone trying to vilify sites like Ebay and PayPal who use a dodgy cross-site cookie to let you login?
Or am I making too much of a stupidly small problem?

Nowadays, there are other ways to block these annoyances. Rick752's EasyList has the EasyPrivacy list, which blocks most of them with no work at all other than adding the subscription once to Adblock Plus. NoScript can (with a little configuration, mostly removing some misguided entries on the default whitelist) easily block the ones which depend on JavaScript.
That said, I set up my browser to empty all the cookies on logout. Then they can track you only for the duration of a session, which will be short unless you tend to keep your browser open for a long time (or use the session save/restore all the time).
If you use Flash, know that it also has a kind of cookies, and the interface to manage them is most probably poorer than your browser's.

There's always people who misunsderstand cookies - on both sides. Ultimatey, it's up to the browsers to properly identify the sites for cookies. As long as the site's being set properly and the browser's respecting that, it's just not much of a problem. I think thta, with the increased use of web toolkits that take care of the programmatic details (and better, slightly more security-conscious browsers), it's not much of an issue now for end-users.
Beyond that, the proliferation of DHTML and XML-based partial-page-loading mechanisms (as well as database-backends and similar), the need to track session between stateless pages is reduced now. Your web app can very easily keep state without the need for cookies, and that may well have partially been driven by the number of [generally misinformed] end-users who blocked cookies all together.
In shorter words: "IMHO, no".

I gave up both as user and developer.
As a user the convenience of staying logged into sites is just too tempting, the pain of some sites not working too annoying. And I'm not that sensitive about my privacy, so I stopped caring and let all cookies through.
As a developer I always try to be as RESTful as possible, but I don't know any decent way of handling authentication without cookies. HTTP Basic Auth is just too broken, I can't assume HTTPS all the time and mangling URLs is painful and inelegant. What's left is form-based authentication with cookies. So my applications have one auth cookie -- I don't need any more than that, but that by itself requires the user to have cookies on if they want to authenticate themselves. Maybe OpenID and other federated identity services might fix that one day, but at the moment I can't rely on any of these yet.

My biggest annoyance with cookies is that I want to block Analytics cookies but at the same time I need to login to analytics to manage some customer sites. As far as I can tell they are the same cookie (in fact it may be the same cookie across all google services).
I really don't trust the Google cookie. They were apparently one of the first large companies to set cookie expiration to 2038 (the maximum) and their business model is almost entirely advertising based (targeted advertising at that). I suspect they know more about the day-to-day online activities and interests of people than any other government or organisation on the planet.
That's not to say it's all evil or anything but that really is a lot of trust to be given one entity. They may claim it's all anonymised but I'm pretty sure that claim would be hard to verify. At any rate there is no guarantee that this data won't be stolen, legally acquired or otherwise misused at some future point for other purposes.
It isn't impossible that one day this kind of profiling could be used to target people for more serious things than ads. How hard would it be for some future Hitler to establish the IP addresses, bank accounts, schools, employers, club memberships etc of some arbitary class of person for incarceration or worse?
So my answer is that this is not a small problem and history has already taught us many times over what can happen when you start classifying and tracking people. Cookies are not the only means but they are certainly a part of the problem and I recommend blocking them and clearing at every convenient opportunity.

I am also one of the hold-outs who doesn't automatically accept cookies. I do appreciate sites that need fewer, and I am more likely to return to those sites and allow cookies from them in the future.
That said, I do think that being vigilant about cookies is not (rationally) worth the effort. (In other words, I expect I will keep doing what I'm doing because it makes me feel better, even though I don't have evidence of commensurate tangible benefit.)

Every now and again I clear all my cookies. It's a pain as I then have to login to sites again (or set preferences) but this is also a good test as to whether either me or my browser can remember the login details..

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js