My question is: I'm developing a website and I want to monitor analytics with Google Analytics. However, I've been reading articles about cookies and I'm not sure whether I need to program my website with some kind of cookie handling in order to use Google's tool, or whether I simply don't need to do anything on my website.
Thanks
To do tracking you simply need to insert the code snippet that you can get from the GA admin interface.
However, since you are in the EU, you need to point out to your visitors that they are being tracked on your web page and that the site uses cookies to do so (and I think you need to provide an opt-out, although that might be a German requirement). This is mandated by the European Privacy Directive, which is sometimes referred to as the "Cookie Law" (technically incorrect, since it is neither a law nor specifically about cookies), so maybe that is what gave you the idea that you need to do extra programming.
I am very confused as to how Safari ITP 2.3 works in certain respects, and why sites can’t easily circumvent it. I don’t understand under what circumstances limits are applied, what the exact limits are, to what they are applied, and for how long.
To clarify my question I broke it down into several cases. I will be referring to Apple’s official blog post about ITP 2.3 [1] which you can quote from, but feel free to link to any other authoritative or factually correct sources in your answer.
For third-party sites loaded in iframes:
Why can’t they just use localStorage to store the values of cookies, and send this data back and forth not as actual browser cookie headers 🍪, but as data in the body of the request or in a header like Set-AuxCookie? Similarly, they can parse the response to update localStorage. What limits does ITP actually place on localStorage in third-party iframes?
If the localStorage is frequently purged (see question 1), why can’t they simply use postMessage to tell a script on the enclosing website to store some information (perhaps encrypted) and then spit it back whenever it loads an iframe?
For sites that use link decoration:
I still don’t understand what the limits on localStorage are for third-party sites in iframes which did NOT get classified as link-decorator sites. But let’s say they are link-decorator sites. According to [1], Apple only starts limiting things further if there is a query string or fragment. But can’t a website rather trivially store this information in the URL path before the query string, i.e. /in/here without ?in=here … certainly large companies like Google can trivially choose to do that?
In the case where a site has been labeled as a tracking site, does that mean all its non-cookie data is limited to 7 days? What about cookies set by the server, aren’t they exempted? So then simply make a request to your server to set the cookie instead of using JavaScript. After all, the operator of the site is very likely to also have access to its HTTP server and app code.
For all sites:
Why can’t a service like Google Analytics or Facebook’s widgets simply convince a site to additionally add a CNAME to their DNS and get Google’s and Facebook’s servers under a subdomain like gmail.mysite.com or analytics.mysite.com? And then boom, they can read and set cookies again, in some cases even on the top-level domain for website owners who don’t know better. Doesn’t this completely defeat the goals of Apple’s ITP, since Google and Facebook have now become a “second party” in some sense?
Here on StackOverflow, when we log out on iOS Safari the StackOverflow network is able to log out of multiple sites at once … how is that even accomplished if no one can track users across websites? I have heard it said that “second party cookies” still can be stored but what exactly makes a second party cookie different from a third party?
My question is broken down into 6 cases but the overall theme is, in each case: how does Apple’s latest ITP work in that case, and how does it actually block all cases of potentially malicious tracking (to the point where a well-funded company can’t just do the workarounds above) while at the same time allowing legitimate use cases?
[1] https://webkit.org/blog/9521/intelligent-tracking-prevention-2-3/
I am not sure if the below answers are correct, please comment if they are not:
It seems applications can use localStorage with no problem, up to 7 days. But it won’t be persisted across multiple enclosing domains. I would even recommend using sessionStorage, since the goal is just to have nothing more than a seamless session. You can then roll your own cookie mechanism using a different set of headers; the only thing you can’t implement is HttpOnly cookies.
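A minimal server-side sketch of such a header-based mechanism, assuming a made-up header name X-Aux-Cookie (the client-side script would have to read the header from the response and echo it back on later requests itself):

// Look for our hypothetical replacement for the Cookie header; issue a fresh token if absent.
$token = isset($_SERVER['HTTP_X_AUX_COOKIE']) ? $_SERVER['HTTP_X_AUX_COOKIE'] : bin2hex(random_bytes(16));
// Send it back as a plain response header instead of Set-Cookie.
header('X-Aux-Cookie: ' . $token);
// ...look up or create server-side session state keyed by $token...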
They can, but ITP won’t let the JavaScript on the enclosing page store cookies (at least, not if your third-party domain was flagged as a tracker by Safari).
Yeah, the description of “link decoration” technically doesn’t mention this workaround, but Apple probably has updated, or will update, its classifier to handle it.
Yes, if a first-party webpage sends a request to its server and the server sets a cookie in the response headers, that isn’t blocked by ITP, even if the page has an iframe to a tracking site. They say that’s not their goal.
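A minimal sketch of what that looks like on the server (the cookie name is just an example): the Set-Cookie header comes from the server response rather than from document.cookie, so the 7-day cap on script-written cookies doesn’t apply.

// Server-set, first-party cookie: long-lived, HttpOnly, sent over HTTPS only.
setcookie('session_id', bin2hex(random_bytes(16)), time() + 60 * 60 * 24 * 365, '/', '', true, true);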
Yes, in fact your first-party site can just redirect to google.com and back quickly (as with OAuth) and thereby inform Google of whatever you wanted, without cookies. Google’s JavaScript can do this as well, if you allow it. Then the JavaScript can just load your Google-hosted subdomain in an iframe and set a cookie that persists for years, tracking the user. However, ITP 2.3 seems to have also added mitigations against this, so you might use A records instead? https://cookiesaver.io/archives/analytics-guides/cname-cloaking-mitigation-eliminates-safari-itp-workarounds/
Probably the StackExchange network uses a version of #5 (the first-party subdomain / redirect approach above).
We have been working on a gaming website. Recently, while making note of the major traffic sources, I noticed a website that turned out to be a carbon copy of our website. It uses our logo and everything is the same as ours, but it has a different domain name. It can’t simply be that their domain name points to ours, because in several places the links look like ccwebsite/our-links. That website even links to some images as ccwebsite/our-images.
What has happened? How could they have done that? What can I do to stop this?
There are a number of things they might have done to copy your site, including but not limited to:
Using a tool to scrape a complete copy of your site and place it on their server
Pointing their DNS name at your site
Manually re-creating your site as their own
Responding to requests to their site by scraping yours in real time and returning that as the response
etc.
What can I do to stop this?
Not a whole lot. You can try to prevent direct linking to your content by requiring Referer headers for your images and other resources, so that requests have to come from pages you serve, but 1) those can be faked and 2) not all browsers send them, so you'd break a small percentage of legitimate users. This also won't stop anybody from copying content, just from "deep linking" to it.
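As a rough sketch of that referrer check, assuming images are routed through a PHP script and that example.com stands in for your own host (an empty Referer is allowed, matching the caveat above):

// image.php?f=logo.png -- serve images only when the Referer points at our own pages.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if ($referer !== '' && parse_url($referer, PHP_URL_HOST) !== 'example.com') {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
$file = basename($_GET['f']); // basename() guards against path traversal
header('Content-Type: image/png');
readfile(__DIR__ . '/images/' . $file);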
Ultimately, by having a website you are exposing that information to the internet. On a technical level anybody can get that information. If some information should be private you can secure that information behind a login or other authorization measures. But if the information is publicly available then anybody can copy it.
"Stopping this" is more of a legal/jurisdictional/interpersonal concern than a technical one I'm afraid. And Stack Overflow isn't in a position to offer that sort of advice.
You could run your site with some lightweight authentication. Just issue a cookie passively when they pull a page, and require the cookie to get access to resources. If a user visits your site and then the parallel site, they'll still be able to get in, but if a user only knows about the parallel site and has never visited the real site, they will just see a crap ton of broken links and images. This could be enough to discourage your doppelganger from keeping his site up.
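A minimal sketch of that idea (the cookie name and value are illustrative):

// On normal page views: passively issue the cookie.
if (!isset($_COOKIE['visited'])) {
    setcookie('visited', bin2hex(random_bytes(8)), 0, '/');
}

// In the script that serves images, CSS, JS, etc.: require the cookie.
if (!isset($_COOKIE['visited'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}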
Another (similar but more complex) option is to implement a CSRF mitigation. Even though this isn't a CSRF situation, the same mitigation will work. Essentially you'd issue a cookie as described above, but in addition insert the cookie value in the URLs for everything and require them to match. This requires a bit more work (you'll need a filter or module inserted into the pipeline) but will keep out everybody except your own users.
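Sketched out, the token-matching idea looks roughly like this (the names are illustrative, not a drop-in implementation):

// When rendering a page: issue a token and embed it in every resource URL.
$token = isset($_COOKIE['tok']) ? $_COOKIE['tok'] : bin2hex(random_bytes(16));
setcookie('tok', $token, 0, '/');
echo '<img src="/asset.php?f=logo.png&t=' . $token . '">';

// When serving the resource: the URL token must match the cookie.
if (!isset($_COOKIE['tok'], $_GET['t']) || !hash_equals($_COOKIE['tok'], (string) $_GET['t'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}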
I am working on a system that needs to associate URLs with data based on keywords. I was hoping I could use a web service to automatically perform full-web searches based on keywords or tags, and the results would be in a machine-friendly format like JSON.
My first thought was Google, and their Google Custom Search service looks pretty good, and has proven itself in tests. It has a simple REST-like URL and returns results in JSON format. The only problem is that it has a limit of 100 queries per day. I need more like 1000. Their higher-quota pay option (Google Site Search) does not allow full-web searches, so is useless to me.
Surely others have wanted to do programmatic web searches before. Does Google offer another B2B search service that we could use? We are happy to pay per query, sign agreements, etc. I fear I am not looking in the right place on Google's site.
As I wrote this question I found Microsoft's Bing web services home page. At first blush it looks pretty good. I have a slight preference for Google, but am open to Microsoft. I would love to hear any advice about using Microsoft's APIs.
Google custom search offers a 'pay for >100 queries' option, I believe:
https://developers.google.com/custom-search/v1/overview
(see 'paid usage' section at the bottom)
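For reference, the Custom Search JSON API is a plain GET endpoint; a query looks roughly like this (the key and cx values are placeholders for your API key and search-engine ID):

// Query the Custom Search JSON API and decode the JSON results.
$key  = 'YOUR_API_KEY';    // from the Google APIs console
$cx   = 'YOUR_ENGINE_ID';  // your custom search engine ID
$q    = urlencode('some keywords');
$json = file_get_contents("https://www.googleapis.com/customsearch/v1?key=$key&cx=$cx&q=$q");
$data = json_decode($json, true);
foreach (isset($data['items']) ? $data['items'] : array() as $item) {
    echo $item['title'] . ' => ' . $item['link'] . "\n";
}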
#Sync found the right way in, and I believe I now understand the problem: Google has two control panels for custom search, and you can't get to one from the other.
I was on the panel for my Google Custom Search engine (www.google.com/cse/panel), which gives me control over low-level aspects of my search engine, and the only pay option was to convert to Google Site Search, but in so doing I would lose my full-web search power.
There is another, higher-level, control panel for all of Google's APIs (code.google.com/apis/console), of which Custom Search is a component. And from here, setting up billing to get a larger quota is clearly linked.
Sorry I am not providing proper links, as the relevant pages require login to access. While I consider this answer to be the authoritative one for my question, I am giving the green checkmark to #sync, without whose help I would not have been able to figure it out. I'd still love to see some comments on Bing's APIs, however!
The content of my site depends on cookies in the request, and when Google's crawler bot visits my site it doesn't index much content, because it doesn't have the specific cookies in each of its requests.
Is it possible to set up some rule so that when the crawler bot is crawling my site it uses the specific cookies?
Googlebot does not honor cookies on purpose -- it has to "see" what anybody else will see on your website, the "smallest common denominator" if you will; otherwise search results would be meaningless to an unknown number of searchers.
Please google for "Googlebot cookies" to get pointed to discussions and documentation about search engines, how they work and why they work the way they do; one solution to your problem might be to implement the "first visit/view free" rule.
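A crude sketch of that rule, assuming you simply check whether the visitor arrived from a Google results page (or is Googlebot itself) and serve the full content in that case; the 'prefs' cookie name is hypothetical:

// "First visit free": visitors coming from a Google search, and Googlebot itself,
// get the full page even though they don't carry our cookies yet.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$ua      = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$firstViewFree = stripos($referer, 'google.') !== false || stripos($ua, 'Googlebot') !== false;
if ($firstViewFree || isset($_COOKIE['prefs'])) {
    // render the full content (fall back to default settings if the 'prefs' cookie is missing)
} else {
    // render the restricted / teaser version that asks the visitor to enable cookies
}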
Yes, the Google crawler has the word "Googlebot" in its User-Agent request header. Simply check for that, but be warned that people can spoof this to get access to your site's content as well. As curiousguys stated in the comments, this is generally looked down upon by people who use your site and is probably against Google's TOS.
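A sketch of such a check, including the reverse-DNS verification Google recommends so that a spoofed User-Agent doesn't get the special treatment:

// Detect Googlebot by User-Agent, then confirm the IP really belongs to Google
// via a reverse DNS lookup plus a forward lookup back to the same address.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isGooglebot = false;
if (stripos($ua, 'Googlebot') !== false) {
    $host = gethostbyaddr($_SERVER['REMOTE_ADDR']);
    $isGooglebot = preg_match('/\.(googlebot|google)\.com$/i', $host)
        && gethostbyname($host) === $_SERVER['REMOTE_ADDR'];
}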
I use Google Analytics on my site, and I want to read the __utmz cookie to get the referring link. I did some research and wrote this code:
// __utmz holds campaign data; the source follows "utmcsr=" and fields are separated by "|".
$refer = '';
if (isset($_COOKIE['__utmz']) && strpos($_COOKIE['__utmz'], 'utmcsr=') !== false) {
    $parts = explode('utmcsr=', $_COOKIE['__utmz']);
    $refer = addslashes(strtok($parts[1], '|')); // keep only the utmcsr value
}
The problem is that this is not always working; sometimes I get junk as the result. What am I doing wrong? Maybe someone has a good description of this cookie?
Check my Google Analytics Cookie Parser.
Google Analytics PHP Cookie Parser is a PHP Class that you can use to obtain data from GA cookies such as campaign, source, medium, etc. You can use this parser to get this data on your contact forms or CRM.
Just updated to version 1.2 with minor bugfixes and more info, such as the number of pages viewed in the current visit.
You could use $_SERVER['HTTP_REFERER'] to get the Referer.
Overall it is a bad idea to use other people's cookies to get data unless you know exactly how they work and when they update, or unless you use an API that THEY have made available.
Let's say Google decides to revamp the cookie altogether so that the referrer information isn't available in the cookie; your system would break. It is best to get data directly from your own sources rather than from someone else's.
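For instance, instead of parsing __utmz you could record the referrer yourself on the visitor's first page view (a sketch; the cookie name initial_referer is arbitrary):

// Capture the original referrer once, in our own first-party cookie,
// instead of depending on the internals of Google's __utmz cookie.
if (!isset($_COOKIE['initial_referer']) && !empty($_SERVER['HTTP_REFERER'])) {
    setcookie('initial_referer', rawurlencode($_SERVER['HTTP_REFERER']), time() + 60 * 60 * 24 * 180, '/');
}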