Maxmind GeoIP - How to identify requests from Google, Yahoo or Bing bots to avoid redirects

We redirect customers to a country-specific website based on the country returned by the Maxmind APIs. As a result, requests from search engine bots, whose IP addresses depend on the country the bot runs from, are also redirected to the corresponding country-specific site. For example, a bot running from the US cannot crawl the UK site because its requests are redirected to the US site. The bots therefore cannot crawl the targeted website, rankings suffer, and country-specific sites do not appear at the top when a search is made from a country-specific Google domain such as co.uk. We could add logic to handle this scenario, but whenever the bots add new IP ranges or new bots are introduced, we would need to update the code again, so this approach does not look maintainable. Is there a better way that Maxmind recommends to handle such exceptions?
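
For context, one approach that avoids hard-coded IP ranges is the reverse-then-forward DNS check that the major search engines describe for verifying their own crawlers. The sketch below is not a Maxmind recommendation; the function name is hypothetical and the hostname suffixes are assumptions that should be checked against each engine's current documentation. A request that passes the check could then skip the geo redirect.

```python
import socket

# ASSUMPTION: suffixes taken from the search engines' public crawler docs;
# confirm them before relying on this list.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com", ".crawl.yahoo.net")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a known crawler hostname
    and that hostname resolves back to the same `ip`.

    The lookups are injectable so the logic can be tested without network access.
    The default forward lookup only handles IPv4.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(CRAWLER_SUFFIXES):
            return False
        # The forward lookup guards against spoofed PTR records.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

In practice the result would probably need caching per IP, since two DNS lookups on every request are slow.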

Related

Cookiewall and content cloaking

To comply with the European cookie law, we need to implement a cookie wall. But search engines should be able to see and index the actual page content, not the cookie wall.
Searching online, I found that many people recommend checking the user agent, serving the actual content to bots and crawlers, and showing the cookie wall to real users. Popular WordPress cookie wall plugins also work this way, distinguishing bots and crawlers from real users.
My question is: does Google count this as content cloaking and penalize SEO ranking? Or is there another way to implement a cookie wall without affecting SEO ranking?

Cloaking is a search engine optimization (SEO) technique in which the content presented to the search engine spider is different from that presented to the user's browser. This is done by delivering content based on the IP addresses or the User-Agent HTTP header of the user requesting the page.
Cloaking takes a user to other sites than he or she expects by disguising those sites' true content. During cloaking, the search engine spider and the browser are presented with different content for the same Web page. HTTP header information or IP addresses assist in sending the wrong Web pages. Searchers will then access websites that contain information they simply were not seeking, including pornographic sites. Website directories also offer up their share of cloaking techniques.
Many of the larger search engine companies oppose cloaking because it frustrates their users and does not comply with their standards. In the search engine optimization (SEO) industry, cloaking is considered to be a black hat technique that, while used, is frowned on by most legitimate SEO firms and Web publishers. Getting caught cloaking can result in huge penalties from the search engines, including being removed from the index altogether.
So, yeah, this counts as cloaking.

Put the cookie disclaimer in an <aside> element. Because <aside> is an HTML5 element, make sure older versions of Internet Explorer are initialised with some JavaScript so they recognise it. Google will generally ignore these elements based on their content, position and relevance to the rest of the page.

Official documentation of the _dc_gtm cookie from google tag manager

I don't know if this is the right exchange to ask in, but if it isn't please point me in the right direction.
I'm searching for some information from Google on the _dc_gtm_UA-XXXXXXXX-X cookie, where the X's are the GA code.
But I can't find any official documentation.
Can anyone provide some official documentation?

Update: The Google Analytics' Cookie Usage Developer documentation informs about this cookie now:
_gat (1 minute TTL) Used to throttle request rate. If Google Analytics is deployed via Google Tag Manager, this cookie will be named _dc_gtm_<property-id>.
I could not find official documentation either, but here's a wild guess, from right to left:
_UA-XXXXXXXX-X is your Google Analytics (GA) property ID, or account number.
_gtm is Google Tag Manager (GTM), which means that GA was not integrated directly but injected via GTM.
_dc is DoubleClick, which most likely means that your Google Analytics account has been connected to your Google DoubleClick Campaign Manager (DCM).
(The DCM help page about cookies mentions __gads as cookie name prefix though.)
...so _dc_gtm_UA-XXXXXXXX-X is your Google Analytics ID, injected via Google Tag Manager, so that DoubleClick Campaign Manager can consume it -- presumably to associate and track the performance of ad campaigns via Analytics.
This cookie only seems to appear on sites that integrate GA via GTM.
Its value always appears to be 1.
Various random websites on the net present (exactly) the following explanation in their cookie policy:
_dc_gtm: used to help identify the visitors by either age, gender, or interests by DoubleClick - Google Tag Manager.
So that text appears to be copied or auto-generated from an official resource, but it is not referenced anywhere.
Some sites additionally present a link to Google Analytics' Cookie Usage Developer documentation, but that does not list the cookie name.
Note that this _dc_gtm_UA-... cookie is a first-party cookie; i.e., it is set for the domain of your website.
When a visitor of your website requests additional pages/files/resources on your website domain, then this cookie will be sent along with every request.
Therefore, adjust your HTTP reverse-proxy (e.g., Varnish) configuration accordingly, so that this purely client-side cookie does not cause subsequent client requests to miss your cache. Most website backend applications do not need this cookie.
Google normally uses two underscores as prefix for all cookies that are only relevant on the client-side; not sure why they diverged from that emerging standard here.
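
To illustrate the caching point, here is a minimal Python sketch of the kind of normalisation a reverse proxy could apply to the Cookie request header before computing its cache key. It is not Varnish configuration; the function name and the default prefix list are hypothetical and would need adjusting to the cookies your backend really ignores.

```python
def strip_client_only_cookies(cookie_header, prefixes=("_dc_gtm_", "_gat")):
    """Drop cookies whose names start with one of `prefixes` from a Cookie header.

    ASSUMPTION: the listed cookies are only read by client-side JavaScript,
    so the backend response does not vary on them.
    """
    kept = []
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name = pair.split("=", 1)[0].strip()
        if not name.startswith(prefixes):
            kept.append(pair)
    return "; ".join(kept)


print(strip_client_only_cookies("_dc_gtm_UA-12345678-1=1; sessionid=abc; _gat=1"))
```

If the result is an empty string, the proxy can treat the request as cookie-less and serve it from cache.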

Google Analytics Referrals coming from third party payment provider

I am using universal analytics on my website via Google Tag Manager with data layer e-commerce tracking enabled.
The referral addresses appear to be coming from the payment providers (e.g. secure.arcot5.com).
I have included all my URLs in the autolinker, and after some testing the _ga cookie value appears to be consistent all the way through the booking process, but it is different on the page after the secure payment takes place.
This suggests the session is being treated as a new one, hence the referral address issue I am having.
I have been trying to set a cookie on the entry page that equals the _ga cookie value, but currently I am unable to retrieve it on the confirmation page.
Has anyone got any ideas for a possible solution?
You will most definitely save my life!
Dan

Have you read this article? There could be a couple of pointers in there however I'm not sure what you have and haven't tried
Accurately reporting referrer from payments made with PayPal in Google Analytics

Retrieve user data from Google Analytics based on the __utma cookie

I am trying to find out how active the users of my web page are after registration, based on the source/landing page of their first visit. I would rather not try to track users myself: I am already using Google Analytics on my web page, and I know it uses the __utma cookie to tell one user from another. I can see summarized landing pages/sources in my Analytics reports, but I would need this data per specific user at the time of their sign-up.
Essentially, when a user signs up on my web page, I would like to retrieve their landing page and source from Google Analytics and store them in my application's database along with the user's name, password, activity etc. This way I could check later, for example, whether users who came from Google were more likely to buy the premium service than those who came from Facebook.
I checked the Google Analytics API reference, but it doesn't seem to provide getters for this specific data. I've been looking it up on Google and Stack Overflow for a while.
This seems like pretty useful functionality that many websites should need. What am I missing? Maybe I should look for a solution that doesn't involve GA? Or switch to a different analytics tool? Or track users' landing pages with cookies myself?

What's a reasonable instance where this regex might not catch a webmail referrer in Google Analytics?

^mail\.(.*)?|(.*)?(web|\.)mail(.*)? is the exact regex I'm looking to scrutinize.
For example,
e3.mail.yahoo.com
webmail.example.com
hotmail.com
mail.aol.com
etc.
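
A quick way to scrutinize the pattern is to run it over sample hostnames. The sketch below assumes the filter applies the regex as an unanchored search, which Python's re.search approximates; the extra hostnames (gmail.com, www.mailchimp.com) are made-up test inputs, not referrers from real data.

```python
import re

# The regex from the question, unchanged.
WEBMAIL = re.compile(r"^mail\.(.*)?|(.*)?(web|\.)mail(.*)?")


def is_webmail(hostname):
    """Return True if the pattern matches anywhere in `hostname`."""
    return WEBMAIL.search(hostname) is not None


samples = [
    "e3.mail.yahoo.com",
    "webmail.example.com",
    "hotmail.com",
    "mail.aol.com",
    "gmail.com",          # hypothetical extra input
    "www.mailchimp.com",  # hypothetical extra input
]
for host in samples:
    print(f"{host:22} {is_webmail(host)}")
```

Under that assumption, hotmail.com and gmail.com are not caught, because "mail" is preceded by neither "web" nor a dot, while www.mailchimp.com is caught even though it is not a webmail host.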

To be totally honest, it's a fruitless effort: even if you do somehow manage to rewrite all of the email domains that referred people to your site, there are 3 reasons it won't work:
You can't possibly account for all of the email domains out there.
If the email is hosted on HTTPS and your pages are HTTP, you won't see a referrer anyway.
A very significant portion of the email-using population uses non-web mail, like Outlook, Entourage, Mac Mail, iPhone Mail, Blackberry Mail and Android Gmail, to name a few, which never sends a referrer.
Instead, if you're looking to segment all email referrals for tracking in Google Analytics, you should use utm variables in your URLs.
If you tag your URLs with utm_source and utm_medium, you'll be able to track them, regardless of the 3 restrictions listed above.
Traditionally, you'd set utm_medium to email, utm_source to the mailing list name, and utm_campaign to the name of the specific campaign.
You can get assistance in building the URLs here: http://www.google.com/support/analytics/bin/answer.py?answer=55578
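
If you generate the links in code rather than with the URL builder, a small helper along these lines could do the tagging. This is only a sketch: the function name and the example URL and values are made up, and only the three utm parameters mentioned above are set.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_email_link(url, source, campaign, medium="email"):
    """Append utm_source, utm_medium and utm_campaign to `url`,
    keeping any query parameters that are already present."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(tag_email_link("https://www.example.com/offer?id=7", "newsletter", "spring_sale"))
```

urlencode takes care of escaping, so list or campaign names containing spaces or ampersands do not break the link.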

Even if links in email messages should be tagged with utm_xxx parameters, I like to clean and group my referral sources into clusters as much as possible. It is the way to go to understand effectively the sources of traffic that are missing proper tagging, and then prioritize and fix them.
The regex I use is the following, and honestly it works pretty well (it catches more than 95% of webmails that show up as referrals and can be split over dozens of subdomains like for yahoo or live, thus diluting their visibility as a source)
(messag|courrier|zimbra|imp|mail)(.*)\.(.*)\..{2,4}
You may update the subdomain names with values that are frequent in your area. The end of the pattern matches any domain name followed by a TLD of 2-4 characters.
I output the result as
Output To -> Constructor : Campaign Source : Webmail - $A3
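
To see what this filter would produce, the pattern can be tried in Python. This is a rough sketch that assumes $A3 refers to the third parenthesised group of the pattern and that the filter searches anywhere in the hostname; the function name and the simple.example.com input are made up for illustration.

```python
import re

# The regex from the answer above, unchanged.
CLUSTER = re.compile(r"(messag|courrier|zimbra|imp|mail)(.*)\.(.*)\..{2,4}")


def webmail_label(hostname):
    """Return 'Webmail - <group 3>' if the pattern matches, else None."""
    match = CLUSTER.search(hostname)
    return f"Webmail - {match.group(3)}" if match else None


for host in ["e3.mail.yahoo.com", "mail.aol.com", "hotmail.com", "simple.example.com"]:
    print(f"{host:22} {webmail_label(host)}")
```

With these inputs, the Yahoo and AOL subdomains collapse to "Webmail - yahoo" and "Webmail - aol", hotmail.com is not matched because it has only one dot after "mail", and simple.example.com is a false positive because "imp" matches inside "simple".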
