How can I get the number of hits of a query in a search engine in Java/JavaScript?

I need to find the number of hits that query gets in a web search engine (such as Google) in a specific domain. For instance, if the user searches '"dry martini recipe" site:.uk', Google shows "About 15,800 results". I need to get that number (15,800).
The Google API used to be able to do this, but it is now deprecated. I have considered web scraping, but since it is not allowed, I would rather not go that way. Using Google is in no way a must; I have considered engines like DuckDuckGo, but their web results do not seem to include the total hit count for a query.
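One route that avoids scraping is Google's Custom Search JSON API, which reports an estimated total in its searchInformation.totalResults field; note this estimate can differ from the number the web UI shows. A minimal Java sketch, assuming you have created a Custom Search Engine and obtained an API key (YOUR_API_KEY and YOUR_CX are placeholders, and the regex stands in for a real JSON parser):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HitCount {
    public static void main(String[] args) throws Exception {
        String apiKey = "YOUR_API_KEY"; // placeholder
        String cx = "YOUR_CX";          // placeholder: your Custom Search Engine id
        String query = "\"dry martini recipe\" site:.uk";

        String url = "https://www.googleapis.com/customsearch/v1"
                + "?key=" + apiKey
                + "&cx=" + cx
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofString());

        // totalResults is returned as a JSON string, e.g. "totalResults": "15800"
        Matcher m = Pattern.compile("\"totalResults\"\\s*:\\s*\"(\\d+)\"")
                .matcher(response.body());
        if (m.find()) {
            System.out.println("About " + m.group(1) + " results");
        }
    }
}
```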

How to check spammyness of a link/url

I know that most spam involves one or more links, so I am wondering if there is any web service that can check the spam weight/spammyness of a URL, similar to how Akismet can check the spammyness of text content.
P.S. I searched on Google and couldn't find anything satisfactory :)
There are a number of URI DNS-based Blackhole List (DNSBL) services available to the public for low-volume lookups. Two of the best known are SURBL and URIBL. PhishTank (run by OpenDNS) is also worth a look, as many of its URLs are categorized and classified in addition to being listed.
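A URI DNSBL lookup is just a DNS query: prepend the domain to the list's zone and see whether it resolves, typically to a 127.0.0.x address whose last octet encodes which sub-list matched. A minimal Java sketch against SURBL's multi.surbl.org zone (SURBL publishes a permanent test point, used below):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsblCheck {
    /** Returns true if the domain is listed in the given URI DNSBL zone. */
    static boolean isListed(String domain, String zone) {
        try {
            // e.g. "example.com" -> DNS lookup of "example.com.multi.surbl.org"
            InetAddress result = InetAddress.getByName(domain + "." + zone);
            // A successful resolution to a 127.0.0.x address means "listed".
            return result.getHostAddress().startsWith("127.");
        } catch (UnknownHostException e) {
            // NXDOMAIN: the domain is not on this list.
            return false;
        }
    }

    public static void main(String[] args) {
        // SURBL's documented test point should always report as listed.
        System.out.println(isListed("surbl-org-permanent-test-point.com", "multi.surbl.org"));
        System.out.println(isListed("example.com", "multi.surbl.org"));
    }
}
```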

How can I search for other countries using Amazon Product Advertising API?

I need to be able to search for a product based on barcode, keyword or ASIN and show the results from amazon.co.uk, amazon.com, amazon.de and amazon.fr (UK, USA, Germany and France).
Is that possible? If so, how do I do it?
At the bottom of this page, you will find links to the various locales where the Product Advertising API works.
However, you need a separate subscription to each of these APIs; in other words, you need a different AssociateTag, AccessKeyId and secret key to access each one.
You can implement a search engine that uses all of those APIs in sequence: start by searching for the product in the US API, then the UK, then DE, and so on. You can encapsulate this complexity in a facade, so that your client makes a single call to search (a rough sketch follows). You may run into performance issues with this approach, so a cache (and common sense) is advisable.
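A rough shape of that facade in Java, with a hypothetical LocaleClient interface standing in for whatever locale-specific API client you build; the interface, the searchByKeyword method and the placeholder ASIN are illustrative, not part of Amazon's SDK:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-locale client; wrap your real Product Advertising API calls behind this. */
interface LocaleClient {
    List<String> searchByKeyword(String keyword); // returns matching ASINs, say
}

/** Facade that hides the locale fan-out from callers. */
class AmazonSearchFacade {
    private final Map<String, LocaleClient> clients = new LinkedHashMap<>();

    /** Register one client per locale, each with its own AssociateTag/AccessKeyId/secret. */
    void register(String locale, LocaleClient client) {
        clients.put(locale, client);
    }

    /** Tries each locale in registration order; returns the first non-empty result set. */
    List<String> search(String keyword) {
        for (Map.Entry<String, LocaleClient> e : clients.entrySet()) {
            List<String> hits = e.getValue().searchByKeyword(keyword);
            if (!hits.isEmpty()) {
                return hits;
            }
        }
        return List.of();
    }
}

public class FacadeDemo {
    public static void main(String[] args) {
        AmazonSearchFacade facade = new AmazonSearchFacade();
        // Stub clients for illustration; each would wrap one locale's subscription.
        facade.register("US", kw -> List.of());           // replace with real amazon.com client
        facade.register("UK", kw -> List.of("B000XYZ"));  // placeholder ASIN
        System.out.println(facade.search("dry martini shaker"));
    }
}
```

Querying the locales in parallel rather than in sequence, and caching results by keyword, are the obvious ways to soften the performance concern.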

How does a tool like SEOMoz Rank Checker work?

It seems there are a number of tools that allow you to check a site's position in search results for long lists of keywords. I'd like to integrate a feature like that into an analytics project I'm working on, but I cannot think of a way to run queries at such high volumes (thousands per hour) without violating the Google TOS and potentially running afoul of their automated-query detection (the system that serves a CAPTCHA when search volume from your IP gets too high).
Is there an alternative method for running these automated searches, or is the only way forward to scrape search result pages?
Use a third party to scrape it if you're scared of Google's TOS.
Google is very quick to temporarily ban or block IP addresses that appear to be sending automated queries. And yes, of course, this is against their TOS.
It's also quite difficult to know exactly how they detect them, but the main trigger is certainly identical keyword searches from the same IP address.
The short answer is basically: get a lot of proxies.
Some more tips:
Don't search further than you need to (e.g. the first 10 pages)
Wait around 4-5 seconds between queries for the same keyword
Make sure you use real browser headers, not a default User-Agent like "curl/..."
Stop scraping with an IP when you hit roadblocks, and wait a few days before using the same proxy again.
Try and make your program act like a real user would and you won't have too many issues.
You can scrape Google quite easily, but doing it at a very high volume will be challenging; a sketch of the mechanics follows.
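Putting those tips together, a Java sketch of the mechanics: a browser-like User-Agent, a pause between queries, and a rotating proxy list. The proxy hosts are placeholders, and this only illustrates pacing and headers, not the parsing:

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PoliteFetcher {
    // Placeholder proxies; rotating spreads identical keyword queries across IPs.
    private static final List<InetSocketAddress> PROXIES = List.of(
            new InetSocketAddress("proxy1.example.com", 8080),
            new InetSocketAddress("proxy2.example.com", 8080));

    public static void main(String[] args) throws Exception {
        String keyword = "dry martini recipe";
        for (int page = 0; page < 10; page++) { // don't search deeper than you need
            InetSocketAddress proxy = PROXIES.get(page % PROXIES.size());
            HttpClient client = HttpClient.newBuilder()
                    .proxy(ProxySelector.of(proxy))
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(
                            "https://www.google.com/search?q="
                            + URLEncoder.encode(keyword, StandardCharsets.UTF_8)
                            + "&start=" + (page * 10)))
                    // Real browser headers, not a default library User-Agent.
                    .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
                    .header("Accept-Language", "en-GB,en;q=0.9")
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 429) {
                break; // roadblock hit: retire this proxy for a few days
            }
            // ... parse response.body() for ranking positions here ...
            Thread.sleep(4000 + (long) (Math.random() * 1000)); // 4-5 s between queries
        }
    }
}
```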

Finding latitude and longitude of many places once

I have a long list of towns and cities, and I'd like to add latitude and longitude information to each of them.
Does anyone know the easiest way to generate this information once?
See also Geocode multiple addresses
The first part of the third video shows how to get latitude and longitude using Google Refine and geocoding. No need to write a new script; ideal for doing this kind of job once.
http://code.google.com/p/google-refine/
Or use www.geonames.org; there are client APIs for various languages. Or OpenStreetMap's Nominatim: http://wiki.openstreetmap.org/wiki/Nominatim. Google has slightly more restrictive terms of service.
You can use the Google Geocoding API. Check the API at this URL: http://code.google.com/apis/maps/documentation/geocoding/
All that's left is writing some code. I am doing something similar in C# and it is quite easy.
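In Java the equivalent is only a few lines. A minimal sketch against the Geocoding API's JSON endpoint, using crude regex extraction in place of a proper JSON library (add an API key if your usage level requires one):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Geocoder {
    public static void main(String[] args) throws Exception {
        String[] places = { "Cambridge, UK", "Reykjavik, Iceland" };
        HttpClient client = HttpClient.newHttpClient();
        for (String place : places) {
            String url = "https://maps.googleapis.com/maps/api/geocode/json?address="
                    + URLEncoder.encode(place, StandardCharsets.UTF_8);
                    // append "&key=YOUR_API_KEY" if required for your usage

            HttpResponse<String> response = client.send(
                    HttpRequest.newBuilder(URI.create(url)).build(),
                    HttpResponse.BodyHandlers.ofString());

            // The first result's geometry.location object holds "lat" and "lng".
            Matcher m = Pattern.compile(
                    "\"location\"\\s*:\\s*\\{\\s*\"lat\"\\s*:\\s*(-?[\\d.]+)\\s*,\\s*\"lng\"\\s*:\\s*(-?[\\d.]+)")
                    .matcher(response.body());
            if (m.find()) {
                System.out.println(place + " -> " + m.group(1) + ", " + m.group(2));
            }
            Thread.sleep(200); // pace requests when looping over a long list
        }
    }
}
```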
Most geocoding services can handle queries with only administrative names, which is what you're after, e.g., municipality and region. So I'd choose one you like that also handles batch or bulk requests, e.g., the Bing Spatial Data API (here's an article on batch geocoding with it).
An alternative approach that might be useful if you're on a budget and have a lot of these to do: download the Geonames database and write a bit of code to import it into your own database or index it; then query it however and as often as you like. For example, if you put your places in another table you could SELECT [...] FROM my_places LEFT JOIN geonames [...]. I used to import the Geonames DB into a vanilla PostgreSQL instance nightly, and I probably still have the code in a git repo somewhere if that's a route you want to try (comment and I'll find it and attach it). A sketch of the lookup side is below.
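If you go the local-Geonames route, the lookup is straightforward once the dump is loaded. A JDBC sketch, assuming you've imported allCountries.txt into a geonames table with (at least) name, country_code, latitude and longitude columns; the column names follow the dump's documented fields, but your import script, database name and credentials will differ:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PlaceLookup {
    public static void main(String[] args) throws Exception {
        // Requires the PostgreSQL JDBC driver on the classpath; URL/credentials are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/gazetteer", "user", "password")) {
            // One round trip per place; at volume, JOIN against your own places table instead.
            PreparedStatement stmt = conn.prepareStatement(
                    "SELECT latitude, longitude FROM geonames "
                    + "WHERE name = ? AND country_code = ? LIMIT 1");
            stmt.setString(1, "Cambridge");
            stmt.setString(2, "GB");
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    System.out.println(rs.getDouble("latitude") + ", "
                            + rs.getDouble("longitude"));
                }
            }
        }
    }
}
```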
For a service that uses Google, which I find the most accurate, look at http://www.torchproducts.com/tools/geocode

How do sites count other sites' visitors and "value", and how can they tell users' location?

Hi, actually this is a simple question that just came up out of curiosity...
I recently came across an online website-evaluation tool called teqpad.com, and I have lots of questions about it:
1. How do they do it? E.g., daily page views, visitors, etc., without instrumenting the real website?
2. Website worth... does this come anywhere close for any site?
3. I don't know how they got daily revenue.
4. I like the traffic by country; it looks just like what Google Analytics shows. How did they get that info?
5. Another one is the ISP info and the Google Maps location of the server.
Has anyone here written similar scripts? If so, what is your opinion?
1. They may be tracking user browser stats the way Alexa does (more info on Wikipedia): a panel of users installs a plug-in that reports which sites each user visits, much like TV ratings work in most (all?) countries. This method is obviously not very reliable, and often nowhere near the actual visitor numbers.
2. This is usually based on bullshit pseudo-scientific calculations and is never a viable basis for evaluating the "value" of a web site, even though it may be possible to guesstimate the approximate ad revenue a site yields (see 3). But that is only one revenue stream; it says nothing about how expensive the site's daily maintenance is: servers, staff, content creation...
3. It should be possible to very roughly estimate daily revenue by taking the guesses on daily visitors/page views, counting how often ads are shown, and looking at what those ads usually yield per page view (roughly: page views × ads per page × yield per ad view). It is probably pretty easy to get some rough numbers on what an ad view is worth on a big site if you're in the market.
4. and 5. It is possible to track most IP addresses down to the visitor's country and sometimes even the city; see the geotargeting article on Wikipedia.
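On 4 and 5 concretely, the usual tool is an IP-geolocation database. A minimal Java sketch with MaxMind's GeoIP2 library and their free GeoLite2 City database; the .mmdb path is a placeholder for wherever you downloaded it, and ISP lookups need MaxMind's separate ISP database:

```java
import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.CityResponse;
import java.io.File;
import java.net.InetAddress;

public class GeoLookup {
    public static void main(String[] args) throws Exception {
        // GeoLite2-City.mmdb: free download from MaxMind (registration required).
        DatabaseReader reader = new DatabaseReader.Builder(
                new File("GeoLite2-City.mmdb")).build();

        CityResponse response = reader.city(InetAddress.getByName("8.8.8.8"));
        System.out.println(response.getCountry().getName());   // e.g. "United States"
        System.out.println(response.getCity().getName());      // may be null for some IPs
        System.out.println(response.getLocation().getLatitude() + ", "
                + response.getLocation().getLongitude());      // for the map pin
    }
}
```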