recommendations for programmatic web searches - web-services

I am working on a system that needs to associate URLs with data based on keywords. I was hoping I could use a web service to automatically perform full-web searches based on keywords or tags, with the results returned in a machine-friendly format like JSON.
My first thought was Google: their Google Custom Search service looks pretty good and has proven itself in tests. It has a simple REST-like URL and returns results in JSON format. The only problem is that it has a limit of 100 queries per day; I need more like 1000. Their higher-quota paid option (Google Site Search) does not allow full-web searches, so it is useless to me.
Surely others have wanted to do programmatic web searches before. Does Google offer another B2B search service that we could use? We are happy to pay per query, sign agreements, etc. I fear I am not looking in the right place on Google's site.
As I wrote this question I found Microsoft's Bing web services home page. At first blush it looks pretty good. I have a slight preference for Google, but am open to Microsoft. I would love to hear any advice about using Microsoft's APIs.

Google Custom Search offers a 'pay for >100 queries' option, I believe:
https://developers.google.com/custom-search/v1/overview
(see 'paid usage' section at the bottom)
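
For reference, a query against the Custom Search JSON API is a single GET request that returns JSON. Here is a minimal Python sketch, assuming the API key and search engine ID (cx) come from the APIs console; the values shown are placeholders:

```python
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"             # from the Google APIs console (placeholder)
ENGINE_ID = "YOUR_SEARCH_ENGINE_ID"  # the 'cx' of your Custom Search engine (placeholder)

def custom_search(query, start=1):
    """Run one query against the Custom Search JSON API and return the parsed JSON."""
    params = urllib.parse.urlencode({
        "key": API_KEY,
        "cx": ENGINE_ID,
        "q": query,
        "start": start,  # 1-based index of the first result to return
    })
    url = "https://www.googleapis.com/customsearch/v1?" + params
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

results = custom_search("some keyword")
for item in results.get("items", []):
    print(item["title"], item["link"])
```

Each call like this counts against the daily quota, so the paid-usage setting in the console is what lifts the 100-query ceiling.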

@Sync found the right way in, and I believe I now understand the problem: Google has two control panels for Custom Search, and you can't get to one from the other.
I was on the panel for my Google Custom Search engine (www.google.com/cse/panel), which gives me control over the low-level aspects of my search engine. The only paid option offered there was to convert to Google Site Search, but in doing so I would lose my full-web search capability.
There is another, higher-level, control panel for all of Google's APIs (code.google.com/apis/console), of which Custom Search is a component. And from here, setting up billing to get a larger quota is clearly linked.
Sorry I am not providing proper links, as the relevant pages require a login to access. While I consider this answer to be the authoritative one for my question, I am giving the green checkmark to @Sync, without whose help I would not have been able to figure it out. I'd still love to see some comments on Bing's APIs, however!

Related

Get Github Pages Site found in Google Search Results

I have a site built with Jekyll Now on GitHub and I want it to appear in Google search results. If I just google my GitHub username followed by 'GitHub' and 'io', it does not find my site. How do I get Google to find my site in a search?
You have to create a Google Search Console account and add your page, then typically you just drop a "marker" file in the root (Search Console generates this) so that Google can confirm you really own the page.
Google Search Console
Instructions
(Since the instructions are long and have many links to sub-steps, I'm only providing the link.)
Also, if you're going to use a registered domain name, set that up before you register the site for search.
(Edit: technically you don't have to do this; sooner or later Google will find you... but this will give your content a much higher-quality score.)
It can take a few days before the site is indexed by search engines. Search for "google index site" and you will find quite a lot of information about the process and how it can be sped up.
Generally, Google finds all websites and indexes them; sometimes it just takes time to crawl a new website.
But you can speed this up manually by following these steps:
Go to Google Search Console
Add the website as your property
Then verify the property to confirm that you are its owner.

Google Analytics and cookies

My question is: I'm developing a website and I want to monitor it with Google Analytics. However, I've been reading articles about cookies, and I can't figure out whether I need to program my website with some kind of cookie handling in order to use Google's tool, or whether I simply don't need to do anything on my website.
Thanks
To do tracking you simply need to insert the code snippet that you can get from the GA admin interface.
However, since you are in the EU, you need to point out to your visitors that they are being tracked on your web page and that the site uses cookies to do so (and I think you need to provide an opt-out, although that might be a German thing). This is mandated by the European Privacy Directive, which is sometimes referred to as the "Cookie Law" (technically incorrect, since it is neither a law nor specifically about cookies), so maybe this gave you the idea that you need to do extra programming.

Getting data from website requiring authentication

In Power BI, I'd like to get data from a website requiring authentication (http://kdp.amazon.com/). Going to New Source > Web > Advanced doesn't show me anything that looks promising. Hopefully I'm missing something.
My ideal would be to go to a specific webpage (post-authentication) and click on a link that allows me to download an Excel spreadsheet.
Thanks for any ideas/pointers.
It depends, and chances are slim for your case.
If it is a direct URL to where the data or file resides (e.g. data on the page, a file link, or a web API endpoint), then it depends on what kind of authentication method the website uses and whether you can provide the credentials through the Web.Contents options (commonly used for web API authentication).
If it requires further navigation (e.g. clicking or typing in information) to access the data/file after authentication, then the answer is no.
That type of data scraping can be accomplished using a headless browser and a scripting/macro engine.
For example, Xvfb (X virtual framebuffer) + Firefox + iMacros. I consider this beyond Power BI's capabilities. If you wish to pursue this further, here are some references:
https://en.wikipedia.org/wiki/Xvfb
https://addons.mozilla.org/en-us/firefox/addon/imacros-for-firefox/
Again, similar but using an alternate toolset:
http://scraping.pro/use-headless-firefox-scraping-linux/
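
The references above use iMacros; as a purely illustrative sketch of the same headless-browser idea, here is roughly what it might look like in Python using Selenium with headless Firefox instead (the login URL, form field names, and link text below are invented placeholders, not KDP's real ones):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

# Run Firefox without a visible window (stands in for the Xvfb trick).
options = Options()
options.add_argument("-headless")
driver = webdriver.Firefox(options=options)

try:
    # 1. Authenticate. URL and field names are placeholders; inspect the real page first.
    driver.get("https://example.com/login")
    driver.find_element(By.NAME, "email").send_keys("user@example.com")
    driver.find_element(By.NAME, "password").send_keys("secret")
    driver.find_element(By.NAME, "signIn").click()

    # 2. Navigate to the (hypothetical) report page and click the download link.
    driver.get("https://example.com/reports")
    driver.find_element(By.LINK_TEXT, "Download spreadsheet").click()
    # The file lands in the browser's download directory; Power BI can then read it
    # from disk as an ordinary Excel source.
finally:
    driver.quit()
```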
BTW, having done this once or twice before - this is not a great value proposition. If you have to resort to this sort of tactic, it may be time to consider why the developers didn't expose this functionality to you in an API - maybe there is a good reason?

Displaying my own analytics data to unauthenticated users

I am writing this question after considerable investigation into this matter.
I have gone through Google's easy dashboards (the gadash JS library), superProxy, and the plain Analytics API, and couldn't find the best solution for my needs, although I can't believe my needs are so uncommon.
This is why I am turning to you; I have a feeling I am missing something.
My requirements:
Display my own Analytics account data to users on my website, preferably with Google's Chart API or ga-dash, to resemble the Google Analytics views as much as possible.
Users will not have to take part in any authentication with the Google API.
Each user has their own query, which is built dynamically!! (This is probably why superProxy cannot work for me, because I think you need to set the queries manually in advance.)
I use Django (Python) as the basis for my website.
Problems with the solutions I tried:
GAdash library - the problem is that each user has to be authenticated and is shown their own data, meaning they need access to my profile - that's simply not what I am looking for. It works great, but only for me. On the other hand, if there were a way to make my profile truly public...
superProxy - sounds like a solution for exactly this need; however, I don't think you can set the queries programmatically.
I did find a way to retrieve the data for a query on the server side using my own credentials, which is a bit hacky, but I am still missing a JS library that will parse this XML on the client side and display it as charts.
EDIT:
I ended up using Mark's solution (embeddedanalytics), since I could not find a better, easier solution.
Other alternatives were:
1. superProxy (lacks the ability to load new queries dynamically/programmatically)
2. gaDash library - requires authentication from each user
3. Implement my own server-side querying and display the data to the user with some JS graphics library - which would require considerable work on my side (a rough sketch of the server-side part is below).
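
For what it's worth, the server-side half of alternative 3 can be fairly compact. Below is a rough Django sketch, assuming the (v3) Core Reporting API, a service account added as a user on the Analytics account, and google-api-python-client installed; the profile ID, key file, metrics, and filter are all placeholders:

```python
# views.py - illustrative only; profile ID, key file, metrics and filter are placeholders.
from django.http import JsonResponse
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
KEY_FILE = "service-account.json"  # credentials for the service account
PROFILE_ID = "ga:12345678"         # the GA view (profile) to query

def analytics_data(request):
    """Query my own GA data with my own credentials and return it to the client as JSON."""
    credentials = service_account.Credentials.from_service_account_file(KEY_FILE, scopes=SCOPES)
    analytics = build("analytics", "v3", credentials=credentials)

    # The per-user query is built dynamically here, e.g. from request.user.
    result = analytics.data().ga().get(
        ids=PROFILE_ID,
        start_date="30daysAgo",
        end_date="today",
        metrics="ga:sessions",
        dimensions="ga:date",
        filters="ga:pagePath=~^/users/%s/" % request.user.username,
    ).execute()

    # Rows come back as e.g. [["20240101", "42"], ...]; the client can chart them directly.
    return JsonResponse({"rows": result.get("rows", [])})
```

The client side would then only need a charting library (e.g. Google Charts) pointed at this endpoint, with no Google authentication of its own.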
Check out www.embeddedanalytics.com. This is a platform/service which will do exactly what you are looking to do (disclosure - I work with them).
We also support your requirement that each user has their own dynamically built query. This is what we call our CMS Integration version. Are you trying to create a dashboard system for a CMS you have built?

RESTFul web service URL style

Is the difference between having URL parameters passed as
http://myserver/someoperation/bob/sally
versus
http://myserver/someoperation?arg1=bob&arg2=sally
purely up to user preference or are there good reasons for each?
I have a web service that is using the first style, but I am wondering if I am missing part of the equation.
As far as search engine optimization goes, Google has stated:
If your URL contains relevant words, this provides users and search engines with more information about the page than an ID or oddly named parameter would.
That seems to imply that having pages whose URLs contain meaningful information would give your page a better ranking, although they don't say that outright.
You can read more in their full guide: Google Search Engine Optimization Starter Guide (PDF). That quote was from page 8.
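
To make the two styles concrete, here is a minimal routing sketch in Django (the question doesn't name a framework, so this is purely illustrative):

```python
# urls.py / views.py combined for brevity - illustrative only.
from django.http import JsonResponse
from django.urls import path

def someoperation_path(request, arg1, arg2):
    # Style 1: http://myserver/someoperation/bob/sally
    return JsonResponse({"arg1": arg1, "arg2": arg2})

def someoperation_query(request):
    # Style 2: http://myserver/someoperation?arg1=bob&arg2=sally
    return JsonResponse({
        "arg1": request.GET.get("arg1"),
        "arg2": request.GET.get("arg2"),
    })

urlpatterns = [
    path("someoperation/<str:arg1>/<str:arg2>", someoperation_path),
    path("someoperation", someoperation_query),
]
```

A common rule of thumb is to put values that identify the resource in the path and optional modifiers (filters, sorting, paging) in the query string; the SEO point quoted above is a separate, mostly orthogonal consideration.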