Location mining from text - data-mining

I'm working on a text mining problem: extracting place names from text. A place could be as coarse as a state, or as specific as a Chicago neighborhood or even a street address, but it is always in the US.
I've been trying the Yahoo Placemaker API, but I can't create an API key (the website is not responding). Is there another way to do this, such as RapidMiner, or writing a comprehensive regex?

Consider Stanford Named Entity Recognizer (NER). Online demo here:
http://nlp.stanford.edu:8080/ner/process
It's a Java library. The license is GPL v2, though a license to distribute it in a standalone app is pricey.
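If a full NER model is more than the task needs, a small gazetteer plus a regex can serve as a rough baseline for the coarse cases (state names, ZIP codes). This is only a sketch: the `US_STATES` list is deliberately truncated, and the name `find_places` is made up for illustration. It will not catch neighborhoods or free-form addresses, which is where an NER library is the better fit.

```python
import re

# Truncated gazetteer for illustration; a real one would list every state
# (and possibly cities or neighborhoods).
US_STATES = ["Illinois", "New York", "California", "Texas"]

# Longest names first so multi-word names win over shorter alternatives.
_STATE_RE = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(US_STATES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
# Five-digit ZIP code with an optional +4 suffix.
_ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def find_places(text):
    """Return (state names, ZIP codes) found in text, each as a list."""
    states = [m.group(1) for m in _STATE_RE.finditer(text)]
    zips = _ZIP_RE.findall(text)
    return states, zips


states, zips = find_places("Moved from Austin, Texas to Chicago, Illinois 60614 last year.")
print(states, zips)
```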

Related

Is there a tool & tip & method for existing system analysis

I have inherited legacy code from a previous project at my company, with no documentation or description left behind. The only part of the code I can recognize is Jetty, used for the API. I can't even find the database it uses yet 😛
Unfortunately, I will probably need to modify this system.
Are there methods or tools for figuring out the components and relationships of this system? I mean something like modeling a causal loop diagram by monitoring data interactions between the running processes, IPC, etc.
I have used my existing knowledge of a regular web markup language (HTML) to locate particular pieces of code to modify,
but there should be a more general methodology and tooling for analyzing the dynamics of cooperating processes.

Open source concept mining tools?

Are there any open source concept mining tools available today? I have only come across tools like Leximancer, which seems to fit the role but is not open source and is quite expensive for an undergraduate student. I have been unsuccessful so far, since searching for the word 'concept' on both Google and Google Scholar does not seem to match what I want.
It seems to me you need a text mining tool for clustering. RapidMiner has an open-source, Java-based Community Edition which has several extensions (Text Mining, R, etc.). In addition, you can develop and integrate your own algorithms.
Moreover, Rexer Analytics runs a comprehensive data mining survey annually; you can request the reports for free.

Geocode Area Names of a City to get Lat and Long

I have a list of area names in my city and I need the latitude and longitude of each.
Is there any service I can use to get this data?
I don't want to use a map. I would like to make simple API calls and get the lat/long via JSON or XML.
Though the question is already answered, I would like to add that Google is not the only service that provides geocoding support. A few of the main providers are listed below.
Available solutions for address / ZIP code to (latitude, longitude) mapping:
Google API (http://code.google.com/apis/maps/documentation/geocoding/)
The most popular due to Google’s name.
Extensive support available on internet (less development time).
Alas! not available freely for non-commercial usage.
The free version is limited to 2,500 queries/day. ( http://code.google.com/apis/maps/documentation/geocoding/ )
There is a clause in the “Usage Terms” which says the result must be displayed on Google Maps (see http://code.google.com/apis/maps/documentation/geocoding/#GeocodingRequests ). The terms and conditions are here: http://code.google.com/apis/maps/terms.html#section_10_12
Nominatim (http://wiki.openstreetmap.org/wiki/Nominatim)
A less well-known project which provides an open source, free solution for address to (latitude, longitude) mapping.
Can be also used in commercial projects.
There are no restrictions on the number of queries per day and no hidden clauses in the “usage terms”.
Since the project itself is not very well known, much less support is available.
Yahoo API, http://developer.yahoo.com/geo/placefinder/
Another popular API like Google's, but somewhat more available.
Supports up to 50,000 requests per day.
Like Google, extensive support is available.
Yahoo encourages us to use the “powered by Yahoo” logo but doesn't require it. ( http://info.yahoo.com/legal/us/yahoo/api/api-2140.html )
Can be used for non-commercial purposes (I have read the Yahoo terms and didn't find any clause that restricts this; reference http://info.yahoo.com/legal/us/yahoo/maps/mapsapi/mapsapi-2141.html ).
YQL (Yahoo Query Language, see http://developer.yahoo.com/yql/ )
An SQL-like query language which queries Yahoo web services. For example, if we have a zip code “382025” we can write a YQL query as [ select centroid from geo.places where text="Enter some pin code here" ].
Yahoo encourages us to use YQL in commercial applications. (http://developer.yahoo.com/yql/faq/ )
Highest number of allowed queries compared to all other options (see http://developer.yahoo.com/yql/faq/ ).
Support (?).
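To give a feel for what "simple API calls" look like with one of these providers, here is a minimal sketch against Nominatim's public search endpoint. The helper names, the `user_agent` value and the sample response body are assumptions made for illustration; the public server asks clients to identify themselves, which is why `geocode` sends a User-Agent header. The network call is defined but not executed here.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(area, city):
    """Build a Nominatim search URL for an area name within a city."""
    params = {"q": f"{area}, {city}", "format": "json", "limit": 1}
    return NOMINATIM_SEARCH + "?" + urllib.parse.urlencode(params)


def parse_lat_lon(body):
    """Return (lat, lon) as floats from a Nominatim JSON body, or None."""
    results = json.loads(body)
    if not results:
        return None
    # Nominatim returns coordinates as strings.
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(area, city, user_agent="my-geocoder-example"):
    """Fetch coordinates over the network (defined but not called here)."""
    request = urllib.request.Request(
        build_search_url(area, city), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return parse_lat_lon(response.read().decode("utf-8"))


# Made-up response body in the shape Nominatim returns, for illustration only.
sample = '[{"lat": "12.9352", "lon": "77.6245", "display_name": "Koramangala, Bengaluru"}]'
print(build_search_url("Koramangala", "Bangalore"))
print(parse_lat_lon(sample))
```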
You can use the Google Geocoding service:
REST format:
http://code.google.com/apis/maps/documentation/geocoding/index.html
JavaScript:
http://code.google.com/apis/maps/documentation/javascript/services.html#Geocoding
EDIT: For those too lazy to read, here's the REST format example.
XML response:
http://maps.googleapis.com/maps/api/geocode/xml?address=Bangalore&sensor=false
JSON response:
http://maps.googleapis.com/maps/api/geocode/json?address=Bangalore&sensor=false
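A short sketch of consuming the JSON variant from Python: it builds the same URL as the example above and pulls the coordinates out of a response body. The function names and the sample body (with rounded, illustrative coordinates) are assumptions, not real API output, and current versions of the service also require an API key, which this sketch omits.

```python
import json
import urllib.parse

GEOCODE_JSON = "http://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address):
    """Build the REST URL in the form shown above."""
    query = urllib.parse.urlencode({"address": address, "sensor": "false"})
    return GEOCODE_JSON + "?" + query


def extract_lat_lng(body):
    """Return (lat, lng) of the first result, or None if there is none."""
    data = json.loads(body)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written response fragment with illustrative values, not real API output.
sample = json.dumps({
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 12.97, "lng": 77.59}}}],
})
print(build_geocode_url("Bangalore"))
print(extract_lat_lng(sample))
```

Checking `status` before indexing into `results` keeps an empty or failed lookup from raising an exception.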

Enterprise-grade template printing system

I'm looking for an enterprise-grade template printing system. I'm interested in any software I can get my hands on to evaluate, commercial or not.
What I need is a separate system that receives tags and uses them to print (digitally or on paper) a template such as a contract or an invoice. Templates should be managed by the same software. It should operate via web services or via an enterprise bus (preferably with JMS or MQSeries connectors).
Can I ask for some names and possibly some URLs? Anything will be helpful even if it does not fit the requirements exactly.
Thanks.
This is an old question, but for the Googlers out there: we use a couple of products to render documents written in XSL-FO (a W3C standard for formatting paginated documents, which we generate using XSL) to PDF, PostScript, etc. We use it to show documents online as well as to bulk print a few hundred thousand of them monthly.
RenderX (.NET, Java, whatever) provides a very powerful solution for our bulk printing needs
IBEX PDF Creator (.NET only) for online rendering to PDF
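For readers who have not seen XSL-FO, the sketch below fills a tagged template and wraps the result in a minimal XSL-FO document that a renderer such as the ones above could turn into PDF. It uses Python's standard library rather than the XSL transform described in the answer, and the `render_fo` helper, the `$tag` placeholder syntax and the "A4" master name are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from string import Template

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def render_fo(template_text, tags):
    """Fill $placeholders in template_text and wrap each line in an fo:block."""
    root = ET.Element(fo("root"))
    # Page layout: a single page master with one body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {"master-name": "A4"})
    ET.SubElement(page, fo("region-body"))
    # Content: one page sequence flowing into the body region.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for line in Template(template_text).substitute(tags).splitlines():
        ET.SubElement(flow, fo("block")).text = line
    return ET.tostring(root, encoding="unicode")


invoice = "Invoice $number\nBill to: $customer"
document = render_fo(invoice, {"number": "0001", "customer": "Example Corp"})
print(document)
```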
Calligo is a commercial package from InSystems. Can't reach the web site right now; could be a bad sign.
Then there are these open source possibilities.

Connecting to IMDB

Has anyone done this before? It seems to me that there should be a web service, but I can't find one. I am writing an application for personal use that would just show basic info from IMDb.
The libraries for IMDb seem quite unreliable at present and highly inefficient. I really wish IMDb would just create a webservice.
After a bit of searching I found a reasonable alternative to IMDb. It provides all the basic information such as overview, year, ratings, posters, trailers etc.:
The Movie Database (TMDb).
It provides a webservice with wrappers for several languages and seems reliable so far. The search results have been, for myself, more accurate as well.
There is no webservice available.
But there are enough HTML scrapers written in every language to suit your needs!
I've used the .NET 3.5 Imdb Services opensource project in a few personal projects.
1 minute google results:
Perl: IMDB-Film
Ruby: libimdb-ruby
Python: IMDbPY
The only "API" the IMDb publishes is a set of plain-text data files containing formatted lists of actors, directors, movies, etc. You would likely need to write your own parser unless somebody has released one for your language. Try Google searches like "imdb api" and "imdb parser".
A screen scraper might be useful, but they specifically prohibit scrapers in their terms of use.
Though this was posted over two years ago, here is some simple Python 3 code:
import urllib.request
# Ask for the IMDb ID of the movie on the command line
movie_id = input('Enter the ID of the movie: ')
# imdbapi.com is a third-party service; r=json selects the response format
response = urllib.request.urlopen('http://imdbapi.com/?i=' + movie_id + '&r=json')
print(response.read().decode('utf-8'))
Save it as imdb.py and run it from a shell or terminal.
If you want XML data, just replace json with xml in the URL.
Please note that this uses the imdbapi.com website to return a JSON result; visit that website to view more options.
Here is my own solution using RegEx:
private const string UglyMovieRegex = "(?<=5>|3>)(Cast|Director:|Fun\\sStuff|Genre:|Plot:|Runtime:|Tagline:|Writers:)"
+ "|href=\"[\\w\\d/]+?(Genres|name|character)/([\\w]+?)/\".*?>([.\\-\\s\\w]+)</a>"
+ "|(?<=h\\d>)([.\\w\\s'\\-\"]+)(?=<a\\sc|</d|\\|)";
Regex MovieData = new Regex (UglyMovieRegex, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.Singleline );
IMDb prohibits scrapers and changes the page layout every once in a while, so parsing HTML is an option, but be prepared to adjust your code 2-3 times a year (been there, done that, given up). They do have a fee-based service giving full access to the data, but you'll also need to explain what it is for and convince them you are not building a competing website (I had a link to that, but it seems to have changed and I can't find it now).
Another alternative is to run the IMDb database on your local machine. Java Movie Database imports the IMDb database files, converts them and provides a locally accessible copy of IMDb. IMDb has some functionality which Java Movie Database does not have and vice versa, but if what you're looking for is quick access to all the data it might be worth giving this a try.
Now there is an (undocumented) API, for example http://www.imdb.com/xml/find?json=1&q=Harry+Potter. See Does IMDB provide an API?
TRYNT Heavy Technologies provides (for free) a web service for retrieving basic IMDb data -- check out their site at http://www.trynt.com/trynt-movie-imdb-api/. They also have a separate service for Television data.
There is at least one unofficial IMDb API called IMDb8. It has about 31 endpoints including
actors/list-born-today
actors/get-awards-summary
title/get-plots
title/get-top-crew
etc. Like any other API it is very straightforward to use. I used this API for building a fun trivia project. You can find a tutorial on how to get started here.