Wikipedia API: How to get all pageIDs for all persons with same name? - wiki

Some people on Wikipedia share the same name, such as Rico Rodriguez:
http://en.wikipedia.org/wiki/Rico_Rodriguez
That page lists two people (two pageIDs), but with this API call I only get one pageID:
http://en.wikipedia.org/w/api.php?action=query&titles=Rico_Rodriguez&format=json
How can I get the pageIDs of both people?

As leo mentioned, on Wikipedia you would have to analyze the disambiguation page. However, wikidata.org now collects data sets about the things described on Wikipedia. Wikidata has the notion of "labels" and "aliases", which can be the same for multiple pages and are defined per language. Here is a query that you can use to find all Wikidata entries for "Rico Rodriguez":
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Rico%20Rodriguez&language=en
You can then ask for the "sitelinks" of each of those "data items":
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q552090|Q954614&props=aliases|sitelinks&languages=en
That will give you the corresponding page titles on each Wikipedia language version. You can then go to the respective wiki's API and ask for the page ID.
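The two Wikidata calls above can be sketched in Python roughly as follows. This is a minimal sketch, not a complete client: the helper names (`search_url`, `entities_url`, `item_ids`, `titles_for_site`) are made up for illustration, `format=json` is added so the responses are JSON, and the actual HTTP requests are left to whatever client you prefer.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(name, language="en"):
    """Build the wbsearchentities URL for a label or alias."""
    params = {"action": "wbsearchentities", "search": name,
              "language": language, "format": "json"}
    return WIKIDATA_API + "?" + urlencode(params)


def entities_url(ids, language="en"):
    """Build the wbgetentities URL asking for aliases and sitelinks."""
    params = {"action": "wbgetentities", "ids": "|".join(ids),
              "props": "aliases|sitelinks", "languages": language,
              "format": "json"}
    return WIKIDATA_API + "?" + urlencode(params)


def item_ids(search_response):
    """Collect the item ids (Q...) from a decoded wbsearchentities response."""
    return [hit["id"] for hit in search_response.get("search", [])]


def titles_for_site(entities_response, site="enwiki"):
    """Map each item id to its page title on one wiki, skipping items without one."""
    titles = {}
    for qid, entity in entities_response.get("entities", {}).items():
        link = entity.get("sitelinks", {}).get(site)
        if link:
            titles[qid] = link["title"]
    return titles
```

Each title returned by `titles_for_site` can then be passed to that wiki's own `action=query&titles=...` call to obtain the page ID.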

All MediaWiki page titles are unique, and disambiguation is handled manually by the editors, so there is no way to know for sure whether there are multiple persons with the same name. You can, however, check whether the page is a disambiguation page, as in your example:
https://en.wikipedia.org/w/api.php?action=query&titles=Rico_Rodriguez&prop=pageprops
returns disambiguation under pageprops.
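As a rough sketch, the check could look like this in Python once the response has been decoded from JSON. It assumes `format=json` was added to the request and that the response uses the default layout, in which `query.pages` is an object keyed by page ID; the function name is hypothetical.

```python
def disambiguation_pages(query_response):
    """Return the titles of pages whose pageprops contain 'disambiguation'."""
    pages = query_response.get("query", {}).get("pages", {})
    return [page["title"] for page in pages.values()
            if "disambiguation" in page.get("pageprops", {})]
```

The value stored under the `disambiguation` key is not inspected here; only its presence is used as the signal.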
To get the linked pages, you will have to parse the wiki code, though. Disambiguation links can take almost any form, so there is no easy way to catch them all, but as long as you work within a single language edition, there is a fair chance that the syntax will be more or less consistent. In English Wikipedia, the target titles usually follow one of two patterns: a parenthetical qualifier, as in Rico Rodriguez (musician), or a comma-separated qualifier, as in Kimberley, British Columbia.
A disambiguation page can be very complex, though. For instance: https://en.wikipedia.org/wiki/Joker
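For simple pages, pulling the link targets out of the wiki code could be sketched with a regular expression like the one below. This is only an approximation under the assumption that links use the plain `[[Target]]`, `[[Target|label]]` or `[[Target#Section]]` forms; templates and nested markup are not handled, and the sample wikitext in the usage note is invented.

```python
import re

# [[Target]], [[Target|label]] or [[Target#Section]]; group 1 is the target title.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def link_targets(wikitext):
    """Return the distinct link targets in order of first appearance."""
    seen = []
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip()
        if title not in seen:
            seen.append(title)
    return seen
```

Called on `"* [[Rico Rodriguez (musician)]]\n* [[Rico Rodriguez (actor)|Rico Rodriguez]]"`, this returns the two page titles without the display label.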

Related

Case Insensitive Search parameters for API endpoint

I am working on a project that integrates the PUBG API. On my site, a player can look up stats by player name, platform and season. One issue I am facing is that the player name has to be exact and is case sensitive. I assumed this to be the case from the beginning. However, after searching for a name on this site, I found that it does not require the name to be case sensitive. A post from the PUBG Dev community here also confirmed my initial assumption. So my question is: if the PUBG API requires names to be case sensitive, how can the linked site find a player even when the name is not given in the exact, matching case? For example:
I looked up the player name MyCholula. From the PUBG API page for player lookup, it returns the proper value. When I tried mycholula, it did not and returned a 404. On the linked site above, both combinations seem to work. If spaces or other separators were involved in the name, it would be easy to convert it by assuming that the separated words are all capitalized (a somewhat naive assumption, though). For this name, I don't see any way of converting mycholula to MyCholula. I also tried many other combinations on the linked site above (including different user names I got from my friends) to confirm that it actually returns the expected data for any capitalization of a user name. I also tried other sites like this one, and there it did not work, just as it does not work from the PUBG DEV API page or from my page.
I am really confused as to how they are doing it. The only explanation I can come up with is that they store the player records in their own database, where they can perform an advanced regexp-based search to get the actual name. However, this sounds far-fetched, since there are millions of players and it would require them to know all the player names and associated IDs. Also, as far as I know, it is not possible to use regex or other string manipulation to convert to the actual name, because there can be many combinations (I am not an expert on regex, so I can't be definitive on this).
Any help or suggestions will be greatly appreciated. Thanks.
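To make the hypothesis above concrete: if a site did keep the exact names it has already seen, a case-insensitive lookup would not need any regex, only a normalized key. The following is a purely illustrative sketch of that idea, not a description of how the linked site actually works; the `NameIndex` class is hypothetical.

```python
class NameIndex:
    """Maps a case-folded player name to the exact spelling seen earlier."""

    def __init__(self):
        self._canonical = {}

    def remember(self, exact_name):
        # Store the exact spelling under its case-folded form.
        self._canonical[exact_name.casefold()] = exact_name

    def resolve(self, typed_name):
        # Return the exact spelling, or None if the name was never seen.
        return self._canonical.get(typed_name.casefold())
```

After `index.remember("MyCholula")`, a call to `index.resolve("mycholula")` yields the exact name that the case-sensitive API expects. Note that two distinct names differing only in case would collide in such an index.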

What's the correct way to create a REST service that allows for different types of identifiers?

I need to create a RESTful webservice that allows for addressing entities by using different types of IDs. I will give you an example based on books (which is not what I need to process but I want to build a common understanding this way).
Books can be identified by:
ISBN 13
ID
title
I can create a book by POSTing to /api/v1/books/The%20Bible. This book can then later be addressed by its ISBN /api/v1/books/12312312301 or ID /api/v1/books/A9471IZ1. If I implemented it this way I would need to analyze whatever identifier gets sent and convert it internally.
Is it 'legal' to add the type of identifier to the URL, like /api/v1/books/title/The%20Bible?
It seems that what you need is not simply retrieving resources, but searching for them by certain criteria (in your case, by ISBN, title or ID). In that case, rather than complicate your /books endpoint (which, ideally, should only return books by ID), I'd create a separate /search function. You can then use it to search for books by any field.
For example, you would have:
GET /search?title=bible
GET /search?isbn=12312312301
It can even be easily expanded to add more fields later on.
First: a RESTful URL should contain only nouns, not verbs. You can find many best-practice guides online, for example: RESTful API Design: nouns are good, verbs are bad
One approach would be to detect the id/identifier in code.
The pattern would be, as you already mentioned:
GET /api/v1/books/{id}, like /api/v1/books/12312312301 or /api/v1/books/The%20Bible
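Detecting the identifier type in code could be sketched like this. The rules are assumptions made up for the example: anything consisting only of digits is treated as an ISBN, eight upper-case letters or digits (like A9471IZ1) as an internal ID, and everything else as a title.

```python
import re
from urllib.parse import unquote

ISBN_RE = re.compile(r"\d+")        # assumption: ISBNs are digits only
ID_RE = re.compile(r"[A-Z0-9]{8}")  # assumption: internal IDs look like A9471IZ1


def classify_identifier(path_segment):
    """Return (kind, value) for the last path segment of /api/v1/books/{id}."""
    value = unquote(path_segment)
    if ISBN_RE.fullmatch(value):
        return ("isbn", value)
    if ID_RE.fullmatch(value):
        return ("id", value)
    return ("title", value)
```

The weakness of this approach shows up quickly: a book whose title is a number, or an ID that happens to be all digits, would be classified as an ISBN, so the rules have to be unambiguous for the real identifier formats.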
Another approach, similar to the one from this.lau_, would be to use a query parameter. But I suggest adding the query parameter to the books URL (because only nouns, no verbs):
GET /api/v1/books?isbn=12312312301
The better solution? Not sure…
Because you are selecting “one book by id” (except title), rather than performing a query/search, I prefer the first approach (…/books should return “a collection of books” and .../books/{id} should return only one book).
But maybe someone has a better approach/idea?
Edit:
I suggest not adding the identifier type to the URL; it has a “bad smell”. But it is also a possible approach, and I have seen it in a lot of other APIs. Let’s see if I can find some information on whether it is “ok” or should be avoided.
Edit 2:
See REST API DESIGN - Getting a resource through REST with different parameters but same url pattern and REST - supporting multiple possible identifiers

National Weather Service (NWS) Valid Time Event Code (VTEC) Parser Regular Expression (Regex)

The National Weather Service (NWS) embeds machine readable components in its text bulletins and syndicated format feeds, called Valid Time Event Code (VTEC).
More information on VTEC http://www.nws.noaa.gov/os/vtec/
Example of Text Bulletins: http://www.nws.noaa.gov/view/national.php?prodtype=allwarnings
I am developing a parser to interpret a sequence of VTECs embedded within an NWS bulletin and have a regular expression to capture the logic, which I am happy to share (see below), but I am not 100% sure that I am doing this right.
Specifically,
1. Is there any specification on how many VTECs may be embedded in any one NWS message (or its update)? Usually seeing just one, but if there are multiple, what is the hierarchy, if any - does the last one cancel the previous? Or, do all the VTECs have the same weight?
2. If a Hydrological or H-VTEC is issued, is it always immediately following a P-VTEC?
3. Is there a "parent-child" relationship, in the XML document sense, between an H-VTEC element and P-VTEC element?
4. Can the VTEC be used as a unique identifier for a message or its update? If not, what would be the "primary key" in the database sense? Could perhaps a hash of the VTEC along with bulletin update date be used? Or is any other combination of fields recommended?
The following regular expression is able to pick up the VTEC, assuming any number of P-VTECs may be released and if there is an H-VTEC it will always be preceded by a "parent" P-VTEC.
[/][OTEX][.](NEW|CON|EXT|EXA|EXB|UPG|CAN|EXP|COR|ROU)[.][\w]{4}[.][A-Z][A-Z][.][WAYSFON][.][0-9]{4}[.][0-9]{6}[T][0-9]{4}[Z][-][0-9]{6}[T][0-9]{4}[Z][/]([^/]*[/][\w]{5}[.][[N0-3U]][.][A-Z][A-Z][.][0-9]{6}[T][0-9]{4}[Z][.][0-9]{6}[T][0-9]{4}[Z][.][0-9]{6}[T][0-9]{4}[Z][.](NO|NR|UU|OO)[/])?
The VTEC is described in more detail at: http://www.nws.noaa.gov/directives/sym/pd01017003curr.pdf
In case the link expires, this may be found also by drilling down as follows:
NWS directives. http://www.nws.noaa.gov/directives/
(Click on) Operations and Services.
(Scroll Down) Dissemination.
After reading the document, the answers to #2 and #3 are a resounding YES. An H-VTEC will always be supplemental to an immediately preceding P-VTEC. Regarding #1, multiple P-VTECs are possible and the logic is probably more complex than a regex can weed out. Regarding #4, the answer is almost certainly NO, mainly because the VTEC could be missing from an NWS bulletin, so it does not qualify as a primary key.
So the regex needed to parse out a VTEC string, thanks to Suamere, is most likely:
/[OTEX]\.(NEW|CON|EXT|EXA|EXB|UPG|CAN|EXP|COR|ROU)\.\w{4}\.[A-Z]{2}\.[WAYSFON]\.\d{4}\.\d{6}T\d{4}Z-\d{6}T\d{4}Z/([^/]*/\w{5}\.[N0-3U]\.[A-Z]{2}\.\d{6}T\d{4}Z\.\d{6}T\d{4}Z\.\d{6}T\d{4}Z\.(NO|NR|UU|OO)/)?
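For illustration, here is how that expression might be applied in Python. The pattern is the one above, split across several string literals only for readability; the `find_vtecs` helper and the sample strings in the usage note are invented for this sketch and are not taken from a real bulletin.

```python
import re

VTEC_RE = re.compile(
    r"/[OTEX]\.(NEW|CON|EXT|EXA|EXB|UPG|CAN|EXP|COR|ROU)\.\w{4}\.[A-Z]{2}"
    r"\.[WAYSFON]\.\d{4}\.\d{6}T\d{4}Z-\d{6}T\d{4}Z/"
    r"([^/]*/\w{5}\.[N0-3U]\.[A-Z]{2}\.\d{6}T\d{4}Z\.\d{6}T\d{4}Z"
    r"\.\d{6}T\d{4}Z\.(NO|NR|UU|OO)/)?"
)


def find_vtecs(bulletin):
    """Return one dict per P-VTEC, with its trailing H-VTEC if present."""
    results = []
    for match in VTEC_RE.finditer(bulletin):
        whole = match.group(0)
        hvtec = match.group(2)  # optional group: separator text plus H-VTEC
        pvtec = whole[:len(whole) - len(hvtec)] if hvtec else whole
        results.append({
            "action": match.group(1),
            "pvtec": pvtec,
            "hvtec": hvtec.strip() if hvtec else None,
        })
    return results
```

Given a made-up line such as `/O.NEW.KLWX.FL.W.0001.230601T2000Z-230602T0000Z/` followed on the next line by `/00000.0.ER.000000T0000Z.000000T0000Z.000000T0000Z.OO/`, the function returns a single entry whose `hvtec` field holds the second string. Several P-VTECs in a row produce one entry each with `hvtec` set to `None`.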

Graph API: Number of Comments for Posts Are Inconsistent Among Various API Calls

Hello Graph API experts,
When you call /[post_id], the result contains a "comments" field, which has a "count" field that is supposed to hold the total number of comments for this particular post.
Now, if you call /[post_id]/comments , you get the actual comment data, one by one.
The problem I am facing is that, when I compare the "comments.count" field's value and the number of all of the actual comment data returned, they are different.
What's even worse, if you then look at the same post on Facebook.com's Timeline where you can see the number of comments for that post (i.e. "view all * comments" link), this number is also different from the "comments.count" field value.
And this is not happening to just one post, but to many of them. I observe that it tends to happen more to posts with more than 100 comments (I actually counted all the comments on Timeline, and the number matched the number of actual comment entries returned from the /[post_id]/comments API call).
Is this a normal API behaviour? Which number should I or would you trust if this is the way it is?
When you look at the comment count on a timeline post, you might see, for example, 16 comments, yet when you count the comments on the post manually you find only 15. So where is the missing comment? Is the Facebook count wrong? Not really. Some people change their profile privacy settings, for example to hide their comments from people who are not their friends or who share no mutual friends with them. You cannot get these private comments from the Graph API, but they are still included in the total count. So the solution is to make sure you fetch all the data Facebook provides, compare it with the total count, and show the difference in your application as a count of private comments. I think that is much better.
Welcome to the world of Facebook API programming. Yes, this is normal (but apparently not desired) API behavior. This is one of the inconsistencies we're faced with when programming around their API. CBroe is probably correct in his comment above: it is caused by data inconsistencies between servers in their API cluster.
In addition to this, there are problems with pagination. You can use the offset and limit parameters to say how much data you want and where to start. If you are dealing with posts, you can say offset=0 and limit=50 and it will work, but if you then try offset=100 and limit=50 it might return empty data, while offset=100 and limit=100 returns 100 posts.
The API is just buggy and full of inconsistencies that don't seem to have any solution.
I think we got oversold on the Open Graph. I don't think it's what Facebook told us it would be, and I'm starting to feel the burn from selling it to my boss and finding out that I perhaps can't deliver :(
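The comparison suggested here could be sketched as follows. The code assumes the usual paged response shape with a `data` list and an optional `paging.next` URL; `fetch_page` is a hypothetical callable that takes a URL and returns the decoded JSON, so any HTTP client can be plugged in.

```python
def count_comments(fetch_page, first_url):
    """Follow paging.next links and count every comment actually returned."""
    total = 0
    url = first_url
    while url:
        page = fetch_page(url)
        total += len(page.get("data", []))
        url = page.get("paging", {}).get("next")
    return total


def hidden_comment_count(reported_count, fetched_count):
    """Comments included in the reported total but not returned by the API."""
    return max(reported_count - fetched_count, 0)
```

With a reported count of 16 and 15 comments fetched, `hidden_comment_count(16, 15)` gives 1, which the application can display as one private comment.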

What types of posts are in a feed?

I was wondering what all the possible types of posts I can expect in a feed. The documentation at http://developers.facebook.com/docs/reference/api/post/ mentions that the type field could be link, video, and photo, but that's clearly not a comprehensive list. I know that there are at least the following possible types (because I've seen them): status, link, video, photo, checkin, note, swf, and music.
But are there more that I'm missing? Is there a complete list of these types somewhere?
I know of someone who says that they've seen event attendance and friendship acceptance posts in their home feed (from /me/feed), but I can't seem to recreate that. Are those also types of posts that I could expect?
It appears that currently there is no official list from a FB resource. The closest thing I could find was a related StackOverflow question
I would suggest reaching out to their developer community directly, and, of course, providing a link here once you've done so!
As of 2013-12-18 the documentation says:
name: status_type
description: Type of post
permissions: read_stream
Returns: One of mobile_status_update, created_note, added_photos, added_video, shared_story, created_group, created_event, wall_post, app_created_story, published_story, tagged_in_photo, approved_friend
Doc reference
You can find a list for Graph v4.0 here (search for status_type):
https://developers.facebook.com/docs/graph-api/reference/v4.0/page/feed
The type of a status update. Values include:
added_photos
added_video
app_created_story
approved_friend
created_event
created_group
created_note
mobile_status_update
published_story
shared_story
tagged_in_photo
wall_post
Although these are listed in the Page feed documentation, I guess the same values apply to other feeds as well. I couldn't find the list anywhere else. Also note that mobile_status_update is a legacy value kept for backward compatibility. See this: https://developers.facebook.com/bugs/564448573658836/
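Since the list may not be exhaustive, code that consumes a feed probably should not assume it is. A small sketch, assuming each post is a decoded JSON object that may or may not carry a `status_type` field; the `tally_status_types` helper and the "other" bucket are made up for illustration.

```python
from collections import Counter

# The values listed above for status_type.
KNOWN_STATUS_TYPES = frozenset({
    "added_photos", "added_video", "app_created_story", "approved_friend",
    "created_event", "created_group", "created_note", "mobile_status_update",
    "published_story", "shared_story", "tagged_in_photo", "wall_post",
})


def tally_status_types(posts):
    """Count posts per status_type, grouping missing or unlisted values as 'other'."""
    counts = Counter()
    for post in posts:
        value = post.get("status_type")
        counts[value if value in KNOWN_STATUS_TYPES else "other"] += 1
    return counts
```

Watching the size of the "other" bucket over time is one way to notice when the feed starts returning values that the documentation does not list.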