Musicbrainz querying artist and release - web-services

I am trying to get an artist and their albums. After reading this page https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2, I created the following query to get Michael Jackson's albums:
http://musicbrainz.org/ws/2/artist/?query=artist:michael%20jackson?inc=releases+recordings
My understanding is that adding ?inc=releases+recordings to the end of the URL should return Michael Jackson's albums. However, this doesn't seem to return the correct results, or I can't narrow the results down. I then thought of using the {MBID}, but again that isn't returned by the artist query (which is why I'm trying to use inc in my query):
http://musicbrainz.org/ws/2/artist/?query=artist:michael%20jackson
Can anyone suggest where I'm going wrong?

You're not searching for the correct entity. What you want is the discography, not the artist's info. Additionally, the query field syntax is incorrect (you must use Lucene search syntax).
Here is what you're looking for:
http://musicbrainz.org/ws/2/release-group/?query=artist:"michael jackson" AND primarytype:"album"
We're targeting the release-group entity to get the albums, searching for a specific artist and filtering the results down to albums only (accepted values are: album, single, ep, other).
There are more options to fit your needs; for example, you can filter the type of album using the secondarytype field. Here is the query to retrieve only live albums:
http://musicbrainz.org/ws/2/release-group/?query=artist:"michael jackson" AND primarytype:"album" AND secondarytype:"live"
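A minimal Python sketch of building these search URLs, assuming you want JSON back (`fmt=json`) instead of the default XML; the helper name is hypothetical. Letting `urlencode` do the escaping avoids hand-writing `%20` and quote characters.

```python
from urllib.parse import urlencode

# Search endpoint for the release-group entity.
SEARCH_URL = "https://musicbrainz.org/ws/2/release-group/"


def release_group_search_url(artist, primary_type="album", secondary_type=None):
    """Build a Lucene-syntax release-group search URL for one artist."""
    clauses = [f'artist:"{artist}"', f'primarytype:"{primary_type}"']
    if secondary_type is not None:
        # Lucene fields are written field:value, e.g. secondarytype:"live".
        clauses.append(f'secondarytype:"{secondary_type}"')
    params = {"query": " AND ".join(clauses), "fmt": "json"}
    # urlencode percent-encodes the quotes, colons and spaces of the query.
    return SEARCH_URL + "?" + urlencode(params)


print(release_group_search_url("michael jackson"))
print(release_group_search_url("michael jackson", secondary_type="live"))
```

When you actually fetch these URLs, MusicBrainz asks clients to send a meaningful User-Agent header and to respect its rate limit.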
Here is the doc:
https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2/Search
Note that to use MB's API you need to understand how it is structured, especially the relations between release-group, release and medium.
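On the original question: as far as the Version 2 docs describe it, `inc=` applies to lookups and browses rather than to `?query=` searches, and each artist in the search results does carry its MBID as `id`. A sketch of the two follow-up URLs in Python, where `mbid` is whatever id the artist search returned; the helper names are hypothetical, and browse requests cap `limit` at 100.

```python
from urllib.parse import urlencode

WS_ROOT = "https://musicbrainz.org/ws/2"


def artist_lookup_url(mbid):
    """Look up one artist by MBID, embedding its release groups via inc=."""
    return f"{WS_ROOT}/artist/{mbid}?" + urlencode({"inc": "release-groups", "fmt": "json"})


def album_browse_url(mbid, limit=100, offset=0):
    """Browse the release groups linked to one artist, restricted to albums."""
    if not 1 <= limit <= 100:
        raise ValueError("browse limit must be between 1 and 100")
    params = {"artist": mbid, "type": "album", "limit": limit, "offset": offset, "fmt": "json"}
    return f"{WS_ROOT}/release-group?" + urlencode(params)
```

To page through a long discography, call `album_browse_url` repeatedly, increasing `offset` by `limit` each time.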

Related

DynamoDB GSI data modelling for an articles app

I want to create an articles application using serverless (AWS Lambda + DynamoDB + S3 for hosting the FE).
I have some questions regarding the "1 table approach".
The access patterns I want to support:
1. Get the latest (6) articles sorted by date
2. Get an article by id
3. Get the previous/next article relative to the opened article (based on creation date)
4. Get related articles by tags
5. Get comments by article
I have created an initial spreadsheet for the information:
The first problem is with action 1: I cannot get all the articles sorted by date. I've used the date as the SK for articles, but because each article has its own PK (article-1, article-2, and so on), I don't know how to fetch all the articles by SK alone.
I then tried creating an LSI, but then I noticed that an LSI must have the same PK as the table, so I can select based on the LSI type = 'ARTICLE', but I still cannot select them ordered by date (the entities_sort value).
I know AWS says it's good for the PK to be unique, but then how do you group the data in this case?
I've created a GSI
This lets me get articles by type (GSI2PK = 'ARTICLE') sorted by entities_sort (GSI2SK), but isn't there a better way of achieving this? Having your articles as a PK in a table, but somehow still being able to get them sorted by date?
With GSI1PK and GSI1SK set up this way, I can get all the comments for an article using a reverse lookup, so that's good.
But I still don't know how to implement number 3 (get the previous/next article relative to the opened article, based on creation date): get an article by id, check its creation date (entities_sort), then somehow get the articles immediately before and after that creation date. Is there a function in DynamoDB that can do this for me?
In my approach I try to query/process as few items as possible so I don't want to use filter functions, rather partition my information.
My question is: how should I achieve 1 and 3? And isn't creating two GSIs for so few actions overkill?
What is the pattern to have articles on a PK, unique with ids, but still being able to get them sorted by creation date?
Thank you
So what I've ended up doing is:
My access patterns in detail are:
Get any Article by Id (for edit/delete)
Get any Comment by Id (for edit/delete)
Get any Tag by Id (for edit/delete)
Get all Articles ordered by date
Get all the Tags of an Article
Get all comments for an article, sorted by date
Get all Articles that have a specific tag, ordered by date (because I want to show only the last 3 ones)
This is how I've implemented my model, and I can get all the information needed.
Also, all my data is partitioned and the queries are really efficient: I always get exactly what I need, and the ScannedDocuments value always equals the number of returned objects.
The Global Secondary Index lets me query by article id and get all the comments and tags of that article.
I've solved the many-to-many relationship between Tags and Articles with a new record at the end:
tag_id, article_date, article_id, tag_id
So if I want all articles that have a specific tag, sorted by date, I can query the table's PK and sort by SK. If I want to get a single tag (for edit/delete), I can use the GSI with article_id, tag_id and I get the relation between them.
To get all articles sorted by date, I query PK: ARTICLE, with an optional condition on the SK if I only want the ones after a certain date.
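For pattern 3 (previous/next article), DynamoDB has no dedicated "neighbour" function, but a key condition on the sort key combined with `Limit` and `ScanIndexForward` reads at most one item in each direction. A minimal sketch that only builds the request parameters for the low-level boto3 client's `query`, assuming a hypothetical table named `articles`, article items under PK `ARTICLE`, and SK values stored as ISO 8601 date strings so that string order equals date order:

```python
# Hypothetical table name; PK/SK follow the model described above.
TABLE = "articles"


def latest_articles_query(limit=6):
    """Newest articles first: PK = ARTICLE, descending on the date SK."""
    return {
        "TableName": TABLE,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": "ARTICLE"}},
        "ScanIndexForward": False,  # descending sort-key order
        "Limit": limit,
    }


def neighbour_query(article_date, direction):
    """Previous (older) or next (newer) article relative to article_date."""
    if direction not in ("prev", "next"):
        raise ValueError("direction must be 'prev' or 'next'")
    op = "<" if direction == "prev" else ">"
    return {
        "TableName": TABLE,
        "KeyConditionExpression": f"PK = :pk AND SK {op} :d",
        "ExpressionAttributeValues": {":pk": {"S": "ARTICLE"}, ":d": {"S": article_date}},
        # prev walks backwards from the date, next walks forwards.
        "ScanIndexForward": direction == "next",
        "Limit": 1,
    }
```

With boto3 installed, these would be passed as `boto3.client("dynamodb").query(**neighbour_query("2020-05-01T10:00:00Z", "prev"))`. Because no filter expression is involved, `Limit=1` really returns the single closest article.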
For all the comments and tags of an article, I can use the GSI with article_link_pk: article_id and I get all comments and tags. If I want only comments, I can add article_link_sk: begins_with(article_link_sk, '2020'); this way I get only comments, without tags.
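The same approach for the GSI lookup, again only building parameters for the low-level client. The index name `article_link_index` and the table name are hypothetical; the key attribute names come from the description above. Note that a `'2020'` prefix only matches comments whose sort key starts with 2020, so the prefix has to follow whatever the comment sort keys actually look like.

```python
# Hypothetical table and index names.
TABLE = "articles"
GSI = "article_link_index"


def article_children_query(article_id, sk_prefix=None):
    """All comments and tags of one article via the GSI; optionally only
    the items whose GSI sort key starts with sk_prefix."""
    condition = "article_link_pk = :a"
    values = {":a": {"S": article_id}}
    if sk_prefix is not None:
        # begins_with is allowed on the sort key in a key condition.
        condition += " AND begins_with(article_link_sk, :p)"
        values[":p"] = {"S": sk_prefix}
    return {
        "TableName": TABLE,
        "IndexName": GSI,
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": values,
    }
```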
The data model in NoSQL Developer looks like this:
The GSI reverse lookup looks like this:
It's been a journey, but I feel like I finally have a grasp of data modelling in DynamoDB.

Django Query, Distinct and Order_By combination not working

There are similar questions here but I haven't been able to find one that helps me.
I have two models, Chat and Post.
There are multiple chats, and each chat has multiple posts attached to it.
I'm trying to get the latest post for each chat.
Post.objects.order_by('-id').distinct('Chat')
Order the posts by id (so the newest post is first), and then take the distinct ones based on the chat.
But since order_by and distinct don't match, I'm getting the error:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
So how exactly do I go about doing this? Raw SQL? Thanks!
If you use distinct() on a related field, the ordering must start with that same field:
Post.objects.order_by('chat', '-id').distinct('chat')
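Django's `distinct(*fields)` is PostgreSQL-only and becomes `SELECT DISTINCT ON (...)`, which keeps the first row of each group after sorting. A small pure-Python illustration (not Django code, and with made-up rows) of what `ORDER BY chat, id DESC` plus `DISTINCT ON (chat)` returns:

```python
from itertools import groupby
from operator import itemgetter


def latest_post_per_chat(posts):
    """Mimic ORDER BY chat, id DESC + DISTINCT ON (chat)."""
    # Sort by chat ascending, then id descending (the '-id' in order_by).
    ordered = sorted(posts, key=lambda p: (p["chat"], -p["id"]))
    # DISTINCT ON keeps only the first row of each run of equal chat values.
    return [next(group) for _, group in groupby(ordered, key=itemgetter("chat"))]


posts = [
    {"id": 1, "chat": 1}, {"id": 2, "chat": 2},
    {"id": 3, "chat": 1}, {"id": 4, "chat": 2}, {"id": 5, "chat": 1},
]
print(latest_post_per_chat(posts))
```

This shows why the ordering must lead with `chat`: the "first row per chat" is only well defined once rows of the same chat sit next to each other. The result comes back ordered by chat, so if you want the newest posts first overall you need to re-sort it afterwards.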
Also you can look at this question

Information of re-shared status

I have been working with the Facebook Graph API for a few days. I am trying to extract a user's statuses and information about any reshares. I can easily get a user's statuses using the fields=id,name,statuses query, but I could not find any information about resharing. I found a status field called sharedposts, but could not understand what it actually does. Can anyone enlighten me about how to collect information about resharing (who reshared, when, and from what location)? I used an access token with the user_status permission.
The sharedposts field applies to a status id. For example, the status id 10151794781777494 is from a status update by TheKrazyCouponLady which has been shared 4 times. This query:
/10151794781777494?fields=sharedposts
Will return all the information about the users that have shared it. If you want to limit the returned fields to the name and id of the sharer, and the time and location it was shared, you could do this:
/10151794781777494?fields=sharedposts.fields(from,created_time,place)
Although I expect there won't be any location data most of the time.
To find the status id in the first place, you could just query the statuses field for a particular user. Again, using TheKrazyCouponLady (uid 255919387493) as an example:
/255919387493?fields=statuses
To get just the ids:
/255919387493?fields=statuses.fields(id)
As an alternative, you may want to consider querying the user's posts instead. The advantage of using posts is that you can get back the share count for each post in that query.
/255919387493?fields=posts.fields(id,shares)
If the share count on a post is zero, then there is obviously no need to run another query to retrieve the users that have shared that post.
The downside of using posts is that the post id is slightly different from a status id. You'll see ids that look like this:
255919387493_10151794781777494
The first half of that string is the user id of the post owner. The second half is the actual status id. If you want to query the sharedposts field for the post, you first have to extract the second half (the status id) and use that for the query.
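A small Python sketch of that extraction step; the helper names are hypothetical, and the returned path simply reproduces the sharedposts query shown earlier.

```python
def split_post_id(post_id):
    """Split an '<owner id>_<status id>' post id into its two halves."""
    owner_id, sep, status_id = post_id.partition("_")
    if not sep or not owner_id or not status_id:
        raise ValueError(f"not an owner_status post id: {post_id!r}")
    return owner_id, status_id


def sharedposts_path(post_id, fields=("from", "created_time", "place")):
    """Graph API path querying sharedposts on the status id (second half)."""
    _, status_id = split_post_id(post_id)
    return f"/{status_id}?fields=sharedposts.fields({','.join(fields)})"
```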
Having said that, it occurs to me that you could actually retrieve all the information you need in one go if you chain the statuses query and the sharedposts query together. For example, something like this:
/255919387493?fields=statuses.fields(id,message,sharedposts.fields(from,created_time,place))
That will return the status id and message text for each status from that user, and the user details, create time and location for each person that shared each of those statuses.
Even with paging, though, that is likely to be a fairly slow query, so I'm not sure it's such a good idea, but it's worth considering.
According to version 2.1 of the API and the documentation here:
https://developers.facebook.com/docs/graph-api/reference/v2.1/post
there is a new edge called "sharedposts", as described here: https://developers.facebook.com/docs/graph-api/reference/v2.1/object/sharedposts
This reference describes the /sharedposts edge that is common to
multiple Graph API nodes. The structure and operations are the same
for each node.
This edge represents any posts where the original object was shared on
Facebook.
If the post type is photo, sharedposts returns empty, because the object ID is different from the post ID:
/317380948302131_847979698575584 => Object : 847979378575616
/317380948302131_847979698575584/sharedposts?fields=from,via
Querying the object ID works as expected:
/847979378575616/sharedposts?fields=from,via
The only problem is that if the object is a shared post, the results also include all shares of the original post object, and no via node is present.
I struggled for some time to understand why the API only sometimes returned sharedposts.

String Ids are not quoted in dependent batch-request to api. Workaround?

I'm currently trying to query the Facebook API to retrieve some data via a batch request with two FQL queries.
One of the queries fetches a set of album ids in the form of:
Select aid FROM album WHERE ...
While the other one tries to retrieve photos for the found albums:
SELECT ... FROM photo WHERE aid IN ({result=album_ids:$.*.aid})
Where 'album_ids' is the name of the first query.
Most of the time this works perfectly, but sometimes an album comes along with an aid containing a '_', which should be perfectly fine since the documentation specifies the aid as a string.
However, judging by the Facebook API's error, the JSONPath in the second query does not quote the ids:
Parser error: unexpected '_xxxxx' at position xx
...
SELECT ... FROM photo WHERE aid IN (10000xxxxxxxxxx_xxxxx)
The json result for the first query clearly has them quoted:
[{\"aid\":\"xxxxxxxxxxxxxxxxxxx\"},{\"aid\":\"10000xxxxxxxxxx_xxxxx\"},...]
Am I missing something here, or does Facebook wrongly fail to quote the ids in the second query, even though they are clearly strings?
As far as I can see in the Facebook API and JSONPath specs, this should work.
Or is there a workaround to get this to behave as expected (other than doing the quoting client-side with two separate requests)?
Right now I'm trying to change my query as suggested here: Quoting/escaping jsonpath elements for in clause of dependent fql queries
But maybe there is a way that doesn't require completely restructuring the queries themselves.
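If the batch route can't be made to quote the ids, the client-side fallback mentioned in the question is small. A sketch in Python, assuming aids only ever contain letters, digits and underscores (anything else is rejected rather than escaped) and that FQL accepts double-quoted string literals; the helper name is hypothetical.

```python
import re

# Assumption: album aids contain only letters, digits and underscores.
_AID_PATTERN = re.compile(r"[0-9A-Za-z_]+")


def aid_in_clause(aids):
    """Build 'aid IN ("a", "b")' with every aid quoted as a string literal."""
    if not aids:
        # Treat an empty list as an error instead of emitting 'aid IN ()'.
        raise ValueError("no album ids to query")
    quoted = []
    for aid in aids:
        if not _AID_PATTERN.fullmatch(aid):
            raise ValueError(f"unexpected characters in aid: {aid!r}")
        quoted.append(f'"{aid}"')
    return "aid IN (" + ", ".join(quoted) + ")"
```

The aids would come from the first query's JSON response, and the clause is appended after `WHERE` in the photo query, at the cost of the second round trip the question hoped to avoid.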

REST API question on how to handle collections as effective as possible while still conforming to the REST principles

I'm pretty new to REST, but as far as I have gathered, the following URLs conform to the REST principles. The resources are laid out as follows:
/user/<username>/library/book/<id>/tags
- <username>: user resource, with the username as a variable
- book: many-to-one collection (books)
- <id>: book id
- tags: many-to-one collection (tags)
GET /user/dave/library/book //retrieves a list of book ids
GET /user/dave/library/book/1 //retrieves info on book id=1
GET /user/dave/library/book/1/tags //retrieves tags collection (book id=1)
However, how would one go about optimizing this example API? Say, for example, I have 10K books in my library and I want to fetch the details of every one. Should I really force an HTTP call to /library/book/<id> for every id returned by /library/book? Or should I allow multiple ids as parameters, /library/book/<id1>,<id2>..., and bulk-fetch 100 ids at a time?
What do the REST principles say about this kind of situation, and what are your opinions?
Thanks again.
This is strictly a design matter.
I could define a book resource and use it like this:
GET /user/dave/library/book?bookList=...
How you further specify the bookList argument is really a matter of how you expect this resource to be used. You could have, e.g.:
GET /user/dave/library/book?bookList=1-10
GET /user/dave/library/book?bookList=1,2,5,20-25
Or you could simply page through all of the books:
GET /user/dave/library/book?page=7&pagesize=50
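A hypothetical server-side expansion of the bookList values used above (`1-10`, `1,2,5,20-25`), in Python. The function name and the 100-id cap are illustrative choices, not part of any framework; the cap keeps a single request from expanding into an enormous response.

```python
def parse_book_list(spec, max_ids=100):
    """Expand a bookList value such as '1,2,5,20-25' into a list of ids."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range like '20-25' is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            new_ids = range(start, end + 1)
        else:
            new_ids = [int(part)]
        # Check the size before extending, so huge ranges are never built.
        if len(ids) + len(new_ids) > max_ids:
            raise ValueError(f"more than {max_ids} ids requested")
        ids.extend(new_ids)
    return ids
```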
To my mind, though, the form with a long list of "random" ids in particular seems ill-suited. Maybe I would instead define a filter parameter so I can specify:
GET /user/dave/library/book?filter=key,value&filter=key,value
As to your question about the HTTP URL length limit, the standard does not set one, but browsers vary; look at this S.O. topic.
To be more strictly RESTful, the query parameter could be specified through HTTP headers, but the general idea I wanted to convey does not change.
Hope this seems suitable to you...
The above looks good, but I would switch to plural names; they read better:
/users/{username}/books/{bookId}
What I don't understand is the use case for passing a comma-separated list of ids. The question is: how do you get to the ids? I guess there are semantics behind the list of ids, i.e. they represent the result of a filter. So instead of passing ids, I would go for a search API. Simplistic example:
/users/dave/books?purchasedAfter=2011-01-01
If you want to iterate through your 10K collection of books, use paging parameters.
This is just my opinion:
GET /user/dave/library/book/IDList //retrieves a list of book ids
or
GET /user/dave/library/bookID //retrieves a list of book ids
GET /user/dave/library/book //retrieves a list of books
GET /user/dave/library/book/1 //retrieves info on book id=1
GET /user/dave/library/book/1-3 //retrieves info on book id>=1 and id <=3
GET /user/dave/library/book/1/tags //retrieves tags collection (book id=1)
You can use a paginator
Some RESTful APIs use a paginator for large resources, like:
http://example.org/api/books?page=2
The server delivers, for example, 100 records (in this case books) per page, and you can sort the books using a sortby parameter in your GET request. With the above request you would get books 101-200 (if there are that many in the database). The response can also tell you the number of books and pages, and what the next and previous pages are, but then you are moving toward HATEOAS.
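The page arithmetic behind that example, as a Python sketch with hypothetical function names: page 2 at 100 records per page covers records 101-200, and a response could advertise the page count and neighbouring pages.

```python
def page_window(page, page_size=100, total=None):
    """Map a 1-based page number to the 1-based (first, last) record numbers.

    Returns None when total is known and the page starts past the end.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    first = (page - 1) * page_size + 1
    last = page * page_size
    if total is not None:
        if first > total:
            return None
        last = min(last, total)
    return first, last


def page_links(page, page_size, total):
    """Page count plus prev/next page numbers a response could advertise."""
    pages = max(1, -(-total // page_size))  # ceiling division
    return {
        "pages": pages,
        "prev": page - 1 if page > 1 else None,
        "next": page + 1 if page < pages else None,
    }
```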
Otherwise, if you want to get specific ids, I would do it like this:
http://example.org/books?id[]=2&id[]=5&id[]=7&id[]=21
A GET request with an array of ids (id = [2,5,7,21]) that returns the books with those ids.
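A small sketch of how a server might read those ids with the Python standard library, assuming the PHP-style `id[]=2&id[]=5` convention (plain repeated `id=2&id=5` is accepted too); the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def requested_ids(url):
    """Collect integer ids from ?id[]=2&id[]=5 or ?id=2&id=5."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs keeps repeated keys as lists, in their original order.
    raw = params.get("id[]", []) + params.get("id", [])
    return [int(value) for value in raw]
```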