There is a set of event_ids. I need to find how many friends are attending each event. Facebook does this on its suggested events page, e.g. "3 friends are attending xxx."
Here it is explained for one event, but I want to get a map-like response in a single query, something like:
{
e1:{f2},
e2:{f3,f1},
e3:{f2,f3,f5,f8}
}
Is it possible?
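For what it's worth, a single FQL query can return flat (eid, uid) rows that the client then groups into that map. A minimal Python sketch, assuming the legacy FQL endpoint and the event_member and friend tables are available to your app (the token and event ids below are placeholders):

import requests
from collections import defaultdict

ACCESS_TOKEN = "..."  # placeholder user access token
event_ids = ["111", "222", "333"]  # placeholder event ids

# One query: friends of the current user who are members of any listed event
fql = ("SELECT eid, uid FROM event_member "
       "WHERE eid IN (" + ",".join(event_ids) + ") "
       "AND uid IN (SELECT uid2 FROM friend WHERE uid1 = me())")
resp = requests.get("https://graph.facebook.com/fql",
                    params={"q": fql, "access_token": ACCESS_TOKEN}).json()

# Group the flat rows into the {event: {friends}} shape from the question
friends_by_event = defaultdict(set)
for row in resp.get("data", []):
    friends_by_event[row["eid"]].add(row["uid"])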
This is my first time building a single-table design, and I was wondering if anyone had any advice/feedback/better approaches for the following plan.
I'm going to be building a basic 'meetup' clone: users can create events, and users can then attend those events.
How the entities in the app relate to each other:
Entities (I also added an 'ItemType' attribute to each entity, e.g. ItemType=Event):
Key Structure:
Access Patterns:
Get all attendees for an event
Get all events for a specific user
Get all events
Get a single event
Global Secondary Indexes:
Inverted Index: SK-PK-index
ItemType-SK-Index
Queries:
1. Get all attendees for an event:
PK=EVENT#E1
SK=ATTENDEE# (begins with)
2. Get all events for a specific user
Index: SK-PK-index
SK=ATTENDEE#User1
PK=EVENT# (begins with)
3. Get all events (I feel like there's a much better way to do this; if there is, please let me know)
Index: ItemType-SK-Index
ItemType=Event
SK=EVENT# (Begins With)
4. Get a single event
PK=EVENT#E1
SK=EVENT#E1
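As a sanity check on the key structure, here is a minimal boto3 sketch of queries 1 and 2 above (the table name is a placeholder; the index name is the inverted index from the post):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("MeetupClone")  # placeholder name

# 1. Get all attendees for an event: exact PK, SK begins_with
attendees = table.query(
    KeyConditionExpression=Key("PK").eq("EVENT#E1")
    & Key("SK").begins_with("ATTENDEE#")
)["Items"]

# 2. Get all events for a user: same pattern, flipped via the inverted index
events = table.query(
    IndexName="SK-PK-index",
    KeyConditionExpression=Key("SK").eq("ATTENDEE#User1")
    & Key("PK").begins_with("EVENT#")
)["Items"]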
A couple of questions I had:
When returning a list of attendees, I'd want to be able to get extra data for each attendee, e.g. first/last name, etc.
Based on this example: https://aws.amazon.com/getting-started/hands-on/design-a-database-for-a-mobile-app-with-dynamodb/module-5/
To avoid having to duplicate data and to handle data changes (e.g. a user changing their name), should I use partial normalization and the BatchGetItem API to retrieve details? (See the sketch at the end of this post.)
For fuzzy searches etc., is the best approach to stream this data into e.g. Elasticsearch/OpenSearch?
If so, when building APIs, would you still use DynamoDB for some queries, or just use Elasticsearch for everything?
E.g. for Get all events: would using an ItemType of 'Event' end up creating a hot partition if there's a huge number of events?
Sorry for the long post. Would appreciate any feedback/advice/better ways to do things, thank you!
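On the partial-normalization question above, a rough sketch of what the BatchGetItem step might look like, assuming attendee items carry only a UserId attribute (a hypothetical name) and user profiles live under USER#<id> keys:

import boto3

dynamodb = boto3.resource("dynamodb")
TABLE_NAME = "MeetupClone"  # placeholder name

def fetch_attendee_profiles(attendee_items):
    # Attendee items store only the user id; batch-fetch the USER items to
    # read fresh first/last names instead of duplicating them everywhere.
    # Note: BatchGetItem accepts at most 100 keys per call, so chunk if needed.
    keys = [{"PK": "USER#" + item["UserId"], "SK": "USER#" + item["UserId"]}
            for item in attendee_items]
    resp = dynamodb.batch_get_item(RequestItems={TABLE_NAME: {"Keys": keys}})
    return resp["Responses"][TABLE_NAME]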
Here's the relationship I'm trying to model in DynamoDB:
My service contains posts and topics. A post may belong to multiple topics, and a topic may have multiple posts. All posts have an interest value, adjusted based on a combination of likes and time since posting; interest measures the popularity of a post at the current moment. If a post gets too old, its interest value drops to 0 and stays that way forever (archival).
The REST API endpoints work like this:
GET /posts/{id} returns a post object containing the title, text, author name, a link to the author's REST endpoint (doesn't matter for this example), and the number of likes (the interest value is not included)
GET /topics/{name} should return an object with both a list of the N newest posts of the topic and a list of the N currently most interesting posts
POST /posts/ creates a new post where multiple topics can be specified
POST /topics/ creates a new topic
POST /likes/ creates a like for a specified post (does not actually create an object, just adds the user to the given post object's list of likers, which is invisible to users)
The problem now becomes: how do I create a relationship between topics and posts in DynamoDB?
I thought about adding a list of copies of posts to tag entries in DynamoDB, where every tag has a list of both the newest and the most interesting posts.
One way I could do this is by creating a scheduled CloudWatch job that runs every 10 minutes and loops through every topic object, finding both the most interesting and the newest entries and then replacing the topic's old lists.
Another job would also have to regularly update the interest value of every non-archived post (keep in mind both likes and time affect the interest value).
One problem with this is that a lot of posts in the tag lists would be out of date for up to 10 minutes if the user changes or deletes a post. Likes would also not be properly tracked in the tags' post lists. This could perhaps be solved with transactions, although DynamoDB is limited to 10 objects per transaction.
Another problem is that the add-posts-to-tags job would have to load all the non-archived posts into memory in order to manually sort them by both time and interest, split them up by tag, and then add the first N of both sets to the tag lists every 10 minutes.
I also had another idea: by limiting the number of tags allowed per post to 1, I could use the tag as the partition key, with the post time as the sort key, and use a GSI to add interest as a second sort key.
This does have several downsides though:
very popular tags may be limited to a single partition, since all their posts share one partition key
the tag limit is 1
a CloudWatch job to adjust the interest value of posts may still be required
it would require the use of a GSI, which may lead to dangerous race conditions
But it would have the advantage that there are no copies of the post objects aside from the GSI. It would also allow essentially infinite paging of all posts by date, instead of being limited to just the N newest posts.
So what is a good approach here? It seems both of my solutions have horrible dealbreakers. Is this just one of those problems that NoSQL simply can't solve?
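For what it's worth, here is a sketch of what the second idea could look like as a table definition: tag as partition key, post time as sort key, and a GSI keyed on interest. All names are illustrative:

import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="Posts",  # placeholder name
    AttributeDefinitions=[
        {"AttributeName": "Tag", "AttributeType": "S"},
        {"AttributeName": "PostTime", "AttributeType": "N"},
        {"AttributeName": "Interest", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "Tag", "KeyType": "HASH"},       # partition: the one tag
        {"AttributeName": "PostTime", "KeyType": "RANGE"}, # sort: newest posts
    ],
    GlobalSecondaryIndexes=[{
        "IndexName": "Tag-Interest-index",  # most interesting posts per tag
        "KeySchema": [
            {"AttributeName": "Tag", "KeyType": "HASH"},
            {"AttributeName": "Interest", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    BillingMode="PAY_PER_REQUEST",
)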
You are trying to model relational data using a non-relational DB. To do this, I would use two types of DB.
I would store the post information in DynamoDB; in your example that would be:
GET /posts/{id}
POST /posts/
POST /likes/
For the topic-related information, I would use Elasticsearch (Amazon Elasticsearch Service):
GET /topics/{name}: the search index would store the full topic info as well as the post IDs, plus the relevant fields you want to search on (in your case, the update date, to get the most recent posts)
What this entails is a background process (in DynamoDB this can be done via streams) that picks up changes in DynamoDB (new posts, updates to like counts, etc.) and populates the search index; a sketch follows below.
Note: this could also be solved using a graph DB, but for scaling purposes it is better to separate the source of the data (posts) from the data relations (topics).
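A rough sketch of such a stream consumer as a Lambda handler, assuming the table's stream is configured to include new images; the endpoint, index name, and attribute names are all assumptions, and error handling is omitted:

import json
import urllib.request

ES_ENDPOINT = "https://search-mydomain.example.com"  # placeholder ES endpoint

def handler(event, context):
    # Triggered by the DynamoDB stream: mirror post changes into the index
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        image = record["dynamodb"]["NewImage"]  # DynamoDB attribute-value map
        post_id = image["PostId"]["S"]          # hypothetical attribute names
        doc = {
            "title": image["Title"]["S"],
            "topics": [t["S"] for t in image["Topics"]["L"]],
            "likes": int(image["Likes"]["N"]),
        }
        req = urllib.request.Request(
            ES_ENDPOINT + "/posts/_doc/" + post_id,
            data=json.dumps(doc).encode(),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)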
I'm trying to get a list of friends who have checked in or posted a status with a place attached for a particular Facebook place, as well as any friends who've been tagged in those posts.
With Facebook phasing out check-ins, I'm using the location_post table, which seems to contain the data I need anyway.
This is what I've tried:
SELECT author_uid,tagged_uids FROM location_post WHERE page_id='116076945072769'
I'm only getting results for the two posts that I made at that place, even though I've just made a post from another account (which I'm friends with) with that particular place attached. Also, when I go to the Facebook page for that place, it lists this other user under 'Friends who've recently visited'.
Trying to get all location_post objects where the author is that user (let's call him Friend1) doesn't give me any results:
SELECT author_uid,tagged_uids FROM location_post WHERE author_uid=<Friend1>
(Note: this user is my friend; I see them when I fetch my list of friends.)
Trying to get a list of all friends who have authored or have been tagged in a location_post, I get a surprisingly small list:
SELECT author_uid,tagged_uids,page_id FROM location_post WHERE (author_uid IN (SELECT uid2 FROM friend WHERE uid1=me())) OR (author_uid = me())
In particular, I see 10 entries for a particular user (Friend2). However, when I try this:
SELECT author_uid,tagged_uids,page_id FROM location_post WHERE author_uid=<Friend2>
I get 30 results. Suspiciously round numbers, too.
Any ideas on what could be causing the inconsistency? And why wouldn't I be seeing location_posts from one particular friend?
I tried requesting all possible permissions, but it didn't help. This is all in the FQL explorer.
Thanks!
I would like to query the earliest posts of a Facebook user using FQL or the Graph API. The big issue is that, by default, Facebook limits the number of returned items, which are ordered by descending time.
I know I can limit my query with until, but I don't know what date to put in, because I have no idea when my user became a Facebook member. I would have to search like this:
find post until Jan 2006
if null, then find post until Jan 2007
if null, then find post until Jan 2008
....
which I hate so much.
Is there a smarter way to find a user's earliest posts?
First off, to put it bluntly: it's near impossible to have an all-encompassing program that determines when a user joined Facebook. I know from your past questions that you have been trying, but many have tried before you; it's not possible.
For example, what happens if no one writes anything on my wall from the date I joined until a year after? That indicator becomes pretty inaccurate, doesn't it?
Anything smarter is based on assumptions that may or may not hold true.
e.g.
Assumption 1: every Facebook user publishes a post on or near the date they joined.
This gives an initial guess based on A1.
Assumption 2: given A1, any post by a friend on the user's wall that was made before the unix time returned by A1 is earlier in date.
This will always be true as long as A1 holds.
All of this falls apart when there is a year between actual activity and the join date.
You can minimize the number of API calls by requesting less data per item and more items per call:
/me/feed?fields=created_time&limit=200
Then you page until there is no next paging parameter left.
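A minimal paging loop in Python (the token is a placeholder; the next link comes straight from the Graph API response):

import requests

url = "https://graph.facebook.com/me/feed"
params = {"fields": "created_time", "limit": 200, "access_token": "..."}

oldest = None
while url:
    data = requests.get(url, params=params).json()
    for post in data.get("data", []):
        oldest = post["created_time"]  # pages arrive newest-first
    url = data.get("paging", {}).get("next")  # absent on the last page
    params = {}  # the next link already carries all query parameters

print(oldest)  # created_time of the earliest post seen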
If you are indeed trying to find when did a user join Facebook, I agree with phwd's answer.
The best way I have found (which is also cheaper than having to iterate through tons of posts) is accessing the user's earliest profile pictures. This makes the assumption that a user posts a profile picture soon after creating their account.
Once you have access to the "Profile Pictures" album, you can use its created_time field (or sort the individual photos in the album by created_time).
Even if the earliest photo was deleted, what are the chances that the user stays without any profile picture for a long time?
Reference:
https://developers.facebook.com/docs/graph-api/reference/v2.0/album
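A sketch of that lookup, assuming the albums edge is readable with a suitable token (placeholder below) and that the profile-pictures album is the one whose type is "profile":

import requests

params = {"fields": "type,created_time", "access_token": "..."}  # placeholder
albums = requests.get("https://graph.facebook.com/me/albums",
                      params=params).json()["data"]

# The album's created_time is a reasonable lower bound on account age
profile_album = next(a for a in albums if a.get("type") == "profile")
print(profile_album["created_time"])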
I'm pretty new to REST, but as far as I have gathered, the following URLs conform to REST principles, with the resources laid out as follows:
/user/<username>/library/book/<id>/tags
    user/<username>  -  user resource, with username as a variable
    library/book     -  many-to-one collection (books)
    <id>             -  book id
    tags             -  many-to-one collection (tags)
GET /user/dave/library/book        //retrieves a list of book ids
GET /user/dave/library/book/1      //retrieves info on book id=1
GET /user/dave/library/book/1/tags //retrieves the tags collection (book id=1)
However, how would one go about optimizing this example API? Say, for example, I have 10K books in my library and I want to fetch the details of every book. Should I really force an HTTP call to /library/book/<id> for every id returned by /library/book? Or should I allow multiple ids as parameters, /library/book/<id1>,<id2>..., and do bulk fetching with 100 ids at a time?
What do the REST principles say about this kind of situation? And what are your opinions?
Thanks again.
This is strictly a design matter.
I could define the book resource to accept a bookList parameter and use it like this:
GET /user/dave/library/book?bookList=...
How you further specify the bookList argument is really a matter of what kind of usage you envisage for this resource. You could have, e.g.:
GET /user/dave/library/book?bookList=1-10
GET /user/dave/library/book?bookList=1,2,5,20-25
or you could simply page through all of the books:
GET /user/dave/library/book?page=7&pagesize=50
But to my mind, the form with a long list of "random" ids in particular seems pretty unfit. Maybe I would instead define a filter parameter so I can specify:
GET /user/dave/library/book?filter=key,value&filter=key,value
As to your question about the HTTP URL length limit: the standard does not set one, but browsers may vary... look at this S.O. topic.
To be more strictly RESTful, the query parameter could be specified through HTTP headers, but the general idea I wanted to convey does not change.
Hope this seems suitable to you...
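A small sketch of parsing such a bookList value server-side, supporting both single ids and ranges (purely illustrative):

def parse_book_list(book_list):
    # "1,2,5,20-25" -> [1, 2, 5, 20, 21, 22, 23, 24, 25]
    ids = []
    for part in book_list.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids

print(parse_book_list("1,2,5,20-25"))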
The above looks good, but I would change to plural names; it reads better:
/users/{username}/books/{bookId}
What I don't understand is the use case for passing a comma-separated list of ids. The question is: how do you get the ids? I guess there are semantics behind the list of ids, i.e. they represent the result of a filter. So instead of passing ids I would go for a search API. A simplistic example:
/users/dave/books?purchasedAfter=2011-01-01
If you want to iterate through your 10K collection of books, use paging parameters.
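For example, a client could walk the whole collection by combining the filter with paging parameters; the endpoint and parameter names here are illustrative:

import requests

BASE = "http://example.org/users/dave/books"  # hypothetical endpoint
books, page = [], 1
while True:
    batch = requests.get(BASE, params={"purchasedAfter": "2011-01-01",
                                       "page": page, "pagesize": 50}).json()
    if not batch:  # an empty page means we've paged past the end
        break
    books.extend(batch)
    page += 1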
This is just my opinion:
GET /user/dave/library/book/IDList //retrieves a list of book ids
or
GET /user/dave/library/bookID //retrieves a list of book ids
GET /user/dave/library/book //retrieves a list of books
GET /user/dave/library/book/1 //retrieves info on book id=1
GET /user/dave/library/book/1-3 //retrieves info on books with id>=1 and id<=3
GET /user/dave/library/book/1/tags //retrieves the tags collection (book id=1)
You can use a paginator
Some RESTful APIs use a paginator for huge resources, like:
http://example.org/api/books?page=2
The server delivers, for example, 100 records (in this case books) per page, and you can sort the books using a sortby parameter in your GET request. With the above request you would get books 101-200 (if there are that many in the database). The response can also tell you the total number of books and pages, and what the next and previous pages are, but then you move more toward HATEOAS.
Otherwise, if you want to get certain ids, I would do it like this:
http://example.org/books?id[]=2&id[]=5&id[]=7&id[]=21
A GET request with an array of ids (id = [2,5,7,21]) which returns the books with those respective ids.
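Server-side, repeated id parameters can be read straight from the query string; a standard-library sketch:

from urllib.parse import urlparse, parse_qs

url = "http://example.org/books?id[]=2&id[]=5&id[]=7&id[]=21"
query = parse_qs(urlparse(url).query)

# parse_qs collects repeated parameters into a list: {'id[]': ['2', '5', '7', '21']}
ids = [int(i) for i in query["id[]"]]
print(ids)  # [2, 5, 7, 21]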