Or, I'm fine downloading all the cities in a country, and searching in memory.
It's actually quite simple but not (obviously) documented... Just query http://graph.facebook.com/106078429431815 with 106078429431815 being the city id (London in this example).
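For example, a quick Python sketch of that lookup (newer Graph API versions also require an access token; the one below is a placeholder):

    import requests

    # Fetch the city object by its id (106078429431815 = London in this example).
    resp = requests.get(
        "https://graph.facebook.com/106078429431815",
        params={"access_token": "YOUR_ACCESS_TOKEN"},  # placeholder; may be optional on old API versions
    )
    resp.raise_for_status()
    city = resp.json()
    print(city.get("name"), city.get("location"))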
I have two entities, Books and Authors, with a strict one-to-many relationship (a many-to-many relationship is not required for my use case).
The access patterns I want to satisfy are:
Get Author Info by Author Name
Get Book Info By just ISBN
Get all Book records by an Author using Author Name.
Do I need any GSI, given the constraint that I can make only a single request to the DB when adding a Book or an Author, and that each of the three access patterns above must also be fulfilled with a single request?
If my Author Entity uses this key schema:
Partition Key: AUTHOR#XYZ
Sort Key: AUTHOR#XYZ
and for Book Entity I use
Partition Key: BOOK#123
Sort Key: BOOK#123
I can easily get author info by name and book info by ISBN. How do I serve the third access pattern, all book data by author name?
Two approaches I thought of:
Have a third entity in the table with PK AUTHOR#XYZ and SK BOOK#123, and use BEGINS_WITH(SK, 'BOOK'). But in this approach, when adding a book to the DB, I will have to write two items (PK BOOK#, SK BOOK# for getting a book by just ISBN, and PK AUTHOR#, SK BOOK# for getting all books by author), and the book info will be duplicated in both items.
Add a GSIAuthorName attribute to the Book entity when adding a book, and create a GSI with PK GSIAuthorName (AUTHOR#XYZ) and SK being the PK of the Book entity (BOOK#123). The issue with this is that the projection will have to be ALL, since I want all book info attributes by author name and need to fetch them in a single query to the GSI, so the entire Book entity will be duplicated in the GSI.
Is there an easier way to model this data?
Since you're trying to serve two different access patterns for a single entity that require different partition key values, there are basically only the two options you have already identified.
Your design seems to only work for books that have a single author. In the real world that's not sufficient: there are plenty of books with multiple authors, such as "The Dictator's Handbook" by Bruce Bueno de Mesquita and Alastair Smith, and your data model should account for that. Author <-> Book isn't One-to-Many, it's Many-to-Many.
I'd go for something like this which uses a Global Secondary Index. It's very close to your second suggestion.
PK                             | SK                             | GSI1PK                         | GSI1SK              | type   | attributes
-------------------------------|--------------------------------|--------------------------------|---------------------|--------|------------------------------
AUTHOR#ALASTAIR SMITH          | AUTHOR#ALASTAIR SMITH          |                                |                     | author | name, birthdate, ...
AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA |                                |                     | author | name, birthdate, ...
BOOK#978-1610391849            | AUTHOR#ALASTAIR SMITH          | AUTHOR#ALASTAIR SMITH          | BOOK#978-1610391849 | book   | title, publisher, author, ...
BOOK#978-1610391849            | AUTHOR#BRUCE BUENO DE MESQUITA | AUTHOR#BRUCE BUENO DE MESQUITA | BOOK#978-1610391849 | book   | title, publisher, author, ...
Does this introduce data duplication? - Yes
Does this introduce complexity on writes? - Yes
Does it work in the real world? - Yes
The model I've chosen allows you to fulfill the requirements:
Get Author Info by Author Name: GetItem on the primary index with PK=AUTHOR#... and SK=AUTHOR#...
Get Book Info by just ISBN: Query on primary index with PK=BOOK#... and limit 1
Get all books for an Author: Query on GSI1 with PK=AUTHOR#... (see the sketch below)
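A minimal boto3 sketch of those three reads; the table name BooksAuthors is illustrative, the key and index names match the model above:

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("BooksAuthors")  # illustrative table name

    # 1) Author info by author name
    author = table.get_item(
        Key={"PK": "AUTHOR#ALASTAIR SMITH", "SK": "AUTHOR#ALASTAIR SMITH"}
    ).get("Item")

    # 2) Book info by ISBN (any one of the per-author book items will do)
    book_items = table.query(
        KeyConditionExpression=Key("PK").eq("BOOK#978-1610391849"),
        Limit=1,
    )["Items"]

    # 3) All books by an author, via GSI1
    books = table.query(
        IndexName="GSI1",
        KeyConditionExpression=Key("GSI1PK").eq("AUTHOR#ALASTAIR SMITH")
        & Key("GSI1SK").begins_with("BOOK#"),
    )["Items"]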
When you write a book, you need to add a book record for each author, and potentially the author items as well. For updates to a book's info (which should be very rare), you first run the query from 2) without the limit and then update each item that comes back.
Update
To address the requests for clarification in the comments:
If you require a strict One-to-Many relationship, I'd pick the second approach
Frequent writes are typically not a problem in your one-to-many case as long as you don't exceed the write throughput of a single partition, which is unlikely given the data. I don't see why you'd need frequent writes though.
The extra complexity is typically only a one-time penalty when you create your data access layer. The code for update_book_by_isbn will have to include the steps I outlined above, and create_book might store multiple records, as sketched below.
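A rough sketch of what those two data-access functions could look like with boto3; the table name and attribute layout are carried over from the model above, not a definitive implementation:

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("BooksAuthors")  # illustrative table name


    def create_book(isbn, title, publisher, author_names):
        """Write one book item per author so GSI1 can serve 'all books by author'."""
        with table.batch_writer() as batch:
            for name in author_names:
                batch.put_item(Item={
                    "PK": f"BOOK#{isbn}",
                    "SK": f"AUTHOR#{name}",
                    "GSI1PK": f"AUTHOR#{name}",
                    "GSI1SK": f"BOOK#{isbn}",
                    "type": "book",
                    "title": title,
                    "publisher": publisher,
                    "author": author_names,
                })


    def update_book_by_isbn(isbn, **changes):
        """Query all duplicated items for the ISBN, then update each one."""
        items = table.query(KeyConditionExpression=Key("PK").eq(f"BOOK#{isbn}"))["Items"]
        for item in items:
            table.update_item(
                Key={"PK": item["PK"], "SK": item["SK"]},
                UpdateExpression="SET " + ", ".join(f"#{k} = :{k}" for k in changes),
                ExpressionAttributeNames={f"#{k}": k for k in changes},
                ExpressionAttributeValues={f":{k}": v for k, v in changes.items()},
            )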
I have an order I want to store in DynamoDB, with the following fields:
Order date: 2019-03-27 02:09pm
First Name: John
Last Name: Doe
Email: john@example.com
Phone: 555-11434
Address: 13 Lorong K Changi, Sunny Shores
City: Singapore
Zip: 654321
Country: Singapore
Status: new, confirmed, delivered
(There is no unique order identifier specified.)
At first I combined the first and last name ("John Doe") as the partition key and used the order date as the sort key. That worked quite well until:
I figured I can't query the partition key (name of customer). I want to be able to look up customer orders, by customer!
Secondly, URLs addressing the order would look like https://example.com/2019-03-27/John%20Doe, i.e. the space causes some confusion. Is there a more efficient way to encode the name?
I am most keen on email address, but from researching that, it seems like email is a bad field to use.
The access patterns are pretty simple. Need a way to:
Look up an order
Search by customer (could be name, could be email)
Query by order status
I tried making a composite key with order status and order date, but that has not gone well: Replace an old item with a new item in DynamoDB
Most people in this scenario generate a UUID for the user, and make that the partition key.
If you use an email address as the partition key, it means your user cannot ever change their email address, at least not without some creative coding on your part.
It might be valid to use an email address in your case, for example if a user can never change their email address. In that case you should just be able to URL-encode the email address on your client. However, if you want to avoid that altogether, you could accept the parameter in a Base64-encoded format and decode it before using it with DynamoDB.
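A small Python sketch of that encode/decode round trip (the URL shown is just an illustration):

    import base64

    email = "john@example.com"

    # Encode for use as a URL path segment (no '@' or '%20'-style escaping needed).
    token = base64.urlsafe_b64encode(email.encode()).decode().rstrip("=")
    # e.g. https://example.com/customers/am9obkBleGFtcGxlLmNvbQ

    # Decode again before querying DynamoDB (restore the stripped padding first).
    padded = token + "=" * (-len(token) % 4)
    decoded = base64.urlsafe_b64decode(padded).decode()
    assert decoded == email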
If you decide to generate UUIDs and make these your partition keys, you would probably then create GSIs with partition keys of email address and order state. You can use these GSIs to access your data quickly with your specified access patterns.
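A hedged boto3 sketch of what that could look like; the table, index, and attribute names here are made up for illustration:

    import uuid
    import boto3

    client = boto3.client("dynamodb")

    # Orders keyed by a generated customer UUID + order date, with GSIs for the
    # "orders by email" and "orders by status" access patterns.
    client.create_table(
        TableName="Orders",
        AttributeDefinitions=[
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "email", "AttributeType": "S"},
            {"AttributeName": "order_status", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "customer_id", "KeyType": "HASH"},   # generated UUID
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        GlobalSecondaryIndexes=[
            {
                "IndexName": "by_email",
                "KeySchema": [
                    {"AttributeName": "email", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            },
            {
                "IndexName": "by_status",
                "KeySchema": [
                    {"AttributeName": "order_status", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            },
        ],
        BillingMode="PAY_PER_REQUEST",
    )

    # Each customer gets a UUID generated once and reused for all of their orders.
    customer_id = str(uuid.uuid4())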
I'm building a web app using Django and Sphinx for free-text search. I need to apply additional restrictions before making the request to searchd. Consider two tables:
Entity
id
title
description
created_by_id
updated_by_id
created_date
updated_date
and
EntityUser
id
entity_id [FK to the table above]
joining_user_id
is_approved
created_by_id
updated_by_id
created_date
updated_date
I've built an RT index for the main table Entity and all works fine, but now I want to query only those entities the user has joined, i.e. where a record exists in EntityUser for the specific user_id and entity_id with is_approved=1. The problem is that I can't index EntityUser, because it has no string fields - as you can see, this table only holds integers/timestamps. I'm also not sure I could write a SphinxQL query containing a subquery against another index, even if I could build an index for that table. Knowing that Sphinx has been used on quite big projects with great success, I doubt this is a limitation of Sphinx - is it bad DB/application design, or a lack of knowledge of how to build a proper RT index? Can I somehow extend the existing index so that I can apply the restriction above?
I was thinking I could apply the additional restrictions on the MySQL side after Sphinx returns the record IDs, but that won't work: the N records with the highest weight would be returned, and after applying the additional restrictions the result could be empty. So I need to narrow the search space first and query only those entities the user can possibly see.
Adapting the example from http://sphinxsearch.com/docs/current.html#attributes, you might be able to use something like this in your conf:
...
sql_query = SELECT app_entity.id as id, \
    app_entity.title as title, \
    app_entity.description as description, \
    app_entityuser.id as userid \
    FROM app_entity, app_entityuser \
    WHERE app_entity.id = app_entityuser.entity_id AND app_entityuser.is_approved = 1
# the first selected column is used as the document id automatically
sql_attr_uint = userid
...
I should provide a disclaimer: I have not tried this.
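If that index builds, the restriction itself is then just an attribute filter at query time. A sketch over SphinxQL, equally untested; the index name entity_index and the default SphinxQL port 9306 are assumptions:

    import pymysql

    # searchd speaks SphinxQL over the MySQL protocol (listen = 9306:mysql41 by default).
    conn = pymysql.connect(host="127.0.0.1", port=9306, user="", autocommit=True)
    with conn.cursor() as cur:
        # Full-text match, restricted to rows whose userid attribute equals the current user.
        cur.execute(
            "SELECT id FROM entity_index WHERE MATCH(%s) AND userid = %s",
            ("search terms", 7),
        )
        print(cur.fetchall())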
I did find a related SO post, but it doesn't look like they quite solved it: Django-sphinx result filtering using attributes?
Good luck!
Actually, I've found the answer, and it has nothing to do with the design of the application or the DB.
In fact it's simple: I just need to use an MVA (multi-value attribute) for the RT index as I would for a plain one (rt_attr_multi or rt_attr_multi_64). In the configuration file I have to do something like this:
...
rt_attr_multi = entity_users
}
and then populate it with the IDs of users who have joined the Entity and been approved. The problem was that I couldn't understand how to use an MVA with an RT index, but now it's clear. I think there aren't enough real-world examples of RT indexes with MVAs, so I'm sharing this to help solve similar problems.
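For reference, a small sketch of how the MVA then gets written and filtered over SphinxQL; the index name entity_rt, the connection details, and the document values are assumptions:

    import pymysql

    # searchd speaks SphinxQL over the MySQL protocol (default port 9306).
    conn = pymysql.connect(host="127.0.0.1", port=9306, user="", autocommit=True)
    cur = conn.cursor()

    # Populate the RT index; the MVA is written as a parenthesised list of user ids.
    cur.execute(
        "REPLACE INTO entity_rt (id, title, description, entity_users) "
        "VALUES (42, 'Some entity', 'Some description', (5, 7, 12))"
    )

    # Full-text search restricted to entities that user 7 has joined and been approved for.
    # For an MVA, "entity_users = 7" matches documents whose MVA contains 7.
    cur.execute("SELECT id FROM entity_rt WHERE MATCH('search terms') AND entity_users = 7")
    print(cur.fetchall())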
UPDATE: I was fighting for the last hour to regenerate the RT index and kept getting "unknown column: 'entity_users'". I finally found the reason: if you add an MVA to an RT index (I don't know if the same applies to plain indexes), you not only have to restart the searchd daemon (service), but also DELETE everything in your "data" folder (or wherever you have stored your index)!
In my app, a user is at a location and is looking for her friends who have been anywhere within 10 miles of where she is. How do I find this with either FQL or the Graph API? The only way I can see is to run a search like https://graph.facebook.com/search?type=checkin and then go through the results to find out which locations were within 10 miles. Is there a better way to do this?
Thanks for your help!
Doles
From http://developers.facebook.com/docs/reference/fql/location_post/
It says
An FQL table that returns Posts that have locations associated with
them and that satisfy at least one of the following conditions:
you were tagged in the Post
a friend was tagged in the Post
you authored the Post
a friend authored the Post
Note: This query can process a large amount of data. In order to
ensure that a manageable amount of data is returned within a
reasonable timeframe, you should specify a recent timestamp to narrow
the results.
Return posts within 10,000 meters of a given location:
SELECT id, page_id
FROM location_post
WHERE distance(latitude, longitude, '37.86564', '-122.25061') < 10000
Although the initial answer did work for part of my purpose, it quickly became inadequate. Now, after banging my head against the wall, it finally broke (not my head, the wall). Here are two better ways that work to find what I need:
This is to find just checkins:
SELECT checkin_id, coords, tagged_uids, page_id
FROM checkin
WHERE (author_uid IN (SELECT uid2 FROM friend WHERE uid1 = me()) OR author_uid = me())
  AND coords.latitude < '45.0' AND coords.latitude > '29'
  AND coords.longitude > '-175' AND coords.longitude < '-5';
This is to find all location posts:
SELECT id, page_id
FROM location_post
WHERE (author_uid IN (SELECT uid2 FROM friend WHERE uid1 = me()) OR author_uid = me())
  AND coords.latitude < '45.0' AND coords.latitude > '29'
  AND coords.longitude > '-175' AND coords.longitude < '-5'
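For completeness, these FQL queries were run through the Graph API's fql endpoint (long since deprecated and removed in later API versions). A rough Python sketch, with a placeholder access token:

    import requests

    FQL = """
    SELECT id, page_id FROM location_post
    WHERE (author_uid IN (SELECT uid2 FROM friend WHERE uid1 = me()) OR author_uid = me())
      AND coords.latitude < '45.0' AND coords.latitude > '29'
      AND coords.longitude > '-175' AND coords.longitude < '-5'
    """

    resp = requests.get(
        "https://graph.facebook.com/fql",
        params={"q": FQL, "access_token": "USER_ACCESS_TOKEN"},  # placeholder token
    )
    resp.raise_for_status()
    for post in resp.json().get("data", []):
        print(post["id"], post.get("page_id"))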
I have an application that allows "contacts" to be completely customized. My method of doing that is letting the administrator set up all of the fields allowed for the contact. My database is as follows:
Contacts
id
active
lastactive
created_on
Fields
id
label
FieldValues
id
fieldid
contactid
response
So the Contacts table only holds the identifier and whether the contact is active; the Fields table only holds the label of the field and its identifier; and the FieldValues table is what actually holds the data for contacts (name, address, etc.).
This setup has worked just fine for me up until now. The client would like to be able to pull a cumulative report, say, the number of contacts in each city, grouped by state. Effectively the data would have to look like the following:
California (state, from Fields table)
    Costa Mesa    5   (counted in FieldValues table)
    Newport       2
Connecticut
    Wallingford   2
    Clinton       2
    Berlin        5
The state field might be id 6 and the city field might be id 4. I don't know if I have just been looking at this code way too long to figure it out or what.
The SQL to create those three tables can be found at https://s3.amazonaws.com/davejlong/Contact.sql
You've got an Entity-Attribute-Value (EAV) model. Use the Fields and FieldValues tables for searching only - the WHERE clause. Then make life easier by keeping the full entity's data in a CLOB on the main table (e.g. Contacts.data) in a serialized format (WDDX is good for this). Read the data column out, deserialize, and work with it on the server side. This is much easier than the myriad of joins you'd need to reproduce the fully hydrated entity from an EAV setup.
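For the searching half of that advice, the state/city rollup the client wants can be expressed as a self-join on FieldValues. A hedged sketch: the table and column names come from the schema in the question, the field ids 6 (state) and 4 (city) are the ones the question suggests, and the connection details are placeholders:

    import pymysql

    STATE_FIELD_ID = 6   # "State" field id, from the question
    CITY_FIELD_ID = 4    # "City" field id, from the question

    ROLLUP_SQL = """
        SELECT state.response AS state,
               city.response  AS city,
               COUNT(DISTINCT city.contactid) AS contacts
        FROM FieldValues AS state
        JOIN FieldValues AS city ON city.contactid = state.contactid
        WHERE state.fieldid = %s
          AND city.fieldid  = %s
        GROUP BY state.response, city.response
        ORDER BY state.response, city.response
    """

    conn = pymysql.connect(host="localhost", user="app", password="secret", database="contacts")
    with conn.cursor() as cur:
        cur.execute(ROLLUP_SQL, (STATE_FIELD_ID, CITY_FIELD_ID))
        for state, city, count in cur.fetchall():
            print(state, city, count)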