Should I denormalize or run multiple queries in DocumentDb?

I'm learning about data modeling in DocumentDb, and here's where I need some advice.
My documents are shown further down.
I can take two approaches here, both with pros and cons.
Scenario 1:
If I keep the data denormalized (see my documents below) by keeping project team member information i.e. first, last name, email, etc. in the same document as the project, I can get the information I need in one query BUT when Jane Doe gets married and her last name changes, I'd have to update a lot of documents in the Projects collection. I'd also have to be extremely careful in making sure that all collections with documents that contain employee information get updated as well. If, for example, I update Jane Doe's name in Projects collection but forget to update the TimeSheets collection, I'd be in trouble!
Scenario 2:
If I keep data somewhat normalized and keep only EmployeeId in the project documents, I can then run three queries whenever I want to get a projects list:
Query 1 returns projects list
Query 2 would give me the EmployeeIds of all project team members that appear in the first query
Query 3 returns employee information, i.e. first name, last name, email, etc. I'd use the result of Query 2 to run this one
I can then combine all the data in my application.
The problem here is that DocumentDb seems to have a lot of limitations right now. I may be reading hundreds of projects with hundreds of employees in the project teams, and there doesn't seem to be an efficient way to get the information for all employees whose Ids appear in my second query. Again, please keep in mind that I may need to pull information for hundreds of employees here. If the following SQL query is what I'd use for employee data, I may have to run it several times to get everything I need, because I don't think I can have hundreds of OR clauses:
SELECT e.Id, e.firstName, e.lastName, e.emailAddress
FROM Employees e
WHERE e.Id = 1111 OR e.Id = 2222
I understand that DocumentDb is still in preview and some of these limitations will be fixed. With that said, how should I approach this problem? How can I efficiently both store/manage and retrieve all project data I need -- including project team information? Is Scenario 1 a better solution or Scenario 2 or is there a better third option?
Here's what my documents look like. First, the project document:
{
    id: 789,
    projectName: "My first project",
    startDate: "9/6/2014",
    projectTeam: [
        { id: 1111, firstName: "John", lastName: "Smith", position: "Sr. Engineer" },
        { id: 2222, firstName: "Jane", lastName: "Doe", position: "Project Manager" }
    ]
}
And here are two employee documents which reside in the Employees collection:
{
    id: 1111,
    firstName: "John",
    lastName: "Smith",
    dateOfBirth: "1/1/1967",
    emailAddresses: [
        { email: "jsmith#domain1.com", isPrimary: "true" },
        { email: "john.smith#domain2.com", isPrimary: "false" }
    ]
},
{
    id: 2222,
    firstName: "Jane",
    lastName: "Doe",
    dateOfBirth: "3/8/1975",
    emailAddresses: [
        { email: "jane#domain1.com", isPrimary: "true" }
    ]
}

I believe you're on the right track in considering the trade-offs between normalizing or de-normalizing your project and employee data. As you've mentioned:
Scenario 1) If you de-normalize your data model (couple project and employee data together), you may find yourself having to update many projects when you update an employee.
Scenario 2) If you normalize your data model (decouple project and employee data), you would have to query for projects to retrieve employeeIds and then query for the employees whenever you wanted the list of employees belonging to a project.
I would pick the appropriate trade-off given your application's use case. In general, I prefer de-normalizing when you have a read-heavy application and normalizing when you have a write-heavy application.
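To make the cost of Scenario 2 concrete, here is a minimal sketch of the application-side join it implies. The types, the `teamIds` field and the `joinProjects` helper are hypothetical names chosen for illustration; they assume the project documents store only employee ids and that the employee documents have already been fetched by a separate query.

```typescript
interface Employee { id: number; firstName: string; lastName: string; }
// Hypothetical normalized project shape: only employee ids are stored.
interface ProjectRef { id: number; projectName: string; teamIds: number[]; }
interface ProjectView { id: number; projectName: string; projectTeam: Employee[]; }

// Combine the results of the project query and the employee query in memory.
function joinProjects(projects: ProjectRef[], employees: Employee[]): ProjectView[] {
  const byId = new Map<number, Employee>();
  employees.forEach((e) => byId.set(e.id, e));
  return projects.map((p) => ({
    id: p.id,
    projectName: p.projectName,
    // Ids without a matching employee document are skipped.
    projectTeam: p.teamIds
      .map((id) => byId.get(id))
      .filter((e): e is Employee => e !== undefined),
  }));
}

const employees: Employee[] = [
  { id: 1111, firstName: "John", lastName: "Smith" },
  { id: 2222, firstName: "Jane", lastName: "Doe" },
];
const projects: ProjectRef[] = [
  { id: 789, projectName: "My first project", teamIds: [1111, 2222] },
];
const joined = joinProjects(projects, employees);
```

With this shape, renaming Jane Doe touches one employee document, at the price of the extra query and the join on every read.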
Note that you can avoid making multiple roundtrips between your application and the database by leveraging DocumentDB's stored procedures (the queries are then performed server-side, inside DocumentDB).
Here's an example stored procedure for retrieving the employees belonging to a specific projectId:
function(projectId) {
    /* the context method can be accessed inside stored procedures and triggers */
    var context = getContext();
    /* access all database operations - CRUD, query against documents in the current collection */
    var collection = context.getCollection();
    /* access HTTP response body and headers from the procedure */
    var response = context.getResponse();
    /* employees found so far; the response body is rewritten as each one arrives */
    var results = [];

    /* Callback for processing query on projectId */
    var projectHandler = function(documents) {
        var i;
        for (i = 0; i < documents[0].projectTeam.length; i++) {
            // Query for the Employees
            queryOnId(documents[0].projectTeam[i].id, employeeHandler);
        }
    };

    /* Callback for processing query on employeeId */
    var employeeHandler = function(documents) {
        results.push(documents[0]);
        // Keep the body a valid JSON array instead of concatenated strings
        response.setBody(JSON.stringify(results));
    };

    /* Query on a single id and call back */
    var queryOnId = function(id, callbackHandler) {
        collection.queryDocuments(collection.getSelfLink(),
            'SELECT * FROM c WHERE c.id = \"' + id + '\"', {},
            function(err, documents) {
                if (err) {
                    throw new Error('Error: ' + err.message);
                }
                if (documents.length < 1) {
                    throw new Error('Unable to find id ' + id);
                }
                callbackHandler(documents);
            }
        );
    };

    // Query on the projectId
    queryOnId(projectId, projectHandler);
}
Even though DocumentDB supports only a limited number of OR clauses during the preview, you can still get relatively good performance by splitting the employeeId lookups into a bunch of asynchronous server-side queries.
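If you would rather issue the lookups from the application, the same splitting idea can be sketched as batching the ids so that each query stays under whatever OR limit applies. The `maxPerQuery` value and the helper names below are assumptions for illustration; the actual limit during the preview is not stated here, so treat the number as a placeholder.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build one query string per batch. Ids are numbers, so inlining them is safe;
// string ids would need parameters or escaping instead.
function buildEmployeeQueries(ids: number[], maxPerQuery: number): string[] {
  return chunk(ids, maxPerQuery).map((batch) =>
    "SELECT e.id, e.firstName, e.lastName FROM Employees e WHERE " +
    batch.map((id) => "e.id = " + id).join(" OR ")
  );
}

// Placeholder limit of 2 ids per query, purely for demonstration.
const queries = buildEmployeeQueries([1111, 2222, 3333], 2);
```

Each resulting string can then be run as its own query, and the result sets concatenated in the application.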

How to update local data after mutation?

I want to find a better way to update local component state after executing a mutation. I'm using svelte-apollo, but my question is about the basic principles. I have a watchQuery that gets a list of items and returns an ObservableQuery in the component.
query GetItems($sort: String, $search: String!) {
    items(
        sort: $sort
        where: { name_contains: $search }
    ) {
        id
        name
        item_picture {
            pictures {
                url
                previewUrl
            }
        }
        description
        created_at
    }
}
In the component I call it:
<script>
    $: query = GetItems({
        variables: {
            sort: 'created_at:DESC',
            search
        }
    });
</script>
...
{#each $query.data?.items || [] as item, key (item.id)}
    <div>
        <Item
            deleteItem={dropItem}
            item={item}
            setActiveItem={setActiveItem}
        />
    </div>
{/each}
...
And I have an addItem mutation.
mutation addItem($name: String!, $description: String) {
    createItem(
        input: { data: { name: $name, description: $description } }
    ) {
        item {
            name
            description
        }
    }
}
I simply want to update local state and add the new item to the observable query result after the addItem mutation, without using refetchQueries (because I don't want to fetch the whole list over the network when I have just added one item).
I can see the new item in the cache, but my view is not updated.
P.S. If you have had similar problems and found ways to solve them, I'd be glad to see your examples.
I believe in this case, you could use the cache.modify function to modify the cache directly if you’re looking to skip the network request from refetchQueries. Would that work for your use case? https://www.apollographql.com/docs/react/data/mutations/#making-all-other-cache-updates
Personally, if you don’t mind the network request, I like using cache.evict to evict the data in the cache that I know has changed. I prefer that to refetchQueries in most cases because it refetches all queries that used that piece of data, not just the queries I specify.
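As a rough sketch of the cache.modify route, the list update itself can be written as a small pure function over cache references. The `prependRef` helper is a hypothetical name, and the wiring in the trailing comment assumes the mutation also selects `id` for the new item (the mutation shown in the question does not), so that the cache can identify it.

```typescript
// Shape of a normalized cache reference, e.g. { __ref: "Item:42" }.
type Ref = { __ref: string };

// Return a new list with `added` at the front, unless it is already present.
// The list is newest-first to match the created_at:DESC sort in the question.
function prependRef(existing: readonly Ref[], added: Ref): Ref[] {
  if (existing.some((r) => r.__ref === added.__ref)) {
    return existing.slice();
  }
  return [added].concat(existing);
}

// Hypothetical wiring inside the mutation's `update` option (not executed here):
//
//   update(cache, { data }) {
//     const newItem = data.createItem.item;      // assumes `id` and `__typename` are present
//     const id = cache.identify(newItem);        // may be undefined if the item has no id
//     if (!id) return;
//     cache.modify({
//       fields: {
//         items(existing = []) {
//           return prependRef(existing, { __ref: id });
//         },
//       },
//     });
//   }

const updated = prependRef([{ __ref: "Item:1" }], { __ref: "Item:2" });
```

Because the modifier returns a new array, queries watching `items` should be notified of the change; whether the new item belongs in a given search result is something the application would still have to decide.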

Accessing storage synchronously with ionic 2 storage service

One of the recurring problems I've been having with Ionic 2 is its storage service. I have successfully set and retrieved stored data. However, when I store something, it is inaccessible on other pages unless I refresh the page/application.
Example one: Editing a contact
I push to an edit contact page, make changes, then saveEdits. saveEdits successfully makes the change to the right contact but fails to update the contact list UNTIL the application is refreshed.
HTML:
<button (click)="saveEdits(newName, newPostCode)" ion-button round>Save Edits</button>
TypeScript:
saveEdits(newName, newPostCode) {
    console.log("saveid" + this.id);
    this.name = newName; // saves property
    this.postcode = newPostCode; // saves property
    this.items[this.id] = {"id": this.id, "Name": newName, "PostCode": newPostCode};
    this.storage.set('myStore', this.items);
    //this.navCtrl.pop(ContactPage);
}
Example two: Accessing contacts on another page
On another page I iterate through contacts and display them in a radio alert box list. Again, the contacts are displayed successfully, but when I add a contact on the add contact page, the new contact does not appear on the radio alert box list.
addDests() {
    console.log('adddests');
    let alert = this.alertCtrl.create();
    alert.setTitle('Choose Friend');
    for (let i = 0; i < this.items.length; i++) {
        console.log('hello');
        alert.addInput({
            type: 'radio',
            label: this.items[i].Name,
            value: this.items[i].PostCode,
            checked: false
        });
    }
    alert.addButton('Cancel');
    alert.addButton({
        text: 'OK',
        handler: data => {
            console.log(data);
        }
    });
    alert.present();
}
You're changing the reference the variable is pointing to:
this.items[this.id] = {"id": this.id, "Name": newName, "PostCode": newPostCode};
I assume that your list is iterating (ngFor) over the array referenced by this.items? If so, update the properties of this.items[this.id] directly instead of re-initializing it.
this.items[this.id].Name = newName;
this.items[this.id].PostCode = newPostCode;
(By the way, I'd recommend being consistent with your property naming: either Id and Name, or id and name. Capital letters matter!)
Your "list" view will always be refreshed if the references to the objects being used are not changed. The only exception would be an update made in a callback given to a third-party library. In that case, you can use NgZone to "force" Angular to take the update into account.
Also, have a look at Alexander's great advice about Observable.
You should use Angular provider(s) with an Observable property to notify subscribers (other pages and components) about changes.
For example, read this article: http://blog.angular-university.io/how-to-build-angular2-apps-using-rxjs-observable-data-services-pitfalls-to-avoid/
There is a lot of information on this: https://www.google.com/search?q=angular+provider+observable
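To illustrate the idea without pulling in RxJS, here is a minimal hand-rolled sketch of such a shared service. `ContactStore` and its methods are hypothetical names; in a real app you would more likely expose an RxJS BehaviorSubject from an injectable provider and also persist the list with this.storage.set inside `save`.

```typescript
interface Contact { id: number; Name: string; PostCode: string; }
type Listener = (contacts: Contact[]) => void;

// Single source of truth shared by all pages (stand-in for an Angular provider).
class ContactStore {
  private contacts: Contact[] = [];
  private listeners: Listener[] = [];

  // Register a listener; it immediately receives the current list.
  // Returns a function that removes the listener again.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    listener(this.contacts);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Add or replace a contact, then notify every subscribed page.
  save(contact: Contact): void {
    const exists = this.contacts.some((c) => c.id === contact.id);
    this.contacts = exists
      ? this.contacts.map((c) => (c.id === contact.id ? contact : c))
      : this.contacts.concat([contact]);
    this.listeners.forEach((l) => l(this.contacts));
  }
}

const store = new ContactStore();
let seenByListPage: Contact[] = [];
const unsubscribe = store.subscribe((contacts) => { seenByListPage = contacts; });
store.save({ id: 0, Name: "Ann", PostCode: "AB1 2CD" });
store.save({ id: 0, Name: "Ann", PostCode: "XY9 8ZW" });
unsubscribe();
```

The list page and the alert page would both subscribe to the same store, so an edit saved on one page reaches the other without a refresh.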

Return join table attributes in include with LoopBack

I've got a data structure fairly similar to the one described on the Loopback HasManyThrough documentation page.
For a given Physician (e.g. id 2), I would like to get all their patients with an appointment AND their appointment date.
I can do a GET operation like this:
GET /physicians/2
with the filter header { "include" : {"relation":"patients"} }
And I do get the physician, and the list of patients, but I lose the appointmentDate of the relation.
Or, I can do a GET operation on the relation table like the documentation shows:
GET /appointments
with the filter header { "include" : {"relation":"patient"}, "where":{"physicianId":2} }
And I get the appointments, with the date and the patient embedded, but not the physician details.
I can't seem to be able to combine the two.
Is there a way to get the whole data with one query?
The data would be something like this:
[
    {
        "name": "Dr John",
        "appointments": [
            {
                "appointmentDate": "2014-06-01",
                "patient": {
                    "name": "Jane Smith",
                    "id": 1
                }
            }
        ]
    }
]
One hack I found is to define the relation twice: once as a HasManyThrough and once as a HasMany to the appointments table. Then I can do something like this:
GET /physicians/2
with the filter header { "include" : {"relation":"appointments","scope":{"include":["patient"]} } }
But that doesn't seem right, and it could maybe lead to odd behaviour with the duplicated relation. Maybe I'm just being paranoid.
You could include both models
GET /appointments
{ "include": ["patient", "physician"], "where": { "physicianId":2 } }
You will get quite a lot of duplicate data though (the details of the physician with id 2 are repeated in every appointment). I believe that the HasManyThrough relation model was initially not supposed to carry any extra data, and therefore it has some limitations. Here is a related GitHub issue.
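If the duplicated physician details are a problem, the flat appointment rows can be regrouped on the client into the shape asked for in the question. This is only a sketch: the interfaces and the `groupByPhysician` helper are hypothetical and assume each returned appointment embeds both `patient` and `physician`, as the include filter above requests.

```typescript
interface Person { id: number; name: string; }
interface Appointment {
  appointmentDate: string;
  patient: Person;
  physician: Person;
}
interface PhysicianView {
  id: number;
  name: string;
  appointments: { appointmentDate: string; patient: Person }[];
}

// Collapse appointment rows into one entry per physician.
function groupByPhysician(rows: Appointment[]): PhysicianView[] {
  const byId = new Map<number, PhysicianView>();
  rows.forEach((row) => {
    let view = byId.get(row.physician.id);
    if (view === undefined) {
      view = { id: row.physician.id, name: row.physician.name, appointments: [] };
      byId.set(view.id, view);
    }
    view.appointments.push({ appointmentDate: row.appointmentDate, patient: row.patient });
  });
  const result: PhysicianView[] = [];
  byId.forEach((v) => result.push(v));
  return result;
}

const drJohn: Person = { id: 2, name: "Dr John" };
const grouped = groupByPhysician([
  { appointmentDate: "2014-06-01", patient: { id: 1, name: "Jane Smith" }, physician: drJohn },
  { appointmentDate: "2014-06-08", patient: { id: 3, name: "Sam Jones" }, physician: drJohn },
]);
```

This keeps a single request to /appointments and moves the reshaping into application code, at the cost of transferring the repeated physician object.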

Ember Data Nested Resources Tree Structure

I have a slightly peculiar problem with loading my tree structure into Ember.
My models are:
book.js
- parts: DS.hasMany('part', {inverse: 'book', async: true})
part.js
- subparts: DS.hasMany('part', {inverse: 'parent_part', async: true}),
With the following API responses:
GET /api/books:
{
    books: [
        {id: 1, links: {parts: "/api/books/1/parts"}},
        ...
    ]
}
GET /api/books/1/parts:
{
parts: [
{
id: 1,
subparts: [10, 11]
},
{
id: 2,
subparts: []
}
]
}
The problem is in the tree nature of the parts: The book only has direct descendants id 1 and 2, but these have sub-parts on their own.
The structure as it is works but results in multiple sub-queries for each part that was not included in the /books/1/parts result. I want to avoid these queries, not only because of performance reasons but also because I will need additional query parameters which would get lost at this step... I know about coalesceFindRequests but it introduces new problems.
To rephrase the problem, Ember Data thinks that every part that is included in the /books/1/parts response should be added directly to the book:parts property. How can I still load all records of the parts tree at the same time?
I tried renaming the fields, but Ember Data assigns the records based on the model name, not the field name.
I fear that some creative adapter overriding will be necessary here. Any idea appreciated. The backend is completely under my control, so I could change things on that end, too.
You need to use a process called sideloading, which should work as you expect (I've had issues in the past with sideloading data). As mentioned in this issue, you want to split your parts into two separate arrays.
{
    // These are the direct children
    "parts": [{...}, {...}],
    // These are the extra records
    "_parts": [{...}, {...}]
}
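Since the backend is under your control, the split could be produced server-side along these lines. This is a sketch under assumptions: `splitForSideloading` is a hypothetical helper, it expects the complete part tree of a single book, and it reuses the `parts` / `_parts` keys from the payload above.

```typescript
interface Part { id: number; subparts: number[]; }
interface PartsPayload { parts: Part[]; _parts: Part[]; }

// Separate the book's direct children from the deeper records to sideload.
function splitForSideloading(allParts: Part[]): PartsPayload {
  // Any id listed as another part's subpart is not a direct child of the book.
  const childIds = new Set<number>();
  allParts.forEach((p) => p.subparts.forEach((id) => childIds.add(id)));
  return {
    parts: allParts.filter((p) => !childIds.has(p.id)),
    _parts: allParts.filter((p) => childIds.has(p.id)),
  };
}

const payload = splitForSideloading([
  { id: 1, subparts: [10, 11] },
  { id: 2, subparts: [] },
  { id: 10, subparts: [] },
  { id: 11, subparts: [] },
]);
```

For the example tree this puts parts 1 and 2 under `parts` and parts 10 and 11 under `_parts`, so all records arrive in one response.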

Couchbase Map/Reduce to count total by document type

I'm storing event data in Couchbase documents like this:
{
    user: {
        id: '0BE2DA2B-9C8F-432D-88C2-B2C1D8D0E4B4',
        device: { 'manufacturer': 'Apple', 'os': 'iOS', 'name': 'iPhone', 'version': '5S' }
    },
    event_type: 'INTERACTION_A',
    country: 'GB',
    timestamp: 1398781631233
}
I have created Map/Reduce queries to tell me how many events iPhone users have submitted. However, is it possible to use Map/Reduce to query how many unique devices by OS are submitting events?
Each individual device might have submitted 1000s of events, but the result would show how many unique devices, by OS, the system has seen. I'm trying to end up with a data that looks something like this:
{ 'iOS': 2343, 'Android': 6343 }
Is it possible to do this in a single Couchbase view?
Yes, it's possible. You just need to use group=true&group_level=1 in your query.
Create a view like:
map: function (doc, meta) {
    emit(doc.user.device.os, null);
}
reduce: _count
Then add group=true&group_level=1 to your query:
http://127.0.0.1:8092/default/_design/dev_<designDocName>/_view/<viewName>?connection_timeout=60000&limit=10&skip=0&group=true&group_level=1
Also check these links for more examples:
Writing a simple group by with map-reduce (Couchbase)
http://hardlifeofapo.com/basic-couchbase-querying-for-sql-people/
http://blog.couchbase.com/understanding-grouplevel-view-queries-compound-keys
I think my original question might have been too vague. However, I have reached this solution:
map: function (doc, meta) {
    emit([doc.user.device.os, doc.user.id], null);
}
reduce: function (keys, values, rereduce) {
    var os = {};
    keys.forEach(function (k) { os[k] = 1; });
    return Object.keys(os).length;
}
Running this view with group=true&group_level=1 gives me what I wanted.
I'm not confident it will scale, or whether it needs to consider rereduce, however it works for my test data set.
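One alternative that sidesteps the rereduce question is to keep the same map function, use the built-in _count reduce, query with group_level=2 so there is one row per (OS, user id) pair, and count the rows per OS in the application. The sketch below simulates that locally; `groupLevel2` is a hypothetical stand-in for the view query, not a Couchbase API, and the document shape is reduced to the two fields involved.

```typescript
interface EventDoc { user: { id: string; device: { os: string } }; }
interface Row { key: [string, string]; value: number; }

// Simulates emit([os, userId], null) with _count at group_level=2:
// one row per distinct (os, userId) pair, value = number of events.
function groupLevel2(docs: EventDoc[]): Row[] {
  const counts = new Map<string, Row>();
  docs.forEach((doc) => {
    const key: [string, string] = [doc.user.device.os, doc.user.id];
    const k = JSON.stringify(key);
    const row = counts.get(k);
    if (row === undefined) {
      counts.set(k, { key, value: 1 });
    } else {
      row.value += 1;
    }
  });
  const rows: Row[] = [];
  counts.forEach((r) => rows.push(r));
  return rows;
}

// Each row is one unique device, so counting rows per OS gives the answer.
function uniqueDevicesByOs(rows: Row[]): { [os: string]: number } {
  const result: { [os: string]: number } = {};
  rows.forEach((r) => {
    result[r.key[0]] = (result[r.key[0]] || 0) + 1;
  });
  return result;
}

const docs: EventDoc[] = [
  { user: { id: "u1", device: { os: "iOS" } } },
  { user: { id: "u1", device: { os: "iOS" } } },
  { user: { id: "u2", device: { os: "iOS" } } },
  { user: { id: "u3", device: { os: "Android" } } },
];
const totals = uniqueDevicesByOs(groupLevel2(docs));
```

The trade-off is that the view returns one row per device rather than one per OS, so the final count happens client-side; with very large device counts that result set may need paging.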