Suppose I had a Couch instance full of documents like the following:
{"id":"1","parent":null},
{"id":"2","parent":"1"},
{"id":"3","parent":"1"},
{"id":"4","parent":"3"},
{"id":"5","parent":"null"},
{"id":"6","parent":"5"}
Is there a way using MapReduce to build a view that would return my documents in this format:
{
  "id": "1",
  "children": [
    {"id": "2"},
    {"id": "3", "children": [
      {"id": "4"}
    ]}
  ]
},
{
  "id": "5",
  "children": [ {"id": "6"} ]
}
My instinct says "no" because I imagine you'd need one pass for each level of the hierarchy, and items can be nested indefinitely deep.
Correct, with a map function alone this cannot be achieved. But the reduce function will have access to the whole list of rows emitted by the map function: http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Reduce_Functions
To implement that, you will need a robust reduce function that is also able to handle "rereduce" efficiently.
In the end, it might be easier to create a view that maps each document by its parent as the key. Example:
function(doc) {
  emit(doc.parent, doc._id);
}
This view lets you query the top-level documents with the key null, and sub-documents with parent ids like "1", "3" or "5".
A reduce function could be added to create a result like this:
null => [1, 5]
1 => [2, 3]
3 => [4]
5 => [6]
The tree structure you wished for is contained therein, just in a different format, and can be built from it on the client.
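For example, the grouped result above can be turned back into the nested structure on the client. A minimal Python sketch, with the dict literal standing in for the output of the grouped view query:

```python
# Parent-to-children mapping as produced by the grouped view query
# (keys are parent ids; None marks top-level documents).
children_of = {
    None: ["1", "5"],
    "1": ["2", "3"],
    "3": ["4"],
    "5": ["6"],
}

def build_tree(parent=None):
    """Recursively assemble the nested document structure."""
    nodes = []
    for doc_id in children_of.get(parent, []):
        node = {"id": doc_id}
        children = build_tree(doc_id)
        if children:
            node["children"] = children
        nodes.append(node)
    return nodes

print(build_tree())
```

Since the view already groups children under their parent, this runs one dictionary lookup per node and recurses only as deep as the actual hierarchy.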
I have a JSON document like:
{
  "best_answer": {
    "answers": {
      "a": "b",
      "c": "d"
    },
    "question": "random_question"
  },
  "blurbs": []
}
And I want to create the partition key on the "question" field (nested inside best_answer). How can I do this in the AWS Console?
The only way this is possible is to add the "question" value as a top-level attribute on the item, in this case the partition key, in addition to it being embedded in the JSON. Whether that is a good partition key remains to be seen; I cannot comment on that without knowing more about your use case and its access patterns.
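As a sketch of that duplication step in plain Python (`promote_question` is a hypothetical helper; in practice you would apply it to the item before writing it with `put_item`):

```python
import copy

def promote_question(item):
    """Copy the nested best_answer.question value to a top-level
    attribute so it can serve as the partition key."""
    flat = copy.deepcopy(item)
    flat["question"] = item["best_answer"]["question"]
    return flat

doc = {
    "best_answer": {
        "answers": {"a": "b", "c": "d"},
        "question": "random_question",
    },
    "blurbs": [],
}
print(promote_question(doc)["question"])
```

The value stays embedded in the JSON as before; the top-level copy exists only so DynamoDB can use it as a key attribute.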
I wonder if it's possible to create an index that could look like this
{
  "dispenserId": "my-dispenser-123", // primary key
  "users": ["user5", "user12"],
  "robotId": "my-robot-1",
  "enabled": true,
  "side": "left"
}
Based on my DynamoDB documents that look like this
{
  "robotId": "my-robot-1", // primary key
  "dispensers": {
    "left": "left-dispenser-123",
    "right": "right-dispenser-123",
    "users": ["user5", "user12"]
  },
  "enabled": true,
  "users": ["user1", "user32"]
}
I can't figure out how to point at either dispensers.left or dispensers.right and use that as a key, nor can I figure out how to create a side: left/right attribute based on the path of the dispenser ID.
Can this be achieved with the current structure? If not, what document structure would you suggest instead that allows me to hold the same data?
What you are trying to do (use a map element as a key attribute for an index) is not supported by DynamoDB.
The index partition key and sort key (if present) can be any base table attributes of type string, number, or binary. (Source)
You cannot use (an element of) a map attribute as a key attribute for an index, because a key attribute must be a string, number, or binary attribute from the base table.
Consider using the adjacency list design pattern for your data. It will allow you to easily add both the left and right dispensers to your index.
My new structure looks like this:
partition key: robotId
sort key: compositeKey
[
  {
    "robotId": "robot1",
    "enabled": true,
    "users": ["user1", "user3"],
    "compositeKey": "robot--robot1"
  },
  {
    "robotId": "robot1",
    "dispenserId": "dispenser1",
    "compositeKey": "dispenser--dispenser1",
    "side": "left",
    "users": ["user4", "user61"]
  }
]
Then I have an index with dispenserId as the partition key, so I can either look up the dispensers for a given robot (using the table) or look up details about a dispenser (using the index).
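To illustrate the two access patterns, here is a minimal Python sketch in which plain list filters stand in for the actual DynamoDB Query calls against the table and the index:

```python
# The adjacency-list items from above, as plain Python dicts.
items = [
    {"robotId": "robot1", "compositeKey": "robot--robot1",
     "enabled": True, "users": ["user1", "user3"]},
    {"robotId": "robot1", "compositeKey": "dispenser--dispenser1",
     "dispenserId": "dispenser1", "side": "left",
     "users": ["user4", "user61"]},
]

def dispensers_for_robot(robot_id):
    """Table query: partition key robotId,
    sort key condition begins_with('dispenser--')."""
    return [i for i in items
            if i["robotId"] == robot_id
            and i["compositeKey"].startswith("dispenser--")]

def dispenser_details(dispenser_id):
    """Index query: partition key dispenserId."""
    return [i for i in items if i.get("dispenserId") == dispenser_id]
```

The `begins_with` condition on the sort key is what lets one table partition serve both the robot record and all of its dispenser records.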
I have an ES DB storing history records from a process I run every day. Because I want to show only 20 records per page in the history (ordered by date), I was using pagination (size + from_) combined with scroll, which worked just fine. But when I wanted to use sort in the query it didn't work, and I found that scroll with sort doesn't work. Looking for an alternative, I tried the ES helper scan, which works fine for scrolling and sorting the results, but with this solution pagination doesn't seem to work, which I don't understand, since the API says that scan passes all the parameters to the underlying search function. So my question is whether there is any method to combine the three options.
When using the elasticsearch.helpers.scan function, you need to pass preserve_order=True to enable sorting.
(Tested using elasticsearch==7.5.1)
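Since scan returns an already-sorted iterator when preserve_order=True, pagination can then be done client-side by slicing that iterator. A minimal Python sketch (the index name "history" and the "date" sort field are assumptions):

```python
from itertools import islice

def page(hits, page_number, page_size=20):
    """Take one page out of an (already sorted) iterator of hits.

    Note: islice consumes the iterator, so start from a fresh
    scan iterator for each page you want to jump to."""
    start = page_number * page_size
    return list(islice(hits, start, start + page_size))

# Against a live cluster this would look like (untested sketch;
# index name and sort field are assumptions):
#
# from elasticsearch import Elasticsearch
# from elasticsearch.helpers import scan
#
# es = Elasticsearch()
# hits = scan(es, index="history",
#             query={"sort": [{"date": "desc"}]},
#             preserve_order=True)
# third_page = page(hits, 2)
```

For deep pages this still pulls every earlier hit over the wire, so for page-20-at-a-time browsing a plain search with size/from_ and sort (no scroll at all) may be the cheaper option.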
Yes, you can combine scroll with sort, but when you sort on a string field you will need to change the mapping for it to work properly, as described in the documentation:
In order to sort on a string field, that field should contain one term
only: the whole not_analyzed string. But of course we still need the
field to be analyzed in order to be able to query it as full text.
The naive approach to indexing the same string in two ways would be to
include two separate fields in the document: one that is analyzed for
searching, and one that is not_analyzed for sorting.
"tweet": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
The main tweet field is just the same as before: an analyzed full-text field.
The new tweet.raw subfield is not_analyzed.
Now, or at least as soon as we have reindexed our data, we can use the
tweet field for search and the tweet.raw field for sorting:
GET /_search
{
  "query": {
    "match": {
      "tweet": "elasticsearch"
    }
  },
  "sort": "tweet.raw"
}
I have a slightly peculiar problem with loading my tree structure into Ember.
My models are:
book.js
- parts: DS.hasMany('part', {inverse: 'book', async: true})
part.js
- subparts: DS.hasMany('part', {inverse: 'parent_part', async: true}),
With the following API responses:
GET /api/books:
{
  books: [
    {id: 1, links: {parts: "/api/books/1/parts"}},
    ...
  ]
}
GET /api/books/1/parts:
{
  parts: [
    {
      id: 1,
      subparts: [10, 11]
    },
    {
      id: 2,
      subparts: []
    }
  ]
}
The problem is in the tree nature of the parts: The book only has direct descendants id 1 and 2, but these have sub-parts on their own.
The structure as it is works, but it results in multiple sub-queries for each part that was not included in the /books/1/parts result. I want to avoid these queries, not only for performance reasons but also because I will need additional query parameters, which would get lost at this step... I know about coalesceFindRequests, but it introduces new problems.
To rephrase the problem, Ember Data thinks that every part that is included in the /books/1/parts response should be added directly to the book:parts property. How can I still load all records of the parts tree at the same time?
I tried renaming the fields, but Ember Data assigns the records based on the model name, not the field name.
I fear that some creative adapter overriding will be necessary here. Any idea appreciated. The backend is completely under my control, so I could change things on that end, too.
You need to use a process called sideloading, which should work as you expect (I've had issues in the past with sideloading data). As mentioned in this issue, you want to split your parts into two separate arrays.
{
  // These are the direct children
  "parts": [{...}, {...}],
  // These are the extra records
  "_parts": [{...}, {...}]
}
all new jsfiddle: http://jsfiddle.net/vJxvc/2/
Currently, I query an API that will return JSON like this. The API cannot be changed for now, which is why I need to work around that.
[
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]},
{"timestamp":1406111970, "values":[1273.455, 1153.577, 693.591]}
]
(could be a lot more lines, of course)
As you can see, each line has a timestamp and then an array of values. My problem is that I would actually like to transpose that. Looking at the first line alone:
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]}
It contains a few measurements taken at the same time. This would need to become the following in my Ember project:
{
  "sensor_id": 1, // can be derived from the array index
  "timestamp": 1406111961,
  "value": 1236.181
},
{
  "sensor_id": 2,
  "timestamp": 1406111961,
  "value": 1157.695
},
{
  "sensor_id": 3,
  "timestamp": 1406111961,
  "value": 698.231
}
And those values would have to be pushed into the respective sensor models.
The transformation itself is trivial, but I have no idea where I would put it in Ember and how I could alter many Ember models at the same time.
You could make your model an array and override the normalize method on your adapter. The normalize method is where you do the transformation, and since your JSON is an array, an Ember.Array as a model would work.
I am not an Ember pro, but looking at the manual I would think of something like this:
var a = [
  {"timestamp": 1406111961, "values": [1236.181, 1157.695, 698.231]},
  {"timestamp": 1406111970, "values": [1273.455, 1153.577, 693.591]}
];
var b = [];

a.forEach(function(item) {
  // forEach passes the array index as the second argument
  item.values.forEach(function(value, index) {
    b.push({
      sensor_id: index + 1, // 1-based, as in the desired output
      timestamp: item.timestamp,
      value: value
    });
  });
});
console.log(b);
Example http://jsfiddle.net/kRUV4/
Update
Just saw your jsfiddle... You can get the store like this: How to get Ember Data's "store" from anywhere in the application so that I can do store.find()?