Couchbase map reduce query - mapreduce

I have 3 JSON documents:
Document 1
"No" : 1
"City" : "Patiala"
"Value" : 10
Document 2
"No" : 1
"City" : "Delhi"
"Value" : 11
Document 3
"No" : 1
"City" : "Patiala"
"Value" : 11
I want output like
1 <Delhi or Patiala any one city> 32
I tried a query with group level 2:
map
function(doc, meta)
{
emit(doc.No,[doc.Value,doc.City]);
}
reduce
function(key,values,rereduce){
if(!rereduce){
var sum=0;
var s=[];
var v=[];
v=values[1];
s=values[0];
for(i=0;i<s.length;++i){
sum+= s[i];
}
return (sum,v[0]);
}else{
var sum=0;
var s=[];
var v;
v=values[1];
s=values[0];
for(i=0;i<s.length;++i){
sum+= s[i];
}
return (sum,v);
}
}
and got the following error
(Reducer: Error building index for view `my_first_view`, reason: TypeError: Cannot read property 'length' of null)
I only want to do group by on 'No' field but display any city.

The CouchDB documentation generally warns against abusing reduce functions this way, so it is worth testing with a dataset of the size you expect in production.
You are probably best off with a simple view that keeps your map function and uses a reduce function that sums doc.Value. Query it twice: once with ?reduce=false&key=1&limit=1 to fetch a city, and once with ?group=true&key=1 to fetch the sum.
Having said all that, and although it may hurt performance, the following does what you want in a single query.
Map Function:
function (doc) {
emit(doc.No,[doc.City, doc.Value]);
}
Reduce Function:
function (keys, values, rereduce) {
var city;
var sum = 0;
for (var i = 0; i<values.length; i++){
if (!city){
city = values[i][0];
}
sum = sum + values[i][1];
}
return [city, sum];
}
Query URL:
http://host:5984/db/_design/views/_view/view?group=true&key=1
Gives Result:
{"rows":[
{"key":1,"value":["Patiala",32]}
]}
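The reduce function above returns a value with the same [city, sum] shape as the values it receives, which is why it can also be applied to its own partial results. A minimal sketch for checking this locally, outside the database: the `mapped` array and the way it is split into two batches are assumptions made for illustration, not how the server necessarily partitions the rows.

```javascript
// Reduce function from the answer, with unchanged behavior.
function reduce(keys, values, rereduce) {
  var city;
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    if (!city) {
      city = values[i][0];
    }
    sum = sum + values[i][1];
  }
  return [city, sum];
}

// Values the map function would emit for key 1 (taken from the three documents).
var mapped = [["Patiala", 10], ["Delhi", 11], ["Patiala", 11]];

// Case 1: all values reduced in a single call.
var single = reduce(null, mapped, false);

// Case 2: two partial reduces whose results are reduced again (rereduce).
var partA = reduce(null, mapped.slice(0, 2), false);
var partB = reduce(null, mapped.slice(2), false);
var combined = reduce(null, [partA, partB], true);

console.log(single, combined);
```

Both paths should agree on the sum; which city is reported depends only on which value happens to come first.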

Related

AWS Lambda function to scan/query DynamoDB table using array values as FilterExpression

Here's my case: I'm trying to query a table (HCI.LocCatApp) using a value sent by the API in the KeyConditionExpression, and I'm storing the results (which must be numbers, not strings) in an array. I want to use each value from this array in a FilterExpression to scan another table (HCI.Category). So what I need is to loop over the array values, use each of them in the FilterExpression, and perform the scan operation. I'm currently trying to use IN, but I'm not sure whether it's even supported.
Keep in mind that the array is filled at runtime, and that the callback can be called only once.
Here's my code:
'use strict'
var AWS = require('aws-sdk');
var mydocumentClient = new AWS.DynamoDB.DocumentClient();
exports.handler = function (event, context, callback) {
var params = {
TableName: 'HCI.LocCatApp',
KeyConditionExpression : 'LocID = :lid',
ExpressionAttributeValues: {
":lid": event.LocID
},
ProjectionExpression: 'CatID'
};
var catIDs = [];
var catIDsObject = {};
var index = 0;
mydocumentClient.query(params, function (err, data){
if (err) {
callback(err, null);
}else{
data.Items.forEach(function(item){catIDs.push(item.CatID)});
//callback(null, catIDs);
}
})
catIDs.forEach(function(value){
index ++;
var catIDsKey = ":catID"+index;
catIDsObject[catIDsKey] = value;
})
var params2 = {
TableName: 'HCI.Category',
FilterExpression : "CatID IN (:cIDs)",
ExpressionAttributeValues : {
':cIDs' : catIDs
}
};
mydocumentClient.scan(params2, function (err, data){
if (err) {
callback(err, null);
}else{
callback(null, data);
}
})
}
For some reason, the current code runs successfully but doesn't find any matches. Even if I fill in the values in the array manually, there are still no results; the IN operation doesn't seem to work.
Many thanks in advance.
In your code, catIDs is an array of IDs (strings probably).
When you pass it to FilterExpression, you are assuming that it will be converted a) to a string and b) to a string in the correct format.
FilterExpression : "CatID IN (:cIDs)",
ExpressionAttributeValues : {
':cIDs' : catIDs
}
I cannot try this myself at the moment, but I'm assuming this is where the query fails. The IN operator expects a comma-separated list of values to compare against, in parentheses. So, after the array is inserted into the query, it should look like this
FilterExpression : "CatID IN (cat1, cat2, cat3)",
But most probably it contains an extra set of [ and ], and the array-to-string conversion may even turn it into something like [object Object].
One solution would be to use Array.join to concatenate all the elements of the array into a single string before passing it to FilterExpression. Something like this
FilterExpression : "CatID IN (:cIDs)",
ExpressionAttributeValues : {
':cIDs' : catIDs.join()
}
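An alternative worth trying, offered as a sketch rather than a tested fix: as far as I know, DynamoDB treats each expression attribute value as a single operand, so a joined string would be compared as one string instead of being expanded into a list. The IN operator can instead be given one placeholder per value, which is what the catIDsObject loop in the question was starting to build. The helper name `buildInFilter` and the `:v1`, `:v2` placeholder names are hypothetical.

```javascript
// Build a FilterExpression of the form "Attr IN (:v1, :v2, ...)" together
// with the matching ExpressionAttributeValues map.
// buildInFilter is a hypothetical helper, not part of the AWS SDK.
function buildInFilter(attributeName, values) {
  var placeholders = [];
  var attributeValues = {};
  values.forEach(function (value, i) {
    var placeholder = ':v' + (i + 1);
    placeholders.push(placeholder);
    attributeValues[placeholder] = value;
  });
  return {
    FilterExpression: attributeName + ' IN (' + placeholders.join(', ') + ')',
    ExpressionAttributeValues: attributeValues
  };
}

// Example with hard-coded IDs standing in for the query results.
var filter = buildInFilter('CatID', [3, 7, 12]);
console.log(filter.FilterExpression);
console.log(filter.ExpressionAttributeValues);
```

The two returned properties could be merged into params2 next to TableName. Note also that in the question's code the scan is issued before the query callback has run, so catIDs is still empty at that point; the scan would need to be started from inside the query callback. An empty array would produce an invalid "IN ()" expression, so that case should be handled before scanning.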

CouchDB reduce bug when grouping?

I have a map reduce query that aggregates values extracted from several documents into a single aggregated document that maps to the object structure of my client application.
My key on the outputs is a 2-element array with a dataset identifier and a dataset country segment.
When I run the reduce on 'exact' grouping this works reliably.
The problem: as soon as I aggregate the data by the dataset identifier only, i.e. I use group level 1, some of the values are incorrect. Most of the incorrect ones are doubled, but some have other values. Is there a known issue, or is this a bug in my code?
After running my map query, I have a set of values that looks like:
{ [type]:
{ date:
{ [metric]: [value],
[more metric value combinations]
}
}
}
I have several documents and run a reduce that should join them so that I get the following structure:
{ type1:
{ date:
{ [metric]: [value],
[more metric value combinations]
},
anotherdate:
{ [metric]: [value],
[more metric value combinations]
},
},
type2:
{ date:
{ [metric]: [value],
[more metric value combinations]
}
},
}
To achieve this I use the following reduce query:
function (keys, values) {
//return values[0];
var returndoc = values[0];
for (var i = 1; i < values.length; i++) {
//Merge the current and previous object
returndoc = MergeDocs(returndoc, values[i]);
}
return returndoc;
}
function MergeDocs(doc1, doc2) {
var types = ['Live', 'Benchmark'];
for (var i = 0; i < types.length; i++) {
var t = types[i];
// if the source document does not have a Benchmark or Live column,
// create it and add the values from the other document.
if (!doc1[t] && doc2[t]) {
doc1[t] = doc2[t];
}
// if the source document has a value and the other
// document exists, sum values.
else if (doc1[t] && doc2[t]) {
doc1[t] = MergeReports(doc1[t], doc2[t]);
}
}
return doc1;
}
function MergeReports(report1, report2) {
// iterate over the dates in the second report
for (var date in report2) {
// if there is no value for this date yet, take it from the second report
if (!report1[date]) {
report1[date] = report2[date];
} else {
for (var metric in report2[date]) {
if (!report1[date][metric]) {
report1[date][metric] = report2[date][metric];
} else {
report1[date][metric] =
report1[date][metric] + report2[date][metric];
}
}
}
}
return report1;
}
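One thing that may be worth ruling out, though it is not a confirmed diagnosis: MergeDocs and MergeReports modify values[0] in place and assign sub-objects of one input directly into another, so the inputs of the reduce are changed as a side effect. A minimal sketch of a merge that builds a fresh result and leaves its inputs untouched is shown below. The lower-case function names, the sample documents `a` and `b`, and the metric name `clicks` are assumptions made for illustration.

```javascript
// Merge two reports ({date: {metric: number}}) into a new object.
function mergeReports(report1, report2) {
  var out = {};
  [report1, report2].forEach(function (report) {
    for (var date in report) {
      if (!out[date]) {
        out[date] = {};
      }
      for (var metric in report[date]) {
        out[date][metric] = (out[date][metric] || 0) + report[date][metric];
      }
    }
  });
  return out;
}

// Merge the Live and Benchmark sections of two documents into a new object.
function mergeDocs(doc1, doc2) {
  var out = {};
  ['Live', 'Benchmark'].forEach(function (t) {
    if (doc1[t] || doc2[t]) {
      out[t] = mergeReports(doc1[t] || {}, doc2[t] || {});
    }
  });
  return out;
}

// Same shape for reduce and rereduce: fold all values into one document.
function reduce(keys, values) {
  return values.reduce(function (acc, doc) {
    return mergeDocs(acc, doc);
  }, {});
}

// Hypothetical sample values.
var a = { Live: { '2020-01-01': { clicks: 1 } } };
var b = {
  Live: { '2020-01-01': { clicks: 2 }, '2020-01-02': { clicks: 5 } },
  Benchmark: { '2020-01-01': { clicks: 4 } }
};
var merged = reduce(null, [a, b]);
console.log(JSON.stringify(merged));
```

If the doubled values disappear with a non-mutating merge, the in-place modification was likely involved; if they persist, the cause lies elsewhere.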

What's the best way to build an aggregate document in couchdb?

Alright SO users. I am trying to learn and use CouchDB. I have the StackExchange data export loaded as one document per row of the XML file, so the documents in Couch look basically like this:
//This is a representation of a question:
{
"Id" : "1",
"PostTypeId" : "1",
"Body" : "..."
}
//This is a representation of an answer
{
"Id" : "1234",
"ParentId" : "1",
"PostTypeId" : "2",
"Body" : "..."
}
(Please ignore the fact that the import of these documents basically treated all the attributes as text, I understand that using real numbers, bools, etc. could yield better space/processing efficiency.)
What I'd like to do is to map this into a single aggregate document:
Here's my map:
function(doc) {
if(doc.PostTypeId === "2"){
emit(doc.ParentId, doc);
}
else{
emit(doc.Id, doc);
}
}
And here's the reduce:
function(keys, values, rereduce){
var retval = {question: null, answers : []};
if(rereduce){
for(var i in values){
var current = values[i];
retval.answers = retval.answers.concat(current.answers);
if(retval.question === null && current.question !== null){
retval.question = current.question;
}
}
}
else{
for(var i in values){
var current = values[i];
if(current.PostTypeId === "2"){
retval.answers.push(current);
}
else{
retval.question = current;
}
}
}
return retval;
}
Theoretically, this would yield a document like this:
{
"question" : {...},
"answers" : [answer1, answer2, answer3]
}
But instead I am getting the standard "does not reduce fast enough" error.
Am I using Map-Reduce incorrectly, is there a well-established pattern for how to accomplish this in CouchDb?
(Please also note that I would like a response with the complete documents, where the question is the "parent" and the answers are the "children", not just the Ids.)
So, the "right" way to accomplish what I'm trying to do above is to add a "list" function to my design document. (The end result I am trying to achieve appears to be referred to as "collating documents".)
At any rate, you can configure your map however you like and combine it with a "list" function in the same design document.
To solve the above question, I eliminated my reduce (leaving only a map function) and then added a list function like the following:
{
"_id": "_design/posts",
"_rev": "11-8103b7f3bd2552a19704710058113b32",
"language": "javascript",
"views": {
"by_question_id": {
"map": "function(doc) {
if(doc.PostTypeId === \"2\"){
emit(doc.ParentId, doc);
}
else{
emit(doc.Id, doc);
}
}"
}
},
"lists": {
"aggregated": "function(head, req){
start({\"headers\": {\"Content-Type\": \"text/json\"}});
var currentRow = null;
var currentObj = null;
var retval = [];
while(currentRow = getRow()){
if(currentObj === null || currentRow.key !== currentObj.key){
currentObj = {key: currentRow.key, question : null, answers : []};
retval.push(currentObj);
}
if(currentRow.value.PostTypeId === \"2\"){
currentObj.answers.push(currentRow.value);
}
else{
currentObj.question = currentRow.value;
}
}
send(toJSON(retval));
}"
}
}
So, after you have some elements loaded up, you can access them like so:
http://localhost:5984/<db>/_design/posts/_list/aggregated/by_question_id?<standard view limiters>
I hope this saves people some time.
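The grouping logic of the list function can be tried outside CouchDB. The sketch below mirrors the while loop above as a plain function over an array of view rows; it assumes the rows arrive sorted by key, as a view returns them. The function name `collate` and the sample rows are hypothetical.

```javascript
// Group consecutive view rows with the same key into
// {key, question, answers} objects, as the list function does.
function collate(rows) {
  var result = [];
  var current = null;
  rows.forEach(function (row) {
    // Start a new group whenever the key changes.
    if (current === null || row.key !== current.key) {
      current = { key: row.key, question: null, answers: [] };
      result.push(current);
    }
    if (row.value.PostTypeId === "2") {
      current.answers.push(row.value);
    } else {
      current.question = row.value;
    }
  });
  return result;
}

// Hypothetical rows, already sorted by key.
var rows = [
  { key: "1", value: { Id: "1", PostTypeId: "1" } },
  { key: "1", value: { Id: "1234", ParentId: "1", PostTypeId: "2" } },
  { key: "1", value: { Id: "1240", ParentId: "1", PostTypeId: "2" } },
  { key: "2", value: { Id: "2", PostTypeId: "1" } }
];
var collated = collate(rows);
console.log(JSON.stringify(collated, null, 2));
```

Because the grouping only compares each row with the previous one, it relies on the view's key ordering and never needs to hold more than the current group's state while iterating.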

How do you sort results of a _View_ by value in Couchbase?

So from what I understand, in Couchbase one can sort by key using
descending=true
but in my case I want to sort by value instead. Consider Twitter data in JSON format; my question is: what is the most popular user mentioned?
Each tweet has the structure of:
{
"text": "",
"entities" : {
"hashtags" : [ ... ],
"user_mentions" : [ ...],
"urls" : [ ... ]
}
}
So having used MongoDB before I reused the Map function and modified it slightly to be usable in Couchbase as follows:
function (doc, meta) {
if (!doc.entities) { return; }
doc.entities.user_mentions.forEach(
function(mention) {
if (mention.screen_name !== undefined) {
emit(mention.screen_name, null);
}
}
)
}
And then I used the built-in reduce function _count to count all the screen_name occurrences. Now my problem is: how do I sort by the count values rather than by the key?
Thanks
The short answer is that you cannot sort the result of your view by value. You can only sort by key.
Some workarounds would be to either:
analyze the data before inserting it into Couchbase and create a counter for the values you are interested in (mentions in your case)
use the view you have and sort on the application side, if the size of the view result is acceptable for a client-side sort.
The following JS code calls a view, sorts the result, and prints the 10 hottest subjects (hashtags):
var http = require('http');
var options = {
host: '127.0.0.1',
port: 8092,
path: '/social/_design/dev_tags/_view/tags?full_set=true&connection_timeout=60000&group=true',
method: 'GET'
}
http.request(
options,
function(res) {
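// Accumulate the body as a UTF-8 string; new Buffer() is deprecated.
var body = '';
res.setEncoding('utf8');
res.on('data', function(chunk) {
body += chunk;
});
res.on('end', function() {
var rows = JSON.parse(body).rows;
// Sort by the reduced value (the count), highest first.
rows.sort(function (a, b) { return b.value - a.value; });
// Print at most the 10 top rows.
for (var i = 0; i < Math.min(10, rows.length); i++) {
console.log(rows[i]);
}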
});
}
).end();
In the meantime, I am looking at other options to achieve this.
I solved this by using a compound key.
function (doc, meta) {
emit([doc.constraint,doc.yoursortvalue]);
}
URL elements (with descending=true the range is read from high to low, so startkey is the upper bound):
&startkey=["jim",10]&endkey=["jim",5]&descending=true
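The key parameters of a view query are JSON values, so array keys have to be JSON-encoded and then URL-encoded before they go into the query string. A minimal sketch of building such a query string follows; the helper name `viewQuery` is hypothetical, and the example keeps the compound key from the answer, with startkey as the upper bound because the range is traversed from high to low when descending=true.

```javascript
// Build a view query string: each parameter value is JSON-encoded,
// then URL-encoded. viewQuery is a hypothetical helper.
function viewQuery(params) {
  return Object.keys(params).map(function (name) {
    return name + '=' + encodeURIComponent(JSON.stringify(params[name]));
  }).join('&');
}

// Rows for "jim" with a sort value between 5 and 10, highest first.
var qs = viewQuery({
  startkey: ['jim', 10],
  endkey: ['jim', 5],
  descending: true
});
console.log(qs);
```

The resulting string can be appended after the "?" of the view URL.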

Using RegExp in map function

I'd like to MapReduce data I have in MongoDB.
The data is like this:
{
type: 'DOMcheck',
category: 'Bad label name',
url: 'http://s1.app.int/part/module/doc/2'
...
}
Now I'd like to collect all logs and count unique entries by the /part/module part of the URL.
I created the map function:
function() {
var re = new RegExp(/^(http:\/\/[\w\.]*)(\/[\w]*\/[\w]*)/),
u = [];
u = this.url.match(re);
emit(u[2], 1);
}
and reduce function:
function(key, val) {
var sum = 0;
for(var i in val) sum += val[i];
return sum;
}
and call MapReduce: res = db.logs.mapReduce(m, r, {query: {type:"DOMcheck", category: /bad/i}})
But I have an error:
uncaught exception: map reduce failed: {
"assertion" : "map invoke failed: JS Error: TypeError: u has no properties nofile_b:3",
"assertionCode" : 9014,
"errmsg" : "db assertion failure",
"ok" : 0
}
What's wrong with the map function here? If I emit(this.url, 1), the map works just fine...
Looks like your regex isn't matching against the URL of at least one document. In that case match() returns null, which is why u has no properties.
You can set an invalid-record percentage to skip the desired number of invalid records without failing the job, and you can add a counter for invalid records.
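A minimal sketch of guarding the map function against URLs the regex does not match. With the pattern from the question, a URL that uses https, or whose host contains a hyphen or a port, would not match, and match() would return null. The '__unmatched__' key is a hypothetical name used here to count such records instead of failing, and the `emit` defined below is only a local stand-in so the function can be tried outside MongoDB.

```javascript
// Map function with a null check around the regex match.
function map() {
  var re = /^(http:\/\/[\w\.]*)(\/[\w]*\/[\w]*)/;
  var u = this.url ? this.url.match(re) : null;
  if (u === null) {
    // Count records that do not match under a hypothetical key.
    emit('__unmatched__', 1);
    return;
  }
  emit(u[2], 1);
}

// Local stand-in for MongoDB's emit(), for testing outside the database only.
var emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

map.call({ url: 'http://s1.app.int/part/module/doc/2' });
map.call({ url: 'https://s1.app.int/part/module/doc/2' });
map.call({ url: 'http://my-host.int/part/module' });
map.call({});
console.log(emitted);
```

When the function is passed to db.logs.mapReduce, the local emit stand-in and the map.call lines would be left out; the existing reduce function sums the '__unmatched__' counts like any other key.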