CouchDB reduce bug when grouping? - mapreduce

I have a map/reduce query that aggregates values extracted from several documents into a single aggregated document that maps to the object structure of my client application.
My key on the outputs is a 2-element array with a dataset identifier and a dataset country segment.
When I run the reduce with 'exact' grouping, this works reliably.
The problem: as soon as I aggregate the data by the dataset identifier only, i.e. I use group level 1, some of the values are incorrect. Most of the incorrect ones are doubled, but some have other values. Is there a known issue, or is this a bug in my code?
After running my map query, I have a set of values that looks like:
{ [type]:
{ date:
{ [metric]: [value],
[more metric value combinations]
}
}
}
I have several documents and run a reduce that should join them so that I get the following structure:
{ type1:
{ date:
{ [metric]: [value],
[more metric value combinations]
},
anotherdate:
{ [metric]: [value],
[more metric value combinations]
},
},
type2:
{ date:
{ [metric]: [value],
[more metric value combinations]
}
},
}
To achieve this I use the following reduce query:
function (keys, values) {
//return values[0];
var returndoc = values[0];
for (var i = 1; i < values.length; i++) {
//Merge the current and previous object
returndoc = MergeDocs(returndoc, values[i]);
}
return returndoc;
}
function MergeDocs(doc1, doc2) {
var types = ['Live', 'Benchmark'];
for (var i = 0; i < types.length; i++) {
var t = types[i];
// if the source document does not have a Benchmark or Live column,
// create it and add values from the other document.
if (!doc1[t] && doc2[t]) {
doc1[t] = doc2[t];
}
// if the source document has a value and the other
// document exists, sum values.
else if (doc1[t] && doc2[t]) {
doc1[t] = MergeReports(doc1[t], doc2[t]);
}
}
return doc1;
}
function MergeReports(report1, report2) {
// iterate over the dates in the second report
for (var date in report2) {
// if there is no value for this date in the first report, copy it over
if (!report1[date]) {
report1[date] = report2[date];
} else {
for (var metric in report2[date]) {
if (!report1[date][metric]) {
report1[date][metric] = report2[date][metric];
} else {
report1[date][metric] =
report1[date][metric] + report2[date][metric];
}
}
}
}
return report1;
}
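One thing that may be worth ruling out (an assumption, not a confirmed diagnosis of the doubling): the reduce above writes into values[0] and, through assignments such as report1[date] = report2[date], shares nested objects between its inputs and its output. The sketch below sums everything into a fresh object and never modifies the inputs. The helper name mergeInto, the dates, and the metric name are made up for illustration.

```javascript
// Hypothetical non-mutating variant of the reduce above.
// It never writes into values[i]; every number is summed into a fresh object.
function mergeInto(target, source) {
  var types = ['Live', 'Benchmark'];
  for (var i = 0; i < types.length; i++) {
    var t = types[i];
    if (!source[t]) { continue; }
    if (!target[t]) { target[t] = {}; }
    for (var date in source[t]) {
      if (!target[t][date]) { target[t][date] = {}; }
      for (var metric in source[t][date]) {
        var previous = target[t][date][metric] || 0;
        target[t][date][metric] = previous + source[t][date][metric];
      }
    }
  }
  return target;
}

function reduce(keys, values, rereduce) {
  var result = {};
  for (var i = 0; i < values.length; i++) {
    mergeInto(result, values[i]);
  }
  return result;
}

// Made-up sample values in the shape described in the question.
var a = { Live: { "2012-01-01": { clicks: 1 } } };
var b = { Live: { "2012-01-01": { clicks: 2 }, "2012-01-02": { clicks: 5 } } };
var c = { Benchmark: { "2012-01-01": { clicks: 7 } } };

var merged = reduce(null, [a, b, c], false);
// Running it a second time on the same inputs gives the same totals,
// because the inputs were left untouched.
var mergedAgain = reduce(null, [a, b, c], false);
console.log(JSON.stringify(merged));
```

Because the output has the same shape as each input, the same function can also be applied to its own earlier results, which is what happens when previously reduced values are combined for a coarser group level.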

Related

Infragistics Filtered Row Scanning. Goal is to sum of all filtered rows based on column

I have an Infragistics UltraGrid which is filtered on different conditions. Now I want to sum a couple of columns over the filtered rows. See the following code:
UltraGridColumn column = this.ugResults.DisplayLayout.Bands[0].Columns["Air45HValue"];
double a = 0;
for (int i = 0; i < filtInCnt; i++)
{
if (ugResults.Rows[i].GetCellValue(column) != DBNull.Value)
{
a = a + Convert.ToDouble(ugResults.Rows[i].GetCellValue(column));
}
}
This does not give me the correct answer. I think the calculation is done on the original data source, not on the filtered one. How can I do the above on the filtered results?
Your code looks correct; I tested it in a small sample application. However, I do not see how you get filtInCnt, so I would suggest using ugResults.Rows.GetFilteredInNonGroupByRows() instead. This method returns all the filtered-in rows. You can then iterate over the collection with foreach and perform your calculations. Here is the code:
namespace WindowsFormsApplication1
{
using System;
using System.Data;
using System.Windows.Forms;
using Infragistics.Win;
using Infragistics.Win.UltraWinGrid;
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
this.ultraGrid1.DisplayLayout.Override.AllowRowFiltering = DefaultableBoolean.True;
this.ultraGrid1.DisplayLayout.Override.FilterUIType = FilterUIType.FilterRow;
this.ultraGrid1.DataSource = GetDataTable();
}
private DataTable GetDataTable(int rows = 100)
{
DataTable table = new DataTable("Table1");
table.Columns.Add("Integer column", typeof(int));
table.Columns.Add("DateTime column", typeof(DateTime));
table.Columns.Add("String column", typeof(string));
for (int i = 0; i < rows; i++)
{
DataRow row = table.NewRow();
row["Integer column"] = i;
row["String column"] = "text";
row["DateTime column"] = DateTime.Today.AddDays(i);
table.Rows.Add(row);
}
return table;
}
private void ultraButton1_Click(object sender, EventArgs e)
{
UltraGridColumn column = this.ultraGrid1.DisplayLayout.Bands[0].Columns["Integer column"];
double a = 0;
foreach(var row in this.ultraGrid1.Rows.GetFilteredInNonGroupByRows())
{
var cellValue = row.GetCellValue(column);
if(cellValue != DBNull.Value)
{
a = a + Convert.ToDouble(cellValue);
}
}
MessageBox.Show("Result: " + a);
}
}
}

Strange issue on reduce function

I keep running into reduce issues while practicing. With the following map and reduce, if I add more documents, or add more fields to the emitted value as shown below, the view returns its values as []. I'm not sure what causes this.
problemNumber : doc.problemNumber,
UserId: idv,
event : doc.event
......
Map
function(doc) {
if(doc.event){
var idv = null;
for (var idu in doc.Data.users){
if (doc.eventData.users[idu].userTypeCode == "M"){
idv = doc.Data.users[idu].UserId;
}
}
var newDoc = {
problemNumber : doc.problemNumber,
UserId: idv,
event : doc.event
};
emit(null, newDoc);
}
}
Reduce
function(keys, values, rereduce) {
var result = [];
var closeMap = {};
for (var i=0; i<values.length; i++){
var doc = values[i];
if (doc.event=='CLOSE'){
closeMap[doc.problemNumber] = 1;
}
}
for (var i=0; i<values.length; i++){
var doc = values[i];
if (doc.event=='OPEN'){
if (closeMap[doc.problemNumber]){
doc.event = 'CLOSE';
}
result.push(doc);
}
}
return result;
}
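A possible explanation, offered as a sketch and not a confirmed diagnosis: once there is enough data, CouchDB calls the reduce function again on its own earlier outputs, with rereduce set to true. The reduce above ignores that flag. On such a call each element of values is an array of documents, not a document, so doc.event is undefined and nothing is pushed. The snippet below reproduces this outside CouchDB by feeding the function its own results; the three sample rows are made up.

```javascript
// The reduce from the question, unchanged in logic.
function reduce(keys, values, rereduce) {
  var result = [];
  var closeMap = {};
  for (var i = 0; i < values.length; i++) {
    var doc = values[i];
    if (doc.event == 'CLOSE') {
      closeMap[doc.problemNumber] = 1;
    }
  }
  for (var j = 0; j < values.length; j++) {
    var current = values[j];
    if (current.event == 'OPEN') {
      if (closeMap[current.problemNumber]) {
        current.event = 'CLOSE';
      }
      result.push(current);
    }
  }
  return result;
}

// Made-up mapped values.
var rows = [
  { problemNumber: 1, UserId: "u1", event: "OPEN" },
  { problemNumber: 1, UserId: "u1", event: "CLOSE" },
  { problemNumber: 2, UserId: "u2", event: "OPEN" }
];

// First pass: two chunks of rows are reduced separately.
var partA = reduce(null, rows.slice(0, 2), false);
var partB = reduce(null, rows.slice(2), false);

// Rereduce pass: the inputs are now the arrays returned above.
var combined = reduce(null, [partA, partB], true);

console.log(JSON.stringify(partA));
console.log(JSON.stringify(combined));
```

A fix would need a rereduce branch that treats each element of values as an array. Since this reduce returns a list that grows with the input, a map-only view may be the simpler route.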

Couch base map reduce query

I have 3 JSON documents:
Document 1
"No" : 1
"City" : "Patiala"
"Value" : 10
Document 2
"No" : 1
"City" : "Delhi"
"Value" : 11
Document 3
"No" : 1
"City" : "Patiala"
"Value" : 11
I want output like
1 <Delhi or Patiala any one city> 32
I tried query with group level 2
map
function(doc, meta)
{
emit(doc.No,[doc.Value,doc.City]);
}
reduce
function(key,values,rereduce){
if(!rereduce){
var sum=0;
var s=[];
var v=[];
v=values[1];
s=values[0];
for(i=0;i<s.length;++i){
sum+= s[i];
}
return (sum,v[0]);
}else{
var sum=0;
var s=[];
var v;
v=values[1];
s=values[0];
for(i=0;i<s.length;++i){
sum+= s[i];
}
return (sum,v);
}
}
and got the following error
(Reducer: Error building index for view `my_first_view`, reason: TypeError: Cannot read property 'length' of null)
I only want to do group by on 'No' field but display any city.
The CouchDB documentation generally warns against abusing reduce functions by doing this kind of thing, so it's probably worth testing out with datasets of a size you are expecting in production.
You are probably best off using a simple view with the map function you have and a reduce function sum on the doc.Value. Call this twice, once with query params ?reduce=false&key=1&limit=1 and once again with ?group=true&key=1.
Having said all that, and this may kill your performance, this will do what you want in a single query.
Map Function:
function (doc) {
emit(doc.No,[doc.City, doc.Value]);
}
Reduce Function:
function (keys, values, rereduce) {
var city;
var sum = 0;
for (var i = 0; i<values.length; i++){
if (!city){
city = values[i][0];
}
sum = sum + values[i][1];
}
return [city, sum];
}
Query URL:
http://host:5984/db/_design/views/_view/view?group=true&key=1
Gives Result:
{"rows":[
{"key":1,"value":["Patiala",32]}
]}
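To see why this reduce also holds up when it is re-run on its own output, here is a small simulation outside the database. It assumes the three documents from the question arrive in the order shown there; the split into two chunks is arbitrary and only for illustration.

```javascript
// The reduce function from the answer above.
function reduce(keys, values, rereduce) {
  var city;
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    if (!city) {
      city = values[i][0];
    }
    sum = sum + values[i][1];
  }
  return [city, sum];
}

// Values emitted by the map function for key 1: [City, Value].
var mapped = [["Patiala", 10], ["Delhi", 11], ["Patiala", 11]];

// Reducing everything in one call.
var single = reduce(null, mapped, false);

// Reducing in two chunks and then combining the partial results.
// Each partial result is again [city, sum], the same shape as a mapped value,
// so the same loop works on the rereduce call.
var partial1 = reduce(null, mapped.slice(0, 2), false);
var partial2 = reduce(null, mapped.slice(2), false);
var combined = reduce(null, [partial1, partial2], true);

console.log(JSON.stringify(single));
console.log(JSON.stringify(combined));
```

Both paths give the same total, which is the property a reduce function needs to be safe under rereduce.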

What's the best way to build an aggregate document in couchdb?

Alright SO users. I am trying to learn and use CouchDB. I have the StackExchange data export loaded as document per row from the XML file, so the documents in couch look basically like this:
//This is a representation of a question:
{
"Id" : "1",
"PostTypeId" : "1",
"Body" : "..."
}
//This is a representation of an answer
{
"Id" : "1234",
"ParentId" : "1",
"PostTypeId" : "2",
"Body" : "..."
}
(Please ignore the fact that the import of these documents basically treated all the attributes as text, I understand that using real numbers, bools, etc. could yield better space/processing efficiency.)
What I'd like to do is to map this into a single aggregate document:
Here's my map:
function(doc) {
if(doc.PostTypeId === "2"){
emit(doc.ParentId, doc);
}
else{
emit(doc.Id, doc);
}
}
And here's the reduce:
function(keys, values, rereduce){
var retval = {question: null, answers : []};
if(rereduce){
for(var i in values){
var current = values[i];
retval.answers = retval.answers.concat(current.answers);
if(retval.question === null && current.question !== null){
retval.question = current.question;
}
}
}
else{
for(var i in values){
var current = values[i];
if(current.PostTypeId === "2"){
retval.answers.push(current);
}
else{
retval.question = current;
}
}
}
return retval;
}
Theoretically, this would yield a document like this:
{
"question" : {...},
"answers" : [answer1, answer2, answer3]
}
But instead I am getting the standard "does not reduce fast enough" error.
Am I using Map-Reduce incorrectly, is there a well-established pattern for how to accomplish this in CouchDb?
(Please also note that I would like a response with the complete documents, where the question is the "parent" and the answers are the "children", not just the Ids.)
So, the "right" way to accomplish what I'm trying to do above is to add a "list" function to my design document. (The end I am trying to achieve appears to be referred to as "collating documents".)
At any rate, you can configure your map however you like and combine it with a "list" function in the same design document.
To solve the above question, I eliminated my reduce (leaving only a map function) and then added a list function like the following:
{
"_id": "_design/posts",
"_rev": "11-8103b7f3bd2552a19704710058113b32",
"language": "javascript",
"views": {
"by_question_id": {
"map": "function(doc) {
if(doc.PostTypeId === \"2\"){
emit(doc.ParentId, doc);
}
else{
emit(doc.Id, doc);
}
}"
}
},
"lists": {
"aggregated": "function(head, req){
start({\"headers\": {\"Content-Type\": \"text/json\"}});
var currentRow = null;
var currentObj = null;
var retval = [];
while(currentRow = getRow()){
if(currentObj === null || currentRow.key !== currentObj.key){
currentObj = {key: currentRow.key, question : null, answers : []};
retval.push(currentObj);
}
if(currentRow.value.PostTypeId === \"2\"){
currentObj.answers.push(currentRow.value);
}
else{
currentObj.question = currentRow.value;
}
}
send(toJSON(retval));
}"
}
}
So, after you have some elements loaded up, you can access them like so:
http://localhost:5984/<db>/_design/posts/_list/aggregated/by_question_id?<standard view limiters>
I hope this saves people some time.
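For anyone who wants to try the grouping logic without a running CouchDB, here is a minimal sketch of the same loop as a plain function. It assumes the rows arrive sorted by key, as a view returns them; the function name collate and the sample rows are made up.

```javascript
// Groups consecutive view rows that share a key into
// { key, question, answers } objects, like the list function above.
function collate(rows) {
  var currentObj = null;
  var retval = [];
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i];
    // Start a new aggregate whenever the key changes.
    if (currentObj === null || row.key !== currentObj.key) {
      currentObj = { key: row.key, question: null, answers: [] };
      retval.push(currentObj);
    }
    if (row.value.PostTypeId === "2") {
      currentObj.answers.push(row.value);
    } else {
      currentObj.question = row.value;
    }
  }
  return retval;
}

// Made-up rows in the shape the by_question_id view would emit.
var rows = [
  { key: "1", value: { Id: "1", PostTypeId: "1" } },
  { key: "1", value: { Id: "1234", ParentId: "1", PostTypeId: "2" } },
  { key: "2", value: { Id: "2", PostTypeId: "1" } }
];

var collated = collate(rows);
console.log(JSON.stringify(collated));
```

The comparison with !== relies on the keys being strings, as they are in this data set.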

How do you sort results of a _View_ by value in Couchbase?

From what I understand, in Couchbase one can sort by key using
descending=true
but in my case I want to sort by value instead. Consider Twitter data in JSON format; my question is: who is the most mentioned user?
Each tweet has the structure of:
{
"text": "",
"entities" : {
"hashtags" : [ ... ],
"user_mentions" : [ ...],
"urls" : [ ... ]
}
}
So having used MongoDB before I reused the Map function and modified it slightly to be usable in Couchbase as follows:
function (doc, meta) {
if (!doc.entities) { return; }
doc.entities.user_mentions.forEach(
function(mention) {
if (mention.screen_name !== undefined) {
emit(mention.screen_name, null);
}
}
)
}
And then I used the built-in reduce function _count to count all the screen_name occurrences. Now my problem is: how do I sort by the count values rather than by the key?
Thanks
The short answer is that you cannot sort the result of your view by value. You can only sort by key.
Some workarounds would be to either:
analyze the data before inserting it into Couchbase and create a counter for the values you are interested in (mentions in your case)
use the view you have and sort on the application side, if the size of the view result is acceptable for a client-side sort.
The following JS code calls a view, sorts the result, and prints the 10 hottest subjects (hashtags):
var http = require('http');
var options = {
host: '127.0.0.1',
port: 8092,
path: '/social/_design/dev_tags/_view/tags?full_set=true&connection_timeout=60000&group=true',
method: 'GET'
};
http.request(
options,
function(res) {
// Collect the response body as a string.
var body = '';
res.setEncoding('utf8');
res.on('data', function(chunk) {
body += chunk;
});
res.on('end', function() {
var tweets = JSON.parse(body);
var rows = tweets.rows;
// Sort by reduced value (the count), highest first.
rows.sort(function (a, b) { return b.value - a.value; });
// Print at most the 10 highest rows.
for (var i = 0; i < Math.min(10, rows.length); i++) {
console.log(rows[i]);
}
});
}
).end();
In the meantime, I am looking at other options to achieve this.
I solved this by using a compound key.
function (doc, meta) {
emit([doc.constraint, doc.yoursortvalue], null);
}
URL elements (with descending=true the range is read from high to low, so startkey is the upper bound):
&startkey=["jim",10]&endkey=["jim",5]&descending=true
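As a rough illustration of how a compound-key range behaves, here is a sketch that mimics the query in plain JavaScript. With descending=true the view is walked from high keys to low keys, so in this sketch startkey is the upper bound and endkey the lower bound. The comparator is a simplification that only handles [string, number] keys and is not the server's real collation; the function names and sample rows are made up.

```javascript
// Simplified comparison for keys of the form [string, number].
function compareKeys(a, b) {
  if (a[0] !== b[0]) {
    return a[0] < b[0] ? -1 : 1;
  }
  return a[1] - b[1];
}

// Mimics ?startkey=...&endkey=...&descending=true on a list of rows:
// keep rows with endkey <= key <= startkey, highest key first.
function queryDescending(rows, startkey, endkey) {
  return rows
    .filter(function (row) {
      return compareKeys(row.key, startkey) <= 0 &&
             compareKeys(row.key, endkey) >= 0;
    })
    .sort(function (x, y) {
      return compareKeys(y.key, x.key);
    });
}

// Made-up rows as emitted by emit([doc.constraint, doc.yoursortvalue], null).
var rows = [
  { key: ["jim", 3] },
  { key: ["jim", 7] },
  { key: ["amy", 8] },
  { key: ["jim", 10] },
  { key: ["jim", 5] },
  { key: ["jim", 12] }
];

var result = queryDescending(rows, ["jim", 10], ["jim", 5]);
var resultKeys = result.map(function (row) { return row.key; });
console.log(JSON.stringify(resultKeys));
```

Note that this sorts by a value stored in each document. It does not sort by a count produced by a reduce function, which still has to be done on the client.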