RavenDB index with map reduce distinct [closed] - mapreduce

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 9 years ago.
I need a slightly complex index over a RavenDB document.
Given a document definition like this:
public class PriceDocument
{
public string Id { get; set; }
public Guid PriceId { get; set; }
public decimal Price { get; set; }
public DateTime? PricingDate { get; set; }
public string Source { get; set; }
public int Version { get; set; }
}
I need to get all the documents for a given productId (I can filter on that when I query)
that are unique by Source, keeping only the one with the latest PricingDate.
So given the following data:
var priceDocument = new PriceDocument {
Price = 1m,
Id = productId + "/1",
PricingDate = new DateTime(2011, 4, 1, 8, 0, 0),
PriceId = productId,
Source = "Bloomberg",
Version = 1
};
var priceDocument1 = new PriceDocument {
Price = 1m,
Id = productId + "/2",
PricingDate = new DateTime(2011, 4, 1,9,0,0),
PriceId = productId,
Source = "Bloomberg",
Version = 1
};
I should get priceDocument1 as the result, since it's the latest.
So far I have an index defined like this:
Map = docs =>
from priceDocument in docs
select new {
PricingDate = priceDocument.PricingDate,
PricingSource = priceDocument.Source,
Price = priceDocument.Price,
PriceId = priceDocument.PriceId
};
Reduce = results =>
from result in results
group result by new { result.PricingDate, result.Source } into price
select new {
PricingDate = price.Max(p => price.Key.PricingDate),
PricingSource = price.Key.Source,
};
But it doesn't work at run time. I'm getting: AbstractIndexingExecuter||8||Failed to index documents for index (my index name)
I've recreated this sample in a separate project, and in the stats in Raven Studio I can see this error:
Cannot implicitly convert type 'System.DateTimeOffset' to 'int'
I changed the DateTime? to DateTime, with no luck.
I then switched from using the date to using the version, since I can rely on it incrementing, so now the index looks like this:
Map = docs =>
from priceDocument in docs
select new {
PricingDate = priceDocument.PricingDate,
PricingSource = priceDocument.Source,
ProductId = priceDocument.ProductId,
ProductVersion = priceDocument.Version
};
Reduce = results =>
from result in results
group result by new {
result.PriceId,
result.PricingDate,
result.Source,
result.Version
} into price
select new {
PricingDate = price.Key.PricingDate,
PricingSource = price.Key,
ProductVersion = price.Max(p=> price.Key.Version)
};
That throws no errors, but it also returns no results.

It should probably be this:
PricingDate = price.Max(p => (DateTimeOffset)price.Key.PricingDate)
But what you want doesn't require a map reduce index. You can get that by just using:
session.Query<PriceDocument>()
.Where(x => x.PriceId == prodId)
.OrderByDescending(x => x.PricingDate)
.FirstOrDefault();
The reason your Map/Reduce index doesn't work is that you have different outputs for the map & reduce functions.
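To make the target result concrete, here is a minimal in-memory sketch of "latest document per Source for one PriceId". It uses LINQ to Objects rather than a RavenDB index, so it only illustrates the grouping logic; the names LatestPrices and PerSource are hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Copy of the document class from the question.
public class PriceDocument
{
    public string Id { get; set; }
    public Guid PriceId { get; set; }
    public decimal Price { get; set; }
    public DateTime? PricingDate { get; set; }
    public string Source { get; set; }
    public int Version { get; set; }
}

public static class LatestPrices
{
    // Hypothetical helper: for one PriceId, keep the newest document per Source.
    public static List<PriceDocument> PerSource(IEnumerable<PriceDocument> docs, Guid priceId)
    {
        return docs
            .Where(d => d.PriceId == priceId)                               // filter first
            .GroupBy(d => d.Source)                                         // one group per source
            .Select(g => g.OrderByDescending(d => d.PricingDate).First())   // newest in each group
            .ToList();
    }
}
```

With the two Bloomberg documents from the question, PerSource would return only the 09:00 one. Note that the group key and the projected fields come from the same shape, which is the property the map and reduce outputs also need to share.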

Try changing this line in the Reduce part of the index, from
PricingDate = price.Max(p => price.Key.PricingDate)
to
PricingDate = price.Max(p => (int)price.Key.PricingDate)


Microsoft.SharePoint.Client C# getting only User created Lists (and not Document Libraries)

I am trying to retrieve a list of user generated Lists from a specified website. I do not want System generated lists (eg MicroFeed) nor Document Libraries. Using the Microsoft example I have this code:
public static void LoadLists(Microsoft.SharePoint.Client.Web web, List<String> foldersList)
{
var ctx = web.Context;
ListCollection collList = web.Lists;
IEnumerable<List> listInfo = ctx.LoadQuery(
collList.Include(
list => list.Title,
list => list.Fields.Include(
field => field.Title,
field => field.InternalName)));
ctx.ExecuteQuery();
foreach (List oList in listInfo)
{
FieldCollection collField = oList.Fields;
foreach (Microsoft.SharePoint.Client.Field oField in collField)
{
Regex regEx = new Regex("name", RegexOptions.IgnoreCase);
if (regEx.IsMatch(oField.InternalName))
{
Console.WriteLine("List: {0} \n\t Field Title: {1} \n\t Field Internal Name: {2}",
oList.Title, oField.Title, oField.InternalName);
}
}
}
}
However, this returns all Lists and Document Libraries (and heaven knows what else). Is there an easy way to get back just the user-defined lists?
Looking at the documentation from Microsoft, they seem to use the term "list" to refer to both actual lists (tables) and document libraries (folders). What is the proper nomenclature for the kind of list that is really just like an Excel spreadsheet of data? Finally, is it possible for lists (tables) to be nested inside a Document Library? I can't seem to do this, but I wanted to check since I am new to SharePoint.
Thanks!
After looking up lots of examples (not from Microsoft, thank you) and stepping through actual responses, here is the code for loading only the Lists created by the user, along with their non-hidden field columns. I am sure this could be optimized or cleaned up (for example, by not running the secondary queries to get the List attributes, but requesting them in the original query gave me access denied), but it is working for me. It also needs some try-catch handling in case things go south.
First a couple of classes to hold the data:
public class SharePointColumn
{
public string Title { get; set; }
public string InternalName { get; set; }
public string TypeAsString { get; set; }
}
public class SharePointLibrary
{
public SharePointLibrary()
{
Columns = new List<SharePointColumn>();
}
public string Title { get; set; }
public Boolean IsList { get; set; } // If true a list, else DocumentLibrary
public List<SharePointColumn> Columns { get; set; }
}
Then the real code.
public static void LoadLists(Microsoft.SharePoint.Client.Web web, List<SharePointLibrary> sharePointLibraries)
{
var ctx = web.Context;
ListCollection collList = web.Lists;
IEnumerable<List> listInfo = ctx.LoadQuery(
collList.Include(
list => list.Title,
list => list.Fields.Include(
field => field.Title,
field => field.InternalName,
field => field.Hidden,
field => field.TypeAsString)));
ctx.ExecuteQuery();
foreach (List oList in listInfo)
{
// Had to add these because trying to add in above query failed
ctx.Load(oList);
ctx.ExecuteQuery();
// 544 Base Template is MicroFeed
if (oList.Hidden == false && oList.IsCatalog == false && (!oList.IsObjectPropertyInstantiated("IsSiteAssetsLibrary") || oList.IsSiteAssetsLibrary == false) &&
oList.BaseType != BaseType.DocumentLibrary && oList.BaseTemplate != 544)
{
FieldCollection collField = oList.Fields;
SharePointLibrary lib = new SharePointLibrary
{
Title = oList.Title,
IsList = true,
Columns = new List<SharePointColumn>()
};
foreach (Microsoft.SharePoint.Client.Field oField in collField)
{
if (!oField.Hidden)
{
SharePointColumn col = new SharePointColumn();
col.Title = oField.Title;
col.InternalName = oField.InternalName;
col.TypeAsString = oField.TypeAsString;
lib.Columns.Add(col);
}
}
sharePointLibraries.Add(lib);
}
}
}

RavenDB: Why do I get null-values for fields in this multi-map/reduce index?

Inspired by Ayende's article https://ayende.com/blog/89089/ravendb-multi-maps-reduce-indexes, I have the following index, that works as such:
public class Posts_WithViewCountByUser : AbstractMultiMapIndexCreationTask<Posts_WithViewCountByUser.Result>
{
public Posts_WithViewCountByUser()
{
AddMap<Post>(posts => from p in posts
select new
{
ViewedByUserId = (string) null,
ViewCount = 0,
Id = p.Id,
PostTitle = p.PostTitle,
});
AddMap<PostView>(postViews => from postView in postViews
select new
{
ViewedByUserId = postView.ViewedByUserId,
ViewCount = 1,
Id = (string) postView.PostId,
PostTitle = (string) null,
});
Reduce = results => from result in results
group result by new
{
result.Id,
result.ViewedByUserId
}
into g
select new Result
{
ViewCount = g.Sum(x => x.ViewCount),
Id = g.Key.Id,
ViewedByUserId = g.Key.ViewedByUserId,
PostTitle = g.Select(x => x.PostTitle).Where(x => x != null).FirstOrDefault(),
};
Store(x => x.PostTitle, FieldStorage.Yes);
}
public class Result
{
public string Id { get; set; }
public string ViewedByUserId { get; set; }
public int ViewCount { get; set; }
public string PostTitle { get; set; }
}
}
I want to query this index like this:
Return all posts including, for a given user, the number of times that user has viewed the post. The "views" are stored in a separate document type, PostView. Note that my real document types have been renamed here to match the example from the article (I certainly would not implement "most-viewed" this way).
The result I get from the query is correct, i.e. I always get all the Post documents with the correct view count for the user. My problem is that the PostTitle field is always null in the result set (all Post documents have a non-null value in the dataset).
I'm grouping by the combination of userId and (post) Id as my "uniqueness". The way I understand it (please correct me if I'm wrong), at this point in the reduce I have a bunch of pseudo-documents with an identical userId/postId combination, some of which come from the Post map and others from the PostView map. Now I simply find any one of those pseudo-documents that actually has a value for PostTitle, i.e. one that originates from the Post map. These should all obviously have the same value, as it's the same post, just "outer-joined". The .Select(....).Where(....).FirstOrDefault() chain is taken from the very example I used as a base. I then set this value on my final document, which I project into the Result.
My question is: how do I get the non-null value for the PostTitle field in the results?
The problem is that you have:
ViewedByUserId = (string) null,
And:
group result by new
{
result.Id,
result.ViewedByUserId
}
into g
In other words, you are actually grouping by null, which I assume isn't your intent.
It would be much simpler to have a map/reduce index just on PostView and get the PostTitle from an include or via a transformer.
Your understanding of what is going on is correct, in the sense that you are creating index results with userId/postId on them.
But what you are actually doing is creating results from PostView with userId/postId and from Post with null/postId.
And that is why you don't get the matches that you want.
The grouping in the index is incorrect. With the following sample data:
new Post { Id = "Post-1", PostTitle = "Post Title", AuthorId = "Author-1" }
new PostView { ViewedByUserId = "User-1", PostId = "Post-1" }
new PostView { ViewedByUserId = "User-1", PostId = "Post-1" }
new PostView { ViewedByUserId = "User-2", PostId = "Post-1" }
The index results are like this:
ViewCount | Id | ViewedByUserId | PostTitle
--------- | ------ | -------------- | ----------
0 | Post-1 | null | Post Title
2 | Post-1 | User-1 | null
1 | Post-1 | User-2 | null
The map operation in the index simply creates a row of a common shape for every source document. Thus, the Post-1 document produces one row, the two PostView documents for Post-1 and User-1 produce two rows (which are later reduced to the single row with ViewCount == 2), and the PostView document for Post-1 and User-2 produces the last row.
The reduce operation then groups all the mapped rows and produces the resulting documents in the index. In this case, the Post-sourced row is stored separately from the PostView-sourced rows, because the null value in ViewedByUserId never matches the key of any row from the PostView collection.
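The grouping behaviour described above can be reproduced without RavenDB. The following is a minimal LINQ to Objects sketch, assuming the mapped rows have the four fields shown in the table; MappedRow and ReduceSketch are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Common shape emitted by both maps in the question's index.
public class MappedRow
{
    public string Id { get; set; }
    public string ViewedByUserId { get; set; }
    public int ViewCount { get; set; }
    public string PostTitle { get; set; }
}

public static class ReduceSketch
{
    // Mirrors the question's reduce: group by (Id, ViewedByUserId).
    public static List<MappedRow> Reduce(IEnumerable<MappedRow> rows)
    {
        return rows
            .GroupBy(r => new { r.Id, r.ViewedByUserId })
            .Select(g => new MappedRow
            {
                Id = g.Key.Id,
                ViewedByUserId = g.Key.ViewedByUserId,
                ViewCount = g.Sum(x => x.ViewCount),
                // Only finds a title if a Post-sourced row is in the SAME group.
                PostTitle = g.Select(x => x.PostTitle).FirstOrDefault(t => t != null)
            })
            .ToList();
    }
}
```

Feeding in one Post-sourced row (ViewedByUserId == null) and three PostView-sourced rows yields three groups, and the title stays on the group whose key contains null, because that row never shares a key with the per-user rows.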
If you can change your way of storing data, you can solve this issue by storing the number of views directly in the PostView. It would greatly reduce duplicate data in your database while having almost the same cost when updating the view count.
Complete test (needs xunit and RavenDB.Tests.Helpers nugets):
using Raven.Abstractions.Indexing;
using Raven.Client;
using Raven.Client.Indexes;
using Raven.Tests.Helpers;
using System.Linq;
using Xunit;
namespace SO41559770Answer
{
public class SO41559770 : RavenTestBase
{
[Fact]
public void SO41559770Test()
{
using (var server = GetNewServer())
using (var store = NewRemoteDocumentStore(ravenDbServer: server))
{
new PostViewsIndex().Execute(store);
using (IDocumentSession session = store.OpenSession())
{
session.Store(new Post { Id = "Post-1", PostTitle = "Post Title", AuthorId = "Author-1" });
session.Store(new PostView { Id = "Views-1-1", ViewedByUserId = "User-1", PostId = "Post-1", ViewCount = 2 });
session.Store(new PostView { Id = "Views-1-2", ViewedByUserId = "User-2", PostId = "Post-1", ViewCount = 1 });
session.SaveChanges();
}
WaitForAllRequestsToComplete(server);
WaitForIndexing(store);
using (IDocumentSession session = store.OpenSession())
{
var resultsForId1 = session
.Query<PostViewsIndex.Result, PostViewsIndex>()
.ProjectFromIndexFieldsInto<PostViewsIndex.Result>()
.Where(x => x.PostId == "Post-1" && x.UserId == "User-1");
Assert.Equal(2, resultsForId1.First().ViewCount);
Assert.Equal("Post Title", resultsForId1.First().PostTitle);
var resultsForId2 = session
.Query<PostViewsIndex.Result, PostViewsIndex>()
.ProjectFromIndexFieldsInto<PostViewsIndex.Result>()
.Where(x => x.PostId == "Post-1" && x.UserId == "User-2");
Assert.Equal(1, resultsForId2.First().ViewCount);
Assert.Equal("Post Title", resultsForId2.First().PostTitle);
}
}
}
}
public class PostViewsIndex : AbstractIndexCreationTask<PostView, PostViewsIndex.Result>
{
public PostViewsIndex()
{
Map = postViews => from postView in postViews
let post = LoadDocument<Post>(postView.PostId)
select new
{
Id = postView.Id,
PostId = post.Id,
PostTitle = post.PostTitle,
UserId = postView.ViewedByUserId,
ViewCount = postView.ViewCount,
};
StoreAllFields(FieldStorage.Yes);
}
public class Result
{
public string Id { get; set; }
public string PostId { get; set; }
public string PostTitle { get; set; }
public string UserId { get; set; }
public int ViewCount { get; set; }
}
}
public class Post
{
public string Id { get; set; }
public string PostTitle { get; set; }
public string AuthorId { get; set; }
}
public class PostView
{
public string Id { get; set; }
public string ViewedByUserId { get; set; }
public string PostId { get; set; }
public int ViewCount { get; set; }
}
}

How to convert a dynamic list into list<Class>?

I'm trying to convert a dynamic list into a list of a model class (Products). This is what my method looks like:
public List<Products> ConvertToProducts(List<dynamic> data)
{
var sendModel = new List<Products>();
//Mapping List<dynamic> to List<Products>
sendModel = data.Select(x =>
new Products
{
Name = data.GetType().GetProperty("Name").ToString(),
Price = data.GetType().GetProperty("Price").GetValue(data, null).ToString()
}).ToList();
}
I have tried both of these ways to get the property values, but they give me null errors saying the properties don't exist or are null.
Name = data.GetType().GetProperty("Name").ToString(),
Price = data.GetType().GetProperty("Price").GetValue(data,
null).ToString()
This is what my model class looks like:
public class Products
{
public string ID { get; set; }
public string Name { get; set; }
public string Price { get; set; }
}
Can someone please let me know what I'm missing? thanks in advance.
You're currently trying to get properties from data, which is your list - and you're ignoring x, which is the item in the list. I suspect you want:
var sendModel = data
.Select(x => new Products { Name = x.Name, Price = x.Price })
.ToList();
You may want to call ToString() on the results of the properties, but it's not clear what's in the original data.
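As a fuller sketch of that fix, the method below maps each item x rather than the list itself. It assumes every dynamic item exposes non-null Name and Price members and, because the original data types are unknown, converts both to string explicitly; ProductMapper is a hypothetical class name, and the invariant culture is an assumption made here to keep the output format stable.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Products
{
    public string ID { get; set; }
    public string Name { get; set; }
    public string Price { get; set; }
}

public static class ProductMapper
{
    public static List<Products> ConvertToProducts(List<dynamic> data)
    {
        return data
            .Select(x => new Products
            {
                // x is the current item; members are resolved at run time.
                Name = Convert.ToString(x.Name, CultureInfo.InvariantCulture),
                Price = Convert.ToString(x.Price, CultureInfo.InvariantCulture)
            })
            .ToList();
    }
}
```

If an item lacks one of the members, the dynamic binder throws at run time, so this only suits data whose shape is known in advance.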

Complex MapReduce Query with RavenDB [closed]

Closed 9 years ago.
Hope you can help me !!
I am collecting tweets, which have a created_at date (DataPublicacao), and some Hashtags. Each tweet refers to a broadcaster (redeId), and a show (programaId).
I want to query the database for the 20 most used hashtags in a certain period.
I have to map each hashtag, when it was used, and to which broadcaster and tv show it refers to.
Then, I need to be able to count the occurrences of each hashtag in a certain period (I don't know how).
public class Tweet : IModelo
{
public string Id { get; set; }
public string RedeId { get; set; }
public string ProgramaId { get; set; }
public DateTime DataPublicacao { get; set; }
public string Conteudo { get; set; }
public string Aplicacao { get; set; }
public Autor Autor { get; set; }
public Twitter.Monitor.Dominio.Modelo.TweetJson.Geo LocalizacaoGeo { get; set; }
public Twitter.Monitor.Dominio.Modelo.TweetJson.Place Localizacao { get; set; }
public Twitter.Monitor.Dominio.Modelo.TweetJson.Entities Entidades { get; set; }
public string Imagem { get; set; }
public Autor Para_Usuario { get; set; }
public string Retweet_Para_Status_Id { get; set; }
}
And the "entities" are hashtags, usermentions, and urls.
I tried to group the hashtags by broadcaster, tv show, and text, and listing the dates of the occurrences. Then, I have to transform the results, so I can count the occurrences on that period.
public class EntityResult
{
public string hashtagText { get; set; }
public string progId { get; set; }
public string redeId { get; set; }
public int listCount { get; set; }
}
public class HashtagsIndex : AbstractIndexCreationTask<Tweet, HashtagsIndex.ReduceResults>
{
public class ReduceResults
{
public string hashtagText { get; set; }
public DateTime createdAt { get; set; }
public string progId { get; set; }
public string redeId { get; set; }
public List<DateTime> datesList { get; set; }
}
public HashtagsIndex()
{
Map = tweets => from tweet in tweets
from hts in tweet.Entidades.hashtags
where tweet.Entidades != null
select new
{
createdAt = tweet.DataPublicacao,
progId = tweet.ProgramaId,
redeId = tweet.RedeId,
hashtagText = hts.text,
datesList = new List<DateTime>(new DateTime[] { tweet.DataPublicacao })
};
Reduce = results => from result in results
group result by new { result.progId, result.redeId, result.hashtagText }
into g
select new
{
createdAt = DateTime.MinValue,
progId = g.Key.progId,
redeId = g.Key.redeId,
hashtagText = g.Key.hashtagText,
datesList = g.ToList().Select(t => t.createdAt).ToList()
};
}
}
And the query I made so far is:
var hashtags2 = session.Query<dynamic, HashtagsIndex>().Customize(t => t.TransformResults((query, results) =>
results.Cast<dynamic>().Select(g =>
{
Expression<Func<DateTime, bool>> exp = o => o >= dtInit && o <= dtEnd;
int count = g.Where(exp);
return new EntityResult
{
redeId = g.redeId,
progId = g.progId,
hashtagText = g.hashtagText,
listCount = count
};
}))).Take(20).ToList();
Now I need to OrderByDescending(t => t.listCount), so I can Take(20) the most used hashtags in that period.
How do I do that?
Is it possible to filter items before the mapreduce process?
A map/reduce index is just like any other index. All documents are processed through all indexes, always. So when phrased with "before" like you asked, the answer is clearly "no".
But I think you are just interested in filtering items during the indexing, and that is easily done in the map:
Map = items => from item in items
where item.foo == whatever // this is how you filter
select new
{
// whatever you want to map
}
This index will process all documents, but the resulting index will only contain items that match the filter you specified in the where clause.
Is it possible to subsequently group by features, like users by age, and then by region
Grouping is done in the reduce step. That is what map/reduce is all about.
My advice to you (and I mean no disrespect by this) is to walk before you try to run. Build a simple prototype or set of unit tests, and first try just basic storage and retrieval. Then try basic indexing and querying. Then try a simple map/reduce, such as counting all your tweets. Only then should you attempt an advanced map/reduce with other groupings. And if you run into trouble, you will then have code you can post here for help.
Is it possible?
Of course. Anything is possible. :)
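As a starting point for such a prototype, the filter-then-group idea described above can be tried in memory first. This is a LINQ to Objects sketch, not a RavenDB index; HashtagUse, HashtagCount, HashtagStats and TopInPeriod are hypothetical names, and it assumes each hashtag occurrence has already been flattened into one row with its text and date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row: one entry per hashtag occurrence.
public class HashtagUse
{
    public string HashtagText { get; set; }
    public DateTime CreatedAt { get; set; }
}

public class HashtagCount
{
    public string HashtagText { get; set; }
    public int Count { get; set; }
}

public static class HashtagStats
{
    public static List<HashtagCount> TopInPeriod(
        IEnumerable<HashtagUse> uses, DateTime start, DateTime end, int take = 20)
    {
        return uses
            .Where(u => u.CreatedAt >= start && u.CreatedAt <= end) // filtering, as in a map's where clause
            .GroupBy(u => u.HashtagText)                            // grouping, as in the reduce step
            .Select(g => new HashtagCount { HashtagText = g.Key, Count = g.Count() })
            .OrderByDescending(c => c.Count)                        // most used first
            .Take(take)
            .ToList();
    }
}
```

Once this produces the expected counts on a handful of test rows, the same where/group/count structure is what the index and query would need to express.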

Subsonic:SimpleRepository Parent Child relationship

I am trying to use the SimpleRepository feature in Subsonic3 - first of all, I must say a big thanks to RobC - Subsonic really rocks, and I can't wait to see additional updates to the SimpleRepository. I am a big fan of the migration approach (developer/class driven rather than starting with the DB).
I have had a look at the post here:
Parent and Child object in SimpleRepository
but I am still a bit confused.
If I have got these classes defined:
public class Permit {
public int PermitID {get; set;}
public string Number { get; set; }
public DateTime? DateIssued { get; set; }
public Product product { get; set; }
}
public class Product
{
public int ProductID { get; set; }
public string Value { get; set; }
}
and then I want to save the data for a Permit, what should I be doing? Should I have defined a ProductID in the Permit class, and then programmatically link them up? Or should the code below work?
var repo = new SimpleRepository("ECPermit", SimpleRepositoryOptions.RunMigrations);
var permit = new Permit();
var product = new Product();
permit.Number = "apermit";
permit.DateIssued = DateTime.Now;
product.Value = "this is a product";
repo.Add(permit);
permit.product = product;
repo.Add(product);
This is creating the Permit and Product table, but no links between them. What am I doing wrong?
Thanks
What you need to be aware of here is that the relationships are created by populating foreign key values. So what you're doing in your example is creating a permit and saving it, then setting the product of the permit (but not saving this information), then saving the product. If you reorder your code as follows, the ProductID will be set correctly:
var repo = new SimpleRepository("ECPermit", SimpleRepositoryOptions.RunMigrations);
var permit = new Permit();
var product = new Product();
product.Value = "this is a product";
repo.Add(product);
permit.Number = "apermit";
permit.DateIssued = DateTime.Now;
// Assumes Permit declares an int ProductID property to hold the foreign key.
permit.ProductID = product.ProductID;
repo.Add(permit);