creating view with couchdb that does grouping and unique counting - mapreduce

hi I have documents like so
{
domains: "domain1.com",
ip: "192.168.0.1"
}
Documents may have different or duplicate domains/IPs.
I want a view that gives me a list of:
domain1 => unique ip count for that domain
domain2 => unique ip count for that domain
etc..
I know how to get a:
domain => ip count with this map/reduce:
"map": "function(doc) { emit(doc.domains, 1) }",
"reduce": "_sum"
and a group=true parameter
But I can't figure out how to get a:
domain => unique ip count style list
Cheers for any assistance, and sorry for my English.

Write a view with only a map function and no reduce function
function(doc) {
if (doc.domains) emit(doc.domains, doc.ip);
}
Then create a list function that counts the unique entries.
function(head, req) {
var ips = [];
var row;
while (row = getRow()) {
// only record an IP the first time it is seen
if (ips.indexOf(row.value) === -1) {
ips.push(row.value);
}
}
// send() expects a string
send(String(ips.length));
}
Warning: code not tested, might contain bugs.
Finally you call the list function on the map view with key set to the domain you want. Note that this solution won't perform very well if you have a large number of IPs per domain.

As Kim said, it's nearly impossible (except perhaps with a very tricky reduce function) to do the whole thing with CouchDB's Map/Reduce.
However, you could do at least the deduplication part with Map/Reduce in order to get better performance than with Kim's solution.
So, first use a map to index (domain, ip) pairs (values are not important):
function(o) {
// the question's documents store the domain in the "domains" field
emit([o.domains, o.ip], null);
}
Then reduce them with a builtin function:
_count
Now, use a list to count unique ips:
function(head, req) {
var domains = {};
var row;
// with group=true there is one row per distinct [domain, ip] pair
while (row = getRow()) {
var d = row.key[0];
if (d in domains) {
domains[d]++;
} else {
domains[d] = 1;
}
}
send(JSON.stringify(domains));
}
When you call it, query it with group=true.
Note: I haven't tested the code of the list so you might have to slightly adapt it.
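To see why group=true matters here, the counting step can be tried outside CouchDB. The sketch below is only an illustration: the sample rows are made up, and countUniqueIps is a hypothetical helper that mirrors the list function's loop without getRow()/send(). It assumes the view returns one row per distinct [domain, ip] key, which is what _count with group=true produces.

```javascript
// Made-up rows, shaped like the output of the [domain, ip] view
// queried with reduce=_count and group=true: one row per distinct pair.
const groupedRows = [
  { key: ["domain1.com", "192.168.0.1"], value: 3 },
  { key: ["domain1.com", "192.168.0.2"], value: 1 },
  { key: ["domain2.com", "192.168.0.1"], value: 2 },
];

// Hypothetical helper: same counting logic as the list function above.
// Each row is one unique IP for its domain, so we count rows, not values.
function countUniqueIps(rows) {
  const domains = Object.create(null); // no inherited keys to collide with
  for (const row of rows) {
    const d = row.key[0];
    domains[d] = (domains[d] || 0) + 1;
  }
  return domains;
}

console.log(JSON.stringify(countUniqueIps(groupedRows)));
// prints {"domain1.com":2,"domain2.com":1}
```

Note that the row values (how many documents share a pair) are ignored; only the number of rows per domain is counted.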

How do you specify multiple Sort fields with Solrj?

I have an application using solr that needs to be able to sort on two fields. The Solrj api is a little confusing, providing multiple different APIs.
I am using Solr 4.10.4
I have tried:
for (int i = 0; i < entry.getValue().size();) {
logger.debug("Solr({}) {}: {} {}", epName, entry.getKey(), entry
.getValue().get(i), entry.getValue().get(i + 1));
if (i == 0) {
query.setSort(new SolrQuery.SortClause(entry.getValue().get(i++), SolrQuery.ORDER.valueOf(entry.getValue().get(i++))));
} else {
query.addSort(new SolrQuery.SortClause(entry.getValue().get(i++), SolrQuery.ORDER.valueOf(entry.getValue().get(i++))));
}
}
When I look at the generated URL, I see only the last SortClause: sort=sequence+asc
I also tried building a List and passing it to the SolrQuery setSorts method, and that too seems to output only a single sort field, always the last one.
I was able to create the correct sort clause by generating it manually with strings.
I have tried addOrUpdateSort as well. I think I've tried most of the obvious combinations of methods in the Solrj API.
This does work:
StringBuilder sortString = new StringBuilder();
for (int i = 0; i < entry.getValue().size();) {
if (sortString.length() > 0) {
sortString.append(",");
}
logger.debug("Solr({}) {}: {} {}", epName, entry.getKey(), entry
.getValue().get(i), entry.getValue().get(i + 1));
sortString.append(entry.getValue().get(i++)).append(" ").
append(SolrQuery.ORDER.valueOf(entry.getValue().get(i++)));
}
query.set("sort",sortString.toString());
The sort clause I want to see is: sort=is_cited+asc,sequence+asc
The Solrj API seems to output only the final clause.
I suspect a bug in Solrj 4.10.
Can you substitute setSort with addSort, i.e.:
for (int i = 0; i < entry.getValue().size();) {
logger.debug("Solr({}) {}: {} {}", epName, entry.getKey(), entry
.getValue().get(i), entry.getValue().get(i + 1));
if (i == 0) {
query.addSort(new SolrQuery.SortClause(entry.getValue().get(i++), SolrQuery.ORDER.valueOf(entry.getValue().get(i++))));
} else {
query.addSort(new SolrQuery.SortClause(entry.getValue().get(i++), SolrQuery.ORDER.valueOf(entry.getValue().get(i++))));
}
}
And let me know if this worked.
Check out addOrUpdateSort():
Updates or adds a single sort field specification to the current sort
information. If the sort field already exist in the sort information map,
its position is unchanged and the sort order is set; if it does not exist,
it is appended at the end with the specified order.
@return the modified SolrQuery object, for easy chaining
@since 4.2

How to return a JSON object I made in a reduce function

I need your help about CouchDB reduce function.
I have some docs like:
{'about':'1', 'foo':'a1','bar':'qwe'}
{'about':'1', 'foo':'a1','bar':'rty'}
{'about':'1', 'foo':'a2','bar':'uio'}
{'about':'1', 'foo':'a1','bar':'iop'}
{'about':'2', 'foo':'b1','bar':'qsd'}
{'about':'2', 'foo':'b1','bar':'fgh'}
{'about':'3', 'foo':'c1','bar':'wxc'}
{'about':'3', 'foo':'c2','bar':'vbn'}
As you can see, they all have the same keys; only the values differ.
My purpose is to use Map/Reduce, and the result I expect would be:
'rows':[ 'keys':'1','value':{'1':{'foo':'a1', 'at':'rty'},
'2':{'foo':'a2', 'at':'uio'},
'3':{'foo':'a1', 'at':'iop'}}
'keys':'1','value':{'foo':'a1', 'bar','rty'}
...
'keys':'3','value':{'foo':'c2', 'bar',vbn'}
]
Here is the result of my Map function:
'rows':[ 'keys':'1','value':{'foo':'a1', 'bar','qwe'}
'keys':'1','value':{'foo':'a1', 'bar','rty'}
...
'keys':'3','value':{'foo':'c2', 'bar',vbn'}
]
But my Reduce function isn't working:
function(keys,values,rereduce){
var res= {};
var lastCheck = values[0];
for(i=0; i<values.length;++i)
{
value = values[i];
if (lastCheck.foo != value.foo)
{
res.append({'change':[i:lastCheck]});
}
lastCheck = value;
}
return res;
}
Is it possible to get what I expect, or do I need to use another way?
You should not do this in the reduce function. As the CouchDB wiki explains:
If you are building a composite return structure in your reduce, or only transforming the values field, rather than summarizing it, you might be misusing this feature.
There are two approaches that you can take instead:
Transform the results at your application layer.
Use a list function.
List functions are simple. I will try to explain them here:
Lists, like views, are saved in design documents, under the key lists. Like so:
"lists":{
"formatResults" : "function(head,req) {....}"
}
To call the list function you use a url like this
http://localhost:5984/your-database/_design/your-designdoc/_list/your-list-function/your-view-name
Here is an example of a list function:
function(head, req) {
// to set a Content-Type header, call start({"headers": {...}}) before the first getRow()
var row = getRow();
if (!row) {
return 'no ingredients';
}
var jsonOb = {};
do {
// construct the JSON object here from row.key and row.value
} while (row = getRow());
// a list function returns a string, so serialize the object
return JSON.stringify(jsonOb);
}
The getRow function is of interest to us. Each call returns the next row of the view result (or null when there are none left), so we can query it like
row.key for key
row.value for value
All you have to do now is construct the JSON the way you want and then send it.
By the way, you can use log() to debug your functions.
I hope this helps a little.
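As a rough sketch of the "construct the JSON" step, the loop below groups every row's value under its key. It runs outside CouchDB, so makeGetRow is a hypothetical stand-in for the real getRow(), buildGrouped is an illustrative name, and the sample rows are only shaped like the map output in the question. The grouping itself is an assumption about what the asker wants, not the only possible structure.

```javascript
// Hypothetical rows, shaped like the map output shown in the question.
const rows = [
  { key: "1", value: { foo: "a1", bar: "qwe" } },
  { key: "1", value: { foo: "a2", bar: "uio" } },
  { key: "2", value: { foo: "b1", bar: "qsd" } },
];

// Stand-in for CouchDB's getRow(): returns the next row, or null when done.
function makeGetRow(data) {
  let i = 0;
  return () => (i < data.length ? data[i++] : null);
}

// What the body of a list function could do: collect each value under its key,
// then return the result as a JSON string.
function buildGrouped(getRow) {
  const grouped = {};
  let row;
  while ((row = getRow())) {
    if (!Object.prototype.hasOwnProperty.call(grouped, row.key)) {
      grouped[row.key] = [];
    }
    grouped[row.key].push(row.value);
  }
  return JSON.stringify(grouped);
}

console.log(buildGrouped(makeGetRow(rows)));
```

Inside a real list function the same loop would call getRow() directly and return (or send) the resulting string.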
Apparently now you need to use
provides('json', function() { ... });
As in:
Simplify Couchdb JSON response

Map/reduce to get the count and latest date for each document grouped by key

A simplified version of my document has the following structure:
doc:
{
"date": "2014-04-16T17:13:00",
"key": "de5cefc56ff51c33351459b88d42ca9f828445c0"
}
I would like to group my documents by key, to get the latest date and the number of documents for each key, something like
{ "Last": "2014-04-16T16:00:00", "Count": 10 }
My idea is to do a map/reduce view and query it with group set to true.
This is what I have tried so far. I get the exact count, but not the correct dates.
map
function (doc, meta) {
if(doc.type =="doc")
emit(doc.key, doc.date);
}
reduce
function(key, values, rereduce) {
var result = {
Last: 0,
Count: 0
};
if (rereduce) {
for (var i = 0; i < values.length; i++) {
result.Count += values[i].Count;
result.Last = values[i].Last;
}
} else {
result.Count = values.length;
result.Last = values[0]
}
return result;
}
You're not comparing dates. Couchbase sorts rows by key, not by value, so in your situation the values will not be ordered by date and you have to compare them yourself in the reduce function. It will probably look like:
result.Last = values[i].Last > result.Last ? values[i].Last : result.Last;
Also, in the plain reduce step values can be an array with more than one entry, so taking values[0] as you do will not always be correct.
Here is an example of my reduce function that filters documents and keeps just the one with the newest date. Maybe it will be helpful, or you can try to use it (it looks like the reduce function that you want; you just need to add the count somewhere).
function(k,v,r){
if (r){
if (v.length > 1){
var m = v[0].Date;
var mid = 0;
for (var i=1;i<v.length;i++){
if (v[i].Date > m){
m = v[i].Date;
mid = i;
}
}
return v[mid];
}
else {
return v[0] || v;
}
}
if (v.length > 1){
var m = v[0].Date;
var mid = 0;
for (var i=1;i<v.length;i++){
if (v[i].Date > m){
m = v[i].Date;
mid = i;
}
}
return v[mid];
}
else {
return v[0] || v;
}
}
UPD: Here is an example of what that reduce does:
Input data (values) for that function will look like this (I've used plain numbers instead of text dates to keep it short):
[{Date:1},{Date:3},{Date:8},{Date:2},{Date:4},{Date:7},{Date:5}]
On the first step rereduce will be false, so we need to find the biggest date in the array, and the function will return
Object {Date: 8}.
Note that this function may be called just once, but it may also be called on several servers in a cluster, or on several branches of the B-tree inside one Couchbase instance.
Then, on the next step (if there were several machines in the cluster, or several "branches"), the function is called again with the rereduce variable set to true.
Incoming data will be:
[{Date:8},{Date:10},{Date:3}], where {Date:8} came from the reduce on one server (or branch), and the other dates came from other servers (or branches).
So we need to do exactly the same on these new values to find the biggest one.
Answering your question from the comments: I don't remember why I used the same code for reduce and rereduce, because it was a long time ago (when Couchbase 2.0 was in developer preview). Maybe Couchbase had some bugs, or I was just trying to understand how rereduce works. But I remember that without that if (r) {..} it did not work at that time.
You can try placing return v; in different parts of my or your reduce function to see what it returns in each reduce phase. It's better to try it once yourself to understand what actually happens there.
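Combining the two points from this answer (compare the dates yourself, and handle the rereduce phase), a reduce that returns both the count and the latest date might look like the sketch below. It is untested against Couchbase itself; reduceCountAndLast is a made-up name (in a view the function would be anonymous), and it assumes the map emits doc.date as the value and that all dates are ISO 8601 strings in the same format, so that string comparison orders them chronologically.

```javascript
// Sketch of a reduce that keeps a running count and the newest date.
// On the first pass, values are the date strings emitted by the map.
// On rereduce, values are the {Last, Count} objects from earlier passes.
function reduceCountAndLast(keys, values, rereduce) {
  const result = { Last: "", Count: 0 };
  for (const v of values) {
    const last = rereduce ? v.Last : v;
    result.Count += rereduce ? v.Count : 1;
    // ISO 8601 strings of the same format compare correctly as strings
    if (last > result.Last) {
      result.Last = last;
    }
  }
  return result;
}

// Simulate two partial reduces followed by a rereduce.
const partA = reduceCountAndLast(null, ["2014-04-16T15:22:00", "2014-04-16T17:13:00"], false);
const partB = reduceCountAndLast(null, ["2014-04-16T16:00:00"], false);
const total = reduceCountAndLast(null, [partA, partB], true);
console.log(JSON.stringify(total));
// prints {"Last":"2014-04-16T17:13:00","Count":3}
```

Because both phases return the same {Last, Count} shape, the result is the same however the server splits the values between reduce and rereduce calls.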
I forgot to mention that I have many documents for the same key. In fact, for each key I can have many documents (each with a message field, as shown here):
{
"date": "2014-04-16T17:13:00",
"key": "de5cefc56ff51c33351459b88d42ca9f828445c0",
"message": "message1",
}
{
"date": "2014-04-16T15:22:00",
"key": "de5cefc56ff51c33351459b88d42ca9f828445c0",
"message": "message2",
}
Another way to deal with the problem is to do it in the map function:
function (doc, meta) {
var count = 0;
var last =''
if(doc.type =="doc"){
for (k in doc.message){
count += 1;
last = doc.date> last?doc.date:last;
}
emit(doc.key,{'Count':count,'Last': last});
}
}
I found this simpler and it does the job in my case.

How do I query multiple IDs via the ContentSearchManager?

When I have an array of Sitecore IDs, for example TargetIDs from a MultilistField, how can I query the ContentSearchManager to return all the SearchResultItem objects?
I have tried the following which gives an "Only constant arguments is supported." error.
using (var s = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_master_index").CreateSearchContext())
{
rpt.DataSource = s.GetQueryable<SearchResultItem>().Where(x => f.TargetIDs.Contains(x.ItemId));
rpt.DataBind();
}
I suppose I could build up the Linq query manually with multiple OR queries. Is there a way I can use Sitecore.ContentSearch.Utilities.LinqHelper to build the query for me?
Assuming I got this technique to work, is it worth using it for only, say, 10 items? I'm just starting my first Sitecore 7 project and I have it in mind that I want to use the index as much as possible.
Finally, does the Page Editor support editing fields somehow with a SearchResultItem as the source?
Update 1
I wrote this function which utilises the predicate builder as dunston suggests. I don't know yet if this is actually worth using (instead of Items).
public static List<T> GetSearchResultItemsByIDs<T>(ID[] ids, bool mustHaveUrl = true)
where T : Sitecore.ContentSearch.SearchTypes.SearchResultItem, new()
{
Assert.IsNotNull(ids, "ids");
if (!ids.Any())
{
return new List<T>();
}
using (var s = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_master_index").CreateSearchContext())
{
var predicate = PredicateBuilder.True<T>();
predicate = ids.Aggregate(predicate, (current, id) => current.Or(p => p.ItemId == id));
var results = s.GetQueryable<T>().Where(predicate).ToDictionary(x => x.ItemId);
var query = from id in ids
let item = results.ContainsKey(id) ? results[id] : null
where item != null && (!mustHaveUrl || item.Url != null)
select item;
return query.ToList();
}
}
It forces the results to be in the same order as supplied in the IDs array, which in my case is important. (If anybody knows a better way of doing this, would love to know).
It also, by default, ensures that the Item has a URL.
My main code then becomes:
var f = (Sitecore.Data.Fields.MultilistField) rootItem.Fields["Main navigation links"];
rpt.DataSource = ContentSearchHelper.GetSearchResultItemsByIDs<SearchResultItem>(f.TargetIDs);
rpt.DataBind();
I'm still curious how the Page Editor copes with SearchResultItem or POCOs in general (my second question), am going to continue researching that now.
Thanks for reading,
Steve
You need to use the predicate builder to create multiple OR queries, or AND queries.
The code below should work.
using (var s = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_master_index").CreateSearchContext())
{
var predicate = PredicateBuilder.True<SearchResultItem>();
foreach (var targetId in f.TargetIDs)
{
// copy the loop variable so each lambda captures its own value
var tempTargetId = targetId;
predicate = predicate.Or(x => x.ItemId == tempTargetId);
}
rpt.DataSource = s.GetQueryable<SearchResultItem>().Where(predicate);
rpt.DataBind();
}

ASP.NET MVC Custom route regex to catch a substring of items and check for their existence

I'm trying to create a custom route for URL with the following format:
http://domain/nodes/{item_1}/{item_2}/{item_3}/..../{item_[n]}
Basically, there could be an arbitrary number of item_[n] segments, for example
http://domain/nodes/1/3/2
http://domain/nodes/1
http://domain/nodes/1/25/11/45
With my custom route I would like to retrieve an array of items and do some logic (validate and add some specific information to request context) with them.
For example from [http://domain/nodes/1/25/11/45] I would like to get an array of [1, 25, 11, 45] and process it.
So, I have 2 problems here.
The first one is a question actually. Am I looking in the right direction? Or there could be an easier way to accomplish this (maybe without custom routes)?
The second problem is matching incoming url with a regex pattern. Could someone help me with it?
Thanks in advance :)
To solve your problem, I think one way would be to create a routing class and then handle the parameters accordingly.
public class CustomRouting : RouteBase
{
public override RouteData GetRouteData(HttpContextBase httpContext)
{
RouteData result = null;
var repository = new FakeRouteDB(); //Use your preferred DI injector
string requestUrl = httpContext.Request.AppRelativeCurrentExecutionFilePath;
string[] sections = requestUrl.Split('/');
/*
from here you work on the array you just created
you can check every single part
*/
if (sections.Count() == 2 && sections[1] == "")
return null; // ~/
if (sections.Count() > 2) //2 is just an example
{
result = new RouteData(this, new MvcRouteHandler());
result.Values.Add("controller", "Products");
result.Values.Add("action", "Edit");
result.Values.Add("item1", sections[1]);
if (sections.Count() >= 3)
result.Values.Add("item2", sections[2]);
//....
}
else
{
//I can prepare a default route
result = new RouteData(this, new MvcRouteHandler());
result.Values.Add("controller", "Home");
result.Values.Add("action", "Index");
}
return result;
}
public override VirtualPathData GetVirtualPath(RequestContext requestContext, RouteValueDictionary values)
{
//I just work with outbound so it's ok here to do nothing
return null;
}
}
In the global.asax
public static void RegisterRoutes(RouteCollection routes)
{
routes.IgnoreRoute("{resource}.axd/{*pathInfo}");
routes.Add(new CustomRouting());
routes.MapRoute("Default", "{controller}/{action}/{id}", new { controller = "Home", action = "Index", id = UrlParameter.Optional });
}
This should give you an idea of what I think. Hope it helps
I can't help you with the first part of your question, but I can have a go at creating the regex.
In your example all the items are digits. Is that the only option? If not, please provide more info on possible characters.
For now the regex would be:
@"http://domain/nodes(?:/(\d+))*"
(?:) is a non-capturing group, () is a capturing group.
If you match all occurrences, group 0 will be the whole match, and in .NET the Captures collection of group 1 will hold every matched number (Groups[1].Value alone returns only the last one).
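The same idea can be tried quickly in JavaScript, although its regex engine keeps only the last value of a repeated capturing group. The sketch below therefore captures the whole run of segments once and splits it afterwards. parseNodeItems is a hypothetical helper, and it assumes, as in the examples above, that every item is a run of digits.

```javascript
// Hypothetical helper: returns the numeric items after /nodes/,
// or null when the URL does not have the expected shape.
function parseNodeItems(url) {
  // Group 1 captures the whole "/1/25/11/45" run in one piece.
  const match = /^https?:\/\/[^\/]+\/nodes((?:\/\d+)+)\/?$/.exec(url);
  if (!match) {
    return null;
  }
  // Drop the empty string before the leading slash, then convert to numbers.
  return match[1].split("/").filter(Boolean).map(Number);
}

console.log(parseNodeItems("http://domain/nodes/1/25/11/45")); // [ 1, 25, 11, 45 ]
console.log(parseNodeItems("http://domain/nodes/1"));          // [ 1 ]
console.log(parseNodeItems("http://domain/nodes/a/1"));        // null
```

Whether the items are then validated in a custom route or in the controller is a separate design choice; the pattern only extracts them.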