Filter data being passed into a crossfilter group without using dimensional filters - grouping

I'm trying to figure out how to add a filter onto a crossfilter group that is not related to a dimensional filter. Let's look at an example:
var livingThings = crossfilter([
// Fact data.
{ name: "Rusty", type: "human", legs: 2 },
{ name: "Alex", type: "human", legs: 2 },
{ name: "Lassie", type: "dog", legs: 4 },
{ name: "Spot", type: "dog", legs: 4 },
{ name: "Polly", type: "bird", legs: 2 },
{ name: "Fiona", type: "plant", legs: 0 }
]); // taken from http://blog.rusty.io/2012/09/17/crossfilter-tutorial/
If we were to make a dimension on type and a group of that dimension:
var typeDim = livingThings.dimension(function(d){return d.type});
var typeGroup = typeDim.group();
we would expect typeGroup.top(Infinity) to output
[{key: "human", value: 2},
{key: "dog", value: 2},
{key: "bird", value: 1},
{key: "plant", value: 1}]
My question is: how can we filter the data so that only 4-legged creatures are included in this grouping? I also don't want to use dimension.filter... because I don't want this filter to be global, just for this one grouping. In other words,
var filterDim = livingThings.dimension(function(d){return d.legs}).filterExact(4);
is not allowed.
I'm thinking of something similar to what I did to post-filter dimensions as in https://stackoverflow.com/a/30467216/4624663
Basically, I want to go into the internals of the typeDim dimension and filter the data before it is passed into the groups. Creating a fake group that calls typeDim.group().top() will most likely not work, as the individual livingThings records are already grouped by that point. I know this is tricky; thanks for any help.

Probably best to use the reduceSum functionality to create a pseudo-count group that only counts records with exactly 4 legs:
var livingThings = crossfilter([
// Fact data.
{ name: "Rusty", type: "human", legs: 2 },
{ name: "Alex", type: "human", legs: 2 },
{ name: "Lassie", type: "dog", legs: 4 },
{ name: "Spot", type: "dog", legs: 4 },
{ name: "Polly", type: "bird", legs: 2 },
{ name: "Fiona", type: "plant", legs: 0 }
]); // taken from http://blog.rusty.io/2012/09/17/crossfilter-tutorial/
var typeDim = livingThings.dimension(function(d){return d.type});
var typeGroup = typeDim.group().reduceSum(function(d) {
return d.legs === 4 ? 1 : 0;
});
That sums a computed value that is 1 for records with 4 legs and 0 for records without 4 legs. In other words, it just counts 4-legged creatures.
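As a quick sanity check (a sketch against the sample data above), the group should then report a non-zero value only for the dog bin:
// Inspect the pseudo-count group defined above
typeGroup.all().forEach(function(g) { console.log(g.key, g.value); });
// bird 0
// dog 2
// human 0
// plant 0
Note that top(Infinity) would return the same bins sorted by value instead of by key.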

I think this is what you are looking for. Comment back if I'm wrong.
var dimByLegs = livingThings.dimension(function(d){return d.legs});
dimByLegs.filterExact(4);
var dogs = dimByLegs.group();
dimByLegs.top(Infinity).forEach(function(d){console.log(d.type, d.legs);});
dimByLegs.dispose();


cube.js playground not plotting data correctly

I am using cube.js to compare the change in data over time by plotting it as a line graph.
Step 1:
After generating the cube.js schema successfully, the data looks like this:
Step 2:
Now, when I try to check the line graph, it shows up as below. No line is drawn. Unfortunately, it's not working for the bar graph either.
Moreover, in SQL the data types for the value and the time are float(10,10) and timestamp.
Apart from that, the cube.js console has no error trace; rather, it seems to be working fine:
Performing query: scheduler-0070c129-f83a-45db-ae09-aac6f9858200
Executing SQL: scheduler-0070c129-f83a-45db-ae09-aac6f9858200
--
SELECT FLOOR((UNIX_TIMESTAMP()) / 10) as refresh_key
Moreover, I tried as below [all time, w/o grouping, and pivot settings as I need], yet no luck.
However, if I add the measure count, the count is plotting the line, not the expected y-axis data as I configured in the pivot settings.
My question is: what's going wrong?
My goal was to generate a line graph for the change of a numerical value over time:
x-axis: date/time.
y-axis: my numerical value.
Cube.js generated the following schema for my data.
The problem with this schema was that the string type was assigned to the age dimension (it should clearly be a number). Moreover, there are no measures for the field age, which I am trying to plot.
cube(`ConceptDrifts`, {
sql: `SELECT * FROM cube.concept_drifts`,
preAggregations: {
},
joins: {
},
measures: {
count: {
type: `count`,
drillMembers: [date]
},
testCount: {
sql: `test_count`,
type: `sum`
}
},
dimensions: {
age: {
sql: `age`,
type: `string`
},
maxAge: {
sql: `max_age`,
type: `string`
},
sex: {
sql: `sex`,
type: `string`
},
sexSd: {
sql: `sex_sd`,
type: `string`
},
date: {
sql: `date`,
type: `time`
}
},
dataSource: `default`
});
Therefore, I changed the schema at /cube/conf/schema# manually.
I added a new measure:
ag: {
type: `number`,
sql: `age`,
drillMembers: [age]
}
And I changed the type (to number) in the dimensions:
dimensions: {
age: {
sql: `age`,
type: `number`
},
maxAge: {
sql: `max_age`,
type: `number`
},
sex: {
sql: `sex`,
type: `number`
},
sexSd: {
sql: `sex_sd`,
type: `number`
},
date: {
sql: `date`,
type: `time`
}
},
dataSource: `default`
});
As a result, the graph looks like below:
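For reference, the Playground query behind such a chart would look roughly like the following; this is only a sketch, and the member names simply assume the schema above:
// Hypothetical Cube.js query: plot the new ag measure over the date time dimension
{
  measures: ['ConceptDrifts.ag'],
  timeDimensions: [{
    dimension: 'ConceptDrifts.date',
    granularity: 'day'
  }]
}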
More references:
Data Schema Concepts
Drilldowns

How to use if condition in Karate

Suppose I have the following JSON response:
[
{
id: 1,
name: "John",
password: "JohnsPassword54",
},
{
id: 2,
name: "David",
password: "DavidsPassword24",
}
]
Then how can I extract the array element with name David to do further validation?
E.g. I want to say: if name == David, then save the id.
Well done :) Mastering JsonPath is key to getting the most out of Karate!
Just for the sake of demonstration, here is another option, using the get keyword to take the first element out of the array returned, since JsonPath wildcard searches always return an array:
* def response =
"""
[
{
id: 1,
name: "John",
password: "JohnsPassword54"
},
{
id: 2,
name: "David",
password: "DavidsPassword24"
}
]
"""
* def userId = get[0] response $[?(@.name == 'David')].id
* match userId == 2
I found the solution using JsonPath expression evaluation:
* def user = $..[?(@.name == 'David')]
Then I can use the following:
* def userId = user[0].id

"Total rows" in custom Power BI visualizations

I have a question about creating the custom visualization in Power BI.
I want to implement a "total row" functionality which is available in the built-in matrix visualization. The main concept is to automatically sum up every value and group it by the rows. This is how it looks in the matrix visualization:
But, to be honest, I don't know how to achieve this. I tried different things but I can't get these grouped values in the dataViews.
I tried to analyze the built-in matrix.ts code but it's quite different from the custom visualizations code. I found the customizeQuery method, which sets the subtotalType property on the rows and columns. I tried to add this to my code, but I don't see any difference in the dataViews (I can't find the grouped values).
Currently my capabilities.dataViewMappings is set like this:
dataViewMappings: [
{
conditions: [
{ 'Rows': { max: 3 } }
],
matrix: {
rows: {
for: { in: 'Rows' },
},
values: {
for: { in: 'Values' }
},
},
}
]
Does anyone know how we could achieve this "total row" functionality?
UPDATE 1
I already found the solution: when we implement the customizeQuery method (in the same way as the customizeQuery method in the matrix.ts code) and then add a reference to it in powerbi.visuals.plugins.[visualisationName+visualisationAddDateEpoch].customizeQuery, it works as expected (in dataViews[0].matrix.rows.root I receive child elements that hold the total values for the rows).
The only problem now is that I don't know exactly how to add this reference to the customizeQuery method correctly. For example, the [visualisationName+visualisationAddDateEpoch] is Custom1451458639997, and I don't know what that number will be (I know only the name). I created the code in my visualisation constructor as below (and it's working):
constructor() {
var targetCustomizeQuery = this.constructor.customizeQuery;
var name = this.constructor.name;
for(var pluginName in powerbi.visuals.plugins) {
var patt = new RegExp(name + "[0-9]{13}");
if(patt.test(pluginName)) {
powerbi.visuals.plugins[pluginName].customizeQuery = targetCustomizeQuery;
break;
}
}
}
But in my opinion this code is very dirty and inelegant. I want to improve it: what is the correct way to tell Power BI that we implement a custom customizeQuery method and that it should use it?
UPDATE 2
The code from update 1 works only with Power BI in the web browser (web based). In Power BI Desktop the customizeQuery method isn't invoked. What is the correct way to tell Power BI to use our custom customizeQuery method? In the code from the PowerBI-visuals repository, using the PowerBIVisualPlayground, we could declare it in the plugin.ts file (in the same way as the matrix visual does):
export let matrix: IVisualPlugin = {
name: 'matrix',
watermarkKey: 'matrix',
capabilities: capabilities.matrix,
create: () => new Matrix(),
customizeQuery: Matrix.customizeQuery,
getSortableRoles: (visualSortableOptions?: VisualSortableOptions) => Matrix.getSortableRoles(),
};
But, in my opinion, from the Power BI Dev Tools we don't have access to add additional things to this part of the code. Any ideas?
It seems you're missing the columns mapping in your capabilities. Take a look at the matrix capabilities (also copied for reference below) and, as a first step, adopt that structure. The matrix calculates the intersection of rows and columns, so without the columns in capabilities it's doubtful you'll get what you want.
Secondly, in the matrix dataview passed to update you'll get a DataViewMatrixNode with isSubtotal: true. Take a look at the unit tests for matrix to see the structure.
dataViewMappings: [{
conditions: [
{ 'Rows': { max: 0 }, 'Columns': { max: 0 }, 'Values': { min: 1 } },
{ 'Rows': { min: 1 }, 'Columns': { min: 0 }, 'Values': { min: 0 } },
{ 'Rows': { min: 0 }, 'Columns': { min: 1 }, 'Values': { min: 0 } }
],
matrix: {
rows: {
for: { in: 'Rows' },
/* Explicitly override the server data reduction to make it appropriate for matrix. */
dataReductionAlgorithm: { window: { count: 500 } }
},
columns: {
for: { in: 'Columns' },
/* Explicitly override the server data reduction to make it appropriate for matrix. */
dataReductionAlgorithm: { top: { count: 100 } }
},
values: {
for: { in: 'Values' }
}
}
}],
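To illustrate the second point, here is a rough sketch (not the actual matrix.ts code) of how the subtotal nodes could be collected from the matrix dataview inside update, assuming the usual rows.root / children structure of a DataViewMatrix:
// Sketch: collect every DataViewMatrixNode flagged as a subtotal
function collectSubtotals(dataView) {
  var subtotals = [];
  (function walk(node) {
    if (node.isSubtotal) subtotals.push(node);
    (node.children || []).forEach(walk);
  })(dataView.matrix.rows.root);
  return subtotals;
}
// e.g. inside update(options): var totals = collectSubtotals(options.dataViews[0]);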

Search for Substring in several fields with MongoDB and Mongoose

I am so sorry, but after one day of researching and trying all kinds of different combinations and npm packages, I am still not sure how to deal with the following task.
Setup:
MongoDB 2.6
Node.JS with Mongoose 4
I have a schema like so:
var trackingSchema = mongoose.Schema({
tracking_number: String,
zip_code: String,
courier: String,
user_id: Number,
created: { type: Date, default: Date.now },
international_shipment: { type: Boolean, default: false },
delivery_info: {
recipient: String,
street: String,
city: String
}
});
Now the user gives me a search string, or rather an array of strings, which will be substrings of what I want to search for:
var search = ['15323', 'julian', 'administ'];
Now I want to find those documents where any of the fields tracking_number, zip_code, or the corresponding fields in delivery_info contain my search elements.
How should I do that? I get that there are indexes, but I probably need a compound index, or maybe a text index? And for the search, can I then use a regex, or the $text/$search syntax?
The problem is that I have several strings to look for (my search), and several fields to look in. And due to one of those aspects, every approach failed for me at some point.
Your use case is a good fit for text search.
Define a text index on your schema over the searchable fields:
trackingSchema.index({
tracking_number: 'text',
zip_code: 'text',
'delivery_info.recipient': 'text',
'delivery_info.street': 'text',
'delivery_info.city': 'text'
}, {name: 'search'});
Join your search terms into a single string and execute the search using the $text query operator:
var search = ['15232', 'julian'];
Test.find({$text: {$search: search.join(' ')}}, function(err, docs) {...});
Even though this passes all your search values as a single string, this still performs a logical OR search of the values.
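If you instead want every term to be required, MongoDB treats quoted phrases in the $search string as mandatory, so (as a sketch, reusing the model name from above) you could wrap each term in quotes:
// Sketch: require all terms by turning each one into a quoted phrase (logical AND)
var andSearch = search.map(function(s) { return '"' + s + '"'; }).join(' ');
Test.find({$text: {$search: andSearch}}, function(err, docs) { /* ... */ });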
Why not just try:
var trackingSchema = mongoose.Schema({
tracking_number: String,
zip_code: String,
courier: String,
user_id: Number,
created: { type: Date, default: Date.now },
international_shipment: { type: Boolean, default: false },
delivery_info: {
recipient: String,
street: String,
city: String
}
});
var Tracking = mongoose.model('Tracking', trackingSchema );
var search = [ "word1", "word2", ...]
var results = []
for (var i = 0; i < search.length; i++) {
Tracking.find({ $or: [
{ tracking_number: search[i] },
{ zip_code: search[i] },
{ courier: search[i] },
{ 'delivery_info.recipient': search[i] },
{ 'delivery_info.street': search[i] },
{ 'delivery_info.city': search[i] }]
}).exec(function(err, trackings) {
trackings.forEach(function(tracking) {
// push every unique result to the results array
if (results.indexOf(tracking) < 0) results.push(tracking);
});
});
}
Okay, I came up with this.
My schema now has an extra field search with an array of all my searchable fields:
var trackingSchema = mongoose.Schema({
...
search: [String]
});
With a pre-save hook, I populate this field:
trackingSchema.pre('save', function(next) {
this.search = [ this.tracking_number ];
var searchIfAvailable = [
this.zip_code,
this.delivery_info.recipient,
this.delivery_info.street,
this.delivery_info.city
];
for (var i = 0; i < searchIfAvailable.length; i++) {
if (!validator.isNull(searchIfAvailable[i])) {
this.search.push(searchIfAvailable[i].toLowerCase());
}
}
next();
});
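To make the effect of the hook concrete, here is a hypothetical document (illustration only, assuming a Tracking model compiled from this schema); after save() the hook populates the search array with the tracking number plus the lowercased optional fields:
var t = new Tracking({
  tracking_number: 'DE123456789',
  zip_code: '15232',
  delivery_info: { recipient: 'Julian Smith', street: 'Main St 1', city: 'Pittsburgh' }
});
// After t.save(), t.search is:
// ['DE123456789', '15232', 'julian smith', 'main st 1', 'pittsburgh']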
In the hope of improving performance, I also index that field (and also user_id, since I limit search results by it):
trackingSchema.index({ search: 1 });
trackingSchema.index({ user_id: 1 });
Now, when searching, I first list in an array all the substrings I want to look for:
var andArray = [];
var searchTerms = searchRequest.split(" ");
searchTerms.forEach(function(searchTerm) {
andArray.push({
search: { $regex: searchTerm, $options: 'i' }
});
});
I use this array in my find() and chain it with an $and:
Tracking.
find({ $and: andArray }).
where('user_id').equals(userId).
limit(pageSize).
skip(pageSize * page).
exec(function(err, docs) {
// hooray!
});
This works.

Couchbase View _count Reduce For Given Keys

I am trying to write a view in Couchbase using a reduce such as _count which will give me a count of the products at an address.
I have some documents in the database in the following format;
Document 1
{
id: 1,
address: {
street: 'W Churchill St',
city: 'Chicago',
state: 'IL',
},
product: 'Cable'
}
Document 2
{
id: 2,
address: {
street: 'W Churchill St',
city: 'Chicago',
state: 'IL',
},
product: 'Cable'
}
Document 3
{
id: 3,
address: {
street: 'W Churchill St',
city: 'Chicago',
state: 'IL',
},
product: 'Satellite'
}
Document 4
{
id: 4,
address: {
street: 'E Foster Rd',
city: 'New York',
state: 'NY',
},
product: 'Free To Air'
}
I already have a view which gives me all the products at an address, using a composite key such as:
emit([doc.address.street, doc.address.city, doc.address.state], null)
Now this leads me to the actual problem: I want to be able to get a count of products at an address or addresses.
I want to be able to see, for an array of "keys"
['W Churchill St','Chicago','IL']
['E Foster Rd','New York','NY']
which products there are and a count of them. So I would expect to see in my results:
'Cable' : 2,
'Satellite': 1,
'Free To Air': 1
However, if I specified only this "key",
['W Churchill St','Chicago','IL']
I would expect to see
'Cable' : 2,
'Satellite': 1
How do I write my view to accommodate this?
The solution to this was to append my product to the key like so:
emit([doc.address.street, doc.address.city, doc.address.state, doc.product], null)
Then using:
?start_key=[street,city,state]&end_key=[street,city,state,{}]&group_level=4
Result:
{"rows":[
{"key":['W Churchill St','Chicago','IL','Cable'], "value":2},
{"key":['W Churchill St','Chicago','IL','Satellite'], "value":1}
]}
I would then need to repeat this query for each of the addresses and sum the results.
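For completeness, a rough sketch of what the full view could look like (the design document and view names here are made up), paired with the built-in _count reduce:
// Map function (e.g. design doc "products", view "by_address")
function (doc, meta) {
  if (doc.address && doc.product) {
    emit([doc.address.street, doc.address.city, doc.address.state, doc.product], null);
  }
}
// Reduce: _count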