Let's assume that I have a list of 239800 documents like the following:
{
name: somename,
data:{age:someage, income:somevalue, height:someheight, dumplings_consumed:somenumber}
}
I know that I can index the docs by doc.data.age, doc.data.income, height, and dumplings_consumed, and get a list of the docs that fall within a range for any one of those parameters. But how can I get a result for a query like the following:
List of the docs where age is between 25 and 30, income is less than $10 and height is more than 7ft?
Is there a way to get multiple indexes working?
Assuming all three of your example query parameters need to remain dynamic, you would not be able to do such a join with a single CouchDB query. The simplest strategy would be to emit an index that lets you narrow down the "biggest" aspect/dimension of your data, and then filter the rest out in your app's code or a _list function.
Now, for filtering on two aspects of numeric data, GeoCouch could potentially be used — it provides a generic 2-dimensional index, not just limited to latitude and longitude! So you would emit points that contain (say) "age" and "income" mapped to x and y. You'd then query a bbox with first two "between" parameters, and then you'd only have to filter out height on the app side.
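The first strategy above (narrow down by one indexed dimension, then filter the rest in application code) could be sketched roughly as follows. The rows array stands in for what a view keyed on doc.data.age might return for startkey=25&endkey=30; the sample rows and the helper name filterRows are made up for illustration and are not part of CouchDB.

```javascript
// Hypothetical rows, shaped like the result of a view whose map function
// calls emit(doc.data.age, doc.data), queried with startkey=25&endkey=30.
const rows = [
  { id: "b", key: 26, value: { age: 26, income: 40, height: 70, dumplings_consumed: 3 } },
  { id: "a", key: 27, value: { age: 27, income: 9, height: 78, dumplings_consumed: 256 } },
  { id: "c", key: 29, value: { age: 29, income: 5, height: 6, dumplings_consumed: 12 } },
];

// Apply the remaining (non-indexed) conditions in application code.
function filterRows(rows, maxIncome, minHeight) {
  return rows.filter(function (row) {
    return row.value.income < maxIncome && row.value.height > minHeight;
  });
}

const matches = filterRows(rows, 10, 7);
console.log(matches.map(function (row) { return row.id; }));
```

Indexing the dimension that excludes the most documents keeps the number of rows transferred to the app as small as possible.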
Let's have a look at:
http://guide.couchdb.org/draft/views.html
You can search with any expression you want (javascript code) and index documents with it.
For example, by means of Futon, you can create a test database and add the two following documents based on your question:
{ "_id": "36fef0472fb7eec035c87e4f4b0381bf", "_rev": "12-4ef9014a3670a7e6acd58ad92d26fc1e", "data": { "age": 6, "income": 10, "height": 20, "dumplings_consumed": 5 }, "name": "joe" }
{ "_id": "36fef0472fb7eec035c87e4f4b038ffa", "_rev": "8-f0a0a51b830bf3d4bc3ec5697440792f", "name": "mike", "data": { "age": 27, "income": 9, "height": 78, "dumplings_consumed": 256 } }
Then, still in Futon, go to your database and create a temporary view with the following map function:
function(doc) {
  // Skip documents that lack any of the fields used in the filter.
  if (doc.name && doc.data && doc.data.age && doc.data.income && doc.data.height) {
    // Emit only documents that match the ranges from the question.
    if (doc.data.age > 25 && doc.data.age < 30 && doc.data.income < 10 && doc.data.height > 7) {
      emit(doc.name, doc.data);
    }
  }
}
Run it and you get the result.
With a permanent view, the internal B-tree is built the first time the view is queried, which takes time. Later queries should be very fast, even when documents are added to the database (as long as the added documents are only a small fraction of the total).
I have a FeatureCollection with a column named Dominance which has classified regions into stakeholder dominance. In this case, Dominance contains values as strings; specifically 'Small', 'Medium', 'Large' and 'Others'.
I want to replace these values/strings with 1,2,3 and 4. For that, I use the codes below:
var Shape = ee.FeatureCollection('XYZ')
var Shape_custom = Shape.select(['Dominance'])
var conditional = function(feat) {
return ee.Algorithms.If(feat.get('Dominance').eq('Small'),
feat.set({class: 1}),
feat)
}
var test = Shape_custom.map(conditional)
// I plan to repeat this for all classes
However, I am not able to change the values. The error I am getting is feat.get(...).eq is not a function.
What am I doing wrong here?
The simplest way to do this kind of mapping is using a dictionary. That way you do not need more code for each additional case.
var mapping = ee.Dictionary({
'Small': 1,
'Medium': 2,
'Large': 3,
'Others': 4
});
var mapped = Shape
.select(['Dominance'])
.map(function (feature) {
return feature.set('class', mapping.get(feature.get('Dominance')));
});
https://code.earthengine.google.com/8c58d9d24e6bfeca04e2a92b76d623a2
Suppose I have a table as follows:
TableA =
DATATABLE (
"Year", INTEGER,
"Group", STRING,
"Value", DOUBLE,
{
{ 2015, "A", 2 },
{ 2015, "B", 8 },
{ 2016, "A", 9 },
{ 2016, "B", 3 },
{ 2016, "C", 7 },
{ 2017, "B", 5 },
{ 2018, "B", 6 },
{ 2018, "D", 7 }
}
)
I want a measure that returns the top Group based on its Value and that works both inside and outside a Year filter context. That is, it can be used in a matrix visual like this (including the Total row):
It's not hard to find the maximal value using DAX:
MaxValue = MAX(TableA[Value])
or
MaxValue = MAXX(TableA, TableA[Value])
But what is the best way to look up the Group that corresponds to that value?
I've tried this:
Top Group = LOOKUPVALUE(TableA[Group],
TableA[Year], MAX(TableA[Year]),
TableA[Value], MAX(TableA[Value]))
However, this doesn't work for the Total row and I'd rather not have to use the Year in the measure if possible (there are likely other columns to worry about in a real scenario).
Note: I am providing a couple solutions in the answers below, but I'd love to see any other approaches as well.
Ideally, it would be nice if there were an extra argument in the MAXX function that would specify which column to return after finding the maximum, much like the MAXIFS Excel function has.
Another way to do this is through the use of the TOPN function.
The TOPN function returns entire row(s) instead of a single value. For example, the code
TOPN(1, TableA, TableA[Value])
returns the top 1 row of TableA ordered by TableA[Value]. The Group value associated with that top Value is in the row, but we need to be able to access it. There are a couple of possibilities.
Use MAXX:
Top Group = MAXX(TOPN(1, TableA, TableA[Value]), TableA[Group])
This finds the maximum Group from the TOPN table in the first argument. (There is only one Group value, but this allows us to convert a table into a single value.)
Use SELECTCOLUMNS:
Top Group = SELECTCOLUMNS(TOPN(1, TableA, TableA[Value]), "Group", TableA[Group])
This function usually returns a table (with the columns that are specified), but in this case it is a table with a single row and a single column, which means DAX interprets it as just a regular value.
One way to do this is to store the maximum value and use that as a filter condition.
For example,
Top Group =
VAR MaxValue = MAX(TableA[Value])
RETURN MAXX(FILTER(TableA, TableA[Value] = MaxValue), TableA[Group])
or similarly,
Top Group =
VAR MaxValue = MAX(TableA[Value])
RETURN CALCULATE(MAX(TableA[Group]), TableA[Value] = MaxValue)
If there are multiple groups with the same maximum value, the measures above will pick the last one alphabetically, since MAX on a text column returns the greatest value. If there are multiple and you want to show all of them, you could use a concatenate iterator function:
Top Group =
VAR MaxValue = MAX(TableA[Value])
RETURN CONCATENATEX(
CALCULATETABLE(
VALUES(TableA[Group]),
TableA[Value] = MaxValue
),
TableA[Group],
", "
)
If you changed the 9 in TableA to an 8, this last measure would return A, B rather than A.
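To make the logic of these measures easier to check by hand, here is a rough JavaScript sketch of the same idea (find the maximum, keep the rows that match it, join their groups). It is only an illustration outside of DAX; the function name topGroups is made up, and the explicit sort is an assumption added so that ties come out in a stable order.

```javascript
// TableA from the question, as plain objects.
const tableA = [
  { year: 2015, group: "A", value: 2 },
  { year: 2015, group: "B", value: 8 },
  { year: 2016, group: "A", value: 9 },
  { year: 2016, group: "B", value: 3 },
  { year: 2016, group: "C", value: 7 },
  { year: 2017, group: "B", value: 5 },
  { year: 2018, group: "B", value: 6 },
  { year: 2018, group: "D", value: 7 },
];

// Return every group whose value equals the maximum of the given rows,
// joined with ", " (the same shape as MAX + filter + CONCATENATEX).
function topGroups(rows) {
  const maxValue = Math.max(...rows.map((r) => r.value));
  const groups = rows.filter((r) => r.value === maxValue).map((r) => r.group);
  // De-duplicate and sort so that tied groups are listed in a stable order.
  return Array.from(new Set(groups)).sort().join(", ");
}

// One result per year (like the rows of a matrix visual) plus a total.
const perYear = {};
for (const year of new Set(tableA.map((r) => r.year))) {
  perYear[year] = topGroups(tableA.filter((r) => r.year === year));
}
const total = topGroups(tableA);

// Same table with the 9 changed to an 8, to exercise the tie case.
const tied = tableA.map((r) => (r.value === 9 ? { ...r, value: 8 } : r));
const tiedTotal = topGroups(tied);

console.log(perYear, total, tiedTotal);
```

Filtering the rows before calling topGroups plays the role of the Year filter context, which is why the same function serves both the per-year rows and the total.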
In Google Sheets, I'm trying to query a column and look for a state abbreviation, and if that abbreviation is a match, then "East" if not then "West"
I want to return text values in my column based on the state abbreviation. Our territory managers are split into two domains, East and West, so I'm trying to easily sort my data by East/West.
Here's what I have:
=IF(M:M={"AL", "CA", "DE","FL","GA","IA","KY","ME","MD","MA","MN","MS","NH","NJ","NY","ND","RI","SD","TN","VT","VA","WV","WI"},"East","West")
But, when I fill down, it just fills down East, and does not seem to actually query M:M
Thoughts?
Not the cleanest code, but this should work:
=ARRAYFORMULA(IF(LEN(A:A), IF((A:A = "foo")+(A:A = "bar") = 1, "WEST", "EAST"), ))
To use IF with an OR in an ARRAYFORMULA, you evaluate the column with 1s and 0s. The A:A = "foo" will evaluate to 1 if foo is in the cell. So if one of your OR criteria is in the cell, the total value in the IF will be 1.
You have a lot of criteria so writing each of them in will take a while ...
E.g. IF( (A:A = "AL") + (A:A = "CA") ... (A:A = "WI") = 1, "East", "West")
Use ISERROR/MATCH():
=IF(ISERROR(MATCH(M:M,{"AL", "CA", "DE","FL","GA","IA","KY","ME","MD","MA","MN","MS","NH","NJ","NY","ND","RI","SD","TN","VT","VA","WV","WI"},0)),"West","East")
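If the list of states gets long or changes often, the same membership test could also be written as a custom function in Apps Script (which is JavaScript). This is only a sketch: the function name REGION is hypothetical, and the state list is copied from the question.

```javascript
// State abbreviations assigned to the East territory (copied from the question).
var EAST_STATES = ["AL", "CA", "DE", "FL", "GA", "IA", "KY", "ME", "MD", "MA",
  "MN", "MS", "NH", "NJ", "NY", "ND", "RI", "SD", "TN", "VT", "VA", "WV", "WI"];

/**
 * Hypothetical custom function: returns "East" or "West" for a single cell,
 * or a 2D array of results when it is given a range.
 */
function REGION(input) {
  // A range arrives as an array of rows, each row an array of cell values.
  if (Array.isArray(input)) {
    return input.map(function (row) {
      return row.map(function (cell) { return REGION(cell); });
    });
  }
  var state = String(input).trim().toUpperCase();
  if (state === "") {
    return ""; // leave blank cells blank
  }
  return EAST_STATES.indexOf(state) !== -1 ? "East" : "West";
}
```

In a sheet this would be called as =REGION(M2) for one cell, or =REGION(M2:M100) to fill a whole block at once.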
I have a map reduce view:
.....
emit( diffYears, doc.xyz );
reduced with _sum.
xyz is then a number which is summed per integer(diffYears).
The output looks roughly like this:
4 1204.9
5 796.19
6 1124.8
7 1112.6
8 1993.62
9 159.26
10 395.41
11 456.05
12 457.97
13 39.80
14 483.68
15 269.469
etc..
What I would like to do is group the results as follows:
Grouping   Total per group
0-4        1959.2   (i.e. add up the xyz's for years 0,1,2,3,4)
5-9        3998.5   (same for 5,6,7,8,9, etc.)
10-14      3566.3
I saw a suggestion where a list was used on a view output here: Using a CouchDB view, can I count groups and filter by key range at the same time?
but have been unable to adapt it to get any kind of result.
The code given is:
{
_id: "_design/authors",
views: {
authors_by_date: {
map: function(doc) {
emit(doc.date, doc.author);
}
}
},
lists: {
count_occurrences: function(head, req) {
start({ headers: { "Content-Type": "application/json" }});
var result = {};
var row;
while(row = getRow()) {
var val = row.value;
if(result[val]) result[val]++;
else result[val] = 1;
}
return result;
}
}
}
I replaced var val = row.value with var val = row.key in this section:
while(row = getRow()) {
var val = row.value;
if(result[val]) result[val]++;
else result[val] = 1;
}
(although in this case the result is a count.)
This seems to be the way to do it.
(It is like having a startkey and endkey for each grouping which I can do manually, naturally, but not inside a process. Or is there a way of entering multiple start- and endkeys into one GET command???? )
This must be a fairly normal thing to do especially for researchers using statistical analysis.
I assume therefore that it does get done but I cannot locate examples
as far as CouchDB is concerned.
I would appreciate some help with this please or a pointer in the right direction.
Many thanks.
EDIT:
Perhaps the answer lies in a process in 'reduce' to group the output??
You can accomplish what you want using a complex key. The limitation is that the group size is static and needs to be defined in the view.
You'll need a simple step function to create your groups within map like:
var size = 5;
var group = ( doc.diffYears - (doc.diffYears % size)) / size;
emit( [group, doc.diffYears], doc.xyz);
The reduce function can remain _sum.
Now when you query the view use group_level to control the grouping. At group_level=0, everything will be summed and one value will be returned. At group_level=1 you'll receive your desired sums of 0-4, 5-9 etc. At group_level=2 you'll get your original output.
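To see what the different group_level values do to the complex key, here is a small sketch that emulates the view outside of CouchDB. The documents are made up, and sumByGroupLevel is a hypothetical helper that only imitates how a _sum reduce groups rows by the leading elements of the key.

```javascript
// Made-up documents; diffYears and xyz follow the field names in the answer.
const docs = [
  { diffYears: 3, xyz: 10 },
  { diffYears: 4, xyz: 20 },
  { diffYears: 5, xyz: 1 },
  { diffYears: 9, xyz: 2 },
  { diffYears: 12, xyz: 7 },
];

const size = 5;

// Same key construction as the map function: [group, diffYears].
function mapDoc(doc) {
  const group = (doc.diffYears - (doc.diffYears % size)) / size;
  return { key: [group, doc.diffYears], value: doc.xyz };
}

// Imitates a _sum reduce queried with a given group_level: rows are grouped
// by the first groupLevel elements of their key and the values are summed.
// (CouchDB itself reports the key as null at group_level=0; here it is [].)
function sumByGroupLevel(rows, groupLevel) {
  const totals = new Map();
  rows.forEach(function (row) {
    const key = JSON.stringify(row.key.slice(0, groupLevel));
    totals.set(key, (totals.get(key) || 0) + row.value);
  });
  return Array.from(totals, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

const rows = docs.map(mapDoc);
const level0 = sumByGroupLevel(rows, 0); // one grand total
const level1 = sumByGroupLevel(rows, 1); // one row per 5-year group
const level2 = sumByGroupLevel(rows, 2); // one row per diffYears value
console.log(level0, level1, level2);
```

With size = 5, group 0 covers years 0-4, group 1 covers 5-9, and so on, so the group_level=1 rows are the per-range totals asked for in the question.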
I am trying to plot yearly data on a geochart. I would like the most recent data on top, but for whatever reason, the earliest year is always on top in the actual visualization.
I have tried re-ordering the table to have the latest years as the first entries in the data with no effect.
I thought that maybe it was happening because I used a view to filter my data, but the filter is not reordering the items with the older ones first (so that shouldn't impact how it is displayed).
I do not want to filter out data since I use transparency to display all points. Here is some sample code that displays the same problem:
function drawVisualization() {
var data = new google.visualization.DataTable();
data.addColumn('number', 'Latitude');
data.addColumn('number', 'Longitude');
data.addColumn('number', 'Color');
data.addColumn('number', 'Output (MW)');
data.addRows([
[35, 135, 2, 334],
[35, 135, 1, 100],
[35.1, 135.1, 1, 100],
[35.1, 135, 1, 100],
[35, 135.1, 1, 100],
[34.9, 134.9, 1, 100],
[34.9, 135, 1, 100],
[35, 135.1, 1, 100],
]);
var geochart = new google.visualization.GeoChart(
document.getElementById('visualization'));
geochart.draw(data, {
colorAxis: {
'minValue': 1,
'maxValue': 2,
'values': [1, 2],
'colors': ['black','red'],
},
'markerOpacity': 0.5,
'region': 'JP'
});
}
I can change the values in column 2 or 3 (0-indexed), or I can change the order of the entries in the data table, but I keep getting the same result. I have a feeling it always puts the bigger values in the back so you can still see the smaller ones, but I'm wondering if there is any authoritative reference on it, or any way to get around it.
This is what it looks like no matter what I do:
What I want it to look like is as follows (manipulated the SVG manually to adjust the Z-order):
I played around with it for a bit, and I think you're right: it's automatically z-indexing the markers in size-order. If I read your intent correctly, you are looking to show some subset of years, and you want the markers to be z-indexed by years. I think you can accomplish that with some custom filtering: sort your data by location and year, then for every location, filter out every year with a smaller size than any of the newer years. Something like this should work:
// order by location and year (descending)
var rows = data.getSortedRows([0, 1, {column: 2, desc: true}]);
// parse the rows backwards, removing all years where a location has a newer year with a larger size value
// we don't need to parse row 0, since that will always be the latest year for some location
var size, lat, long;
for (var i = rows.length - 1; i > 0; i--) {
size = data.getValue(rows[i], 3);
lat = data.getValue(rows[i], 0);
long = data.getValue(rows[i], 1);
for (var j = i - 1; j >= 0 && lat == data.getValue(rows[j], 0) && long == data.getValue(rows[j], 1); j--) {
if (size < data.getValue(rows[j], 3)) {
rows.splice(i, 1);
break;
}
}
}
var view = new google.visualization.DataView(data);
view.setRows(rows);
Here's a working example based on your code: http://jsfiddle.net/asgallant/36AmD/
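The filtering rule itself can be tried without the Visualization API. The following is a simplified sketch on plain arrays, with made-up rows in the form [latitude, longitude, year, size] and a hypothetical helper name dropHiddenYears. It compares every pair of rows instead of sorting first, which is simpler to read but slower on large tables.

```javascript
// Made-up rows: [latitude, longitude, year, size].
const data = [
  [35, 135, 2010, 100],
  [35, 135, 2011, 334],
  [35, 135, 2012, 200],
  [34.9, 134.9, 2010, 50],
  [34.9, 134.9, 2012, 40],
];

// Keep a row unless a newer row at the same location has a larger size,
// because that larger marker would be drawn underneath the older one.
function dropHiddenYears(rows) {
  return rows.filter(function (row) {
    return !rows.some(function (other) {
      return other[0] === row[0] && other[1] === row[1] &&
        other[2] > row[2] && other[3] > row[3];
    });
  });
}

const visible = dropHiddenYears(data);
console.log(visible);
```

Here only the 2010 row at (35, 135) is dropped, since the 2011 row at the same location is both newer and larger.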
You are correct that the order of the markers is determined by the size, with the larger markers drawn first so they end up below the smaller markers, which is a convenience for most applications. If you wish to hide 'later' markers based on order, you'll have to do that another way, perhaps by hiding the rows of data.
Is there a reason it makes sense to hide data if it covers 'earlier' data? Perhaps an option could be added to disable this automatic reordering, especially if transparent colors are used to allow you to see through.
Try this, helped me in a project:
setTimeout(function () {
$('.google-visualization-table').css("z-index", "1");
}, 500);