crossfilter dimension on 2 fields - mapreduce

My data looks like this
field1,field2,value1,value2
a,b,1,1
b,a,2,2
c,a,3,5
b,c,6,7
d,a,6,7
I don't have a good way of rearranging that data so let's assume the data has to stay like this.
I want to create a dimension on field1 and field2 combined : a single dimension that would take the union of all values in both field1 and field2 (in my example, the values should be [a,b,c,d])
As a reduce function you can assume reduceSum on value2 for example (allowing double counting for now).
(have tagged dc.js and reductio because it could be useful for users of those libraries)

First I need to point out that your data is denormalized, so the counts you get might be somewhat confusing, no matter what technique you use.
In standard usage of crossfilter, each row will be counted in exactly one bin, and all the bins in a group will add up to 100%. However, in your case, each row will be counted twice (unless the two fields are the same), so for example a pie chart wouldn't make any sense.
That said, the "tag dimension" feature is perfect for what you're trying to do.
The dimension declaration could be as simple as:
var tagDimension = cf.dimension(function(d) { return [d.field1,d.field2]; }, true);
Now each row will get counted twice - this dimension and its associated groups will act exactly as if each of the rows were duplicated, with one copy indexed by field1 and the other by field2.
If you made a bar chart with this, say, the total count will be 2N minus the number of rows where field1 === field2. If you click on bar 'b', all rows which have 'b' in either fields will get selected. This only affects groups built on this dimension, so any other charts will only see one copy of each row.

Related

AmCharts4: amCore.percent() not working as intended

I have the following graph:
"1" is a LineSeries, and "2" is a columnSeries. I set the width of the columnSeries like this:
series.columns.template.width = am4core.percent(90);
But as you can see, the columns are far away from 90% width.
Interestingly, without the LineSeries, it looks like this, which is what I want it to look like:
Furthermore, if I write a very high value (80 000) instead of 90, I get the desired columns:
I noticed that the dateAxis behaves differently(different time showing), but I cannot see where this is coming from.
Also, this high value for percentage is not a solution, because it has different widths on different graphs
So after experimenting I found out that the only way to avoid something like this is to use an own dateaxis for columns.
If, for example, you want a line+column, you can add a dateAxis to the line, and a dateAxis to the column.
If you have multiple columns, you could either use one dateAxis for all of them, or give one to every single column. The latter will make the columns "stack", so they are on the exact same positions if one would use the same dates for both datasets.
Furthermore, doing so means that you have to disable labeling on all dateaxes except for one, otherwise the labels on the xAxis will stack.

How to set up COUNTIF or COUNTIFS formula in Google Sheets to compare columns?

I am trying to compare the numerical data in two columns in Google Sheets (say, Col. A and B) and return a count of all of the times that they vary by say, more than 1 (e.g., if A3 = 5 and B3 = 2, this should get counted). The two-column arrays will always be of equal size.
At first, I thought that either COUNTIF or COUNTIFS would be my go-to tool, but I can't get this to work with either formula. These formulas seem to handle criteria within a cell, but - as far as I can tell - can't handle criteria comparing data within two different (adjacent) cells.
Can someone help me with some super syntax work-around to get COUNTIF/COUNTIFS to work... or is there a more appropriate formula to the job (perhaps involving FILTER)?
*Quick Edit: I know I could always add an additional column, which would be very simple in this example. But my real-world spreadsheets are a lot more complex and are already suffering from column overload. A lot of other formulas are already set up around existing columns, and I was hoping to discover a more elegant solution that would allow me to come up with the count without having to add a new column for each and every comparison calculation.
=ARRAYFORMULA(IF(LEN(A:A&B:B), IF(A:A-B:B>1, 1, )+IF(B:B-A:A>1, 1, ), ))
if you want final sum instead of "per row" count use:
=SUM(ARRAYFORMULA(IF(LEN(A:A&B:B), IF(A:A-B:B>1, 1, )+IF(B:B-A:A>1, 1, ), )))
Add a third column, containing e.g. =ABS(SUM(A3-B3)). (The ABS gives you the positive difference regardless of which value is larger.)
At the bottom of that column, use COUNTIF like =COUNTIF(C1:C25, ">1") (where C1:C25 is the range of cels containing those positive differences).

spotfire plot list of elements

I have a data table that has this format :
and I want to plot temperature to time, any idea how to do that ?
This can be done in a TERR data function. I don't know how comfortable you are integrating Spotfire with TERR, there is an intro video here for instance (demo starts from about minute 7):
https://www.youtube.com/watch?v=ZtVltmmKWQs
With that in mind, I wrote the script without loading any library, so it is quite verbose and explicit, but hopefully simpler to follow step by step. I am sure there is a more elegant way, and there are better ways of making it flexible with column names, but this is a start.
Your input will be a data table (dt, the original data) and the output a new data table (dt.out, the transformed data). All column names (and some values) are addressed explicitly in the script (so if you change them it won't work).
#remove the []
dt$Values=gsub('\\[|\\]','',dt$Values)
#separate into two different data frames, one for time and one for temperature
dt.time=dt[dt$Description=='time',]
dt.temperature=dt[dt$Description=='temperature',]
#split the columns we want to separate into a list of vectors
dt2.time=strsplit(as.character(dt.time$Values),',')
dt2.temperature=strsplit(as.character(dt.temperature$Values),',')
#rearrange times
names(dt2.time)=dt.time$object
dt2.time=stack(dt2.time) #stack vectors
dt2.time$id=c(1:nrow(dt2.time)) #assign running id for merging later
colnames(dt2.time)[colnames(dt2.time)=='values']='time'
#rearrange temperatures
names(dt2.temperature)=dt.temperature$object
dt2.temperature=stack(dt2.temperature) #stack vectors
dt2.temperature$id=c(1:nrow(dt2.temperature)) #assign running id for merging later
colnames(dt2.temperature)[colnames(dt2.temperature)=='values']='temperature'
#merge time and temperature
dt.out=merge(dt2.time,dt2.temperature,by=c('id','ind'))
colnames(dt.out)[colnames(dt.out)=='ind']='object'
dt.out$time=as.numeric(dt.out$time)
dt.out$temperature=as.numeric(dt.out$temperature)
Gaia
because all of the example rows you've shown here contain exactly four list items and you haven't specified otherwise, I'll assume that all of the data fits this format.
with this assumption, it becomes pretty trivial, albeit a little messy, to split the values out into columns using the RXReplace() expression function.
you can create four calculated columns, each with an expression like:
Int(RXReplace([values],"\\[([\\d\\-]+),([\\d\\-]+),([\\d\\-]+),([\\d\\-]+)]","\\1",""))
the third argument "\\1" determines which number in the list to extract. backslashes are doubled ("escaped") per the requirements of the RXReplace() function.
note that this example assumes the numbers are all whole numbers. if you have decimals, you'd need to adjust each "phrase" of the regular expression to ([\\d\\-\\.]+), and you'd need to wrap the expression in Real() rather than Int() (if you leave this part out, the result will be a String type which could cause confusion later on when working with the data).
once you have the four columns, you'll be able to unpivot to get the data easily.

Horizontal stretching in ListRenderer

I have a list that should display 7 items that each look like this:
Date Weekday Distance Time
Long text that may span many lines
two column text Distance Time
two column text Distance Time
two column text Distance Time
The last lines repeat in a number depending on the data, i e there may be different amounts of such lines for each list item.
I have tried implementing this with a ListCellRenderer that creates a table according to the requirements above, but I have a few problems with it:
The long text that may span many lines is implemented in a SpanLabel. But this text will not display more than one line anyway
Each item in the list will get space for the same number of lines below the first two..
So it seems that items in a list must be of the same size.
Later I also want to be able to detect selection on the entire list item, not just individual fields of it.
Is there a better way to do this?
How do I ensure that the SpanLabel actually gets as much space as it needs?
How do I ensure that the unknown number of lines gets the space they need, depending on how many they are?
Don't use a list: https://www.codenameone.com/blog/deeper-in-the-renderer.html
Lists in Codename One assume every entry is exactly the same height and provide no flexibility here.
I suggest doing something like the property cross demo: https://www.udemy.com/learn-mobile-programming-by-example-with-codename-one/
Where we use a Container with components within to provide a list like behavior with the full flexibility that arbitrary components allow.

kettle sample rows for each type

I have a set of rows let's say "rowId","type","value". I need on output set of 10 sample rows for each "type". How can I do it? "type" has aprox. 100 different, and changing values, so switch is not good option.
Well I've figured a walkaround from this situation. I splited transformation in parts. First part collects all data to a temp table, finds unique types, and copies them to the result.
The second one runs for every input row (where we have types), and collects data of a given type from temp table. Then you need no grouping to do stratified sample.