We have a question about designing a schema and handling an analytics requirement for our product, and would appreciate your advice. We are just getting started with Cube.js. Here is our requirement. For simplicity, I will use an example: say we have multiple attribute columns plus one "value" column and one "weight" column. We need to calculate weighted averages (from the value and weight columns) across all combinations of the attribute columns.
e.g. group by Column 1 and compute the weighted average (value/weight columns),
or group by Columns 1 and 2 and compute the weighted average, and so on.
There can be many such combinations, and we have at least 8 to 12 attribute columns like that.
What is the best way to model this?
It will probably be most convenient to create one cube with several predefined segments, or you can create a separate cube for each attribute.
It depends on your data.
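Whichever layout is chosen, the weighted average itself stays the same for every grouping if it is built from two additive pieces: the sum of value times weight, and the sum of weight. Below is a minimal Python sketch of that idea, not Cube.js code. It assumes the weighted average is meant as sum(value * weight) / sum(weight); the function name `weighted_average_by`, the column names and the sample rows are all hypothetical.

```python
from collections import defaultdict

def weighted_average_by(rows, group_cols, value_col="value", weight_col="weight"):
    """Weighted average of value_col per combination of group_cols.

    Assumption: weighted average = sum(value * weight) / sum(weight).
    """
    numerator = defaultdict(float)    # running sum of value * weight per group
    denominator = defaultdict(float)  # running sum of weight per group
    for row in rows:
        key = tuple(row[col] for col in group_cols)
        numerator[key] += row[value_col] * row[weight_col]
        denominator[key] += row[weight_col]
    # Groups whose total weight is zero are skipped to avoid dividing by zero.
    return {k: numerator[k] / denominator[k] for k in numerator if denominator[k] != 0}

# Hypothetical sample data with two attribute columns.
rows = [
    {"col1": "a", "col2": "x", "value": 10.0, "weight": 1.0},
    {"col1": "a", "col2": "y", "value": 20.0, "weight": 3.0},
    {"col1": "b", "col2": "x", "value": 5.0, "weight": 2.0},
]

by_col1 = weighted_average_by(rows, ["col1"])
by_col1_col2 = weighted_average_by(rows, ["col1", "col2"])
```

Because both sums are additive, the same two aggregates serve any subset of the 8 to 12 attributes. In a cube this would presumably correspond to two sum-type measures and one derived ratio measure, with the attributes as dimensions, but that mapping is a suggestion rather than something stated in the answer above.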
Related
I have a dataset with around 15 numeric columns and two categorical columns which are a "State" column and an "Income" column with six buckets representing each different income range. Do I need to encode the "Income" column if it contains integers 1-6 representing each income range? In addition, what type of encoder should I use for the "state" column and does anyone have any good resources on this?
In addition, does one typically perform feature selection (wrapper and filter methods such as Pearson's and Recursive Feature Elimination) before PCA? What is the typical correlation threshold when using a method like Pearson's? And what is the ideal number of dimensions or explained variance ratio one should use when running PCA? I'm confused about whether you use one of them or both. Thank you.
I am new to bootstrapping analyses and I could not find much about bootstrapping a community matrix. So my question is:
I have a community matrix (samples as rows and species as columns) composed of hundreds of species and about 10000 rows. I also have a column "groups" in this matrix, but the number of samples (rows) in each group is different. For example, group "A" has 5 samples, "B" has 4 samples and "C" has 5 samples. I need to bootstrap each group 1000 times, taking only 4 random samples in each replication. For each replication, I need to calculate the mean value of each column (species). How can I bootstrap this matrix, in R, by groups, and generate a new matrix (1000 rows for each group) with the means of each replication?
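The procedure described here can be sketched as follows. This is only an outline of the logic, written in Python rather than R, and it rests on a few assumptions: rows are drawn with replacement by default (the question does not say which, so `replace` is a parameter), and the function name `bootstrap_group_means`, the species names and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def bootstrap_group_means(rows, group_col, species_cols,
                          n_reps=1000, sample_size=4, replace=True, seed=0):
    """Return one row of per-species means for each (group, replicate)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_col]].append(row)

    out = []
    for group, members in by_group.items():
        for rep in range(n_reps):
            if replace:
                # Assumption: classic bootstrap, rows drawn with replacement.
                draw = [rng.choice(members) for _ in range(sample_size)]
            else:
                # Subsampling without replacement; raises ValueError if the
                # group has fewer than sample_size rows.
                draw = rng.sample(members, sample_size)
            means = {c: sum(r[c] for r in draw) / sample_size for c in species_cols}
            out.append({"group": group, "replicate": rep, **means})
    return out

# Hypothetical toy community matrix: group "A" has 5 samples, "B" has 4.
rows = [{"groups": "A", "sp1": float(i), "sp2": float(2 * i)} for i in range(5)]
rows += [{"groups": "B", "sp1": float(10 + i), "sp2": 1.0} for i in range(4)]

result = bootstrap_group_means(rows, "groups", ["sp1", "sp2"], n_reps=1000)
```

The result holds 1000 rows per group, each with the mean of every species column for that replicate, which matches the shape of the matrix asked for.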
I have a categorical dataset. I am using the WEKA software for feature selection. I have used CfsSubsetEval as the attribute evaluator with the GreedyStepwise method. I learned from this link that CFS uses Pearson correlation to find strong correlations in the dataset. I also found out how to calculate the Pearson correlation coefficient using this link. According to the link, the data values need to be numerical for evaluation. So how can WEKA perform the evaluation on my categorical dataset?
The strange result is that among 70 attributes CFS selects only 10. Is it because the dataset is categorical? Additionally, my dataset is highly imbalanced, with an imbalance ratio of 1:9 (yes:no).
A quick question
If you go through the link, you can find the statement that the correlation coefficient measures the strength and direction of the linear relationship between two numerical variables X and Y. I understand the strength of the correlation coefficient, which varies between -1 and +1, but what about the direction? How can I get that? The variables are not vectors, so they should not have a direction.
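In this context "direction" normally refers to the sign of the coefficient rather than to a geometric direction: a positive value means Y tends to increase as X increases, and a negative value means Y tends to decrease. A minimal Python sketch with made-up data illustrates this; the helper `pearson_r` is written out by hand here and is not WEKA's implementation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: one series rises with x, the other falls.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.0, 4.0, 6.0, 8.0, 10.0]
falling = [10.0, 8.0, 6.0, 4.0, 2.0]

r_up = pearson_r(x, rising)     # close to +1: positive direction
r_down = pearson_r(x, falling)  # close to -1: negative direction
```

The magnitude of the two results is the same, so the strength is equal; only the sign, and therefore the direction of the relationship, differs.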
The method correlate in the CfsSubsetEval class is used to compute the correlation between two attributes. It calls other methods, depending on the attribute types, which I've linked here:
two numeric attributes: num_num
numeric/nominal attributes: num_nom2
two nominal attributes: nom_nom
How can I do custom number formatting in a Power BI visual?
I don't want to show all values in millions. I want to use thousands for the 1-day value, millions for the 1-week value, and year for the 1-year value.
Power BI charts follow the principles of good data visualisation. That includes a scale that is relevant to the data with labels that relate to the scale.
In the visualisation, the differences for the values less than 1M are not discernible. The label with the 0M supports that approach, although it doesn't look great. But that happens when you have a chart with very large AND very small values. Power BI only supports one display unit and you selected Millions.
You may want to consider using a different visual for the data. Not all data needs to be shown as a chart. If you want to show the exact numbers, then a simple table might be a better approach. In a sorted list of numbers, the digits in a number act very much like a horizontal bar.
Or split the chart in two and show one chart for values above 1M and another for values below 1M.
Or use Thousands as display units instead of Millions.
I have a matrix that has two row dimensions: one for the country and one that shows different KPIs. Then I have two column dimensions: one with the product name and one with the product logo. Basically, all KPIs for each country are broken down by product:
What I need to have is a total for each row, e.g. total KPI1 for all products, then total KPI2 for all products and so on.
However, when I go to the formatting tab and turn subtotals on, it appears this way:
It gives me a total for each product separately which is basically the same number. Is there any way to have only 1 total for the whole row?
Go to Format > Subtotal and turn on Per column level. Then you can specify which levels you want to show subtotals for below that toggle.