Outlier treatment in Weka Explorer - weka

Once outliers are identified in Weka, how do I treat the values instead of removing them? For example, a dataset has a column credit_balance with values ranging from 1 to 1000, and after applying the interquartile range formula the outlier records are those with values above 800. Instead of removing these records, I want to replace the values above 800 with the mean, say 300.
How can we do that in Weka Explorer?
Thanks

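Crude method: use the following navigation in the Explorer.
Preprocess > Filter > Choose > filters > unsupervised > attribute > AddExpression. This filter creates a new attribute from an expression, e.g. ifelse(a2 > 800, 300, a2)
Here a2 refers to your attribute by its number (the second attribute in this case); the expression yields 300 where that attribute exceeds 800 and the original value otherwise.
Limitation: this works only for the specific attributes you name in the expression.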
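The replacement the question asks for (flag values above the interquartile-range fence, then substitute the mean) can be sketched outside Weka to make the logic explicit. This is a minimal Python illustration, not Weka code: the function name, the 1.5 multiplier, the choice to take the mean over the non-outlier values, and the sample balances are all assumptions made for the example, and only the upper fence is handled because the question only mentions high values.

```python
import statistics

def replace_outliers_with_mean(values, k=1.5):
    """Replace values above Q3 + k*IQR with the mean of the remaining values.

    Hypothetical helper: only the upper fence is checked.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    upper_fence = q3 + k * (q3 - q1)
    inliers = [v for v in values if v <= upper_fence]
    inlier_mean = statistics.mean(inliers)
    return [inlier_mean if v > upper_fence else v for v in values]

# Made-up credit_balance values; 950 is the obvious outlier.
balances = [100, 200, 250, 300, 350, 400, 950]
treated = replace_outliers_with_mean(balances)
```

The non-outlier values pass through unchanged, which is the same "else keep the original" branch that the third argument of ifelse provides.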


evaluating a textual equation into a number

As a newbie to M, I still can't get my head around it. Here is the query I have; it has gone through a number of steps to get to the state below. How do I apply Expression.Evaluate to the whole Entry Fee column of the query, which for the sake of simplicity is called #"Nearly There"? To reiterate, it needs to be done in Power Query "M".
Snapshot of table/query
You can either add a new custom column with the code
Expression.Evaluate([Entry Fee])
or do a column transformation
= Table.TransformColumns(#"Nearly There", {{"Entry Fee", Expression.Evaluate, type number}})
To generate this step, you can select the column and then Transform tab > Format > Trim and then replace Text.Trim with Expression.Evaluate in the generated code.
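For readers who want to see what "evaluate a text expression per cell" amounts to, here is a rough Python analogue of the column transformation. It is not M and not how Expression.Evaluate is implemented; the function name and the sample Entry Fee strings are invented for illustration, and the sketch accepts only plain arithmetic.

```python
import ast
import operator

# Arithmetic operators the sketch is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate_arithmetic(text):
    """Evaluate a string such as '10+5' using only + - * / and numbers."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {text!r}")

    return _eval(ast.parse(text.strip(), mode="eval"))

# Made-up column values standing in for the Entry Fee column.
entry_fees = ["10+5", "2*7.5", "20"]
evaluated = [evaluate_arithmetic(fee) for fee in entry_fees]
```

Applying one function to every cell of a column is the same shape as the Table.TransformColumns step above.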

Amazon QuickSight - Display number

I am trying Amazon QuickSight, but I can't tell whether the following is possible.
I need to display a number that is calculated as:
[(a-b) / c]
a - is chosen from a list of data in the column A
b - is the mean of the column B
c - is the mean of the column C
Is this possible?
Thanks
Where a differs depending on the row in column A? I don't think this is possible as you are writing a formula using both aggregated fields (mean of b or c) and a non-aggregated field (a).
I tried the formula with both and got the following error (using the avg function):
Mismatched aggregation. Custom aggregations can’t contain both
aggregate "AVG" and non-aggregated fields “AVG("ColumnId-2")”, in
any combination.
@Occamatic is right that you cannot use both aggregated fields and a non-aggregated field in your formula.
However, you can circumvent this by wrapping 'a' in an aggregate function in your calculated field. Example:
( sumIf({a},{a}={a}) - b ) / c
Please adapt this to the specifics of your dashboard, possibly using parameters in ifelse statements, but a version of this should work.
For instance, I myself can't use:
ifelse({metric_type}='Averages',avg({metric_value}),sum({metric_value}))
Instead I use:
ifelse(avgIf({metric_value},{metric_type}='Averages') > 0,avg({metric_value}),sum({metric_value}))
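The calculation being asked for, (a - mean of B) / mean of C, is easier to reason about when written out plainly. The following Python sketch only shows the arithmetic; it is not QuickSight syntax, and the row data and the function name are made up for the example.

```python
from statistics import mean

# Made-up rows standing in for columns A, B and C.
rows = [
    {"a": 10, "b": 2, "c": 4},
    {"a": 14, "b": 4, "c": 2},
    {"a": 18, "b": 6, "c": 3},
]

def score_for(selected_a, rows):
    """Compute (a - mean(B)) / mean(C) for one chosen value of a."""
    b_mean = mean(r["b"] for r in rows)  # aggregated over all rows
    c_mean = mean(r["c"] for r in rows)  # aggregated over all rows
    return (selected_a - b_mean) / c_mean

score = score_for(10, rows)
```

The row-level value a is combined with two whole-column aggregates, which is exactly the mix that triggers the mismatched aggregation error unless a is itself wrapped in an aggregate.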

How to hide blanks in a matrix visualization with hierarchical rows

I've built a table of data following this helpful guide:
https://www.daxpatterns.com/parent-child-hierarchies/
I'm following it exactly but I'll still explain things here so you don't have to go through the whole article if you don't want to. I have a table of Names with corresponding keys, and ParentKeys forming hierarchies.
I added a column for the path, columns for each level of the path, depth of hierarchy and an IsLeaf column:
If I want to make a matrix and include City (from another table), all hierarchies will expand to the maximum length, and blanks are filled in with the "parent's" name:
The DAX Patterns website explains how to get around this. First add these two measures:
BrowseDepth = ISFILTERED (Nodes[Level1]) + ISFILTERED (Nodes[Level2]) + ISFILTERED (Nodes[Level3])
MaxNodeDepth = MAX (Nodes[HierarchyDepth])
And then you can factor that into calculations with this measure:
Sales Amount Simple =
IF (
    Nodes[BrowseDepth] > Nodes[MaxNodeDepth],
    BLANK (),
    SUM (Transactions[Amount])
)
If this is the only value on a matrix visual, it turns out fine:
But if I add any other values, I get expanded hierarchies and blanks again:
My problem would be solved if I could filter out blank values, but that filters out the entire hierarchy. Do I have to make a measure using the Sales Amount format above for every value I want to include? I'm trying to add things like addresses that can't be aggregated.
Basically yes, you have to redo the measure for each value. However, you can embed an existing measure into this pattern, which makes it a little easier.
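The idea of embedding an existing measure into the pattern can be pictured as a wrapper that blanks any value once the browse depth exceeds the deepest node in scope. The sketch below is plain Python rather than DAX; the function name, the arguments, and the sample values are hypothetical and only mirror the BrowseDepth and MaxNodeDepth logic from the question.

```python
def blank_beyond_node_depth(measure, levels_filtered, node_depths):
    """Return measure() unless the visual is browsed deeper than the nodes go.

    levels_filtered: one boolean per hierarchy level (the ISFILTERED checks).
    node_depths: HierarchyDepth values of the nodes in the current context.
    """
    browse_depth = sum(1 for is_filtered in levels_filtered if is_filtered)
    max_node_depth = max(node_depths)
    if browse_depth > max_node_depth:
        return None  # plays the role of BLANK ()
    return measure()

# A node that is only two levels deep, browsed at level 2 and at level 3.
at_level_2 = blank_beyond_node_depth(lambda: 150, [True, True, False], [2])
at_level_3 = blank_beyond_node_depth(lambda: 150, [True, True, True], [2])
```

Each value shown in the matrix would go through the same wrapper, which is why every one of them needs its own measure.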

Power BI Dashboard where the core filter condition is a disjunction on numeric fields

We are trying to implement a dashboard that displays various tables, metrics and a map, where the dataset is a list of customers. The primary filter condition is the disjunction of two numeric fields. We want the user to be able to select a threshold for [field 1] and a separate threshold for [field 2], and then impose the condition [field 1] >= <threshold> OR [field 2] >= <threshold>.
After that, we want to also allow various other interactive slicers so the user can restrict the data further, e.g. by country or account manager.
Power BI naturally imposes AND between all filters and doesn't have a neat way to specify OR. Can you suggest a way to define a calculation using the two numeric fields that is then applied as a filter within the same interactive dashboard screen? Alternatively, is there a way to first prompt the user for the two threshold values before the dashboard is displayed -- so when they click Submit on that parameter-setting screen they are then taken to the main dashboard screen with the disjunction already applied?
Added in response to a comment:
The data can be quite simple: no complexity there. The complexity is in getting the user interface to enable a disjunction.
Suppose the data was a list of customers with customer id, country, gender, total value of transactions in the last 12 months, and number of purchases in last 12 months. I want the end-user (with no technical skills) to specify a minimum threshold for total value (e.g. $1,000) and number of purchases (e.g. 10) and then restrict the data set to those where total value of transactions in the last 12 months > $1,000 OR number of purchases in last 12 months > 10.
After doing that, I want to allow the user to see the data set on a dashboard (e.g. with a table and a graph) and from there select other filters (e.g. gender=male, country=Australia).
The key here is to create separate parameter tables and combine conditions using a measure.
Suppose we have the following Sales table:
Customer Value Number
-----------------------
A 568 2
B 2451 12
C 1352 9
D 876 6
E 993 11
F 2208 20
G 1612 4
Then we'll create two new tables to use as parameters. You could do a calculated table like
Number = VALUES(Sales[Number])
Or something more complex like
Value = GENERATESERIES(0, ROUNDUP(MAX(Sales[Value]),-2), ROUNDUP(MAX(Sales[Value]),-2)/10)
Or define the table manually using Enter Data or some other way.
In any case, once you have these tables, name their columns what you want (I used MinCount and MinValue) and write your filtering measure
Filter = IF(MAX(Sales[Number]) > MIN(Number[MinCount]) ||
MAX(Sales[Value]) > MIN('Value'[MinValue]),
1, 0)
Then add the Filter measure as a visual-level filter with the condition "Filter is not 0", and use the MinCount and MinValue columns as slicers.
If you select 10 for MinCount and 1000 for MinValue then your table should look like this:
Notice that E and G exceed only one of the thresholds yet are kept, and that A and D are excluded.
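To check the OR logic by hand, the same rule can be run over the Sales table from this answer. This is a small Python sketch of the condition only, not DAX; the function name is made up, and the thresholds are the 10 and 1000 selected above.

```python
# The Sales table from the answer: (Customer, Value, Number).
sales = [
    ("A", 568, 2),
    ("B", 2451, 12),
    ("C", 1352, 9),
    ("D", 876, 6),
    ("E", 993, 11),
    ("F", 2208, 20),
    ("G", 1612, 4),
]

def filter_flag(value, number, min_value, min_count):
    """Mirror the Filter measure: 1 if either threshold is exceeded, else 0."""
    return 1 if number > min_count or value > min_value else 0

# Slicer selections from the example: MinCount = 10, MinValue = 1000.
kept = [
    customer
    for customer, value, number in sales
    if filter_flag(value, number, min_value=1000, min_count=10) != 0
]
# kept == ['B', 'C', 'E', 'F', 'G']
```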
To my knowledge, there is no such built-in slicer feature in Power BI at the moment. There is, however, a suggestion in the Power BI forum requesting functionality like this. If you're willing to use the Power Query Editor, it's easy to obtain the values you're looking for, but only with hard-coded values for your limits or thresholds.
Let me show you how for a synthetic dataset that should fit the structure of your description:
Dataset:
CustomerID,Country,Gender,TransactionValue12,NPurchases12
51,USA,M,3516,1
58,USA,M,3308,12
57,USA,M,7360,19
54,USA,M,2052,6
51,USA,M,4889,5
57,USA,M,4746,6
50,USA,M,3803,3
58,USA,M,4113,24
57,USA,M,7421,17
58,USA,M,1774,24
50,USA,F,8984,5
52,USA,F,1436,22
52,USA,F,2137,9
58,USA,F,9933,25
50,Canada,F,7050,16
56,Canada,F,7202,5
54,Canada,F,2096,19
59,Canada,F,4639,9
58,Canada,F,5724,25
56,Canada,F,4885,5
57,Canada,F,6212,4
54,Canada,F,5016,16
55,Canada,F,7340,21
60,Canada,F,7883,6
55,Canada,M,5884,12
60,UK,M,2328,12
52,UK,M,7826,1
58,UK,M,2542,11
56,UK,M,9304,3
54,UK,M,3685,16
58,UK,M,6440,16
50,UK,M,2469,13
57,UK,M,7827,6
Desktop table:
Here you see an Input table and a subset table using two Slicers. If the forum suggestion gets implemented, it should hopefully be easy to change a subset like below to an "OR" scenario:
Transaction Value > 1000 OR Number of purchases > 10 using Power Query:
If you use Edit Queries > Advanced filter you can set it up like this:
The last step under Applied Steps will then contain this formula:
= Table.SelectRows(#"Changed Type2", each [NPurchases12] > 10 or [TransactionValue12] > 1000)
Now your original Input table will look like this:
Now, if only we were able to replace the hardcoded 10 and 1000 with a dynamic value, for example from a slicer, we would be fine! But no...
I know this is not what you were looking for, but it was the best 'negative answer' I could find. I guess I'm hoping for a better solution just as much as you are!

Country is selected as "All" calculate a specific column value only

I need some advice on how to filter data: when "All" is selected in the Country column, the visual should use only a specific value from a different column (Raw); otherwise it should calculate the respective values (column Coverage) for the respective countries.
There are 3 columns: Country, Coverage and Raw.
Basically, when "All" is selected it uses the total of the Coverage column, but I want it to use a single value from the Raw column instead.
Note - I'm using this in Line chart in Power BI
Thank you in advance!
You can create measures that have different behaviors for different filter conditions using functions like ISFILTERED or HASONEVALUE. For example:
Measure = IF(ISFILTERED(Table1[Country]), <Calculation 1>, <Calculation 2>)
or
Measure = IF(HASONEVALUE(Table1[Country]), <Calculation 1>, <Calculation 2>)