Beginner rbind function - rbind

I cannot for the life of me understand the rbind function. I've tried using the examples on here, but I can't figure out what I am doing incorrectly. All I would like to do is add the data from my second data frame under the first.
Does rbind require the columns be the same name or...?
ParticipantA=c("A","B","C","D")
Score1A=c("21","20","21","21")
Score2A=c("32","40","32","31")
Score3A=c("47","50","43","46")
BlockA=data.frame(ParticipantA,Score1A,Score2A,Score3A)
BlockA$Major=c("Computer_Science","Computer_Science","Computer_Science","Computer_Science")
BlockA$Gender=c("Female","Female","Male","Male")
ParticipantB=c("E","F","G","H")
Score1B=c("28","28","21","22")
Score2B=c("30","36","37","32")
Score3B=c("41","49","49","46")
BlockB=data.frame(ParticipantB,Score1B,Score2B,Score3B)
BlockB$Major=c("Medical","Medical","Medical","Medical")
BlockB$Gender=c("Female","Female","Male","Male")

rbind requires that all columns be of the same name and class.

The problem is in the column titles. rbind uses column titles to orient how it will bind the rows. The columns can be in different orders, R will just use the first element to determine column order.
Alternatively, adding another column to your data frames, with the value "A" or "B" in it could preserve your information without putting "A"s and "B"s in your column names <-- the reason you can't use rbind. The additional column would also allow you to do more analyses in R, e.g. regression and other linear models.
Here is one way to handle your data:
Create a uniform set of column names that can be used for the data frames "BlockA" and "BlockB"
final_colnames <- c("Block", "Participant", "Score1", "Score2", "Score3")
Create a new list to identify which block the participants belong to.
BlockA = c("A", "A", "A", "A")
Your previous data
ParticipantA = c("A", "B", "C", "D")
Score1A = c("21", "20", "21", "21")
Score2A = c("32", "40", "32", "31")
Score3A = c("47", "50", "43", "46")
The label "BlockA" is recycled here to name the new data frame, but not before adding the "BlockA" column list of "A" "A" "A" "A".
BlockA = data.frame(BlockA, ParticipantA, Score1A, Score2A, Score3A)
The new column names have to be added at this point, so that the number of names and the number of columns are equal.
colnames(BlockA) <- final_colnames
Now you can add the remaining columns
BlockA$Major = c("Computer_Science", "Computer_Science", "Computer_Science", "Computer_Science")
BlockA$Gender = c("Female", "Female", "Male", "Male")
BlockB is the same process
BlockB = c("B", "B", "B", "B") # the extra column
ParticipantB = c("E", "F", "G", "H")
Score1B = c("28", "28", "21", "22")
Score2B = c("30", "36", "37", "32")
Score3B = c("41", "49", "49", "46")
BlockB = data.frame(BlockB, ParticipantB, Score1B, Score2B, Score3B)
colnames(BlockB) <- final_colnames # renaming the columns
BlockB$Major = c("Medical", "Medical", "Medical", "Medical")
BlockB$Gender = c("Female", "Female", "Male", "Male")
Uniform column names mean that rbind will now work.
rbind(BlockA,BlockB)

Related

Power BI - Power Query Editor: Remove All Duplicates (Don't leave any rows that were part of the duplicate)

So, I know how to remove duplicates which leave one row behind. What I want to do is remove all of the rows associated with a duplicate, because we don't know which of the duplicates we want to keep, and for our purposes therefore don't want any of them in our table. There are only two columns. One column contains the duplicates. The second has unique values per duplicate, but we don't want any of them to remain.
Thank you.
Here is a possible workaround. Use Table.Group to count the duplication, then retain only unique entries using Table.SelectRows.
let
Source = Table.FromRecords({
[a = "A", b = "a"], // < duplicated
[a = "B", b = "a"],
[a = "A", b = "a"] // < duplicated
})
in
Table.SelectRows(
Table.Group(Source, {"a", "b"}, {"Count", Table.RowCount}),
each [Count] = 1
)
/*
* Output
*
* a b Count
* --- --- -----
* B a 1
*/

used if statement in select query does not work

I have an order list in csv file. I want to transform the data.
here's my query. I have to add 2 new columns "gift box set" and blue oyser mushroom". If name in V column contains gift box set, then show 1, otherwise leave it blank. Please help me to correct my query. also how can I give a new column name for those 2 added columns?
=query(orders!A2:AP134, "select Q, K, V, if(V="Gift Box Set 5lbs", "1", ""), if(V="Blue Oyster Mushroom 3lbs", "1", "")", true)
There is no if statement in QUERY syntax.
Try a simpler approach without QUERY:
=ARRAYFORMULA(
{
Q2:Q134,
K2:K134,
V2:V134,
IF(V2:V134 = "Gift Box Set 5lbs", 1, ""),
IF(V2:V134 = "Blue Oyster Mushroom 3lbs", 1, "")
}
)
try:
=ARRAYFORMULA({Q2:Q, K2:K,
IF(V2:V="Gift Box Set 5lbs", "1", ),
IF(V2:V="Blue Oyster Mushroom 3lbs", "1", )})
or:
=ARRAYFORMULA(FILTER({Q2:Q, K2:K,
IF(V2:V="Gift Box Set 5lbs", "1", ),
IF(V2:V="Blue Oyster Mushroom 3lbs", "1", )},
REGEXMATCH(V2:V, "Gift Box Set 5lbs|Blue Oyster Mushroom 3lbs"))
or:
=QUERY(INDEX(IF(REGEXMATCH(V2:V, "Gift Box Set 5lbs|Blue Oyster Mushroom 3lbs"),
{Q2:Q, K2:K, ROW(A2:A)^0}, )),
"where Col3=1", 0)

DAX: How do I write an IF statement to return a calculation for multiple (specific) values selected?

This is driving me nuts. Let's say we want to use a slicer which has two distinct values to choose from a dimension. There is A and B.
Let us also say that my Fact table is connected to this dimension, however it has the same dimension with more options.
My slicer now has A, B and (Blank). No biggie.
Let's now say I want to list out all of the possible calculation outcomes by selecting the slicer in a DAX formula, but in my visual I need all those outcomes to be listed in an IF() branched formula:
I can list out A:
IF(MAX(SlicerDim[Column]) = "A", CALCULATE([Calculation], SlicerDim[Column] = "A")
I can list out B:
IF(MAX(SlicerDim[Column]) = "A", CALCULATE([Calculation], SlicerDim[Column] = "A")
I can list out the (Blank) calculation too:
CALCULATE([Calculation], SlicerDim[Column] = Blank())
And I've managed to get a calculation out of it even when all of the slicer elements are on or off, using:
NOT(ISFILTERED(SlicerDim[Column])), CALCULATE([Calculation], SlicerDim[Column] = "A" || SlicerDim[Column] = "B")
Notice I need this IF() branch to actually return a calculation using A & B values, so now I have returns for when A or B or (Blank) or All or None are selected; BUT NOT when multiple values of A & B are selected!
How do I write out this IF() branch for it to return the same thing, but when both A & B are selected? Since there are only two real options in the slicer - I managed to use MIN() and MAX() get it to work by using their names or Index numbers.
IF((MIN(SlicerDim[Column]) = "A" && MAX(SlicerDim[Column]) = "B") || NOT(ISFILTERED(Paslauga[Paslauga])), CALCULATE([Calculation], SlicerDim[Column] = "A" || SlicerDim[Column] = "B")
BUT - I want a more understandable/robust/reusable formula, so that I could list out many selectable values from the slicer and have it return a calculation for specifically selected slicer values.
Please, help.
I've been searching high and low and there seems to not be an easy way to fix this albeit scraping the IF route and just using a damn slicer for this type of dilemma.
TL;DR:
How do I write an IF() branch calculation using DAX to get an outcome when All/None or non-blank or Specific slicer values are selected?
My best effort:
I am looking to improve the first IF() branch to not have to use MIN/MAX, because I would like to be able to reuse this type of formula if there were more than two real options in the slicer:
IF_branch =
IF((MIN(SlicerDim[Column]) = "A" && MAX(SlicerDim[Column]) = "B" || NOT(ISFILTERED(SlicerDim[Column])), CALCULATE([Calculation], SlicerDim[Column] = "A" || SlicerDim[Column] = "B"),
IF(MAX(SlicerDim[Column]) = "A", CALCULATE([Calculation], SlicerDim[Column] = "A"),
IF(MAX(SlicerDim[Column]) = "B", CALCULATE([Calculation], SlicerDim[Column] = "B"),
CALCULATE([Calculation], SlicerDim[Column] = BLANK()))))
Think what you are looking for is CONTAINS and VALUES
VALUES will give you the distinct current selection in scope.
CONTAINS lets you check if a table contains any row with a set of values.
[]
Formulas:
selected Scenarios = CONCATENATEX(VALUES(DimScenario[ScenarioName]);[ScenarioName];";")
Contains Forecast and Budget? =
IF(
CONTAINS(VALUES(DimScenario[ScenarioName]);[ScenarioName];"Forecast") &&
CONTAINS(VALUES(DimScenario[ScenarioName]);[ScenarioName];"Budget")
;"Yes"
;"No"
)

Return top value ordered by another column

Suppose I have a table as follows:
TableA =
DATATABLE (
"Year", INTEGER,
"Group", STRING,
"Value", DOUBLE,
{
{ 2015, "A", 2 },
{ 2015, "B", 8 },
{ 2016, "A", 9 },
{ 2016, "B", 3 },
{ 2016, "C", 7 },
{ 2017, "B", 5 },
{ 2018, "B", 6 },
{ 2018, "D", 7 }
}
)
I want a measure that returns the top Group based on its Value that work inside or outside a Year filter context. That is, it can be used in a matrix visual like this (including the Total row):
It's not hard to find the maximal value using DAX:
MaxValue = MAX(TableA[Value])
or
MaxValue = MAXX(TableA, TableA[Value])
But what is the best way to look up the Group that corresponds to that value?
I've tried this:
Top Group = LOOKUPVALUE(TableA[Group],
TableA[Year], MAX(TableA[Year]),
TableA[Value], MAX(TableA[Value]))
However, this doesn't work for the Total row and I'd rather not have to use the Year in the measure if possible (there are likely other columns to worry about in a real scenario).
Note: I am providing a couple solutions in the answers below, but I'd love to see any other approaches as well.
Ideally, it would be nice if there were an extra argument in the MAXX function that would specify which column to return after finding the maximum, much like the MAXIFS Excel function has.
Another way to do this is through the use of the TOPN function.
The TOPN function returns entire row(s) instead of a single value. For example, the code
TOPN(1, TableA, TableA[Value])
returns the top 1 row of TableA ordered by TableA[Value]. The Group value associated with that top Value is in the row, but we need to be able to access it. There are a couple of possibilities.
Use MAXX:
Top Group = MAXX(TOPN(1, TableA, TableA[Value]), TableA[Group])
This finds the maximum Group from the TOPN table in the first argument. (There is only one Group value, but this allows us to covert a table into a single value.)
Use SELECTCOLUMNS:
Top Group = SELECTCOLUMNS(TOPN(1, TableA, TableA[Value]), "Group", TableA[Group])
This function usually returns a table (with the columns that are specified), but in this case, it is a table with a single row and a single column, which means the DAX interprets it as just a regular value.
One way to do this is to store the maximum value and use that as a filter condition.
For example,
Top Group =
VAR MaxValue = MAX(TableA[Value])
RETURN MAXX(FILTER(TableA, TableA[Value] = MaxValue), TableA[Group])
or similarly,
Top Group =
VAR MaxValue = MAX(TableA[Value])
RETURN CALCULATE(MAX(TableA[Group]), TableA[Value] = MaxValue)
If there are multiple groups with the same maximum value the measures above will pick the first one alphabetically. If there are multiple and you want to show all of them, you could use a concatenate iterator function:
Top Group =
VAR MaxValue = MAX(TableA[Value])
RETURN CONCATENATEX(
CALCULATETABLE(
VALUES(TableA[Group]),
TableA[Value] = MaxValue
),
TableA[Group],
", "
)
If you changed the 9 in TableA to an 8, this last measure would return A, B rather than A.

How to Return Text with IF Function in an Array

In Google Sheets, I'm trying to query a column and look for a state abbreviation, and if that abbreviation is a match, then "East" if not then "West"
Wanting to return text values in my column based on state abbreviation. We have territory manager split into two domains--East and West. So, trying to easily sort my data by East/West.
Here's what I have:
=IF(M:M={"AL", "CA", "DE","FL","GA","IA","KY","ME","MD","MA","MN","MS","NH","NJ","NY","ND","RI","SD","TN","VT","VA","WV","WI"},"East","West")
But, when I fill down, it just fills down East, and does not seem to actually query M:M
Thoughts?
Not the cleanest code, but this should work:
=ARRAYFORMULA(IF(LEN(A:A), IF((A:A = "foo")+(A:A = "bar") = 1, "WEST", "EAST"), ))
To use IF with an OR in an ARRAYFORMULA, you evaluate the column with 1s and 0s. The A:A = "foo" will evaluate to 1 if foo is in the cell. So if one of your OR criteria is in the cell, the total value in the IF will be 1.
You have a lot of criteria so writing each of them in will take a while ...
E.g. IF( (A:A = "AL") + (A:A = "CA") ... (A:A = "WI") = 1, "East", "West")
Use ISERROR/MATCH():
=IF(ISERROR(MATCH(M:M,{"AL", "CA", "DE","FL","GA","IA","KY","ME","MD","MA","MN","MS","NH","NJ","NY","ND","RI","SD","TN","VT","VA","WV","WI"},0)),"West","East")