I have extracted the indexes of certain rows of one DataFrame, df1 (the rows whose first-column value also appears in another DataFrame, df2), with the following command:
indexes = df1[df1.iloc[:, 0].isin(df2.iloc[:, 0].values)].index.values
What I'd like to do is assign values from certain columns of df2 to certain columns of df1, but only for the rows whose indexes I extracted.
For example:
df1:
index | col1 | col2 | col3
0 | ABC | DEF | GHI
1 | JKL | MNO | PQR
2 | STU | VWX | YZ
df2:
index | colA | colB | colC
0 | WHAT | EVER | 123
2 | 111 | 222 | 333
What I'd like to do now for example is to assign the value of colB (df2) to col3 (df1) according to the indexes. So the result should be:
df1:
index | col1 | col2 | col3
0 | ABC | DEF | EVER <- value of colB (df2)
1 | JKL | MNO | PQR
2 | STU | VWX | 222 <- value of colB (df2)
I'm aware that I can set values with the .iloc (integer location) indexer, but I can't figure out how to do this with the corresponding indexes.
Also, I'd appreciate a pointer to a good Pandas guide (as you can see, I'm new to Pandas).
Greetings,
Frame
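One way to do the assignment with label-based indexing: a minimal sketch, assuming the matching rows carry the same index labels in both frames (as in the example above, where 0 and 2 appear in both df1 and df2); the data below is reconstructed from the example.

import pandas as pd

# rebuild the example frames; df2 is indexed by the same labels as df1 (0 and 2)
df1 = pd.DataFrame({'col1': ['ABC', 'JKL', 'STU'],
                    'col2': ['DEF', 'MNO', 'VWX'],
                    'col3': ['GHI', 'PQR', 'YZ']})
df2 = pd.DataFrame({'colA': ['WHAT', 111],
                    'colB': ['EVER', 222],
                    'colC': [123, 333]},
                   index=[0, 2])

indexes = df2.index  # or the index array you already extracted with isin()

# .loc selects rows by label, so the same labels address both frames and
# pandas aligns the right-hand side on those labels during assignment
df1.loc[indexes, 'col3'] = df2.loc[indexes, 'colB']

print(df1)
#   col1 col2  col3
# 0  ABC  DEF  EVER
# 1  JKL  MNO   PQR
# 2  STU  VWX   222

If the two frames don't share index labels, you would first have to match on the key column (for example via merge or map) instead of relying on label alignment.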
I have two tables which are connected by an ID column (not shown here). Here is how the data looks:
| column1 | column2 |
| -------- | -------------- |
| Mike | 345 |
| Steve | 987 |
| Andy | 0 |
| Lucas | 0 |
--
| column3 | column4 |
| -------- | -------------- |
| Mike | 543 |
| Lucas | 0 |
| Andy | 678 |
| Steve | 0 |
I wish to create a calculated column which concatenates the values from the second table (column3, column4), but only if the value in column2 is zero. If column2 is not zero, then it should take precedence in the concatenation.
Also, if both column2 and column4 are zero, then there should be no concatenation.
I'm expecting something like this:
| Column3 | Column4 | Concat column |
|---------|---------|---------------|
| Mike | 543 | Mike 345 |
| Lucas | 0 | |
| Andy | 678 | Andy 678 |
| Steve | 0 | Steve 987 |
Try this:
ConcatColumn = IF(Table1[Column2] <> 0, Table1[Column1] & " " & Table1[Column2], IF(RELATED(Table2[Column4]) <> 0, RELATED(Table2[Column3]) & " " & RELATED(Table2[Column4]), BLANK()))
Before using the above calculated column, you first have to establish a relationship between Table1 and Table2 on Column1 and Column3.
It is also assumed that Column2 and Column4 have the Whole Number data type.
I have a sample table with the following values:
location | col1 | col2 | col3 | col4
------------------------------------------
usa1 | 1 | 1 | 1 | 1
usa2 | 1 | 0 | 1 | 1
Values are boolean: true (1) and false (0).
I would like to add a new column that shows the sum per row. The article at https://www.c-sharpcorner.com/article/sum-multiple-column-using-dax-in-power-bi/ suggests the following approach:
Measure Total = SUM(table[col1]) + SUM(table[col2]) + ... + SUM(table[colx])
I am getting the expected sum for the four columns I tried, but if I have 20 columns, I was hoping you could guide me to write the DAX more efficiently.
expected output
location | col1 | col2 | col3 | col4 | sum
------------------------------------------
usa1 | 1 | 1 | 1 | 1 | 4
usa2 | 1 | 0 | 1 | 1 | 3
I would use the unpivot feature of Power Query to go from wide to long: select location and unpivot all the other columns.
Then the sum by location is immediate in any visual, with no need for DAX.
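A minimal Power Query sketch of that unpivot step, under the assumption that the source query is called table and location is the only column to keep fixed (the names here are mine):

let
    // reference the existing query; "table" is an assumed name
    Source = table,
    // turn every column except location into Attribute/Value pairs
    Unpivoted = Table.UnpivotOtherColumns(Source, {"location"}, "Attribute", "Value")
in
    Unpivoted

After loading the long table, summing Value by location in any visual gives the per-row total no matter how many colN columns the source has.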
One way I do it is
Sum = table[col1] + table[col2] + table[col3] + ...
I am not sure if there is another way for your situation since I only had at most 5 columns to add.
Given the following table have, I would like to delete the records that match the rows in the to_delete table.
data have;
infile datalines delimiter="|";
input id :8. item :$8. datetime : datetime18.;
format datetime datetime18.;
datalines;
111|Basket|30SEP20:00:00:00
111|Basket|30SEP21:00:00:00
111|Basket|31DEC20:00:00:00
111|Backpack|31MAY22:00:00:00
222|Basket|31DEC20:00:00:00
222|Basket|30JUN20:00:00:00
;
+-----+----------+------------------+
| id | item | datetime |
+-----+----------+------------------+
| 111 | Basket | 30SEP20:00:00:00 |
| 111 | Basket | 30SEP21:00:00:00 |
| 111 | Basket | 31DEC20:00:00:00 |
| 111 | Backpack | 31MAY22:00:00:00 |
| 222 | Basket | 31DEC20:00:00:00 |
| 222 | Basket | 30JUN20:00:00:00 |
+-----+----------+------------------+
data to_delete;
infile datalines delimiter="|";
input id :8. item :$8. datetime : datetime18.;
format datetime datetime18.;
datalines;
111|Basket|30SEP20:00:00:00
111|Backpack|31MAY22:00:00:00
222|Basket|30JUN20:00:00:00
;
+-----+----------+------------------+
| id | item | datetime |
+-----+----------+------------------+
| 111 | Basket | 30SEP20:00:00:00 |
| 111 | Backpack | 31MAY22:00:00:00 |
| 222 | Basket | 30JUN20:00:00:00 |
+-----+----------+------------------+
In the past, I have used the catx() function to concatenate the key columns in a where statement, but I wonder if there is a better way of doing this:
proc sql;
delete from have
where catx('|',id,item,datetime) in
(select catx('|',id,item,datetime) from to_delete);
run;
+-----+--------+------------------+
| id | item | datetime |
+-----+--------+------------------+
| 111 | Basket | 30SEP21:00:00:00 |
| 111 | Basket | 31DEC20:00:00:00 |
| 222 | Basket | 31DEC20:00:00:00 |
+-----+--------+------------------+
Please note that the solution should allow the have table to have more columns than the to_delete table.
You can use EXCEPT to compute the set difference of the two tables:
proc sql;
create table want as
select * from have except select * from to_delete
;
quit;
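Note that EXCEPT requires both SELECT lists to match, so if have really does carry extra columns that to_delete lacks, one alternative (a sketch, assuming id, item and datetime together identify the rows to drop) is a correlated NOT EXISTS on just the key columns:

proc sql;
   create table want as
   select h.*
   from have as h
   /* keep only rows of have with no exact key match in to_delete */
   where not exists (
      select 1
      from to_delete as d
      where d.id = h.id
        and d.item = h.item
        and d.datetime = h.datetime
   );
quit;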
Here is the sheet for testing: https://docs.google.com/spreadsheets/d/11CoQ_PAtVNQBkbtnHH0xR4bhCQVU-pcz645h1akTQuA/edit?usp=sharing
I have a table like this:
| id | category | irrelevant |
|----|----------|------------|
| 1 | cat1 | FALSE |
| 2 | cat2 | FALSE |
| 3 | | TRUE |
| 4 | cat1 | FALSE |
Each item has an ID and a category or, if it is considered irrelevant, it has no category and the column "irrelevant" is marked as TRUE.
What I would like to do is to write a formula that will return the number of items in each category plus a row with the number of irrelevant items. So in the case above the result would be:
| category | number |
|------------|--------|
| cat1 | 2 |
| cat2 | 1 |
| irrelevant | 1 |
If I try something like:
=QUERY(A1:C5,"select B,count(A) group by B")
I get the correct numbers, but since "irrelevant" is not a category its cell is empty, so the result is:
| category | count id |
|----------|----------|
| | 1 |
| cat1 | 2 |
| cat2 | 1 |
Notice the empty "B2" cell. Is there a way to rename it to "irrelevant" without altering the first table? One thing I tried was just to count the irrelevant items.
=transpose(query(A1:C5, "select count(A) where C = TRUE label count(A) 'irrelevant'"))
which returns me simply
| irrelevant | 1 |
And then I altered the first formula slightly so it doesn't count the "empty" categories, and finally joined both of them in an array:
={
QUERY(A1:C5,"select B,count(A) where B <> '' group by B");
TRANSPOSE(QUERY(A1:C5, "select count(A) where C = TRUE label count(A) 'irrelevant'"))
}
This returns what I want for the example above:
| category | count id |
|------------|----------|
| cat1 | 2 |
| cat2 | 1 |
| irrelevant | 1 |
But this won't work if my original table doesn't have any irrelevant items, which can occur depending on the range I choose to query. So if I want to query a table like this:
| id | category | irrelevant |
|----|----------|------------|
| 5 | cat1 | FALSE |
| 6 | cat2 | FALSE |
| 7 | cat2 | FALSE |
| 8 | cat3 | FALSE |
The solution I found will not work. Any suggestions on how I can do that?
try:
=ARRAYFORMULA(QUERY({A2:A, IF((B2:B="")*(C2:C<>""), "irrelevant", B2:B)},
 "select Col2,count(Col1)
  where Col2 is not null
  group by Col2
  label count(Col1)''"))
The first column of the virtual array holds the ids and the second holds the category, with empty categories replaced by "irrelevant", so the single QUERY works whether or not any irrelevant rows exist.
Objective: sum up the nearest date's value as of a given date.
Here is my data
Table: MyData
+-------------------------------+
| ID TradeDate Value |
+-------------------------------+
| 1 2018/11/30 105 |
| 1 2018/11/8 101 |
| 1 2018/10/31 100 |
| 1 2018/9/30 100 |
| 2 2018/11/30 200 |
| 2 2018/10/31 201 |
| 2 2018/9/30 205 |
| 3 2018/11/30 300 |
| 3 2018/10/31 305 |
| 3 2018/9/30 301 |
+-------------------------------+
I created a table named 'DateList' and use a slicer on it (the DateList slicer) to select a specified date.
I want to achieve the result as follows:
as of *11/9/2018*
+-----------------------------------+
| ID TradeDate Value |
+-----------------------------------+
| 1 2018/11/8 101 |
| 2 2018/10/31 201 |
| 3 2018/10/31 305 |
+-----------------------------------+
| Total 607 |
+-----------------------------------+
Currently, I am trying the following steps to achieve the above result.
First, I find the nearest date in table 'MyData' using a new measure:
MyMaxDate = CALCULATE(MAX(MyData[TradeDate]),Filter(MyData, MyData[TradeDate] <= FIRSTDATE(DateList[Date]) ))
Second, I create a new measure "MySum" to sum up the values where [TradeDate] equals "MyMaxDate":
MySum = CALCULATE(SUM(MyData[Value]), FILTER(MyData, MyData[TradeDate] = [MyMaxDate]))
Third, I create a matrix to show the result.
Unfortunately, the result is 1313, which is different from my goal of 607.
So, how can I fix my DAX formulas to achieve the right result?
Many thanks
You can find the closest date by taking the minimum of the absolute difference in dates and then taking the earliest date that achieves that minimal difference.
MyDate =
VAR SlicerDate = MIN(DateList[Date])
VAR MinDiff =
MINX(
FILTER(ALL(MyData),
MyData[ID] IN VALUES(MyData[ID])
),
ABS(SlicerDate - MyData[TradeDate]))
RETURN
MINX(
FILTER(ALL(MyData),
MyData[ID] IN VALUES(MyData[ID])
&& ABS(SlicerDate - MyData[TradeDate]) = MinDiff
),
MyData[TradeDate])
From there you can create the summing measure fairly easily:
MySum = CALCULATE(SUM(MyData[Value]), FILTER(MyData, MyData[TradeDate] = [MyDate]))
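If the matrix total still does not come out to 607, it is usually because [MyDate] returns a single date at the total level; a sketch that forces per-ID evaluation (the measure name MyTotal is mine):

MyTotal = SUMX(VALUES(MyData[ID]), [MySum])

SUMX iterates over the IDs visible in the current filter context and adds up [MySum] for each one, so every ID uses its own nearest date.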