Keeping duplicates and deleting rest from pandas dataframe - python-2.7

I have three different pandas DataFrames, which I have concatenated. Now I would like to keep only those rows whose value appears in more than one column and delete the rest. For instance:
Column1 Column2 Column3
0 John a Sam
1 Sam b Rob
2 Daniel c John
3 Varys d Ella
I want to keep only those rows whose Column1 value appears in both Column1 and Column3. In the above example, that is rows 0 and 1.
Desired output
Column1 Column2
0 John a
1 Sam b

Filter the df by passing the Series 'Column3' as an argument to isin to test for membership:
In [42]:
df[df['Column1'].isin(df['Column3'])]
Out[42]:
Column1 Column2 Column3
0 John a Sam
1 Sam b Rob
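To get exactly the two-column output shown in the question, the same mask can be combined with a column selection. This is a minimal sketch, assuming the frame is rebuilt from the sample data above; the names mask and result are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Column1": ["John", "Sam", "Daniel", "Varys"],
    "Column2": ["a", "b", "c", "d"],
    "Column3": ["Sam", "Rob", "John", "Ella"],
})

# Boolean mask: True where the Column1 value also occurs anywhere in Column3.
mask = df["Column1"].isin(df["Column3"])

# Keep the matching rows and only the two columns from the desired output.
result = df.loc[mask, ["Column1", "Column2"]]
print(result)
```

Selecting with df.loc keeps the original index labels (0 and 1), which matches the desired output.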

Related

How to filter out a list of names in a column based on a condition in Amazon Quicksight?

Suppose I have a table with two columns:
Col1 Col2
A France
A USA
B Germany
B Spain
C Netherland
C USA
D Japan
E USA
F Canada
How can I remove (in QuickSight) every Col1 value that has USA at least once in Col2?
The final table should look like this:
Col1 Col2
B Germany
B Spain
D Japan
F Canada
Create a calculated field. Using countIf and sumOver, an example would be like this:
sumOver
(
countIf({Col2}, {Col2} = 'USA'),
[{Col1}]
)
Then use a filter in your analysis, such as <your_calculated_field_name> Equals 0
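The logic behind that calculated field can be checked outside QuickSight. The following is only a plain-Python sketch of the same idea (count the 'USA' rows per Col1 group, then keep groups whose count is 0); the names rows, usa_per_group and kept are illustrative, and this is not QuickSight syntax.

```python
from collections import Counter

# Sample rows (Col1, Col2) copied from the question.
rows = [
    ("A", "France"), ("A", "USA"),
    ("B", "Germany"), ("B", "Spain"),
    ("C", "Netherland"), ("C", "USA"),
    ("D", "Japan"), ("E", "USA"), ("F", "Canada"),
]

# Analogue of sumOver(countIf(...), [{Col1}]): 'USA' rows counted per Col1 group.
usa_per_group = Counter(col1 for col1, col2 in rows if col2 == "USA")

# Analogue of the filter "<your_calculated_field_name> Equals 0".
# A Counter returns 0 for groups that never had a 'USA' row.
kept = [(col1, col2) for col1, col2 in rows if usa_per_group[col1] == 0]
print(kept)
```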

Data step issue in sas enterprise guide

I need to write a data step in SAS that assigns sequence numbers to a new column, starting from a particular number.
For example, right now my table looks like this:
Column 1 Column 2
abc book1
xyz book2
zex book3
I want my table to look like this:
Column 1 Column 2 Column3
abc book1 151
xyz book2 152
zex book3 153
How do I add Column 3 with a sequence number starting from a particular number?
How about this:
data have;
input Column1 $ Column2 $;
datalines;
abc book1
xyz book2
zex book3
;
data want;
do Column3 = 151 by 1 until (lr);
set have end=lr;
output;
end;
run;
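For comparison, the same numbering can be sketched in Python, where the starting value is a single parameter. This is an illustrative analogue and not SAS code; the names have, want and START are assumptions chosen to mirror the data step above.

```python
# Sample rows (Column1, Column2) copied from the question.
have = [("abc", "book1"), ("xyz", "book2"), ("zex", "book3")]

START = 151  # first sequence number, taken from the desired output

# enumerate(..., start=START) yields 151, 152, 153, ... alongside each row.
want = [(col1, col2, seq) for seq, (col1, col2) in enumerate(have, start=START)]
print(want)
```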

How to create new column in power bi using given string match condition in first column and get value from another column, make new column?

My table is as follows:
Col1 Col2
11_A 9
12_B 8
13_C 7
14_A 6
15_A 4
The table we need after the query
Col1 Col2 Col3
11_A 0 9
12_B 8 0
13_C 7 0
14_A 0 6
15_A 0 4
My query is
Col3 =
LEFT( 'Table'[Col2],
SEARCH("A", 'Table'[Col1], 0,
LEN('Table'[Col1])
)
)
Go to the query designer Add Column > Custom Column and use the following expression:
Update
You need two expressions (two new columns) for this:
One is:
'Your Column3
=if Text.Contains([Col1], "A") = true then [Col2] else 0
And the second:
'Your Column2
=if Text.Contains([Col1], "A") = false then [Col2] else 0
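The two expressions are mirror images of each other: one keeps the value when Col1 contains "A", the other keeps it when it does not. A small Python sketch of the same rule, run on the sample table from the question, is given below; it is not Power Query code, and the names rows and result are illustrative.

```python
# Sample rows (Col1, Col2) copied from the question.
rows = [("11_A", 9), ("12_B", 8), ("13_C", 7), ("14_A", 6), ("15_A", 4)]

result = []
for col1, col2 in rows:
    has_a = "A" in col1               # case-sensitive substring test
    new_col2 = 0 if has_a else col2   # second expression: keep value when "A" is absent
    new_col3 = col2 if has_a else 0   # first expression: keep value when "A" is present
    result.append((col1, new_col2, new_col3))

print(result)
```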
There are many ways to solve this.
Another easy, no-code way is to use Conditional Columns:
In Power BI, open the Power Query Editor
Select your table in the queries pane at the edge of the screen
Select the Add Column tab
Select Conditional Column...
Name your column
Enter your condition (for example: if Col1 contains A, then output Col2, else 0)
You can add several conditions if you like
Don't forget to set the column's data type to numeric if needed.

Row with duplicate values in one column and different values in a second column

In Power BI, I need to identify all distinct duplicate values in a Column A that have distinct values in Column B.
Example input:
Name Index
-------------
john 1
mary 1
john 1
jim 1
john 2
mary 1
jim 2
jim 1
john 2
mary 2
Desired result:
Name Index
-------------
john 1
mary 1
jim 1
john 2
jim 2
mary 2
The Name column in my Power BI model is a concatenated column.
Is this possible?
You should be able to do this pretty easily in the Power Query Editor GUI.
Select the combination of columns that you want to remove duplicates on (name and index in your case) and then under the Home tab you can select Remove Rows > Remove Duplicates.
This will automatically generate the Table.Distinct M code that chillin suggests.
Provided your previous step is a table, you should just be able to use:
Table.Distinct(nameOfPreviousStep, {"Name", "Index"})
Below is an example of what I mean:
let
someTable = Table.FromRows({{"john",1},{"mary",1},{"john",1},{"jim",1},{"john",2},{"mary",1},{"jim",2},{"jim",1},{"john",2},{"mary",2}}, type table[Name=text, Index=Int64.Type]),
removeDuplicates = Table.Distinct(someTable, {"Name", "Index"})
in
removeDuplicates
Try it out and see if it gives you your expected output (I think it does based on what I've seen).
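As a quick cross-check of the expected result, the same "distinct on (Name, Index)" operation can be sketched in Python. This is only an analogue of Table.Distinct, not M code; it keeps the first occurrence of each pair, and the names rows and distinct are illustrative.

```python
# Sample rows (Name, Index) copied from the question.
rows = [
    ("john", 1), ("mary", 1), ("john", 1), ("jim", 1), ("john", 2),
    ("mary", 1), ("jim", 2), ("jim", 1), ("john", 2), ("mary", 2),
]

# dict.fromkeys keeps one key per distinct (Name, Index) pair and
# preserves the order in which each pair first appears.
distinct = list(dict.fromkeys(rows))
print(distinct)
```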

Combining SAS Data Sets with different no. of columns

I am having a problem combining two tables with different numbers of columns.
Say my first table is table1:
table1
t1_col_1 t1_col_2 t1_col_3 ... t1_col_13
and my second table is table2:
table2
t2_col_1 t2_col2 t2_col3 t2_col4
Now if I run this step:
data table3;
set table1 table2;
run;
What will the output table3 be?
The SAS documentation says this statement does a concatenation:
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001107839.htm
Since the numbers of columns are different, I expect the concatenation to cause a problem.
So how exactly does this statement work? And what will its output be in this case?
Appending (concatenating) two or more data sets is basically just stacking the data sets together, with values in variables of the same name being stacked together. Variables that exist in only one data set become their own variables in the new combined data set. In this case the data sets have different numbers of variables. This article explains how concatenation works between data sets with different variables: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001312944.htm
For example, suppose we have:
data work.table1;
input col1 $ col2 col3 col4;
datalines;
George 10 10 10
Lucy 10 10 10
;
run;
data work.table2;
input col1 $ col2;
datalines;
Shane 3
Peter 3
;
run;
data work.table3;
set table1 table2;
run;
OUTPUT:
col1 col2 col3 col4
George 10 10 10
Lucy 10 10 10
Shane 3 . . <== These entries are
Peter 3 . . missing.
col1 and col2 are present in both sets, so the values inside them will be stacked. col3 and col4 are only present in table1, so some of the values under them in the new combined set will be empty.
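The stacking rule can also be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not SAS code; None stands in for the SAS missing value (.), and the names columns and table3 are illustrative.

```python
# Sample data copied from the example above, one dict per observation.
table1 = [
    {"col1": "George", "col2": 10, "col3": 10, "col4": 10},
    {"col1": "Lucy", "col2": 10, "col3": 10, "col4": 10},
]
table2 = [
    {"col1": "Shane", "col2": 3},
    {"col1": "Peter", "col2": 3},
]

# Union of variable names, in order of first appearance across both tables.
columns = list(dict.fromkeys(k for t in (table1, table2) for row in t for k in row))

# Stack table1 on top of table2; variables a row lacks become None (missing).
table3 = [{c: row.get(c) for c in columns} for row in table1 + table2]

for row in table3:
    print(row)
```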