How to compare rows within the same dataframe and create a new column for similar rows

obj_id  Column B  Column C
a1      cat       bat
a2      bat       man
r1      man       apple
r2      apple     cat
The original dataframe (above) is called df.
I am trying to make a new column called new_obj_id: wherever a row's value in Column C matches some row's value in Column B, new_obj_id should take the obj_id of that matching row.
obj_id  Column B  Column C  new_obj_id
a1      cat       bat       a2
a2      bat       man       r1
r1      man       apple     r2
r2      apple     cat       a1
This is the expected table
This is what I tried, but it didn't work (it only compares values within the same row):
dataframe1['new_obj_id'] = dataframe1.apply(
    lambda x: x['obj_id'] if x['Column_B'] in x['Column C'] else 'none',
    axis=1)

Try this:
df['new_obj_id'] = df['Column C'].map(dict(zip(df['Column B'], df['obj_id'])))
Output:
0 a2
1 r1
2 r2
3 a1
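The answer can be checked end to end with a small self-contained sketch; the dataframe below is rebuilt from the question's example:

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    'obj_id':   ['a1', 'a2', 'r1', 'r2'],
    'Column B': ['cat', 'bat', 'man', 'apple'],
    'Column C': ['bat', 'man', 'apple', 'cat'],
})

# Look up each Column C value in a Column B -> obj_id mapping.
df['new_obj_id'] = df['Column C'].map(dict(zip(df['Column B'], df['obj_id'])))
print(df['new_obj_id'].tolist())  # -> ['a2', 'r1', 'r2', 'a1']
```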


A formula that finds the value in range 1 in range 2 and returns the value in another column at that row position

data1   data2          result
a       a     apple    apple
c       b     banna    kiwi
b       c     kiwi     banna
c                      kiwi
a                      apple
a                      apple
b                      banna
c                      kiwi
Find the first value 'a' in the first column of data2.
Using the found row position as the index, take the value from the second column of data2.
Record it in result.
Repeat the process for each value in data1. I want to make this work as a formula!
Use:
=INDEX(IFNA(VLOOKUP(AF5:AF; AH:AI; 2; )))
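The formula is doing a key-to-value lookup. As a sanity check, here is the same logic as a plain Python sketch (data copied from the question's example, including its spellings):

```python
# data2 acts as the two-column lookup range: key -> value.
data2 = {'a': 'apple', 'b': 'banna', 'c': 'kiwi'}
data1 = ['a', 'c', 'b', 'c', 'a', 'a', 'b', 'c']

# Look up each data1 key in data2; None plays the role of IFNA's blank.
result = [data2.get(key) for key in data1]
print(result)
```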

Redshift generate rows as many as value in another column

df
customer_code contract_code product num_products
C0134 AB01245 toy_1 4
B8328 EF28421 doll_4 2
I would like to transform this table based on the integer value in column num_products and generate a unique id for each row:
Expected_df
unique_id customer_code contract_code product num_products
A1 C0134 AB01245 toy_1 1
A2 C0134 AB01245 toy_1 1
A3 C0134 AB01245 toy_1 1
A4 C0134 AB01245 toy_1 1
A5 B8328 EF28421 doll_4 1
A6 B8328 EF28421 doll_4 1
unique_id can be any random characters as long as I can use a count(distinct) on it later on.
I read that generate_series(1,10000) is available in later versions of Postgres but not in Redshift.
You need to use a recursive CTE to generate the series of numbers, then join it with your data to produce the extra rows. I used row_number() to get the unique_id in the example below.
This should meet your needs, or at least give you a start:
create table df (
  customer_code varchar(16),
  contract_code varchar(16),
  product varchar(16),
  num_products int
);

insert into df values
  ('C0134', 'AB01245', 'toy_1', 4),
  ('B8328', 'EF28421', 'doll_4', 2);

with recursive nums (n) as (
  select 1 as n
  union all
  select n + 1 as n
  from nums
  where n < (select max(num_products) from df)
)
select row_number() over () as unique_id,
       customer_code, contract_code, product, num_products
from df d
left join nums n
  on d.num_products >= n.n;
SQLfiddle at http://sqlfiddle.com/#!15/d829b/12
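Outside of SQL, the same row fan-out can be sketched in pandas (column values copied from the question; the `A1, A2, ...` ids follow the question's expected table):

```python
import pandas as pd

# Rebuild the question's input table.
df = pd.DataFrame({
    'customer_code': ['C0134', 'B8328'],
    'contract_code': ['AB01245', 'EF28421'],
    'product': ['toy_1', 'doll_4'],
    'num_products': [4, 2],
})

# Repeat each row num_products times, then number the rows as unique ids.
out = df.loc[df.index.repeat(df['num_products'])].reset_index(drop=True)
out.insert(0, 'unique_id', ['A' + str(i + 1) for i in range(len(out))])
print(out[['unique_id', 'product']].to_string(index=False))
```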

Count with three different column in-between two tables in Power BI

I have two tables: Data and Report.
Data
The Data table contains the following three columns: Check, Supplier Status and Condition.
Report
The Report table contains Supplier Status only.
Result
I am trying to get the count per supplier status, based on Check (excluding "NA") and Condition (= X), from the Data table into the Report table.
I am trying to count Ok and Not Ok per Supplier Status (excluding "NA") with Condition = X.
Desired Result:

SUPPLIER STATUS  NOT OK  OK
A1               5       5
A2               4       4
A3               3       3
A4               2       2
A5               1       1
MIXED            1       3
Data:

CHECK SUPPLIER STATUS CONDITION
OK A1 X
OK A1 X
OK A1 X
OK A1 X
OK A1 X
NOT OK A1 X
NOT OK A1 X
NOT OK A1 X
NOT OK A1 X
NOT OK A1 X
OK A2 X
OK A2 X
OK A2 X
OK A2 X
NOT OK A2 X
NOT OK A2 X
NOT OK A2 X
NOT OK A2 X
OK A3 X
OK A3 X
OK A3 X
NOT OK A3 X
NOT OK A3 X
NOT OK A3 X
OK A4 X
OK A4 X
NOT OK A4 X
NOT OK A4 X
OK A5 X
NOT OK A5 X
OK MIXED X
OK MIXED X
OK MIXED X
NOT OK MIXED X
OK NA NA
OK NA NA
OK NA NA
NOT OK NA NA
NOT OK NA NA
NOT OK NA NA
I would actually use a measure, not a calculated column. To filter the results in a measure the way you did in the visual, you need the combination of the CALCULATE/FILTER functions.
https://learn.microsoft.com/en-us/dax/calculate-function-dax
https://learn.microsoft.com/en-us/dax/filter-function-dax
Count = CALCULATE(COUNTROWS(DATA), FILTER(DATA, DATA[CONDITION] = "X"))
Drop this measure in the values container of the matrix visual.
You can also have separate measures for counting OK and NOT OK, like:
#Not OK = CALCULATE(COUNTROWS(DATA), FILTER(DATA, DATA[CONDITION] = "X" && DATA[SUPPLIER STATUS] = "NOT OK"))
#OK = CALCULATE(COUNTROWS(DATA), FILTER(DATA, DATA[CONDITION] = "X" && DATA[SUPPLIER STATUS] = "OK"))
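As a cross-check of the counting logic (not Power BI itself, just the same filter-then-count semantics in pandas; the rows are rebuilt from the question's Data table):

```python
import pandas as pd

# Rebuild the question's Data table via repetition counts per status.
rows = []
for status, ok, not_ok in [('A1', 5, 5), ('A2', 4, 4), ('A3', 3, 3),
                           ('A4', 2, 2), ('A5', 1, 1), ('MIXED', 3, 1)]:
    rows += [('OK', status, 'X')] * ok + [('NOT OK', status, 'X')] * not_ok
rows += [('OK', 'NA', 'NA')] * 3 + [('NOT OK', 'NA', 'NA')] * 3

data = pd.DataFrame(rows, columns=['CHECK', 'SUPPLIER STATUS', 'CONDITION'])

# Equivalent of CALCULATE(COUNTROWS(...), FILTER(..., CONDITION = "X")):
# filter first, then count per status and per CHECK value.
filtered = data[data['CONDITION'] == 'X']
counts = filtered.groupby(['SUPPLIER STATUS', 'CHECK']).size().unstack(fill_value=0)
print(counts)
```

Filtering on CONDITION = "X" automatically drops the "NA" rows, which matches the "except NA" requirement.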

Conditional calculation based on another column

I have a cross reference table and another table with a list of "Items".
I connect "PKG" to "Item", as "PKG" has distinct values.
Example:
**Cross table**      **Item table**
Bulk  PKG            Item  Value
A     D              A     2
A     E              B     1
B     F              C     4
C     G              D     5
                     E     8
                     F     3
                     G     1
After connecting the two tables above by PKG and Item, I get the following result:

Item  Value  Bulk  PKG
A     2
B     1
C     4
D     5      A     D
E     8      A     E
F     3      B     F
G     1      C     G
As you can see, nothing shows up in Bulk/PKG for the first three items, since the tables are connected by PKG and those items are "Bulk" values.
I am trying to create a new column that uses the cross reference table.
I want to create the following with a new column:
Item  Value  Bulk  PKG  NEW COLUMN
A     2                 5
B     1                 3
C     4                 1
D     5      A     D    5.75
E     8      A     E    9.2
F     3      B     F    3.45
G     1      C     G    1.15
The new column is what I am trying to create.
I want the original values to show up for the Bulk items as they appear for PKG. I then want the PKG items to be 15% higher than the original value.
How can I calculate this based on this setup?
Just write a conditional custom column in the query editor:
New Column = if [Bulk] = null then [Value] else 1.15 * [Value]
You can also do this as a DAX calculated column:
New Column = IF( ISBLANK( Table1[Bulk] ), Table1[Value], 1.15 * Table1[Value] )
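The conditional itself is simple enough to sanity-check in plain Python; `new_column` here is a hypothetical helper mirroring the custom-column expression, not part of either answer:

```python
# Hypothetical helper mirroring: if [Bulk] = null then [Value] else 1.15 * [Value].
# Rows with no Bulk value keep their original value; PKG rows get a 15% markup.
def new_column(bulk, value):
    return value if bulk is None else round(1.15 * value, 2)

print(new_column('A', 5))   # -> 5.75
print(new_column(None, 2))  # -> 2
```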

Splunk query to compare two fields and select value from 3rd field if the comparison match

I am very new to Splunk and need your help resolving the issue below.
I have two CSV files uploaded to a Splunk instance. Each file and its fields are listed below.
Apple.csv
a. A1  b. A2  c. A3
Orange.csv
a. O1 (may have values matching the values of A3)  b. O2
My requirement is as below:
Select the values of A1, A2, A3 and O2 from Apple.csv and Orange.csv
where A1 = "X" and A2 = "Y" and A3 = O1,
and display the values in a table:

Apple.csv:
A1  A2   A3
X   Y    123
LP  HJK  222
X   Y    999

Orange.csv:
O1     O2
999    open
123    closed
65432  open

Output:
A1  A2  A3   O2
X   Y   123  closed
X   Y   999  open
Very much appreciate your help.
You could do this:
source="apple.csv" OR source="orange.csv"
| eval grouping=coalesce(A3,O1)
| stats first(A1) as A1 first(A2) as A2 first(A3) as A3 first(O2) as O2 by grouping
| fields - grouping
Although I would think that considering the timestamp of the events might also be important...
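The coalesce-and-group idea in the SPL above can be sanity-checked outside Splunk; this Python sketch uses the question's sample rows and joins A3 against O1 directly:

```python
# Sample rows from the question: apple rows are (A1, A2, A3) tuples,
# orange is an O1 -> O2 mapping (the shared key the SPL coalesces on).
apple = [('X', 'Y', '123'), ('LP', 'HJK', '222'), ('X', 'Y', '999')]
orange = {'999': 'open', '123': 'closed', '65432': 'open'}

# Keep apple rows matching A1="X" and A2="Y", and look up O2 via A3 = O1.
output = [(a1, a2, a3, orange[a3])
          for a1, a2, a3 in apple
          if a1 == 'X' and a2 == 'Y' and a3 in orange]
print(output)  # -> [('X', 'Y', '123', 'closed'), ('X', 'Y', '999', 'open')]
```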