How to populate a cell with information stored in a location relative to the value in another cell?

Let's say I have
Sheet 1
A B C
1 # 20 30
2 # 75 90
3 # 46 21
Sheet 2
A B C
1 X Y
Here "X" is a dropdown list of the values in Sheet 1's column B.
"Y" is where I want the column C value from the Sheet 1 row that matches the selection in Sheet 2's column A.
How might I do this?

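Try:
=VLOOKUP(A1; 'Sheet 1'!B:C; 2; 0)
To fill the whole column at once, use an array formula over the range (use commas instead of semicolons if your locale requires them):
=ARRAYFORMULA(IFNA(VLOOKUP(A1:A; 'Sheet 1'!B:C; 2; 0)))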

In B1 (the cell marked "Y") try
=IF(LEN(A1), VLOOKUP(A1, 'Sheet 1'!B:C, 2, 0),)
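To make the exact-match lookup concrete, here is a minimal Python sketch of what the formulas above do. Python is used only for illustration; the function name `vlookup_exact` and the list `sheet1_rows` are made up for this example, and the data are the Sheet 1 values from the question.

```python
# Sheet 1 rows as (column B, column C) pairs, copied from the question.
sheet1_rows = [(20, 30), (75, 90), (46, 21)]


def vlookup_exact(key, rows):
    """Return the C value of the first row whose B value equals key, else None."""
    for b_value, c_value in rows:
        if b_value == key:
            return c_value
    # No match: plays the role of IFNA(...) leaving the cell blank.
    return None


print(vlookup_exact(75, sheet1_rows))  # 90
```

The final argument 0 in VLOOKUP requests this exact-match behaviour; without it the sheet would assume the lookup column is sorted.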


Generate sum of all possible combinations of id

I have a dataset with the structure that looks something like this:
Group ID Value
1 A 10
1 B 15
1 C 20
2 D 10
2 E 25
Within each Group, I want to obtain the sum of all possible combinations of two or more IDs. For instance, within group 1, I can have the following combinations: AB, AC, BC, ABC. So, in total I have four possible combinations for group 1, of which I'd like to get the sum of the variable value.
I am using the formula for combinations of N elements in groups of size R to identify how many observations I need to add to the dataset to have enough observations.
For Group 1, the number of observations I need are:
3!/((3-2)!*2!)*2 = 6 for the two-ID combinations
3!/((3-3)!*3!)*3 = 3 for the three-ID combination.
So a total of 9 observations. Since I already have three, I need six more, which I can get with the command: expand 3 if Group==1 (expand 3 replaces each observation with three copies of itself). For Group 1 I would get something like
Group ID Value
1 A 10
1 B 15
1 C 20
1 A 10
1 B 15
1 C 20
1 A 10
1 B 15
1 C 20
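As a quick check of the arithmetic above, here is a minimal Python sketch (Python is used only for illustration; the names `n_ids` and `rows_needed` are made up). It counts, for each tuple size r, the number of combinations times the r rows each combination occupies.

```python
from math import comb

n_ids = 3  # Group 1 has the IDs A, B and C

# For each tuple size r >= 2: number of combinations times rows per combination.
rows_needed = {r: comb(n_ids, r) * r for r in range(2, n_ids + 1)}
total_rows = sum(rows_needed.values())

print(rows_needed)  # {2: 6, 3: 3}
print(total_rows)   # 9
```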
Now, I am stuck here on how to proceed to tell Stata to identify the combinations and create the summation. Ideally, I want to create two new variables, to identify the tuples and get the summation, so something that looks like:
Group ID Value Tuple Sum
1 A 10 AB 25
1 B 15 AB 25
1 A 10 AC 30
1 C 20 AC 30
1 B 15 BC 35
1 C 20 BC 35
1 A 10 ABC 45
1 B 15 ABC 45
1 C 20 ABC 45
In this way, I could then just drop the duplicates in terms of Group and Tuples. Once I have the Tuples variable, getting the sum is straightforward, but getting the Tuples, I can't get my head around it.
Any advice on how to do this?
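I tried doing this with nested loops and the tuples command. Note that tuples is community-contributed (ssc install tuples), and the code below also uses gegen from the gtools package (ssc install gtools); the built-in egen with total() could be used instead.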
First I create and save a tempfile to store results:
clear
tempfile group_results
save `group_results', replace emptyok
Then I input and save data, along with a local for the number of groups:
clear
input Group str1 ID Value
1 A 10
1 B 15
1 C 20
2 D 10
2 E 25
2 F 13 // added to test
2 G 2 // added to test
end
sum Group
local num_groups = r(max)
tempfile base
save `base', replace
Here's the core of the code. The outer loop here iterates over Groups. Then it makes a list of the IDs in that group, and uses the tuples command to make a list of the unique combinations of those IDs, with a minimum size of 2. The k loop iterates through the number of tuples and the m loop makes an indicator for tuple membership.
forvalues i = 1/`num_groups' {
    display "Starting Group `i'"
    use `base' if Group==`i', clear
    * Make list of IDs to get unique combos of
    forvalues j = 1/`=_N' {
        local tuple_list`i' = "`tuple_list`i'' " + ID[`j']
    }
    * Get all unique combos in list using tuples command
    tuples `tuple_list`i'', display min(2)
    forvalues k = 1/`ntuples' {
        display "Tuple `k': `tuple`k''"
        local length = wordcount("`tuple`k''")
        gen intuple=0
        gen tuple`k'="`tuple`k''"
        forvalues m = 1/`length' {
            replace intuple=1 if ID==word("`tuple`k''",`m')
        }
        * Calculate sum of values in that tuple
        gegen group_sum`k' = sum(Value) if intuple==1
        drop intuple
        list
    }
    * Reshape into desired format
    reshape long tuple group_sum, i(Group ID Value) j(tuple_num)
    drop if missing(group_sum)
    sort tuple_num
    list
    append using `group_results'
    save `group_results', replace
}
* Full results
use `group_results', clear
sort Group tuple_num
list
I hope this helps. The list commands will give you a busy results window, but they show everything that is happening. Here's the output at the end of the i loop for Group 1:
+--------------------------------------------------+
| Group ID Value tuple_~m tuple group_~m |
|--------------------------------------------------|
1. | 1 C 20 1 B C 35 |
2. | 1 B 15 1 B C 35 |
3. | 1 A 10 2 A C 30 |
4. | 1 C 20 2 A C 30 |
5. | 1 A 10 3 A B 25 |
|--------------------------------------------------|
6. | 1 B 15 3 A B 25 |
7. | 1 C 20 4 A B C 45 |
8. | 1 A 10 4 A B C 45 |
9. | 1 B 15 4 A B C 45 |
+--------------------------------------------------+
This could be inefficient if your data is actually much larger!
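For comparison, the same enumeration can be sketched outside Stata. The following is a minimal Python illustration, not a translation of the Stata code above: the function name `tuple_sums` is made up, and the data are the five rows from the question (without the two extra test rows). It relies on `itertools.combinations` to list every subset of two or more IDs within each group.

```python
from itertools import combinations

# (Group, ID, Value) rows from the question.
data = [(1, "A", 10), (1, "B", 15), (1, "C", 20), (2, "D", 10), (2, "E", 25)]


def tuple_sums(rows, min_size=2):
    """Return (group, tuple label, sum of values) for every combination
    of at least min_size IDs within each group."""
    groups = {}
    for group, id_, value in rows:
        groups.setdefault(group, []).append((id_, value))

    result = []
    for group, members in groups.items():
        for size in range(min_size, len(members) + 1):
            for combo in combinations(members, size):
                label = "".join(id_ for id_, _ in combo)
                total = sum(value for _, value in combo)
                result.append((group, label, total))
    return result


for row in tuple_sums(data):
    print(row)
```

The number of combinations grows exponentially with the number of IDs in a group, so the efficiency caveat above applies to any approach that lists them all.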

How to write a foreach loop statement in SAS?

I'm working in SAS as a novice. I have two datasets:
Dataset1
Unique ID  ColumnA
1          15
1          39
2          20
3          10
Dataset2
Unique ID  ColumnB
1          40
2          55
2          10
For each UniqueID, I want to subtract each value of ColumnA from all values of ColumnB, and I would like to create a NewColumn that is 1 any time 1 < ColumnB - ColumnA < 30. For the first row of Dataset1, where UniqueID = 1, I want SAS to go through all the rows in Dataset2 that also have UniqueID = 1 and determine whether any of them gives a difference between ColumnB and ColumnA that is greater than 1 and less than 30. For the first row of Dataset1, NewColumn should be 1 because 40 - 15 = 25. For the second row of Dataset1, NewColumn should be 0 because 40 - 39 = 1 (which is not greater than 1). For the third row of Dataset1, I again want SAS to go through every row of Dataset2 that has the same UniqueID as in Dataset1: 55 - 20 = 35 (which is greater than 30), but NewColumn would still be 1 because (moving to row 3 of Dataset2, which also has UniqueID = 2) 20 - 10 = 10, which satisfies the condition.
So I want my output to be:
Unique ID  ColumnA  NewColumn
1          15       1
1          39       0
2          20       1
I have tried concatenating Dataset1 and Dataset2 into a FullDataset. Then I tried using a do loop statement but I can't figure out how to do the loop for each value of UniqueID. I tried using BY but that of course produces an error because that is only used for increments.
DATA FullDataset;
set Dataset1 Dataset2; /*Concatenate datasets*/
do i=ColumnB-ColumnA by UniqueID;
if 1<ColumnB-ColumnA<30 then NewColumn=1;
output;
end;
RUN;
I know I'm probably way off but any help would be appreciated. Thank you!
So, the way that answers your question most directly is the keyed set. This isn't necessarily how I'd do this, but it is fairly simple to understand (as opposed to a hash table, which is what I'd use, or a SQL join, probably what most people would use). This does exactly what you say: grabs a row of A, says for each matching row of B check a condition. It requires having an index on the datasets (well, at least on the B dataset).
data colA(index=(id));
input ID ColumnA;
datalines;
1 15
1 39
2 20
3 10
;;;;
data colB(index=(id));
input ID ColumnB;
datalines;
1 40
2 55
2 30
;;;;
run;
data want;
  *base: the colA dataset - you want to iterate through that once per row;
  set colA;
  *now, loop while the check variable shows 0 (match found);
  do while (_iorc_ = 0);
    *bring in other dataset using ID as key;
    set colB key=ID ;
    * check to see if it matches your requirement, and also only check when _IORC_ is 0;
    if _IORC_ eq 0 and 1 lt ColumnB-ColumnA lt 30 then result=1;
    * This is just to show you what is going on, can remove;
    put _all_;
  end;
  *reset things for next pass;
  _ERROR_=0;
  _IORC_=0;
run;
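The logic of the loop above (for each row of A, check every row of B with the same ID) can also be sketched with a dictionary lookup, which is closer in spirit to the hash-table approach mentioned earlier than to the keyed set. This is a minimal Python illustration under stated assumptions: the function name `flag_rows` is made up, the data are the colA and colB values from the answer's datalines, and unlike the SAS step it writes 0 (not a missing value) when no row of B qualifies.

```python
# (ID, ColumnA) and (ID, ColumnB) rows from the datalines above.
col_a = [(1, 15), (1, 39), (2, 20), (3, 10)]
col_b = [(1, 40), (2, 55), (2, 30)]


def flag_rows(a_rows, b_rows, low=1, high=30):
    """For each row of A, return (ID, ColumnA, flag), where flag is 1 if any
    row of B with the same ID satisfies low < ColumnB - ColumnA < high."""
    b_by_id = {}
    for id_, b_value in b_rows:
        b_by_id.setdefault(id_, []).append(b_value)

    flagged = []
    for id_, a_value in a_rows:
        hit = any(low < b_value - a_value < high for b_value in b_by_id.get(id_, []))
        flagged.append((id_, a_value, 1 if hit else 0))
    return flagged


for row in flag_rows(col_a, col_b):
    print(row)
```

IDs that appear only in A (ID 3 here) get a flag of 0 because there is nothing to compare against.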

How can I get the last value of a column in my dataset in power bi?

In the table below, how do I get the last value of a particular column?
Roll No Name Index Score
1 ab1 1 23
2 ab2 2 43
3 ab3 3 42
Here I want to pick the Score value from the last row.
1) First, create a custom index column in the table.
2) Then use the formula below to get the Score value from the last row:
CALCULATE(LASTNONBLANK(Table[Score], 1), FILTER(ALL(Table), Table[Index] = MAX(Table[Index])))
The result would be 42.
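The idea behind the formula (find the row with the largest Index, then read its Score) can be sketched in a few lines of Python. This is only an illustration of the logic, not DAX; the variable names are made up and the rows are the ones from the table above.

```python
rows = [
    {"Roll No": 1, "Name": "ab1", "Index": 1, "Score": 23},
    {"Roll No": 2, "Name": "ab2", "Index": 2, "Score": 43},
    {"Roll No": 3, "Name": "ab3", "Index": 3, "Score": 42},
]

# Pick the row with the highest Index, then read its Score.
last_row = max(rows, key=lambda row: row["Index"])
last_score = last_row["Score"]

print(last_score)  # 42
```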

Replace a row value with previous by group in SAS

Is there a way I could replace a row value to its previous row by each group?
Below are the before and after data sets. For each customer, the Product on a Type C row should be replaced by the Product of the Type L row with the same LINK_ID that has the highest Amount.
Before
Obs Cust LINK_ID Type Product Amount
1 1 12432 L A 23
2 1 12432 C B 0
3 2 23213 L C 234
4 2 23145 L D 25
5 2 23145 C E 0
6 3 21311 L F 34
7 3 21324 L G 45
8 3 21324 L H 35
9 3 21324 C I 0
After
Cust LINK_ID Type Product Amount
1 12432 L A 234
1 12432 C A -
2 23213 L C 23,212
2 23145 L D 335
2 23145 C D -
3 21311 L F 323
3 21324 L G 2,344
3 21324 L H 34
3 21324 C G -
Thank you!
If I understand correctly, you want the Product value on each Type C row to be the Product associated with the highest Amount among the Type L rows. If so, one possible way is the following. First, find the Product with the highest Amount among the Type L rows within each combination of customer and LINK_ID.
Note that the original dataset is assumed to be named "example".
proc sql;
create table L_Type as
select cust, LINK_ID, product, amount
from example
where type = 'L'
group by cust, LINK_ID
having amount = max(amount)
;
quit;
Then the Product found above is assigned to the Type C rows of the original dataset:
proc sql;
select
e.cust
, e.LINK_ID
, e.type
, case when e.type = 'C' then b.product else e.product end as product
, e.amount
from example e left join L_Type b
on e.cust = b.cust and e.LINK_ID = b.LINK_ID
;
quit;
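The two steps above (find the best Type L row per customer and LINK_ID, then copy its Product onto the Type C rows) can be sketched in Python to check the expected result. This is a minimal illustration, not SAS: the function name `relabel_c_rows` is made up, the rows are the "Before" data from the question, and ties on Amount are resolved by keeping the first row encountered, which is an assumption the question does not settle.

```python
# (Cust, LINK_ID, Type, Product, Amount) rows from the "Before" table.
rows = [
    (1, 12432, "L", "A", 23),
    (1, 12432, "C", "B", 0),
    (2, 23213, "L", "C", 234),
    (2, 23145, "L", "D", 25),
    (2, 23145, "C", "E", 0),
    (3, 21311, "L", "F", 34),
    (3, 21324, "L", "G", 45),
    (3, 21324, "L", "H", 35),
    (3, 21324, "C", "I", 0),
]


def relabel_c_rows(rows):
    """Give each Type C row the Product of the highest-Amount Type L row
    that shares its (Cust, LINK_ID); other rows are left unchanged."""
    best = {}  # (cust, link_id) -> (amount, product) of the best L row so far
    for cust, link_id, type_, product, amount in rows:
        if type_ == "L":
            key = (cust, link_id)
            # Strict ">" keeps the first row when amounts tie (an assumption).
            if key not in best or amount > best[key][0]:
                best[key] = (amount, product)

    result = []
    for cust, link_id, type_, product, amount in rows:
        if type_ == "C" and (cust, link_id) in best:
            product = best[(cust, link_id)][1]
        result.append((cust, link_id, type_, product, amount))
    return result


print([row[3] for row in relabel_c_rows(rows)])
```

A Type C row with no Type L row under the same customer and LINK_ID keeps its own Product in this sketch.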
So you have a couple of processing tasks to do. Have you considered all the edge cases?
For a customer, find the row(s) with the maximum amount.
Is one of them type L?
  No: do nothing.
  Yes: track the Product and LinkId as follows.
    Is there more than one 'maximal' row?
      No: track the Product and LinkId from the one row.
      Yes: is there more than one Product in the rows?
        No: track the Product value.
          Is there more than one LinkId?
            No: track the LinkId.
            Yes: which LinkIds?
              Track all the different LinkIds, or
              track one of these: first, lowest, highest, last LinkId.
        Yes: now what?
          Log an error?
          Track one of the Product values because only one can be used; which one?
            first occurring?
            lowest value?
            highest value?
            last occurring?
For the tracked LinkIds (there might not be any), apply the tracked Product to the rows that are type C (or perhaps type not L).

Stata programming and putexcel-loop

I'm trying to make an automated Excel file that documents the number of observations dropped during my sample construction, using putexcel and a simple program.
I'm pretty new to programming, but the program below does the job. It stores 4 global macros for each time I drop some observations: 1) Number of observations dropped, 2) Share of observations dropped, 3) Number of observations left in the data set and 4) a string that describes why I drop the observations.
To export the results to excel I use the putexcel-command -- which is working fine. The problem is that I need to drop observations a lot of times in the dofile and I wondered if I could somehow incorporate the putexcel part in the program to make it loop over cells.
In other words, what I want is the program to automatically save the description ($why) in A1 the first time, in A8 the second time and so on.
I have provided an example of my code below:
** Generate some data:
clear
input id year wage
1 1 200
1 2 250
1 3 300
2 1 152
2 2 150
2 3 140
3 1 300
3 2 320
3 3 360
end
** Define program
cap program drop dropdata
program define dropdata
count
global N = r(N)
count if `1'
global drop = r(N)
global share = ($drop/$N)
drop if `1'
count
global left = r(N)
global why = "`2'"
end
** Drop if first year
dropdata year==1 "Drop if first year"
** Export to excel
putexcel set "documentation.xlsx", modify
putexcel A1 = ("$why")
putexcel A3 = ("Obs. dropped") A4 = ("Share dropped") A5 = ("Observations left")
putexcel B3 = ($drop) B4 = ($share) B5=($left)
** Now drop if wage is < 300
dropdata wage<300 "Drop if wage<300"
putexcel A8 = ("$why")
putexcel A10 = ("Obs. dropped") A11 = ("Share dropped") A12 = ("Observations left")
putexcel B10 = ($drop) B11 = ($share) B12 = ($left)
The issue with this is that Stata does not know what cells are filled and which are not, so I think it would probably be easiest to include another argument in your program define that says the number of times you have run the program.
Here is an example:
** Generate some data:
clear
input id year wage
1 1 200
1 2 250
1 3 300
2 1 152
2 2 150
2 3 140
3 1 300
3 2 320
3 3 360
end
** Define program
cap program drop dropdata
program define dropdata
    count
    local N = r(N)
    count if `1'
    local drop = r(N)
    local share = (`drop'/`N')
    drop if `1'
    count
    local left = r(N)
    local why = "`2'"
    * third argument = number of earlier calls; each block starts 7 rows lower
    local row1 = `3'*7 + 1
    local row3 = `row1' + 2
    local row4 = `row1' + 3
    local row5 = `row1' + 4
    putexcel set "documentation.xlsx", modify
    putexcel A`row1' = ("`why'")
    putexcel A`row3' = ("Obs. dropped") A`row4' = ("Share dropped") A`row5' = ("Observations left")
    putexcel B`row3' = (`drop') B`row4' = (`share') B`row5' = (`left')
end
** Drop if first year
dropdata year==1 "Drop if first year" 0
** Now drop if wage is < 300
dropdata wage<300 "Drop if wage<300" 1
Note that the change is to include the number of calls already done as the third argument in dropdata, then we add the putexcel commands to rows based on that number.
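The row arithmetic is easy to check by hand. Here is a minimal Python sketch of it (Python is used only for illustration; the function name `block_rows` and the dictionary keys are made up). It reproduces the cells used in the question: rows 1, 3, 4, 5 for the first call and 8, 10, 11, 12 for the second.

```python
def block_rows(calls_so_far, block_height=7):
    """Return the Excel rows used by one call, given how many calls came before."""
    first = calls_so_far * block_height + 1
    return {
        "why": first,          # description, e.g. A1 or A8
        "dropped": first + 2,  # "Obs. dropped"
        "share": first + 3,    # "Share dropped"
        "left": first + 4,     # "Observations left"
    }


print(block_rows(0))  # {'why': 1, 'dropped': 3, 'share': 4, 'left': 5}
print(block_rows(1))  # {'why': 8, 'dropped': 10, 'share': 11, 'left': 12}
```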
As an aside:
I changed all of your globals to locals because they're safer. Also, in general, if you want to return macros from a program that you write, you declare the program as, for example, rclass and then use return statements like the ones below:
program define return_2, rclass
return local asdf 2
end
You can then access the returned macro asdf (which is equal to 2) as r(asdf), and you can check the values of everything the program returned with the command return list.