How to store possible values of a variable in local macro? - stata

I want to store the distinct values of a variable in my dataset in a local macro. I thought there might be a way to do this with a command such as table and a stored r() result, but I could not find any command with a useful r() result that returns what I want.
As an example, I would like to find an expression to substitute in the code below so that I get a local containing Domestic Foreign:
sysuse auto
table foreign
local foreign_unique_values = r(...)

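As suggested by William Lisowski in the comments, levelsof does this.
For my example, the code would be:
sysuse auto
levelsof foreign
* copy r(levels) as text rather than evaluating it as an expression
local foreign_distinct_values `r(levels)'
or with a string variable:
levelsof make
local make_distinct_values `r(levels)'
Note that for foreign this returns the numeric levels 0 1, not the value labels Domestic and Foreign.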
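Since the question asked for a local containing Domestic Foreign, here is a minimal sketch of one way to get from the numeric levels to their value labels. It uses the local() option of levelsof and the extended macro function `: label`. The macro names foreign_levels and foreign_labels are made up for illustration, and the simple space-separated list assumes that no value label contains a space.

```stata
sysuse auto, clear

* local() stores the list directly, so no separate -local- command is needed
levelsof foreign, local(foreign_levels)

* Translate each numeric level into its value label
local foreign_labels
foreach v of local foreign_levels {
    local lbl : label (foreign) `v'
    local foreign_labels `foreign_labels' `lbl'
}

display "`foreign_levels'"
display "`foreign_labels'"
```

If some labels contained spaces, each one would need to be wrapped in compound double quotes before being appended to the list.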


Labeling variables after recoding them

I would like to label the variables back to their original variable labels after I recode them in Stata. How can I accomplish this?
sysuse auto, clear
recode foreign (1=2 "Foreign") (0=1 "Domestic"), gen(foreign1)
drop foreign
rename foreign1 foreign
* label var foreign "Car type"
foreach var of varlist foreign {
local var_label: var label `var'
local var_label1: regexm("`var_label'", "\((.)+\)")
label var `var' "`var_label1'"
}
The solution with regexm() looks awkward to me, which is presumably part of the question.
In your example, there is a simple alternative that leaves the variable label intact:
sysuse auto, clear
replace foreign = 1 + foreign
label def origin 1 Domestic 2 Foreign, modify
. d foreign

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-----------------------------------------------------------------------------------------
foreign         byte    %8.0g      origin     Car origin
This works too:
sysuse auto, clear
recode foreign (1=2 "Foreign") (0=1 "Domestic"), gen(foreign1)
_crcslbl foreign1 foreign
drop foreign
rename foreign1 foreign
d foreign
You are presumably aware that you can also save the variable label in a local macro for safe-keeping and reapply it afterwards.
(In general, 0-1 indicator variables are immensely more useful and natural statistically than 1-2 indicators, but I presume that you are just making up a reproducible example. If in doubt see e.g. https://www.stata-journal.com/article.html?article=dm0099 )
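The safe-keeping approach mentioned above could look like the following sketch. It copies the variable label into a local macro before recoding and reapplies it afterwards. The macro name saved_label is hypothetical; the rest follows the question's own recode steps.

```stata
sysuse auto, clear

* Save the original variable label before it is lost
local saved_label : variable label foreign

recode foreign (1=2 "Foreign") (0=1 "Domestic"), gen(foreign1)
drop foreign
rename foreign1 foreign

* Reapply the saved label to the recoded variable
label variable foreign "`saved_label'"
describe foreign
```

This avoids parsing the "RECODE of ..." label that recode attaches to the generated variable.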

How to create a variable for when other variable takes up highest value, with conditions on countries?

I'm trying to create a new variable under conditions on other variables. I have countries in Africa with each country divided into constituencies; for each I have the number of votes for a candidate.
I am trying to work for one country at a time (country=ctr) and to create the value in each constituency (cst)
I would like to create a variable win1 = 2 when the votes take the highest value in a given constituency, and in a given country.
I have tried :
by cst : replace win1=2 if cv1=max(cv1) in (ctr==566)
by ctr cst (cv1) : replace win1=2 if cv1==cv1[_N]
Errors:
in is for observation numbers. It's not an alternative to if.
You need == to test equality, not =.
max() as a Stata function requires two or more arguments and works rowwise, not across groups of observations.
This code assumes no missing values.
It's also easier than you think in so far as you can work with several countries at once.
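As an illustration of both points (handling missing values and working on all countries at once), here is a small sketch on made-up data. The toy observations and the names cv1_max and win1_check are assumptions for the example, not part of the original dataset. It uses egen's max() function, which ignores missing values within each group.

```stata
clear
input ctr cst cv1
566 1 100
566 1 250
566 2 80
566 2 .
404 1 40
404 1 40
end

* Highest vote count within each country-constituency group
bysort ctr cst: egen double cv1_max = max(cv1)

* Flag the winner(s); observations with missing votes are never flagged
generate byte win1_check = 2 if cv1 == cv1_max & !missing(cv1)

list, sepby(ctr cst)
```

Note that tied candidates are all flagged, as in the last group of the toy data.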

How to reference column of DAX variable?

I have a DAX variable that contains a table. How can I reference a particular column in that variable?
For example, in the below command, the EVALUATE returns an error. But it works if I replace table1 with FactInternetSales (which is the name of the table which contains that column)
define var table1=FactResellerSales
EVALUATE ROW("a",COUNTBLANK(table1[SalesAmount]))
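You can reference its columns only inside functions that iterate over the table. Note that the column keeps the name of its source table, so it is written FactResellerSales[SalesAmount] rather than with the variable name as a prefix.
For example,
DEFINE
VAR _TABLE1 = FactResellerSales
EVALUATE
{ COUNTX ( _TABLE1, FactResellerSales[SalesAmount] ) }
Other examples of iterator functions are SUMX and FILTER.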

How to collapse data while retaining other variables?

I am trying to collapse my data using proc sql. However, I noticed that when I collapsed my data I lost a number of variables that I wanted to keep. I am collapsing the data by the variable MRN (which is numeric). The other variables I want to keep are CITY and SITE (these are character variables), and they are constant for each unique MRN, so collapsing them should be fine.
Here is the code I am using
proc sql;
create table collapsed_data as
select distinct mrn,
sum(msk_tx_yes) as msk_tx_yes,
sum(msk_cancel_tx_yes) as msk_cancel_tx_yes,
sum(msk_ca_yes) as msk_ca_yes,
sum(msk_cancel_ca_yes) as msk_cancel_ca_yes,
sum(msk_dc_yes) as msk_dc_yes,
sum(conc_psych_tx_yes) as conc_psych_tx_yes,
sum(conc_psych_ca_yes) as conc_psych_ca_yes,
sum (conc_psych_dc_yes) as conc_psych_dc_yes,
sum (conc_yes) as conc_yes,
sum (psych_yes) as psych_yes,
sum (foot_prog) as foot_prog,
sum (hand_prog) as hand_prog,
sum (surg_prog) as surg_prog,
sum (sx_yes) as sx_yes
from temp_collapsed_data
group by mrn;
quit;
I'm not sure how to use the SELECT and DISTINCT functions together.
I thought maybe I could add the variables CITY and STATE after SELECT while keeping DISTINCT, but it doesn't seem to work.
I want to be able to keep CITY and STATE in the new table along with the new summed variables I am making. How can I achieve this without turning CITY and STATE into dummy coded variables? I would like to keep them as character values if possible.
Anyone know how I can achieve this?
Your code is already correct. Just add the variables to the SELECT clause.
proc sql;
create table collapsed_data as
select distinct mrn, city, site,
sum(msk_tx_yes) as msk_tx_yes,
sum(msk_cancel_tx_yes) as msk_cancel_tx_yes,
sum(msk_ca_yes) as msk_ca_yes,
sum(msk_cancel_ca_yes) as msk_cancel_ca_yes,
sum(msk_dc_yes) as msk_dc_yes,
sum(conc_psych_tx_yes) as conc_psych_tx_yes,
sum(conc_psych_ca_yes) as conc_psych_ca_yes,
sum (conc_psych_dc_yes) as conc_psych_dc_yes,
sum (conc_yes) as conc_yes,
sum (psych_yes) as psych_yes,
sum (foot_prog) as foot_prog,
sum (hand_prog) as hand_prog,
sum (surg_prog) as surg_prog,
sum (sx_yes) as sx_yes
from temp_collapsed_data
group by mrn;
quit;
The DISTINCT keyword ensures that the result does not contain two rows with identical values. Because city and site are constant within each mrn, this leaves one row per mrn.

Pass date value from one session to another session in the same workflow

I have two sessions in a workflow, like below:
workflow1->session1->session2
I have a join_date column in a table in mapping1, in session1.
I want to pick up this join_date value and pass it to mapping2/session2.
If the join_date value changes in the table in session1, then the new value should be picked up and passed to session2.
I will use this date value in a query in session2 .
Please tell me how to achieve this?
Thank you
You can do this using mapping and workflow variables.
In mapping1 create a mapping variable say var1 and set its value to join_date.
Create a workflow variable in the workflow, say var_wkf
In session1, in Post-session on success variable assignment, assign var_wkf = var1
In mapping2, create a mapping variable, say var2
In session2, in Pre-session variable assignment, assign var2=var_wkf
You can then use the var2 variable in mapping2; it should hold the value set in mapping1.