Stata: Using putexcel with bysort

I have a large set of categorical data that I need to move to Excel. I also need to break the data down by a specific criterion, whether a state has adopted a certain policy, which I'll call var1. For each of var2, var3, ..., varn, I want to write something similar to
bysort var1: tab var2 [aw = weight]
Because the data are categorical, I'm not interested in the mean, sd, etc., only in the number and proportion of responses in each category. But when I write
putexcel A1 = bysort var1: tab var2 [aw = weight]
the console tells me "weights not allowed." If I add parentheses and write
putexcel A1 = (bysort var1: tab var2 [aw = weight])
the console says "bysort not found."
Any idea what's going on here?
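To pin down the numbers being asked for, here is a minimal Python sketch (not Stata code) of what `bysort var1: tab var2 [aw = weight]` reports: for each level of var1, the weighted count and percentage of each category of var2. The function name and the sample rows are hypothetical, and the rescaling of frequencies so they sum to the group's number of observations is an assumption about how Stata treats analytic weights.

```python
from collections import defaultdict


def weighted_tab_by_group(rows, group_key, cat_key, weight_key):
    """Weighted one-way tabulation of cat_key within each level of group_key.

    Returns {group: {category: (freq, percent)}}. Frequencies are rescaled so
    they sum to the number of observations in the group (assumed here to
    mirror how Stata rescales analytic weights).
    """
    weight_sums = defaultdict(lambda: defaultdict(float))  # group -> category -> sum of weights
    n_obs = defaultdict(int)                               # group -> number of observations
    for row in rows:
        group, category = row[group_key], row[cat_key]
        weight_sums[group][category] += row[weight_key]
        n_obs[group] += 1

    table = {}
    for group, by_cat in sorted(weight_sums.items()):
        total = sum(by_cat.values())
        table[group] = {
            category: (n_obs[group] * w / total, 100 * w / total)
            for category, w in sorted(by_cat.items())
        }
    return table


# Hypothetical example: var1 = policy adopted (0/1), var2 = a categorical answer.
rows = [
    {"var1": 0, "var2": "yes", "weight": 1.0},
    {"var1": 0, "var2": "no", "weight": 3.0},
    {"var1": 1, "var2": "yes", "weight": 2.0},
    {"var1": 1, "var2": "yes", "weight": 2.0},
]
result = weighted_tab_by_group(rows, "var1", "var2", "weight")
print(result)
```

Computing the table first and writing the resulting numbers afterwards keeps the calculation separate from the export step.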


SAS: How to vertically join multiple datasets where a variable is numeric in one dataset and character in the other

It's a little complicated because we're using a pipeline with a filelist to compile the data, so there are 50+ datasets coming in. I need to combine many, many datasets vertically, but var2 is numeric in some and character in others. Var2 is not important, so we can drop it, but when I try to drop it in the data step, it throws an error because of the differing data types. More details below.
Here's what I want to do at its most basic:
data in1;
input var1 $ var2;
datalines;
a 1
b 2
;
run;
data in2;
input var1 $ var2 $;
datalines;
a 1a
b 2b
;
run;
data newdd;
set in1 in2;
run;
Is it possible to combine these datasets in the "data newdd" step without changing the inputs? Is there a way to drop var2 in this data step that still stacks var1 without throwing an error? Or, better yet, can I make var2 read in as character in all cases?
To drop var2 use a data set option.
data newdd;
set in1(drop=var2) in2(drop=var2);
run;
To combine them and make var2 character in both:
data newdd;
set in1(in=t1 rename=(var2=_var2)) in2;
if t1 then var2 = put(_var2, 8. -l);
drop _var2;
run;
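The idea behind the second data step (convert the numeric var2 to text before stacking so every input contributes a character column) can be sketched in Python as follows. This is an illustration only; the helper name is hypothetical, and `str()` stands in for `put(_var2, 8. -l)`, which unlike the 8. format does not round decimals.

```python
def stack_as_character(*datasets, column="var2"):
    """Stack lists of row dicts vertically, forcing `column` to be text.

    Numeric values are converted with str(); this stands in for the SAS
    put(_var2, 8. -l) call but, unlike the 8. format, does not round decimals.
    """
    stacked = []
    for rows in datasets:
        for row in rows:
            new_row = dict(row)  # copy so the inputs are left unchanged
            value = new_row.get(column)
            if value is None:
                new_row[column] = ""  # blank, like a SAS character missing value
            elif not isinstance(value, str):
                new_row[column] = str(value)
            stacked.append(new_row)
    return stacked


in1 = [{"var1": "a", "var2": 1}, {"var1": "b", "var2": 2}]
in2 = [{"var1": "a", "var2": "1a"}, {"var1": "b", "var2": "2b"}]
newdd = stack_as_character(in1, in2)
print([row["var2"] for row in newdd])  # ['1', '2', '1a', '2b']
```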
The long answer: fix how you read in all 50 files at the source so that they're consistent. If you have a master list of the variables with their type, length, format, and informat (type at a minimum), you can have SAS generate the correct INPUT statement.
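As a rough sketch of generating the statements from a master list, the Python below turns hypothetical (name, type, length) triples into a SAS LENGTH and INPUT statement. The master-list layout and the function name are assumptions; a real pipeline would more likely do this inside SAS itself, and formats and informats are left out here.

```python
def build_sas_statements(master_list):
    """Build SAS LENGTH and INPUT statements from (name, type, length) triples.

    `type` is assumed to be "char" or "num"; character variables get a `$`.
    """
    length_parts, input_parts = [], []
    for name, vtype, length in master_list:
        if vtype == "char":
            length_parts.append(f"{name} $ {length}")
            input_parts.append(f"{name} $")
        elif vtype == "num":
            length_parts.append(f"{name} {length}")
            input_parts.append(name)
        else:
            raise ValueError(f"unknown type {vtype!r} for variable {name}")
    return "length " + " ".join(length_parts) + ";\ninput " + " ".join(input_parts) + ";"


# Hypothetical master list: var2 is declared character for every file.
master = [("var1", "char", 8), ("var2", "char", 8)]
print(build_sas_statements(master))
# length var1 $ 8 var2 $ 8;
# input var1 $ var2 $;
```

Because every file is read with the same generated statements, var2 gets the same type everywhere and the SET step no longer conflicts.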

Stata: Replace values of one row based on another if data are missing

In Stata, I am trying to change the values (both string and numeric) of one row based on the row just above or just below it, but only where those values are missing. Here are some sample data:
clear
input str40 id var1 var2 var3 var4 str40 var5_string str40 var6_string
"correctly-spelled" 10 20 . . "random text 1" ""
"misspelled" . . 30 40 "" "random text 2"
end
Essentially, I want my final dataset to look as follows:
input str40 id var1 var2 var3 var4 str40 var5_string str40 var6_string
"correctly-spelled" 10 20 30 40 "random text 1" "random text 2"
end
I need a row-specific solution (i.e. avoiding collapse), because my (wide) dataset has thousands of labeled variables, and I don't want to lose the labels due to collapse. Also, not all of the variables are numeric, and the naming conventions of the variables are not consistent. Accordingly, fixing the spelling of id with a simple replace, executing a collapse (firstnm) id var5_string var6_string (mean) var1 var2 var3 var4, by(id), or using var* for anything won't help. Basically, what happened was one person merged using the "correctly-spelled" id, the other person merged using the "misspelled" id, and I don't have any of the source files. Thanks!
If you can assume that the misspelled ID comes right after (or right before) the correctly spelled one, you can use _n-1 and _n+1 to get the previous or following value. For more information on system variables, see help _variables.
If you can assume the correct one always comes first, then the second replace (the one using _n+1) is sufficient.
mi() is an abbreviation of the missing() function.
The second condition in each replace, & !mi(`var'[_n-1]) or & !mi(`var'[_n+1]), only makes sure that a missing value is replaced by a non-missing one and never by another missing value. Depending on your data, this extra condition might not be necessary.
local list_of_vars var1 var2 var3 var4 var5_string var6_string
foreach var of local list_of_vars {
replace `var' = `var'[_n-1] if mi(`var') & !mi(`var'[_n-1])
replace `var' = `var'[_n+1] if mi(`var') & !mi(`var'[_n+1])
}
. list
+-------------------------------------------------------------------------------+
| id var1 var2 var3 var4 var5_string var6_string |
|-------------------------------------------------------------------------------|
1. | correctly-spelled 10 20 30 40 random text 1 random text 2 |
2. | misspelled 10 20 30 40 random text 1 random text 2 |
+-------------------------------------------------------------------------------+
Then just keep the correct rows. Hopefully you can identify them somehow.
// The following only flags the correct IDs in this example; adapt it so that it matches only the correctly spelled IDs, or use another way of identifying them.
gen _ck_correct_id = (id == "correctly-spelled")
keep if _ck_correct_id == 1
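The fill-then-keep steps above can be sketched in Python to show why both rows end up complete: each pass runs top to bottom and updates rows in place, which is how Stata's replace works through observations in order. This is an illustration, not Stata code; None and "" stand in for numeric and string missing values, and the function names are hypothetical.

```python
def is_missing(value):
    """Treat None (numeric missing) and "" (empty string) as missing."""
    return value is None or value == ""


def fill_from_neighbours(rows, columns):
    """Fill missing cells from the row above, then from the row below.

    Each pass runs top to bottom and updates rows in place, mimicking how
    Stata's replace works through observations in order.
    """
    for col in columns:
        # Like: replace `var' = `var'[_n-1] if mi(`var') & !mi(`var'[_n-1])
        for i in range(1, len(rows)):
            if is_missing(rows[i][col]) and not is_missing(rows[i - 1][col]):
                rows[i][col] = rows[i - 1][col]
        # Like: replace `var' = `var'[_n+1] if mi(`var') & !mi(`var'[_n+1])
        for i in range(len(rows) - 1):
            if is_missing(rows[i][col]) and not is_missing(rows[i + 1][col]):
                rows[i][col] = rows[i + 1][col]
    return rows


rows = [
    {"id": "correctly-spelled", "var1": 10, "var2": 20, "var3": None, "var4": None,
     "var5_string": "random text 1", "var6_string": ""},
    {"id": "misspelled", "var1": None, "var2": None, "var3": 30, "var4": 40,
     "var5_string": "", "var6_string": "random text 2"},
]
columns = ["var1", "var2", "var3", "var4", "var5_string", "var6_string"]
fill_from_neighbours(rows, columns)
# Then keep only the correctly spelled ID, like keep if _ck_correct_id == 1.
kept = [row for row in rows if row["id"] == "correctly-spelled"]
print(kept)
```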

Summary table of many variables when each needs to be restricted using if

I have three different variables in Stata: var1, var2, and var3.
I need to make a summary table of these three variables with the number of observations, mean, sd, min, and max as the fields.
I am using the following code:
su var1 if restriction == 2
su var2 if restriction == 3
su var3 if restriction == 4
Because each variable has to be summarized under a different restriction, I am unable to use:
su var1 var2 var3
I would be very grateful for any ideas on how to modify my code so that, instead of three lines of code, I can use one line to get a single table with all the stats I require, which I can then copy as a table into my Word document.
Nothing is reproducible here without example data. Please study https://stackoverflow.com/help/mcve
But I would go with:
gen var1_2 = var1 if restriction == 2
gen var2_3 = var2 if restriction == 3
gen var3_4 = var3 if restriction == 4
summarize var1_2 var2_3 var3_4
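The trick in this answer is that each gen leaves the value missing wherever the restriction fails, and summarize ignores missing values, so one summarize call gives one restricted row per variable. A minimal Python sketch of the same idea follows; the data are made up, the function name is hypothetical, and the use of the sample SD (n - 1 denominator) is assumed to match summarize. statistics.stdev needs at least two values per group.

```python
import statistics


def summarize_restricted(rows, specs, restriction_key="restriction"):
    """One summary row per (variable, restriction value) pair.

    Like gen var1_2 = var1 if restriction == 2: rows that fail the
    restriction are simply left out before computing the statistics.
    """
    table = []
    for var, value in specs:
        xs = [r[var] for r in rows if r[restriction_key] == value and r[var] is not None]
        table.append({
            "variable": f"{var}_{value}",
            "obs": len(xs),
            "mean": statistics.mean(xs),
            "sd": statistics.stdev(xs),  # sample SD (n - 1 denominator)
            "min": min(xs),
            "max": max(xs),
        })
    return table


# Hypothetical data with two observations per restriction value.
rows = [
    {"restriction": 2, "var1": 1, "var2": 5, "var3": 0},
    {"restriction": 2, "var1": 3, "var2": 6, "var3": 0},
    {"restriction": 3, "var1": 9, "var2": 4, "var3": 0},
    {"restriction": 3, "var1": 9, "var2": 8, "var3": 0},
    {"restriction": 4, "var1": 9, "var2": 9, "var3": 10},
    {"restriction": 4, "var1": 9, "var2": 9, "var3": 20},
]
table = summarize_restricted(rows, [("var1", 2), ("var2", 3), ("var3", 4)])
for line in table:
    print(line)
```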

Local macro on subsample data using if statement in Stata

I want to use the local command in Stata to store several variables that I then want to export as two subsamples. I separate the dataset by the grouping variable grouping_var, which is either 0 or 1. I tried:
if grouping_var==0 local vars_0 var1 var2 var3 var4
preserve
keep `vars_0'
saveold "data1", replace
restore
if grouping_var==1 local vars_1 var1 var2 var3 var4
preserve
keep `vars_1'
saveold "data2", replace
restore
However, the output is not what I expected: the data are not divided into two subsamples, and the first file contains the whole dataset. Is there anything wrong with how I use the if statement here?
There is some confusion here between the "if qualifier" and the "if command". The syntax if (condition) (command) is the "if command", and it generally does not behave as intended when the condition refers to observation-level variables.
In short, Stata evaluates if (condition) for the first observation, which is why your entire data set is being kept/saved in the first block (i.e., in your current sort order, grouping_var[1] == 0). See http://www.stata.com/support/faqs/programming/if-command-versus-if-qualifier/ for more information.
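The difference can be shown with a simplified Python model (not Stata code): the if command checks the condition once, using the first observation, while the if qualifier checks it for every observation. The model treats the conditioned command as something that acts on observations, which is a simplification of the asker's code, and all names in it are hypothetical.

```python
def if_command(rows, condition):
    """Simplified model of Stata's if command: the condition is checked
    once, using the first observation, and the command then applies to
    every observation or to none."""
    return list(rows) if rows and condition(rows[0]) else []


def if_qualifier(rows, condition):
    """Model of the if qualifier: the condition is checked observation by observation."""
    return [row for row in rows if condition(row)]


def in_group_0(row):
    return row["grouping_var"] == 0


rows = [
    {"grouping_var": 0, "x": "a"},
    {"grouping_var": 1, "x": "b"},
    {"grouping_var": 0, "x": "c"},
]
print(len(if_command(rows, in_group_0)))    # 3: the first obs has grouping_var == 0
print(len(if_qualifier(rows, in_group_0)))  # 2
```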
Assuming you want to keep different variables in each case, something like the code below should work:
local vars_0 var1 var2 var3 var4
local vars_1 var5 var6 var7 var8
forvalues g = 0/1 {
preserve
keep if grouping_var == `g'
keep `vars_`g''
save data`g' , replace
restore
}
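What the forvalues loop does for each group (keep the observations in that group, then keep only that group's variables) can be sketched in Python as below. It builds the subsets in memory instead of saving files, and the function name and sample rows are hypothetical.

```python
def split_by_group(rows, group_key, columns_by_group):
    """Return {group: rows}, where each subset keeps only that group's columns."""
    subsets = {}
    for group, columns in columns_by_group.items():
        # Like: keep if grouping_var == `g'
        in_group = [row for row in rows if row[group_key] == group]
        # Like: keep `vars_`g'' (grouping_var itself is dropped unless listed)
        subsets[group] = [{c: row[c] for c in columns} for row in in_group]
    return subsets


rows = [
    {"grouping_var": 0, "var1": 1, "var2": 2, "var5": 5, "var6": 6},
    {"grouping_var": 1, "var1": 10, "var2": 20, "var5": 50, "var6": 60},
]
result = split_by_group(rows, "grouping_var", {0: ["var1", "var2"], 1: ["var5", "var6"]})
print(result)
```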

How to match data in SAS

I have a dataset that contains three variables: var1, var2, and Price. Price is the price of the product in var2, and var1 is a subsample of var2. Now I want to find the price of each product in var1 by matching the names in var1 with those in var2.
The data look like this. Can anyone help me solve this, please? Many thanks.
Var1     Var2     Price
apple             ?
         apple    2
banana            ?
         banana   2.1
apple             ?
orange            ?
         orange   4
banana            ?
         yoghurt  2
You could do this through SQL by merging your prices onto your dataset by var1/var2:
proc sql ;
create table output as
select a.var1, a.var2, b.price
from input a
left join (select distinct var2, price
from input
where not missing(var2)) as b
on (a.var1=b.var2
or a.var2=b.var2)
;quit ;
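The same left join can be tried with Python's built-in sqlite3 module. This is a sketch, not PROC SQL: the table name input_data is hypothetical, "?" in the question is assumed to mean a missing price, missing values are stored as NULL (so `IS NOT NULL` replaces SAS's `not missing()`), and `ORDER BY a.rowid` is added only to keep the output in input order.

```python
import sqlite3

# Rows from the question; None marks a missing var1/var2 or an unknown price ("?").
rows = [
    ("apple", None, None),
    (None, "apple", 2.0),
    ("banana", None, None),
    (None, "banana", 2.1),
    ("apple", None, None),
    ("orange", None, None),
    (None, "orange", 4.0),
    ("banana", None, None),
    (None, "yoghurt", 2.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE input_data (var1 TEXT, var2 TEXT, price REAL)")
con.executemany("INSERT INTO input_data VALUES (?, ?, ?)", rows)

# One price per product, then match on either var1 or var2.
query = """
SELECT a.var1, a.var2, b.price
FROM input_data AS a
LEFT JOIN (SELECT DISTINCT var2, price
           FROM input_data
           WHERE var2 IS NOT NULL) AS b
  ON a.var1 = b.var2 OR a.var2 = b.var2
ORDER BY a.rowid
"""
output = con.execute(query).fetchall()
con.close()

for row in output:
    print(row)
```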
Try using a hash table:
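data want;
/* Define var2 and price in the PDV so the hash can use their attributes */
if 0 then set have(keep=var2 price where=(not missing(var2)));
if _n_=1 then do;
/* Load each product name (var2) and its price into the hash */
declare hash h (dataset:'have(keep=var2 price where=(not missing(var2)))');
h.definekey('var2');
h.definedata('price');
h.definedone();
call missing(var2,price);
end;
set have;
/* Look up var1; on a match, price is overwritten with the var2 price */
rc=h.find(key:var1);
drop rc;
run;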
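The hash approach is essentially a dictionary lookup: build a product-to-price map once from the rows where var2 is present, then look up var1 on every row. The Python below is a sketch of that idea, not SAS; `setdefault` keeps the first price seen for each product, which is assumed to match the hash object's default handling of duplicate keys, and the function name is hypothetical.

```python
def fill_prices(rows):
    """Return copies of rows with price filled in by looking up var1 among var2."""
    # Build the lookup once, like declaring the hash object when _n_ = 1.
    price_of = {}
    for row in rows:
        if row["var2"] is not None:
            price_of.setdefault(row["var2"], row["price"])
    # Read every row and look up var1, like rc = h.find(key: var1).
    want = []
    for row in rows:
        new_row = dict(row)
        if row["var1"] in price_of:
            new_row["price"] = price_of[row["var1"]]  # found: overwrite price
        want.append(new_row)
    return want


data = [
    ("apple", None, None), (None, "apple", 2.0), ("banana", None, None),
    (None, "banana", 2.1), ("apple", None, None), ("orange", None, None),
    (None, "orange", 4.0), ("banana", None, None), (None, "yoghurt", 2.0),
]
have = [{"var1": v1, "var2": v2, "price": p} for v1, v2, p in data]
want = fill_prices(have)
for row in want:
    print(row)
```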