I have data structured as follows in Table 1:
ID Variable1 Variable2
1 2 5
2 10 2
3 14 3
4 4 3
I need to add the following data from Table 2 to every row of the above table:
Coef Value
Variable1C 4.2
Variable2C 5.6
The final result should be:
ID Variable1 Variable2 Variable1C Variable2C
1 2 5 4.2 5.6
2 10 2 4.2 5.6
3 14 3 4.2 5.6
4 4 3 4.2 5.6
How might I go about this? So far, I've only been able to get one of the values in by transposing Table 2 and then adding it, but this is not what I want.
A simple data step should do that.
data want ;
set have ;
Variable1C=4.2 ;
Variable2C=5.6;
run;
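If the values are stored in a table, transpose it and combine the two. Variables read by a SET statement are retained, so reading WIDE once on the first iteration carries its values onto every row.
proc transpose data=table2 out=wide(drop=_name_) ;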
id coef ;
var value ;
run;
data want ;
set table1;
if _n_=1 then set wide ;
run;
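The same combination can also be sketched in PROC SQL. This is only an alternative under the assumption that WIDE has exactly one row (as produced by the PROC TRANSPOSE step above); the output name want_sql is made up for illustration. Joining without a condition pairs that single row with every row of TABLE1, and SAS will write a note about the Cartesian product to the log.

```sas
proc sql;
/* want_sql is a hypothetical output name */
create table want_sql as
select a.*, b.Variable1C, b.Variable2C
from table1 as a, wide as b   /* no join condition: every row of table1 gets the one row of wide */
order by a.ID;
quit;
```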
I have data with three variables: an id, the observation number within that id, and the value of that observation. I want to transpose the data from long to wide. One issue is that I get an error saying my BY group is not sorted in ascending order (even though it is). Another is that not all ids have the same number of observations. Please see the example below and the data structure I am looking for.
data have;
input id observation value;
cards;
1 1 '4.8.9'
1 2 '4.5.7'
2 1 '5.0.5'
3 1 '4.2.0'
3 2 '4.1.0'
3 3 '5.1.9'
;run;
data want;
input id observation1 observation2 observation3;
cards;
1 '4.8.9' '4.5.7' NA
2 '5.0.5' NA NA
3 '4.2.0' '4.1.0' '5.1.9'
;run;
/* i have tried the following:
proc transpose data=b out=c ;
by value ;
id id;
var value;
run;
proc transpose data=b out=c ;
by value ;
id id;
var observation;
run;
*/
Your BY variable is called ID in your example dataset.
Your example data step is not defining VALUE as character. Also don't indent the in-line data lines.
You can use the prefix= option to help name the new variables. Also let's modify the value of OBSERVATION for ID=2 to demonstrate more clearly how the value of OBSERVATION is setting the variable name instead of just the order of the observations in the ID group. Now the value '5.0.5' will be stored in OBSERVATION2 even though it is the first observation for that value of ID.
data have;
input id observation value $;
cards;
1 1 '4.8.9'
1 2 '4.5.7'
2 2 '5.0.5'
3 1 '4.2.0'
3 2 '4.1.0'
3 3 '5.1.9'
;
proc transpose data=have out=want(drop=_name_) prefix=observation;
by id;
id observation;
var value;
run;
Results:
Obs  id  observation1  observation2  observation3
1    1   '4.8.9'       '4.5.7'
2    2                 '5.0.5'
3    3   '4.2.0'       '4.1.0'       '5.1.9'
I want to transpose a simple dataset as left, to become a dataset at the right. They are all numeric variables. Please also make the variable names as I put there (I have a lot of variables I want to follow this pattern), would prefer not to rename them by hand one by one if possible. Thank you!
Here is a simple approach. I added another id for demonstration. You can re-arrange the columns if you like.
data have;
input id Vistime v1 v2;
datalines;
1 1 2 5
1 2 3 6
1 3 4 7
2 1 2 5
2 2 3 6
2 3 4 7
;
proc transpose data=have out=temp;
by id Vistime;
var v1 v2;
run;
proc transpose data=temp delim=_ out=want(drop=_:);
by id;
var col1;
id _name_ Vistime;
run;
Result
id v1_1 v2_1 v1_2 v2_2 v1_3 v2_3
1 2 5 3 6 4 7
2 2 5 3 6 4 7
I have a dataset like this (but with several hundred vars):
id q1 g7 q3 b2 zz gl az tre
1 1 2 1 1 1 2 1 1
2 2 3 3 2 2 2 1 1
3 1 2 3 3 2 1 3 3
4 3 1 2 2 3 2 1 1
5 2 1 2 2 1 2 3 3
6 3 1 1 2 2 1 3 3
I'd like to keep id, b2, and tre, but set everything else to missing. In a dataset this small, I can easily use call missing (q1, g7, q3, zz, gl, az) - but in a set with many more variables, I would effectively like to say call missing (of _ALL_ *except ID, b2, tre*).
Obviously, SAS can't read my mind. I've considered workarounds that involve another data step or proc sql where I copy the original variables to a new ds and merge them back on post, but I'm trying to find a more elegant solution.
This technique uses an unexecuted SET statement (it has a compile-time effect only) to define all the variables of the original data set, which keeps their order and all variable attributes (type, label, format, etc.). Because that statement never executes, those variables are never populated and stay missing. The second SET statement, which does execute, brings in only the variables that are NOT to be set to missing. It doesn't explicitly set variables to missing, but it achieves the same result.
data nomiss;
input id q1 g7 q3 b2 zz gl az tre;
cards;
1 1 2 1 1 1 2 1 1
2 2 3 3 2 2 2 1 1
3 1 2 3 3 2 1 3 3
4 3 1 2 2 3 2 1 1
5 2 1 2 2 1 2 3 3
6 3 1 1 2 2 1 3 3
;;;;
run;
proc print;
run;
data manymiss;
if 0 then set nomiss;
set nomiss(keep=id b2 tre:);
run;
proc print;
run;
Another fairly simple option is to set them missing using a macro, and basic code writing techniques.
For example, let's say we have a macro:
%macro call_missing(var=);
call missing(&var.);
%mend call_missing;
Now we can write a query that uses dictionary.columns to identify the variables we want set to missing:
proc sql;
select name
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not (name in ('ID','B2','TRE')); *note UPCASE for all these;
quit;
Now, we can combine these two things to get a macro variable containing code we want, and use that:
proc sql;
select cats('%call_missing(var=',name ,')')
into :misslist separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not (name in ('ID','B2','TRE')); *note UPCASE for all these;
quit;
data want;
set have;
&misslist.;
run;
This has the advantage that it doesn't care about the variable types, nor the order. It has the disadvantage that it's somewhat more code, but it shouldn't be particularly long.
If the variables are all of the same type (numeric or character) then you could use an array.
data want ;
set have;
array _all_ _numeric_ ;
do over _all_;
if upcase(vname(_all_)) not in ('ID','B2','TRE') then _all_=.;
end;
run;
If you don't care about the order then just drop the variables and add them back on with 0 observations.
data want;
set have (keep=ID B2 TRE:) have (obs=0 drop=ID B2 TRE:);
run;
I want to count the number of times a certain value appears in a particular column in SAS. For example, in the following dataset the value 1 appears three times, value 2 appears twice, value 3 appears once, value 4 appears four times and value 5 appears four times.
Game_ball
1
1
1
2
2
3
4
4
4
4
5
5
5
5
I want the dataset to be represented like the following:
Game_ball Count
1 3
2 2
3 1
4 4
5 4
. .
. .
. .
Thanks in advance
As per @Dwal, proc freq is the easiest solution.
Using your sample data,
proc freq data=sample;
table game_ball/out=output;
run;
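Or do it in a one-pass data step (after sorting):
proc sort data = sample; by game_ball; run;
data output;
set sample;
by game_ball;
/* reset the counter at the start of each value */
if first.game_ball then count = 0;
/* sum statement: count is retained automatically */
count + 1;
/* write one row per value, holding its total */
if last.game_ball then output;
run;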
Or in SQL
proc sql;
create table output as
select game_ball, count(*) as count
from sample
group by game_ball;
quit;
I have data that looks like this:
id t x
1 1 3.7
1 3 1.2
1 4 2.4
2 2 6.0
2 4 6.1
2 5 6.2
For each id I want to add observations as necessary so there are values for all 1<=t<=5.
So my desired result is:
id t x
1 1 3.7
1 2 .
1 3 1.2
1 4 2.4
1 5 .
2 1 .
2 2 6.0
2 3 .
2 4 6.1
2 5 6.2
My real setting involves massive amounts of data, so I'm looking for the most efficient way to do this.
Here's probably the simplest way, using the COMPLETETYPES option in PROC SUMMARY. I'm making the assumption that the combinations of id and t are unique in the data.
The only thing I'm not sure of is whether you'll run into memory issues when running this against a very large dataset. I have had problems with PROC SUMMARY in this respect in the past.
data have;
input id t x;
cards;
1 1 3.7
1 3 1.2
1 4 2.4
2 2 6.0
2 4 6.1
2 5 6.2
;
run;
proc summary data=have nway completetypes;
class id t;
var x;
output out=want (drop=_:) max=;
run;
One option is to use PROC EXPAND, if you have SAS/ETS. I'm not sure if it'll do 100% of what you want, but it might be a good start. So far the main problem is that it won't add records at the start or the end of an id; I think that's surmountable, but I'm not sure how.
proc expand data=have out=want from=daily method=none extrapolate;
by id;
id t;
run;
That fills in 2 for id 1 and 3 for id 2, but does not fill in 5 for id 1 or 1 for id 2.
To do it in base SAS, you have a few options. PROC FREQ with the SPARSE option might be a good option.
proc freq data=have noprint;
tables id*t/sparse out=want2(keep=id t);
run;
data want_fin;
merge have want2;
by id t;
run;
You could also do this via PROC SQL, with a join to a table with the possible t values, but that seems slower to me (even though the FREQ method requires two passes, FREQ will be pretty fast and the merge is using already sorted data so that's also not too slow).
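For reference, the PROC SQL variant mentioned above might look roughly like this. It is a sketch only, assuming the t values run from 1 to 5 as in the question; the names tvals and want_sql are made up for illustration.

```sas
/* Hypothetical helper table holding every t value that should appear */
data tvals;
do t = 1 to 5;
output;
end;
run;

proc sql;
/* want_sql is a hypothetical output name */
create table want_sql as
select g.id, g.t, h.x
from (select i.id, v.t
      from (select distinct id from have) as i, tvals as v) as g  /* every id paired with every t */
left join have as h
on g.id = h.id and g.t = h.t   /* x stays missing where have has no matching row */
order by g.id, g.t;
quit;
```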
Here's another approach, provided that you already know the minimum/maximum values for T. It creates a template that contains all values of ID and T, then merges with the original data set so that you keep the values of X.
proc sort data=original_dataset out=template(keep=id) nodupkey;
by id;
run;
data template;
set template;
do t = 1 to 5; /* you could make these macro variables */
output;
end;
run;
proc sort data=original_dataset;
by id t;
run;
data complete_dataset;
merge template(in=in_template) original_dataset(in=in_original);
by id t;
if in_template then output;
run;
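The comment about macro variables in the template step could be fleshed out along these lines. This is a sketch, assuming the range of T should come from the data itself rather than being known in advance; the macro variable names tmin and tmax and the data set name ids are made up for illustration.

```sas
/* Store the smallest and largest t in (hypothetical) macro variables */
proc sql noprint;
select min(t), max(t)
into :tmin, :tmax
from original_dataset;
quit;

/* One row per id */
proc sort data=original_dataset out=ids(keep=id) nodupkey;
by id;
run;

/* Expand each id to the full range of t */
data template;
set ids;
do t = &tmin to &tmax;
output;
end;
run;
```

The resulting TEMPLATE can then be merged with the sorted original data exactly as in the last step above.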