I have a SAS column as below
-10
20
-30
40
I want to make the column like
10
20
30
40
I need to remove the sign and keep the same number. I don't know how to do this.
You can use the ABS function.
A small sample code:
data begin;
input var @@;
cards;
1 1 -1 -1 2 -2 -3 3
; run;
data wanted;
set begin;
var2= abs(var);
run;
For more on abs see documentation
EDIT: In case you are dealing with strings, you can just remove the minus sign:
data begin;
input var $ @@;
cards;
1 1 -1 -1 2 -2 -3 3
; run;
data wanted;
set begin;
var2= strip(tranwrd(var, '-', '')); /* TRANWRD replaces with a single blank when given '', so strip it */
run;
Also documentation on TRANWRD
Two ways without creating additional variables:
data begin;
input var @@;
cards;
1 1 -1 -1 2 -2 -3 3
; run;
data wanted;
set begin;
var= abs(var);
run;
proc sql noprint;
create table wanted2 as
select abs(var) as var from begin;
quit;
Another way would be to create a new variable where var2=sqrt(var**2).
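A minimal sketch comparing the two approaches on the sample data from the answers above. The dataset name compare and the variables var_abs, var_sqrt and same are made up for illustration. For integer input both columns should agree; for non-integer values a tiny floating-point difference is conceivable, so ABS is probably the safer choice.

```sas
/* Same sample data as in the answers above */
data begin;
  input var @@;
  cards;
1 1 -1 -1 2 -2 -3 3
;
run;

/* compare, var_abs, var_sqrt and same are illustrative names */
data compare;
  set begin;
  var_abs  = abs(var);              /* built-in absolute value */
  var_sqrt = sqrt(var**2);          /* square, then take the non-negative root */
  same     = (var_abs = var_sqrt);  /* 1 when both results agree, else 0 */
run;

proc print data=compare;
run;
```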
I want to recode each variable as 1 where it equals its max value and 0 where it does not. For each variable, there may be multiple observations with the max value. The max value for each variable is not fixed, i.e. from cycle to cycle the max value for each variable may change. And there are hundreds of variables, so I cannot "hard-code" anything.
The final product would have the same dimensions as the original table, i.e. equal number of rows and columns as a matrix of 0s and 1s.
This is within SAS. I attempted to calculate the max of each variable and then append these max as a new observation into the data. Then comparing down the column of each variable against the "max" observation... looking into examples of the following did not help:
SQL
Array in datastep
proc transpose
formatting
Any insight would be much appreciated.
Here is a version done with SQL:
The idea is to first calculate the maximum per group (the second subquery, aliased b). Then we join it back to the original data, and the CASE expression in the outer select sets the flag to 1 when the value equals the maximum and to 0 otherwise.
data begin;
input var value;
cards;
1 1
1 2
1 3
1 2.5
1 1.7
1 3
2 34
2 33
2 33
2 33.7
2 34
2 34
; run;
proc sql;
create table result as
select a.var, a.value, case when a.value = b.maximum then 1 else 0 end as is_max from
(select * from begin) a
left join
(select max(value) as maximum, var from begin group by var) b
on a.var = b.var
;
quit;
To avoid hard-coding you need to use some code generation.
First let's figure out what code you could use to solve the problem. Later we can look into ways to generate that code.
It is probably easiest to do this with PROC SQL code. SAS will allow you to reference the MAX() value of a variable. Also note that SAS evaluates boolean expressions to 1 (TRUE) or 0 (FALSE). So you just want to generate code like:
proc sql;
create table want as
select var1=max(var1) as var1
, var2=max(var2) as var2
from have
;
quit;
To generate the code you need a list of the variables in your source dataset. You can get those with PROC CONTENTS but also with the metadata table (view) DICTIONARY.COLUMNS (also accessible as SASHELP.VCOLUMN from outside PROC SQL).
If the list of variables is small then you could generate the code into a single macro variable.
proc sql noprint;
select catx(' ',cats(name,'=max(',name,')'),'as',name)
into :varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE'
order by varnum
;
create table want as
select &varlist
from have
;
quit;
The maximum number of characters that will fit into a macro variable is 64K. That is long enough for about 2,000 variables with names of 8 characters each.
Here is a slightly more complex way that uses PROC SUMMARY and a data step with a temporary array. It does not really need any code generation.
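If the generated text might not fit into one macro variable, one option is to write the code to a temporary file and %INCLUDE it. The following is only a sketch under assumptions: the small table have and its variables var1-var3 are invented for illustration, the fileref name code is arbitrary, and the rows of SASHELP.VCOLUMN are assumed to come back in variable order.

```sas
/* Hypothetical input table, only for illustration */
data have;
  input var1 var2 var3;
  datalines;
1 5 2
3 5 1
3 4 2
;
run;

filename code temp;   /* temporary file that will hold the generated step */

data _null_;
  set sashelp.vcolumn end=eof;
  where libname='WORK' and memname='HAVE' and type='num';
  file code;
  /* first variable: open the query; later variables: start with a comma */
  if _n_=1 then put 'proc sql; create table want as select';
  else put ',' @;
  /* writes e.g.  var1 =max(var1 ) as var1  (extra blanks are harmless) */
  put name '=max(' name ') as ' name;
  if eof then put 'from have; quit;';
run;

%include code / source2;   /* run the generated code and echo it to the log */
```

As with the macro-variable version, PROC SQL remerges the MAX() values with the detail rows, so the log should show a note about remerging summary statistics.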
%let dsin=sashelp.class(obs=10);
%let dsout=want;
%let varlist=_numeric_;
proc summary data=&dsin nway ;
var &varlist;
output out=summary(drop=_type_ _freq_) max= ;
run;
data &dsout;
if 0 then set &dsin;
array vars &varlist;
array max [10000] _temporary_;
if _n_=1 then do;
set summary ;
do _n_=1 to dim(vars);
max[_n_]=vars[_n_];
end;
end;
set &dsin;
do _n_=1 to dim(vars);
vars[_n_]=vars[_n_]=max[_n_];
end;
run;
Results:
Obs    Name       Sex    Age    Height    Weight
  1    Alfred      M      0       1         1
  2    Alice       F      0       0         0
  3    Barbara     F      0       0         0
  4    Carol       F      0       0         0
  5    Henry       M      0       0         0
  6    James       M      0       0         0
  7    Jane        F      0       0         0
  8    Janet       F      1       0         1
  9    Jeffrey     M      0       0         0
 10    John        M      0       0         0
I have a dataset in SAS and I want to convert one column into a string by product. I have attached an image of the input and the required output.
I need the column STRING in the output. Can anyone please help me?
I have coded a data step to create the input data:
data have;
input products $
dates
value
;
datalines;
a 1 0
a 2 0
a 3 1
a 4 0
a 5 1
a 6 1
b 1 0
b 2 1
b 3 1
b 4 1
b 5 0
b 6 0
c 1 1
c 2 0
c 3 1
c 4 1
c 5 0
c 6 1
;
Does the following suggested solution give you what you want?:
data want;
length string $ 20;
do until(last.products);
set have;
by products;
string = catx(',',string,value);
end;
do until(last.products);
set have;
by products;
output;
end;
run;
Here's my quick solution.
data temp;
length cat $20.;
do until (last.products);
set have;
by products notsorted;
cat=catx(',',cat,value);
end;
drop value dates;
run;
proc sql;
create table want as
select have.*, cat as string
from have inner join temp
on have.products=temp.products;
quit;
I don't know how to describe this question but here is an example. I have an initial dataset looks like this:
input first second $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
...
;
I want an output dataset like this:
input first second $;
cards;
1 "A,B,C,D"
2 "E,F"
3 "S,A"
4 "C"
5 "Y"
6 "II,UU,OO,N"
7 "G,H"
...
;
Both tables will have two columns. The values in the column "first" can range from 1 to any number.
Can someone help me?
Something like the below:
proc sort data=have;
by first second;
run;
data want(rename=(b=second));
length new_second $50.;
do until(last.first);
set have;
by first second ;
new_second =catx(',', new_second, second);
b=quote(strip(new_second));
end;
drop second new_second;
run;
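The output is (note that the values within each group come out sorted, because the data was sorted by first and second):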
first second
1 "A,B,C,D"
2 "E,F"
3 "A,S"
4 "C"
5 "Y"
6 "II,N,OO,UU"
7 "G,H"
You can use by-group processing and the retain statement to achieve this.
Create a sample dataset:
data have;
input id value $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
;
run;
First ensure that your dataset is sorted by your id variable:
proc sort data=have;
by id;
run;
Then use the first. and last. notation to identify when the id variable is changing or about to change. The retain statement tells the data step to keep the value of concatenated_value across observations rather than resetting it to a blank value. Use the catx() function to perform the actual concatenation and separate the values with a comma. Use the quote() function to put the " characters around the result before outputting the record; the cats() call inside it strips the trailing blanks first.
data want;
length concatenated_value $500.;
set have;
by id;
retain concatenated_value ;
if first.id then do;
concatenated_value = '';
end;
concatenated_value = catx(',', concatenated_value, value);
if last.id then do;
concatenated_value = quote(cats(concatenated_value));
output;
end;
drop value;
run;
Output:
concatenated_value    id
"A,B,C,D"              1
"E,F"                  2
"S,A"                  3
"C"                    4
"Y"                    5
"II,UU,OO,N"           6
"G,H"                  7
In my SAS data set there are groups, identified by id, and I want to delete every group that has missing values in a certain variable.
For example I have this sas data set:
data have;
input v1 v2 v3 id;
datalines;
9 7 210 1
0 6 . 1
9 3 320 2
6 1 . 1
9 4 432 2
;
run;
I tried this:
/*Order by id*/
proc sort data=have;
by id;
run;
/*Select no missing observations by id*/
data want;
set have;
if cmiss(of _all_) then delete;
run;
However, this code does not exclude the ids that have missing values. It only deletes the observations with missing values.
Hmmm. You can use proc sql for this:
proc sql;
delete from have
where exists (select 1 from have have2 where have.id = have2.id and (have2.v1 is null or have2.v2 is null or have2.v3 is null));
quit;
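A variation on the same idea that builds a new table instead of deleting in place, as a sketch only. It uses the sample data from the question; the output name want is an assumption. Writing to a separate table sidesteps any concern about a DELETE whose subquery reads the table being modified, which SAS may warn about.

```sas
/* Sample data from the question */
data have;
  input v1 v2 v3 id;
  datalines;
9 7 210 1
0 6 . 1
9 3 320 2
6 1 . 1
9 4 432 2
;
run;

proc sql;
  /* keep only ids that never show a missing value in v1, v2 or v3 */
  create table want as
  select *
  from have
  where id not in (select id
                   from have
                   where cmiss(v1, v2, v3) > 0);
quit;
```

With this sample only the two observations of id 2 should remain, since id 1 has missing values in v3.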
One idea might be to use a double DOW loop. First to check for any missing values and then a second one to output the records for the ids with no missing values.
data have;
input v1 v2 v3 id;
datalines;
9 7 210 1
0 6 . 1
9 3 320 2
6 1 . 1
9 4 432 2
1 2 333 3
;
You will need to sort as in your example.
data want ;
do until (last.id);
set have;
by id;
anymissing=max(anymissing,cmiss(of v1-v3));
end;
do until (last.id);
set have;
by id;
if not anymissing then output;
end;
run;
If you just don't want observations with missing values in your result dataset, there is no need to delete anything. Simply exclude them when writing the result dataset, or overwrite the source dataset:
data have;/*overwriting my have dataset instead of deleting lines*/
set have;
if not cmiss(of _ALL_);
run;
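If you want to remove all observations of a group as soon as one of them has a missing value, you can store the id when a missing value is found and then skip every observation with that id. This only works if the observation with the missing value comes first within its id group. Sorting by id alone does not guarantee that; sorting by id and then by the variable that can be missing (v3 in the example) does, because missing values sort before non-missing ones. Test only that variable rather than cmiss(of _ALL_), which would also count the retained helper x, itself missing on the first observation:
proc sort data=have;
by id v3;
run;
data want;
retain x;
set have;
if missing(v3) then
x= id;
if x ne id;
drop x;
run;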
Suppose I want to only apply proc means or the better means macro to only non zero entries in my dataset? Is there an easy option to do this? If I have a dataset:
A B C
0 1 2
2 2 0
2 0 1
How can I use proc means or the better means macro to ignore the 0 values?
You can create a view to convert them on the fly. BETTERMEANS may have a way of handling this; not sure.
data have;
input A B C ;
datalines;
0 1 2
2 2 0
2 0 1
;;;;
run;
data have_z/view=have_z;
set have;
array num _numeric_;
do _i = 1 to dim(num);
if num[_i]=0 then num[_i]=.;
end;
run;
proc means data=have_z;
var a b c;
run;
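If only the means are needed, a PROC SQL query can skip the zeros without a view. This is a sketch rather than a replacement for PROC MEANS: the column names mean_A, mean_B and mean_C are made up, and a CASE expression without ELSE returns missing, which AVG() then ignores (SAS writes a note about the missing ELSE to the log).

```sas
/* Sample data from the question */
data have;
  input A B C;
  datalines;
0 1 2
2 2 0
2 0 1
;
run;

proc sql;
  /* zeros fall through the CASE and become missing, so AVG() skips them */
  select avg(case when A ne 0 then A end) as mean_A,
         avg(case when B ne 0 then B end) as mean_B,
         avg(case when C ne 0 then C end) as mean_C
  from have;
quit;
```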