concatenating an array in SAS using do loop - sas

I have a SAS dataset like this:
var1 var2 var3 var4 var5
there is no spoon Neo
And want this:
var
thereisnospoonNeo
I have a data step like the following:
data want;
set have;
format var $100.;
array v $ 5 var1-var5;
var = "";
do i=1 to 5;
var = var || v{i};
end;
run;
But my have has var = "";

The issue is that the initial value of var is actually 100 characters of white space, and any concatenation truncates off the list. One must add
var = strip(var) || v{i};
to achieve the desired result.

Related

SAS: dim and macro variables

data example1;
input var1 var2 var3;
datalines;
10 11 14
3 5 8
0 1 2
;
data example2;
input var;
datalines;
1
2
8
;
Let's say that the number of var variables depending on data input. I want to put that number to macro variable and use in another data step, for example:
%macro m(input);
data &input.;
set &input.;
array var_array[*] var:;
%let array_dim = dim(var_array);
do i = 1 to &array_dim;
var_array[i] = var_array[i] + 1;
end;
drop i;
run;
data example2;
set example2;
var2 = var * &array_dim; /* doesn't work */
run;
%mend;
%m(example1);
%let array_dim = dim(var_array); doesn't work in second data step, because dim(var_array) isn't evaluated, but %eval or %sysevalf in declaring the macro variable does't work here. How to do that correctly?
You are mixing up macro code and data step code in a way that is not supported in SAS. If you want to assign a macro variable a value that you're generating as part of a data step, you need to use call symput.
Also, if you create a macro variable during a data step, you cannot resolve it during the same data step in the way that you are attempting to do (unless you use the resolve function...). It's easier just to use a data set variable for this.
So here's a fixed version of your code that I think probably does what you want:
%macro m(input);
data &input.;
set &input.;
array var_array[*] var:;
array_dim = dim(var_array);
/*Only export the macro variable once, for the first row*/
if _n_ = 1 then call symput('array_dim_mvar', array_dim);
do i = 1 to array_dim;
var_array[i] = var_array[i] + 1;
end;
drop i;
run;
data example2;
set example2;
var2 = var * &array_dim_mvar;
run;
%mend;
%m(example1);

split string to columns with content fill

I have data that looks like this:
ID Sequence
---------------------------------
101 E6S,K11T,Q174K,D177E
102 K11T,V245EKQ
I need to add:
A new column with column heading for each sequence, add prefix 'RT', drop the letters following the numeric part of the sequence
Fill the new column with the letters that follow the numeric part
of the sequence
I need to create this:
ID Sequence RTE6 RTK11 RTQ174 RTD177 RTV245
-----------------------------------------------------------------------
101 E6S,K11T,Q174K,D177E S T K E
102 K11T,V245EKQ T EKQ
I assume you want a SAS data set and not a report. ANYDIGIT makes it pretty easy to find the last non-digit sub-string.
data seq;
infile cards firstobs=3;
input id:$3. sequence :$50.;
cards;
ID Sequence
---------------------------------
101 E6S,K11T,Q174K,D177E
102 K11T,V245EKQ
;;;;
run;
proc print;
run;
data seq2V / View=seq2V;
set seq;
length w name sub $32 subl 8;
do i = 1 by 1;
w = scan(sequence,i,',');
if missing(w) then leave;
subl = anydigit(w,-99);
name = substrn(w,1,subl);
sub = substrn(w,subl+1);
output;
end;
run;
proc transpose data=seq2V out=seq3(drop=_name_) prefix=RT;
by id sequence;
var sub;
id name;
run;
proc print;
run;
I had a similar problem a while ago. The code is adapted to your problem.
If found this solution to work faster than anything I tried with proc transpose.
Still overall performance on huge datasets (espc. using many different sequences) is not great at all, as we loop 2*2 over all strings and also the final variables.
Can anyone offer a faster solution?
(Caution: MacroVar is limited to 65534 Characters.)
data var_name ;
set in_data;
length var string $30.;
do i = 1 to countw(Sequence, ',');
string = scan(Sequence,i,',');
var = substr(string,1,anydigit(string,-99));
output;
keep var;
end;
run;
proc sql noprint;
select distinct compress("RT"||var) into :var_list separated by ' '
from var_name;
quit;
%put &var_list.;
data out_data;
set in_data;
length string &var_list. $30. n 8. ;
array a_var [*] &var_list.;
do i = 1 to countw(Sequence, ',');
string = scan(Sequence,i,',');
do j = 1 to dim(a_var);
n = anydigit(string,-99) ;
if substr(vname(a_var[j]),3) eq substr(string,1,n) then a_var[j] = substr(string,n+1);
end;
end;
drop string i j n;
run;

How to calculate a mean for the non zero values using proc means or proc summary

I want to have a mean which is based in non zero values for given variables using proc means only.
I know we do can calculate using proc sql, but I want to get it done through proc means or proc summary.
In my study I have 8 variables, so how can I calculate mean based on non zero values where in I am using all of those in the var statement as below:
proc means = xyz;
var var1 var2 var3 var4 var5 var6 var7 var8;
run;
If we take one variable at a time in the var statement and use a where condition for non zero variables , it works but can we have something which would work for all the variables of interest mentioned in the var statement?
Your suggestions would be highly appreciated.
Thank you !
One method is to change all of your zero values to missing, and then use PROC MEANS.
data zeromiss /view=zeromiss ;
set xyz ;
array n{*} var1-var8 ;
do i = 1 to dim(n) ;
if n{i} = 0 then call missing(n{i}) ;
end ;
drop i ;
run ;
proc means data=zeromiss ;
var var1-var8 ;
run ;
Create a view of your input dataset. In the view, define a weight variable for each variable you want to summarise. Set the weight to 0 if the corresponding variable is 0 and 1 otherwise. Then do a weighted summary via proc means / proc summary. E.g.
data xyz_v /view = xyz_v;
set xyz;
array weights {*} weight_var1-weight_var8;
array vars {*} var1-var8;
do i = 1 to dim(vars);
weights[i] = (vars[i] ne 0);
end;
run;
%macro weighted_var(n);
%do i = 1 to &n;
var var&i /weight = weight_var&i;
%end;
%mend weighted_var;
proc means data = xyz_v;
%weighted_var(8);
run;
This is less elegant than Chris J's solution for this specific problem, but it generalises slightly better to other situations where you want to apply different weightings to different variables in the same summary.
Can't you use a data statement?
data lala;
set xyz;
drop qty;
mean = 0;
qty = 0;
if(not missing(var1) and var1 ^= 0) then do;
mean + var1;
qty + 1;
end;
if(not missing(var2) and var2 ^= 0) then do;
mean + var2;
qty + 1;
end;
/* ... repeat to all variables ... */
if(not missing(var8) and var8 ^= 0) then do;
mean + var8;
qty + 1;
end;
mean = mean/qty;
run;
If you want to keep the mean in the same xyz dataset, just replace lala with xyz.

do loop on sas but not with a macro

It is a simple one but I'm a struggling a bit.
What I have :
What I want :
I want to remove the v0 , v1 and etc.
I'm using this piece of code
data IndieDay20140704;
set IndieDay20140704;
do i=1 to 5;
VAR1=tranwrd(var1,"v&i","");
end;
run;
It is not working correctly as it is giving me this instead (see below) plus the error
WARNING: Apparent symbolic reference I not resolved.
Questions:
1) Do I need a macro?
2) Why the error?
Many thanks for your insights.
There's an error because you're (unintentionally) using macro variable i, that you did not initialize.
I guess the idea of tranwrd is to remove words in VAR2, VAR3.. from VAR1.
The logical error is to do it also for VAR1 itself.
Check if this helps (using array):
data IndieDay20140704;
length VAR1 VAR2 VAR3 VAR3 VAR5 $10;
VAR1 = 'TEST IT';VAR5 = 'TEST';
output;
VAR1 = 'STEST IT';VAR5 = 'TEST';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
array vals VAR1 - VAR5;
do i=1 to dim(vals);
if i ne 1 then VAR1=tranwrd(var1,trim(vals(i)),"");
end;
drop i;
run;
Here I'm creating a SAS view on top of table (not a good idea to overwrite the source).
Also I think you should trim() the values from VAR2,VAR3... depending on what you want to achieve and what's in the data.
EDIT:
here the version with 'v0', 'v1'...'v5' strings:
data IndieDay20140704;
length VAR1$10;
VAR1 = 'TEST v0';
output;
VAR1 = 'TEST v11';
output;
VAR1 = 'TEST v1';
output;
run;
data IndieDay20140704_modified / view= IndieDay20140704_modified;
set IndieDay20140704;
org_var1 = var1;
do i=0 to 5;
var1 =tranwrd(var1, catt('v', put(i, 1. -L)),"");
end;
run;
catt('v', put(i, 1. -L)) concatenates string 'v' and the result of put.
put(i, 1. -L)) converts numeric variable i to text using plain numeric format w.d, 1. used here - enough for single digit numbers, -L left aligns the result
Here's one way, there are many others and this may not work if your data has a lot of variability.
data have;
length VAR1$10;
VAR1 = 'fic19v0.csv';
output;
VAR1 = 'fic19v1.cs';
output;
run;
data want ;
set have;
original_var=var1;
var1=substr(var1, 1, index(var1, ".")-3)||".csv";
run;

Reading text file in SAS with delimiter in wrong places

I am reading a .txt file into SAS, that uses "|" as the delimiter. The issue is there is one column that is using "|" as a word separator as well instead of acting like delimiter, this needs to be in one column.
For example the txt file looks like:
apple|fruit|Healthy|choices|of|food|12|2012|chart
needs to look like this in the SAS dataset:
apple | fruit | Healthy choices of Food | 12 | 2012 | chart
How do I eliminate "|" between "Healthy choices of Food"?
I think this will do what you want:
data tmp1;
length tmp $100;
input tmp $;
cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
apple|fruit|Healthy|choices|of|food|and|lots|of|other|stuff|12|2012|chart
;
run;
data tmp2;
set tmp1;
num_delims=length(tmp)-length(compress(tmp,"|"));
expected_delims=5;
extra_delims=num_delims-expected_delims;
length new_var $100;
i=1;
do while(scan(tmp,i,"|") ne "");
if i<=2 or (extra_delims+2)<i<=num_delims then new_var=trim(new_var)||scan(tmp,i,"|")||"|";
else new_var=trim(new_var)||scan(tmp,i,"|")||"#";
i+1;
end;
new_var=left(tranwrd(new_var,"#"," "));
run;
This isn't particularly elegant, but it will work:
data tmp;
input tmp $50.;
cards;
apple|fruit|Healthy|choices|of|food|12|2012|chart
;
run;
data tmp;
set tmp;
var1 = scan(tmp,1,'|');
var2 = scan(tmp,2,'|');
var4 = scan(tmp,-3,'|');
var5 = scan(tmp,-2,'|');
var6 = scan(tmp,-1,'|');
var3 = tranwrd(tmp,trim(var1)||"|"||trim(var2),"");
var3 = tranwrd(var3,trim(var4)||"|"||trim(var5)||"|"||trim(var6),"");
var3 = tranwrd(var3,"|"," ");
run;
Expanding a little on Itzy's answer, here is another possible solution:
data want;
/* Define variables */
attrib item length=$10 label='Item';
attrib class length=$10 label='Family';
attrib desc length=$80 label='Item Description';
attrib count length=8 label='Some number';
attrib year length=$4 label='Year';
attrib somevar length=$10 label='Some variable';
length countc $8; /* A temp variable */
infile 'c:\temp\delimited_temp.txt' lrecl=1000 truncover;
input;
item = scan(_infile_,1,'|','mo');
class = scan(_infile_,2,'|','mo');
countc = scan(_infile_,-3,'|','mo'); /* Temp var for numeric field */
count = inputn(countc,'8.'); /* Re-read the numeric field */
year = scan(_infile_,-2,'|','mo');
somevar = scan(_infile_,-1,'|','mo');
desc = tranwrd(
substr(_infile_
,length(item)+length(class)+3
,length(_infile_)
- ( length(item)+length(class)+length(countc)
+length(year)+length(somevar)+5))
,'|',' ');
drop countc;
run;
The key in this case it to read your file directly and handle the delimiters yourself. This can be tricky and requires that your data file is exactly as described. A much better solution would be to go back to whoever gave this this data and ask them to deliver it to you in a more appropriate form. Good luck!
Another possible workaround.
data tmp;
infile '/path/to/textfile';
input tmp :$100.;
array varlst (*) $30 v1-v6;
a=countw(tmp,'|');
do i=1 to dim(varlst);
if i<=2 then
varlst(i) = scan(tmp,i,'|');
else if i>=4 then
varlst(i) = scan(tmp,a-(dim(varlst)-i),'|');
else do j=3 to a-(dim(varlst)-i)-1;
varlst(i)=catx(' ', varlst(i),scan(tmp,j,'|'));
end;
end;
drop tmp a i j;
run;